Fix #112593 by adding support in lowering for concatenation with an
absent optional _assumed-length_ dummy argument, because:
1. Most compilers seem to support it (most likely by accident).
2. This actually makes the compiler codegen simpler. Codegen was going
out of its way to poke the LLVM optimizer bear by producing an undef
argument for the length.
To be clear, no compiler supports this with _explicit-length_ optional
arguments: the executable will segfault. I would discourage users from
relying on that "feature", because runtime checks for bad optional
dereference will kick in when it is used (for instance, "nagfor
-C=present" produces an executable that aborts with an error message;
Flang does not have such a runtime check option so far).
Hence, I am not updating the Extensions.md document because this is not
something I think we should advertise.
- The pass optimizes memcpy calls by padding out destinations and sources to a
full word so that the backend generates full-word loads instead of loading a
single byte (ldrb) and/or half word (ldrh). It only pads the destination when
it is a stack-allocated, constant-size array, and the source when it is a
constant array (see the sketch after this list). The heuristic that decides
whether to pad is very basic and could be improved to cover more cases.
- The pass runs within GlobalOpt but is disabled by default on all targets
except ARM.
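A hedged, source-level sketch of the kind of copy the pass targets (the array
names and sizes are made up for this illustration; this is not the pass's own
code):
```cpp
#include <cstring>

static const char Src[7] = {1, 2, 3, 4, 5, 6, 7}; // constant source array

int copy_into_local() {
  char Dst[7];               // stack-allocated, constant-size destination
  std::memcpy(Dst, Src, 7);  // 7-byte copy: the tail is lowered with ldrb/ldrh
  // After the pass pads both arrays out to 8 bytes, the backend can lower the
  // copy with full-word (or double-word) loads and stores instead.
  return Dst[3];
}
```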
arm_neon.td currently generates the same 24 `vcmla` intrinsic prototypes
for each of the f16, f32, and f64 base types. This is incorrect: the
only valid vcmla intrinsics for the f64 base type are:
- `vcmlaq_f64`
- `vcmlaq_rot90_f64`
- `vcmlaq_rot180_f64`
- `vcmlaq_rot270_f64`
(see ACLE
https://github.com/ARM-software/acle/blob/main/neon_intrinsics/advsimd.md)
This patch removes the incorrect intrinsic prototypes.
The register list in vscclrm is unusual in three ways:
* The encoded size can be zero, meaning the list contains only vpr.
* Double-precision registers past d15 are permitted even when the
subtarget doesn't have them; they are instead ignored when the
instruction executes.
* The single-precision variant allows double-precision registers d16
onwards, which are encoded as a pair of single-precision registers.
Fixing this also incidentally changes a vlldm/vlstm error message: when
the first register is in the range d16-d31 we now get the "operand must
be exactly..." error instead of "register expected".
Currently we have `MLIRContext::getRegisteredOperations`, which returns
all registered operations for the given context. With the addition of
`MLIRContext::getRegisteredOperationsByDialect` we can now retrieve the
same for a given dialect.
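A hedged usage sketch, assuming the new API is queried with a dialect
namespace and returns only that dialect's registered operation names (the
`arith` dialect here is just an example):
```cpp
#include "llvm/Support/raw_ostream.h"
#include "mlir/IR/MLIRContext.h"
#include "mlir/IR/OperationSupport.h"

void listArithOps(mlir::MLIRContext &ctx) {
  // Only the operations registered by the "arith" dialect, instead of every
  // operation registered in the context.
  for (mlir::RegisteredOperationName opName :
       ctx.getRegisteredOperationsByDialect("arith"))
    llvm::outs() << opName.getStringRef() << "\n";
}
```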
Closes #111591
This patch moves the part of operation verifiers dependent on the
contents of their regions to the corresponding `verifyRegions` method.
This ensures these are only triggered after the operations in the region
have themselves already been verified, avoiding checks based
on invalid nested operations.
The `LoopWrapperInterface` is also updated so that its verifier runs
after operations in the region of ops with this interface have already
been verified.
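A hedged sketch of the split, using a made-up op (`MyLoopOp`, its accessors,
and the `isAllowedInLoopBody` helper are illustrative, not taken from the
patch): checks that only inspect the op itself stay in `verify()`, while
checks that look at nested operations move to `verifyRegions()`.
```cpp
#include "mlir/IR/Operation.h"
#include "mlir/Support/LogicalResult.h"

// Checks that do not depend on the region contents.
mlir::LogicalResult MyLoopOp::verify() {
  if (getLowerBound().getType() != getUpperBound().getType())
    return emitOpError("loop bounds must have the same type");
  return mlir::success();
}

// Only runs after the ops nested in the region have been verified themselves,
// so they can be inspected safely here.
mlir::LogicalResult MyLoopOp::verifyRegions() {
  for (mlir::Operation &nested : getRegion().getOps())
    if (!isAllowedInLoopBody(&nested)) // hypothetical helper
      return nested.emitOpError("op is not allowed inside the loop body");
  return mlir::success();
}
```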
This patch implements new features introduced in the 2024 release of the ARM
ISA and creates predicates that will be used by new instructions.
Co-authored-by: Caroline Concatto caroline.concatto@arm.com
Co-authored-by: Spencer Abson spencer.abson@arm.com
This commit adds support for the following PTX predefined special
registers:
* warpid
* nwarpid
* smid
* nsmid
* gridid
* lanemask.*
* globaltimer
* envreg*
Lit tests are added under nvvmir.mlir.
`UnsignedWhenEquivalent` doesn't really need any dialect conversion
features, and switching it to normal patterns makes it more composable with
other pattern-based transformations (and probably faster).
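A hedged sketch of what such a switch looks like in general (the pass class
and the commented-out populate helper are illustrative, not the actual
`UnsignedWhenEquivalent` code): the rewrites become ordinary patterns driven
by the greedy rewriter instead of a dialect conversion with a
`ConversionTarget`.
```cpp
#include "mlir/IR/PatternMatch.h"
#include "mlir/Pass/Pass.h"
#include "mlir/Transforms/GreedyPatternRewriteDriver.h"

struct UnsignedWhenEquivalentSketch
    : mlir::PassWrapper<UnsignedWhenEquivalentSketch, mlir::OperationPass<>> {
  void runOnOperation() override {
    mlir::RewritePatternSet patterns(&getContext());
    // populateUnsignedWhenEquivalentPatterns(patterns); // hypothetical helper
    // Greedily apply the patterns; no ConversionTarget or type converter is
    // involved, so these patterns compose with other pattern-based passes.
    if (mlir::failed(mlir::applyPatternsAndFoldGreedily(getOperation(),
                                                        std::move(patterns))))
      signalPassFailure();
  }
};
```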
I realised we are missing tests to cover more loops with multiple early
exits - some countable and some uncountable.
I've also added a few SVE versions of the test in the AArch64 directory.
Once we can vectorise such early exit loops it's a good sanity check to
make sure they also vectorise for SVE. Also, for some of the tests I
expect there to be some divergence from the same tests in the top level
directory once we start vectorising them.
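For illustration, a C++ loop of the shape being tested (this is not one of
the actual IR tests): the `i == n` exit is countable, while the
data-dependent exit is uncountable.
```cpp
// Returns the index of the first position where the arrays match, or -1.
int findFirstMatch(const int *a, const int *b, int n) {
  for (int i = 0; i < n; ++i) {  // countable exit: trip count known at entry
    if (a[i] == b[i])            // uncountable early exit: depends on the data
      return i;
  }
  return -1;
}
```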
When using AAPCS-compliant frame chains with PACBTI return address
signing, there were a number of bugs in the generation of the frame
pointer and function prologues. The most obvious was that we would
sometimes modify r11 before pushing it to the stack, so it wasn't preserved
as required by the PCS. We also sometimes did not push r11 and LR
adjacent to one another on the stack, or used r11 as a frame pointer
without pointing it at the saved value of r11, both of which are
required for an AAPCS-compliant frame chain.
The original work of this patch was done by James Westwood, reviewed as
#82801 and #81249, with some tidy-ups done by Mark Murray and myself.
If the current working directory of `clang-include-cleaner` differs from
the directory of the input files specified in the compilation database,
it doesn't adjust the input file paths properly. As a result,
`clang-include-cleaner` either writes files to the wrong directory or
fails to write files altogether.
This pull request fixes the issue by adjusting the input file paths
based on the directory specified in the compilation database. If that
directory is not writable, `clang-include-cleaner` will write the output
relative to the current working directory.
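A hedged sketch of the path adjustment (this is not the tool's actual code,
just the idea; `resolveInputPath` is a made-up helper): a relative input path
is resolved against the compile command's directory rather than the tool's
current working directory.
```cpp
#include "clang/Tooling/CompilationDatabase.h"
#include "llvm/ADT/SmallString.h"
#include "llvm/Support/FileSystem.h"

#include <string>

std::string resolveInputPath(const clang::tooling::CompileCommand &Cmd) {
  llvm::SmallString<256> AbsPath(Cmd.Filename);
  // Interpret a relative filename against the directory recorded in the
  // compilation database entry, not against the current working directory.
  llvm::sys::fs::make_absolute(Cmd.Directory, AbsPath);
  return std::string(AbsPath.str());
}
```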
Fixes #110843.
When split functions is used, BOLT may skip tentative code layout
estimation in some cases, like:
- when there is no profile data for some blocks (i.e., cold blocks)
- when there are cold functions in lite mode
- when skip functions is used
However, when rewriting the binary we still need to compute PC-relative
distances between hot and cold basic blocks. Without cold layout
estimation, BOLT uses '0x0' as the address of the first cold block,
leading to incorrect estimations of any PC-relative addresses.
This affects large binaries, as the relaxStub method expands more
branches than necessary, replacing the short-jump sequence, because it
wrongly believes it has exceeded the branch distance boundary.
This increases code size, with a sequence that is both larger and slower;
however, the performance regression is expected to be minimal, since it only
affects cold code that is actually called.
Example of such an unnecessary relaxation:
from:
```armasm
b .Ltmp1234
```
to:
```armasm
adrp x16, .Ltmp1234
add x16, x16, :lo12:.Ltmp1234
br x16
```
This fixes all the places that hit the new assertion added in
https://github.com/llvm/llvm-project/pull/106524 in tests. That is,
cases where the value passed to the APInt constructor is not an N-bit
signed/unsigned integer, where N is the bit width and signedness is
determined by the isSigned flag.
The fixes either set the correct value for isSigned, set the
implicitTrunc flag, or perform more calculations inside APInt.
Note that the assertion is currently still disabled by default, so this
patch is mostly NFC.
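Hedged examples of the three kinds of fixes, assuming the
`(numBits, value, isSigned, implicitTrunc)` constructor shape from the
referenced PR (the values are illustrative):
```cpp
#include "llvm/ADT/APInt.h"

#include <cstdint>

void apintFixExamples() {
  using llvm::APInt;
  // 1. Set isSigned to match the value: -1 is a valid 8-bit signed value.
  APInt A(/*numBits=*/8, /*val=*/uint64_t(-1), /*isSigned=*/true);
  // 2. Opt into truncation explicitly when the value is known not to fit.
  APInt B(/*numBits=*/8, /*val=*/0x1ff, /*isSigned=*/false,
          /*implicitTrunc=*/true);
  // 3. Do the arithmetic inside APInt so no oversized value is passed in.
  APInt C = APInt(/*numBits=*/8, /*val=*/0xff) + 1; // wraps within 8 bits
}
```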
Without this, gcc warned:
../lib/Target/RISCV/RISCVVLOptimizer.cpp:760: warning: suggest parentheses around '&&' within '||' [-Wparentheses]
760 | VLOp.getReg() != RISCV::X0 && "Did not expect X0 VL");
|
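For reference, a generic illustration of this warning pattern and its fix
(the actual expression in RISCVVLOptimizer.cpp may differ):
```cpp
#include <cassert>

void checkVL(bool isImm, bool isX0) {
  // gcc warns here: suggest parentheses around '&&' within '||'.
  //   assert(isImm || !isX0 && "Did not expect X0 VL");
  // Parenthesizing the '&&' keeps the same meaning (the string literal is
  // always true) and silences the warning:
  assert(isImm || (!isX0 && "Did not expect X0 VL"));
}
```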
Implement -sanitizer-coverage-gated-trace-callbacks to gate the
invocation of the tracing callbacks based on the value of a global
variable, which is stored in a specific section.
When this option is enabled, the instrumentation will not call into the
runtime-provided callbacks for tracing, thus only incurring a trivial
branch instead of going through a function call. It is up to the runtime to
toggle the value of the global variable in order to enable tracing.
This option is only supported for trace-pc-guard.
Note: will add additional support for trace-cmp in a follow up PR.
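A hedged sketch of what the gated instrumentation conceptually lowers to (the
gate symbol name is illustrative, not the one the pass actually emits; only
`__sanitizer_cov_trace_pc_guard` is the real runtime callback):
```cpp
#include <cstdint>

extern "C" void __sanitizer_cov_trace_pc_guard(uint32_t *guard);

// Gate stored in a dedicated section; the runtime flips it to start tracing.
extern "C" uint64_t sancov_gate_sketch; // illustrative name

inline void gatedTracePcGuard(uint32_t *guard) {
  if (!sancov_gate_sketch)                // trivial branch while tracing is off
    return;
  __sanitizer_cov_trace_pc_guard(guard);  // call into the runtime only when on
}
```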
Patch by Filippo Bigarella
rdar://101626834
This allows calling ::dump() on various sub-classes of VPSingleDefRecipe
directly, as it resolves an ambiguous name lookup.
Previously, calling VPWidenRecipe::dump() (and others), would result in
the following errors:
llvm/unittests/Transforms/Vectorize/VPlanTest.cpp:1284:19: error: member 'dump' found in multiple base classes of different types
1284 | WidenR->dump();
| ^
llvm/include/../lib/Transforms/Vectorize/VPlanValue.h:434:8: note: member found by ambiguous name lookup
434 | void dump() const;
| ^
llvm/include/../lib/Transforms/Vectorize/VPlanValue.h:108:8: note: member found by ambiguous name lookup
108 | void dump() const;
| ^
1 error generated.
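A simplified, hypothetical reproduction of the ambiguity (this is not the
real VPlan hierarchy): both base classes declare `dump()`, so a call through
the derived class is ambiguous until the derived class provides its own.
```cpp
struct RecipeBase { void dump() const {} };
struct ValueBase  { void dump() const {} };

struct SingleDefRecipe : RecipeBase, ValueBase {
  // Without this member, `r.dump()` below fails with "member 'dump' found in
  // multiple base classes of different types".
  void dump() const { RecipeBase::dump(); }
};

void example() {
  SingleDefRecipe r;
  r.dump(); // now unambiguous
}
```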
Scans each global variable declaration and its members and collects all
required resource bindings in a new `SemaHLSL` data member `Bindings`.
New fields are added to `HLSLResourceBindingAttr` for storing the processed
binding information so that it can be used by CodeGen (`Bindings`, like any
other Sema information, is not accessible from CodeGen).
Adjusts the existing register binding attribute handling and diagnostics
to:
- not create `HLSLResourceBindingAttr` if it is not valid
- diagnose only the simple/local errors when a register binding
attribute is parsed
- perform the additional diagnostics of binding type mismatches later,
using the new `Bindings` data
Fixes #110719
`check-fuzzer` runs fine with a cl-built LLVM, but the following lit
tests fail with a clang-cl-built LLVM:
```
********************
Timed Out Tests (2):
libFuzzer-x86_64-default-Windows :: fork-ubsan.test
libFuzzer-x86_64-default-Windows :: fuzzer-oom.test
********************
Failed Tests (22):
libFuzzer-x86_64-default-Windows :: acquire-crash-state.test
libFuzzer-x86_64-default-Windows :: cross_over_copy.test
libFuzzer-x86_64-default-Windows :: cross_over_insert.test
libFuzzer-x86_64-default-Windows :: exit_on_src_pos.test
libFuzzer-x86_64-default-Windows :: fuzzer-alignment-assumption.test
libFuzzer-x86_64-default-Windows :: fuzzer-implicit-integer-sign-change.test
libFuzzer-x86_64-default-Windows :: fuzzer-implicit-signed-integer-truncation-or-sign-change.test
libFuzzer-x86_64-default-Windows :: fuzzer-implicit-signed-integer-truncation.test
libFuzzer-x86_64-default-Windows :: fuzzer-implicit-unsigned-integer-truncation.test
libFuzzer-x86_64-default-Windows :: fuzzer-printcovpcs.test
libFuzzer-x86_64-default-Windows :: fuzzer-timeout.test
libFuzzer-x86_64-default-Windows :: fuzzer-ubsan.test
libFuzzer-x86_64-default-Windows :: minimize_crash.test
libFuzzer-x86_64-default-Windows :: minimize_two_crashes.test
libFuzzer-x86_64-default-Windows :: null-deref-on-empty.test
libFuzzer-x86_64-default-Windows :: null-deref.test
libFuzzer-x86_64-default-Windows :: print-func.test
libFuzzer-x86_64-default-Windows :: stack-overflow-with-asan.test
libFuzzer-x86_64-default-Windows :: trace-malloc-2.test
libFuzzer-x86_64-default-Windows :: trace-malloc-unbalanced.test
libFuzzer-x86_64-default-Windows :: trace-malloc.test
```
The related commits are 53a81d4d26 and e31efd8f6f.
Following the change in e31efd8f6f fixes these failures.
As for the issue mentioned in the comment that alternatename support in
clang is not good enough (https://bugs.llvm.org/show_bug.cgi?id=40218), I
found that using `__builtin_function_start(func)` instead of using `func`
directly makes it work as intended.
The goal is to move the `SuspendedThreadsList`-related code to
the beginning of the loop, and to prepare for extracting the rest
of the loop body into a function.