In the vmerge peephole, we currently allow different AVLs for the vmerge
and its true operand.
If vmerge's VL > true's VL, vmerge can "preserve" elements from false
that would otherwise be clobbered with a tail agnostic policy on true.
mask 1 1 1 1 0 0 0 0
true x x x x|. . . . AVL=4
vmerge x x x x f f|. . AVL=6
If we convert this to vmv.v.v we will lose those false elements:
mask 1 1 1 1 0 0 0 0
true x x x x|. . . . AVL=4
vmv.v.v x x x x . .|. . AVL=6
Fix this by checking that vmerge's AVL is <= true's AVL.
Should fix#149335
(cherry picked from commit eafe31b293a5166522fff4f3e2d88c2b5c881381)
This patch fixes a bug introduced in #145878. A dependency was added in
the wrong direction, causing an assertion failure due to broken
topological order.
(cherry picked from commit 6df012ab48ececd27359bdc9448ee101b39eea7a)
getVectorInstrCostHelper would return costs of zero for vector
inserts/extracts that move data between GPR and vector registers, if
there was no 'real' use, i.e. there was no corresponding existing
instruction.
This meant that passes like LoopVectorize and SLPVectorizer, which
likely are the main users of the interface, would understimate the cost
of insert/extracts that move data between GPR and vector registers,
which has non-trivial costs.
The patch removes the special case and only returns costs of zero for
lane 0 if it there is no need to transfer between integer and vector
registers.
This impacts a number of SLP test, and most of them look like general
improvements.I think the change should make things more accurate for any
AArch64 target, but if not it could also just be Apple CPU specific.
I am seeing +2% end-to-end improvements on SLP-heavy workloads.
PR: https://github.com/llvm/llvm-project/pull/146526
Without thunks, programs will encounter link errors complaining that the
branch target is out of range. Thunks will extend the range of branch
targets, which is a critical need for large programs. Thunks provide
this flexibility at a cost of some modest code size increase.
When configured with the maximal feature set, the hexagon port of the
linux kernel would often encounter these limitations when linking with
`lld`.
The relocations which will be extended by thunks are:
* R_HEX_B22_PCREL, R_HEX_{G,L}D_PLT_B22_PCREL, R_HEX_PLT_B22_PCREL
relocations have a range of ± 8MiB on the baseline
* R_HEX_B15_PCREL: ±65,532 bytes
* R_HEX_B13_PCREL: ±16,380 bytes
* R_HEX_B9_PCREL: ±1,020 bytes
Fixes#149689
Co-authored-by: Alexey Karyakin <akaryaki@quicinc.com>
---------
Co-authored-by: Alexey Karyakin <akaryaki@quicinc.com>
(cherry picked from commit b42f96bc057fd9e31572069b241ba130c21144e5)
Handle the case where the assigned variable also has a pointer
attribute.
Fixes#121721
(cherry picked from commit fc0a978327215aa8883ae6f18d1e316f3c04520a)
Fixes#149089 and #149700.
Before #145837, when processing a reduction symbol not yet supported by
OpenMP lowering, the reduction processor would simply skip filling in
the reduction symbols and variables. With #145837, this behvaior was
slightly changed because the reduction symbols are populated before
invoking the reduction processor (this is more convenient to shared the
code with `do concurrent`).
This PR restores the previous behavior.
(cherry picked from commit 36c37b019b5daae79785e8558d693e6ec42b0ebd)
Update LV to vectorize maxnum/minnum reductions without fast-math flags,
by adding an extra check in the loop if any inputs to maxnum/minnum are
NaN, due to maxnum/minnum behavior w.r.t to signaling NaNs. Signed-zeros
are already handled consistently by maxnum/minnum.
If any input is NaN,
*exit the vector loop,
*compute the reduction result up to the vector iteration that contained
NaN inputs and
* resume in the scalar loop
New recurrence kinds are added for reductions using maxnum/minnum
without fast-math flags.
PR: https://github.com/llvm/llvm-project/pull/148239
There are cases we end up removing some intructions that use stackified
registers after RegStackify. For example,
```wasm
bb.0:
%0 = ... ;; %0 is stackified
br_if %bb.1, %0
bb.1:
```
In this code, br_if will be removed in CFGSort, so we should unstackify
%0 so that it can be correctly dropped in ExplicitLocals.
Rather than handling this in case-by-case basis, this PR just
unstackifies all stackifies register with no uses in the beginning of
ExplicitLocals, so that they can be correctly dropped.
Fixes#149097.
(cherry picked from commit b13bca7387a7aad6eaf3fa1c55bd06fe091f65ed)
DTLTO-related COFF LLD changes were cherry-picked to the LLVM 21
release branch in:
https://github.com/llvm/llvm-project/pull/149979
This commit adds the corresponding release note, modeled after
the previously added note for ELF LLD DTLTO support.
A thin archive is an archive/library format where the archive itself
contains only references to member object files on disk, rather than
embedding the file contents.
For the non-/wholearchive case, we use the path to the archive member as
the identifier for thin-archive members (see comments in
`enqueueArchiveMember`). This patch modifies the /wholearchive path to
behave the same way.
Apart from consistency, my motivation for fixing this is DTLTO
(#126654), where having the member identifier be the path on disk allows
distribution of bitcode members during ThinLTO.
(cherry picked from commit 9f733f4324412ef89cc7729bf027cdcab912ceff)
Closes#147869Closes#149679
Adds a check for a missing loop preheader during analysis. This fixes a
nullptr dereference that happened whenever LoopSimplify was unable to
generate a preheader because the loop was entered by an indirectbr
instruction (as stated in the LoopSimplify.cpp doc comment).
(cherry picked from commit 04107209856bb39e041aa38cf40de0afa90a6b2d)
This patch introduces support for Integrated Distributed ThinLTO (DTLTO)
in COFF LLD.
DTLTO enables the distribution of ThinLTO backend compilations via
external distribution systems, such as Incredibuild, during the
traditional link step: https://llvm.org/docs/DTLTO.html.
Note: Bitcode members of non-thin archives are not currently supported.
This will be addressed in a future change. This patch is sufficient to
allow for self-hosting an LLVM build with DTLTO if thin archives are
used.
Testing:
- LLD `lit` test coverage has been added, using a mock distributor to
avoid requiring Clang.
- Cross-project `lit` tests cover integration with Clang.
For the design discussion of the DTLTO feature, see:
https://github.com/llvm/llvm-project/pull/126654
(cherry picked from commit bbbbc093febffcae262cde1baa429b950842d76e)
This patch introduces support for Integrated Distributed ThinLTO (DTLTO)
in Clang.
DTLTO enables the distribution of ThinLTO backend compilations via
external distribution systems, such as Incredibuild, during the
traditional link step: https://llvm.org/docs/DTLTO.html.
Testing:
- `lit` test coverage has been added to Clang's Driver tests.
- The DTLTO cross-project tests will use this Clang support.
For the design discussion of the DTLTO feature, see:
https://github.com/llvm/llvm-project/pull/126654
(cherry picked from commit 5004c59803fdeb3389d30439a6cc8b1ff874df0c)
This patch adds an emergency spill slot when ran out of registers.
PR #139201 introduces `vstelm` instructions with only 8-bit imm offset,
it causes no spill slot to store the spill registers.
(cherry picked from commit 64a0478e08829ec6bcae2b05e154aa58c2c46ac0)
The sizes of the struct stat on MIPS64 differ in musl vs glibc.
See https://godbolt.org/z/qf9bcq8Y8 for the proof. Prior to this change,
compilation for MIPS64 musl would fail.
Signed-off-by: Jens Reidel <adrian@travitia.xyz>
(cherry picked from commit a5d6fa68e399dee9eb56f2671670085b26c06b4a)
The verifier check was in the wrong place, meaning it wasn't actually
checking many instructions.
Fixing that causes a test failure (coro-dwarf-key-instrs.cpp) because
coros turn off the feature but still annotate instructions with the
metadata (which is a supported situation, but the verifier doesn't like
it, and it's hard to teach the verifier to like it).
Fix that by avoiding emitting any key instruction metadata if the
DISubprogram has opted out of key instructions.
(cherry picked from commit 653872f782e1faaabc1da23769e6b35b10e74bde)
If the environment is considered to be the triple component as a whole,
so, including the object format, if any, and if that is the intended
behaviour, then the loongarch64 function `computeTargetABI()` should be
changed to not rely on `hasEnvironment()`, but, rather, to check if
there is a non-unknown environment set.
Without this change, using a (ideally valid) target of
loongarch64-unknown-none-elf, with a manually specified ABI of lp64s,
will result in a completely superfluous warning:
```
warning: triple-implied ABI conflicts with provided target-abi 'lp64s', using target-abi
```
(cherry picked from commit 9ed8816dc63776259d2190bdc8a7a29698c62749)
This is a follow-up PR for post-commit comments in
https://github.com/llvm/llvm-project/pull/146610
- Changed "exporteddllmain" references to "importeddllmain".
- Add support for x86 target and test coverage.
- Changed a comment to better express why we're skipping importing
`DllMain`.
(cherry picked from commit fcacd4e880c9a0b3f2bdaa43603aeddfa1b1cd2e)
Mips was the only architecture having PtrDiffType = SignedInt and
IntPtrType = SignedLong
This fixes a problem on mipsel-windows-gnu triple, where uintptr_t was
wrongly defined as unsigned long instead of unsigned int, leading to
problems in compiler-rt.
compiler-rt/lib/interception/interception_type_test.cpp:24:17: error:
static assertion failed due to requirement
'__sanitizer::is_same<unsigned long, unsigned int>::value':
24 | COMPILER_CHECK((__sanitizer::is_same<__sanitizer::uptr,
::uintptr_t>::value));
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
compiler-rt/lib/interception/../sanitizer_common/sanitizer_internal_defs.h:369:44:
note: expanded from macro 'COMPILER_CHECK'
369 | #define COMPILER_CHECK(pred) static_assert(pred, "")
| ^~~~
compiler-rt/lib/interception/interception_type_test.cpp:25:17: error:
static assertion failed due to requirement '__sanitizer::is_same<long,
int>::value':
25 | COMPILER_CHECK((__sanitizer::is_same<__sanitizer::sptr,
::intptr_t>::value));
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
compiler-rt/lib/interception/../sanitizer_common/sanitizer_internal_defs.h:369:44:
note: expanded from macro 'COMPILER_CHECK'
369 | #define COMPILER_CHECK(pred) static_assert(pred, "")
| ^~~~
compiler-rt/lib/interception/interception_type_test.cpp:27:17: error:
static assertion failed due to requirement '__sanitizer::is_same<long,
int>::value':
27 | COMPILER_CHECK((__sanitizer::is_same<::PTRDIFF_T,
::ptrdiff_t>::value));
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
compiler-rt/lib/interception/../sanitizer_common/sanitizer_internal_defs.h:369:44:
note: expanded from macro 'COMPILER_CHECK'
369 | #define COMPILER_CHECK(pred) static_assert(pred, "")
(cherry picked from commit 13906724ff7aa1bc58202faac62690570dfe0dc3)
Fixes#149669; the old check compared with the end of the literal, but
we can just check that after parsing digits, we're pointing to one
character past the token start.
(cherry picked from commit 8366dc207a2e6b50cb8afe2d98fca68bd78bd0fa)
The pattern i32xi32->i64, should be matched to the sign-extended
multiply op, instead of explicit sign- extension of the operands
followed by non-widening multiply (this takes 4 operations instead of
one). Currently, if one of the operands of multiply inside a loop is a
constant, the sign-extension of this constant is hoisted out of the loop
by LICM pass and this pattern is not matched by the ISEL.
This change handles multiply operand with Opcode of the type AssertSext
which is seen when the sign-extension is hoisted out-of the loop.
Modifies the DetectUseSxtw() to check for this.
(cherry picked from commit fcabb53f0c349885167ea3d0e53915e6c42271a7)
The instruction does `rd = (rs1 << shamt) + rs2` but the ISEL patterns
had `rs1` and `rs2` the other way around which is incorrect.
(cherry picked from commit 84e689b1db02be1687c3093d66ace913250780bd)
This is so that it's performed also for flang and not just for clang.
This should fix https://github.com/llvm/llvm-project/issues/138494.
(cherry picked from commit 38fc453afdb6a4511b7c8e189f12a92559ecc396)
While building libclang_rt.asan-hexagon.so, lld would assert in
lld:🧝:hexagonTLSSymbolUpdate().
Fixes#132766
(cherry picked from commit 3e9ceae29f39456508eef5b4af4d3c895048706a)
This reverts commit 85d09de5fa19a32bbcc400928d55f9d633077640. This
change caused broken symlinks in the build directory, and the subsequent
commits fixing that introduced other issues.
This reverts commit 81e6552a3d6835c4e10eb981402febfac9df6156. This
change caused libclc to start installing broken symlinks that contain
absolute paths to the build directory rather than relative paths
within the install directory.
This reverts commit 8a7a64873b13e6fd931b748fbf50b3da26fe7fca. It broke
standalone builds since the necessary variables are now limited in scope
to `libclc/utils` while they are used in the top-level CMakeLists.
Hexagon instructions are VLIW "bundles" of up to four instruction words
encoded as a single MCInst with operands for each sub-instruction.
Previously, the disassembler's getInstruction() returned the full
bundle, which made it difficult to work with llvm-objdump.
For example, since all instructions are bundles, and bundles do not
branch, branch targets could not be printed.
This patch modifies the Hexagon disassembler to return individual
sub-instructions instead of entire bundles, enabling correct printing of
branch targets and relocations. It also introduces
`MCDisassembler::getInstructionBundle` for cases where the full bundle
is still needed.
By default, llvm-objdump separates instructions with newlines. However,
this does not work well for Hexagon syntax:
{ inst1
inst2
inst3
inst4 <branch> } :endloop0
Instructions may be followed by a closing brace, a closing brace with
`:endloop`, or a newline. Branches must appear within the braces.
To address this, `PrettyPrinter::getInstructionSeparator()` is added and
overridden for Hexagon.
(cherry picked from commit ac7ceb3dabfac548caa993e7b77bbadc78af4464)
cc https://github.com/llvm/llvm-project/pull/138299
rustc sets `allockind("uninitialized")` - if we copy the attributes
as-is when creating a dummy function, Verify complains about
`allockind("uninitialized,zeroed")` conflicting, so we need to clear the
flag.
Co-authored-by: Jamie Hill-Daniel <jamie@osec.io>
(cherry picked from commit 74c396afb26dec74c0b799e218c63f1a26e90d21)
When not in the prologue we do not want to set the FrameSetup flag, by
passing the flag as argument we can use allocateStack correctly on those
cases.
This fixes the allocation and probe in eliminateCallFramePseudoInstr.
(cherry picked from commit 1db9eb23209826d9e799e68a9a4090f0328bf70c)
Reverts llvm/llvm-project#148613
Considering object argument conversion qualifications perfect leads to
situations where we prefer a non-template const qualified function over
a non-qualified template function, which is very wrong indeed.
I explored solutions to work around that, but instead, we might want to
go the GCC road and prefer the friend overload in the #147374 example,
as this seems a lot more consistent and reliable
(cherry picked from commit 28e1e7e1b4b059a2e42f68061475cddb4ad0a6a3)
This is a follow-up to #149079. Seems like we forgot about the fact that
the symbols also need to be in `libclang.map`.
(cherry picked from commit 7e0fde0c2f6b0b9d727ce9196956b36e91961ac4)