4955 Commits

Author SHA1 Message Date
David Green
637aa61732
[ARM] Fix VBICimm and VORRimm generation under Big endian. (#107813)
This is a smaller follow-on to #105519 that fixes VBICimm and VORRimm
too. The logic behind lowering vector immediates under big endian
Neon/MVE is to treat them in natural lane ordering (same as little
endian), and VECTOR_REG_CAST them to the correct type (as opposed to
creating the constants in big endian form and bitcasting them). This
makes sure that is done when creating VORRIMM and VBICIMM.
2024-09-13 10:59:57 +01:00
David Green
11eae671b7 [ARM] Add and extend big-endian testing for vorrimm and vbicimm. NFC 2024-09-07 15:36:54 +01:00
Austin
3242e77841
[ARM][Codegen] Fix vector data miscompilation in arm32be (#105519)
Fixes #102418: resolves the generation of incorrect vrev instructions during
vectorization in big-endian scenarios.
2024-09-07 14:09:29 +08:00
Craig Topper
e6e857cdf9
[GISel] Use Function::getFunctionType() instead of getType() in some remarks. (#107651)
getType() on a Function is always 'ptr'. We should use getFunctionType()
so we get the function signature.
2024-09-06 19:59:44 -07:00
anjenner
4af249fe6e
Add usub_cond and usub_sat operations to atomicrmw (#105568)
These both perform conditional subtraction, returning the minuend and
zero respectively, if the difference is negative.
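
As a minimal sketch of the semantics described above, the two operations can be modelled on plain non-negative integers. The helper names `usub_cond` and `usub_sat` are illustrative Python functions, not LLVM APIs, and they model only the value that would be stored, not atomicity.

```python
def usub_cond(old: int, val: int) -> int:
    """Subtract only when the difference would not be negative; otherwise keep the minuend."""
    return old - val if old >= val else old


def usub_sat(old: int, val: int) -> int:
    """Subtract, clamping to zero when the difference would be negative."""
    return old - val if old >= val else 0


if __name__ == "__main__":
    for old, val in [(10, 3), (3, 10)]:
        print(old, val, usub_cond(old, val), usub_sat(old, val))
```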
2024-09-06 16:19:20 +01:00
Nikita Popov
a7697c8655
[ARM] Do not assume alignment in vld1xN and vst1xN intrinsics (#106984)
These intrinsics currently assume natural alignment. Instead, respect
the alignment attribute on the intrinsic. Teach InstCombine to improve
that alignment.

If desired I could also adjust the clang frontend to add alignment
annotations equivalent to the previous behavior, but I don't see any
indication that such an assumption is correct in the ARM intrinsics
docs.

Fixes https://github.com/llvm/llvm-project/issues/59081.
2024-09-05 09:26:53 +02:00
Nikita Popov
224112f833 [ARM] Regenerate test checks (NFC) 2024-09-02 14:15:03 +02:00
Oliver Stannard
9cf68679c4
[ARM] Fix failure to register-allocate CMP_SWAP_64 pseudo-inst (#106721)
This test case was failing to compile with a "ran out of registers
during register allocation" error at -O0. This was because CMP_SWAP_64
has 3 operands which must be an even-odd register pair, and two other
GPR operands. All of the def operands are also early-clobber, so
registers can't be shared between uses and defs. Because the function
has an over-aligned alloca it needs frame and base pointers, so r6 and
r11 are both reserved. That leaves r0/r1, r2/r3, r4/r5 and r8/r9 as the
only valid register pairs, and if the two individual GPR operands happen
to get allocated to registers in different pairs then only 2 pairs will
be available for the three GPRPair operands.

To fix this, I've merged the two GPR operands into a single GPRPair
operand. This means that the instruction now has 4 GPRPair operands,
which can always be allocated without relying on luck. This does
constrain register allocation a bit more, but this pseudo instruction is
only used at -O0, so I don't think that's a problem.
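
The counting argument above can be sketched as a toy model. It is an assumption-laden illustration only: it considers just the case where the two single GPR operands are taken from registers belonging to the listed pairs, and `free_pairs` and `can_allocate` are hypothetical helpers, not part of the register allocator.

```python
# Even-odd pairs left once r6 and r11 are reserved (list taken from the message above).
PAIRS = [("r0", "r1"), ("r2", "r3"), ("r4", "r5"), ("r8", "r9")]


def free_pairs(taken):
    """Pairs in which neither register is already claimed by another operand."""
    return [p for p in PAIRS if not set(p) & set(taken)]


def can_allocate(pair_operands, taken):
    """True if enough untouched pairs remain for all pair operands."""
    return len(free_pairs(taken)) >= pair_operands


if __name__ == "__main__":
    # Before: 3 pair operands plus 2 single GPRs that land in different pairs.
    print(can_allocate(3, {"r0", "r2"}))
    # Before, lucky case: both single GPRs land in the same pair.
    print(can_allocate(3, {"r0", "r1"}))
    # After: the 2 GPRs are merged into a 4th pair operand.
    print(can_allocate(4, set()))
```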
2024-09-02 08:54:10 +01:00
Stephen Tozer
3d08ade7bd
[ExtendLifetimes] Implement llvm.fake.use to extend variable lifetimes (#86149)
This patch is part of a set of patches that add an `-fextend-lifetimes`
flag to clang, which extends the lifetimes of local variables and
parameters for improved debuggability. In addition to that flag, the
patch series adds a pragma to selectively disable `-fextend-lifetimes`,
and an `-fextend-this-ptr` flag which functions as `-fextend-lifetimes`
for this pointers only. All changes and tests in these patches were
written by Wolfgang Pieb (@wolfy1961), while Stephen Tozer (@SLTozer)
has handled review and merging. The extend lifetimes flag is intended to
eventually be set on by `-Og`, as discussed in the RFC
here:

https://discourse.llvm.org/t/rfc-redefine-og-o1-and-add-a-new-level-of-og/72850

This patch implements a new intrinsic instruction in LLVM,
`llvm.fake.use` in IR and `FAKE_USE` in MIR, that takes a single operand
and has no effect other than "using" its operand, to ensure that its
operand remains live until after the fake use. This patch does not emit
fake uses anywhere; the next patch in this sequence causes them to be
emitted from the clang frontend, such that for each variable (or this) a
fake.use operand is inserted at the end of that variable's scope, using
that variable's value. This patch covers everything post-frontend, which
is largely just the basic plumbing for a new intrinsic/instruction,
along with a few steps to preserve the fake uses through optimizations
(such as moving them ahead of a tail call or translating them through
SROA).

Co-authored-by: Stephen Tozer <stephen.tozer@sony.com>
2024-08-29 17:53:32 +01:00
Kiran
c50d11e6d9 Revert "[ARM] musttail fixes"
committed by accident, see #104795

This reverts commit a2088a24dad31ebe44c93751db17307fdbe1f0e2.
2024-08-27 11:17:17 +01:00
Kiran
ad468da038 Revert "Seperate frontend changes, add debug directives, remove redundant stuff from tests"
This reverts commit 1a908c6be3317bbbac73e6a6fc52cabefbdebf7d.
2024-08-27 10:46:18 +01:00
Kiran
1a908c6be3 Seperate frontend changes, add debug directives, remove redundant stuff from tests 2024-08-27 10:44:06 +01:00
Kiran
a2088a24da [ARM] musttail fixes
Backend:
- Caller and callee arguments no longer have to match, just to take up the same space, as they can be changed before the call
- Allowed tail calls if caller and callee both (or neither) use sret, whereas before it would be disallowed if either used sret
- Allowed tail calls if byval args are used
- Added debug trace for IsEligibleForTailCallOptimisation

Frontend (clang):
- Do not generate extra alloca if sret is used with musttail, as the space for the sret is allocated already

Change-Id: Ic7f246a7eca43c06874922d642d7dc44bdfc98ec
2024-08-27 10:44:06 +01:00
David Green
9f82f6daa5 [ARM] Add a number of extra vmovimm tests for BE. NFC 2024-08-24 20:20:23 +01:00
David Green
05d17a1c70
[GlobalISel] Bail out early for big-endian (#103310)
If we continue through the function we can currently hit crashes. We can
bail out early and fall back to SDAG.

Fixes #103032
2024-08-19 18:50:47 +01:00
Craig Topper
abc1acf8df
[TargetLowering][AMDGPU][ARM][RISCV][X86] Teach SimplifyDemandedBits to combine (srl (sra X, C1), ShAmt) -> sra(X, C1+ShAmt) (#101751)
If the upper bits of the shr aren't demanded.

This helps with cases where the outer srl was originally an sra and was
converted to a srl by SimplifyDemandedBits before it had a chance to
combine with the inner sra. This can occur when the inner sra was part
of a sign_extend_inreg expansion.

There are some regressions in ARM and Thumb2.
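
To illustrate why the combine is safe when the upper bits are not demanded, here is a small sketch on 32-bit values. The helpers `sra` and `srl` are plain Python models of the shift nodes (an assumption of this example, not LLVM code), and the check assumes `C1 + ShAmt` is less than the bit width.

```python
BITS = 32
MASK = (1 << BITS) - 1


def to_signed(x):
    """Interpret a 32-bit pattern as a two's-complement integer."""
    x &= MASK
    return x - (1 << BITS) if x >> (BITS - 1) else x


def sra(x, n):
    """Arithmetic shift right on a 32-bit pattern."""
    return (to_signed(x) >> n) & MASK


def srl(x, n):
    """Logical shift right on a 32-bit pattern."""
    return (x & MASK) >> n


def demanded_bits_agree(x, c1, sh_amt):
    """The two forms may differ only in the top sh_amt bits."""
    low_mask = (1 << (BITS - sh_amt)) - 1
    before = srl(sra(x, c1), sh_amt)
    after = sra(x, c1 + sh_amt)
    return (before ^ after) & low_mask == 0


if __name__ == "__main__":
    print(demanded_bits_agree(0x80000000, 3, 5))
```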
2024-08-14 08:44:57 -07:00
Pierre van Houtryve
7389545d0d
Reapply "[AMDGPU] Always lower s/udiv64 by constant to MUL" (#101942)
Reland #100723, fixing the ARM issue at the cost of a small loss of optimization in `test/CodeGen/AMDGPU/fshr.ll`

Solves #100383
2024-08-12 09:00:22 +02:00
Peter Rong
74e4694b8c
[LTO] enable ObjCARCContractPass only on optimized build (#101114)
#92331 tried to enable `ObjCARCContractPass` by default, but it caused a
regression on O0 builds and was reverted.
This patch tries to bring that back by:

1. reverting the
[revert](1579e9ca9c).
2. running `createObjCARCContractPass` only on optimized builds.

Tests are updated to reflect the changes. Specifically, all `O0` tests
should not include `ObjCARCContractPass`.

Signed-off-by: Peter Rong <PeterRong@meta.com>
2024-08-09 13:04:25 -07:00
David Green
dad1cb9cf9 [ARM] Regenerate big-endian-vmov.ll. NFC 2024-08-09 15:24:54 +01:00
Simon Pilgrim
13d04fa560 [DAG] Add legalization handling for ABDS/ABDU (#92576) (REAPPLIED)
Always match ABD patterns pre-legalization, and use TargetLowering::expandABD to expand again during legalization.

abdu(lhs, rhs) -> sub(xor(sub(lhs, rhs), usub_overflow(lhs, rhs)), usub_overflow(lhs, rhs))
Alive2: https://alive2.llvm.org/ce/z/dVdMyv

REAPPLIED: Fix regression issue with "abs(ext(x) - ext(y)) -> zext(abd(x, y))" fold failing after type legalization
2024-08-08 11:39:05 +01:00
Sergei Barannikov
34157f694c
[ARM] Fix operand order of tBLXr in a test (NFC) (#102312)
The $noreg should be a part of `pred` complex operand.
2024-08-08 01:12:45 +03:00
Simon Pilgrim
e4e96b3e26 Revert b1234ddbe2652aa7948242a57107ca7ab12fd2f8. "[DAG] Add legalization handling for ABDS/ABDU (#92576)"
Reverting #92576 while we identify a reported regression
2024-08-07 17:11:25 +01:00
Oliver Stannard
d06303ffc1
[ARM] t2CALL_BTI pseudo-inst clobbers LR (#102117)
The t2CALL_BTI pseudo-instruction expands to a tBL instruction, so needs
the same implicit uses and defs as it.
2024-08-07 10:24:17 +01:00
Simon Pilgrim
b1234ddbe2
[DAG] Add legalization handling for ABDS/ABDU (#92576)
Always match ABD patterns pre-legalization, and use TargetLowering::expandABD to expand again during legalization.

abdu(lhs, rhs) -> sub(xor(sub(lhs, rhs), usub_overflow(lhs, rhs)), usub_overflow(lhs, rhs))
Alive2: https://alive2.llvm.org/ce/z/dVdMyv
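
The expansion above can be checked exhaustively on a narrow type. This sketch assumes that `usub_overflow(lhs, rhs)` stands for the borrow flag widened to an all-ones or all-zeros mask; the function name `abdu_expanded` and the 8-bit width are choices made for this example only.

```python
BITS = 8
MASK = (1 << BITS) - 1


def abdu_expanded(lhs, rhs):
    """sub(xor(sub(lhs, rhs), ov), ov), with ov as an all-ones/all-zeros mask."""
    diff = (lhs - rhs) & MASK             # wrapping subtraction
    ov = MASK if lhs < rhs else 0         # assumed: overflow flag sign-extended to a mask
    return ((diff ^ ov) - ov) & MASK      # conditional negate of the wrapped difference


if __name__ == "__main__":
    ok = all(
        abdu_expanded(a, b) == abs(a - b)
        for a in range(1 << BITS)
        for b in range(1 << BITS)
    )
    print(ok)
```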
2024-08-06 10:18:06 +01:00
Alexis Engelke
fa92d51f9e
[VP] Merge ExpandVP pass into PreISelIntrinsicLowering (#101652)
Similar to #97727; avoid an extra pass over the entire IR by performing
the lowering as part of the pre-isel-intrinsic-lowering pass.
2024-08-06 09:27:59 +02:00
Martin Storsjö
8dd065d5bc
[ARM] [Windows] Use IMAGE_SYM_CLASS_STATIC for private functions (#101828)
For functions with private linkage, pick
IMAGE_SYM_CLASS_STATIC rather than IMAGE_SYM_CLASS_EXTERNAL;
GlobalValue::isInternalLinkage() only checks for
InternalLinkage, while GlobalValue::isLocalLinkage() checks for both
InternalLinkage and PrivateLinkage.
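
The difference between the two predicates can be sketched as follows. This is a simplified Python model with only three linkage kinds; the `Linkage` enum and `symbol_class` helper are hypothetical stand-ins, not the LLVM classes themselves.

```python
from enum import Enum


class Linkage(Enum):
    EXTERNAL = "external"
    INTERNAL = "internal"
    PRIVATE = "private"


def is_internal_linkage(linkage):
    """Models a check that matches InternalLinkage only."""
    return linkage is Linkage.INTERNAL


def is_local_linkage(linkage):
    """Models a check that matches both InternalLinkage and PrivateLinkage."""
    return linkage in (Linkage.INTERNAL, Linkage.PRIVATE)


def symbol_class(linkage, is_local=is_local_linkage):
    """Pick the COFF storage class based on the chosen predicate."""
    return "IMAGE_SYM_CLASS_STATIC" if is_local(linkage) else "IMAGE_SYM_CLASS_EXTERNAL"


if __name__ == "__main__":
    print(symbol_class(Linkage.PRIVATE, is_internal_linkage))  # old predicate
    print(symbol_class(Linkage.PRIVATE))                       # new predicate
```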

This matches what the AArch64 target does, since commit
3406934e4db4bf95c230db072608ed062c13ad5b.

This activates a preexisting fix for the AArch64 target from
1e7f592a890aad860605cf5220530b3744e107ba, for the ARM target as well.

When a relocation points at a symbol, one usually can convey an offset
to the symbol by encoding it as an immediate in the instruction.
However, for the ARM and AArch64 branch instructions, the immediate
stored in the instruction is ignored by MS link.exe (and lld-link
matches this aspect). (It would be simple to extend lld-link to support
it - but such object files would be incompatible with MS link.exe.)

This was worked around by 1e7f592a890aad860605cf5220530b3744e107ba by
emitting symbols into the object file symbol table, for temporary
symbols that otherwise would have been omitted, if they have the class
IMAGE_SYM_CLASS_STATIC, in order to avoid needing an offset in the
relocated instruction.

This change gives the symbols generated from functions with the IR level
"private" linkage the right class, to activate that workaround.

This fixes https://github.com/llvm/llvm-project/issues/100101, fixing
code generation for coroutines for Windows on ARM. After the change in
f78688134026686288a8d310b493d9327753a022, coroutines generate a function
with private linkage, and calls to this function were previously broken
for this target.
2024-08-04 23:20:45 +03:00
Sergei Barannikov
411d31ad69
Partially revert 92e18ffd803365c64910760ba20278f875d93681 (#101673)
It is likely to cause stage2 build failures:

https://lab.llvm.org/buildbot/#/builders/122/builds/389
https://lab.llvm.org/buildbot/#/builders/79/builds/552

I don't have an ARM machine to investigate, so I'm just reverting ARM
changes to see if it helps make the bots green again.
2024-08-02 16:38:31 +03:00
Sergei Barannikov
92e18ffd80
[SDag][ARM][RISCV] Allow lowering CTPOP into a libcall (#99752)
The main change is adding CTPOP to `RuntimeLibcalls.def` to allow
targets to use LibCall action for CTPOP. DAG legalizers are changed
accordingly.
2024-08-02 12:29:39 +03:00
Alexis Engelke
b5fc083dc3
[CodeGen] Merge lowerConstantIntrinsics into pre-isel lowering (#97727)
Currently, the LowerConstantIntrinsics pass does an RPO traversal of
every function... only to find that many functions don't have constant
intrinsics (is.constant, objectsize). In the CodeGen pipeline, there is
already a pre-isel intrinsic lowering pass, which iterates over
intrinsic declarations and lowers all users. Call
lowerConstantIntrinsics from this pass to avoid the extra iteration over
the entire IR and the RPO traversal.
2024-08-01 17:44:32 +02:00
Simon Pilgrim
1b4be6a474 [ARM] Regenerate vselect_imax.ll 2024-07-29 15:53:42 +01:00
John Brawn
f0bd705c9b
[CodeGen] Restore MachineBlockPlacement block ordering (#99351)
PR #91843 changed the algorithm used to find the next unplaced block so
that it iterates through the blocks in BlockFilter instead of iterating
through the blocks in the function and checking if they are in the block
filter. Unfortunately this sometimes results in a different block
ordering being chosen, as the order of blocks in BlockFilter comes from
the order in MachineLoopInfo, and in some cases this differs from the
order they are in the function. This can also give an end result that
has worse performance.

Fix this by making collectLoopBlockSet place blocks in its output in the
order that they are in the function.
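
A minimal sketch of the idea behind the fix, assuming blocks are identified by name: rather than returning the loop's blocks in whatever order the loop information stores them, filter the function's own block list. The Python function below is a hypothetical stand-in and does not reproduce the real `collectLoopBlockSet`.

```python
def collect_loop_block_set(function_blocks, loop_blocks):
    """Return the loop's blocks, ordered as they appear in the function."""
    members = set(loop_blocks)
    return [block for block in function_blocks if block in members]


if __name__ == "__main__":
    function_order = ["entry", "header", "body", "latch", "exit"]
    loop_info_order = ["header", "latch", "body"]  # order differs from the function
    print(collect_loop_block_set(function_order, loop_info_order))
```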
2024-07-24 10:49:50 +01:00
Simon Pilgrim
5bd38a98d7
[DAG] ComputeNumSignBits - subo_carry(x,x,c) -> bitwidth 'allsignbits' (#99935)
Handle cases where the subo_carry is subtracting the same operand (=zero) - so only the subtraction of the 0/1 carry bit is affecting the result, giving a 0/-1 allsignbits value.

Noticed while improving ABDS/ABDU expansion.
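
A quick way to see the claim is to model the operation as `x - x - carry` on a fixed width and count sign bits of the result. The helpers `sub_carry` and `num_sign_bits` below are illustrative Python functions written for this example, not the SelectionDAG implementation.

```python
BITS = 32
MASK = (1 << BITS) - 1


def sub_carry(x, y, carry_in):
    """Value result of a subtract-with-borrow on BITS-wide operands."""
    return (x - y - carry_in) & MASK


def num_sign_bits(value):
    """Count the leading bits that are equal to the sign bit (at least 1)."""
    sign = (value >> (BITS - 1)) & 1
    count = 0
    for i in range(BITS - 1, -1, -1):
        if (value >> i) & 1 != sign:
            break
        count += 1
    return count


if __name__ == "__main__":
    for carry in (0, 1):
        result = sub_carry(0x1234, 0x1234, carry)
        print(hex(result), num_sign_bits(result))
```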
2024-07-23 11:49:12 +01:00
Volodymyr Vasylkun
e094abde42
[SelectionDAG] Expand [US]CMP using arithmetic on boolean values instead of selects (#98774)
The previous expansion of [US]CMP was done using two selects and two
compares. It produced decent code, but on many platforms it is better to
implement [US]CMP nodes by performing the following operation:

```
[us]cmp(x, y) = (x [us]> y) - (x [us]< y)
```

This patch adds this new expansion, as well as a hook in TargetLowering to allow some targets to still use the select-based approach. AArch64 and SystemZ are currently the only targets to prefer the former approach, but other targets may also start to use it if it provides for better codegen.
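
The two expansions can be compared with a small sketch on 8-bit patterns. `cmp_select` is one possible rendering of the select-based form, written for this example, and all helper names are hypothetical.

```python
BITS = 8


def as_signed(pattern):
    """Interpret an 8-bit pattern as a two's-complement integer."""
    return pattern - (1 << BITS) if pattern >> (BITS - 1) else pattern


def cmp_arith(x, y):
    """Arithmetic on booleans: (x > y) - (x < y)."""
    return int(x > y) - int(x < y)


def cmp_select(x, y):
    """Select-based form: two compares and two selects."""
    return 1 if x > y else (-1 if x < y else 0)


def ucmp(a, b):
    return cmp_arith(a, b)


def scmp(a, b):
    return cmp_arith(as_signed(a), as_signed(b))


if __name__ == "__main__":
    print(ucmp(0xFF, 0x01), scmp(0xFF, 0x01))
```

The same bit pattern `0xFF` compares as greater than 1 when unsigned and as less than 1 when signed, which is the only difference between the two node kinds in this model.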
2024-07-16 20:56:18 +01:00
Florian Hahn
d0d05aec3b
[Darwin] Fix availability of exp10 for watchOS, tvOS, xROS. (#98542)
Update availability information added in 1eb7f055d9a. exp10 is available
on iOS >= 7.0 and macOS >= 10.9. On all other platforms, it is available
on any version. Also drop the x86 check, as the availability only
depends on the OS version, not the target platform.
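
The availability rule stated above can be written down as a small lookup. This is a sketch only: the lowercase OS names and the `exp10_available` helper are assumptions of this example, not identifiers from the LLVM sources.

```python
# Minimum OS versions from the message above; any OS not listed has no minimum.
MIN_VERSION = {"ios": (7, 0), "macos": (10, 9)}


def exp10_available(os_name, version):
    """True if exp10 is available for the given OS and (major, minor) version."""
    return tuple(version) >= MIN_VERSION.get(os_name, (0, 0))


if __name__ == "__main__":
    print(exp10_available("macos", (10, 8)))
    print(exp10_available("watchos", (1, 0)))
```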

PR: https://github.com/llvm/llvm-project/pull/98542
2024-07-11 22:57:34 +01:00
Daniel Kiss
1782810b84 [Clang][ARM][AArch64] Alway emit protection attributes for functions. (#82819)
So far, the branch protection, sign return address, and guarded control stack
attributes have only been emitted as module flags, to indicate that the
functions need to be generated with those features.
The problem is that in an LTO build the module flags are merged with the `min`
rule, which means that if one of the modules is not built with sign return
address then the feature will be turned off for all functions, because the
functions take the branch-protection and sign-return-address features from the
module flags. sign-return-address is a function-level option, so functions
from files that are compiled with -mbranch-protection=pac-ret are expected to
be protected.
The inliner might also inline functions with different sets of flags, as it
doesn't consider the module flags.

This patch adds the attributes to all functions and drops the checking of the
module flags for code generation.
The module flag is still used for generating the ELF markers.
It also drops the "true"/"false" values from the
branch-protection-enforcement, branch-protection-pauth-lr, and
guarded-control-stack attributes, as presence of the attribute means it is on
and absence means it is off, with no other option.

Relanded with test fixes.
2024-07-10 11:32:41 +02:00
Daniel Kiss
4b2daeccc7
Revert "[Clang][ARM][AArch64] Alway emit protection attributes for functions." (#98284)
Reverts llvm/llvm-project#82819
2024-07-10 10:22:38 +02:00
Daniel Kiss
e15d67cfc2
[Clang][ARM][AArch64] Alway emit protection attributes for functions. (#82819)
So far, the branch protection, sign return address, and guarded control stack
attributes have only been emitted as module flags, to indicate that the
functions need to be generated with those features.
The problem is that in an LTO build the module flags are merged with the `min`
rule, which means that if one of the modules is not built with sign return
address then the feature will be turned off for all functions, because the
functions take the branch-protection and sign-return-address features from the
module flags. sign-return-address is a function-level option, so functions
from files that are compiled with -mbranch-protection=pac-ret are expected to
be protected.
The inliner might also inline functions with different sets of flags, as it
doesn't consider the module flags.

This patch adds the attributes to all functions and drops the checking of the
module flags for code generation.
The module flag is still used for generating the ELF markers.
It also drops the "true"/"false" values from the
branch-protection-enforcement, branch-protection-pauth-lr, and
guarded-control-stack attributes, as presence of the attribute means it is on
and absence means it is off, with no other option.
2024-07-10 10:06:14 +02:00
Manish Kausik H
69192e0193
[LegalizeDAG] Optimize CodeGen for ISD::CTLZ_ZERO_UNDEF (#83039)
Previously we had the same instructions being generated for `ISD::CTLZ` and `ISD::CTLZ_ZERO_UNDEF` which did not take advantage of the fact that zero is an invalid input for `ISD::CTLZ_ZERO_UNDEF`. This commit separates codegen for the two cases to allow for the optimization for the latter case.

The details of the optimization are outlined in #82075

Fixes #82075

Co-authored-by: Manish Kausik H <hmamishkausik@gmail.com>
2024-07-08 14:01:32 +01:00
hstk30-hw
ef465bf8b1
[ARM] Fix arm32be softfp mode miscompilation for neon sdiv (#97883)
Related issue: https://github.com/llvm/llvm-project/issues/97782
2024-07-08 14:18:38 +08:00
Yingwei Zheng
4997af98a0
[SimplifyCFG] Simplify nested branches (#97067)
This patch folds the following pattern (I don't know what to call this):
```
bb0:
  br i1 %cond1, label %bb1, label %bb2
bb1:
  br i1 %cond2, label %bb3, label %bb4
bb2:
  br i1 %cond2, label %bb4, label %bb3
bb3:
  ...
bb4:
  ...
```
into
```
bb0:
  %cond = xor i1 %cond1, %cond2
  br i1 %cond, label %bb4, label %bb3
bb3:
  ...
bb4:
  ...
```

Alive2: https://alive2.llvm.org/ce/z/5iOJEL
Closes https://github.com/llvm/llvm-project/issues/97022.
Closes https://github.com/llvm/llvm-project/issues/83417.
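
The equivalence of the two control-flow shapes above can be checked with a four-row truth table. The functions `original` and `folded` are a hand-written Python model of the IR snippets, returning the name of the block that is reached.

```python
from itertools import product


def original(cond1, cond2):
    """Nested form: bb0 branches on cond1, then bb1/bb2 branch on cond2."""
    if cond1:                                # bb1
        return "bb3" if cond2 else "bb4"
    return "bb4" if cond2 else "bb3"         # bb2


def folded(cond1, cond2):
    """Folded form: a single branch on cond1 xor cond2."""
    cond = cond1 ^ cond2
    return "bb4" if cond else "bb3"


if __name__ == "__main__":
    for c1, c2 in product([False, True], repeat=2):
        print(c1, c2, original(c1, c2), folded(c1, c2))
```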

I found this pattern in some verilator-generated code, which is widely
used in RTL simulation. This fold reduces branches and improves the
performance of the CPU frontend. To my surprise, this pattern is also common
in C/C++ code bases.
Affected libraries/applications:
cmake/cvc5/freetype/git/gromacs/jq/linux/openblas/openmpi/openssl/php/postgres/ruby/sqlite/wireshark/z3/...
2024-07-01 03:35:39 +08:00
Nikita Popov
00ae6bb6c2 [ARM] Regenerate MIR test (NFC) 2024-06-26 15:40:10 +02:00
Serge Pavlov
4c9b71dd91
[GlobalISel][ARM] Legalze set_fpmode and get_fpmode (#96467)
Implement handling of get/set floating point control modes for ARM in
Global Instruction Selector.
2024-06-26 19:41:44 +07:00
Eli Friedman
39a0aa5876
[SelectionDAG] Lower llvm.ldexp.f32 to ldexp() on Windows. (#95301)
This reduces codesize. As discussed in #92707.
2024-06-25 10:25:48 -07:00
Lucas Duarte Prates
78ff617d3f
[ARM] CMSE security mitigation on function arguments and returned values (#89944)
The ABI mandates two things related to function calls:
 - Function arguments must be sign- or zero-extended to the register
   size by the caller.
 - Return values must be sign- or zero-extended to the register size by
   the callee.

As a consequence, callees can assume that function arguments have been
extended, and callers can assume the same of return values.

Here lies the problem: Nonsecure code might deliberately ignore this
mandate with the intent of attempting an exploit. It might try to pass
values that lie outside the expected type's value range in order to
trigger undefined behaviour, e.g. out of bounds access.

With the mitigation implemented, Secure code always performs extension
of values passed by Nonsecure code.
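
As a rough illustration of what re-extending a value means, the sketch below models a 32-bit register carrying a narrower argument. The helper names and the register model are assumptions of this example; it does not show the actual code sequence the backend emits.

```python
REG_BITS = 32
REG_MASK = (1 << REG_BITS) - 1


def zero_extend(reg, bits):
    """Keep only the low `bits` bits, as for an unsigned narrow type."""
    return reg & ((1 << bits) - 1)


def sign_extend(reg, bits):
    """Keep the low `bits` bits and replicate the sign bit, as for a signed narrow type."""
    value = reg & ((1 << bits) - 1)
    return value - (1 << bits) if value >> (bits - 1) else value


if __name__ == "__main__":
    # A caller that ignores the ABI leaves junk above bit 7 of an 8-bit index.
    hostile = 0x00000103 & REG_MASK
    table = [10, 20, 30, 40]
    index = zero_extend(hostile, 8)   # re-extended value stays within the 8-bit range
    print(index, table[index])
```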

This addresses the vulnerability described in CVE-2024-0151.

Patches by Victor Campos.

---------

Co-authored-by: Victor Campos <victor.campos@arm.com>
2024-06-20 10:22:01 +01:00
Farzon Lotfi
7ad12a7c04
[ARM] Add tan intrinsic lowering (#95439)
- `ARMISelLowering.cpp` - Add f16 type and neon and mve vector support
for tan
2024-06-14 10:35:50 -04:00
Pierre van Houtryve
ab0d01a5f0
[MC] Cache MCRegAliasIterator (#93510)
AMDGPU has a lot of registers, almost 9000. Many of those registers have
aliases. For instance, SGPR0 has a ton of aliases due to the presence of
register tuples. It's even worse if you query the aliases of a register
tuple itself. A large register tuple can have hundreds of aliases
because it may include 16 registers, and each of those registers has
its own tuples as well.

The current implementation of MCRegAliasIterator is not good at this. In
some extreme cases it can iterate 7000 more times than
necessary, just giving duplicates over and over again and using a lot of
expensive iterators.

This patch implements a cache system for MCRegAliasIterator. It does the
expensive part only once and then saves it for us so the next iterations
on that register's aliases are just a map lookup.

Furthermore, the cached data is uniqued (and sorted). Thus, this speeds
up code by both speeding up the iterator itself, but also by minimizing
the number of loop iterations users of the iterator do.
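
The caching scheme described above can be sketched on a toy register file. Everything here is hypothetical: the register names, the `REG_UNITS` table, and the `AliasCache` class are invented for illustration and do not mirror the real MC data structures.

```python
# Toy register file: each register maps to the set of register units it covers.
REG_UNITS = {
    "S0": {0},
    "S1": {1},
    "S2": {2},
    "S0_S1": {0, 1},
    "S1_S2": {1, 2},
}


class AliasCache:
    def __init__(self, reg_units):
        self.reg_units = reg_units
        self._cache = {}
        self.slow_walks = 0

    def _walk(self, reg):
        """Naive walk over units; may yield the same alias more than once."""
        self.slow_walks += 1
        for unit in self.reg_units[reg]:
            for other, units in self.reg_units.items():
                if unit in units:
                    yield other

    def aliases(self, reg):
        """Do the expensive walk once, then serve a sorted, uniqued tuple."""
        if reg not in self._cache:
            self._cache[reg] = tuple(sorted(set(self._walk(reg))))
        return self._cache[reg]


if __name__ == "__main__":
    cache = AliasCache(REG_UNITS)
    print(cache.aliases("S0_S1"))
    print(cache.aliases("S0_S1"), cache.slow_walks)
```

In this toy model a register counts as its own alias; whether that matches a given use of the real iterator is not something the sketch tries to settle.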
2024-06-14 11:20:45 +02:00
David Green
706e197540
[CodeGen] Remove target SubRegLiveness flags (#95437)
This removes the uses of target flags to disable subreg liveness,
relying on the `-enable-subreg-liveness` flag instead. The
`-enable-subreg-liveness` flag has been changed to take precedence over
the subtarget if set, and one use of `Subtarget->enableSubRegLiveness()`
has been changed to `MRI->subRegLivenessEnabled()` to make sure the
option properly applies.
2024-06-14 08:51:56 +01:00
Nikita Popov
db08b0999d
[ARM][AArch64] Bail out if CandidatesWithoutStackFixups is empty (#95410)
The following code assumes that RepeatedSequenceLocs is non-empty. Bail
out if there are fewer than 2 candidates left, as no outlining is
possible in that case. The same check is already present in all the
other places where elements from RepeatedSequenceLocs may be dropped.

This fixes the issue reported at:
https://github.com/llvm/llvm-project/pull/93965#issuecomment-2151989716
2024-06-14 09:29:21 +02:00
Paul T Robinson
32add2435f
Fix test to have correct requirements (#95106) 2024-06-11 06:04:09 -07:00
Paul T Robinson
3f88311124
[Driver] Rearrange some Apple version testing (#94514)
There were four tests in Driver that actually tested bits of Driver and
bits of CodeGen, and therefore had target restrictions. Rework those
four tests into one Driver test (with no target restrictions) and two
target-specific CodeGen tests.
2024-06-11 07:51:21 -04:00