1499 Commits

Author SHA1 Message Date
Simon Pilgrim
13d04fa560 [DAG] Add legalization handling for ABDS/ABDU (#92576) (REAPPLIED)
Always match ABD patterns pre-legalization, and use TargetLowering::expandABD to expand again during legalization.

abdu(lhs, rhs) -> sub(xor(sub(lhs, rhs), usub_overflow(lhs, rhs)), usub_overflow(lhs, rhs))
Alive2: https://alive2.llvm.org/ce/z/dVdMyv

REAPPLIED: Fix regression issue with "abs(ext(x) - ext(y)) -> zext(abd(x, y))" fold failing after type legalization
2024-08-08 11:39:05 +01:00
cceerczw
6f8e8faa12
[TargetLowering] Fix the problem of emulated-TLS implementation witho… (#101490)
For a __thread variable x, when emulated TLS is enabled and there is an
access to x, the compiler first looks up the symbol __emutls_v.x within
the module. However, the issue arises with an alias y of x, the compiler
still tries to look up __emutls_v.y instead of __emutls_v.x. As a
result, the lookup returns a nullptr, causing the compiler to crash. The
purpose of this MR (Merge Request) is to ensure that in emulated TLS,
before checking __emutls_v.y, the compiler first identifies which global
value y is an alias of.
2024-08-07 21:56:48 +04:00
Simon Pilgrim
e4e96b3e26 Revert b1234ddbe2652aa7948242a57107ca7ab12fd2f8. "[DAG] Add legalization handling for ABDS/ABDU (#92576)"
Reverting #92576 while we identify a reported regression
2024-08-07 17:11:25 +01:00
Simon Pilgrim
b1234ddbe2
[DAG] Add legalization handling for ABDS/ABDU (#92576)
Always match ABD patterns pre-legalization, and use TargetLowering::expandABD to expand again during legalization.

abdu(lhs, rhs) -> sub(xor(sub(lhs, rhs), usub_overflow(lhs, rhs)), usub_overflow(lhs, rhs))
Alive2: https://alive2.llvm.org/ce/z/dVdMyv
2024-08-06 10:18:06 +01:00
Sergei Barannikov
4527fba9ad
Revert "[SDag][ARM][RISCV] Allow lowering CTPOP into a libcall" (#101740)
Reverts the rest of llvm/llvm-project#99752
2024-08-03 01:51:26 +03:00
Fangrui Song
0b92e70dfb Revert "[AMDGPU] Always lower s/udiv64 by constant to MUL (#100723)"
This reverts commit 92fbc963a51683d32f70d0c7f3783bb13983f08d.

The patch also affected ARM and caused an assertion failure during
CurDAG->Legalize
(https://github.com/llvm/llvm-project/pull/100723#issuecomment-2266154211).
2024-08-02 14:43:36 -07:00
Pierre van Houtryve
92fbc963a5
[AMDGPU] Always lower s/udiv64 by constant to MUL (#100723)
Solves #100383
2024-08-02 12:22:42 +02:00
Sergei Barannikov
92e18ffd80
[SDag][ARM][RISCV] Allow lowering CTPOP into a libcall (#99752)
The main change is adding CTPOP to `RuntimeLibcalls.def` to allow
targets to use LibCall action for CTPOP. DAG legalizers are changed
accordingly.
2024-08-02 12:29:39 +03:00
Julius Alexandre
7231776a02
Recommit "[DAG] Reducing instructions by better legalization handling of AVGFLOORU for illegal data types" (#101223)
Previous reverted merge: https://github.com/llvm/llvm-project/pull/99913

Original message:
**Issue:** https://github.com/rust-lang/rust/issues/124790
**Previous PR:** https://github.com/llvm/llvm-project/pull/99614

https://rust.godbolt.org/z/T7eKP3Tvo

**Aarch64:** https://alive2.llvm.org/ce/z/dqr2Kg
**x86:** https://alive2.llvm.org/ce/z/ze88Hw
2024-07-30 19:00:46 -07:00
Craig Topper
fed94333fd Revert "[DAG] Reducing instructions by better legalization handling of AVGFLOORU for illegal data types (#99913)"
This reverts commit d5521d128494690be66e03a674b9d1181935bf77.

The AArch64 test is failing on the bots.
2024-07-27 18:35:44 -07:00
Julius Alexandre
d5521d1284
[DAG] Reducing instructions by better legalization handling of AVGFLOORU for illegal data types (#99913)
**Issue:** https://github.com/rust-lang/rust/issues/124790
**Previous PR:** https://github.com/llvm/llvm-project/pull/99614

https://rust.godbolt.org/z/T7eKP3Tvo

**Aarch64:** https://alive2.llvm.org/ce/z/dqr2Kg
**x86:** https://alive2.llvm.org/ce/z/ze88Hw

cc: @RKSimon @topperc
2024-07-27 17:33:09 -07:00
Matt Arsenault
361d4cf533
DAG: Lower is.fpclass fcSubnormal|fcZero to fabs(x) < smallest_normal (#100390)
Produces better code on x86_64 only in the unordered case. Not
sure what the exact condition should be to avoid the regression. Free
fabs might do it, or maybe requires legality checks for the alternative
integer expansion.
2024-07-26 22:45:47 +04:00
AtariDreams
871740761f
[CodeGen] Remove checks for vectors in unsigned division prior to computing leading zeros (#99524)
It turns out we can safely use
DAG.computeKnownBits(N0).countMinLeadingZeros() with constant legal
vectors, so remove the check for it.
2024-07-19 12:15:36 +08:00
AtariDreams
a51f343b43
[CodeGen] Emit more efficient magic numbers for exact udivs (#87161)
Have simpler lowering for exact udivs in both SelectionDAG and
GlobalISel.

The algorithm is the same between unsigned exact divs and signed divs
save for arithmetic vs logical shift for even divisors, according to
Hacker's Delight, 2nd Edition, page 242.
2024-07-17 12:19:02 -07:00
Lawrence Benson
177ce1900f
[LLVM] Add llvm.experimental.vector.compress intrinsic (#92289)
This PR adds a new vector intrinsic `@llvm.experimental.vector.compress`
to "compress" data within a vector based on a selection mask, i.e., it
moves all selected values (i.e., where `mask[i] == 1`) to consecutive
lanes in the result vector. A `passthru` vector can be provided, from
which remaining lanes are filled.

The main reason for this is that the existing
`@llvm.masked.compressstore` has very strong constraints in that it can
only write values that were selected, resulting in guard branches for
all targets except AVX-512 (and even there the AMD implementation is
_very_ slow). More instruction sets support "compress" logic, but only
within registers. So to store the values, an additional store is needed.
But this combination is likely significantly faster on many target as it
avoids branches.

In follow up PRs, my plan is to add target-specific lowerings for x86,
SVE, and possibly RISCV. I also want to combine this with a store
instruction, as this is probably a common case and we can avoid some
memory writes in that case.

See [discussion in
forum](https://discourse.llvm.org/t/new-intrinsic-for-masked-vector-compress-without-store/78663)
for initial discussion on the design.
2024-07-17 14:24:24 +02:00
Volodymyr Vasylkun
e094abde42
[SelectionDAG] Expand [US]CMP using arithmetic on boolean values instead of selects (#98774)
The previous expansion of [US]CMP was done using two selects and two
compares. It produced decent code, but on many platforms it is better to
implement [US]CMP nodes by performing the following operation:

  ```
[us]cmp(x, y) = (x [us]> y) - (x [us]< y)
```

This patch adds this new expansion, as well as a hook in TargetLowering to allow some targets to still use the select-based approach. AArch64 and SystemZ are currently the only targets to prefer the former approach, but other targets may also start to use it if it provides for better codegen.
2024-07-16 20:56:18 +01:00
Froster
c8dc21d77f
[SelectionDAG][RISCV] Fix break of vnsrl pattern in issue #94265 (#95563)
Added a RISCV overload of `isTruncateFree` to fix the break of vnsrl described in issue #94265.

Fixes #94265
2024-07-14 12:09:37 +01:00
Dmitry Borisenkov
a38d5e0632
[SelectionDAG] Use LAST_INTEGER_VALUETYPE instead of i64 (#98299)
When looking for a largest legal integer type for a target
`TargetLowering::findOptimalMemOpLowering` assumes that `MVT::i64` is
the largets possible integer type. The patch removes this assumption and
uses `MVT::LAST_INTEGER_VALUETYPE` instead.
2024-07-10 21:38:50 +04:00
Craig Topper
8419da8bd4
[SelectionDAG] Remove LegalTypes argument from getShiftAmountConstant. (#97653)
#97645 proposed to remove LegalTypes from getShiftAmountTy. This patches
removes it from getShiftAmountConstant which is one of the callers of
getShiftAmountTy.
2024-07-04 18:33:25 -07:00
Craig Topper
3141c11fe8
[SelectionDAG] Remove LegalTypes argument from getShiftAmountTy. NFC (#97757)
This argument is no longer used inside the function. Remove it from the
interface.
2024-07-04 15:24:54 -07:00
Simon Pilgrim
92715cf43b
[DAG] expandAVG - attempt to extend to a wider integer type for the add/shift to avoid overflow handling (#95788) 2024-06-26 13:33:09 +01:00
Nikita Popov
f2f18459d4 Revert "Intrinsic: introduce minimumnum and maximumnum (#93841)"
As far as I can tell, this pull request was not approved, and
did not go through an RFC on discourse.

This reverts commit 89881480030f48f83af668175b70a9798edca2fb.
This reverts commit 225d8fc8eb24fb797154c1ef6dcbe5ba033142da.
2024-06-21 08:34:04 +02:00
YunQiang Su
8988148003
Intrinsic: introduce minimumnum and maximumnum (#93841)
Currently, on different platform, the behaivor of llvm.minnum is
different if one operand is sNaN:

When we compare sNaN vs NUM:

ARM/AArch64/PowerPC: follow the IEEE754-2008's minNUM: return qNaN.
RISC-V/Hexagon follow the IEEE754-2019's minimumNumber: return NUM. X86:
Returns NUM but not same with IEEE754-2019's minimumNumber as
     +0.0 is not always greater than -0.0.
MIPS/LoongArch/Generic: return NUM.
LIBCALL: returns qNaN.

So, let's introduce llvm.minmumnum/llvm.maximumnum, which always follow
IEEE754-2019's minimumNumber/maximumNumber.

Half-fix: #93033
2024-06-21 11:53:08 +08:00
Poseydon42
995835fe6d
[SelectionDAG] Add support for the 3-way comparison intrinsics [US]CMP (#91871)
This PR adds initial support for the `scmp`/`ucmp` 3-way comparison
intrinsics in the SelectionDAG. Some of the expansions/lowerings
are not optimal yet.
2024-06-17 11:16:52 +02:00
Simon Pilgrim
76c5158aed [DAG] combineShiftToAVG - don't create avgfloor with scalar constant operands unless legal.
Converting to avgfloor and then expanding it back to shift+add later is likely to prevent other folds (re-association and value-tracking in particular) in the meantime.

Fixes #95284
2024-06-13 12:37:43 +01:00
Simon Pilgrim
ca33796d54 [DAG] combineShiftToAVG - only create new types before LegalTypes
Fixes #95271
2024-06-12 18:49:49 +01:00
Simon Pilgrim
760ad23e48 [DAG] combineShiftToAVG - ensure the reduced demanded value type is smaller than the original.
Now we have promotion support we should be able to remove the next-power-of-2 code entirely, but this is good enough for now.
2024-06-12 18:13:30 +01:00
Simon Pilgrim
ea2ee5dc2f
[DAG] Add legalization handling for AVGCEIL/AVGFLOOR nodes (#92096)
Always match AVG patterns pre-legalization, and use TargetLowering::expandAVG to expand again during legalization.

I've removed the X86 custom AVGCEILU pattern detection and replaced with combines to try and convert other AVG nodes to AVGCEILU.
2024-06-12 14:11:07 +01:00
c8ef
b25b1db819
[KnownBits] Remove hasConflict() assertions (#94568)
Allow KnownBits to represent "always poison" values via conflict.

close: #94436
2024-06-07 17:01:22 +02:00
Matt Arsenault
212b78aad4
DAG: Improve fminimum/fmaximum vector expansion logic (#93579)
First, expandFMINIMUM_FMAXIMUM should be a never-fail API. The client
wanted it expanded, and it can always be expanded. This logic was tied
up with what the VectorLegalizer wanted.
    
Prefer using the min/max opcodes, and unrolling if we don't have a
vselect.
This seems to produce better code in all the changed tests.
2024-06-06 19:01:39 +02:00
Simon Pilgrim
8725b67207 [DAG] expandABS - add missing FREEZE in abs(x) -> smax(x,sub(0,x)) expansion
Noticed while working on #94601
2024-06-06 13:26:17 +01:00
Simon Pilgrim
2b1dfd2b35
[DAG] Replace getValid*ShiftAmountConstant helpers with getValid*ShiftAmount helpers to support KnownBits analysis (#93182)
The getValidShiftAmountConstant/getValidMinimumShiftAmountConstant/getValidMaximumShiftAmountConstant helpers only worked with constant shift amounts, which could be problematic after type legalization (e.g. v2i64 might be partially scalarized or split into v4i32 on some targets such as 32-bit x86, Thumb2 MVE).

This patch proposes we generalize these helpers to work with ConstantRange+KnownBits if a scalar/buildvector constant isn't available.

Most restrictions are the same - the helper fails if any shift amount is out of bounds, getValidShiftConstant must be a specific constant uniform etc.

However, getValidMinimumShiftAmount/getValidMaximumShiftAmount now can return bounds values that aren't values in the actual data, as they are based off the common KnownBits of every vector element.

This addresses feedback on #92096
2024-06-01 16:48:26 +01:00
Matt Arsenault
aef0bdd36d
DAG: Preserve flags when expanding fminimum/fmaximum (#93550)
The operation selection logic here doesn't really work when vector types
need to be split. This was also dropping the flags, and losing nnan made
the combine from select back to fmin/fmax unrecoverable. Preserve the
flags to assist a future commit.
2024-05-29 12:26:27 +02:00
Simon Pilgrim
8a395b00b8 [DAG] Use auto* for cast/dyn_cast results (style). NFC. 2024-05-28 10:03:04 +01:00
Craig Topper
a1c9b9673c
[SelectionDAG][RISCV][VE] Rename VP_ASHR->VP_SRA VP_LSHR->VP_SRL. (#93221)
This maintains consistency with the non-VP ISD opcodes.
2024-05-24 09:03:19 -07:00
Simon Pilgrim
aefd2572a5
[DAG][X86] expandABD - add branchless abds/abdu expansion for 0/-1 comparison result cases (#92780)
If the comparison results are allbits masks, we can expand as `abd(lhs, rhs) -> sub(cmpgt(lhs, rhs), xor(sub(lhs, rhs), cmpgt(lhs, rhs)))`, replacing a sub+sub+select pattern with the simpler sub+xor+sub pattern.

This allows us to remove a lot of X86 specific legalization code, and will be useful in future generic expansion for the legalization work in #92576

Alive2: https://alive2.llvm.org/ce/z/sj863C
2024-05-23 11:07:35 +01:00
Yingwei Zheng
c8dc6b59d6
[SDAG] Improve SimplifyDemandedBits for mul (#90034)
If the RHS is a constant with X trailing zeros, then the X MSBs of the
LHS are not demanded.

Alive2: https://alive2.llvm.org/ce/z/F5CyJW
Fixes https://github.com/llvm/llvm-project/issues/56645.
2024-05-22 22:43:10 +08:00
Yingwei Zheng
cf128305bd
[SDAG] Don't treat ISD::SHL as a uniform binary operator in ShrinkDemandedOp (#92753)
In `TargetLowering::ShrinkDemandedOp`, types of lhs and rhs may differ
before legalization.
In the original case, `VT` is `i64` and `SmallVT` is `i32`, but the type
of rhs is `i8`. Then invalid truncate nodes will be created.

See the description of ISD::SHL for further information:
> After legalization, the type of the shift amount is known to be
TLI.getShiftAmountTy(). Before legalization, the shift amount can be any
type, but care must be taken to ensure it is large enough.


605ae4e93b/llvm/include/llvm/CodeGen/ISDOpcodes.h (L691-L712)

This patch stops handling ISD::SHL in `TargetLowering::ShrinkDemandedOp`
and duplicates the logic in `TargetLowering::SimplifyDemandedBits`.
Additionally, it adds some additional checks like
`isNarrowingProfitable` and `isTypeDesirableForOp` to improve the
codegen on AArch64.

Fixes https://github.com/llvm/llvm-project/issues/92720.
2024-05-22 20:20:33 +08:00
Simon Pilgrim
bbc4c2e047 [DAG] SimplifyDemandedBits - ensure we have simplified the shift operands before folding to AVG
Pulled out of #92096 - ensure we have completed a topological simplification of the SRA/SRL shift operands before we try to combine to a AVG node, as its difficult to later simplify through AVG nodes.
2024-05-22 11:55:03 +01:00
Simon Pilgrim
117d755b1b
[DAG] SimplifyDemandedBits - use ComputeKnownBits instead of getValidShiftAmountConstant to check for constant shift amounts. (#92412)
This allows us to handle cases where the constant has already been type legalized behind a bitcast

Despite calling ComputeKnownBits I'm not seeing any notable change in compile time.
2024-05-16 17:04:30 +01:00
Simon Pilgrim
311339e25c [DAG] SimplifyDemandedBits - ISD::AND - only request DemandedElts when looking for a splat constant
Limit the isConstOrConstSplat call to the vector elements we care about

Noticed while investigating regressions in #92096
2024-05-16 13:05:35 +01:00
PiJoules
19008d3218
[llvm] Support fixed point multiplication on AArch64 (#84237)
Prior to this, fixed point multiplication would lead to this assertion
error on AArhc64, armv8, and armv7.

```
 _Accum f(_Accum x, _Accum y) { return x * y; }

// ./bin/clang++ -ffixed-point /tmp/test2.cc -c -S -o - -target aarch64 -O3
clang++: llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp:10245: void llvm::TargetLowering::forceExpandWideMUL(SelectionDAG &, const SDLoc &, bool, EVT, const SDValue, const SDValue, const SDValue, const SDValue, SDValue &, SDValue &) const: Assertion `Ret.getOpcode() == ISD::MERGE_VALUES && "Ret value is a collection of constituent nodes holding result."' failed.
```

This path into forceExpandWideMUL should only be taken if we don't
support [US]MUL_LOHI or MULH[US] for the operand size (32 in this case).
But we should also check if we can just leverage regular wide
multiplication. That is, extend the operands from 32 to 64, do a regular
64-bit mul, then trunc and shift. These ops are certainly available on
aarch64 but for wider types.
2024-05-14 11:23:45 -07:00
Matt Arsenault
6a8d30b1c1
DAG: Skip 0 sign handling in minimum/maximum lowering for _ieee case (#91326)
dc9664a8adae17f2083fbcc8e96cfce606c56d57 changed the documentation to
assume these order -0 as less than +0.
2024-05-09 14:41:13 +02:00
Jinsong Ji
2dade0041a
[Analysis] Attribute Range should not prevent tail call optimization (#91122)
- Remove Range attr when comparing for tailcall
- Add test for testcall with range
2024-05-07 22:02:10 -04:00
Min-Yih Hsu
539f626ecd
[VP][RISCV] Add vp.cttz.elts intrinsic and its RISC-V codegen (#90502)
This intrinsic is the VP version of `experimental.cttz.elts`.
2024-04-30 09:27:10 -07:00
Qiu Chaofan
4a8f2f2e1a
[Legalizer] Expand fmaximum and fminimum (#67301)
According to langref, llvm.maximum/minimum has -0.0 < +0.0 semantics and
propagates NaN.

Expand the nodes on targets not supporting the operation, by adding
extra check for NaN and using is_fpclass to check zero signs.
2024-04-29 15:09:54 +08:00
fengfeng
36230f90ee
[SelectionDAG] Propagate Disjoint flag. (#88370)
Signed-off-by: feng.feng <feng.feng@iluvatar.com>
2024-04-15 11:01:15 +02:00
Björn Pettersson
33e6b488be
[SelectionDAG] Fix and improve TargetLowering::SimplifySetCC (#87646)
The load narrowing part of TargetLowering::SimplifySetCC is updated
according to this:

1) The offset calculation (for big endian) did not work properly for
   non byte-sized types. This is basically solved by an early exit
   if the memory type isn't byte-sized. But the code is also corrected
   to use the store size when calculating the offset.
2) To still allow some optimizations for non-byte-sized types the
   TargetLowering::isPaddedAtMostSignificantBitsWhenStored hook is
   added. By default it assumes that scalar integer types are padded
   starting at the most significant bits, if the type needs padding
   when being stored to memory.
3) Allow optimizing when isPaddedAtMostSignificantBitsWhenStored is
   true, as that hook makes it possible for TargetLowering to know
   how the non byte-sized value is aligned in memory.
4) Update the algorithm to always search for a narrowed load with
   a power-of-2 byte-sized type. In the past the algorithm started
   with the the width of the original load, and then divided it by
   two for each iteration. But for a type such as i48 that would
   just end up trying to narrow the load into a i24 or i12 load,
   and then we would fail sooner or later due to not finding a
   newVT that fulfilled newVT.isRound().
   With this new approach we can narrow the i48 load into either
   an i8, i16 or i32 load. By checking if such a load is allowed
(e.g. alignment wise) for any "multiple of 8 offset", then we can find
   more opportunities for the optimization to trigger. So even for a
   byte-sized type such as i32 we may now end up narrowing the load
   into loading the 16 bits starting at offset 8 (if that is allowed
   by the target). The old algorithm did not even consider that case.
5) Also start using getObjectPtrOffset instead of getMemBasePlusOffset
   when creating the new ptr. This way we get "nsw" on the add.
2024-04-12 16:18:12 +02:00
Jay Foad
1b761205f2
[APInt] Add a simpler overload of multiplicativeInverse (#87610)
The current APInt::multiplicativeInverse takes a modulus which can be
any value, but all in-tree callers use a power of two. Moreover, most
callers want to use two to the power of the width of an existing APInt,
which is awkward because 2^N is not representable as an N-bit APInt.

Add a new overload of multiplicativeInverse which implicitly uses
2^BitWidth as the modulus.
2024-04-04 16:11:06 +01:00
Simon Pilgrim
2d0087424f
[DAG] Remove extract_vector_elt(freeze(x)), idx -> freeze(extract_vector_elt(x), idx) fold (#87480)
Reverse the fold with handling inside canCreateUndefOrPoison for cases where we know that the extract index is in bounds.

This exposed a number or regressions, and required some initial freeze handling of SCALAR_TO_VECTOR, which will require us to properly improve demandedelts support to handle its undef upper elements.

There is still one outstanding regression to be addressed in the future - how do we want to handle folds involving frozen loads?

Fixes #86968
2024-04-04 11:10:55 +01:00