1573 Commits

Author SHA1 Message Date
Paul Walker
02dd6b1014
[LLVM][CodeGen] Add lowering for scalable vector bfloat operations. (#109803)
Specifically:
  fabs, fadd, fceil, fdiv, ffloor, fma, fmax, fmaxnm, fmin, fminnm,
  fmul, fnearbyint, fneg, frint, fround, froundeven, fsub, fsqrt &
  ftrunc
2024-10-07 13:01:59 +01:00
Matt Arsenault
5883ad34d6
DAG: Handle vector legalization of minimumnum/maximumnum (#109779)
Follow the same patterns as the other min/max variants.
2024-09-30 13:43:35 +04:00
Jonas Paulsson
14120227a3
Target ABI: improve call parameters extensions handling (#100757)
For the purpose of verifying proper argument extensions per the target's ABI,
introduce the NoExt attribute that may be used by a target when neither sign-
nor zero-extension is required (e.g. with a struct in register). The purpose of
doing so is to be able to verify that one of these attributes is always
present, thereby detecting cases where sign/zero extension is actually
missing.

As a first step, this patch has the verification step done for the SystemZ
backend only, but left off by default until all known issues have been
addressed.

Other targets/front-ends can now also add the NoExt attribute where needed and
do this check in the backend.
2024-09-19 16:59:31 +02:00
Pierre van Houtryve
758444ca3e
[AMDGPU] Promote uniform ops to I32 in DAGISel (#106383)
Promote uniform binops, selects and setcc between 2 and 16 bits to 32
bits in DAGISel

Solves #64591
2024-09-19 09:00:21 +02:00
David Green
960c975acd
[AArch64] Expand scmp/ucmp vector operations with sub (#108830)
Unlike scalar, where AArch64 prefers expanding scmp/ucmp with select,
under Neon we can use the arithmetic expansion to generate fewer
instructions. Notably it also prevents the scalarization of vselect
during vector-legalization.
2024-09-16 18:44:52 +01:00
Lawrence Benson
b74e779219
[x86] Add lowering for @llvm.experimental.vector.compress (#104904)
This is a follow-up to #92289 that adds lowering of the new
`@llvm.experimental.vector.compress` intrinsic on x86 with AVX512
instructions. This intrinsic maps directly to `vpcompress`.
2024-09-13 21:48:01 +02:00
YunQiang Su
5773adb0bf
SelectionDAG: Remove unneeded getSelectCC in expandFMINIMUMNUM_FMAXIMUMNUM (#107416)
ISD::FCANONICALIZE is enough, which can process NaN or non-NaN
correctly, thus getSelectCC is not needed here.
2024-09-11 09:53:04 +08:00
Simon Pilgrim
7e07c1df67 [DAG] expandAVG - consistently use getShiftAmountConstant for constant shift amounts. NFC 2024-09-10 09:25:58 +01:00
Matt Arsenault
77f1b481b8
DAG: Lower single infinity is.fpclass tests to fcmp (#100380)
InstCombine also should have taken care of this, but this
should be helpful when the fcmp based lowering strategy tries
to combine multiple tests.
2024-09-06 09:15:18 +04:00
Matt Arsenault
fc3e6a8186
DAG: Handle lowering unordered compare with inf (#100378)
Try to take advantage of the nan check behavior of fcmp.
x86_64 looks better, x86_32 looks worse.
2024-09-05 19:54:32 +04:00
Dávid Ferenc Szabó
e9eaf19eb6
[CodeGen] Allow mixed scalar type constraints for inline asm (#65465)
GCC supports code like "asm volatile ("" : "=r" (i) : "0" (f))" where i
is integer type and f is floating point type. Currently this code
produces an error with Clang. The change allows mixed scalar types
between input and output constraints.

Co-authored-by: Matt Arsenault <Matthew.Arsenault@amd.com>
2024-08-29 22:53:28 +04:00
David Majnemer
ea1f05e618 [SelectionDAG] Fix lowering of IEEE 754 2019 minimum/maximum
We used integer comparisons instead of floating point comparisons
resulting in very odd behavior.
2024-08-20 05:09:49 +00:00
Craig Topper
f7d94b783f [SelectionDAG] Use getAllOnesConstant. 2024-08-17 17:57:05 -07:00
Craig Topper
067f2e9f18 [SelectionDAG] Use getSignedConstant/getAllOnesConstant. 2024-08-17 00:04:01 -07:00
Craig Topper
7afb51e035
[SelectionDAG][X86] Add SelectionDAG::getSignedConstant and use it in a few places. (#104555)
PR #80309 proposes to have users of APInt's uint64_t
constructor opt-in to implicit truncation. Currently, that patch
requires SelectionDAG::getConstant to opt-in.

This patch adds getSignedConstant so we can start fixing some of the
cases that require implicit truncation.
2024-08-16 09:21:11 -07:00
Craig Topper
3dea42f3e5
[TargetLowering] Don't call SelectionDAG::getTargetLoweringInfo() from TargetLowering methods. NFC (#104197)
If we are inside a TargetLowering method,
`SelectionDAG::getTargetLoweringInfo()` should be the same as `this`.
2024-08-15 12:33:12 -07:00
YunQiang Su
fb9e685fc4
Intrinsic: introduce minimumnum and maximumnum for IR and SelectionDAG (#96649)
C23 introduced new functions fminimum_num and fmaximum_num, and they
follow the minimumNumber and maximumNumber of IEEE754-2019. Let's
introduce new intrinsics to support them.
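
The minimumNumber behaviour described above can be illustrated with a small scalar sketch. This is not the LLVM lowering; `minimum_number` is a hypothetical helper name, and the sketch ignores the signaling/quieting distinction and floating-point exception flags. It only shows the two properties the message relies on: a NaN operand loses to a number, and -0.0 orders below +0.0.

```cpp
#include <cmath>

// Hypothetical reference model of IEEE754-2019 minimumNumber for doubles.
// Assumption: NaN payloads, sNaN quieting and exception flags are not modelled.
double minimum_number(double a, double b) {
  if (std::isnan(a)) return b;  // NaN vs NUM gives NUM; NaN vs NaN stays NaN
  if (std::isnan(b)) return a;
  if (a == 0.0 && b == 0.0)     // equal under ==, so order zeros by sign bit
    return std::signbit(a) ? a : b;
  return a < b ? a : b;
}
```

With this model, `minimum_number(NAN, 1.0)` yields `1.0`, whereas option 1 in the background list further down (the libc fallback) may return qNaN for an sNaN operand.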

This patch introduces support for scalar values only. Support for
  vector (vp, vp.reduce, vector.reduce),
  experimental.constrained
will be added in future patches.

With this patch, MIPSr6 and LoongArch can work out of the box with
fcanonical and fmax/fmin.

AArch64/PowerPC64 can use the same logic as MIPSr6 and LoongArch, but
they have no fcanonical support yet.
I will add it in future patches.

The RISC-V FMIN/FMAX instructions follow the
minimumNumber/maximumNumber of IEEE754-2019. We can add that in a
future patch.

Background

https://discourse.llvm.org/t/rfc-fix-llvm-min-f-and-llvm-max-f-intrinsics/79735
Currently we have fminnum/fmaxnum, which behave differently on
different platforms for NUM vs sNaN:
   1) Fallback to fmin(3)/fmax(3): return qNaN.
   2) ARM64/ARM32+Neon: same as libc.
   3) MIPSr6/LoongArch/RISC-V: return NUM.

The fix for fminnum/fmaxnum to follow minNUM/maxNUM of IEEE754-2008
will be submitted as separate patches.
2024-08-15 14:09:36 +08:00
Craig Topper
e687a9f2dd [TargetLowering] Remove unnecessary null check. NFC 2024-08-14 12:26:41 -07:00
Craig Topper
abc1acf8df
[TargetLowering][AMDGPU][ARM][RISCV][X86] Teach SimplifyDemandedBits to combine (srl (sra X, C1), ShAmt) -> sra(X, C1+ShAmt) (#101751)
If the upper bits of the shr aren't demanded.

This helps with cases where the outer srl was originally an sra and was
converted to a srl by SimplifyDemandedBits before it had a chance to
combine with the inner sra. This can occur when the inner sra was part
of a sign_extend_inreg expansion.

There are some regressions in ARM and Thumb2.
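
The fold above can be checked on plain 32-bit integers. The sketch below is an illustration only: the helper names are hypothetical, and it assumes `C1 + ShAmt` is smaller than the bit width. It shows that the two forms differ only in the upper `ShAmt` bits, which is why the combine is valid when those bits are not demanded.

```cpp
#include <cassert>
#include <cstdint>

// Arithmetic shift right on a 32-bit pattern, written out explicitly so the
// result does not depend on implementation-defined signed shifts.
uint32_t sra(uint32_t x, unsigned amt) {
  assert(amt < 32 && "shift amount must be smaller than the bit width");
  uint32_t logical = x >> amt;
  uint32_t sign_fill = (x >> 31) ? ~(~uint32_t{0} >> amt) : 0u;
  return logical | sign_fill;
}

// Original pattern: (srl (sra X, C1), ShAmt).
uint32_t srl_of_sra(uint32_t x, unsigned c1, unsigned sh) {
  return sra(x, c1) >> sh;
}

// Folded pattern: sra(X, C1 + ShAmt).
uint32_t folded_sra(uint32_t x, unsigned c1, unsigned sh) {
  return sra(x, c1 + sh);
}

// Mask of the bits that are still demanded: everything except the top sh bits.
uint32_t demanded_mask(unsigned sh) { return ~uint32_t{0} >> sh; }
```

For `x = 0x80000000`, `C1 = 4`, `ShAmt = 8` the original form gives `0x00F80000` and the folded form gives `0xFFF80000`; they agree once masked with `demanded_mask(8)`.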
2024-08-14 08:44:57 -07:00
Craig Topper
51bad732dc [SelectionDAG] Replace EVTToAPFloatSemantics with MVT/EVT::getFltSemantics. (#103001) 2024-08-13 11:35:28 -07:00
Pierre van Houtryve
7389545d0d
Reapply "[AMDGPU] Always lower s/udiv64 by constant to MUL" (#101942)
Reland #100723, fixing the ARM issue at the cost of a small loss of optimization in `test/CodeGen/AMDGPU/fshr.ll`

Solves #100383
2024-08-12 09:00:22 +02:00
Craig Topper
0c783be985 [TargetLowering] Use APInt::isSubsetOf to simplify an expression. NFC 2024-08-09 22:09:40 -07:00
Bjorn Pettersson
bbefd5713f [TargetLowering] Handle vector types in expandFixedPointMul (#102635)
In TargetLowering::expandFixedPointMul when expanding fixed point
multiplication, and when using a widened MUL as strategy for the
lowering, there was a bug resulting in assertion failures like this:
   Assertion `VT.isVector() == N1.getValueType().isVector() &&
   "SIGN_EXTEND result type type should be vector iff the operand "
   "type is vector!"' failed.

The problem was that we did not consider that VT could be a vector type
when setting up the WideVT. This patch should fix that bug.
2024-08-10 00:25:57 +02:00
Kazu Hirata
f4fb735840
[llvm] Construct SmallVector<SDValue> with ArrayRef (NFC) (#102578) 2024-08-09 09:15:42 -07:00
Simon Pilgrim
13d04fa560 [DAG] Add legalization handling for ABDS/ABDU (#92576) (REAPPLIED)
Always match ABD patterns pre-legalization, and use TargetLowering::expandABD to expand again during legalization.

abdu(lhs, rhs) -> sub(xor(sub(lhs, rhs), usub_overflow(lhs, rhs)), usub_overflow(lhs, rhs))
Alive2: https://alive2.llvm.org/ce/z/dVdMyv

REAPPLIED: Fix regression issue with "abs(ext(x) - ext(y)) -> zext(abd(x, y))" fold failing after type legalization
2024-08-08 11:39:05 +01:00
cceerczw
6f8e8faa12
[TargetLowering] Fix the problem of emulated-TLS implementation witho… (#101490)
For a __thread variable x, when emulated TLS is enabled and there is an
access to x, the compiler first looks up the symbol __emutls_v.x within
the module. However, an issue arises with an alias y of x: the compiler
still tries to look up __emutls_v.y instead of __emutls_v.x. As a
result, the lookup returns a nullptr, causing the compiler to crash. The
purpose of this MR (Merge Request) is to ensure that in emulated TLS,
before checking __emutls_v.y, the compiler first identifies which global
value y is an alias of.
2024-08-07 21:56:48 +04:00
Simon Pilgrim
e4e96b3e26 Revert b1234ddbe2652aa7948242a57107ca7ab12fd2f8. "[DAG] Add legalization handling for ABDS/ABDU (#92576)"
Reverting #92576 while we identify a reported regression
2024-08-07 17:11:25 +01:00
Simon Pilgrim
b1234ddbe2
[DAG] Add legalization handling for ABDS/ABDU (#92576)
Always match ABD patterns pre-legalization, and use TargetLowering::expandABD to expand again during legalization.

abdu(lhs, rhs) -> sub(xor(sub(lhs, rhs), usub_overflow(lhs, rhs)), usub_overflow(lhs, rhs))
Alive2: https://alive2.llvm.org/ce/z/dVdMyv
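
As a scalar illustration of the `abdu` identity above, the sketch below compares it against a straightforward reference. The function names are hypothetical, and it assumes the overflow result of `usub_overflow` is materialised as an all-ones or all-zeros mask (a sign-extended boolean); it is not the `TargetLowering::expandABD` code itself.

```cpp
#include <cstdint>

// Reference: |lhs - rhs| computed with an explicit comparison.
uint32_t abdu_reference(uint32_t lhs, uint32_t rhs) {
  return lhs > rhs ? lhs - rhs : rhs - lhs;
}

// Branch-free form following the identity in the commit message.
uint32_t abdu_expanded(uint32_t lhs, uint32_t rhs) {
  uint32_t diff = lhs - rhs;                     // wraps when lhs < rhs
  uint32_t mask = lhs < rhs ? ~uint32_t{0} : 0u; // overflow bit as a mask
  return (diff ^ mask) - mask;                   // negates diff if it wrapped
}
```

When the subtraction wraps, `(diff ^ mask) - mask` equals `~diff + 1`, i.e. the two's complement negation, which recovers `rhs - lhs`.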
2024-08-06 10:18:06 +01:00
Sergei Barannikov
4527fba9ad
Revert "[SDag][ARM][RISCV] Allow lowering CTPOP into a libcall" (#101740)
Reverts the rest of llvm/llvm-project#99752
2024-08-03 01:51:26 +03:00
Fangrui Song
0b92e70dfb Revert "[AMDGPU] Always lower s/udiv64 by constant to MUL (#100723)"
This reverts commit 92fbc963a51683d32f70d0c7f3783bb13983f08d.

The patch also affected ARM and caused an assertion failure during
CurDAG->Legalize
(https://github.com/llvm/llvm-project/pull/100723#issuecomment-2266154211).
2024-08-02 14:43:36 -07:00
Pierre van Houtryve
92fbc963a5
[AMDGPU] Always lower s/udiv64 by constant to MUL (#100723)
Solves #100383
2024-08-02 12:22:42 +02:00
Sergei Barannikov
92e18ffd80
[SDag][ARM][RISCV] Allow lowering CTPOP into a libcall (#99752)
The main change is adding CTPOP to `RuntimeLibcalls.def` to allow
targets to use LibCall action for CTPOP. DAG legalizers are changed
accordingly.
2024-08-02 12:29:39 +03:00
Julius Alexandre
7231776a02
Recommit "[DAG] Reducing instructions by better legalization handling of AVGFLOORU for illegal data types" (#101223)
Previous reverted merge: https://github.com/llvm/llvm-project/pull/99913

Original message:
**Issue:** https://github.com/rust-lang/rust/issues/124790
**Previous PR:** https://github.com/llvm/llvm-project/pull/99614

https://rust.godbolt.org/z/T7eKP3Tvo

**Aarch64:** https://alive2.llvm.org/ce/z/dqr2Kg
**x86:** https://alive2.llvm.org/ce/z/ze88Hw
2024-07-30 19:00:46 -07:00
Craig Topper
fed94333fd Revert "[DAG] Reducing instructions by better legalization handling of AVGFLOORU for illegal data types (#99913)"
This reverts commit d5521d128494690be66e03a674b9d1181935bf77.

The AArch64 test is failing on the bots.
2024-07-27 18:35:44 -07:00
Julius Alexandre
d5521d1284
[DAG] Reducing instructions by better legalization handling of AVGFLOORU for illegal data types (#99913)
**Issue:** https://github.com/rust-lang/rust/issues/124790
**Previous PR:** https://github.com/llvm/llvm-project/pull/99614

https://rust.godbolt.org/z/T7eKP3Tvo

**Aarch64:** https://alive2.llvm.org/ce/z/dqr2Kg
**x86:** https://alive2.llvm.org/ce/z/ze88Hw

cc: @RKSimon @topperc
2024-07-27 17:33:09 -07:00
Matt Arsenault
361d4cf533
DAG: Lower is.fpclass fcSubnormal|fcZero to fabs(x) < smallest_normal (#100390)
Produces better code on x86_64 only in the unordered case. Not
sure what the exact condition should be to avoid the regression. Free
fabs might do it, or maybe requires legality checks for the alternative
integer expansion.
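
A scalar sketch of the ordered form of this test, for illustration only. The helper names are hypothetical, and the sketch assumes default floating-point behaviour (no flush-to-zero or fast-math flags).

```cpp
#include <cmath>
#include <limits>

// Class-based test: is x a subnormal or a zero?
bool is_subnormal_or_zero_classify(float x) {
  int cls = std::fpclassify(x);
  return cls == FP_SUBNORMAL || cls == FP_ZERO;
}

// Compare-based test: fabs(x) < smallest_normal.
// The comparison is ordered, so it is false for NaN inputs.
bool is_subnormal_or_zero_compare(float x) {
  return std::fabs(x) < std::numeric_limits<float>::min();
}
```

`std::numeric_limits<float>::min()` is the smallest positive normal value, so any magnitude strictly below it is either zero or subnormal.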
2024-07-26 22:45:47 +04:00
AtariDreams
871740761f
[CodeGen] Remove checks for vectors in unsigned division prior to computing leading zeros (#99524)
It turns out we can safely use
DAG.computeKnownBits(N0).countMinLeadingZeros() with constant legal
vectors, so remove the check for it.
2024-07-19 12:15:36 +08:00
AtariDreams
a51f343b43
[CodeGen] Emit more efficient magic numbers for exact udivs (#87161)
Have simpler lowering for exact udivs in both SelectionDAG and
GlobalISel.

The algorithm is the same between unsigned exact divs and signed divs
save for arithmetic vs logical shift for even divisors, according to
Hacker's Delight, 2nd Edition, page 242.
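
The exact-division technique referred to above can be sketched on 32-bit unsigned values: shift out the trailing zeros of the divisor with a logical shift, then multiply by the multiplicative inverse of the remaining odd factor modulo 2^32. The function names are hypothetical, and this is a stand-alone illustration rather than the SelectionDAG or GlobalISel code.

```cpp
#include <cassert>
#include <cstdint>

// Multiplicative inverse of an odd value modulo 2^32 via Newton iteration.
uint32_t inverse_mod_2_32(uint32_t odd) {
  assert((odd & 1u) && "only odd values are invertible modulo 2^32");
  uint32_t inv = odd;        // correct to 3 bits: odd * odd == 1 (mod 8)
  for (int i = 0; i < 4; ++i)
    inv *= 2u - odd * inv;   // each step doubles the number of correct bits
  return inv;
}

// Exact unsigned division: valid only when d divides x with no remainder.
uint32_t udiv_exact(uint32_t x, uint32_t d) {
  assert(d != 0 && x % d == 0 && "division must be exact");
  unsigned shift = 0;
  while ((d & 1u) == 0) {    // strip the power-of-two part of the divisor
    d >>= 1;
    ++shift;
  }
  return (x >> shift) * inverse_mod_2_32(d);  // logical shift, then multiply
}
```

For example, dividing 84 by 12 shifts 84 right by 2 to get 21, then multiplies by the inverse of 3 (`0xAAAAAAAB`), which wraps around to 7. The signed variant would use an arithmetic shift instead, as the message notes.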
2024-07-17 12:19:02 -07:00
Lawrence Benson
177ce1900f
[LLVM] Add llvm.experimental.vector.compress intrinsic (#92289)
This PR adds a new vector intrinsic `@llvm.experimental.vector.compress`
to "compress" data within a vector based on a selection mask, i.e., it
moves all selected values (i.e., where `mask[i] == 1`) to consecutive
lanes in the result vector. A `passthru` vector can be provided, from
which remaining lanes are filled.
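
A scalar reference model of these semantics may help. The sketch below uses a hypothetical `vector_compress` helper over `std::array`, and it assumes that lanes not written by a selected value keep the `passthru` value at the same lane index.

```cpp
#include <array>
#include <cstddef>

// Reference model: move selected lanes of vec to the front of the result,
// and fill the remaining lanes from passthru (assumed lane-for-lane).
template <typename T, std::size_t N>
std::array<T, N> vector_compress(const std::array<T, N>& vec,
                                 const std::array<bool, N>& mask,
                                 const std::array<T, N>& passthru) {
  std::array<T, N> result = passthru;
  std::size_t out = 0;
  for (std::size_t i = 0; i < N; ++i) {
    if (mask[i])
      result[out++] = vec[i];  // selected values land in consecutive lanes
  }
  return result;
}
```

With `vec = {8, 9, 10, 11}`, `mask = {1, 0, 1, 0}` and `passthru = {0, 1, 2, 3}`, this model produces `{8, 10, 2, 3}`.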

The main reason for this is that the existing
`@llvm.masked.compressstore` has very strong constraints in that it can
only write values that were selected, resulting in guard branches for
all targets except AVX-512 (and even there the AMD implementation is
_very_ slow). More instruction sets support "compress" logic, but only
within registers. So to store the values, an additional store is needed.
But this combination is likely significantly faster on many targets as it
avoids branches.

In follow up PRs, my plan is to add target-specific lowerings for x86,
SVE, and possibly RISCV. I also want to combine this with a store
instruction, as this is probably a common case and we can avoid some
memory writes in that case.

See [discussion in
forum](https://discourse.llvm.org/t/new-intrinsic-for-masked-vector-compress-without-store/78663)
for initial discussion on the design.
2024-07-17 14:24:24 +02:00
Volodymyr Vasylkun
e094abde42
[SelectionDAG] Expand [US]CMP using arithmetic on boolean values instead of selects (#98774)
The previous expansion of [US]CMP was done using two selects and two
compares. It produced decent code, but on many platforms it is better to
implement [US]CMP nodes by performing the following operation:

  ```
[us]cmp(x, y) = (x [us]> y) - (x [us]< y)
```

This patch adds this new expansion, as well as a hook in TargetLowering to allow some targets to still use the select-based approach. AArch64 and SystemZ are currently the only targets to prefer the former approach, but other targets may also start to use it if it provides for better codegen.
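
For illustration, the two expansions can be written as scalar C++ (hypothetical function names; the comparisons yield 0 or 1, so their difference is -1, 0, or 1):

```cpp
#include <cstdint>

// Select-based form: two compares feeding two selects.
int scmp_select(int32_t x, int32_t y) {
  return x < y ? -1 : (x > y ? 1 : 0);
}

// Arithmetic form: (x > y) - (x < y) on boolean results.
int scmp_arith(int32_t x, int32_t y) {
  return (x > y) - (x < y);
}

// The unsigned variant differs only in the signedness of the compares.
int ucmp_arith(uint32_t x, uint32_t y) {
  return (x > y) - (x < y);
}
```

Note that `scmp_arith(-1, 1)` is -1 while `ucmp_arith(0xFFFFFFFF, 1)` is 1, since the same bit pattern compares differently under signed and unsigned ordering.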
2024-07-16 20:56:18 +01:00
Froster
c8dc21d77f
[SelectionDAG][RISCV] Fix break of vnsrl pattern in issue #94265 (#95563)
Added a RISCV overload of `isTruncateFree` to fix the break of vnsrl described in issue #94265.

Fixes #94265
2024-07-14 12:09:37 +01:00
Dmitry Borisenkov
a38d5e0632
[SelectionDAG] Use LAST_INTEGER_VALUETYPE instead of i64 (#98299)
When looking for the largest legal integer type for a target,
`TargetLowering::findOptimalMemOpLowering` assumes that `MVT::i64` is
the largest possible integer type. The patch removes this assumption and
uses `MVT::LAST_INTEGER_VALUETYPE` instead.
2024-07-10 21:38:50 +04:00
Craig Topper
8419da8bd4
[SelectionDAG] Remove LegalTypes argument from getShiftAmountConstant. (#97653)
#97645 proposed to remove LegalTypes from getShiftAmountTy. This patch
removes it from getShiftAmountConstant, which is one of the callers of
getShiftAmountTy.
2024-07-04 18:33:25 -07:00
Craig Topper
3141c11fe8
[SelectionDAG] Remove LegalTypes argument from getShiftAmountTy. NFC (#97757)
This argument is no longer used inside the function. Remove it from the
interface.
2024-07-04 15:24:54 -07:00
Simon Pilgrim
92715cf43b
[DAG] expandAVG - attempt to extend to a wider integer type for the add/shift to avoid overflow handling (#95788) 2024-06-26 13:33:09 +01:00
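
The widening idea in the title above can be illustrated for an unsigned floor average on 8-bit values. This is a sketch with hypothetical function names, not the `expandAVG` implementation: performing the add in a wider type means the carry cannot be lost, while the narrow form uses a well-known overflow-free identity instead.

```cpp
#include <cstdint>

// Widened form: do the add and shift in 16 bits, then truncate.
uint8_t avgflooru_wide(uint8_t a, uint8_t b) {
  return static_cast<uint8_t>((uint16_t{a} + uint16_t{b}) >> 1);
}

// Narrow form: shared bits plus half of the differing bits, no overflow.
uint8_t avgflooru_narrow(uint8_t a, uint8_t b) {
  return static_cast<uint8_t>((a & b) + ((a ^ b) >> 1));
}
```

Both forms agree for all inputs; the widened one needs fewer operations when the wider type is legal on the target.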
Nikita Popov
f2f18459d4 Revert "Intrinsic: introduce minimumnum and maximumnum (#93841)"
As far as I can tell, this pull request was not approved, and
did not go through an RFC on discourse.

This reverts commit 89881480030f48f83af668175b70a9798edca2fb.
This reverts commit 225d8fc8eb24fb797154c1ef6dcbe5ba033142da.
2024-06-21 08:34:04 +02:00
YunQiang Su
8988148003
Intrinsic: introduce minimumnum and maximumnum (#93841)
Currently, on different platforms, the behavior of llvm.minnum is
different if one operand is sNaN:

When we compare sNaN vs NUM:

ARM/AArch64/PowerPC: follow IEEE754-2008's minNUM: return qNaN.
RISC-V/Hexagon: follow IEEE754-2019's minimumNumber: return NUM.
X86: returns NUM, but not the same as IEEE754-2019's minimumNumber, as
     +0.0 is not always greater than -0.0.
MIPS/LoongArch/Generic: return NUM.
LIBCALL: returns qNaN.

So, let's introduce llvm.minimumnum/llvm.maximumnum, which always follow
IEEE754-2019's minimumNumber/maximumNumber.

Half-fix: #93033
2024-06-21 11:53:08 +08:00
Poseydon42
995835fe6d
[SelectionDAG] Add support for the 3-way comparison intrinsics [US]CMP (#91871)
This PR adds initial support for the `scmp`/`ucmp` 3-way comparison
intrinsics in the SelectionDAG. Some of the expansions/lowerings
are not optimal yet.
2024-06-17 11:16:52 +02:00
Simon Pilgrim
76c5158aed [DAG] combineShiftToAVG - don't create avgfloor with scalar constant operands unless legal.
Converting to avgfloor and then expanding it back to shift+add later is likely to prevent other folds (re-association and value-tracking in particular) in the meantime.

Fixes #95284
2024-06-13 12:37:43 +01:00
Simon Pilgrim
ca33796d54 [DAG] combineShiftToAVG - only create new types before LegalTypes
Fixes #95271
2024-06-12 18:49:49 +01:00