558 Commits

Author SHA1 Message Date
Michael Maitland
6f9cb9a75c
[RISCV][GISEL] Legalize G_VAARG through expansion. (#73065)
G_VAARG can be expanded similiar to SelectionDAG::expandVAArg through
LegalizerHelper::lower. This patch implements the lowering through this
style of expansion.

The expansion gets the head of the va_list by loading the pointer to
va_list. Then, the head of the list is adjusted depending on argument
alignment information. This gives a pointer to the element to be read
out of the va_list. Next, the head of the va_list is bumped to the next
element in the list. The new head of the list is stored back to the
original pointer to the head of the va_list so that subsequent G_VAARG
instructions get the next element in the list. Lastly, the element is
loaded from the alignment adjusted pointer constructed earlier.

This change is stacked on #73062.
2023-12-08 13:24:27 -05:00
Craig Topper
d605d9d7a1
[RISCV][GISel] Support G_ROTL/G_ROTR with Zbb. (#72825) 2023-12-04 13:00:34 -08:00
Momchil Velikov
c1140d49ec
[AArch64] Stack probing for dynamic allocas in GlobalISel (#67123)
Co-authored-by: Oliver Stannard <oliver.stannard@linaro.org>
2023-12-04 09:44:02 +00:00
David Green
295edaab13 [AArch64][GlobalISel] Better vecreduce.fadd lowering. (PR #73294)
This changes the fadd legalization to handle fp16 types, and treats more types
as legal so that the backend can produce the correct patterns. This is
currently a missing identity fold for `fadd x -0.0 -> x`
2023-11-27 08:20:54 +00:00
Craig Topper
5d501b1091
[GISel][RISCV] Fix several boundary cases in narrow G_SEXT_INREG. (#72719)
This fixes cases when SizeInBits is a multiple of the narrow size.

If SizeBits is equal to NarrowTy size, the first block would create an
illegal G_SEXT_INREG where the the extension size is equal to the type.
I tried to turn it into G_TRUNC+G_SEXT, but that just turned back into
G_SEXT_INREG causing an infinite loop. So punt to the splitting case.

In the for loop we should copy when the part ends on SizeInBits. In that
case there is no G_SEXT_INREG needed for partial. But we should note
that register in PartialExtensionReg for the first full part to use.

If the part starts on SizeInBits then we should do an AShr of
PartialExtensionReg.

We should only get to the G_SEXT_INREG case if the SizeInBits is in the
middle of the part.
2023-11-24 08:39:38 -08:00
Min-Yih Hsu
7c3c8a1277
[RISCV][GISel] Add support for G_IS_FPCLASS in F and D extensions (#72000)
Add legalizer, regbankselect, and isel supports for floating point
version of G_IS_FPCLASS.
2023-11-22 16:43:20 -08:00
Sander de Smalen
81b7f115fb
[llvm][TypeSize] Fix addition/subtraction in TypeSize. (#72979)
It seems TypeSize is currently broken in the sense that:

  TypeSize::Fixed(4) + TypeSize::Scalable(4) => TypeSize::Fixed(8)

without failing its assert that explicitly tests for this case:

  assert(LHS.Scalable == RHS.Scalable && ...);

The reason this fails is that `Scalable` is a static method of class
TypeSize,
and LHS and RHS are both objects of class TypeSize. So this is
evaluating
if the pointer to the function Scalable == the pointer to the function
Scalable,
which is always true because LHS and RHS have the same class.

This patch fixes the issue by renaming `TypeSize::Scalable` ->
`TypeSize::getScalable`, as well as `TypeSize::Fixed` to
`TypeSize::getFixed`,
so that it no longer clashes with the variable in
FixedOrScalableQuantity.

The new methods now also better match the coding standard, which
specifies that:
* Variable names should be nouns (as they represent state)
* Function names should be verb phrases (as they represent actions)
2023-11-22 08:52:53 +00:00
Acim-Maravic
f3138524db
[AMDGPU] Generic lowering for rint and nearbyint (#69596)
The are three different rounding intrinsics, that are brought down to
same instruction.

Co-authored-by: Acim Maravic <acim.maravic@amd.com>
2023-11-14 18:49:21 +01:00
Craig Topper
44e8bea400
[GISel][AArch64] Notify the Observer when CTTZ lowering changes the opcode to CTPOP. (#72008) 2023-11-12 19:36:24 -08:00
David Green
10ce319320
[AArch64][GlobalISel] Expand handling for sitofp and uitofp (#71282)
Similar to #70635, this expands the handling of integer to fp
conversions. The code is very similar to the float->integer conversions
with types handled oppositely. There are some extra unhandled cases
which require more handling for ASR operations.
2023-11-10 13:41:13 +00:00
David Green
54574d3272
[AArch64][GlobalISel] Expand handling for fptosi and fptoui (#70635)
Now that we have more types handled for zext/sext and trunc, it is
possible to get more types working for the vector float to integer
conversions. This patch adds fp16, widening and narrowing vector support
to handle more types. The smaller types wil be expanded to the size of
the larger element type. A couple of case require more awkward truncates
to get working as they go from illegal to illegal types.
2023-11-04 11:47:05 +00:00
Craig Topper
3750558ee1
[RISCV][GISel] Legalize G_SMULO/G_UMULO (#67635)
Update `LegalizerHelper::widenScalarMulo` to not create a mulo if we aren't going to use the overflow flag. This prevents needing to legalize the widened operation. This generates better code when we need to make a libcall for multiply.
2023-10-13 20:34:45 -07:00
chuongg3
d88d9834e9
[AArch64][GlobalISel] Support more types for TRUNC (#66927)
G_TRUNC will get lowered into trunc(merge(trunc(unmerge),
trunc(unmerge))) if the source is larger than 128 bits or the truncation
is more than half of the current bit size.

Now mirrors ZEXT/SEXT code more closely for vector types.
2023-10-11 16:05:25 +01:00
Serge Pavlov
462d5830da [GlobalISel] Add support for *_fpmode intrinsics
The change implements support of the intrinsics `get_fpmode`,
`set_fpmode` and `reset_fpmode` in Global Instruction Selector. Now they
are lowered into library function calls.

Differential Revision: https://reviews.llvm.org/D158260
2023-10-09 21:14:07 +07:00
Matt Arsenault
1328a8534b
AMDGPU: Fix handling of -0 in round lowering (#65761) 2023-09-19 09:14:17 +03:00
Allen
eaf23b2480
[GIsel][AArch64] Legalize <2 x i16> for G_INSERT_VECTOR_ELT (#65830)
Widen the vector elements to 64 bits to make sure it legal instead by
clamping the number of elements. Depend on D153394.

Fixes https://github.com/llvm/llvm-project/issues/63826
2023-09-12 21:15:01 +08:00
Jay Foad
71ca53b6cf
[GlobalISel] Lower G_SHUFFLE_VECTOR with scalar result (#65275) 2023-09-04 13:32:43 -04:00
Matt Arsenault
b14e83d1a4 IR: Add llvm.exp10 intrinsic
We currently have log, log2, log10, exp and exp2 intrinsics. Add exp10
to fix this asymmetry. AMDGPU already has most of the code for f32
exp10 expansion implemented alongside exp, so the current
implementation is duplicating nearly identical effort between the
compiler and library which is inconvenient.

https://reviews.llvm.org/D157871
2023-09-01 19:45:03 -04:00
David Green
58a2f839fd [AArch64][GISel] Expand coverage of FDiv and move into place.
This adds some more extensive test coverage for fdiv through global isel,
switching the opcodes to use the more complete ActionDefinitions to handle more
cases and moving it into the position of the existing code which is no longer
needed.
2023-08-30 22:09:53 +01:00
David Green
ef0b8cf3f4 [AArch64][GISel] Expand coverage of FAdd and FSub.
This adds some more extensive test coverage for fadd/fsub through global isel,
switching the opcodes to use the more complete ActionDefinitions to handle more
cases.
2023-08-23 09:51:06 +01:00
Tuan Chuong Goh
a40c984976 [AArch64][GlobalISel] Support more legal types for EXTEND
Expand (s/z/any)ext instructions to be compatible with more
types for GlobalISel.
This patch mainly focuses on 64-bit and 128-bit vectors with
element size of powers of 2.
It also notably handles larger than legal vectors.

Differential Revision: https://reviews.llvm.org/D157113
2023-08-21 09:51:17 +01:00
Craig Topper
c6dee6982f [GlobalISel][Mips] Sync G_UADDE and G_USUBE legalization with LegalizeDAG.
This modifies the G_UADDE legalizaton to a version that looks shorter
on Mips and RISC-V when feeding the equivalent IR to SelectionDAG.
This also removes the boolean select from G_USUBE.

Comments taken from LegalizeDAG and tweaked.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D158232
2023-08-17 20:36:55 -07:00
Jie Fu
d1a4b8c56f [GlobalISel] Remove unused variable 'Or' (NFC)
/Users/jiefu/llvm-project/llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp:3450:10: error: unused variable 'Or' [-Werror,-Wunused-variable]
    auto Or = MIRBuilder.buildOr(CarryOut, And, Res_ULT_LHS);
         ^
1 error generated.
2023-08-18 06:40:41 +08:00
Craig Topper
ebb2e5ebb2 [GlobalISel][Mips] Correct corner case in G_UADDE legalization.
If carryin was 1, and RHS is 0xffffffff we were not giving a carry
out.

In that case Res would be equal to LHS, so Res <u LHS would be false.
But there should be a carry out since carryin+RHS wraps around to 0.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D157943
2023-08-17 15:06:16 -07:00
David Green
cf65afbf93 [AArch64][GISel] Extend lowering for fp round intrinsics.
This extends the lowering of ceil, floor, nearbyint, rint, round, roundeven and
trunc. They are all very similar, so can reuse the same legalization info.
selectIntrinsicTrunc and selectIntrinsicRound can be removed as they can be
selected via tablegen patterns, and G_INTRINSIC_ROUNDEVEN is marked as a gisel
equivalent of froundeven. Otherwise this reuses the existing code, filling it
out to handle more types.

Differential Revision: https://reviews.llvm.org/D157679
2023-08-17 16:25:32 +01:00
David Green
a3f2751f78 [AArch64][GISel] Add handling for G_VECREDUCE_FMAXIMUM and G_VECREDUCE_FMINIMUM
This is a lot of copy-pasting for the existing handling of
G_VECREDUCE_FMAX/G_VECREDUCE_FMIN to add handling for
G_VECREDUCE_FMAXIMUM/G_VECREDUCE_FMINIMUM in the same way.

Differential Revision: https://reviews.llvm.org/D156615
2023-08-14 10:03:25 +01:00
David Green
d199478af4 [AArch64][GISel] Handling for G_VECREDUCE_FMIN and G_VECREDUCE_FMAX
This adds legalization for G_VECREDUCE_FMIN and G_VECREDUCE_FMAX, where the
selection can go via tablegen patterns. I haven't tried to get non-power2 types
working yet, just the more legal types.

Differential Revision: https://reviews.llvm.org/D156614
2023-08-14 09:19:47 +01:00
Bjorn Pettersson
a7ee80fab2 [llvm] Drop some more typed pointer bitcasts etc. 2023-08-13 16:46:56 +02:00
Amara Emerson
b9669789c3 [GlobalISel][NFC] Introduce a GVecReduce wrapper class and a minor refactor. 2023-08-12 13:55:08 -07:00
David Green
acd17ea662 [AArch64][GISel] Expand handling for G_FSQRT to more vector types
Similar to G_FABS, these can reuse the existing lowering to successfully handle
more types.
2023-08-11 10:16:45 +01:00
Matt Arsenault
1ca0808db2 GlobalISel: Don't expand stacksave/stackrestore in IRTranslator
In some (likely invalid edge cases anyway), it's not correct to
directly copy the stack pointer register.
2023-08-09 18:33:55 -04:00
Sameer Sahasrabuddhe
d9847cde48 [GlobalISel] convergent intrinsics
Introduced the convergent equivalent of the existing G_INTRINSIC opcodes:

- G_INTRINSIC_CONVERGENT
- G_INTRINSIC_CONVERGENT_W_SIDE_EFFECTS

Out of the targets that currently have some support for GlobalISel, the patch
assumes that the convergent intrinsics only relevant to SPIRV and AMDGPU.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D154766
2023-07-31 12:15:39 +05:30
David Green
6edc9a7662 [AArch64][GISel] Additional FPExt vector lowering
Similar to D155311, this adds lowering for more vector cases for FPExt

Differential Revision: https://reviews.llvm.org/D155601
2023-07-23 16:58:13 +01:00
David Green
74c0bdff7d [AArch64][GISel] Additional FPTrunc vector lowering
I was attempting to add llvm.reduce.fminimum/fmaximum support for GlobalISel.
In the process I noticed that llvm.reduce.fmin/fmax was missing, and could do
with being added first. That led on to adding additional vector support for
minnum/maxnum, which in turn led to needing to handle fptrunc and fpext for
some of the fp16 types. So this patch extends the vector handling for fptrunc,
adding support for f16 types which are clamped to 4 elements, and scalarizing
the rest.

I went round in circles a little with how smaller than legal vectors should be
handled, but this seems simple and seems to work, if not always optimally yet.

Differential Revision: https://reviews.llvm.org/D155311
2023-07-18 18:52:19 +01:00
Ivan Kosarev
e705b2b1f4 Fix warnings about unused varibles on builds without asserts. 2023-07-12 14:45:29 +01:00
Ivan Kosarev
15e7749e19 [Codegen] Generate fast fp64-to-fp16 conversions in unsafe mode.
Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D154528
2023-07-12 11:55:19 +01:00
Matt Arsenault
61820f8b5d CodeGen: Optimize lowering of is.fpclass fcZero|fcSubnormal
Combine the two checks into a check if the exponent bits are 0. The
inverted case isn't reachable until a future change, and GlobalISel
currently doesn't attempt the inversion optimization.

https://reviews.llvm.org/D143182
2023-07-06 13:03:57 -04:00
Matt Arsenault
003b58f65b IR: Add llvm.frexp intrinsic
Add an intrinsic which returns the two pieces as multiple return
values. Alternatively could introduce a pair of intrinsics to
separately return the fractional and exponent parts.

AMDGPU has native instructions to return the two halves, but could use
some generic legalization and optimization handling. For example, we
should be able to handle legalization of f16 on older targets, and for
bf16. Additionally antique targets need a hardware workaround which
would be better handled in the backend rather than in library code
where it is now.
2023-06-28 14:50:16 -04:00
David Green
2802739dfd [NFC] Replace ;; with ; 2023-06-11 10:25:24 +01:00
Matt Arsenault
eece6ba283 IR: Add llvm.ldexp and llvm.experimental.constrained.ldexp intrinsics
AMDGPU has native instructions and target intrinsics for this, but
these really should be subject to legalization and generic
optimizations. This will enable legalization of f16->f32 on targets
without f16 support.

Implement a somewhat horrible inline expansion for targets without
libcall support. This could be better if we could introduce control
flow (GlobalISel version not yet implemented). Support for strictfp
legalization is less complete but works for the simple cases.
2023-06-06 17:07:18 -04:00
Mateja Marjanovic
cf76074a36 [AMDGPU][GlobalISel] Check exact width in get*ClassForBitWidth and widen if necessary
Instead of checking if the given bitwidth is less or equal to a bitwidth of an existing RegClass,
check if it has the exact same value.

For LLVM vector types that don't have a corresponding Register Class, widen them during legalization.
That goes for G_EXTRACT_VECTOR_ELT, G_INSERT_VECTOR_ELT and G_BUILD_VECTOR.

Differential revision: https://reviews.llvm.org/D148096
Reviewers: foad, arsenm
2023-05-03 17:32:24 +02:00
Mateja Marjanovic
6175ec0bb6 Revert "[AMDGPU][GlobalISel] Widen the vector operand in G_BUILD/INSERT/EXTRACT_VECTOR"
This reverts commit b25c7cafcbe1b52ea2d1ff5e5c2f13674b5f297d.
2023-05-03 17:28:01 +02:00
Mateja Marjanovic
b25c7cafcb [AMDGPU][GlobalISel] Widen the vector operand in G_BUILD/INSERT/EXTRACT_VECTOR
Widen the vector operand type in G_BUILD_VECTOR, G_INSERT_VECTOR_ELT,
G_EXTRACT_VECTOR_ELT to the nearest larger RegClass.
2023-05-03 17:14:38 +02:00
Sergei Barannikov
38d84e3d76 [GISel] Legalize G_FSUB to G_FADD + G_FNEG even if G_FNEG is illegal
`G_FNEG` used to be legalized to `G_FSUB -0, x` causing infinite loop.
This is no longer the case after D84287.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D148187
2023-04-15 08:11:49 +03:00
Amara Emerson
719024a0d0 [GlobalISel][NFC] Add MachineInstr::getFirst[N]{Regs,LLTs}() helpers to extract regs & types.
These reduce the typing and clutter from:
Register Dst = MI.getOperand(0).getReg();
Register Src1 = MI.getOperand(1).getReg();
Register Src2 = MI.getOperand(2).getReg();
Register Src3 = MI.getOperand(3).getReg();
LLT DstTy = MRI.getType(Dst);
... etc etc

To just:
auto [Dst, Src1, Src2, Src3] = MI.getFirst4Regs();
auto [DstTy, Src1Ty, Src2Ty, Src3Ty] = MI.getFirst4LLTs();

Or even more concise:
auto [Dst, DstTy, Src1, Src1Ty, Src2, Src2Ty, Src3, Src3Ty] =
     MI.getFirst4RegLLTs();

Differential Revision: https://reviews.llvm.org/D144687
2023-04-12 16:43:14 -07:00
Matt Arsenault
9356ec1516 CodeGen: Reorder case handling for is.fpclass legalization
Subnormal and zero checks can be combined into one, so move
the code closer to reduce the diff in a future change.
2023-03-17 11:29:50 -04:00
Matt Arsenault
61f2f2c64a GlobalISel: Use FPClassTest in is.fpclass lowering 2023-03-17 10:23:01 -04:00
Vladislav Dzhidzhoev
3a51eed948 [AArch64][GlobalISel] Legalize G_SHUFFLE_VECTOR with smaller dest size
Legalize G_SHUFFLE_VECTOR having destination vector length smaller than
source vector length by reshaping destination vector.

Differential Revision: https://reviews.llvm.org/D144670
2023-02-27 23:46:44 +01:00
Jessica Del
fc672b6a8b [AMDGPU] Improved wide multiplies
These checks show optimized instructions if an operand is known to be
(partially) zero.

Change-Id: Ie2f6d0d3ee9d5b279d1f4c1dd0787492e39cc77a

Differential Revision: https://reviews.llvm.org/D140208
2023-02-22 16:39:06 +01:00
Kazu Hirata
b7ffd9686d Use APInt::getAllOnes instead of APInt::getAllOnesValue (NFC)
Note that getAllOnesValue has been soft-deprecated in favor of
getAllOnes.
2023-02-19 22:54:23 -08:00