50770 Commits

Author SHA1 Message Date
Craig Topper
0a459dd4e9 [RISCV] Add tests for selecting G_BRCOND+G_ICMP. NFC
These should have been part of e0e0891d741588684b0803d7724e5080f9c75537
2023-11-13 15:29:34 -08:00
Craig Topper
29d75cb9dc [RISCV][GISel] Update legalize-smin.mir and legalize-smax.mir to test G_SMIN/G_SMAX.
Looks like an incomplete fixup was done after copying the umin/umax tests.
2023-11-13 13:49:14 -08:00
Craig Topper
915e092400 [RISCV] Select zext as sext when sign bit is 0 for -riscv-experimental-rv64-legal-i32
In our default SelectionDAG where i32 isn't legal, the zext will become
an i64 AND and often get optimized out on its own. With i32 legal, we
need to turn it into sext.w and rely on RISCVOptWInstrs to remove it.
2023-11-13 12:21:36 -08:00
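The equivalence this commit relies on can be checked in plain C: when bit 31 of a 32-bit value is known to be zero, zero-extension and sign-extension to 64 bits produce the same result. The following is only an illustrative sketch, not LLVM code, and all helper names (`zext32`, `sext32`, `sign_bit_is_zero`, `zext_via_sext`) are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Zero-extend a 32-bit value to 64 bits (models zext i32 -> i64). */
static uint64_t zext32(uint32_t x) { return (uint64_t)x; }

/* Sign-extend the low 32 bits to 64 bits (models what sext.w computes).
 * The uint32_t -> int32_t conversion is implementation-defined in C for
 * values above INT32_MAX; this sketch assumes the usual two's-complement
 * wraparound that mainstream compilers provide. */
static uint64_t sext32(uint32_t x) { return (uint64_t)(int64_t)(int32_t)x; }

/* Hypothetical helper: true when bit 31 is clear. */
static int sign_bit_is_zero(uint32_t x) { return (x >> 31) == 0; }

/* A zext implemented as a sext; only valid under the asserted precondition. */
static uint64_t zext_via_sext(uint32_t x) {
    assert(sign_bit_is_zero(x));
    return sext32(x);
}
```

With the sign bit set the two extensions differ in the upper 32 bits, which is why the substitution needs the known-zero sign bit.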
Craig Topper
05300222ba
[RISCV][GISel] Add really basic support for FP regbank selection for G_LOAD/G_STORE. (#70896)
Coerce the register bank based on the users of the G_LOAD or the
defining instruction for the G_STORE.

s64 on rv32 is handled by forcing the FPRB register bank.
2023-11-13 12:12:16 -08:00
Craig Topper
70ce047f7e
[RISCV] Legalize G_CTLZ/G_CTLZ_ZERO_UNDEF/G_CTTZ/G_CTTZ_ZERO_UNDEF. (#72014)
The base ISA does not support these operations. A future patch will
enable them for Zbb.
2023-11-13 11:22:48 -08:00
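For targets without count-leading/trailing-zeros instructions, these operations are commonly expanded in terms of a population count. The sketch below shows two well-known identities in C; it is an assumption that they resemble what a generic legalizer produces, and it is not taken from the patch. The function names are hypothetical, and a deliberately naive bit-clearing loop stands in for the population count.

```c
#include <assert.h>
#include <stdint.h>

/* Naive population count: clear the lowest set bit until none remain. */
static unsigned ctpop32(uint32_t x) {
    unsigned n = 0;
    while (x) {
        x &= x - 1u;
        ++n;
    }
    return n;
}

/* cttz(x) = ctpop(~x & (x - 1)): the mask selects exactly the zero bits
 * below the lowest set bit. Yields 32 for x == 0. */
static unsigned cttz32(uint32_t x) { return ctpop32(~x & (x - 1u)); }

/* ctlz(x): smear the highest set bit into every lower position, then
 * count the zero bits that remain above it. Yields 32 for x == 0. */
static unsigned ctlz32(uint32_t x) {
    x |= x >> 1;
    x |= x >> 2;
    x |= x >> 4;
    x |= x >> 8;
    x |= x >> 16;
    return ctpop32(~x);
}

/* Small usage example with hand-checked values. */
static void self_check(void) {
    assert(cttz32(8u) == 3);
    assert(ctlz32(1u) == 31);
}
```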
Tom Stellard
877226f01f
[X86] Simplify regex in pr42616.ll test (#71980) 2023-11-13 11:05:42 -08:00
Craig Topper
d8576e4542 [RISCV][GISel] Update RV64 legalize-ctpop.mir to account for constant shift amounts being i64 now.
This changed while the ctpop patch was in review and I forgot to update it.
2023-11-13 10:38:36 -08:00
Craig Topper
90dd4c470f
[RISCV][GISel] Legalize G_CTPOP. (#72005)
The base ISA does not have an instruction for this so we need to lower.
Zbb support will come in a future patch.
2023-11-13 10:26:32 -08:00
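When no population-count instruction is available, the operation has to be expanded into shifts, masks and adds. One classic expansion is the parallel ("SWAR") bit count sketched below in C. That the legalizer emits exactly this sequence is an assumption; the code only illustrates the kind of lowering the message refers to, and `popcount32` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Parallel bit count of a 32-bit word. */
static unsigned popcount32(uint32_t x) {
    /* Each 2-bit field now holds the number of bits set in that field. */
    x = x - ((x >> 1) & 0x55555555u);
    /* Sum adjacent 2-bit counts into 4-bit fields. */
    x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u);
    /* Sum adjacent 4-bit counts into 8-bit fields. */
    x = (x + (x >> 4)) & 0x0f0f0f0fu;
    /* The multiply accumulates all four byte counts into the top byte. */
    return (x * 0x01010101u) >> 24;
}

/* Small usage example with hand-checked values. */
static void self_check(void) {
    assert(popcount32(0u) == 0);
    assert(popcount32(0xffffffffu) == 32);
    assert(popcount32(0x80000001u) == 2);
}
```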
Felipe de Azevedo Piovezan
83729e6716
[SelectionDAG] Disable FastISel for swiftasync functions (#70741)
Most (x86) swiftasync functions tend to use both SelectionDAGISel and
FastISel lowering:
* FastISel argument lowering can only handle C calling convention.
* FastISel fails mid-BB in a number of ways, including in simple `ret
void` instructions under certain circumstances.

This dance of SelectionDAG (argument) -> FastISel (some instructions) ->
SelectionDAG (remaining instructions) is lossy; in particular, Argument
lowering information is cleared after that first SelectionDAG run.

Since swiftasync functions rely heavily on proper Argument lowering for
debug information, this patch disables the use of FastISel in such
functions.
2023-11-13 08:27:29 -08:00
Yingwei Zheng
d64d5ea102
[RISCV][CodeGenPrepare] Remove duplicated transform for zext. NFC. (#72053)
After #71534 and #72052, the transform `zext -> zext nneg` in
`RISCVCodeGenPrepare` is redundant.
2023-11-13 22:45:33 +08:00
David Green
2238363a5f
[AArch64] Prevent v1f16 vselect/setcc type expansion. (#72048)
PR #71614 identified an issue in the lowering of v1f16 vector compares,
where the `v1i1 setcc` is expanded to `v1i16 setcc`, and the `v1i16
setcc` tries to be expanded to a `v2i16 setcc` which fails. For floating
point types we can let them scalarize instead though, generating a
`setcc f16` that can be lowered using normal fp16 lowering.

07a8ff4892b2a54f0bd5843f863bcffa7a258f1f added a special case combine
for v1 vselect to expand the predicate type to the same size as the fcmp
operands. This turns that off for float types, allowing them to
scalarize naturally, which hopefully fixes the issue by preventing the
v1i16 setcc, meaning it won't try to widen to larger vectors.

The codegen might not be optimal, but as far as I can tell everything
generated successfully, providing that no `v1i16 setcc v1f16`
instructions get generated.
2023-11-13 14:42:52 +00:00
Jay Foad
a4196666ac
[AMDGPU] Revert "Preliminary patch for divergence driven instruction selection. Operands Folding 1." (#71710)
This reverts commit 201f892b3b597f24287ab6a712a286e25a45a7d9.
2023-11-13 13:53:10 +00:00
Simon Pilgrim
1a9fbf6166 [X86] combineLoad - reuse an existing VBROADCAST_LOAD constant for a smaller vector load of the same constant
Extends the existing code that performed something similar for SUBV_BROADCAST_LOAD, but this is just for cases where AVX2 targets load full-width 128-bit constant vectors but broadcast the equivalent 256-bit constant vector.

Fixes AVX2 case for Issue #70947
2023-11-13 11:59:04 +00:00
Jay Foad
47f29043f0 [AMDGPU] Fix a GlobalISel RUN line
This was added in D149795 without actually enabling GlobalISel.
2023-11-13 11:30:15 +00:00
Nemanja Ivanovic
563720c3be
[RISCV] Fix lowering of negative zero with Zdinx 32-bit (#71869)
The compiler currently abends with an impossible reg-to-reg copy when
producing a negative zero FP immediate on RV32 with -Zdinx. This is
because we emit a negation that uses FP registers. Emit the right node
to produce correct code.
2023-11-13 07:38:14 +01:00
Craig Topper
44e8bea400
[GISel][AArch64] Notify the Observer when CTTZ lowering changes the opcode to CTPOP. (#72008) 2023-11-12 19:36:24 -08:00
Carl Ritson
edc38a6cbd
[AMDGPU] Add option to pre-allocate SGPR spill VGPRs (#70626)
SGPR spill VGPRs are WWM registers so allow them to be allocated by
SIPreAllocateWWMRegs pass.
This intentionally prevents spilling of these VGPRs when enabled.
2023-11-13 12:21:18 +09:00
Carl Ritson
52b247b1d3
[PHIElimination] Handle subranges in LiveInterval updates (#69429)
Add subrange tracking and handling for LiveIntervals during PHI
elimination.
This requires extending MachineBasicBlock::SplitCriticalEdge to also
update subrange intervals.
2023-11-13 12:16:26 +09:00
Han Shen
ca10e3b2e5
[LLVM][NVPTX] Add BF16 vector instruction and fix lowering rules (#69415)
Add support for bf16x2 instructions such as setp, fneg, fabs, etc;
Fix the instructions that were not differentiated between sm_80 and
sm_90 support, such as fpround etc.
Add more bf16 test cases to ensure the correct behavior.

---------

Co-authored-by: shenhan03 <shenhan03@kuaishou.com>
2023-11-12 21:48:31 +08:00
David Green
a31538d29c [AArch64] Add a test showing inefficient register allocation around loop IVs. NFC 2023-11-12 13:11:03 +00:00
Craig Topper
ee95819503 [RISCV][GISel] Legalize G_FSHL/G_FSHR. 2023-11-11 20:23:29 -08:00
Craig Topper
b0e97c7757 [RISCV][GISel] Legalize G_ROTL/G_ROTR. 2023-11-11 20:12:07 -08:00
Craig Topper
7965a21f7a [RISCV] Add more packh patterns. 2023-11-11 19:31:23 -08:00
Craig Topper
bfb7843580 [RISCV] Add packw/packh patterns for -riscv-experimental-rv64-legal-i32 2023-11-11 17:52:22 -08:00
Craig Topper
6b9752cc72 [RISCV] Add rv64zbkb.ll test for -riscv-experimental-rv64-legal-i32. NFC 2023-11-11 17:52:22 -08:00
Craig Topper
fdc904e568 [RISCV] Add isel pattern to turn (or (zext X), Y) into add.uw when X and Y are disjoint.
Improve code for -riscv-experimental-rv64-legal-i32.
2023-11-11 15:51:38 -08:00
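The pattern works because OR and ADD agree whenever the two operands have no set bits in common, so no carries can occur. A minimal C model of this is sketched below. `add_uw` models the Zba `add.uw` semantics (second operand plus the zero-extended low 32 bits of the first); `or_zext` is a hypothetical stand-in for the `(or (zext X), Y)` node, with the disjointness requirement expressed as an assertion.

```c
#include <assert.h>
#include <stdint.h>

/* Model of add.uw: rs2 plus the zero-extended low 32 bits of rs1. */
static uint64_t add_uw(uint64_t rs1, uint64_t rs2) {
    return rs2 + (uint64_t)(uint32_t)rs1;
}

/* Model of (or (zext X), Y). The rewrite to add.uw is only valid when
 * zext(X) and Y are disjoint, i.e. they share no set bits. */
static uint64_t or_zext(uint32_t x, uint64_t y) {
    assert((((uint64_t)x) & y) == 0 && "operands must be disjoint");
    return (uint64_t)x | y;
}
```

If the operands overlap, the add produces a carry that the OR does not, for example `1 | 1 == 1` but `1 + 1 == 2`.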
Craig Topper
bf0963620c [RISCV] Add (shl (zext GPR:), uimm5:) pattern for -riscv-experimental-rv64-legal-i32. 2023-11-11 15:14:02 -08:00
Craig Topper
994d882e15 [RISCV] Add an slli.uw pattern using zext for -riscv-experimental-rv64-legal-i32
We already had the pattern for GlobalISel. Move it over to SelectionDAG.
2023-11-11 14:41:56 -08:00
Momchil Velikov
e8209b2486
[MachineSink] Drop debug info for instructions deleted by sink-and-fold (#71443)
After performing sink-and-fold over a COPY, the original instruction is
replaced with one that produces its output in the destination of the
copy. Its value is still available (in a hard register), so if there are
debug instructions which refer to the (now deleted) virtual register
they could be updated to refer to the hard register, in principle.
However, it's not clear how to do that. Moreover, in some cases the debug
instructions may need to be replicated proportionally to the number of
the COPY instructions replaced, and in some extreme cases we can end up
with a quadratic increase in the number of debug instructions, e.g.:

        int f(int);
    
        void g(int x) {
          int y = x + 1;
    
          int t0 = y;
          f(t0);
    
          int t1 = y;
          f(t1);
        }
2023-11-11 19:43:14 +00:00
David Green
0bd67566f7 [AArch64] Remove AArch64/aarch64-neon-v1i1-setcc.ll test. NFC
These are replicated in llvm/test/CodeGen/AArch64/arm64-neon-v1i1-setcc.ll with
more tests and updated check lines. Remove the duplicate test.
2023-11-11 18:22:41 +00:00
Craig Topper
bab2cf2d01 [RISCV][GISel] Promote s32 constant shift amounts to s64 on RV64.
This allows us to reuse isel patterns from SelectionDAG.

This is similar to what is done on AArch64.
2023-11-10 23:07:00 -08:00
Craig Topper
647c490f8a [RISCV] Add an add.uw pattern using zext for -riscv-experimental-rv64-legal-i32 and global isel 2023-11-10 21:36:29 -08:00
Craig Topper
7e0bae5b34 [RISCV][GISel] Add isel patterns for SHXADD with s32 type on RV64. 2023-11-10 19:52:57 -08:00
Craig Topper
a93dfb589d [RISCV] Peek through zext in selectShiftMask.
This improves the code for -riscv-experimental-rv64-legal-i32
2023-11-10 19:02:14 -08:00
Craig Topper
83cc24e598 [RISCV] Add test case showing unnecessary zext of shift amounts with -riscv-experimental-rv64-legal-i32. NFC 2023-11-10 19:02:13 -08:00
Shoaib Meenai
c5dd1bbcc3 Revert "Revert "[IR] Mark lshr and ashr constant expressions as undesirable""
This reverts commit 8ee07a4be7f7d8654ecf25e7ce0a680975649544.

The revert is breaking AMDGPU backend tests (which I didn't have
enabled), and I don't want to risk breakages over the weekend, so just
revert for now.
2023-11-10 17:26:14 -08:00
Shoaib Meenai
8ee07a4be7 Revert "[IR] Mark lshr and ashr constant expressions as undesirable"
This reverts commit 82f68a992b9f89036042d57a5f6345cb2925b2c1.

cd7ba9f3d090afb5d3b15b0dcf379d15d1e11e33 needs to be reverted to fix
test failures on builds without assertions, and this one needs to be
reverted first for that.
2023-11-10 17:08:35 -08:00
Craig Topper
ca603343db
[RISCV][GISel] Legalizer and register bank selection for G_JUMP_TABLE and G_BRJT (#71970)
Testing together since they should come paired.

Instruction selection will be a separate PR.
2023-11-10 13:09:24 -08:00
Joseph Huber
a3bd87b100 [AMDGPU] Call the FINI_ARRAY destructors in the correct order (#71815)
Summary:
The AMDGPU backend uses the linker-provided INIT_ARRAY and FINI_ARRAY
sections to call all the global constructors in a single kernel.
Previously this mistakenly used the same iteration logic for both
arrays. The destructors stored in FINI_ARRAY are stored in the same
order as the ones in the INIT_ARRAY section so we need to traverse it in
reverse order.

Relanding after the revert in fe7b5e2cfcf6848287010291081f85fa1f6bb2ef
using the IR builder interface instead of ConstantExpr.
2023-11-10 11:01:02 -06:00
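The traversal order described above can be sketched in C as follows. This is a host-side illustration under assumed names, not the IR the backend emits: `hook_fn`, `run_init_array`, `run_fini_array` and the tracing hooks are all hypothetical, and the `start`/`stop` pointers stand in for the linker-provided array bounds.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*hook_fn)(void);

/* Constructors run front to back. */
static void run_init_array(hook_fn *start, hook_fn *stop) {
    assert(start <= stop);
    for (hook_fn *f = start; f != stop; ++f)
        (*f)();
}

/* Destructors are stored in the same order as the constructors,
 * so they must run back to front. */
static void run_fini_array(hook_fn *start, hook_fn *stop) {
    assert(start <= stop);
    for (hook_fn *f = stop; f != start;)
        (*--f)();
}

/* Hypothetical hooks that record the order in which they are called. */
static char trace[16];
static size_t trace_len;

static void record(char c) {
    assert(trace_len + 1 < sizeof trace);
    trace[trace_len++] = c;
    trace[trace_len] = '\0';
}

static void hook_a(void) { record('a'); }
static void hook_b(void) { record('b'); }
static void hook_c(void) { record('c'); }
```

Running both loops over the array `{hook_a, hook_b, hook_c}` records the hooks in the order a, b, c, c, b, a.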
Nikita Popov
fe7b5e2cfc Revert "[AMDGPU] Call the FINI_ARRAY destructors in the correct order (#71815)"
This reverts commit c1d5865a313d0a8a254b37c852bdd444453c0f73.

Introduces a new use of ConstantExpr::getAShr().
2023-11-10 17:01:06 +01:00
Joseph Huber
c1d5865a31
[AMDGPU] Call the FINI_ARRAY destructors in the correct order (#71815)
Summary:
The AMDGPU backend uses the linker-provided INIT_ARRAY and FINI_ARRAY
sections to call all the global constructors in a single kernel.
Previously this mistakenly used the same iteration logic for both
arrays. The destructors stored in FINI_ARRAY are stored in the same
order as the ones in the INIT_ARRAY section so we need to traverse it in
reverse order.
2023-11-10 09:34:04 -06:00
Joseph Huber
af8ebfdcd9
[NVPTX] Allow the ctor/dtor lowering pass to emit kernels (#71549)
Summary:
This pass emits the new "nvptx$device$init" and "nvptx$device$fini"
kernels that are callable by the device. This intends to mimic the
method of lowering for AMDGPU where we emit `amdgcn.device.init` and
`amdgcn.device.fini` respectively. These kernels simply iterate a symbol
called `__init_array_start/stop` and `__fini_array_start/stop`.
Normally, the linker provides these symbols automatically. In the AMDGPU
case we only need to call the kernel and we call the ctors / dtors.
However, for NVPTX we require the user initializes these variables to
the associated globals that we already emit as a part of this pass.

The motivation behind this change is to move away from OpenMP's handling
of ctors / dtors. I would much prefer that the backend / runtime handles
this. That allows us to handle ctors / dtors in a language agnostic way.

This approach requires that the runtime initializes the associated
globals. They are marked `weak` so we can emit this per-TU. The kernel
itself is `weak_odr` as it is copied exactly.

One downside is that any module containing these kernels elicits the
"stack size cannot be statically determined" warning every time from
`nvlink`, which is annoying but inconsequential for functionality. It
would be nice if there were a way to silence this warning, however.
2023-11-10 09:33:29 -06:00
Nikita Popov
82f68a992b [IR] Mark lshr and ashr constant expressions as undesirable
These will no longer be created by default during constant folding.
2023-11-10 16:29:13 +01:00
David Green
10ce319320
[AArch64][GlobalISel] Expand handling for sitofp and uitofp (#71282)
Similar to #70635, this expands the handling of integer to fp
conversions. The code is very similar to the float->integer conversions
with types handled oppositely. There are some extra unhandled cases
which require more handling for ASR operations.
2023-11-10 13:41:13 +00:00
Yingwei Zheng
650026897c
[RISCV][SDAG] Prefer ShortForwardBranch to lower sdiv by pow2 (#67364)
This patch lowers `sdiv x, +/-2**k` to `add + select + shift` when the
short forward branch optimization is enabled. The latter inst seq
performs faster than the seq generated by target-independent
DAGCombiner. This algorithm is described in ***Hacker's Delight***.

This patch also removes duplicate logic in the X86 and AArch64 backends.
But we cannot do this for the PowerPC backend since it generates a
special instruction `addze`.
2023-11-10 21:38:47 +08:00
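The `add + select + shift` sequence can be sketched in C as below. This is an illustration of the well-known rounding-toward-zero trick, not the code the backend generates; `sdiv_pow2` and its parameters are hypothetical. It assumes that `>>` on a negative signed integer is an arithmetic shift, which is implementation-defined in C but holds on mainstream compilers.

```c
#include <assert.h>
#include <stdint.h>

/* Divide x by +2**k or -2**k, rounding toward zero, without a divide.
 * k is restricted to 1..30 here so that every intermediate fits in int32_t. */
static int32_t sdiv_pow2(int32_t x, unsigned k, int negative_divisor) {
    assert(k >= 1 && k <= 30);
    int32_t bias = (int32_t)((1u << k) - 1u);
    /* add + select: only negative dividends get the bias, which turns the
     * shift's round-toward-negative-infinity into round-toward-zero. */
    int32_t adjusted = x < 0 ? x + bias : x;
    /* shift: assumed to be an arithmetic right shift. */
    int32_t q = adjusted >> k;
    /* A negative divisor only flips the sign of the quotient. */
    return negative_divisor ? -q : q;
}
```

For example, `sdiv_pow2(-7, 1, 0)` adds a bias of 1 to get -6 and shifts to -3, matching C's truncating `-7 / 2`.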
Valery Pykhtin
87b8d94371
[AMDGPU] Fix GCNUpwardRPTracker. (#71186)
Fixed:

1. Maximum register pressure calculation at the instruction level. 
Previously max RP included both def and use of registers of an
instruction. Now maximum RP includes _uses_ and _early-clobber defs_.

2. Uses were incorrectly tracked and this resulted in a mismatch of
live-in set reported by LiveIntervals and tracked live reg set when the
beginning of the block is reached.

The interface has changed: moveMaxPressure is deprecated, and the
getMaxPressure and resetMaxPressure functions are added. The reset
function now seems more consistent.
2023-11-10 13:44:10 +01:00
Serge Pavlov
5b0f703918 Revert "[ARM][FPEnv] Lowering of fpenv intrinsics"
This reverts commit d62f040418bd167d1ddd2b79c640a90c0c2ea353.
Some CUDA buildbots started failing.
2023-11-10 16:24:51 +07:00
Serge Pavlov
d62f040418 [ARM][FPEnv] Lowering of fpenv intrinsics
The change implements lowering of `get_fpenv`, `set_fpenv` and
`reset_fpenv`.

Differential Revision: https://reviews.llvm.org/D81843
2023-11-10 16:06:33 +07:00
Diana Picus
20e9e4f797 [AMDGPU] si-wqm: Skip only LiveMask COPY
si-wqm sometimes needs to save the LiveMask in the entry block. Later
on, while looking for a place to enter WQM/WWM, it unconditionally
skips over the first COPY instruction in the entry block. This is
incorrect for functions where the LiveMask doesn't need to be saved, and
therefore the first COPY is more likely a COPY from a function argument
and might need to be in some non-exact mode.

This patch fixes the issue by also checking that the source of the COPY
is the EXEC register.

This produces different code in 3 of the existing tests:

In wwm-reserved.ll, a SGPR copy is now inside the WWM area rather than
outside. This is benign.

In wave32.ll, we end up with an extra register copy. This is because
the first COPY in the block is now part of the WWM block, so
si-pre-allocate-wwm-regs will allocate a new register for its
destination (when it was outside of the WWM region, the register
allocator could just re-use the same register). We might be able to
improve this in si-pre-allocate-wwm-regs but I haven't looked into it.

The same thing happens in dual-source-blend-export.ll, but for that
one it's harder to see because of the scheduling changes. I've uploaded
the before/after si-wqm output for it here:
https://reviews.llvm.org/differential/diff/553445/

Differential Revision: https://reviews.llvm.org/D158841
2023-11-10 09:30:44 +01:00
Matt Arsenault
67c3cb4f6b AMDGPU: Use an explicit triple in test to avoid bot failures 2023-11-10 17:09:55 +09:00