52796 Commits

Author SHA1 Message Date
Craig Topper
6b9752cc72 [RISCV] Add rv64zbkb.ll test for -riscv-experimental-rv64-legal-i32. NFC 2023-11-11 17:52:22 -08:00
Craig Topper
fdc904e568 [RISCV] Add isel pattern to turn (or (zext X), Y) into add.uw when X and Y are disjoint.
Improve code for -riscv-experimental-rv64-legal-i32.
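
A minimal C sketch of why this fold is sound, assuming the usual Zba
semantics for add.uw (zero-extend the low 32 bits of rs1, then add rs2).
The helper names add_uw, or_zext and is_disjoint are hypothetical and
exist only for this illustration; they are not LLVM functions.

```c
#include <stdint.h>

/* Models add.uw: zero-extend the low 32 bits of rs1, then add rs2. */
static uint64_t add_uw(uint64_t rs1, uint64_t rs2) {
    return (uint64_t)(uint32_t)rs1 + rs2;
}

/* The DAG form being matched: (or (zext X), Y). */
static uint64_t or_zext(uint32_t x, uint64_t y) {
    return (uint64_t)x | y;
}

/* "Disjoint" here means no bit is set in both operands. In that case
 * the addition never produces a carry, so OR and ADD give the same
 * result. */
static int is_disjoint(uint32_t x, uint64_t y) {
    return ((uint64_t)x & y) == 0;
}
```

When is_disjoint holds, or_zext(x, y) and add_uw(x, y) agree, which is
the property the pattern relies on.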
2023-11-11 15:51:38 -08:00
Craig Topper
bf0963620c [RISCV] Add (shl (zext GPR:), uimm5:) pattern for -riscv-experimental-rv64-legal-i32. 2023-11-11 15:14:02 -08:00
Craig Topper
994d882e15 [RISCV] Add an slli.uw pattern using zext for -riscv-experimental-rv64-legal-i32
We already had the pattern for GlobalISel. Move it over to SelectionDAG.
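
For reference, a small C model of what slli.uw computes, assuming the
Zba definition (zero-extend the low 32 bits of rs1, then shift left).
The function name slli_uw is only illustrative.

```c
#include <stdint.h>

/* Models slli.uw: the upper 32 bits of rs1 are ignored, the low 32
 * bits are zero-extended to 64 bits and shifted left by shamt.
 * shamt is assumed to be in the range 0..63. */
static uint64_t slli_uw(uint64_t rs1, unsigned shamt) {
    return (uint64_t)(uint32_t)rs1 << shamt;
}
```

This is the same value as (shl (zext X), shamt) for a 32-bit X, which is
why a zext-based pattern can select the instruction.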
2023-11-11 14:41:56 -08:00
Momchil Velikov
e8209b2486
[MachineSink] Drop debug info for instructions deleted by sink-and-fold (#71443)
After performing sink-and-fold over a COPY, the original instruction is
replaced with one that produces its output in the destination of the
copy. Its value is still available (in a hard register), so if there are
debug instructions which refer to the (now deleted) virtual register
they could be updated to refer to the hard register, in principle.
However, it's not clear how to do that. Moreover, in some cases the debug
instructions may need to be replicated proportionally to the number of
the COPY instructions replaced and in some extreme cases we can end up
with a quadratic increase in the number of debug instructions, e.g.:

        int f(int);
    
        void g(int x) {
          int y = x + 1;
    
          int t0 = y;
          f(t0);
    
          int t1 = y;
          f(t1);
        }
2023-11-11 19:43:14 +00:00
David Green
0bd67566f7 [AArch64] Remove AArch64/aarch64-neon-v1i1-setcc.ll test. NFC
These are replicated in llvm/test/CodeGen/AArch64/arm64-neon-v1i1-setcc.ll with
more tests and updated check lines. Remove the duplicate test.
2023-11-11 18:22:41 +00:00
Craig Topper
bab2cf2d01 [RISCV][GISel] Promote s32 constant shift amounts to s64 on RV64.
This allows us to reuse isel patterns from SelectionDAG.

This is similar to what is done on AArch64.
2023-11-10 23:07:00 -08:00
Craig Topper
647c490f8a [RISCV] Add an add.uw pattern using zext for -riscv-experimental-rv64-legal-i32 and global isel 2023-11-10 21:36:29 -08:00
Craig Topper
7e0bae5b34 [RISCV][GISel] Add isel patterns for SHXADD with s32 type on RV64. 2023-11-10 19:52:57 -08:00
Craig Topper
a93dfb589d [RISCV] Peek through zext in selectShiftMask.
This improves the code for -riscv-experimental-rv64-legal-i32
2023-11-10 19:02:14 -08:00
Craig Topper
83cc24e598 [RISCV] Add test case showing unnecessary zext of shift amounts with -riscv-experimental-rv64-legal-i32. NFC 2023-11-10 19:02:13 -08:00
Shoaib Meenai
c5dd1bbcc3 Revert "Revert "[IR] Mark lshr and ashr constant expressions as undesirable""
This reverts commit 8ee07a4be7f7d8654ecf25e7ce0a680975649544.

The revert is breaking AMDGPU backend tests (which I didn't have
enabled), and I don't want to risk breakages over the weekend, so just
revert for now.
2023-11-10 17:26:14 -08:00
Shoaib Meenai
8ee07a4be7 Revert "[IR] Mark lshr and ashr constant expressions as undesirable"
This reverts commit 82f68a992b9f89036042d57a5f6345cb2925b2c1.

cd7ba9f3d090afb5d3b15b0dcf379d15d1e11e33 needs to be reverted to fix
test failures on builds without assertions, and this one needs to be
reverted first for that.
2023-11-10 17:08:35 -08:00
Craig Topper
ca603343db
[RISCV][GISel] Legalizer and register bank selection for G_JUMP_TABLE and G_BRJT (#71970)
Testing together since they should come paired.

Instruction selection will be a separate PR.
2023-11-10 13:09:24 -08:00
Joseph Huber
a3bd87b100 [AMDGPU] Call the FINI_ARRAY destructors in the correct order (#71815)
Summary:
The AMDGPU backend uses the linker-provided INIT_ARRAY and FINI_ARRAY
sections to call all the global constructors in a single kernel.
Previously this mistakenly used the same iteration logic for both
arrays. The destructors stored in FINI_ARRAY are stored in the same
order as the ones in the INIT_ARRAY section so we need to traverse it
in reverse order.

Relanding after the revert in fe7b5e2cfcf6848287010291081f85fa1f6bb2ef
using the IR builder interface instead of ConstantExpr.
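
The traversal order described above can be sketched in plain C. This is
only an illustration of the iteration logic, not the IR the backend
emits. The callback table, the record_* functions and the trace buffer
are hypothetical stand-ins for the linker-provided sections.

```c
#include <stddef.h>

typedef void (*callback_t)(void);

/* Hypothetical callbacks that record the order in which they run. */
static char trace[16];
static size_t trace_len;
static void record_a(void) { trace[trace_len++] = 'a'; }
static void record_b(void) { trace[trace_len++] = 'b'; }
static void record_c(void) { trace[trace_len++] = 'c'; }

/* Constructors run from the first entry to the last. */
static void run_init_array(callback_t *start, callback_t *end) {
    for (callback_t *it = start; it != end; ++it)
        (*it)();
}

/* Destructors are stored in the same order as the constructors, so
 * they run from the last entry back to the first. */
static void run_fini_array(callback_t *start, callback_t *end) {
    for (callback_t *it = end; it != start;)
        (*--it)();
}
```

Running both loops over the same three-entry table calls the entries
forward and then in reverse, so the last object constructed is the first
one destroyed.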
2023-11-10 11:01:02 -06:00
Nikita Popov
fe7b5e2cfc Revert "[AMDGPU] Call the FINI_ARRAY destructors in the correct order (#71815)"
This reverts commit c1d5865a313d0a8a254b37c852bdd444453c0f73.

Introduces a new use of ConstantExpr::getAShr().
2023-11-10 17:01:06 +01:00
Joseph Huber
c1d5865a31
[AMDGPU] Call the FINI_ARRAY destructors in the correct order (#71815)
Summary:
The AMDGPU backend uses the linker-provided INIT_ARRAY and FINI_ARRAY
sections to call all the global constructors in a single kernel.
Previously this mistakenly used the same iteration logic for both
arrays. The destructors stored in FINI_ARRAY are stored in the same
order as the ones in the INIT_ARRAY section so we need to traverse it
in reverse order.
2023-11-10 09:34:04 -06:00
Joseph Huber
af8ebfdcd9
[NVPTX] Allow the ctor/dtor lowering pass to emit kernels (#71549)
Summary:
This pass emits the new "nvptx$device$init" and "nvptx$device$fini"
kernels that are callable by the device. This intends to mimic the
method of lowering for AMDGPU where we emit `amdgcn.device.init` and
`amdgcn.device.fini` respectively. These kernels simply iterate a symbol
called `__init_array_start/stop` and `__fini_array_start/stop`.
Normally, the linker provides these symbols automatically. In the AMDGPU
case we only need to call the kernel and we call the ctors / dtors.
However, for NVPTX we require the user initializes these variables to
the associated globals that we already emit as a part of this pass.

The motivation behind this change is to move away from OpenMP's handling
of ctors / dtors. I would much prefer that the backend / runtime handles
this. That allows us to handle ctors / dtors in a language agnostic way.

This approach requires that the runtime initializes the associated
globals. They are marked `weak` so we can emit this per-TU. The kernel
itself is `weak_odr` as it is copied exactly.

One downside is that any module containing these kernels elicits the
"stack size cannot be statically determined" warning every time from
`nvlink` which is annoying but inconsequential for functionality. It
would be nice if there were a way to silence this warning however.
2023-11-10 09:33:29 -06:00
Nikita Popov
82f68a992b [IR] Mark lshr and ashr constant expressions as undesirable
These will no longer be created by default during constant folding.
2023-11-10 16:29:13 +01:00
David Green
10ce319320
[AArch64][GlobalISel] Expand handling for sitofp and uitofp (#71282)
Similar to #70635, this expands the handling of integer to fp
conversions. The code is very similar to the float->integer conversions
with types handled oppositely. There are some extra unhandled cases
which require more handling for ASR operations.
2023-11-10 13:41:13 +00:00
Yingwei Zheng
650026897c
[RISCV][SDAG] Prefer ShortForwardBranch to lower sdiv by pow2 (#67364)
This patch lowers `sdiv x, +/-2**k` to `add + select + shift` when the
short forward branch optimization is enabled. The latter instruction
sequence performs faster than the sequence generated by the
target-independent DAGCombiner. This algorithm is described in
***Hacker's Delight***.

This patch also removes duplicate logic in the X86 and AArch64 backend.
But we cannot do this for the PowerPC backend since it generates a
special instruction `addze`.
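
A rough C sketch of the `add + select + shift` idea for 32-bit values,
under the assumption that `>>` on a negative signed integer is an
arithmetic shift (true on mainstream compilers, but
implementation-defined in ISO C). The function names are hypothetical
and this is not the exact sequence the backend emits.

```c
#include <stdint.h>

/* Signed division by 2**k, rounding toward zero like sdiv.
 * k is assumed to be in the range 1..30. */
static int32_t sdiv_pow2(int32_t x, unsigned k) {
    int32_t bias = (int32_t)((UINT32_C(1) << k) - 1);
    /* add + select: only negative inputs get the 2**k - 1 bias, which
     * turns the shift's round-toward-minus-infinity into
     * round-toward-zero. */
    int32_t biased = x < 0 ? x + bias : x;
    /* shift: assumed to be an arithmetic (sign-preserving) shift. */
    return biased >> k;
}

/* Division by -(2**k): negate the result of the positive case. */
static int32_t sdiv_neg_pow2(int32_t x, unsigned k) {
    return -sdiv_pow2(x, k);
}
```

For example, sdiv_pow2(-7, 1) computes (-7 + 1) >> 1, which matches
-7 / 2 under C's truncating division.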
2023-11-10 21:38:47 +08:00
Valery Pykhtin
87b8d94371
[AMDGPU] Fix GCNUpwardRPTracker. (#71186)
Fixed:

1. Maximum register pressure calculation at the instruction level. 
Previously max RP included both def and use of registers of an
instruction. Now maximum RP includes _uses_ and _early-clobber defs_.

2. Uses were incorrectly tracked and this resulted in a mismatch of
live-in set reported by LiveIntervals and tracked live reg set when the
beginning of the block is reached.

The interface has changed: moveMaxPressure is now deprecated, and the
getMaxPressure and resetMaxPressure functions are added. The reset
function now seems more consistent.
2023-11-10 13:44:10 +01:00
Serge Pavlov
5b0f703918 Revert "[ARM][FPEnv] Lowering of fpenv intrinsics"
This reverts commit d62f040418bd167d1ddd2b79c640a90c0c2ea353.
Some CUDA buildbots started failing.
2023-11-10 16:24:51 +07:00
Serge Pavlov
d62f040418 [ARM][FPEnv] Lowering of fpenv intrinsics
The change implements lowering of `get_fpenv`, `set_fpenv` and
`reset_fpenv`.

Differential Revision: https://reviews.llvm.org/D81843
2023-11-10 16:06:33 +07:00
Diana Picus
20e9e4f797 [AMDGPU] si-wqm: Skip only LiveMask COPY
si-wqm sometimes needs to save the LiveMask in the entry block. Later
on, while looking for a place to enter WQM/WWM, it unconditionally
skips over the first COPY instruction in the entry block. This is
incorrect for functions where the LiveMask doesn't need to be saved, and
therefore the first COPY is more likely a COPY from a function argument
and might need to be in some non-exact mode.

This patch fixes the issue by also checking that the source of the COPY
is the EXEC register.

This produces different code in 3 of the existing tests:

In wwm-reserved.ll, a SGPR copy is now inside the WWM area rather than
outside. This is benign.

In wave32.ll, we end up with an extra register copy. This is because
the first COPY in the block is now part of the WWM block, so
si-pre-allocate-wwm-regs will allocate a new register for its
destination (when it was outside of the WWM region, the register
allocator could just re-use the same register). We might be able to
improve this in si-pre-allocate-wwm-regs but I haven't looked into it.

The same thing happens in dual-source-blend-export.ll, but for that
one it's harder to see because of the scheduling changes. I've uploaded
the before/after si-wqm output for it here:
https://reviews.llvm.org/differential/diff/553445/

Differential Revision: https://reviews.llvm.org/D158841
2023-11-10 09:30:44 +01:00
Matt Arsenault
67c3cb4f6b AMDGPU: Use an explicit triple in test to avoid bot failures 2023-11-10 17:09:55 +09:00
Wang Pengcheng
9bb69c1d96
[RISCV] Enable LoopDataPrefetch pass (#66201)
So that we can benefit from data prefetch when `Zicbop` extension is
supported.

Tune information for data prefetching is added in `RISCVTuneInfo`.
2023-11-10 15:39:58 +08:00
Craig Topper
fdbff88196
[RISCV][GISel] Add support for G_FCMP with F and D extensions. (#70624)
We only have instructions for OEQ, OLT, and OLE. We need to convert
other comparison codes into those.

I think we'll likely want to split this up in the future to support
optimizations. Maybe do some of it in the legalizer or in a new post
legalizer lowering pass. So this patch is just enough to get something
working without adding 11 additional patterns to tablegen for each type.
2023-11-09 20:45:35 -08:00
Craig Topper
aae30f9e2c [RISCV] Use Align(8) for the stack temporary created for SPLAT_VECTOR_SPLIT_I64_VL.
The value needs to be read as an 8 byte vector element which requires
the pointer to be 8 byte aligned according to the vector spec.

Fixes #71787
2023-11-09 20:43:22 -08:00
Matt Arsenault
dd57bd0efe Reapply "RegisterCoalescer: Generate test checks"
This reverts commit 9b2439167d9f794e317fecbdbb0a6e96f9ea4b56.

This was an unrelated NFC change to make a test more useful (really it should
have been first, it was supposed to show the test diff).
2023-11-10 10:29:08 +09:00
Craig Topper
247eb13fab [RISCV][GISel] Legalize G_BITREVERSE. 2023-11-09 16:27:21 -08:00
Maurice Heumann
8cbfc0b29d
[X86] Respect blockaddress offsets when performing X86 LEA fixups (#71641)
The X86FixupLEAs pass drops blockaddress offsets when splitting up slow
3-ops LEAs, as can be seen in this example:

https://godbolt.org/z/bEsc3Poje

Before running the pass, the first instruction in bb.0 is a LEA with
ebp, ebx and a blockaddress.
After the transformation, the blockaddress is missing.

The reason this happens is that the 3-ops LEA is being split up into a
2-ops LEA + an add instruction.
However, as hasLEAOffset does not take blockaddresses into
consideration, the add is not emitted, which leads to the offset
being dropped.

Taking blockaddresses into consideration fixes this issue and results in
the add instruction being emitted.

This fixes #71667
2023-11-10 08:12:18 +08:00
stephenpeckham
1d1fede493
[XCOFF] Ensure .file is emitted before any .info pseudo-ops (#71577)
When generating the assembly code for AIX/XCOFF, the .file pseudo-op
needs to be emitted first, before any csects are generated. Otherwise,
information such as the embedded command line will be associated with
part of the object file rather than the entire object file.
2023-11-09 16:03:45 -06:00
Craig Topper
8b98d5b813 [RISCV][GISel] Enable libcall expansion for G_FCEIL and G_FFLOOR. 2023-11-09 13:14:42 -08:00
Craig Topper
679cc16c99 [RISCV] Disable early promotion for Zbs in performANDCombine with riscv-experimental-rv64-legal-i32
We can match this directly in isel with the i32 type being legal.

The generic DAG combine will unpromote part of the pattern and
prevent it from being matched in isel.
2023-11-09 09:51:31 -08:00
Craig Topper
24577bd089 [RISCV] Add BSET/BCLR/BINV/BEXT patterns for riscv-experimental-rv64-legal-i32. 2023-11-09 09:17:22 -08:00
Juergen Ributzka
6d1d7be133
Obsolete WebKit Calling Convention (#71567)
The WebKit Calling Convention was created specifically for the WebKit
FTL. FTL doesn't use LLVM anymore and therefore this calling convention
is obsolete.

This commit removes the WebKit CC, its associated tests, and
documentation.
2023-11-09 09:08:41 -08:00
chuongg3
451bc3ec1d
[AArch64][GlobalISel] Legalize G_VECREDUCE_{MIN/MAX} (#69461)
Legalizes G_VECREDUCE_{MIN/MAX} and selects instructions for
vecreduce_{min/max}
2023-11-09 16:29:14 +00:00
Philip Reames
7ac8486e54
[RISCVInsertVSETVLI] Allow PRE with non-immediate AVLs (#71728)
Extend our PRE logic to cover non-immediate AVL values. This covers
large constant AVLs (which must be materialized in registers), and may
help some code written explicitly with intrinsics.

Looking at the existing code, I can't entirely figure out why I thought
we needed VL == AVL to perform the PRE. My best guess is that I was
worried about the VLMAX < VL < 2 * VLMAX case, but the spec explicitly
says that vsetvli must be deterministic on any particular AVL value.

That case was, possibly by accident, covering another legality
precondition. Specifically, by only returning true for immediate and
VLMAX AVL values, we didn't encounter the case where the AVL was a
register and that register wasn't available in the predecessor (e.g. if
AVL is a load in the MBB block itself).

---------

Co-authored-by: Luke Lau <luke_lau@icloud.com>
2023-11-09 08:03:13 -08:00
Shengchen Kan
c9017bc793
[X86] Support EGPR (R16-R31) for APX (#70958)
1. Map R16-R31 to DWARF registers 130-145.
2. Make R16-R31 caller-saved registers.
3. Make R16-R31 allocatable only when feature EGPR is supported.
4. Make R16-R31 available for instructions in legacy maps 0/1 and EVEX
space, except XSAVE*/XRSTOR.

RFC:

https://discourse.llvm.org/t/rfc-design-for-apx-feature-egpr-and-ndd-support/73031/4

Explanations for some seemingly unrelated changes:

inline-asm-registers.mir, statepoint-invoke-ra-enter-at-end.mir:
The immediate (TargetInstrInfo.cpp:1612) used for the regdef/reguse is
the encoding for the register class in the enum generated by tablegen.
This encoding will change any time a new register class is added. Since
the number is part of the input, this means it can become stale.

seh-directive-errors.s:
   R16-R31 makes ".seh_pushreg 17" legal

musttail-varargs.ll:
It seems some LLVM passes use the number of registers rather than the
number of allocatable registers as a heuristic.

This PR is to reland #67702 after #70222 in order to reduce some
compile-time regression when EGPR is not used.
2023-11-09 23:39:40 +08:00
Igor Kirillov
59a063d5c6
[ExpandMemCmp] Improve memcmp optimisation for boolean results (#71221)
This patch enhances the optimization of memcmp calls when only two
outcomes are needed and the comparison fits into one block, for example:

	bool result = memcmp(a, b, 6) > 0;

Previously, LLVM would generate unnecessary operations even when the
user of memcmp was only interested in a binary outcome.
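
A hedged C sketch of the general idea for a single 4-byte block (the
6-byte example above would need more than one load, so a smaller size is
used here to keep the sketch short). The helper names load_be32 and
memcmp4_gt are hypothetical, and this is not the code the pass actually
generates.

```c
#include <stdbool.h>
#include <stdint.h>

/* Load 4 bytes as a big-endian integer, so that unsigned integer order
 * matches memcmp's byte-by-byte order. */
static uint32_t load_be32(const unsigned char *p) {
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Equivalent to memcmp(a, b, 4) > 0: since only a yes/no answer is
 * needed, one unsigned comparison replaces computing a three-way
 * result first. */
static bool memcmp4_gt(const void *a, const void *b) {
    return load_be32(a) > load_be32(b);
}
```

The point of the sketch is that a boolean user only needs one
comparison of the loaded values, not the full negative/zero/positive
result.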
2023-11-09 11:52:04 +00:00
Craig Topper
e3c120a585 [RISCV] Add a Zbb+Zbs command line to rv*zbs.ll to get coverage on an existing isel pattern. NFC
This pattern wasn't tested:

def : Pat<(XLenVT (and (rotl -2, (XLenVT GPR:$rs2)), GPR:$rs1)),
          (BCLR GPR:$rs1, GPR:$rs2)>;
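
The identity behind this pattern can be checked with a short C model for
XLEN = 64, assuming the Zbs definition of BCLR (clear the bit of rs1
selected by the low 6 bits of rs2). The helper names are hypothetical.

```c
#include <stdint.h>

/* 64-bit rotate left; the rotate amount is taken modulo 64. */
static uint64_t rotl64(uint64_t v, unsigned n) {
    n &= 63;
    return n == 0 ? v : (v << n) | (v >> (64 - n));
}

/* Left-hand side of the pattern: (and (rotl -2, rs2), rs1).
 * -2 is all ones except bit 0, so rotating it left by rs2 moves the
 * single zero bit to position rs2. */
static uint64_t and_rotl_form(uint64_t rs1, uint64_t rs2) {
    return rotl64(~UINT64_C(1), (unsigned)rs2) & rs1;
}

/* Model of BCLR on RV64: clear bit rs2[5:0] of rs1. */
static uint64_t bclr(uint64_t rs1, uint64_t rs2) {
    return rs1 & ~(UINT64_C(1) << (rs2 & 63));
}
```

Both forms clear exactly one bit of rs1, and both use only the low 6
bits of rs2 to pick it.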
2023-11-08 22:31:49 -08:00
Jianjian Guan
d36eb79ccc
[RISCV] Support Strict FP arithmetic Op when only have Zvfhmin (#68867)
Include: STRICT_FADD, STRICT_FSUB, STRICT_FMUL, STRICT_FDIV,
STRICT_FSQRT and STRICT_FMA.
2023-11-09 09:55:48 +08:00
Jun Wang
54470176af
[AMDGPU] Add inreg support for SGPR arguments (#67182)
Function parameters marked with inreg are supposed to be allocated to
SGPRs. However, for compute functions, this is ignored and function
parameters are allocated to VGPRs. This fix modifies CC_AMDGPU_Func in
AMDGPUCallingConv.td to use SGPRs if input arg is marked inreg.
---------

Co-authored-by: Jun Wang <jun.wang7@amd.com>
2023-11-08 11:35:52 -08:00
Simon Pilgrim
671d10ad39 [X86] Add fabs test coverage for Issue #70947 2023-11-08 16:20:34 +00:00
Simon Pilgrim
45f1db4855 [X86] vec_fabs.ll - add AVX2 test coverage 2023-11-08 16:20:34 +00:00
Dinar Temirbulatov
3f9d385e58
[AArch64][SME] Shuffle lowering, assume that the minimal SVE register is 128-bit, when NEON is not available. (#71647)
We can assume that the minimal SVE register is 128-bit, when NEON is not
available. And we can lower the shuffle operation with one
operand to a TBL1 SVE instruction.
2023-11-08 14:37:49 +00:00
alexfh
067632e141
Revert "[DAGCombiner] Transform (icmp eq/ne (and X,C0),(shift X,C1)) to use rotate or to getter constants." due to a miscompile (#71598)
- Revert "[DAGCombiner] Transform `(icmp eq/ne (and X,C0),(shift X,C1))`
to use rotate or to getter constants." - causes a miscompile, see
112e49b381 (commitcomment-131943923)
- Revert "[X86] Fix gcc warning about mix of enumeral and non-enumeral
types. NFC", which fixes a compiler warning in the commit above
2023-11-08 15:07:12 +01:00
Simon Pilgrim
33ecd93596 [X86] Add test coverage for ABDS/ABDU patterns with mismatching extension types 2023-11-08 10:33:18 +00:00
Jay Foad
d5f3b3b3b1
[RegScavenger] Simplify state tracking for backwards scavenging (#71202)
Track the live register state immediately before, instead of after,
MBBI. This makes it simple to track the state at the start or end of a
basic block without a separate (and poorly named) Tracking flag.

This changes the API of the backward(MachineBasicBlock::iterator I)
method, which now recedes to the state just before, instead of just
after, *I. Some clients are simplified by this change.

There is one small functional change shown in the lit tests where
multiple spilled registers all need to be reloaded before the same
instruction. The reloads will now be inserted in the opposite order.
This should not affect correctness.
2023-11-08 09:49:07 +00:00