57510 Commits

Author SHA1 Message Date
Yeaseen
96c723374a
[llvm] Remove br i1 undef from some llvm/test/CodeGen tests (#128272) 2025-02-23 09:23:33 +00:00
Yingwei Zheng
dbd219aef4
[DAGCombiner][X86] Correctly clean up high bits in combinei64TruncSrlAdd (#128353)
A counterexample for the original implementation:
https://alive2.llvm.org/ce/z/7ieYLg
This patch uses zext instead of anyext to fix the original issue.
BTW, we should keep low `64 - shamt` bits instead of `shamt - 32`:
https://alive2.llvm.org/ce/z/ruQP_Z
Some code is simplified to avoid confusion.
Proof: https://alive2.llvm.org/ce/z/z_jdHD

Closes https://github.com/llvm/llvm-project/issues/128309.
2025-02-23 12:57:45 +08:00
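
The remark in #128353 about keeping the low `64 - shamt` bits can be checked with plain integer arithmetic. The sketch below is not the DAG combine itself: it models i64/i32 values with Python integers and assumes, for illustration only, that the shift amount exceeds 32 and that the added constant has at least `shamt` trailing zero bits. The names `wide` and `narrow` are hypothetical.

```python
MASK64 = (1 << 64) - 1
MASK32 = (1 << 32) - 1

def wide(x, c1, shamt):
    """Reference form: trunc_i32(srl_i64(add_i64(x, c1), shamt))."""
    return (((x + c1) & MASK64) >> shamt) & MASK32

def narrow(x, c1, shamt, keep_bits):
    """Narrow form: add in i32, then keep only the low keep_bits bits."""
    s = (((x >> shamt) & MASK32) + ((c1 >> shamt) & MASK32)) & MASK32
    return s & ((1 << keep_bits) - 1)

shamt = 40
x = 0xFFFFFF0000000000   # upper 24 bits all set
c1 = 1 << shamt          # low shamt bits are zero (assumed precondition)

print(wide(x, c1, shamt))                # 0: the i64 add wraps around
print(narrow(x, c1, shamt, 64 - shamt))  # 0: carry out of bit 23 is masked off
print(narrow(x, c1, shamt, 32))          # 16777216: stale carry bit survives
```

Only `64 - shamt` bits of the shifted value are meaningful, so any carry the i32 add produces above that position has to be cleared for the two forms to agree.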
Matt Arsenault
ccad5e7744
AMDGPU: Respect amdgpu-no-agpr in functions and with calls (#128147)
Remove the MIR scan to detect whether AGPRs are used or not,
and the special case for callable functions. This behavior was
confusing, and not overridable. The amdgpu-no-agpr attribute was
intended to avoid this imprecise heuristic for how many AGPRs to
allocate. It was also too confusing to make this interact with
the pending amdgpu-num-agpr replacement for amdgpu-no-agpr.

Also adds an xfail-ish test where the register allocator asserts
after allocation fails, which I ran into.

Future work should reintroduce a more refined MIR scan to estimate
AGPR pressure for how to split AGPRs and VGPRs.
2025-02-23 09:00:37 +07:00
Vitaly Buka
50b0669e84
Revert "[X86] combineBROADCAST_LOAD - merge across chains" (#128380)
Reverts llvm/llvm-project#128209

Introduces "AddressSanitizer: use-after-poison".
2025-02-22 16:15:41 -08:00
Craig Topper
bac6e7b651
[RISCV][VLOpt] Put vmclr/vmset back in the RISCVVPseudo table. (#128293)
This allows them to be supported by the VLOptimizer.
2025-02-22 15:30:35 -08:00
Craig Topper
9e8d11d2df
[X86] Check that the type is integer before calling isUnsignedIntSetCC in combineExtSetcc. (#128263)
SETULT can be an unsigned less-than integer compare or an unordered
less-than FP compare. We need to check the VT to distinguish them.

Fixes one of the issues from #128237.
2025-02-22 10:10:51 -08:00
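
The two readings of SETULT described in #128263 can be contrasted in a few lines. This is a small model in Python, not LLVM code; the helper names and the 32-bit width are assumptions chosen for illustration.

```python
import math

def setult_int(a, b, bits=32):
    """Integer reading: unsigned less-than on bits-wide values."""
    mask = (1 << bits) - 1
    return (a & mask) < (b & mask)

def setult_fp(a, b):
    """FP reading: true if either operand is NaN (unordered) or a < b."""
    return math.isnan(a) or math.isnan(b) or a < b

print(setult_int(-1, 1))             # False: -1 is 0xFFFFFFFF when unsigned
print(setult_fp(-1.0, 1.0))          # True
print(setult_fp(float("nan"), 1.0))  # True: unordered
```

The same condition code gives different answers depending on the operand type, which is why the value type has to be checked before treating the compare as an unsigned integer compare.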
Simon Pilgrim
e21a1737f3
[X86] combineBROADCAST_LOAD - merge across chains (#128209)
Remove the restriction when reusing wider BROADCAST_LOAD nodes that both nodes couldn't have uses of their load chains - use makeEquivalentMemoryOrdering to merge the chains instead.
2025-02-22 15:59:25 +00:00
Phoebe Wang
fa64a210b8
[X86][FP16] Adding lowerings for FP16 ISD::LRINT and ISD::LLRINT (#127382)
Address comment in #126477
2025-02-22 21:17:26 +08:00
Luke Lau
e23ab73335
[VPlan] Don't convert widen recipes to VP intrinsics in EVL transform (#127180)
This is a copy of #126177, since it was automatically and permanently
closed because I messed up the source branch on my remote.

This patch proposes to avoid converting widening recipes to VP
intrinsics during the EVL transform.

IIUC we initially did this to avoid `vl` toggles on RISC-V. However we
now have the RISCVVLOptimizer pass which mostly makes this redundant.

Emitting regular IR instead of VP intrinsics allows more generic
optimisations, both in the middle end and DAGCombiner, and we generally
have better patterns in the RISC-V backend for non-VP nodes. Sticking to
regular IR instructions is likely a lot less work than reimplementing
all of these optimisations for VP intrinsics, and on SPEC CPU 2017 we get
noticeably better code generation.
2025-02-22 19:38:11 +08:00
Yingwei Zheng
646e4f2eed
[DAGCombiner] visitFREEZE: Early exit when N is deleted (#128161)
`N` may get merged with existing nodes inside the loop. Early exit when
it is deleted to avoid the crash.
Alternative solution: use `DAGNodeDeletedListener` to refresh the value
of N.

Closes https://github.com/llvm/llvm-project/issues/128143.
2025-02-22 12:06:34 +08:00
Yingwei Zheng
3ec83f5774
[X86][DAGCombiner] Fix assertion failure in combinei64TruncSrlAdd (#128194)
Closes https://github.com/llvm/llvm-project/issues/128158.
2025-02-22 12:05:59 +08:00
Matt Arsenault
1bb43068f1
PeepholeOpt: Allow introducing subregister uses on reg_sequence (#127052)
This reverts d246cc618adc52fdbd69d44a2a375c8af97b6106. We now handle
composing subregister extracts through reg_sequence.
2025-02-22 09:16:14 +07:00
Alex MacLean
79261d4aab
[NVPTX][InferAS] assume alloca instructions are in local AS (#121710) 2025-02-21 14:32:54 -08:00
Brox Chen
61c6e0061c
[AMDGPU][True16][CodeGen] flat/global/scratch load/store pseudo for true16 (#127945)
T16D16 table is implemented in
https://github.com/llvm/llvm-project/pull/127673

This is a follow-up patch that adds load/store pseudos for:
flat_store
global_load/global_store
scratch_load/scratch_store

in true16 mode and updates the codegen test file.
2025-02-21 17:06:48 -05:00
Brox Chen
c896f7bdaa
[AMDGPU][True16][CodeGen] build_vector pattern in true16 (#118904)
build_vector pattern in true16 SDAG
2025-02-21 14:02:12 -05:00
Matt Arsenault
0c50054820 Revert "RegAlloc: Fix verifier error after failed allocation (#119690)"
This reverts commit 34167f99668ce4d4d6a1fb88453a8d5b56d16ed5.

A different set of verifier errors appears in other regalloc failure
tests with EXPENSIVE_CHECKS.
2025-02-22 00:23:21 +07:00
Simon Pilgrim
bd034ab111
[X86] combineX86ShuffleChain - always combine to a new VPERMV node if the root shuffle was a VPERMV node (#128183)
Similar to what we already do for VPERMV3 nodes - if we're trying to create a new unary variable shuffle and we started with a VPERMV node then always create a new one if it reduces the shuffle chain depth
2025-02-21 16:10:46 +00:00
zhijian lin
481e1eba3a
[NFC] add a pre-commit test case for patch #127121 that hoists xxsplitib out of loop (#127701)
This is a pre-commit test case for patch
https://github.com/llvm/llvm-project/pull/127121 that hoists xxsplitib
out of loop
2025-02-21 10:29:52 -05:00
Matt Arsenault
34167f9966
RegAlloc: Fix verifier error after failed allocation (#119690)
In some cases after reporting an allocation failure, this would fail
the verifier. It picks the first allocatable register and assigns it,
but didn't update the liveness appropriately. When VirtRegRewriter
relied on the liveness to set kill flags, it would incorrectly add
kill flags if there was another overlapping kill of the virtual
register.

We can't properly assign the register to an overlapping range, so
break the liveness of the failing register (and any other interfering
registers) instead. Give the virtual register dummy liveness by
effectively deleting all the uses by setting them to undef.

The edge case not tested here which I'm worried about is if the read
of the register is a def of a subregister. I've been unable to come up
with a test where this occurs.

https://reviews.llvm.org/D122616
2025-02-21 22:11:51 +07:00
Simon Pilgrim
884b79a478
[X86] Relax vbroadcast(vector load X) -> vbroadcast_load(X) to all types (#128039)
There's no need for an AVX1-only 32/64-bit scalar size limit - if the X86ISD::VBROADCAST node type is supported, X86ISD::VBROADCAST_LOAD will be as well.
2025-02-21 12:49:34 +00:00
Akshat Oke
bd16a87d05
[AMDGPU][NewPM] Port SIPostRABundler to NPM (#123717) 2025-02-21 16:05:58 +05:30
João Gouveia
0a913b5e3a
[X86] Fold some (truncate (srl (add X, C1), C2)) patterns to (add (truncate (srl X, C2)), C1') (#126448)
Addresses the poor codegen identified in #123239 and a few extra cases.
This transformation is correct for `eq`
(https://alive2.llvm.org/ce/z/qZhwtT), `ne`
(https://alive2.llvm.org/ce/z/6gsmNz), `ult`
(https://alive2.llvm.org/ce/z/xip_td) and `ugt`
(https://alive2.llvm.org/ce/z/39XQkX).

Fixes #123239
2025-02-21 17:17:09 +08:00
David Green
db9876760f [AArch64][GlobalISel] Add some gisel test coverage for existing select tests. NFC 2025-02-21 09:15:41 +00:00
Sudharsan Veeravalli
6757cf4e6f
[RISCV] [MachineOutliner] Analyze all candidates (#127659)
#117700 made a change from analyzing all the candidates to analyzing
just the first candidate before deciding to either delete or keep all of
them.

Even though the candidates all have the same instructions, the basic
blocks in which they are present are different and we will need to check
each of them before deciding whether to keep or erase them.
Particularly, `isAvailableAcrossAndOutOfSeq` checks to see if the
register (x5 in this case) is available from the end of the MBB to the
beginning of the candidate. Not checking this for each candidate led
to incorrect candidates being outlined, resulting in correctness issues
in a few downstream benchmarks.

Similarly, deleting all the candidates if the first one is not viable
will result in missed outlining opportunities.
2025-02-21 12:53:13 +05:30
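
The difference between the two policies described in #127659 can be sketched as follows. This is a toy model rather than the MachineOutliner code: `Candidate`, `x5_free`, and both function names are hypothetical, with `x5_free` standing in for the per-block register availability check.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    name: str
    x5_free: bool  # hypothetical stand-in for the per-block register check

def keep_all_or_nothing(cands):
    """Decide for every candidate based on the first one only."""
    return list(cands) if cands and cands[0].x5_free else []

def keep_per_candidate(cands):
    """Check each candidate in its own basic block."""
    return [c for c in cands if c.x5_free]

cands = [Candidate("bb1", True), Candidate("bb2", False), Candidate("bb3", True)]
print([c.name for c in keep_all_or_nothing(cands)])  # ['bb1', 'bb2', 'bb3']
print([c.name for c in keep_per_candidate(cands)])   # ['bb1', 'bb3']
```

With the first policy `bb2` is kept even though its register check fails; if the first candidate had failed instead, the viable ones would all have been dropped.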
Phoebe Wang
3302bef5b4
[X86] Combine FRINT + FP_TO_SINT to LRINT (#126477)
Based on Craig's suggestion on #126217

Alive2: https://alive2.llvm.org/ce/z/9XNpWt
2025-02-21 14:44:08 +08:00
Matt Arsenault
cc46d00a86
AMDGPU: Form v2f16 minimum3/maximum3 on gfx950 (#128123) 2025-02-21 12:11:51 +07:00
Matt Arsenault
e729dc759d
AMDGPU: Widen f16 minimum/maximum to v2f16 on gfx950 (#128121)
Unfortunately we only have the vector versions of v2f16 minimum3
and maximum3. Widen to v2f16 so we can lower as minimum3(x, y, y).
2025-02-21 12:08:49 +07:00
Pravin Jagtap
7c2ebe5dbb
AMDGPU: Restrict src0 to VGPRs only for certain cvt scale opcodes. (#127464)
Src0 operands wider than 32 bits in cvt_scale opcodes
operating on FP6/BF6/FP4 need to be restricted to take only VGPRs.
2025-02-21 07:27:25 +05:30
Alex MacLean
f83ef281b5
[NVPTX] Remove redundant addressing mode instrs (#128044)
Remove load and store instructions which do not include an immediate,
and just use the immediate variants in all cases. These variants will be
emitted exactly the same when the immediate offset is 0. Removing the
non-immediate versions allows for the removal of a lot of code and would
make any MachineIR passes simpler.
2025-02-20 14:51:06 -08:00
Philip Reames
43f2968a02
[RISCV] Recognize VLA shift pairs from shuffle masks (#127710)
If we have a shuffle mask which can be represented as two slides + some
conditional masking, we can emit a VLA sequence which is at most
O(2*LMUL). This is essentially a generalization of the existing
isElementRotate, but is staged to only introduce the new match for the
moment. A follow up change will start consolidating code - see the notes
below.

A couple of notes:
1) I'm excluding bit rotates mostly to keep the diffs manageable. 
2) The existing isElementRotate logic is nearly redundant after this
   change.  However, we have some intersection between the bit rotate
   and element rotate matching.  To keep things simple, I left that in
   place for now, and will merge/cleanup in a separate change.
3) The individual asVSlideup and asVSlidedown are closely related, but
the former looks through extracts and the latter changes VL. I'm leaving
these in place for now, but hope to common them up a bit as well.
2025-02-20 07:50:49 -08:00
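
One way to picture the "two slides plus masking" condition from #127710 is that every lane taken from a given source must sit at the same distance from its original position. The sketch below is an assumption about the general shape of such a check, not the matcher added by the patch; `match_slide_pair` and its sign convention are hypothetical.

```python
def match_slide_pair(mask, n):
    """Return {source: offset} if mask is two slides plus a select, else None.

    mask[i] indexes the concatenation of two n-element sources; -1 is undef.
    A positive offset slides that source down, a negative one slides it up.
    """
    offsets = {}
    for i, m in enumerate(mask):
        if m < 0:
            continue  # undef lanes match anything
        src = m // n
        off = (m % n) - i
        if offsets.setdefault(src, off) != off:
            return None  # one source would need two different slides
    return offsets

print(match_slide_pair([1, 2, 3, 4], 4))   # {0: 1, 1: -3}
print(match_slide_pair([0, 2, 1, 3], 4))   # None
```

The first mask slides source 0 down by one and source 1 up by three, which is also an element rotate; the second cannot be expressed this way because source 0 would need two different offsets.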
Viktoria Maximova
9ffab5637c
[SPIR-V] Initial implementation of SPV_INTEL_long_composites (#126545)
This change introduces support of `OpTypeStructContinuedINTEL`
instruction.

Specification:

https://github.khronos.org/SPIRV-Registry/extensions/INTEL/SPV_INTEL_long_composites.html
2025-02-20 16:09:06 +01:00
Simon Pilgrim
a03f064b60 [X86] combineX86ShufflesRecursively - peek through one use bitcasts to find additional (free) extract_subvector nodes 2025-02-20 13:49:49 +00:00
yingopq
0c809ea336
[Mips] Reserve hardware register HWR2 (#127775)
Fix for PR https://github.com/llvm/llvm-project/pull/127553.
On x86_64, readcyclecounter.ll failed to run with expensive checks enabled,
reporting the error "Using an undefined physical register".
2025-02-20 20:53:30 +08:00
Piotr Fusik
0a8341fdb2
[RISCV] Avoid VMNOT by swapping VMERGE operands for mask extensions (#126751)
Fold:

    (select (not m),  1, 0) -> (select m, 0,  1)
    (select (not m), -1, 0) -> (select m, 0, -1)
2025-02-20 13:53:21 +01:00
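
The fold in #126751 can be checked lane by lane. The following is a minimal model using Python lists as mask vectors; `vselect` and `vnot` are hypothetical helpers, not RISC-V intrinsics.

```python
def vselect(mask, t, f):
    """Per-lane select: t where mask is set, f elsewhere."""
    return [t if m else f for m in mask]

def vnot(mask):
    """Per-lane logical negation of a mask."""
    return [not m for m in mask]

m = [True, False, False, True]
for c in (1, -1):  # 1 models zero extension of the mask, -1 sign extension
    assert vselect(vnot(m), c, 0) == vselect(m, 0, c)

print(vselect(m, 0, -1))  # [0, -1, -1, 0]
```

Swapping the two select operands gives the same lanes as negating the mask first, so the separate mask negation is not needed.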
David Green
70ed381b16
[GlobalISel][AArch64] Fix fptoi.sat lowering. (#127901)
The SDAG version uses fminnum/fmaxnum; in converting it to fcmp+select,
it appears the order of the operands was chosen badly. This switches the
conditions used to keep the constant on the RHS.
2025-02-20 12:22:11 +00:00
Akshat Oke
9855d761f3
[AMDGPU][NewPM] Port SIOptimizeExecMaskingPreRA to NPM (#125351) 2025-02-20 17:35:56 +05:30
Simon Pilgrim
505d35aad3
[X86] getFauxShuffleMask - relax one use limit for insert_subvector concat splat pattern (#127981)
If we're splatting a subvector using an insert_subvector(insert_subvector(undef,sub,0),sub,c) pattern then permit multiuse of the sub as long as the insert_subvector nodes are the only users.
2025-02-20 12:04:41 +00:00
Simon Pilgrim
92a3192a96 [X86] vector-shuffle-v192.ll - regenerate VPTERNLOG comments 2025-02-20 11:58:45 +00:00
Simon Pilgrim
66cf2a88a4 [X86] sext-vsetcc.ll - regenerate VPTERNLOG comments 2025-02-20 11:58:45 +00:00
Piotr Fusik
9787240912 [RISCV][test] Add tests for extending negated mask 2025-02-20 11:37:03 +01:00
Dmitry Sidorov
55fa2fa348
[SPIR-V] Add SPV_INTEL_bindless_images extension (#127737)
Adds instructions to convert unsigned integer handles to images,
samplers and sampled images.

Spec:

https://github.com/intel/llvm/blob/sycl/sycl/doc/design/spirv-extensions/SPV_INTEL_bindless_images.asciidoc

---------

Signed-off-by: Sidorov, Dmitry <dmitry.sidorov@intel.com>
2025-02-20 10:27:15 +01:00
Simon Pilgrim
62d77fcb3c
[X86] combineX86ShuffleChain - don't combine to VPERM2W/VPERM2B from just any single variable mask (#127914)
Despite them being more expensive than other variable mask shuffles, we
were combining shuffle chains to VPERM2W/VPERM2B if any shuffle in the
chain was a variable shuffle - including very cheap shuffles like PSHUFB
or AND mask patterns.

This patch adjusts the BWI VPERMV3 threshold - it still always permits
the merge if the chain (of 2 or more shuffles) contains any
X86ISD::VPERMV/VPERMV3 shuffles (including DQ variants), but otherwise
only reduces the depth threshold based off the number of other variable
shuffles we'd fold away.
2025-02-20 09:11:29 +00:00
Diana Picus
611a648327
[AMDGPU] Add llvm.amdgcn.dead intrinsic (#123190)
Shaders that use the llvm.amdgcn.init.whole.wave intrinsic need to
explicitly preserve the inactive lanes of VGPRs of interest by adding
them as dummy arguments. The code usually looks something like this:

```
define amdgcn_cs_chain void f(active vgpr args..., i32 %inactive.vgpr1, ..., i32 %inactive.vgprN) {
entry:
  %c = call i1 @llvm.amdgcn.init.whole.wave()
  br i1 %c, label %shader, label %tail

shader:
  [...]

tail:
  %inactive.vgpr.arg1 = phi i32 [ %inactive.vgpr1, %entry], [poison, %shader]
  [...]
  ; %inactive.vgpr* then get passed into a llvm.amdgcn.cs.chain call
```

Unfortunately, this kind of phi node will get optimized away and the
backend won't be able to figure out that it's ok to use the active lanes
of `%inactive.vgpr*` inside `shader`.

This patch fixes the issue by introducing a llvm.amdgcn.dead intrinsic,
whose result can be used as a PHI operand instead of the poison. This
will be selected to an IMPLICIT_DEF, which the backend can work with.

At the moment, the llvm.amdgcn.dead intrinsic works only on i32 values.
Support for other types can be added later if needed.
2025-02-20 09:25:48 +01:00
Luke Lau
df96b56b9f
[RISCV] Move VMV0 elimination past machine SSA opts (#126850)
This is the follow up to #125026 that keeps mask operands in virtual
register form for as long as possible throughout the backend.

The diffs in this patch are from MachineCSE/MachineSink/RISCVVLOptimizer
kicking in.

The invariant that the mask COPY never has a subreg no longer holds
after MachineCSE (it coalesces some copies), so it needed to be relaxed.
2025-02-20 12:41:05 +08:00
Luke Lau
c58011dc65
[RISCV][VLOPT] Peek through copies in checkUsers (#127656)
Currently if a user of an instruction isn't a vector pseudo we bail. For
simple non-subreg virtual COPYs, we can peek through their uses by using
a worklist.

This is extracted from a loop in TSVC2 (s273) that contains a fcmp +
select, which produces a copy that doesn't seem to be coalesced away.
2025-02-20 12:01:06 +08:00
Matt Arsenault
37c341df28 Revert "AMDGPU: Don't canonicalize fminnum/fmaxnum if targets support IEEE fminimum(maximum)_num (#127711)"
This reverts commit 36eaf0daf5d6dd665d7c7a9ec38ea22f27709fed.

This is not a sound approach to dealing with this instruction change.
The new behavior is a different opcode pair, not a modifier on the
existing opcode.
2025-02-20 10:19:14 +07:00
Benjamin Maxwell
f178e51747
[SDAG] Add missing ppc_fp128 ExpandFloatRes legalization for modf (#127895)
Should fix: https://lab.llvm.org/buildbot/#/builders/72/builds/8380

(`test_modf_ppcf128` is the test case that needed the additional
legalization)
2025-02-20 09:50:16 +07:00
Craig Topper
b0e24d17f2
[RISCV] Use opaque pointers in some tests. NFC (#127906) 2025-02-19 15:16:09 -08:00
David Tellenbach
0fe0968c93
[AArch64][FEAT_CMPBR] Codegen for Armv9.6-a compare-and-branch (#116465)
This patch adds codegen for all Armv9.6-a compare-and-branch
instructions that operate on full w or x registers. The instruction
variants operating on half-words (cbh) and bytes (cbb) are added in a
subsequent patch.

Since CB doesn't use standard 4-bit Arm condition codes but a reduced
set of conditions, encoded in 3 bits, some conditions are expressed by
modifying operands, namely incrementing or decrementing immediate
operands and swapping register operands. To invert a CB instruction it's
therefore not enough to just modify the condition code, which doesn't
play particularly well with how the backend is currently organized. We
therefore introduce a number of pseudos which operate on the standard
4-bit condition codes and lower them late during codegen.
2025-02-19 13:58:20 -08:00
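
The operand adjustments mentioned in #116465 rest on simple identities. The sketch below assumes, purely for illustration, that only a "greater than" condition is available; the commit message does not list which conditions the 3-bit encoding provides, and all function names are hypothetical.

```python
def cb_gt(a, b):
    """Stand-in for a 'greater than' compare-and-branch condition."""
    return a > b

def ge_imm(x, imm):
    """x >= imm, expressed as x > imm - 1 (valid while imm - 1 is encodable)."""
    return cb_gt(x, imm - 1)

def lt_reg(x, y):
    """x < y, expressed by swapping the register operands."""
    return cb_gt(y, x)

for x in range(-4, 5):
    for y in range(-4, 5):
        assert ge_imm(x, y) == (x >= y)
        assert lt_reg(x, y) == (x < y)
```

Because the operands change along with the condition, inverting such a branch means undoing the operand adjustment as well, which is the difficulty the pseudos on standard condition codes avoid.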
Craig Topper
26e375046d Recommit "[RISCV] Add a pass to remove ADDI by reassociating to fold into load/store address. (#127151)"
Tests have been re-generated with recent scheduler changes.

Original message:

SelectionDAG will not reassociate adds to the end of a chain if
there are multiple users of later additions. This prevents isel
from folding the immediate into a load/store address.

One easy way to see this is accessing an array in a struct with
two different indices. An ADDI will be used to get to the start
of the array then 2 different SHXADD instructions will be used to
add the scaled indices. Finally the SHXADD will be used by different
load instructions. We can remove the ADDI by folding the offset into
each load.

This patch adds a new pass that analyzes how an ADDI constant
propagates through address arithmetic. If the arithmetic is only
used by a load/store and the offset is small enough, we can adjust
the load/store offset and remove the ADDI.

This pass is placed before MachineCSE to allow cleanups if some
instructions become common after removing offsets from their inputs.

This pass gives ~3% improvement on dynamic instruction count on
541.leela_r and 544.nab_r from SPEC2017 for the train data set. There's
a ~1% improvement on 557.xz_r.
2025-02-19 12:11:00 -08:00
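
The struct-array example from #127151 comes down to reassociating the constant to the end of the address computation. Below is a small arithmetic model, not the pass itself: the function names are hypothetical, the scale of 8 bytes per element is an assumption, and the 12-bit signed range is the RISC-V load/store immediate range.

```python
def fits_simm12(v):
    """RISC-V load/store immediates are 12-bit signed."""
    return -2048 <= v <= 2047

def before(base, off, i, j):
    """ADDI first, then two scaled-index adds; loads use immediate 0."""
    arr = base + off  # models the ADDI
    return (arr + (i << 3)) + 0, (arr + (j << 3)) + 0

def after(base, off, i, j):
    """No ADDI: the offset moves into each load's immediate field."""
    assert fits_simm12(off)
    return (base + (i << 3)) + off, (base + (j << 3)) + off

print(before(0x1000, 16, 2, 5) == after(0x1000, 16, 2, 5))  # True
```

Both forms compute the same addresses, but the second needs one instruction fewer as long as the offset fits in the immediate field.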