52796 Commits

Author SHA1 Message Date
David Green
b94913b8ad [AArch64] Vector insert zero upper tests. NFC 2024-02-26 22:15:36 +00:00
Noah Goldstein
15a7de697a [SelectionDAG] Support sign tracking through {S|U}INT_TO_FP
Just a minimal amount of easily provable tracking.

Proofs: https://alive2.llvm.org/ce/z/RQYbdw

Closes #82808

Alive2 has an issue with `(sitofp i1)`, but it can
be verified by hand: https://godbolt.org/z/qKr7hT7s9
2024-02-26 15:35:38 -06:00
Jeffrey Byrnes
113052b2b0 [AMDGPU] Prefer lower total register usage in regions with spilling
Change-Id: Ia5c434b0945bdcbc357c5e06c3164118fc91df25
2024-02-26 12:19:52 -08:00
Craig Topper
f1bb88bee2
[RISCV] Use PromoteSetCCOperands to promote operands for UMAX/UMIN during type legalization. (#82716)
For RISC-V, we were always choosing to sign extend when promoting
i32->i64. If the promoted inputs happen to be zero extended already, we
should use zero extend instead. This is what we do for SETCC.
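
A small model of why either extension is acceptable for UMAX, provided both operands are promoted the same way. This is only a sketch: the helper names are hypothetical and i8/i16 stand in for i32/i64 so the check can be exhaustive.

```python
def zext(x, from_bits):
    """Zero-extend: keep the low from_bits bits, upper bits are zero."""
    return x & ((1 << from_bits) - 1)


def sext(x, from_bits, to_bits):
    """Sign-extend the low from_bits bits of x to to_bits bits (unsigned encoding)."""
    x &= (1 << from_bits) - 1
    if x >> (from_bits - 1):
        x -= 1 << from_bits
    return x & ((1 << to_bits) - 1)


def umax_promoted(a, b, extend):
    """Unsigned max computed in the wide type, then truncated back to 8 bits."""
    wide = max(extend(a), extend(b))  # operands are unsigned encodings
    return wide & 0xFF


def umax_via_zext(a, b):
    return umax_promoted(a, b, lambda v: zext(v, 8))


def umax_via_sext(a, b):
    return umax_promoted(a, b, lambda v: sext(v, 8, 16))
```

Both variants agree with the narrow unsigned max because sign extension preserves unsigned ordering, so the cheaper extension can be chosen.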
2024-02-26 10:31:58 -08:00
Owen Anderson
ebb64d8370
[GlobalISel] Make the Combiner insert G_FREEZE when converting G_SELECT to binary operations. (#82733)
This is needed because the binary operators (G_OR and G_AND) do
not have the poison-suppressing semantics of G_SELECT.
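
A toy model of the poison behaviour described above. It is a sketch under assumptions: `POISON`, `g_select`, `g_or`, and `g_freeze` are hypothetical Python stand-ins for the generic opcodes, and which operand the combiner actually freezes is not stated here.

```python
POISON = object()  # sentinel standing in for a poison value


def g_select(cond, true_val, false_val):
    """Select only propagates poison from the condition or the chosen arm."""
    if cond is POISON:
        return POISON
    return true_val if cond else false_val


def g_or(a, b):
    """A binary operator is poison if either operand is poison."""
    if a is POISON or b is POISON:
        return POISON
    return a | b


def g_freeze(value, arbitrary=False):
    """Freeze turns poison into some arbitrary but fixed value."""
    return arbitrary if value is POISON else value
```

With `cond = True`, `g_select(cond, True, POISON)` is `True`, whereas the naive rewrite `g_or(cond, POISON)` is poison. Freezing the operand first restores the original result regardless of the arbitrary value chosen.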

Fixes https://github.com/llvm/llvm-project/issues/72475
2024-02-26 10:50:37 -05:00
Petar Avramovic
433f8e741e
MachineSSAUpdater: use all vreg attributes instead of reg class only (#78431)
When initializing MachineSSAUpdater save all attributes of current
virtual register and create new virtual registers with same attributes.
Now new virtual registers have the same register class or bank and the same LLT.
Previously new virtual registers had the same register class, but the LLT was not
set (it was left as the default/empty LLT).
Required by GlobalISel for AMDGPU, new 'lane mask' virtual registers
created by MachineSSAUpdater need to have both register class and LLT.

patch 4 from: https://github.com/llvm/llvm-project/pull/73337
2024-02-26 13:46:13 +01:00
ostannard
749384c08e
[ARM] Update IsRestored for LR based on all returns (#82745)
PR #75527 fixed ARMFrameLowering to set the IsRestored flag for LR based
on all of the return instructions in the function, not just one.
However, there is also code in ARMLoadStoreOptimizer which changes
return instructions, but it set IsRestored based on the one instruction
it changed, not the whole function.

The fix is to factor out the code added in #75527, and also call it from
ARMLoadStoreOptimizer if it made a change to return instructions.

Fixes #80287.
2024-02-26 12:23:25 +00:00
Oliver Stannard
8779cf68e8 Pre-commit test showing bug #80287
This test shows the bug where LR is used as a general-purpose register
on a code path where it is not spilled to the stack.
2024-02-26 12:21:13 +00:00
Jack Styles
28233408a2
[CodeGen] [ARM] Make RISC-V Init Undef Pass Target Independent and add support for the ARM Architecture. (#77770)
When using Greedy Register Allocation, there are times where
early-clobber values are ignored, and assigned the same register. This
is illegal behaviour for these instructions. To get around this, using
Pseudo instructions for early-clobber registers gives them a definition
and allows Greedy to assign them to a different register. This then
meets the ARM Architecture Reference Manual and matches the defined
behaviour.

This patch takes the existing RISC-V patch and makes it target
independent, then adds support for the ARM Architecture. Doing this will
ensure early-clobber constraints are followed when using the ARM
Architecture. Making the pass target independent will also open up the
possibility of adding support for other architectures in the future.
2024-02-26 12:12:31 +00:00
Luke Lau
3d084e37ab [RISCV] Add tests for fixed length concat_vector. NFC
These shufflevector chains will get combined into an n-ary concat_vectors node.
2024-02-26 20:03:25 +08:00
Yeting Kuo
e510fc7753
[VP][RISCV] Introduce vp.lrint/llrint and RISC-V support. (#82627)
RISC-V implements vector lrint/llrint by vfcvt.x.f.v.
2024-02-26 16:37:41 +08:00
hev
8be39b3901
[LoongArch] Improve pattern matching for AddLike predicate (#82767)
This commit updates the pattern matching logic for the `AddLike`
predicate in `LoongArchInstrInfo.td` to use the
`isBaseWithConstantOffset` function provided by `CurDAG`. This
optimization aims to improve the efficiency of pattern matching by
identifying cases where the operation can be represented as a base
address plus a constant offset, which can lead to more efficient code
generation.
2024-02-26 11:13:21 +08:00
Owen Anderson
2c5a68858b
Fix non-splat vector SREM expansion when one of the divisors is a power of two. (#82706)
The expansion previously used, derived from Hacker's Delight,
does not work correctly when the dividend is INT_MIN and the
divisor is a power of two. We now use an alternate derivation
of the A and Q constants specifically for the power-of-two divisor
case to avoid this problem. Credit to Fabian Giesen for the
new derivation.
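
To make the corner case concrete, here is a reference model of signed remainder next to a well-known shift-and-mask expansion for power-of-two divisors. This is not the A/Q derivation used by the patch (which is not reproduced here); the function names are hypothetical and the sketch only shows what the correct answer for INT_MIN must be.

```python
def to_signed(x, bits):
    """Interpret the low `bits` bits of x as a two's-complement integer."""
    x &= (1 << bits) - 1
    return x - (1 << bits) if x >> (bits - 1) else x


def srem_reference(x, d):
    """Signed remainder that truncates toward zero, like LLVM's srem."""
    r = abs(x) % abs(d)
    return -r if x < 0 else r


def srem_pow2(x, k, bits=32):
    """srem x, 2**k using only shifts, masks, add and sub."""
    mask = (1 << bits) - 1
    ux = x & mask
    sign = mask if ux >> (bits - 1) else 0  # arithmetic shift right by bits-1
    bias = sign >> (bits - k)               # 2**k - 1 if x is negative, else 0
    r = (((ux + bias) & mask) & ((1 << k) - 1)) - bias
    return to_signed(r, bits)
```

For a 32-bit INT_MIN and divisor 4, both functions return 0, which is the result any correct expansion has to reproduce.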

Fixes https://github.com/llvm/llvm-project/issues/77169
2024-02-25 10:13:05 -05:00
Rishabh Bali
fe42e72db2
[CodeGen] Port AtomicExpand to new Pass Manager (#71220)
Port the `atomicexpand` pass to the new Pass Manager. 
Fixes #64559
2024-02-25 18:42:22 +05:30
Thorsten Schütt
12d29cd171 test overflow intrinsics 2024-02-25 11:37:43 +01:00
Serge Pavlov
00c0638b56
[AArch64] Intrinsics aarch64_{get,set}_fpsr (#81867)
Two new intrinsics are introduced to read/write FPSR. They are similar
to the existing intrinsics aarch64_{get,set}_fpcr.
2024-02-24 20:25:21 +07:00
yingopq
96abee5eef
[Mips] Fix unable to handle inline assembly ends with compat-branch o… (#77291)
…n MIPS

Modify:
Add a global variable 'CurForbiddenSlotAttr' that records the current
instruction's forbidden slot and whether .set reorder is in effect. This is the
condition used to decide whether to add a NOP. We add a pair of
'.set noreorder' and '.set reorder' directives to wrap the current instruction and
the next instruction.
Then we can get the previous instruction's forbidden slot attribute and
whether .set reorder is in effect from 'CurForbiddenSlotAttr'.
If the previous instruction has a forbidden slot, .set reorder is active,
and the current instruction is a CTI, then emit a NOP after it.

Fix https://github.com/llvm/llvm-project/issues/61045.

Because https://reviews.llvm.org/D158589 was still in the 'Needs Review' state and
had not been concluded, we are submitting it again as a pull request.
2024-02-24 15:13:43 +08:00
Jeffrey Byrnes
8f2bd8ae68
[AMDGPU] Introduce iglp_opt(2): Generalized exp/mfma interleaving for select kernels (#81342)
This implements the basic pipelining structure of exp/mfma interleaving
for better extensibility. While it does have improved extensibility,
there are controls which only enable it for DAGs with certain
characteristics (matching the DAGs it has been designed against).
2024-02-23 17:13:20 -08:00
Visoiu Mistrih Francis
775bd60363
[RISCV] Add scheduling info for Zcmp (#82719)
The order of the entries in the list is:

outs, ins, Defs, Uses, implicit-defs, implicit uses, where the last two
are added programmatically during codegen depending on the registers
saved/restored and are not described in the TD files.
2024-02-23 15:44:57 -08:00
Kevin P. Neal
3e9e5e2771 [FPEnv][SystemZ] Correct strictfp test.
Correct llvm-reduce strictfp test to follow the rules documented in the
LangRef:
https://llvm.org/docs/LangRef.html#constrained-floating-point-intrinsics

This test needed the strictfp attribute added to function definitions.

Test changes verified with D146845.
2024-02-23 13:00:38 -05:00
Lukacma
08cb1a62f6
[AArch64][SVE] Add intrinsics to assembly mapping for svpmov (#81861)
This patch enables translation of svpmov intrinsic to the correct
assembly instruction, instead of function call.
2024-02-23 15:40:44 +00:00
hev
c747b24262
[NFC] Precommit a memcpy test for isOrEquivalentToAdd (#82758) 2024-02-23 21:43:53 +08:00
Evgenii Kudriashov
790bcecce6
[GlobalISel] Fix a check that aligned tail call is lowered (#82016)
Despite a valid tail call opportunity, backends still may not
generate a tail call or such lowering is not implemented yet.

Check that lowering has happened instead of its possibility when
generating G_ASSERT_ALIGN.
2024-02-23 12:11:50 +01:00
Yeting Kuo
850dde063b
[RISCV][VP] Introduce vp saturating addition/subtraction and RISC-V support. (#82370)
This patch also moves the MatchContext framework from DAGCombiner into an
individual header file so that the framework can be used from other files in
llvm/lib/CodeGen/SelectionDAG/.
2024-02-23 14:17:15 +08:00
Heejin Ahn
6e6bf9f817
[WebAssembly] Disable multivalue emission temporarily (#82714)
We plan to enable multivalue in the features section soon (#80923) for
other reasons, such as the feature having been standardized for many
years and other features being developed (e.g. EH) depending on it. This
is separate from enabling Clang experimental multivalue ABI (`-Xclang
-target-abi -Xclang experimental-mv`), but it turned out we generate
some multivalue code in the backend as well if it is enabled in the
features section.

Given that our backend multivalue generation still has not been much
used nor tested, and enabling the feature in the features section can be
a separate decision from how much multivalue (including none) we decide
to generate for now, I'd like to temporarily disable the actual
generation of multivalue in our backend. To do that, this adds an
internal flag `-wasm-emit-multivalue` that defaults to false. All our
existing multivalue tests can use this to test multivalue code. This
flag can be removed later when we are confident the multivalue
generation is well tested.
2024-02-22 19:17:15 -08:00
Alex MacLean
590c968e79
[NVPTX] fixup support for unaligned parameters and returns (#82562)
Add support for unaligned parameters and return values. These must be
loaded and stored one byte at a time and then bit manipulation is used
to assemble the correct final result.
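
The byte-at-a-time scheme can be sketched as follows. This is an illustrative model, not the backend's lowering: the function names are hypothetical and little-endian byte order is assumed.

```python
def load_unaligned(mem, offset, size):
    """Load `size` bytes one at a time and assemble them with shifts and ORs."""
    value = 0
    for i in range(size):
        value |= mem[offset + i] << (8 * i)  # byte i lands in bits [8i, 8i+8)
    return value


def store_unaligned(mem, offset, value, size):
    """Split `value` into bytes with shifts and masks, storing one byte at a time."""
    for i in range(size):
        mem[offset + i] = (value >> (8 * i)) & 0xFF
```

For example, storing `0x11223344` at offset 1 of a `bytearray(8)` and loading four bytes from the same offset round-trips the value, even though offset 1 is not 4-byte aligned.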
2024-02-22 17:27:28 -08:00
Philip Reames
ac518c7c99
[RISCV] Vector sub (zext, zext) -> sext (sub (zext, zext)) (#82455)
This is legal as long as the inner zext retains at least one bit of
increase so that the sub overflow case (0 - UINT_MAX) can be
represented. Alive2 proof: https://alive2.llvm.org/ce/z/BKeV3W
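
Alongside the Alive2 proof, the identity can be checked exhaustively on small types. The sketch below assumes i8 sources, an i16 intermediate type, and an i32 result; the helper names are hypothetical.

```python
def sext(x, from_bits, to_bits):
    """Sign-extend the low from_bits bits of x to to_bits bits (unsigned encoding)."""
    x &= (1 << from_bits) - 1
    if x >> (from_bits - 1):
        x -= 1 << from_bits
    return x & ((1 << to_bits) - 1)


def wide_sub(a, b):
    """sub (zext a to i32), (zext b to i32), for a and b in [0, 255]."""
    return (a - b) & 0xFFFFFFFF


def narrow_sub_then_sext(a, b, mid_bits=16):
    """sext (sub (zext a to iN), (zext b to iN)) to i32, with N = mid_bits."""
    narrow = (a - b) & ((1 << mid_bits) - 1)
    return sext(narrow, mid_bits, 32)
```

The two forms agree for every pair of i8 inputs when `mid_bits` is 16. With `mid_bits=8` (no extra bit) they differ, for example for `a=200, b=0`, which is why the inner zext must add at least one bit.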

For RVV, restrict this to power of two sizes with the operation type
being at least e8 to stick to legal extends. We could arguably handle i1
source types with some care if we wanted to.

This is likely profitable because it may allow us to perform the sub
instruction in a narrow LMUL (equivalently, in fewer DLEN-sized pieces)
before widening for the user. We could arguably avoid narrowing below
DLEN, but the transform should at worst introduce one extra extend and
one extra vsetvli toggle if the source could previously be handled via
loads explicit w/EEW.
2024-02-22 16:17:48 -08:00
Sumanth Gundapaneni
aaf2d078b6
[Hexagon] Clean up redundant transfer instructions. (#82663)
This patch adds a Hexagon specific backend pass that cleans up redundant
transfers after register allocation.
2024-02-22 17:31:37 -06:00
Nashe Mncube
744c0057e7
[AArch64][CodeGen] Fix crash when fptrunc returns fp16 with +nofp attr (#81724)
When performing lowering of the fptrunc opcode returning fp16 with the
+nofp flag enabled we could trigger a compiler crash. This is because we
had no custom lowering implemented. This patch handles
the case in which we need to promote an fp16 return type
for fptrunc when the +nofp attr is enabled.
2024-02-22 19:15:52 +00:00
yandalur
6599c022be
[HEXAGON] Fix bit boundary for isub_hi in HexagonBitSimplify (#82336)
Use bit boundary of 32 for high subregisters in HexagonBitSimplify. This
fixes the subregister used in an upper half register store.
2024-02-22 11:48:06 -06:00
Craig Topper
5b53fa04db
[RISCV] Enable -riscv-enable-sink-fold by default. (#82026)
AArch64 has had it enabled since late November, so hopefully the main
issues have been resolved.

I see a small reduction in dynamic instruction count on every benchmark
in specint2017. The best improvement was 0.3% so nothing amazing.
2024-02-22 09:07:21 -08:00
Craig Topper
c1716e3fcf
[DAGCombiner][RISCV] CSE zext nneg and sext. (#82597)
If we have a sext and a zext nneg with the same types and operand
we should combine them into the sext. We can't go the other way
because the nneg flag may only be valid in the context of the uses
of the zext nneg.
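
The one-way nature of this combine can be modelled as below. This is a sketch: `POISON` is a hypothetical sentinel, and i8/i16 stand in for the real types.

```python
POISON = "poison"  # sentinel standing in for a poison value


def sext(x, from_bits=8, to_bits=16):
    """Sign-extend the low from_bits bits of x to to_bits bits (unsigned encoding)."""
    x &= (1 << from_bits) - 1
    if x >> (from_bits - 1):
        x -= 1 << from_bits
    return x & ((1 << to_bits) - 1)


def zext_nneg(x, from_bits=8):
    """zext with the nneg flag: poison when the sign bit of the input is set."""
    x &= (1 << from_bits) - 1
    if x >> (from_bits - 1):
        return POISON
    return x
```

Wherever `zext_nneg` is defined it equals `sext`, so replacing it with the `sext` is safe. The reverse is not: for input `0x80` the `sext` yields `0xFF80` while `zext_nneg` is poison.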
2024-02-22 09:06:49 -08:00
Craig Topper
c9afd1ad78 [RISCV] Add test case showing missed opportunity to form sextload when sext and zext nneg are both present. NFC 2024-02-22 08:38:42 -08:00
Yingwei Zheng
0107c8824b
[RISCV][SDAG] Improve codegen of select with constants if zicond is available (#82456)
This patch uses `add + czero.eqz/nez` to lower select with constants if
zicond is available.
```
(select c, c1, c2) -> (add (czero_nez c2 - c1, c), c1)
(select c, c1, c2) -> (add (czero_eqz c1 - c2, c), c2)
```
The above code sequence is suggested by [RISCV Optimization
Guide](https://riscv-optimization-guide-riseproject-c94355ae3e6872252baa952524.gitlab.io/riscv-optimization-guide.html#_avoid_branches_using_conditional_moves).
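
The two rewrites can be checked with a small model of the Zicond instructions, where `czero.eqz` yields zero when the condition register is zero and `czero.nez` yields zero when it is nonzero. This is a sketch assuming 64-bit registers; the Python function names are hypothetical.

```python
MASK = (1 << 64) - 1  # model 64-bit registers


def czero_eqz(value, cond):
    """czero.eqz: zero if cond == 0, otherwise value."""
    return 0 if cond == 0 else value & MASK


def czero_nez(value, cond):
    """czero.nez: zero if cond != 0, otherwise value."""
    return 0 if cond != 0 else value & MASK


def select_via_nez(c, c1, c2):
    """(select c, c1, c2) -> (add (czero_nez c2 - c1, c), c1)"""
    return (czero_nez((c2 - c1) & MASK, c) + c1) & MASK


def select_via_eqz(c, c1, c2):
    """(select c, c1, c2) -> (add (czero_eqz c1 - c2, c), c2)"""
    return (czero_eqz((c1 - c2) & MASK, c) + c2) & MASK
```

Both forms return `c1` when `c` is nonzero and `c2` otherwise, with the subtraction `c2 - c1` or `c1 - c2` folded to a constant at compile time.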
2024-02-23 00:18:56 +08:00
Pierre van Houtryve
4235e44d4c
[GlobalISel] Constant-fold G_PTR_ADD with different type sizes (#81473)
All other opcodes in the list are constrained to have the same type on
both operands, but not G_PTR_ADD.

Fixes  #81464
2024-02-22 13:15:26 +01:00
Sander de Smalen
1f99a45012 [AArch64] Remove unused ReverseCSRRestoreSeq option. (#82326)
This patch removes the `-reverse-csr-restore-seq` option from
AArch64FrameLowering, since this is no longer used.

This patch was reverted because of a crash in PR#79623.
Merging it back as it was fixed in PR#82492.
2024-02-22 12:01:53 +00:00
Billy Laws
f17e415142
[AArch64] Mangle names of all ARM64EC functions with entry thunks (#80996)
This better matches MSVC output in cases where static functions have their addresses taken.
2024-02-22 12:36:18 +01:00
Harald van Dijk
4f12f47550
[AArch64] Switch to soft promoting half types. (#80576)
The traditional promotion is known to generate wrong code.

Like #80440 for ARM, except that far less is affected as on AArch64,
hardware floating point support always includes FP16 support and is
unaffected by these changes. This only affects `-mgeneral-regs-only`
(Clang) / `-mattr=-fp-armv8` (LLVM).

Because this only affects a configuration where no FP support is
available at all, `useFPRegsForHalfType()` has no effect and is not
specified: `f32` was getting legalized as a parameter and return type to
an integer anyway.
2024-02-22 10:45:27 +00:00
Vyacheslav Levytskyy
4a602d9250
Add support for the SPV_INTEL_usm_storage_classes extension (#82247)
Add support for the SPV_INTEL_usm_storage_classes extension:
*
https://github.com/intel/llvm/blob/sycl/sycl/doc/design/spirv-extensions/SPV_INTEL_usm_storage_classes.asciidoc
2024-02-22 11:05:19 +01:00
Vyacheslav Levytskyy
6cca23a3b9
[SPIRV] Prevent creation of jump tables from switch (#82287)
This PR is to prevent creation of jump tables from switch. The reason is
that SPIR-V doesn't know how to lower jump tables, and a sequence of
commands that IRTranslator generates for switch via jump tables breaks
SPIR-V Backend code generation with complaints about G_BRJT. The next
example is the shortest code to break SPIR-V Backend code generation in
this way:

```
target datalayout = "e-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-v512:512-v1024:1024-n8:16:32:64"
target triple = "spir64-unknown-unknown"

define spir_func void @foo(i32 noundef %val) {
entry:
  switch i32 %val, label %sw.epilog [
    i32 0, label %sw.bb
    i32 1, label %sw.bb2
    i32 2, label %sw.bb3
    i32 3, label %sw.bb4
  ]
sw.bb:
  br label %sw.epilog
sw.bb2:
  br label %sw.epilog
sw.bb3:
  br label %sw.epilog
sw.bb4:
  br label %sw.epilog
sw.epilog:
  ret void
}
```

To resolve the issue we set a high lower limit for number of blocks in a
jump table via getMinimumJumpTableEntries() and prevent undesirable (or
rather unsupported at the moment) path of code generation.
2024-02-22 10:30:00 +01:00
Vyacheslav Levytskyy
fddf23c6f4
[SPIRV] Add support for the SPV_KHR_subgroup_rotate extension (#82374)
This PR adds support for the SPV_KHR_subgroup_rotate extension that
enables rotating values across invocations within a subgroup:
*
https://github.com/KhronosGroup/SPIRV-Registry/blob/main/extensions/KHR/SPV_KHR_subgroup_rotate.asciidoc
2024-02-22 10:27:59 +01:00
CarolineConcatto
c5253aa136
[AArch64] Restore Z-registers before P-registers (#79623) (#82492)
This is needed by PR#77665[1] that uses a P-register while restoring
Z-registers.

The reversed restore order for SVE registers in the epilogue was added to
guarantee performance, but further work has since been done to improve SVE frame
restore, and the scheduler may also change the order of the
restores, undoing the reversal.

This also fixes the problem reported in (PR #79623) on Windows with
std::reverse and .base().

[1]https://github.com/llvm/llvm-project/pull/77665
2024-02-22 09:19:48 +00:00
Antonio Frighetto
25e7e8d993 [CGP] Permit tail call optimization on undefined return value
We may freely allow tail call optzs on undef values as well.

Fixes: https://github.com/llvm/llvm-project/issues/82387.
2024-02-22 10:09:15 +01:00
Nick Anderson
8bd327d6fe
[AMDGPU][GlobalISel] Add fdiv / sqrt to rsq combine (#78673)
Fixes #64743
2024-02-22 09:47:36 +01:00
Yeting Kuo
7e97ae35ae
[RISCV] Teach RISCVMakeCompressible to handle Zca/Zcf/Zce/Zcd. (#81844)
Make targets which don't have C but have Zca/Zcf/Zce/Zcd benefit from
this pass.
2024-02-22 15:51:19 +08:00
Luke Lau
815644b4dd
[RISCV] Fix mgather -> riscv.masked.strided.load combine not extending indices (#82506)
This fixes the miscompile reported in #82430 by telling
isSimpleVIDSequence to sign extend to XLen instead of the width of the
indices, since the "sequence" of indices generated by a strided load
will be at XLen.

This was the simplest way I could think of getting isSimpleVIDSequence
to treat the indexes as if they were zero extended to XLenVT.

Another way we could do this is by refactoring out the "get constant
integers" part from isSimpleVIDSequence and handle them as APInts so we
can separately zero extend it.

Fixes #82430
2024-02-22 11:50:27 +08:00
Luke Lau
11d115d056 [RISCV] Adjust test case to show wrong stride. NFC
See https://github.com/llvm/llvm-project/pull/82506#discussion_r1498080785
2024-02-22 11:08:45 +08:00
Sumanth Gundapaneni
d62ca8def3
[Hexagon] Optimize post-increment load and stores in loops. (#82418)
This patch optimizes the post-increment instructions so that we can
packetize them together.
v1 = phi(v0, v3')
v2,v3  = post_load v1, 4
v2',v3'= post_load v3, 4

This can be optimized in two ways

v1 = phi(v0, v3')
v2,v3' = post_load v1, 8
v2' = load v1, 4
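
The rewrite shown above can be modelled to confirm that both sequences load the same values and produce the same final pointer. This is a sketch under assumptions: `post_load` is taken to return the loaded value and the incremented address, and `load v1, 4` is read as a load from `v1 + 4`.

```python
def post_load(mem, addr, inc):
    """Post-increment load: returns (value at addr, addr + inc)."""
    return mem[addr], addr + inc


def load(mem, base, offset):
    """Plain load from base + offset."""
    return mem[base + offset]


def original(mem, v1):
    v2, v3 = post_load(mem, v1, 4)
    v2p, v3p = post_load(mem, v3, 4)
    return v2, v2p, v3p


def optimized(mem, v1):
    v2, v3p = post_load(mem, v1, 8)  # single increment by 8
    v2p = load(mem, v1, 4)           # second load no longer depends on v3
    return v2, v2p, v3p
```

In the optimized form the second load only depends on `v1`, so the two loads are independent of each other.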
2024-02-21 19:50:47 -06:00
Sumanth Gundapaneni
4c0fdcdb33
[Hexagon] Generate absolute-set load/store instructions. (#82034)
The optimization finds loads/stores of a specific form and translates
the first load/store to an absolute-set form, thereby optimizing out the
transfer and eliminating the constant extenders.
2024-02-21 19:50:29 -06:00
David Majnemer
be36812fb7 [TargetLowering] Be more efficient in fp -> bf16 NaN conversions
We can avoid masking completely as it is OK (and probably preferable) to
bring over some of the existing NaN payload.
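
One way to picture a mask-free NaN conversion is shown below. It is a sketch only, assuming the conversion sets the f32 quiet bit and keeps the upper 16 bits; the function names are hypothetical and the actual lowering may differ in detail.

```python
F32_QUIET_BIT = 0x00400000  # most significant mantissa bit of an f32


def nan_f32_bits_to_bf16_bits(bits):
    """Convert the bit pattern of an f32 NaN to a bf16 NaN bit pattern.

    Setting the quiet bit guarantees a nonzero bf16 mantissa, so the upper
    payload bits can be carried over without masking them out first.
    """
    return ((bits | F32_QUIET_BIT) >> 16) & 0xFFFF


def is_bf16_nan(bits):
    """bf16 NaN: exponent all ones and a nonzero 7-bit mantissa."""
    return (bits & 0x7F80) == 0x7F80 and (bits & 0x007F) != 0
```

Simply truncating the signalling NaN `0x7F800001` would give `0x7F80`, which is infinity in bf16. With the quiet bit set first the result is `0x7FC0`, and a NaN such as `0xFFA12345` becomes `0xFFE1`, keeping its sign and upper payload bits.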
2024-02-21 22:47:27 +00:00