52796 Commits

Author SHA1 Message Date
Dmitriy Smirnov
e13bed4c5f [PATCH] [llvm] [InstCombine] Canonicalise ADD+GEP
This patch tries to canonicalise add + gep to gep + gep.

Co-authored-by: Paul Walker <paul.walker@arm.com>

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D155688
2023-10-06 12:29:06 +01:00
Momchil Velikov
a9d0ab2ee5 Re-apply "[AArch64] Enable "sink-and-fold" in MachineSink by default (#67432)"
This re-applies commit ace20e24287b, which was reverted in eff4ef25b3dc.

The issues were fixed in:

  * b30765caf874 [AArch64] Fix an incorrect handling of debug values in
    MachineSink (#68107)

  * b454b04d6869 [AArch64] Fix a compiler crash in MachineSink (#67705)
2023-10-06 09:34:42 +01:00
Diana Picus
2e1718adc8 Reland "AMDGPU: Duplicate instead of COPY constants from VGPR to SGPR (#66882)"
Teach the si-fix-sgpr-copies pass to deal with REG_SEQUENCE, PHI or
INSERT_SUBREG where the result is an SGPR, but some of the inputs are
constants materialized into VGPRs. This may happen in cases where for
instance several instructions use an immediate zero and SelectionDAG
chooses to put it in a VGPR to satisfy all of them. This however causes
the si-fix-sgpr-copies to try to switch the whole chain to VGPR and may
lead to illegal VGPR-to-SGPR copies. Rematerializing the constant into
an SGPR fixes the issue.

This was originally reverted because it triggered an unrelated bug in
PEI on one of the OpenMP buildbots. That bug has been fixed in #68299,
so it should be ok to try again.
2023-10-06 10:03:50 +02:00
Diana
be382de059
[AMDGPU] Use correct operand order for shifts (#68299)
In a special case in frame index elimination (when the offset is 0), we
generate either a S_LSHR_B32 or a V_LSHRREV_B32 using the same code.
However, they don't expect their operands in the same order - S_LSHR_B32
takes the value to be shifted first and then the shift amount, whereas
V_LSHRREV_B32 has the operands reversed (hence the REV in its name).
Update the code & tests to take this into account. Also remove an
outdated comment (this code is definitely reachable now that non-entry
functions no longer have a fixed emergency scavenge slot).
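
The operand-order difference described above can be sketched in Python. This is only an illustrative model: the helper names `s_lshr_b32`, `v_lshrrev_b32` and `emit_shift` are hypothetical and are not LLVM APIs, and masking the shift amount to five bits is an assumption of the model.

```python
MASK32 = 0xFFFFFFFF

def s_lshr_b32(value, amount):
    """Model of the scalar form: value to be shifted first, then the amount."""
    return (value & MASK32) >> (amount & 31)

def v_lshrrev_b32(amount, value):
    """Model of the vector 'REV' form: shift amount first, then the value."""
    return (value & MASK32) >> (amount & 31)

def emit_shift(use_vector, value, amount):
    # Hypothetical helper: shared code must pick the operand order per opcode.
    if use_vector:
        return v_lshrrev_b32(amount, value)
    return s_lshr_b32(value, amount)
```

Passing the operands to `v_lshrrev_b32` in scalar order would shift the wrong quantity, which is the kind of mistake the commit message describes.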
2023-10-06 09:43:04 +02:00
Matt Arsenault
5082e827c1 AMDGPU/GlobalISel: Add test for packed sub selection
Mirror of the add test, I've had this lying around for a long
time.
2023-10-05 10:07:57 -07:00
Matt Arsenault
b5ebf07499 AMDGPU/GlobalISel: Add global-isel run lines to shrink add/sub test 2023-10-05 10:07:57 -07:00
Matt Harding
bd7ca98b66
Ensure NoTrapAfterNoreturn is false for the wasm backend (#65876)
In the WebAssembly back end, the TrapUnreachable option is currently
load-bearing for correctness, inserting wasm `unreachable` instructions
where needed to create valid wasm. There is another option,
NoTrapAfterNoreturn, that removes some of those traps and causes
incorrect wasm to be emitted.

This turns off `NoTrapAfterNoreturn` for the Wasm backend and adds new
tests.
2023-10-05 09:17:45 -07:00
Matt Arsenault
2ca30eb8fd
AMDGPU/GlobalISel: Handle mubuf load/store for more types (#68268)
Fixes MUBUF path for most vectors and pointers, which unblocks fixing
the gfx6/7 run lines in assorted tests. Also fixes inconsistent behavior
for -flat-for-global.
2023-10-05 05:36:16 -07:00
Ivan Kosarev
f04aa1f814
[AMDGPU][CodeGen] Fold immediates in src1 operands of V_MAD/MAC/FMA/FMAC. (#68002) 2023-10-05 14:22:29 +03:00
Kirill Stoimenov
0a776996af Revert "[DAG] Attempt shl narrowing in SimplifyDemandedBits"
This reverts commit 7a8c04ef84ecdab4390b451d4c2fe17bc45a7b63.
2023-10-04 22:15:41 +00:00
Jeffrey Byrnes
7794e16b49 [AMDGPU]: Allow combining into v_dot4
Differential Revision: https://reviews.llvm.org/D155995

Change-Id: Id15d232629a32a3549b13d47bf84d7a61b28b928
2023-10-04 13:31:36 -07:00
Luke Lau
3b0b84fd00
[RISCV] Fix illegal build_vector when lowering double id buildvec on RV32 (#67017)
When lowering a constant build_vector sequence of doubles on RV32, if
the addend wasn't zero, or the step/denominator wasn't one, it would
crash trying to emit an illegal build_vector of <n x i64> with i32
operands, e.g.:

t15: v2i64 = BUILD_VECTOR Constant:i32<1>, Constant:i32<1>

This patch fixes this by lowering the splats with
SelectionDAG::getConstant with the vector type, which handles making it
legal via splat_vector_parts.
2023-10-04 21:28:44 +01:00
Nitin John Raj
da9f9082ea
[RISCV][GlobalISel] Legalize G_FRAME_INDEX (#67746)
G_FRAME_INDEX is legal for pointers.
2023-10-04 13:27:23 -07:00
Alex Richardson
e86d6a43f0 Regenerate test checks for tests affected by D141060 2023-10-04 10:51:35 -07:00
Alex Richardson
83c4227ab7 Auto-generate test checks for tests affected by D141060
These files had manual CHECK lines which make the diff from D141060
very difficult to review.
2023-10-04 10:51:35 -07:00
Philip Reames
45a334d31c [RISCV] Generaize reduction tree matching to all integer reductions (#68014) (reapply)
This was reverted in 824251c9b349d859a9169196cd9533c619a715ce due to a bug that this change exposed in a previous patch.  Fixed in 199cbec987ee68d70611db8e7961b43c3dbad83e.  Original commit message follows.

This builds on the transform introduced in
https://github.com/llvm/llvm-project/pull/67821, and generalizes it for
all integer reduction types.

A couple of notes:
* This will only form smax/smin/umax/umin reductions when zbb is
enabled. Otherwise, we lower the min/max expressions early. I don't care
about this case, and don't plan to address this further.
* This excludes floating point. Floating point introduces concerns about
associativity. I may or may not do a follow up patch for that case.
* The explodevector test change is mildly undesirable from a clarity
perspective. If anyone sees a good way to rewrite that to stabilize the
test, please suggest.
2023-10-04 10:41:29 -07:00
Philip Reames
199cbec987 [RISCV] Don't try to form VECREDUCE without vector instructions
This fixes a bug in f0505c which wasn't noticed until 7a0b9da had landed.  This triggered a revert of 7a0b9da, which will be reapplied after this fix.
2023-10-04 10:29:27 -07:00
Anatoly Trosinenko
d32cce5b75
[AArch64][PAC] Specify Defs and Uses of PAUTH_(PROLOGUE|EPILOGUE)
This is a follow-up to eb02ee44d32531931af5312cd450779011664eef.
2023-10-04 18:20:52 +03:00
Evgenii Kudriashov
0dcc65359b
[X86] Add combine tests for pointers of mixed sizes (NFC) (#68219)
Precommit for #67168 to solve #66873
2023-10-04 16:31:24 +02:00
Alex Bradbury
824251c9b3 Revert "[RISCV] Generaize reduction tree matching to all integer reductions (#68014)"
This reverts commit 7a0b9daac9edde4293d2e9fdf30d8b35c04d16a6 and
63bbc250440141b1c51593904fba9bdaa6724280.

I'm seeing issues (e.g. on the GCC torture suite) where
combineBinOpOfExtractToReduceTree is called when the V extensions aren't
enabled and triggers a crash due to RISCVSubtarget::getElen asserting.

I'll aim to follow up with a minimal reproducer. Although it's pretty
obvious how to avoid this crash with some extra gating, there are a few
options as to where that should be inserted so I think it's best to
revert and agree the appropriate fix separately.
2023-10-04 12:51:01 +01:00
Ivan Kosarev
cf80defae2
[AMDGPU][GFX11] Do not rewrite V_FMA/FMAC_* to V_FMAAK_F16_t16 on operand legalization. (#66202)
V_FMAAK_F16_t16 takes VGPR_32_Lo128 operands whereas the original
instructions would have VGPR_32 operands. Switching the opcodes without
updating operands' register classes leads to MachineVerifier complaining
about the classes not matching instruction definitions. The problem only
reveals itself on builds with expensive checks enabled because of
missing -verify-machineinstrs in the test.

This is the third attempt to update CodeGen/AMDGPU/fma.f16.ll to run for
GFX11, following the second attempt in a1e38e0b8e3e, partially reverted
in eaf737a4e004.
2023-10-04 12:41:46 +01:00
Simon Pilgrim
7a8c04ef84 [DAG] Attempt shl narrowing in SimplifyDemandedBits
If a shl node leaves the upper half bits zero / undemanded, then see if we can profitably perform this with a half-width shl and a free trunc/zext.

Followup to D146121

Differential Revision: https://reviews.llvm.org/D155472
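
The equivalence behind the narrowing can be checked with a small Python model. This is a sketch of the arithmetic only, assuming a 64-bit shl narrowed to 32 bits with a shift amount below 32; the function names are hypothetical and do not reflect the SelectionDAG implementation.

```python
MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1

def shl64(x, c):
    """Full-width i64 shift left."""
    return (x << c) & MASK64

def shl_narrowed(x, c):
    """trunc to i32, shift at half width, then zext back to i64."""
    lo = x & MASK32            # trunc i64 -> i32
    return (lo << c) & MASK32  # i32 shl; the zext adds only zero upper bits

def low_half_matches(x, c):
    # When only the low 32 bits are demanded, both forms agree.
    return (shl64(x, c) & MASK32) == shl_narrowed(x, c)
```

When the upper half of the wide result is known to be zero, the two forms agree on the whole value, not just on the low half.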
2023-10-04 10:23:02 +01:00
Momchil Velikov
b30765caf8
[AArch64] Fix an incorrect handling of debug values in MachineSink (#68107) 2023-10-04 10:11:47 +01:00
David Green
20fc2ffb15 [AArch64][GlobalISel] Handle fp constant splats
This changes the DUP(constant) -> MOVI code to handle either integer or fp
types, allowing more constants to be selected, and fixes up some cases where fp
constants were being incorrectly selected.
2023-10-04 08:50:21 +01:00
Rahman Lavaee
61785ffcfc
Do not remove empty basic blocks which have address taken. (#67740)
This PR replaces `isMachineBlockAddressTaken` by `hasAddressTaken` to
include blocks which have their IR address taken as well. These blocks
are also not removable since their predecessors' terminators do not
directly point to the block.
2023-10-03 16:09:31 -07:00
Yuta Saito
da0ca5dee4
[WebAssembly] Define local sp if llvm.stacksave is used (#68133)
Usually `llvm.stacksave/stackrestore` are used together with `alloca`
but they can appear without it (e.g. `alloca` can be optimized away).
WebAssembly's function-local physical user sp register, which is
referenced by `llvm.stacksave`, is created during frame lowering and
replaced with a virtual register.
However, the sp register was not created when `llvm.stacksave` was used
without `alloca`, which led to an MIR verification failure about a
use-before-def of the sp virtual register.

Resolves https://github.com/llvm/llvm-project/issues/62235
2023-10-03 14:51:35 -07:00
Alex Bradbury
eae1e28cc2
[RISCV] Mark the Zfa extension as non-experimental (#68113)
Following the version bump in #67964 and the bug fix in #68026 I believe
we're ready to mark Zfa as non-experimental. I'll note the GCC torture
suite passes now with Zfa enabled (though it's more of a litmus test
than anything else).
2023-10-03 18:16:13 +01:00
Alex Bradbury
18c3c46858
[RISCV] Update Zfa extension version to 1.0 (#67964)
The Zfa specification was recently ratified
<https://wiki.riscv.org/display/HOME/Recently+Ratified+Extensions>. This
commit bumps the version to 1.0, but leaves it as an experimental
extension (to be done in a follow-on patch), so reviews can focus on
confirming there haven't been spec changes we have missed (which as
noted below, is more difficult than usual).

Because the development of the Zfa spec overlapped with the transition
of riscv-isa-manual from LaTeX to AsciiDoc, it's more difficult than
usual to confirm version changes. The linked PDF in RISCVUsage is for
some reason a 404. Key commit histories to review are:
* Changes to zfa.adoc on the main branch
<https://github.com/riscv/riscv-isa-manual/commits/main/src/zfa.adoc>
* Changes to zfa.tex on the now defunct latex branch
<https://github.com/riscv/riscv-isa-manual/commits/latex/src/zfa.tex>

From reviewing these, I believe there have been no changes to the spec
since version 0.1/0.2 (sadly the AsciiDoc and LaTeX versions of the spec
are inconsistent about version numbering).
2023-10-03 17:54:29 +01:00
Luke Lau
169c20584d
[RISCV] Relax some Zvbb patterns and lowerings to Zvkb (#68115)
vandn, vrev8 and vro{l,r} are now part of Zvkb, which Zvbb now implies.
This patch updates the predicates to check for Zvkb instead of Zvbb in
the tablegen patterns for the SD and VL nodes, as well as some of the
lowering logic in RISCVISelLowering.
2023-10-03 17:42:40 +01:00
Simon Pilgrim
4c37372dae [X86] promoteExtBeforeAdd - determine if an addition is implicitly NSW/NUW
Pulled out of D155472
2023-10-03 17:32:28 +01:00
Simon Pilgrim
d97f49b7e0 [X86] Add pointer mask test coverage for implicit NSW/NUW adds
promoteExtBeforeAdd currently relies on NSW/NUW flags, which have been lost by previous folds.
2023-10-03 17:32:28 +01:00
Kai Nacke
42de2b7e99
[SystemZ/z/OS] Add library names for intrinsics (#68114)
On z/OS, many library functions have a non-standard name. This change
initializes the table of runtime functions that result from lowering
intrinsics to library calls.
2023-10-03 18:53:52 +03:00
Dinar Temirbulatov
8232ab76d0 [AArch64][SVE][SVE2] Enable tbl, tbl2 for shuffle lowering for fixed vector types.
This change enables some shuffle lowering using the TBL instruction with SVE
and SVE2 when indexing into one register, and the two-register TBL variant with
SVE2 when indexing into both registers.

Differential Revision: https://reviews.llvm.org/D152205
2023-10-03 15:19:00 +00:00
Philip Reames
7a0b9daac9
[RISCV] Generaize reduction tree matching to all integer reductions (#68014)
This builds on the transform introduced in
https://github.com/llvm/llvm-project/pull/67821, and generalizes it for
all integer reduction types.

A couple of notes:
* This will only form smax/smin/umax/umin reductions when zbb is
enabled. Otherwise, we lower the min/max expressions early. I don't care
about this case, and don't plan to address this further.
* This excludes floating point. Floating point introduces concerns about
associativity. I may or may not do a follow up patch for that case.
* The explodevector test change is mildly undesirable from a clarity
perspective. If anyone sees a good way to rewrite that to stabilize the
test, please suggest.
2023-10-03 07:34:39 -07:00
Mikhail Gudim
9b5120050f
[RISCV] A test demonstrating missed opportunity to combine addi into (#67022)
load / store offset.

The patch to address this will be in a separate PR.

A possible fix: https://github.com/llvm/llvm-project/pull/67024/files
2023-10-03 10:18:31 -04:00
Simon Pilgrim
77c43e1489 [X86][FastISel] X86SelectIntToFP - don't assume value type is simple.
Fixes #68068
2023-10-03 11:05:14 +01:00
Martin Storsjö
6ae36c0127
[AArch64] Disable loop alignment for Windows targets (#67894)
This should fix #66912. When emitting SEH unwind info, we need to be
able to calculate the exact length of functions before alignments are
fixed. Until that limitation is overcome, just disable all loop
alignment on Windows targets.
2023-10-02 23:55:23 +03:00
Alex Bradbury
0152e1f2d5
[RISCV] Fix incorrect codegen for Zfa with negated forms of constants in the lookup table (#68026)
The logic in `RISCVLoadFPImm::getLoadFPImm` recognises that the only
supported negative value is -1.0, but due to a typo returns `false`
otherwise (entry 0, which is -1.0) rather than returning -1 (indicating
no match found).
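
A minimal Python sketch of this class of bug follows. The table is a hypothetical subset (the only detail taken from the message is that entry 0 is -1.0), and `lookup_buggy` / `lookup_fixed` are illustrative names, not the real `RISCVLoadFPImm::getLoadFPImm`.

```python
# Hypothetical subset of a lookup table; entry 0 is -1.0 as stated above.
TABLE = [-1.0, 0.25, 0.5, 1.0, 2.0]
NO_MATCH = -1

def lookup_buggy(value):
    if value < 0:
        if value == -1.0:
            return 0
        return int(False)  # bug: false converts to 0, i.e. entry 0 (-1.0)
    return TABLE.index(value) if value in TABLE else NO_MATCH

def lookup_fixed(value):
    if value < 0:
        if value == -1.0:
            return 0
        return NO_MATCH    # fix: report "no match found"
    return TABLE.index(value) if value in TABLE else NO_MATCH
```

With the buggy version, a negated table value such as -0.5 is reported as index 0 and would be materialized as -1.0.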
2023-10-02 21:20:38 +01:00
Craig Topper
3c0990c188
[RISCV] Generalize the (ADD (SLLI X, 32), X) special case in constant materialization. (#66931)
We don't have to limit ourselves to a shift amount of 32. We can support
other shift amounts that make the upper 32 bits line up.
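
The idea can be illustrated with a brute-force Python check. This is a sketch under stated assumptions, not the algorithm in the patch: it takes X to be the sign-extended low 32 bits of the constant and tries every shift amount, and `find_shift_add` is a hypothetical name.

```python
MASK64 = (1 << 64) - 1

def sext32(v):
    """Sign-extend the low 32 bits of v."""
    v &= 0xFFFFFFFF
    return v - (1 << 32) if v & 0x80000000 else v

def find_shift_add(c):
    """Return (x, s) with ((x << s) + x) mod 2**64 == c, or None."""
    x = sext32(c)
    if x == 0:
        return None
    for s in range(1, 64):
        if ((x << s) + x) & MASK64 == c & MASK64:
            return x, s
    return None
```

For example, `find_shift_add(0x0000123412340000)` finds a shift of 16 rather than 32, because the low word `0x12340000` lines up with the upper bits after a 16-bit shift.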
2023-10-02 13:03:06 -07:00
Alex Bradbury
451255b207 [RISCV][test] Extend test coverage for Zfa's fli instructions to cover miscompile
There's a miscompile currently for negative numbers (other than -1) that
are the negated form of numbers in the fli lookup table. This adds tests
that capture the issue, with a fix to follow in a separate commit/PR.
2023-10-02 20:48:30 +01:00
Matt Arsenault
f79379398d Revert "CodeGen: Disable isCopyInstrImpl if there are implicit operands"
This reverts commit bc7d88faf1a595ab59952a2054418cdd0d9eeee8.

This is broken with 414ff812d6241b728754ce562081419e7fc091eb reverted.
2023-10-02 22:43:24 +03:00
Matt Arsenault
d4fb503f83 CodeGen: Add regressions from subreg_to_reg implicit-defs
These catch assertions hit after 414ff812d6241b728754ce562081419e7fc091eb
2023-10-02 22:38:31 +03:00
Kirill Stoimenov
e0f86ca200 Revert "RegisterCoalescer: Add implicit-def of super register when coalescing SUBREG_TO_REG"
This reverts commit 414ff812d6241b728754ce562081419e7fc091eb.
2023-10-02 18:19:52 +00:00
Simon Pilgrim
b4f591363c [DAG] visitSHL - move SimplifyDemandedBits after all standard folds to give them a chance to match
Pulled out of D155472
2023-10-02 16:09:35 +01:00
Felipe de Azevedo Piovezan
a41ce98064
[FastISel][DebugInfo] Handle dbg.value targeting allocas (#67187)
FastISel currently drops dbg.values targeting allocas. It may seem
surprising that a simple case would fail to be lowered, but dbg.values
targeting allocas are not common; we usually have dbg.declares doing
that, and those are handled by the common code between FastISel and
SelectionDAGISel.

This patch addresses the issue by querying the static alloca map from
FuncInfo. If we have a frame index for it, we create a DBG_VALUE
intrinsic from it.
2023-10-02 07:10:11 -07:00
Ben Shi
5db0a450be
[AVR] Fix a crash in AVRInstrInfo::insertIndirectBranch (#67324)
Fixes https://github.com/llvm/llvm-project/issues/67042
2023-10-02 21:14:22 +08:00
Matt Arsenault
bc7d88faf1 CodeGen: Disable isCopyInstrImpl if there are implicit operands
This is a conservative workaround for broken liveness tracking of
SUBREG_TO_REG to speculatively fix all targets. The current reported
failures are on X86 only, but this issue should appear for all targets
that use SUBREG_TO_REG. The next minimally correct refinement would be
to disallow only implicit defs.

The coalescer now introduces implicit-defs of the super register to
track the dependency on other subregisters. If we see such an implicit
operand, we cannot simply treat the subregister def as the result
operand in case downstream users depend on the implicitly defined
parts. Really target implementations should be considering the
implicit defs and trying to interpret them appropriately (maybe with
some generic helpers). The full implicit def could possibly be
reported as the move result, rather than the subregister def but that
requires additional work.

Hopefully fixes #64060 as well.

This needs to be applied to the release branch.

https://reviews.llvm.org/D156346
2023-10-02 15:16:40 +03:00
Simon Pilgrim
2984e3529b [X86] matchIndexRecursively - fold zext(addlike(shl_nuw(x,c1),c2) patterns into LEA
Pulled out of D155472 - handle zeroextended scaled address indices
2023-10-02 12:38:25 +01:00
Simon Pilgrim
2908142089 [X86] Add test coverage for zext(or(shl_nuw(x,c1),c2)) pointer math
Additional test coverage for D155472
2023-10-02 12:38:25 +01:00
JP Lehr
e816c89c84 Revert "InlineSpiller: Consider if all subranges are the same when avoiding redundant spills"
This reverts commit d8127b2ba8a87a610851b9a462f2fc2526c36e37.
2023-10-02 06:26:33 -05:00