52796 Commits

Author SHA1 Message Date
hezuoqiang
f4ba1db5bf [GlobalISel] Fix the error transformation of BRCOND to BCC
Fix https://github.com/llvm/llvm-project/issues/62309

Reviewed By: dmgreen
Differential Revision: https://reviews.llvm.org/D150527
2023-07-13 21:20:01 +08:00
Simon Pilgrim
8d598531b3 Revert rGf269877dc30777354be8a512e871aba1b1f9fd7a "[X86] canonicalizeShuffleMaskWithHorizOp - fold permute(pack(x,y)) -> pack(shuffle(x,y),undef) iff we only demand the lower elements"
This appears to be causing some infinite loops (and is particularly bad when D152928 is applied).
2023-07-13 14:11:54 +01:00
Nikita Popov
7025ac81f0 [X86] Don't elide argument copies for scalarized vectors (PR63475)
When eliding argument copies, the memory layout between a plain
store of the type and the layout of the argument lowering on the
stack must match. For multi-part argument lowerings, this is not
necessarily the case.

The code already tried to prevent this optimization for "scalarized
and extended" vectors, but the check for "extends" was incomplete.
While a scalarized vector of i32s stores i32 values on the stack,
these are stored in 8 byte stack slots (on x86_64), so effectively
have padding.

Rather than trying to add more special cases to handle this (which
is not straightforward), I'm going in the other direction and
exclude scalarized vectors from this optimization entirely. This
seems like a rare case that is not worth the hassle -- the complete
lack of test coverage is not reassuring either.

Fixes https://github.com/llvm/llvm-project/issues/63475.

Differential Revision: https://reviews.llvm.org/D154078
2023-07-13 14:49:48 +02:00
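The layout mismatch described in 7025ac81f0 can be made concrete with a small sketch. This is only an illustration in Python, assuming a `<4 x i32>` vector and the 8-byte x86_64 stack slots mentioned in the message; the variable names are hypothetical and the padding bytes are written as zeros here although their real contents are unspecified.

```python
import struct

values = [1, 2, 3, 4]  # lanes of a hypothetical <4 x i32> argument

# A plain store of the vector type: four contiguous 4-byte lanes.
plain_store = struct.pack("<4i", *values)

# A scalarized argument on the x86_64 stack: each i32 sits in its own
# 8-byte slot, i.e. 4 bytes of value followed by 4 bytes of padding.
stack_slots = b"".join(struct.pack("<i4x", v) for v in values)

# Byte offset of each lane in the two layouts.
plain_offsets = [4 * i for i in range(len(values))]
slot_offsets = [8 * i for i in range(len(values))]

# Lane 1 lives at offset 4 in the plain store but at offset 8 on the stack.
lane1_plain = struct.unpack_from("<i", plain_store, plain_offsets[1])[0]
lane1_stack = struct.unpack_from("<i", stack_slots, slot_offsets[1])[0]
```

Because the two byte strings differ in both size and lane offsets, a load of the plain vector type from the argument's stack location would read padding instead of lanes, which is why the copy cannot be elided.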
Nikita Popov
f78a06ef11 [X86] Remove out of range extract in test (NFC)
As pointed out in https://reviews.llvm.org/D154078#inline-1500915.
2023-07-13 14:46:29 +02:00
Maurice Heumann
d1fc8f7211 [X86] Prevent infinite loop in SelectionDAG when lowering negations
In certain cases, lowering negations can cause an infinite loop in SelectionDAG on X86.

The following snippet shows that behaviour:
https://godbolt.org/z/5hP45T4hY

What happens is that ADD(XOR(..., -1), 1) is detected as the two's complement and transformed into SUB(0, ...).
However, immediates cannot be encoded as the LHS of a SUB on X86.
Therefore it is transformed back into an ADD/XOR pair, which is then again transformed into a SUB, and so on.

In that specific case, I still think it is valid to represent this as a SUB(0, ...), because it should eventually be lowered as a NEG,
which seems better than an ADD/XOR pair.

Adding an exception to the X86 specific handling for SUBs with 0 LHS operand fixes this infinite loop.

Differential Revision: https://reviews.llvm.org/D154575
2023-07-13 12:20:44 +01:00
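The ping-pong described in d1fc8f7211 can be illustrated with a toy rewrite system. This is a minimal sketch in Python, not the SelectionDAG code: the tuple-based nodes, the names `fold_negation`, `expand_sub_imm` and `run`, and the `skip_zero_lhs` flag are all hypothetical, and the second rule assumes the X86-specific rewrite has the shape SUB(C, x) -> ADD(XOR(x, -1), C + 1).

```python
MASK32 = 0xFFFFFFFF  # assume 32-bit wraparound arithmetic for the identity check


def add_xor_form(x):
    """ADD(XOR(x, -1), 1) evaluated on a 32-bit value."""
    return ((x ^ MASK32) + 1) & MASK32


def sub_form(x):
    """SUB(0, x) evaluated on a 32-bit value."""
    return (0 - x) & MASK32


def fold_negation(node):
    """Generic rule: ADD(XOR(x, -1), 1) -> SUB(0, x)."""
    if (node[0] == "add" and node[2] == 1
            and isinstance(node[1], tuple)
            and node[1][0] == "xor" and node[1][2] == -1):
        return ("sub", 0, node[1][1])
    return node


def expand_sub_imm(node, skip_zero_lhs):
    """Target rule: SUB(C, x) -> ADD(XOR(x, -1), C + 1) for a constant C."""
    if node[0] == "sub" and isinstance(node[1], int):
        if skip_zero_lhs and node[1] == 0:
            return node  # the exception: leave SUB(0, x) alone
        return ("add", ("xor", node[2], -1), node[1] + 1)
    return node


def run(node, skip_zero_lhs, limit=16):
    """Apply one rule at a time; return (node, rewrites) or (node, None) at the limit."""
    rewrites = 0
    while rewrites < limit:
        for rule in (fold_negation, lambda n: expand_sub_imm(n, skip_zero_lhs)):
            new = rule(node)
            if new != node:
                node, rewrites = new, rewrites + 1
                break
        else:
            return node, rewrites  # no rule fired: fixed point reached
    return node, None  # rules kept undoing each other


start = ("add", ("xor", "x", -1), 1)
looping = run(start, skip_zero_lhs=False)
settled = run(start, skip_zero_lhs=True)
```

Without the exception the two rules undo each other until the rewrite limit is hit; with it, the node settles on `("sub", 0, "x")` after a single rewrite.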
David Green
50378a16d4 [AArch64] Extra tablegen patterns for smaller extracted addl/addw/subl/subw
During lowering, especially of smaller vector types, we can end up with
`add(extract_subvector(zext(x)), extract_subvector(zext(y)))`, which can
be turned into `extract_subvector(add(zext(y), zext(x)))`, which can use
the addl AArch64 instruction. This adds some tablegen patterns for it,
along with addw where only one operand is an extract/extend and subl/subw.

Differential Revision: https://reviews.llvm.org/D153632
2023-07-13 11:44:17 +01:00
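The equivalence that 50378a16d4 relies on is lane-wise and can be checked with a short sketch. The Python below models vectors as lists and assumes 8 x i8 inputs zero-extended to 16 bits with the low four lanes extracted; the helper names mirror the node names in the message but are hypothetical stand-ins, not LLVM APIs.

```python
def zext(lanes, from_bits):
    """Zero-extend: treat each lane as an unsigned value of from_bits bits."""
    return [v & ((1 << from_bits) - 1) for v in lanes]


def extract_subvector(lanes, start, count):
    """Take count consecutive lanes starting at start."""
    return lanes[start:start + count]


def add(a, b, bits):
    """Lane-wise addition with wraparound at the given lane width."""
    return [(x + y) & ((1 << bits) - 1) for x, y in zip(a, b)]


x = [250, 3, 17, 255, 0, 128, 9, 64]
y = [200, 5, 1, 255, 7, 128, 2, 1]

# add(extract_subvector(zext(x)), extract_subvector(zext(y)))
narrow_first = add(extract_subvector(zext(x, 8), 0, 4),
                   extract_subvector(zext(y, 8), 0, 4), 16)

# extract_subvector(add(zext(y), zext(x)))
widen_first = extract_subvector(add(zext(y, 8), zext(x, 8), 16), 0, 4)
```

Both orders give the same four 16-bit lanes, so the second form, which matches a widening add, can replace the first.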
Luke Lau
ed15e9119b [RISCV] Don't fold vmerge into ops if fp exception can be raised
We are already checking for fp exceptions if VL changes, but I believe we
should also be checking for them if the mask changes as well, since that also
affects the set of active elements. From the spec:
> A vector floating-point exception at any active floating-point element sets
> the standard FP exception flags in the fflags register. Inactive elements do
> not set FP exception flags.

Note that we don't change the mask if IsMasked is true, i.e. True is masked
already, since in that case we keep the existing mask.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D154980
2023-07-13 11:42:23 +01:00
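The effect of the mask on exception flags, as discussed in ed15e9119b, can be sketched with a toy model. This Python snippet is an assumption-laden illustration: it uses a division and the divide-by-zero flag because that flag is easy to model, whereas the tests in the series use a constrained fadd, and the function name `div_by_zero_flag` is hypothetical.

```python
def div_by_zero_flag(a, b, active):
    """Return True if any active lane divides a non-zero value by zero."""
    return any(on and y == 0.0 and x != 0.0
               for x, y, on in zip(a, b, active))


a = [1.0, 2.0, 3.0, 4.0]
b = [1.0, 2.0, 0.0, 4.0]

# Before folding: the operation runs on every lane, and a separate vmerge
# picks lanes afterwards.
all_active = [True, True, True, True]
unfolded_flag = div_by_zero_flag(a, b, all_active)

# After folding the vmerge into the operation: lane 2 becomes inactive.
merge_mask = [True, True, False, True]
folded_flag = div_by_zero_flag(a, b, merge_mask)
```

The unfolded form raises the flag and the folded form does not, so the fold changes observable state whenever the operation may raise an exception.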
Luke Lau
becfb4612a [RISCV] Add test for vmerge combine that should be prevented
The fadd in these test cases is constrained and may set fflags differently
depending on the active elements (the nofpexcept flag isn't set on the node).
Therefore to preserve semantics we shouldn't change its mask.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D154979
2023-07-13 11:42:20 +01:00
Jon Chesterfield
9418c40af7 [amdgpu][lds] Raise an explicit unimplemented error on absolute address LDS variables
These aren't implemented. They could be, with moderate implementation
complexity. Raising an error is better than silently miscompiling.

Patching now because the patch at D155125 is a step towards using this metadata
more extensively as part of the lowering path and that will interact badly with
input variables with this annotation.

Lowering user defined variables at specific addresses would drop this error,
put them at the requested position in the frame during this pass, and then
use the same codegen that will be used for the kernel specific struct shortly.

Reviewed By: jmmartinez

Differential Revision: https://reviews.llvm.org/D155132
2023-07-13 11:32:03 +01:00
Simon Pilgrim
451af63551 [X86] Remove combineVectorTruncation and delay general vector trunc to lowering
Stop folding vector truncations to PACKSS/PACKUS patterns prematurely - another step towards Issue #63710. We still prematurely fold to PACKSS/PACKUS if there are sufficient signbits; that will be addressed in a later patch when we remove combineVectorSignBitsTruncation.

This required ReplaceNodeResults to extend handling of sub-128-bit results to SSSE3 (or later) cases, which has allowed us to improve vXi32->vXi16 truncations to use PSHUFB.

I also tweaked LowerTruncateVecPack to recognise widened truncation source operands so the upper elements remain UNDEF (otherwise truncateVectorWithPACK* will constant fold them to allzeros/allones values).
2023-07-13 11:29:21 +01:00
Simon Pilgrim
228442a14c [X86] canonicalizeShuffleMaskWithHorizOp - fold 256-bit permute(hop(x,y)) -> hop(extract(x),extract(x)) iff we only demand the lower elements
Attempt to recognise when we can narrow a 256-bit hop to a lower 128-bit hop by extracting the requested subvectors (and then widening back)
2023-07-13 10:37:08 +01:00
Simon Pilgrim
f269877dc3 [X86] canonicalizeShuffleMaskWithHorizOp - fold permute(pack(x,y)) -> pack(shuffle(x,y),undef) iff we only demand the lower elements
Help expose undef elements for further shuffle combines

Noticed while trying to improve truncation packss/packus patterns for sub-128-bit results.
2023-07-13 10:37:08 +01:00
eopXD
2c38d63323 [8/8][RISCV] Add rounding mode control variant for vfredosum, vfredusum, vfwredosum, vfwredusum
Depends on D154635

For the cover letter of the patch-set, please check out D154628.

This is the 8th patch of the patch-set.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D154636
2023-07-13 00:55:10 -07:00
eopXD
5d18d43f26 [7/8][RISCV] Add rounding mode control variant for conversion intrinsics between floating-point and integer
Depends on D154634

For the cover letter of the patch-set, please check out D154628.

This is the 7th patch of the patch-set. This patch includes changes to
vfcvt_x_f, vfcvt_xu_f, vfwcvt_x_f, vfwcvt_xu_f, vfncvt_x_f, vfncvt_xu_f,
vfcvt_f_x, vfcvt_f_xu, vfncvt_f_x, vfncvt_f_xu, vfncvt_f_f

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D154635
2023-07-13 00:54:07 -07:00
eopXD
51b9e33661 [6/8][RISCV] Add rounding mode control variant for vfsqrt, vfrec7
Depends on D154633

For the cover letter of the patch-set, please check out D154628.

This is the 6th patch of the patch-set.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D154634
2023-07-13 00:51:51 -07:00
eopXD
4085b23609 [5/8][RISCV] Add rounding mode control variant for vfwmacc, vfwnmacc, vfwmsac, vfwnmsac
Depends on D154632

For the cover letter of the patch-set, please check out D154628.

This is the 5th patch of the patch-set.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D154633
2023-07-13 00:49:59 -07:00
eopXD
e1f224a647 [4/8][RISCV] Add rounding mode control variant for vfmacc, vfnmacc, vfmsac, vfnmsac, vfmadd, vfnmadd, vfmsub, vfnmsub
Depends on D154631

For the cover letter of the patch-set, please check out D154628.

This is the 4th patch of the patch-set.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D154632
2023-07-13 00:47:27 -07:00
eopXD
1a905e8238 [3/8][RISCV] Add rounding mode control variant for vfmul, vfdiv, vfrdiv, vfwmul
Depends on D154629

For the cover letter of the patch-set, please check out D154628.

This is the 3rd patch of the patch-set.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D154631
2023-07-13 00:43:54 -07:00
eopXD
00093667b1 [2/8][RISCV] Add rounding mode control variant for vfwadd, vfwsub
Depends on D154628

For the cover letter of the patch-set, please check out D154628.

This is the 2nd patch of the patch-set.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D154629
2023-07-13 00:42:00 -07:00
eopXD
474e37c113 [1/8][RISCV] Add rounding mode control variant for vfsub, vfrsub
Depends on D152996.

This patch-set aims to add a variant for the RVV floating-point
intrinsics that controls the rounding mode (`frm`). The rounding mode
variant appends `_rm` before the policy suffix to distinguish from
those without them.

Specification PR: riscv-non-isa/rvv-intrinsic-doc#226

This is the 1st patch of the patch-set.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D154628
2023-07-13 00:35:36 -07:00
eopXD
76482078cd [RISCV][POC] Model frm control for vfadd
Depends on D152879.

Specification PR: riscv-non-isa/rvv-intrinsic-doc#226

This patch adds a variant of `vfadd` that models the rounding mode control.
The added variant has the suffix `_rm` appended to differentiate it from the
existing ones, which do not alter `frm` and use whatever value it holds.

The value `7` is used to indicate no rounding mode change, reusing the
semantics of the rounding mode encoding for scalar floating-point
instructions.

An additional data member `HasFRMRoundModeOp` is added so we can append the
`_rm` suffix for the fadd variants that model rounding mode control.

Additional data member `IsRVVFixedPoint` is added so we can define
pseudo instructions with rounding mode operand and distinguish the
instructions between fixed-point and floating-point.

Reviewed By: craig.topper, kito-cheng

Differential Revision: https://reviews.llvm.org/D152996
2023-07-13 00:34:00 -07:00
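The meaning of the value `7` in 76482078cd follows the scalar floating-point rounding mode encoding, where 7 selects the dynamic mode held in `frm`. The Python sketch below lists that encoding and shows how an operand of 7 leaves the current mode alone; the class and function names are hypothetical and are not taken from the patch.

```python
from enum import IntEnum


class RoundingMode(IntEnum):
    """Rounding mode encoding used by RISC-V scalar floating-point instructions."""
    RNE = 0  # round to nearest, ties to even
    RTZ = 1  # round towards zero
    RDN = 2  # round down
    RUP = 3  # round up
    RMM = 4  # round to nearest, ties to max magnitude
    DYN = 7  # use whatever is currently in frm


def effective_rounding_mode(rm_operand, current_frm):
    """Return the mode an `_rm` variant would run with, given its operand."""
    if rm_operand == RoundingMode.DYN:
        return RoundingMode(current_frm)  # 7 means: do not change frm
    return RoundingMode(rm_operand)


unchanged = effective_rounding_mode(7, RoundingMode.RTZ)
overridden = effective_rounding_mode(RoundingMode.RUP, RoundingMode.RNE)
```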
pvanhout
361e9eec51 [AMDGPU] Correctly emit AGPR copies in tryFoldPhiAGPR
- Don't create COPY instructions between PHI nodes.
- Don't create V_ACCVGPR_WRITE with operands that aren't AGPR_32

Solves SWDEV-410408

Reviewed By: #amdgpu, arsenm

Differential Revision: https://reviews.llvm.org/D155080
2023-07-13 08:55:22 +02:00
Caslyn Tonelli
b11559122e Revert "[ARM] Restructure MOVi32imm expansion to not do pointless instructions"
This reverts commit 647aff28558b6b1379f0892138059b403192512a.

Differential Revision: https://reviews.llvm.org/D155122
2023-07-12 23:29:15 +00:00
Jon Roelofs
56e60bc5bb TargetLowering: fix an infinite DAG combine in SimplifySETCC
TargetLowering::SimplifySetCC wants to swap the operands of a SETCC to
canonicalize the constant to the RHS. The bug here was that it did so whether
or not the RHS was already a constant, leading to an infinite loop.

rdar://111847838

Differential Revision: https://reviews.llvm.org/D155095

This reverts commit cdc633e4bc93d4bf241ecd4c29691ae065749313.
2023-07-12 16:13:27 -07:00
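The canonicalization bug described in 56e60bc5bb can be modeled in a few lines. This Python sketch is a toy, not the TargetLowering code: condition codes are plain strings, constants are Python ints, and the names `canonicalize_buggy`, `canonicalize_fixed` and `count_rewrites` are hypothetical.

```python
# Condition code to use after swapping the two operands of a comparison.
SWAPPED = {"lt": "gt", "gt": "lt", "le": "ge", "ge": "le", "eq": "eq", "ne": "ne"}


def is_const(value):
    return isinstance(value, int)


def canonicalize_buggy(cc, lhs, rhs):
    """Move a constant LHS to the RHS, even if the RHS is already a constant."""
    if is_const(lhs):
        return SWAPPED[cc], rhs, lhs
    return cc, lhs, rhs


def canonicalize_fixed(cc, lhs, rhs):
    """Move a constant LHS to the RHS only when the RHS is not a constant."""
    if is_const(lhs) and not is_const(rhs):
        return SWAPPED[cc], rhs, lhs
    return cc, lhs, rhs


def count_rewrites(rule, node, limit=16):
    """Apply rule until nothing changes; return None if the limit is reached."""
    for rewrites in range(limit):
        new = rule(*node)
        if new == node:
            return rewrites
        node = new
    return None


both_const = ("lt", 1, 2)
buggy_result = count_rewrites(canonicalize_buggy, both_const)
fixed_result = count_rewrites(canonicalize_fixed, both_const)
one_const = canonicalize_fixed("lt", 1, "x")
```

With two constant operands the buggy rule swaps forever, while the fixed rule leaves the node alone and still canonicalizes the ordinary single-constant case.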
Philip Reames
b5cbd9628e [RISCV] Remove legacy TA/TU pseudo distinction of vmerge and carry-in arithmetic operations [NFC]
This change continues with the line of work discussed in https://discourse.llvm.org/t/riscv-transition-in-vector-pseudo-structure-policy-variants/71295.

This is analogous to other patches in the series, but with one key difference - the resulting pseudo does *not* have a policy operand. We could add one for vmerge, but some of the multiclasses are sufficiently entwined with the mask producing arithmetic instructions that the change delta becomes unmanageable. Note that these instructions are *not* in the RISCVMaskedPseudo table, and thus the difference doesn't complicate other code. The main value of working incrementally here is that we get to eagerly clean up the IsTA logic flowing through the post-ISEL combines.

Differential Revision: https://reviews.llvm.org/D154645
2023-07-12 15:31:02 -07:00
Noah Goldstein
a4c461c063 [SelectionDAG] Fill in some more cases in isKnownNeverZero
This mostly copies cases that already exist in ValueTracking, although
it skips the more complex ones. Those can be filled in as needed.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D149199
2023-07-12 17:17:53 -05:00
Noah Goldstein
24f752ed2e [X86] Add tests for checking isKnownNeverZero; NFC
Differential Revision: https://reviews.llvm.org/D149299
2023-07-12 17:17:53 -05:00
Noah Goldstein
74f0ec5e24 [DAGCombiner] Make it so that udiv can be folded with (select c, NonZero, 1)
This is done by allowing speculation of `udiv` if we can prove the
denominator is non-zero.

https://alive2.llvm.org/ce/z/VNCt_q

Differential Revision: https://reviews.llvm.org/D149198
2023-07-12 17:17:53 -05:00
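The reasoning in 74f0ec5e24 is that a `udiv` may be computed speculatively on both arms of a select once the divisor is known to be non-zero. The Python sketch below checks that equivalence on small unsigned values; the function names are hypothetical and integer division stands in for `udiv`.

```python
def udiv_of_select(x, c, nonzero):
    """udiv x, (select c, NonZero, 1): pick the divisor first, then divide."""
    divisor = nonzero if c else 1
    return x // divisor


def select_of_udivs(x, c, nonzero):
    """Folded form: both divisions are computed eagerly, then one is selected."""
    taken = x // nonzero  # safe to speculate only because nonzero != 0
    not_taken = x // 1    # dividing by 1 is just x
    return taken if c else not_taken


# Exhaustive check over a small range of dividends, divisors and conditions.
equivalent = all(
    udiv_of_select(x, c, k) == select_of_udivs(x, c, k)
    for x in range(64)
    for k in range(1, 16)
    for c in (False, True)
)
```

If the divisor could be zero, the eager division in the folded form would trap or be undefined even when the other arm is selected, which is why the non-zero proof is required.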
Noah Goldstein
eccb454177 [X86] Add tests for div/rem %x, (select c, <const>, 1); NFC
Reviewed By: pengfei

Differential Revision: https://reviews.llvm.org/D149197
2023-07-12 17:17:52 -05:00
Mikhail Gudim
17e2df6695 [RISCV] Removed the requirement of XLenVT for performSELECTCombine.
Reviewed By: Craig Topper

Differential Revision: https://reviews.llvm.org/D153044
2023-07-12 16:29:09 -04:00
Hiroshi Yamauchi
6e6d2b7859 [AArch64][Windows] Fix the callee-saved registers for swiftcc on Windows/AArch64
Fix a miscompilation crash where swiftself on x20 gets corrupted due
to incorrect save/restore at prologue/epilogue.

Differential Revision: https://reviews.llvm.org/D155001
2023-07-12 13:13:21 -07:00
Jon Roelofs
cdc633e4bc Revert "TargetLowering: fix an infinite DAG combine in SimplifySETCC"
This reverts commit b76c85b355578d9076c22a86faf4ea8de1745bdf.

It broke the RISCV-enabled bots. Oops.
2023-07-12 12:22:03 -07:00
Jon Roelofs
b76c85b355 TargetLowering: fix an infinite DAG combine in SimplifySETCC
TargetLowering::SimplifySetCC wants to swap the operands of a SETCC to
canonicalize the constant to the RHS. The bug here was that it did so whether
or not the RHS was already a constant, leading to an infinite loop.

rdar://111847838

Differential revision: https://reviews.llvm.org/D155095
2023-07-12 11:44:15 -07:00
Jingu Kang
33e60484d7 [MachineLICM] Handle Subloops
The MachineLICM pass handles inner loops only when the outermost loop does not have a unique
predecessor. If a loop has a preheader and contains loop-invariant code, the
invariant code can in general be hoisted to the preheader. This patch makes the
pass handle inner loops in general.

Differential Revision: https://reviews.llvm.org/D154205
2023-07-12 16:32:14 +01:00
Craig Topper
1aecb0e000 [RISCV] Clear kill flags when forming FMA instructions in MachineCombiner.
If the operands to the mul have other uses we may be extending their
live range past a kill flag.

Reviewed By: asb, asi-sc

Differential Revision: https://reviews.llvm.org/D155046
2023-07-12 08:03:45 -07:00
Craig Topper
45b172c838 [LegalizeDAG] Prevent LegalizeLoadOps from creating extloads that mix int and fp types.
For RISC-V, getRegisterType for fp16 returns i16. i16->fp64 extload
is considered legal because the LoadExtActions defaults to Legal
for all entries. Only fp/fp and int/int entries are changed to
Expand for RISC-V.

This patch detects that the FP-ness has changed and won't try to call
isLoadExtLegal.

Alternatively, we could add Expand for int/fp and fp/int, but that
seemed a little silly.

Fixes #63816

Reviewed By: asb, wangpc

Differential Revision: https://reviews.llvm.org/D155040
2023-07-12 08:03:35 -07:00
Jay Foad
58d1eaa3b6 [CodeGen] Store SP adjustment in MachineBasicBlock. NFCI.
Record the SP adjustment on entry to each basic block. This is almost
always zero except on targets like ARM which can split a basic block in
the middle of a call sequence.

This simplifies PEI::replaceFrameIndices which previously had to visit
basic blocks in a specific order and had special handling for
unreachable blocks. More importantly it paves the way for an equally
simple implementation of a backwards version of replaceFrameIndices,
which is required to fully convert PrologEpilogInserter to backwards
register scavenging, which is preferred because it does not rely on
accurate kill flags.

Differential Revision: https://reviews.llvm.org/D154281
2023-07-12 14:29:26 +01:00
Marco Elver
de79233b2e [X86] Complete preservation of !pcsections in X86ISelLowering
https://reviews.llvm.org/D130883 introduced MIMetadata to simplify
metadata propagation (DebugLoc and PCSections).

However, we're currently still permitting implicit conversion of
DebugLoc to MIMetadata, to allow for a gradual transition and let the
old code work as-is.

This manifests in lost !pcsections metadata for X86-specific lowerings.
For example, 128-bit atomics.

Fix the situation for X86ISelLowering by converting all BuildMI() calls
to use an explicitly constructed MIMetadata.

Reviewed By: dvyukov

Differential Revision: https://reviews.llvm.org/D154986
2023-07-12 15:09:31 +02:00
Matt Devereau
fa58aa8e91 [SVE2p1][SME2] Add scalar addressing mode for LD1
Add the scalar addressing mode for multi vector LD1 instructions.

Differential Revision: https://reviews.llvm.org/D154829
2023-07-12 13:08:38 +00:00
Maciej Gabka
5b0e19a7ab [TLI][AArch64] Add mappings to vectorized functions from ArmPL
Arm Performance Libraries contain a math library which provides
vectorized versions of common math functions.
This patch allows it to be used with clang and llvm via -fveclib=ArmPL or
-vector-library=ArmPL, so loops with such calls can be vectorized.
The executable needs to be linked with the amath library.

Arm Performance Libraries are available at:
https://developer.arm.com/Tools%20and%20Software/Arm%20Performance%20Libraries

Reviewed by: paulwalker-arm
Differential Revision: https://reviews.llvm.org/D154508
2023-07-12 12:53:18 +00:00
Nikita Popov
edb2fc6dab [llvm] Remove explicit -opaque-pointers flag from tests (NFC)
Opaque pointers mode is enabled by default, no need to explicitly
enable it.
2023-07-12 14:35:55 +02:00
Ivan Kosarev
15e7749e19 [Codegen] Generate fast fp64-to-fp16 conversions in unsafe mode.
Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D154528
2023-07-12 11:55:19 +01:00
John Brawn
210f61cbdd [ARM] Correctly handle execute-only in EmitStructByval
Currently when compiling for an execute-only target without movt then
EmitStructByval will generate a constant pool load which isn't
compatible with execute-only. Handle this by emitting tMOVi32imm,
and also simplify the existing movt handling by emitting t2MOVi32imm
or MOVi32imm.

Differential Revision: https://reviews.llvm.org/D154944
2023-07-12 11:48:01 +01:00
John Brawn
647aff2855 [ARM] Restructure MOVi32imm expansion to not do pointless instructions
The expansion of the various MOVi32imm pseudo-instructions works by
splitting the operand into components (either halfwords or bytes) and
emitting instructions to combine those components into the final
result. When the operand is an immediate with some components being
zero this can result in pointless instructions that just add zero.

Avoid this by restructuring things so that a separate function handles
splitting the operand into components, then don't emit the component
if it is a zero immediate. This is straightforward for movw/movt,
where we just don't emit the movt if it's zero, but the thumb1
expansion using mov/add/lsl is more complex, as even when we don't
emit a given byte we still need to get the shift correct.

Differential Revision: https://reviews.llvm.org/D154943
2023-07-12 11:48:01 +01:00
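The movw/movt case described in 647aff2855 can be sketched as follows. This Python model covers only the halfword split and is not the ARM backend code: instructions are modeled as tuples, the function names are hypothetical, and the more involved thumb1 mov/add/lsl expansion is left out.

```python
def split_halfwords(imm):
    """Split a 32-bit immediate into its (low, high) 16-bit components."""
    return imm & 0xFFFF, (imm >> 16) & 0xFFFF


def expand_movi32imm(imm):
    """Emit movw for the low half and movt only when the high half is non-zero."""
    lo, hi = split_halfwords(imm)
    insts = [("movw", lo)]
    if hi != 0:
        insts.append(("movt", hi))
    return insts


def execute(insts):
    """Interpret the emitted sequence on a single 32-bit register."""
    reg = 0
    for op, value in insts:
        if op == "movw":
            reg = value  # writes the low half and zeroes the high half
        elif op == "movt":
            reg = (reg & 0xFFFF) | (value << 16)  # writes only the high half
    return reg


small = expand_movi32imm(0x1234)
large = expand_movi32imm(0xDEAD1234)
```

Skipping the movt is safe here because movw already zeroes the upper halfword, so the register ends up with the same value either way.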
David Stenberg
6aa94c64a5 [DWARF] Add printout for op-index
This is a preparatory patch for extending DWARFDebugLine to properly
parse line number programs with maximum_operations_per_instruction > 1
for VLIW targets.

Add some scaffolding for handling op-index in line number programs, and
add printouts for that in the table. As this affects a lot of tests,
this is done in a separate commit to get a cleaner review for the actual
op-index implementation.

Verbose printouts are not present in many tests, and adding op-index to
those will require a bit more code changes, so that is done in the
actual implementation patch.

Reviewed By: StephenTozer

Differential Revision: https://reviews.llvm.org/D152535
2023-07-12 12:03:44 +02:00
Nikita Popov
d69033d245 [SCEVExpander] Fix GEP IV inc reuse logic for opaque pointers
Instead of checking the pointer type, check the element type of
the GEP.

Previously we ended up reusing GEP increments that were not in
expanded form, thus not respecting LSR's choice of representation.

The change in 2011-10-06-ReusePhi.ll recovers a regression that
appeared when converting that test to opaque pointers.

Changes in various Thumb tests now compute the step outside the
loop instead of using add.w inside the loop, which is LSR's
preferred representation for this target.
2023-07-12 11:32:13 +02:00
David Green
86780f49ef [AArch64] Fix order of isReg and isDef checks in INSvi64 peephole.
The isDef asserts that the operand isReg, so the checks need to happen in the
other order.
2023-07-12 09:51:28 +01:00
Jay Foad
f7684d8510 [DAG] Use legal shift amount type in DAGTypeLegalizer::JoinIntegers
Documentation for TargetLowering::getShiftAmountTy says that LegalTypes
should generally be true during type legalization, so this patch does
that.

On AMDGPU the effect is that we use i32 (a sane type) instead of i64
(pointer sized type) for more shift amounts, which in turn allows more
formation of rotates and funnel shifts pre-legalization.

Differential Revision: https://reviews.llvm.org/D154960
2023-07-12 08:12:09 +01:00
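For context on f7684d8510, joining two integer halves amounts to shifting the high part left by the width of the low part and OR-ing the two together, and the shift amount is the constant whose type the patch changes. The Python sketch below shows that arithmetic only; the function name and parameters are hypothetical, and Python ints carry no notion of the i32 versus i64 amount type.

```python
def join_integers(lo, hi, lo_bits, hi_bits):
    """Combine two halves into one wide integer: lo | (hi << lo_bits)."""
    lo &= (1 << lo_bits) - 1   # zero-extend the low part
    hi &= (1 << hi_bits) - 1   # zero-extend the high part
    shift_amount = lo_bits     # in the DAG this is a constant of the shift amount type
    return lo | (hi << shift_amount)


joined = join_integers(0x5678, 0x1234, 16, 16)
```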
Han Shen
65ef4d4357 [CodeGen] Part II of "Fine tune MachineFunctionSplitPass (MFS) for FSAFDO".
This CL adds a new discriminator pass. Also adds a new sample profile
loading pass when MFS is enabled.

Differential Revision: https://reviews.llvm.org/D152577
2023-07-11 22:40:25 -07:00
Jon Chesterfield
e75ce77cd7 [amdgpu][lds] Fix missing markUsedByKernel calls and undef lookup table elements
More robust association between the kernels and lds struct.

Use poison instead of value() for lookup table elements introduced by dynamic lds lowering.

Extracted from D154946, new test from there verbatim. Segv fixed.

Fixes issues/63338

Fixes SWDEV-404491

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D154972
2023-07-12 00:37:21 +01:00