52796 Commits

Author SHA1 Message Date
Matthew Devereau
23ea98f155
[AArch64][SVE2] Do not emit RSHRNB for large shifts (#66672)
rshrnb's shift amount operand must be between 1 and EltSizeInBits. This
patch stops RSHRNB ISD nodes from being emitted when the shift amount is
outside that range.
2023-09-22 10:36:54 +01:00
Ivan Kosarev
469b3bfad2 [AMDGPU] Add True16 register classes.
Reviewed By: rampitec, Joe_Nash

Differential Revision: https://reviews.llvm.org/D156099
2023-09-22 10:17:02 +01:00
Fangrui Song
4389252c58 Revert "[DAG] getNode() - remove oneuse limit from (zext (trunc (assertzext x))) -> (assertzext x) fold"
This reverts commit 05926a5a557878aa233ac8431b3acddf54422e58.

Caused AArch64 crash

 #12 0x00007f09eec09181 skipExtensionForVectorMULL(llvm::SDNode*, llvm::SelectionDAG&)
 #13 0x00007f09eec08289 llvm::AArch64TargetLowering::LowerMUL(llvm::SDValue, llvm::SelectionDAG&) const
 #14 0x00007f09eec1a3fd llvm::AArch64TargetLowering::LowerOperation(llvm::SDValue, llvm::SelectionDAG&) const
 #15 0x00007f09dc8586a7 (anonymous namespace)::VectorLegalizer::LowerOperationWrapper(llvm::SDNode*, llvm::SmallVectorImpl<llvm::SDValue>&)
2023-09-22 00:14:31 -07:00
David Green
22f423aa46 [ARM] Add some extra testing for MVE postinc loops. NFC 2023-09-22 07:08:49 +01:00
Amara Emerson
985362e2f3
[AArch64][GlobalISel] Avoid running the shl(zext(a), C) -> zext(shl(a, C)) combine. (#67045) 2023-09-22 09:37:52 +08:00
Amara Emerson
ddddf7f35e
[AArch64][GlobalISel] Split offsets of consecutive stores to aid STP … (#66980) 2023-09-22 09:35:43 +08:00
Arthur Eubanks
9b6b2a0cec
[X86] Use RIP-relative for non-globals in medium code model in classifyLocalReference() (#67070)
We only want to treat globals as potentially far away, not other things
like constants in the constant pool.
This matches the object file emission that only puts the large section
flag on globals.

Remove FIXME since the remaining differences are accesses to 0 sized
globals which are intentional.
2023-09-21 16:50:33 -07:00
Artem Belevich
d06b3e3b6a
[NVPTX] improve lowering for common byte-extraction operations. (#66945)
Some of our critical code paths depend on efficient byte extraction
from data loaded as integers. By default LLVM tries to extract bytes by
storing to and loading from the stack, which is very inefficient on GPU.
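
As an illustration only, the operation being discussed can be sketched in a few lines of Python. This shows what extracting a byte from data loaded as an integer means arithmetically (a shift followed by a mask). `extract_byte` is a hypothetical helper, and the instruction sequence the NVPTX backend actually selects is not shown here.

```python
def extract_byte(word: int, index: int) -> int:
    """Return byte `index` (0 = least significant) of a 32-bit word."""
    if not 0 <= index < 4:
        raise ValueError("index must be in 0..3 for a 32-bit word")
    # Shift the wanted byte down to bit 0, then mask off everything above it.
    return (word >> (8 * index)) & 0xFF


word = 0xAABBCCDD
print([hex(extract_byte(word, i)) for i in range(4)])
```

The point of the sketch is that no memory round trip is needed: the byte is recovered with register-only arithmetic.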
2023-09-21 13:48:54 -07:00
Matthew Devereau
b967f3a1d7
[AArch64] Separate PNR into its own Register Class (#65306)
This patch separates PNR registers into their own register class instead
of sharing a register class with PPR registers. This primarily allows us
to return more accurate register classes when applying assembly
constraints, but it also provides more protection against supplying an
incorrect predicate type to an invalid register operand.
2023-09-21 19:53:16 +01:00
Momchil Velikov
3769aaaf1f [AArch64] Pre-commit some tests for D152828 (NFC)
Generate a few of the relevant tests with `update_llc_test_checks.py`
and pre-commit. Makes it easier to spot the differences in D152828.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D157116
2023-09-21 18:49:14 +01:00
Momchil Velikov
0eb0a65d0f [AArch64] Correctly determine if {ADD,SUB}{W,X}rs instructions are cheap
These are marked to be "as cheap as a move".

According to publicly available Software Optimization Guides, they
have one cycle latency and maximum throughput only on some
microarchitectures, only for `LSL` and only for some shift amounts.

This patch uses the subtarget feature `FeatureALULSLFast` to determine
how cheap the instructions are.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D152827

Change-Id: I8f0d7e79bcf277ebf959719991c29a1bc7829486
2023-09-21 18:44:24 +01:00
Momchil Velikov
ededcb0041 [AArch64] Refactor AArch64InstrInfo::isAsCheapAsAMove (NFC)
- remove `FeatureCustomCheapAsMoveHandling`: when you have target
  features affecting `isAsCheapAsAMove` that can be given on the command
  line or passed via attributes, then every sub-target effectively has
  custom handling

- remove special handling of `FMOVD0`/etc: `FMOV` with an immediate
  zero operand is never[1] more expensive than an `FMOV` with a
  register operand.

- remove special handling of `COPY`: copy is trivially as cheap as
  itself

- make the function default to the `MachineInstr` attribute
  `isAsCheapAsAMove`

- remove special handling of `ANDWrr`/etc and of `ANDWri`/etc: the
  fallback `MachineInstr` attribute is already non-zero.

- remove special handling of `ADDWri`/`SUBWri`/`ADDXri`/`SUBXri`:
  these always[1] have one cycle latency with maximum (for the
  micro-architecture) throughput

- check if `MOVi32Imm`/`MOVi64Imm` can be expanded into a "cheap"
  sequence of instructions

  There is a little twist with determining whether a
  `MOVi32Imm`/`MOVi64Imm` is "as-cheap-as-a-move". Even if one of these
  pseudo-instructions needs to be expanded to more than one MOVZ,
  MOVN, or MOVK instruction, materialisation may be preferable to
  allocating a register to hold the constant. For the moment a cutoff
  at two instructions seems like a reasonable compromise.

[1] according to 19 software optimisation manuals
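
The two-instruction cutoff for immediate materialisation can be sketched as follows. This is a simplified model, not the code in the patch: it assumes an immediate is built only from one MOVZ or MOVN followed by MOVK instructions, one per remaining 16-bit chunk, and ignores any other encodings the real expansion may use. The names `mov_sequence_length` and `is_cheap_immediate` are hypothetical.

```python
def mov_sequence_length(imm: int, width: int = 64) -> int:
    """Estimate how many MOVZ/MOVN/MOVK instructions build `imm`.

    Simplified model: start from all-zeros (MOVZ) or all-ones (MOVN),
    then patch each 16-bit chunk that still differs with one MOVK.
    """
    imm &= (1 << width) - 1
    chunks = [(imm >> shift) & 0xFFFF for shift in range(0, width, 16)]
    via_movz = sum(1 for c in chunks if c != 0x0000)  # chunks MOVZ/MOVK must write
    via_movn = sum(1 for c in chunks if c != 0xFFFF)  # chunks MOVN/MOVK must write
    # At least one instruction is always needed, even for 0 or all-ones.
    return max(1, min(via_movz, via_movn))


def is_cheap_immediate(imm: int, width: int = 64, cutoff: int = 2) -> bool:
    """Apply the cutoff described above (two instructions by default)."""
    return mov_sequence_length(imm, width) <= cutoff


for value in (0x0, 0x12345678, 0xFFFF0000FFFF1234, 0x123456789ABCDEF0):
    print(hex(value), mov_sequence_length(value), is_cheap_immediate(value))
```

Under this model a constant with at most two "interesting" 16-bit chunks is treated as cheap to rematerialise, while a constant that needs three or four instructions is better kept in a register.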

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D154722
2023-09-21 18:30:01 +01:00
Sirish Pande
e6f9483f77
[SelectionDAG] Flags are dropped when creating a new FMUL (#66701)
While simplifying some vector operators in DAG combine, we may need to
create new instructions for simplified vectors. At that time, we need to
make sure that all the flags of the new instruction are copied/modified
from the old instruction.

If "contract" is dropped from an instruction like FMUL, an FMA
instruction may not be generated, which would hurt performance.

Here's an example where the "contract" flag is dropped when FMUL is created.

Replacing.2 t42: v2f32 = fmul contract t41, t38
With: t48: v2f32 = fmul t38, t38

Co-authored-by: Sirish Pande <sirish.pande@amd.com>
2023-09-21 10:26:34 -05:00
Luke Lau
b5ff71e261
[RISCV] Shrink vslideup's LMUL when lowering fixed insert_subvector (#65997)
Similar to #65598, if we're using a vslideup to insert a fixed length
vector into another vector, then we can work out the minimum number of
registers it will need to slide up across given the minimum VLEN, and
shrink the type operated on to reduce LMUL accordingly.
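
A rough sketch of the arithmetic, under stated assumptions: elements are laid out contiguously from element 0, the slide has to cover everything up to the last inserted element, and the register group is rounded up to a power of two. Fractional LMUL and the cap imposed by the original container type are ignored. The function name and parameters are hypothetical and do not come from the patch.

```python
def min_lmul_for_slideup(insert_idx: int, sub_len: int,
                         elt_bits: int, min_vlen: int) -> int:
    """Smallest power-of-two LMUL whose registers cover the inserted range.

    insert_idx: element index at which the subvector is inserted
    sub_len:    number of elements in the fixed-length subvector
    elt_bits:   element width in bits (SEW)
    min_vlen:   guaranteed minimum VLEN in bits
    """
    # Bit position just past the last element the slide has to write.
    end_bit = (insert_idx + sub_len) * elt_bits
    # Number of vector registers touched, assuming only min_vlen bits each.
    regs = -(-end_bit // min_vlen)  # ceiling division
    lmul = 1
    while lmul < regs:
        lmul *= 2
    return lmul


print(min_lmul_for_slideup(insert_idx=2, sub_len=2, elt_bits=32, min_vlen=128))
print(min_lmul_for_slideup(insert_idx=8, sub_len=2, elt_bits=32, min_vlen=128))
```

Using the minimum VLEN is the conservative choice: a larger real VLEN can only reduce the number of registers the inserted range spans.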

This is somewhat dependent on #66211 , since it introduces a subregister
copy that triggers a crash with -early-live-intervals in one of the
tests.

Stacked upon #66211
2023-09-21 13:55:49 +01:00
David Spickett
1778d6802b Revert "[AArch64][GlobalISel] Adopt dup(load) -> LD1R patterns from SelectionDAG"
This reverts commit fb8f59156f0f208f6192ed808fc223eda6c0e7ec and
b8e9450acb1ad10d002a85b7dafa9d14c764478f.

Due to test suite failures on AArch64:
https://lab.llvm.org/buildbot/#/builders/183/builds/16057
2023-09-21 12:33:11 +00:00
Jeffrey Byrnes
acb4854563
[AMDGPU] Precommit test for D159533 (#66965)
Precommit test ahead of https://reviews.llvm.org/D159533 for ISD::FSHR /
AMDGPUISD::PERM combine
2023-09-21 12:17:59 +01:00
Simon Pilgrim
05926a5a55 [DAG] getNode() - remove oneuse limit from (zext (trunc (assertzext x))) -> (assertzext x) fold
Noticed on D159533 and I've finally dealt with the x86 regressions - MatchingStackOffset wasn't peeking through AssertZext nodes while trying to find CopyFromReg/Load sources, it was only removing them if they were part of a (trunc (assertzext x)) pattern.
2023-09-21 12:07:49 +01:00
Paulo Matos
0495cd89fc
[UpdateTestChecks] Add support for SPIRV in update_llc_test_checks.py (#66213)
Support for SPIRV added, updated test SPV_INTEL_optnone.ll using the script.

Previously https://reviews.llvm.org/D157858
2023-09-21 12:51:42 +02:00
Mirko Brkušanin
ecfdc23dd2
[AMDGPU] Select gfx1150 SALU Float instructions (#66885) 2023-09-21 12:22:55 +02:00
David Green
af56c4a4cb [AArch64] Add an aarch64-enable-ext-to-tbl option. NFC
This transform has caused a few issues with operations that can naturally be
extended. This patch just adds a debug option for disabling the transform,
useful for testing cases where it might not be profitable.
2023-09-21 11:20:19 +01:00
Nathan Gauër
6bad175a87
[SPIRV][DX] Share one test between backends (#65975)
One big issue with DirectXShaderCompiler was test coverage: DXIL and
SPIR-V backends had their own tests. When a bug was found in one, the
other wasn't always checked. This led to unequal support of HLSL for
both backends. We'd like to avoid those issues here, hence the
test-sharing.

By default, all the tests in this folder are marked as requiring
DirectX. But as SPIR-V support grows, each test drops this requirement
and checks the SPIR-V behavior.

I would have preferred to mark new tests as XFAIL for SPIR-V by default,
so we could differentiate real unsupported tests (as SPIR-V has no
equivalent), from newly added tests. But the way LIT is built, I don't
think this is possible.

---------

Signed-off-by: Nathan Gauër <brioche@google.com>
2023-09-21 12:15:55 +02:00
Pierre van Houtryve
fe2f67e4ba
[AMDGPU] Remove Code Object V2 (#65715)
Code Object V2 has been deprecated for more than a year now. We can
safely remove it from LLVM.

- [clang] Remove support for the `-mcode-object-version=2` option.
- [lld] Remove/refactor tests that were still using COV2
- [llvm] Update AMDGPUUsage.rst
- Code Object V2 docs are left for informational purposes because those
code objects may still be supported by the runtime/loaders for a while.
- [AMDGPU] Remove COV2 emission capabilities.
- [AMDGPU] Remove `MetadataStreamerYamlV2` which was only used by COV2
- [AMDGPU] Update all tests that were still using COV2 - They are either
deleted or ported directly to code object v4 (as v3 is also planned to
be removed soon).
2023-09-21 12:00:45 +02:00
Nikita Popov
8b4e29b35d [X86] Add test for #66984 (NFC) 2023-09-21 09:14:10 +02:00
Craig Topper
cbd4596168 Recommit "[RISCV] Improve constant materialization to end with 'not' if the cons… (#66950)"
With MC test updates.

Original commit message

We can invert the value and treat it as if it had leading zeroes.
2023-09-20 18:51:51 -07:00
Craig Topper
ea064ba6a2 Revert "[RISCV] Improve constant materialization to end with 'not' if the cons… (#66950)"
This reverts commit a8b8e9476451e125e81bd24fbde6605246c59a0e.

Forgot to update MC tests.
2023-09-20 17:05:00 -07:00
Craig Topper
a8b8e94764
[RISCV] Improve constant materialization to end with 'not' if the cons… (#66950)
…tant has leading ones.

We can invert the value and treat it as if it had leading zeroes.
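
The identity behind this can be checked with plain integer arithmetic. The sketch below only demonstrates that inverting a 64-bit constant with leading ones yields a value with the same number of leading zeroes, and that a final bitwise 'not' restores the original. It does not model the instruction sequence the backend emits, and all helper names are hypothetical.

```python
MASK64 = (1 << 64) - 1


def leading_zeros(x: int) -> int:
    """Count leading zero bits of a 64-bit value."""
    return 64 - (x & MASK64).bit_length()


def leading_ones(x: int) -> int:
    """Count leading one bits of a 64-bit value."""
    return leading_zeros(~x & MASK64)


def invert(x: int) -> int:
    """64-bit bitwise 'not'."""
    return ~x & MASK64


constant = 0xFFFFFFFF00001234
inverted = invert(constant)           # value to materialise first
print(hex(inverted), leading_zeros(inverted))
print(hex(invert(inverted)))          # the trailing 'not' restores the constant
```

A value with many leading zeroes is typically easier to build, so materialising `inverted` and finishing with a single 'not' can be shorter than building `constant` directly.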
2023-09-20 16:52:51 -07:00
Matheus Izvekov
8b04f1e49a
[X86] fix combineSubSetcc to handle a large constant (#66941) 2023-09-20 23:17:33 +02:00
Aiden Grossman
3dc2f2618b
[MLGO] Move MBB Profile Dump test to Generic (#66856)
This patch moves the MBB Profile Dump to ./llvm/test/CodeGen/Generic
from ./llvm/test/CodeGen/MlRegAlloc as the profile dump doesn't have
anything to do with the ML guided register allocation heuristic.
2023-09-20 11:50:33 -07:00
Fangrui Song
9f4c9b90c9 Revert D155711 "[SimplifyCFG] Hoist common instructions on Switch."
This reverts commit 96ea48ff5dcba46af350f5300eafd7f7394ba606.

The change may cause Verifier.cpp error
"musttail call must precede a ret with an optional bitcast"
2023-09-20 11:49:20 -07:00
Noah Goldstein
6d6314ba64 [DAGCombiner] Extend combineFMulOrFDivWithIntPow2 to work for non-splat float vecs
Do so by extending `matchUnaryPredicate` to also work for
`ConstantFPSDNode` types then encapsulate the constant checks in a
lambda and pass it to `matchUnaryPredicate`.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D154868
2023-09-20 13:28:24 -05:00
Noah Goldstein
47c642f9a0 [DAGCombiner] Fold IEEE fmul/fdiv by Pow2 to add/sub of exp
Note: This moves D154678, which previously implemented this in
InstCombine. Concerns were brought up that this was de-canonicalizing
and really targeting a codegen improvement, so it is placed in DAGCombiner.

This implements:

```
(fmul C, (uitofp Pow2))
    -> (bitcast_to_FP (add (bitcast_to_INT C), Log2(Pow2) << mantissa))
(fdiv C, (uitofp Pow2))
    -> (bitcast_to_FP (sub (bitcast_to_INT C), Log2(Pow2) << mantissa))
```
The motivation is mostly fdiv where 2^(-p) is a fairly common
expression.

The patch is intentionally conservative about the transform, only
doing so if:
    1) we have IEEE floats
    2) C is normal
    3) add/sub of max(Log2(Pow2)) stays in the min/max exponent
       bounds.

Alive2 can't realistically prove this, but did test float16/float32
cases (within the bounds of the above rules) exhaustively.
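
The fold can be reproduced outside the compiler for binary32 values. The following is a minimal sketch, assuming IEEE single precision, a normal `c`, and a result that stays within the exponent bounds (the same conditions listed above). The helper names are hypothetical and the code is not taken from the patch.

```python
import struct

MANTISSA_BITS = 23  # explicit mantissa bits in an IEEE binary32 float


def f32_to_bits(x: float) -> int:
    """Reinterpret a binary32 value as a 32-bit unsigned integer."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_to_f32(bits: int) -> float:
    """Reinterpret a 32-bit unsigned integer as a binary32 value."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def log2_exact(pow2: int) -> int:
    if pow2 <= 0 or pow2 & (pow2 - 1):
        raise ValueError("pow2 must be a positive power of two")
    return pow2.bit_length() - 1


def fmul_by_pow2(c: float, pow2: int) -> float:
    # Adding Log2(Pow2) to the exponent field multiplies by Pow2.
    return bits_to_f32(f32_to_bits(c) + (log2_exact(pow2) << MANTISSA_BITS))


def fdiv_by_pow2(c: float, pow2: int) -> float:
    # Subtracting Log2(Pow2) from the exponent field divides by Pow2.
    return bits_to_f32(f32_to_bits(c) - (log2_exact(pow2) << MANTISSA_BITS))


print(fmul_by_pow2(3.5, 8), 3.5 * 8)
print(fdiv_by_pow2(3.5, 8), 3.5 / 8)
```

The sketch performs no range checks on the exponent, so it is only valid for inputs that satisfy the three conditions; zero, denormal, infinite, or NaN inputs would give wrong answers.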

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D154805
2023-09-20 13:28:24 -05:00
Noah Goldstein
32a46919a2 [AMDGPU] Add tests for folding fmul/fdiv by Pow2 to add/sub of exp; NFC
Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D159405
2023-09-20 13:28:24 -05:00
Noah Goldstein
6ec53b4567 [X86] Add tests for folding fmul/fdiv by Pow2 to add/sub of exp; NFC
Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D154804
2023-09-20 13:28:24 -05:00
Sirish Pande
cc3491fd45
[SelectionDAG] [NFC] Add pre-commit test for PR66701. (#66796)

Co-authored-by: Sirish Pande <sirish.pande@amd.com>
2023-09-20 11:37:18 -05:00
David Green
46a1908c26 [AArch64] Add some tests for setcc known bits fold. NFC 2023-09-20 17:36:01 +01:00
Vladislav Dzhidzhoev
b8e9450acb Cleanup fallback NOT checks 2023-09-20 18:22:54 +02:00
Vladislav Dzhidzhoev
fb8f59156f [AArch64][GlobalISel] Adopt dup(load) -> LD1R patterns from SelectionDAG
Follow-up of #65630.
2023-09-20 18:22:54 +02:00
Simon Pilgrim
ad762f2a9f [X86] Regenerate pr39098.ll 2023-09-20 16:58:00 +01:00
Simon Pilgrim
2ec697b4c7 [AMDGPU] Regenerate always-uniform.ll 2023-09-20 16:58:00 +01:00
Natalie Chouinard
47a377d5e0 [SPIRV] Fix OpConstant float and double printing
Print OpConstant floats as formatted decimal floating points, with
special case exceptions to print infinity and NaN as hexfloats.

This change follows from the fixes in
https://github.com/llvm/llvm-project/pull/66686 to correct how
constant values are printed generally.

Differential Revision: https://reviews.llvm.org/D159376
2023-09-20 15:26:41 +00:00
Joe Nash
2c0f2b510c
[AMDGPU] Convert tests rotr.ll and rotl.ll to be auto-generated (#66828)
and add GFX11 coverage. NFC
2023-09-20 10:32:04 -04:00
Matt Devereau
d297399b35 [AArch64][SME] Enable TPIDR2 lazy-save for za_preserved
This change makes callees with the __arm_preserves_za
type attribute comply with the dormant state requirements
when its caller has the __arm_shared_za type attribute.
Several external SME functions also do not need to lazy
save.

5e67092434/aapcs64/aapcs64.rst (L1381)

Differential Revision: https://reviews.llvm.org/D159186
2023-09-20 13:34:41 +00:00
Natalie Chouinard
f7bfa583b7
[SPIR-V] Fix 64-bit integer literal printing (#66686)
Previously, the SPIR-V instruction printer was always printing the first
operand of an `OpConstant`'s literal value as one of the fixed operands.
This is incorrect for 64-bit values, where the first operand is actually
the value's lower-order word and should be combined with the following
higher-order word before printing.

This change fixes that issue by waiting to print the last fixed operand
of `OpConstant` instructions until the variadic operands are ready to be
printed, then using `NumFixedOps - 1` as the starting operand index for
the literal value operands.
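
The word combination described above amounts to the following arithmetic. This is a sketch of the idea only, with hypothetical helper names. It does not reproduce the instruction printer's code.

```python
WORD_MASK = 0xFFFFFFFF


def combine_words(low: int, high: int) -> int:
    """Join the lower-order and higher-order 32-bit words of a 64-bit literal."""
    return ((high & WORD_MASK) << 32) | (low & WORD_MASK)


def split_words(value: int) -> tuple[int, int]:
    """Inverse of combine_words: return (low, high) 32-bit words."""
    return value & WORD_MASK, (value >> 32) & WORD_MASK


low, high = split_words(0x0000000200000001)
print(hex(low), hex(high), hex(combine_words(low, high)))
```

Printing only the first word would show `0x1` here, which is the kind of incorrect output the change addresses for 64-bit values.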

Depends on D156049
2023-09-20 09:31:14 -04:00
Luke Lau
450dfab8c3
[RISCV] Add tests where bin ops of splats could be scalarized. NFC (#65747)
This adds tests for fixed and scalable vectors where we have a binary op
on two splats that could be scalarized. Normally this would be
scalarized in the middle-end by VectorCombine, but as noted in
https://reviews.llvm.org/D159190, this pattern can crop up during
CodeGen afterwards.

Note that a combine already exists for this, but on RISC-V currently it
only works on scalable vectors where the element type == XLEN. See
#65068 and #65072
2023-09-20 13:23:56 +01:00
Simon Pilgrim
3b7dfda79d [X86] Add test cases for gnux32 large constants Issue #55061
Test file showing current codegen for D124406
2023-09-20 12:24:51 +01:00
Simon Pilgrim
170ba6ee12 [X86] combineINSERT_SUBVECTOR - attempt to combine concatenated shuffles
If all the concatenated subvectors are target shuffle nodes, then call combineX86ShufflesRecursively to attempt to combine them.

Unlike the existing shuffle concatenation in collectConcatOps, this isn't limited to splat cases and won't attempt to concat the source nodes prior to creating the larger shuffle node, so will usually only combine to create cross-lane shuffles.

This exposed a hidden issue in matchBinaryShuffle that wasn't limiting v64i8/v32i16 UNPACK nodes to AVX512BW targets.
2023-09-20 12:17:51 +01:00
Simon Pilgrim
0662791a13 [X86] vector-interleaved tests - add AVX512-SLOW/AVX512-FAST common prefixes to reduce duplication
These aren't always used but it's a lot more manageable to keep the vector-interleaved files using the same RUN lines wherever possible
2023-09-20 12:17:51 +01:00
Jay Foad
a68c7241ec
[AMDGPU] Run twoaddr tests with -early-live-intervals (#66775)
Sample test case:

%3 = V_FMAC_F32_e32 killed %0, %1, %2, implicit $mode, implicit $exec

With LiveVariables this is converted to three-address form just because
there is no "killed" flag on %2. To make it do the same thing with
LiveIntervals I added a later use of %2:

%3 = V_FMAC_F32_e32 killed %0, %1, %2, implicit $mode, implicit $exec
    S_ENDPGM 0, implicit %2
2023-09-20 08:22:00 +01:00
Yeting Kuo
976df42e6a
[RISCV] Fix bugs about register list of Zcmp push/pop. (#66073)
This PR does two things. One is to fix an internal compiler error when we
need to spill callee saves but none of them is a GPR; the other is to fix
the wrong register number when the pushed registers are {ra, s0-s11}.
2023-09-20 15:20:13 +08:00
Dhruv Chawla
3e992d81af
[InferAlignment] Enable InferAlignment pass by default
This gives a compile-time improvement of 0.6% (instructions:u):
https://llvm-compile-time-tracker.com/compare.php?from=7d35fe6d08e2b9b786e1c8454cd2391463832167&to=0456c8e8a42be06b62ad4c3e3cf34b21f2633d1e&stat=instructions:u

Differential Revision: https://reviews.llvm.org/D158600
2023-09-20 12:08:52 +05:30