52796 Commits

Author SHA1 Message Date
Vladislav Dzhidzhoev
e69916c943 [AArch64][GlobalISel] Legalize integer across-lane intrinsics with actual type
Across-lane intrinsics with integer destination type (uaddv, saddv,
umaxv, smavx, uminv, sminv) were legalized with the destination type
given in the LLVM IR intrinsic’s definition. It was wider than
the actual destination type of the corresponding machine instruction.
InstructionSelect was implicitly supposed to generate underlying
extension instructions for these intrinsics, while the real destination
type was opaque for other GlobalISel passes.  Thus,
llvm/test/CodeGen/AArch64/arm64-vaddv.ll failed on GlobalISel since
the generated code was worse in functions that used the value of
an across-lane intrinsic in following FP&SIMD instructions (functions
with _used_by_laneop suffix).

Here intrinsics are legalized and selected with an actual destination
type, making it transparent to other passes. If the destination
value is used in further instructions accepting FPR registers, there
won’t be extra copies across register banks. i16 type is added to
the list of the types of the FPR16 register bank to make it possible,
and a few SelectionDAG patterns are modified to eliminate ambiguity
in TableGen.

Differential Revision: https://reviews.llvm.org/D156831
2023-08-17 18:19:56 +02:00
David Green
cf65afbf93 [AArch64][GISel] Extend lowering for fp round intrinsics.
This extends the lowering of ceil, floor, nearbyint, rint, round, roundeven and
trunc. They are all very similar, so can reuse the same legalization info.
selectIntrinsicTrunc and selectIntrinsicRound can be removed as they can be
selected via tablegen patterns, and G_INTRINSIC_ROUNDEVEN is marked as a gisel
equivalent of froundeven. Otherwise this reuses the existing code, filling it
out to handle more types.

Differential Revision: https://reviews.llvm.org/D157679
2023-08-17 16:25:32 +01:00
Fraser Cormack
c058eb998a [AArch64] Fix crash when neither Neon nor SVE are enabled
The subtarget was unconditionally reporting that SVE was to be used to
lower vectors when Neon was unavailable, even when SVE itself was
unavailable. This decision leads other parts of the compiler to crash,
e.g., when querying SVE vector sizes.

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D158179
2023-08-17 16:07:27 +01:00
Pravin Jagtap
af5fd142d3 [AMDGPU] Extend f32 support for llvm.amdgcn.update.dpp intrinsic
This will be useful to avoid the bit-casting noise
required to extend support for Floating Point
Operations in atomic optimizer for DPP in D156301

Reviewed By: arsenm, #amdgpu

Differential Revision: https://reviews.llvm.org/D156647
2023-08-17 10:45:19 -04:00
Christudasan Devadasan
81827f8cfb [AMDGPU] Support wwm-reg AV spill pseudos
The wwm register spill pseudos are currently defined for VGPR_32
regclass. It causes a verifier error for gfx908 or above as the
regalloc sometimes restores the values to the vector superclass AV_32.
Fixing it by supporting AV wwm-spill pseudos as well.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D155646
2023-08-17 20:04:18 +05:30
Tuan Chuong Goh
74e191da07 [AArch64][GlobalISel] Combiner for EXT
Keep components of UNMERGE larger after running the Artifact Combiner on it.
This was intended to help with <v16i64> = G_SEXT <v16i16>, but implementation
for legalizing EXT is in a following patch, therefore a test for this case
will be included in the following patch

Differential Revision: https://reviews.llvm.org/D157715
2023-08-17 14:34:45 +01:00
Simon Pilgrim
a342f9802b [X86] dagcombine-shifts.ll - add i686 test coverage 2023-08-17 13:35:05 +01:00
Simon Pilgrim
3c2432690a [X86] Remove combineVectorSignBitsTruncation and leave TRUNCATE -> PACKSS/PACKUS to legalization/lowering
Don't prematurely fold TRUNCATE nodes to PACKSS/PACKUS target nodes - we miss out on generic TRUNCATE folds.

Helps some regressions from D152928 and #63946

Fixes #63710
2023-08-17 12:23:29 +01:00
Simon Pilgrim
dd7ba38078 [X86] matchTruncateWithPACK - consistently prefer shuffles for truncation to sub-64-bit vXi16
If we're truncating from v2i32 / v2i64 then PSHUFLW / PSHUFD+PSHUFLW should more easily allow further shuffle combines than a PACK chain will
2023-08-17 12:23:28 +01:00
Nikita Popov
6f1d9fb987 [X86] Add some i128 argument passing tests (NFC) 2023-08-17 12:00:37 +02:00
Nabeel Omer
d06608060c [X86] Fix aliasing check between TargetFrameIndex and FrameIndex
Compare slot indices instead of comparing pointer values.

Closes #63645

Differential Revision: https://reviews.llvm.org/D157513
2023-08-17 10:48:08 +01:00
David Green
8fc6b1a18f [AArch64] Add some vcvt tests. NFC.
See D157679. This also removes some duplication from arm64-vfloatintrinsics.ll,
where the tests now exist elsewhere.
2023-08-17 10:41:52 +01:00
Jonas Hahnfeld
eeac4321c5 Disable two tests without {arm,aarch64}-registered-target 2023-08-17 10:04:38 +02:00
Han Shen
317a0fe5bd [Driver][CodeGen] Properly handle -fsplit-machine-functions for fatbinary compilation.
When building a fatbinary, the driver invokes the compiler multiple
times with different "--target". (For example, with "-x cuda
--cuda-gpu-arch=sm_70" flags, clang will be invoded twice, once with
--target=x86_64_...., once with --target=sm_70) If we use
-fsplit-machine-functions or -fno-split-machine-functions for such
invocation, the driver reports an error.

This CL changes the behavior so:

  - "-fsplit-machine-functions" is now passed to all targets, for non-X86
    targets, the flag is a NOOP and causes a warning.

  - "-fno-split-machine-functions" now negates -fsplit-machine-functions (if
    -fno-split-machine-functions appears after any -fsplit-machine-functions)
    for any target triple, previously, it causes an error.

  - "-fsplit-machine-functions -Xarch_device -fno-split-machine-functions"
    enables MFS on host but disables MFS for GPUS without warnings/errors.

  - "-Xarch_host -fsplit-machine-functions" enables MFS on host but disables
    MFS for GPUS without warnings/errors.

Reviewed by: xur, dhoekwater

Differential Revision: https://reviews.llvm.org/D157750
2023-08-16 23:41:34 -07:00
Fangrui Song
4c89277095 [Mips][MC] AttemptToFoldSymbolOffsetDifference: revert isMicroMips special case
D52985/D57677 added a .gcc_except_table workaround, but the new behavior
doesn't match GNU assembler.
```
void foo();
int bar() {
  foo();
  try { throw 1; }
  catch (int) { return 1; }
  return 0;
}

clang --target=mipsel-linux-gnu -mmicromips -S a.cc
mipsel-linux-gnu-gcc -mmicromips -c a.s -o gnu.o

.uleb128 ($cst_end0)-($cst_begin0)     // bit 0 is not forced to 1
.uleb128 ($func_begin0)-($func_begin0) // bit 0 is not forced to 1
```

I have inspected `.gcc_except_table` output by `mipsel-linux-gnu-gcc -mmicromips -c a.cc`.
The `.uleb128` values are not forced to set the least significant bit.

In addition, D57677's adjustment (even->odd) to CodeGen/Mips/micromips-b-range.ll is wrong.
PC-relative `.long func - .` values will differ from GNU assembler as well.

The original intention of D52985 seems unclear to me. I think whatever
goal it wants to achieve should be moved to an upper layer.

This isMicroMips special case has caused problems to fix MCAssembler::relaxLEB to use evaluateAsAbsolute instead of evaluateKnownAbsolute,
which is needed to proper support R_RISCV_SET_ULEB128/R_RISCV_SUB_ULEB128.

Differential Revision: https://reviews.llvm.org/D157655
2023-08-16 23:11:59 -07:00
Justin Bogner
72017fcf00 [DirectX] Only embed dxil when writing object files
When emitting assembly we don't particularly want the binary DXIL
embedded in the output. This was mostly there for testing purposes, so
we update those tests to run the test directly using `opt` and
restrict the -dxil-embed and -dxil-globals passes to running normally
only in the case where we're trying to emit a DXContainer.

Differential Revision: https://reviews.llvm.org/D158051
2023-08-16 13:12:32 -07:00
Daniel Hoekwater
2c43d591c6 [CodeGen] Move function splitting tests from X86 to Generic (NFC)
Machine function splitting will become available for AArch64; since MFS
is no longer X86-only, the tests for generic behavior should live
somewhere other than tests/CodeGen/X86.

MFS implementation doesn't vary much across platforms, and most tests
should be identical between X86 and AArch64 besides instruction
selection, so the tests can live together in tests/CodeGen/Generic.

Differential Revision: https://reviews.llvm.org/D157563
2023-08-16 18:11:23 +00:00
Matt Arsenault
c9d0d15e69 AMDGPU: Refine some rsq formation tests
Drop unnecessary flags and metadata, add contract flags that should be
necessary.
2023-08-16 13:37:03 -04:00
Kazushi (Jam) Marukawa
922ac64b04 [VE] Avoid vectorizing store/load in scalar mode
Avoid vectorizing store and load instructions in scalar mode.

Reviewed By: efocht

Differential Revision: https://reviews.llvm.org/D158049
2023-08-17 02:15:54 +09:00
Nicholas Guy
d65feccb12 [ARM] Set preferred function alignment
Aligning functions yields small performance gains on
embedded cores, moreso with numerous small function calls.
Similar to aligning loops, if the function can fit within
a single cache line then the performance overhead of
fetching more instructions can be limited.

Differential Revision: https://reviews.llvm.org/D157514
2023-08-16 17:31:21 +01:00
David Green
a047dfe0d5 [AArch64][GISel] Lower EXT of 0 to a COPY
This allows us to select G_SHUFFLE_VECTOR with identity masks (possibly
including undef elements), but avoid the actual EXT instruction if the shift
amount is 0.
2023-08-16 17:12:15 +01:00
Eduard Zingerman
8f28e8069c [BPF] support for BPF_ST instruction in codegen
Generate store immediate instruction when CPUv4 is enabled.
For example:

    $ cat test.c
    struct foo {
      unsigned char  b;
      unsigned short h;
      unsigned int   w;
      unsigned long  d;
    };
    void bar(volatile struct foo *p) {
      p->b = 1;
      p->h = 2;
      p->w = 3;
      p->d = 4;
    }

    $ clang -O2 --target=bpf -mcpu=v4 test.c -c -o - | llvm-objdump -d -
    ...
    0000000000000000 <bar>:
           0:	72 01 00 00 01 00 00 00	*(u8 *)(r1 + 0x0) = 0x1
           1:	6a 01 02 00 02 00 00 00	*(u16 *)(r1 + 0x2) = 0x2
           2:	62 01 04 00 03 00 00 00	*(u32 *)(r1 + 0x4) = 0x3
           3:	7a 01 08 00 04 00 00 00	*(u64 *)(r1 + 0x8) = 0x4
           4:	95 00 00 00 00 00 00 00	exit

Take special care to:
- apply `BPFMISimplifyPatchable::checkADDrr` rewrite for BPF_ST
- validate immediate value when BPF_ST write is 64-bit:
  BPF interprets `(BPF_ST | BPF_MEM | BPF_DW)` writes as writes with
  sign extension. Thus it is fine to generate such write when
  immediate is -1, but it is incorrect to generate such write when
  immediate is +0xffff_ffff.

This commit was previously reverted in e66affa17e32.
The reason for revert was an unrelated bug in BPF backend,
triggered by test case added in this commit if LLVM is built
with LLVM_ENABLE_EXPENSIVE_CHECKS.
The bug was fixed in D157806.

Differential Revision: https://reviews.llvm.org/D140804
2023-08-16 17:51:28 +03:00
Philip Reames
3c2a66973e [RISCVInsertVSETVLI] Generalize scalar extract (vmv.x.s, and vmx.f.s) hamdling
vmv.x.s and vmv.f.s are unconditional. They read the low element of a vector
register (not vector group), and function even when VL=0 or VSTART>0. As such,
they are don't care with respect to both VL and LMUL.

We'd previously had handling in the forward pass only via the NoRegister
mechanusm.  (The only instructions with SEW but without VL are these extracts.)
This patch moves that handling into getDemanded so that the backwards pass
benefits as well.

Differential Revision: https://reviews.llvm.org/D157991
2023-08-16 07:50:59 -07:00
Philip Reames
b06e52c32f [RISCVInsertVSETVLI] Default to VL=1 for scalar extracts
We were defaulting to VL=0 when we didn't otherwise have a vsetv
nearby. Instead, let's use VL=1. VL=0 is very much a cornercase
in hardware, and let's avoid if we can.

Differential Revision: https://reviews.llvm.org/D158015
2023-08-16 07:35:00 -07:00
Matt Arsenault
66ee794064 AMDGPU: Fix verifier error on splatted opencl fmin/fmax and ldexp calls
Apparently the spec has overloads for fmin/fmax and ldexp with one of
the operands as scalar. We need to broadcast the scalars to the vector
type.

https://reviews.llvm.org/D158077
2023-08-16 09:42:26 -04:00
Paul Walker
566065207b [SelectionDAG] Use TypeSize variant of ComputeValueVTs to compute correct offsets for scalable aggregate types.
Differential Revision: https://reviews.llvm.org/D157872
2023-08-16 11:56:31 +00:00
Ivan Kosarev
d7efe41598 [AMDGPU] Autogenerate the v_cndmask.ll and llvm.amdgcn.image.msaa.load.ll codegen tests.
Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D157970
2023-08-16 12:50:51 +01:00
Ivan Kosarev
f9ab235318 [AMDGPU] Autogenerate the fmuladd.f16.ll and llvm.fmuladd.f16.ll codegen tests.
Reviewed By: Joe_Nash

Differential Revision: https://reviews.llvm.org/D157966
2023-08-16 12:49:45 +01:00
David Green
983185d3c7 [AArch64] Regenerate postlegalizer-lowering-shuffle-splat.mir and postlegalizer-lowering-zip.mir. NFC 2023-08-16 11:00:50 +01:00
Carl Ritson
d0e246ff16 [LiveRange] Fix inaccurate verification of live-in PhysRegs
Fix verification that a PhysReg is live in to an MBB.
isLiveIn does not handle reg units, so cannot identify when a
register would be defined because its super register is partially
defined.
Additionally a PhysReg may be partial defined at block entry and
then fully defined before any use.

Reviewed By: foad, arsenm

Differential Revision: https://reviews.llvm.org/D157086
2023-08-16 17:42:42 +09:00
David Green
c5f763b563 [AArch64][GISel] Fix selection of G_CONSTANT_FOLD_BARRIER
As far as I understand - When lowering a G_CONSTANT_FOLD_BARRIER we replace the
DstReg with SrcReg, and need to check that the register class is equivalent
when doing so for the replacement to be legal. During lowering we could end up
visiting nodes in an odd order, leaving a G_CONSTANT_FOLD_BARRIER with a known
regclass for the src, but only a regbank for the dst. Providing the Regbank
contains the regclass, the replacement should still be safe.

This fixes an assert seen in the llvm-test-suite when lowering hoisted
constants, relaxing canReplaceReg to account for the case when the regbank
covers the regclass, so it is better able to handle differences in visiting
order.

Differential Revision: https://reviews.llvm.org/D157202
2023-08-16 08:33:16 +01:00
Noah Goldstein
e7f7b63fb3 [DAGCombiner][X86] Guard (X & Y) ==/!= Y --> (X & Y) !=/== 0 behind TLI preference
On X86 for vec types `(X & Y) == Y` is generally preferable to
`(X & Y) != 0`. Creating zero requires an extra instruction and on
pre-avx512 targets there is no vector `pcmpne` so it requires two
additional instructions to invert the `pcmpeq`.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D157014
2023-08-16 02:00:15 -05:00
Noah Goldstein
2549ec1866 [SelectionDAG] Improve isKnownToBeAPowerOfTwo
Add additional cases for:
select, vselect, {u,s}{min,max}, and, casts, rotl, rotr

And improve handling of constants and shifts.

Differential Revision: https://reviews.llvm.org/D156778
2023-08-16 02:00:15 -05:00
Noah Goldstein
ac485e4072 [SelectionDAG] Add/Improve cases in isKnownNeverZero
1) Handle casts a bit more cleanly just with a loop rather than with
   recursion.

2) Add additional cases for smin/smax

3 ) For shifts we can also deduce non-zero if the maximum shift amount
    on the known 1s is non-zero.

Differential Revision: https://reviews.llvm.org/D156777
2023-08-16 02:00:15 -05:00
Noah Goldstein
2937d03e49 [X86] Add more tests for isKnownNeverZero;
Differential Revision: https://reviews.llvm.org/D156776
2023-08-16 02:00:15 -05:00
Noah Goldstein
86345eb459 [X86] Add tests for isKnownToBeAPowerOfTwo; NFC
Differential Revision: https://reviews.llvm.org/D156775
2023-08-16 02:00:14 -05:00
Aiden Grossman
87cccf8f6d [MLGO] Remove unsupported tag from BB profile dump test
This was added originally as the test was failing on NVPTX before an
explicit target triple was set on the llc invocation. The test was fixed
in 4afb1ee7bc0e3674238da2d3668f8d8b80596c62 but the unsupported
directive was never removed.
2023-08-15 20:36:19 -07:00
Daniel Hoekwater
e8540723b3 Revert "[CodeGen] Move function splitting tests from X86 to Generic (NFC)"
This reverts commit 1670e0ea076b12b3fcea2ea63f01bc09e0e7f3b2.
Causes https://lab.llvm.org/buildbot/#/builders/188/builds/33943
2023-08-16 01:46:35 +00:00
Daniel Hoekwater
d7bca8e494 [AArch64] Relax cross-section branches
Because the code layout is not known during compilation, the distance of
cross-section jumps is not knowable at compile-time. Because of this, we
should assume that any cross-sectional jumps are out of range. This
assumption is necessary for machine function splitting on AArch64, which
introduces cross-section branches in the middle of functions. The linker
relaxes out-of-range unconditional branches, but it clobbers X16 to do
so; it doesn't relax conditional branches, which must be manually
relaxed by the compiler.

Differential Revision: https://reviews.llvm.org/D145211
2023-08-16 01:43:07 +00:00
Daniel Hoekwater
1670e0ea07 [CodeGen] Move function splitting tests from X86 to Generic (NFC)
Machine function splitting will become available for AArch64; since MFS
is no longer X86-only, the tests for generic behavior should live
somewhere other than tests/CodeGen/X86.

MFS implementation doesn't vary much across platforms, and most tests
should be identical between X86 and AArch64 besides instruction
selection, so the tests can live together in tests/CodeGen/Generic.

Differential Revision: https://reviews.llvm.org/D157563
2023-08-16 01:25:54 +00:00
Yeting Kuo
818e76d6f2 [RISCV] Add MC layer support for Zicfilp.
This adds extension Zicfilp and support pseudo instruction lpad.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D157362
2023-08-16 08:52:51 +08:00
Craig Topper
6eb36aed86 [RISCV][GISel] Mark G_GLOBAL_VALUE as legal.
Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D157950
2023-08-15 09:26:29 -07:00
Craig Topper
ae76574d4a [RISCV][GISel] Make G_ICMP of pointers legal.
Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D157823
2023-08-15 09:25:52 -07:00
Craig Topper
88903fac1f [RISCV][GISel] Make G_SELECT of pointers legal
Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D157819
2023-08-15 09:20:08 -07:00
Simon Pilgrim
b0a77af4f1 [DAG] SimplifyDemandedBits - add sra(shl(x,c1),c1) -> sign_extend_inreg(x) demanded elts fold
Move the sra(shl(x,c1),c1) -> sign_extend_inreg(x) fold inside SimplifyDemandedBits so we can recognize hidden splats with DemandedElts masks.

Because the c1 shift amount has multiple uses, hidden splats won't get simplified to a splat constant buildvector - meaning the existing fold in DAGCombiner::visitSRA can't fire as it won't see a uniform shift amount.

I also needed to add TLI preferSextInRegOfTruncate hook to help keep truncate(sign_extend_inreg(x)) vector patterns on X86 so we can use PACKSS more efficiently.

Differential Revision: https://reviews.llvm.org/D157972
2023-08-15 16:32:03 +01:00
Joe Nash
a093032981 [AMDGPU][True16] Update FPToI1Pat GFX11 pat to use GFX11 instruction
These cmp patterns were using the pre-GFX11 pseudo instruction, and so
failed to compile.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D157912
2023-08-15 11:11:17 -04:00
Matt Arsenault
d251761660 AMDGPU: Replace log libcalls with log intrinsics 2023-08-15 10:48:46 -04:00
Matt Arsenault
81b278e613 AMDGPU: Fix fast f32 exp2
Mirror of the previous log changes, OpenCL conformance doesn't like
interpreting afn as ignore denormal handling but was previously hidden
by flag dropping.
2023-08-15 10:48:46 -04:00
Matt Arsenault
4b7b4b9458 AMDGPU: Fix fast f32 log/log10
OpenCL conformance didn't like interpreting afn as ignore the denormal
handling.

https://reviews.llvm.org/D157940
2023-08-15 10:48:46 -04:00
Matt Arsenault
e09b3593ba AMDGPU: Fix fast math log2 f32
Apparently afn doesn't allow you to drop the denormal handling
according to OpenCL conformance. This was hidden by losing the flags
during the library linking process. Fast log is still broken and needs
more work.

https://reviews.llvm.org/D157936
2023-08-15 10:48:46 -04:00