48353 Commits

Author SHA1 Message Date
Jay Foad
31fbfa57e7 [AMDGPU] Regenerate some spill checks 2023-06-07 17:50:51 +01:00
Vy Nguyen
e60b30d5e3 Reland "D144999 [MC][MachO]Only emits compact-unwind format for "canonical" personality symbols. For the rest, use DWARFs."
Reasons for rolling forward:
    - the crash reported from Chromium was fixed in D151824 (not related to this patch at all)
    - since D152824 was committed, it should now be safe to roll this forward.

New change:
    - add an additional _ in name check

This reverts commit 4980eead4d0b4666d53dad07afb091375b3a13a0.
2023-06-07 10:03:50 -04:00
Phoebe Wang
2011ad0cbb [X86][FP16] Do not generate VBROADCAST for fp16
We cannot lower VBROADCAST i16 under AVX1.

Fixes #63114

Differential Revision: https://reviews.llvm.org/D152350
2023-06-07 20:54:56 +08:00
David Green
beb3a9a5e6 [AArch64][SVE] Add a commutative VSelectCommPredOrPassthruPatFrags
This adds a commutative version of VSelectPredOrPassthruPatFrags (renamed from
EitherVSelectOrPassthruPatFrags) that checks both variants for commutative
operations like min/max. I have not attempted to handle fp operations that
require fast-math flags.
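The commutative matching described in this commit can be pictured with a toy predicate. This is a hypothetical model, not the TableGen PatFrags themselves: values are plain integer ids, and `canUsePredicatedForm` is an illustrative name. It assumes the destructive SVE form (for example `SMAX Zdn, Pg/M, Zdn, Zm`), where inactive lanes keep the first operand.

```cpp
#include <cassert>

// Hypothetical model: values are identified by small integer ids.
// vselect(P, op(A, B), Passthru) can become one predicated, destructive
// SVE instruction when Passthru is the first operand A, because inactive
// lanes keep that operand. For a commutative op such as smin/smax,
// op(A, B) == op(B, A), so Passthru may also be the second operand B.
bool canUsePredicatedForm(int A, int B, int Passthru, bool Commutative) {
  return Passthru == A || (Commutative && Passthru == B);
}
```

Checking the swapped order only for commutative operations is what lets min/max match both `vselect(p, smax(a, b), a)` and `vselect(p, smax(b, a), a)`.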

Differential Revision: https://reviews.llvm.org/D151084
2023-06-07 13:18:16 +01:00
Valery Pykhtin
342acfc9bb [AMDGPU] Turn off pass to rewrite partially used virtual superregisters after RenameIndependentSubregs pass with registers of minimal size.
There is a failure with this pass when the target register class for a subregister isn't known from the instruction description (e.g. COPY).
Currently in this situation the RC is obtained using TargetRegisterInfo::getSubRegisterClass, but this doesn't work in general.

In order to fix this, two things should be done:
1. Stop processing a subregister if the target register class is unknown (conservative approach)
2. Improve deduction of a subregister's target register class (i.e. by processing the COPY chain)

I was going to implement point 1, but my tests use implicit operands for S_NOP, which don't have an associated target register class, so all tests fail.
Therefore I decided to turn off the pass now, implement point 1 and fix my tests.

Reviewed By: arsenm, #amdgpu

Differential Revision: https://reviews.llvm.org/D152291
2023-06-07 12:05:25 +02:00
Simon Pilgrim
49bd51d918 [X86] Add test case for Issue #63108 2023-06-07 10:19:14 +01:00
Juan Manuel MARTINEZ CAAMAÑO
abe6ecd7e5 [AsmPrinter][AMDGPU] Generate uwtable entries in .eh_frame
Consider only targets where `MCAsmInfo::ExceptionsType == ExceptionHandling::None`
and that support CFI (when `MCAsmInfo::UsesCFIForDebug` is set to true):
currently, only AMDGPU.

This patch enables the emission of CFI information in the .eh_frame
section when the uwtable attribute is present on a function.
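The target condition above can be written as a small predicate. This is a hedged sketch: `AsmInfoModel` and `ExceptionHandlingModel` are hypothetical stand-ins for the `MCAsmInfo` properties the commit names, not LLVM's declarations, and only the newly enabled case is modelled.

```cpp
#include <cassert>

// Hypothetical stand-ins for the MCAsmInfo fields named in the commit
// message; these are not LLVM's actual declarations.
enum class ExceptionHandlingModel { None, DwarfCFI };

struct AsmInfoModel {
  ExceptionHandlingModel ExceptionsType;
  bool UsesCFIForDebug;
};

// True for the case this patch adds: a target with no exception model
// that still supports CFI (UsesCFIForDebug), compiling a function with
// the uwtable attribute. Targets with DWARF exception handling follow a
// separate, pre-existing path that this sketch does not model.
bool emitsEhFrameForUwtable(const AsmInfoModel &MAI, bool FnHasUwtable) {
  return FnHasUwtable &&
         MAI.ExceptionsType == ExceptionHandlingModel::None &&
         MAI.UsesCFIForDebug;
}
```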

Previously, CFI information could be generated for debugging purposes only.

This patch prepares AMDGPU to support collecting GPU stack traces in the future.

I did a first implementation (https://reviews.llvm.org/D139024)
but at the time I had not realized that no other platform used
`UsesCFIForDebug`.

Reviewed By: scott.linder

Differential Revision: https://reviews.llvm.org/D151806
2023-06-07 09:54:47 +02:00
Weining Lu
47601815ec [LoongArch] Define ual feature and override allowsMisalignedMemoryAccesses
Some CPUs do not allow unaligned memory accesses, e.g. the 2k1000la,
which uses the la264 core, on which a misaligned access triggers an
exception.

In this patch, a backend feature called `ual` is defined to describe
whether the CPU supports unaligned memory accesses. And this feature
can be toggled by clang options `-m[no-]unaligned-access` or the
aliases `-m[no-]strict-align`. When this feature is on,
`allowsMisalignedMemoryAccesses` sets the speed number to 1 and returns
true that allows the codegen to generate unaligned memory access insns.

Clang options `-m[no-]unaligned-access` are moved from `m_arm_Features_Group`
to `m_Group` because more than one target now uses them. And a test
is added to show that they remain unused on a target that does not
support them. In addition, to keep compatible with gcc, a new alias
`-mno-strict-align` is added which is equal to `-munaligned-access`.
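The option aliases and the hook behaviour described above can be summarised as follows. This is a minimal sketch under stated assumptions: `ualToggleForFlag` and `allowsMisalignedAccessModel` are hypothetical names, not clang or LoongArch backend code; the flag pairs and the "speed number 1" behaviour come from the commit message.

```cpp
#include <cassert>
#include <string>

// Hypothetical model of the option handling: +1 enables `ual`,
// -1 disables it, 0 means the flag is unrelated.
int ualToggleForFlag(const std::string &Flag) {
  if (Flag == "-munaligned-access" || Flag == "-mno-strict-align")
    return +1;
  if (Flag == "-mno-unaligned-access" || Flag == "-mstrict-align")
    return -1;
  return 0;
}

// Sketch of the hook: with `ual` on, report the misaligned access as
// allowed with speed number 1; without it, refuse the access.
bool allowsMisalignedAccessModel(bool HasUAL, unsigned *Fast) {
  if (!HasUAL)
    return false;
  if (Fast)
    *Fast = 1;
  return true;
}
```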

The feature name `ual` is consistent with linux kernel [1] and the
output of `lscpu` or `/proc/cpuinfo` [2].

There is an `LLT` variant of `allowsMisalignedMemoryAccesses`, but
it currently seems to be used only in GlobalISel, which LoongArch
doesn't support yet. So this variant is not implemented in this patch.

[1]: https://github.com/torvalds/linux/blob/master/arch/loongarch/include/asm/cpu.h#L77
[2]: https://github.com/torvalds/linux/blob/master/arch/loongarch/kernel/proc.c#L75

Reviewed By: xen0n

Differential Revision: https://reviews.llvm.org/D149946
2023-06-07 13:40:58 +08:00
Jeffrey Byrnes
db61927951 [AMDGPU][IGLP]: Add rules to SchedGroups
Differential Revision: https://reviews.llvm.org/D146774

Change-Id: Icd7aaaa0b257a25713c22ead0813777cef7d5859
2023-06-06 19:19:21 -07:00
Craig Topper
2b09f53b32 [RISCV] Remove overly restrictive assert from negateFMAOpcode.
It's possible that both multiplicands are being negated. This won't
change the opcode, but we can delete the two negates. Allow this
case to get through negateFMAOpcode.
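Why negating both multiplicands leaves the opcode unchanged: (-a)*(-b) == a*b, so only the parity of the multiplicand negations matters. The sketch below folds operand negations into a RISC-V style FMA opcode; `FMA` and `negateFMAOpcodeModel` are hypothetical names, not the LLVM function's actual signature.

```cpp
#include <cassert>

// Hypothetical opcode set mirroring RISC-V fused multiply-add semantics:
//   FMADD  = a*b + c      FMSUB  = a*b - c
//   FNMSUB = -(a*b) + c   FNMADD = -(a*b) - c
enum class FMA { FMADD, FMSUB, FNMSUB, FNMADD };

// Folds fnegs on the operands into the opcode. NegA/NegB negate the
// multiplicands and NegC negates the addend. Negating both
// multiplicands cancels out, so the opcode is unchanged and both fnegs
// can simply be deleted.
FMA negateFMAOpcodeModel(FMA Op, bool NegA, bool NegB, bool NegC) {
  bool NegProduct = NegA != NegB;
  if (NegProduct) {
    switch (Op) {
    case FMA::FMADD:  Op = FMA::FNMSUB; break;
    case FMA::FMSUB:  Op = FMA::FNMADD; break;
    case FMA::FNMSUB: Op = FMA::FMADD;  break;
    case FMA::FNMADD: Op = FMA::FMSUB;  break;
    }
  }
  if (NegC) {
    switch (Op) {
    case FMA::FMADD:  Op = FMA::FMSUB;  break;
    case FMA::FMSUB:  Op = FMA::FMADD;  break;
    case FMA::FNMSUB: Op = FMA::FNMADD; break;
    case FMA::FNMADD: Op = FMA::FNMSUB; break;
    }
  }
  return Op;
}
```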

I think D152260 will also fix this test case, but in the future
it may be possible for an fneg to appear after we've already converted
to RISCVISD opcodes in which case D152260 won't help.

Reviewed By: fakepaper56

Differential Revision: https://reviews.llvm.org/D152296
2023-06-06 18:55:58 -07:00
Florian Mayer
38f7c7eb1a Revert "Revert "[RISCV] Add special case to selectImm for constants that can be created with (ADD (SLLI C, 32), C).""
Revert broke even more stuff.

This reverts commit d5fbec30939f2c9f82475cf42c638619514b5c67.
2023-06-06 17:39:05 -07:00
Florian Mayer
d5fbec3093 Revert "[RISCV] Add special case to selectImm for constants that can be created with (ADD (SLLI C, 32), C)."
Triggers UBSan error.

This reverts commit 58b2d652af49ee9d9ff2af6edd7f67f23b26bfee.
2023-06-06 17:30:07 -07:00
Artem Belevich
ef8655adc8 [NVPTX] Adapt tests to make them usable with CUDA-12.x
CUDA-12 no longer supports 32-bit compilation.

Tests agnostic to 32/64 compilation mode are switched to use nvptx64.
Tests that do care about it have 32-bit ptxas compilation disabled with cuda-12+.

Differential Revision: https://reviews.llvm.org/D152199
2023-06-06 14:22:12 -07:00
Matt Arsenault
eece6ba283 IR: Add llvm.ldexp and llvm.experimental.constrained.ldexp intrinsics
AMDGPU has native instructions and target intrinsics for this, but
these really should be subject to legalization and generic
optimizations. This will enable legalization of f16->f32 on targets
without f16 support.

Implement a somewhat horrible inline expansion for targets without
libcall support. This could be better if we could introduce control
flow (GlobalISel version not yet implemented). Support for strictfp
legalization is less complete but works for the simple cases.
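The identity behind `llvm.ldexp` is ldexp(x, n) = x * 2^n. As a minimal C++ sketch (not the expansion from this patch), the case where 2^n is itself a normal double needs only one multiply: multiplying by an exact power of two rounds once, so the result matches `std::ldexp`, including subnormal and overflowing results. Exponents outside [-1022, 1023], which a real expansion must handle, are deliberately left out; `pow2Normal` and `ldexpInRange` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Builds the double 2^N directly from its exponent field. Only valid for
// N in [-1022, 1023], where 2^N is a normal double.
double pow2Normal(int N) {
  assert(N >= -1022 && N <= 1023);
  uint64_t Bits = static_cast<uint64_t>(N + 1023) << 52;
  double D;
  std::memcpy(&D, &Bits, sizeof D);
  return D;
}

// ldexp(X, N) == X * 2^N. Because 2^N is exact, the single multiply
// rounds exactly once, matching std::ldexp for this restricted range.
double ldexpInRange(double X, int N) { return X * pow2Normal(N); }
```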
2023-06-06 17:07:18 -04:00
Matt Arsenault
5d361ad2a4 AMDGPU/GlobalISel: Fix broken / copy paste error in sext_inreg test 2023-06-06 17:07:18 -04:00
Craig Topper
bb10612587 [RISCV] Use PACK in RISCVMatInt for constants that have the same lower and upper 32 bits.
This requires Zbkb.
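On RV64, `pack rd, rs1, rs2` places the low 32 bits of rs1 in the low half of rd and the low 32 bits of rs2 in the high half. So when both halves of a constant are equal, one 32-bit value can be materialized and packed with itself. A hedged scalar sketch; `pack64` and `packSeed` are hypothetical helpers, and "LUI+ADDIW" is only an example of how the seed might be built.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Scalar model of RV64 PACK (Zbkb): low 32 bits from Rs1, high from Rs2.
uint64_t pack64(uint64_t Rs1, uint64_t Rs2) {
  return (Rs2 << 32) | (Rs1 & 0xFFFFFFFFu);
}

// If Imm has identical lower and upper 32-bit halves, return the
// sign-extended 32-bit seed that could be materialized first (for example
// with LUI+ADDIW); PACK of that register with itself reproduces Imm.
std::optional<int64_t> packSeed(uint64_t Imm) {
  uint32_t Lo = static_cast<uint32_t>(Imm);
  uint32_t Hi = static_cast<uint32_t>(Imm >> 32);
  if (Lo != Hi)
    return std::nullopt;
  return static_cast<int64_t>(static_cast<int32_t>(Lo));
}
```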

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D152293
2023-06-06 13:30:33 -07:00
Noah Goldstein
809b1d834d [KnownBits] Return 0 for poison {s,u}div inputs
It seems consistent to always return zero for known poison rather than
varying the value. We do the same elsewhere.
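The convention can be illustrated with a toy 8-bit known-bits lattice. This is a simplified model, not LLVM's `KnownBits` class: `KnownBits8`, `makeConstant` and `udivModel` are hypothetical, and the only case treated as "no valid result" here is a divisor known to be zero.

```cpp
#include <cassert>
#include <cstdint>

// Toy 8-bit KnownBits: a bit set in Zero is known 0, in One known 1.
struct KnownBits8 {
  uint8_t Zero;
  uint8_t One;
  bool isConstant() const { return (Zero | One) == 0xFF; }
  bool isZero() const { return Zero == 0xFF; }
};

KnownBits8 makeConstant(uint8_t V) {
  return {static_cast<uint8_t>(~V), V};
}

// When the division cannot produce a well-defined result (here: the
// divisor is known to be zero), report a known-zero value rather than
// an arbitrary one. Otherwise fold constants or give up.
KnownBits8 udivModel(const KnownBits8 &LHS, const KnownBits8 &RHS) {
  if (RHS.isZero())
    return makeConstant(0);
  if (LHS.isConstant() && RHS.isConstant())
    return makeConstant(static_cast<uint8_t>(LHS.One / RHS.One));
  return {0, 0};  // nothing known in this simplified model
}
```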

Differential Revision: https://reviews.llvm.org/D150922
2023-06-06 15:14:10 -05:00
David Green
2a8df8d0b9 [AArch64][SVE] Add one-use-check to EitherVSelectOrPassthruPatFrags
As pointed out in D149968 vselect predicate patterns could do with a one-use
check to prevent multiple operations being created. This updates the
EitherVSelectOrPassthruPatFrags pattern frags used in creating predicated
min/max.

Differential Revision: https://reviews.llvm.org/D151080
2023-06-06 21:10:32 +01:00
Craig Topper
58b2d652af [RISCV] Add special case to selectImm for constants that can be created with (ADD (SLLI C, 32), C).
Where C is a simm32.

This costs an extra temporary register, but avoids a constant pool.
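Finding C for the (ADD (SLLI C, 32), C) form is direct: C << 32 has all-zero low bits, so C's low 32 bits must equal the constant's, and C is their sign extension. A hedged sketch with the hypothetical helper `findSlliAddSeed`; the arithmetic is done on unsigned values so that the 64-bit wraparound is well defined.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Returns C if Imm == (C << 32) + C (with 64-bit wraparound) for some C
// that fits in a signed 32-bit immediate; otherwise nullopt.
std::optional<int64_t> findSlliAddSeed(uint64_t Imm) {
  // (C << 32) contributes nothing to the low 32 bits, so they pin down C.
  int64_t C = static_cast<int32_t>(static_cast<uint32_t>(Imm));
  uint64_t UC = static_cast<uint64_t>(C);  // unsigned: shift/add are well defined
  if ((UC << 32) + UC == Imm)
    return C;
  return std::nullopt;
}
```

For a negative seed the add borrows into the upper half: C = -5 yields 0xFFFFFFFAFFFFFFFB, which is why this form is not the same as simply repeating the low half.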

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D152236
2023-06-06 11:59:12 -07:00
Simon Pilgrim
9a81b69757 [AArch64] Regenerate tests with missing immediate hex asm comments
Reduces diff in a future commit
2023-06-06 19:44:28 +01:00
Simon Pilgrim
a279a09ab9 Revert rG98061013e01207444cfd3980 - [X86] X86FixupVectorConstantsPass - attempt to replace full width fp vector constant loads with broadcasts on AVX+ targets
Reverting while we address an existing issue exposed by this (Issue #63108)
2023-06-06 18:44:24 +01:00
Simon Pilgrim
78de45fd4a Revert rGab4b924832ce26c21b88d7f82fcf4992ea8906bb - [X86] X86FixupVectorConstantsPass - attempt to replace full width integer vector constant loads with broadcasts on AVX2+ targets
Reverting while we address an existing issue exposed by this (Issue #63108)
2023-06-06 18:07:33 +01:00
Jay Foad
a4a3ac10cb [AMDGPU] Remove extract_subvector patterns
Removing them seems to slightly increase code quality as well as
simplifying both the tablegen and C++ parts of the code.

Differential Revision: https://reviews.llvm.org/D149853
2023-06-06 14:04:50 +01:00
Ricardo Jesus
3a87c15026 [AArch64][NFC] Normalise name of indexed forms of SQRDMLAH/SQRDMLSH
Most indexed vector instructions are suffixed with v<N><TY>_indexed.

SQRDMLAH/SQRDMLSH are the exception, being suffixed with <TY>_indexed
instead, which can complicate matching them slightly.

Differential Revision: https://reviews.llvm.org/D152161
2023-06-06 13:02:36 +00:00
Simon Pilgrim
85b77b13e3 [GlobalISel][X86] Add G_IMPLICIT_DEF / G_CONSTANT legalization handling 2023-06-06 11:45:22 +01:00
Thorsten Schütt
60b8019ea0 [GlobalIsel][X86] Legalize G_ANYEXT, G_SEXT, and G_ZEXT
Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D152243
2023-06-06 12:22:09 +02:00
wangpc
26e41a80d0 [RISCV] Handle "o" inline asm memory constraint
This is the same as D100412.

We just found the same crash when we tried to compile some packages
like mariadb, php, etc.

For constraint "o", it means "A memory operand is allowed, but
only if the address is offsettable". So I think it can be handled
just like constraint "m" for RISCV target.

And we print verbose information when unsupported constraints occur.

Reviewed By: asb

Differential Revision: https://reviews.llvm.org/D151979
2023-06-06 17:50:40 +08:00
Carl Ritson
6afc4b0629 [AMDGPU] WQM: Ensure exact mode placement before branches
Fix for D151797 where the change accidentally allowed exit to
exact mode between branch instructions.

Reviewed By: dstuttard

Differential Revision: https://reviews.llvm.org/D152228
2023-06-06 18:11:35 +09:00
Serge Pavlov
10e7899818 [FPEnv] Get rid of extra moves in fpenv calls
If intrinsic `get_fpenv` or `set_fpenv` is lowered to the form where FP
environment is represented as a region in memory, extra moves can
appear. For example the code:

  define void @func_01(ptr %ptr) {
    %env = call i256 @llvm.get.fpenv.i256()
    store i256 %env, ptr %ptr
    ret void
  }

produces DAG:

  ch = get_fpenv_mem ch, memory_region
  val: i256, ch = load ch, memory_region
  ch = store ch, ptr, val

In this case the extra moves can be avoided if `get_fpenv_mem` receives a
pointer to the memory where the FP environment should finally be placed.

This change implements such an optimization for this use case.

Differential Revision: https://reviews.llvm.org/D150437
2023-06-06 14:54:52 +07:00
Carl Ritson
7275637505 [AMDGPU] Pre-commit test for D152228 (NFC) 2023-06-06 16:00:20 +09:00
Luo, Yuanke
787f3008be [X86] Pre-commit test case for D152227. 2023-06-06 14:56:45 +08:00
Luo, Yuanke
60b7dbb670 [X86] Add test cases for D152227. 2023-06-06 14:24:46 +08:00
Paulo Matos
9571a28ee4 [WebAssembly] Add tests ensuring rotates persist
Due to the nature of WebAssembly, it's always better to keep
rotates instead of trying to optimize them. Commit 9485d983
disabled the generation of fsh for rotates, however these
tests ensure that future changes don't change the behaviour for
the Wasm backend that tends to have different optimization
requirements than other architectures. Also see:
https://github.com/llvm/llvm-project/issues/62703
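The link between the two forms these tests guard against: a funnel shift whose two inputs are the same value is a rotate. A scalar sketch of `llvm.fshl.i32` semantics next to a plain rotate; `fshl32` and `rotl32` are hypothetical helper names.

```cpp
#include <cassert>
#include <cstdint>

// Scalar model of llvm.fshl.i32: concatenate Hi:Lo, shift left by
// Amt modulo 32, and keep the upper 32 bits.
uint32_t fshl32(uint32_t Hi, uint32_t Lo, uint32_t Amt) {
  Amt %= 32;
  if (Amt == 0)
    return Hi;  // avoid the undefined shift by 32 below
  return (Hi << Amt) | (Lo >> (32 - Amt));
}

// Rotate left, written without a funnel shift.
uint32_t rotl32(uint32_t X, uint32_t Amt) {
  Amt %= 32;
  return Amt == 0 ? X : (X << Amt) | (X >> (32 - Amt));
}
```

With `Hi == Lo` the two functions agree for every shift amount, so a backend that has a native rotate loses nothing by keeping the rotate form.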

Differential Revision: https://reviews.llvm.org/D152126
2023-06-06 07:48:35 +02:00
Ben Shi
b1f0cb89c1 [AVR][NFC][test] Supplement more tests of 8-bit rotation
Reviewed By: Patryk27, jacquesguan

Differential Revision: https://reviews.llvm.org/D152129
2023-06-06 11:24:18 +08:00
Jianjian GUAN
77da27b5e3 [RISCV] Improve selection for vector fpclass.
Since the vfclass instruction sets only a single bit in the result, a check for a single fp class can be done with vmseq.
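A scalar model of why the equality compare suffices. It assumes the RISC-V `fclass` bit assignment (bit 0 for -inf through bit 9 for quiet NaN) and uses hypothetical helper names; because exactly one bit is set, testing a single-bit mask with AND-nonzero and with equality gives the same answer.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Scalar model of RISC-V fclass.d: exactly one of bits 0..9 is set.
uint32_t fclassD(double X) {
  uint64_t Bits;
  std::memcpy(&Bits, &X, sizeof Bits);
  bool Neg = Bits >> 63;
  switch (std::fpclassify(X)) {
  case FP_INFINITE:  return Neg ? 1u << 0 : 1u << 7;
  case FP_NORMAL:    return Neg ? 1u << 1 : 1u << 6;
  case FP_SUBNORMAL: return Neg ? 1u << 2 : 1u << 5;
  case FP_ZERO:      return Neg ? 1u << 3 : 1u << 4;
  default:  // NaN: bit 51 is the quiet bit (set -> qNaN, bit 9)
    return ((Bits >> 51) & 1) ? 1u << 9 : 1u << 8;
  }
}

// General test for a class mask: AND, then compare against zero.
bool isClassAnd(double X, uint32_t Mask) { return (fclassD(X) & Mask) != 0; }
// Single-class shortcut: a plain equality compare (vmseq in vector form).
bool isClassEq(double X, uint32_t Mask) { return fclassD(X) == Mask; }
```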

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D151967
2023-06-06 10:24:24 +08:00
Matt Arsenault
ecf30c31fb AMDGPU: Fix broken test 2023-06-05 20:44:59 -04:00
NAKAMURA Takumi
d3777f20c5 test/AMDGPU: REQUIRES asserts (D148184) 2023-06-06 08:55:46 +09:00
Matt Arsenault
30bd96fa17 AMDGPU: Add baseline test for undoing mul add 1 reassociation
Add some tests for combines to undo regressions caused by
0cfc6510323fbb5a56a5de23cbc65f7cc30fd34c.
2023-06-05 18:44:17 -04:00
Matt Arsenault
b25c001ad3 AMDGPU: Fold zext into result of v_mad_u16 on high zeroing targets
Avoids regressions in future patch.
2023-06-05 18:41:07 -04:00
Matt Arsenault
db08f9a2d5 AMDGPU: Add baseline 16-bit mad matching tests 2023-06-05 18:41:07 -04:00
Matt Arsenault
cb4b7340b0 AMDGPU: Convert test to generated checks 2023-06-05 18:41:06 -04:00
Craig Topper
b64ddae8a2 [RISCV] Lower experimental_get_vector_length intrinsic to vsetvli for some cases.
This patch lowers to vsetvli when the AVL is i32 or XLenVT and
the VF is a power of 2 in the range [1, 64]. VLEN=32 is not supported
as we don't have a valid type mapping for that. VF=1 is not supported
with Zve32* only.

The element width is used to set the SEW for the vsetvli if possible.
Otherwise we use SEW=8.
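The conditions listed above, written as a predicate. This is a hedged sketch: `canLowerGetVLToVsetvli` is a hypothetical name, and the VLEN=32 restriction and the SEW choice are not modelled.

```cpp
#include <cassert>

// Hypothetical model of when llvm.experimental.get.vector.length is
// lowered to vsetvli: the AVL must be i32 or XLenVT, VF must be a power
// of two in [1, 64], and VF=1 is rejected when only Zve32* is available.
bool canLowerGetVLToVsetvli(unsigned AVLBits, unsigned XLen, unsigned VF,
                            bool OnlyZve32) {
  bool AVLTypeOK = AVLBits == 32 || AVLBits == XLen;
  bool VFPow2InRange = VF >= 1 && VF <= 64 && (VF & (VF - 1)) == 0;
  if (!AVLTypeOK || !VFPow2InRange)
    return false;
  if (VF == 1 && OnlyZve32)
    return false;
  return true;
}
```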

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D150824
2023-06-05 15:02:11 -07:00
Craig Topper
4157bfb230 [RISCV] Add RISCVISD nodes for vfwadd/vfwsub.
Add a DAG combine to form these from FADD_VL/FSUB_VL and FP_EXTEND_VL.

This makes it similar to other widening ops and allows us to handle
using the same FP_EXTEND_VL for both operands.

Differential Revision: https://reviews.llvm.org/D151969
2023-06-05 14:12:47 -07:00
Artem Belevich
73464e377b [NVPTX] fixed vector-compare test.
Apparently this test didn't actually test anything other than that the IR compiles.
2023-06-05 12:49:12 -07:00
Artem Belevich
dc90f42ea7 Coalesce 16-bit FP types to use integer register classes.
i16/f16/bf16 will use the same .b16 registers and
i32/v2f16 and v2bf16 will share .b32 registers.

The changes are mostly mechanical, intended to remove unnecessary register
classes which tend to produce redundant register moves.

Differential Revision: https://reviews.llvm.org/D151601

v2f16 regtype conversion to i32
2023-06-05 12:21:52 -07:00
Krzysztof Drewniak
23098bd454 [AMDGPU] Add intrinsic for converting global pointers to resources
Define the function @llvm.amdgcn.make.buffer.rsrc, which takes a 64-bit
pointer, the 16-bit stride/swizzling constant that replaces the high 16
bits of an address in a buffer resource, the 32-bit extent/number of
elements, and the 32-bit flags (the latter two being the 3rd and 4th
words of the resource), and combines them into a ptr addrspace(8).
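A bit-level picture of that combination, assuming the four-word layout implied by the description (the stride occupies the top 16 bits of the 64-bit address, then the extent and flags form words 2 and 3). `makeBufferRsrcModel` is a hypothetical helper; the intrinsic itself produces a `ptr addrspace(8)` and is lowered inside the backend.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Packs the intrinsic's four operands into four 32-bit resource words.
std::array<uint32_t, 4> makeBufferRsrcModel(uint64_t Ptr, uint16_t Stride,
                                            uint32_t NumRecords,
                                            uint32_t Flags) {
  uint32_t Lo = static_cast<uint32_t>(Ptr);                  // address bits 31..0
  uint32_t Hi = static_cast<uint32_t>(Ptr >> 32) & 0xFFFFu;  // address bits 47..32
  Hi |= static_cast<uint32_t>(Stride) << 16;                 // stride replaces bits 63..48
  return {Lo, Hi, NumRecords, Flags};
}
```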

This intrinsic is lowered during the early phases of the backend.

This intrinsic is needed so that alias analysis can correctly infer
that a certain buffer resource points to the same memory as some
global pointer. Previous methods of constructing buffer resources,
which relied on ptrtoint, would not allow for such an inference.

Depends on D148184

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D148957
2023-06-05 17:07:59 +00:00
Krzysztof Drewniak
ab37937812 [AMDGPU] Use resource base for buffer instruction MachineMemOperands
1. Remove the existing code that would encode the constant offsets (if
there were any) on buffer intrinsic operations onto their
`MachineMemOperand`s. As far as I can tell, this use of `offset` has
no substantial impact on the generated code, especially since the same
reasoning is performed by areMemAccessesTriviallyDisjoint().

2. When a buffer resource intrinsic takes a pointer argument as the
base resource/descriptor, place that memory argument in the value
field of the MachineMemOperand attached to that intrinsic.

This is more conservative than what would be produced by more typical
LLVM code using GEP, as the Value (for alias analysis purposes)
corresponding to accessing buffer[0] and buffer[1] is the same.
However, the target-specific analysis of disjoint offsets covers many
of the simple use cases.

Despite this limitation, the new buffer intrinsics, combined with
LLVM's existing pointer annotations, allow for non-trivial
optimizations, as seen in the new tests, where marking two buffer
descriptors "noalias" allows merging together loads and stores in a
"load from A, modify loaded value, store to B" sequence, which would
not be possible previously.

Depends on D147547

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D148184
2023-06-05 17:06:57 +00:00
Krzysztof Drewniak
faa2c678aa [AMDGPU] Add buffer intrinsics that take resources as pointers
In order to enable the LLVM frontend to better analyze buffer
operations (and to potentially enable more precise analyses on the
backend), define versions of the raw and structured buffer intrinsics
that use `ptr addrspace(8)` instead of `<4 x i32>` to represent their
rsrc arguments.

The new intrinsics are named by replacing `buffer.` with `ptr.buffer.`.

One advantage to these intrinsic definitions is that, instead of
specifying that a buffer load/store will read/write some memory, we
can indicate that the memory read or written will be based on the
pointer argument. This means that, for example, a read from a
`noalias` buffer can be pulled out of a loop that is modifying a
distinct buffer.

In the future, we will define custom PseudoSourceValues that will
allow us to package up the (buffer, index, offset) triples that buffer
intrinsics contain and allow for more precise backend analysis.

This work also enables creating address space 7, which represents
manipulation of raw buffers using native LLVM load and store
instructions.

Where tests simply used a buffer intrinsic while testing some other
code path (such as the tests for VGPR spills), they have been updated
to use the new intrinsic form. Tests that are "about" buffer
intrinsics (for instance, those that ensure that they codegen as
expected) have been duplicated, either within existing files or into
new ones.

Depends on D145441

Reviewed By: arsenm, #amdgpu

Differential Revision: https://reviews.llvm.org/D147547
2023-06-05 16:59:07 +00:00
JP Lehr
c9998ec145 Revert "[DAGCombine] Make sure combined nodes are added back to the worklist in topological order."
This reverts commit e69fa03ddd85812be3143d79a0359c3e8d43bd45.

This patch led to build timeouts on the AMDGPU OpenMP runtime
buildbot.
2023-06-05 10:55:58 -04:00
Simon Pilgrim
c2926c6c4d [GlobalISel][X86] Regenerate legalize-undef.mir 2023-06-05 14:41:40 +01:00