48337 Commits

Author SHA1 Message Date
Noah Goldstein
809b1d834d [KnownBits] Return 0 for poison {s,u}div inputs
It seems consistent to always return zero for known poison rather than
varying the value. We do the same elsewhere.

Differential Revision: https://reviews.llvm.org/D150922
2023-06-06 15:14:10 -05:00
David Green
2a8df8d0b9 [AArch64][SVE] Add one-use-check to EitherVSelectOrPassthruPatFrags
As pointed out in D149968 vselect predicate patterns could do with a one-use
check to prevent multiple operations being created. This updates the
EitherVSelectOrPassthruPatFrags pattern frags used in creating predicates
min/max.

Differential Revision: https://reviews.llvm.org/D151080
2023-06-06 21:10:32 +01:00
Craig Topper
58b2d652af [RISCV] Add special case to selectImm for constants that can be created with (ADD (SLLI C, 32), C).
Where C is a simm32.

This costs an extra temporary register, but avoids a constant pool.
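
A minimal sketch of the arithmetic behind this special case, not the actual selectImm code: it checks whether a 64-bit constant can be written as (C << 32) + C for a sign-extended 32-bit C. The helper names `sext32` and `split_add_slli` are hypothetical.

```python
MASK64 = (1 << 64) - 1

def sext32(x):
    """Sign-extend the low 32 bits of x to a Python int."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x

def split_add_slli(imm):
    """Return C if imm == (C << 32) + C (mod 2**64) for a simm32 C, else None."""
    imm &= MASK64
    # (C << 32) has zero low bits, so the low 32 bits of the sum are those of C.
    c = sext32(imm)
    if (((c << 32) + c) & MASK64) == imm:
        return c
    return None

print(split_add_slli(0x1234567812345678))
print(split_add_slli(0x0000000100000002))
```

When C is negative, its sign extension also contributes to the upper half, so the upper 32 bits of a matching constant are not simply a copy of the lower 32 bits.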

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D152236
2023-06-06 11:59:12 -07:00
Simon Pilgrim
9a81b69757 [AArch64] Regenerate tests with missing immediate hex asm comments
Reduces diff in a future commit
2023-06-06 19:44:28 +01:00
Simon Pilgrim
a279a09ab9 Revert rG98061013e01207444cfd3980 - [X86] X86FixupVectorConstantsPass - attempt to replace full width fp vector constant loads with broadcasts on AVX+ targets
Reverting while we address an existing issue exposed by this (Issue #63108)
2023-06-06 18:44:24 +01:00
Simon Pilgrim
78de45fd4a Revert rGab4b924832ce26c21b88d7f82fcf4992ea8906bb - [X86] X86FixupVectorConstantsPass - attempt to replace full width integer vector constant loads with broadcasts on AVX2+ targets
Reverting while we address an existing issue exposed by this (Issue #63108)
2023-06-06 18:07:33 +01:00
Jay Foad
a4a3ac10cb [AMDGPU] Remove extract_subvector patterns
Removing them seems to slightly increase code quality as well as
simplifying both the tablegen and C++ parts of the code.

Differential Revision: https://reviews.llvm.org/D149853
2023-06-06 14:04:50 +01:00
Ricardo Jesus
3a87c15026 [AArch64][NFC] Normalise name of indexed forms of SQRDMLAH/SQRDMLSH
Most indexed vector instructions are suffixed with v<N><TY>_indexed.

SQRDMLAH/SQRDMLSH are the exception, being suffixed with <TY>_indexed
instead, which can complicate matching them slightly.

Differential Revision: https://reviews.llvm.org/D152161
2023-06-06 13:02:36 +00:00
Simon Pilgrim
85b77b13e3 [GlobalISel][X86] Add G_IMPLICIT_DEF / G_CONSTANT legalization handling 2023-06-06 11:45:22 +01:00
Thorsten Schütt
60b8019ea0 [GlobalIsel][X86] Legalize G_ANYEXT, G_SEXT, and G_ZEXT
Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D152243
2023-06-06 12:22:09 +02:00
wangpc
26e41a80d0 [RISCV] Handle "o" inline asm memory constraint
This is the same as D100412.

We just found the same crash when we tried to compile some packages
like mariadb, php, etc.

For constraint "o", it means "A memory operand is allowed, but
only if the address is offsettable". So I think it can be handled
just like constraint "m" for RISCV target.

And we print verbose information when unsupported constraints occur.

Reviewed By: asb

Differential Revision: https://reviews.llvm.org/D151979
2023-06-06 17:50:40 +08:00
Carl Ritson
6afc4b0629 [AMDGPU] WQM: Ensure exact mode placement before branches
Fix for D151797 where the change accidentally allowed exit to
exact mode between branch instructions.

Reviewed By: dstuttard

Differential Revision: https://reviews.llvm.org/D152228
2023-06-06 18:11:35 +09:00
Serge Pavlov
10e7899818 [FPEnv] Get rid of extra moves in fpenv calls
If intrinsic `get_fpenv` or `set_fpenv` is lowered to the form where FP
environment is represented as a region in memory, extra moves can
appear. For example the code:

  define void @func_01(ptr %ptr) {
    %env = call i256 @llvm.get.fpenv.i256()
    store i256 %env, ptr %ptr
    ret void
  }

produces DAG:

  ch = get_fpenv_mem ch, memory_region
  val: i256, ch = load ch, memory_region
  ch = store ch, ptr, val

In this case the extra moves can be avoided if `get_fpenv_mem` receives a
pointer to the memory where the FP environment should finally be placed.

This change implements that optimization for this use case.

Differential Revision: https://reviews.llvm.org/D150437
2023-06-06 14:54:52 +07:00
Carl Ritson
7275637505 [AMDGPU] Pre-commit test for D152228 (NFC) 2023-06-06 16:00:20 +09:00
Luo, Yuanke
787f3008be [X86] Pre-commit test case for D152227. 2023-06-06 14:56:45 +08:00
Luo, Yuanke
60b7dbb670 [X86] Add test cases for D152227. 2023-06-06 14:24:46 +08:00
Paulo Matos
9571a28ee4 [WebAssembly] Add tests ensuring rotates persist
Due to the nature of WebAssembly, it's always better to keep
rotates instead of trying to optimize them. Commit 9485d983
disabled the generation of fsh for rotates. These
tests ensure that future changes don't change the behaviour for
the Wasm backend, which tends to have different optimization
requirements than other architectures. Also see:
https://github.com/llvm/llvm-project/issues/62703

Differential Revision: https://reviews.llvm.org/D152126
2023-06-06 07:48:35 +02:00
Ben Shi
b1f0cb89c1 [AVR][NFC][test] Supplement more tests of 8-bit rotation
Reviewed By: Patryk27, jacquesguan

Differential Revision: https://reviews.llvm.org/D152129
2023-06-06 11:24:18 +08:00
Jianjian GUAN
77da27b5e3 [RISCV] Improve selection for vector fpclass.
Since the vfclass instruction sets only a single bit in the result, we can use vmseq when we only want to check for one fp class.
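
A scalar sketch of why an equality compare is enough, assuming the standard RISC-V fclass bit positions. The function names are hypothetical and this models a single double-precision element, not the vector lowering itself.

```python
import struct

# Bit positions of the RISC-V fclass result.
NEG_INF, NEG_NORMAL, NEG_SUBNORMAL, NEG_ZERO = 0, 1, 2, 3
POS_ZERO, POS_SUBNORMAL, POS_NORMAL, POS_INF = 4, 5, 6, 7
SNAN, QNAN = 8, 9

def fclass64(x):
    """Return a one-hot class mask for a double, modelled on fclass."""
    bits = struct.unpack("<Q", struct.pack("<d", x))[0]
    sign = bits >> 63
    exp = (bits >> 52) & 0x7FF
    frac = bits & ((1 << 52) - 1)
    if exp == 0x7FF:
        if frac == 0:
            pos = NEG_INF if sign else POS_INF
        else:
            pos = QNAN if frac >> 51 else SNAN
    elif exp == 0:
        if frac == 0:
            pos = NEG_ZERO if sign else POS_ZERO
        else:
            pos = NEG_SUBNORMAL if sign else POS_SUBNORMAL
    else:
        pos = NEG_NORMAL if sign else POS_NORMAL
    return 1 << pos

def is_class_by_and(x, pos):
    """General test: mask the class bit and compare against zero."""
    return (fclass64(x) & (1 << pos)) != 0

def is_class_by_eq(x, pos):
    """Single-class test: the result is one-hot, so equality suffices."""
    return fclass64(x) == (1 << pos)
```

Because exactly one bit is set, the two predicates agree for every input and every single class, which is the property the commit relies on.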

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D151967
2023-06-06 10:24:24 +08:00
Matt Arsenault
ecf30c31fb AMDGPU: Fix broken test 2023-06-05 20:44:59 -04:00
NAKAMURA Takumi
d3777f20c5 test/AMDGPU: REQUIRES asserts (D148184) 2023-06-06 08:55:46 +09:00
Matt Arsenault
30bd96fa17 AMDGPU: Add baseline test for undoing mul add 1 reassociation
Add some tests for combines to undo regressions caused by
0cfc6510323fbb5a56a5de23cbc65f7cc30fd34c.
2023-06-05 18:44:17 -04:00
Matt Arsenault
b25c001ad3 AMDGPU: Fold zext into result of v_mad_u16 on high zeroing targets
Avoids regressions in future patch.
2023-06-05 18:41:07 -04:00
Matt Arsenault
db08f9a2d5 AMDGPU: Add baseline 16-bit mad matching tests 2023-06-05 18:41:07 -04:00
Matt Arsenault
cb4b7340b0 AMDGPU: Convert test to generated checks 2023-06-05 18:41:06 -04:00
Craig Topper
b64ddae8a2 [RISCV] Lower experimental_get_vector_length intrinsic to vsetvli for some cases.
This patch lowers to vsetvli when the AVL is i32 or XLenVT and
the VF is a power of 2 in the range [1, 64]. VLEN=32 is not supported
as we don't have a valid type mapping for that. VF=1 is not supported
with Zve32* only.

The element width is used to set the SEW for the vsetvli if possible.
Otherwise we use SEW=8.
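
The conditions above can be written as a small predicate. This is a sketch transcribed from the description, not the code in the RISC-V backend; the function and all parameter names (`avl_bits`, `xlen`, `vf`, `vlen`, `zve32_only`) are hypothetical.

```python
def can_lower_to_vsetvli(avl_bits, xlen, vf, vlen=128, zve32_only=False):
    """Return True if the described conditions for using vsetvli hold."""
    # The AVL must be i32 or XLenVT.
    if avl_bits not in (32, xlen):
        return False
    # The VF must be a power of 2 in the range [1, 64].
    if vf < 1 or vf > 64 or (vf & (vf - 1)) != 0:
        return False
    # VLEN=32 has no valid type mapping.
    if vlen == 32:
        return False
    # VF=1 is not supported when only Zve32* is available.
    if vf == 1 and zve32_only:
        return False
    return True

print(can_lower_to_vsetvli(avl_bits=32, xlen=64, vf=4))
```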

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D150824
2023-06-05 15:02:11 -07:00
Craig Topper
4157bfb230 [RISCV] Add RISCVISD nodes for vfwadd/vfwsub.
Add a DAG combine to form these from FADD_VL/FSUB_VL and FP_EXTEND_VL.

This makes it similar to other widening ops and allows us to handle
using the same FP_EXTEND_VL for both operands.

Differential Revision: https://reviews.llvm.org/D151969
2023-06-05 14:12:47 -07:00
Artem Belevich
73464e377b [NVPTX] fixed vector-compare test.
Apparently this test didn't actually test anything other than that the IR compiles.
2023-06-05 12:49:12 -07:00
Artem Belevich
dc90f42ea7 Coalesce 16-bit FP types to use integer register classes.
i16/f16/bf16 will use the same .b16 registers and
i32/v2f16 and v2bf16 will share .b32 registers.

The changes are mostly mechanical, intended to remove unnecessary register
classes which tend to produce redundant register moves.

Differential Revision: https://reviews.llvm.org/D151601

v2f16 regtype conversion to i32
2023-06-05 12:21:52 -07:00
Krzysztof Drewniak
23098bd454 [AMDGPU] Add intrinsic for converting global pointers to resources
Define the function @llvm.amdgcn.make.buffer.rsrc, which takes a 64-bit
pointer, the 16-bit stride/swizzling constant that replaces the high 16
bits of an address in a buffer resource, the 32-bit extent/number of
elements, and the 32-bit flags (the latter two being the 3rd and 4th
words of the resource), and combines them into a ptr addrspace(8).
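
A rough sketch of the packing implied by this description, assuming a 128-bit resource made of four 32-bit words. The function name and the exact field handling are assumptions for illustration, not the backend's lowering.

```python
def make_buffer_rsrc_words(base, stride, num_records, flags):
    """Pack the four operands into four 32-bit words (word 0 first)."""
    if not 0 <= base < (1 << 64):
        raise ValueError("base must fit in 64 bits")
    if not 0 <= stride < (1 << 16):
        raise ValueError("stride must fit in 16 bits")
    if not 0 <= num_records < (1 << 32) or not 0 <= flags < (1 << 32):
        raise ValueError("num_records and flags must fit in 32 bits")
    word0 = base & 0xFFFFFFFF
    # The stride/swizzling constant replaces the high 16 bits of the address.
    word1 = ((base >> 32) & 0xFFFF) | (stride << 16)
    word2 = num_records  # 3rd word: extent / number of elements
    word3 = flags        # 4th word: flags
    return [word0, word1, word2, word3]

print([hex(w) for w in make_buffer_rsrc_words(0x123456789ABC, 0x10, 64, 0x27000)])
```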

This intrinsic is lowered during the early phases of the backend.

This intrinsic is needed so that alias analysis can correctly infer
that a certain buffer resource points to the same memory as some
global pointer. Previous methods of constructing buffer resources,
which relied on ptrtoint, would not allow for such an inference.

Depends on D148184

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D148957
2023-06-05 17:07:59 +00:00
Krzysztof Drewniak
ab37937812 [AMDGPU] Use resource base for buffer instruction MachineMemOperands
1. Remove the existing code that would encode the constant offsets (if
there were any) on buffer intrinsic operations onto their
`MachineMemOperand`s. As far as I can tell, this use of `offset` has
no substantial impact on the generated code, especially since the same
reasoning is performed by areMemAccessesTriviallyDisjoint().

2. When a buffer resource intrinsic takes a pointer argument as the
base resource/descriptor, place that memory argument in the value
field of the MachineMemOperand attached to that intrinsic.

This is more conservative than what would be produced by more typical
LLVM code using GEP, as the Value (for alias analysis purposes)
corresponding to accessing buffer[0] and buffer[1] is the same.
However, the target-specific analysis of disjoint offsets covers a lot
of the simple usecases.

Despite this limitation, the new buffer intrinsics, combined with
LLVM's existing pointer annotations, allow for non-trivial
optimizations, as seen in the new tests, where marking two buffer
descriptors "noalias" allows merging together loads and stores in a
"load from A, modify loaded value, store to B" sequence, which would
not be possible previously.

Depends on D147547

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D148184
2023-06-05 17:06:57 +00:00
Krzysztof Drewniak
faa2c678aa [AMDGPU] Add buffer intrinsics that take resources as pointers
In order to enable the LLVM frontend to better analyze buffer
operations (and to potentially enable more precise analyses on the
backend), define versions of the raw and structured buffer intrinsics
that use `ptr addrspace(8)` instead of `<4 x i32>` to represent their
rsrc arguments.

The new intrinsics are named by replacing `buffer.` with `buffer.ptr`.

One advantage to these intrinsic definitions is that, instead of
specifying that a buffer load/store will read/write some memory, we
can indicate that the memory read or written will be based on the
pointer argument. This means that, for example, a read from a
`noalias` buffer can be pulled out of a loop that is modifying a
distinct buffer.

In the future, we will define custom PseudoSourceValues that will
allow us to package up the (buffer, index, offset) triples that buffer
intrinsics contain and allow for more precise backend analysis.

This work also enables creating address space 7, which represents
manipulation of raw buffers using native LLVM load and store
instructions.

Where tests simply used a buffer intrinsic while testing some other
code path (such as the tests for VGPR spills), they have been updated
to use the new intrinsic form. Tests that are "about" buffer
intrinsics (for instance, those that ensure that they codegen as
expected) have been duplicated, either within existing files or into
new ones.

Depends on D145441

Reviewed By: arsenm, #amdgpu

Differential Revision: https://reviews.llvm.org/D147547
2023-06-05 16:59:07 +00:00
JP Lehr
c9998ec145 Revert "[DAGCombine] Make sure combined nodes are added back to the worklist in topological order."
This reverts commit e69fa03ddd85812be3143d79a0359c3e8d43bd45.

This patch led to build timeouts on the AMDGPU OpenMP runtime
buildbot.
2023-06-05 10:55:58 -04:00
Simon Pilgrim
c2926c6c4d [GlobalISel][X86] Regenerate legalize-undef.mir 2023-06-05 14:41:40 +01:00
Simon Pilgrim
ca0caa23ce [X86] Replace X32 test check prefix with X86 + add common CHECK prefix
We try to only use X32 for gnux32 triple test cases
2023-06-05 14:41:40 +01:00
Simon Pilgrim
fcacc41a22 [X86] Replace X32 test check prefix with X86
We try to only use X32 for gnux32 triple test cases
2023-06-05 14:41:40 +01:00
Amaury Séchet
e69fa03ddd [DAGCombine] Make sure combined nodes are added back to the worklist in topological order.
Currently, a node and its users are added back to the worklist in reverse topological order after it is combined. This diff changes that order to be topological. This is part of a larger migration to get the DAGCombiner to process nodes in topological order.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D127115
2023-06-05 11:09:18 +00:00
Simon Pilgrim
b28bc5f5ad [GlobalISel][X86] Add 128/256/512-bit vector and/or/xor test coverage
Based off the legalize-add-v*.mir tests
2023-06-05 12:08:22 +01:00
Simon Pilgrim
dbd3695092 [GlobalISel][X86] Add illegal types and 32-bit target scalar and/or/xor test coverage
Based off the legalize-add.mir tests
2023-06-05 12:08:22 +01:00
Jay Foad
9912bcc8ec [AMDGPU] Regenerate some GlobalISel checks 2023-06-05 11:21:31 +01:00
Simon Pilgrim
d37bd544ff [X86] canonicalizeShuffleWithBinOps - ensure a binary shuffle of binops have the same value type
Fixes #63091
2023-06-05 11:18:28 +01:00
Simon Pilgrim
d75efc1d51 [X86] Add test case for Issue #63091 2023-06-05 11:18:27 +01:00
Simon Pilgrim
346ee549e5 [GlobalISel][X86] Add G_CTTZ_ZERO_UNDEF/G_CTTZ legalization handling
G_CTTZ_ZERO_UNDEF is always legal using the BSF instruction; G_CTTZ requires the BMI1 TZCNT instruction.
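
The difference between the two opcodes is the zero input: BSF leaves its destination undefined for a zero source, while TZCNT returns the operand width. A small model of the two semantics follows; the function names are hypothetical, and the undefined case is modelled as an error.

```python
def cttz(x, width=32):
    """G_CTTZ-style semantics: a zero input yields the bit width."""
    x &= (1 << width) - 1
    if x == 0:
        return width
    # x & -x isolates the lowest set bit.
    return (x & -x).bit_length() - 1

def cttz_zero_undef(x, width=32):
    """G_CTTZ_ZERO_UNDEF-style semantics: the caller guarantees x != 0."""
    x &= (1 << width) - 1
    if x == 0:
        raise ValueError("result is undefined for a zero input")
    return (x & -x).bit_length() - 1

print(cttz(8), cttz(0))
```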
2023-06-05 11:18:27 +01:00
David Green
2b4807ba04 [AArch64][SVE] Predicated mla/mls patterns
To go with D149267 and D149967, this adds predicated mla/mls patterns, selected
from select(mask, add(a, mul(b, c)), a) -> mla(a, mask, b, c). The existing
patterns are eventually removed by D149967.
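
A lane-by-lane sketch of the equivalence behind this pattern, using plain Python lists in place of SVE vectors. The function names are hypothetical and element-width wrapping is ignored.

```python
def select_add_mul(mask, a, b, c):
    """select(mask, add(a, mul(b, c)), a), evaluated per lane."""
    return [ai + bi * ci if m else ai
            for m, ai, bi, ci in zip(mask, a, b, c)]

def predicated_mla(a, mask, b, c):
    """Model of a predicated mla: inactive lanes keep the accumulator a."""
    out = list(a)
    for i, m in enumerate(mask):
        if m:
            out[i] = a[i] + b[i] * c[i]
    return out

mask = [True, False, True, False]
a, b, c = [1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]
print(select_add_mul(mask, a, b, c) == predicated_mla(a, mask, b, c))
```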

Differential Revision: https://reviews.llvm.org/D149969
2023-06-05 10:08:57 +01:00
Qiu Chaofan
9e17e08324 [PowerPC] Combine fptoint-store under strict cases
Reviewed By: shchenz

Differential Revision: https://reviews.llvm.org/D141249
2023-06-05 16:24:02 +08:00
esmeyi
6f57d8df2d Revert "[XCOFF][DWARF] XCOFF64 should be able to select the dwarf format in intergrated-as mode."
This reverts commit 4054c68644dfebbb584bca698a25d18d1d312bae.

This is because the AIX system linker requires DWARF64 for XCOFF64.
2023-06-05 02:50:47 -04:00
Serge Pavlov
eecaeb6f10 [FPEnv] Intrinsics for access to FP environment
The change implements intrinsics 'get_fpenv', 'set_fpenv' and 'reset_fpenv'.
They are used to read floating-point environment, set it or reset to
some default state. They do the same actions as C library functions
'fegetenv' and 'fesetenv'. By default these intrinsics are lowered to calls
to these functions.

The new intrinsics specify the FP environment as a value of integer type, which
is convenient for most targets, where the FP state is the content of some
register. Some targets, however, use long representations. On X86 the size
of the FP environment is 256 bits, and even half of this size is not a legal
integer type. To facilitate legalization in such cases, two sets of DAG
nodes are used. Nodes GET_FPENV and SET_FPENV are used when the FP
environment may be represented by a legal integer type. Nodes
GET_FPENV_MEM and SET_FPENV_MEM consider the FP environment as a region in
memory, much like `fesetenv` and `fegetenv` do. They are used when the
target has a long representation for floating-point state.

Differential Revision: https://reviews.llvm.org/D71742
2023-06-05 13:10:01 +07:00
Qiu Chaofan
69bc8ff766 Reland "[PowerPC] Simplify fp-to-int store optimization"
The build failure should be fixed by de681d53. Follow-up refactor will
be done in future patches.

This reverts commit e7c5ced0b9f0551ea17e1d2b48be86f03a772c59.
2023-06-05 13:53:08 +08:00
Ben Shi
53a7c254e4 [AVR][NFC][test] Supplement a test of the pseudo instruction RORBRd
Reviewed By: aykevl, Patryk27

Differential Revision: https://reviews.llvm.org/D152087
2023-06-04 23:19:21 +08:00
Simon Pilgrim
9424a54201 [GlobalIsel][X86] Update legalization of G_AND/G_OR/G_XOR
Replace the legacy G_AND/G_OR/G_XOR legalizer. The new one handles all scalar promotion and vector clamping (which allows AVX1 to handle 256-bit logic ops).
2023-06-04 11:44:27 +01:00