52796 Commits

Author SHA1 Message Date
Thomas Symalla
256343a0e9
Revert "Update amdgpu_gfx functions to use s0-s3 for inreg SGPR arguments on targets using scratch instructions for stack #78226" (#86273)
Reverts llvm/llvm-project#81394

This reverts commit 3ac243bc0d7922d083af2cf025247b5698556062.
It is not handling RSrc registers s0-s3 correctly. This leads to a
broken test, where it expects s0-s3 as function argument and uses it as
RSrc register as well.
We need to re-visit the patch, but apparently we only want to have s0-s3
as argument registers if we don't need them as RSrc registers.
2024-03-26 11:01:08 +01:00
David Green
fbc247367a
[AArch64][GlobalISel] Legalization for small anyext/sext/zext (#86438)
Similar to #85625, some of the codegen is still far from optimal but
this helps fix quite a few fallback cases.
2024-03-26 09:48:06 +00:00
David Green
4d315ff382
[GlobalISel] Add CTLZ known bits. (#86436)
Replicated from SDAG.
2024-03-26 09:11:35 +00:00
Bevin Hansson
14c30189fb
[ExpandLargeFpConvert] Fix incorrect values in fp-to-int conversion. (#86514)
The IR for a double-to-i129 conversion looks like this in one of the
blocks in compiler-rt:

  %cmp5.i = icmp ult i16 %3, -129, !dbg !24

But in ExpandLargeFpConvert, it looks like:

  %13 = icmp ult i129 %12, 4294967167, !dbg !19

ExpandLargeFpConvert is wrong; the value should have been
signed before negating, but instead we get a very large
unsigned value. Another value in the same pass also has this
issue.
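
The relationship between the two constants can be checked with plain two's-complement arithmetic. The following is a minimal sketch, assuming the bad constant came from negating 129 in a 32-bit unsigned type and then zero-extending; the helper name `wrap_unsigned` is hypothetical and not part of the pass.

```python
def wrap_unsigned(value: int, bits: int) -> int:
    """View value as an unsigned integer of the given bit width (mod 2**bits)."""
    return value & ((1 << bits) - 1)


# Negating 129 at 32 bits and zero-extending keeps the 32-bit pattern.
# This matches the constant in the IR produced by the pass.
zero_extended = wrap_unsigned(-129, 32)

# Sign-extending -129 to 129 bits gives the intended constant, which LLVM
# would print as -129 for an i129 operand.
sign_extended = wrap_unsigned(-129, 129)

# The i16 constant from the compiler-rt IR, viewed as unsigned.
i16_pattern = wrap_unsigned(-129, 16)

print(zero_extended)                      # 4294967167
print(sign_extended == (1 << 129) - 129)  # True
print(i16_pattern)                        # 65407
```

An unsigned compare against 4294967167 accepts far fewer values than an unsigned compare against 2**129 - 129, so the two forms are not interchangeable.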
2024-03-26 10:08:22 +01:00
Philip Reames
a6b870db09
[RISCV] Enable sub(max, min) lowering for ABDS and ABDU (#86592)
We have the ISD nodes for representing signed and unsigned absolute
difference. For RISCV, we have vector min/max in the base vector
extension, so we can expand to the sub(max,min) lowering.
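
As an illustration of the identity behind this lowering, here is a small sketch in Python rather than DAG code. The function names `abdu`, `abds`, and `to_signed` are chosen for this example and only model the arithmetic on fixed-width integers.

```python
def to_signed(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def abdu(a: int, b: int, bits: int) -> int:
    """Unsigned absolute difference computed as sub(umax, umin)."""
    mask = (1 << bits) - 1
    a, b = a & mask, b & mask
    return (max(a, b) - min(a, b)) & mask


def abds(a: int, b: int, bits: int) -> int:
    """Signed absolute difference computed as sub(smax, smin)."""
    mask = (1 << bits) - 1
    sa, sb = to_signed(a, bits), to_signed(b, bits)
    return (max(sa, sb) - min(sa, sb)) & mask


# Exhaustive check at 8 bits: sub(max, min) equals |a - b| in both cases.
for a in range(256):
    for b in range(256):
        assert abdu(a, b, 8) == abs(a - b)
        assert abds(a, b, 8) == abs(to_signed(a, 8) - to_signed(b, 8))

print(abdu(0x80, 0x7F, 8))  # 1   (128 vs 127 as unsigned)
print(abds(0x80, 0x7F, 8))  # 255 (-128 vs 127 as signed)
```

The signed result always fits in the same width when read as unsigned, which is why no widening is needed.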

We could almost use the default expansion, but since fixed length
min/max are custom (not legal), the default expansion doesn't cover the
fixed vector cases. The expansion here is just a copy of the generic
code specialized to allow the custom min/max nodes to be created so they
can in turn be legalized to the _vl variants.

Existing DAG combines handle the recognition of absolute difference
idioms and conversion into the respective ISD::ABDS and ISD::ABDU nodes.

This change does have the net effect of potentially pushing a free
floating zero/sign extend after the expansion, and we don't do a great
job of folding that into later expressions. However, since in general
narrowing can reduce required work (by reducing LMUL) this seems like
the right general tradeoff.
2024-03-25 20:13:53 -07:00
YAMAMOTO Takashi
6420f37926
[WebAssembly] Implement an alternative translation for -wasm-enable-sjlj (#84137)
Instead of maintaining per-function-invocation malloc()'ed tables to track which functions each label belongs to, store the equivalent info in jump buffers (jmp_buf) themselves.

Also, use less emscripten-looking ABI symbols:
```
    saveSetjmp     -> __wasm_setjmp
    testSetjmp      -> __wasm_setjmp_test
    getTempRet0    -> (removed)
    __wasm_longjmp  -> (no change)
```

While I want to use this for WASI, it should work for emscripten as well.

An example runtime and a few tests:
https://github.com/yamt/garbage/tree/wasm-sjlj-alt2/wasm/longjmp

wasi-libc version of the runtime:
https://github.com/WebAssembly/wasi-libc/pull/483

emscripten version of the runtime:
https://github.com/emscripten-core/emscripten/pull/21502

Discussion:
https://docs.google.com/document/d/1ZvTPT36K5jjiedF8MCXbEmYjULJjI723aOAks1IdLLg/edit
2024-03-25 18:11:56 -07:00
Changpeng Fang
350bda4419
AMDGPU: Rename intrinsics and remove f16/bf16 versions for load transpose (#86313)
Rename the intrinsics to be closer to the instruction mnemonic names:
Use global_load_tr_b64 and global_load_tr_b128 instead of
global_load_tr.

This patch also removes f16/bf16 versions of builtins/intrinsics. To
simplify the design, we should avoid enumerating all possible types in
implementing builtins. We can always use bitcast.
2024-03-25 16:55:22 -07:00
Farzon Lotfi
4cea2d049f
[HLSL][DXIL] implement sqrt intrinsic (#86560)
completes #86187
- fix hlsl_intrinsic to cover the correct cases
- move to using `__builtin_elementwise_sqrt`
- add lowering of `Intrinsic::sqrt` to dxilop 24.
2024-03-25 18:02:30 -04:00
Farzon Lotfi
060df78cdb
[DXIL] Add Float Dot Intrinsic Lowering (#86071)
Completes #83626
- `CGBuiltin.cpp` - modify `getDotProductIntrinsic` to be able to emit
`dot2`, `dot3`, and `dot4` intrinsics based on element count
- `IntrinsicsDirectX.td` - for floating point add `dot2`, `dot3`, and
`dot4` intrinsics
- `DXIL.td` - add dxilop intrinsic lowering for `dot2`, `dot3`, & `dot4`.
- `DXILOpLowering.cpp` - add vector arg flattening for dot product. 
- `DXILOpBuilder.h` - modify `createDXILOpCall` to take a smallVector
instead of an iterator
- `DXILOpBuilder.cpp` - modify `createDXILOpCall` by moving the small
vector up to the calling function in `DXILOpLowering.cpp`.
- Moving one function up gives us access to the `CallInst` and
`Function` which were needed to distinguish the dot product intrinsics
and get the operands without using the iterator.
2024-03-25 18:01:46 -04:00
Philip Reames
07ee9bd215 [RISCV] Add fixed vector coverage for sum-absolute-difference (sad) pattern
This builds on the previously added absolute difference cases, and adds
the reduction at the end.  This is mostly interesting for examining
impact of extend placement when changing the abdu lowering.
2024-03-25 13:27:09 -07:00
Philip Reames
4b941ff4b4 [RISCV] Add coverage for abdu and abds (absolute difference)
Test copied from aarch64 with minimal adaptation.  We likely need additional
coverage, but this is a reasonable starting point.
2024-03-25 13:27:09 -07:00
Jeffrey Byrnes
b761137049
[AMDGPU] Use correct VGPR threshold for flagging ExcessRP regions in unified register file case (#85860)
`ST.getMaxNumVGPRs(MF)` lowers to `AMDGPUBaseInfo.cpp:getTotalNumVGPRs`
which returns 512 for gfx90a. This is subsequently limited by
`AMDGPUBaseInfo:getAddressableNumVGPRs()`, which also returns 512 for
gfx90a. The ISA states we can have a total of 512 registers, but a
maximum of only 256 of each of AGPR and VGPR (gfx90a 3.6.4).

Therefore, in unified register file case, `ST.getMaxNumVGPRs(MF)`
calculates the maximum number of combined VGPR + AGPR. But, it is
currently used as the limit for accvgpr and as the limit for archvgpr.

This patch uses it as the combined limit, and accounts for the maximum addressable arch/acc VGPRs when calculating the per RegClass limits.
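
The difference between the two interpretations can be sketched as follows. This is not the scheduler code: the constants come from the gfx90a numbers quoted above, and the function names are hypothetical.

```python
# gfx90a numbers quoted in this commit message.
TOTAL_UNIFIED_REGS = 512          # combined ArchVGPR + AccVGPR budget
MAX_ADDRESSABLE_PER_CLASS = 256   # at most 256 of each register class


def old_excess(arch_vgprs: int, acc_vgprs: int) -> bool:
    """Previous behaviour: the combined total was used as each class's limit."""
    return arch_vgprs > TOTAL_UNIFIED_REGS or acc_vgprs > TOTAL_UNIFIED_REGS


def new_excess(arch_vgprs: int, acc_vgprs: int) -> bool:
    """Sketch of the fix: per-class limits plus a combined limit.

    With these particular numbers the combined check is implied by the
    per-class checks; it is kept so the sketch stays valid for other values.
    """
    per_class_limit = min(TOTAL_UNIFIED_REGS, MAX_ADDRESSABLE_PER_CLASS)
    return (
        arch_vgprs > per_class_limit
        or acc_vgprs > per_class_limit
        or arch_vgprs + acc_vgprs > TOTAL_UNIFIED_REGS
    )


print(old_excess(300, 0))    # False: 300 ArchVGPRs slip under the 512 limit
print(new_excess(300, 0))    # True: 300 exceeds the 256 addressable ArchVGPRs
print(new_excess(256, 256))  # False: exactly at both limits
```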

It is not unreasonable to think other clients of getTotalNumVGPRs are
using it in the wrong way.
2024-03-25 13:11:58 -07:00
Michael Maitland
8b9c3b57b1 Revert "[RISCV][GISEL] Add instruction select tests for G_VSCALE"
This reverts commit c00a5ab8c4be14f63735ec61c5c9245c233cbcfc. It is not
consistent with SelectionDAG.
2024-03-25 11:50:57 -07:00
Michael Maitland
668687f8a8 Revert "[RISCV][GISEL] Add regbankselect tests for G_VSCALE"
This reverts commit a2476c99b745381380eab245fc9499a4ecf0b39e. It is not
consistent with SelectionDAG.
2024-03-25 11:50:21 -07:00
Michael Maitland
9056ce8804 Revert "[RISCV][GISEL] Legalize G_VSCALE"
This reverts commit 47681506ded30fada68f180b5e80f740bc76abcd. It is not
consistent with SelectionDAG.
2024-03-25 11:46:02 -07:00
Craig Topper
ce37a7131f
[RISCV] Add integer RISCVISD::SELECT_CC to canCreateUndefOrPoison and isGuaranteedNotToBeUndefOrPoison. (#84693)
Integer RISCVISD::SELECT_CC doesn't create poison. If none of the
operands are poison, the result is not poison.

This allows ISD::FREEZE to be hoisted above RISCVISD::SELECT_CC.
2024-03-25 11:10:58 -07:00
Michael Maitland
c00a5ab8c4 [RISCV][GISEL] Add instruction select tests for G_VSCALE 2024-03-25 10:44:59 -07:00
Michael Maitland
a2476c99b7 [RISCV][GISEL] Add regbankselect tests for G_VSCALE 2024-03-25 10:44:59 -07:00
Michael Maitland
47681506de [RISCV][GISEL] Legalize G_VSCALE
G_VSCALE should be lowered using VLENB.
2024-03-25 10:44:58 -07:00
Michael Maitland
05840c8714 [RISCV][GISEL] Instruction select for scalable G_SELECT
SelectionDAG has SELECT and VSELECT

SELECT restricts the condition operand to an i1 and the true and false operands
can be vectors. The result of a SELECT has the same type as the true and
false operands.

VSELECT has a vector condition operand and the true and false operands
must be vectors. The result of a VSELECT has a vector result.

GlobalISel has G_SELECT, which has a condition operand that is an i1 if the
true and false operands are scalar and a vector type with i1 elements if
the true and false operands are vectors.

A G_SELECT acts like an ISD::SELECT when the operands are all scalar, and
an ISD::VSELECT when the operands are vectors. A G_SELECT cannot act
like an ISD::SELECT with an i1 condition and vector operands because of the
type system.

In this patch, we would like to take advantage of the patterns written
for SELECT and VSELECT, so we mark G_SELECT equivalent to both SELECT
and VSELECT to reuse the patterns. Since we cannot write a `G_SELECT (s1),
(vector-ty), (vector-ty)`, we don't have to worry about accidentally
matching the SDAG patterns of that nature.
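
The type rule described above can be modelled in a few lines. This is only an illustrative sketch of the mapping, not GlobalISel or TableGen code, and the function name `sdag_equivalent` is made up for the example.

```python
def sdag_equivalent(cond_is_vector: bool, operands_are_vector: bool) -> str:
    """Return the SelectionDAG node a well-typed G_SELECT corresponds to.

    G_SELECT requires a scalar i1 condition with scalar operands, or a
    vector-of-i1 condition with vector operands. Mixed forms cannot be
    expressed in its type system, so they are rejected here.
    """
    if not cond_is_vector and not operands_are_vector:
        return "ISD::SELECT"
    if cond_is_vector and operands_are_vector:
        return "ISD::VSELECT"
    raise ValueError("not a valid G_SELECT: condition and operand shapes differ")


print(sdag_equivalent(False, False))  # ISD::SELECT
print(sdag_equivalent(True, True))    # ISD::VSELECT

try:
    # An i1 condition with vector operands is valid for ISD::SELECT,
    # but it cannot be written as a G_SELECT.
    sdag_equivalent(False, True)
except ValueError as err:
    print(err)
```

Because the mixed form is rejected, marking G_SELECT as equivalent to both nodes cannot pick up the SELECT patterns that take an i1 condition with vector operands.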

We will probably need a way to represent an i1 condition with vector
true and false operands in the future. That can be the topic of another
patch.
2024-03-25 10:35:22 -07:00
Michael Maitland
973e9dbd57 [RISCV][GISEL] Regbank select for scalable G_SELECT 2024-03-25 10:35:21 -07:00
Michael Maitland
8c51ac9ddb [RISCV][GISEL] Legalize G_SELECT for scalable vectors 2024-03-25 10:35:21 -07:00
Michael Maitland
ea798a7900 [RISCV][GISEL] Legalize and regbankselect vector typed G_IMPLICIT_DEF 2024-03-25 10:19:14 -07:00
David Green
96819daa3d
[AArch64] Handle v2i16 and v2i8 in concat load combine. (#86264)
This extends the concat load patch from
https://reviews.llvm.org/D121400, which was later moved to a combine, to
handle v2i8 and v2i16 concat loads too.
2024-03-25 17:10:23 +00:00
Evgenii Kudriashov
fb394562a3
[X86][GlobalISel] Fix referencing nonexistent operand in G_ICMP (#86221)
Fixes #86203
2024-03-25 16:46:12 +01:00
David Stuttard
06cfbe3cfd
[AMDGPU] Add support for idxen and bothen buffer load/store merging in SILoadStoreOptimizer (#86285)
Added more buffer instruction merging support
2024-03-25 14:44:22 +00:00
Michael Maitland
865294b2e6
[CodeGen][MISched] Add misched post-regalloc bidirectional scheduling (#77138)
This PR is stacked on #76186.

This PR keeps the default strategy as top-down since that is what
existing targets expect. It can be enabled using
`-misched-postra-direction=bidirectional`.

It is up to targets to decide whether they would like to enable this
option for themselves.
2024-03-25 10:10:35 -04:00
AtariDreams
f5a067bb90
[SelectionDAG]: Deduce KnownNeverZero from SMIN and SMAX (#85722) 2024-03-25 10:35:28 +00:00
Nathan Gauër
f0eb908340
[SPIR-V] Add WaveGetLaneIndex() intrinsic support (#85979)
Add support to generate valid SPIR-V for the WaveGetLaneIndex() HLSL
builtin.

To implement this, I had to fix a few small issues in the backend, like
the i8* pointer type being emitted, even if we have the type information
elsewhere.

Signed-off-by: Nathan Gauër <brioche@google.com>
2024-03-25 11:30:47 +01:00
Vyacheslav Levytskyy
b0d03ccc08
[SPIR-V] Fix illegal OpConstantComposite instruction with non-const constituents in SPIR-V Backend (#86352)
This PR fixes illegal use of OpConstantComposite with non-constant
constituents. The test attached to the PR now passes the
`spirv-val` check. Before the fix, SPIR-V Backend produced a pattern
like the following for the attached test case
```
%a = OpVariable %_ptr_CrossWorkgroup_uint CrossWorkgroup %uint_123
%11 = OpConstantComposite %_struct_6 %a %a
```
so that `spirv-val` complained with
```
error: line 25: OpConstantComposite Constituent <id> '10[%a]' is not a constant or undef.
  %11 = OpConstantComposite %_struct_6 %a %a
```
2024-03-25 10:14:46 +01:00
Vyacheslav Levytskyy
1d250d9099
[SPIR-V] Improve type inference in SPIR-V Backend for opaque pointers (#86283)
This PR improves type inference in SPIR-V Backend for opaque pointers,
accounting for the case where a chain of function calls makes it possible
to deduce formal parameter types from actual arguments. The attached
test demonstrates the case.
2024-03-25 10:14:08 +01:00
Vyacheslav Levytskyy
99c40f6ba6
[SPIR-V] Introduce a command line option to support compatibility with Khronos SPIRV Translator (#86101)
SPIRV-LLVM-Translator project
(https://github.com/KhronosGroup/SPIRV-LLVM-Translator) from Khronos
Group is a tool and a library for bi-directional translation between
SPIR-V and LLVM IR. In its backward translation from SPIR-V to LLVM IR
SPIRV-LLVM-Translator isn't necessarily able to cover the same SPIR-V
patterns/instructions set that SPIRV Backend produces, even if we target
the same SPIR-V version in both SPIRV-LLVM-Translator and SPIRV Backend
projects.

To improve interoperability and make SPIRV Backend output usable in
different products, this PR introduces a mode of SPIR-V output that is
compatible with the subset of SPIR-V supported by
SPIRV-LLVM-Translator. This includes a new command line option that
doesn't change the default behavior of SPIRV Backend, and one test case
that demonstrates how the option can be used to produce whichever of
two possible, similar outputs SPIRV-LLVM-Translator can understand.
2024-03-25 10:13:42 +01:00
David Stuttard
75e528fdd9
[AMDGPU] Extend zero initialization of return values for TFE (#85759)
buffer_load instructions that use TFE also need to zero initialize
return values similar to how the image instructions currently work. Add
support for this with standard zero init of all results + zero init of
just TFE flag when enable-prt-strict-null subtarget feature is disabled.
2024-03-25 09:01:46 +00:00
Pierre van Houtryve
babbdad15b
[AMDGPU] Handle non-register operands for S_SUB/ADD_U64_PSEUDO (#86104)
This pseudo uses SSrc_b64, so it allows either an immediate or a register,
but the lowering crashed on immediate operands.
2024-03-25 09:23:40 +01:00
Luke Lau
373e77b4c0
[RISCV] Generalize (sub zext, zext) -> (sext (sub zext, zext)) to add (#86248)
This generalizes the combine added in #82455 to other binary ops,
beginning with adds in this patch.

Because the two zext operands are always +ve when treated as signed, and
we don't get any overflow since the add is carried out in at least N * 2
bits of the narrow type, the result of the add will always be +ve. So we
can use a zext for the outer extend, unlike sub which may produce a -ve
result from two +ve operands.

Although we could still use sext for add, I plan to add support for
other binary ops like mul in a later patch, but mul requires zext to be
correct (because the maximum value will take up the full N * 2 bits). So
I've opted to use zext here too for consistency.
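
The argument above can be checked exhaustively for a small narrow width. The sketch below is a Python model and not the DAG combine itself; `N`, `WIDE`, and the helper names are assumptions made for the example.

```python
import operator

N = 4       # narrow element width; small so the exhaustive check is cheap
WIDE = 32   # width of the original wide operation (more than 2 * N)


def zext(value: int, from_bits: int, to_bits: int) -> int:
    """Zero-extend: keep the low from_bits bits, upper bits become zero."""
    return value & ((1 << from_bits) - 1)


def sext(value: int, from_bits: int, to_bits: int) -> int:
    """Sign-extend the low from_bits bits to to_bits bits."""
    value &= (1 << from_bits) - 1
    if value >> (from_bits - 1):
        value -= 1 << from_bits
    return value & ((1 << to_bits) - 1)


def narrowing_is_valid(op, extend) -> bool:
    """Check op(zext a, zext b) at WIDE == extend(op at 2*N bits) for all a, b."""
    mid = 2 * N
    for a in range(1 << N):
        for b in range(1 << N):
            expected = op(a, b) & ((1 << WIDE) - 1)
            narrow = op(a, b) & ((1 << mid) - 1)
            if extend(narrow, mid, WIDE) != expected:
                return False
    return True


print(narrowing_is_valid(operator.sub, sext))  # True
print(narrowing_is_valid(operator.sub, zext))  # False: results can be negative
print(narrowing_is_valid(operator.add, zext))  # True
print(narrowing_is_valid(operator.add, sext))  # True: sum never sets the top bit
print(narrowing_is_valid(operator.mul, zext))  # True
print(narrowing_is_valid(operator.mul, sext))  # False: product can set the top bit
```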

Alive2 proof: https://alive2.llvm.org/ce/z/PRNsUM
2024-03-25 13:08:56 +08:00
Wang Pengcheng
6af6416e89
[RISCV] Add a tune feature to disable stripping W suffix (#86255)
We have a hidden option to disable it, but I'd like to make it a
tune feature.

For some implementations, instructions with the W suffix would be less
costly as they only operate on 32-bit data, though we may lose some
opportunities for compression.
2024-03-25 11:44:16 +08:00
Phoebe Wang
2e4e04c590
[X86][BF16] Do not lower to VCVTNEPS2BF16 without AVX512VL (#86395)
Fixes: #86305
2024-03-25 10:06:12 +08:00
houndlord
9632e1515c
Match fixed width ISD::AVGFLOORS + ISD::AVGCEILS patterns (#86222) 2024-03-24 15:33:16 +00:00
David Green
e8d5223ce4 [AArch64] Additional GISel test coverage. NFC 2024-03-24 12:32:47 +00:00
Simon Pilgrim
6c6fe4b2ae [X86] known-never-zero.ll - add 32-bit test coverage
Enabled vector coverage as well: i686+SSE2 and x86_64+AVX

Should improve test quality for #85722
2024-03-24 11:33:51 +00:00
yingopq
5d7fd6a04a
[Mips] Restore wrong deletion of instruction 'and' in unsigned min/max processing. (#85902)
Fix #61881
2024-03-24 02:35:42 -04:00
Owen Anderson
7c9b5228da
Only check assertions that were meant to apply to the normal case of non-splat vector SREM expansion when we aren't hitting the special case. (#86238)
Fixes https://github.com/llvm/llvm-project/issues/84830
Introduced in https://github.com/llvm/llvm-project/pull/82706
2024-03-23 21:49:29 -05:00
Felix (Ting Wang)
90a7fc366a
[PowerPC][NFC] Add base test case for small-local-dynamic-tls on AIX (#84711) 2024-03-24 08:46:45 +08:00
Harvin Iriawan
57146daeaa
[CodeGen] Update for scalable MemoryType in MMO (#70452)
Remove getSizeOrUnknown call when MachineMemOperand is created.  For Scalable
TypeSize, the MemoryType created becomes a scalable_vector.

2 MMOs that have scalable memory access can then use the updated BasicAA that
understands scalable LocationSize.

Original Patch by Harvin Iriawan
Co-authored-by: David Green <david.green@arm.com>
2024-03-23 12:56:25 +00:00
Evgenii Kudriashov
d365a45cb3
[GlobalISel] Introduce G_TRAP, G_DEBUGTRAP, G_UBSANTRAP (#84941)
Here we introduce three new GMIR instructions to cover a set of trap
intrinsics. The idea behind it is that generic intrinsics shouldn't be
used with G_INTRINSIC opcode.

These new instructions can match perfectly with existing trap ISD nodes.
It allows X86, AArch64, RISCV and Mips to reuse SelectionDAG patterns for
selection and avoid manual selection. However, AMDGPU is an exception: it
selects traps during legalization regardless of whether SelectionDAG or
GlobalISel is used.

Since there are not many places where traps are used, this change
attempts to clean up all the usages of G_INTRINSIC with trap intrinsics. So,
there is no stage when both G_TRAP and
G_INTRINSIC_W_SIDE_EFFECTS(@llvm.trap) are allowed.
2024-03-23 13:12:44 +01:00
XChy
d7c672834e [CodeGen][NFC] Update tests in AArch64/and-sink.ll 2024-03-23 19:01:33 +08:00
paperchalice
ef57977f2a
[NewPM][Hexagon] Add HexagonPassRegistry.def (#86244)
Prepare for dag-isel; also migrate some test cases.
2024-03-23 15:02:27 +08:00
Florian Mayer
215f105ca5
[MTE] Fix test (#85875)
llc runs the stack tagging instrumentation, so if we run opt beforehand, we
instrument twice.
2024-03-22 14:14:43 -07:00
Ulrich Weigand
4b907414d2 [SystemZ] Add support for llvm.readcyclecounter
The llvm.readcyclecounter intrinsic can be implemented via the
STORE CLOCK FAST (STCKF) instruction.
2024-03-22 20:01:02 +01:00
David Green
f82d0187a7 [AArch64] Add a test to show incorrect latencies into Bundle instructions. NFC 2024-03-22 14:00:21 +00:00