48420 Commits

Author SHA1 Message Date
Anna Thomas
b2195bc771 [SelectionDAG][AArch64] Legalize FMAXIMUM/FMINIMUM
The missing legalization in SelectionDAG was identified when adding the
intrinsic support for vector reduction for maximum/minimum (D152370).

Fixes part of PR: https://github.com/llvm/llvm-project/issues/63267

Differential Revision: https://reviews.llvm.org/D152718
2023-06-12 12:22:21 -04:00
Kazu Hirata
9eea63bc9c [AMDGPU] Fix resource-usage-pal.ll 2023-06-12 08:06:46 -07:00
Baptiste
3604fdf18d [AMDGPU] Do not assume stack size for PAL code object indirect calls
There is no need to set a big default stack size for PAL code object indirect
calls. The driver knows the max recursion depth, so it can compute a more
accurate value from the minimum scratch size.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D150609
2023-06-12 10:14:17 -04:00
Ivan Kosarev
d09fa8ff2c [AMDGPU][GFX11] Add test coverage for cases involving conversions from and to fp16 values.
Other such tests, of which there are many, are to be updated with
separate patches.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D152557
2023-06-12 13:04:40 +01:00
Francesco Petrogalli
45902a25fa [MISched] Require asserts and AArch64 registered target for test.
Fixes failure at https://lab.llvm.org/buildbot/#/builders/124/builds/7472:

```
llc: Unknown command line argument '-debug-only=machine-scheduler'. Try: '/home/buildbot/as-worker-91/clang-with-lto-ubuntu/build/stage1/bin/llc --help'
```

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D152703
2023-06-12 13:51:19 +02:00
Simon Pilgrim
3d34f7be73 [GlobalIsel][X86] Rename x86_64-select-fcmp.mir to select-fcmp.mir and add 32-bit test coverage
x86_64 was being used as shorthand for SSE2
2023-06-12 12:41:27 +01:00
Simon Pilgrim
22c17c6a1f [GlobalIsel][X86] Move G_FCMP getActionDefinitionsBuilder out of setLegalizerInfo64bit and add 32-bit support
We were using x86_64-only support as a SSE2 proxy
2023-06-12 12:18:37 +01:00
Simon Pilgrim
6a12ab874a [GlobalIsel][X86] Regenerate legalize-fcmp.mir 2023-06-12 12:18:36 +01:00
Simon Pilgrim
1a576aa09d [GlobalIsel][X86] Rename x86_64-legalize-fcmp to legalize-fcmp
32-bit support will be added shortly - x86_64 was being used as a shorthand for SSE2
2023-06-12 12:18:36 +01:00
Luke Lau
2a1716dec5 [LegalizeTypes][VP] Widen load/store of fixed length vectors to VP ops
If we have a load/store with an illegal fixed length vector result type that
needs to be widened, e.g. `x:v6i32 = load p`
Instead of just widening it to: `x:v8i32 = load p`
We can widen it to the equivalent VP operation and set the EVL to the
exact number of elements needed: `x:v8i32 = vp_load a, b, mask=true, evl=6`
Provided that the target supports vp_load/vp_store on the widened type.

Scalable vectors are already widened this way where possible, so this
largely reuses the same logic.
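
The effect of the explicit vector length (EVL) can be modelled with a small Python sketch. This is only an illustration of the semantics described above, not LLVM code: `vp_load` is a hypothetical helper, and `None` stands in for lanes whose contents are unspecified.

```python
from typing import List, Optional


def vp_load(memory: List[int], ptr: int, width: int,
            mask: List[bool], evl: int) -> List[Optional[int]]:
    """Hypothetical model of a vector-predicated load.

    Lane i touches memory only if i < evl and mask[i] is set.
    """
    result: List[Optional[int]] = []
    for lane in range(width):
        if lane < evl and mask[lane]:
            result.append(memory[ptr + lane])
        else:
            result.append(None)  # lane not loaded; value is unspecified
    return result


# Memory holds exactly six elements, matching the original v6i32 load.
memory = [10, 11, 12, 13, 14, 15]

# Widened to 8 lanes with an all-true mask and evl=6: lanes 6 and 7 are
# never read, so the widened load does not touch memory past element 5.
widened = vp_load(memory, ptr=0, width=8, mask=[True] * 8, evl=6)
```

In this model a plain 8-lane load of the same buffer would index past the end of `memory`, which is the over-read the EVL avoids.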

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D148713
2023-06-12 10:21:04 +01:00
Francesco Petrogalli
15a16ef8e0 [MISched] Use StartAtCycle in trace dumps.
This commit re-works the methods that dump traces with resource usage to take into account the StartAtCycle value added by https://reviews.llvm.org/D150310.

For each i, the values of the lists StartAtCycle and ReservedCycles are printed as the interval [StartAtCycle[i], ReservedCycles[i]):

```
... | StartAtCycle[i] | ... | ReservedCycles[i] - 1 | ReservedCycles[i] | ...
    | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx |                   |
```
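
The half-open interval can be illustrated with a short Python sketch. The function names and the one-character-per-cycle rendering are assumptions made for illustration; they do not reproduce the actual dump format.

```python
def occupied_cycles(start_at_cycle: int, reserved_cycles: int) -> range:
    """Cycles in the half-open interval [start_at_cycle, reserved_cycles)."""
    return range(start_at_cycle, reserved_cycles)


def render_row(start_at_cycle: int, reserved_cycles: int,
               total_cycles: int) -> str:
    """Draw one character per cycle: 'x' if the resource is busy, else ' '."""
    busy = occupied_cycles(start_at_cycle, reserved_cycles)
    return "".join("x" if cycle in busy else " "
                   for cycle in range(total_cycles))


# A resource busy from cycle 2 up to, but not including, cycle 5.
row = render_row(start_at_cycle=2, reserved_cycles=5, total_cycles=7)
```

The cycle at index `reserved_cycles` itself is left blank, matching the empty column under `ReservedCycles[i]` in the diagram.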

Reviewed By: andreadb

Differential Revision: https://reviews.llvm.org/D150311
2023-06-12 09:11:48 +02:00
Fangrui Song
849f1dd15e [XRay] Rename XRayOmitFunctionIndex to XRayFunctionIndex
Apply my post-commit comment on D81995. The negative name misguided commit
d8a8e5d6240a1db809cd95106910358e69bbf299 (`[clang][cli] Remove marshalling from
Opt{In,Out}FFlag`) to:

* accidentally flip the option to not emit the xray_fn_idx section.
* change -fno-xray-function-index (instead of -fxray-function-index) to emit xray_fn_idx

This patch renames XRayOmitFunctionIndex and makes -fxray-function-index emit
xray_fn_idx, but the default remains -fno-xray-function-index.
2023-06-11 15:27:22 -07:00
Oleksii Lozovskyi
c72dea88b6 [AArch64][ARM][X86] Split XRay tests for Linux/macOS
XRay instrumentation works for macOS running on Apple Silicon, but
codegen is untested there. I'm going to make changes affecting this
target, so get the XRay tests running on AArch64.

Data sections are going to become slightly different on x86_64 soon.
I do want the tests to be specific about symbol names, so instead of
having the tests check the common step, bifurcate the tests a bit and check
the full symbol names.

As for ARM, XRay is not really supported on iOS at the moment, though
ARM is also really used there with modern phones. Nevertheless, codegen
tests exist and the output is going to change a little, so make it easier
to write the special case for iOS.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D145291
2023-06-11 12:53:29 -07:00
Simon Pilgrim
26706807c9 [GlobalIsel][X86] Ensure bit count legalizer patterns keep matching result + input scalar types 2023-06-11 18:56:28 +01:00
Ben Shi
71d90f3108 [AVR] Optimize 8-bit rotation when rotation bits == 3
Fixes https://github.com/llvm/llvm-project/issues/63100

Reviewed By: aykevl

Differential Revision: https://reviews.llvm.org/D152365
2023-06-11 08:41:47 +08:00
Ben Shi
e21df8296d [AVR] Optimize 8-bit rotation when rotation bits >= 4
Fixes https://github.com/llvm/llvm-project/issues/63100
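
The commit message does not spell out the instruction sequence. One common way to shorten large 8-bit rotations, assumed here purely for illustration, is to swap the two nibbles (a rotation by 4) and then rotate by the remaining amount. A Python sketch of that identity:

```python
def rotl8(value: int, amount: int) -> int:
    """Reference 8-bit rotate left."""
    amount %= 8
    return ((value << amount) | (value >> (8 - amount))) & 0xFF


def swap_nibbles(value: int) -> int:
    """Exchange the high and low 4 bits, which equals a rotation by 4."""
    return ((value << 4) | (value >> 4)) & 0xFF


def rotl8_via_swap(value: int, amount: int) -> int:
    """Rotate left by 4..7 bits as one nibble swap plus (amount - 4) rotates.

    Assumption: this mirrors the general idea only, not the exact
    expansion chosen by the backend.
    """
    if not 4 <= amount < 8:
        raise ValueError("sketch only covers rotation amounts 4..7")
    return rotl8(swap_nibbles(value), amount - 4)
```

For a rotation by 6, for example, this replaces six single-bit rotates with one swap and two single-bit rotates.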

Reviewed By: aykevl, Patryk27, jacquesguan

Differential Revision: https://reviews.llvm.org/D152130
2023-06-11 08:36:22 +08:00
Noah Goldstein
b6808ba291 [X86] Make constant mul -> shl + add/sub work for vector types
Something like:
    `%r = mul %x, <33, 33, 33, ...>`

Is best lowered as:
    `%tmp = %shl x, <5, 5, 5>; %r = add %tmp, %x`

As well, since vectors have non-destructive shifts, we can also do
cases where the multiply constant is `Pow2A +/- Pow2B` for arbitrary A
and B, unlike in the scalar case where the extra `mov` instructions
make it not worth it.
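
The arithmetic behind this can be checked with a Python sketch that models 32-bit lanes. The helper names are hypothetical and the brute-force search is for illustration only; it is not how the backend finds the decomposition.

```python
from typing import List, Optional, Tuple

MASK32 = 0xFFFFFFFF  # model each lane as a wrapping 32-bit integer


def pow2_pair(c: int) -> Optional[Tuple[int, int, int]]:
    """Find (a, b, sign) with c == 2**a + sign * 2**b, or None if none exists."""
    for a in range(32):
        for b in range(32):
            for sign in (1, -1):
                if (1 << a) + sign * (1 << b) == c:
                    return a, b, sign
    return None


def mul_via_shifts(x: int, c: int) -> int:
    """Multiply one lane by c using two shifts and an add/sub when possible."""
    decomposition = pow2_pair(c)
    if decomposition is None:
        return (x * c) & MASK32  # fall back to a real multiply
    a, b, sign = decomposition
    return ((x << a) + sign * (x << b)) & MASK32


def vector_mul(lanes: List[int], c: int) -> List[int]:
    """Apply the same splat constant to every lane."""
    return [mul_via_shifts(x, c) for x in lanes]
```

For the constant 33 from the example above, the decomposition is 32 + 1, so each lane becomes `(x << 5) + x`.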

Reviewed By: pengfei

Differential Revision: https://reviews.llvm.org/D150324
2023-06-10 14:38:46 -05:00
Matt Arsenault
6d2e5c3445 LowerMemIntrinsics: Skip memmove with different address spaces
This is a quick fix for an assert when the source and dest have
different address spaces. The pointer compare needs to have matching
types, but we can't generically introduce addrspacecast and we don't
know if the address spaces alias.
2023-06-10 12:28:05 -04:00
Ben Shi
f3837e726f [AVR] Fix incorrect expansion of pseudo instruction ROLBRd
Since ROLBRd needs an implicit R1 (on AVR) or an implicit R17 (on AVRTiny),
we split ROLBRd to ROLBRdR1 (on AVR) and ROLBRdR17 (on AVRTiny).

Reviewed By: aykevl, Patryk27

Differential Revision: https://reviews.llvm.org/D152248
2023-06-11 00:20:43 +08:00
Ben Shi
cef723a0fe [AVR] Enable sub register liveness
Reviewed By: Patryk27

Differential Revision: https://reviews.llvm.org/D152606
2023-06-11 00:16:35 +08:00
Ben Shi
3b8c12c18e [AVR][NFC] Improve CodeGen tests
Reviewed By: Patryk27

Differential Revision: https://reviews.llvm.org/D152605
2023-06-11 00:15:20 +08:00
Matt Arsenault
abff7668ab AMDGPU: Implement known bits functions for min3/max3/med3 2023-06-10 10:58:44 -04:00
Matt Arsenault
f24de950e5 AMDGPU: Add baseline tests for known bits handling of med3 2023-06-10 10:58:39 -04:00
Matt Arsenault
5b657f50b8 AMDGPU: Move LICM after AMDGPUCodeGenPrepare
The commit that added the run says it's to hoist uniform parts of
integer division expansion. That expansion is performed later, so this
didn't do anything in that case. Move this later so the original test
shows the improvement.

This also saves a run of "Canonicalize natural loops". Not sure why
this appears to be still getting a separate loop PM run. Also feels a
bit heavy to run this just for divide. Is there a way to specifically
hoist the divide sequence when it expands?
2023-06-10 07:37:32 -04:00
Thorsten Schütt
24f49deacf [GlobalIsel][X86] Legalize G_FREEZE
Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D152501
2023-06-10 07:33:02 +02:00
Matt Arsenault
4c0fc4841b AMDGPU: Mark scalar loads as rematerializable
This should be true, but this is useless as is. The rematerialization
logic only permits rematerialization with constant physical register uses,
so non-constant physregs or virtual register uses (the case that
really matters) are not rematerialized. Add tests which show that
nothing happens, but should in the future.

Also, all loads should really be rematerializable so in the future
this should apply to all the other kinds.
2023-06-09 21:20:21 -04:00
Matt Arsenault
4e4c351ae5 AMDGPU: Avoid endpgm in middle of block for fallback trap lowering.
This was inserting an s_endpgm in the middle of the block when it has
to be a terminator. Split the block and insert a branch to a new block
with the trap if it's not in a terminator position.

Fixes verifier error on LDS in function with no trap support (and
other trap sources).
2023-06-09 21:04:38 -04:00
Matt Arsenault
3c848194f2 CodeGen: Expand memory intrinsics in PreISelIntrinsicLowering
Expand large or unknown size memory intrinsics into loops in the
default lowering pipeline if the target doesn't have the corresponding
libfunc. Previously AMDGPU had a custom pass which existed to call the
expansion utilities.

With a default no-libcall option, we can remove the libfunc checks in
LoopIdiomRecognize for these, which never made any sense. This also
provides a path to lifting the immarg restriction on
llvm.memcpy.inline.

There seems to be a bug where TLI reports functions as available if
you use -march and not -mtriple.
2023-06-09 21:04:37 -04:00
Matt Arsenault
4469aff148 AMDGPU: Add baseline tests for integer mad matching
Test some clpeak-like patterns with multiple use muls.
2023-06-09 19:17:56 -04:00
Amara Emerson
6f6298e5b3 [GlobalISel] Fix D144336 in a different way, by choosing operands from the first of the div/rem insts.
Differential Revision: https://reviews.llvm.org/D144336
2023-06-09 15:06:06 -07:00
Artem Belevich
8006c7e3f2 [NVPTX] Remove few more unneeded fp16 instruction variants
Differential Revision: https://reviews.llvm.org/D152478
2023-06-09 12:09:08 -07:00
Amara Emerson
1c2c668846 [GlobalISel] Introduce G_CONSTANT_FOLD_BARRIER and use it to prevent constant folding
hoisted constants.

The constant hoisting pass tries to hoist large constants into predecessors and also
generates remat instructions in terms of the hoisted constants. These aim to prevent
codegen from rematerializing expensive constants multiple times. So that we can re-use
this optimization, we preserve the no-op bitcasts that are used to anchor
constants to the predecessor blocks.

SelectionDAG achieves this by having the OpaqueConstant node, which is just a
normal constant with an opaque flag set. I've opted to avoid introducing a new
constant generic instruction here. Instead, we have a new G_CONSTANT_FOLD_BARRIER
operation that constitutes a folding barrier.

These are somewhat like the optimization hints, such as G_ASSERT_ZEXT, in that they're
eliminated by the generic instruction selection code.

This change by itself has very minor improvements in -Os CTMark overall. What this
does allow is better optimizations when future combines are added that rely on having
expensive constants remain unfolded.

Differential Revision: https://reviews.llvm.org/D144336
2023-06-09 11:45:06 -07:00
Caleb Zulawski
18077e9fd6 [WebAssembly] Re-land 8392bf6000ad
Correctly handle single-element vectors to fix an assertion failure. Add tests
that were missing from the original commit.

Differential Revision: D151782
2023-06-09 08:42:27 -07:00
Simon Pilgrim
0662167c5b [GlobalIsel][X86] Update legalization of G_PTR_ADD
Replace the legacy legalizer versions

Add test coverage for 32-bit targets and non-constant ptr offsets
2023-06-09 13:27:25 +01:00
pvanhout
df1782c2a2 [MCP] Do not remove redundant copy for COPY from undef
I don't think we can safely remove the second COPY as redundant in such cases.
The first COPY (which has undef src) may be lowered to a KILL instruction instead, resulting in no COPY being emitted at all.

Testcase is X86 so it's in the same place as other testcases for this function, but this was initially spotted on AMDGPU with the following:
```
 renamable $vgpr24 = PRED_COPY undef renamable $vgpr25, implicit $exec
 renamable $vgpr24 = PRED_COPY killed renamable $vgpr25, implicit $exec
```
The second COPY was removed as redundant, and the first one was lowered to a KILL (= removed too), causing $vgpr24 to not have $vgpr25's value.

Fixes SWDEV-401507

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D152502
2023-06-09 14:23:57 +02:00
David Stuttard
90431ca2e0 Reland [AMDGPU] New PAL metadata updates to ps_extra_lds_size and float_mode
New metadata format contains full calculation of field contents for
ps_extra_lds_size (vs old format where the value in RSRC register is used by PAL
to calculate the value required).

Also stop updating float_mode and rely on front end settings for this field.

Differential Revision: https://reviews.llvm.org/D152247
2023-06-09 12:34:00 +01:00
Simon Pilgrim
b8f053f5d7 [GlobalIsel][X86] Add 32-bit test coverage to zero count tests
This shows a current problem with G_CTTZ_ZERO_UNDEF result legalizations
2023-06-09 10:36:32 +01:00
Simon Pilgrim
2717d98a1e [GlobalIsel][X86] legalize-select.mir - add x86-64 test coverage 2023-06-09 10:36:32 +01:00
pvanhout
ecbd37d5a3 [AMDGPU] Port no-hsa-graphic-shaders.ll to code object V4
Split from D146023

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D152432
2023-06-09 09:07:53 +02:00
Pravin Jagtap
f6c8a8e9cb [AMDGPU] Iterative scan implementation for atomic optimizer.
This patch provides an alternative implementation to DPP for Scan Computations.

The alternative implementation iterates over all active lanes of the wavefront
using llvm.cttz and performs the following steps:
    1.  Read the value that needs to be atomically incremented using
        llvm.amdgcn.readlane intrinsic
    2.  Accumulate the result.
    3.  Update the scan result using llvm.amdgcn.writelane intrinsic
        if intermediate scan results are needed later in the kernel.
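
The loop structure described by these steps can be sketched in Python. This is a scalar simulation under stated assumptions: `exec_mask` is a hypothetical bitmask of active lanes, the list accesses stand in for the readlane/writelane intrinsics, and the per-lane result is assumed to be an exclusive prefix sum (the commit message does not say which variant is used).

```python
from typing import List, Optional, Tuple


def cttz(value: int) -> int:
    """Count trailing zeros of a non-zero integer (index of lowest set bit)."""
    return (value & -value).bit_length() - 1


def iterative_scan(lane_values: List[int],
                   exec_mask: int) -> Tuple[int, List[Optional[int]]]:
    """Visit active lanes from lowest to highest, accumulating a running sum."""
    accumulator = 0
    scan: List[Optional[int]] = [None] * len(lane_values)
    remaining = exec_mask
    while remaining != 0:
        lane = cttz(remaining)       # next active lane
        value = lane_values[lane]    # step 1: stands in for readlane
        previous = accumulator
        accumulator += value         # step 2: accumulate the result
        scan[lane] = previous        # step 3: stands in for writelane
        remaining &= remaining - 1   # clear the lane just processed
    return accumulator, scan


# Lanes 0, 1 and 3 are active; lane 2 is skipped and keeps None.
total, per_lane = iterative_scan([1, 2, 3, 4], exec_mask=0b1011)
```

The loop runs once per active lane, so its cost scales with the number of set bits in the mask rather than with the wavefront size.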

Reviewed By: arsenm, cdevadas

Differential Revision: https://reviews.llvm.org/D147408
2023-06-09 01:08:44 -04:00
Amara Emerson
086601eac2 [GlobalISel] Implement some binary reassociations, G_ADD for now
- (op (op X, C1), C2) -> (op X, (op C1, C2))
- (op (op X, C1), Y) -> (op (op X, Y), C1)

Some code duplication with the G_PTR_ADD reassociations unfortunately but no
easy way to avoid it that I can see.
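
Both rewrites rely on wrapping integer addition being associative and commutative. A small Python sketch, modelling G_ADD on 32-bit values with hypothetical helper names, shows the two forms side by side:

```python
MASK32 = 0xFFFFFFFF


def add32(a: int, b: int) -> int:
    """Wrapping 32-bit addition, standing in for G_ADD."""
    return (a + b) & MASK32


def fold_constants(x: int, c1: int, c2: int) -> int:
    """(op (op X, C1), C2) -> (op X, (op C1, C2)): the constants fold together."""
    return add32(x, add32(c1, c2))


def move_constant_out(x: int, c1: int, y: int) -> int:
    """(op (op X, C1), Y) -> (op (op X, Y), C1): the constant ends up outermost."""
    return add32(add32(x, y), c1)
```

In the first form `add32(c1, c2)` involves only constants, so it can be evaluated at compile time and one addition disappears.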

Differential Revision: https://reviews.llvm.org/D150230
2023-06-08 21:14:58 -07:00
Phoebe Wang
c778ca201e [X86][BF16] Split vNbf16 vectors according to vNf16
Fixes #63017

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D151778
2023-06-09 09:04:56 +08:00
Phoebe Wang
7634905a73 [X86][BF16] Share FP16 vector ABI with BF16
The ABI of BF16 is identical to FP16 rather than i16.

Fixes #62997

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D151710
2023-06-09 09:04:56 +08:00
Heejin Ahn
90073e8de3 [WebAssembly] Error out on invalid personality functions
Without explicitly checking and erroring out, an invalid personality
function, which is not `__gxx_wasm_personality_v0`, caused
a segmentation fault down the line because `WasmEHFuncInfo` was not
created. This explicitly checks the validity of personality functions in
functions with EH pads and errors out explicitly with a helpful error
message. This also adds some more assertions to ensure `WasmEHFuncInfo`
is correctly created and non-null.

Invalid personality functions wouldn't be generated by our Clang, but
can be present in handwritten ll files, and more often, in files
transformed by passes like `metarenamer`, which is often used with
`bugpoint` to simplify names in `bugpoint`-reduced files.

Reviewed By: dschuff

Differential Revision: https://reviews.llvm.org/D152203
2023-06-08 16:59:49 -07:00
Thomas Lively
100c756d96 Revert "Improve WebAssembly vector bitmask, mask reduction, and extending"
This reverts commit 8392bf6000ad039bd0e55383d40a05ddf7b4af13.

The commit missed some edge cases that led to crashes. Reverting to resolve
downstream breakage while a fix is pending.
2023-06-08 14:36:29 -07:00
Matt Arsenault
c01f284fbb AMDGPU: Fix regressions in integer mad matching
Undo the canonicalize done in
0cfc6510323fbb5a56a5de23cbc65f7cc30fd34c. Restores some regressed
matching of integer mad. The selection patterns for the actual mads
don't seem to be properly commuting, so some of the commuted cases are
still missed.

Fixes: SWDEV-363009
2023-06-08 16:48:47 -04:00
Artem Belevich
c16b7e54ac [NVPTX] Allow using v4i32 for memcpy lowering.
Differential Revision: https://reviews.llvm.org/D152317
2023-06-08 13:28:43 -07:00
Krzysztof Parzyszek
c6ddd04d73 [RDF] Create individual phi for each indivisible register
This isn't quite using register units, but it's getting close. The phi
generation is driven by register units, but each phi still contains a
reference to a register, potentially with a mask that amounts to a unit.
In cases of explicit register aliasing this may still create phis with
references that are aliased, whereas separate phis would ideally contain
disjoint references (this is all within a single basic block).

Previously phis used maximal registers, now they use minimal. This is a
step towards both using register units directly and a simpler liveness
calculation algorithm. The idea is that a phi cannot reach a reference
to anything smaller than the phi itself represents. Before there could
be a phi for R1_R0, now there will be two for this case (assuming R0 and
R1 have one unit each).
2023-06-08 11:07:57 -07:00
Thorsten Schütt
0b771c679a [GlobalIsel][X86] Legalize G_SELECT
with bug fixes

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D152445
2023-06-08 18:54:25 +02:00
Craig Topper
167f2fa1b6 [RISCV] Fix crash in lowerVECTOR_INTERLEAVE when VecVT is an LMUL=8 type.
If VecVT is an LMUL=8 VT, we can't concatenate the vectors as that
would create an illegal type. Instead we need to split the vectors
and emit two VECTOR_INTERLEAVE operations that can each be lowered.
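
Why splitting is valid can be seen in a Python sketch that models vectors as lists. The helper names are hypothetical, and each list returned by `interleave_split` stands for the output of one of the two smaller interleave operations.

```python
from typing import List, Tuple


def interleave(a: List[int], b: List[int]) -> List[int]:
    """Reference interleave: a0, b0, a1, b1, ..."""
    out: List[int] = []
    for x, y in zip(a, b):
        out.extend([x, y])
    return out


def interleave_split(a: List[int], b: List[int]) -> Tuple[List[int], List[int]]:
    """Interleave the low halves and the high halves separately.

    No vector twice the input length is ever formed, which is the
    property needed when the inputs already have the largest legal type.
    """
    half = len(a) // 2
    lo = interleave(a[:half], b[:half])
    hi = interleave(a[half:], b[half:])
    return lo, hi


a = [0, 1, 2, 3]
b = [10, 11, 12, 13]
lo, hi = interleave_split(a, b)
```

Concatenating `lo` and `hi` gives the same sequence as interleaving the full-length inputs, so the two smaller operations together produce both result vectors.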

Reviewed By: fakepaper56

Differential Revision: https://reviews.llvm.org/D152411
2023-06-08 08:41:38 -07:00