52796 Commits

Author SHA1 Message Date
Fangrui Song
eabaee0c59
[RISCV] Omit "@plt" in assembly output "call foo@plt" (#72467)
R_RISCV_CALL/R_RISCV_CALL_PLT distinction is not necessary and
R_RISCV_CALL has been deprecated. Since https://reviews.llvm.org/D132530
`call foo` assembles to R_RISCV_CALL_PLT. The `@plt` suffix is not
useful and can be removed now (matching AArch64 and PowerPC).

GNU assembler has assembled `call foo` to R_RISCV_CALL_PLT since 2022-09
(70f35d72ef04cd23771875c1661c9975044a749c).

Without this patch, unconditionally changing MO_CALL to MO_PLT could
create `jump .L1@plt, a0`, which is invalid in LLVM integrated assembler
and GNU assembler.
2024-01-07 12:09:44 -08:00
David Green
780a5116ba
[AArch64] Fix condition for combining UADDV and Add. (#76809)
This should have been checking that the transform was valid, but used
incorrect conditions letting through invalid combinations of lo/hi
extracts.

Hopefully fixes #76769
2024-01-07 08:23:17 +00:00
Luke Lau
274f8332b9
[RISCV] Don't attempt PRE if available info is SEW/LMUL ratio only (#77063) 2024-01-07 14:23:01 +07:00
Thorsten Schütt
a085402ef5 Revert "[GlobalIsel] Combine select of binops (#76763)"
This reverts commit 1687555572ee4fb435da400dde02e7a1e60b742c.
2024-01-06 17:04:24 +01:00
Thorsten Schütt
1687555572
[GlobalIsel] Combine select of binops (#76763) 2024-01-06 11:28:10 +01:00
hev
16094cb629
[llvm][LoongArch] Support per-global code model attribute for LoongArch (#72079)
This patch gets the code model from the global variable's attribute if it has one;
otherwise the target's code model is used.

---------

Signed-off-by: WANG Rui <wangrui@loongson.cn>
2024-01-06 13:36:09 +08:00
Shengchen Kan
a5902a4d24 [X86][NFC] Rename variables/passes for EVEX compression optimization
RFC: https://discourse.llvm.org/t/rfc-design-for-apx-feature-egpr-and-ndd-support/73031

APX introduces EGPR, NDD and NF instructions. In addition to compressing
EVEX encoded AVX512 instructions into VEX encoding, we also have several
more possible optimizations.

a. Promoted instruction (EVEX space) -> pre-promotion instruction (legacy space)
b. NDD (EVEX space) -> non-NDD (legacy space)
c. NF_ND (EVEX space) -> NF (EVEX space)

The first two types of compression can usually reduce code size, while
the third type of compression can help hardware decode although the
instruction length remains unchanged.

So we do the renaming for the upcoming APX optimizations.

BTW, I clang-formatted the code in X86CompressEVEX.cpp and
X86CompressEVEXTablesEmitter.cpp.

This patch also extracts the NFC in #77065 into a separate commit.
2024-01-06 12:41:09 +08:00
Mikhail Gudim
ba3ef331b4
[RISCV][GlobalISel] Zbkb support for G_BSWAP (#77050)
This instruction is legal in the presence of the Zbkb extension.
2024-01-05 23:19:46 -05:00
Daniel Hoekwater
def42537ee [NFC][CodeGen][AArch64] Add tests for unconditional branch duplication
c9f3288 introduced unconditional branch deduplication for basic block
sections and machine function splitting, but it didn't add tests for
AArch64 since prior behavior crashed the test.

This change adds tests for AArch64 and has no functional change.
2024-01-05 23:39:01 +00:00
David Green
365fbbfbcf [AArch64] Add some extra tests for SelectOpt. NFC 2024-01-05 21:04:01 +00:00
Simon Pilgrim
070ac1dcd5 [SystemZ] vec-perm-14.ll - partially regenerate checks so we can see all the vperm codegen
We can't use the script as we need to keep the shuffle mask constant pool checks, but do more than just check that a second vperm isn't generated
2024-01-05 18:00:08 +00:00
Craig Topper
4dd5d96797
[RISCV] Don't call use_nodbg_operands for physical registers in RISCVOptWInstrs hasAllNBitUsers. (#77032)
The ADDIW in the new test case was incorrectly removed due to
incorrectly following the x10 register from the return value back to the
argument. This is due to use_nodbg_operands returning every instruction
that uses a physical register regardless of the data flow.
2024-01-05 09:22:54 -08:00
Mircea Trofin
c49965b97e [mlgo] Fix post PR #76919
Relaxed the opcode checks slightly to make the test less sensitive to
changes in opcode numbering.
2024-01-05 09:10:03 -08:00
Orlando Cazalet-Hyams
10b03e6662
[RemoveDIs] Handle DPValues in FastISel (#76952)
The change is fairly mechanical:
1. Factor code from `FastISel::selectIntrinsicCall`, which converts
debug intrinsics into debug instructions, into functions (NFC).
2. Call those functions for DPValues attached to instructions too.

The test updates look the same as other RemoveDIs changes: re-run the
tests with `--try-experimental-debuginfo-iterators`, which checks the
output is identical using the new debug info format (if it has been
enabled in the cmake configuration).

Depends on #76941 (otherwise some modified tests spuriously fail).
2024-01-05 15:11:47 +00:00
Simon Pilgrim
ae81400a0f [X86] keylocker-intrinsics.ll - replace X32 checks with X86. NFC.
We try to use X32 for gnux32 triples only.
2024-01-05 13:38:44 +00:00
Simon Pilgrim
b51130a331 [X86] combine-fneg.ll - replace X32 checks with X86. NFC.
We try to use X32 for gnux32 triples only.
2024-01-05 13:38:44 +00:00
Simon Pilgrim
4ecd6384af [X86] fp128-cast.ll - replace X32 checks with X86. NFC.
We try to use X32 for gnux32 triples only.
2024-01-05 13:38:44 +00:00
Simon Pilgrim
c307147660 [X86] vec_fptrunc.ll - replace X32 checks with X86. NFC.
We try to use X32 for gnux32 triples only.
2024-01-05 13:38:43 +00:00
Simon Pilgrim
1dbdf7658a [X86] vec_fpext.ll - replace X32 checks with X86. NFC.
We try to use X32 for gnux32 triples only.
2024-01-05 13:38:43 +00:00
Florian Hahn
da148a0805
[AArch64] Add tests showing unnecessary cast promotion. 2024-01-05 13:32:00 +00:00
David Green
d187dfe515 [AArch64] Add some tests for addLikeOr with csinc. NFC 2024-01-05 12:39:32 +00:00
Simon Pilgrim
7648371c25 Revert 4d7c5ad58467502fcbc433591edff40d8a4d697d "[NewPM] Update CodeGenPreparePass reference in CodeGenPassBuilder (#77054)"
Revert e0c554ad87d18dcbfcb9b6485d0da800ae1338d1 "Port CodeGenPrepare to new pass manager (and BasicBlockSectionsProfil… (#75380)"

Revert #75380 and #77054 as they were breaking EXPENSIVE_CHECKS buildbots: https://lab.llvm.org/buildbot/#/builders/104
2024-01-05 12:28:10 +00:00
Jay Foad
e96e7a9a86
[AMDGPU] Implement readcyclecounter for GFX12 (#76965) 2024-01-05 08:20:52 +00:00
David Green
77b124cc57
[AArch64][GlobalISel] Add legalization for G_VECREDUCE_SEQ_FADD. (#76238)
And G_VECREDUCE_SEQ_FMUL at the same time. They require the elements of
the vector operand to be accumulated in order, so just need to be
scalarized.

Some of the operands are not yet simplified as much as they could be,
because constant operands are not canonicalized post-legalization.
2024-01-05 08:11:44 +00:00
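As a rough illustration of why the sequential reductions in the G_VECREDUCE_SEQ_FADD commit above have to be scalarized rather than reassociated, the Python sketch below contrasts an in-order fold with a tree-shaped one. The names `seq_fadd` and `pairwise_fadd` are hypothetical and only model the arithmetic; they are not LLVM APIs.

```python
def seq_fadd(start, elements):
    """In-order reduction: ((start + e0) + e1) + ..., strictly left to right."""
    acc = start
    for e in elements:
        acc = acc + e
    return acc


def pairwise_fadd(elements):
    """Tree-shaped reduction, shown only for contrast: it reassociates the adds."""
    if len(elements) == 1:
        return elements[0]
    mid = len(elements) // 2
    return pairwise_fadd(elements[:mid]) + pairwise_fadd(elements[mid:])


# Values chosen so that rounding makes the two orders disagree.
values = [1e20, 1.0, -1e20, 1.0]
ordered = seq_fadd(0.0, values)   # 1.0
tree = pairwise_fadd(values)      # 0.0
```

With these inputs the in-order fold yields 1.0 and the tree-shaped fold yields 0.0, because the small terms are absorbed at different points. That difference is why an ordered reduction cannot simply be split into independent partial sums.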
XinWang10
f5f66e26b5
[X86]Support lowering for APX Promoted SHA/MOVDIR/CRC32/INVPCID/CET instructions (#76786)
R16-R31 were added to the GPRs in
https://github.com/llvm/llvm-project/pull/70958.
This patch supports the lowering for promoted
SHA/MOVDIR/CRC32/INVPCID/CET.

RFC:
https://discourse.llvm.org/t/rfc-design-for-apx-feature-egpr-and-ndd-support/73031/4
2024-01-05 15:56:15 +08:00
Phoebe Wang
59af659ee3
[X86][BF16] Try to use f16 for lowering (#76901)
This patch fixes BF16 32-bit ABI problem:
https://godbolt.org/z/6dMnh8jGG
2024-01-05 15:25:18 +08:00
Nick Anderson
e0c554ad87
Port CodeGenPrepare to new pass manager (and BasicBlockSectionsProfil… (#75380)
Port CodeGenPrepare to new pass manager and dependency
BasicBlockSectionsProfileReader
Fixes: #64560

Co-authored-by: Krishna-13-cyber <84722531+Krishna-13-cyber@users.noreply.github.com>
2024-01-05 13:47:56 +07:00
Benoit Jacob
054b5fc0fd
X86: add some missing lowerings for shuffles on bf16 element type. (#76076)
Some shuffles with `bf16` as element type were running into a
`llvm_unreachable`. Key to reproducing was to chain two shuffles.

```llvm
define <2 x bfloat> @shuffle_chained_v32bf16_v2bf16(<32 x bfloat> %a) {
  %s = shufflevector <32 x bfloat> %a, <32 x bfloat> zeroinitializer, <32 x i32> <i32 0, i32 16, i32 1, i32 17, i32 2, i32 18, i32 3, i32 19, i32 4, i32 20, i32 5, i32 21, i32 6, i32 22, i32 7, i32 23, i32 8, i32 24, i32 9, i32 25, i32 10, i32 26, i32 11, i32 27, i32 12, i32 28, i32 13, i32 29, i32 14, i32 30, i32 15, i32 31>
  %s2 = shufflevector <32 x bfloat> %s, <32 x bfloat> zeroinitializer, <2 x i32> <i32 0, i32 1>
  ret <2 x bfloat> %s2
}
```

This was hitting this UNREACHABLE:

```
Not a valid 512-bit x86 vector type!
UNREACHABLE executed at /home/benoit/iree/third_party/llvm-project/llvm/lib/Target/X86/X86ISelLowering.cpp:17124!
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:
0.      Program arguments: /home/benoit/mlir-build/bin/llc -mtriple=x86_64-unknown-linux-gnu -mattr=+avx512f,+avx512vl,+avx512bw,+avx512bf16
1.      Running pass 'Function Pass Manager' on module '<stdin>'.
2.      Running pass 'X86 DAG->DAG Instruction Selection' on function '@shuffle_chained_v32bf16_v2bf16'
```
2024-01-04 22:39:40 -05:00
Michal Paszkowski
b4cfb50c65
[SPIR-V] Emit SPIR-V bitcasts between source/expected pointer type (#69621)
This patch introduces a new spv_ptrcast intrinsic for tracking expected
pointer types. The change fixes multiple OpenCL CTS regressions due to the
switch to opaque pointers (e.g. basic/hiloeo).
2024-01-04 19:31:15 -08:00
Dávid Ferenc Szabó
f68647997b
[GlobalISel] Adding support for handling G_ASSERT_{SEXT,ZEXT,ALIGN} i… (#74196)
…n artifact combiner

These instructions are hints for optimizations and can be treated as
copies; this change handles them as such. Without it, it is possible
to run into an assertion, since tryCombineUnmergeValues rightfully uses
getDefIgnoringCopies to get the source MI, which already handles these
hint instructions and treats them as copies. The problem is that
markDefDead only considers COPYs, which leads to a crash with an
assertion for cases like testUnmergeHintOfTrunc.
2024-01-05 10:13:39 +07:00
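The idea of "treating hint instructions as copies" from the G_ASSERT_{SEXT,ZEXT,ALIGN} commit above can be pictured with a toy model. The sketch below is an assumption-laden illustration in Python, not the GlobalISel implementation: `defs`, `COPY_LIKE`, and `def_ignoring_copies` are hypothetical names, and each virtual register simply maps to an `(opcode, source register)` pair.

```python
# Opcodes looked through when searching for the "real" defining instruction.
COPY_LIKE = {"COPY", "G_ASSERT_SEXT", "G_ASSERT_ZEXT", "G_ASSERT_ALIGN"}


def def_ignoring_copies(defs, reg):
    """Follow the def chain of `reg`, skipping copies and hint instructions."""
    opcode, src = defs[reg]
    while opcode in COPY_LIKE:
        reg = src
        opcode, src = defs[reg]
    return reg, opcode


# %2 = COPY %1; %1 = G_ASSERT_ZEXT %0; %0 = G_TRUNC %x
defs = {
    "%0": ("G_TRUNC", "%x"),
    "%1": ("G_ASSERT_ZEXT", "%0"),
    "%2": ("COPY", "%1"),
}
source = def_ignoring_copies(defs, "%2")  # ("%0", "G_TRUNC")
```

Any helper that later cleans up the chain has to skip the same set of opcodes as the lookup; if it only recognizes plain `COPY`, the two disagree about which instruction is the source, which is the kind of mismatch the commit message describes.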
wanglei
c56a5e895a [LoongArch] Reimplement the expansion of PseudoLA*_LARGE instructions (#76555)
According to the description of the psABI v2.30:
https://github.com/loongson/la-abi-specs/releases/tag/v2.30, moved the
expansion of relevant pseudo-instructions from
`LoongArchPreRAExpandPseudo` pass to `LoongArchExpandPseudo` pass, to
ensure that the code sequences of `PseudoLA*_LARGE` instructions and
Medium code model's function call are not scheduled.
2024-01-05 10:57:53 +08:00
wanglei
3d6fc35b90 [LoongArch] Pre-commit test for #76555. NFC 2024-01-05 10:57:40 +08:00
wanglei
2cf420d5b8 [LoongArch] Emit function call code sequence as PCADDU18I+JIRL in medium code model
According to the description of the psABI v2.20:
https://github.com/loongson/la-abi-specs/releases/tag/v2.20, adjustments
are made to the function call instructions under the medium code model.

At the same time, AsmParser already supports parsing the call36 and
tail36 macro instructions.
2024-01-05 10:56:47 +08:00
Matt Arsenault
597086c609
DAG: Implement promotion for strict_fp_round (#74332)
Needs an AMDGPU hack to get the selection to work. The ordinary
variant is custom lowered through an almost equivalent target node
that would need a strict variant for additional known bits
optimizations.
2024-01-05 08:44:19 +07:00
Matt Arsenault
47685633a7
AMDGPU: Make v4bf16 a legal type (#76217)
Gets a few code quality improvements. A few cases are worse
from losing load narrowing.
Depends #76213 #76214 #76215
2024-01-05 08:35:07 +07:00
Simon Pilgrim
2cbf652615 [X86] avx512-pmovxrm.ll - replace X32 checks with X86. NFC.
We try to use X32 for gnux32 triples only.
2024-01-04 17:17:08 +00:00
Simon Pilgrim
63e3074781 [X86] aligned-variadic.ll - replace X32 checks with X86. NFC.
We try to use X32 for gnux32 triples only.
2024-01-04 17:17:07 +00:00
Simon Pilgrim
ce4459d590 [X86] 64-bit-shift-by-32-minus-y.ll - replace X32 checks with X86. NFC.
We try to use X32 for gnux32 triples only.
2024-01-04 17:17:07 +00:00
Simon Pilgrim
076dbc0272 [X86] SimplifyDemandedVectorEltsForTargetNode - add X86ISD::VZEXT_LOAD handling.
Simplify to a scalar_to_vector(load()) if we don't demand any of the upper vector elements.
2024-01-04 17:17:07 +00:00
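The VZEXT_LOAD simplification in the commit above can be sketched with a toy lane model. This is a minimal Python illustration under stated assumptions, not the X86 backend code: vectors are plain lists, `UNDEF` stands in for an undefined lane, the demanded elements are a bitmask with bit 0 for lane 0, and all function names are hypothetical.

```python
UNDEF = None  # stand-in for an undefined vector lane


def vzext_load(scalar, num_elts):
    """Load a scalar into lane 0 and zero the upper lanes."""
    return [scalar] + [0] * (num_elts - 1)


def scalar_to_vector_load(scalar, num_elts):
    """Load a scalar into lane 0 and leave the upper lanes undefined."""
    return [scalar] + [UNDEF] * (num_elts - 1)


def simplify(scalar, num_elts, demanded_mask):
    """Pick the cheaper form when no upper lane is demanded."""
    if (demanded_mask >> 1) == 0:
        return scalar_to_vector_load(scalar, num_elts)
    return vzext_load(scalar, num_elts)


def demanded_lanes(vec, demanded_mask):
    """Return only the lanes selected by the demanded-elements bitmask."""
    return [v for i, v in enumerate(vec) if demanded_mask & (1 << i)]
```

When only lane 0 is demanded, both forms agree on every lane a user can observe, so the zeroing of the upper lanes is unnecessary. Once any upper lane is demanded, the zero-extending form has to be kept.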
Simon Pilgrim
5cd3cf1072 [X86] cvtv2f32.ll - replace X32 checks with X86. NFC.
We try to use X32 for gnux32 triples only.
2024-01-04 17:17:06 +00:00
Matt Arsenault
460ffcddd9
AMDGPU: Make bf16/v2bf16 legal types (#76215)
Some intrinsics are using i16 vectors in place of bfloat
vectors.
Move towards making bf16 vectors legal so these can migrate. Leave the
larger vectors for a later change.

Depends #76213 #76214
2024-01-04 22:31:18 +07:00
Chaitanya
9803de0e8e
[AMDGPU] Add dynamic LDS size implicit kernel argument to CO-v5 (#65273)
"hidden_dynamic_lds_size" argument will be added in the reserved section
at offset 120 of the implicit argument layout.
Add "isDynamicLDSUsed" flag to AMDGPUMachineFunction to identify if a
function uses dynamic LDS.

The hidden argument will be added in the following cases:

- LDS global is used in the kernel.
- Kernel calls a function which uses LDS global.
- LDS pointer is passed as argument to kernel itself.
2024-01-04 19:05:12 +05:30
Jay Foad
26ff659c39 [AMDGPU] Remove some unused check prefixes 2024-01-04 13:16:46 +00:00
HaohaiWen
e147dcbcbc
[SEH] Add test to track EHa register liveness verification (#76921)
This test tracks a MachineVerifier bug in checking live range segments for
EHa. An async exception can happen anywhere within an SEH scope, not only
at a call instruction. MachineVerifier needs to be taught this.
2024-01-04 20:49:11 +08:00
Thomas Preud'homme
ce61b0e9a4
Add out-of-line-atomics support to GlobalISel (#74588)
This patch implements the GlobalISel counterpart to
4d7df43ffdb460dddb2877a886f75f45c3fee188.
2024-01-04 10:15:16 +00:00
sstipanovic
b4ac4d2264
[NFC][AMDGPU] Move image-atomic-attributes test to test/Assembler. (#76917) 2024-01-04 10:38:18 +01:00
Chen Zheng
dd4dc2111e nfc add cases for pr47156 and pr47155 2024-01-04 03:56:40 -05:00
Phoebe Wang
176c341198 [X86][BF16] Add 32-bit tests to show ABI problem, NFC 2024-01-04 15:43:34 +08:00
David Green
5550e9c841
[GlobalISel][AArch64] Add libcall lowering for fpowi. (#67114)
This adds legalization, notably libcall lowering for fpowi. It is a
little different to other methods as the function takes both a float and
integer register. Otherwise all vectors get scalarized and fp16 is
promoted to fp32.
2024-01-04 07:26:23 +00:00
sstipanovic
55395f5c83
[AMDGPU] Remove nosync from image atomic intrinsics. (#76814)
Remove `nosync` as discussed in
https://github.com/llvm/llvm-project/pull/73613
2024-01-04 08:22:05 +01:00