R_RISCV_CALL/R_RISCV_CALL_PLT distinction is not necessary and
R_RISCV_CALL has been deprecated. Since https://reviews.llvm.org/D132530
`call foo` assembles to R_RISCV_CALL_PLT. The `@plt` suffix is not
useful and can be removed now (matching AArch64 and PowerPC).
GNU assembler assembles `call foo` to RISCV_CALL_PLT since 2022-09
(70f35d72ef04cd23771875c1661c9975044a749c).
Without this patch, unconditionally changing MO_CALL to MO_PLT could
create `jump .L1@plt, a0`, which is invalid in LLVM integrated assembler
and GNU assembler.
This should have been checking that the transform was valid, but used
incorrect conditions letting through invalid combinations of lo/hi
extracts.
Hopefully fixes#76769
This patch gets the code model from global variable attribute if it has,
otherwise the target's will be used.
---------
Signed-off-by: WANG Rui <wangrui@loongson.cn>
RFC: https://discourse.llvm.org/t/rfc-design-for-apx-feature-egpr-and-ndd-support/73031
APX introduces EGPR, NDD and NF instructions. In addition to compressing
EVEX encoded AVX512 instructions into VEX encoding, we also have several
more possible optimizations.
a. Promoted instruction (EVEX space) -> pre-promotion instruction (legacy space)
b. NDD (EVEX space) -> non-NDD (legacy space)
c. NF_ND (EVEX space) -> NF (EVEX space)
The first two types of compression can usually reduce code size, while
the third type of compression can help hardware decode although the
instruction length remains unchanged.
So we do the renaming for the upcoming APX optimizations.
BTW, I clang-format the code in X86CompressEVEX.cpp,
X86CompressEVEXTablesEmitter.cpp.
This patch also extracts the NFC in #77065 into a separate commit.
c9f3288 introduced unconditional branch deduplication for basic block
sections and machine function splitting, but it didn't add tests for
AArch64 since prior behavior crashed the test.
This change adds tests for AArch64 and has no functional change.
The ADDIW in the new test case was incorrectly removed due to
incorrectly following the x10 register from the return value back to the
argument. This is due to use_nodbg_operands returning every instruction
that uses a physical register regardless of the data flow.
The change is fairly mechanical:
1. Factor code from `FastISel::selectIntrinsicCall`, which converts
debug intrinsics into debug instructions, into functions (NFC).
2. Call those functions for DPValues attached to instructions too.
The test updates look the same as other RemoveDIs changes: re-run the
tests with `--try-experimental-debuginfo-iterators`, which checks the
output is identical using the new debug info format (if it has been
enabled in the cmake configuration).
Depends on #76941 (otherwise some modified tests spuriously fail).
Revert e0c554ad87d18dcbfcb9b6485d0da800ae1338d1 "Port CodeGenPrepare to new pass manager (and BasicBlockSectionsProfil… (#75380)"
Revert #75380 and #77054 as they were breaking EXPENSIVE_CHECKS buildbots: https://lab.llvm.org/buildbot/#/builders/104
And G_VECREDUCE_SEQ_FMUL at the same time. They require the elements of
the vector operand to be accumulated in order, so just need to be
scalarized.
Some of the operands are not simplified as much as they can quite yet
due to not canonicalizing constant operands post-legalization.
Port CodeGenPrepare to new pass manager and dependency
BasicBlockSectionsProfileReader
Fixes: #64560
Co-authored-by: Krishna-13-cyber <84722531+Krishna-13-cyber@users.noreply.github.com>
Some shuffles with `bf16` as element type were running into a
`llvm_unreachable`. Key to reproducing was to chain two shuffles.
```llvm
define <2 x bfloat> @shuffle_chained_v32bf16_v2bf16(<32 x bfloat> %a) {
%s = shufflevector <32 x bfloat> %a, <32 x bfloat> zeroinitializer, <32 x i32> <i32 0, i32 16, i32 1, i32 17, i32 2, i32 18, i32 3, i32 19, i32 4, i32 20, i32 5, i32 21, i32 6, i32 22, i32 7, i32 23, i32 8, i32 24, i32 9, i32 25, i32 10, i32 26, i32 11, i32 27, i32 12, i32 28, i32 13, i32 29, i32 14, i32 30, i32 15, i32 31>
%s2 = shufflevector <32 x bfloat> %s, <32 x bfloat> zeroinitializer, <2 x i32> <i32 0, i32 1>
ret <2 x bfloat> %s2
}
```
This was hitting this UNREACHABLE:
```
Not a valid 512-bit x86 vector type!
UNREACHABLE executed at /home/benoit/iree/third_party/llvm-project/llvm/lib/Target/X86/X86ISelLowering.cpp:17124!
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:
0. Program arguments: /home/benoit/mlir-build/bin/llc -mtriple=x86_64-unknown-linux-gnu -mattr=+avx512f,+avx512vl,+avx512bw,+avx512bf16
1. Running pass 'Function Pass Manager' on module '<stdin>'.
2. Running pass 'X86 DAG->DAG Instruction Selection' on function '@shuffle_chained_v32bf16_v2bf16'
```
This patch introduces a new spv_ptrcast intrinsic for tracking expected
pointer types. The change fixes multiple OpenCL CTS regressions due the
switch to opaque pointers (e.g. basic/hiloeo).
…n artifact combiner
These instructions are hint for optimizations and can be treated as
copies and are handled as such with this change. Without it is possible
to run into an assertion, since tryCombineUnmergeValues rightfully use
getDefIgnoringCopies to get the source MI, which already handle these
hint instructions and treat them as copies. The problem is that
markDefDead only considers COPYs, which will lead to crash with
assertion for cases like testUnmergeHintOfTrunc.
According to the description of the psABI v2.30:
https://github.com/loongson/la-abi-specs/releases/tag/v2.30, moved the
expansion of relevant pseudo-instructions from
`LoongArchPreRAExpandPseudo` pass to `LoongArchExpandPseudo` pass, to
ensure that the code sequences of `PseudoLA*_LARGE` instructions and
Medium code model's function call are not scheduled.
According to the description of the psABI v2.20:
https://github.com/loongson/la-abi-specs/releases/tag/v2.20, adjustments
are made to the function call instructions under the medium code model.
At the same time, AsmParser has already supported parsing the call36 and
tail36 macro instructions.
Needs an AMDGPU hack to get the selection to work. The ordinary
variant is custom lowered through an almost equivalent target node
that would need a strict variant for additional known bits
optimizations.
There are some intrinsics are using i16 vectors in place of bfloat
vectors.
Move towards making bf16 vectors legal so these can migrate. Leave the
larger vectors for a later change.
Depends #76213#76214
"hidden_dynamic_lds_size" argument will be added in the reserved section
at offset 120 of the implicit argument layout.
Add "isDynamicLDSUsed" flag to AMDGPUMachineFunction to identify if a
function uses dynamic LDS.
hidden argument will be added in below cases:
- LDS global is used in the kernel.
- Kernel calls a function which uses LDS global.
- LDS pointer is passed as argument to kernel itself.
This test tracks bug of MachineVerifier to check live range segment for
EHa. Async exception can happen at any place within seh scope, not only
the call instruction. Need to teach MachineVerifier to know that.
This adds legalization, notably libcall lowering for fpowi. It is a
little different to other methods as the function takes both a float and
integer register. Otherwise all vectors get scalarized and fp16 is
promoted to fp32.