52796 Commits

Author SHA1 Message Date
Felipe de Azevedo Piovezan
380ac53dfa
[DebugNames] Implement Entry::GetParentEntry query (#78760)
This commit introduces a helper function to DWARFAcceleratorTable::Entry
which follows DW_IDX_Parent attributes to returns the corresponding
parent Entry in the table.

It is tested by enhancing dwarfdump so that it now prints:

1. When data is corrupt.
2. When parent information is present, but the parent is not indexed.
3. The parent entry offset, when the parent is present and indexed. This
is printed in terms a real entry offset (the same that gets printed at
the start of each entry: "Entry @ 0x..."), instead of the encoded number
in the table (which is an offset from the start off the Entry list).
This makes it easy to visually inspect the dwarfdump and check what the
parent is.
2024-01-24 06:44:03 -08:00
Nikita Popov
90ba33099c
[InstCombine] Canonicalize constant GEPs to i8 source element type (#68882)
This patch canonicalizes getelementptr instructions with constant
indices to use the `i8` source element type. This makes it easier for
optimizations to recognize that two GEPs are identical, because they
don't need to see past many different ways to express the same offset.

This is a first step towards
https://discourse.llvm.org/t/rfc-replacing-getelementptr-with-ptradd/68699.
This is limited to constant GEPs only for now, as they have a clear
canonical form, while we're not yet sure how exactly to deal with
variable indices.

The test llvm/test/Transforms/PhaseOrdering/switch_with_geps.ll gives
two representative examples of the kind of optimization improvement we
expect from this change. In the first test SimplifyCFG can now realize
that all switch branches are actually the same. In the second test it
can convert it into simple arithmetic. These are representative of
common optimization failures we see in Rust.

Fixes https://github.com/llvm/llvm-project/issues/69841.
2024-01-24 15:25:29 +01:00
Florian Hahn
98509c7f97
[AArch64] Add vec3 tests with different load/store alignments.
Add extra tests with different load/store alignments for
https://github.com/llvm/llvm-project/pull/78637.
2024-01-24 14:19:33 +00:00
Simon Pilgrim
8b43c1be23
[X86] X86FixupVectorConstants - shrink vector load to movsd/movsd/movd/movq 'zero upper' instructions (#79000)
If we're loading a vector constant that is known to be zero in the upper elements, then attempt to shrink the constant and just scalar load the lower 32/64 bits.

Always chose the vzload/broadcast with the smallest constant load, and prefer vzload over broadcasts for same bitwidth to avoid domain flips (mainly a AVX1 issue).

Fixes #73783
2024-01-24 14:00:51 +00:00
Mirko Brkušanin
7fdf608cef
[AMDGPU] Add GFX12 WMMA and SWMMAC instructions (#77795)
Co-authored-by: Petar Avramovic <Petar.Avramovic@amd.com>
Co-authored-by: Piotr Sobczak <piotr.sobczak@amd.com>
2024-01-24 13:43:07 +01:00
Simon Pilgrim
72f10f7eb5 [X86] Fold not(pcmpeq(and(X,CstPow2),0)) -> pcmpeq(and(X,CstPow2),CstPow2)
Fixes #78888
2024-01-24 12:04:45 +00:00
Simon Pilgrim
6255bae6c9 [X86] Add test coverage based on #78888 2024-01-24 12:04:44 +00:00
Mariusz Sikora
cfddb59be2
[AMDGPU][GFX12] VOP encoding and codegen - add support for v_cvt fp8/… (#78414)
…bf8 instructions

    Add VOP1, VOP1_DPP8, VOP1_DPP16, VOP3, VOP3_DPP8, VOP3_DPP16
    instructions that were supported on GFX940 (MI300):
    - V_CVT_F32_FP8
    - V_CVT_F32_BF8
    - V_CVT_PK_F32_FP8
    - V_CVT_PK_F32_BF8
    - V_CVT_PK_FP8_F32
    - V_CVT_PK_BF8_F32
    - V_CVT_SR_FP8_F32
    - V_CVT_SR_BF8_F32

---------

Co-authored-by: Mateja Marjanovic <mateja.marjanovic@amd.com>
Co-authored-by: Mirko Brkušanin <Mirko.Brkusanin@amd.com>
2024-01-24 12:21:15 +01:00
Petar Avramovic
c46109d0d7
Revert "AMDGPU/GlobalISelDivergenceLowering: select divergent i1 phis" (#79274)
Reverts llvm/llvm-project#78482
2024-01-24 12:18:34 +01:00
Petar Avramovic
149ed9d2c5
AMDGPU: update GFX11 wmma hazards (#76143)
One V_NOP or unrelated VALU instruction in between is required for
correctness when matrix A or B of current WMMA instruction overlaps with
matrix D of previous WMMA instruction.
Remaining cases of WMMA operand overlaps are handled by the hardware and
do not require handling in hazard recognizer.

Hardware may stall in cases where:
- matrix C of current WMMA instruction overlaps with matrix D of
previous WMMA instruction
- VALU instruction reads matrix D of previous WMMA instruction
- matrix A,B or C of WMMA instruction reads result of previous VALU
instruction
2024-01-24 12:00:35 +01:00
Petar Avramovic
91ddcba83a
AMDGPU/GlobalISelDivergenceLowering: select divergent i1 phis (#78482)
Implement PhiLoweringHelper for GlobalISel in DivergenceLoweringHelper.
Use machine uniformity analysis to find divergent i1 phis and select
them as lane mask phis in same way SILowerI1Copies select VReg_1 phis.
Note that divergent i1 phis include phis created by LCSSA and all cases
of uses outside of cycle are actually covered by "lowering LCSSA phis".
GlobalISel lane masks are registers with sgpr register class and S1 LLT.

TODO: General goal is that instructions created in this pass are fully
instruction-selected so that selection of lane mask phis is not split
across multiple passes.

patch 3 from: https://github.com/llvm/llvm-project/pull/73337
2024-01-24 11:58:32 +01:00
Shengchen Kan
33ecef9812
[X86][CodeGen] Fix crash when commute operands of Instruction for code size (#79245)
Reported in 134fcc62786d31ab73439201dce2d73808d1785a

Incorrect opcode is used  b/c there is a `[[fallthrough]]` at line 2386.
2024-01-24 17:10:28 +08:00
Shengchen Kan
71d64ed80f [X86][Peephole] Add NDD entries for EFLAGS optimization 2024-01-24 15:47:58 +08:00
Shengchen Kan
d119ecb958 [X86][NFC] Pre-commit test for RA hints for APX NDD instructions 2024-01-24 14:44:34 +08:00
Douglas Yung
a01195ff5c Update compiler version expected that seems to be embedded in CHECK line of test at llvm/test/CodeGen/SystemZ/zos-ppa2.ll.
The test contains a CHECK line which verifies an .ascii line which originally checks
for 18001970010100000000. After the bump of the compiler version to 19, the test
started to fail with the string now being 19001970010100000000.

This should fix this failing test on bots.
2024-01-23 22:05:17 -08:00
Shengchen Kan
f7b61f81b5
[X86][CodeGen] Transform NDD SUB to CMP if dest reg is dead (#79135) 2024-01-24 13:58:48 +08:00
Brandon Wu
33d804c6c2
[RISCV] Allow VCIX with SE to reorder (#77049)
This patch allows VCIX instructions that have side effect to be
reordered
with memory and other side effecting instructions. However we don't want
VCIX instructions to be reordered with each other, so we propose a dummy
register called VCIX_STATE and make these instructions implicitly define
and use
it.
2024-01-24 11:30:12 +08:00
Christudasan Devadasan
230c13d59d
[AMDGPU] Pick available high VGPR for CSR SGPR spilling (#78669)
CSR SGPR spilling currently uses the early available physical VGPRs. It
currently imposes a high register pressure while trying to allocate
large VGPR tuples within the default register budget.

This patch changes the spilling strategy by picking the VGPRs in the
reverse order, the highest available VGPR first and later after regalloc
shift them back to the lowest available range. With that, the initial
VGPRs would be available for allocation and possibility
of finding large number of contiguous registers will be more.
2024-01-24 07:08:43 +05:30
Aiden Grossman
b1778c7d7b
[AsmPrinter] Remove mbb-profile-dump flag (#76595)
Now that the work embedding PGO information in SHT_LLVM_BB_ADDR_MAP ELF
sections has landed, there is no longer a need to keep around the
mbb-profile-dump flag.
2024-01-23 16:48:10 -08:00
Paul Kirth
03a61d34eb
[RISCV] Support TLSDESC in the RISC-V backend (#66915)
This patch adds basic TLSDESC support in the RISC-V backend.

Specifically, we add new relocation types for TLSDESC, as prescribed in 
https://github.com/riscv-non-isa/riscv-elf-psabi-doc/pull/373, and add a
new pseudo instruction to simplify code generation.

This patch does not try to optimize the local dynamic case, which can be
improved in separate patches. 

Linker side changes will also be handled separately.

The current implementation is only enabled when passing the new
`-enable-tlsdesc` codegen flag.
2024-01-23 16:16:07 -08:00
Paul Kirth
9d476e1e1a
[clang][FatLTO] Avoid UnifiedLTO until it can support WPD/CFI (#79061)
Currently, the UnifiedLTO pipeline seems to have trouble with several
LTO features, like SplitLTO units, which means we cannot use important
optimizations like Whole Program Devirtualization or security hardening
instrumentation like CFI.

This patch reverts FatLTO to using distinct pipelines for Full LTO and
ThinLTO. It still avoids module cloning, since that was error prone.
2024-01-23 14:04:52 -08:00
Philip Reames
f05dd29cee [RISCV] Regenerate autogen test to remove spurious diff 2024-01-23 10:57:54 -08:00
Philip Reames
bdc41106ee
[RISCV] Recurse on first operand of two operand shuffles (#79180)
This is the first step towards an alternate shuffle lowering design for
the general two vector argument case. The goal is to leverage the
existing lowering for single vector permutes to avoid as many of the
vrgathers as required - even if we do need the other.

This patch handles only the first argument, and is arguably a slightly
weird half-step. However, the test changes from the full two argument
recurse patch are a lot harder to reason about. Taking this half step
gives much more easily reviewable changes, and is thus worthwhile. I
intend to post the patch for the second argument once this has landed.
2024-01-23 10:49:55 -08:00
Philip Reames
bb8a8770e2
[RISCV] Exploit register boundaries when lowering shuffle with exact vlen (#79072)
If we have a shuffle which is larger than m1, we may be able to split it
into a series of individual m1 shuffles. This patch starts with the
subcase where the mask allows a 1-to-1 mapping from source register to
destination register - each with a possible permutation of their own. We
can potentially extend this later, thought in practice this seems to
already catch a number of the most interesting cases.
2024-01-23 10:36:22 -08:00
gulfemsavrun
7fe951ad8a
Revert "Reapply [hwasan] Update dbg.assign intrinsics in HWAsan pass … (#79186)
…#78606"

This reverts commit 13c6f1ea2e7eb15fe492d8fca4fa1857c6f86370 because it
causes an assertion in DebugInfoMetadata.cpp:1968 in Clang Linux
builders for Fuchsia.

https://logs.chromium.org/logs/fuchsia/buildbucket/cr-buildbucket/8758111613576762817/+/u/clang/build/stdout
2024-01-23 10:12:10 -08:00
Changpeng Fang
32073b8356
AMDGPU: Do not generate non-temporal hint when Load_Tr intrinsic did not specify it (#79104)
int_amdgcn_global_load_tr did not specify non-temporal load transpose,
thus we should
not genetrate the non-temporal hint for the load. We need to implement
getTgtMemIntrinsic
to create the corresponding MemSDNode. And we don't set the non-temporal
flag because
the intrinsic did not specify it.

NOTE: We need to implement getTgtMemIntrinsic for any memory intrinsics.
2024-01-23 10:05:32 -08:00
Craig Topper
d360963aaa
[RISCV] Add regalloc hints for Zcb instructions. (#78949)
This hints the register allocator to use the same register for source
and destination to enable more compression.
2024-01-23 09:33:06 -08:00
Jay Foad
6cf37dd504
[AMDGPU] Enable architected SGPRs for GFX12 (#79160) 2024-01-23 16:36:30 +00:00
Simon Pilgrim
e1aa5b1fd1 [DAG] visitSCALAR_TO_VECTOR - don't fold scalar_to_vector(bin(extract(x),extract(y)) -> bin(x,y) if extracts have other uses
Fixes #78897 - although the test case still has a number of poor codegen issues (in particular for i686 triples) that will need addressing (combining the nodes in topological order should help).
2024-01-23 16:28:43 +00:00
Mirko Brkušanin
6bb7d515c3
[AMDGPU] Properly check op_sel in GCNDPPCombine (#79122) 2024-01-23 17:21:16 +01:00
Simon Pilgrim
8c41e3fcb1 [X86] Add test case for Issue #78897 2024-01-23 15:44:01 +00:00
Jeremy Morse
087172258a [DebugInfo][RemoveDIs] Handle non-instr debug-info in GlobalISel (#75228)
The RemoveDIs project is aiming to eliminate debug intrinsics like
dbg.value and dbg.declare from LLVM, and replace them with DPValue objects
attached to instructions. ISel is one of the "terminals" where that
information needs to be converted into MIR format: this patch implements
support for that in GlobalISel. We aim for the output of LLVM to be
identical with/without RemoveDIs debug-info.

This patch should be NFC, as we're handling the same data about variables
stored in a different format -- it now appears in a DPValue object rather
than as an intrinsic. To that end, I've refactored the handling of
dbg.values into a dedicated function, and call it whenever a dbg.value or a
DPValue is encountered. dbg.declare is handled in a similar way.

Testing: adding the --try-experimental-debuginfo-iterators switch to llc
causes it to try and convert to the "new" debug-info format if it's built
in (LLVM_EXPERIMENTAL_DEBUGINFO_ITERATORS=On), and it'll be covered by our
buildbot. One test has a few extra wildcard-regexes added: this is because
there's some extra data printed about attached debug-info, which is safe to
ignore.
2024-01-23 15:04:08 +00:00
Danial Klimkin
16df714e77
[test] Update stack_guard_remat.ll (#79139)
Replace cp with a cat. This allows to create a writable file when the
original one is read-only.
2024-01-23 14:48:33 +01:00
Florian Hahn
e7b4ff8119
[AArch64] Add vec3 tests with add between load and store.
Extra tests for
  https://github.com/llvm/llvm-project/pull/78637
  https://github.com/llvm/llvm-project/pull/78632
2024-01-23 12:38:00 +00:00
Simon Pilgrim
4318b033bd
[MC][X86] Merge lane/element broadcast comment printers. (#79020)
This is /almost/ NFC - the only annoyance is that for some reason we were using "<C1,C2,..>" for ConstantVector types unlike all other cases - these now use the same "[C1,C2,..]" format as the other constant printers.
2024-01-23 12:33:52 +00:00
Pierre van Houtryve
42b0884238
[AMDGPU] Handle V_PERMLANE64_B32 in fixVcmpxPermlaneHazards (#79125)
Fixes #78856
2024-01-23 13:10:58 +01:00
Simon Pilgrim
5c7bbe383b [X86] canonicalizeShuffleWithOp - recognise constant vectors with getTargetConstantFromNode
Allows shuffle to fold constant vectors that have already been lowered to constant pool - shuffle combining can then constant fold this.

Noticed while triaging #79100
2024-01-23 11:30:06 +00:00
OCHyams
13c6f1ea2e Reapply [hwasan] Update dbg.assign intrinsics in HWAsan pass #78606
llvm.dbg.assign intrinsics have 2 {value, expression} pairs; fix hwasan to update
the second expression.

Fixes #76545
2024-01-23 11:24:21 +00:00
Shengchen Kan
66237d647e [X86][CodeGen] Add entries for NDD SHLD/SHRD to the commuteInstructionImpl 2024-01-23 17:05:09 +08:00
David Spickett
f20556678c
Reland "[llvm][AArch64] Copy all operands when expanding BLR_BTI bundle (#78267)" (#78719)
This reverts commit 955417ade2648c2b1a4e5f0be697f61570590a88.

The problem with the previous version was that the bundle instruction
had arguments like "target arg1 arg2". When it's expanded we produced a
BL or BLR which can only accept one argument, the target of the branch.

Now I realise why expandCALL_RVMARKER has to copy them in mutiple steps.
The operands for the called function need to be changed to implicit
arguments of the branch instruction.

* Copy the branch target.
* Copy all register operands, marking them as implicit.
* Copy any other operands without modifying them.

Prior to any attempt to fix #77915:
BL @_setjmp, csr_aarch64_aapcs, implicit-def $lr, implicit $sp,
implicit-def dead $lr, implicit $sp, implicit-def $sp

Which is dropping the use of the arguments for the called function.

My first fix attempt produced:
BL @_setjmp, $x0, $w1, <regmask $fp ...>, implicit-def $lr, implicit
$sp, implicit-def dead $lr, implicit $sp, implicit-def $sp

It copied the arguments but as explicit arguments to the BL which only
expects 1, failing verification.

With this new change we produce:
BL @_setjmp, csr_aarch64_aapcs, implicit-def $lr, implicit $sp, implicit
$x0, implicit $w1, implicit-def dead $lr, implicit $sp, implicit-def $sp

Note specifically the added "implicit $x0, implicit $w1". So BL only has
1 explicit argument, but the arguments to the function are still used.
2024-01-23 08:45:47 +00:00
yjijd
44ba6ebc99
[CodeGen][LoongArch] Set FP_TO_SINT/FP_TO_UINT to legal for vector types (#79107)
Support the following conversions:
v4f32->v4i32, v2f64->v2i64(LSX)
v8f32->v8i32, v4f64->v4i64(LASX)
v4f32->v4i64, v4f64->v4i32(LASX)
2024-01-23 15:57:06 +08:00
yjijd
f799f93692
[CodeGen][LoongArch] Set SINT_TO_FP/UINT_TO_FP to legal for vector types (#78924)
Support the following conversions:
v4i32->v4f32, v2i64->v2f64(LSX)
v8i32->v8f32, v4i64->v4f64(LASX)
v4i32->v4f64, v4i64->v4f32(LASX)
2024-01-23 15:16:23 +08:00
Simeon K
297b77036e
[RISCV] Fix stack size computation when M extension disabled (#78602)
Ensure that getVLENFactoredAmount does not fail when the scale amount
requires the use of a non-trivial multiplication but the M extension is
not enabled. In such case, perform the multiplication using shifts and
adds.
2024-01-22 23:10:25 -08:00
Ami-zhang
fcb8342a21
[LoongArch] Add definitions and feature 'frecipe' for FP approximation intrinsics/builtins (#78962)
This PR adds definitions and 'frecipe' feature for FP approximation
intrinsics/builtins. In additions, this adds and complements relative
testcases.
2024-01-23 14:24:58 +08:00
Eli Friedman
a6065f0fa5
Arm64EC entry/exit thunks, consolidated. (#79067)
This combines the previously posted patches with some additional work
I've done to more closely match MSVC output.

Most of the important logic here is implemented in
AArch64Arm64ECCallLowering. The purpose of the
AArch64Arm64ECCallLowering is to take "normal" IR we'd generate for
other targets, and generate most of the Arm64EC-specific bits:
generating thunks, mangling symbols, generating aliases, and generating
the .hybmp$x table. This is all done late for a few reasons: to
consolidate the logic as much as possible, and to ensure the IR exposed
to optimization passes doesn't contain complex arm64ec-specific
constructs.

The other changes are supporting changes, to handle the new constructs
generated by that pass.

There's a global llvm.arm64ec.symbolmap representing the .hybmp$x
entries for the thunks. This gets handled directly by the AsmPrinter
because it needs symbol indexes that aren't available before that.

There are two new calling conventions used to represent calls to and
from thunks: ARM64EC_Thunk_X64 and ARM64EC_Thunk_Native. There are a few
changes to handle the associated exception-handling info,
SEH_SaveAnyRegQP and SEH_SaveAnyRegQPX.

I've intentionally left out handling for structs with small
non-power-of-two sizes, because that's easily separated out. The rest of
my current work is here. I squashed my current patches because they were
split in ways that didn't really make sense. Maybe I could split out
some bits, but it's hard to meaningfully test most of the parts
independently.

Thanks to @dpaoliello for extensive testing and suggestions.

(Originally posted as https://reviews.llvm.org/D157547 .)
2024-01-22 21:28:07 -08:00
Shengchen Kan
5c68c6d70f
[X86] Support encoding/decoding and lowering for APX variant SHL/SHR/SAR/ROL/ROR/RCL/RCR/SHLD/SHRD (#78853)
Four variants: promoted legacy, ND (new data destination), NF (no flags
update) and NF_ND (NF + ND).

The syntax of NF instructions is aligned with GNU binutils.
https://sourceware.org/pipermail/binutils/2023-September/129545.html
2024-01-23 10:23:27 +08:00
Jim Lin
b8e708b9d3
[RISCV] Merge ADDI with X0 into base offset (#78940)
If offset is `addi rd, x0, imm`, merge imm into base offset.
2024-01-23 09:57:05 +08:00
Carl Ritson
4db4d7f282
[AMDGPU] SILowerSGPRSpills: do not update MRI reserve registers (#77888)
VGPRs used for spilling do not require explicit reservation with MRI.
freezeReservedRegs() executed before register allocation ensures these
are placed in the reserve set.

The only pass after SILowerSGPRSpills is SIPreAllocateWWMRegs which
explicitly tests for interference before register allocation so should
not reuse a WWM VGPR holding spill data. reserveReg prevents calculation
of correct liveness for physical registers which could be used to extend
SIPreAllocateWWMRegs.
2024-01-23 10:49:26 +09:00
Simeon K
58cfd56356
[VP][RISCV] Introduce llvm.vp.minimum/maximum intrinsics (#74840)
Although there are predicated versions of minnum/maxnum, the ones for
minimum/maximum are currently missing. This patch introduces these
intrinsics and implements their lowering to RISC-V.
2024-01-22 16:46:39 -08:00
David Green
a2d68b4bec
[SelectOpt] Add handling for Select-like operations. (#77284)
Some operations behave like selects. For example `or(zext(c), y)` is the
same as select(c, y|1, y)` and instcombine can canonicalize the select
to the or form. These operations can still be worthwhile converting to
branch as opposed to keeping as a select or or instruction.

This patch attempts to add some basic handling for them, creating a
SelectLike abstraction in the select optimization pass. The backend can
opt into handling `or(zext(c),x)` as a select if it could be profitable,
and the select optimization pass attempts to handle them in much the
same way as a `select(c, x|1, x)`. The Or(x, 1) may need to be added as
a new instruction, generated as the or is converted to branches.

This helps fix a regression from selects being converted to or's
recently.
2024-01-22 23:46:58 +00:00