52796 Commits

Author SHA1 Message Date
Craig Topper
7fee58acf4 [RISCV] Update relax-per-target-feature.ll to use hexadecimal constants. NFC
Needed after 3dde0d02568d31ae48b557c486a3ff4edb24199b
2023-12-14 21:08:01 -08:00
Saiyedul Islam
e21b7e2143
[AMDGPU][NFC] Check more autogenerated llc tests for COV5 (#75219)
Regenerate a few more llc tests to check for COV5 instead of the default
ABI version.
2023-12-15 10:27:49 +05:30
Craig Topper
2a21260ea8 [SelectionDAG] Use getVectorElementPointer in DAGCombiner::replaceStoreOfInsertLoad. (#74249)
This ensures we clip the index to be in bounds of the vector we are
inserting into. If the index is out of bounds the results of the insert
element is poison. If we don't clip the index we can write memory that
was not part of the original store.

Fixes #74248 #75557.
2023-12-14 20:25:16 -08:00
Matt Arsenault
f4b5be1ecd Reapply "RegisterCoalescer: Add implicit-def of super register when coalescing SUBREG_TO_REG"
This reverts commit 69c4930aad9659ec6ab846c8e7124d6afe044b1e.

See if this sticks after a few more coalescer assertions are fixed.
2023-12-15 10:51:47 +07:00
Jianjian Guan
3fe81410b2
[clang][RISCV] Change default abi with f extension but without d extension (#73489)
Now we have default abi lp64 for rv64if and ilp32 for rv32if, which is
different with riscv-gnu-toolchain. In
8e9fb09a0c/configure (L3385)
when have f and not d, it prefers lp64f/ilp32f but no soft float. This
patch tries to make their behaviors consistent.
2023-12-15 11:16:05 +08:00
Youngsuk Kim
f9304974cc
[llvm][NVPTX] Inform that 'DYNAMIC_STACKALLOC' is unsupported (#74684)
Catch unsupported path early up, and emit error with information.

Motivated by the following threads:
* https://discourse.llvm.org/t/nvptx-problems-with-dynamic-alloca/70745
* #64017
2023-12-14 22:06:22 -05:00
Wang Yaduo
3dde0d0256
[RISCV] Support printing immediate of RISCV MCInst in hexadecimal format (#74053)
Enable the llvm-objdump to disassemble the immediate of RISCV
instruction in hexadecimal format with --print-imm-hex flag.
2023-12-15 10:13:20 +08:00
Arthur Eubanks
239a41e8f2 Re-Reland [X86] Respect code models more when determining if a global reference can fit in 32 bits (#75386)
For non-GlobalValue references, the small and medium code models can use
32 bit constants.

For GlobalValue references, use TargetMachine::isLargeGlobalObject().
Look through aliases for determining if a GlobalValue is small or large.
Even the large code model can reference small objects with 32 bit
constants as long as we're in no-pic mode, or if the reference is offset
from the GOT.

Original commit broke the build...

First reland broke large PIC builds referencing small data since it was using GOTOFF as a 32-bit constant.
2023-12-14 14:12:37 -08:00
Jon Roelofs
b071b70317
[GlobalISel] Always direct-call IFuncs and Aliases (#74902)
This is safe because for both cases, the use must be in the same TU as the definition, and they cannot be forward declared.
2023-12-14 14:58:20 -07:00
Jon Roelofs
640c1d3dd1
[llvm] Support IFuncs on Darwin platforms (#73686)
... by lowering them as lazy resolve-on-first-use symbol resolvers. Note that this is subtly different timing than on ELF platforms, where ifunc resolution happens at load time.

Since ld64 and ld-prime don't support all the cases we need for these, we lower them manually in the AsmPrinter.
2023-12-14 14:40:52 -07:00
Jay Foad
3e6da3252f
[AMDGPU] Add GFX12 s_sleep_var instruction and intrinsic (#75499) 2023-12-14 21:11:39 +00:00
Arthur Eubanks
15617d14f7 Revert "Reland [X86] Respect code models more when determining if a global reference can fit in 32 bits (#75386)"
This reverts commit ec92d74a0ef89b9dd46aee6ec8aca6bfd3c66a54.

Breaks some compiler-rt tests, e.g. https://lab.llvm.org/buildbot/#/builders/37/builds/28834
2023-12-14 12:28:50 -08:00
Natalie Chouinard
e75f37fd64
[SPIR-V][NFC] Require asserts on 2 tests (#75087)
These tests currently fail on asserts, so adding a REQUIRES to make sure
they're skipped on builds with asserts disabled.

Follow-up from #74849
2023-12-14 15:17:11 -05:00
Mirko Brkusanin
c6351b4cc9 [AMDGPU][NFC] Regenerate .mir test 2023-12-14 18:58:43 +01:00
Arthur Eubanks
ec92d74a0e Reland [X86] Respect code models more when determining if a global reference can fit in 32 bits (#75386)
For non-GlobalValue references, the small and medium code models can use
32 bit constants.

For GlobalValue references, use TargetMachine::isLargeGlobalObject().
Look through aliases for determining if a GlobalValue is small or large.
Even the large code model can reference small objects with 32 bit
constants as long as we're in no-pic mode, or if the reference is offset
from the GOT.

Original commit broke the build...
2023-12-14 09:49:35 -08:00
Philip Reames
7537c3c452 [RISCV] Precommit test coverage for VLMAX encodable via vsetivli 2023-12-14 09:42:54 -08:00
Arthur Eubanks
f0c03da63c
Revert "[X86] Respect code models more when determining if a global reference can fit in 32 bits" (#75500)
Reverts llvm/llvm-project#75386

Breaks build.
2023-12-14 09:32:55 -08:00
Simon Pilgrim
88f1a2c50d [X86] combineLoad - allow constant loads to share matching 'lower constant bits' with larger VBROADCAST_LOAD/SUBV_BROADCAST_LOAD nodes
We already had separate support for VBROADCAST_LOAD - merge this with the generic load handling and add SUBV_BROADCAST_LOAD support as well.
2023-12-14 17:31:07 +00:00
Arthur Eubanks
5e38ba26d2
[X86] Respect code models more when determining if a global reference can fit in 32 bits (#75386)
For non-GlobalValue references, the small and medium code models can use
32 bit constants.

For GlobalValue references, use TargetMachine::isLargeGlobalObject().
Look through aliases for determining if a GlobalValue is small or large.
Even the large code model can reference small objects with 32 bit
constants as long as we're in no-pic mode, or if the reference is offset
from the GOT.
2023-12-14 09:28:27 -08:00
Philip Reames
b7ebba3d8a [riscv] Consolidate a set of load-add-store tests into one file 2023-12-14 09:08:04 -08:00
Philip Reames
1fdbdb84a1 [riscv] Convert a set of tests to opaque pointers 2023-12-14 08:57:13 -08:00
Philip Reames
46d1f30882
[RISCV][InsertSETVTLI] Handle large immediates in backwards walk (#75409)
When doing our backwards walk, we were not handling the case where the
AVL was defined by a register whose definition was an ADDI xN, x0,
<imm>. Doing so (as we already do in the forward pass) allows us to
prune a few more transitions.
2023-12-14 07:36:07 -08:00
Simon Pilgrim
3c423722cf [X86] combineLoad - improve constant pool matches by ignoring undef elements
When trying to share constant pool entries, we can ignore the undef elements of the entry that is being removed
2023-12-14 15:13:39 +00:00
Jonas Paulsson
01061ed370
[SystemZ] Improve shouldCoalesce() for i128. (#74942)
The SystemZ implementation of shouldCoalesce() is merely a workaround
for the fact that regalloc can run out of registers when extending 128-bit
intervals with subreg (GPR64/GPR32) COPYs. 

This patch adds more freedom to the coalescer as it now only checks that
the subreg interval is local to MBB and does not have too many physreg
clobbers.
2023-12-14 15:55:27 +01:00
ostannard
4888218d03
[ARM] Do not emit unwind tables when saving LR around outlined call (#69611)
In some cases, the machine outliner needs to preserve LR across an
outlined call by pushing it onto the stack. Previously, this also
generated unwind table instructions, which is incorrect because EHABI
unwind tables cannot represent different stack frames a different points
in the function, so the extra unwind info applied to the entire
function.

The outliner code already avoided generating CFI instructions, but EHABI
unwind data is generated later from the actual instructions, so we need
to avoid using the FrameSetup and FrameDestroy flags to prevent unwind
data being generated.
2023-12-14 14:46:13 +00:00
Shih-Po Hung
b97c5a9554
[VPlan] Add a test for testing unused interleave recipes (#75026)
- Precommit of tests from #71360.
- Replace `undef` pointer operands and add stores to avoid the loads
   being optmized away.
2023-12-14 21:16:11 +08:00
Valery Pykhtin
dd051295bc
[AMDGPU] Enable GCNRewritePartialRegUses pass by default. (#72975)
Let's try once again after #69957 has landed.
2023-12-14 14:10:27 +01:00
Simon Pilgrim
2141a51be1 [X86] broadcast-elm-cross-splat-vec.ll - drop constant pool check
This is handled in the assembly comments.
2023-12-14 12:10:33 +00:00
DianQK
7649d22306
[AArch64] ORRWrs is copy instruction when there's no implicit def of the X register (#75184)
Follows
https://github.com/llvm/llvm-project/pull/74682#issuecomment-1850268782.
Fixes #74680.
2023-12-14 19:19:55 +08:00
Simon Pilgrim
a0c7a29655 [GlobalISel] IRTranslator::translateGetElementPtr - don't assume a gep constant offset is representable as i64
Fixes https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=65052
2023-12-14 11:02:38 +00:00
Simon Pilgrim
b7fc78255e Revert rG2047ab00eaf0a17e71ce5e8a5b27a8c90f034c3d "[VPlan] Add a test for testing unused interleave recipes (#75026)"
vplan-unused-interleave-group.ll is causing buildbot failures
2023-12-14 10:25:41 +00:00
Shih-Po Hung
2047ab00ea
[VPlan] Add a test for testing unused interleave recipes (#75026)
- Precommit of tests from #71360.
   - Replace `undef` pointer operands and add stores to avoid the loads
      being optmized away.
2023-12-14 17:36:58 +08:00
Philip Reames
1f3d13c415 [riscv] Fix build due to missing test update
My 632f1c appears to have missed a test update, sorry for the breakage.
2023-12-13 18:18:41 -08:00
Philip Reames
632f1c5d18
[RISCV] When VLEN is exactly known, prefer VLMAX encoding for vsetvli (#75412)
If we know the exact VLEN, then we can tell if the AVL for particular
operation is equivalent to the vsetvli xN, zero, <vtype> encoding. Using
this encoding is better than having to materialize an immediate in a
register, but worse than being able to use the vsetivli zero, imm,
<type> encoding.
2023-12-13 17:51:03 -08:00
Arthur Eubanks
c64334fb30
[X86][FastISel] Support medium code model in more places (#75375)
The medium code model is basically identical to the small code model
except that large objects cannot be referenced with 32-bit offsets.
2023-12-13 13:44:37 -08:00
Arthur Eubanks
e8f43883a0 [X86][test] Use separate check prefix in code-model-elf.ll
Since these will produce different results in upcoming changes.
2023-12-13 13:05:59 -08:00
Philip Reames
29bb7f762b [RISCV] Add test coverage for profitable vsetvli a0, zero, <vtype> cases
Test coverage for an upcoming change, we can avoid generating an immediate
in register if we know the immediate is equal to vlmax.
2023-12-13 12:58:25 -08:00
Stanislav Mekhanoshin
c6ecbcb48b
[AMDGPU] Fix no waitcnt produced between LDS DMA and ds_read on gfx10 (#75245)
BUFFER_LOAD_DWORD_LDS was incorrectly touching vscnt instead of the
vmcnt. This is VMEM load and DS store, so it shall use vmcnt.
2023-12-13 10:49:36 -08:00
Craig Topper
2c185709bc
[RISCV] Remove setJumpIsExpensive(). (#74647)
Middle end up optimizations can speculate away the short circuit
behavior of C/C++ && and ||. Using i1 and/or or logical select
instructions and a single branch.

SelectionDAGBuilder can turn i1 and/or/select back into multiple
branches, but this is disabled when jump is expensive.

RISC-V can use slt(u)(i) to evaluate a condition into any GPR which
makes us better than other targets that use a flag register. RISC-V also
has single instruction compare and branch. So its not clear from a code
size perspective that using compare+and/or is better.

If the full condition is dependent on multiple loads, using a logic
delays the branch resolution until all the loads are resolved even if
there is a cheap condition that makes the loads unnecessary.

PowerPC and Lanai are the only CPU targets that use setJumpIsExpensive.
NVPTX and AMDGPU also use it but they are GPU targets. PowerPC appears
to have a MachineIR pass that turns AND/OR of CR bits into multiple
branches. I don't know anything about Lanai and their reason for using
setJumpIsExpensive.

I think the decision to use logic vs branches is much more nuanced than
this big hammer. So I propose to make RISC-V match other CPU targets.

Anyone who wants the old behavior can still pass -mllvm
-jump-is-expensive=true.
2023-12-13 09:37:25 -08:00
CarolineConcatto
f2464ca317
[SVE2.1][Clang][LLVM]Int/FP reduce builtin in Clang and LLVM intrinsic (#69926)
This patch implements the builtins in Clang
and the LLVM-IR intrinsic for the following:

// Variants are also available for:
// _s8, _s16, _u16, _s32, _u32, _s64, _u64,
// _f16, _f32, _f64uint8x16_t svaddqv[_u8](svbool_t pg, svuint8_t zn);

// Variants are also available for:
// _s8, _u16, _s16, _u32, _s32, _u64, _s64
uint8x16_t svandqv[_u8](svbool_t pg, svuint8_t zn); uint8x16_t
sveorqv[_u8](svbool_t pg, svuint8_t zn); uint8x16_t svorqv[_u8](svbool_t
pg, svuint8_t zn);

// Variants are also available for:
// _s8, _u16, _s16, _u32, _s32, _u64, _s64;
uint8x16_t svmaxqv[_u8](svbool_t pg, svuint8_t zn); uint8x16_t
svminqv[_u8](svbool_t pg, svuint8_t zn);

// Variants are also available for _f32, _f64
float16x8_t svmaxnmqv[_f16](svbool_t pg, svfloat16_t zn); float16x8_t
svminnmqv[_f16](svbool_t pg, svfloat16_t zn);

According to the PR#257[1]

The reduction instruction uses scalable vectors as input and fixed
vectors as output, therefore we changed SVEEmitter to emit fixed vector
types in case the neon header(arm_neon.h) is not present.

[1]https://github.com/ARM-software/acle/pull/257

Co-author: Dinar Temirbulatov <dinar.temirbulatov@arm.com>
2023-12-13 15:45:59 +00:00
Petar Avramovic
6892c175c5
AMDGPU/GlobalISel: add AMDGPUGlobalISelDivergenceLowering pass (#75340)
Add empty AMDGPUGlobalISelDivergenceLowering pass. This pass will
implement
- selection of divergent i1 phis as lane mask phis, requires lane mask
merging in some cases
- lower uses of divergent i1 values outside of the cycle using lane mask
merging
- lowering of all cases of temporal divergence:
- lower uses of uniform i1 values outside of the cycle using lane mask
merging
- lower uses of uniform non-i1 values outside of the cycle using a copy
to vgpr inside of the cycle

Add very detailed set of regression tests for cases mentioned above.

patch 1 from: https://github.com/llvm/llvm-project/pull/73337
2023-12-13 16:42:56 +01:00
Yingwei Zheng
3564c85b0e
[RISCV] Eliminate dead li after emitting VSETVLIs (#65934)
This patch tracks li instructions that set AVL operands and does DCE
after emitting VSETVLIs.
2023-12-13 23:18:48 +08:00
Mariusz Sikora
7f55d7de1a
[AMDGPU] GFX12: Add Split Workgroup Barrier (#74836)
Co-authored-by: Vang Thao <Vang.Thao@amd.com>
2023-12-13 15:01:13 +01:00
Nikita Popov
9c093cbb5e Revert "[StackColoring] Delete dead stack slots (#72633)"
This reverts commit a29457844bf0c4b2eb5c0f3877b6e8ef30cdef52.

Causes an assertion failure in llvm/test/DebugInfo/COFF/lexicalblock.ll.
2023-12-13 14:31:09 +01:00
Piotr Sobczak
6eec80133b
[AMDGPU] Min/max changes for GFX12 (#75214)
Co-authored-by: Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>
2023-12-13 14:18:10 +01:00
Piotr Sobczak
fac093dd08
[AMDGPU] Update IEEE and DX10_CLAMP for GFX12 (#75030)
Co-authored-by: Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>
2023-12-13 13:52:40 +01:00
mohammed-nurulhoque
a29457844b
[StackColoring] Delete dead stack slots (#72633)
Deletes slots that have lifetime markers and the lifetime ranges are
empty.
2023-12-13 13:01:21 +01:00
Momchil Velikov
d7ee99a4fc
[MachineSink] Clear kill flags of sunk addressing mode registers (#75072)
When doing sink-and-fold, the MachineSink clears the "killed" flags of
the operands of the sunk (and deleted) instruction. However, this is not
always sufficient. In some cases we can create the new load/store
instruction with operands other than the ones present in the deleted
instruction. One such example is folding a zero word extend into a
memory load on AArch64. The zero-extend is represented by a pair of
instructions - `MOV` (i.e. `ORRwrs`) followed by a `SUBREG_TO_REG`. The
`SUBREG_TO_REG` is deleted (it is the sunk instruction), but the new
load instruction mentions operands "killed" in the `MOV`, which is no
longer correct.

To fix this, clear the "killed" flags of the registers participating in
the addressing mode.
2023-12-13 09:15:28 +00:00
paperchalice
60eca674b1
[CodeGen] Port ExpandMemCmp to new pass manager (#74050) 2023-12-13 16:18:24 +08:00
Matt Arsenault
300a55003c
RegisterCoalescer: Fix implicit operand handling during rematerialize (#75271)
If the rematerialize was placing a subregister into a super register,
and implicit operands referenced the original register, we need to add
undef flags to the now-subregister indexed implicit operands.

Depends #75152
2023-12-13 15:15:36 +07:00