52796 Commits

Author SHA1 Message Date
Fangrui Song
9e9907f1cf
[AMDGPU,test] Change llc -march= to -mtriple= (#75982)
Similar to 806761a7629df268c8aed49657aeccffa6bca449.

For IR files without a target triple, -mtriple= specifies the full
target triple while -march= merely sets the architecture part of the
default target triple, leaving a target triple which may not make sense,
e.g. amdgpu-apple-darwin.

Therefore, -march= is error-prone and not recommended for tests without
a target triple. The issue has been benign as we recognize
$unknown-apple-darwin as ELF instead of rejecting it outright.

This patch changes AMDGPU tests to not rely on the default
OS/environment components. Tests that need fixes are not changed:

```
  LLVM :: CodeGen/AMDGPU/fabs.f64.ll
  LLVM :: CodeGen/AMDGPU/fabs.ll
  LLVM :: CodeGen/AMDGPU/floor.ll
  LLVM :: CodeGen/AMDGPU/fneg-fabs.f64.ll
  LLVM :: CodeGen/AMDGPU/fneg-fabs.ll
  LLVM :: CodeGen/AMDGPU/r600-infinite-loop-bug-while-reorganizing-vector.ll
  LLVM :: CodeGen/AMDGPU/schedule-if-2.ll
```
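
The difference between the two flags can be sketched with a toy model. This is not how llc parses its options; the helper names and the example default triple are assumptions made purely for illustration.

```python
def apply_march(default_triple: str, arch: str) -> str:
    """Toy model of -march=: only the architecture component is replaced."""
    _, rest = default_triple.split("-", 1)
    return f"{arch}-{rest}"


def apply_mtriple(default_triple: str, triple: str) -> str:
    """Toy model of -mtriple=: the default triple is ignored entirely."""
    return triple


# Hypothetical default triple on a macOS build host.
host_default = "x86_64-apple-darwin"

print(apply_march(host_default, "amdgcn"))               # amdgcn-apple-darwin
print(apply_mtriple(host_default, "amdgcn-amd-amdhsa"))  # amdgcn-amd-amdhsa
```

In this model the -march= result inherits the host's vendor and OS, which is the kind of nonsensical triple the message describes.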
2024-01-16 21:54:58 -08:00
Phoebe Wang
8d6e82d501
[X86] Use vXi1 for k constraint in inline asm (#77733)
Fixes #77172
2024-01-17 11:40:32 +08:00
Phoebe Wang
9745c13ca8
[X86][BF16] Improve float -> bfloat lowering under AVX512BF16 and AVXNECONVERT (#78042) 2024-01-17 10:09:26 +08:00
Florian Mayer
f3190c78ec
Revert "[AMDGPU] Sign extend simm16 in setreg intrinsic" (#78372)
Reverts llvm/llvm-project#77997

Broke UBSan bots.
2024-01-16 16:37:48 -08:00
Rahman Lavaee
e1616ef9d7
[BasicBlockSections] Always keep the entry block in the beginning of the function. (#74696)
BasicBlockSections must enforce placing the entry block at the beginning
of the function regardless of the basic block sections profile.
2024-01-16 14:15:33 -08:00
mmoadeli
aa23e493f2
[NVPTX] Fix generating permute bytes from register pair when the initial values are undefined (#74437)
When generating the permute bytes for the prmt instruction, an undefined
initial value causes the int32 that holds the mask to be initialised to
all 1's (0xFFFFFFFF). The subsequent OR operations cannot clear those
bits, so the mask values for the following bytes are never recorded.
Consequently, the final value remains a constant -1, irrespective of
the actual mask values that follow the initial one.
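
The effect of starting from an all-ones accumulator can be sketched as follows. The 4-bit-per-byte selector layout and the `build_mask` helper are assumptions for illustration, not the NVPTX lowering code.

```python
def build_mask(selectors, initial=0):
    """OR each 4-bit byte selector into a 32-bit mask (assumed layout)."""
    mask = initial & 0xFFFFFFFF
    for i, sel in enumerate(selectors):
        mask |= (sel & 0xF) << (4 * i)
    return mask


selectors = [0, 1, 4, 5]

# Starting from zero, every selector is recorded.
print(hex(build_mask(selectors)))                      # 0x5410

# Starting from all ones, OR can never clear a bit, so the result is stuck.
print(hex(build_mask(selectors, initial=0xFFFFFFFF)))  # 0xffffffff
```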
2024-01-16 11:05:41 -08:00
Simon Pilgrim
d6ee91b110 [X86] Add test case for Issue #77805 2024-01-16 17:23:45 +00:00
Stanislav Mekhanoshin
371fdbaa57
[AMDGPU] Sign extend simm16 in setreg intrinsic (#77997)
We currently force users to use a negative constant in the intrinsic
call. Changing it to zext would break existing programs, so just sign
extend the argument.
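
A minimal sketch of the two extension choices for a 16-bit immediate. This shows only the arithmetic; it does not model the setreg intrinsic, and the helper names are illustrative.

```python
def sign_extend_16(value: int) -> int:
    """Interpret the low 16 bits of value as a signed immediate."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def zero_extend_16(value: int) -> int:
    """Interpret the low 16 bits of value as an unsigned immediate."""
    return value & 0xFFFF


# With sign extension, the negative spelling and the raw bit pattern agree.
print(sign_extend_16(-2047))   # -2047
print(sign_extend_16(0xF801))  # -2047

# With zero extension, the existing negative spelling changes meaning.
print(zero_extend_16(-2047))   # 63489
```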
2024-01-16 09:17:18 -08:00
Craig Topper
7fe5269b54
[RISCV] Bump Zfbfmin, Zvfbfmin, and Zvfbfwma to 1.0. (#78021) 2024-01-16 08:42:21 -08:00
Simon Pilgrim
662d1cb86b [X86] Add test case for Issue #78109 2024-01-16 16:33:21 +00:00
Koakuma
118d4234ac
[SPARC] Prefer RDPC over CALL to implement GETPCX for 64-bit target
On 64-bit target, prefer using RDPC over CALL to get the value of %pc.
This is faster on modern processors (Niagara T1 and newer) and avoids
polluting the processor's predictor state.

The old behavior of using a fake CALL is still done when tuning for
classic UltraSPARC processors, since RDPC is much slower there.

A quick pgbench test on a SPARC T4 shows about 2% speedup on SELECT
loads, and about 7% speedup on INSERT/UPDATE loads.

Reviewed By: @s-barannikov

Github PR: https://github.com/llvm/llvm-project/pull/78280
2024-01-16 22:46:39 +07:00
Luke Lau
93d39657f5 [RISCV] Remove -riscv-v-vector-bits-min flag that was left behind. NFC
This should have been removed in 74f985b793bf4005e49736f8c2cef8b5cbf7c1ab
2024-01-16 21:30:32 +07:00
Wang Pengcheng
3ac9fe69f7
[RISCV] CodeGen of RVE and ilp32e/lp64e ABIs (#76777)
This commit includes the necessary changes to clang and LLVM to support
codegen of `RVE` and the `ilp32e`/`lp64e` ABIs.

The differences between `RVE` and `RVI` are:
* `RVE` reduces the integer register count to 16 (x0-x15).
* The ABI should be `ilp32e` for 32 bits and `lp64e` for 64 bits.

`RVE` can be combined with all current standard extensions.

The central changes in ilp32e/lp64e ABI, compared to ilp32/lp64 are:
* Only 6 integer argument registers (rather than 8).
* Only 2 callee-saved registers (rather than 12).
* A stack alignment of 32 bits (rather than 128 bits).
* ilp32e isn't compatible with D ISA extension.
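
To make the alignment difference concrete, here is a small sketch of rounding a frame size up to each alignment. The `align_up` helper and the 20-byte frame are illustrative assumptions, not code from this patch.

```python
def align_up(size: int, alignment_bytes: int) -> int:
    """Round size up to the next multiple of a power-of-two alignment."""
    return (size + alignment_bytes - 1) & ~(alignment_bytes - 1)


frame = 20  # hypothetical frame size in bytes

print(align_up(frame, 4))   # 20, with 32-bit (4-byte) stack alignment
print(align_up(frame, 16))  # 32, with 128-bit (16-byte) stack alignment
```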

If `ilp32e` or `lp64e` is used with an ISA that has any of the registers
x16-x31 and f0-f31, then these registers are considered temporaries.

To be compatible with the implementation of ilp32e in GCC, we don't use
aligned registers to pass variadic arguments and set stack alignment
to 4 bytes for types with length of 2*XLEN.

FastCC is also supported on RVE, while GHC isn't since there is only one
available register.

Differential Revision: https://reviews.llvm.org/D70401
2024-01-16 20:44:30 +08:00
Shengchen Kan
b1eaffd389 [X86][test] Add test for lowering NDD AND
We supported encoding/decoding for APX AND in #76319

This test should be added in #77564 but was missing.
2024-01-16 20:09:40 +08:00
Sander de Smalen
289999bad7
[Clang] Make sdot builtins available to SME (#77792)
See the specification for more details:
*
https://github.com/ARM-software/acle/blob/main/main/acle.md#udot-sdot-fdot-vectors
*
https://github.com/ARM-software/acle/blob/main/main/acle.md#udot-sdot-fdot-indexed
2024-01-16 10:32:30 +00:00
Alfie Richards
60c775769b
[ARM] Add missing earlyclobber to sqrshr and uqrshl instructions. (#77782)
This avoids possible undefined behavior using the same register for Rm
and Rda.

Additionally adds a check in MC to produce an error upon parsing this
case.
2024-01-16 10:30:16 +00:00
David Green
1074b94f5d
[ARM] Fix phi operand order issue in MVEGatherScatterLowering (#78208)
With commuted operands on the phi node, the two old incoming values
could be removed in the wrong order, removing the newly added operand
instead of the old one.
2024-01-16 10:15:05 +00:00
Pierre van Houtryve
4b0a76a3d7
[GlobalISel] Fix buildCopyFromRegs for split vectors (#77448)
Fixes #77055
2024-01-16 10:04:20 +01:00
Matt Arsenault
480cc413b7
AMDGPU/GlobalISel: Handle inreg arguments as SGPRs (#78123)
This is the missing GISel part of
54470176afe20b16e6b026ab989591d1d19ad2b7
2024-01-16 15:13:31 +07:00
Alex Bradbury
84f7fb6217
[MachineScheduler] Add option to control reordering for store/load clustering (#75338)
Reordering based on the sort order of the MemOpInfo array was disabled
in <https://reviews.llvm.org/D72706>. However, it's not clear this is
desirable for all targets. It also makes it more difficult to compare the
incremental benefit of enabling load clustering in the selectiondag
scheduler as well as the machinescheduler, as the sdag scheduler does
seem to allow this reordering.

This patch adds a parameter that can control the behaviour on a
per-target basis.

Split out from #73789.
2024-01-16 07:17:41 +00:00
Luke Lau
286a366d05
[RISCV] Remove vmv.s.x and vmv.x.s lmul pseudo variants (#71501)
vmv.s.x and vmv.x.s ignore LMUL, so we can replace the PseudoVMV_S_X_MX and
PseudoVMV_X_S_MX with just one pseudo each. These pseudos use the VR register
class (just like the actual instruction), so we now only have TableGen
patterns for vectors of LMUL <= 1.
We now rely on the existing combines that shrink LMUL down to 1 for
vmv_s_x_vl (and vfmv_s_f_vl). We could look into removing these combines
later and just inserting the nodes with the correct type in a later
patch.

The test diff is due to the fact that a PseudoVMV_S_X/PseudoVMV_X_S no longer
carries any information about LMUL, so if it's the only vector pseudo
instruction in a block then it now defaults to LMUL=1.
2024-01-16 13:36:24 +07:00
Shilei Tian
d63c2e52e6
[AMDGPU][MC] Remove incorrect _e32 suffix from v_dot2c_f32_f16 and v_dot4c_i32_i8 (#77993)
The two VOP2 instructions cannot be encoded as VOP3.

Fix #54691.
2024-01-15 23:11:50 -05:00
Michal Paszkowski
59e5cb7b83
[SPIR-V] Do not emit spv_ptrcast if GEP result is of expected type (#78122)
Prior to this change spv_ptrcast (and OpBitcast) was never emitted for
GEP resulting pointers. While such SPIR-V was (mostly) accepted by the
NEO GPU driver, the generated SPIR-V was incorrect.

The newly added test (pointers/getelementptr-bitcast-load.ll) verifies
that a correct bitcast is added for more complex cases and passes
spirv-val. The test is based on an OpenCL CTS test (basic/prefetch).
2024-01-15 19:56:11 -08:00
Amara Emerson
eb009ed249 [GlobalISel] Fix the select->minmax combine from trying to operate on pointer types. 2024-01-15 18:20:18 -08:00
David Green
6719a5a3f6 [ARM] Extra test for MVE gather optimization with commuted phi operands. NFC 2024-01-15 19:28:55 +00:00
Jonas Paulsson
1d1893097a
[SystemZ] Don't use FP Load and Test as comparisons to same reg (#78074)
The usage of FP Load and Test instructions as a comparison against zero
with the assumption that the dest reg will always reflect the source reg is
actually incorrect: Unfortunately, a SNaN will be converted to a QNaN, so the
instruction may actually change the value as opposed to being a pure register
move with a test.

This patch
- changes instruction selection to always emit FP LT with a scratch def
  reg, which will typically be allocated to the same reg if dead.
- removes the conversions into FP LT in SystemZElimCompare.
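
The reason the result register can differ from the source can be sketched at the bit level. The snippet below models generic IEEE 754 binary64 NaN quieting; it is an assumption that this matches the conversion described above, and it is not SystemZ code.

```python
QUIET_BIT = 1 << 51               # top bit of the binary64 fraction field
FRACTION_MASK = (1 << 52) - 1


def is_nan(bits: int) -> bool:
    """True if the 64-bit pattern encodes a NaN."""
    return ((bits >> 52) & 0x7FF) == 0x7FF and (bits & FRACTION_MASK) != 0


def is_signaling(bits: int) -> bool:
    """True if the pattern is a NaN with the quiet bit clear."""
    return is_nan(bits) and (bits & QUIET_BIT) == 0


def quiet(bits: int) -> int:
    """Convert a signaling NaN to a quiet NaN; leave other values alone."""
    return bits | QUIET_BIT if is_nan(bits) else bits


snan = 0x7FF0000000000001
print(is_signaling(snan))  # True
print(hex(quiet(snan)))    # 0x7ff8000000000001
print(quiet(snan) == snan) # False: the "moved" value is not the source value
```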
2024-01-15 19:36:40 +01:00
chuongg3
927b8a0f4f
[AArch64][GlobalISel] Combine vecreduce(ext) to {U/S}ADDLV (#75832) 2024-01-15 18:26:27 +00:00
Jay Foad
ba131b7017
[AMDGPU] Do not generate s_set_inst_prefetch_distance for GFX12 (#78190)
GFX12 can still encode the s_set_inst_prefetch_distance instruction but
it has no effect.
2024-01-15 18:20:45 +00:00
Jay Foad
ed60cb8fb9
[AMDGPU] Disable hasVALUPartialForwardingHazard for GFX12 (#78188) 2024-01-15 18:20:10 +00:00
Jay Foad
85705bbf1d
[AMDGPU] Disable hasVALUMaskWriteHazard for GFX12 (#78187) 2024-01-15 18:19:32 +00:00
Jonas Paulsson
e2ce91f48c Fix test output for 3b16d8c 2024-01-15 12:04:00 -06:00
Tuan Chuong Goh
22c24be37c [AArch64][GlobalISel] Pre-commit for Combine vecreduce(ext) to {U/S}ADDLV 2024-01-15 17:54:52 +00:00
chuongg3
fcfe1b6482
[GlobalISel] Refactor extractParts() (#75223)
Moved extractParts() and extractVectorParts() from LegalizerHelper
to Utils to be able to use it in different passes.

extractParts() will also try to use unmerge when doing irregular
splits where possible, falling back to extract elements when not.
2024-01-15 16:40:39 +00:00
Dávid Ferenc Szabó
0ff3d729f9
[GlobalISel] Make IRTranslator able to handle PHIs with empty types. (#73235)
SelectionDAG already handles this since
e53b7d1a11d180ed7b33190a837d8898ab2a0b71.
2024-01-15 23:26:30 +07:00
Hans Wennborg
677ced8af2 Require asserts for llvm/test/CodeGen/PowerPC/fence.ll 2024-01-15 17:25:49 +01:00
Jonas Paulsson
3b16d8c8ea
[SystemZ] Don't crash on undef source in shouldCoalesce() (#78056)
SystemZRegisterInfo::shouldCoalesce() has to be able to handle an undef
source.
2024-01-15 17:24:38 +01:00
Jay Foad
f3d07881c8
[AMDGPU] Remove functions with incompatible gws attribute (#78143)
This change is to remove incompatible gws related functions
in order to make device-libs work correctly under -O0 for
gfx1200+

Co-authored-by: Changpeng Fang <changpeng.fang@amd.com>
2024-01-15 16:23:39 +00:00
Amara Emerson
c32d02efd2 [AArch64][GlobalISel] Fix not extending GPR32->GPR64 result of anyext indexed load.
Was causing assertions to fail.
2024-01-15 08:22:39 -08:00
Nathan Gauër
0e1037edbf
[SPIR-V] Strip convergence intrinsics before ISel (#75948)
The structurizer will require the frontend to emit convergence
intrinsics. Once used to restructurize the control-flow, those
intrinsics shall be removed, as they cannot be converted to
SPIR-V.

This commit adds a new pass to the SPIR-V backend which strips those
intrinsics.

Those 2 new steps are not limited to Vulkan as OpenCL could
also benefit from not crashing if a convergent operation is in
the IR (even though the frontend doesn't generate such intrinsics).

Signed-off-by: Nathan Gauër <brioche@google.com>
2024-01-15 11:35:35 +01:00
Nikita Popov
87bc91d425
[PowerPC] Fix shuffle combine with undef elements (#77787)
This custom DAG combine works on a shuffle where one source vector is a
zero splat, which means we can adjust the shuffle indices to refer to
any element of the splat -- as long as we stay in the same vector.

In the case where an undef (-1) index into the non-splat vector was
used, we ended up adjusting the splat index to -1+NumElements, which
points into the wrong vector.

Fix this by using the first element from the splat if the other one is undef.
There are four cases this theoretically affects, but in practice I only
managed to demonstrate a miscompile with one of them. I think two of
theses are effectively dead due to the operand canonicalization at the
start of the transform.

Fixes https://github.com/llvm/llvm-project/issues/77748.
2024-01-15 10:12:33 +01:00
Qiu Chaofan
ce1f9465b0 [NFC] Pre-commit case of ppcf128 extractelt soften 2024-01-15 15:27:36 +08:00
Luke Lau
3b7abf38fb [RISCV] Add disjoint flag to or ops in RISCVGatherScatterLowering tests. NFC
InstCombine will add the disjoint flag to these or instructions. This patch
adds them to the tests so that it matches the input RISCVGatherScatterLowering
will receive in practice, allowing us to rely on said disjoint flag:
https://github.com/llvm/llvm-project/pull/77800#discussion_r1449231844
2024-01-15 14:09:27 +07:00
Luke Lau
0cf768e7f1
[RISCV] Handle disjoint or in RISCVGatherScatterLowering (#77800)
This patch adds support for the disjoint flag in the non-recursive case,
as well as adding an additional check for it in the recursive case. Note
that haveNoCommonBitsSet should be equivalent to having the disjoint
flag set, and the check can be removed in a follow-up patch.
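
The property that makes a disjoint `or` usable in address computations can be sketched as follows: when the operands share no set bits, `or` and `add` produce the same value. The helper below only mirrors the name of haveNoCommonBitsSet for readability; it is a plain-integer illustration, not the LLVM analysis.

```python
def have_no_common_bits_set(a: int, b: int) -> bool:
    """Plain-integer stand-in for the disjointness check."""
    return (a & b) == 0


base, offset = 0b1000_0000, 0b0000_0101
print(have_no_common_bits_set(base, offset))  # True
print(base | offset, base + offset)           # 133 133

# When bits overlap, or and add disagree, so the rewrite would be invalid.
a, b = 0b0110, 0b0100
print(have_no_common_bits_set(a, b))          # False
print(a | b, a + b)                           # 6 10
```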

Co-authored-by: Philip Reames <preames@rivosinc.com>
2024-01-15 13:37:09 +07:00
Luke Lau
c07a1fe7b4
[RISCV] Lower vfmv.s.f intrinsics to VFMV_S_F_VL first (#76699)
Currently vfmv.s.f intrinsics are directly selected to their pseudos via a
tablegen pattern in RISCVInstrInfoVPseudos.td, whereas the other move
instructions (vmv.s.x/vmv.v.x/vmv.v.f etc.) first get lowered to their
corresponding VL SDNode, then get selected from a pattern in
RISCVInstrInfoVVLPatterns.td.

This patch brings vfmv.s.f in line with the other move instructions.

Split out from #71501, where we did this to preserve the behaviour of selecting
vmv_s_x for VFMV_S_F_VL for small enough immediates.
2024-01-15 12:07:29 +07:00
Qiu Chaofan
85071a3c74
[PowerPC] Implement fence builtin (#76495) 2024-01-15 11:19:16 +08:00
Nicholas Mosier
49138d97c0
[X86] Fix SLH crash on llvm.eh.sjlj.longjmp (#77959)
Fix #60081.
2024-01-14 12:03:18 +08:00
Heejin Ahn
d871f40deb
[WebAssembly] Use DebugValueManager only when subprogram exists (#77978)
We previously scanned the whole BB for `DBG_VALUE` instructions even when
the program doesn't have debug info, i.e., the function doesn't have a
subprogram associated with it, which can make compilation unnecessarily
slow. This disables `DebugValueManager` when a `DISubprogram` doesn't
exist for a function.

This only reduces unnecessary work in non-debug mode and does not change
output, so it's hard to add a test to test this behavior.

Test changes were necessary because their `DISubprogram`s were not
correctly linked with the functions, so with this PR the compiler
incorrectly assumed the functions didn't have a subprogram and the tests
started to fail.

Fixes https://github.com/emscripten-core/emscripten/issues/21048.
2024-01-13 14:55:54 -08:00
Durgadoss R
8d817f6479
[LLVM][NVPTX]: Add aligned versions of cluster barriers (#77940) 2024-01-13 10:41:19 +01:00
Usman Nadeem
792fa23c1b
[AArch64][SVE2] Lower OR to SLI/SRI (#77555)
Code builds on NEON code and the tests are adapted from NEON tests
minus the tests for illegal types.
2024-01-12 11:23:56 -08:00
Min-Yih Hsu
2f2217a8f7
[RISCV] Add missing tests for inttoptr/ptrtoint on scalable vectors (#77857)
Add missing tests for inttoptr/ptrtoint on scalable vectors. Previously we only had inttoptr/ptrtoint tests for fixed vectors.
2024-01-12 09:52:07 -08:00