52521 Commits

Author SHA1 Message Date
Thorsten Schütt
deefe3fbc9
[GlobalIsel] Post-review combine ADDO (#85961)
https://github.com/llvm/llvm-project/pull/82927
2024-03-21 03:56:40 +01:00
Freddy Ye
07a5e31cb3
Move pre-commit test for #85737 (#86062)
2024-03-21 10:55:26 +08:00
Freddy Ye
35a66f965c
Precommit test for #85737 (#86056)
Copied from llvm/test/CodeGen/X86/domain-reassignment.mir
2024-03-21 10:19:28 +08:00
Paul Kirth
f6f474c4ef
[llvm][lld] Pre-commit tests for RISCV TLSDESC symbols
Currently, we mistakenly mark the local labels used in RISC-V TLSDESC as
TLS symbols, when they should not be. This patch adds tests with the
current incorrect behavior, and subsequent patches will address the
issue.

Reviewers: MaskRay, topperc

Reviewed By: MaskRay

Pull Request: https://github.com/llvm/llvm-project/pull/85816
2024-03-20 13:39:39 -07:00
S. Bharadwaj Yadavalli
3f39571228
[DirectX][DXIL] Distinguish return type for overload type resolution. (#85646)
The return type of a DXIL Op may differ from the valid overload type of its
parameters, if any. Such DXIL Ops are correctly represented in DXIL.td.
However, DXILEmitter assumes the return type to be the same as the parameter
overload type, if one exists. This results in the generation of an incorrect
overload index value in DXILOperation.inc for the DXIL Op and an incorrect
DXIL operation function call in the DXILOpLowering pass.

This change distinguishes return types correctly from parameter overload
types in DXILEmitter backend to handle such DXIL ops.

Add specification for DXIL Op `isinf` and corresponding tests to verify
the above change.

Fixes issue #85125
2024-03-20 14:48:16 -04:00
Craig Topper
891172d9be
[RISCV] Use 'riscv-isa' module flag to set ELF flags and attributes. (#85155)
Walk all the ISA strings and set the subtarget bits for any extension we
find in any string.

This allows LTO output to have ELF attributes from the union of all of
the files used to compile it.
2024-03-20 11:35:19 -07:00
Vyacheslav Levytskyy
c2483ed52d
[SPIRV] Add __spirv_ builtins for existing instructions (#85654)
This PR:
* adds __spirv_ builtins for existing instructions;
* fixes parsing of "syncscope" values in atomic instructions;
* fixes a special case of binary header emission.
2024-03-20 19:28:29 +01:00
Vyacheslav Levytskyy
949d70d5e0
[SPIR-V] Fix incorrect bitwise instructions applied to the bool type (#85929)
This PR ensures that LLVM IR bitwise instructions result in logical
SPIR-V instructions when applied to i1 type.
2024-03-20 19:23:12 +01:00
Jonas Paulsson
9ebd329ad8
Revert "Move assertion for AdjustsStack from PEI to MachineVerifier. (#85698)"
This reverts commit 05bde30585710a51592eee0a6cf6df8184d09c92.

Reverting due to verifier complaints with expensive checks on build-bot.
2024-03-20 11:48:30 -04:00
Craig Topper
576d81baa5
[RISCV] Use REG_SEQUENCE/EXTRACT_SUBREG to move between individual GPRs and GPRPair. (#85887)
Previously we used memory like we do to move between GPRs and FPR64 with
the D extension on RV32.

We can instead use REG_SEQUENCE/EXTRACT_SUBREG to inform register
allocation how to do the copy without memory.
2024-03-20 08:44:24 -07:00
Thomas Lively
767e0c8bce
[WebAssembly] Select BUILD_VECTOR with large unsigned lane values (#85880)
Previously we expected lane constants to be in the range of signed
values for each lane size, but the included test case produced large
unsigned values that fall outside that range. Allow instruction
selection to proceed in this case rather than failing.

Fixes #63817.
2024-03-20 08:42:42 -07:00
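
To illustrate the range issue described in 767e0c8bce above, the sketch below checks whether a lane constant fits a lane of a given width when read as signed or as unsigned. The helper names are hypothetical and this is not the actual instruction selection code; it only shows why a value such as 255 in an 8-bit lane lies outside the signed range yet is still representable.

```cpp
#include <cstdint>

// Hypothetical helpers for illustration only; `bits` is assumed to be 8, 16 or 32.

// True if v lies in the signed range [-2^(bits-1), 2^(bits-1) - 1].
constexpr bool fitsSigned(int64_t v, unsigned bits) {
  return v >= -(int64_t{1} << (bits - 1)) && v < (int64_t{1} << (bits - 1));
}

// True if v lies in the unsigned range [0, 2^bits - 1].
constexpr bool fitsUnsigned(int64_t v, unsigned bits) {
  return v >= 0 && v < (int64_t{1} << bits);
}

// Accept a lane constant when either interpretation fits.
constexpr bool laneConstantFits(int64_t v, unsigned bits) {
  return fitsSigned(v, bits) || fitsUnsigned(v, bits);
}
```

With these helpers, 255 is rejected by `fitsSigned(255, 8)` but accepted by `laneConstantFits(255, 8)`, while 256 is rejected by both.
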
Neumann Hon
5fb2797f23
[GOFF][z/OS] Change PrivateGlobalPrefix and PrivateLabelPrefix to be L# (#85730)
The current values for PrivateGlobalPrefix and PrivateLabelPrefix (@@
and @ respectively) are, in hindsight, poor choices for multiple
reasons:

First, there exist externally visible routines from the language
environment that begin with @@. These functions are certainly not
local/private by any means and they should not share a prefix with
private globals.

Secondly, both private globals and private labels should be handled the
same way by GOFF, so it doesn't make much sense for them to have
separate prefixes. GOFF remains the only file format where these are
different and there is no reason for that to be the case.
2024-03-20 10:30:30 -04:00
Jonas Paulsson
05bde30585
Move assertion for AdjustsStack from PEI to MachineVerifier. (#85698)
Have the verifier report a missing AdjustsStack flag rather than waiting until
PEI asserts.
2024-03-20 10:29:12 -04:00
Benjamin Kramer
5f5a64134b
Revert "[DAGCombiner] Simplifying {si|ui}tofp when only signbit is needed"
This reverts commit 353fbeb0a294d2c7cef6d88607fa0fd50ee81462. It crashes
when it encounters a UINT_TO_FP.

llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp:1618 in SDValue llvm::SelectionDAG::getConstant(const ConstantInt &, const SDLoc &, EVT, bool, bool): VT.isInteger() && "Cannot create FP integer constant!"
2024-03-20 15:08:37 +01:00
Pravin Jagtap
e52a687871
[AMDGPU][NFC] Test clean up (#85922)
Authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com>
2024-03-20 17:29:42 +05:30
Pravin Jagtap
070d1e8321
[AMDGPU] Add test for fpext & fptrunc with bf16. (#85909)
Authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com>
2024-03-20 14:45:38 +05:30
YunQiang Su
d7e28cd82b
MIPS: Support -m(no-)unaligned-access for r6 (#85174)
The MIPSr6 ISA requires normal load/store instructions to support
misaligned memory access, although hardware does not always do so.
On some microarchitectures, or in some corner cases, support from
the OS may be needed.

Do not confuse this with the pre-R6 lwl/lwr family: MIPSr6 doesn't
support them; instead, r6 requires the lw instruction to support
misaligned memory access. So, if -mstrict-align is used for
pre-R6, lwl/lwr won't be disabled.

If -mstrict-align is used for r6 and the access is not well
aligned, some lb/lh instructions will be used to replace lw.
This is useful for OS kernels.

For backward compatibility with GCC, -m(no-)unaligned-access is also
added as a Neg-Alias of -m(no-)strict-align.
2024-03-20 14:18:24 +08:00
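
As a rough picture of what replacing lw with lb-style loads means for d7e28cd82b above, the sketch below assembles a 32-bit word from four single-byte loads. The function name is hypothetical and little-endian byte order is an assumption made for the example (MIPS targets exist in both byte orders); the code the backend actually emits is not shown here.

```cpp
#include <cstdint>

// Assemble a 32-bit little-endian word from four single-byte loads.
// Byte loads have no alignment requirement, so `p` may be any address.
constexpr uint32_t loadWordByBytes(const unsigned char *p) {
  return static_cast<uint32_t>(p[0]) |
         (static_cast<uint32_t>(p[1]) << 8) |
         (static_cast<uint32_t>(p[2]) << 16) |
         (static_cast<uint32_t>(p[3]) << 24);
}
```

The cost is four loads plus shifts and ors instead of one load, which is why this form is only chosen when strict alignment is requested.
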
Peter Rong
4a026b5092
[AMDGCN] Use ZExt when handling indices in insertelement (#85718)
When i1 true is used as an index, SExt extends it to i32 -1. This would
cause BitVector to overflow.
The language reference specifies that the index shall be treated as an
unsigned number; this patch makes the code do so.
(https://llvm.org/docs/LangRef.html#insertelement-instruction)

This patch fixes #85717

---------

Signed-off-by: Peter Rong <PeterRong96@gmail.com>
2024-03-19 21:44:08 -07:00
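
The difference between the two extensions in 4a026b5092 above can be reproduced with plain integer arithmetic. The sketch below uses hypothetical helper names and models the extensions on 64-bit integers; it is not the code from the patch.

```cpp
#include <cstdint>

// Zero-extend the low `bits` bits of `value` (1 <= bits <= 63).
constexpr uint64_t zeroExtend(uint64_t value, unsigned bits) {
  return value & ((uint64_t{1} << bits) - 1);
}

// Sign-extend the low `bits` bits of `value` (1 <= bits <= 63).
constexpr int64_t signExtend(uint64_t value, unsigned bits) {
  const uint64_t signBit = uint64_t{1} << (bits - 1);
  const uint64_t low = zeroExtend(value, bits);
  // Standard xor/subtract trick: flip the sign bit, then subtract it.
  return static_cast<int64_t>(low ^ signBit) - static_cast<int64_t>(signBit);
}
```

For a 1-bit `true`, `signExtend(1, 1)` yields -1, which read as an unsigned 32-bit index is 4294967295, whereas `zeroExtend(1, 1)` yields the intended index 1.
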
Jiahan Xie
4bf06bebb9
[GISEL][RISCV] IRTranslator for scalable vector load (#80006)
Add IRTranslator support for the scalable vector load instruction and include
corresponding tests with an alignment argument, which can be smaller than,
equal to, or larger than the element size, or smaller than, equal to, or
larger than the minimum total vector size.
2024-03-19 20:12:26 -04:00
Alex MacLean
888e284903
[NVPTX] Use PTX prmt for llvm.bswap (#85545)
2024-03-19 15:18:53 -07:00
Noah Goldstein
353fbeb0a2
[DAGCombiner] Simplifying {si|ui}tofp when only signbit is needed
If we only need the sign bit, `uitofp` simplifies to 0 and `sitofp`
simplifies to `bitcast`.

Closes #85138
2024-03-19 17:17:35 -05:00
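
The two facts behind 353fbeb0a2 above can be checked outside the compiler: converting an unsigned integer to floating point never sets the sign bit, and converting a signed integer sets it exactly when the input is negative. The sketch below uses hypothetical function names and ordinary C++ conversions as a stand-in for the `uitofp`/`sitofp` nodes; it is not the DAG combine itself.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Sign bit of the float produced by a signed int-to-float conversion (sitofp).
inline bool signBitAfterSIToFP(int32_t x) {
  return std::signbit(static_cast<float>(x));
}

// Sign bit of the float produced by an unsigned int-to-float conversion (uitofp).
inline bool signBitAfterUIToFP(uint32_t x) {
  return std::signbit(static_cast<float>(x));
}

// Spot-check both claims on a few boundary values.
inline void checkSignBitClaims() {
  const int32_t samples[] = {INT32_MIN, -1, 0, 1, INT32_MAX};
  for (int32_t x : samples) {
    assert(signBitAfterSIToFP(x) == (x < 0));
    assert(!signBitAfterUIToFP(static_cast<uint32_t>(x)));
  }
}
```
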
Noah Goldstein
ebd1379663
[DAGCombiner] Add tests for simplifying {si|ui}tofp; NFC
2024-03-19 17:17:35 -05:00
quic-areg
31f4b329c8
[Hexagon] ELF attributes for Hexagon (#85359)
Defines a subset of attributes and emits them to a section called
.hexagon.attributes.

The current attributes recorded are the attributes needed by
llvm-objdump to automatically determine target features and eliminate
the need to manually pass features.
2024-03-19 16:22:30 -05:00
Simon Pilgrim
2377b9773d
[DAG] SimplifyShift - shift i1/vXi1 X, Y --> X (any non-zero shift amount is undefined).
Alive2: https://alive2.llvm.org/ce/z/SdESbg

Fixes #85681
2024-03-19 20:18:37 +00:00
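
The reasoning behind 2377b9773d above is that a shift by an amount greater than or equal to the bit width has no defined result (the LLVM language reference specifies a poison value), so for a 1-bit value the only meaningful shift amount is 0. A small model, with a hypothetical function name and `std::nullopt` standing in for poison:

```cpp
#include <cstdint>
#include <optional>

// Model of `shl iN x, amount` for N = bits (1 <= bits <= 63).
// std::nullopt stands in for a poison result.
constexpr std::optional<uint64_t> shlModel(uint64_t x, uint64_t amount,
                                           unsigned bits) {
  if (amount >= bits)
    return std::nullopt;
  const uint64_t mask = (uint64_t{1} << bits) - 1;
  return ((x & mask) << amount) & mask;
}
```

For `bits == 1`, every call either returns `x` unchanged (amount 0) or has no defined result, which is what allows the shift to be folded to `X`.
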
Changpeng Fang
ab76052fa9
AMDGPU: Treat SWMMAC the same as MFMA and other WMMA for sched_barrier (#85721)
2024-03-19 09:58:09 -07:00
Luke Lau
e59f120e3a
[RISCV] Add test for strided load combine regression. NFC
This adds a reduced test case for the regression seen in x264 with #83035.
If the intermediate concatenating shuffles are large enough, then the
splitting combine will prevent the strided load combine, which is
preferable.
2024-03-20 00:38:23 +08:00
Farzon Lotfi
081a66ffac
[DXIL] implement dot intrinsic lowering for integers (#85662)
This implements part 1 of 2 for #83626:
- `CGBuiltin.cpp` - modified to have separate cases for signed and
unsigned integers.
- `SemaChecking.cpp` - modified to prevent the generation of a double
dot product intrinsic if the builtin were to be called directly.
- `IntrinsicsDirectX.td` - created the signed and unsigned dot
intrinsics needed for instruction expansion.
- `DXILIntrinsicExpansion.cpp` - handle instruction expansion cases for
integer dot product.
2024-03-19 12:03:43 -04:00
Simon Pilgrim
66125ad8e9
[X86] Add test coverage for vector avgceils/avgceilu/avgfloors/avgflooru test patterns
SSE only supports AVGCEILU for vXi8/vXi16, but for other types we should try to use the fixed-width expansion instead of extensions.
2024-03-19 15:32:40 +00:00
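
For readers unfamiliar with the term, a fixed-width expansion computes the average without first extending the operands to a wider type. The sketch below shows the well-known bit identities for the unsigned floor and ceiling averages next to the extending versions. The function names are hypothetical, and these are not necessarily the exact patterns the tests or the DAG combines use.

```cpp
#include <cstdint>

// C++ promotes uint8_t operands to int, so the point of the first two
// functions is that every intermediate value already fits in 8 bits.

// Rounded-down unsigned average without widening: floor((a + b) / 2).
constexpr uint8_t avgFloorU(uint8_t a, uint8_t b) {
  return static_cast<uint8_t>((a & b) + ((a ^ b) >> 1));
}

// Rounded-up unsigned average without widening: ceil((a + b) / 2).
constexpr uint8_t avgCeilU(uint8_t a, uint8_t b) {
  return static_cast<uint8_t>((a | b) - ((a ^ b) >> 1));
}

// Reference versions that extend to 16 bits before adding.
constexpr uint8_t avgFloorUWide(uint8_t a, uint8_t b) {
  return static_cast<uint8_t>((static_cast<uint16_t>(a) + b) >> 1);
}
constexpr uint8_t avgCeilUWide(uint8_t a, uint8_t b) {
  return static_cast<uint8_t>((static_cast<uint16_t>(a) + b + 1) >> 1);
}
```
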
ostannard
ef395a492a
[AArch64] Add soft-float ABI (#84146)
This is a re-working of #74460, which adds a soft-float ABI for AArch64.
That was reverted because it caused errors when building the Linux and
Fuchsia kernels.

The problem is that GCC's implementation of the ABI compatibility checks
when using the hard-float ABI on a target without FP registers does its
checks after optimisation. The previous version of this patch reported
errors for all uses of floating-point types, which is stricter than what
GCC does in practice.

This changes two things compared to the first version:
* Only check the types of function arguments and returns, not the types
of other values. This is more relaxed than GCC, while still guaranteeing
ABI compatibility.
* Move the check from Sema to CodeGen, so that inline functions are only
checked if they are actually used. There are some cases in the linux
kernel which depend on this behaviour of GCC.
2024-03-19 13:58:51 +00:00
Ulrich Weigand
335f365982
Reapply: [SystemZ] Fix overflow flag for i128 USUBO
We use the VSCBIQ/VSBIQ/VSBCBIQ family of instructions to implement
USUBO/USUBO_CARRY for the i128 data type.  However, these instructions
use an inverted sense of the borrow indication flag (a value of 1
indicates *no* borrow, while a value of 0 indicates borrow).  This
does not match the semantics of the boolean "overflow" flag of the
USUBO/USUBO_CARRY ISD nodes.

Fix this by generating code to explicitly invert the flag.  These
inversions cancel out if the result of a USUBO feeds into a USUBO_CARRY.

To avoid unnecessary zero-extend operations, also improve the
DAGCombine handling of ZERO_EXTEND to optimize (zext (xor (trunc)))
sequences where appropriate.

Fixes: https://github.com/llvm/llvm-project/issues/83268
2024-03-19 14:07:08 +01:00
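
The flag mismatch described in 335f365982 above can be modelled with ordinary integers. The sketch below uses 64-bit values instead of i128 and hypothetical function names; it only illustrates the two flag conventions and the inversion between them, not the SystemZ lowering code.

```cpp
#include <cstdint>

struct SubResult {
  uint64_t diff;
  bool flag;
};

// Hardware-style subtract: flag == true means NO borrow occurred.
constexpr SubResult subWithBorrowIndication(uint64_t a, uint64_t b) {
  return {a - b, a >= b};
}

// USUBO-style subtract: flag == true means the subtraction overflowed
// (a borrow occurred), so the hardware flag has to be inverted.
constexpr SubResult usubo(uint64_t a, uint64_t b) {
  const SubResult hw = subWithBorrowIndication(a, b);
  return {hw.diff, !hw.flag};
}
```

When the overflow flag is passed on to a following subtract-with-borrow step, it is inverted once more to get back to the hardware convention, which is why the two inversions cancel.
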
Shourya Goel
92764c99e9
[DAG] Matched Fixedwidth Pattern for ISD::AVGCEILU (#85031)
Fixes: #84753
2024-03-19 13:02:37 +00:00
Luke Lau
ef520ca6b1
Revert "[RISCV] Recursively split concat_vector into smaller LMULs (#83035)"
This reverts commit c59129a7c79448837d665de8f2743ad4b14666f6.

This causes regressions in some x264 workloads like pixel_var_8x8 due to it
interfering with the strided load combine. Reverting so I can try to rework
it as a lowering instead.
2024-03-19 20:59:03 +08:00
Simon Pilgrim
9f433bf8ca
[X86] Add PAVG(0,x) test coverage for PR #85581
2024-03-19 12:39:47 +00:00
Pravin Jagtap
08701e35ed
[AMDGPU][NFC] Test clean up. (#85775)
Added a common check for the DPP and Iterative strategies in the uniform
value case, since the optimization applied is the same.

Authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com>
2024-03-19 18:00:34 +05:30
Pierre van Houtryve
953c13b5c9
[AMDGPU][PromoteAlloca] Whole-function alloca promotion to vector (#84735)
Update PromoteAllocaToVector so it considers the whole function before promoting allocas.
Allocas are scored & sorted so the highest value ones are seen first. The budget is now per function instead of per alloca.

Passed internal performance testing.
2024-03-19 11:49:22 +01:00
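
The score, sort, and per-function budget scheme described in 953c13b5c9 above can be sketched as follows. Everything here is an assumption made for illustration: the `AllocaCandidate` record, its `score` and `cost` fields, and the choice to skip a candidate that no longer fits rather than stop. The real pass may differ on all of these.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical candidate record: `score` ranks how valuable promotion is,
// `cost` is how much of the per-function budget it would consume.
struct AllocaCandidate {
  unsigned score;
  unsigned cost;
};

// Visit candidates from highest to lowest score and keep each one that
// still fits in the remaining function-wide budget.
inline std::vector<AllocaCandidate>
selectForPromotion(std::vector<AllocaCandidate> candidates, unsigned budget) {
  std::stable_sort(candidates.begin(), candidates.end(),
                   [](const AllocaCandidate &a, const AllocaCandidate &b) {
                     return a.score > b.score;
                   });
  std::vector<AllocaCandidate> chosen;
  for (const AllocaCandidate &c : candidates) {
    if (c.cost > budget)
      continue; // assumption: skip what no longer fits and keep looking
    budget -= c.cost;
    chosen.push_back(c);
  }
  return chosen;
}

// Usage: with a budget of 8 and three candidates of cost 4, the two
// highest-scoring candidates are chosen.
inline void exampleSelection() {
  const std::vector<AllocaCandidate> chosen =
      selectForPromotion({{1, 4}, {5, 4}, {3, 4}}, 8);
  assert(chosen.size() == 2);
  assert(chosen[0].score == 5 && chosen[1].score == 3);
}
```
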
Ulrich Weigand
d1c3795968
Revert "Fix overflow flag for i128 USUBO"
This reverts commit d9c31ee9568277e4303715736b40925e41503596.
2024-03-19 11:43:05 +01:00
Ulrich Weigand
d9c31ee956
Fix overflow flag for i128 USUBO
We use the VSCBIQ/VSBIQ/VSBCBIQ family of instructions to implement
USUBO/USUBO_CARRY for the i128 data type.  However, these instructions
use an inverted sense of the borrow indication flag (a value of 1
indicates *no* borrow, while a value of 0 indicates borrow).  This
does not match the semantics of the boolean "overflow" flag of the
USUBO/USUBO_CARRY ISD nodes.

Fix this by generating code to explicitly invert the flag.  These
inversions cancel out if the result of a USUBO feeds into a USUBO_CARRY.

To avoid unnecessary zero-extend operations, also improve the
DAGCombine handling of ZERO_EXTEND to optimize (zext (xor (trunc)))
sequences where appropriate.

Fixes: https://github.com/llvm/llvm-project/issues/83268
2024-03-19 11:20:52 +01:00
Shourya Goel
703920d413
[DAG] Matched FixedWidth pattern for ISD::AVGFLOORU (#84903)
Fixes: #84749
2024-03-19 08:29:55 +00:00
Adrian Kuegel
f0a5e50550
[llvm][NVPTX] Add missing feature guard.
2024-03-19 06:53:14 +00:00
Jonas Paulsson
8b8e1adbde
[SystemZ] Don't lower ATOMIC_LOAD/STORE to LOAD/STORE (#75879)
- Instead of lowering float/double ISD::ATOMIC_LOAD / ISD::ATOMIC_STORE
nodes to regular LOAD/STORE nodes, make them legal and select those nodes
properly instead. This avoids exposing them to the DAGCombiner.

- The AtomicExpand pass no longer casts float/double atomic loads/stores to
  integer (FP128 is still cast).
2024-03-18 17:21:50 -04:00
David Green
9a784303a3
[AArch64][GlobalISel] Legalize small G_TRUNC (#85625)
This is an alternative to #85610 that uses moreElements on small G_TRUNC
vectors to widen them. It needs to disable one of the existing
Unmerge(Trunc(..)) combines, and some of the code is not as optimal as
it could be. I believe with some extra optimizations it could look
better (I was thinking combining trunc(buildvector) -> buildvector and
possibly improving buildvector lowering by generating
insert_vector_element earlier).
2024-03-18 10:04:31 -07:00
Jonas Paulsson
09bc6abba6
[MachineFrameInfo] Refactoring around computeMaxCallFrameSize() (NFC) (#78001)
- Use computeMaxCallFrameSize() in PEI::calculateCallFrameInfo() instead of duplicating the code.

- Set AdjustsStack in FinalizeISel instead of in computeMaxCallFrameSize().
2024-03-18 10:37:59 -04:00
Qiu Chaofan
e5b20c83e5
[PowerPC] Update chain uses when emitting lxsizx (#84892)
2024-03-18 22:31:05 +08:00
Vyacheslav Levytskyy
59f34e8c2b
[SPIRV] Add Lifetime intrinsics/instructions (#85391)
This PR:
* adds Lifetime intrinsics/instructions
* fixes how the binary header is emitted (correct version and better
approximation of Bound)
* adds validation to more test cases
2024-03-18 11:42:44 +01:00
Yingwei Zheng
38a44bdc93
[CodeGenPrepare] Reverse the canonicalization of isInf/isNanOrInf (#81572)
In commit 2b582440c1, we canonicalized the isInf/isNanOrInf idiom into
fabs+fcmp for better analysis/codegen (see also the discussion in
https://github.com/llvm/llvm-project/pull/76338).

This patch reverses the fabs+fcmp back to `is.fpclass`. If `is.fpclass`
is not supported by the target, it will be expanded by TLI.

Fixes the regression introduced by 2b582440c1 and
https://github.com/llvm/llvm-project/pull/80414#issuecomment-1936374206.
2024-03-18 18:27:45 +08:00
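
The fabs+fcmp idioms mentioned in 38a44bdc93 above have direct C++ counterparts, which makes the equivalence with a floating-point class test easy to see. The sketch below uses hypothetical function names and standard library classification functions as a stand-in for `is.fpclass`; it assumes default IEEE semantics (no fast-math flags) and is not the CodeGenPrepare code.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// isInf idiom: |x| compares equal to +infinity (false for NaN).
inline bool isInfViaFabs(double x) {
  return std::fabs(x) == std::numeric_limits<double>::infinity();
}

// isNanOrInf idiom: |x| is not strictly below +infinity (true for NaN too).
inline bool isNanOrInfViaFabs(double x) {
  return !(std::fabs(x) < std::numeric_limits<double>::infinity());
}

// The idioms agree with the standard classification functions.
inline void checkIdioms() {
  const double inf = std::numeric_limits<double>::infinity();
  const double nan = std::numeric_limits<double>::quiet_NaN();
  const double samples[] = {0.0, -1.5, inf, -inf, nan};
  for (double x : samples) {
    assert(isInfViaFabs(x) == std::isinf(x));
    assert(isNanOrInfViaFabs(x) == !std::isfinite(x));
  }
}
```
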
pvanhout
3493438605
Revert "[AMDGPU] Run LowerLDS at the end of the fullLTO pipeline (#75333)"
This reverts commit 9b98692eedb78aa106539c36ba02944f32cae1ff.
2024-03-18 11:18:57 +01:00
Sander de Smalen
7fad304a03
[AArch64][SME] Make coalescer barrier available without +sme. (#85311)
For each call that changes the streaming mode, ISel inserts a
COALESCER_BARRIER node for the FP and (non-scalable) vector arguments to
the callee.

When calling a non-streaming function from a streaming-compatible
function, it's not required to have +sme (in case the SME code-path is
not actually executed at runtime). The patterns to match the
COALESCER_BARRIER however were still predicated with `HasSME`, which is
incorrect. This patch tries to fix that.
2024-03-18 09:43:03 +00:00
Pierre van Houtryve
9b98692eed
[AMDGPU] Run LowerLDS at the end of the fullLTO pipeline (#75333)
This change allows us to use `--lto-partitions` in some cases (though it is
not at all guaranteed to work perfectly), as LDS is lowered before the module
is split for parallel codegen.

We must run LowerLDS before splitting modules as it needs to see all
callers of functions with LDS to properly lower them.
2024-03-18 09:09:43 +01:00
Qiu Chaofan
65ae09eeb6
[PowerPC] Fix behavior of rldimi/rlwimi/rlwnm builtins (#85040)
rldimi is a 64-bit instruction, so the corresponding builtin should not
be available in 32-bit mode. The rotate amount should be in range, and
cases where the mask is zero need special handling.

This change also swaps the first and second operands of rldimi/rlwimi
to match previous behavior. For masks not ending at bit 63-SH, a
rotation will be inserted before rldimi.
2024-03-18 14:17:16 +08:00
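
The remark in 65ae09eeb6 above about masks ending at bit 63-SH follows from how the instruction builds its mask. The sketch below is a minimal model of rldimi as described in the Power ISA (rotate left, then insert under a mask running from bit MB to bit 63-SH, with bit 0 being the most significant bit). The function names are hypothetical, only the non-wrapping mask case is handled, and the builtin's lowering is not shown.

```cpp
#include <cstdint>

// Rotate a 64-bit value left by n bits (n taken modulo 64).
constexpr uint64_t rotl64(uint64_t v, unsigned n) {
  n &= 63;
  return n == 0 ? v : (v << n) | (v >> (64 - n));
}

// Mask with ones from bit mb through bit me in IBM numbering, where bit 0
// is the most significant bit. Only the non-wrapping case mb <= me <= 63.
constexpr uint64_t maskIBM(unsigned mb, unsigned me) {
  return (~uint64_t{0} >> mb) & (~uint64_t{0} << (63 - me));
}

// Model of rldimi: rotate rs left by sh, then insert it into ra under a
// mask that runs from bit mb to bit 63 - sh. Assumes mb <= 63 - sh.
constexpr uint64_t rldimiModel(uint64_t ra, uint64_t rs, unsigned sh,
                               unsigned mb) {
  const uint64_t m = maskIBM(mb, 63 - sh);
  return (rotl64(rs, sh) & m) | (ra & ~m);
}
```

Because the end of the mask is tied to the shift amount, a requested mask that ends elsewhere cannot be expressed by the instruction alone, which is consistent with the extra rotation the commit message mentions.
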
Sameer Sahasrabuddhe
ec34699f75
[GlobalISel] convergence control tokens and intrinsics (#67006)
[GlobalISel] Implement convergence control tokens and intrinsics in GMIR

In the IR translator, convert the LLVM token type to LLT::token(), which is an
alias for the s0 type. These show up as implicit uses on convergent operations.

Differential Revision: https://reviews.llvm.org/D158147
2024-03-18 10:34:11 +05:30