52796 Commits

Author SHA1 Message Date
Simon Pilgrim
a277dd82d8 [X86] vector-half-conversions.ll - add v4f16->v4i32 fptosi/fptoui test coverage 2024-03-22 13:54:42 +00:00
Simon Pilgrim
74c3150ffc [X86] Add shuffle tests from Issue #86076
SLP should be doing a better job, but both shuffles lower to poorer codegen than necessary
2024-03-22 11:23:36 +00:00
Farzon Lotfi
79c32eb03d
[DXIL] Add lowerings for cosine and floor (#86173)
Completes #86170
Completes #86172
- `DXIL.td` - Add changes to lower the cosine and floor intrinsics to
dxilOps.
2024-03-22 07:02:47 -04:00
Farzon Lotfi
d8e5c0b4e5
[DXIL] Complete abs lowering (#86158)
This change completes #86155
- `DXIL.td` - lowering `fabs` intrinsic to the float dxil op.
- `DXILIntrinsicExpansion.cpp` - Add intrinsic expansion for the abs
case.
2024-03-22 07:01:01 -04:00
XChy
cb4453dc69
[SelectionDAG] Prevent combination on inconsistent type in combineCarryDiamond (#84888)
Fixes #84831
When matching carry pattern with `getAsCarry`, it may produce different
type of carryout. This patch checks such case and does early exit.

I'm new to DAG, any suggestion is appreciated.
2024-03-22 16:05:20 +05:30
David Green
99d8c25b31 [AArch64] Extra tests for v2i8 concat loads. NFC 2024-03-22 09:55:18 +00:00
Chen Zheng
90454a6098
[PowerPC][AIX] support explicit sections for -ffunction-sections (#85351)
Fix crashes in https://godbolt.org/z/6voEa1o6Y
2024-03-22 13:23:36 +08:00
Pravin Jagtap
e1a8120a63
[AMDGPU] Support double type in atomic optimizer. (#84307)
Presently the atomic optimizer supports only 32-bit operations. Plan is
to extend the atomic optimizer for 64-bit operations for compute and
graphics. This patch extends support for double type for `uniform
values` only. Going forward, will extend the support for divergent
values. Adding support for divergent values requires
extending/legalizing readfirstlane, readlane, writelane, etc ops for
64-bit operations to avoid `bitcast` noise that we have currently.

---------

Authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com>
2024-03-22 09:25:06 +05:30
Craig Topper
c67ed2f1e1
[SelectionDAG][RISCV] Use TypeSize version of ComputeValueVTs in TargetLowering::LowerCallTo. (#86166)
This is needed to support non-intrinsic functions returning tuple types
which are represented as structs with scalable vector types in IR.

I suspect this may have been broken since
https://reviews.llvm.org/D158115
2024-03-21 20:35:08 -07:00
Freddy Ye
3e4caa9da4
[X86] Support DomainReassignment for APX NDD instructions (#85737) 2024-03-22 08:52:40 +08:00
paperchalice
a2dfc9ac7d
[NewPM][AMDGPU] Add AMDGPUPassRegistry.def (#86095)
Move the pass registry to a separate file, prepare for porting dag-isel.
2024-03-22 08:49:29 +08:00
Jonas Paulsson
7564566779 Reapply "Move assertion for AdjustsStack from PEI to MachineVerifier (#85698)"
- The check is now actually done in both PEI and the MachineVerifier.
- More .mir tests trivially updated with "adjustsStack: true" as needed.
2024-03-21 20:24:57 -04:00
Luke Lau
51d5b65819
[RISCV] Handle scalable ops with < EEW / 2 narrow types in combineBinOp_VLToVWBinOp_VL (#84158)
We can remove the restriction that the narrow type needs to be exactly
EEW / 2 for scalable ISD::{ADD,SUB,MUL} nodes. This allows us to perform
the combine even if we can't fully fold the extend into the widening op.

VP intrinsics already do this, since they are lowered to _VL nodes which
don't have this restriction.

The "exactly EEW / 2" narrow type restriction prevented us from emitting
V{S,Z}EXT_VL nodes with i1 element types which crash when we try to
select them, since no other legal type is double the size of i1, see the
test case added in this PR `i1_zext`. So to preserve this, this adds a
check for i1 narrow types instead.
2024-03-22 07:26:29 +08:00
Luke Lau
06d245242e
[RISCV] Recursively split concat_vector into smaller LMULs when lowering (#85825)
This is a reimplementation of the combine added in #83035 but as a
lowering instead of a combine, so we don't regress the test case added
in e59f120e3a14ccdc55fcb7be996efaa768daabe0 by interfering with the
strided load combine

Previously the combine had to concatenate the split vectors with
insert_subvector instead of concat_vectors to prevent an infinite
combine loop. And the reasoning behind keeping it as a combine was
because if we emitted the insert_subvector during lowering then we
didn't fold away inserts of undef subvectors.

However it turns out we can avoid this if we just do this in lowering
and select a concat_vector directly, since we get the undef folding for
free with `DAG.getNode(ISD::CONCAT_VECTOR, ...)` via foldCONCAT_VECTORS.
2024-03-22 07:08:51 +08:00
Simon Pilgrim
3218570620 [X86] Add shuffle test case for Issue #86068 2024-03-21 17:56:58 +00:00
Craig Topper
f5c90f3000
[RISCV] Use BuildPairF64 and SplitF64 for bitcast i64<->f64 on rv32 regardless of Zfa. (#85982)
Previously we used BuildPairF64 and SplitF64 only if Zfa was supported
since they will select register file moves that are only available with
Zfa.

We recently changed the handling of BuildPairF64/SplitF64 for Zdinx to
not go through memory so we should use that for bitcast.

That leaves the D without Zfa case that does need to go through memory.
Previously we let type legalization expand to loads and stores using a
new stack temporary created for each bitcast. After this patch we will
create the loads ands stores in the custom inserter and share the same
stack slot for all. This also allows DAGCombiner to optimize when
bitcast is mixed with BuildPairF64/SplitF64.
2024-03-21 08:52:51 -07:00
Craig Topper
7678e6e562
[RISCV] Lower the alignment requirement for a GPR pair spill for Zdinx on RV32. (#85871)
I believe we can use XLen alignment as long as eliminateFrameIndex
limits the maximum folded offset to 2043. This way when we split
the load/store into two 2 instructions we'll be able to add 4
without overflowing simm12.
2024-03-21 08:14:48 -07:00
Jonas Paulsson
b4b5e8277a
Check for all frame instructions in finalize isel. (#85945)
Check for all frame instructions in finalize isel, not just for the
frame setup opcode. This was proven necessary, see #78001 
for discussion.
2024-03-21 11:00:08 -04:00
David Green
686f4599cf [ARM] Regenerate some check lines. NFC 2024-03-21 13:45:44 +00:00
SahilPatidar
3ac243bc0d
Update amdgpu_gfx functions to use s0-s3 for inreg SGPR arguments on targets using scratch instructions for stack #78226 (#81394)
Resolve #78226
2024-03-21 16:52:08 +05:30
AtariDreams
7e72cafd68
[SelectionDAG] Add MaskedValueIsZero check to allow folding of zero extended variables we know are safe to extend (#85573)
Add ones for every high bit that will cleared.

This will allow us to evaluate variables that have their bits known to
see if they have no risk of overflow despite the shift amount being
greater than the difference between the two types.
2024-03-21 16:45:17 +05:30
Pierre van Houtryve
95a834a16c
(Reland) [AMDGPU] Run LowerLDS at the end of the fullLTO pipeline (#85626)
Reland of #75333
2024-03-21 11:44:47 +01:00
Pierre van Houtryve
ccb3a8feaa
[AMDGPU][LowerModuleLDS] Refactor partially lowered module detection (#85793)
Refactor the logic that checks if a module contains mixed
absolute/non-lowered LDS GVs.

The check now happens latter when the "worklists" are formed. This is
because in some cases (OpenMP) we can have non-lowered GVs in a lowered
module, and this is normal because those GVs are just unused and removed
from the list at some point before the end of `getUsesOfLDSByFunction`.

Doing the check later ensures that if a mixed module is spotted, then
it's a _real_ mixed module that needs rejection, not a module containing
an intentionally ignored GV.
2024-03-21 11:28:35 +01:00
Matt Arsenault
b6b703b2df
AMDGPU: Infer no-agpr usage in AMDGPUAttributor (#85948)
SIMachineFunctionInfo has a scan  of the function body for inline asm
which may use AGPRs, or callees in SIMachineFunctionInfo. Move this
into the attributor, so it actually works interprocedurally.
    
Could probably avoid most of the test churn if this bothered to avoid
adding this on subtargets without AGPRs. We should also probably
try to delete the MIR scan in usesAGPRs but it seems to be trickier
to eliminate.
2024-03-21 14:24:06 +05:30
Madhur Amilkanthwar
7bb87d5338
[AArch64][GlobalISel] Take abs scalar codegen closer to SDAG (#84886)
This patch improves codegen for scalar (<128bits) version
of llvm.abs intrinsic by using the existing non-XOR based lowering.
This takes the generated code closer to SDAG.

codegen with GISel for > 128 bit types is not very good
with these method so not doing so.
2024-03-21 09:54:03 +05:30
Thorsten Schütt
deefe3fbc9
[GlobalIsel] Post-review combine ADDO (#85961)
https://github.com/llvm/llvm-project/pull/82927
2024-03-21 03:56:40 +01:00
Freddy Ye
07a5e31cb3
Move pre-commit test for #85737 (#86062) 2024-03-21 10:55:26 +08:00
Freddy Ye
35a66f965c
Precommit test for #85737 (#86056)
Copied from llvm/test/CodeGen/X86/domain-reassignment.mir
2024-03-21 10:19:28 +08:00
Paul Kirth
f6f474c4ef
[llvm][lld] Pre-commit tests for RISCV TLSDESC symbols
Currently, we mistakenly mark the local labels used in RISC-V TLSDESC as
TLS symbols, when they should not be. This patch adds tests with the
current incorrect behavior, and subsequent patches will address the
issue.

Reviewers: MaskRay, topperc

Reviewed By: MaskRay

Pull Request: https://github.com/llvm/llvm-project/pull/85816
2024-03-20 13:39:39 -07:00
S. Bharadwaj Yadavalli
3f39571228
[DirectX][DXIL] Distinguish return type for overload type resolution. (#85646)
Return type of DXIL Ops may be different from valid overload type of the
parameters, if any. Such DXIL Ops are correctly represented in DXIL.td.
However, DXILEmitter assumes the return type to be the same as parameter
overload type, if one exists. This results in generation in incorrect
overload index value in DXILOperation.inc for the DXIL Op and incorrect
DXIL operation function call in DXILOpLowering pass.

This change distinguishes return types correctly from parameter overload
types in DXILEmitter backend to handle such DXIL ops.

Add specification for DXIL Op `isinf` and corresponding tests to verify
the above change.

Fixes issue #85125
2024-03-20 14:48:16 -04:00
Craig Topper
891172d9be
[RISCV] Use 'riscv-isa' module flag to set ELF flags and attributes. (#85155)
Walk all the ISA strings and set the subtarget bits for any extension we
find in any string.

This allows LTO output to have a ELF attributes from the union of all of
the files used to compile it.
2024-03-20 11:35:19 -07:00
Vyacheslav Levytskyy
c2483ed52d
[SPIRV] Add __spirv_ builtins for existing instructions (#85654)
This PR:
* adds __spirv_ builtins for existing instructions;
* fixes parsing of "syncscope" values in atomic instructions;
* fix a special case of binary header emision.
2024-03-20 19:28:29 +01:00
Vyacheslav Levytskyy
949d70d5e0
[SPIR-V] Fix incorrect bitwise instructions applied to the bool type (#85929)
This PR ensures that LLVM IR bitwise instructions result in logical
SPIR-V instructions when applied to i1 type.
2024-03-20 19:23:12 +01:00
Jonas Paulsson
9ebd329ad8 Revert "Move assertion for AdjustsStack from PEI to MachineVerifier. (#85698)"
This reverts commit 05bde30585710a51592eee0a6cf6df8184d09c92.

Reverting due to verifier complaints with expensive checks on build-bot.
2024-03-20 11:48:30 -04:00
Craig Topper
576d81baa5
[RISCV] Use REG_SEQUENCE/EXTRACT_SUBREG to move between individual GPRs and GPRPair. (#85887)
Previously we used memory like we do to move between GPRs and FPR64 with
the D extension on RV32.

We can instead use REG_SEQUENCE/EXTRACT_SUBREG to inform register
allocation how to do the copy without memory.
2024-03-20 08:44:24 -07:00
Thomas Lively
767e0c8bce
[WebAssembly] Select BUILD_VECTOR with large unsigned lane values (#85880)
Previously we expected lane constants to be in the range of signed
values for each lane size, but the included test case produced large
unsigned values that fall outside that range. Allow instruction
selection to proceed in this case rather than failing.

Fixes #63817.
2024-03-20 08:42:42 -07:00
Neumann Hon
5fb2797f23
[GOFF][z/OS] Change PrivateGlobalPrefix and PrivateLabelPrefix to be L# (#85730)
The current values for PrivateGlobalPrefix and PrivateLabelPrefix (@@
and @ respectively) are, in hindsight, poor choices for multiple
reasons:

First, there exist externally visible routines from the language
environment that begin with @@. These functions are certainly not
local/private by any means and they should not share a prefix with
private globals.

Secondly, both private globals and private labels should be handled the
same way by GOFF, so it doesn't make much sense for them to have
separate prefixes. GOFF remains the only file format where these are
different and there is no reason for that to be the case
2024-03-20 10:30:30 -04:00
Jonas Paulsson
05bde30585
Move assertion for AdjustsStack from PEI to MachineVerifier. (#85698)
Have the verifier report a missing AdjustsStack flag rather than waiting until
PEI asserts.
2024-03-20 10:29:12 -04:00
Benjamin Kramer
5f5a64134b Revert "[DAGCombiner] Simplifying {si|ui}tofp when only signbit is needed"
This reverts commit 353fbeb0a294d2c7cef6d88607fa0fd50ee81462. It crashes
when it encounters an UINT_TO_FP.

llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp:1618 in SDValue llvm::SelectionDAG::getConstant(const ConstantInt &, const SDLoc &, EVT, bool, bool): VT.isInteger() && "Cannot create FP integer constant!"
2024-03-20 15:08:37 +01:00
Pravin Jagtap
e52a687871
[AMDGPU][NFC] Test clean up (#85922)
Authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com>
2024-03-20 17:29:42 +05:30
Pravin Jagtap
070d1e8321
[AMDGPU] Add test for fpext & fptrunc with bf16. (#85909)
Authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com>
2024-03-20 14:45:38 +05:30
YunQiang Su
d7e28cd82b
MIPS: Support -m(no-)unaligned-access for r6 (#85174)
MIPSr6 ISA requires normal load/store instructions support
misunaligned memory access, while it is not always do so
by hardware. On some microarchitectures or some corner cases
it may need support by OS.

Don't confuse with pre-R6's lwl/lwr famlily: MIPSr6 doesn't
support them, instead, r6 requires lw instruction support
misunaligned memory access. So, if -mstrict-align is used for
pre-R6, lwl/lwr won't be disabled.

If -mstrict-align is used for r6 and the access is not well
aligned, some lb/lh instructions will be used to replace lw.
This is useful for OS kernels.

To be back-compatible with GCC, -m(no-)unaligned-access are also
added as Neg-Alias of -m(no-)strict-align.
2024-03-20 14:18:24 +08:00
Peter Rong
4a026b5092
[AMDGCN] Use ZExt when handling indices in insertment element (#85718)
When i1 true is used as an index, SExt extends it to i32 -1. This would
cause BitVector to overflow.
The language manual have specified that the index shall be treated as an
unsigned number, this patch fixes that.
(https://llvm.org/docs/LangRef.html#insertelement-instruction)

This patch fixes #85717

---------

Signed-off-by: Peter Rong <PeterRong96@gmail.com>
2024-03-19 21:44:08 -07:00
Jiahan Xie
4bf06bebb9
[GISEL][RISCV] IRTranslator for scalable vector load (#80006)
Add IRTranslator for scalable vector load instruction and include
corresponding tests with alignment argument included, which can be
smaller/equal/larger than element size or smaller/equal/larger than the
minimum total vector size.
2024-03-19 20:12:26 -04:00
Alex MacLean
888e284903
[NVPTX] Use PTX prmt for llvm.bswap (#85545) 2024-03-19 15:18:53 -07:00
Noah Goldstein
353fbeb0a2 [DAGCombiner] Simplifying {si|ui}tofp when only signbit is needed
If we only need the signbit `uitofp` simplified to 0, and `sitofp`
simplifies to `bitcast`.

Closes #85138
2024-03-19 17:17:35 -05:00
Noah Goldstein
ebd1379663 [DAGCombiner] Add tests for simplifying {si|ui}tofp; NFC 2024-03-19 17:17:35 -05:00
quic-areg
31f4b329c8
[Hexagon] ELF attributes for Hexagon (#85359)
Defines a subset of attributes and emits them to a section called
.hexagon.attributes.

The current attributes recorded are the attributes needed by
llvm-objdump to automatically determine target features and eliminate
the need to manually pass features.
2024-03-19 16:22:30 -05:00
Simon Pilgrim
2377b9773d [DAG] SimplifyShift - shift i1/vXi1 X, Y --> X (any non-zero shift amount is undefined).
Alive2: https://alive2.llvm.org/ce/z/SdESbg

Fixes #85681
2024-03-19 20:18:37 +00:00
Changpeng Fang
ab76052fa9
AMDGPU: Treat SWMMAC the same as MFMA and other WMMA for sched_barrier (#85721) 2024-03-19 09:58:09 -07:00