48233 Commits

Author SHA1 Message Date
Nitin John Raj
af8e386102 [RISCV][GlobalISel] Add lowerFormalArguments for calling convention
This patch adds an IncomingValueHandler and IncomingValueAssigner, and implements minimal support for lowering formal arguments according to the RISC-V calling convention. Simple non-aggregate integer and pointer types are supported.

In the future, we must correctly handle byval and sret pointer arguments, and instances where the number of arguments exceeds the number of registers.

Coauthored By: lewis-revill

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D74977
2023-05-30 13:42:49 -07:00
Philip Reames
b07d08bb85 [RISCV] Add additional vslide1up test coverage
Add another form of the same pattern (as_rotate tests), and add coverage for a couple corner cases I got wrong at first in an upcoming rewrite.
2023-05-30 10:53:58 -07:00
Philip Reames
0bb23c58be [RISCV] Rename vslide1down tests (should have been part of 24172de) 2023-05-30 10:32:24 -07:00
Philip Reames
24172de17d [RISCV] Add tests for vslide1down shuffle/insert idiom 2023-05-30 10:24:43 -07:00
Simon Pilgrim
0ec79f413e [X86] Regenerate sqrt-fastmath-mir.ll 2023-05-30 17:21:53 +01:00
Igor Kirillov
40a81d3100 [CodeGen] Refactor IR generation functions to use IRBuilder in ComplexDeinterleaving pass
This patch updates several functions in LLVM's IR generation code to accept
an IRBuilder object as an argument, rather than an Instruction that indicates
the insertion point for new instructions.
This change is necessary to handle sophisticated -Ofast optimization cases
from D148558 where it's unclear which instructions should be used as the
insertion point for new operations.

Differential Revision: https://reviews.llvm.org/D148703
2023-05-30 16:18:28 +00:00
Philip Reames
544a240ff7 [RISCV] Use v(f)slide1up for shuffle+insert idiom
This is pretty straightforward in the basic form. I did need to move the slideup matching earlier, but that looks generally profitable on its own.

As follow ups, I plan to explore the v(f)slide1down variants, and see what I can do to canonicalize the shuffle then insert pattern (see _inverse tests at the end of the vslide1up.ll test).

Differential Revision: https://reviews.llvm.org/D151468
2023-05-30 07:37:41 -07:00
Simon Pilgrim
ab4b924832 [X86] X86FixupVectorConstantsPass - attempt to replace full width integer vector constant loads with broadcasts on AVX2+ targets
lowerBuildVectorAsBroadcast will not broadcast splat constants in all cases, resulting in a lot of situations where a full width vector load that has failed to fold but is loading splat constant values could use a broadcast load instruction just as cheaply, and save constant pool space.
2023-05-30 13:17:26 +01:00
Igor Kirillov
48339d0fbb [CodeGen] Add pre-commit tests for D148558
This patch adds four new tests for upcoming functionality in LLVM:
* complex-deinterleaving-add-mull-fixed-contract.ll
* complex-deinterleaving-add-mull-scalable-contract.ll
* complex-deinterleaving-add-mull-fixed-fast.ll
* complex-deinterleaving-add-mull-scalable-fast.ll.

These tests were generated from the IR of vectorizable loops, which were
compiled from C++ code using different optimization flags in Clang. Each pair
of tests corresponds to Neon and SVE architectures, respectively, and
each pair contains tests compiled with -Ofast and -O3 -ffp-contract=fast
-ffinite-math-only optimization flags.
The tests were stripped of nnan and ninf flags as they have no impact on the
output.
The primary objective of these tests is to show the various sequences of
complex computations that may be encountered and to demonstrate the ability
of ComplexDeinterleaving to support any ordering.

Depends on D147451

Differential Revision: https://reviews.llvm.org/D148550
2023-05-30 11:49:59 +00:00
Simon Pilgrim
95661b9c75 [X86] getTargetConstantBitsFromNode - support extracting fp data from ConstantDataSequential
Fixes issue introduced by 0f8e0f4228805cbecce13dcfadef4c48a4f0f4cd where SimplifyDemandedBits could crash when trying to extract fp data from broadcasted constants
2023-05-30 11:38:31 +01:00
Alex Bradbury
c4efcd6970 [RISCV] Generalise shouldExtendTypeInLibcall logic to apply to all <XLEN floats on soft ABIs
This results in improved codegen for half/bf16 libcalls on soft ABIs

Adds a RISCVSubtarget helper method for determining if a soft FP ABI is
being targeted (future bf16 related patches make use of this).

Differential Revision: https://reviews.llvm.org/D151434
2023-05-30 11:04:03 +01:00
Shao-Ce SUN
216e2820f9 [RISCV] Add more tests in zdinx-boundary-check.ll
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D151534
2023-05-30 14:49:33 +08:00
Jianjian GUAN
944773436a [RISCV][NFC] Fix unmasked test for vp_cttz and vp_ctlz.
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D151673
2023-05-30 12:38:24 +08:00
Craig Topper
9239d3a3ea [RISCV] Teach performCombineVMergeAndVOps to handle FMA instructions.
Previously we only handled instructions with merge ops that were
also masked. This patch supports instructions with merge ops that
aren't masked, like FMA.

I'm only folding into a TU vmerge for now. Supporting TA vmerge
shouldn't be much more work, but we need to make sure we get the
policy operand for the result correct. And of course we need more
tests.

Reviewed By: fakepaper56, frasercrmck

Differential Revision: https://reviews.llvm.org/D151596
2023-05-29 19:44:43 -07:00
Jianjian GUAN
071e9d7bac [RISCV] Fix unmasked vp_abs select.
Make unmasked vp_abs select to unmasked instructions.

Reviewed By: fakepaper56

Differential Revision: https://reviews.llvm.org/D151646
2023-05-30 09:57:40 +08:00
Alex Bradbury
9bb34ca652 [RISCV][test] Expand bfloat.ll tests to include i16 bitcasts and load/store
Pre-commit new tests used in D151663.
2023-05-29 21:38:26 +01:00
Simon Pilgrim
98061013e0 [X86] X86FixupVectorConstantsPass - attempt to replace full width fp vector constant loads with broadcasts on AVX+ targets
lowerBuildVectorAsBroadcast will not broadcast splat constants in all cases, resulting in a lot of situations where a full width vector load that has failed to fold but is loading splat constant values could use a broadcast load instruction just as cheaply, and save constant pool space.

NOTE: SSE3 targets can use MOVDDUP but not all SSE era CPUs can perform this as cheaply as a vector load, we will need to add scheduler model checks if we want to pursue this.
2023-05-29 16:10:52 +01:00
Alex Bradbury
061e368fe2 [SelectionDAG] Implement soft FP legalisation for bf16 FP_EXTEND and BF16_TO_FP
As discussed in D151436, it's safe to do this as a simple shift (as is
done in LegalizeDAG.cpp) rather than needing a libcall. The added test
cases for RISC-V previously just triggered an assertion.
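
A minimal sketch of why a shift is enough, assuming the usual bf16 layout (the upper 16 bits of an IEEE-754 binary32 value). The helper name `bf16_bits_to_float` is hypothetical and is not taken from the patch:

```python
import struct

def bf16_bits_to_float(bits: int) -> float:
    """Widen a bf16 bit pattern to a float by shifting it into the high half.

    bf16 keeps the sign, the 8 exponent bits and the top 7 mantissa bits of a
    binary32 value, so the extension is exact and needs no rounding.
    """
    widened = (bits & 0xFFFF) << 16
    (value,) = struct.unpack("<f", struct.pack("<I", widened))
    return value

print(bf16_bits_to_float(0x3F80))  # bit pattern of 1.0 in bf16
```

Because the low 16 mantissa bits are simply zero-filled, no libcall is needed for the widening direction.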

Codegen for bfloat_to_double will be slightly improved by D151434.

Differential Revision: https://reviews.llvm.org/D151563
2023-05-29 10:32:28 +01:00
David Green
0a762ec1b0 [ARM] Allow D-reg copies to use VMOVD with fpregs64
This instruction should be available with MVE, where we have D regs, not
requiring the full FP64 target feature.
2023-05-28 19:12:45 +01:00
Krzysztof Parzyszek
96b59b4f06 [Hexagon] Use scalar evolution to calculate pointer difference in HVC 2023-05-27 09:11:09 -07:00
Simon Pilgrim
0f8e0f4228 [X86] lowerBuildVectorAsBroadcast - broadcast Constant of original (BuildVector) element size
Noticed in D150143/D150526 - we currently create scalar Constant values using the broadcast instruction width, which might be wider than the original build vector width, making it tricky to recognise the original constant bits data.

If we have widened the broadcast value, it's much more useful for asm comments if we create a ConstantVector with the original element data, add that to the constant-pool and load that with the same (wider) broadcast instruction.
2023-05-27 14:05:44 +01:00
Craig Topper
28ab032298 [RISCV] Add isel patterns to form tail undisturbed vfwadd.wv from fpextend_vl+vfwadd_vl+vp_merge.
We use a special TIED instruction for vfwadd.wv to avoid an
earlyclobber constraint preventing the first source and the destination
from being the same register.

This prevents our normal post process for forming TU instructions.
Add manual isel pattern instead. This matches what we do for FMA
for example.
2023-05-26 16:44:20 -07:00
Justin Lebar
6585599828 Fix test failure after 2be0abb7fe7 (caused by bad merge, sorry). 2023-05-26 15:31:20 -07:00
Justin Lebar
2be0abb7fe Rewrite load-store-vectorizer.
The motivation for this change is a workload generated by the XLA compiler
targeting nvidia GPUs.

This kernel has a few hundred i8 loads and stores.  Merging is critical for
performance.

The current LSV doesn't merge these well because it only considers instructions
within a block of 64 loads+stores.  This limit is necessary to contain the
O(n^2) behavior of the pass.  I'm hesitant to increase the limit, because this
pass is already one of the slowest parts of compiling an XLA program.

So we rewrite basically the whole thing to use a new algorithm.  Before, we
compared every load/store to every other to see if they're consecutive.  The
insight (from tra@) is that this is redundant.  If we know the offset from PtrA
to PtrB, then we don't need to compare PtrC to both of them in order to tell
whether C may be adjacent to A or B.

So that's what we do.  When scanning a basic block, we maintain a list of
chains, where we know the offset from every element in the chain to the first
element in the chain.  Each instruction gets compared only to the leaders of
all the chains.

In the worst case, this is still O(n^2), because all chains might be of length
1.  To prevent compile time blowup, we only consider the 64 most recently used
chains.  Thus we do no more comparisons than before, but we have the potential
to make much longer chains.
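
The chain-building idea above can be sketched roughly as follows. This is an illustration only, not the pass's code: accesses are modelled as hypothetical `(base, offset)` pairs, `known_offset` stands in for whatever pointer analysis the pass uses, and retiring the least recently used chain once the limit is hit is an assumption about how the 64-chain cap might behave.

```python
from typing import List, Optional, Tuple

Access = Tuple[str, int]          # hypothetical model: (symbolic base, byte offset)
Chain = List[Tuple[int, Access]]  # entries are (offset from the chain leader, access)

def known_offset(a: Access, b: Access) -> Optional[int]:
    """Return b - a in bytes if both share a base, else None (offset unknown)."""
    return b[1] - a[1] if a[0] == b[0] else None

def build_chains(accesses: List[Access], max_chains: int = 64) -> List[Chain]:
    chains: List[Chain] = []   # most recently used chain first
    retired: List[Chain] = []  # chains pushed out by the limit (assumed policy)
    for acc in accesses:
        for i, chain in enumerate(chains):
            leader = chain[0][1]
            off = known_offset(leader, acc)  # compare against the leader only
            if off is not None:
                chain.append((off, acc))
                chains.insert(0, chains.pop(i))  # mark as most recently used
                break
        else:
            chains.insert(0, [(0, acc)])  # no leader matched: start a new chain
            if len(chains) > max_chains:
                retired.append(chains.pop())
    return chains + retired

chains = build_chains([("A", 0), ("B", 0), ("A", 4), ("B", 8), ("A", 8)])
```

Each access is compared with at most `max_chains` leaders, so the work per access stays bounded while a chain itself can grow without limit.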

This rewrite affects many tests.  The changes to tests fall into two
categories.

1. The old code had what appears to be a bug when deciding whether a misaligned
   vectorized load is fast.  Suppose TTI reports that load <4 x i32> align 4
   has relative speed 1, and suppose that load i32 align 4 has relative speed
   32.

   The intent of the code seems to be that we prefer the scalar load, because
   it's faster.  But the old code would choose the vectorized load.
   accessIsMisaligned would set RelativeSpeed to 0 for the scalar load (and not
   even call into TTI to get the relative speed), because the scalar load is
   aligned.

   After this patch, we will prefer the scalar load if it's faster.

2. This patch changes the logic for how we vectorize.  Usually this results in
   vectorizing more.

Explanation of changes to tests:

 - AMDGPU/adjust-alloca-alignment.ll: #1
 - AMDGPU/flat_atomic.ll: #2, we vectorize more.
 - AMDGPU/int_sideeffect.ll: #2, there are two possible locations for the call to @foo, and the pass is brittle to this.  Before, we'd vectorize in case 1 and not case 2.  Now we vectorize in case 2 and not case 1.  So we just move the call.
 - AMDGPU/adjust-alloca-alignment.ll: #2, we vectorize more
 - AMDGPU/insertion-point.ll: #2 we vectorize more
 - AMDGPU/merge-stores-private.ll: #1 (undoes changes from git rev 86f9117d476, which appear to have hit the bug from #1)
 - AMDGPU/multiple_tails.ll: #1
 - AMDGPU/vect-ptr-ptr-size-mismatch.ll: Fix alignment (I think related to #1 above).
 - AMDGPU CodeGen: I have difficulty commenting on these changes, but many of them look like #2, we vectorize more.
 - NVPTX/4x2xhalf.ll: Fix alignment (I think related to #1 above).
 - NVPTX/vectorize_i8.ll: We don't generate <3 x i8> vectors on NVPTX because they're not legal (and eventually get split)
 - X86/correct-order.ll: #2, we vectorize more, probably because of changes to the chain-splitting logic.
 - X86/subchain-interleaved.ll: #2, we vectorize more
 - X86/vector-scalar.ll: #2, we can now vectorize scalar float + <1 x float>
 - X86/vectorize-i8-nested-add-inseltpoison.ll: Deleted the nuw test because it was nonsensical.  It was doing `add nuw %v0, -1`, but this is equivalent to `add nuw %v0, 0xffff'ffff`, which is equivalent to asserting that %v0 == 0.
 - X86/vectorize-i8-nested-add.ll: Same as nested-add-inseltpoison.ll
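
The claim about the deleted nuw test can be checked by brute force. The sketch below is an assumption-labelled model, not LLVM code: `add_nuw_is_defined` is a hypothetical helper that treats `add nuw` as well-defined only when the unsigned sum does not wrap, and it uses an 8-bit width to keep the search small.

```python
def add_nuw_is_defined(x: int, c: int, bits: int) -> bool:
    """True if x + c does not wrap when both are read as unsigned `bits`-bit values."""
    mask = (1 << bits) - 1
    return (x & mask) + (c & mask) <= mask

# -1 is the all-ones constant, so only x == 0 avoids unsigned wrap.
non_wrapping = [x for x in range(256) if add_nuw_is_defined(x, -1, 8)]
print(non_wrapping)
```

The same reasoning holds at any width, since adding the all-ones value to any nonzero operand carries out of the top bit.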

Differential Revision: https://reviews.llvm.org/D149893
2023-05-26 15:15:39 -07:00
Craig Topper
a4f437f012 SelectionDAG: Teach ComputeKnownBits about VSCALE
This reverts commit 9b92f70d4758f75903ce93feaba5098130820d40.  The issue
with the re-applied change was an implicit truncation due to the
multiplication.  Although the operations were converted to `APInt`, the
values were implicitly converted to `long` due to the typing rules.

Fixes: #59594

Differential Revision: https://reviews.llvm.org/D140347
2023-05-26 10:48:49 -07:00
Craig Topper
c5e6c886aa [VP][SelectionDAG][RISCV] Add get_vector_length intrinsics and generic SelectionDAG support.
The generic implementation is umin(TC, VF * vscale).
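
As a rough illustration of that formula, assuming non-negative integer inputs: the function names below are hypothetical, and `chunk_sizes` only shows how a strip-mined loop might consume the result.

```python
from typing import List

def get_vector_length(trip_count: int, vf: int, vscale: int) -> int:
    """Model of umin(TC, VF * vscale) for non-negative integers."""
    return min(trip_count, vf * vscale)

def chunk_sizes(n: int, vf: int, vscale: int) -> List[int]:
    """Element counts processed per iteration of a strip-mined loop over n elements."""
    if vf * vscale <= 0:
        raise ValueError("vf * vscale must be positive")
    sizes = []
    remaining = n
    while remaining > 0:
        vl = get_vector_length(remaining, vf, vscale)
        sizes.append(vl)
        remaining -= vl
    return sizes

print(chunk_sizes(20, 4, 2))
```

Only the final iteration gets a length below `vf * vscale`, which is what lets the loop run without a scalar remainder.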

Lowering to vsetvli for RISC-V will come in a future patch.

This patch is a pre-requisite to be able to CodeGen vectorized code from
D99750.

Reviewed By: reames, frasercrmck

Differential Revision: https://reviews.llvm.org/D149916
2023-05-26 09:06:38 -07:00
Felipe de Azevedo Piovezan
1898fc1a54 [FastISel] Implement translation of entry_value dbg.value intrinsics
For dbg.value intrinsics targeting an llvm::Argument address whose expression
starts with an entry value, we lower this to a DEBUG_VALUE targeting the livein
physical register corresponding to that Argument.

Depends on D151332

Differential Revision: https://reviews.llvm.org/D151333
2023-05-26 11:34:15 -04:00
Philip Reames
461d571e15 [RISCV] Revise test coverage for shuffle/insert idiom which become v(f)slide1ups
This fixes a couple mistakes in 0f64d4f877.  In particular, I'd not included a negative test where the slideup didn't write the entire VL, and had gotten all of my 4 element vector shuffle masks incorrect so they didn't match.  Also, add a test with swapped operands for completeness.

The transform is in D151468.
2023-05-26 08:09:57 -07:00
Zain Jaffal
0c93879d96 [AArch64] merge scaled and unscaled zero narrow stores.
This patch fixes a crash when scaled and unscaled zero stores are merged.

Differential Revision: https://reviews.llvm.org/D150963
2023-05-26 15:07:24 +01:00
Luo, Yuanke
969c686e54 [X86] fold select to mask instructions.
When avx512 is available the lhs operand of select instruction can be
folded with mask instruction, while the rhs operand can't. This patch is
to commute the lhs and rhs of the select instruction to create the
opportunity of folding.

Differential Revision: https://reviews.llvm.org/D151535
2023-05-26 21:53:03 +08:00
Felipe de Azevedo Piovezan
aba1bea673 [SelectionDAGBuilder] Handle entry_value dbg.value intrinsics
Summary:
DbgValue intrinsics whose expression is an entry_value and whose address is
described by an llvm::Argument must be lowered to the corresponding livein physical
register for that Argument.

Depends on D151329

Reviewers: aprantl

Subscribers:
2023-05-26 06:55:49 -04:00
Felipe de Azevedo Piovezan
e8aee45be7 [IRTranslator] Implement translation of entry_value dbg.value intrinsics
For dbg.value intrinsics targeting an llvm::Argument address whose expression
starts with an entry value, we lower this to a DEBUG_VALUE targeting the livein
physical register corresponding to that Argument.

Depends on D151328

Differential Revision: https://reviews.llvm.org/D151329
2023-05-26 06:45:01 -04:00
luxufan
9e8ed3403c [RISCV] Support '.option arch' directive
The proposal of '.option arch' directive is https://github.com/riscv-non-isa/riscv-asm-manual/pull/67

Note: For '.option arch, +/-' directive, version number is not yet supported.

Reviewed By: luismarques, craig.topper

Differential Revision: https://reviews.llvm.org/D123515
2023-05-26 18:39:41 +08:00
Luke Lau
90c4db4a2c [RISCV] Don't scalarize vector stores if volatile
As noted by @reames in https://reviews.llvm.org/D151211#4373404, we shouldn't
scalarize vector stores of constants if the store is volatile, or vector copies
if either the store or load are volatile.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D151500
2023-05-26 09:34:34 +01:00
Jay Foad
e4284a7c70 [AMDGPU] 4-align SGPR triples
Previously SGPR triples like s[3:5] were aligned on a 3-SGPR boundary
which has no basis in hardware.

Aligning them on a 4-SGPR boundary is at least justified by the
architecture reference guide which says: "Quad-alignment of SGPRs is
required for operation on more than 64-bits".
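
To make the difference concrete, here is a small sketch that enumerates triple start positions for a given alignment. The register count of 16 and the helper name `sgpr_triples` are assumptions chosen for illustration, not values from the patch.

```python
from typing import List

def sgpr_triples(num_sgprs: int, align: int) -> List[str]:
    """List 3-register tuples whose first register index is a multiple of `align`."""
    return [f"s[{first}:{first + 2}]" for first in range(0, num_sgprs - 2, align)]

old = sgpr_triples(16, 3)  # 3-aligned: includes s[3:5]
new = sgpr_triples(16, 4)  # 4-aligned: every triple starts on a quad boundary
print(old)
print(new)
```

With 4-alignment, a triple such as s[3:5] is no longer a member of the set.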

Currently there are no instructions that take SGPR triples as operands
so the issue is latent.

Differential Revision: https://reviews.llvm.org/D151463
2023-05-26 08:06:25 +01:00
Valery Pykhtin
8d0412ce9d [AMDGPU] Add pass to rewrite partially used virtual superregisters after RenameIndependentSubregs pass with registers of minimal size.
The main purpose of this is to simplify register pressure tracking as after the pass there is no need
to track subreg liveness anymore.

On the other hand, this pass creates more possibilities for the subreg unaware code, as many of the subregs
become ordinary registers.

Interesting side effect: spill-vgpr.ll has lost a lot of spills.

Reviewed By: #amdgpu, arsenm

Differential Revision: https://reviews.llvm.org/D139732
2023-05-26 09:05:44 +02:00
LiaoChunyu
477d1080cb [RISCV] Custom lower vector llvm.is.fpclass to vfclass.v
After D149063.
This patch adds support for both scalable and fixed-length vector.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D151176
2023-05-26 14:44:35 +08:00
Fraser Cormack
a800a4ffa1 [RISCV] Regenerate missing test checks
Codegen was different between RV32 and RV64 so the single unified CHECK
was skipping these functions.
2023-05-26 07:33:28 +01:00
Anshil Gandhi
a22ef958cb [AMDGPUCodegenPrepare] Add NewPM Support
Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D151241
2023-05-26 00:20:01 -06:00
Luo, Yuanke
3d075fe487 [X86] Add test for select folding.
When avx512 is available the lhs operand of select instruction can be
folded with mask instruction, while the rhs operand can't.
2023-05-26 13:00:21 +08:00
Alexander Timofeev
bad4de1ae7 Don't disable loop unroll for vectorized loops on AMDGPU target
We've got a performance regression after https://reviews.llvm.org/D115261.
Despite the loop being vectorized, unrolling is still required.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D149281
2023-05-25 22:54:41 +02:00
Artem Belevich
25708b3df6 [NVPTX, CUDA] barrier intrinsics and builtins for sm_90
Differential Revision: https://reviews.llvm.org/D151363
2023-05-25 11:57:57 -07:00
Artem Belevich
3d4964f494 [NVPTX] add new sm90-specific intrinsics.
Differential Revision: https://reviews.llvm.org/D151009
2023-05-25 11:57:55 -07:00
Craig Topper
8cdbf8d3e7 [SelectionDAG][AArch64][ARM] Remove setFlags call from DAGTypeLegalizer::SetPromotedInteger.
This was originally added to preserve FMF on SETCC. Unfortunately,
it also incorrectly preserves nuw/nsw on ADD/SUB in some cases.

There's also no guarantee the new opcode is even the same opcode
as the original node.

This patch removes the code and adds code to explicitly preserve
FMF flags in the SETCC promotion function.

The other test changes are from nuw/nsw not being preserved. I
believe for all these tests it was correct to preserve the flags,
so we need new code to preserve the flags when possible. I'll post
another patch for that since it's a riskier change.

This should unblock D150769.

Differential Revision: https://reviews.llvm.org/D151472
2023-05-25 11:01:19 -07:00
Philip Reames
0f64d4f877 [RISCV] Add test coverage for shuffle/insert idioms which can become v(f)slide1ups 2023-05-25 07:54:45 -07:00
Thorsten Schütt
bc713b193f [GlobalIsel][X86] fix legalization of G_CTLZ and G_CTPOP
Note that the builders are protected by is64Bit().

More fine-grained availability checks.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D150790
2023-05-25 16:41:14 +02:00
Simon Pilgrim
56cdeac194 [X86] Regenerate x86-32-intrcc.ll test checks
This will allow us to improve the diffs for D151400
2023-05-25 14:14:19 +01:00
Nikita Popov
2ba14283cd Revert "[SelectionDAG] Handle NSW for ADD/SUB in computeKnownBits()"
This reverts commit b66551370fdfc6f357ae0d77237119d2b1077b62.

This has exposed a pre-existing miscompile, reported in
https://reviews.llvm.org/D150769#4370467.
2023-05-25 11:13:51 +02:00
Luke Lau
6fdc77e488 [RISCV] Don't reduce vslidedown's VL in rotations
Even though we only need to write to the bottom NumElts - Rotation
elements for the vslidedown.vi, we can save an extra vsetivli toggle if
we just keep the wide VL.

(I may be missing something here: is there a reason why we want to explicitly keep the vslidedown narrow?)

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D151390
2023-05-25 09:27:55 +01:00
sgokhale
c4a60c9d34 [CodeGen][ShrinkWrap] Enable PostShrinkWrap by default
This is an attempt to reland D42600 and enable this optimisation by default.

This also resolves the issue pointed out in the context of PGO build.

Differential Revision: https://reviews.llvm.org/D42600
2023-05-25 13:56:29 +05:30