llvm-project

Author	SHA1	Message	Date
Nitin John Raj	af8e386102	[RISCV][GlobalISel] Add lowerFormalArguments for calling convention This patch adds an IncomingValueHandler and IncomingValueAssigner, and implements minimal support for lowering formal arguments according to the RISC-V calling convention. Simple non-aggregate integer and pointer types are supported. In the future, we must correctly handle byval and sret pointer arguments, and instances where the number of arguments exceeds the number of registers. Coauthored By: lewis-revill Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D74977	2023-05-30 13:42:49 -07:00
Philip Reames	b07d08bb85	[RISCV] Add additional vslide1up test coverage Add another form of the same pattern (as_rotate tests), and add coverage for a couple corner cases I got wrong at first in an upcoming rewrite.	2023-05-30 10:53:58 -07:00
Philip Reames	0bb23c58be	[RISCV] Rename vslide1down tests (should have been part of 24172de)	2023-05-30 10:32:24 -07:00
Philip Reames	24172de17d	[RISCV] Add tests for vslide1down shuffle/insert idiom	2023-05-30 10:24:43 -07:00
Simon Pilgrim	0ec79f413e	[X86] Regenerate sqrt-fastmath-mir.ll	2023-05-30 17:21:53 +01:00
Igor Kirillov	40a81d3100	[CodeGen] Refactor IR generation functions to use IRBuilder in ComplexDeinterleaving pass This patch updates several functions in LLVM's IR generation code to accept an IRBuilder object as an argument, rather than an Instruction that indicates the insertion point for new instructions. This change is necessary to handle sophisticated -Ofast optimization cases from D148558 where it's unclear which instructions should be used as the insertion point for new operations. Differential Revision: https://reviews.llvm.org/D148703	2023-05-30 16:18:28 +00:00
Philip Reames	544a240ff7	[RISCV] Use v(f)slide1up for shuffle+insert idiom This is pretty straightforward in the basic form. I did need to move the slideup matching earlier, but that looks generally profitable on its own. As follow ups, I plan to explore the v(f)slide1down variants, and see what I can do to canonicalize the shuffle then insert pattern (see _inverse tests at the end of the vslide1up.ll test). Differential Revision: https://reviews.llvm.org/D151468	2023-05-30 07:37:41 -07:00
Simon Pilgrim	ab4b924832	[X86] X86FixupVectorConstantsPass - attempt to replace full width integer vector constant loads with broadcasts on AVX2+ targets lowerBuildVectorAsBroadcast will not broadcast splat constants in all cases, resulting in a lot of situations where a full width vector load that has failed to fold but is loading splat constant values could use a broadcast load instruction just as cheaply, and save constant pool space.	2023-05-30 13:17:26 +01:00
Igor Kirillov	48339d0fbb	[CodeGen] Add pre-commit tests for D148558 This patch adds four new tests for upcoming functionality in LLVM: * complex-deinterleaving-add-mull-fixed-contract.ll * complex-deinterleaving-add-mull-scalable-contract.ll * complex-deinterleaving-add-mull-fixed-fast.ll * complex-deinterleaving-add-mull-scalable-fast.ll. These tests were generated from the IR of vectorizable loops, which were compiled from C++ code using different optimization flags in Clang. Each pair of tests corresponds to Neon and SVE architectures, respectively, and each pair contains tests compiled with -Ofast and -O3 -ffp-contract=fast -ffinite-math-only optimization flags. The tests were stripped of nnan and ninf flags as they have no impact on the output. The primary objective of these tests is to show the various sequences of complex computations that may be encountered and to demonstrate the ability of ComplexDeinterleaving to support any ordering. Depends on D147451 Differential Revision: https://reviews.llvm.org/D148550	2023-05-30 11:49:59 +00:00
Simon Pilgrim	95661b9c75	[X86] getTargetConstantBitsFromNode - support extracting fp data from ConstantDataSequential Fixes issue introduced by 0f8e0f4228805cbecce13dcfadef4c48a4f0f4cd where SimplifyDemandedBits could crash when trying to extract fp data from broadcasted constants	2023-05-30 11:38:31 +01:00
Alex Bradbury	c4efcd6970	[RISCV] Generalise shouldExtendTypeInLibcall logic to apply to all <XLEN floats on soft ABIs This results in improved codegen for half/bf16 libcalls on soft ABIs Adds a RISCVSubtarget helper method for determining if a soft FP ABI is being targeted (future bf16 related patches make use of this). Differential Revision: https://reviews.llvm.org/D151434	2023-05-30 11:04:03 +01:00
Shao-Ce SUN	216e2820f9	[RISCV] Add more tests in zdinx-boundary-check.ll Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D151534	2023-05-30 14:49:33 +08:00
Jianjian GUAN	944773436a	[RISCV][NFC] Fix unmasked test for vp_cttz and vp_ctlz. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D151673	2023-05-30 12:38:24 +08:00
Craig Topper	9239d3a3ea	[RISCV] Teach performCombineVMergeAndVOps to handle FMA instructions. Previously we only handled instructions with merge ops that were also masked. This patch supports instructions with merge ops that aren't masked, like FMA. I'm only folding into a TU vmerge for now. Supporting TA vmerge shouldn't be much more work, but we need to make sure we get the policy operand for the result correct. And of course we need more tests. Reviewed By: fakepaper56, frasercrmck Differential Revision: https://reviews.llvm.org/D151596	2023-05-29 19:44:43 -07:00
Jianjian GUAN	071e9d7bac	[RISCV] Fix unmasked vp_abs select. Make unmasked vp_abs select to unmasked instructions. Reviewed By: fakepaper56 Differential Revision: https://reviews.llvm.org/D151646	2023-05-30 09:57:40 +08:00
Alex Bradbury	9bb34ca652	[RISCV][test] Expand bfloat.ll tests to include i16 bitcasts and load/store Pre-commit new tests used in D151663.	2023-05-29 21:38:26 +01:00
Simon Pilgrim	98061013e0	[X86] X86FixupVectorConstantsPass - attempt to replace full width fp vector constant loads with broadcasts on AVX+ targets lowerBuildVectorAsBroadcast will not broadcast splat constants in all cases, resulting in a lot of situations where a full width vector load that has failed to fold but is loading splat constant values could use a broadcast load instruction just as cheaply, and save constant pool space. NOTE: SSE3 targets can use MOVDDUP but not all SSE era CPUs can perform this as cheaply as a vector load, we will need to add scheduler model checks if we want to pursue this.	2023-05-29 16:10:52 +01:00
Alex Bradbury	061e368fe2	[SelectionDAG] Implement soft FP legalisation for bf16 FP_EXTEND and BF16_TO_FP As discussed in D151436, it's safe to do this as a simple shift (as is done in LegalizeDAG.cpp) rather than needing a libcall. The added test cases for RISC-V previously just triggered an assertion. Codegen for bfloat_to_double will be slightly improved by D151434. Differential Revision: https://reviews.llvm.org/D151563	2023-05-29 10:32:28 +01:00
David Green	0a762ec1b0	[ARM] Allow D-reg copies to use VMOVD with fpregs64 This instruction should be available with MVE, where we have D regs, not requiring the full FP64 target feature.	2023-05-28 19:12:45 +01:00
Krzysztof Parzyszek	96b59b4f06	[Hexagon] Use scalar evolution to calculate pointer difference in HVC	2023-05-27 09:11:09 -07:00
Simon Pilgrim	0f8e0f4228	[X86] lowerBuildVectorAsBroadcast - broadcast Constant of original (BuildVector) element size Noticed in D150143/D150526 - we currently create scalar Constant values using the broadcast instruction width, which might be wider than the original build vector width, making it tricky to recognise the original constant bits data. If we have widened the broadcast value, it's much more useful for asm comments if we create a ConstantVector with the original element data, add that to the constant-pool and load that with the same (wider) broadcast instruction.	2023-05-27 14:05:44 +01:00
Craig Topper	28ab032298	[RISCV] Add isel patterns to form tail undisturbed vfwadd.wv from fpextend_vl+vfwadd_vl+vp_merge. We use a special TIED instruction for vfwadd.wv to avoid an earlyclobber constraint preventing the first source and the destination from being the same register. This prevents our normal post process for forming TU instructions. Add manual isel pattern instead. This matches what we do for FMA for example.	2023-05-26 16:44:20 -07:00
Justin Lebar	6585599828	Fix test failure after 2be0abb7fe7 (caused by bad merge, sorry).	2023-05-26 15:31:20 -07:00
Justin Lebar	2be0abb7fe	Rewrite load-store-vectorizer. The motivation for this change is a workload generated by the XLA compiler targeting nvidia GPUs. This kernel has a few hundred i8 loads and stores. Merging is critical for performance. The current LSV doesn't merge these well because it only considers instructions within a block of 64 loads+stores. This limit is necessary to contain the O(n^2) behavior of the pass. I'm hesitant to increase the limit, because this pass is already one of the slowest parts of compiling an XLA program. So we rewrite basically the whole thing to use a new algorithm. Before, we compared every load/store to every other to see if they're consecutive. The insight (from tra@) is that this is redundant. If we know the offset from PtrA to PtrB, then we don't need to compare PtrC to both of them in order to tell whether C may be adjacent to A or B. So that's what we do. When scanning a basic block, we maintain a list of chains, where we know the offset from every element in the chain to the first element in the chain. Each instruction gets compared only to the leaders of all the chains. In the worst case, this is still O(n^2), because all chains might be of length 1. To prevent compile time blowup, we only consider the 64 most recently used chains. Thus we do no more comparisons than before, but we have the potential to make much longer chains. This rewrite affects many tests. The changes to tests fall into two categories. 1. The old code had what appears to be a bug when deciding whether a misaligned vectorized load is fast. Suppose TTI reports that load <i32 x 4> align 4 has relative speed 1, and suppose that load i32 align 4 has relative speed 32. The intent of the code seems to be that we prefer the scalar load, because it's faster. But the old code would choose the vectorized load. 
accessIsMisaligned would set RelativeSpeed to 0 for the scalar load (and not even call into TTI to get the relative speed), because the scalar load is aligned. After this patch, we will prefer the scalar load if it's faster. 2. This patch changes the logic for how we vectorize. Usually this results in vectorizing more. Explanation of changes to tests: - AMDGPU/adjust-alloca-alignment.ll: #1 - AMDGPU/flat_atomic.ll: #2, we vectorize more. - AMDGPU/int_sideeffect.ll: #2, there are two possible locations for the call to @foo, and the pass is brittle to this. Before, we'd vectorize in case 1 and not case 2. Now we vectorize in case 2 and not case 1. So we just move the call. - AMDGPU/adjust-alloca-alignment.ll: #2, we vectorize more - AMDGPU/insertion-point.ll: #2 we vectorize more - AMDGPU/merge-stores-private.ll: #1 (undoes changes from git rev 86f9117d476, which appear to have hit the bug from #1) - AMDGPU/multiple_tails.ll: #1 - AMDGPU/vect-ptr-ptr-size-mismatch.ll: Fix alignment (I think related to #1 above). - AMDGPU CodeGen: I have difficulty commenting on these changes, but many of them look like #2, we vectorize more. - NVPTX/4x2xhalf.ll: Fix alignment (I think related to #1 above). - NVPTX/vectorize_i8.ll: We don't generate <3 x i8> vectors on NVPTX because they're not legal (and eventually get split) - X86/correct-order.ll: #2, we vectorize more, probably because of changes to the chain-splitting logic. - X86/subchain-interleaved.ll: #2, we vectorize more - X86/vector-scalar.ll: #2, we can now vectorize scalar float + <1 x float> - X86/vectorize-i8-nested-add-inseltpoison.ll: Deleted the nuw test because it was nonsensical. It was doing `add nuw %v0, -1`, but this is equivalent to `add nuw %v0, 0xffff'ffff`, which is equivalent to asserting that %v0 == 0. - X86/vectorize-i8-nested-add.ll: Same as nested-add-inseltpoison.ll Differential Revision: https://reviews.llvm.org/D149893	2023-05-26 15:15:39 -07:00
Craig Topper	a4f437f012	SelectionDAG: Teach ComputeKnownBits about VSCALE This reverts commit 9b92f70d4758f75903ce93feaba5098130820d40. The issue with the re-applied change was an implicit truncation due to the multiplication. Although the operations were converted to `APInt`, the values were implicitly converted to `long` due to the typing rules. Fixes: #59594 Differential Revision: https://reviews.llvm.org/D140347	2023-05-26 10:48:49 -07:00
Craig Topper	c5e6c886aa	[VP][SelectionDAG][RISCV] Add get_vector_length intrinsics and generic SelectionDAG support. The generic implementation is umin(TC, VF * vscale). Lowering to vsetvli for RISC-V will come in a future patch. This patch is a pre-requisite to be able to CodeGen vectorized code from D99750. Reviewed By: reames, frasercrmck Differential Revision: https://reviews.llvm.org/D149916	2023-05-26 09:06:38 -07:00
Felipe de Azevedo Piovezan	1898fc1a54	[FastISel] Implement translation of entry_value dbg.value intrinsics For dbg.value intrinsics targeting an llvm::Argument address whose expression starts with an entry value, we lower this to a DEBUG_VALUE targeting the livein physical register corresponding to that Argument. Depends on D151332 Differential Revision: https://reviews.llvm.org/D151333	2023-05-26 11:34:15 -04:00
Philip Reames	461d571e15	[RISCV] Revise test coverage for shuffle/insert idiom which become v(f)slide1ups This fixes a couple mistakes in 0f64d4f877. In particular, I'd not included a negative test where the slideup didn't write the entire VL, and had gotten all of my 4 element vector shuffle masks incorrect so they didn't match. Also, add a test with swapped operands for completeness. The transform is in D151468.	2023-05-26 08:09:57 -07:00
Zain Jaffal	0c93879d96	[AArch64] merge scaled and unscaled zero narrow stores. This patch fixes a crash when scaled and unscaled zero stores are merged. Differential Revision: https://reviews.llvm.org/D150963	2023-05-26 15:07:24 +01:00
Luo, Yuanke	969c686e54	[X86] fold select to mask instructions. When avx512 is available the lhs operand of select instruction can be folded with mask instruction, while the rhs operand can't. This patch is to commute the lhs and rhs of the select instruction to create the opportunity of folding. Differential Revision: https://reviews.llvm.org/D151535	2023-05-26 21:53:03 +08:00
Felipe de Azevedo Piovezan	aba1bea673	[SelectionDAGBuilder] Handle entry_value dbg.value intrinsics Summary: DbgValue intrinsics whose expression is an entry_value and whose address is described by an llvm::Argument must be lowered to the corresponding livein physical register for that Argument. Depends on D151329 Reviewers: aprantl Subscribers:	2023-05-26 06:55:49 -04:00
Felipe de Azevedo Piovezan	e8aee45be7	[IRTranslator] Implement translation of entry_value dbg.value intrinsics For dbg.value intrinsics targeting an llvm::Argument address whose expression starts with an entry value, we lower this to a DEBUG_VALUE targeting the livein physical register corresponding to that Argument. Depends on D151328 Differential Revision: https://reviews.llvm.org/D151329	2023-05-26 06:45:01 -04:00
luxufan	9e8ed3403c	[RISCV] Support '.option arch' directive The proposal of '.option arch' directive is https://github.com/riscv-non-isa/riscv-asm-manual/pull/67 Note: For '.option arch, +/-' directive, version number is not yet supported. Reviewed By: luismarques, craig.topper Differential Revision: https://reviews.llvm.org/D123515	2023-05-26 18:39:41 +08:00
Luke Lau	90c4db4a2c	[RISCV] Don't scalarize vector stores if volatile As noted by @reames in https://reviews.llvm.org/D151211#4373404, we shouldn't scalarize vector stores of constants if the store is volatile, or vector copies if either the store or load are volatile. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D151500	2023-05-26 09:34:34 +01:00
Jay Foad	e4284a7c70	[AMDGPU] 4-align SGPR triples Previously SGPR triples like s[3:5] were aligned on a 3-SGPR boundary which has no basis in hardware. Aligning them on a 4-SGPR boundary is at least justified by the architecture reference guide which says: "Quad-alignment of SGPRs is required for operation on more than 64-bits". Currently there are no instructions that take SGPR triples as operands so the issue is latent. Differential Revision: https://reviews.llvm.org/D151463	2023-05-26 08:06:25 +01:00
Valery Pykhtin	8d0412ce9d	[AMDGPU] Add pass to rewrite partially used virtual superregisters after RenameIndependentSubregs pass with registers of minimal size. The main purpose of this is to simplify register pressure tracking as after the pass there is no need to track subreg liveness anymore. On the other hand this pass creates more possibilities for the subreg unaware code, as many of the subregs become ordinary registers. Interesting side effect: spill-vgpr.ll has lost a lot of spills. Reviewed By: #amdgpu, arsenm Differential Revision: https://reviews.llvm.org/D139732	2023-05-26 09:05:44 +02:00
LiaoChunyu	477d1080cb	[RISCV] Custom lower vector llvm.is.fpclass to vfclass.v After D149063. This patch adds support for both scalable and fixed-length vector. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D151176	2023-05-26 14:44:35 +08:00
Fraser Cormack	a800a4ffa1	[RISCV] Regenerate missing test checks Codegen was different between RV32 and RV64 so the single unified CHECK was skipping these functions.	2023-05-26 07:33:28 +01:00
Anshil Gandhi	a22ef958cb	[AMDGPUCodegenPrepare] Add NewPM Support Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D151241	2023-05-26 00:20:01 -06:00
Luo, Yuanke	3d075fe487	[X86] Add test for select folding. When avx512 is available the lhs operand of select instruction can be folded with mask instruction, while the rhs operand can't.	2023-05-26 13:00:21 +08:00
Alexander Timofeev	bad4de1ae7	Don't disable loop unroll for vectorized loops on AMDGPU target We've got a performance regression after the https://reviews.llvm.org/D115261. Despite the loop being vectorized unroll is still required. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D149281	2023-05-25 22:54:41 +02:00
Artem Belevich	25708b3df6	[NVPTX, CUDA] barrier intrinsics and builtins for sm_90 Differential Revision: https://reviews.llvm.org/D151363	2023-05-25 11:57:57 -07:00
Artem Belevich	3d4964f494	[NVPTX] add new sm90-specific intrinsics. Differential Revision: https://reviews.llvm.org/D151009	2023-05-25 11:57:55 -07:00
Craig Topper	8cdbf8d3e7	[SelectionDAG][AArch64][ARM] Remove setFlags call from DAGTypeLegalizer::SetPromotedInteger. This was originally added to preserve FMF on SETCC. Unfortunately, it also incorrectly preserves nuw/nsw on ADD/SUB in some cases. There's also no guarantee the new opcode is even the same opcode as the original node. This patch removes the code and adds code to explicitly preserve FMF flags in the SETCC promotion function. The other test changes are from nuw/nsw not being preserved. I believe for all these tests it was correct to preserve the flags, so we need new code to preserve the flags when possible. I'll post another patch for that since it's a riskier change. This should unblock D150769. Differential Revision: https://reviews.llvm.org/D151472	2023-05-25 11:01:19 -07:00
Philip Reames	0f64d4f877	[RISCV] Add test coverage for shuffle/insert idioms which can become v(f)slide1ups	2023-05-25 07:54:45 -07:00
Thorsten Schütt	bc713b193f	[GlobalIsel][X86] fix legalization of G_CTLZ and G_CTPOP Note that the builders are protected by is64Bit(). More fine-grained availability checks. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D150790	2023-05-25 16:41:14 +02:00
Simon Pilgrim	56cdeac194	[X86] Regenerate x86-32-intrcc.ll test checks This will allow us to improve the diffs for D151400	2023-05-25 14:14:19 +01:00
Nikita Popov	2ba14283cd	Revert "[SelectionDAG] Handle NSW for ADD/SUB in computeKnownBits()" This reverts commit b66551370fdfc6f357ae0d77237119d2b1077b62. This has exposed a pre-existing miscompile, reported in https://reviews.llvm.org/D150769#4370467.	2023-05-25 11:13:51 +02:00
Luke Lau	6fdc77e488	[RISCV] Don't reduce vslidedown's VL in rotations Even though we only need to write to the bottom NumElts - Rotation elements for the vslidedown.vi, we can save an extra vsetivli toggle if we just keep the wide VL. (I may be missing something here: is there a reason why we want to explicitly keep the vslidedown narrow?) Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D151390	2023-05-25 09:27:55 +01:00
sgokhale	c4a60c9d34	[CodeGen][ShrinkWrap] Enable PostShrinkWrap by default This is an attempt to reland D42600 and enabling this optimisation by default. This also resolves the issue pointed out in the context of PGO build. Differential Revision: https://reviews.llvm.org/D42600	2023-05-25 13:56:29 +05:30
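
The load-store-vectorizer rewrite (2be0abb7fe) describes keeping a list of chains, comparing each new access only to the leader of each chain, and limiting the search to the 64 most recently used chains. The following is a minimal Python sketch of that idea, not the pass itself. The `Access` record, `known_offset`, and `build_chains` are hypothetical names introduced for illustration, and the sketch assumes an offset between two accesses is known only when they share the same symbolic base.

```python
from collections import namedtuple

# Hypothetical model of a memory access: a symbolic base plus a byte offset.
Access = namedtuple("Access", ["name", "base", "offset"])


def known_offset(a, b):
    """Return b's byte offset from a, or None when it cannot be determined."""
    if a.base != b.base:
        return None
    return b.offset - a.offset


def build_chains(accesses, max_chains=64):
    """Group accesses into chains, comparing each access only to chain leaders."""
    active = []   # chains still being searched, most recently used first
    retired = []  # chains evicted from the active window
    for acc in accesses:
        for i, chain in enumerate(active):
            leader = chain[0][0]
            delta = known_offset(leader, acc)
            if delta is not None:
                # Record the access with its offset from the chain leader.
                chain.append((acc, delta))
                active.insert(0, active.pop(i))  # mark as most recently used
                break
        else:
            # No leader matched: start a new chain with this access as leader.
            active.insert(0, [(acc, 0)])
            if len(active) > max_chains:
                retired.append(active.pop())  # evict the least recently used
    return retired + active


accesses = [
    Access("a", "p", 0),
    Access("x", "q", 0),
    Access("b", "p", 1),
    Access("c", "p", 2),
]
chains = build_chains(accesses)
```

Each access costs at most `max_chains` comparisons, which matches the commit's claim that the bound on work is unchanged while chains themselves can grow longer than 64 entries.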


48233 Commits
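
Commit 061e368fe2 notes that extending bf16 to a wider float can be done as a simple shift rather than a libcall. This works because a bfloat16 value has the same sign and exponent layout as the upper 16 bits of an IEEE float32. A minimal Python sketch of the bit manipulation follows; `bf16_bits_to_float` is a hypothetical helper for illustration and does not model the SelectionDAG code.

```python
import struct


def bf16_bits_to_float(bits):
    """Widen a 16-bit bfloat16 pattern to float32 by shifting it into the high half."""
    if not 0 <= bits <= 0xFFFF:
        raise ValueError("expected a 16-bit pattern")
    # Place the bf16 bits in the top 16 bits of a 32-bit word, zero the rest,
    # then reinterpret that word as an IEEE float32.
    return struct.unpack("<f", struct.pack("<I", bits << 16))[0]


one = bf16_bits_to_float(0x3F80)        # bf16 encoding of 1.0
minus_two = bf16_bits_to_float(0xC000)  # bf16 encoding of -2.0
```

The conversion is exact in this direction, since every bf16 value is representable in float32.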