llvm-project

Author	SHA1	Message	Date
Sameer Sahasrabuddhe	d9847cde48	[GlobalISel] convergent intrinsics Introduced the convergent equivalent of the existing G_INTRINSIC opcodes: - G_INTRINSIC_CONVERGENT - G_INTRINSIC_CONVERGENT_W_SIDE_EFFECTS Out of the targets that currently have some support for GlobalISel, the patch assumes that the convergent intrinsics are only relevant to SPIRV and AMDGPU. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D154766	2023-07-31 12:15:39 +05:30
Jay Foad	e2e3f06813	Revert "[MachineScheduler] Track physical register dependencies per-regunit" This reverts commit 1a54671d5405a39de362e9692ce963c0638023bc. It was causing lit test failures in a LLVM_ENABLE_EXPENSIVE_CHECKS build.	2023-07-29 18:05:25 +01:00
Jay Foad	1a54671d54	[MachineScheduler] Track physical register dependencies per-regunit Change the scheduler's physical register dependency tracking from registers-and-their-aliases to regunits. This has a couple of advantages when subregisters are used: - The dependency tracking is more accurate and creates fewer useless edges in the dependency graph. An AMDGPU example, edited for clarity: SU(0): $vgpr1 = V_MOV_B32 $sgpr0 SU(1): $vgpr1 = V_ADDC_U32 0, $vgpr1 SU(2): $vgpr0_vgpr1 = FLAT_LOAD_DWORDX2 $vgpr0_vgpr1, 0, 0 There is a data dependency on $vgpr1 from SU(0) to SU(1) and from SU(1) to SU(2). But the old dependency tracking code also added a useless edge from SU(0) to SU(2) because it thought that SU(0)'s def of $vgpr1 aliased with SU(2)'s use of $vgpr0_vgpr1. - On targets like AMDGPU that make heavy use of subregisters, each register can have a huge number of aliases - it can be quadratic in the size of the largest defined register tuple. There is a much lower bound on the number of regunits per register, so iterating over regunits is faster than iterating over aliases. The LLVM compile-time tracker shows a tiny overall improvement of 0.03% on X86. I expect a larger compile-time improvement on targets like AMDGPU. Differential Revision: https://reviews.llvm.org/D156552	2023-07-29 15:34:53 +01:00
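The difference between alias-based and regunit-based tracking in the commit above can be illustrated with a small model. This is only a sketch under stated assumptions: the `REGUNITS` table, the unit names, and both functions are hypothetical and heavily simplified; they are not the scheduler's actual data structures. The model walks the three scheduling units from the commit message bottom-up and records def-to-use edges.

```python
# Hypothetical register model: each register maps to the regunits it covers.
REGUNITS = {
    "vgpr0": {"u0"},
    "vgpr1": {"u1"},
    "vgpr0_vgpr1": {"u0", "u1"},
    "sgpr0": {"s0"},
}

# (defs, uses) for each scheduling unit from the commit message.
SUNITS = [
    (["vgpr1"], ["sgpr0"]),              # SU(0): $vgpr1 = V_MOV_B32 $sgpr0
    (["vgpr1"], ["vgpr1"]),              # SU(1): $vgpr1 = V_ADDC_U32 0, $vgpr1
    (["vgpr0_vgpr1"], ["vgpr0_vgpr1"]),  # SU(2): $vgpr0_vgpr1 = FLAT_LOAD_DWORDX2 ...
]


def edges_by_alias(sunits):
    """Track pending uses per register; a def hits every overlapping register."""
    uses = {}  # register name -> set of SU indices still waiting for a def
    edges = set()
    for idx in reversed(range(len(sunits))):
        defs, reads = sunits[idx]
        for d in defs:
            for reg, readers in uses.items():
                if REGUNITS[reg] & REGUNITS[d]:  # reg aliases d
                    edges.update((idx, r) for r in readers)
            uses.pop(d, None)  # only uses of the exact register are retired
        for r in reads:
            uses.setdefault(r, set()).add(idx)
    return edges


def edges_by_regunit(sunits):
    """Track pending uses per regunit; a def retires exactly the units it writes."""
    uses = {}  # regunit -> set of SU indices still waiting for a def
    edges = set()
    for idx in reversed(range(len(sunits))):
        defs, reads = sunits[idx]
        for d in defs:
            for unit in REGUNITS[d]:
                edges.update((idx, r) for r in uses.pop(unit, set()))
        for r in reads:
            for unit in REGUNITS[r]:
                uses.setdefault(unit, set()).add(idx)
    return edges
```

In this model the alias-based walk produces the extra SU(0) to SU(2) edge described in the commit message, while the regunit-based walk keeps only SU(0) to SU(1) and SU(1) to SU(2).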
Jay Foad	5a64c89c8d	[MachineScheduler] Test case for physical register dependencies Differential Revision: https://reviews.llvm.org/D156551	2023-07-29 15:34:53 +01:00
Matt Arsenault	3240ae7034	AMDGPU/GlobalISel: Set dead on scc on manually selected instructions In SelectionDAG InstrEmitter automatically puts dead flags on unused physreg defs everywhere. The generated selectors should also set dead on physreg defs that were not used in the pattern.	2023-07-28 14:14:06 -04:00
Jeffrey Byrnes	391249d1af	[AMDGPU] Allow 8,16 bit sources in calculateSrcByte This is required for many trees produced in practice for i8 CodeGen. Differential Revision: https://reviews.llvm.org/D155864 Change-Id: Iac01d183d9998b15138bdc7a5051e3bed338e7d9	2023-07-28 09:50:21 -07:00
Matt Arsenault	95e5a461f5	AMDGPU: Always custom lower extract_subvector The patterns were ripped out in a4a3ac10cb1a40ccebed4e81cd7e94f1eb71602d so this always needs to be custom lowered. I absolutely hate how difficult it is to write tests for these, I have no doubt there are more of these hidden. Fixes #64142	2023-07-27 08:46:44 -04:00
Vitaly Buka	a496c8be6e	Revert "[CodeGen]Allow targets to use target specific COPY instructions for live range splitting" And dependent commits. Details in D150388. This reverts commit 825b7f0ca5f2211ec3c93139f98d1e24048c225c. This reverts commit 7a98f084c4d121244ef7286bc6503b6a181d446e. This reverts commit b4a62b1fa546312d882fa12dfdcd015177d66826. This reverts commit b7836d856206ec39509d42529f958c920368166b. No conflicts in the code, few tests had conflicts in autogenerated CHECKs: llvm/test/CodeGen/Thumb2/mve-float32regloops.ll llvm/test/CodeGen/AMDGPU/fix-frame-reg-in-custom-csr-spills.ll Reviewed By: alexfh Differential Revision: https://reviews.llvm.org/D156381	2023-07-26 22:13:32 -07:00
Pravin Jagtap	1462053608	[AMDGPU] Propagate constants for llvm.amdgcn.wave.reduce.umin/umax Reviewed By: arsenm, #amdgpu Differential Revision: https://reviews.llvm.org/D156077	2023-07-26 23:46:01 -04:00
pvanhout	a8aabba587	[AMDGPU] Fix PromoteAlloca Subvector Stores for Single Elements The previous condition was incorrect in some cases, like storing <2 x i32> into a double. If IndexVal was >0, we ended up never storing anything. Reviewed By: #amdgpu, arsenm Differential Revision: https://reviews.llvm.org/D156308	2023-07-26 13:21:21 +02:00
pvanhout	6a767fbc36	[AMDGPU] Precommit tests for D156308 Also includes another testcase that's unrelated, it's just a sanity check. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D156309	2023-07-26 13:21:20 +02:00
Corbin Robeck	7a4968b5a3	[AMDGPU] Add dynamic stack bit info to kernel-resource-usage Rpass output In code object 5 (https://llvm.org/docs/AMDGPUUsage.html#code-object-v5-metadata) the AMDGPU backend added the .uses_dynamic_stack bit to the kernel metadata to identify kernels which have compile time indeterminable stack usage (indirect function calls and recursion mainly). This patch adds this information to the output of the kernel-resource-usage remarks. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D156040 Author: Corbin Robeck <corbin.robeck@amd.com>	2023-07-25 12:20:13 -07:00
Kevin P. Neal	76c22b18ea	[FPEnv][AMDGPU] Correct strictfp tests. Correct AMDGPU strictfp tests to follow the rules documented in the LangRef: https://llvm.org/docs/LangRef.html#constrained-floating-point-intrinsics Mostly these tests just needed the strictfp attribute on function definitions. I've also removed the strictfp attribute from uses of the constrained intrinsics because it comes by default since D154991, but I only did this in tests I was changing anyway. I also removed attributes added to declare lines of intrinsics. The attributes of intrinsics cannot be changed in a test so I eliminated attempts to do so. Test changes verified with D146845.	2023-07-25 13:24:46 -04:00
Matt Arsenault	e3fd8f83a8	AMDGPU: Correctly expand f64 sqrt intrinsic rocm-device-libs and llpc were avoiding using f64 sqrt intrinsics in favor of their own expansions. Port the expansion into the backend. Both of these users should be updated to call the intrinsic instead. The library and llpc expansions are slightly different. llpc uses an ldexp to do the scale; the library uses a multiply. Use ldexp to do the scale instead of the multiply. I believe v_ldexp_f64 and v_mul_f64 are always the same number of cycles, but it's cheaper to materialize the 32-bit integer constant than the 64-bit double constant. The libraries have another fast version of sqrt which will be handled separately. I am tempted to do this in an IR expansion instead. In the IR we could take advantage of computeKnownFPClass to avoid the 0-or-inf argument check.	2023-07-25 07:54:11 -04:00
Matt Arsenault	47b3ada432	AMDGPU: Add more sqrt f64 lowering tests Almost all permutations of the flags are potentially relevant.	2023-07-25 07:54:11 -04:00
pvanhout	3cd4afce5b	[AMDGPU] Allow vector access types in PromoteAllocaToVector Depends on D152706 Solves SWDEV-408279 Reviewed By: #amdgpu, arsenm Differential Revision: https://reviews.llvm.org/D155699	2023-07-25 07:44:48 +02:00
pvanhout	3890a3b113	[AMDGPU] Use SSAUpdater in PromoteAlloca This allows PromoteAlloca to not be reliant on a second SROA run to remove the alloca completely. It just does the full transformation directly. Note PromoteAlloca is still reliant on SROA running first to canonicalize the IR. For instance, PromoteAlloca will no longer handle aggregate types because those should be simplified by SROA before reaching the pass. Reviewed By: #amdgpu, arsenm Differential Revision: https://reviews.llvm.org/D152706	2023-07-25 07:44:47 +02:00
Matt Arsenault	0d797b71eb	RegisterCoalescer: Fix empty subrange verifier error In this example an implicit def had live-out undef subrange defs. After coalescing with the def from a previous block, the undef-defed lanes are no longer live out of the block in the new interval. An empty subrange was tentatively created for these lanes, but it must be deleted.	2023-07-24 12:18:34 -04:00
Matt Arsenault	2a53b6c06b	RegisterCoalescer: Fix verifier error on redef of subregister for live out implicit_defs A live out implicit_def wasn't deleted, but the subranges weren't correctly updated. The main range was correct but the def corresponding to the initial main range def instruction was missing from the lanes redefined in another block. The written lanes are not quite the same as the valid lanes in the case of an implicit_def. Fixes verifier error in blender. There is an additional verifier in some of the testcase variants where an empty subrange remains.	2023-07-24 12:18:34 -04:00
Matt Arsenault	e561e7cb48	AMDGPU: Implement combineRepeatedFPDivisors	2023-07-24 11:19:36 -04:00
Pravin Jagtap	d163b76ce3	[AMDGPU] Fix llvm.amdgcn.wave.reduce.umax/umin MIR tests Fixes the MIR tests reported in https://lab.llvm.org/buildbot/#/builders/16/builds/51955 Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D156125	2023-07-24 10:19:37 -04:00
Pravin Jagtap	c48ed93cf8	[AMDGPU] Add llvm.amdgcn.wave.reduce.umin/umax Intrinsic. When the input to the intrinsic is a uniform value, the reduced value is the same as the input; if the input is divergent, we need to iterate over all active lanes of the wavefront to perform the reduction. The control flow for a loop has been set up, which iterates over only the active lanes to perform the reduction. Introduced WAVE_REDUCE_UMIN_PSEUDO_U32 and WAVE_REDUCE_UMAX_PSEUDO_U32 pseudos, which are lowered post-ISel (in `EmitInstrWithCustomInserter`). Reviewed By: arsenm, #amdgpu Differential Revision: https://reviews.llvm.org/D154858	2023-07-24 00:06:00 -04:00
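The reduction loop described in the commit above can be sketched in ordinary code. This is a minimal illustration, assuming the wave is modelled as a Python list of per-lane values and the set of active lanes as an integer bit mask; the function names and the mask representation are hypothetical and do not reflect the instructions the backend actually emits.

```python
def wave_reduce_umin(values, exec_mask):
    """Unsigned 32-bit min over active lanes; bit i of exec_mask marks lane i active."""
    result = 0xFFFFFFFF  # identity element for unsigned 32-bit min
    remaining = exec_mask
    while remaining:
        lane = (remaining & -remaining).bit_length() - 1  # index of lowest active lane
        result = min(result, values[lane])
        remaining &= remaining - 1  # retire that lane and continue
    return result


def wave_reduce_umax(values, exec_mask):
    """Unsigned 32-bit max over active lanes; bit i of exec_mask marks lane i active."""
    result = 0  # identity element for unsigned max
    remaining = exec_mask
    while remaining:
        lane = (remaining & -remaining).bit_length() - 1
        result = max(result, values[lane])
        remaining &= remaining - 1
    return result
```

Inactive lanes never contribute, and when every active lane holds the same value the loop simply returns that value, which matches the uniform-input case in the commit message.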
Matt Arsenault	8406c3568a	AMDGPU: Implement new 2ulp fdiv lowering Extends the new frexp scaled reciprocal to the general case. The reciprocal case is just the same thing when frexp of 1 is constant folded. Could probably clean up the code to rely on that constant folding. Improves results for the IEEE path for the default OpenCL division. We used to only emit the fdiv.fast intrinsic with a 2.5 ulp accuracy threshold with DAZ, which uses explicit range checks. This gives us a better fast option with the default IEEE behavior.	2023-07-21 18:55:42 -04:00
Matt Arsenault	6699c37028	AMDGPU: Refactor AMDGPUCodeGenPrepare fdiv handling NFC-ish. Does trigger some reordering of the fdiv scalarization. Also skips scalarizing in more cases where nothing was going to happen. We can still scalarize in some no-op edge cases. https://reviews.llvm.org/D155740	2023-07-21 18:55:42 -04:00
Matt Arsenault	8287f3af9d	AMDGPU: Overhaul and improve rcp and rsq f32 formation The highlight change is a new denormal safe 1ulp lowering which uses rcp after using frexp to perform input scaling. This saves 2 instructions compared to other implementations which performed an explicit denormal range change. This improves the OpenCL default, and requires a flag for HIP. I don't believe there's any flag wired up for OpenMP to emit the necessary fpmath metadata. This provides several improvements and changes that were hard to separate without regressing one case or another. Disturbingly the OpenCL conformance test seems to have the reciprocal test commented out. I locally hacked it back in to test this. Starts introducing f32 rsq intrinsics in AMDGPUCodeGenPrepare. Like the rcp case, we could do this in codegen if !fpmath were preserved (although we would lose some computeKnownFPClass tricks). Start requiring contract flags to form rsq. The rsq fusion actually improves the result from ~2ulp to ~1ulp. We have some older fusion in codegen which only keys off unsafe math which should be refined. Expand rsq patterns by checking for denormal inputs and pre/post multiplying like the current library code does. We also take advantage of computeKnownFPClass to avoid the scaling when we can statically prove the input cannot be a denormal. We could do the same for the rcp case, but unlike rsq a large input can underflow to denormal. We need additional upper bound exponent checks on the input in order to do the same for rcp. This rsq handling also now starts handling the negated case. We introduce rsq with an fneg. In the case the fneg doesn't fold into its user, it's a neutral change but provides improvement if it is foldable as a source modifier. Also starts respecting the arcp attribute properly, and more strictly interprets afn. We were previously interpreting afn as implying you could do the reciprocal expansion of an fdiv. 
The codegen handling of these also needs to be revisited. This also effectively introduces the optimization combineRepeatedFPDivisors enables, just done in the IR instead (and only for f32). This is almost across the board better. The one minor regression is for gfx6/buggy frexp case where for multiple reciprocals, we could previously reuse rematerialized constants per instance (it's neutral for a single rcp). The fdiv.fast and sqrt handling need to be revisited next. https://reviews.llvm.org/D155593	2023-07-21 16:35:53 -04:00
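The frexp-based input scaling mentioned in the commit above can be shown with a short numeric sketch. It assumes Python double-precision floats rather than f32, and uses an ordinary division as a stand-in for the hardware rcp instruction; `rcp_via_frexp` is a hypothetical name, not a function from the patch. The idea is that the mantissa returned by frexp is never denormal, so the reciprocal is taken on a well-scaled value and the exponent is restored afterwards.

```python
import math


def rcp_via_frexp(x):
    """Compute 1/x by splitting x into mantissa and exponent first."""
    mant, exp = math.frexp(x)          # x == mant * 2**exp with 0.5 <= |mant| < 1
    approx = 1.0 / mant                # stand-in for the hardware reciprocal
    return math.ldexp(approx, -exp)    # undo the scaling: 1/x == (1/mant) * 2**-exp
```

For inputs whose reciprocal is a normal number, scaling by a power of two is exact, so this sketch returns the same value as a direct `1.0 / x`.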
Matt Arsenault	37512d7629	AMDGPU: Add baseline test for fdiv combine	2023-07-21 16:04:12 -04:00
Jay Foad	e45a0c2994	[AMDGPU][RFC] Update isLegalAddressingMode for GFX9 SMEM signed offsets Differential Revision: https://reviews.llvm.org/D155587	2023-07-21 10:56:43 +01:00
Jay Foad	787bef0bee	[AMDGPU] Add tests for SMEM addressing modes in CodeGenPrepare Differential Revision: https://reviews.llvm.org/D155854	2023-07-21 10:56:43 +01:00
Matt Arsenault	d33ab05467	AMDGPU: Add flag to disable fdiv processing in IR pass We kind of have to have multiple implementations of fdiv split between the two selectors with some pre-processing. Add yet another test to check for consistency of interpretation of flag combinations. We have quite a bit of test redundancy here already, but there are so many possible interesting permutations it's unwieldy to cover every detail in any one of them. We have a number of overlapping fdiv tests but it's hard to follow everything going on as it is.	2023-07-20 19:51:15 -04:00
Matt Arsenault	b2d58b596c	AMDGPU: Expand rsq testing to cover contract flag The 1.0/sqrt(x) -> rsq(x) fold increases precision and probably needs a contract flag.	2023-07-20 19:51:15 -04:00
Matt Arsenault	fb54afd1b7	AMDGPU: Fold fsub [+-0] into fneg when folding source modifiers This isn't always folded to fneg for a freestanding fsub depending on the denormal mode. When matching source modifiers, we're implicitly canonicalizing the input so we can fold it here. Doesn't bother handling the VOP3P case since it's only relevant with DAZ, which nobody really uses with f16. For f64, tests show an existing bug where DAGCombiner tries to respect the denormal mode for fsub -0, x, but not after it's lowered to fadd -0, (fneg x). Either the fold is wrong or we shouldn't restrict the fsub case based on the denormal mode. https://reviews.llvm.org/D155652	2023-07-20 19:29:40 -04:00
Matt Arsenault	881e9f2934	AMDGPU: Regenerate test checks Mostly a workaround for recent reverts in update_test_checks	2023-07-20 19:26:35 -04:00
Matt Arsenault	ca34f1bdcd	AMDGPU: Add baseline test for folding fsub into fneg modifiers	2023-07-20 18:29:35 -04:00
Matt Arsenault	0295513238	AMDGPU: Filter out contract flags when lowering exp It is unsafe to contract the fsub into the fmul. It also increases code size by duplicating a constant.	2023-07-20 18:14:24 -04:00
Matt Arsenault	076bc374fc	AMDGPU: Add some new baseline tests for exp lowering	2023-07-20 18:14:24 -04:00
Jingu Kang	351b4c17dd	Revert "[MachineLICM] Handle Subloops" This reverts commit 50dd383d08670960540fecb4b48c0f0429fbfba3.	2023-07-20 17:12:25 +01:00
Jingu Kang	50dd383d08	[MachineLICM] Handle Subloops Following discussion on https://reviews.llvm.org/D154205, make the MachineLICM pass handle subloops by visiting the outermost loop's blocks only once. Differential Revision: https://reviews.llvm.org/D154205	2023-07-20 16:39:13 +01:00
Johannes Doerfert	d015018cb7	[AMDGPUAttributor][FIX] No endless recursion for recursive initializers Fixes: https://github.com/llvm/llvm-project/issues/63956	2023-07-19 10:27:01 -07:00
Ivan Kosarev	1b32427213	[AMDGPU] Combine the SDAG and GISel versions of the fmed3.ll test. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D155590	2023-07-19 11:42:36 +01:00
Jay Foad	7fa7a08f21	[AMDGPU] Insert s_nop before s_sendmsg sendmsg(MSG_DEALLOC_VGPRS) Differential Revision: https://reviews.llvm.org/D155681	2023-07-19 10:33:11 +01:00
Jingu Kang	62ed3ff4bb	Revert "[MachineLICM] Handle Subloops" This reverts commit 33e60484d750291e99301e29e60fe72c8fa48ccd.	2023-07-19 10:30:50 +01:00
Jay Foad	4389f9b2ad	[AMDGPU] Regenerate is.fpclass checks	2023-07-19 09:32:22 +01:00
Matt Arsenault	c28e09c8d1	AMDGPU: Preserve flags in fdiv_fast lowering We were dropping the flags and thus blocking contract into potential fadd users. GlobalISel was already preserving the flags here. https://reviews.llvm.org/D155443	2023-07-18 06:57:07 -04:00
Matt Arsenault	4a81283b94	AMDGPU: Generate and add fdiv tests Prepare for new lowering strategies because we somehow didn't have enough of them already.	2023-07-18 06:38:05 -04:00
Matt Arsenault	cdfdfe7ccc	AMDGPU: Add some additional rcp/rsq tests	2023-07-18 06:37:15 -04:00
Matt Arsenault	3f8ef57bed	MachineSink: Fix sinking VGPR def out of a divergent loop This fixes sinking a VGPR def out of a loop past the reconvergence point at the SI_END_CF. There was a prior fix which introduced blockPrologueInterferes (D121277) to fix the same basic problem for the post RA sink. This also had the special case isIgnorableUse case which was incorrect, because in some contexts the exec use is not ignorable. I'm thinking about a new way to represent this which will avoid needing hasIgnorableUse and isBasicBlockPrologue, which would function more like the exception handling. Fixes: SWDEV-407790 https://reviews.llvm.org/D155343	2023-07-18 06:15:50 -04:00
Matt Arsenault	d5ab379506	AMDGPU: Add baseline test for broken machine sinking	2023-07-18 06:15:50 -04:00
Konstantina Mitropoulou	4c42ab1199	[DAGCombiner] Change foldAndOrOfSETCC() to optimize and/or patterns CMP(A,C)\|\|CMP(B,C) => CMP(MIN/MAX(A,B), C) CMP(A,C)&&CMP(B,C) => CMP(MIN/MAX(A,B), C) This first patch handles integer types. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D153502	2023-07-17 17:13:47 -07:00
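The integer identities behind the fold in the commit above can be checked by brute force. This is a sketch only: `fold_or` and `fold_and` are hypothetical helpers that pick min or max depending on the comparison direction, and they say nothing about how the DAGCombiner chooses the operation or which predicates it handles.

```python
import itertools
import operator

LESS = (operator.lt, operator.le)


def fold_or(cmp, a, b, c):
    """CMP(A,C) || CMP(B,C) rewritten as a single compare against C."""
    pick = min if cmp in LESS else max
    return cmp(pick(a, b), c)


def fold_and(cmp, a, b, c):
    """CMP(A,C) && CMP(B,C) rewritten as a single compare against C."""
    pick = max if cmp in LESS else min
    return cmp(pick(a, b), c)


def check_identities(lo=-3, hi=4):
    """Exhaustively compare the folded and unfolded forms on a small range."""
    for cmp in (operator.lt, operator.le, operator.gt, operator.ge):
        for a, b, c in itertools.product(range(lo, hi), repeat=3):
            if fold_or(cmp, a, b, c) != (cmp(a, c) or cmp(b, c)):
                return False
            if fold_and(cmp, a, b, c) != (cmp(a, c) and cmp(b, c)):
                return False
    return True
```

For example, `a < c or b < c` holds exactly when `min(a, b) < c`, and `a < c and b < c` holds exactly when `max(a, b) < c`; the greater-than forms swap min and max.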
Konstantina Mitropoulou	11cd92a70f	[NFC] Tests for future commit in DAGCombiner Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D153479	2023-07-17 17:08:32 -07:00
Matt Arsenault	04185f0b0b	AMDGPU: Fix broken denormal constant folding of canonicalize This needs to consider the dynamic denormal mode. It should be possible to implement a runtime DAZ check with a canonicalize.	2023-07-17 19:54:20 -04:00

6615 Commits