llvm-project

Author	SHA1	Message	Date
sstipano	eb16acedf5	[AMDGPU] Overload resource descriptor in image intrinsics. (#107255)	2024-09-27 15:33:52 +02:00
Jay Foad	d2d947b7e2	[AMDGPU] Fold llvm.amdgcn.cvt.pkrtz when either operand is fpext (#108237) This also generalizes the Undef handling and adds Poison handling.	2024-09-18 09:37:04 +01:00
Jay Foad	f142f8afe2	[AMDGPU] Improve uniform argument handling in InstCombineIntrinsic (#105812) Common up handling of intrinsics that are a no-op on uniform arguments. This catches a couple of new cases: readlane (readlane x, y), z -> readlane x, y (for any z, does not have to equal y). permlane64 (readfirstlane x) -> readfirstlane x (and likewise for any other uniform argument to permlane64).	2024-08-23 14:43:31 +01:00
Changpeng Fang	06ab30b574	[AMDGPU] Constant folding of llvm.amdgcn.trig.preop (#98562) If the parameters (the input and segment select) passed to the amdgcn.trig.preop intrinsic are compile-time constants, we pre-compute the output of amdgcn.trig.preop on the CPU and replace the uses with the computed constant. This work extends the patch https://reviews.llvm.org/D120150 to provide complete coverage. For the segment select, only src1[4:0] are used. A segment select is invalid if we are selecting the 53-bit segment beyond the [1200:0] range of the 2/PI table. 0 is returned when a segment select is not valid.	2024-07-18 09:40:37 -07:00
Shilei Tian	f56cdd4a45	[NFC] Use named variable for test case `select-from-load.ll`	2024-07-16 19:46:46 -04:00
Shilei Tian	f38baad3e7	[InstCombine] Fix a crash in `PointerReplacer` (#98987) A crash could happen in `PointerReplacer::replace` when constructing a new select instruction and there is no replacement for one of its operands. This can happen when the operand is a load instruction that has been replaced earlier such that the operand itself is already the new value. In this case, it is not in the replacement map and `getReplacement` simply returns nullptr. Fix SWDEV-472192.	2024-07-16 13:17:24 -04:00
AtariDreams	2399d87768	[Transforms] Let amdgcn take advantage of sin(-x) --> -sin(x) (#79700) We do it for amdgcn_cos, and we should do it for amdgcn_sin as well.	2024-06-30 09:09:36 +02:00
Vikram Hegde	35f7b60aa6	[AMDGPU] Extend permlane16, permlanex16 and permlane64 intrinsic lowering for generic types (#92725) These are incremental changes over #89217, with core logic being the same. This patch along with #89217 and #91190 should get us ready to enable 64 bit optimizations in atomic optimizer.	2024-06-26 09:24:09 +05:30
Vikram Hegde	5feb32ba92	[AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (#89217) This patch is intended to be the first of a series with end goal to adapt atomic optimizer pass to support i64 and f64 operations (along with removing all unnecessary bitcasts). This legalizes 64 bit readlane, writelane and readfirstlane ops pre-ISel --------- Co-authored-by: vikramRH <vikhegde@amd.com>	2024-06-25 14:35:19 +05:30
Jay Foad	18ec885a26	[RFC][AMDGPU] Remove old llvm.amdgcn.buffer.* and tbuffer intrinsics (#93801) They have been superseded by llvm.amdgcn.raw.buffer.* and llvm.amdgcn.struct.buffer.*.	2024-06-10 12:14:51 +01:00
Matt Arsenault	847c83f7cc	InstCombine: Process addrspacecast uses in PointerReplacer (#91953) This was looking through an addrspacecast, and not finding a later unfoldable cast to another address space. Fixes improperly deleting a required alloca + memcpy and introducing an illegal addrspacecast. This also required fixing some worklist management issues with addrspacecast, and assuming that only memcpy sources could need replacement. Regresses one test function, but this looks like it optimized before by accident. It never saw the pointer use by the call to readonly_callee, which should require insertion of a new cast. Fixes #68120	2024-05-15 07:02:31 +02:00
Matt Arsenault	c5b0da9d83	InstCombine: Preserve inbounds in PointerReplacer (#91735) This avoids spurious test changes in a future commit.	2024-05-13 13:49:09 +02:00
Valery Pykhtin	b8025d1482	Reapply "[AMDGPU] Add InstCombine rule for ballot.i64 intrinsic in wave32 mode." (#80303) Reapply #71556 with added lit test constraint: `REQUIRES: amdgpu-registered-target`. This reverts commit 9791e5414960f92396582b9e9ee503ac15799312.	2024-02-02 13:09:25 +01:00
Valery Pykhtin	9791e54149	Revert "[AMDGPU] Add InstCombine rule for ballot.i64 intrinsic in wave32 mode." (#78429) Reverts llvm/llvm-project#71556 Fixes failures: https://lab.llvm.org/buildbot/#/builders/188/builds/40541 https://lab.llvm.org/buildbot/#/builders/91/builds/21847 https://lab.llvm.org/buildbot/#/builders/98/builds/31671 https://lab.llvm.org/buildbot/#/builders/139/builds/57289	2024-01-17 14:12:07 +01:00
Valery Pykhtin	57b50ef017	[AMDGPU] Add InstCombine rule for ballot.i64 intrinsic in wave32 mode. (#71556) Substitute with zero-extended to i64 ballot.i32 intrinsic.	2024-01-17 17:02:05 +07:00
Mariusz Sikora	2b83ceee3d	[AMDGPU][GFX12] Default component broadcast store (#76212) For image and buffer stores the default behaviour on GFX12 is to set all unset components to the value of the first component. So if we pass only X component, it will be the same as XXXX, or XY same as XYXX. This patch simplifies the passed vector of components in InstCombine by removing components from the end that are equal to the first component. For image stores it also trims DMask if necessary. --------- Co-authored-by: Mateja Marjanovic <mmarjano@amd.com>	2024-01-12 08:26:08 +01:00
Nikita Popov	9d60e95bcd	[AMDGPU] Use poison instead of undef for non-demanded elements (#75914) Return poison instead of undef for non-demanded lanes in the AMDGPU demanded element simplification hook. Also bail out if dmask is 0, as this case has special semantics: "If DMASK==0, the TA overrides DMASK=1 and puts zeros in VGPR followed by LWE status if exists. TFE status is not generated since the fetch is dropped."	2023-12-20 11:01:59 +01:00
Nikita Popov	9d4557920f	[InstCombine] Don't treat undef as poison in demanded element simplification We can only set PoisonElts if the element is poison, not if it is undef.	2023-12-19 12:26:48 +01:00
Nikita Popov	a5f3415533	[InstCombine] Replace non-demanded undef vector with poison If an operand (esp to shufflevector or insertelement) is not demanded, canonicalize it from undef to poison.	2023-12-18 16:12:37 +01:00
Nikita Popov	e93d324adb	[InstCombine] Preserve poison in evaluateInDifferentElementOrder() Don't unnecessarily replace poison with undef.	2023-12-18 15:36:22 +01:00
Nikita Popov	6c9813aa02	[InstCombine] Check for poison instead of undef in shuffle combine Otherwise we may replace undef with poison. Note that a lot of tests regressing here already have variants that use poison instead of undef (often in a separate inseltpoison file), which is why I'm not adjusting them to the new pattern.	2023-12-18 15:19:16 +01:00
Mirko Brkušanin	26b14aedb7	[AMDGPU] CodeGen for GFX12 VIMAGE and VSAMPLE instructions (#75488)	2023-12-15 12:40:23 +01:00
Nikita Popov	c00f49cf12	[InstCombine] Remove instcombine-infinite-loop-threshold option This option has been superseded by the fixpoint verification functionality.	2023-09-21 15:30:05 +02:00
Matt Arsenault	edecb60481	Reapply "AMDGPU: Drop and auto-upgrade llvm.amdgcn.ldexp to llvm.ldexp" This reverts commit d9333e360a7c52587ab6e4328e7493b357fb2cf3.	2023-09-13 08:38:48 +03:00
Matt Arsenault	61c8af6792	AMDGPU: InstCombine amdgcn.sqrt.f16 to sqrt.f16 There's nothing special about f16 sqrt handling. https://reviews.llvm.org/D158090	2023-08-23 20:30:40 -04:00
Matt Arsenault	7c4aa3b37e	AMDGPU: InstCombine amdgcn.rcp(amdgcn.sqrt) -> amdgcn.rsq We currently have some wrong combines in the backend that approximately do this. https://reviews.llvm.org/D158002	2023-08-16 10:04:13 -04:00
Matt Arsenault	f19ee76f35	AMDGPU: Add baseline tests for rcp to rsq fold	2023-08-16 10:03:49 -04:00
Kevin P. Neal	1e7c79d362	[FPEnv][InstCombine] Correct strictfp tests. Correct InstCombine strictfp tests to follow the rules documented in the LangRef: https://llvm.org/docs/LangRef.html#constrained-floating-point-intrinsics Mostly these tests just needed the strictfp attribute on function definitions. After D154991 the constrained intrinsics have the strictfp attribute by default so they don't need it here, but other functions do. Test changes verified with D146845.	2023-08-02 13:03:10 -04:00
Jay Foad	70eafa391b	[InstCombine] Regenerate AMDGPU test checks	2023-07-14 15:28:55 +01:00
Matt Arsenault	5ccfc4543d	AMDGPU: Fold away mbcnt.hi in wave32 mode This will allow libraries to drop some of the special casing based on wave size.	2023-06-30 15:04:03 -04:00
Matt Arsenault	3680b57a88	AMDGPU: Add baseline tests for mbcnt.hi combine	2023-06-30 15:04:03 -04:00
Jay Foad	84313162bf	[AMDGPU] Stop replacing amdgcn.ballot(1) with amdgcn.s.getreg(exec) Rationale: - It does not enable any further IR simplifications. - It does not improve the generated code since the isel lowering of ballot also has special cases for 0 and 1. - getreg is "too powerful" since it can read from many different registers, so its intrinsic properties have to be set very conservatively. There is also a correctness problem that getreg can read from exec but it is currently not marked as convergent. Differential Revision: https://reviews.llvm.org/D153047	2023-06-16 17:15:52 +01:00
Mateja Marjanovic	7047cb5203	[AMDGPU] Trim trailing undefs from the end of image and buffer store Remove undef values from the end of the vector operand in image and buffer store instructions. Also instead of call to computeKnownFPClass, use only findScalarElement. Continuation of: 88421ea973916e Trim zero components from buffer and image stores Differential Revision: https://reviews.llvm.org/D152440	2023-06-15 15:19:36 +02:00
Matt Arsenault	c6aaa0b14f	AMDGPU: Perform basic folds on llvm.amdgcn.exp2	2023-06-15 07:01:06 -04:00
Matt Arsenault	6e934f2292	AMDGPU: Add baseline tests for llvm.amdgcn.exp2 folds	2023-06-15 07:01:01 -04:00
Matt Arsenault	10717f9294	AMDGPU: Add basic folds for llvm.amdgcn.log	2023-06-12 21:10:30 -04:00
Matt Arsenault	1269e45b09	AMDGPU: Add baseline instcombine test for llvm.amdgcn.log	2023-06-12 21:10:30 -04:00
Krzysztof Drewniak	faa2c678aa	[AMDGPU] Add buffer intrinsics that take resources as pointers In order to enable the LLVM frontend to better analyze buffer operations (and to potentially enable more precise analyses on the backend), define versions of the raw and structured buffer intrinsics that use `ptr addrspace(8)` instead of `<4 x i32>` to represent their rsrc arguments. The new intrinsics are named by replacing `buffer.` with `buffer.ptr`. One advantage to these intrinsic definitions is that, instead of specifying that a buffer load/store will read/write some memory, we can indicate that the memory read or written will be based on the pointer argument. This means that, for example, a read from a `noalias` buffer can be pulled out of a loop that is modifying a distinct buffer. In the future, we will define custom PseudoSourceValues that will allow us to package up the (buffer, index, offset) triples that buffer intrinsics contain and allow for more precise backend analysis. This work also enables creating address space 7, which represents manipulation of raw buffers using native LLVM load and store instructions. Where tests simply used a buffer intrinsic while testing some other code path (such as the tests for VGPR spills), they have been updated to use the new intrinsic form. Tests that are "about" buffer intrinsics (for instance, those that ensure that they codegen as expected) have been duplicated, either within existing files or into new ones. Depends on D145441 Reviewed By: arsenm, #amdgpu Differential Revision: https://reviews.llvm.org/D147547	2023-06-05 16:59:07 +00:00
Mateja Marjanovic	88421ea973	[AMDGPU] Trim zero components from buffer and image stores For image and buffer stores the default behaviour on GFX11 and older is to set all unset components to zero. So if we pass only X component it will be the same as X000, or XY same as XY00. This patch simplifies the passed vector of components in InstCombine by removing zero components from the end. For image stores it also trims DMask if necessary. Reviewed by: arsenm, foad, nhaehnle, piotr	2023-06-05 12:30:21 +02:00
Matt Arsenault	8609df7c6e	AMDGPU: Refine undef handling for llvm.amdgcn.class intrinsic This barely matters since 99% are converted to the generic intrinsic now, and the only real difference is the target intrinsic supports a variable test mask. Start propagating poison. Prefer folding to a defined result (false) for an undef test mask. Propagate undef for the first operand.	2023-06-01 18:35:55 -04:00
Matt Arsenault	9ef1333bf4	AMDGPU: Replace certain llvm.amdgcn.class uses with llvm.is.fpclass Most transforms should now be performed on llvm.is.fpclass. Unlike the generic intrinsic, this supports variable test masks.	2023-05-24 21:49:52 +01:00
Matt Arsenault	f74bb32694	AMDGPU: Add some new tests for class undef/poison handling	2023-05-24 16:54:39 +01:00
Mateja Marjanovic	9c8c31eea4	Revert "[AMDGPU] Trim zero components from buffer and image stores" This reverts commit 3181a6e3e7dae9292782216a55c5e1f0583c1668.	2023-05-18 17:02:01 +02:00
Matt Arsenault	8f3e64624c	AMDGPU: Fold fmed3 of fpext sources to f16 fmed3 InstCombine already does this for minnum/maxnum. If we also apply this to fmed3, we don't need to explicitly use 16-bit fmed3 if we're not sure the target supports 16-bit instructions yet.	2023-05-18 08:34:46 +01:00
Tobias Hieta	f84bac329b	[NFC][Py Reformat] Reformat lit.local.cfg python files in llvm This is a follow-up to b71edfaa4ec3c998aadb35255ce2f60bba2940b0 since I forgot the lit.local.cfg files in that one. Reformatting is done with `black`. If you end up having problems merging this commit because you have made changes to a python file, the best way to handle that is to run git checkout --ours <yourfile> and then reformat it with black. If you run into any problems, post to discourse about it and we will try to help. RFC Thread below: https://discourse.llvm.org/t/rfc-document-and-standardize-python-code-style Reviewed By: barannikov88, kwk Differential Revision: https://reviews.llvm.org/D150762	2023-05-17 17:03:15 +02:00
Nikita Popov	605f0a46dc	[InstCombine] Use IRBuilder in evaluateInDifferentElementOrder() This ensures that the new instructions get reprocessed in the same iteration. This should be largely NFC, apart from worklist order effects and naming changes, as seen in the test diff.	2023-05-17 15:07:36 +02:00
Mateja Marjanovic	3181a6e3e7	[AMDGPU] Trim zero components from buffer and image stores For image and buffer stores the default behaviour on GFX11 and older is to set all unset components to zero. So if we pass only X component it will be the same as X000, or XY same as XY00. This patch simplifies the passed vector of components in InstCombine by removing zero components from the end. For image stores it also trims DMask if necessary. Reviewed By: foad, arsenm Differential Revision: https://reviews.llvm.org/D146737	2023-05-15 18:23:27 +02:00
Matt Arsenault	cc54f8eec7	AMDGPU: Add baseline tests for fmed3 shrinking combine	2023-05-10 08:02:11 +01:00
Krzysztof Drewniak	f0415f2a45	Re-land "[AMDGPU] Define data layout entries for buffers" Re-land D145441 with data layout upgrade code fixed to not break OpenMP. This reverts commit 3f2fbe92d0f40bcb46db7636db9ec3f7e7899b27. Differential Revision: https://reviews.llvm.org/D149776	2023-05-03 19:43:56 +00:00
Krzysztof Drewniak	3f2fbe92d0	Revert "[AMDGPU] Define data layout entries for buffers" This reverts commit f9c1ede2543b37fabe9f2d8f8fed5073c475d850. Differential Revision: https://reviews.llvm.org/D149758	2023-05-03 16:11:00 +00:00

138 Commits
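
The store-trimming commits in this log (88421ea973 for GFX11 and older, 2b83ceee3d for GFX12) describe dropping trailing vector components that the hardware would fill in by default. A minimal Python sketch of that idea follows. It is an illustration only, not the InstCombine code: the function names are hypothetical, components are modeled as plain integers, keeping at least the first component is an assumption, and the DMask trimming mentioned for image stores is not modeled.

```python
from typing import List, Sequence


def trim_trailing_zeros(components: Sequence[int]) -> List[int]:
    """Model of the GFX11-and-older rule: unset components default to zero,
    so trailing zero components are redundant (XY00 behaves like XY)."""
    n = len(components)
    # Assumption: always keep at least the first component.
    while n > 1 and components[n - 1] == 0:
        n -= 1
    return list(components[:n])


def trim_trailing_broadcast(components: Sequence[int]) -> List[int]:
    """Model of the GFX12 rule: unset components default to the first
    component, so trailing copies of it are redundant (XYXX behaves like XY)."""
    n = len(components)
    # The first component is the broadcast source, so it is never dropped.
    while n > 1 and components[n - 1] == components[0]:
        n -= 1
    return list(components[:n])


if __name__ == "__main__":
    print(trim_trailing_zeros([7, 9, 0, 0]))      # XY00 shrinks to XY
    print(trim_trailing_broadcast([7, 9, 7, 7]))  # XYXX shrinks to XY
```

Only a trailing run is removed in either model: a matching component in the middle of the vector (for example the zero in X0Z) has to stay, because dropping it would shift the later components.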