llvm-project

Author	SHA1	Message	Date
Valery Pykhtin	b8025d1482	Reapply "[AMDGPU] Add InstCombine rule for ballot.i64 intrinsic in wave32 mode." (#80303) Reapply #71556 with added lit test constraint: `REQUIRES: amdgpu-registered-target`. This reverts commit 9791e5414960f92396582b9e9ee503ac15799312.	2024-02-02 13:09:25 +01:00
Valery Pykhtin	9791e54149	Revert "[AMDGPU] Add InstCombine rule for ballot.i64 intrinsic in wave32 mode." (#78429) Reverts llvm/llvm-project#71556 Fixes failures: https://lab.llvm.org/buildbot/#/builders/188/builds/40541 https://lab.llvm.org/buildbot/#/builders/91/builds/21847 https://lab.llvm.org/buildbot/#/builders/98/builds/31671 https://lab.llvm.org/buildbot/#/builders/139/builds/57289	2024-01-17 14:12:07 +01:00
Valery Pykhtin	57b50ef017	[AMDGPU] Add InstCombine rule for ballot.i64 intrinsic in wave32 mode. (#71556) Substitute with zero-extended to i64 ballot.i32 intrinsic.	2024-01-17 17:02:05 +07:00
Mariusz Sikora	2b83ceee3d	[AMDGPU][GFX12] Default component broadcast store (#76212) For image and buffer stores the default behaviour on GFX12 is to set all unset components to the value of the first component. So if we pass only X component, it will be the same as XXXX, or XY same as XYXX. This patch simplifies the passed vector of components in InstCombine by removing components from the end that are equal to the first component. For image stores it also trims DMask if necessary. --------- Co-authored-by: Mateja Marjanovic <mmarjano@amd.com>	2024-01-12 08:26:08 +01:00
Nikita Popov	9d60e95bcd	[AMDGPU] Use poison instead of undef for non-demanded elements (#75914) Return poison instead of undef for non-demanded lanes in the AMDGPU demanded element simplification hook. Also bail out if dmask is 0, as this case has special semantics: > If DMASK==0, the TA overrides DMASK=1 and puts zeros in VGPR followed by > LWE status if exists. TFE status is not generated since the fetch is dropped.	2023-12-20 11:01:59 +01:00
Nikita Popov	9d4557920f	[InstCombine] Don't treat undef as poison in demanded element simplification We can only set PoisonElts if the element is poison, not if it is undef.	2023-12-19 12:26:48 +01:00
Nikita Popov	a5f3415533	[InstCombine] Replace non-demanded undef vector with poison If an operand (esp to shufflevector or insertelement) is not demanded, canonicalize it from undef to poison.	2023-12-18 16:12:37 +01:00
Nikita Popov	e93d324adb	[InstCombine] Preserve poison in evaluateInDifferentElementOrder() Don't unnecessarily replace poison with undef.	2023-12-18 15:36:22 +01:00
Nikita Popov	6c9813aa02	[InstCombine] Check for poison instead of undef in shuffle combine Otherwise we may replace undef with poison. Note that a lot of tests regressing here already have variants that use poison instead of undef (often in a separate inseltpoison file), which is why I'm not adjusting them to the new pattern.	2023-12-18 15:19:16 +01:00
Mirko Brkušanin	26b14aedb7	[AMDGPU] CodeGen for GFX12 VIMAGE and VSAMPLE instructions (#75488)	2023-12-15 12:40:23 +01:00
Nikita Popov	c00f49cf12	[InstCombine] Remove instcombine-infinite-loop-threshold option This option has been superseded by the fixpoint verification functionality.	2023-09-21 15:30:05 +02:00
Matt Arsenault	edecb60481	Reapply "AMDGPU: Drop and auto-upgrade llvm.amdgcn.ldexp to llvm.ldexp" This reverts commit d9333e360a7c52587ab6e4328e7493b357fb2cf3.	2023-09-13 08:38:48 +03:00
Matt Arsenault	61c8af6792	AMDGPU: InstCombine amdgcn.sqrt.f16 to sqrt.f16 There's nothing special about f16 sqrt handling. https://reviews.llvm.org/D158090	2023-08-23 20:30:40 -04:00
Matt Arsenault	7c4aa3b37e	AMDGPU: InstCombine amdgcn.rcp(amdgcn.sqrt) -> amdgcn.rsq We currently have some wrong combines in the backend that approximately do this. https://reviews.llvm.org/D158002	2023-08-16 10:04:13 -04:00
Matt Arsenault	f19ee76f35	AMDGPU: Add baseline tests for rcp to rsq fold	2023-08-16 10:03:49 -04:00
Kevin P. Neal	1e7c79d362	[FPEnv][InstCombine] Correct strictfp tests. Correct InstCombine strictfp tests to follow the rules documented in the LangRef: https://llvm.org/docs/LangRef.html#constrained-floating-point-intrinsics Mostly these tests just needed the strictfp attribute on function definitions. After D154991 the constrained intrinsics have the strictfp attribute by default so they don't need it here, but other functions do. Test changes verified with D146845.	2023-08-02 13:03:10 -04:00
Jay Foad	70eafa391b	[InstCombine] Regenerate AMDGPU test checks	2023-07-14 15:28:55 +01:00
Matt Arsenault	5ccfc4543d	AMDGPU: Fold away mbcnt.hi in wave32 mode This will allow libraries to drop some of the special casing based on wave size.	2023-06-30 15:04:03 -04:00
Matt Arsenault	3680b57a88	AMDGPU: Add baseline tests for mbcnt.hi combine	2023-06-30 15:04:03 -04:00
Jay Foad	84313162bf	[AMDGPU] Stop replacing amdgcn.ballot(1) with amdgcn.s.getreg(exec) Rationale: - It does not enable any further IR simplifications. - It does not improve the generated code since the isel lowering of ballot also has special cases for 0 and 1. - getreg is "too powerful" since it can read from many different registers, so its intrinsic properties have to be set very conservatively. There is also a correctness problem that getreg can read from exec but it is currently not marked as convergent. Differential Revision: https://reviews.llvm.org/D153047	2023-06-16 17:15:52 +01:00
Mateja Marjanovic	7047cb5203	[AMDGPU] Trim trailing undefs from the end of image and buffer store Remove undef values from the end of the vector operand in image and buffer store instructions. Also instead of call to computeKnownFPClass, use only findScalarElement. Continuation of: 88421ea973916e Trim zero components from buffer and image stores Differential Revision: https://reviews.llvm.org/D152440	2023-06-15 15:19:36 +02:00
Matt Arsenault	c6aaa0b14f	AMDGPU: Perform basic folds on llvm.amdgcn.exp2	2023-06-15 07:01:06 -04:00
Matt Arsenault	6e934f2292	AMDGPU: Add baseline tests for llvm.amdgcn.exp2 folds	2023-06-15 07:01:01 -04:00
Matt Arsenault	10717f9294	AMDGPU: Add basic folds for llvm.amdgcn.log	2023-06-12 21:10:30 -04:00
Matt Arsenault	1269e45b09	AMDGPU: Add baseline instcombine test for llvm.amdgcn.log	2023-06-12 21:10:30 -04:00
Krzysztof Drewniak	faa2c678aa	[AMDGPU] Add buffer intrinsics that take resources as pointers In order to enable the LLVM frontend to better analyze buffer operations (and to potentially enable more precise analyses on the backend), define versions of the raw and structured buffer intrinsics that use `ptr addrspace(8)` instead of `<4 x i32>` to represent their rsrc arguments. The new intrinsics are named by replacing `buffer.` with `buffer.ptr`. One advantage to these intrinsic definitions is that, instead of specifying that a buffer load/store will read/write some memory, we can indicate that the memory read or written will be based on the pointer argument. This means that, for example, a read from a `noalias` buffer can be pulled out of a loop that is modifying a distinct buffer. In the future, we will define custom PseudoSourceValues that will allow us to package up the (buffer, index, offset) triples that buffer intrinsics contain and allow for more precise backend analysis. This work also enables creating address space 7, which represents manipulation of raw buffers using native LLVM load and store instructions. Where tests simply used a buffer intrinsic while testing some other code path (such as the tests for VGPR spills), they have been updated to use the new intrinsic form. Tests that are "about" buffer intrinsics (for instance, those that ensure that they codegen as expected) have been duplicated, either within existing files or into new ones. Depends on D145441 Reviewed By: arsenm, #amdgpu Differential Revision: https://reviews.llvm.org/D147547	2023-06-05 16:59:07 +00:00
Mateja Marjanovic	88421ea973	[AMDGPU] Trim zero components from buffer and image stores For image and buffer stores the default behaviour on GFX11 and older is to set all unset components to zero. So if we pass only X component it will be the same as X000, or XY same as XY00. This patch simplifies the passed vector of components in InstCombine by removing zero components from the end. For image stores it also trims DMask if necessary. Reviewed by: arsenm, foad, nhaehnle, piotr	2023-06-05 12:30:21 +02:00
Matt Arsenault	8609df7c6e	AMDGPU: Refine undef handling for llvm.amdgcn.class intrinsic This barely matters since 99% are converted to the generic intrinsic now, and the only real difference is the target intrinsic supports a variable test mask. Start propagating poison. Prefer folding to a defined result (false) for an undef test mask. Propagate undef for the first operand.	2023-06-01 18:35:55 -04:00
Matt Arsenault	9ef1333bf4	AMDGPU: Replace certain llvm.amdgcn.class uses with llvm.is.fpclass Most transforms should now be performed on llvm.is.fpclass. Unlike the generic intrinsic, this supports variable test masks.	2023-05-24 21:49:52 +01:00
Matt Arsenault	f74bb32694	AMDGPU: Add some new tests for class undef/poison handling	2023-05-24 16:54:39 +01:00
Mateja Marjanovic	9c8c31eea4	Revert "[AMDGPU] Trim zero components from buffer and image stores" This reverts commit 3181a6e3e7dae9292782216a55c5e1f0583c1668.	2023-05-18 17:02:01 +02:00
Matt Arsenault	8f3e64624c	AMDGPU: Fold fmed3 of fpext sources to f16 fmed3 InstCombine already does this for minnum/maxnum. If we also apply this to fmed3, we don't need to explicitly use 16-bit fmed3 if we're not sure the target supports 16-bit instructions yet.	2023-05-18 08:34:46 +01:00
Tobias Hieta	f84bac329b	[NFC][Py Reformat] Reformat lit.local.cfg python files in llvm This is a follow-up to b71edfaa4ec3c998aadb35255ce2f60bba2940b0 since I forgot the lit.local.cfg files in that one. Reformatting is done with `black`. If you end up having problems merging this commit because you have made changes to a python file, the best way to handle that is to run git checkout --ours <yourfile> and then reformat it with black. If you run into any problems, post to discourse about it and we will try to help. RFC Thread below: https://discourse.llvm.org/t/rfc-document-and-standardize-python-code-style Reviewed By: barannikov88, kwk Differential Revision: https://reviews.llvm.org/D150762	2023-05-17 17:03:15 +02:00
Nikita Popov	605f0a46dc	[InstCombine] Use IRBuilder in evaluateInDifferentElementOrder() This ensures that the new instructions get reprocessed in the same iteration. This should be largely NFC, apart from worklist order effects and naming changes, as seen in the test diff.	2023-05-17 15:07:36 +02:00
Mateja Marjanovic	3181a6e3e7	[AMDGPU] Trim zero components from buffer and image stores For image and buffer stores the default behaviour on GFX11 and older is to set all unset components to zero. So if we pass only X component it will be the same as X000, or XY same as XY00. This patch simplifies the passed vector of components in InstCombine by removing zero components from the end. For image stores it also trims DMask if necessary. Reviewed By: foad, arsenm Differential Revision: https://reviews.llvm.org/D146737	2023-05-15 18:23:27 +02:00
Matt Arsenault	cc54f8eec7	AMDGPU: Add baseline tests for fmed3 shrinking combine	2023-05-10 08:02:11 +01:00
Krzysztof Drewniak	f0415f2a45	Re-land "[AMDGPU] Define data layout entries for buffers" Re-land D145441 with data layout upgrade code fixed to not break OpenMP. This reverts commit 3f2fbe92d0f40bcb46db7636db9ec3f7e7899b27. Differential Revision: https://reviews.llvm.org/D149776	2023-05-03 19:43:56 +00:00
Krzysztof Drewniak	3f2fbe92d0	Revert "[AMDGPU] Define data layout entries for buffers" This reverts commit f9c1ede2543b37fabe9f2d8f8fed5073c475d850. Differential Revision: https://reviews.llvm.org/D149758	2023-05-03 16:11:00 +00:00
Krzysztof Drewniak	f9c1ede254	[AMDGPU] Define data layout entries for buffers Per discussion at https://discourse.llvm.org/t/representing-buffer-descriptors-in-the-amdgpu-target-call-for-suggestions/68798, we define two new address spaces for AMDGCN targets. The first is address space 7, a non-integral address space (which was already in the data layout) that has 160-bit pointers (which are 256-bit aligned) and uses a 32-bit offset. These pointers combine a 128-bit buffer descriptor and a 32-bit offset, and will be usable with normal LLVM operations (load, store, GEP). However, they will be rewritten out of existence before code generation. The second of these is address space 8, the address space for "buffer resources". These will be used to represent the resource arguments to buffer instructions, and new buffer intrinsics will be defined that take them instead of <4 x i32> as resource arguments. These ptr addrspace(8) pointers are 128 bits long (with the same alignment). They must not be used as the arguments to getelementptr or otherwise used in address computations, since they can have arbitrarily complex inherent addressing semantics that can't be represented in LLVM. Even though, like their address space 7 cousins, these pointers have deterministic ptrtoint/inttoptr semantics, they are defined to be non-integral in order to prevent optimizations that rely on pointers being a [0, [addr_max]] value from applying to them. Future work includes: - Defining new buffer intrinsics that take ptr addrspace(8) resources. - A late rewrite to turn address space 7 operations into buffer intrinsics and offset computations. This commit also updates the "fallback address space" for buffer intrinsics to the buffer resource, and updates the alias analysis table. Depends on D143437 Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D145441	2023-05-03 15:25:58 +00:00
ManuelJBrito	8b56da5e9f	[IR] Change shufflevector undef mask to poison With this patch an undefined mask in a shufflevector will be printed as poison. This change is done to support the new shufflevector semantics for undefined mask elements. Differential Revision: https://reviews.llvm.org/D149210	2023-04-27 14:41:10 +01:00
Matt Arsenault	1bce1beac4	AMDGPU: Reduce number of calls to computeKnownFPClass and pass all arguments Makes assumes work for this case.	2023-04-26 13:02:17 -04:00
Michael Liao	72fc08a541	[InstCombine] Teach alloca replacement to handle `addrspacecast` - As the address space cast may not be valid on a specific target, `addrspacecast` is not handled when an `alloca` is able to be replaced with the source of memcpy/memmove. This patch addresses that by querying a target hook on whether that address space cast is valid. For example, on most GPU targets, the cast from a global pointer to a generic pointer is valid. - If that cast is allowed (by querying `isValidAddrSpaceCast`), the replacement is enhanced to handle that `addrspacecast` as well. Reviewed By: yaxunl Differential Revision: https://reviews.llvm.org/D147025	2023-04-11 11:47:37 -04:00
Roman Lebedev	5fb9e84047	[NFC] Port all InstCombine tests to `-passes=` syntax	2022-12-08 02:38:44 +03:00
Matt Arsenault	8fcf387202	InstCombine: Convert target tests to opaque pointers The opaquify script deleted a few declarations for some reason which were manually deleted.	2022-12-01 21:56:14 -05:00
Matt Arsenault	e04d2e20c3	AMDGPU: Add some baseline tests for llvm.amdgcn.trig.preop folding	2022-11-18 09:14:19 -08:00
Matt Arsenault	a58541f14d	AMDGPU: Fold llvm.amdgcn.sqrt(undef)	2022-11-11 17:02:19 -08:00
Matt Arsenault	3e4280c04d	AMDGPU: Disable some class simplifications for strictfp	2022-11-11 09:22:37 -08:00
Nikita Popov	fcfc31fffb	[InstCombine] Convert some tests to opaque pointers (NFC) Conversion was performed (without manual fixup) using: https://gist.github.com/nikic/98357b71fd67756b0f064c9517b62a34	2022-10-28 13:07:30 +02:00
Jay Foad	bfcfd53b92	[AMDGPU] Add GFX11 llvm.amdgcn.permlane64 intrinsic Compared to permlane16, permlane64 has no BC input because it has no boundary conditions, no fi input because the instruction acts as if FI were always enabled, and no OLD input because it always writes to every active lane. Also use the new intrinsic in the atomic optimizer pass. Differential Revision: https://reviews.llvm.org/D127662	2022-06-13 21:12:11 +01:00
Mariusz Sikora	2417de2758	[AMDGPU] Use d16 flag for image.sample instructions Image.sample instruction can be forced to return half type instead of float when d16 flag is enabled. This patch adds new pattern in InstCombine to detect if output of image.sample is used later only by fptrunc which converts the type from float to half. If pattern is detected then fptrunc and image.sample are combined to single image.sample which is returning half type. Later in Lowering part d16 flag is added to image sample intrinsic. Differential Revision: https://reviews.llvm.org/D124232	2022-05-05 06:29:19 +02:00
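
The component-trimming rule described in the store commits listed here (88421ea973 for GFX11 and older, 2b83ceee3d for GFX12) can be modelled outside LLVM roughly as follows. This is an illustrative Python sketch, not the InstCombine code: the names `trim_trailing_components` and `trim_dmask` and the list-based representation are hypothetical, and the assumption that each stored component corresponds to one set DMask bit (lowest bit first) is made only for the example.

```python
def trim_trailing_components(components, broadcast_first):
    """Drop trailing components that the hardware default would refill.

    broadcast_first=False models GFX11 and older (unset components are zero).
    broadcast_first=True models GFX12 (unset components copy the first one).
    At least one component is always kept.
    """
    if not components:
        return []
    fill = components[0] if broadcast_first else 0
    n = len(components)
    while n > 1 and components[n - 1] == fill:
        n -= 1
    return components[:n]


def trim_dmask(dmask, keep):
    """Keep only the `keep` lowest set bits of dmask (hypothetical helper).

    Assumes component i of the stored vector corresponds to the i-th set
    bit of the mask, counting from the least significant bit.
    """
    out = 0
    bit = 0
    kept = 0
    while (dmask >> bit) and kept < keep:
        if (dmask >> bit) & 1:
            out |= 1 << bit
            kept += 1
        bit += 1
    return out


# XY00 on GFX11 and older becomes XY; XYXX on GFX12 becomes XY.
old_style = trim_trailing_components([3, 4, 0, 0], broadcast_first=False)
new_style = trim_trailing_components([3, 4, 3, 3], broadcast_first=True)
mask = trim_dmask(0b1111, len(old_style))
```

In this model a vector such as X000 is only shortened under the zero-fill rule; under the broadcast rule the trailing zeros differ from the first component and are kept.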

126 Commits