llvm-project

Author	SHA1	Message	Date
Artem Tyurin	141145232f	[IRBuilder] Fold binary intrinsics (#80743) Fixes https://github.com/llvm/llvm-project/issues/61240.	2024-03-15 09:58:25 +01:00
Yingwei Zheng	930996e9e4	[ValueTracking][NFC] Pass `SimplifyQuery` to `computeKnownFPClass` family (#80657) This patch refactors the interface of the `computeKnownFPClass` family to pass `SimplifyQuery` directly. The motivation of this patch is to compute known fpclass with `DomConditionCache`, which was introduced by https://github.com/llvm/llvm-project/pull/73662. With `DomConditionCache`, we can do more optimization with context-sensitive information. Example (extracted from [fmt/format.h](`e17bc67547/include/fmt/format.h (L3555-L3566)`)): ``` define float @test(float %x, i1 %cond) { %i32 = bitcast float %x to i32 %cmp = icmp slt i32 %i32, 0 br i1 %cmp, label %if.then1, label %if.else if.then1: %fneg = fneg float %x br label %if.end if.else: br i1 %cond, label %if.then2, label %if.end if.then2: br label %if.end if.end: %value = phi float [ %fneg, %if.then1 ], [ %x, %if.then2 ], [ %x, %if.else ] %ret = call float @llvm.fabs.f32(float %value) ret float %ret } ``` We can prove the signbit of `%value` is always zero. Then the fabs can be eliminated.	2024-02-06 02:30:12 +08:00
Valery Pykhtin	b8025d1482	Reapply "[AMDGPU] Add InstCombine rule for ballot.i64 intrinsic in wave32 mode." (#80303) Reapply #71556 with added lit test constraint: `REQUIRES: amdgpu-registered-target`. This reverts commit 9791e5414960f92396582b9e9ee503ac15799312.	2024-02-02 13:09:25 +01:00
Matt Arsenault	65f486c45d	AMDGPU: Simplify else if to just else in AMDGPUInstCombineIntrinsic Fixes #79738	2024-01-30 08:17:03 +05:30
Valery Pykhtin	9791e54149	Revert "[AMDGPU] Add InstCombine rule for ballot.i64 intrinsic in wave32 mode." (#78429) Reverts llvm/llvm-project#71556 Fixes failures: https://lab.llvm.org/buildbot/#/builders/188/builds/40541 https://lab.llvm.org/buildbot/#/builders/91/builds/21847 https://lab.llvm.org/buildbot/#/builders/98/builds/31671 https://lab.llvm.org/buildbot/#/builders/139/builds/57289	2024-01-17 14:12:07 +01:00
Valery Pykhtin	57b50ef017	[AMDGPU] Add InstCombine rule for ballot.i64 intrinsic in wave32 mode. (#71556) Substitute with zero-extended to i64 ballot.i32 intrinsic.	2024-01-17 17:02:05 +07:00
Mariusz Sikora	2b83ceee3d	[AMDGPU][GFX12] Default component broadcast store (#76212) For image and buffer stores the default behaviour on GFX12 is to set all unset components to the value of the first component. So if we pass only X component, it will be the same as XXXX, or XY same as XYXX. This patch simplifies the passed vector of components in InstCombine by removing components from the end that are equal to the first component. For image stores it also trims DMask if necessary. --------- Co-authored-by: Mateja Marjanovic <mmarjano@amd.com>	2024-01-12 08:26:08 +01:00
Nikita Popov	9d60e95bcd	[AMDGPU] Use poison instead of undef for non-demanded elements (#75914) Return poison instead of undef for non-demanded lanes in the AMDGPU demanded element simplification hook. Also bail out if dmask is 0, as this case has special semantics: > If DMASK==0, the TA overrides DMASK=1 and puts zeros in VGPR followed by > LWE status if exists. TFE status is not generated since the fetch is dropped.	2023-12-20 11:01:59 +01:00
Mariusz Sikora	966416b9e8	[AMDGPU][GFX12] Add new v_permlane16 variants (#75475)	2023-12-15 10:14:38 +01:00
Nikita Popov	bc7ca9170f	[AMDGPUInstCombine] Avoid use of ConstantExpr::getSExt() (NFC) Let the IRBuilder handle the constant folding instead.	2023-10-02 11:12:04 +02:00
Matt Arsenault	edecb60481	Reapply "AMDGPU: Drop and auto-upgrade llvm.amdgcn.ldexp to llvm.ldexp" This reverts commit d9333e360a7c52587ab6e4328e7493b357fb2cf3.	2023-09-13 08:38:48 +03:00
Matt Arsenault	61c8af6792	AMDGPU: InstCombine amdgcn.sqrt.f16 to sqrt.f16 There's nothing special about f16 sqrt handling. https://reviews.llvm.org/D158090	2023-08-23 20:30:40 -04:00
Matt Arsenault	7c4aa3b37e	AMDGPU: InstCombine amdgcn.rcp(amdgcn.sqrt) -> amdgcn.rsq We currently have some wrong combines in the backend that approximately do this. https://reviews.llvm.org/D158002	2023-08-16 10:04:13 -04:00
Matt Arsenault	5ccfc4543d	AMDGPU: Fold away mbcnt.hi in wave32 mode This will allow libraries to drop some of the special casing based on wave size.	2023-06-30 15:04:03 -04:00
Matt Arsenault	d9333e360a	Revert "AMDGPU: Drop and auto-upgrade llvm.amdgcn.ldexp to llvm.ldexp" This reverts commit 1159c670d40e3ef302264c681fe7e0268a550874. Accidentally pushed wrong patch	2023-06-16 18:13:07 -04:00
Matt Arsenault	1159c670d4	AMDGPU: Drop and auto-upgrade llvm.amdgcn.ldexp to llvm.ldexp	2023-06-16 18:06:27 -04:00
Jay Foad	84313162bf	[AMDGPU] Stop replacing amdgcn.ballot(1) with amdgcn.s.getreg(exec) Rationale: - It does not enable any further IR simplifications. - It does not improve the generated code since the isel lowering of ballot also has special cases for 0 and 1. - getreg is "too powerful" since it can read from many different registers, so its intrinsic properties have to be set very conservatively. There is also a correctness problem that getreg can read from exec but it is currently not marked as convergent. Differential Revision: https://reviews.llvm.org/D153047	2023-06-16 17:15:52 +01:00
Mateja Marjanovic	7047cb5203	[AMDGPU] Trim trailing undefs from the end of image and buffer store Remove undef values from the end of the vector operand in image and buffer store instructions. Also instead of call to computeKnownFPClass, use only findScalarElement. Continuation of: 88421ea973916e Trim zero components from buffer and image stores Differential Revision: https://reviews.llvm.org/D152440	2023-06-15 15:19:36 +02:00
Matt Arsenault	c6aaa0b14f	AMDGPU: Perform basic folds on llvm.amdgcn.exp2	2023-06-15 07:01:06 -04:00
Matt Arsenault	10717f9294	AMDGPU: Add basic folds for llvm.amdgcn.log	2023-06-12 21:10:30 -04:00
Krzysztof Drewniak	faa2c678aa	[AMDGPU] Add buffer intrinsics that take resources as pointers In order to enable the LLVM frontend to better analyze buffer operations (and to potentially enable more precise analyses on the backend), define versions of the raw and structured buffer intrinsics that use `ptr addrspace(8)` instead of `<4 x i32>` to represent their rsrc arguments. The new intrinsics are named by replacing `buffer.` with `buffer.ptr`. One advantage to these intrinsic definitions is that, instead of specifying that a buffer load/store will read/write some memory, we can indicate that the memory read or written will be based on the pointer argument. This means that, for example, a read from a `noalias` buffer can be pulled out of a loop that is modifying a distinct buffer. In the future, we will define custom PseudoSourceValues that will allow us to package up the (buffer, index, offset) triples that buffer intrinsics contain and allow for more precise backend analysis. This work also enables creating address space 7, which represents manipulation of raw buffers using native LLVM load and store instructions. Where tests simply used a buffer intrinsic while testing some other code path (such as the tests for VGPR spills), they have been updated to use the new intrinsic form. Tests that are "about" buffer intrinsics (for instance, those that ensure that they codegen as expected) have been duplicated, either within existing files or into new ones. Depends on D145441 Reviewed By: arsenm, #amdgpu Differential Revision: https://reviews.llvm.org/D147547	2023-06-05 16:59:07 +00:00
Mateja Marjanovic	c91246b71e	fix failures caused by https://reviews.llvm.org/D146737 buildbot: https://lab.llvm.org/buildbot/#/builders/77/builds/27340	2023-06-05 13:10:27 +02:00
Mateja Marjanovic	88421ea973	[AMDGPU] Trim zero components from buffer and image stores For image and buffer stores the default behaviour on GFX11 and older is to set all unset components to zero. So if we pass only X component it will be the same as X000, or XY same as XY00. This patch simplifies the passed vector of components in InstCombine by removing zero components from the end. For image stores it also trims DMask if necessary. Reviewed by: arsenm, foad, nhaehnle, piotr	2023-06-05 12:30:21 +02:00
Haojian Wu	55635433a8	Fix isKnownNeverInfOrNaN() call in AMDGPU after ORE removal 97b5cc214aee48e30391bfcd2cde4252163d7406	2023-06-02 09:32:46 +02:00
Matt Arsenault	8609df7c6e	AMDGPU: Refine undef handling for llvm.amdgcn.class intrinsic This barely matters since 99% are converted to the generic intrinsic now, and the only real difference is the target intrinsic supports a variable test mask. Start propagating poison. Prefer folding to a defined result (false) for an undef test mask. Propagate undef for the first operand.	2023-06-01 18:35:55 -04:00
Matt Arsenault	9ef1333bf4	AMDGPU: Replace certain llvm.amdgcn.class uses with llvm.is.fpclass Most transforms should now be performed on llvm.is.fpclass. Unlike the generic intrinsic, this supports variable test masks.	2023-05-24 21:49:52 +01:00
Mateja Marjanovic	9c8c31eea4	Revert "[AMDGPU] Trim zero components from buffer and image stores" This reverts commit 3181a6e3e7dae9292782216a55c5e1f0583c1668.	2023-05-18 17:02:01 +02:00
Matt Arsenault	8f3e64624c	AMDGPU: Fold fmed3 of fpext sources to f16 fmed3 InstCombine already does this for minnum/maxnum. If we also apply this to fmed3, we don't need to explicitly use 16-bit fmed3 if we're not sure the target supports 16-bit instructions yet.	2023-05-18 08:34:46 +01:00
Matt Arsenault	86d0b524f3	ValueTracking: Expand signature of isKnownNeverInfinity/NaN This is in preparation for replacing the implementation with a wrapper around computeKnownFPClass.	2023-05-16 20:42:58 +01:00
Mateja Marjanovic	3181a6e3e7	[AMDGPU] Trim zero components from buffer and image stores For image and buffer stores the default behaviour on GFX11 and older is to set all unset components to zero. So if we pass only X component it will be the same as X000, or XY same as XY00. This patch simplifies the passed vector of components in InstCombine by removing zero components from the end. For image stores it also trims DMask if necessary. Reviewed By: foad, arsenm Differential Revision: https://reviews.llvm.org/D146737	2023-05-15 18:23:27 +02:00
Matt Arsenault	1bce1beac4	AMDGPU: Reduce number of calls to computeKnownFPClass and pass all arguments Makes assumes work for this case.	2023-04-26 13:02:17 -04:00
Craig Topper	1f60c8d025	[IR] Replace calls to ConstantFP::getNullValue with ConstantFP::getZero. NFC There is no getNullValue in ConstantFP. Due to inheritance, we're calling Constant::getNullValue which handles any type including FP. Since we already know we want an FP constant we can use ConstantFP::getZero which might be faster and is a more readable name for an FP zero.	2023-04-03 23:14:02 -07:00
Kazu Hirata	f8f3db2756	Use APInt::count{l,r}_{zero,one} (NFC)	2023-02-19 22:04:47 -08:00
Kazu Hirata	cbde2124f1	Use APInt::popcount instead of APInt::countPopulation (NFC) This is for consistency with the C++20-style bit manipulation functions in <bit>.	2023-02-19 11:29:12 -08:00
Sanjay Patel	83ba349ae0	[InstSimplify] fix/improve folding with an SNaN operand There are 2 issues here: 1. In the default LLVM FP environment (regular FP math instructions), SNaN is some flavor of "don't care" which we will nail down in D143074, so this is just a quality-of-implementation improvement for default FP. 2. In the constrained FP environment (constrained intrinsics), SNaN must not propagate through a math operation; it has to be quieted according to IEEE-754 spec. That is independent of exception handling mode, so the current behavior is a miscompile. Differential Revision: https://reviews.llvm.org/D143505	2023-02-14 17:51:06 -05:00
Kazu Hirata	caa99a01f5	Use llvm::popcount instead of llvm::countPopulation(NFC)	2023-01-22 12:48:51 -08:00
Jay Foad	821c7be8e6	[AMDGPU] Simplify simplifyAMDGCNMemoryIntrinsicDemanded. NFC.	2022-12-22 11:50:04 +00:00
Kazu Hirata	20cde15415	[Target] Use std::nullopt instead of None (NFC) This patch mechanically replaces None with std::nullopt where the compiler would warn if None were deprecated. The intent is to reduce the amount of manual work required in migrating from Optional to std::optional. This is part of an effort to migrate from llvm::Optional to std::optional: https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716	2022-12-02 20:36:06 -08:00
Krzysztof Parzyszek	86fe4dfdb6	TargetTransformInfo: convert Optional to std::optional Recommit: added missing "#include <cstdint>".	2022-12-02 11:42:15 -08:00
Krzysztof Parzyszek	4e12d1836a	Revert "TargetTransformInfo: convert Optional to std::optional" This reverts commit b83711248cb12639e7ef7303cfbb4452b4067e85. Some buildbots are failing.	2022-12-02 11:34:04 -08:00
Krzysztof Parzyszek	b83711248c	TargetTransformInfo: convert Optional to std::optional	2022-12-02 11:27:12 -08:00
Matt Arsenault	a58541f14d	AMDGPU: Fold llvm.amdgcn.sqrt(undef)	2022-11-11 17:02:19 -08:00
Matt Arsenault	3e4280c04d	AMDGPU: Disable some class simplifications for strictfp	2022-11-11 09:22:37 -08:00
Matt Arsenault	8ea3cf4b70	AMDGPU: Use generic is.fpclass enum instead of locally defined copy The generic intrinsic uses the same bitlayout as the amdgcn intrinsic, so re-use the enum.	2022-11-10 19:22:00 -08:00
Jay Foad	445a483b41	[AMDGPU] Add new GFX11 intrinsic llvm.amdgcn.exp.row Differential Revision: https://reviews.llvm.org/D127671	2022-06-16 18:23:14 +01:00
Jay Foad	bfcfd53b92	[AMDGPU] Add GFX11 llvm.amdgcn.permlane64 intrinsic Compared to permlane16, permlane64 has no BC input because it has no boundary conditions, no fi input because the instruction acts as if FI were always enabled, and no OLD input because it always writes to every active lane. Also use the new intrinsic in the atomic optimizer pass. Differential Revision: https://reviews.llvm.org/D127662	2022-06-13 21:12:11 +01:00
Mariusz Sikora	2417de2758	[AMDGPU] Use d16 flag for image.sample instructions Image.sample instruction can be forced to return half type instead of float when d16 flag is enabled. This patch adds new pattern in InstCombine to detect if output of image.sample is used later only by fptrunc which converts the type from float to half. If pattern is detected then fptrunc and image.sample are combined to single image.sample which is returning half type. Later in Lowering part d16 flag is added to image sample intrinsic. Differential Revision: https://reviews.llvm.org/D124232	2022-05-05 06:29:19 +02:00
Piotr Sobczak	c6afbdb5d2	Revert "[AMDGPU] Use d16 flag for image.sample instructions" This reverts commit d1762fc454c0d7ee0bcffe87e798f67b6c43c1d2. Reverting D124232 as the buildbot reported some errors in sanitizers.	2022-04-25 17:18:49 +02:00
Mariusz Sikora	d1762fc454	[AMDGPU] Use d16 flag for image.sample instructions Image.sample instruction can be forced to return half type instead of float when d16 flag is enabled. This patch adds new pattern in InstCombine to detect if output of image.sample is used later only by fptrunc which converts the type from float to half. If pattern is detected then fptrunc and image.sample are combined to single image.sample which is returning half type. Later in Lowering part d16 flag is added to image sample intrinsic. Differential Revision: https://reviews.llvm.org/D124232	2022-04-25 13:05:52 +01:00
Sebastian Neubauer	4ed7c6eec9	[AMDGPU] Only match correct type for a16 Addresses are floats when a sampler is present and unsigned integers when no sampler is present. Therefore, only zext instructions, not sext instructions should match. Also match integer constants that can be truncated. Differential Revision: https://reviews.llvm.org/D118043	2022-01-25 14:59:16 +01:00
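Several commits above (3181a6e3e7/88421ea973, 7047cb5203, 2b83ceee3d) trim trailing components from image and buffer stores: trailing zeros on GFX11 and older, trailing copies of the first component on GFX12, and trailing undef lanes. The following is a minimal Python sketch of that rule, not the LLVM implementation. Assumptions: components are modeled as 32-bit bit patterns (so -0.0 is never mistaken for +0.0), `None` stands for an undef lane, at least one component is always kept, and the hypothetical helpers `trim_store_components` and `trim_dmask` trim DMask by keeping only its lowest set bits.

```python
from typing import List, Optional

# Hypothetical model: each component is a 32-bit bit pattern (int); None = undef lane.
Component = Optional[int]


def trim_store_components(values: List[Component], broadcast_first: bool) -> int:
    """Return how many leading components a store still needs.

    broadcast_first=False models the GFX11-and-older default (unset
    components read as zero); True models the GFX12 default (unset
    components copy the first component). Trailing undef lanes are
    always dropped. At least one component is kept.
    """
    if not values:
        raise ValueError("a store needs at least one component")
    default = values[0] if broadcast_first else 0
    keep = len(values)
    while keep > 1:
        last = values[keep - 1]
        if last is None or last == default:
            keep -= 1  # hardware would supply this value anyway
        else:
            break
    return keep


def trim_dmask(dmask: int, keep: int) -> int:
    """Keep only the lowest `keep` set bits of a 4-bit image dmask (assumed mapping)."""
    out = 0
    kept = 0
    for bit in range(4):
        if dmask & (1 << bit) and kept < keep:
            out |= 1 << bit
            kept += 1
    return out


ONE = 0x3F800000  # bit pattern of 1.0f
TWO = 0x40000000  # bit pattern of 2.0f

# GFX11-style: XY00 -> XY
print(trim_store_components([ONE, TWO, 0, 0], broadcast_first=False))  # 2
# GFX12-style: XYXX -> XY
print(trim_store_components([ONE, TWO, ONE, ONE], broadcast_first=True))  # 2
print(bin(trim_dmask(0b1111, 2)))  # 0b11
```

Comparing bit patterns rather than floats keeps the sketch conservative: a trailing -0.0 (0x80000000) is not treated as zero and is kept.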


72 Commits