llvm-project

Author	SHA1	Message	Date
Mariusz Sikora	2417de2758	[AMDGPU] Use d16 flag for image.sample instructions Image.sample instruction can be forced to return half type instead of float when d16 flag is enabled. This patch adds new pattern in InstCombine to detect if output of image.sample is used later only by fptrunc which converts the type from float to half. If pattern is detected then fptrunc and image.sample are combined to single image.sample which is returning half type. Later in Lowering part d16 flag is added to image sample intrinsic. Differential Revision: https://reviews.llvm.org/D124232	2022-05-05 06:29:19 +02:00
Piotr Sobczak	c6afbdb5d2	Revert "[AMDGPU] Use d16 flag for image.sample instructions" This reverts commit d1762fc454c0d7ee0bcffe87e798f67b6c43c1d2. Reverting D124232 as the buildbot reported some errors in sanitizers.	2022-04-25 17:18:49 +02:00
Mariusz Sikora	d1762fc454	[AMDGPU] Use d16 flag for image.sample instructions Image.sample instruction can be forced to return half type instead of float when d16 flag is enabled. This patch adds new pattern in InstCombine to detect if output of image.sample is used later only by fptrunc which converts the type from float to half. If pattern is detected then fptrunc and image.sample are combined to single image.sample which is returning half type. Later in Lowering part d16 flag is added to image sample intrinsic. Differential Revision: https://reviews.llvm.org/D124232	2022-04-25 13:05:52 +01:00
Sebastian Neubauer	4ed7c6eec9	[AMDGPU] Only match correct type for a16 Addresses are floats when a sampler is present and unsigned integers when no sampler is present. Therefore, only zext instructions, not sext instructions should match. Also match integer constants that can be truncated. Differential Revision: https://reviews.llvm.org/D118043	2022-01-25 14:59:16 +01:00
Sebastian Neubauer	80532ebb50	[AMDGPU][InstCombine] Remove zero image offset Remove the offset parameter if it is zero. Differential Revision: https://reviews.llvm.org/D117876	2022-01-24 18:06:33 +01:00
Sebastian Neubauer	603d18033c	[AMDGPU][InstCombine] Remove zero LOD bias If the bias is zero, we can remove it from the image instruction. Also copy other image optimizations (l->lz, mip->nomip) to IR combines. Differential Revision: https://reviews.llvm.org/D116042	2022-01-21 12:09:07 +01:00
Sebastian Neubauer	0530fdbbbb	[AMDGPU] Fix LOD bias in A16 combine As the codegen fix in D111754, the LOD bias needs to be converted to 16 bits. Fix this in the combine. Differential Revision: https://reviews.llvm.org/D116038	2022-01-21 12:09:06 +01:00
Matt Arsenault	45f16eabd6	AMDGPU: Combine is.shared/is.private of null/undef	2021-12-15 18:20:49 -05:00
Kazu Hirata	f631173d80	[llvm] Migrate from arg_operands to args (NFC) Note that arg_operands is considered a legacy name. See llvm/include/llvm/IR/InstrTypes.h for details.	2021-09-30 08:51:21 -07:00
Jacob Lambert	dc6e8dfdfe	[AMDGPU][NFC] Correct typos in lib/Target/AMDGPU/AMDGPU*.cpp files. Test commit for new contributor.	2021-09-20 14:48:50 -07:00
Daniil Fukalov	48958d02d2	[NFC][AMDGPU] Reduce includes dependencies. 1. Splitted out some parts of R600 target to separate modules/headers. 2. Reduced some include lists in headers. 3. Found and fixed issue with override `GCNTargetMachine::getSubtargetImpl()` and `R600TargetMachine::getSubtargetImpl()` had different return value type than base class. 4. Minor forward declarations cleanup. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D108596	2021-08-25 12:01:55 +03:00
Arthur Eubanks	3f4d00bc3b	[NFC] More get/removeAttribute() cleanup	2021-08-17 21:05:41 -07:00
dfukalov	560d7e0411	[NFC][AMDGPU] Split AMDGPUSubtarget.h to R600 and GCN subtargets ... to reduce headers dependency. Reviewed By: rampitec, arsenm Differential Revision: https://reviews.llvm.org/D95036	2021-01-20 22:22:45 +03:00
dfukalov	6a87e9b08b	[NFC][AMDGPU] Reduce include files dependency. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D93813	2021-01-07 22:22:05 +03:00
Kazu Hirata	0e219b6443	[Target] Construct SmallVector with iterator ranges (NFC)	2021-01-03 09:57:45 -08:00
Juneyoung Lee	9b29610228	Use unary CreateShuffleVector if possible As mentioned in D93793, there are quite a few places where unary `IRBuilder::CreateShuffleVector(X, Mask)` can be used instead of `IRBuilder::CreateShuffleVector(X, Undef, Mask)`. Let's update them. Actually, it would have been more natural if the patches were made in this order: (1) let them use unary CreateShuffleVector first (2) update IRBuilder::CreateShuffleVector to use poison as a placeholder value (D93793) The order is swapped, but in terms of correctness it is still fine. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D93923	2020-12-30 22:36:08 +09:00
Piotr Sobczak	c7afb698ca	[AMDGPU] Avoid calling copyFastMathFlags in wrong context Calling Instruction::copyFastMathFlags() assumes the caller is FPMathOperator. Avoid calling the function for instructions that are not instances of FPMathOperator.	2020-12-16 10:22:51 +01:00
Jay Foad	958130dfda	[AMDGPU] Add simplification/combines for llvm.amdgcn.fma.legacy This follows on from D89558 which added the new intrinsic and D88955 which added similar combines for llvm.amdgcn.fmul.legacy. Differential Revision: https://reviews.llvm.org/D90028	2020-10-23 16:16:13 +01:00
Jay Foad	86a480e9ce	[AMDGPU] Add simplification/combines for llvm.amdgcn.fmul.legacy Differential Revision: https://reviews.llvm.org/D88955	2020-10-23 09:31:00 +01:00
Fangrui Song	20e9c36c01	Internalize functions from various tools. NFC And internalize some classes if I noticed them:)	2020-09-26 15:57:13 -07:00
Simon Pilgrim	f026812110	InstCombiner.h - remove unnecessary KnownBits.h include. NFCI. Move the include down to cpp files with an implicit dependency.	2020-09-17 14:28:42 +01:00
Sebastian Neubauer	833b3b0d3a	[AMDGPU] Add v3f16/v3i16 support to SDag Fix lowering and instruction selection for v3x16 types and enable InstCombine to emit them. This patch only implements it for the selection dag. GlobalISel tests in GlobalISel/llvm.amdgcn.image.load.1d.d16.ll and GlobalISel/llvm.amdgcn.image.store.2d.d16.ll still don't work. Differential Revision: https://reviews.llvm.org/D84420	2020-09-16 17:20:27 +02:00
Sebastian Neubauer	b8d1994778	[AMDGPU] Add A16/G16 to InstCombine When sampling from images with coordinates that only have 16 bit accuracy, convert the image intrinsic call to use a16 or g16. This does only happen if the target hardware supports it. An alternative would be to always apply this combination, independent of the target hardware and extend 16 bit arguments to 32 bit arguments during legalization. To me, this sounds like an unnecessary roundtrip that could prevent some further InstCombine optimizations. Differential Revision: https://reviews.llvm.org/D85887	2020-08-20 10:51:49 +02:00
Christopher Tetreault	3b92db4c84	[SVE] Remove bad call to VectorType::getNumElements() from AMDGPU Differential Revision: https://reviews.llvm.org/D85151	2020-08-03 15:56:10 -07:00
Benjamin Kramer	c6f08b14d4	Hide some internal symbols. NFC.	2020-07-31 17:28:02 +02:00
Sebastian Neubauer	2a6c871596	[InstCombine] Move target-specific inst combining For a long time, the InstCombine pass handled target specific intrinsics. Having target specific code in general passes was noted as an area for improvement for a long time. D81728 moves most target specific code out of the InstCombine pass. Applying the target specific combinations in an extra pass would probably result in inferior optimizations compared to the current fixed-point iteration, therefore the InstCombine pass resorts to newly introduced functions in the TargetTransformInfo when it encounters unknown intrinsics. The patch should not have any effect on generated code (under the assumption that code never uses intrinsics from a foreign target). This introduces three new functions: TargetTransformInfo::instCombineIntrinsic TargetTransformInfo::simplifyDemandedUseBitsIntrinsic TargetTransformInfo::simplifyDemandedVectorEltsIntrinsic A few target specific parts are left in the InstCombine folder, where it makes sense to share code. The largest left-over part in InstCombineCalls.cpp is the code shared between arm and aarch64. This allows to move about 3000 lines out from InstCombine to the targets. Differential Revision: https://reviews.llvm.org/D81728	2020-07-22 15:59:49 +02:00

26 Commits
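
Illustrative note on commit 2417de2758 (D124232): the message describes folding an image.sample whose float result is used only by fptrunc-to-half into a single half-returning image.sample. The check behind that fold can be sketched with a toy model. This is not the LLVM implementation and uses no LLVM API. `ToyInst`, `canFoldToHalf`, and `foldSampleToHalf` are hypothetical names invented for this sketch, and opcodes and types are modelled as plain strings.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy stand-in for an IR instruction (hypothetical; not an LLVM class).
struct ToyInst {
    std::string opcode;           // e.g. "image.sample", "fptrunc", "fadd"
    std::string type;             // result type: "float" or "half"
    std::vector<ToyInst*> users;  // instructions that consume this result
};

// True when `sample` is a float-returning image.sample whose result is
// consumed only by fptrunc instructions that produce half.
bool canFoldToHalf(const ToyInst& sample) {
    if (sample.opcode != "image.sample" || sample.type != "float")
        return false;
    if (sample.users.empty())
        return false;  // nothing to fold into
    for (const ToyInst* user : sample.users) {
        assert(user != nullptr && "toy users must be non-null");
        if (user->opcode != "fptrunc" || user->type != "half")
            return false;  // some user still needs the float value
    }
    return true;
}

// Applies the fold in the toy model: the sample now returns half and the
// fptrunc users are dropped. In real IR, the uses of each fptrunc would be
// rewired to the new sample; that step is omitted here.
bool foldSampleToHalf(ToyInst& sample) {
    if (!canFoldToHalf(sample))
        return false;
    sample.type = "half";
    sample.users.clear();
    return true;
}
```

The "only by fptrunc" condition is the important part: if any other user still consumes the float result, narrowing the sample would change what that user sees, so the sketch refuses to fold in that case.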