26 Commits

Author SHA1 Message Date
Mariusz Sikora
2417de2758 [AMDGPU] Use d16 flag for image.sample instructions
Image.sample instruction can be forced to return half type instead of
float when d16 flag is enabled.

This patch adds new pattern in InstCombine to detect if output of
image.sample is used later only by fptrunc which converts the type
from float to half. If pattern is detected then fptrunc and image.sample
are combined to single image.sample which is returning half type.
Later in Lowering part d16 flag is added to image sample intrinsic.

Differential Revision: https://reviews.llvm.org/D124232
2022-05-05 06:29:19 +02:00
Piotr Sobczak
c6afbdb5d2 Revert "[AMDGPU] Use d16 flag for image.sample instructions"
This reverts commit d1762fc454c0d7ee0bcffe87e798f67b6c43c1d2.

Reverting D124232 as the buildbot reported some errors in sanitizers.
2022-04-25 17:18:49 +02:00
Mariusz Sikora
d1762fc454 [AMDGPU] Use d16 flag for image.sample instructions
Image.sample instruction can be forced to return half type instead of
float when d16 flag is enabled.

This patch adds new pattern in InstCombine to detect if output of
image.sample is used later only by fptrunc which converts the type
from float to half. If pattern is detected then fptrunc and image.sample
are combined to single image.sample which is returning half type.
Later in Lowering part d16 flag is added to image sample intrinsic.

Differential Revision: https://reviews.llvm.org/D124232
2022-04-25 13:05:52 +01:00
Sebastian Neubauer
4ed7c6eec9 [AMDGPU] Only match correct type for a16
Addresses are floats when a sampler is present and unsigned integers
when no sampler is present.

Therefore, only zext instructions, not sext instructions should match.

Also match integer constants that can be truncated.

Differential Revision: https://reviews.llvm.org/D118043
2022-01-25 14:59:16 +01:00
Sebastian Neubauer
80532ebb50 [AMDGPU][InstCombine] Remove zero image offset
Remove the offset parameter if it is zero.

Differential Revision: https://reviews.llvm.org/D117876
2022-01-24 18:06:33 +01:00
Sebastian Neubauer
603d18033c [AMDGPU][InstCombine] Remove zero LOD bias
If the bias is zero, we can remove it from the image instruction.
Also copy other image optimizations (l->lz, mip->nomip) to IR combines.

Differential Revision: https://reviews.llvm.org/D116042
2022-01-21 12:09:07 +01:00
Sebastian Neubauer
0530fdbbbb [AMDGPU] Fix LOD bias in A16 combine
As the codegen fix in D111754, the LOD bias needs to be converted to 16
bits. Fix this in the combine.

Differential Revision: https://reviews.llvm.org/D116038
2022-01-21 12:09:06 +01:00
Matt Arsenault
45f16eabd6 AMDGPU: Combine is.shared/is.private of null/undef 2021-12-15 18:20:49 -05:00
Kazu Hirata
f631173d80 [llvm] Migrate from arg_operands to args (NFC)
Note that arg_operands is considered a legacy name.  See
llvm/include/llvm/IR/InstrTypes.h for details.
2021-09-30 08:51:21 -07:00
Jacob Lambert
dc6e8dfdfe [AMDGPU][NFC] Correct typos in lib/Target/AMDGPU/AMDGPU*.cpp files. Test commit for new contributor. 2021-09-20 14:48:50 -07:00
Daniil Fukalov
48958d02d2 [NFC][AMDGPU] Reduce includes dependencies.
1. Splitted out some parts of R600 target to separate modules/headers.
2. Reduced some include lists in headers.
3. Found and fixed issue with override `GCNTargetMachine::getSubtargetImpl()`
   and `R600TargetMachine::getSubtargetImpl()` had different return value type
   than base class.
4. Minor forward declarations cleanup.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D108596
2021-08-25 12:01:55 +03:00
Arthur Eubanks
3f4d00bc3b [NFC] More get/removeAttribute() cleanup 2021-08-17 21:05:41 -07:00
dfukalov
560d7e0411 [NFC][AMDGPU] Split AMDGPUSubtarget.h to R600 and GCN subtargets
... to reduce headers dependency.

Reviewed By: rampitec, arsenm

Differential Revision: https://reviews.llvm.org/D95036
2021-01-20 22:22:45 +03:00
dfukalov
6a87e9b08b [NFC][AMDGPU] Reduce include files dependency.
Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D93813
2021-01-07 22:22:05 +03:00
Kazu Hirata
0e219b6443 [Target] Construct SmallVector with iterator ranges (NFC) 2021-01-03 09:57:45 -08:00
Juneyoung Lee
9b29610228 Use unary CreateShuffleVector if possible
As mentioned in D93793, there are quite a few places where unary `IRBuilder::CreateShuffleVector(X, Mask)` can be used
instead of `IRBuilder::CreateShuffleVector(X, Undef, Mask)`.
Let's update them.

Actually, it would have been more natural if the patches were made in this order:
(1) let them use unary CreateShuffleVector first
(2) update IRBuilder::CreateShuffleVector to use poison as a placeholder value (D93793)

The order is swapped, but in terms of correctness it is still fine.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D93923
2020-12-30 22:36:08 +09:00
Piotr Sobczak
c7afb698ca [AMDGPU] Avoid calling copyFastMathFlags in wrong context
Calling Instruction::copyFastMathFlags() assumes the caller is
FPMathOperator. Avoid calling the function for instructions
that are not instances of FPMathOperator.
2020-12-16 10:22:51 +01:00
Jay Foad
958130dfda [AMDGPU] Add simplification/combines for llvm.amdgcn.fma.legacy
This follows on from D89558 which added the new intrinsic and D88955
which added similar combines for llvm.amdgcn.fmul.legacy.

Differential Revision: https://reviews.llvm.org/D90028
2020-10-23 16:16:13 +01:00
Jay Foad
86a480e9ce [AMDGPU] Add simplification/combines for llvm.amdgcn.fmul.legacy
Differential Revision: https://reviews.llvm.org/D88955
2020-10-23 09:31:00 +01:00
Fangrui Song
20e9c36c01 Internalize functions from various tools. NFC
And internalize some classes if I noticed them:)
2020-09-26 15:57:13 -07:00
Simon Pilgrim
f026812110 InstCombiner.h - remove unnecessary KnownBits.h include. NFCI.
Move the include down to cpp files with an implicit dependency.
2020-09-17 14:28:42 +01:00
Sebastian Neubauer
833b3b0d3a [AMDGPU] Add v3f16/v3i16 support to SDag
Fix lowering and instruction selection for v3x16 types
and enable InstCombine to emit them.

This patch only implements it for the selection dag.
GlobalISel tests in GlobalISel/llvm.amdgcn.image.load.1d.d16.ll and
GlobalISel/llvm.amdgcn.image.store.2d.d16.ll still don't work.

Differential Revision: https://reviews.llvm.org/D84420
2020-09-16 17:20:27 +02:00
Sebastian Neubauer
b8d1994778 [AMDGPU] Add A16/G16 to InstCombine
When sampling from images with coordinates that only have 16 bit
accuracy, convert the image intrinsic call to use a16 or g16.
This does only happen if the target hardware supports it.

An alternative would be to always apply this combination, independent of
the target hardware and extend 16 bit arguments to 32 bit arguments
during legalization. To me, this sounds like an unnecessary roundtrip
that could prevent some further InstCombine optimizations.

Differential Revision: https://reviews.llvm.org/D85887
2020-08-20 10:51:49 +02:00
Christopher Tetreault
3b92db4c84 [SVE] Remove bad call to VectorType::getNumElements() from AMDGPU
Differential Revision: https://reviews.llvm.org/D85151
2020-08-03 15:56:10 -07:00
Benjamin Kramer
c6f08b14d4 Hide some internal symbols. NFC. 2020-07-31 17:28:02 +02:00
Sebastian Neubauer
2a6c871596 [InstCombine] Move target-specific inst combining
For a long time, the InstCombine pass handled target specific
intrinsics. Having target specific code in general passes was noted as
an area for improvement for a long time.

D81728 moves most target specific code out of the InstCombine pass.
Applying the target specific combinations in an extra pass would
probably result in inferior optimizations compared to the current
fixed-point iteration, therefore the InstCombine pass resorts to newly
introduced functions in the TargetTransformInfo when it encounters
unknown intrinsics.
The patch should not have any effect on generated code (under the
assumption that code never uses intrinsics from a foreign target).

This introduces three new functions:
TargetTransformInfo::instCombineIntrinsic
TargetTransformInfo::simplifyDemandedUseBitsIntrinsic
TargetTransformInfo::simplifyDemandedVectorEltsIntrinsic

A few target specific parts are left in the InstCombine folder, where
it makes sense to share code. The largest left-over part in
InstCombineCalls.cpp is the code shared between arm and aarch64.

This allows to move about 3000 lines out from InstCombine to the targets.

Differential Revision: https://reviews.llvm.org/D81728
2020-07-22 15:59:49 +02:00