36 Commits

Author SHA1 Message Date
Krzysztof Drewniak
25d976b45c
[ScalarizeMaskedMemIntr] Don't use a scalar mask on GPUs (#104842)
ScalarizedMaskedMemIntr contains an optimization where the <N x i1> mask
is bitcast into an iN and then bit-tests with powers of two are used to
determine whether to load/store/... or not.

However, on machines with branch divergence (mainly GPUs), this is a
mis-optimization, since each i1 in the mask will be stored in a
condition register - that is, ecah of these "i1"s is likely to be a word
or two wide, making these bit operations counterproductive.

Therefore, amend this pass to skip the optimizaiton on targets that it
pessimizes.

Pre-commit tests #104645
2024-08-22 19:02:45 -05:00
Krzysztof Drewniak
70995a1a33
[ScalarizeMaskedMemIntr] Optimize splat non-constant masks (#104537)
In cases (like the ones added in the tests) where the condition of a
masked load or store is a splat but not a constant (that is, a masked
operation is being used to implement patterns like "load if the current
lane is in-bounds, otherwise return 0"), optimize the 'scalarized' code
to perform an aligned vector load/store if the splat constant is true.

Additionally, take a few steps to preserve aliasing information and
names when nothing is scalarized while I'm here.

As motivation, some LLVM IR users will genatate masked load/store in
cases that map to this kind of predicated operation (where either the
vector is loaded/stored or it isn't) in order to take advantage of
hardware primitives, but on AMDGPU, where we don't have a masked load or
store, this pass would scalarize a load or store that was intended to be
- and can be - vectorized while also introducing expensive branches.

Fixes #104520

Pre-commit tests at #104527
2024-08-16 16:24:25 -05:00
Nikita Popov
9df71d7673
[IR] Add getDataLayout() helpers to Function and GlobalValue (#96919)
Similar to https://github.com/llvm/llvm-project/pull/96902, this adds
`getDataLayout()` helpers to Function and GlobalValue, replacing the
current `getParent()->getDataLayout()` pattern.
2024-06-28 08:36:49 +02:00
Stephen Tozer
d75f9dd1d2 Revert "[IR][NFC] Update IRBuilder to use InsertPosition (#96497)"
Reverts the above commit, as it updates a common header function and
did not update all callsites:

  https://lab.llvm.org/buildbot/#/builders/29/builds/382

This reverts commit 6481dc57612671ebe77fe9c34214fba94e1b3b27.
2024-06-24 18:00:22 +01:00
Stephen Tozer
6481dc5761
[IR][NFC] Update IRBuilder to use InsertPosition (#96497)
Uses the new InsertPosition class (added in #94226) to simplify some of
the IRBuilder interface, and removes the need to pass a BasicBlock
alongside a BasicBlock::iterator, using the fact that we can now get the
parent basic block from the iterator even if it points to the sentinel.
This patch removes the BasicBlock argument from each constructor or call
to setInsertPoint.

This has no functional effect, but later on as we look to remove the
`Instruction *InsertBefore` argument from instruction-creation
(discussed
[here](https://discourse.llvm.org/t/psa-instruction-constructors-changing-to-iterator-only-insertion/77845)),
this will simplify the process by allowing us to deprecate the
InsertPosition constructor directly and catch all the cases where we use
instructions rather than iterators.
2024-06-24 17:27:43 +01:00
Graham Hunter
2b15c4a62b
[AArch64] Postcommit fixes for histogram intrinsic (#92095)
A buildbot with expensive checks enabled flagged some problems with my patch. There was also a post-commit nit on the langref changes.
2024-05-14 15:16:42 +01:00
Graham Hunter
fbb37e9606
[AArch64] Add an all-in-one histogram intrinsic
Based on discussion from
https://discourse.llvm.org/t/rfc-vectorization-support-for-histogram-count-operations/74788

Current interface is:

llvm.experimental.histogram(<vecty> ptrs, <intty> inc_amount, <vecty> mask)

The integer type used by 'inc_amount' needs to match the type of the buckets in memory.

The intrinsic covers the following operations:
  * Gather load
  * histogram on the elements of 'ptrs'
  * multiply the histogram results by 'inc_amount'
  * add the result of the multiply to the values loaded by the gather
  * scatter store the results of the add

Supports lowering to histcnt instructions for AArch64 targets, and scalarization for all others at present.
2024-05-13 11:35:28 +01:00
Kolya Panchenko
889d99a50f
[TTI] Add alignment argument to TTI for compress/expand support (#83516)
Since `llvm.compressstore` and `llvm.expandload` do require memory
access, it's essential for some target to check if alignment is good to
be able to lower them to target-specific instructions
2024-03-05 20:33:56 -05:00
Yeting Kuo
61c283db4b
[ScalarizeMaskedMemIntrin] Use pointer alignment from pointer of masked.compressstore/expandload. (#83519)
Previously we used Align(1) for all scalarized load/stores from
masked.compressstore/expandload.
For targets not supporting unaligned accesses, it make backend need to
split
aligned large width loads/stores to byte loads/stores.
To solve this performance issue, this patch preserves the alignment of
base
pointer after scalarizing.
2024-03-04 09:41:16 +08:00
Nuno Lopes
68f1391a62 [ScalarizeMaskedMemIntrin] Use poison instead of undef as placeholder [NFC]
This is used for masked out lanes, that are replaced with the passthrough value
2023-07-17 10:11:14 +01:00
Youngsuk Kim
d22a236ae7 [llvm] Replace use of Type::getPointerTo() (NFC)
Partial progress towards replacing in-tree uses of
`Type::getPointerTo()`.

If `getPointerTo()` is used solely to support an unnecessary bitcast,
remove the bitcast.

Reviewed By: barannikov88, nikic

Differential Revision: https://reviews.llvm.org/D153307
2023-06-23 22:32:29 -04:00
ManuelJBrito
d22edb9794 [IR][NFC] Change UndefMaskElem to PoisonMaskElem
Following the change in shufflevector semantics,
poison will be used to represent undefined elements in shufflevector masks.

Differential Revision: https://reviews.llvm.org/D149256
2023-04-27 18:01:54 +01:00
Kazu Hirata
6288a09c94 [Scalar] Use std::optional in ScalarizeMaskedMemIntrin.cpp (NFC)
This is part of an effort to migrate from llvm::Optional to
std::optional:

https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
2022-11-26 17:39:36 -08:00
Kazu Hirata
1f914944b6 Don't use Optional::getPointer (NFC)
Since std::optional does not offer getPointer(), this patch replaces
X.getPointer() with &*X to make the migration from llvm::Optional to
std::optional easier.

This is part of an effort to migrate from llvm::Optional to
std::optional:

https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716

Differential Revision: https://reviews.llvm.org/D138466
2022-11-21 19:03:40 -08:00
Manuel Brito
1e55d5b1f2 Use poison instead of undef as placeholder for vector construction [NFC]
Differential Revision: https://reviews.llvm.org/D138450
2022-11-21 18:43:23 +00:00
Kazu Hirata
0916d96d12 Don't use Optional::hasValue (NFC) 2022-06-20 20:17:57 -07:00
David Green
9727c77d58 [NFC] Rename Instrinsic to Intrinsic 2022-04-25 18:13:23 +01:00
serge-sans-paille
59630917d6 Cleanup includes: Transform/Scalar
Estimated impact on preprocessor output line:
before: 1062981579
after:  1062494547

Discourse thread: https://discourse.llvm.org/t/include-what-you-use-include-cleanup
Differential Revision: https://reviews.llvm.org/D120817
2022-03-03 07:56:34 +01:00
Rosie Sumpter
552eb372cb [LoopVectorize] Pass a vector type to isLegalMaskedGather/Scatter
This is required to query the legality more precisely in the LoopVectorizer.

This adds another TTI function named 'forceScalarizeMaskedGather/Scatter'
function to work around the hack introduced for MVE, where
isLegalMaskedGather/Scatter would return an answer by second-guessing
where the function was called from, based on the Type passed in (vector
vs scalar). The new interface makes this explicit. It is also used by
X86 to check for vector widths where gather/scatters aren't profitable
(or don't exist) for certain subtargets.

Differential Revision: https://reviews.llvm.org/D115329
2022-01-12 13:34:12 +00:00
Kazu Hirata
f631173d80 [llvm] Migrate from arg_operands to args (NFC)
Note that arg_operands is considered a legacy name.  See
llvm/include/llvm/IR/InstrTypes.h for details.
2021-09-30 08:51:21 -07:00
Kazu Hirata
8e86c0e4f4 [Scalar] Use make_early_inc_range (NFC) 2021-09-12 08:17:18 -07:00
Craig Topper
066524ea54 [ScalarizeMaskedMemIntrin][SelectionDAGBuilder] Use the element type to calculate alignment for gather/scatter when alignment operand is 0.
Previously we used the vector type, but we're loading/storing
invididual elements so I think only element alignment should matter.

Noticed while looking at the code for something else so I don't
have a test case.

Differential Revision: https://reviews.llvm.org/D105220
2021-07-01 19:08:47 -07:00
Markus Lavin
9498315c9b Expand masked mem intrinsics correctly wrt big-endian
Need to take endianness into account when doing vector to scalar casts
such as %bc = bitcast <8 x i1> %v to i8
Companion commit for https://reviews.llvm.org/D94867
Upload in response to
https://lists.llvm.org/pipermail/llvm-dev/2021-January/147862.html
Attempting to document the actual memory layout rules for vectors in
https://reviews.llvm.org/D94964

Differential Revision: https://reviews.llvm.org/D94765
2021-02-11 08:59:52 +00:00
Yang Fan
59bd2068e9
[NFC][ScalarizeMaskedMemIntrin] Fix unused variable warning
GCC warning:
```
/llvm-project/llvm/lib/Transforms/Scalar/ScalarizeMaskedMemIntrin.cpp: In function ‘void scalarizeMaskedStore(llvm::CallInst*, llvm::DomTreeUpdater*, bool&)’:
/llvm-project/llvm/lib/Transforms/Scalar/ScalarizeMaskedMemIntrin.cpp:295:15: warning: variable ‘IfBlock’ set but not used [-Wunused-but-set-variable]
  295 |   BasicBlock *IfBlock = CI->getParent();
      |               ^~~~~~~
/llvm-project/llvm/lib/Transforms/Scalar/ScalarizeMaskedMemIntrin.cpp: In function ‘void scalarizeMaskedScatter(llvm::CallInst*, llvm::DomTreeUpdater*, bool&)’:
/llvm-project/llvm/lib/Transforms/Scalar/ScalarizeMaskedMemIntrin.cpp:555:15: warning: variable ‘IfBlock’ set but not used [-Wunused-but-set-variable]
  555 |   BasicBlock *IfBlock = CI->getParent();
      |               ^~~~~~~
```
2021-01-29 15:15:58 +08:00
Roman Lebedev
056385921d
[ScalarizeMaskedMemIntrin] Preserve Dominator Tree, if avaliable
This de-pessimizes the arguably more usual case of no masked mem intrinsics,
and gets rid of one more Dominator Tree recalculation.

As per llvm/test/CodeGen/X86/opt-pipeline.ll,
there's one more Dominator Tree recalculation left, we could get rid of.
2021-01-29 01:11:36 +03:00
Roman Lebedev
573f74117b
[NFC][ScalarizeMaskedMemIntrin] scalarizeMaskedCompressStore(): port to SplitBlockAndInsertIfThen()
Makes Dominator Tree preservation in a followup patch somewhat easier.
2021-01-29 01:11:35 +03:00
Roman Lebedev
2e4bb3f119
[NFC][ScalarizeMaskedMemIntrin] scalarizeMaskedExpandLoad(): port to SplitBlockAndInsertIfThen()
Makes Dominator Tree preservation in a followup patch somewhat easier.
2021-01-29 01:11:35 +03:00
Roman Lebedev
e8efc03a1e
[NFC][ScalarizeMaskedMemIntrin] scalarizeMaskedScatter(): port to SplitBlockAndInsertIfThen()
Makes Dominator Tree preservation in a followup patch somewhat easier.
2021-01-29 01:11:35 +03:00
Roman Lebedev
1356399a11
[NFC][ScalarizeMaskedMemIntrin] scalarizeMaskedGather(): port to SplitBlockAndInsertIfThen()
Makes Dominator Tree preservation in a followup patch somewhat easier.
2021-01-29 01:11:34 +03:00
Roman Lebedev
22b8421156
[NFC][ScalarizeMaskedMemIntrin] scalarizeMaskedStore(): port to SplitBlockAndInsertIfThen()
Makes Dominator Tree preservation in a followup patch somewhat easier.
2021-01-29 01:11:34 +03:00
Roman Lebedev
0ea45a412a
[NFC][ScalarizeMaskedMemIntrin] scalarizeMaskedLoad(): port to SplitBlockAndInsertIfThen()
Makes Dominator Tree preservation in a followup patch somewhat easier.
2021-01-29 01:11:34 +03:00
Mariya Podchishchaeva
7113de301a [ScalarizeMaskedMemIntrin] Add missing dependency
The pass has dependency on 'TargetTransformInfoWrapperPass', but the
corresponding call to INITIALIZE_PASS_DEPENDENCY was missing.

Differential Revision: https://reviews.llvm.org/D94916
2021-01-19 22:33:47 +03:00
Anna Thomas
29356e3279 [ScalarizeMaskedMemIntrin] Add new PM support
This patch adds new PM support for the pass and the pass can be now used
during middle-end transforms. The old pass is remamed to
ScalarizeMaskedMemIntrinLegacyPass.

Reviewed-By: skatkov, aeubanks
Differential Revision: https://reviews.llvm.org/D92743
2020-12-08 17:15:22 -05:00
Benjamin Kramer
5f18e2f31e Move createScalarizeMaskedMemIntrinPass to Scalar.h 2020-12-08 19:08:09 +01:00
Benjamin Kramer
10987e30be Remove unused include. NFC.
This is also a layering violation.
2020-12-08 19:03:56 +01:00
Anna Thomas
09f2f9605f [ScalarizeMaskedMemIntrinsic] Move from CodeGen into Transforms
ScalarizeMaskedMemIntrinsic is currently a codeGen level pass. The pass
is actually operating on IR level and does not use any code gen specific
passes.  It is useful to move it into transforms directory so that it
can be more widely used as a mid-level transform as well (apart from
usage in codegen pipeline).
In particular, we have a usecase downstream where we would like to use
this pass in our mid-level pipeline which operates on IR level.

The next change will be to add support for new PM.

Reviewers: craig.topper, apilipenko, skatkov
Reviewed-By: skatkov
Differential Revision: https://reviews.llvm.org/D92407
2020-12-08 12:25:58 -05:00