111 Commits

Author SHA1 Message Date
Chow
3e008cb333
Scalarizer : Fix vector shuffle issue when can't aligned to customized minBits. (#163912)
When minBits is set and the scalarizer pass runs, the last remaining
fragment of a boolean vector may not align to min bits. In that case the
remaining bits must be processed one at a time, and a direct shuffle is
not allowed during packing.

Problem:
In the 'concatenate' step, when processing a boolean vector, if the last
remaining bits (the fragment) can't be aligned to minBits but still need
to be packed, those bits should be processed one at a time.

A direct call to vector shuffle assumes that the remaining boolean bits
can be packed to the target pack size. For example, when processing a
boolean vector with `size = 7` and `min bits = 4`, the first fragment
with `4` bits can be packed correctly, but the remaining `3` bits can't
be used in a vector shuffle call.

Solution:
If the remaining bits can't be aligned to the required target (min bits)
pack size, process them one at a time.
(This mostly affects boolean vectors only, since their bit widths are
often not a power of 2.)
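
The decision described above can be modelled in a few lines of standalone C++. This is only a sketch of the idea, not the pass's code: `planConcatenate`, `Step`, and `StepKind` are hypothetical names, and the "shuffle" and "insert" steps are just labels for how each range of elements would be put back into the result vector.

```cpp
#include <cassert>
#include <vector>

// Hypothetical labels for how a range of elements is re-packed.
enum class StepKind { BlockShuffle, SingleInsert };

struct Step {
  StepKind kind;
  unsigned first;  // index of the first element covered by this step
  unsigned count;  // number of elements covered by this step
};

// Plan how to concatenate numElems elements when full fragments hold
// packSize elements each. Full fragments are moved as a block; any
// leftover elements that do not fill a fragment are handled one by one.
std::vector<Step> planConcatenate(unsigned numElems, unsigned packSize) {
  assert(packSize > 0 && "pack size must be positive");
  std::vector<Step> steps;
  unsigned i = 0;
  for (; i + packSize <= numElems; i += packSize)
    steps.push_back({StepKind::BlockShuffle, i, packSize});
  for (; i < numElems; ++i)
    steps.push_back({StepKind::SingleInsert, i, 1});
  return steps;
}
```

With `numElems = 7` and `packSize = 4`, the plan is one block step covering elements 0 to 3 followed by three single-element steps for elements 4, 5, and 6.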

---------

Co-authored-by: Zhou, Shaochi(AMD) <shaozhou@amd.com>
2025-12-08 18:06:39 +00:00
Rahul Joshi
4b87d5861d
[NFC][LLVM] Namespace cleanup in Scalarizer.cpp (#163766) 2025-10-16 10:15:55 -07:00
Farzon Lotfi
581ba1cbf7
[DirectX] Fix crash in passes when building with LLVM_ENABLE_EXPENSIVE_CHECKS (#150483)
fixes #148681
fixes #148680

For the scalarizer pass we just need to indicate that scalarization took
place, I used the logic for knowing when to eraseFromParent to indicate
this.

For the DXILLegalizePass, the new `legalizeScalarLoadStoreOnArrays` did
not use `ToRemove`, which means our use of `!ToRemove.empty();` was no
longer correct. This meant each legalization now needed a means of
indicating whether a change was made.

For DXILResourceAccess.cpp the `Changed` bool was never set to true.
So removed it and replaced it with `!Resources.empty();` since we only
call `replaceAccess` if we have items in Resources.
2025-07-24 17:17:47 -04:00
Deric C.
0c14f0e891
[Scalarizer] Use correct key for ExtractValueInst gather (#149855)
Fixes #149345

Effectively no-op pairs of insertelement-extractelement instructions
were being created due to the ExtractValueInst visitor in the Scalarizer
storing its scalarized result into the Scattered map using an incorrect
key (specifically the type used in the key).
This PR fixes this issue.
2025-07-21 17:12:15 -07:00
Deric C.
1440f02259
[Scalarizer] Ensure valid VectorSplits for each struct element in visitExtractValueInst (#128538)
Fixes #127739 

The `visitExtractValueInst` is missing a check that was present in
`splitCall` / `visitCallInst`.
This check ensures that each struct element has a VectorSplit, and that
each VectorSplit contains the same number of elements packed per
fragment.

---------

Co-authored-by: Jay Foad <jay.foad@amd.com>
2025-03-04 13:10:31 -08:00
Finn Plummer
45c01e8a33
[NFC][TargetTransformInfo][VectorUtils] Consolidate isVectorIntrinsic... api (#117635)
- update `VectorUtils:isVectorIntrinsicWithScalarOpAtArg` to use TTI for
all uses, to allow specification of target specific intrinsics
- add TTI to the `isVectorIntrinsicWithStructReturnOverloadAtField` api
- update TTI api to provide `isTargetIntrinsicWith...` functions and
  consistently name them
- move `isTriviallyScalarizable` to VectorUtils
  
- update all uses of the api and provide the TTI parameter

Resolves #117030
2024-12-19 11:54:26 -08:00
Finn Plummer
8663b8777e
[NFC][VectorUtils][TargetTransformInfo] Add isVectorIntrinsicWithOverloadTypeAtArg api (#114849)
This change allows target intrinsics to specify and overwrite overloaded types.

- Updates `ReplaceWithVecLib` to not provide TTI as there most probably won't be a use-case
- Updates `SLPVectorizer` to use available TTI
- Updates `VPTransformState` to pass down TTI
- Updates `VPlanRecipe` to use passed-down TTI

This change will let us add scalarization for `asdouble`:  #114847
2024-11-21 11:04:25 -08:00
Farzon Lotfi
21b3769d1d
[Scalarizer] Fix to only scalarize if intrinsic was marked as isTriviallyScalarizable (#113625)
fixes #113624
2024-10-24 23:26:02 -07:00
Farzon Lotfi
dcbf2c2ca0
[Scalarizer][DirectX] support structs return types (#111569)
Based on this RFC:
https://discourse.llvm.org/t/rfc-allow-the-scalarizer-pass-to-scalarize-vectors-returned-in-structs/82306

LLVM intrinsics do not support out params. To get around this limitation
implementers will make intrinsics return structs to capture a return
type and an out param. This implementation detail should not impact
scalarization since these cases should be elementwise operations.
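
As an illustration of why a struct return does not get in the way of elementwise scalarization, here is a standalone C++ sketch that uses `std::frexp` as a stand-in for an intrinsic with a return value plus an out param. `FrexpResult` and `frexpElementwise` are hypothetical names for this example and are not part of LLVM.

```cpp
#include <array>
#include <cmath>
#include <cstddef>

// Models a struct return that carries two vectors: the "real" result
// and the value that would otherwise be an out param.
template <std::size_t N>
struct FrexpResult {
  std::array<double, N> mantissa;
  std::array<int, N> exponent;
};

// Applies the scalar operation lane by lane. Each lane of both result
// vectors depends only on the same lane of the input.
template <std::size_t N>
FrexpResult<N> frexpElementwise(const std::array<double, N> &v) {
  FrexpResult<N> result{};
  for (std::size_t i = 0; i < N; ++i)
    result.mantissa[i] = std::frexp(v[i], &result.exponent[i]);
  return result;
}
```

Because every lane is computed independently, splitting the vector call into per-lane scalar calls and rebuilding both struct fields afterwards gives the same values.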

## Three changes are needed. 
- The CallInst visitor needs to be updated to handle Structs
- A new visitor is needed for `ExtractValue` instructions
- `finish` needs to be updated to handle structs so that insert elements are
properly propagated.

## Testing changes
- Add support for `llvm.frexp`
- Add support for `llvm.dx.splitdouble`

fixes https://github.com/llvm/llvm-project/issues/111437
2024-10-21 12:51:01 -04:00
Rahul Joshi
fa789dffb1
[NFC] Rename Intrinsic::getDeclaration to getOrInsertDeclaration (#111752)
Rename the function to reflect its correct behavior and to be consistent
with `Module::getOrInsertFunction`. This is also in preparation of
adding a new `Intrinsic::getDeclaration` that will have behavior similar
to `Module::getFunction` (i.e, just lookup, no creation).
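
The naming distinction can be sketched with a toy registry in standalone C++. `ToyModule`, `getOrInsert`, and `lookup` are hypothetical names for this illustration only; they are not LLVM's API.

```cpp
#include <map>
#include <optional>
#include <string>

// A toy stand-in for a module that owns named declarations.
struct ToyModule {
  std::map<std::string, int> decls;  // name -> declaration id
  int nextId = 0;

  // "Get or insert": creates the declaration if it is missing, so the
  // call can modify the module.
  int getOrInsert(const std::string &name) {
    auto it = decls.find(name);
    if (it != decls.end())
      return it->second;
    int id = nextId++;
    decls.emplace(name, id);
    return id;
  }

  // Pure lookup: never creates anything and reports absence instead.
  std::optional<int> lookup(const std::string &name) const {
    auto it = decls.find(name);
    if (it == decls.end())
      return std::nullopt;
    return it->second;
  }
};
```

A caller that only wants to check for an existing declaration would use the lookup form and leave the module untouched.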
2024-10-11 05:26:03 -07:00
Farzon Lotfi
63a0a81e73
[NFC][Scalarizer][TargetTransformInfo] Add isTargetIntrinsicWithScalarOpAtArg api (#111441)
This change allows target intrinsics to have scalar args

fixes [111440](https://github.com/llvm/llvm-project/issues/111440)

This change will let us add scalarization for WaveReadLaneAt:
https://github.com/llvm/llvm-project/pull/111010
2024-10-07 19:57:07 -04:00
Matt Arsenault
1bc9b67bd8
Scalarizer: Replace cl::opts with pass parameters (#110645)
Preserve the existing defaults (although load-store defaulting
to false is a really bad one). Also migrate DirectX tests to new PM.
2024-10-02 14:45:26 +04:00
Rahul Joshi
1b7b3b8d35
[NFC] Move intrinsic related functions to Intrinsic namespace (#110125)
Move static functions `Function::lookupIntrinsicID` and
`Function::isTargetIntrinsic` to Intrinsic namespace.
2024-09-30 07:42:53 -07:00
Farzon Lotfi
0f97b4824a
[Scalarizer][DirectX] Add support for scalarization of Target intrinsics (#108776)
Since we are using the Scalarizer pass in the backend we needed a way to
allow this pass to operate on Target intrinsics.
We achieved this by adding `TargetTransformInfo` to the Scalarizer
pass. This allowed us to call a function available to the DirectX
backend to know if an intrinsic is a target intrinsic that should be
scalarized.
2024-09-17 11:35:42 -04:00
Farzon Lotfi
c05e29bff0
[LegacyPM][DirectX] Add legacy scalarizer back for use in the DirectX backend (#107427)
As discussed in this
[proposal](https://github.com/llvm/wg-hlsl/pull/62/files?short_path=ac6e592#diff-ac6e59276afe8016e307eedc5c835f534c0cb353707760b44df0fa9d905a5cf8),
we had to bring back the legacy pass manager interface for the
scalarizer pass. Two reasons for this:
1. The DirectX backend is still using the legacy pass manager
2. The new PM isn't hooked up in clang yet via `BackendUtil.cpp`'s
`AddEmitPasses`. That means even if we add a `buildCodeGenPipeline` we
won't be able to benefit from the new pass manager's scalarizer pass
interface.

The remaining changes are hooking up the scalarizer pass to the DirectX
backend, updating the DirectX test cases,
and allowing the `optdriver` to not block the legacy invocation of the
scalarizer pass.

Future work still needs to be done to allow the scalarizer pass to
handle target specific intrinsics.

closes #105178
2024-09-12 15:53:50 -04:00
Kazu Hirata
4b28b3fae4
[Transforms] Use range-based for loops (NFC) (#97195) 2024-07-02 16:20:44 -07:00
Nikita Popov
2d209d964a
[IR] Add getDataLayout() helpers to BasicBlock and Instruction (#96902)
This is a helper to avoid writing `getModule()->getDataLayout()`. I
regularly try to use this method only to remember it doesn't exist...

`getModule()->getDataLayout()` is also a common (the most common?)
reason why code has to include the Module.h header.
2024-06-27 16:38:15 +02:00
Stephen Tozer
d75f9dd1d2 Revert "[IR][NFC] Update IRBuilder to use InsertPosition (#96497)"
Reverts the above commit, as it updates a common header function and
did not update all callsites:

  https://lab.llvm.org/buildbot/#/builders/29/builds/382

This reverts commit 6481dc57612671ebe77fe9c34214fba94e1b3b27.
2024-06-24 18:00:22 +01:00
Stephen Tozer
6481dc5761
[IR][NFC] Update IRBuilder to use InsertPosition (#96497)
Uses the new InsertPosition class (added in #94226) to simplify some of
the IRBuilder interface, and removes the need to pass a BasicBlock
alongside a BasicBlock::iterator, using the fact that we can now get the
parent basic block from the iterator even if it points to the sentinel.
This patch removes the BasicBlock argument from each constructor or call
to setInsertPoint.

This has no functional effect, but later on as we look to remove the
`Instruction *InsertBefore` argument from instruction-creation
(discussed
[here](https://discourse.llvm.org/t/psa-instruction-constructors-changing-to-iterator-only-insertion/77845)),
this will simplify the process by allowing us to deprecate the
InsertPosition constructor directly and catch all the cases where we use
instructions rather than iterators.
2024-06-24 17:27:43 +01:00
Aiden Grossman
2470857fe7
[NewPM] Remove ScalarizerLegacyPass (#72814)
This pass isn't used anywhere upstream and thus has no test coverage.
Because of these reasons, remove it.
2023-11-20 01:09:27 -08:00
Kazu Hirata
697082de74
[Scalar] Use LLVMContext::MD_mem_parallel_loop_access directly (NFC) (#69549)
This patch "constant propagates"
LLVMContext::MD_mem_parallel_loop_access into wherever
ParallelLoopAccessMDKind is used.
2023-10-19 18:38:25 -07:00
Kazu Hirata
0187960cdd [Scalar] Use LLVMContext::MD_mem_parallel_loop_access (NFC) 2023-10-15 00:14:14 -07:00
Mikhail Goncharov
74f4daef04 fix unused variable warnings in conditionals
for 92023b15099012a657da07ebf49dd7d94a260f84
2023-08-30 14:36:42 +02:00
Nuno Lopes
31d8bdbcad [Scalarizer] Fold -1 mask in shufflevector to poison instead of undef
Per latest LangRef
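
A rough standalone C++ model of the mask semantics, assuming `std::nullopt` stands in for a poison lane. `shuffleModel` and `Lane` are hypothetical names for this sketch, not LLVM code.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// std::nullopt models a poison lane in this sketch.
using Lane = std::optional<int>;

// Each mask entry indexes into the concatenation of a and b.
// A negative entry (the -1 mask) yields a poison lane.
std::vector<Lane> shuffleModel(const std::vector<int> &a,
                               const std::vector<int> &b,
                               const std::vector<int> &mask) {
  std::vector<Lane> out;
  for (int m : mask) {
    if (m < 0) {
      out.push_back(std::nullopt);
      continue;
    }
    std::size_t idx = static_cast<std::size_t>(m);
    assert(idx < a.size() + b.size() && "mask index out of range");
    out.push_back(idx < a.size() ? a[idx] : b[idx - a.size()]);
  }
  return out;
}
```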
2023-07-23 15:02:23 +01:00
Nikita Popov
9cf5254878 [llvm] Remove some uses of isOpaqueOrPointeeTypeEquals() (NFC) 2023-07-18 11:18:31 +02:00
Nicolai Hähnle
2cb5c6d124 Scalarizer: limit scalarization for small element types
Scalarization can expose optimization opportunities for the individual
elements of a vector, and can therefore be beneficial on targets like
GPUs that tend to operate on scalars anyway.

However, notably with 16-bit operations it is often beneficial to keep
<2 x i16 / half> vectors around since there are packed instructions for
those.

Refactor the code to operate on "fragments" of split vectors. The
fragments are usually scalars, but may themselves be smaller vectors
when the scalarizer-min-bits option is used. If the split is uneven,
the last fragment is a shorter remainder.
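
The fragment arithmetic can be sketched in standalone C++. This is a simplified model under stated assumptions, not the pass's code: `SplitModel` and `modelSplit` are hypothetical names, and the rule used here (fully scalarize when two elements do not fit in the min-bits budget, otherwise pack `minBits / elemBits` elements per fragment) is an assumption about how such a threshold could work.

```cpp
#include <cassert>
#include <optional>

struct SplitModel {
  unsigned numPacked;       // elements per full fragment
  unsigned numFragments;    // total fragments, including any remainder
  unsigned remainderElems;  // elements in the last, shorter fragment (0 if even)
};

// Returns std::nullopt when the whole vector already fits in one
// fragment, meaning there is nothing to split.
std::optional<SplitModel> modelSplit(unsigned numElems, unsigned elemBits,
                                     unsigned minBits) {
  assert(numElems > 0 && elemBits > 0);
  // Fragments are plain scalars when two elements cannot share a fragment.
  if (numElems == 1 || 2 * elemBits > minBits)
    return SplitModel{1, numElems, 0};
  unsigned numPacked = minBits / elemBits;
  if (numPacked >= numElems)
    return std::nullopt;
  unsigned numFragments = (numElems + numPacked - 1) / numPacked;
  return SplitModel{numPacked, numFragments, numElems % numPacked};
}
```

Under this model, a 3-element vector of 16-bit values with a 32-bit budget splits into one 2-element fragment plus a 1-element remainder, while a 2-element vector of 16-bit values is left alone.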

This is almost NFC when the new option is unused, but it happens to
clean up some code in the fully scalarized case as well.

Differential Revision: https://reviews.llvm.org/D149842
2023-06-13 21:14:32 +02:00
Jay Foad
63901cb082 [Scalarizer] Scalarize freeze instruction
Differential Revision: https://reviews.llvm.org/D152518
2023-06-09 13:54:24 +01:00
Nicolai Hähnle
d0a125a1e6 Scalarizer: use the canonical form of {extract,insert}element
This leads to a bunch of trivial test churn, plus some extra test changes
that are purely due to update_test_checks.

Pulled out of https://reviews.llvm.org/D149842 as a preparatory change.

Differential Revision: https://reviews.llvm.org/D149944
2023-05-05 13:05:31 +02:00
Jay Foad
593e25ffae [Vectorize] Fix vectorization, scalarization and folding of llvm.is.fpclass
llvm.is.fpclass is different from other vectorizable intrinsics in that
it is overloaded on an argument type, not on the return type.

Differential Revision: https://reviews.llvm.org/D148905
2023-04-24 13:42:08 +01:00
Fangrui Song
3152156334 [Transforms/Scalar] llvm::Optional => std::optional 2022-12-13 08:05:14 +00:00
Nicolai Hähnle
6c379cb318 Scalarizer: fix an opaque pointer bug
With opaque pointers, it's possible for the same pointer value to be
used to store different vector types (both number and type of elements),
so we need to take that into account when caching the scattering.
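
A minimal standalone C++ sketch of the caching idea, assuming the cache key is the pair of pointer and vector type rather than the pointer alone. `ScatterCache` and its members are hypothetical names, and the type is represented as a plain string for illustration.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

struct ScatterCache {
  // Keying on the pointer alone would let a second access with a
  // different vector type reuse the wrong fragments.
  using Key = std::pair<const void *, std::string>;
  std::map<Key, std::vector<std::string>> entries;

  // Returns the cached fragments for (ptr, vecType), creating them on
  // first use. Fragment names are placeholders for the scattered values.
  const std::vector<std::string> &scatter(const void *ptr,
                                          const std::string &vecType,
                                          unsigned numElems) {
    Key key{ptr, vecType};
    auto it = entries.find(key);
    if (it != entries.end())
      return it->second;
    std::vector<std::string> frags;
    for (unsigned i = 0; i < numElems; ++i)
      frags.push_back(vecType + ".i" + std::to_string(i));
    return entries.emplace(key, std::move(frags)).first->second;
  }
};
```

The same pointer accessed as `<4 x i32>` and as `<2 x i64>` then produces two independent cache entries with 4 and 2 fragments.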

Differential Revision: https://reviews.llvm.org/D139359
2022-12-08 20:48:14 +01:00
Nicolai Hähnle
1a78c64654 Scalarizer: explicitly exclude scalable vectors
They are unsupported and would previously crash, now we just skip them.

Hypothetically, one could consider "scalarizing" a <vscale x n x T> into
n copies of <vscale x 1 x T>. But (1) it's unclear how to do that
because insertelement etc. don't work with scalable vectors in the
required way, and (2) there is no user of such functionality.

Differential Revision: https://reviews.llvm.org/D139358
2022-12-08 20:48:14 +01:00
Kazu Hirata
595f1a6aaf [llvm] Use std::nullopt instead of None in comments (NFC)
This is part of an effort to migrate from llvm::Optional to
std::optional:

https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
2022-12-04 19:47:13 -08:00
Kazu Hirata
343de6856e [Transforms] Use std::nullopt instead of None (NFC)
This patch mechanically replaces None with std::nullopt where the
compiler would warn if None were deprecated.  The intent is to reduce
the amount of manual work required in migrating from Optional to
std::optional.

This is part of an effort to migrate from llvm::Optional to
std::optional:

https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
2022-12-02 21:11:37 -08:00
Manuel Brito
1e55d5b1f2 Use poison instead of undef as placeholder for vector construction [NFC]
Differential Revision: https://reviews.llvm.org/D138450
2022-11-21 18:43:23 +00:00
Thomas Symalla
fc26a75280 [NFC] Fixed several misspellings of "Splitter" in Scalarizer
Spliiter => Splitter
2022-10-22 15:13:56 +02:00
Nuno Lopes
0586d1cac2 [NFC] Switch a few uses of undef to poison as placeholders for unreachable code 2022-06-30 21:47:31 +01:00
serge-sans-paille
aaf1630ac3 [Scalarizer] No need to gather a scattered extracted element
ExtractElement does not produce a vector out of a vector, so there's no need to
call a gather once done.

Fix #54469

Credits to npopov@redhat.com for the original approach.

Differential Revision: https://reviews.llvm.org/D126012
2022-06-21 18:43:54 +02:00
Kazu Hirata
129b531c9c [llvm] Use value_or instead of getValueOr (NFC) 2022-06-18 23:07:11 -07:00
David Green
6f81903e89 [LV][SLP] Mark fptosi_sat as vectorizable
This adds fptosi_sat and fptoui_sat to the list of trivially
vectorizable functions, mainly so that the loop vectorizer can vectorize
the instruction. Marking them as trivially vectorizable also allows them
to be SLP vectorized, and Scalarized.

The signature of a fptosi_sat requires two type overrides
(@llvm.fptosi.sat.v2i32.v2f32), unlike other intrinsics that often only
take a single one. This patch alters hasVectorInstrinsicOverloadedScalarOpd
to isVectorIntrinsicWithOverloadTypeAtArg, so that it can mark the first
operand of the intrinsic as an overloaded (but not scalar) operand.

Differential Revision: https://reviews.llvm.org/D124358
2022-05-03 09:32:34 +01:00
David Green
9727c77d58 [NFC] Rename Instrinsic to Intrinsic 2022-04-25 18:13:23 +01:00
Benoit Jacob
9879c555f2 Expose ScalarizerPass options to C++ (not just commandline)
Context: I needed this for https://github.com/google/iree/pull/8474 .
I found that TSan instrumentation expects vector sizes to be <= 16,
and in my project (IREE) we have tests with higher vector sizes.
That left some test functions uninstrumented, resulting in crashes as
instrumented code called into them.

Differential Revision: https://reviews.llvm.org/D121182
2022-03-14 12:00:35 +01:00
Nikita Popov
c262ba2aab [Scalarizer] Avoid pointer element type accesses
Pass through the load/store type to the Scatterer instead.
2022-03-03 10:28:58 +01:00
serge-sans-paille
59630917d6 Cleanup includes: Transform/Scalar
Estimated impact on preprocessor output line:
before: 1062981579
after:  1062494547

Discourse thread: https://discourse.llvm.org/t/include-what-you-use-include-cleanup
Differential Revision: https://reviews.llvm.org/D120817
2022-03-03 07:56:34 +01:00
Nikita Popov
aa97bc116d [NFC] Remove uses of PointerType::getElementType()
Instead use either Type::getPointerElementType() or
Type::getNonOpaquePointerElementType().

This is part of D117885, in preparation for deprecating the API.
2022-01-25 09:44:52 +01:00
Daniele Vettorel
67887b0f81 [Scalarizer] Do not insert instructions between PHI nodes and debug intrinsics.
The scalarizer pass seems to be inserting instructions in-between PHI nodes or debug intrinsics that end up staying at the end of the pass, resulting in malformed IR and violating assumptions.

This patch adds a check to make sure the `extractelement` instructions that it adds are correctly placed after all PHI nodes and debug intrinsics.

Patch by vettoreldaniele.

Reviewed By: bjope

Differential Revision: https://reviews.llvm.org/D112472
2021-11-02 09:53:59 -04:00
Kazu Hirata
4f0225f6d2 [Transforms] Migrate from getNumArgOperands to arg_size (NFC)
Note that getNumArgOperands is considered a legacy name.  See
llvm/include/llvm/IR/InstrTypes.h for details.
2021-10-01 09:57:40 -07:00
Bjorn Pettersson
4c7f820b2b Update @llvm.powi to handle different int sizes for the exponent
This can be seen as a follow up to commit 0ee439b705e82a4fe20e2,
that changed the second argument of __powidf2, __powisf2 and
__powitf2 in compiler-rt from si_int to int. That was to align with
how those runtimes are defined in libgcc.
One thing that seems to have been missing in that patch was to make
sure that the rest of LLVM also handles that the argument now depends
on the size of int (not using the si_int machine mode for 32-bit).
When using __builtin_powi for a target with 16-bit int clang crashed.
And when emitting libcalls to those rtlib functions (typically when
lowering @llvm.powi), the backend would always prepare the exponent
argument as an i32 which caused miscompiles when the rtlib was
compiled with 16-bit int.

The solution used here is to use an overloaded type for the second
argument in @llvm.powi. This way clang can use the "correct" type
when lowering __builtin_powi, and then later when emitting the libcall
it is assumed that the type used in @llvm.powi matches the rtlib
function.

One thing that needed some extra attention was that when vectorizing
calls several passes did not support that several arguments could
be overloaded in the intrinsics. This patch allows overload of a
scalar operand by adding hasVectorInstrinsicOverloadedScalarOpd, with
an entry for powi.

Differential Revision: https://reviews.llvm.org/D99439
2021-06-17 09:38:28 +02:00
Juneyoung Lee
1fc992bd86 [Scalarizer] Use poison as insertelement's placeholder
This patch makes Scalarizer use poison as insertelement's placeholder.

It contains two changes in Scalarizer.cpp, and neither change alters the semantics of the optimized program.
This is because the placeholder value (poison) is already completely hidden by the following insertelement instructions.

The first change at visitBitCastInst() creates poison vector of MidTy and consecutively inserts FanIn times,
which is # of elems of MidTy.
The second change at ScalarizerVisitor::finish() creates poison with Op->getType(), and it is filled with
Count insertelements.

The test diffs show that the poison value is never exposed after insertelements.
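
The argument above can be modelled in standalone C++, assuming `std::nullopt` stands in for a poison lane. `gatherModel`, `hasPoison`, and `Lane` are hypothetical names for this sketch, not LLVM code.

```cpp
#include <cstddef>
#include <optional>
#include <vector>

// std::nullopt models a poison lane in this sketch.
using Lane = std::optional<int>;

// Starts from an all-poison placeholder and overwrites every lane,
// mirroring a chain of Count insertelement instructions.
std::vector<Lane> gatherModel(const std::vector<int> &scalars) {
  std::vector<Lane> vec(scalars.size());  // placeholder: every lane poison
  for (std::size_t i = 0; i < scalars.size(); ++i)
    vec[i] = scalars[i];  // one "insertelement" per lane
  return vec;
}

// True if any lane is still the placeholder.
bool hasPoison(const std::vector<Lane> &vec) {
  for (const Lane &lane : vec)
    if (!lane.has_value())
      return true;
  return false;
}
```

Since the number of inserts equals the number of lanes, no placeholder lane survives, so it makes no observable difference which placeholder value was chosen.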

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D93989
2021-01-04 00:35:28 +09:00
Bjorn Pettersson
aa8be5aeea [Scalarizer] Avoid changing name of non-instructions
The "takeName" logic in ScalarizerVisitor::gather did not consider
that the value vector could refer to non-instructions, such as
global variables. This patch makes sure that we avoid changing the
name of a value if it isn't an instruction.

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D87685
2020-09-15 14:15:50 +02:00