When set a value to minBits, and doing scalarizer pass, if last remained
boolean vector size can't be aligned to min bits, remained bits should
be processed each by each, and not allowed to do a direct shuffle during
packing.
Problem:
In 'concatenate' step, when processing a boolean vector, if last
remained bits (fragment) can't be aligned to minBits, but required to be
packed, those bits should be processed each by each.
A direct call to vector shuffle is to assume those remained boolean bits
can be packed to target pack size. For example, when processing a
boolean vector with `size = 7`, but set `min bits = 4`, first fragment
with `4` bits can be packed correctly, but there are still `3` bits
remained which can't be used in a vector shuffle call.
Solution:
If remained bits can't be aligned to required target (min bits) pack
size, process them each by each.
(This will mostly only influence boolean vector as they have bit width
not aligned to pow(2).)
---------
Co-authored-by: Zhou, Shaochi(AMD) <shaozhou@amd.com>
fixes#148681fixes#148680
For the scalarizer pass we just need to indicate that scalarization took
place, I used the logic for knowing when to eraseFromParent to indicate
this.
For the DXILLegalizePass the new `legalizeScalarLoadStoreOnArrays` did
not use `ToRemove` which means our uses of !ToRemove.empty(); was no
longer correct. This meant each legalization now needed a means of
indicated if a change was maded.
For DXILResourceAccess.cpp the `Changed` bool was never set to true.
So removed it and replaced it with `!Resources.empty();` since we only
call `replaceAccess` if we have items in Resources.
Fixes#149345
Effectively no-op pairs of insertelement-extractelement instructions
were being created due to the ExtractValueInst visitor in the Scalarizer
storing its scalarized result into the Scattered map using an incorrect
key (specifically the type used in the key).
This PR fixes this issue.
Fixes#127739
The `visitExtractValueInst` is missing a check that was present in
`splitCall` / `visitCallInst`.
This check ensures that each struct element has a VectorSplit, and that
each VectorSplit contains the same number of elements packed per
fragment.
---------
Co-authored-by: Jay Foad <jay.foad@amd.com>
- update `VectorUtils:isVectorIntrinsicWithScalarOpAtArg` to use TTI for
all uses, to allow specifiction of target specific intrinsics
- add TTI to the `isVectorIntrinsicWithStructReturnOverloadAtField` api
- update TTI api to provide `isTargetIntrinsicWith...` functions and
consistently name them
- move `isTriviallyScalarizable` to VectorUtils
- update all uses of the api and provide the TTI parameter
Resolves#117030
This changes allows target intrinsics to specify and overwrite overloaded types.
- Updates `ReplaceWithVecLib` to not provide TTI as there most probably won't be a use-case
- Updates `SLPVectorizer` to use available TTI
- Updates `VPTransformState` to pass down TTI
- Updates `VPlanRecipe` to use passed-down TTI
This change will let us add scalarization for `asdouble`: #114847
Based on this RFC:
https://discourse.llvm.org/t/rfc-allow-the-scalarizer-pass-to-scalarize-vectors-returned-in-structs/82306
LLVM intrinsics do not support out params. To get around this limitation
implementers will make intrinsics return structs to capture a return
type and an out param. This implementation detail should not impact
scalarization since these cases should be elementwise operations.
## Three changes are needed.
- The CallInst visitor needs to be updated to handle Structs
- A new visitor is needed for `ExtractValue` instructions
- finsh needs to be update to handle structs so that insert elements are
properly propogated.
## Testing changes
- Add support for `llvm.frexp`
- Add support for `llvm.dx.splitdouble`
fixes https://github.com/llvm/llvm-project/issues/111437
Rename the function to reflect its correct behavior and to be consistent
with `Module::getOrInsertFunction`. This is also in preparation of
adding a new `Intrinsic::getDeclaration` that will have behavior similar
to `Module::getFunction` (i.e, just lookup, no creation).
Since we are using the Scalarizer pass in the backend we needed a way to
allow this pass to operate on Target intrinsics.
We achieved this by adding `TargetTransformInfo ` to the Scalarizer
pass. This allowed us to call a function available to the DirectX
backend to know if an intrinsic is a target intrinsic that should be
scalarized.
As discussed in this
[proposal](https://github.com/llvm/wg-hlsl/pull/62/files?short_path=ac6e592#diff-ac6e59276afe8016e307eedc5c835f534c0cb353707760b44df0fa9d905a5cf8).
We had to bring back the legacy pass manager interface for the
scalarizer pass. Two reasons for this:
1. The DirectX backend is still using the legacy pass manager
2. The new PM isn't hooked up in clang yet via `BackendUtil.cpp`'s
`AddEmitPasses` That means even if we add a `buildCodeGenPipeline` we
won't be able to benefit from the new pass manager's scalarizer pass
interface.
The remaining changes are hooking up the scalarizer pass to the DirectX
backend, updating the DirectX test cases,
and allowing the `optdriver` to not block the legacy invocation of the
scalarizer pass.
Future work still needs to be done to allow the scalarizer pass to
handle target specific intrinsics.
closes#105178
This is a helper to avoid writing `getModule()->getDataLayout()`. I
regularly try to use this method only to remember it doesn't exist...
`getModule()->getDataLayout()` is also a common (the most common?)
reason why code has to include the Module.h header.
Uses the new InsertPosition class (added in #94226) to simplify some of
the IRBuilder interface, and removes the need to pass a BasicBlock
alongside a BasicBlock::iterator, using the fact that we can now get the
parent basic block from the iterator even if it points to the sentinel.
This patch removes the BasicBlock argument from each constructor or call
to setInsertPoint.
This has no functional effect, but later on as we look to remove the
`Instruction *InsertBefore` argument from instruction-creation
(discussed
[here](https://discourse.llvm.org/t/psa-instruction-constructors-changing-to-iterator-only-insertion/77845)),
this will simplify the process by allowing us to deprecate the
InsertPosition constructor directly and catch all the cases where we use
instructions rather than iterators.
Scalarization can expose optimization opportunities for the individual
elements of a vector, and can therefore be beneficial on targets like
GPUs that tend to operate on scalars anyway.
However, notably with 16-bit operations it is often beneficial to keep
<2 x i16 / half> vectors around since there are packed instructions for
those.
Refactor the code to operate on "fragments" of split vectors. The
fragments are usually scalars, but may themselves be smaller vectors
when the scalarizer-min-bits option is used. If the split is uneven,
the last fragment is a shorter remainder.
This is almost NFC when the new option is unused, but it happens to
clean up some code in the fully scalarized case as well.
Differential Revision: https://reviews.llvm.org/D149842
llvm.is.fpclass is different from other vectorizable intrinsics in that
it is overloaded on an argument type, not on the return type.
Differential Revision: https://reviews.llvm.org/D148905
With opaque pointers, it's possible for the same pointer value to be
used to store different vector types (both number and type of elements),
so we need to take that into account when caching the scattering.
Differential Revision: https://reviews.llvm.org/D139359
They are unsupported and would previously crash, now we just skip them.
Hypothetically, one could consider "scalarizing" a <vscale x n x T> into
n copies of <vscale x 1 x T>. But (1) it's unclear how to do that
because insertelement etc. don't work with scalable vectors in the
required way, and (2) there is no user of such functionality.
Differential Revision: https://reviews.llvm.org/D139358
This patch mechanically replaces None with std::nullopt where the
compiler would warn if None were deprecated. The intent is to reduce
the amount of manual work required in migrating from Optional to
std::optional.
This is part of an effort to migrate from llvm::Optional to
std::optional:
https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
ExtractElement does not produce a vector out of a vector, so there's no need to
call a gather once done.
Fix#54469
Credits to npopov@redhat.com for the original approach.
Differential Revision: https://reviews.llvm.org/D126012
This adds fptosi_sat and fptoui_sat to the list of trivially
vectorizable functions, mainly so that the loop vectorizer can vectorize
the instruction. Marking them as trivially vectorizable also allows them
to be SLP vectorized, and Scalarized.
The signature of a fptosi_sat requires two type overrides
(@llvm.fptosi.sat.v2i32.v2f32), unlike other intrinsics that often only
take a single. This patch alters hasVectorInstrinsicOverloadedScalarOpd
to isVectorIntrinsicWithOverloadTypeAtArg, so that it can mark the first
operand of the intrinsic as a overloaded (but not scalar) operand.
Differential Revision: https://reviews.llvm.org/D124358
Context: I needed this for https://github.com/google/iree/pull/8474 .
I found that TSan instrumentation expects vector sizes to be <= 16,
and in my project (IREE) we have tests with higher vector sizes.
That left some test functions uninstrumented, resulting in crashes as
instrumented code called into them.
Differential Revision: https://reviews.llvm.org/D121182
Instead use either Type::getPointerElementType() or
Type::getNonOpaquePointerElementType().
This is part of D117885, in preparation for deprecating the API.
The scalarizer pass seems to be inserting instructions in-between PHI nodes or debug intrinsics that end up staying at the end of the pass, resulting in malformed IR and violating assumptions.
This patch adds a check to make sure the `extractelement` instructions that it adds are correctly placed after all PHI nodes and debug intrinsics.
Patch by vettoreldaniele.
Reviewed By: bjope
Differential Revision: https://reviews.llvm.org/D112472
This can be seen as a follow up to commit 0ee439b705e82a4fe20e2,
that changed the second argument of __powidf2, __powisf2 and
__powitf2 in compiler-rt from si_int to int. That was to align with
how those runtimes are defined in libgcc.
One thing that seem to have been missing in that patch was to make
sure that the rest of LLVM also handle that the argument now depends
on the size of int (not using the si_int machine mode for 32-bit).
When using __builtin_powi for a target with 16-bit int clang crashed.
And when emitting libcalls to those rtlib functions, typically when
lowering @llvm.powi), the backend would always prepare the exponent
argument as an i32 which caused miscompiles when the rtlib was
compiled with 16-bit int.
The solution used here is to use an overloaded type for the second
argument in @llvm.powi. This way clang can use the "correct" type
when lowering __builtin_powi, and then later when emitting the libcall
it is assumed that the type used in @llvm.powi matches the rtlib
function.
One thing that needed some extra attention was that when vectorizing
calls several passes did not support that several arguments could
be overloaded in the intrinsics. This patch allows overload of a
scalar operand by adding hasVectorInstrinsicOverloadedScalarOpd, with
an entry for powi.
Differential Revision: https://reviews.llvm.org/D99439
This patch makes Scalarizer to use poison as insertelement's placeholder.
It contains two changes in Scalarizer.cpp, and the both changes does not change the semantics of the optimized program.
It is because the placeholder value (poison) is already completely hidden by following insertelement instructions.
The first change at visitBitCastInst() creates poison vector of MidTy and consecutively inserts FanIn times,
which is # of elems of MidTy.
The second change at ScalarizerVisitor::finish() creates poison with Op->getType(), and it is filled with
Count insertelements.
The test diffs show that the poison value is never exposed after insertelements.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D93989
The "takeName" logic in ScalarizerVisitor::gather did not consider
that the value vector could refer to non-instructions, such as
global variables. This patch make sure that we avoid changing the
name of a value if it isn't an instruction.
Reviewed By: lebedev.ri
Differential Revision: https://reviews.llvm.org/D87685