This relands the reverted #120721 with a fix for cases where neither
reduction operand are the reduction phi. Only
63114239cc8d26225a0ef9920baacfc7cc00fc58 and
63114239cc8d26225a0ef9920baacfc7cc00fc58 are new on top of the reverted
PR.
---------
Co-authored-by: Nicholas Guy <nicholas.guy@arm.com>
Use the existing VPlan-based analysis to identify recipes that only have
their first lane demanded and transform them to uniform recpliate
recipes. This simplifies the generated code in some places and prepares
for fixing https://github.com/llvm/llvm-project/issues/122496.
Fixes#115767
This PR folds `X udiv Y` to `X lshr cttz(Y)` if Y is a power of two
since bitwise operations are faster than division.
Proof: https://alive2.llvm.org/ce/z/qHmLta
We have a textual representation of contextual profiles for test scenarios, mainly. This patch moves that to YAML instead of JSON. YAML is more succinct and readable (some of the .ll tests should be illustrative). In addition, JSON is parse-able by the YAML reader.
A subsequent patch will address deserialization.
(thanks, @kazutakahirata, for showing me how to use the llvm YAML reader/writer APIs, which I incorrectly thought to be more low-level than the JSON ones!)
When estimating the cost of entries shuffles for buildvectors, need to
rebuild original mask, not a generated submask, used for subregisters
analysis.
Fixes#122430
An occasional idiom for rotation is "(A << B) + (A >> (BitWidth - B))".
Currently this is not well handled on targets with native
funnel-shift/rotate support. Add a special case to haveNoCommonBitsSet
to ensure that the addition is converted to a disjoint or in InstCombine
so during instruction selection the idiom can be converted to an
efficient rotation implementation.
Proof: https://alive2.llvm.org/ce/z/WdCZsN
When estimating the cost of entries shuffles for buildvectors, need to
rebuild original mask, not a generated submask, used for subregisters
analysis.
Fixes#122430
With this patch we switch from the temporary dummy seeds to actual seeds
provided by the seed collector.
The seeds get sliced and each slice is used as the starting point for
vectorization.
Also use getPointerAlignment when trying to use alignment and
dereferenceable assumptions. This catches cases where dereferencable is
known via the assumption but alignment is known via getPointerAlignment
(e.g. via argument attribute or align of 1)
PR: https://github.com/llvm/llvm-project/pull/120916
`RegisterClassInfo::getRegPressureSetLimit` is a wrapper of
`TargetRegisterInfo::getRegPressureSetLimit` with some logics to
adjust the limit by removing reserved registers.
It seems that we shouldn't use
`TargetRegisterInfo::getRegPressureSetLimit`
directly, just like the comment "This limit must be adjusted
dynamically for reserved registers" said.
Separate from https://github.com/llvm/llvm-project/pull/118787
Currently we emit early-exit related debug messages/remarks even when
there is a single exit. Update to only check isVectorizableEarlyExitLoop
if there isn't a single exit block.
PR: https://github.com/llvm/llvm-project/pull/121994
This is a split-off from #109833 and only adds code relating to checking
if a struct-returning call can be vectorized.
This initial patch only allows the case where all users of the struct
return are `extractvalue` operations that can be widened.
```
%call = tail call { float, float } @foo(float %in_val)
%extract_a = extractvalue { float, float } %call, 0
%extract_b = extractvalue { float, float } %call, 1
```
Note: The tests require the VFABI changes from #119000 to pass.
This allows it to produce a more accurate cost for the shuffle, using
the more accurate calls to getShuffleCost in getInstructionCost. It
helps fix some of the regressions from vector combine a little while
ago, now that we have better subvector extract costs.
Need to check if the GEP bases are equal and return false early. Also,
need to return false if the lookup is too deep, considering bases equal
too. Fixes a crash in the assertion.
These can generally be emitted using an ext instruction or mov from the
high half. The half half extracts can be free depending on the users,
but that is not handled here, just the basic costs. It originally
included all subvector extracts, but that was toned-down to just
half-vector extracts to try and help the mid end not breakup high/low
extracts without having the SLP vectorizer create a mess using other
shuffles.
This just copies the same conservative definition from mayWriteToMemory,
and enables more VPInstructions to be hoisted out in LICM.
I think this should give more accurate costs, and I was able to build
llvm-test-suite without the legacy-vplan cost model assertion going off.
the `ptx_kernel` calling convention is a more idiomatic and standard way
of specifying a NVPTX kernel than using the metadata which is not
supposed to change the meaning of the program. Further, checking the
calling convention is significantly faster than traversing the metadata,
improving compile time.
This change updates the clang and mlir frontends as well as the
NVPTXCtorDtorLowering pass to emit kernels using the calling convention.
In addition, this updates all NVPTX unit tests to use the calling
convention as well.
This patch introduces a new method:
void Vectorizer::mergeEquivalenceClasses(EquivalenceClassMap &EQClasses)
const;
The method is called at the end of
Vectorizer::collectEquivalenceClasses() and is needed to merge
equivalence classes that differ only by their underlying objects (UO1
and UO2), where UO1 is 1-level-indirection underlying base for UO2. This
situation arises due to the limited lookup depth used during the search
of underlying bases with llvm::getUnderlyingObject(ptr).
Using any fixed lookup depth can result into creation of multiple
equivalence classes that only differ by 1-level indirection bases.
The new approach merges equivalence classes if they have adjacent bases
(1-level indirection). If a series of equivalence classes form ladder
formed of 1-step/level indirections, they are all merged into a single
equivalence class. This provides more opportunities for the load-store
vectorizer to generate better vectors.
---------
Signed-off-by: Klochkov, Vyacheslav N <vyacheslav.n.klochkov@intel.com>
Optionally (by default) no longer mark callsite nodes as Recursive,
which means they would be automatically skipped during cloning. This was
too conservative as it prevents cloning of any callsite that showed up
in any recursive cycle, even for non-recursive contexts.
While this will enable partial cloning of recursive contexts, the
recursive calls themselves will not be updated to call the correct
clone, possibly leading to some unnecessary but benign cloning and
affecting bytes hinted reporting. To prevent this, optional support
looks for recursive cycles in contexts during cloning and removes
those contexts from cloning. This requires some additional runtime
overhead, so is disabled by default for now.
Support for correct cloning of recursive cycles is WIP.
VPExpandSCEVRecipes must be placed in the entry and are alway uniform.
This fixes a crash by always identifying them as uniform, even if the
main vector loop region has been removed.
Fixes https://github.com/llvm/llvm-project/issues/121897.
- **[InstSimplify] Refactor `simplifyWithOpsReplaced` to allow multiple
replacements; NFC**
- **[InstSimplify] Use multi-op replacement when simplify `select`**
In the case of `select X | Y == 0 :...` or `select X & Y == -1 : ...`
we can do more simplifications by trying to replace both `X` and `Y`
with the respective constant at once.
Handles some cases for https://github.com/llvm/llvm-project/pull/121672
more generically.
Add constant-folding support for the NVVM intrinsics for converting
float/double to signed/unsigned int32/int64 types, including all
rounding-modes and ftz modifiers.
improveShuffleKindFromMask matches this as a SK_InsertSubvector of a v1f32 (which legalises to f32) into a v4f32 base vector, making it easy to recognise. MOVSS is limited to index0.
If the bitwidth is 2 and we add two 1s, the result may overflow.
This is fine in terms of correctness, but triggers the APInt ctor
assertion. Fix this by performing the calculation directly on APInts.
Fixes the issue reported in:
https://github.com/llvm/llvm-project/pull/114539#issuecomment-2574845003