Model zext i1 %x to in as select i1 %x, in 1, in 0 in case, if there are
other select instructions, which can be combined into a bundle.
Fixes#178403
Recommit after revert in 993e1f66afcfe9da03bd813e669eada341b11d2f
Reviewers: hiraditya, RKSimon
Pull Request: https://github.com/llvm/llvm-project/pull/180635
This fixes a miscompile in #180005 where we didn't check that the first
incoming value isn't poison.
We should use the first non-poison incoming value if it exists, or just
poison if all the incoming values are poison.
Since TargetTransformInfo::enableAggressiveInterleaving(bool
HasReductions) takes the HasReductions argument, the LoopVectorizer
should save its returned value in a variable called AggressivelyInterleave
instead of AggressivelyInterleaveReductions.
urem x, n: result < n (remainder is always less than divisor)
urem x, n: result <= x (remainder is at most the dividend)
udiv x, n: result <= x (quotient is at most the dividend)
https://alive2.llvm.org/ce/z/ezzsjQ
Model zext i1 %x to in as select i1 %x, in 1, in 0 in case, if there are
other select instructions, which can be combined into a bundle.
Fixes#178403
Reviewers: hiraditya, RKSimon
Pull Request: https://github.com/llvm/llvm-project/pull/180635
Sub-reductions can be implemented in two ways:
(1) negate the operand in the vector loop (the default way).
(2) subtract the reduced value from the init value in the middle block.
Note that both ways keep the reduction itself as an 'add' reduction,
which is necessary because only llvm.vector.partial.reduce.add exists.
The ISD nodes for partial reductions don't support folding the
sub/negation into its operands because the following is not a valid
transformation:
```
sub(0, mul(ext(a), ext(b)))
-> mul(ext(a), ext(sub(0, b)))
```
It can therefore be better to choose option (2) such that the partial
reduction is always positive (starting at '0') and to do a final
subtract in the middle block.
For AArch64 there are no dot-product instructions that can
do a `partial.reduce.sub(acc, mul(ext(a), ext(b)))` operation.
I'm not sure if such instructions exist for other targets.
(If so then we may want to make this decision a target option)
This PR also increases the AArch64 cost of a partial sub-reduction
when this exists in an 'add-sub' reduction chain.
Fixes https://github.com/llvm/llvm-project/issues/178703
The heuristic for deciding which scope line to use for a continuation
funclet relies on iterating on the instructions of the first BB of the
continuation. Often, this contains a single unconditional branch, which
is skipped by the heuristic. However, in coro-retcon, two such
"jump-only" BBs are generated. This patch amends the heuristic to
account for that.
This patch extends the support added in #158088 to loops where the
assignment is non-speculatable (e.g. a conditional load or divide).
For example, the following loop can now be vectorized:
```
int simple_csa_int_load(
int* a, int* b, int default_val, int N, int threshold)
{
int result = default_val;
for (int i = 0; i < N; ++i)
if (a[i] > threshold)
result = b[i];
return result;
}
```
It does this by extending the recurrence matching from only looking for
selects, to include phis where all operands are the header phi, except
for one which can be an arbitrary value outside the recurrence.
---
Reverts llvm/llvm-project#180275 (original PR: #178862)
Additional type legalization for `ISD::VECTOR_FIND_LAST_ACTIVE` was
added in #180290, which should resolve the backend crashes on x86.
Start trying to use SimplifyDemandedFPClass on instructions, starting
with fmul. This subsumes the old transform on multiply of 0. The
main change is the introduction of nnan/ninf. I do not think anywhere
was systematically trying to introduce fast math flags before, though
a few odd transforms would set them.
Previously we only called SimplifyDemandedFPClass on function returns
with nofpclass annotations. Start following the pattern of
SimplifyDemandedBits, where this will be called from relevant root
instructions.
I was wondering if this should go into InstCombineAggressive, but that
apparently does not make use of InstCombineInternal's worklist.
Same as https://github.com/llvm/llvm-project/pull/177988, but for
fminimum_num/fmaximum_num. Directly canonicalize these to the
corresponding intrinsics, and let the shrinking happen directly on the
intrinsics.
When a type test has two phases and is used by llvm.cond.loop to
implement a conditional trap, it is more efficient for two infinite
loops to be generated. Arrange for this by having the pass detect the
typical IR pattern used for conditional CFI traps and generate the second
llvm.cond.loop if found.
Part of this RFC:
https://discourse.llvm.org/t/rfc-optimizing-conditional-traps/89456
Reviewers: fmayer, vitalybuka
Reviewed By: vitalybuka
Pull Request: https://github.com/llvm/llvm-project/pull/177687
Enables support for marking overflow intrinsics `uadd`, `sadd`, `usub`,
`ssub`, `umul` and `smul` as trivially vectorizable.
Fixes#174617
---
This patch is a reland of #174835.
Reverts #179819
This PR adds a small, targeted InstCombine fold for the pattern:
```
%idx = srem i64 %x, 2^k
%p = getelementptr inbounds nuw i8, ptr %base, i64 %idx
```
When the GEP is inbounds + nuw, and the divisor is a non-zero
power-of-two constant, the signed remainder cannot produce a negative
offset without violating the inbounds/nuw constraints. In that case we
can canonicalize the index to a non-negative form and expose the common
power-of-two rewrite:
- Rewrite the GEP index from `srem %x, 2^k` to `urem %x, 2^k`
- Create a new GEP with the new index and replace the original GEP
- the `urem %x, 2^k` will further folds to `and %x (2^k-1)`
resulting the following pattern
```
%idx = and i64 %x, (2^k-1)
%p = getelementptr inbounds nuw i8, ptr %base, i64 %idx
```
Fixes#180097.
generalized alive2 proof: https://alive2.llvm.org/ce/z/8EBxug
In some cases we decide to vectorise loops with first-order recurrences
using VF=1, IC>1. We then attempt to unroll a vplan in replicateByVF,
however when trying to erase the list of values from the parent we
trigger the following assert:
```
virtual llvm::VPRecipeValue::~VPRecipeValue(): Assertion `Users.empty()
&& "trying to delete a VPRecipeValue with remaining users"' failed.
```
The problem seems to stem from this code:
```
DefR->replaceUsesWithIf(LaneDefs[0], [DefR](VPUser &U, unsigned) {
return U.usesFirstLaneOnly(DefR);
});
```
since usesFirstLaneOnly returns false and we fail to replace uses of
DefR with LaneDefs[0]. Upon inspection the only VPUser objects that
return false are VPInstruction::FirstOrderRecurrenceSplice and
VPFirstOrderRecurrencePHIRecipe. Since the values are all scalar it's
simply not possible for us to be using anything other than the first
lane. I've fixed this by bailing out of replicateByVF early for plans with
only a scalar VF.
Fixes https://github.com/llvm/llvm-project/issues/179671
ForceTargetInstructionCost in the legacy cost model overrides any costs
from InstsToScalarize. Match the behavior in the VPlan-based cost model.
This fixes a crash with -force-target-instr-cost for the added test
case.
PR: https://github.com/llvm/llvm-project/pull/168269
This weird pattern was introduced by LoopVectorize. But it was placed in
an unreachable path, so we cannot assert that the indices are always
valid in InstCombine.
Closes https://github.com/llvm/llvm-project/issues/180233.
If a phi has fast math flags, we can propagate it to the widened select.
To do this, this patch makes VPPhi and VPBlendRecipe subclasses of
VPRecipeWithIRFlags, and propagates it through PlainCFGBuilder and
VPPredicator.
Alive2 proofs for some of the FMFs (it looks like it can't reason about
the full "fast" set yet)
nnan: https://alive2.llvm.org/ce/z/f0bRd4
nsz: https://alive2.llvm.org/ce/z/u9P96T
The actual motivation for this to eventually be able to move the special
casing for tail folding in
LoopVectorizationPlanner::addReductionResultComputation into the CFG in
#176143, which requires passing through FMFs.
Drop the custom shrinking code, which we'll also do for intrinsics.
Having libcall-only optimizations is confusing, as these are typically
directly emitted as intrinsics by the frontend.
If only of the operands is one-use, the total number of fpexts stays the
same, but the min/max is performed on a narrowed type. Additionally, the
fpext may fold with a following fptrunc.
There are some cases when PtrSCEV can be nullptr. Fall back to legacy
cost model, to not call isLoopInvariant with nullptr.
Fixes a crash after 0c4f8094939d2.
Update VPReplicateReicpe::computeCost to compute predicated load/store
costs directly, unless the pointer is uniform. In that case, the legacy
cost model uses a different logic, which will be migrated separately.
PR: https://github.com/llvm/llvm-project/pull/179129
With a sized dead_on_return, we need to not eliminate stores if there
are to a pointer with a variable offset from the underlying object
marked dead_on_return. This manifested as an assertion failure as
BaseValue/V ended up not being equal. It's possible we could do a range
analysis to try and prove the variable offset stays within bounds, but
this case seems to come up relatively rarely (only reproducible with a
UBSan build of LLVM) and is probably not worth the compile time.
Fixes#180361.
Fixes#171890
If the pointer operands of an instruction are all constants with generic
AS, we always infer the AS of the instruction as uninitialized finally.
And the rewrite process will skip cloning the instruction, producing
invalid IR.
This patch fixes it by inferring the AS of this kind of instruction as
flat. Maybe we can fold the operator with all constants to get better
performance, but I think this case is rare in the real world.
Allow SCEVPtrToAddr as cast in assertion in SCEVDbgValueBuilder.
SCEVPtrToAddr is handled similarly to SCEVPtrToInt.
Fixes a crash with debug info after bd40d1de9c9ee, which started to
generate ptrtoaddr instead of ptrtoint expressions.
Previously, `SimplifyDemandedUseBits` for `add` instructions only
used known zeros from the RHS to simplify the LHS. It failed to
handle the symmetric case where the LHS has known zeros and the
result does not demand the low bits.
This patch implements this missing optimization, allowing the RHS
constant to be shrunk when the LHS low bits are known zero and unused.
Proof: https://alive2.llvm.org/ce/z/6v9iFY
Fixed: https://github.com/llvm/llvm-project/issues/135411
This reverts commit cb905605b2e95f88296afe136b21a7d2476cb058.
Recommit the patch with a small change to check the destination
type matches the address type, to avoid a crash on mismatch.
Original message:
This patch updates tryToReuseLCSSAPhi to use SCEVPtrToAddr, unless using
SCEVPtrToInt allows re-use, because the IR already contains a re-usable
phi using PtrToInt.
This is a first step towards migrating to SCEVPtrToAddr and avoids
regressions in follow-up changes.
PR: https://github.com/llvm/llvm-project/pull/178727
https://github.com/llvm/llvm-project/pull/178333 updates the memprof
pass to annotate string literal section prefix.
The StaticDataProfileInfo.cpp provides an analysis pass to reconcile
global variable hotness. It's used by StaticDataAnnotator and AsmPrinter
to look up global variable hotness.
This PR updates the analysis pass to compute the hotness of string
literals.
* When both data access profiles and pgo counters provide a hotness
attribute, use the hotter one.
* Otherwise, use the hotness attribute that's available.
Implementation-wise, the option `AnnotateStringLiteralSectionPrefix` is
moved from MemProf (a transform pass) to StaticDataProfileInfo (an
Analysis pass). Otherwise, there might be errors like caught by CI. Note
https://github.com/llvm/llvm-project/pull/178336#issuecomment-3808537817
is an edited message, and its history shows the intermediate failures
like below. ~My understanding is~ Preliminary LLM study (:)) shows that
the error manifests in PowerPC but not X86 due to cmake variable
differences.
```
FAILED: unittests/Target/PowerPC/PowerPCTests
...
>>> referenced by CommandLine.h:1437 (/home/gha/actions-runner/_work/llvm-project/llvm-project/llvm/include/llvm/Support/CommandLine.h:1437)
>>> StaticDataProfileInfo.cpp.o:(llvm::StaticDataProfileInfo::getConstantSectionPrefix(llvm::Constant const*, llvm::ProfileSummaryInfo const*) const) in archive lib/libLLVMAnalysis.a
clang++: error: linker command failed with exit code 1 (use -v to see invocation)
```
This patch updates tryToReuseLCSSAPhi to use SCEVPtrToAddr, unless using
SCEVPtrToInt allows re-use, because the IR already contains a re-usable
phi using PtrToInt.
This is a first step towards migrating to SCEVPtrToAddr and avoids
regressions in follow-up changes.
PR: https://github.com/llvm/llvm-project/pull/178727