This fold already exists for scalars via FAddCombine (and that's
why 2 of the tests are only changed cosmetically), but that code
misses vectors and has largely been replaced by simpler folds
over time, so this is another step towards removing it.
The tests with constant folding that produces poison
could potentially remove the select entirely:
https://alive2.llvm.org/ce/z/e-WUqF
...but this patch just removes the FMF-only limitation on
propagation.
This was a last-minute addition to D117249, and of course I ended
up inverting the condition in a way that caused an uninitialized
memory read.
I've dropped it entirely, as I don't think we actually care whether
the size is zero or not here. The previous code wasn't checking
this either.
The malloc to global transform currently determines the type of the
global by looking at bitcasts of the malloc. This is limited (the
transform fails if there are multiple different types) and
incompatible with opaque pointers.
My initial approach was to construct an appropriate struct type
based on usage in loads/stores. What this patch does instead is
to always create an [i8 x AllocSize] global, without trying to
guess types at all.
This does mean that other transforms that require a certain global
type may break. I fixed two of these in D117034 and D117223, which
I believe should be sufficient to avoid regressions. In particular,
the global SRA change should end up splitting the global into
naturally-typed sub-globals, at which point all other optimizations
should work.
Differential Revision: https://reviews.llvm.org/D117092
Currently global SRA uses the GEP structure to determine how to
split the global. This patch instead analyses the loads and stores
that are performed on the global, and collects which types are used
at which offset, and then splits the global according to those.
This is both more general, and works fine with opaque pointers.
This is also closer to how ordinary SROA is performed.
Differential Revision: https://reviews.llvm.org/D117223
canSkipDef() currently skips inaccessiblememonly calls, but not
if they are allocation functions. This check was added in D103009,
but actually seems to be a leftover from a previous implementation
in D101440. canSkipDef() is not used on the storeIsNoop() path,
where the relevant transform ended up being implemented.
Differential Revision: https://reviews.llvm.org/D117005
After d4a8fc3a87a1 LV stopped adding metadata to disable runtime
unrolling to the vectorized epilogue loop. This was missed because
278aa65cc495 removed the relevant test coverage.
This patch fixes that by adding the relevant metadata after
vector loop generation.
Use the AttributeSet constructor instead. There's no good reason
why AttrBuilder itself should exact the AttributeSet from the
AttributeList. Moving this out of the AttrBuilder generally results
in cleaner code.
The empty() method is a footgun: It only checks whether there are
non-string attributes, which is not at all obvious from its name,
and of dubious usefulness. td_empty() is entirely unused.
Drop these methods in favor of hasAttributes(), which checks
whether there are any attributes, regardless of whether these are
string or enum attributes.
If function is not sanitized we must reset shadow, not copy.
Depends on D117285
Reviewed By: kda, eugenis
Differential Revision: https://reviews.llvm.org/D117286
This fixes a crash I observed in issue #48708 where the LSR
pass tries to insert an instruction in a basic block with only a
catchswitch statement in there. This happens because the Phi node
being evaluated assumes the same value for different basic blocks.
If the basic block associated with the incoming value of the operand
being evaluated has an EHPad terminator LSR skips optimizing it.
But if that incoming value can come from multiple different blocks
there can be some incoming basic blocks which are terminated in
an EHPad. If these are then rewritten in RewriteForPhi the ones
containing an EHPad terminator will hit the "Insertion point must
be a normal instruction" assert in AdjustInsertPositionForExpand.
This fix makes CollectLoopInvariantFixupsAndFormulae also ignore
cases where the same value has another incoming basic block with an
EHPad, same as it already does in case the primary value has one.
Patch by Lorenz Brun <lorenz@brun.one>
Differential Revision: https://reviews.llvm.org/D98378
If function has no sanitize_memory we still reset shadow for nested calls.
The first return from getShadow() correctly returned shadow for argument,
but it didn't reset shadow of byval pointee.
Depends on D117277
Reviewed By: eugenis
Differential Revision: https://reviews.llvm.org/D117278
It's NFC because shadow of pointer is clean so origins will not be
propagated anyway.
Depends on D117275
Reviewed By: kda, eugenis
Differential Revision: https://reviews.llvm.org/D117276
In the process of rewriting `alloca`s and `phi`s that use them, the SROA
pass can try to insert a non-PHI instruction by calling
`getFirstInsertionPt()`, which is not possible in a catchswitch BB. This
CL makes we bail out on these cases.
Reviewed By: dschuff
Differential Revision: https://reviews.llvm.org/D117168
Currently loop interchange only supports loops with one inner loop
induction variable. This patch adds support for transformation with
more than one inner loop induction variables. The induction PHIs and
induction increment instructions are moved/duplicated properly to the
new outer header and the new outer latch, respectively.
Reviewed By: bmahjour
Differential Revision: https://reviews.llvm.org/D114917
This commit optimizes the code sequence:
icmp-XXX (ashr-exact (X, C_1), C_2).
Instcombine already implements this optimization for sgt, and this
patch adds support to additional predicates. The transformation is legal
for all predicates if the 'exact' flag is set, and to SGE, UGE, SLT, ULT
when the exact flag is not present.
This pattern is found in the std::vector bounds checks code of the at()
method.
Alive2 proof:
https://alive2.llvm.org/ce/z/JT_WL8
Differential Revision: https://reviews.llvm.org/D117252
After e734e8286b4b521d829aaddb6d1cbbd264953625, it is possible to end up in
a situation where an `indirectbr` is fed by a cast, which is in turn fed by
an operation which only produces integers.
`indirectbr` expects a block address, however these operations can't produce
that.
There were several asserts in `computeValueKnownInPredecessorsImpl` which check
that we're not looking for a block address if we're walking through something
which can never produce one.
Since it's now possible to hit these asserts, this changes them into actual
checks which return false if `Preference` is not `WantInteger`.
This adds a testcase which verifies that we don't crash anymore in these
situations.
Differential Revision: https://reviews.llvm.org/D99814
This reverts the revert commit 073c27b5e5851f13d99d383e047309299b68827d.
A reduced test case has been added in 5e4966cbae7ba5 and the code has
been updated to handle the case where getInductionOpcode returns
BinaryOpsEnd. In this case, the original code was always using
Instruction::Add. Do the same in the patch.
Note this commit may slightly change the value naming, because it now
also assigns the 'induction' name in the floating point case.
This doesn't require callers to put the pointer operand and the indices
in a container like a vector when calling the function. This is not
really an issue with the existing callers. But when using it from
IRBuilder the inputs are available as separate pointer value and indices
ArrayRef.
Reviewed By: lebedev.ri
Differential Revision: https://reviews.llvm.org/D117038
When masked scatter intrinsic does a uniform store to a destination
address from a source vector, and in this case, the mask is all one value.
This patch replaces the masked scatter with an extracted element of the
last lane of the source vector and stores it in the destination vector.
This patch also folds when the value in the masked scatter is a splat.
In this case, the mask cannot be all zero, and it folds to a scalar store
of the value in the destination pointer.
Differential Revision: https://reviews.llvm.org/D115724
Causes a crash with the following (creduce'd) test-case:
clang -O3 '--target=aarch64-grtev4-linux-gnu' -xc - -c -o /dev/null <<EOF
int *e;
int f;
int g() {
int h;
int *j = 0;
while (&f - j > 0) {
int k;
k = j;
if (e == j && *e)
k = 5;
h = k;
j++;
}
return h;
}
EOF
This reverts commit 7ce48be0fd83fb4fe3d0104f324bbbcfcc82983c.
This completes removal of the isXLike queries, and depends on a whole series of earlier patches which have already landed.
Differential Revision: https://reviews.llvm.org/D117242
The basic idea is that we can parameterize the getObjectSize implementation with a callback which lets us replace the operand before analysis if desired. This is what Attributor is doing during it's abstract interpretation, and allows us to have one copy of the code.
Note this is not NFC for two reasons:
* The existing attributor code is wrong. (Well, this is under-specified to be honest, but at least inconsistent.) The intermediate math needs to be done in the index type of the pointer space. Imagine e.g. i64 arguments in a 32 bit address space.
* I did not preserve the behavior in getAPInt where we return 0 for a partially analyzed value. This looks simply wrong in the original code, and nothing test wise contradicts that.
Differential Revision: https://reviews.llvm.org/D117241
This patch enables loop interchange with multiple outer loop
induction variables, and hence removes the limitation that only
a single outer loop induction variable is supported. In fact, it
turns out that the current pass already trivially supports multiple
outer indvars, which is the result of a previous patch
`https://reviews.llvm.org/D102743`. Therefore, this patch removed that
limitation and provides test cases for multiple outer indvars.
Reviewed By: bmahjour
Differential Revision: https://reviews.llvm.org/D114916
I strongly believe we need some variant of this.
The main problem is e.g. that the glibc's assert has 4 parameters,
but the profitability check is only okay with one extra phi node,
so D116692 doesn't even trigger on most of the expected cases.
While that restriction probably makes sense in normal code, if we
are about to run off of a cliff (into an `unreachable`), this
successor block is unlikely so the cost to setup these PHI nodes
should not be on the hotpath, and shouldn't matter performance-wise.
Likewise, we don't sink if there are unconditional predecessors
UNLESS we'd sink at least one non-speculatable instruction,
which is a performance workaround, but if we are about to run into
`unreachable`, it shouldn't matter.
Note that we only allow the case where there are at
most unconditiona branches on the way to the unreachable block.
Differential Revision: https://reviews.llvm.org/D117045
The existing code duplicated the same concern in two places, and (weirdly) changed the inference of the allocation size based on whether we could meet the alignment requirement. Instead, just directly check the allocation requirement.
Rewrite the calloc specific handling in heap-to-stack to allow arbitrary init values. The basic problem being solved is that if an allocation is initilized to anything other than zero, this must be explicitly done for the formed alloca as well.
This covers the calloc case today, but once a couple of earlier guards are removed in this code, downstream allocators with other init values could also be handled.
Inspired by discussion on D116971
This patch adds a new BranchOnCount VPInstruction opcode with 2
operands. It first compares its 2 operands (increment of canonical
induction and vector trip count), followed by a branch to either the
exit block or back to the vector header.
It must be the last recipe in the exit block of the topmost vector loop
region.
This extracts parts from D113224 and was discussed in D113223.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D116479
This is required to query the legality more precisely in the LoopVectorizer.
This adds another TTI function named 'forceScalarizeMaskedGather/Scatter'
function to work around the hack introduced for MVE, where
isLegalMaskedGather/Scatter would return an answer by second-guessing
where the function was called from, based on the Type passed in (vector
vs scalar). The new interface makes this explicit. It is also used by
X86 to check for vector widths where gather/scatters aren't profitable
(or don't exist) for certain subtargets.
Differential Revision: https://reviews.llvm.org/D115329
Similar to memset, memset_pattern{4,8,16} all will return and do not
unwind. Use fallthrough to include all attributes also set for memset.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D114904