Only tests in llvm/test/Analysis.
-analyze is legacy PM-specific.
This only touches files with `-passes`.
I looked through everything and made sure that everything had a new PM equivalent.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D109040
This is a followup to D104662 to generate slightly nicer code for
pointer overflow checks. Bypass expandAddToGEP and instead
explicitly generate i8 GEPs. This saves some bitcasts and negates
the value in a more obvious way. In particular, this prevents SCEV
from looking through the umul.with.overflow, same as in the integer
case.
The wrapping-pointer-ni.ll test deserves a comment: Previously,
this generated a typed GEP which used the umulo argument rather
than the multiplication result. This results in more compact IR in
that case, but effectively does the multiplication twice, the
second one is just hidden in the GEP. Reusing the umulo result
seems pretty reasonable to me.
Differential Revision: https://reviews.llvm.org/D109093
Please refer to
https://lists.llvm.org/pipermail/llvm-dev/2021-September/152440.html
(and that whole thread.)
TLDR: the original patch had no prior RFC, yet it had some changes that
really need a proper RFC discussion. It won't be productive to discuss
such an RFC, once it's actually posted, while said patch is already
committed, because that introduces bias towards already-committed stuff,
and the tree is potentially in broken state meanwhile.
While the end result of discussion may lead back to the current design,
it may also not lead to the current design.
Therefore i take it upon myself
to revert the tree back to last known good state.
This reverts commit 4c4093e6e39fe6601f9c95a95a6bc242ef648cd5.
This reverts commit 0a2b1ba33ae6dcaedb81417f7c4cc714f72a5968.
This reverts commit d9873711cb03ac7aedcaadcba42f82c66e962e6e.
This reverts commit 791006fb8c6fff4f33c33cb513a96b1d3f94c767.
This reverts commit c22b64ef66f7518abb6f022fcdfd86d16c764caf.
This reverts commit 72ebcd3198327da12804305bda13d9b7088772a8.
This reverts commit 5fa6039a5fc1b6392a3c9a3326a76604e0cb1001.
This reverts commit 9efda541bfbd145de90f7db38d935db6246dc45a.
This reverts commit 94d3ff09cfa8d7aecf480e54da9a5334e262e76b.
Several FP instructions (fadd, fsub, etc.) were incorrectly assigned
a higher cost for SVE because they have custom lowering, however we
know they are legal. This patch explicitly assigns a cost of 2 to
these opcodes.
Tests added here:
Analysis/CostModel/AArch64/arith-fp-sve.ll
Differential Revision: https://reviews.llvm.org/D108993
This extends D108921 into a generic rule applied to constructing ExitLimits along all paths. The remaining paths (primarily howFarToZero) don't have the same reasoning about UB sensitivity as the howManyLessThan ones did. Instead, the remain cause for max counts being more precise than exact counts is that we apply context sensitive loop guards on the max path, and not on the exact path. That choice is mildly suspect, but out of scope of this patch.
The MVETailPredication.cpp change deserves a bit of explanation. We were previously figuring out that two SCEVs happened to be equal because the happened to be identical. When we optimized one with context sensitive information, but not the other, we lost the ability to prove them equal. So, cover this case by subtracting and then applying loop guards again. Without this, we see changes in test/CodeGen/Thumb2/mve-blockplacement.ll
Differential Revision: https://reviews.llvm.org/D109015
This patch is specifically the howManyLessThan case. There will be a couple of followon patches for other codepaths.
The subtle bit is explaining why the two codepaths have a difference while both are correct. The test case with modifications is a good example, so let's discuss in terms of it.
* The previous exact bounds for this example of (-126 + (126 smax %n))<nsw> can evaluate to either 0 or 1. Both are "correct" results, but only one of them results in a well defined loop. If %n were 127 (the only possible value producing a trip count of 1), then the loop must execute undefined behavior. As a result, we can ignore the TC computed when %n is 127. All other values produce 0.
* The max taken count computation uses the limit (i.e. the maximum value END can be without resulting in UB) to restrict the bound computation. As a result, it returns 0 which is also correct.
WARNING: The logic above only holds for a single exit loop. The current logic for max trip count would be incorrect for multiple exit loops, except that we never call computeMaxBECountForLT except when we can prove either a) no overflow occurs in this IV before exit, or b) this is the sole exit.
An alternate approach here would be to add the limit logic to the symbolic path. I haven't played with this extensively, but I'm hesitant because a) the term is optional and b) I'm not sure it'll reliably simplify away. As such, the resulting code quality from expansion might actually get worse.
This was noticed while trying to figure out why D108848 wasn't NFC, but is otherwise standalone.
Differential Revision: https://reviews.llvm.org/D108921
And add a test case to illustrate that we do in fact produce the right result for the multiple exit case. I have gotten myself confused at least three times when reading this code, so clarify to prevent future confusion.
Tell the cost model to use the scalable calculation for non-neon fixed vector.
This results in a cheaper cost for fixed-length SVE masked gathers/scatters
allowing the vectorizor to emit them more frequently.
This was previously committed in 914836b, and reverted due to confusion on the status of the review.
Differential Revision: https://reviews.llvm.org/D108601
If we no an addrec doesn't self-wrap, the increment is strictly positive, and the start value is the smallest representable value, then we know that the corresponding wrap type can not occur.
Differential Revision: https://reviews.llvm.org/D108601
A couple of passes that are parameterized in new-PM used different
pass names (in cmd line interface) while using the same pass class
name. This patch updates the PassRegistry to model pass parameters
more properly using PASS_WITH_PARAMS.
Reason for the change is to ensure that we have a 1-1 mapping
between class name and pass name (when disregarding the params).
With a 1-1 mapping it is more obvious which pass name to use in
options such as -debug-only, -print-after etc.
The opt -passes syntax is changed for the following passes:
early-cse-memssa => early-cse<memssa>
post-inline-ee-instrument => ee-instrument<post-inline>
loop-extract-single => loop-extract<single>
lower-matrix-intrinsics-minimal => lower-matrix-intrinsics<minimal>
This patch is not updating pass names in docs/Passes.rst. Not quite
sure what the status is for that document (e.g. when it comes to
listing pass paramters). It is only loop-extract-single that is
mentioned in Passes.rst today, out of the passes mentioned above.
Differential Revision: https://reviews.llvm.org/D108362
According to the langref, it is valid to have multiple consecutive
lifetime start or end intrinsics on the same object.
For llvm.lifetime.start:
"If ptr [...] is a stack object that is already alive, it simply
fills all bytes of the object with poison."
For llvm.lifetime.end:
"Calling llvm.lifetime.end on an already dead alloca is no-op."
However, we currently fail an assertion in such cases. I've observed
the assertion failure when the loop vectorization pass duplicates
the intrinsic.
We can conservatively handle these intrinsics by ignoring all but
the first one, which can be implemented by removing the assertions.
Differential Revision: https://reviews.llvm.org/D108337
MSSA-based LICM has been enabled by default for a few years now.
This drops the old AST-based implementation. Using loop(licm) will
result in a fatal error, the use of loop-mssa(licm) is required
(or just licm, which defaults to loop-mssa).
Note that the core canSinkOrHoistInst() logic has to retain AST
support for now, because it is shared with LoopSink.
Differential Revision: https://reviews.llvm.org/D108244
For tight loops like this:
float r = 0;
for (int i = 0; i < n; i++) {
r += a[i];
}
it's better not to vectorise at -O3 using fixed-width ordered reductions
on AArch64 targets. Although the resulting number of instructions in the
generated code ends up being comparable to not vectorising at all, there
may be additional costs on some CPUs, for example perhaps the scheduling
is worse. It makes sense to deter vectorisation in tight loops.
Differential Revision: https://reviews.llvm.org/D108292
Removed AArch64 usage of the getMaxVScale interface, replacing it with
the vscale_range(min, max) IR Attribute.
Reviewed By: paulwalker-arm
Differential Revision: https://reviews.llvm.org/D106277
This option has been enabled by default for quite a while now.
The practical impact of removing the option is that MSSA use
cannot be disabled in default pipelines (both LPM and NPM) and
in manual LPM invocations. NPM can still choose to enable/disable
MSSA using loop vs loop-mssa.
The next step will be to require MSSA for LICM and drop the
AST-based implementation entirely.
Differential Revision: https://reviews.llvm.org/D108075
This is enabled by default. Drop explicit uses in preparation for
removing the option.
Also drop RUN lines that are now the same (typically modulo a
-verify-memoryssa option).
This reverts the revert 28c04794df74ad3c38155a244729d1f8d57b9400.
The failing MLIR test that caused the revert should be fixed in this
version.
Also includes a PPC test fix previously in 1f87c7c478a6.
a1ef81de35a4bac6d3 adjusted the definition of the intrinsic, but did not
update a PowerPC test. Fix the test by updating the call & declaration
of @llvm.matrix.column.major.load.
D105263 introduced this new test. It fails when asserts are disabled,
due to using a debug option on opt.
Reviewed By: pengfei
Differential Revision: https://reviews.llvm.org/D107805
The MemorySSA-based implementation has been enabled for a few months
(since D94376). This patch drops the old MDA-based implementation
entirely.
I've kept this to only the basic cleanup of dropping various
conditions -- the code could be further cleaned up now that there
is only one implementation.
Differential Revision: https://reviews.llvm.org/D102113
fix an assertion due to mismatch type for Numerator and CacheLineSize in loop cache analysis pass.
Reviewed By: bmahjour
Differential Revision: https://reviews.llvm.org/D107618
This takes the existing SVE costing for the various min/max reduction
intrinsics and expands it to NEON, where I believe it applies equally
well.
In the process it changes the lowering to use min/max cost, as opposed
to summing up the cost of ICmp+Select.
Differential Revision: https://reviews.llvm.org/D106239
Function exploreDirections() in DependenceAnalysis implements a recursive
algorithm for refining direction vectors. This algorithm has worst-case
complexity of O(3^(n+1)) where n is the number of common loop levels.
In this patch I'm adding a threshold to control the amount of time we
spend in doing MIV tests (which most of the time end up resulting in over
pessimistic direction vectors anyway).
Reviewed By: Meinersbur
Differential Revision: https://reviews.llvm.org/D107159
As discussed on D107228, widening a subvector by inserting the whole subvector into the bottom a larger undef vector should always be cheap enough that we can treat it as zero cost.
NOTE: If this proves to cause issues we have the option of introducing a "SK_WidenSubvector" shuffle kind enum that targets could override the zero cost, but that doesn't seem necessary atm.
Differential Revision: https://reviews.llvm.org/D107228
This patch adds an initial ShuffleVectorInst::isInsertSubvectorMask helper to recognize 2-op shuffles where the lowest elements of one of the sources are being inserted into the "in-place" other operand, this includes "concat_vectors" patterns as can be seen in the Arm shuffle cost changes. This also helped fix a x86 issue with irregular/length-changing SK_InsertSubvector costs - I'm hoping this will help with D107188
This doesn't currently attempt to work with 1-op shuffles that could either be a "widening" shuffle or a self-insertion.
The self-insertion case is tricky, but we currently always match this with the existing SK_PermuteSingleSrc logic.
The widening case will be addressed in a follow up patch that treats the cost as 0.
Masks with a high number of undef elts will still struggle to match optimal subvector widths - its currently bounded by minimum-width possible insertion, whilst some cases would benefit from wider (pow2?) subvectors.
Differential Revision: https://reviews.llvm.org/D107228
This expands the cost model test for min/max to many more types,
including floating point minnum/maxnum and minimum/maximum, and FP16
with and without fullfp16. The old llc run lines are removed, as those
are better tested by CodeGen tests.