1. Refactor for costs of sqrt/fabs
2. Add half type support for the cost model of sqrt/fabs
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D132908
Need to include the cost of the initial insertelement to the cost of the
broadcasts. Also, need to adjust the cost of the gather/buildvector if
the element is inserted into poison/undef vector.
Differential Revision: https://reviews.llvm.org/D140498
The patch also adds expandVPCTLZ and expandVPCTTZ to expand vp.ctlz/cttz nodes
and the cost model of vp.ctlz/cttz.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D140370
The current code did not properly handled duplicated FoldCacheUser ID
entries when overwriting an existing entry in the FoldCache.
This triggered verification failures reported by @uabelho and #59721.
The patch fixes that by removing stale IDs when overwriting an existing
entry in the cache.
Fixes#59721.
Uniformity analysis is a generalization of divergence analysis to
include irreducible control flow:
1. The proposed spec presents a notion of "maximal convergence" that
captures the existing convention of converging threads at the
headers of natual loops.
2. Maximal convergence is then extended to irreducible cycles. The
identity of irreducible cycles is determined by the choices made
in a depth-first traversal of the control flow graph. Uniformity
analysis uses criteria that depend only on closed paths and not
cycles, to determine maximal convergence. This makes it a
conservative analysis that is independent of the effect of DFS on
CycleInfo.
3. The analysis is implemented as a template that can be
instantiated for both LLVM IR and Machine IR.
Validation:
- passes existing tests for divergence analysis
- passes new tests with irreducible control flow
- passes equivalent tests in MIR and GMIR
Based on concepts originally outlined by
Nicolai Haehnle <nicolai.haehnle@amd.com>
With contributions from Ruiling Song <ruiling.song@amd.com> and
Jay Foad <jay.foad@amd.com>.
Support for GMIR and lit tests for GMIR/MIR added by
Yashwant Singh <yashwant.singh@amd.com>.
Differential Revision: https://reviews.llvm.org/D130746
When converting this test to opaque pointers (and dropping bitcast),
we get improved memory checks. Per fhahn:
> It looks like the difference is due to the logic that determines
> pointer strides in LAA not handling bitcasts. Without the
> bitcasts, the logic now triggers successfully.
Differential Revision: https://reviews.llvm.org/D140204
This patch updates propgatesPoison to take a Use as argument and
propagatesPoison now returns true if the passed in operand causes the
user to yield poison if the operand is poison
This allows propagating poison if the condition of a select is poison.
This helps improve results for programUndefinedIfUndefOrPoison.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D111643
This operand bundle on an assume informs alias analysis that the
arguments point to regions of memory that were allocated separately
(i.e. different heap allocations, different allocas, or different
globals).
As a safety measure, we leave the analysis flag-disabled by default.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D136514
Opaque ptr types have a size in bits of 0. The legalised type is an i64 or
vector of i64s, which do have a size. Because of this difference in size, target
hook getMemoryOpCost modelled stores of ptr types as extending/truncating
load/stores. Now we just check for opaque ptr types and return the legalised
cost. This makes stores of pointers cheaper, and as a result we now SLP
vectorise the changed test case.
Differential Revision: https://reviews.llvm.org/D140193
The patch also adds expandVPCTPOP in TargetLowering to expand VP_CTPOP nodes.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D139920
As the added codegen test coverage shows,
there isn't that much difference between AVX512DQI and
baseline AVX512F codegen, DQI added `vpmovm2d`/`vpmovd2m`,
but with just the Foundation we can use `vpternlogd`/`vptestmd`
to do the same.
BasicAA currently has an optional dependency on the PhiValues
analysis. However, at least with our current pipeline setup, we
never actually make use of it. It's possible that this used to work
with the legacy pass manager, but I'm not sure of that either.
Given that this analysis has not actually been in use for a long
time, and nobody noticed or complained, I think we should drop
support for it and focus on one code path. It is worth noting that
analysis quality for the non-PhiValues case has significantly
improved in the meantime.
If we really wanted to make use of PhiValues, the right way would
probably be to pass it in via AAQI in places we want to use it,
rather than using an optional pass manager dependency (which are
an unpredictable PITA and should really only ever be used for
analyses that are only preserved and not used).
Differential Revision: https://reviews.llvm.org/D139719
The CFL Steens/Anders alias analysis passes are not enabled by
default, and to the best of my knowledge have no pathway towards
ever being enabled by default. The last significant interest in
these passes seems to date back to 2016. Given the little
maintenance these have seen in recent times, I also have very
little confidence in the correctness of these passes. I don't
think we should keep these in-tree.
Differential Revision: https://reviews.llvm.org/D139703
When creating SCEV expressions for ZExt, there's quite a bit of
reasoning done and in many places the reasoning in turn will try to
create new SCEVs for other ZExts.
This can have a huge compile-time impact. The attached test from #58402
takes an excessive amount of compile time; without the patch, the test
doesn't complete in 1500+ seconds, but with the patch it completes in 1
second.
To speed up this case, cache created ZExt expressions for given (SCEV, Ty) pairs.
Caching just ZExts is relatively straight-forward, but it might make
sense to extend it to other expressions in the future.
This has a slight positive impact on CTMark:
* O3: -0.03%
* ReleaseThinLTO: -0.03%
* ReleaseLTO-g: 0.00%
https://llvm-compile-time-tracker.com/compare.php?from=bf9de7464946c65f488fe86ea61bfdecb8c654c1&to=5ac0108553992fb3d58bc27b1518e8cf06658a32&stat=instructions:u
The patch also improves compile-time for some internal real-world workloads
where time spent in SCEV goes from ~300 seconds to ~3 seconds.
There are a few cases where computing & caching the result earlier may
return more pessimistic results, but the compile-time savings seem to
outweigh that.
Fixes#58402.
Reviewed By: mkazantsev
Differential Revision: https://reviews.llvm.org/D137505