This is an experimental address space for strided buffers. These buffers
can have structs as elements and
a stride > 1.
These pointers allow the indexed access in units of stride, i.e., they
point at `buffer[index * stride]`.
Thus, we can use the `idxen` modifier for buffer loads.
We assign address space 9 to 192-bit buffer pointers which contain a
128-bit descriptor, a 32-bit offset and a 32-bit index. Essentially,
they are fat buffer pointers with an additional 32-bit index.
These tests rely on SCEV looking recognizing an "or" with no common
bits as an "add". Add the disjoint flag to relevant or instructions
in preparation for switching SCEV to use the flag instead of the
ValueTracking query. The IR with disjoint flag matches what
InstCombine would produce.
If looking for a miscompile revert candidate, look here!
The transform being enabled prefers comparing to a loop invariant
exit value for a secondary IV over using an otherwise dead primary
IV. This increases register pressure (by requiring the exit value
to be live through the loop), but reduces the number of instructions
within the loop by one.
On RISC-V which has a large number of scalar registers, this is
generally a profitable transform. We loose the ability to use a beqz
on what is typically a count down IV, and pay the cost of computing
the exit value on the secondary IV in the loop preheader, but save
an add or sub in the loop body. For anything except an extremely
short running loop, or one with extreme register pressure, this is
profitable. On spec2017, we see a 0.42% geomean improvement in
dynamic icount, with no individual workload regressing by more than
0.25%.
Code size wise, we trade a (possibly compressible) beqz and a (possibly
compressible) addi for a uncompressible beq. We also add instructions
in the preheader. Net result is a slight regression overall, but
neutral or better inside the loop.
Previous versions of this transform had numerous cornercase correctness
bugs. All of them ones I can spot by inspection have been fixed, and I
have run this through all of spec2017, but there may be further issues
lurking. Adding uses to an IV is a fraught thing to do given poison
semantics, so this transform is somewhat inherently risky.
This patch is a reworked version of D134893 by @eop. That patch has
been abandoned since May, so I picked it up, reworked it a bit, and
am landing it.
This patch fixes a miscompile in LSR caused by incorrect inference of
NUW flag for AddRec: we shouldn't infer no-wrap flags based on a
comparison which doesn't fully control the loop exit.
The old algorithm would remove all operands matching %step SCEV when it
intended to only remove a single one. This lead to assert when
SCEVAddExpr was of the form %step + %step and potential miscompiles in
similar cases. Such SCEVs could be created when construction reached
depth thresholds.
Fixes#70348
There are many tests that specify a target triple/CPU flags but no
DataLayout which can lead to IR being generated that has unusual
behaviour. This commit attempts to use the default DataLayout based
on the relevant flags if there is no explicit override on the command
line or in the IR file.
One thing that is not currently possible to differentiate from a missing
datalayout `target datalayout = ""` in the IR file since the current
APIs don't allow detecting this case. If it is considered useful to
support this case (instead of passing "-data-layout=" on the command
line), I can change IR parsers to track whether they have seen such a
directive and change the callback type.
Differential Revision: https://reviews.llvm.org/D141060
PR #66334 tried to renumber slot indexes before register allocation, but
the numbering was still affected by list entries for instructions which
had been erased. Fix this to make the register allocator's live range
length heuristics even less dependent on the history of how instructions
have been added to and removed from SlotIndexes's maps.
%x = shl i64 %w, n
%y = add i64 %x, c
%z = ashr i64 %y, m
The above given instruction triplet is seen many times in the generated
LLVM IR, but SCEV model is not able to compute the SCEV value of AShr
instruction in this case.
This patch models the two cases of the above instruction pattern using
the following expression:
=> sext(add(mul(trunc(w), 2^(n-m)), c >> m))
1) when n = m the expression reduces to sext(add(trunc(w), c >> n))
as n-m=0, and multiplying with 2^0 gives the same result.
2) when n > m the expression works as given above.
It also adds several unittest to verify that SCEV is able to compute
the value.
$ opt sext-add-inreg.ll -passes="print<scalar-evolution>"
Comparing the snippets of the result of SCEV analysis:
* SCEV of ashr before change
----------------------------
%idxprom = ashr exact i64 %sext, 32
--> %idxprom U: [-2147483648,2147483648) S: [-2147483648,2147483648)
Exits: 8 LoopDispositions: { %for.body: Variant }
* SCEV of ashr after change
---------------------------
%idxprom = ashr exact i64 %sext, 32
--> {0,+,1}<nuw><nsw><%for.body> U: [0,9) S: [0,9)
Exits: 8 LoopDispositions: { %for.body: Computable }
LoopDisposition of the given SCEV was LoopVariant before, after adding
the new way to model the instruction, the LoopDisposition becomes
LoopComputable as it is able to compute the SCEV of the instruction.
Differential Revision: https://reviews.llvm.org/D152278
Refresh of the generic scheduling model to use A510 instead of A55.
Main benefits are to the little core, and introducing SVE scheduling information.
Changes tested on various OoO cores, no performance degradation is seen.
Differential Revision: https://reviews.llvm.org/D156799
In CollectLoopInvariantFixupsAndFormulae(), LSR looks at users
outside the loop. E.g. if we have an addrec based on %base, and
%base is also used outside the loop, then we have to keep it in a
register anyway, which may make it more profitable to use
%base + %idx style addressing.
This reasoning doesn't hold up when the base is a constant, because
the constant can be rematerialized. The lsr-memcpy.ll test regressed
when enabling opaque pointers, because inttoptr (i64 6442450944 to ptr)
now also has a use outside the loop (previously it didn't due to a
pointer type difference), and that extra "use" results in worse use
of addressing modes in the loop. However, the use outside the loop
actually gets rematerialized, so the alleged register saving does
not occur.
The same reasoning also applies to other types of constants, such
as global variable references.
Differential Revision: https://reviews.llvm.org/D155073
Instead of checking the pointer type, check the element type of
the GEP.
Previously we ended up reusing GEP increments that were not in
expanded form, thus not respecting LSRs choice of representation.
The change in 2011-10-06-ReusePhi.ll recovers a regression that
appeared when converting that test to opaque pointers.
Changes in various Thumb tests now compute the step outside the
loop instead of using add.w inside the loop, which is LSR's
preferred representation for this target.
When normalizing a SCEV expression during expansion, there should be
no need for it to be invertible, as it will only be used for code
generation. This fixes a crash after 7f5b15ad150e.
Fixes https://github.com/llvm/llvm-project/issues/63678.
Move the logic added in 3a57152d85e1 to normalizeForPostIncUse to catch
additional un-invertable cases. This fixes another mis-compile pointed
out by @peixin in D153004.
Using "eabi" for aarch64 targets is a common mistake and warned by Clang Driver.
We want to avoid it elsewhere as well. Just use the common "aarch64" without
other triple components.
getExpr is missing a check to make sure the result is invertible.
This can lead to incorrect results, so return nullptr in those cases
like in other places in IVUsers.
Fixes#62660.
Reviewed By: qcolombet
Differential Revision: https://reviews.llvm.org/D153202
This reverts the revert commit 1797ab36efc9c90c921cd725831f8c3f6a7125a2.
The recommitted version now checks the PostIncLoopSets for all fixups
and returns nullptr if the result doesn't match for all fixups.
This reverts commit abfeda5af329b5889db709ff74506e20e0b569e9.
and fe19036e1266d2a90b44725c82b898134906e4c3.
The added assertion triggers during clang bootstrap builds. Revert while
I investigate.
GenerateTruncates at the moment creates extends/truncates for post-inc
uses of normalized expressions. For example, if an add rec of the form
{1,+,-1} is used outside the loop, the normalized form will use {1,+,-1}
instead of {0,+,-1}. When naively sign-extending the normalized
expression, it will get extended incorrectly to {1,+,-1} for the wider
type, if the backedge-taken count of the loop is 1.
To address this, the patch updates GenerateTruncates to check if the
LSRUse contains any fixups with PostIncLoops. If that's the case, first
de-normalize the expression, then perform the extend/truncate, then
normalize again.
There may be other places where similar checks are needed and the helper
can be generalized for those cases. I'd not be surprised if other subtle
mis-compiles are caused by this.
Fixes#38847.
Fixes#58039.
Fixes#62852.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D153004
This is an attempt to reland D42600 and enabling this optimisation by default.
This also resolves the issue pointed out in the context of PGO build.
Differential Revision: https://reviews.llvm.org/D42600
This is a follow-up to b71edfaa4ec3c998aadb35255ce2f60bba2940b0
since I forgot the lit.local.cfg files in that one.
Reformatting is done with `black`.
If you end up having problems merging this commit because you
have made changes to a python file, the best way to handle that
is to run git checkout --ours <yourfile> and then reformat it
with black.
If you run into any problems, post to discourse about it and
we will try to help.
RFC Thread below:
https://discourse.llvm.org/t/rfc-document-and-standardize-python-code-style
Reviewed By: barannikov88, kwk
Differential Revision: https://reviews.llvm.org/D150762
This reverts commit 1ddfd1c8186735c62b642df05c505dc4907ffac4.
The original commit causes a Chrome build assertion failure with
ThinLTO: https://crbug.com/1443635