Much of foldLoadsRecursive relies on knowing the size of loaded
data, which is not possible for scalable vector types. However,
the logic of combining two small loads into one bigger load does
not apply for vector types so rather than converting the algorithm
to use TypeSize I've simply added an early exit for vectors.
Fixes#59510
Differential Revision: https://reviews.llvm.org/D140106
This patch updates the load insert point of the merged load in AggressiveInstCombine().
This is done to handle the reported test breaks by handling Alias Analysis correctly.
Differential Revision: https://reviews.llvm.org/D137201
This patch is to address the test cases in which the load has to be inserted at a right point. This happens when there is a store b/w the loads.
This patch reverts the loads merge in all cases when stores are present b/w loads and will eventually be replaced with proper fix and test cases.
Differential Revision: https://reviews.llvm.org/D137333
This patch extends the load merge/widen in AggressiveInstCombine() to handle reverse load patterns.
Differential Revision: https://reviews.llvm.org/D135137
The patch simplifies some of the patterns as below
1. (ZExt(L1) << shift1) | (ZExt(L2) << shift2) -> ZExt(L3) << shift1
2. (ZExt(L1) << shift1) | ZExt(L2) -> ZExt(L3)
The pattern is indicative of the fact that the loads are being merged to a wider load and the only use of this pattern is with a wider load. In this case for a non-atomic/non-volatile loads reduce the pattern to a combined load which would improve the cost of inlining, unrolling, vectorization etc.
Fix the error reported on reverse load merge.
Differential Revision: https://reviews.llvm.org/D127392
The patch simplifies some of the patterns as below
1. (ZExt(L1) << shift1) | (ZExt(L2) << shift2) -> ZExt(L3) << shift1
2. (ZExt(L1) << shift1) | ZExt(L2) -> ZExt(L3)
The pattern is indicative of the fact that the loads are being merged to a wider load and the only use of this pattern is with a wider load. In this case for a non-atomic/non-volatile loads reduce the pattern to a combined load which would improve the cost of inlining, unrolling, vectorization etc.
Differential Revision: https://reviews.llvm.org/D127392
The bug reported on the [0] has been fixed.
The issue was we have not checked if the global variables that
represent cttz tables was constant.
There is a new negative test added in negative-lower-table-based-cttz.ll
that represents this.
[0] https://reviews.llvm.org/rGdf868edee561eb973edd85ec9df41c67aa0bff6b
This reverts commit 053841c5624ca7eacd108a26071d8a1cefe1bebd.
We faced a use-after-free after pushing the D113291, since the
foldSqrt() has a call to eraseFromParent(). The function
should be at the end of the main loop that folds the patterns.
This patch fixes that.
This is an alternate to D129155 that uses TTI.haveFastSqrt() to avoid a
potential miscompile for programs with reads of errno. Moving the transform
to AggressiveInstCombine provides access to TTI.
If a sqrt call has "nnan", that implies that the input argument is never
negative because sqrt of {negative number} --> NAN.
If the argument is never negative and the call can be lowered without a
libcall, then we can assume that errno accesses are unchanged after lowering,
so the call can be translated to the LLVM intrinsic (which is expected to
become inline code).
This affects codegen for targets like x86 that have sqrt instructions, but
still have to conservatively assume that a libcall may be needed to set
errno as shown in issue #52620 and issue #56383.
This patch won't solve those examples - we will need to extend this to use
CannotBeOrderedLessThanZero or similar, enhance that analysis for new
operators, and/or deal with llvm.assume too.
Differential Revision: https://reviews.llvm.org/D129167
This adds a fold for aggressive instcombine that converts
smin(smax(fptosi(x))) into a llvm.fptosi.sat, providing that the
saturation constants are correct and the cost of the llvm.fptosi.sat is
lower.
Unfortunately, a llvm.fptosi.sat cannot always be converted back to a
smin/smax/fptosi. The llvm.fptosi.sat intrinsic is more defined that the
original, which produces poison if the original fptosi was out of range.
The llvm.fptosi.sat will saturate any value, so needs to be expanded to
a fptosi(fpmin(fpmax(x))), which can be worse for codegeneration
depending on the target.
So this change thais conditional on the backend reporting that the
llvm.fptosi.sat is cheaper that the original smin+smax+fptost. This is
a change to the way that AggressiveInstrcombine has worked in the past.
Instead of just being a canonicalization pass, that canonicalization can
be dependant on the target in certain specific cases.
Differential Revision: https://reviews.llvm.org/D125755
Now that SimpleLoopUnswitch and other transforms no longer introduce
branch on poison, enable the -branch-on-poison-as-ub option by
default. The practical impact of this is mostly better flag
preservation in SCEV, and some freeze instructions no longer being
necessary.
Differential Revision: https://reviews.llvm.org/D125299
Expand `TruncInstCombine` to handle loops by adding `phi` nodes
to expression graph.
Reviewed by: RKSimon, lebedev.ri
(recommit of fixed f84d732f, reverted by 8ad6d5e after sanitizer breakage)
Differential Revision: https://reviews.llvm.org/D109817
This updates transform test cases for
ADCE
AddDiscriminators
AggressiveInstCombine
AlignmentFromAssumptions
ArgumentPromotion
BDCE
CalledValuePropagation
DCE
Reg2Mem
WholeProgramDevirt
to use the -passes syntax when specifying the pipeline.
Given that LLVM_ENABLE_NEW_PASS_MANAGER isn't set to off (which is
a deprecated feature) the updated test cases already used the new
pass manager, but they were using the legacy syntax when specifying
the passes to run. This patch can be seen as a step toward deprecating
that interface.
This patch also removes some redundant RUN lines. Here I am
referring to test cases that had multiple RUN lines verifying both
the legacy "-passname" syntax and the new "-passes=passname" syntax.
Since we switched the default pass manager to "new PM" both RUN lines
have verified the new PM version of the pass (more or less wasting
time running the same test twice), unless LLVM_ENABLE_NEW_PASS_MANAGER
is set to "off". It is assumed that it is enough to run these tests
with the new pass manager now.
Differential Revision: https://reviews.llvm.org/D108472
Alive2 for `{insert/extract}element`: https://alive2.llvm.org/ce/z/hwy_E-
Actually, no one file of test suite is touched by this change,
which means that is rare pattern not generated by frontend. But
it's worth being in place.
Differential Revision: https://reviews.llvm.org/D109236
Add `udiv` and `urem` instructions to the DAG post-dominated by `trunc`,
allowing TruncInstCombine to reduce bitwidth of expressions containing these
instructions. It is sufficient to require that all truncated bits of both
operands are zeros: https://alive2.llvm.org/ce/z/yiithn
(`urem` case is identical).
Differential Revision: https://reviews.llvm.org/D109515
Add `ashr` instruction to the DAG post-dominated by `trunc`, allowing
`TruncInstCombine` to reduce bitwidth of expressions containing
these instructions.
We should be shifting by less than the target bitwidth.
Also it is sufficient to require that all truncated bits
of the value-to-be-shifted are sign bits (all zeros or ones) and
one sign bit is left untruncated: https://alive2.llvm.org/ce/z/Ajo2__
Part of https://reviews.llvm.org/D107766
Differential Revision: https://reviews.llvm.org/D108355