30532 Commits

Author SHA1 Message Date
Vitaly Buka
fc201d6133
Revert "[InstCombine] Support gep nuw in icmp folds" (#118698)
Reverts llvm/llvm-project#118472

Breaks profile tests on i386
https://lab.llvm.org/buildbot/#/builders/66/builds/7009
2024-12-04 15:07:27 -08:00
Pedro Lobo
0d1e762da7
[InstSimplify] Refine abs(min/undef, true) to poison (#118669)
Calls to `@llvm.abs(undef, i1 true)` and `@llvm.abs(INT_MIN, i1 true)`
can be optimized to `poison` instead of `undef`.

[Alive2](https://alive2.llvm.org/ce/z/Hg-2ug)
2024-12-04 18:41:05 +00:00
Simon Pilgrim
85d15bd130
[TTI][X86] getMemoryOpCost - reduced costs when loading uniform values due to value reuse (#118642)
Similar to what we do for broadcast shuffles, when legalising load costs, if the value is known to be uniform, then we will only load a single vector and reuse this across the split legalised registers.

Fixes #111126
2024-12-04 16:36:00 +00:00
Nikita Popov
4a7abfe0a7 [InstCombine] Preserve nuw in OptimizePointerDifference
If both the geps and the subs are nuw the new sub is also nuw.

Proof: https://alive2.llvm.org/ce/z/mM8UvF
2024-12-04 16:58:35 +01:00
Nikita Popov
a608607fd7
[ConstraintElim] Add support for decomposing gep nuw (#118639)
ConstraintElimination currently only supports decomposing gep nusw with
non-negative indices (with "non-negative" possibly being enforced via
pre-condition).

Add support for gep nuw, which directly gives us the necessary
guarantees for the decomposition.
2024-12-04 16:27:31 +01:00
Florian Hahn
7b6e0d9fc3
[Matrix] Use DenseMap for ShapeMap instead of ValueMap. (#118282)
ValueMap automatically updates entries with the new value if they have
been RAUW. This can lead to instructions that are expected to not have
shape info to be added to the map (e.g. shufflevector as in the added
test case).

This leads to incorrect results. Originally it was used for transpose
optimizations, but they now all use updateShapeAndReplaceAllUsesWith,
which takes care of updating the shape info as needed.

This fixes a crash in the newly added test cases.

PR: https://github.com/llvm/llvm-project/pull/118282
2024-12-04 14:51:31 +00:00
Paul Walker
a88653a2cd
[LLVM][IR] When evaluating GEP offsets don't assume ConstantInt is a scalar. (#117162) 2024-12-04 12:45:30 +00:00
Simon Pilgrim
140df02aa2 [SLP][X86] Update test coverage for #111126
I'd copied the test case from #118016 instead of the original #111126 test case
2024-12-04 12:28:55 +00:00
Nikita Popov
75af62839b [ConstraintElim] Add tests for gep nuw (NFC) 2024-12-04 13:17:02 +01:00
John Brawn
ecbe4d1e36
[IR] Allow fast math flags on fptrunc and fpext (#115894)
This consists of:
 * Make these instructions part of FPMathOperator.
* Adjust bitcode/ir readers/writers to expect fast math flags on these
instructions.
 * Make IRBuilder set the fast math flags on these instructions.
 * Update langref and release notes.
* Update a bunch of tests. Some of these are due to InstCombineCasts
incorrectly adding fast math flags to fptrunc, which will be fixed in a
later patch.
2024-12-04 10:53:04 +00:00
Simon Pilgrim
2202f0e093 [SLP][X86] Add test coverage for #111126
This needs to be expanded to a wider range of tests but for now just focus on #111126
2024-12-04 10:03:43 +00:00
Antonio Frighetto
f68b0e3699 [AggressiveInstCombine] Use APInt and avoid truncation when folding loads
A miscompilation issue has been addressed with improved handling.

Fixes: https://github.com/llvm/llvm-project/issues/118467.
2024-12-04 10:20:14 +01:00
ronryvchin
ff281f7d37
[PGO] Add option to always instrumenting loop entries (#116789)
This patch extends the PGO infrastructure with an option to prefer the
instrumentation of loop entry blocks.
This option is a generalization of
19fb5b467b,
and helps to cover cases where the loop exit is never executed.
An example where this can occur are event handling loops.

Note that change does NOT change the default behavior.
2024-12-04 07:56:46 +01:00
Owen Anderson
14a259f85b
GlobalOpt: Use the correct address space when creating a "*.init" global. (#118562) 2024-12-04 14:01:16 +13:00
Florian Hahn
45ff28746f
[ConstraintSystem] Fix signed overflow in negate.
Use AddOverflow for potentially overflowing addition to fixed signed
integer overflow.

Compile-time impact is in the noise
https://llvm-compile-time-tracker.com/compare.php?from=bfb26202e05ee2932b4368b5fca607df01e8247f&to=195b0707148b567c674235e59712458e7ce1bb0e&stat=instructions:u
2024-12-03 21:06:36 +00:00
Lee Wei
9bf6365237
[llvm] Remove br i1 undef from some regression tests [NFC] (#118419)
This PR removes tests with `br i1 undef` under
`llvm/tests/Transforms/ObjCARC, Reassociate, SCCP, SLPVectorizer...`.
After this PR, I'll continue to fix tests under `llvm/tests/CodeGen`,
which has more UB tests than `llvm/tests/Transforms`.
2024-12-03 20:54:36 +00:00
Igor Kirillov
af31aa4455 [LV] Pre-commit tests for fixed width VF fully unrolled loop cost model change 2024-12-03 16:47:52 +00:00
Dominik Steenken
866b9f43a0
[SystemZ] Add realistic cost estimates for vector reduction intrinsics (#118319)
This PR adds more realistic cost estimates for these reduction
intrinsics

- `llvm.vector.reduce.umax`
- `llvm.vector.reduce.umin`
- `llvm.vector.reduce.smax`
- `llvm.vector.reduce.smin`
- `llvm.vector.reduce.fadd`
- `llvm.vector.reduce.fmul`
- `llvm.vector.reduce.fmax`
- `llvm.vector.reduce.fmin`
- `llvm.vector.reduce.fmaximum`
- `llvm.vector.reduce.fminimum`
- `llvm.vector.reduce.mul
`
The pre-existing cost estimates for `llvm.vector.reduce.add` are moved
to `getArithmeticReductionCosts` to reduce complexity in
`getVectorIntrinsicInstrCost` and enable other passes, like the SLP
vectorizer, to benefit from these updated calculations.

These are not expected to provide noticable performance improvements and
are rather provided for the sake of completeness and correctness. This
PR is in draft mode pending benchmark confirmation of this.

This also provides and/or updates cost tests for all of these
intrinsics.

This PR was co-authored by me and @JonPsson1 .
2024-12-03 17:08:51 +01:00
Nikita Popov
10223c72a9 [ConstraintElim] Use nusw flag for GEP decomposition
Check for nusw instead of inbounds when decomposing GEPs.

In this particular case, we can also look through multiple nusw
flags, because we will ultimately be working in the unsigned
constraint system.
2024-12-03 15:56:29 +01:00
Nikita Popov
f33536468b
[InstCombine] Support gep nuw in icmp folds (#118472)
Unsigned icmp of gep nuw folds to unsigned icmp of offsets. Unsigned
icmp of gep nusw nuw folds to unsigned samesign icmp of offsets.

Proofs: https://alive2.llvm.org/ce/z/VEwQY8
2024-12-03 14:28:56 +01:00
David Sherwood
8075445613
[LoopVectorize] Add tests for dereferenceable loads in more loops (#118470)
* Adds tests for strided accesses.
* Adds tests for reverse loops.

As part of this I've moved one of the negative tests from
load-deref-pred-align.ll into a new file
(load-deref-pred-neg-off.ll) because the pointer type had a
size of 16 bits and I realised it's probably not sensible for
allocas that are >16 bits in size!
2024-12-03 12:41:30 +00:00
Nikita Popov
bdc6faf775 [InstCombine] Support nusw in icmp of two geps with same base
Proof: https://alive2.llvm.org/ce/z/BYNQ7s
2024-12-03 11:51:14 +01:00
Nikita Popov
9c5a84b394 [InstCombine] Support nusw in icmp of gep with base
Proof: https://alive2.llvm.org/ce/z/omnQXt
2024-12-03 11:51:14 +01:00
Ramkumar Ramachandra
bfb26202e0
LV/test: clean up a test and regen with UTC (#118394) 2024-12-03 09:46:19 +00:00
Nikita Popov
5b0f4f2cb0
[BasicAA] Treat returns_twice functions as clobbering unescaped objects (#117902)
Effectively this models all the accesses that occur between the first
and second return as happening at the point of the call.

Fixes https://github.com/llvm/llvm-project/issues/116668.
2024-12-03 09:55:12 +01:00
Antonio Frighetto
1d6ab189be [MemCpyOpt] Drop dead memmove calls on memset'd source data
When a memmove happens to clobber source data, and such data have
been previously memset'd, the memmove may be redundant.
2024-12-03 09:50:57 +01:00
Antonio Frighetto
e30d304d72 [MemCpyOpt] Introduce test for PR101930 (NFC) 2024-12-03 09:50:56 +01:00
Yingwei Zheng
c1ad064dd3
[InstCombine] Fold icmp spred (and X, highmask), C1 into icmp spred X, C2 (#118197)
Alive2: https://alive2.llvm.org/ce/z/Ffg64g
Closes https://github.com/llvm/llvm-project/issues/104772.
2024-12-03 16:19:12 +08:00
Rajat Bajpai
de415fbb45
[InstCombine][FP] Fix nnan preservation for transform fcmp + sel => fmax/fmin (#117977)
Preserve `nnan` constraint only if present on both `fcmp` and `select`.

Alive2: https://alive2.llvm.org/ce/z/ZNDjzt
2024-12-03 14:01:36 +08:00
Yingwei Zheng
295d6b18f7
[InstCombine] Fold (X * (Y << K)) u>> K -> X * Y when highbits are not demanded (#111151)
Alive2: https://alive2.llvm.org/ce/z/Z7QgjH
2024-12-03 12:04:04 +08:00
Han-Kuan Chen
f71ea4bc1b
[SLP][REVEC] reorderNodeWithReuses should not be called if all users of a TreeEntry are ShuffleVectorInst. (#118260) 2024-12-03 09:04:04 +08:00
Matt Arsenault
681bd84563
AMDGPU: Add baseline test for lane index simplification (#117962) 2024-12-02 14:50:02 -05:00
Florian Hahn
21d27b3aab
[LoopUnroll] Add tests for loop unrolling on Apple platforms.
Add first set of tests where runtime unrolling can be highly beneficial
on Apple Silicon CPUs.
2024-12-02 15:48:48 +00:00
Yingwei Zheng
16ec534989
[ValueTracking] Handle and/or of conditions in computeKnownFPClassFromContext (#118257)
Fix a typo introduced by
https://github.com/llvm/llvm-project/pull/83161.
This patch also supports decomposition of and/or expressions in
`computeKnownFPClassFromContext`.
Compile-time improvement:
http://llvm-compile-time-tracker.com/compare.php?from=688bb432c4b618de69a1d0e7807077a22f15762a&to=07493fc354b686f0aca79d6f817091a757bd7cd5&stat=instructions:u
2024-12-02 21:00:55 +08:00
Florian Hahn
f5bc6b47e8
[PhaseOrdering] Remove -enable-matrix flag from sub-xor.ll test.
The test does not use matrix intrinsics, so does not need enable-matrix.
2024-12-02 11:43:43 +00:00
Nikita Popov
7a7a426188 [LVI] Fix insertelement of constexpr
Bail out when evaluating an insertelement of a constant expression.
Unlike other ValueLattice kinds, these don't have implicit splat
semantics and we end up with type mismatches. If we actually wanted
to handle these, we should actually evaluate the insertion via
constant folding. I'm not bothering with that, as these should
get constant folded on construction already.
2024-12-02 10:21:09 +01:00
Nikita Popov
8201926ec0
[InstSimplify] Generalize simplification of icmps with monotonic operands (#69471)
InstSimplify currently folds patterns like `(x | y) uge x` and `(x & y)
ule x` to true. However, it cannot handle combinations of such
situations, such as `(x | y) uge (x & z)` etc.

To support this, recursively collect operands of monotonic instructions
(that preserve either a greater-or-equal or less-or-equal relationship)
and then check whether any of them match.

Fixes https://github.com/llvm/llvm-project/issues/69333.
2024-12-02 09:53:10 +01:00
Nikita Popov
7bbc049688
[InstCombine] Consolidate another fold into select value equivalence (#117746)
We had a separate fold that handled just the trivial case where we're
replacing exactly the argument of the select. Handle this in select
value equivalence by relaxing the infinite loop protection to allow a
replacement of a non-constant with a constant.

This also fixes https://github.com/llvm/llvm-project/issues/113301, as
the separate fold did not handle undef values correctly.
2024-12-02 09:45:39 +01:00
Veera
979a0356d4
[InstCombine] Fold X Pred C2 ? X BOp C1 : C2 BOp C1 to min/max(X, C2) BOp C1 (#116888)
Fixes #82414.

General Proof: https://alive2.llvm.org/ce/z/ERjNs4 
Proof for Tests: https://alive2.llvm.org/ce/z/K-934G

This PR transforms `select` instructions of the form `select (Cmp X C1)
(BOp X C2) C3` to `BOp (min/max X C1) C2` iff `C3 == BOp C1 C2`.

This helps in eliminating a noop loop in
https://github.com/rust-lang/rust/issues/123845 but does not improve
optimizations.
2024-12-02 09:33:45 +01:00
Yingwei Zheng
1a3eace82a
[InstCombine] Fold umax(X, C) + -C into usub.sat(X, C) (#118195)
Alive2: https://alive2.llvm.org/ce/z/oSWe5S
Closes https://github.com/llvm/llvm-project/issues/118155
2024-12-01 23:29:40 +08:00
Yingwei Zheng
f7ef0721d6
[SCEV] Do not allow refinement in the rewriting of BEValue (#117152)
See the following case:
```
; bin/opt -passes="print<scalar-evolution>" test.ll --disable-output
define i32 @widget() {
b:
  br label %b1

b1:                                              ; preds = %b5, %b
  %phi = phi i32 [ 0, %b ], [ %udiv6, %b5 ]
  %phi2 = phi i32 [ 1, %b ], [ %add, %b5 ]
  %icmp = icmp eq i32 %phi, 0
  br i1 %icmp, label %b3, label %b8

b3:                                              ; preds = %b1
  %udiv = udiv i32 10, %phi2
  %urem = urem i32 %udiv, 10
  %icmp4 = icmp eq i32 %urem, 0
  br i1 %icmp4, label %b7, label %b5

b5:                                              ; preds = %b3
  %udiv6 = udiv i32 %phi2, 0
  %add = add i32 %phi2, 1
  br label %b1

b7:                                              ; preds = %b3
  ret i32 5

b8:                                              ; preds = %b1
  ret i32 7
}
```
```
%phi2 = phi i32 [ 1, %b ], [ %add, %b5 ] -->  {1,+,1}<nuw><nsw><%b1>
%udiv6 = udiv i32 %phi2, 0 --> ({1,+,1}<nuw><nsw><%b1> /u 0)
%phi = phi i32 [ 0, %b ], [ %udiv6, %b5 ] --> ({0,+,1}<nuw><nsw><%b1> /u 0)
```
`ScalarEvolution::createAddRecFromPHI` gives a wrong SCEV result for
`%phi`:

d7d6fb1804/llvm/lib/Analysis/ScalarEvolution.cpp (L5926-L5950)
It converts `phi(0, ({1,+,1}<nuw><nsw><%b1> /u 0))` into `phi(0 / 0,
({1,+,1}<nuw><nsw><%b1> /u 0))`. Then it simplifies the expr into
`{0,+,1}<nuw><nsw><%b1> /u 0`.

As we did in
acd700a24b,
this patch disallows udiv simplification if we cannot prove that the
denominator is a well-defined non-zero value.

Fixes https://github.com/llvm/llvm-project/issues/117133.
2024-12-01 20:11:09 +08:00
Simon Pilgrim
94df95de6b
[TTI][X86] getShuffleCosts - for SK_PermuteTwoSrc, if the masks are known to be "inlane" no need to scale the costs by worst-case legalization (#117999)
SK_PermuteTwoSrc legalization has to assume any of the legalised source registers could be referenced in split shuffles, but if we already know that each 128-bit lane only references elements from the same lane of the source operands, then this scaling won't occur.

Hopefully this can help with #113356 without us having to get full processShuffleMasks canonicalization finished first.
2024-12-01 12:01:47 +00:00
Yingwei Zheng
6568ceb9fa
[CodeGenPrepare] Drop nsw flags in optimizeLoadExt (#118180)
Alive2: https://alive2.llvm.org/ce/z/pMcD7q
Closes https://github.com/llvm/llvm-project/issues/118172.
2024-12-01 11:25:31 +08:00
Jonas Paulsson
0ad6be1927
[SLPVectorizer, TargetTransformInfo, SystemZ] Improve SLP getGatherCost(). (#112491)
As vector element loads are free on SystemZ, this patch improves the cost
computation in getGatherCost() to reflect this.

getScalarizationOverhead() gets an optional parameter which can hold the actual
Values so that they in turn can be passed (by BasicTTIImpl) to
getVectorInstrCost().

SystemZTTIImpl::getVectorInstrCost() will now recognize a LoadInst and
typically return a 0 cost for it, with some exceptions.
2024-11-29 21:19:45 +01:00
Marina Taylor
8fb748b4a7
[Inliner] Don't count a call penalty for foldable __memcpy_chk and similar (#117876)
When the size is an appropriate constant, __memcpy_chk will turn into a
memcpy that gets folded away by InstCombine. Therefore this patch avoids
counting these as calls for purposes of inlining costs.

This is only really relevant on platforms whose headers redirect memcpy
to __memcpy_chk (such as Darwin). On platforms that use intrinsics,
memcpy and similar functions are already exempt from call penalties.
2024-11-29 18:28:39 +00:00
David Green
fe04290482
[AArch64] Change the default vscale-for-tuning to 1. (#117174)
Most AArch64 cpus outside of Neoverse V1 (256) and A64FX (512) have an
SVE vector length of 128, and in environments like Android (where no
mcpu option is common) we would expect all cpus to match. This patch
changes the default vector length to 128 with -mcpu=generic, to match
the most common case.
2024-11-29 17:41:05 +00:00
Alexey Bataev
f4974e0931 [SLP] Add a check for poison value in AShrChecker
Need to check if the value in AShrChecker is a poison before casting it
to instruction to avoid compiler crash

Fixes #118030
2024-11-29 06:51:19 -08:00
David Green
6f4b4f41ca [AArch64] Remove LoopVectorizer/AArch64/scatter-cost.ll test. NFC
This test checks the costs, not vectorization, so is better placed in the
existing gather/scatter cost modelling tests. An extra neoverse-v2 check line
has been added for both gathers and scatters.
2024-11-29 14:38:36 +00:00
Ramkumar Ramachandra
4e8eabd93e
DSE: pre-commit tests for scalable vectors (#110669)
As AliasAnalysis now has support for scalable sizes, add tests to
DeadStoreElimination covering the scalable vectors case, in preparation
to extend it.
2024-11-28 16:16:16 +00:00
Florian Hahn
12cefcc7ec
[Matrix] Skip already fused instructions before trying to fuse multiply.
lowerDotProduct called above may already lower a matrix multiply and
mark it as procssed by adding it to FusedInsts. Don't try to process it
again in LowerMatrixMultiplyFused by checking if FusedInsts.

Without this change, we trigger an assertion when trying to erase the
same original matrix multiply twice.
2024-11-28 16:11:40 +00:00