35448 Commits

Author SHA1 Message Date
Florian Hahn
2ab5c47c87
[VPlan] Don't replace scalarizing recipe with VPWidenCastRecipe.
Don't replace a scalarizing recipe with a VPWidenCastRecipe. This would
introduce wide (vectorizing) recipes when interleaving only.

Fixes https://github.com/llvm/llvm-project/issues/76986
2024-01-04 20:39:44 +00:00
Gabriel Baraldi
a87fa7f0ca
[InstCombine] Don't throw away noalias/alias scope metadata when inlining memcpys (#74805)
This was found in Julia when we changed some operations from explicit
loads + stores to memcpys. While applying it to both the src and the
dest seems weird, that's what we do for normal TBAA.
2024-01-04 17:04:31 +01:00
Alexey Bataev
79e62315be [SLP]Use revectorized value for extracts from buildvector, being
vectorized.

When trying to reuse the extractelement instruction emitted for the
insertelement instruction, we need to check whether this insertelement
instruction was vectorized. If so, we need to use the vectorized value,
not the original insertelement.
2024-01-04 06:45:26 -08:00
Nikita Popov
62144969bc [ConstraintElim] Add debug output for failed preconditions
Print debug output if a constraint does not get added due to a
failed precondition.
2024-01-04 14:29:07 +01:00
Nikita Popov
f812251875
[ConstraintElim] Use SCEV to check for multiples (#76925)
When adding constraints for induction variables, if the step is not one,
we need to make sure that (end-start) is a multiple of step, otherwise
we might step over the end value.

Currently this only supports one specific pattern for pointers, where
the end is a gep of the start with an appropriate offset.

Generalize this by using SCEV to check for multiples, which also makes
this work for integer IVs.
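
The stepping-over hazard described above can be illustrated with a small simulation. This is only a sketch in Python, not the SCEV-based check from the patch; `reaches_end` and `simulate` are hypothetical helper names, and the model assumes a positive step and `start <= end`.

```python
def reaches_end(start: int, end: int, step: int) -> bool:
    """True if an IV starting at `start` and advancing by `step` lands on `end`."""
    assert step > 0 and start <= end
    # The IV hits `end` exactly iff (end - start) is a multiple of step.
    return (end - start) % step == 0


def simulate(start: int, end: int, step: int, limit: int = 1000) -> bool:
    """Walk the IV explicitly and report whether it ever equals `end`."""
    iv = start
    for _ in range(limit):
        if iv == end:
            return True
        if iv > end:
            return False  # stepped over the end value
        iv += step
    return False
```

With `start = 0`, `end = 10`, `step = 4` the IV takes the values 0, 4, 8, 12 and never equals 10, which is the case the multiple check is meant to rule out.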
2024-01-04 14:04:15 +01:00
Jannik Silvanus
7954c57124
[IR] Fix GEP offset computations for vector GEPs (#75448)
Vectors are always bit-packed and don't respect the elements' alignment
requirements. This is different from arrays. This means offsets of
vector GEPs need to be computed differently than offsets of array GEPs.

This PR fixes many places that rely on an incorrect pattern
that always uses `DL.getTypeAllocSize(GTI.getIndexedType())`.
We replace these with uses of `GTI.getSequentialElementStride(DL)`,
which is a new helper function added in this PR.

This changes behavior for GEPs into vectors with element types for which
the (bit) size and alloc size differ. This includes two cases:

* Types with a bit size that is not a multiple of a byte, e.g. i1.
  GEPs into such vectors are questionable to begin with, as some elements
  are not even addressable.
* Overaligned types, e.g. i16 with 32-bit alignment.

Existing tests are unaffected, but a miscompilation of a new test is fixed.
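
The difference between the two strides can be sketched with a simplified model. The Python below is an illustration only: `alloc_size_bytes` and `vector_element_stride_bytes` are hypothetical names, and the model assumes the alloc size is the store size rounded up to the ABI alignment while vector elements are packed at their bit size.

```python
def alloc_size_bytes(bits: int, abi_align_bytes: int) -> int:
    """Array element stride: store size rounded up to the ABI alignment."""
    store_size = (bits + 7) // 8
    # Round store_size up to the next multiple of the alignment.
    return -(-store_size // abi_align_bytes) * abi_align_bytes


def vector_element_stride_bytes(bits: int):
    """Bit-packed vector stride in bytes, or None if not byte-addressable."""
    return bits // 8 if bits % 8 == 0 else None
```

For an i16 with 32-bit alignment the array stride in this model is 4 bytes while the vector stride is 2 bytes; for i1 the vector stride is not a whole number of bytes at all.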

---------

Co-authored-by: Nikita Popov <github@npopov.com>
2024-01-04 10:08:21 +01:00
Nilanjana Basu
cd28da390f
[LV] Change loops' interleave count computation (#73766)
A set of microbenchmarks in llvm-test-suite (https://github.com/llvm/llvm-test-suite/pull/56), when tested on an AArch64 platform, demonstrates that loop interleaving is beneficial when the vector loop runs at least twice or when the epilogue loop trip count (TC) is minimal. Therefore, we choose the interleave count (IC) between TC/VF and TC/(2*VF) (VF = vectorization factor) such that the remainder TC for the epilogue loop is minimal; if both candidates leave the same remainder, the larger IC is chosen.

The initial tests for this change were submitted in PRs:
https://github.com/llvm/llvm-project/pull/70272 and https://github.com/llvm/llvm-project/pull/74689.
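
The selection rule above can be sketched as follows. This is a hedged Python illustration, not the loop vectorizer's code: `pick_interleave_count`, the `max_ic` cap, and the rounding of each candidate down to a power of two are assumptions made for the example.

```python
def bit_floor(n: int) -> int:
    """Largest power of two <= n (0 for n <= 0)."""
    return 1 << (n.bit_length() - 1) if n > 0 else 0


def pick_interleave_count(tc: int, vf: int, max_ic: int) -> int:
    # Candidate that lets the vector loop run at least twice.
    ic_two_iters = max(1, bit_floor(min(tc // (2 * vf), max_ic)))
    # Candidate that lets the vector loop run at least once.
    ic_one_iter = max(1, bit_floor(min(tc // vf, max_ic)))
    # Scalar iterations left for the epilogue under each candidate.
    rem_two = tc % (vf * ic_two_iters)
    rem_one = tc % (vf * ic_one_iter)
    # Prefer the smaller remainder; on a tie take the larger IC
    # (ic_one_iter is never smaller than ic_two_iters).
    return ic_one_iter if rem_one <= rem_two else ic_two_iters
```

For example, with TC = 24 and VF = 4 the candidates in this model are 2 and 4; an IC of 2 leaves no epilogue iterations while 4 leaves 8, so 2 is chosen.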
2024-01-04 12:45:22 +05:30
Yingwei Zheng
0ce193708c
[InstCombine] Refactor folding of commutative binops over select/phi/minmax (#76692)
This patch cleans up the duplicate code for folding commutative binops
over `select/phi/minmax`.

Related commits:
+ select support: 88cc35b27e
+ phi support: 8674a023bc
+ minmax support: 624973806c
2024-01-04 15:11:28 +08:00
Florian Hahn
6dda74cc51
[VPlan] Use createSelect in adjustRecipesForReductions (NFCI).
Simplify the code and rename Result->NewExitingVPV as suggested by
@ayalz in https://github.com/llvm/llvm-project/pull/70253.
2024-01-03 20:54:10 +00:00
Alexey Bataev
7c963fde16 [SLP]Use revectorized value for extracts from buildvector, being
vectorized.

If the insertelement instruction is vectorized, and the extractelement
instruction from such insertelement is also vectorized as part of the same
tree, we need to extract from the corresponding vectorized value of the
insertelement rather than from the original insertelement instruction.
2024-01-03 10:38:09 -08:00
Wei Wang
0faf46befa
[coroutines][DPValue] Update DILocation in DPValue for hoisted dbg.declare (#76765)
Follow up #75402 to cover DPValue
2024-01-03 08:55:38 -08:00
Nikita Popov
c17af94b96 [ConstraintElim] Use SmallDenseMap (NFC)
The number of variables in the constraint is usually very small.
Use SmallDenseMap to avoid allocations.
2024-01-03 17:04:04 +01:00
Alexandros Lamprineas
ec7a231b30
[TLI] Use the VFABI demangling when declaring vector variants. (#76753)
When creating a declaration for a vector variant, in order to determine
the argument types we need to consult the VFABI demangler. This will
allow us to add TLI mappings with linear arguments (see #76060).
2024-01-03 14:28:52 +00:00
Quentin Dian
7d81e07271
[SimplifyCFG] When only one case value is missing, replace default with that case (#76669)
When only one possible case value is not listed explicitly, the default
branch handles exactly that value, so we can turn it into a concrete case
and make the default branch unreachable.

```llvm
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

define i64 @src(i64 %0) {
  %2 = urem i64 %0, 4
  switch i64 %2, label %5 [
    i64 1, label %3
    i64 2, label %3
    i64 3, label %4
  ]

3:                                                ; preds = %1, %1
  br label %5

4:                                                ; preds = %1
  br label %5

5:                                                ; preds = %1, %4, %3
  %.0 = phi i64 [ 2, %4 ], [ 1, %3 ], [ 0, %1 ]
  ret i64 %.0
}

define i64 @tgt(i64 %0) {
  %2 = urem i64 %0, 4
  switch i64 %2, label %unreachable [
    i64 0, label %5
    i64 1, label %3
    i64 2, label %3
    i64 3, label %4
  ]

unreachable:                              ; preds = %1
  unreachable

3:                                                ; preds = %1, %1
  br label %5

4:                                                ; preds = %1
  br label %5

5:                                                ; preds = %1, %4, %3
  %.0 = phi i64 [ 2, %4 ], [ 1, %3 ], [ 0, %1 ]
  ret i64 %.0
}
```

Alive2: https://alive2.llvm.org/ce/z/Y-PGXv

After transformation to a lookup table, I believe `tgt` is better code.

The final instructions are as follows:

```asm
src:                                    # @src
        and     edi, 3
        lea     rax, [rdi - 1]
        cmp     rax, 2
        ja      .LBB0_1
        mov     rax, qword ptr [8*rdi + .Lswitch.table.src-8]
        ret
.LBB0_1:
        xor     eax, eax
        ret
tgt:                                    # @tgt
        and     edi, 3
        mov     rax, qword ptr [8*rdi + .Lswitch.table.tgt]
        ret
.Lswitch.table.src:
        .quad   1                               # 0x1
        .quad   1                               # 0x1
        .quad   2                               # 0x2

.Lswitch.table.tgt:
        .quad   0                               # 0x0
        .quad   1                               # 0x1
        .quad   1                               # 0x1
        .quad   2                               # 0x2
```

Godbolt: https://llvm.godbolt.org/z/borME8znd

Closes #73446.
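
The equivalence of `src` and `tgt` can also be checked with a small model outside the compiler. The Python below is only a sketch that mirrors the two IR functions; `missing_case_values` is a hypothetical helper showing how the single uncovered value is found.

```python
def src(x: int) -> int:
    v = x % 4
    # Explicit cases 1, 2 and 3; the value 0 falls through to the default.
    table = {1: 1, 2: 1, 3: 2}
    return table.get(v, 0)


def tgt(x: int) -> int:
    v = x % 4
    # Every possible value of v has its own case, so no default is needed.
    table = [0, 1, 1, 2]
    return table[v]


def missing_case_values(num_values: int, cases) -> list:
    """Values in [0, num_values) that no explicit case covers."""
    return sorted(set(range(num_values)) - set(cases))
```

Here `missing_case_values(4, [1, 2, 3])` yields a single value, which is the situation in which the default can be replaced by a concrete case.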
2024-01-03 09:22:13 +08:00
Florian Hahn
3c127e83c0
[ConstraintElim] Replace NUWSub decomp with recursive decomp of ops.
The current patterns for NUWSub decompositions do not handle negative
constants correctly at the moment (causing #76713).

Replace the incorrect pattern by more general code that recursively
decomposes the operands and then combines the results. This is already
done in most other places that handle operators like add/mul.

This means we fall back to the general constant handling code (fixes the
mis-compile) while also being able to support reasoning about
decomposable expressions in the SUB operands.

Fixes https://github.com/llvm/llvm-project/issues/76713.
2024-01-02 22:05:57 +00:00
Alexander Shaposhnikov
3af59cfe0b
[ConstraintElim] Add facts implied by llvm.abs (#73189)
Add "abs(x) >=s x" fact.

https://alive2.llvm.org/ce/z/gOrrU3

Test plan: ninja check-all
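
The fact can be checked exhaustively at a small bit width. The Python below is a sketch that assumes wrapping semantics for the minimum value (abs(INT_MIN) == INT_MIN); `wrapping_abs` and `abs_fact_holds` are hypothetical helper names.

```python
BITS = 8


def to_signed(v: int, bits: int = BITS) -> int:
    """Interpret the low `bits` bits of v as a two's complement integer."""
    v &= (1 << bits) - 1
    return v - (1 << bits) if v >> (bits - 1) else v


def wrapping_abs(x: int, bits: int = BITS) -> int:
    """abs with wrapping semantics: abs(INT_MIN) == INT_MIN."""
    return to_signed(-x if x < 0 else x, bits)


def abs_fact_holds(bits: int = BITS) -> bool:
    lo, hi = -(1 << (bits - 1)), 1 << (bits - 1)
    # Signed comparison abs(x) >= x for every representable x.
    return all(wrapping_abs(x, bits) >= x for x in range(lo, hi))
```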
2024-01-02 11:00:03 -08:00
Alexandros Lamprineas
e512df3ecc
[LV] Fix crash when vectorizing function calls with linear args. (#76274)
llvm/lib/IR/Type.cpp:694:
    Assertion `isValidElementType(ElementType) && "Element type of a
    VectorType must be an integer, floating point, or pointer type."'
    failed.
Stack dump:
    llvm::FixedVectorType::get(llvm::Type*, unsigned int)
    llvm::VPWidenCallRecipe::execute(llvm::VPTransformState&)
    llvm::VPBasicBlock::execute(llvm::VPTransformState*)
    llvm::VPRegionBlock::execute(llvm::VPTransformState*)
    llvm::VPlan::execute(llvm::VPTransformState*)
    ...

Happens with function calls of void return type.
2024-01-02 18:14:16 +00:00
Wei Wang
9c978c9418
[coroutines] Use DILocation from new storage for hoisted dbg.declare (#75402)
Make the hoisted dbg.declare inherit the DILocation scope from the new
storage.

After hoisting, the dbg.declare is moved into the block that defines the
new storage. This could create an inconsistency in the debug location
scope hierarchy where the scope of the hoisted dbg.declare (i.e.
DILexicalBlock) is enclosed within the scope of the block (i.e.
DISubprogram). This confuses the LiveDebugValues pass into thinking that
the hoisted dbg.declare is killed in that block, so it does not generate
DBG_VALUE in other blocks. The debugger won't be able to track its value
anymore.

We do this for unoptimized binaries only.
2024-01-02 09:54:16 -08:00
Nikita Popov
9d5b0965c4 [InstCombine] Add helper for commutative icmp folds (NFCI)
Add a common place for icmp folds that should be tried with both
operand orders, so we don't have to repeat this pattern for
individual folds.
2024-01-02 16:16:32 +01:00
Enna1
9943d33997
[SLP][NFC] Fix assertion in vectorizeGEPIndices() (#76660)
The index constraints for the collected getelementptr instructions
should be single **and** non-constant.
2024-01-02 21:32:18 +08:00
Yingwei Zheng
7e405eb722
[FuncAttrs] Don't infer noundef for functions with sanitize_memory attribute (#76691)
MemorySanitizer assumes that the definition and declaration of a
function will be consistent. If we add `noundef` for some definitions,
it will break msan.

Fix buildbot failure caused by #76553.
2024-01-02 06:59:56 +08:00
Florian Hahn
f18536d642
[VPlan] Model address separately. (#72164)
Move vector pointer generation to a separate VPVectorPointerRecipe.
This untangles address computation from the memory recipes further
and is also needed to enable explicit unrolling in VPlan.

https://github.com/llvm/llvm-project/pull/72164
2024-01-01 19:51:15 +00:00
hstk30-hw
4b2f1184fc
Skip transformConstExprCastCall for naked function (#76496)
Fixes https://github.com/llvm/llvm-project/issues/72843.

For a naked function, the assembly might be using an argument, or otherwise
rely on the frame layout, so don't apply transformConstExprCastCall.
2024-01-01 22:52:13 +08:00
Yingwei Zheng
949ec83eaf
[InstCombine] Relax the same-underlying-object constraint for the GEP canonicalization (#76583)
7d7001b2cb canonicalizes `(gep i8, X, (ptrtoint Y) - (ptrtoint X))` into
`bitcast Y` iff `X` and `Y` have the same underlying object.

I find that the result of this pattern is usually used as an operand of
an icmp in some real-world applications. I think we can do the
canonicalization if the result is only used by icmps/ptrtoints.

Alive2: https://alive2.llvm.org/ce/z/j4-HJZ
2024-01-01 00:35:42 +08:00
Florian Hahn
f248d5eed1
[Local] Bring back check for FP types in getExpressionForConstant.
The check makes sure that the result for getZExtValue is guaranteed to
fit into 64 bits.
2023-12-31 13:50:25 +00:00
Florian Hahn
b46638dc76
[Local] Handle undef FP constant in getExpressionForConstant.
Check for FP constant instead of checking for floating point types, as
Undef/Poison values can have floating point types while not being
FPConstants.

This fixes a crash introduced by #66745 (f3b20cb).
2023-12-31 13:42:47 +00:00
Yingwei Zheng
1228becf7d
[FuncAttrs] Deduce noundef attributes for return values (#76553)
This patch deduces `noundef` attributes for return values.
IIUC, a function returns `noundef` values iff all of its return values
are guaranteed not to be `undef` or `poison`.
Definition of `noundef` from LangRef:
```
noundef
This attribute applies to parameters and return values. If the value representation contains any 
undefined or poison bits, the behavior is undefined. Note that this does not refer to padding 
introduced by the type’s storage representation.
```
Alive2: https://alive2.llvm.org/ce/z/g8Eis6

Compile-time impact: http://llvm-compile-time-tracker.com/compare.php?from=30dcc33c4ea3ab50397a7adbe85fe977d4a400bd&to=c5e8738d4bfbf1e97e3f455fded90b791f223d74&stat=instructions:u
|stage1-O3|stage1-ReleaseThinLTO|stage1-ReleaseLTO-g|stage1-O0-g|stage2-O3|stage2-O0-g|stage2-clang|
|--|--|--|--|--|--|--|
|+0.01%|+0.01%|-0.01%|+0.01%|+0.03%|-0.04%|+0.01%|

The motivation of this patch is to reduce the number of `freeze` insts
and enable more optimizations.
2023-12-31 20:44:48 +08:00
Jie Fu
bf312263bf [InstCombine] Remove unused variables in InstCombineSelect.cpp (NFC)
llvm-project/llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp:3810:14: error: unused variable 'LHS' [-Werror,-Wunused-variable]
 3810 |       Value *LHS, *RHS;
      |              ^~~
llvm-project/llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp:3810:20: error: unused variable 'RHS' [-Werror,-Wunused-variable]
 3810 |       Value *LHS, *RHS;
      |
2023-12-31 18:40:26 +08:00
Yingwei Zheng
b23f59a646
[InstCombine] Fold select (A &/| B), T, F if select B, T, F is foldable (#76621)
This patch does the following folds:
```
(select A && B, T, F) -> (select A, (select B, T, F), F)
(select A || B, T, F) -> (select A, T, (select B, T, F))
```
if `(select B, T, F)` can be folded into a value or a canonicalized SPF.
Alive2: https://alive2.llvm.org/ce/z/4Bdrbu
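
Ignoring poison, the two rewrites can be checked over all boolean inputs. The Python below is a minimal sketch; `select` and `folds_agree` are hypothetical names, and `T`/`F` are stand-in values rather than IR operands.

```python
from itertools import product


def select(cond: bool, t, f):
    """Model of `select cond, t, f`."""
    return t if cond else f


def folds_agree() -> bool:
    t, f = "T", "F"
    for a, b in product([False, True], repeat=2):
        inner = select(b, t, f)
        # (select A && B, T, F) -> (select A, (select B, T, F), F)
        if select(a and b, t, f) != select(a, inner, f):
            return False
        # (select A || B, T, F) -> (select A, T, (select B, T, F))
        if select(a or b, t, f) != select(a, t, inner):
            return False
    return True
```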

The original motivation of this patch is to simplify the following
pattern:
```
%.sroa.speculated.i = tail call i64 @llvm.umax.i64(i64 %sub.ptr.div.i.i, i64 1)
%add.i = add i64 %.sroa.speculated.i, %sub.ptr.div.i.i
%cmp7.i = icmp ult i64 %add.i, %sub.ptr.div.i.i
%cmp9.i = icmp ugt i64 %add.i, 1152921504606846975
%or.cond.i = or i1 %cmp7.i, %cmp9.i
%cond.i = select i1 %or.cond.i, i64 1152921504606846975, i64 %add.i
->
%.sroa.speculated.i = tail call i64 @llvm.umax.i64(i64 %sub.ptr.div.i.i, i64 1)
%add.i = add i64 %.sroa.speculated.i, %sub.ptr.div.i.i
%cmp7.i = icmp ult i64 %add.i, %sub.ptr.div.i.i
%max = call i64 @llvm.umax.i64(i64 %add.i, 1152921504606846975)
%cond.i = select i1 %cmp7.i, i64 1152921504606846975, i64 %max
```
The latter form has better codegen for some backends. It is also more
analysis-friendly than the original one.
Godbolt: https://godbolt.org/z/eK6eb5jf1
Alive2: https://alive2.llvm.org/ce/z/VHlxL2

Compile-time impact:
http://llvm-compile-time-tracker.com/compare.php?from=7c71d3996a72b9b024622f23bf556539b961c88c&to=638ce8666fadaca1ab2639a3c2bc52a4a8508f40&stat=instructions:u

|stage1-O3|stage1-ReleaseThinLTO|stage1-ReleaseLTO-g|stage1-O0-g|stage2-O3|stage2-O0-g|stage2-clang|
|--|--|--|--|--|--|--|
|+0.02%|-0.00%|+0.02%|-0.03%|-0.00%|-0.05%|-0.00%|

It is an alternative to #76203 and #76363 because we can simplify
`select (icmp eq/ne a, b), a, b` into `b` or `a`.
Fixes #75784.
Fixes #76043.

Thanks to @XChy for providing additional tests.
Co-authored-by: XChy <xxs_chy@outlook.com>
2023-12-31 18:28:48 +08:00
Yingwei Zheng
568db84247
[InstCombine] Refactor canonicalizeSPF to support decomposed select. NFC.
See also https://github.com/llvm/llvm-project/pull/76621
2023-12-31 16:30:24 +08:00
Mikhail Gudim
7a581c34f1
Reland "[InstCombine] Extend foldICmpBinOp to add-like or" (#76531)
The original PR had a typo which was causing a bug.
2023-12-30 01:55:07 -05:00
Enna1
a51c2f39f5
[SLP] no need to generate extract for in-tree uses for original scalar instruction. (#76077)
Before 77a609b556, we always skipped in-tree uses of the vectorized
scalars in `buildExternalUses()`. That commit handles the case where the
in-tree use is a scalar operand in a vectorized instruction: we need to
generate an extract for these in-tree uses.

In-tree uses that remain scalar in vectorized instructions fall into 3 cases:
- The pointer operand of vectorized LoadInst uses an in-tree scalar
- The pointer operand of vectorized StoreInst uses an in-tree scalar
- The scalar argument of vector form intrinsic uses an in-tree scalar

Generating extracts for in-tree uses for vectorized instructions is
implemented in `BoUpSLP::vectorizeTree()`:
- https://github.com/llvm/llvm-project/blob/main/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp#L11497-L11506
- https://github.com/llvm/llvm-project/blob/main/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp#L11542-L11551
- https://github.com/llvm/llvm-project/blob/main/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp#L11657-L11667

However, 77a609b556 not only generates extracts for vectorized instructions,
but also generates extracts for the original scalar instructions.
There is no need to generate extracts for the original scalar instructions,
as these scalar instructions will be replaced by vector instructions and
get erased later.

This patch marks in-tree scalars that remain scalar in vectorized
instructions as having no exact user when building external uses. In this
case all uses of the scalar will be automatically replaced by
extractelement. The patch also removes the extracts at:
- https://github.com/llvm/llvm-project/blob/main/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp#L11497-L11506
- https://github.com/llvm/llvm-project/blob/main/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp#L11542-L11551
- https://github.com/llvm/llvm-project/blob/main/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp#L11657-L11667
2023-12-30 10:45:26 +08:00
Yingwei Zheng
90802e652d
[InstCombine] Handle commuted cases of the fold ((B|C)&A)|B -> B|(A&C) (#76565)
Alive2: https://alive2.llvm.org/ce/z/Qdsqk6

The commit f1eda23514
didn't handle other cases that commute operands.
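
The identity and a few of its commuted spellings can be checked by brute force at a small bit width. This Python sketch is independent of the InstCombine matcher; the list of variants is illustrative and not claimed to be the exact set the patch handles.

```python
from itertools import product

# Four commuted spellings of the left-hand side ((B|C)&A)|B.
LHS_VARIANTS = [
    lambda a, b, c: ((b | c) & a) | b,
    lambda a, b, c: ((c | b) & a) | b,
    lambda a, b, c: (a & (b | c)) | b,
    lambda a, b, c: b | ((b | c) & a),
]


def rhs(a: int, b: int, c: int) -> int:
    return b | (a & c)


def fold_holds(bits: int = 4) -> bool:
    values = range(1 << bits)
    return all(
        lhs(a, b, c) == rhs(a, b, c)
        for lhs in LHS_VARIANTS
        for a, b, c in product(values, repeat=3)
    )
```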
2023-12-29 23:58:58 +08:00
XChy
dafd17895f [InstCombine][NFC] Format code in foldCmpLoadFromIndexedGlobal 2023-12-29 17:42:38 +08:00
Yingwei Zheng
2128fca6c1
[InstCombine] Canonicalize gep T* X, V / sizeof(T) to gep i8* X, V (#76458)
This patch canonicalizes `gep T* X, V / sizeof(T)` to `gep i8* X, V`.
Alive2: https://alive2.llvm.org/ce/z/7XGjiB

As this pattern has been handled by the backends, the motivation of this
patch is to reduce the ref count of sdiv, which will enable more
optimizations.
2023-12-29 11:30:00 +08:00
Florian Hahn
516cc98aff
[LV] Fix typo in comment (NFC). 2023-12-28 21:20:10 +00:00
Alexey Bataev
5096501082 [SLP][TTI][X86]Add addsub pattern cost estimation. (#76461)
SLP/TTI do not know about the cost estimation for the addsub pattern
supported by X86. Support for pattern detection was added previously
(see TTI::isLegalAltInstr), but the cost was still not estimated
properly.
2023-12-28 05:04:04 -08:00
Yingwei Zheng
7a1a476116
[InstCombine] Fold (X & C1) | C2 into X & (C1 | C2) iff (X & C2) == C2 (#76470)
Alive2: https://alive2.llvm.org/ce/z/VKJYaS
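
The role of the precondition can be seen in a brute-force check at a small bit width. The Python below is a sketch only; the helper names are hypothetical, and in the compiler the condition `(X & C2) == C2` would have to be proven rather than tested at run time.

```python
from itertools import product


def fold_holds(bits: int = 4) -> bool:
    values = range(1 << bits)
    for x, c1, c2 in product(values, repeat=3):
        if (x & c2) != c2:
            continue  # precondition not met, the fold does not apply
        if ((x & c1) | c2) != (x & (c1 | c2)):
            return False
    return True


def fold_without_precondition_breaks() -> bool:
    # X has none of the bits of C2 set, so the two sides differ.
    x, c1, c2 = 0b0000, 0b0000, 0b0001
    return ((x & c1) | c2) != (x & (c1 | c2))
```

The identity follows from distributing the mask: `X & (C1 | C2)` equals `(X & C1) | (X & C2)`, and the second term is `C2` under the precondition.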
2023-12-28 20:47:40 +08:00
Wei Tao
a700298b3d
[CanonicalizeFreezeInLoops] fix duplicate removal (#74716)
This PR fixes #74572, where the freeze instruction could be found twice
by the CanonicalizeFreezeInLoops pass, and the compiler could then crash
on the second removal since the instruction was already gone.
2023-12-28 09:47:31 +01:00
Douglas Yung
fb981e6b4b Revert "[SLP][TTI][X86]Add addsub pattern cost estimation. (#76461)"
This reverts commit bc8c4bbd7973ab9527a78a20000aecde9bed652d.

Change is failing to build on several bots:
- https://lab.llvm.org/buildbot/#/builders/127/builds/60184
- https://lab.llvm.org/buildbot/#/builders/123/builds/23709
- https://lab.llvm.org/buildbot/#/builders/216/builds/32302
2023-12-27 23:52:04 -08:00
Alexey Bataev
bc8c4bbd79
[SLP][TTI][X86]Add addsub pattern cost estimation. (#76461)
SLP/TTI do not know about the cost estimation for the addsub pattern
supported by X86. Support for pattern detection was added previously
(see TTI::isLegalAltInstr), but the cost was still not estimated
properly.
2023-12-27 15:57:21 -05:00
Craig Topper
7f1c8fc25a
[InstCombine] Use ConstantInt::getSigned to sign extend -2 for large types. (#76464)
Using ConstantInt::get will zero extend.

Fixes #76441
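
Why zero extension matters for -2 only on large types can be shown with plain integer arithmetic. The Python below is a simplified model, not LLVM's API: `zext_from_u64` and `sext_from_i64` are hypothetical names standing in for building a wide constant from a 64-bit payload without and with sign extension.

```python
MASK64 = (1 << 64) - 1


def zext_from_u64(value: int, bits: int) -> int:
    """Treat `value` as a raw 64-bit payload and zero extend it to `bits`."""
    return (value & MASK64) & ((1 << bits) - 1)


def sext_from_i64(value: int, bits: int) -> int:
    """Treat `value` as a signed 64-bit integer and sign extend it to `bits`."""
    return value & ((1 << bits) - 1)


def as_signed(pattern: int, bits: int) -> int:
    """Read a `bits`-wide bit pattern as a two's complement integer."""
    return pattern - (1 << bits) if pattern >> (bits - 1) else pattern
```

In this model both constructions give -2 for widths up to 64 bits, but at 128 bits the zero-extended pattern reads as 2^64 - 2 instead of -2.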
2023-12-27 12:27:12 -08:00
Yingwei Zheng
aacff347af
[InstCombine] Simplify icmp pred (sdiv exact X, C), (sdiv exact Y, C) into icmp pred X, Y when C is positive (#76409)
Alive2: https://alive2.llvm.org/ce/z/u49dQ9
It will improve the codegen of `std::_Vector_base<T>::~_Vector_base()` when `sizeof(T)` is not a power of 2.

NOTE: We can also fold `icmp signed-pred (sdiv exact X, C), (sdiv exact Y, C)` into `icmp signed-pred Y, X` when C is negative. But I don't think it enables more optimizations for real-world applications.
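
The positive-divisor fold can be checked exhaustively on 8-bit values. The Python below is a sketch under the assumption that `sdiv exact` is only evaluated on exact multiples of C; `fold_holds_for` is a hypothetical helper, and both the signed and the unsigned reading of each predicate are compared.

```python
import operator

BITS = 8
INT_MIN, INT_MAX = -(1 << (BITS - 1)), (1 << (BITS - 1)) - 1
PREDS = [operator.lt, operator.le, operator.gt,
         operator.ge, operator.eq, operator.ne]


def to_unsigned(v: int) -> int:
    """Two's complement bit pattern of v as an unsigned integer."""
    return v & ((1 << BITS) - 1)


def fold_holds_for(c: int) -> bool:
    assert c > 0
    # `sdiv exact` is only defined here for exact multiples of c.
    multiples = [v for v in range(INT_MIN, INT_MAX + 1) if v % c == 0]
    for x in multiples:
        for y in multiples:
            qx, qy = x // c, y // c
            for pred in PREDS:
                if pred(qx, qy) != pred(x, y):
                    return False
                if pred(to_unsigned(qx), to_unsigned(qy)) != \
                        pred(to_unsigned(x), to_unsigned(y)):
                    return False
    return True
```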
2023-12-27 06:06:16 +08:00
Yingwei Zheng
4358e6e0c5
[FuncAttrs] Infer norecurse for funcs with calls to nocallback callees (#76372)
This patch adds missing `norecurse` attrs to funcs that only call intrinsics with `nocallback` attrs.
Fixes the regression found in https://github.com/dtcxzyw/llvm-opt-benchmark/pull/45#discussion_r1436148743.
The function loses `norecurse` attr because it calls `@llvm.fabs.f64`, which is not marked as `norecurse`.

Since `norecurse` is not a default attribute of intrinsics and it is
ambiguous for intrinsics, I decided to use the existing `nocallback`
attributes.

> nocallback
This attribute indicates that the function is only allowed to jump back
into caller’s module by a return or an exception, and is not allowed to
jump back by invoking a callback function, a direct, possibly
transitive, external function call, use of longjmp, or other means. It
is a compiler hint that is used at module level to improve dataflow
analysis, dropped during linking, and has no effect on functions defined
in the current module.

See also https://llvm.org/docs/LangRef.html#function-attributes.
2023-12-27 03:16:43 +08:00
Yingwei Zheng
ff76627aeb
[InstCombine] Fix type mismatch between cond and value in foldSelectToCopysign (#76343)
This patch fixes the miscompilation when we try to bitcast a floating point vector into an integer scalar.
2023-12-26 00:04:06 +08:00
Yingwei Zheng
0d454d6e59
[InstCombine] Fold xor of icmps using range information (#76334)
This patch folds xor of icmps into a single comparison using range-based reasoning as `foldAndOrOfICmpsUsingRanges` does.
Fixes #70928.
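
The range-based idea can be illustrated on a 4-bit domain: the xor of two comparisons is true on the symmetric difference of their satisfying sets, and if that set is one contiguous (possibly wrapping) range it can be tested with a single unsigned comparison. The Python below is a sketch of that reasoning only, not of `foldAndOrOfICmpsUsingRanges` or the code in this patch; all helper names are hypothetical.

```python
BITS = 4
N = 1 << BITS


def sat_set(pred) -> set:
    """All values of the domain for which the comparison is true."""
    return {x for x in range(N) if pred(x)}


def as_wrapped_range(values: set):
    """Return (lo, size) if `values` is one contiguous range modulo N, else None."""
    if not values or len(values) == N:
        return None
    # Each contiguous run has exactly one element whose predecessor is absent.
    starts = [v for v in values if (v - 1) % N not in values]
    if len(starts) != 1:
        return None
    return starts[0], len(values)


def xor_as_single_compare(p, q):
    """Range (lo, size) such that p(x) ^ q(x) == ((x - lo) mod N < size), or None."""
    return as_wrapped_range(sat_set(p) ^ sat_set(q))
```

For `x u< 5` xor `x u< 10` the result is the range starting at 5 with size 5, i.e. `(x - 5) u< 5`; for `x == 1` xor `x == 3` no single range exists.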
2023-12-25 07:14:31 +08:00
Craig Topper
d8ddcae547 [LSR] Fix typo in debug message where backspace escape was used instead of new line. 2023-12-24 10:35:27 -08:00
Benjamin Kramer
9423e45987 [ProfileData] Copy CallTargetMaps a bit less. NFCI 2023-12-24 17:48:18 +01:00
Kazu Hirata
1daf2994de [llvm] Use StringRef::contains (NFC) 2023-12-23 22:21:52 -08:00
Florian Hahn
fbcf8a8cbb
[ConstraintElim] Add (UGE, var, 0) to unsigned system for new vars. (#76262)
The constraint system used for ConstraintElimination assumes all
variables to be signed. This can cause missed optimizations in the
unsigned system, due to missing the information that all variables are
unsigned (non-negative).

Variables can be marked as non-negative by adding Var >= 0 for all
variables. This is done for arguments on ConstraintInfo construction and
after adding new variables. This handles cases like the ones outlined in
https://discourse.llvm.org/t/why-does-llvm-not-perform-range-analysis-on-integer-values/74341

The original example shared above is now handled without this change,
but adding another variable means that instcombine won't be able to
simplify examples like https://godbolt.org/z/hTnra7zdY

Adding the extra variables comes with a slight compile-time increase
https://llvm-compile-time-tracker.com/compare.php?from=7568b36a2bc1a1e496ec29246966ffdfc3a8b87f&to=641a47f0acce7755e340447386013a2e086f03d9&stat=instructions:u

|stage1-O3|stage1-ReleaseThinLTO|stage1-ReleaseLTO-g|stage1-O0-g|stage2-O3|stage2-O0-g|stage2-clang|
|--|--|--|--|--|--|--|
|+0.04%|+0.07%|+0.05%|+0.02%|+0.05%|+0.05%|+0.05%|

https://github.com/llvm/llvm-project/pull/76262
2023-12-23 15:53:48 +01:00