38279 Commits

Author SHA1 Message Date
Vitaly Buka
fc201d6133
Revert "[InstCombine] Support gep nuw in icmp folds" (#118698)
Reverts llvm/llvm-project#118472

Breaks profile tests on i386
https://lab.llvm.org/buildbot/#/builders/66/builds/7009
2024-12-04 15:07:27 -08:00
Kazu Hirata
1b95e76d8f [Instrumentation] Fix a warning
This patch fixes:

  llvm/lib/Transforms/Instrumentation/MemorySanitizer.cpp:3840:14:
  error: unused variable 'NumArgOperands' [-Werror,-Wunused-variable]
2024-12-04 08:31:40 -08:00
Alexander Shaposhnikov
95e44d3670
[msan] Add handling for sse41_round_pd/sse41_round_ps (#118441)
Add handling for sse41_round_pd/sse41_round_ps similarly to
maybeHandleSimpleNomemIntrinsic.

Test plan: ninja check-all
2024-12-04 08:27:08 -08:00
Nikita Popov
66ed8fb973 [InstCombine] Fix use after free
Make sure we only access cached nowrap flags.
2024-12-04 17:20:04 +01:00
Nikita Popov
4a7abfe0a7 [InstCombine] Preserve nuw in OptimizePointerDifference
If both the geps and the subs are nuw the new sub is also nuw.

Proof: https://alive2.llvm.org/ce/z/mM8UvF
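
The claim can also be checked by brute force on a toy model. The sketch below is not LLVM code: it assumes a 6-bit address space, models `gep nuw` as an unsigned add that must not wrap and `sub nuw` as an unsigned subtract that must not wrap, and uses the hypothetical helper name `nuw_pointer_difference_holds`.

```python
BITS = 6            # assumption: a small width stands in for the pointer width
MOD = 1 << BITS


def nuw_pointer_difference_holds():
    """Check (base + a) - (base + b) == a - b without unsigned wrap.

    Only inputs where both address computations and the pointer
    subtraction do not wrap are considered, because any other input
    would already be poison in the original IR.
    """
    for base in range(MOD):
        for a in range(MOD):
            for b in range(MOD):
                p, q = base + a, base + b
                geps_nuw = p < MOD and q < MOD  # neither gep wraps
                sub_nuw = p >= q                # pointer sub does not wrap
                if not (geps_nuw and sub_nuw):
                    continue
                # The new sub on offsets must not wrap and must agree.
                if a < b or (p - q) != (a - b):
                    return False
    return True
```

Calling `nuw_pointer_difference_holds()` walks every `(base, a, b)` triple in the toy address space and returns `True` only if no counterexample exists.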
2024-12-04 16:58:35 +01:00
Nikita Popov
a608607fd7
[ConstraintElim] Add support for decomposing gep nuw (#118639)
ConstraintElimination currently only supports decomposing gep nusw with
non-negative indices (with "non-negative" possibly being enforced via
pre-condition).

Add support for gep nuw, which directly gives us the necessary
guarantees for the decomposition.
2024-12-04 16:27:31 +01:00
Florian Hahn
7b6e0d9fc3
[Matrix] Use DenseMap for ShapeMap instead of ValueMap. (#118282)
ValueMap automatically updates entries with the new value if they have
been RAUW. This can lead to instructions that are expected to not have
shape info to be added to the map (e.g. shufflevector as in the added
test case).

This leads to incorrect results. Originally it was used for transpose
optimizations, but they now all use updateShapeAndReplaceAllUsesWith,
which takes care of updating the shape info as needed.

This fixes a crash in the newly added test cases.

PR: https://github.com/llvm/llvm-project/pull/118282
2024-12-04 14:51:31 +00:00
Jan Ječmen
78db4e9f7b
[NFC][IRCE] Don't require LoopStructure to determine IRCE profitability (#116384)
This refactoring hoists the profitability check earlier in the pipeline,
so that for loops that are not profitable to transform there is no
iteration over the basic blocks or LoopStructure computation.

Motivated by PR #104659 that tweaks how the profitability of individual
branches is evaluated.
2024-12-04 11:09:19 +01:00
Antonio Frighetto
f68b0e3699 [AggressiveInstCombine] Use APInt and avoid truncation when folding loads
Fix a miscompilation by using APInt and avoiding truncation when
folding loads.

Fixes: https://github.com/llvm/llvm-project/issues/118467.
2024-12-04 10:20:14 +01:00
ronryvchin
ff281f7d37
[PGO] Add option to always instrumenting loop entries (#116789)
This patch extends the PGO infrastructure with an option to prefer the
instrumentation of loop entry blocks.
This option is a generalization of
19fb5b467b,
and helps to cover cases where the loop exit is never executed.
Event handling loops are one example where this can occur.

Note that this change does NOT change the default behavior.
2024-12-04 07:56:46 +01:00
Owen Anderson
14a259f85b
GlobalOpt: Use the correct address space when creating a "*.init" global. (#118562)
2024-12-04 14:01:16 +13:00
k-kashapov
f2fa9ac616
[nfc][MSan] Change for-loop to ArgNo instead of drop_begin (#117553)
As discussed in
https://github.com/llvm/llvm-project/pull/109284#discussion_r1838830571
Changed for loop to use `ArgNo` instead of `drop_begin` to keep loop
code consistent with other helpers.

Co-authored-by: Kamil Kashapov <kashapov@ispras.ru>
2024-12-03 14:32:54 -08:00
Teresa Johnson
d6cd214dd6
[ThinLTO][LowerTypeTests] Don't compute address taken set unless CFI (NFC) (#118508)
The AddressTaken set used for CFI with regular LTO was being computed on
the ExportSummary regardless of whether any CFI metadata existed. In the
case of ThinLTO, the ExportSummary is the global summary index for the
target, and the lack of guard in this code meant this was being computed
on the ThinLTO index even when there was an empty regular LTO module,
since the backend is called on the combined module to generate the
expected output file (normally this is trivial as there is no IR).

Move the computation of the AddressTaken set into the condition checking
for CFI to avoid this overhead. This change resulted in a 20% speedup in
the thin link of a large target. It looks like the outer loop has
existed here for several years, but likely became a larger overhead
after the inner loop was added very recently in PR113987.

I will send a separate patch to refactor the ThinLTO backend handling to
avoid invoking the opt pipeline if the module is empty, in case there
are other summary-based analyses in some of the passes now or in the
future. This change is still desirable as by default regular LTO
modules contain summaries, or we can have split thin and regular LTO
modules, and if they don't involve CFI these would still unnecessarily
compute the AddressTaken set.
2024-12-03 12:14:16 -08:00
Nikita Popov
10223c72a9 [ConstraintElim] Use nusw flag for GEP decomposition
Check for nusw instead of inbounds when decomposing GEPs.

In this particular case, we can also look through multiple nusw
flags, because we will ultimately be working in the unsigned
constraint system.
2024-12-03 15:56:29 +01:00
Florian Hahn
a7fda0e1e4
[VPlan] Introduce VPScalarPHIRecipe, use for can & EVL IV codegen (NFC). (#114305)
Introduce a general recipe to generate a scalar phi. Lower
VPCanonicalIVPHIRecipe and VPEVLBasedIVRecipe to VPScalarPHIRecipe
before plan execution, avoiding the need for duplicated ::execute
implementations. There are other cases that could benefit, including
in-loop reduction phis and pointer induction phis.

Builds on a similar idea as
https://github.com/llvm/llvm-project/pull/82270.

PR: https://github.com/llvm/llvm-project/pull/114305
2024-12-03 14:53:51 +00:00
Ramkumar Ramachandra
2a0ee090db
IVDesc: strip redundant arg in getOpcode call (NFC) (#118476)
2024-12-03 13:40:51 +00:00
Ramkumar Ramachandra
51a895aded
IR: introduce struct with CmpInst::Predicate and samesign (#116867)
Introduce llvm::CmpPredicate, an abstraction over a floating-point
predicate, and a pack of an integer predicate with samesign information,
in order to ease extending large portions of the codebase that take a
CmpInst::Predicate to respect the samesign flag.

We have chosen to demonstrate the utility of this new abstraction by
migrating parts of ValueTracking, InstructionSimplify, and InstCombine
from CmpInst::Predicate to llvm::CmpPredicate. There should be no
functional changes, as we don't perform any extra optimizations with
samesign in this patch, or use CmpPredicate::getMatching.

The design approach taken by this patch allows for unaudited callers of
APIs that take a llvm::CmpPredicate to silently drop the samesign
information; it does not pose a correctness issue, and allows us to
migrate the codebase piece-wise.
2024-12-03 13:31:04 +00:00
Nikita Popov
f33536468b
[InstCombine] Support gep nuw in icmp folds (#118472)
Unsigned icmp of gep nuw folds to unsigned icmp of offsets. Unsigned
icmp of gep nusw nuw folds to unsigned samesign icmp of offsets.

Proofs: https://alive2.llvm.org/ce/z/VEwQY8
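
A toy model of the first fold, for illustration only. It assumes a 6-bit address space and models `gep nuw` as an unsigned add that must not wrap; the function name `icmp_ult_fold_holds` and its parameters are hypothetical, and only the `ult` predicate is covered.

```python
def icmp_ult_fold_holds(bits=6, require_nuw=True):
    """Check that (base + a) u< (base + b) equals a u< b.

    With require_nuw=True, inputs where either address computation
    wraps are skipped, since a wrapping gep nuw yields poison.
    """
    mod = 1 << bits
    for base in range(mod):
        for a in range(mod):
            for b in range(mod):
                wraps = base + a >= mod or base + b >= mod
                if require_nuw and wraps:
                    continue
                pointer_cmp = (base + a) % mod < (base + b) % mod
                offset_cmp = a < b
                if pointer_cmp != offset_cmp:
                    return False
    return True
```

With the no-wrap requirement the comparison of pointers always matches the comparison of offsets. Passing `require_nuw=False` admits wrapping inputs such as `base=63, a=1, b=0`, where the two comparisons disagree, which is why the flag is needed.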
2024-12-03 14:28:56 +01:00
Nikita Popov
bdc6faf775 [InstCombine] Support nusw in icmp of two geps with same base
Proof: https://alive2.llvm.org/ce/z/BYNQ7s
2024-12-03 11:51:14 +01:00
Nikita Popov
9c5a84b394 [InstCombine] Support nusw in icmp of gep with base
Proof: https://alive2.llvm.org/ce/z/omnQXt
2024-12-03 11:51:14 +01:00
Antonio Frighetto
1d6ab189be [MemCpyOpt] Drop dead memmove calls on memset'd source data
When a memmove happens to clobber source data, and such data have
been previously memset'd, the memmove may be redundant.
2024-12-03 09:50:57 +01:00
Yingwei Zheng
c1ad064dd3
[InstCombine] Fold icmp spred (and X, highmask), C1 into icmp spred X, C2 (#118197)
Alive2: https://alive2.llvm.org/ce/z/Ffg64g
Closes https://github.com/llvm/llvm-project/issues/104772.
2024-12-03 16:19:12 +08:00
Rajat Bajpai
de415fbb45
[InstCombine][FP] Fix nnan preservation for transform fcmp + sel => fmax/fmin (#117977)
Preserve `nnan` constraint only if present on both `fcmp` and `select`.

Alive2: https://alive2.llvm.org/ce/z/ZNDjzt
2024-12-03 14:01:36 +08:00
Yingwei Zheng
295d6b18f7
[InstCombine] Fold (X * (Y << K)) u>> K -> X * Y when highbits are not demanded (#111151)
Alive2: https://alive2.llvm.org/ce/z/Z7QgjH
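
A brute-force sketch of why the fold is sound, under the assumption that "highbits are not demanded" means the top `K` bits of the result are ignored. The 6-bit width and the helper name `low_bits_agree` are illustrative choices, not part of the patch.

```python
def low_bits_agree(bits=6):
    """Compare (x * (y << k)) u>> k with x * y on the low bits - k bits."""
    mod = 1 << bits
    for k in range(bits):
        demanded = (1 << (bits - k)) - 1  # mask of the bits still demanded
        for x in range(mod):
            for y in range(mod):
                shifted = (y << k) % mod          # y << k, truncated
                before = ((x * shifted) % mod) >> k
                after = (x * y) % mod
                if (before & demanded) != (after & demanded):
                    return False
    return True
```

The two expressions can differ in the top `k` bits (for example `x=1`, `y=63`, `k=1` in 6 bits), so the fold only applies when those bits are not demanded.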
2024-12-03 12:04:04 +08:00
Han-Kuan Chen
f71ea4bc1b
[SLP][REVEC] reorderNodeWithReuses should not be called if all users of a TreeEntry are ShuffleVectorInst. (#118260)
2024-12-03 09:04:04 +08:00
Mingming Liu
6faf17b762
[ThinLTO]Supports declaration import for global variables in distributed ThinLTO (#117616)
When `-import-declaration` option is enabled, declaration import is
supported for functions. https://github.com/llvm/llvm-project/pull/88024
has the context for this option.

This patch supports declaration import for global variables in
distributed ThinLTO. The motivating use case is to propagate `dso_local`
attribute of global variables across modules, to optimize global
variable access when a binary is built with
`-fno-direct-access-external-data`.
* With `-fdirect-access-external-data`, non thread-local global
variables will [have `dso_local`
attributes](fe3c23b439/clang/lib/CodeGen/CodeGenModule.cpp (L1730-L1746)).
This optimizes the global variable access as shown by
https://gcc.godbolt.org/z/vMzWcKdh3
2024-12-02 16:15:52 -08:00
Florian Hahn
4226e0a0c7
[TTI] Add SCEVExpansionBudget to loop unrolling options. (#118316)
Add an extra knob to UnrollingPreferences to let backends control the
maximum budget for SCEV expansions.

This gives backends more fine-grained control on the cost of the runtime
checks for runtime unrolling.

PR: https://github.com/llvm/llvm-project/pull/118316
2024-12-02 21:35:00 +00:00
Florian Hahn
f8ce2e4bb3
[Matrix] Only retrieve analyses if there are any matrix intrinsics (NFC)
Only request analyses if there are any matrix intrinsics to avoid
computing them if there are no matrix intrinsics.
2024-12-02 11:22:24 +00:00
Nikita Popov
7bbc049688
[InstCombine] Consolidate another fold into select value equivalence (#117746)
We had a separate fold that handled just the trivial case where we're
replacing exactly the argument of the select. Handle this in select
value equivalence by relaxing the infinite loop protection to allow a
replacement of a non-constant with a constant.

This also fixes https://github.com/llvm/llvm-project/issues/113301, as
the separate fold did not handle undef values correctly.
2024-12-02 09:45:39 +01:00
Veera
979a0356d4
[InstCombine] Fold X Pred C2 ? X BOp C1 : C2 BOp C1 to min/max(X, C2) BOp C1 (#116888)
Fixes #82414.

General Proof: https://alive2.llvm.org/ce/z/ERjNs4 
Proof for Tests: https://alive2.llvm.org/ce/z/K-934G

This PR transforms `select` instructions of the form `select (Cmp X C1)
(BOp X C2) C3` to `BOp (min/max X C1) C2` iff `C3 == BOp C1 C2`.

This helps in eliminating a noop loop in
https://github.com/rust-lang/rust/issues/123845 but does not improve
optimizations.
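
One concrete instance of the pattern, checked exhaustively on a toy width. This sketch picks `ult` as the predicate and wrapping `add` as the binary operator, so the fold becomes `umin(X, C1) + C2`; the constant naming follows the description above, and the 5-bit width and the name `select_fold_holds` are assumptions made for illustration.

```python
def select_fold_holds(bits=5):
    """Check: (x u< c1 ? x + c2 : c1 + c2) == umin(x, c1) + c2."""
    mod = 1 << bits
    for x in range(mod):
        for c1 in range(mod):
            for c2 in range(mod):
                c3 = (c1 + c2) % mod                  # required: C3 == BOp C1 C2
                original = (x + c2) % mod if x < c1 else c3
                folded = (min(x, c1) + c2) % mod
                if original != folded:
                    return False
    return True
```

When `x < c1` the minimum is `x`, and otherwise it is `c1`, so both arms of the select are reproduced by a single add on the clamped value.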
2024-12-02 09:33:45 +01:00
Florian Hahn
77767986ed
[LV] Use IsaPred in a few more places (NFC).
Simplifies the code slightly by removing explicit lambdas.
2024-12-01 18:47:53 +00:00
Yingwei Zheng
1a3eace82a
[InstCombine] Fold umax(X, C) + -C into usub.sat(X, C) (#118195)
Alive2: https://alive2.llvm.org/ce/z/oSWe5S
Closes https://github.com/llvm/llvm-project/issues/118155
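
The identity is easy to confirm exhaustively at 8 bits. The sketch below models `umax`, wrapping `add`, and `usub.sat` with plain integer arithmetic; the function name `umax_fold_holds` is hypothetical.

```python
def umax_fold_holds(bits=8):
    """Check: umax(x, c) + (-c) == usub.sat(x, c) for all x, c."""
    mod = 1 << bits
    for x in range(mod):
        for c in range(mod):
            neg_c = (-c) % mod                     # two's complement negation
            original = (max(x, c) + neg_c) % mod   # wrapping add
            folded = x - c if x >= c else 0        # saturating unsigned sub
            if original != folded:
                return False
    return True
```

If `x >= c` the sum is `x - c`; otherwise `umax` yields `c` and the sum wraps to exactly zero, which is the saturated result.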
2024-12-01 23:29:40 +08:00
Jonas Paulsson
0ad6be1927
[SLPVectorizer, TargetTransformInfo, SystemZ] Improve SLP getGatherCost(). (#112491)
As vector element loads are free on SystemZ, this patch improves the cost
computation in getGatherCost() to reflect this.

getScalarizationOverhead() gets an optional parameter which can hold the actual
Values so that they in turn can be passed (by BasicTTIImpl) to
getVectorInstrCost().

SystemZTTIImpl::getVectorInstrCost() will now recognize a LoadInst and
typically return a 0 cost for it, with some exceptions.
2024-11-29 21:19:45 +01:00
Tyler Nowicki
b40714b012
[Coroutines][NFC] Refactor CoroCloner (#116885)
* Move CoroCloner to its own header. For now, the header is located in llvm/lib/Transforms/Coroutines
* Change private to protected to allow inheritance
* Create CoroSwitchCloner and move some of the switch specific code into this cloner. More code will follow in later commits.
2024-11-29 11:20:33 -05:00
Alexey Bataev
f4974e0931 [SLP] Add a check for poison value in AShrChecker
Need to check if the value in AShrChecker is poison before casting it
to an instruction, to avoid a compiler crash.

Fixes #118030
2024-11-29 06:51:19 -08:00
Luke Lau
d9c269577e
[VPlan] Remove manual constant fold in VPWidenIntOrFpInductionRecipe. NFC (#118028)
This manual constant folding was added in 2017 in
https://reviews.llvm.org/D29956, but since then it looks like IRBuilder
has learnt to fold it away itself.
I'm not sure at what point this happened; I just verified this by
stepping through the call to CreateVectorSplat in the debugger.
2024-11-29 00:21:53 +01:00
Florian Hahn
12cefcc7ec
[Matrix] Skip already fused instructions before trying to fuse multiply.
lowerDotProduct called above may already lower a matrix multiply and
mark it as processed by adding it to FusedInsts. Don't try to process it
again in LowerMatrixMultiplyFused; check FusedInsts first.

Without this change, we trigger an assertion when trying to erase the
same original matrix multiply twice.
2024-11-28 16:11:40 +00:00
Rafael Eckstein
2a6e5896a5
[MergeFunctions] Add support to run the pass over a set of function pointers (#111045)
This modification will enable the usage of `MergeFunctions` as a
standalone library. Currently, `MergeFunctions` can only be applied to
an entire module. By adopting this change, developers will gain the
flexibility to reuse the `MergeFunctions` code within their own
projects, choosing which functions to merge; hence, promoting code
reusability. Notice that this modification will not break backward
compatibility, because `MergeFunctions` will still work as a pass after
the modification.
2024-11-28 16:18:52 +01:00
Florian Hahn
82821254f5
[LV] Use IVUpdateMayOverflow to set HasNUW. (#111758)
If IVUpdateMayOverflow is false, we proved that the induction increment
cannot overflow in the vector loop. This allows setting NUW in some
cases when folding the tail.

PR: https://github.com/llvm/llvm-project/pull/111758
2024-11-28 10:12:41 +00:00
Elvis Wang
9ea5be639d
Recommit "[LV][VPlan] Remove any-of reduction from precomputeCost. NFC (#117109)" (#117289)
Update the test cases that contain `any-of` output from
precomputeCost().

Original message:

The any-of reduction contains phi and select instructions.

The select instruction might be optimized and removed in the VPlan, which
may cause a VF difference between the legacy and VPlan-based models. But if
the select instruction is removed, planContainsAdditionalSimplifications()
will catch it and disable the assertion.

Therefore, we can just remove the any-of reduction calculation in
precomputeCost().
2024-11-28 15:07:36 +08:00
LiqinWeng
4a3f46de50
[LV][EVL] Support call instruction with EVL-vectorization (#110412)
2024-11-28 10:05:08 +08:00
Joseph Huber
4cb4516ae9 [OpenMP] Fix RPC client not being optimized out after changes
Summary:
I forgot that this check deliberately looked through the indirection I
removed. Fix it to just check if the symbol has no users.
2024-11-27 15:56:23 -06:00
Joseph Huber
89d8e70031
[libc] Export a pointer to the RPC client directly (#117913)
Summary:
We currently have an unnecessary level of indirection when initializing
the RPC client. This is a holdover from when the RPC client was not
trivially copyable and simply makes it more complicated. Here we use the
`asm` syntax to give the C++ variable a valid name so that we can just
copy to it directly.

Another advantage is that if users want to piggy-back on the same RPC
interface, they need only declare theirs as extern with the same
symbol name, or make it weak to optionally use it if LIBC isn't
available.
2024-11-27 14:57:38 -06:00
Krzysztof Pszeniczny
991154d0fb
[LTO] Use .at instead of .lookup to avoid copies. (NFC) (#117888)
`DenseMap::lookup` returns by value (because it default-creates the
returned value if the key isn't present in the map), which means that we
do a lot of copying here. Since we assert that something is present in
the returned value two lines below this call, it's safe to use `.at`
here instead.

Copying and then destroying dense maps here is responsible for 60% of
the time spent in LTO indexing in a large internal build.
2024-11-27 18:41:29 +01:00
Nikita Popov
43ee6f7a01
[AlwaysInline] Avoid unnecessary BFI fetches (#117750)
AlwaysInliner doesn't use BFI itself, it only updates it. If BFI is not
already computed, it will spend time to first compute it, and then
update it. This is not necessary: If BFI is not available in the first
place, there is no need to update it.

This is mainly relevant in debug builds for IR that has a lot of
alwaysinline functions.
2024-11-27 15:53:21 +01:00
Nikita Popov
fc5c89900f [SimpleLoopUnswitch] Fix LCSSA phi node invalidation
Fixes https://github.com/llvm/llvm-project/issues/117537.
2024-11-27 11:48:05 +01:00
Yingwei Zheng
0f0c0c36e3
[ConstraintElim] Extend checkOrAndOpImpliedByOther to handle and/or expr trees. (#117123)
This patch extends `checkOrAndOpImpliedByOther` to handle and/or trees.
Limitation: At least one of the operands of root and/or instruction
should be an icmp. That is, this patch doesn't support expressions like
`(cmp1 & cmp2) & (cmp3 & cmp4)`.

Closes https://github.com/llvm/llvm-project/issues/117107.
Compile-time impact:
http://llvm-compile-time-tracker.com/compare.php?from=69cc3f096ccbdef526bbd5a065a25c95122e87ee&to=919416d2c4c71e3b9fe533af2c168a36c7893be5&stat=instructions%3Au
2024-11-27 09:04:52 +08:00
AdityaK
39601a6e54
Bail out jump threading on indirect branches only (#117778)
Remove check for PHI in pred as pointed out in #103688 
Reduced the testcase to remove redundant phi in pred

Fixes: #102351
2024-11-26 14:57:28 -08:00
Florian Hahn
46a08579f2
[Local] Only intersect alias.scope,noalias & parallel_loop if inst moves (#117716)
Preserve !alias.scope, !noalias and !mem.parallel_loop_access metadata
on the replacement instruction, if it does not move. In that case, the
program would be UB, if the aliasing property encoded in the metadata
does not hold. This makes use of the clarification re aliasing metadata
implying UB if the property does not hold: #116220

Same as #115868, but for !alias.scope, !noalias and
!mem.parallel_loop_access.


PR: https://github.com/llvm/llvm-project/pull/117716
2024-11-26 20:39:53 +00:00
Florian Hahn
ab6677e7d6
[LICM] Only set AA metadata on hoisted load if it executes. (#117204)
https://github.com/llvm/llvm-project/pull/116220 clarified that
violations of aliasing metadata are UB.

Only set the AA metadata after hoisting a load, if it is guaranteed to
execute in the original loop.

PR: https://github.com/llvm/llvm-project/pull/117204
2024-11-26 14:16:16 +00:00