222 Commits

Author SHA1 Message Date
Nikita Popov
a5f3415533 [InstCombine] Replace non-demanded undef vector with poison
If an operand (esp to shufflevector or insertelement) is not
demanded, canonicalize it from undef to poison.
2023-12-18 16:12:37 +01:00
Nikita Popov
cf47af493b
[InstCombine] Generalize folds for inversion of icmp operands (#74317)
We have a bunch of folds that basically perform X pred Y to ~Y pred ~X
for various special cases where this saves an instruction.

Generalize these folds to use isFreeToInvert(). We have to make sure
that we consume an instruction in either of the inversions, otherwise
we're just going to swap the icmp back and forth.

Fixes https://github.com/llvm/llvm-project/issues/74302.
2023-12-08 11:25:41 +01:00
Nikita Popov
d77067d08a
[ValueTracking] Add dominating condition support in computeKnownBits() (#73662)
This adds support for using dominating conditions in computeKnownBits()
when called from InstCombine. The implementation uses a
DomConditionCache, which stores which branches may provide information
that is relevant for a given value.

DomConditionCache is similar to AssumptionCache, but does not try to do
any kind of automatic tracking. Relevant branches have to be explicitly
registered and invalidated values explicitly removed. The necessary
tracking is done inside InstCombine.

The reason why this doesn't just do exactly the same thing as
AssumptionCache is that a lot more transforms touch branches and branch
conditions than assumptions. AssumptionCache is an immutable analysis
and mostly gets away with this because only a handful of places have to
register additional assumptions (mostly as a result of cloning). This is
very much not the case for branches.

This change regresses compile-time by about ~0.2%. It also improves
stage2-O0-g builds by about ~0.2%, which indicates that this change results
in additional optimizations inside clang itself.

Fixes https://github.com/llvm/llvm-project/issues/74242.
2023-12-06 14:17:18 +01:00
Craig Topper
7ec4f6094e
[InstCombine] Infer disjoint flag on Or instructions. (#72912)
The disjoint flag was recently added to IR in #72583

We already set it when we turn an add into an or. This patch sets it on Ors that weren't converted from an Add.
2023-12-02 14:11:12 -08:00
Craig Topper
03d4a9d94d
[InstCombine] Set disjoint flag when turning Add into Or. (#72702)
The disjoint flag was recently added to IR in #72583
2023-11-27 12:54:11 -08:00
Yingwei Zheng
dc6d077396
[CVP] Infer nneg on existing zext (#72052)
This patch infers `nneg` flags for existing zext instructions in CVP.
After https://github.com/llvm/llvm-project/pull/71534 and this patch, we
can drop `zext -> zext nneg` transform in `RISCVCodeGenPrepare`:


40671bbdef/llvm/lib/Target/RISCV/RISCVCodeGenPrepare.cpp (L74-L83)

This is an alternative to #72049.
2023-11-13 22:41:37 +08:00
dewen
3b82336188
Revert "[PM] Execute IndVarSimplifyPass precede RessociatePass" (#71617)
Reverts llvm/llvm-project#71054
2023-11-08 09:22:55 +08:00
dewen
e4d27d7f32
[PM] Execute IndVarSimplifyPass precede RessociatePass (#71054)
ReassociatePass may clear nsw/nuw flags of some instructions, which may
have side effects on optimizations in IndVarSimplifyPass.
2023-11-08 09:21:17 +08:00
Nikita Popov
30240e428f [PhaseOrdering] Regenerate test checks (NFC) 2023-10-12 14:40:13 +02:00
DianQK
2d1e8a03f5
[EarlyCSE] Compare GEP instructions based on offset (#65875)
Closes #65763.
This will provide more opportunities for constant propagation for
subsequent optimizations.
2023-09-20 06:14:45 +08:00
Alexey Bataev
c619222ea4 [SLP]Use common logic for cost estimation of the alternate vector nodes.
We can use buildShuffleEntryMask() to build the shuffle mask correctly
not only for the alternate nodes with reuses, but also for the nodes
without reused scalars. It allows better to estimate the cost of the
node and emit better code.

Differential Revision: https://reviews.llvm.org/D157413
2023-08-09 11:50:39 -07:00
Florian Hahn
707359ecf5
Recommit "[LV] Re-use existing broadcast value for live-ins."
This reverts commit 245ec675a4e41f7ec24dfc998720bffdc46a6c53.

Recommits eea9258648ce with a fix to only erase the instruction from the
first part if it is defined outside the loop. This fixes a
use-after-free error reported.
2023-08-01 15:54:02 +01:00
Martin Storsjö
245ec675a4 Revert "[LV] Re-use existing broadcast value for live-ins."
This reverts commit eea9258648ce73507f6f85c395de978af659d498.

That commit triggered crashes in the following testcase:

$ cat reduced.c
typedef struct {
  int a[8]
} b;
typedef struct {
  b *c;
  short d
} e;
void f() {
  int g;
  char *h;
  e *i = f;
  short j = i->d;
  int a = i->c->a[0];
  for (;;)
    for (; g < a; g++) {
      *h = j * i->d >> 8;
      h++;
    }
}
$ clang -target aarch64-linux-gnu -w -c -O2 reduced.c
2023-07-25 10:35:41 +03:00
Florian Hahn
eea9258648
[LV] Re-use existing broadcast value for live-ins.
When requesting a vector value for a live-in, we can re-use the
broadcast of the live-in of part 0 for parts > 0.
2023-07-24 11:50:47 +01:00
Arthur Eubanks
457dc72fdd Reland [InstCombine] Infer inbounds for more GEPs of dereferenceable pointers
Use Value::getPointerDereferenceableBytes() instead of hardcoding dereferenceable only for allocas. Allows us to infer inbounds GEPs for other Values like CallInsts and Arguments.

Fixed clang test broken in initial land.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D153815
2023-06-27 09:31:20 -07:00
Arthur Eubanks
0f9df062ec Revert "[InstCombine] Infer inbounds for more GEPs of dereferenceable pointers"
This reverts commit cd43b19c0127d80f3543803359db0f03e363e893.

Breaks clang/test/CodeGenOpenCL/builtins-amdgcn.cl.
2023-06-27 09:27:15 -07:00
Arthur Eubanks
cd43b19c01 [InstCombine] Infer inbounds for more GEPs of dereferenceable pointers
Use Value::getPointerDereferenceableBytes() instead of hardcoding dereferenceable only for allocas. Allows us to infer inbounds GEPs for other Values like CallInsts and Arguments.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D153815
2023-06-27 09:13:00 -07:00
Arthur Eubanks
7d6b8249fa [test] Regenerate test checks 2023-06-20 18:20:52 -07:00
Noah Goldstein
3391bdc255 Revert "[FunctionAttrs] Propagate some func/arg/ret attributes from caller to callsite (WIP)"
Accidental commit/push!

This reverts commit 4fa971ff62c3c48c606b792c572c03bd4d5906ee.
2023-06-13 00:53:31 -05:00
Noah Goldstein
4fa971ff62 [FunctionAttrs] Propagate some func/arg/ret attributes from caller to callsite (WIP)
This is the consolidation of D151644 and D151943 moved from
InstCombine to FunctionAttrs. This is based on discussion in the above
patches as well as D152081 (Attributor). This patch was written in a
way so it can have an immediate impact in currently active passes
(FunctionAttrs), but should be easy to port elsewhere (Attributor or
Inliner) if that makes more sense later on.

Some function attributes imply the attribute for all/some instructions
in the function. These attributes can be safely propagated to
callsites within the function that are missing the attribute. This can
be useful when 1) analyzing individual instructions in a function
and 2) if the original caller is later inlined, as if the attributes are
not propagated, they will be lost.

This patch implements propagation in a new class/file
`InferCallsiteAttrs` which can hypothetically be included elsewhere.

At the moment this patch infers the following:

Function Attributes:
    - mustprogress
    - nofree
    - willreturn
    - All memory attributes (readnone, readonly, writeonly, argmem,
      etc...)
        - The memory attributes are only propagated IFF the set of
          pointers available to the callsite is the same as the set
          available outside the caller (i.e no local memory arguments
          from alloca or local malloc like functions).

Argument Attributes:
    - noundef
    - nonnull
    - nofree
    - readnone
    - readonly
    - writeonly
    - nocapture
        - nocapture is only propagated IFF the set of pointers
          available to the callsite is the same as the set available
          outside the caller and its guranteed that between the
          callsite and function return, the state of any capture
          pointers will not change (so the nocaptured gurantee of the
          caller has been met by the instruction preceding the
          callsite and will not changed).

Argument are only propagated to callsite arguments that are also function
arguments, but not derived values.

Return Attributes:
    - noundef
    - nonnull

Return attributes are only propagated if the callsite's return value
is used as the caller's return and execution is guranteed to pass from
callsite to return.

The compile time hit of this for -O3 and -O3+thinLTO is ~[.02, .37]%
regression. Proper LTO, however, has more significant regressions (up
to 3.92%):
https://llvm-compile-time-tracker.com/compare.php?from=94407e1bba9807193afde61c56b6125c0fc0b1d1&to=79feb6e78b818e33ec69abdc58c5f713d691554f&stat=instructions:u

Differential Revision: https://reviews.llvm.org/D152226
2023-06-13 00:47:43 -05:00
Tobias Hieta
f84bac329b
[NFC][Py Reformat] Reformat lit.local.cfg python files in llvm
This is a follow-up to b71edfaa4ec3c998aadb35255ce2f60bba2940b0
since I forgot the lit.local.cfg files in that one.

Reformatting is done with `black`.

If you end up having problems merging this commit because you
have made changes to a python file, the best way to handle that
is to run git checkout --ours <yourfile> and then reformat it
with black.

If you run into any problems, post to discourse about it and
we will try to help.

RFC Thread below:

https://discourse.llvm.org/t/rfc-document-and-standardize-python-code-style

Reviewed By: barannikov88, kwk

Differential Revision: https://reviews.llvm.org/D150762
2023-05-17 17:03:15 +02:00
luxufan
f470922a29 Revert "Revert "[ValutTracking] Use isGuaranteedNotToBePoison in impliesPoison""
This reverts commit 706e8110573c83f140a63b40803d6370c86c1414.
2023-05-10 14:35:55 +08:00
ManuelJBrito
8b56da5e9f [IR] Change shufflevector undef mask to poison
With this patch an undefined mask in a shufflevector will be printed as poison.
This change is done to support the new shufflevector semantics
for undefined mask elements.

Differential Revision: https://reviews.llvm.org/D149210
2023-04-27 14:41:10 +01:00
Simon Pilgrim
aa754f7e0f [IR] llvm::createMinMaxOp - create integer min/max intrinsics instead of icmp/sel
Based off D148215, when expanding a min/max reduction we should be creating min/max intrinsics directly instead of relying on instcombine to fold them back together.

This patch handles integer min/max cases. Hopefully we can add floating point support soon (at least for fastmath/nnan cases) - but we're missing some of the plumbing to pass the correct FMF to the intrinsic at the moment.

Differential Revision: https://reviews.llvm.org/D148221
2023-04-13 16:40:43 +01:00
Simon Pilgrim
07c5e175f6 [PhaseOrdering] Add test case for Issue #61061 2023-04-01 13:27:16 +01:00
Sanjay Patel
6c7b2eef47 [PhaseOrdering] add test for vector load and cast transforms; NFC
issue #51397
2023-03-01 13:07:16 -05:00
Alexey Bataev
e03d254bbd [SLP]Do not reduce repeated values, use scalar red ops instead.
Metric: size..text

                                                     size..text                 results     results0    diff
SingleSource/Regression/C/gcc-c-torture/execute/GCC-C-execute-980605-1.test      445.00      461.00  3.6%
SingleSource/Benchmarks/Adobe-C++/loop_unroll.test                               428477.00   428445.00 -0.0%
External/SPEC/CFP2006/447.dealII/447.dealII.test                                 618849.00   618785.00 -0.0%

For all tests some extra code was optimized, GCC-C-execute has some more
inlining after

Differential Revision: https://reviews.llvm.org/D132261
2023-02-17 07:19:35 -08:00
Florian Hahn
68469a80cb
[LV] Disable runtime unrolling for vectorized loops.
This patch adds metadata to disable runtime unrolling to the vectorized
loop. If runtime unrolling/interleaving is considered profitable, LV
will interleave the loop directly. There should be no need to perform
runtime unrolling at a later stage.

Note that we already add metadata to disable runtime unrolling to the
scalar loop after vectorization.

The additional unrolling unnecessarily increases code size and compile
time. In addition to that we have several bug reports of unncessary
runtime unrolling for vectorized loops, e.g. PR40961

Compile-time improvements:

  NewPM-O3: -1.04%
  NewPM-ReleaseThinLTO: -0.59%
  NewPM-ReleaseLTO-g: -0.97%

https://llvm-compile-time-tracker.com/compare.php?from=ce1be13a868d0f8afa367975558c1a6175cce33a&to=78bc2e67f22e9e10e61cdb6cdac4bb857d95eb1b&stat=instructions:u

Fixes #40306.

Reviewed By: lebedev.ri, nikic

Differential Revision: https://reviews.llvm.org/D115261
2023-01-06 10:56:17 +00:00
Sanjay Patel
d5f8878a6e [InstCombine] canonicalize insertelement order based on index
This puts lower insert indexes before higher. This is independent
of endian, so it requires an adjustment to a fold added with
4446f71ce392, but it makes that fold more robust.
That's also where this patch was suggested - D139668.

This matches what we already do in DAGCombiner, but there is one
more constraint because there's an existing canonicalization for
insert-of-scalar-constant. I'm not sure if that is still needed,
so it may be adjusted/removed as a follow-up.
2022-12-18 07:08:48 -05:00
Sanjay Patel
4446f71ce3 [InstCombine] try to fold a pair of insertelements into one insertelement
This replaces patches that tried to convert related patterns to shuffles
(D138872, D138873, D138874 - reverted/abandoned) but caused codegen
problems and were questionable as a canonicalization because an
insertelement is a simpler op than a shuffle.

This detects a larger pattern -- insert-of-insert -- and replaces with
another insert, so this hopefully does not cause any problems.

As noted by TODO items in the code and tests, this could go a lot further.
But this is enough to reduce the motivating test from issue #17113.

Example proofs:
https://alive2.llvm.org/ce/z/NnUv3a

I drafted a version of this for AggressiveInstCombine, but it seems that
would uncover yet another phase ordering gap. If we do generalize this to
handle the full range of potential patterns, that may be worth looking at
again.

Differential Revision: https://reviews.llvm.org/D139668
2022-12-12 10:39:58 -05:00
Sanjay Patel
05dbdb0088 Revert "[InstCombine] canonicalize trunc + insert as bitcast + shuffle, part 1 (2nd try)"
This reverts commit e71b81cab09bf33e3b08ed600418b72cc4117461.

As discussed in the planned follow-on to this patch (D138874),
this and the subsequent patches in this set can cause trouble for
the backend, and there's probably no quick fix. We may even
want to canonicalize in the opposite direction (towards insertelt).
2022-12-08 14:16:46 -05:00
Roman Lebedev
75d1a815c3
[NFC] Port all PhaseOrdering tests to -passes= syntax 2022-12-08 02:38:50 +03:00
Sanjay Patel
e71b81cab0 [InstCombine] canonicalize trunc + insert as bitcast + shuffle, part 1 (2nd try)
The first attempt was reverted because a clang test changed
unexpectedly - the file is already marked with a FIXME, so
I just updated it this time to pass.

Original commit message:
This is the main patch for converting a truncated scalar that is
inserted into a vector to bitcast+shuffle. We could go either way
on patterns like this, but this direction will allow collapsing a
pair of these sequences on the motivating example from issue

The patch is split into 3 parts to make it easier to see the
progression of tests diffs. We allow inserting/shuffling into a
different size vector for flexibility, so there are several test
variations. The length-changing is handled by shortening/padding
the shuffle mask with undef elements.

In part 1, handle the basic pattern:
inselt undef, (trunc T), IndexC --> shuffle (bitcast T), IdentityMask

Proof for the endian-dependency behaving as expected:
https://alive2.llvm.org/ce/z/BsA7yC

The TODO items for handling shifts and insert into an arbitrary base
vector value are implemented as follow-ups.

Differential Revision: https://reviews.llvm.org/D138872
2022-11-30 14:52:20 -05:00
Sanjay Patel
5eacdcff06 Revert "[InstCombine] canonicalize trunc + insert as bitcast + shuffle, part 1"
This reverts commit a4c466766db77cd1fb42d7f98f32bb87a3d38829.
This broke clang tests that are wrongly dependent on the optimizer.
2022-11-30 14:10:50 -05:00
Sanjay Patel
a4c466766d [InstCombine] canonicalize trunc + insert as bitcast + shuffle, part 1
This is the main patch for converting a truncated scalar that is
inserted into a vector to bitcast+shuffle. We could go either way
on patterns like this, but this direction will allow collapsing a
pair of these sequences on the motivating example from issue

The patch is split into 3 parts to make it easier to see the
progression of tests diffs. We allow inserting/shuffling into a
different size vector for flexibility, so there are several test
variations. The length-changing is handled by shortening/padding
the shuffle mask with undef elements.

In part 1, handle the basic pattern:
inselt undef, (trunc T), IndexC --> shuffle (bitcast T), IdentityMask

Proof for the endian-dependency behaving as expected:
https://alive2.llvm.org/ce/z/BsA7yC

The TODO items for handling shifts and insert into an arbitrary base
vector value are implemented as follow-ups.

Differential Revision: https://reviews.llvm.org/D138872
2022-11-30 13:22:04 -05:00
Sanjay Patel
c7bd82dfd8 [PhaseOrdering] add test for vector load combining; NFC
This is another example from issue #17113
2022-11-28 16:00:06 -05:00
Matt Arsenault
1c55cc600e PhaseOrdering: Convert tests to opaque pointers
Required manually running update_test_checks:
  AArch64/hoisting-sinking-required-for-vectorization.ll
  AArch64/peel-multiple-unreachable-exits-for-vectorization.ll
  ARM/arm_mult_q15.ll
  X86/hoist-load-of-baseptr.ll
  X86/spurious-peeling.ll
2022-11-27 21:26:41 -05:00
Sanjay Patel
163bb6d64e [Passes][VectorCombine] enable early run generally and try load folds
An early run of VectorCombine was added with D102496 specifically to
deal with unnecessary vector ops produced with the C matrix extension.
This patch is proposing to try those folds in general and add a pair
of load folds to the menu.

The load transform will partly solve (see PhaseOrdering diffs) a
longstanding vectorization perf bug by removing redundant loads via GVN:
issue #17113

The main reason for not enabling the extra pass generally in the initial
patch was compile-time cost. The cost of VectorCombine was significantly
(surprisingly) improved with:
87debdadaf18
https://llvm-compile-time-tracker.com/compare.php?from=ffe05b8f57d97bc4340f791cb386c8d00e0739f2&to=87debdadaf18f8a5c7e5d563889e10731dc3554d&stat=instructions:u

...so the extra run is going to cost very little now - the total cost of
the 2 runs should be less than the 1 run before that micro-optimization:
https://llvm-compile-time-tracker.com/compare.php?from=5e8c2026d10e8e2c93c038c776853bed0e7c8fc1&to=2c4b68eab5ae969811f422714e0eba44c5f7eefb&stat=instructions:u

It may be possible to reduce the cost slightly more with a few more
earlier-exits like that, but it's probably in the noise based on timing
experiments.

Differential Revision: https://reviews.llvm.org/D138353
2022-11-21 13:57:55 -05:00
Roman Lebedev
8adfa29706
[Pipelines] Introduce SROA after (final, run-time) loop unrolling
Now that we are done with loop unrolling, be it either by LoopVectorizer,
or LoopUnroll passes, some variable-offset GEP's into alloca's could have
become constant-offset, thus enabling SROA and alloca promotion,
yet we don't capitalize on that, which is surprizing.

While it would be good to not introduce one more SROA invocation,
but instead move the one from `PassBuilder::buildFunctionSimplificationPipeline()`,
the existing test coverage says that is a bad idea,
though it would be fine compile-time wise: https://llvm-compile-time-tracker.com/compare.php?from=b150d34c47efbd8fa09604bce805c0920360f8d7&to=5a9a5c855158b482552be8c7af3e73d67fa44805&stat=instructions

So instead, i add yet another SROA run.
I have checked, and it needs to be at least after said final loop unrolling.
This is still fine compile-time wise: https://llvm-compile-time-tracker.com/compare.php?from=70324cd88328c0924e605fa81b696572560aa5c9&to=fb489bbef687ad821c3173a931709f9cad9aee8a&stat=instructions

I've encountered this in a real code, `SROA-after-final-loop-unrolling.ll` has been reduced from https://godbolt.org/z/fsdMhETh3

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D136806
2022-11-17 21:31:30 +03:00
Sanjay Patel
3682180e5f [PhaseOrdering] add test for load combining; NFC
Based on an example in issue #17113
2022-11-17 10:33:35 -05:00
Arthur Eubanks
70dc3b811e [AggressiveInstCombine] Remove legacy PM pass
As part of legacy PM optimization pipeline removal.

This shouldn't be used in codegen pipelines so it should be ok to remove.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D137116
2022-11-15 14:35:15 -08:00
Bjorn Pettersson
893e351f2f [test] Avoid legacy PM default pipelines (O0,O1 etc) when running opt
Two lit tests were found running something like this:
  opt -O<n> -pass-locked-to-legacy-PM ...

The expand-atomicrmw-xchg-fp.ll seem to have used -O1 just to ensure
that the -atomic-expand pass were thinking that it wasn't running at
O0 level. Same thing can be ensured by using the -codegen-opt-level=1
option, making it possible to avoid using O1 in that test case.

In the vector-reductions-expanded.ll test case it was possible to
split the RUN line into using two opt invocations. First running
"opt -O2" using the new PM, and then running "opt -expand-reductions"
using the legacy PM.

I think that given this patch we get closer to removing code related
to 'AddOptimizationPasses' in opt.cpp.

Differential Revision: https://reviews.llvm.org/D137626
2022-11-09 09:57:57 +01:00
Roman Lebedev
e14f30584d
[NFC][PhaseOrdering] Add one more test for SROA after partial unroll
https://reviews.llvm.org/D136806
2022-10-27 19:10:27 +03:00
Roman Lebedev
70324cd883
[NFC][PhaseOrdering] Add new test for SROA misplacement 2022-10-27 03:11:23 +03:00
Yaxun (Sam) Liu
9d5adc7e49 Revert "reland e5581df60a35 [SimplifyCFG] accumulate bonus insts cost"
This reverts commit bd7949bcd86633bd4203b2ba6f891aea00fce4d1.

Revert this patch since reviwers have different opinions regarding
the approach in post-commit review.

Will open RFC for further discussion.

Differential Revision: https://reviews.llvm.org/D132408
2022-10-25 12:15:39 -04:00
Yaxun (Sam) Liu
bd7949bcd8 reland e5581df60a35 [SimplifyCFG] accumulate bonus insts cost
Fixed compile time increase due to always constructing LocalCostTracker.
Now only construct LocalCostTracker when needed.
2022-10-24 15:43:53 -04:00
bipmis
38f3e44997 [AggressiveInstCombine] Load merge the reverse load pattern of consecutive loads.
This patch extends the load merge/widen in AggressiveInstCombine() to handle reverse load patterns.

Differential Revision: https://reviews.llvm.org/D135137
2022-10-19 11:22:58 +01:00
bipmis
82e3056255 Add test for combinations of four i8-loads spliced into a 32-bit value 2022-10-18 15:40:56 +01:00
Nikita Popov
627a0c6b40 [PhaseOrdering] Name instructions in test (NFC)
Run through opt -instnamer.
2022-10-05 17:04:11 +02:00
Simon Pilgrim
5849fcb635 Revert rG1b7089fe67b924bdd5ecef786a34bdba7a88778f "[SLP] Add ScalarizationOverheadBuilder helper to track vector extractions"
Revert rGef89409a59f3b79ae143b33b7d8e6ee6285aa42f "Fix 'unused-lambda-capture' gcc warning. NFCI."
Revert rG926ccfef032d206dcbcdf74ca1e3a9ebf4d1be45 "[SLP] ScalarizationOverheadBuilder - demand all elements for scalarization if the extraction index is unknown / out of bounds"

Revert ScalarizationOverheadBuilder sequence from D134605 - when accumulating extraction costs by Type (instead of specific Value), we are not distinguishing enough when they are coming from the same source or not, and we always just count the cost once. This needs addressing before we can use getScalarizationOverhead properly.
2022-09-30 11:22:48 +01:00