11510 Commits

Author SHA1 Message Date
Nikita Popov
d77f944832 [LoopInfo] Add getOutermostLoop() (NFC)
This is a recurring pattern, add an API function for it.
2022-06-10 11:48:21 +02:00
Philip Reames
f85c5079b8 Pipe potentially invalid InstructionCost through CodeMetrics
Per the documentation in Support/InstructionCost.h, the purpose of an invalid cost is so that clients can change behavior on impossible to cost inputs. CodeMetrics was instead asserting that invalid costs never occurred.

On a target with an incomplete cost model - e.g. RISCV - this means that transformations would crash on (falsely) invalid constructs - e.g. scalable vectors. While we certainly should improve the cost model - and I plan to do so in the near future - we also shouldn't be crashing. This violates the explicitly stated purpose of an invalid InstructionCost.

I updated all of the "easy" consumers where bailouts were locally obvious. I plan to follow up with loop unroll in a following change.

Differential Revision: https://reviews.llvm.org/D127131
2022-06-09 15:17:24 -07:00
Florian Hahn
20d798bd47
Recommit "[SCEV] Look through single value PHIs." (take 3)
This reverts commit 1fbdbb559569641f6d509b569966901c8fb02b63.

All known issues surfaced by this patch should have been fixed now.

The fixes included fixing issues with SCEV expansion in LV and DA's
reliance on LCSSA phis.
2022-06-09 15:20:10 +01:00
Simon Moll
b8c2781ff6 [NFC] format InstructionSimplify & lowerCaseFunctionNames
Clang-format InstructionSimplify and convert all "FunctionName"s to
"functionName".  This patch does touch a lot of files but gets done with
the cleanup of InstructionSimplify in one commit.

This is the alternative to the less invasive clang-format only patch: D126783

Reviewed By: spatel, rengolin

Differential Revision: https://reviews.llvm.org/D126889
2022-06-09 16:10:08 +02:00
Mircea Trofin
b8c39eb275 Fix FunctionPropertiesAnalysis updating callsite in 1-BB loop
If the callsite is in a single BB loop, we need to exclude the BB from
the successor set (in which it'd be a member), because that set forms a
boundary at which we stop traversing the CFG, when re-ingesting BBs
after inlining; but after inlining, the callsite BB's new successors
should be visited.

Reviewed By: kazu

Differential Revision: https://reviews.llvm.org/D127178
2022-06-08 14:32:00 -07:00
Jin Xin Ng
a3a7826d82 [mlgo] Disable accounting upon ForceStop
Once ForceStop is set to true, we only return positive inlining advice when it is mandatory; There is no need for further node/edge accounting.

Reviewed By: mtrofin

Differential Revision: https://reviews.llvm.org/D127245
2022-06-08 14:26:06 -07:00
Bardia Mahjour
c9677f6db4 [DA] Handle mismatching loop levels by considering them non-linear
To represent various loop levels within a nest, DA implements a special
numbering scheme (see comment atop establishNestingLevels). The goal of
this numbering scheme appears to be representing each unique loop
distinctively by using as little memory as possible. This numbering
scheme is simple when the source and destination of the dependence are
in the same loop. In such cases the level is simply the depth of the
loop in which src and dst reside. When the src and dst are not in the
same loop, we could run into the following situation exposed by
https://reviews.llvm.org/D71539. This patch fixes this by detecting
such cases in checkSubscripts and treating them as non-linear/non-affine.

Reviewed By: Meinersbur

Differential Revision: https://reviews.llvm.org/D110973
2022-06-08 11:15:37 -04:00
Chuanqi Xu
0e10f12844 [NFC] Remove commented cerr debugging loggings
There are some unused cerr debugging loggings in the codes. It is weird
to remain such commented debug helpers in the product.
2022-06-08 15:58:06 +08:00
Philip Reames
8a0cd23326 Revert "[MemDep][NFCI] Remove redundant dyn_cast, replace with cast"
This reverts commit 180d3f251d1ad5473705d3f00e6d426b5f8162e6.  This commit is simply wrong.  IsLoad is set within the same file based on modref state, not whether the instruction is a LoadInst.

This went uncaught because cast<Ty>(X) has been broken.  See https://discourse.llvm.org/t/cast-x-is-broken-implications-and-proposal-to-address/63033 for context.
2022-06-07 13:21:31 -07:00
William Huang
ba26e45ca9 [ValueTracking] Add support to deduce a PHI node being a power of 2 if each incoming value is a power of 2.
Reviewed By: davidxl

Differential Revision: https://reviews.llvm.org/D124889
2022-06-07 18:52:31 +00:00
Fangrui Song
d86a206f06 Remove unneeded cl::ZeroOrMore for cl::opt/cl::list options 2022-06-05 00:31:44 -07:00
Fangrui Song
36c7d79dc4 Remove unneeded cl::ZeroOrMore for cl::opt options
Similar to 557efc9a8b68628c2c944678c6471dac30ed9e8e.
This commit handles options where cl::ZeroOrMore is more than one line below
cl::opt.
2022-06-04 00:10:42 -07:00
Fangrui Song
557efc9a8b [llvm] Remove unneeded cl::ZeroOrMore for cl::opt options. NFC
Some cl::ZeroOrMore were added to avoid the `may only occur zero or one times!`
error. More were added due to cargo cult. Since the error has been removed,
cl::ZeroOrMore is unneeded.

Also remove cl::init(false) while touching the lines.
2022-06-03 21:59:05 -07:00
Max Kazantsev
8555e59a71 [NFC][MemDep] Remove unnecessary Worklist.clear
This execution path leads to return 'false' where the Worklist
will be deallocated anyways. No need to clear it separately.
2022-06-03 12:31:44 +07:00
Florian Hahn
78c6b1488f
[CaptureTracking] Increase limit and use it for all visited uses.
Currently the MaxUsesToExplore limit only applies to the number of users
per value, not the total number of users to explore.

The current limit of 20 pessimizes IR with opaque pointers in some
cases. Without opaque pointers, we have deeper pointer def-use chains in
general due to extra bitcasts and geps for structs with index 0.

With opaque pointers the def-use chain is not as deep but wider, due to
bitcasts & 0-geps missing.

To improve the situation for opaque pointers, this patch does 2 things:

 1. Apply the limit to the total number of uses visited. From the
    wording in the description of the option it seems like this may be
    the original intention. With the current implementation we could
    still end up walking a lot of uses.
 2. Increase the limit to 100. This is quite arbitrary, but enables
    a good number of additional optimizations.

Those adjustments have a noticeable compile-time impact though. In part
that is likely due to additional transformations (and conversely
the current baseline misses optimizations after switching to opaque
pointers).

This recovers some regressions that showed up after enabling opaque
pointers.

Limit=100:

* NewPM-O3: +0.21%
* NewPM-ReleaseThinLTO: +0.87%
* NewPM-ReleaseLTO-g: +0.46%

https://llvm-compile-time-tracker.com/compare.php?from=2e50ecb2ef4e1da1aeab05bcf66380068e680991&to=7e6fbe519d958d09f32f01d5d44a622f551e2031&stat=instructions

Limit=60:

* NewPM-O3: +0.14%
* NewPM-ReleaseThinLTO: +0.41%
* NewPM-ReleaseLTO-g: +0.21%

https://llvm-compile-time-tracker.com/compare.php?from=aeb19817d66f1a15754163c7f48e01e9ebdd6d45&to=520563fdc146319aae90d06f88d87f2e9e1247b7&stat=instructions

Limit=40:
* NewPM-O3: +0.11%
* NewPM-ReleaseThinLTO: +0.12%
* NewPM-ReleaseLTO-g: +0.09%

https://llvm-compile-time-tracker.com/compare.php?from=aeb19817d66f1a15754163c7f48e01e9ebdd6d45&to=c9182576e9fe3f1c84a71479665aef91a416318c&stat=instructions

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D126236
2022-06-02 21:43:58 +01:00
Mingming Liu
8601f269f1 [Inline][Remark][NFC] Optionally provide inline context to inline
advisor.

This patch has no functional change, and merely a preparation patch for
main functional change. The motivating use case is to annotate inline
remark pass name with context information (e.g. prelink or postlink,
CGSCC or always-inliner), see D125495 for more details.

Differential Revision: https://reviews.llvm.org/D126824
2022-06-02 13:14:30 -07:00
Alexander Kornienko
7aa8a67882 Revert "[LAA] Initial support for runtime checks with pointer selects."
This reverts commit 5890b30105999a137e72e42f3760bebfd77001ca as per discussion
on the review thread: https://reviews.llvm.org/D114487#3547560.
2022-06-01 15:24:27 +02:00
Nikita Popov
03aceab08b [ValueTracking] Enable -branch-on-poison-as-ub by default
Now that SimpleLoopUnswitch and other transforms no longer introduce
branch on poison, enable the -branch-on-poison-as-ub option by
default. The practical impact of this is mostly better flag
preservation in SCEV, and some freeze instructions no longer being
necessary.

Differential Revision: https://reviews.llvm.org/D125299
2022-06-01 10:46:06 +02:00
Mircea Trofin
f46dd19b48 [mlgo] Incrementally update FunctionPropertiesInfo during inlining
Re-computing FunctionPropertiesInfo after each inlining may be very time
consuming: in certain cases, e.g. large caller with lots of callsites,
and when the overall IR doesn't increase (thus not tripping a size bloat
threshold).

This patch addresses this by incrementally updating
FunctionPropertiesInfo.

Differential Revision: https://reviews.llvm.org/D125841
2022-05-31 17:27:32 -07:00
Mehdi Amini
aff271930e Fix warning for unused variable in the non-assert build (NFC) 2022-05-30 16:21:38 +00:00
Simon Moll
18c1ee04de Re-land "[VP] vp intrinsics are not speculatable" with test fix
Update the llvmir-intrinsics.mlir test to account for the modified
attribute sets.

This reverts commit 2e2a8a2d9082250e4aad312c6008a526f2b007c7.
2022-05-30 14:41:15 +02:00
Mehdi Amini
2e2a8a2d90 Revert "[VP] vp intrinsics are not speculatable"
This reverts commit 78a18d2b54e7e8e0e2c1d1cb33d015d7f69b8cc7.

Break MLIR bot: https://lab.llvm.org/buildbot/#/builders/61/builds/27127
2022-05-30 12:26:16 +00:00
Max Kazantsev
7e5a730473 [MemDep][NFC] Remove duplicating check in if and else branch
Same check is done whether the condition is true or false. Just hoist
it out of conditional.
2022-05-30 17:43:00 +07:00
Simon Moll
78a18d2b54 [VP] vp intrinsics are not speculatable
VP intrinsics show UB if the %evl parameter is out of bounds - they must
not carry the speculatable attribute.  The out-of-bounds UB disappears
when the %evl parameter is expanded into the mask or expansion replaces
the entire VP intrinsic with non-VP code.

This patch
- Removes the speculatable attribute on all VP intrinsics.
- Generalizes the isSafeToSpeculativelyExecute function to let VP
  expansion know whether the VP intrinsic replacement will be
  speculatable.  VP expansion may only discard %evl where this is the
  case.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D125296
2022-05-30 12:20:05 +02:00
Max Kazantsev
180d3f251d [MemDep][NFCI] Remove redundant dyn_cast, replace with cast
When `IsLoad` is `true`, we don't need to check if the instruction
is actually a load with dyn_cast. Saves some petty amount of CT.
2022-05-30 17:17:55 +07:00
eopXD
6a84579243 [LSR][TTI][PowerPC][SystemZ][X86] Add const-ness to TTI::isLSRCostLess. NFC
Reviewed By: Meinersbur

Differential Revision: https://reviews.llvm.org/D126350
2022-05-27 15:22:23 -07:00
William Huang
35b0955aa5 [ValueTracking] Added support to deduce PHI Nodes values being a power of 2
Add Value Tracking support to deduce induction variable being a power of 2, allowing urem optimizations

Reviewed By: davidxl

Differential Revision: https://reviews.llvm.org/D126018
2022-05-26 20:30:31 +00:00
Florian Hahn
6af5f5697c
[SCEV] Collect conditions from assumes same way as for branches.
Also collect conditions from assume up-front in applyLoopGuards.
This allows re-using the logic to handle logical ANDs as assume
conditions.

It should should pave the road for a fix for #55645.
2022-05-26 18:17:13 +01:00
serge-sans-paille
fb67d683db [iwyu] Handle regressions in libLLVM header include
Running iwyu-diff on LLVM codebase since 7030654296a0416bd9402a0278 detected a few
regressions, fixing them.

Differential Revision: https://reviews.llvm.org/D126417
2022-05-26 08:12:34 +02:00
Nikita Popov
8a6698b523 [ValueTracking] Loads with !dereferenceable metadata cannot be undef/poison
A load with !dereferenceable or !dereferenceable_or_null metadata
must return a well-defined (non-undef/poison) value. Effectively
they imply !noundef. This is the same as we do for the
dereferenceable(N) attribute.

This should fix https://github.com/llvm/llvm-project/issues/55672,
or at least the specific case discussed there.

Differential Revision: https://reviews.llvm.org/D126296
2022-05-25 09:54:04 +02:00
Sanjay Patel
e8c20d995b [IR] add and use pattern match specialization for sqrt intrinsic; NFC
This was included in D126190 originally, but it's
independent and a useful change for readability.
2022-05-23 14:16:30 -04:00
Jingu Kang
bb82f74612 Revert "Revert "[AArch64] Set maximum VF with shouldMaximizeVectorBandwidth""
This reverts commit 42ebfa8269470e6b1fe2de996d3f1db6d142e16a.

The commmit from https://reviews.llvm.org/D125918 has fixed the stage 2 build
failure.

Differential Revision: https://reviews.llvm.org/D118979
2022-05-23 16:15:45 +01:00
Peter Waller
ade47bdc31 [LV] Improve register pressure estimate at high VFs
Previously, `getRegUsageForType` was implemented using
`getTypeLegalizationCost`.  `getRegUsageForType` is used by the loop
vectorizer to estimate the register pressure caused by using a vector
type.  However, `getTypeLegalizationCost` currently only appears to
understand splitting and not scalarization, so significantly
underestimates the register requirements.

Instead, use `getNumRegisters`, which understands when scalarization
can occur (via computeRegisterProperties).

This was discovered while investigating D118979 (Set maximum VF with
shouldMaximizeVectorBandwidth), where under fixed-length 512-bit SVE the
loop vectorizer previously ends up costing an v128i1 as 2 v64i*
registers where it actually occupies 128 i32 registers.

I'm sending this patch early for comment, I'm still doing some sanity checking
with LNT.  I note that getRegisterClassForType appears to return VectorRC even
though the type in question (large vNi1 types) end up occupying scalar
registers. That might be worth fixing too.

Differential Revision: https://reviews.llvm.org/D125918
2022-05-23 07:57:45 +00:00
Nikita Popov
c8b675eaa1 [SCEV] Use umin_seq for BECount of multi-exit loops
When computing the BECount for multi-exit loops, we need to combine
individual exit counts using umin_seq rather than umin. This is
because an earlier exit may exit on the first iteration, in which
case later exit expressions will not be evaluated and could be
poisonous. We cannot propagate potential poison values from later
exits.

In particular, this avoids the introduction of "branch on poison"
UB when optimizing multi-exit loops.

Differential Revision: https://reviews.llvm.org/D124910
2022-05-21 15:48:14 +02:00
Craig Topper
f2df53b750 [InstructionSimplify] Remove multiple 'break' after 'return'. NFC 2022-05-20 10:23:57 -07:00
Nico Weber
304a5a7a14 Revert "[ValueTracking] Added support to deduce PHI Nodes values being a power of 2"
This reverts commit d5c130f17e503e128b8a413c2ce0e522987d2a16.
Breaks tests, see https://reviews.llvm.org/D125332#3525819
2022-05-19 15:05:30 -04:00
William Huang
d5c130f17e [ValueTracking] Added support to deduce PHI Nodes values being a power of 2
Add Value Tracking support to deduce induction variable being a power of 2, allowing urem optimizations

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D125332
2022-05-19 18:39:13 +00:00
Jay Foad
6bec3e9303 [APInt] Remove all uses of zextOrSelf, sextOrSelf and truncOrSelf
Most clients only used these methods because they wanted to be able to
extend or truncate to the same bit width (which is a no-op). Now that
the standard zext, sext and trunc allow this, there is no reason to use
the OrSelf versions.

The OrSelf versions additionally have the strange behaviour of allowing
extending to a *smaller* width, or truncating to a *larger* width, which
are also treated as no-ops. A small amount of client code relied on this
(ConstantRange::castOp and MicrosoftCXXNameMangler::mangleNumber) and
needed rewriting.

Differential Revision: https://reviews.llvm.org/D125557
2022-05-19 11:23:13 +01:00
Philip Reames
f7988d08a8 Revert "[BasicAA] Remove unneeded special case for malloc/calloc"
This reverts commit 9b1e00738c5ddba681e17e5cb7c260d9afc4c3a7.

Nikic reported in commit thread that I had forgotten history here, and that a) we'd tried this before, and b) had to revert due to an unexpected codegen impact.  Current measurements confirm the same issue still exists.
2022-05-18 07:35:27 -07:00
NAKAMURA Takumi
6ca7eb2c6d [SCEV] Part 1, Serialize function calls in function arguments.
Evaluation odering in function call arguments is implementation-dependent.
In fact, gcc evaluates bottom-top and clang does top-bottom.

Fixes #55283 partially.

Part of https://reviews.llvm.org/D125627
2022-05-18 23:20:08 +09:00
Philip Reames
9b1e00738c [BasicAA] Remove unneeded special case for malloc/calloc
This code pre-exists the generic handling for inaccessiblememonly.  If we remove it and update one test with inaccessiblememonly, nothing else changes.  Note that simply running O1 on that test would annotate malloc with the missing inaccessiblememonly.
2022-05-17 20:45:14 -07:00
Nikita Popov
b9b71c2b87 [LVI] Compute range for xor
We do have a non-trivial implementation for binaryXor() now.
2022-05-17 10:18:38 +02:00
Yang Keao
7dce9eb6e5 [DomPrinter] Migrate -dot-dom to the new pass manager.
In D123677, @YangKeao provided an implementation of `DOTGraphTraits{Viewer,Printer}` in the new pass manager. This commit migrates the `DomPrinter` and `DomViewer` to the new pass manager.

Reviewed By: Meinersbur

Differential Revision: https://reviews.llvm.org/D124904
2022-05-16 15:07:16 -05:00
Nikita Popov
356d47ccb9 [ValueTracking] Handle and/or on RHS of isImpliedCondition()
isImpliedCondition() currently handles and/or on the LHS, but not
on the RHS, resulting in asymmetric behavior. This patch adds two
new implication rules:

 * LHS ==> (RHS1 || RHS2) if LHS ==> RHS1 or LHS ==> RHS2
 * LHS ==> !(RHS1 && RHS2) if LHS ==> !RHS1 or LHS ==> !RHS2

Differential Revision: https://reviews.llvm.org/D125551
2022-05-16 16:30:26 +02:00
Florian Hahn
b7315ffc3c
[LAA,LV] Add initial support for pointer-diff memory checks.
This patch adds initial support for a pointer diff based runtime check
scheme for vectorization. This scheme requires fewer computations and
checks than the existing full overlap checking, if it is applicable.

The main idea is to only check if source and sink of a dependency are
far enough apart so the accesses won't overlap in the vector loop. To do
so, it is sufficient to compute the difference and compare it to the
`VF * UF * AccessSize`. It is sufficient to check
`(Sink - Src) <u VF * UF * AccessSize` to rule out a backwards
dependence in the vector loop with the given VF and UF. If Src >=u Sink,
there is not dependence preventing vectorization, hence the overflow
should not matter and using the ULT should be sufficient.

Note that the initial version is restricted in multiple ways:

1. Pointers must only either be read or written, by a single
   instruction (this allows re-constructing source/sink for
   dependences with the available information)
 2. Source and sink pointers must be add-recs, with matching steps
 3. The step must be a constant.
 3. abs(step) == AccessSize.

Most of those restrictions can be relaxed in the future.

See https://github.com/llvm/llvm-project/issues/53590.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D119078
2022-05-16 15:27:22 +01:00
NAKAMURA Takumi
da7d8de1e4 ScalarEvolution.cpp: Reformat. 2022-05-15 20:51:27 +09:00
Sanjay Patel
ee6754c277 [ValueTracking] recognize sub X, (X % Y) as not overflowing
I fixed some poison-safety violations on related patterns in InstCombine
and noticed that we missed adding nsw/nuw on them, so this adds clauses
to the underlying analysis for that.

We need the undef input restriction to make this safe according to Alive2:
https://alive2.llvm.org/ce/z/48g9K8

Differential Revision: https://reviews.llvm.org/D125500
2022-05-13 09:59:41 -04:00
Nikita Popov
ddfee07519 [InstSimplify] Fold and/or using implied conditions
This adds two conjugated folds:

 * A | B -> B if A implies B (https://alive2.llvm.org/ce/z/R6GU4j)
 * A & B -> A if A implies B (https://alive2.llvm.org/ce/z/EGMqyy)

If A and B are icmps themselves, we will usually fold this through
other logic already (though the tests show a couple additional cases
we previously missed). However, isImpliedCond() also supports A
being of the form X & Y, which allows us to handle cases like
(X & Y) | B where X implies B. This addresses the regression from
D125398.

Something that notably doesn't work yet is the (X | Y) & B case.
This is due to an asymmetry in the isImpliedCondition()
implementation that will have to be addressed separately.

Differential Revision: https://reviews.llvm.org/D125530
2022-05-13 15:09:14 +02:00
Florian Hahn
5890b30105
[LAA] Initial support for runtime checks with pointer selects.
Scaffolding support for generating runtime checks for multiple SCEV expressions
per pointer. The initial version just adds support for looking through
a single pointer select.

The more sophisticated logic for analyzing forks is in D108699

Reviewed By: huntergr

Differential Revision: https://reviews.llvm.org/D114487
2022-05-12 19:33:48 +01:00
Arthur Eubanks
7e0802aeb5 [BasicAA] Fix order in which we pass MemoryLocations to alias()
D98718 caused the order of Values/MemoryLocations we pass to alias() to
be significant due to storing the offset in the PartialAlias case. But
some callers weren't audited and were still passing swapped arguments,
causing the returned PartialAlias offset to be negative in some
cases. For example, the newly added unittests would return -1
instead of 1.

Fixes #55343, a miscompile.

Reviewed By: asbirlea, nikic

Differential Revision: https://reviews.llvm.org/D125328
2022-05-10 12:05:38 -07:00