12358 Commits

Author SHA1 Message Date
Nikita Popov
fd63a7d5c8 Revert "ValueTracking: Handle freeze in computeKnownFPClass"
This reverts commit 2c8d0048f03d054f13909a26f959ef95b2a0a4de.

This is incorrect: computeKnownFPClass() is only known up to
poison, and freeze poison may have any FP class.
2023-04-17 12:59:23 +02:00
pvanhout
ae77aceba5 [Analysis] Remove DA & LegacyDA
UniformityAnalysis offers all of the same features and much more, so there is no reason left to use the legacy DAs.
See RFC: https://discourse.llvm.org/t/rfc-deprecate-divergenceanalysis-legacydivergenceanalysis/69538

- Remove LegacyDivergenceAnalysis.h/.cpp
- Remove DivergenceAnalysis.h/.cpp + Unit tests
- Remove SyncDependenceAnalysis - it was not a real registered analysis and was only used by DAs
- Remove/adjust references to the passes in the docs where applicable
- Remove TTI hook associated with those passes.
- Move tests to UniformityAnalysis folder.
  - Remove RUN lines for the DA, leave only the UA ones.
- Some tests had to be adjusted/removed depending on how they used the legacy DAs.

Reviewed By: foad, sameerds

Differential Revision: https://reviews.llvm.org/D148116
2023-04-17 09:01:22 +02:00
Noah Goldstein
f688d215e5 [ValueTracking] Add shl nsw %val, %cnt != 0 if %val != 0.
Alive2 Link: https://alive2.llvm.org/ce/z/mxZLJn

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D147898
2023-04-14 18:23:47 -05:00
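The `shl nsw` rule in f688d215e5 can be sanity-checked by brute force at a small bit width. The sketch below is not LLVM code: the helper names and the 8-bit width are assumptions chosen for illustration, and poison is modelled simply as `None`.

```python
BITS = 8  # assumed small width so the exhaustive check runs instantly


def to_signed(x, bits=BITS):
    """Interpret the low `bits` bits of x as a two's-complement integer."""
    x &= (1 << bits) - 1
    return x - (1 << bits) if x >> (bits - 1) else x


def shl_nsw(val, cnt, bits=BITS):
    """Return val << cnt as a signed value, or None where nsw would give poison."""
    if cnt >= bits:
        return None  # oversized shift count
    exact = to_signed(val, bits) << cnt
    wrapped = to_signed(exact, bits)
    return wrapped if wrapped == exact else None  # None on signed overflow


def counterexamples(bits=BITS):
    """Collect (val, cnt) pairs where val != 0 but a non-poison shl nsw yields 0."""
    found = []
    for val in range(1, 1 << bits):  # every non-zero bit pattern
        for cnt in range(bits):
            if shl_nsw(val, cnt, bits) == 0:
                found.append((val, cnt))
    return found
```

Under these assumptions `counterexamples()` comes back empty: whenever the shift does not overflow, the exact product `val * 2^cnt` is preserved, so a non-zero input cannot produce zero.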
Noah Goldstein
684963b86d [ValueTracking] Use maximum shift count in shl when determining if shl can be zero.
Previously we only returned `shl` as non-zero if the shift value was `1`. We
can expand this if we have some bounds on the shift count.

For example:
    ```
    %cnt = and %c, 16 ; Max cnt == 16
    %val = or %v, 4 ; val[2] is known one
    %shl = shl %val, %cnt ; (val.known.one << cnt.maxval) != 0
    ```

Differential Revision: https://reviews.llvm.org/D147897
2023-04-14 18:23:45 -05:00
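The reasoning in 684963b86d (if the known-one bits survive the maximum shift count, they survive every smaller count) can be checked exhaustively. This is a hedged sketch, not the ValueTracking implementation: the function names and the 6-bit width are assumptions made to keep the search small.

```python
BITS = 6  # assumed width for the exhaustive check
MASK = (1 << BITS) - 1


def shl_known_nonzero(known_one, max_cnt):
    """The test from the commit message: (val.known.one << cnt.maxval) != 0."""
    return ((known_one << max_cnt) & MASK) != 0


def rule_holds():
    """Check that the test never claims non-zero for a shift that can be zero."""
    for known_one in range(1 << BITS):
        for max_cnt in range(BITS):
            if not shl_known_nonzero(known_one, max_cnt):
                continue
            for val in range(1 << BITS):
                if (val & known_one) != known_one:
                    continue  # val must have every known-one bit set
                for cnt in range(max_cnt + 1):
                    if ((val << cnt) & MASK) == 0:
                        return False
    return True
```

With a known-one bit at position 2, a maximum count of 3 keeps the bit inside a 6-bit value, while a maximum count of 4 shifts it out, so the test only succeeds in the first case.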
Matt Arsenault
2c8d0048f0 ValueTracking: Handle freeze in computeKnownFPClass 2023-04-14 17:53:41 -04:00
Matt Arsenault
49b931bdc5 ValueTracking: Implement computeKnownFPClass for arithmetic.fence 2023-04-14 17:41:27 -04:00
Matt Arsenault
3dabcdc78b ValueTracking: Implement computeKnownFPClass for llvm.trunc 2023-04-14 17:41:26 -04:00
Matt Arsenault
656b52a6c6 ValueTracking: Handle non-splat vectors in computeKnownFPClass
Avoids some regressions when the implementation of isKnownNeverNaN is
replaced with computeKnownFPClass.
2023-04-14 17:41:26 -04:00
Matt Arsenault
e2d68c2fa4 ValueTracking: Implement computeKnownFPClass for canonicalize 2023-04-14 16:17:55 -04:00
Matt Arsenault
cb022084f0 ValueTracking: Handle fptrunc in computeKnownFPClass
Handle nan.
2023-04-14 14:36:56 -04:00
Matt Arsenault
a517b4ad2d InstSimplify: Perform cheaper check first 2023-04-14 14:36:56 -04:00
Matt Arsenault
409ef45000 ValueTracking: Handle extractelement and extractvalue in computeKnownFPClass 2023-04-14 14:36:56 -04:00
Matt Arsenault
c603fd2f39 ValueTracking: Implement computeKnownFPClass for sin/cos 2023-04-14 14:36:55 -04:00
Bjorn Pettersson
40c60c025c [Passes] Remove the legacy DemandedBitsWrapperPass
Last user of DemandedBitsWrapperPass was the BDCE pass. Since
the legacy PM version of BDCE was removed in an earlier commit, this
patch removes the now unused DemandedBitsWrapperPass.

Differential Revision: https://reviews.llvm.org/D148336
2023-04-14 18:56:20 +02:00
Dmitry Makogon
e08f9894ec [SCEV] Preserve NSW for AddRec multiplied by -1 if it cannot be signed minimum
This preserves NSW flag for AddRecs multiplied by -1 if we can prove
via constant ranges that the AddRec cannot be signed minimum.

An explanation:
Let M be signed minimum. If AddRec's range contains M, then M * (-1) will
stay M and (M + 1) * (-1) will be signed maximum, so we get a signed overflow.
In all other cases if an AddRec didn't signed overflow,
then AddRec * (-1) wouldn't either.

Differential Revision: https://reviews.llvm.org/D148084
2023-04-14 19:36:56 +07:00
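The signed-minimum corner case described in e08f9894ec can be illustrated with wrapping arithmetic. The following is only a sketch at an assumed 8-bit width with hypothetical helper names; it does not model SCEV itself.

```python
BITS = 8  # assumed width for illustration
SMIN = -(1 << (BITS - 1))
SMAX = (1 << (BITS - 1)) - 1


def wrap(x):
    """Wrap an integer to BITS-bit two's complement."""
    x &= (1 << BITS) - 1
    return x - (1 << BITS) if x >> (BITS - 1) else x


def neg_overflows(x):
    """True when x * (-1) does not fit in BITS bits as a signed value."""
    return wrap(-x) != -x


# The only value whose negation overflows is the signed minimum.
OVERFLOWING = [x for x in range(SMIN, SMAX + 1) if neg_overflows(x)]
```

Here `wrap(-SMIN)` stays at `SMIN` while `wrap(-(SMIN + 1))` is `SMAX`, which is the jump the commit message refers to. Every other value negates without overflow.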
Nikita Popov
62ef97e063 [llvm-c] Remove PassRegistry and initialization APIs
Remove C APIs for interacting with PassRegistry and pass
initialization. These are legacy PM concepts, and are no longer
relevant for the new pass manager.

Calls to these initialization functions can simply be dropped.

Differential Revision: https://reviews.llvm.org/D145043
2023-04-14 12:12:48 +02:00
Nikita Popov
0b88adacd6 [InstSimplify] Add MaxRecurse argument to simplifyInstructionWithOperands (NFC) 2023-04-14 11:19:19 +02:00
Nikita Popov
c508e93327 [InstSimplify] Remove unused ORE argument (NFC) 2023-04-14 10:38:32 +02:00
Florian Hahn
7fc0b3049d [VPlan] Switch to checking sinking legality for recurrences in VPlan.
Building on D142885 and D142589, retire the SinkAfter map from the
recurrence handling code. It is replaced by checking whether it is
possible to sink all users of a recurrence directly in VPlan. This
results in simpler code overall and allows handling additional cases
(see the improvements in @test_crash).

Depends on D142885.
Depends on D142589.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D142886
2023-04-13 22:00:52 +01:00
Matt Arsenault
054cac104f ValueTracking: Address todo for nan fmul handling in computeKnownFPClass
If both operands can't be zero or nan, the result can't be nan.
2023-04-13 14:44:34 -04:00
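The claim in 054cac104f (no zero and no NaN among the operands means no NaN result for `fmul`) can be spot-checked with IEEE doubles. This is a small sketch over an assumed sample set, not a proof and not the computeKnownFPClass logic.

```python
import math

# Assumed sample set: infinities, normals, huge values and subnormals,
# but deliberately no zero and no NaN.
SAMPLES = [math.inf, -math.inf, 1.0, -1.0, 2.5, 1e300, -1e300, 5e-324, -5e-324]


def nan_products(values):
    """Return every operand pair whose product is NaN."""
    return [(a, b) for a in values for b in values if math.isnan(a * b)]
```

For these samples `nan_products(SAMPLES)` is empty. The excluded case is the one that matters: `math.inf * 0.0` is NaN, which is why zero has to be ruled out along with NaN.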
Matt Arsenault
4d044bfb33 ValueTracking: Handle no-nan check for computeKnownFPClass for fmul
Copy the logic from isKnownNeverNaN for fadd/fsub. Leave the
extension to handle the zero case for a future change.
2023-04-13 14:44:34 -04:00
Simon Pilgrim
fb8038db73 [TTI] getExtendedReductionCost - replace std::optional<FastMathFlags> args with FastMathFlags
Followup to D148149 where it was noticed that the std::optional wrapper wasn't helping with anything (we can just use an empty FastMathFlags()).
2023-04-13 11:26:28 +01:00
Simon Pilgrim
9e30b87afb [TTI] getMinMaxReductionCost - add FastMathFlag argument
Similar to the getArithmeticReductionCost / getExtendedReductionCost calls (which really don't need to use std::optional<>).

This will be necessary to correctly recognize fast/nnan fmax/fmul reductions which can avoid nan handling - which will allow us to remove the fmax/fmin special case in X86TTIImpl::getMinMaxCost and use getIntrinsicInstrCost like we do for integer reductions (63c3895327839ba5b57f5b99ec9e888abf976ac6).

Differential Revision: https://reviews.llvm.org/D148149
2023-04-13 10:42:42 +01:00
Matt Arsenault
6aca400986 ValueTracking: Handle no-nan check for computeKnownFPClass for fadd/fsub
Copy the logic from isKnownNeverNaN for fadd/fsub.
2023-04-12 06:48:58 -04:00
Matt Arsenault
eb8e43a2a1 ValueTracking: Remove outdated todo 2023-04-12 06:48:58 -04:00
Mircea Trofin
f3b5fca12a [mlgo] Fix the help message for interactive mode default advice
This avoids the use-after-free introduced by D147794 and fixed
in 437dfa5b0365.
2023-04-11 13:04:11 -07:00
Michael Liao
72fc08a541 [InstCombine] Teach alloca replacement to handle addrspacecast
- As the address space cast may not be valid on a specific target,
  `addrspacecast` is not handled when an `alloca` is able to be replaced
  with the source of memcpy/memmove. This patch addresses that by
  querying a target hook on whether that address space cast is valid.
  For example, on most GPU targets, the cast from a global pointer to a
  generic pointer is valid.
- If that cast is allowed (by querying `isValidAddrSpaceCast`), the
  replacement is enhanced to handle that `addrspacecast` as well.

Reviewed By: yaxunl

Differential Revision: https://reviews.llvm.org/D147025
2023-04-11 11:47:37 -04:00
John McIver
03dcd9da1a [InstCombine] Allow splats with poison/undef in llvm::decomposeBitTestICmp
This change is made to enable conversion of a masked icmp splat vector
containing poison/undef to an equality expression.

llvm::decomposeBitTestICmp Alive2 correctness examples using splat/masking vectors:

    SLT <    https://alive2.llvm.org/ce/z/pPTTHh
    SLE <=   https://alive2.llvm.org/ce/z/qQhAmU
    SGT >    https://alive2.llvm.org/ce/z/koFHzF
    SGE >=   https://alive2.llvm.org/ce/z/3SNz2S
    ULT <u   https://alive2.llvm.org/ce/z/W8ktzQ
    ULE <=u  https://alive2.llvm.org/ce/z/G5SdUY
    UGT >u   https://alive2.llvm.org/ce/z/WFwYxq
    UGE >=u  https://alive2.llvm.org/ce/z/DzJszP

Tests have been verified using Alive2:

    icmp-logical.ll: @nomask_splat_and_B_allones       https://alive2.llvm.org/ce/z/zmJwQU
    icmp-logical.ll: @nomask_splat_and_B_mixed         https://alive2.llvm.org/ce/z/ktzgzd
    signed-truncation-check.ll: @positive_vec_undef0   https://alive2.llvm.org/ce/z/-sTRLD

Differential Revision: https://reviews.llvm.org/D143032
2023-04-11 09:03:01 +01:00
Mehdi Amini
437dfa5b03 Fix use-after-free in help message: this cl::opt was binding a StringRef to a temporary string
Caught by ASAN on a bot: https://lab.llvm.org/buildbot/#/builders/168/builds/12872/steps/14/logs/stdio
2023-04-11 00:26:15 -06:00
Joshua Cao
921b8f40e8 [SCEV][NFC] GetMinTrailingZeros switch case and naming cleanup
* combine zext and sext into one switch case
* combine vscale and udiv into one switch case
* renames according to LLVM style
2023-04-10 22:56:29 -07:00
Joshua Cao
898a9ca5e9 [SCEV] Strengthen huge constant trip multiples.
SCEV determines that loops with trip count >=2^32 have a trip multiple
of 1 to guard against huge multiples. This patch strengthens this to
instead find the greatest power of 2 divisor that is less than the
threshold.

Differential Revision: https://reviews.llvm.org/D147868
2023-04-10 20:00:46 -07:00
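The behaviour described in 898a9ca5e9 can be sketched as follows. This is an illustration of the commit message only: `trip_multiple` is a hypothetical name, the 2^32 threshold is taken from the message, and the trip count is assumed to be a positive constant.

```python
THRESHOLD = 1 << 32  # threshold quoted in the commit message


def trip_multiple(trip_count):
    """Sketch: constant trip multiple, capped for huge trip counts."""
    if trip_count < THRESHOLD:
        return trip_count  # small constants are their own multiple
    # Largest power of two dividing trip_count (its lowest set bit).
    low_bit = trip_count & -trip_count
    # Keep the result strictly below the threshold.
    return min(low_bit, THRESHOLD >> 1)
```

For example, a trip count of `2**32 + 8` yields a multiple of 8 under this sketch, where falling back to 1 would discard that information.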
Joshua Cao
569f7e547d [SCEV][NFC] Convert check to assert getSmallConstantTripMultiple() 2023-04-10 19:59:01 -07:00
Joshua Cao
585742cbfc [SCEV] When computing trip count, only zext if necessary
This patch improves on https://reviews.llvm.org/D110587. To summarize
the patch, given backedge-taken count BC, trip count TC is `BC + 1`.
However, adding 1 to BC might overflow. So the patch modifies TC
computation to `1 + zext(BC)`.

This patch only adds the zext if necessary by looking at the constant
range. If we can determine that BC cannot be the max value for its
bitwidth, then we know adding 1 will not overflow, and the zext is not
needed. We apply loop guards before computing TC to get more data.

The primary motivation is to support my work on more precise trip
multiples in https://reviews.llvm.org/D141823. For example:

```
void test(unsigned n) {
  __builtin_assume(n % 6 == 0);
  for (unsigned i = 0; i < n; ++i)
    foo();
}
```

Prior to this patch, we had `TC = 1 + zext(-1 + 6 * ((6 umax %n) /u
6))<nuw>`. SCEV range computation is able to determine that the BC
cannot be the max value, so the zext is not needed. The result is `TC
-> (6 * ((6 umax %n) /u 6))<nuw>`. From here, we would be able to
determine that %n is a multiple of 6.

There was one change in LoopCacheAnalysis/LoopInterchange required.
Before this patch, if a loop has BC = false, it would compute `TC -> 1 +
zext(false) -> 1`, which was fine. After this patch, it computes `TC -> 1
+ false = true`. CacheAnalysis would then sign extend the `true`, which
was not the intended behavior. I modified CacheAnalysis such that
it would only zero extend trip counts.

This patch is not NFC, but also does not change any SCEV outputs. I
would like to get this patch out first to make work with trip multiples
easier.

Differential Revision: https://reviews.llvm.org/D147117
2023-04-10 19:40:52 -07:00
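The idea in 585742cbfc, widening only when `BC + 1` could wrap, can be modelled with plain integers. This is a hedged sketch: the function names are hypothetical, and `bc_max` stands in for whatever upper bound range analysis can prove for BC.

```python
def zext(value, bits):
    """Zero-extend: read the low `bits` bits as an unsigned integer."""
    return value & ((1 << bits) - 1)


def sext(value, bits):
    """Sign-extend: read the low `bits` bits as two's complement."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def trip_count(bc, bits, bc_max=None):
    """Return (trip count, bit width) for a backedge-taken count `bc`."""
    all_ones = (1 << bits) - 1
    if bc_max is None:
        bc_max = all_ones  # nothing known about BC
    if bc_max < all_ones:
        return (bc + 1) & all_ones, bits  # BC + 1 cannot wrap: no zext
    return zext(bc, bits) + 1, bits + 1  # 1 + zext(BC) in a wider type
```

The 1-bit case from the message shows why the extension kind matters: `trip_count(0, 1, bc_max=0)` gives the value 1 in one bit, and `sext(1, 1)` is -1 while `zext(1, 1)` is 1.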
Mircea Trofin
ab2e7666c2 [mlgo][inl] Interactive mode: optionally tell the default decision
This helps training algorithms that may want to sometimes replicate the
default decision. The default decision is presented as an extra feature
called `inlining_default`. It's not normally exported to save
computation time.

This is only available in interactive mode.

Differential Revision: https://reviews.llvm.org/D147794
2023-04-10 12:20:09 -07:00
Max Kazantsev
5b96b13fdf [SCEV] Improve AddRecs' range computation in Expensive Range Sharpening mode
Apply loop guards to AddRec's start in range computation for
non-self-wrapping AddRecs.

According to CT measurements, this has a wide negative compile time impact,
so we hold it in expensive range sharpening mode where it's not so critical.
However, we need to find a way to share benefits of this mode with default mode.

Patch by Aleksandr Popov!

Differential Revision: https://reviews.llvm.org/D147557
Reviewed By: mkazantsev
2023-04-10 16:37:10 +07:00
Joshua Cao
24170fb8cd [SCEV][NFC] Fix Do not use 'else' after 'return'
Follow LLVM coding standards and make clangd emit fewer warnings.
2023-04-08 15:56:08 -07:00
Philip Reames
0437f88b77 [LAA] Cleanup casting in replaceSymbolicStrideSCEV [nfc] 2023-04-06 09:13:55 -07:00
Philip Reames
2d79b71366 [LAA] Continue moving utilities to sole use to isolate symbolic stride reasoning [nfc] 2023-04-06 08:27:57 -07:00
Dávid Bolvanský
e1f94336e9 Revert "[InlineCost] isKnownNonNullInCallee - handle also dereferenceable attribute"
This reverts commit 3b5ff3a67c1f0450a100dca34d899ecd3744cb36.
2023-04-06 16:54:26 +02:00
Dávid Bolvanský
3b5ff3a67c [InlineCost] isKnownNonNullInCallee - handle also dereferenceable attribute 2023-04-06 16:51:28 +02:00
Philip Reames
800a99c4f4 [LAA] Group implementation of stride speculation into one file [nfc]
These utilities are only used in one place, so move them there and make them static.
2023-04-05 20:39:08 -07:00
Philip Reames
c416f6700f [IVDescriptors] Add pointer InductionDescriptors with non-constant strides (try 2)
(JFYI - This has been heavily reframed since original attempt at landing.)

This change updates the InductionDescriptor logic to allow matching a pointer IV with a non-constant stride, but also updates the LoopVectorizer to bail out on such descriptors by default. This preserves the default vectorizer behavior.

In review, it was pointed out that there are multiple unfortunate performance implications which need to be addressed before this can be enabled. Having a flag allows us to exercise the behavior, and write test cases for logic which is otherwise unreachable (or hard to reach).

This will also enable non-constant stride pointer recurrences for other consumers. I've audited said code, and don't see any obvious issues.

Differential Revision: https://reviews.llvm.org/D147336
2023-04-05 09:32:35 -07:00
David Sherwood
b4089cfa2f [NFC][LoopVectorize] Simplify preferPredicateOverEpilogue interface
Given just how many arguments we pass to
preferPredicateOverEpilogue and considering this list may
grow over time I've decided to pass in a pointer to a new
TailFoldingInfo structure instead, similar to what we do
with IntrinsicCostAttributes, etc. In addition, many of the
arguments we pass in are actually available in the
LoopVectorizationLegality class so I've managed to
reduce the set of pointers that we need to pass in the
TailFoldingInfo struct.

Differential Revision: https://reviews.llvm.org/D146127
2023-04-04 14:00:49 +00:00
Craig Topper
1f60c8d025 [IR] Replace calls to ConstantFP::getNullValue with ConstantFP::getZero. NFC
There is no getNullValue in ConstantFP. Due to inheritance, we're calling
Constant::getNullValue which handles any type including FP.
Since we already know we want an FP constant we can use ConstantFP::getZero
which might be faster and is a more readable name for an FP zero.
2023-04-03 23:14:02 -07:00
Noah Goldstein
87c97d052c [InstSimplify] Extend simplifications for (icmp ({z|s}ext X), C) where C is vector
Previous logic only applied for `ConstantInt` which misses all vector
cases. New code works for splat/non-splat vectors as well. No change
to the underlying simplifications.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D147275
2023-04-03 11:04:57 -05:00
Florian Hahn
0d61ffd350 [Loads] Support SCEVAddExpr as start for pointer AddRec.
Extend handling to support `%base + offset` as start for AddRecs in
isDereferenceableAndAlignedInLoop. This is done by adjusting AccessSize
by the offset and effectively checking if the full object starting from
%base to %base + offset + access-size is dereferenceable.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D147260
2023-04-02 12:33:44 +01:00
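The size adjustment described in 0d61ffd350 amounts to a single bounds check. The sketch below is an assumption-laden model, not the isDereferenceableAndAlignedInLoop code: the function and parameter names are hypothetical, `access_size` is taken to be the total number of bytes the loop touches, and only non-negative constant offsets are handled.

```python
def accesses_dereferenceable(deref_bytes, offset, access_size):
    """Sketch: is [base, base + offset + access_size) inside the object?

    deref_bytes -- bytes known dereferenceable starting at base
    offset      -- constant byte offset of the AddRec start from base
    access_size -- bytes accessed by the loop, starting at base + offset
    """
    if offset < 0:
        return False  # negative offsets are outside this sketch
    return offset + access_size <= deref_bytes
```

With 64 dereferenceable bytes and an offset of 16, an access size of 48 passes and 49 does not.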
Nikita Popov
3f53a58597 [ValueTracking] Fix incorrect computeConstantRange() arguments
The second argument is ForSigned, not UseInstrInfo.
2023-03-31 16:56:56 +02:00
David Green
965a090f02 Revert "[IVDescriptors] Add pointer InductionDescriptors with non-constant strides"
Multiple errors have been reported on
https://reviews.llvm.org/rG498aa534f472d28db893aa9a8627d0b46e17f312

Reverting until the correctness issues can be resolved.

We are also seeing a lot of performance differences from the patch.  Some are
looking good, but some are looking pretty bad.
2023-03-31 11:08:50 +01:00
Philip Reames
498aa534f4 [IVDescriptors] Add pointer InductionDescriptors with non-constant strides
This matches the handling for integer IVs.  I left the non-opaque cases alone, mostly because they're largely irrelevant today.

This doesn't actually make much difference in vectorization right now as we immediately fail on aliasing checks (which also bail on non-constant strides).  Slightly surprisingly, it's the cases which *do* need runtime checks which work after this patch as they don't use the same dependency analysis path.

This will also enable non-constant stride pointer recurrences for other consumers.  I've audited said code, and don't see any obvious issues.
2023-03-30 11:56:00 -07:00
Kazu Hirata
236c9217a9 Use Dense{Map,Set}::contains (NFC) 2023-03-29 23:01:11 -07:00