864 Commits

Author SHA1 Message Date
Sanjay Patel
c5a4d80fd4 [ValueTracking][MemCpyOpt] avoid crash on inttoptr with vector pointer type (PR48075) 2020-11-22 12:54:18 -05:00
Kazu Hirata
226beb494c [Analysis] Use llvm::is_contained (NFC) 2020-11-20 18:08:05 -08:00
Hongtao Yu
f3c445697d [CSSPGO] IR intrinsic for pseudo-probe block instrumentation
This change introduces a new IR intrinsic named `llvm.pseudoprobe` for pseudo-probe block instrumentation. Please refer to https://reviews.llvm.org/D86193 for the whole story.

A pseudo probe is used to collect the execution count of the block where the probe is instrumented. This requires a pseudo probe to be persisting. The LLVM PGO instrumentation also instruments in similar places by placing a counter in the form of atomic read/write operations or runtime helper calls. While these operations are very persisting or optimization-resilient, in theory we can borrow the atomic read/write implementation from PGO counters and cut it off at the end of compilation with all the atomics converted into binary data. This was our initial design and we’ve seen promising sample correlation quality with it. However, the atomics approach has a couple issues:

1. IR Optimizations are blocked unexpectedly. Those atomic instructions are not going to be physically present in the binary code, but since they are on the IR till very end of compilation, they can still prevent certain IR optimizations and result in lower code quality.
2. The counter atomics may not be fully cleaned up from the code stream eventually.
3. Extra work is needed for re-targeting.

We choose to implement pseudo probes based on a special LLVM intrinsic, which is expected to have most of the semantics that comes with an atomic operation but does not block desired optimizations as much as possible. More specifically the semantics associated with the new intrinsic enforces a pseudo probe to be virtually executed exactly the same number of times before and after an IR optimization. The intrinsic also comes with certain flags that are carefully chosen so that the places they are probing are not going to be messed up by the optimizer while most of the IR optimizations still work. The core flags given to the special intrinsic is `IntrInaccessibleMemOnly`, which means the intrinsic accesses memory and does have a side effect so that it is not removable, but is does not access memory locations that are accessible by any original instructions. This way the intrinsic does not alias with any original instruction and thus it does not block optimizations as much as an atomic operation does. We also assign a function GUID and a block index to an intrinsic so that they are uniquely identified and not merged in order to achieve good correlation quality.

Let's now look at an example. Given the following LLVM IR:

```
define internal void @foo2(i32 %x, void (i32)* %f) !dbg !4 {
bb0:
  %cmp = icmp eq i32 %x, 0
   br i1 %cmp, label %bb1, label %bb2
bb1:
   br label %bb3
bb2:
   br label %bb3
bb3:
   ret void
}
```

The instrumented IR will look like below. Note that each `llvm.pseudoprobe` intrinsic call represents a pseudo probe at a block, of which the first parameter is the GUID of the probe’s owner function and the second parameter is the probe’s ID.

```
define internal void @foo2(i32 %x, void (i32)* %f) !dbg !4 {
bb0:
   %cmp = icmp eq i32 %x, 0
   call void @llvm.pseudoprobe(i64 837061429793323041, i64 1)
   br i1 %cmp, label %bb1, label %bb2
bb1:
   call void @llvm.pseudoprobe(i64 837061429793323041, i64 2)
   br label %bb3
bb2:
   call void @llvm.pseudoprobe(i64 837061429793323041, i64 3)
   br label %bb3
bb3:
   call void @llvm.pseudoprobe(i64 837061429793323041, i64 4)
   ret void
}

```

Reviewed By: wmi

Differential Revision: https://reviews.llvm.org/D86490
2020-11-20 10:39:24 -08:00
Simon Pilgrim
fceaff41d6 [ValueTracking] computeKnownBitsFromShiftOperator - move shift amount analysis to top of the function. NFCI.
These are all lightweight to compute and helps avoid issues with Known being used to hold both the shift amount and then the shifted result.

Minor cleanup for D90479.
2020-11-19 13:50:49 +00:00
Nikita Popov
9a85643cd3 [KnownBits] Combine abs() implementations
ValueTracking was using a more powerful abs() implementation. Roll
it into KnownBits::abs(). Also add an exhaustive test for abs(),
in both the poisoning and non-poisoning variants.
2020-11-13 22:23:50 +01:00
Nikita Popov
92b708902e [ValueTracking] Don't set nsw flag for inbounds addition
When computing the known bits for a GEP, don't set the nsw flag
when adding an offset to an address. The nsw flag only applies to
pure offset additions (see also D90708).

The nsw flag is only used in a very minor way by the code, to the
point that I was not able to come up with a test case where it
makes a difference.

Differential Revision: https://reviews.llvm.org/D90637
2020-11-13 17:58:21 +01:00
Simon Pilgrim
49623fa77a [ValueTracking] computeKnownBitsFromShiftOperator use KnownBits direct for constant shift amounts.
Let KnownBits shift handlers deal with out-of-range shift amounts.
2020-11-13 10:54:35 +00:00
Simon Pilgrim
f72d350bfb [ValueTracking] Update computeKnownBitsFromShiftOperator callbacks to take KnownBits shift amount. NFCI.
We were creating this internally, but will need to support general KnownBits amounts as part of D90479.
2020-11-12 16:56:55 +00:00
Simon Pilgrim
8996742741 [KnownBits] Add KnownBits::makeConstant helper. NFCI.
Helper for cases where we need to create a KnownBits from a (fully known) constant value.
2020-11-12 16:16:04 +00:00
Simon Pilgrim
11c106544b [ValueTracking] Update computeKnownBitsFromShiftOperator callbacks to use KnownBits shift handling. NFCI. 2020-11-12 15:31:26 +00:00
Simon Pilgrim
f6a326adef [ValueTracking] computeKnownBitsFromShiftOperator - merge zero/one callbacks to single KnownBits callback. NFCI.
Another cleanup for D90479 - handle the Known Ones/Zeros in a single callback, which will make it much easier to jump over to the KnownBits shift handling.
2020-11-11 14:22:42 +00:00
Simon Pilgrim
1a62ca65c1 [KnownBits] Add KnownBits::commonBits helper. NFCI.
We have a frequent pattern where we're merging two KnownBits to get the common/shared bits, and I just fell for the gotcha where I tried to use the & operator to merge them........
2020-11-11 12:15:54 +00:00
Simon Pilgrim
46a734621d [ValueTracking] computeKnownBitsFromShiftOperator - always return with Known2 containing the shifted value source. NFCI.
As detailed on D90479, in most circumstances we will always call computeKnownBits for Op0, so always perform this by pulling out the duplicate calls.
2020-11-10 17:03:17 +00:00
Simon Pilgrim
929a127932 [ValueTracking] computeKnownBitsFromShiftOperator - consistently use Known2 for the shifted value. NFCI.
Minor cleanup as part of getting D90479 moving again.
2020-11-10 17:03:17 +00:00
Simon Pilgrim
20f87d82ed [InstCombine] computeKnownBitsMul - use KnownBits::isNonZero() helper.
Avoid an expensive isKnownNonZero() call - this is a small cleanup before moving the extra NSW functionality from computeKnownBitsMul into KnownBits::computeForMul.
2020-11-06 17:27:13 +00:00
Simon Pilgrim
6729b6de1f [KnownBits] Move ValueTracking SREM KnownBits handling to KnownBits::srem. NFCI.
Move the ValueTracking implementation to KnownBits, the SelectionDAG version is more limited so I'm intending to replace that as a separate commit.
2020-11-05 14:58:33 +00:00
Simon Pilgrim
e237d56b43 [KnownBits] Move ValueTracking/SelectionDAG UREM KnownBits handling to KnownBits::urem. NFCI.
Both these have the same implementation - so move them to a single KnownBits copy.

GlobalISel will be able to use this as well with minimal effort.
2020-11-05 14:30:59 +00:00
Simon Pilgrim
32bee18b84 [KnownBits] Move ValueTracking/SelectionDAG UDIV KnownBits handling to KnownBits::udiv. NFCI.
Both these have the same implementation - so move them to a single KnownBits copy.

GlobalISel will be able to use this as well with minimal effort.
2020-11-05 13:42:42 +00:00
Florian Hahn
799033d8c5 Reland "[SLP] Consider alternatives for cost of select instructions."
This reverts the revert commit a1b53db32418cb6ed6f5b2054d15a22b5aa3aeb9.

This patch includes a fix for a reported issue, caused by
matchSelectPattern returning UMIN for selects of pointers in
some cases by looking to some connected casts.

For now, ensure integer instrinsics are only returned for selects of
ints or int vectors.
2020-10-31 16:52:36 +00:00
Florian Hahn
a1b53db324 Revert "[SLP] Consider alternatives for cost of select instructions."
This reverts commit 19225704890632cd2552f41ada41600a20db1371.

This appears to cause a crash in the following example

 a, b, c;
 l() {
   int e = a, f = l, g, h, i, j;
   float *d = c, *k = b;
   for (;;)
     for (; g < f; g++) {
       k[h] = d[i];
       k[h - 1] = d[j];
       h += e << 1;
       i += e;
     }
 }

 clang -cc1 -triple i386-unknown-linux-gnu -emit-obj -target-cpu pentium-m -O1 -vectorize-loops -vectorize-slp reduced.c

 llvm::Type *llvm::Type::getWithNewBitWidth(unsigned int) const: Assertion `isIntOrIntVectorTy() && "Original type expected to be a vector of integers or a scalar integer."' failed.
2020-10-30 21:26:14 +00:00
Florian Hahn
1922570489 [SLP] Consider alternatives for cost of select instructions.
Some architectures do not have general vector select instructions (e.g.
AArch64). But some cmp/select patterns can be vectorized using other
instructions/intrinsics.

One example is using min/max instructions for certain patterns.

This patch updates the cost calculations for selects in the SLP
vectorizer to consider using min/max intrinsics.

This patch does not change SLP vectorizer's codegen itself to actually
generate those intrinsics, but relies on the backends to lower the
vector cmps & selects. This keeps things simple on the SLP side and
works well in practice for AArch64.

This exposes additional SLP vectorization opportunities in some
benchmarks on AArch64 (-O3 -flto).

Metric: SLP.NumVectorInstructions

Program                                        base    slp     diff
 test-suite...ications/JM/ldecod/ldecod.test   502.00  697.00  38.8%
 test-suite...ications/JM/lencod/lencod.test   1023.00 1414.00 38.2%
 test-suite...-typeset/consumer-typeset.test    56.00   65.00  16.1%
 test-suite...6/464.h264ref/464.h264ref.test   804.00  822.00   2.2%
 test-suite...006/453.povray/453.povray.test   3335.00 3357.00  0.7%
 test-suite...CFP2000/177.mesa/177.mesa.test   2110.00 2121.00  0.5%
 test-suite...:: External/Povray/povray.test   2378.00 2382.00  0.2%

Reviewed By: RKSimon, samparker

Differential Revision: https://reviews.llvm.org/D89969
2020-10-29 20:39:50 +00:00
Alex Richardson
d323c8f791 [ValueTracking][NFC] Use Log2(Align) instead of countTrailingZeroes
The latter can probably be optimized to the same final code, but this might
help -O0 builds.
2020-10-27 12:16:45 +00:00
Shimin Cui
22e4346e05 [ValueTracking] Add tracking of the alignment assume bundle
This patch is to add the support of the value tracking of the alignment assume bundle.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D88669
2020-10-27 12:16:45 +00:00
Sanjay Patel
c72198079d [ValueTracking] add range limits for cttz
As discussed in D89952,
instcombine can sometimes find a way to reduce similar patterns,
but it is incomplete.
InstSimplify uses the computeConstantRange() ValueTracking analysis
via simplifyICmpWithConstant(), so we just need to fill in the max
value of cttz to process any "icmp pred cttz(X), C" pattern (the
min value is initialized to zero automatically).

https://alive2.llvm.org/ce/z/Z_SLWZ

Follow-up to D89976.
2020-10-23 08:43:45 -04:00
Sanjay Patel
3fb0d6b0d5 [ValueTracking] add range limits for ctlz
As discussed in D89952,
instcombine can sometimes find a way to reduce similar patterns,
but it is incomplete.
InstSimplify uses the computeConstantRange() ValueTracking analysis
via simplifyICmpWithConstant(), so we just need to fill in the max
value of ctlz to process any "icmp pred ctlz(X), C" pattern (the
min value is initialized to zero automatically).

Follow-up to D89976.
2020-10-23 08:43:45 -04:00
Sanjay Patel
748ecc6b32 [ValueTracking] add range limits for ctpop
As discussed in D89952,
instcombine can sometimes find a way to reduce similar patterns,
but it is incomplete.
InstSimplify uses the computeConstantRange() ValueTracking analysis
via simplifyICmpWithConstant(), so we just need to fill in the max
value of ctpop to process any "icmp pred ctpop(X), C" pattern (the
min value is initialized to zero automatically).

Differential Revision: https://reviews.llvm.org/D89976
2020-10-23 08:17:54 -04:00
Quentin Colombet
ee6abef532 [ValueTracking] Interpret GEPs as a series of adds multiplied by the related scaling factor
Prior to this patch, computeKnownBits would only try to deduce trailing zeros
bits for getelementptrs. This patch adds the logic to treat geps as a series
of add * scaling factor.

Thanks to this patch, using a gep or performing an address computation
directly "by hand" (ptrtoint followed by adds and mul followed by inttoptr)
offers the same computeKnownBits information.

Previously, the "by hand" approach would have given more information.

This is related to https://llvm.org/PR47241.

Differential Revision: https://reviews.llvm.org/D86364
2020-10-21 15:07:04 -07:00
Juneyoung Lee
62a0ec1612 Add support for !noundef metatdata on loads
This patch adds metadata !noundef and makes load instructions can optionally have it.
A load with !noundef always return a well-defined value (has no undef bit or isn't poison).
If the loaded value isn't well defined, the behavior is undefined.

This metadata can be used to encode the assumption from C/C++ that certain reads of variables should have well-defined values.
It is helpful for optimizing freeze instructions away, because freeze can be removed when its operand has well-defined value, and showing that a load from arbitrary location is well-defined is usually hard otherwise.

The same information can be encoded with llvm.assume with operand bundle; using metadata is chosen because I wasn't sure whether code motion can be freely done when llvm.assume is inserted from clang instead.
The existing codebase already is stripping unknown metadata when doing code motion, so using metadata is UB-safe as well.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D89050
2020-10-17 13:50:10 +09:00
Cullen Rhodes
fbd62fe60f [ValueTracking] Clarify TypeSize comparisons
TypeSize comparisons using overloaded operators should be replaced by
the new isKnownXY comparators when the operands can be fixed-length or
scalable vectors.

In ValueTracking there are several uses of the overloaded operators in
`isKnownNonZero` and `ComputeMultiple`. In the former we already bail
out on scalable vectors since we currently have no way to represent
DemandedElts, and the latter is operating on scalar integers, so we can
assume fixed-size in both instances.

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D89387
2020-10-16 10:31:12 +00:00
Juneyoung Lee
9b3c2a72e4 [ValueTracking] Use assume's noundef operand bundle
This patch updates `isGuaranteedNotToBeUndefOrPoison` to use `llvm.assume`'s `noundef` operand bundle.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D89219
2020-10-14 20:16:33 +09:00
Craig Topper
9e72d3eaf3 [ValueTracking] Use KnownBits::countMaxLeadingZeros/countMaxTrailingZeros to make code more readable. NFC 2020-10-11 14:26:18 -07:00
Quentin Colombet
9431f8ad2e [KnownBits] Add a computeForMul method
This patch refactors the logic in ValueTracking.cpp so that
computeKnownBitsForMul now uses a helper function from KnownBits.

NFC

Differential Revision: https://reviews.llvm.org/D88935
2020-10-08 11:33:06 -07:00
Simon Pilgrim
2cd7b0e130 [ValueTracking] canCreateUndefOrPoison - use APInt to check bounds instead of getZExtValue().
Fixes OSS Fuzz #26135
2020-10-05 13:45:27 +01:00
Nikita Popov
ac8a51c701 [ValueTracking] Early exit known non zero for phis
After D88276 we no longer expect computeKnownBits() to prove
non-zeroness for cases where isKnownNonZero() can't, so don't
fall through to it.
2020-09-29 21:07:36 +02:00
Serguei Katkov
297ec61130 [IsKnownNonZero] Handle the case with non-constant phi nodes
Handle the case when all inputs of phi are proven to be non zero.

Constants are checked in beginning of this method before check for depth of recursion,
so it is a partial case of non-constant phi.

Recursion depth is already handled by the function.

Reviewers: aqjune, nikic, efriedma
Reviewed By: nikic
Subscribers: dantrushin, hiraditya, jdoerfert, llvm-commits
Differential Revision: https://reviews.llvm.org/D88276
2020-09-29 15:22:10 +07:00
Juneyoung Lee
ba8911d560 [ValueTracking] Fix analyses to update CxtI to be phi's incoming edges' terminators
It was mentioned that D88276 that when a phi node is visited, terminators at their incoming edges should be used for CtxI.
This is a patch that makes two functions (ComputeNumSignBitsImpl, isGuaranteedNotToBeUndefOrPoison) to do so.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D88360
2020-09-28 23:24:20 +09:00
Sanjay Patel
645c53a9d9 [ValueTracking] enhance isKnownNeverInfinity to understand sitofp
As discussed in D87877, instcombine already has this fold,
but it was missing from the more general ValueTracking logic.

https://alive2.llvm.org/ce/z/PumYZP
2020-09-27 08:40:31 -04:00
Juneyoung Lee
92106641ae [ValueTracking] Make isGuaranteedNotToBeUndefOrPoison exit early when MetadataAsValue is given
It is set to conservatively return false, otherwise noundef attributes are added to function calls with metadata arguments.
2020-09-25 09:50:09 +09:00
Juneyoung Lee
1c45220028 [ValueTracking] Check uses of Argument if it is given to isGuaranteedNotToBeUndefOrPoison
This is a patch that allows isGuaranteedNotToBeUndefOrPoison to return more precise result
when an argument is given, by looking through its uses at the entry block (and following blocks as well, if it is checking poison only).

This is useful when there is a function call with noundef arguments at the entry block.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D88207
2020-09-25 08:57:57 +09:00
Juneyoung Lee
a6183d0f02 [ValueTracking] isKnownNonZero, computeKnownBits for freeze
This implements support for isKnownNonZero, computeKnownBits when freeze is involved.

```
  br (x != 0), BB1, BB2
BB1:
  y = freeze x
```

In the above program, we can say that y is non-zero. The reason is as follows:

(1) If x was poison, `br (x != 0)` raised UB
(2) If x was fully undef, the branch again raised UB
(3) If x was non-zero partially undef, say `undef | 1`, `freeze x` will return a nondeterministic value which is also non-zero.
(4) If x was just a concrete value, it is trivial

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D75808
2020-09-10 08:07:38 +09:00
Juneyoung Lee
25ce1e0497 [ValueTracking] Add UndefOrPoison/Poison-only version of relevant functions
This patch adds isGuaranteedNotToBePoison and programUndefinedIfUndefOrPoison.

isGuaranteedNotToBePoison will be used at D75808. The latter function is used at isGuaranteedNotToBePoison.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D84242
2020-09-09 20:00:26 +09:00
Nikita Popov
8453fbf088 [ValueTracking] Compute known bits of min/max intrinsics
Implement known bits for the min/max intrinsics based on the
recently added KnownBits primitives.
2020-09-08 21:08:17 +02:00
Jay Foad
5350e1b509 [KnownBits] Implement accurate unsigned and signed max and min
Use the new implementation in ValueTracking, SelectionDAG and
GlobalISel.

Differential Revision: https://reviews.llvm.org/D87034
2020-09-07 09:09:01 +01:00
Nikita Popov
b536cbaac5 [ValueTracking] Avoid known bits fallback for non-zero get check (NFCI)
The known bits fall back will never be able to infer a non-null
value here, so don't bother.
2020-09-06 23:16:38 +02:00
Eli Friedman
96ef6998df [InstCombine] Fix a couple crashes with extractelement on a scalable vector.
Differential Revision: https://reviews.llvm.org/D86989
2020-09-02 18:02:07 -07:00
David Sherwood
f4257c5832 [SVE] Make ElementCount members private
This patch changes ElementCount so that the Min and Scalable
members are now private and can only be accessed via the get
functions getKnownMinValue() and isScalable(). In addition I've
added some other member functions for more commonly used operations.
Hopefully this makes the class more useful and will reduce the
need for calling getKnownMinValue().

Differential Revision: https://reviews.llvm.org/D86065
2020-08-28 14:43:53 +01:00
Martin Storsjö
db1ec04963 [ValueTracking] Remove a stray semicolon. NFC.
This silences warnings when built with GCC at least.
2020-08-28 09:24:10 +03:00
Vitaly Buka
23524fdece [ValueTracking] Replace recursion with Worklist
Now findAllocaForValue can handle nontrivial phi cycles.
2020-08-27 14:44:49 -07:00
Vitaly Buka
a6927c8621 [NFC][ValueTracking] Add OffsetZero into findAllocaForValue
For StackLifetime after finding alloca we need to check that
values ponting to the begining of alloca.

Reviewed By: eugenis

Differential Revision: https://reviews.llvm.org/D86692
2020-08-27 13:46:22 -07:00
Vitaly Buka
469debe027 [ValueTracking] Support select in findAllocaForValue 2020-08-27 02:13:52 -07:00