958 Commits

Author SHA1 Message Date
Roman Lebedev
c6ff867f92
[NFC][InstCombine] Simplify emitted IR for vector_reduce_xor(?ext(<n x i1>))
Now that we canonicalize low bit splatting to the form we were emitting
here ourselves, emit simpler IR that will be canonicalized later.

See 1e801439be26569c9ede6fd309a645b00adb656c for proofs:
https://alive2.llvm.org/ce/z/MjCm5W (self)
https://alive2.llvm.org/ce/z/kgqF4M (skipped zext)
https://alive2.llvm.org/ce/z/pgy3HP (skipped sext)
2021-08-07 17:31:24 +03:00
Serge Pavlov
4c4093e6e3 Introduce intrinsic llvm.isnan
This is recommit of the patch 16ff91ebccda1128c43ff3cee104e2c603569fb2,
reverted in 0c28a7c990c5218d6aec47c5052a51cba686ec5e because it had
an error in call of getFastMathFlags (base type should be FPMathOperator
but not Instruction). The original commit message is duplicated below:

    Clang has builtin function '__builtin_isnan', which implements C
    library function 'isnan'. This function now is implemented entirely in
    clang codegen, which expands the function into set of IR operations.
    There are three mechanisms by which the expansion can be made.

    * The most common mechanism is using an unordered comparison made by
      instruction 'fcmp uno'. This simple solution is target-independent
      and works well in most cases. It however is not suitable if floating
      point exceptions are tracked. Corresponding IEEE 754 operation and C
      function must never raise FP exception, even if the argument is a
      signaling NaN. Compare instructions usually does not have such
      property, they raise 'invalid' exception in such case. So this
      mechanism is unsuitable when exception behavior is strict. In
      particular it could result in unexpected trapping if argument is SNaN.

    * Another solution was implemented in https://reviews.llvm.org/D95948.
      It is used in the cases when raising FP exceptions by 'isnan' is not
      allowed. This solution implements 'isnan' using integer operations.
      It solves the problem of exceptions, but offers one solution for all
      targets, however some can do the check in more efficient way.

    * Solution implemented by https://reviews.llvm.org/D96568 introduced a
      hook 'clang::TargetCodeGenInfo::testFPKind', which injects target
      specific code into IR. Now only SystemZ implements this hook and it
      generates a call to target specific intrinsic function.

    Although these mechanisms allow to implement 'isnan' with enough
    efficiency, expanding 'isnan' in clang has drawbacks:

    * The operation 'isnan' is hidden behind generic integer operations or
      target-specific intrinsics. It complicates analysis and can prevent
      some optimizations.

    * IR can be created by tools other than clang, in this case treatment
      of 'isnan' has to be duplicated in that tool.

    Another issue with the current implementation of 'isnan' comes from the
    use of options '-ffast-math' or '-fno-honor-nans'. If such option is
    specified, 'fcmp uno' may be optimized to 'false'. It is valid
    optimization in general, but it results in 'isnan' always returning
    'false'. For example, in some libc++ implementations the following code
    returns 'false':

        std::isnan(std::numeric_limits<float>::quiet_NaN())

    The options '-ffast-math' and '-fno-honor-nans' imply that FP operation
    operands are never NaNs. This assumption however should not be applied
    to the functions that check FP number properties, including 'isnan'. If
    such function returns expected result instead of actually making
    checks, it becomes useless in many cases. The option '-ffast-math' is
    often used for performance critical code, as it can speed up execution
    by the expense of manual treatment of corner cases. If 'isnan' returns
    assumed result, a user cannot use it in the manual treatment of NaNs
    and has to invent replacements, like making the check using integer
    operations. There is a discussion in https://reviews.llvm.org/D18513#387418,
    which also expresses the opinion, that limitations imposed by
    '-ffast-math' should be applied only to 'math' functions but not to
    'tests'.

    To overcome these drawbacks, this change introduces a new IR intrinsic
    function 'llvm.isnan', which realizes the check as specified by IEEE-754
    and C standards in target-agnostic way. During IR transformations it
    does not undergo undesirable optimizations. It reaches instruction
    selection, where is lowered in target-dependent way. The lowering can
    vary depending on options like '-ffast-math' or '-ffp-model' so the
    resulting code satisfies requested semantics.

    Differential Revision: https://reviews.llvm.org/D104854
2021-08-06 14:32:27 +07:00
Serge Pavlov
0c28a7c990 Revert "Introduce intrinsic llvm.isnan"
This reverts commit 16ff91ebccda1128c43ff3cee104e2c603569fb2.
Several errors were reported mainly test-suite execution time. Reverted
for investigation.
2021-08-04 17:18:15 +07:00
Serge Pavlov
16ff91ebcc Introduce intrinsic llvm.isnan
Clang has builtin function '__builtin_isnan', which implements C
library function 'isnan'. This function now is implemented entirely in
clang codegen, which expands the function into set of IR operations.
There are three mechanisms by which the expansion can be made.

* The most common mechanism is using an unordered comparison made by
  instruction 'fcmp uno'. This simple solution is target-independent
  and works well in most cases. It however is not suitable if floating
  point exceptions are tracked. Corresponding IEEE 754 operation and C
  function must never raise FP exception, even if the argument is a
  signaling NaN. Compare instructions usually does not have such
  property, they raise 'invalid' exception in such case. So this
  mechanism is unsuitable when exception behavior is strict. In
  particular it could result in unexpected trapping if argument is SNaN.

* Another solution was implemented in https://reviews.llvm.org/D95948.
  It is used in the cases when raising FP exceptions by 'isnan' is not
  allowed. This solution implements 'isnan' using integer operations.
  It solves the problem of exceptions, but offers one solution for all
  targets, however some can do the check in more efficient way.

* Solution implemented by https://reviews.llvm.org/D96568 introduced a
  hook 'clang::TargetCodeGenInfo::testFPKind', which injects target
  specific code into IR. Now only SystemZ implements this hook and it
  generates a call to target specific intrinsic function.

Although these mechanisms allow to implement 'isnan' with enough
efficiency, expanding 'isnan' in clang has drawbacks:

* The operation 'isnan' is hidden behind generic integer operations or
  target-specific intrinsics. It complicates analysis and can prevent
  some optimizations.

* IR can be created by tools other than clang, in this case treatment
  of 'isnan' has to be duplicated in that tool.

Another issue with the current implementation of 'isnan' comes from the
use of options '-ffast-math' or '-fno-honor-nans'. If such option is
specified, 'fcmp uno' may be optimized to 'false'. It is valid
optimization in general, but it results in 'isnan' always returning
'false'. For example, in some libc++ implementations the following code
returns 'false':

    std::isnan(std::numeric_limits<float>::quiet_NaN())

The options '-ffast-math' and '-fno-honor-nans' imply that FP operation
operands are never NaNs. This assumption however should not be applied
to the functions that check FP number properties, including 'isnan'. If
such function returns expected result instead of actually making
checks, it becomes useless in many cases. The option '-ffast-math' is
often used for performance critical code, as it can speed up execution
by the expense of manual treatment of corner cases. If 'isnan' returns
assumed result, a user cannot use it in the manual treatment of NaNs
and has to invent replacements, like making the check using integer
operations. There is a discussion in https://reviews.llvm.org/D18513#387418,
which also expresses the opinion, that limitations imposed by
'-ffast-math' should be applied only to 'math' functions but not to
'tests'.

To overcome these drawbacks, this change introduces a new IR intrinsic
function 'llvm.isnan', which realizes the check as specified by IEEE-754
and C standards in target-agnostic way. During IR transformations it
does not undergo undesirable optimizations. It reaches instruction
selection, where is lowered in target-dependent way. The lowering can
vary depending on options like '-ffast-math' or '-ffp-model' so the
resulting code satisfies requested semantics.

Differential Revision: https://reviews.llvm.org/D104854
2021-08-04 15:27:49 +07:00
Roman Lebedev
4ba3326f17
[InstCombine] vector_reduce_{or,and}(?ext(<n x i1>)) --> ?ext(vector_reduce_{or,and}(<n x i1>)) (PR51259)
This allows the expansion logic to actually trigger if the argument
was extended from i1 element type, like the rest of the reductions expect.

Alive2 agrees:
https://alive2.llvm.org/ce/z/wcfews (or zext)
https://alive2.llvm.org/ce/z/FCXNFx (or sext)
https://alive2.llvm.org/ce/z/f26zUY (and zext)
https://alive2.llvm.org/ce/z/jprViN (and sext)
2021-08-03 00:54:35 +03:00
Roman Lebedev
554fc9ad0a
[InstCombine] vector_reduce_smax(?ext(<n x i1>)) --> ?ext(vector_reduce_{and,or}(<n x i1>)) (PR51259)
Alive2 agrees:
https://alive2.llvm.org/ce/z/3oqir9 (self)
https://alive2.llvm.org/ce/z/6cuI5m (zext)
https://alive2.llvm.org/ce/z/4FL8rD (sext)

We already handle `vector_reduce_and(<n x i1>)`,
so let's just combine into the already-handled pattern
and let the existing fold do the rest.
2021-08-03 00:29:06 +03:00
Roman Lebedev
f47b7b6d10
[InstCombine] vector_reduce_smin(?ext(<n x i1>)) --> ?ext(vector_reduce_{or,and}(<n x i1>)) (PR51259)
Alive2 agrees:
https://alive2.llvm.org/ce/z/noXtZ8 (self)
https://alive2.llvm.org/ce/z/JNrN6C (zext)
https://alive2.llvm.org/ce/z/58snuN (sext)

We already handle `vector_reduce_and(<n x i1>)`,
so let's just combine into the already-handled pattern
and let the existing fold do the rest.
2021-08-03 00:29:06 +03:00
Roman Lebedev
b9b7162b8b
[InstCombine] vector_reduce_umax(?ext(<n x i1>)) --> ?ext(vector_reduce_or(<n x i1>)) (PR51259)
Alive2 agrees:
https://alive2.llvm.org/ce/z/NbBaeT (self)
https://alive2.llvm.org/ce/z/iEaig4 (zext)
https://alive2.llvm.org/ce/z/meGb3y (sext)

We already handle `vector_reduce_and(<n x i1>)`,
so let's just combine into the already-handled pattern
and let the existing fold do the rest.
2021-08-02 23:02:23 +03:00
Roman Lebedev
0c13798056
[InstCombine] vector_reduce_umin(?ext(<n x i1>)) --> ?ext(vector_reduce_and(<n x i1>)) (PR51259)
Alive2 agrees:
https://alive2.llvm.org/ce/z/XxUScW (self)
https://alive2.llvm.org/ce/z/3usTF- (zext)
https://alive2.llvm.org/ce/z/GVxwQz (sext)

We already handle `vector_reduce_and(<n x i1>)`,
so let's just combine into the already-handled pattern
and let the existing fold do the rest.
2021-08-02 23:02:22 +03:00
Roman Lebedev
469793efa7
[InstCombine] vector_reduce_mul(?ext(<n x i1>)) --> zext(vector_reduce_and(<n x i1>)) (PR51259)
Alive2 agrees:
https://alive2.llvm.org/ce/z/PDansB (self)
https://alive2.llvm.org/ce/z/55D-Xc (zext)
https://alive2.llvm.org/ce/z/LxG3-r (sext)

We already handle `vector_reduce_and(<n x i1>)`,
so let's just combine into the already-handled pattern
and let the existing fold do the rest.
2021-08-02 21:57:51 +03:00
Roman Lebedev
1e801439be
[InstCombine] xor reduction w/ i1 elt type is a parity check
For i1 element type, `xor` and `add` are interchangeable
(https://alive2.llvm.org/ce/z/e77hhQ), so we should treat it just like
an `add` reduction and consistently transform them both:
https://alive2.llvm.org/ce/z/MjCm5W (self)
https://alive2.llvm.org/ce/z/kgqF4M (skipped zext)
https://alive2.llvm.org/ce/z/pgy3HP (skipped sext)

Though, let's emit the IR that is similar to the one we produce for
`vector_reduce_add(<n x i1>)`.

See https://bugs.llvm.org/show_bug.cgi?id=51259
2021-08-02 20:21:37 +03:00
Jun Ma
958dddf7df [NFC][InstCombine] Fix typo 2021-07-27 11:33:10 +08:00
Alexey Bataev
8af69975af [InstCombine][NFC]Use only replaceInstUsesWith, NFC. 2021-07-08 13:58:30 -07:00
Alexey Bataev
b5113bff46 [Instcombine]Transform reduction+(sext/zext(<n x i1>) to <n x im>) to [-]zext/trunc(ctpop(bitcast <n x i1> to in)) to im.
Some of the SPEC tests end up with reduction+(sext/zext(<n x i1>) to <n x im>) pattern, which can be transformed to [-]zext/trunc(ctpop(bitcast <n x i1> to in)) to im.
Also, reduction+(<n x i1>) can be transformed to ctpop(bitcast <n x i1> to in) & 1 != 0.

Differential Revision: https://reviews.llvm.org/D105587
2021-07-08 07:56:41 -07:00
Philip Reames
c4fc2cb5b2 [instcombine] umin(x, 1) == zext(x != 0)
We already implemented this for the select form, but the intrinsic form was missing.  Note that this doesn't change poison behavior as 1 is non-poison, and the optimized form is still poison exactly when x is.
2021-06-30 10:20:01 -07:00
Alexey Bataev
129ae515fb [INSTCOMBINE] Transform reduction(shuffle V, poison, unique_mask) to reduction(V).
After SLP + LTO we may have have reduction(shuffle V, poison,
mask). This can be simplified to just reduction(V) if the mask is only
for single vector and just all elements from this vector are permuted,
  without reusing, replacing with undefs and/or other values, etc.

Differential Revision: https://reviews.llvm.org/D105053
2021-06-29 10:02:38 -07:00
Sanjay Patel
153da08a6c [InstCombine] hoist min/max intrinsics above select with constant op
This is an extension of the handling for unary intrinsics and
follows the logic that we use for binary ops.

We don't canonicalize to min/max intrinsics yet, but this might
help unlock other folds seen in D98152.
2021-06-27 10:02:23 -04:00
Nikita Popov
8e0ff44bf8 [InstCombine] Make varargs cast transform compatible with opaque ptrs
The whole transform can be dropped once we have fully transitioned
to opaque pointers (as it's purpose is to remove no-op pointer
casts). For now, make sure that it handles opaque pointers correctly.
2021-06-24 21:57:05 +02:00
Nikita Popov
8321335fd8 [InstCombine] Use getFunctionType()
Avoid fetching pointer element type...
2021-06-23 20:28:34 +02:00
Sanjay Patel
1e9b6b89a7 [InstCombine] convert FP min/max with negated op to fabs
This is part of improving floating-point patterns seen in:
https://llvm.org/PR39480

We don't require any FMF because the 2 potential corner cases
(-0.0 and NaN) are correctly handled without FMF:
1. -0.0 is treated as strictly less than +0.0 with
   maximum/minimum, so fabs/fneg work as expected.
2. +/- 0.0 with maxnum/minnum is indeterminate, so
   transforming to fabs/fneg is more defined.
3. The sign of a NaN may be altered by this transform,
   but that is allowed in the default FP environment.

If there are FMF, they are propagated from the min/max call to
one or both new operands which seems to agree with Alive2:
https://alive2.llvm.org/ce/z/bem_xC
2021-06-23 10:41:39 -04:00
Joe Ellis
3c4dbf6ea9 [Verifier] Fail on overrunning and invalid indices for {insert,extract} vector intrinsics
With regards to overrunning, the langref (llvm/docs/LangRef.rst)
specifies:

   (llvm.experimental.vector.insert)
   Elements ``idx`` through (``idx`` + num_elements(``subvec``) - 1)
   must be valid ``vec`` indices. If this condition cannot be determined
   statically but is false at runtime, then the result vector is
   undefined.

   (llvm.experimental.vector.extract)
   Elements ``idx`` through (``idx`` + num_elements(result_type) - 1)
   must be valid vector indices. If this condition cannot be determined
   statically but is false at runtime, then the result vector is
   undefined.

For the non-mixed cases (e.g. inserting/extracting a scalable into/from
another scalable, or inserting/extracting a fixed into/from another
fixed), it is possible to statically check whether or not the above
conditions are met. This was previously missing from the verifier, and
if the conditions were found to be false, the result of the
insertion/extraction would be replaced with an undef.

With regards to invalid indices, the langref (llvm/docs/LangRef.rst)
specifies:

    (llvm.experimental.vector.insert)
    ``idx`` represents the starting element number at which ``subvec``
    will be inserted. ``idx`` must be a constant multiple of
    ``subvec``'s known minimum vector length.

    (llvm.experimental.vector.extract)
    The ``idx`` specifies the starting element number within ``vec``
    from which a subvector is extracted. ``idx`` must be a constant
    multiple of the known-minimum vector length of the result type.

Similarly, these conditions were not previously enforced in the
verifier. In some circumstances, invalid indices were permitted
silently, and in other circumstances, an undef was spawned where a
verifier error would have been preferred.

This commit adds verifier checks to enforce the constraints above.

Differential Revision: https://reviews.llvm.org/D104468
2021-06-23 10:33:22 +00:00
Sanjay Patel
b1f6ef92ec [InstCombine] reduce code duplication for FP min/max with casts fold; NFC 2021-06-22 14:15:04 -04:00
Sanjay Patel
198b79caae [InstCombine] move bitmanipulation-of-select folds
This is no outwardly-visible-difference-intended,
but it is obviously better to have all transforms
for an intrinsic housed together since we already
have helper functions in place.

It is also potentially more efficient to zap a
simple pattern match before trying to do expensive
computeKnownBits() calls.
2021-06-21 11:32:16 -04:00
Sanjay Patel
64b2676ca8 [InstCombine] fold ctlz/cttz-of-select with 1 or more constant arms
Building on:
4c44b02d87
...and adding handling for the extra operand in these intrinsics.

This pattern is discussed in:
https://llvm.org/PR50140
2021-06-21 11:04:12 -04:00
Juneyoung Lee
ce192ced2b [InstCombine] Use poison constant to represent the result of unreachable instrs
This patch updates InstCombine to use poison constant to represent the resulting value of (either semantically or syntactically) unreachable instrs, or a don't-care value of an unreachable store instruction.

This allows more aggressive folding of unused results, as shown in llvm/test/Transforms/InstCombine/getelementptr.ll .

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D104602
2021-06-21 09:58:44 +09:00
Sanjay Patel
4c44b02d87 [InstCombine] fold ctpop-of-select with 1 or more constant arms
The general pattern is mentioned in:
https://llvm.org/PR50140
...but we need to do a bit more to handle intrinsics with extra operands
like ctlz/cttz.
2021-06-20 11:28:45 -04:00
Sanjay Patel
afd44bb6f2 [InstCombine] fold ctlz/cttz of bool types
https://alive2.llvm.org/ce/z/tX4pUT
2021-06-13 08:26:40 -04:00
Juneyoung Lee
7161bb87c9 [InsCombine] Fix a few remaining vec transforms to use poison instead of undef
This is a patch that replaces shufflevector and insertelement's placeholder value with poison.

Underlying motivation is to fix the semantics of shufflevector with undef mask to return poison instead
(D93818)
The consensus has been made in the late 2020 via mailing list as well as the thread in https://bugs.llvm.org/show_bug.cgi?id=44185 .

This patch is a simple syntactic change to the existing code, hence directly pushed as a commit.
2021-05-31 18:47:09 +09:00
cynecx
8ec9fd4839 Support unwinding from inline assembly
I've taken the following steps to add unwinding support from inline assembly:

1) Add a new `unwind` "attribute" (like `sideeffect`) to the asm syntax:

```
invoke void asm sideeffect unwind "call thrower", "~{dirflag},~{fpsr},~{flags}"()
    to label %exit unwind label %uexit
```

2.) Add Bitcode writing/reading support + LLVM-IR parsing.

3.) Emit EHLabels around inline assembly lowering (SelectionDAGBuilder + GlobalISel) when `InlineAsm::canThrow` is enabled.

4.) Tweak InstCombineCalls/InlineFunction pass to not mark inline assembly "calls" as nounwind.

5.) Add clang support by introducing a new clobber: "unwind", which lower to the `canThrow` being enabled.

6.) Don't allow unwinding callbr.

Reviewed By: Amanieu

Differential Revision: https://reviews.llvm.org/D95745
2021-05-13 19:13:03 +01:00
Fangrui Song
d8aba75a76 Internalize some cl::opt global variables or move them under namespace llvm 2021-05-07 11:15:43 -07:00
Coplin, Jared
6251b2f7f6 Attach metadata to simplified masked loads and stores 2021-05-05 18:01:49 -05:00
Dávid Bolvanský
08c08577f9 [InstCombine] cttz(sext(x)) -> cttz(zext(x))
```

----------------------------------------
define i32 @src(i16 %x, i1 %b) {
%0:
  %z = sext i16 %x to i32
  %p = cttz i32 %z, %b
  ret i32 %p
}
=>
define i32 @tgt(i16 %x, i1 %b) {
%0:
  %z = zext i16 %x to i32
  %p = cttz i32 %z, %b
  ret i32 %p
}
Transformation seems to be correct!
```

https://alive2.llvm.org/ce/z/evomeg

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D101764
2021-05-03 23:59:30 +02:00
Dávid Bolvanský
27b651ca47 [InstCombine] cttz(zext(x)) -> zext(cttz(x)) if the 'ZeroIsUndef' parameter is 'true' (PR50172)
Zext doesn't change the number of trailing zeros, so narrow cttz(zext(x)) -> zext(cttz(x)) if the 'ZeroIsUndef' parameter is 'true'.

Proofs:
https://alive2.llvm.org/ce/z/o2dnjY

Solves https://bugs.llvm.org/show_bug.cgi?id=50172

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D101582
2021-05-03 17:05:12 +02:00
Sanjay Patel
0f8b6686ac [InstCombine] narrow popcount with zext operand
https://llvm.org/PR50141
2021-04-29 15:07:16 -04:00
Sanjay Patel
025bb52903 [InstCombine] fold clamp to 2 values from min/max intrinsics
The "select" versions of these folds is also missing and can
cause infinite loops as shown in:
https://llvm.org/PR48900
...but it seems easier to match these as max/min as a first fix.

https://alive2.llvm.org/ce/z/wv-_dT
2021-04-27 15:35:49 -04:00
Dávid Bolvanský
137568e579 [InstCombine] Fixed UB in foldCtpop 2021-04-24 19:44:16 +02:00
Dávid Bolvanský
de3fa35cdb [InstCombine] ctpop(rot(X)) -> ctpop(X)
Proof:
https://alive2.llvm.org/ce/z/ss2zyt - rotl
https://alive2.llvm.org/ce/z/ZM7Aue - rotr

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D101235
2021-04-24 18:25:03 +02:00
Dávid Bolvanský
5f77e7708a [InstCombine] Fixed crash when setting align attr for memalign 2021-04-23 14:04:08 +02:00
Dávid Bolvanský
324d641b75 [InstCombine] Enhance deduction of alignment for aligned_alloc
This patch improves https://reviews.llvm.org/D76971 (Deduce attributes for aligned_alloc in InstCombine) and implements "TODO" item mentioned in the review of that patch.

> The function aligned_alloc() is the same as memalign(), except for the added restriction that size should be a multiple of alignment.

Currently, we simply bail out if we see a non-constant size - change that.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D100785
2021-04-20 02:04:18 +02:00
Yuanfang Chen
c5fda0e662 Reland "Revert "[InstCombine] when calling conventions are compatible, don't convert the call to undef idiom""
This reverts commit a3fabc79ae9d7dd76545b2abc2a3bfb66c6d3175 (relands
f4d682d6ce6c5b3a41a0acf297507c82f5c21eef with fix for the compile-time
regression issue).
2021-04-12 14:50:54 -07:00
Nikita Popov
a3fabc79ae Revert "[InstCombine] when calling conventions are compatible, don't convert the call to undef idiom"
This reverts commit f4d682d6ce6c5b3a41a0acf297507c82f5c21eef.

This caused a significant compile-time regression:
https://llvm-compile-time-tracker.com/compare.php?from=4b7bad9eaea2233521a94f6b096aaa88dc584e23&to=f4d682d6ce6c5b3a41a0acf297507c82f5c21eef&stat=instructions

Possibly this is due to overeager parsing of target triples.
2021-04-12 22:55:59 +02:00
Yuanfang Chen
f4d682d6ce [InstCombine] when calling conventions are compatible, don't convert the call to undef idiom
D24453 enabled libcalls simplication for ARM PCS. This may cause
caller/callee calling conventions mismatch in some situations such as
LTO. This patch makes instcombine aware that the compatible calling
conventions differences are benign (not emitting undef idom).

Differential Revision: https://reviews.llvm.org/D99773
2021-04-12 09:32:23 -07:00
Sanjay Patel
84cdccc9dc [InstCombine] try to eliminate an instruction in min/max -> abs fold
As suggested in the review thread for 5094e12 and seen in the
motivating example from https://llvm.org/PR49885, it's not
clear if we have a way to create the optimal code without
this heuristic.
2021-04-09 10:34:03 -04:00
Sanjay Patel
5094e1279e [InstCombine] fold min/max intrinsic with negated operand to abs
The smax case shows up in https://llvm.org/PR49885 .
The others seem unlikely, but we might as well try
for uniformity (although that could mean an extra
instruction to create "nabs").

smax -- https://alive2.llvm.org/ce/z/8yYaGy
smin -- https://alive2.llvm.org/ce/z/0_7zc_
umax -- https://alive2.llvm.org/ce/z/EcsZWs
umin -- https://alive2.llvm.org/ce/z/Xw6WvB
2021-04-08 14:37:39 -04:00
Philip Reames
9ef6aa020b Plumb AssumeInst through operand bundle apis [nfc]
Follow up to a6d2a8d6f5.  This covers all the public interfaces of the bundle related code.  I tried to cleanup the internals where the changes were obvious, but there's definitely more room for improvement.
2021-04-06 12:53:53 -07:00
Philip Reames
a6d2a8d6f5 Add a subclass of IntrinsicInst for llvm.assume [nfc]
Add the subclass, update a few places which check for the intrinsic to use idiomatic dyn_cast, and update the public interface of AssumptionCache to use the new class.  A follow up change will do the same for the newer assumption query/bundle mechanisms.
2021-04-06 11:16:22 -07:00
Sanjay Patel
c0645f1324 [InstCombine] fold popcount of exactly one bit to shift
This is discussed in https://llvm.org/PR48999 ,
but it does not solve that request.

The difference in the vector test shows that some
other logic transform is limited to scalar types.
2021-04-04 11:43:49 -04:00
Sanjay Patel
1462bdf1b9 [InstCombine] fold abs(srem X, 2)
This is a missing optimization based on an example in:
https://llvm.org/PR49763

As noted there and the test here, we could add a more
general fold if that is shown useful.

https://alive2.llvm.org/ce/z/xEHdTv
https://alive2.llvm.org/ce/z/97dcY5
2021-03-31 11:29:20 -04:00
Sanjay Patel
2986a9c7e2 [InstCombine] canonicalize 'not' op after min/max intrinsic
This is another step towards parity between existing select
transforms and min/max intrinsics (D98152)..

The existing 'not' folds around select are complicated, so
it's likely that we will need to enhance this, but this
should be a safe step.
2021-03-09 11:33:28 -05:00
Sanjay Patel
41b9209a12 [InstCombine] fold min/max intrinsics with not ops
This is a partial translation of the existing select-based
folds. We need to recreate several different transforms to
avoid regressions as noted in D98152.

https://alive2.llvm.org/ce/z/teuZ_J
2021-03-09 08:55:48 -05:00