If the SRL for Hi constant folds, but we don't remoe those bits from
the Lo, we can end up with strange constant folding through DAGCombine later.
I've only seen this with constants being lowered to constant pools
during lowering on RISC-V.
This patch adjusts the legality check for riscv to use `cpop/cpopw` since `isOperationLegal(ISD::CTPOP, MVT::i32)` returns false on rv64gc_zbb.
Clang vs gcc: https://godbolt.org/z/rc3s4hjPh
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D156390
Extension to the signbit case, if the signbits extend down through all the demanded bits then SMIN/SMAX/UMIN/UMAX nodes can be simplified to a OR/AND/AND/OR.
Alive2: https://alive2.llvm.org/ce/z/mFVFAn (general case)
Differential Revision: https://reviews.llvm.org/D158364
The CodeView `S_ARMSWITCHTABLE` debug symbol is used to describe the layout of a jump table, it contains the following information:
* The address of the branch instruction that uses the jump table.
* The address of the jump table.
* The "base" address that the values in the jump table are relative to.
* The type of each entry (absolute pointer, a relative integer, a relative integer that is shifted).
Together this information can be used by debuggers and binary analysis tools to understand what an jump table indirect branch is doing and where it might jump to.
Documentation for the symbol can be found in the Microsoft PDB library dumper: 0fe89a942f/cvdump/dumpsym7.cpp (L5518)
This change adds support to LLVM to emit the `S_ARMSWITCHTABLE` debug symbol as well as to dump it out (for testing purposes).
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D149367
This improves some cases where a splat_vector uses a build_pair that can be
simplified, e.g:
(rotl x:i64, splat_vector (build_pair x1:i32, x2:i32))
rotl only demands the bottom 6 bits, so this patch allows it to simplify it to:
(rotl x:i64, splat_vector (build_pair x1:i32, undef:i32))
Which in turn improves some cases where a splat_vector_parts is lowered on
RV32.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D158839
This reverts commit 8d0c3db388143f4e058b5f513a70fd5d089d51c3.
Causes crashes, see comments in https://reviews.llvm.org/D149367.
Some follow-up fixes are also reverted:
This reverts commit 636269f4fca44693bfd787b0a37bb0328ffcc085.
This reverts commit 5966079cf4d4de0285004eef051784d0d9f7a3a6.
This reverts commit e7294dbc85d24a08c716d9babbe7f68390cf219b.
The CodeView `S_ARMSWITCHTABLE` debug symbol is used to describe the layout of a jump table, it contains the following information:
* The address of the branch instruction that uses the jump table.
* The address of the jump table.
* The "base" address that the values in the jump table are relative to.
* The type of each entry (absolute pointer, a relative integer, a relative integer that is shifted).
Together this information can be used by debuggers and binary analysis tools to understand what an jump table indirect branch is doing and where it might jump to.
Documentation for the symbol can be found in the Microsoft PDB library dumper: 0fe89a942f/cvdump/dumpsym7.cpp (L5518)
This change adds support to LLVM to emit the `S_ARMSWITCHTABLE` debug symbol as well as to dump it out (for testing purposes).
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D149367
DemandedBits is forced to all ones if there are multiple users.
The changes X86 test cases looks like they were miscompiles before.
The value of eax/rax from the cmov is returned from the function in
addition to being used by the sar. That usage needs all bits even
though the sar doesn't.
This reverts commit 54d663d5896008c09c938f80357e2a056454bc65, which breaks the test CodeGen/SystemZ/ctpop-01.ll for stage2-ubsan check (see https://lab.llvm.org/buildbot/#/builders/85/builds/18410)
I manually confirmed that the test had been passing immediately prior to that commit
(BUILDBOT_REVISION=4772c66cfb00d60f8f687930e9dd3aa1b6872228 llvm-zorg/zorg/buildbot/builders/sanitizers/buildbot_bootstrap_ubsan.sh)
On X86 for vec types `(X & Y) == Y` is generally preferable to
`(X & Y) != 0`. Creating zero requires an extra instruction and on
pre-avx512 targets there is no vector `pcmpne` so it requires two
additional instructions to invert the `pcmpeq`.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D157014
Move the sra(shl(x,c1),c1) -> sign_extend_inreg(x) fold inside SimplifyDemandedBits so we can recognize hidden splats with DemandedElts masks.
Because the c1 shift amount has multiple uses, hidden splats won't get simplified to a splat constant buildvector - meaning the existing fold in DAGCombiner::visitSRA can't fire as it won't see a uniform shift amount.
I also needed to add TLI preferSextInRegOfTruncate hook to help keep truncate(sign_extend_inreg(x)) vector patterns on X86 so we can use PACKSS more efficiently.
Differential Revision: https://reviews.llvm.org/D157972
Extending to f32 first (as is done for f16) results in better generated
code for RISC-V (and affects no other in-tree tests). Additionally,
performing the FP_EXTEND first seems equally justified for bf16 as for
f16.
Differential Revision: https://reviews.llvm.org/D156944
We only attempted to determine KnownBits for uniform constant shift amounts, but ComputeKnownBits is able to handle some non-uniform cases as well that we can use as a fallback.
Inspired by some of the cases from D145468
Let SimplifyDemandedBits handle the narrowing of lshr to half-width if we don't require the upper bits, the narrowed shift is profitable and the zext/trunc are free.
A future patch will propose the equivalent shl narrowing combine.
Differential Revision: https://reviews.llvm.org/D146121
TargetLowering::SimplifySetCC wants to swap the operands of a SETCC to
canonicalize the constant to the RHS. The bug here was that it did so whether
or not the RHS was already a constant, leading to an infinite loop.
rdar://111847838
Divverential revision: https://reviews.llvm.org/D155095
This reverts commit cdc633e4bc93d4bf241ecd4c29691ae065749313.
TargetLowering::SimplifySetCC wants to swap the operands of a SETCC to
canonicalize the constant to the RHS. The bug here was that it did so whether
or not the RHS was already a constant, leading to an infinite loop.
rdar://111847838
Differential revision: https://reviews.llvm.org/D155095
InstCombine should have taken care of this, but I think
this is more useful in the future when the expansion
tries to handle multiple cases at a time with fcmp.
x87 looks worse to me but the only thing I know about it is that
I aggressively do not care about it.
https://reviews.llvm.org/D143198
Combine the two checks into a check if the exponent bits are 0. The
inverted case isn't reachable until a future change, and GlobalISel
currently doesn't attempt the inversion optimization.
https://reviews.llvm.org/D143182
The aarch64 backend will benefit from expanding 64vector sdiv/udiv into mulh
using shift(mul(ext, ext)), as the larger type size is legal and the mul(ext,
ext) can efficiently use smull/umull instructions. This extends the existing
code in GetMULHS to handle vector types for it.
Differential Revision: https://reviews.llvm.org/D154049
When the sign of either of the operands is known, it is possible to
determine what the saturating value will be without having to compute it
using the sign bits.
Differential Revision: https://reviews.llvm.org/D153575
`ctpop(X) eq/ne 1` is checking if X is a non-zero power of 2. Power of
2 check including zero is `(X & (X-1)) eq/ne 0` and unfortunately
there is no good pattern for checking a power of 2 while excluding
zero. So, when lowering `ctpop(X) eq/ne 1`, explicitly check
`IsKnownNeverZero(X)` to maybe be able to optimize out the extra zero
check.
We need this explicitly as DAGCombiner does not re-analyze provable
setcc nodes, and the middle-end never finds it beneficially to broaden
`ctpop(X) eq/ne 1` -> `ctpop(X) ule/ugt 1` (power of 2 including
zero).
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D152675
Note: Some patterns in visitFMA are needed refined to support splat of constant.
Reviewed By: luke
Differential Revision: https://reviews.llvm.org/D152260
Use computeOverflowForUnsignedAdd and computeOverflowForSignedAdd
instead. Unfortunately, this recomputes some known bits and sign bits
we may have already computed, but was the easiest fix without a lot
of restructuring.
This recovers the regressions from D151472.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D151858
This exposed a miscompile due to incorrect flag preservation in
integer type legalization, which has been fixed in D151472.
-----
This patch is a continuation of D150110. It separates the cases for
ADD and SUB into their own cases so that computeForAddSub can be
directly called and the NSW flag passed. This allows better
optimization when the NSW flag is enabled, and allows fixing up the
TODO that was there previously in SimplifyDemandedBits.
Differential Revision: https://reviews.llvm.org/D150769
This patch is a continuation of D150110. It separates the cases for
ADD and SUB into their own cases so that computeForAddSub can be
directly called and the NSW flag passed. This allows better
optimization when the NSW flag is enabled, and allows fixing up the
TODO that was there previously in SimplifyDemandedBits.
Differential Revision: https://reviews.llvm.org/D150769
Define intersectWith and unionWith as two complementary ways of
combining KnownBits. The names are chosen for consistency with
ConstantRange.
Deprecate commonBits as a synonym for intersectWith.
Differential Revision: https://reviews.llvm.org/D150443
Correct the legality of i32 mul_lohi on AArch64.
Previously, AArch64 incorrectly reported i32 mul_lohi as Legal.
This allowed BuildUDIV/SDIV to use them. A later DAGCombiner would
replace them with MULHS/MULHU because only the high half was used.
This conversion does not check the legality of MULHS/MULHU under
the assumption that LegalizeDAG can turn it back into MUL_LOHI later.
After they are converted to MULHS/MULHU, DAGCombine ran and saw that
these operations aren't supported but an i64 MUL is. So they get
converted to that plus a shift. Without this, LegalizeDAG would
convert back MUL_LOHI and isel would fail to find a pattern.
This patch teaches BuildUDIV/SDIV to create the wide mul and shift
so that we can report the correct operation legality on AArch64. It
also enables div by constant folding for more cases on VE.
I don't know if VE wants this div by constant optimization or not. If they
don't want it, they can use the isIntDivCheap hook to disable it.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D150333
In the SimplifyDemandedBits function, there is a fallthrough to the
default case in the case of ISD::ADD, ISD::MUL and ISD::SUB. This
leads to a call to computeKnownBits which is unnecessary as the
calls to SimplifyDemandedBits in the cases themselves handle the
calculation of the known bits. This information is discarded through
the Known2 variables.
By keeping this information around and calling
KnownBits::mul or KnownBits::computeForAddSub directly, the
unnecessary computation can be avoided. For now, the NSW bit is not
passed through to KnownBits as this is something that
computeKnownBits does not handle either. This requires updating
computeForAddCarry to handle the flag as well.
Differential Revision: https://reviews.llvm.org/D150110
Move the fold from X86 to generic expansion
(We also have several existing expansions that are missing freezes on repeated operands - I've added a TODO for now).