13322 Commits

Author SHA1 Message Date
Noah Goldstein
ae76dfb747 [X86] Don't always separate conditions in (br (and/or cond0, cond1)) into separate branches
It makes sense to split if the cost of computing `cond1` is high
(proportionally to how likely `cond0` is), but it doesn't really make
sense to introduce a second branch if it's only a few instructions.

Splitting can also get in the way of potentially folding patterns.

This patch introduces some logic to check whether the cost of
computing `cond1` is relatively low and, if so, not to split the
branches.
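A source-level illustration of the two shapes, assuming `cond1` is cheap and has no side effects. The function names are hypothetical, and this is not the patch's cost model: the short-circuit form is the one that gets split into two branches, while the combined form evaluates both conditions and branches once.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical example. Short-circuit form: typically lowered as two
// conditional branches (cond1 is evaluated only when cond0 is true).
int32_t selectSplit(int32_t a, int32_t b, int32_t x, int32_t y) {
  if (a > 0 && b < 10)
    return x;
  return y;
}

// Combined form: both comparisons are computed unconditionally and joined
// with a bitwise AND, so a single branch (or a select) suffices. This is
// only equivalent because `b < 10` has no side effects and cannot trap.
int32_t selectCombined(int32_t a, int32_t b, int32_t x, int32_t y) {
  bool cond0 = a > 0;
  bool cond1 = b < 10;
  if (cond0 & cond1)
    return x;
  return y;
}
```

Whether the second form is cheaper depends on how expensive `cond1` is relative to a likely-predicted branch, which is the trade-off the patch tries to estimate.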

Modest improvement on clang bootstrap build:
https://llvm-compile-time-tracker.com/compare.php?from=79ce933114e46c891a5632f7ad4a004b93a5b808&to=978278eabc0bafe2f390ca8fcdad24154f954020&stat=cycles
Average stage2-O3:   0.59% Improvement (cycles)
Average stage2-O0-g: 1.20% Improvement (cycles)

Likewise, llvm-test-suite on SKX saw a net 0.84% improvement (cycles).

There is also a modest compile time improvement with this patch:
https://llvm-compile-time-tracker.com/compare.php?from=79ce933114e46c891a5632f7ad4a004b93a5b808&to=978278eabc0bafe2f390ca8fcdad24154f954020&stat=instructions%3Au

Note that the increase in stage2 instruction count is expected: this
patch trades extra instructions for fewer branch misses (which are
proportionately lower):
https://llvm-compile-time-tracker.com/compare.php?from=79ce933114e46c891a5632f7ad4a004b93a5b808&to=978278eabc0bafe2f390ca8fcdad24154f954020&stat=branch-misses

NB: This will also likely help for APX targets with the new `CCMP` and
`CTEST` instructions.

Closes #81689
2024-03-01 15:35:34 -06:00
David Green
dbca8a49b6
[DAG] Improve known bits of Zext/Sext loads with range metadata (#80829)
This extends the known bits for extending loads which have range
metadata, handling the range metadata on the original memory type,
extending that to the correct BitWidth.
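A minimal sketch of the idea for the zero-extending case, assuming an unsigned, non-wrapping range `[lo, hi)`: every loaded value is at most `hi - 1`, so all bits above that value's highest set bit are known zero, both in the memory type and in the zero-extended result. The helper is hypothetical; LLVM's own code works on `KnownBits` and the range metadata directly.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not LLVM's API): mask of bits known to be zero in a
// value loaded with non-wrapping unsigned range metadata [lo, hi) and then
// zero-extended to bitWidth bits (1 <= bitWidth <= 64).
uint64_t knownZeroFromRange(uint64_t lo, uint64_t hi, unsigned bitWidth) {
  assert(lo < hi && bitWidth >= 1 && bitWidth <= 64);
  (void)lo; // in this simplified sketch only the upper bound is used
  uint64_t maxVal = hi - 1; // largest value the load can produce

  // Number of significant bits in maxVal; every higher bit is zero for all
  // values in the range.
  unsigned activeBits = 0;
  while (activeBits < 64 && (maxVal >> activeBits) != 0)
    ++activeBits;

  uint64_t widthMask = bitWidth == 64 ? ~0ULL : ((1ULL << bitWidth) - 1);
  uint64_t activeMask = activeBits == 64 ? ~0ULL : ((1ULL << activeBits) - 1);
  return widthMask & ~activeMask;
}
```

For example, an `i8` load with range `[0, 100)` zero-extended to `i32` has a maximum of 99 (7 significant bits), so bits 7 to 31 are known zero.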
2024-02-29 12:53:13 +00:00
Craig Topper
e7a303e3cf
[SelectionDAG] Remove unused getIndexedStridedLoadVP/getIndexedStridedStoreVP functions. NFC (#82847)
These appear to have been copied from getIndexedLoadVP/getIndexedStoreVP
which in turn were copied from the non-VP versions.
2024-02-28 15:02:48 -08:00
David Green
6e41d60a71
[SelectionDAG] Change computeAliasing signature from optional<uint64> to LocationSize. (#83017)
This is another smaller step of #70452, changing the signature of
computeAliasing() from optional<uint64_t> to LocationSize, and follow-up
changes in DAGCombiner::mayAlias(). There are some test changes due to
the previous AA->isNoAlias call incorrectly using an unknown size
(~UINT64_T(0)). This should then be improved again in #70452 when the
types are known to be scalable.
2024-02-28 09:43:05 +00:00
Craig Topper
62d0c01c2c
[SelectionDAG] Remove pointer from MMO for VP strided load/store. (#82667)
MachineIR alias analysis assumes that only bytes after the pointer will
be accessed. This is incorrect if the stride is negative.
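To see why, compute the byte interval touched by a strided access of `n` elements of `eltSize` bytes, starting at `base` with a byte stride of `stride`. The struct and helper below are hypothetical, a sketch using plain integers for addresses:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical half-open byte interval [begin, end).
struct ByteRange {
  int64_t begin; // first byte accessed (inclusive)
  int64_t end;   // one past the last byte accessed
};

// Hypothetical helper: bytes touched by n >= 1 elements of eltSize bytes
// at base, base + stride, ..., base + (n - 1) * stride.
ByteRange stridedAccessRange(int64_t base, int64_t stride, int64_t n,
                             int64_t eltSize) {
  assert(n >= 1 && eltSize >= 1);
  int64_t last = base + (n - 1) * stride; // address of the final element
  int64_t lo = stride >= 0 ? base : last;
  int64_t hi = (stride >= 0 ? last : base) + eltSize;
  return {lo, hi};
}
```

With a negative stride the interval begins before `base`, so a memory operand that only describes bytes at and after the pointer understates what is actually accessed.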

This is causing miscompiles in our downstream after SLP started making
strided loads.

Fixes #82657
2024-02-26 16:15:34 -08:00
Noah Goldstein
15a7de697a [SelectionDAG] Support sign tracking through {S|U}INT_TO_FP
Just a minimal amount of easily provable tracking.

Proofs: https://alive2.llvm.org/ce/z/RQYbdw

Closes #82808

Alive2 has an issue with `(sitofp i1)`, but that case can
be verified by hand: https://godbolt.org/z/qKr7hT7s9
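The facts being tracked are easy to state: `uitofp` never produces a negative result, and `sitofp` preserves the sign of its input (zero becomes +0.0). A spot check over edge-case 32-bit inputs, using hypothetical helper names; this is a sanity sketch, not the Alive2 proof:

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical helper: uitofp never sets the sign bit of the result.
bool uitofpNonNegative(uint32_t x) {
  float f = static_cast<float>(x);
  return !std::signbit(f);
}

// Hypothetical helper: sitofp yields a negative result exactly when the
// input is negative (0 converts to +0.0, whose sign bit is clear).
bool sitofpPreservesSign(int32_t x) {
  float f = static_cast<float>(x);
  return std::signbit(f) == (x < 0);
}
```

In IR, `i1 true` read as signed is -1, so `sitofp i1 true` yields -1.0, which is still consistent with sign preservation.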
2024-02-26 15:35:38 -06:00
Craig Topper
f1bb88bee2
[RISCV] Use PromoteSetCCOperands to promote operands for UMAX/UMIN during type legalization. (#82716)
For RISC-V, we were always choosing to sign extend when promoting
i32->i64. If the promoted inputs happen to be zero extended already, we
should use zero extend instead. This is what we do for SETCC.
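Either extension is valid here because both preserve the unsigned order of 32-bit values: sign extension maps `[2^31, 2^32)` to the top of the 64-bit range, which is still above all of `[0, 2^31)`. A sketch with hypothetical helper names, checking that `umax` computed on either promoted form truncates back to the right answer:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical helpers modelling i32 -> i64 promotion.
uint64_t zext64(uint32_t x) { return static_cast<uint64_t>(x); }
uint64_t sext64(uint32_t x) {
  // Reinterpret as signed, sign-extend, then view as unsigned 64-bit.
  return static_cast<uint64_t>(static_cast<int64_t>(static_cast<int32_t>(x)));
}

// i32 umax emulated by promoting both operands with `ext`, taking the
// 64-bit unsigned max, and truncating the result back to 32 bits.
template <typename Ext>
uint32_t promotedUMax(uint32_t a, uint32_t b, Ext ext) {
  return static_cast<uint32_t>(std::max(ext(a), ext(b)));
}
```

Since both promotions give the same truncated result, the legalizer is free to pick whichever extension the operands already have.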
2024-02-26 10:31:58 -08:00
David Green
257cbea20d [DAG] Format DAGCombiner::mayAlias. NFC
2024-02-26 18:22:35 +00:00
Yeting Kuo
e510fc7753
[VP][RISCV] Introduce vp.lrint/llrint and RISC-V support. (#82627)
RISC-V implements vector lrint/llrint by vfcvt.x.f.v.
2024-02-26 16:37:41 +08:00
Owen Anderson
2c5a68858b
Fix non-splat vector SREM expansion when one of the divisors is a power of two. (#82706)
The expansion previously used, derived from Hacker's Delight,
does not work correctly when the dividend is INT_MIN and the
divisor is a power of two. We now use an alternate derivation
of the A and Q constants specifically for the power-of-two divisor
case to avoid this problem. Credit to Fabian Giesen for the
new derivation.
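The patch's A/Q derivation is not reproduced here. As a hedged reference for the edge case only, here is the common branch-free remainder by `2^k` (not necessarily what LLVM emits), which can be checked against C++ `%`. Like `srem`, C++ `%` truncates toward zero, and the check includes `INT32_MIN`. The helper name is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: x srem 2^k for 1 <= k <= 30. The result has the
// sign of the dividend. A bias of 2^k - 1 is added to negative dividends
// so that masking rounds toward zero instead of toward -infinity.
int32_t sremPow2(int32_t x, unsigned k) {
  assert(k >= 1 && k <= 30);
  uint32_t sign = x < 0 ? 0xFFFFFFFFu : 0u; // all ones if negative
  uint32_t bias = sign >> (32 - k);         // 2^k - 1, or 0
  uint32_t mask = ~((1u << k) - 1);         // clears the low k bits
  // Work in uint32_t so the intermediate arithmetic cannot overflow.
  uint32_t ux = static_cast<uint32_t>(x);
  uint32_t rounded = (ux + bias) & mask;    // x rounded toward zero to a multiple of 2^k
  return static_cast<int32_t>(ux - rounded);
}
```

For `INT32_MIN` the bias never carries past the cleared bits, so the remainder comes out as 0 for every `k`, as it should.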

Fixes https://github.com/llvm/llvm-project/issues/77169
2024-02-25 10:13:05 -05:00
Craig Topper
962a6970f2
[SelectionDAG] Remove unused VP strided load/store creation functions that build an MMO. (#82676)
The base cases of these call InferPtrInfo. This is dangerous due to
#82657, but it turns out none of these are used.

It seemed best to reduce the surface area until these are needed.
2024-02-23 10:15:49 -08:00
Orlando Cazalet-Hyams
8a16422020
[RemoveDIs] Add DPLabels support [3a/3] (#82633)
Patch 3a of 3 to add llvm.dbg.label support to the RemoveDIs project. The
patch stack adds the DPLabel class, which is the RemoveDIs llvm.dbg.label
equivalent.

   1. Add DbgRecord base class for DPValue and the not-yet-added
       DPLabel class.
   2. Add the DPLabel class.
-> 3. Add support to passes.

The next patch, #82639, will enable conversion between dbg.labels and DPLabels.

AssignmentTrackingAnalysis support could have gone two ways:

1. Have the analysis store a DPLabel representation in its results -
   SelectionDAGBuilder reads the analysis results and ignores all DbgRecord
   kinds.
2. Ignore DPLabels in the analysis - SelectionDAGBuilder reads the analysis
   results but still needs to iterate over DPLabels from the IR.

I went with option 2 because it's less work and is no less correct than 1. It's
worth noting that this causes labels to sink to the bottom of packs of debug records,
e.g., [value, label, value] becomes [value, value, label]. This shouldn't be a
problem because labels and variable locations don't have an ordering requirement.
The ordering between variable locations is maintained and the label movement is
deterministic.
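The reordering described above behaves like a stable partition: labels move after the variable-location records, and records of each kind keep their relative order. A toy model with hypothetical types (not the RemoveDIs data structures):

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-ins for debug records.
enum class Kind { Value, Label };

struct Record {
  Kind kind;
  std::string name;
};

// Move all labels after all variable-location records while preserving
// the relative order within each group, so the result is deterministic.
void sinkLabels(std::vector<Record> &records) {
  std::stable_partition(records.begin(), records.end(),
                        [](const Record &r) { return r.kind == Kind::Value; });
}
```

Applied to `[v1, label, v2]`, this yields `[v1, v2, label]`, matching the example in the message.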
2024-02-23 11:37:21 +00:00
Yeting Kuo
850dde063b
[RISCV][VP] Introduce vp saturating addition/subtraction and RISC-V support. (#82370)
This patch also moves the MatchContext framework from DAGCombiner into
a separate header file so that it can be used from other files in
llvm/lib/CodeGen/SelectionDAG/.
2024-02-23 14:17:15 +08:00
Craig Topper
de41eae41f
[SelectionDAG][RISCV] Use FP type for legality query for LRINT/LLRINT in LegalizeVectorOps. (#82728)
This matches how LRINT/LLRINT is queried for scalar types in
LegalizeDAG.

It's confusing if they do different things since a "Legal" vector
LRINT/LLRINT would get through to LegalizeDAG which would then consider
it illegal. This doesn't happen currently because RISC-V uses Custom.
2024-02-22 20:18:52 -08:00
Craig Topper
c1716e3fcf
[DAGCombiner][RISCV] CSE zext nneg and sext. (#82597)
If we have a sext and a zext nneg with the same types and operand
we should combine them into the sext. We can't go the other way
because the nneg flag may only be valid in the context of the uses
of the zext nneg.
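Why only this direction is safe, in a small sketch with hypothetical helper names: for a non-negative operand, zero and sign extension agree, so the `sext` can stand in for the `zext nneg`. A plain `sext` carries no non-negativity guarantee, though, and for a negative input the `zext nneg` result is poison, so the reverse replacement would be wrong.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers modelling i32 -> i64 extensions.
int64_t sext32to64(int32_t x) { return static_cast<int64_t>(x); }
int64_t zext32to64(int32_t x) {
  return static_cast<int64_t>(static_cast<uint32_t>(x));
}
```

For example, both helpers return 5 for an input of 5, but for -1 the sign extension gives -1 while the zero extension gives `0xFFFFFFFF`.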
2024-02-22 09:06:49 -08:00
David Majnemer
be36812fb7 [TargetLowering] Be more efficient in fp -> bf16 NaN conversions
We can avoid masking completely as it is OK (and probably preferable) to
bring over some of the existing NaN payload.
2024-02-21 22:47:27 +00:00
David Majnemer
9eff001d3d [TargetLowering] Correctly yield NaN from FP_TO_BF16
We didn't set the exponent field, resulting in tiny numbers instead of
NaNs.
2024-02-21 22:17:02 +00:00
David Majnemer
ddc0f1d8fe [TargetLowering] Actually add the adjustment to the significand
The logic was supposed to be choosing between {0, 1, -1} as an
adjustment to the FP bit pattern. However, the adjustment itself was
used as the bit pattern instead, which resulted in garbage results.
2024-02-21 19:34:11 +00:00
David Majnemer
cc13f3ba45
Correctly round FP -> BF16 when SDAG expands such nodes (#82399)
We did something pretty naive:
- round FP64 -> BF16 by first rounding to FP32
- skip FP32 -> BF16 rounding entirely
- take the top 16 bits of an FP32, which turns some NaNs into
infinities

Let's do this in a more principled way by rounding types with more
precision than FP32 to FP32 using round-inexact-to-odd, which avoids
double-rounding issues.
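A scalar sketch of this scheme, with hypothetical function names (the SelectionDAG expansion differs in detail). It first narrows `double` to `float` with round-to-odd: truncate, then set the lowest significand bit if anything was discarded. It then rounds `float` to bfloat16 with round-to-nearest-even, quieting NaNs while keeping their upper payload bits. The sticky odd bit keeps a value just above a bf16 halfway point from being rounded down twice.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Hypothetical helper: float -> bf16 bits, round-to-nearest-even.
// NaNs are quieted while keeping the upper payload bits.
uint16_t floatToBF16(float f) {
  uint32_t bits;
  std::memcpy(&bits, &f, sizeof bits);
  if ((bits & 0x7FFFFFFFu) > 0x7F800000u)                 // NaN
    return static_cast<uint16_t>((bits >> 16) | 0x0040u); // set quiet bit
  uint32_t lsb = (bits >> 16) & 1u;                       // ties go to even
  bits += 0x7FFFu + lsb;
  return static_cast<uint16_t>(bits >> 16);
}

// Hypothetical helper: double -> float rounded to odd. Truncate toward
// zero, then force the lowest significand bit to 1 if the result is inexact.
float doubleToFloatRoundToOdd(double d) {
  if (std::isnan(d))
    return static_cast<float>(d);
  float f = static_cast<float>(d);             // round-to-nearest first
  if (std::fabs(static_cast<double>(f)) > std::fabs(d))
    f = std::nextafter(f, 0.0f);               // step back toward zero
  if (static_cast<double>(f) != d) {
    uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    bits |= 1u;                                // sticky "inexact" bit
    std::memcpy(&f, &bits, sizeof bits);
  }
  return f;
}

uint16_t doubleToBF16(double d) {
  return floatToBF16(doubleToFloatRoundToOdd(d));
}
```

For `1 + 2^-8 + 2^-40`, rounding to the nearest `float` first lands exactly on the bf16 halfway point `1 + 2^-8`, which then ties down to `0x3F80`. Round-to-odd keeps the sticky bit, so the final result correctly rounds up to `0x3F81`.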
2024-02-21 12:37:02 -05:00
Paul Walker
28fb2b33c2
[LLVM][SelectionDAG] Reduce number of ComputeValueVTs variants. (#75614)
This is another step in the direction of fixing the `Fixed(0) !=
Scalable(0)` bugbear, although whilst weird I don't believe it's causing
us any real issues.
2024-02-21 13:03:24 +00:00
Sameer Sahasrabuddhe
a2afcd5721 Revert "Implement convergence control in MIR using SelectionDAG (#71785)"
This reverts commit 79889734b940356ab3381423c93ae06f22e772c9.

Encountered multiple buildbot failures.
2024-02-21 11:07:02 +05:30
Sameer Sahasrabuddhe
79889734b9
Implement convergence control in MIR using SelectionDAG (#71785)
LLVM function calls carry convergence control tokens as operand bundles, where
the tokens themselves are produced by convergence control intrinsics. This patch
implements convergence control tokens in MIR as follows:

1. Introduce target-independent ISD opcodes and MIR opcodes for convergence
   control intrinsics.
2. Model token values as untyped virtual registers in MIR.

The change also introduces an additional ISD opcode CONVERGENCECTRL_GLUE and a
corresponding machine opcode with the same spelling. This glues the convergence
control token to SDNodes that represent calls to intrinsics. The glued token is
later translated to an implicit argument in the MIR.

The lowering of calls to user-defined functions is target-specific. On AMDGPU,
the convergence control operand bundle at a non-intrinsic call is translated to
an explicit argument to the SI_CALL_ISEL instruction. Post-selection adjustment
converts this explicit argument to an implicit argument on the SI_CALL
instruction.
2024-02-21 10:06:37 +05:30
Orlando Cazalet-Hyams
ababa96475
[RemoveDIs][NFC] Introduce DbgRecord base class [1/3] (#78252)
Patch 1 of 3 to add llvm.dbg.label support to the RemoveDIs project. The
patch stack adds a new base class

    -> 1. Add DbgRecord base class for DPValue and the not-yet-added
          DPLabel class.
       2. Add the DPLabel class.
       3. Enable dbg.label conversion and add support to passes.

Patches 1 and 2 are NFC.

In the near future we will also rename DPValue to DbgVariableRecord and
DPLabel to DbgLabelRecord, at which point we'll overhaul the function
names too. The name DPLabel keeps things consistent for now.
2024-02-20 16:00:55 +00:00
Craig Topper
f8cbb67b10
[DAGCombiner] Preserve nneg flag from inner zext when we combine (z/s/aext (zext X)) (#82199)
2024-02-19 12:21:17 -08:00
Craig Topper
f668a08e00
[DAGCombiner][RISCV] Optimize (zext nneg (truncate X)) if X has known sign bits. (#82227)
This treats the zext nneg as sext if X is known to have sufficient sign
bits to allow the zext or truncate or both to be removed. This code is
taken from the same optimization for sext.
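The underlying fact, sketched for `i64 -> i32 -> i64` with hypothetical helper names: if `X` has more than 32 sign bits (its top 33 bits are all equal), truncating to 32 bits and sign-extending back reproduces `X`. Because `nneg` guarantees the truncated value is non-negative, the `zext nneg` behaves as that `sext`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: number of leading bits equal to the sign bit
// (always >= 1), in the spirit of SelectionDAG's ComputeNumSignBits.
unsigned numSignBits(int64_t x) {
  uint64_t u = static_cast<uint64_t>(x);
  uint64_t sign = u >> 63;
  unsigned n = 1;
  while (n < 64 && ((u >> (63 - n)) & 1u) == sign)
    ++n;
  return n;
}

// Hypothetical helper: trunc to i32 followed by sext back to i64.
int64_t truncThenSext(int64_t x) {
  return static_cast<int64_t>(static_cast<int32_t>(static_cast<uint32_t>(x)));
}
```

With exactly 32 sign bits (for example `0x80000000`), bit 31 of the truncated value differs from the discarded upper bits and the round trip fails, which is why more than 32 sign bits are required.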
2024-02-19 10:45:11 -08:00
Manish Kausik H
652081ca9e
[NFC][SelectionDAG] Move function getStackAlignedMMO to the beginning of LegalizeDAG.cpp (#82171)
2024-02-19 19:11:42 +04:00
Tim Northover
0215d2c58b arm64_32: extend @llvm.stackguard call to in-DAG 64-bits before handing off
Pointers are 64-bits in the DAG, so we need to extend the result of loading the
cookie when building the DAG.
2024-02-19 10:32:29 +00:00
Craig Topper
d5167c84f9
[DAGCombiner] Allow tryToFoldExtOfLoad to use a sextload for zext nneg. (#81714)
If the load is used by any signed setccs, we can use a sextload
instead of zextload. Then we don't have to give up on extending
the load.
2024-02-17 11:37:13 -08:00
Craig Topper
d485317357
[TargetLowering] Emit SIGN_EXTEND_INREG instead of shift pair from optimizeSetCCOfSignedTruncationCheck. (#81785)
sext_inreg is our canonical form of shift pair before op legalization, so
DAG combiner will probably create it anyway. If it isn't legal,
LegalizeDAG will expand it to shifts later.
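For an `i32` value, `sign_extend_inreg` from `i8` is equivalent to a left shift by 24 followed by an arithmetic right shift by 24. A sketch comparing the two forms (hypothetical helper names):

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: sext_inreg(x, i8), i.e. keep the low 8 bits and
// sign-extend them to 32 bits.
int32_t sextInReg8(int32_t x) {
  return static_cast<int32_t>(static_cast<int8_t>(static_cast<uint8_t>(x)));
}

// Hypothetical helper: the equivalent shift pair. The left shift is done in
// unsigned arithmetic to avoid signed-overflow UB; the right shift of a
// negative int32_t is arithmetic (guaranteed since C++20 and done by
// mainstream compilers before that).
int32_t shiftPair8(int32_t x) {
  uint32_t shl = static_cast<uint32_t>(x) << 24;
  return static_cast<int32_t>(shl) >> 24;
}
```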
2024-02-15 09:24:02 -08:00
Simon Pilgrim
b279ca2783 [DAG] visitCTPOP - CTPOP(SHIFT(X)) -> CTPOP(X) iff the shift doesn't affect any non-zero bits
If the source is being (logically) shifted, but doesn't affect any active bits, then we can call CTPOP on the shift source directly.
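For example, a logical right shift by `s` drops only the low `s` bits, so if those bits are known zero the population count is unchanged. A sketch using `std::bitset`, with hypothetical helper names:

```cpp
#include <bitset>
#include <cassert>
#include <cstdint>

// Hypothetical helper: population count of a 32-bit value.
unsigned popcount32(uint32_t x) {
  return static_cast<unsigned>(std::bitset<32>(x).count());
}

// Hypothetical helper: true when a logical right shift by s (s < 32)
// discards no set bits, i.e. the low s bits of x are zero. In that case
// popcount(x >> s) == popcount(x).
bool srlPreservesPopcount(uint32_t x, unsigned s) {
  assert(s < 32);
  uint32_t lowMask = (s == 0) ? 0u : ((1u << s) - 1u);
  return (x & lowMask) == 0;
}
```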
2024-02-15 10:41:08 +00:00
Craig Topper
86ce491f30
[DAGCombiner] Remove unneeded commonAlignment from reduceLoadWidth. (#81707)
We already have the PtrOff factored into MachinePointerInfo. Any calls
to getAlign on the new load will do commonAlignment with the
MachinePointerInfo offset and the base alignment.
2024-02-13 23:26:25 -08:00
Craig Topper
e6253102a7
[DAGCombiner] Remove unnecessary commonAlignment from CombineExtLoad. (#81705)
The getAlign function for a load returns the commonAlignment of the
"base align" and the offset stored in the MachinePointerInfo.

We're splitting a load here, so we should take the base alignment from
the original load without any offset that may already exist in the
original load. The new load can then maintain its own alignment using
just the base alignment and its own offset.

Noticed by inspection.
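The alignment rule being relied on: if a pointer is `A`-aligned (`A` a power of two), then adding a byte offset `O` leaves it aligned to the largest power of two dividing both `A` and `O`. Below is a standalone sketch in the spirit of LLVM's `commonAlignment`/`MinAlign`. The function is hypothetical and uses plain integers rather than `llvm::Align`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: largest power of two dividing both baseAlign
// (itself a power of two) and offset. An offset of 0 keeps baseAlign.
uint64_t commonAlign(uint64_t baseAlign, uint64_t offset) {
  assert(baseAlign != 0 && (baseAlign & (baseAlign - 1)) == 0);
  uint64_t v = baseAlign | offset;
  return v & (~v + 1); // isolate the lowest set bit
}
```

Applying the offset twice can only lose information. With a 16-byte-aligned base, an access at offset 16 is 16-byte aligned: `commonAlign(16, 16) == 16`. If the original load's already-reduced alignment `commonAlign(16, 8) == 8` were used as the base instead, the result would drop to 8.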
2024-02-13 23:26:08 -08:00
Heejin Ahn
473ef10b0f
[WebAssembly] Demote PHIs in catchswitch BB only (#81570)
`DemoteCatchSwitchPHIOnly` option in `WinEHPrepare` pass was added in
99d60e0dab,
because Wasm EH uses `WinEHPrepare`, but it doesn't need to demote all
PHIs. PHIs in `catchswitch` BBs have to be removed (= demoted) because
`catchswitch`s are removed in ISel and `catchswitch` BBs are removed as
well, so they can't have other instructions.

But because Wasm EH doesn't use funclets, PHIs in `catchpad` or
`cleanuppad` BBs don't need to be demoted. That was the reason the
`DemoteCatchSwitchPHIOnly` option was added: to avoid demoting more
instructions unnecessarily.

The problem is that it should have been set to `true` for Wasm EH (its
default value is `false` for WinEH), but I mistakenly set it to `false`
and wasn't aware of this for more than 5 years. This was not the end
of the world; it just means we've been demoting more instructions than
we should, possibly hurting code size. In practice I think it would've
had hardly any effect on real performance, given that PHIs in
`catchpad` or `cleanuppad` BBs are not very frequent and many
people run other optimizers like Binaryen anyway.
2024-02-13 13:43:21 -08:00
Danila Malyutin
e20462a069
[StatepointLowering] Use Constant instead of TargetConstant for undef value (#81635)
Prevents isel errors when trying to lower gc relocate of undef value
(which turns into CopyToReg of TargetConstant). Such relocates may occur
after DCE (e.g. after GVN removes some dead blocks) if there are no
passes like instcombine scheduled afterwards to clean them up.

Fixes #80294

---------

Co-authored-by: Matt Arsenault <arsenm2@gmail.com>
2024-02-13 21:58:01 +03:00
Joseph Huber
11fcae69db
[LLVM] Add __builtin_readsteadycounter intrinsic and builtin for realtime clocks (#81331)
Summary:
This patch adds a new intrinsic and builtin function mirroring the
existing `__builtin_readcyclecounter`. The difference is that this
implementation targets a separate counter that some targets have which
returns a fixed-frequency clock that can be used to determine elapsed
time, unlike the cycle counter, which often has a variable
frequency.

This patch only adds support for the NVPTX and AMDGPU targets.

This is done as a new and separate builtin rather than an argument to
`readcyclecounter` to avoid needing to change existing code and to make
the separation more explicit.
2024-02-13 10:06:25 -06:00
Nikita Popov
25b9ed6e49
[DAGCombine] Fix multi-use miscompile in load combine (#81586)
The load combine replaces a number of original loads with one new load
and also replaces the output chains of the original loads with the
output chain of the new load. This is incorrect if the original load is
retained (due to multi-use), as it may get incorrectly reordered.

Fix this by using makeEquivalentMemoryOrdering() instead, which will
create a TokenFactor with both chains.

Fixes https://github.com/llvm/llvm-project/issues/80911.
2024-02-13 16:41:00 +01:00
Simon Pilgrim
d30e941a03
[DAG] Add SelectionDAG::getShiftAmountConstant APInt variant (#81484)
Asserts that the shift amount is in range and update ExpandShiftByConstant to use getShiftAmountConstant (and legal shift amount types).
2024-02-13 08:06:16 +00:00
Simon Pilgrim
b35c519762 [DAG] tryToFoldExtendOfConstant - share the same SDLoc argument instead of recreating it over and over again.
2024-02-08 11:43:29 +00:00
Jeremy Morse
faa2f9658a
[DebugInfo] Handle dbg.assigns in FastISel (#80734)
There are some rare circumstances where dbg.assign intrinsics can reach
FastISel. They are a more specialised kind of dbg.value intrinsic with
more information about the originating alloca. They only occur during
optimisation, but might reach FastISel through always_inlining an
optimised function into an optnone function.

This is a slight problem as it's not safe (for debug-info accuracy) to
ignore any intrinsics, and for RemoveDIs (the intrinsic-replacement
project) it causes a crash through an unhandled switch case. To get
around this, we can just treat the dbg.assign as a dbg.value (it's an
actual subclass) and use the variable location information from the
dbg.value fields. This loses a small amount of debug-info about stack
locations, but is more accurate than just ignoring the intrinsic.

(This has popped up deep in an LTO build of a large codebase while
testing RemoveDIs, I figured it'd be good to fix it for the
intrinsic-form at the same time, just to demonstrate the correct
behaviour).
2024-02-08 10:44:43 +00:00
Luke Lau
ece66dbc60
[SelectionDAG] Add computeKnownBits support for ISD::STEP_VECTOR (#80452)
This handles two cases where we can work out some known-zero bits for
ISD::STEP_VECTOR.

The first case handles when we know the low bits are zero because the
step amount is a power of two. This is taken from
https://reviews.llvm.org/D128159, and even though the original patch
didn't end up landing this case due to it not having any test
difference, I've included it here for completeness's sake.

The second case handles the case when we have an upper bound on
vscale_range. We can use this to work out the upper bound on the number
of elements, and thus what the maximum step will be. From the maximum
step we then know which hi bits are zero.

On its own, computing the known hi bits results in some small
improvements for RVV with -mrvv-vector-bits=zvl across the
llvm-test-suite. However I'm hoping to be able to use this later to
reduce the LMUL in index calculations for vrgather/indexed accesses.
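A scalar sketch of both cases, using a hypothetical helper that is not the `computeKnownBits` code. Element `i` of a step vector with step `S` is `i * S`. If `S = 2^k`, every element is a multiple of `2^k`, so the low `k` bits are zero. If `vscale <= maxVScale`, there are at most `minElts * maxVScale` elements, so the largest element is at most `(minElts * maxVScale - 1) * S`, and every bit above that value's top set bit is zero.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: known-zero mask for the elements of a scalable
// step vector <vscale x minElts x iN> with a positive constant step,
// given vscale <= maxVScale. Assumes 1 <= bitWidth <= 64 and that the
// largest element fits in 64 bits.
uint64_t stepVectorKnownZero(uint64_t step, uint64_t minElts,
                             uint64_t maxVScale, unsigned bitWidth) {
  assert(step != 0 && minElts != 0 && maxVScale != 0);
  assert(bitWidth >= 1 && bitWidth <= 64);
  uint64_t widthMask = bitWidth == 64 ? ~0ULL : ((1ULL << bitWidth) - 1);
  uint64_t knownZero = 0;

  // Case 1: a power-of-two step leaves the low log2(step) bits zero.
  if ((step & (step - 1)) == 0)
    knownZero |= step - 1;

  // Case 2: bound the largest element, then mark every higher bit zero.
  uint64_t maxElt = (minElts * maxVScale - 1) * step;
  unsigned activeBits = 0;
  while (activeBits < 64 && (maxElt >> activeBits) != 0)
    ++activeBits;
  uint64_t activeMask = activeBits == 64 ? ~0ULL : ((1ULL << activeBits) - 1);
  knownZero |= ~activeMask;

  return knownZero & widthMask;
}
```

For example, with step 4, 4 minimum elements and `vscale <= 16` on `i32`, the largest element is 252, so bits 0 and 1 and bits 8 to 31 are known zero (`0xFFFFFF03`).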

---------

Co-authored-by: Philip Reames <preames@rivosinc.com>
2024-02-08 10:04:55 +08:00
Simon Pilgrim
de7beb06e7 [DAG] ExpandShiftWithKnownAmountBit - reduce number of repeated getOpcode / getOperand calls. NFC.
2024-02-07 14:07:02 +00:00
Simon Pilgrim
670c2529bb [DAG] Use DAGCombiner::SimplifyDemandedBits wrappers with default (all) DemandedElts. NFC.
Don't call TLI.SimplifyDemandedVectorElts directly from every SimplifyDemandedBits call, use the more expressive wrappers instead first.

This reduces the number of places we call TLI.SimplifyDemandedVectorElts and CommitTargetLoweringOpt to make it easier to track.

Part of the work to process DAG nodes in topological order.
2024-02-07 11:12:29 +00:00
Craig Topper
0fb9f68bae
[SelectionDAG] Use getRegisterType instead of getTypeToTransformTo in ComputePHILiveOutRegInfo. (#80773)
Since we used getNumRegisters right before this, I think this is the
correct interface we should be using here.

I'm experimenting with making i32 legal on RISC-V 64, but using i64 for
the register type between basic blocks. This was one of the first issues
I found trying to do that.
2024-02-06 09:39:19 -08:00
David Green
2e3de997ab [DAG] Generalize setcc(setcc) fold to use known bits.
If we have a `SETCC (SETCC), 0, NE` and ZeroOrOneBooleanContent, we can remove
the outer setcc as it will produce the same value as the inner. This can be
generalized to anything where the top bits are known to be 0, as the value will
remain as 1 or 0.
2024-02-06 12:39:48 +00:00
Simon Pilgrim
b8cdc2638e
[DAG] visitCTPOP - if only the upper half of the ctpop operand is zero then see if it's profitable to only count the lower half. (#80473)
2024-02-06 12:19:31 +00:00
Philip Reames
e722d9662d [DAG] Avoid a crash when checking size of scalable type in visitANDLike
Fixes https://github.com/llvm/llvm-project/issues/80744. This transform
doesn't handle vectors at all. The fixed-length ones pass the first
check, but would fail the constant operand checks which immediately follow.
This patch takes the simplest approach, and just guards the transform
for scalar integers.
2024-02-05 14:30:10 -08:00
Craig Topper
6590d0fed5
[DAGCombiner][ARM] Teach reduceLoadWidth to handle (and (srl (load), C, ShiftedMask)) (#80342)
If we have a shifted mask, we may be able to reduce the load width
to the width of the non-zero part of the mask and use an offset
to the base address to remove the srl. The offset is given by
C+trailingzeros(ShiftedMask).

Then we add a final shl to restore the trailing zero bits.

I've used the ARM test because that's where the existing (and (srl
(load))) tests were.

The X86 test was modified to keep the H register.
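A little-endian sketch of the arithmetic, with hypothetical helpers (the real combine also checks legality and handles big-endian offsets). With `C = 8` and `ShiftedMask = 0xFF00`, the surviving bits are bits 16 to 23 of the loaded value. That is the byte at offset `(C + trailingzeros(ShiftedMask)) / 8 = 2`, which then needs a final `shl` by `trailingzeros(ShiftedMask) = 8`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: the original pattern
// (and (srl (load i32 p), C), ShiftedMask), with p read as little-endian.
uint32_t wideForm(const uint8_t *p, unsigned c, uint32_t shiftedMask) {
  uint32_t v = static_cast<uint32_t>(p[0]) |
               (static_cast<uint32_t>(p[1]) << 8) |
               (static_cast<uint32_t>(p[2]) << 16) |
               (static_cast<uint32_t>(p[3]) << 24);
  return (v >> c) & shiftedMask;
}

// Hypothetical helper: the narrowed pattern for an 8-bit-wide mask,
// (shl (zextload i8 (p + off)), tz) with tz = trailingzeros(ShiftedMask)
// and off = (C + tz) / 8. This sketch only handles byte-aligned offsets.
uint32_t narrowForm(const uint8_t *p, unsigned c, unsigned tz) {
  assert((c + tz) % 8 == 0);
  uint32_t byte = p[(c + tz) / 8];
  return byte << tz;
}
```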
2024-02-04 16:05:51 -08:00
Craig Topper
f72da9f4fd
[SelectionDAG] Use getShiftAmountConstant to simplify code. NFC (#80561)
Replace calls to getShiftAmountTy+getConstant with
getShiftAmountConstant.
2024-02-04 16:05:14 -08:00
Simon Pilgrim
114a33be47 [DAG] getStackAlignedMMO - return the getMachineMemOperand result directly (style). NFC.
2024-02-04 14:01:55 +00:00
Manish Kausik H
a768bc6ef6
[SelectionDAG] Use unaligned store to move AVX registers onto stack for extractelement (#78422)
Prior to this patch, SelectionDAG generated aligned moves onto the stack for
AVX registers even when the function was marked as a no-realign-stack
function. This led to a mismatch between the stack's actual alignment and
the alignment required by the generated instruction. This patch fixes the issue.

Fixes #77730
2024-02-02 22:49:31 +05:30