Matches the naming convention of the other m_Specific* matchers, and frees up the
m_Opc() matcher for future use in #84940 to allow us to capture the
opcode of an unknown binop.
Moving to m_SpecificOpc does mess up the formatting in a few places;
I've tried to refactor to use the m_Value(SDValue, ....) matcher where I
can to recover some whitespace.
A better version of #175801; see that PR for more info.
Fixes #185467
The original patch was checking the correctness of the transformation
based on the original Op1, which was then negated (in the case of
IsAdd). This patch fixes that issue by inverting the sign bit in that
case.
Also pushed a slight NFC change there to simplify the code and remove some
duplication.
alive2 proofs:
abds: https://alive2.llvm.org/ce/z/oJQPss
abdu: https://alive2.llvm.org/ce/z/HfPF5q
Note that the regression test is no longer (wrongly) affected by the
patch, as it was before.
The truncating store analogue of #181104.
Adds `Alignment` and `AddrSpace` parameters to
`TargetLoweringBase::getTruncStoreAction` and dependents, and introduces
a `getCustomTruncStoreAction` hook for targets to customize legalization
behavior using this new information.
This change is fully backwards compatible from the target's point of
view, with `setTruncStoreAction` having identical functionality. The
change is purely additive.
Fix two places where ForSigned was incorrectly passed to
computeConstantRange, causing wrong signed/unsigned range computation.
In computeConstantRangeIncludingKnownBits (DemandedElts overload),
the call omitted ForSigned, so Depth (unsigned) was implicitly
converted to bool for the ForSigned parameter. Introduced in
a6a66a4e6915.
In visitIMINMAX, the call always passed ForSigned=false, even when
folding SMAX/SMIN which query signed bounds from the resulting range.
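The first bug is an instance of a general C++ hazard: an `unsigned` argument silently converts to `bool` when it lands in a `bool` parameter slot. A minimal sketch of that hazard follows; `rangeQuery` and its parameter order are hypothetical and chosen only to reproduce the effect, they are not the real `computeConstantRange` signature.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for an analysis entry point. The parameter order is
// an assumption made for illustration, not LLVM's real signature.
std::string rangeQuery(int Op, bool ForSigned = false, unsigned Depth = 0) {
  (void)Op;
  return std::string(ForSigned ? "signed" : "unsigned") + "@depth" +
         std::to_string(Depth);
}

// Buggy caller: ForSigned is omitted, so Depth is implicitly converted to
// bool and the real Depth parameter falls back to its default of 0.
std::string buggyCall(int Op, unsigned Depth) { return rangeQuery(Op, Depth); }

// Fixed caller: every argument is passed explicitly.
std::string fixedCall(int Op, bool ForSigned, unsigned Depth) {
  return rangeQuery(Op, ForSigned, Depth);
}
```

With this model, any non-zero depth in `buggyCall` flips the query to signed and resets the depth, which is the kind of wrong signed/unsigned range computation described above.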
The very first step towards #83422 - which will move DAG combines to be
processed in topological order.
There is a lot of churn on existing tests that needs to be addressed
before this can be switched on globally, so this patch gives the ability to
enable it both on a per-target basis and via a command line option to
assist with testing and triage.
At the moment I'm focusing on addressing the x86 regressions (example in
the patch's basic test coverage), as that's the target I'm most familiar
with and fixing it will help with many other targets as well, but there might be
other/simpler targets that would benefit from earlier handling.
If we are going to legalize to a vector with the same element type
and mulh or mul_lohi are supported, allow the optimization before type
legalization.
RISC-V will widen vectors using vp.udiv/sdiv, which don't support the
division by constant optimization. In addition, type legalization will create
a build_vector with undef elements making it hard to match after type
legalization.
Other targets may need to widen by a combination of vector and scalar
divisions to avoid traps if we widen a vector with garbage.
I had to enable the MULHU->SRL DAG combine before type legalization to
prevent regressions. After type legalization, the multiply constant
build_vector will have undef elements and the combine won't trigger.
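For reference, a scalar sketch of what the division by constant optimization produces, and of the MULHU->SRL case mentioned above. The helper names are illustrative, and the divisor 3 with its magic constant is just one well-known example, not something taken from the patch.

```cpp
#include <cassert>
#include <cstdint>

// High 32 bits of the 64-bit product, i.e. what MULHU computes for i32.
constexpr uint32_t mulhu32(uint32_t A, uint32_t B) {
  return static_cast<uint32_t>((static_cast<uint64_t>(A) * B) >> 32);
}

// x / 3 rewritten as a multiply-high by ceil(2^33 / 3) followed by a
// logical shift right by 1.
constexpr uint32_t udiv3(uint32_t X) { return mulhu32(X, 0xAAAAAAABu) >> 1; }

static_assert(udiv3(10) == 3, "small value");
static_assert(udiv3(0xFFFFFFFFu) == 0xFFFFFFFFu / 3, "largest value");

// MULHU by a power of two is just a right shift: mulhu(x, 2^k) == x >> (32 - k).
static_assert(mulhu32(0x12345678u, 1u << 4) == (0x12345678u >> 28),
              "MULHU by 2^4 equals SRL by 28");
```

If a lane of the multiply constant is undef, neither rewrite can be matched per lane, which is why both have to run before type legalization here.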
Resolves #175150
Defines computeConstantRange and computeConstantRangeIncludingKnownBits
in the SelectionDAG. Currently only the `ISD::VSCALE` operation is handled,
related to #174708.
Test cases were constructed to test varying VSCALE ranges on AArch64.
Further testing can be implemented as needed by review.
Selects of the form `cond ? 1 : 0` are created during unrolling of
setcc+vselect. Currently these are not optimized away post-legalization
even if fully redundant. Having these extra selects sitting between
things can prevent other folds from applying.
Enabling this requires some mitigations in the ARM backend, in
particular in the interaction with MVE support. There are two changes
here:
* Form CSINV/CSNEG/CSINC from CMOV, rather than only creating it during
SELECT_CC lowering. (After this change, the lowering in SELECT_CC can be
dropped without test changes, let me know if I should do that.)
* Support pushing negations through CMOV in more cases, in particular if
the operands are constant or the negation can be handled by flipping
lshr/ashr.
Additionally, in the X86 backend, try to simplify CMOV to SETCC if only the
low bit is demanded.
The index operand of ISD::EXTRACT_SUBVECTOR is implicitly scaled by
vscale, which is effectively always one for fixed-length vectors. When
combining nested extracts we must ensure all use the same implicit
scaling otherwise the transform is not equivalent.
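A toy model of why mixed scaling breaks the fold. It assumes each extract scales its own index by vscale when it is scalable and by 1 otherwise; the `Extract` struct and helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical description of one extract: its index operand and whether
// that index is implicitly scaled by vscale.
struct Extract {
  uint64_t Idx;
  bool Scalable;
};

inline uint64_t scaleOf(const Extract &E, uint64_t VScale) {
  return E.Scalable ? VScale : 1;
}

// Element offset selected by extract(extract(V, Inner), Outer).
inline uint64_t nestedOffset(Extract Inner, Extract Outer, uint64_t VScale) {
  return Inner.Idx * scaleOf(Inner, VScale) + Outer.Idx * scaleOf(Outer, VScale);
}

// Naive fold: add the indices and keep the outer extract's scaling.
inline uint64_t combinedOffset(Extract Inner, Extract Outer, uint64_t VScale) {
  return (Inner.Idx + Outer.Idx) * scaleOf(Outer, VScale);
}
```

With vscale = 4, a scalable inner extract at index 2 followed by a fixed-length outer extract at index 1 starts at element 9, while the naive fold starts at element 3. When both extracts use the same scaling the two offsets agree.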
Fixes https://github.com/llvm/llvm-project/issues/186563
Alternative approach to the same goals as #162407
This takes `TargetLoweringBase::getLoadExtAction`, renames it to
`TargetLoweringBase::getLoadAction`, merges `getAtomicLoadExtAction`
into it, and adds more inputs for relevant information (alignment,
address space).
The `isLoadExtLegal[OrCustom]` helpers are also modified in a matching
manner.
This is fully backwards compatible, with the existing `setLoadExtAction`
working as before. But this allows targets to override a new hook to
allow the query to make more use of the information. The hook
`getCustomLoadAction` is called with all the parameters whenever the
table lookup yields `LegalizeAction::Custom`, and can return any other
action it wants.
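The lookup-then-hook flow can be sketched roughly as below. This is a heavily simplified model: the class names, the four-entry table, and the hook's parameter list are all hypothetical and do not mirror the real `TargetLoweringBase` interface.

```cpp
#include <cassert>
#include <cstdint>

enum class Action { Legal, Promote, Expand, Custom };

// Hypothetical model of a lowering table with a customization hook.
class LoweringModel {
public:
  virtual ~LoweringModel() = default;

  void setAction(unsigned TypeIdx, Action A) {
    assert(TypeIdx < 4 && "toy table only has four entries");
    Table[TypeIdx] = A;
  }

  // Table lookup first; only a Custom entry consults the hook.
  Action getAction(unsigned TypeIdx, uint64_t Align, unsigned AddrSpace) const {
    assert(TypeIdx < 4 && "toy table only has four entries");
    Action A = Table[TypeIdx];
    return A == Action::Custom ? customAction(TypeIdx, Align, AddrSpace) : A;
  }

protected:
  // Default keeps the table's answer (an assumption of this sketch), so a
  // target that does not override the hook sees no behaviour change.
  virtual Action customAction(unsigned, uint64_t, unsigned) const {
    return Action::Custom;
  }

private:
  Action Table[4] = {Action::Legal, Action::Legal, Action::Legal, Action::Legal};
};

// Example target: sufficiently aligned accesses are legal, others expand.
class AlignAwareTarget : public LoweringModel {
protected:
  Action customAction(unsigned, uint64_t Align, unsigned) const override {
    return Align >= 4 ? Action::Legal : Action::Expand;
  }
};
```

The point of the design is that the existing table setters keep working unchanged, and the extra inputs only matter to targets that opt in by marking an entry Custom and overriding the hook.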
Fixes #181651
Added a DemandedElts argument to the isConstOrConstSplat and
isKnownToBeAPowerOfTwo calls; OrZero || isKnownNeverZero(Val, Depth) is now
checked before isKnownToBeAPowerOfTwo. Also added unit tests.
This is similar to the FSHL/FSHR handling in
hoistLogicOpWithSameOpcodeHands.
Here the opcodes aren't exactly the same, but the operations are
equivalent.
Fixes regressions from #180888
1. Use ISD::AssertNoFPClass if LoadInst has !nofpclass metadata.
2. Strip ISD::AssertNoFPClass when trying to combine load with bitcast
in DAGCombiner::visitBITCAST.
For wasm, forming minnum/maxnum style ISD nodes is non-profitable,
because (in cases where any float min/max support exists at all), it has
pmin/pmax instructions that correspond to the fcmp+select semantics, or
relaxed_fmin/relaxed_fmax (for the nnan+nsz case) with even looser
semantics.
As such, return false from isProfitableToCombineMinNumMaxNum(), and also
respect that hook in the SDAGBuilder.
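To make the semantic gap concrete, here is a small scalar comparison of the fcmp+select form against a minnum-style minimum. It assumes the select is written as `b < a ? b : a` and uses `std::fmin` as a stand-in for minnum with quiet NaNs; both function names are illustrative.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// fcmp+select form of a minimum. Any comparison involving NaN is false, so
// the result is simply A whenever either operand is NaN.
inline double selectMin(double A, double B) { return B < A ? B : A; }

// minnum-style minimum, modelled with std::fmin: a single quiet NaN operand
// is ignored and the other operand is returned.
inline double minnumLike(double A, double B) { return std::fmin(A, B); }
```

The two agree on ordinary numbers, but `selectMin(NaN, 1.0)` yields NaN while `minnumLike(NaN, 1.0)` yields 1.0, so converting one form into the other would need extra instructions around the native operation.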
This reverts commit ea3fdc5972db7f2d459e543307af05c357f2be26.
Re-enable const-folding for maxnum/minnum in the middle-end, GlobalISel,
and SelectionDAG.
Re-enable optimizations that depend on maxnum/minnum sNaN semantics in
InstCombine and DAGCombiner.
Now that maxnum(x, sNaN) is specified to non-deterministically produce
either NaN or x, these constant-foldings and optimizations are now valid
again according to the newly clarified semantics in #172012.
This patch teaches the backend how to lower the FCBRT DAG node to the
vector math library function when using ArmPL. This is similar to what
we already do for llvm.pow/FPOW, however the only way to expose this is
via a DAG combine that converts
FPOW(<2 x double> %x, <2 x double> <double 1.0/3.0, double 1.0/3.0>)
into
FCBRT(<2 x double> %x)
when the appropriate fast math flags are present on the node. I've
updated the DAG combine to handle vector types and only perform the
transformation if there exists a vector library variant of cbrt.
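One reason the combine has to be gated on fast math flags is that the two forms are not equivalent for all inputs. A scalar sketch using the C++ standard library (not ArmPL) shows the difference; the helper names are illustrative.

```cpp
#include <cassert>
#include <cmath>

// pow(x, 1/3) as it would appear in source; note that 1.0 / 3.0 is only the
// nearest double to one third.
inline double powThird(double X) { return std::pow(X, 1.0 / 3.0); }

// The cbrt form the combine produces.
inline double cubeRoot(double X) { return std::cbrt(X); }
```

For positive inputs the results agree to within rounding, but `powThird(-8.0)` is NaN (negative base, non-integer exponent) while `cubeRoot(-8.0)` is -2.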
Implements the missing basic folds for `ISD::CLMUL` in `visitCLMUL`:
- `(clmul x, 1)` → `x`
- `(clmul x, c_pow2)` → `(shl x, log2(c_pow2))`
These were previously only folded during scalar expansion
(`expandCLMUL`), so targets with native CLMUL support (e.g. X86 pclmul,
RISCV Zbc) never had the opportunity to simplify these cases.
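Both folds can be checked against a reference carry-less multiply. The sketch below is a plain software model on 64-bit values, written for illustration only; `clmul64` is not an LLVM function.

```cpp
#include <cassert>
#include <cstdint>

// Reference carry-less multiply: XOR together one shifted copy of X for each
// set bit of Y, keeping the low 64 bits of the result.
inline uint64_t clmul64(uint64_t X, uint64_t Y) {
  uint64_t R = 0;
  for (unsigned I = 0; I < 64; ++I)
    if ((Y >> I) & 1)
      R ^= X << I;
  return R;
}
```

With a single set bit in `Y` only one shifted copy of `X` contributes, so `clmul64(x, 1) == x` and `clmul64(x, 1 << k) == x << k`.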
Fixes #181831
Add DemandedElts handling to allow better vector support.
To prevent RISCV falling back to a mul call in known-never-zero.ll I've
had to tweak the (mul step_vector(C0), C1) to (step_vector(C0 * C1))
fold to only occur if C0 is already non-power-of-2, C0 * C1 is a
power-of-2 or the target has good mul support.
Adds overloads
```cpp
auto m_ConstInt(uint64_t &);
auto m_ConstInt(int64_t &);
```
which behave analogously to `m_ConstInt(APInt &)`, but only match if the
captured integer fits within 64 bits.
The bitreverse(clmul(bitreverse, bitreverse)) -> clmulr fold was missing
a legality check, causing an infinite loop when CLMULR isn't supported
on the target. Added the check to match other folds in visitBITREVERSE.
Fixes #182270
Same as InstCombinerImpl::visitAnd
To prevent RISCV falling back to a mul call in known-never-zero.ll I've
had to tweak the (sub X, (vscale * C)) to (add X, (vscale * -C)) fold to
not occur if C is power-of-2 and the target has poor mul support.
Alive2: https://alive2.llvm.org/ce/z/Khvs5H
Similar for (fshr X, B, Y) | (srl X, Y) --> fshr X, (X|B), Y
This is similar to the FSHL/FSHR handling in
hoistLogicOpWithSameOpcodeHands but here we treat a shl/shr like a
fshl/fshr with 0.
The pattern doesn't require X to be the same in both sides, but that's
what occurred in the case I was looking at so that's what is
implemented.
Alive2: https://alive2.llvm.org/ce/z/eUou-u
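As a sanity check alongside the Alive2 proof, the fshr form of the fold can be verified exhaustively on 8-bit values. This is a standalone model; `fshr8` follows the usual funnel-shift definition (concatenate, shift right by the amount modulo the bit width, keep the low half) and the shift amount is restricted to 0..7 so the plain srl stays in range.

```cpp
#include <cassert>
#include <cstdint>

// 8-bit funnel shift right: concatenate Hi:Lo, shift right by Amt % 8 and
// keep the low 8 bits.
inline uint8_t fshr8(uint8_t Hi, uint8_t Lo, unsigned Amt) {
  unsigned Wide = (static_cast<unsigned>(Hi) << 8) | Lo;
  return static_cast<uint8_t>(Wide >> (Amt % 8));
}

// Left-hand side: (fshr X, B, Y) | (srl X, Y).
inline uint8_t foldLHS(uint8_t X, uint8_t B, unsigned Y) {
  return static_cast<uint8_t>(fshr8(X, B, Y) | (X >> Y));
}

// Right-hand side: fshr X, (X | B), Y.
inline uint8_t foldRHS(uint8_t X, uint8_t B, unsigned Y) {
  return fshr8(X, static_cast<uint8_t>(X | B), Y);
}

// Compare both sides for every 8-bit X and B and every in-range Y.
inline bool foldHolds() {
  for (unsigned X = 0; X < 256; ++X)
    for (unsigned B = 0; B < 256; ++B)
      for (unsigned Y = 0; Y < 8; ++Y)
        if (foldLHS(static_cast<uint8_t>(X), static_cast<uint8_t>(B), Y) !=
            foldRHS(static_cast<uint8_t>(X), static_cast<uint8_t>(B), Y))
          return false;
  return true;
}
```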
The original implementation performed the transformation when
isTruncateFree was true:
truncate(build_vector(x, x)) -> build_vector(truncate(x), truncate(x)).
In some cases x comes from an ext; this patch also tries to pre-truncate the
build_vector source operands when they come from an ext.
Testcase from: https://gcc.godbolt.org/z/bbxbYK7dh
This patch fixes an assertion failure in ForwardStoreValueToDirectLoad
during DAGCombine.
The crash occurs when `STLF (Store-to-Load Forwarding)` creates an
illegal intermediate bitcast type (e.g., `v128i1` when bridging a
128-bit store to a `<32 x i1>` load on X86). Since `v128i1` is not a
legal mask type for the backend, it violates the expectations of the
LegalizeDAG pass.
The fix adds a `TLI.isTypeLegal(InterVT)` check to ensure that the
intermediate type used for the transformation is supported by the
target.
Fixes #181130
DAGCombiner can fold a chain of INSERT_VECTOR_ELT into a vector AND/OR
operation. This patch adds protection to avoid that we end up making the
vector more poisonous by freezing the source vector when the elements
that should be set to 0/-1 may be poison in the source vector.
The patch also fixes a bug in SimplifyDemandedVectorElts for
MUL/MULHU/MULHS/AND that could result in making the vector more
poisonous. Problem was that we skipped demanding elements from Op0 that
were known to be zero in Op1. But that could result in elements being
simplified into poison when simplifying Op0, and then the result would
be poison and not zero after the MUL/MULHU/MULHS/AND. The solution is to
defensively make sure that we demand all the elements originally
demanded also when simplifying Op0.
These bugs were found when analysing the miscompiles in
https://github.com/llvm/llvm-project/issues/179448
Main culprit in #179448 seems to have been the bug in DAGCombiner. The
bug in SimplifyDemandedVectorElts surfaced when fixing the DAGCombiner,
as that fix typically introduces the (AND (FREEZE x), y) pattern that
wasn't handled correctly in SimplifyDemandedVectorElts.
Also fixes #180409.
Also fixes #176682.
This PR fixes a big-endian regression in `ForwardStoreValueToDirectLoad`
where the wrong subvector was being extracted. In big-endian, memory
offset 0 corresponds to the high bits, so the extraction index needs to
be adjusted.
As suggested by @KennethHilmersson, calculate the extraction index as
the difference between the number of elements in the intermediate vector
and the load vector when in big-endian mode.
Special thanks to Kenneth Hilmersson for providing the fix logic and the
ARM regression test.
https://github.com/llvm/llvm-project/pull/172523#issuecomment-3878065191
https://github.com/llvm/llvm-project/pull/172523#issuecomment-3879575092
Two code paths in `reassociationCanBreakAddressingModePattern` were
missing a `hasUniqueMemOperand()` guard before calling
`getAddressSpace()`. Note that on `L1214` we already have the same guard
in place.
`getAddressSpace()` chains through `getPointerInfo()` to
`getMemOperand()`, which asserts that the node has exactly one memory
operand.
This patch fixes a regression introduced by PR #175022, where
a freeze was introduced with the following transformation:
ext(freeze(load(x))) -> freeze(extload(x))
If a new extend is introduced afterwards we then have
ext(freeze(extload(x)))
which doesn't get picked up by existing DAG combines due to
the freeze getting in the way.
Previously, the DAG combiner did not optimize exact signed division by a
power-of-two constant divisor for integer types exceeding the size of
division supported by the target architecture (e.g., i128 on x86-64).
However, such an optimization was expected by the division expansion
logic, leading to unsupported division operations making it to
instruction selection.
This commit addresses this issue by making an exception to the existing
exclusion of signed division with the exact flag for the aforementioned
operations. That is, the DAG combiner will now optimize exact signed
division if the divisor is a power-of-two constant and the integer type
exceeds the size of division supported by the target architecture.
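The arithmetic behind the optimization, sketched on `int64_t` as a stand-in for the wider types discussed above (the helper name is illustrative): when the division is exact, dividing by 2^K is the same as an arithmetic shift right by K, with no rounding fix-up needed.

```cpp
#include <cassert>
#include <cstdint>

// Exact signed division by 2^K. Only valid when X is a multiple of 2^K,
// which is what the exact flag promises.
inline int64_t sdivExactPow2(int64_t X, unsigned K) {
  assert(K < 63 && "shift amount out of range for this sketch");
  assert(X % (int64_t{1} << K) == 0 && "division must be exact");
  // Arithmetic right shift: guaranteed for signed types since C++20 and the
  // behaviour of mainstream compilers before that.
  return X >> K;
}
```

Without exactness the shortcut is wrong for negative values, e.g. `-7 >> 1` is -4 while `-7 / 2` is -3, which is why the general signed case needs extra rounding logic.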
---------
Signed-off-by: Steffen Holst Larsen <HolstLarsen.Steffen@amd.com>
There are target intrinsics that logically require two MMOs, such as
llvm.amdgcn.global.load.lds, which is a copy from global memory to LDS,
so there's both a load and a store to different addresses.
Add an overload of getTgtMemIntrinsic that produces intrinsic info in a
vector, and implement it in terms of the existing (now protected)
overload.
GlobalISel and SelectionDAG paths are updated to support multiple MMOs.
The main part of this change is supporting multiple MMOs in
MemIntrinsicNodes.
Converting the backends to use the new overload is a fairly mechanical step
that is done in a separate change, in the hope that this reduces merging
pain during review and for downstreams. A later change will then enable
using multiple MMOs in AMDGPU.
TLI.isBinOp recognises some opcodes that have multiple results,
including UADDO etc.
In most cases we currently just bail if a binop has multiple results,
but shuffle combining was missing the check and it's pretty trivial to
add handling in this case.
I've added add/sub-overflow opcodes to verifyNode to help catch these
cases in the future - IIRC there was a plan to autogen these, but there
isn't anything at the moment.
Fixes #179112
This is a reland of #172523.
The original patch caused an assertion failure on RISC-V because it
attempted to create a bitcast from an illegal type (i32 on RV64) during
the post-type-legalization DAGCombine stage.
Added a `TLI.isTypeLegal(Val.getValueType())` check to ensure we only
proceed with the bitcast STLF optimization when the source value's type
is legal for the target.
This patch introduces support for Store-to-Load Forwarding (STLF) in
`DAGCombiner::ForwardStoreValueToDirectLoad` when the store and load
have **different types but equal memory size** (e.g., storing an `i32`
then loading a `float` from the same location).
### What this patch does:
**Enables Optimization:** It allows for the safe forwarding of the
stored value as a Bitcast when the value is:
* A **Constant** (`ConstantSDNode`, `ConstantFPSDNode`,
`ConstantPoolSDNode`).
* **Undef**.
* And the memory sizes (`LdMemSize` == `StMemSize`) match.
### Scope and Next Steps:
This patch **only implements forwarding for constant and undef values
that have the same memory size** so far.
**I am submitting this initial patch to get early review feedback on the
core logic and fix the immediate crashes before tackling the more
complex scenarios.**
For the simple case:
```llvm
; Case Handled by this PR so far (e.g., zeroinitializer is a constant)
define float @test_stlf_integer(ptr %p, float %v) {
store i32 0, ptr %p, align 4
%f = load float, ptr %p, align 4
; ...
}
```
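In C++ terms, the forwarded value is the stored bits reinterpreted in the load's type. A minimal sketch, assuming a 32-bit IEEE-754 `float`; the function names are illustrative and not part of the patch.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Forwarded path: reinterpret the stored i32 bits as the float the load
// would observe. Valid because both types occupy four bytes.
inline float forwardI32AsFloat(uint32_t Stored) {
  static_assert(sizeof(float) == sizeof(uint32_t), "sizes must match");
  float Loaded;
  std::memcpy(&Loaded, &Stored, sizeof(Loaded));
  return Loaded;
}

// Reference path: actually go through memory, as the unoptimized code does.
inline float storeThenLoad(uint32_t Stored) {
  unsigned char Mem[sizeof(uint32_t)];
  std::memcpy(Mem, &Stored, sizeof(Stored)); // store i32
  float Loaded;
  std::memcpy(&Loaded, Mem, sizeof(Loaded)); // load float
  return Loaded;
}
```

For the example above, `forwardI32AsFloat(0)` is `0.0f`, so the load can be replaced by a constant without touching memory.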
Fixes: #151683
We also get some i32->i64 promotion for CLMULH. The DAGCombiner
change is to prevent an infinite loop from that.
Test file was rewritten to cover all types and split between clmul
and clmulh.
I added a couple masked tests to show that VectorPeephole works.
The test outputs were already large so I didn't want to add more than a couple.