This patch introduces the reduction intrinsic for floating point minimum
and maximum which has the same semantics (for NaN and signed zero) as
llvm.minimum and llvm.maximum.
Reviewed-By: nikic
Differential Revision: https://reviews.llvm.org/D152370
Currently, a node and its users are added back to the worklist in reverse topological order after it is combined. This diff changes that order to be topological. This is part of a larger migration to get the DAGCombiner to process nodes in topological order.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D127115
This fold goes against the usual approach of pushing freeze into
operands. The idea behind the fold is that if the setcc feeds into
a brcond, the freeze can be dropped entirely.
Move the fold to brcond, where we can remove the freeze directly.
This ensures that there can be no infinite combine loops due to
conflicting transforms.
Differential Revision: https://reviews.llvm.org/D152544
Note: Some patterns in visitFMA are needed refined to support splat of constant.
Reviewed By: luke
Differential Revision: https://reviews.llvm.org/D152260
If intrinsic `get_fpenv` or `set_fpenv` is lowered to the form where FP
environment is represented as a region in memory, extra moves can
appear. For example the code:
define void @func_01(ptr %ptr) {
%env = call i256 @llvm.get.fpenv.i256()
store i256 %env, ptr %ptr
ret void
}
produces DAG:
ch = get_fpenv_mem ch, memory_region
val: i256, ch = load ch, memory_region
ch = store ch, ptr, val
In this case the extra moves can be avoided if `get_fpenv_mem` got
pointer to the memory where the FP environment should be finally placed.
This change implement such optimization for this use case.
Differential Revision: https://reviews.llvm.org/D150437
Currently, a node and its users are added back to the worklist in reverse topological order after it is combined. This diff changes that order to be topological. This is part of a larger migration to get the DAGCombiner to process nodes in topological order.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D127115
Given an insert of a scalar load into a vector shuffle with mask
u,0,1,2,3,4,5,6 or 1,2,3,4,5,6,7,u (depending on the insert index),
it can be more profitable to convert to a single load and avoid the
shuffles. This adds a DAG combine for it, providing the new load is
still fast.
Differential Revision: https://reviews.llvm.org/D151029
When the worklist is initially being formed, there is no need to
consider all nodes for pruning. This is because the first time calling
getNextWorklistEntry will only clear those nodes which have no uses,
with their operands being added to the worklist. However, when the worklist is
created for the first time all nodes are added anyways, so this operation
actually ends up adding no nodes.
This patch adds a parameter IsCandidateForPruning to AddToWorklist with a
default value of true to avoid having to update every call site.
Differential Revision: https://reviews.llvm.org/D151416
This is the implementation of D149782
The patch implements a helper function that matches and fold the following cases in the DAGCombiner:
1. `bswap(logic_op(x, bswap(y))) -> logic_op(bswap(x), y)`
2. `bswap(logic_op(bswap(x), y)) -> logic_op(x, bswap(y))`
3. `bswap(logic_op(bswap(x), bswap(y))) -> logic_op(x, y)` in multiuse case, which still reduces the number of instructions.
The helper function accepts SDValue with BSWAP and BITREVERSE opcode. This patch folds the BSWAP cases and remain the BITREVERSE optimization in the future
Reviewed By: RKSimon, goldstein.w.n
Differential Revision: https://reviews.llvm.org/D149783
Add basic computeOverflowForSignedAdd helper to recognise that sadd overflow can't occur if both operands have more that one sign bit.
Add computeOverflowForAdd wrapper that calls computeOverflowForSignedAdd/computeOverflowForUnsignedAdd depending on the IsSigned argument, and use this in DAGCombiner::visitADDO
The patch makes visitFSUBForFMACombine serve vp.fsub too. It helps DAGCombiner
to fuse vp.fsub and vp.fmul patterns to vp.fma.
Reviewed By: luke
Differential Revision: https://reviews.llvm.org/D149821
This is rework of;
- rG13e77db2df94 (r328395; MVT)
Since `LowLevelType.h` has been restored to `CodeGen`, `MachinveValueType.h`
can be restored as well.
Depends on D148767
Differential Revision: https://reviews.llvm.org/D149024
This commit implements the two NTLH intrinsic functions.
```
type __riscv_ntl_load (type *ptr, int domain);
void __riscv_ntl_store (type *ptr, type val, int domain);
```
```
enum {
__RISCV_NTLH_INNERMOST_PRIVATE = 2,
__RISCV_NTLH_ALL_PRIVATE,
__RISCV_NTLH_INNERMOST_SHARED,
__RISCV_NTLH_ALL
};
```
We encode the non-temporal domain into MachineMemOperand flags.
1. Create the RISC-V built-in function with custom semantic checking.
2. Assume the domain argument is a compile time constant,
and make it as LLVM IR metadata (nontemp_node).
3. Encode domain value as two bits MachineMemOperand TargetMMOflag.
4. According to MachineMemOperand TargetMMOflag, select corrsponding ntlh instruction.
Currently, it supports scalar type and fixed-length vector type.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D143364
These codes deleted are dead code, we never go into it.
1. In AggressiveAntiDepBreaker.cpp, have assert AntiDepReg != 0.
2. IfConversion.cpp, Kind can only be one unique value, so isFalse && isRev
can never be true.
3. DAGCombiner.cpp, at line 3675, we have considered the condition like
```
// fold (sub x, c) -> (add x, -c)
if (N1C) {
return DAG.getNode(ISD::ADD, DL, VT, N0,
DAG.getConstant(-N1C->getAPIntValue(), DL, VT));
}
```
4. ScheduleDAGSDNodes.cpp, we have Latency > 1 at line 663
5. MasmParser.cpp, code exists in a switch-case block which decided by
the value FirstTokenKind, at line 1621, FirstTokenKind could only be
one of AsmToken::Dollar, AsmToken::At and AsmToken::Identifier.
Reviewed By: skan
Differential Revision: https://reviews.llvm.org/D148610
If the add node has wrap flags then they will be destroyed by converting to
sub/not. The flags can be useful in converting to rhadd, for example, but that
may be required late if the node types need to be legalized. This limits the
preferIncOfAddToSubOfNot fold until after legalize DAG if the node have flags
to allow more folding.
Differential Revision: https://reviews.llvm.org/D148809
As reported on Issue #62234 - we weren't correctly using the SDValue operand to get its value type, resulting in a failure when it came from a SDNode with multiple results
We haven't been able to create a suitable upstream regression test, but its been confirmed by inspection by both myself and @topperc
Fixes#62234
It looks like it is still profitable to accept a transformable to a legal vector
type, not just a legal vector, as long as vector elements are the same between
two of those types.
Differential Revision: https://reviews.llvm.org/D148229
This transformation creates an copysign node whose argument types do not match. RISCV does not handle such a case which results in a crash today. Looking at the relevant code in DAG, it looks like the process of enabling the non-matching types case was never completed for vectors at all. The transformation which triggered the RISCV crash is a specialization of another transform (specifically due to one use for profitability) which isn't enabled by default. Given that, I chose to match the preconditions for that other transform.
Other options here include:
* Updating RISCV codegen to handle the mismatched argument type case for vectors. This is slightly tricky as I don't see an obvious profitable lowering for this case which doesn't involve simply adding back in the round/trunc.
* Disabling the transform via a target hook.
This patch does involve two changes for AArch64 codegen. These could be called regressions, but well, the code after actually looks better than the code before.
Differential Revision: https://reviews.llvm.org/D148638
This DAGCombine is not valid for some combinations of the known bits
of x and non-power-of-two widths of x. As shown in the bug:
- The bitwidth of x is 35 (n=5)
- The unknown bits of x is only the least significant bit
- This gives the result of the ctlz two possible values: 34 or 35, both
of which will give 1 when left-shifted 5 bits.
- So the `eor x, 1` that this optimisation would give is not correct.
A similar instcombine optimisation is only applied when the width of x is
a power-of-two. GlobalISel does not have this bug, as shown by the testcase.
Fixes#61549
Differential Revision: https://reviews.llvm.org/D147518
This shows up in the wild, notably as a regression in D127115 .
Depends on D147821
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D147827
Compiler hits infinite loop in DAGCombine. For force-streaming-compatible-sve
mode we have custom lowering for 128-bit vector splats and later in
DAGCombiner::SimplifyVCastOp() we scalarized SPLAT because we have custom
lowering for SME. Later, we restored SPLAT opertion via performMulCombine().
This DAG combine is correct on little endian targets but
is incorrect on big endian targets.
Add big endian code to correct it.
Differential revision: https://reviews.llvm.org/D146460
ConstantSDNode provides some convenience functions like isZero,
getZExtValue, and isMinSignedValue that are named identically to those
provided by APInt, so we can "skip" getAPIntValue.
Extend the existing store(load()) removal code to account for intermediate truncates that some targets won't remove with canCombineTruncStore - we only care about the load/store MemoryVT.
Fixes regression from D146121