The SelectionDAG Isel supports the both version of combines mentioned
below :
```
select Cond, Pow2, 0 --> (zext Cond) << log2(Pow2)
select Cond, 0, Pow2 --> (zext !Cond) << log2(Pow2)
```
The GlobalIsel for now only supports the first one defined in it's
generic combinerHelper.cpp. This patch adds the missing second one.
There are a number of backends (specifically AArch64, AMDGPU, Mips, and
RISCV) which contain a “TODO: make CombinerHelper methods const”
comment. This PR does just that and makes all of the CombinerHelper
methods const, removes the TODO comments and makes the associated
instances const. This change makes some sense because the CombinerHelper
class simply modifies the state of _other_ objects to which it holds
pointers or references.
Note that AMDGPU contains an identical comment for an instance of
AMDGPUCombinerHelper (a subclass of CombinerHelper). I deliberately
haven’t modified the methods of that class in order to limit the scope
of the change. I’m happy to do so either now or as a follow-up.
In case both LeftHandInst and RightHandInst are IMPLICIT_DEF with no input
operands, this patch protects against the post-legalizer-combiner
matchHoistLogicOpWithSameOpcodeHands with no operands. The
prelegalizercombiner-hoist-same-hands.mir test was cleaned up a little in the
process, and has a post-legalizer run line added so that the implicit_def do
not get folded awwy.
Add `ICmpInst::compare()` overload accepting `KnownBits`, similar to the
existing one accepting `APInt`. This is not directly part of KnownBits
(or APInt) for layering reasons.
The increase in fallbacks that was previously reported were not caused
by this change.
Original description:
This matches InstCombine and DAGCombine.
RISC-V only has an ADDI instruction so without this we need additional
patterns to do the conversion.
Some of the AMDGPU tests look like possible regressions. Maybe some
patterns from isel aren't imported.
This fixes a bug that started triggering after #111730, where we could
remove a load with multiple uses. It looks like the match should be
checking the other register in a one-use check.
%SrcReg = load..
%DstReg = sign_extend_inreg %SrcReg
This matches InstCombine and DAGCombine.
RISC-V only has an ADDI instruction so without this we need additional
patterns to do the conversion.
Some of the AMDGPU tests look like possible regressions. Maybe some
patterns from isel aren't imported.
Combine is needed to clear redundant ANDs with 1 that will be
created by reg-bank-select to clean-up high bits in register.
Fix replaceRegWith from CombinerHelper:
If copy had to be inserted, first create copy then delete MI.
If MI is deleted first insert point is not valid.
Credits:
https://reviews.llvm.org/D86516
combine-ext.mir
Notable semantic changes:
InstCombine does not mix zero and sign extend,
see CastInst::isEliminableCastPair.
New version has legality checks.
Folds sext/zext of anyext -> sext/zext
Support for nneg
Future work:
nneg zext of sext -> sext
sext of nneg zext -> sext
Have simpler lowering for exact udivs in both SelectionDAG and
GlobalISel.
The algorithm is the same between unsigned exact divs and signed divs
save for arithmetic vs logical shift for even divisors, according to
Hacker's Delight, 2nd Edition, page 242.
As far as I can tell, this pull request was not approved, and
did not go through an RFC on discourse.
This reverts commit 89881480030f48f83af668175b70a9798edca2fb.
This reverts commit 225d8fc8eb24fb797154c1ef6dcbe5ba033142da.
Currently, on different platform, the behaivor of llvm.minnum is
different if one operand is sNaN:
When we compare sNaN vs NUM:
ARM/AArch64/PowerPC: follow the IEEE754-2008's minNUM: return qNaN.
RISC-V/Hexagon follow the IEEE754-2019's minimumNumber: return NUM. X86:
Returns NUM but not same with IEEE754-2019's minimumNumber as
+0.0 is not always greater than -0.0.
MIPS/LoongArch/Generic: return NUM.
LIBCALL: returns qNaN.
So, let's introduce llvm.minmumnum/llvm.maximumnum, which always follow
IEEE754-2019's minimumNumber/maximumNumber.
Half-fix: #93033
This patch adds basic support for scalable vector types in load & store
instructions for AArch64 with GISel.
Only scalable vector types with a 128-bit base size are supported, e.g.
`<vscale x 4 x i32>`, `<vscale x 16 x i8>`.
This patch adapted some ideas from a similar abandoned patch
[https://github.com/llvm/llvm-project/pull/72976](https://github.com/llvm/llvm-project/pull/72976).
This combine matches the existing fold in InstCombine, i.e.
InstCombinerImpl::pushFreezeToPreventPoisonFromPropagating.
It tries to push freeze through an operand if the operand has only one
maybe-poison operand and all other operands are guaranteed non-poison,
and if the operation itself cannot generate poison (eg. add with nsw can
generate poison, even with non-poison operands).
This is beneficial because it can potentially enable other optimizations
to occur that would otherwise be blocked because of the freeze.
Two icmp/and combines forced computation of KnownBits on all operands
everytime. We can avoid computing KnownBits on the LHS by exploiting a
couple of properties:
- Constants are always on the RHS for those instructions. If we have no
KnownBits on the RHS, we can bail out early and avoid computing LHS
knownbits.
- For icmp uge/ult 0, we don't need to know the KBs of the LHS to infer
the result
This allows to save some KnownBits calls, which are very expensive,
without affecting codegen.
Fixes#82659
There are some functions, such as `findRegisterDefOperandIdx` and `findRegisterDefOperand`, that have too many default parameters. As a result, we have encountered some issues due to the lack of TRI parameters, as shown in issue #82411.
Following @RKSimon 's suggestion, this patch refactors 9 functions, including `{reads, kills, defines, modifies}Register`, `registerDefIsDead`, and `findRegister{UseOperandIdx, UseOperand, DefOperandIdx, DefOperand}`, adjusting the order of the TRI parameter and making it required. In addition, all the places that call these functions have also been updated correctly to ensure no additional impact.
After this, the caller of these functions should explicitly know whether to pass the `TargetRegisterInfo` or just a `nullptr`.
Combines G_SHUFFLE_VECTOR whose sources comes from G_CONCAT_VECTORS into
a single G_CONCAT_VECTORS instruction.
a = G_CONCAT_VECTORS x, y, undef, undef
b = G_CONCAT_VECTORS z, undef, undef, undef
c = G_SHUFFLE_VECTORS a, b, <0, 1, 4, undef>
===>
c = G_CONCAT_VECTORS x, y, z, undef`