This is part of #70452 that changes the type used for the external
interface of MMO to LocationSize as opposed to uint64_t. This means the
constructors take LocationSize, and convert ~UINT64_C(0) to
LocationSize::beforeOrAfter(). The getSize methods return a
LocationSize.
This allows us to be more precise with unknown sizes, not accidentally
treating them as unsigned values, and in the future should allow us to
add proper scalable vector support but none of that is included in this
patch. It should mostly be an NFC.
Global ISel is still expected to use the underlying LLT as it needs, and
are not expected to see unknown sizes for generic operations. Most of
the changes are hopefully fairly mechanical, adding a lot of getValue()
calls and protecting them with hasValue() where needed.
The non-constant bit/bif/bsp already work through tablegen patterns, this
patch handles the constant case, mirroring the basic support for
`or(and(X, C), and(Y, ~C))` from ISel tryCombineToBSL. BSP gets expanded
to either BIT, BIF or BSL depending on the best register allocation.
G_BIT can be replaced with G_BSP as a more general alternative.
This will make it easy for callers to see issues with and fix up calls
to createTargetMachine after a future change to the params of
TargetMachine.
This matches other nearby enums.
For downstream users, this should be a fairly straightforward
replacement,
e.g. s/CodeGenOpt::Aggressive/CodeGenOptLevel::Aggressive
or s/CGFT_/CodeGenFileType::
The non-constant bit/bif/bsp already work through tablegen patterns, this
patch handles the constant case, mirroring the basic support for
`or(and(X, C), and(Y, ~C))` from ISel tryCombineToBSL. BSP gets expanded
to either BIT, BIF or BSL depending on the best register allocation.
G_BIT can be replaced with G_BSP as a more general alternative.
Remove CodeGen leftovers from the old combiner backend and adapt the API to fit the new backend better.
It's now quite a bit closer to how InstructionSelector works.
- `CombinerInfo` is now a simple "options" struct.
- `Combiner` is now the base class of all TableGen'd combiner implementation.
- Many fields have been moved from derived classes into that class.
- It has been refactored to create & own the Observer and Builder.
- `tryCombineAll` TableGen'd method can now be renamed, which allows targets to implement the actual `tryCombineAll` call manually and do whatever they want to do before/after it.
Note: `CombinerHelper` needs to be mutable because none of its methods are const. This can be revisited later.
Depends on D158710
Reviewed By: aemerson, dsanders
Differential Revision: https://reviews.llvm.org/D158713
Only a few minor test changes needed because I removed the "helper" suffix from the combiner name, as it's not really a helper anymore but more like the implementation itself.
Depends on D153757
NOTE: This would land iff D153757 (RFC) lands too.
Reviewed By: aemerson
Differential Revision: https://reviews.llvm.org/D153850
There is no case where those functions return false. It's always return true.
Even if they were to return false, it's not really something we should rely on I think.
With the current combiner implementation, it would just make `tryCombineAll` return false without retrying anymore rules.
I also believe that if an applyer were to return false, it would mean that the match function is not good enough. Asserting on failure in an apply function is a better idea, IMO.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D153619
This fixes a long standing cause of awful code generation when legalization creates
G_SEXT(G_FCMP(...)), for example due to promoting the condition of a vector G_SELECT.
Since on AArch64 vector compares sign-extend the condition value, there's no need
for this extra G_SEXT. Unfortunately by the time we get to post-legalization these
G_SEXTs have already been lowered into shifts, so this combine is a bit more
involved than I'd ideally like. Oh well.
Differential Revision: https://reviews.llvm.org/D135078
Before, the isPreLegalize() query in CombinerHelper only checked for the
presence of a LegalizerInfo object. This is problematic when we want to have
a combine actually check for legality in a pre-legalizer combine pass, since
if we pass a LegalizerInfo object to the constructor it causes the combines to
think that we're running *post* legalizer, which isn't true.
This change fixes it to instead check an explicit bool that passes to signal
whether the pass will be run before or after legalization.
Doing so exposed a bug in the extending loads combine, which tried to check for
legality of candidate extending loads if LegalizerInfo was present. Since we
only ran it pre-legalizer and therefore with a null LegalizerInfo, it never
actually ran. Also fixes the legality checks to keep the tests passing.
Differential Revision: https://reviews.llvm.org/D135044
Most clients only used these methods because they wanted to be able to
extend or truncate to the same bit width (which is a no-op). Now that
the standard zext, sext and trunc allow this, there is no reason to use
the OrSelf versions.
The OrSelf versions additionally have the strange behaviour of allowing
extending to a *smaller* width, or truncating to a *larger* width, which
are also treated as no-ops. A small amount of client code relied on this
(ConstantRange::castOp and MicrosoftCXXNameMangler::mangleNumber) and
needed rewriting.
Differential Revision: https://reviews.llvm.org/D125557
This patch revises the warning fix done in
a93b1792f1c8f7e2e7c931993110dc48f7ddba01. Specifically, it rolls the
MRI.getType call into the assert, thereby avoiding the named variable.
This results in a very minor improvement in most cases, generating
stores of xzr instead of moving zero to a vector register.
Differential Revision: https://reviews.llvm.org/D115479
This is a common pattern:
```
%icmp:_(s32) = G_ICMP intpred(eq), ...
%ext:_(s64) = G_ANYEXT %icmp(s32)
%and:_(s64) = G_AND %ext, 1
```
Here's an example: https://godbolt.org/z/T13f6o8zE
This pattern appears because of the following combine in the
LegalizationArtifactCombiner:
```
// zext(trunc x) - > and (aext/copy/trunc x), mask
```
Which kicks in when we widen the result of G_ICMP from 1 bit to 32 bits.
We know that, on AArch64, a scalar G_ICMP will produce 0 or 1. So the result
of `%ext` will always be 0 or 1 as well.
We have some KnownBits combines which eliminate redundant G_ANDs with masks.
These combines don't kick in with G_ANYEXT.
So, if we replace the G_ANYEXT with G_ZEXT in this situation, the KnownBits
based combines can remove the redundant G_AND.
I wasn't sure if it woud be more appropriate to
* Take this route
* Put this in the LegalizationArtifactCombiner.
* Allow 64 bit G_ICMP destinations
I decided on this route because
1) It's simple
2) I'm not sure if philosophically-speaking, we should be handling non-artifact
instructions + target-specific details like TargetBooleanContents in the
LegalizationArtifactCombiner
3) There is a lot of existing code which assumes we only have 32 bit G_ICMP
destinations. So, adding support for 64-bit destinations seems rather invasive
right now. I think that adding support for 64-bit destinations, or modelling
G_ICMP as ADDS/SUBS/etc is probably cleaner long term though.
This gives minor code size savings on all CTMark benchmarks.
Differential Revision: https://reviews.llvm.org/D110959
Rework getConstantstVRegValWithLookThrough in order to make it clear if we
are matching integer/float constant only or any constant(default).
Add helper functions that get DefVReg and APInt/APFloat from constant instr
getIConstantVRegValWithLookThrough: integer constant, only G_CONSTANT
getFConstantVRegValWithLookThrough: float constant, only G_FCONSTANT
getAnyConstantVRegValWithLookThrough: either G_CONSTANT or G_FCONSTANT
Rename getConstantVRegVal and getConstantVRegSExtVal to getIConstantVRegVal
and getIConstantVRegSExtVal. These now only match G_CONSTANT as described
in comment.
Relevant matchers now return both DefVReg and APInt/APFloat.
Replace existing uses of getConstantstVRegValWithLookThrough and
getConstantVRegVal with new helper functions. Any constant match is
only required in:
ConstantFoldBinOp: for constant argument that was bit-cast of float to int
getAArch64VectorSplat: AArch64::G_DUP operands can be any constant
amdgpu select for G_BUILD_VECTOR_TRUNC: operands can be any constant
In other places use integer only constant match.
Differential Revision: https://reviews.llvm.org/D104409
Adds legalizer, register bank select, and instruction
select support for G_SBFX and G_UBFX. These opcodes generate
scalar or vector ALU bitfield extract instructions for
AMDGPU. The instructions allow both constant or register
values for the offset and width operands.
The 32-bit scalar version is expanded to a sequence that
combines the offset and width into a single register.
There are no 64-bit vgpr bitfield extract instructions, so the
operations are expanded to a sequence of instructions that
implement the operation. If the width is a constant,
then the 32-bit bitfield extract instructions are used.
Moved the AArch64 specific code for creating G_SBFX to
CombinerHelper.cpp so that it can be used by other targets.
Only bitfield extracts with constant offset and width values
are handled currently.
Differential Revision: https://reviews.llvm.org/D100149
To reflect that the size may be scalable, a TypeSize is returned
instead of an unsigned. In places where the result is used,
it currently relies on an implicit cast of TypeSize -> uint64_t,
which asserts that the type is not scalable.
This patch is NFC for fixed-width vectors.
Reviewed By: aemerson
Differential Revision: https://reviews.llvm.org/D104454
When creating G_SBFX/G_UBFX opcodes, the last operand is the
width instead of the bit position. The bit position is used
for the AArch64 SBFM and UBFM instructions. The bit position
is converted to a width if the SBFX/UBFX aliases are generated.
For other SBMF/UBFM aliases, such as shifts, the bit position
is used.
Differential Revision: https://reviews.llvm.org/D101543
Basically a port of isBitfieldExtractOpFromSExtInReg in AArch64ISelDAGToDAG.
This is only done post-legalization for now. Once the legalizer knows how to
decompose these back into shifts, this requirement can probably be removed.
Differential Revision: https://reviews.llvm.org/D99230
A G_MUL + G_PTR_ADD can also be folded into a madd. So, conservatively, we
shouldn't combine when the G_MUL is used by a G_PTR_ADD either.
Differential Revision: https://reviews.llvm.org/D96457
Returning int64_t was arbitrarily limiting for wide integer types, and
the functions should handle the full generality of the IR.
Also changes the full form which returns the originally defined
vreg. Add another wrapper for the common case of just immediately
converting to int64_t (arguably this would be useful for the full
return value case as well).
One possible issue with this change is some of the existing uses did
break without conversion to getConstantVRegSExtVal, and it's possible
some without adequate test coverage are now broken.
These do things like turn a multiply of a pow-2+1 into a shift and and add,
which is a common pattern that pops up, and is universally better than expensive
madd instructions with a constant.
I've added check lines to an existing codegen test since the code being ported
is almost identical, however the mul by negative pow2 constant tests don't generate
the same code because we're missing some generic G_MUL combines still.
Differential Revision: https://reviews.llvm.org/D91125
For the <2 x float> case, instead of adding another combine or legalization to
get it into a <4 x float> form, I'm just adding a GISel specific selection
pattern to cover it.
Differential Revision: https://reviews.llvm.org/D90699
There are a lot of combines in AArch64PostLegalizerCombiner which exist to
facilitate instruction matching in the selector. (E.g. matching for G_ZIP and
other shuffle vector pseudos)
It still makes sense to select these instructions at -O0.
Matching earlier in a combiner can reduce complexity in the selector
significantly. For example, a good portion of our selection code for compares
would be a lot easier to represent in a combine.
This patch moves matching combines into a "AArch64PostLegalizerLowering"
combiner which runs at all optimization levels.
Also, while we're here, improve the documentation for the
AArch64PostLegalizerCombiner, and fix up the filepath in its file comment.
And also add a 'r' which somehow got dropped from a bunch of function names.
https://reviews.llvm.org/D89820
In order to select the immediate forms using the imported patterns, we need to
lower them into new G_VASHR/G_VLSHR target generic ops. Add a combine to do this
matching build_vector of constant operands.
With this, we get selection for free.
In future, we'd like to use the perfect-shuffle mechanism to deal with these
shuffle permutations. For now, this improves performance by avoiding the
super-expensive const-pool load + tbl instruction.
Differential Revision: https://reviews.llvm.org/D84866
Given this:
```
%x:_(<n x sK>) = G_BUILD_VECTOR %lane, ...
...
%y:_(<n x sK>) = G_SHUFFLE_VECTOR %x(<n x sK>), %foo, shufflemask(0, 0, ...)
```
We can produce:
```
%y:_(<n x sK) = G_DUP %lane(sK)
```
Doesn't seem to be too common, but AArch64ISelLowering attempts to do this
before trying to produce a DUPLANE. Might as well port it.
Also make it so that when the splat has an undef mask, we try setting it to
0. SDAG does this, and it makes sure that when we get the build vector operand,
we actually get a source operand.
Differential Revision: https://reviews.llvm.org/D81979
Summary:
Adds the ability to add members to a generated combiner via
a State base class. In the current AArch64PreLegalizerCombiner
this is used to make Helper available without having to
provide it to every call.
As part of this, split the command line processing into a
separate object so that it still only runs once even though
the generated combiner is constructed more frequently.
Depends on D81862
Reviewers: aditya_nandakumar, bogner, volkan, aemerson, paquette, arsenm
Reviewed By: arsenm
Subscribers: jvesely, wdng, nhaehnle, kristof.beyls, hiraditya, kerbowa, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D81863
Add selection support for ext via a new opcode, G_EXT and a post-legalizer
combine which matches it.
Add an `applyEXT` function, because the AArch64ext patterns require a register
for the immediate. So, we have to create a G_CONSTANT to get these without
writing new patterns or modifying the existing ones.
Tests are the same as arm64-ext.ll.
Also prevent ext from firing on the zip test. It has higher priority, so we
don't want it potentially getting in the way of mask tests.
Also fix up the shuffle-splat test, because ext is now selected there. The
test was incorrectly regbank selected before, which could cause a verifier
failure when you emit copies.
Differential Revision: https://reviews.llvm.org/D81436
We select all of these via patterns now, so there's no reason to disallow this.
Update select-dup.mir to show that we correctly select the smaller types.
Differential Revision: https://reviews.llvm.org/D81322
Same idea as for zip, uzp, etc. Teach the post-legalizer combiner to recognize
G_SHUFFLE_VECTORs that are trn1/trn2 instructions.
- Add G_TRN1 and G_TRN2
- Port mask matching code from AArch64ISelLowering
- Produce G_TRN1 and G_TRN2 in the post-legalizer combiner
- Select via importer
Add select-trn.mir to test selection.
Add postlegalizer-combiner-trn.mir to test the combine. This is similar to the
existing arm64-trn test.
Note that both of these tests contain things we currently don't legalize.
I figured it would be easier to test these now rather than later, since once
we legalize the G_SHUFFLE_VECTORs, it's not guaranteed that someone will update
the tests.
Differential Revision: https://reviews.llvm.org/D81182
Since all of the other G_SHUFFLE_VECTOR transforms are going there, let's do
this with dup as well. This is nice, because it lets us split up the original
code into matching, register bank selection, and instruction selection.
- Create G_DUP, make it equivalent to AArch64dup
- Add a post-legalizer combine which is 90% a copy-and-paste from
tryOptVectorDup, except with shuffle matching closer to what SelectionDAG
does in `ShuffleVectorSDNode::isSplatMask`.
- Teach RegBankSelect about G_DUP. Since dup selection relies on the correct
register bank for FP/GPR dup selection, this is necessary.
- Kill `tryOptVectorDup`, since it's now entirely handled by G_DUP.
- Add testcases for the combine, RegBankSelect, and selection. The selection
test gives the same selection results as the old test.
Differential Revision: https://reviews.llvm.org/D81221