484 Commits

Author SHA1 Message Date
Matt Arsenault
1024b73ef5 AMDGPU: Split denormal mode tracking bits
Prepare to accurately track the future denormal-fp-math attribute
changes. The way to actually set these separately is not wired in yet.

This is just a mechanical change, and mostly still assumes the input
and output mode match. This should be refined for some cases. For
example, fcanonicalize lowering should use the flushing variant if
either input or output flushing is enabled
2020-02-04 10:44:21 -08:00
Matt Arsenault
cd7650c186 GlobalISel: Implement fewerElementsVector for G_SEXT_INREG
Start using a new strategy with a combination of merge and unmerges.

This allows scalarizing before lowering, which in cases like
<2 x s128> avoids producing giant illegal shifts.
2020-02-03 11:47:33 -08:00
Matt Arsenault
c0b12916a7 AMDGPU/GlobalISel: Use more wide vector load/stores
This improves the type breakdown for some large vectors. For example,
we now get a <4 x s32> and s32 store instead of 5 s32 stores for
<5 x s32>.
2020-02-01 10:47:21 -05:00
Matt Arsenault
e3117e5c30 AMDGPU/GlobalISel: Improve legalization of wide stores
This fixes legalizations of global stores > 128-bits. It seems work is
needed on how this split actually occurs. For example, we get the
right code for s160, with an s128 and s32 load, but get 5 s32 loads
for <5 x s32>.
2020-02-01 10:47:03 -05:00
Jay Foad
2a1b5af299 [GlobalISel] Tidy up unnecessary calls to createGenericVirtualRegister
Summary:
As a side effect some redundant copies of constant values are removed by
CSEMIRBuilder.

Reviewers: aemerson, arsenm, dsanders, aditya_nandakumar

Subscribers: sdardis, jvesely, wdng, nhaehnle, rovka, hiraditya, jrtc27, atanasyan, volkan, Petar.Avramovic, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73789
2020-01-31 17:07:16 +00:00
Jay Foad
31e29d4afe AMDGPU/GlobalISel: Make use of MachineIRBuilder helper functions. NFC. 2020-01-31 13:53:39 +00:00
Matt Arsenault
ea956685a1 GlobalISel: Implement s32->s64 G_FPTOSI lowering
Port directly from DAG version.

The lowering for G_FPTOUI used to fail on AMDGPU because it uses
G_FPTOSI.
2020-01-30 08:47:07 -05:00
Matt Arsenault
b21571f4d5 AMDGPU/GlobalISel: Handle s64->s64 G_FPTOSI/G_FPTOUI 2020-01-30 08:46:37 -05:00
Matt Arsenault
8184176efd AMDGPU/GlobalISel: Custom lower G_LOG/G_LOG10
I'm pretty sure this is wrong and we should expand these in a correct
way, but this matches the existing behavior.
2020-01-30 08:38:50 -05:00
Matt Arsenault
872e899b75 AMDGPU/GlobalISel: Legalize unpacked d16 image operations
On targets that don't have the normal packed f16 layout, handle these
during legalization. Directly modify the register types. We can infer
this was a d16 load based on the mem operand size during selection.

A16 operands should possibly be handled here as well, but don't worry
about that yet.
2020-01-30 08:36:11 -05:00
Matt Arsenault
b4a0766c8d AMDGPU/GlobalISel: Select llvm.amdgcn.buffer.atomic.cmpswap 2020-01-30 08:22:43 -05:00
Matt Arsenault
c5fffa4da3 GlobalISel: Add observer argument to legalizeIntrinsic
This is passed to legalizeCustom, but not intrinsic. Also remove the
MRI argument, since you can get that from the MachineIRBuilder.

I'm not sure why MachineIRBuilder has a private observer member, and
this is passed separately.
2020-01-29 18:33:45 -05:00
Matt Arsenault
96352e0a1b AMDGPU/GlobalISel: Handle LDS with relocations case 2020-01-29 08:18:55 -08:00
Matt Arsenault
c3075e6171 AMDGPU/GlobalISel: Select buffer atomics
The cmpswap handling is incomplete and fails to select.
2020-01-27 15:16:44 -05:00
Matt Arsenault
0eb62d5b3f AMDGPU/GlobalISel: Select llvm.amdgcn.raw.tbuffer.store 2020-01-27 15:16:21 -05:00
Matt Arsenault
a69c26a927 AMDGPU/GlobalISel: Select llvm.amdgcn.struct.buffer.store[.format] 2020-01-27 15:00:21 -05:00
Matt Arsenault
533d650e94 AMDGPU/GlobalISel: Move llvm.amdgcn.raw.buffer.store handling
Treat this the same way as loads. There's less value to the
intermediate nodes, but it's good to be consistent.
2020-01-27 14:59:30 -05:00
Matt Arsenault
75d66f8434 AMDGPU/GlobalISel: Select llvm.amdcn.struct.tbuffer.load 2020-01-27 14:42:04 -05:00
Matt Arsenault
09ed0e44d9 AMDGPU/GlobalISel: Select llvm.amdgcn.raw.tbuffer.load 2020-01-27 13:40:37 -05:00
Matt Arsenault
97711228fd AMDGPU/GlobalISel: Select llvm.amdgcn.struct.buffer.load.format 2020-01-27 13:23:35 -05:00
Matt Arsenault
ce7ca2caf2 AMDGPU/GlobalISel: Select llvm.amdgcn.struct.buffer.load 2020-01-27 13:05:55 -05:00
Matt Arsenault
198624c39d AMDGPU/GlobalISel: Select llvm.amdgcn.raw.buffer.load.format 2020-01-27 13:02:19 -05:00
Matt Arsenault
fc90222a91 AMDGPU/GlobalISel: Select llvm.amdgcn.raw.buffer.load
Use intermediate instructions, unlike with buffer stores. This is
necessary because of the need to have an internal way to distinguish
between signed and unsigned extloads. This introduces some duplication
and near duplication with the buffer store selection path. The store
handling should maybe be moved into legalization to match and
eliminate the duplication.
2020-01-27 12:49:23 -05:00
Matt Arsenault
a1d33ce73a AMDGPU/GlobalISel: Custom legalize v2s16 G_SHUFFLE_VECTOR
Try to keep simple v2s16 cases as-is. This will more naturally map to
how the VOP3P op_sel modifiers work compared to the expansion
involving bitcasts and bitshifts.

This could maybe try harder with wider source vector types, although
that could be handled with a pre-legalize combine.
2020-01-27 08:28:05 -08:00
Matt Arsenault
2a160ba5b0 GlobalISel: Reimplement widenScalar for G_UNMERGE_VALUES results
Only use shifts if the requested type exactly matches the source type,
and create sub-unmerges otherwise.
2020-01-27 06:18:26 -08:00
Matt Arsenault
a722cbf77c AMDGPU/GlobalISel: Handle atomic_inc/atomic_dec
The intermediate instruction drops the extra volatile argument. We are
missing an atomic ordering on these.
2020-01-22 09:26:17 -05:00
Matt Arsenault
e47965bf64 AMDGPU/GlobalISel: Merge trivial legalize rules
Also move constant-like rules together
2020-01-21 17:37:19 -05:00
Matt Arsenault
9a5a6e9465 AMDGPU/GlobalISel: Merge G_PTR_ADD/G_PTR_MASK rules 2020-01-21 16:57:01 -05:00
Matt Arsenault
fd109308a7 AMDGPU/GlobalISel: Legalize G_PTR_ADD for arbitrary pointers
Pointers of unrecognized address spaces shoudl be treated as
global-like pointers. Even if loads and stores of them aren't handled,
dumb operations that just operate on the bits should work.
2020-01-21 16:35:36 -05:00
Matt Arsenault
e12b840abf AMDGPU/GlobalISel: Improve lowering of G_SEXT_INREG
Clamping the scalar is much better than lowering with superwide shifts
for types > s64.
2020-01-16 14:29:37 -05:00
Matt Arsenault
936483fb7d GlobalISel: Implement lower for G_BITCAST
Bitcast only really applies between scalars and vectors. Implement as
an unmerge and remerge. The test needs to tolerate failure since one
of the unmerges currently fails to legalize.
2020-01-15 08:58:58 -05:00
Matt Arsenault
ca19d7a399 AMDGPU/GlobalISel: Fix branch targets when emitting SI_IF
The branch target needs to be changed depending on whether there is an
unconditional branch or not.

Loops also need to be similarly fixed, but compiling a simple testcase
end to end requires another set of patches that aren't upstream yet.
2020-01-13 12:51:05 -05:00
Matt Arsenault
bac995d978 AMDGPU/GlobalISel: Clamp G_ZEXT source sizes
Also clamps G_SEXT/G_ANYEXT, but the implementation is more limited so
fewer cases actually work.
2020-01-10 09:42:49 -05:00
Matt Arsenault
0ea3c7291f GlobalISel: Handle llvm.read_register
Compared to the attempt in bdcc6d3d2638b3a2c99ab3b9bfaa9c02e584993a,
this uses intermediate generic instructions.
2020-01-09 17:37:52 -05:00
Matt Arsenault
35ad66fae8 AMDGPU/GlobalISel: Widen 16-bit shift amount sources
This should be legal, but will require future selection work. 16-bit
shift amounts were already removed from being legal, but this didn't
adjust the transformation rules.
2020-01-09 16:29:44 -05:00
Matt Arsenault
6652cc0cf7 AMDGPU/GlobalISel: Fix scalar G_SELECT for arbitrary pointers
4e85ca9562a588eba491e44bcbf73cb2f419780f missed updating the legal
condition type set for pointers with any unrecognized address space.
2020-01-07 16:36:31 -05:00
Matt Arsenault
52afc93c38 AMDGPU/GlobalISel: Legalize G_READCYCLECOUNTER 2020-01-06 19:16:32 -05:00
Matt Arsenault
4e85ca9562 AMDGPU/GlobalISel: Replace handling of boolean values
This solves selection failures with generated selection patterns,
which would fail due to inferring the SGPR reg bank for virtual
registers with a set register class instead of VCC bank. Use
instruction selection would constrain the virtual register to a
specific class, so when the def was selected later the bank no longer
was set to VCC.

Remove the SCC reg bank. SCC isn't directly addressable, so it
requires copying from SCC to an allocatable 32-bit register during
selection, so these might as well be treated as 32-bit SGPR values.

Now any scalar boolean value that will produce an outupt in SCC should
be widened during RegBankSelect to s32. Any s1 value should be a
vector boolean during selection. This makes the vcc register bank
unambiguous with a normal SGPR during selection.

Summary of how this should now work:

- G_TRUNC is always a no-op, and never should use a vcc bank result.

- SALU boolean operations should be promoted to s32 in RegBankSelect
  apply mapping

- An s1 value means vcc bank at selection. The exception is for
  legalization artifacts that use s1, which are never VCC. All other
  contexts should infer the VCC register classes for s1 typed
  registers. The LLT for the register is now needed to infer the
  correct register class. Extensions with vcc sources should be
  legalized to a select of constants during RegBankSelect.

- Copy from non-vcc to vcc ensures high bits of the input value are
  cleared during selection.

- SALU boolean inputs should ensure the inputs are 0/1. This includes
  select, conditional branches, and carry-ins.

There are a few somewhat dirty details. One is that G_TRUNC/G_*EXT
selection ignores the usual register-bank from register class
functions, and can't handle truncates with VCC result banks. I think
this is OK, since the artifacts are specially treated anyway. This
does require some care to avoid producing cases with vcc. There will
also be no 100% reliable way to verify this rule is followed in
selection in case of register classes, and violations manifests
themselves as invalid copy instructions much later.

Standard phi handling also only considers the bank of the result
register, and doesn't insert copies to make the source banks
match. This doesn't work for vcc, so we have to manually correct phi
inputs in this case. We should add a verifier check to make sure there
are no phis with mixed vcc and non-vcc register bank inputs.

There's also some duplication with the LegalizerHelper, and some code
which should live in the helper. I don't see a good way to share
special knowledge about what types to use for intermediate operations
depending on the bank for example. Using the helper to replace
extensions with selects also seems somewhat awkward to me.

Another issue is there are some contexts calling
getRegBankFromRegClass that apparently don't have the LLT type for the
register, but I haven't yet run into a real issue from this.

This also introduces new unnecessary instructions in most cases, since
we don't yet try to optimize out the zext when the source is known to
come from a compare.
2020-01-06 18:26:42 -05:00
Matt Arsenault
f3de8ab5cc GlobalISel: Implement lower for G_INTRINSIC_ROUND
Mostly copied from AMDGPU lowering implementation, except used
G_SITOFP instead of directly creating a select on -1.0, 0.0.
2020-01-06 18:26:42 -05:00
Matt Arsenault
ee6b8722ff GlobalISel: Fix unsupported legalize action
This would complain about invalid legalizer rules otherwise.

Mark some operations as unsupported for AMDGPU. This currently seems
to produce the same legalize error as when no rules are defined, but
eventually this should produce a proper user facing error.
2020-01-06 17:21:51 -05:00
Matt Arsenault
d12f2a2998 GlobalISel: Scalarize all division operations
This only handled G_SDIV, but they all are trivially scalarizable.

Also define placeholder AMDGPU division legalizer rules.
2020-01-04 13:47:10 -05:00
Matt Arsenault
d9b5063b25 AMDGPU/GlobalISel: Legalize more odd sized loads
The attempts to widen sufficently aligned, odd sized loads wasn't
consistently applied.
2020-01-04 12:38:39 -05:00
Matt Arsenault
9fd31fdbd3 GlobalISel: moreElementsVector for FP min/max 2019-12-30 10:39:53 -05:00
Matt Arsenault
e088846712 AMDGPU/GlobalISel: Fix extra result register in fdiv64 lowering
There ended up being two result registers, which would fail on
select. It was really defing a new temp register in the correct def
position, instead of the correct result register.
2019-12-27 08:49:43 -05:00
Matt Arsenault
df5c2159d0 AMDGPU/GlobalISel: Legalize some 16-bit round instructions 2019-12-24 09:53:01 -05:00
Matt Arsenault
9035fa6b54 AMDGPU/GlobalISel: Lower llvm.amdgcn.else 2019-12-24 09:53:01 -05:00
Jay Foad
c7c05b0c8a [AMDGPU] Don't create MachinePointerInfos with an UndefValue pointer
Summary:
The only useful information the UndefValue conveys is the address space,
which MachinePointerInfo can represent directly without referring to an
IR value.

Reviewers: arsenm, rampitec

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, Petar.Avramovic, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71838
2019-12-23 15:58:19 +00:00
Matt Arsenault
42a26445f9 AMDGPU/GlobalISel: Fix misuse of div_scale intrinsics
Confusingly, the intrinsic operands do not match the
instruction/custom node. The order is shuffled, and the 3rd operand is
an immediate to select operands.

I'm not 100% sure I did this right, but fdiv still doesn't select end
to end and it will be easier to tell when it does. This at least
avoids an assertion in RegBankSelect and allows hitting the fallback
on selection.
2019-12-21 04:55:36 -05:00
Austin Kerbow
f3225f2abe AMDGPU/GlobalISel: Legalize FDIV64
Reviewers: arsenm

Reviewed By: arsenm

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, rovka, dstuttard, tpr, t-tye, hiraditya, Petar.Avramovic, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70403
2019-11-19 21:02:27 -08:00
Matt Arsenault
ea23b6428b AMDGPU: Be explicit about denormal mode in MIR tests
Start checking the machine function in GlobalISel instead of the
target directly.

This temporarily breaks fcanonicalize selection in GlobalISel.
2019-11-19 19:55:43 +05:30