2768 Commits

Author SHA1 Message Date
Matt Arsenault
c0ec72d4f8 AMDGPU/GlobalISel: RegBankSelect DS GWS intrinsics
llvm-svn: 373840
2019-10-06 01:37:38 +00:00
Matt Arsenault
bcd6b1d209 AMDGPU/GlobalISel: Lower G_ATOMIC_CMPXCHG_WITH_SUCCESS
llvm-svn: 373839
2019-10-06 01:37:37 +00:00
Matt Arsenault
a5b9c75674 GlobalISel: Partially implement lower for G_EXTRACT
Turn into shift and truncate. Doesn't yet handle pointers.

llvm-svn: 373838
2019-10-06 01:37:35 +00:00
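
Note on a5b9c75674: the "shift and truncate" lowering can be sketched on plain integers. This is a minimal illustration, not the LLVM implementation. The function name and parameters are hypothetical, and the generic opcodes named in the comments are an assumption about what the lowering emits.

```python
def lower_extract(src, src_bits, offset, dst_bits):
    """Extract dst_bits bits starting at bit `offset` of a src_bits-wide scalar."""
    assert 0 <= offset and offset + dst_bits <= src_bits
    src &= (1 << src_bits) - 1              # model a fixed-width register
    shifted = src >> offset                 # logical shift right (presumably G_LSHR)
    return shifted & ((1 << dst_bits) - 1)  # keep the low bits (presumably G_TRUNC)


# Bits [8, 24) of a 32-bit value.
print(hex(lower_extract(0xAABBCCDD, 32, 8, 16)))
```

Pointers are not modelled here, matching the commit's note that they are not yet handled.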
Matt Arsenault
69c65a8609 AMDGPU/GlobalISel: Fix RegBankSelect for sendmsg intrinsics
This wasn't updated for the immarg handling change.

llvm-svn: 373837
2019-10-06 01:37:34 +00:00
Matt Arsenault
d7cad4fb41 AMDGPU/GlobalISel: Fix using wrong addrspace for aperture
This was always passing the destination flat address space, when it
should be picking between the two valid source options.

llvm-svn: 373716
2019-10-04 08:35:38 +00:00
Matt Arsenault
412e0bf8f3 AMDGPU/GlobalISel: Select G_PTRTOINT
llvm-svn: 373715
2019-10-04 08:35:37 +00:00
Matt Arsenault
be9521acaa AMDGPU/GlobalISel: Support wave32 waterfall loops
llvm-svn: 373714
2019-10-04 08:35:35 +00:00
Matt Arsenault
ed77b27441 AMDGPU/GlobalISel: Handle RegBankSelect of G_INSERT_VECTOR_ELT
llvm-svn: 373639
2019-10-03 17:59:03 +00:00
Matt Arsenault
233ff982c7 AMDGPU/GlobalISel: Split 64-bit vector extracts during RegBankSelect
Register indexing 64-bit elements is possible on the SALU, but not the
VALU. Handle splitting this into two 32-bit indexes. Extend waterfall
loop handling to allow moving a range of instructions.

llvm-svn: 373638
2019-10-03 17:55:27 +00:00
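
Note on 233ff982c7: splitting a 64-bit element extract into two 32-bit indexes can be sketched as below. This is a hedged illustration only. The helper names are hypothetical, and the low-half-first lane order is an assumption rather than something the commit message states.

```python
MASK32 = (1 << 32) - 1


def as_32bit_lanes(vec64):
    """Reinterpret a vector of 64-bit elements as 32-bit lanes (low half first, assumed)."""
    lanes = []
    for elt in vec64:
        lanes.append(elt & MASK32)          # low 32 bits
        lanes.append((elt >> 32) & MASK32)  # high 32 bits
    return lanes


def extract_elt64(vec64, idx):
    """Extract 64-bit element `idx` using two 32-bit indexed reads."""
    lanes = as_32bit_lanes(vec64)
    lo = lanes[2 * idx]       # first 32-bit index
    hi = lanes[2 * idx + 1]   # second 32-bit index
    return (hi << 32) | lo    # reassemble the 64-bit result


vec = [0x1111111122222222, 0x3333333344444444]
print(hex(extract_elt64(vec, 1)))
```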
Matt Arsenault
56271fe180 AMDGPU/GlobalISel: Allow VGPR to index SGPR register
We can still do a waterfall loop over the index if using a VGPR to
index an SGPR. The result will still be a VGPR, but we can avoid the
wide copy of the source register to a VGPR.

llvm-svn: 373637
2019-10-03 17:50:32 +00:00
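
Note on 56271fe180: the idea of a waterfall loop over a divergent index can be simulated in a few lines. This is a rough model under stated assumptions, not the generated code. Lanes are list positions, the execution mask is a list of booleans, and all names are hypothetical.

```python
def waterfall_extract(sgpr_vec, vgpr_idx, exec_mask):
    """Index a uniform (SGPR) vector with a per-lane (VGPR) index.

    Each iteration picks the index held by the first remaining lane, performs
    one uniform-indexed read, and retires every lane that shares that index.
    """
    result = [None] * len(vgpr_idx)   # per-lane result, i.e. still a VGPR
    remaining = list(exec_mask)
    iterations = 0
    while any(remaining):
        first = remaining.index(True)
        uniform_idx = vgpr_idx[first]   # models reading the first active lane
        value = sgpr_vec[uniform_idx]   # scalar indexed read of the source
        for lane, active in enumerate(remaining):
            if active and vgpr_idx[lane] == uniform_idx:
                result[lane] = value
                remaining[lane] = False
        iterations += 1
    return result, iterations


print(waterfall_extract([10, 20, 30, 40], [2, 0, 2, 3], [True] * 4))
```

The loop runs once per distinct index among the active lanes, and the source vector is never copied per lane, which is the saving the commit describes.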
Matt Arsenault
9256183994 AMDGPU/GlobalISel: Add some more tests for G_INSERT legalization
llvm-svn: 373636
2019-10-03 17:50:31 +00:00
Matt Arsenault
3d23e58dbe AMDGPU/GlobalISel: Fix mutationIsSane assert v8s8 and
This would try to do FewerElements to v9s8

llvm-svn: 373635
2019-10-03 17:50:29 +00:00
Matt Arsenault
1c135a39aa AMDGPU/GlobalISel: Expand G_BITCAST legality
llvm-svn: 373567
2019-10-03 05:46:08 +00:00
Stanislav Mekhanoshin
1384c3a5b8 [AMDGPU] Fix illegal agpr use by VALU
When SIFixSGPRCopies attempts to fix an illegal copy from vector to
scalar register it calls moveToVALU(). A copy from an agpr to sgpr
becomes a copy from agpr to agpr, which may result in the illegal
register class at a use of this copy.

The solution is to always copy it into a vgpr. This may result in a
subsequent copy into an agpr if that is what is really needed, but that
should not happen often and will likely be folded later.

The opposite situation may not happen because an sgpr is always
illegal where agpr is legal, so such user instructions may not
exist.

Differential Revision: https://reviews.llvm.org/D68358

llvm-svn: 373544
2019-10-02 23:23:46 +00:00
Piotr Sobczak
265e94e657 [AMDGPU] Extend buffer intrinsics with swizzling
Summary:
Extend cachepolicy operand in the new VMEM buffer intrinsics
to supply information whether the buffer data is swizzled.
Also, propagate this information to MIR.

Intrinsics updated:
int_amdgcn_raw_buffer_load
int_amdgcn_raw_buffer_load_format
int_amdgcn_raw_buffer_store
int_amdgcn_raw_buffer_store_format
int_amdgcn_raw_tbuffer_load
int_amdgcn_raw_tbuffer_store
int_amdgcn_struct_buffer_load
int_amdgcn_struct_buffer_load_format
int_amdgcn_struct_buffer_store
int_amdgcn_struct_buffer_store_format
int_amdgcn_struct_tbuffer_load
int_amdgcn_struct_tbuffer_store

Furthermore, disable merging of VMEM buffer instructions
in SI Load/Store optimizer, if the "swizzled" bit on the instruction
is on.

The default value of the bit is 0, meaning that data in buffer
is linear and buffer instructions can be merged.

There is no difference in the generated code with this commit.
However, in the future it will be expected that front-ends
use buffer intrinsics with correct "swizzled" bit set.

Reviewers: arsenm, nhaehnle, tpr

Reviewed By: nhaehnle

Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, arphaman, jfb, Petar.Avramovic, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68200

llvm-svn: 373491
2019-10-02 17:22:36 +00:00
Matt Arsenault
cdfe5efe9b AMDGPU/GlobalISel: Assume VGPR for G_FRAME_INDEX
In principle this should behave as any other constant. However
eliminateFrameIndex currently assumes a VALU use and uses a vector
shift. Work around this by selecting to VGPR for now until
eliminateFrameIndex is fixed.

llvm-svn: 373415
2019-10-02 01:02:24 +00:00
Matt Arsenault
bfce0c2664 AMDGPU/GlobalISel: Private loads always use VGPRs
llvm-svn: 373414
2019-10-02 01:02:21 +00:00
Matt Arsenault
05aa8a733e AMDGPU/GlobalISel: Legalize 1024-bit G_BUILD_VECTOR
This will be needed to support AGPR operations.

llvm-svn: 373413
2019-10-02 01:02:18 +00:00
Matt Arsenault
3a657afb3a AMDGPU/GlobalISel: Fix RegBankSelect for 1024-bit values
llvm-svn: 373412
2019-10-02 01:02:14 +00:00
Stanislav Mekhanoshin
075bc48a7f [AMDGPU] separate accounting for agprs
Account and report agprs separately on gfx908. Other targets
do not change the reporting.

Differential Revision: https://reviews.llvm.org/D68307

llvm-svn: 373411
2019-10-02 00:26:58 +00:00
Changpeng Fang
e4ee28d14c AMDGPU: Fix an out of date assert in addressing FrameIndex
Reviewers:
  arsenm

Differential Revision:
  https://reviews.llvm.org/D67574

llvm-svn: 373404
2019-10-01 23:07:14 +00:00
Matt Arsenault
9dba603748 AMDGPU/GlobalISel: Increase max legal size to 1024
There are 1024 bit register classes defined for AGPRs. Additionally
OpenCL defines vectors up to 16 x i64, and this helps those tests
legalize.

llvm-svn: 373350
2019-10-01 16:35:06 +00:00
Dmitri Gribenko
827a7fab78 Revert "GlobalISel: Handle llvm.read_register"
This reverts commit r373294. It broke Clang's
CodeGen/arm64-microsoft-status-reg.cpp:
http://lab.llvm.org:8011/builders/clang-x86_64-debian-fast/builds/18483

llvm-svn: 373310
2019-10-01 08:24:01 +00:00
Matt Arsenault
fdea5e02ce AMDGPU/GlobalISel: Select s1 src G_SITOFP/G_UITOFP
llvm-svn: 373298
2019-10-01 02:23:20 +00:00
Matt Arsenault
59b91aa93e AMDGPU/GlobalISel: Add support for init.exec intrinsics
The existing wave32 behavior seems broken and incomplete, but this
reproduces it.

llvm-svn: 373296
2019-10-01 02:07:25 +00:00
Matt Arsenault
bdcc6d3d26 GlobalISel: Handle llvm.read_register
SelectionDAG has a bunch of machinery to defer this to selection time
for some reason. Just directly emit a copy during IRTranslator. The
x86 usage does somewhat questionably check hasFP, which could depend
on the whole function being at minimum translated.

This does lose the convergent bit if the callsite had it, which may be
a problem. We also lose that in general for intrinsics, which may also
be a problem.

llvm-svn: 373294
2019-10-01 02:07:16 +00:00
Matt Arsenault
8f6bdb7668 AMDGPU/GlobalISel: Avoid creating shift of 0 in arg lowering
This is sort of papering over the fact that we don't run a combiner
anywhere, but avoiding creating 2 instructions in the first place is
easy.

llvm-svn: 373293
2019-10-01 01:44:46 +00:00
Matt Arsenault
54167ea316 AMDGPU/GlobalISel: Select G_UADDO/G_USUBO
llvm-svn: 373288
2019-10-01 01:23:13 +00:00
Matt Arsenault
ed85b0cee6 GlobalISel: Implement widenScalar for G_SITOFP/G_UITOFP sources
Legalize 16-bit G_SITOFP/G_UITOFP for AMDGPU.

llvm-svn: 373287
2019-10-01 01:06:48 +00:00
Matt Arsenault
77ac400117 AMDGPU/GlobalISel: Legalize G_GLOBAL_VALUE
Handle other cases besides LDS. Mostly a straight port of the existing
handling, without the intermediate custom nodes.

llvm-svn: 373286
2019-10-01 01:06:43 +00:00
Alexander Timofeev
565b1d3d46 [AMDGPU] SIFoldOperands should not fold register across the EXEC definition
Reviewers: rampitec

Differential Revision: https://reviews.llvm.org/D67662

llvm-svn: 373221
2019-09-30 15:31:17 +00:00
Roger Ferrer Ibanez
5a2a14db0b [TargetLowering] Simplify expansion of S{ADD,SUB}O
ISD::SADDO uses the suggested sequence described in §2.4 of
the RISCV Spec v2.2. ISD::SSUBO uses the dual approach but checking for
(non-zero) positive.

Differential Revision: https://reviews.llvm.org/D47927

llvm-svn: 373187
2019-09-30 07:58:50 +00:00
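
Note on 5a2a14db0b: the overflow checks described above can be sketched on wrapped integers. This is one reading of the commit message, not the TargetLowering code. The function names and the `bits` parameter are hypothetical.

```python
def _wrap(x, bits):
    """Wrap x to a signed two's-complement value of the given width."""
    x &= (1 << bits) - 1
    return x - (1 << bits) if x >> (bits - 1) else x


def saddo(a, b, bits=32):
    """Signed add with overflow flag: overflow iff (b < 0) != (sum < a)."""
    s = _wrap(a + b, bits)
    return s, (b < 0) != (s < a)


def ssubo(a, b, bits=32):
    """Signed sub with overflow flag: the dual check, testing b > 0 instead."""
    d = _wrap(a - b, bits)
    return d, (b > 0) != (d < a)


print(saddo(2**31 - 1, 1))
print(ssubo(-2**31, 1))
```

The reasoning: adding a negative `b` must make the result smaller than `a`, and adding a non-negative `b` must not. Any disagreement between the two comparisons means the wrapped result overflowed.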
Matt Arsenault
317d991fa5 AMDGPU/GlobalISel: Fix select for v2s16 and/or/xor
llvm-svn: 373180
2019-09-30 06:31:30 +00:00
Stanislav Mekhanoshin
374c04e257 [AMDGPU] Improve fma.f64 test. NFC.
llvm-svn: 372908
2019-09-25 18:50:34 +00:00
Stanislav Mekhanoshin
d3b2b97195 [AMDGPU] gfx10 v_fmac_f16 operand folding
Fold immediates into v_fmac_f16.

Differential Revision: https://reviews.llvm.org/D68037

llvm-svn: 372906
2019-09-25 18:40:20 +00:00
Matt Arsenault
eb6eb694e4 AMDGPU/GlobalISel: Allow selection of scalar min/max
I believe all of the uniform/divergent pattern predicates are
redundant and can be removed. The uniformity bit already influences
the register class, and nothing has broken when I've removed this and
others.

llvm-svn: 372450
2019-09-21 02:37:33 +00:00
Stanislav Mekhanoshin
af77ca7e6e Remove assert from MachineLoop::getLoopPredecessor()
According to the documentation, the method returns the predecessor
if the given loop's header has exactly one unique predecessor
outside the loop, and otherwise returns null.

In reality it asserts if there is no predecessor outside of
the loop.

The testcase has a loop whose predecessors outside of the
loop were not identified, as analyzeBranch() was unable to
process the mask branch and returned true. It is also not
correct to assert for truly dead loops.

Differential Revision: https://reviews.llvm.org/D67634

llvm-svn: 372405
2019-09-20 15:26:10 +00:00
Nico Weber
03475adcf7 Revert r372366 "Use getTargetConstant for BLENDI, and add a test to catch it."
This reverts commit 52621307bcab2013e8833f3317cebd63a6db3885.

Tests have been failing all night with

    [0/2] ACTION //llvm/test:check-llvm(//llvm/utils/gn/build/toolchain:unix)
    -- Testing: 33647 tests, 64 threads --
    Testing: 0 .. 10..
    UNRESOLVED: LLVM :: CodeGen/AMDGPU/GlobalISel/isel-blendi-gettargetconstant.ll (6943 of 33647)
    ******************** TEST 'LLVM :: CodeGen/AMDGPU/GlobalISel/isel-blendi-gettargetconstant.ll' FAILED ********************
    Test has no run line!
    ********************

Since there were other concerns on https://reviews.llvm.org/D67785,
I'm just reverting for now.

llvm-svn: 372383
2019-09-20 12:05:29 +00:00
Sterling Augustine
52621307bc Use getTargetConstant for BLENDI, and add a test to catch it.
Summary: This fixes a crasher introduced by r372338.

Reviewers: echristo, arsenm

Subscribers: jvesely, wdng, nhaehnle, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67785

Tighten up the test case.

llvm-svn: 372366
2019-09-20 02:29:16 +00:00
Matt Arsenault
dd74f4839b MachineScheduler: Fix missing dependency with multiple subreg defs
If an instruction had multiple subregister defs, and one of them was
undef, this would improperly conclude all other lanes are
killed. There could still be other defs of those read-undef lanes in
other operands. This would improperly remove register uses from
CurrentVRegUses, so the visitation of later operands would not find
the necessary register dependency. This would also mean this would
fail or not depending on how different subregister def operands were
ordered.

On an undef subregister def, scan the instruction for other
subregister defs and avoid killing those.

This possibly should be deferring removing anything from
CurrentVRegUses until the entire instruction has been processed
instead.

llvm-svn: 372362
2019-09-20 00:09:15 +00:00
Alexander Timofeev
e2f9bc3b11 [AMDGPU] Unnecessary -amdgpu-scalarize-global-loads=false flag removed from min/max lit tests.
Reviewers: arsenm

Differential Revision: https://reviews.llvm.org/D67712

llvm-svn: 372340
2019-09-19 16:44:38 +00:00
Matt Arsenault
3ecab8e455 Reapply r372285 "GlobalISel: Don't materialize immarg arguments to intrinsics"
This reverts r372314, reapplying r372285 and the commits which depend
on it (r372286-r372293, and r372296-r372297)

This was missing one switch to getTargetConstant in an untested case.

llvm-svn: 372338
2019-09-19 16:26:14 +00:00
Hans Wennborg
13bdae8541 Revert r372285 "GlobalISel: Don't materialize immarg arguments to intrinsics"
This broke the Chromium build, causing it to fail with e.g.

  fatal error: error in backend: Cannot select: t362: v4i32 = X86ISD::VSHLI t392, Constant:i8<15>

See llvm-commits thread of r372285 for details.

This also reverts r372286, r372287, r372288, r372289, r372290, r372291,
r372292, r372293, r372296, and r372297, which seemed to depend on the
main commit.

> Encode them directly as an imm argument to G_INTRINSIC*.
>
> Since intrinsics can now define what parameters are required to be
> immediates, avoid using registers for them. Intrinsics could
> potentially want a constant that isn't a legal register type. Also,
> since G_CONSTANT is subject to CSE and legalization, transforms could
> potentially obscure the value (and create extra work for the
> selector). The register bank of a G_CONSTANT is also meaningful, so
> this could throw off future folding and legalization logic for AMDGPU.
>
> This will be much more convenient to work with than needing to call
> getConstantVRegVal and checking if it may have failed for every
> constant intrinsic parameter. AMDGPU has quite a lot of intrinsics with
> immarg operands, many of which need inspection during lowering. Having
> to find the value in a register is going to add a lot of boilerplate
> and waste compile time.
>
> SelectionDAG has always provided TargetConstant for constants which
> should not be legalized or materialized in a register. The distinction
> between Constant and TargetConstant was somewhat fuzzy, and there was
> no automatic way to force usage of TargetConstant for certain
> intrinsic parameters. They were both ultimately ConstantSDNode, and it
> was inconsistently used. It was quite easy to mis-select an
> instruction requiring an immediate. For SelectionDAG, start emitting
> TargetConstant for these arguments, and using timm to match them.
>
> Most of the work here is to cleanup target handling of constants. Some
> targets process intrinsics through intermediate custom nodes, which
> need to preserve TargetConstant usage to match the intrinsic
> expectation. Pattern inputs now need to distinguish whether a constant
> is merely compatible with an operand or whether it is mandatory.
>
> The GlobalISelEmitter needs to treat timm as a special case of a leaf
> node, similar to MachineBasicBlock operands. This should also enable
> handling of patterns for some G_* instructions with immediates, like
> G_FENCE or G_EXTRACT.
>
> This does include a workaround for a crash in GlobalISelEmitter when
> ARM tries to use "imm" in an output with a "timm" pattern source.

llvm-svn: 372314
2019-09-19 12:33:07 +00:00
Matt Arsenault
bffbeecb44 AMDGPU/GlobalISel: RegBankSelect llvm.amdgcn.ds.swizzle
llvm-svn: 372297
2019-09-19 04:11:17 +00:00
Matt Arsenault
494243597b AMDGPU/GlobalISel: Select llvm.amdgcn.raw.buffer.store.format
This needs special handling due to some subtargets that have a
nonstandard register layout for f16 vectors

Also reject some illegal types on other targets.

llvm-svn: 372293
2019-09-19 02:35:08 +00:00
Matt Arsenault
67f1f6ff8c AMDGPU/GlobalISel: Select llvm.amdgcn.raw.buffer.store
llvm-svn: 372292
2019-09-19 02:30:27 +00:00
Matt Arsenault
838ff36553 AMDGPU/GlobalISel: RegBankSelect struct buffer load/store
llvm-svn: 372291
2019-09-19 02:26:53 +00:00
Matt Arsenault
a62ef58346 AMDGPU/GlobalISel: RegBankSelect llvm.amdgcn.raw.buffer.{load|store}
llvm-svn: 372290
2019-09-19 02:25:09 +00:00
Matt Arsenault
a30d022db6 AMDGPU/GlobalISel: Attempt to RegBankSelect image intrinsics
Images should always have 2 consecutive, mandatory SGPR arguments.

llvm-svn: 372289
2019-09-19 02:23:06 +00:00
Matt Arsenault
01213407c4 Fix typo
llvm-svn: 372288
2019-09-19 02:15:29 +00:00