6391 Commits

Author SHA1 Message Date
Kazu Hirata
ee0133dc6d [llvm] Use range-for loops (NFC) 2021-11-16 09:01:56 -08:00
Jon Chesterfield
30b29db7c7 [amdgpu] Don't crash on empty global ctor/dtor
Global ctor/dtor can be an empty array, which is a Constant not a
ConstantArray. The cast<ConstantArray> therefore asserts / crashes.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D113800
2021-11-16 14:36:08 +00:00
Matt Arsenault
659887b405 AMDGPU: Mark prolog/epilog SCC defs as dead
A future change will add SCC liveness checks. Since we are still
relying on forward register scavenging, add dead flags to avoid
spuriously detecting SCC as live.
2021-11-15 21:35:06 -05:00
Dmitry Preobrazhensky
91f4650ebb [AMDGPU][MC][GFX10] Corrected global_atomic_fcmpswap*
Corrected src data size of global_atomic_fcmpswap and global_atomic_fcmpswap_x2 opcodes.

Differential Revision: https://reviews.llvm.org/D113746
2021-11-15 12:51:12 +03:00
Kazu Hirata
a84a401f7e [AMDGPU] Remove selectStoreIntrinsic (NFC)
The last use was removed on Jan 13, 2020 in commit
533d650e947a2f7216a315aeb8c79ac1d4740e5f.
2021-11-14 19:40:44 -08:00
Kazu Hirata
efa896e5f7 [Target] Use SDNode::uses (NFC) 2021-11-12 21:23:04 -08:00
Jay Foad
a70bbb5f7a [AMDGPU] Simplify 64-bit division/remainder expansion
The old expansion open-coded a 64-bit addition in a strange way, by
adding the high parts *without* carry-in from the low part, and then
adding the carry back in later on. Fixing this saves a couple of
instructions and makes the code much easier to understand.

Differential Revision: https://reviews.llvm.org/D113679
2021-11-12 15:48:41 +00:00
Neubauer, Sebastian
d1f45ed58f [AMDGPU][NFC] Fix typos
Differential Revision: https://reviews.llvm.org/D113672
2021-11-12 11:37:21 +01:00
Kazu Hirata
2ca45adf24 [CodeGen, Target] Use MachineRegisterInfo::use_operands (NFC) 2021-11-11 22:28:55 -08:00
kpyzhov
c9690092c8 [AMDGPU] Small correction in SITargetLowering::performOrCombine().
Differential Revision: https://reviews.llvm.org/D113203
2021-11-10 21:07:27 -05:00
Stanislav Mekhanoshin
476ab0f809 [AMDGPU] Fixed stack pointer init with architected flat scratch
Even if wave offset is not present we still need to do the rest
of the initialization. The mov into s32 was missing in the
kernels.

Fixes: SWDEV-310935

Differential Revision: https://reviews.llvm.org/D113628
2021-11-10 17:18:38 -08:00
Matt Arsenault
c7a0c2d0f7 AMDGPU: Report large stack usage for recursive calls
We were previously setting an ignored bit in the kernel headers. The
current behavior is to add the large amount on top of the statically
known size of a single stack frame. I'm not sure if we should just use
the large size as the entire reported size instead.
2021-11-10 20:02:01 -05:00
Matt Arsenault
90ff148719 AMDGPU: Account for implicit argument alignment for kernarg segment
If a kernel had no formal arguments but did have the implicit
arguments, we were reporting a required kernarg alignment of 4. For
some reason we require an 8-byte alignment for this, even though
there's no real advantage and I don't see where this is documented in
the ABI.

The code object header code also claims the minimum alignment is 16,
which is what I thought you always got at runtime anyway so I don't
know why this matters.
2021-11-09 17:48:37 -05:00
Michael Liao
bf225939bc [InferAddressSpaces] Support assumed addrspaces from addrspace predicates.
- CUDA cannot associate memory space with pointer types. Even though Clang could add extra attributes to specify the address space explicitly on a pointer type, it breaks the portability between Clang and NVCC.
- This change proposes to assume the address space from a pointer from the assumption built upon target-specific address space predicates, such as `__isGlobal` from CUDA. E.g.,

```
  foo(float *p) {
    __builtin_assume(__isGlobal(p));
    // From there, we could assume p is a global pointer instead of a
    // generic one.
  }
```

This makes the code portable without introducing the implementation-specific features.

Note that NVCC starts to support __builtin_assume from version 11.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D112041
2021-11-08 16:51:57 -05:00
Joe Nash
79f52af4cd [AMDGPU] Make getInstSizeInBytes more generic
NFC. This check mainly handles size affecting literals. Make it check all
explicit operands instead of a few by name. Also make the isLiteral
check handle the KIMM operands, see https://reviews.llvm.org/D111067

Change-Id: I1a362d55b2a10f5c74d445272e8b7829a8b77597

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D113318

Change-Id: Ie6c688f30a71e0335d1c6dd1ff65019bd7ce684e
2021-11-08 10:34:49 -05:00
Simon Pilgrim
f059b04f7b [DAG] Add SelectionDAG::ComputeMinSignedBits helper
As suggested on D113371, this adds a wrapper to SelectionDAG::ComputeNumSignBits, similar to the llvm::ComputeMinSignedBits wrapper.

I've included some usage, its not exhaustive, just the more obvious cases where the intention is obvious.

Differential Revision: https://reviews.llvm.org/D113396
2021-11-08 14:12:45 +00:00
skc7
a0633f5ccb [AMDGPU] Test Commit. NFC
Reviewed By: hsmhsm

Differential Revision: https://reviews.llvm.org/D113379
2021-11-08 07:09:09 +00:00
Kazu Hirata
aee86f9b6c [AMDGPU] Remove unused declaration selectSMRD (NFC)
The function body proper was removed on Feb 20, 2019 in commit
79b5c3842b684f873d1ffad502336e973616ea51.
2021-11-07 09:53:18 -08:00
Benjamin Kramer
9b8b16457c Put implementation details into anonymous namespaces. NFCI. 2021-11-07 15:18:30 +01:00
Kazu Hirata
e4bab21848 [AMDGPU] Use MachineBasicBlock::{predecessors,successors} (NFC) 2021-11-06 19:31:20 -07:00
Kazu Hirata
14d656b3d8 [Target] Use llvm::reverse (NFC) 2021-11-06 13:08:21 -07:00
Kazu Hirata
2c4ba3e9d3 [Target] Use make_early_inc_range (NFC) 2021-11-05 09:14:32 -07:00
Jay Foad
c93bf53a3e [AMDGPU] NFC formatting fixes in SIMemoryLegalizer 2021-11-05 09:10:24 +00:00
Thomas Symalla
76cbe62262 [AMDGPU] Changes the AMDGPU_Gfx calling convention by making the SGPRs 4..29 callee-save. This is to avoid superfluous s_movs when executing amdgpu_gfx function calls as the callee is likely not going to change the argument values.
This patch changes the AMDGPU_Gfx calling convention. It defines the SGPR registers s[4:29] as callee-save and leaves some SGPRs usable for callers. The intention is to avoid unneccessary s_mov instructions for arguments the caller would otherwise save and restore in these registers.

Reviewed By: sebastian-ne

Differential Revision: https://reviews.llvm.org/D111637
2021-11-04 21:50:18 +01:00
RamNalamothu
539f500e78 [AMDGPU] Do not add debug locations to the code inside prologue
There is no real source location for code inside prologue as it is
generated by compiler but source locations are being added to code
inside prologue as a side effect of https://reviews.llvm.org/D99269
because buildSpillLoadStore() is using source location of the real
instruction in the basic block if any.

Fixes: SWDEV-307590

Reviewed By: scott.linder, sebastian-ne

Differential Revision: https://reviews.llvm.org/D113100
2021-11-04 08:02:41 +05:30
alex-t
0a3d755ee9 [AMDGPU] Enable divergence-driven BFE selection
Detailed description: This change enables the bit field extract patterns
selection to s_bfe_u32 or v_bfe_u32 dependent on the pattern root node
divergence.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D110950
2021-11-03 23:26:59 +03:00
Kazu Hirata
4bef0304e1 [AArch64, AMDGPU] Use make_early_inc_range (NFC) 2021-11-03 09:22:51 -07:00
Abinav Puthan Purayil
fbe61fb0aa [AMDGPU] Fix SGPR checks in S_MOV_B64_IMM_PSEUDO generation.
The function to generate S_MOV_B64_IMM_PSEUDO was recently modified to
optimize AGPR to AGPR copy but it missed checking for the SGPR
clobbering for the S_MOV_B64_IMM_PSEUDO generation.

Differential Revision: https://reviews.llvm.org/D113005
2021-11-03 09:09:24 +05:30
Jay Foad
be1a8f8834 [AMDGPU] Really preserve LiveVariables in SILowerControlFlow
https://bugs.llvm.org/show_bug.cgi?id=52204

Differential Revision: https://reviews.llvm.org/D112731
2021-11-02 15:03:37 +00:00
Martin Liska
c5029023fb Fix building with GCC 12:
Fixes: https://bugs.llvm.org/show_bug.cgi?id=52380

Differential Revision: https://reviews.llvm.org/D112990
2021-11-02 14:28:00 +01:00
Jay Foad
7afef22926 [AMDGPU] Use MachineInstrBuilder::addReg. NFC. 2021-11-01 15:29:51 +00:00
Jay Foad
2b548b18c1 [AMDGPU] Shrink v_mac_legacy_f32 and v_fmac_legacy_f32
Differential Revision: https://reviews.llvm.org/D112917
2021-11-01 13:55:53 +00:00
Kazu Hirata
72710af233 [CodeGen, Target] Use MachineBasicBlock::terminators (NFC) 2021-10-31 07:57:34 -07:00
Christudasan Devadasan
aa2d3b59ce GlobalISel/Utils: Use incoming regbank while constraining the superclasses
Register operands with superclasses can possibly have multiple regBanks
if they have different register types. The regBank ambiguity resolved
during regbankselect should be used to constrain the operand regclass
instead of obtaining one from the MCInstrDesc.

This is a prerequisite patch for D109300 that introduces allocatable AV_*
Superclasses for AMDGPU by combining both VGPRs and AGPRs and we want to
restrain the regclass to either A or V based on the incoming regbank.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D112323
2021-10-30 07:20:45 -04:00
Stanislav Mekhanoshin
e5340ed30c [AMDGPU] Fix global isel for kernels using agprs on gfx90a
With Global ISel getReservedRegs() is called before function is
regbank selected for the first time. Defer caching of usesAGPRs()
in this case.

Differential Revision: https://reviews.llvm.org/D112644
2021-10-29 14:23:14 -07:00
Jay Foad
21a1d4cf71 [AMDGPU] Change numBitsSigned for simplicity and document it. NFC.
Change numBitsSigned to return the minimum size of a signed integer that
can hold the value. This is different by one from the previous result
but is more consistent with numBitsUnsigned. Update all callers. All
callers are now more consistent between the signed and unsigned cases,
and some callers get simpler, especially the ones that deal with
quantities like numBitsSigned(LHS) + numBitsSigned(RHS).

Differential Revision: https://reviews.llvm.org/D112813
2021-10-29 14:22:06 +01:00
Vang Thao
52b43d1549 [AMDGPU] Fix cvt_f32_ubyte combine with shl
Shift node is still needed to check if the shift is shr or shl to increment/decrement offset. Do not override the node.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D112733
2021-10-28 21:43:06 -07:00
Kazu Hirata
01b4789b62 [AMDGPU] Remove hasDefinedInitializer (NFC)
The last use was removed on Sep 16, 2021 in commit
7a62a5b56d670c4e152159740cd7fc4030a9470f.
2021-10-28 20:33:34 -07:00
Kazu Hirata
dd5d46b009 [AMDGPU] Remove unused BBSelectRegister in AMDGPUMachineCFGStructurizer (NFC)
This field seems to be unused for at least one year.
2021-10-28 20:33:32 -07:00
Kazu Hirata
309357c01a [AMDGPU] Remove unused declaration eliminateDeadBranchOperands (NFC) 2021-10-28 20:33:30 -07:00
Abinav Puthan Purayil
2da6ef3664 [AMDGPU] Add 24-bit mulhi intrinsics in INTRINSIC_WO_CHAIN combine.
mul24 intrinsic's operands are simplified by
AMDGPUTargetLowering::performIntrinsicWOChainCombine(). This change adds
the mul24hi intrinsics in the combine since its operands can be
simplified like that of the mul24 intrinsics.

Differential Revision: https://reviews.llvm.org/D112702
2021-10-28 16:57:48 +05:30
Sebastian Neubauer
fd1cfc9094 [AMDGPU][GlobalISel] Fix waterfall loops
- Move the `s_and exec` to its correct position before the content of
  the waterfall loop
- Use the SI_WATERFALL pseudo instruction, like for sdag, to benefit
  from optimizations
- Add support for indirect function calls

To support indirect calls, add a G_SI_CALL instruction without register
class restrictions and insert a waterfall loop when applying register
banks.

Differential Revision: https://reviews.llvm.org/D109052
2021-10-28 10:30:55 +02:00
Kazu Hirata
cee3419d65 [AMDGPU] Remove unused declaration findNumUsedRegistersSI (NFC) 2021-10-27 21:24:02 -07:00
Michael Liao
e6a4ba3aa6 [amdgpu] Handle the case where there is no scavenged register.
- When an unconditional branch is expanded into an indirect branch, if
  there is no scavenged register, an SGPR pair needs spilling to enable
  the destination PC calculation. In addition, before jumping into the
  destination, that clobbered SGPR pair need restoring.
- As SGPR cannot be spilled to or restored from memory directly, the
  spilling/restoring of that SGPR pair reuses the regular SGPR spilling
  support but without spilling it into memory. As that spilling and
  restoring points are fully controlled, we only need to spill that SGPR
  into the temporary VGPR, which needs spilling into its emergency slot.
- The target-specific hook is revised to take additional restore block,
  where the restoring code is filled. After that, the relaxation will
  place that restore block directly before the destination block and
  insert an unconditional branch in any fall-through block into the
  destination block.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D106449
2021-10-27 18:37:27 -04:00
Austin Kerbow
02e60f2e77 [AMDGPU] Use max waves for scheduler's initial occupancy target
The scheduler should set critical/excess register usage thresholds that
are guided by the maximum possible occupancy for the function. This
change is focused on setting proper lower bounds on register usage which
we would typically only see when a specific number of maximum waves is
requested with the "waves-per-eu" attribute, or by setting
"amdgpu-num-vgpr|sgpr" directly. This was broken previously. I have a
follow-on patch that will address issues with the scheduler not
targeting correct upper bounds on register usage which is typical with
launch bounds and min "waves-per-eu".

Changes by this patch:

Set the initial critical register usage thresholds to minimum values
that are determined by the maximum possible occupancy for the function,
or the number of allocatable registers, whichever is lower.

Avoid unisgned overflow if register limits are lower than the register
tracking "ErrorMargin", I.e. when using stress-regalloc=2.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D112373
2021-10-26 15:30:26 -07:00
Neubauer, Sebastian
eb16570ab0 [AMDGPU] Remove unused CSR defs
CSR_AMDGPU_VGPRs_24_255 and CSR_AMDGPU_VGPRs_32_255 are not used
anywhere, so remove them.

Differential Revision: https://reviews.llvm.org/D112535
2021-10-26 16:01:49 +02:00
Abinav Puthan Purayil
61e3b9fefe [AMDGPU] Add constrained shift pattern matches.
The motivation for this is due to clang's conformance to
https://www.khronos.org/registry/OpenCL/specs/3.0-unified/html/OpenCL_C.html#operators-shift
which makes clang emit (<shift> a, (and b, <width> - 1)) for `a <shift> b`
in OpenCL where a is an int of bit width <width>.

Differential revision: https://reviews.llvm.org/D110231
2021-10-26 19:07:19 +05:30
Abinav Puthan Purayil
781dd39b7b [AMDGPU] Enable 48-bit mul in AMDGPUCodeGenPrepare.
We were bailing out of creating 24-bit muls for results wider than 32
bits in AMDGPUCodeGenPrepare. With the 24-bit mulhi intrinsic, this
change teaches AMDGPUCodeGenPrepare to generate the 48-bit mul
correctly.

Differential Revision: https://reviews.llvm.org/D112395
2021-10-26 18:53:07 +05:30
Abinav Puthan Purayil
9bd5cfeb1f [AMDGPU] Implement llvm.amdgcn.mulhi.[i,u]24 intrinsics.
These intrinsics maps to the 24-bit v_mul_hi instructions.

This change also fixes an incorrect assumption on the associativity of
24-bit mulhi in its SDNode record in tblgen.

Differential Revision: https://reviews.llvm.org/D112394
2021-10-26 18:53:07 +05:30
Neubauer, Sebastian
487f15603e [AMDGPU] Fix setcc combine for i128
The combine asserted if constants could not be represented as uint64_t.
Use APInts to fix this.

Differential Revision: https://reviews.llvm.org/D112416
2021-10-26 13:39:50 +02:00