6209 Commits

Author SHA1 Message Date
Arthur Eubanks
44a3241f10 [NFC] Replace some attribute methods that use confusing indexes 2021-08-19 14:10:26 -07:00
Stanislav Mekhanoshin
8d7d89b081 [AMDGPU] Add alias.scope metadata to lowered LDS struct
Alias analysis is unable to disambiguate accesses to the structure
fields without it, unlike distinct variables. As a result we cannot
combine ds_read and ds_write operations when there is any store in
between, because that store is always considered clobbering.

Differential Revision: https://reviews.llvm.org/D108315
2021-08-19 11:40:30 -07:00
Joe Nash
9dbc968ed9 [AMDGPU] Fix atomic float max/min intrinsics
Hooked up raw.buffer.atomic.fmin/max.f64
This instruction should be available on GFX6, GFX7, and GFX10.
It was implemented for GFX90a with a different name.

Added intrinsic def for image_atomic_fmin/fmax; the instruction
defs were already there.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D108208

Change-Id: I473f98d28b2afbeeb2c27822d9686b5e86634e2f
2021-08-18 14:12:42 -04:00
Christudasan Devadasan
4f5ba46e16 [AMDGPU] Set wait state for meta instructions to zero
It looked more reasonable to set the wait state to
zero for all non-instructions. With that we can avoid
the special handling for them in `getWaitStatesSince`
and `AdvanceCycle`. This NFC patch makes the handling
more generic.
2021-08-18 01:46:59 -04:00
Arthur Eubanks
3f4d00bc3b [NFC] More get/removeAttribute() cleanup 2021-08-17 21:05:41 -07:00
Arthur Eubanks
de0ae9e89e [NFC] Cleanup more AttributeList::addAttribute() 2021-08-17 21:05:41 -07:00
Arthur Eubanks
ad727ab7d9 [NFC] Migrate some callers away from Function/AttributeLists methods that take an index
These methods can be confusing.
2021-08-17 21:05:40 -07:00
Qiu Chaofan
5ca250a03d [RegAlloc] Remove addAllocPriorityToGlobalRanges hook
It was introduced in 1a6dc92 and only enabled on PowerPC/AMDGPU. That
should be enabled for all targets.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D108010
2021-08-18 10:21:27 +08:00
Simon Pilgrim
fb81271e8b [AMDGPU] Fix lowering of AMDGPU::G_CTTZ_ZERO_UNDEF to AMDGPU::G_AMDGPU_FFBL_B32
As mentioned on D107474, there was a copy+paste typo repeating G_CTLZ_ZERO_UNDEF that Coverity reported as dead code.

Differential Revision: https://reviews.llvm.org/D108210
2021-08-17 18:09:57 +01:00
Sebastian Neubauer
fbae34635d [GlobalISel] Add combine for PTR_ADD with regbanks
Combine two G_PTR_ADDs, but keep the register bank of the constant.
That way, the combine can be used in post-regbank-select combines.

Introduce two helper methods in CombinerHelper, getRegBank and
setRegBank that get and set an optional register bank to a register.
That way, they can be used before and after register bank selection.

Differential Revision: https://reviews.llvm.org/D103326
2021-08-17 13:58:16 +02:00
David Stuttard
ebdb0d09a4 AMDGPU: During img instruction ret value construction cater for non int values
Make sure return type is int type.

Differential Revision: https://reviews.llvm.org/D108131

Change-Id: Ic02f07d1234cd51b6ed78c3fecd2cb1d6acd5644
2021-08-17 09:08:24 +01:00
Christudasan Devadasan
686607676f [AMDGPU] Skip pseudo MIs in hazard recognizer
Instructions like WAVE_BARRIER and SI_MASKED_UNREACHABLE
are only placeholders to prevent certain unwanted
transformations and will get discarded during assembly
emission. They should not be counted during nop insertion.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D108022
2021-08-16 23:11:14 -04:00
Carl Ritson
99c790dc21 [AMDGPU] Make BVH isel consistent with other MIMG opcodes
Suffix opcodes with _gfx10.
Remove direct references to architecture specific opcodes.
Add a BVH flag and apply this to disassembly.
Fix a number of disassembly errors on gfx90a target caused by
previous incorrect BVH detection code.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D108117
2021-08-17 10:42:22 +09:00
Matt Arsenault
a77ae4aa6a AMDGPU: Stop attributor adding attributes to intrinsic declarations 2021-08-13 20:51:48 -04:00
Matt Arsenault
5beb9a0e6a AMDGPU: Respect compute ABI attributes with unknown OS
Unfortunately Mesa is still using amdgcn-- as the triple for OpenGL,
so we still have the awkward unknown OS case to deal with. Previously
if the HSA ABI intrinsics appeared, we would not add the ABI
registers to the function. We would emit an error later, but we still
need to produce some compile result. Start adding the registers to any
compute function, regardless of the OS. This keeps the internal state
more consistent, and will help avoid numerous test crashes in a future
patch which starts assuming the ABI inputs are present on functions by
default.
2021-08-13 20:44:46 -04:00
Arthur Eubanks
92ce6db9ee [NFC] Rename AttributeList::hasFnAttribute() -> hasFnAttr()
This is more consistent with similar methods.
2021-08-13 11:09:18 -07:00
Arthur Eubanks
a0c42ca56c [NFC] Remove AttributeList::hasParamAttribute()
It's the same as AttributeList::hasParamAttr().
2021-08-13 10:58:21 -07:00
Matt Arsenault
d719f1c3cc AMDGPU: Add alloc priority to global ranges
The requested register class priorities weren't respected
globally. Not sure why this is a target option, and not just the
expected behavior (recently added in
1a6dc92be7d68611077f0fb0b723b361817c950c). This avoids an allocation
failure when many wide tuple spills are introduced. I think this is a
workaround since I would not expect the allocation priority to be
required, and only a performance hint. The allocator should be smarter
about when only a subregister needs to be spilled and restored.

This does regress a couple of degenerate store stress lit tests which
shouldn't be too important.
2021-08-10 13:12:34 -04:00
Tony Tye
53eb469195 [AMDGPU] Support non-strictly stronger memory orderings in SIMemoryLegalizer
C++20 no longer requires the failure memory ordering to be no stronger than the
success memory ordering. Adjust assert in AMD GPU SIMemoryLegalizer, and merge
instruction memory orderings.

Add common operation to merge memory orders that allows non strict memory
orderings to be combined. Use it in SIMemoryLegalizer and
MachineMemOperand::getMergedOrdering.

Reviewed By: efriedma, rampitec

Differential Revision: https://reviews.llvm.org/D106729
2021-08-10 08:43:03 +00:00
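The merge operation described in 53eb469195 can be pictured with a small sketch. It combines two orderings without assuming that one is at least as strong as the other. The Python names below are hypothetical, and the acquire/release special case is an assumption about how such a merge would behave, not a copy of the LLVM code.

```python
from enum import IntEnum

class Ordering(IntEnum):
    # Ranked from weakest to strongest. ACQUIRE and RELEASE get distinct
    # ranks only so they can live in one enum; they are treated as
    # incomparable in merge_orderings below.
    MONOTONIC = 0
    ACQUIRE = 1
    RELEASE = 2
    ACQ_REL = 3
    SEQ_CST = 4

def merge_orderings(a: Ordering, b: Ordering) -> Ordering:
    """Return an ordering that is at least as strong as both inputs."""
    # Neither acquire nor release implies the other, so the merge of the
    # two has to be the ordering that provides both guarantees.
    if {a, b} == {Ordering.ACQUIRE, Ordering.RELEASE}:
        return Ordering.ACQ_REL
    return max(a, b)
```

With this shape, a compare-exchange whose failure ordering is stronger than its success ordering still yields a single well-defined merged ordering.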
Michael Liao
05783e1cfe [amdgpu] Revise the conversion from i64 to f32.
- Replace 'cmp+sel' with 'umin' if possible.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D107507
2021-08-06 17:01:47 -04:00
Amara Emerson
2b067e3335 Change TargetLowering::canMergeStoresTo() to take a MF instead of DAG.
DAG is unnecessary and we need this hook to implement store merging on GlobalISel too.
2021-08-06 12:57:53 -07:00
Reshabh Sharma
5173854f19 [AMDGPU] Handle functions in llvm's global ctors and dtors list
This patch introduces a new code object metadata field, ".kind"
which is used to add support for init and fini kernels.

HSAStreamer will use function attributes, "device-init" and
"device-fini" to distinguish between init and fini kernels from
the regular kernels and will emit metadata with ".kind" set to
"init" and "fini" respectively.

To reduce the number of init and fini kernels, the ctors and
dtors present in the llvm's global.ctors and global.dtors lists
are called from a single init and fini kernel respectively.

Reviewed by: yaxunl

Differential Revision: https://reviews.llvm.org/D105682
2021-08-06 15:53:33 +05:30
Jay Foad
83610d4eb0 [AMDGPU][GlobalISel] Better legalization of 32-bit ctlz/cttz
Differential Revision: https://reviews.llvm.org/D107474
2021-08-06 09:40:48 +01:00
Jay Foad
24b67a9024 [AMDGPU][GlobalISel] Improve regbankselect for 64-bit VGPR ctlz_zero_undef/cttz_zero_undef
We can improve on the generic splitting by using ffbh/ffbl, which have a
defined result when the input is zero.

Differential Revision: https://reviews.llvm.org/D107442
2021-08-06 09:40:48 +01:00
Jay Foad
d77b43c385 [AMDGPU][GlobalISel] Add G_AMDGPU_FFBL_B32
This is the counterpart to G_AMDGPU_FFBH_U32 which already exists. These
instructions have a defined result of -1 when the input is zero.

Differential Revision: https://reviews.llvm.org/D107441
2021-08-06 09:40:48 +01:00
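To illustrate why a defined result of -1 for zero input is useful (d77b43c385), here is a minimal Python model of the two operations. The function names are hypothetical stand-ins for the instructions, and clamping with an unsigned minimum is shown as one plausible way to obtain full ctlz/cttz semantics, not as the exact lowering in these patches.

```python
MASK32 = 0xFFFFFFFF

def ffbl_b32(x: int) -> int:
    """Index of the lowest set bit; 0xFFFFFFFF (-1) when the input is zero."""
    x &= MASK32
    if x == 0:
        return MASK32
    return (x & -x).bit_length() - 1

def ffbh_u32(x: int) -> int:
    """Leading-zero count from bit 31; 0xFFFFFFFF (-1) when the input is zero."""
    x &= MASK32
    if x == 0:
        return MASK32
    return 32 - x.bit_length()

def cttz32(x: int) -> int:
    # -1 is the largest unsigned value, so an unsigned min with 32
    # turns it into the result cttz requires for a zero input.
    return min(ffbl_b32(x), 32)

def ctlz32(x: int) -> int:
    return min(ffbh_u32(x), 32)
```

Because the zero case is defined, no compare and select on the input is needed in this model.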
Stanislav Mekhanoshin
d71924fbfe [AMDGPU] Improve v2i32/v2f32 insertelt patterns
Using REG_SEQUENCE produces better code than INSERT_SUBREG,
we can omit one move instruction in many cases.

Fixes: SWDEV-298028

Differential Revision: https://reviews.llvm.org/D107602
2021-08-05 16:13:39 -07:00
Jay Foad
2b63933115 [AMDGPU][SDag] Better lowering for 32-bit ctlz/cttz
Differential Revision: https://reviews.llvm.org/D107566
2021-08-05 15:57:40 +01:00
Jay Foad
e6c364a624 [AMDGPU][SDag] Better lowering for 64-bit ctlz/cttz
Differential Revision: https://reviews.llvm.org/D107546
2021-08-05 15:57:40 +01:00
Jay Foad
e790b2b744 [AMDGPU] Make more use of getHiHalf64 and split64BitValue. NFCI. 2021-08-05 09:36:13 +01:00
Michael Liao
5edc886e90 [amdgpu] Add an enhanced conversion from i64 to f32.
Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D107187
2021-08-04 15:33:12 -04:00
Reshabh Sharma
dce35ef104 Revert "[AMDGPU] Handle functions in llvm's global ctors and dtors list"
This reverts commit d42e70b3d315645e37f3b1455d39e68678e69525.
2021-08-04 23:33:31 +05:30
Reshabh Sharma
d42e70b3d3 [AMDGPU] Handle functions in llvm's global ctors and dtors list
This patch introduces a new code object metadata field, ".kind"
which is used to add support for init and fini kernels.

HSAStreamer will use function attributes, "device-init" and
"device-fini" to distinguish between init and fini kernels from
the regular kernels and will emit metadata with ".kind" set to
"init" and "fini" respectively.

To reduce the number of init and fini kernels, the ctors and
dtors present in the llvm's global.ctors and global.dtors lists
are called from a single init and fini kernel respectively.

Reviewed by: yaxunl

Differential Revision: https://reviews.llvm.org/D105682
2021-08-04 19:53:33 +05:30
hsmahesha
596e61c332 [AMDGPU] Ignore call graph node which does not have function info.
While collecting reachable callees (from kernels), ignore any call graph node
that has no associated function or whose associated function is not a definition.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D107329
2021-08-04 10:22:33 +05:30
Jay Foad
40202b13b2 [AMDGPU] Legalize operands of V_ADDC_U32_e32 and friends
These instructions have an implicit use of vcc which counts towards the
constant bus limit. Pre gfx10 this means that the explicit operands
cannot be sgprs. Use the custom inserter hook to call legalizeOperands
to enforce that restriction.

Fixes https://bugs.llvm.org/show_bug.cgi?id=51217

Differential Revision: https://reviews.llvm.org/D106868
2021-08-03 09:04:52 +01:00
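The restriction in 40202b13b2 can be sketched as a toy check. Operands are modelled as strings such as "s0" or "v1", the helper name is hypothetical, and the bus limits passed in (1 before gfx10, 2 on gfx10) are assumptions used for illustration only.

```python
def needs_legalization(operands, reads_vcc, bus_limit):
    """Return True when the instruction would exceed the constant bus limit.

    operands:  explicit source operands, e.g. ["s0", "v1"]
    reads_vcc: True when the instruction has an implicit use of vcc
    bus_limit: number of constant bus reads the target allows
    """
    # Each distinct SGPR occupies one constant bus slot.
    sgprs = {op for op in operands if op.startswith("s")}
    # The implicit vcc read also occupies a slot.
    uses = len(sgprs) + (1 if reads_vcc else 0)
    return uses > bus_limit
```

With a limit of 1, the implicit vcc use alone fills the bus, so any explicit SGPR operand has to be copied to a VGPR first.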
Roman Lebedev
6f6e9a867f [BasicTTIImpl][LoopUnroll] getUnrollingPreferences(): emit ORE remark when advising against unrolling due to a call in a loop
I'm not sure this is the best way to approach this,
but the situation is hard to detect unless we explicitly call it out when refusing to advise to unroll.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D107271
2021-08-03 00:57:26 +03:00
Carl Ritson
675c942373 [AMDGPU] Disable NSA for BVH instructions when appropriate
Check maximum NSA size when selecting NSA or non-NSA BVH instructions.

Differential Revision: https://reviews.llvm.org/D103230
2021-08-02 20:09:26 +09:00
Carl Ritson
a441de6d94 [AMDGPU][GlobalISel] Add missing default mapping for BVH intrinsics
Application of default mapping to BVH intrinsics was missing.
Copy parts of SelectionDAG test to GlobalISel test as these would
have indicated this error.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D107211
2021-08-02 12:43:38 +09:00
Matt Arsenault
faccf427df AMDGPU/GlobalISel: Remove special case lowering for non-pow-2 stores
We end up with extra copies from buildAnyExtOrTrunc if these are
lowered after the register types are legalized.
2021-07-30 12:37:29 -04:00
Tarindu Jayatilaka
7a797b2902 Take OptimizationLevel class out of Pass Builder
Pulled out the OptimizationLevel class from PassBuilder in order to be able to access it from within the PassManager and avoid include conflicts.

Reviewed By: mtrofin

Differential Revision: https://reviews.llvm.org/D107025
2021-07-29 21:57:23 -07:00
Mirko Brkusanin
971f4173f8 [AMDGPU][GlobalISel] Insert an and with exec before s_cbranch_vccnz if necessary
While v_cmp will AND inactive lanes with 0, that is not the case for logical
operations.

This fixes a Vulkan CTS test that would hang otherwise.

Differential Revision: https://reviews.llvm.org/D105709
2021-07-29 11:20:49 +02:00
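The problem fixed in 971f4173f8 can be shown with a toy 4-lane model. All names here are hypothetical, and lane masks are plain integers; this is an illustration of the reasoning, not of the generated code.

```python
WAVE_MASK = 0xF  # toy wave with 4 lanes

def lane_not(mask: int) -> int:
    """Logical NOT over the whole lane mask, including inactive lanes."""
    return ~mask & WAVE_MASK

def vccnz(vcc: int) -> bool:
    """Condition tested by s_cbranch_vccnz: any bit of vcc is set."""
    return (vcc & WAVE_MASK) != 0

exec_mask = 0b0011          # only lanes 0 and 1 are active
cmp_result = 0b0011         # v_cmp: true in both active lanes, 0 elsewhere
vcc = lane_not(cmp_result)  # 0b1100: false in active lanes, set in inactive ones

branch_without_and = vccnz(vcc)             # taken because of inactive lanes
branch_with_and = vccnz(vcc & exec_mask)    # correct: no active lane is true
```

Here the unmasked branch is taken even though the condition is false in every active lane, which is the kind of behaviour that can turn into a hang.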
Patrick Holland
dbed061bf1 [MCA] Moving the target specific CustomBehaviour impl. from /tools/llvm-mca/ to /lib/Target/.
Differential Revision: https://reviews.llvm.org/D106775
2021-07-28 11:23:18 -07:00
RamNalamothu
1a8c57179a [AMDGPU] We would need FP if there is call and caller save VGPR spills
Since https://reviews.llvm.org/D98319, determineCalleeSavesSGPR() needs
to consider caller save VGPR spills as well while anticipating if we
require FP.

Fixes: SWDEV-295978

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D106758
2021-07-28 11:12:55 +05:30
Matt Arsenault
d7d2e4545e AMDGPU/GlobalISel: Fix selecting G_SEXTLOAD/G_ZEXTLOAD pre-gfx9
The patterns for the m0 glue patterns were failing to import.
2021-07-27 15:56:42 -04:00
Matt Arsenault
b32d3d9e81 AMDGPU: Treat IMPLICIT_DEF like a constant lanemask source
This is partially a workaround. SILowerI1Copies does not understand
unstructured loops. This would result in inserting instructions to
merge a mask register in the same block where it was defined in an
unstructured loop.
2021-07-27 11:44:38 -04:00
Carl Ritson
fbaa35e169 [AMDGPU] Add SelectionDAG support for insert_subvector on v4f64
Enable custom insert_subvector for larger vector types.
This is necessary now that SelectionDAG can attempt v3f64 insert
to v4f64, etc.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D105385
2021-07-27 10:11:34 +09:00
Michael Liao
b0402a35fc [amdgpu] Add 64-bit PC support when expanding unconditional branches.
Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D106445
2021-07-26 14:50:30 -04:00
Jay Foad
59f6865231 [AMDGPU][GISel] Fix MMO for raw/struct buffer access with non-constant offset
Codegen for the raw/struct buffer access intrinsics would update the
offset in the MMO to reflect the combined offset, if it was known to be
constant. If the combined offset was not known to be constant, or if
there was an index, it would set the offset in the MMO to 0. This is
unsafe because it makes it look like the access does not alias with
another access with a fixed non-zero offset.

Fix these cases by setting the pointer in the MMO to null, to reflect
the fact that we do not have any known IR value pointer + constant
offset for the access.

D106284 did this for SelectionDAG. This is the corresponding fix for
GlobalISel.

Differential Revision: https://reviews.llvm.org/D106451
2021-07-26 14:27:30 +01:00
Jay Foad
9ac10658ae [AMDGPU] Fix MMO for raw/struct buffer access with non-constant offset
Codegen for the raw/struct buffer access intrinsics would update the
offset in the MMO to reflect the combined offset, if it was known to be
constant. If the combined offset was not known to be constant, or if
there was an index, it would set the offset in the MMO to 0. This is
unsafe because it makes it look like the access does not alias with
another access with a fixed non-zero offset.

Fix these cases by setting the pointer in the MMO to null, to reflect
the fact that we do not have any known IR value pointer + constant
offset for the access.

Differential Revision: https://reviews.llvm.org/D106284
2021-07-26 14:27:30 +01:00
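A toy alias check shows why recording offset 0 for an unknown offset is unsafe (59f6865231 and 9ac10658ae). The MemOperand class and may_alias function are hypothetical simplifications and do not mirror the LLVM data structures.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class MemOperand:
    base: Optional[str]  # known IR pointer value, or None when unknown
    offset: int          # constant byte offset from base
    size: int            # access size in bytes

def may_alias(a: MemOperand, b: MemOperand) -> bool:
    # Without a known base nothing can be proven, so stay conservative.
    if a.base is None or b.base is None:
        return True
    # Toy model: different bases are not analysed further.
    if a.base != b.base:
        return True
    # Same base: the accesses alias only if their byte ranges overlap.
    return a.offset < b.offset + b.size and b.offset < a.offset + a.size

fixed = MemOperand("buf", 16, 4)
# Access whose real offset is only known at run time (it could be 16).
old_style = MemOperand("buf", 0, 4)   # offset wrongly recorded as 0
new_style = MemOperand(None, 0, 4)    # pointer set to null instead

wrongly_disjoint = not may_alias(old_style, fixed)
conservative = may_alias(new_style, fixed)
```

In this model the old encoding lets the two accesses be reordered, while the null pointer forces the conservative answer.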
David Sherwood
0aff1798b5 [Analysis] Add simple cost model for strict (in-order) reductions
I have added a new FastMathFlags parameter to getArithmeticReductionCost
to indicate what type of reduction we are performing:

  1. Tree-wise. This is the typical fast-math reduction that involves
  continually splitting a vector up into halves and adding each
  half together until we get a scalar result. This is the default
  behaviour for integers, whereas for floating point we only do this
  if reassociation is allowed.
  2. Ordered. This now allows us to estimate the cost of performing
  a strict vector reduction by treating it as a series of scalar
  operations in lane order. This is the case when FP reassociation
  is not permitted. For scalable vectors this is more difficult
  because at compile time we do not know how many lanes there are,
  and so we use the worst case maximum vscale value.

I have also fixed getTypeBasedIntrinsicInstrCost to pass in the
FastMathFlags, which meant fixing up some X86 tests where we always
assumed the vector.reduce.fadd/mul intrinsics were 'fast'.

New tests have been added here:

  Analysis/CostModel/AArch64/reduce-fadd.ll
  Analysis/CostModel/AArch64/sve-intrinsics.ll
  Transforms/LoopVectorize/AArch64/strict-fadd-cost.ll
  Transforms/LoopVectorize/AArch64/sve-strict-fadd-cost.ll

Differential Revision: https://reviews.llvm.org/D105432
2021-07-26 10:26:06 +01:00
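The two reduction kinds described in 0aff1798b5 can be compared with a deliberately simple cost sketch. The function names and the unit costs are hypothetical; real cost models query target-specific costs, and the lane count is assumed to be a power of two.

```python
def tree_reduction_cost(num_lanes: int, shuffle_cost: int = 1, op_cost: int = 1) -> int:
    """Tree-wise: halve the vector log2(num_lanes) times."""
    levels = num_lanes.bit_length() - 1  # log2 for a power of two
    # Each level needs one shuffle to line up the halves and one vector op.
    return levels * (shuffle_cost + op_cost)

def ordered_reduction_cost(num_lanes: int, extract_cost: int = 1, op_cost: int = 1) -> int:
    """Ordered: one extract and one scalar op per lane, in lane order."""
    return num_lanes * (extract_cost + op_cost)

def scalable_ordered_cost(min_lanes: int, max_vscale: int) -> int:
    """Scalable vectors: assume the worst-case lane count min_lanes * max_vscale."""
    return ordered_reduction_cost(min_lanes * max_vscale)
```

The tree-wise cost grows logarithmically with the lane count and the ordered cost grows linearly, which is why the distinction matters to a vectorizer deciding whether a strict reduction is worth it.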
Alexander Belyaev
edb05d555e [llvm] Inline getAssociatedFunction() in LLVM_DEBUG.
Function* F is used only inside LLVM_DEBUG, so it causes an unused
variable warning.
2021-07-24 11:49:21 +02:00