35 Commits

Author SHA1 Message Date
Juan Manuel MARTINEZ CAAMAÑO
dd1df099ae [InlineCost][TargetTransformInfo][AMDGPU] Consider cost of alloca instructions in the caller (2/2)
Before this patch, the compiler gave a bump to the inline-threshold
when the total size of the allocas passed as arguments to the
callee was below 256 bytes.
This heuristic ignores that some of these allocas could have be removed
by SROA if inlining was applied.

Ideally, this bonus would be attributed to the threshold once the
size of all the allocas that could not be handled by SROA is known:
at the end of the InlineCost analysis.
However, we may never reach this point if the inline-cost analysis exits
early when the inline cost goes over the threshold mid-analysis.

This patch proposes:
* Attribute the bonus in the inline-threshold when allocas are passed
  as arguments (regardless of their total size).
* Assigns a cost to each alloca proportional to its size,
  such that the cost of all the allocas cancels the bonus.

Potential problems:
* This patch assumes that removing alloca instructions with SROA is
  always profitable. This may not be the case if the total size of the
  allocas is still too big to be promoted to registers/LDS.
* Redundant calls to getTotalAllocaSize
* Awkwardly, the threshold attributed contributes to the single-bb and
  vector bonus.

Reviewed By: scchan

Differential Revision: https://reviews.llvm.org/D149741
2023-06-29 09:49:16 +02:00
Matt Arsenault
19293b82c1 Inline: Fix case of not inlining with denormal-fp-math-f32
This was failing to inline the opencl libraries with daz enabled. As a
modifier to the base mode, denormal-fp-mode-f32 is weird and has no
meaning if it's missing.
2023-06-09 19:09:48 -04:00
Matt Arsenault
d0b9cb1f65 AMDGPU: Add inlining testcases for denormal-fp-math
Somehow missed this one and it's not working correctly
2023-06-09 19:09:48 -04:00
Tobias Hieta
f84bac329b
[NFC][Py Reformat] Reformat lit.local.cfg python files in llvm
This is a follow-up to b71edfaa4ec3c998aadb35255ce2f60bba2940b0
since I forgot the lit.local.cfg files in that one.

Reformatting is done with `black`.

If you end up having problems merging this commit because you
have made changes to a python file, the best way to handle that
is to run git checkout --ours <yourfile> and then reformat it
with black.

If you run into any problems, post to discourse about it and
we will try to help.

RFC Thread below:

https://discourse.llvm.org/t/rfc-document-and-standardize-python-code-style

Reviewed By: barannikov88, kwk

Differential Revision: https://reviews.llvm.org/D150762
2023-05-17 17:03:15 +02:00
Krzysztof Drewniak
f0415f2a45 Re-land "[AMDGPU] Define data layout entries for buffers""
Re-land D145441 with data layout upgrade code fixed to not break OpenMP.

This reverts commit 3f2fbe92d0f40bcb46db7636db9ec3f7e7899b27.

Differential Revision: https://reviews.llvm.org/D149776
2023-05-03 19:43:56 +00:00
Krzysztof Drewniak
3f2fbe92d0 Revert "[AMDGPU] Define data layout entries for buffers"
This reverts commit f9c1ede2543b37fabe9f2d8f8fed5073c475d850.

Differential Revision: https://reviews.llvm.org/D149758
2023-05-03 16:11:00 +00:00
Krzysztof Drewniak
f9c1ede254 [AMDGPU] Define data layout entries for buffers
Per discussion at
https://discourse.llvm.org/t/representing-buffer-descriptors-in-the-amdgpu-target-call-for-suggestions/68798,
we define two new address spaces for AMDGCN targets.

The first is address space 7, a non-integral address space (which was
already in the data layout) that has 160-bit pointers (which are
256-bit aligned) and uses a 32-bit offset. These pointers combine a
128-bit buffer descriptor and a 32-bit offset, and will be usable with
normal LLVM operations (load, store, GEP). However, they will be
rewritten out of existence before code generation.

The second of these is address space 8, the address space for "buffer
resources". These will be used to represent the resource arguments to
buffer instructions, and new buffer intrinsics will be defined that
take them instead of <4 x i32> as resource arguments. ptr
addrspace(8). These pointers are 128-bits long (with the same
alignment). They must not be used as the arguments to getelementptr or
otherwise used in address computations, since they can have
arbitrarily complex inherent addressing semantics that can't be
represented in LLVM. Even though, like their address space 7 cousins,
these pointers have deterministic ptrtoint/inttoptr semantics, they
are defined to be non-integral in order to prevent optimizations that
rely on pointers being a [0, [addr_max]] value from applying to them.

Future work includes:
- Defining new buffer intrinsics that take ptr addrspace(8) resources.
- A late rewrite to turn address space 7 operations into buffer
intrinsics and offset computations.

This commit also updates the "fallback address space" for buffer
intrinsics to the buffer resource, and updates the alias analysis
table.

Depends on D143437

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D145441
2023-05-03 15:25:58 +00:00
Matt Arsenault
bc37be1855 LangRef: Add "dynamic" option to "denormal-fp-math"
This is stricter than the default "ieee", and should probably be the
default. This patch leaves the default alone. I can change this in a
future patch.

There are non-reversible transforms I would like to perform which are
legal under IEEE denormal handling, but illegal with flushing zero
behavior. Namely, conversions between llvm.is.fpclass and fcmp with
zeroes.

Under "ieee" handling, it is legal to translate between
llvm.is.fpclass(x, fcZero) and fcmp x, 0.

Under "preserve-sign" handling, it is legal to translate between
llvm.is.fpclass(x, fcSubnormal|fcZero) and fcmp x, 0.

I would like to compile and distribute some math library functions in
a mode where it's callable from code with and without denormals
enabled, which requires not changing the compares with denormals or
zeroes.

If an IEEE function transforms an llvm.is.fpclass call into an fcmp 0,
it is no longer possible to call the function from code with denormals
enabled, or write an optimization to move the function into a denormal
flushing mode. For the original function, if x was a denormal, the
class would evaluate to false. If the function compiled with denormal
handling was converted to or called from a preserve-sign function, the
fcmp now evaluates to true.

This could also be of use for strictfp handling, where code may be
changing the denormal mode.

Alternative name could be "unknown".

Replaces the old AMDGPU custom inlining logic with more conservative
logic which tries to permit inlining for callees with dynamic handling
and avoids inlining other mismatched modes.
2023-04-29 08:44:59 -04:00
Janek van Oirschot
e3515ba381 Reapply "[AMDGPU] Modify adjustInliningThreshold to also consider the cost of passing function arguments through the stack"
Reapplies 142c28ffa1323e9a8d53200a22c80d5d778e0d0f as part of D140242 which got reverted due to amdgpu openmp test failures.

This diff fixes said failures by eliding most of `adjustInliningThresholdUsingCallee` for indirect calls as the callee function is unavailable for indirect calls.

Reviewed By: arsenm, #amdgpu

Differential Revision: https://reviews.llvm.org/D143498
2023-02-13 12:17:43 +00:00
Janek van Oirschot
1beba44526 Revert "[AMDGPU] Modify adjustInliningThreshold to also consider the cost of passing function arguments through the stack"
This reverts commit 142c28ffa1323e9a8d53200a22c80d5d778e0d0f.
2023-02-03 19:13:57 +00:00
Janek van Oirschot
142c28ffa1 [AMDGPU] Modify adjustInliningThreshold to also consider the cost of passing function arguments through the stack
A regression from when new PM got enabled as default. Functions with a big number of instructions will elide getting inlined but do not consider the cost of passing arguments over stack if there are a lot of function arguments. This patch attempts to add a heuristic for AMDGPU's function calling convention that also considers function arguments passed through the stack.

Reviewed By: #amdgpu, arsenm

Differential Revision: https://reviews.llvm.org/D140242
2023-02-03 15:31:24 +00:00
Bjorn Pettersson
3528e63d89 [test] Remove duplicate RUN lines in Transform tests 2022-12-08 11:47:16 +01:00
Nikita Popov
151602c7a9 [Inline] Convert some tests to opaque pointers (NFC) 2022-12-08 10:05:23 +01:00
Roman Lebedev
e5369823bc
[NFC] Port all Inline tests to -passes= syntax 2022-12-08 03:09:27 +03:00
Johannes Doerfert
f6e3a89cc0 [AMDGPU] Annotate the intrinsics to be default and nocallback
Differential Revision: https://reviews.llvm.org/D135155
2022-12-07 14:25:25 -08:00
Matt Arsenault
5460b45564 AMDGPU: Rename test functions and add some cases for consistency
Test all the permutations.
2022-12-07 15:56:21 -05:00
Matt Arsenault
27712243ab Revert "Inliner: Correctly merge amdgpu-unsafe-fp-atomics attribute"
This reverts commit 169ebf03ab2a6f16bfa32a36305929c7bc8e4784.

This was effectively rendering the attribute useless in the real
world, although this is still broken.
2022-03-03 15:25:32 -05:00
Matt Arsenault
169ebf03ab Inliner: Correctly merge amdgpu-unsafe-fp-atomics attribute
It seems we don't have already have any target specific attributes
handled in the inliner. Include a separate tablegen file to define the
rules, similar to the target specific intrinsics.
2021-12-15 18:37:18 -05:00
Matt Arsenault
946e805665 AMDGPU: Add baseline test for unsafe fp atomics attribute inlining 2021-12-15 18:37:18 -05:00
Stanislav Mekhanoshin
b7b99b0799 [AMDGPU] Fix -amdgpu-inline-arg-alloca-cost
Before D94153 this threshold was in a pre-scaled units.
After D94153 inlining threshold multiplier is not applied
to this portion of the threshold anymore. Restore the
threshold by applying the multiplier.

Differential Revision: https://reviews.llvm.org/D98362
2021-03-12 10:19:50 -08:00
Arthur Eubanks
a11bf9a7fb [AMDGPU][Inliner] Remove amdgpu-inline and add a new TTI inline hook
Having a custom inliner doesn't really fit in with the new PM's
pipeline. It's also extra technical debt.

amdgpu-inline only does a couple of custom things compared to the normal
inliner:
1) It disables inlining if the number of BBs in a function would exceed
   some limit
2) It increases the threshold if there are pointers to private arrays(?)

These can all be handled as TTI inliner hooks.
There already exists a hook for backends to multiply the inlining
threshold.

This way we can remove the custom amdgpu-inline pass.

This caused inline-hint.ll to fail, and after some investigation, it
looks like getInliningThresholdMultiplier() was previously getting
applied twice in amdgpu-inline (https://reviews.llvm.org/D62707 fixed it
not applying at all, so some later inliner change must have fixed
something), so I had to change the threshold in the test.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D94153
2021-01-21 20:29:17 -08:00
Konstantin Zhuravlyov
3fdf3b1539 AMDGPU: Update AMDHSA code object version handling
Differential Revision: https://reviews.llvm.org/D89076
2020-10-14 13:04:27 -04:00
Matt Arsenault
156afb2253 AMDGPU: Fix inlining logic for denormals
This was backwards from intended and missing a test. We perhaps should
just ignored the FP mode here, since it shouldn't be legal to mix code
with different default modes in the absence of strictfp.
2020-04-23 15:30:48 -04:00
Matt Arsenault
5660bb6bc9 AMDGPU: Remove denormal subtarget features
Switch to using the denormal-fp-math/denormal-fp-math-f32 attributes.
2020-04-02 17:17:12 -04:00
Daniil Fukalov
d912a9ba9b [AMDGPU] Tune inlining parameters for AMDGPU target
Summary:
Since the target has no significant advantage of vectorization,
vector instructions bous threshold bonus should be optional.

amdgpu-inline-arg-alloca-cost parameter default value and the target
InliningThresholdMultiplier value tuned then respectively.

Reviewers: arsenm, rampitec

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, eraman, hiraditya, haicheng, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D64642

llvm-svn: 366348
2019-07-17 16:51:29 +00:00
Stanislav Mekhanoshin
fbbe5230f4 [AMDGPU] Use InliningThresholdMultiplier for inline hint
AMDGPU uses multiplier 9 for the inline cost. It is taken into account
everywhere except for inline hint threshold. As a result we are penalizing
functions with the inline hint making them less probable to be inlined
than those without the hint. Defaults are 225 for a normal function and
325 for a function with an inline hint. Currently we have effective
threshold 225 * 9 = 2025 for normal functions and just 325 for those with
the hint. That is fixed by this patch.

Differential Revision: https://reviews.llvm.org/D62707

llvm-svn: 362239
2019-05-31 16:19:26 +00:00
Matt Arsenault
0ff901fba0 AMDGPU: Boost inline threshold with addrspacecasted alloca arguments
This was skipping GetUnderlyingObject for nonprivate addresses, but an
alloca could also be found through an addrspacecast if it's flat.

llvm-svn: 361649
2019-05-24 16:52:35 +00:00
Matt Arsenault
df24c92c0f AMDGPU: Assume xnack is enabled by default
This is the conservatively correct default. It is always safe to
assume xnack is enabled, but not the converse.

Introduce a feature to blacklist targets where xnack can never be
meaningfully enabled. I'm not sure the targets this is applied to is
100% correct.

llvm-svn: 360903
2019-05-16 14:48:34 +00:00
Eric Christopher
cee313d288 Revert "Temporarily Revert "Add basic loop fusion pass.""
The reversion apparently deleted the test/Transforms directory.

Will be re-reverting again.

llvm-svn: 358552
2019-04-17 04:52:47 +00:00
Eric Christopher
a863435128 Temporarily Revert "Add basic loop fusion pass."
As it's causing some bot failures (and per request from kbarton).

This reverts commit r358543/ab70da07286e618016e78247e4a24fcb84077fda.

llvm-svn: 358546
2019-04-17 02:12:23 +00:00
Matt Arsenault
f426ddbfc7 AMDGPU: Assume ECC is enabled by default if supported
The test should really be checking for the property directly in the
code object headers, but there are problems with this. I don't see
this directly represented in the text form, and for the binary
emission this is depending on a function level subtarget feature to
emit a global flag.

llvm-svn: 357558
2019-04-03 01:58:57 +00:00
Matt Arsenault
294e07cf03 AMDGPU: Fix test filename
llvm-svn: 357441
2019-04-02 00:36:04 +00:00
Matt Arsenault
055e4dce45 AMDGPU: Remove dx10-clamp from subtarget features
Since this can be set with s_setreg*, it should not be a subtarget
property. Set a default based on the calling convention, and Introduce
a new amdgpu-dx10-clamp attribute to override this if desired.

Also introduce a new amdgpu-ieee attribute to match.

The values need to match to allow inlining. I think it is OK for the
caller's dx10-clamp attribute to override the callee, but there
doesn't appear to be the infrastructure to do this currently without
definining the attribute in the generic Attributes.td.

Eventually the calling convention lowering will need to insert a mode
switch somewhere for these.

llvm-svn: 357302
2019-03-29 19:14:54 +00:00
Matt Arsenault
d24296e282 AMDGPU: Ignore CodeObjectV3 when inlining
This was inhibiting inlining of library functions when clang was
invoking the inliner directly. This is covering a bit of a mess with
subtarget feature handling, and this shouldn't be a subtarget
feature. The behavior is different depending on whether you are using
a -mattr flag in clang, or llc, opt.

llvm-svn: 353899
2019-02-12 23:30:11 +00:00
Matt Arsenault
aac47c1c00 AMDGPU: Use a custom areInlineCompatible
Fixes not inlining OpenCL library functions on AMDGPU,
which don't have an explicitly set target-cpu.

llvm-svn: 310269
2017-08-07 17:08:44 +00:00