0 does not make sense as a value for this to be, much less the default.
Also stop emitting each individual field if it is the default, rather than
if any element was the default. Also fix the name of the test since it didn't
exactly match the real attribute name.
Both input and output of ballot are lane-masks:
result is lane-mask with 'S32/S64 LLT and SGPR bank'
input is lane-mask with 'S1 LLT and VCC reg bank'.
Ballot copies bits from input lane-mask for
all active lanes and puts 0 for inactive lanes.
GlobalISel did not set 0 in result for inactive lanes
for non-constant input.
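
The intended semantics can be sketched as a bitwise AND of the input lane-mask with the set of active lanes. This is a minimal model on plain integers, not the GlobalISel lowering itself; the names `ballot` and `exec_mask` and the default wave size are chosen here for illustration.

```python
def ballot(input_mask: int, exec_mask: int, wave_size: int = 64) -> int:
    """Model ballot: keep input bits for active lanes, 0 for inactive lanes."""
    lanes = (1 << wave_size) - 1  # restrict the result to the wave's lanes
    return input_mask & exec_mask & lanes


# Lanes 0-3 are active; the input has bits set for lanes 1, 3 and 5.
result = ballot(0b101010, 0b001111)
# Lane 5 is inactive, so its bit is cleared in the result.
```

In this model the bug described above corresponds to returning `input_mask` unchanged when it is not a constant.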
global_wb with scopes lower than SCOPE_SYS is unnecessary for
correctness.
I was initially optimistic they would be very cheap no-ops but they can
actually be quite expensive so let's avoid them.
Adds hidden kernel arguments to the function signature and marks them
inreg if they should be preloaded into user SGPRs. The normal kernarg
preloading logic then takes over with some additional checks for the
correct implicitarg_ptr alignment.
Special care is needed so that metadata for the hidden arguments is not
added twice when generating the code object.
Converts AMDGPUResourceUsageAnalysis pass from Module to MachineFunction
pass. Moves function resource info propagation to the MC layer (through
helpers in AMDGPUMCResourceInfo) by generating MCExprs for every
function resource which the emitters have been prepped for.
Fixes https://github.com/llvm/llvm-project/issues/64863
Propose an extension to base type DIEs such that DW_ATE_address-encoded
base types can include an architecture specific address space. Use this
to implement DW_OP_convert conversions between AMDGPU address space
addresses where meaningful.
This patch adds a description of the core file format used for AMDGPU.
A reference implementation for creating and loading AMDGPU core dumps is
available in
[ROCgdb-6.2](https://github.com/ROCm/ROCgdb/tree/rocm-6.2.x/gdb).
Add a spec for yet-to-be-implemented metadata to allow the backend to
fully handle atomicrmw lowering. This is the base of an alternative
to #69229, which inverts the direction to be correct by default, and
extends to cover the peer device case.
These are incremental changes over #89217, with the core logic being the
same. This patch, along with #89217 and #91190, should get us ready to enable
64-bit optimizations in the atomic optimizer.
This patch is intended to be the first of a series with the end goal of
adapting the atomic optimizer pass to support i64 and f64 operations (along
with removing all unnecessary bitcasts). This legalizes 64-bit readlane,
writelane and readfirstlane ops pre-ISel.
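
One way to picture legalizing a 64-bit lane operation is to split the value into two 32-bit halves, apply the 32-bit operation to each half, and recombine the results. The sketch below models that idea on plain integers; the function names and the list-of-lanes model are illustrative assumptions, not the code in this patch.

```python
MASK32 = 0xFFFFFFFF


def readlane32(lane_values, lane):
    """Model of a 32-bit readlane: return the value held by one lane."""
    return lane_values[lane] & MASK32


def readlane64(lane_values, lane):
    """Model a 64-bit readlane as two 32-bit readlanes on the halves."""
    lo = readlane32([v & MASK32 for v in lane_values], lane)
    hi = readlane32([(v >> 32) & MASK32 for v in lane_values], lane)
    return (hi << 32) | lo  # recombine the two halves


values = [0x1111111122222222, 0xDEADBEEFCAFEF00D]
picked = readlane64(values, 1)
```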
---------
Co-authored-by: vikramRH <vikhegde@amd.com>
The various preloaded registers (workitem IDs, workgroup IDs, and
various implicit pointers) always have a finite, invariant, well-defined
value throughout a well-defined program.
In cases where the compiler infers or the user declares that some
implicit input will not be used (ex. via amdgcn-no-workitem-id-y), the
behavior of the entire program is undefined, since that misdeclaration
can cause arbitrary other preloaded-register intrinsics to access the
wrong register. This case is not expected to arise in practice, but
could occur when the no-implicit-argument attributes were not cleared
correctly in the presence of external functions, indirect calls, or other
means of executing un-analyzable code. Failure to detect that case would
be a bug in the attributor.
This commit updates the documentation to reflect this long-standing
reality.
Then, on the basis that all implicit arguments are defined in all
correct programs, the intrinsics that return those values are
annotated with `noundef`. Some implicit pointer arguments gain a
`nonnull`, but the kernel argument segment pointer or implicit argument
pointers don't necessarily have this property.
This will prevent spurious calls to `freeze` in front-end optimizations
that destroy user-provided ranges on built-in IDs.
(While I'm here, this commit adds a test for `noundef` on kernel
arguments, which is currently unimplemented.)
Now that we've got (minus some issues around datatypes and invariant
loads) working lowerings for address space 7, update the table in the
AMDGPU usage guide to properly indicate the nature of these address
spaces.
Revert "Revert 4 last AMDGPU commits to unbreak Windows bots"
This reverts commit 0d493ed2c6e664849a979b357a606dcd8273b03f.
MSVC does not like constexpr on the definition after an extern
declaration of a global.
Revert "AMDGPU: Try to fix build error with old gcc"
This reverts commit c7ad12d0d7606b0b9fb531b0b273bdc5f1490ddb.
Revert "AMDGPU: Use umin in set.rounding expansion"
This reverts commit a56f0b51dd988ad2b533de759c98457c1ed42456.
Revert "AMDGPU: Optimize set_rounding if input is known to fit in 2 bits (#88588)"
This reverts commit b4e751e2ab0ff152ed18dea59ebf9691e963e1dd.
Revert "AMDGPU: Implement llvm.set.rounding (#88587)"
This reverts commit 9731b77e80261c627d79980f8c275700bdaf6591.
Use a shift of a magic constant and some offsetting to convert from
flt_rounds values.
I don't know why the enum defines Dynamic = 7. The standard suggests -1
is the "cannot determine" value. If we could start the extended values at
4, we wouldn't need the extra compare, sub, and select.
https://reviews.llvm.org/D153257
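
The "shift of a magic constant" idea can be illustrated as a lookup table packed into one integer, indexed by shifting. This is a simplified sketch covering only the four standard FLT_ROUNDS values; the 2-bit hardware encoding below is an assumption made for illustration, and the constant is not the one used in the patch.

```python
# FLT_ROUNDS values from the C standard:
# 0 toward zero, 1 to nearest, 2 toward +infinity, 3 toward -infinity.
# ASSUMPTION: a hypothetical 2-bit hardware encoding, for illustration only.
HW_NEAREST, HW_PLUS_INF, HW_MINUS_INF, HW_ZERO = 0, 1, 2, 3
TABLE = [HW_ZERO, HW_NEAREST, HW_PLUS_INF, HW_MINUS_INF]

# Pack the table into a single integer, 2 bits per entry.
MAGIC = 0
for index, hw_mode in enumerate(TABLE):
    MAGIC |= hw_mode << (2 * index)


def to_hw_mode(flt_rounds: int) -> int:
    """Convert a standard FLT_ROUNDS value with one shift and one mask."""
    if not 0 <= flt_rounds <= 3:
        raise ValueError("only the four standard values are modeled here")
    return (MAGIC >> (2 * flt_rounds)) & 0b11
```

The extended values would need extra handling on top of this, which is where the additional compare, sub, and select mentioned above come in.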
These mostly are checking for various reserved bits being set. The diagnostics
for gpu-dependent reserved bits have a bit more context since they seem like the
most likely ones to be observed in practice.
This commit also improves the error handling mechanism for
MCDisassembler::onSymbolStart(). Previously it had a comment stream parameter
that was just being ignored by llvm-objdump; now it returns errors using
Expected<T>.
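
As a rough illustration of a reserved-bit check that reports failure through its return value rather than a side channel, consider the sketch below. The mask, the function name, and the tuple-based result are hypothetical; they only mirror the idea of returning either a value or an error.

```python
RESERVED_MASK = 0x000000F0  # hypothetical: bits 4-7 are reserved


def check_reserved(word: int, name: str = "descriptor word"):
    """Return (value, None) on success or (None, message) on failure."""
    bad = word & RESERVED_MASK
    if bad:
        return None, f"{name}: reserved bits set (0x{bad:08x})"
    return word, None


ok_value, ok_error = check_reserved(0x03)
bad_value, bad_error = check_reserved(0x13)
```

Here the caller has to inspect the error slot, so a diagnostic cannot be silently dropped the way an ignored comment stream could be.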
The link to the Heterogeneous-race-free Memory Models ASPLOS'14 paper by
Hower et al. pointed to a bogus website, probably because the domain
ownership has changed.
This patch updates it to a version hosted on research.cs.wisc.edu.
SIMachineFunctionInfo has a scan of the function body for inline asm
which may use AGPRs, or callees in SIMachineFunctionInfo. Move this
into the attributor, so it actually works interprocedurally.
Could probably avoid most of the test churn if this bothered to avoid
adding this on subtargets without AGPRs. We should also probably
try to delete the MIR scan in usesAGPRs but it seems to be trickier
to eliminate.
The table syntax for sphinx is really insufferably whitespace dependent.
I've been meaning to convert the existing attribute and intrinsic tables
to use list-table, which is less painful to merge.
A new function attribute named amdgpu_num_work_groups is added. This
attribute, which consists of three integers, allows programmers to let
the compiler know the number of workgroups to be launched in each of the
three dimensions and do optimizations based on that information.
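
To give a feel for what such information enables, the sketch below derives two facts from a known workgroup count: the largest possible workgroup ID per dimension, and the dimensions whose ID must be zero. The helper names are hypothetical and this is not the optimization implemented in the compiler.

```python
def max_workgroup_ids(num_work_groups):
    """Largest workgroup ID in each dimension, given the launch counts."""
    return tuple(count - 1 for count in num_work_groups)


def always_zero_dims(num_work_groups):
    """Dimensions launched with one workgroup, whose ID can only be 0."""
    return [dim for dim, count in zip("xyz", num_work_groups) if count == 1]


bounds = max_workgroup_ids((16, 1, 1))
zero_dims = always_zero_dims((16, 1, 1))
```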
---------
Co-authored-by: Jun Wang <jun.wang7@amd.com>
Summary:
This patch implements the LLVM floating point environment control
intrinsics and also exposes it through clang. We encode the floating
point environment as a 64-bit value that simply concatenates the values
of the mode registers and the current trap status. We only fetch the
bits relevant for floating point instructions. That is, rounding mode,
denormalization mode, ieee, dx10 clamp, debug, enabled traps, f16
overflow, and active exceptions.
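
The concatenation can be pictured as packing two 32-bit register values into one 64-bit integer. In this sketch the placement of the mode bits in the low half and the trap status in the high half is an assumption, as are the function names; the patch itself defines the real layout and which bits are kept.

```python
MASK32 = 0xFFFFFFFF


def pack_fenv(mode: int, trap_status: int) -> int:
    """Concatenate two 32-bit values into one 64-bit environment value."""
    # ASSUMPTION: mode occupies the low 32 bits, trap status the high 32.
    return ((trap_status & MASK32) << 32) | (mode & MASK32)


def unpack_fenv(fenv: int):
    """Split a 64-bit environment value back into (mode, trap_status)."""
    return fenv & MASK32, (fenv >> 32) & MASK32


fenv = pack_fenv(0x000002F0, 0x00000001)
```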
The dot is too confusing for tools. Output temporaries would contain
'10.3-generic', which tools could parse as a file extension; device libs and
the associated clang driver logic are also confused by the dot.
After discussions, we decided it's better to just remove the '.' from
the target name than fix each issue one by one.
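
The extension problem is easy to reproduce with any tool that splits a file name at its last dot. The file names below are hypothetical and only embed the '10.3-generic' string mentioned above; Python's `os.path.splitext` stands in for such a tool.

```python
import os.path

# A temporary whose name embeds a target name containing a dot.
stem, ext = os.path.splitext("kernel-10.3-generic")
# The part after the dot is treated as the file extension.

# Without a dot in the target name there is nothing to misparse.
clean_stem, clean_ext = os.path.splitext("kernel-10-3-generic")
```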
These generic targets include multiple GPUs and will, in the future,
provide a way to build once and run on multiple GPUs, at the cost of fewer
optimization opportunities.
Note that this is just doing the compiler side of things, device libs and
runtimes/loader/etc. don't know about these targets yet, so none of them
actually work in practice right now. This is just the initial commit to
make LLVM aware of them.
This contains the documentation changes for both this change and #76954
as well.