This will make it easy for callers to see issues with and fix up calls
to createTargetMachine after a future change to the params of
TargetMachine.
This matches other nearby enums.
For downstream users, this should be a fairly straightforward
replacement,
e.g. s/CodeGenOpt::Aggressive/CodeGenOptLevel::Aggressive
or s/CGFT_/CodeGenFileType::
Make codegen emit correctly rounded sqrt by default.
Emit the fast but only kind of fast expansion in AMDGPUCodeGenPrepare
based on !fpmath, like the fdiv case. Hack around visitation ordering
problems from AMDGPUCodeGenPrepare using forward iteration instead of
a well behaved combiner.
https://reviews.llvm.org/D158129
Factor out and unify some common code that calculates and tracks the
number of user SGRPs.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D159439
Currently s_getreg_b32 is missing the possible mode use. Really we
need separate pseudos for mode-only accesses, but leave this as a
pre-existing issue.
https://reviews.llvm.org/D152710
SITargetLowering::adjustWritemask calls SelectionDAG::UpdateNodeOperands
to update an EXTRACT_SUBREG node in-place to refer to a new IMAGE_LOAD
instruction, before we delete the old IMAGE_LOAD instruction. But in
UpdateNodeOperands can do CSE on the fly and return a different
EXTRACT_SUBREG node, so the original EXTRACT_SUBREG node would still
exist and would refer to the old deleted IMAGE_LOAD instruction. This
caused errors like:
t31: v3i32,ch = <<Deleted Node!>> # D:1
This target-independent node should have been selected!
UNREACHABLE executed at lib/CodeGen/SelectionDAG/InstrEmitter.cpp:1209!
Fix it by detecting the CSE case and replacing all uses of the original
EXTRACT_SUBREG node with the CSE'd one.
Recommit with a fix for a use-after-free bug in the first version of
this patch (#65340) which was caught by asan.
SITargetLowering::adjustWritemask calls SelectionDAG::UpdateNodeOperands
to update an EXTRACT_SUBREG node in-place to refer to a new IMAGE_LOAD
instruction, before we delete the old IMAGE_LOAD instruction. But in
UpdateNodeOperands can do CSE on the fly and return a different
EXTRACT_SUBREG node, so the original EXTRACT_SUBREG node would still
exist and would refer to the old deleted IMAGE_LOAD instruction. This
caused errors like:
t31: v3i32,ch = <<Deleted Node!>> # D:1
This target-independent node should have been selected!
UNREACHABLE executed at lib/CodeGen/SelectionDAG/InstrEmitter.cpp:1209!
Fix it by detecting the CSE case and replacing all uses of the original
EXTRACT_SUBREG node with the CSE'd one.
There are really two rounding modes, so only return the standard
values if both modes are the same. Otherwise, return a bitmask
representing the two modes.
Annoyingly the register doesn't use the same values as FLT_ROUNDS. Use
a simple integer table we can shift into to convert.
https://reviews.llvm.org/D153158
Introducing rsq contract flags is wrong, and also requires some level
of approximate functions. AMDGPUCodeGenPrepare already should handle
the f32 cases with appropriate flags, and I don't see how new
situations to handle would arise during legalization (other than cases
involving the rcp intrinsic, which instcombine tries to
handle). AMDGPUCodeGenPrepare does need to learn better handling of
rcp/rsq for f64 though, which we never bothered to handle well.
Removes another obstacle to correctly lowering sqrt.
https://reviews.llvm.org/D158099
Simple function which scalarizes Ops then ExtOrTruncs them according to function parameters
Differential Revision: https://reviews.llvm.org/D157733
Change-Id: Ie5215069228f7bf530cd2dbb4bd17cbf409e046a
Not sure if the only valid use is to have stackrestore directly
consume stacksave outputs or not. Handled exactly like a regular stack
pointer so all the edge cases theoretically should work.
https://reviews.llvm.org/D156669
Introduced the convergent equivalent of the existing G_INTRINSIC opcodes:
- G_INTRINSIC_CONVERGENT
- G_INTRINSIC_CONVERGENT_W_SIDE_EFFECTS
Out of the targets that currently have some support for GlobalISel, the patch
assumes that the convergent intrinsics only relevant to SPIRV and AMDGPU.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D154766
This is required for many trees produced in practice for i8 CodeGen.
Differential Revision: https://reviews.llvm.org/D155864
Change-Id: Iac01d183d9998b15138bdc7a5051e3bed338e7d9
The patterns were ripped out in
a4a3ac10cb1a40ccebed4e81cd7e94f1eb71602d so this always needs to be
custom lowered. I absolutely hate how difficult it is to write tests
for these, I have no doubt there are more of these hidden.
Fixes#64142
Some opcodes in generic MIR represent calls to intrinsics, where the intrinsic
ID is the first non-def operand to the instruction. These are now represented as
a subclass of GenericMachineInstr, and the method MachineInstr::getIntrinsicID()
is now moved to this subclass GIntrinsic.
Some target-defined instructions behave like GMIR intrinsics, and have an
Intrinsic::ID operand. But they should not be recognized as generic intrinsics,
and should not use GIntrinsic::getIntrinsicID(). Separated these out by
introducing a new AMDGPU::getIntrinsicID().
Reviewed By: arsenm, Pierre-vh
Differential Revision: https://reviews.llvm.org/D155556
This restores commit baa3386edb11a2f9bcadda8cf58d56f3707c39fa.
Originally reverted in d0f7850b01cf17e50a4f4b00e3b84dded94df6b8.
This reverts commit baa3386edb11a2f9bcadda8cf58d56f3707c39fa.
The changes did not cover all occurrences of the deteleted function
MachineInstr::getIntrinsicID().
Some opcodes in generic MIR represent calls to intrinsics, where the intrinsic
ID is the first non-def operand to the instruction. These are now represented as
a subclass of GenericMachineInstr, and the method MachineInstr::getIntrinsicID()
is now moved to this subclass GIntrinsic.
Some target-defined instructions behave like GMIR intrinsics, and have an
Intrinsic::ID operand. But they should not be recognized as generic intrinsics,
and should not use GIntrinsic::getIntrinsicID(). Separated these out by
introducing a new AMDGPU::getIntrinsicID().
Reviewed By: arsenm, Pierre-vh
Differential Revision: https://reviews.llvm.org/D155556
rocm-device-libs and llpc were avoiding using f64 sqrt
intrinsics in favor of their own expansions. Port the
expansion into the backend. Both of these users should be
updated to call the intrinsic instead.
The library and llpc expansions are slightly different.
llpc uses an ldexp to do the scale; the library uses a multiply.
Use ldexp to do the scale instead of the multiply.
I believe v_ldexp_f64 and v_mul_f64 are always the same number of
cycles, but it's cheaper to materialize the 32-bit integer constant
than the 64-bit double constant.
The libraries have another fast version of sqrt which will
be handled separately.
I am tempted to do this in an IR expansion instead. In the IR
we could take advantage of computeKnownFPClass to avoid
the 0-or-inf argument check.
When input to intrinsic is uniform value, reduced value is
same as input whereas if input value is divergent we need
to iterate over all active lanes of WaveFront to perform
the reduction.
The control flow for a `loop` has been set up, which
iterates over `only` active lanes to perform reduction.
Introduced WAVE_REDUCE_UMIN_PSEUDO_U32 and
WAVE_REDUCE_UMAX_PSEUDO_U32 Pseudos which
are lowered Post-ISel (in `EmitInstrWithCustomInserter `).
Reviewed By: arsenm, #amdgpu
Differential Revision: https://reviews.llvm.org/D154858
We were dropping the flags and thus blocking contract into potential
fadd users. GlobalISel was already preserving the flags here.
https://reviews.llvm.org/D155443
Do the LDS frame calculation once, in the IR pass, instead of repeating the work in the backend.
Prior to this patch:
The IR lowering pass sets up a per-kernel LDS frame and annotates the variables with absolute_symbol
metadata so that the assembler can build lookup tables out of it. There is a fragile association between
kernel functions and named structs which is used to recompute the frame layout in the backend, with
fatal_errors catching inconsistencies in the second calculation.
After this patch:
The IR lowering pass additionally sets a frame size attribute on kernels. The backend uses the same
absolute_symbol metadata that the assembler uses to place objects within that frame size.
Deleted the now dead allocation code from the backend. Left for a later cleanup:
- enabling lowering for anonymous functions
- removing the elide-module-lds attribute (test churn, it's not used by llc any more)
- adjusting the dynamic alignment check to not use symbol names
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D155190
Documentation for TargetLowering::getShiftAmountTy says that LegalTypes
should generally be true during type legalization, so this patch does
that.
On AMDGPU the effect is that we use i32 (a sane type) instead of i64
(pointer sized type) for more shift amounts, which in turn allows more
formation of rotates and funnel shifts pre-legalization.
Differential Revision: https://reviews.llvm.org/D154960
The most notable issue was producing v_mad_f32 in functions with the
dynamic mode, since it just ignores the mode. fdiv lowering is still
somewhat broken because it involves a mode switch and we need to query
the original mode.
This check was unnecessary/incorrect, it was already being done by the target
hook default implementation, and the one in the matcher was checking for a
completely different thing. This change:
1) Removes the check and updates affected tests which now do some more reassociations.
2) Modifies the AMDGPU hooks which were stubbed with "return true" to also do the oneuse
check. Not sure why I didn't do this the first time.
To reduce the register pressure during allocation,
when the allocator spills a virtual register that
corresponds to a whole wave mode operation, the
spill loads and restores should be activated for
all lanes by temporarily flipping all bits in exec
register to one just before the spills. It is not
implemented in the compiler as of today and this
patch enables the necessary support.
This is a pre-patch before the SGPR spill to virtual
VGPR lanes that would eventually causes the whole
wave register spills during allocation.
Reviewed By: arsenm, cdevadas
Differential Revision: https://reviews.llvm.org/D143759
These inherited the fast math checks from f32, but the manual suggests
these should be accurate enough for unconditional use. The definition
of correctly rounded is 0.5ulp, but the manual says "0.51ulp". I've
been a bit nervous about changing this as the OpenCL conformance test
does not cover half. Brute force produces identical values compared to
a reference host implementation for all values.
Add an intrinsic which returns the two pieces as multiple return
values. Alternatively could introduce a pair of intrinsics to
separately return the fractional and exponent parts.
AMDGPU has native instructions to return the two halves, but could use
some generic legalization and optimization handling. For example, we
should be able to handle legalization of f16 on older targets, and for
bf16. Additionally antique targets need a hardware workaround which
would be better handled in the backend rather than in library code
where it is now.
Remove DAGISel checks on calling conventions. GlobalISel doesn't have
these checks either and we prefer it that way (see D152794).
Add a simple test like the one introduced in D117479 for GlobalISel.
Differential Revision: https://reviews.llvm.org/D153535
We previously directly codegened to v_log_f32, which is broken for
denormals. The lowering isn't complicated, you simply need to scale
denormal inputs and adjust the result. Note log and log10 are still
not accurate enough, and will be fixed separately.