Any SGPR read by a VALU can potentially obscure SALU writes to the same
register.
Insert s_wait_alu instructions to mitigate the hazard on affected paths.
Compute a global cache of SGPRs with any VALU reads and use this to
avoid inserting mitigation for SGPRs never accessed by VALUs.
To avoid excessive search when compile time is priority implement
secondary mode where all SALU writes are mitigated.
Co-authored-by: Shilei Tian <shilei.tian@amd.com>
This reverts commit adaff46d087799072438dd744b038e6fd50a2d78.
Drop the -O3 checks from default-attributes.hip. I don't know why they
are different on some bots but reverting this is far too disruptive.
Removing it from the codegen pipeline induces a lot of test churn
because llc is no longer optimizing out implicit arguments to kernels.
Mostly mechanical, but there are some creative test updates. I preferred
to take the changes as-is in tests where the ABI isn't relevant. In
cases where it's more relevant, or the optimize out logic was too
ingrained in the test, I pre-run the optimization. Some cases manually
add attributes to disable inputs.
Define subtarget features for atomic fmin/fmax support.
The flat/global support is a real messe. We had float/double support at
the beginning in gfx6 and gfx7. gfx8 removed these. gfx10 reintroduced them.
gfx11 removed the f64 versions again.
gfx9 partially reintroduced them, in gfx90a and gfx940 but only for f64.