This re-applies commit ace20e24287b, which was reverted in eff4ef25b3dc.
The issues were fixed in:
* b30765caf874 [AArch64] Fix an incorrect handling of debug values in
MachineSink (#68107)
* b454b04d6869 [AArch64] Fix a compiler crash in MachineSink (#67705)
Teach the si-fix-sgpr-copies pass to deal with REG_SEQUENCE, PHI or
INSERT_SUBREG where the result is an SGPR, but some of the inputs are
constants materialized into VGPRs. This may happen in cases where for
instance several instructions use an immediate zero and SelectionDAG
chooses to put it in a VGPR to satisfy all of them. This, however, causes
the si-fix-sgpr-copies pass to try to switch the whole chain to VGPRs, which may
lead to illegal VGPR-to-SGPR copies. Rematerializing the constant into
an SGPR fixes the issue.
This was originally reverted because it triggered an unrelated bug in
PEI on one of the OpenMP buildbots. That bug has been fixed in #68299,
so it should be ok to try again.
In a special case in frame index elimination (when the offset is 0), we
generate either a S_LSHR_B32 or a V_LSHRREV_B32 using the same code.
However, they don't expect their operands in the same order - S_LSHR_B32
takes the value to be shifted first and then the shift amount, whereas
V_LSHRREV_B32 has the operands reversed (hence the REV in its name).
Update the code & tests to take this into account. Also remove an
outdated comment (this code is definitely reachable now that non-entry
functions no longer have a fixed emergency scavenge slot).
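The operand-order difference can be modelled in a few lines. This is a minimal C++ sketch, not LLVM's AMDGPU code: the `ShiftOpc` enum and the `orderedOperands`, `evalShift` and `lshr` helpers are hypothetical, and `evalShift` assumes only the low 5 bits of the shift amount are used.

```cpp
#include <cstdint>
#include <utility>

// Hypothetical stand-ins for the two opcodes; not LLVM's definitions.
enum class ShiftOpc { S_LSHR_B32, V_LSHRREV_B32 };

// Returns {src0, src1} in the order the opcode expects for `Value >> Amount`.
std::pair<uint32_t, uint32_t> orderedOperands(ShiftOpc Opc, uint32_t Value,
                                              uint32_t Amount) {
  if (Opc == ShiftOpc::V_LSHRREV_B32)
    return {Amount, Value}; // "REV": the shift amount comes first
  return {Value, Amount};   // scalar form: value first, then amount
}

// Models what each opcode computes from its operands as given
// (assumption: the shift amount is masked to its low 5 bits).
uint32_t evalShift(ShiftOpc Opc, uint32_t Src0, uint32_t Src1) {
  if (Opc == ShiftOpc::V_LSHRREV_B32)
    return Src1 >> (Src0 & 31);
  return Src0 >> (Src1 & 31);
}

// Orders the operands for the opcode, then evaluates the shift.
uint32_t lshr(ShiftOpc Opc, uint32_t Value, uint32_t Amount) {
  std::pair<uint32_t, uint32_t> Ops = orderedOperands(Opc, Value, Amount);
  return evalShift(Opc, Ops.first, Ops.second);
}
```

With the operands ordered per opcode, `lshr(Opc, 256, 4)` gives 16 for both opcodes. Feeding the scalar order `(256, 4)` straight into the VALU form instead computes `4 >> 0`, which is the kind of mismatch described above.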
In the WebAssembly back end, the TrapUnreachable option is currently
load-bearing for correctness, inserting wasm `unreachable` instructions
where needed to create valid wasm. There is another option,
NoTrapAfterNoreturn, that removes some of those traps and causes
incorrect wasm to be emitted.
This turns off `NoTrapAfterNoreturn` for the Wasm backend and adds new
tests.
Fixes MUBUF path for most vectors and pointers, which unblocks fixing
the gfx6/7 run lines in assorted tests. Also fixes inconsistent behavior
for -flat-for-global.
When lowering a constant build_vector sequence of doubles on RV32, if the
addend wasn't zero, or the step/denominator wasn't one, it would crash trying
to emit an illegal build_vector of <n x i64> with i32 operands, e.g.:
t15: v2i64 = BUILD_VECTOR Constant:i32<1>, Constant:i32<1>
This patch fixes this by lowering the splats with SelectionDAG::getConstant
with the vector type, which handles making it legal via splat_vector_parts.
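The underlying constraint is that an i64 element cannot be carried by a single i32 scalar on RV32; it has to be represented as a low and a high 32-bit part. A minimal sketch of that split, using hypothetical `splitI64`/`joinI64` helpers rather than the SelectionDAG API:

```cpp
#include <cstdint>
#include <utility>

// Split a 64-bit splat value into its {low, high} 32-bit parts, which is
// conceptually what an RV32 target needs since i64 scalars are illegal there.
std::pair<uint32_t, uint32_t> splitI64(int64_t V) {
  uint64_t U = static_cast<uint64_t>(V);
  return {static_cast<uint32_t>(U), static_cast<uint32_t>(U >> 32)};
}

// Reassemble the 64-bit value from its two parts.
int64_t joinI64(uint32_t Lo, uint32_t Hi) {
  return static_cast<int64_t>((static_cast<uint64_t>(Hi) << 32) | Lo);
}
```

In general a lone i32 operand holds only the low part: for 1 the high part happens to be 0, but for -1 it is `0xFFFFFFFF`, so both parts must be kept.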
This was reverted in 824251c9b349d859a9169196cd9533c619a715ce due to a bug in a previous patch that this change exposed. Fixed in 199cbec987ee68d70611db8e7961b43c3dbad83e. Original commit message follows.
This builds on the transform introduced in
https://github.com/llvm/llvm-project/pull/67821, and generalizes it for
all integer reduction types.
A couple of notes:
* This will only form smax/smin/umax/umin reductions when zbb is
enabled. Otherwise, we lower the min/max expressions early. I don't care
about this case, and don't plan to address this further.
* This excludes floating point. Floating point introduces concerns about
associativity. I may or may not do a follow up patch for that case.
* The explodevector test change is mildly undesirable from a clarity
perspective. If anyone sees a good way to rewrite that to stabilize the
test, please suggest.
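The transform reassociates a chain of binops over extracted lanes into a tree-shaped reduction, which is only value-preserving for associative, commutative operations. A scalar C++ model (the helper names are hypothetical and this is not the DAG combine itself) shows the two shapes agreeing for integer operations:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using BinOp = int64_t (*)(int64_t, int64_t);

// Wrapping add, like an IR `add` without nsw/nuw flags.
int64_t opAdd(int64_t A, int64_t B) {
  return static_cast<int64_t>(static_cast<uint64_t>(A) +
                              static_cast<uint64_t>(B));
}
int64_t opSMax(int64_t A, int64_t B) { return std::max(A, B); }
int64_t opXor(int64_t A, int64_t B) { return A ^ B; }

// Linear chain: ((V[0] op V[1]) op V[2]) ... op V[N-1].
int64_t linearChain(const std::vector<int64_t> &V, std::size_t N, BinOp Op) {
  assert(N >= 1 && N <= V.size());
  int64_t Acc = V[0];
  for (std::size_t I = 1; I < N; ++I)
    Acc = Op(Acc, V[I]);
  return Acc;
}

// Pairwise tree reduction over lanes [Lo, Hi).
int64_t treeReduce(const std::vector<int64_t> &V, std::size_t Lo,
                   std::size_t Hi, BinOp Op) {
  assert(Hi > Lo && Hi <= V.size());
  if (Hi - Lo == 1)
    return V[Lo];
  std::size_t Mid = Lo + (Hi - Lo) / 2;
  return Op(treeReduce(V, Lo, Mid, Op), treeReduce(V, Mid, Hi, Op));
}
```

With `double` and `+`, the two shapes can round differently (for example `(1e16 + 1.0) + 1.0` versus `1e16 + (1.0 + 1.0)`), which is the associativity concern that keeps floating point out of scope.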
This reverts commit 7a0b9daac9edde4293d2e9fdf30d8b35c04d16a6 and
63bbc250440141b1c51593904fba9bdaa6724280.
I'm seeing issues (e.g. on the GCC torture suite) where
combineBinOpOfExtractToReduceTree is called when the V extensions aren't
enabled and triggers a crash due to RISCVSubtarget::getElen asserting.
I'll aim to follow up with a minimal reproducer. Although it's pretty
obvious how to avoid this crash with some extra gating, there are a few
options as to where that should be inserted so I think it's best to
revert and agree the appropriate fix separately.
V_FMAAK_F16_t16 takes VGPR_32_Lo128 operands whereas the original
instructions would have VGPR_32 operands. Switching the opcodes without
updating operands' register classes leads to MachineVerifier complaining
about the classes not matching instruction definitions. The problem only
reveals itself in builds with expensive checks enabled because of the
missing -verify-machineinstrs in the test.
This is the third attempt to update CodeGen/AMDGPU/fma.f16.ll to run for
GFX11, following the second attempt in a1e38e0b8e3e, partially reverted
in eaf737a4e004.
If a shl node leaves the upper half bits zero / undemanded, then see if we can profitably perform this with a half-width shl and a free trunc/zext.
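The equivalence this relies on can be checked with plain integer arithmetic: for a shift amount below 32, the low 32 bits of a 64-bit shl equal a 32-bit shl of the truncated input, and when the upper half of the wide result is zero, zero-extending the narrow result reproduces it exactly. A minimal sketch with hypothetical helper names, modelling i64/i32 with `uint64_t`/`uint32_t`:

```cpp
#include <cassert>
#include <cstdint>

// Wide form: shl i64 X, C.
uint64_t wideShl(uint64_t X, unsigned C) {
  assert(C < 64);
  return X << C;
}

// Narrow form: trunc X to i32, shl i32 by C, then zext back to i64.
uint64_t narrowShlZext(uint64_t X, unsigned C) {
  assert(C < 32); // the half-width shift amount must be in range
  uint32_t Lo = static_cast<uint32_t>(X); // trunc
  return static_cast<uint64_t>(Lo << C);  // shl (mod 2^32), then zext
}
```

If only the low half is demanded, the two forms agree after truncating to 32 bits; if the wide result's upper half is already zero, they agree on the full 64-bit value.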
Followup to D146121
Differential Revision: https://reviews.llvm.org/D155472
This changes the DUP(constant) -> MOVI code to handle either integer or fp
types, allowing more constants to be selected, and fixes up some cases where fp
constants were being incorrectly selected.
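Whether a constant can use MOVI depends only on its bit pattern, which is why integer and fp element types can share one path. As a sketch, here is a check for one MOVI immediate form, the 64-bit form in which every byte must be 0x00 or 0xFF; the helper names are hypothetical, and the other MOVI immediate forms are not modelled.

```cpp
#include <cstdint>
#include <cstring>

// Reinterpret a double's bits as a 64-bit integer (a bitcast).
uint64_t bitsOf(double D) {
  uint64_t U;
  std::memcpy(&U, &D, sizeof U);
  return U;
}

// True if every byte of Imm is 0x00 or 0xFF, i.e. Imm fits the 64-bit
// byte-mask immediate form of MOVI.
bool isByteMaskImm64(uint64_t Imm) {
  for (int I = 0; I < 8; ++I) {
    uint8_t Byte = static_cast<uint8_t>(Imm >> (8 * I));
    if (Byte != 0x00 && Byte != 0xFF)
      return false;
  }
  return true;
}
```

For example, a splat of `0.0` passes this check, while `1.0` (`0x3FF0000000000000`) and `-0.0` (`0x8000000000000000`) do not.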
This PR replaces `isMachineBlockAddressTaken` by `hasAddressTaken` to
include blocks which have their IR address taken as well. These blocks
are also not removable since their predecessors' terminators do not
directly point to the block.
Usually `llvm.stacksave/stackrestore` are used together with `alloca`
but they can appear without it (e.g. `alloca` can be optimized away).
WebAssembly's function-local physical user SP register, which is
referenced by `llvm.stacksave`, is created during frame lowering and
replaced with a virtual register.
However, the SP register was not created when `llvm.stacksave` was used
without `alloca`, which led to a MIR verification failure about a
use-before-def of the SP virtual register.
Resolves https://github.com/llvm/llvm-project/issues/62235
Following the version bump in #67964 and the bug fix in #68026 I believe
we're ready to mark Zfa as non-experimental. I'll note the GCC torture
suite passes now with Zfa enabled (though it's more of a litmus test
than anything else).
The Zfa specification was recently ratified
<https://wiki.riscv.org/display/HOME/Recently+Ratified+Extensions>. This
commit bumps the version to 1.0, but leaves it as an experimental
extension (to be done in a follow-on patch), so reviews can focus on
confirming there haven't been spec changes we have missed (which as
noted below, is more difficult than usual).
Because the development of the Zfa spec overlapped with the transition
of riscv-isa-manual from LaTeX to AsciiDoc, it's more difficult than
usual to confirm version changes. The linked PDF in RISCVUsage is for
some reason a 404. Key commit histories to review are:
* Changes to zfa.adoc on the main branch
<https://github.com/riscv/riscv-isa-manual/commits/main/src/zfa.adoc>
* Changes to zfa.tex on the now defunct latex branch
<https://github.com/riscv/riscv-isa-manual/commits/latex/src/zfa.tex>
From reviewing these, I believe there have been no changes to the spec
since version 0.1/0.2 (sadly the AsciiDoc and LaTeX versions of the spec
are inconsistent about version numbering).
vandn, vrev8 and vro{l,r} are now part of Zvkb, which Zvbb now implies. This
patch updates the predicates to check for Zvkb instead of Zvbb in the tablegen
patterns for the SD and VL nodes, as well as some of the lowering logic in
RISCVISelLowering.
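The predicate change follows from the implication between the two extensions. A hypothetical model (not `RISCVSubtarget`) in which enabling Zvbb also enables Zvkb, so that a single Zvkb check covers both configurations:

```cpp
// Hypothetical feature flags; not LLVM's RISCVSubtarget.
struct VectorBitmanipFeatures {
  bool Zvbb = false;
  bool Zvkb = false;
};

// Apply the implication Zvbb => Zvkb.
VectorBitmanipFeatures withImplied(VectorBitmanipFeatures F) {
  if (F.Zvbb)
    F.Zvkb = true;
  return F;
}

// vandn, vrev8 and vrol/vror only need Zvkb, so check Zvkb rather than Zvbb.
bool canSelectVANDN(VectorBitmanipFeatures F) { return withImplied(F).Zvkb; }

// An instruction that is only in Zvbb (vwsll, for example) still needs Zvbb.
bool canSelectVWSLL(VectorBitmanipFeatures F) { return F.Zvbb; }
```

Checking Zvbb for vandn would wrongly reject a Zvkb-only configuration; checking Zvkb accepts both.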
On z/OS, many library functions have a non-standard name. This change
initializes the table of runtime functions that result from lowering
intrinsics to library calls.
This change enables some shuffle lowering with the TBL instruction for SVE
and SVE2 when indexing into one register, and with the SVE2 two-register TBL
form when indexing into both registers.
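The per-lane behaviour of the two TBL forms can be modelled on fixed-length byte vectors (real SVE vectors are scalable; the element type and helper names here are illustrative assumptions): indices that fall outside the table produce zero, and the SVE2 form indexes into the concatenation of its two source registers.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Single-register TBL: Result[i] = Table[Idx[i]], or 0 if out of range.
std::vector<uint8_t> tbl1(const std::vector<uint8_t> &Table,
                          const std::vector<uint8_t> &Idx) {
  std::vector<uint8_t> Result(Idx.size(), 0);
  for (std::size_t I = 0; I < Idx.size(); ++I)
    if (Idx[I] < Table.size())
      Result[I] = Table[Idx[I]];
  return Result;
}

// Two-register TBL (SVE2): index into the concatenation of T0 and T1.
std::vector<uint8_t> tbl2(const std::vector<uint8_t> &T0,
                          const std::vector<uint8_t> &T1,
                          const std::vector<uint8_t> &Idx) {
  std::vector<uint8_t> Concat(T0);
  Concat.insert(Concat.end(), T1.begin(), T1.end());
  return tbl1(Concat, Idx);
}
```

A two-operand shuffle mask maps onto `Idx` directly: lanes taken from the second operand use indices offset by the first register's length, which is what the two-register form handles in one instruction.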
Differential Revision: https://reviews.llvm.org/D152205
This builds on the transform introduced in
https://github.com/llvm/llvm-project/pull/67821, and generalizes it for
all integer reduction types.
A couple of notes:
* This will only form smax/smin/umax/umin reductions when zbb is
enabled. Otherwise, we lower the min/max expressions early. I don't care
about this case, and don't plan to address this further.
* This excludes floating point. Floating point introduces concerns about
associativity. I may or may not do a follow up patch for that case.
* The explodevector test change is mildly undesirable from a clarity
perspective. If anyone sees a good way to rewrite that to stabilize the
test, please suggest.
This should fix #66912. When emitting SEH unwind info, we need to be
able to calculate the exact length of functions before alignments are
fixed. Until that limitation is overcome, just disable all loop
alignment on Windows targets.
The logic in `RISCVLoadFPImm::getLoadFPImm` recognises that the only
supported negative value is -1.0, but due to a typo returns `false`
otherwise (entry 0, which is -1.0) rather than returning -1 (indicating
no match found).
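The effect of the typo is easy to reproduce in a reduced model. The four-entry table below is a hypothetical stand-in for the real fli table (only the fact that entry 0 holds -1.0 comes from the commit message), and the real function works on the value's bit pattern rather than searching doubles:

```cpp
#include <array>
#include <cstddef>

// Hypothetical reduced table; as in the real one, entry 0 is -1.0.
constexpr std::array<double, 4> kFPImmTable = {-1.0, 0.5, 1.0, 2.0};

// Search the non-negative entries; -1 means "no match".
int lookupNonNegative(double V) {
  for (std::size_t I = 1; I < kFPImmTable.size(); ++I)
    if (kFPImmTable[I] == V)
      return static_cast<int>(I);
  return -1;
}

// Buggy shape: `return false` converts to 0, i.e. "entry 0 (-1.0)".
int lookupFPImmBuggy(double V) {
  if (V < 0.0) {
    if (V == -1.0)
      return 0;
    return false; // BUG: reports a match with -1.0
  }
  return lookupNonNegative(V);
}

// Fixed shape: any negative value other than -1.0 has no encoding.
int lookupFPImm(double V) {
  if (V < 0.0)
    return V == -1.0 ? 0 : -1;
  return lookupNonNegative(V);
}
```

With the buggy version, -2.0 (the negation of a table entry) is reported as entry 0 and would be materialized as -1.0, matching the miscompile for negated table values.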
There's a miscompile currently for negative numbers (other than -1) that
are the negated form of numbers in the fli lookup table. This adds tests
that capture the issue, with a fix to follow in a separate commit/PR.
FastISel currently drops dbg.values targeting allocas. It may seem
surprising that a simple case would fail to be lowered, but dbg.values
targeting allocas are not common; we usually have dbg.declares doing
that, and those are handled by the common code between FastISel and
SelectionDAGISel.
This patch addresses the issue by querying the static alloca map from
FuncInfo. If we have a frame index for it, we create a DBG_VALUE
intrinsic from it.
This is a conservative workaround for broken liveness tracking of
SUBREG_TO_REG to speculatively fix all targets. The current reported
failures are on X86 only, but this issue should appear for all targets
that use SUBREG_TO_REG. The next minimally correct refinement would be
to disallow only implicit defs.
The coalescer now introduces implicit-defs of the super register to
track the dependency on other subregisters. If we see such an implicit
operand, we cannot simply treat the subregister def as the result
operand in case downstream users depend on the implicitly defined
parts. Really, target implementations should be considering the
implicit defs and trying to interpret them appropriately (maybe with
some generic helpers). The full implicit def could possibly be
reported as the move result, rather than the subregister def but that
requires additional work.
Hopefully fixes #64060 as well.
This needs to be applied to the release branch.
https://reviews.llvm.org/D156346