Support for the unratified 1.0-rc3 specification was introduced in
D133443. The specification has since been ratified (in November 2022
according to the recently ratified extensions list
<https://wiki.riscv.org/display/HOME/Recently+Ratified+Extensions>.
A review of the diff
<https://github.com/riscv/riscv-zawrs/compare/V1.0-rc3...main> of the
1.0-rc3 spec vs the current/ratified document shows no changes to the
instruction encoding or naming. At one point, a note was added
<e84f42406a>
indicating Zawrs depends on the Zalrsc extension (not officially
specified, but I believe to be just the LR/SC instructions from the A
extension). The final text ended up as "The instructions in the Zawrs
extension are only useful in conjunction with the LR instructions, which
are provided by the A extension, and which we also expect to be provided
by a narrower Zalrsc extension in the future." I think it's consistent
with this phrasing to not require the A extension for Zawrs, which
matches what was implemented.
No intrinsics are implemented for Zawrs currently, meaning we don't need
to additionally review whether those intrinsics can be considered
finalised and ready for exposure to end users.
Differential Revision: https://reviews.llvm.org/D143507
Given concat(zip1(a, b), zip2(a, b)), we can convert that to a 128bit zip1(a, b)
if we widen a and b out first.
Fixes#54226
Differential Revision: https://reviews.llvm.org/D121088
The extending loads combine tries to prefer sign-extends folding into loads vs
zexts, and in cases where a G_ZEXTLOAD is first used by a G_ZEXT, and then used
by a G_SEXT, it would select the G_SEXT even though the load is already
zero-extending.
Fixes issue #59630
Experimental support for the zfa extension was recently added in https://reviews.llvm.org/D141984. A couple of the normal test changes and clang plumbing got missed in that change. This commit updates the usual suspects.
Differential Revision: https://reviews.llvm.org/D144288
The XTHeadBb extension hab both signed and unsigned bitfield
extraction instructions (TH.EXT and TH.EXTU, respectively) which have
previously only been supported for sign extension on byte, halfword,
and word-boundaries.
This adds the infrastructure to use TH.EXT and TH.EXTU for arbitrary
bitfield extraction.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D144229
Also get rid of explicitly specified '-march' values for old architectures.
This simplifies %ptxas-verify statements.
After the change, we can potentially miss cases where a new functionality
is added to the architecture without appropriate checks in the
backend. On the other hand, this is mostly true for old architectures
that have been thoroughly tested.
Differential Revision: https://reviews.llvm.org/D141736
The instructions in the XTHeadMac extension (multiply accumulate
instructions) were marked as commutative but because the destination
register was also an input (accumulate) register and was connected to
the destination register with a register allocator constraint, all
three operands (instead of two) were incorrectly considered
commutative. To fix that an appropriate fixCommutedOpIndices call was
added for these instructions in findCommutedOpIndices
New test functions have been added to test the correct behaviour in
xtheadmac.ll.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D144278
Recommit by preames with commit message, various style cleanups, and unaddressed review comments corrected.
This patch implements experimental codegen support for the RISCV Zfa extension as specified here: https://github.com/riscv/riscv-isa-manual/releases/download/draft-20221119-5234c63/riscv-spec.pdf, Ch. 25. This extension has not been ratified.
This change does not include support for FLI (upcoming in a follow up change) or FCVTMOD (not relevant for C/C++).
Differential Revision: https://reviews.llvm.org/D143982
This is needed to support the new interleave intrinsics from D141924 for
fixed vectors.
I've reworked the core loop to operate in terms of half of a source. Making 4
possible half sources. The first element of the half is used to indicate which
source using the same numbering as the shuffle where the second source elements
are numbered after the first source.
I've added restrictions to only match the first half of two vectors or the
first and second half of a single vector. This was done to prevent regressions
on the cases we have coverage for. I saw cases where generic DAG combine split
a single interleave into 2 smaller interleaves a concat. We can revisit in the
future.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D144143
I added a new pass, callbrprepare, to the pass pipelines in
commit a3a84c9e2511 ("[llvm] add CallBrPrepare pass to pipelines")
but did not test experimental backends.
For in-order cores MachineCombiner makes better decisions when the critical path
is calculated only for the current basic block and does not take into account
other blocks from the trace.
This patch adds a virtual method to TargetInstrInfo to allow each target decide
which strategy to use.
Depends on D140541
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D140542
I introduced new tests in
commit 5cc1016a57b3 ("[llvm][SelectionDAGBuilder] codegen callbr.landingpad intrinsic")
https://reviews.llvm.org/D140160
that fails expensive checks. Disable -verify-machineinstrs in those
tests for now. Enable it in other tests for now, since MachineVerifier
isn't on by default for assertion builds.
Link: https://github.com/llvm/llvm-project/issues/60827
Original version hit assert in `incDecVectorConstant` because VT could
be EVT (as opposed to MVT). Fix is to add check for VT.isSimple() in
`incDecVectorConstant`.
Reviewed By: saugustine
Differential Revision: https://reviews.llvm.org/D142254
Now that we've inserted a call to an intrinsic, we need to update
certain previous uses of CallBrInst values to use the value of this
intrinsic instead.
There are 3 cases to handle:
1. The @llvm.callbr.landingpad.<type>() intrinsic call is in the same
BasicBlock as the use of the callbr we're replacing.
2. The use is dominated by the direct destination.
3. The use is not dominated by the direct destination, and may or may
not be dominated by the indirect destination.
Part 2c of
https://discourse.llvm.org/t/rfc-syncing-asm-goto-with-outputs-with-gcc/65453/8.
Reviewed By: efriedma, void, jyknight
Differential Revision: https://reviews.llvm.org/D139970
If the value from constant-pool is a splat value of vector type, do not
need swap after load from constant-pool.
Reviewed By: shchenz
Differential Revision: https://reviews.llvm.org/D139491
Unlike ARM target, current AArch64 target doesn't have facility to encode the
operation bit: whether to add an offset to base pointer for pre-inc/post-inc
addressing mode, or to subtract an offset from base pointer for
pre-dec/post-dec addressing mode.
A mis-compile (https://github.com/llvm/llvm-project/issues/60645) was noticed
due to this limitation.
Therefore, for AArch64 auto-indexed load/store with constant offset, always
use pre-inc/post-inc addressing mode. The constant offset is negated for
pre-dec/post-dec addressing mode.
An auto-indexed address with non-constant offset is currently not split into
base and offset parts. If we are to handle non-constant offset in the future,
offset node will need to take a negate.
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D143796
We currently don't handle tail calls in fast-isel but
we continue with the lowering when -mlongcall is
specified and lower the calls normally. We should
defer to SDISel for this so that it is lowered correctly.
Differential revision: https://reviews.llvm.org/D123997
This change adds the definition of the two extensions, but does not either a) make any instruction conditional on them or b) enabled the extensions by default. (The *instructions* do remain enabled by default per ISA version 2.0 which is our current default.)
This is meant to be a building block towards something like https://reviews.llvm.org/D141666, and in the meantime, address one of the most surprising of the current user experience warts. The current behavior of rejecting the extensions at the command line despite emitting code which appears to use them is surprising to anyone not deeply versed in the details of this situation.
Between versions 2.0 and 2.1 of the base I specification, a backwards incompatible change was made to remove selected instructions and CSRs from the base ISA. These instructions were grouped into a set of new extensions (these), but were no longer required by the base ISA. This change is described in “Preface to Document Version 20190608-Base-Ratified” from the specification document.
As LLVM currently implements only version 2.0 of the base specification, accepting these extensions at the command line introduces a configuration which doesn't actually match any spec version. It's a pretty harmless variant since the 2.0 extension definitions, to my knowledge, exactly match the text from the 2.0 I text before they were moved into standalone extensions in 2.1 of I. (The version numbering in that sentence is a tad confusing to say the least. Hopefully I got it right.)
It is worth noting that we already have numerous examples of accepting extensions in the march string which didn't exist in version of the spec document corresponding to our current base I version, so this doesn't set any new precedent.
Differential Revision: https://reviews.llvm.org/D143953
NarrowSearchSpaceByPickingWinnerRegs has an aggressive filtering method to
reduce the complexity of the search space down by picking a best formula with
the highest number of reuses and assuming it will yield profitable reuse. In
certain cases we can find a best formula like {X+30,+,1} and later check a
formula like {X,+,1} with the same number of Uses. On some architectures it
can be better to pick {X,+,1}, especially if an offset of 30 can be used as a
legal addressing mode, but -30 cannot. That happens under Thumb1 code, which
has fairly limited addressing modes. This patch adds a check to see if it can
pick the simpler formula, if it looks more profitable.
Differential Revision: https://reviews.llvm.org/D144014
This is the same routine generated in two different ways that ends up with
different orders to loads. The first currently does better than the second
with ordered loads, but needn't if the filtering in LSR is improved.
This patchs purpose is very similar to https://reviews.llvm.org/D119644
The gist of the issue is that SEH unwinding has certain invariants around call instructions. One of those is that a call instruction must not be immediately followed by the function epilogue. Failing to do so leads to Windows' unwinder not recognizing the frame and skipping it when unwinding the stack.
LLVM ensures this invariant by inserting a noop after a call prior to an epilogue. The implementation however, makes the unfortunate assumption that pseudo instructions may not be calls, leading to statepoints being skipped and no noop being inserted.
This patch fixes that issue by only skipping over pseudo instructions that aren't calls.
Differential Revision: https://reviews.llvm.org/D143812
This reverts commit f1c4241fb6e50c507adafbe14faf82a755ab92ca.
It causes use-after-poison asan failures for
CodeGen/RISCV/rvv/undef-earlyclobber-chain.ll and CodeGen/RISCV/regalloc-last-chance-recoloring-failure.ll
This is a small extension to D143143 to ensure that nodes with multiple uses to
not get transformed. The tests have also been extended to include more mla
cases.
We can always use FMOVH0 to lower fp16 zero, even without fullfp16. We can
either expand it to movi d0, #0 or fmov s0, wzr, which will both clear all the
bits of the register.
Differential Revision: https://reviews.llvm.org/D143988
Previously we were looking for the returns twice attribute by manually
getting the function attributes from the call. This meant that we only
found attributes on the call itself, not what it was calling.
So if you had:
%call1 = call i32 @setjmp(ptr noundef null)
We would not BTI protect that even though setjmp clearly needs it.
Clang happens to produce:
%call = call i32 @setjmp(ptr noundef null) #0 ; returns_twice
So all valid calls were protected. This is not guaranteed,
the frontend may choose not to put attributes on the call.
It is undefined behaviour to call setjmp indirectly
(https://pubs.opengroup.org/onlinepubs/9699919799/functions/setjmp.html)
but as I misused the APIs here I think it's worth fixing up regardless.
Added comments to the test file where the IR differs from what
clang would output.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D144082