47063 Commits

Author SHA1 Message Date
Alex Bradbury
d41a73aa94 [RISCV][MC] Mark Zawrs extension as non-experimental
Support for the unratified 1.0-rc3 specification was introduced in
D133443. The specification has since been ratified (in November 2022
according to the recently ratified extensions list
<https://wiki.riscv.org/display/HOME/Recently+Ratified+Extensions>.

A review of the diff
<https://github.com/riscv/riscv-zawrs/compare/V1.0-rc3...main> of the
1.0-rc3 spec vs the current/ratified document shows no changes to the
instruction encoding or naming. At one point, a note was added
<e84f42406a>
indicating Zawrs depends on the Zalrsc extension (not officially
specified, but I believe to be just the LR/SC instructions from the A
extension). The final text ended up as "The instructions in the Zawrs
extension are only useful in conjunction with the LR instructions, which
are provided by the A extension, and which we also expect to be provided
by a narrower Zalrsc extension in the future." I think it's consistent
with this phrasing to not require the A extension for Zawrs, which
matches what was implemented.

No intrinsics are implemented for Zawrs currently, meaning we don't need
to additionally review whether those intrinsics can be considered
finalised and ready for exposure to end users.

Differential Revision: https://reviews.llvm.org/D143507
2023-02-19 20:43:03 +00:00
Craig Topper
3d0a5bf7de [RISCV] Add Zfa test cases for strict ONE and UEQ comparisons. NFC
These correspond to islessgreater and it inverse.
2023-02-18 17:28:10 -08:00
David Green
8e3dc1366f [AArch64] Concat zip1 and zip2 is a wider zip1
Given concat(zip1(a, b), zip2(a, b)), we can convert that to a 128bit zip1(a, b)
if we widen a and b out first.

Fixes #54226

Differential Revision: https://reviews.llvm.org/D121088
2023-02-18 19:54:29 +00:00
Amara Emerson
ddf167c442 [GlobalISel] Fix G_ZEXTLOAD being converted to G_SEXTLOAD incorrectly.
The extending loads combine tries to prefer sign-extends folding into loads vs
zexts, and in cases where a G_ZEXTLOAD is first used by a G_ZEXT, and then used
by a G_SEXT, it would select the G_SEXT even though the load is already
zero-extending.

Fixes issue #59630
2023-02-18 10:05:08 -08:00
Amara Emerson
556657c0fd [NFC][GlobalISel] Regenerate test checks for extending-loads test. 2023-02-18 01:49:08 -08:00
Philip Reames
495b653480 [RISCV] Add missing plumbing and tests for zfa
Experimental support for the zfa extension was recently added in https://reviews.llvm.org/D141984. A couple of the normal test changes and clang plumbing got missed in that change. This commit updates the usual suspects.

Differential Revision: https://reviews.llvm.org/D144288
2023-02-17 17:56:30 -08:00
Amara Emerson
b309bc04ee [GlobalISel] Combine out-of-range shifts to undef.
Differential Revision: https://reviews.llvm.org/D144303
2023-02-17 15:05:00 -08:00
Philipp Tomsich
10b7cd660c [RISCV] Select signed and unsigned bitfield extracts for XTHeadBb
The XTHeadBb extension hab both signed and unsigned bitfield
extraction instructions (TH.EXT and TH.EXTU, respectively) which have
previously only been supported for sign extension on byte, halfword,
and word-boundaries.

This adds the infrastructure to use TH.EXT and TH.EXTU for arbitrary
bitfield extraction.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D144229
2023-02-17 21:46:26 +01:00
Pavel Kopyl
01afb3fb99 [NVPTX] Use by default 'sm_60' architecture when expanding %ptxas-verify macro.
Also get rid of explicitly specified '-march' values for old architectures.
This simplifies %ptxas-verify statements.
After the change, we can potentially miss cases where a new functionality
is added to the architecture without appropriate checks in the
backend. On the other hand, this is mostly true for old architectures
that have been thoroughly tested.

Differential Revision: https://reviews.llvm.org/D141736
2023-02-17 20:49:04 +01:00
Philipp Tomsich
16a66af0a0 Revert "[RISCV] Add vendor-defined XTheadMemPair (two-GPR Memory Operations) extension"
This reverts commit d2918544a7fc4b5443879fe12f32a712e6dfe325.
2023-02-17 19:45:55 +01:00
Manolis Tsamis
6774ba8411 [RISCV] xtheadmac: fix commutativity issue for the in/out register
The instructions in the XTHeadMac extension (multiply accumulate
instructions) were marked as commutative but because the destination
register was also an input (accumulate) register and was connected to
the destination register with a register allocator constraint, all
three operands (instead of two) were incorrectly considered
commutative. To fix that an appropriate fixCommutedOpIndices call was
added for these instructions in findCommutedOpIndices

New test functions have been added to test the correct behaviour in
xtheadmac.ll.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D144278
2023-02-17 19:45:22 +01:00
Manolis Tsamis
d2918544a7 [RISCV] Add vendor-defined XTheadMemPair (two-GPR Memory Operations) extension
The vendor-defined XTHeadMemPair (no comparable standard extension exists
at the time of writing) extension adds two-GPR load/store pair instructions.

It is supported by the C9xx cores (e.g., found in the wild in the
Allwinner D1) by Alibaba T-Head.

The current (as of this commit) public documentation for this
extension is available at:
  https://github.com/T-head-Semi/thead-extension-spec/releases/download/2.2.2/xthead-2023-01-30-2.2.2.pdf

Support for these instructions has already landed in GNU Binutils:
  https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=6e17ae625570ff8f3c12c8765b8d45d4db8694bd

Depends on D143847

Differential Revision: https://reviews.llvm.org/D144002
2023-02-17 19:45:22 +01:00
Jun Sha (Joshua)
df56b55e12 [RISCV][CodeGen] Add codegen patterns for experimental zfa extension (try 2)
Recommit by preames with commit message, various style cleanups, and unaddressed review comments corrected.

This patch implements experimental codegen support for the RISCV Zfa extension as specified here: https://github.com/riscv/riscv-isa-manual/releases/download/draft-20221119-5234c63/riscv-spec.pdf, Ch. 25. This extension has not been ratified.

This change does not include support for FLI (upcoming in a follow up change) or FCVTMOD (not relevant for C/C++).

Differential Revision: https://reviews.llvm.org/D143982
2023-02-17 10:28:08 -08:00
Craig Topper
42944abf85 [RISCV] Improve isInterleaveShuffle to handle interleaving the high half and low half of the same source.
This is needed to support the new interleave intrinsics from D141924 for
fixed vectors.

I've reworked the core loop to operate in terms of half of a source. Making 4
possible half sources. The first element of the half is used to indicate which
source using the same numbering as the shuffle where the second source elements
are numbered after the first source.

I've added restrictions to only match the first half of two vectors or the
first and second half of a single vector. This was done to prevent regressions
on the cases we have coverage for. I saw cases where generic DAG combine split
a single interleave into 2 smaller interleaves a concat. We can revisit in the
future.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D144143
2023-02-17 10:00:40 -08:00
Nick Desaulniers
cf86855c44 [M68k] fix test regression introduced by D140180
I added a new pass, callbrprepare, to the pass pipelines in
commit a3a84c9e2511 ("[llvm] add CallBrPrepare pass to pipelines")
but did not test experimental backends.
2023-02-17 09:22:24 -08:00
Paul Walker
cf4df61688 [SVE] Add intrinsics for floating-point operations that explicitly undefine the result for inactive lanes.
This patch is the floating-point equivalent of D141937.

Depends on D143764.

Differential Revision: https://reviews.llvm.org/D143765
2023-02-17 14:21:01 +00:00
Anton Sidorenko
2693efa8a5 [MachineCombiner] Support local strategy for traces
For in-order cores MachineCombiner makes better decisions when the critical path
is calculated only for the current basic block and does not take into account
other blocks from the trace.

This patch adds a virtual method to TargetInstrInfo to allow each target decide
which strategy to use.

Depends on D140541

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D140542
2023-02-17 13:17:22 +03:00
Nick Desaulniers
39811e2e53 [llvm][test] enable/disable -verify-machineinstrs where possible for callbr
I introduced new tests in
commit 5cc1016a57b3 ("[llvm][SelectionDAGBuilder] codegen callbr.landingpad intrinsic")
https://reviews.llvm.org/D140160
that fails expensive checks. Disable -verify-machineinstrs in those
tests for now. Enable it in other tests for now, since MachineVerifier
isn't on by default for assertion builds.

Link: https://github.com/llvm/llvm-project/issues/60827
2023-02-16 20:28:18 -08:00
Noah Goldstein
9e9444ca7d Recommit "Transform vector SET{LE/ULT/ULE} -> SETLT and SET{GE/UGT/UGE} -> SETGT if possible" (2nd Try)
Original version hit assert in `incDecVectorConstant` because VT could
be EVT (as opposed to MVT). Fix is to add check for VT.isSimple() in
`incDecVectorConstant`.

Reviewed By: saugustine

Differential Revision: https://reviews.llvm.org/D142254
2023-02-16 20:39:14 -06:00
Nick Desaulniers
a3a84c9e25 [llvm] add CallBrPrepare pass to pipelines
Capstone of
https://discourse.llvm.org/t/rfc-syncing-asm-goto-with-outputs-with-gcc/65453/8

Clang changes are still necessary to enable the use of outputs along
indirect edges of asm goto statements.

Link: https://github.com/llvm/llvm-project/issues/53562

Reviewed By: void

Differential Revision: https://reviews.llvm.org/D140180
2023-02-16 17:58:34 -08:00
Nick Desaulniers
5cc1016a57 [llvm][SelectionDAGBuilder] codegen callbr.landingpad intrinsic
Given a CallBrInst, retain its first virtual register in SelectionDagBuilder's
FunctionLoweringInfo if there's corresponding landingpad. Walk the list
of COPY MachineInstr to find the original virtual and physical registers
defined by the INLINEASM_BR MachineInst.

Test cases from https://reviews.llvm.org/D139565.
Link: https://github.com/llvm/llvm-project/issues/59538

Part 3 from
https://discourse.llvm.org/t/rfc-syncing-asm-goto-with-outputs-with-gcc/65453/8

Follow up patches still need to wire up CallBrPrepare into the pass
pipelines.

Reviewed By: efriedma, void

Differential Revision: https://reviews.llvm.org/D140160
2023-02-16 17:58:34 -08:00
Nick Desaulniers
28d45c843c [llvm][CallBrPrepare] use SSAUpdater to use intrinsic value
Now that we've inserted a call to an intrinsic, we need to update
certain previous uses of CallBrInst values to use the value of this
intrinsic instead.

There are 3 cases to handle:
1. The @llvm.callbr.landingpad.<type>() intrinsic call is in the same
   BasicBlock as the use of the callbr we're replacing.
2. The use is dominated by the direct destination.
3. The use is not dominated by the direct destination, and may or may
   not be dominated by the indirect destination.

Part 2c of
https://discourse.llvm.org/t/rfc-syncing-asm-goto-with-outputs-with-gcc/65453/8.

Reviewed By: efriedma, void, jyknight

Differential Revision: https://reviews.llvm.org/D139970
2023-02-16 17:58:34 -08:00
Nick Desaulniers
094190c2f5 [llvm][CallBrPrepare] add llvm.callbr.landingpad intrinsic
Insert a new intrinsic call after splitting critical edges, and verify
it. Later commits will update the SSA values to use this new value along
indirect branches rather than the callbr's value, and have SelectionDAG
consume this new value.

Part 2b of
https://discourse.llvm.org/t/rfc-syncing-asm-goto-with-outputs-with-gcc/65453/8.

Reviewed By: efriedma, jyknight

Differential Revision: https://reviews.llvm.org/D139883
2023-02-16 17:58:33 -08:00
Nick Desaulniers
0a39af0eb7 [llvm][CallBrPrepare] split critical edges
If we have a CallBrInst with output that's used, we need to split
critical edges so that we have some place to insert COPYs for physregs
to virtregs.

Part 2a of
https://discourse.llvm.org/t/rfc-syncing-asm-goto-with-outputs-with-gcc/65453/8.

Test cases and logic re-purposed from D138078.

Reviewed By: efriedma, void, jyknight

Differential Revision: https://reviews.llvm.org/D139872
2023-02-16 17:58:33 -08:00
Nick Desaulniers
fb471158aa [llvm] boilerplate for new callbrprepare codegen IR pass
Because this pass is to be a codegen pass, it must use the legacy pass
manager.

Link: https://discourse.llvm.org/t/rfc-syncing-asm-goto-with-outputs-with-gcc/65453/8

Reviewed By: aeubanks, void

Differential Revision: https://reviews.llvm.org/D139861
2023-02-16 17:58:33 -08:00
Ting Wang
52a774fd4c [PowerPC] remove XXSWAPD after load from CP which is a splat value
If the value from constant-pool is a splat value of vector type, do not
need swap after load from constant-pool.

Reviewed By: shchenz

Differential Revision: https://reviews.llvm.org/D139491
2023-02-16 19:21:35 -05:00
Huihui Zhang
fb7c38073e [AArch64][ISel] Always use pre-inc/post-inc addressing mode for auto-indexed load/store with constant offset.
Unlike ARM target, current AArch64 target doesn't have facility to encode the
operation bit: whether to add an offset to base pointer for pre-inc/post-inc
addressing mode, or to subtract an offset from base pointer for
pre-dec/post-dec addressing mode.

A mis-compile (https://github.com/llvm/llvm-project/issues/60645) was noticed
due to this limitation.

Therefore, for AArch64 auto-indexed load/store with constant offset, always
use pre-inc/post-inc addressing mode. The constant offset is negated for
pre-dec/post-dec addressing mode.
An auto-indexed address with non-constant offset is currently not split into
base and offset parts. If we are to handle non-constant offset in the future,
offset node will need to take a negate.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D143796
2023-02-16 16:19:09 -08:00
Nemanja Ivanovic
56e41fcf50 [PowerPC] Bail out of FISel when lowering long calls
We currently don't handle tail calls in fast-isel but
we continue with the lowering when -mlongcall is
specified and lower the calls normally. We should
defer to SDISel for this so that it is lowered correctly.

Differential revision: https://reviews.llvm.org/D123997
2023-02-16 16:15:32 -05:00
Philip Reames
22e199e6af [RISCV] Accept zicsr and zifencei command line options
This change adds the definition of the two extensions, but does not either a) make any instruction conditional on them or b) enabled the extensions by default. (The *instructions* do remain enabled by default per ISA version 2.0 which is our current default.)

This is meant to be a building block towards something like https://reviews.llvm.org/D141666, and in the meantime, address one of the most surprising of the current user experience warts. The current behavior of rejecting the extensions at the command line despite emitting code which appears to use them is surprising to anyone not deeply versed in the details of this situation.

Between versions 2.0 and 2.1 of the base I specification, a backwards incompatible change was made to remove selected instructions and CSRs from the base ISA. These instructions were grouped into a set of new extensions (these), but were no longer required by the base ISA. This change is described in “Preface to Document Version 20190608-Base-Ratified” from the specification document.

As LLVM currently implements only version 2.0 of the base specification, accepting these extensions at the command line introduces a configuration which doesn't actually match any spec version. It's a pretty harmless variant since the 2.0 extension definitions, to my knowledge, exactly match the text from the 2.0 I text before they were moved into standalone extensions in 2.1 of I. (The version numbering in that sentence is a tad confusing to say the least. Hopefully I got it right.)

It is worth noting that we already have numerous examples of accepting extensions in the march string which didn't exist in version of the spec document corresponding to our current base I version, so this doesn't set any new precedent.

Differential Revision: https://reviews.llvm.org/D143953
2023-02-16 10:41:41 -08:00
Krzysztof Parzyszek
35742743d2 [Hexagon] Fix number of arguments in call to DAG.getNode for VINSERTW0
HexagonISD::VINSERTW0 takes two inputs, but only one was provided.
2023-02-16 09:57:58 -08:00
Jay Foad
8a17cd9905 AMDGPU: Add a regression test case for D143963 2023-02-16 17:11:32 +00:00
Jay Foad
8e5a41e827 Revert "AMDGPU: Override getNegatedExpression constant handling"
This reverts commit 11c3cead23783e65fb30e673d62771352078ff05.

It was causing infinite loops in the DAG combiner.
2023-02-16 17:11:32 +00:00
Jay Foad
9305b63d69 [AMDGPU] Add another G_UNMERGE_VALUES legalization test case 2023-02-16 16:45:35 +00:00
Florian Hahn
2ac85cd563
[AMDGPU] Regenerate check lines to enable updating for D144050. 2023-02-16 16:38:15 +00:00
Philip Reames
80abf86d50 Revert "[RISCV][CodeGen] Add codegen pattern for experimental zfa extension (FLI and FCVTMOD not included)"
This reverts commit fc6d517e2f335c2ab2b14a34eb747a4703aca7e4.  It was submitted without an appropriate patch description.  Will reapply shortly.
2023-02-16 07:49:44 -08:00
David Green
7abe3497e7 [LSR] Improve filtered uses in NarrowSearchSpaceByPickingWinnerRegs
NarrowSearchSpaceByPickingWinnerRegs has an aggressive filtering method to
reduce the complexity of the search space down by picking a best formula with
the highest number of reuses and assuming it will yield profitable reuse. In
certain cases we can find a best formula like {X+30,+,1} and later check a
formula like {X,+,1} with the same number of Uses. On some architectures it
can be better to pick {X,+,1}, especially if an offset of 30 can be used as a
legal addressing mode, but -30 cannot. That happens under Thumb1 code, which
has fairly limited addressing modes. This patch adds a check to see if it can
pick the simpler formula, if it looks more profitable.

Differential Revision: https://reviews.llvm.org/D144014
2023-02-16 15:48:12 +00:00
David Green
66749ce927 [ARM] Add Thumb LSR codegen tests. NFC
This is the same routine generated in two different ways that ends up with
different orders to loads. The first currently does better than the second
with ordered loads, but needn't if the filtering in LSR is improved.
2023-02-16 14:24:51 +00:00
Kerry McLaughlin
ba23bca0a8 [SME2][AArch64] Add multi-single multiply-add long long intrinsics
Adds intrinsics for the following SME2 instructions:
 - smlall (1, 2 & 4 vectors)
 - umlall (1, 2 & 4 vectors)
 - smlsll (1, 2 & 4 vectors)
 - umlsll (1, 2 & 4 vectors)
 - sumlall (2 & 4 vectors)
 - usmlall (1, 2 & 4 vectors)

NOTE: These intrinsics are still in development and are subject to future changes.

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D143276
2023-02-16 13:12:47 +00:00
Tim Northover
2002c82278 AArch64: count callee stack we use when estimating scavenging requirements. 2023-02-16 09:59:27 +00:00
Diana Picus
819dfc338b [AMDGPU] Autogenerate checks for several tests. NFCI 2023-02-16 10:54:34 +01:00
Fangrui Song
f62b084e92 [LoopDeletion] Remove legacy pass
Following recent changes to remove non-core legacy passes.
2023-02-15 23:31:05 -08:00
Xiang1 Zhang
96df79af02 [X86] Support load/store for bf16 in avx
Reviewed By: LuoYuanke
Differential Revision: https://reviews.llvm.org/D144163
2023-02-16 14:39:35 +08:00
Jun Sha (Joshua)
fc6d517e2f [RISCV][CodeGen] Add codegen pattern for experimental zfa extension (FLI and FCVTMOD not included) 2023-02-16 13:41:41 +08:00
Noah Goldstein
8bd0e9481c Revert "Transform vector SET{LE/ULT/ULE} -> SETLT and SET{GE/UGT/UGE} -> SETGT if possible"
This reverts commit f3732c2b18df305a1927b9d4a94610421a2750e7.
2023-02-15 16:33:38 -06:00
Markus Böck
d464edde95 [X86][Win64] Avoid statepoints prior to SEH epilogue
This patchs purpose is very similar to https://reviews.llvm.org/D119644

The gist of the issue is that SEH unwinding has certain invariants around call instructions. One of those is that a call instruction must not be immediately followed by the function epilogue. Failing to do so leads to Windows' unwinder not recognizing the frame and skipping it when unwinding the stack.

LLVM ensures this invariant by inserting a noop after a call prior to an epilogue. The implementation however, makes the unfortunate assumption that pseudo instructions may not be calls, leading to statepoints being skipped and no noop being inserted.
This patch fixes that issue by only skipping over pseudo instructions that aren't calls.

Differential Revision: https://reviews.llvm.org/D143812
2023-02-15 22:47:28 +01:00
Fangrui Song
6f3e6a765a Revert D129735 "[RISCV] Add new pass to transform undef to pseudo for vector values."
This reverts commit f1c4241fb6e50c507adafbe14faf82a755ab92ca.

It causes use-after-poison asan failures for
CodeGen/RISCV/rvv/undef-earlyclobber-chain.ll and CodeGen/RISCV/regalloc-last-chance-recoloring-failure.ll
2023-02-15 11:51:08 -08:00
David Green
8a7b5e0e50 [AArch64] Guard extra uses in mls combine.
This is a small extension to D143143 to ensure that nodes with multiple uses to
not get transformed. The tests have also been extended to include more mla
cases.
2023-02-15 18:36:46 +00:00
David Green
b0bfbad19b [AArch64] Always lower fp16 zero to FMOVH0
We can always use FMOVH0 to lower fp16 zero, even without fullfp16. We can
either expand it to movi d0, #0 or fmov s0, wzr, which will both clear all the
bits of the register.

Differential Revision: https://reviews.llvm.org/D143988
2023-02-15 16:06:32 +00:00
David Spickett
93164dba08 [llvm][AArch64] Fix BTI after returns_twice when call has no attributes
Previously we were looking for the returns twice attribute by manually
getting the function attributes from the call. This meant that we only
found attributes on the call itself, not what it was calling.

So if you had:
%call1 = call i32 @setjmp(ptr noundef null)

We would not BTI protect that even though setjmp clearly needs it.

Clang happens to produce:
%call = call i32 @setjmp(ptr noundef null) #0 ; returns_twice

So all valid calls were protected. This is not guaranteed,
the frontend may choose not to put attributes on the call.

It is undefined behaviour to call setjmp indirectly
(https://pubs.opengroup.org/onlinepubs/9699919799/functions/setjmp.html)
but as I misused the APIs here I think it's worth fixing up regardless.

Added comments to the test file where the IR differs from what
clang would output.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D144082
2023-02-15 15:30:37 +00:00
Simon Pilgrim
b0e7ca79ab [X86] Remove abs(sub_nsw()) -> abds fold from adbu test file
Copy+paste typo - it was correctly removed from 128/512 variants
2023-02-15 12:49:28 +00:00