llvm-project

Author	SHA1	Message	Date
Alex Bradbury	d41a73aa94	[RISCV][MC] Mark Zawrs extension as non-experimental Support for the unratified 1.0-rc3 specification was introduced in D133443. The specification has since been ratified (in November 2022 according to the recently ratified extensions list <https://wiki.riscv.org/display/HOME/Recently+Ratified+Extensions>). A review of the diff <https://github.com/riscv/riscv-zawrs/compare/V1.0-rc3...main> of the 1.0-rc3 spec vs the current/ratified document shows no changes to the instruction encoding or naming. At one point, a note was added <`e84f42406a`> indicating Zawrs depends on the Zalrsc extension (not officially specified, but I believe to be just the LR/SC instructions from the A extension). The final text ended up as "The instructions in the Zawrs extension are only useful in conjunction with the LR instructions, which are provided by the A extension, and which we also expect to be provided by a narrower Zalrsc extension in the future." I think it's consistent with this phrasing to not require the A extension for Zawrs, which matches what was implemented. No intrinsics are implemented for Zawrs currently, meaning we don't need to additionally review whether those intrinsics can be considered finalised and ready for exposure to end users. Differential Revision: https://reviews.llvm.org/D143507	2023-02-19 20:43:03 +00:00
Craig Topper	3d0a5bf7de	[RISCV] Add Zfa test cases for strict ONE and UEQ comparisons. NFC These correspond to islessgreater and its inverse.	2023-02-18 17:28:10 -08:00
David Green	8e3dc1366f	[AArch64] Concat zip1 and zip2 is a wider zip1 Given concat(zip1(a, b), zip2(a, b)), we can convert that to a 128bit zip1(a, b) if we widen a and b out first. Fixes #54226 Differential Revision: https://reviews.llvm.org/D121088	2023-02-18 19:54:29 +00:00
Amara Emerson	ddf167c442	[GlobalISel] Fix G_ZEXTLOAD being converted to G_SEXTLOAD incorrectly. The extending loads combine tries to prefer sign-extends folding into loads vs zexts, and in cases where a G_ZEXTLOAD is first used by a G_ZEXT, and then used by a G_SEXT, it would select the G_SEXT even though the load is already zero-extending. Fixes issue #59630	2023-02-18 10:05:08 -08:00
Amara Emerson	556657c0fd	[NFC][GlobalISel] Regenerate test checks for extending-loads test.	2023-02-18 01:49:08 -08:00
Philip Reames	495b653480	[RISCV] Add missing plumbing and tests for zfa Experimental support for the zfa extension was recently added in https://reviews.llvm.org/D141984. A couple of the normal test changes and clang plumbing got missed in that change. This commit updates the usual suspects. Differential Revision: https://reviews.llvm.org/D144288	2023-02-17 17:56:30 -08:00
Amara Emerson	b309bc04ee	[GlobalISel] Combine out-of-range shifts to undef. Differential Revision: https://reviews.llvm.org/D144303	2023-02-17 15:05:00 -08:00
Philipp Tomsich	10b7cd660c	[RISCV] Select signed and unsigned bitfield extracts for XTHeadBb The XTHeadBb extension has both signed and unsigned bitfield extraction instructions (TH.EXT and TH.EXTU, respectively) which have previously only been supported for sign extension on byte, halfword, and word-boundaries. This adds the infrastructure to use TH.EXT and TH.EXTU for arbitrary bitfield extraction. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D144229	2023-02-17 21:46:26 +01:00
Pavel Kopyl	01afb3fb99	[NVPTX] Use by default 'sm_60' architecture when expanding %ptxas-verify macro. Also get rid of explicitly specified '-march' values for old architectures. This simplifies %ptxas-verify statements. After the change, we can potentially miss cases where a new functionality is added to the architecture without appropriate checks in the backend. On the other hand, this is mostly true for old architectures that have been thoroughly tested. Differential Revision: https://reviews.llvm.org/D141736	2023-02-17 20:49:04 +01:00
Philipp Tomsich	16a66af0a0	Revert "[RISCV] Add vendor-defined XTheadMemPair (two-GPR Memory Operations) extension" This reverts commit d2918544a7fc4b5443879fe12f32a712e6dfe325.	2023-02-17 19:45:55 +01:00
Manolis Tsamis	6774ba8411	[RISCV] xtheadmac: fix commutativity issue for the in/out register The instructions in the XTHeadMac extension (multiply accumulate instructions) were marked as commutative but because the destination register was also an input (accumulate) register and was connected to the destination register with a register allocator constraint, all three operands (instead of two) were incorrectly considered commutative. To fix that, an appropriate fixCommutedOpIndices call was added for these instructions in findCommutedOpIndices. New test functions have been added to test the correct behaviour in xtheadmac.ll. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D144278	2023-02-17 19:45:22 +01:00
Manolis Tsamis	d2918544a7	[RISCV] Add vendor-defined XTheadMemPair (two-GPR Memory Operations) extension The vendor-defined XTHeadMemPair (no comparable standard extension exists at the time of writing) extension adds two-GPR load/store pair instructions. It is supported by the C9xx cores (e.g., found in the wild in the Allwinner D1) by Alibaba T-Head. The current (as of this commit) public documentation for this extension is available at: https://github.com/T-head-Semi/thead-extension-spec/releases/download/2.2.2/xthead-2023-01-30-2.2.2.pdf Support for these instructions has already landed in GNU Binutils: https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=6e17ae625570ff8f3c12c8765b8d45d4db8694bd Depends on D143847 Differential Revision: https://reviews.llvm.org/D144002	2023-02-17 19:45:22 +01:00
Jun Sha (Joshua)	df56b55e12	[RISCV][CodeGen] Add codegen patterns for experimental zfa extension (try 2) Recommit by preames with commit message, various style cleanups, and unaddressed review comments corrected. This patch implements experimental codegen support for the RISCV Zfa extension as specified here: https://github.com/riscv/riscv-isa-manual/releases/download/draft-20221119-5234c63/riscv-spec.pdf, Ch. 25. This extension has not been ratified. This change does not include support for FLI (upcoming in a follow up change) or FCVTMOD (not relevant for C/C++). Differential Revision: https://reviews.llvm.org/D143982	2023-02-17 10:28:08 -08:00
Craig Topper	42944abf85	[RISCV] Improve isInterleaveShuffle to handle interleaving the high half and low half of the same source. This is needed to support the new interleave intrinsics from D141924 for fixed vectors. I've reworked the core loop to operate in terms of half of a source, making 4 possible half sources. The first element of the half is used to indicate which source using the same numbering as the shuffle where the second source elements are numbered after the first source. I've added restrictions to only match the first half of two vectors or the first and second half of a single vector. This was done to prevent regressions on the cases we have coverage for. I saw cases where generic DAG combine split a single interleave into 2 smaller interleaves and a concat. We can revisit in the future. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D144143	2023-02-17 10:00:40 -08:00
Nick Desaulniers	cf86855c44	[M68k] fix test regression introduced by D140180 I added a new pass, callbrprepare, to the pass pipelines in commit a3a84c9e2511 ("[llvm] add CallBrPrepare pass to pipelines") but did not test experimental backends.	2023-02-17 09:22:24 -08:00
Paul Walker	cf4df61688	[SVE] Add intrinsics for floating-point operations that explicitly undefine the result for inactive lanes. This patch is the floating-point equivalent of D141937. Depends on D143764. Differential Revision: https://reviews.llvm.org/D143765	2023-02-17 14:21:01 +00:00
Anton Sidorenko	2693efa8a5	[MachineCombiner] Support local strategy for traces For in-order cores MachineCombiner makes better decisions when the critical path is calculated only for the current basic block and does not take into account other blocks from the trace. This patch adds a virtual method to TargetInstrInfo to allow each target to decide which strategy to use. Depends on D140541 Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D140542	2023-02-17 13:17:22 +03:00
Nick Desaulniers	39811e2e53	[llvm][test] enable/disable -verify-machineinstrs where possible for callbr I introduced new tests in commit 5cc1016a57b3 ("[llvm][SelectionDAGBuilder] codegen callbr.landingpad intrinsic") https://reviews.llvm.org/D140160 that fail expensive checks. Disable -verify-machineinstrs in those tests for now. Enable it in other tests for now, since MachineVerifier isn't on by default for assertion builds. Link: https://github.com/llvm/llvm-project/issues/60827	2023-02-16 20:28:18 -08:00
Noah Goldstein	9e9444ca7d	Recommit "Transform vector SET{LE/ULT/ULE} -> SETLT and SET{GE/UGT/UGE} -> SETGT if possible" (2nd Try) Original version hit assert in `incDecVectorConstant` because VT could be EVT (as opposed to MVT). Fix is to add check for VT.isSimple() in `incDecVectorConstant`. Reviewed By: saugustine Differential Revision: https://reviews.llvm.org/D142254	2023-02-16 20:39:14 -06:00
Nick Desaulniers	a3a84c9e25	[llvm] add CallBrPrepare pass to pipelines Capstone of https://discourse.llvm.org/t/rfc-syncing-asm-goto-with-outputs-with-gcc/65453/8 Clang changes are still necessary to enable the use of outputs along indirect edges of asm goto statements. Link: https://github.com/llvm/llvm-project/issues/53562 Reviewed By: void Differential Revision: https://reviews.llvm.org/D140180	2023-02-16 17:58:34 -08:00
Nick Desaulniers	5cc1016a57	[llvm][SelectionDAGBuilder] codegen callbr.landingpad intrinsic Given a CallBrInst, retain its first virtual register in SelectionDAGBuilder's FunctionLoweringInfo if there's a corresponding landingpad. Walk the list of COPY MachineInstr to find the original virtual and physical registers defined by the INLINEASM_BR MachineInstr. Test cases from https://reviews.llvm.org/D139565. Link: https://github.com/llvm/llvm-project/issues/59538 Part 3 from https://discourse.llvm.org/t/rfc-syncing-asm-goto-with-outputs-with-gcc/65453/8 Follow up patches still need to wire up CallBrPrepare into the pass pipelines. Reviewed By: efriedma, void Differential Revision: https://reviews.llvm.org/D140160	2023-02-16 17:58:34 -08:00
Nick Desaulniers	28d45c843c	[llvm][CallBrPrepare] use SSAUpdater to use intrinsic value Now that we've inserted a call to an intrinsic, we need to update certain previous uses of CallBrInst values to use the value of this intrinsic instead. There are 3 cases to handle: 1. The @llvm.callbr.landingpad.<type>() intrinsic call is in the same BasicBlock as the use of the callbr we're replacing. 2. The use is dominated by the direct destination. 3. The use is not dominated by the direct destination, and may or may not be dominated by the indirect destination. Part 2c of https://discourse.llvm.org/t/rfc-syncing-asm-goto-with-outputs-with-gcc/65453/8. Reviewed By: efriedma, void, jyknight Differential Revision: https://reviews.llvm.org/D139970	2023-02-16 17:58:34 -08:00
Nick Desaulniers	094190c2f5	[llvm][CallBrPrepare] add llvm.callbr.landingpad intrinsic Insert a new intrinsic call after splitting critical edges, and verify it. Later commits will update the SSA values to use this new value along indirect branches rather than the callbr's value, and have SelectionDAG consume this new value. Part 2b of https://discourse.llvm.org/t/rfc-syncing-asm-goto-with-outputs-with-gcc/65453/8. Reviewed By: efriedma, jyknight Differential Revision: https://reviews.llvm.org/D139883	2023-02-16 17:58:33 -08:00
Nick Desaulniers	0a39af0eb7	[llvm][CallBrPrepare] split critical edges If we have a CallBrInst with output that's used, we need to split critical edges so that we have some place to insert COPYs for physregs to virtregs. Part 2a of https://discourse.llvm.org/t/rfc-syncing-asm-goto-with-outputs-with-gcc/65453/8. Test cases and logic re-purposed from D138078. Reviewed By: efriedma, void, jyknight Differential Revision: https://reviews.llvm.org/D139872	2023-02-16 17:58:33 -08:00
Nick Desaulniers	fb471158aa	[llvm] boilerplate for new callbrprepare codegen IR pass Because this pass is to be a codegen pass, it must use the legacy pass manager. Link: https://discourse.llvm.org/t/rfc-syncing-asm-goto-with-outputs-with-gcc/65453/8 Reviewed By: aeubanks, void Differential Revision: https://reviews.llvm.org/D139861	2023-02-16 17:58:33 -08:00
Ting Wang	52a774fd4c	[PowerPC] remove XXSWAPD after load from CP which is a splat value If the value from constant-pool is a splat value of vector type, do not need swap after load from constant-pool. Reviewed By: shchenz Differential Revision: https://reviews.llvm.org/D139491	2023-02-16 19:21:35 -05:00
Huihui Zhang	fb7c38073e	[AArch64][ISel] Always use pre-inc/post-inc addressing mode for auto-indexed load/store with constant offset. Unlike ARM target, current AArch64 target doesn't have facility to encode the operation bit: whether to add an offset to base pointer for pre-inc/post-inc addressing mode, or to subtract an offset from base pointer for pre-dec/post-dec addressing mode. A mis-compile (https://github.com/llvm/llvm-project/issues/60645) was noticed due to this limitation. Therefore, for AArch64 auto-indexed load/store with constant offset, always use pre-inc/post-inc addressing mode. The constant offset is negated for pre-dec/post-dec addressing mode. An auto-indexed address with non-constant offset is currently not split into base and offset parts. If we are to handle non-constant offset in the future, offset node will need to take a negate. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D143796	2023-02-16 16:19:09 -08:00
Nemanja Ivanovic	56e41fcf50	[PowerPC] Bail out of FISel when lowering long calls We currently don't handle tail calls in fast-isel but we continue with the lowering when -mlongcall is specified and lower the calls normally. We should defer to SDISel for this so that it is lowered correctly. Differential revision: https://reviews.llvm.org/D123997	2023-02-16 16:15:32 -05:00
Philip Reames	22e199e6af	[RISCV] Accept zicsr and zifencei command line options This change adds the definition of the two extensions, but does not either a) make any instruction conditional on them or b) enable the extensions by default. (The instructions do remain enabled by default per ISA version 2.0 which is our current default.) This is meant to be a building block towards something like https://reviews.llvm.org/D141666, and in the meantime, address one of the most surprising of the current user experience warts. The current behavior of rejecting the extensions at the command line despite emitting code which appears to use them is surprising to anyone not deeply versed in the details of this situation. Between versions 2.0 and 2.1 of the base I specification, a backwards incompatible change was made to remove selected instructions and CSRs from the base ISA. These instructions were grouped into a set of new extensions (these), but were no longer required by the base ISA. This change is described in “Preface to Document Version 20190608-Base-Ratified” from the specification document. As LLVM currently implements only version 2.0 of the base specification, accepting these extensions at the command line introduces a configuration which doesn't actually match any spec version. It's a pretty harmless variant since the 2.0 extension definitions, to my knowledge, exactly match the text from the 2.0 I text before they were moved into standalone extensions in 2.1 of I. (The version numbering in that sentence is a tad confusing to say the least. Hopefully I got it right.) It is worth noting that we already have numerous examples of accepting extensions in the march string which didn't exist in the version of the spec document corresponding to our current base I version, so this doesn't set any new precedent. Differential Revision: https://reviews.llvm.org/D143953	2023-02-16 10:41:41 -08:00
Krzysztof Parzyszek	35742743d2	[Hexagon] Fix number of arguments in call to DAG.getNode for VINSERTW0 HexagonISD::VINSERTW0 takes two inputs, but only one was provided.	2023-02-16 09:57:58 -08:00
Jay Foad	8a17cd9905	AMDGPU: Add a regression test case for D143963	2023-02-16 17:11:32 +00:00
Jay Foad	8e5a41e827	Revert "AMDGPU: Override getNegatedExpression constant handling" This reverts commit 11c3cead23783e65fb30e673d62771352078ff05. It was causing infinite loops in the DAG combiner.	2023-02-16 17:11:32 +00:00
Jay Foad	9305b63d69	[AMDGPU] Add another G_UNMERGE_VALUES legalization test case	2023-02-16 16:45:35 +00:00
Florian Hahn	2ac85cd563	[AMDGPU] Regenerate check lines to enable updating for D144050.	2023-02-16 16:38:15 +00:00
Philip Reames	80abf86d50	Revert "[RISCV][CodeGen] Add codegen pattern for experimental zfa extension (FLI and FCVTMOD not included)" This reverts commit fc6d517e2f335c2ab2b14a34eb747a4703aca7e4. It was submitted without an appropriate patch description. Will reapply shortly.	2023-02-16 07:49:44 -08:00
David Green	7abe3497e7	[LSR] Improve filtered uses in NarrowSearchSpaceByPickingWinnerRegs NarrowSearchSpaceByPickingWinnerRegs has an aggressive filtering method to reduce the complexity of the search space down by picking a best formula with the highest number of reuses and assuming it will yield profitable reuse. In certain cases we can find a best formula like {X+30,+,1} and later check a formula like {X,+,1} with the same number of Uses. On some architectures it can be better to pick {X,+,1}, especially if an offset of 30 can be used as a legal addressing mode, but -30 cannot. That happens under Thumb1 code, which has fairly limited addressing modes. This patch adds a check to see if it can pick the simpler formula, if it looks more profitable. Differential Revision: https://reviews.llvm.org/D144014	2023-02-16 15:48:12 +00:00
David Green	66749ce927	[ARM] Add Thumb LSR codegen tests. NFC This is the same routine generated in two different ways that ends up with different orders to loads. The first currently does better than the second with ordered loads, but needn't if the filtering in LSR is improved.	2023-02-16 14:24:51 +00:00
Kerry McLaughlin	ba23bca0a8	[SME2][AArch64] Add multi-single multiply-add long long intrinsics Adds intrinsics for the following SME2 instructions: - smlall (1, 2 & 4 vectors) - umlall (1, 2 & 4 vectors) - smlsll (1, 2 & 4 vectors) - umlsll (1, 2 & 4 vectors) - sumlall (2 & 4 vectors) - usmlall (1, 2 & 4 vectors) NOTE: These intrinsics are still in development and are subject to future changes. Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D143276	2023-02-16 13:12:47 +00:00
Tim Northover	2002c82278	AArch64: count callee stack we use when estimating scavenging requirements.	2023-02-16 09:59:27 +00:00
Diana Picus	819dfc338b	[AMDGPU] Autogenerate checks for several tests. NFCI	2023-02-16 10:54:34 +01:00
Fangrui Song	f62b084e92	[LoopDeletion] Remove legacy pass Following recent changes to remove non-core legacy passes.	2023-02-15 23:31:05 -08:00
Xiang1 Zhang	96df79af02	[X86] Support load/store for bf16 in avx Reviewed By: LuoYuanke Differential Revision: https://reviews.llvm.org/D144163	2023-02-16 14:39:35 +08:00
Jun Sha (Joshua)	fc6d517e2f	[RISCV][CodeGen] Add codegen pattern for experimental zfa extension (FLI and FCVTMOD not included)	2023-02-16 13:41:41 +08:00
Noah Goldstein	8bd0e9481c	Revert "Transform vector SET{LE/ULT/ULE} -> SETLT and SET{GE/UGT/UGE} -> SETGT if possible" This reverts commit f3732c2b18df305a1927b9d4a94610421a2750e7.	2023-02-15 16:33:38 -06:00
Markus Böck	d464edde95	[X86][Win64] Avoid statepoints prior to SEH epilogue This patch's purpose is very similar to https://reviews.llvm.org/D119644 The gist of the issue is that SEH unwinding has certain invariants around call instructions. One of those is that a call instruction must not be immediately followed by the function epilogue. Failing to do so leads to Windows' unwinder not recognizing the frame and skipping it when unwinding the stack. LLVM ensures this invariant by inserting a noop after a call prior to an epilogue. The implementation, however, makes the unfortunate assumption that pseudo instructions may not be calls, leading to statepoints being skipped and no noop being inserted. This patch fixes that issue by only skipping over pseudo instructions that aren't calls. Differential Revision: https://reviews.llvm.org/D143812	2023-02-15 22:47:28 +01:00
Fangrui Song	6f3e6a765a	Revert D129735 "[RISCV] Add new pass to transform undef to pseudo for vector values." This reverts commit f1c4241fb6e50c507adafbe14faf82a755ab92ca. It causes use-after-poison asan failures for CodeGen/RISCV/rvv/undef-earlyclobber-chain.ll and CodeGen/RISCV/regalloc-last-chance-recoloring-failure.ll	2023-02-15 11:51:08 -08:00
David Green	8a7b5e0e50	[AArch64] Guard extra uses in mls combine. This is a small extension to D143143 to ensure that nodes with multiple uses do not get transformed. The tests have also been extended to include more mla cases.	2023-02-15 18:36:46 +00:00
David Green	b0bfbad19b	[AArch64] Always lower fp16 zero to FMOVH0 We can always use FMOVH0 to lower fp16 zero, even without fullfp16. We can either expand it to movi d0, #0 or fmov s0, wzr, which will both clear all the bits of the register. Differential Revision: https://reviews.llvm.org/D143988	2023-02-15 16:06:32 +00:00
David Spickett	93164dba08	[llvm][AArch64] Fix BTI after returns_twice when call has no attributes Previously we were looking for the returns twice attribute by manually getting the function attributes from the call. This meant that we only found attributes on the call itself, not what it was calling. So if you had: %call1 = call i32 @setjmp(ptr noundef null) We would not BTI protect that even though setjmp clearly needs it. Clang happens to produce: %call = call i32 @setjmp(ptr noundef null) #0 ; returns_twice So all valid calls were protected. This is not guaranteed, the frontend may choose not to put attributes on the call. It is undefined behaviour to call setjmp indirectly (https://pubs.opengroup.org/onlinepubs/9699919799/functions/setjmp.html) but as I misused the APIs here I think it's worth fixing up regardless. Added comments to the test file where the IR differs from what clang would output. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D144082	2023-02-15 15:30:37 +00:00
Simon Pilgrim	b0e7ca79ab	[X86] Remove abs(sub_nsw()) -> abds fold from abdu test file Copy+paste typo - it was correctly removed from 128/512 variants	2023-02-15 12:49:28 +00:00

47063 Commits