llvm-project

Author	SHA1	Message	Date
Dávid Bolvanský	fb65aaf0be	[NFCI] Fixed missing colon in CHECK directives - part 2	2022-04-03 14:42:59 +02:00
Sanjay Patel	ec0b332cd8	[AArch64] add tests for funnel+or == 0; NFC These are copied from x86 ( 1074bdfb52b2e1753e51472 ) to provide more coverage for a potential generic combine.	2022-04-01 13:39:25 -04:00
Nicholas Guy	7d676714fb	[AArch64] Set MaxBytesForLoopAlignment for more targets Differential Revision: https://reviews.llvm.org/D122566	2022-03-31 11:37:11 +01:00
Sanjay Patel	e18cc5277f	[SDAG] try to canonicalize logical shift after bswap When shifting by a byte-multiple: bswap (shl X, C) --> lshr (bswap X), C bswap (lshr X, C) --> shl (bswap X), C This is the backend version of D122010 and an alternative suggested in D120648. There's an extra check to make sure the shift amount is valid that was not in the rough draft. I'm not sure if there is a larger motivating case for RISCV (bug report?), but the ARM diffs show a benefit from having a late version of the transform (because we do not combine the loads in IR). Differential Revision: https://reviews.llvm.org/D122655	2022-03-30 09:29:32 -04:00
Eli Friedman	a8ebd85e46	[MC] Make MCAsmInfo::isAcceptableChar reflect MCAsmInfo::doesAllowAtInName On targets which don't allow "@" in unquoted identifiers, make sure we don't emit them; otherwise, we can't parse our own output. Differential Revision: https://reviews.llvm.org/D122516	2022-03-29 14:01:32 -07:00
David Green	60f57b3658	[AArch64] Ensure fixed point fptoi_sat has correct saturation width D113200 introduced an error where it was converting FP_TO_SI_SAT with multiply to a fixed point floating point convert. The saturation bitwidth needs to be equal to the floating point width, or else the routine would truncate the result as opposed to saturating it. Fixes #54601	2022-03-29 10:12:44 +01:00
zhongyunde	2b3becb41d	[AArch64][GlobalISel] Add new MOVI pattern for fp constants GlobalISel is used in option -O0, so add MOVI pattern for it, which is done similar in gcc.(https://godbolt.org/z/8j6fzG3h6) Fix https://github.com/llvm/llvm-project/issues/53651 Reviewed By: dmgreen, paquette Differential Revision: https://reviews.llvm.org/D122559	2022-03-29 10:57:22 +08:00
zhongyunde	c3fe025bd4	[AArch64][SelectionDAG] Refactor to support more scalable vector extending loads According to the discussion in D120953, we should first exclude all scalable vector extending loads and then selectively enable those which we directly support. This patch is intended to do the refactoring above (truncating stores are not touched), and more scalable vector types will try to reduce the number of masked loads in favour of more unpklo/hi instructions. Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D122281	2022-03-27 21:18:01 +08:00
David Green	693d3b7e76	[AArch64] Lower 3 and 4 sources buildvectors to TBL The default expansion for buildvectors is to extract each element and insert them into a new vector. That involves a lot of copying to/from the GPR registers. TBL3 and TBL4 can be relatively slow instructions with the mask needing to be loaded from a constant pool, but they should always be better than all the moves to/from GPRs. Differential Revision: https://reviews.llvm.org/D121137	2022-03-26 21:10:43 +00:00
zhongyunde	758be63ac6	[test][AArch64] Add a test case for D121180 NFC Now, perform last active true vector combine only where we're extracting from a flag-setting operation. But in fact, the last active extracting will output LASTB + WHILELS, and the WHILELS itself is a flag-setting operation, so precommit this case to test the potential further optimization. Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D122453	2022-03-26 19:12:16 +08:00
David Green	3d8d60e147	Revert "[AArch64] Lower 3 and 4 sources buildvectors to TBL" This reverts commit ec93b28909749619dbe58b092a13da9d1ff1eb1e as problems with it have been reported.	2022-03-25 10:03:10 +00:00
Momchil Velikov	50a97aacac	[AArch64] Async unwind - function prologues Re-commit of 32e8b550e5439c7e4aafa73894faffd5f25d0d05 This patch rearranges emission of CFI instructions, so the resulting DWARF and `.eh_frame` information is precise at every instruction. The current state is that the unwind info is emitted only after the function prologue. This is fine for synchronous (e.g. C++) exceptions, but the information is generally incorrect when the program counter is at an instruction in the prologue or the epilogue, for example: ``` stp x29, x30, [sp, #-16]! // 16-byte Folded Spill mov x29, sp .cfi_def_cfa w29, 16 ... ``` after the `stp` is executed the (initial) rule for the CFA still says the CFA is in the `sp`, even though it's already offset by 16 bytes A correct unwind info could look like: ``` stp x29, x30, [sp, #-16]! // 16-byte Folded Spill .cfi_def_cfa_offset 16 mov x29, sp .cfi_def_cfa w29, 16 ... ``` Having this information precise up to an instruction is useful for sampling profilers that would like to get a stack backtrace. The end goal (towards this patch is just a step) is to have fully working `-fasynchronous-unwind-tables`. Reviewed By: danielkiss, MaskRay Differential Revision: https://reviews.llvm.org/D111411	2022-03-24 16:16:44 +00:00
David Green	ec93b28909	[AArch64] Lower 3 and 4 sources buildvectors to TBL The default expansion for buildvectors is to extract each element and insert them into a new vector. That involves a lot of copying to/from the GPR registers. TBL3 and TBL4 can be relatively slow instructions with the mask needing to be loaded from a constant pool, but they should always be better than all the moves to/from GPRs. Differential Revision: https://reviews.llvm.org/D121137	2022-03-24 10:02:33 +00:00
David Green	311bdbc9b7	[AArch64] Add tests showing inefficient TBL3/4 generation. NFC	2022-03-23 16:43:23 +00:00
David Spickett	c3b98194df	Reland "[llvm][AArch64] Insert "bti j" after call to setjmp" This reverts commit edb7ba714acba1d18a20d9f4986d2e38aee1d109. This changes BLR_BTI to take variable_ops meaning that we can accept a register or a label. The pattern still expects one argument so we'll never get more than one. Then later we can check the type of the operand to choose BL or BLR to emit. (this is what BLR_RVMARKER does but I missed this detail of it first time around) Also require NoSLSBLRMitigation which I missed in the first version.	2022-03-23 11:43:43 +00:00
David Spickett	edb7ba714a	Revert "[llvm][AArch64] Insert "bti j" after call to setjmp" This reverts commit eb5ecbbcbb6ce38e29237ab5d17156fcb2e96e74 due to failures on buildbots with expensive checks enabled.	2022-03-23 10:43:20 +00:00
David Spickett	eb5ecbbcbb	[llvm][AArch64] Insert "bti j" after call to setjmp Some implementations of setjmp will end with a br instead of a ret. This means that the next instruction after a call to setjmp must be a "bti j" (j for jump) to make this work when branch target identification is enabled. The BTI extension was added in armv8.5-a but the bti instruction is in the hint space. This means we can emit it for any architecture version as long as branch target enforcement flags are passed. The starting point for the hint number is 32 then call adds 2, jump adds 4. Hence "hint #36" for a "bti j" (and "hint #34" for the "bti c" you see at the start of functions). The existing Arm command line option -mno-bti-at-return-twice has been applied to AArch64 as well. Support is added to SelectionDAG Isel and GlobalIsel. FastIsel will defer to SelectionDAG. Based on the change done for M profile Arm in https://reviews.llvm.org/D112427 Fixes #48888 Reviewed By: danielkiss Differential Revision: https://reviews.llvm.org/D121707	2022-03-23 09:51:02 +00:00
zhongyunde	828b89bc0b	[AArch64][SelectionDAG] Supports unpklo/hi instructions to reduce the number of loads Trying to reduce the number of masked loads in favour of more unpklo/hi instructions. Both ISD::ZEXTLOAD and ISD::SEXTLOAD are supported to extensions from legal types. Both of normal and masked loads test cases added to guard compile crash. Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D120953	2022-03-21 23:47:33 +08:00
chenglin.bi	dd3b90e4d7	[AArch64] Combine ISD::SETCC into AArch64ISD::ANDS When N > 12, (2^N -1) is not a legal add immediate (isLegalAddImmediate will return false). And if the SetCC input uses this number, the DAG combiner will generate one more SRL instruction. So combine [setcc (srl x, imm), 0, ne] to [setcc (and x, (-1 << imm)), 0, ne] to get better optimization in emitComparison Fix https://github.com/llvm/llvm-project/issues/54283 Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D121449	2022-03-19 13:04:16 +00:00
Paul Walker	f46fe36d59	[AArch64] Fix incorrect getSetCCInverse usage within trySwapVSelectOperands. When inverting the compare predicate trySwapVSelectOperands is incorrectly using the type of the select's cond operand rather than the type of cond's operands. This means we're treating all inversions as if they're integer. Differential Revision: https://reviews.llvm.org/D121968	2022-03-19 12:36:14 +00:00
David Green	fe6057a293	[AArch64] Custom lower concat(v4i8 load, ...) We already have custom lowering for v4i8 load, which loads as a f32, converts to a vector and bitcasts and extends the result to a v4i16. This adds some custom lowering of concat(v4i8 load, ...) to keep the result as an f32 and create a buildvector of the resulting f32 loads. This helps not create all the extends and bitcasts, which are often difficult to fully clean up. Differential Revision: https://reviews.llvm.org/D121400	2022-03-18 11:58:02 +00:00
David Green	0fa4aeb453	[AArch64] Add extra insert-subvector tests. NFC	2022-03-17 15:29:07 +00:00
David Green	0b6df40c52	[AArch64] Combine ISD::AND into AArch64ISD::ANDS If we already have an AArch64ISD::ANDS node with identical operands, we can merge any ISD::AND into it, reducing the instruction count by calculating the value and the flags in a single operation. This code is taken from the X86 backend, and could also handle AArch64ISD::ADDS and AArch64ISD::SUBS, but I couldn't find any test cases where it came up. Differential Revision: https://reviews.llvm.org/D118584	2022-03-17 09:44:11 +00:00
David Green	09a2b5b506	[AArch64] Regenerate and extend peephole-and-tst.ll tests. NFC	2022-03-16 09:44:20 +00:00
Matthias Gehre	09854f2af3	[SelectionDAG] Emit calls to __divei4 and friends for division/remainder of large integers Emit calls to __divei4 and friends for division/remainder of large integers. This fixes https://github.com/llvm/llvm-project/issues/44994. The overall RFC is in https://discourse.llvm.org/t/rfc-add-support-for-division-of-large-bitint-builtins-selectiondag-globalisel-clang/60329 The compiler-rt part is in https://reviews.llvm.org/D120327 Differential Revision: https://reviews.llvm.org/D120329	2022-03-16 09:36:28 +00:00
Amara Emerson	8cbf18cb04	[GlobalISel] Fix store merging incorrectly merging volatile stores. The existing volatile checks only handle aliasing hazards between stores, but that isn't enough since by that point volatile stores may have already been added to the current candidate group.	2022-03-14 13:48:51 -07:00
Florian Mayer	628c537b32	[MTE] Add test that stack tagging does not mess up stack coloring. Reviewed By: eugenis Differential Revision: https://reviews.llvm.org/D121433	2022-03-14 13:36:21 -07:00
Mircea Trofin	294eca35a0	[regalloc] Remove -consider-local-interval-cost Discussed extensively on D98232. The functionality introduced in D35816 never worked correctly. In D98232, it was fixed, but, as it was introducing a large compile-time regression, and the value of the original patch was called into doubt, we disabled it by default everywhere. A year later, it appears that caused no grief, so it seems safe to remove the disabled code. This should be accompanied by re-opening bug 26810. Differential Revision: https://reviews.llvm.org/D121128	2022-03-14 10:49:16 -07:00
zhongyunde	3568333815	[AArch64] Perform last active true vector combine Test bit of lane EC-1 can use P register directly, eg: Materialize : Idx = (add (mul vscale, NumEls), -1) i1 = extract_vector_elt t37, Constant:i64<Idx> ... into: "ptrue p, all" + PTEST Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D121180	2022-03-15 01:25:03 +08:00
Arthur Eubanks	250620f76e	[OpaquePtr][AArch64] Use elementtype on ldxr/stxr Includes verifier changes checking the elementtype, clang codegen changes to emit the elementtype, and ISel changes using the elementtype. Reviewed By: #opaque-pointers, nikic Differential Revision: https://reviews.llvm.org/D120527	2022-03-14 10:09:59 -07:00
Sanjay Patel	c2592c374e	[SDAG] simplify bitwise logic with repeated operand We do not have general reassociation here (and probably do not need it), but I noticed these were missing in patches/tests motivated by D111530, so we can at least handle the simplest patterns. The VE test diff looks correct, but we miss that pattern in IR currently: https://alive2.llvm.org/ce/z/u66_PM	2022-03-13 11:12:30 -04:00
Sanjay Patel	9f4caf55db	[AArch64] add tests for bitwise logic reassociation; NFC Chooses from a variety of scalar/vector/illegal types because that should not inhibit any folds.	2022-03-13 11:12:30 -04:00
David Sherwood	aeeb1199b4	[AArch64][SVE] Change the asserts in LowerToPredicatedOp to check for legal types When building the LLVM test suite with SVE I discovered a crash when compiling some Halide tests, which occurs because we try to use SVE to lower 64-bit vector multiplies and there is no vscale_range attribute on the function. In this case the min SVE vector bits was 0, which caused an assert in LowerToPredicatedOp to fire. I have amended the asserts in this function to check that the fixed-width type is legal. If the fixed-width type is larger than NEON and is legal then it must be because we've set the min SVE vector bits to something > 128. Or if the min SVE bits is 0, then the only legal types allowed are 128 bit types - for any other types the assert will fire. Tests added here: CodeGen/AArch64/sve-fixed-length-no-vscale-range.ll Differential Revision: https://reviews.llvm.org/D121297	2022-03-11 09:57:58 +00:00
Philippe Valembois	26cd258420	[AArch64] Use correct calling convention for each vararg While checking if tail call optimization is possible, the calling convention applied to fixed arguments is not the correct one. This implies for DarwinPCS that all arguments of a vararg function will go to the stack although fixed ones can go in registers. This prevents non-virtual thunks from being tail optimized even though they are marked as musttail. Differential Revision: https://reviews.llvm.org/D120622	2022-03-10 15:07:25 -08:00
David Green	21a97a2ac1	[AArch64] TBL uses zero for out of range elements. A TBL instruction will use zero for any out of range values. We can use this in GenerateTBL to help turn a TBL2 into a TBL1, avoiding the need to materialise the zero. Differential Revision: https://reviews.llvm.org/D121139	2022-03-10 14:45:13 +00:00
David Green	43591be2aa	[AArch64] Extra tests for tbl with zero elements. NFC	2022-03-10 13:51:04 +00:00
Xiang1 Zhang	c31014322c	TLS loads optimization (hoist) Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D120000	2022-03-10 09:29:06 +08:00
Saleem Abdulrasool	c31f0a0050	AArch64: correct epilogue/prologue emission for swift async The prologue and epilogue emission were unbalanced in light of different strategies of async frame context emission. Adjust the epilogue emission to match the prologue emission. This makes the elision work properly as well as the deployment based. Due to the fact that the epilogue always was clearing a bit (which should not be set in the first place), the client would not notice the behavioural issue unless the deployment version was in effect.	2022-03-09 18:41:10 +00:00
Sanjay Patel	341623653d	[SDAG] match rotate pattern with extra 'or' operation This is another fold generalized from D111530. We can find a common source for a rotate operation hidden inside an 'or': https://alive2.llvm.org/ce/z/9pV8hn Deciding when this is profitable vs. a funnel-shift is tricky, but this does not show any regressions: if a target has a rotate but it does not have a funnel-shift, then try to form the rotate here. That is why we don't have x86 test diffs for the scalar tests that are duplicated from AArch64 ( 74a65e3834d9487 ) - shld/shrd are available. That also makes it difficult to show vector diffs - the only case where I found a diff was on x86 AVX512 or XOP with i64 elements. There's an additional check for a legal type to avoid a problem seen with x86-32 where we form a 64-bit rotate but then it gets split inefficiently. We might avoid that by adding more rotate folds, but I didn't check to see what is missing on that path. This gets most of the motivating patterns for AArch64 / ARM that are in D111530. We still need a couple of enhancements to setcc pattern matching with rotate/funnel-shift to get the rest. Differential Revision: https://reviews.llvm.org/D120933	2022-03-09 13:19:00 -05:00
Florian Hahn	3836003e87	[AArch64] Add test for D120481 with multiple uses.	2022-03-08 11:11:03 +00:00
zhongyunde	c22c8b151b	[AArch64] Perform first active true vector combine Materialize : i1 = extract_vector_elt t37, Constant:i64<0> ... into: "ptrue p, all" + PTEST Test bit of lane 0 can use P register directly, and the instruction “ptrue all” is loop invariant, which will be beneficial to SVE after it is hoisted out of the loop. Reviewed By: david-arm, paulwalker-arm Differential Revision: https://reviews.llvm.org/D120891	2022-03-08 01:10:21 +08:00
David Green	d9633d1490	[AArch64] Turn truncating buildvectors into truncates When lowering large v16f32->v16i8 fp_to_si_sat, the fp_to_si_sat node is split several times, creating an illegal v4i8 concat that gets expanded into a BUILD_VECTOR. After some combining and other legalisation, it ends up as a buildvector that extracts from 4 vectors, looking like BUILDVECTOR(a0,a1,a2,a3,b0,b1,b2,b3,c0,c1,c2,c3,d0,d1,d2,d3). That is really a v16i32->v16i8 truncate in disguise. This adds a ReconstructTruncateFromBuildVector method to detect the pattern, converting it back into the legal "concat(trunc(concat(trunc(a), trunc(b))), trunc(concat(trunc(c), trunc(d))))" tree. The extracted nodes could also be v4i16, in which case the truncates are not needed. All those truncates and concats then become uzip1's, which is much better than expanding by moving vector lanes around. Differential Revision: https://reviews.llvm.org/D119469	2022-03-07 09:42:54 +00:00
David Green	4388f4f776	[DAG] Don't convert undef to 0 when creating buildvector When inserting undef into buildvectors created from shuffles of buildvectors, we convert elements to the largest needed type. This had the effect of converting undef into 0, which isn't needed as the buildvector implicitly truncates and trunc(zext(undef)) == undef. Differential Revision: https://reviews.llvm.org/D121002	2022-03-06 18:35:34 +00:00
David Green	84ccd015e7	[AArch64] Some tests to show reconstructing truncates. NFC	2022-03-05 18:35:43 +00:00
Karl Meakin	1d8093fe1e	[AArch64] fix i128-math.ll	2022-03-05 17:51:58 +00:00
Karl Meakin	f3e254b3f3	[AArch64] Add test for i128 overflow/saturation ops (NFC) This test exposes opportunities for future optimization work Differential Revision: https://reviews.llvm.org/D121013	2022-03-05 17:25:04 +00:00
Sanjay Patel	f4b53972ce	[SDAG] fold bitwise logic with shifted operands This extends acb96ffd149d to 'and' and 'xor' opcodes. Copying from that message: LOGIC (LOGIC (SH X0, Y), Z), (SH X1, Y) --> LOGIC (SH (LOGIC X0, X1), Y), Z https://alive2.llvm.org/ce/z/QmR9rR This is a reassociation + factoring fold. The common shift operation is moved after a bitwise logic op on 2 input operands. We get simpler cases of these patterns in IR, but I suspect we would miss all of these exact tests in IR too. We also handle the simpler form of this plus several other folds in DAGCombiner::hoistLogicOpWithSameOpcodeHands().	2022-03-05 11:14:45 -05:00
Sanjay Patel	90c2330c15	[AArch64][x86] add tests for bitwise logic + shifts; NFC Copy tests from ecf606cb4329ae and replace 'or' with 'xor' / 'and'. This provides coverage for an enhancement of D120516 / acb96ffd149d	2022-03-05 11:14:45 -05:00
Hans Wennborg	85c53c7092	Revert "[AArch64] Async unwind - function prologues" It caused builds to assert with: (StackSize == 0 && "We already have the CFA offset!"), function generateCompactUnwindEncoding, file AArch64AsmBackend.cpp, line 624. when targeting iOS. See comment on the code review for reproducer. > This patch rearranges emission of CFI instructions, so the resulting > DWARF and `.eh_frame` information is precise at every instruction. > > The current state is that the unwind info is emitted only after the > function prologue. This is fine for synchronous (e.g. C++) exceptions, > but the information is generally incorrect when the program counter is > at an instruction in the prologue or the epilogue, for example: > > ``` > stp x29, x30, [sp, #-16]! // 16-byte Folded Spill > mov x29, sp > .cfi_def_cfa w29, 16 > ... > ``` > > after the `stp` is executed the (initial) rule for the CFA still says > the CFA is in the `sp`, even though it's already offset by 16 bytes > > A correct unwind info could look like: > ``` > stp x29, x30, [sp, #-16]! // 16-byte Folded Spill > .cfi_def_cfa_offset 16 > mov x29, sp > .cfi_def_cfa w29, 16 > ... > ``` > > Having this information precise up to an instruction is useful for > sampling profilers that would like to get a stack backtrace. The end > goal (towards this patch is just a step) is to have fully working > `-fasynchronous-unwind-tables`. > > Reviewed By: danielkiss, MaskRay > > Differential Revision: https://reviews.llvm.org/D111411 This reverts commit 32e8b550e5439c7e4aafa73894faffd5f25d0d05.	2022-03-04 17:36:26 +01:00
zhongyunde	7a605ab7bf	[AArch64] Use simd mov to materialize big fp constants mov w8, #1325400064 + fmov s0, w8 ==> movi v0.2s, 0x4f, lsl 24 Fix https://github.com/llvm/llvm-project/issues/53651 Reviewed By: dmgreen, fhahn Differential Revision: https://reviews.llvm.org/D120452	2022-03-04 11:34:20 -05:00

5451 Commits
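Commit e18cc5277f rewrites bswap (shl X, C) as lshr (bswap X), C and bswap (lshr X, C) as shl (bswap X), C when C is a byte multiple. The following is a minimal Python sketch, not LLVM code, that models 32-bit bswap and shifts on plain integers to check both identities. The 32-bit width and the helper names (`bswap32`, `shl32`, `lshr32`, `check_bswap_shift`) are assumptions made for illustration.

```python
# Hypothetical helpers modelling 32-bit bswap/shl/lshr on Python ints.
MASK32 = 0xFFFFFFFF

def bswap32(x):
    """Reverse the byte order of a 32-bit value."""
    return int.from_bytes((x & MASK32).to_bytes(4, "little"), "big")

def shl32(x, c):
    return (x << c) & MASK32

def lshr32(x, c):
    return (x & MASK32) >> c

def check_bswap_shift(x, c):
    """Check both rewrites for value x and shift amount c."""
    # The identities are only claimed for byte-multiple shifts below the width.
    if c % 8 != 0 or not 0 <= c < 32:
        raise ValueError("shift must be a multiple of 8 and less than 32")
    rule1 = bswap32(shl32(x, c)) == lshr32(bswap32(x), c)  # bswap (shl X, C)
    rule2 = bswap32(lshr32(x, c)) == shl32(bswap32(x), c)  # bswap (lshr X, C)
    return rule1 and rule2

print(hex(bswap32(0x11223344)))  # 0x44332211
print(all(check_bswap_shift(x, c)
          for x in (0, 1, 0x11223344, MASK32)
          for c in (0, 8, 16, 24)))  # True
```

With a 4-bit shift the rewrite breaks: bswap32(shl32(1, 4)) is 0x10000000 while lshr32(bswap32(1), 4) is 0x00100000, which is why the fold is restricted to byte multiples.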
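Commit 60f57b3658 notes that when the saturation width differs from the conversion width, the result ends up truncated instead of saturated. The Python sketch below illustrates that difference on a scalar. It is not the AArch64 lowering, and the helpers `fptosi_sat` and `trunc_signed` are hypothetical; `fptosi_sat` follows the `llvm.fptosi.sat` semantics of rounding toward zero, clamping to the signed range, and mapping NaN to 0.

```python
import math

def fptosi_sat(x, bits):
    """Round x toward zero and clamp to the signed `bits`-bit range; NaN -> 0."""
    if math.isnan(x):
        return 0
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    if x <= lo:
        return lo  # also covers -inf
    if x >= hi:
        return hi  # also covers +inf
    return int(x)  # int() truncates toward zero

def trunc_signed(v, bits):
    """Keep the low `bits` bits of v and reinterpret them as a signed value."""
    v &= (1 << bits) - 1
    return v - (1 << bits) if v >= 1 << (bits - 1) else v

x = 100000.0
print(fptosi_sat(x, 16))                    # 32767: saturated to i16
print(trunc_signed(fptosi_sat(x, 32), 16))  # -31072: saturated to i32, then truncated
```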
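Commit eb5ecbbcbb derives the BTI encodings from the hint space: start at 32, add 2 for "call" and 4 for "jump". The arithmetic can be sketched in a few lines of Python; the constant and function names are made up for illustration.

```python
# Hypothetical names; the values come from the eb5ecbbcbb commit message.
BTI_HINT_BASE = 32
BTI_CALL = 2  # the "c" target
BTI_JUMP = 4  # the "j" target

def bti_hint(targets):
    """Return the HINT immediate for a BTI with the given target letters."""
    imm = BTI_HINT_BASE
    if "c" in targets:
        imm += BTI_CALL
    if "j" in targets:
        imm += BTI_JUMP
    return imm

print(f"bti c -> hint #{bti_hint('c')}")  # bti c -> hint #34
print(f"bti j -> hint #{bti_hint('j')}")  # bti j -> hint #36
```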
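Commit dd3b90e4d7 relies on two facts: for an unsigned x, (x >> imm) != 0 exactly when (x & (-1 << imm)) != 0, and 2^N - 1 cannot be an add immediate once N > 12. The Python sketch below checks both on 64-bit values. `fits_add_imm` is a simplified model (an unsigned 12-bit value, optionally shifted left by 12) and is an assumption, not a copy of `isLegalAddImmediate`.

```python
MASK64 = (1 << 64) - 1

def srl_ne_zero(x, imm):
    """setcc (srl x, imm), 0, ne"""
    return (x >> imm) != 0

def and_ne_zero(x, imm):
    """setcc (and x, (-1 << imm)), 0, ne, with -1 << imm taken modulo 2^64"""
    return (x & ((MASK64 << imm) & MASK64)) != 0

def fits_add_imm(c):
    """Simplified model: unsigned 12-bit immediate, optionally shifted left by 12."""
    return c < (1 << 12) or ((c & 0xFFF) == 0 and (c >> 12) < (1 << 12))

samples = [0, 1, 0xFFF, 0x1000, 0x8000_0000_0000_0000, MASK64]
print(all(srl_ne_zero(x, i) == and_ne_zero(x, i)
          for x in samples for i in range(64)))                  # True
print([n for n in range(1, 64) if not fits_add_imm((1 << n) - 1)][:3])  # [13, 14, 15]
```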
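Commit 21a97a2ac1 uses the fact that TBL writes zero for any out-of-range index, so a TBL2 whose second table is all zeros can become a TBL1. The Python sketch below models TBL at the byte level to show why. The function `tbl` and the sample vectors are hypothetical and only model the instruction's lookup semantics.

```python
def tbl(tables, indices):
    """Byte-level TBL model: look indices up in the concatenated table registers.

    Each table register holds 16 bytes (TBL1..TBL4 use 16..64 bytes); an index
    past the end of the concatenated table yields 0.
    """
    table = [b for reg in tables for b in reg]
    return [table[i] if i < len(table) else 0 for i in indices]

src = list(range(100, 116))  # one 16-byte table register
zero = [0] * 16              # an explicitly materialised zero vector
idx = [0, 17, 2, 31, 4, 5, 20, 7, 8, 9, 10, 11, 12, 13, 14, 15]

# Indices 16..31 read the zero register in TBL2 and are out of range in TBL1,
# so both produce 0 for those lanes and the results agree.
print(tbl([src, zero], idx) == tbl([src], idx))  # True
```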
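Commit d9633d1490 recognises BUILDVECTOR(a0..a3, b0..b3, c0..c3, d0..d3) as a v16i32->v16i8 truncate. The Python sketch below checks that the buildvector, the commit's concat/trunc tree, and a UZP1-based form (the commit's "uzip1") all agree on little-endian lanes. The helper names and sample values are hypothetical.

```python
def trunc(v, bits):
    """Truncate every lane of v to its low `bits` bits."""
    return [x & ((1 << bits) - 1) for x in v]

def le_bytes16(v):
    """Little-endian byte view of a vector of i16 lanes."""
    return [b for x in v for b in (x & 0xFF, (x >> 8) & 0xFF)]

def uzp1(p, q):
    """UZP1: the even-numbered elements of p followed by those of q."""
    return p[0::2] + q[0::2]

a = [0x11111101, 0x22222202, 0x33333303, 0x44444404]
b = [0x55555505, 0x66666606, 0x77777707, 0x88888808]
c = [0x99999909, 0xAAAAAA0A, 0xBBBBBB0B, 0xCCCCCC0C]
d = [0xDDDDDD0D, 0xEEEEEE0E, 0xFFFFFF0F, 0x12345610]

# BUILDVECTOR(a0..a3, b0..b3, c0..c3, d0..d3) with each lane narrowed to i8.
build_vector = [x & 0xFF for x in a + b + c + d]

# concat(trunc(concat(trunc(a), trunc(b))), trunc(concat(trunc(c), trunc(d))))
ab16 = trunc(a, 16) + trunc(b, 16)
cd16 = trunc(c, 16) + trunc(d, 16)
tree = trunc(ab16, 8) + trunc(cd16, 8)

# Truncating little-endian i16 lanes to i8 keeps the even bytes, which is UZP1.
via_uzp1 = uzp1(le_bytes16(ab16), le_bytes16(cd16))

print(build_vector == tree == via_uzp1)  # True
```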
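Commit f4b53972ce applies LOGIC (LOGIC (SH X0, Y), Z), (SH X1, Y) --> LOGIC (SH (LOGIC X0, X1), Y), Z to 'and' and 'xor'. The rewrite holds because a shift distributes over a bitwise op and the op is associative and commutative. The Python sketch below randomly tests it for and/or/xor with 32-bit shl and lshr. It is a spot check under those assumptions, not a proof (the commit links an Alive2 proof), and the function names are hypothetical.

```python
import operator
import random

MASK32 = 0xFFFFFFFF
LOGIC = {"and": operator.and_, "or": operator.or_, "xor": operator.xor}
SHIFTS = {
    "shl": lambda x, y: (x << y) & MASK32,  # 32-bit shift left
    "lshr": lambda x, y: x >> y,            # logical shift right of a 32-bit value
}

def before(logic, sh, x0, x1, y, z):
    # LOGIC (LOGIC (SH X0, Y), Z), (SH X1, Y)
    return logic(logic(sh(x0, y), z), sh(x1, y))

def after(logic, sh, x0, x1, y, z):
    # LOGIC (SH (LOGIC X0, X1), Y), Z
    return logic(sh(logic(x0, x1), y), z)

def check(trials=1000, seed=0):
    """Randomly test the rewrite for every logic op and shift kind."""
    rng = random.Random(seed)
    for logic in LOGIC.values():
        for sh in SHIFTS.values():
            for _ in range(trials):
                x0, x1, z = (rng.getrandbits(32) for _ in range(3))
                y = rng.randrange(32)
                if before(logic, sh, x0, x1, y, z) != after(logic, sh, x0, x1, y, z):
                    return False
    return True

print(check())  # True
```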
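Commit 7a605ab7bf replaces mov w8, #1325400064 + fmov s0, w8 with movi v0.2s, 0x4f, lsl 24. That works because 1325400064 is 0x4F000000, the single-precision bit pattern of 2^31, and is an 8-bit value shifted left by 24. The Python sketch below checks this. It models only the shifted-immediate (LSL) form of the 32-bit MOVI, and the helper names are hypothetical.

```python
import struct

def f32_bits(value):
    """IEEE-754 single-precision bit pattern of `value` as an unsigned int."""
    return struct.unpack("<I", struct.pack("<f", value))[0]

def movi_lsl_form(bits):
    """Return (imm8, shift) if bits == imm8 << shift for shift in {0, 8, 16, 24}."""
    for shift in (0, 8, 16, 24):
        if (bits & ~(0xFF << shift)) == 0:
            return bits >> shift, shift
    return None

bits = f32_bits(2147483648.0)  # 2^31
print(bits)                    # 1325400064, the constant in the commit's mov
imm8, shift = movi_lsl_form(bits)
print(f"movi v0.2s, {imm8:#x}, lsl {shift}")  # movi v0.2s, 0x4f, lsl 24
```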