llvm-project

Author	SHA1	Message	Date
Ard Biesheuvel	a19da876ab	[ARM] implement support for TLS register based stack protector Implement support for loading the stack canary from a memory location held in the TLS register, with an optional offset applied. This is used by the Linux kernel to implement per-task stack canaries, which is impossible on SMP systems when using a global variable for the stack canary. Reviewed By: nickdesaulniers Differential Revision: https://reviews.llvm.org/D112768	2021-11-09 18:19:47 +01:00
Ard Biesheuvel	2caf85ad7a	[ARM] implement LOAD_STACK_GUARD for remaining targets Currently, LOAD_STACK_GUARD on ARM is only implemented for Mach-O targets, and other targets rely on the generic support which may result in spilling of the stack canary value or address, or may cause it to be kept in a callee save register across function calls, which means they essentially get spilled as well, only by the callee when it wants to free up this register. So let's implement LOAD_STACK_GUARD for other targets as well. This ensures that the load of the stack canary is rematerialized fully in the epilogue. This code was split off from D112768: [ARM] implement support for TLS register based stack protector for which it is a prerequisite. Reviewed By: nickdesaulniers Differential Revision: https://reviews.llvm.org/D112811	2021-11-08 22:59:15 +01:00
Simon Pilgrim	4ed13275b7	[ARM] Precommit i128 test from D111530	2021-11-08 16:08:21 +00:00
Jay Foad	bdaa181007	[TwoAddressInstructionPass] Update existing physreg live intervals In TwoAddressInstructionPass::processTiedPairs with -early-live-intervals, update any preexisting physreg live intervals, as well as virtreg live intervals. By default (without -precompute-phys-liveness) physreg live intervals only exist for registers that are live-in to some basic block. Differential Revision: https://reviews.llvm.org/D113191	2021-11-05 21:20:30 +00:00
Jay Foad	0321bd64e6	Revert "[TwoAddressInstructionPass] Update existing physreg live intervals" This reverts commit ec0e1e88d24fadb2cb22f431d66b22ee1b01cd43. It was pushed by mistake.	2021-11-05 09:54:26 +00:00
Jay Foad	ec0e1e88d2	[TwoAddressInstructionPass] Update existing physreg live intervals In TwoAddressInstructionPass::processTiedPairs with -early-live-intervals, update any preexisting physreg live intervals, as well as virtreg live intervals. By default (without -precompute-phys-liveness) physreg live intervals only exist for registers that are live-in to some basic block. Differential Revision: https://reviews.llvm.org/D113191	2021-11-05 09:10:24 +00:00
David Green	091244023a	[ARM] Move VPTBlock pass after post-ra scheduling Currently when tail predicating loops, vpt blocks need to be created with the vctp predicate in case we need to revert to non-tail predicated form. This has the unfortunate side effect of severely hampering post-ra scheduling at times as the instructions are already stuck in vpt blocks, not allowed to be independently ordered. This patch addresses that by just moving the creation of VPT blocks later in the pipeline, after post-ra scheduling has been performed. This allows more optimal scheduling post-ra before the vpt blocks are created, leading to more optimal tail predicated loops. Differential Revision: https://reviews.llvm.org/D113094	2021-11-04 18:42:12 +00:00
Simon Pilgrim	a763d0010c	[ARM] Regenerate shift-combine.ll test checks	2021-11-04 14:27:31 +00:00
Simon Pilgrim	325031786e	[SelectionDAG] Optimize expansion for rotates/funnel shifts If the type of a funnel shift needs to be expanded, expand it to two funnel shifts instead of regular shifts. For constant shifts, this doesn't make much difference, but for variable shifts it allows a more optimal lowering. Also use the optimized funnel shift lowering for rotates. Alive2: https://alive2.llvm.org/ce/z/TvHDB- / https://alive2.llvm.org/ce/z/yzPept (Branched from D108058 as getting this completed should help unlock some other WIP patches). Original Patch: @efriedma (Eli Friedman) Differential Revision: https://reviews.llvm.org/D112443	2021-11-02 11:38:25 +00:00
Daniel Kiss	d8075e8781	Reland "[ARM] __cxa_end_cleanup should be called instead of _UnwindResume." This is relanding commit da1d1a08694bbfe0ea7a23ea094612436e8a2dd0 . This patch additionally addresses failures found in buildbots & post review comments. ARM EHABI[1] specifies the __cxa_end_cleanup to be called after cleanup. It will call the UnwindResume. __cxa_begin_cleanup will be called from libcxxabi while __cxa_end_cleanup is never called. This will trigger a termination when a foreign exception is processed while UnwindResume is called because the global state will be wrong due to the missing __cxa_end_cleanup call. Additional test here: D109856 [1] https://github.com/ARM-software/abi-aa/blob/main/ehabi32/ehabi32.rst#941compiler-helper-functions Reviewed By: logan Differential Revision: https://reviews.llvm.org/D111703	2021-10-28 21:45:09 +02:00
Daniel Kiss	66e03db814	Revert "Reland "[ARM] __cxa_end_cleanup should be called instead of _UnwindResume."" This reverts commit b6420e575f3bbb6b6df848c0284d6b60eeb07350.	2021-10-28 17:24:53 +02:00
Daniel Kiss	b6420e575f	Reland "[ARM] __cxa_end_cleanup should be called instead of _UnwindResume." This is relanding commit da1d1a08694bbfe0ea7a23ea094612436e8a2dd0 . This patch additionally addresses failures found in buildbots & post review comments. ARM EHABI[1] specifies the __cxa_end_cleanup to be called after cleanup. It will call the UnwindResume. __cxa_begin_cleanup will be called from libcxxabi while __cxa_end_cleanup is never called. This will trigger a termination when a foreign exception is processed while UnwindResume is called because the global state will be wrong due to the missing __cxa_end_cleanup call. Additional test here: D109856 [1] https://github.com/ARM-software/abi-aa/blob/main/ehabi32/ehabi32.rst#941compiler-helper-functions Reviewed By: logan Differential Revision: https://reviews.llvm.org/D111703	2021-10-28 16:49:19 +02:00
Max Kazantsev	8daf76935d	[Test] Regenerate some of llc test checks using auto updater	2021-10-28 16:18:30 +07:00
Ard Biesheuvel	d7e089f2d6	[ARM] Use hardware TLS register in Thumb2 mode when -mtp=cp15 is passed In ARM mode, passing -mtp=cp15 forces the use of an inline MRC system register read to move the thread pointer value into a register. Currently, in Thumb2 mode, -mtp=cp15 is ignored, and a call to the __aeabi_read_tp helper is emitted instead. This is inconsistent, and breaks the Linux/ARM build for Thumb2 targets, as the Linux kernel does not provide an implementation of __aeabi_read_tp. Reviewed By: nickdesaulniers, peter.smith Differential Revision: https://reviews.llvm.org/D112600	2021-10-27 16:42:11 -07:00
Daniel Kiss	894ddba1c9	Revert "[ARM] __cxa_end_cleanup should be called instead of _UnwindResume." This reverts commit da1d1a08694bbfe0ea7a23ea094612436e8a2dd0.	2021-10-27 14:29:35 +02:00
Daniel Kiss	da1d1a0869	[ARM] __cxa_end_cleanup should be called instead of _UnwindResume. ARM EHABI[1] specifies the __cxa_end_cleanup to be called after cleanup. It will call the UnwindResume. __cxa_begin_cleanup will be called from libcxxabi while __cxa_end_cleanup is never called. This will trigger a termination when a foreign exception is processed while UnwindResume is called because the global state will be wrong due to the missing __cxa_end_cleanup call. Additional test here: D109856 [1] https://github.com/ARM-software/abi-aa/blob/main/ehabi32/ehabi32.rst#941compiler-helper-functions Reviewed By: logan Differential Revision: https://reviews.llvm.org/D111703	2021-10-27 10:40:00 +02:00
Simon Pilgrim	d8e50c9dba	[CodeGen] Add PR50197 AArch64/ARM/X86 test coverage Pre-commit for D111530	2021-10-22 14:22:46 +01:00
Craig Topper	b75f3dd88e	[ARM] Use correct name of floating point ceil intrinsic in test. The intrinsic is called llvm.ceil not llvm.fceil. The checks weren't strong enough to notice that a call to llvm.fceil was emitted in the final assembly.	2021-10-20 17:30:26 -07:00
John Brawn	082fa56819	[ARM] Fix MOVCC peephole to not use an incorrect register class The MOVCC peephole eliminates a MOVCC by making one of its inputs a conditional instruction, but when doing this it should be using both inputs of the MOVCC to decide on the register class to use as otherwise we can get an error when using -verify-machineinstrs. Differential Revision: https://reviews.llvm.org/D111714	2021-10-15 10:54:26 +01:00
Andrew Savonichev	dc8a41de34	[ARM] Simplify address calculation for NEON load/store The patch attempts to optimize a sequence of SIMD loads from the same base pointer: %0 = gep float, float base, i32 4 %1 = bitcast float* %0 to <4 x float>* %2 = load <4 x float>, <4 x float>* %1 ... %n1 = gep float, float base, i32 N %n2 = bitcast float* %n1 to <4 x float>* %n3 = load <4 x float>, <4 x float>* %n2 For AArch64 the compiler generates a sequence of LDR Qt, [Xn, #16]. However, 32-bit NEON VLD1/VST1 lack the [Wn, #imm] addressing mode, so the address is computed before every ld/st instruction: add r2, r0, #32 add r0, r0, #16 vld1.32 {d18, d19}, [r2] vld1.32 {d22, d23}, [r0] This can be improved by computing address for the first load, and then using a post-indexed form of VLD1/VST1 to load the rest: add r0, r0, #16 vld1.32 {d18, d19}, [r0]! vld1.32 {d22, d23}, [r0] In order to do that, the patch adds more patterns to DAGCombine: - (load (add ptr inc1)) and (add ptr inc2) are now folded if inc1 and inc2 are constants. - (or ptr inc) is now recognized as a pointer increment if ptr is sufficiently aligned. In addition to that, we now search for all possible base updates and then pick the best one. Differential Revision: https://reviews.llvm.org/D108988	2021-10-14 15:23:10 +03:00
Guozhi Wei	6599961c17	[TwoAddressInstructionPass] Improve the SrcRegMap and DstRegMap computation This patch contains following enhancements to SrcRegMap and DstRegMap: 1 In findOnlyInterestingUse not only check if the Reg is two address usage, but also check after commutation can it be two address usage. 2 If a physical register is clobbered, remove SrcRegMap entries that are mapped to it. 3 In processTiedPairs, when create a new COPY instruction, add a SrcRegMap entry only when the COPY instruction is coalescable. (The COPY src is killed) With these enhancements isProfitableToCommute can do better commute decision, and finally more register copies are removed. Differential Revision: https://reviews.llvm.org/D108731	2021-10-11 15:28:31 -07:00
Qiu Chaofan	573531fb1f	Fix typo of colon to semicolon in lit tests	2021-10-09 10:03:50 +08:00
Pengxuan Zheng	b0045f5595	[ARM] Fix a bug in finding a pair of extracts to create VMOVRRD D100244 missed a check on the ResNo of the extract's operand 0 when finding a pair of extracts to combine into a VMOVRRD (extract(x, n); extract(x, n+1) -> VMOVRRD(extract x, n/2)). As a result, it can incorrectly pair an extract(x, n) with another extract(x:3, n+1) for example. This patch fixes the bug by adding the proper check on ResNo. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D111188	2021-10-06 10:03:32 -07:00
David Green	ffaaa9b05c	[ARM] Reset speculation-hardening-sls.ll test checks. The commit e497b12a69604b6d691312a30f6b86da4f18f7f8 went and regenerated all the checks lines in the Arm speculation-hardening-sls.ll test in a way that removed most of the important checks. This just resets them back to how they were before, with the single character fix to change: ; NOHARDENARM: {{bxge lr$}} to ; NOHARDENARM: {{bxgt lr$}} Differential Revision: https://reviews.llvm.org/D111074	2021-10-05 10:51:18 +01:00
Amara Emerson	8bde5e58c0	Delay outgoing register assignments to last. The delayed stack protector feature which is currently used for SDAG (and thus allows for more commonly generating tail calls) depends on being able to extract the tail call into a separate return block. To do this it also has to extract the vreg->physreg copies that set up the call's arguments, since if it doesn't then the call inst ends up using undefined physregs in its new spliced block. SelectionDAG implementations can do this because they delay emitting register copies until after the stack arguments are set up. GISel however just processes and emits the arguments in IR order, so stack arguments always end up last, and thus this breaks the code that looks for any register arg copies that precede the call instruction. This patch adds a thunk argument to the assignValueToReg() and custom assignment hooks. For outgoing arguments, register assignments use this return param to return a thunk that does the actual generating of the copies. We collect these until all the outgoing stack assignments have been done and then execute them, so that the copies (and perhaps some artifacts like G_SEXTs) are placed after any stores. Differential Revision: https://reviews.llvm.org/D110610	2021-10-04 12:33:20 -07:00
David Green	20b1a16a69	[ARM] Mark <= -1 immediate constant as cheap A <= -1 constant on a compare can be converted to a < 0 operation, which is usually cheap. If we mark the constant as cheap, preventing hoisting, we allow that fold to happen even across different blocks. Differential Revision: https://reviews.llvm.org/D109360	2021-10-03 19:30:08 +01:00
David Green	d6482df683	[ARM] Tests for constant hoisting -1 immediates	2021-10-03 16:32:31 +01:00
Stanislav Mekhanoshin	08d7eec06e	Revert "Allow rematerialization of virtual reg uses" Reverted due to two distinct performance regression reports. This reverts commit 92c1fd19abb15bc68b1127a26137a69e033cdb39.	2021-09-24 10:26:11 -07:00
Jay Foad	7863cc6c1c	[LiveIntervals] Fix repairOldRegInRange for simple def cases The fix applied in D23303 "LiveIntervalAnalysis: fix a crash in repairOldRegInRange" was over-zealous. It would bail out when the end of the range to be repaired was in the middle of the first segment of the live range of Reg, which was always the case when the range contained a single def of Reg. This patch fixes it as suggested by Matthias Braun in post-commit review on the original patch, and tests it by adding -early-live-intervals to a selection of existing lit tests that now pass. (Note that D23303 was originally applied to fix a crash in SILoadStoreOptimizer, but that is now moot since D23814 updated SILoadStoreOptimizer to run before scheduling so it no longer has to update live intervals.) Differential Revision: https://reviews.llvm.org/D110238 Unrevert with some changes to the tests: - Add -verify-machineinstrs to check for remaining problems in live interval support in TwoAddressInstructionPass. - Drop test/CodeGen/AMDGPU/extract-load-i1.ll since it suffers from some of those remaining problems.	2021-09-24 11:44:49 +01:00
Jay Foad	deb2ca566a	Revert "[LiveIntervals] Fix repairOldRegInRange for simple def cases" This reverts commit 8229cb74125322ff337cfe316ab35c6ebf412bde. It was failing on buildbots with expensive checks enabled.	2021-09-23 17:55:05 +01:00
Jay Foad	8229cb7412	[LiveIntervals] Fix repairOldRegInRange for simple def cases The fix applied in D23303 "LiveIntervalAnalysis: fix a crash in repairOldRegInRange" was over-zealous. It would bail out when the end of the range to be repaired was in the middle of the first segment of the live range of Reg, which was always the case when the range contained a single def of Reg. This patch fixes it as suggested by Matthias Braun in post-commit review on the original patch, and tests it by adding -early-live-intervals to a selection of existing lit tests that now pass. (Note that D23303 was originally applied to fix a crash in SILoadStoreOptimizer, but that is now moot since D23814 updated SILoadStoreOptimizer to run before scheduling so it no longer has to update live intervals.) Differential Revision: https://reviews.llvm.org/D110238	2021-09-23 17:16:14 +01:00
David Green	c49611f909	Mark CFG as preserved in TypePromotion and InterleaveAccess passes Neither of these passes modify the CFG, allowing us to preserve DomTree and LoopInfo across them by using setPreservesCFG. Differential Revision: https://reviews.llvm.org/D110161	2021-09-22 18:58:00 +01:00
Petar Avramovic	e4c46ddd91	[GlobalISel] Improve elimination of dead instructions in legalizer Add eraseInstr(s) utility functions. Before deleting an instruction collects its use instructions. After deletion deletes use instructions that became trivially dead. This patch clears all dead instructions in existing legalizer mir tests. Differential Revision: https://reviews.llvm.org/D109154	2021-09-20 13:00:58 +02:00
David Green	1da52ef294	[ARM] Add VGETLANEu patterns for v4f16 and v8f16 These were apparently missing, having no pattern that could convert a VGETLANEu of a v4f16 to an i32. Added bf16 whilst here, following the same code.	2021-09-19 14:25:21 +01:00
Alexandros Lamprineas	1bd5ea968e	[ARM] Mitigate the cve-2021-35465 security vulnerability. Recently a vulnerability issue is found in the implementation of VLLDM instruction in the Arm Cortex-M33, Cortex-M35P and Cortex-M55. If the VLLDM instruction is abandoned due to an exception when it is partially completed, it is possible for subsequent non-secure handler to access and modify the partial restored register values. This vulnerability is identified as CVE-2021-35465. The mitigation sequence varies between v8-m and v8.1-m as follows: v8-m.main --------- mrs r5, control tst r5, #8 /* CONTROL_S.SFPA */ it ne .inst.w 0xeeb00a40 /* vmovne s0, s0 */ 1: vlldm sp /* Lazy restore of d0-d16 and FPSCR. */ v8.1-m.main ----------- vscclrm {vpr} /* Clear VPR. */ vlldm sp /* Lazy restore of d0-d16 and FPSCR. */ More details on developer.arm.com/support/arm-security-updates/vlldm-instruction-security-vulnerability Differential Revision: https://reviews.llvm.org/D109157	2021-09-16 12:56:43 +01:00
Alexandros Lamprineas	61f25daa8d	[ARM][CMSE] Clear the secure fp-registers when using softfp abi. When expanding the non-secure call instruction we are emitting code to clear the secure floating-point registers only if the targeted architecture has floating-point support. The potential problem is when the source code containing non-secure calls is built with -mfloat-abi=soft but some other part of the system has been built with -mfloat-abi=softfp (soft and softfp are compatible as they use the same procedure calling standard). In this case floating-point registers could leak to non-secure state as the non-secure won't have cleared them assuming no floating point has been used. Differential Revision: https://reviews.llvm.org/D109153	2021-09-16 12:56:43 +01:00
Philip Reames	debbf8049d	autogen a test for ease of update	2021-09-15 11:11:07 -07:00
David Green	a2332d5332	[ARM] Prevent continuous folding of SUBC Under some situations under Thumb1, we could be stuck in an infinite loop recombining the same instruction. This puts a limit on that, not combining SUBC with SUBE repeatedly.	2021-09-15 11:23:32 +01:00
Matt Arsenault	54d755a034	DAG: Fix incorrect folding of fmul -1 to fneg The fmul is a canonicalizing operation, and fneg is not so this would break denormals that need flushing and also would not quiet signaling nans. Fold to fsub instead, which is also canonicalizing.	2021-09-14 21:25:02 -04:00
Matt Arsenault	4a36e96c3f	RegAllocGreedy: Account for reserved registers in num regs heuristic This simple heuristic uses the estimated live range length combined with the number of registers in the class to switch which heuristic to use. This was taking the raw number of registers in the class, even though not all of them may be available. AMDGPU heavily relies on dynamically reserved numbers of registers based on user attributes to satisfy occupancy constraints, so the raw number is highly misleading. There are still a few problems here. In the original testcase that made me notice this, the live range size is incorrect after the scheduler rearranges instructions, since the instructions don't have the original InstrDist offsets. Additionally, I think it would be more appropriate to use the number of disjointly allocatable registers in the class. For the AMDGPU register tuples, there are a large number of registers in each tuple class, but only a small fraction can actually be allocated at the same time since they all overlap with each other. It seems we do not have a query that corresponds to the number of independently allocatable registers. Relatedly, I'm still debugging some allocation failures where overlapping tuples seem to not be handled correctly. The test changes are mostly noise. There are a handful of x86 tests that look like regressions with an additional spill, and a handful that now avoid a spill. The worst looking regression is likely test/Thumb2/mve-vld4.ll which introduces a few additional spills. test/CodeGen/AMDGPU/soft-clause-exceeds-register-budget.ll shows a massive improvement by completely eliminating a large number of spills inside a loop.	2021-09-14 21:00:29 -04:00
Nikita Popov	f5806830e0	[ARM] Support neon.vld auto-upgrade with opaque pointers This code manually constructs the intrinsic name, so we need to use p0 instead of p0i8 in opaque pointer mode.	2021-09-11 16:34:32 +02:00
Arthur Eubanks	fe15347a1e	Port the cost model printer to New PM Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D109284	2021-09-08 14:47:05 -07:00
David Green	d8d24c64fe	[DAG] Fix GT -> GE condition when creating SetCC 79845ed6dfc6511f99 folded some setcc(ashr) conditions to setcc, but got the condition for NE incorrect, using GT where it should be using GE.	2021-09-08 12:41:51 +01:00
Peter Smith	e63455d5e0	[MC] Use local MCSubtargetInfo in writeNops On some architectures such as Arm and X86 the encoding for a nop may change depending on the subtarget in operation at the time of encoding. This change replaces the per module MCSubtargetInfo retained by the targets AsmBackend in favour of passing through the local MCSubtargetInfo in operation at the time. On Arm using the architectural NOP instruction can have a performance benefit on some implementations. For Arm I've deleted the copy of the AsmBackend's MCSubtargetInfo to limit the chances of this causing problems in the future. I've not done this for other targets such as X86 as there is more frequent use of the MCSubtargetInfo and it looks to be for stable properties that we would not expect to vary per function. This change required threading STI through MCNopsFragment and MCBoundaryAlignFragment. I've attempted to take into account the in tree experimental backends. Differential Revision: https://reviews.llvm.org/D45962	2021-09-07 15:46:19 +01:00
Ben Shi	63ca9371c7	[ARM] Implement target hook function to decide folding (mul (add x, c1), c2) Prevent the folding in DAGCombine if it leads to worse code. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D109124	2021-09-07 15:42:43 +08:00
Ben Shi	20f890696f	[ARM][test] Add new tests for (mul (add r, c0), c1) Reviewed By: RKSimon, dmgreen Differential Revision: https://reviews.llvm.org/D109123	2021-09-07 15:42:32 +08:00
David Green	1b83aaaefa	[DAG] Remove oneuse check in select_cc setgt X, -1, C, ~C fold This appears to produce better code, even if the condition may need to be replicated.	2021-09-05 16:18:31 +01:00
David Green	8523fb96a6	[DAG] Fold select_cc setgt X, -1, C, ~C -> xor (ashr X, BW-1), C Given a select_cc producing a constant and an inversion of the constant for a comparison more than zero, we can produce an xor with ashr instead, which produces smaller code. The ashr either sets all bits or clears all bits depending on if the value is negative. This is then xor'd with the constant to optionally negate the value. https://alive2.llvm.org/ce/z/DTFaBZ This includes a OneUseCheck on the Cmp, which seems to make things a little worse and will be removed in a followup. Differential Revision: https://reviews.llvm.org/D109149	2021-09-05 16:04:01 +01:00
David Green	79845ed6df	[DAG] Fold setcc eq with ashr to compare to zero. Pulled out of D109149, this folds set_cc seteq (ashr X, BW-1), -1 -> set_cc setlt X, 0 to prevent some regressions later on when folding select_cc setgt X, -1, C, ~C -> xor (ashr X, BW-1), C Differential Revision: https://reviews.llvm.org/D109214	2021-09-05 14:06:47 +01:00
David Green	7801d7963d	[DAG] Add tests for select_cc and setcc with constant patterns.	2021-09-05 10:17:21 +01:00
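
Note on the two [DAG] folds listed above (8523fb96a6 and 79845ed6df): the identities they rely on can be spot-checked with a small script. This is only an illustrative sketch, not code from LLVM. The bit width BW = 32, the helper names (to_signed, ashr, select_form, xor_form, and so on) and the sample values are all assumptions made for this example.

```python
# Illustrative check of two SelectionDAG fold identities, modelled on
# fixed-width integers. BW = 32 and all helper names are assumptions
# made for this sketch; none of this is LLVM code.
BW = 32
MASK = (1 << BW) - 1


def to_signed(v):
    """Interpret the low BW bits of v as a two's-complement value."""
    v &= MASK
    return v - (1 << BW) if v >> (BW - 1) else v


def ashr(x, n):
    """Arithmetic shift right; Python's >> is arithmetic on negative ints."""
    return to_signed(x) >> n


def select_form(x, c):
    """select_cc setgt X, -1, C, ~C"""
    return (c if to_signed(x) > -1 else ~c) & MASK


def xor_form(x, c):
    """xor (ashr X, BW-1), C"""
    return (ashr(x, BW - 1) ^ c) & MASK


def setcc_eq_form(x):
    """set_cc seteq (ashr X, BW-1), -1"""
    return ashr(x, BW - 1) == -1


def setcc_lt_form(x):
    """set_cc setlt X, 0"""
    return to_signed(x) < 0


SAMPLES = [0, 1, -1, 2**31 - 1, -(2**31), 12345, -12345]
CONSTS = [0, 1, 0xFF, 0xDEADBEEF, MASK]


def folds_agree():
    """True if both folds agree with their original forms on the samples."""
    select_ok = all(
        select_form(x, c) == xor_form(x, c) for x in SAMPLES for c in CONSTS
    )
    setcc_ok = all(setcc_eq_form(x) == setcc_lt_form(x) for x in SAMPLES)
    return select_ok and setcc_ok


if __name__ == "__main__":
    print(folds_agree())
```

The sample-based check is not a proof; the Alive2 link in the commit message is the authoritative verification of the select_cc fold.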

4429 Commits