llvm-project

Author	SHA1	Message	Date
AtariDreams	c5d000b1a8	[Thumb] Resolve FIXME: Use 'mov hi, $src; mov $dst, hi' (#81908 ) Consider the following: ldr r0, [r4] ldr r7, [r0, #4] cmp r7, r3 bhi .LBB0_6 cmp r0, r2 push {r0} pop {r4} bne .LBB0_3 movs r0, r6 pop {r4, r5, r6, r7} pop {r1} bx r1 Here is a snippet of the generated THUMB1 code of the K&R malloc function that clang currently compiles to. push {r0} ends up being popped to pop {r4}. movs r4, r0 would destroy the flags set by cmp right above. The compiler has no alternative in this case, except one: the only alternative is to transfer through a high register. However, it seems like LLVM does not consider that this is a valid approach, even though it is a free clobbering a high register. This patch addresses the FIXME so the compiler can do that when it can in r10 or r11, or r12.	2024-04-05 10:18:22 +01:00
Jay Foad	7a0e222a17	Revert "Convert many LivePhysRegs uses to LiveRegUnits (#83905 )" This reverts commit 2a13422b8bcee449405e3ebff957b4020805f91c. It was causing test failures on the expensive check builders.	2024-03-07 08:20:26 +00:00
AtariDreams	2a13422b8b	Convert many LivePhysRegs uses to LiveRegUnits (#83905 )	2024-03-06 10:38:14 +05:30
Nikita Popov	b31fffbc7f	[ARM] Convert tests to opaque pointers (NFC)	2024-02-05 13:56:59 +01:00
PiJoules	a356e6ccad	[SelectionDAG] Expand fixed point multiplication into libcall (#79352 ) 32-bit ARMv6 with thumb doesn't support MULHS/MUL_LOHI as legal/custom nodes during expansion which will cause fixed point multiplication of _Accum types to fail with fixed point arithmetic. Prior to this, we just happen to use fixed point multiplication on platforms that happen to support these MULHS/MUL_LOHI. This patch attempts to check if the multiplication can be done via libcalls, which are provided by the arm runtime. These libcall attempts are made elsewhere, so this patch refactors that libcall logic into its own functions and the fixed point expansion calls and reuses that logic.	2024-01-30 13:58:55 -08:00
Danial Klimkin	16df714e77	[test] Update stack_guard_remat.ll (#79139 ) Replace cp with a cat. This allows to create a writable file when the original one is read-only.	2024-01-23 14:48:33 +01:00
Fangrui Song	eaef645a58	[test] Update stack_guard_remat.ll	2024-01-22 14:18:23 -08:00
Jay Foad	7b3bbd83c0	Revert "[CodeGen] Really renumber slot indexes before register allocation (#67038 )" This reverts commit 2501ae58e3bb9a70d279a56d7b3a0ed70a8a852c. Reverted due to various buildbot failures.	2023-10-09 12:31:32 +01:00
Jay Foad	2501ae58e3	[CodeGen] Really renumber slot indexes before register allocation (#67038 ) PR #66334 tried to renumber slot indexes before register allocation, but the numbering was still affected by list entries for instructions which had been erased. Fix this to make the register allocator's live range length heuristics even less dependent on the history of how instructions have been added to and removed from SlotIndexes's maps.	2023-10-09 11:44:41 +01:00
Keith Walker	2d9c6e699a	[Thumb1] Use callee-saved register to adjust stack pointer When adjusting the Stack Pointer at the end of the function epilogue, use a callee-saved register, rather than explicitly using R4 which may not have been saved. Differential Revision: https://reviews.llvm.org/D157500	2023-08-17 18:29:50 +01:00
Simon Wallis	33b9634394	[ARM] v6-M XO: save CPSR around LoadStackGuard For Thumb-1 Execute-Only, expandLoadStackGuardBase generates a tMOVimm32 pseudo when calculating the stack offset. It does this in a context where the CSPR maybe be live. tMOVimm32 may corrupt CPSR. To fix this, generate save/restore CPSR around the tMOVimm32 using MRS/MSR to/from a scratch register. expandLoadStackGuardBase this runs after register allocation, so the scratch register needs to be a physical register. Use R12 as a scratch register, as is usual when expanding a pseudo. MSR/MRS are some of the few v6-M instructions which operate on a high register. New stack-guard test case added which was generating incorrect code without the save/restore CPSR. Reviewed By: stuij Differential Revision: https://reviews.llvm.org/D156968	2023-08-09 08:40:35 +01:00
John Brawn	f83ab2b3be	[ARM] Improve generation of thumb stack accesses Currently when a stack access is out of range of an sp-relative ldr or str then we jump straight to generating the offset with a literal pool load or mov32 pseudo-instruction. This patch improves that in two ways: * If the offset is within range of sp-relative add plus an ldr then use that. * When we use the mov32 pseudo-instruction, if putting part of the offset into the ldr will simplify the expansion of the mov32 then do so. Differential Revision: https://reviews.llvm.org/D156875	2023-08-07 17:53:32 +01:00
Jay Foad	945123384e	[PEI][ARM] Switch to backwards frame index elimination This adds better support for call frame pseudos that adjust SP in PEI::replaceFrameIndicesBackward. Running frame index elimination backwards is preferred because it can do backwards register scavenging (on targets that require scavenging) which does not rely on accurate kill flags. Differential Revision: https://reviews.llvm.org/D156434	2023-07-28 17:32:51 +01:00
Ties Stuij	84f888ca82	[ARM] don't emit constant pool for Thumb1 XO/stack guard combo Currently for armv6-m and armv8-m.baseline, we emit constant pool code when we use execute-only (XO) in combination with stack guards. XO is a new feature for armv6-m, and this patch is part of a series of patches that substitutes constant pool generation with the tMOVi32imm equivalent. However XO for armv8-m.baseline has been available for about 6 years, and so for armv8-m.baseline this is a bugfix. Reviewed By: simonwallis2, olista01 Differential Revision: https://reviews.llvm.org/D155170	2023-07-19 13:51:43 +01:00
Ties Stuij	1f082d2da0	[ARM] make execute only long call test checks more robust Reviewed By: olista01 Differential Revision: https://reviews.llvm.org/D154355	2023-07-04 10:51:48 +01:00
Ties Stuij	4f19c6a7c7	[ARM] allow long-call codegen for armv6-M eXecute Only (XO) Recently eXecute Only (XO) codegen was also allowed for armv6-M. Previously this was only implemented for ~armv7+, effectively if MOVW/MOVT is available. Regarding long calls, we remove the check for MOVW/MOVT when generating code for XO, which already was redundant as in the subtarget initialization we already check if XO is valid for the target. And targets that generate valid XO code should be able to handle the (wrapper globaladdress) node. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D153782	2023-06-28 10:50:24 +01:00
Guozhi Wei	1bcb6a3da2	[MBP] Enable duplicating return block to remove jump to return Sometimes LLVM generates branch to return instruction, like PR63227. It is because in function MachineBlockPlacement::canTailDuplicateUnplacedPreds we avoid duplicating a BB into another already placed BB to prevent destroying computed layout. But if the successor BB is a return block, duplicating it will only reduce taken branches without hurt to any other branches. Differential Revision: https://reviews.llvm.org/D153093	2023-06-21 18:54:31 +00:00
Fangrui Song	e018cbf720	[IR] Make stack protector symbol dso_local according to -f[no-]direct-access-external-data There are two motivations. `-fno-pic -fstack-protector -mstack-protector-guard=global` created `__stack_chk_guard` is referenced directly on all ELF OSes except FreeBSD. This patch allows referencing the symbol indirectly with -fno-direct-access-external-data. Some Linux kernel folks want `-fno-pic -fstack-protector -mstack-protector-guard-reg=gs -mstack-protector-guard-symbol=__stack_chk_guard` created `__stack_chk_guard` to be referenced directly, avoiding R_X86_64_REX_GOTPCRELX (even if the relocation may be optimized out by the linker). https://github.com/llvm/llvm-project/issues/60116 Why they need this isn't so clear to me. --- Add module flag "direct-access-external-data" and set the dso_local property of the stack protector symbol. The module flag can benefit other LLVMCodeGen synthesized symbols that are not represented in LLVM IR. Nowadays, with `-fno-pic` being uncommon, ideally we should set "direct-access-external-data" when it is true. However, doing so would require ~90 clang/test tests to be updated, which are too much. As a compromise, we set "direct-access-external-data" only when it's different from the implied default value. Reviewed By: nickdesaulniers Differential Revision: https://reviews.llvm.org/D150841	2023-05-23 09:49:57 -07:00
Tobias Hieta	f84bac329b	[NFC][Py Reformat] Reformat lit.local.cfg python files in llvm This is a follow-up to b71edfaa4ec3c998aadb35255ce2f60bba2940b0 since I forgot the lit.local.cfg files in that one. Reformatting is done with `black`. If you end up having problems merging this commit because you have made changes to a python file, the best way to handle that is to run git checkout --ours <yourfile> and then reformat it with black. If you run into any problems, post to discourse about it and we will try to help. RFC Thread below: https://discourse.llvm.org/t/rfc-document-and-standardize-python-code-style Reviewed By: barannikov88, kwk Differential Revision: https://reviews.llvm.org/D150762	2023-05-17 17:03:15 +02:00
Florian Hahn	4e2b4f97a0	[ShrinkWrap] Use underlying object to rule out stack access. Allow shrink-wrapping past memory accesses that only access globals or function arguments. This patch uses getUnderlyingObject to try to identify the accessed object by a given memory operand. If it is a global or an argument, it does not access the stack of the current function and should not block shrink wrapping. Note that the caller's stack may get accessed when passing an argument via the stack, but not the stack of the current function. This addresses part of the TODO from D63152. Reviewed By: thegameg Differential Revision: https://reviews.llvm.org/D149668	2023-05-03 09:28:07 +01:00
Zhiyao Ma	1d0ccebcd7	[ARM] Don't allocate memory if free space in segmented stack is just enough Assuming that the stack grows downwards, it is fine if the stack pointer is exactly at the stacklet boundary. We should use less-or-equal condition when deciding whether to skip new memory allocation. Differential Revision: https://reviews.llvm.org/D149315	2023-05-02 13:09:49 +01:00
sgokhale	bb5befefc6	Revert "[CodeGen][ShrinkWrap] Split restore point" This reverts commit 5f0bccc3d1a74111458c71f009817c9995f4bf83. An issue has been reported here: https://github.com/ClangBuiltLinux/linux/issues/1833	2023-04-13 10:52:28 +05:30
sgokhale	5f0bccc3d1	[CodeGen][ShrinkWrap] Split restore point This patch splits a restore point to allow it to only post-dominate blocks reachable by use or def of CSRs(Callee Saved Registers)/FI(Frame Index). Benchmarking this on SPEC2017, this gives around 4% improvement on povray and no significant change for others. Co-authored-by: junbuml Differential Revision: https://reviews.llvm.org/D42600	2023-04-11 11:58:50 +05:30
David Green	7abe3497e7	[LSR] Improve filtered uses in NarrowSearchSpaceByPickingWinnerRegs NarrowSearchSpaceByPickingWinnerRegs has an aggressive filtering method to reduce the complexity of the search space down by picking a best formula with the highest number of reuses and assuming it will yield profitable reuse. In certain cases we can find a best formula like {X+30,+,1} and later check a formula like {X,+,1} with the same number of Uses. On some architectures it can be better to pick {X,+,1}, especially if an offset of 30 can be used as a legal addressing mode, but -30 cannot. That happens under Thumb1 code, which has fairly limited addressing modes. This patch adds a check to see if it can pick the simpler formula, if it looks more profitable. Differential Revision: https://reviews.llvm.org/D144014	2023-02-16 15:48:12 +00:00
David Green	66749ce927	[ARM] Add Thumb LSR codegen tests. NFC This is the same routine generated in two different ways that ends up with different orders to loads. The first currently does better than the second with ordered loads, but needn't if the filtering in LSR is improved.	2023-02-16 14:24:51 +00:00
Nikita Popov	e6bf3fa05b	[Thumb] Convert tests to opaque pointers (NFC)	2022-12-19 13:05:05 +01:00
Roman Lebedev	6b3c2aed49	[NFC] Port codegen Thumb tests that invoke opt to `-passes=` syntax	2022-12-09 01:04:47 +03:00
Roman Lebedev	b1a9584818	[opt] Disincentivize new tests from using old pass syntax Over the past day or so, i've took a large swing at our tests, and reduced the number of tests that were still using the old syntax from ~1800 to just 200. Left to handle: (as it is seen in this patch) * Transforms/LSR * Transforms/CGP * Transforms/TypePromotion * Transforms/HardwareLoops * Analysis/* * some misc. I think this is the right point to start actively refusing to honor the old syntax, except for the old tests, to prevent the old syntax from creeping back in. Thus, let's add temporary default-off flag, and if it is not passed refuse to accept old syntax. The tests that still need porting are annotated with this flag. Reviewed By: aeubanks Differential Revision: https://reviews.llvm.org/D139647	2022-12-08 23:54:03 +03:00
Jonas Paulsson	5ecd363295	Reapply "[CodeGen] Add new pass for late cleanup of redundant definitions." This reverts commit 122efef8ee9be57055d204d52c38700fe933c033. - Patch fixed to not reuse definitions from predecessors in EH landing pads. - Late review suggestions (by MaskRay) have been addressed. - M68k/pipeline.ll test updated. - Init captures added in processBlock() to avoid capturing structured bindings. - RISCV has this disabled for now. Original commit message: A new pass MachineLateInstrsCleanup is added to be run after PEI. This is a simple pass that removes redundant and identical instructions whenever found by scanning the MF once while keeping track of register definitions in a map. These instructions are typically immediate loads resulting from rematerialization, and address loads emitted by target in eliminateFrameInde(). This is enabled by default, but a target could easily disable it by means of 'disablePass(&MachineLateInstrsCleanupID);'. This late cleanup is naturally not "optimal" in removing instructions as it is done by looking at phys-regs, but still quite effective. It would be desirable to improve other parts of CodeGen and avoid these redundant instructions in the first place, but there are no ideas for this yet. Differential Revision: https://reviews.llvm.org/D123394 Reviewed By: RKSimon, foad, craig.topper, arsenm, asb	2022-12-05 12:53:50 -06:00
Jonas Paulsson	122efef8ee	Revert "Reapply "[CodeGen] Add new pass for late cleanup of redundant definitions."" This reverts commit 17db0de330f943833296ae72e26fa988bba39cb3. Some more bots got broken - need to investigate.	2022-12-05 00:52:00 +01:00
Jonas Paulsson	17db0de330	Reapply "[CodeGen] Add new pass for late cleanup of redundant definitions." Init captures added in processBlock() to avoid capturing structured bindings, which caused the build problems (with clang). RISCV has this disabled for now until problems relating to post RA pseudo expansions are resolved.	2022-12-03 14:15:15 -06:00
Jonas Paulsson	8ef4632681	Revert "[CodeGen] Add new pass for late cleanup of redundant definitions." Temporarily revert and fix buildbot failure. This reverts commit 6d12599fd4134c1da63198c74a25490d28c733f6.	2022-12-01 13:29:24 -05:00
Jonas Paulsson	6d12599fd4	[CodeGen] Add new pass for late cleanup of redundant definitions. A new pass MachineLateInstrsCleanup is added to be run after PEI. This is a simple pass that removes redundant and identical instructions whenever found by scanning the MF once while keeping track of register definitions in a map. These instructions are typically immediate loads resulting from rematerialization, and address loads emitted by target in eliminateFrameInde(). This is enabled by default, but a target could easily disable it by means of 'disablePass(&MachineLateInstrsCleanupID);'. This late cleanup is naturally not "optimal" in removing instructions as it is done by looking at phys-regs, but still quite effective. It would be desirable to improve other parts of CodeGen and avoid these redundant instructions in the first place, but there are no ideas for this yet. Differential Revision: https://reviews.llvm.org/D123394 Reviewed By: RKSimon, foad, craig.topper, arsenm, asb	2022-12-01 13:21:35 -05:00
Daniel Thornburgh	75cdab6dc2	[llvm-objdump] Add --no-print-imm-hex to tests depending on it. This prepares for an upcoming change to make --print-imm-hex the default behavior of llvm-objdump. These tests were updated in a semi-automatic fashion. See D136972 for details.	2022-10-29 15:40:26 -07:00
Simon Pilgrim	78739fdb4d	[DAG] Enable combineShiftOfShiftedLogic folds after type legalization This was disabled to prevent regressions, which appear to be just occurring on AMDGPU (at least in our current lit tests), which I've addressed by adding AMDGPUTargetLowering::isDesirableToCommuteWithShift overrides. Fixes #57872 Differential Revision: https://reviews.llvm.org/D136042	2022-10-29 12:30:04 +01:00
Lucas Prates	70a5c52534	[ARM][Thumb] Command-line option to ensure AAPCS compliant Frame Records Currently the a AAPCS compliant frame record is not always created for functions when it should. Although a consistent frame record might not be required in some cases, there are still scenarios where applications may want to make use of the call hierarchy made available trough it. In order to enable the use of AAPCS compliant frame records whilst keep backwards compatibility, this patch introduces a new command-line option (`-mframe-chain=[none\|aapcs\|aapcs+leaf]`) for Aarch32 and Thumb backends. The option allows users to explicitly select when to use it, and is also useful to ensure the extra overhead introduced by the frame records is only introduced when necessary, in particular for Thumb targets. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D125094	2022-06-27 14:08:48 +01:00
Krasimir Georgiev	8f2ba36336	Revert "[ARM][Thumb] Command-line option to ensure AAPCS compliant Frame Records AND [NFC][Thumb] Update frame-chain codegen test to use thumbv6m" This reverts commit 7625e01d661644a560884057755d48a0da8b77b4 and dependent cbcce82ef6b512d97e92a319a75a03e997c844e1. Commit 7625e01d661644a560884057755d48a0da8b77b4 causes some new codegen test failures under asan, e.g., CodeGen/ARM/execute-only.ll: https://lab.llvm.org/buildbot/#/builders/5/builds/24659/steps/15/logs/stdio.	2022-06-15 16:10:02 +02:00
Lucas Prates	cbcce82ef6	[NFC][Thumb] Update frame-chain codegen test to use thumbv6m The `llvm/test/CodeGen/Thumb/frame-chain.ll`, recently added by D125094, currently fails when expensive checks are enabled due to a tMOVr instruction that is only valid from V6 onwards. The use of the invalid instruction is unrelated to the contents of the original patch, and continues to be triggered by this test if its CodeGen changes are reverted, so this patch updates the test to use V6-M while the issue is not resolved.	2022-06-14 14:36:57 +01:00
Lucas Prates	7625e01d66	[ARM][Thumb] Command-line option to ensure AAPCS compliant Frame Records Currently the a AAPCS compliant frame record is not always created for functions when it should. Although a consistent frame record might not be required in some cases, there are still scenarios where applications may want to make use of the call hierarchy made available trough it. In order to enable the use of AAPCS compliant frame records whilst keep backwards compatibility, this patch introduces a new command-line option (`-mframe-chain=[none\|aapcs\|aapcs+leaf]`) for Aarch32 and Thumb backends. The option allows users to explicitly select when to use it, and is also useful to ensure the extra overhead introduced by the frame records is only introduced when necessary, in particular for Thumb targets. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D125094	2022-06-14 13:37:51 +01:00
Lucas Prates	33b9ad647e	Revert "[ARM][Thumb] Command-line option to ensure AAPCS compliant Frame Records" Reverting change due to test failure. This reverts commit 6119053dab67129eb1700dbf36db3524dd3e421f.	2022-06-13 11:00:49 +01:00
Lucas Prates	6119053dab	[ARM][Thumb] Command-line option to ensure AAPCS compliant Frame Records Currently the a AAPCS compliant frame record is not always created for functions when it should. Although a consistent frame record might not be required in some cases, there are still scenarios where applications may want to make use of the call hierarchy made available trough it. In order to enable the use of AAPCS compliant frame records whilst keep backwards compatibility, this patch introduces a new command-line option (`-mframe-chain=[none\|aapcs\|aapcs+leaf]`) for Aarch32 and Thumb backends. The option allows users to explicitly select when to use it, and is also useful to ensure the extra overhead introduced by the frame records is only introduced when necessary, in particular for Thumb targets. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D125094	2022-06-13 10:21:06 +01:00
Hendrik Greving	a92ed167f2	[ValueTypes] Define MVTs for v128i2/v64i4 as well as i2 and i4. Adds MVT::v128i2, MVT::v64i4, and implied MVT::i2, MVT::i4. Keeps MVT::i2, MVT::i4 lowering actions as expand, which should be removed once targets set this explicitly. Adjusts 11 lit tests to reflect slightly different behavior during DAG combine. Differential Revision: https://reviews.llvm.org/D125247	2022-06-02 00:49:11 +00:00
Hendrik Greving	e9d05cc7d8	Revert "[ValueTypes] Define MVTs for v128i2/v64i4 as well as i2 and i4." This reverts commit 430ac5c3029c52e391e584c6d4447e6e361fae99. Due to failures in Clang tests. Differential Revision: https://reviews.llvm.org/D125247	2022-06-01 13:27:49 -07:00
Hendrik Greving	430ac5c302	[ValueTypes] Define MVTs for v128i2/v64i4 as well as i2 and i4. Adds MVT::v128i2, MVT::v64i4, and implied MVT::i2, MVT::i4. Keeps MVT::i2, MVT::i4 lowering actions as `expand`, which should be removed once targets set this explicitly. Adjusts 11 lit tests to reflect slightly different behavior during DAG combine. Differential Revision: https://reviews.llvm.org/D125247	2022-06-01 12:48:01 -07:00
Craig Topper	46eef76876	[DAGCombiner] Fix bug in MatchBSwapHWordLow. This function tries to match (a >> 8) \| (a << 8) as (bswap a) >> 16. If the SRL isn't masked and the high bits aren't demanded, we still need to ensure that bits 23:16 are zero. After the right shift they will be in bits 15:8 which is where the important bits from the SHL end up. It's only a bswap if the OR on bits 15:8 only takes the bits from the SHL. Fixes PR55484. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D125641	2022-05-18 09:23:18 -07:00
Craig Topper	74f6ded49d	[AArch64][ARM][RISCV][X86] Add test cases for PR55484. NFC This bug is in generic DAG combine and easily reproducible on many targets. Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D125640	2022-05-16 09:28:11 -07:00
Zhiyao Ma	bd606afe26	[ARM] Only update the successor edges for immediate predecessors of PrologueMBB When adjusting the function prologue for segmented stacks, only update the successor edges of the immediate predecessors of the original prologue. Differential Revision: https://reviews.llvm.org/D122959	2022-05-03 12:36:35 +01:00
Zhiyao Ma	adc26b4eae	[ARM] Fix 8-bit immediate overflow in the instruction of segmented stack prologue. It fixes the overflow of 8-bit immediate field in the emitted instruction that allocates large stacklet. For thumb2 targets, load large immediate by a pair of movw and movt instruction. For thumb1 and ARM targets, load large immediate by reading from literal pool. Differential Revision: https://reviews.llvm.org/D118545	2022-03-10 15:15:24 -08:00
Craig Topper	440c4b705a	[SelectionDAG][RISCV][ARM][PowerPC][X86][WebAssembly] Change default abs expansion to use sra (X, size(X)-1); sub (xor (X, Y), Y). Previous we used sra (X, size(X)-1); xor (add (X, Y), Y). By placing sub at the end, we allow RISCV to combine sign_extend_inreg with it to form subw. Some X86 tests for Z - abs(X) seem to have improved as well. Other targets look to be a wash. I had to modify ARM's abs matching code to match from sub instead of xor. Maybe instead ISD::ABS should be made legal. I'll try that in parallel to this patch. This is an alternative to D119099 which was focused on RISCV only. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D119171	2022-02-20 21:11:23 -08:00
Jay Foad	f510045d82	[CodeGen] Remove unneeded regex escaping in FileCheck patterns. NFC. Take advantage of D117117 to simplify all {{\[}} to [ and {{\]}} to ]. Differential Revision: https://reviews.llvm.org/D117298	2022-02-18 16:10:56 +00:00

1 2 3 4 5 ...

470 Commits