llvm-project

Author	SHA1	Message	Date
AZero13	6a425f1e54	[ARM] Have custom lowering for ucmp and scmp (#149315 ) Limited to non-thumb1 for scmp at the moment, since there is no good way to do it.	2025-08-08 06:51:18 +01:00
Jessica Clarke	1b41599cf8	[MC][AArch64][ARM][X86] Push target-dependent assembler flags into targets (#139844 ) The .syntax unified directive and .codeX/.code X directives are, other than some simple common printing code, exclusively implemented in the targets themselves. Thus, remove the corresponding MCAF_* flags and reimplement the directives solely within the targets. This avoids exposing all targets to all other targets' flags. Since MCAF_SubsectionsViaSymbols is all that remains, convert it to its own function like other directives, simplifying its implementation. Note that, on X86, we now always need a target streamer when parsing assembly, as it's now used for directives that aren't COFF-specific. It still does not however need to do anything when producing a non-COFF object file, so this commit does not introduce any new target streamers. There is some churn in test output, and corresponding UTC regex changes, due to comments no longer being flushed by these various directives (and EmitEOL is not exposed outside MCAsmStreamer.cpp so we couldn't do so even if we wanted to), but that was a bit odd to be doing anyway. This is motivated by Morello LLVM, which adds yet another assembler flag to distinguish A64 and C64 instruction sets, but did not update every switch and so emits warnings during the build. Rather than fix those warnings it seems better to instead make the problem not exist in the first place via this change.	2025-05-18 20:09:43 +01:00
Akshat Oke	aa1fe57b19	[RegAlloc][NewPM] Plug Greedy RA in codegen pipeline (#120557 ) Use `-passes="regallocgreedy<[all\|sgpr\|wwm\|vgpr]>` to insert the greedy RA with a filter and `-regalloc-npm=<type>` to control which RA to use in existing pipeline.	2025-03-03 11:06:15 +05:30
Vikash Gupta	352c48f278	[SelectionDAG] Utilizing target hook convertSelectOfConstantsToMath for SelectwithConstant (#127599 ) The Target hook convertSelectOfConstantsToMath() needs to be used within SimplifySelectCC helper combine function in SelectionDAG Isel, where generic select folding with constants is happening into simple maths op using the condition as it is. It necessarily fixes #121145.	2025-02-25 20:32:24 +05:30
Craig Topper	e30a4fc3e2	[TargetLowering] Improve one signature of forceExpandWideMUL. (#123991 ) We have two forceExpandWideMUL functions. One takes the low and high half of 2 inputs and calculates the low and high half of their product. This does not calculate the full 2x width product. The other signature takes 2 inputs and calculates the low and high half of their full 2x width product. Previously it did this by sign/zero extending the inputs to create the high bits and then calling the other function. We can instead copy the algorithm from the other function and use the Signed flag to determine whether we should do SRA or SRL. This avoids the need to multiply the high part of the inputs and add them to the high half of the result. This improves the generated code for signed multiplication. This should improve the performance of #123262. I don't know yet how close we will get to gcc.	2025-01-23 12:49:35 -08:00
Sergei Barannikov	e0ed0333f0	Reland "[ARM] Stop gluing ALU nodes to branches / selects" (#118887 ) Re-landing #116970 after fixing miscompilation error. The original change made it possible for CMPZ to have multiple uses; `ARMDAGToDAGISel::SelectCMPZ` was not prepared for this. Pull Request: https://github.com/llvm/llvm-project/pull/118887 Original commit message: Following #116547 and #116676, this PR changes the type of results and operands of some nodes to accept / return a normal type instead of Glue. Unfortunately, changing the result type of one node requires changing the operand types of all potential consumer nodes, which in turn requires changing the result types of all other possible producer nodes. So this is a bulk change.	2024-12-07 10:14:36 +03:00
Martin Storsjö	2a5e1da57a	Revert "[ARM] Stop gluing ALU nodes to branches / selects" (#118232 ) Reverts llvm/llvm-project#116970. This change broke Wine compiled for armv7, causing segfaults when starting Wine. See llvm/llvm-project#116970 for more detailed discussion about the issue.	2024-12-02 00:02:25 +02:00
Sergei Barannikov	a348f223ca	[ARM] Stop gluing ALU nodes to branches / selects (#116970 ) Following #116547 and #116676, this PR changes the type of results and operands of some nodes to accept / return a normal type instead of Glue. Unfortunately, changing the result type of one node requires changing the operand types of all potential consumer nodes, which in turn requires changing the result types of all other possible producer nodes. So this is a bulk change. Pull Request: https://github.com/llvm/llvm-project/pull/116970	2024-11-30 08:14:24 +03:00
Oliver Stannard	0c0f765cab	[ARM] Fix llvm.returnaddress for Thumb1 with R11 frame-pointer (#117735 ) When the llvm.returnaddress intrinsic is used, the LR is marked as live-in to the function, so it must be preserved through the prologue. This is normally fine, but there is one case for Thumb1 where we use LR as a temporary in the prologue to set up a frame chain using r11 as the frame pointer. There are no other registers guaranteed to be free to do this, so we have to re-load LR from the stack after pushing the callee saved registers.	2024-11-28 09:37:57 +00:00
Akshat Oke	c4c60c0db9	[CodeGen][NewPM] Port OptimizePHIs to NPM (#113433 )	2024-10-23 16:55:21 +05:30
gxlayer	4a2bd78f5b	[ARM] Fix -mno-omit-leaf-frame-pointer flag doesn't works on 32-bit ARM (#109628 ) The -mno-omit-leaf-frame-pointer flag works on 32-bit ARM architectures and addresses the bug reported in #108019	2024-10-17 20:25:06 +08:00
Christudasan Devadasan	6c143a86cd	[CodeGen][NewPM] Port MachineCSE pass to new pass manager. (#106605 )	2024-09-04 18:54:07 +05:30
Volodymyr Vasylkun	e094abde42	[SelectionDAG] Expand [US]CMP using arithmetic on boolean values instead of selects (#98774 ) The previous expansion of [US]CMP was done using two selects and two compares. It produced decent code, but on many platforms it is better to implement [US]CMP nodes by performing the following operation: ``` [us]cmp(x, y) = (x [us]> y) - (x [us]< y) ``` This patch adds this new expansion, as well as a hook in TargetLowering to allow some targets to still use the select-based approach. AArch64 and SystemZ are currently the only targets to prefer the former approach, but other targets may also start to use it if it provides for better codegen.	2024-07-16 20:56:18 +01:00
Oliver Stannard	1a5239251e	[ARM] r11 is reserved when using -mframe-chain=aapcs (#86951 ) When using the -mframe-chain=aapcs or -mframe-chain=aapcs-leaf options, we cannot use r11 as an allocatable register, even if -fomit-frame-pointer is also used. This is so that r11 will always point to a valid frame record, even if we don't create one in every function.	2024-06-07 10:58:10 +01:00
paperchalice	1bc8b3258e	[NewPM][CodeGen] Port `regallocfast` to new pass manager (#94426 ) This pull request port `regallocfast` to new pass manager. It exposes the parameter `filter` to handle different register classes for AMDGPU. IIUC AMDGPU need to allocate different register classes separately so it need implement its own `--<reg-class>-regalloc`. Now users can use e.g. `-passe=regallocfast<filter=sgpr>` to allocate specific register class. The command line option `--regalloc-npm` is still in work progress, plan to reuse the syntax of passes, e.g. use `--regalloc-npm=regallocfast<filter=sgpr>,greedy<filter=vgpr>` to replace `--sgpr-regalloc` and `--vgpr-regalloc`.	2024-06-07 12:22:42 +08:00
AtariDreams	1d3329c2e8	[Thumb] Resolve FIXME: Transform "(and (shl x, c2), c1)" into "(shl (and x, c1>>c2), c2)" (#82120 ) Transform "(and (shl x, c2), c1)" into "(shl (and x, c1>>c2), c2)" if "c1 >> c2" is a cheaper immediate than "c1" using HasLowerConstantMaterializationCost	2024-05-26 14:58:32 -04:00
AtariDreams	c5d000b1a8	[Thumb] Resolve FIXME: Use 'mov hi, $src; mov $dst, hi' (#81908 ) Consider the following: ldr r0, [r4] ldr r7, [r0, #4] cmp r7, r3 bhi .LBB0_6 cmp r0, r2 push {r0} pop {r4} bne .LBB0_3 movs r0, r6 pop {r4, r5, r6, r7} pop {r1} bx r1 Here is a snippet of the generated THUMB1 code of the K&R malloc function that clang currently compiles to. push {r0} ends up being popped to pop {r4}. movs r4, r0 would destroy the flags set by cmp right above. The compiler has no alternative in this case, except one: the only alternative is to transfer through a high register. However, it seems like LLVM does not consider that this is a valid approach, even though it is a free clobbering a high register. This patch addresses the FIXME so the compiler can do that when it can in r10 or r11, or r12.	2024-04-05 10:18:22 +01:00
Jay Foad	7a0e222a17	Revert "Convert many LivePhysRegs uses to LiveRegUnits (#83905 )" This reverts commit 2a13422b8bcee449405e3ebff957b4020805f91c. It was causing test failures on the expensive check builders.	2024-03-07 08:20:26 +00:00
AtariDreams	2a13422b8b	Convert many LivePhysRegs uses to LiveRegUnits (#83905 )	2024-03-06 10:38:14 +05:30
Nikita Popov	b31fffbc7f	[ARM] Convert tests to opaque pointers (NFC)	2024-02-05 13:56:59 +01:00
PiJoules	a356e6ccad	[SelectionDAG] Expand fixed point multiplication into libcall (#79352 ) 32-bit ARMv6 with thumb doesn't support MULHS/MUL_LOHI as legal/custom nodes during expansion which will cause fixed point multiplication of _Accum types to fail with fixed point arithmetic. Prior to this, we just happen to use fixed point multiplication on platforms that happen to support these MULHS/MUL_LOHI. This patch attempts to check if the multiplication can be done via libcalls, which are provided by the arm runtime. These libcall attempts are made elsewhere, so this patch refactors that libcall logic into its own functions and the fixed point expansion calls and reuses that logic.	2024-01-30 13:58:55 -08:00
Danial Klimkin	16df714e77	[test] Update stack_guard_remat.ll (#79139 ) Replace cp with a cat. This allows to create a writable file when the original one is read-only.	2024-01-23 14:48:33 +01:00
Fangrui Song	eaef645a58	[test] Update stack_guard_remat.ll	2024-01-22 14:18:23 -08:00
Jay Foad	7b3bbd83c0	Revert "[CodeGen] Really renumber slot indexes before register allocation (#67038 )" This reverts commit 2501ae58e3bb9a70d279a56d7b3a0ed70a8a852c. Reverted due to various buildbot failures.	2023-10-09 12:31:32 +01:00
Jay Foad	2501ae58e3	[CodeGen] Really renumber slot indexes before register allocation (#67038 ) PR #66334 tried to renumber slot indexes before register allocation, but the numbering was still affected by list entries for instructions which had been erased. Fix this to make the register allocator's live range length heuristics even less dependent on the history of how instructions have been added to and removed from SlotIndexes's maps.	2023-10-09 11:44:41 +01:00
Keith Walker	2d9c6e699a	[Thumb1] Use callee-saved register to adjust stack pointer When adjusting the Stack Pointer at the end of the function epilogue, use a callee-saved register, rather than explicitly using R4 which may not have been saved. Differential Revision: https://reviews.llvm.org/D157500	2023-08-17 18:29:50 +01:00
Simon Wallis	33b9634394	[ARM] v6-M XO: save CPSR around LoadStackGuard For Thumb-1 Execute-Only, expandLoadStackGuardBase generates a tMOVimm32 pseudo when calculating the stack offset. It does this in a context where the CSPR maybe be live. tMOVimm32 may corrupt CPSR. To fix this, generate save/restore CPSR around the tMOVimm32 using MRS/MSR to/from a scratch register. expandLoadStackGuardBase this runs after register allocation, so the scratch register needs to be a physical register. Use R12 as a scratch register, as is usual when expanding a pseudo. MSR/MRS are some of the few v6-M instructions which operate on a high register. New stack-guard test case added which was generating incorrect code without the save/restore CPSR. Reviewed By: stuij Differential Revision: https://reviews.llvm.org/D156968	2023-08-09 08:40:35 +01:00
John Brawn	f83ab2b3be	[ARM] Improve generation of thumb stack accesses Currently when a stack access is out of range of an sp-relative ldr or str then we jump straight to generating the offset with a literal pool load or mov32 pseudo-instruction. This patch improves that in two ways: * If the offset is within range of sp-relative add plus an ldr then use that. * When we use the mov32 pseudo-instruction, if putting part of the offset into the ldr will simplify the expansion of the mov32 then do so. Differential Revision: https://reviews.llvm.org/D156875	2023-08-07 17:53:32 +01:00
Jay Foad	945123384e	[PEI][ARM] Switch to backwards frame index elimination This adds better support for call frame pseudos that adjust SP in PEI::replaceFrameIndicesBackward. Running frame index elimination backwards is preferred because it can do backwards register scavenging (on targets that require scavenging) which does not rely on accurate kill flags. Differential Revision: https://reviews.llvm.org/D156434	2023-07-28 17:32:51 +01:00
Ties Stuij	84f888ca82	[ARM] don't emit constant pool for Thumb1 XO/stack guard combo Currently for armv6-m and armv8-m.baseline, we emit constant pool code when we use execute-only (XO) in combination with stack guards. XO is a new feature for armv6-m, and this patch is part of a series of patches that substitutes constant pool generation with the tMOVi32imm equivalent. However XO for armv8-m.baseline has been available for about 6 years, and so for armv8-m.baseline this is a bugfix. Reviewed By: simonwallis2, olista01 Differential Revision: https://reviews.llvm.org/D155170	2023-07-19 13:51:43 +01:00
Ties Stuij	1f082d2da0	[ARM] make execute only long call test checks more robust Reviewed By: olista01 Differential Revision: https://reviews.llvm.org/D154355	2023-07-04 10:51:48 +01:00
Ties Stuij	4f19c6a7c7	[ARM] allow long-call codegen for armv6-M eXecute Only (XO) Recently eXecute Only (XO) codegen was also allowed for armv6-M. Previously this was only implemented for ~armv7+, effectively if MOVW/MOVT is available. Regarding long calls, we remove the check for MOVW/MOVT when generating code for XO, which already was redundant as in the subtarget initialization we already check if XO is valid for the target. And targets that generate valid XO code should be able to handle the (wrapper globaladdress) node. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D153782	2023-06-28 10:50:24 +01:00
Guozhi Wei	1bcb6a3da2	[MBP] Enable duplicating return block to remove jump to return Sometimes LLVM generates branch to return instruction, like PR63227. It is because in function MachineBlockPlacement::canTailDuplicateUnplacedPreds we avoid duplicating a BB into another already placed BB to prevent destroying computed layout. But if the successor BB is a return block, duplicating it will only reduce taken branches without hurt to any other branches. Differential Revision: https://reviews.llvm.org/D153093	2023-06-21 18:54:31 +00:00
Fangrui Song	e018cbf720	[IR] Make stack protector symbol dso_local according to -f[no-]direct-access-external-data There are two motivations. `-fno-pic -fstack-protector -mstack-protector-guard=global` created `__stack_chk_guard` is referenced directly on all ELF OSes except FreeBSD. This patch allows referencing the symbol indirectly with -fno-direct-access-external-data. Some Linux kernel folks want `-fno-pic -fstack-protector -mstack-protector-guard-reg=gs -mstack-protector-guard-symbol=__stack_chk_guard` created `__stack_chk_guard` to be referenced directly, avoiding R_X86_64_REX_GOTPCRELX (even if the relocation may be optimized out by the linker). https://github.com/llvm/llvm-project/issues/60116 Why they need this isn't so clear to me. --- Add module flag "direct-access-external-data" and set the dso_local property of the stack protector symbol. The module flag can benefit other LLVMCodeGen synthesized symbols that are not represented in LLVM IR. Nowadays, with `-fno-pic` being uncommon, ideally we should set "direct-access-external-data" when it is true. However, doing so would require ~90 clang/test tests to be updated, which are too much. As a compromise, we set "direct-access-external-data" only when it's different from the implied default value. Reviewed By: nickdesaulniers Differential Revision: https://reviews.llvm.org/D150841	2023-05-23 09:49:57 -07:00
Tobias Hieta	f84bac329b	[NFC][Py Reformat] Reformat lit.local.cfg python files in llvm This is a follow-up to b71edfaa4ec3c998aadb35255ce2f60bba2940b0 since I forgot the lit.local.cfg files in that one. Reformatting is done with `black`. If you end up having problems merging this commit because you have made changes to a python file, the best way to handle that is to run git checkout --ours <yourfile> and then reformat it with black. If you run into any problems, post to discourse about it and we will try to help. RFC Thread below: https://discourse.llvm.org/t/rfc-document-and-standardize-python-code-style Reviewed By: barannikov88, kwk Differential Revision: https://reviews.llvm.org/D150762	2023-05-17 17:03:15 +02:00
Florian Hahn	4e2b4f97a0	[ShrinkWrap] Use underlying object to rule out stack access. Allow shrink-wrapping past memory accesses that only access globals or function arguments. This patch uses getUnderlyingObject to try to identify the accessed object by a given memory operand. If it is a global or an argument, it does not access the stack of the current function and should not block shrink wrapping. Note that the caller's stack may get accessed when passing an argument via the stack, but not the stack of the current function. This addresses part of the TODO from D63152. Reviewed By: thegameg Differential Revision: https://reviews.llvm.org/D149668	2023-05-03 09:28:07 +01:00
Zhiyao Ma	1d0ccebcd7	[ARM] Don't allocate memory if free space in segmented stack is just enough Assuming that the stack grows downwards, it is fine if the stack pointer is exactly at the stacklet boundary. We should use less-or-equal condition when deciding whether to skip new memory allocation. Differential Revision: https://reviews.llvm.org/D149315	2023-05-02 13:09:49 +01:00
sgokhale	bb5befefc6	Revert "[CodeGen][ShrinkWrap] Split restore point" This reverts commit 5f0bccc3d1a74111458c71f009817c9995f4bf83. An issue has been reported here: https://github.com/ClangBuiltLinux/linux/issues/1833	2023-04-13 10:52:28 +05:30
sgokhale	5f0bccc3d1	[CodeGen][ShrinkWrap] Split restore point This patch splits a restore point to allow it to only post-dominate blocks reachable by use or def of CSRs(Callee Saved Registers)/FI(Frame Index). Benchmarking this on SPEC2017, this gives around 4% improvement on povray and no significant change for others. Co-authored-by: junbuml Differential Revision: https://reviews.llvm.org/D42600	2023-04-11 11:58:50 +05:30
David Green	7abe3497e7	[LSR] Improve filtered uses in NarrowSearchSpaceByPickingWinnerRegs NarrowSearchSpaceByPickingWinnerRegs has an aggressive filtering method to reduce the complexity of the search space down by picking a best formula with the highest number of reuses and assuming it will yield profitable reuse. In certain cases we can find a best formula like {X+30,+,1} and later check a formula like {X,+,1} with the same number of Uses. On some architectures it can be better to pick {X,+,1}, especially if an offset of 30 can be used as a legal addressing mode, but -30 cannot. That happens under Thumb1 code, which has fairly limited addressing modes. This patch adds a check to see if it can pick the simpler formula, if it looks more profitable. Differential Revision: https://reviews.llvm.org/D144014	2023-02-16 15:48:12 +00:00
David Green	66749ce927	[ARM] Add Thumb LSR codegen tests. NFC This is the same routine generated in two different ways that ends up with different orders to loads. The first currently does better than the second with ordered loads, but needn't if the filtering in LSR is improved.	2023-02-16 14:24:51 +00:00
Nikita Popov	e6bf3fa05b	[Thumb] Convert tests to opaque pointers (NFC)	2022-12-19 13:05:05 +01:00
Roman Lebedev	6b3c2aed49	[NFC] Port codegen Thumb tests that invoke opt to `-passes=` syntax	2022-12-09 01:04:47 +03:00
Roman Lebedev	b1a9584818	[opt] Disincentivize new tests from using old pass syntax Over the past day or so, i've took a large swing at our tests, and reduced the number of tests that were still using the old syntax from ~1800 to just 200. Left to handle: (as it is seen in this patch) * Transforms/LSR * Transforms/CGP * Transforms/TypePromotion * Transforms/HardwareLoops * Analysis/* * some misc. I think this is the right point to start actively refusing to honor the old syntax, except for the old tests, to prevent the old syntax from creeping back in. Thus, let's add temporary default-off flag, and if it is not passed refuse to accept old syntax. The tests that still need porting are annotated with this flag. Reviewed By: aeubanks Differential Revision: https://reviews.llvm.org/D139647	2022-12-08 23:54:03 +03:00
Jonas Paulsson	5ecd363295	Reapply "[CodeGen] Add new pass for late cleanup of redundant definitions." This reverts commit 122efef8ee9be57055d204d52c38700fe933c033. - Patch fixed to not reuse definitions from predecessors in EH landing pads. - Late review suggestions (by MaskRay) have been addressed. - M68k/pipeline.ll test updated. - Init captures added in processBlock() to avoid capturing structured bindings. - RISCV has this disabled for now. Original commit message: A new pass MachineLateInstrsCleanup is added to be run after PEI. This is a simple pass that removes redundant and identical instructions whenever found by scanning the MF once while keeping track of register definitions in a map. These instructions are typically immediate loads resulting from rematerialization, and address loads emitted by target in eliminateFrameInde(). This is enabled by default, but a target could easily disable it by means of 'disablePass(&MachineLateInstrsCleanupID);'. This late cleanup is naturally not "optimal" in removing instructions as it is done by looking at phys-regs, but still quite effective. It would be desirable to improve other parts of CodeGen and avoid these redundant instructions in the first place, but there are no ideas for this yet. Differential Revision: https://reviews.llvm.org/D123394 Reviewed By: RKSimon, foad, craig.topper, arsenm, asb	2022-12-05 12:53:50 -06:00
Jonas Paulsson	122efef8ee	Revert "Reapply "[CodeGen] Add new pass for late cleanup of redundant definitions."" This reverts commit 17db0de330f943833296ae72e26fa988bba39cb3. Some more bots got broken - need to investigate.	2022-12-05 00:52:00 +01:00
Jonas Paulsson	17db0de330	Reapply "[CodeGen] Add new pass for late cleanup of redundant definitions." Init captures added in processBlock() to avoid capturing structured bindings, which caused the build problems (with clang). RISCV has this disabled for now until problems relating to post RA pseudo expansions are resolved.	2022-12-03 14:15:15 -06:00
Jonas Paulsson	8ef4632681	Revert "[CodeGen] Add new pass for late cleanup of redundant definitions." Temporarily revert and fix buildbot failure. This reverts commit 6d12599fd4134c1da63198c74a25490d28c733f6.	2022-12-01 13:29:24 -05:00
Jonas Paulsson	6d12599fd4	[CodeGen] Add new pass for late cleanup of redundant definitions. A new pass MachineLateInstrsCleanup is added to be run after PEI. This is a simple pass that removes redundant and identical instructions whenever found by scanning the MF once while keeping track of register definitions in a map. These instructions are typically immediate loads resulting from rematerialization, and address loads emitted by target in eliminateFrameInde(). This is enabled by default, but a target could easily disable it by means of 'disablePass(&MachineLateInstrsCleanupID);'. This late cleanup is naturally not "optimal" in removing instructions as it is done by looking at phys-regs, but still quite effective. It would be desirable to improve other parts of CodeGen and avoid these redundant instructions in the first place, but there are no ideas for this yet. Differential Revision: https://reviews.llvm.org/D123394 Reviewed By: RKSimon, foad, craig.topper, arsenm, asb	2022-12-01 13:21:35 -05:00
Daniel Thornburgh	75cdab6dc2	[llvm-objdump] Add --no-print-imm-hex to tests depending on it. This prepares for an upcoming change to make --print-imm-hex the default behavior of llvm-objdump. These tests were updated in a semi-automatic fashion. See D136972 for details.	2022-10-29 15:40:26 -07:00

1 2 3 4 5 ...

486 Commits