llvm-project

Author	SHA1	Message	Date
John Brawn	8336d38be9	[ARM] Correctly handle combining segmented stacks with execute-only Using segmented stacks with execute-only mostly works, but we need to use the correct movi32 opcode in 6-M, and there's one place where for thumb1 (i.e. 6-M and 8-M.base) a constant pool was unconditionally used which needed to be fixed. Differential Revision: https://reviews.llvm.org/D156339	2023-07-28 10:37:40 +01:00
Fangrui Song	845d83d85f	[test] Add --show-all-symbols to some llvm-objdump -d commands llvm-objdump -d will be changed to not display mapping symbols by default (D156190). Add --show-all-symbols to make the intent clearer and prevent test adjustment with the new behavior.	2023-07-27 19:33:51 -07:00
Jay Foad	2dcf051259	[CodeGen] Store call frame size in MachineBasicBlock Record the call frame size on entry to each basic block. This is usually zero except when a basic block has been split in the middle of a call sequence. This simplifies PEI::replaceFrameIndices which previously had to visit basic blocks in a specific order and had special handling for unreachable blocks. More importantly it paves the way for an equally simple implementation of a backwards version of replaceFrameIndices, which is required to fully convert PrologEpilogInserter to backwards register scavenging, which is preferred because it does not rely on accurate kill flags. Differential Revision: https://reviews.llvm.org/D156113	2023-07-27 10:32:00 +01:00
Jay Foad	6c8f4472b4	[ARM] Extend regression test for D154281 Add a test case with a larger call frame which does not satisfy ARMFrameLowering::hasReservedCallFrame.	2023-07-21 15:48:45 +01:00
Momchil Velikov	4c95f79cce	[CodeGenPrepare] Refactor optimizeSelectInst (NFC) Refactor to use BasicBlockUtils functions and make life easier for a subsequent patch for updating the dominator tree. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D154053	2023-07-19 18:56:44 +01:00
John Brawn	cee7e7b245	[ARM] Correctly handle execute-only in EmitStructByval Currently when compiling for an execute-only target without movt then EmitStructByval will generate a constant pool load which isn't compatible with execute-only. Handle this by emitting tMOVi32imm, and also simplify the existing movt handling by emitting t2MOVi32imm or MOVi32imm. Differential Revision: https://reviews.llvm.org/D154944	2023-07-19 13:56:36 +01:00
John Brawn	1b12b1a335	[ARM] Restructure MOVi32imm expansion to not do pointless instructions The expansion of the various MOVi32imm pseudo-instructions works by splitting the operand into components (either halfwords or bytes) and emitting instructions to combine those components into the final result. When the operand is an immediate with some components being zero this can result in pointless instructions that just add zero. Avoid this by restructuring things so that a separate function handles splitting the operand into components, then don't emit the component if it is a zero immediate. This is straightforward for movw/movt, where we just don't emit the movt if it's zero, but the thumb1 expansion using mov/add/lsl is more complex, as even when we don't emit a given byte we still need to get the shift correct. Differential Revision: https://reviews.llvm.org/D154943	2023-07-19 13:56:36 +01:00
Jay Foad	496766840f	[ARM] Add a regression test for D154281 This is a reduced version of one of the tests that was broken by the original commit of D154281 "[CodeGen] Store SP adjustment in MachineBasicBlock. NFCI.". Differential Revision: https://reviews.llvm.org/D155471	2023-07-19 10:32:21 +01:00
John Brawn	343e204a52	[ARM] Replace TransferImpOps with copyImplicitOps In most places where TransferImpOps is currently used we just have one machine instruction, so it's doing the same thing as copyImplicitOps anyway. In those cases where we have more than one machine instruction the destination is written to in each instruction so any implicit defs should appear on all of them (and we shouldn't see any implicit refs as these pseudo-instruction don't have any register inputs), meaning the current use of TransferImpOps is incorrect and we should be using copyImplicitOps on all of the generated instructions. Differential Revision: https://reviews.llvm.org/D155301	2023-07-18 14:01:04 +01:00
Maurice Heumann	a1cdb323e2	[ARM] Adjust strd/ldrd codegen alignment requirements In change https://reviews.llvm.org/D152790, it was discovered that the alignment requirement calculation for LDRD/STRD codegen was suboptimal and the calculation for volatile loads and stores was adjusted. This change here adopts the calculation for the remaining non-volatile occurrences. Recommitting after undefined behavior fix in D155093. Differential Revision: https://reviews.llvm.org/D153800	2023-07-14 12:54:18 -07:00
Oliver Stannard	aea8db8eb9	Revert "[CodeGen] Store SP adjustment in MachineBasicBlock. NFCI." This reverts commit 58d1eaa3b6ce4f7285c51f83faff7a3ac374c746.	2023-07-13 14:25:39 +01:00
Caslyn Tonelli	b11559122e	Revert "[ARM] Restructure MOVi32imm expansion to not do pointless instructions" This reverts commit 647aff28558b6b1379f0892138059b403192512a. Differential Revision: https://reviews.llvm.org/D155122	2023-07-12 23:29:15 +00:00
Jay Foad	58d1eaa3b6	[CodeGen] Store SP adjustment in MachineBasicBlock. NFCI. Record the SP adjustment on entry to each basic block. This is almost always zero except on targets like ARM which can split a basic block in the middle of a call sequence. This simplifies PEI::replaceFrameIndices which previously had to visit basic blocks in a specific order and had special handling for unreachable blocks. More importantly it paves the way for an equally simple implementation of a backwards version of replaceFrameIndices, which is required to fully convert PrologEpilogInserter to backwards register scavenging, which is preferred because it does not rely on accurate kill flags. Differential Revision: https://reviews.llvm.org/D154281	2023-07-12 14:29:26 +01:00
Nikita Popov	edb2fc6dab	[llvm] Remove explicit -opaque-pointers flag from tests (NFC) Opaque pointers mode is enabled by default, no need to explicitly enable it.	2023-07-12 14:35:55 +02:00
John Brawn	210f61cbdd	[ARM] Correctly handle execute-only in EmitStructByval Currently when compiling for an execute-only target without movt then EmitStructByval will generate a constant pool load which isn't compatible with execute-only. Handle this by emitting tMOVi32imm, and also simplify the existing movt handling by emitting t2MOVi32imm or MOVi32imm. Differential Revision: https://reviews.llvm.org/D154944	2023-07-12 11:48:01 +01:00
John Brawn	647aff2855	[ARM] Restructure MOVi32imm expansion to not do pointless instructions The expansion of the various MOVi32imm pseudo-instructions works by splitting the operand into components (either halfwords or bytes) and emitting instructions to combine those components into the final result. When the operand is an immediate with some components being zero this can result in pointless instructions that just add zero. Avoid this by restructuring things so that a separate function handles splitting the operand into components, then don't emit the component if it is a zero immediate. This is straightforward for movw/movt, where we just don't emit the movt if it's zero, but the thumb1 expansion using mov/add/lsl is more complex, as even when we don't emit a given byte we still need to get the shift correct. Differential Revision: https://reviews.llvm.org/D154943	2023-07-12 11:48:01 +01:00
Simon Wallis	82458ce69e	[ARM] mark tMOVi32imm as killing flags Mark the tMOVi32imm pseudo instr as killing the flags register. The pseudo instruction expands to a sequence of 7 movs/lsls/adds instructions, which are all Thumb-1 flag setting instructions. For a test case, take an existing arm test which checks for "Don't CSE a cmp across a call that clobbers CPSR." and retarget it at thumbv6m execute-only. Reviewed By: stuij Differential Revision: https://reviews.llvm.org/D154845 Change-Id: I8f8209fbc40a833f8875629937b9606c1e2c021d	2023-07-11 14:42:07 +01:00
Ties Stuij	f0ae3c23b5	[ARM] in LowerConstantFP, make sure we cover armv6-m execute-only Currently in LowerConstantFP, when we compile for execute-only (XO) we don't check what architecture we're compiling for (v6m=< or >v6m). We shouldn't get here for v6m, so put in an assert. Reviewed By: simonwallis2, dmgreen Differential Revision: https://reviews.llvm.org/D154506	2023-07-11 10:42:15 +01:00
Ties Stuij	d145abcfb3	[ARM] fix typo in large-stack.ll introduced when fixing another typo	2023-07-04 11:23:24 +01:00
Ties Stuij	61bcaae7ab	[ARM] fix typo in large-stack.ll test In llvm/test/CodeGen/ARM/large-stack.ll, the C in FileCheck wasn't uppercased. This wasn't spotted in development as MacOS's HFS+ fs is apparently often configured case-insensitive.	2023-07-04 11:18:25 +01:00
Ties Stuij	112d769e5e	[ARM] generate correct code for armv6-m XO big stack operations The ARM backend codebase is dotted with places where armv6-m will generate constant pools. Now that we can generate execute-only code for armv6-m, we need to make sure we use the movs/lsls/adds/lsls/adds/lsls/adds pattern instead of these. Big stacks is one of the obvious places. In this patch we take care of two sites: 1. take care of big stacks in prologue/epilogue 2. take care of save/tSTRspi nodes, which implicitly fixes emitThumbRegPlusImmInReg which is used in several frame lowering fns Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D154233	2023-07-04 10:40:06 +01:00
David Spickett	ab3bb86d44	Revert "[ARM] Adjust strd/ldrd codegen alignment requirements" This reverts commit 92a9c30c61da7f973d55cd84fade424159b9cac9. This has caused a test failure in the 2nd stage of Linaro's Arm 32 bit buildbots. LLVM::simplified-template-names.s 7: error: Simplified template DW_AT_name could not be reconstituted: check:10'0 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 8: original: f3<unsigned char, (unsigned char)'\x00'> check:10'0 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 9: reconstituted: f3<unsigned char, (unsigned char)'\x7f'> check:10'0 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ I suspect a load/store is slightly off.	2023-07-03 14:05:49 +00:00
Maurice Heumann	92a9c30c61	[ARM] Adjust strd/ldrd codegen alignment requirements In change https://reviews.llvm.org/D152790, it was discovered that the alignment requirement calculation for LDRD/STRD codegen was suboptimal and the calculation for volatile loads and stores was adjusted. This change here adopts the calculation for the remaining non-volatile occurrences. Differential Revision: https://reviews.llvm.org/D153800	2023-07-02 14:25:25 -07:00
Fangrui Song	afd20587f9	MachineFunction: -fsanitize={function,kcfi}: ensure 4-byte alignment Fix https://github.com/llvm/llvm-project/issues/63579 ``` % cat a.c void foo() {} % clang --target=arm-none-eabi -mthumb -mno-unaligned-access -fsanitize=kcfi a.c -S -o - \| grep p2align .p2align 1 % clang --target=armv6m-none-eabi -fsanitize=function a.c -S -o - \| grep p2align .p2align 1 ``` Ensure that -fsanitize={function,kcfi} instrumented functions are aligned by at least 4, so that loading the type hash before the function label will not cause a misaligned access. This is especially important for -mno-unaligned-access configurations that don't set `setMinFunctionAlignment` to 4 or greater. With this patch, the generated assembly for the examples above will contain `.p2align 2` before the type hash. If `__attribute__((aligned(N)))` or `-falign-functions=N` is specified, the larger alignment will be used. Reviewed By: simon_tatham, samitolvanen Differential Revision: https://reviews.llvm.org/D154125	2023-06-30 09:13:19 -07:00
Matt Arsenault	160d7227e0	DAG: Fix libcall expansion for frexp on ARM The ExpandLibcallResult result was a bitcast and not the direct call result, so we couldn't find the chain. Use the new separate chain return value instead.	2023-06-30 09:03:45 -04:00
Luke Lau	742fb8b5c7	[DAGCombine] Fold (store (insert_elt (load p)) x p) -> (store x) If we have a store of a load with no other uses in between it, it's considered dead and is removed. So sometimes when legalizing a fixed length vector store of an insert, we end up producing better code through scalarization than without. An example is the following: %a = load <4 x i64>, ptr %x %b = insertelement <4 x i64> %a, i64 %y, i32 2 store <4 x i64> %b, ptr %x If this is scalarized, then DAGCombine successfully removes 3 of the 4 stores which are considered dead, and on RISC-V we get: sd a1, 16(a0) However if we make the vector type legal (-mattr=+v), then we lose the optimisation because we don't scalarize it. This patch attempts to recover the optimisation for vectors by identifying patterns where we store a load with a single insert in between, replacing it with a scalar store of the inserted element. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D152276	2023-06-28 22:45:04 +01:00
John Brawn	4fb0e0114f	[ARM] Generate out-of-line jump tables for XO without 32-bit branch When we only have a 16-bit pc-relative branch instruction we generate a table of address for a jump table. Currently this is placed inline, but this won't work with execute-only memory. In this case generate the jump table out-of-line. Differential Revision: https://reviews.llvm.org/D153774	2023-06-28 13:30:39 +01:00
Ties Stuij	4f19c6a7c7	[ARM] allow long-call codegen for armv6-M eXecute Only (XO) Recently eXecute Only (XO) codegen was also allowed for armv6-M. Previously this was only implemented for ~armv7+, effectively if MOVW/MOVT is available. Regarding long calls, we remove the check for MOVW/MOVT when generating code for XO, which already was redundant as in the subtarget initialization we already check if XO is valid for the target. And targets that generate valid XO code should be able to handle the (wrapper globaladdress) node. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D153782	2023-06-28 10:50:24 +01:00
Ties Stuij	03db28edbb	[ARM] in ExpandTMOV32BitImm, CPSR register ops should be `Define`d The CPSR registers ops of the instructions constructed in ExpandTMOV32BitImm were marked as kill, instead of define. Best to use the pre-existing t1CondCodeOp fn to construct CPSRs. Reviewed By: simonwallis2 Differential Revision: https://reviews.llvm.org/D153763	2023-06-27 14:58:22 +01:00
Matthias Braun	02ba5b8c6b	Ignore load/store until stack address computation No longer conservatively assume a load/store accesses the stack when we can prove that we did not compute any stack-relative address up to this point in the program. We do this in a cheap not-quite-a-dataflow-analysis: Assume `NoStackAddressUsed` when all predecessors of a block already guarantee it. Process blocks in reverse post order to guarantee that except for loop headers we have processed all predecessors of a block before processing the block itself. For loops we accept the conservative answer as they are unlikely to be shrink-wrappable anyway. Differential Revision: https://reviews.llvm.org/D152213	2023-06-26 13:50:36 -07:00
Matthias Braun	759b217626	Switch tests to use update_llc_test_checks Switch and update some tests to use `update_llc_test_checks` to reduce clutter in upcoming change. Differential Revision: https://reviews.llvm.org/D152215	2023-06-26 13:50:36 -07:00
Maurice Heumann	249bd9eab0	[ARM] Fix codegen of unaligned volatile load/store of i64 Volatile loads/stores of i64 are lowered to LDRD/STRD on ARMv5TE. However, these instructions require the addresses to be aligned. Unaligned loads/stores therefore should be ignored by this handling. Differential Revision: https://reviews.llvm.org/D152790	2023-06-26 10:45:41 -07:00
Eli Friedman	bc7f11ccb0	[SelectionDAG] Improve expansion of wide min/max The current implementation tries to handle the high and low halves separately, but that's less efficient in most cases; use a wide SETCC instead. Differential Revision: https://reviews.llvm.org/D151358	2023-06-26 10:45:41 -07:00
Amaury Séchet	8412a17b79	[NFC] Autogenerate CodeGen/ARM/2013-07-29-vector-or-combine.ll	2023-06-25 01:05:21 +00:00
Amaury Séchet	7457acb842	[NFC] Autogenerate CodeGen/ARM/2011-03-15-LdStMultipleBug.ll	2023-06-25 01:02:49 +00:00
Amaury Séchet	e271a539c5	[NFC] Autogenerate CodeGen/ARM/and-sext-combine.ll	2023-06-25 00:55:03 +00:00
Amaury Séchet	78c1985f99	[NFC] Autogenerate CodeGen/ARM/machine-cse-cmp.ll	2023-06-25 00:44:30 +00:00
Amaury Séchet	2e8111d4c4	[NFC] Autogenerate CodeGen/ARM/pr35103.ll	2023-06-25 00:29:14 +00:00
Ties Stuij	5ddd561cb5	disable execute-only tests which are failing with expensive checks Temporarily disabling the execute-only tests. We recently added codegen for armv6-m, which is still in heavy development (D152795). Disabling the tests while we're figuring out what's going on is probably the least disruptive option, as a patch dependent on it also already landed.	2023-06-23 16:35:24 +01:00
Ties Stuij	2273741ea2	[ARM] generate armv6m eXecute Only (XO) code [ARM] generate armv6m eXecute Only (XO) code for immediates, globals Previously eXecute Only (XO) support was implemented for targets that support MOVW/MOVT (~armv7+). See: https://reviews.llvm.org/D27449 XO prevents the compiler from generating data accesses to code sections. This patch implements XO codegen for armv6-M, which does not support MOVW/MOVT, and must resort to the following general pattern to avoid loads: movs r3, :upper8_15:foo lsls r3, #8 adds r3, :upper0_7:foo lsls r3, #8 adds r3, :lower8_15:foo lsls r3, #8 adds r3, :lower0_7:foo ldr r3, [r3] This is equivalent to the code pattern generated by GCC. The above relocations are new to LLVM and have been implemented in a parent patch: https://reviews.llvm.org/D149443. This patch limits itself to implementing codegen for this pattern and enabling XO for armv6-M in the backend. Separate patches will follow for: - switch tables - replacing specific loads from constant islands which are spread out over the ARM backend codebase. Amongst others: FastISel, call lowering, stack frames. Reviewed By: john.brawn Differential Revision: https://reviews.llvm.org/D152795	2023-06-23 10:50:47 +01:00
Fangrui Song	bef8294650	[XRay] Make xray_instr_map compatible with Mach-O The `__DATA,xray_instr_map` section has label differences like `.quad Lxray_sled_0-Ltmp0` that is represented as a pair of UNSIGNED and SUBTRACTOR relocations. LLVM integrated assembler attempts to rewrite A-B into A-B'+offset where B' can be included in the symbol table. B' is called an atom and should be a non-temporary symbol in the same section. However, since `xray_instr_map` does not define a non-temporary symbol, the SUBTRACTOR relocation will have no associated symbol, and its `r_extern` value will be 0. Therefore, we will see linker errors like: error: SUBTRACTOR relocation must be extern at offset 0 of __DATA,xray_instr_map in a.o To fix this issue, we need to define a non-temporary symbol in the section. We can accomplish this by renaming `Lxray_sleds_start0` to `lxray_sleds_start0` ("L" to "l"). `lxray_sleds_start0` serves as the atom for this dead-strippable subsection. With the `S_ATTR_LIVE_SUPPORT` attribute, `ld -dead_strip` will retain subsections that reference live functions. Special thanks to Oleksii Lozovskyi for reporting the issue and providing initial analysis. Differential Revision: https://reviews.llvm.org/D153239	2023-06-22 10:03:17 -07:00
David Green	400b3c47c2	[ARM] Repair check lines in sub-cmp-peephole.ll test. NFC Commit ec77747fbdca901e0fded58f940dae62e0f6b726 regenerated the check lines without being very careful about which lines were updated. This attempts to fix them to make sure the V7 and V8 lines are emitted as needed.	2023-06-21 22:47:30 +01:00
Matt Arsenault	80e2c26dfd	RegisterCoalescer: Fix name of pass I finally snapped and fixed this inconsistency.	2023-06-21 10:30:43 -04:00
Fangrui Song	e0a6561ec9	[XRay] Make xray_fn_idx entries PC-relative As mentioned by commit c5d38924dc6688c15b3fa133abeb3626e8f0767c (Apr 2020), PC-relative entries avoid dynamic relocations and can therefore make the section read-only. This is similar to D78082 and D78590. We cannot commit to support compiler/runtime built at different versions, so just don't play with versions. For Mach-O support (incomplete yet), we use non-temporary `lxray_fn_idx[0-9]+` symbols. Label differences are represented as a pair of UNSIGNED and SUBTRACTOR relocations. The SUBTRACTOR external relocation requires r_extern==1 (needs to reference a symbol table entry) which can be satisfied by `lxray_fn_idx[0-9]+`. A `lxray_fn_idx[0-9]+` symbol also serves as the atom for this dead-strippable section (follow-up to commit b9a134aa629de23a1dcf4be32e946e4e308fc64d). Differential Revision: https://reviews.llvm.org/D152661	2023-06-20 22:40:56 -07:00
Fangrui Song	b9a134aa62	[XRay] Mark Mach-O xray_instr_map and xray_fn_idx as S_ATTR_LIVE_SUPPORT Add the `S_ATTR_LIVE_SUPPORT` attribute to the sections so that `ld -dead_strip` will retain subsections that reference live functions, once we add linker private "l" symbols as atoms.	2023-06-18 19:30:16 -07:00
Fangrui Song	49b61ead47	[XRay][test] Make tests less sensitive to .Ltmp/Ltmp label changes	2023-06-18 13:32:40 -07:00
Amaury Séchet	c8f4ba374b	[NFC] Autogenerate CodeGen/ARM/vlddup.ll	2023-06-16 15:35:47 +00:00
Amaury Séchet	ec77747fbd	[NFC] Autogenerate CodeGen/ARM/sub-cmp-peephole.ll	2023-06-16 15:14:47 +00:00
Simon Tatham	10e4228114	[ARM,AArch64] Add a full set of -mtp= options. AArch64 has five system registers intended to be useful as thread pointers: one for each exception level which is RW at that level and inaccessible to lower ones, and the special TPIDRRO_EL0 which is readable but not writable at EL0. AArch32 has three, corresponding to the AArch64 ones that aren't specific to EL2 or EL3. Currently clang supports only a subset of these registers, and not even a consistent subset between AArch64 and AArch32: - For AArch64, clang permits you to choose between the four TPIDR_ELn thread registers, but not the fifth one, TPIDRRO_EL0. - In AArch32, on the other hand, the //only// thread register you can choose (apart from 'none, use a function call') is TPIDRURO, which corresponds to (the bottom 32 bits of) AArch64's TPIDRRO_EL0. So there is no thread register that you can currently use in both targets! For custom and bare-metal purposes, users might very reasonably want to use any of these thread registers. There's no reason they shouldn't all be supported as options, even if the default choices follow existing practice on typical operating systems. This commit extends the range of values acceptable to the `-mtp=` clang option, so that you can specify any of these registers by (the lower-case version of) their official names in the ArmARM: - For AArch64: tpidr_el0, tpidrro_el0, tpidr_el1, tpidr_el2, tpidr_el3 - For AArch32: tpidrurw, tpidruro, tpidrprw All existing values of the option are still supported and behave the same as before. Defaults are also unchanged. No command line that worked already should change behaviour as a result of this. The new values for the `-mtp=` option have been agreed with Arm's gcc developers (although I don't know whether they plan to implement them in the near future). Reviewed By: nickdesaulniers Differential Revision: https://reviews.llvm.org/D152433	2023-06-15 09:27:41 +01:00
Amaury Séchet	a70d5e25f3	[DAGCombine] Make sure combined nodes are added back to the worklist in topological order. Currently, a node and its users are added back to the worklist in reverse topological order after it is combined. This diff changes that order to be topological. This is part of a larger migration to get the DAGCombiner to process nodes in topological order. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D127115	2023-06-13 09:14:37 +00:00
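
Several of the commits above (1b12b1a335, 82458ce69e, 2273741ea2) describe the Thumb-1 expansion of a 32-bit immediate into a movs/lsls/adds sequence, and the restructuring that skips zero bytes while keeping the shift amounts correct. The following is a minimal Python sketch of that idea for plain immediates only. It is not the LLVM implementation: the helper names `split_bytes`, `expand_thumb1_imm` and `simulate` are hypothetical, and symbolic operands such as `:upper8_15:foo` are not modelled, since their bytes are not known until link time and so cannot be skipped.

```python
def split_bytes(value):
    """Split a 32-bit value into four bytes, most significant first."""
    value &= 0xFFFFFFFF
    return [(value >> shift) & 0xFF for shift in (24, 16, 8, 0)]


def expand_thumb1_imm(reg, value):
    """Build `value` in `reg` using only movs/lsls/adds with 8-bit immediates.

    Zero bytes emit no adds; the shift they would have needed is
    accumulated and folded into the next lsls instead.
    """
    insns = []
    started = False      # True once the first non-zero byte has been moved in
    pending_shift = 0    # shift owed since the last emitted instruction
    for byte in split_bytes(value):
        if started:
            pending_shift += 8
        if byte == 0:
            continue
        if not started:
            insns.append(f"movs {reg}, #{byte}")
            started = True
        else:
            insns.append(f"lsls {reg}, {reg}, #{pending_shift}")
            insns.append(f"adds {reg}, #{byte}")
            pending_shift = 0
    if not started:
        # Every byte was zero, so a single move suffices.
        insns.append(f"movs {reg}, #0")
    elif pending_shift:
        # Trailing zero bytes still need their shift applied.
        insns.append(f"lsls {reg}, {reg}, #{pending_shift}")
    return insns


def simulate(insns):
    """Evaluate the emitted sequence to check that it rebuilds the value."""
    reg = 0
    for insn in insns:
        op = insn.split(" ", 1)[0]
        imm = int(insn.rsplit("#", 1)[1])
        if op == "movs":
            reg = imm
        elif op == "lsls":
            reg = (reg << imm) & 0xFFFFFFFF
        elif op == "adds":
            reg = (reg + imm) & 0xFFFFFFFF
    return reg


if __name__ == "__main__":
    for value in (0x12345678, 0x00120000, 0x12000034, 0):
        seq = expand_thumb1_imm("r3", value)
        assert simulate(seq) == value
        print(f"{value:#010x}: " + "; ".join(seq))
```

With all four bytes non-zero the sketch emits the full seven-instruction sequence mentioned in 82458ce69e. A value such as 0x00120000 collapses to one movs followed by one lsls by 16, which is the "get the shift correct" case called out in 1b12b1a335.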

4780 Commits