llvm-project

Author	SHA1	Message	Date
David Green	8a701024f3	[ARM] Lower i1 concat via MVETRUNC The MVETRUNC operation can perform the same truncate of two vectors, without requiring lane inserts/extracts from every vector lane. This moves the concat i1 lowering to use it for v8i1 and v16i1 result types, trading a bit of extra stack space for fewer instructions.	2023-10-18 19:40:11 +01:00
Nikita Popov	a72d88fb4f	Revert "Reapply [Verifier] Sanity check alloca size against DILocalVariable fragment size" This reverts commit 8840da2db237cd714d975c199d5992945d2b71e9. This results in verifier failures during LTO, see #68929.	2023-10-16 12:17:24 +02:00
weiguozhi	b6043f9867	[RA] Disable split around hint register if optimize for size (#68619) Splitting a virtual register with a hint may generate COPY instructions in multiple cold basic blocks and increase code size. So disable this split when the function is optimized for size.	2023-10-11 14:57:15 -07:00
Nikita Popov	8840da2db2	Reapply [Verifier] Sanity check alloca size against DILocalVariable fragment size Reapply now that generation of incorrect debuginfo for FnDef in rustc has been fixed. ----- Add a check that the DILocalVariable fragment size in dbg.declare does not exceed the size of the alloca. This would have caught the invalid debuginfo regenerated by rustc in https://github.com/llvm/llvm-project/issues/64149. Differential Revision: https://reviews.llvm.org/D158743	2023-10-09 14:22:12 +02:00
Jay Foad	7b3bbd83c0	Revert "[CodeGen] Really renumber slot indexes before register allocation (#67038)" This reverts commit 2501ae58e3bb9a70d279a56d7b3a0ed70a8a852c. Reverted due to various buildbot failures.	2023-10-09 12:31:32 +01:00
Jay Foad	2501ae58e3	[CodeGen] Really renumber slot indexes before register allocation (#67038) PR #66334 tried to renumber slot indexes before register allocation, but the numbering was still affected by list entries for instructions which had been erased. Fix this to make the register allocator's live range length heuristics even less dependent on the history of how instructions have been added to and removed from SlotIndexes's maps.	2023-10-09 11:44:41 +01:00
Fangrui Song	d20190e684	[test] Change llc -march=aarch64\|arm64 to -mtriple=aarch64\|arm64 Similar to commit 806761a7629df268c8aed49657aeccffa6bca449 to avoid issues due to object file format differences. These tests are currently benign.	2023-09-29 10:13:06 -07:00
Tobias Stadler	305fbc1b32	Revert "[GlobalISel] LegalizationArtifactCombiner: Elide redundant G_AND" This reverts commit 3686a0b611c65f0d7190345b8e3e73cdca9fa657. This seems to have broken some sanitizer tests: https://lab.llvm.org/buildbot/#/builders/184/builds/7721	2023-09-29 03:35:40 +02:00
Tobias Stadler	3686a0b611	[GlobalISel] LegalizationArtifactCombiner: Elide redundant G_AND The legalizer currently generates lots of G_AND artifacts. For example between boolean uses and defs there is always a G_AND with a mask of 1, but when the target uses ZeroOrOneBooleanContents, this is unnecessary. Currently these artifacts have to be removed using post-legalize combines. Omitting these artifacts at their source in the artifact combiner has a few advantages: - We know that the emitted G_AND is very likely to be useless, so our KnownBits call is likely worth it. - The G_AND and G_CONSTANT can interrupt e.g. G_UADDE/... sequences generated during legalization of wide adds which makes it harder to detect these sequences in the instruction selector (e.g. useful to prevent unnecessary reloading of AArch64 NZCV register). - This cleans up a lot of legalizer output and even improves compilation-times. AArch64 CTMark geomean: `O0` -5.6% size..text; `O0` and `O3` ~-0.9% compilation-time (instruction count). Since this introduces KnownBits into code-paths used by `O0`, I reduced the default recursion depth. This doesn't seem to make a difference in CTMark, but should prevent excessive recursive calls in the worst case. Reviewed By: aemerson Differential Revision: https://reviews.llvm.org/D159140	2023-09-29 02:11:57 +02:00
Jay Foad	fb32baf0ec	[ARM] Make some test checks more robust This makes some tests robust against minor codegen differences that will be caused by PR #67038.	2023-09-28 14:26:13 +01:00
Douglas Yung	6716d3dd77	Move test split-deadloop.mir that was added in e3d714f to AArch64 directory instead of ARM.	2023-09-26 09:51:47 -07:00
weiguozhi	31f81e96a4	[RA] Don't split a register generated from another split (#67351) Splitting a register generated from another split usually doesn't bring much benefit. It may also cause a dead loop, as pr67188 shows, if the heuristic cost always satisfies the split condition. So prevent such splitting. This fixes pr67188.	2023-09-26 08:38:18 -07:00
Muhammad Omair Javaid	431969ede1	Revert "[SimplifyCFG] Transform for redirecting phis between unmergeable BB and SuccBB (#67275)" This reverts commit fc86d031fec5e47c6811efd3a871742ad244afdd. This change breaks LLVM buildbot clang-aarch64-sve-vls-2stage https://lab.llvm.org/buildbot/#/builders/176/builds/5474 I am going to revert this patch as the bot has been failing for more than a day without a fix.	2023-09-26 15:47:16 +05:00
XChy	fc86d031fe	[SimplifyCFG] Transform for redirecting phis between unmergeable BB and SuccBB (#67275) This patch extends function TryToSimplifyUncondBranchFromEmptyBlock to handle the similar cases below. ```llvm define i8 @src(i8 noundef %arg) { start: switch i8 %arg, label %unreachable [ i8 0, label %case012 i8 1, label %case1 i8 2, label %case2 i8 3, label %end ] unreachable: unreachable case1: br label %case012 case2: br label %case012 case012: %phi1 = phi i8 [ 3, %case2 ], [ 2, %case1 ], [ 1, %start ] br label %end end: %phi2 = phi i8 [ %phi1, %case012 ], [ 4, %start ] ret i8 %phi2 } ``` The phis here should be merged into one phi, so that we can better optimize it: ```llvm define i8 @tgt(i8 noundef %arg) { start: switch i8 %arg, label %unreachable [ i8 0, label %end i8 1, label %case1 i8 2, label %case2 i8 3, label %case3 ] unreachable: unreachable case1: br label %end case2: br label %end case3: br label %end end: %phi = phi i8 [ 4, %case3 ], [ 3, %case2 ], [ 2, %case1 ], [ 1, %start ] ret i8 %phi } ``` Proof: [normal](https://alive2.llvm.org/ce/z/vAWi88) [multiple stages](https://alive2.llvm.org/ce/z/DDBQqp) [multiple stages 2](https://alive2.llvm.org/ce/z/nGkeqN) [multiple phi combinations](https://alive2.llvm.org/ce/z/VQeEdp) And lookup table optimization should convert it into add %arg 1. This patch just matches similar CFG structures and merges the phis in the different cases. Maybe such a transform can be applied to other situations besides switch, but I'm not sure whether it's better than not merging. Therefore, I only try it in switch. Related issue: #63876 [Migrated](https://reviews.llvm.org/D155940)	2023-09-25 10:13:45 +08:00
Matt Harding	64d1ceaa38	Add command line option --no-trap-after-noreturn (#67051) Add the command line option --no-trap-after-noreturn, which exposes the pre-existing TargetOption `NoTrapAfterNoreturn`. This pull request was split off from this one: https://github.com/llvm/llvm-project/pull/65876	2023-09-22 22:03:21 +02:00
Jon Roelofs	83e6d2edfc	Revert "[ARM] Always lower direct calls as direct when the outliner is enabled (#66434)" This reverts commit 003bcad9a8b21e15e3786a52b1dafa844075ab84. ARM folks say it regresses some of their benchmarks: https://github.com/llvm/llvm-project/pull/66434#issuecomment-1722424162	2023-09-18 09:45:46 -07:00
Nikita Popov	38c59b9f53	Revert "Reapply [Verifier] Sanity check alloca size against DILocalVariable fragment size" This reverts commit 47324cfd7d8ca1a2a5cbb9f948ecff66a28ee6bc. This exposed incorrect debuginfo in rustc. Revert the verification until this has been fixed.	2023-09-18 17:24:53 +02:00
Guozhi Wei	cbdccb30c2	[RA] Split a virtual register in cold blocks if it is not assigned preferred physical register If a virtual register is not assigned its preferred physical register, it means some COPY instructions will be changed to real register move instructions. In this case we can try to split the virtual register in colder blocks; if this succeeds, the original COPY instructions can be deleted, and the new COPY instructions in colder blocks will be generated as register move instructions. This results in fewer dynamic register move instructions being executed. The new test case split-reg-with-hint.ll gives an example: the hot path contains 24 instructions without this patch, and only 4 instructions with this patch. Differential Revision: https://reviews.llvm.org/D156491	2023-09-15 19:52:50 +00:00
Jon Roelofs	003bcad9a8	[ARM] Always lower direct calls as direct when the outliner is enabled (#66434) The indirect lowering hinders the outliner's ability to see that sequences are in fact common, since the sequence similarity is rendered opaque by the register callee. The size savings from making them indirect seems to be dwarfed by the outliner's savings from de-duplication. rdar://115178034 rdar://115459865	2023-09-15 10:04:56 -07:00
Nikita Popov	47324cfd7d	Reapply [Verifier] Sanity check alloca size against DILocalVariable fragment size Reapply after fixing a clang bug this exposed in D158972 and adjusting a number of tests that failed for 32-bit targets. ----- Add a check that the DILocalVariable fragment size in dbg.declare does not exceed the size of the alloca. This would have caught the invalid debuginfo regenerated by rustc in https://github.com/llvm/llvm-project/issues/64149. Differential Revision: https://reviews.llvm.org/D158743	2023-09-15 14:51:50 +02:00
Allen	347b3f1209	[ARM][ISel] Fix crash of ISD::FMINNUM/FMAXNUM (#65849) ISD::FMINNUM/FMAXNUM should be legal if HasFPARMv8 && HasNEON. For the combination armv7+fp-armv8, armv7 implies that the HasNEON feature is on, and fp-armv8 matches the HasFPARMv8 feature, so it is legal. Fixes https://github.com/llvm/llvm-project/issues/65820	2023-09-14 10:35:07 +08:00
David Green	a82c106e57	[ARM] Change CRC predicate to just HasCRC This removes the backend requirement for crc instructions on HasV8, relying on just HasCRC instead. This should allow them to be selected with ArmV7 + crc, making them more usable whilst hopefully not making them incorrectly generated (they only come from intrinsics, and HasCRC usually requires HasV8). This is how most other instructions are specified.	2023-09-08 09:02:15 +01:00
Matt Arsenault	b14e83d1a4	IR: Add llvm.exp10 intrinsic We currently have log, log2, log10, exp and exp2 intrinsics. Add exp10 to fix this asymmetry. AMDGPU already has most of the code for f32 exp10 expansion implemented alongside exp, so the current implementation is duplicating nearly identical effort between the compiler and library which is inconvenient. https://reviews.llvm.org/D157871	2023-09-01 19:45:03 -04:00
Nikita Popov	98cf20f890	Revert "[Verifier] Sanity check alloca size against DILocalVariable fragment size" This reverts commit 183f49c3e0f4a7facf237581f83ae07e7f4544ab. The lang/cpp/trivial_abi/TestTrivialABI.py lldb test fails on buildbots.	2023-08-28 09:44:51 +02:00
Nikita Popov	183f49c3e0	[Verifier] Sanity check alloca size against DILocalVariable fragment size Add a check that the DILocalVariable fragment size in dbg.declare does not exceed the size of the alloca. This would have caught the invalid debuginfo regenerated by rustc in https://github.com/llvm/llvm-project/issues/64149. Differential Revision: https://reviews.llvm.org/D158743	2023-08-28 09:16:33 +02:00
Oliver Stannard	40614e1c14	[ARM] Save and restore CPSR around tMOVimm32 When resolving a frame index with a large offset for v6M execute-only, we emit a tMOVimm32 pseudo-instruction, which later gets lowered to a sequence of instructions, all of which are flag-setting. However, a frame index may be generated for a register spill or reload instruction, which can be inserted at a point where CPSR is live. This patch inserts MRS and MSR instructions around the tMOVimm32 to save and restore the value of CPSR, if CPSR is live at that point. This may need up to two virtual registers (one to build the immediate value, one to save CPSR) during frame index lowering, which happens after register allocation, so we need to ensure two spill slots are available to the register scavenger to ensure it can free up enough registers for this. There is no test for the emission (or not) of the MRS/MSR pair, because it requires a spill or reload to be inserted at a point where CPSR is live, which requires a large, complex function and is fragile enough that any optimisation changes will break the test. This bug was easily found by csmith with -verify-machineinstrs, which I now run regularly on v6M execute-only (and many other combinations). Patch by John Brawn and myself. Reviewed By: stuij Differential Revision: https://reviews.llvm.org/D158404	2023-08-24 14:15:02 +01:00
Nikita Popov	69bd66b3ce	[Tests] Remove some and/or constant expressions in tests (NFC) In preparation for their removal in D158081.	2023-08-21 12:05:32 +02:00
Keith Walker	2d9c6e699a	[Thumb1] Use callee-saved register to adjust stack pointer When adjusting the Stack Pointer at the end of the function epilogue, use a callee-saved register, rather than explicitly using R4 which may not have been saved. Differential Revision: https://reviews.llvm.org/D157500	2023-08-17 18:29:50 +01:00
Nicholas Guy	d65feccb12	[ARM] Set preferred function alignment Aligning functions yields small performance gains on embedded cores, more so with numerous small function calls. Similar to aligning loops, if the function can fit within a single cache line then the performance overhead of fetching more instructions can be limited. Differential Revision: https://reviews.llvm.org/D157514	2023-08-16 17:31:21 +01:00
Matt Arsenault	c8cac15613	PreISelIntrinsicLowering: Check RuntimeLibcalls instead of TLI for memory functions We need a better mechanism for expressing which calls you are allowed to emit and which calls are recognized. This should be applied to the 17 branch.	2023-08-10 16:40:04 -04:00
John Brawn	f83ab2b3be	[ARM] Improve generation of thumb stack accesses Currently when a stack access is out of range of an sp-relative ldr or str then we jump straight to generating the offset with a literal pool load or mov32 pseudo-instruction. This patch improves that in two ways: * If the offset is within range of sp-relative add plus an ldr then use that. * When we use the mov32 pseudo-instruction, if putting part of the offset into the ldr will simplify the expansion of the mov32 then do so. Differential Revision: https://reviews.llvm.org/D156875	2023-08-07 17:53:32 +01:00
Francesco Petrogalli	cd921e0fd7	[MISched] Do not erase resource booking history for subunits. When dealing with the subunits of a resource group, we should reset the subunits' availability at the first available cycle of the resource that contains the subunits. Previously, the reset operation was returning cycle 0, effectively erasing the booking history of the subunits. Without this change, when using intervals for models that make use of subunits, the erasing of resource booking for subunits can raise the assertion "A resource is being overwritten" in `ResourceSegments::add`. The test added in the patch is one such case. Reviewed By: andreadb Differential Revision: https://reviews.llvm.org/D156530	2023-08-01 14:00:37 +02:00
John Brawn	8336d38be9	[ARM] Correctly handle combining segmented stacks with execute-only Using segmented stacks with execute-only mostly works, but we need to use the correct movi32 opcode in 6-M, and there's one place where for thumb1 (i.e. 6-M and 8-M.base) a constant pool was unconditionally used which needed to be fixed. Differential Revision: https://reviews.llvm.org/D156339	2023-07-28 10:37:40 +01:00
Fangrui Song	845d83d85f	[test] Add --show-all-symbols to some llvm-objdump -d commands llvm-objdump -d will be changed to not display mapping symbols by default (D156190). Add --show-all-symbols to make the intent clearer and prevent test adjustment with the new behavior.	2023-07-27 19:33:51 -07:00
Jay Foad	2dcf051259	[CodeGen] Store call frame size in MachineBasicBlock Record the call frame size on entry to each basic block. This is usually zero except when a basic block has been split in the middle of a call sequence. This simplifies PEI::replaceFrameIndices which previously had to visit basic blocks in a specific order and had special handling for unreachable blocks. More importantly it paves the way for an equally simple implementation of a backwards version of replaceFrameIndices, which is required to fully convert PrologEpilogInserter to backwards register scavenging, which is preferred because it does not rely on accurate kill flags. Differential Revision: https://reviews.llvm.org/D156113	2023-07-27 10:32:00 +01:00
Jay Foad	6c8f4472b4	[ARM] Extend regression test for D154281 Add a test case with a larger call frame which does not satisfy ARMFrameLowering::hasReservedCallFrame.	2023-07-21 15:48:45 +01:00
Momchil Velikov	4c95f79cce	[CodeGenPrepare] Refactor optimizeSelectInst (NFC) Refactor to use BasicBlockUtils functions and make life easier for a subsequent patch for updating the dominator tree. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D154053	2023-07-19 18:56:44 +01:00
John Brawn	cee7e7b245	[ARM] Correctly handle execute-only in EmitStructByval Currently when compiling for an execute-only target without movt then EmitStructByval will generate a constant pool load which isn't compatible with execute-only. Handle this by emitting tMOVi32imm, and also simplify the existing movt handling by emitting t2MOVi32imm or MOVi32imm. Differential Revision: https://reviews.llvm.org/D154944	2023-07-19 13:56:36 +01:00
John Brawn	1b12b1a335	[ARM] Restructure MOVi32imm expansion to not do pointless instructions The expansion of the various MOVi32imm pseudo-instructions works by splitting the operand into components (either halfwords or bytes) and emitting instructions to combine those components into the final result. When the operand is an immediate with some components being zero this can result in pointless instructions that just add zero. Avoid this by restructuring things so that a separate function handles splitting the operand into components, then don't emit the component if it is a zero immediate. This is straightforward for movw/movt, where we just don't emit the movt if it's zero, but the thumb1 expansion using mov/add/lsl is more complex, as even when we don't emit a given byte we still need to get the shift correct. Differential Revision: https://reviews.llvm.org/D154943	2023-07-19 13:56:36 +01:00
Jay Foad	496766840f	[ARM] Add a regression test for D154281 This is a reduced version of one of the tests that was broken by the original commit of D154281 "[CodeGen] Store SP adjustment in MachineBasicBlock. NFCI.". Differential Revision: https://reviews.llvm.org/D155471	2023-07-19 10:32:21 +01:00
John Brawn	343e204a52	[ARM] Replace TransferImpOps with copyImplicitOps In most places where TransferImpOps is currently used we just have one machine instruction, so it's doing the same thing as copyImplicitOps anyway. In those cases where we have more than one machine instruction the destination is written to in each instruction so any implicit defs should appear on all of them (and we shouldn't see any implicit refs as these pseudo-instruction don't have any register inputs), meaning the current use of TransferImpOps is incorrect and we should be using copyImplicitOps on all of the generated instructions. Differential Revision: https://reviews.llvm.org/D155301	2023-07-18 14:01:04 +01:00
Maurice Heumann	a1cdb323e2	[ARM] Adjust strd/ldrd codegen alignment requirements In change https://reviews.llvm.org/D152790, it was discovered that the alignment requirement calculation for LDRD/STRD codegen was suboptimal and the calculation for volatile loads and stores was adjusted. This change adopts the calculation for the remaining non-volatile occurrences. Recommitting after undefined behavior fix in D155093. Differential Revision: https://reviews.llvm.org/D153800	2023-07-14 12:54:18 -07:00
Oliver Stannard	aea8db8eb9	Revert "[CodeGen] Store SP adjustment in MachineBasicBlock. NFCI." This reverts commit 58d1eaa3b6ce4f7285c51f83faff7a3ac374c746.	2023-07-13 14:25:39 +01:00
Caslyn Tonelli	b11559122e	Revert "[ARM] Restructure MOVi32imm expansion to not do pointless instructions" This reverts commit 647aff28558b6b1379f0892138059b403192512a. Differential Revision: https://reviews.llvm.org/D155122	2023-07-12 23:29:15 +00:00
Jay Foad	58d1eaa3b6	[CodeGen] Store SP adjustment in MachineBasicBlock. NFCI. Record the SP adjustment on entry to each basic block. This is almost always zero except on targets like ARM which can split a basic block in the middle of a call sequence. This simplifies PEI::replaceFrameIndices which previously had to visit basic blocks in a specific order and had special handling for unreachable blocks. More importantly it paves the way for an equally simple implementation of a backwards version of replaceFrameIndices, which is required to fully convert PrologEpilogInserter to backwards register scavenging, which is preferred because it does not rely on accurate kill flags. Differential Revision: https://reviews.llvm.org/D154281	2023-07-12 14:29:26 +01:00
Nikita Popov	edb2fc6dab	[llvm] Remove explicit -opaque-pointers flag from tests (NFC) Opaque pointers mode is enabled by default, no need to explicitly enable it.	2023-07-12 14:35:55 +02:00
John Brawn	210f61cbdd	[ARM] Correctly handle execute-only in EmitStructByval Currently when compiling for an execute-only target without movt then EmitStructByval will generate a constant pool load which isn't compatible with execute-only. Handle this by emitting tMOVi32imm, and also simplify the existing movt handling by emitting t2MOVi32imm or MOVi32imm. Differential Revision: https://reviews.llvm.org/D154944	2023-07-12 11:48:01 +01:00
John Brawn	647aff2855	[ARM] Restructure MOVi32imm expansion to not do pointless instructions The expansion of the various MOVi32imm pseudo-instructions works by splitting the operand into components (either halfwords or bytes) and emitting instructions to combine those components into the final result. When the operand is an immediate with some components being zero this can result in pointless instructions that just add zero. Avoid this by restructuring things so that a separate function handles splitting the operand into components, then don't emit the component if it is a zero immediate. This is straightforward for movw/movt, where we just don't emit the movt if it's zero, but the thumb1 expansion using mov/add/lsl is more complex, as even when we don't emit a given byte we still need to get the shift correct. Differential Revision: https://reviews.llvm.org/D154943	2023-07-12 11:48:01 +01:00
Simon Wallis	82458ce69e	[ARM] mark tMOVi32imm as killing flags Mark the tMOVi32imm pseudo instr as killing the flags register. The pseudo instruction expands to a sequence of 7 movs/lsls/adds instructions, which are all Thumb-1 flag setting instructions. For a test case, take an existing arm test which checks for "Don't CSE a cmp across a call that clobbers CPSR." and retarget it at thumbv6m execute-only. Reviewed By: stuij Differential Revision: https://reviews.llvm.org/D154845 Change-Id: I8f8209fbc40a833f8875629937b9606c1e2c021d	2023-07-11 14:42:07 +01:00
Ties Stuij	f0ae3c23b5	[ARM] in LowerConstantFP, make sure we cover armv6-m execute-only Currently in LowerConstantFP, when we compile for execute-only (XO) we don't check what architecture we're compiling for (v6m=< or >v6m). We shouldn't get here for v6m, so put in an assert. Reviewed By: simonwallis2, dmgreen Differential Revision: https://reviews.llvm.org/D154506	2023-07-11 10:42:15 +01:00

4812 Commits