llvm-project

Author	SHA1	Message	Date
Hua Tian	a9d2834508	[llvm][CodeGen] Fix the issue caused by live interval checking in window scheduler (#123184 ) In some corner cases, the cloned MI still retains an old slot index, which leads to the compiler crashing. This patch updates the slot index map before deleting the recycled MI. https://github.com/llvm/llvm-project/issues/123165	2025-01-23 09:39:03 +08:00
Venkata Ramanaiah Nalamothu	f7d8336a2f	[llvm] Pass MachineInstr flags to storeRegToStackSlot/loadRegFromStackSlot (NFC) (#120622 ) This patch is in preparation to enable setting the MachineInstr::MIFlag flags, i.e. FrameSetup/FrameDestroy, on callee saved register spill/reload instructions in prologue/epilogue. This eventually helps in setting the prologue_end and epilogue_begin markers more accurately. The DWARF Spec in "6.4 Call Frame Information" says: "The code that allocates space on the call frame stack and performs the save operation is called the subroutine’s prologue, and the code that performs the restore operation and deallocates the frame is called its epilogue." This means the callee saved register spills and reloads are part of prologue (a.k.a frame setup) and epilogue (a.k.a frame destruction), respectively. And, IIUC, LLVM backend uses FrameSetup/FrameDestroy flags to identify instructions that are part of call frame setup and destruction. In the trunk, while most targets consistently set FrameSetup/FrameDestroy on save/restore call frame information (CFI) instructions of callee saved registers, they do not consistently set those flags on the actual callee saved register spill/reload instructions. I believe this patch provides a clean mechanism to set FrameSetup/FrameDestroy flags on the actual callee saved register spill/reload instructions as needed. And, by having default argument of MachineInstr::NoFlags for Flags, this patch is NFC. With this patch, the targets have to just pass FrameSetup/FrameDestroy flag to the storeRegToStackSlot/loadRegFromStackSlot calls from the target derived spillCalleeSavedRegisters and restoreCalleeSavedRegisters to set those flags on callee saved register spill/reload instructions. Also, this patch makes it very easy to set the source line information on callee saved register spill/reload instructions which is needed by the DwarfDebug.cpp implementation to set prologue_end and epilogue_begin markers more accurately. As per DwarfDebug.cpp implementation: prologue_end is the first known non-DBG_VALUE and non-FrameSetup location that marks the beginning of the function body; epilogue_begin is the first FrameDestroy location that has been seen in the epilogue basic block. With this patch, the targets have to just do the following to set the source line information on callee saved register spill/reload instructions, without hampering LLVM's efforts to avoid adding source line information on the artificial code generated by the compiler. <Foo>InstrInfo::storeRegToStackSlot() { ... DebugLoc DL = Flags & MachineInstr::FrameSetup ? DebugLoc() : MBB.findDebugLoc(I); ... } <Foo>InstrInfo::loadRegFromStackSlot() { ... DebugLoc DL = Flags & MachineInstr::FrameDestroy ? MBB.findDebugLoc(I) : DebugLoc(); ... } While I understand this patch would break out-of-tree backend builds, I think it is in the right direction. One immediate use case that can benefit from this patch is fixing #120553 becomes simpler.	2025-01-22 13:36:39 +05:30
Craig Topper	4a486e773e	[CodeGen] Use Register/MCRegister::isPhysical. NFC	2025-01-18 23:37:03 -08:00
Shubham Sandeep Rastogi	ee1c852252	[DebugInfo][InstrRef] Treat ORRWrr as a copy instr (#123102 ) The instruction selector uses the `MachineFunction::copySalvageSSA` function to insert `DBG_PHIs` or identify a defining instruction for a copy-like instruction when finalizing Instruction References. AArch64 has the ORR instruction, a logical OR with the variants ORRWrr, which is a register to register variant, and ORRWrs, which is a register to shifted register variant. An ORRWrs where the shift amount is 0 and the zero register ($wzr) is used is considered a copy, for example: `$w0 = ORRWrs $wzr, killed $w3, 0` However, an ORRWrr with a zero register is not considered a copy: `$w0 = ORRWrr $wzr, killed $w3` This causes an issue in the livedebugvalues pass because in aarch64-isel the instruction is the ORRWrr variant, but is then changed to the ORRWrs variant before the livedebugvalues pass. This causes a mismatch between the two passes which leads to a crash in the livedebugvalues pass. This patch fixes the issue.	2025-01-17 09:27:36 -08:00
Oliver Stannard	e31c70d9fa	[AArch64] Add immediate range checks for more MTE instructions (#119216 ) This would have turned the bug fixed in #117146 from a miscompilation into an assertion failure.	2024-12-16 10:35:28 +00:00
Benjamin Maxwell	83c7784c35	[AArch64] Don't emit Neon in streaming[-compatible] functions with -fzero-call-used-regs (#116995 ) Previously, with `-fzero-call-used-regs` clang/LLVM would incorrectly emit Neon instructions in streaming functions, and streaming-compatible functions without SVE. With this change: * In streaming functions, Z/p registers will be zeroed * In streaming compatible functions w/o SVE, D registers will be zeroed - (As Neon vector instructions are illegal including `movi v..`)	2024-11-21 11:02:07 +00:00
David Green	d7263d6d6d	[AArch64] Use second reg class in genSubAdd2SubSub machine combine. In case the first operand is a physical register with no register class, use the second operand of the sub as the register class for the new virtual register in genSubAdd2SubSub machine combine.	2024-11-13 09:22:08 +00:00
Anatoly Trosinenko	44076c9822	[AArch64][PAC] Move emission of LR checks in tail calls to AsmPrinter (#110705 ) Move the emission of the checks performed on the authenticated LR value during tail calls to AArch64AsmPrinter class, so that different checker sequences can be reused by pseudo instructions expanded there. This adds one more option to AuthCheckMethod enumeration, the generic XPAC variant which is not restricted to checking the LR register.	2024-11-12 18:27:19 +03:00
Kazu Hirata	a41922ad75	[AArch64] Remove unused includes (NFC) (#115685 ) Identified with misc-include-cleaner.	2024-11-11 07:35:08 -08:00
Maurice Heumann	607c525110	[ARM64] [Windows] Mark block address as taken when expanding catchrets (#109252 ) This fixes issue #109250 The issue happens during the `MachineBlockPlacement` pass. The block, whose address was previously not taken, is deemed redundant by the pass and subsequently replaced using `MachineBasicBlock::ReplaceUsesOfBlockWith` in `BranchFolding`. ReplaceUsesOfBlockWith only replaces uses in the terminator. However, `expandPostRAPseudo` introduces new block uses when expanding catchrets. These uses do not get replaced, which results in undefined label errors later on. Marking the block address as taken prevents the replacement of the block, without also replacing non-terminator uses.	2024-09-30 11:14:38 -07:00
Sander de Smalen	91a3c6f3d6	[AArch64] Remove redundant COPY from loadRegFromStackSlot (#107396 ) This removes a redundant 'COPY' instruction that #81716 probably forgot to remove. This redundant COPY led to an issue because code in LiveRangeSplitting expects that the instruction emitted by `loadRegFromStackSlot` is an instruction that accesses memory, which isn't the case for the COPY instruction.	2024-09-05 17:54:57 +01:00
Kyungwoo Lee	140381d4bf	[MachineOutliner][NFC] Remove unnecessary RepeatedSequenceLocs.clear() (#106171 ) - When `getOutliningCandidateInfo()` returns `std::nullopt` (meaning no `OutlinedFunction` is created), there is no need to clear the input argument, `RepeatedSequenceLocs`, as it's already being cleared in the main loop of `findCandidates()`. - Replaced `2` by `MinRepeats`, which I missed from https://github.com/llvm/llvm-project/pull/105398	2024-08-28 07:09:54 -07:00
zhongyunde 00443407	e5a5ac0c23	[AArch64] Fold more load.x into load.i with large offset The list of load.x opcodes follows canFoldIntoAddrMode in D152828. Also supports LDRSroX, which canFoldIntoAddrMode missed	2024-08-28 14:15:09 +08:00
Kyungwoo Lee	93b8d07a75	[MachineOutliner][NFC] Refactor (#105398 ) This patch prepares the NFC groundwork for global outlining using CGData, which will follow https://github.com/llvm/llvm-project/pull/90074. - The `MinRepeats` parameter is now explicitly passed to the `getOutliningCandidateInfo` function, rather than relying on a default value of 2. For local outlining, the minimum number of repetitions is typically 2, but for the global outlining (mentioned above), we will optimistically create a single `Candidate` for each `OutlinedFunction` if stable hashes match a specific code sequence. This parameter is adjusted accordingly in global outlining scenarios. - I have also implemented `unique_ptr` for `OutlinedFunction` to ensure safe and efficient memory management within `FunctionList`, avoiding unnecessary implicit copies. This depends on https://github.com/llvm/llvm-project/pull/101461. This is a patch for https://discourse.llvm.org/t/rfc-enhanced-machine-outliner-part-2-thinlto-nolto/78753.	2024-08-27 14:38:36 -07:00
Piyou Chen	b01c006f73	[TII][RISCV] Add renamable bit to copyPhysReg (#91179 ) The renamable flag is useful during MachineCopyPropagation but renamable flag will be dropped after lowerCopy in some case. This patch introduces extra arguments to pass the renamable flag to copyPhysReg.	2024-08-27 10:08:43 +08:00
Craig Topper	7e6b1504c7	[AArch64] Pass DebugLoc by reference to AArch64InstrInfo::copyGPRRegTuple. NFC	2024-08-25 22:20:58 -07:00
Craig Topper	b12d338c17	[AArch64] Use MCRegister in AArch64InstrInfo::copyGPRRegTuple interface. NFC This matches copyPhysReg.	2024-08-25 22:11:31 -07:00
Thurston Dang	324b676a3d	Revert "[AArch64] Fold more load.x into load.i with large offset" This reverts commit 43ffe2eed0d9f73789dbe213023733d164999306. Reason: buildbot breakage starting at https://lab.llvm.org/buildbot/#/builders/85/builds/1102 I manually bisected and found that clang crashed with 43ffe2eed0d9f73789dbe213023733d164999306 but not the immediately preceding commit (33190490c667aaf8b08d5af8b8ce84524f856e80)	2024-08-16 22:32:12 +00:00
Kazu Hirata	dca820951c	[llvm] Use llvm::any_of (NFC) (#104443 )	2024-08-15 17:59:10 -07:00
zhongyunde 00443407	43ffe2eed0	[AArch64] Fold more load.x into load.i with large offset The list of load.x opcodes follows canFoldIntoAddrMode in D152828. Also supports LDRSroX, which canFoldIntoAddrMode missed	2024-08-15 18:22:52 +08:00
zhongyunde 00443407	33190490c6	[AArch64] merge index address with large offset into base address A case for this transformation: https://gcc.godbolt.org/z/nhYcWq1WE Fold `mov w8, #56952; movk w8, #15, lsl #16; ldrb w0, [x0, x8]` into `add x0, x0, 1036288; ldrb w0, [x0, 3704]`. Only LDRBBroX is supported initially. Fix https://github.com/llvm/llvm-project/issues/71917 Note: This PR tries to reland commit 32878c2065 with a fix for the crash in PR79756. The crash is exposed when there is a MOVKWi instruction at the head of a block without a MOVZWi	2024-08-15 18:22:52 +08:00
David Green	36231a5b55	[AArch64] Add verification for MemOp immediate ranges (#97561 ) This adds an implementation of AArch64InstrInfo::verifyInstruction for AArch64, and adds some basic verification of the immediate ranges of memory operations using the information from getMemOpInfo. Some extra memory operations have been added to getMemOpInfo, along with the equivalent opcodes to getLoadStoreImmIdx to ensure we use the correct index. Please let us know if this starts reporting verification failures. Thanks.	2024-08-15 11:20:20 +01:00
David Green	a3cf8642bf	[AArch64] Cleanup existing values in getMemOpInfo (#98196 ) This patch tries to clean up some of the existing values in getMemOpInfo. All values should now be in bytes (not bits), and the MinOffset/MaxOffset are now always represented unscaled (the immediate that will be present in the final instruction). Although I could not find a place where it altered codegen, the offset of a post-index instruction will be 0, not scale*imm. A IsPostIndexLdStOpcode method has been added to try and make sure that case is handled properly.	2024-08-03 12:31:10 +01:00
Momchil Velikov	461126c29c	[AArch64] Fix incorrectly getting the destination reg of an insn (#101205 ) This popped up while investigating https://github.com/llvm/llvm-project/issues/96950 In a few places where we need the destination reg of an instruction we were using a call that worked only by accident.	2024-08-02 15:43:28 +01:00
Daniil Kovalev	56fd2472d8	[PAC] Sign LR with B key for non-leaf functions with ptrauth-returns attr (#100552 ) For pauthtest ABI, there is a bunch of ptrauth-* options, including ptrauth-returns. Use "ptrauth-returns" function attribute to indicate need for LR signing with B key for non-leaf function to avoid using "sign-return-address" and "sign-return-address-key" which were originally designed for pac-ret. Co-authored-by: Ahmed Bougacha <ahmed@bougacha.org> Co-authored-by: Anatoly Trosinenko <atrosinenko@accesssoftek.com>	2024-07-25 22:21:03 +03:00
Matt Arsenault	3cb5604d2c	MachineOutliner: Use PM to query MachineModuleInfo (#99688 ) Avoid getting this from the MachineFunction	2024-07-24 13:22:56 +04:00
David Green	0d7403184d	[AArch64] Add a AArch64InstrInfo::isFpOrNEON method for checking physical register call. NFC	2024-07-15 08:13:52 +01:00
David Green	d3cb277ea3	[AArch64] Rearrange Opcodes in getMemOpInfo. NFC This just changes the order of the opcodes and fields in getMemOpInfo, none of the values are altered.	2024-07-08 23:05:50 +01:00
Nikita Popov	4169338e75	[IR] Don't include Module.h in Analysis.h (NFC) (#97023 ) Replace it with a forward declaration instead. Analysis.h is pulled in by all passes, but not all passes need to access the module.	2024-06-28 14:30:47 +02:00
Sander de Smalen	c436649313	[AArch64] Remove all instances of the 'hasSVEorSME' interfaces. (#96543 ) I've not added any new tests for these, because the original conditions were wrong (they did not consider streaming mode) and we have tests for the positive cases.	2024-06-25 13:27:06 +01:00
Sander de Smalen	62baf21daa	[AArch64] Check for streaming mode in HasSME* features. (#96302 ) This also fixes up some asserts in copyPhysReg, loadRegFromStackSlot and storeRegToStackSlot.	2024-06-24 20:12:31 +01:00
Momchil Velikov	6ec02f7316	[AArch64] Refactor redundant PTEST optimisations (NFC) (#87802 ) This patch refactors `AArch64InstrInfo::optimizePTestInstr` to simplify the convoluted conditions and control flow and make it easier to add the optimisation in https://github.com/llvm/llvm-project/pull/81141	2024-06-18 08:00:59 +01:00
Nikita Popov	db08b0999d	[ARM][AArch64] Bail out if CandidatesWithoutStackFixups is empty (#95410 ) The following code assumes that RepeatedSequenceLocs is non-empty. Bail out if there are less than 2 candidates left, as no outlining is possible in that case. The same check is already present in all the other places where elements from RepeatedSequenceLocs may be dropped. This fixes the issue reported at: https://github.com/llvm/llvm-project/pull/93965#issuecomment-2151989716	2024-06-14 09:29:21 +02:00
Kerry McLaughlin	ea6577a74b	[AArch64][SME] Disable outlining for functions with streaming-mode changes (#95132 )	2024-06-12 10:35:29 +01:00
Yuta Mukai	0c5319e546	[ModuloSchedule][AArch64] Implement modulo variable expansion for pipelining (#65609 ) Modulo variable expansion is a technique that resolves overlap of variable lifetimes by unrolling. The existing implementation solves it by making a copy by move instruction for processors with ordinary registers such as Arm and x86. This method may result in a very large number of move instructions, which can cause performance problems. Modulo variable expansion is enabled by specifying -pipeliner-mve-cg. A backend must implement some newly defined interfaces in PipelinerLoopInfo. They were implemented for AArch64. Discourse thread: https://discourse.llvm.org/t/implementing-modulo-variable-expansion-for-machinepipeliner	2024-06-12 10:27:35 +09:00
Nikita Popov	1c9f4d4b6f	[ARM] Avoid reference into modified vector (#93965 ) FirstCand is a reference to RepeatedSequenceLocs[0]. However, that vector is being modified a lot throughout the function, including one place that reassigns the whole vector. I'm not sure whether this can really happen in practice, but it doesn't seem unlikely that this could lead to a use-after-free. Avoid this by directly using RepeatedSequenceLocs[0] at the start of the function (as a lot of other places already do) and only creating FirstCand at the end where no more modifications take place.	2024-06-03 17:10:35 +02:00
Sander de Smalen	b71434f8b3	[AArch64] Avoid NEON ORR when NEON and SVE are unavailable (#93940 ) For streaming-compatible functions with only +sme, we can't use a NEON ORR (aliased as 'mov') for copies of Q-registers, so we need to use a spill/fill instead. This also fixes the fill, which should use the post-incrementing addressing mode.	2024-06-03 09:22:21 +01:00
Ahmed Bougacha	cc548ec47c	[AArch64][PAC] Lower authenticated calls with ptrauth bundles. (#85736 ) This adds codegen support for the "ptrauth" operand bundles, which can be used to augment indirect calls with the equivalent of an `@llvm.ptrauth.auth` intrinsic call on the call target (possibly preceded by an `@llvm.ptrauth.blend` on the auth discriminator if applicable.) This allows the generation of combined authenticating calls on AArch64 (in the BLRA* PAuth instructions), while avoiding the raw just-authenticated function pointer from being exposed to attackers. This is done by threading a PtrAuthInfo descriptor through the call lowering infrastructure, eventually selecting a BLRA pseudo. The pseudo encapsulates the safe discriminator computation, which together with the real BLRA* call get emitted in late pseudo expansion in AsmPrinter. Note that this also applies to the other forms of indirect calls, notably invokes, rvmarker, and tail calls. Tail-calls in particular bring some additional complexity, with the intersecting register constraints of BTI and PAC discriminator computation. However this doesn't currently support PAuth_LR tail-call variants. This also adopts an x8+ allocation order for GPR64noip, matching GPR64.	2024-05-31 14:08:10 -07:00
Paul Walker	37c6b9ff72	[NFC][LLVM] Mainly whitespace changes. Also marks AliasSetTracker::size() as const.	2024-05-17 10:37:26 +00:00
Antonio Frighetto	23b6709c72	[AArch64] Drop poison-generating flags in `genSubAdd2SubSub` combiner A miscompilation issue has been addressed with improved handling. Fixes: https://github.com/llvm/llvm-project/issues/88950.	2024-04-26 11:33:56 +02:00
Xu Zhang	f6d431f208	[CodeGen] Make the parameter TRI required in some functions. (#85968 ) Fixes #82659 There are some functions, such as `findRegisterDefOperandIdx` and `findRegisterDefOperand`, that have too many default parameters. As a result, we have encountered some issues due to the lack of TRI parameters, as shown in issue #82411. Following @RKSimon 's suggestion, this patch refactors 9 functions, including `{reads, kills, defines, modifies}Register`, `registerDefIsDead`, and `findRegister{UseOperandIdx, UseOperand, DefOperandIdx, DefOperand}`, adjusting the order of the TRI parameter and making it required. In addition, all the places that call these functions have also been updated correctly to ensure no additional impact. After this, the caller of these functions should explicitly know whether to pass the `TargetRegisterInfo` or just a `nullptr`.	2024-04-24 14:24:14 +01:00
Kai Nacke	21d177096f	[NFC] Refactor looping over recomputeLiveIns into function (#88040 ) https://github.com/llvm/llvm-project/pull/79940 put calls to recomputeLiveIns into a loop, to repeatedly call the function until the computation converges. However, this repeats a lot of code. This change moves the loop into a function to simplify the handling. Note that this changes the order in which recomputeLiveIns is called. For example, ``` bool anyChange = false; do { anyChange = recomputeLiveIns(ExitMBB) || recomputeLiveIns(LoopMBB); } while (anyChange); ``` only begins to recompute the live-ins for LoopMBB after the computation for ExitMBB has converged. With this change, all basic blocks have a recomputation of the live-ins for each loop iteration. This can result in fewer or more calls, depending on the situation.	2024-04-15 17:12:25 -04:00
Pengcheng Wang	b564036933	[MachineCombiner][NFC] Split target-dependent patterns We split target-dependent MachineCombiner patterns into their target folder. This makes MachineCombiner much more target-independent. Reviewers: davemgreen, asavonic, rotateright, RKSimon, lukel97, LuoYuanke, topperc, mshockwave, asi-sc Reviewed By: topperc, mshockwave Pull Request: https://github.com/llvm/llvm-project/pull/87991	2024-04-11 12:20:27 +08:00
Sam Tebbs	fb8dbd1fb6	[AArch64] Remove copy in SVE/SME predicate spill and fill (#81716 ) 7dc20ab introduced an extra COPY when spilling and filling a PNR register, which can't be elided as the input (PNR predicate) and output (PPR predicate) register classes differ. The patch adds a new register class that covers both PPR and PNR so that STR_PXI and LDR_PXI can take either of them, removing the need for the copy.	2024-04-09 16:17:27 +01:00
Eli Friedman	c83f23d6ab	[AArch64] Fix heuristics for folding "lsl" into load/store ops. (#86894 ) The existing heuristics were assuming that every core behaves like an Apple A7, where any extend/shift costs an extra micro-op... but in reality, nothing else behaves like that. On some older Cortex designs, shifts by 1 or 4 cost extra, but all other shifts/extensions are free. On all other cores, as far as I can tell, all shifts/extensions for integer loads are free (i.e. the same cost as an unshifted load). To reflect this, this patch: - Enables aggressive folding of shifts into loads by default. - Removes the old AddrLSLFast feature, since it applies to everything except A7 (and even if you are explicitly targeting A7, we want to assume extensions are free because the code will almost always run on a newer core). - Adds a new feature AddrLSLSlow14 that applies specifically to the Cortex cores where shifts by 1 or 4 cost extra. I didn't add support for AddrLSLSlow14 on the GlobalISel side because it would require a bunch of refactoring to work correctly. Someone can pick this up as a followup.	2024-04-04 11:25:44 -07:00
Harvin Iriawan	57146daeaa	[CodeGen] Update for scalable MemoryType in MMO (#70452 ) Remove getSizeOrUnknown call when MachineMemOperand is created. For Scalable TypeSize, the MemoryType created becomes a scalable_vector. 2 MMOs that have scalable memory access can then use the updated BasicAA that understands scalable LocationSize. Original Patch by Harvin Iriawan Co-authored-by: David Green <david.green@arm.com>	2024-03-23 12:56:25 +00:00
zhongyunde 00443407	a110a1c0ed	[AArch64] MachineCombiner msub matching for i64	2024-03-08 18:14:26 +08:00
zhongyunde 00443407	3a62edcf52	[AArch64] MachineCombiner msub matching Patterns should be sorted in priority order since the pattern evaluator stops checking as soon as it finds a faster sequence. So for a * b - c * d, we prefer to match the 2nd operand of the sub, which can be folded using msub. Refer to https://www.slideshare.net/chimerawang/instruction-combine-in-llvm Fix https://github.com/llvm/llvm-project/issues/84152	2024-03-08 18:14:25 +08:00
David Green	44be5a7fdc	[Codegen] Make Width in getMemOperandsWithOffsetWidth a LocationSize. (#83875 ) This is another part of #70452 which makes getMemOperandsWithOffsetWidth use a LocationSize for Width, as opposed to the unsigned it currently uses. The advantages on its own are not super high if getMemOperandsWithOffsetWidth usually uses known sizes, but if the values can come from an MMO it can help be more accurate in case they are Unknown (and in the future, scalable).	2024-03-06 17:40:13 +00:00
Sander de Smalen	5bd01ac822	[AArch64] Re-enable rematerialization for streaming-mode-changing functions. (#83235 ) We can add implicit defs/uses of the 'VG' register to the instructions to prevent the register allocator from rematerializing values in between streaming-mode changes, as the def/use of VG will further nail down the ordering that comes out of ISel. This avoids the heavy-handed approach to prevent any kind of rematerialization. While we could add 'VG' as a Use to all SVE instructions, we only really need to do this for instructions that are rematerializable, as the smstart/smstop instructions and pseudos act as scheduling barriers which is sufficient to prevent other instructions from being scheduled in between the streaming-mode-changing call sequence. However, we may revisit this in the future.	2024-02-29 15:35:46 +00:00
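
The offset arithmetic behind commit 33190490c6 (folding `mov`/`movk` plus a register-offset `ldrb` into `add` plus an immediate-offset `ldrb`) can be checked with a small standalone sketch. This is not LLVM code: `materialize`, `splitLargeOffset` and `SplitOffset` are hypothetical names, and the split at a 4 KiB boundary is an assumption that reproduces the numbers in the commit message for a byte load, whose unsigned immediate is taken here to cover 0..4095.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical result type (not from LLVM): the two immediates of the
// rewritten sequence.
struct SplitOffset {
  int64_t High; // 4 KiB-aligned part, added to the base register
  int64_t Low;  // remaining 0..4095 bytes, used as the load's immediate
};

// Value built by `mov w8, #MovImm` followed by `movk w8, #MovkImm, lsl #16`.
constexpr int64_t materialize(uint16_t MovImm, uint16_t MovkImm) {
  return static_cast<int64_t>(MovImm) | (static_cast<int64_t>(MovkImm) << 16);
}

// Split an offset into an aligned high part and a low 12-bit remainder.
// Assumption: byte-sized access, so the low part needs no scaling.
SplitOffset splitLargeOffset(int64_t Offset) {
  assert(Offset >= 0 && "sketch only handles non-negative offsets");
  return {Offset & ~static_cast<int64_t>(0xfff), Offset & 0xfff};
}
```

With the constants from the commit message, `materialize(56952, 15)` gives 1039992, which splits into 1036288 for the `add` and 3704 for the `ldrb`. The real patch may choose the split differently for other access sizes.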

645 Commits