llvm-project

Author	SHA1	Message	Date
Maurice Heumann	607c525110	[ARM64] [Windows] Mark block address as taken when expanding catchrets (#109252) This fixes issue #109250. The issue happens during the `MachineBlockPlacement` pass. The block, whose address was previously not taken, is deemed redundant by the pass and subsequently replaced using `MachineBasicBlock::ReplaceUsesOfBlockWith` in `BranchFolding`. `ReplaceUsesOfBlockWith` only replaces uses in the terminator. However, `expandPostRAPseudo` introduces new block uses when expanding catchrets. These uses do not get replaced, which results in undefined label errors later on. Marking the block address as taken prevents the block from being replaced while non-terminator uses still refer to it.	2024-09-30 11:14:38 -07:00
Sander de Smalen	91a3c6f3d6	[AArch64] Remove redundant COPY from loadRegFromStackSlot (#107396) This removes a redundant 'COPY' instruction that #81716 probably forgot to remove. This redundant COPY led to an issue because code in LiveRangeSplitting expects the instruction emitted by `loadRegFromStackSlot` to access memory, which isn't the case for the COPY instruction.	2024-09-05 17:54:57 +01:00
Kyungwoo Lee	140381d4bf	[MachineOutliner][NFC] Remove unnecessary RepeatedSequenceLocs.clear() (#106171) - When `getOutliningCandidateInfo()` returns `std::nullopt` (meaning no `OutlinedFunction` is created), there is no need to clear the input argument, `RepeatedSequenceLocs`, as it is already cleared in the main loop of `findCandidates()`. - Replaced `2` by `MinRepeats`, which I missed in https://github.com/llvm/llvm-project/pull/105398	2024-08-28 07:09:54 -07:00
zhongyunde 00443407	e5a5ac0c23	[AArch64] Fold more load.x into load.i with large offset. The list of load.x opcodes follows canFoldIntoAddrMode from D152828. Also support LDRSroX, which was missed in canFoldIntoAddrMode.	2024-08-28 14:15:09 +08:00
Kyungwoo Lee	93b8d07a75	[MachineOutliner][NFC] Refactor (#105398) This patch prepares the NFC groundwork for global outlining using CGData, which will follow https://github.com/llvm/llvm-project/pull/90074. - The `MinRepeats` parameter is now explicitly passed to the `getOutliningCandidateInfo` function, rather than relying on a default value of 2. For local outlining, the minimum number of repetitions is typically 2, but for the global outlining (mentioned above), we will optimistically create a single `Candidate` for each `OutlinedFunction` if stable hashes match a specific code sequence. This parameter is adjusted accordingly in global outlining scenarios. - I have also wrapped `OutlinedFunction` in `unique_ptr` to ensure safe and efficient memory management within `FunctionList`, avoiding unnecessary implicit copies. This depends on https://github.com/llvm/llvm-project/pull/101461. This is a patch for https://discourse.llvm.org/t/rfc-enhanced-machine-outliner-part-2-thinlto-nolto/78753.	2024-08-27 14:38:36 -07:00
Piyou Chen	b01c006f73	[TII][RISCV] Add renamable bit to copyPhysReg (#91179) The renamable flag is useful during MachineCopyPropagation, but it is dropped after lowerCopy in some cases. This patch introduces extra arguments to pass the renamable flag to copyPhysReg.	2024-08-27 10:08:43 +08:00
Craig Topper	7e6b1504c7	[AArch64] Pass DebugLoc by reference to AArch64InstrInfo::copyGPRRegTuple. NFC	2024-08-25 22:20:58 -07:00
Craig Topper	b12d338c17	[AArch64] Use MCRegister in AArch64InstrInfo::copyGPRRegTuple interface. NFC. This matches copyPhysReg.	2024-08-25 22:11:31 -07:00
Thurston Dang	324b676a3d	Revert "[AArch64] Fold more load.x into load.i with large offset" This reverts commit 43ffe2eed0d9f73789dbe213023733d164999306. Reason: buildbot breakage starting at https://lab.llvm.org/buildbot/#/builders/85/builds/1102. I manually bisected and found that clang crashed with 43ffe2eed0d9f73789dbe213023733d164999306 but not with the immediately preceding commit (33190490c667aaf8b08d5af8b8ce84524f856e80).	2024-08-16 22:32:12 +00:00
Kazu Hirata	dca820951c	[llvm] Use llvm::any_of (NFC) (#104443 )	2024-08-15 17:59:10 -07:00
zhongyunde 00443407	43ffe2eed0	[AArch64] Fold more load.x into load.i with large offset. The list of load.x opcodes follows canFoldIntoAddrMode from D152828. Also support LDRSroX, which was missed in canFoldIntoAddrMode.	2024-08-15 18:22:52 +08:00
zhongyunde 00443407	33190490c6	[AArch64] merge index address with large offset into base address. A case for this transformation: https://gcc.godbolt.org/z/nhYcWq1WE. Fold `mov w8, #56952; movk w8, #15, lsl #16; ldrb w0, [x0, x8]` into `add x0, x0, 1036288; ldrb w0, [x0, 3704]`. Initially, only LDRBBroX is supported. Fix https://github.com/llvm/llvm-project/issues/71917. Note: this PR relands commit 32878c2065 with a fix for the crash in PR79756; that crash is exposed when a MOVKWi instruction appears at the head of a block without a MOVZWi.	2024-08-15 18:22:52 +08:00
David Green	36231a5b55	[AArch64] Add verification for MemOp immediate ranges (#97561) This adds an implementation of AArch64InstrInfo::verifyInstruction for AArch64, and adds some basic verification of the immediate ranges of memory operations using the information from getMemOpInfo. Some extra memory operations have been added to getMemOpInfo, along with the equivalent opcodes to getLoadStoreImmIdx to ensure we use the correct index. Please let us know if this starts reporting verification failures. Thanks.	2024-08-15 11:20:20 +01:00
David Green	a3cf8642bf	[AArch64] Cleanup existing values in getMemOpInfo (#98196) This patch tries to clean up some of the existing values in getMemOpInfo. All values should now be in bytes (not bits), and MinOffset/MaxOffset are now always represented unscaled (the immediate that will be present in the final instruction). Although I could not find a place where it altered codegen, the offset of a post-index instruction will be 0, not scale*imm. An IsPostIndexLdStOpcode method has been added to try and make sure that case is handled properly.	2024-08-03 12:31:10 +01:00
Momchil Velikov	461126c29c	[AArch64] Fix incorrectly getting the destination reg of an insn (#101205) This popped up while investigating https://github.com/llvm/llvm-project/issues/96950. In a few places where we need the destination reg of an instruction, we were using a call that worked only by accident.	2024-08-02 15:43:28 +01:00
Daniil Kovalev	56fd2472d8	[PAC] Sign LR with B key for non-leaf functions with ptrauth-returns attr (#100552) For the pauthtest ABI, there is a set of ptrauth-* options, including ptrauth-returns. Use the "ptrauth-returns" function attribute to indicate the need for LR signing with the B key for non-leaf functions, instead of "sign-return-address" and "sign-return-address-key", which were originally designed for pac-ret. Co-authored-by: Ahmed Bougacha <ahmed@bougacha.org> Co-authored-by: Anatoly Trosinenko <atrosinenko@accesssoftek.com>	2024-07-25 22:21:03 +03:00
Matt Arsenault	3cb5604d2c	MachineOutliner: Use PM to query MachineModuleInfo (#99688) Avoid getting this from the MachineFunction	2024-07-24 13:22:56 +04:00
David Green	0d7403184d	[AArch64] Add an AArch64InstrInfo::isFpOrNEON method for checking physical register call. NFC	2024-07-15 08:13:52 +01:00
David Green	d3cb277ea3	[AArch64] Rearrange Opcodes in getMemOpInfo. NFC. This just changes the order of the opcodes and fields in getMemOpInfo; none of the values are altered.	2024-07-08 23:05:50 +01:00
Nikita Popov	4169338e75	[IR] Don't include Module.h in Analysis.h (NFC) (#97023) Replace it with a forward declaration instead. Analysis.h is pulled in by all passes, but not all passes need to access the module.	2024-06-28 14:30:47 +02:00
Sander de Smalen	c436649313	[AArch64] Remove all instances of the 'hasSVEorSME' interfaces. (#96543) I've not added any new tests for these, because the original conditions were wrong (they did not consider streaming mode) and we have tests for the positive cases.	2024-06-25 13:27:06 +01:00
Sander de Smalen	62baf21daa	[AArch64] Check for streaming mode in HasSME* features. (#96302) This also fixes up some asserts in copyPhysReg, loadRegFromStackSlot and storeRegToStackSlot.	2024-06-24 20:12:31 +01:00
Momchil Velikov	6ec02f7316	[AArch64] Refactor redundant PTEST optimisations (NFC) (#87802) This patch refactors `AArch64InstrInfo::optimizePTestInstr` to simplify the convoluted conditions and control flow and make it easier to add the optimisation in https://github.com/llvm/llvm-project/pull/81141	2024-06-18 08:00:59 +01:00
Nikita Popov	db08b0999d	[ARM][AArch64] Bail out if CandidatesWithoutStackFixups is empty (#95410) The following code assumes that RepeatedSequenceLocs is non-empty. Bail out if there are fewer than 2 candidates left, as no outlining is possible in that case. The same check is already present in all the other places where elements from RepeatedSequenceLocs may be dropped. This fixes the issue reported at: https://github.com/llvm/llvm-project/pull/93965#issuecomment-2151989716	2024-06-14 09:29:21 +02:00
Kerry McLaughlin	ea6577a74b	[AArch64][SME] Disable outlining for functions with streaming-mode changes (#95132)	2024-06-12 10:35:29 +01:00
Yuta Mukai	0c5319e546	[ModuloSchedule][AArch64] Implement modulo variable expansion for pipelining (#65609) Modulo variable expansion is a technique that resolves overlapping variable lifetimes by unrolling. For processors with ordinary registers, such as Arm and x86, the existing implementation resolves the overlap by copying values with move instructions. This method may result in a very large number of move instructions, which can cause performance problems. Modulo variable expansion is enabled by specifying -pipeliner-mve-cg. A backend must implement some newly defined interfaces in PipelinerLoopInfo; they have been implemented for AArch64. Discourse thread: https://discourse.llvm.org/t/implementing-modulo-variable-expansion-for-machinepipeliner	2024-06-12 10:27:35 +09:00
Nikita Popov	1c9f4d4b6f	[ARM] Avoid reference into modified vector (#93965) FirstCand is a reference to RepeatedSequenceLocs[0]. However, that vector is being modified a lot throughout the function, including one place that reassigns the whole vector. I'm not sure whether this can really happen in practice, but it doesn't seem unlikely that this could lead to a use-after-free. Avoid this by directly using RepeatedSequenceLocs[0] at the start of the function (as a lot of other places already do) and only creating FirstCand at the end where no more modifications take place.	2024-06-03 17:10:35 +02:00
Sander de Smalen	b71434f8b3	[AArch64] Avoid NEON ORR when NEON and SVE are unavailable (#93940) For streaming-compatible functions with only +sme, we can't use a NEON ORR (aliased as 'mov') for copies of Q-registers, so we need to use a spill/fill instead. This also fixes the fill, which should use the post-incrementing addressing mode.	2024-06-03 09:22:21 +01:00
Ahmed Bougacha	cc548ec47c	[AArch64][PAC] Lower authenticated calls with ptrauth bundles. (#85736) This adds codegen support for the "ptrauth" operand bundles, which can be used to augment indirect calls with the equivalent of an `@llvm.ptrauth.auth` intrinsic call on the call target (possibly preceded by an `@llvm.ptrauth.blend` on the auth discriminator if applicable). This allows the generation of combined authenticating calls on AArch64 (in the BLRA* PAuth instructions), while avoiding exposing the raw, just-authenticated function pointer to attackers. This is done by threading a PtrAuthInfo descriptor through the call lowering infrastructure, eventually selecting a BLRA pseudo. The pseudo encapsulates the safe discriminator computation, which, together with the real BLRA* call, gets emitted in late pseudo expansion in AsmPrinter. Note that this also applies to the other forms of indirect calls, notably invokes, rvmarker, and tail calls. Tail-calls in particular bring some additional complexity, with the intersecting register constraints of BTI and PAC discriminator computation. However, this doesn't currently support PAuth_LR tail-call variants. This also adopts an x8+ allocation order for GPR64noip, matching GPR64.	2024-05-31 14:08:10 -07:00
Paul Walker	37c6b9ff72	[NFC][LLVM] Mainly whitespace changes. Also marks AliasSetTracker::size() as const.	2024-05-17 10:37:26 +00:00
Antonio Frighetto	23b6709c72	[AArch64] Drop poison-generating flags in `genSubAdd2SubSub` combiner. A miscompilation issue has been addressed with improved handling. Fixes: https://github.com/llvm/llvm-project/issues/88950.	2024-04-26 11:33:56 +02:00
Xu Zhang	f6d431f208	[CodeGen] Make the parameter TRI required in some functions. (#85968) Fixes #82659. There are some functions, such as `findRegisterDefOperandIdx` and `findRegisterDefOperand`, that have too many default parameters. As a result, we have encountered some issues due to missing TRI arguments, as shown in issue #82411. Following @RKSimon's suggestion, this patch refactors 9 functions, including `{reads, kills, defines, modifies}Register`, `registerDefIsDead`, and `findRegister{UseOperandIdx, UseOperand, DefOperandIdx, DefOperand}`, adjusting the order of the TRI parameter and making it required. In addition, all the places that call these functions have also been updated to ensure no additional impact. After this, callers of these functions must explicitly decide whether to pass the `TargetRegisterInfo` or just a `nullptr`.	2024-04-24 14:24:14 +01:00
Kai Nacke	21d177096f	[NFC] Refactor looping over recomputeLiveIns into function (#88040) https://github.com/llvm/llvm-project/pull/79940 put calls to recomputeLiveIns into a loop, to repeatedly call the function until the computation converges. However, this repeats a lot of code. This change moves the loop into a function to simplify the handling. Note that this changes the order in which recomputeLiveIns is called. For example, `bool anyChange = false; do { anyChange = recomputeLiveIns(ExitMBB) || recomputeLiveIns(LoopMBB); } while (anyChange);` only begins to recompute the live-ins for LoopMBB after the computation for ExitMBB has converged. With this change, the live-ins of all basic blocks are recomputed in each loop iteration. This can result in fewer or more calls, depending on the situation.	2024-04-15 17:12:25 -04:00
Pengcheng Wang	b564036933	[MachineCombiner][NFC] Split target-dependent patterns. We split target-dependent MachineCombiner patterns into their target folder. This makes MachineCombiner much more target-independent. Reviewers: davemgreen, asavonic, rotateright, RKSimon, lukel97, LuoYuanke, topperc, mshockwave, asi-sc Reviewed By: topperc, mshockwave Pull Request: https://github.com/llvm/llvm-project/pull/87991	2024-04-11 12:20:27 +08:00
Sam Tebbs	fb8dbd1fb6	[AArch64] Remove copy in SVE/SME predicate spill and fill (#81716) 7dc20ab introduced an extra COPY when spilling and filling a PNR register, which can't be elided as the input (PNR predicate) and output (PPR predicate) register classes differ. The patch adds a new register class that covers both PPR and PNR so that STR_PXI and LDR_PXI can take either of them, removing the need for the copy.	2024-04-09 16:17:27 +01:00
Eli Friedman	c83f23d6ab	[AArch64] Fix heuristics for folding "lsl" into load/store ops. (#86894) The existing heuristics were assuming that every core behaves like an Apple A7, where any extend/shift costs an extra micro-op... but in reality, nothing else behaves like that. On some older Cortex designs, shifts by 1 or 4 cost extra, but all other shifts/extensions are free. On all other cores, as far as I can tell, all shifts/extensions for integer loads are free (i.e. the same cost as an unshifted load). To reflect this, this patch: - Enables aggressive folding of shifts into loads by default. - Removes the old AddrLSLFast feature, since it applies to everything except A7 (and even if you are explicitly targeting A7, we want to assume extensions are free because the code will almost always run on a newer core). - Adds a new feature AddrLSLSlow14 that applies specifically to the Cortex cores where shifts by 1 or 4 cost extra. I didn't add support for AddrLSLSlow14 on the GlobalISel side because it would require a bunch of refactoring to work correctly. Someone can pick this up as a followup.	2024-04-04 11:25:44 -07:00
Harvin Iriawan	57146daeaa	[CodeGen] Update for scalable MemoryType in MMO (#70452) Remove the getSizeOrUnknown call when a MachineMemOperand is created. For a scalable TypeSize, the MemoryType created becomes a scalable_vector. Two MMOs that have scalable memory accesses can then use the updated BasicAA that understands scalable LocationSize. Original patch by Harvin Iriawan. Co-authored-by: David Green <david.green@arm.com>	2024-03-23 12:56:25 +00:00
zhongyunde 00443407	a110a1c0ed	[AArch64] MachineCombiner msub matching for i64	2024-03-08 18:14:26 +08:00
zhongyunde 00443407	3a62edcf52	[AArch64] MachineCombiner msub matching. Patterns should be sorted in priority order, since the pattern evaluator stops checking as soon as it finds a faster sequence. So for a * b - c * d, we prefer to match the second operand of the sub, which can use msub to fold them. Refer to https://www.slideshare.net/chimerawang/instruction-combine-in-llvm. Fix https://github.com/llvm/llvm-project/issues/84152	2024-03-08 18:14:25 +08:00
David Green	44be5a7fdc	[Codegen] Make Width in getMemOperandsWithOffsetWidth a LocationSize. (#83875) This is another part of #70452 which makes getMemOperandsWithOffsetWidth use a LocationSize for Width, as opposed to the unsigned it currently uses. The advantages on their own are not very large if getMemOperandsWithOffsetWidth usually uses known sizes, but if the values can come from an MMO, it can help be more accurate in case they are Unknown (and, in the future, scalable).	2024-03-06 17:40:13 +00:00
Sander de Smalen	5bd01ac822	[AArch64] Re-enable rematerialization for streaming-mode-changing functions. (#83235) We can add implicit defs/uses of the 'VG' register to the instructions to prevent the register allocator from rematerializing values in between streaming-mode changes, as the def/use of VG will further nail down the ordering that comes out of ISel. This avoids the heavy-handed approach of preventing any kind of rematerialization. While we could add 'VG' as a Use to all SVE instructions, we only really need to do this for instructions that are rematerializable, as the smstart/smstop instructions and pseudos act as scheduling barriers, which is sufficient to prevent other instructions from being scheduled in between the streaming-mode-changing call sequence. However, we may revisit this in the future.	2024-02-29 15:35:46 +00:00
ostannard	5452cbc4a6	[AArch64] Indirect tail-calls cannot use x16 with pac-ret+pc (#81020) When using -mbranch-protection=pac-ret+pc, x16 is used in the function epilogue to hold the address of the signing instruction. This is used by a HINT instruction which can only use x16, so we can't change this. This means that we can't use it to hold the function pointer for an indirect tail-call. There is existing code to force indirect tail-calls to use x16 or x17 when BTI is enabled, so there are now 4 combinations of valid function pointer registers: bti off, pac-ret+pc off: any non-callee-saved register; bti on, pac-ret+pc off: x16 or x17; bti off, pac-ret+pc on: any non-callee-saved register except x16; bti on, pac-ret+pc on: x17.	2024-02-08 15:31:54 +00:00
Sjoerd Meijer	35904ec4e1	[AArch64] MI Scheduler STP combine (#80188) Add opcodes for different store instructions to the target hook that can enable more STP pairs. This is split off from the patch that does the same for some load instructions (#79003). Patch co-authored by Cameron McInally.	2024-02-06 10:29:42 +00:00
Philip Reames	3ff7caea33	[TTI] Use Register in isLoadFromStackSlot and isStoreToStackSlot [nfc] (#80339)	2024-02-01 17:52:35 -08:00
Yuta Mukai	70eab122bc	[AArch64][MachinePipeliner] Add pipeliner support for AArch64 (#79589) Add AArch64 implementations for the interfaces of MachinePipeliner pass. The pass is disabled by default for AArch64. It is enabled by specifying --aarch64-enable-pipeliner. 5 tests in llvm-test-suites show performance improvement by more than 5% on a Neoverse V1 processor: MultiSource/Benchmarks/TSVC/Recurrences-dbl/Recurrences-dbl.test (16%); MultiSource/Benchmarks/TSVC/Recurrences-dbl/Recurrences-flt.test (16%); SingleSource/Benchmarks/Adobe-C++/loop_unroll.test (14%); SingleSource/Benchmarks/Misc/flops-5.test (13%); SingleSource/Benchmarks/BenchmarkGame/spectral-norm.test (6%). (Base flags: -mcpu=neoverse-v1 -O3 -mrecip; flags for pipelining: -mllvm -aarch64-enable-pipeliner -mllvm -pipeliner-max-stages=100 -mllvm -pipeliner-max-mii=100 -mllvm -pipeliner-enable-copytophi=0.) On the other hand, there are cases of significant performance degradation. Algorithm improvements and adding the option/pragma will be needed in the future.	2024-02-02 10:33:44 +09:00
Sjoerd Meijer	8841846050	[AArch64] MI Scheduler LDP combine follow up (#79003) This is a follow up of 75d820dcdd86, adding more opcodes to the combine target hook enabling more LDP creation. Patch co-authored by Cameron McInally.	2024-01-31 15:41:32 +00:00
Oskar Wirga	ff4636a4ab	Refactor recomputeLiveIns to converge on added MachineBasicBlocks (#79940) This is a fix for the regression seen in https://github.com/llvm/llvm-project/pull/79498, which said: "Currently, the way that recomputeLiveIns works is that it will recompute the livein registers for that MachineBasicBlock but it matters what order you call recomputeLiveIn which can result in incorrect register allocations down the line." Now we do not recompute the entire CFG, but we do ensure that the newly added MBBs reach convergence.	2024-01-30 19:33:04 -08:00
David Green	915c3d9e5a	Revert "[AArch64] merge index address with large offset into base address" This reverts commit 32878c2065c8005b3ea30c79e16dfd7eed55d645 due to #79756 and #76202.	2024-01-28 17:01:21 +00:00
Nikita Popov	07a1925b8b	Revert "Refactor recomputeLiveIns to operate on whole CFG (#79498)" This reverts commit 59bf60519fc30d9d36c86abd83093b068f6b1e4b. It introduces a major compile-time regression.	2024-01-26 22:33:17 +01:00
Oskar Wirga	59bf60519f	Refactor recomputeLiveIns to operate on whole CFG (#79498) Currently, recomputeLiveIns recomputes the live-in registers for the given MachineBasicBlock, but the order in which it is called matters, which can result in incorrect register allocations down the line. This PR fixes that by simply recomputing the live-ins for the entire CFG until convergence is achieved. This makes it harder to introduce subtle bugs which alter liveness.	2024-01-26 11:25:36 -08:00
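
The fold described in commit 33190490c6 rests on simple arithmetic. `mov w8, #56952` followed by `movk w8, #15, lsl #16` materializes 15 * 65536 + 56952 = 1039992. That value splits as 1036288 + 3704, where 1036288 = 253 * 4096 fits the shifted 12-bit immediate of `add` and 3704 fits the unsigned 12-bit immediate of `ldrb`. The C++ sketch below reproduces only that split. `splitByteLoadOffset` and `SplitOffset` are hypothetical names, not LLVM APIs. The sketch also assumes a byte-sized access: for wider loads the load immediate is scaled by the access size, and the sketch does not handle that case.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical helper, not an LLVM API: split the byte offset of a
// register-offset byte load into an `add ..., #imm, lsl #12` part and a
// 12-bit unsigned load immediate (scale 1, as for `ldrb`).
struct SplitOffset {
  uint64_t AddImm;  // multiple of 4096, added to the base register
  uint64_t LoadImm; // remaining 0..4095, encoded in the load itself
};

std::optional<SplitOffset> splitByteLoadOffset(uint64_t Offset) {
  const uint64_t LoadImm = Offset & 0xFFF;           // low 12 bits
  const uint64_t AddImm = Offset & ~uint64_t{0xFFF}; // remaining high bits
  // ADD (immediate) encodes a 12-bit value shifted left by 12, so the
  // high part must stay below 2^24; otherwise this fold is not possible.
  if (AddImm >= (uint64_t{1} << 24))
    return std::nullopt;
  assert(AddImm + LoadImm == Offset);
  return SplitOffset{AddImm, LoadImm};
}
```

For example, `splitByteLoadOffset(1039992)` yields `AddImm == 1036288` and `LoadImm == 3704`. These values match the `add`/`ldrb` pair in the commit message.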

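Commit 21d177096f notes that moving the convergence loop into a helper changes the call order, because `||` short-circuits. The following C++ sketch makes that concrete with a toy block type whose `recompute()` reports a change a fixed number of times. `ToyBlock`, `recomputeShortCircuit`, `recomputeAllUntilStable`, and the `calls*` helpers are hypothetical stand-ins, not LLVM code. Real live-in sets depend on successor blocks, and this model deliberately ignores that.

```cpp
#include <cassert>
#include <vector>

// Toy stand-in for a basic block: recompute() keeps reporting a change until
// PendingChanges updates have happened. This only illustrates call order.
struct ToyBlock {
  int PendingChanges;
  int Calls = 0;
  bool recompute() {
    ++Calls;
    if (PendingChanges == 0)
      return false;
    --PendingChanges;
    return true;
  }
};

// Pattern quoted in the commit: `||` short-circuits, so B is not visited
// until A stops changing.
void recomputeShortCircuit(ToyBlock &A, ToyBlock &B) {
  bool AnyChange;
  do {
    AnyChange = A.recompute() || B.recompute();
  } while (AnyChange);
}

// Helper-style loop: recompute every block on each iteration and stop only
// after a full pass in which nothing changed.
void recomputeAllUntilStable(const std::vector<ToyBlock *> &Blocks) {
  bool AnyChange;
  do {
    AnyChange = false;
    for (ToyBlock *B : Blocks)
      if (B->recompute())
        AnyChange = true;
  } while (AnyChange);
}

// Total recompute() calls for two blocks under each strategy.
int callsShortCircuit(int PendingA, int PendingB) {
  ToyBlock A{PendingA}, B{PendingB};
  recomputeShortCircuit(A, B);
  return A.Calls + B.Calls;
}

int callsAllUntilStable(int PendingA, int PendingB) {
  ToyBlock A{PendingA}, B{PendingB};
  recomputeAllUntilStable({&A, &B});
  return A.Calls + B.Calls;
}
```

Consider two blocks that each need two updates. The short-circuit loop makes 8 calls and the all-blocks loop makes 6. With updates pending only in the first block (3 and 0), the counts flip to 5 and 8. This is consistent with the commit's remark that the change can result in fewer or more calls.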

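The four combinations listed in commit 5452cbc4a6 can be summarized as a small decision function. This is an illustrative C++ sketch only. `tailCallTargetRegs` is a hypothetical name, and returning descriptive strings is a simplification; the backend itself encodes such constraints in its register classes rather than strings.

```cpp
#include <cassert>
#include <string>

// Hypothetical summary of which registers may hold the target of an
// indirect tail call, following the table in commit 5452cbc4a6.
std::string tailCallTargetRegs(bool BTI, bool PacRetPC) {
  if (BTI && PacRetPC)
    return "x17"; // BTI limits the choice to x16/x17, and pac-ret+pc needs x16
  if (BTI)
    return "x16 or x17"; // a BR through x16/x17 is accepted by a `bti c` pad
  if (PacRetPC)
    return "any non-callee-saved register except x16"; // x16 holds the
                                                       // signing address
  return "any non-callee-saved register";
}
```

Writing the cases this way makes it visible that only the both-enabled combination narrows the choice to a single register.
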
636 Commits
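
Commits 1c9f4d4b6f and db08b0999d both deal with `RepeatedSequenceLocs` being modified while code still relies on its first element. The C++ sketch below reproduces that pattern with plain `int`s. `firstCandidateAfterPruning` and its even-number filter are hypothetical stand-ins for the outliner's real pruning logic, which is more involved.

```cpp
#include <cassert>
#include <vector>

// Toy illustration of the hazard in 1c9f4d4b6f, with ints standing in for
// outliner candidates and `Locs` standing in for RepeatedSequenceLocs.
// Returns the first surviving candidate, or -1 if outlining is not possible.
int firstCandidateAfterPruning(std::vector<int> Locs) {
  // Risky version: binding `const int &First = Locs[0];` here would leave
  // First referring to storage that the reassignment below may free or reuse.
  std::vector<int> Kept;
  for (int L : Locs)
    if (L % 2 == 0) // hypothetical pruning criterion
      Kept.push_back(L);
  Locs = Kept; // reassigns the whole vector, as the commit message describes

  // As in db08b0999d: with fewer than 2 candidates no outlining is possible,
  // so bail out before touching Locs[0].
  if (Locs.size() < 2)
    return -1;

  // Safe version: take the reference only after the last modification.
  const int &First = Locs[0];
  return First;
}
```

The design mirrors the fix in 1c9f4d4b6f. Index into the vector while it is still changing, and bind a named reference only once no further modifications take place.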