llvm-project

Author	SHA1	Message	Date
Sam Tebbs	fb8dbd1fb6	[AArch64] Remove copy in SVE/SME predicate spill and fill (#81716 ) 7dc20ab introduced an extra COPY when spilling and filling a PNR register, which can't be elided as the input (PNR predicate) and output (PPR predicate) register classes differ. The patch adds a new register class that covers both PPR and PNR so that STR_PXI and LDR_PXI can take either of them, removing the need for the copy.	2024-04-09 16:17:27 +01:00
Eli Friedman	c83f23d6ab	[AArch64] Fix heuristics for folding "lsl" into load/store ops. (#86894 ) The existing heuristics were assuming that every core behaves like an Apple A7, where any extend/shift costs an extra micro-op... but in reality, nothing else behaves like that. On some older Cortex designs, shifts by 1 or 4 cost extra, but all other shifts/extensions are free. On all other cores, as far as I can tell, all shifts/extensions for integer loads are free (i.e. the same cost as an unshifted load). To reflect this, this patch: - Enables aggressive folding of shifts into loads by default. - Removes the old AddrLSLFast feature, since it applies to everything except A7 (and even if you are explicitly targeting A7, we want to assume extensions are free because the code will almost always run on a newer core). - Adds a new feature AddrLSLSlow14 that applies specifically to the Cortex cores where shifts by 1 or 4 cost extra. I didn't add support for AddrLSLSlow14 on the GlobalISel side because it would require a bunch of refactoring to work correctly. Someone can pick this up as a followup.	2024-04-04 11:25:44 -07:00
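The cost rule described in this message can be sketched in a few lines. This is a minimal standalone illustration, not the code from the patch: the function name `isShiftFoldFree` and its boolean parameter are hypothetical, and the assertion that shift amounts stay in the range 0..4 is an assumption about register-offset addressing rather than something stated in the message.

```cpp
#include <cassert>

// Hypothetical sketch of the heuristic described above.
// Returns true when folding "lsl #ShiftAmount" into a load/store address
// is assumed to cost the same as an unshifted access.
bool isShiftFoldFree(unsigned ShiftAmount, bool HasAddrLSLSlow14) {
  // Assumption: register-offset addressing only uses shifts of 0..4.
  assert(ShiftAmount <= 4 && "unexpected shift amount");
  // Most cores: every shift/extend on an integer load is free.
  if (!HasAddrLSLSlow14)
    return true;
  // Cores with AddrLSLSlow14: only shifts by 1 or 4 cost extra.
  return ShiftAmount != 1 && ShiftAmount != 4;
}
```

With this model, `isShiftFoldFree(3, true)` is true while `isShiftFoldFree(4, true)` is false, which matches the "shifts by 1 or 4 cost extra" wording.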
Harvin Iriawan	57146daeaa	[CodeGen] Update for scalable MemoryType in MMO (#70452 ) Remove getSizeOrUnknown call when MachineMemOperand is created. For Scalable TypeSize, the MemoryType created becomes a scalable_vector. 2 MMOs that have scalable memory access can then use the updated BasicAA that understands scalable LocationSize. Original Patch by Harvin Iriawan Co-authored-by: David Green <david.green@arm.com>	2024-03-23 12:56:25 +00:00
zhongyunde 00443407	a110a1c0ed	[AArch64] MachineCombiner msub matching for i64	2024-03-08 18:14:26 +08:00
zhongyunde 00443407	3a62edcf52	[AArch64] MachineCombiner msub matching Patterns should be sorted in priority order, since the pattern evaluator stops checking as soon as it finds a faster sequence. So for a * b - c * d, we prefer to match the 2nd operand of the sub, which lets msub fold them. Refer to https://www.slideshare.net/chimerawang/instruction-combine-in-llvm Fix https://github.com/llvm/llvm-project/issues/84152	2024-03-08 18:14:25 +08:00
David Green	44be5a7fdc	[Codegen] Make Width in getMemOperandsWithOffsetWidth a LocationSize. (#83875 ) This is another part of #70452 which makes getMemOperandsWithOffsetWidth use a LocationSize for Width, as opposed to the unsigned it currently uses. The advantages on its own are not super high if getMemOperandsWithOffsetWidth usually uses known sizes, but if the values can come from an MMO it can help be more accurate in case they are Unknown (and in the future, scalable).	2024-03-06 17:40:13 +00:00
Sander de Smalen	5bd01ac822	[AArch64] Re-enable rematerialization for streaming-mode-changing functions. (#83235 ) We can add implicit defs/uses of the 'VG' register to the instructions to prevent the register allocator from rematerializing values in between streaming-mode changes, as the def/use of VG will further nail down the ordering that comes out of ISel. This avoids the heavy-handed approach to prevent any kind of rematerialization. While we could add 'VG' as a Use to all SVE instructions, we only really need to do this for instructions that are rematerializable, as the smstart/smstop instructions and pseudos act as scheduling barriers which is sufficient to prevent other instructions from being scheduled in between the streaming-mode-changing call sequence. However, we may revisit this in the future.	2024-02-29 15:35:46 +00:00
ostannard	5452cbc4a6	[AArch64] Indirect tail-calls cannot use x16 with pac-ret+pc (#81020 ) When using -mbranch-protection=pac-ret+pc, x16 is used in the function epilogue to hold the address of the signing instruction. This is used by a HINT instruction which can only use x16, so we can't change this. This means that we can't use it to hold the function pointer for an indirect tail-call. There is existing code to force indirect tail-calls to use x16 or x17 when BTI is enabled, so there are now 4 combinations (bti / pac-ret+pc: valid function pointer registers): off / off: any non callee-saved register; on / off: x16 or x17; off / on: any non callee-saved register except x16; on / on: x17.	2024-02-08 15:31:54 +00:00
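The four combinations listed in this message can be written as a small lookup. This is only a sketch of the table, assuming hypothetical names (`TailCallRegs`, `validTailCallRegs`); it does not reproduce how the backend actually selects register classes.

```cpp
#include <cassert>

// Hypothetical labels for the four rows of the table in the commit message.
enum class TailCallRegs {
  AnyNonCalleeSaved,          // bti off, pac-ret+pc off
  X16OrX17,                   // bti on,  pac-ret+pc off
  AnyNonCalleeSavedExceptX16, // bti off, pac-ret+pc on
  X17Only                     // bti on,  pac-ret+pc on
};

// Returns which registers may hold the function pointer of an indirect
// tail-call for a given combination of branch-protection options.
constexpr TailCallRegs validTailCallRegs(bool BTI, bool PacRetPC) {
  return BTI ? (PacRetPC ? TailCallRegs::X17Only : TailCallRegs::X16OrX17)
             : (PacRetPC ? TailCallRegs::AnyNonCalleeSavedExceptX16
                         : TailCallRegs::AnyNonCalleeSaved);
}

// Compile-time check of the most constrained case: x16 is reserved by
// pac-ret+pc, and BTI restricts the choice to x16/x17, leaving only x17.
static_assert(validTailCallRegs(true, true) == TailCallRegs::X17Only,
              "bti + pac-ret+pc leaves only x17");
```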
Sjoerd Meijer	35904ec4e1	[AArch64] MI Scheduler STP combine (#80188 ) Add opcodes for different store instructions to the target hook that can enable more STP pairs. This is split off from the patch that does the same for some load instructions (#79003). Patch co-authored by Cameron McInally.	2024-02-06 10:29:42 +00:00
Philip Reames	3ff7caea33	[TTI] Use Register in isLoadFromStackSlot and isStoreToStackSlot [nfc] (#80339 )	2024-02-01 17:52:35 -08:00
Yuta Mukai	70eab122bc	[AArch64][MachinePipeliner] Add pipeliner support for AArch64 (#79589 ) Add AArch64 implementations for the interfaces of MachinePipeliner pass. The pass is disabled by default for AArch64. It is enabled by specifying --aarch64-enable-pipeliner. 5 tests in llvm-test-suites show performance improvement by more than 5% on a Neoverse V1 processor (test: improvement): MultiSource/Benchmarks/TSVC/Recurrences-dbl/Recurrences-dbl.test: 16%; MultiSource/Benchmarks/TSVC/Recurrences-dbl/Recurrences-flt.test: 16%; SingleSource/Benchmarks/Adobe-C++/loop_unroll.test: 14%; SingleSource/Benchmarks/Misc/flops-5.test: 13%; SingleSource/Benchmarks/BenchmarkGame/spectral-norm.test: 6%. (base flags: -mcpu=neoverse-v1 -O3 -mrecip, flags for pipelining: -mllvm -aarch64-enable-pipeliner -mllvm -pipeliner-max-stages=100 -mllvm -pipeliner-max-mii=100 -mllvm -pipeliner-enable-copytophi=0) On the other hand, there are cases of significant performance degradation. Algorithm improvements and adding the option/pragma will be needed in the future.	2024-02-02 10:33:44 +09:00
Sjoerd Meijer	8841846050	[AArch64] MI Scheduler LDP combine follow up (#79003 ) This is a follow up of 75d820dcdd86, adding more opcodes to the combine target hook enabling more LDP creation. Patch co-authored by Cameron McInally.	2024-01-31 15:41:32 +00:00
Oskar Wirga	ff4636a4ab	Refactor recomputeLiveIns to converge on added MachineBasicBlocks (#79940 ) This is a fix for the regression seen in https://github.com/llvm/llvm-project/pull/79498 > Currently, the way that recomputeLiveIns works is that it will recompute the livein registers for that MachineBasicBlock but it matters what order you call recomputeLiveIn which can result in incorrect register allocations down the line. Now we do not recompute the entire CFG, but we do ensure that the newly added MBBs reach convergence.	2024-01-30 19:33:04 -08:00
David Green	915c3d9e5a	Revert "[AArch64] merge index address with large offset into base address" This reverts commit 32878c2065c8005b3ea30c79e16dfd7eed55d645 due to #79756 and #76202.	2024-01-28 17:01:21 +00:00
Nikita Popov	07a1925b8b	Revert "Refactor recomputeLiveIns to operate on whole CFG (#79498 )" This reverts commit 59bf60519fc30d9d36c86abd83093b068f6b1e4b. Introduces a major compile-time regression.	2024-01-26 22:33:17 +01:00
Oskar Wirga	59bf60519f	Refactor recomputeLiveIns to operate on whole CFG (#79498 ) Currently, the way that recomputeLiveIns works is that it will recompute the livein registers for that MachineBasicBlock but it matters what order you call recomputeLiveIn which can result in incorrect register allocations down the line. This PR fixes that by simply recomputing the liveins for the entire CFG until convergence is achieved. This makes it harder to introduce subtle bugs which alter liveness.	2024-01-26 11:25:36 -08:00
Anatoly Trosinenko	10bd69a4f7	[MachineOutliner] Refactor iterating over Candidate's instructions (#78972 ) Make Candidate's front() and back() functions return references to MachineInstr and introduce begin() and end() returning iterators, the same way it is usually done in other container-like classes. This makes it possible to iterate over the instructions contained in Candidate the same way one can iterate over MachineBasicBlock (note that begin() and end() return bundled iterators, just like MachineBasicBlock does, but no instr_begin() and instr_end() are defined yet).	2024-01-23 17:21:40 +03:00
Eli Friedman	a6065f0fa5	Arm64EC entry/exit thunks, consolidated. (#79067 ) This combines the previously posted patches with some additional work I've done to more closely match MSVC output. Most of the important logic here is implemented in AArch64Arm64ECCallLowering. The purpose of the AArch64Arm64ECCallLowering is to take "normal" IR we'd generate for other targets, and generate most of the Arm64EC-specific bits: generating thunks, mangling symbols, generating aliases, and generating the .hybmp$x table. This is all done late for a few reasons: to consolidate the logic as much as possible, and to ensure the IR exposed to optimization passes doesn't contain complex arm64ec-specific constructs. The other changes are supporting changes, to handle the new constructs generated by that pass. There's a global llvm.arm64ec.symbolmap representing the .hybmp$x entries for the thunks. This gets handled directly by the AsmPrinter because it needs symbol indexes that aren't available before that. There are two new calling conventions used to represent calls to and from thunks: ARM64EC_Thunk_X64 and ARM64EC_Thunk_Native. There are a few changes to handle the associated exception-handling info, SEH_SaveAnyRegQP and SEH_SaveAnyRegQPX. I've intentionally left out handling for structs with small non-power-of-two sizes, because that's easily separated out. The rest of my current work is here. I squashed my current patches because they were split in ways that didn't really make sense. Maybe I could split out some bits, but it's hard to meaningfully test most of the parts independently. Thanks to @dpaoliello for extensive testing and suggestions. (Originally posted as https://reviews.llvm.org/D157547 .)	2024-01-22 21:28:07 -08:00
Sjoerd Meijer	75d820dcdd	[AArch64] MI Scheduler: create more LDP/STP pairs (#77565 ) Target hook `canPairLdStOpc` is missing quite a few opcodes for which LDPs/STPs can be created. I was hoping that it would not be necessary to add these missing opcodes here and that the attached motivating test case would be handled by the LoadStoreOptimiser (especially after #71908), but it's not. The problem is that after register allocation some things are a lot harder to do. Consider this for the motivating example ``` [1] renamable $q1 = LDURQi renamable $x9, -16 :: (load (s128) from %ir.r51, align 8, !tbaa !0) [2] renamable $q2 = LDURQi renamable $x0, -16 :: (load (s128) from %ir.r53, align 8, !tbaa !4) [3] renamable $q1 = nnan ninf nsz arcp contract afn reassoc nofpexcept FMLSv2f64 killed renamable $q1(tied-def 0), killed renamable $q2, renamable $q0, implicit $fpcr [4] STURQi killed renamable $q1, renamable $x9, -16 :: (store (s128) into %ir.r51, align 1, !tbaa !0) [5] renamable $q1 = LDRQui renamable $x9, 0 :: (load (s128) from %ir.r.G0001_609.0, align 8, !tbaa !0) ``` We can't combine the load in line [5] into the load on [1]: register q1 is used in between. And we can't combine [1] into [5]: it is aliasing with the STR on line [4]. So, adding some missing opcodes here seems the best/easiest approach. I will follow up to add some more missing cases here.	2024-01-11 09:46:47 +00:00
Momchil Velikov	4b6968952e	[AArch64] Implement spill/fill of predicate pair register classes (#76068 ) We are getting ICE with, e.g. ``` #include <arm_sve.h> void g(); svboolx2_t f0(int64_t i, int64_t n) { svboolx2_t r = svwhilelt_b16_x2(i, n); g(); return r; } ```	2023-12-22 15:54:12 +00:00
Vitaly Buka	0ccc1e7acd	Revert "[AArch64] Fold more load.x into load.i with large offset" Issue #76202 This reverts commit f5687636415969e6d945659a0b78734abdfb0f06.	2023-12-21 21:12:40 -08:00
Tomas Matheson	7bd17212ef	Re-land "[AArch64] Codegen support for FEAT_PAuthLR" (#75947 ) This reverts commit 9f0f5587426a4ff24b240018cf8bf3acc3c566ae. Fix expensive checks failure by properly marking register def for ADR.	2023-12-21 18:32:55 +00:00
Tomas Matheson	9f0f558742	Revert "[AArch64] Codegen support for FEAT_PAuthLR" This reverts commit 5992ce90b8c0fac06436c3c86621fbf6d5398ee5. Buildbot failures with expensive checks enabled.	2023-12-21 16:25:55 +00:00
Tomas Matheson	5992ce90b8	[AArch64] Codegen support for FEAT_PAuthLR - Adds a new +pc option to -mbranch-protection that will enable the use of PC as a diversifier in PAC branch protection code. - When +pauth-lr is enabled (-march=armv9.5a+pauth-lr) in combination with -mbranch-protection=pac-ret+pc, the new 9.5-a instructions (pacibsppc, retaasppc, etc) are used. Documentation for the relevant instructions can be found here: https://developer.arm.com/documentation/ddi0602/2023-09/Base-Instructions/ Co-authored-by: Lucas Prates <lucas.prates@arm.com>	2023-12-21 14:18:33 +00:00
zhongyunde 00443407	f568763641	[AArch64] Fold more load.x into load.i with large offset The list of load.x opcodes follows canFoldIntoAddrMode in D152828. Also support LDRSroX, which was missed in canFoldIntoAddrMode.	2023-12-21 18:54:15 +08:00
zhongyunde 00443407	32878c2065	[AArch64] merge index address with large offset into base address A case for this transformation, https://gcc.godbolt.org/z/nhYcWq1WE Fold `mov w8, #56952; movk w8, #15, lsl #16; ldrb w0, [x0, x8]` into `add x0, x0, 1036288; ldrb w0, [x0, 3704]`. Only LDRBBroX is supported initially. Fix https://github.com/llvm/llvm-project/issues/71917	2023-12-21 18:54:14 +08:00
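The arithmetic behind the example in this message can be checked with a short sketch. The struct and function names (`SplitOffset`, `splitOffset`) are hypothetical, and the split at a 4096-byte boundary is an assumption read off the numbers in the example (1036288 is a multiple of 4096, 3704 is the low 12 bits); the legality checks of the actual patch are not modeled.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical result type: High goes into the "add", Low into the load.
struct SplitOffset {
  uint64_t High; // multiple of 4096
  uint64_t Low;  // remainder in the range 0..4095
};

// Splits a large constant offset into a 4096-aligned part and a low
// 12-bit remainder, mirroring the numbers in the example above.
SplitOffset splitOffset(uint64_t Offset) {
  SplitOffset S;
  S.Low = Offset & 0xFFF;
  S.High = Offset - S.Low;
  assert(S.High + S.Low == Offset && "split must preserve the offset");
  return S;
}
```

For the constant built by the `mov`/`movk` pair, `56952 + (15 << 16) = 1039992`, the split yields `High = 1036288` and `Low = 3704`, the two immediates shown in the folded sequence.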
DianQK	7649d22306	[AArch64] ORRWrs is copy instruction when there's no implicit def of the X register (#75184 ) Follows https://github.com/llvm/llvm-project/pull/74682#issuecomment-1850268782. Fixes #74680.	2023-12-14 19:19:55 +08:00
Oskar Wirga	9930f3e298	[AArch64] Fix case of 0 dynamic alloc when stack probing (#74877 ) I accidentally closed https://github.com/llvm/llvm-project/pull/74806 If the dynamic allocation size is 0, then we will still probe the current sp value despite not decrementing sp! This results in overwriting stack data, in my case the stack canary. The fix here is just to load the value of [sp] into xzr which is essentially a no-op but still performs a read/probe of the new page.	2023-12-10 08:01:29 -05:00
Alex Bradbury	b717365216	[MachineScheduler][NFCI] Add Offset and OffsetIsScalable args to shouldClusterMemOps (#73778 ) These are picked up from getMemOperandsWithOffsetWidth but weren't then being passed through to shouldClusterMemOps, which forces backends to collect the information again if they want to use the kind of heuristics typically used for the similar shouldScheduleLoadsNear function (e.g. checking the offset is within 1 cache line). This patch just adds the parameters, but doesn't attempt to use them. There is potential to use them in the current PPC and AArch64 shouldClusterMemOps implementation, and I intend to use the offset in the heuristic for RISC-V. I've left these for future patches in the interest of being as incremental as possible. As noted in the review and in an inline FIXME, an ElementCount-style abstraction may later be used to condense these two parameters to one argument. ElementCount isn't quite suitable as it doesn't support negative offsets.	2023-12-06 15:30:48 +00:00
Momchil Velikov	cc944f502f	[AArch64] Stack probing for function prologues (#66524 ) This adds code to AArch64 function prologues to protect against stack clash attacks by probing (writing to) the stack at regular enough intervals to ensure that the guard page cannot be skipped over. The patch depends on and maintains the following invariants: (1) Upon function entry the caller guarantees that it has probed the stack (e.g. performed a store) at some address [sp, #N], where `0 <= N <= 1024`. This invariant comes from a requirement for compatibility with GCC. (2) Any address range in the allocated stack, no smaller than stack-probe-size bytes, contains at least one probe. (3) At any time the stack pointer is above or in the guard page. (4) Probes are performed in decreasing address order. The stack-probe-size is a function attribute that can be set by a platform to correspond to the guard page size. By default, the stack probe size is 4KiB, which is a safe default as this is the smallest possible page size for AArch64. Linux uses a 64KiB guard for AArch64, so this can be overridden by the stack-probe-size function attribute. For small frames without a frame pointer (<= 240 bytes), no probes are needed. For larger frame sizes, LLVM always stores x29 to the stack. This serves as an implicit stack probe. Thus, while allocating stack objects the compiler assumes that the stack has been probed at [sp]. There are multiple probing sequences that can be emitted, depending on the size of the stack allocation: (1) A straight-line sequence of subtracts and stores, used when the allocation size is smaller than 5 guard pages. (2) A loop allocating and probing one page size per iteration, plus at most a single probe to deal with the remainder, used when the allocation size is larger but still known at compile time. (3) A loop which moves the SP down to the target value held in a register (or a loop, moving a scratch register to the target value held in SP), used when the allocation size is not known at compile-time, such as when allocating space for SVE values, or when over-aligning the stack. This is emitted in AArch64InstrInfo because it will also be used for dynamic allocas in a future patch. (4) A single probe where the amount of stack adjustment is unknown, but is known to be less than or equal to a page size. --------- Co-authored-by: Oliver Stannard <oliver.stannard@linaro.org>	2023-11-30 17:41:51 +00:00
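The choice between the four probing sequences listed in this message can be sketched as a simple decision function. All names here (`ProbeKind`, `Allocation`, `chooseProbeKind`) are hypothetical, the handling of an allocation of exactly 5 guard pages is an assumption (the message only says "smaller than 5 guard pages" versus "larger"), and the small-frame case that needs no probes is not modeled.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical labels for the four sequences described above.
enum class ProbeKind { StraightLine, FixedLoop, DynamicLoop, SingleProbe };

// Hypothetical description of one stack allocation.
struct Allocation {
  bool SizeKnown;     // size is known at compile time
  uint64_t Size;      // bytes; only meaningful when SizeKnown is true
  bool AtMostOnePage; // unknown size, but known to be <= one guard page
};

// Picks a probing sequence for an allocation, given the guard page size
// (the stack-probe-size attribute, 4KiB by default per the message).
ProbeKind chooseProbeKind(const Allocation &A, uint64_t ProbeSize) {
  assert(ProbeSize != 0 && "probe size must be non-zero");
  if (A.SizeKnown) {
    // Assumption: exactly 5 pages is treated as "larger" and uses the loop.
    return A.Size < 5 * ProbeSize ? ProbeKind::StraightLine
                                  : ProbeKind::FixedLoop;
  }
  return A.AtMostOnePage ? ProbeKind::SingleProbe : ProbeKind::DynamicLoop;
}
```

For instance, a 12KiB allocation with a 4KiB guard page selects the straight-line sequence, while an SVE-sized (unknown) allocation selects the register-driven loop.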
David Green	4d80122598	[AArch64] Teach areMemAccessesTriviallyDisjoint about scalable widths. (#73655 ) The base change here is to change getMemOperandWithOffsetWidth to return a TypeSize Width, which in turn allows areMemAccessesTriviallyDisjoint to reason about trivially disjoint widths.	2023-11-30 16:54:28 +00:00
Alex Bradbury	6cf3566850	[NFC][MachineScheduler] Rename NumLoads parameter of shouldClusterMemOps to ClusterSize (#73757 ) As the same hook is called for both load and store clustering, NumLoads is a misleading name. Use ClusterSize instead.	2023-11-29 09:47:03 +00:00
Sander de Smalen	81b7f115fb	[llvm][TypeSize] Fix addition/subtraction in TypeSize. (#72979 ) It seems TypeSize is currently broken in the sense that: TypeSize::Fixed(4) + TypeSize::Scalable(4) => TypeSize::Fixed(8) without failing its assert that explicitly tests for this case: assert(LHS.Scalable == RHS.Scalable && ...); The reason this fails is that `Scalable` is a static method of class TypeSize, and LHS and RHS are both objects of class TypeSize. So this is evaluating if the pointer to the function Scalable == the pointer to the function Scalable, which is always true because LHS and RHS have the same class. This patch fixes the issue by renaming `TypeSize::Scalable` -> `TypeSize::getScalable`, as well as `TypeSize::Fixed` to `TypeSize::getFixed`, so that it no longer clashes with the variable in FixedOrScalableQuantity. The new methods now also better match the coding standard, which specifies that: * Variable names should be nouns (as they represent state) * Function names should be verb phrases (as they represent actions)	2023-11-22 08:52:53 +00:00
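The name clash described in this message can be reproduced in a self-contained sketch. The types `Quantity`, `OldSize` and `NewSize` are hypothetical stand-ins for `FixedOrScalableQuantity` and `TypeSize`; only the naming pattern is modeled, not the real class layout. Some compilers may emit a warning for the always-true comparison in `oldSameKind`.

```cpp
#include <cassert>

// Stand-in for the base class that stores the quantity.
struct Quantity {
  unsigned MinValue;
  bool Scalable; // data member: true for scalable sizes
};

// Before the fix: a static factory named like the data member hides it.
struct OldSize : Quantity {
  OldSize(unsigned Min, bool IsScalable) : Quantity{Min, IsScalable} {}
  static OldSize Fixed(unsigned Min) { return OldSize(Min, false); }
  static OldSize Scalable(unsigned Min) { return OldSize(Min, true); }
};

// Both operands name the static function OldSize::Scalable, so this
// compares a function pointer with itself and is always true.
bool oldSameKind(const OldSize &LHS, const OldSize &RHS) {
  return LHS.Scalable == RHS.Scalable;
}

// After the fix: the factories are renamed, so the member is visible again.
struct NewSize : Quantity {
  NewSize(unsigned Min, bool IsScalable) : Quantity{Min, IsScalable} {}
  static NewSize getFixed(unsigned Min) { return NewSize(Min, false); }
  static NewSize getScalable(unsigned Min) { return NewSize(Min, true); }
};

// Here LHS.Scalable is the bool data member, so the check is meaningful.
bool newSameKind(const NewSize &LHS, const NewSize &RHS) {
  return LHS.Scalable == RHS.Scalable;
}
```

With these definitions, `oldSameKind(OldSize::Fixed(4), OldSize::Scalable(4))` is true even though the kinds differ, whereas `newSameKind(NewSize::getFixed(4), NewSize::getScalable(4))` is false.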
Antonio Frighetto	c16b94a3bf	[AArch64] Fix missing opcode when calling `isAArch64FrameOffsetLegal` `LDAPURi` was accidentally left unhandled in `isAArch64FrameOffsetLegal`. Reported-by: fhahn	2023-11-06 18:06:44 +01:00
Sander de Smalen	7dc20abed0	[AArch64] Fix spillfill-sve.mir with expensive checks. This fixes an issue introduced by PR #70679. Using constrainRegClass() is not strong enough to actually force the use of a register to be a PPR register class. It will need an actual COPY to do the conversion. The downside is that this introduces an extra register, which is an issue we may want to fix at a later point using a custom copy operation where the register allocator uses the same register when it can.	2023-11-01 16:29:44 +00:00
Sander de Smalen	2efea512c2	[AArch64] Fix spilling/filling of virtual registers in PNR regclass. (#70679 ) We made the assumption that the registers were always physical registers, which doesn't have to be true.	2023-11-01 10:57:12 +00:00
Antonio Frighetto	8ce4b7bcd5	[AArch64] Handle newly-added atomic instructions in `getMemOpInfo` 2-stage AArch64 buildbot was previously failing. Fixes: https://lab.llvm.org/buildbot/#/builders/198/builds/5636.	2023-10-31 18:58:20 +01:00
Sander de Smalen	73498d2608	[AArch64] Also implement PNR -> PNR copies. (#70682 ) Previously we only implemented PNR -> PPR and PPR -> PNR copies.	2023-10-31 16:52:42 +00:00
Paul Walker	7c90be2857	[SVE] Fix incorrect offset calculation when rewriting an instruction's frame index. (#70315 ) When partially packing an offset into an SVE load/store instruction we are incorrectly calculating the remainder.	2023-10-27 16:53:30 +01:00
Bill Wendling	389958a9f6	[CodeGen][NFC] Fix formatting This fixes the formatting introduced by fbf0a77e80f18a6d0fd8a28833b0bc87a99b1b2f.	2023-10-17 12:35:30 -07:00
Bill Wendling	fbf0a77e80	[CodeGen] Avoid potential sideeffects from XOR (#67193 ) XOR may change flag values (e.g. for X86 gprs). In the case where that's not desirable, specify that buildClearRegister() should use MOV instead.	2023-10-17 12:03:26 -07:00
Vladislav Dzhidzhoev	abd0d5d262	Reland: [AArch64][GlobalISel] Adopt dup(load) -> LD1R patterns from SelectionDAG This relands fb8f59156f0f208f6192ed808fc223eda6c0e7ec and makes the isAArch64FrameOffsetLegal function recognize LD1R instructions. Original PR: https://github.com/llvm/llvm-project/pull/66914 PR of the fix: https://github.com/llvm/llvm-project/pull/69003	2023-10-17 17:40:05 +02:00
Momchil Velikov	bea3684944	[AArch64] Allow only LSL to be folded into addressing mode (#69235 ) There was an error in decoding shift type, which permitted shift types other than LSL to be (incorrectly) folded into the addressing mode of a load/store instruction.	2023-10-17 11:30:14 +01:00
john-brawn-arm	a574ef6176	[AArch64] Fix incorrect big-endian spill in foldMemoryOperandImpl (#65601 ) When an sreg sub-register of a q register was spilled, AArch64InstrInfo::foldMemoryOperandImpl would emit a spill of a d register, which gives the wrong result when the target is big-endian as the following q register fill will put the value in the top half. Fix this by greatly simplifying the existing code for widening the spill to only handle wzr to xzr widening, as the default result we get if the function returns nullptr is already that a widened spill will be emitted.	2023-10-12 16:10:28 +01:00
Anatoly Trosinenko	1d2b558265	[AArch64][PAC] Check authenticated LR value during tail call When performing a tail call, check the value of LR register after authentication to prevent the callee from signing and spilling an untrusted value. This commit implements a few variants of check, more can be added later. If it is safe to assume that executable pages are always readable, LR can be checked just by dereferencing the LR value via LDR. As an alternative, LR can be checked as follows: ; lowered AUT* instruction ; <some variant of check that LR contains a valid address> b.cond break_block ret_block: ; lowered TCRETURN break_block: brk 0xc471 As the existing methods either break the compatibility with execute-only memory mappings or can degrade the performance, they are disabled by default and can be explicitly enabled with a command line option. Individual subtargets can opt-in to use one of the available methods by updating AArch64FrameLowering::getAuthenticatedLRCheckMethod(). Reviewed By: kristof.beyls Differential Revision: https://reviews.llvm.org/D156716	2023-10-11 17:38:17 +03:00
Anatoly Trosinenko	f1b2dd2a11	[AArch64][BTI] Prevent Machine Scheduler from moving branch targets (#68313 ) Moving instructions that are recognized as branch targets by BTI can result in runtime crash. In outliner tests, replaced "BRK 1" with "HINT 0" (a.k.a. NOP) as a generic outlinable instruction.	2023-10-06 18:05:03 +03:00
Matt Arsenault	f79379398d	Revert "CodeGen: Disable isCopyInstrImpl if there are implicit operands" This reverts commit bc7d88faf1a595ab59952a2054418cdd0d9eeee8. This is broken with 414ff812d6241b728754ce562081419e7fc091eb reverted.	2023-10-02 22:43:24 +03:00
Matt Arsenault	bc7d88faf1	CodeGen: Disable isCopyInstrImpl if there are implicit operands This is a conservative workaround for broken liveness tracking of SUBREG_TO_REG to speculatively fix all targets. The current reported failures are on X86 only, but this issue should appear for all targets that use SUBREG_TO_REG. The next minimally correct refinement would be to disallow only implicit defs. The coalescer now introduces implicit-defs of the super register to track the dependency on other subregisters. If we see such an implicit operand, we cannot simply treat the subregister def as the result operand in case downstream users depend on the implicitly defined parts. Really target implementations should be considering the implicit defs and trying to interpret them appropriately (maybe with some generic helpers). The full implicit def could possibly be reported as the move result, rather than the subregister def but that requires additional work. Hopefully fixes #64060 as well. This needs to be applied to the release branch. https://reviews.llvm.org/D156346	2023-10-02 15:16:40 +03:00
Momchil Velikov	fe763d8ad4	[AArch64] Limit immediate offsets when folding instructions into addressing modes (#67345 ) Don't increase/decrease immediate offsets in folded instructions beyond the limits of `LDP`.	2023-09-26 14:21:32 +01:00
Simon Pilgrim	98e8f04c9f	Fix MSVC "32-bit shift implicitly converted to 64 bits" warning. NFC.	2023-09-25 17:19:21 +01:00

602 Commits