llvm-project

Author	SHA1	Message	Date
Yingwei Zheng	40ca089d99	[CodeGenPrepare] Replace deleted ext instr with the promoted value. (#71058 ) This PR replaces the deleted ext with the promoted value in `AddrMode`. Fixes #70938. (cherry picked from commit 3c6aa04cf4dee65113e2a780b9f90b36bb4c4e04)	2025-01-31 17:49:18 -08:00
David Green	b23297a7f1	[GlobalISel] Do not run verifier after ResetMachineFunctionPass (#124799 ) After we fall back from GlobalISel to SDAG, the verifier gets called, which calls getReservedRegs which uses SIMachineFunctionInfo::usesAGPRs which caches the result of UsesAGPRs. Because we have just fallen-back the function is empty and it incorrectly gets cached to false. This patch makes sure we don't try to run the verifier whilst the function is empty. (cherry picked from commit 66e0498dafbfa7f8fd7deaa88ae62bdf38a12113)	2025-01-31 16:38:42 -08:00
Kazu Hirata	1d5ce614a7	[CodeGen] Avoid repeated hash lookups (NFC) (#124677 )	2025-01-28 10:57:29 -08:00
Stephen Tozer	22687aa97b	[CodeGen] Correctly handle non-standard cases in RemoveLoadsIntoFakeUses (#111551 ) In the RemoveLoadsIntoFakeUses pass, we try to remove loads that are only used by fake uses, as well as the fake use in question. There are two existing errors with the pass however: it incorrectly examines every operand of each FAKE_USE, when only the first is relevant (extra operands will just be "killed" regs assigned by a previous pass), and it ignores cases where the FAKE_USE register is not an exact match for the loaded register, which is incorrect as regalloc may choose to load a wider value than the FAKE_USE required pre-regalloc. This patch fixes both of these cases.	2025-01-28 13:59:41 +00:00
Renat Idrisov	11db7fb09b	[GlobalISel] Catching inconsistencies in load memory, result, and range metadata type (#121247 ) This is a fix for: https://github.com/llvm/llvm-project/issues/97290 Please let me know if that is the right way to address the issue. Thank you! --------- Co-authored-by: Renat Idrisov <parsifal-47@users.noreply.github.com> Co-authored-by: Matt Arsenault <arsenm2@gmail.com>	2025-01-28 20:54:34 +07:00
abhishek-kaushik22	015aed18ee	[SelectionDAG] WidenVecOp_INSERT_SUBVECTOR - Replace `INSERT_SUBVECTOR` with series of `INSERT_VECTOR_ELT` (#124420 ) If the operands to `INSERT_SUBVECTOR` can't be widened legally, just replace the `INSERT_SUBVECTOR` with a series of `INSERT_VECTOR_ELT`. Closes #124255 (and possibly #102016)	2025-01-28 18:54:49 +05:30
Pierre van Houtryve	8ea018ce1d	[DAGISel] Fix MMRA Handling in copyExtraInfo (#124730 ) #78569 did not implement this correctly and an edge case breaks it by triggering `Assertion `!Leafs.empty()' failed.` Fixes SWDEV-507698	2025-01-28 13:27:26 +01:00
Akshat Oke	7cd6f85578	[CodeGen][NFC] Format RegisterCoalescer sources (#124697 )	2025-01-28 15:49:21 +05:30
Aiden Grossman	00f692b94f	Reland "[MLGO] Count LR Evictions Rather than Relying on Cascade (#124440 )" This reverts commit aa65f93b71dee8cacb22be1957673c8be6a3ec24. This relands commit 8cc83b66e20e72cdb3bb5fbd549c941797b0e0c9. It looks like this was a transitive include issue.	2025-01-28 07:09:25 +00:00
Craig Topper	d839e765f0	[TargetLowering] Inline the only caller of one of the forceExpandWideMUL functions. NFC This caller does not need the libcall portion so it can directly call forceExpandMultiply.	2025-01-27 17:10:37 -08:00
Aiden Grossman	aa65f93b71	Revert "[MLGO] Count LR Evictions Rather than Relying on Cascade (#124440 )" This reverts commit 8cc83b66e20e72cdb3bb5fbd549c941797b0e0c9. This was causing buildbot failures. https://lab.llvm.org/buildbot/#/builders/90/builds/4198 https://lab.llvm.org/buildbot/#/builders/110/builds/3616	2025-01-28 00:22:25 +00:00
mingmingl	934532d8b1	remove unused var after refactoring	2025-01-27 15:47:32 -08:00
Mingming Liu	e98b2028c7	[NFCI]Refactor AsmPrinter around jump table emission (#124645 ) Add method `AsmPrinter::emitJumpTableImpl`. It takes an array-ref of jump table indices. This splits refactor of PR https://github.com/llvm/llvm-project/pull/122215	2025-01-27 15:29:38 -08:00
Aiden Grossman	8cc83b66e2	[MLGO] Count LR Evictions Rather than Relying on Cascade (#124440 ) This patch adjusts the mlregalloc-max-cascade flag (renaming it to mlregalloc-max-eviction-count) to actually count evictions rather than just looking at the cascade number. The cascade number is not very representative of how many times a LR has been evicted, which can lead to some problems in certain cases, where we might end up with many eviction problems where we have now masked off all the interferences and are forced to evict the candidate. This is probably what I should've done in the first place. No test case as this only shows up in quite large functions post ThinLTO and it would be hard to construct something that would serve as a nice regression test without being super brittle. I've tested this on the pathological cases that we have come across so far and it works. Fixes #122829	2025-01-27 15:23:37 -08:00
David Green	5a81a559d6	[GISel] Explicitly disable BF16 tablegen patterns. (#124113 ) We currently have an issue where bf16 patterns can be used to match fp16 types, as GISel does not know about the difference between the two. This patch explicitly disables them to make sure that they are never used. The opposite can also happen, where fp16 patterns are used for operators that should be bf16. So this also changes any operations with bf16 types to now cause a fallback to SDAG. The pass setup for GISel has been slightly adjusted to make sure that a verify pass does not get added between AMD-SDAG and SIFixSGPRCopiesPass, which otherwise can cause verifier issues when falling back.	2025-01-27 22:21:12 +00:00
Craig Topper	c24e5f982e	[GlobalMerge] Fix inaccurate debug print. (#124377 ) This message was not updated when MinSize was added.	2025-01-27 12:45:41 -08:00
Craig Topper	0cbb1d5673	[GlobalMerge] Use constructor to set all bits in BitVector. NFC (#124375 ) The constructor has an optional bool for the starting value for each bit. Use that instead of calling set().	2025-01-27 12:44:44 -08:00
Kazu Hirata	817e777296	[CodeGen] Avoid repeated hash lookups (NFC) (#124506 )	2025-01-27 10:35:52 -08:00
Shubham Sandeep Rastogi	44c9e46fce	[InstrRef] Fix mismatch between LiveDebugValues and salvageCopySSA (#124233 ) The LiveDebugValues pass and the instruction selector (which calls salvageCopySSA) need to be consistent on what they consider a copy instruction. With https://github.com/llvm/llvm-project/pull/75184, the definition of what a copy instruction is was narrowed for AArch64 to exclude a w->x ORR and treat it as a zero-extend rather than a copy. However, to make sure LiveDebugValues still treats a w->x ORR as a copy, the new function, isCopyLikeInstr was created. We need to make sure that salvageCopySSA also calls that function. This patch addresses this mismatch.	2025-01-27 09:26:22 -08:00
Jeremy Morse	81d18ad864	[NFC][DebugInfo] Make some block-start-position methods return iterators (#124287 ) As part of the "RemoveDIs" work to eliminate debug intrinsics, we're replacing methods that use Instruction*'s as positions with iterators. A number of these (such as getFirstNonPHIOrDbg) are sufficiently infrequently used that we can just replace the pointer-returning version with an iterator-returning version, hopefully without much/any disruption. Thus this patch has getFirstNonPHIOrDbg and getFirstNonPHIOrDbgOrLifetime return an iterator, and updates all call-sites. There are no concerns about the iterators returned being converted to Instruction*'s and losing the debug-info bit: because the methods skip debug intrinsics, the iterator head bit is always false anyway.	2025-01-27 16:27:54 +00:00
Michael Maitland	559287575b	[GlobalMerge][NFC] Reland "Skip sorting by profitability when it is not needed" Relands #124146 but without changes to the sorting algorithm and the following reverse.	2025-01-27 07:28:47 -08:00
Jeremy Morse	e14962a39c	[NFC][DebugInfo] Use iterators for instruction insertion in more places (#124291 ) As part of the "RemoveDIs" work to eliminate debug intrinsics, we're replacing methods that use Instruction*'s as positions with iterators. This patch changes some more complex call-sites, those crossing file boundaries and where I've had to perform some minor rewrites.	2025-01-27 15:25:17 +00:00
Alexey Bader	e278e1b6ec	[NFC][CodeGen] Fix typos in code comments. (#124382 ) This fixes typos in `calcUniqueIDUpdateFlagsAndSize` function.	2025-01-26 13:58:58 -08:00
Kazu Hirata	850852e9a4	[CodeGen] Avoid repeated hash lookups (NFC) (#124455 )	2025-01-26 01:35:39 -08:00
Craig Topper	37fdde6025	[CodeGen] Remove implicit conversions from Register to unsigned from MachineOperand. NFC	2025-01-25 23:12:14 -08:00
Craig Topper	4bcd8184a0	[TargetLowering] Pull similar code out of the forceExpandWideMUL into a helper. NFC (#124371 ) These functions have similar code. One of them calculates the 2x width full product from 2 sources. The other calculates the product from 2 sources that have low and high halves. This patch introduces a new function that takes HiLHS and HiRHS as optional values. If they are not null, they will be used in the calculation of the Hi half. The Signed flag can only be set when HiLHS/HiRHS are null.	2025-01-25 10:53:01 -08:00
James Y Knight	9325a61aa0	Revert "[GlobalMerge][NFC] Skip sorting by profitability when it is not needed" (#124411 ) Reverts llvm/llvm-project#124146 -- new comparator is not a strict-weak as required by stable_sort. Co-authored-by: Michael Maitland <michaeltmaitland@gmail.com>	2025-01-25 10:16:37 -05:00
Kazu Hirata	72918fd11d	[GlobalISel] Avoid repeated hash lookups (NFC) (#124393 )	2025-01-25 01:17:38 -08:00
Kazu Hirata	0cc74a8941	[CodeGen] Avoid repeated hash lookups (NFC) (#124392 )	2025-01-25 01:17:22 -08:00
Craig Topper	ac1ba1f9dd	[CodeGen] Introduce a VirtRegOrUnit class to hold virtual reg or physical reg unit. NFC (#123768 ) LiveIntervals and MachineVerifier were previously using Register to store this, but reg units are different than physical registers. One important difference is that 0 is a valid reg unit number, but it is not a valid physical register. This patch introduces a new VirtRegOrUnit class that is distinct from Register. It can be converted to/from a virtual Register or a MCRegUnit. I've made all conversions explicit and used assertions to check the validity. I also fixed a place in MachineVerifier that was ignoring reg unit 0.	2025-01-24 18:30:28 -08:00
Stephen Long	ab976a1712	PreISelIntrinsicLowering: Lower llvm.exp/llvm.exp2 to a loop if scalable vec arg (#117568 )	2025-01-24 14:02:06 -05:00
Jeffrey Byrnes	6c11b7e689	[CodeGen] NFC: Change order of checks in MachineInstr->isDead() (#124207 ) https://github.com/llvm/llvm-project/pull/123531 (Change-Id: Ic349022bb99ef91f5396e462ade0366bc772ae02) moved isDead() from DeadMachineInstrElim to MachineInstr. In the process of moving, I reordered the checks to improve chances of early exit, but this has caused a slight increase in compile time. This PR reverts back to the original order of checks.	2025-01-24 07:23:22 -08:00
Emma Pilkington	f2b253b961	[SelectionDAG] Fix an incorrect DebugLoc on a COPY (#122963 ) Fixes: SWDEV-502134	2025-01-24 09:28:27 -05:00
Michael Maitland	e5e55c04d6	[GlobalMerge][NFC] Skip sorting by profitability when it is not needed (#124146 ) We were previously sorting by profitability even if we were choosing to merge all globals together, which is not impacted by UsedGlobalSet order. We can also remove iteration of UsedGlobalSets in reverse order in both cases. In the first case, the order does not matter. In the second case, we just sort by the order we need instead of sorting in the opposite direction and calling reverse. This change should only be an improvement on compile time. I have not measured it, but I think it would never make things worse.	2025-01-24 09:08:34 -05:00
Jeremy Morse	6292a808b3	[NFC][DebugInfo] Use iterator-flavour getFirstNonPHI at many call-sites (#123737 ) As part of the "RemoveDIs" project, BasicBlock::iterator now carries a debug-info bit that's needed when getFirstNonPHI and similar feed into instruction insertion positions. Call-sites where that's necessary were updated a year ago; but to ensure some type safety, we'd like to have all calls to getFirstNonPHI use the iterator-returning version. This patch changes a bunch of call-sites calling getFirstNonPHI to use getFirstNonPHIIt, which returns an iterator. All these call sites are where it's obviously safe to fetch the iterator then dereference it. A follow-up patch will contain less-obviously-safe changes. We'll eventually deprecate and remove the instruction-pointer getFirstNonPHI, but not before adding concise documentation of what considerations are needed (very few). --------- Co-authored-by: Stephen Tozer <Melamoto@gmail.com>	2025-01-24 13:27:56 +00:00
Petar Avramovic	b60c118f53	MachineUniformityAnalysis: Improve isConstantOrUndefValuePhi (#112866 ) Change existing code for G_PHI to match what LLVM-IR version is doing via PHINode::hasConstantOrUndefValue. This is not safe for regular PHI since it may appear with an undef operand and getVRegDef can fail. Most notably this improves number of values that can be allocated to sgpr in AMDGPURegBankSelect. Common case here are phis that appear in structurize-cfg lowering for cycles with multiple exits: Undef incoming value is coming from block that reached cycle exit condition, if other incoming is uniform keep the phi uniform despite the fact it is joining values from pair of blocks that are entered via divergent condition branch.	2025-01-24 12:43:40 +01:00
Petar Avramovic	0ee037b861	AMDGPU/GlobalISel: AMDGPURegBankLegalize (#112864 ) Lower G_ instructions that can't be inst-selected with register bank assignment from AMDGPURegBankSelect based on uniformity analysis. - Lower instruction to perform it on assigned register bank - Put uniform value in vgpr because SALU instruction is not available - Execute divergent instruction in SALU - "waterfall loop" Given LLTs on all operands after legalizer, some register bank assignments require lowering while others do not. Note: cases where all register bank assignments would require lowering are lowered in legalizer. AMDGPURegBankLegalize goals: - Define Rules: when and how to perform lowering - Goal of defining Rules is to provide high level table-like brief overview of how to lower generic instructions based on available target features and uniformity info (uniform vs divergent). - Fast search of Rules, depends on how complicated Rule.Predicate is - For some opcodes there would be too many Rules that are essentially all the same just for different combinations of types and banks. Write custom function that handles all cases. - Rules are made from enum IDs that correspond to each operand. Names of IDs are meant to give brief description what lowering does for each operand or the whole instruction. - AMDGPURegBankLegalizeHelper implements lowering algorithms Since this is the first patch that actually enables -new-reg-bank-select here is the summary of regression tests that were added earlier: - if instruction is uniform always select SALU instruction if available - eliminate back to back vgpr to sgpr to vgpr copies of uniform values - fast rules: small differences for standard and vector instruction - enabling Rule based on target feature - salu_float - how to specify lowering algorithm - vgpr S64 AND to S32 - on G_TRUNC in reg, it is up to user to deal with truncated bits G_TRUNC in reg is treated as no-op. 
- dealing with truncated high bits - ABS S16 to S32 - sgpr S1 phi lowering - new opcodes for vcc-to-scc and scc-to-vcc copies - lowering for vgprS1-to-vcc copy (formally this is vgpr-to-vcc G_TRUNC) - S1 zext and sext lowering to select - uniform and divergent S1 AND(OR and XOR) lowering - inst-selected into SALU instruction - divergent phi with uniform inputs - divergent instruction with temporal divergent use, source instruction is defined as uniform(AMDGPURegBankSelect) - missing temporal divergence lowering - uniform phi, because of undef incoming, is assigned to vgpr. Will be fixed in AMDGPURegBankSelect via another fix in machine uniformity analysis.	2025-01-24 12:12:45 +01:00
Jeremy Morse	8e70273509	[NFC][DebugInfo] Use iterator moveBefore at many call-sites (#123583 ) As part of the "RemoveDIs" project, BasicBlock::iterator now carries a debug-info bit that's needed when getFirstNonPHI and similar feed into instruction insertion positions. Call-sites where that's necessary were updated a year ago; but to ensure some type safety, we'd like to have all calls to moveBefore use iterators. This patch adds a (guaranteed dereferenceable) iterator-taking moveBefore, and changes a bunch of call-sites where it's obviously safe to change to use it by just calling getIterator() on an instruction pointer. A follow-up patch will contain less-obviously-safe changes. We'll eventually deprecate and remove the instruction-pointer insertBefore, but not before adding concise documentation of what considerations are needed (very few).	2025-01-24 10:53:11 +00:00
Akshat Oke	a9c61e0d76	[NewPM] LiveIntervals: Check dependencies for invalidation (#123563 )	2025-01-24 11:23:46 +05:30
Kazu Hirata	9fecb4f907	[CodeGen] Fix a warning This patch fixes: llvm/lib/CodeGen/MachineSink.cpp:1667:22: error: unused variable 'Preheader' [-Werror,-Wunused-variable]	2025-01-23 19:37:28 -08:00
Matt Arsenault	0ef39a882b	MachineCSE: Remove check for subreg on a def operand (#124095 ) There are no subregister defs in SSA.	2025-01-24 09:35:30 +07:00
Jeffrey Byrnes	acb7859f07	[MachineSink] Extend loop sinking capability (#117247 ) The current MIR cycle sinking capabilities are rather limited. They only support sinking copies into a single successor block while obeying limits. This opt-in feature adds a more aggressive option, that is not limited to the above concerns. The feature will try to "sink" by duplicating any top-level preheader instruction (that we are sure is safe to sink) into any user block, then does some dead code cleanup. In particular, this is useful for high RP situations when loop bodies have control flow.	2025-01-23 17:08:23 -08:00
Min-Yih Hsu	bc74a1edbe	[IA] Generalize the support for power-of-two (de)interleave intrinsics (#123863 ) Previously, AArch64 used pattern matching to support llvm.vector.(de)interleave of 2 and 4; RISC-V only supported (de)interleave of 2. This patch consolidates the logic in these two targets by factoring out the common factor calculations into the InterleaveAccess Pass.	2025-01-23 15:27:51 -08:00
Jeffrey Byrnes	f2942b9077	[CodeGen] NFC: Move isDead to MachineInstr (#123531 ) Provide isDead interface for access to ad-hoc isDead queries. LivePhysRegs is optional: if not provided, pessimistically check deadness of a single MI without doing the LivePhysReg walk; if provided it is assumed to be at the position of MI.	2025-01-23 12:54:29 -08:00
Craig Topper	e30a4fc3e2	[TargetLowering] Improve one signature of forceExpandWideMUL. (#123991 ) We have two forceExpandWideMUL functions. One takes the low and high half of 2 inputs and calculates the low and high half of their product. This does not calculate the full 2x width product. The other signature takes 2 inputs and calculates the low and high half of their full 2x width product. Previously it did this by sign/zero extending the inputs to create the high bits and then calling the other function. We can instead copy the algorithm from the other function and use the Signed flag to determine whether we should do SRA or SRL. This avoids the need to multiply the high part of the inputs and add them to the high half of the result. This improves the generated code for signed multiplication. This should improve the performance of #123262. I don't know yet how close we will get to gcc.	2025-01-23 12:49:35 -08:00
Florian Hahn	0d0190815d	[TailDup] Allow large number of predecessors/successors without phis. (#116072 ) This adjusts the threshold logic added in #78582 to only trigger for cases where there are actually phis to duplicate in either TailBB or in one of the successors. In cases where there are no phis, we only have to pay the cost of extra edges, but have no explosion in PHI related instructions. This improves performance of Python on some inputs by 2-3% on Apple Silicon CPUs. PR: https://github.com/llvm/llvm-project/pull/116072	2025-01-23 18:24:20 +00:00
Kazu Hirata	bb019dd165	[CodeGen] Avoid repeated hash lookups (NFC) (#124078 )	2025-01-23 08:46:19 -08:00
Michael Maitland	7db4ba3916	[GlobalMerge][NFC] Fix inaccurate comments (#124136 ) I was studying the code here and realized that the comments were talking about grouping by basic blocks when the code was grouping by Function. Fix the comments so they reflect what the code is actually doing.	2025-01-23 11:36:53 -05:00
Matt Arsenault	fb3fa41aee	MachineRegisterInfo: Use variable for TRI	2025-01-23 20:29:25 +07:00
Jeremy Morse	cb714e74cc	[DebugInfo][InstrRef] Avoid producing broken DW_OP_deref_sizes (#123967 ) We use variable locations such as DBG_VALUE $xmm0 as shorthand to refer to "the low lane of $xmm0", and this is reflected in how DWARF is interpreted too. However InstrRefBasedLDV tries to be smart and interprets such a DBG_VALUE as a 128-bit reference. We then issue a DW_OP_deref_size of 128 bits to the stack, which isn't permitted by DWARF (it's larger than a pointer). Solve this for now by not using DW_OP_deref_size if it would be illegal. Instead we'll use DW_OP_deref, and the consumer will load the variable type from the stack, which should be correct. There's still a risk of imprecision when LLVM decides to use smaller or larger value types than the source-variable type, which manifests as too-little or too-much memory being read from the stack. However we can't solve that without putting more type information in debug-info. fixes #64093	2025-01-23 10:47:15 +00:00

37137 Commits