llvm-project

Author	SHA1	Message	Date
Alex Bradbury	8fcb1263f4	[PreISelIntrinsicLowering] Produce a memset_pattern16 libcall for llvm.experimental.memset.pattern when available (#120420 ) This is to enable a transition of LoopIdiomRecognize to selecting the llvm.experimental.memset.pattern intrinsic as requested in #118632 (as opposed to supporting selection of the libcall or the intrinsic). As such, although it _is_ a TODO to add costing considerations on whether to lower to the libcall (when available) or expand directly, lacking such logic is helpful at this stage in order to minimise any unexpected code gen changes in this transition.	2025-01-30 07:12:53 +00:00
Craig Topper	dd3edc8365	[CodeGen] Add Register::stackSlotIndex(). Replace uses of Register::stackSlot2Index. NFC (#125028 )	2025-01-29 23:02:07 -08:00
Akshat Oke	11026a8d8b	[CodeGen][NewPM] Preserve all MF analyses in MFPM (#124707 ) Invalidation is already handled in the passes loop for MFAM, so all of the rest analyses are preserved. (See `PassManager::run()`) This won't change the number of invalidations, but will prevent needless `MFAM::Invalidator::invalidate()` invocations made by results depending on other results (since the invalidate shorts if `<AllAnalysesOn<MF>>` is preserved)	2025-01-30 10:01:58 +05:30
Matt Arsenault	6017480461	MachineVerifier: Fix check for range type (#124894 ) We need to permit scalar extending loads with range annotations. Fix expensive_checks failures after 11db7fb09b36e656a801117d6a2492133e9c2e46	2025-01-30 10:56:12 +07:00
Matt Arsenault	97a1f494a6	DAG: Avoid breaking legal vector_shuffle with multiple uses (#123712 ) Previously this combine would undo AMDGPU's new custom legalization of wide vector shuffles into 2 element pieces. The comment also states that this combine is only done before legalization, but the case with a build_vector source was unconditional. We probably don't want to do this if the multiple uses are full scalarization of the vector, but this seems to work well enough. Scalarizing extracts should have folded out pre-legalize.	2025-01-30 10:55:21 +07:00
Yingwei Zheng	3c6aa04cf4	[CodeGenPrepare] Replace deleted ext instr with the promoted value. (#71058 ) This PR replaces the deleted ext with the promoted value in `AddrMode`. Fixes #70938.	2025-01-30 08:58:23 +08:00
Michael Maitland	35defdf470	Revert "[ReachingDefAnalysis][NFC] Use at instead of lookup for DenseMap access" This reverts commit 3ce97e4aa98ad6a3502528818ff11eee89ef2fae. Pushed to main prematurely.	2025-01-29 08:21:59 -08:00
Michael Maitland	3ce97e4aa9	[ReachingDefAnalysis][NFC] Use at instead of lookup for DenseMap access `at` has an assert that the key exists. Since we are assuming the key exists, use `at` instead of `lookup`.	2025-01-29 08:15:56 -08:00
Mikhail Gudim	3c3c850a45	[ReachingDefAnalysis] Extend the analysis to stack objects. (#118097 ) We track definitions of stack objects, the implementation is identical to tracking of registers. Also, added printing of all found reaching definitions for testing purposes. --------- Co-authored-by: Michael Maitland <michaeltmaitland@gmail.com>	2025-01-29 10:55:16 -05:00
Kazu Hirata	8baa0d9d54	[CodeGen] Avoid repeated hash lookups (NFC) (#124885 )	2025-01-29 07:49:05 -08:00
David Blaikie	ce96c26cd6	Revert "[llvm][DebugInfo] Attach object-pointer to DISubprogram declarations (#122742 )" (#124853 ) This introduces a substantial (5-10%) regression in .debug_info size, so we're discussing alternatives in #122742 and #124790. This reverts commit 7c729418d721147bf1f2b257afd30f84721888ad.	2025-01-29 15:11:33 +01:00
David Green	66e0498daf	[GlobalISel] Do not run verifier after ResetMachineFunctionPass (#124799 ) After we fall back from GlobalISel to SDAG, the verifier gets called, which calls getReservedRegs which uses SIMachineFunctionInfo::usesAGPRs which caches the result of UsesAGPRs. Because we have just fallen-back the function is empty and it incorrectly gets cached to false. This patch makes sure we don't try to run the verifier whilst the function is empty.	2025-01-29 12:48:11 +00:00
Akshat Oke	a3aa452a21	[CodeGen] RegisterCoalescer: Remove unused AliasAnalysis dependency (#124773 )	2025-01-29 13:27:14 +05:30
Mingming Liu	3feb724496	[AsmPrinter][ELF] Support profile-guided section prefix for jump tables' (read-only) data sections (#122215 ) https://github.com/llvm/llvm-project/pull/122183 adds a codegen pass to infer machine jump table entry's hotness from the MBB hotness. This is a follow-up PR to produce `.hot` and/or `.unlikely` section prefix for jump table's (read-only) data sections in the relocatable `.o` files. When this patch is enabled, linker will see {`.rodata`, `.rodata.hot`, `.rodata.unlikely`} in input sections. It can map `.rodata.hot` and `.rodata` in the input sections to `.rodata.hot` in the executable, and map `.rodata.unlikely` into `.rodata` with a pending extension to `--keep-text-section-prefix` like `059e7cbb66`, or with a linker script. 1. To partition hot and cold jump tables, the AsmPrinter pass slices a function's jump table indices into two groups, one for hot and the other for cold jump tables. It then emits hot jump tables into a `.hot`-prefixed data section and cold ones into a `.unlikely`-prefixed data section, retaining the relative order of `LJT<N>` labels within each group. 2. [ELF only] To have data sections with _dynamic_ names (e.g., `.rodata.hot[.func]`), we implement `TargetLoweringObjectFile::getSectionForJumpTable` method that accepts a `MachineJumpTableEntry` parameter, and update `selectELFSectionForGlobal` to generate `.hot` or `.unlikely` based on MJTE's hotness. - The dynamic JT section name doesn't depend on `-ffunction-section=true` or `-funique-section-names=true`, even though it leverages the similar underlying mechanism to have a MCSection with on-demand name as `-ffunction-section` does. 3. The new code path is off by default. - Typically, `TargetOptions` conveys clang or LLVM tools' options to code generation passes. To follow the pattern, add option `EnableStaticDataPartitioning` bit in `TargetOptions` and make it readable through `TargetMachine`. 
- To enable the new code path in tools like `llc`, `partition-static-data-sections` option is introduced in `CodeGen/CommandFlags.h/cpp`. - A subsequent patch ([draft](`8f36a13743`)) will add a clang option to enable the new code path. --------- Co-authored-by: Ellis Hoag <ellis.sparky.hoag@gmail.com>	2025-01-28 22:49:28 -08:00
Kazu Hirata	1d5ce614a7	[CodeGen] Avoid repeated hash lookups (NFC) (#124677 )	2025-01-28 10:57:29 -08:00
Stephen Tozer	22687aa97b	[CodeGen] Correctly handle non-standard cases in RemoveLoadsIntoFakeUses (#111551 ) In the RemoveLoadsIntoFakeUses pass, we try to remove loads that are only used by fake uses, as well as the fake use in question. There are two existing errors with the pass however: it incorrectly examines every operand of each FAKE_USE, when only the first is relevant (extra operands will just be "killed" regs assigned by a previous pass), and it ignores cases where the FAKE_USE register is not an exact match for the loaded register, which is incorrect as regalloc may choose to load a wider value than the FAKE_USE required pre-regalloc. This patch fixes both of these cases.	2025-01-28 13:59:41 +00:00
Renat Idrisov	11db7fb09b	[GlobalISel] Catching inconsistencies in load memory, result, and range metadata type (#121247 ) This is a fix for: https://github.com/llvm/llvm-project/issues/97290 Please let me know if that is the right way to address the issue. Thank you! --------- Co-authored-by: Renat Idrisov <parsifal-47@users.noreply.github.com> Co-authored-by: Matt Arsenault <arsenm2@gmail.com>	2025-01-28 20:54:34 +07:00
abhishek-kaushik22	015aed18ee	[SelectionDAG] WidenVecOp_INSERT_SUBVECTOR - Replace `INSERT_SUBVECTOR` with series of `INSERT_VECTOR_ELT` (#124420 ) If the operands to `INSERT_SUBVECTOR` can't be widened legally, just replace the `INSERT_SUBVECTOR` with a series of `INSERT_VECTOR_ELT`. Closes #124255 (and possibly #102016)	2025-01-28 18:54:49 +05:30
Pierre van Houtryve	8ea018ce1d	[DAGISel] Fix MMRA Handling in copyExtraInfo (#124730 ) #78569 did not implement this correctly and an edge case breaks it by triggering `Assertion `!Leafs.empty()' failed.` Fixes SWDEV-507698	2025-01-28 13:27:26 +01:00
Akshat Oke	7cd6f85578	[CodeGen][NFC] Format RegisterCoalescer sources (#124697 )	2025-01-28 15:49:21 +05:30
Aiden Grossman	00f692b94f	Reland "[MLGO] Count LR Evictions Rather than Relying on Cascade (#124440 )" This reverts commit aa65f93b71dee8cacb22be1957673c8be6a3ec24. This relands commit 8cc83b66e20e72cdb3bb5fbd549c941797b0e0c9. It looks like this was a transitive include issue.	2025-01-28 07:09:25 +00:00
Craig Topper	d839e765f0	[TargetLowering] Inline the only caller of one of the forceExpandWideMUL functions. NFC This caller does not need the libcall portion so it can directly call forceExpandMultiply.	2025-01-27 17:10:37 -08:00
Aiden Grossman	aa65f93b71	Revert "[MLGO] Count LR Evictions Rather than Relying on Cascade (#124440 )" This reverts commit 8cc83b66e20e72cdb3bb5fbd549c941797b0e0c9. This was causing buildbot failures. https://lab.llvm.org/buildbot/#/builders/90/builds/4198 https://lab.llvm.org/buildbot/#/builders/110/builds/3616	2025-01-28 00:22:25 +00:00
mingmingl	934532d8b1	remove unused var after refactoring	2025-01-27 15:47:32 -08:00
Mingming Liu	e98b2028c7	[NFCI]Refactor AsmPrinter around jump table emission (#124645 ) Add method `AsmPrinter::emitJumpTableImpl`. It takes an array-ref of jump table indices. This splits refactor of PR https://github.com/llvm/llvm-project/pull/122215	2025-01-27 15:29:38 -08:00
Aiden Grossman	8cc83b66e2	[MLGO] Count LR Evictions Rather than Relying on Cascade (#124440 ) This patch adjusts the mlregalloc-max-cascade flag (renaming it to mlregalloc-max-eviction-count) to actually count evictions rather than just looking at the cascade number. The cascade number is not very representative of how many times a LR has been evicted, which can lead to some problems in certain cases, where we might end up with many eviction problems where we have now masked off all the interferences and are forced to evict the candidate. This is probably what I should've done in the first place. No test case as this only shows up in quite large functions post ThinLTO and it would be hard to construct something that would serve as a nice regression test without being super brittle. I've tested this on the pathological cases that we have come across so far and it works. Fixes #122829	2025-01-27 15:23:37 -08:00
David Green	5a81a559d6	[GISel] Explicitly disable BF16 tablegen patterns. (#124113 ) We currently have an issue where bf16 patterns can be used to match fp16 types, as GISel does not know about the difference between the two. This patch explicitly disables them to make sure that they are never used. The opposite can also happen, where fp16 patterns are used for operators that should be bf16. So this also changes any operations with bf16 types to now cause a fallback to SDAG. The pass setup for GISel has been slightly adjusted to make sure that a verify pass does not get added between AMD-SDAG and SIFixSGPRCopiesPass, which otherwise can cause verifier issues when falling back.	2025-01-27 22:21:12 +00:00
Craig Topper	c24e5f982e	[GlobalMerge] Fix inaccurate debug print. (#124377 ) This message was not updated when MinSize was added.	2025-01-27 12:45:41 -08:00
Craig Topper	0cbb1d5673	[GlobalMerge] Use constructor to set all bits in BitVector. NFC (#124375 ) The constructor has an optional bool for the starting value for each bit. Use that instead of calling set().	2025-01-27 12:44:44 -08:00
Kazu Hirata	817e777296	[CodeGen] Avoid repeated hash lookups (NFC) (#124506 )	2025-01-27 10:35:52 -08:00
Shubham Sandeep Rastogi	44c9e46fce	[InstrRef] Fix mismatch between LiveDebugValues and salvageCopySSA (#124233 ) The LiveDebugValues pass and the instruction selector (which calls salvageCopySSA) need to be consistent on what they consider a copy instruction. With https://github.com/llvm/llvm-project/pull/75184, the definition of what a copy instruction is was narrowed for AArch64 to exclude a w->x ORR and treat it as a zero-extend rather than a copy. However, to make sure LiveDebugValues still treats a w->x ORR as a copy, the new function isCopyLikeInstr was created. We need to make sure that salvageCopySSA also calls that function. This patch addresses this mismatch.	2025-01-27 09:26:22 -08:00
Jeremy Morse	81d18ad864	[NFC][DebugInfo] Make some block-start-position methods return iterators (#124287 ) As part of the "RemoveDIs" work to eliminate debug intrinsics, we're replacing methods that use Instruction's as positions with iterators. A number of these (such as getFirstNonPHIOrDbg) are sufficiently infrequently used that we can just replace the pointer-returning version with an iterator-returning version, hopefully without much/any disruption. Thus this patch has getFirstNonPHIOrDbg and getFirstNonPHIOrDbgOrLifetime return an iterator, and updates all call-sites. There are no concerns about the iterators returned being converted to Instruction's and losing the debug-info bit: because the methods skip debug intrinsics, the iterator head bit is always false anyway.	2025-01-27 16:27:54 +00:00
Michael Maitland	559287575b	[GlobalMerge][NFC] Reland "Skip sorting by profitability when it is not needed" Relands #124146 but without changes to the sorting algorithm and the following reverse.	2025-01-27 07:28:47 -08:00
Jeremy Morse	e14962a39c	[NFC][DebugInfo] Use iterators for instruction insertion in more places (#124291 ) As part of the "RemoveDIs" work to eliminate debug intrinsics, we're replacing methods that use Instruction*'s as positions with iterators. This patch changes some more complex call-sites, those crossing file boundaries and where I've had to perform some minor rewrites.	2025-01-27 15:25:17 +00:00
Alexey Bader	e278e1b6ec	[NFC][CodeGen] Fix typos in code comments. (#124382 ) This fixes typos in `calcUniqueIDUpdateFlagsAndSize` function.	2025-01-26 13:58:58 -08:00
Kazu Hirata	850852e9a4	[CodeGen] Avoid repeated hash lookups (NFC) (#124455 )	2025-01-26 01:35:39 -08:00
Craig Topper	37fdde6025	[CodeGen] Remove implicit conversions from Register to unsigned from MachineOperand. NFC	2025-01-25 23:12:14 -08:00
Craig Topper	4bcd8184a0	[TargetLowering] Pull similar code out of the forceExpandWideMUL into a helper. NFC (#124371 ) These functions have similar code. One of them calculates the 2x width full product from 2 sources. The other calculates the product from 2 sources that have low and high halves. This patch introduces a new function that takes HiLHS and HiRHS as optional values. If they are not null, they will be used in the calculation of the Hi half. The Signed flag can only be set when HiLHS/HiRHS are null.	2025-01-25 10:53:01 -08:00
James Y Knight	9325a61aa0	Revert "[GlobalMerge][NFC] Skip sorting by profitability when it is not needed" (#124411 ) Reverts llvm/llvm-project#124146 -- new comparator is not a strict-weak as required by stable_sort. Co-authored-by: Michael Maitland <michaeltmaitland@gmail.com>	2025-01-25 10:16:37 -05:00
Kazu Hirata	72918fd11d	[GlobalISel] Avoid repeated hash lookups (NFC) (#124393 )	2025-01-25 01:17:38 -08:00
Kazu Hirata	0cc74a8941	[CodeGen] Avoid repeated hash lookups (NFC) (#124392 )	2025-01-25 01:17:22 -08:00
Craig Topper	ac1ba1f9dd	[CodeGen] Introduce a VirtRegOrUnit class to hold virtual reg or physical reg unit. NFC (#123768 ) LiveIntervals and MachineVerifier were previously using Register to store this, but reg units are different than physical registers. One important difference is that 0 is a valid reg unit number, but it is not a valid physical register. This patch introduces a new VirtRegOrUnit class that is distinct from Register. It can be converted to/from a virtual Register or a MCRegUnit. I've made all conversions explicit and used assertions to check the validity. I also fixed a place in MachineVerifier that was ignoring reg unit 0.	2025-01-24 18:30:28 -08:00
Stephen Long	ab976a1712	PreISelIntrinsicLowering: Lower llvm.exp/llvm.exp2 to a loop if scalable vec arg (#117568 )	2025-01-24 14:02:06 -05:00
Jeffrey Byrnes	6c11b7e689	[CodeGen] NFC: Change order of checks in MachineInstr->isDead() (#124207 ) [Change-Id: Ic349022bb99ef91f5396e462ade0366bc772ae02](https://github.com/llvm/llvm-project/pull/123531) moved isDead() from DeadMachineInstrElim to MachineInstr. In the process of moving, I reordered the checks to improve chances of early exit, but this has caused a slight increase in compile time. This PR reverts back to the original order of checks.	2025-01-24 07:23:22 -08:00
Emma Pilkington	f2b253b961	[SelectionDAG] Fix an incorrect DebugLoc on a COPY (#122963 ) Fixes: SWDEV-502134	2025-01-24 09:28:27 -05:00
Michael Maitland	e5e55c04d6	[GlobalMerge][NFC] Skip sorting by profitability when it is not needed (#124146 ) We were previously sorting by profitability even if we were choosing to merge all globals together, which is not impacted by UsedGlobalSet order. We can also remove iteration of UsedGlobalSets in reverse order in both cases. In the first case, the order does not matter. In the second case, we just sort by the order we need instead of sorting in the opposite direction and calling reverse. This change should only be an improvement on compile time. I have not measured it, but I think it would never make things worse.	2025-01-24 09:08:34 -05:00
Jeremy Morse	6292a808b3	[NFC][DebugInfo] Use iterator-flavour getFirstNonPHI at many call-sites (#123737 ) As part of the "RemoveDIs" project, BasicBlock::iterator now carries a debug-info bit that's needed when getFirstNonPHI and similar feed into instruction insertion positions. Call-sites where that's necessary were updated a year ago; but to ensure some type safety, we'd like to have all calls to getFirstNonPHI use the iterator-returning version. This patch changes a bunch of call-sites calling getFirstNonPHI to use getFirstNonPHIIt, which returns an iterator. All these call sites are where it's obviously safe to fetch the iterator then dereference it. A follow-up patch will contain less-obviously-safe changes. We'll eventually deprecate and remove the instruction-pointer getFirstNonPHI, but not before adding concise documentation of what considerations are needed (very few). --------- Co-authored-by: Stephen Tozer <Melamoto@gmail.com>	2025-01-24 13:27:56 +00:00
Petar Avramovic	b60c118f53	MachineUniformityAnalysis: Improve isConstantOrUndefValuePhi (#112866 ) Change existing code for G_PHI to match what LLVM-IR version is doing via PHINode::hasConstantOrUndefValue. This is not safe for regular PHI since it may appear with an undef operand and getVRegDef can fail. Most notably this improves number of values that can be allocated to sgpr in AMDGPURegBankSelect. Common case here are phis that appear in structurize-cfg lowering for cycles with multiple exits: Undef incoming value is coming from block that reached cycle exit condition, if other incoming is uniform keep the phi uniform despite the fact it is joining values from pair of blocks that are entered via divergent condition branch.	2025-01-24 12:43:40 +01:00
Petar Avramovic	0ee037b861	AMDGPU/GlobalISel: AMDGPURegBankLegalize (#112864 ) Lower G_ instructions that can't be inst-selected with register bank assignment from AMDGPURegBankSelect based on uniformity analysis. - Lower instruction to perform it on assigned register bank - Put uniform value in vgpr because SALU instruction is not available - Execute divergent instruction in SALU - "waterfall loop" Given LLTs on all operands after legalizer, some register bank assignments require lowering while others do not. Note: cases where all register bank assignments would require lowering are lowered in legalizer. AMDGPURegBankLegalize goals: - Define Rules: when and how to perform lowering - Goal of defining Rules is to provide high level table-like brief overview of how to lower generic instructions based on available target features and uniformity info (uniform vs divergent). - Fast search of Rules, depends on how complicated Rule.Predicate is - For some opcodes there would be too many Rules that are essentially all the same just for different combinations of types and banks. Write custom function that handles all cases. - Rules are made from enum IDs that correspond to each operand. Names of IDs are meant to give brief description what lowering does for each operand or the whole instruction. - AMDGPURegBankLegalizeHelper implements lowering algorithms Since this is the first patch that actually enables -new-reg-bank-select here is the summary of regression tests that were added earlier: - if instruction is uniform always select SALU instruction if available - eliminate back to back vgpr to sgpr to vgpr copies of uniform values - fast rules: small differences for standard and vector instruction - enabling Rule based on target feature - salu_float - how to specify lowering algorithm - vgpr S64 AND to S32 - on G_TRUNC in reg, it is up to user to deal with truncated bits G_TRUNC in reg is treated as no-op. 
- dealing with truncated high bits - ABS S16 to S32 - sgpr S1 phi lowering - new opcodes for vcc-to-scc and scc-to-vcc copies - lowering for vgprS1-to-vcc copy (formally this is vgpr-to-vcc G_TRUNC) - S1 zext and sext lowering to select - uniform and divergent S1 AND(OR and XOR) lowering - inst-selected into SALU instruction - divergent phi with uniform inputs - divergent instruction with temporal divergent use, source instruction is defined as uniform(AMDGPURegBankSelect) - missing temporal divergence lowering - uniform phi, because of undef incoming, is assigned to vgpr. Will be fixed in AMDGPURegBankSelect via another fix in machine uniformity analysis.	2025-01-24 12:12:45 +01:00
Jeremy Morse	8e70273509	[NFC][DebugInfo] Use iterator moveBefore at many call-sites (#123583 ) As part of the "RemoveDIs" project, BasicBlock::iterator now carries a debug-info bit that's needed when getFirstNonPHI and similar feed into instruction insertion positions. Call-sites where that's necessary were updated a year ago; but to ensure some type safety, we'd like to have all calls to moveBefore use iterators. This patch adds a (guaranteed dereferenceable) iterator-taking moveBefore, and changes a bunch of call-sites where it's obviously safe to change to use it by just calling getIterator() on an instruction pointer. A follow-up patch will contain less-obviously-safe changes. We'll eventually deprecate and remove the instruction-pointer insertBefore, but not before adding concise documentation of what considerations are needed (very few).	2025-01-24 10:53:11 +00:00

37149 Commits