37149 Commits

Author SHA1 Message Date
Alex Bradbury
8fcb1263f4
[PreISelIntrinsicLowering] Produce a memset_pattern16 libcall for llvm.experimental.memset.pattern when available (#120420)
This is to enable a transition of LoopIdiomRecognize to selecting the
llvm.experimental.memset.pattern intrinsic as requested in #118632 (as
opposed to supporting selection of the libcall or the intrinsic). As
such, although it _is_ a TODO to add costing considerations on whether
to lower to the libcall (when available) or expand directly, lacking
such logic is helpful at this stage in order to minimise any unexpected
code gen changes in this transition.
2025-01-30 07:12:53 +00:00
Craig Topper
dd3edc8365
[CodeGen] Add Register::stackSlotIndex(). Replace uses of Register::stackSlot2Index. NFC (#125028) 2025-01-29 23:02:07 -08:00
Akshat Oke
11026a8d8b
[CodeGen][NewPM] Preserve all MF analyses in MFPM (#124707)
Invalidation is already handled in the passes loop for MFAM, so all of
the remaining analyses are preserved. (See `PassManager::run()`)

This won't change the number of invalidations, but will prevent needless
`MFAM::Invalidator::invalidate()` invocations made by results depending
on other results (since `invalidate` short-circuits if
`<AllAnalysesOn<MF>>` is preserved).
2025-01-30 10:01:58 +05:30
Matt Arsenault
6017480461
MachineVerifier: Fix check for range type (#124894)
We need to permit scalar extending loads with range annotations.

Fix expensive_checks failures after 11db7fb09b36e656a801117d6a2492133e9c2e46
2025-01-30 10:56:12 +07:00
Matt Arsenault
97a1f494a6
DAG: Avoid breaking legal vector_shuffle with multiple uses (#123712)
Previously this combine would undo AMDGPU's new custom legalization of
wide vector shuffles into 2 element pieces. The comment also
states that this combine is only done before legalization,
but the case with a build_vector source was unconditional.

We probably don't want to do this if the multiple uses are full
scalarization of the vector, but this seems to work well enough.
Scalarizing extracts should have folded out pre-legalize.
2025-01-30 10:55:21 +07:00
Yingwei Zheng
3c6aa04cf4
[CodeGenPrepare] Replace deleted ext instr with the promoted value. (#71058)
This PR replaces the deleted ext with the promoted value in `AddrMode`.
Fixes #70938.
2025-01-30 08:58:23 +08:00
Michael Maitland
35defdf470 Revert "[ReachingDefAnalysis][NFC] Use at instead of lookup for DenseMap access"
This reverts commit 3ce97e4aa98ad6a3502528818ff11eee89ef2fae. Pushed to main
prematurely.
2025-01-29 08:21:59 -08:00
Michael Maitland
3ce97e4aa9 [ReachingDefAnalysis][NFC] Use at instead of lookup for DenseMap access
`at` has an assert that the key exists. Since we are assuming the key exists,
use `at` instead of `lookup`.
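
As a rough illustration of the distinction (not the ReachingDefAnalysis code itself), the sketch below models the two access styles on top of `std::unordered_map`. The helper names `lookupOrDefault` and `atChecked` are hypothetical stand-ins: the first mirrors a `lookup` that falls back to a default-constructed value, the second mirrors an `at` that asserts the key is present.

```cpp
#include <cassert>
#include <unordered_map>

// Hypothetical stand-in for a lookup-style accessor: a missing key
// silently yields a default-constructed value.
template <typename K, typename V>
V lookupOrDefault(const std::unordered_map<K, V> &M, const K &Key) {
  auto It = M.find(Key);
  return It == M.end() ? V() : It->second;
}

// Hypothetical stand-in for an at-style accessor: a missing key is
// treated as a programming error and trips an assertion.
template <typename K, typename V>
const V &atChecked(const std::unordered_map<K, V> &M, const K &Key) {
  auto It = M.find(Key);
  assert(It != M.end() && "key is assumed to exist");
  return It->second;
}
```

When the surrounding code already assumes the key exists, the asserting form turns a silent default value into an immediate failure in assertion-enabled builds.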
2025-01-29 08:15:56 -08:00
Mikhail Gudim
3c3c850a45
[ReachingDefAnalysis] Extend the analysis to stack objects. (#118097)
We track definitions of stack objects, the implementation is identical
to tracking of registers.

Also, added printing of all found reaching definitions for testing
purposes.

---------

Co-authored-by: Michael Maitland <michaeltmaitland@gmail.com>
2025-01-29 10:55:16 -05:00
Kazu Hirata
8baa0d9d54
[CodeGen] Avoid repeated hash lookups (NFC) (#124885) 2025-01-29 07:49:05 -08:00
David Blaikie
ce96c26cd6
Revert "[llvm][DebugInfo] Attach object-pointer to DISubprogram declarations (#122742)" (#124853)
This introduces a substantial (5-10%) regression in .debug_info size, so
we're discussing alternatives in #122742 and #124790.

This reverts commit 7c729418d721147bf1f2b257afd30f84721888ad.
2025-01-29 15:11:33 +01:00
David Green
66e0498daf
[GlobalISel] Do not run verifier after ResetMachineFunctionPass (#124799)
After we fall back from GlobalISel to SDAG, the verifier gets called,
which calls getReservedRegs which uses SIMachineFunctionInfo::usesAGPRs
which caches the result of UsesAGPRs. Because we have just fallen-back
the function is empty and it incorrectly gets cached to false. This
patch makes sure we don't try to run the verifier whilst the function is
empty.
2025-01-29 12:48:11 +00:00
Akshat Oke
a3aa452a21
[CodeGen] RegisterCoalescer: Remove unused AliasAnalysis dependency (#124773) 2025-01-29 13:27:14 +05:30
Mingming Liu
3feb724496
[AsmPrinter][ELF] Support profile-guided section prefix for jump tables' (read-only) data sections (#122215)
https://github.com/llvm/llvm-project/pull/122183 adds a codegen pass to
infer machine jump table entry's hotness from the MBB hotness. This is a
follow-up PR to produce `.hot` and/or `.unlikely` section prefixes for
jump table's (read-only) data sections in the relocatable `.o` files.

When this patch is enabled, the linker will see {`.rodata`, `.rodata.hot`,
`.rodata.unlikely`} in input sections. It can map `.rodata.hot` and
`.rodata` in the input sections to `.rodata.hot` in the executable, and
map `.rodata.unlikely` into `.rodata` with a pending extension to
`--keep-text-section-prefix` like
059e7cbb66,
or with a linker script.

1. To partition hot and cold jump tables, the AsmPrinter pass slices a function's jump table indices into two groups, one for hot and the other for cold jump tables. It then emits hot jump tables into a `.hot`-prefixed data section and cold ones into a `.unlikely`-prefixed data section, retaining the relative order of `LJT<N>` labels within each group.

2. [ELF only] To have data sections with _dynamic_ names (e.g., `.rodata.hot[.func]`), we implement
`TargetLoweringObjectFile::getSectionForJumpTable` method that accepts a `MachineJumpTableEntry` parameter, and update `selectELFSectionForGlobal` to generate `.hot` or `.unlikely` based on
MJTE's hotness.
    - The dynamic JT section name doesn't depend on `-ffunction-section=true` or `-funique-section-names=true`, even though it leverages the similar underlying mechanism to have a MCSection with on-demand name as `-ffunction-section` does.

3. The new code path is off by default.
    - Typically, `TargetOptions` conveys clang or LLVM tools' options to code generation passes. To follow the pattern, add option `EnableStaticDataPartitioning` bit in `TargetOptions` and make it
readable through `TargetMachine`.
    - To enable the new code path in tools like `llc`, `partition-static-data-sections` option is introduced in
`CodeGen/CommandFlags.h/cpp`.
    -  A subsequent patch
([draft](8f36a13743)) will add a clang option to enable the new code path.
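
The slicing in step 1 can be sketched as a stable partition over jump table indices. This is a minimal sketch, not the AsmPrinter implementation: `splitJumpTables` and the `IsHot` flag vector are hypothetical, standing in for whatever hotness information is attached to each `MachineJumpTableEntry`.

```cpp
#include <algorithm>
#include <utility>
#include <vector>

// Hypothetical helper: IsHot[I] says whether jump table I is hot.
// Returns {hot indices, cold indices}; each group keeps the original
// relative order of the indices.
std::pair<std::vector<unsigned>, std::vector<unsigned>>
splitJumpTables(const std::vector<bool> &IsHot) {
  std::vector<unsigned> Indices(IsHot.size());
  for (unsigned I = 0; I < Indices.size(); ++I)
    Indices[I] = I;

  // stable_partition moves hot indices to the front without reordering
  // the elements inside either group.
  auto Mid = std::stable_partition(Indices.begin(), Indices.end(),
                                   [&](unsigned I) { return IsHot[I]; });

  return {std::vector<unsigned>(Indices.begin(), Mid),
          std::vector<unsigned>(Mid, Indices.end())};
}
```

A stable partition is used here because the description requires the relative order of labels to be retained within each group.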

---------

Co-authored-by: Ellis Hoag <ellis.sparky.hoag@gmail.com>
2025-01-28 22:49:28 -08:00
Kazu Hirata
1d5ce614a7
[CodeGen] Avoid repeated hash lookups (NFC) (#124677) 2025-01-28 10:57:29 -08:00
Stephen Tozer
22687aa97b
[CodeGen] Correctly handle non-standard cases in RemoveLoadsIntoFakeUses (#111551)
In the RemoveLoadsIntoFakeUses pass, we try to remove loads that are
only used by fake uses, as well as the fake use in question. There are
two existing errors with the pass however: it incorrectly examines every
operand of each FAKE_USE, when only the first is relevant (extra
operands will just be "killed" regs assigned by a previous pass), and it
ignores cases where the FAKE_USE register is not an exact match for the
loaded register, which is incorrect as regalloc may choose to load a
wider value than the FAKE_USE required pre-regalloc. This patch fixes
both of these cases.
2025-01-28 13:59:41 +00:00
Renat Idrisov
11db7fb09b
[GlobalISel] Catching inconsistencies in load memory, result, and range metadata type (#121247)
This is a fix for:
https://github.com/llvm/llvm-project/issues/97290
Please let me know if that is the right way to address the issue. Thank
you!

---------

Co-authored-by: Renat Idrisov <parsifal-47@users.noreply.github.com>
Co-authored-by: Matt Arsenault <arsenm2@gmail.com>
2025-01-28 20:54:34 +07:00
abhishek-kaushik22
015aed18ee
[SelectionDAG] WidenVecOp_INSERT_SUBVECTOR - Replace INSERT_SUBVECTOR with series of INSERT_VECTOR_ELT (#124420)
If the operands to `INSERT_SUBVECTOR` can't be widened legally, just
replace the `INSERT_SUBVECTOR` with a series of `INSERT_VECTOR_ELT`.

Closes #124255 (and possibly #102016)
2025-01-28 18:54:49 +05:30
Pierre van Houtryve
8ea018ce1d
[DAGISel] Fix MMRA Handling in copyExtraInfo (#124730)
#78569 did not implement this correctly and an edge case breaks it by
triggering `Assertion `!Leafs.empty()' failed.`

Fixes SWDEV-507698
2025-01-28 13:27:26 +01:00
Akshat Oke
7cd6f85578
[CodeGen][NFC] Format RegisterCoalescer sources (#124697) 2025-01-28 15:49:21 +05:30
Aiden Grossman
00f692b94f Reland "[MLGO] Count LR Evictions Rather than Relying on Cascade (#124440)"
This reverts commit aa65f93b71dee8cacb22be1957673c8be6a3ec24.

This relands commit 8cc83b66e20e72cdb3bb5fbd549c941797b0e0c9.

It looks like this was a transitive include issue.
2025-01-28 07:09:25 +00:00
Craig Topper
d839e765f0 [TargetLowering] Inline the only caller of one of the forceExpandWideMUL functions. NFC
This caller does not need the libcall portion so it can directly
call forceExpandMultiply.
2025-01-27 17:10:37 -08:00
Aiden Grossman
aa65f93b71 Revert "[MLGO] Count LR Evictions Rather than Relying on Cascade (#124440)"
This reverts commit 8cc83b66e20e72cdb3bb5fbd549c941797b0e0c9.

This was causing buildbot failures.
https://lab.llvm.org/buildbot/#/builders/90/builds/4198
https://lab.llvm.org/buildbot/#/builders/110/builds/3616
2025-01-28 00:22:25 +00:00
mingmingl
934532d8b1 remove unused var after refactoring 2025-01-27 15:47:32 -08:00
Mingming Liu
e98b2028c7
[NFCI]Refactor AsmPrinter around jump table emission (#124645)
Add method `AsmPrinter::emitJumpTableImpl`. It takes an array-ref of jump table indices.

This splits refactor of PR https://github.com/llvm/llvm-project/pull/122215
2025-01-27 15:29:38 -08:00
Aiden Grossman
8cc83b66e2
[MLGO] Count LR Evictions Rather than Relying on Cascade (#124440)
This patch adjusts the mlregalloc-max-cascade flag (renaming it to
mlregalloc-max-eviction-count) to actually count evictions rather than
just looking at the cascade number. The cascade number is not very
representative of how many times a LR has been evicted, which can lead
to some problems in certain cases, where we might end up with many
eviction problems where we have now masked off all the interferences and
are forced to evict the candidate.

This is probably what I should've done in the first place. No test case
as this only shows up in quite large functions post ThinLTO and it would
be hard to construct something that would serve as a nice regression
test without being super brittle. I've tested this on the pathological
cases that we have come across so far and it works.

Fixes #122829
2025-01-27 15:23:37 -08:00
David Green
5a81a559d6
[GISel] Explicitly disable BF16 tablegen patterns. (#124113)
We currently have an issue where bf16 patterns can be used to match fp16
types, as GISel does not know about the difference between the two. This
patch explicitly disables them to make sure that they are never used.

The opposite can also happen, where fp16 patterns are used for
operators that should be bf16. So this also changes any operations with
bf16 types to now cause a fallback to SDAG.

The pass setup for GISel has been slightly adjusted to make sure that a
verify pass does not get added between AMD-SDAG and SIFixSGPRCopiesPass,
which otherwise can cause verifier issues when falling back.
2025-01-27 22:21:12 +00:00
Craig Topper
c24e5f982e
[GlobalMerge] Fix inaccurate debug print. (#124377)
This message was not updated when MinSize was added.
2025-01-27 12:45:41 -08:00
Craig Topper
0cbb1d5673
[GlobalMerge] Use constructor to set all bits in BitVector. NFC (#124375)
The constructor has an optional bool for the starting value for each
bit. Use that instead of calling set().
2025-01-27 12:44:44 -08:00
Kazu Hirata
817e777296
[CodeGen] Avoid repeated hash lookups (NFC) (#124506) 2025-01-27 10:35:52 -08:00
Shubham Sandeep Rastogi
44c9e46fce
[InstrRef] Fix mismatch between LiveDebugValues and salvageCopySSA (#124233)
The LiveDebugValues pass and the instruction selector (which calls
salvageCopySSA) need to be consistent on what they consider a copy
instruction. With https://github.com/llvm/llvm-project/pull/75184, the
definition of what a copy instruction is was narrowed for AArch64 to
exclude a w->x ORR and treat it as a zero-extend rather than a copy

However, to make sure LiveDebugValues still treats a w->x ORR as a copy,
the new function, isCopyLikeInstr was created. We need to make sure that
salvageCopySSA also calls that function.

This patch addresses this mismatch.
2025-01-27 09:26:22 -08:00
Jeremy Morse
81d18ad864
[NFC][DebugInfo] Make some block-start-position methods return iterators (#124287)
As part of the "RemoveDIs" work to eliminate debug intrinsics, we're
replacing methods that use Instruction*'s as positions with iterators. A
number of these (such as getFirstNonPHIOrDbg) are sufficiently
infrequently used that we can just replace the pointer-returning version
with an iterator-returning version, hopefully without much/any
disruption.

Thus this patch has getFirstNonPHIOrDbg and
getFirstNonPHIOrDbgOrLifetime return an iterator, and updates all
call-sites. There are no concerns about the iterators returned being
converted to Instruction*'s and losing the debug-info bit: because the
methods skip debug intrinsics, the iterator head bit is always false
anyway.
2025-01-27 16:27:54 +00:00
Michael Maitland
559287575b [GlobalMerge][NFC] Reland "Skip sorting by profitability when it is not needed"
Relands #124146 but without changes to the sorting algorithm and the following
reverse.
2025-01-27 07:28:47 -08:00
Jeremy Morse
e14962a39c
[NFC][DebugInfo] Use iterators for instruction insertion in more places (#124291)
As part of the "RemoveDIs" work to eliminate debug intrinsics, we're
replacing methods that use Instruction*'s as positions with iterators.
This patch changes some more complex call-sites, those crossing file
boundaries and where I've had to perform some minor rewrites.
2025-01-27 15:25:17 +00:00
Alexey Bader
e278e1b6ec
[NFC][CodeGen] Fix typos in code comments. (#124382)
This fixes typos in `calcUniqueIDUpdateFlagsAndSize` function.
2025-01-26 13:58:58 -08:00
Kazu Hirata
850852e9a4
[CodeGen] Avoid repeated hash lookups (NFC) (#124455) 2025-01-26 01:35:39 -08:00
Craig Topper
37fdde6025 [CodeGen] Remove implicit conversions from Register to unsigned from MachineOperand. NFC 2025-01-25 23:12:14 -08:00
Craig Topper
4bcd8184a0
[TargetLowering] Pull similar code out of the forceExpandWideMUL into a helper. NFC (#124371)
These functions have similar code. One of them calculates the 2x width
full product from 2 sources. The other calculates the product from 2
sources that have low and high halves.

This patch introduces a new function that takes HiLHS and HiRHS as
optional values. If they are not null, they will be used in the
calculation of the Hi half. The Signed flag can only be set when
HiLHS/HiRHS are null.
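
To make the arithmetic concrete, here is a sketch on plain 64-bit integers rather than SelectionDAG nodes. The names `Wide` and `mulWide` are hypothetical, only the unsigned case is shown, and the way the optional high halves are folded in (their cross products are added to the Hi half, wrapping) is an assumption about what the helper computes.

```cpp
#include <cstdint>

struct Wide {
  uint64_t Lo;
  uint64_t Hi;
};

// Unsigned 64x64 -> 128-bit product built from 32-bit pieces.
// If both HiLHS and HiRHS are non-null, the operands are treated as
// 128-bit values (HiLHS:LHS) and (HiRHS:RHS), and the result is the low
// 128 bits of their product.
Wide mulWide(uint64_t LHS, uint64_t RHS, const uint64_t *HiLHS = nullptr,
             const uint64_t *HiRHS = nullptr) {
  const uint64_t Mask = 0xffffffffu;
  uint64_t LL = LHS & Mask, LH = LHS >> 32;
  uint64_t RL = RHS & Mask, RH = RHS >> 32;

  uint64_t LoLo = LL * RL; // contributes to bits 0..63
  uint64_t LoHi = LL * RH; // contributes to bits 32..95
  uint64_t HiLo = LH * RL; // contributes to bits 32..95
  uint64_t HiHi = LH * RH; // contributes to bits 64..127

  // Sum the pieces that land in bits 32..63; the overflow of this sum
  // is the carry into the Hi half.
  uint64_t Mid = (LoLo >> 32) + (LoHi & Mask) + (HiLo & Mask);

  Wide R;
  R.Lo = (Mid << 32) | (LoLo & Mask);
  R.Hi = HiHi + (LoHi >> 32) + (HiLo >> 32) + (Mid >> 32);

  // Optional high halves only affect the Hi half of the truncated result.
  if (HiLHS && HiRHS)
    R.Hi += LHS * *HiRHS + *HiLHS * RHS;
  return R;
}
```

For instance, calling `mulWide` with both operands equal to 2^32 yields `Lo == 0` and `Hi == 1`, since the product is exactly 2^64.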
2025-01-25 10:53:01 -08:00
James Y Knight
9325a61aa0
Revert "[GlobalMerge][NFC] Skip sorting by profitability when it is not needed" (#124411)
Reverts llvm/llvm-project#124146 -- the new comparator is not a strict weak
ordering, as required by stable_sort.
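
For readers unfamiliar with the requirement, the sketch below shows one common way a comparator fails it. The comparator from #124146 is not reproduced here; `Candidate`, `notStrictWeak`, `bySavingsDesc`, and `sortBySavings` are hypothetical names used only for illustration.

```cpp
#include <algorithm>
#include <vector>

struct Candidate {
  unsigned Savings;
  unsigned Id;
};

// Not a strict weak ordering: for equal elements it returns true, so
// Comp(A, A) is true and irreflexivity is violated. Passing this to
// std::stable_sort is undefined behaviour.
inline bool notStrictWeak(const Candidate &A, const Candidate &B) {
  return A.Savings >= B.Savings;
}

// A valid strict weak ordering: strictly-greater, so Comp(A, A) is false.
inline bool bySavingsDesc(const Candidate &A, const Candidate &B) {
  return A.Savings > B.Savings;
}

// Sorts by descending savings; ties keep their original order.
std::vector<Candidate> sortBySavings(std::vector<Candidate> V) {
  std::stable_sort(V.begin(), V.end(), bySavingsDesc);
  return V;
}
```

A quick sanity check for any comparator is that `Comp(X, X)` must be false for every element.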

Co-authored-by: Michael Maitland <michaeltmaitland@gmail.com>
2025-01-25 10:16:37 -05:00
Kazu Hirata
72918fd11d
[GlobalISel] Avoid repeated hash lookups (NFC) (#124393) 2025-01-25 01:17:38 -08:00
Kazu Hirata
0cc74a8941
[CodeGen] Avoid repeated hash lookups (NFC) (#124392) 2025-01-25 01:17:22 -08:00
Craig Topper
ac1ba1f9dd
[CodeGen] Introduce a VirtRegOrUnit class to hold virtual reg or physical reg unit. NFC (#123768)
LiveIntervals and MachineVerifier were previously using Register to
store this, but reg units are different than physical registers. One
important difference is that 0 is a valid reg unit number, but it is not
a valid physical register.

This patch introduces a new VirtRegOrUnit class that is distinct from
Register. It can be converted to/from a virtual Register or a
MCRegUnit. I've made all conversions explicit and used assertions to
check the validity.

I also fixed a place in MachineVerifier that was ignoring reg unit 0.
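
The idea of a distinct type with explicit, asserted conversions can be sketched as follows. This is not the class added by the patch: the name `VirtRegOrUnitSketch`, its method names, and the encoding (virtual registers carry the top bit, reg units do not) are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdint>

class VirtRegOrUnitSketch {
  uint32_t Value;
  // Assumed encoding: virtual register numbers have the top bit set.
  static constexpr uint32_t VirtualFlag = 1u << 31;

  explicit VirtRegOrUnitSketch(uint32_t V) : Value(V) {}

public:
  // Explicit construction from a reg unit; 0 is a valid unit number.
  static VirtRegOrUnitSketch fromRegUnit(uint32_t Unit) {
    assert(!(Unit & VirtualFlag) && "not a reg unit");
    return VirtRegOrUnitSketch(Unit);
  }

  // Explicit construction from a virtual register number.
  static VirtRegOrUnitSketch fromVirtReg(uint32_t Reg) {
    assert((Reg & VirtualFlag) && "not a virtual register");
    return VirtRegOrUnitSketch(Reg);
  }

  bool isVirtualReg() const { return (Value & VirtualFlag) != 0; }

  // Conversions back out are explicit and check the stored kind.
  uint32_t asRegUnit() const {
    assert(!isVirtualReg() && "holds a virtual register");
    return Value;
  }
  uint32_t asVirtualReg() const {
    assert(isVirtualReg() && "holds a reg unit");
    return Value;
  }
};
```

Because there is no implicit conversion and no "zero means invalid" test, code handling reg unit 0 cannot accidentally treat it as a missing register.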
2025-01-24 18:30:28 -08:00
Stephen Long
ab976a1712
PreISelIntrinsicLowering: Lower llvm.exp/llvm.exp2 to a loop if scalable vec arg (#117568) 2025-01-24 14:02:06 -05:00
Jeffrey Byrnes
6c11b7e689
[CodeGen] NFC: Change order of checks in MachineInstr->isDead() (#124207)
[Change-Id: Ic349022bb99ef91f5396e462ade0366bc772ae02](https://github.com/llvm/llvm-project/pull/123531)
moved isDead() from DeadMachineInstrElim to MachineInstr. In the
process of moving, I reordered the checks to improve chances of early
exit, but this has caused a slight increase in compile time.

This PR reverts back to the original order of checks.
2025-01-24 07:23:22 -08:00
Emma Pilkington
f2b253b961
[SelectionDAG] Fix an incorrect DebugLoc on a COPY (#122963)
Fixes: SWDEV-502134
2025-01-24 09:28:27 -05:00
Michael Maitland
e5e55c04d6
[GlobalMerge][NFC] Skip sorting by profitability when it is not needed (#124146)
We were previously sorting by profitability even if we were choosing to
merge all globals together, which is not impacted by UsedGlobalSet
order.

We can also remove iteration of UsedGlobalSets in reverse order in both
cases. In the first case, the order does not matter. In the second case,
we just sort by the order we need instead of sorting in the opposite
direction and calling reverse.

This change should only be an improvement on compile time. I have not
measured it, but I think it would never make things worse.
2025-01-24 09:08:34 -05:00
Jeremy Morse
6292a808b3
[NFC][DebugInfo] Use iterator-flavour getFirstNonPHI at many call-sites (#123737)
As part of the "RemoveDIs" project, BasicBlock::iterator now carries a
debug-info bit that's needed when getFirstNonPHI and similar feed into
instruction insertion positions. Call-sites where that's necessary were
updated a year ago; however, to ensure some type safety, we'd like to
have all calls to getFirstNonPHI use the iterator-returning version.

This patch changes a bunch of call-sites calling getFirstNonPHI to use
getFirstNonPHIIt, which returns an iterator. All these call sites are
where it's obviously safe to fetch the iterator then dereference it. A
follow-up patch will contain less-obviously-safe changes.

We'll eventually deprecate and remove the instruction-pointer
getFirstNonPHI, but not before adding concise documentation of what
considerations are needed (very few).

---------

Co-authored-by: Stephen Tozer <Melamoto@gmail.com>
2025-01-24 13:27:56 +00:00
Petar Avramovic
b60c118f53
MachineUniformityAnalysis: Improve isConstantOrUndefValuePhi (#112866)
Change existing code for G_PHI to match what LLVM-IR version is doing
via PHINode::hasConstantOrUndefValue. This is not safe for regular PHI
since it may appear with an undef operand and getVRegDef can fail.
Most notably this improves number of values that can be allocated
to sgpr in AMDGPURegBankSelect.
Common case here are phis that appear in structurize-cfg lowering
for cycles with multiple exits:
Undef incoming value is coming from block that reached cycle exit
condition, if other incoming is uniform keep the phi uniform despite
the fact it is joining values from pair of blocks that are entered
via divergent condition branch.
2025-01-24 12:43:40 +01:00
Petar Avramovic
0ee037b861
AMDGPU/GlobalISel: AMDGPURegBankLegalize (#112864)
Lower G_ instructions that can't be inst-selected with register bank
assignment from AMDGPURegBankSelect based on uniformity analysis.
- Lower instruction to perform it on assigned register bank
- Put uniform value in vgpr because SALU instruction is not available
- Execute divergent instruction in SALU - "waterfall loop"

Given LLTs on all operands after legalizer, some register bank
assignments require lowering while others do not.
Note: cases where all register bank assignments would require lowering
are lowered in legalizer.

AMDGPURegBankLegalize goals:
- Define Rules: when and how to perform lowering
- Goal of defining Rules is to provide high level table-like brief
  overview of how to lower generic instructions based on available
  target features and uniformity info (uniform vs divergent).
- Fast search of Rules, depends on how complicated Rule.Predicate is
- For some opcodes there would be too many Rules that are essentially
  all the same just for different combinations of types and banks.
  Write custom function that handles all cases.
- Rules are made from enum IDs that correspond to each operand.
  Names of IDs are meant to give brief description what lowering does
  for each operand or the whole instruction.
- AMDGPURegBankLegalizeHelper implements lowering algorithms

Since this is the first patch that actually enables -new-reg-bank-select
here is the summary of regression tests that were added earlier:
- if instruction is uniform always select SALU instruction if available
- eliminate back to back vgpr to sgpr to vgpr copies of uniform values
- fast rules: small differences for standard and vector instruction
- enabling Rule based on target feature - salu_float
- how to specify lowering algorithm - vgpr S64 AND to S32
- on G_TRUNC in reg, it is up to user to deal with truncated bits
  G_TRUNC in reg is treated as no-op.
- dealing with truncated high bits - ABS S16 to S32
- sgpr S1 phi lowering
- new opcodes for vcc-to-scc and scc-to-vcc copies
- lowering for vgprS1-to-vcc copy (formally this is vgpr-to-vcc G_TRUNC)
- S1 zext and sext lowering to select
- uniform and divergent S1 AND(OR and XOR) lowering - inst-selected into
  SALU instruction
- divergent phi with uniform inputs
- divergent instruction with temporal divergent use, source instruction
  is defined as uniform(AMDGPURegBankSelect) - missing temporal
  divergence lowering
- uniform phi, because of undef incoming, is assigned to vgpr. Will be
  fixed in AMDGPURegBankSelect via another fix in machine uniformity
  analysis.
2025-01-24 12:12:45 +01:00
Jeremy Morse
8e70273509
[NFC][DebugInfo] Use iterator moveBefore at many call-sites (#123583)
As part of the "RemoveDIs" project, BasicBlock::iterator now carries a
debug-info bit that's needed when getFirstNonPHI and similar feed into
instruction insertion positions. Call-sites where that's necessary were
updated a year ago; however, to ensure some type safety, we'd like to
have all calls to moveBefore use iterators.

This patch adds a (guaranteed dereferenceable) iterator-taking
moveBefore, and changes a bunch of call-sites where it's obviously safe
to change to use it by just calling getIterator() on an instruction
pointer. A follow-up patch will contain less-obviously-safe changes.

We'll eventually deprecate and remove the instruction-pointer
insertBefore, but not before adding concise documentation of what
considerations are needed (very few).
2025-01-24 10:53:11 +00:00