37137 Commits

Author SHA1 Message Date
Yingwei Zheng
40ca089d99 [CodeGenPrepare] Replace deleted ext instr with the promoted value. (#71058)
This PR replaces the deleted ext with the promoted value in `AddrMode`.
Fixes #70938.

(cherry picked from commit 3c6aa04cf4dee65113e2a780b9f90b36bb4c4e04)
2025-01-31 17:49:18 -08:00
David Green
b23297a7f1 [GlobalISel] Do not run verifier after ResetMachineFunctionPass (#124799)
After we fall back from GlobalISel to SDAG, the verifier gets called,
which calls getReservedRegs which uses SIMachineFunctionInfo::usesAGPRs
which caches the result of UsesAGPRs. Because we have just fallen-back
the function is empty and it incorrectly gets cached to false. This
patch makes sure we don't try to run the verifier whilst the function is
empty.

(cherry picked from commit 66e0498dafbfa7f8fd7deaa88ae62bdf38a12113)
2025-01-31 16:38:42 -08:00
Kazu Hirata
1d5ce614a7
[CodeGen] Avoid repeated hash lookups (NFC) (#124677) 2025-01-28 10:57:29 -08:00
Stephen Tozer
22687aa97b
[CodeGen] Correctly handle non-standard cases in RemoveLoadsIntoFakeUses (#111551)
In the RemoveLoadsIntoFakeUses pass, we try to remove loads that are
only used by fake uses, as well as the fake use in question. There are
two existing errors with the pass however: it incorrectly examines every
operand of each FAKE_USE, when only the first is relevant (extra
operands will just be "killed" regs assigned by a previous pass), and it
ignores cases where the FAKE_USE register is not an exact match for the
loaded register, which is incorrect as regalloc may choose to load a
wider value than the FAKE_USE required pre-regalloc. This patch fixes
both of these cases.
2025-01-28 13:59:41 +00:00
Renat Idrisov
11db7fb09b
[GlobalISel] Catching inconsistencies in load memory, result, and range metadata type (#121247)
This is a fix for:
https://github.com/llvm/llvm-project/issues/97290
Please let me know if that is the right way to address the issue. Thank
you!

---------

Co-authored-by: Renat Idrisov <parsifal-47@users.noreply.github.com>
Co-authored-by: Matt Arsenault <arsenm2@gmail.com>
2025-01-28 20:54:34 +07:00
abhishek-kaushik22
015aed18ee
[SelectionDAG] WidenVecOp_INSERT_SUBVECTOR - Replace INSERT_SUBVECTOR with series of INSERT_VECTOR_ELT (#124420)
If the operands to `INSERT_SUBVECTOR` can't be widened legally, just
replace the `INSERT_SUBVECTOR` with a series of `INSERT_VECTOR_ELT`.

Closes #124255 (and possibly #102016)
2025-01-28 18:54:49 +05:30
Pierre van Houtryve
8ea018ce1d
[DAGISel] Fix MMRA Handling in copyExtraInfo (#124730)
#78569 did not implement this correctly and an edge case breaks it by
triggering `Assertion `!Leafs.empty()' failed.`

Fixes SWDEV-507698
2025-01-28 13:27:26 +01:00
Akshat Oke
7cd6f85578
[CodeGen][NFC] Format RegisterCoalescer sources (#124697) 2025-01-28 15:49:21 +05:30
Aiden Grossman
00f692b94f Reland "[MLGO] Count LR Evictions Rather than Relying on Cascade (#124440)"
This reverts commit aa65f93b71dee8cacb22be1957673c8be6a3ec24.

This relands commit 8cc83b66e20e72cdb3bb5fbd549c941797b0e0c9.

It looks like this was a transitive include issue.
2025-01-28 07:09:25 +00:00
Craig Topper
d839e765f0 [TargetLowering] Inline the only caller of one of the forceExpandWideMUL functions. NFC
This caller does not need the libcall portion so it can directly
call forceExpandMultiply.
2025-01-27 17:10:37 -08:00
Aiden Grossman
aa65f93b71 Revert "[MLGO] Count LR Evictions Rather than Relying on Cascade (#124440)"
This reverts commit 8cc83b66e20e72cdb3bb5fbd549c941797b0e0c9.

This was causing builbot failures.
https://lab.llvm.org/buildbot/#/builders/90/builds/4198
https://lab.llvm.org/buildbot/#/builders/110/builds/3616
2025-01-28 00:22:25 +00:00
mingmingl
934532d8b1 remove unused var after refactoring 2025-01-27 15:47:32 -08:00
Mingming Liu
e98b2028c7
[NFCI]Refactor AsmPrinter around jump table emission (#124645)
Add method `AsmPrinter::emitJumpTableImpl`. It takes an array-ref of jump table indices.

This splits refactor of PR https://github.com/llvm/llvm-project/pull/122215
2025-01-27 15:29:38 -08:00
Aiden Grossman
8cc83b66e2
[MLGO] Count LR Evictions Rather than Relying on Cascade (#124440)
This patch adjusts the mlregalloc-max-cascade flag (renaming it to
mlregalloc-max-eviction-count) to actually count evictions rather than
just looking at the cascade number. The cascade number is not very
representative of how many times a LR has been evicted, which can lead
to some problems in certain cases, where we might end up with many
eviction problems where we have now masked off all the interferences and
are forced to evict the candidate.

This is probably what I should've done in the first place. No test case
as this only shows up in quite large functions post ThinLTO and it would
be hard to construct something that would serve as a nice regression
test without being super brittle. I've tested this on the pathological
cases that we have come across so far and it works.

Fixes #122829
2025-01-27 15:23:37 -08:00
David Green
5a81a559d6
[GISel] Explicitly disable BF16 tablegen patterns. (#124113)
We currently have an issue where bf16 patters can be used to match fp16
types, as GISel does not know about the difference between the two. This
patch explicitly disables them to make sure that they are never used.

The opposite can also happen too, where fp16 patterns are used for
operators that should be bf16. So this also changes any operations with
bf16 types to now cause a fallback to SDAG.

The pass setup for GISel has been slightly adjusted to make sure that a
verify pass does not get added between AMD-SDAG and SIFixSGPRCopiesPass,
which otherwise can cause verifier issues when falling back.
2025-01-27 22:21:12 +00:00
Craig Topper
c24e5f982e
[GlobalMerge] Fix inaccurate debug print. (#124377)
This message was not updated when MinSize was added.
2025-01-27 12:45:41 -08:00
Craig Topper
0cbb1d5673
[GlobalMerge] Use constructor to set all bits in BitVector. NFC (#124375)
The constructor has an optional bool for the starting value for each
bit. Use that instead of calling set().
2025-01-27 12:44:44 -08:00
Kazu Hirata
817e777296
[CodeGen] Avoid repeated hash lookups (NFC) (#124506) 2025-01-27 10:35:52 -08:00
Shubham Sandeep Rastogi
44c9e46fce
[InstrRef] Fix mismatch between LiveDebugValues and salvageCopySSA (#124233)
The LiveDebugValues pass and the instruction selector (which calls
salvageCopySSA) need to be consistent on what they consider a copy
instruction. With https://github.com/llvm/llvm-project/pull/75184, the
definition of what a copy instruction is was narrowed for AArch64 to
exclude a w->x ORR and treat it as a zero-extend rather than a copy

However, to make sure LiveDebugValues still treats a w->x ORR as a copy,
the new function, isCopyLikeInstr was created. We need to make sure that
salvageCopySSA also calls that function.

This patch addresses this mismatch.
2025-01-27 09:26:22 -08:00
Jeremy Morse
81d18ad864
[NFC][DebugInfo] Make some block-start-position methods return iterators (#124287)
As part of the "RemoveDIs" work to eliminate debug intrinsics, we're
replacing methods that use Instruction*'s as positions with iterators. A
number of these (such as getFirstNonPHIOrDbg) are sufficiently
infrequently used that we can just replace the pointer-returning version
with an iterator-returning version, hopefully without much/any
disruption.

Thus this patch has getFirstNonPHIOrDbg and
getFirstNonPHIOrDbgOrLifetime return an iterator, and updates all
call-sites. There are no concerns about the iterators returned being
converted to Instruction*'s and losing the debug-info bit: because the
methods skip debug intrinsics, the iterator head bit is always false
anyway.
2025-01-27 16:27:54 +00:00
Michael Maitland
559287575b [GlobalMerge][NFC] Reland "Skip sorting by profitability when it is not needed"
Relands #124146 but without changes to the sorting algorithm and the following
reverse.
2025-01-27 07:28:47 -08:00
Jeremy Morse
e14962a39c
[NFC][DebugInfo] Use iterators for instruction insertion in more places (#124291)
As part of the "RemoveDIs" work to eliminate debug intrinsics, we're
replacing methods that use Instruction*'s as positions with iterators.
This patch changes some more complex call-sites, those crossing file
boundaries and where I've had to perform some minor rewrites.
2025-01-27 15:25:17 +00:00
Alexey Bader
e278e1b6ec
[NFC][CodeGen] Fix typos in code comments. (#124382)
This fixes typos in `calcUniqueIDUpdateFlagsAndSize` function.
2025-01-26 13:58:58 -08:00
Kazu Hirata
850852e9a4
[CodeGen] Avoid repeated hash lookups (NFC) (#124455) 2025-01-26 01:35:39 -08:00
Craig Topper
37fdde6025 [CodeGen] Remove implict conversions from Register to unsigned from MachineOperand. NFC 2025-01-25 23:12:14 -08:00
Craig Topper
4bcd8184a0
[TargetLowering] Pull similar code out of the forceExpandWideMUL into a helper. NFC (#124371)
These functions have similar code. One of them calculates the 2x width
full product from 2 sources. The other calculates the product from 2
sources that have low and high halves.

This patch introduces a new function that takes HiLHS and HiRHS as
optional values. If they are not null, they will be used in the
calculation of the Hi half. The Signed flag can only be set when
HiLHS/HiRHS are null.
2025-01-25 10:53:01 -08:00
James Y Knight
9325a61aa0
Revert "[GlobalMerge][NFC] Skip sorting by profitability when it is not needed" (#124411)
Reverts llvm/llvm-project#124146 -- new comparator is not a strict-weak
as required by stable_sort.

Co-authored-by: Michael Maitland <michaeltmaitland@gmail.com>
2025-01-25 10:16:37 -05:00
Kazu Hirata
72918fd11d
[GlobalISel] Avoid repeated hash lookups (NFC) (#124393) 2025-01-25 01:17:38 -08:00
Kazu Hirata
0cc74a8941
[CodeGen] Avoid repeated hash lookups (NFC) (#124392) 2025-01-25 01:17:22 -08:00
Craig Topper
ac1ba1f9dd
[CodeGen] Introduce a VirtRegOrUnit class to hold virtual reg or physical reg unit. NFC (#123768)
LiveIntervals and MachineVerifier were previously using Register to
store this, but reg units are different than physical registers. One
important difference is that 0 is a valid reg unit number, but it is not
a valid phyiscal register.

This patch introduces a new VirtRegOrUnit class that is distinct from
Register. It can be be converted to/from a virtual Register or a
MCRegUnit. I've made all conversions explicit and used assertions to
check the validity.

I also fixed a place in MachineVerifier that was ignoring reg unit 0.
2025-01-24 18:30:28 -08:00
Stephen Long
ab976a1712
PreISelIntrinsicLowering: Lower llvm.exp/llvm.exp2 to a loop if scalable vec arg (#117568) 2025-01-24 14:02:06 -05:00
Jeffrey Byrnes
6c11b7e689
[CodeGen] NFC: Change order of checks in MachineInstr->isDead() (#124207)
[[Change-Id:
Ic349022bb99ef91f5396e462ade0366bc772ae02](https://github.com/llvm/llvm-project/pull/123531)](https://github.com/llvm/llvm-project/pull/123531)
moved isDead() from DeadMachineInstrElim to MachineInstr . In the
process of moving, I reordered the checks to improve chances of early
exit, but this has caused a slight increase in compile time.

This PR reverts back to the original order of checks.
2025-01-24 07:23:22 -08:00
Emma Pilkington
f2b253b961
[SelectionDAG] Fix an incorrect DebugLoc on a COPY (#122963)
Fixes: SWDEV-502134
2025-01-24 09:28:27 -05:00
Michael Maitland
e5e55c04d6
[GlobalMerge][NFC] Skip sorting by profitability when it is not needed (#124146)
We were previously sorting by profitability even if we were choosing to
merge all globals together, which is not impacted by UsedGlobalSet
order.

We can also remove iteration of UsedGlobalSets in reverse order in both
cases. In the first csae, the order does not matter. In the second case,
we just sort by the order we need instead of sorting in the opposite
direction and calling reverse.

This change should only be an improvement on compile time. I have not
measured it, but I think it would never make things worse.
2025-01-24 09:08:34 -05:00
Jeremy Morse
6292a808b3
[NFC][DebugInfo] Use iterator-flavour getFirstNonPHI at many call-sites (#123737)
As part of the "RemoveDIs" project, BasicBlock::iterator now carries a
debug-info bit that's needed when getFirstNonPHI and similar feed into
instruction insertion positions. Call-sites where that's necessary were
updated a year ago; but to ensure some type safety however, we'd like to
have all calls to getFirstNonPHI use the iterator-returning version.

This patch changes a bunch of call-sites calling getFirstNonPHI to use
getFirstNonPHIIt, which returns an iterator. All these call sites are
where it's obviously safe to fetch the iterator then dereference it. A
follow-up patch will contain less-obviously-safe changes.

We'll eventually deprecate and remove the instruction-pointer
getFirstNonPHI, but not before adding concise documentation of what
considerations are needed (very few).

---------

Co-authored-by: Stephen Tozer <Melamoto@gmail.com>
2025-01-24 13:27:56 +00:00
Petar Avramovic
b60c118f53
MachineUniformityAnalysis: Improve isConstantOrUndefValuePhi (#112866)
Change existing code for G_PHI to match what LLVM-IR version is doing
via PHINode::hasConstantOrUndefValue. This is not safe for regular PHI
since it may appear with an undef operand and getVRegDef can fail.
Most notably this improves number of values that can be allocated
to sgpr in AMDGPURegBankSelect.
Common case here are phis that appear in structurize-cfg lowering
for cycles with multiple exits:
Undef incoming value is coming from block that reached cycle exit
condition, if other incoming is uniform keep the phi uniform despite
the fact it is joining values from pair of blocks that are entered
via divergent condition branch.
2025-01-24 12:43:40 +01:00
Petar Avramovic
0ee037b861
AMDGPU/GlobalISel: AMDGPURegBankLegalize (#112864)
Lower G_ instructions that can't be inst-selected with register bank
assignment from AMDGPURegBankSelect based on uniformity analysis.
- Lower instruction to perform it on assigned register bank
- Put uniform value in vgpr because SALU instruction is not available
- Execute divergent instruction in SALU - "waterfall loop"

Given LLTs on all operands after legalizer, some register bank
assignments require lowering while other do not.
Note: cases where all register bank assignments would require lowering
are lowered in legalizer.

AMDGPURegBankLegalize goals:
- Define Rules: when and how to perform lowering
- Goal of defining Rules it to provide high level table-like brief
  overview of how to lower generic instructions based on available
  target features and uniformity info (uniform vs divergent).
- Fast search of Rules, depends on how complicated Rule.Predicate is
- For some opcodes there would be too many Rules that are essentially
  all the same just for different combinations of types and banks.
  Write custom function that handles all cases.
- Rules are made from enum IDs that correspond to each operand.
  Names of IDs are meant to give brief description what lowering does
  for each operand or the whole instruction.
- AMDGPURegBankLegalizeHelper implements lowering algorithms

Since this is the first patch that actually enables -new-reg-bank-select
here is the summary of regression tests that were added earlier:
- if instruction is uniform always select SALU instruction if available
- eliminate back to back vgpr to sgpr to vgpr copies of uniform values
- fast rules: small differences for standard and vector instruction
- enabling Rule based on target feature - salu_float
- how to specify lowering algorithm - vgpr S64 AND to S32
- on G_TRUNC in reg, it is up to user to deal with truncated bits
  G_TRUNC in reg is treated as no-op.
- dealing with truncated high bits - ABS S16 to S32
- sgpr S1 phi lowering
- new opcodes for vcc-to-scc and scc-to-vcc copies
- lowering for vgprS1-to-vcc copy (formally this is vgpr-to-vcc G_TRUNC)
- S1 zext and sext lowering to select
- uniform and divergent S1 AND(OR and XOR) lowering - inst-selected into
  SALU instruction
- divergent phi with uniform inputs
- divergent instruction with temporal divergent use, source instruction
  is defined as uniform(AMDGPURegBankSelect) - missing temporal
  divergence lowering
- uniform phi, because of undef incoming, is assigned to vgpr. Will be
  fixed in AMDGPURegBankSelect via another fix in machine uniformity
  analysis.
2025-01-24 12:12:45 +01:00
Jeremy Morse
8e70273509
[NFC][DebugInfo] Use iterator moveBefore at many call-sites (#123583)
As part of the "RemoveDIs" project, BasicBlock::iterator now carries a
debug-info bit that's needed when getFirstNonPHI and similar feed into
instruction insertion positions. Call-sites where that's necessary were
updated a year ago; but to ensure some type safety however, we'd like to
have all calls to moveBefore use iterators.

This patch adds a (guaranteed dereferenceable) iterator-taking
moveBefore, and changes a bunch of call-sites where it's obviously safe
to change to use it by just calling getIterator() on an instruction
pointer. A follow-up patch will contain less-obviously-safe changes.

We'll eventually deprecate and remove the instruction-pointer
insertBefore, but not before adding concise documentation of what
considerations are needed (very few).
2025-01-24 10:53:11 +00:00
Akshat Oke
a9c61e0d76
[NewPM] LiveIntervals: Check dependencies for invalidation (#123563) 2025-01-24 11:23:46 +05:30
Kazu Hirata
9fecb4f907 [CodeGen] Fix a warning
This patch fixes:

  llvm/lib/CodeGen/MachineSink.cpp:1667:22: error: unused variable
  'Preheader' [-Werror,-Wunused-variable]
2025-01-23 19:37:28 -08:00
Matt Arsenault
0ef39a882b
MachineCSE: Remove check for subreg on a def operand (#124095)
There are no subregister defs in SSA.
2025-01-24 09:35:30 +07:00
Jeffrey Byrnes
acb7859f07
[MachineSink] Extend loop sinking capability (#117247)
The current MIR cycle sinking capabilities are rather limited. It only
support sinking copies into a single successor block while obeying
limits.

This opt-in feature adds a more aggressive option, that is not limited
to the above concerns. The feature will try to "sink" by duplicating any
top-level preheader instruction (that we are sure is safe to sink) into
any user block, then does some dead code cleanup. In particular, this is
useful for high RP situations when loop bodies have control flow.
2025-01-23 17:08:23 -08:00
Min-Yih Hsu
bc74a1edbe
[IA] Generalize the support for power-of-two (de)interleave intrinsics (#123863)
Previously, AArch64 used pattern matching to support
llvm.vector.(de)interleave of 2 and 4; RISC-V only supported
(de)interleave of 2.

This patch consolidates the logics in these two targets by factoring out
the common factor calculations into the InterleaveAccess Pass.
2025-01-23 15:27:51 -08:00
Jeffrey Byrnes
f2942b9077
[CodeGen] NFC: Move isDead to MachineInstr (#123531)
Provide isDead interface for access to ad-hoc isDead queries.
LivePhysRegs is optional: if not provided, pessimistically check
deadness of a single MI without doing the LivePhysReg walk; if provided
it is assumed to be at the position of MI.
2025-01-23 12:54:29 -08:00
Craig Topper
e30a4fc3e2
[TargetLowering] Improve one signature of forceExpandWideMUL. (#123991)
We have two forceExpandWideMUL functions. One takes the low and high
half of 2 inputs and calculates the low and high half of their product.
This does not calculate the full 2x width product.

The other signature takes 2 inputs and calculates the low and high half
of their full 2x width product. Previously it did this by sign/zero
extending the inputs to create the high bits and then calling the other
function.

We can instead copy the algorithm from the other function and use the
Signed flag to determine whether we should do SRA or SRL. This avoids
the need to multiply the high part of the inputs and add them to the
high half of the result. This improves the generated code for signed
multiplication.

This should improve the performance of #123262. I don't know yet how
close we will get to gcc.
2025-01-23 12:49:35 -08:00
Florian Hahn
0d0190815d
[TailDup] Allow large number of predecessors/successors without phis. (#116072)
This adjusts the threshold logic added in #78582 to only trigger for
cases where there are actually phis to duplicate in either TailBB or in
one of the successors.

In cases there are no phis, we only have to pay the cost of extra edges,
but have no explosion in PHI related instructions.

This improves performance of Python on some inputs by 2-3% on Apple
Silicon CPUs.

PR: https://github.com/llvm/llvm-project/pull/116072
2025-01-23 18:24:20 +00:00
Kazu Hirata
bb019dd165
[CodeGen] Avoid repeated hash lookups (NFC) (#124078) 2025-01-23 08:46:19 -08:00
Michael Maitland
7db4ba3916
[GlobalMerge][NFC] Fix inaccurate comments (#124136)
I was studying the code here and realized that the comments were talking
about grouping by basic blocks when the code was grouping by Function.
Fix the comments so they reflect what the code is actually doing.
2025-01-23 11:36:53 -05:00
Matt Arsenault
fb3fa41aee MachineRegisterInfo: Use variable for TRI 2025-01-23 20:29:25 +07:00
Jeremy Morse
cb714e74cc
[DebugInfo][InstrRef] Avoid producing broken DW_OP_deref_sizes (#123967)
We use variable locations such as DBG_VALUE $xmm0 as shorthand to refer
to "the low lane of $xmm0", and this is reflected in how DWARF is
interpreted too. However InstrRefBasedLDV tries to be smart and
interprets such a DBG_VALUE as a 128-bit reference. We then issue a
DW_OP_deref_size of 128 bits to the stack, which isn't permitted by
DWARF (it's larger than a pointer).

Solve this for now by not using DW_OP_deref_size if it would be illegal.
Instead we'll use DW_OP_deref, and the consumer will load the variable
type from the stack, which should be correct.

There's still a risk of imprecision when LLVM decides to use smaller or
larger value types than the source-variable type, which manifests as
too-little or too-much memory being read from the stack. However we
can't solve that without putting more type information in debug-info.

fixes #64093
2025-01-23 10:47:15 +00:00