37268 Commits

Author SHA1 Message Date
Fangrui Song
78bac7f0a6 [MC] Remove unneeded getMemtagAttr() 2025-02-23 10:56:59 -08:00
Fangrui Song
34387fc63b
[AsmPrinter] Simplify $local after D131429. NFC
setType is unneeded (and AsmPrinter tries not to modify symbols).
AsmPrinter. MCSA_ELF_TypeFunction is available on all
targets using getSymbolPreferLocal.

Pull Request: https://github.com/llvm/llvm-project/pull/128138
2025-02-23 10:19:27 -08:00
Craig Topper
228dbd254a [RegAllocGreedy] Use MCRegister instead of Register for functions that return a physical register.
The callers of these functions return the value as an MCRegister
so this removes some casts from Register to MCRegister.
2025-02-22 21:39:25 -08:00
Craig Topper
0bd66c4194 [RegAllocGreedy] Remove unnecessary conversion from MCRegister to Register. NFC 2025-02-22 16:20:19 -08:00
Craig Topper
6fe780ce63 [RegAllocGreedy] Use Register() instead of 0 for invalid Register. NFC 2025-02-22 16:20:19 -08:00
Kazu Hirata
34172bba11
[CodeGen] Avoid repeated hash lookups (NFC) (#128300) 2025-02-22 02:24:35 -08:00
Craig Topper
444859634f
[MachineVerifier] Use Register instead of unsigned for DenseSet key. NFC (#128234) 2025-02-21 21:25:21 -08:00
Yingwei Zheng
646e4f2eed
[DAGCombiner] visitFREEZE: Early exit when N is deleted (#128161)
`N` may get merged with existing nodes inside the loop. Early exit when
it is deleted to avoid the crash.
Alternative solution: use `DAGNodeDeletedListener` to refresh the value
of N.

Closes https://github.com/llvm/llvm-project/issues/128143.
2025-02-22 12:06:34 +08:00
Matt Arsenault
1bb43068f1
PeepholeOpt: Allow introducing subregister uses on reg_sequence (#127052)
This reverts d246cc618adc52fdbd69d44a2a375c8af97b6106. We now handle
composing subregister extracts through reg_sequence.
2025-02-22 09:16:14 +07:00
Kazu Hirata
b044350701
[CodeGen] Avoid repeated hash lookups (NFC) (#128126)
This patch eliminates repeated hash lookups at three levels:

- RegToSlotIdx  of DenseMap
- Reloads       of DenseMap
- Reloads[MBB]  of SmallSet
2025-02-21 11:07:07 -08:00
Matt Arsenault
0c50054820 Revert "RegAlloc: Fix verifier error after failed allocation (#119690)"
This reverts commit 34167f99668ce4d4d6a1fb88453a8d5b56d16ed5.

Different set of verifier errors appears after other regalloc failure
tests with EXPENSIVE_CHECKS.
2025-02-22 00:23:21 +07:00
Matt Arsenault
34167f9966
RegAlloc: Fix verifier error after failed allocation (#119690)
In some cases after reporting an allocation failure, this would fail
the verifier. It picks the first allocatable register and assigns it,
but didn't update the liveness appropriately. When VirtRegRewriter
relied on the liveness to set kill flags, it would incorrectly add
kill flags if there was another overlapping kill of the virtual
register.

We can't properly assign the register to an overlapping range, so
break the liveness of the failing register (and any other interfering
registers) instead. Give the virtual register dummy liveness by
effectively deleting all the uses by setting them to undef.

The edge case not tested here which I'm worried about is if the read
of the register is a def of a subregister. I've been unable to come up
with a test where this occurs.

https://reviews.llvm.org/D122616
2025-02-21 22:11:51 +07:00
Craig Topper
af64f0a6c2
[FrameLowering] Use MCRegister instead of Register in CalleeSavedInfo. NFC (#128095)
Callee saved registers should always be phyiscal registers. They are
often passed directly to other functions that take MCRegister like
getMinimalPhysRegClass or TargetRegisterClass::contains.

Unfortunately, sometimes the MCRegister is compared to a Register which
gave an ambiguous comparison error when the MCRegister is on the LHS.
Adding a MCRegister==Register comparison operator created more ambiguous
comparison errors elsewhere. These cases were usually comparing against
a base or frame pointer register that is a physical register in a
Register. For those I added an explicit conversion of Register to
MCRegister to fix the error.
2025-02-20 23:44:05 -08:00
Christopher Di Bella
08c69b2ef6 Revert "[CodeGen] Remove static member function Register::isVirtualRegister. NFC (#127968)"
This reverts commit ff99af7ea03b3be46bec7203bd2b74048d29a52a.
2025-02-20 22:06:21 +00:00
Christopher Di Bella
309e3ca081 Revert "[CodeGen] Remove static member function Register::isPhysicalRegister. NFC"
This reverts commit 5fadb3d680909ab30b37eb559f80046b5a17045e.
2025-02-20 22:06:21 +00:00
Craig Topper
5fadb3d680 [CodeGen] Remove static member function Register::isPhysicalRegister. NFC
Prefer the nonstatic member by converting unsigned to Register instead.
2025-02-20 10:49:53 -08:00
Craig Topper
ff99af7ea0
[CodeGen] Remove static member function Register::isVirtualRegister. NFC (#127968)
Use nonstatic member instead. This requires explicit conversions, but
many will go away as we continue converting unsigned to Register.

In a few places where it was simple, I changed unsigned to Register.
2025-02-20 08:35:50 -08:00
David Green
70ed381b16
[GlobalISel][AArch64] Fix fptoi.sat lowering. (#127901)
The SDAG version uses fminnum/fmaxnum, in converting it to fcmp+select
it appears the order of the operands was chosen badly. This switches the
conditions used to keep the constant on the RHS.
2025-02-20 12:22:11 +00:00
Craig Topper
77183a46a5
[CodeGen] Remove static member function Register::virtReg2Index. NFC (#127962)
Use the nonstatic member instead.

I'm pretty sure the code in SPRIV is a layering violation. MC layer
files are using a CodeGen header.
2025-02-19 23:34:55 -08:00
Piotr Fusik
8b58cb853a
[SelectionDAG][NFC] Refactor duplicate code into SDNode::bitcastToAPInt() (#127503) 2025-02-20 13:23:00 +07:00
Craig Topper
92ddbbd89f [CodeGen] Remove static member functions Register::stackSlot2Index/isStackSlot. NFC
Migrate the few users to the nonstatic member functions.
2025-02-19 21:54:43 -08:00
Akshat Oke
557628dbe6
[CodeGen][NewPM] Port RegAllocPriorityAdvisor analysis to NPM (#118462)
Similar to #117309.

The advisor and logger are accessed through the provider, which is
served by the new PM. Legacy PM forwards calls to the provider.
New PM is a machine function analysis that lazily initializes the
provider.
2025-02-20 09:35:49 +05:30
Matt Arsenault
37c341df28 Revert "AMDGPU: Don't canonicalize fminnum/fmaxnum if targets support IEEE fminimum(maximum)_num (#127711)"
This reverts commit 36eaf0daf5d6dd665d7c7a9ec38ea22f27709fed.

This is not a sound approach to dealing with this instruction change.
The new behavior is a different opcode pair, not a modifier on the
existing opcode.
2025-02-20 10:19:14 +07:00
Benjamin Maxwell
f178e51747
[SDAG] Add missing ppc_fp128 ExpandFloatRes legalization for modf (#127895)
Should fix: https://lab.llvm.org/buildbot/#/builders/72/builds/8380

(`test_modf_ppcf128` is the test case that needed the additional
legalization)
2025-02-20 09:50:16 +07:00
Changpeng Fang
36eaf0daf5
AMDGPU: Don't canonicalize fminnum/fmaxnum if targets support IEEE fminimum(maximum)_num (#127711)
For targets that support IEEE fminimum_num/fmaximum_num, the
corresponding *_min_num_fXY/*_max_num_fXY instructions themselves
already did the canonicalization for the inputs. As a result, we do not
need to explicitly canonicalize the inputs for fminnum/fmaxnum.
2025-02-19 11:16:43 -08:00
Kazu Hirata
af922cf9f7
[CodeGen] Avoid repeated hash lookups (NFC) (#127745) 2025-02-19 08:20:46 -08:00
Kazu Hirata
c23256ecbd
[AsmPrinter] Avoid repeated hash lookups (NFC) (#127744) 2025-02-19 08:20:21 -08:00
zhijian lin
1ac0db44fd
[NFC] using isUndef() instead of getOpcode() == ISD::UNDEF (#127713)
[NFC] using isUndef() instead of getOpcode() == ISD::UNDEF
2025-02-19 08:42:38 -05:00
Kazu Hirata
4405451a22
[AsmPrinter] Avoid repeated map lookups (NFC) (#127576) 2025-02-18 08:39:14 -08:00
Akshat Oke
519b53e65e
[CodeGen][NewPM] Port RegAllocEvictionAdvisor analysis to NPM (#117309)
Legacy pass used to provide the advisor, so this extracts that logic
into a provider class used by both analysis passes.

All three (Default, Release, Development) legacy passes
`*AdvisorAnalysis` are basically renamed to `*AdvisorProvider`, so the
actual legacy wrapper passes are `*AdvisorAnalysisLegacy`.

There is only one NPM analysis `RegAllocEvictionAnalysis` that switches
between the three providers in the `::run` method, to be cached by the
NPM.

Also adds `RequireAnalysis<RegAllocEvictionAnalysis>` to the optimized
target reg alloc codegen builder.
2025-02-18 18:55:06 +07:00
James Chesterman
d4a0848dc6
[SelectionDAG] Add PARTIAL_REDUCE_U/SMLA ISD Nodes (#125207)
Add signed and unsigned PARTIAL_REDUCE_MLA ISD nodes. Add command line
argument (aarch64-enable-partial-reduce-nodes) that indicates whether the
intrinsic experimental_vector_partial_ reduce_add will be transformed
into the new ISD node. Lowering with the new ISD nodes will, for now,
always be done as an expand.
2025-02-18 09:08:47 +00:00
Craig Topper
ef9f0b3c41
[DAGCombiner] Don't peek through truncates of shift amounts in takeInexpensiveLog2. (#126957)
Shift amounts in SelectionDAG don't have to match the result type
of the shift. SelectionDAGBuilder will aggressively truncate shift
amounts to the target's preferred type. This may result in a zero extend
that existed in IR being removed.
    
If we look through a truncate here, we can't guarantee the upper bits
of the truncate input are zero. There may have been a zext that was
removed. Unfortunately, this regresses tests where no truncate was
involved. The only way I can think to fix this is to add an assertzext
when SelectionDAGBuilder truncates a shift amount or remove the
early truncation of shift amounts from SelectionDAGBuilder all together.
    
Fixes #126889.
2025-02-17 20:26:05 -08:00
Matt Arsenault
ed38d6702f
PeepholeOpt: Handle subregister compose when looking through reg_sequence (#127051)
Previously this would give up on folding subregister copies through
a reg_sequence if the input operand already had a subregister index.
d246cc618adc52fdbd69d44a2a375c8af97b6106 stopped introducing these
subregister uses, and this is the first step to lifting that restriction.

I was expecting to be able to implement this only purely with compose /
reverse compose, but I wasn't able to make it work so relies on testing
the lanemasks for whether the copy reads a subset of the input.
2025-02-18 08:07:29 +07:00
Kazu Hirata
153dd19e30
[SelectionDAG] Remove lowerCallToExternalSymbol (#127408)
The last use was removed in:

  commit 05e6bb40ebfd285cc87f7ce326b7ba76c3c7f870
  Author: Roger Ferrer Ibáñez <rofirrim@gmail.com>
  Date:   Thu May 30 14:55:32 2024 +0200
2025-02-17 00:06:48 -08:00
Kazu Hirata
0323554055
[GlobalISel] Avoid repeated hash lookups (NFC) (#127372) 2025-02-16 08:15:36 -08:00
Csanád Hajdú
a190f15d2b
[AArch64] Add support for SHF_AARCH64_PURECODE ELF section flag (1/3) (#125687)
Add support for the new SHF_AARCH64_PURECODE ELF section flag:
https://github.com/ARM-software/abi-aa/pull/304

The general implementation follows the existing one for ARM targets.
Generating object files with the `SHF_AARCH64_PURECODE` flag set is
enabled by the `+execute-only` target feature.

Related PRs:
* Clang: https://github.com/llvm/llvm-project/pull/125688
* LLD: https://github.com/llvm/llvm-project/pull/125689
2025-02-14 08:56:07 +00:00
Michael Buch
41f96f91cd
[llvm][DebugInfo] Emit DW_AT_const_value for float non-type template parameters (#127045)
In C++20, non-type template parameters can be float/double. Clang didn't
emit those constants in DWARF. This patch emits floating point constants
the same way we do other integral template value parameters.
2025-02-13 23:08:44 +00:00
Kazu Hirata
e7bf6a4e04
[CodeGen] Avoid repeated map lookups (NFC) (#127025) 2025-02-13 09:11:17 -08:00
Matt Arsenault
43780f4f92 RegAllocGreedy: Use Register type 2025-02-13 20:49:27 +07:00
Cullen Rhodes
9b2fc66830
[SDAG] Harden assumption in getMemsetStringVal (#126207)
In 5235973ee03aca4148ecabe5eff64da2af1e034e, an ICE was fixed in
getMemsetStringVal where f128 wasn't handled. It was noted at the time
[1] that the code below this also looks suspect, since it assumes the
element type of VT is either an f32 or f64.

This part of getMemsetStringVal relates to memcpy operations where the
source is a copy from a zero constant. The VT in question is determined
by TargetLowering::findOptimalMemOpLowering, which in turn calls a
further TLI hook getOptimalMemOpType.

For AArch64, getOptimalMemOpType returns either a v16i8, f128, i64, i32
or Other. For Other, TargetLowering::findOptimalMemOpLowering will then
pick an integer VT. So on AArch64 at least, I don't believe the suspect
code can be reached.

For other targets, ARM and x86 are the only ones that return a FP vector
type from getOptimalMemOpType. For both targets, the only such type is
v2f64, but given f64 is already handled it should also be fine.

To defend this, I considered adding an assert as mentioned in [1], but
given getConstantFP handles vector types, I figured using this to fully
handle the FP types makes the code simpler and more robust.

For test coverage I added unreachables to both of the branches handling
FP types in this code, but found neither fired with check-llvm across
all targets.

Test coverage was added to llvm/test/CodeGen/AArch64/memcpy-f128.ll in
5235973ee03aca4148ecabe5eff64da2af1e034e to defend ICE on f128, but at
some point it stopped hitting this code.

AArch64TargetLowering::getOptimalMemOpType was updated in
29200611055f49a0d37243caa5f8bba1df9d57a6, so I suspect this is when it
happened, although I haven't verified this. Although I did find by
updating the test to disable NEON, getOptimalMemOpType returns an f128
and the branch is once again hit.

For the final branch noted as suspect in [1], as far as I can tell this
has never had any test coverage, so I've added a test to the ARM backend
for this.

Fixes: https://github.com/llvm/llvm-project/issues/20521 [1]
2025-02-13 08:48:06 +00:00
Cullen Rhodes
df62441336
[MISched][NFC] Remove unused heuristic NextDefUse from enum (#125879)
Heuristic was removed in 46533e614b78 due to being ineffective.
2025-02-13 08:46:51 +00:00
Shubham Sandeep Rastogi
92f916faba
Add a pass to collect dropped var statistics for MIR (#126686)
This patch attempts to reland
https://github.com/llvm/llvm-project/pull/120780 while addressing the
issues that caused the patch to be reverted.

Namely:

1. The patch had included code from the llvm/Passes directory in the
llvm/CodeGen directory.

2. The patch increased the backend compile time by 2% due to adding a
very expensive include in MachineFunctionPass.h

The patch has been re-structured so that there is no dependency between
the llvm/Passes and llvm/CodeGen directory, by moving the base class,
`class DroppedVariableStats` to the llvm/IR directory.

The expensive include in MachineFunctionPass.h has been changed to
contain forward declarations instead of other header includes which was
pulling a ton of code into MachineFunctionPass.h and should resolve any
issues when it comes to compile time increase.
2025-02-12 14:08:18 -08:00
Akshat Oke
7b60e03d73
Reland "CodeGen][NewPM] Port MachineScheduler to NPM. (#125703)" (#126684)
`RegisterClassInfo` was supposed to be kept alive between pass runs,
which wasn't being done leading to recomputations increasing the compile
time.

Now the Impl class is a member of the legacy and new passes so that it
is not reconstructed on every pass run.

---------

Co-authored-by: Christudasan Devadasan <christudasan.devadasan@amd.com>
2025-02-12 18:54:39 +05:30
David Green
bf7af2d12e
[AArch64][DAG] Allow fptos/ui.sat to scalarized. (#126799)
We we previously running into problems with fp128 types and certain
integer sizes.

Fixes an issue reported on #124984
2025-02-12 11:04:08 +00:00
Craig Topper
7dd82805d5
[SelectionDAGBuilder] Remove NodeMap updates from getValueImpl. NFC (#126849)
Both callers already put the result in NodeMap immediately after the
call.
2025-02-12 00:07:07 -08:00
Haohai Wen
ec28e9b757
[MC] Replace MCContext::GenericSectionID with MCSection::NonUniqueID (#126202)
They have same semantics. NonUniqueID is more friendly for isUnique
implementation in MCSectionELF.

History: 97837b7 added support for unique IDs in sections and added
GenericSectionID. Later, 1dc16c7 added NonUniqueID.
2025-02-12 14:28:37 +08:00
Jim Lin
31bfae35d2
[DAGCombiner] Add hasOneUse checks for folding (not (add X, -1)) to (neg X) (#126667)
To get more better codegen for AArch with bic, x86 with andn and riscv
with andn.
2025-02-12 12:24:29 +08:00
Daniel Hoekwater
3a22cf9bd8
[CFIFixup] Fixup CFI for split functions with synchronous uwtables (#125299)
- **Precommit tests for synchronous uwtable CFI fixup**
- **[CFIFixup] Fixup CFI for split functions with synchronous uwtables**

Commit
6e54fccede
disables CFI fixup for
functions with synchronous tables, breaking CFI for split functions.
Instead, we can disable *block-level* CFI fixup for functions with
synchronous tables.

Unwind tables can be:
- N/A (not present)
- Asynchronous
- Synchronous

Functions without unwind tables don't need CFI fixup (since they don't
care about CFI).

Functions with asynchronous unwind tables must be accurate for each
basic block, so full CFI fixup is necessary.

Functions with synchronous unwind tables only need to be accurate for
each function (specifically, the portion of a function in a given
section). Disabling CFI fixup entirely for functions with synchronous
uwtables may break CFI for a function split between two sections. The
portion in the first section may have valid CFI, while the portion in
the second section is missing a call frame.

Ex:
```
(.text.hot)
Foo (BB1):
  <Call frame information>
  ...
BB2:
  ...

(.text.split)
BB3:
  ...
BB4:
  <epilogue>
```

Even if `Foo` has a synchronous unwind table, we still need to insert
call frame information into `BB3` so that unwinding the call stack from
`BB3` or `BB4` works properly.
2025-02-11 18:25:08 -05:00
Philip Reames
e4016bf5c3 [DAG] Use ArrayRef to simplify ShuffleVectorSDNode::isSplatMask 2025-02-11 12:47:10 -08:00
Benjamin Maxwell
19556eccf6
[RTLIB] Rename getFSINCOS() to getSINCOS (NFC) (#126705)
This makes the name more consistent with the other helpers.
2025-02-11 11:51:35 +00:00