37511 Commits

Author SHA1 Message Date
Tobias Stadler
1302610f03
[MergeFunc] Fix crash caused by bitcasting ArrayType (#133259)
createCast in MergeFunctions did not consider ArrayTypes, which results
in the creation of a bitcast between ArrayTypes in the thunk function,
leading to an assertion failure in the provided test case.

The version of createCast in GlobalMergeFunctions does handle
ArrayTypes, so this common code has been factored out into the
IRBuilder.
2025-04-04 10:16:40 +01:00
zhijian lin
1a540c3b8b
[PowerPC] Deprecate uses of ISD::ADDC/ISD::ADDE/ISD::SUBC/ISD::SUBE (#133155)
ISD::ADDC, ISD::ADDE, ISD::SUBC and ISD::SUBE are being deprecated in
favour of ISD::UADDO_CARRY and ISD::USUBO_CARRY. This patch implements
lowering for UADDO, UADDO_CARRY, USUBO and USUBO_CARRY.
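The replacement nodes expose the carry as an explicit second result. As a rough, target-independent illustration of those semantics (not the PowerPC lowering itself), a two-limb 128-bit addition chains an unsigned add-with-overflow into an add-with-carry. The helper names `uaddo` and `uaddo_carry` below are hypothetical stand-ins for ISD::UADDO and ISD::UADDO_CARRY.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Hypothetical stand-in for ISD::UADDO: returns {sum, carry-out}.
std::pair<uint64_t, bool> uaddo(uint64_t a, uint64_t b) {
  uint64_t sum = a + b;   // wraps modulo 2^64
  return {sum, sum < a};  // carry-out iff the addition wrapped
}

// Hypothetical stand-in for ISD::UADDO_CARRY: a + b + carryIn, {sum, carry-out}.
std::pair<uint64_t, bool> uaddo_carry(uint64_t a, uint64_t b, bool carryIn) {
  auto [s1, c1] = uaddo(a, b);
  auto [s2, c2] = uaddo(s1, carryIn ? 1 : 0);
  return {s2, c1 || c2};  // at most one of the two steps can carry
}

// A 128-bit value split into 64-bit limbs.
struct U128 { uint64_t lo, hi; };

U128 add128(U128 x, U128 y) {
  auto [lo, carry] = uaddo(x.lo, y.lo);                // low limb produces the carry
  uint64_t hi = uaddo_carry(x.hi, y.hi, carry).first;  // high limb consumes it
  return {lo, hi};
}
```

Subtraction follows the same pattern with a borrow in place of the carry.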
2025-04-03 13:22:49 -04:00
Nikita Popov
efbbdd69c7
[ADT] Make DenseMap::init() private (NFC) (#134229)
I believe this method was not supposed to be public, as it has
additional preconditions (it will misbehave when called on a non-empty
DenseMap).

The public API for this is reserve().
2025-04-03 15:14:45 +02:00
David Green
6c27817294
[SelectionDAG] Use SimplifyDemandedBits from SimplifyDemandedVectorElts Bitcast. (#133717)
This adds a call to SimplifyDemandedBits from bitcasts with scalar input
types in SimplifyDemandedVectorElts, which can help simplify the input
scalar.
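The connection between the two analyses is that, for a scalar bitcast to a vector, each demanded vector element corresponds to a contiguous slice of the scalar's bits. The sketch below models that mapping with plain 64-bit masks instead of `APInt`; `demandedBitsFromElts` is a hypothetical helper, and it assumes little-endian element order (element 0 in the lowest bits), which differs on big-endian targets.

```cpp
#include <cassert>
#include <cstdint>

// Map a demanded-elements mask for a <numElts x i(eltBits)> vector, bitcast
// from a scalar of numElts * eltBits bits (at most 64 here), onto a
// demanded-bits mask for that scalar. Little-endian: element i occupies
// bits [i * eltBits, (i + 1) * eltBits).
uint64_t demandedBitsFromElts(uint64_t demandedElts, unsigned numElts,
                              unsigned eltBits) {
  const uint64_t eltMask =
      eltBits >= 64 ? ~uint64_t{0} : ((uint64_t{1} << eltBits) - 1);
  uint64_t bits = 0;
  for (unsigned i = 0; i < numElts; ++i)
    if (demandedElts & (uint64_t{1} << i))
      bits |= eltMask << (i * eltBits);
  return bits;
}
```

For example, an i64 bitcast to <4 x i16> with only elements 0 and 2 demanded leaves the upper halves of both 32-bit words free for SimplifyDemandedBits to simplify.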
2025-04-03 11:14:08 +01:00
Hua Tian
7e65944292
[llvm][CodeGen] avoid repeated interval calculation in window scheduler (#132352)
Some new registers are reused when replacing old ones in certain use
cases of ModuloScheduleExpander. Repeated interval calculations for
these registers must be avoided.
2025-04-03 14:25:55 +08:00
LU-JOHN
6a46c6c865
Ensure KnownBits passed when calculating from range md has right size (#132985)
KnownBits passed to computeKnownBitsFromRangeMetadata must have the same
bit width as the range metadata bit width. Otherwise the calculated
results will be incorrect.
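To see why the widths must agree, here is a simplified model of deriving known bits from a non-wrapping unsigned range `[lo, hi)`: every value in the range shares the leading bits on which `lo` and `hi - 1` agree. This is a sketch with 64-bit masks and a hypothetical `KnownBitsSketch` struct, not LLVM's `KnownBits`/`APInt` code; it ignores wrapped ranges and multiple range pairs.

```cpp
#include <cassert>
#include <cstdint>

struct KnownBitsSketch {
  unsigned bitWidth;
  uint64_t zero;  // bits known to be 0
  uint64_t one;   // bits known to be 1
};

// Known bits for any value in the non-wrapping unsigned range [lo, hi) of
// the given width: the common leading bits of lo and hi - 1 are fixed.
KnownBitsSketch knownBitsFromRange(unsigned bitWidth, uint64_t lo, uint64_t hi) {
  assert(bitWidth >= 1 && bitWidth <= 64 && lo < hi);
  const uint64_t last = hi - 1;
  KnownBitsSketch kb{bitWidth, 0, 0};
  // Walk from the most significant bit down while lo and last agree.
  for (int bit = static_cast<int>(bitWidth) - 1; bit >= 0; --bit) {
    const uint64_t m = uint64_t{1} << bit;
    if ((lo & m) != (last & m))
      break;
    if (lo & m)
      kb.one |= m;
    else
      kb.zero |= m;
  }
  return kb;
}
```

The same range `[0, 16)` yields a known-zero mask of `0xF0` at width 8 but `0xFFFFFFF0` at width 32, so combining a result computed at one width with a KnownBits of another width marks the wrong bits.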

---------

Signed-off-by: John Lu <John.Lu@amd.com>
2025-04-03 10:17:14 +07:00
Sami Tolvanen
acc6bcdc50
Support alternative sections for patchable function entries (#131230)
With -fpatchable-function-entry (or the patchable_function_entry
function attribute), we emit records of patchable entry locations to the
__patchable_function_entries section. Add an additional parameter to the
command line option that allows one to specify a different default
section name for the records, and an identical parameter to the function
attribute that allows one to override the section used.

The main use case for this change is the Linux kernel using prefix NOPs
for ftrace, and thus depending on __patchable_function_entries to locate
traceable functions. Functions that are not traceable currently disable
entry NOPs using the function attribute, but this creates a
compatibility issue with -fsanitize=kcfi, which expects all indirectly
callable functions to have a type hash prefix at the same offset from
the function entry.

Adding a section parameter would allow the kernel to distinguish between
traceable and non-traceable functions by adding entry records to
separate sections while maintaining a stable function prefix layout for
all functions. LKML discussion:

https://lore.kernel.org/lkml/Y1QEzk%2FA41PKLEPe@hirez.programming.kicks-ass.net/
2025-04-02 21:53:55 +00:00
Ryan Buchner
fa2a6d68c6
[CodeGenPrepare][RISCV] Combine (X ^ Y) and (X == Y) where appropriate (#130922)
Fixes #130510.

In RISCV, modify the folding of (X ^ Y == 0) -> (X == Y) to account for
cases where the (X ^ Y) will be re-used.

If a constant is being used for the XOR before a branch, ensure that it
is small enough to fit within a 12-bit immediate field. Otherwise, the
equality check is more efficient than the check against 0, see the
following:
```
# %bb.0:
        lui     a1, 5
        addiw   a1, a1, 1365
        xor     a0, a0, a1
        beqz    a0, .LBB0_2
# %bb.1: 
        ret
.LBB0_2: 
```

```
# %bb.0:
        lui     a1, 5
        addiw   a1, a1, 1365
        beq    a0, a1, .LBB0_2
# %bb.1: 
        xor     a0, a0, a1
        ret
.LBB0_2: 
```
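The constant in these examples, materialized by `lui a1, 5` and `addiw a1, a1, 1365`, is 5 * 4096 + 1365 = 21845, which does not fit an XORI immediate. A minimal sketch of the check, assuming the 12-bit signed I-type immediate range [-2048, 2047]; `preferXorAgainstZero` is a hypothetical name for the decision, not a function in the patch.

```cpp
#include <cassert>
#include <cstdint>

// RISC-V I-type immediates (used by XORI) are 12-bit signed: [-2048, 2047].
// Hard-coded here, mirroring the hard-coded width in the patch.
constexpr bool fitsInSImm12(int64_t imm) { return imm >= -2048 && imm <= 2047; }

// Hypothetical decision helper: keep the reused (X ^ C) and branch on zero
// only when C fits XORI; otherwise C needs a register anyway, so comparing
// X == C directly is at least as cheap.
constexpr bool preferXorAgainstZero(int64_t c) { return fitsInSImm12(c); }
```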

Similarly, if the XOR is between 1 and a 1-bit integer, we should
still fold away the XOR since that comparison can be optimized as a
comparison against 0.
```
# %bb.0:
        slt a0, a0, a1
        xor  a0, a0, 1
        beqz    a0, .LBB0_2
# %bb.1: 
        ret
.LBB0_2: 
```

```
# %bb.0:
        slt a0, a0, a1
        bnez    a0, .LBB0_2
# %bb.1: 
        xor  a0, a0, 1
        ret
.LBB0_2: 
```
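The identity behind this case is small enough to check exhaustively: for a 1-bit value `b`, `(b ^ 1) == 0` holds exactly when `b != 0`, so the branch can test `b` directly. A compile-time sketch in C++, with `bool` standing in for an i1 value:

```cpp
#include <cassert>

// (b ^ 1) == 0 for a 1-bit value b.
constexpr bool xorOneIsZero(bool b) { return (static_cast<int>(b) ^ 1) == 0; }

// b != 0, which a branch such as bnez can test without materializing the XOR.
constexpr bool isNonZero(bool b) { return b; }

// Both possible values of a 1-bit integer agree.
static_assert(xorOneIsZero(false) == isNonZero(false), "b == 0");
static_assert(xorOneIsZero(true) == isNonZero(true), "b == 1");
```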

One question about my code: I used a hard-coded value for the width
of a RISCV ALU immediate. Do you know of a way that I can gather this
from the `context`? I was unable to devise one.
2025-04-02 09:56:09 -07:00
Nikita Popov
9356091a98
[GlobalMerge][PPC] Don't merge globals in llvm.metadata section (#131801)
The llvm.metadata section is not emitted and has special semantics. We
should not merge globals in it, similarly to how we already skip merging
of `llvm.xyz` globals.

Fixes https://github.com/llvm/llvm-project/issues/131394.
2025-04-02 10:40:53 +02:00
Petr Hosek
4b19db6db9
Revert "AsmPrinter: Remove ELF's special lowerRelativeReference for unnamed_addr function" (#133935)
Reverts llvm/llvm-project#132684
2025-04-01 09:39:07 -07:00
Jeremy Morse
1ebc308bba
[DebugInfo][RemoveDIs] Remove debug-intrinsic printing cmdline options (#131855)
During the transition from debug intrinsics to debug records, we used
several different command line options to customise handling: the
printing of debug records to bitcode and textual IR could be independent of
how the debug-info was represented inside a module, and whether the
autoupgrader ran could be customised. This was all valuable during
development, but now that totally removing debug intrinsics is coming
up, this patch removes those options in favour of a single flag
(experimental-debuginfo-iterators), which enables autoupgrade, in-memory
debug records, and debug record printing to bitcode and textual IR.

We need to do this ahead of removing the
experimental-debuginfo-iterators flag, to reduce the amount of
test-juggling that happens at that time.

There are quite a number of weird test behaviours related to this --
some of which I simply delete in this commit. One example is
print-non-instruction-debug-info.ll: the test suite now checks for
debug records in all tests, and we don't want to check that we can print
intrinsics. Another is the update_test_checks tests: these are duplicated with
write-experimental-debuginfo=false to ensure file writing for intrinsics
is correct, but that's something we're imminently going to delete.

A short survey of curious test changes:
* free-intrinsics.ll: we don't need to test that debug-info is a zero
cost intrinsic, because we won't be using intrinsics in the future.
* undef-dbg-val.ll: apparently we pinned this to non-RemoveDIs in-memory
mode while we sorted something out; it works now either way.
* salvage-cast-debug-info.ll: was testing that intrinsics-in-memory get
salvaged, which isn't necessary now.
* localize-constexpr-debuginfo.ll: was producing "dead metadata"
intrinsics for optimised-out variable values, dbg-records takes the
(correct) representation of poison/undef as an operand. Looks like we
didn't update this in the past to avoid spurious test differences.
* Transforms/Scalarizer/dbginfo.ll: this test was explicitly testing
that debug-info affected codegen, and we deferred updating the tests
until now. This is just one of those silent gnochange issues that get
fixed by RemoveDIs.

Finally: I've added a bitcode test, dbg-intrinsics-autoupgrade.ll.bc,
that checks we can autoupgrade debug intrinsics that are in bitcode into
the new debug records.
2025-04-01 14:27:11 +01:00
Akshat Oke
4a68702455
[CodeGen][NPM] Port XRayInstrumentation to NPM (#129865) 2025-04-01 15:38:49 +05:30
Afanasyev Ivan
337bad3921
[EarlyIfConverter] Fix reg killed twice after early-if-predicator and ifcvt (#133554)
The bug relates to the `early-if-predicator` and `early-ifcvt` passes. If a
virtual register has the "killed" flag in both basic blocks to be merged
into the head, both instructions in the head basic block will have the
"killed" flag for this register. This makes the MIR incorrect.

Example:

```
  bb.0: ; if
    ...
    %0:intregs = COPY $r0
    J2_jumpf %2, %bb.2, implicit-def dead $pc
    J2_jump %bb.1, implicit-def dead $pc

  bb.1: ; if.then
    ...
    S4_storeiri_io killed %0, 0, 1
    J2_jump %bb.3, implicit-def dead $pc

  bb.2: ; if.else
    ...
    S4_storeiri_io killed %0, 0, 1
    J2_jump %bb.3, implicit-def dead $pc
```

After early-if-predicator, this becomes:

```
  bb.0:
    %0:intregs = COPY $r0
    S4_storeirif_io %1, killed %0, 0, 1
    S4_storeirit_io %1, killed %0, 0, 1
```

Having the `killed` flag set twice in bb.0 for `%0` is incorrect MIR.
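The invariant being violated can be stated as a simple check: in straight-line code, a `killed` use is the last use of a virtual register, so no later instruction in the block may read it. The sketch below uses a hypothetical, heavily simplified operand model (virtual register name plus kill flag, defs ignored) rather than the MachineInstr API; it only illustrates the rule, not the EarlyIfConverter fix.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical operand model: a virtual-register use and its kill flag.
struct Use {
  std::string vreg;
  bool killed;
};
using Instr = std::vector<Use>;

// Returns false if any instruction reads a vreg that an earlier instruction
// in the same block already killed (e.g. %0 killed twice after merging).
bool killFlagsConsistent(const std::vector<Instr>& block) {
  std::unordered_set<std::string> killed;
  for (const Instr& mi : block) {
    for (const Use& u : mi)
      if (killed.count(u.vreg))
        return false;  // read after its last use
    for (const Use& u : mi)
      if (u.killed)
        killed.insert(u.vreg);  // the kill takes effect after this instruction
  }
  return true;
}
```

Modelling the merged bb.0 as two stores that both kill `%0` fails the check, while clearing the flag on the first store passes it.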
2025-04-01 12:06:30 +02:00
Fangrui Song
dd862356e2
AsmPrinter: Remove ELF's special lowerRelativeReference for unnamed_addr function
https://reviews.llvm.org/D17938 introduced lowerRelativeReference to
give ConstantExpr sub (A-B) special semantics in ELF: when `A` is an
`unnamed_addr` function, create a PLT-generating relocation. This was
intended for C++ relative vtables, but C++ relative vtables ended up
using DSOLocalEquivalent (lowerDSOLocalEquivalent).

This special treatment of `unnamed_addr` seems unusual.
Let's remove it. Only COFF needs an overload to generate a @IMGREL32
relocation specifier (llvm/test/MC/COFF/cross-section-relative.ll).

Pull Request: https://github.com/llvm/llvm-project/pull/132684
2025-03-31 20:44:29 -07:00
3405691582
c180e249d0
Fix crash lowering stack guard on OpenBSD/aarch64. (#125416)
TargetLoweringBase::getIRStackGuard refers to a platform-specific guard
variable. Before this change, TargetLoweringBase::getSDagStackGuard only
referred to a different variable.

This means that SelectionDAGBuilder's getLoadStackGuard does not get
memory operands. However, AArch64InstrInfo::expandPostRAPseudo assumes
that the passed MachineInstr has nonzero memoperands, causing a
segfault.

We have two possible options here: either disabling the LOAD_STACK_GUARD
node entirely in AArch64TargetLowering::useLoadStackGuardNode or just
making the platform-specific values match across TargetLoweringBase.
Here, we try the latter.
2025-03-31 09:17:55 -07:00
Rahul Joshi
74b7abf154
[IRBuilder] Add new overload for CreateIntrinsic (#131942)
Add a new `CreateIntrinsic` overload with no `Types`, useful for
creating calls to non-overloaded intrinsics that don't need additional
mangling.
2025-03-31 08:10:34 -07:00
Tom Tromey
68947342b7
Add support for fixed-point types (#129596)
This adds DWARF generation for fixed-point types. This feature is needed
by Ada.

Note that a pre-existing GNU extension is used in one case. This has
been emitted by GCC for years, and is needed because standard DWARF is
otherwise incapable of representing these types.
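For context on what such a type describes, DWARF represents a fixed-point base type as a stored integer plus a scale factor, for example a binary scale (value = raw * 2^scale) or a decimal scale (value = raw * 10^scale). The sketch below shows how a consumer such as a debugger might decode those two cases; it does not show LLVM's API or the GNU extension mentioned above, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Binary scale: real value = raw * 2^scale (scale is usually negative).
// std::ldexp multiplies by a power of two exactly, barring overflow.
double decodeBinaryFixed(int64_t raw, int scale) {
  return std::ldexp(static_cast<double>(raw), scale);
}

// Decimal scale: real value = raw * 10^scale (subject to double rounding).
double decodeDecimalFixed(int64_t raw, int scale) {
  return static_cast<double>(raw) * std::pow(10.0, scale);
}
```

With a binary scale of -3, the raw integer 20 denotes 2.5.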
2025-03-31 07:42:21 -07:00
Simon Pilgrim
9b32f3d096
[DAG] visitEXTRACT_SUBVECTOR - don't return early on failure of EXTRACT_SUBVECTOR(INSERT_SUBVECTOR()) -> BITCAST fold (#133695)
Always allow later folds to try to match as well.
2025-03-31 14:32:43 +01:00
Liqiang TAO
1f7f268f30
StackProtector: use isInTailCallPosition to verify tail call position (#68997)
The issue is caused by [D133860](https://reviews.llvm.org/D133860).
In some cases the guard could be inserted in the wrong place, as in the
accompanying test case.
This patch fixes the issue by using `isInTailCallPosition()` to verify
whether the tail call is in the right position.
2025-03-30 11:21:19 -07:00
Mingming Liu
9747bb182f
[CodeGen][StaticDataSplitter]Support constant pool partitioning (#129781)
This is a follow-up patch of
https://github.com/llvm/llvm-project/pull/125756

In this PR, the static-data-splitter pass produces the aggregated profile
counts of constants for constant pools in a global state
(`StaticDataProfileInfo`), and the asm printer consumes the profile counts
to produce `.hot` or `.unlikely` prefixes.

This implementation covers both the x86 and aarch64 asm printers.
2025-03-29 22:07:56 -07:00
Kazu Hirata
e3a3f78f35
[CodeGen] Use llvm::append_range (NFC) (#133603) 2025-03-29 16:53:02 -07:00
Fangrui Song
fe6fb910df
[RISCV] Replace @plt/@gotpcrel in data directives with %pltpcrel %gotpcrel
clang -fexperimental-relative-c++-abi-vtables might generate `@plt` and
`@gotpcrel` specifiers in data directives. The syntax is not used in
human-written assembly code, and is not supported by GNU assembler.
Note: the `@plt` in `.word foo@plt` is different from
the legacy `call func@plt` (where `@plt` is simply ignored).

The `@plt` syntax was selected simply due to a quirk of AsmParser:
the syntax was supported by all targets until I updated it
to be an opt-in feature in a0671758eb6e52a758bd1b096a9b421eec60204c.

RISC-V favors the `%specifier(expr)` syntax following MIPS and Sparc,
and we should follow this convention.

This PR adds support for `.word %pltpcrel(foo+offset)` and
`.word %gotpcrel(foo)`, and drops `@plt` and `@gotpcrel`.

* MCValue::SymA can no longer have a SymbolVariant. Add an assert
  similar to that of AArch64ELFObjectWriter.cpp before
  https://reviews.llvm.org/D81446 (see my analysis at
  https://maskray.me/blog/2025-03-16-relocation-generation-in-assemblers
  if intrigued)
* `jump foo@plt, x31` now has a different diagnostic.

Pull Request: https://github.com/llvm/llvm-project/pull/132569
2025-03-29 11:08:13 -07:00
Simon Pilgrim
666faa7fd9
[DAG] visitEXTRACT_SUBVECTOR - accumulate SimplifyDemandedVectorElts demanded elts across all EXTRACT_SUBVECTOR uses (REAPPLIED) (#133401)
Similar to what is done for visitEXTRACT_VECTOR_ELT - if all uses of a vector are EXTRACT_SUBVECTOR, then determine the accumulated demanded elts across all users and call SimplifyDemandedVectorElts in "AssumeSingleUse" mode.
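The accumulation step amounts to taking the union of the element ranges read by every extract. A minimal model with a 64-bit mask (so at most 64 elements) and a hypothetical `ExtractUse` record for each user's constant index and subvector length:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Each EXTRACT_SUBVECTOR user reads numSubElts consecutive elements
// starting at a constant index.
struct ExtractUse {
  unsigned index;
  unsigned numSubElts;
};

// Union of the element ranges read by all users: the demanded-elements mask
// that could be handed to SimplifyDemandedVectorElts for the shared source.
uint64_t accumulateDemandedElts(const std::vector<ExtractUse>& uses) {
  uint64_t demanded = 0;
  for (const ExtractUse& u : uses)
    for (unsigned i = 0; i < u.numSubElts; ++i)
      demanded |= uint64_t{1} << (u.index + i);
  return demanded;
}
```

For an 8-element vector whose only users extract elements [0, 2) and [4, 6), the mask is `0x33`, so elements 2, 3, 6 and 7 are free to be simplified.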

Second try after #133130 was reverted by #133331 due to it affecting reverted test files
2025-03-29 17:55:38 +00:00
Tim Gymnich
1d0005a69a
[GlobalISel][NFC] Rename GISelKnownBits to GISelValueTracking (#133466)
- rename `GISelKnownBits` to `GISelValueTracking` to analyze more than
just `KnownBits` in the future
2025-03-29 11:51:29 +01:00
Kazu Hirata
f915015a3e
[llvm] Remove extraneous calls to make_range (NFC) (#133551) 2025-03-28 19:56:02 -07:00
Kazu Hirata
d4427f308e
[llvm] Use range constructors of *Set (NFC) (#133549) 2025-03-28 19:55:18 -07:00
Mingming Liu
c8a70f4c6e
[CodeGen][StaticDataPartitioning]Place local-linkage global variables in hot or unlikely prefixed sections based on profile information (#125756)
In this PR, static-data-splitter pass finds out the local-linkage global
variables in {`.rodata`, `.data.rel.ro`, `.bss`, `.data`} sections by
analyzing machine instruction operands, and aggregates their accesses
from code across functions.

A follow-up item is to analyze global variable initializers and count
accesses from data.
* This limitation is demonstrated by `bss2` and `data3` in
`llvm/test/CodeGen/X86/global-variable-partition.ll`.

Some stats of static-data-splitter with this patch:

**section**|**bss**|**rodata**|**data**
:-----:|:-----:|:-----:|:-----:
hot-prefixed section coverage|99.75%|97.71%|91.30%
unlikely-prefixed section size percentage|67.94%|39.37%|63.10%

1. The coverage is defined as `#perf-sample-in-hot-prefixed <data>
section / #perf-sample in <data.*> section` for each <data> section.
* The perf command samples
`MEM_INST_RETIRED.ALL_LOADS:u:pinned:precise=2` events at a high
frequency (`perf -c 2251`) for 30 seconds. The profiled binary is built
as non-PIE, so `.data.rel.ro` coverage data is not available.
2. The unlikely-prefixed `<data>` section size percentage is defined as
`unlikely <data> section size / the sum size of <data>.* sections` for
each `<data>` section
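Both metrics are plain ratios, expressed as percentages. The sketch below spells out the two definitions; the function names and any inputs passed to them are illustrative, not the measured data from the table.

```cpp
#include <cassert>
#include <cstdint>

// Metric 1: perf samples in the hot-prefixed <data> section as a share of
// all perf samples in <data>.* sections.
double hotCoverage(uint64_t samplesInHot, uint64_t samplesInAll) {
  return samplesInAll == 0 ? 0.0 : 100.0 * samplesInHot / samplesInAll;
}

// Metric 2: size of the unlikely-prefixed <data> section as a share of the
// summed size of all <data>.* sections.
double unlikelySizeShare(uint64_t unlikelyBytes, uint64_t totalBytes) {
  return totalBytes == 0 ? 0.0 : 100.0 * unlikelyBytes / totalBytes;
}
```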
2025-03-28 16:31:46 -07:00
Kazu Hirata
673f4705a8
[llvm] Use *Set::insert_range (NFC) (#133353)
We can use *Set::insert_range to collapse:

  for (auto Elem : Range)
    Set.insert(Elem.first);

down to:

  Set.insert_range(llvm::make_first_range(Range));

In some cases, we can further fold that into the set declaration.
2025-03-27 20:44:20 -07:00
Walter Lee
5b7fd708fe
Revert "[DAG] visitEXTRACT_SUBVECTOR - accumulate SimplifyDemandedVectorElts demanded elts across all EXTRACT_SUBVECTOR uses" (#133331)
Reverts llvm/llvm-project#133130

This touches a file in common with #133083, which is causing failures.
2025-03-27 18:36:38 -04:00
Philip Reames
c90a536bcf
[CodeGen] Simplify code using TypeSize overloads of getMachineMemOperand [nfc]
These were added in d584cea.  This change runs through existing uses and
simplifies where obvious.
2025-03-27 11:47:51 -07:00
Simon Pilgrim
a8575b3ea8
[DAG] visitEXTRACT_SUBVECTOR - accumulate SimplifyDemandedVectorElts demanded elts across all EXTRACT_SUBVECTOR uses (#133130)
Similar to what is done for visitEXTRACT_VECTOR_ELT - if all uses of a
vector are EXTRACT_SUBVECTOR, then determine the accumulated demanded
elts across all users and call SimplifyDemandedVectorElts in
"AssumeSingleUse" mode.
2025-03-27 15:31:06 +00:00
LU-JOHN
2df25a4733
Invalidate range metadata when folding bitcast into load (#133095) 2025-03-27 14:10:55 +07:00
Philip Reames
79e82b6f14
[RISCV] Use a precise size for MMO on scalable spill and fill (#133171)
The primary effect of this is that we get proper scalable sizes printed
by the assembler, but this may also enable proper aliasing analysis. I
don't see any test changes resulting from the latter.

Getting the size is slightly tricky as we store the scalable size as a
non-scalable quantity in the object size field for the frame index. We
really should remove that hack at some point...

For the synthetic tuple spills and fills, I dropped the size from the
split loads and stores to avoid incorrect (overly large) sizes. We could
also divide by the NF factor if we felt like writing the code to do so.
2025-03-26 18:25:59 -07:00
Ethan Kaji
a629b50575
Port NVPTXTargetLowering::LowerCONCAT_VECTORS to SelectionDAG (#120030)
Ports `NVPTXTargetLowering::LowerCONCAT_VECTORS` to
`llvm/lib/CodeGen/SelectionDAG` as requested in
https://github.com/llvm/llvm-project/issues/116695.
2025-03-27 07:40:35 +07:00
Craig Topper
6075275e68
[AsmPrinter] Don't pass Twine by value. NFC 2025-03-26 15:15:12 -07:00
Philip Reames
236f938ef6
[CodeGen] Provide a target independent default for optimizeLoadInst [NFC]
This just moves the x86 implementation into generic code since it appears
to be suitable for any target.  The heart of this transform is inside
foldMemoryOperand so other targets won't actually kick in until they
implement said API.  This just removes one piece to implement in the
process of enabling foldMemoryOperand.
2025-03-26 08:52:40 -07:00
dianqk
66f158d918
[TailDuplicator] Determine if computed gotos using blockaddress (#132536)
Using `blockaddress` should be more reliable than determining if an
operand comes from a jump table index.

Alternative: Add the `MachineInstr::MIFlag::ComputedGoto` flag when
lowering `indirectbr`. But I don't think this approach is suitable for
backporting.
2025-03-26 21:27:43 +08:00
Tom Tromey
f89129af8a
Add bit stride to DICompositeType (#131680)
In Ada, an array can be packed and the elements can take less space than
their natural object size. For example, for this type:

   type Packed_Array is array (4 .. 8) of Boolean;
   pragma pack (Packed_Array);

... each element of the array occupies a single bit, even though the
"natural" size for a Boolean in memory is a byte.

In DWARF, this is represented by putting a DW_AT_bit_stride onto the
array type itself.

This patch adds a bit stride to DICompositeType so that gnat-llvm can
emit DWARF for these sorts of arrays.
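With a bit stride, a consumer locates element `i` at bit offset `(i - lower_bound) * bit_stride` from the start of the array, so `Packed_Array (4 .. 8)` with a stride of 1 occupies 5 bits. A minimal sketch of that lookup; `readPackedBool` is hypothetical, and it assumes LSB-first bit numbering within each byte, which in practice depends on the target and compiler.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Read element `index` of a packed Boolean array whose lower bound is
// `firstIndex` and whose elements are `bitStride` bits apart.
// Assumption: bit 0 of byte 0 holds the first element (LSB-first).
bool readPackedBool(const std::vector<uint8_t>& storage, long firstIndex,
                    long index, unsigned bitStride) {
  const uint64_t bitOffset =
      static_cast<uint64_t>(index - firstIndex) * bitStride;
  const uint8_t byte = storage[bitOffset / 8];
  return (byte >> (bitOffset % 8)) & 1;
}
```

Under that assumption, a single byte `0x15` (binary `00010101`) holds the values True, False, True, False, True for indices 4 through 8.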
2025-03-25 17:14:07 -07:00
LU-JOHN
70aeb89094
Calculate KnownBits from Metadata correctly for vector loads (#128908)
Calculate KnownBits correctly from metadata for vector loads.

---------

Signed-off-by: John Lu <John.Lu@amd.com>
2025-03-25 22:46:30 +07:00
Jonathan Cohen
6785951410
[Machine-Combiner] Add a pass to reassociate chains of accumulation instructions into a tree (#132728)
This pass is designed to increase ILP by performing accumulation into
multiple registers. It currently supports only the S/UABAL accumulation
instruction, but can be extended to support additional instructions.

Reland of #126060, which was reverted due to a conflict with #131272.
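The idea can be illustrated at source level. S/UABAL adds a widened absolute difference into an accumulator, so a loop of them forms one serial dependency chain; splitting the sum across several accumulators lets those additions proceed independently, then a small tree combines them. The C++ below is only an analogue of what the pass does on AArch64 machine instructions, with hypothetical function names; because unsigned addition is associative modulo 2^32, both forms give identical results.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// |a - b| widened to 32 bits: the value an absolute-difference-and-
// accumulate-long step adds into its accumulator.
inline uint32_t absDiff(uint8_t a, uint8_t b) {
  return static_cast<uint32_t>(a > b ? a - b : b - a);
}

// Serial form: every addition depends on the previous value of acc.
uint32_t sadChain(const std::vector<uint8_t>& x, const std::vector<uint8_t>& y) {
  uint32_t acc = 0;
  for (size_t i = 0; i < x.size(); ++i)
    acc += absDiff(x[i], y[i]);
  return acc;
}

// Reassociated form: four independent accumulators, combined in a tree.
uint32_t sadTree(const std::vector<uint8_t>& x, const std::vector<uint8_t>& y) {
  uint32_t acc[4] = {0, 0, 0, 0};
  size_t i = 0;
  for (; i + 4 <= x.size(); i += 4)
    for (size_t k = 0; k < 4; ++k)
      acc[k] += absDiff(x[i + k], y[i + k]);
  for (; i < x.size(); ++i)  // leftover tail elements
    acc[0] += absDiff(x[i], y[i]);
  return (acc[0] + acc[1]) + (acc[2] + acc[3]);
}
```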
2025-03-25 15:58:20 +02:00
Simon Pilgrim
0237216f16
[DAG] canCreateUndefOrPoison - add EXTRACT_SUBVECTOR handling (#132745)
Similar to INSERT_SUBVECTOR - the index is constant and will be in bounds.
2025-03-24 16:03:47 +00:00
Kazu Hirata
1904241a9e
[CodeGen] Avoid repeated hash lookups (NFC) (#132658) 2025-03-24 07:46:35 -07:00
Pierre van Houtryve
c457c88951
[GlobalISel] Combine (sext (trunc x)) to (sext_inreg x) (#131622)
Split from #131312
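When the sign extension returns to the original width, truncating and then sign-extending is the same as sign-extending the low bits in place, which is what sext_inreg expresses. The sketch below checks this for an i32 value truncated to i8; the function names are hypothetical, and it relies on C++20's two's-complement rules for the narrowing casts and the arithmetic right shift.

```cpp
#include <cassert>
#include <cstdint>

// (sext (trunc x to i8) to i32): the two-step form.
int32_t sextOfTrunc(int32_t x) {
  return static_cast<int32_t>(static_cast<int8_t>(x));
}

// (sext_inreg x, i8): sign-extend from bit 7 without changing the type.
// Move the low byte to the top, then arithmetic-shift it back down.
int32_t sextInReg8(int32_t x) {
  return static_cast<int32_t>(static_cast<uint32_t>(x) << 24) >> 24;
}
```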
2025-03-24 09:32:04 +01:00
Pierre van Houtryve
6e3c24fc0a
[DAG] Combine (sext (sext_in_reg x)) to (sext_in_reg (any_extend x)) (#132386) 2025-03-24 09:31:02 +01:00
Antonio Frighetto
ade2276517
[RegAllocFast] Ensure live-in vregs get reloaded after INLINEASM_BR spills
We have already ensured in 9cec2b246e719533723562950e56c292fe5dd5ad
that `INLINEASM_BR` output operands get spilled onto the stack, both
in the fallthrough path and in the indirect targets. Since reloads of
live-in values into physical registers contextually happen after all
MIR instructions (and ops) have been visited, make sure such loads are
placed at the start of the block, but after prologues or `INLINEASM_BR`
spills, as otherwise this may cause stale values to be read from the
stack.

Fixes: #74483, #110251.
2025-03-24 09:19:53 +01:00
Fangrui Song
7e6d008023
AsmPrinter: Remove unneeded lowerRelativeReference overrides
The function is only called by AsmPrinter, where there is a fallback
when lowerRelativeReference returns nullptr.

wasm and XCOFF could use the fallback code.

(lowerRelativeReference was introduced in 2016 (https://reviews.llvm.org/D17938)
for C++ relative vtables, but C++ relative vtables ended up using
dso_local_equivalent. llvm/test/MC/COFF/cross-section-relative.ll also
uses this.)
2025-03-23 23:58:41 -07:00
Akshat Oke
174110bf3c
[CodeGen][NPM] Port LiveDebugValues to NPM (#131563) 2025-03-24 11:34:45 +05:30
Kazu Hirata
1019457891
[CodeGen] Use *Set::insert_range (NFC) (#132651)
We can use *Set::insert_range to collapse:

  for (auto Elem : Range)
    Set.insert(Elem);

down to:

  Set.insert_range(Range);
2025-03-23 21:20:44 -07:00
Mingming Liu
3b20ac00f9
[NFC]Don't use else after a return (#132644)
A trivial code clean-up per
https://llvm.org/docs/CodingStandards.html#don-t-use-else-after-a-return
2025-03-23 18:34:52 -07:00
Kazu Hirata
41b76119ec
[llvm] Use range constructors for *Set (NFC) (#132636) 2025-03-23 15:50:34 -07:00