37497 Commits

Author SHA1 Message Date
3405691582
c180e249d0
Fix crash lowering stack guard on OpenBSD/aarch64. (#125416)
TargetLoweringBase::getIRStackGuard refers to a platform-specific guard
variable. Before this change, TargetLoweringBase::getSDagStackGuard only
referred to a different variable.

This means that SelectionDAGBuilder's getLoadStackGuard does not get
memory operands. However, AArch64InstrInfo::expandPostRAPseudo assumes
that the passed MachineInstr has nonzero memoperands, causing a
segfault.

We have two possible options here: either disabling the LOAD_STACK_GUARD
node entirely in AArch64TargetLowering::useLoadStackGuardNode or just
making the platform-specific values match across TargetLoweringBase.
Here, we try the latter.
2025-03-31 09:17:55 -07:00
Rahul Joshi
74b7abf154
[IRBuilder] Add new overload for CreateIntrinsic (#131942)
Add a new `CreateIntrinsic` overload with no `Types`, useful for
creating calls to non-overloaded intrinsics that don't need additional
mangling.
2025-03-31 08:10:34 -07:00
Tom Tromey
68947342b7
Add support for fixed-point types (#129596)
This adds DWARF generation for fixed-point types. This feature is needed
by Ada.

Note that a pre-existing GNU extension is used in one case. This has
been emitted by GCC for years, and is needed because standard DWARF is
otherwise incapable of representing these types.
2025-03-31 07:42:21 -07:00
Simon Pilgrim
9b32f3d096
[DAG] visitEXTRACT_SUBVECTOR - don't return early on failure of EXTRACT_SUBVECTOR(INSERT_SUBVECTOR()) -> BITCAST fold (#133695)
Always allow later folds to try to match as well.
2025-03-31 14:32:43 +01:00
Liqiang TAO
1f7f268f30
StackProtector: use isInTailCallPosition to verify tail call position (#68997)
The issue is caused by [D133860](https://reviews.llvm.org/D133860).
The guard would be inserted in wrong place in some cases, like the test
case showed below.
This patch fixed the issue by using `isInTailCallPosition()` to verify
whether the tail call is in right position.
2025-03-30 11:21:19 -07:00
Mingming Liu
9747bb182f
[CodeGen][StaticDataSplitter]Support constant pool partitioning (#129781)
This is a follow-up patch of
https://github.com/llvm/llvm-project/pull/125756

In this PR, static-data-splitter pass produces the aggregated profile
counts of constants for constant pools in a global state
(`StateDataProfileInfo`), and asm printer consumes the profile counts to
produce `.hot` or `.unlikely` prefixes.

This implementation covers both x86 and aarch64 asm printer.
2025-03-29 22:07:56 -07:00
Kazu Hirata
e3a3f78f35
[CodeGen] Use llvm::append_range (NFC) (#133603) 2025-03-29 16:53:02 -07:00
Fangrui Song
fe6fb910df
[RISCV] Replace @plt/@gotpcrel in data directives with %pltpcrel %gotpcrel
clang -fexperimental-relative-c++-abi-vtables might generate `@plt` and
`@gotpcrel` specifiers in data directives. The syntax is not used in
humand-written assembly code, and is not supported by GNU assembler.
Note: the `@plt` in `.word foo@plt` is different from
the legacy `call func@plt` (where `@plt` is simply ignored).

The `@plt` syntax was selected was simply due to a quirk of AsmParser:
the syntax was supported by all targets until I updated it
to be an opt-in feature in a0671758eb6e52a758bd1b096a9b421eec60204c

RISC-V favors the `%specifier(expr)` syntax following MIPS and Sparc,
and we should follow this convention.

This PR adds support for `.word %pltpcrel(foo+offset)` and
`.word %gotpcrel(foo)`, and drops `@plt` and `@gotpcrel`.

* MCValue::SymA can no longer have a SymbolVariant. Add an assert
  similar to that of AArch64ELFObjectWriter.cpp before
  https://reviews.llvm.org/D81446 (see my analysis at
  https://maskray.me/blog/2025-03-16-relocation-generation-in-assemblers
  if intrigued)
* `jump foo@plt, x31` now has a different diagnostic.

Pull Request: https://github.com/llvm/llvm-project/pull/132569
2025-03-29 11:08:13 -07:00
Simon Pilgrim
666faa7fd9
[DAG] visitEXTRACT_SUBVECTOR - accumulate SimplifyDemandedVectorElts demanded elts across all EXTRACT_SUBVECTOR uses (REAPPLIED) (#133401)
Similar to what is done for visitEXTRACT_VECTOR_ELT - if all uses of a vector are EXTRACT_SUBVECTOR, then determine the accumulated demanded elts across all users and call SimplifyDemandedVectorElts in "AssumeSingleUse" use.

Second try after #133130 was reverted by #133331 due to it affecting reverted test files
2025-03-29 17:55:38 +00:00
Tim Gymnich
1d0005a69a
[GlobalISel][NFC] Rename GISelKnownBits to GISelValueTracking (#133466)
- rename `GISelKnownBits` to `GISelValueTracking` to analyze more than
just `KnownBits` in the future
2025-03-29 11:51:29 +01:00
Kazu Hirata
f915015a3e
[llvm] Remove extraneous calls to make_range (NFC) (#133551) 2025-03-28 19:56:02 -07:00
Kazu Hirata
d4427f308e
[llvm] Use range constructors of *Set (NFC) (#133549) 2025-03-28 19:55:18 -07:00
Mingming Liu
c8a70f4c6e
[CodeGen][StaticDataPartitioning]Place local-linkage global variables in hot or unlikely prefixed sections based on profile information (#125756)
In this PR, static-data-splitter pass finds out the local-linkage global
variables in {`.rodata`, `.data.rel.ro`, `bss`, `.data`} sections by
analyzing machine instruction operands, and aggregates their accesses
from code across functions.

A follow-up item is to analyze global variable initializers and count
for access from data.
* This limitation is demonstrated by `bss2` and `data3` in
`llvm/test/CodeGen/X86/global-variable-partition.ll`.

Some stats of static-data-splitter with this patch:

**section**|**bss**|**rodata**|**data**
:-----:|:-----:|:-----:|:-----:
hot-prefixed section coverage|99.75%|97.71%|91.30%
unlikely-prefixed section size percentage|67.94%|39.37%|63.10%

1. The coverage is defined as `#perf-sample-in-hot-prefixed <data>
section / #perf-sample in <data.*> section` for each <data> section.
* The perf command samples
`MEM_INST_RETIRED.ALL_LOADS:u:pinned:precise=2` events at a high
frequency (`perf -c 2251`) for 30 seconds. The profiled binary is built
as non-PIE so `data.rel.ro` coverage data is not available.
2. The unlikely-prefixed `<data>` section size percentage is defined as
`unlikely <data> section size / the sum size of <data>.* sections` for
each `<data>` section
2025-03-28 16:31:46 -07:00
Kazu Hirata
673f4705a8
[llvm] Use *Set::insert_range (NFC) (#133353)
We can use *Set::insert_range to collapse:

  for (auto Elem : Range)
    Set.insert(E.first);

down to:

  Set.insert_range(llvm::make_first_range(Range));

In some cases, we can further fold that into the set declaration.
2025-03-27 20:44:20 -07:00
Walter Lee
5b7fd708fe
Revert "[DAG] visitEXTRACT_SUBVECTOR - accumulate SimplifyDemandedVectorElts demanded elts across all EXTRACT_SUBVECTOR uses" (#133331)
Reverts llvm/llvm-project#133130

This touches a common file as #133083, which is causing failures
2025-03-27 18:36:38 -04:00
Philip Reames
c90a536bcf [CodeGen] Simplify code using TypeSize overloads of getMachineMemOperand [nfc]
These were added in d584cea.  This change runs through existing uses and
simplifies where obvious.
2025-03-27 11:47:51 -07:00
Simon Pilgrim
a8575b3ea8
[DAG] visitEXTRACT_SUBVECTOR - accumulate SimplifyDemandedVectorElts demanded elts across all EXTRACT_SUBVECTOR uses (#133130)
Similar to what is done for visitEXTRACT_VECTOR_ELT - if all uses of a
vector are EXTRACT_SUBVECTOR, then determine the accumulated demanded
elts across all users and call SimplifyDemandedVectorElts in
"AssumeSingleUse" use.
2025-03-27 15:31:06 +00:00
LU-JOHN
2df25a4733
Invalidate range metadata when folding bitcast into load (#133095) 2025-03-27 14:10:55 +07:00
Philip Reames
79e82b6f14
[RISCV] Use a precise size for MMO on scalable spill and fill (#133171)
The primary effect of this is that we get proper scalable sizes printed
by the assembler, but this may also enable proper aliasing analysis. I
don't see any test changes resulting from the later.

Getting the size is slightly tricky as we store the scalable size as a
non-scalable quantity in the object size field for the frame index. We
really should remove that hack at some point...

For the synthetic tuple spills and fills, I dropped the size from the
split loads and stores to avoid incorrect (overly large) sizes. We could
also divide by the NF factor if we felt like writing the code to do so.
2025-03-26 18:25:59 -07:00
Ethan Kaji
a629b50575
Port NVPTXTargetLowering::LowerCONCAT_VECTORS to SelectionDAG (#120030)
Ports `NVPTXTargetLowering::LowerCONCAT_VECTORS` to
`llvm/lib/CodeGen/SelectionDAG` as requested in
https://github.com/llvm/llvm-project/issues/116695.
2025-03-27 07:40:35 +07:00
Craig Topper
6075275e68 [AsmPrinter] Don't pass Twine by value. NFC 2025-03-26 15:15:12 -07:00
Philip Reames
236f938ef6 [CodeGen] Provide a target independent default for optimizeLoadInst [NFC]
This just moves the x86 implementation into generic code since it appears
to be suitable for any target.  The heart of this transform is inside
foldMemoryOperand so other targets won't actually kick in until they
implement said API.  This just removes one piece to implement in the
process of enabling foldMemoryOperand.
2025-03-26 08:52:40 -07:00
dianqk
66f158d918
[TailDuplicator] Determine if computed gotos using blockaddress (#132536)
Using `blockaddress` should be more reliable than determining if an
operand comes from a jump table index.

Alternative: Add the `MachineInstr::MIFlag::ComputedGoto` flag when
lowering `indirectbr`. But I don't think this approach is suitable to
backport.
2025-03-26 21:27:43 +08:00
Tom Tromey
f89129af8a
Add bit stride to DICompositeType (#131680)
In Ada, an array can be packed and the elements can take less space than
their natural object size. For example, for this type:

   type Packed_Array is array (4 .. 8) of Boolean;
   pragma pack (Packed_Array);

... each element of the array occupies a single bit, even though the
"natural" size for a Boolean in memory is a byte.

In DWARF, this is represented by putting a DW_AT_bit_stride onto the
array type itself.

This patch adds a bit stride to DICompositeType so that gnat-llvm can
emit DWARF for these sorts of arrays.
2025-03-25 17:14:07 -07:00
LU-JOHN
70aeb89094
Calculate KnownBits from Metadata correctly for vector loads (#128908)
Calculate KnownBits correctly from metadata for vector loads.

---------

Signed-off-by: John Lu <John.Lu@amd.com>
2025-03-25 22:46:30 +07:00
Jonathan Cohen
6785951410
[Machine-Combiner] Add a pass to reassociate chains of accumulation instructions into a tree (#132728)
This pass is designed to increase ILP by performing accumulation into
multiple registers. It currently supports only the S/UABAL accumulation
instruction, but can be extended to support additional instructions.

Reland of  #126060 which was reverted due to a conflict with #131272.
2025-03-25 15:58:20 +02:00
Simon Pilgrim
0237216f16
[DAG] canCreateUndefOrPoison - add EXTRACT_SUBVECTOR handling (#132745)
Similar to INSERT_SUBVECTOR - the index is constant and will be inbounds
2025-03-24 16:03:47 +00:00
Kazu Hirata
1904241a9e
[CodeGen] Avoid repeated hash lookups (NFC) (#132658) 2025-03-24 07:46:35 -07:00
Pierre van Houtryve
c457c88951
[GlobalISel] Combine (sext (trunc x)) to (sext_inreg x) (#131622)
Split from #131312
2025-03-24 09:32:04 +01:00
Pierre van Houtryve
6e3c24fc0a
[DAG] Combine (sext (sext_in_reg x)) to (sext_in_reg (any_extend x)) (#132386) 2025-03-24 09:31:02 +01:00
Antonio Frighetto
ade2276517 [RegAllocFast] Ensure live-in vregs get reloaded after INLINEASM_BR spills
We have already ensured in 9cec2b246e719533723562950e56c292fe5dd5ad
that `INLINEASM_BR` output operands get spilled onto the stack, both
in the fallthrough path and in the indirect targets. Since reloads of
live-ins values into physical registers contextually happen after all
MIR instructions (and ops) have been visited, make sure such loads are
placed at the start of the block, but after prologues or `INLINEASM_BR`
spills, as otherwise this may cause stale values to be read from the
stack.

Fixes: #74483, #110251.
2025-03-24 09:19:53 +01:00
Fangrui Song
7e6d008023 AsmPrinter: Remove unneeded lowerRelativeReference overrides
The function is only called by AsmPrinter, where there is a fallback
when lowerRelativeReference returns nullptr.

wasm and XCOFF could use the fallback code.

(lowerRelativeReference was introduced in 2016 (https://reviews.llvm.org/D17938)
for C++ relative vtables, but C++ relative vtables ended up using
dso_local_equivalent. llvm/test/MC/COFF/cross-section-relative.ll also
uses this.)
2025-03-23 23:58:41 -07:00
Akshat Oke
174110bf3c
[CodeGen][NPM] Port LiveDebugValues to NPM (#131563) 2025-03-24 11:34:45 +05:30
Kazu Hirata
1019457891
[CodeGen] Use *Set::insert_range (NFC) (#132651)
We can use *Set::insert_range to collapse:

  for (auto Elem : Range)
    Set.insert(E);

down to:

  Set.insert_range(Range);
2025-03-23 21:20:44 -07:00
Mingming Liu
3b20ac00f9
[NFC]Don't use else after a return (#132644)
A trivial code clean-up per
https://llvm.org/docs/CodingStandards.html#don-t-use-else-after-a-return
2025-03-23 18:34:52 -07:00
Kazu Hirata
41b76119ec
[llvm] Use range constructors for *Set (NFC) (#132636) 2025-03-23 15:50:34 -07:00
Fangrui Song
dfae1f968e MCValue: Simplify code with getSubSym 2025-03-23 12:22:44 -07:00
Fangrui Song
b73e144bdf MCValue: Simplify code with getSubSym
MCValue::SymB is a MCSymbolRefExpr *, which might become MCSymbol * in
the future. Simplify some code that uses MCValue::SymB.
2025-03-23 12:13:13 -07:00
Jonathan Cohen
7bda9caa49
Revert "[AArch64][MachineCombiner] Recombine long chains of accumulation instructions into a tree to increase ILP (#126060) (#132607)
This reverts commit c4caf949aa934a219e84d4ba0530bd535e698cdb.
2025-03-23 13:58:00 +02:00
Jonathan Cohen
c4caf949aa
[AArch64][MachineCombiner] Recombine long chains of accumulation instructions into a tree to increase ILP (#126060)
This pattern shows up often in media libraries. The optimization should only
kick in for O3. Currently only supports a single family of accumulation
instructions, but can easily be expanded to support additional
instructions in the future.
2025-03-23 13:25:35 +02:00
Kazu Hirata
f3e8e80563
[llvm] Construct SmallVector with ArrayRef (NFC) (#132560) 2025-03-22 13:11:31 -07:00
Kazu Hirata
cb729be11c
[CodeGen] Avoid repeated hash lookups (NFC) (#132513) 2025-03-22 08:08:28 -07:00
Kazu Hirata
1b189cab5e
[llvm] Use *Set::insert_range (NFC) (#132509)
DenseSet, SmallPtrSet, SmallSet, SetVector, and StringSet recently
gained C++23-style insert_range.  This patch uses insert_range in
conjunction with llvm::{predecessors,successors} and
MachineBasicBlock::{predecessors,successors}.
2025-03-22 08:07:33 -07:00
Mikhail R. Gadelha
f138e36d52
[SelectionDAG][RISCV] Avoid store merging across function calls (#130430)
This patch improves DAGCombiner's handling of potential store merges by
detecting function calls between loads and stores. When a function call
exists in the chain between a load and its corresponding store, we avoid
merging these stores if the spilling is unprofitable.

We had to implement a hook on TLI, since TTI is unavailable in
DAGCombine. Currently, it's only enabled for riscv.

This is the DAG equivalent of PR #129258
2025-03-22 10:35:25 -03:00
Fangrui Song
4b417992dd [CodeGen] Rename PLTRelativeVariantKind. NFC
Migrate away from the deprecated MCSymbolRefExpr::VariantKind.
The name "Specifier" is utilized in a few *MCExpr.

> "Relocation specifier" is clear, aligns with Arm and IBM AIX's documentation, and fits the assembler's role seamlessly.
2025-03-21 23:02:08 -07:00
pzzp
d6a2cca77e
[llvm:ir] Add support for constant data exceeding 4GiB (#126481)
The test file is over 4GiB, which is too big, so I didn’t submit it.
2025-03-21 11:44:01 -07:00
Tony Varghese
ff9c5c334a
[shrinkwrap] PowerPC's FP register should be honored when processing the save point for prologue. (#129855)
When generating code for functions that have `__builtin_frame_address`
calls and `noinline` attribute, prologue was not emitted correctly
leading to an assertion failure in PowerPC. The issue was due to
improper insertion of prologue for a function that contain llvm
`__builtin_frame_address`.
Shrink-wrap pass computes the save and restore points of a function.
Default points are the entry and exit points of the function. During
shrink-wrapping the frame-pointer was not honored like the stack pointer
and it was considered as a callee-saved register. This change will treat
the FP similar to SP and will insert the prolog on top the instruction
containing FP.

---------

Co-authored-by: Tony Varghese <tony.varghese@ibm.com>
2025-03-21 12:55:39 -04:00
Kazu Hirata
67a631b406
[CodeGen] Avoid repeated hash lookups (NFC) (#132329) 2025-03-21 08:00:45 -07:00
Ryotaro Kasuga
857a04cd76
[MachinePipeliner] Fix incorrect handlings of unpipelineable insts (#126057)
There was a case where `normalizeNonPipelinedInstructions` didn't
schedule unpipelineable instructions correctly, which could generate
illegal code. This patch fixes this issue by rejecting the schedule if
we fail to insert the unpipelineable instructions at stage 0.

Here is a part of the debug output for `sms-unpipeline-insts3.mir`
before applying this patch.

```
SU(0):   %27:gpr32 = PHI %21:gpr32all, %bb.3, %28:gpr32all, %bb.4
  Successors:
    SU(14): Data Latency=0 Reg=%27
    SU(15): Anti Latency=1

...

SU(14):   %41:gpr32 = ADDWrr %27:gpr32, %12:gpr32common
  Predecessors:
    SU(0): Data Latency=0 Reg=%27
    SU(16): Ord  Latency=0 Artificial
  Successors:
    SU(15): Data Latency=1 Reg=%41
SU(15):   %28:gpr32all = COPY %41:gpr32
  Predecessors:
    SU(14): Data Latency=1 Reg=%41
    SU(0): Anti Latency=1
SU(16):   %30:ppr = WHILELO_PWW_S %27:gpr32, %15:gpr32, implicit-def $nzcv
  Predecessors:
    SU(0): Data Latency=0 Reg=%27
  Successors:
    SU(14): Ord  Latency=0 Artificial

...

Do not pipeline SU(16)
Do not pipeline SU(1)
Do not pipeline SU(0)
Do not pipeline SU(15)
Do not pipeline SU(14)
SU(0) is not pipelined; moving from cycle 19 to 0 Instr: ...
SU(1) is not pipelined; moving from cycle 10 to 0 Instr: ...
SU(15) is not pipelined; moving from cycle 28 to 19 Instr: ...
SU(16) is not pipelined; moving from cycle 19 to 0 Instr: ...
Schedule Found? 1 (II=10)

...

cycle 9 (1) (14) %41:gpr32 = ADDWrr %27:gpr32, %12:gpr32common

cycle 9 (1) (15) %28:gpr32all = COPY %41:gpr32
```

The SUs are traversed in the order of the original basic block, so in
this case a new cycle of each instruction is determined in the order of
`SU(0)`, `SU(1)`, `SU(14)`, `SU(15)`, `SU(16)`. Since there is an
artificial dependence from `SU(16)` to `SU(14)`, which is contradict to
the original SU order, the new cycle of `SU(14)` must be greater than or
equal to the cycle of `SU(16)` at that time. This results in the failure
of scheduling `SU(14)` at stage 0. For now, we reject the schedule for
such cases.
2025-03-21 23:07:41 +09:00
Kazu Hirata
599005686a
[llvm] Use *Set::insert_range (NFC) (#132325)
DenseSet, SmallPtrSet, SmallSet, SetVector, and StringSet recently
gained C++23-style insert_range.  This patch replaces:

  Dest.insert(Src.begin(), Src.end());

with:

  Dest.insert_range(Src);

This patch does not touch custom begin like succ_begin for now.
2025-03-20 22:24:06 -07:00