38594 Commits

Author SHA1 Message Date
Matt Arsenault
ad8f6b44be
DAG: Avoid some libcall string name comparisons (#166321)
Move to the libcall impl based functions.
2025-11-05 07:09:02 -08:00
Santanu Das
63d6e3eb46
[DebugInfo] Assign best possible debugloc to bundle (#164573)
The debug location attached to a BUNDLE is that of the first instruction
in the bundle, even if a later instruction in the bundle carries a
better debug location (line:column). This patch first looks for a better
debug location. If none is found, a worse debug location without a line
number is chosen.
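
A minimal sketch of the selection rule described above, assuming "better" means "has a line number". `Loc` and `pickBundleLoc` are hypothetical stand-ins for illustration and are not LLVM's `DebugLoc` or the code in the patch.

```cpp
#include <optional>
#include <vector>

// Hypothetical stand-in for a debug location; not LLVM's DebugLoc.
struct Loc {
  unsigned Line = 0; // 0 means "no line number"
  unsigned Col = 0;
};

// Pick the location for a bundle: prefer the first location that has a
// line number, otherwise fall back to the first location present (if any).
std::optional<Loc> pickBundleLoc(const std::vector<std::optional<Loc>> &Instrs) {
  std::optional<Loc> Fallback;
  for (const auto &L : Instrs) {
    if (!L)
      continue; // instruction carries no debug location at all
    if (L->Line != 0)
      return L; // a location with a line number wins immediately
    if (!Fallback)
      Fallback = L; // remember the first line-less location
  }
  return Fallback;
}
```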

---------

Co-authored-by: Vladislav Dzhidzhoev <dzhidzhoev@gmail.com>
Co-authored-by: Orlando Cazalet-Hyams <orlandoch.och@gmail.com>
2025-11-05 20:26:00 +05:30
Jan Patrick Lehr
833983918d
Revert "CodeGen: Record MMOs in finalizeBundle" (#166520)
Reverts llvm/llvm-project#166210

Buildbot failures in the libc on GPU bot:
https://lab.llvm.org/buildbot/#/builders/10/builds/16711
2025-11-05 11:11:08 +01:00
Nicolai Hähnle
304d2ff4d9
CodeGen: Record MMOs in finalizeBundle (#166210)
This allows more accurate alias analysis to apply at the bundle level.
This has a bunch of minor effects in post-RA scheduling that look mostly
beneficial to me, all of them in AMDGPU (the Thumb2 change is cosmetic).

The pre-existing (and unchanged) test in
CodeGen/MIR/AMDGPU/custom-pseudo-source-values.ll tests that MIR with a
bundle with MMOs can be parsed successfully.

v2:
- use cloneMergedMemRefs
- add another test to explicitly check the MMO bundling behavior

v3:
- use poison instead of undef to initialize the global variable in the
test
2025-11-05 06:56:19 +00:00
Vigneshwar Jayakumar
b5f200129a
[CodeGen] Register-coalescer remat fix subreg liveness (#165662)
This is a bugfix in rematerialization, where the liveness of the subreg
mask was updated incorrectly, causing a crash in the scheduler.
2025-11-04 22:40:40 -06:00
Abhay Kanhere
d998f92a00
[CodeGen] MachineVerifier to check early-clobber constraint (#151421)
Currently MachineVerifier does not verify the early-clobber operand
constraint.
The only other machine operand constraint, TiedTo, is already verified.
2025-11-04 18:39:31 -08:00
Nicolai Hähnle
d6fdfe0a27
CodeGen: Record tied virtual register operands in finalizeBundle (#166209)
This is in preparation of a future AMDGPU change where we are going to
create bundles before register allocation and want to rely on the
TwoAddressInstructionPass handling those bundles correctly.

v2:
- simplify the virtual register check and the test
2025-11-05 02:18:39 +00:00
Jin Huang
fa5cd27ef0
[profcheck] Add unknown branch weights to expand LL/SC loop. (#166273)
As a follow-up to PR #165841, this change addresses `prof_md` metadata
loss in AtomicExpandPass when lowering `atomicrmw xchg` to a
Load-Linked/Store-Conditional (LL/SC) loop.

This path is distinct from the LSE path addressed previously:

PR #165841 (and its tests) used `-mtriple=aarch64-linux-gnu`, which
targets a modern **ARMv8.1+** architecture. This architecture supports
**Large System Extensions (LSE)**, allowing `atomicrmw` to be lowered
directly to a more efficient hardware instruction.

This PR (and its tests) uses `-mtriple=aarch64--` or
`-mtriple=armv8-linux-gnueabihf`. This indicates an ARMv8.0 or lower
architecture that does not support LSE. On these targets, the pass must
fall back to synthesizing a manual LL/SC loop using the `ldaxr/stxr`
instruction pair.

Similar to the previous issue, the new conditional branch was failing to
inherit the `prof_md` metadata. This PR attaches branch weights to the
newly created branch within the LL/SC loop, ensuring profile information
is preserved.
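
As a source-level analogue only (this is not what AtomicExpandPass emits), an exchange written as a retry loop has the same shape as an LL/SC loop: the loop's back edge is the new conditional branch that needs branch weights. `exchangeViaRetryLoop` is a hypothetical name used for illustration.

```cpp
#include <atomic>

// Exchange implemented as a retry loop. compare_exchange_weak is allowed
// to fail spuriously, which is what lets it map onto an LL/SC pair on
// targets without a single-instruction atomic exchange.
int exchangeViaRetryLoop(std::atomic<int> &Target, int NewValue) {
  int Old = Target.load(std::memory_order_relaxed);
  // On failure, Old is refreshed with the current value and we retry.
  while (!Target.compare_exchange_weak(Old, NewValue,
                                       std::memory_order_acq_rel,
                                       std::memory_order_relaxed)) {
  }
  return Old; // the value that was replaced
}
```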

Co-authored-by: Jin Huang <jingold@google.com>
2025-11-04 16:23:34 -08:00
Min-Yih Hsu
6d4e75cc93
[MISched][NFC] Rename isUnbufferedGroup to isReservedGroup (#166439)
In both ScheduleDAGInstrs and MachineScheduler, we refer to
`BufferSize = 0` as _reserved_ and `BufferSize = 1` as _unbuffered_. This
convention stems from the fact that we set `SUnit::hasReservedResource` to true when
any of the SUnit's consumed resources has BufferSize equal to zero; set
`SUnit::isUnbuffered` to true when any of its consumed resources has
BufferSize equal to one.

However, `SchedBoundary::isUnbufferedGroup` doesn't really follow this
convention: it returns true when the resource in question is a
`ProcResGroup` and its BufferSize equals **zero** rather than one.
This could be really confusing for the reader. This patch renames this
function to `isReservedGroup`, in line with the convention mentioned
above.
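
The naming convention can be summarized in a small sketch. `ResourceKind` and `classifyBufferSize` are hypothetical names for illustration and are not LLVM API.

```cpp
// Hypothetical classification helper mirroring the convention above.
enum class ResourceKind { Reserved, Unbuffered, Other };

ResourceKind classifyBufferSize(int BufferSize) {
  if (BufferSize == 0)
    return ResourceKind::Reserved; // "reserved" means BufferSize == 0
  if (BufferSize == 1)
    return ResourceKind::Unbuffered; // "unbuffered" means BufferSize == 1
  return ResourceKind::Other; // any other size is outside this convention
}
```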

NFC.
2025-11-04 16:21:37 -08:00
Grigory Pastukhov
7398591148
[CodeGen] Add skipFunction() check to MachineFunctionSplitter (#166260)
MachineFunctionSplitter was missing a skipFunction() check, causing it
to incorrectly split functions that should be skipped (e.g., functions
with optnone attribute).

This patch adds an early skipFunction() check in runOnMachineFunction()
to ensure these functions are never split, regardless of profile data
availability or other splitting conditions.
2025-11-04 11:01:50 -08:00
Matt Arsenault
831e79adff
DAG: Merge all sincos_stret emission code into legalizer (#166295)
This avoids AArch64 legality rules depending on libcall
availability.

ARM, AArch64, and X86 all had custom lowering of fsincos which
all were just to emit calls to sincos_stret / sincosf_stret. This
messes with the cost heuristics around legality, because really
it's an expand/libcall cost and not a favorable custom.

This is a bit ugly, because we're emitting code trying to match the
C ABI lowered IR type for the aggregate return type. This now also
gives an easy way to lift the unhandled x86_32 darwin case, since
ARM already handled the return as sret case.
2025-11-04 10:20:00 -08:00
Alex Voicu
2286118e6f
[SPIRV] Enable bfloat16 arithmetic (#166031)
Enable the `SPV_INTEL_bfloat16_arithmetic` extension, which allows arithmetic, relational and `OpExtInst` instructions to take `bfloat16` arguments. This patch only adds support to arithmetic and relational ops. The extension itself is rather fresh, but `bfloat16` is ubiquitous at this point and not supporting these ops is limiting.
2025-11-04 18:10:26 +02:00
Matt Arsenault
3c2c9d5bc1
DAG: Cleanup string bool attribute check for disable-tail-calls (#166237)
2025-11-03 14:18:04 -08:00
Laxman Sole
6fe3eccdf4
[llvm][DebugInfo] Emit 0/1 for constant boolean values (#151225)
Previously, sign-extending a 1-bit boolean operand in `#DBG_VALUE` would
convert `true` to -1 (i.e., 0xffffffffffffffff). However, DWARF treats
booleans as unsigned values, so this resulted in the attribute
`DW_AT_const_value(0xffffffffffffffff)` being emitted. As a result, the
debugger would display the value as `255` instead of `true`.

This change modifies the behavior to use zero-extension for 1-bit values
instead, ensuring that `true` is represented as 1. Consequently, the
DWARF attribute emitted is now `DW_AT_const_value(1)`, which allows the
debugger to correctly display the boolean as `true`.
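
The difference between the two extensions can be sketched in plain C++. The helper names are hypothetical; the sketch only illustrates the arithmetic, not the DWARF emission code.

```cpp
#include <cstdint>

// Widen a 1-bit value to 64 bits by replicating the single bit
// (sign extension): a set bit becomes all ones.
uint64_t signExtend1Bit(bool Bit) {
  return Bit ? ~uint64_t{0} : uint64_t{0};
}

// Widen a 1-bit value to 64 bits by padding with zeros (zero extension):
// a set bit stays 1.
uint64_t zeroExtend1Bit(bool Bit) {
  return Bit ? uint64_t{1} : uint64_t{0};
}
```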
2025-11-03 13:34:44 -08:00
Kazu Hirata
7db6344170
[CodeGen] Remove redundant declarations (NFC) (#166105)
In C++17, static constexpr members are implicitly inline, so they no
longer require an out-of-line definition.
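
A small sketch of the language rule, using a hypothetical `Limits` struct that is not taken from the patch:

```cpp
struct Limits {
  // Implicitly inline in C++17: this declaration is also the definition.
  static constexpr int MaxRegs = 32;
};

// Before C++17, odr-using the member (for example, binding it to a
// reference) needed this extra out-of-line definition in one .cpp file:
//   constexpr int Limits::MaxRegs;
// In C++17 that line is a redundant redeclaration.

// Returns the member by reference, which odr-uses it.
const int &maxRegsRef() { return Limits::MaxRegs; }
```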

Identified with readability-redundant-declaration.
2025-11-02 22:42:40 -08:00
Kazu Hirata
31b8ba5670
[Analysis, CodeGen] Use ArrayRef instead of const ArrayRef (NFC) (#166026)
This patch improves readability by using "ArrayRef<T>" instead of
"const ArrayRef<T>" and "const ArrayRef<T> &" in function parameter
types.
2025-11-01 23:20:19 -07:00
Kazu Hirata
b82bde695e
[Analysis, CodeGen] Use "= default" (NFC) (#166024)
Identified with modernize-use-equals-default.
2025-11-01 23:20:11 -07:00
wdx727
befae81fa2
Fix the usage issue of getRegMask. (#141215)
In the process of determining whether two MachineOperands are equal and
calculating the hash of a MachineOperand, both MO_RegisterMask and
MO_RegisterLiveOut types were uniformly handled. However, when the type
is MO_RegisterLiveOut, calling getRegMask() triggers an assertion
failure. This PR addresses this issue.
2025-11-01 21:55:08 -07:00
Craig Topper
06575b48ce
Revert "[LegalizeTypes] Use UpdateNodeOperands in SoftPromoteHalfOp_STACKMAP/PATCHPOINT. (#165927)"
This reverts commit 4357fcbbd5012369dbbbe50f99941147895d6611.

Causes a crash when combined with #165922.
2025-10-31 23:38:32 -07:00
Craig Topper
02fef973e9
[SelectionDAG][RISCV] Support STACK/PATCHPOINT in SoftenFloatOperand. (#165922)
Test float/double/half/bfloat on RISC-V without F extension.
2025-10-31 23:31:10 -07:00
Craig Topper
4357fcbbd5
[LegalizeTypes] Use UpdateNodeOperands in SoftPromoteHalfOp_STACKMAP/PATCHPOINT. (#165927)
2025-10-31 23:30:23 -07:00
Craig Topper
d310693bde
[SelectionDAG] Use GetPromotedInteger when promoting integer operands of PATCHPOINT/STACKMAP. (#165926)
This is consistent with other promotion, but causes negative constants
to be sign extended instead of zero extended in some cases.

I guess getNode and type legalizer are inconsistent about what
ANY_EXTEND of a constant does.
2025-10-31 22:11:13 +00:00
Fabian Ritter
8ea447b4c4
[SDAG] Set InBounds when computing offsets into memory objects (#165425)
When a load or store accesses N bytes starting from a pointer P, and we want to
compute an offset pointer within these N bytes after P, we know that the
arithmetic to add the offset must be inbounds. This is for example relevant
when legalizing too-wide memory accesses, when lowering memcpy&Co., or when
optimizing "vector-load -> extractelement" into an offset load.
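
To illustrate the legalization case, here is a sketch under a simplified model: a wide access is split into fixed-size pieces, and every piece starts at an offset strictly below the original access size. `pieceOffsets` is a hypothetical helper, not SelectionDAG code.

```cpp
#include <cstddef>
#include <vector>

// Byte offsets of the pieces when an access of AccessSize bytes is split
// into pieces of at most PieceSize bytes. Every offset is < AccessSize, so
// "base + offset" stays inside the bytes the original access touches.
std::vector<std::size_t> pieceOffsets(std::size_t AccessSize,
                                      std::size_t PieceSize) {
  std::vector<std::size_t> Offsets;
  if (PieceSize == 0)
    return Offsets; // nothing sensible to split into
  for (std::size_t Off = 0; Off < AccessSize; Off += PieceSize)
    Offsets.push_back(Off);
  return Offsets;
}
```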

For SWDEV-516125.
2025-10-31 11:27:55 +01:00
Michael Buch
10fbbb62ce
[llvm][DebugInfo][ObjC] Make sure we link backing ivars to their DW_TAG_APPLE_property (#165409)
Depends on:
* https://github.com/llvm/llvm-project/pull/165373

When an Objective-C property has a backing ivar, we would previously not
add a `DW_AT_APPLE_property` to the ivar's `DW_TAG_member`. This is what
was intended based on the [Objective-C DebugInfo
docs](https://github.com/llvm/llvm-project/blob/main/llvm/docs/SourceLevelDebugging.rst#proposal)
but is not what LLVM currently generates.

LLDB currently doesn't ever try linking the `ObjCPropertyDecl`s to their
`ObjCIvarDecl`s, but if we wanted to, this debug-info patch is a
pre-requisite.
2025-10-31 10:25:58 +00:00
Fabian Ritter
a85e84b854
[SDAG] Preserve InBounds in DAGCombines (#165424)
This PR preserves the InBounds flag (#162477) where possible in PTRADD-related
DAGCombines. We can't preserve them in all the cases that we could in the
analogous GISel change (#152495) because SDAG usually represents pointers as
integers, which means that pointer provenance is not preserved between PTRADD
operations (see the discussion at PR #162477 for more details). This PR marks
the places in the DAGCombiner where this is relevant explicitly.

For SWDEV-516125.
2025-10-31 10:25:39 +01:00
David Green
215aca4432
[GlobalISel] SBFX/UBFX does not create poison (#165675)
This adds G_SBFX/G_UBFX to the list of instructions that do not generate
poison, allowing freeze to be hoisted above one.
2025-10-31 09:18:07 +00:00
Rahman Lavaee
e9368a056d
[SHT_LLVM_BB_ADDR] Implement ELF and YAML support for Propeller CFG data in PGO analysis map. (#164914)
This PR implements the ELF support for PostLink CFG in PGO analysis map
as discussed in
[RFC](https://discourse.llvm.org/t/rfc-extending-the-pgo-analysis-map-with-propeller-cfg-frequencies/88617/2).

A later PR will implement the Codegen Support.
2025-10-30 13:12:06 -07:00
wdx727
fe52f1d77d
Adding Matching and Inference Functionality to Propeller-PR3: Read basic block hashes from propeller profile. (#164223)
Adding Matching and Inference Functionality to Propeller. For detailed
information, please refer to the following RFC:
https://discourse.llvm.org/t/rfc-adding-matching-and-inference-functionality-to-propeller/86238.
This is the third PR, which is used to read basic block hashes from the
propeller profile. The associated PRs are:
PR1: https://github.com/llvm/llvm-project/pull/160706
PR2: https://github.com/llvm/llvm-project/pull/162963

co-authors: lifengxiang1025
[lifengxiang@kuaishou.com](mailto:lifengxiang@kuaishou.com); zcfh
[wuminghui03@kuaishou.com](mailto:wuminghui03@kuaishou.com)

Co-authored-by: lifengxiang1025 <lifengxiang@kuaishou.com>
Co-authored-by: zcfh <wuminghui03@kuaishou.com>
2025-10-30 13:11:08 -07:00
Princeton Ferro
68e74f8f84
[DAGCombiner] Lower dynamic insertelt chain more efficiently (#162368)
For an insertelt with a dynamic index, the default handling in
DAGTypeLegalizer and LegalizeDAG will reserve a stack slot for the
vector, lower the insertelt to a store, then load the modified vector
back into temporaries. The vector store and load may be legalized into a
sequence of smaller operations depending on the target.

Let V = the vector size and L = the length of a chain of insertelts with
dynamic indices. In the worst case, this chain will lower to O(VL)
operations, which can increase code size dramatically.

Instead, identify such chains, reserve one stack slot for the vector,
and lower all of the insertelts to stores at once. This requires only
O(V + L) operations. This change only affects the default lowering
behavior.
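
A rough cost comparison, assuming a simplified model in which spilling or reloading a V-element vector costs V operations and each dynamic-index element store costs 1. The function names and the cost model are assumptions for illustration only.

```cpp
#include <cstddef>

// Default lowering: every insertelt spills the vector, stores one
// element, and reloads the vector, giving O(V * L) operations.
std::size_t opsPerInsertLowering(std::size_t V, std::size_t L) {
  return L * (V + 1 + V);
}

// Chain lowering: spill once, perform L element stores, reload once,
// giving O(V + L) operations.
std::size_t opsChainLowering(std::size_t V, std::size_t L) {
  return V + L + V;
}
```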
2025-10-29 09:46:01 -07:00
Orlando Cazalet-Hyams
aa5fe56db4
[DebugInfo] Add dataSize to DIBasicType to add DW_AT_bit_size to _BitInt types (#164372)
DW_TAG_base_type DIEs are permitted to have both byte_size and bit_size
attributes "If the value of an object of the given type does not fully
occupy the storage described by a byte size attribute"

* Add DataSizeInBits to DIBasicType (`DIBasicType(... dataSize: n ...)` in IR).
* Change Clang to add DataSizeInBits to _BitInt type metadata.
* Change LLVM to add DW_AT_bit_size to base_type DIEs that have non-zero
  DataSizeInBits.

TODO: Do we need to emit DW_AT_data_bit_offset for big endian targets?
See discussion on the PR.

Fixes [#61952](https://github.com/llvm/llvm-project/issues/61952)

---------

Co-authored-by: David Stenberg <david.stenberg@ericsson.com>
2025-10-29 15:23:46 +00:00
David Green
da15b8fc2e
[AArch64][GlobalISel] Add a constant funnel shift post-legalizer combine. (#151912)
We want to be able to produce extr instructions post-legalization. They
are legal for scalars, acting as a funnel shift with a constant shift
amount. Unfortunately I'm not sure if there is a way currently to
represent that in the legalization rules, but it might be useful for
several operations - to be able to treat and test operands with constant
operands as legal or not.

This adds a change to the existing matchOrShiftToFunnelShift so that
AArch64 can generate such instructions post-legalization providing that
the operation is scalar and the shift amount is constant.
2025-10-29 07:47:41 +00:00
Matt Arsenault
28e9a2832f
DAG: Consider __sincos_stret when deciding to form fsincos (#165169)
2025-10-28 08:28:09 -07:00
Shimin Cui
531fd45e92
[PPC] Set minimum of largest number of comparisons to use bit test for switch lowering (#155910)
Currently it is considered suitable to lower to a bit test for a set of
switch case clusters when the number of unique destinations
(`NumDests`) and the number of total comparisons (`NumCmps`) satisfy:
`(NumDests == 1 && NumCmps >= 3) || (NumDests == 2 && NumCmps >= 5) ||
(NumDests == 3 && NumCmps >= 6)`

However, in some cases on PowerPC, for example when NumDests is 3 and
each destination has 2 comparisons, it is not profitable to lower the
switch to a bit test. This patch adds an option to set the minimum of
the largest number of comparisons required to use a bit test for switch
lowering.
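
The quoted condition, plus one possible reading of the new option, can be sketched as follows. Both function names are hypothetical, and the second function reflects an assumption about the option's meaning (a lower bound on the comparison count of the busiest destination), not the patch's actual code.

```cpp
// The heuristic quoted above, as a standalone predicate.
bool enoughCasesForBitTest(unsigned NumDests, unsigned NumCmps) {
  return (NumDests == 1 && NumCmps >= 3) ||
         (NumDests == 2 && NumCmps >= 5) ||
         (NumDests == 3 && NumCmps >= 6);
}

// Assumed extra gate: also require that the destination with the most
// comparisons has at least MinLargestCmps of them.
bool enoughCasesWithMinimum(unsigned NumDests, unsigned NumCmps,
                            unsigned LargestCmps, unsigned MinLargestCmps) {
  return enoughCasesForBitTest(NumDests, NumCmps) &&
         LargestCmps >= MinLargestCmps;
}
```

Under this reading, the example from the message (three destinations with two comparisons each) passes the original heuristic but is rejected once the minimum is set to 3.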

---------

Co-authored-by: Shimin Cui <scui@xlperflep9.rtp.raleigh.ibm.com>
2025-10-28 10:24:32 -04:00
Lauren
e964acf85f
[DAG] Fold mismatched widened avg idioms to narrow form (#147946) (#163366)
[DAG] Fold mismatched widened avg idioms to narrow form (fixes half of
[llvm#147946](https://github.com/llvm/llvm-project/issues/147946))

1. `trunc(avgceilu(sext(x), sext(y))) -> avgceils(x, y)` 
2. `trunc(avgceils(zext(x), zext(y))) -> avgceilu(x, y)`

When inputs are sign-extended, unsigned and signed averaging operations
produce identical results after truncation, allowing us to use the
semantically correct narrow operation.
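
As a sanity check alongside the alive2 proof, both folds can be verified exhaustively for 8-bit inputs widened to 16 bits. The helpers below are hypothetical scalar models of the averaging nodes (rounding-up average, computed without overflow), not SelectionDAG code.

```cpp
#include <cstdint>

// floor(S / 2) for a possibly negative S, without shifting negatives.
static int floorHalf(int S) { return S >= 0 ? S / 2 : -((-S + 1) / 2); }

// Rounding-up averages computed in a wider type so the sum cannot overflow.
static uint8_t avgCeilU8(uint8_t A, uint8_t B) {
  return static_cast<uint8_t>((unsigned(A) + unsigned(B) + 1u) >> 1);
}
static int8_t avgCeilS8(int8_t A, int8_t B) {
  return static_cast<int8_t>(floorHalf(int(A) + int(B) + 1));
}
static uint16_t avgCeilU16(uint16_t A, uint16_t B) {
  return static_cast<uint16_t>((uint32_t(A) + uint32_t(B) + 1u) >> 1);
}
static int16_t avgCeilS16(int16_t A, int16_t B) {
  return static_cast<int16_t>(floorHalf(int(A) + int(B) + 1));
}

// Exhaustively check both folds for all pairs of 8-bit inputs.
bool avgFoldsHoldForI8() {
  for (int X = -128; X <= 127; ++X) {
    for (int Y = -128; Y <= 127; ++Y) {
      int8_t SX = static_cast<int8_t>(X), SY = static_cast<int8_t>(Y);
      uint8_t UX = static_cast<uint8_t>(SX), UY = static_cast<uint8_t>(SY);

      // Fold 1: trunc(avgceilu(sext(x), sext(y))) == avgceils(x, y)
      uint16_t WideU = avgCeilU16(static_cast<uint16_t>(int16_t(SX)),
                                  static_cast<uint16_t>(int16_t(SY)));
      if (static_cast<uint8_t>(WideU) !=
          static_cast<uint8_t>(avgCeilS8(SX, SY)))
        return false;

      // Fold 2: trunc(avgceils(zext(x), zext(y))) == avgceilu(x, y)
      int16_t WideS = avgCeilS16(int16_t(UX), int16_t(UY));
      if (static_cast<uint8_t>(WideS) != avgCeilU8(UX, UY))
        return false;
    }
  }
  return true;
}
```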

alive2: https://alive2.llvm.org/ce/z/ZRbfHT
2025-10-27 12:24:41 +00:00
Kazu Hirata
6cb942cec4
[llvm] Remove argument_type in std::hash specializations (NFC) (#165167)
The argument_type and result_type type aliases in std::hash are
deprecated in C++17 and removed in C++20.  This patch aligns two
specializations of ours with the C++ standard.
2025-10-26 15:20:07 -07:00
Kazu Hirata
160b72787c
[CodeGen] Use DenseMap::try_emplace (NFC) (#165165)
With try_emplace, we can pass the key and the arguments for the
value's constructor, which is a lot shorter than:

  Map.insert(std::make_pair(Key, ValueType(Arg1, Arg2)))
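
The same idiom exists on the standard containers since C++17. The sketch below uses `std::map` rather than `llvm::DenseMap`, with a hypothetical `Range` value type, to show the two forms side by side.

```cpp
#include <map>
#include <string>
#include <utility>

// Hypothetical value type with a two-argument constructor.
struct Range {
  int Begin, End;
  Range(int B, int E) : Begin(B), End(E) {}
};

// Returns true if a new entry was inserted for Key.
bool addRange(std::map<std::string, Range> &Map, const std::string &Key,
              int B, int E) {
  // Longer form: Map.insert(std::make_pair(Key, Range(B, E)));
  // try_emplace forwards B and E to Range's constructor and only
  // constructs the value if Key is not already present.
  return Map.try_emplace(Key, B, E).second;
}
```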
2025-10-26 13:34:15 -07:00
Jakub Kuderski
57828a6d5d
[ADT] Prepare for deprecation of StringSwitch cases with 3+ args. NFC. (#165112)
Update `.Cases` and `.CasesLower` with 4+ args to use the
`initializer_list` overload. The deprecation of these functions will
come in a separate PR.

For more context, see: https://github.com/llvm/llvm-project/pull/163405.
2025-10-25 15:11:18 -04:00
AZero13
5d0f1591f8
[DAGCombine] Improve bswap lowering for machines that support bit rotates (#164848)
Source: Hacker's delight.
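
One rotate-based identity for a 32-bit byte swap is sketched below. It illustrates the general idea only and is not necessarily the exact sequence the patch emits; the function names are hypothetical.

```cpp
#include <cstdint>

// Rotate left by N bits (N is reduced modulo 32).
static uint32_t rotl32(uint32_t X, unsigned N) {
  N &= 31;
  return N == 0 ? X : (X << N) | (X >> (32 - N));
}

// Byte swap built from two masks and two rotates.
uint32_t bswap32ViaRotates(uint32_t X) {
  uint32_t OddBytes = X & 0xFF00FF00u;  // bytes 3 and 1
  uint32_t EvenBytes = X & 0x00FF00FFu; // bytes 2 and 0
  // Rotating left by 24 is the same as rotating right by 8.
  return rotl32(OddBytes, 8) | rotl32(EvenBytes, 24);
}
```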
2025-10-25 10:17:15 -07:00
Yunqing Yu
059d90d08f
[Legalizer] Cache extracted element when lowering G_SHUFFLE_VECTOR. (#163893)
Cache extracted elements in lowerShuffleVector(). For example, when
lowering
```
%0:_(<2 x s32>) = G_BUILD_VECTOR %0, %1
%2:_(<N x s32>) = G_SHUFFLE_VECTOR %1, shufflemask(0, 0, 0, 0 ... x N )
```
Currently, we generate `N` `G_EXTRACT_VECTOR_ELT` for each element in
shufflemask. This is undesirable and bloats the code, especially for
larger vectors.

With this change, we only generate one `G_EXTRACT_VECTOR_ELT` from `%0`
and reuse it for all four result elements.
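
The caching idea can be sketched with plain containers. This is a simplified single-source model that assumes every mask entry is a valid index; `Builder` and `lowerShuffleCached` are hypothetical and do not reflect the GlobalISel API.

```cpp
#include <cstddef>
#include <unordered_map>
#include <vector>

// Stand-in for emitting an element extraction; counts how many are emitted.
struct Builder {
  std::size_t NumExtracts = 0;
  int extract(const std::vector<int> &Src, int Idx) {
    ++NumExtracts;
    return Src[static_cast<std::size_t>(Idx)];
  }
};

// Build the shuffle result, emitting at most one extract per distinct
// source index and reusing it for repeated mask entries.
std::vector<int> lowerShuffleCached(Builder &B, const std::vector<int> &Src,
                                    const std::vector<int> &Mask) {
  std::unordered_map<int, int> Cache; // mask index -> extracted value
  std::vector<int> Result;
  Result.reserve(Mask.size());
  for (int Idx : Mask) {
    auto It = Cache.find(Idx);
    if (It == Cache.end())
      It = Cache.emplace(Idx, B.extract(Src, Idx)).first;
    Result.push_back(It->second);
  }
  return Result;
}
```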
2025-10-25 10:26:11 -05:00
Kazu Hirata
881b001b07
[ADT] Make internal methods of DenseMap/SmallDenseMap private (NFC) (#165079)
This patch moves the init, copyFrom, and grow methods in DenseMap and
SmallDenseMap from public to private to hide implementation details.

The only problem is that PhysicalRegisterUsageInfo calls
DenseMap::grow instead of DenseMap::reserve, which I don't think is
intended.  This patch updates the call to reserve.
2025-10-25 06:23:20 -07:00
Luo Yuanke
9a0a1fadef
[ISel] Use CallBase instead of CallInst (#164769)
This is to follow the discussion in
https://github.com/llvm/llvm-project/pull/164565
CallBase can cover more call-like instructions which carry a calling
convention flag.

Co-authored-by: Yuanke Luo <ykluo@birentech.com>
2025-10-25 20:37:20 +08:00
Yingwei Zheng
59e601a3d5
[CodeGenPrepare] Don't simplify incomplete expression tree in AddrModeCombine (#164628)
Since new select/phi instructions may construct loops, the expression
tree to be simplified may still be incomplete (i.e., it may contain
select with dummy values or phi without incoming values). This patch
removes the call to simplifyInstruction for now, as it doesn't break
existing tests.

Original PR: https://reviews.llvm.org/D36073
Fix the crash reported in
https://github.com/llvm/llvm-project/pull/163453#issuecomment-3429922732.
2025-10-25 16:47:32 +08:00
Kazu Hirata
8388a5b340
[ADT] Rename identity_cxx20 to identity (#164927)
Now that the old llvm::identity has moved into IndexedMap.h under a
different name, this patch renames identity_cxx20 to identity.  Note
that llvm::identity closely models std::identity from C++20.
2025-10-24 15:30:42 -07:00
Mirko Brkušanin
fe5f49942e
[AMDGPU][GlobalISel] Lower G_FMINIMUM and G_FMAXIMUM (#151122)
Add GlobalISel lowering of G_FMINIMUM and G_FMAXIMUM following the same
logic as in SDag's expandFMINIMUM_FMAXIMUM.
Update AMDGPU legalization rules: pre-GFX12 targets now use the new
lowering method, and G_FMINNUM_IEEE and G_FMAXNUM_IEEE are made legal to
match SDag.
2025-10-24 14:48:27 +02:00
Matt Arsenault
f5a2e6bb8f
CodeGen: Remove overrides of getSSPStackGuardCheck (NFC) (#164044)
All 3 implementations are just checking if this has the
windows check function, so merge that as the only implementation.
2025-10-24 21:17:34 +09:00
David Green
332f786a35
[DAG][AArch64] Ensure that ResNo is correct for uses of Ptr when considering postinc. (#164810)
We might be looking at a different use, for example in the uses of a
i32,i64,ch preindex load.

Fixes #164775
2025-10-24 11:33:08 +01:00
David Green
a1e59bdc17
[GlobalISel] Make scalar G_SHUFFLE_VECTOR illegal. (#140508)
I'm not sure if this is the best way forward or not, but we have
repeatedly had issues caused by forgetting that shuffle_vectors can be
scalar. (There is another example in the recently added known-bits
code.) As a scalar-dst shuffle vector is just an extract, and a
scalar-source shuffle vector is just a build vector, this patch makes
scalar shuffle vector illegal and adjusts the irbuilder to create the
correct node as required.

Most targets do this already through lowering or combines. Making scalar
shuffles illegal simplifies gisel as a whole; it just requires that
transforms that create shuffles of new sizes account for the scalar
shuffle being illegal (mostly IRBuilder and LessElements).
2025-10-24 08:21:35 +01:00
Serge Pavlov
bcee0ee68d
[SDAG] Fix deferring constrained function calls (#153029)
Selection DAG has a more sophisticated execution order representation
than the simple sequence used in IR, so building the DAG can take into
account specific properties of the nodes to better express possible
parallelism. The existing implementation does this for constrained
function calls: some of them are considered independent, which can
potentially improve the generated code. However, this mechanism
incorrectly implies that calls with exception behavior 'ebIgnore'
cannot raise floating-point exceptions. The purpose of this change is to
fix the implementation.

In the current implementation, constrained function calls don't
immediately update the DAG root. Instead, the DAG builder collects their
output chains and flushes them when the root is required. Constrained
function calls cannot be moved across calls of external functions and
intrinsics that access floating-point environment, they work as
barriers. Between the barriers, constrained function calls can be
reordered, they may be considered independent from viewpoint of raising
exceptions. For strictfp functions this is possible only if
floating-point trapping is disabled.

This change introduces a new restriction - the calls with default
exception handling cannot be moved between strictfp function calls.
Otherwise the exceptions raised by such call can disturb the expected
exception sequence. It means that constrained function calls with strict
exception behavior act as barriers for the calls with non-strict
behavior and vice versa. Effectively it means that the entire sequence
of constrained calls in IR is split into "strict" and "non-strict"
regions, in which restrictions on the order of constrained calls are
relaxed, but move from one region to another is not allowed. It agrees
with the representation of strictfp code in high-level languages. For
example, C/C++ strictfp code corresponds to blocks where pragma `STDC
FENV_ACCESS ON` is in effect, so this restriction should help preserve
the intended semantics.

When floating-point exception trapping is enabled, constrained
intrinsics with 'ebStrict' cannot be reordered, their sequence must be
identical to the original source order. The current implementation does
not distinguish between strictfp modes with trapping and without it.
This change makes the assumption that trapping is disabled. This is not
correct in the general case, but is compatible with the existing
implementation.
2025-10-24 09:40:29 +07:00
wdx727
d8d80b659a
Adding Matching and Inference Functionality to Propeller-PR2 (#162963)
Adding Matching and Inference Functionality to Propeller. For detailed
information, please refer to the following RFC:
https://discourse.llvm.org/t/rfc-adding-matching-and-inference-functionality-to-propeller/86238.
This is the second PR, which includes the calculation of basic block
hashes and their emission to the ELF file. It is associated with the
previous PR at https://github.com/llvm/llvm-project/pull/160706.

co-authors: lifengxiang1025
[lifengxiang@kuaishou.com](mailto:lifengxiang@kuaishou.com); zcfh
[wuminghui03@kuaishou.com](mailto:wuminghui03@kuaishou.com)

Co-authored-by: lifengxiang1025 <lifengxiang@kuaishou.com>
Co-authored-by: zcfh <wuminghui03@kuaishou.com>
Co-authored-by: Rahman Lavaee <rahmanl@google.com>
2025-10-23 09:38:12 -07:00
Fabian Ritter
a3ea51e4f1
[SDAG] Introduce inbounds flag for ISD::PTRADD (#162477)
This patch introduces SDNodeFlags::InBounds, to show that an ISD::PTRADD SDNode
implements an inbounds getelementptr operation (i.e., the pointer operand is in
bounds wrt. an allocated object it is based on, and the arithmetic does not
change that). The flag is set in the DAG construction when lowering inbounds
GEPs.

Inbounds information is useful in the ISel when selecting memory instructions
that perform address computations whose intermediate steps must be in the same
memory region as the final result. Follow-up patches to propagate the flag in
DAGCombines and to use it when lowering AMDGPU's flat memory instructions,
where the immediate offset must not affect the memory aperture of the address
(similar to this GISel patch: #153001), are planned.

This mirrors #150900, which has introduced a similar flag in GlobalISel.

This patch supersedes #131862, which previously attempted to introduce an
SDNodeFlags::InBounds flag. The difference between this PR and #131862 is that
there is now an ISD::PTRADD opcode (PR #140017) and the InBounds flag is only
defined to apply to ISD::PTRADD DAG nodes. It is therefore unambiguous that
in-bounds-ness refers to a memory object into which the left operand of the
PTRADD node points (in contrast to #131862, where InBounds would have applied
to commutative ISD::ADD nodes, so that the semantics would be more difficult to
reason about).

For SWDEV-516125.
2025-10-23 09:35:33 +02:00