llvm-project

Author	SHA1	Message	Date
Matt Arsenault	ad8f6b44be	DAG: Avoid some libcall string name comparisons (#166321 ) Move to the libcall impl based functions.	2025-11-05 07:09:02 -08:00
Santanu Das	63d6e3eb46	[DebugInfo] Assign best possible debugloc to bundle (#164573 ) The debug info attached to the BUNDLE is the first instruction in the BUNDLE, even if a better debug info (line:column) is present in the later instructions of the bundle. The patch tries to get a better debug info first. If not, then a worse debug info without line number is chosen. --------- Co-authored-by: Vladislav Dzhidzhoev <dzhidzhoev@gmail.com> Co-authored-by: Orlando Cazalet-Hyams <orlandoch.och@gmail.com>	2025-11-05 20:26:00 +05:30
Jan Patrick Lehr	833983918d	Revert "CodeGen: Record MMOs in finalizeBundle" (#166520 ) Reverts llvm/llvm-project#166210 Buildbot failures in the libc on GPU bot: https://lab.llvm.org/buildbot/#/builders/10/builds/16711	2025-11-05 11:11:08 +01:00
Nicolai Hähnle	304d2ff4d9	CodeGen: Record MMOs in finalizeBundle (#166210 ) This allows more accurate alias analysis to apply at the bundle level. This has a bunch of minor effects in post-RA scheduling that look mostly beneficial to me, all of them in AMDGPU (the Thumb2 change is cosmetic). The pre-existing (and unchanged) test in CodeGen/MIR/AMDGPU/custom-pseudo-source-values.ll tests that MIR with a bundle with MMOs can be parsed successfully. v2: - use cloneMergedMemRefs - add another test to explicitly check the MMO bundling behavior v3: - use poison instead of undef to initialize the global variable in the test	2025-11-05 06:56:19 +00:00
Vigneshwar Jayakumar	b5f200129a	[CodeGen] Register-coalescer remat fix subreg liveness (#165662 ) This is a bugfix in rematerialization where the liveness of subreg mask was incorrectly updated causing crash in scheduler.	2025-11-04 22:40:40 -06:00
Abhay Kanhere	d998f92a00	[CodeGen] MachineVerifier to check early-clobber constraint (#151421 ) Currently MachineVerifier does not verify the early-clobber operand constraint. The only other machine operand constraint, TiedTo, is already verified.	2025-11-04 18:39:31 -08:00
Nicolai Hähnle	d6fdfe0a27	CodeGen: Record tied virtual register operands in finalizeBundle (#166209 ) This is in preparation of a future AMDGPU change where we are going to create bundles before register allocation and want to rely on the TwoAddressInstructionPass handling those bundles correctly. v2: - simplify the virtual register check and the test	2025-11-05 02:18:39 +00:00
Jin Huang	fa5cd27ef0	[profcheck] Add unknown branch weights to expand LL/SC loop. (#166273 ) As a follow-up to PR#165841, this change addresses `prof_md` metadata loss in AtomicExpandPass when lowering `atomicrmw xchg` to a Load-Linked/Store-Conditional (LL/SC) loop. This path is distinct from the LSE path addressed previously: PR #165841 (and its tests) used `-mtriple=aarch64-linux-gnu`, which targets a modern ARMv8.1+ architecture. This architecture supports Large System Extensions (LSE), allowing `atomicrmw` to be lowered directly to a more efficient hardware instruction. This PR (and its tests) uses `-mtriple=aarch64--` or `-mtriple=armv8-linux-gnueabihf`. This indicates an ARMv8.0 or lower architecture that does not support LSE. On these targets, the pass must fall back to synthesizing a manual LL/SC loop using the `ldaxr/stxr` instruction pair. Similar to the previous issue, the new conditional branch was failing to inherit the `prof_md` metadata. This PR attaches the branch weights to the newly created branch within the LL/SC loop, ensuring profile information is preserved. Co-authored-by: Jin Huang <jingold@google.com>	2025-11-04 16:23:34 -08:00
Min-Yih Hsu	6d4e75cc93	[MISched][NFC] Rename isUnbufferedGroup to isReservedGroup (#166439 ) In both ScheduleDAGInstrs and MachineScheduler, we refer to `BufferSize = 0` as _reserved_ and `BufferSize = 1` as _unbuffered_. This convention stems from the fact that we set `SUnit::hasReservedResource` to true when any of the SUnit's consumed resources has BufferSize equal to zero, and set `SUnit::isUnbuffered` to true when any of its consumed resources has BufferSize equal to one. However, `SchedBoundary::isUnbufferedGroup` doesn't really follow this convention: it returns true when the resource in question is a `ProcResGroup` and its BufferSize equals zero rather than one. This could be really confusing for the reader. This patch renames this function to `isReservedGroup` in alignment with the convention mentioned above. NFC.	2025-11-04 16:21:37 -08:00
Grigory Pastukhov	7398591148	[CodeGen] Add skipFunction() check to MachineFunctionSplitter (#166260 ) MachineFunctionSplitter was missing a skipFunction() check, causing it to incorrectly split functions that should be skipped (e.g., functions with optnone attribute). This patch adds an early skipFunction() check in runOnMachineFunction() to ensure these functions are never split, regardless of profile data availability or other splitting conditions.	2025-11-04 11:01:50 -08:00
Matt Arsenault	831e79adff	DAG: Merge all sincos_stret emission code into legalizer (#166295 ) This avoids AArch64 legality rules depending on libcall availability. ARM, AArch64, and X86 all had custom lowering of fsincos which all were just to emit calls to sincos_stret / sincosf_stret. This messes with the cost heuristics around legality, because really it's an expand/libcall cost and not a favorable custom. This is a bit ugly, because we're emitting code trying to match the C ABI lowered IR type for the aggregate return type. This now also gives an easy way to lift the unhandled x86_32 darwin case, since ARM already handled the return as sret case.	2025-11-04 10:20:00 -08:00
Alex Voicu	2286118e6f	[SPIRV] Enable `bfloat16` arithmetic (#166031 ) Enable the `SPV_INTEL_bfloat16_arithmetic` extension, which allows arithmetic, relational and `OpExtInst` instructions to take `bfloat16` arguments. This patch only adds support to arithmetic and relational ops. The extension itself is rather fresh, but `bfloat16` is ubiquitous at this point and not supporting these ops is limiting.	2025-11-04 18:10:26 +02:00
Matt Arsenault	3c2c9d5bc1	DAG: Cleanup string bool attribute check for disable-tail-calls (#166237 )	2025-11-03 14:18:04 -08:00
Laxman Sole	6fe3eccdf4	[llvm][DebugInfo] Emit 0/1 for constant boolean values (#151225 ) Previously, sign-extending a 1-bit boolean operand in `#DBG_VALUE` would convert `true` to -1 (i.e., 0xffffffffffffffff). However, DWARF treats booleans as unsigned values, so this resulted in the attribute `DW_AT_const_value(0xffffffffffffffff)` being emitted. As a result, the debugger would display the value as `255` instead of `true`. This change modifies the behavior to use zero-extension for 1-bit values instead, ensuring that `true` is represented as 1. Consequently, the DWARF attribute emitted is now `DW_AT_const_value(1)`, which allows the debugger to correctly display the boolean as `true`.	2025-11-03 13:34:44 -08:00
Kazu Hirata	7db6344170	[CodeGen] Remove redundant declarations (NFC) (#166105 ) In C++17, static constexpr members are implicitly inline, so they no longer require an out-of-line definition. Identified with readability-redundant-declaration.	2025-11-02 22:42:40 -08:00
Kazu Hirata	31b8ba5670	[Analysis, CodeGen] Use ArrayRef instead of const ArrayRef (NFC) (#166026 ) This patch improves readability by using "ArrayRef<T>" instead of "const ArrayRef<T>" and "const ArrayRef<T> &" in function parameter types.	2025-11-01 23:20:19 -07:00
Kazu Hirata	b82bde695e	[Analysis, CodeGen] Use "= default" (NFC) (#166024 ) Identified with modernize-use-equals-default.	2025-11-01 23:20:11 -07:00
wdx727	befae81fa2	Fix the usage issue of getRegMask. (#141215 ) In the process of determining whether two MachineOperands are equal and calculating the hash of a MachineOperand, both MO_RegisterMask and MO_RegisterLiveOut types were uniformly handled. However, when the type is MO_RegisterLiveOut, calling getRegMask() triggers an assertion failure. This PR addresses this issue.	2025-11-01 21:55:08 -07:00
Craig Topper	06575b48ce	Revert "[LegalizeTypes] Use UpdateNodeOperands in SoftPromoteHalfOp_STACKMAP/PATCHPOINT. (#165927 )" This reverts commit 4357fcbbd5012369dbbbe50f99941147895d6611. Causes a crash when combined with #165922.	2025-10-31 23:38:32 -07:00
Craig Topper	02fef973e9	[SelectionDAG][RISCV] Support STACK/PATCHPOINT in SoftenFloatOperand. (#165922 ) Test float/double/half/bfloat on RISC-V without F extension.	2025-10-31 23:31:10 -07:00
Craig Topper	4357fcbbd5	[LegalizeTypes] Use UpdateNodeOperands in SoftPromoteHalfOp_STACKMAP/PATCHPOINT. (#165927 )	2025-10-31 23:30:23 -07:00
Craig Topper	d310693bde	[SelectionDAG] Use GetPromotedInteger when promoting integer operands of PATCHPOINT/STACKMAP. (#165926 ) This is consistent with other promotion, but causes negative constants to be sign extended instead of zero extended in some cases. I guess getNode and type legalizer are inconsistent about what ANY_EXTEND of a constant does.	2025-10-31 22:11:13 +00:00
Fabian Ritter	8ea447b4c4	[SDAG] Set InBounds when computing offsets into memory objects (#165425 ) When a load or store accesses N bytes starting from a pointer P, and we want to compute an offset pointer within these N bytes after P, we know that the arithmetic to add the offset must be inbounds. This is for example relevant when legalizing too-wide memory accesses, when lowering memcpy&Co., or when optimizing "vector-load -> extractelement" into an offset load. For SWDEV-516125.	2025-10-31 11:27:55 +01:00
Michael Buch	10fbbb62ce	[llvm][DebugInfo][ObjC] Make sure we link backing ivars to their DW_TAG_APPLE_property (#165409 ) Depends on: * https://github.com/llvm/llvm-project/pull/165373 When an Objective-C property has a backing ivar, we would previously not add a `DW_AT_APPLE_property` to the ivar's `DW_TAG_member`. This is what was intended based on the [Objective-C DebugInfo docs](https://github.com/llvm/llvm-project/blob/main/llvm/docs/SourceLevelDebugging.rst#proposal) but is not what LLVM currently generates. LLDB currently doesn't ever try linking the `ObjCPropertyDecl`s to their `ObjCIvarDecl`s, but if we wanted to, this debug-info patch is a pre-requisite.	2025-10-31 10:25:58 +00:00
Fabian Ritter	a85e84b854	[SDAG] Preserve InBounds in DAGCombines (#165424 ) This PR preserves the InBounds flag (#162477) where possible in PTRADD-related DAGCombines. We can't preserve them in all the cases that we could in the analogous GISel change (#152495) because SDAG usually represents pointers as integers, which means that pointer provenance is not preserved between PTRADD operations (see the discussion at PR #162477 for more details). This PR marks the places in the DAGCombiner where this is relevant explicitly. For SWDEV-516125.	2025-10-31 10:25:39 +01:00
David Green	215aca4432	[GlobalISel] SBFX/UBFX does not create poison (#165675 ) This adds G_SBFX/G_UBFX to the list of instructions that do not generate poison, allowing freeze to be hoisted above one.	2025-10-31 09:18:07 +00:00
Rahman Lavaee	e9368a056d	[SHT_LLVM_BB_ADDR] Implement ELF and YAML support for Propeller CFG data in PGO analysis map. (#164914 ) This PR implements the ELF support for PostLink CFG in PGO analysis map as discussed in [RFC](https://discourse.llvm.org/t/rfc-extending-the-pgo-analysis-map-with-propeller-cfg-frequencies/88617/2). A later PR will implement the Codegen Support.	2025-10-30 13:12:06 -07:00
wdx727	fe52f1d77d	Adding Matching and Inference Functionality to Propeller-PR3: Read basic block hashes from propeller profile. (#164223 ) Adding Matching and Inference Functionality to Propeller. For detailed information, please refer to the following RFC: https://discourse.llvm.org/t/rfc-adding-matching-and-inference-functionality-to-propeller/86238. This is the third PR, which is used to read basic block hashes from the propeller profile. The associated PRs are: PR1: https://github.com/llvm/llvm-project/pull/160706 PR2: https://github.com/llvm/llvm-project/pull/162963 co-authors: lifengxiang1025 [lifengxiang@kuaishou.com](mailto:lifengxiang@kuaishou.com); zcfh [wuminghui03@kuaishou.com](mailto:wuminghui03@kuaishou.com) Co-authored-by: lifengxiang1025 <lifengxiang@kuaishou.com> Co-authored-by: zcfh <wuminghui03@kuaishou.com>	2025-10-30 13:11:08 -07:00
Princeton Ferro	68e74f8f84	[DAGCombiner] Lower dynamic insertelt chain more efficiently (#162368 ) For an insertelt with a dynamic index, the default handling in DAGTypeLegalizer and LegalizeDAG will reserve a stack slot for the vector, lower the insertelt to a store, then load the modified vector back into temporaries. The vector store and load may be legalized into a sequence of smaller operations depending on the target. Let V = the vector size and L = the length of a chain of insertelts with dynamic indices. In the worst case, this chain will lower to O(VL) operations, which can increase code size dramatically. Instead, identify such chains, reserve one stack slot for the vector, and lower all of the insertelts to stores at once. This requires only O(V + L) operations. This change only affects the default lowering behavior.	2025-10-29 09:46:01 -07:00
Orlando Cazalet-Hyams	aa5fe56db4	[DebugInfo] Add dataSize to DIBasicType to add DW_AT_bit_size to _BitInt types (#164372 ) DW_TAG_base_type DIEs are permitted to have both byte_size and bit_size attributes "If the value of an object of the given type does not fully occupy the storage described by a byte size attribute" * Add DataSizeInBits to DIBasicType (`DIBasicType(... dataSize: n ...)` in IR). * Change Clang to add DataSizeInBits to _BitInt type metadata. * Change LLVM to add DW_AT_bit_size to base_type DIEs that have non-zero DataSizeInBits. TODO: Do we need to emit DW_AT_data_bit_offset for big endian targets? See discussion on the PR. Fixes [#61952](https://github.com/llvm/llvm-project/issues/61952) --------- Co-authored-by: David Stenberg <david.stenberg@ericsson.com>	2025-10-29 15:23:46 +00:00
David Green	da15b8fc2e	[AArch64][GlobalISel] Add a constant funnel shift post-legalizer combine. (#151912 ) We want to be able to produce extr instructions post-legalization. They are legal for scalars, acting as a funnel shift with a constant shift amount. Unfortunately I'm not sure if there is a way currently to represent that in the legalization rules, but it might be useful for several operations - to be able to treat and test operands with constant operands as legal or not. This adds a change to the existing matchOrShiftToFunnelShift so that AArch64 can generate such instructions post-legalization providing that the operation is scalar and the shift amount is constant.	2025-10-29 07:47:41 +00:00
Matt Arsenault	28e9a2832f	DAG: Consider __sincos_stret when deciding to form fsincos (#165169 )	2025-10-28 08:28:09 -07:00
Shimin Cui	531fd45e92	[PPC] Set minimum of largest number of comparisons to use bit test for switch lowering (#155910 ) Currently it is considered suitable to lower to a bit test for a set of switch case clusters when the number of unique destinations (`NumDests`) and the number of total comparisons (`NumCmps`) satisfy: `(NumDests == 1 && NumCmps >= 3) || (NumDests == 2 && NumCmps >= 5) || (NumDests == 3 && NumCmps >= 6)` However it is found for some cases on powerpc, for example, when NumDests is 3, and the number of comparisons for each destination is all 2, it's not profitable to lower the switch to bit test. This is to add an option to set the minimum of largest number of comparisons to use bit test for switch lowering. --------- Co-authored-by: Shimin Cui <scui@xlperflep9.rtp.raleigh.ibm.com>	2025-10-28 10:24:32 -04:00
Lauren	e964acf85f	[DAG] Fold mismatched widened avg idioms to narrow form (#147946 ) (#163366 ) [DAG] Fold mismatched widened avg idioms to narrow form (fixes half of [llvm#147946](https://github.com/llvm/llvm-project/issues/147946)) 1. `trunc(avgceilu(sext(x), sext(y))) -> avgceils(x, y)` 2. `trunc(avgceils(zext(x), zext(y))) -> avgceilu(x, y)` When inputs are sign-extended, unsigned and signed averaging operations produce identical results after truncation, allowing us to use the semantically correct narrow operation. alive2: https://alive2.llvm.org/ce/z/ZRbfHT	2025-10-27 12:24:41 +00:00
Kazu Hirata	6cb942cec4	[llvm] Remove argument_type in std::hash specializations (NFC) (#165167 ) The argument_type and result_type type aliases in std::hash are deprecated in C++17 and removed in C++20. This patch aligns two specializations of ours with the C++ standard.	2025-10-26 15:20:07 -07:00
Kazu Hirata	160b72787c	[CodeGen] Use DenseMap::try_emplace (NFC) (#165165 ) With try_emplace, we can pass the key and the arguments for the value's constructor, which is a lot shorter than: Map.insert(std::make_pair(Key, ValueType(Arg1, Arg2)))	2025-10-26 13:34:15 -07:00
Jakub Kuderski	57828a6d5d	[ADT] Prepare for deprecation of StringSwitch cases with 3+ args. NFC. (#165112 ) Update `.Cases` and `.CasesLower` with 4+ args to use the `initializer_list` overload. The deprecation of these functions will come in a separate PR. For more context, see: https://github.com/llvm/llvm-project/pull/163405.	2025-10-25 15:11:18 -04:00
AZero13	5d0f1591f8	[DAGCombine] Improve bswap lowering for machines that support bit rotates (#164848 ) Source: Hacker's delight.	2025-10-25 10:17:15 -07:00
Yunqing Yu	059d90d08f	[Legalizer] Cache extracted element when lowering G_SHUFFLE_VECTOR. (#163893 ) Cache extracted elements in lowerShuffleVector(). For example, when lowering ``` %0:_(<2 x s32>) = G_BUILD_VECTOR %0, %1 %2:_(<N x s32>) = G_SHUFFLE_VECTOR %1, shufflemask(0, 0, 0, 0 ... x N ) ``` Currently, we generate `N` `G_EXTRACT_VECTOR_ELT` for each element in shufflemask. This is undesirable and bloats the code, especially for larger vectors. With this change, we only generate one `G_EXTRACT_VECTOR_ELT` from `%0` and reuse it for all four result elements.	2025-10-25 10:26:11 -05:00
Kazu Hirata	881b001b07	[ADT] Make internal methods of DenseMap/SmallDenseMap private (NFC) (#165079 ) This patch moves the init, copyFrom, and grow methods in DenseMap and SmallDenseMap from public to private to hide implementation details. The only problem is that PhysicalRegisterUsageInfo calls DenseMap::grow instead of DenseMap::reserve, which I don't think is intended. This patch updates the call to reserve.	2025-10-25 06:23:20 -07:00
Luo Yuanke	9a0a1fadef	[ISel] Use CallBase instead of CallInst (#164769 ) This is to follow the discussion in https://github.com/llvm/llvm-project/pull/164565 CallBase can cover more call-like instructions which carry a calling convention flag. Co-authored-by: Yuanke Luo <ykluo@birentech.com>	2025-10-25 20:37:20 +08:00
Yingwei Zheng	59e601a3d5	[CodeGenPrepare] Don't simplify incomplete expression tree in AddrModeCombine (#164628 ) Since new select/phi instructions may construct loops, the expression tree to be simplified may still be incomplete (i.e., it may contain select with dummy values or phi without incoming values). This patch removes the call to simplifyInstruction for now, as it doesn't break existing tests. Original PR: https://reviews.llvm.org/D36073 Fix the crash reported in https://github.com/llvm/llvm-project/pull/163453#issuecomment-3429922732.	2025-10-25 16:47:32 +08:00
Kazu Hirata	8388a5b340	[ADT] Rename identity_cxx20 to identity (#164927 ) Now that the old llvm::identity has moved into IndexedMap.h under a different name, this patch renames identity_cxx20 to identity. Note that llvm::identity closely models std::identity from C++20.	2025-10-24 15:30:42 -07:00
Mirko Brkušanin	fe5f49942e	[AMDGPU][GlobalISel] Lower G_FMINIMUM and G_FMAXIMUM (#151122 ) Add GlobalISel lowering of G_FMINIMUM and G_FMAXIMUM following the same logic as in SDag's expandFMINIMUM_FMAXIMUM. Update AMDGPU legalization rules: Pre-GFX12 now uses the new lowering method and makes G_FMINNUM_IEEE and G_FMAXNUM_IEEE legal to match SDag.	2025-10-24 14:48:27 +02:00
Matt Arsenault	f5a2e6bb8f	CodeGen: Remove overrides of getSSPStackGuardCheck (NFC) (#164044 ) All 3 implementations are just checking if this has the windows check function, so merge that as the only implementation.	2025-10-24 21:17:34 +09:00
David Green	332f786a35	[DAG][AArch64] Ensure that ResNo is correct for uses of Ptr when considering postinc. (#164810 ) We might be looking at a different use, for example in the uses of a i32,i64,ch preindex load. Fixes #164775	2025-10-24 11:33:08 +01:00
David Green	a1e59bdc17	[GlobalISel] Make scalar G_SHUFFLE_VECTOR illegal. (#140508 ) I'm not sure if this is the best way forward or not, but we have a lot of issues with forgetting that shuffle_vectors can be scalar again and again. (There is another example in the recently added known-bits code). As a scalar-dst shuffle vector is just an extract, and a scalar-source shuffle vector is just a build vector, this patch makes scalar shuffle vector illegal and adjusts the irbuilder to create the correct node as required. Most targets do this already through lowering or combines. Making scalar shuffles illegal simplifies gisel as a whole, it just requires transforms that create shuffles of new sizes to account for the scalar shuffle being illegal (mostly IRBuilder and LessElements).	2025-10-24 08:21:35 +01:00
Serge Pavlov	bcee0ee68d	[SDAG] Fix deferring constrained function calls (#153029 ) Selection DAG has a more sophisticated execution order representation than the simple sequence used in IR, so building the DAG can take into account specific properties of the nodes to better express possible parallelism. The existing implementation does this for constrained function calls; some of them are considered independent, which can potentially improve the generated code. However this mechanism incorrectly implies that the calls with exception behavior 'ebIgnore' cannot raise floating-point exceptions. The purpose of this change is to fix the implementation. In the current implementation, constrained function calls don't immediately update the DAG root. Instead, the DAG builder collects their output chains and flushes them when the root is required. Constrained function calls cannot be moved across calls of external functions and intrinsics that access the floating-point environment; they work as barriers. Between the barriers, constrained function calls can be reordered; they may be considered independent from the viewpoint of raising exceptions. For strictfp functions this is possible only if floating-point trapping is disabled. This change introduces a new restriction - the calls with default exception handling cannot be moved between strictfp function calls. Otherwise the exceptions raised by such a call can disturb the expected exception sequence. It means that constrained function calls with strict exception behavior act as barriers for the calls with non-strict behavior and vice versa. Effectively it means that the entire sequence of constrained calls in IR is split into "strict" and "non-strict" regions, in which restrictions on the order of constrained calls are relaxed, but moving from one region to another is not allowed. It agrees with the representation of strictfp code in high-level languages. For example, C/C++ strictfp code corresponds to blocks where pragma `STDC FENV_ACCESS ON` is in effect; this restriction should help preserve the intended semantics. When floating-point exception trapping is enabled, constrained intrinsics with 'ebStrict' cannot be reordered; their sequence must be identical to the original source order. The current implementation does not distinguish between strictfp modes with trapping and without it. This change makes the assumption that trapping is disabled. It is not correct in the general case, but is compatible with the existing implementation.	2025-10-24 09:40:29 +07:00
wdx727	d8d80b659a	Adding Matching and Inference Functionality to Propeller-PR2 (#162963 ) Adding Matching and Inference Functionality to Propeller. For detailed information, please refer to the following RFC: https://discourse.llvm.org/t/rfc-adding-matching-and-inference-functionality-to-propeller/86238. This is the second PR, which includes the calculation of basic block hashes and their emission to the ELF file. It is associated with the previous PR at https://github.com/llvm/llvm-project/pull/160706. co-authors: lifengxiang1025 [lifengxiang@kuaishou.com](mailto:lifengxiang@kuaishou.com); zcfh [wuminghui03@kuaishou.com](mailto:wuminghui03@kuaishou.com) Co-authored-by: lifengxiang1025 <lifengxiang@kuaishou.com> Co-authored-by: zcfh <wuminghui03@kuaishou.com> Co-authored-by: Rahman Lavaee <rahmanl@google.com>	2025-10-23 09:38:12 -07:00
Fabian Ritter	a3ea51e4f1	[SDAG] Introduce inbounds flag for ISD::PTRADD (#162477 ) This patch introduces SDNodeFlags::InBounds, to show that an ISD::PTRADD SDNode implements an inbounds getelementptr operation (i.e., the pointer operand is in bounds wrt. an allocated object it is based on, and the arithmetic does not change that). The flag is set in the DAG construction when lowering inbounds GEPs. Inbounds information is useful in the ISel when selecting memory instructions that perform address computations whose intermediate steps must be in the same memory region as the final result. Follow-up patches to propagate the flag in DAGCombines and to use it when lowering AMDGPU's flat memory instructions, where the immediate offset must not affect the memory aperture of the address (similar to this GISel patch: #153001), are planned. This mirrors #150900, which has introduced a similar flag in GlobalISel. This patch supersedes #131862, which previously attempted to introduce an SDNodeFlags::InBounds flag. The difference between this PR and #131862 is that there is now an ISD::PTRADD opcode (PR #140017) and the InBounds flag is only defined to apply to ISD::PTRADD DAG nodes. It is therefore unambiguous that in-bounds-ness refers to a memory object into which the left operand of the PTRADD node points (in contrast to #131862, where InBounds would have applied to commutative ISD::ADD nodes, so that the semantics would be more difficult to reason about). For SWDEV-516125.	2025-10-23 09:35:33 +02:00
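
Note on 6fe3eccdf4 (Emit 0/1 for constant boolean values): the difference between sign-extending and zero-extending a 1-bit `true` can be reproduced with plain integer arithmetic. The sketch below is only an illustration of that arithmetic, not the code from the patch; `signExtend` and `zeroExtend` are hypothetical helper names and are not LLVM APIs.

```cpp
#include <cstdint>

// Hypothetical helpers for illustration only; these are not LLVM APIs.

// Keep the low `bits` bits of `value` and clear everything above them.
constexpr std::uint64_t zeroExtend(std::uint64_t value, unsigned bits) {
  return bits >= 64 ? value : (value & ((1ULL << bits) - 1));
}

// Treat the low `bits` bits of `value` as a two's-complement number and
// widen it to 64 bits by replicating its sign bit.
constexpr std::uint64_t signExtend(std::uint64_t value, unsigned bits) {
  // XOR with the sign bit, then subtract it; unsigned wrap-around is
  // well defined and yields the replicated sign bits.
  return (zeroExtend(value, bits) ^ (1ULL << (bits - 1))) - (1ULL << (bits - 1));
}

// A 1-bit `true` sign-extends to all ones, the value the commit message
// reports in DW_AT_const_value before the change.
static_assert(signExtend(1, 1) == 0xffffffffffffffffULL, "sext i1 true");
// Zero-extension gives 1, the value emitted after the change.
static_assert(zeroExtend(1, 1) == 1, "zext i1 true");
// `false` is 0 under either extension.
static_assert(signExtend(0, 1) == 0 && zeroExtend(0, 1) == 0, "i1 false");
```

The `static_assert`s make the block self-checking at compile time: it builds only if the two extensions behave as the commit message describes.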

38594 Commits