Add the llvm.amdgcn.call.whole.wave intrinsic for calling whole wave
functions. This will take as its first argument the callee with the
amdgpu_gfx_whole_wave calling convention, followed by the call
parameters, which must match the signature of the callee except for the
first function argument (the i1 original EXEC mask, which doesn't need
to be passed in). Indirect calls are not allowed.
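For illustration, a hedged sketch of the intended shape (the exact
intrinsic declaration, name mangling and function names here are
assumptions, not taken from the implementation):

```ll
declare i32 @llvm.amdgcn.call.whole.wave(ptr, ...)

; The callee's first argument is the i1 original EXEC mask.
define amdgpu_gfx_whole_wave i32 @wave_fn(i1 %orig_exec, i32 %x) {
  %r = add i32 %x, 1
  ret i32 %r
}

define amdgpu_gfx i32 @caller(i32 %v) {
  ; The i1 EXEC argument is not passed at the call site.
  %r = call i32 (ptr, ...) @llvm.amdgcn.call.whole.wave(ptr @wave_fn, i32 %v)
  ret i32 %r
}
```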
Make direct calls to amdgpu_gfx_whole_wave functions a verifier error.
Unspeakable horrors happen around calls from whole wave functions; the
plan is to improve the handling of caller/callee-saved registers in
a future patch.
Tail calls are also handled in a future patch.
This slightly relaxes the invariant established in #149310, by also
allowing the lifetime argument to be poison. This is to support the
typical pattern of RAUWing with poison when removing an instruction.
It's worth noting that this does not require any conservative
assumptions: lifetimes with poison arguments can simply be skipped.
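A minimal sketch of the now-accepted pattern (the function name is
illustrative; the intrinsic declaration is included for completeness):

```ll
declare void @llvm.lifetime.start.p0(i64, ptr)

define void @poison_lifetime() {
  ; After RAUWing the alloca with poison, the lifetime call remains valid
  ; and can simply be skipped by analyses.
  call void @llvm.lifetime.start.p0(i64 8, ptr poison)
  ret void
}
```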
Fixes https://github.com/llvm/llvm-project/issues/151119.
Split out from https://github.com/llvm/llvm-project/pull/150248:
Specify that the argument of lifetime.start/lifetime.end is ignored and
will be removed in the future.
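For reference, a hedged example of the current form (function name
illustrative), where the leading i64 size operand is now documented as
ignored:

```ll
declare void @llvm.lifetime.start.p0(i64, ptr)
declare void @llvm.lifetime.end.p0(i64, ptr)

define void @use_lifetime() {
  %buf = alloca [64 x i8]
  ; The i64 operand (64 here) is ignored; only the alloca operand matters.
  call void @llvm.lifetime.start.p0(i64 64, ptr %buf)
  call void @llvm.lifetime.end.p0(i64 64, ptr %buf)
  ret void
}
```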
Remove lifetime size handling from SDAG. The size was previously
discarded during isel, so was always ignored for stack coloring anyway.
Where necessary, obtain the size of the full frame index.
After https://github.com/llvm/llvm-project/pull/149310 we are guaranteed
that the argument is an alloca, so we don't need to look at underlying
objects (which was not a correct thing to do anyway).
This also drops the offset argument for lifetime nodes in SDAG. The
offset is fixed to zero now. (Peculiarly, while SDAG pretended to have
an offset, it was just silently dropped during selection.)
When generating SDAG for a getelementptr with a vector result, we were
previously generating splats for each scalar operand. This essentially
has the effect of aggressively vectorizing the sequence, leaving it to
later combines to scalarize if profitable.
Instead, we can keep the accumulating address as a scalar for as long as
the prefix of operands allows before lazily converting to vector on the
first vector operand. This both better fits hardware, which frequently
has a scalar base on its scatter/gather instructions, and reduces the
addressing cost even when it doesn't, as otherwise we end up with a
scalar-to-vector domain crossing for each scalar operand.
Note that constant splat offsets are treated as scalar for the above,
and only variable offsets can force a conversion to vector.
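A hedged illustration of the kind of IR in question (function name is
illustrative): only the variable vector index needs to force the
conversion to a vector of pointers; the scalar base (and any
constant-splat offsets) can stay scalar.

```ll
define <4 x ptr> @vec_gep(ptr %base, <4 x i64> %idx) {
  %p = getelementptr i32, ptr %base, <4 x i64> %idx
  ret <4 x ptr> %p
}
```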
---------
Co-authored-by: Craig Topper <craig.topper@sifive.com>
Also fix the LangRef to match the implementation. This was checking
against the alloca address space size rather than the default address
space.
The check was also more permissive than the LangRef. The error
check permitted any size less than the pointer size; follow the
stricter wording of the LangRef.
Seeing as we can't generate any debug intrinsics any more, delete a
variety of codepaths where they're handled. For the most part these are
plain deletions; in others I've tweaked comments to remain coherent, or
added a type to (what were) type-generic lambdas.
This isn't all the DbgInfoIntrinsic call sites but it's most of the
simple scenarios.
Co-authored-by: Nikita Popov <github@npopov.com>
ConstantInt vectors utilise DAG.getConstant() when constructing the
initial DAG. This can have the effect of legalising the constant before
the DAG combiner is run, significantly altering the generated code. To
mitigate this (hopefully as a temporary measure) we instead try to
construct the DAG in the same way as shufflevector-based splats.
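For context, a hedged sketch of the two splat forms at the IR level
(function names and values are illustrative): the intent is for the
constant-vector form to reach the DAG combiner looking like the
shufflevector-based one rather than being legalised up front.

```ll
define <4 x i32> @const_splat() {
  ret <4 x i32> <i32 7, i32 7, i32 7, i32 7>
}

define <4 x i32> @shuffle_splat() {
  %ins = insertelement <4 x i32> poison, i32 7, i64 0
  %s = shufflevector <4 x i32> %ins, <4 x i32> poison, <4 x i32> zeroinitializer
  ret <4 x i32> %s
}
```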
This patch optimizes the Windows security cookie check mechanism by
moving the comparison inline and only calling __security_check_cookie
when the check fails. This reduces the overhead of making a DLL call
for every function return.
Previously, we implemented this optimization through a machine pass
(X86WinFixupBufferSecurityCheckPass) in PR #95904 submitted by
@mahesh-attarde. We have reverted that pass in favor of this new
approach. Also, we have abandoned the AArch64-specific implementation
of the same pass in PR #121938 in favor of this more general solution.
The old machine instruction pass approach:
- Scanned the generated code to find __security_check_cookie calls
- Modified these calls by splitting basic blocks
- Added comparison logic and conditional branching
- Required complex block management and live register computation
The new approach:
- Implements the same optimization during instruction selection
- Directly emits the comparison and conditional branching
- No need for post-processing or basic block manipulation
- Disables optimization at -Oz.
Thanks @tamaspetz, @efriedma-quic and @arsenm for their help.
In 'asm goto' statements ('callbr' in LLVM IR), you can specify one or
more labels / basic blocks in the containing function which the assembly
code might jump to. If you're also compiling with branch target
enforcement via BTI, then previously listing a basic block as a possible
jump destination of an asm goto would cause a BTI instruction to be
placed at the start of the block, in case the assembly code used an
_indirect_ branch instruction (i.e. to a destination address read from a
register) to jump to that location. Now it doesn't do that any more:
branches to destination labels from the assembly code are assumed to be
direct branches (to a relative offset encoded in the instruction), which
don't require a BTI at their destination.
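A hedged IR-level sketch of the construct in question (the asm body and
label names are illustrative):

```ll
define void @f() {
entry:
  ; 'asm goto' becomes a callbr; %indirect_label is a possible asm destination.
  ; With this change, no BTI is required at %indirect_label, since branches
  ; from the assembly to it are assumed to be direct.
  callbr void asm sideeffect "nop", "!i"()
          to label %fallthrough [label %indirect_label]
fallthrough:
  ret void
indirect_label:
  ret void
}
```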
This change was proposed in https://discourse.llvm.org/t/85845 and there
seemed to be no disagreement. The rationale is:
1. it brings clang's handling of asm goto in Arm and AArch64 in line
with gcc's, which didn't generate BTIs at the target labels in the first
place.
2. it improves performance in the Linux kernel, which uses a lot of 'asm
goto' in which the assembly language just contains a NOP, and the
label's address is saved elsewhere to let the kernel self-modify at run
time to swap between the original NOP and a direct branch to the label.
This allows hot code paths to be instrumented for debugging, at only the
cost of a NOP when the instrumentation is turned off, instead of the
larger cost of an indirect branch. In this situation a BTI is
unnecessary (if the branch happens, it's direct) and, since the code
paths are hot, it is also a noticeable performance hit.
Implementation:
`SelectionDAGBuilder::visitCallBr` is the place where 'asm goto' target
labels are handled. It calls `setIsInlineAsmBrIndirectTarget()` on each
target `MachineBasicBlock`. Previously it also called
`setMachineBlockAddressTaken()`, which made `hasAddressTaken()` return
true, which caused a BTI to be added in the Arm backends.
Now `visitCallBr` doesn't call `setMachineBlockAddressTaken()` any more
on asm goto targets, but `hasAddressTaken()` also checks the flag set by
`setIsInlineAsmBrIndirectTarget()`. So call sites that were using
`hasAddressTaken()` don't need to be modified. But the Arm backends
don't call `hasAddressTaken()` any more: instead they test two more
specific query functions that cover all the reasons `hasAddressTaken()`
might have returned true _except_ being an asm goto target.
Testing:
The new test `AArch64/callbr-asm-label-bti.ll` is testing the actual
change, where it expects not to see a `bti` instruction after
`[[LABEL]]`. The rest of the test changes are all churn, due to the
flags on basic blocks changing. Actual output code hasn't changed in any
of the existing tests, only comments and diagnostics.
Further work:
`RISCVIndirectBranchTracking.cpp` and `X86IndirectBranchTracking.cpp`
also call `hasAddressTaken()` in a way that might benefit from using the
same more specific check I've put in `ARMBranchTargets.cpp` and
`AArch64BranchTargets.cpp`. But I'm not sure of that, so in this commit
I've only changed the Arm backends, and left those alone.
In https://reviews.llvm.org/D157685 I changed SDAG to only transfer
!range metadata if it is also accompanied by !noundef. At the time, this was
necessary because SDAG incorrectly propagated poison when folding
logical and/or to bitwise and/or.
The root cause of that issue has since been addressed by
https://github.com/llvm/llvm-project/pull/84924, so drop the workaround
now.
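A hedged example of IR that benefits (function name and range values are
illustrative): the !range here can now be transferred to SDAG even
without an accompanying !noundef.

```ll
define i8 @load_bool(ptr %p) {
  %v = load i8, ptr %p, align 1, !range !0
  ret i8 %v
}

!0 = !{i8 0, i8 2}
```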
This opcode represents the addition of a pointer value (first operand)
and an integer offset (second operand). PTRADD nodes are only generated
if the TargetMachine opts in by overriding
TargetMachine::shouldPreservePtrArith().
The PTRADD node and respective visitPTRADD() function were adapted by
@rgwott from the CHERI/Morello LLVM tree.
Original authors: @davidchisnall, @jrtc27, @arichardson.
The changes in this PR were extracted from PR #105669.
---------
Co-authored-by: David Chisnall <github@theravensnest.org>
Co-authored-by: Jessica Clarke <jrtc27@jrtc27.com>
Co-authored-by: Alexander Richardson <alexrichardson@google.com>
Co-authored-by: Rodolfo Wottrich <rodolfo.wottrich@arm.com>
This adds [de]interleave intrinsics for factors of 4,6,8, so that every
interleaved memory operation supported by the in-tree targets can be
represented by a single intrinsic.
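A hedged sketch of one of the new intrinsics, assuming the naming and
mangling follow the existing llvm.vector.interleave2 (the function name
is illustrative):

```ll
declare <vscale x 16 x i32> @llvm.vector.interleave4.nxv16i32(
    <vscale x 4 x i32>, <vscale x 4 x i32>, <vscale x 4 x i32>, <vscale x 4 x i32>)

define <vscale x 16 x i32> @interleave4(<vscale x 4 x i32> %a, <vscale x 4 x i32> %b,
                                        <vscale x 4 x i32> %c, <vscale x 4 x i32> %d) {
  ; A single factor-4 interleave instead of a tree of interleave2 calls.
  %v = call <vscale x 16 x i32> @llvm.vector.interleave4.nxv16i32(
          <vscale x 4 x i32> %a, <vscale x 4 x i32> %b,
          <vscale x 4 x i32> %c, <vscale x 4 x i32> %d)
  ret <vscale x 16 x i32> %v
}
```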
For context, [de]interleaves of fixed-length vectors are represented by
a series of shufflevectors. The intrinsics are needed for scalable
vectors, and we don't currently scalably vectorize all possible factors
of interleave groups supported by RISC-V/AArch64.
The underlying reason for this is that higher factors are currently
represented by interleaving multiple interleaves themselves, which made
sense at the time in the discussion in
https://github.com/llvm/llvm-project/pull/89018.
But after trying to integrate these for higher factors on RISC-V I think
we should revisit this design choice:
- Matching these in InterleavedAccessPass is non-trivial: We currently
only support factors that are a power of 2, and detecting this requires
a good chunk of code
- The shufflevector masks used for [de]interleaves of fixed-length
vectors are much easier to pattern match as they are strided patterns,
but for the intrinsics it's much more complicated to match as the
structure is a tree.
- Unlike shufflevectors, there's no optimisation that happens on
[de]interleave2 intrinsics
- For non-power-of-2 factors e.g. 6, there are multiple possible ways a
[de]interleave could be represented, see the discussion in #139373
- We already have intrinsics for 2,3,5 and 7, so by avoiding 4,6 and 8
we're not really saving much
By representing these higher factors as interleaved interleaves, we can
in theory support arbitrarily high interleave factors. However, I'm not
sure this is actually needed in practice: SVE only has instructions
for factors 2,3,4, whilst RVV only supports up to factor 8.
This patch would make it much easier to support scalable interleaved
accesses in the loop vectorizer for RISC-V for factors 3,5,6 and 7, as
the loop vectorizer and InterleavedAccessPass wouldn't need to
construct and match trees of interleaves.
For interleave factors above 8, for which there are no hardware memory
operations to match in the InterleavedAccessPass, we can still keep the
wide load + recursive interleaving in the loop vectorizer.
Of the 128 bits of a buffer descriptor, only 48 are address bits, so
following the discussion on https://discourse.llvm.org/t/clarifiying-the-semantics-of-ptrtoint/83987/54,
the logical conclusion is to set the index width to 48 bits instead of
the current value of 128.
Most of the test changes are mechanical datalayout updates, but there
is one actual change: the ptrmask test now uses .i48 instead of .i128
and I had to update SelectionDAGBuilder to correctly extend the mask.
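A hedged, minimal sketch of what this means at the IR level (the address
space number and the stripped-down datalayout string are assumptions for
illustration, not the real AMDGPU datalayout):

```ll
target datalayout = "p8:128:128:128:48"

declare ptr addrspace(8) @llvm.ptrmask.p8.i48(ptr addrspace(8), i48)

define ptr addrspace(8) @mask_rsrc(ptr addrspace(8) %rsrc, i48 %mask) {
  ; With a 48-bit index width, the ptrmask mask is i48 rather than i128.
  %m = call ptr addrspace(8) @llvm.ptrmask.p8.i48(ptr addrspace(8) %rsrc, i48 %mask)
  ret ptr addrspace(8) %m
}
```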
Reviewed By: krzysz00
Pull Request: https://github.com/llvm/llvm-project/pull/139419
For now, expansion still happens in SelectionDAGBuilder when
GET_ACTIVE_LANE_MASK is not legal on the target.
This patch also includes changes in AArch64ISelLowering to replace
the handling of the get.active.lane.mask intrinsic with the ISD node.
Tablegen patterns are added which match to whilelo for scalable types.
A follow up change will add support for more types to be lowered to
GET_ACTIVE_LANE_MASK by allowing splitting of the node.
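For reference, the intrinsic that now lowers through the new ISD node
(function name illustrative); the scalable form below is the kind that
matches to whilelo on AArch64:

```ll
declare <vscale x 4 x i1> @llvm.get.active.lane.mask.nxv4i1.i64(i64, i64)

define <vscale x 4 x i1> @lane_mask(i64 %base, i64 %n) {
  ; Element i of the mask is active when %base + i < %n.
  %m = call <vscale x 4 x i1> @llvm.get.active.lane.mask.nxv4i1.i64(i64 %base, i64 %n)
  ret <vscale x 4 x i1> %m
}
```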
It is used to mark a value that we are sure is not of some fcType.
Examples include:
* An argument of a function marked with nofpclass
* The output value of an intrinsic known not to be of some type
Subsequent operations can then make assumptions based on this.
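A hedged example of the first case (an argument marked nofpclass), and
the kind of assumption it enables; the function name is illustrative:

```ll
define float @no_nan(float nofpclass(nan) %x) {
  ; %x is known not to be a NaN, so this unordered compare can fold to false.
  %isnan = fcmp uno float %x, %x
  %r = select i1 %isnan, float 0.0, float %x
  ret float %r
}
```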
Migrate their usage to the `AnyMem*Inst` family, and add an isAtomic()
query on the base class for that hierarchy. This matches the idioms we
use for e.g. isAtomic on load, store, etc. instructions, the existing
isVolatile idioms on mem* routines, and allows us to more easily share
code between atomic and non-atomic variants.
As with #138568, the goal here is to simplify the class hierarchy and
make it easier to reason about. I'm moving from easiest to hardest, and
will stop at some point when I hit "good enough". Longer term, I'd sorta
like to merge or reverse the naming on the plain Mem*Inst and the
AnyMem*Inst, but that's a much larger and more risky change. Not sure
I'm going to actually do that.
This is a follow up to c0a264e, but note that there is a functional
difference here: the root changes for the memcpy.inline case. This
difference appears to have been accidental, but I kept this back to
facilitate separate review in case there's something I'm missing here.
This is a reland of #138434 except that:
- the bits for llvm/lib/CodeGen/RenameIndependentSubregs.cpp
have been dropped because they caused a test failure under asan, and
- the bits for llvm/lib/CodeGen/SelectionDAG/ScheduleDAGFast.cpp have
been improved with structured bindings.
This reverts commit a9699a334bc9666570418a3bed9520bcdc21518b.
Breaks CodeGen/AMDGPU/collapse-endcf.ll in several configs
(sanitizer builds; macOS; possibly more), see comments on
https://github.com/llvm/llvm-project/pull/138434
This patch adds support for LLVM IR atomicrmw `fmaximum` and `fminimum`
instructions.
These mirror the `llvm.maximum.*` and `llvm.minimum.*` instructions, but
are atomic and use IEEE 754-2019 handling for NaNs, which is different to
`fmax` and `fmin`. See:
https://llvm.org/docs/LangRef.html#llvm-minimum-intrinsic
for more details.
Future changes will allow this LLVM IR to be lowered to specialised
assembler instructions on suitable targets, such as AArch64.
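A hedged example of the new operations in IR (function name
illustrative):

```ll
define float @atomic_fmax(ptr %p, float %v) {
  ; Atomically computes the IEEE 754-2019 maximum of *%p and %v,
  ; returning the old value.
  %old = atomicrmw fmaximum ptr %p, float %v seq_cst
  ret float %old
}
```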
And teach SelectionDAGBuilder to get the range metadata in
visitAtomicLoad.
This allows us to recognize that sign extending a byte load of a
boolean value from memory will produce zeros for the extended bits.
This allows us to remove an AND on RISC-V.
Tests copied from #136502 with range metadata added to i1 cases.
Some of the test effects overlap with #136502, but that patch can't
handle the acquire or seq_cst cases with the Zalasr extension. We
only have sign extending versions of those loads.
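As an illustration (hedged, mirroring the shape of those tests; names
and range values are illustrative): an atomic boolean load whose !range
metadata lets the later sign extension be folded away.

```ll
define i64 @load_bool_sext(ptr %p) {
  %b = load atomic i8, ptr %p acquire, align 1, !range !0
  %s = sext i8 %b to i64   ; extended bits are known zero given the range [0, 2)
  ret i64 %s
}

!0 = !{i8 0, i8 2}
```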
Rename one signature of getAtomic to getAtomicLoad and pass LoadExtType.
Previously we had to set the extension type after the node was created,
but we don't usually modify SDNodes once they are created. It's possible
the node already existed and has been CSEd. If that happens, modifying
the node may affect the other users. It's therefore safer to add the
extension type at creation so that it is part of the CSE information.
I don't know of any failures related to the current implementation. I
only noticed that it doesn't match how we usually do things.
This change removes the uint64_t constructor on LocationSize
preventing implicit conversion, and fixes up the using APIs to adapt to
the change. Note that I'm adding a couple of explicit conversion points
on routines where passing in a fixed offset as an integer seems likely
to have well understood semantics.
We had an unfortunate case which arose if you tried to pass a TypeSize
value to a parameter of LocationSize type. We'd find the implicit
conversion path through TypeSize -> uint64_t -> LocationSize which works
just fine for fixed values, but loses information and fails assertions
if the TypeSize was scalable. This change breaks the first link in that
implicit conversion chain since that seemed to be the easier one.
In [1], Nikita Popov suggested that during lowering an 'unreachable' insn
should not generate extra code for naked functions, and this applies to
all architectures. Note that for naked functions, the 'unreachable' insn
is still necessary in IR since the basic block needs a terminator.
This patch checks whether a function is a naked function; if it is, the
'unreachable' insn will not generate ISD::TRAP.
[1] https://github.com/llvm/llvm-project/pull/131731
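A hedged sketch of the pattern in question (the inline asm body and the
function name are illustrative):

```ll
define void @naked_fn() naked {
  call void asm sideeffect "ret", ""()
  ; The terminator is still required in IR, but with this change it no
  ; longer lowers to an ISD::TRAP for naked functions.
  unreachable
}
```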
Co-authored-by: Yonghong Song <yonghong.song@linux.dev>
The llvm.amdgcn.cs.chain intrinsic has a 'flags' operand which may
indicate that we want to reallocate the VGPRs before performing the
call.
A call with the following arguments:
```
llvm.amdgcn.cs.chain %callee, %exec, %sgpr_args, %vgpr_args,
/*flags*/0x1, %num_vgprs, %fallback_exec, %fallback_callee
```
is supposed to do the following:
- copy the SGPR and VGPR args into their respective registers
- try to change the VGPR allocation
- if the allocation has succeeded, set EXEC to %exec and jump to
%callee, otherwise set EXEC to %fallback_exec and jump to
%fallback_callee
This patch implements the dynamic VGPR behaviour by generating an
S_ALLOC_VGPR followed by S_CSELECT_B32/64 instructions for the EXEC and
callee. The rest of the call sequence is left undisturbed (i.e.
identical to the case where the flags are 0 and we don't use dynamic
VGPRs). We achieve this by introducing some new pseudos
(SI_CS_CHAIN_TC_Wn_DVGPR) which are expanded in the SILateBranchLowering
pass, just like the simpler SI_CS_CHAIN_TC_Wn pseudos. The main reason
is so that we don't risk other passes (particularly the PostRA
scheduler) introducing instructions between the S_ALLOC_VGPR and the
jump. Such instructions might end up using VGPRs that have been
deallocated, or the wrong EXEC mask. Once the whole backend treats
S_ALLOC_VGPR and changes to EXEC as barriers for instructions that use
VGPRs, we could in principle move the expansion earlier (but in the
absence of a good reason for that my personal preference is to keep it
later in order to make debugging easier).
Since the expansion happens after register allocation, we're careful to
select constants to immediate operands instead of letting ISel generate
S_MOVs which could interfere with register allocation (i.e. make it look
like we need more registers than we actually do).
For GFX12, S_ALLOC_VGPR only works in wave32 mode, so we bail out during
ISel in wave64 mode. However, we can define the pseudos for wave64 too
so it's easy to handle if future generations support it.
---------
Co-authored-by: Ana Mihajlovic <Ana.Mihajlovic@amd.com>
Co-authored-by: Matt Arsenault <Matthew.Arsenault@amd.com>
Unlike in Itanium EH IR, WinEH IR's unwinding instructions (e.g.
`invoke`s) can have multiple possible unwind destinations.
For example:
```ll
entry:
invoke void @foo()
to label %cont unwind label %catch.dispatch
catch.dispatch: ; preds = %entry
%0 = catchswitch within none [label %catch.start] unwind label %terminate
catch.start: ; preds = %catch.dispatch
%1 = catchpad within %0 [ptr null]
...
terminate: ; preds = %catch.dispatch
%2 = catchpad within none []
...
...
```
In this case, if an exception is not caught by `catch.dispatch` (and
thus `catch.start`), it should next unwind to `terminate`.
`findUnwindDestination` in ISel gathers the list of these unwind
destinations by traversing the unwind edges:
ae42f07103/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp (L2089-L2150)
But we don't use that, and instead use our custom
`findWasmUnwindDestinations` that only adds the first unwind
destination, `catch.start`, to the successor list of `entry`, and not
`terminate`:
ae42f07103/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp (L2037-L2087)
The reason behind this, as described in the comment block in the code,
was the assumption that there would always be an `invoke` that connects
`catch.start` and `terminate`. In case of `catch (type)`, there will be
`call void @llvm.wasm.rethrow()` in `catch.start`'s predecessor that
unwinds to the next destination. For example:
0db702ac8e/llvm/test/CodeGen/WebAssembly/exception.ll (L429-L430)
In case of `catch (...)`, `__cxa_end_catch` can throw, so it becomes an
`invoke` that unwinds to the next destination. For example:
0db702ac8e/llvm/test/CodeGen/WebAssembly/exception.ll (L537-L538)
So the unwind ordering relationship between `catch.start` and
`terminate` here would be preserved.
But it turns out this assumption does not always hold. For example:
```ll
entry:
invoke void @foo()
to label %cont unwind label %catch.dispatch
catch.dispatch: ; preds = %entry
%0 = catchswitch within none [label %catch.start] unwind label %terminate
catch.start: ; preds = %catch.dispatch
%1 = catchpad within %0 [ptr null]
...
call void @_ZSt9terminatev()
unreachable
terminate: ; preds = %catch.dispatch
%2 = catchpad within none []
call void @_ZSt9terminatev()
unreachable
...
```
In this case there is no `invoke` that connects `catch.start` to
`terminate`. So after `catch.dispatch` BB is removed in ISel,
`terminate` is considered unreachable and incorrectly removed in DCE.
This makes Wasm just use the general `findUnwindDestination`. In that
case `entry`'s successor is going to be [`catch.start`, `terminate`]. We
can get the first unwind destination by just traversing the list from
the front.
---
This required another change in WinEHPrepare. WinEHPrepare demotes all
PHIs in EH pads because they are funclets in Windows and funclets can't
have PHIs. When used in Wasm they are not funclets, so we don't need to
do that wholesale, but we still need to demote PHIs in `catchswitch` BBs
because they are deleted during ISel. (So we created the
[`-demote-catchswitch-only`](a5588b6d20/llvm/lib/CodeGen/WinEHPrepare.cpp (L57-L59))
option for that.)
But it turns out we need to remove PHIs that have a `catchswitch` BB as an
incoming block too:
```ll
...
catch.dispatch:
%0 = catchswitch within none [label %catch.start] unwind label %terminate
catch.start:
...
somebb:
...
ehcleanup:       ; preds = %catch.dispatch, %somebb
%1 = phi i32 [ 10, %catch.dispatch ], [ 20, %somebb ]
...
```
In this case the `phi` in `ehcleanup` BB should be demoted too because
`catch.dispatch` BB will be removed in ISel, so one of its incoming blocks
will be gone. This pattern didn't manifest before, presumably due to how
`findWasmUnwindDestinations` worked. (In this example, in our
`findWasmUnwindDestinations`, `catch.dispatch` would have had only one
successor, `catch.start`. But now `catch.dispatch` has both
`catch.start` and `ehcleanup` as successors, revealing this bug.) This
case is
[represented](ab87206c4b/llvm/test/CodeGen/WebAssembly/exception.ll (L445))
by the `rethrow_terminator` function in `exception.ll` (or
`exception-legacy.ll`), and without the WinEHPrepare fix it will crash.
---
Discovered by the reproducer provided in #126916, even though the bug
reported there was not this one.