llvm-project

Author	SHA1	Message	Date
Björn Pettersson	5e7924a3cb	[SelectionDAG] Handle more opcodes in isGuaranteedNotToBeUndefOrPoison (#147019 ) Add special handling of EXTRACT_SUBVECTOR, INSERT_SUBVECTOR, EXTRACT_VECTOR_ELT, INSERT_VECTOR_ELT and SCALAR_TO_VECTOR in isGuaranteedNotToBeUndefOrPoison. Make use of DemandedElts to improve the analysis and only check relevant elements for each operand. Also start using DemandedElts in the recursive calls that check isGuaranteedNotToBeUndefOrPoison for all operands for operations that do not create undef/poison. We can do that for a number of elementwise operations for which the DemandedElts can be applied to every operand (e.g. ADD, OR, BITREVERSE, TRUNCATE).	2025-08-14 09:05:15 +00:00
Nikita Popov	d1952baa5d	[CodeGen] Remove unnecessary setTypeListBeforeSoften() parameter (NFC) It does not make sense to set the softening type list without setting IsSoften=true.	2025-08-14 10:04:56 +02:00
XChy	f393f2a61e	[BranchFolding] Avoid moving blocks to fall through to an indirect target (#152916 ) Depend on #152591 to fix https://github.com/llvm/llvm-project/issues/149023. Similar to an EH pad, there is no real advantage in "falling through" to an indirect target of an INLINEASM_BR. And multiple indirect targets of inline asm at the end of a function may be rotated infinitely. Therefore, this patch avoids such optimization on indirect target of inline asm as fall through.	2025-08-14 16:18:36 +09:00
Craig Topper	9f96e3f80f	[SelectionDAG] Pass SDValue to InstrEmitter::EmitCopyFromReg. NFC (#153485 ) Instead of passing SDNode and ResNo separately. This allows us to use SDValue::operator== and avoid creating SDValue from the operands inside the function.	2025-08-13 21:46:48 -07:00
Carl Ritson	e4fd6ba682	[PHIElimination] Preserve MachinePostDominatorTree (#153346 ) Minor changes to allow preservation of post dominator tree through PHI elimination pass. Also remove duplicate retrieval of dominator tree analysis. This is a speculative change to support reworking on passes in AMDGPU backend.	2025-08-14 10:50:46 +09:00
David Green	c5105c1e0a	[GlobalISel] Fix bitcast fewerElements with scalar narrow types. (#153364 ) For a <8 x i32> -> <2 x i128> bitcast, that under aarch64 is split into two halves, the scalar i128 remainder was causing problems, causing a crash with invalid vector types. This makes sure they are handled correctly in fewerElementsBitcast.	2025-08-13 22:27:53 +01:00
Amy Kwan	63cc2e390d	[PowerPC][CodeGen] Expand ISD::AssertNoFPClass for ppc_fp128 (#152357 ) 780054d3ff18075a6bc433029f336931792b1d2d added support for `ISD::AssertNoFPClass`. This ISD node can be used with the `ppc_fp128` type, which is really just two `f64s` and requires expanding when used with `ISD::AssertNoFPClass`. Without the support for expanding the result, we get an assertion because the legalizer does not know how to expand the results of `ppc_fp128` with `ISD::AssertNoFPClass`. ``` ExpandFloatResult #0: t7: ppcf128 = AssertNoFPClass t5, TargetConstant:i32<3> LLVM ERROR: Do not know how to expand the result of this operator! ``` Thus, this patch aims to add support for the expand so we no longer assert. This fixes #151375.	2025-08-13 15:00:32 -04:00
Nikita Popov	498ef361fe	[CodeGen] Make OrigTy in CC lowering the non-aggregate type (#153414 ) https://github.com/llvm/llvm-project/pull/152709 exposed the original IR argument type to the CC lowering logic. However, in SDAG, this used the raw type, prior to aggregate splitting. This PR changes it to use the non-aggregate type instead. (This matches what happened in the GlobalISel case already.) I've also added some more detailed documentation on the InputArg/OutputArg fields, to explain how they differ. In most cases ArgVT is going to be the EVT of OrigTy, so they encode very similar information (OrigTy just preserves some additional information lost in EVTs, like pointer types). One case where they do differ is in post-legalization lowering of libcalls, where ArgVT is going to be a legalized type, while OrigTy is going to be the original non-legalized type.	2025-08-13 18:42:26 +02:00
Nikita Popov	240c454c4d	[CodeGen] Remove default ctors for InputArg and OutputArg (#153205 ) These make it easy to forget to initialize some members, like the newly added OrigTy. Force these to always go through the ctor instead.	2025-08-13 10:51:43 +02:00
Matt Arsenault	db126d8004	CodeGen: Make MachineFunction's subtarget member a reference (#153352 )	2025-08-13 16:22:32 +09:00
Shoreshen	db96363c0a	[AMDGPU] Avoid putting implicit_def into a bundle, which breaks the reg's liveness (#142563 ) Cause: 1. An `implicit_def` inside a bundle does not count as a definition of the reg in the machine instruction verifier 2. Including the `implicit_def` leaves the related reg undefined, resulting in `Bad machine code: Using an undefined physical register` in the machine instruction verifier Fixes https://github.com/llvm/llvm-project/issues/139102 --------- Co-authored-by: Matt Arsenault <Matthew.Arsenault@amd.com>	2025-08-13 10:41:44 +08:00
Philip Reames	49b17a0c1c	[MIR] Further cleanup on multiple save/restore point support [nfc] (#153250 ) Remove the type alias now that the std::variant aspect is gone; directly using std::vector in the few places that need it is more idiomatic. Move a routine from a core header to its single user.	2025-08-12 14:16:41 -07:00
Min-Yih Hsu	ca05058b49	[IA][RISCV] Recognize deinterleaved loads that could lower to strided segmented loads (#151612 ) Turn the following deinterleaved load patterns ``` %l = masked.load(%ptr, /mask=/110110110110, /passthru=/poison) %f0 = shufflevector %l, [0, 3, 6, 9] %f1 = shufflevector %l, [1, 4, 7, 10] %f2 = shufflevector %l, [2, 5, 8, 11] ``` into ``` %s = riscv.vlsseg2(/passthru=/poison, %ptr, /mask=/1111) %f0 = extractvalue %s, 0 %f1 = extractvalue %s, 1 %f2 = poison ``` The mask `110110110110` is regarded as 'gap mask' since it effectively skips the entire third field / component. Similarly, turning the following snippet ``` %l = masked.load(%ptr, /mask=/110000110000, /passthru=/poison) %f0 = shufflevector %l, [0, 3, 6, 9] %f1 = shufflevector %l, [1, 4, 7, 10] ``` into ``` %s = riscv.vlsseg2(/passthru=/poison, %ptr, /mask=/1010) %f0 = extractvalue %s, 0 %f1 = extractvalue %s, 1 ``` Right now this patch only tries to detect gap mask from a constant mask supplied to a masked.load/vp.load.	2025-08-12 14:08:18 -07:00
Philip Reames	4d629f9744	[MIR] Remove std::variant from multiple save/restore point handling [nfc] (#153226 ) In review of bbde6b, I had originally proposed that we support the legacy text format. As review evolved, it became clear this had been a bad idea (too much complexity), but in order to let that patch finally move forward, I approved the change with the variant. This change undoes the variant, and updates all the tests to just use the array form.	2025-08-12 11:23:05 -07:00
Daniel Paoliello	c430e06fb5	[win][arm64ec] Fix duplicate errors with the dontcall attribute (#152810 ) Since the `dontcall-` attributes are checked both by `FastISel`/`GlobalISel` and `SelectionDAGBuilder`, and both `FastISel` and `GlobalISel` bail for calls on Arm64EC AFTER doing the check, we ended up emitting duplicate copies of this error. This change moves the checking for `dontcall-` in `FastISel` and `GlobalISel` to after the call has been successfully lowered.	2025-08-12 11:05:07 -07:00
Elizaveta Noskova	bbde6be841	[llvm] Support multiple save/restore points in mir (#119357 ) Currently mir supports only one save and one restore point specification: ``` savePoint: '%bb.1' restorePoint: '%bb.2' ``` This patch makes it possible to have multiple save and multiple restore points in mir: ``` savePoints: - point: '%bb.1' restorePoints: - point: '%bb.2' ``` Shrink-Wrap points split Part 3. RFC: https://discourse.llvm.org/t/shrink-wrap-save-restore-points-splitting/83581 Part 1: https://github.com/llvm/llvm-project/pull/117862 Part 2: https://github.com/llvm/llvm-project/pull/119355 Part 4: https://github.com/llvm/llvm-project/pull/119358 Part 5: https://github.com/llvm/llvm-project/pull/119359	2025-08-12 16:34:29 +03:00
XChy	2a49719525	[SelectionDAGBuilder] Look for appropriate INLINEASM_BR instruction to verify (#152591 ) Partially fix #149023. The original code `MRI.def_begin(Reg)->getParent()` may return the incorrect MI, as the physical register `Reg` may have multiple definitions. This patch selects the correct MI to verify by comparing the MBB of each definition. New testcase hangs with -O1/2/3 enabled. The BranchFolding may be to blame.	2025-08-12 12:37:56 +00:00
David Sherwood	7f763d9b48	[AArch64] Support symmetric complex deinterleaving with higher factors (#151295 ) For loops such as this: ``` struct foo { double a, b; }; void foo(struct foo *dst, struct foo *src, int n) { for (int i = 0; i < n; i++) { dst[i].a += src[i].a * 3.2; dst[i].b += src[i].b * 3.2; } } ``` the complex deinterleaving pass will spot that the deinterleaving associated with the structured loads cancels out the interleaving associated with the structured stores. This happens even though they are not truly "complex" numbers because the pass can handle symmetric operations too. This is great because it means we can then perform normal loads and stores instead. However, we can also do the same for higher interleave factors, e.g. 4: ``` struct foo { double a, b, c, d; }; void foo(struct foo *dst, struct foo *src, int n) { for (int i = 0; i < n; i++) { dst[i].a += src[i].a * 3.2; dst[i].b += src[i].b * 3.2; dst[i].c += src[i].c * 3.2; dst[i].d += src[i].d * 3.2; } } ``` This PR extends the pass to effectively treat such structures as a set of complex numbers, i.e. ``` struct foo_alt { std::complex<double> x, y; }; ``` with equivalence between members: ``` foo_alt.x.real == foo.a foo_alt.x.imag == foo.b foo_alt.y.real == foo.c foo_alt.y.imag == foo.d ``` I've written the code to handle sets with arbitrary numbers of complex values, but since we only support interleave factors between 2 and 4 I've restricted the sets to 1 or 2 complex numbers. Also, for now I've restricted support for interleave factors of 4 to purely symmetric operations only. However, it could also be extended to handle complex multiplications, reductions, etc. Fixes: https://github.com/llvm/llvm-project/issues/144795	2025-08-12 11:05:15 +01:00
Seraphimt	296e057d0b	[DAG] SelectionDAG::canCreateUndefOrPoison - add ISD::FMA/FMAD + tests (#152187 ) In SelectionDAG::canCreateUndefOrPoison add case ISD::FMA/FMAD + tests. Fixing #147693 --------- Co-authored-by: Matt Arsenault <arsenm2@gmail.com>	2025-08-12 17:17:46 +09:00
Benjamin Maxwell	d0c9599c41	[AArch64][SME] Use entry pstate.sm for conditional streaming-mode changes (#152169 ) We only do conditional streaming mode changes in two cases: - Around calls in streaming-compatible functions that don't have a streaming body - At the entry/exit of streaming-compatible functions with a streaming body In both cases, the condition depends on the entry pstate.sm value. Given this, we don't need to emit calls to __arm_sme_state at every mode change. This patch handles this by placing a "AArch64ISD::ENTRY_PSTATE_SM" node in the entry block and copying the result to a register. The register is then used whenever we need to emit a conditional streaming mode change. The "ENTRY_PSTATE_SM" node expands to a call to "__arm_sme_state" only if (after SelectionDAG) the function is determined to have streaming-mode changes. This has two main advantages: 1. It allows back-to-back conditional smstart/stop pairs to be folded 2. It has the correct behaviour for EH landing pads - These are entered with pstate.sm = 0, and should switch mode based on the entry pstate.sm - Note: This is not fully implemented yet	2025-08-12 09:15:30 +01:00
Stephen Long	19ada02086	PreISelIntrinsicLowering: Lower llvm.log to a loop if scalable vec arg (#129744 ) Similar to ab976a1, but for llvm.log.	2025-08-12 01:04:28 +09:00
Wesley Wiser	40a469f79a	Reapply "[X86] Correct 32-bit immediate assertion and fix 64-bit lowering for huge frame offsets" (#152239 ) The first commit is identical to 69bec0afbb8f2aa0021d18ea38768360b16583a9. The second commit fixes the instruction verification failures by replacing the erroneous instruction with a trap after the error is reported and adds `-verify-machineinstrs` to the tests added in the original PR to catch the issue sooner. After that change, all tests pass with both `LLVM_ENABLE_EXPENSIVE_CHECKS={On,Off}`. cc @RKSimon @e-kud @phoebewang @arsenm as reviewers on the original PR	2025-08-11 21:23:44 +05:30
Fabian Ritter	96775e9229	[GISel] Handle Flags in G_PTR_ADD Combines (#152495 ) So far, GlobalISel's G_PTR_ADD combines have ignored MIFlags like nuw, nusw, and inbounds. That was in many cases unnecessarily conservative and in others unsound, since reassociations re-used the existing G_PTR_ADD instructions without invalidating their flags. This patch aims to improve that. I've checked the transforms in this PR with Alive2 on corresponding middle-end IR constructs. A longer-term goal would be to encapsulate the logic that determines which GEP/ISD::PTRADD/G_PTR_ADD flags can be preserved in which case, since this occurs in similar forms in the middle end, the SelectionDAG combines, and the GlobalISel combines here. For SWDEV-516125.	2025-08-11 10:34:45 +02:00
Nikita Popov	e92b7e9641	[CodeGen] Provide original IR type to CC lowering (NFC) (#152709 ) It is common to have ABI requirements for illegal types: For example, two i64 argument parts that originally came from an fp128 argument may have a different call ABI than ones that came from a i128 argument. The current calling convention lowering does not provide access to this information, so backends come up with various hacks to support it (like additional pre-analysis cached in CCState, or bypassing the default logic entirely). This PR adds the original IR type to InputArg/OutputArg and passes it down to CCAssignFn. It is not actually used anywhere yet, this just does the mechanical changes to thread through the new argument.	2025-08-11 08:57:53 +02:00
Yingwei Zheng	62735d26b1	[DAGCombine] Correctly extend the constant RHS in `TargetLowering::SimplifySetCC` (#152862 ) In https://github.com/llvm/llvm-project/pull/150270, when the predicate is eq/ne and the trunc has only an nsw flag, the RHS is incorrectly zero-extended. Closes https://github.com/llvm/llvm-project/issues/152630.	2025-08-10 01:24:37 +08:00
Kazu Hirata	e98b8cbf55	[MIRParser] Remove an unnecessary cast (NFC) (#152835 ) peekDebugInstrNum() already returns unsigned.	2025-08-09 06:57:51 -07:00
Alexander Richardson	3a4b351ba1	[IR] Introduce the `ptrtoaddr` instruction This introduces a new `ptrtoaddr` instruction which is similar to `ptrtoint` but has two differences: 1) Unlike `ptrtoint`, `ptrtoaddr` does not capture provenance 2) `ptrtoaddr` only extracts (and then extends/truncates) the low index-width bits of the pointer For most architectures, difference 2) does not matter since index (address) width and pointer representation width are the same, but this does make a difference for architectures that have pointers that aren't just plain integer addresses such as AMDGPU fat pointers or CHERI capabilities. This commit introduces textual and bitcode IR support as well as basic code generation, but optimization passes do not handle the new instruction yet so it may result in worse code than using ptrtoint. Follow-up changes will update capture tracking, etc. for the new instruction. RFC: https://discourse.llvm.org/t/clarifiying-the-semantics-of-ptrtoint/83987/54 Reviewed By: nikic Pull Request: https://github.com/llvm/llvm-project/pull/139357	2025-08-08 10:12:39 -07:00
Kazu Hirata	9beb18a6f0	[CodeGen] Remove an unnecessary cast (NFC) (#152643 ) getUnitInc() already returns int.	2025-08-08 07:44:51 -07:00
woruyu	95b16d1264	[DAG] Fold trunc(abdu(x,y)) and trunc(abds(x,y)) if they have sufficient leading zero/sign bits (#151471 ) This PR resolves https://github.com/llvm/llvm-project/issues/147683 --------- Co-authored-by: Simon Pilgrim <llvm-dev@redking.me.uk>	2025-08-08 10:43:14 +01:00
Nikita Popov	c23b4fbdbb	[IR] Remove size argument from lifetime intrinsics (#150248 ) Now that #149310 has restricted lifetime intrinsics to only work on allocas, we can also drop the explicit size argument. Instead, the size is implied by the alloca. This removes the ability to only mark a prefix of an alloca alive/dead. We never used that capability, so we should remove the need to handle that possibility everywhere (though many key places, including stack coloring, did not actually respect this).	2025-08-08 11:09:34 +02:00
Benjamin Maxwell	94c48a21bb	[AArch64][SVE] Fix hang in VECTOR_HISTOGRAM DAG combine (#152539 ) The histogram DAG combine went into an infinite loop of creating the same histogram node due to an incorrect use of the `refineUniformBase` and `refineIndexType` APIs. These APIs take SDValues by reference (SDValue&) and return `true` if they were "refined" (i.e., set to new values). Previously, this DAG combine would create the `Ops` array (used to create the new histogram node) before calling the `refine*` APIs, which copies the SDValues into the array, meaning the updated values were not used to create the new histogram node. Reproducer: https://godbolt.org/z/hsGWhTaqY (it will timeout)	2025-08-08 09:59:24 +01:00
David Stuttard	c7c0229480	Revert "[AMDGPU] SelectionDAG divergence tracking should take into account Target divergency. (#147560 )" (#152548 ) This reverts commit 9293b65a616b8de432a654d046e802540b146372.	2025-08-08 09:05:59 +01:00
zhijian lin	093439c688	[PowerPC][AIX] Using millicode for memcmp instead of libcall (#147093 ) AIX has "millicode" routines, which are functions loaded at boot time into fixed addresses in kernel memory. This allows them to be customized for the processor. The __memcmp routine is a millicode implementation; we use millicode for the memcmp function instead of a library call to improve performance.	2025-08-07 13:13:56 -04:00
Kazu Hirata	4be22dabc5	[CodeGen] Remove an unnecessary cast (NFC) (#152441 ) getActiveBits() already returns unsigned.	2025-08-07 07:22:42 -07:00
Chaitanya Koparkar	6ce68d3a12	[DAG] canCreateUndefOrPoison - add FP_EXTEND (#152249 ) Fixes https://github.com/llvm/llvm-project/issues/152141	2025-08-07 09:23:46 +01:00
Nikita Popov	406d9b1dd6	[CodeGen] Move IsFixed into ArgFlags (NFCI) (#152319 ) The information whether a specific argument is vararg or fixed is currently stored separately from all the other argument information in ArgFlags. This means that it is not accessible from CCAssign, and backends have developed all kinds of workarounds for how they can access it after all. Move this information to ArgFlags to make it directly available in all relevant places. I've opted to invert this and store it as IsVarArg, as I think that both makes the meaning more obvious and provides for a better default (which is IsVarArg=false).	2025-08-07 09:12:40 +02:00
Jann Horn	3f0c180ca0	[DebugInfo][DWARF] Add heapallocsite information (#132073 ) LLVM currently stores heapallocsite information in CodeView debuginfo, but not in DWARF debuginfo. Plumb it into DWARF as an LLVM-specific extension. heapallocsite debug information is useful when it is combined with allocator instrumentation that stores caller addresses; I've used a previous version of this patch for: - analyzing memory usage by object type - analyzing the distributions of values of class members Other possible uses might be: - attributing memory access profiles (for example, on Intel CPUs, from PEBS records with Linear Data Address) to object types or specific object members - adding type information to crash/ASAN reports	2025-08-06 10:34:58 -07:00
Craig Topper	57045a137f	[DAGCombiner] Avoid repeated calls to WideVT.getScalarSizeInBits() in DAGCombiner::mergeTruncStores. NFC (#152231 ) We already have a variable, WideNumBits, that contains the same information. Use it and delay the creation of WideVT until we really need it.	2025-08-06 09:10:02 -07:00
Simon Pilgrim	c4f6d34674	[DAG] getNode - fold (sext (trunc x)) -> x iff the upper bits are already signbits (#151945 ) Similar to what we already do for ZERO_EXTEND/ANY_EXTEND patterns.	2025-08-06 14:55:46 +01:00
Diana Picus	14cd133931	Revert "[AMDGPU] Intrinsic for launching whole wave functions" (#152286 ) Reverts llvm/llvm-project#145859 because it broke a HIP test: ``` [34/59] Building CXX object External/HIP/CMakeFiles/TheNextWeek-hip-6.3.0.dir/workload/ray-tracing/TheNextWeek/main.cc.o FAILED: External/HIP/CMakeFiles/TheNextWeek-hip-6.3.0.dir/workload/ray-tracing/TheNextWeek/main.cc.o /home/botworker/bbot/clang-hip-vega20/botworker/clang-hip-vega20/llvm/bin/clang++ -DNDEBUG -O3 -DNDEBUG -w -Werror=date-time --rocm-path=/opt/botworker/llvm/External/hip/rocm-6.3.0 --offload-arch=gfx908 --offload-arch=gfx90a --offload-arch=gfx1030 --offload-arch=gfx1100 -xhip -mfma -MD -MT External/HIP/CMakeFiles/TheNextWeek-hip-6.3.0.dir/workload/ray-tracing/TheNextWeek/main.cc.o -MF External/HIP/CMakeFiles/TheNextWeek-hip-6.3.0.dir/workload/ray-tracing/TheNextWeek/main.cc.o.d -o External/HIP/CMakeFiles/TheNextWeek-hip-6.3.0.dir/workload/ray-tracing/TheNextWeek/main.cc.o -c /home/botworker/bbot/clang-hip-vega20/llvm-test-suite/External/HIP/workload/ray-tracing/TheNextWeek/main.cc fatal error: error in backend: Cannot select: intrinsic %llvm.amdgcn.readfirstlane ```	2025-08-06 12:24:52 +02:00
Diana Picus	0461cd3d1d	[AMDGPU] Intrinsic for launching whole wave functions (#145859 ) Add the llvm.amdgcn.call.whole.wave intrinsic for calling whole wave functions. This will take as its first argument the callee with the amdgpu_gfx_whole_wave calling convention, followed by the call parameters which must match the signature of the callee except for the first function argument (the i1 original EXEC mask, which doesn't need to be passed in). Indirect calls are not allowed. Make direct calls to amdgpu_gfx_whole_wave functions a verifier error. Unspeakable horrors happen around calls from whole wave functions, the plan is to improve the handling of caller/callee-saved registers in a future patch. Tail calls are also handled in a future patch.	2025-08-06 10:25:53 +02:00
Alex MacLean	d27802a217	[DAGCombiner] Fold setcc of trunc, generalizing some NVPTX isel logic (#150270 ) This change adds support for folding a SETCC when one or both of the operands is a TRUNCATE with the appropriate no-wrap flags. This pattern can occur when promoting i8 operations in NVPTX, and we currently have some ISel rules to try to handle it.	2025-08-05 19:20:17 -07:00
Craig Topper	73685583c8	[VP][RISCV] Add a vp.load.ff intrinsic for fault only first load. (#128593 ) There's been some interest in supporting early-exit loops recently. https://discourse.llvm.org/t/rfc-supporting-more-early-exit-loops/84690 This patch was extracted from our downstream where we've been using it in our vectorizer.	2025-08-05 16:12:42 -07:00
Jann	da6424c9e3	[DebugInfo][DWARF] Don't emit bogus DW_AT_call_target for complex calls (#151378 ) On X86-64, LLVM currently generates the same DWARF debug info for `call rax` and `call [rax]`; in both cases, the generated DWARF claims that the call goes to address RAX. This bug occurs because the X86 machine instructions CALL64r and CALL64m both receive register operands, but those register operands have different semantics. To fix it, change DwarfDebug::constructCallSiteEntryDIEs() to validate the callee operand's semantics (`OperandType`) and make sure it is not semantically describing a memory location. This fix will result in fewer DW_TAG_call_site and DW_AT_call_target entries being generated. There is an existing test in dwarf-callsite-related-attrs.ll that asserts the broken behavior; remove the broken check, and instead add a new test dwarf-callsite-related-attrs-indirect.ll that checks behavior for indirect calls. The existing test xray-custom-log.ll is validating something even more broken: It checks the debug info generated by a PATCHABLE_EVENT_CALL. `TII->getCalleeOperand()` assumes that the first argument of a call instruction is always the destination, but the first argument of PATCHABLE_EVENT_CALL is instead the event structure; and so we were emitting debug info claiming the callee was stored in a register that actually contains some kind of xray event descriptor, and the test validates that this happens. I am breaking and deleting this test. I guess the intent there might have been to validate that we emit debuginfo referencing the target of the direct call that LLVM emits (which we don't do)? But I'm not sure.	2025-08-05 13:25:01 -07:00
Kazu Hirata	94dc3c6c49	[GlobalISel] Remove an unnecessary cast (NFC) (#152086 ) getImm() already returns int64_t.	2025-08-05 07:39:06 -07:00
Kazu Hirata	86ab5dc583	[AsmPrinter] Remove an unnecessary cast (NFC) (#152085 ) getValue() already returns uint64_t.	2025-08-05 07:38:58 -07:00
KRM7	ee47427386	[RegisterCoalescer] Fix subrange update when rematerialization widens a def (#151974 ) Currently, when an instruction rematerialized by the register coalescer defines more subregs of the destination register than the original COPY instruction did, we only add dead defs for the newly defined subregs if they were not defined anywhere else. For example, consider something like this before rematerialization: ``` %0:reg64 = CONSTANT 1 %1:reg128.sub_lo64_lo32 = COPY %0.lo32 %1:reg128.sub_lo64_hi32 = ... ... ``` that would look like this after rematerializing `%0`: ``` %0:reg64 = CONSTANT 1 %1:reg128.sub_lo64 = CONSTANT 1 %1:reg128.sub_lo64_hi32 = ... ... ``` A dead def would not be added for `%1.sub_lo64_hi32` at the 2nd instruction because its subrange wasn't empty beforehand.	2025-08-05 22:32:31 +09:00
Simon Pilgrim	9f50224b25	[DAG] Remove Depth=1 hack from isGuaranteedNotToBeUndefOrPoison checks (#152127 ) Now that #146490 removed the visitFreeze assertion that the node was still isGuaranteedNotToBeUndefOrPoison, we no longer need this reduced depth hack (which had to account for the difference in depth of freeze(op()) vs op(freeze())). Helps with some of the minor regressions in #150017.	2025-08-05 13:35:04 +01:00
Paul Walker	94d374ab6c	[LLVM][CGP] Allow finer control for sinking compares. (#151366 ) Compare sinking is selectable based on the result of hasMultipleConditionRegisters. This function is too coarse grained by not taking into account the differences between scalar and vector compares. This PR extends the interface to take an EVT to allow finer control. The new interface is used by AArch64 to disable sinking of scalable vector compares, but with isProfitableToSinkOperands updated to maintain the cases that are specifically tested.	2025-08-05 11:43:41 +01:00
Simon Pilgrim	d561259a08	[DAG] visitFREEZE - replace multiple frozen/unfrozen uses of an SDValue with just the frozen node (#150017 ) Similar to InstCombinerImpl::freezeOtherUses, attempt to ensure that we merge multiple frozen/unfrozen uses of a SDValue. This fixes a number of hasOneUse() problems when trying to push FREEZE nodes through the DAG. Remove SimplifyMultipleUseDemandedBits handling of FREEZE nodes as we now want to keep the common node, and not bypass for some nodes just because of DemandedElts. Fixes #149799	2025-08-05 09:24:09 +01:00
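
Note on 94c48a21bb ([AArch64][SVE] Fix hang in VECTOR_HISTOGRAM DAG combine): the ordering bug described in that message can be sketched in plain C++. This is an illustration under stated assumptions, not the LLVM code. `Value`, `refine`, `buildOpsBeforeRefine` and `buildOpsAfterRefine` are hypothetical stand-ins; only the idea (an API that updates its argument through a reference, and an operand array that copies values) comes from the commit message.

```cpp
#include <array>
#include <cassert>
#include <string>

// Hypothetical stand-in for an SDValue: a plain label.
using Value = std::string;

// Hypothetical stand-in for a refine* API: it updates V through the
// reference and returns true if V was changed.
bool refine(Value &V) {
  if (V == "refined")
    return false;
  V = "refined";
  return true;
}

// Buggy order: the operand array copies the values first, so the later
// refinement of Base and Index never reaches the array.
std::array<Value, 2> buildOpsBeforeRefine(Value Base, Value Index) {
  std::array<Value, 2> Ops = {Base, Index}; // copies are taken here
  refine(Base);
  refine(Index);
  return Ops; // still holds the unrefined values
}

// Fixed order: refine first, then build the operand array.
std::array<Value, 2> buildOpsAfterRefine(Value Base, Value Index) {
  refine(Base);
  refine(Index);
  return {Base, Index};
}

// Small check of both orderings.
void selfCheck() {
  assert(buildOpsBeforeRefine("base", "index")[0] == "base");
  assert(buildOpsAfterRefine("base", "index")[0] == "refined");
}
```

In the buggy ordering the rebuilt node would use the same operands as before, which is consistent with the combine recreating the same node again and again as the commit message describes.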


38221 Commits