Add the llvm.amdgcn.call.whole.wave intrinsic for calling whole wave
functions. Its first argument is the callee, which must use the
amdgpu_gfx_whole_wave calling convention; the remaining arguments are the
call parameters, which must match the callee's signature except for the
first argument (the i1 original EXEC mask, which doesn't need to be
passed in). Indirect calls are not allowed.
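The intended usage looks roughly like this (function names and types here
are illustrative, not taken from the actual tests):
```llvm
; The callee receives the original EXEC mask as its first argument; the
; intrinsic lowering provides it, so the caller does not pass it.
define amdgpu_gfx_whole_wave i32 @callee(i1 %active, i32 %x) {
  %y = select i1 %active, i32 %x, i32 0
  ret i32 %y
}

define amdgpu_gfx i32 @caller(i32 %x) {
  ; Only the arguments after the i1 are supplied at the call site.
  %ret = call i32(ptr, ...) @llvm.amdgcn.call.whole.wave(ptr @callee, i32 %x)
  ret i32 %ret
}
```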
Make direct calls to amdgpu_gfx_whole_wave functions a verifier error.
Unspeakable horrors happen around calls from whole wave functions; the
plan is to improve the handling of caller/callee-saved registers in
a future patch.
Tail calls will also be handled in a future patch.
This change adds support for folding a SETCC when one or both of the
operands is a TRUNCATE with the appropriate no-wrap flags. This pattern
can occur when promoting i8 operations in NVPTX, and we currently have
some ISel rules to try to handle it.
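At the IR level the equivalent pattern looks like this (the combine itself
operates on SETCC/TRUNCATE nodes; i16 is the promoted type here):
```llvm
define i1 @cmp_after_promotion(i16 %a, i16 %b) {
  %ta = trunc nuw i16 %a to i8
  %tb = trunc nuw i16 %b to i8
  ; With nuw on both truncates, this can be folded to: icmp eq i16 %a, %b
  %cmp = icmp eq i8 %ta, %tb
  ret i1 %cmp
}
```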
On X86-64, LLVM currently generates the same DWARF debug info for `call
rax` and `call [rax]`; in both cases, the generated DWARF claims that
the call goes to address RAX. This bug occurs because the X86 machine
instructions CALL64r and CALL64m both receive register operands, but
those register operands have different semantics.
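To illustrate (MachineIR sketch; the memory operand fields are abbreviated
and illustrative):
```
CALL64r $rax                        ; callee address is in RAX, so
                                    ; DW_AT_call_target = RAX is correct
CALL64m $rax, 1, $noreg, 0, $noreg  ; RAX is the base of a memory operand;
                                    ; the callee address is loaded from [RAX],
                                    ; so claiming the target is RAX is wrong
```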
To fix it, change DwarfDebug::constructCallSiteEntryDIEs() to validate
the callee operand's semantics (`OperandType`) and make sure it is not
semantically describing a memory location.
This fix results in fewer DW_TAG_call_site and DW_AT_call_target
entries being generated.
There is an existing test in dwarf-callsite-related-attrs.ll that
asserts the broken behavior; remove the broken check, and instead add a
new test dwarf-callsite-related-attrs-indirect.ll that checks behavior
for indirect calls.
The existing test xray-custom-log.ll validates something even more
broken: it checks the debug info generated by a PATCHABLE_EVENT_CALL.
`TII->getCalleeOperand()` assumes that the first argument of a call
instruction is always the destination, but the first argument of
PATCHABLE_EVENT_CALL is instead the event structure; and so we were
emitting debug info claiming the callee was stored in a register that
actually contains some kind of xray event descriptor, and the test
validates that this happens.
I am breaking and deleting this test.
I guess the intent there might have been to validate that we emit
debuginfo referencing the target of the direct call that LLVM emits
(which we don't do)? But I'm not sure.
Currently, when an instruction rematerialized by the register coalescer
defines more subregs of the destination register
than the original COPY instruction did, we only add dead defs for the
newly defined subregs if they were not defined anywhere
else. For example, consider something like this before
rematerialization:
```
%0:reg64 = CONSTANT 1
%1:reg128.sub_lo64_lo32 = COPY %0.lo32
%1:reg128.sub_lo64_hi32 = ...
...
```
that would look like this after rematerializing `%0`:
```
%0:reg64 = CONSTANT 1
%1:reg128.sub_lo64 = CONSTANT 1
%1:reg128.sub_lo64_hi32 = ...
...
```
A dead def would not be added for `%1.sub_lo64_hi32` at the second
instruction because its subrange wasn't empty beforehand.
Now that #146490 has removed the assertion in visitFreeze that the node
was still isGuaranteedNotToBeUndefOrPoison, we no longer need this
reduced-depth hack (which had to account for the difference in depth
between freeze(op()) and op(freeze())).
Helps with some of the minor regressions in #150017.
Compare sinking is selectable based on the result of
hasMultipleConditionRegisters. This function is too coarse grained
because it does not account for the differences between scalar and
vector compares. This PR extends the interface to take an EVT to allow
finer control.
The new interface is used by AArch64 to disable sinking of scalable
vector compares, but with isProfitableToSinkOperands updated to maintain
the cases that are specifically tested.
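A sketch of the kind of case this affects (illustrative IR): a
scalable-vector compare with uses in more than one block. With the
EVT-aware hook, AArch64 can keep the compare in place instead of sinking
a duplicate into each user block.
```llvm
define <vscale x 4 x i1> @multi_use_cmp(<vscale x 4 x i32> %a, <vscale x 4 x i32> %b, i1 %c) {
entry:
  %cmp = icmp slt <vscale x 4 x i32> %a, %b
  br i1 %c, label %then, label %else
then:
  ret <vscale x 4 x i1> %cmp
else:
  %not = xor <vscale x 4 x i1> %cmp, splat (i1 true)
  ret <vscale x 4 x i1> %not
}
```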
Similar to InstCombinerImpl::freezeOtherUses, attempt to ensure that we
merge multiple frozen/unfrozen uses of an SDValue. This fixes a number of
hasOneUse() problems when trying to push FREEZE nodes through the DAG.
Remove the SimplifyMultipleUseDemandedBits handling of FREEZE nodes, as
we now want to keep the common node rather than bypassing it for some
nodes just because of DemandedElts.
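The idea, sketched at the IR level (the change itself works on SDValues):
```llvm
define i32 @merge_freeze_uses(i32 %x) {
  %fx = freeze i32 %x
  %a = add i32 %fx, 1
  ; This unfrozen use is rewritten to use %fx, so the frozen node no
  ; longer fails hasOneUse()-style checks when pushing FREEZE through.
  %b = mul i32 %x, 3
  %r = xor i32 %a, %b
  ret i32 %r
}
```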
Fixes #149799
If using srl does not produce a legal constant for the RHS of the
final compare, try to use sra instead.
Because the AND constant is negative, the sign bit participates in the
compare; using an arithmetic shift right duplicates it.
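A worked example (widths and constants are illustrative):
```llvm
define i1 @sign_bits_in_compare(i64 %x) {
  ; Checks that bits 63..6 are all ones. With srl the final compare would
  ; need the constant 0x03ffffffffffffff, which may not be legal; with sra
  ; the duplicated sign bit makes the RHS -1, which is.
  %and = and i64 %x, -64
  %cmp = icmp eq i64 %and, -64
  ret i1 %cmp
}
; -> equivalent of: icmp eq i64 (ashr i64 %x, 6), -1
```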
Add a new combine to replace
```
(store ch (vselect cond truevec (load ch ptr offset)) ptr offset)
```
with
```
(mstore ch truevec ptr offset cond)
```
This saves a blend operation on targets that support conditional stores.
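At the IR level the pattern corresponds to (types illustrative):
```llvm
define void @select_store(ptr %p, <4 x i32> %truevec, <4 x i1> %cond) {
  %old = load <4 x i32>, ptr %p, align 16
  %sel = select <4 x i1> %cond, <4 x i32> %truevec, <4 x i32> %old
  store <4 x i32> %sel, ptr %p, align 16
  ret void
}
; becomes the equivalent of:
;   call void @llvm.masked.store.v4i32.p0(<4 x i32> %truevec, ptr %p, i32 16, <4 x i1> %cond)
```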
The IRTranslator now sets the flags more consistently with
`SelectionDAGBuilder::visitGetElementPtr()`. This affects `nuw` and `nusw`, as
well as the recently introduced `inbounds` MIFlag (see PR #150900).
This PR also adds more tests to `AArch64/GlobalISel/irtranslator-gep-flags.ll`
to cover all points in `IRTranslator::translateGetElementPtr` that set flags.
For SWDEV-516125.
This slightly relaxes the invariant established in #149310, by also
allowing the lifetime argument to be poison. This is to support the
typical pattern of RAUWing with poison when removing an instruction.
It's worth noting that this does not require any conservative
assumptions: lifetimes with poison arguments can simply be skipped.
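The pattern in question (a sketch, assuming the current two-argument form
of the intrinsics):
```llvm
define void @f() {
  %a = alloca i32
  call void @llvm.lifetime.start.p0(i64 4, ptr %a)
  call void @llvm.lifetime.end.p0(i64 4, ptr %a)
  ret void
}
; Deleting %a and RAUWing its uses with poison turns the markers into e.g.
;   call void @llvm.lifetime.start.p0(i64 4, ptr poison)
; which is now accepted and simply skipped.
```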
Fixes https://github.com/llvm/llvm-project/issues/151119.
The object file format specific derived classes are used in contexts
where the type is statically known. We don't use isa/dyn_cast, and we
want to eliminate MCSymbol::Kind in the base class.
The assertion previously did not work correctly because the operand was
being truncated to an `int` prior to comparison.
Change the assertion into a reported error, as suggested in
https://github.com/llvm/llvm-project/pull/101840#issuecomment-2304992425
by @arsenm.
Finally, fix the lowering on 64-bit targets so that offsets larger than
32 bits are correctly addressed, and add tests for various reported
issues.
The optimization introduced by #125637 tried to avoid using stacks to
promote bitcast with a vector result type. However, it isn't correct
when the input type is also a vector. This patch limits that
optimization to scalar-to-vector bitcasts.
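Illustrative IR for the two cases:
```llvm
define <2 x i32> @scalar_source(i64 %x) {
  %v = bitcast i64 %x to <2 x i32>        ; still eligible for the optimization
  ret <2 x i32> %v
}

define <2 x i32> @vector_source(<4 x i16> %y) {
  %w = bitcast <4 x i16> %y to <2 x i32>  ; excluded by this patch
  ret <2 x i32> %w
}
```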
This patch adds two DAG combines:
1. vector_interleave(splat, splat, ...) -> {splat,splat,...}
2. concat_vectors(splat, splat, ...) -> wide_splat
where all the input splats are identical. Both of these together enable
us to fold
concat_vectors(vector_interleave(splat, splat, ...))
into a wide splat. Post-legalisation we must only do the concat_vectors
combine if the wider type and splat operation are legal.
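An IR-level view of the combined fold (types illustrative; the intrinsic
returns the concatenation of the interleaved halves):
```llvm
define <vscale x 8 x i16> @interleaved_splat(i16 %x) {
  %ins = insertelement <vscale x 4 x i16> poison, i16 %x, i64 0
  %splat = shufflevector <vscale x 4 x i16> %ins, <vscale x 4 x i16> poison, <vscale x 4 x i32> zeroinitializer
  ; Interleaving identical splats is itself a splat, so the whole result
  ; can be folded to a wide splat of %x.
  %il = call <vscale x 8 x i16> @llvm.vector.interleave2.nxv8i16(<vscale x 4 x i16> %splat, <vscale x 4 x i16> %splat)
  ret <vscale x 8 x i16> %il
}
```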
For fixed-width vectors the DAG combine only occurs for interleave
factors of 3 or more; however, it's not currently safe to test this for
AArch64 since there isn't any lowering support for fixed-width
interleaves. I've only added fixed-width tests for RISCV.
#128400 introduced a use-after-free bug in
`RegAllocBase::cleanupFailedVReg` when removing intervals from regunits.
The issue stems from the `InterferenceCache` in `RAGreedy`, which holds
`LiveRange*`. The current `InterferenceCache` APIs make it difficult to
update the cache, and there isn't a straightforward way to do so.
Since #128400 already notes that it is unclear whether removing
intervals from regunits is necessary, this PR avoids the issue by simply
skipping that step.
Fixes SWDEV-527146.
Collect the necessary information for constructing the call graph
section, and emit it to the .callgraph section of the binary.
The MD5 hash of the callee_type metadata string is used as the numeric
type id emitted.
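The input looks roughly like this (metadata shape illustrative): an
indirect call tagged with callee_type metadata, whose .callgraph entry
records the MD5 hash of the generalized type name as the type id.
```llvm
%call = call i32 %fp(i32 %arg), !callee_type !0

!0 = !{!1}
!1 = !{i64 0, !"_ZTSFiiE.generalized"}
```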
Reviewers: ilovepi
Reviewed By: ilovepi
Pull Request: https://github.com/llvm/llvm-project/pull/87576
https://github.com/llvm/llvm-project/pull/114990 allowed more aggressive
tail duplication for computed-gotos in both pre- and post-regalloc tail
duplication.
In some cases, performing tail-duplication too early can lead to worse
results, especially if we duplicate blocks with a number of phi nodes.
This is causing a ~3% performance regression in some workloads using
Python 3.12.
This patch updates TailDup to delay aggressive tail-duplication for
computed gotos to after register allocation.
This means we can keep the non-duplicated version for a bit longer
throughout the backend, which should reduce compile time as well as
allow a number of optimizations and simplifications to trigger before
drastically expanding the CFG.
For the case in https://github.com/llvm/llvm-project/issues/106846, I
get the same performance with and without this patch on Skylake.
PR: https://github.com/llvm/llvm-project/pull/150911
This reduces the amount of boilerplate required when adding a new
field to MIMetadata and reduces the chance of bugs like the
one I fixed in TargetInstrInfo::reassociateOps.
Reviewers: arsenm, nikic
Reviewed By: nikic
Pull Request: https://github.com/llvm/llvm-project/pull/133535
Currently terminatorIsComputedGoto returns true for blocks with an
indirect branch terminator and no successors. If there are no
successors, the terminator is likely not a computed goto; return false
in that case.
Note that this is currently NFC, as the only caller checks it only when
there are successors, but it will be needed in
https://github.com/llvm/llvm-project/pull/150911.
PR: https://github.com/llvm/llvm-project/pull/151342
This tries to reland #123632 (previously reverted by commit
6b1db79887df19bc8e8c946108966aa6021c8b87).
This PR aims to fix coalescing of SUBREG_TO_REG when sub-register
liveness tracking is enabled; this is now the umpteenth reincarnation
of this effort :)
This change is needed in order to enable subreg liveness tracking for
AArch64, because without the implicit-def, Machine Copy Propagation
would remove a 'redundant' copy because it doesn't realise that the
top 32-bits of the register are zeroed, which subsequent instructions
rely on.
Changes compared to previous PR:
* Rather than updating all instructions that define the source register
  (SrcReg) of the SUBREG_TO_REG, this new approach only updates
  instructions that define SrcReg when they dominate the SUBREG_TO_REG.
  The live-ranges are updated accordingly (see the sketch below).
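A sketch of the resulting rewrite (AArch64-flavoured, instruction and
register names illustrative):
```
; Before coalescing; the 32-bit def also zeroes the top 32 bits:
%0:gpr32 = MOVi32imm 1
%1:gpr64 = SUBREG_TO_REG 0, %0, %subreg.sub_32
; After coalescing, the dominating def of SrcReg gets an implicit-def of
; the full register, keeping the top-bit zeroing visible to Machine Copy
; Propagation:
undef %1.sub_32:gpr64 = MOVi32imm 1, implicit-def %1
```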
This flag applies to G_PTR_ADD instructions and indicates that the operation
implements an inbounds getelementptr operation, i.e., the pointer operand is in
bounds wrt. the allocated object it is based on, and the arithmetic does not
change that.
It is set when the IRTranslator lowers inbounds GEPs (currently only in some
cases, to be extended with a future PR), and in the
(build|materialize)ObjectPtrOffset functions.
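For illustration, a sketch of what the translation produces (the generic
MIR in the comments is abbreviated):
```llvm
define ptr @gep(ptr %p, i64 %i) {
  ; Translates roughly to:
  ;   %off:_(s64) = G_SHL %i, 2        ; scale by sizeof(i32)
  ;   %q:_(p0) = inbounds G_PTR_ADD %p, %off
  %q = getelementptr inbounds i32, ptr %p, i64 %i
  ret ptr %q
}
```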
Inbounds information is useful in ISel when we have instructions that perform
address computations whose intermediate steps must be in the same memory region
as the final result. A follow-up patch will start using it for AMDGPU's flat
memory instructions, where the immediate offset must not affect the memory
aperture of the address.
This is analogous to a concurrent effort in SDAG: #131862
(related: #140017, #141725).
For SWDEV-516125.
The "at construction" binop folds in SelectionDAG::getNode() has
different behaviour when compared to the equivalent LLVM IR. This PR
makes the behaviour consistent while also extending the coverage to
include signed/unsigned max/min operations.
Fold sequences where we extract a bunch of contiguous bits from a value,
merge them into the low bit and then check if the low bits are zero or
not.
Usually the ands would be at the leaves of the expression, but the DAG
canonicalizes them to a single `and` at the root of the expression.
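A worked example (widths and bit positions illustrative):
```llvm
define i1 @contiguous_bits_zero(i32 %x) {
  ; Bit 0 of the OR collects bits 4..6 of %x, so this is equivalent to
  ; icmp eq (and %x, 0x70), 0.
  %a = lshr i32 %x, 4
  %b = lshr i32 %x, 5
  %c = lshr i32 %x, 6
  %ab = or i32 %a, %b
  %abc = or i32 %ab, %c
  %low = and i32 %abc, 1
  %cmp = icmp eq i32 %low, 0
  ret i1 %cmp
}
```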
The reason I put this in DAGCombiner instead of the target combiner is
that this is a generic, valid transform that's also fairly niche, so I
think there isn't much risk of a combine loop.
See #136727