llvm-project

Author	SHA1	Message	Date
JaydeepChauhan14	0d17547879	[X86][NFC] Added POWI function testcases (#134276 ) - Moved existing llvm/test/CodeGen/X86/powi.ll file to llvm/test/CodeGen/X86/powi-const.ll. - Added new testcases for powi into llvm/test/CodeGen/X86/powi.ll.	2025-04-04 13:42:20 +02:00
Paul Walker	b0b97e3b05	[LLVM][AArch64] Refactor lowering of fixed length integer setcc operations. (#132434 ) The original code is essentially performing isel during legalisation with the AArch64 specific nodes offering no additional value compared to ISD::SETCC.	2025-04-04 12:13:45 +01:00
Alex MacLean	ba0a52a04b	[InferAS] Support getAssumedAddrSpace for Arguments for NVPTX (#133991 )	2025-04-03 16:47:36 -07:00
Sumit Agarwal	996cf5dc67	[HLSL] Implement dot2add intrinsic (#131237 ) Resolves #99221 Key points: For SPIRV backend, it decomposes into a `dot` followed by an `add`. - [x] Implement dot2add clang builtin, - [x] Link dot2add clang builtin with hlsl_intrinsics.h - [x] Add sema checks for dot2add to CheckHLSLBuiltinFunctionCall in SemaHLSL.cpp - [x] Add codegen for dot2add to EmitHLSLBuiltinExpr in CGBuiltin.cpp - [x] Add codegen tests to clang/test/CodeGenHLSL/builtins/dot2add.hlsl - [x] Add sema tests to clang/test/SemaHLSL/BuiltIns/dot2add-errors.hlsl - [x] Create the int_dx_dot2add intrinsic in IntrinsicsDirectX.td - [x] Create the DXILOpMapping of int_dx_dot2add to 162 in DXIL.td - [x] Create the dot2add.ll and dot2add_errors.ll tests in llvm/test/CodeGen/DirectX/	2025-04-03 16:23:09 -06:00
Sterling-Augustine	7514225052	Use a more proper idiom for "the output file doesn't matter". NFC. (#134280 ) As in the description. Follow up to PR #134179.	2025-04-03 10:24:10 -07:00
zhijian lin	1a540c3b8b	[PowerPC] Deprecate uses of ISD::ADDC/ISD::ADDE/ISD::SUBC/ISD::SUBE (#133155 ) ISD::ADDC, ISD::ADDE, ISD::SUBC and ISD::SUBE are being deprecated in favour of ISD::UADDO_CARRY and ISD::USUBO_CARRY. This patch lowers UADDO, UADDO_CARRY, USUBO and USUBO_CARRY.	2025-04-03 13:22:49 -04:00
Simon Pilgrim	2190808f5d	[X86] SimplifyDemandedVectorEltsForTargetNode - reduce the size of VPERMV/VPERMV3 nodes if the upper elements are not demanded (REAPPLIED) (#134263 ) With AVX512VL targets, use 128/256-bit VPERMV/VPERMV3 nodes when we only need the lower elements. Reapplied version of #133923 with fix for typo in the VPERMV3 mask adjustment	2025-04-03 17:39:38 +01:00
Brox Chen	bf388f8a43	[AMDGPU][True16][CodeGen] legalize operands when move16bit SALU to VALU (#133985 ) This is a follow-up PR to https://github.com/llvm/llvm-project/pull/132089. When a V2S copy and its useMI are lowered to VALU, this patch checks whether the newly generated VALU is a true16 inst and, if so, adds subreg access on all operands where necessary. An example MIR looks like: ``` %1:vgpr_32 = V_CVT_F32_U32_e64 %0:vgpr_32, 0, 0 ... %2:sreg_32 = COPY %1:vgpr_32 %3:sreg_32 = S_FLOOR_F16 %2:sreg_32, ... ``` currently lowered to ``` %1:vgpr_32 = V_CVT_F32_U32_e64 %0:vgpr_32, 0, 0 ... %2:vgpr_16 = V_FLOOR_F16_t16_e64 0, %1:vgpr_32, 0, 0, 0 ... ``` after this patch ``` %1:vgpr_32 = V_CVT_F32_U32_e64 %0:vgpr_32, 0, 0 ... %2:vgpr_16 = V_FLOOR_F16_t16_e64 0, %1.lo16:vgpr_32, 0, 0, 0 ... ```	2025-04-03 12:26:41 -04:00
Simon Pilgrim	12f75bba41	Revert "[X86] SimplifyDemandedVectorEltsForTargetNode - reduce the size of VPERMV/VPERMV3 nodes if the upper elements are not demanded" (#134256 ) Found a typo in the VPERMV3 mask adjustment - I'm going to revert and re-apply the patch with a fix Reverts llvm/llvm-project#133923	2025-04-03 16:28:24 +01:00
Krzysztof Drewniak	f23bb530cf	[AMDGPULowerBufferFatPointers] Use InstSimplifyFolder during rewrites (#134137 ) This PR updates AMDGPULowerBufferFatPointers to use the InstSimplifyFolder when creating IR during buffer fat pointer lowering. This shouldn't cause any large functional changes and might improve the quality of the generated code.	2025-04-03 10:12:18 -05:00
Aaron Puchert	9e0ca5720b	[X86] When expanding LCMPXCHG16B_SAVE_RBX, substitute RBX in base (#134109 ) The pseudo-instruction LCMPXCHG16B_SAVE_RBX is used when RBX serves as frame base pointer. At a very late stage it is then translated into a regular LCMPXCHG16B, preceded by copying the actual argument into RBX, and followed by restoring the register to the base pointer. However, in case the `cmpxchg` operates on a local variable, RBX might also be used as a base for the memory operand in frame finalization, and we've overwritten RBX with the input operand for `cmpxchg16b`. So we have to rewrite the memory operand base to use the saved value of RBX. Fixes #119959.	2025-04-03 15:56:53 +02:00
Steven Perron	a77d807781	[SPIRV] Add spirv.VulkanBuffer types to the backend (#133475 ) Adds code to expand `llvm.spv.resource.handlefrombinding` and `llvm.spv.resource.getpointer` when the resource type is `spirv.VulkanBuffer`. It gets expanded as a storage buffer or uniform buffer depending on the storage class used. This implements part of https://github.com/llvm/wg-hlsl/blob/main/proposals/0018-spirv-resource-representation.md.	2025-04-03 09:44:07 -04:00
Paul Walker	41a6bb4c05	[LLVM][CodeGen][SVE] Prefer NEON instructions when zeroing Z registers. (#133929 ) Several implementations have zero-latency instructions to zero registers. To date, no implementation has a dedicated SVE instruction, but we can use the NEON equivalent because it is defined to zero bits 128..VL regardless of the immediate used. NOTE: The relevant instruction is not available in streaming mode, where the original SVE DUP instruction remains in use.	2025-04-03 13:15:05 +01:00
Paul Walker	ee4e8197fa	[LLVM][AArch64][SVE] Mark DUP immediate instructions with isAsCheapAsAMove. (#133945 ) Doing this means we'll regenerate an immediate rather than copy the result of an existing one, reducing instruction dependency chains.	2025-04-03 11:42:07 +01:00
David Green	6c27817294	[SelectionDAG] Use SimplifyDemandedBits from SimplifyDemandedVectorElts Bitcast. (#133717 ) This adds a call to SimplifyDemandedBits from bitcasts with scalar input types in SimplifyDemandedVectorElts, which can help simplify the input scalar.	2025-04-03 11:14:08 +01:00
Simon Pilgrim	bf516098fb	[X86] SimplifyDemandedVectorEltsForTargetNode - reduce the size of VPERMV/VPERMV3 nodes if the upper elements are not demanded (#133923 ) With AVX512VL targets, use 128/256-bit VPERMV/VPERMV3 nodes when we only need the lower elements.	2025-04-03 11:01:08 +01:00
Hua Tian	7e65944292	[llvm][CodeGen] avoid repeated interval calculation in window scheduler (#132352 ) Some new registers are reused when replacing some old ones in certain use cases of ModuloScheduleExpander. It is necessary to avoid repeated interval calculations for these registers.	2025-04-03 14:25:55 +08:00
tangaac	ff0c2fbd8e	[LoongArch] Pre-commit tests for vector absolute difference (#132898 )	2025-04-03 09:19:59 +08:00
Sterling-Augustine	f68a5185d0	Allow this test to pass when the source is on a read-only filesystem (#134179 ) llc attempts to create an empty file in the current directory, but it can't do that on a read-only file system. Send that empty-output to stdout, which prevents this failure.	2025-04-02 16:49:57 -07:00
Sami Tolvanen	acc6bcdc50	Support alternative sections for patchable function entries (#131230 ) With -fpatchable-function-entry (or the patchable_function_entry function attribute), we emit records of patchable entry locations to the __patchable_function_entries section. Add an additional parameter to the command line option that allows one to specify a different default section name for the records, and an identical parameter to the function attribute that allows one to override the section used. The main use case for this change is the Linux kernel using prefix NOPs for ftrace, and thus depending on __patchable_function_entries to locate traceable functions. Functions that are not traceable currently disable entry NOPs using the function attribute, but this creates a compatibility issue with -fsanitize=kcfi, which expects all indirectly callable functions to have a type hash prefix at the same offset from the function entry. Adding a section parameter would allow the kernel to distinguish between traceable and non-traceable functions by adding entry records to separate sections while maintaining a stable function prefix layout for all functions. LKML discussion: https://lore.kernel.org/lkml/Y1QEzk%2FA41PKLEPe@hirez.programming.kicks-ass.net/	2025-04-02 21:53:55 +00:00
Brox Chen	066787b9bd	[AMDGPU][True16][CodeGen] fold clamp update for true16 (#128919 ) Check through COPY for possible clamp folding for v_mad_mixhi_f16 isel	2025-04-02 17:10:53 -04:00
Brox Chen	fb0e7b5f16	[AMDGPU][True16][CodeGen] Implement sgpr folding in true16 (#128929 ) We haven't implemented 16-bit SGPRs. For now, allow 32-bit SGPRs to be folded into True16 instructions taking 16-bit values. Also use sgpr_32 when an Imm is copied to sgpr_lo16 so it can be further folded. This improves generated code quality.	2025-04-02 16:08:26 -04:00
Florian Hahn	3bdf9a0880	[EquivalenceClasses] Use SmallVector for deterministic iteration order. (#134075 ) Currently, iterators over EquivalenceClasses iterate over a std::set, which guarantees the order specified by the comparator. Unfortunately, in many cases EquivalenceClasses are used with pointers, so iterating over a std::set of pointers will not be deterministic across runs. There are multiple places that explicitly try to sort the equivalence classes before using them to get a deterministic order (LowerTypeTests, SplitModule), but there are others that do not at the moment, and this can result at least in non-deterministic value naming in Float2Int. This patch updates EquivalenceClasses to keep track of all members via an extra SmallVector and removes code from LowerTypeTests and SplitModule to sort the classes before processing. Overall it looks like compile-time slightly decreases in most cases, but close to noise: https://llvm-compile-time-tracker.com/compare.php?from=7d441d9892295a6eb8aaf481e1715f039f6f224f&to=b0c2ac67a88d3ef86987e2f82115ea0170675a17&stat=instructions PR: https://github.com/llvm/llvm-project/pull/134075	2025-04-02 20:27:43 +01:00
Juan Manuel Martinez Caamaño	beae0e9f1a	[AMDGPU] Use a target feature to enable __builtin_amdgcn_global_load_lds on gfx9/10 (#133055 ) This patch introduces the `vmem-to-lds-load-insts` target feature, which can be used to enable builtins `__builtin_amdgcn_global_load_lds` and `__builtin_amdgcn_raw_ptr_buffer_load_lds` on platforms which have this feature. This feature is only available on gfx9/10. A limitation of using a common target feature for both builtins is that we could have made `__builtin_amdgcn_raw_ptr_buffer_load_lds` available on gfx6,7,8.	2025-04-02 20:00:09 +02:00
Juan Manuel Martinez Caamaño	0375ef07c3	[Clang][AMDGPU] Add __builtin_amdgcn_cvt_off_f32_i4 (#133741 ) This built-in maps to `V_CVT_OFF_F32_I4` which treats its input as a 4-bit signed integer and returns `0.0625f * src`. SWDEV-518861	2025-04-02 19:51:40 +02:00
Luke Lau	711b15d179	[RISCV] Mark subvector extracts from index 0 as cheap (#134101 ) Previously we only marked fixed length vector extracts as cheap, so this extends it to any extract at index 0, which should just be a subreg extract. This allows extracts of i1 vectors, as well as scalable vectors, to be considered for DAG combines. This causes some slight improvements with large legalized fixed-length vectors, but the underlying motivation for this is to actually prevent an unprofitable DAG combine on a scalable vector in an upcoming patch.	2025-04-02 17:57:13 +01:00
Ryan Buchner	fa2a6d68c6	[CodeGenPrepare][RISCV] Combine (X ^ Y) and (X == Y) where appropriate (#130922 ) Fixes #130510. In RISCV, modify the folding of (X ^ Y == 0) -> (X == Y) to account for cases where the (X ^ Y) will be re-used. If a constant is being used for the XOR before a branch, ensure that it is small enough to fit within a 12-bit immediate field. Otherwise, the equality check is more efficient than the check against 0; see the following: ``` # %bb.0: lui a1, 5 addiw a1, a1, 1365 xor a0, a0, a1 beqz a0, .LBB0_2 # %bb.1: ret .LBB0_2: ``` ``` # %bb.0: lui a1, 5 addiw a1, a1, 1365 beq a0, a1, .LBB0_2 # %bb.1: xor a0, a0, a1 ret .LBB0_2: ``` Similarly, if the XOR is between 1 and a size one integer, we should still fold away the XOR since that comparison can be optimized as a comparison against 0. ``` # %bb.0: slt a0, a0, a1 xor a0, a0, 1 beqz a0, .LBB0_2 # %bb.1: ret .LBB0_2: ``` ``` # %bb.0: slt a0, a0, a1 bnez a0, .LBB0_2 # %bb.1: xor a0, a0, 1 ret .LBB0_2: ``` One question about my code is that I used a hard-coded value for the width of a RISCV ALU immediate. Do you know of a way that I can gather this from the `context`? I was unable to devise one.	2025-04-02 09:56:09 -07:00
Luke Lau	c132bd6885	[RISCV] Add test for vmv.s.x of an immediate into a zeroinitializer vector. NFC The immediate version of ffaaaceaa1cfaa7103196cc7f307ffcb61d73558	2025-04-02 16:48:57 +01:00
Luke Lau	ffaaaceaa1	[RISCV] Add test for vmv.s.x into a zeroinitializer vector. NFC This is generated by the loop vectorizer for out-of-loop add reductions with some starting value	2025-04-02 16:09:44 +01:00
Simon Pilgrim	3843dfeaf7	[X86] Add demanded elts test coverage for vXi16 VPERMW nodes Requested for #133923	2025-04-02 15:32:01 +01:00
Akshat Oke	a13a51b91f	[AMDGPU][NPM] Port AMDGPUSetWavePriority to NPM (#130064 )	2025-04-02 16:28:05 +05:30
Simon Pilgrim	2426ac647f	[X86] Add demanded elts for v8f32 VPERMV node Based off #133923 - test a VPERMV node where only the lower 128-bit source elements are demanded.	2025-04-02 11:18:47 +01:00
Nikita Popov	9356091a98	[GlobalMerge][PPC] Don't merge globals in llvm.metadata section (#131801 ) The llvm.metadata section is not emitted and has special semantics. We should not merge globals in it, similarly to how we already skip merging of `llvm.xyz` globals. Fixes https://github.com/llvm/llvm-project/issues/131394.	2025-04-02 10:40:53 +02:00
Sudharsan Veeravalli	536fe74aaa	[RISCV] Modify register type of extd* Xqcibm instructions (#134027 ) The v0.8 spec specifies that rs1 cannot be x31 (t6) since these instructions operate on a pair of registers (rs1 and rs1 + 1) with no wrap around. The latest spec can be found here: https://github.com/quic/riscv-unified-db/releases/tag/Xqci-0.8.0	2025-04-02 12:14:50 +05:30
ZhaoQi	46968310cb	[LoongArch] Move fix-tle-le-sym-type test to test/MC. NFC (#133839 )	2025-04-02 09:11:20 +08:00
Alex MacLean	3c7a0e6c82	[NVPTX] Cleanup and refactor atomic lowering (#133781 ) Clean up lowering of atomic instructions and intrinsics. The TableGen changes are primarily a refactor, though sub variants are now lowered via operation legalization, potentially allowing for more DAG optimization.	2025-04-01 13:08:57 -07:00
Virginia Cangelosi	79487757b7	[Clang][LLVM] Implement multi-multi vectors MOP4{A/S} (#129230 ) Implement all multi-multi {BF/F/S/U/SU/US}MOP4{A/S} instructions in clang and llvm following the ACLE in https://github.com/ARM-software/acle/pull/381/files	2025-04-01 19:20:27 +01:00
Sam Clegg	a30caa6a73	[WebAssembly] Add missing tests from #133289 (#133938 )	2025-04-01 10:47:35 -07:00
Brox Chen	dd1d41f833	[AMDGPU][True16][CodeGen] fix moveToVALU with proper subreg access in true16 (#132089 ) There are V2S copies between vgpr16 and sgpr32 in true16 mode. This is caused by vgpr16 and sgpr32 both being selectable by a 16-bit src in ISel. When a V2S copy and its useMI are lowered to VALU, this patch checks: 1. If the newly generated VALU is used by a true16 inst, add subreg access if necessary. 2. Legalize the V2S copy by replacing it with SUBREG_TO_REG. An example MIR looks like: ``` %2:sgpr_32 = COPY %1:vgpr_16 %3:sgpr_32 = S_OR_B32 %2:sgpr_32, ... %4:vgpr_16 = V_ADD_F16_t16 %3:sgpr_32, ... ``` currently lowered to ``` %2:vgpr_32 = COPY %1:vgpr_16 %3:vgpr_32 = V_OR_B32 %2:vgpr_32, ... %4:vgpr_16 = V_ADD_F16_t16 %3:vgpr_32, ... ``` after this patch ``` %2:vgpr_32 = SUBREG_TO_REG 0, %1:vgpr_16, lo16 %3:vgpr_32 = V_OR_B32 %2:vgpr_32, ... %4:vgpr_16 = V_ADD_F16_t16 %3.lo16:vgpr_32, ... ```	2025-04-01 12:40:18 -04:00
Petr Hosek	4b19db6db9	Revert "AsmPrinter: Remove ELF's special lowerRelativeReference for unnamed_addr function" (#133935 ) Reverts llvm/llvm-project#132684	2025-04-01 09:39:07 -07:00
Jonathan Thackray	558ce50ebc	[Clang][LLVM] Implement multi-single vectors MOP4{A/S} (#129226 ) Implement all multi-single {BF/F/S/U/SU/US}MOP4{A/S} instructions in clang and llvm following the ACLE in https://github.com/ARM-software/acle/pull/381/files	2025-04-01 17:04:59 +01:00
David Green	4cb41d136c	[AArch64] Prefer zip over ushll for anyext. (#133433 ) Many CPUs have a higher throughput of ZIP instructions vs USHLL. This adds some tablegen patterns for preferring zip in anyext patterns.	2025-04-01 16:24:54 +01:00
Simon Pilgrim	664745cf38	[X86] avx512-vselect.ll - regenerate VPTERNLOG comments	2025-04-01 15:50:07 +01:00
Virginia Cangelosi	e92ff64bad	[Clang][LLVM] Implement single-multi vectors MOP4{A/S} (#128854 ) Implement all single-multi {BF/F/S/U/SU/US}MOP4{A/S} instructions in clang and llvm following the ACLE in https://github.com/ARM-software/acle/pull/381/files. This PR depends on https://github.com/llvm/llvm-project/pull/127797. This patch updates the semantics of template arguments in intrinsic names for clarity and ease of use. Previously, template argument numbers indicated which character in the prototype string determined the final type suffix, which was confusing, especially for intrinsics using multiple prototype modifiers per operand (e.g., intrinsics operating on arrays of vectors). The number had to reference the correct character in the prototype (e.g., the ‘u’ in “2.u”), making the system cumbersome and error-prone. With this patch, template argument numbers now refer to the operand number that determines the final type suffix, providing a more intuitive and consistent approach.	2025-04-01 15:05:30 +01:00
Jeremy Morse	1ebc308bba	[DebugInfo][RemoveDIs] Remove debug-intrinsic printing cmdline options (#131855 ) During the transition from debug intrinsics to debug records, we used several different command line options to customise handling: the printing of debug records to bitcode and textual IR could be independent of how the debug-info was represented inside a module, and whether the autoupgrader ran could be customised. This was all valuable during development, but now that totally removing debug intrinsics is coming up, this patch removes those options in favour of a single flag (experimental-debuginfo-iterators), which enables autoupgrade, in-memory debug records, and debug record printing to bitcode and textual IR. We need to do this ahead of removing the experimental-debuginfo-iterators flag, to reduce the amount of test-juggling that happens at that time. There are quite a number of weird test behaviours related to this -- some of which I simply delete in this commit. Things like print-non-instruction-debug-info.ll: the test suite now checks for debug records in all tests, and we don't want to check we can print as intrinsics. Or the update_test_checks tests -- these are duplicated with write-experimental-debuginfo=false to ensure file writing for intrinsics is correct, but that's something we're imminently going to delete. A short survey of curious test changes: * free-intrinsics.ll: we don't need to test that debug-info is a zero cost intrinsic, because we won't be using intrinsics in the future. * undef-dbg-val.ll: apparently we pinned this to non-RemoveDIs in-memory mode while we sorted something out; it works now either way. * salvage-cast-debug-info.ll: was testing intrinsics-in-memory get salvaged, isn't necessary now * localize-constexpr-debuginfo.ll: was producing "dead metadata" intrinsics for optimised-out variable values, dbg-records takes the (correct) representation of poison/undef as an operand. Looks like we didn't update this in the past to avoid spurious test differences. * Transforms/Scalarizer/dbginfo.ll: this test was explicitly testing that debug-info affected codegen, and we deferred updating the tests until now. This is just one of those silent gnochange issues that get fixed by RemoveDIs. Finally: I've added a bitcode test, dbg-intrinsics-autoupgrade.ll.bc, that checks we can autoupgrade debug intrinsics that are in bitcode into the new debug records.	2025-04-01 14:27:11 +01:00
Simon Pilgrim	2c0b888359	[X86] combineX86ShuffleChain - prefer combining to X86ISD::SHUF128 if PERMQ operands are splittable (#133900 ) If the 512-bit unary shuffle is a concatenation of 128/256-bit subvectors then we're better off using a X86ISD::SHUF128 node so we can fold the concatenation into the shuffle as well.	2025-04-01 13:47:52 +01:00
Virginia Cangelosi	6892d54286	[Clang][LLVM] Implement single-single vectors MOP4{A/S} (#127797 ) Implement all single-single {BF/F/S/U/SU/US}MOP4{A/S} instructions in clang and llvm following the ACLE in https://github.com/ARM-software/acle/pull/381/files	2025-04-01 13:35:09 +01:00
Akshat Oke	4a68702455	[CodeGen][NPM] Port XRayInstrumentation to NPM (#129865 )	2025-04-01 15:38:49 +05:30
Afanasyev Ivan	337bad3921	[EarlyIfConverter] Fix reg killed twice after early-if-predicator and ifcvt (#133554 ) Bug relates to the `early-if-predicator` and `early-ifcvt` passes. If a virtual register has a "killed" flag in both basic blocks to be merged into the head, both instructions in the head basic block will have a "killed" flag for this register. This makes the MIR incorrect. Example: ``` bb.0: ; if ... %0:intregs = COPY $r0 J2_jumpf %2, %bb.2, implicit-def dead $pc J2_jump %bb.1, implicit-def dead $pc bb.1: ; if.then ... S4_storeiri_io killed %0, 0, 1 J2_jump %bb.3, implicit-def dead $pc bb.2: ; if.else ... S4_storeiri_io killed %0, 0, 1 J2_jump %bb.3, implicit-def dead $pc ``` After early-if-predicator it becomes: ``` bb.0: %0:intregs = COPY $r0 S4_storeirif_io %1, killed %0, 0, 1 S4_storeirit_io %1, killed %0, 0, 1 ``` Having the `killed` flag set twice in bb.0 for `%0` is invalid MIR.	2025-04-01 12:06:30 +02:00
Shoreshen	7f14b2a9eb	Revert "[AMDGPU][CodeGenPrepare] Narrow 64 bit math to 32 bit if profitable" (#133880 ) Reverts llvm/llvm-project#130577	2025-04-01 17:37:02 +08:00
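
Commit 3bdf9a0880 in the list above makes `EquivalenceClasses` iteration deterministic by additionally recording every member in a `SmallVector`, so the order no longer depends on how a `std::set` of pointers happens to be ordered in a given run. The following is a minimal Python sketch of that idea, not LLVM's C++ implementation: a union-find whose class listing follows member insertion order. The method names (`insert`, `find`, `union_sets`, `classes`) are illustrative assumptions.

```python
class EquivalenceClasses:
    """Union-find that also records members in insertion order.

    Python sketch of the idea in commit 3bdf9a0880; not LLVM's implementation.
    """

    def __init__(self):
        self._parent = {}   # member -> parent member
        self._members = []  # insertion order, playing the role of the extra SmallVector

    def insert(self, x):
        if x not in self._parent:
            self._parent[x] = x
            self._members.append(x)

    def find(self, x):
        self.insert(x)
        root = x
        while self._parent[root] != root:
            root = self._parent[root]
        while self._parent[x] != root:  # path compression
            self._parent[x], x = root, self._parent[x]
        return root

    def union_sets(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self._parent[rb] = ra

    def classes(self):
        """Return classes ordered by their earliest-inserted member."""
        groups = {}  # dicts preserve insertion order (Python 3.7+)
        for m in self._members:
            groups.setdefault(self.find(m), []).append(m)
        return list(groups.values())


ec = EquivalenceClasses()
for name in ["c", "a", "b", "d"]:
    ec.insert(name)
ec.union_sets("a", "d")
ec.union_sets("c", "b")
print(ec.classes())  # [['c', 'b'], ['a', 'd']]
```

Because the listing order comes from insertion rather than from comparing addresses, inserting the same members in the same order always yields the same classes in the same order, which is why callers no longer need to sort them first.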

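Commit 0375ef07c3 in the list above describes `__builtin_amdgcn_cvt_off_f32_i4` (mapping to `V_CVT_OFF_F32_I4`) as treating its input as a 4-bit signed integer and returning `0.0625f * src`. A minimal Python model of that arithmetic follows. It assumes that only the low 4 bits of the source are used and that they are read as two's complement, which the commit message does not spell out; `cvt_off_f32_i4` is a hypothetical name for the model, not an API.

```python
def cvt_off_f32_i4(src: int) -> float:
    """Model of the V_CVT_OFF_F32_I4 arithmetic described in commit 0375ef07c3.

    Assumption: only the low 4 bits of src matter, read as two's complement.
    """
    nibble = src & 0xF       # keep the 4-bit field
    if nibble & 0x8:         # sign bit of the 4-bit value is set
        nibble -= 16         # sign-extend: 0x8..0xF -> -8..-1
    return 0.0625 * nibble   # scale by 1/16


# The 16 possible inputs cover [-0.5, 0.4375] in steps of 1/16.
print([cvt_off_f32_i4(v) for v in (0x0, 0x7, 0x8, 0xF)])  # [0.0, 0.4375, -0.5, -0.0625]
```

All sixteen results are multiples of 1/16 with small magnitudes, so they are exactly representable in `float`.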

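Commit fa2a6d68c6 in the list above only keeps the XOR-then-compare-against-zero form when the XOR constant fits in a 12-bit immediate field. Its example constant does not fit, so it is materialized with `lui a1, 5` followed by `addiw a1, a1, 1365`. The Python sketch below shows the arithmetic behind that: a signed 12-bit range check and the usual split of a 32-bit constant into a `lui` upper part plus a sign-extended 12-bit low part. The helper names are hypothetical, and constants wider than 32 bits are out of scope for this sketch.

```python
def fits_simm12(value: int) -> bool:
    """True if value fits a RISC-V 12-bit signed immediate (as used by xori/addi)."""
    return -2048 <= value <= 2047


def split_hi20_lo12(value: int):
    """Split a signed 32-bit constant into (hi20, lo12) with value == (hi20 << 12) + lo12.

    Adding 0x800 before shifting rounds hi20 so that lo12 lands in [-2048, 2047],
    because addi/addiw sign-extend their immediate.
    """
    hi20 = (value + 0x800) >> 12
    lo12 = value - (hi20 << 12)
    return hi20, lo12


# The constant from the commit's example: 5 * 4096 + 1365 == 21845 (0x5555).
value = 0x5555
print(fits_simm12(value))      # False: xori cannot encode it
print(split_hi20_lo12(value))  # (5, 1365), i.e. lui a1, 5 / addiw a1, a1, 1365
```

When the constant does fit, `xori` can encode it directly and no `lui`/`addiw` pair is needed, which is the case where reusing the XOR result stays cheap.
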
58205 Commits
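
Commit 337bad3921 in the list above explains why the predicated `bb.0` is invalid: after merging, `%0` carries a `killed` flag on two instructions, so the second instruction reads a register whose live range already ended at the first. The Python sketch below illustrates that rule on a toy representation of the merged block. The tuple format and `find_uses_after_kill` are hypothetical and are not the MachineVerifier's API.

```python
def find_uses_after_kill(block):
    """Report (reg, kill_index, use_index) for every use of a register after it was killed.

    block is a list of (opcode, uses) pairs, where uses lists (reg, killed) tuples.
    This toy model treats the block as straight-line code and ignores redefinitions.
    """
    killed_at = {}
    problems = []
    for index, (_opcode, uses) in enumerate(block):
        for reg, killed in uses:
            if reg in killed_at:
                problems.append((reg, killed_at[reg], index))
            if killed:
                killed_at[reg] = index
    return problems


# bb.0 after early-if-predicator, as shown in the commit message.
bb0 = [
    ("COPY", [("$r0", False)]),
    ("S4_storeirif_io", [("%1", False), ("%0", True)]),
    ("S4_storeirit_io", [("%1", False), ("%0", True)]),
]
print(find_uses_after_kill(bb0))  # [('%0', 1, 2)]
```

Before the merge, each store sat in its own block and killed `%0` only once on its path, so the same flags were valid there.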