llvm-project

Author	SHA1	Message	Date
Piotr Fusik	6e7312bda6	[RISCV] Select and/or/xor with certain constants to Zbb ANDN/ORN/XNOR (#120221 ) (and X, (C<<12\|0xfff)) -> (ANDN X, ~C<<12) (or X, (C<<12\|0xfff)) -> (ORN X, ~C<<12) (xor X, (C<<12\|0xfff)) -> (XNOR X, ~C<<12) Emits better code, typically by avoiding an `ADDI HI, -1` instruction. Co-authored-by: Craig Topper <craig.topper@sifive.com>	2024-12-19 21:38:20 +01:00
Konstantina Mitropoulou	d3508ccd15	[AMDGPU] Emit S_CBRANCH_SCC for floating-point conditions. (#120588 ) - [AMDGPU] Add new test. - [AMDGPU] Emit S_CBRANCH_SCC for floating-point conditions. --------- Co-authored-by: Konstantina Mitropoulou <KonstantinaMitropoulou@amd.com>	2024-12-19 11:20:43 -08:00
Justin Bogner	aa07f92210	[DirectX][SPIRV] Consistent names for HLSL resource intrinsics (#120466 ) Rename HLSL resource-related intrinsics to be consistent with the naming conventions discussed in [wg-hlsl:0014]. This is an entirely mechanical change, consisting of the following commands and automated formatting. ```sh git grep -l handle.fromBinding \| xargs perl -pi -e \ 's/(dx\|spv)(.)handle.fromBinding/$1$2resource$2handlefrombinding/g' git grep -l typedBufferLoad_checkbit \| xargs perl -pi -e \ 's/(dx\|spv)(.)typedBufferLoad_checkbit/$1$2resource$2loadchecked$2typedbuffer/g' git grep -l typedBufferLoad \| xargs perl -pi -e \ 's/(dx\|spv)(.)typedBufferLoad/$1$2resource$2load$2typedbuffer/g' git grep -l typedBufferStore \| xargs perl -pi -e \ 's/(dx\|spv)(.)typedBufferStore/$1$2resource$2store$2typedbuffer/g' git grep -l bufferUpdateCounter \| xargs perl -pi -e \ 's/(dx\|spv)(.)bufferUpdateCounter/$1$2resource$2updatecounter/g' git grep -l cast_handle \| xargs perl -pi -e \ 's/(dx\|spv)(.)cast.handle/$1$2resource$2casthandle/g' ``` [wg-hlsl:0014]: https://github.com/llvm/wg-hlsl/blob/main/proposals/0014-consistent-naming-for-dx-intrinsics.md	2024-12-19 12:17:21 -07:00
Michael Maitland	3710050566	[RISCV][VLOPT] Set CommonVL as the largest of the users (#120349 ) Prior to this patch, we required that all users had the same VL in order to optimize. But as the FIXME said, we can use the largest VL to optimize, as long as we can determine what the largest is. This patch implements the FIXME.	2024-12-19 13:22:31 -05:00
Piotr Fusik	01b96385fd	[RISCV][test] Add zbb-logic-neg-imm.ll	2024-12-19 18:44:21 +01:00
Alex MacLean	310e798757	[NVPTX] Avoid introducing unnecessary ProxyRegs and Movs in ISel (#120486 ) Avoid introducing `ProxyReg` and `MOV` nodes during ISel when lowering `bitconvert` or similar operations. These nodes are all erased by a later pass but not introducing them in the first place is simpler and likely saves compile time. Also remove redundant `MOV` instruction definitions.	2024-12-19 07:55:03 -08:00
Benjamin Maxwell	ca98a3d9bb	[AArch64][SVE] Use SVE for scalar FP converts in streaming[-compatible] functions (1/n) (#118505 ) In streaming[-compatible] functions, use SVE for scalar FP conversions to/from integer types. This can help avoid moves between FPRs and GPRs, which could be costly. This patch also updates definitions of SCVTF_ZPmZ_StoD and UCVTF_ZPmZ_StoD to disallow lowering to them from ISD nodes, as doing so requires creating a [U\|S]INT_TO_FP_MERGE_PASSTHRU node with inconsistent types. Follow up to #112213. Note: This PR does not include support for f64 <-> i32 conversions (like #112564), which needs a bit more work to support.	2024-12-19 13:16:31 +00:00
Feng Zou	eb812d28f5	[X86] Put R20/R21/R28/R29 later in GR64 list (#120510 ) Because these registers require an extra byte to encode in certain memory form. Putting them later in the list will reduce code size when EGPR is enabled. And align the same order in GR8, GR16 and GR32 lists. Example: movq (%r20), %r11 # encoding: [0xd5,0x1c,0x8b,0x1c,0x24] movq (%r22), %r11 # encoding: [0xd5,0x1c,0x8b,0x1e]	2024-12-19 20:16:34 +08:00
David Green	e020f46027	[ARM] Fix BF16 lowering with FullFP16 This adds test coverage for bf16 instructions, making sure that lowering bf16 works with and without +fullfp16.	2024-12-19 10:20:35 +00:00
Kerry McLaughlin	9829598933	[AArch64][SME2] Extend getRegAllocationHints for ZPRStridedOrContiguousReg (#119865 ) ZPR2StridedOrContiguous loads used by a FORM_TRANSPOSED_REG_TUPLE pseudo should attempt to assign a strided register to avoid unnecessary copies, even though this may overlap with the list of SVE callee-saved registers.	2024-12-19 09:40:13 +00:00
Pengcheng Wang	2c782ab271	[RISCV] Add software pipeliner support (#117546 ) This patch adds basic support for `MachinePipeliner` and disables it by default. The functionality should be OK and all llvm-test-suite tests have passed.	2024-12-19 13:00:08 +08:00
Craig Topper	dc72ec808d	[RISCV] Custom legalize vp.merge for mask vectors. (#120479 ) The default legalization uses vmslt with a vector of XLen to compute a mask. This doesn't work if the type isn't legal. For fixed vectors it will scalarize. For scalable vectors it crashes the compiler. This patch uses an alternate strategy that promotes the i1 vector to an i8 vector and does the merge. I don't claim this to be the best lowering. I wrote it quickly almost 3 years ago when a crash was reported in our downstream. Fixes #120405.	2024-12-18 19:19:14 -08:00
Zhaoxin Yang	f334db92be	[llvm][CodeGen] Intrinsic `llvm.powi.*` code gen for vector arguments (#118242 ) Scalarize vector FPOWI instead of promoting the type. This allows the scalar FPOWIs to be visited and converted to libcalls before promoting the type. FIXME: This should be done in LegalizeVectorOps/LegalizeDAG, but call lowering needs the unpromoted EVT. Without this patch, in some backends, such as RISCV64 and LoongArch64, the i32 type is illegal and will be promoted. This causes exponent type check to fail when ISD::FPOWI node generates a libcall. Fix https://github.com/llvm/llvm-project/issues/118079	2024-12-19 08:57:31 +08:00
Farzon Lotfi	6457aee5b7	[DirectX] Bug fix for Data Scalarization crash (#118426 ) Two bugs here. First calling `Inst->getFunction()` has undefined behavior if the instruction is not tracked to a function. I suspect the `replaceAllUsesWith` was leaving the GEPs in a weird ghost parent situation. I switched up the visitor to be able to `eraseFromParent` as part of visiting and then everything started working. The second bug was in `DXILFlattenArrays.cpp`. I was unaware that you can have multidimensional arrays of `zeroinitializer`, and `undef` so fixed up the initializer to handle these two cases. fixes #117273	2024-12-18 16:33:49 -05:00
Justin Bogner	9b3d85f0f4	[DirectX] TypedUAVLoadAdditionalFormats shader flag (#120477 ) Set the TypedUAVLoadAdditionalFormats flag if the shader contains a load from a multicomponent UAV. Fixes #114557	2024-12-18 13:42:12 -07:00
Justin Bogner	bfd05102d8	[DirectX] Lower ops after translating metadata (#120157 ) Move the DXILOpLoweringPass after DXILTranslateMetadata, and add asserts in DXILShaderFlags to ensure it isn't scheduled after op lowering. This will allow us to rely on DirectX intrinsics in the shader flags analysis rather than having to recover information from lowered operations. Fixes #120119.	2024-12-18 12:03:05 -07:00
Jun Wang	d57230c72e	[AMDGPU][MC] Disallow op_sel in some VOP3P dot instructions (#100485 ) In v_dot4 and v_dot8 instructions with 4- or 8-bit packed data (e.g., v_dot4_u32_u8, v_dot8_u32_u4), the op_sel modifier should not be allowed.	2024-12-18 10:50:47 -08:00
Brox Chen	c6f753b9a0	[AMDGPU][True16][MC] true16 for v_pack_b32_f16 (#119630 ) Support true16 format for v_pack_b32_f16 in MC. Since we are replacing v_alignbit_b32 with `v_pack_b32_f16_t16/v_pack_b32_f16_fake16` post-GFX11, the CodeGen pattern for `v_pack_b32_f16_fake16` has to be updated to get the CodeGen tests passing. No pattern is modified or created; `v_pack_b32_f16` is just replaced with the fake16 format. Some of the true16 CodeGen tests are impacted since `v_pack_b32_f16` selection is removed post-GFX11 while `v_pack_b32_f16_t16` is not yet supported. The CodeGen patch for `v_pack_b32_f16_t16` will be done in a following patch.	2024-12-18 13:28:42 -05:00
Justin Bogner	0e2466f624	[DirectX] Create symbols for resource handles (#119775 ) We need to create symbols with "the original shape of resource and element type" to put in the resource metadata in order to generate valid DXIL. Note that DXC generally doesn't emit an actual symbol outside of library shaders (it emits an undef of a pointer to the type), but since we have to deal with opaque pointers we would need a way to smuggle the type through to match that. Instead, we simply emit symbols for now. Fixed #116849	2024-12-18 10:47:12 -07:00
Justin Bogner	0fca76d576	[DirectX] Introduce the DXILResourceAccess pass (#116726 ) This pass transforms resource access via `llvm.dx.resource.getpointer` into buffer loads and stores. Fixes #114848.	2024-12-18 10:13:45 -07:00
Simon Pilgrim	49fd2dde21	[X86] LowerShift - don't prematurely lower to x86 vector shift imm instructions (#120282 ) When splitting 2 unique amount shifts to shuffle(shift(x,c1),shift(x,c2)), don't use getTargetVShiftByConstNode directly to lower, use generic shifts to ensure we make use of any further canonicalization: shl(X,1) to add(X,X) etc. - this can have notably better throughput on some x86 targets. Noticed on #120270	2024-12-18 16:08:45 +00:00
Justin Bogner	3eca15cbb9	[DirectX] Split resource info into type and binding info. NFC (#119773 ) This splits the DXILResourceAnalysis pass into TypeAnalysis and BindingAnalysis passes. The type analysis pass is made immutable and populated lazily so that it can be used earlier in the pipeline without needing to carefully maintain the invariants of the binding analysis. Fixes #118400	2024-12-18 09:02:28 -07:00
Sergei Barannikov	d3750412aa	[TableGen][GISel] Improve dead register handling (#120426 ) A dead implicit def wasn't marked as dead if it is also an implicit use. The new approach should also be more straightforward and simplifies future changes for supporting optional defs and physical register defs. Pull Request: https://github.com/llvm/llvm-project/pull/120426	2024-12-18 18:58:37 +03:00
Florian Hahn	76714be5fd	Revert "Add support for single reductions in ComplexDeinterleavingPass (#112875 )" This reverts commit b3eede5e1fa7ab742b86e9be22db7bccd2505b8a. This has been breaking most AArch64 stage2 builds for 4+ hours, reverting to get the bots back to green. https://lab.llvm.org/buildbot/#/builders/41/builds/4172 https://lab.llvm.org/buildbot/#/builders/4/builds/4281 https://lab.llvm.org/buildbot/#/builders/199/builds/263 https://lab.llvm.org/buildbot/#/builders/198/builds/334 https://lab.llvm.org/buildbot/#/builders/143/builds/4276 https://lab.llvm.org/buildbot/#/builders/17/builds/4725	2024-12-18 15:06:52 +00:00
Aaditya	0446990cc7	Reapply "[NFC][AMDGPU] Pre-commit clang and llvm tests for dynamic allocas" (#120410 ) This reapplies commit https://github.com/llvm/llvm-project/pull/120063. A machine-verifier bug was causing a crash in the previous commit. This has been addressed in https://github.com/llvm/llvm-project/pull/120393.	2024-12-18 18:20:45 +05:30
Simon Pilgrim	f270c9a7d0	[X86] urem-seteq-illegal-types.ll - regenerate VPTERNLOG comment	2024-12-18 11:58:49 +00:00
Paul Walker	3146911eb0	[LLVM][AsmPrinter] Add vector ConstantInt/FP support to emitGlobalConstantImpl. (#120077 ) This fixes a failure path for fixed length vector globals when ConstantInt/FP is used to represent splats instead of ConstantDataVector.	2024-12-18 11:51:01 +00:00
Sergei Barannikov	1941f34172	[TableGen][GISel] Import more "multi-level" patterns (#120332 ) Previously, if the destination DAG has an untyped leaf, we would import the pattern only if that leaf is defined by the top-level source DAG. This is an unnecessary restriction. Here is an example of such pattern: ``` def : Pat<(add (mul v8i16:$vA, v8i16:$vB), v8i16:$vC), (VMLADDUHM $vA, $vB, $vC)>; ``` Previously, it failed to import because `add` defines neither `$vA` nor `$vB`. This change reduces the number of skipped patterns as follows: ``` AArch64: 8695 -> 8548 (-147) AMDGPU: 11333 -> 11240 (-93) ARM: 4297 -> 4278 (-1) PowerPC: 3955 -> 3010 (-945) ``` Other GISel-enabled targets are unaffected.	2024-12-18 14:44:55 +03:00
Simon Pilgrim	dd8e1adbf2	[X86] LowerShift - track the number and location of constant shift elements. (#120270 ) We have several vector shift lowering strategies that have to analyse the distribution of non-uniform constant vector shift amounts; at the moment there is very little sharing of data between these analyses. This patch creates a SmallDenseMap of the different LEGAL constant shift amounts used, with a mask of which elements they are used in. So far I've only updated the shuffle(immshift(x,c1),immshift(x,c2)) lowering pattern to use it for clarity; there are several more that can be done in followups. It's hoped that the proposed patch #117980 can be simplified after this patch as well. vec_shift6.ll - the existing shuffle(immshift(x,c1),immshift(x,c2)) lowering bails on out of range shift amounts, while this patch now skips them and treats them as UNDEF - this means we manage to fold more cases that before would have to lower to a SHL->MUL pattern, including some legalized cases.	2024-12-18 11:36:54 +00:00
Mikhail Goncharov	41c1992a16	[NVPTX] fix nvcl-param-align.ll fix for f9c8c01d38f8fbea81db99ab90b7d0f2bdcc8b4d	2024-12-18 11:41:44 +01:00
Aaditya	414c462a83	[AMDGPU] Modify Dyn Alloca test to account for Machine-Verifier bug (#120393 ) Machine-Verifier crashes in kernel functions, but fails gracefully in device functions. This is due to the buffer resource descriptor selected during G-ISEL, before the fallback path. Device functions use `$sgpr0_sgpr1_sgpr2_sgpr3`, while kernel functions select `$private_rsrc_reg`, where machine-verifier complains: `$private_rsrc_reg is not a SReg_128 register.` Modifying test case to capture both behaviors; this is related to https://github.com/llvm/llvm-project/pull/120063	2024-12-18 16:08:17 +05:30
Nicholas Guy	b3eede5e1f	Add support for single reductions in ComplexDeinterleavingPass (#112875 ) The Complex Deinterleaving pass assumes that all values emitted will result in complex numbers, this patch aims to remove that assumption and adds support for emitting just the real or imaginary components, not both.	2024-12-18 10:34:26 +00:00
Simon Pilgrim	0b4ee8d4ee	[X86] combineKSHIFT - fold kshiftr(kshiftr/extract_subvector(X,C1),C2) --> kshiftr(X,C1+C2) (#115528 ) Merge serial KSHIFTR nodes, possibly separated by EXTRACT_SUBVECTOR, to allow mask instructions to be computed in parallel.	2024-12-18 09:48:38 +00:00
Csanád Hajdú	96bb281b63	[AArch64] Prevent unnecessary truncation in bool vector reduce code generation (#120096 ) Prevent unnecessarily truncating results of 128 bit wide vector comparisons to 64 bit wide vector values in boolean vector reduce operations.	2024-12-18 09:14:12 +00:00
Aaditya	d6e8ab1fa6	Revert "[NFC][AMDGPU] Pre-commit clang and llvm tests for dynamic allocas" (#120369 ) Reverts llvm/llvm-project#120063 due to build-bot failures	2024-12-18 14:06:49 +07:00
Aaditya	99c2e3b782	[NFC][AMDGPU] Pre-commit clang and llvm tests for dynamic allocas (#120063 ) For #119822	2024-12-18 12:14:37 +05:30
Ruiling, Song	67c55b1ffc	[AMDGPU] Make max dwords of memory cluster configurable (#119342 ) We find it helpful to increase the value for graphics workload. Make it configurable so we can experiment with a different value.	2024-12-18 14:17:27 +08:00
Michael Maitland	a61eeaa748	[RISCV][VLOPT] Add vector indexed loads and stores to getOperandInfo (#119748 ) Use `MO.getOperandNo() == 0` instead of `IsMODef` so naming is clear for the store, since the store should treat its operand 0 like that even though it is not a def. The load should treat its operand 0 def in the same way.	2024-12-17 23:51:45 -05:00
Michael Maitland	fb33268d2f	[RISCV][VLOPT] Add support for VID and VIOTA (#120331 ) We already cover vid in `llvm/test/CodeGen/RISCV/rvv/vl-opt-op-info.mir` so no need to add tests for that instruction.	2024-12-17 21:15:23 -05:00
Drew Kersnar	932d9c13fa	[NVPTX] Generalize and extend upsizing when lowering 8/16-bit-element vector loads/stores (#119622 ) This addresses the following issue I opened: https://github.com/llvm/llvm-project/issues/118851. This change generalizes the Type Legalization mechanism that currently handles `v8[i/f/bf]16` upsizing to include loads _and_ stores of `v8i8` + `v16i8`, allowing all of the mentioned vectors to be lowered to ptx as vectors of `b32`. This extension also allows us to remove the DagCombine that only handled exactly `load v16i8`, thus centralizing all the upsizing logic into one place. Test changes include adding v8i8, v16i8, and v8i16 cases to load-store.ll, and updating the CHECKs for other tests to match the improved codegen.	2024-12-17 15:23:22 -08:00
Farzon Lotfi	c03fc929ff	[DirectX] Add support for vector_reduce_add (#117646 ) Use of `vector_reduce_add` will make it easier to write more intrinsics in `hlsl_intrinsics.h`.	2024-12-17 17:32:50 -05:00
Michael Maitland	169c32eb49	[RISCV][VLOPT] Enable the RISCVVLOptimizer by default (#119461 ) Now that we have testing of all instructions in the isSupportedInstr switch, and better coverage of getOperandInfo, I think it is a good time to enable this by default.	2024-12-17 16:19:35 -05:00
Philip Reames	984cb791db	[RISCV] Use vmv.v.x to materialize masks in deinterleave2 lowering (#118500 ) This is a follow up to 2af2634 to use vmv.v.x of i8 constants instead of the prior vid/vand/vmsne sequence. The advantage of the vmv.v.x sequence is that it's always m1 (so cheaper at high LMUL), and can be rematerialized by the register allocator if needed to locally reduce register pressure.	2024-12-17 12:50:09 -08:00
Simon Pilgrim	2a922903bf	[X86] vector-shift tests - regenerate VPTERNLOG comments	2024-12-17 18:19:15 +00:00
Michael Maitland	904849f297	[RISCV][VLOPT] Add support for more instructions in vl-opt-op-info.mir (#119416 ) Specifically, some more where EMUL=LMUL and EEW=SEW.	2024-12-17 12:57:29 -05:00
Michael Maitland	345a35259c	[RISCV][VLOPT] Avoid crash when user produces scalar def (#120255 ) I found this crash when trying to enable the VLOptimizer pass. We need this patch before we can enable by default. The old assert was not checking that USE and DEF were vector registers. The correct condition is guarded at the callsite of tryReduceVL.	2024-12-17 12:07:29 -05:00
Mikhail Goncharov	17b3dd03a0	[NVPTX][test] fix CodeGen/NVPTX/surf-write.ll ptxas needs a proper triplet for 133352feb30605ec51b15f77826ed3a2fbf8db56	2024-12-17 15:45:06 +01:00
Florian Hahn	c1f5937eb4	[SelectOpt] Support BinOps with SExt operands. (#115879 ) Building on top of https://github.com/llvm/llvm-project/pull/115489 extend support for binops with SExt operand. PR: https://github.com/llvm/llvm-project/pull/115879	2024-12-17 11:52:15 +00:00
SpencerAbson	908e30658d	[AArch64] Implement intrinsics for FP8 SME FMLAL/FMLALL (multi) (#119546 ) This patch implements the following intrinsics: Multi-vector 8-bit floating-point multiply-add long (multiple vectors). ``` c // Only if __ARM_FEATURE_SME_F8F16 != 0 void svmla_za16[_mf8]_vg2x2_fpm(uint32_t slice, svmfloat8x2_t zn, svmfloat8x2_t zm, fpm_t fpm) __arm_streaming __arm_inout("za"); void svmla_za16[_mf8]_vg2x4_fpm(uint32_t slice, svmfloat8x4_t zn, svmfloat8x4_t zm, fpm_t fpm) __arm_streaming __arm_inout("za"); // Only if __ARM_FEATURE_SME_F8F32 != 0 void svmla_za32[_mf8]_vg4x2_fpm(uint32_t slice, svmfloat8x2_t zn, svmfloat8x2_t zm, fpm_t fpm) __arm_streaming __arm_inout("za"); void svmla_za32[_mf8]_vg4x4_fpm(uint32_t slice, svmfloat8x4_t zn, svmfloat8x4_t zm, fpm_t fpm) __arm_streaming __arm_inout("za"); ``` In accordance with https://github.com/ARM-software/acle/pull/323	2024-12-17 11:47:20 +00:00
Benjamin Maxwell	a7dafea384	[SDAG] Allow folding stack slots into sincos/frexp in more cases (#118117 ) This adds a new helper `canFoldStoreIntoLibCallOutputPointers()` to check that it is safe to fold a store into a node that will expand to a library call that takes output pointers. This requires checking for two (independent) properties: 1. The store is not within a CALLSEQ_START..CALLSEQ_END pair * If it is, the expansion would lead to nested call sequences (which is invalid) 2. The node does not appear as a predecessor to the store * If it does, attempting to merge the store into the call would result in a cycle in the DAG These two properties are checked as part of the same traversal in `canFoldStoreIntoLibCallOutputPointers()`	2024-12-17 10:54:17 +00:00
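
Note on 6e7312bda6 ([RISCV] Select and/or/xor with certain constants to Zbb ANDN/ORN/XNOR): the three rewrites listed in that message rest on a bitwise identity that can be checked by hand. Below is a minimal Python sketch, not taken from the patch itself. It assumes a 64-bit register model, and the helper names `andn`, `orn`, `xnor` and `check` are hypothetical; they only mirror the Zbb instruction semantics (`rs1 & ~rs2`, `rs1 | ~rs2`, `~(rs1 ^ rs2)`). The inverted constant `~C<<12` has its low 12 bits clear, which is presumably why the extra `ADDI HI, -1` mentioned in the message is no longer needed.

```python
MASK64 = (1 << 64) - 1  # assumption: model XLEN=64 registers


def andn(rs1, rs2):
    """Zbb ANDN semantics: rs1 & ~rs2."""
    return rs1 & ~rs2 & MASK64


def orn(rs1, rs2):
    """Zbb ORN semantics: rs1 | ~rs2."""
    return (rs1 | ~rs2) & MASK64


def xnor(rs1, rs2):
    """Zbb XNOR semantics: ~(rs1 ^ rs2)."""
    return ~(rs1 ^ rs2) & MASK64


def check(x, c):
    """Return True if all three rewrites from the commit message hold for x and C."""
    k = ((c << 12) | 0xFFF) & MASK64   # original constant: C<<12 | 0xfff
    inv = (~c << 12) & MASK64          # replacement constant: ~C<<12
    assert inv & 0xFFF == 0            # low 12 bits are clear
    assert inv == ~k & MASK64          # the replacement is the bitwise inverse
    return (
        (x & k) == andn(x, inv)
        and (x | k) == orn(x, inv)
        and (x ^ k) == xnor(x, inv)
    )


if __name__ == "__main__":
    for x in (0, 1, 0xDEADBEEF, 0x123456789ABCDEF0, MASK64):
        for c in (0, 1, 0x12345, 0x7FFFF):
            assert check(x, c)
    print("all identities hold")
```

The sketch only checks the arithmetic; which values of C the backend actually accepts is decided by the patch and is not modeled here.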


56693 Commits