llvm-project

Author	SHA1	Message	Date
Florian Hahn	5f096fd221	Revert "[LoopVectorizer] Add support for partial reductions (#92418 )" This reverts commit 060d62b48aeb5080ffcae1dc56e41a06c6f56701. It looks like this is triggering an assertion when building llvm-test-suite on ARM64 macOS. Reproducer from MultiSource/Benchmarks/Ptrdist/bc/number.c target datalayout = "e-m:o-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-n32:64-S128-Fn32" target triple = "arm64-apple-macosx15.0.0" define void @test(i64 %idx.neg, i8 %0) #0 { entry: br label %while.body while.body: ; preds = %while.body, %entry %n1ptr.0.idx131 = phi i64 [ %n1ptr.0.add, %while.body ], [ %idx.neg, %entry ] %n2ptr.0.idx130 = phi i64 [ %n2ptr.0.add, %while.body ], [ 0, %entry ] %sum.1129 = phi i64 [ %add99, %while.body ], [ 0, %entry ] %n1ptr.0.add = add i64 %n1ptr.0.idx131, 1 %conv = sext i8 %0 to i64 %n2ptr.0.add = add i64 %n2ptr.0.idx130, 1 %1 = load i8, ptr null, align 1 %conv97 = sext i8 %1 to i64 %mul = mul i64 %conv97, %conv %add99 = add i64 %mul, %sum.1129 %cmp94 = icmp ugt i64 %n1ptr.0.idx131, 0 %cmp95 = icmp ne i64 %n2ptr.0.idx130, -1 %2 = and i1 %cmp94, %cmp95 br i1 %2, label %while.body, label %while.end.loopexit while.end.loopexit: ; preds = %while.body %add99.lcssa = phi i64 [ %add99, %while.body ] ret void } attributes #0 = { "target-cpu"="apple-m1" } > opt -p loop-vectorize Assertion failed: ((VF.isScalar() \|\| V->getType()->isVectorTy()) && "scalar values must be stored as (0, 0)"), function set, file VPlan.h, line 284.	2024-12-19 21:46:51 +00:00
Piotr Fusik	6e7312bda6	[RISCV] Select and/or/xor with certain constants to Zbb ANDN/ORN/XNOR (#120221 ) (and X, (C<<12\|0xfff)) -> (ANDN X, ~C<<12) (or X, (C<<12\|0xfff)) -> (ORN X, ~C<<12) (xor X, (C<<12\|0xfff)) -> (XNOR X, ~C<<12) Emits better code, typically by avoiding an `ADDI HI, -1` instruction. Co-authored-by: Craig Topper <craig.topper@sifive.com>	2024-12-19 21:38:20 +01:00
Finn Plummer	45c01e8a33	[NFC][TargetTransformInfo][VectorUtils] Consolidate `isVectorIntrinsic...` api (#117635 ) - update `VectorUtils:isVectorIntrinsicWithScalarOpAtArg` to use TTI for all uses, to allow specification of target specific intrinsics - add TTI to the `isVectorIntrinsicWithStructReturnOverloadAtField` api - update TTI api to provide `isTargetIntrinsicWith...` functions and consistently name them - move `isTriviallyScalarizable` to VectorUtils - update all uses of the api and provide the TTI parameter Resolves #117030	2024-12-19 11:54:26 -08:00
Konstantina Mitropoulou	d3508ccd15	[AMDGPU] Emit S_CBRANCH_SCC for floating-point conditions. (#120588 ) - [AMDGPU] Add new test. - [AMDGPU] Emit S_CBRANCH_SCC for floating-point conditions. --------- Co-authored-by: Konstantina Mitropoulou <KonstantinaMitropoulou@amd.com>	2024-12-19 11:20:43 -08:00
Justin Bogner	aa07f92210	[DirectX][SPIRV] Consistent names for HLSL resource intrinsics (#120466 ) Rename HLSL resource-related intrinsics to be consistent with the naming conventions discussed in [wg-hlsl:0014]. This is an entirely mechanical change, consisting of the following commands and automated formatting. ```sh git grep -l handle.fromBinding \| xargs perl -pi -e \ 's/(dx\|spv)(.)handle.fromBinding/$1$2resource$2handlefrombinding/g' git grep -l typedBufferLoad_checkbit \| xargs perl -pi -e \ 's/(dx\|spv)(.)typedBufferLoad_checkbit/$1$2resource$2loadchecked$2typedbuffer/g' git grep -l typedBufferLoad \| xargs perl -pi -e \ 's/(dx\|spv)(.)typedBufferLoad/$1$2resource$2load$2typedbuffer/g' git grep -l typedBufferStore \| xargs perl -pi -e \ 's/(dx\|spv)(.)typedBufferStore/$1$2resource$2store$2typedbuffer/g' git grep -l bufferUpdateCounter \| xargs perl -pi -e \ 's/(dx\|spv)(.)bufferUpdateCounter/$1$2resource$2updatecounter/g' git grep -l cast_handle \| xargs perl -pi -e \ 's/(dx\|spv)(.)cast.handle/$1$2resource$2casthandle/g' ``` [wg-hlsl:0014]: https://github.com/llvm/wg-hlsl/blob/main/proposals/0014-consistent-naming-for-dx-intrinsics.md	2024-12-19 12:17:21 -07:00
Sylvestre Ledru	395a369056	[Xtensa] Fix build after splitting SDNode::use_iterator Same as: 145ddf7ede28d9131a65b7f86ad07736a824ee21	2024-12-19 19:52:53 +01:00
Michael Maitland	3710050566	[RISCV][VLOPT] Set CommonVL as the largest of the users (#120349 ) Prior to this patch, we required that all users had the same VL in order to optimize. But as the FIXME said, we can use the largest VL to optimize, as long as we can determine what the largest is. This patch implements the FIXME.	2024-12-19 13:22:31 -05:00
Brox Chen	4044886c7c	Revert "[AMDGPU][True16][MC] true16 for v_minmax/maxmin_f16 (#119586 )" (#120594 ) This reverts commit e0526b0780f56eede09b05a859a93626ecdc6e4d. The `v_minmax/maxmin_f16`(GFX11) needs to be updated to t16 with `v_minmax/maxmin_num_f16`(GFX12) together since they share the same codegen pattern. Revert the old patch and resubmit	2024-12-19 12:10:23 -05:00
Craig Topper	f139bde8d8	[SelectionDAG] Move SDNode::use_iterator::getOperandNo to SDUse. (#120536 ) This allows us to write more range based for loops because we no longer need the iterator. It also matches IR's Use class.	2024-12-19 09:07:42 -08:00
Craig Topper	145ddf7ede	[M68k] Fix build after splitting SDNode::use_iterator.	2024-12-19 08:55:58 -08:00
Craig Topper	e6b2495545	[SelectionDAG] Split SDNode::use_iterator into user_iterator and use_iterator. (#120531 ) SDNode::use_iterator now returns an SDUse& when dereferenced. SDNode::user_iterator returns SDNode*. SDNode::use_begin/use_end/uses work on use_iterator. SDNode::user_begin/user_end/users work on user_iterator. We can now write range based for loops using SDUse& and SDNode::uses(). I've converted many of these in this patch. I didn't update loops that have additional variables updated in their for statement. Some loops use SDNode::use_iterator::getOperandNo() which also prevents using range based for loops. I plan to move this into SDUse in a follow up patch.	2024-12-19 08:35:32 -08:00
Alex MacLean	310e798757	[NVPTX] Avoid introducing unnecessary ProxyRegs and Movs in ISel (#120486 ) Avoid introducing `ProxyReg` and `MOV` nodes during ISel when lowering `bitconvert` or similar operations. These nodes are all erased by a later pass but not introducing them in the first place is simpler and likely saves compile time. Also remove redundant `MOV` instruction definitions.	2024-12-19 07:55:03 -08:00
Jay Foad	a161e73fcc	[AMDGPU] Remove unnecessary casts to GCNSubtarget	2024-12-19 15:50:53 +00:00
Djordje Todorovic	3222060124	Reland "[RISCV] Add scheduling model for mips p8700 CPU" (#120550 ) This patch introduces a scheduling model for the MIPS p8700, an out-of-order RISC-V processor. The model includes pipelines for the following units: - 2 Integer Arithmetic/Logical Units (ALU and AL2) - Multiply/Divide Unit (MDU) - Branch Unit (CTI) - Load/Store Unit (LSU) - Short Floating-Point Pipe (FPUS) - Long Floating-Point Pipe (FPUL) For additional details, refer to the official product page: https://mips.com/products/hardware/p8700/. Also adds `UnsupportedSchedZfhmin` to handle cases like `WriteFCvtF16ToF32` that previously caused build failures.	2024-12-19 14:26:43 +01:00
Benjamin Maxwell	ca98a3d9bb	[AArch64][SVE] Use SVE for scalar FP converts in streaming[-compatible] functions (1/n) (#118505 ) In streaming[-compatible] functions, use SVE for scalar FP conversions to/from integer types. This can help avoid moves between FPRs and GPRs, which could be costly. This patch also updates definitions of SCVTF_ZPmZ_StoD and UCVTF_ZPmZ_StoD to disallow lowering to them from ISD nodes, as doing so requires creating a [U\|S]INT_TO_FP_MERGE_PASSTHRU node with inconsistent types. Follow up to #112213. Note: This PR does not include support for f64 <-> i32 conversions (like #112564), which needs a bit more work to support.	2024-12-19 13:16:31 +00:00
Simon Pilgrim	9bb1d0369c	[X86] getShuffleCost - when splitting shuffles, if a whole vector source is just copied we should treat this as free. (#120561 ) If the shuffle split results in referencing a single legalised whole vector (i.e. no permutation), then this can be treated as free. We already do something similar for broadcasts / whole subvector insertion + extraction - it's purely an issue for register allocation.	2024-12-19 12:55:44 +00:00
Feng Zou	eb812d28f5	[X86] Put R20/R21/R28/R29 later in GR64 list (#120510 ) Because these registers require an extra byte to encode in certain memory form. Putting them later in the list will reduce code size when EGPR is enabled. And align the same order in GR8, GR16 and GR32 lists. Example: movq (%r20), %r11 # encoding: [0xd5,0x1c,0x8b,0x1c,0x24] movq (%r22), %r11 # encoding: [0xd5,0x1c,0x8b,0x1e]	2024-12-19 20:16:34 +08:00
Nicholas Guy	060d62b48a	[LoopVectorizer] Add support for partial reductions (#92418 ) Following on from https://github.com/llvm/llvm-project/pull/94499, this patch adds support to the Loop Vectorizer to emit the partial reduction intrinsics where they may be beneficial for the target. --------- Co-authored-by: Samuel Tebbs <samuel.tebbs@arm.com>	2024-12-19 11:42:40 +00:00
Jay Foad	056e5eccaf	[AMDGPU] Remove unneeded use of !dag. NFC. (#120546 )	2024-12-19 11:01:59 +00:00
Matt Arsenault	5fb8d70e5f	ARM: Handle vldrh and vstrh in stack access hooks (#120527 )	2024-12-19 17:55:19 +07:00
SpencerAbson	30f386cb4d	[AArch64] Fixup destructive floating-point precision conversions (#118788 ) This patch changes the zeroing forms of `FCVTXNT`, `FCVTNT`, and `BFCVTNT` such that their destination operand is also listed as a dag input. These narrowing down-conversions leave the even elements of the destination vector unchanged, regardless of the predicate type. This patch also makes the merging form of `BFCVTNT` non-movprfx'able. - `FCVTXNT` - [Arm Developer](https://developer.arm.com/documentation/ddi0602/2024-09/SVE-Instructions/FCVTXNT--Floating-point-down-convert--rounding-to-odd--top--predicated--?lang=en) - `FCVTNT` - [Arm Developer](https://developer.arm.com/documentation/ddi0602/2024-09/SVE-Instructions/FCVTNT--predicated---Floating-point-down-convert-and-narrow--top--predicated--?lang=en) - `BFCVTNT` - [Arm Developer](https://developer.arm.com/documentation/ddi0602/2024-09/SVE-Instructions/BFCVTNT--Floating-point-down-convert-and-narrow-to-BFloat16--top--predicated--?lang=en)	2024-12-19 10:45:07 +00:00
David Green	e020f46027	[ARM] Fix BF16 lowering with FullFP16 This adds test coverage for bf16 instructions, making sure that lowering bf16 works with and without +fullfp16.	2024-12-19 10:20:35 +00:00
David Sherwood	eaf482f012	[AArch64] Tweak truncate costs for some scalable vector types (#119542 ) == We were previously returning an invalid cost when truncating anything to <vscale x 2 x i1>, which is incorrect since we can generate perfectly good code for this. == The costs for truncating legal or unpacked types to predicates seemed overly optimistic. For example, when truncating <vscale x 8 x i16> to <vscale x 8 x i1> we typically do something like and z0.h, z0.h, #0x1 cmpne p0.h, p0/z, z0.h, #0 I guess it might depend upon whether the input value is generated in the same block or not and if we can avoid the inreg zero-extend. However, it feels safe to take the more conservative cost here. == The costs for some truncates such as trunc <vscale x 2 x i32> %a to <vscale x 2 x i16> were 1, whereas in actual fact they are free and no instructions are required. == Also, for this trunc <vscale x 8 x i32> %a to <vscale x 8 x i16> it's just a single uzp1 instruction so I reduced the cost to 1. In general, I've added costs for all cases where the destination type is legal or unpacked. One unfortunate side effect of this is the costs for some fixed-width truncates when using SVE now look too optimistic.	2024-12-19 10:07:41 +00:00
Simon Pilgrim	976f877388	[X86] ExtendToType - directly initialize SmallVector with build vector operands. NFC. Don't push_back the operands separately.	2024-12-19 09:41:01 +00:00
Simon Pilgrim	431975b630	[X86] LowerShift - directly initialize SmallVector with build vector operands. NFC. Don't push_back the operands separately.	2024-12-19 09:41:00 +00:00
Daniil Kovalev	3c661cf03a	[PAC][MC][ELF][AArch64] Support signed TLSDESC (#120010 ) Support the following relocations and assembly operators: - `R_AARCH64_AUTH_TLSDESC_ADR_PAGE21` (`:tlsdesc_auth:` for `adrp`) - `R_AARCH64_AUTH_TLSDESC_LD64_LO12` (`:tlsdesc_auth_lo12:` for `ldr`) - `R_AARCH64_AUTH_TLSDESC_ADD_LO12` (`:tlsdesc_auth_lo12:` for `add`)	2024-12-19 12:40:33 +03:00
Kerry McLaughlin	9829598933	[AArch64][SME2] Extend getRegAllocationHints for ZPRStridedOrContiguousReg (#119865 ) ZPR2StridedOrContiguous loads used by a FORM_TRANSPOSED_REG_TUPLE pseudo should attempt to assign a strided register to avoid unnecessary copies, even though this may overlap with the list of SVE callee-saved registers.	2024-12-19 09:40:13 +00:00
Djordje Todorovic	9fa109a508	Revert "[RISCV] Add scheduling model for mips p8700 CPU" (#120537 ) Reverts llvm/llvm-project#119885 llvm-project/llvm/lib/Target/RISCV/RISCVSchedMIPSP8700.td:20:5: error: Processor does not define resources for WriteFCvtF32ToF16 def MIPSP8700Model : SchedMachineModel {	2024-12-19 10:01:46 +01:00
Djordje Todorovic	0f9257b9ab	[RISCV] Add scheduling model for mips p8700 CPU (#119885 ) Depends on #119882.	2024-12-19 09:52:16 +01:00
Craig Topper	bd261ecc5a	[SelectionDAG] Add SDNode::user_begin() and use it in some places (#120509 ) Most of these are just places that want the first user and aren't iterating over the whole list. While there I changed some use_size() == 1 to hasOneUse() which is more efficient. This is part of an effort to rename use_iterator to user_iterator and provide a use_iterator that dereferences to SDUse&. This patch helps reduce the diff on later patches.	2024-12-18 22:13:04 -08:00
Pengcheng Wang	2c782ab271	[RISCV] Add software pipeliner support (#117546 ) This patch adds basic support for `MachinePipeliner` and disables it by default. The functionality should be OK and all llvm-test-suite tests have passed.	2024-12-19 13:00:08 +08:00
Craig Topper	104ad9258a	[SelectionDAG] Rename SDNode::uses() to users(). (#120499 ) This function is most often used in range based loops or algorithms where the iterator is implicitly dereferenced. The dereference returns an SDNode * of the user rather than SDUse * so users() is a better name. I've long been annoyed that we can't write a range based loop over SDUse when we need getOperandNo. I plan to rename use_iterator to user_iterator and add a use_iterator that returns SDUse& on dereference. This will make it more like IR.	2024-12-18 20:09:33 -08:00
Craig Topper	dc72ec808d	[RISCV] Custom legalize vp.merge for mask vectors. (#120479 ) The default legalization uses vmslt with a vector of XLen to compute a mask. This doesn't work if the type isn't legal. For fixed vectors it will scalarize. For scalable vectors it crashes the compiler. This patch uses an alternate strategy that promotes the i1 vector to an i8 vector and does the merge. I don't claim this to be the best lowering. I wrote it quickly almost 3 years ago when a crash was reported in our downstream. Fixes #120405.	2024-12-18 19:19:14 -08:00
Brox Chen	e0526b0780	[AMDGPU][True16][MC] true16 for v_minmax/maxmin_f16 (#119586 ) Support true16 format for v_minmax/maxmin_f16 in MC. Since we are replacing `v_minmax/maxmin_f16` to `v_minmax/maxmin_f16_t16 / v_minmax/maxmin_f16_fake16` in Post-GFX11, have to update the CodeGen pattern for `v_minmax/maxmin_f16` to get CodeGen test passing.	2024-12-18 18:04:50 -05:00
Brox Chen	e10b12e656	[AMDGPU][True16][MC] true16 for v_div_fixup_f16 (#119613 ) Support true16 format for v_div_fixup_f16 in MC.	2024-12-18 18:01:13 -05:00
Brox Chen	dc0ea0f945	[AMDGPU][True16][MC] true16 for v_cvt_pknorm_i16/u16_f16 (#119605 ) Support true16 format for v_cvt_pknorm_i16/u16_f16 in MC.	2024-12-18 17:56:34 -05:00
Farzon Lotfi	6457aee5b7	[DirectX] Bug fix for Data Scalarization crash (#118426 ) Two bugs here. First calling `Inst->getFunction()` has undefined behavior if the instruction is not tracked to a function. I suspect the `replaceAllUsesWith` was leaving the GEPs in a weird ghost parent situation. I switched up the visitor to be able to `eraseFromParent` as part of visiting and then everything started working. The second bug was in `DXILFlattenArrays.cpp`. I was unaware that you can have multidimensional arrays of `zeroinitializer`, and `undef` so fixed up the initializer to handle these two cases. fixes #117273	2024-12-18 16:33:49 -05:00
Justin Bogner	9b3d85f0f4	[DirectX] TypedUAVLoadAdditionalFormats shader flag (#120477 ) Set the TypedUAVLoadAdditionalFormats flag if the shader contains a load from a multicomponent UAV. Fixes #114557	2024-12-18 13:42:12 -07:00
Justin Bogner	bfd05102d8	[DirectX] Lower ops after translating metadata (#120157 ) Move the DXILOpLoweringPass after DXILTranslateMetadata, and add asserts in DXILShaderFlags to ensure it isn't scheduled after op lowering. This will allow us to rely on DirectX intrinsics in the shader flags analysis rather than having to recover information from lowered operations. Fixes #120119.	2024-12-18 12:03:05 -07:00
Jun Wang	d57230c72e	[AMDGPU][MC] Disallow op_sel in some VOP3P dot instructions (#100485 ) In v_dot4 and v_dot8 instructions with 4- or 8-bit packed data (e.g., v_dot4_u32_u8, v_dot8_u32_u4), the op_sel modifier should not be allowed.	2024-12-18 10:50:47 -08:00
Brox Chen	c6f753b9a0	[AMDGPU][True16][MC] true16 for v_pack_b32_f16 (#119630 ) Support true16 format for v_pack_b32_f16 in MC. Since we are replacing v_alignbit_b32 with `v_pack_b32_f16_t16/v_pack_b32_f16_fake16` in Post-GFX11, we have to update the CodeGen pattern for `v_pack_b32_f16_fake16` to get the CodeGen tests passing. No pattern is modified or created; `v_pack_b32_f16` is simply replaced with the fake16 format. Some of the true16 CodeGen tests are impacted since `v_pack_b32_f16` selection is removed in Post-GFX11 while `v_pack_b32_f16_t16` is not yet supported. The CodeGen patch for `v_pack_b32_f16_t16` will be done in the following patch.	2024-12-18 13:28:42 -05:00
Brox Chen	c3241a9a4d	[AMDGPU][True16][MC] test update for v_subrev_f16 in true16 (#119315 ) This is an NFC change. Update mc test for v_subrev_f16 in true16 format. MC source change was done by previous patch and automatically enabled by t16 pseudo	2024-12-18 13:01:08 -05:00
Brox Chen	5270e63cdc	[AMDGPU][True16][MC] test update for v_ldexp_f16 in true16 (#119313 ) This is an NFC change. Update mc test for v_ldexp_f16 in true16 format. MC source change was done by previous patch and automatically enabled by t16 pseudo	2024-12-18 13:00:07 -05:00
Justin Bogner	0e2466f624	[DirectX] Create symbols for resource handles (#119775 ) We need to create symbols with "the original shape of resource and element type" to put in the resource metadata in order to generate valid DXIL. Note that DXC generally doesn't emit an actual symbol outside of library shaders (it emits an undef of a pointer to the type), but since we have to deal with opaque pointers we would need a way to smuggle the type through to match that. Instead, we simply emit symbols for now. Fixed #116849	2024-12-18 10:47:12 -07:00
Justin Bogner	0fca76d576	[DirectX] Introduce the DXILResourceAccess pass (#116726 ) This pass transforms resource access via `llvm.dx.resource.getpointer` into buffer loads and stores. Fixes #114848.	2024-12-18 10:13:45 -07:00
Simon Pilgrim	49fd2dde21	[X86] LowerShift - don't prematurely lower to x86 vector shift imm instructions (#120282 ) When splitting 2 unique amount shifts to shuffle(shift(x,c1),shift(x,c2)), don't use getTargetVShiftByConstNode directly to lower, use generic shifts to ensure we make use of any further canonicalization: shl(X,1) to add(X,X) etc. - this can have notably better throughput on some x86 targets. Noticed on #120270	2024-12-18 16:08:45 +00:00
Justin Bogner	3eca15cbb9	[DirectX] Split resource info into type and binding info. NFC (#119773 ) This splits the DXILResourceAnalysis pass into TypeAnalysis and BindingAnalysis passes. The type analysis pass is made immutable and populated lazily so that it can be used earlier in the pipeline without needing to carefully maintain the invariants of the binding analysis. Fixes #118400	2024-12-18 09:02:28 -07:00
Florian Hahn	76714be5fd	Revert "Add support for single reductions in ComplexDeinterleavingPass (#112875 )" This reverts commit b3eede5e1fa7ab742b86e9be22db7bccd2505b8a. This has been breaking most AArch64 stage2 builds for 4+ hours, reverting to get the bots back to green. https://lab.llvm.org/buildbot/#/builders/41/builds/4172 https://lab.llvm.org/buildbot/#/builders/4/builds/4281 https://lab.llvm.org/buildbot/#/builders/199/builds/263 https://lab.llvm.org/buildbot/#/builders/198/builds/334 https://lab.llvm.org/buildbot/#/builders/143/builds/4276 https://lab.llvm.org/buildbot/#/builders/17/builds/4725	2024-12-18 15:06:52 +00:00
Andrei Safronov	c6967efe78	[Xtensa] Implement Code Density Option. (#119639 ) The Code Density option adds 16-bit encoding for frequently used instructions.	2024-12-18 15:37:08 +03:00
Simon Pilgrim	dd8e1adbf2	[X86] LowerShift - track the number and location of constant shift elements. (#120270 ) We have several vector shift lowering strategies that have to analyse the distribution of non-uniform constant vector shift amounts; at the moment there is very little sharing of data between these analyses. This patch creates a SmallDenseMap of the different LEGAL constant shift amounts used, with a mask of which elements they are used in. So far I've only updated the shuffle(immshift(x,c1),immshift(x,c2)) lowering pattern to use it for clarity; there are several more that can be done in followups. It's hoped that the proposed patch #117980 can be simplified after this patch as well. vec_shift6.ll - the existing shuffle(immshift(x,c1),immshift(x,c2)) lowering bails on out of range shift amounts, while this patch now skips them and treats them as UNDEF - this means we manage to fold more cases that before would have to lower to a SHL->MUL pattern, including some legalized cases.	2024-12-18 11:36:54 +00:00


81823 Commits