llvm-project

Author	SHA1	Message	Date
Sameer Sahasrabuddhe	9c1b82599d	[AAPointerInfo] handle multiple offsets in PHI Previously reverted in 8b446ea2ba39e406bcf940ea35d6efb4bb9afe95 Reapplying because this commit is NOT DEPENDENT on the reverted commit fc21f2d7bae2e0be630470cc7ca9323ed5859892, which broke the ASAN buildbot. See https://reviews.llvm.org/rGfc21f2d7bae2e0be630470cc7ca9323ed5859892 for more information. The arguments to a PHI may represent a recurrence by eventually using the output of the PHI itself. This is now handled by checking for cycles in the control flow. If a PHI is not in a recurrence, it is now able to report multiple offsets instead of conservatively reporting unknown. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D138991	2022-12-18 10:51:20 +05:30
Dinar Temirbulatov	7bce66edc6	[AArch64][SVE] Allow to lower WHILEop with constant operands to PTRUE This allows it to fold WHILEop with constant operand to PTRUE instruction in the case given range is fitted to predicate format. Also, this change fixes the unsigned overflow error introduced in D137547 for WHILELO lowering. Differential Revision: https://reviews.llvm.org/D139068	2022-12-18 01:27:03 +00:00
Ganesh Gopalasubramanian	1f057e365f	[X86] AMD Zen 4 Initial enablement Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D139073	2022-12-17 16:15:22 +05:30
Craig Topper	da7415acda	[RISCV] Add support for predicating AND/OR/XOR/ADD/SUB with short-forward-branch-opt. sifive-7-series can predicate ALU instructions in the shadow of a branch not just move instructions. This patch implements analyzeSelect/optimizeSelect to predicate these operations. This is based on ARM's implementation which can predicate using flags and condition codes. I've restricted it to just the instructions we have test cases for, but it can be extended in the future. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D140053	2022-12-16 22:58:43 -08:00
Christudasan Devadasan	40ba0942e2	[AMDGPU][SILowerSGPRSpills] Spill SGPRs to virtual VGPRs Currently, the custom SGPR spill lowering pass spills SGPRs into physical VGPR lanes and the remaining VGPRs are used by regalloc for vector regclass allocation. This imposes so many restrictions that we end up with unsuccessful SGPR spilling when there aren't enough VGPRs, and we are forced to spill the leftover into memory during PEI. The custom spill handling during PEI has many edge cases and breaks the compiler from time to time. This patch implements spilling SGPRs into virtual VGPR lanes. Since we now split the register allocation for SGPRs and VGPRs, the virtual registers introduced for the spill lanes would get allocated automatically in the subsequent regalloc invocation for VGPRs. Spill to virtual registers will always be successful, even in high-pressure situations, and hence it avoids most of the edge cases during PEI. We are now left with only the custom SGPR spills during PEI for special registers like the frame pointer, which is an unproblematic case. This patch also implements the whole wave spills which might occur if RA spills any live range of virtual registers involved in the whole wave operations. Earlier, we had been hand-picking registers for such machine operands. But now with SGPR spills into virtual VGPR lanes, we are exposing them to the allocator. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D124196	2022-12-17 11:56:32 +05:30
Christudasan Devadasan	29247824f5	[AMDGPU][SIFrameLowering] Use the right frame register in CSR spills Unlike the callee-saved VGPR spill instructions emitted by `PEI::spillCalleeSavedRegs`, the CS VGPR spills inserted during emitPrologue/emitEpilogue require the exec bits flipping to avoid clobbering the inactive lanes of VGPRs used for SGPR spilling. Currently, these spill instructions are referenced from the SP at function entry and when the callee performs a stack realignment, they ended up getting incorrect stack offsets. Even if we try to adjust the offsets, the FP-SP becomes a runtime entity with dynamic stack realignment and the offsets would still be inaccurate. To fix it, use FP as the frame base in the spill instructions whenever the function has FP. The offsets obtained for the CS objects would always be the right values from FP. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D134949	2022-12-17 11:52:36 +05:30
Christudasan Devadasan	7a72a93580	[AMDGPU] Preserve only the inactive lanes of scratch vgprs In general, a callee is free to use a scratch register without preserving its previous state. However, the VGPR used for SGPR spilling can potentially have its inactive lanes overwritten by the writelane instructions. When the function returns, it can cause unexpected behavior if the VGPR value is not preserved appropriately. The current scheme to preserve the inactive lanes of such scratch VGPRs is not done rightly. It preserves all lanes and causes the outgoing values (if any) getting overwritten by the epilog restores. It then corrupts the return value. To avoid such situation with scratch VGPRs, this patch ensures we preserve only their inactive lanes. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D134526	2022-12-17 11:51:43 +05:30
Christudasan Devadasan	20a940f1e2	[AMDGPU][SIFrameLowering] Unify PEI SGPR spill saves and restores There is a lot of customization and eventually code duplication in the frame lowering that handles special SGPR spills like the one needed for the Frame Pointer. Incorporating any additional SGPR spill currently makes it difficult during PEI. This patch introduces a new spill builder to efficiently handle such spill requirements. Various spill methods are special handled using a separate class. Reviewed By: sebastian-ne, scott.linder Differential Revision: https://reviews.llvm.org/D132436	2022-12-17 11:50:25 +05:30
Christudasan Devadasan	b25b4c0ab4	[AMDGPU] Separate out SGPR spills to VGPR lanes during PEI SILowerSGPRSpills pass handles the lowering of SGPR spills into VGPR lanes. Some SGPR spills are handled later during PEI. There is a common function used in both places to find the free VGPR lane. This patch eliminates that dependency to find the free VGPR by handling it separately for PEI. It is a prerequisite patch for a future work to allow SGPR spills to virtual VGPR lanes during SILowerSGPRSpills. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D124195	2022-12-17 11:49:41 +05:30
Christudasan Devadasan	5ebe91fcb2	[AMDGPU] Correctly set IsKill flag for VGPR spills in the prolog We always assume the vector register is dead or killed while inserting the VGPR spills in the prolog. It is not always true. Used the entry block liveIn data while setting the flag. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D124194	2022-12-17 11:48:44 +05:30
Christudasan Devadasan	5692a7e84e	[AMDGPU] Callee must always spill writelane VGPRs Since the writelane instruction used for SGPR spills can modify inactive lanes, the callee must preserve the VGPR this instruction modifies even if it was marked Caller-saved. Reviewed By: arsenm, nhaehnle Differential Revision: https://reviews.llvm.org/D124192	2022-12-17 11:11:42 +05:30
Yingchi Long	0359c19e8f	[RISCV][VP] support vp.reduce.mul by ExpandVectorPredication Most VP intrinsics are implemented in the RISC-V backend, but vp.reduce.mul (element length > 1) is not yet. Legalizes vp.reduce.mul using the ExpandVectorPredication pass. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D139721	2022-12-17 10:49:47 +08:00
Roman Lebedev	428f36401b	Reland "[SimplifyCFG] `FoldBranchToCommonDest()`: deal with mismatched IV's in PHI's in common successor block" This reverts commit 37b8f09a4b61bf9bf9d0b9017d790c8b82be2e17, and returns commit 1bd0b82e508d049efdb07f4f8a342f35818df341. The miscompile was in InstCombine, and it has been addressed. This tries to approach the problem noted by @arsenm: terrible codegen for `__builtin_fpclassify()`: https://godbolt.org/z/388zqdE37 Just because the PHI in the common successor happens to have different incoming values for these two blocks, doesn't mean we have to give up. It's quite easy to deal with this, we just need to produce a select: https://alive2.llvm.org/ce/z/000srb Now, the cost model for this transform is rather overly strict, so this will basically never fire. We tally all (over all preds) the selects needed to the NumBonusInsts Differential Revision: https://reviews.llvm.org/D139275	2022-12-17 05:18:54 +03:00
Mitch Phillips	525d6c54b5	Revert "[AAPointerInfo] handle multiple offsets in PHI" This reverts commit 88db516af69619d4326edea37e52fc7321c33bb5. Reason: This change is dependent on a commit that needs to be rolled back because it broke the ASan buildbot. See https://reviews.llvm.org/rGfc21f2d7bae2e0be630470cc7ca9323ed5859892 for more information.	2022-12-16 17:55:48 -08:00
Mitch Phillips	7928a6387f	Revert "Revert "[AAPointerInfo] handle multiple offsets in PHI"" This reverts commit 12696d302d146ffe616eecab3feceba9d29be2db. Reason: This change is dependent on a commit that needs to be rolled back because it broke the ASan buildbot. See https://reviews.llvm.org/rGfc21f2d7bae2e0be630470cc7ca9323ed5859892 for more information.	2022-12-16 17:55:38 -08:00
Mitch Phillips	8b446ea2ba	Revert "[AAPointerInfo] handle multiple offsets in PHI" This reverts commit 179ed8871101cd197e0a719a3629cd5077b1a999. Reason: This change is dependent on a commit that needs to be rolled back because it broke the ASan buildbot. See https://reviews.llvm.org/rGfc21f2d7bae2e0be630470cc7ca9323ed5859892 for more information.	2022-12-16 17:54:44 -08:00
Jeffrey Byrnes	4d2faf043b	[AMDGPU][SIFrameLowering] Mark VGPR used for AGPR spills as reserved Presently, there is an issue on MI100 (and probably other architectures) where the VGPR used for AGPR copies clobbers the VGPR used for AGPR spill. AFAICT this is because in processFunctionBeforeFrameIndicesReplaced we think the VGPR register for AGPR spill is unused. This patch aims to correct that. This is a WIP while I work out issues with producing a good test. For now, I'm curious if this is generally a good / bad idea. Differential Revision: https://reviews.llvm.org/D139673	2022-12-16 12:00:51 -08:00
Roman Lebedev	96d3c82645	Revert "[SROA] `isVectorPromotionViable()`: memory intrinsics operate on vectors of bytes (take 3)" While the PPC little-endian miscompile did get addressed by https://reviews.llvm.org/D140046 the PPC big-endian bots are still unhappy. https://lab.llvm.org/buildbot/#/builders/93/builds/12560 This reverts commit 7bd358bcb4e358b4351c69e02ef76939e08acdc7.	2022-12-16 22:58:41 +03:00
Roman Lebedev	cfd594f8bb	[SROA] `isVectorPromotionViable()`: memory intrinsics operate on vectors of bytes (take 3) * This is a recommit of 3c4d2a03968ccf5889bacffe02d6fa2443b0260f, * which was reverted in 25f01d593ce296078f57e872778b77d074ae5888, because it exposed a miscompile in PPC backend, which was resolved in https://reviews.llvm.org/D140089 / cb3f415cd2019df7d14683842198bc4b7a492bc5. * which was a recommit of cf624b23bc5d5a6161706d1663def49380ff816a, * which was reverted in 5cfc22cafe3f2465e0bb324f8daba82ffcabd0df, because the cut-off on the number of vector elements was not low enough, and it triggered both SDAG SDNode operand number assertions, and caused compile time explosions in some cases. Let's try with something really REALLY conservative first, just to get somewhere, and try to bump it later. FIXME: should this respect TTI reg width * num vec regs? Original commit message: Now, there's a big caveat here - these bytes are abstract bytes, not the i8 we have in LLVM, so strictly speaking this is not exactly legal, see e.g. https://github.com/AliveToolkit/alive2/issues/860 ^ the "bytes" "could" have been a pointer, and loading it as an integer inserts an implicit ptrtoint. But at the same time, InstCombine's `InstCombinerImpl::SimplifyAnyMemTransfer()` would expand a memtransfer of 1/2/4/8 bytes into integer-typed load+store, so this isn't exactly a new problem. Note that in memory, poison is byte-wise, so we really can't widen elements, but SROA seems to be inconsistent here. Fixes #59116.	2022-12-16 19:27:38 +03:00
Alexander Kornienko	37b8f09a4b	Revert "[SimplifyCFG] `FoldBranchToCommonDest()`: deal with mismatched IV's in PHI's in common successor block" This reverts commit 1bd0b82e508d049efdb07f4f8a342f35818df341, since it leads to miscompiles. See https://reviews.llvm.org/D139275#3993229 and https://reviews.llvm.org/D139275#4001580.	2022-12-16 17:23:35 +01:00
Nemanja Ivanovic	cb3f415cd2	[PowerPC] Fix up memory ordering after combining BV to a load The combiner for BUILD_VECTOR that merges consecutive loads into a wide load had two issues: - It didn't check that the input loads all have the same input chain - It didn't update nodes that are chained to the original loads to be chained to the new load This caused issues with bootstrap when 3c4d2a03968ccf5889bacffe02d6fa2443b0260f was committed. This patch fixes the issue so it can unblock this commit. Differential revision: https://reviews.llvm.org/D140046	2022-12-16 08:57:36 -06:00
Archibald Elliott	82b51a1428	[AArch64] Support SLC in ACLE prefetch intrinsics This change: - Modifies the ACLE code to allow the new SLC value (3) for the prefetch target. - Introduces a new intrinsic, @llvm.aarch64.prefetch which matches the PRFM family instructions much more closely, and can represent all values for the PRFM immediate. The target-independent @llvm.prefetch intrinsic does not have enough information for us to be able to lower to it from the ACLE intrinsics correctly. - Lowers the acle calls to the new intrinsic on aarch64 (the ARM lowering is unchanged). - Implements code generation for the new intrinsic in both SelectionDAG and GlobalISel. We specifically choose to continue to support lowering the target-independent @llvm.prefetch intrinsic so that other frontends can continue to use it. Differential Revision: https://reviews.llvm.org/D139443	2022-12-16 14:42:27 +00:00
Jay Foad	e2bcdca527	[AMDGPU] Generate permlane test checks	2022-12-16 11:30:01 +00:00
Weining Lu	ba9ed24b03	[LoongArch] Add tests showing the optimization pipeline Other targets like ARM, AArch64, RISCV and X86 have similar tests. `O1`, `O2` and `O3` appear to be the same for now. But in future, some passes may be disabled at lower levels (e.g. `O1`). Hoping we can use FileCheck prefixes for differences to avoid repeating the contents 3 times. Reviewed By: xen0n, MaskRay Differential Revision: https://reviews.llvm.org/D139499	2022-12-16 18:09:50 +08:00
Yeting Kuo	982a586ab4	[RISCV] Emit .variant_cc directives for vector function calls. The patch is split from D103435. The patch emits .variant_cc [0] for those function calls that have vector arguments or vector return values. [0]: https://github.com/riscv/riscv-elf-psabi-doc/pull/190 Initially authored by: HsiangKai Reviewed By: reames Differential Revision: https://reviews.llvm.org/D139414	2022-12-16 13:51:39 +08:00
Matt Arsenault	b5edd522d1	AMDGPU/GlobalISel: Do not create readfirstlane with non-s32 type We should probably handle any 32-bit type here, but the intrinsic definition and selection pattern currently do not. Avoids a few lit tests failures when switched on by default.	2022-12-15 21:44:07 -05:00
Alexander Timofeev	2877b87666	[AMDGPU] Lower VGPR to physical SGPR COPY to S_MOV_B32 if VGPR contains the compile time constant Sometimes we have a constant value loaded to a VGPR. If we later need to rematerialize it in a physical scalar register, we can avoid the VGPR to SGPR copy by replacing it with S_MOV_B32. Reviewed By: JonChesterfield, arsenm Differential Revision: https://reviews.llvm.org/D139874	2022-12-16 00:38:10 +01:00
Kazu Hirata	3442309138	[mlgo] Use have_tflite instead of have_tf_api We are in the process of retiring LLVM_HAVE_TF_API in favor of LLVM_HAVE_TFLITE. This patch takes care of the transition in llvm/test. Differential Revision: https://reviews.llvm.org/D140133	2022-12-15 13:54:25 -08:00
Kai Nacke	110340c687	[PowerPC][GIsel] Materialize i64 constants. Adds support for i64 constant. It uses the same pattern-based approach as in SDAG (see PPCISelDAGToDAG::selectI64ImmDirect(), PPCISelDAGToDAG::selectI64Imm()). It does not support the prefixed instructions. Reviewed By: arsenm, tschuett Differential Revision: https://reviews.llvm.org/D140119	2022-12-15 21:22:58 +00:00
Kevin Athey	ec7cffc579	Revert "Revert "[AArch64][GlobalISel][Legalizer] Legalize G_SHUFFLE_VECTOR with different lengths"" This reverts commit 192cc76e0be688106492989cd845ba786a7ae36d. Reverted Revert, as build was fixed while I was examining.	2022-12-15 11:19:24 -08:00
Kevin Athey	192cc76e0b	Revert "[AArch64][GlobalISel][Legalizer] Legalize G_SHUFFLE_VECTOR with different lengths" This reverts commit 4c52fb1a5ee20846627d16e38f5dec08c08f8884. Breaks sanitizer ubsan buildbot: https://lab.llvm.org/buildbot/#/builders/85/builds/12983	2022-12-15 11:15:55 -08:00
Craig Topper	992bee045b	[RISCV] Teach RISCVSExtWRemoval to remove sext.w whose upper bits aren't demanded. SelectionDAG aggressively creates sext_inreg operations after promoting an i32 add. If the add is later matched to a sh1add, sh2add or sh3add, a sext.w from the sext_inreg will get left behind. In many cases we can prove this sext.w is unnecessary by checking if its upper bits are ever used.	2022-12-15 11:01:20 -08:00
Christudasan Devadasan	229c466bc8	[AMDGPU] Test fixup Changing cast_lds_gv into a kernel function to lower the LDS usage appropriately. The LDS lowering currently won't happen for orphan device functions.	2022-12-15 23:36:55 +05:30
Ron Lieberman	38f1abef86	Revert "[SelectionDAG] Do not second-guess alignment for alloca" Breaks amdgpu buildbot https://lab.llvm.org/buildbot/#/builders/193 23491 This reverts commit ffedf47d8b793e07317f82f9c2a5f5425ebb71ad.	2022-12-15 10:55:18 -06:00
Simon Pilgrim	37c3b83bd8	[X86] combineBitcastvxi1 - handle boolmask sign-extension through vselect See if we can freely sign-extend both sources of a vselect operand, also handle allones constant build vectors (easily rematerializable and used in the test case). Fixes #59526	2022-12-15 16:40:44 +00:00
Simon Pilgrim	4da6a983ad	[X86] Add test case for Issue #59526	2022-12-15 16:19:41 +00:00
Philip Reames	90f9168307	[RISCV][InsertVSETVLI] Mutate prior vsetvli AVL if doing so allows us to remove a toggle This extends the backwards walk to allow mutating the previous vsetvl's AVL value if it was not used by any instructions in between. In practice, this mostly benefits vmv.x.s and fvmv.f.s patterns since vector instructions which ignore VL are rare. Differential Revision: https://reviews.llvm.org/D140048	2022-12-15 07:32:28 -08:00
Nilanjana Basu	02d09ffc1b	[AArch64] Extending lowering of 'trunc <(8|16) x i64> %x to <(8|16) x i8>' to use tbl instructions [AArch64] Patch for lowering trunc instructions to 'tbl' for (8|16)xi32 -> (8|16)xi8 conversions in https://reviews.llvm.org/D133495 is extended to support trunc to tbl lowering for (8|16) x i64 to (8|16) x i8. A microbenchmark for runtime for these transformations is added in https://reviews.llvm.org/D136274 Reviewed by: fhahn, t.p.northover Differential Revision: https://reviews.llvm.org/D135229	2022-12-15 20:50:40 +05:30
Nilanjana Basu	a645ec0d3d	[AArch64] Extra unit tests for trunc lowering of vectors These tests show code generation for vectorized trunc lowering from i16 to i8 in AArch64. Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D137293	2022-12-15 20:50:40 +05:30
Andrew Savonichev	ffedf47d8b	[SelectionDAG] Do not second-guess alignment for alloca Alignment of an alloca in IR can be lower than the preferred alignment on purpose, but this override essentially treats the preferred alignment as the minimum alignment. The patch changes this behavior to always use the specified alignment. If alignment is not set explicitly in LLVM IR, it is set to DL.getPrefTypeAlign(Ty) in computeAllocaDefaultAlign. Tests are changed as well: explicit alignment is increased to match the preferred alignment if it changes output, or omitted when it is hard to determine the right value (e.g. for pointers, some structs, or weird types). Differential Revision: https://reviews.llvm.org/D135462	2022-12-15 18:18:12 +03:00
Benjamin Maxwell	3010f60381	Reland "[TargetLowering] Teach DemandedBits about VSCALE" Reland with a fixup to avoid converting APInts to int64_t which allowed for overflows (UB) with sufficiently high/low multiplier values. This allows DemandedBits to see the result of VSCALE will be at most VScaleMax * some compile-time constant. This relies on the vscale_range() attribute being present on the function, with a max set. (This is done by default when clang is targeting AArch64+SVE). Using this various redundant operations (zexts, sexts, ands, ors, etc) can be eliminated. Differential Revision: https://reviews.llvm.org/D138508	2022-12-15 13:50:02 +00:00
Anton Sidorenko	1cdffa359a	[MachineCombiner][RISCV] Support inverse instructions reassociation This patch adds reassociation of FADD/FSUB instruction pairs. Differential Revision: https://reviews.llvm.org/D138660	2022-12-15 16:48:30 +03:00
Juan Manuel MARTINEZ CAAMAÑO	4d852374b1	[DAGCombine] Fix always true condition in combineShiftToMULH Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D139550	2022-12-15 13:04:42 +01:00
Vladislav Dzhidzhoev	4c52fb1a5e	[AArch64][GlobalISel][Legalizer] Legalize G_SHUFFLE_VECTOR with different lengths Legalize G_SHUFFLE_VECTOR having destination vector length greater than source vector length by reshaping source vectors. Partial implementation of SelectionDAGBuilder::visitShuffleVector. Differential Revision: https://reviews.llvm.org/D132190	2022-12-15 15:03:34 +03:00
Benjamin Maxwell	20b29a59c5	Revert "[TargetLowering] Teach DemandedBits about VSCALE" This reverts commit c165b0553a96394b9bbf3984782703cdae99821d.	2022-12-15 11:29:34 +00:00
Luke Lau	0cd9c51766	[WebAssembly] Use ComplexPattern on remaining memory instructions This continues the refactoring work of selecting offset + address operands with the AddrOpsN pattern, previously called LoadOpsN. This is not an NFC, since constant addresses are now folded into the offset in more places for v128.storeN_lane. Differential Revision: https://reviews.llvm.org/D139950	2022-12-15 10:20:06 +00:00
Anton Sidorenko	37f9eec142	[RISCV] Allow conversion of fp divisions to fp multiplications by the reciprocal If the divisor is repeated at least twice, we will convert the FDIVs to the calculation of the reciprocal and FMULs. We perform the transformation only under fast-math mode. FDIVs must have 'arcp' flag. Differential Revision: https://reviews.llvm.org/D140024	2022-12-15 13:00:36 +03:00
Anton Sidorenko	619f455dee	[RISCV] Precommit test for D140024 Simple test to check conversion of repeated fp divisors.	2022-12-15 12:58:21 +03:00
YunQiang Su	9739bb81ae	MIPS: fix build from IR files, nan2008 and FpAbi When we use llc or lld to compile IR files, the features +nan2008 and +fpxx/+fp64 are not used. Thus wrong format files are produced. In IR files, the attributes are only set for functions, not for the whole compile unit. So we output `.nan 2008` and `.module fp=xx/64` before every function. `isFPXXDefault`: for o32, FPXX should always be the default, regardless of the vendor. Of course some distributions with FP64 default enabled should be listed explicitly. Let's add them in future if we know about one. Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D138179	2022-12-15 09:04:36 +00:00
esmeyi	2e8c7f6527	[XCOFF] adjust the Fixedvalue for R_RBR relocations. Summary: Currently we get a wrong fixed value for R_RBR relocations when -ffunction-sections enabled. This patch fixes this. Reviewed By: DiggerLin, shchenz Differential Revision: https://reviews.llvm.org/D138982	2022-12-15 01:56:53 -05:00

46156 Commits