llvm-project

Author	SHA1	Message	Date
hezuoqiang	f4ba1db5bf	[GlobalISel] Fix the error transformation of BRCOND to BCC Fix https://github.com/llvm/llvm-project/issues/62309 Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D150527	2023-07-13 21:20:01 +08:00
Simon Pilgrim	8d598531b3	Revert rGf269877dc30777354be8a512e871aba1b1f9fd7a "[X86] canonicalizeShuffleMaskWithHorizOp - fold permute(pack(x,y)) -> pack(shuffle(x,y),undef) iff we only demand the lower elements" This appears to be causing some infinite loops (and is particularly bad when D152928 is applied).	2023-07-13 14:11:54 +01:00
Nikita Popov	7025ac81f0	[X86] Don't elide argument copies for scalarized vectors (PR63475) When eliding argument copies, the memory layout between a plain store of the type and the layout of the argument lowering on the stack must match. For multi-part argument lowerings, this is not necessarily the case. The code already tried to prevent this optimization for "scalarized and extended" vectors, but the check for "extends" was incomplete. While a scalarized vector of i32s stores i32 values on the stack, these are stored in 8 byte stack slots (on x86_64), so effectively have padding. Rather than trying to add more special cases to handle this (which is not straightforward), I'm going in the other direction and exclude scalarized vectors from this optimization entirely. This seems like a rare case that is not worth the hassle -- the complete lack of test coverage is not reassuring either. Fixes https://github.com/llvm/llvm-project/issues/63475. Differential Revision: https://reviews.llvm.org/D154078	2023-07-13 14:49:48 +02:00
Nikita Popov	f78a06ef11	[X86] Remove out of range extract in test (NFC) As pointed out in https://reviews.llvm.org/D154078#inline-1500915.	2023-07-13 14:46:29 +02:00
Maurice Heumann	d1fc8f7211	[X86] Prevent infinite loop in SelectionDAG when lowering negations In certain cases, lowering negations can cause an infinite loop in SelectionDAG on X86. The following snippet shows that behaviour: https://godbolt.org/z/5hP45T4hY What happens is that ADD(XOR(..., -1), 1) is detected as the two's complement and transformed into SUB(0, ...) However, immediates can not be encoded as the LHS of a SUB on X86. Therefore it is transformed back into an ADD/XOR pair, which is then again transformed into a SUB and so on. In that specific case, I still think it is valid to display this as a SUB(0,...) , because it should eventually be lowered as a NEG. Which seems better than an ADD/XOR pair. Adding an exception to the X86 specific handling for SUBs with 0 LHS operand fixes this infinite loop. Differential Revision: https://reviews.llvm.org/D154575	2023-07-13 12:20:44 +01:00
David Green	50378a16d4	[AArch64] Extra tablegen patterns for smaller extracted addl/addw/subl/subw During lowering, especially of smaller vector types, we can end up with `add (extract_subvector(zext(x), extract_subvector(zext(y))`, which can be turned into `extract_subvector(add(zext(y), zext(x)))`, which can use the addl AArch64 instruction. This adds some tablegen patterns for it, along with addw where only one operand is an extract/extend and subl/subw. Differential Revision: https://reviews.llvm.org/D153632	2023-07-13 11:44:17 +01:00
Luke Lau	ed15e9119b	[RISCV] Don't fold vmerge into ops if fp exception can be raised We are already checking for fp exceptions if VL changes, but I believe we should also be checking for them if the mask changes as well, since that also affects the set of active elements. From the spec: > A vector floating-point exception at any active floating-point element sets > the standard FP exception flags in the fflags register. Inactive elements do > not set FP exception flags. Note that we don't change the mask if IsMasked is true, i.e. True is masked already, since in that case we keep the existing mask. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D154980	2023-07-13 11:42:23 +01:00
Luke Lau	becfb4612a	[RISCV] Add test for vmerge combine that should be prevented The fadd in these test cases is constrained and may set fflags differently depending on the active elements (the nofpexcept flag isn't set on the node). Therefore to preserve semantics we shouldn't change its mask. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D154979	2023-07-13 11:42:20 +01:00
Jon Chesterfield	9418c40af7	[amdgpu][lds] Raise an explicit unimplemented error on absolute address LDS variables These aren't implemented. They could be at moderate implementation complexity. Raising an error is better than silently miscompiling. Patching now because the patch at D155125 is a step towards using this metadata more extensively as part of the lowering path and that will interact badly with input variables with this annotation. Lowering user defined variables at specific addresses would drop this error, put them at the requested position in the frame during this pass, and then use the same codegen that will be used for the kernel specific struct shortly. Reviewed By: jmmartinez Differential Revision: https://reviews.llvm.org/D155132	2023-07-13 11:32:03 +01:00
Simon Pilgrim	451af63551	[X86] Remove combineVectorTruncation and delay general vector trunc to lowering Stop folding vector truncations to PACKSS/PACKUS patterns prematurely - another step towards Issue #63710. We still prematurely fold to PACKSS/PACKUS if there are sufficient signbits, that will be addressed in a later patch when we remove combineVectorSignBitsTruncation. This required ReplaceNodeResults to extend handling of sub-128-bit results to SSSE3 (or later) cases, which has allowed us to improve vXi32->vXi16 truncations to use PSHUFB. I also tweaked LowerTruncateVecPack to recognise widened truncation source operands so the upper elements remain UNDEF (otherwise truncateVectorWithPACK* will constant fold them to allzeros/allones values).	2023-07-13 11:29:21 +01:00
Simon Pilgrim	228442a14c	[X86] canonicalizeShuffleMaskWithHorizOp - fold 256-bit permute(hop(x,y)) -> hop(extract(x),extract(x)) iff we only demand the lower elements Attempt to recognise when we can narrow a 256-bit hop to a lower 128-bit hop by extracting the requested subvectors (and then widening back)	2023-07-13 10:37:08 +01:00
Simon Pilgrim	f269877dc3	[X86] canonicalizeShuffleMaskWithHorizOp - fold permute(pack(x,y)) -> pack(shuffle(x,y),undef) iff we only demand the lower elements Help expose undef elements for further shuffle combines Noticed while trying to improve truncation packss/packus patterns for sub-128-bit results.	2023-07-13 10:37:08 +01:00
eopXD	2c38d63323	[8/8][RISCV] Add rounding mode control variant for vfredosum, vfredusum, vfwredosum, vfwredusum Depends on D154635 For the cover letter of the patch-set, please checkout D154628. This is the 8th patch of the patch-set. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D154636	2023-07-13 00:55:10 -07:00
eopXD	5d18d43f26	[7/8][RISCV] Add rounding mode control variant for conversion intrinsics between floating-point and integer Depends on D154634 For the cover letter of the patch-set, please checkout D154628. This is the 7th patch of the patch-set. This patch includes change to vfcvt_x_f, vfcvt_xu_f, vfwcvt_x_f, vfwcvt_xu_f, vfncvt_x_f, vfncvt_xu_f vfcvt_f_x, vfcvt_f_xu, vfncvt_f_x vfncvt_f_xu, vfncvt_f_f Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D154635	2023-07-13 00:54:07 -07:00
eopXD	51b9e33661	[6/8][RISCV] Add rounding mode control variant for vfsqrt, vfrec7 Depends on D154633 For the cover letter of the patch-set, please checkout D154628. This is the 6th patch of the patch-set. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D154634	2023-07-13 00:51:51 -07:00
eopXD	4085b23609	[5/8][RISCV] Add rounding mode control variant for vfwmacc, vfwnmacc, vfwmsac, vfwnmsac Depends on D154632 For the cover letter of the patch-set, please checkout D154628. This is the 5th patch of the patch-set. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D154633	2023-07-13 00:49:59 -07:00
eopXD	e1f224a647	[4/8][RISCV] Add rounding mode control variant for vfmacc, vfnmacc, vfmsac, vfnmsac, vfmadd, vfnmadd, vfmsub, vfnmsub Depends on D154631 For the cover letter of the patch-set, please checkout D154628. This is the 4th patch of the patch-set. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D154632	2023-07-13 00:47:27 -07:00
eopXD	1a905e8238	[3/8][RISCV] Add rounding mode control variant for vfmul, vfdiv, vfrdiv, vfwmul Depends on D154629 For the cover letter of the patch-set, please checkout D154628. This is the 3rd patch of the patch-set. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D154631	2023-07-13 00:43:54 -07:00
eopXD	00093667b1	[2/8][RISCV] Add rounding mode control variant for vfwadd, vfwsub Depends on D154628 For the cover letter of the patch-set, please checkout D154628. This is the 2nd patch of the patch-set. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D154629	2023-07-13 00:42:00 -07:00
eopXD	474e37c113	[1/8][RISCV] Add rounding mode control variant for vfsub, vfrsub Depends on D152996. This patch-set aims to add a variant for the RVV floating-point intrinsics that controls the rounding mode (`frm`). The rounding mode variant appends `_rm` before the policy suffix to distinguish from those without them. Specification PR: riscv-non-isa/rvv-intrinsic-doc#226 This is the 1st patch of the patch-set. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D154628	2023-07-13 00:35:36 -07:00
eopXD	76482078cd	[RISCV][POC] Model frm control for vfadd Depends on D152879. Specification PR: riscv-non-isa/rvv-intrinsic-doc#226 This patch adds a variant of `vfadd` that models the rounding mode control. The added variant has suffix `_rm` appended to differentiate it from the existing ones, which do not alter `frm` and use whatever value it already holds. The value `7` is used to indicate no rounding mode change, reusing the semantics of the rounding mode encoding for scalar floating-point instructions. Additional data member `HasFRMRoundModeOp` is added so we can append the `_rm` suffix for the fadd variants that model rounding mode control. Additional data member `IsRVVFixedPoint` is added so we can define pseudo instructions with a rounding mode operand and distinguish between fixed-point and floating-point instructions. Reviewed By: craig.topper, kito-cheng Differential Revision: https://reviews.llvm.org/D152996	2023-07-13 00:34:00 -07:00
pvanhout	361e9eec51	[AMDGPU] Correctly emit AGPR copies in tryFoldPhiAGPR - Don't create COPY instructions between PHI nodes. - Don't create V_ACCVGPR_WRITE with operands that aren't AGPR_32 Solves SWDEV-410408 Reviewed By: #amdgpu, arsenm Differential Revision: https://reviews.llvm.org/D155080	2023-07-13 08:55:22 +02:00
Caslyn Tonelli	b11559122e	Revert "[ARM] Restructure MOVi32imm expansion to not do pointless instructions" This reverts commit 647aff28558b6b1379f0892138059b403192512a. Differential Revision: https://reviews.llvm.org/D155122	2023-07-12 23:29:15 +00:00
Jon Roelofs	56e60bc5bb	TargetLowering: fix an infinite DAG combine in SimplifySETCC TargetLowering::SimplifySetCC wants to swap the operands of a SETCC to canonicalize the constant to the RHS. The bug here was that it did so whether or not the RHS was already a constant, leading to an infinite loop. rdar://111847838 Differential revision: https://reviews.llvm.org/D155095 This reverts commit cdc633e4bc93d4bf241ecd4c29691ae065749313.	2023-07-12 16:13:27 -07:00
Philip Reames	b5cbd9628e	[RISCV] Remove legacy TA/TU pseudo distinction of vmerge and carry-in arithmetic operations [NFC] This change continues with the line of work discussed in https://discourse.llvm.org/t/riscv-transition-in-vector-pseudo-structure-policy-variants/71295. This is analogous to other patches in the series, but with one key difference - the resulting pseudo does not have a policy operand. We could add one for vmerge, but some of the multiclasses are sufficiently entwined with the mask producing arithmetic instructions that the change delta becomes unmanageable. Note that these instructions are not in the RISCVMaskedPseudo table, and thus the difference doesn't complicate other code. The main value of working incrementally here is that we get to eagerly clean up the IsTA logic flowing through the post-ISEL combines. Differential Revision: https://reviews.llvm.org/D154645	2023-07-12 15:31:02 -07:00
Noah Goldstein	a4c461c063	[SelectionDAG] Fill in some more cases in `isKnownNeverZero` This mostly copies cases that already exist in ValueTracking, although it skips the more complex ones. Those can be filled in as needed. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D149199	2023-07-12 17:17:53 -05:00
Noah Goldstein	24f752ed2e	[X86] Add tests for checking `isKnownNeverZero`; NFC Differential Revision: https://reviews.llvm.org/D149299	2023-07-12 17:17:53 -05:00
Noah Goldstein	74f0ec5e24	[DAGCombiner] Make it so that `udiv` can be folded with `(select c, NonZero, 1)` This is done by allowing speculation of `udiv` if we can prove the denominator is non-zero. https://alive2.llvm.org/ce/z/VNCt_q Differential Revision: https://reviews.llvm.org/D149198	2023-07-12 17:17:53 -05:00
Noah Goldstein	eccb454177	[X86] Add tests for `div/rem %x, (select c, <const>, 1)`; NFC Reviewed By: pengfei Differential Revision: https://reviews.llvm.org/D149197	2023-07-12 17:17:52 -05:00
Mikhail Gudim	17e2df6695	[RISCV] Removed the requirement of XLenVT for performSELECTCombine. Reviewed By: Craig Topper Differential Revision: https://reviews.llvm.org/D153044	2023-07-12 16:29:09 -04:00
Hiroshi Yamauchi	6e6d2b7859	[AArch64][Windows] Fix the callee-saved registers for swiftcc on Windows/AArch64 Fix a miscompilation crash where swiftself on x20 gets corrupted due to incorrect save/restore at prologue/epilogue. Differential Revision: https://reviews.llvm.org/D155001	2023-07-12 13:13:21 -07:00
Jon Roelofs	cdc633e4bc	Revert "TargetLowering: fix an infinite DAG combine in SimplifySETCC" This reverts commit b76c85b355578d9076c22a86faf4ea8de1745bdf. It broke the RISCV-enabled bots. Oops.	2023-07-12 12:22:03 -07:00
Jon Roelofs	b76c85b355	TargetLowering: fix an infinite DAG combine in SimplifySETCC TargetLowering::SimplifySetCC wants to swap the operands of a SETCC to canonicalize the constant to the RHS. The bug here was that it did so whether or not the RHS was already a constant, leading to an infinite loop. rdar://111847838 Differential revision: https://reviews.llvm.org/D155095	2023-07-12 11:44:15 -07:00
Jingu Kang	33e60484d7	[MachineLICM] Handle Subloops MachineLICM pass handles inner loops only when the outermost loop does not have a unique predecessor. If the loop has a preheader and there is loop invariant code, the invariant code can be hoisted to the preheader in general. This patch makes the pass handle inner loops in general. Differential Revision: https://reviews.llvm.org/D154205	2023-07-12 16:32:14 +01:00
Craig Topper	1aecb0e000	[RISCV] Clear kill flags when forming FMA instructions in MachineCombiner. If the operands to the mul have other uses we may be extending their live range past a kill flag. Reviewed By: asb, asi-sc Differential Revision: https://reviews.llvm.org/D155046	2023-07-12 08:03:45 -07:00
Craig Topper	45b172c838	[LegalizeDAG] Prevent LegalizeLoadOps from creating extloads that mix int and fp types. For RISC-V, getRegisterType for fp16 returns i16. i16->fp64 extload is considered legal because the LoadExtActions defaults to Legal for all entries. Only fp/fp and int/int entries are changed to Expand for RISC-V. This patch detects that the FP-ness has changed and won't try to call isLoadExtLegal. Alternatively, we could add Expand for int/fp and fp/int, but that seemed a little silly. Fixes #63816 Reviewed By: asb, wangpc Differential Revision: https://reviews.llvm.org/D155040	2023-07-12 08:03:35 -07:00
Jay Foad	58d1eaa3b6	[CodeGen] Store SP adjustment in MachineBasicBlock. NFCI. Record the SP adjustment on entry to each basic block. This is almost always zero except on targets like ARM which can split a basic block in the middle of a call sequence. This simplifies PEI::replaceFrameIndices which previously had to visit basic blocks in a specific order and had special handling for unreachable blocks. More importantly it paves the way for an equally simple implementation of a backwards version of replaceFrameIndices, which is required to fully convert PrologEpilogInserter to backwards register scavenging, which is preferred because it does not rely on accurate kill flags. Differential Revision: https://reviews.llvm.org/D154281	2023-07-12 14:29:26 +01:00
Marco Elver	de79233b2e	[X86] Complete preservation of !pcsections in X86ISelLowering https://reviews.llvm.org/D130883 introduced MIMetadata to simplify metadata propagation (DebugLoc and PCSections). However, we're currently still permitting implicit conversion of DebugLoc to MIMetadata, to allow for a gradual transition and let the old code work as-is. This manifests in lost !pcsections metadata for X86-specific lowerings. For example, 128-bit atomics. Fix the situation for X86ISelLowering by converting all BuildMI() calls to use an explicitly constructed MIMetadata. Reviewed By: dvyukov Differential Revision: https://reviews.llvm.org/D154986	2023-07-12 15:09:31 +02:00
Matt Devereau	fa58aa8e91	[SVE2p1][SME2] Add scalar addressing mode for LD1 Add the scalar addressing mode for multi vector LD1 instructions. Differential Revision: https://reviews.llvm.org/D154829	2023-07-12 13:08:38 +00:00
Maciej Gabka	5b0e19a7ab	[TLI][AArch64] Add mappings to vectorized functions from ArmPL Arm Performance Libraries contain math library which provides vectorized versions of common math functions. This patch allows to use it with clang and llvm via -fveclib=ArmPL or -vector-library=ArmPL, so loops with such calls can be vectorized. The executable needs to be linked with the amath library. Arm Performance Libraries are available at: https://developer.arm.com/Tools%20and%20Software/Arm%20Performance%20Libraries Reviewed by: paulwalker-arm Differential Revision: https://reviews.llvm.org/D154508	2023-07-12 12:53:18 +00:00
Nikita Popov	edb2fc6dab	[llvm] Remove explicit -opaque-pointers flag from tests (NFC) Opaque pointers mode is enabled by default, no need to explicitly enable it.	2023-07-12 14:35:55 +02:00
Ivan Kosarev	15e7749e19	[Codegen] Generate fast fp64-to-fp16 conversions in unsafe mode. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D154528	2023-07-12 11:55:19 +01:00
John Brawn	210f61cbdd	[ARM] Correctly handle execute-only in EmitStructByval Currently when compiling for an execute-only target without movt then EmitStructByval will generate a constant pool load which isn't compatible with execute-only. Handle this by emitting tMOVi32imm, and also simplify the existing movt handling by emitting t2MOVi32imm or MOVi32imm. Differential Revision: https://reviews.llvm.org/D154944	2023-07-12 11:48:01 +01:00
John Brawn	647aff2855	[ARM] Restructure MOVi32imm expansion to not do pointless instructions The expansion of the various MOVi32imm pseudo-instructions works by splitting the operand into components (either halfwords or bytes) and emitting instructions to combine those components into the final result. When the operand is an immediate with some components being zero this can result in pointless instructions that just add zero. Avoid this by restructuring things so that a separate function handles splitting the operand into components, then don't emit the component if it is a zero immediate. This is straightforward for movw/movt, where we just don't emit the movt if it's zero, but the thumb1 expansion using mov/add/lsl is more complex, as even when we don't emit a given byte we still need to get the shift correct. Differential Revision: https://reviews.llvm.org/D154943	2023-07-12 11:48:01 +01:00
David Stenberg	6aa94c64a5	[DWARF] Add printout for op-index This is a preparatory patch for extending DWARFDebugLine to properly parse line number programs with maximum_operations_per_instruction > 1 for VLIW targets. Add some scaffolding for handling op-index in line number programs, and add printouts for that in the table. As this affects a lot of tests, this is done in a separate commit to get a cleaner review for the actual op-index implementation. Verbose printouts are not present in many tests, and adding op-index to those will require a bit more code changes, so that is done in the actual implementation patch. Reviewed By: StephenTozer Differential Revision: https://reviews.llvm.org/D152535	2023-07-12 12:03:44 +02:00
Nikita Popov	d69033d245	[SCEVExpander] Fix GEP IV inc reuse logic for opaque pointers Instead of checking the pointer type, check the element type of the GEP. Previously we ended up reusing GEP increments that were not in expanded form, thus not respecting LSRs choice of representation. The change in 2011-10-06-ReusePhi.ll recovers a regression that appeared when converting that test to opaque pointers. Changes in various Thumb tests now compute the step outside the loop instead of using add.w inside the loop, which is LSR's preferred representation for this target.	2023-07-12 11:32:13 +02:00
David Green	86780f49ef	[AArch64] Fix order of isReg and isDef checks in INSvi64 peephole. The isDef asserts that the operand isReg, so the checks need to happen in the other order.	2023-07-12 09:51:28 +01:00
Jay Foad	f7684d8510	[DAG] Use legal shift amount type in DAGTypeLegalizer::JoinIntegers Documentation for TargetLowering::getShiftAmountTy says that LegalTypes should generally be true during type legalization, so this patch does that. On AMDGPU the effect is that we use i32 (a sane type) instead of i64 (pointer sized type) for more shift amounts, which in turn allows more formation of rotates and funnel shifts pre-legalization. Differential Revision: https://reviews.llvm.org/D154960	2023-07-12 08:12:09 +01:00
Han Shen	65ef4d4357	[CodeGen] Part II of "Fine tune MachineFunctionSplitPass (MFS) for FSAFDO". This CL adds a new discriminator pass. Also adds a new sample profile loading pass when MFS is enabled. Differential Revision: https://reviews.llvm.org/D152577	2023-07-11 22:40:25 -07:00
Jon Chesterfield	e75ce77cd7	[amdgpu][lds] Fix missing markUsedByKernel calls and undef lookup table elements More robust association between the kernels and lds struct. Use poison instead of value() for lookup table elements introduced by dynamic lds lowering. Extracted from D154946, new test from there verbatim. Segv fixed. Fixes issues/63338 Fixes SWDEV-404491 Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D154972	2023-07-12 00:37:21 +01:00

52796 Commits