llvm-project

Author	SHA1	Message	Date
Simon Pilgrim	623d4b5787	[X86] Support optional NOT stages in the AND(SRL(X,Y),1) -> SETCC(BT(X,Y)) fold Extension to D122891, peek through NOT() ops, adjusting the condcode as we go.	2022-04-04 10:51:26 +01:00
Simon Pilgrim	fbfd78f7aa	[X86] lowerShuffleAsRepeatedMaskAndLanePermute - allow v16i32 sub-lane permutes for v64i8 shuffles Without VBMI, we are better off permuting v16i32 sub-lanes, even though it's a variable shuffle, if it allows us to then shuffle v64i8 inlane repeated masks (PSHUFB etc.) Fixes #54658	2022-04-03 10:05:10 +01:00
Simon Pilgrim	b8652fbcbb	[X86] Fold AND(SRL(X,Y),1) -> SETCC(BT(X,Y)) (RECOMMITTED) As noticed on PR39174, if we're extracting a single non-constant bit index, then try to use BT+SETCC instead to avoid messing around moving the shift amount to the ECX register, using slow x86 shift ops etc. Recommitted with a fix to ensure we zext/trunc the SETCC result to the original type. Differential Revision: https://reviews.llvm.org/D122891	2022-04-01 16:59:06 +01:00
Simon Pilgrim	5a457bd2fa	Revert rGa5f637bcbb7d1e08ce637f113fc117c3f4b2b110 "[X86] Fold AND(SRL(X,Y),1) -> SETCC(BT(X,Y))" Investigating a sanitizer-windows buildbot breakage	2022-04-01 16:48:24 +01:00
Simon Pilgrim	9afa6811ad	[X86] lowerShuffleAsRepeatedMaskAndLanePermute - allow 64-bit sublane shuffling on AVX512BW v64i8 shuffles We were only performing this on 256-bit vectors on AVX2 targets Noticed while triaging Issue #54658	2022-04-01 16:40:10 +01:00
Simon Pilgrim	a5f637bcbb	[X86] Fold AND(SRL(X,Y),1) -> SETCC(BT(X,Y)) As noticed on PR39174, if we're extracting a single non-constant bit index, then try to use BT+SETCC instead to avoid messing around moving the shift amount to the ECX register, using slow x86 shift ops etc. Differential Revision: https://reviews.llvm.org/D122891	2022-04-01 16:07:56 +01:00
Simon Pilgrim	3245cfb8d3	[X86] Add getBT helper node for attempting to create a X86ISD::BT node Avoids repeating all the extension/legalization wrappers in every use	2022-04-01 11:48:25 +01:00
Simon Pilgrim	919b657080	Revert rGff2d1bb2b749bd8a5697c25d2380b7c97a59ae06 "[X86] Add getBT helper node for attempting to create a X86ISD::BT node" Typo means that this doesn't return a value in all cases.	2022-04-01 11:21:00 +01:00
Simon Pilgrim	ff2d1bb2b7	[X86] Add getBT helper node for attempting to create a X86ISD::BT node Avoids repeating all the extension/legalization wrapper in every use	2022-04-01 11:12:23 +01:00
Simon Pilgrim	cb5c4a5917	[X86] lowerV8I16Shuffle - use explicit SmallVector<SDValue, 4> width to avoid MSVC AVX alignment bug As discussed on Issue #54645 - building llc with /AVX can result in incorrectly aligned structs	2022-04-01 10:54:24 +01:00
Simon Pilgrim	535211c3eb	[X86] Remove redundant FIXME lowerV64I8Shuffle has been extended a lot since this was added.	2022-03-31 18:05:52 +01:00
Simon Pilgrim	fac1729924	[X86] lowerV64I8Shuffle - don't use lowerShuffleWithPERMV until we've tried simpler options Shuffle combining will still lower to this with better fast cross lane checks. Noticed while triaging Issue #54658	2022-03-31 18:05:51 +01:00
Sanjay Patel	4a54e3eed3	[x86] try to replace 0.0 in fcmp with negated operand This inverts a fold recently added to IR with: 3491f2f4b033 We can put -bidirectional on the Alive2 examples to show that the reverse transforms work: https://alive2.llvm.org/ce/z/8iVQwB The motivation for the IR change was to improve matching to 'fabs' in IR (see https://github.com/llvm/llvm-project/issues/38828 ), but it regressed x86 codegen for 'not-quite-fabs' patterns like (X > -X) ? X : -X. Ie, when there is no fast-math (nsz), the cmp+select is not a proper fabs operation, but it does map nicely to the unusual NAN semantics of MINSS/MAXSS. I drafted this as a target-independent fold, but it doesn't appear to help any other targets and seems to cause regressions for SystemZ at least. Differential Revision: https://reviews.llvm.org/D122726	2022-03-31 09:17:49 -04:00
Simon Pilgrim	481b185620	[X86] combineCarryThroughADD - recognise X86ISD::ADD(AND(X,1),-1) pattern can be folded to X86ISD::BT As mentioned on D122482, if we've generated a masked overflow test see if we can fold it to X86ISD::BT to feed a X86ISD::ADC/SBB Differential Revision: https://reviews.llvm.org/D122572	2022-03-31 09:52:55 +01:00
Simon Pilgrim	6697e3354f	[X86] combineADC - fold ADC(C1,C2,Carry) -> ADC(0,C1+C2,Carry) If we're not relying on the flag result, we can fold the constants together into the RHS immediate operand and set the LHS operand to zero, simplifying for further folds. We could do something similar if the flag result is in use and the constant fold doesn't affect it, but I don't have any real test cases for this yet. As suggested by @davezarzycki on Issue #35256 Differential Revision: https://reviews.llvm.org/D122482	2022-03-30 09:11:55 +01:00
Simon Pilgrim	1ec109ec58	[X86] combineCarryThroughADD - remove unused peek through of SEXT/AEXT nodes.	2022-03-29 17:22:50 +01:00
Shao-Ce SUN	662b9fa02c	[NFC][CodeGen] Add a setTargetDAGCombine use ArrayRef Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D122557	2022-03-29 09:53:24 +08:00
Simon Pilgrim	8a1956dfa5	[X86] lowerV64I8Shuffle - attempt to match with lowerShuffleAsLanePermuteAndPermute Fixes #54562	2022-03-28 17:21:27 +01:00
Phoebe Wang	674d52e8ce	[X86] Refactor X86ScalarSSEf16/32/64 with hasFP16/SSE1/SSE2. NFCI This is used for f16 emulation. We emulate f16 for SSE2 targets and above. Refactoring makes future code cleaner. Reviewed By: LuoYuanke Differential Revision: https://reviews.llvm.org/D122475	2022-03-27 12:24:02 +08:00
Simon Pilgrim	43a969debd	[X86] combineADC - pull out repeated dyn_cast<ConstantSDNode> calls. NFC.	2022-03-25 12:53:08 +00:00
Simon Pilgrim	3db858c58c	[X86] combineAdd - fold ADD(ADC(Y,0,W),X) -> ADC(X,Y,W) This also exposed a missed ADC canonicalization of constant ops to the RHS	2022-03-25 10:52:10 +00:00
Simon Pilgrim	33b214b711	[X86] combineSub - fold SUB(X,ADC(Y,0,W)) -> SBB(X,Y,W)	2022-03-24 18:00:00 +00:00
Simon Pilgrim	438ac282db	[X86] combineAddOrSubToADCOrSBB - Fold ADD/SUB + (AND(SRL(X,Y),1) -> ADC/SBB+BT(X,Y) (REAPPLIED) As suggested on PR35908, if we are adding/subtracting an extracted bit, attempt to use BT instead to fold the op and use a ADC/SBB op. Reapply with extra type legality checks - LowerAndToBT was originally only used during lowering, now that it can occur earlier we might encounter illegal types that we can either promote to i32 or just bail. Differential Revision: https://reviews.llvm.org/D122084	2022-03-21 21:37:42 +00:00
Nikita Popov	1533682839	Revert "[X86] combineAddOrSubToADCOrSBB - Fold ADD/SUB + (AND(SRL(X,Y),1) -> ADC/SBB+BT(X,Y)" This reverts commit 81569f5b6ef531a48023f28133481262ee1509a3. This causes a segfault when building consumer-typeset in ReleaseLTO-g configuration: https://llvm-compile-time-tracker.com/show_error.php?commit=81569f5b6ef531a48023f28133481262ee1509a3	2022-03-21 21:52:36 +01:00
Simon Pilgrim	5fd9451668	[X86][AVX512] lower1BitShuffle - fold broadcast(setcc(x,y)) -> setcc(broadcast(x),broadcast(y)) (PR52500) AVX512 has excellent broadcast ops for everything but vXi1 bool vectors - so if we're broadcasting a comparison result, see if we can broadcast the comparison operands instead.	2022-03-21 17:42:49 +00:00
Simon Pilgrim	b6e2832fc2	[X86] Don't fold SUB(X,SBB(0,0,W)) -> SUB(ADC(0,0,W),Y) This will further fold to a AND(SETCC_CARRY(),1) pattern which tends to prevent further folds.	2022-03-21 15:54:48 +00:00
Simon Pilgrim	315896d3ac	[X86] Fold SUB(X,SBB(Y,Z,W)) -> SUB(ADC(X,Z,W),Y) Prefer the commutable ADC over SBB to improve load folding opportunities	2022-03-21 14:20:46 +00:00
Simon Pilgrim	ed51e26ab4	[X86] combineAddOrSubToADCOrSBB - commute + neg subtraction patterns Handle SUB(AND(SRL(Y,Z),1),X) -> NEG(SBB(X,0,BT(Y,Z))) folds I'll address the X86 lost folded-load regressions in a follow-up patch	2022-03-21 13:55:35 +00:00
Simon Pilgrim	5e9365c5eb	[X86] combineAddOrSubToADCOrSBB - bail for illegal types Ensure we don't attempt to fold to illegal types to ADC/SBB nodes. After D122084 it's possible for ADD(X,AND(SRL(Y,Z),1)) patterns to be matched before type legalization.	2022-03-21 13:31:21 +00:00
Simon Pilgrim	81569f5b6e	[X86] combineAddOrSubToADCOrSBB - Fold ADD/SUB + (AND(SRL(X,Y),1) -> ADC/SBB+BT(X,Y) As suggested on PR35908, if we are adding/subtracting an extracted bit, attempt to use BT instead to fold the op and use a ADC/SBB op. Differential Revision: https://reviews.llvm.org/D122084	2022-03-21 10:57:12 +00:00
Simon Pilgrim	1ae3c4e948	[X86] combineAddOrSubToADCOrSBB - split to more cleanly handle commuted variants. Split combineAddOrSubToADCOrSBB into wrapper (which handles ADDs with commuted args) and the real combine, which no longer has to account for commutation. I'm intending to extend combineAddOrSubToADCOrSBB to detect patterns other than just X86ISD::SETCC, so we need to detect all patterns without detecting them as part of a commutation swap.	2022-03-20 09:14:21 +00:00
Shengchen Kan	076a9dc99a	[X86][NFC] Rename hasCMOV() to canUseCMOV(), hasLAHFSAHF() to canUseLAHFSAHF() To make them less like other feature functions. This is a follow-up patch for D121978.	2022-03-20 12:00:25 +08:00
Craig Topper	57b41af838	[X86] Rename FeatureCMPXCHG8B/FeatureCMPXCHG16B to FeatureCX8/CX16 to match CPUID. Rename hasCMPXCHG16B() to canUseCMPXCHG16B() to make it less like other feature functions. Add a similar canUseCMPXCHG8B() that aliases hasCX8() to keep similar naming. Differential Revision: https://reviews.llvm.org/D121978	2022-03-19 12:34:06 -07:00
Simon Pilgrim	34110a7320	[X86] combineAddOrSubToADCOrSBB - pull out repeated Y.getOperand(1) calls. NFC.	2022-03-19 17:56:11 +00:00
Simon Pilgrim	b90478d422	[X86] createShuffleMaskFromVSELECT - handle BLENDV constant masks as well as VSELECT constant masks Handle constant masks for both vselect nodes (mask != 0) and blendv nodes (mask < 0)	2022-03-19 16:51:07 +00:00
Simon Pilgrim	a6c18bfbe3	[X86] combineSelect - don't constant fold BLENDV nodes like VSELECT If a X86ISD::BLENDV op appears before legalization (in this test case due to the icmp_slt x, 0) its constant mask was being treated as a vselect mask (mask != 0) instead of blendv (mask < 0) This just prevents constant folding entirely for non-VSELECT ops.	2022-03-19 16:31:19 +00:00
Simon Pilgrim	56ad791f46	[X86] LowerAndToBT - fold BT(NOT(X),Y) -> BT(X,Y) and flip the CondCode	2022-03-19 14:03:03 +00:00
Simon Pilgrim	c7ba5a9aff	[X86][SSE] Add initial support for extracting non-constant bool vector elements We can use MOVMSK+TEST/BT to extract individual bool elements even if the index isn't constant This relies on combineBitcastvxi1 so some AVX512 cases still aren't optimized as they avoid MOVMSK usage.	2022-03-19 13:31:05 +00:00
Shengchen Kan	920c2e5763	[X86][NFC] Rename target feature hasCMov->hasCMOV This is a follow-up patch for D121975.	2022-03-18 14:05:52 +08:00
Craig Topper	6cfe41dcc8	[X86] Rename more target feature related things consistency. NFC -Rename ModeBit to IsBit to match X86Subtarget. -Rename FeatureLAHFSAHF to FeatureLAFHSAFH64 to match X86Subtarget. -Use consistent capitalization Reviewed By: skan Differential Revision: https://reviews.llvm.org/D121975	2022-03-17 22:27:17 -07:00
Simon Pilgrim	e3deb7d88b	[X86] computeKnownBitsForTargetNode - add X86ISD::AND KnownBits handling Fixes #54171	2022-03-16 11:05:36 +00:00
Shengchen Kan	052d37dc7c	[NFC][X86] Rename some variables and functions about target features This is preparation for D121768. The member's name should align w/ the interface for trivial target feature.	2022-03-16 13:08:52 +08:00
Simon Pilgrim	f591231cad	[X86] combineSelect - canonicalize (vXi1 bitcast(iX Cond)) with combineToExtendBoolVectorInReg before legalization This replaces the attempt in 20af71f8ec47319d375a871db6fd3889c2487cbd to use combineToExtendBoolVectorInReg to create X86ISD::BLENDV masks directly, instead we use it to canonicalize the iX bitcast to a sign-extended mask and then truncate it back to vXi1 prior to legalization breaking it apart. Fixes #53760	2022-03-15 12:16:11 +00:00
Simon Pilgrim	ad3a7654dc	[X86] combineCMP - peek through zero-extensions for X86cmp(zext(x0),0) zero tests (PR38960) If we're comparing a value against zero, strip away any zero-extension and perform the comparison on the pre-extended value Fixes #38308 Differential Revision: https://reviews.llvm.org/D121472	2022-03-13 11:38:40 +00:00
Simon Pilgrim	e4ab2024a6	[X86] convertIntLogicToFPLogic - enable fp-logic on pre-AVX targets for supported fp predicates (PR34563) If the SETCC fp-condcode is supported on SSE as a single CMPPS/PD op then we can use convertIntLogicToFPLogic to reduce EFLAGS and XMM->GPR traffic like we do for AVX targets. Differential Revision: https://reviews.llvm.org/D121210	2022-03-08 18:06:27 +00:00
Simon Pilgrim	9119eefe5f	[X86] Add cheapX86FSETCC_SSE helper. NFC. Identify FP CondCode that can be performed by a non-AVX SSE CMP op Pulled out of D121210	2022-03-08 18:06:27 +00:00
Simon Pilgrim	d0aa77440c	[X86] convertIntLogicToFPLogic - pull out condcodes. NFCI.	2022-03-08 13:31:17 +00:00
Simon Pilgrim	588d97e246	[X86] getTargetVShiftNode - peek through any zext node If the shift amount has been zero-extended, peek through as this might help us further canonicalize the shift amount. Fixes regression mentioned in rG147cfcbef1255ba2b4875b76708dab1a685085f5	2022-03-04 17:41:45 +00:00
Simon Pilgrim	147cfcbef1	[X86] LowerShiftByScalarVariable - find splat patterns with getSplatSourceVector instead of getSplatValue This completes the removal of uses of SelectionDAG::getSplatValue started in D119090 - by avoiding extracting the splatted element we make it a lot easier to zero-extend the bottom 64-bits of the shift amount and fixes issues we had on 32-bit targets where i64 isn't legal. I've removed the old version of getTargetVShiftNode that took the scalar shift amount argument and LowerRotate can finally efficiently handle vXi16 rotates-by-scalar (using the same code as general funnel-shifts). The only regression we see is in the X86-AVX2 PR52719 test case in vector-shift-ashr-256.ll - this is now hitting the same problem as the X86-AVX1 case (failure to simplify a multi-use X86ISD::VBROADCAST_LOAD) which I intend to address in a follow up patch.	2022-03-04 16:47:35 +00:00
Simon Pilgrim	940d7cd59f	[X86] SimplifyDemandedVectorElts - adjust X86ISD::ANDNP demanded elts based off constant masks Similar to what we already do in combineAndnp, if either operand is a constant then we can improve the demanded elts/bits.	2022-03-04 13:40:56 +00:00

8026 Commits