17838 Commits

Author SHA1 Message Date
Simon Pilgrim
ad59bd0be9 [X86] Regenerate peep tests checks 2022-04-04 12:02:33 +01:00
Simon Pilgrim
623d4b5787 [X86] Support optional NOT stages in the AND(SRL(X,Y),1) -> SETCC(BT(X,Y)) fold
Extension to D122891, peek through NOT() ops, adjusting the condcode as we go.
2022-04-04 10:51:26 +01:00
Simon Pilgrim
842175676c [X86] Add additional test cases for NOT(AND(SRL(X,Y),1))/AND(SRL(NOT(X(,Y),1) -> SETCC(BT(X,Y))
As suggested in post review on D122891
2022-04-04 10:29:33 +01:00
Dávid Bolvanský
fb65aaf0be [NFCI] Fixed missing colon in CHECK directives - part 2 2022-04-03 14:42:59 +02:00
Dávid Bolvanský
f02a0a69af [NFCI] Fixed missing colon in CHECK directives 2022-04-03 11:52:38 +02:00
Simon Pilgrim
fbfd78f7aa [X86] lowerShuffleAsRepeatedMaskAndLanePermute - allow v16i32 sub-lane permutes for v64i8 shuffles
Without VBMI, we are better off permuting v16i32 sub-lanes, even though its a variable shuffle, if it allows us to then shuffle v64i8 inlane repeated masks (PSHUFB etc.)

Fixes #54658
2022-04-03 10:05:10 +01:00
Simon Pilgrim
c64f37f818 [X86] matchAddressRecursively - add XOR(X, MIN_SIGNED_VALUE) handling
Allows us to fold XOR(X, MIN_SIGNED_VALUE) == ADD(X, MIN_SIGNED_VALUE) into LEA patterns

As mentioned on PR52267.

Differential Revision: https://reviews.llvm.org/D122815
2022-04-01 17:26:29 +01:00
Simon Pilgrim
b8652fbcbb [X86] Fold AND(SRL(X,Y),1) -> SETCC(BT(X,Y)) (RECOMMITTED)
As noticed on PR39174, if we're extracting a single non-constant bit index, then try to use BT+SETCC instead to avoid messing around moving the shift amount to the ECX register, using slow x86 shift ops etc.

Recommitted with a fix to ensure we zext/trunc the SETCC result to the original type.

Differential Revision: https://reviews.llvm.org/D122891
2022-04-01 16:59:06 +01:00
Simon Pilgrim
5a457bd2fa Revert rGa5f637bcbb7d1e08ce637f113fc117c3f4b2b110 "[X86] Fold AND(SRL(X,Y),1) -> SETCC(BT(X,Y))"
Investigating a sanitizer-windows buildbot breakage
2022-04-01 16:48:24 +01:00
Simon Pilgrim
9afa6811ad [X86] lowerShuffleAsRepeatedMaskAndLanePermute - allow 64-bit sublane shuffling on AVX512BW v64i8 shuffles
We were only performing this on 256-bit vectors on AVX2 targets

Noticed while triaging Issue #54658
2022-04-01 16:40:10 +01:00
Simon Pilgrim
b465752f92 [X86] Add PR54658 test case 2022-04-01 16:21:54 +01:00
Simon Pilgrim
a5f637bcbb [X86] Fold AND(SRL(X,Y),1) -> SETCC(BT(X,Y))
As noticed on PR39174, if we're extracting a single non-constant bit index, then try to use BT+SETCC instead to avoid messing around moving the shift amount to the ECX register, using slow x86 shift ops etc.

Differential Revision: https://reviews.llvm.org/D122891
2022-04-01 16:07:56 +01:00
Sanjay Patel
1074bdfb52 [x86] add tests for funnel+or == 0; NFC
This is another family of patterns based on issue #49541
2022-04-01 09:28:45 -04:00
Xiang1 Zhang
a56f264958 Refine tls-load-hoista llvm option
Reviewed By: LuoYuanke

Differential Revision: https://reviews.llvm.org/D122890
2022-04-01 19:03:58 +08:00
Fangrui Song
ac6878b330 [X86] Set frame-setup/frame-destroy on prologue/epilogue CFI instructions
This approach is used by AArch64/RISCV to make frame-setup/frame-destroy
instructions contiguous instead of being interleaved by CFI instructions. Code
checking `MBBI->getFlag(MachineInstr::FrameSetup) || MBBI->isCFIInstruction()`
can be simplified to just check FrameSetup.

This helps locate all CFI instructions in the prologue, which can be handy to use
.cfi_remember_state/.cfi_restore_state to decrease unwind table size (D114545).

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D122541
2022-03-31 23:04:50 -07:00
Matt Arsenault
f635be3014 X86/GlobalISel: Use LLT form of getMachineMemOperand 2022-03-31 18:49:23 -04:00
Matt Arsenault
4d72acf991 X86/GlobalISel: Regenerate test checks 2022-03-31 18:49:23 -04:00
Matt Arsenault
ae8d35b8ee X86: Use -NEXT checks in a test 2022-03-31 16:30:01 -04:00
Simon Pilgrim
596af141b2 [X86] setcc.ll - add PR39174 test case and i686 coverage 2022-03-31 21:29:12 +01:00
Nikita Popov
0721d7c4d8 [X86] Add test for PR54369 (NFC) 2022-03-31 16:45:05 +02:00
Sanjay Patel
4a54e3eed3 [x86] try to replace 0.0 in fcmp with negated operand
This inverts a fold recently added to IR with:
3491f2f4b033

We can put -bidirectional on the Alive2 examples to show that
the reverse transforms work:
https://alive2.llvm.org/ce/z/8iVQwB

The motivation for the IR change was to improve matching to
'fabs' in IR (see https://github.com/llvm/llvm-project/issues/38828 ),
but it regressed x86 codegen for 'not-quite-fabs' patterns like
(X > -X) ? X : -X.
Ie, when there is no fast-math (nsz), the cmp+select is not a proper
fabs operation, but it does map nicely to the unusual NAN semantics
of MINSS/MAXSS.

I drafted this as a target-independent fold, but it doesn't appear to
help any other targets and seems to cause regressions for SystemZ at
least.

Differential Revision: https://reviews.llvm.org/D122726
2022-03-31 09:17:49 -04:00
Luo, Yuanke
6753eb0c90 [X86][AMX] Materialize undef or zero value to tilezero
The AMX combiner would store undef or zero to stack and invoke tileload
to load the data to tile register. To avoid the store/load, we can
materialzie undef or zero value to tilezero.

Differential Revision: https://reviews.llvm.org/D122714
2022-03-31 19:10:28 +08:00
Simon Pilgrim
a1d09f3a98 [X86] Extend xor-lea test coverage
Add ADD/SUB(XOR(X,MIN_SIGNED_VALUE),Y) tests
2022-03-31 10:54:27 +01:00
Simon Pilgrim
481b185620 [X86] combineCarryThroughADD - recognise X86ISD::ADD(AND(X,1),-1) pattern can be folded to X86ISD::BT
As mentioned on D122482, if we've generated a masked overflow test see if we can fold it to X86ISD::BT to feed a X86ISD::ADC/SBB

Differential Revision: https://reviews.llvm.org/D122572
2022-03-31 09:52:55 +01:00
Wei Xiao
3728eebd7b [X86] Add test with abs intrinsic for x86-partial-reduction optimization 2022-03-31 09:58:19 +08:00
Fangrui Song
e78cea0a91 [X86][test] Precommit D122541 tests for prologue/epilogue CFI
Currently there is no CFI_INSTRUCTION MIR test with .ll input. This patch
adds some -stop-after=prologepilog tests.
2022-03-30 09:02:23 -07:00
Sanjay Patel
e18cc5277f [SDAG] try to canonicalize logical shift after bswap
When shifting by a byte-multiple:
bswap (shl X, C) --> lshr (bswap X), C
bswap (lshr X, C) --> shl (bswap X), C

This is the backend version of D122010 and an alternative
suggested in D120648.
There's an extra check to make sure the shift amount is
valid that was not in the rough draft.

I'm not sure if there is a larger motivating case for RISCV (bug report?),
but the ARM diffs show a benefit from having a late version of the
transform (because we do not combine the loads in IR).

Differential Revision: https://reviews.llvm.org/D122655
2022-03-30 09:29:32 -04:00
Sanjay Patel
849d577e56 [x86] add tests for fcmp with 0.0 operand; NFC 2022-03-30 08:37:15 -04:00
Simon Pilgrim
14a89d00c7 [X86] Extend xor-lea test coverage
Add XOR(ADD/SUB(X,Y),MIN_SIGNED_VALUE) tests and adjust some XOR(SHL(X,C),MIN_SIGNED_VALUE) shifts to better match LEA scales
2022-03-30 13:34:32 +01:00
Simon Pilgrim
e000dbc39f [X86] Add test coverage based off Issue #51609 2022-03-30 12:57:22 +01:00
Luo, Yuanke
7471d8b13c [X86][AMX] Pre-checkin the test case for AMX undef and zero 2022-03-30 17:53:01 +08:00
Luo, Yuanke
1141c8b6fc [X86][AMX] Fix bug for amx cast tranform
After combining amx cast operation, some amx cast intrinsic may be dead
code. This patch is to delete such dead code and avoid crash.
2022-03-30 17:22:30 +08:00
Simon Pilgrim
bce954321a [X86] Add PR47857 test case 2022-03-30 09:51:36 +01:00
Simon Pilgrim
6697e3354f [X86] combineADC - fold ADC(C1,C2,Carry) -> ADC(0,C1+C2,Carry)
If we're not relying on the flag result, we can fold the constants together into the RHS immediate operand and set the LHS operand to zero, simplifying for further folds.

We could do something similar if the flag result is in use and the constant fold doesn't affect it, but I don't have any real test cases for this yet.

As suggested by @davezarzycki on Issue #35256

Differential Revision: https://reviews.llvm.org/D122482
2022-03-30 09:11:55 +01:00
Nikita Popov
8a72391f60 [IR] Require intrinsic struct return type to be anonymous
This is an alternative to D122376. Rather than working around the
problem, this patch requires that struct return types in intrinsics
are anonymous/literal and adds auto-upgrade code to convert
existing uses of intrinsics with named struct types.

This ensures that the mapping between intrinsic name and
intrinsic function type is actually bijective, as it is supposed
to be.

This also fixes https://github.com/llvm/llvm-project/issues/37891.

Differential Revision: https://reviews.llvm.org/D122471
2022-03-30 09:51:24 +02:00
Chenbing Zheng
780eb9f586 [DAGCombine] add tests for bitreverse-shift optimization
This patch add some tests to show some optimization opportunities
for bitreverse-shift.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D121507
2022-03-30 09:50:28 +08:00
Sanjay Patel
7f5c2f6a76 [x86] consolidate tests and auto-gen complete check lines; NFC
The same test was duplicated in 2 files.
2022-03-29 14:55:04 -04:00
Simon Pilgrim
d32d65b903 [X86] Regenerate x86-interleaved-access.ll with AVX1OR2 common check-prefix to reduce duplication 2022-03-29 12:24:46 +01:00
Chenbing Zheng
6a01b676cf [DAGCombine] add tests for bswap-shift optimization
Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D121504
2022-03-29 16:34:52 +08:00
Simon Pilgrim
8a1956dfa5 [X86] lowerV64I8Shuffle - attempt to match with lowerShuffleAsLanePermuteAndPermute
Fixes #54562
2022-03-28 17:21:27 +01:00
Simon Pilgrim
614363ecc0 [X86] Add shuffle tests from Issue #54562 2022-03-28 13:54:17 +01:00
Simon Pilgrim
f84b5c11dd [X86] Add test showing failure to fold multiple constant args in ADC
As noticed on Issue #35256
2022-03-25 13:42:08 +00:00
Simon Pilgrim
e209190c2d [SDAG] enable binop identity constant folds for multiplies
Add mul to the list of ops that we canonicalize with a select to expose an identity merge

Differential Revision: https://reviews.llvm.org/D122071
2022-03-25 11:07:04 +00:00
Simon Pilgrim
3db858c58c [X86] combineAdd - fold ADD(ADC(Y,0,W),X) -> ADC(X,Y,W)
This also exposed a missed ADC canonicalization of constant ops to the RHS
2022-03-25 10:52:10 +00:00
Simon Pilgrim
19de2a5768 [X86] Add test showing failure to fold add(adc(x,0,carry),c) -> adc(x,c,carry) 2022-03-25 10:14:11 +00:00
Simon Pilgrim
33b214b711 [X86] combineSub - fold SUB(X,ADC(Y,0,W)) -> SBB(X,Y,W) 2022-03-24 18:00:00 +00:00
Simon Pilgrim
dc58c3ba93 [X86] Add additional 'add/sub single bit' patterns
InstCombine can convert some cases to 'splat bit' sra(shl(x,y),bw-1) patterns which we need to handle as well
2022-03-24 17:20:08 +00:00
Xiang1 Zhang
9566405020 [Inline asm] Fix mangle problem when variable used in inline asm.
(Correct 'Mem symbol + IntelExpr' output in PIC model)

Reviewed By: skan

Differential Revision: https://reviews.llvm.org/D121785
2022-03-24 09:41:23 +08:00
Xiang1 Zhang
8a6b644c79 [Inline asm] Fix mangle problem when variable used in inline asm.
(Connect InlineAsm Memory Operand with its real value not just name)
Revert 2 history bugfix patch:

Revert "[X86][MS-InlineAsm] Make the constraint *m to be simple place holder"
This patch revert https://reviews.llvm.org/D115225 which mainly
fix problems intrduced by https://reviews.llvm.org/D113096

This reverts commit d7c07f60b35f901f5bd9153b11807124a9bdde60.

Revert "Reland "[X86][MS-InlineAsm] Use exact conditions to recognize MS global variables""
This patch revert https://reviews.llvm.org/D116090 which fix problem
intrduced by https://reviews.llvm.org/D115225

This reverts commit 24c68ea1eb4fc0d0e782424ddb02da9e8c53ddf5.

Reviewed By: skan

Differential Revision: https://reviews.llvm.org/D120886
2022-03-24 09:41:22 +08:00
Julian Lettner
64902d335c Reland "Lower @llvm.global_dtors using __cxa_atexit on MachO"
For MachO, lower `@llvm.global_dtors` into `@llvm_global_ctors` with
`__cxa_atexit` calls to avoid emitting the deprecated `__mod_term_func`.

Reuse the existing `WebAssemblyLowerGlobalDtors.cpp` to accomplish this.

Enable fallback to the old behavior via Clang driver flag
(`-fregister-global-dtors-with-atexit`) or llc / code generation flag
(`-lower-global-dtors-via-cxa-atexit`).  This escape hatch will be
removed in the future.

Differential Revision: https://reviews.llvm.org/D121736
2022-03-23 18:36:55 -07:00