42810 Commits

Author SHA1 Message Date
Simon Pilgrim
9db1eb13b6 [Thumb2] Regenerate thumb2-teq2 tests 2022-04-04 12:48:20 +01:00
Simon Pilgrim
ec93435ba0 [Thumb2] Regenerate thumb2-teq tests 2022-04-04 12:24:35 +01:00
Simon Pilgrim
d4cdaa24fd [MIPS] Regenerate countleading tests with common check prefixes 2022-04-04 12:19:57 +01:00
Simon Pilgrim
ad59bd0be9 [X86] Regenerate peep tests checks 2022-04-04 12:02:33 +01:00
Simon Pilgrim
623d4b5787 [X86] Support optional NOT stages in the AND(SRL(X,Y),1) -> SETCC(BT(X,Y)) fold
Extension to D122891, peek through NOT() ops, adjusting the condcode as we go.
2022-04-04 10:51:26 +01:00
Simon Pilgrim
842175676c [X86] Add additional test cases for NOT(AND(SRL(X,Y),1))/AND(SRL(NOT(X(,Y),1) -> SETCC(BT(X,Y))
As suggested in post review on D122891
2022-04-04 10:29:33 +01:00
Min-Yih Hsu
22201f499d [M68k][test] Remove redundant CHECK-LABEL directive
The associated test had a redundant CHECK-LABEL directive that might fail
the test since the inception, but this issue was "burried" by a missing
colon, which was addressed in fb65aaf0be09936e657d339f3dc8e62666a41956.
Thus, the test finally failed after the said commit.

This patch remove that CHECK-LABEL directive.
2022-04-03 22:51:03 -07:00
Dávid Bolvanský
fb65aaf0be [NFCI] Fixed missing colon in CHECK directives - part 2 2022-04-03 14:42:59 +02:00
Dávid Bolvanský
f02a0a69af [NFCI] Fixed missing colon in CHECK directives 2022-04-03 11:52:38 +02:00
Simon Pilgrim
fbfd78f7aa [X86] lowerShuffleAsRepeatedMaskAndLanePermute - allow v16i32 sub-lane permutes for v64i8 shuffles
Without VBMI, we are better off permuting v16i32 sub-lanes, even though its a variable shuffle, if it allows us to then shuffle v64i8 inlane repeated masks (PSHUFB etc.)

Fixes #54658
2022-04-03 10:05:10 +01:00
wanglei
cd85ea9431 [LoongArch] Fix instruction definition
This patch fixes issue with the LU32I_D instruction, which did not have
an input register operand.

Differential Revision: https://reviews.llvm.org/D122970
2022-04-02 18:08:29 +08:00
Craig Topper
d970e96c53 [RISCV] Add lowering for vp.fptoui and vp.uitofp.
This is a straightforward extension of D122512 to unsigned integers.
2022-04-01 18:28:46 -07:00
Craig Topper
fa630e7594 [RISCV][AMDGPU][TargetLowering] Special case overflow expansion for (uaddo X, 1).
If we expand (uaddo X, 1) we previously expanded the overflow calculation
as (X + 1) <u X. This potentially increases the live range of X and
can prevent X+1 from reusing the register that previously held X.

Since we're adding 1, overflow only occurs if X was UINT_MAX in which
case (X+1) would be 0. So this patch adds a special case to expand
the overflow calculation to (X+1) == 0.

This seems to help with uaddo intrinsics that get introduced by
CodeGenPrepare after LSR. Alternatively, we could block the uaddo
transform in CodeGenPrepare for this case.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D122933
2022-04-01 13:14:10 -07:00
Craig Topper
31b8a1dc46 [RISCV] Add tests for uaddo with a constant 1. NFC
The overflow calculation can be optimized to check if the add
result is 0.
2022-04-01 12:29:08 -07:00
Sanjay Patel
ec0b332cd8 [AArch64] add tests for funnel+or == 0; NFC
These are copied from x86 ( 1074bdfb52b2e1753e51472 ) to
provide more coverage for a potential generic combine.
2022-04-01 13:39:25 -04:00
Simon Pilgrim
c64f37f818 [X86] matchAddressRecursively - add XOR(X, MIN_SIGNED_VALUE) handling
Allows us to fold XOR(X, MIN_SIGNED_VALUE) == ADD(X, MIN_SIGNED_VALUE) into LEA patterns

As mentioned on PR52267.

Differential Revision: https://reviews.llvm.org/D122815
2022-04-01 17:26:29 +01:00
Simon Pilgrim
b8652fbcbb [X86] Fold AND(SRL(X,Y),1) -> SETCC(BT(X,Y)) (RECOMMITTED)
As noticed on PR39174, if we're extracting a single non-constant bit index, then try to use BT+SETCC instead to avoid messing around moving the shift amount to the ECX register, using slow x86 shift ops etc.

Recommitted with a fix to ensure we zext/trunc the SETCC result to the original type.

Differential Revision: https://reviews.llvm.org/D122891
2022-04-01 16:59:06 +01:00
Simon Pilgrim
5a457bd2fa Revert rGa5f637bcbb7d1e08ce637f113fc117c3f4b2b110 "[X86] Fold AND(SRL(X,Y),1) -> SETCC(BT(X,Y))"
Investigating a sanitizer-windows buildbot breakage
2022-04-01 16:48:24 +01:00
Simon Pilgrim
9afa6811ad [X86] lowerShuffleAsRepeatedMaskAndLanePermute - allow 64-bit sublane shuffling on AVX512BW v64i8 shuffles
We were only performing this on 256-bit vectors on AVX2 targets

Noticed while triaging Issue #54658
2022-04-01 16:40:10 +01:00
Simon Pilgrim
b465752f92 [X86] Add PR54658 test case 2022-04-01 16:21:54 +01:00
Simon Pilgrim
a5f637bcbb [X86] Fold AND(SRL(X,Y),1) -> SETCC(BT(X,Y))
As noticed on PR39174, if we're extracting a single non-constant bit index, then try to use BT+SETCC instead to avoid messing around moving the shift amount to the ECX register, using slow x86 shift ops etc.

Differential Revision: https://reviews.llvm.org/D122891
2022-04-01 16:07:56 +01:00
Sanjay Patel
1074bdfb52 [x86] add tests for funnel+or == 0; NFC
This is another family of patterns based on issue #49541
2022-04-01 09:28:45 -04:00
Jay Foad
c246b7bd4a [AMDGPU] Only count global-to-global as indirect accesses
Previously any load (global, local or constant) feeding into a
global load or store would be counted as an indirect access. This
patch only counts global loads feeding into a global load or store.
The rationale is that the latency for global loads is generally
much larger than the other kinds.

As a side effect this makes it easier to write small kernels test
cases that are not counted as having indirect accesses, despite
the fact that arguments to the kernel are accessed with an SMEM
load.

Differential Revision: https://reviews.llvm.org/D122804
2022-04-01 13:48:13 +01:00
Xiang1 Zhang
a56f264958 Refine tls-load-hoista llvm option
Reviewed By: LuoYuanke

Differential Revision: https://reviews.llvm.org/D122890
2022-04-01 19:03:58 +08:00
Fangrui Song
ac6878b330 [X86] Set frame-setup/frame-destroy on prologue/epilogue CFI instructions
This approach is used by AArch64/RISCV to make frame-setup/frame-destroy
instructions contiguous instead of being interleaved by CFI instructions. Code
checking `MBBI->getFlag(MachineInstr::FrameSetup) || MBBI->isCFIInstruction()`
can be simplified to just check FrameSetup.

This helps locate all CFI instructions in the prologue, which can be handy to use
.cfi_remember_state/.cfi_restore_state to decrease unwind table size (D114545).

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D122541
2022-03-31 23:04:50 -07:00
Lian Wang
62dd3674bc [RISCV] Supplement SDNode patterns for vfwmul/vfwadd/vfwsub
Reviewed By: jacquesguan

Differential Revision: https://reviews.llvm.org/D122720
2022-04-01 03:09:50 +00:00
Matt Arsenault
f635be3014 X86/GlobalISel: Use LLT form of getMachineMemOperand 2022-03-31 18:49:23 -04:00
Matt Arsenault
4d72acf991 X86/GlobalISel: Regenerate test checks 2022-03-31 18:49:23 -04:00
Matt Arsenault
0fb6856aff ARM/GlobalISel: Get pointer type from value instead of getPointerSize
Avoid using getPointerSize and pass through the original value type.
2022-03-31 16:46:23 -04:00
Matt Arsenault
cc2e2b80a1 AMDGPU: Update test checks to include -NEXT 2022-03-31 16:30:01 -04:00
Matt Arsenault
ae8d35b8ee X86: Use -NEXT checks in a test 2022-03-31 16:30:01 -04:00
Simon Pilgrim
596af141b2 [X86] setcc.ll - add PR39174 test case and i686 coverage 2022-03-31 21:29:12 +01:00
Stefan Pintilie
585c85abe5 [PowerPC] Fix lowering of byval parameters for sizes greater than 8 bytes.
To store a byval parameter the existing code would store as many 8 byte elements
as was required to store the full size of the byval parameter.
For example, a paramter of size 16 would store two element of 8 bytes.
A paramter of size 12 would also store two elements of 8 bytes.
This would sometimes store too many bytes as the size of the paramter is not
always a factor of 8.

This patch fixes that issue and now byval paramters are stored with the correct
number of bytes.

Reviewed By: nemanjai, #powerpc, quinnp, amyk

Differential Revision: https://reviews.llvm.org/D121430
2022-03-31 15:12:46 -05:00
Stefan Pintilie
2e55bc9f3c [PowerPC] Set the special DSCR with a compiler option.
Add a compiler option and the instructions required to set the
special Data Stream Control Register (DSCR). The special register will
not be set by default.

Original patch by: Muhammad Usman

Reviewed By: nemanjai, #powerpc

Differential Revision: https://reviews.llvm.org/D117013
2022-03-31 14:06:30 -05:00
Thomas Symalla
1a6aa8b195 [AMDGPU] Add missing use check in SIOptimizeExecMasking pass.
Whenever a v_cmp, s_and_saveexec instruction sequence shall be
transformed to an equivalent s_mov, v_cmpx sequence, it needs
to be detected if the v_cmp target register is used between
the two instructions as the v_cmp result gets omitted by
using the v_cmpx instruction, resulting in invalid code.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D122797
2022-03-31 19:25:35 +02:00
Abinav Puthan Purayil
898d5776ec [AMDGPU][GlobalISel] Scalarize add/sub with overflow ops in the legalizer
Differential Revision: https://reviews.llvm.org/D122803
2022-03-31 21:46:34 +05:30
Abinav Puthan Purayil
db17ebd593 [AMDGPU][GlobalISel] Add end to end IR tests for add/sub with overflow
Differential Revision: https://reviews.llvm.org/D122818
2022-03-31 21:46:34 +05:30
Jay Foad
e8e32e5714 [AMDGPU] Fix typo in RUN line 2022-03-31 16:23:40 +01:00
Changpeng Fang
1711020c37 AMDGPU: Use isLiteralConstantLike to check whether the operand could ever be literal
Summary:
  To compute the size of a VALU/SALU instruction, we need to check whether an operand
could ever be literal. Previously isLiteralConstant was used, which missed cases
like global variables or external symbols. These misses lead to under-estimation of
the instruction size and branch offset, and thus incorrectly skip the necessary branch
relaxation when the branch offset is actually greater than what the branch bits can hold.
In this work, we use isLiteralConstantLike to check the operands. It maybe conservative,
but it is safe.

Reviewers: arsenm

Differential Revision: https://reviews.llvm.org/D122778
2022-03-31 08:06:31 -07:00
Nikita Popov
0721d7c4d8 [X86] Add test for PR54369 (NFC) 2022-03-31 16:45:05 +02:00
Sanjay Patel
4a54e3eed3 [x86] try to replace 0.0 in fcmp with negated operand
This inverts a fold recently added to IR with:
3491f2f4b033

We can put -bidirectional on the Alive2 examples to show that
the reverse transforms work:
https://alive2.llvm.org/ce/z/8iVQwB

The motivation for the IR change was to improve matching to
'fabs' in IR (see https://github.com/llvm/llvm-project/issues/38828 ),
but it regressed x86 codegen for 'not-quite-fabs' patterns like
(X > -X) ? X : -X.
Ie, when there is no fast-math (nsz), the cmp+select is not a proper
fabs operation, but it does map nicely to the unusual NAN semantics
of MINSS/MAXSS.

I drafted this as a target-independent fold, but it doesn't appear to
help any other targets and seems to cause regressions for SystemZ at
least.

Differential Revision: https://reviews.llvm.org/D122726
2022-03-31 09:17:49 -04:00
Jay Foad
fdaf606c8e [AMDGPU] Fix last remaining checks in perfhint.ll
Unfortunately this just shows that the test case for D47740 never
really tested what it was supposed to test.

Differential Revision: https://reviews.llvm.org/D122664
2022-03-31 13:39:15 +01:00
Abinav Puthan Purayil
2f284b0ff9 [AMDGPU] Regenerate checks in some mir tests 2022-03-31 17:49:00 +05:30
Luo, Yuanke
6753eb0c90 [X86][AMX] Materialize undef or zero value to tilezero
The AMX combiner would store undef or zero to stack and invoke tileload
to load the data to tile register. To avoid the store/load, we can
materialzie undef or zero value to tilezero.

Differential Revision: https://reviews.llvm.org/D122714
2022-03-31 19:10:28 +08:00
Nicholas Guy
7d676714fb [AArch64] Set MaxBytesForLoopAlignment for more targets
Differential Revision: https://reviews.llvm.org/D122566
2022-03-31 11:37:11 +01:00
Simon Pilgrim
a1d09f3a98 [X86] Extend xor-lea test coverage
Add ADD/SUB(XOR(X,MIN_SIGNED_VALUE),Y) tests
2022-03-31 10:54:27 +01:00
Simon Pilgrim
481b185620 [X86] combineCarryThroughADD - recognise X86ISD::ADD(AND(X,1),-1) pattern can be folded to X86ISD::BT
As mentioned on D122482, if we've generated a masked overflow test see if we can fold it to X86ISD::BT to feed a X86ISD::ADC/SBB

Differential Revision: https://reviews.llvm.org/D122572
2022-03-31 09:52:55 +01:00
Lian Wang
b3851e9931 [RISCV] Add VL patterns for vfwmul/vfwadd/vfwsub
Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D122369
2022-03-31 07:08:58 +00:00
wanglei
a1c6743922 [LoongArch] Construct codegen infra and generate first add instruction.
This patch constructs codegen infra and successfully generate the first
'add' instruction. Add integer calling convention for fixed arguments which
are passed with general-purpose registers.

New test added here:

  CodeGen/LoongArch/ir-instruction/add.ll

The test file is placed in a subdirectory because we will use
subdirctories to distinguish different categories of tests (e.g.
 intrinsic, inline-asm ...)

Reviewed By: MaskRay, SixWeining

Differential Revision: https://reviews.llvm.org/D122366
2022-03-31 11:57:07 +08:00
Wei Xiao
3728eebd7b [X86] Add test with abs intrinsic for x86-partial-reduction optimization 2022-03-31 09:58:19 +08:00