llvm-project

Author	SHA1	Message	Date
Simon Pilgrim	9db1eb13b6	[Thumb2] Regenerate thumb2-teq2 tests	2022-04-04 12:48:20 +01:00
Simon Pilgrim	ec93435ba0	[Thumb2] Regenerate thumb2-teq tests	2022-04-04 12:24:35 +01:00
Simon Pilgrim	d4cdaa24fd	[MIPS] Regenerate countleading tests with common check prefixes	2022-04-04 12:19:57 +01:00
Simon Pilgrim	ad59bd0be9	[X86] Regenerate peep tests checks	2022-04-04 12:02:33 +01:00
Simon Pilgrim	623d4b5787	[X86] Support optional NOT stages in the AND(SRL(X,Y),1) -> SETCC(BT(X,Y)) fold Extension to D122891, peek through NOT() ops, adjusting the condcode as we go.	2022-04-04 10:51:26 +01:00
Simon Pilgrim	842175676c	[X86] Add additional test cases for NOT(AND(SRL(X,Y),1))/AND(SRL(NOT(X),Y),1) -> SETCC(BT(X,Y)) As suggested in post review on D122891	2022-04-04 10:29:33 +01:00
Min-Yih Hsu	22201f499d	[M68k][test] Remove redundant CHECK-LABEL directive The associated test had a redundant CHECK-LABEL directive that could have failed the test since its inception, but this issue was "buried" by a missing colon, which was addressed in fb65aaf0be09936e657d339f3dc8e62666a41956. Thus, the test finally failed after the said commit. This patch removes that CHECK-LABEL directive.	2022-04-03 22:51:03 -07:00
Dávid Bolvanský	fb65aaf0be	[NFCI] Fixed missing colon in CHECK directives - part 2	2022-04-03 14:42:59 +02:00
Dávid Bolvanský	f02a0a69af	[NFCI] Fixed missing colon in CHECK directives	2022-04-03 11:52:38 +02:00
Simon Pilgrim	fbfd78f7aa	[X86] lowerShuffleAsRepeatedMaskAndLanePermute - allow v16i32 sub-lane permutes for v64i8 shuffles Without VBMI, we are better off permuting v16i32 sub-lanes, even though it's a variable shuffle, if it allows us to then shuffle v64i8 inlane repeated masks (PSHUFB etc.) Fixes #54658	2022-04-03 10:05:10 +01:00
wanglei	cd85ea9431	[LoongArch] Fix instruction definition This patch fixes issue with the LU32I_D instruction, which did not have an input register operand. Differential Revision: https://reviews.llvm.org/D122970	2022-04-02 18:08:29 +08:00
Craig Topper	d970e96c53	[RISCV] Add lowering for vp.fptoui and vp.uitofp. This is a straightforward extension of D122512 to unsigned integers.	2022-04-01 18:28:46 -07:00
Craig Topper	fa630e7594	[RISCV][AMDGPU][TargetLowering] Special case overflow expansion for (uaddo X, 1). If we expand (uaddo X, 1) we previously expanded the overflow calculation as (X + 1) <u X. This potentially increases the live range of X and can prevent X+1 from reusing the register that previously held X. Since we're adding 1, overflow only occurs if X was UINT_MAX in which case (X+1) would be 0. So this patch adds a special case to expand the overflow calculation to (X+1) == 0. This seems to help with uaddo intrinsics that get introduced by CodeGenPrepare after LSR. Alternatively, we could block the uaddo transform in CodeGenPrepare for this case. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D122933	2022-04-01 13:14:10 -07:00
Craig Topper	31b8a1dc46	[RISCV] Add tests for uaddo with a constant 1. NFC The overflow calculation can be optimized to check if the add result is 0.	2022-04-01 12:29:08 -07:00
Sanjay Patel	ec0b332cd8	[AArch64] add tests for funnel+or == 0; NFC These are copied from x86 ( 1074bdfb52b2e1753e51472 ) to provide more coverage for a potential generic combine.	2022-04-01 13:39:25 -04:00
Simon Pilgrim	c64f37f818	[X86] matchAddressRecursively - add XOR(X, MIN_SIGNED_VALUE) handling Allows us to fold XOR(X, MIN_SIGNED_VALUE) == ADD(X, MIN_SIGNED_VALUE) into LEA patterns As mentioned on PR52267. Differential Revision: https://reviews.llvm.org/D122815	2022-04-01 17:26:29 +01:00
Simon Pilgrim	b8652fbcbb	[X86] Fold AND(SRL(X,Y),1) -> SETCC(BT(X,Y)) (RECOMMITTED) As noticed on PR39174, if we're extracting a single non-constant bit index, then try to use BT+SETCC instead to avoid messing around moving the shift amount to the ECX register, using slow x86 shift ops etc. Recommitted with a fix to ensure we zext/trunc the SETCC result to the original type. Differential Revision: https://reviews.llvm.org/D122891	2022-04-01 16:59:06 +01:00
Simon Pilgrim	5a457bd2fa	Revert rGa5f637bcbb7d1e08ce637f113fc117c3f4b2b110 "[X86] Fold AND(SRL(X,Y),1) -> SETCC(BT(X,Y))" Investigating a sanitizer-windows buildbot breakage	2022-04-01 16:48:24 +01:00
Simon Pilgrim	9afa6811ad	[X86] lowerShuffleAsRepeatedMaskAndLanePermute - allow 64-bit sublane shuffling on AVX512BW v64i8 shuffles We were only performing this on 256-bit vectors on AVX2 targets Noticed while triaging Issue #54658	2022-04-01 16:40:10 +01:00
Simon Pilgrim	b465752f92	[X86] Add PR54658 test case	2022-04-01 16:21:54 +01:00
Simon Pilgrim	a5f637bcbb	[X86] Fold AND(SRL(X,Y),1) -> SETCC(BT(X,Y)) As noticed on PR39174, if we're extracting a single non-constant bit index, then try to use BT+SETCC instead to avoid messing around moving the shift amount to the ECX register, using slow x86 shift ops etc. Differential Revision: https://reviews.llvm.org/D122891	2022-04-01 16:07:56 +01:00
Sanjay Patel	1074bdfb52	[x86] add tests for funnel+or == 0; NFC This is another family of patterns based on issue #49541	2022-04-01 09:28:45 -04:00
Jay Foad	c246b7bd4a	[AMDGPU] Only count global-to-global as indirect accesses Previously any load (global, local or constant) feeding into a global load or store would be counted as an indirect access. This patch only counts global loads feeding into a global load or store. The rationale is that the latency for global loads is generally much larger than the other kinds. As a side effect this makes it easier to write small kernels test cases that are not counted as having indirect accesses, despite the fact that arguments to the kernel are accessed with an SMEM load. Differential Revision: https://reviews.llvm.org/D122804	2022-04-01 13:48:13 +01:00
Xiang1 Zhang	a56f264958	Refine tls-load-hoista llvm option Reviewed By: LuoYuanke Differential Revision: https://reviews.llvm.org/D122890	2022-04-01 19:03:58 +08:00
Fangrui Song	ac6878b330	[X86] Set frame-setup/frame-destroy on prologue/epilogue CFI instructions This approach is used by AArch64/RISCV to make frame-setup/frame-destroy instructions contiguous instead of being interleaved by CFI instructions. Code checking `MBBI->getFlag(MachineInstr::FrameSetup) || MBBI->isCFIInstruction()` can be simplified to just check FrameSetup. This helps locate all CFI instructions in the prologue, which can be handy to use .cfi_remember_state/.cfi_restore_state to decrease unwind table size (D114545). Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D122541	2022-03-31 23:04:50 -07:00
Lian Wang	62dd3674bc	[RISCV] Supplement SDNode patterns for vfwmul/vfwadd/vfwsub Reviewed By: jacquesguan Differential Revision: https://reviews.llvm.org/D122720	2022-04-01 03:09:50 +00:00
Matt Arsenault	f635be3014	X86/GlobalISel: Use LLT form of getMachineMemOperand	2022-03-31 18:49:23 -04:00
Matt Arsenault	4d72acf991	X86/GlobalISel: Regenerate test checks	2022-03-31 18:49:23 -04:00
Matt Arsenault	0fb6856aff	ARM/GlobalISel: Get pointer type from value instead of getPointerSize Avoid using getPointerSize and pass through the original value type.	2022-03-31 16:46:23 -04:00
Matt Arsenault	cc2e2b80a1	AMDGPU: Update test checks to include -NEXT	2022-03-31 16:30:01 -04:00
Matt Arsenault	ae8d35b8ee	X86: Use -NEXT checks in a test	2022-03-31 16:30:01 -04:00
Simon Pilgrim	596af141b2	[X86] setcc.ll - add PR39174 test case and i686 coverage	2022-03-31 21:29:12 +01:00
Stefan Pintilie	585c85abe5	[PowerPC] Fix lowering of byval parameters for sizes greater than 8 bytes. To store a byval parameter the existing code would store as many 8 byte elements as was required to store the full size of the byval parameter. For example, a parameter of size 16 would store two elements of 8 bytes. A parameter of size 12 would also store two elements of 8 bytes. This would sometimes store too many bytes as the size of the parameter is not always a multiple of 8. This patch fixes that issue and now byval parameters are stored with the correct number of bytes. Reviewed By: nemanjai, #powerpc, quinnp, amyk Differential Revision: https://reviews.llvm.org/D121430	2022-03-31 15:12:46 -05:00
Stefan Pintilie	2e55bc9f3c	[PowerPC] Set the special DSCR with a compiler option. Add a compiler option and the instructions required to set the special Data Stream Control Register (DSCR). The special register will not be set by default. Original patch by: Muhammad Usman Reviewed By: nemanjai, #powerpc Differential Revision: https://reviews.llvm.org/D117013	2022-03-31 14:06:30 -05:00
Thomas Symalla	1a6aa8b195	[AMDGPU] Add missing use check in SIOptimizeExecMasking pass. Whenever a v_cmp, s_and_saveexec instruction sequence shall be transformed to an equivalent s_mov, v_cmpx sequence, it needs to be detected if the v_cmp target register is used between the two instructions as the v_cmp result gets omitted by using the v_cmpx instruction, resulting in invalid code. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D122797	2022-03-31 19:25:35 +02:00
Abinav Puthan Purayil	898d5776ec	[AMDGPU][GlobalISel] Scalarize add/sub with overflow ops in the legalizer Differential Revision: https://reviews.llvm.org/D122803	2022-03-31 21:46:34 +05:30
Abinav Puthan Purayil	db17ebd593	[AMDGPU][GlobalISel] Add end to end IR tests for add/sub with overflow Differential Revision: https://reviews.llvm.org/D122818	2022-03-31 21:46:34 +05:30
Jay Foad	e8e32e5714	[AMDGPU] Fix typo in RUN line	2022-03-31 16:23:40 +01:00
Changpeng Fang	1711020c37	AMDGPU: Use isLiteralConstantLike to check whether the operand could ever be literal Summary: To compute the size of a VALU/SALU instruction, we need to check whether an operand could ever be literal. Previously isLiteralConstant was used, which missed cases like global variables or external symbols. These misses lead to under-estimation of the instruction size and branch offset, and thus incorrectly skip the necessary branch relaxation when the branch offset is actually greater than what the branch bits can hold. In this work, we use isLiteralConstantLike to check the operands. It may be conservative, but it is safe. Reviewers: arsenm Differential Revision: https://reviews.llvm.org/D122778	2022-03-31 08:06:31 -07:00
Nikita Popov	0721d7c4d8	[X86] Add test for PR54369 (NFC)	2022-03-31 16:45:05 +02:00
Sanjay Patel	4a54e3eed3	[x86] try to replace 0.0 in fcmp with negated operand This inverts a fold recently added to IR with: 3491f2f4b033 We can put -bidirectional on the Alive2 examples to show that the reverse transforms work: https://alive2.llvm.org/ce/z/8iVQwB The motivation for the IR change was to improve matching to 'fabs' in IR (see https://github.com/llvm/llvm-project/issues/38828 ), but it regressed x86 codegen for 'not-quite-fabs' patterns like (X > -X) ? X : -X. Ie, when there is no fast-math (nsz), the cmp+select is not a proper fabs operation, but it does map nicely to the unusual NAN semantics of MINSS/MAXSS. I drafted this as a target-independent fold, but it doesn't appear to help any other targets and seems to cause regressions for SystemZ at least. Differential Revision: https://reviews.llvm.org/D122726	2022-03-31 09:17:49 -04:00
Jay Foad	fdaf606c8e	[AMDGPU] Fix last remaining checks in perfhint.ll Unfortunately this just shows that the test case for D47740 never really tested what it was supposed to test. Differential Revision: https://reviews.llvm.org/D122664	2022-03-31 13:39:15 +01:00
Abinav Puthan Purayil	2f284b0ff9	[AMDGPU] Regenerate checks in some mir tests	2022-03-31 17:49:00 +05:30
Luo, Yuanke	6753eb0c90	[X86][AMX] Materialize undef or zero value to tilezero The AMX combiner would store undef or zero to stack and invoke tileload to load the data to tile register. To avoid the store/load, we can materialize undef or zero value to tilezero. Differential Revision: https://reviews.llvm.org/D122714	2022-03-31 19:10:28 +08:00
Nicholas Guy	7d676714fb	[AArch64] Set MaxBytesForLoopAlignment for more targets Differential Revision: https://reviews.llvm.org/D122566	2022-03-31 11:37:11 +01:00
Simon Pilgrim	a1d09f3a98	[X86] Extend xor-lea test coverage Add ADD/SUB(XOR(X,MIN_SIGNED_VALUE),Y) tests	2022-03-31 10:54:27 +01:00
Simon Pilgrim	481b185620	[X86] combineCarryThroughADD - recognise X86ISD::ADD(AND(X,1),-1) pattern can be folded to X86ISD::BT As mentioned on D122482, if we've generated a masked overflow test see if we can fold it to X86ISD::BT to feed a X86ISD::ADC/SBB Differential Revision: https://reviews.llvm.org/D122572	2022-03-31 09:52:55 +01:00
Lian Wang	b3851e9931	[RISCV] Add VL patterns for vfwmul/vfwadd/vfwsub Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D122369	2022-03-31 07:08:58 +00:00
wanglei	a1c6743922	[LoongArch] Construct codegen infra and generate first add instruction. This patch constructs codegen infra and successfully generates the first 'add' instruction. Add integer calling convention for fixed arguments which are passed with general-purpose registers. New test added here: CodeGen/LoongArch/ir-instruction/add.ll The test file is placed in a subdirectory because we will use subdirectories to distinguish different categories of tests (e.g. intrinsic, inline-asm ...) Reviewed By: MaskRay, SixWeining Differential Revision: https://reviews.llvm.org/D122366	2022-03-31 11:57:07 +08:00
Wei Xiao	3728eebd7b	[X86] Add test with abs intrinsic for x86-partial-reduction optimization	2022-03-31 09:58:19 +08:00
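
Several entries above rest on small integer identities: the (uaddo X, 1) overflow check in fa630e7594, the XOR/ADD equivalence in c64f37f818, the single-bit extraction fold in b8652fbcbb, and the byval over-store fixed by 585c85abe5. The sketch below models them in Python on 32-bit values to show why they hold. It is not LLVM code: every function name is hypothetical, the 32-bit width is an assumption made for illustration, and the BT+SETCC model describes only the result value, not the instructions.

```python
# Illustrative model only; names and the 32-bit width are assumptions.
MASK32 = 0xFFFFFFFF
MIN_SIGNED32 = 0x80000000


def uaddo_generic(x):
    """Overflow of x + 1 computed as (X + 1) <u X."""
    s = (x + 1) & MASK32
    return s, s < x


def uaddo_special(x):
    """Overflow of x + 1 computed as (X + 1) == 0."""
    s = (x + 1) & MASK32
    return s, s == 0


def xor_min(x):
    """XOR(X, MIN_SIGNED_VALUE): flips only the top bit."""
    return x ^ MIN_SIGNED32


def add_min(x):
    """ADD(X, MIN_SIGNED_VALUE): the carry out of the top bit is discarded."""
    return (x + MIN_SIGNED32) & MASK32


def extract_bit_shift(x, y):
    """AND(SRL(X, Y), 1)."""
    return (x >> y) & 1


def extract_bit_test(x, y):
    """Result-level model of testing bit Y of X and setting a flag."""
    return int((x & (1 << y)) != 0)


def byval_bytes_old(size):
    """Bytes written when storing whole 8-byte elements, as the log describes."""
    return 8 * ((size + 7) // 8)


samples = [0, 1, 0x7FFFFFFF, 0x80000000, 0xFFFFFFFE, 0xFFFFFFFF]
for x in samples:
    assert uaddo_generic(x) == uaddo_special(x)
    assert xor_min(x) == add_min(x)
    for y in range(32):
        assert extract_bit_shift(x, y) == extract_bit_test(x, y)

# A 12-byte parameter stored as two 8-byte elements writes 16 bytes.
assert byval_bytes_old(12) == 16
```

The checks only sample a handful of values; they illustrate the identities rather than prove them.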


42810 Commits