llvm-project

Author	SHA1	Message	Date
Craig Topper	e94dc58dff	[RISCV] Inline scalar ceil/floor/trunc/rint/round/roundeven. This avoids the call overhead as well as the save/restore of fflags and the snan handling in the libm function. The save/restore of fflags and snan handling are needed to be correct for -ftrapping-math. I think we can ignore them in the default environment. The inline sequence will generate an invalid exception for nan and an inexact exception if fractional bits are discarded. I've used a custom inserter to explicitly create the control flow around the float->int->float conversion. We can probably avoid the final fsgnj after the conversion for no signed zeros FMF, but I'll leave that for future work. Note the comparison constant is slightly different than glibc uses. They use 1<<53 for double, I'm using 1<<52. I believe either are valid. Numbers >= 1<<52 can't have any fractional bits. It's ok to do the float->int->float conversion on numbers between 1<<53 and 1<<52 since they will all fit in 64. We only have a problem if the double can't fit in i64 Reviewed By: reames Differential Revision: https://reviews.llvm.org/D136508	2022-10-26 14:36:49 -07:00
Craig Topper	0a03240fb4	[RISCV] Add tests for fixed vector sshl_sat/ushl_sat. NFC	2022-10-26 14:15:47 -07:00
Sanjay Patel	54eeadcf44	[SDAG] avoid vector extract/insert around binop scalar-to-vector (scalar binop (extractelt V, Idx), C) --> shuffle (vector binop V, C'), {Idx, -1, -1...} We generally try to avoid ad-hoc vectorization in SDAG, but the motivating case from issue #39482 escapes our normal vectorization folds in IR. It seems like it should always be a win to transform this pattern in cases where we have the same vector type for input and output and the target supports the vector operation. That avoids transfers from vector to scalar and back. In the x86 shift examples, we create the scalar-to-vector node during legalization. I'm not sure if there's a more general way to create the pattern for testing. (If so, I could add tests for other targets.) Differential Revision: https://reviews.llvm.org/D136713	2022-10-26 14:04:46 -04:00
Piyou Chen	7d7940fd77	[RISCV] add svinval extension 1. Add the svinval extension support 2. Add the svinval Predicates for its instruction Note: the svinval instructions defined in https://reviews.llvm.org/D117654 Reviewed By: reames Differential Revision: https://reviews.llvm.org/D136571	2022-10-26 09:45:30 -07:00
Craig Topper	a61b74889f	[RISCV] Use vslide1down for i64 insertelt on RV32. Instead of using vslide1up, use vslide1down and build the other direction. This avoids the overlap constraint early clobber of vslide1up. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D136735	2022-10-26 09:43:12 -07:00
Yashwant Singh	14fb4040e2	[AMDGPU][test] precommiting tests for D136663 More tests for si-peephole-sdwa pass	2022-10-26 22:08:28 +05:30
Sanjay Patel	ef9dfcd6cd	[x86] add tests for extract + insert of vector shift amount; NFC	2022-10-26 11:20:14 -04:00
Haohai Wen	21f23a37c6	[SelectionDAG] Clamp stack alignment for memset, memmove memcpy has clamped dst stack alignment to NaturalStackAlignment if hasStackRealignment is false. We should also clamp stack alignment for memset and memmove. If we don't clamp, SelectionDAG may first do tail call optimization which requires no stack realignment. Then memmove, memset in same function may be lowered to load/store with larger alignment leading to PEI emit stack realignment code which is absolutely not correct. Reviewed By: LuoYuanke Differential Revision: https://reviews.llvm.org/D136456	2022-10-26 16:45:31 +08:00
Pierre van Houtryve	c1b2920c6e	[AMDGPU] Autogenerate llvm.amdgcn.fcmp.ll Prep commit for adding GISel run lines to that test. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D136591	2022-10-26 07:00:34 +00:00
chenglin.bi	9403a8bc37	[GlobalISel][AArch64] Fix miscompile caused by wrong G_ZEXT selection in GISel The miscompile case's G_ZEXT has a G_FREEZE source. Similar to D127154, this patch removed isDef32, relying on the AArch64MIPeephole optimizer to remove redundant SUBREG_TO_REG nodes also in GISel. Fix #58431 Reviewed By: paquette Differential Revision: https://reviews.llvm.org/D136433	2022-10-26 09:54:13 +08:00
Guozhi Wei	d24c93cc41	[X86] Enable reassociation for ADD instructions ADD is an associative and commutative operation, so we can do reassociation for it. Differential Revision: https://reviews.llvm.org/D136396	2022-10-26 00:46:13 +00:00
Douglas Yung	fc40c73921	Revert "Update supported features in the generic CPU configuration" This reverts commit 11afbf396e10e1b1e91a5991e2aec1916e29a910. There are 10 tests still failing after follow-up fix b5d0bf9b9853, this should get the following bots back to green: - https://lab.llvm.org/buildbot/#/builders/183/builds/8194 - https://lab.llvm.org/buildbot/#/builders/186/builds/9491 - https://lab.llvm.org/buildbot/#/builders/214/builds/3908 - https://lab.llvm.org/buildbot/#/builders/93/builds/11740 - https://lab.llvm.org/buildbot/#/builders/231/builds/4200 - https://lab.llvm.org/buildbot/#/builders/121/builds/24519 - https://lab.llvm.org/buildbot/#/builders/230/builds/4466 - https://lab.llvm.org/buildbot/#/builders/94/builds/11639 - https://lab.llvm.org/buildbot/#/builders/45/builds/9325 - https://lab.llvm.org/buildbot/#/builders/124/builds/5219 - https://lab.llvm.org/buildbot/#/builders/67/builds/8623 - https://lab.llvm.org/buildbot/#/builders/123/builds/13836 - https://lab.llvm.org/buildbot/#/builders/109/builds/49355 - https://lab.llvm.org/buildbot/#/builders/58/builds/27751 - https://lab.llvm.org/buildbot/#/builders/117/builds/9922 - https://lab.llvm.org/buildbot/#/builders/16/builds/37012 - https://lab.llvm.org/buildbot/#/builders/104/builds/9490 - https://lab.llvm.org/buildbot/#/builders/42/builds/7725 - https://lab.llvm.org/buildbot/#/builders/196/builds/20077 - https://lab.llvm.org/buildbot/#/builders/3/builds/15217 - https://lab.llvm.org/buildbot/#/builders/6/builds/15251 - https://lab.llvm.org/buildbot/#/builders/9/builds/15247 - https://lab.llvm.org/buildbot/#/builders/36/builds/26487 - https://lab.llvm.org/buildbot/#/builders/54/builds/2474 - https://lab.llvm.org/buildbot/#/builders/74/builds/14536 - https://lab.llvm.org/buildbot/#/builders/5/builds/28555	2022-10-25 16:34:08 -07:00
Dan Gohman	11afbf396e	Update supported features in the generic CPU configuration Accompanying https://reviews.llvm.org/D125728, this updates LLVM Codegen's "generic" CPU to enable the same new features. Differential Revision: https://reviews.llvm.org/D125729	2022-10-25 11:42:32 -07:00
Artem Belevich	0e8a414ab3	[CUDA, NVPTX] Added basic __bf16 support for NVPTX. Recent Clang changes expose __bf16 types for SSE2-enabled host compilations and that makes those types visible during GPU-side compilation, where it currently fails with Sema complaining that __bf16 is not supported. Considering that __bf16 is a storage-only type, enabling it for NVPTX if it's enabled on the host should pose no issues, correctness-wise. Recent NVIDIA GPUs have introduced bf16 support, so we'll likely grow better support for __bf16 on NVPTX going forward. Differential Revision: https://reviews.llvm.org/D136311	2022-10-25 11:08:06 -07:00
Joe Nash	01b8140d3a	[AMDGPU] Fix delay alu for VOPD with src2acc V_FMAC_F32 and V_DOT2C_F32_F16 have a dummy src2 operand tied to vdst to inform passes that the instructions read the dst operand. The VOPD versions of these instructions lacked the dummy operand, which was a problem for inserting s_delay_alu. Introduce the dummy src2 operand on the VOPD versions, and fix the VOPD operand tracking logic to account for it. Reviewed By: dp Differential Revision: https://reviews.llvm.org/D136629	2022-10-25 13:11:17 -04:00
Mircea Trofin	87ec22de70	[mlgo] More wildcarding in extra features logging for regalloc May need a different testing approach for opcodes.	2022-10-25 08:20:55 -07:00
Simon Pilgrim	ed1b0da557	[X86] combineConcatVectorOps - fold v4i64/v8x32 concat(broadcast(),broadcast()) -> permilps(concat()) Extend the existing v4f64 fold to handle v4i64/v8f32/v8i32 as well Fixes #58585	2022-10-25 15:37:42 +01:00
Simon Pilgrim	fcbaf6f4e8	[X86] Add v4i64 test coverage for #58585 Turns out we fail to do this for concat_v4i64(broadcast_v2i64,broadcast_v2i64) as well	2022-10-25 15:03:08 +01:00
Simon Pilgrim	b92725ecbc	[X86] Add test coverage for #58585	2022-10-25 14:33:55 +01:00
Simon Pilgrim	c4051b2606	[X86] Fold vbroadcast(bitcast(vbroadcast(src))) -> bitcast(vbroadcast(vbroadcast(src))) If the inner broadcast scalar type is smaller/same width as the outer broadcast scalar type then we can broadcast using the same inner type directly. Works for vbroadcast_load as well.	2022-10-25 14:03:43 +01:00
chenglin.bi	e95c74b423	[AArch64] Add precommit test for bcmp; NFC	2022-10-25 17:23:03 +08:00
Cullen Rhodes	1e02a29e47	[AArch64][SVE] Use more flag-setting instructions If OP in PTEST(PG, OP(PG, ...)) has a flag-setting variant change the opcode so the PTEST becomes redundant. This patch extends this existing optimization in AArch64::optimizePTestInstr to cover all flag-setting opcodes. Reviewed By: peterwaller-arm Differential Revision: https://reviews.llvm.org/D136083	2022-10-25 09:02:21 +00:00
Cullen Rhodes	5621caeb82	[AArch64][SVE] NFC: extend tests for flag-setting predicate instructions A follow on patch will extend existing PTEST(PG, OP(PG, ...)) -> OP_FLAG_SETTING(PG, ...) optimization in AArch64InstrInfo::optimizePTestInstr to cover more of the flag-setting instructions Reviewed By: peterwaller-arm Differential Revision: https://reviews.llvm.org/D136161	2022-10-25 09:02:20 +00:00
Thomas Symalla	1f23cf4e50	[NFC][AMDGPU] Pre-commit test for D136432 Nested BFI instruction with multiple uses.	2022-10-25 10:52:32 +02:00
Jay Foad	325927ffb9	[X86] Update LiveVariables in more cases in convertToThreeAddress Following on from D129634, this patch fixes more X86 CodeGen test failures with D129213 applied, which adds verification of LiveIntervals after the TwoAddressInstruction pass runs. These failures only showed up with LLVM_ENABLE_EXPENSIVE_CHECKS=ON which adds the equivalent of an implicit -verify-machineinstrs on all tests. Differential Revision: https://reviews.llvm.org/D136596	2022-10-25 09:21:51 +01:00
Sander de Smalen	19b9e6204a	[AArch64][SME] Fix chain for arm_locally_streaming functions. The Chain wasn't set correctly in the DAG for functions marked with aarch64_pstate_sm_body, which meant that SelectionDAG would dead-code some of the CopyToReg's. This didn't show up in the existing tests because all uses were in the same block, but when adding some control-flow, suddenly things would break. Reviewed By: kmclaughlin Differential Revision: https://reviews.llvm.org/D136579	2022-10-25 08:14:51 +00:00
Freddy Ye	fdac4c4e92	[X86] Add CMPCCXADD instructions. For more details about these instructions, please refer to the latest ISE document: https://www.intel.com/content/www/us/en/develop/download/intel-architecture-instruction-set-extensions-programming-reference.html Reviewed By: pengfei, skan Differential Revision: https://reviews.llvm.org/D135933	2022-10-25 14:33:39 +08:00
Craig Topper	a54f3347e8	[RISCV] Add shift amount operands of shift, rotate, and Zbs instructions to hasAllNBitUsers.	2022-10-24 22:07:22 -07:00
Craig Topper	223f466f4f	[RISCV] Add ORI to hasAllNBitUsers. If the immediate is negative with sufficient leading ones, then the upper bits of the other operand aren't demanded.	2022-10-24 21:33:17 -07:00
Xiang Li	996267d20e	[DirectX backend] set target triple to "dxil-ms-dx" Set target triple to "dxil-ms-dx" for DXIL at the end of DXILTranslateMetadata. Reviewed By: beanz Differential Revision: https://reviews.llvm.org/D131545	2022-10-24 14:49:31 -07:00
Zhiyao Ma	7e8af2fc0c	[ARM] Support -mexecute-only with -mlong-calls. Instead of using constant pools, use movw movt pair. Differential Revision: https://reviews.llvm.org/D136203	2022-10-24 11:41:24 -07:00
Guozhi Wei	f298bfb09b	[X86] New test case for reassociation of ADD instructions. This is a pre-commit test case for D136396. Differential Revision: https://reviews.llvm.org/D136501	2022-10-24 17:46:46 +00:00
Roman Lebedev	377f27be87	[X86] `DAGTypeLegalizer::ModifyToType()`: when widening w/ zeros, insert into undef and `and`-mask the padding away We can expect that the sequence of inserting-of-extracts-into-undef will be successfully lowered back into widening of the source vector, but it seems that at least for X86 mask vectors, we have a really hard time recovering from inserting-into-zero. I've looked into alternative fix injection points, and they are much more involved, by the time of `LowerBUILD_VECTORvXi1()`/`LowerINSERT_VECTOR_ELT()` the constants might be obscured, so it does not seem like we can easily deal with this by lowering into bit math later on, some other pieces are missing. Instead, it seems like just clearing the padding away via an `AND`-mask is at least not a worse choice. Why create a problem where there wasn't one. Though yes, it is possible that there are cases where constants originate from the source IR, so some other fix may still be needed. Reviewed By: pengfei Differential Revision: https://reviews.llvm.org/D136046	2022-10-24 20:27:02 +03:00
Craig Topper	1fa8fd4c33	Recommit "[TargetLowering][RISCV][X86] Support even divisors in expandDIVREMByConstant." This reverts commit 65aaecca8842dec30d03734a7fe8ce33c5afec81. There was an ordering problem in the calculation of the partial remainder. Original commit message: If the divisor is even, we can first shift the dividend and divisor right by the number of trailing zeros. Now the divisor is odd and we can do the original algorithm to calculate a remainder. Then we shift that remainder left by the number of trailing zeros and add the bits that were shifted out of the dividend. Differential Revision: https://reviews.llvm.org/D135541	2022-10-24 10:08:50 -07:00
Simon Pilgrim	d81919dd7e	[X86] 2012-01-12-extract-sv.ll - add AVX2 test coverage	2022-10-24 17:55:00 +01:00
Ahmed Bougacha	718bb22c28	[AArch64][PAC] Select XPAC for ptrauth.strip intrinsic. Differential Revision: https://reviews.llvm.org/D132385	2022-10-24 08:15:56 -07:00
Amy Kwan	715301056e	[PowerPC] Fix invalid cast for vector shuffles when lowering to the xxsplti32dx instruction. When lowering vector shuffles into the xxsplti32dx instruction on Power10, we canonicalize the right operand to be a BUILD_VECTOR and as a result, get the commuted vector shuffle node. However, a vector shuffle will not always be returned as the result for a commuted vector shuffle. In such a scenario, this patch updates the original cast of a shuffle into a dyn_cast<> and checks if the shuffle is a valid vector shuffle node prior to obtaining the commuted shuffle mask. This patch also adds a new test case that demonstrates this scenario (primarily seen on 32-bit), and was originally a crash prior to this fix. Differential Revision: https://reviews.llvm.org/D135024	2022-10-24 09:56:54 -05:00
Craig Topper	65aaecca88	Revert "[TargetLowering][RISCV][X86] Support even divisors in expandDIVREMByConstant." This reverts commit f6a7b47820904c5e69cc4f133d382c74a87c44e8. I received a report that this fails on 32-bit X86.	2022-10-24 07:12:54 -07:00
Petar Avramovic	cbc378ecb8	GlobalISel: Artifact combine merge-like and unmerges into merge-like Recognize when sub-vectors have been split to elements which are used to build a large vector. This happens when instructions have different vector sizes available. For example a few arithmetic instructions are required to process all elements of a larger vector that can be stored using one instruction. Differential Revision: https://reviews.llvm.org/D109242	2022-10-24 13:33:06 +02:00
Petar Avramovic	e6c778f861	GlobalISel: Artifact combine merge-like and unmerge into unmerge Recognize when source could have been unmerged to pieces with DstTy without having to split source to smaller elements and then merge small elements into DstTy pieces. This happens when a vector was meant to be split to sub-vectors but there was a leftover. At this point the artifact combiner has already dealt with the leftover and we can continue to use sub-vectors. Differential Revision: https://reviews.llvm.org/D109241	2022-10-24 13:33:05 +02:00
Petar Avramovic	f1aa598046	GlobalISel: Artifact combine merge-like and unmerge into copy Recognize copy that is represented as split of a source register to elements that were reassembled to another register with the same type. Differential Revision: https://reviews.llvm.org/D109240	2022-10-24 13:33:05 +02:00
Petar Avramovic	51b98db487	GlobalISel: Precommit for artifact combine patches Differential Revision: https://reviews.llvm.org/D117655	2022-10-24 13:33:05 +02:00
Simon Pilgrim	fd5f3abb07	[DAG] Fold (abs (sign_extend_inreg x)) -> (zero_extend (abs (truncate x))) (PR43370) If the upper half of an abs() is all sign bits, then we can perform the abs() using just the lower half and then zero extend. I've limited the DAG combine to only sign_extend_inreg (and free truncate/zero_extend) to minimise any later promotion issues, but for legalization a similar fold can use ComputeNumSignBits to be more aggressive. Alive2: https://alive2.llvm.org/ce/z/y32fS4 Fixes #43370 Differential Revision: https://reviews.llvm.org/D136559	2022-10-24 10:27:08 +01:00
Piyou Chen	f8b8426861	[RISCV] Add Svnapot extension Reviewed By: kito-cheng Differential Revision: https://reviews.llvm.org/D136570	2022-10-24 01:27:04 -07:00
gonglingqin	3be059377e	[LoongArch] Add support for ISD::FRAMEADDR and ISD::RETURNADDR For now, only support lowering frame/return address for current frame. Differential Revision: https://reviews.llvm.org/D136215	2022-10-24 15:12:28 +08:00
Pierre van Houtryve	eccdedd6f7	[AMDGPU] Autogenerate icmp codegen test Switch to autogenerated tests so we can use the same test for GISel and DAGIsel. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D136446	2022-10-24 06:37:50 +00:00
Min-Yih Hsu	718a9793b4	[M68k][NFC] Use OS and ABI agnostic triple in codegen tests Use 'm68k' (i.e. m68k-unknown-unknown) in all codegen tests rather than m68k-linux-gnu. NFC.	2022-10-23 15:26:13 -07:00
Craig Topper	a41c1f3168	[RISCV] Make selectShiftMask look for negate opportunities after looking through AND. Previously we would only look for an AND or a negate. But it's possible there is a negate after looking through the AND.	2022-10-23 14:23:13 -07:00
Simon Pilgrim	fe7bc7153a	[X86] Add abs(sext_inreg(x)) test coverage for Issue #43370	2022-10-23 18:17:19 +01:00
Simon Pilgrim	4e8f847676	[X86][AVX512] Fold extract_element(bitcast(<X x i1>) -> bitcast(extract_subvector()) On AVX512, extract legal bool vectors as bool subvectors before bitcasting to scalars to avoid spilling to stack. This helps rust which internally represents bool vectors as bool arrays It also exposes more missed opportunities to use the KADD instruction to add masks together before moving to gpr Fixes #58546	2022-10-23 14:47:24 +01:00
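
The message for 1fa8fd4c33 (the recommit of even-divisor support in expandDIVREMByConstant) describes a remainder computation that can be checked with plain integer arithmetic. The following is a minimal Python sketch of that arithmetic only, not the SelectionDAG code: the function name `urem_by_even_constant` is hypothetical, and Python's `%` stands in for the "original algorithm" that handles the odd divisor.

```python
def urem_by_even_constant(n: int, d: int) -> int:
    """Unsigned remainder n % d, following the steps in the commit message.

    Hypothetical helper for illustration; not an LLVM API.
    """
    if n < 0 or d <= 0:
        raise ValueError("expects a non-negative dividend and a positive divisor")

    # Number of trailing zero bits in the divisor.
    k = (d & -d).bit_length() - 1

    # Shift the dividend and divisor right by k; the divisor is now odd.
    odd_divisor = d >> k
    shifted_dividend = n >> k

    # Bits that were shifted out of the dividend.
    low_bits = n & ((1 << k) - 1)

    # Stand-in for the original odd-divisor remainder algorithm.
    partial_remainder = shifted_dividend % odd_divisor

    # Shift the partial remainder back and add the shifted-out bits.
    return (partial_remainder << k) + low_bits
```

The result stays below `d` because the partial remainder is at most `odd_divisor - 1` and the shifted-out bits are at most `2**k - 1`. For an odd divisor `k` is 0 and the function reduces to the plain remainder.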


45410 Commits