llvm-project

Author	SHA1	Message	Date
Craig Topper	e94dc58dff	[RISCV] Inline scalar ceil/floor/trunc/rint/round/roundeven. This avoids the call overhead as well as the save/restore of fflags and the snan handling in the libm function. The save/restore of fflags and snan handling are needed to be correct for -ftrapping-math. I think we can ignore them in the default environment. The inline sequence will generate an invalid exception for nan and an inexact exception if fractional bits are discarded. I've used a custom inserter to explicitly create the control flow around the float->int->float conversion. We can probably avoid the final fsgnj after the conversion for no signed zeros FMF, but I'll leave that for future work. Note the comparison constant is slightly different than glibc uses. They use 1<<53 for double, I'm using 1<<52. I believe either is valid. Numbers >= 1<<52 can't have any fractional bits. It's ok to do the float->int->float conversion on numbers between 1<<53 and 1<<52 since they will all fit in 64 bits. We only have a problem if the double can't fit in i64. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D136508	2022-10-26 14:36:49 -07:00
James Y Knight	26fdad031c	[MIPS] Fix useDeprecatedPositionallyEncodedOperands errors. This is a follow-on to https://reviews.llvm.org/D134073. The number of MIPS16 changes here is a bit surprising. Many of the fields with mismatched names were NOT previously choosing the correct argument positionally, but instead doing something completely wrong (e.g. it would encode a register where an immediate was expected). But, machine-code generation for MIPS16 has never actually functioned. It's also fully untested, thus, the MIPS16 changes, despite changing behavior, break (and fix) zero tests. This change does not fix MIPS16 output, but it ought to be at least incrementally less broken. Outside MIPS16, I believe the only functional change is to the 'ginvi' instruction: it was previously encoding garbage into a field which was specified to be '00'. Fortunately, it was covered by tests -- and the tests were testing the incorrect behavior. So, fixed. Differential Revision: https://reviews.llvm.org/D134220	2022-10-26 14:06:08 -04:00
James Y Knight	23394cd810	[Sparc] Fix useDeprecatedPositionallyEncodedOperands errors. This is a follow-on to https://reviews.llvm.org/D134073. It renames a few fields to have consistent names, as well as renaming operands to match the field names. Behavior is unchanged by this cleanup. (The only generated code change is for the disassembler for LDSTUB/LDSTUBA, but in both old and new versions, it fails to add enough operands, and thus triggers a runtime abort. I will address that bug in a future commit.) Differential Revision: https://reviews.llvm.org/D134201	2022-10-26 14:06:07 -04:00
Piyou Chen	7d7940fd77	[RISCV] add svinval extension 1. Add the svinval extension support 2. Add the svinval Predicates for its instruction Note: the svinval instructions defined in https://reviews.llvm.org/D117654 Reviewed By: reames Differential Revision: https://reviews.llvm.org/D136571	2022-10-26 09:45:30 -07:00
Craig Topper	a61b74889f	[RISCV] Use vslide1down for i64 insertelt on RV32. Instead of using vslide1up, use vslide1down and build the other direction. This avoids the overlap constraint early clobber of vslide1up. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D136735	2022-10-26 09:43:12 -07:00
Pierre van Houtryve	63390dccd8	[GlobalISel] Add Predicates to GICombineRule Small QoL change to allow Predicates to be used in GICombineRule. Currently only one combine in the AMDGPU backend makes use of it. The implementation is pretty simple to get started but of course we can expand this later on and optimize predicate checking better if needed. Reviewed By: dsanders Differential Revision: https://reviews.llvm.org/D136681	2022-10-26 07:13:40 +00:00
chenglin.bi	9403a8bc37	[GlobalISel][AArch64] Fix miscompile caused by wrong G_ZEXT selection in GISel The miscompile case's G_ZEXT has a G_FREEZE source. Similar to D127154, this patch removed isDef32, relying on the AArch64MIPeephole optimizer to remove redundant SUBREG_TO_REG nodes also in GISel. Fix #58431 Reviewed By: paquette Differential Revision: https://reviews.llvm.org/D136433	2022-10-26 09:54:13 +08:00
Guozhi Wei	d24c93cc41	[X86] Enable reassociation for ADD instructions ADD is an associative and commutative operation, so we can do reassociation for it. Differential Revision: https://reviews.llvm.org/D136396	2022-10-26 00:46:13 +00:00
Douglas Yung	fc40c73921	Revert "Update supported features in the generic CPU configuration" This reverts commit 11afbf396e10e1b1e91a5991e2aec1916e29a910. There are 10 tests still failing after follow-up fix b5d0bf9b9853, this should get the following bots back to green: - https://lab.llvm.org/buildbot/#/builders/183/builds/8194 - https://lab.llvm.org/buildbot/#/builders/186/builds/9491 - https://lab.llvm.org/buildbot/#/builders/214/builds/3908 - https://lab.llvm.org/buildbot/#/builders/93/builds/11740 - https://lab.llvm.org/buildbot/#/builders/231/builds/4200 - https://lab.llvm.org/buildbot/#/builders/121/builds/24519 - https://lab.llvm.org/buildbot/#/builders/230/builds/4466 - https://lab.llvm.org/buildbot/#/builders/94/builds/11639 - https://lab.llvm.org/buildbot/#/builders/45/builds/9325 - https://lab.llvm.org/buildbot/#/builders/124/builds/5219 - https://lab.llvm.org/buildbot/#/builders/67/builds/8623 - https://lab.llvm.org/buildbot/#/builders/123/builds/13836 - https://lab.llvm.org/buildbot/#/builders/109/builds/49355 - https://lab.llvm.org/buildbot/#/builders/58/builds/27751 - https://lab.llvm.org/buildbot/#/builders/117/builds/9922 - https://lab.llvm.org/buildbot/#/builders/16/builds/37012 - https://lab.llvm.org/buildbot/#/builders/104/builds/9490 - https://lab.llvm.org/buildbot/#/builders/42/builds/7725 - https://lab.llvm.org/buildbot/#/builders/196/builds/20077 - https://lab.llvm.org/buildbot/#/builders/3/builds/15217 - https://lab.llvm.org/buildbot/#/builders/6/builds/15251 - https://lab.llvm.org/buildbot/#/builders/9/builds/15247 - https://lab.llvm.org/buildbot/#/builders/36/builds/26487 - https://lab.llvm.org/buildbot/#/builders/54/builds/2474 - https://lab.llvm.org/buildbot/#/builders/74/builds/14536 - https://lab.llvm.org/buildbot/#/builders/5/builds/28555	2022-10-25 16:34:08 -07:00
Philip Reames	269bc684e7	[LV][RISCV] Disable vectorization of epilogue loops Epilogue loop vectorization is a feature in the vectorizer intended to avoid running fully scalar code when the vector length of the main loop turns out to be either longer than the trip count of the actual loop, or with a huge remainder. In practice, this feature appears to not have been well tuned. I honestly don't think it should be on by default at all, but it definitely shouldn't be on for RISCV. Note that other targets have also disabled it, but they've done so via disabling interleaving - which is, well, completely unrelated - and we don't want to do that for RISCV. In the near term, many examples I'm seeing have terrible codegen for epilogue vectorization. We are greatly increasing code size for little value at reasonable VLEN values for small types. In the long term, the cases that epilogue vectorization is intended to handle are likely better handled via tail folding on RISCV. As an aside, I also don't really trust the correctness of epilogue vectorization. The code structure is such that otherwise straightforward changes sometimes break only epilogue vectorization. The reuse of an existing vplan without careful validation opens significant room for nasty bugs. Given how rarely the code is exercised, that is not a good combination. As such, this patch introduces a TTI hook, and completely disables epilogue vectorization on RISCV. Differential Revision: https://reviews.llvm.org/D136695	2022-10-25 14:28:02 -07:00
Dan Gohman	11afbf396e	Update supported features in the generic CPU configuration Accompanying https://reviews.llvm.org/D125728, this updates LLVM Codegen's "generic" CPU to enable the same new features. Differential Revision: https://reviews.llvm.org/D125729	2022-10-25 11:42:32 -07:00
Artem Belevich	0e8a414ab3	[CUDA, NVPTX] Added basic __bf16 support for NVPTX. Recent Clang changes expose __bf16 types for SSE2-enabled host compilations and that makes those types visible during GPU-side compilation, where it currently fails with Sema complaining that __bf16 is not supported. Considering that __bf16 is a storage-only type, enabling it for NVPTX if it's enabled on the host should pose no issues, correctness-wise. Recent NVIDIA GPUs have introduced bf16 support, so we'll likely grow better support for __bf16 on NVPTX going forward. Differential Revision: https://reviews.llvm.org/D136311	2022-10-25 11:08:06 -07:00
Caroline Concatto	9fbd57fbe2	[AArch64]SME2 single-multi and multi-multi INT dot product instructions[part2] This patch adds the assembly/disassembly for the following instructions: SDOT: (4-way, multiple and single vector): Multi-vector signed integer dot-product by vector. SDOT (4-way, multiple vectors): Multi-vector signed integer dot-product. UDOT: (4-way, multiple and single vector): Multi-vector unsigned integer dot-product by vector. (4-way, multiple vectors): Multi-vector unsigned integer dot-product. for groups of 2 and 4 ZA registers The reference can be found here: https://developer.arm.com/documentation/ddi0602/2022-09 Depends on: D135563 Differential Revision: https://reviews.llvm.org/D135760	2022-10-25 18:32:20 +01:00
Caroline Concatto	070f414604	[AArch64]SME2 single-multi and multi-multi INT/FP dot product instructions This patch adds the assembly/disassembly for the following instruction: INT: SDOT (2-way, multiple and single vector): Multi-vector signed integer dot-product by vector. (2-way, multiple vectors): Multi-vector signed integer dot-product. UDOT (2-way, multiple and single vector): Multi-vector unsigned integer dot-product by vector. (2-way, multiple vectors): Multi-vector unsigned integer dot-product. SUDOT (multiple and indexed vector): Multi-vector signed by unsigned integer dot-product by indexed elements. (multiple and single vector): Multi-vector signed by unsigned integer dot-product by vector. USDOT (multiple and single vector): Multi-vector unsigned by signed integer dot-product by vector. (multiple vectors): Multi-vector unsigned by signed integer dot-product. FP: BFDOT(multiple and single vector): Multi-vector BFloat16 floating-point dot-product by vector. (multiple vectors): Multi-vector BFloat16 floating-point dot-product. FDOT (multiple and single vector): Multi-vector half-precision floating-point dot-product by vector. (multiple vectors): Multi-vector half-precision floating-point dot-product. For set of 2 and 4 ZA registers The reference can be found here: https://developer.arm.com/documentation/ddi0602/2022-09 Depends on:D135455 Differential Revision: https://reviews.llvm.org/D135683	2022-10-25 18:28:11 +01:00
Joe Nash	01b8140d3a	[AMDGPU] Fix delay alu for VOPD with src2acc V_FMAC_F32 and V_DOT2C_F32_F16 have a dummy src2 operand tied to vdst to inform passes that the instructions read the dst operand. The VOPD versions of these instructions lacked the dummy operand, which was a problem for inserting s_delay_alu. Introduce the dummy src2 operand on the VOPD versions, and fix the VOPD operand tracking logic to account for it. Reviewed By: dp Differential Revision: https://reviews.llvm.org/D136629	2022-10-25 13:11:17 -04:00
Ulrich Weigand	96482ee434	[SystemZInstPrinter] Introduce markup tags emission SystemZ assembly syntax emission now leverages markup tags, if enabled. Author: Antonio Frighetto Differential Revision: https://reviews.llvm.org/D129868	2022-10-25 18:59:50 +02:00
Simon Pilgrim	ed1b0da557	[X86] combineConcatVectorOps - fold v4i64/v8x32 concat(broadcast(),broadcast()) -> permilps(concat()) Extend the existing v4f64 fold to handle v4i64/v8f32/v8i32 as well Fixes #58585	2022-10-25 15:37:42 +01:00
Jay Foad	191d70f2f5	[AMDGPU] Use Register in more places in SIInstrInfo. NFC. Also avoid using AMDGPU::NoRegister when it's not needed.	2022-10-25 15:04:58 +01:00
Simon Pilgrim	c4051b2606	[X86] Fold vbroadcast(bitcast(vbroadcast(src))) -> bitcast(vbroadcast(vbroadcast(src))) If the inner broadcast scalar type is smaller/same width as the outer broadcast scalar type then we can broadcast using the same inner type directly. Works for vbroadcast_load as well.	2022-10-25 14:03:43 +01:00
David Sherwood	1fe096ef59	[AArch64][SVE2] Add the SVE2.1 signed and unsigned 2-way dot instructions This patch adds the assembly/disassembly for the following instructions: SDOT : Signed integer 2-way dot product indexed and non-indexed UDOT : Unsigned integer 2-way dot product, indexed and non-indexed The reference can be found here: https://developer.arm.com/documentation/ddi0602/2022-09 Differential Revision: https://reviews.llvm.org/D136464	2022-10-25 10:26:17 +00:00
Cullen Rhodes	1e02a29e47	[AArch64][SVE] Use more flag-setting instructions If OP in PTEST(PG, OP(PG, ...)) has a flag-setting variant change the opcode so the PTEST becomes redundant. This patch extends this existing optimization in AArch64::optimizePTestInstr to cover all flag-setting opcodes. Reviewed By: peterwaller-arm Differential Revision: https://reviews.llvm.org/D136083	2022-10-25 09:02:21 +00:00
David Sherwood	d6a8a0798f	[AArch64][SVE2] Add the SVE2.1 bfmlslb and bfmlslt instructions This patch adds the assembly/disassembly for the following instructions: BFMLSLB : BFloat16 floating-point multiply-subtract long from single-precision (bottom) BFMLSLT : BFloat16 floating-point multiply-subtract long from single-precision (top) Both the vector and indexed forms are added for each. The reference can be found here: https://developer.arm.com/documentation/ddi0602/2022-09 Differential Revision: https://reviews.llvm.org/D136439	2022-10-25 08:51:41 +00:00
Caroline Concatto	a4e8492ba2	[AArch64]SME2 Multi-vector ternary indexed DOT and FMLA instructions This patch adds the assembly/disassembly for the following instruction: FP: FMLA (multiple and indexed vector): Multi-vector floating-point fused multiply-add by indexed element. FMLS(multiple and indexed vector): Multi-vector floating-point fused multiply-subtract by indexed element. BFDOT (multiple and indexed vector): Multi-vector BFloat16 floating-point dot-product by indexed element. FDOT (multiple and indexed vector): Multi-vector half-precision floating-point dot-product by indexed element. BFVDOT: Multi-vector BFloat16 floating-point vertical dot-product by indexed element. FVDOT: Multi-vector half-precision floating-point vertical dot-product by indexed element. INT: SDOT (2-way, multiple and indexed vector): Multi-vector signed integer dot-product by indexed element. (4-way, multiple and indexed vector): Multi-vector signed integer dot-product by indexed element. SUDOT (multiple and indexed vector): Multi-vector signed by unsigned integer dot-product by indexed elements. SUVDOT: Multi-vector signed by unsigned integer vertical dot-product by indexed element. UDOT (2-way, multiple and indexed vector): Multi-vector unsigned integer dot-product by indexed element. (4-way, multiple and indexed vector): Multi-vector unsigned integer dot-product by indexed element. USDOT (multiple and indexed vector): Multi-vector unsigned by signed integer dot-product by indexed element. USVDOT: Multi-vector unsigned by signed integer vertical dot-product by indexed element. For the multi-vec ternary indexed with 2 and 4 ZA single-vectors for 32 and 64 bits according to the instruction The reference can be found here: https://developer.arm.com/documentation/ddi0602/2022-09 Depends on:D135563 Differential Revision: https://reviews.llvm.org/D135676	2022-10-25 09:22:16 +01:00
Jay Foad	325927ffb9	[X86] Update LiveVariables in more cases in convertToThreeAddress Following on from D129634, this patch fixes more X86 CodeGen test failures with D129213 applied, which adds verification of LiveIntervals after the TwoAddressInstruction pass runs. These failures only showed up with LLVM_ENABLE_EXPENSIVE_CHECKS=ON which adds the equivalent of an implicit -verify-machineinstrs on all tests. Differential Revision: https://reviews.llvm.org/D136596	2022-10-25 09:21:51 +01:00
Sander de Smalen	19b9e6204a	[AArch64][SME] Fix chain for arm_locally_streaming functions. The Chain wasn't set correctly in the DAG for functions marked with aarch64_pstate_sm_body, which meant that SelectionDAG would dead-code some of the CopyToReg's. This didn't show up in the existing tests because all uses were in the same block, but when adding some control-flow, suddenly things would break. Reviewed By: kmclaughlin Differential Revision: https://reviews.llvm.org/D136579	2022-10-25 08:14:51 +00:00
Caroline Concatto	ecd78ec5b9	[AArch64]SME2 Multiple vectors Int/FP clamp instructions for two/four registers This patch adds the assembly/disassembly for the following instruction: Int: SCLAMP:Multi-vector signed clamp to minimum/maximum vector. UCLAMP:Multi-vector unsigned clamp to minimum/maximum vector. FP: FCLAMP: Multi-vector floating-point clamp to minimum/maximum number. The reference can be found here: https://developer.arm.com/documentation/ddi0602/2022-09 Depends on: D135563 Differential Revision: https://reviews.llvm.org/D135601	2022-10-25 09:12:27 +01:00
Caroline Concatto	021ed4ccf6	[AArch64]SME2 Single and multiple vectors SVE Destructive two/four registers[part2] This patch adds the assembly/disassembly for the following instruction: INT: SMAX (multiple and single vector): Multi-vector signed maximum by vector. (multiple vectors): Multi-vector signed maximum. SMIN (multiple and single vector): Multi-vector signed minimum by vector. (multiple vectors): Multi-vector signed minimum. UMAX (multiple and single vector): Multi-vector unsigned maximum by vector. (multiple vectors): Multi-vector unsigned maximum. UMIN (multiple and single vector): Multi-vector unsigned minimum by vector. (multiple vectors): Multi-vector unsigned minimum. SRSHL (multiple and single vector): Multi-vector signed rounding shift left by vector. (multiple vectors): Multi-vector signed rounding shift left. URSHL (multiple and single vector): Multi-vector unsigned rounding shift left by vector. (multiple vectors): Multi-vector unsigned rounding shift left. FP: FMAX (multiple and single vector): Multi-vector floating-point maximum by vector. (multiple vectors): Multi-vector floating-point maximum. FMAXNM (multiple and single vector): Multi-vector floating-point maximum number by vector. (multiple vectors): Multi-vector floating-point maximum number. FMIN (multiple and single vector): Multi-vector floating-point minimum by vector. (multiple vectors): Multi-vector floating-point minimum. FMINNM (multiple and single vector): Multi-vector floating-point minimum number by vector. (multiple vectors): Multi-vector floating-point minimum number. The reference can be found here: https://developer.arm.com/documentation/ddi0602/2022-09 It also updates ADD and SQDMULH Depends on: D135563 Differential Revision: https://reviews.llvm.org/D135599	2022-10-25 09:03:56 +01:00
Freddy Ye	fdac4c4e92	[X86] Add CMPCCXADD instructions. For more details about these instructions, please refer to the latest ISE document: https://www.intel.com/content/www/us/en/develop/download/intel-architecture-instruction-set-extensions-programming-reference.html Reviewed By: pengfei, skan Differential Revision: https://reviews.llvm.org/D135933	2022-10-25 14:33:39 +08:00
Craig Topper	a54f3347e8	[RISCV] Add shift amount operands of shift, rotate, and Zbs instructions to hasAllNBitUsers.	2022-10-24 22:07:22 -07:00
Craig Topper	223f466f4f	[RISCV] Add ORI to hasAllNBitUsers. If the immediate is negative with sufficient leading ones, then the upper bits of the other operand aren't demanded.	2022-10-24 21:33:17 -07:00
Craig Topper	d4dc036e70	[RISCV] Move vector cost table lookup out of the switch in getIntrinsicInstrCost. NFC This allows vectors to be looked up if the switch is used for the scalar version of an intrinsic. Extracted from D136508.	2022-10-24 20:32:22 -07:00
Craig Topper	63ed3d0eeb	[RISCV] Rename lowerFTRUNC_FCEIL_FFLOOR_FROUND to lowerVectorFTRUNC_FCEIL_FFLOOR_FROUND. NFC Extracted from D136508.	2022-10-24 20:32:22 -07:00
Xiang Li	996267d20e	[DirectX backend] set target triple to "dxil-ms-dx" Set target triple to "dxil-ms-dx" for DXIL at the end of DXILTranslateMetadata. Reviewed By: beanz Differential Revision: https://reviews.llvm.org/D131545	2022-10-24 14:49:31 -07:00
Caroline Concatto	440005b3c3	[AArch64]SME2 multi-vec to multi-vec FP/INT down convert 2/4 registers This patch implements: FCVTZS: Multi-vector floating-point convert to signed integer, rounding toward zero. FCVTZU: Multi-vector floating-point convert to unsigned integer, rounding toward zero. SCVTF: Multi-vector signed integer convert to floating-point. UCVTF: Multi-vector unsigned integer convert to floating-point. for 2 and 4 registers The reference can be found here: https://developer.arm.com/documentation/ddi0602/2022-09 Depends on: D135563 Differential Revision: https://reviews.llvm.org/D135593	2022-10-24 20:21:14 +01:00
Zhiyao Ma	7e8af2fc0c	[ARM] Support -mexecute-only with -mlong-calls. Instead of using constant pools, use movw movt pair. Differential Revision: https://reviews.llvm.org/D136203	2022-10-24 11:41:24 -07:00
Simon Pilgrim	da4baa9ebe	Fix MSVC "not all control paths return a value" warning. NFC.	2022-10-24 17:55:01 +01:00
Ahmed Bougacha	718bb22c28	[AArch64][PAC] Select XPAC for ptrauth.strip intrinsic. Differential Revision: https://reviews.llvm.org/D132385	2022-10-24 08:15:56 -07:00
Ahmed Bougacha	5b70001974	[AArch64][PAC] Add helper enum/functions to handle PAC keys/ops.	2022-10-24 08:15:17 -07:00
Amy Kwan	715301056e	[PowerPC] Fix invalid cast for vector shuffles when lowering to the xxsplti32dx instruction. When lowering vector shuffles into the xxsplti32dx instruction on Power10, we canonicalize the right operand to be a BUILD_VECTOR and as a result, get the commuted vector shuffle node. However, a vector shuffle will not always be returned as the result for a commuted vector shuffle. In such a scenario, this patch updates the original cast of a shuffle into a dyn_cast<> and checks if the shuffle is a valid vector shuffle node prior to obtaining the commuted shuffle mask. This patch also adds a new test case that demonstrates this scenario (primarily seen on 32-bit), and was originally a crash prior to this fix. Differential Revision: https://reviews.llvm.org/D135024	2022-10-24 09:56:54 -05:00
Dmitry Preobrazhensky	72711d4e5f	[AMDGPU][MC] Correct definition of aliases Differential Revision: https://reviews.llvm.org/D136370	2022-10-24 17:06:52 +03:00
Simon Pilgrim	e28e214e4f	[X86] Treat PSLLDQ/PSRLDQ as a shuffle not a shift This appears to be a copy+paste typo in the znver1/2 AMD SoG tables, treating the byte shift instructions like bit shifts Older AMD SoG referred to PSLLDQ/PSRLDQ as shuffles, and Agner/instlatx64 both report they are integer shuffles	2022-10-24 14:42:45 +01:00
Piyou Chen	f8b8426861	[RISCV] Add Svnapot extension Reviewed By: kito-cheng Differential Revision: https://reviews.llvm.org/D136570	2022-10-24 01:27:04 -07:00
gonglingqin	3be059377e	[LoongArch] Add support for ISD::FRAMEADDR and ISD::RETURNADDR For now, only support lowering frame/return address for current frame. Differential Revision: https://reviews.llvm.org/D136215	2022-10-24 15:12:28 +08:00
Sheng	fb937c4913	[NFC][X86] Fix typo: stric => strict	2022-10-24 04:56:26 +00:00
Sheng	9cedab654d	[NFC][M68k] Update the status of ISA implementation LINK/UNLNK have been implemented in 64d326c33c6d3f008.	2022-10-24 09:37:48 +08:00
Craig Topper	a41c1f3168	[RISCV] Make selectShiftMask look for negate opportunities after looking through AND. Previously we would only look for an AND or a negate. But it's possible there is a negate after looking through the AND.	2022-10-23 14:23:13 -07:00
Simon Pilgrim	4e8f847676	[X86][AVX512] Fold extract_element(bitcast(<X x i1>) -> bitcast(extract_subvector()) On AVX512, extract legal bool vectors as bool subvectors before bitcasting to scalars to avoid spilling to stack. This helps rust which internally represents bool vectors as bool arrays It also exposes more missed opportunities to use the KADD instruction to add masks together before moving to gpr Fixes #58546	2022-10-23 14:47:24 +01:00
Simon Pilgrim	c175d880a4	[X86] Add freeze(pshufd/permilps(x,imm)) -> pshufd/permilps(freeze(x),imm) folding Add X86 isGuaranteedNotToBeUndefOrPoisonForTargetNode / canCreateUndefOrPoisonForTargetNode overrides and add X86ISD::PSHUFD/VPERMILPI handling.	2022-10-23 10:39:12 +01:00
Craig Topper	ef72ff7b15	[RISCV] Fix unused variable warning. NFC	2022-10-22 22:29:03 -07:00
Krzysztof Parzyszek	2ec380b23f	[Hexagon] Improve handling of 32-bit mulh/mul_lohi on HVX Handle MULH[US] by normalizing them into newly invented nodes HexagonISD::(S|U|US)MUL_LOHI. On HVX v60, if only the high part of SMUL_LOHI is used, use the original MULHS expansion. In all other cases, expand the full product. On HVX v62, always expand the full product. Introduce Hexagon-specific LLVM IR intrinsics for 32x32 multiplication returning low/high parts.	2022-10-22 15:08:27 -07:00
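
Note on commit e94dc58dff ([RISCV] Inline scalar ceil/floor/trunc/...): the message argues that a double with magnitude >= 1<<52 has no fractional bits, so only smaller values need the float->int->float round trip, followed by a sign fix-up (the fsgnj step). The following is a minimal Python sketch of that idea for trunc only. It is an illustration, not the code from the patch; the names `TWO_52` and `inline_trunc` are hypothetical, and floating-point exception flags are not modeled.

```python
import math

# Doubles with magnitude >= 2**52 cannot carry fractional bits,
# so they are already integral and can be returned unchanged.
TWO_52 = float(1 << 52)  # hypothetical name, for illustration only


def inline_trunc(x: float) -> float:
    """Truncate toward zero via a float -> int -> float round trip."""
    # The comparison is False for NaN, so NaN skips the conversion,
    # as do infinities and any value that is already integral.
    if abs(x) < TWO_52:
        r = float(int(x))          # int() truncates toward zero
        # Restore the sign so that e.g. -0.3 yields -0.0, not +0.0
        # (this mirrors the final sign-injection step in the message).
        return math.copysign(r, x)
    return x
```

For example, `inline_trunc(-0.3)` yields a negative zero, which is why the sign fix-up matters unless signed zeros can be ignored.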


69334 Commits