69334 Commits

Author SHA1 Message Date
Craig Topper
e94dc58dff [RISCV] Inline scalar ceil/floor/trunc/rint/round/roundeven.
This avoids the call overhead as well as the save/restore of
fflags and the sNaN handling in the libm function.

The save/restore of fflags and the sNaN handling are needed for
correctness under -ftrapping-math. I think we can ignore them in the
default environment.

The inline sequence will generate an invalid exception for NaN
and an inexact exception if fractional bits are discarded.

I've used a custom inserter to explicitly create the control flow
around the float->int->float conversion.

We can probably avoid the final fsgnj after the conversion for
no signed zeros FMF, but I'll leave that for future work.

Note that the comparison constant is slightly different from the one glibc
uses. They use 1<<53 for double; I'm using 1<<52. I believe either is valid.
Numbers >= 1<<52 can't have any fractional bits. It's OK to do the
float->int->float conversion on numbers between 1<<52 and 1<<53 since
they will all fit in i64. We only have a problem if the double can't fit
in i64.
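
The bypass-and-convert structure described above can be modelled roughly in Python. This is a sketch, not the backend's code: `inline_round` is a hypothetical name, Python's `math.floor`/`math.ceil`/`math.trunc` and `round` (ties-to-even) stand in for the float->int conversion with a static rounding mode, and fflags/exceptions are not modelled.

```python
import math

# Magnitudes at or above 2**52 cannot have fractional bits in a double.
TWO_52 = float(1 << 52)


def inline_round(x: float, to_int) -> float:
    """Hypothetical model of the inline ceil/floor/trunc/roundeven sequence.

    to_int stands in for the float->int conversion with a fixed rounding
    mode (math.floor, math.ceil, math.trunc, or round for ties-to-even).
    """
    # In this model NaN, infinities and |x| >= 2**52 fail the comparison,
    # skip the conversion and are returned unchanged.
    if not abs(x) < TWO_52:
        return x
    # |x| < 2**52, so the intermediate integer always fits in i64.
    r = float(to_int(x))
    # The final fsgnj: restore the input's sign so results such as
    # ceil(-0.5) come out as -0.0 rather than +0.0.
    return math.copysign(r, x)


print(inline_round(-0.5, math.ceil))  # -0.0
print(inline_round(2.5, round))       # 2.0 (ties-to-even)
```

The copysign step is why the final fsgnj matters: without it, any input that rounds to zero from the negative side would lose its sign.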

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D136508
2022-10-26 14:36:49 -07:00
James Y Knight
26fdad031c [MIPS] Fix useDeprecatedPositionallyEncodedOperands errors.
This is a follow-on to https://reviews.llvm.org/D134073.

The number of MIPS16 changes here is a bit surprising. Many of the
fields with mismatched names were NOT previously choosing the correct
argument positionally, but instead doing something completely wrong
(e.g. it would encode a register where an immediate was expected).

However, machine-code generation for MIPS16 has never actually worked.
It's also fully untested, so the MIPS16 changes, despite changing
behavior, break (and fix) zero tests. This change does not fix
MIPS16 output, but it ought to leave it at least incrementally less broken.

Outside MIPS16, I believe the only functional change is to the 'ginvi'
instruction: it was previously encoding garbage into a field which was
specified to be '00'. Fortunately, it was covered by tests -- and the
tests were testing the incorrect behavior. So, fixed.

Differential Revision: https://reviews.llvm.org/D134220
2022-10-26 14:06:08 -04:00
James Y Knight
23394cd810 [Sparc] Fix useDeprecatedPositionallyEncodedOperands errors.
This is a follow-on to https://reviews.llvm.org/D134073.

It renames a few fields to have consistent names, as well as renaming
operands to match the field names.

Behavior is unchanged by this cleanup. (The only generated code change
is for the disassembler for LDSTUB/LDSTUBA, but in both old and new
versions, it fails to add enough operands, and thus triggers a runtime
abort. I will address that bug in a future commit.)

Differential Revision: https://reviews.llvm.org/D134201
2022-10-26 14:06:07 -04:00
Piyou Chen
7d7940fd77 [RISCV] add svinval extension
1. Add the svinval extension support
2. Add the svinval Predicates for its instruction

Note: the svinval instructions are defined in https://reviews.llvm.org/D117654

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D136571
2022-10-26 09:45:30 -07:00
Craig Topper
a61b74889f [RISCV] Use vslide1down for i64 insertelt on RV32.
Instead of using vslide1up, use vslide1down and build the other
direction. This avoids the overlap constraint early clobber of
vslide1up.
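
A small list-based model of the two slide instructions, assuming VL=2 on a vector of i32 elements with element 0 holding the low half (little-endian). Both orders build the same {lo, hi} pair; the property the commit cares about, the early-clobber overlap constraint on vslide1up, is a register-allocation constraint that this model does not capture. All function names are hypothetical.

```python
from typing import List


def vslide1up(vs2: List[int], x: int) -> List[int]:
    # vd[0] = x, vd[i] = vs2[i - 1]: every element moves up one slot.
    return [x] + vs2[:-1]


def vslide1down(vs2: List[int], x: int) -> List[int]:
    # vd[i] = vs2[i + 1], vd[vl - 1] = x: every element moves down one slot.
    return vs2[1:] + [x]


def split_i64(value: int):
    # RV32 has no 64-bit GPRs, so an i64 scalar arrives as two 32-bit halves.
    return value & 0xFFFFFFFF, (value >> 32) & 0xFFFFFFFF


def build_with_slide1up(value: int, vec: List[int]) -> List[int]:
    lo, hi = split_i64(value)
    # Insert hi first, then push it up by inserting lo below it.
    return vslide1up(vslide1up(vec, hi), lo)


def build_with_slide1down(value: int, vec: List[int]) -> List[int]:
    lo, hi = split_i64(value)
    # Build "the other direction": slide lo in at the top, then hi after it.
    return vslide1down(vslide1down(vec, lo), hi)


v = 0x1122334455667788
print([hex(e) for e in build_with_slide1down(v, [0, 0])])  # ['0x55667788', '0x11223344']
```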

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D136735
2022-10-26 09:43:12 -07:00
Pierre van Houtryve
63390dccd8 [GlobalISel] Add Predicates to GICombineRule
Small QoL change to allow Predicates to be used in GICombineRule.
Currently only one combine in the AMDGPU backend makes use of it.

The implementation is deliberately simple to get started, but we can expand it later and optimize predicate checking if needed.

Reviewed By: dsanders

Differential Revision: https://reviews.llvm.org/D136681
2022-10-26 07:13:40 +00:00
chenglin.bi
9403a8bc37 [GlobalISel][AArch64] Fix miscompile caused by wrong G_ZEXT selection in GISel
The miscompile case's G_ZEXT has a G_FREEZE source. Similar to D127154, this patch removes isDef32, relying on the AArch64MIPeephole optimizer to remove redundant SUBREG_TO_REG nodes in GISel as well.

Fix #58431

Reviewed By: paquette

Differential Revision: https://reviews.llvm.org/D136433
2022-10-26 09:54:13 +08:00
Guozhi Wei
d24c93cc41 [X86] Enable reassociation for ADD instructions
ADD is an associative and commutative operation, so we can do reassociation for it.
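
Why the transform is value-preserving, sketched in Python under the assumption of 32-bit wrapping integer addition (the flag-setting side of x86 ADD is not modelled): regrouping a serial chain keeps the result while letting two of the adds execute independently. The helper names are hypothetical.

```python
import random

MASK32 = (1 << 32) - 1


def add32(a: int, b: int) -> int:
    # 32-bit wrapping add, i.e. the integer result of an x86 ADD.
    return (a + b) & MASK32


def serial_chain(a, b, c, d):
    # ((a + b) + c) + d: each ADD depends on the previous one (depth 3).
    return add32(add32(add32(a, b), c), d)


def reassociated(a, b, c, d):
    # (a + b) + (c + d): the two inner ADDs are independent (depth 2).
    return add32(add32(a, b), add32(c, d))


rng = random.Random(0)
for _ in range(1000):
    vals = [rng.getrandbits(32) for _ in range(4)]
    assert serial_chain(*vals) == reassociated(*vals)
print("reassociation preserved every result")
```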

Differential Revision: https://reviews.llvm.org/D136396
2022-10-26 00:46:13 +00:00
Douglas Yung
fc40c73921 Revert "Update supported features in the generic CPU configuration"
This reverts commit 11afbf396e10e1b1e91a5991e2aec1916e29a910.

Ten tests are still failing after follow-up fix b5d0bf9b9853; this revert should get the following bots back to green:
 - https://lab.llvm.org/buildbot/#/builders/183/builds/8194
 - https://lab.llvm.org/buildbot/#/builders/186/builds/9491
 - https://lab.llvm.org/buildbot/#/builders/214/builds/3908
 - https://lab.llvm.org/buildbot/#/builders/93/builds/11740
 - https://lab.llvm.org/buildbot/#/builders/231/builds/4200
 - https://lab.llvm.org/buildbot/#/builders/121/builds/24519
 - https://lab.llvm.org/buildbot/#/builders/230/builds/4466
 - https://lab.llvm.org/buildbot/#/builders/94/builds/11639
 - https://lab.llvm.org/buildbot/#/builders/45/builds/9325
 - https://lab.llvm.org/buildbot/#/builders/124/builds/5219
 - https://lab.llvm.org/buildbot/#/builders/67/builds/8623
 - https://lab.llvm.org/buildbot/#/builders/123/builds/13836
 - https://lab.llvm.org/buildbot/#/builders/109/builds/49355
 - https://lab.llvm.org/buildbot/#/builders/58/builds/27751
 - https://lab.llvm.org/buildbot/#/builders/117/builds/9922
 - https://lab.llvm.org/buildbot/#/builders/16/builds/37012
 - https://lab.llvm.org/buildbot/#/builders/104/builds/9490
 - https://lab.llvm.org/buildbot/#/builders/42/builds/7725
 - https://lab.llvm.org/buildbot/#/builders/196/builds/20077
 - https://lab.llvm.org/buildbot/#/builders/3/builds/15217
 - https://lab.llvm.org/buildbot/#/builders/6/builds/15251
 - https://lab.llvm.org/buildbot/#/builders/9/builds/15247
 - https://lab.llvm.org/buildbot/#/builders/36/builds/26487
 - https://lab.llvm.org/buildbot/#/builders/54/builds/2474
 - https://lab.llvm.org/buildbot/#/builders/74/builds/14536
 - https://lab.llvm.org/buildbot/#/builders/5/builds/28555
2022-10-25 16:34:08 -07:00
Philip Reames
269bc684e7 [LV][RISCV] Disable vectorization of epilogue loops
Epilogue loop vectorization is a feature in the vectorizer intended to avoid running fully scalar code when the main loop's vector length turns out to be longer than the loop's actual trip count, or leaves a large remainder.
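
A simplified accounting model of how iterations are split between the main vector loop, a vectorized epilogue and the scalar remainder. It ignores runtime checks and the actual control flow, so treat it as illustration only; the function name and parameters are hypothetical.

```python
def split_iterations(trip_count: int, main_step: int, epilogue_vf: int = 0):
    """Return (main vector iterations, epilogue vector iterations, scalar iterations).

    main_step is VF * UF of the main loop; epilogue_vf = 0 means no
    vectorized epilogue, so everything left over runs as scalar code.
    """
    main_iters = trip_count // main_step
    remainder = trip_count - main_iters * main_step
    epi_iters = 0
    if epilogue_vf:
        epi_iters = remainder // epilogue_vf
        remainder -= epi_iters * epilogue_vf
    return main_iters, epi_iters, remainder


# A short loop whose trip count is below one main-loop step:
print(split_iterations(20, 32))     # (0, 0, 20): fully scalar
print(split_iterations(20, 32, 8))  # (0, 2, 4): the epilogue catches most of it
```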

In practice, this feature appears to not have been well tuned. I honestly don't think it should be on by default at all, but it definitely shouldn't be on for RISCV. Note that other targets have also disabled it, but they've done so via disabling interleaving - which is, well, completely unrelated - and we don't want to do that for RISCV.

In the near term, many examples I'm seeing have terrible codegen for epilogue vectorization. We are greatly increasing code size for little value at reasonable VLEN values for small types. In the long term, the cases that epilogue vectorization is intended to handle are likely better handled via tail folding on RISCV.

As an aside, I also don't really trust the correctness of epilogue vectorization. The code structure is such that otherwise straightforward changes sometimes break only epilogue vectorization. The reuse of an existing vplan without careful validation opens significant room for nasty bugs. Given how rarely the code is exercised, that is not a good combination.

As such, this patch introduces a TTI hook, and completely disables epilogue vectorization on RISCV.

Differential Revision: https://reviews.llvm.org/D136695
2022-10-25 14:28:02 -07:00
Dan Gohman
11afbf396e Update supported features in the generic CPU configuration
Accompanying https://reviews.llvm.org/D125728, this updates LLVM
Codegen's "generic" CPU to enable the same new features.

Differential Revision: https://reviews.llvm.org/D125729
2022-10-25 11:42:32 -07:00
Artem Belevich
0e8a414ab3 [CUDA, NVPTX] Added basic __bf16 support for NVPTX.
Recent Clang changes expose __bf16 types for SSE2-enabled host compilations and
that makes those types visible during GPU-side compilation, where compilation
currently fails with Sema complaining that __bf16 is not supported.

Considering that __bf16 is a storage-only type, enabling it for NVPTX if it's
enabled on the host should pose no issues, correctness-wise.
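
Because __bf16 is storage-only, arithmetic happens after widening to float; the type itself is just the top 16 bits of an IEEE binary32 value. A bit-level Python sketch, assuming round-to-nearest-even when narrowing. The helper names are hypothetical, and this is not how Clang or the NVPTX backend lowers the type.

```python
import struct


def f32_bits(x: float) -> int:
    # Bit pattern of x rounded to IEEE binary32.
    return struct.unpack("<I", struct.pack("<f", x))[0]


def to_bf16(x: float) -> int:
    b = f32_bits(x)
    if (b & 0x7F800000) == 0x7F800000 and (b & 0x007FFFFF):
        # NaN: keep the sign and top payload bits, force the quiet bit.
        return (b >> 16) | 0x0040
    # Round to nearest, ties to even, on the 16 discarded bits.
    rounding = 0x7FFF + ((b >> 16) & 1)
    return ((b + rounding) >> 16) & 0xFFFF


def from_bf16(h: int) -> float:
    # Widening is exact: the bf16 bits become the top half of a binary32.
    return struct.unpack("<f", struct.pack("<I", (h & 0xFFFF) << 16))[0]


print(hex(to_bf16(1.0)), from_bf16(0x3F82))  # 0x3f80 1.015625
```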

Recent NVIDIA GPUs have introduced bf16 support, so we'll likely grow better
support for __bf16 on NVPTX going forward.

Differential Revision: https://reviews.llvm.org/D136311
2022-10-25 11:08:06 -07:00
Caroline Concatto
9fbd57fbe2 [AArch64]SME2 single-multi and multi-multi INT dot product instructions[part2]
This patch adds the assembly/disassembly for the following instructions:
        SDOT (4-way, multiple and single vector): Multi-vector signed integer dot-product by vector.
             (4-way, multiple vectors): Multi-vector signed integer dot-product.
        UDOT (4-way, multiple and single vector): Multi-vector unsigned integer dot-product by vector.
             (4-way, multiple vectors): Multi-vector unsigned integer dot-product.
    for groups of 2 and 4 ZA registers
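
Element-level semantics of the 4-way forms, sketched in Python: each 32-bit accumulator gains the sum of four 8-bit products. This assumes 8-bit sources and 32-bit wrapping accumulators (returned as bit patterns), models a ZA vector group as a list of lists, and uses hypothetical function names; pass signed values for SDOT and unsigned values for UDOT.

```python
from typing import List

MASK32 = (1 << 32) - 1


def dot_4way(acc: List[int], zn: List[int], zm: List[int]) -> List[int]:
    # Each 32-bit accumulator lane i consumes source elements 4*i .. 4*i+3.
    assert len(zn) == len(zm) == 4 * len(acc)
    out = []
    for i, a in enumerate(acc):
        s = sum(zn[4 * i + k] * zm[4 * i + k] for k in range(4))
        out.append((a + s) & MASK32)  # wrap to a 32-bit pattern
    return out


def multi_by_single(za_group, zn_group, zm):
    # "Multiple and single vector": the same Zm is used for every vector in the group.
    return [dot_4way(acc, zn, zm) for acc, zn in zip(za_group, zn_group)]


def multi_by_multi(za_group, zn_group, zm_group):
    # "Multiple vectors": vector k of the Zn group pairs with vector k of the Zm group.
    return [dot_4way(acc, zn, zm) for acc, zn, zm in zip(za_group, zn_group, zm_group)]


print(dot_4way([0], [1, 2, 3, 4], [5, 6, 7, 8]))  # [70]
```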

The reference can be found here:
                https://developer.arm.com/documentation/ddi0602/2022-09

Depends on: D135563

Differential Revision: https://reviews.llvm.org/D135760
2022-10-25 18:32:20 +01:00
Caroline Concatto
070f414604 [AArch64]SME2 single-multi and multi-multi INT/FP dot product instructions
This patch adds the assembly/disassembly for the following instructions:
INT:
  SDOT (2-way, multiple and single vector): Multi-vector signed integer dot-product by vector.
       (2-way, multiple vectors): Multi-vector signed integer dot-product.
  UDOT (2-way, multiple and single vector): Multi-vector unsigned integer dot-product by vector.
       (2-way, multiple vectors): Multi-vector unsigned integer dot-product.
  SUDOT (multiple and indexed vector): Multi-vector signed by unsigned integer dot-product by indexed elements.
        (multiple and single vector): Multi-vector signed by unsigned integer dot-product by vector.
  USDOT (multiple and single vector): Multi-vector unsigned by signed integer dot-product by vector.
        (multiple vectors): Multi-vector unsigned by signed integer dot-product.
FP:
  BFDOT (multiple and single vector): Multi-vector BFloat16 floating-point dot-product by vector.
        (multiple vectors): Multi-vector BFloat16 floating-point dot-product.

  FDOT (multiple and single vector): Multi-vector half-precision floating-point dot-product by vector.
       (multiple vectors): Multi-vector half-precision floating-point dot-product.
For sets of 2 and 4 ZA registers.

The reference can be found here:
        https://developer.arm.com/documentation/ddi0602/2022-09

Depends on: D135455

Differential Revision: https://reviews.llvm.org/D135683
2022-10-25 18:28:11 +01:00
Joe Nash
01b8140d3a [AMDGPU] Fix delay alu for VOPD with src2acc
V_FMAC_F32 and V_DOT2C_F32_F16 have a dummy src2 operand tied to vdst to
inform passes that the instructions read the dst operand. The VOPD
versions of these instructions lacked the dummy operand, which was a
problem for inserting s_delay_alu.
Introduce the dummy src2 operand on the VOPD versions, and fix the VOPD operand
tracking logic to account for it.

Reviewed By: dp

Differential Revision: https://reviews.llvm.org/D136629
2022-10-25 13:11:17 -04:00
Ulrich Weigand
96482ee434 [SystemZInstPrinter] Introduce markup tags emission
SystemZ assembly syntax emission now leverages markup tags, if enabled.

Author: Antonio Frighetto

Differential Revision: https://reviews.llvm.org/D129868
2022-10-25 18:59:50 +02:00
Simon Pilgrim
ed1b0da557 [X86] combineConcatVectorOps - fold v4i64/v8x32 concat(broadcast(),broadcast()) -> permilps(concat())
Extend the existing v4f64 fold to handle v4i64/v8f32/v8i32 as well

Fixes #58585
2022-10-25 15:37:42 +01:00
Jay Foad
191d70f2f5 [AMDGPU] Use Register in more places in SIInstrInfo. NFC.
Also avoid using AMDGPU::NoRegister when it's not needed.
2022-10-25 15:04:58 +01:00
Simon Pilgrim
c4051b2606 [X86] Fold vbroadcast(bitcast(vbroadcast(src))) -> bitcast(vbroadcast(vbroadcast(src)))
If the inner broadcast scalar type is smaller/same width as the outer broadcast scalar type then we can broadcast using the same inner type directly. Works for vbroadcast_load as well.
2022-10-25 14:03:43 +01:00
David Sherwood
1fe096ef59 [AArch64][SVE2] Add the SVE2.1 signed and unsigned 2-way dot instructions
This patch adds the assembly/disassembly for the following instructions:

SDOT : Signed integer 2-way dot product, indexed and non-indexed
UDOT : Unsigned integer 2-way dot product, indexed and non-indexed
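
A sketch of the 2-way semantics: each 32-bit accumulator gains two 16-bit products. For the indexed form it assumes the usual SVE convention that the index picks one 32-bit group of Zm within each 128-bit segment (four groups per segment for 32-bit accumulators). Names are hypothetical; accumulators are returned as 32-bit bit patterns.

```python
from typing import List, Optional

MASK32 = (1 << 32) - 1
GROUPS_PER_SEGMENT = 4  # 128-bit segment / 32-bit accumulator


def dot_2way(acc: List[int], zn: List[int], zm: List[int],
             index: Optional[int] = None) -> List[int]:
    out = []
    for i, a in enumerate(acc):
        # Non-indexed: pair i of Zm. Indexed: one pair per 128-bit segment,
        # chosen by `index`, shared by every lane in that segment.
        if index is None:
            j = i
        else:
            j = (i // GROUPS_PER_SEGMENT) * GROUPS_PER_SEGMENT + index
        s = zn[2 * i] * zm[2 * j] + zn[2 * i + 1] * zm[2 * j + 1]
        out.append((a + s) & MASK32)
    return out


zn = list(range(1, 9))
zm = list(range(1, 9))
print(dot_2way([0] * 4, zn, zm, index=0))  # [5, 11, 17, 23]
```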

The reference can be found here:
https://developer.arm.com/documentation/ddi0602/2022-09

Differential Revision: https://reviews.llvm.org/D136464
2022-10-25 10:26:17 +00:00
Cullen Rhodes
1e02a29e47 [AArch64][SVE] Use more flag-setting instructions
If OP in PTEST(PG, OP(PG, ...)) has a flag-setting variant change the
opcode so the PTEST becomes redundant. This patch extends this existing
optimization in AArch64::optimizePTestInstr to cover all flag-setting
opcodes.

Reviewed By: peterwaller-arm

Differential Revision: https://reviews.llvm.org/D136083
2022-10-25 09:02:21 +00:00
David Sherwood
d6a8a0798f [AArch64][SVE2] Add the SVE2.1 bfmlslb and bfmlslt instructions
This patch adds the assembly/disassembly for the following instructions:

BFMLSLB : BFloat16 floating-point multiply-subtract long
          from single-precision (bottom)
BFMLSLT : BFloat16 floating-point multiply-subtract long
          from single-precision (top)

Both the vector and indexed forms are added for each.
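
An element-level sketch of the bottom/top selection for the vector form: bottom reads the even-numbered bf16 elements and top the odd-numbered ones, each paired with one fp32 accumulator. It assumes the inputs are already-widened bf16 values (exactly representable floats) and ignores the instruction's fused rounding, which Python's double arithmetic does not reproduce in general. The function name is hypothetical.

```python
from typing import List


def bfmlsl(acc: List[float], zn: List[float], zm: List[float], top: bool) -> List[float]:
    # fp32 accumulator i pairs with bf16 element 2*i (bottom) or 2*i+1 (top).
    off = 1 if top else 0
    return [a - zn[2 * i + off] * zm[2 * i + off] for i, a in enumerate(acc)]


print(bfmlsl([10.0, 20.0], [1.0, 2.0, 3.0, 4.0], [0.5, 0.25, 2.0, 1.0], top=False))  # [9.5, 14.0]
```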

The reference can be found here:
https://developer.arm.com/documentation/ddi0602/2022-09

Differential Revision: https://reviews.llvm.org/D136439
2022-10-25 08:51:41 +00:00
Caroline Concatto
a4e8492ba2 [AArch64]SME2 Multi-vector ternary indexed DOT and FMLA instructions
This patch adds the assembly/disassembly for the following instructions:
FP:
  FMLA (multiple and indexed vector): Multi-vector floating-point fused multiply-add by indexed element.
  FMLS (multiple and indexed vector): Multi-vector floating-point fused multiply-subtract by indexed element.

  BFDOT (multiple and indexed vector): Multi-vector BFloat16 floating-point dot-product by indexed element.
  FDOT (multiple and indexed vector): Multi-vector half-precision floating-point dot-product by indexed element.
  BFVDOT: Multi-vector BFloat16 floating-point vertical dot-product by indexed element.
  FVDOT: Multi-vector half-precision floating-point vertical dot-product by indexed element.

INT:
  SDOT (2-way, multiple and indexed vector): Multi-vector signed integer dot-product by indexed element.
       (4-way, multiple and indexed vector): Multi-vector signed integer dot-product by indexed element.
  SUDOT (multiple and indexed vector): Multi-vector signed by unsigned integer dot-product by indexed elements.
  SUVDOT: Multi-vector signed by unsigned integer vertical dot-product by indexed element.
  UDOT (2-way, multiple and indexed vector): Multi-vector unsigned integer dot-product by indexed element.
       (4-way, multiple and indexed vector): Multi-vector unsigned integer dot-product by indexed element.
  USDOT (multiple and indexed vector): Multi-vector unsigned by signed integer dot-product by indexed element.
  USVDOT: Multi-vector unsigned by signed integer vertical dot-product by indexed element.

These cover the multi-vector ternary indexed forms with groups of 2 and 4 ZA
single-vectors, in 32-bit and 64-bit variants depending on the instruction.

The reference can be found here:
    https://developer.arm.com/documentation/ddi0602/2022-09

Depends on: D135563

Differential Revision: https://reviews.llvm.org/D135676
2022-10-25 09:22:16 +01:00
Jay Foad
325927ffb9 [X86] Update LiveVariables in more cases in convertToThreeAddress
Following on from D129634, this patch fixes more X86 CodeGen test
failures with D129213 applied, which adds verification of LiveIntervals
after the TwoAddressInstruction pass runs. These failures only showed up
with LLVM_ENABLE_EXPENSIVE_CHECKS=ON which adds the equivalent of an
implicit -verify-machineinstrs on all tests.

Differential Revision: https://reviews.llvm.org/D136596
2022-10-25 09:21:51 +01:00
Sander de Smalen
19b9e6204a [AArch64][SME] Fix chain for arm_locally_streaming functions.
The chain wasn't set correctly in the DAG for functions marked
with aarch64_pstate_sm_body, which meant that SelectionDAG would
dead-code eliminate some of the CopyToReg nodes. This didn't show up
in the existing tests because all uses were in the same block, but
adding some control flow would suddenly break things.

Reviewed By: kmclaughlin

Differential Revision: https://reviews.llvm.org/D136579
2022-10-25 08:14:51 +00:00
Caroline Concatto
ecd78ec5b9 [AArch64]SME2 Multiple vectors Int/FP clamp instructions for two/four registers
This patch adds the assembly/disassembly for the following instructions:
Int:
  SCLAMP: Multi-vector signed clamp to minimum/maximum vector.
  UCLAMP: Multi-vector unsigned clamp to minimum/maximum vector.
FP:
  FCLAMP: Multi-vector floating-point clamp to minimum/maximum number.
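
A sketch of the integer clamp applied to every vector of a 2- or 4-register group, assuming Zn supplies the per-element lower bound and Zm the upper bound, with max applied before min. FCLAMP's maximum/minimum-number NaN handling is not modelled, and the function name is hypothetical.

```python
from typing import List


def clamp_group(zd_group: List[List[int]], zn: List[int], zm: List[int]) -> List[List[int]]:
    # For every vector in the group: max with the lower bound, then min with the upper bound.
    return [[min(max(x, lo), hi) for x, lo, hi in zip(zd, zn, zm)] for zd in zd_group]


print(clamp_group([[1, 5, 9], [-3, 4, 12]], [0, 0, 0], [8, 8, 8]))  # [[1, 5, 8], [0, 4, 8]]
```

Because max runs first, a lower bound above the upper bound yields the upper bound in this model.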

The reference can be found here:

    https://developer.arm.com/documentation/ddi0602/2022-09

Depends on: D135563

Differential Revision: https://reviews.llvm.org/D135601
2022-10-25 09:12:27 +01:00
Caroline Concatto
021ed4ccf6 [AArch64]SME2 Single and multiple vectors SVE Destructive two/four registers[part2]
This patch adds the assembly/disassembly for the following instructions:
INT:
  SMAX (multiple and single vector): Multi-vector signed maximum by vector.
       (multiple vectors): Multi-vector signed maximum.
  SMIN (multiple and single vector): Multi-vector signed minimum by vector.
       (multiple vectors): Multi-vector signed minimum.
  UMAX (multiple and single vector): Multi-vector unsigned maximum by vector.
       (multiple vectors): Multi-vector unsigned maximum.
  UMIN (multiple and single vector): Multi-vector unsigned minimum by vector.
       (multiple vectors): Multi-vector unsigned minimum.
  SRSHL (multiple and single vector): Multi-vector signed rounding shift left by vector.
        (multiple vectors): Multi-vector signed rounding shift left.
  URSHL (multiple and single vector): Multi-vector unsigned rounding shift left by vector.
        (multiple vectors): Multi-vector unsigned rounding shift left.
FP:
  FMAX (multiple and single vector): Multi-vector floating-point maximum by vector.
       (multiple vectors): Multi-vector floating-point maximum.
  FMAXNM (multiple and single vector): Multi-vector floating-point maximum number by vector.
         (multiple vectors): Multi-vector floating-point maximum number.
  FMIN (multiple and single vector): Multi-vector floating-point minimum by vector.
       (multiple vectors): Multi-vector floating-point minimum.
  FMINNM (multiple and single vector): Multi-vector floating-point minimum number by vector.
         (multiple vectors): Multi-vector floating-point minimum number.
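
Of the instructions above, the rounding shifts are the least obvious, so here is a per-element sketch of SRSHL on 32-bit elements: a positive amount shifts left, a negative amount performs an arithmetic right shift that rounds by adding half the weight of the last discarded bit. It assumes the shift amount has already been taken as a signed value (the instruction reads it from the low byte of each Zm element); function names are hypothetical.

```python
def wrap_signed(v: int, bits: int) -> int:
    # Truncate to `bits` and reinterpret as two's complement.
    v &= (1 << bits) - 1
    return v - (1 << bits) if v >> (bits - 1) else v


def srshl(x: int, shift: int, bits: int = 32) -> int:
    if shift >= 0:
        # Left shift; bits shifted out of the element are lost.
        return wrap_signed(x << shift, bits)
    n = -shift
    # Rounding right shift: add 2**(n-1) before the arithmetic shift.
    return wrap_signed((x + (1 << (n - 1))) >> n, bits)


print(srshl(5, -1), srshl(-5, -1), srshl(7, -2))  # 3 -2 2
```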

The reference can be found here:

https://developer.arm.com/documentation/ddi0602/2022-09

It also updates ADD and SQDMULH.

Depends on: D135563

Differential Revision: https://reviews.llvm.org/D135599
2022-10-25 09:03:56 +01:00
Freddy Ye
fdac4c4e92 [X86] Add CMPCCXADD instructions.
For more details about these instructions, please refer to the latest ISE document: https://www.intel.com/content/www/us/en/develop/download/intel-architecture-instruction-set-extensions-programming-reference.html
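
A single-threaded Python model of the operation as we read the ISE description: the memory value is compared with the first register, the second register is added to memory only if the condition holds, and the first register always receives the original memory value. Atomicity and EFLAGS are not modelled, the condition is a hypothetical predicate over (memory value, register value), and the 32-bit operand size is an assumption.

```python
from typing import Callable, Dict

MASK32 = 0xFFFFFFFF


def cmpccxadd(mem: Dict[int, int], addr: int, src1: int, src2: int,
              cond: Callable[[int, int], bool]) -> int:
    """Return the new value of src1; mem[addr] is updated in place."""
    old = mem[addr]
    # The comparison is between the memory value and src1.
    if cond(old, src1):
        # Condition met: add src2 to memory.
        mem[addr] = (old + src2) & MASK32
    # src1 always receives the original memory value.
    return old


memory = {0x1000: 5}
equal = lambda m, s: m == s  # e.g. an "equal/zero" condition
print(cmpccxadd(memory, 0x1000, 5, 3, equal), memory[0x1000])  # 5 8
```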

Reviewed By: pengfei, skan

Differential Revision: https://reviews.llvm.org/D135933
2022-10-25 14:33:39 +08:00
Craig Topper
a54f3347e8 [RISCV] Add shift amount operands of shift, rotate, and Zbs instructions to hasAllNBitUsers. 2022-10-24 22:07:22 -07:00
Craig Topper
223f466f4f [RISCV] Add ORI to hasAllNBitUsers.
If the immediate is negative with sufficient leading ones, then
the upper bits of the other operand aren't demanded.
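
A quick check of the reasoning, assuming RV64 and a 12-bit immediate that ORI sign-extends: when every bit at or above position N of the extended immediate is one, those result bits are one regardless of the other operand, so only its low N bits are demanded. Helper names are hypothetical.

```python
XLEN = 64
MASK = (1 << XLEN) - 1


def ori(x: int, imm12: int) -> int:
    # ORI sign-extends its 12-bit immediate; & MASK yields the
    # two's-complement XLEN-bit pattern of a negative imm12.
    return (x | (imm12 & MASK)) & MASK


def upper_bits_not_demanded(imm12: int, n: int) -> bool:
    # True when bits [n, XLEN) of the sign-extended immediate are all ones.
    return ((imm12 & MASK) >> n) == (MASK >> n)


x = 0x12345678
x_garbage = x | (0xDEADBEEF << 32)  # same low 32 bits, different upper bits
print(upper_bits_not_demanded(-2048, 32), ori(x, -2048) == ori(x_garbage, -2048))  # True True
```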
2022-10-24 21:33:17 -07:00
Craig Topper
d4dc036e70 [RISCV] Move vector cost table lookup out of the switch in getIntrinsicInstrCost. NFC
This allows vectors to be looked up if the switch is used for the
scalar version of an intrinsic.

Extracted from D136508.
2022-10-24 20:32:22 -07:00
Craig Topper
63ed3d0eeb [RISCV] Rename lowerFTRUNC_FCEIL_FFLOOR_FROUND to lowerVectorFTRUNC_FCEIL_FFLOOR_FROUND. NFC
Extracted from D136508.
2022-10-24 20:32:22 -07:00
Xiang Li
996267d20e [DirectX backend] set target triple to "dxil-ms-dx"
Set target triple to "dxil-ms-dx" for DXIL at the end of DXILTranslateMetadata.

Reviewed By: beanz

Differential Revision: https://reviews.llvm.org/D131545
2022-10-24 14:49:31 -07:00
Caroline Concatto
440005b3c3 [AArch64]]SME2 multi-vec to multi-vec FP/INT down convert 2/4 registers
This patch implements:
 FCVTZS: Multi-vector floating-point convert to signed integer, rounding
         toward zero.
 FCVTZU: Multi-vector floating-point convert to unsigned integer, rounding
         toward zero.
 SCVTF: Multi-vector signed integer convert to floating-point.
 UCVTF: Multi-vector unsigned integer convert to floating-point.
for 2 and 4 registers
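
Round-toward-zero conversion with the saturating behaviour of the AArch64 FCVTZS/FCVTZU family, sketched per element: out-of-range values clamp to the integer limits and NaN becomes zero. The 32-bit default element width and the function names are assumptions for illustration.

```python
import math


def fcvtzs(x: float, bits: int = 32) -> int:
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    if math.isnan(x):
        return 0
    if math.isinf(x):
        return hi if x > 0 else lo
    # math.trunc rounds toward zero; then saturate to the destination range.
    return max(lo, min(hi, math.trunc(x)))


def fcvtzu(x: float, bits: int = 32) -> int:
    hi = (1 << bits) - 1
    if math.isnan(x):
        return 0
    if math.isinf(x):
        return hi if x > 0 else 0
    # Negative inputs saturate to zero.
    return max(0, min(hi, math.trunc(x)))


print([fcvtzs(v) for v in (2.7, -2.7, 1e20, float("nan"))])  # [2, -2, 2147483647, 0]
```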

The reference can be found here:
    https://developer.arm.com/documentation/ddi0602/2022-09

Depends on: D135563

Differential Revision: https://reviews.llvm.org/D135593
2022-10-24 20:21:14 +01:00
Zhiyao Ma
7e8af2fc0c [ARM] Support -mexecute-only with -mlong-calls.
Instead of using constant pools, use a movw/movt pair.
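
Execute-only code cannot read a callee address from a literal pool in the code section, so the address is assembled from two 16-bit immediates instead. A Python model of the two instructions, assuming 32-bit addresses; the function names are hypothetical.

```python
def movw(imm16: int) -> int:
    # MOVW: write a 16-bit immediate to the low half, zeroing the top half.
    return imm16 & 0xFFFF


def movt(reg: int, imm16: int) -> int:
    # MOVT: write a 16-bit immediate to the top half, keeping the low half.
    return (reg & 0xFFFF) | ((imm16 & 0xFFFF) << 16)


def materialize(addr: int) -> int:
    # movw r, #:lower16:addr ; movt r, #:upper16:addr
    return movt(movw(addr & 0xFFFF), addr >> 16)


print(hex(materialize(0x08012345)))  # 0x8012345
```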

Differential Revision: https://reviews.llvm.org/D136203
2022-10-24 11:41:24 -07:00
Simon Pilgrim
da4baa9ebe Fix MSVC "not all control paths return a value" warning. NFC. 2022-10-24 17:55:01 +01:00
Ahmed Bougacha
718bb22c28 [AArch64][PAC] Select XPAC for ptrauth.strip intrinsic.
Differential Revision: https://reviews.llvm.org/D132385
2022-10-24 08:15:56 -07:00
Ahmed Bougacha
5b70001974 [AArch64][PAC] Add helper enum/functions to handle PAC keys/ops. 2022-10-24 08:15:17 -07:00
Amy Kwan
715301056e [PowerPC] Fix invalid cast for vector shuffles when lowering to the xxsplti32dx instruction.
When lowering vector shuffles into the xxsplti32dx instruction on Power10, we
canonicalize the right operand to be a BUILD_VECTOR and as a result, get the
commuted vector shuffle node.

However, a vector shuffle will not always be returned as the result for a
commuted vector shuffle. In such a scenario, this patch updates the original
cast of a shuffle into a dyn_cast<> and checks if the shuffle is a valid vector
shuffle node prior to obtaining the commuted shuffle mask.

This patch also adds a new test case that demonstrates this scenario (primarily
seen on 32-bit), and was originally a crash prior to this fix.

Differential Revision: https://reviews.llvm.org/D135024
2022-10-24 09:56:54 -05:00
Dmitry Preobrazhensky
72711d4e5f [AMDGPU][MC] Correct definition of aliases
Differential Revision: https://reviews.llvm.org/D136370
2022-10-24 17:06:52 +03:00
Simon Pilgrim
e28e214e4f [X86] Treat PSLLDQ/PSRLDQ as a shuffle not a shift
This appears to be a copy+paste typo in the znver1/2 AMD SoG tables, which treat the byte shift instructions like bit shifts.

Older AMD SoGs referred to PSLLDQ/PSRLDQ as shuffles, and Agner/instlatx64 both report them as integer shuffles.
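
A byte-level model of why PSLLDQ behaves like a shuffle: it moves whole bytes across the full 128-bit register (including across the 64-bit halves) and fills with zeros, which is a byte shuffle against a zero vector. Byte 0 is the least significant byte; function names are hypothetical.

```python
def pslldq(src: bytes, imm: int) -> bytes:
    # Shift the 16-byte register left by imm bytes; counts above 15 clear it.
    n = min(imm, 16)
    return bytes(n) + src[:16 - n]


def psrldq(src: bytes, imm: int) -> bytes:
    # Shift right by imm bytes, filling the top with zeros.
    n = min(imm, 16)
    return src[n:] + bytes(n)


def shuffle_with_zero(src: bytes, mask) -> bytes:
    # mask[i] is a source byte index, or None to select a zero byte.
    return bytes(0 if m is None else src[m] for m in mask)


src = bytes(range(1, 17))
mask = [None] * 3 + list(range(13))
print(pslldq(src, 3) == shuffle_with_zero(src, mask))  # True
```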
2022-10-24 14:42:45 +01:00
Piyou Chen
f8b8426861 [RISCV] Add Svnapot extension
Reviewed By: kito-cheng

Differential Revision: https://reviews.llvm.org/D136570
2022-10-24 01:27:04 -07:00
gonglingqin
3be059377e [LoongArch] Add support for ISD::FRAMEADDR and ISD::RETURNADDR
For now, only lowering the frame/return address for the current frame is supported.

Differential Revision: https://reviews.llvm.org/D136215
2022-10-24 15:12:28 +08:00
Sheng
fb937c4913 [NFC][X86] Fix typo: stric => strict 2022-10-24 04:56:26 +00:00
Sheng
9cedab654d [NFC][M68k] Update the status of ISA implementation
LINK/UNLNK have been implemented in 64d326c33c6d3f008.
2022-10-24 09:37:48 +08:00
Craig Topper
a41c1f3168 [RISCV] Make selectShiftMask look for negate opportunities after looking through AND.
Previously we would only look for an AND or a negate. But it's
possible there is a negate after looking through the AND.
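
The identities involved can be checked with a model of RV64 SLL, assuming the hardware uses only the low 6 bits of the shift register: an explicit AND with 63 on the amount is redundant, and a (64 - y) amount can use a plain negate, since both agree in the low 6 bits. Names are hypothetical, and this does not reproduce the DAG matching itself.

```python
XLEN = 64
MASK = (1 << XLEN) - 1


def sll(x: int, rs2: int) -> int:
    # RV64 SLL only reads the low log2(XLEN) = 6 bits of rs2.
    return (x << (rs2 & (XLEN - 1))) & MASK


def neg(y: int) -> int:
    # Register negate (sub from zero), as an XLEN-bit pattern.
    return (-y) & MASK


x = 0x0123456789ABCDEF
for y in range(200):
    explicit = sll(x, (64 - y) & 63)  # shl x, (and (sub 64, y), 63)
    assert explicit == sll(x, neg(y))  # the AND disappears and the sub becomes neg
print("ok")
```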
2022-10-23 14:23:13 -07:00
Simon Pilgrim
4e8f847676 [X86][AVX512] Fold extract_element(bitcast(<X x i1>) -> bitcast(extract_subvector())
On AVX512, extract legal bool vectors as bool subvectors before bitcasting to scalars to avoid spilling to stack.

This helps Rust, which internally represents bool vectors as bool arrays.

It also exposes more missed opportunities to use the KADD instruction to add masks together before moving to GPRs.
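
A sketch of why the fold is value-preserving, treating a vector of i1 as a little-endian bit string (element 0 in bit 0): extracting integer element idx after the bitcast yields the same value as bitcasting the matching bool subvector. Function names are hypothetical; register classes and spilling are not modelled.

```python
from typing import List


def bitcast_bools(bools: List[bool]) -> int:
    # <N x i1> -> iN: element k becomes bit k.
    return sum(1 << k for k, b in enumerate(bools) if b)


def extract_after_bitcast(mask: List[bool], elt_bits: int, idx: int) -> int:
    # extract_element(bitcast(<N x i1> to <N/elt_bits x i(elt_bits)>), idx)
    return (bitcast_bools(mask) >> (idx * elt_bits)) & ((1 << elt_bits) - 1)


def bitcast_after_extract(mask: List[bool], elt_bits: int, idx: int) -> int:
    # bitcast(extract_subvector(<N x i1>, idx * elt_bits))
    return bitcast_bools(mask[idx * elt_bits:(idx + 1) * elt_bits])


mask = [bool((0xA5C3 >> k) & 1) for k in range(16)]
print(hex(extract_after_bitcast(mask, 8, 1)))  # 0xa5
```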

Fixes #58546
2022-10-23 14:47:24 +01:00
Simon Pilgrim
c175d880a4 [X86] Add freeze(pshufd/permilps(x,imm)) -> pshufd/permilps(freeze(x),imm) folding
Add X86 isGuaranteedNotToBeUndefOrPoisonForTargetNode / canCreateUndefOrPoisonForTargetNode overrides and add X86ISD::PSHUFD/VPERMILPI handling.
2022-10-23 10:39:12 +01:00
Craig Topper
ef72ff7b15 [RISCV] Fix unused variable warning. NFC 2022-10-22 22:29:03 -07:00
Krzysztof Parzyszek
2ec380b23f [Hexagon] Improve handling of 32-bit mulh/mul_lohi on HVX
Handle MULH[US] by normalizing them into newly invented nodes
HexagonISD::(S|U|US)MUL_LOHI. On HVX v60, if only the high part of
SMUL_LOHI is used, use the original MULHS expansion. In all other
cases, expand the full product.
On HVX v62, always expand the full product.
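
What expanding the full product computes, sketched in Python for 32-bit operands: a 64-bit product whose low half is the ordinary MUL result and whose high half is the MULHS/MULHU-style result, depending on how each operand is interpreted. The per-operand signedness flags loosely mirror the S/U/US node names but are a hypothetical interface, not the Hexagon intrinsics' signatures.

```python
def as_int(x: int, signed: bool) -> int:
    # Interpret a 32-bit pattern as signed or unsigned.
    x &= 0xFFFFFFFF
    return x - (1 << 32) if signed and x >> 31 else x


def mul_lohi(a: int, b: int, a_signed: bool, b_signed: bool):
    # Full 64-bit product as a bit pattern, split into (lo, hi) 32-bit halves.
    p = (as_int(a, a_signed) * as_int(b, b_signed)) & ((1 << 64) - 1)
    return p & 0xFFFFFFFF, p >> 32


lo, hi = mul_lohi(0xFFFFFFFF, 2, True, True)  # -1 * 2
print(hex(lo), hex(hi))                       # 0xfffffffe 0xffffffff
```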

Introduce Hexagon-specific LLVM IR intrinsics for 32x32 multiplication
returning low/high parts.
2022-10-22 15:08:27 -07:00