52796 Commits

Author SHA1 Message Date
Simon Pilgrim
3a758076f5 [X86] Allow i8 CTPOP expansion to work with a 'shifted' active bits value of 8 bits or less
Shift down the value so the active bits are at the lsb
2024-02-02 18:03:20 +00:00
Manish Kausik H
a768bc6ef6
[SelectionDAG] Use unaligned store to move AVX registers onto stack for extractelement (#78422)
Prior to this patch, SelectionDAG generated aligned move onto stacks for
AVX registers when the function was marked as a no-realign-stack
function. This lead to misalignment between the stack and the
instruction generated. This patch fixes the issue.

Fixes #77730
2024-02-02 22:49:31 +05:30
Simon Pilgrim
fbf9356be0 [X86] Add ctpop-mask.ll - test coverage based off #79823 2024-02-02 15:40:22 +00:00
Simon Pilgrim
275729ae06 [X86] Generalize i8 CTPOP expansion to work with any input with 8 or less active bits
Extend #79989 slightly to use KnownBits on the CTPOP input - this should make it easier to add additional cases identified in #79823
2024-02-02 14:29:57 +00:00
Simon Pilgrim
b5d35feacb
[X86] X86FixupVectorConstants - load+sign-extend vector constants that can be stored in a truncated form (#79815)
Reduce the size of the vector constant by storing it in the constant pool in a truncated form, and sign-extend it as part of the load.

I've extended the existing FixupConstant functionality to support these sext constant rebuilds - we still select the smallest stored constant entry and prefer vzload/broadcast/vextload for same bitwidth to avoid domain flips.

I intend to add the matching load+zero-extend handling in a future PR, but that requires some alterations to the existing MC shuffle comments handling first.
2024-02-02 11:28:58 +00:00
chuongg3
10943695f7
[AArch64][GlobalISel] Legalize BSWAP for Vector Types (#80036)
Add support of i16 vector operation for BSWAP and change to TableGen to
select instructions

Handle vector types that are smaller/larger than legal for BSWAP
2024-02-02 10:45:46 +00:00
Simon Pilgrim
9410019ac9
[X86] Add i8 CTPOP lowering using i32 MUL (#79989)
This is the first basic proposal in #79823 - we can investigate improving support for other widths if we can find further use cases.
2024-02-02 10:40:28 +00:00
Tuan Chuong Goh
4b8e514334 [AArch64][GlobalISel] Pre-Commit tests for Legalize BSWAP Vectors 2024-02-02 10:18:31 +00:00
Matthew Devereau
d9c20e437f
[AArch64][SME] Implement inline-asm clobbers for za/zt0 (#79276)
This enables specifing "za" or "zt0" to the clobber list for inline asm.
This complies with the acle SME addition to the asm extension here:
https://github.com/ARM-software/acle/pull/276
2024-02-02 08:12:05 +00:00
Yuta Mukai
374a600df7
[MachinePipeliner] Fix missing requirements for tests (#80386)
Add asserts requirements for tests that verify debug output.
2024-02-02 15:04:51 +09:00
Craig Topper
58c494f47c
[RISCV] Add -march support for many of the S extensions mentioned in the profile specification. (#79399)
This is a good portion of the extensions mentioned in the RVA23 profile
here
https://github.com/riscv/riscv-profiles/blob/main/rva23-profile.adoc

I don't believe these add any new CSRs. Sstc does add new CSRs, but we
already added them without the extension name a while back.

I tried to keep the descriptions in RISCVFeatures.td fairly short since
the strings show up in `-print-supported-extensions`.
2024-02-01 18:50:30 -08:00
Rahman Lavaee
acec6419e8
[SHT_LLVM_BB_ADDR_MAP] Allow basic-block-sections and labels be used together by decoupling the handling of the two features. (#74128)
Today `-split-machine-functions` and `-fbasic-block-sections={all,list}`
cannot be combined with `-basic-block-sections=labels` (the labels
option will be ignored).
The inconsistency comes from the way basic block address map -- the
underlying mechanism for basic block labels -- encodes basic block
addresses
(https://lists.llvm.org/pipermail/llvm-dev/2020-July/143512.html).
Specifically, basic block offsets are computed relative to the function
begin symbol. This relies on functions being contiguous which is not the
case for MFS and basic block section binaries. This means Propeller
cannot use binary profiles collected from these binaries, which limits
the applicability of Propeller for iterative optimization.
    
To make the `SHT_LLVM_BB_ADDR_MAP` feature work with basic block section
binaries, we propose modifying the encoding of this section as follows.

First let us review the current encoding which emits the address of each
function and its number of basic blocks, followed by basic block entries
for each basic block.

| | |
|--|--|
| Address of the function | Function Address |
|  Number of basic blocks in this function | NumBlocks |
|  BB entry 1
|  BB entry 2
|   ...
|  BB entry #NumBlocks
    
To make this work for basic block sections, we treat each basic block
section similar to a function, except that basic block sections of the
same function must be encapsulated in the same structure so we can map
all of them to their single function.
    
We modify the encoding to first emit the number of basic block sections
(BB ranges) in the function. Then we emit the address map of each basic
block section section as before: the base address of the section, its
number of blocks, and BB entries for its basic block. The first section
in the BB address map is always the function entry section.
| | |
|--|--|
|  Number of sections for this function   | NumBBRanges |
| Section 1 begin address                     | BaseAddress[1]  |
| Number of basic blocks in section 1 | NumBlocks[1]    |
| BB entries for Section 1
|..................|
| Section #NumBBRanges begin address | BaseAddress[NumBBRanges] |
| Number of basic blocks in section #NumBBRanges |
NumBlocks[NumBBRanges] |
| BB entries for Section #NumBBRanges
    
The encoding of basic block entries remains as before with the minor
change that each basic block offset is now computed relative to the
begin symbol of its containing BB section.
    
This patch adds a new boolean codegen option `-basic-block-address-map`.
Correspondingly, the front-end flag `-fbasic-block-address-map` and LLD
flag `--lto-basic-block-address-map` are introduced.
Analogously, we add a new TargetOption field `BBAddrMap`. This means BB
address maps are either generated for all functions in the compiling
unit, or for none (depending on `TargetOptions::BBAddrMap`).
    
This patch keeps the functionality of the old
`-fbasic-block-sections=labels` option but does not remove it. A
subsequent patch will remove the obsolete option.

We refactor the `BasicBlockSections` pass by separating the BB address
map and BB sections handing to their own functions (named
`handleBBAddrMap` and `handleBBSections`). `handleBBSections` renumbers
basic blocks and places them in their assigned sections.
`handleBBAddrMap` is invoked after `handleBBSections` (if requested) and
only renumbers the blocks.
  - New tests added:
- Two tests basic-block-address-map-with-basic-block-sections.ll and
basic-block-address-map-with-mfs.ll to exercise the combination of
`-basic-block-address-map` with `-basic-block-sections=list` and
'-split-machine-functions`.
- A driver sanity test for the `-fbasic-block-address-map` option
(basic-block-address-map.c).
- An LLD test for testing the `--lto-basic-block-address-map` option.
This reuses the LLVM IR from `lld/test/ELF/lto/basic-block-sections.ll`.
- Renamed and modified the two existing codegen tests for basic block
address map (`basic-block-sections-labels-functions-sections.ll` and
`basic-block-sections-labels.ll`)
- Removed `SHT_LLVM_BB_ADDR_MAP_V0` tests. Full deprecation of
`SHT_LLVM_BB_ADDR_MAP_V0` and `SHT_LLVM_BB_ADDR_MAP` version less than 2
will happen in a separate PR in a few months.
2024-02-01 17:50:46 -08:00
Yuta Mukai
70eab122bc
[AArch64][MachinePipeliner] Add pipeliner support for AArch64 (#79589)
Add AArch64 implementations for the interfaces of MachinePipeliner pass.
The pass is disabled by default for AArch64. It is enabled by specifying
--aarch64-enable-pipeliner.

5 tests in llvm-test-suites show performance improvement by more than 5%
on a Neoverse V1 processor.

| test | improvement |
| ---------------------------------------------------------------- |
-----------:|
| MultiSource/Benchmarks/TSVC/Recurrences-dbl/Recurrences-dbl.test | 16%
|
| MultiSource/Benchmarks/TSVC/Recurrences-dbl/Recurrences-flt.test | 16%
|
| SingleSource/Benchmarks/Adobe-C++/loop_unroll.test | 14% |
| SingleSource/Benchmarks/Misc/flops-5.test | 13% |
| SingleSource/Benchmarks/BenchmarkGame/spectral-norm.test | 6% |

(base flags: -mcpu=neoverse-v1 -O3 -mrecip, flags for pipelining: -mllvm
-aarch64-enable-pipeliner -mllvm
-pipeliner-max-stages=100 -mllvm -pipeliner-max-mii=100 -mllvm
-pipeliner-enable-copytophi=0)

On the other hand, there are cases of significant performance
degradation. Algorithm improvements and adding the option/pragma will be
needed in the future.
2024-02-02 10:33:44 +09:00
Craig Topper
5cf0fb4317
[StackSlotColoring] Ignore non-spill objects in RemoveDeadStores. (#80242)
The stack slot coloring pass is concerned with optimizing spill
slots. If any change is a pass is made over the function to remove
stack stores that use the same register and stack slot as an
immediately preceding load.
    
The register check is too simple for constant registers like AArch64
and RISC-V's zero register. This register can be used as the result
of a load if we want to discard the result, but still have the memory
access performed. Like for a volatile or atomic load.
    
If the code sees a load from the zero register followed by a store
of the zero register at the same stack slot, the pass mistakenly
believes the store isn't needed.
    
Since the main stack coloring optimization is only concerned with
spill slots, it seems reasonable that RemoveDeadStores should
only be concerned with spills. Since we never generate a reload of
x0, this avoids the issue seen by RISC-V.
    
Test case concept is adapted from pr30821.mir from X86. That test
had to be updated to mark the stack slot as a spill slot.
    
Fixes #80052.
2024-02-01 13:25:15 -08:00
Anatoly Trosinenko
08fccf8094
[AArch64][PAC] Expand blend(reg, imm) operation in aarch64-pauth pass (#74729)
In preparation for implementing code generation for more @llvm.ptrauth.* intrinsics, move the expansion of blend(register, small integer) variant of @llvm.ptrauth.blend to the AArch64PointerAuth pass, where most other PAuth-related code generation takes place.
2024-02-01 13:02:39 -08:00
Jiahan Xie
10c2d5ff7c
[RISCV][GISel] RegBank select and instruction select for vector G_ADD, G_SUB (#74114)
RegisterBank Selection for scalable vector G_ADD and G_SUB by creating
new mappings for different types of vector register banks.
Then implement Instruction Selection for the same operations by choosing
the correct RISC-V vector register class.
2024-02-01 15:06:43 -05:00
Brendan Sweeney
e296cedcd6
[RISCV][MC] MC layer support for the experimental zalasr extension (#79911)
This PR implements experimental support for the RISC-V Atomic
Load-Acquire and Store-Release Extension (Zalasr). It has been approved
to be pursued as a fast track extension
(https://lists.riscv.org/g/tech-unprivileged/topic/arc_architecture_review/101951698),
but has not yet been approved by ARC or ratified. See
https://github.com/mehnadnerd/riscv-zalasr for draft spec.

---------

Co-authored-by: brs <turtwig@utexas.edu>
Co-authored-by: Philip Reames <preames@rivosinc.com>
2024-02-01 10:58:21 -08:00
Fangrui Song
10a55caccf
[RISCV] Support constraint "s" (#80201)
GCC has supported a generic constraint "s" for a long time (since at
least 1992), which references a symbol or label with an optional
constant offset. "i" is a superset that also supports a constant
integer.

GCC's RISC-V port also supports a machine-specific constraint "S",
which cannot be used with a preemptible symbol. (We don't bother to
check preemptibility.) In PIC code, an external symbol is preemptible by
default, making "S" less useful if you want to create an artificial
reference for linker garbage collection, or define sections to hold
symbol addresses:

```
void fun();
// error: impossible constraint in ‘asm’ for riscv64-linux-gnu-gcc -fpie/-fpic
void foo() { asm(".reloc ., BFD_RELOC_NONE, %0" :: "S"(fun)); }
// good even if -fpie/-fpic
void foo() { asm(".reloc ., BFD_RELOC_NONE, %0" :: "s"(fun)); }
```

This patch adds support for "s". Modify https://reviews.llvm.org/D105254
("S") to handle multi-depth GEPs (https://reviews.llvm.org/D61560).
2024-02-01 10:18:42 -08:00
Yaxun (Sam) Liu
1f3c30911c
[AMDGPU] Mark PC_ADD_REL_OFFSET rematerializable (#79674)
Currently machine LICM hoist PC_ADD_REL_OFFSET out of loops, causes
register pressure when function calls are deep in loops. This is a main
cause of sgpr spill for programs containing large number of function
calls in loops.

This patch marks PC_ADD_REL_OFFSET as rematerializable, which eliminates
sgpr spills due to function calls in loops.
2024-02-01 12:21:19 -05:00
Amy Kwan
2a50921553
[AIX][TLS] Optimize the small local-exec access sequence for non-zero offsets (#71485)
This patch utilizes the -maix-small-local-exec-tls option to produce a
faster,
non-TOC-based access sequence for the local-exec TLS model.
Specifically, for
when the offsets from the TLS variable are non-zero.

In particular, this patch produces either a single:
- addi/la with a displacement off of R13 plus a non-zero offset for when
an address is calculated, or
- load or store off of R13 plus a non-zero offset for when an address is
calculated and used for further
  access where R13 is the thread pointer, respectively.

In order to produce a single addi or load/store off of the thread
pointer with a non-zero offset,
this patch also adds the necessary support in the assembly printer when
printing these instructions.

Specifically:
- The non-zero offset is added to the TLS variable address when the
address of the
  TLS variable + it's offset is less than 32KB.
- Otherwise, when the address of the TLS variable + its offset is
greater than 32KB, the
non-zero offset (and a multiple of 64KB) is subtracted from the TLS
address.

This handling in the assembly printer is necessary to ensure that the
TLS address + the non-zero offset
is between [-32768, 32768), so that the total displacement can fit
within the addi/load/store instructions.

This patch is meant to be a follow-up to
3f46e5453d9310b15d974e876f6132e3cf50c4b1 (where the
optimization occurs for when the offset is zero).
2024-02-01 09:29:21 -05:00
Quentin Dian
112fba974c
[MIRPrinter] Don't print line break when there is no instructions (NFC) (#80147)
Per #80143, we can remove the extra line break when there is no
instruction.
2024-02-01 22:10:52 +08:00
David Green
5d41788f37
[AArch64] Alter latency of FCSEL under Cortex-A510 (#80178)
As per the Cortex-A510 software optimization guide, the latency of a
fcsel should be 3 not 4. It would previously get the latency from
WriteF.
2024-02-01 13:42:14 +00:00
Sander de Smalen
d313614b60
[AArch64] Replace LLVM IR function attributes for PSTATE.ZA. (#79166)
Since https://github.com/ARM-software/acle/pull/276 the ACLE
defines attributes to better describe the use of a given SME state.

Previously the attributes merely described the possibility of it being
'shared' or 'preserved', whereas the new attributes have more semantics
and also describe how the data flows through the program.

For ZT0 we already had to add new LLVM IR attributes:
* aarch64_new_zt0
* aarch64_in_zt0
* aarch64_out_zt0
* aarch64_inout_zt0
* aarch64_preserves_zt0

We have now done the same for ZA, such that we add:
* aarch64_new_za       (previously `aarch64_pstate_za_new`)
* aarch64_in_za (more specific variation of `aarch64_pstate_za_shared`)
* aarch64_out_za (more specific variation of `aarch64_pstate_za_shared`)
* aarch64_inout_za (more specific variation of
`aarch64_pstate_za_shared`)
* aarch64_preserves_za (previously `aarch64_pstate_za_shared,
aarch64_pstate_za_preserved`)

This explicitly removes 'pstate' from the name, because with SME2 and
the new ACLE attributes there is a difference between "sharing ZA"
(sharing
the ZA matrix register with the caller) and "sharing PSTATE.ZA" (sharing
either the ZA or ZT0 register, both part of PSTATE.ZA with the caller).
2024-02-01 13:37:37 +00:00
Joseph Huber
f956e7fbf1
[AMDGPU] Prefer s_memtime for readcyclecounter on GFX10 (#80211)
Summary:
The old `s_memtime` instruction was supported until the GFX10
architecture. Although this instruction has a higher latency than the
new shader counter, it's much more usable as a processor clock as it is
a full 64-bit counter. The new shader counter is only a 20-bit counter,
which makes it difficult to use as a standard cycle counter as it will
overflow in a few milliseconds. This patch suggests preferring
`s_memtime` for this instrinsic if it is still available.
2024-02-01 07:19:57 -06:00
Wang Pengcheng
178719e860
[RISCV][NFC] Simplify calls.ll and autogenerate checks for tail-calls.ll
Split out from #78417.

Reviewers: topperc, asb, kito-cheng

Reviewed By: asb

Pull Request: https://github.com/llvm/llvm-project/pull/79248
2024-02-01 20:50:20 +08:00
Simon Pilgrim
ea2984287d [ARM] Add ctpop codegen tests 2024-02-01 11:42:18 +00:00
Craig Topper
cf401f72e1
[RISCV] Use Zacas for AtomicRMWInst::Nand i32 and XLen. (#80119)
We don't have an AMO instruction for Nand, so with the A extension we
use an LR/SC loop. If we have Zacas we can use a CAS loop instead.

According to the Zacas spec, a CAS loop scales to highly parallel
systems better than LR/SC.
2024-01-31 15:37:41 -08:00
Congcong Cai
5561beae29
[WebAssembly] avoid to enable explicit disabled feature (#80094) 2024-02-01 07:26:58 +08:00
Philip Reames
ff53d50742
[RISCV] Improve legalization of e8 m8 VL>256 shuffles (#79330)
If we can't produce a large enough index vector in i8, we may need to legalize
the shuffle (via scalarization - which in turn gets lowered into stack usage).
This change makes two related changes:
* Deferring legalization until we actually need to generate the vrgather
  instruction.  With the new recursive structure, this only happens when
  doing the fallback for one of the arms.
* Check the actual mask values for something outside of the representable
  range.

Both are covered by recently added tests.
2024-01-31 14:41:15 -08:00
Alex MacLean
5e3ae4c4af
[NVPTX] improve Boolean ISel (#80166)
Add TableGen patterns to convert more instructions to boolean
expressions:

- **mul -> and/or**: i1 multiply instructions currently cannot be
selected causing the compiler to crash. See
https://github.com/llvm/llvm-project/issues/57404
- **select -> and/or**: Converting selects to and/or can enable more
optimizations. `InstCombine` cannot do this as aggressively due to
poison semantics.
2024-01-31 14:37:27 -08:00
Usman Nadeem
1d1432356e
[AArch64][SVE2] Generate urshr rounding shift rights (#78374)
Add a new node `AArch64ISD::URSHR_I_PRED`.

`srl(add(X, 1 << (ShiftValue - 1)), ShiftValue)` is transformed to
`urshr`, or to `rshrnb` (as before) if the result it truncated.

`uzp1(rshrnb(uunpklo(X),C), rshrnb(uunpkhi(X), C))` is converted to
`urshr(X, C)` (tested by the wide_trunc tests).

Pattern matching code in `canLowerSRLToRoundingShiftForVT` is taken
from prior code in rshrnb. It returns true if the add has NUW or if the
number of bits used in the return value allow us to not care about the
overflow (tested by rshrnb test cases).
2024-01-31 14:03:58 -08:00
Zaara Syeda
a03a6e9964
[AIX] [XCOFF] Add support for common and local common symbols in the TOC (#79530)
This patch adds support for common and local symbols in the TOC for AIX.
Note that we need to update isVirtualSection so as a common symbol in
TOC will have the symbol type XTY_CM and will be initialized when placed
in the TOC so sections with this type are no longer virtual.

---------

Co-authored-by: Zaara Syeda <syzaara@ca.ibm.com>
2024-01-31 16:34:21 -05:00
David Green
d04ae1b15f
[AArch64] Use DAG->isAddLike in add_and_or_is_add (#79563)
This allows it to work with disjoint or's as well as computing the known
bits.
2024-01-31 16:49:23 +00:00
Rin Dobrescu
2907c63311
Revert "[AArch64] Convert concat(uhadd(a,b), uhadd(c,d)) to uhadd(concat(a,c), concat(b,d))" (#80157)
Reverts llvm/llvm-project#79464 while figuring out why the tests are
failing.
2024-01-31 16:45:25 +00:00
David Green
5d7d89de31
[AArch64] Use add_and_or_is_add for CSINC (#79552)
Adds or add-like-or's of 1 can both be turned into csinc, which can help
fold more instructions into a csinc.
2024-01-31 15:48:31 +00:00
Sjoerd Meijer
8841846050
[AArch64] MI Scheduler LDP combine follow up (#79003)
This is a follow up of 75d820dcdd86, adding more opcodes to the combine
target hook enabling more LDP creation.

Patch co-authored by Cameron McInally.
2024-01-31 15:41:32 +00:00
Shimin Cui
1bab570e9b
Move the PowerPC/PPCMergeStringPool work to initializer (#77352)
Currently, the `PPCMergeStringPool` merges the global variable after the
`AsmPrinter` initializer adds the global variables to its symbol list.
This is to move the merging work of `PPCMergeStringPool` to its
initializer, just like what GlobalMerge does, to avoid adding merged
global variables to the `AsmPrinter` symbol lis.
2024-01-31 10:27:07 -05:00
Quentin Dian
b7738e275d
[MIRPrinter] Don't print space when there is no successor (#80143)
Extra space causes the checks generated by update_mir_test_checks to be
unavailable.

```
# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py UTC_ARGS: --version 4
# RUN: llc -mtriple=x86_64-- -o - %s -run-pass=none -verify-machineinstrs -simplify-mir | FileCheck %s
---
name: foo
body: |
  ; CHECK-LABEL: name: foo
  ; CHECK: bb.0:
  ; CHECK-NEXT:   successors:
  ; CHECK-NEXT: {{  $}}
  ; CHECK-NEXT: {{  $}}
  ; CHECK-NEXT: bb.1:
  ; CHECK-NEXT:   RET 0, $eax
  bb.0:
    successors:

  bb.1:
    RET 0, $eax
...
```

The failure log is as follows:

```
llvm/test/CodeGen/MIR/X86/unreachable-block-print.mir:9:16: error: CHECK-NEXT: is on the same line as previous match
 ; CHECK-NEXT: {{ $}}
               ^
<stdin>:21:13: note: 'next' match was here
 successors:
            ^
<stdin>:21:13: note: previous match ended here
 successors:
```
2024-01-31 22:35:41 +08:00
Alfie Richards
de75e5079a
[ARM][NEON] Add constraint to vld2 Odd/Even Pseudo instructions. (#79287)
This ensures the odd/even pseudo instructions are allocated to the same
register range.

This fixes #71763
2024-01-31 14:08:02 +00:00
Shengchen Kan
e3c9327bc4 [X86][CodeGen] Set isReMaterializable = 1 for AVX broadcast load
Broadcast of a single float should not be any slower than
loading 32B using vmovaps. So remat it can help reduce
register spill when there is big register pressure.
2024-01-31 20:55:56 +08:00
Rin Dobrescu
cf828aee24
[AArch64] Convert concat(uhadd(a,b), uhadd(c,d)) to uhadd(concat(a,c), concat(b,d)) (#79464)
We can convert concat(v4i16 uhadd(a,b), v4i16 uhadd(c,d)) to v8i16
uhadd(concat(a,c), concat(b,d)), which can lead to further
simplifications.
2024-01-31 12:52:12 +00:00
Simon Pilgrim
648eb7c141 [X86] divrem8_ext.ll - replace X32 check prefixes with X86
We try to only use X32 for gnux32 triple tests.
2024-01-31 12:06:48 +00:00
Simon Pilgrim
1d8c8f1169 [X86] cfguard - replace X32 check prefixes with X86
We try to only use X32 for gnux32 triple tests.
2024-01-31 12:05:32 +00:00
Simon Pilgrim
824d073fb6 [X86] fold-vector-sext - replace X32 check prefixes with X86
We try to only use X32 for gnux32 triple tests.
2024-01-31 12:01:02 +00:00
Simon Pilgrim
ed11f255a8 [X86] divide-by-constant.ll - replace X32 check prefixes with X86
We try to only use X32 for gnux32 triple tests.
2024-01-31 12:01:02 +00:00
Simon Pilgrim
e4af212f96 [X86] divrem.ll - replace X32 check prefixes with X86
We try to only use X32 for gnux32 triple tests.
2024-01-31 12:01:01 +00:00
Simon Pilgrim
a82ca1cd1b [X86] insertps-from-constantpool.ll - replace X32 check prefixes with X86 and expose address math
We try to only use X32 for gnux32 triple tests.

Use no_x86_scrub_mem_shuffle so the test shows updated shuffle intermediate and the +4 offset into the constant pool vector entry
2024-01-31 12:01:01 +00:00
Simon Pilgrim
929503ead3 [X86] v2f32.ll - replace X32 check prefixes with X86 (and add common CHECK prefix)
We try to only use X32 for gnux32 triple tests.
2024-01-31 11:04:20 +00:00
Simon Pilgrim
00a6817108 [X86] v4f32-immediate.ll - replace X32 check prefixes with X86
We try to only use X32 for gnux32 triple tests.
2024-01-31 11:04:19 +00:00
Simon Pilgrim
8d450b47ba [X86] mmx-arith.ll - replace X32 check prefixes with X86 + strip cfi noise
We try to only use X32 for gnux32 triple tests.
2024-01-31 11:04:19 +00:00