52796 Commits

Author SHA1 Message Date
Shengchen Kan
60dbb2cec1 [X86][test] Update CHECK prefixes in CodeGen/X86/vector-interleaved-store-*.ll to suppress warnings
Suppress warnings like

WARNING: Prefix AVX had conflicting output from different RUN lines for all functions in test vector-interleaved-store-i16-stride-7.ll
WARNING: Prefix AVX1 had conflicting output from different RUN lines for all functions in test vector-interleaved-store-i16-stride-7.ll
WARNING: Prefix AVX2 had conflicting output from different RUN lines for all functions in test vector-interleaved-store-i16-stride-7.ll
WARNING: Prefix AVX2-ONLY had conflicting output from different RUN lines for all functions in test vector-interleaved-store-i16-stride-7.ll
WARNING: Prefix AVX512 had conflicting output from different RUN lines for all functions in test vector-interleaved-store-i16-stride-7.ll
WARNING: Prefix AVX512F had conflicting output from different RUN lines for all functions in test vector-interleaved-store-i16-stride-7.ll
WARNING: Prefix AVX512F-ONLY had conflicting output from different RUN lines for all functions in test vector-interleaved-store-i16-stride-7.ll
WARNING: Prefix AVX512-FAST had conflicting output from different RUN lines for all functions in test vector-interleaved-store-i16-stride-7.ll
WARNING: Prefix AVX512DQ-ONLY had conflicting output from different RUN lines for all functions in test vector-interleaved-store-i16-stride-7.ll
2024-01-29 00:11:17 +08:00
David Green
f297d0bc6d
[AArch64][GlobalISel] More FCmp legalization. (#78734)
This fills out the fcmp handling to be more like the other instructions,
adding better support for fp16 and some larger vectors.

Select of f16 values is still not handled optimally in places as the
select is only legal for s32 values, not s16. This would be correct for
integer but not necessarily for fp. It is as if we need to do
legalization -> regbankselect -> extra legalization -> selection.
2024-01-28 15:42:36 +00:00
Shengchen Kan
5abbb7b5d0 [X86][test] Update CHECK prefixes in CodeGen/X86/vector-interleaved-load-*.ll to suppress warnings
Suppress warnings like

WARNING: Prefix AVX had conflicting output from different RUN lines for all functions in test vector-interleaved-load-i16-stride-7.ll
WARNING: Prefix AVX1 had conflicting output from different RUN lines for all functions in test vector-interleaved-load-i16-stride-7.ll
WARNING: Prefix AVX2 had conflicting output from different RUN lines for all functions in test vector-interleaved-load-i16-stride-7.ll
WARNING: Prefix AVX2-ONLY had conflicting output from different RUN lines for all functions in test vector-interleaved-load-i16-stride-7.ll
WARNING: Prefix AVX512 had conflicting output from different RUN lines for all functions in test vector-interleaved-load-i16-stride-7.ll
WARNING: Prefix AVX512F had conflicting output from different RUN lines for all functions in test vector-interleaved-load-i16-stride-7.ll
WARNING: Prefix AVX512F-ONLY had conflicting output from different RUN lines for all functions in test vector-interleaved-load-i16-stride-7.ll
WARNING: Prefix AVX512-FAST had conflicting output from different RUN lines for all functions in test vector-interleaved-load-i16-stride-7.ll
WARNING: Prefix AVX512DQ-ONLY had conflicting output from different RUN lines for all functions in test vector-interleaved-load-i16-stride-7.ll
2024-01-28 14:41:59 +08:00
Chia
3855757f98
[RISCV][ISel] Remove redundant vmerge for the vwadd. (#78403)
This patch resolves the missed-optimization case below.

### Code
```
define <8 x i64> @vwadd_mask_v8i32(<8 x i32> %x, <8 x i64> %y) {
    %mask = icmp slt <8 x i32> %x, <i32 42, i32 42, i32 42, i32 42, i32 42, i32 42, i32 42, i32 42>
    %a = select <8 x i1> %mask, <8 x i32> %x, <8 x i32> zeroinitializer
    %sa = sext <8 x i32> %a to <8 x i64>
    %ret = add <8 x i64> %sa, %y
    ret <8 x i64> %ret
}
```

### Before this patch
[Compiler Explorer](https://godbolt.org/z/cd1bKTrx6)
```
vwadd_mask_v8i32:
        li      a0, 42
        vsetivli        zero, 8, e32, m2, ta, ma
        vmslt.vx        v0, v8, a0
        vmv.v.i v10, 0
        vmerge.vvm      v16, v10, v8, v0
        vwadd.wv        v8, v12, v16
        ret
```

### After this patch
```
vwadd_mask_v8i32:
        li a0, 42
        vsetivli zero, 8, e32, m2, ta, ma
        vmslt.vx v0, v8, a0
        vsetvli zero, zero, e32, m2, tu, mu
        vwadd.wv v12, v12, v8, v0.t
        vmv4r.v v8, v12
        ret
```
This pattern can be found in a reduction with a widening destination.

Specifically, we first do a fold like `(vwadd.wv y, (vmerge cond, x, 0))
-> (vwadd.wv y, x, y, cond)`, then do pattern matching on it.
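
The equivalence behind this fold can be checked lane by lane. The following is a rough Python model, not LLVM code; the helper names are hypothetical and the vectors are plain lists of unsigned integers:

```python
def sext32(v):
    """Interpret the low 32 bits of v as a signed 32-bit integer."""
    v &= 0xFFFFFFFF
    return v - (1 << 32) if v & 0x80000000 else v

def wrap64(v):
    """Wrap a Python int to an unsigned 64-bit lane value."""
    return v & 0xFFFFFFFFFFFFFFFF

def before_fold(x, y, mask):
    # vmerge keeps x where the mask is set and 0 elsewhere,
    # then vwadd.wv sign-extends that result and adds it to y.
    merged = [xi if m else 0 for xi, m in zip(x, mask)]
    return [wrap64(yi + sext32(mi)) for yi, mi in zip(y, merged)]

def after_fold(x, y, mask):
    # Masked vwadd.wv with y as the passthrough:
    # inactive lanes simply keep the value of y.
    return [wrap64(yi + sext32(xi)) if m else yi
            for xi, yi, m in zip(x, y, mask)]
```

Adding a zero lane leaves `y` unchanged, which is why the masked form with `y` as the passthrough gives the same result as the separate merge.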
2024-01-27 20:03:32 +09:00
Evgenii Kudriashov
cfd91199ca
[X86] Skip unused VRegs traverse (#78229)
Almost all loops over getNumVirtRegs skip unused registers by checking
reg_nodbg_empty or for an empty live interval. The exceptions are these
two cases, which are revealed by GlobalISel since it can skip RegClass
assignment for unused registers.

Closes #64452, closes #71926
2024-01-26 23:57:14 +01:00
Alex MacLean
1d5820aafe
[NVPTX] improve identifier renaming for PTX (#79459)
Update `NVPTXAssignValidGlobalNames` to convert all characters which are
illegal in PTX identifiers to `_$_`. ([PTX ISA: 4.4
Identifiers](https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#identifiers)).
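
As an illustration only, the renaming rule can be sketched in Python. This is not the pass itself: `clean_ptx_name` is a hypothetical helper, and the set of legal characters (letters, digits, `_` and `$`) is an assumption taken from the PTX identifier grammar:

```python
import re

# Assumed legal characters inside a PTX identifier: letters, digits, '_' and '$'.
_ILLEGAL = re.compile(r"[^A-Za-z0-9_$]")

def clean_ptx_name(name: str) -> str:
    """Replace every character that is illegal in a PTX identifier with '_$_'."""
    return _ILLEGAL.sub("_$_", name)
```

For example, `clean_ptx_name("foo.bar")` gives `foo_$_bar`, while a name that is already legal is returned unchanged.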
2024-01-26 13:49:00 -08:00
Nikita Popov
07a1925b8b Revert "Refactor recomputeLiveIns to operate on whole CFG (#79498)"
This reverts commit 59bf60519fc30d9d36c86abd83093b068f6b1e4b.

Introduces a major compile-time regression.
2024-01-26 22:33:17 +01:00
Oskar Wirga
59bf60519f
Refactor recomputeLiveIns to operate on whole CFG (#79498)
Currently, recomputeLiveIns recomputes the live-in registers for a single
MachineBasicBlock, but the result depends on the order in which
recomputeLiveIns is called, which can result in incorrect register
allocations down the line.

This PR fixes that by simply recomputing the liveins for the entire CFG
until convergence is achieved. This makes it harder to introduce subtle
bugs which alter liveness.
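
The idea of iterating to convergence can be sketched as a standard backward liveness fixpoint. This is a simplified Python model under stated assumptions, not the LLVM implementation: `blocks` maps a hypothetical block name to its upward-exposed uses and its defs, and `succs` lists the successors of each block:

```python
def recompute_live_ins(blocks, succs):
    """blocks: name -> (uses, defs) as sets; succs: name -> list of successor names."""
    live_in = {name: set() for name in blocks}
    changed = True
    while changed:  # keep sweeping the whole CFG until nothing changes
        changed = False
        for name, (uses, defs) in blocks.items():
            live_out = set()
            for s in succs.get(name, []):
                live_out |= live_in[s]
            new_in = uses | (live_out - defs)
            if new_in != live_in[name]:
                live_in[name] = new_in
                changed = True
    return live_in
```

Because the loop only stops at a fixpoint, the result does not depend on the order in which the blocks are visited.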
2024-01-26 11:25:36 -08:00
Adhemerval Zanella
a58c62fa82
[X86] Do not end 'note.gnu.property' section with -fcf-protection (#79360)
glibc now adds the required minimum ISA level for libc-nonshared.a
(linked into all programs), and this is done with inline asm along with
.note.gnu.property and .pushsection/.popsection [1]. However, the x86
backend always ends the 'note.gnu.property' section when building with
-fcf-protection, leading to an assertion failure:

llvm/llvm-project-git/llvm/lib/MC/MCStreamer.cpp:1251: virtual void
llvm::MCStreamer::switchSection(llvm::MCSection*, const llvm::MCExpr*):
Assertion `!Section->hasEnded() && "Section already ended"' failed.

[1]
https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/x86/isa-level.c;h=3f1b269848a52f994275bab6f60dded3ded6b144;hb=HEAD
2024-01-26 10:33:47 -08:00
David Green
7f518ee9ea
[DAG] Add a one-use check to concat -> scalar_to_vector fold. (#79510)
Without this we can end up with multiple copies from gpr->fpr.
2024-01-26 18:17:17 +00:00
Amy Kwan
d5fe1bd081
[AIX][TLS] Disallow the use of -maix-small-local-exec-tls and -fno-data-sections (#79252)
This patch disallows the use of the -maix-small-local-exec-tls and
-fno-data-sections options within clang, and also disallows the use of
the aix-small-local-exec-tls attribute with the -data-sections=false
option in llc.

This is because having data sections off when using the
aix-small-local-exec-tls feature is not ideal for performance. As the
small-local-exec-tls region is a limited resource, this space should not
be used for variables that may be replaced.

Note that on AIX, data sections are turned on by default, so this patch
makes it so that a diagnostic is emitted when users explicitly turn off
data sections while using the aix-small-local-exec-tls feature.
2024-01-26 12:39:25 -05:00
dyung
45f883ed06
Change check for embedded llvm version number to a regex to make test more flexible. (#79528)
This test started to fail when LLVM created the release/18.x branch and
the main branch subsequently had the version number increased from 18 to
19.

I investigated this failure (it was blocking our internal automation)
and discovered that the CHECK statement on line 27 seemed to be checking
for the compiler version number (1800) encoded in octal. I don't know if
this is something that explicitly needs to be checked, so I am leaving it
in, but it should be more flexible so the test doesn't fail any time the
version number is changed. To accomplish that, I changed the check for
the 4-digit version number to be a regex.

I originally updated this test for the 18->19 transition in
a01195ff5cc3d7fd084743b1f47007645bb385f4. This change makes the CHECK
line more flexible so it doesn't need to be continually updated.
2024-01-26 09:36:20 -08:00
Nemanja Ivanovic
67c1c1dbb6
[PowerPC][X86] Make cpu id builtins target independent and lower for PPC (#68919)
Make __builtin_cpu_{init|supports|is} target independent and provide an
opt-in query for targets that want to support it. Each target is still
responsible for their specific lowering/code-gen. Also provide code-gen
for PowerPC.

I originally proposed this in https://reviews.llvm.org/D152914 and this
addresses the comments I received there.

---------

Co-authored-by: Nemanja Ivanovic <nemanjaivanovic@nemanjas-air.kpn>
Co-authored-by: Nemanja Ivanovic <nemanja@synopsys.com>
2024-01-26 11:24:50 -05:00
Krzysztof Drewniak
63fe80fb18
[SeparateConstOffsetFromGEP] Handle or disjoint flags (#76997)
This commit extends separate-const-offset-from-gep to look at the
newly-added `disjoint` flag on `or` instructions so as to preserve
additional opportunities for optimization.

The tests were pre-committed in #76972.
2024-01-26 09:56:06 -06:00
Evgenii Kudriashov
a437347562
[X86][GlobalISel] Remove G_OR/G_AND/G_XOR test duplication (NFC) (#79088) 2024-01-26 16:48:51 +01:00
Simon Pilgrim
1f930cf894 [X86] Fold not(pcmpeq(and(X,CstPow2),0)) -> pcmpeq(and(X,CstPow2),CstPow2) (REAPPLIED)
Reapply b9483d30a7d7a0650a0e83c75fcb9ab4932f475a with fix (typo - wasn't ensuring icmp vs zero)

Fixes #78888
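
The fold can be sanity-checked exhaustively on small lanes. The sketch below is a Python model of a single 8-bit lane, not the DAG combine itself, and the function names are made up for illustration:

```python
def pcmpeq(a, b, bits=8):
    """Per-lane compare: all-ones if the lanes are equal, otherwise zero."""
    return (1 << bits) - 1 if a == b else 0

def lhs(x, c, bits=8):
    # not(pcmpeq(and(X, C), 0))
    return ~pcmpeq(x & c, 0, bits) & ((1 << bits) - 1)

def rhs(x, c, bits=8):
    # pcmpeq(and(X, C), C)
    return pcmpeq(x & c, c, bits)

def fold_holds(c, bits=8):
    """True if both forms agree for every possible lane value X."""
    return all(lhs(x, c, bits) == rhs(x, c, bits) for x in range(1 << bits))
```

In this model the two forms agree for every power-of-two constant, and they disagree for a constant such as 3, which is why the constant has to be a power of two.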
2024-01-26 15:13:59 +00:00
Shengchen Kan
d9245e8b47 [X86][ISEL] Add NDD entries in X86ISelDAGToDAG.cpp 2024-01-26 23:02:53 +08:00
Shimin Cui
e278c67096
Add support to merge strings used by metadata (#77364)
Currently, if a merged string is used by metadata, its metadata uses
are not replaced when the string is merged. This patch adds support
for replacing the metadata uses.
2024-01-26 09:22:37 -05:00
Shengchen Kan
035f33bf41 [X86][CodeGen] Add NDD entries for X86InstrInfo::foldImmediate 2024-01-26 22:11:57 +08:00
Luke Lau
5cf9f2cd98 [RISCV] Fix M1 shuffle on wrong SrcVec in lowerShuffleViaVRegSplitting
This fixes a miscompile from #79072 where we were taking the wrong SrcVec to do
the M1 shuffle. E.g. if the SrcVecIdx was 2 and we had 2 VRegsPerSrc, we ended
up taking it from V1 instead of V2.
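
A minimal sketch of the index arithmetic described above, in Python. The helper name and the flattened numbering of m1 registers across both sources are assumptions made for illustration, not the code in lowerShuffleViaVRegSplitting:

```python
def pick_source(src_vec_idx, vregs_per_src):
    """Map a flattened m1 register index to (source vector, register within it)."""
    # Indices 0..vregs_per_src-1 belong to V1, the next vregs_per_src to V2.
    src = "V1" if src_vec_idx < vregs_per_src else "V2"
    return src, src_vec_idx % vregs_per_src
```

With `src_vec_idx = 2` and `vregs_per_src = 2` this selects the first register of V2, which is the case the fix addresses.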
2024-01-26 20:25:05 +07:00
Luke Lau
d407e6ca61 [RISCV] Add test to showcase miscompile from #79072 2024-01-26 20:25:05 +07:00
Diana Picus
46dd8acf36 [AMDGPU] Fix typos. NFC 2024-01-26 12:04:58 +01:00
Shengchen Kan
821dee9852 [X86][CodeGen] Add NDD entries for isAssociativeAndCommutative 2024-01-26 18:39:52 +08:00
Shengchen Kan
14a027b2b7
[X86][CodeGen] Support flags copy lowering for NDD ADC/SBB/RCL/RCR (#79280) 2024-01-26 16:49:44 +08:00
David Green
f0012dcce4 [AArch64] Add a couple more csinc tests with disjoint ors. NFC 2024-01-26 08:30:35 +00:00
XinWang10
02d56801ee
[X86] Support APX promoted RAO-INT and MOVBE instructions (#77431)
R16-R31 were added to the GPRs in
https://github.com/llvm/llvm-project/pull/70958.
This patch supports the promoted RAO-INT and MOVBE instructions in EVEX
space.

RFC:
https://discourse.llvm.org/t/rfc-design-for-apx-feature-egpr-and-ndd-support/73031/4
2024-01-26 14:33:45 +08:00
XinWang10
6d0080b5de
[X86] Support promoted ENQCMD, KEYLOCKER and USERMSR (#77293)
R16-R31 were added to the GPRs in
https://github.com/llvm/llvm-project/pull/70958.
This patch supports the promoted ENQCMD, KEYLOCKER and USER-MSR
instructions in EVEX space.

RFC:
https://discourse.llvm.org/t/rfc-design-for-apx-feature-egpr-and-ndd-support/73031/4
2024-01-26 14:24:43 +08:00
Brandon Wu
fb94c6491a
[RISCV][SiFive] Reduce intrinsics of SiFive VCIX extension (#79407)
This patch models LMUL and SEW as inputs in sf_vc_x_se and sf_vc_i_se,
which removes 42 intrinsics from the lookup table.
2024-01-26 11:15:53 +08:00
Michael Maitland
594b92a7b9
[RISCV] Add Tune to DontSinkSplatOperands (#79199)
A CPU may prefer to not sink splat operands, one reason being that it
could require a S2V transfer buffer to move scalars into buffers.
2024-01-25 14:44:36 -05:00
Florian Hahn
eb678d8993
[AArch64] Combine store (trunc X to <3 x i8>) to sequence of ST1.b. (#78637)
Improve codegen for (trunc X to <3 x i8>) by converting it to a sequence
of 3 ST1.b: first convert the truncate operand to either v8i8 or
v16i8, then extract the lanes for the truncate results and store them.

At the moment, there are almost no cases in which such vector operations
will be generated automatically. The motivating case is non-power-of-2
SLP vectorization: https://github.com/llvm/llvm-project/pull/77790

PR: https://github.com/llvm/llvm-project/pull/78637
2024-01-25 18:28:44 +00:00
David Green
30279dcf51 [AArch64] Add a test from #79100, showing extra unnecessary movs. NFC 2024-01-25 18:15:36 +00:00
Philip Reames
5aa5a2f1b7
[RISCV] Disable exact VLEN splitting for bitrotate shuffles (#79468)
If we have a bitrotate shuffle, this is also by definition a vreg
splittable shuffle when exact VLEN is known. However, there's no profit
to be had from splitting the wider bitrotate lowering into individual m1
pieces. We'd rather leave it at the higher LMUL to reduce code size.

This is a general problem for any linear-in-LMUL shuffle expansions when
the vreg splitting still has to do linear work per piece. On first
reflection it seems like element rotation might have the same
interaction, but in that case, splitting can be done via a set of whole
register moves (which may get folded into the consumer) which is
at least as good as a pair of slideup/slidedown. I think that bitrotate
is the only shuffle expansion we have that actually needs to be handled here.
2024-01-25 10:06:14 -08:00
Douglas Yung
b9483d30a7 Revert "[X86] Fold not(pcmpeq(and(X,CstPow2),0)) -> pcmpeq(and(X,CstPow2),CstPow2)"
This reverts commit 72f10f7eb536da58cb79e13974895cd97d4e1a5f.

This change was causing a miscompile on an internal test and is being reverted at the author's request until it can be fixed.
2024-01-25 09:40:16 -08:00
Jay Foad
c5d59fe1b2
[AMDGPU] Disable V_MAD_U64_U32/V_MAD_I64_I32 workaround for GFX11.5 (#79460)
The hardware bug only affects GFX11.0.x.
2024-01-25 16:28:49 +00:00
Wang Pengcheng
1a14c446dd
[RISCV][MC] Add experimental support of Zaamo and Zalrsc
The `A` extension has been split into two parts: Zaamo (Atomic Memory
Operations) and Zalrsc (Load-Reserved/Store-Conditional). See also
https://github.com/riscv/riscv-zaamo-zalrsc.

This patch adds the MC support.

Reviewers: dtcxzyw, topperc, kito-cheng

Reviewed By: topperc

Pull Request: https://github.com/llvm/llvm-project/pull/78970
2024-01-25 17:03:25 +08:00
David Green
2c49586e1b
[ARM] Fix MVEFloatOps check on creating VCVTN (#79291)
In the past PerformSplittingToNarrowingStores handled both int and float
ops, but since the introduction of MVETRUNC it only operates on float
operations, creating VCVTN nodes. It should be guarded by hasMVEFloatOps
to prevent a failure to select.
2024-01-25 08:12:51 +00:00
paperchalice
e390c229a4
[Pass] Add hyphen to some pass names (#74287)
Here is the list of the renamed passes:
- `callbrprepare` -> `callbr-prepare`
- `dwarfehprepare` -> `dwarf-eh-prepare`
- `flattencfg` -> `flatten-cfg`
- `loweratomic` -> `lower-atomic`
- `lowerinvoke` -> `lower-invoke`
- `lowerswitch` -> `lower-switch`
- `winehprepare` -> `win-eh-prepare`
- `targetir` -> `target-ir`
- `targetlibinfo` -> `target-lib-info`

Legacy passes are not affected.
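
For scripts that still carry the old names, the table above can be applied mechanically. The Python sketch below is a hypothetical helper, not part of LLVM; it rewrites whole pass-name tokens in a pipeline string and leaves everything else alone:

```python
import re

# Old name -> new name, copied from the list in this commit message.
RENAMED = {
    "callbrprepare": "callbr-prepare",
    "dwarfehprepare": "dwarf-eh-prepare",
    "flattencfg": "flatten-cfg",
    "loweratomic": "lower-atomic",
    "lowerinvoke": "lower-invoke",
    "lowerswitch": "lower-switch",
    "winehprepare": "win-eh-prepare",
    "targetir": "target-ir",
    "targetlibinfo": "target-lib-info",
}

def update_pipeline(pipeline: str) -> str:
    """Rename old pass names in a pipeline string, token by token."""
    return re.sub(r"[A-Za-z0-9-]+",
                  lambda m: RENAMED.get(m.group(0), m.group(0)),
                  pipeline)
```

Matching whole tokens means names that were already updated, such as `lower-atomic`, pass through unchanged.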
2024-01-25 16:05:54 +08:00
Jay Foad
45d2d7757f
[AMDGPU] New llvm.amdgcn.wave.id intrinsic (#79325)
This is only valid on targets with architected SGPRs.
2024-01-25 07:48:06 +00:00
Yeting Kuo
df08350dcf
[RISCV] Implement forward inserting save/restore FRM instructions. (#77744)
Previously, RISCVInsertReadWriteCSR inserted an FRM swap for any value
other than 7 and restored the original value right after the vector
instruction. This is inefficient if multiple vector instructions use the
same rounding mode or if the next vector instruction uses a different
explicit rounding mode.

This patch implements a local optimization to solve the above problem.
We assume the starting rounding mode of the basic block is "dynamic."
When iterating through a basic block and encountering an instruction
whose rounding mode is not the same as the current rounding mode, we
change the current rounding mode and save the current rounding mode if
needed. We may also need to restore FRM when encountering a function call,
inline asm, or some uses of FRM.

The advanced version of this is to perform cross basic block analysis
for the starting rounding mode of each basic block.
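
The local optimization can be illustrated with a toy model of one basic block. This Python sketch is a simplification under stated assumptions, not the RISCVInsertReadWriteCSR code: instructions are tuples, the pseudo-ops `swap_frm`, `write_frm` and `restore_frm` are hypothetical names, and only calls, inline asm and dynamic-mode vector instructions are treated as needing the original FRM:

```python
DYN = "dyn"  # the "dynamic" rounding mode: use whatever FRM currently holds
NEEDS_ORIGINAL_FRM = {"call", "asm"}

def insert_frm_writes(block):
    """block: list of ("vec", rm), ("call",), ("asm",) or ("other",) tuples."""
    out = []
    current = DYN   # assumed rounding mode at the start of the block
    saved = False   # True once the original FRM has been saved
    for inst in block:
        kind = inst[0]
        static_rm = inst[1] if kind == "vec" and inst[1] != DYN else None
        if static_rm is not None:
            if static_rm != current:
                if saved:
                    out.append(("write_frm", static_rm))
                else:
                    # The first change saves the old FRM while writing the new one.
                    out.append(("swap_frm", static_rm))
                    saved = True
                current = static_rm
        elif saved and (kind in NEEDS_ORIGINAL_FRM or kind == "vec"):
            # This instruction observes FRM, so put the original value back first.
            out.append(("restore_frm",))
            current, saved = DYN, False
        out.append(inst)
    if saved:
        out.append(("restore_frm",))  # leave the block with the original FRM
    return out
```

In this model, consecutive vector instructions with the same static rounding mode share one FRM write instead of each getting its own swap and restore.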
2024-01-25 14:41:52 +08:00
Craig Topper
5446902cf2 [RISCV] Add IsSignExtendingOpW to amocas.w. (#79351) 2024-01-24 20:15:41 -08:00
Craig Topper
65e0dc68f5 [RISCV] Add test cases showing missed opportunity to remove sext.w after amocas.w. NFC 2024-01-24 20:15:33 -08:00
Philip Reames
28db4017b0 [RISCV] Add test coverage for bad interaction of exact vlen and rotate shuffles 2024-01-24 18:00:41 -08:00
Philip Reames
795090739c [RISCV] Fix a bug accidentally introduced in e9311f9
If we're lowering an e8 m8 shuffle and we have an index value greater than
255, we have no available space to generate an e16 index vector.  The
code had originally handled this correctly, but in a recent refactoring
I had moved the single source code above the check, and thus broke the
single source case by accident.

I have a change on review to rework this (https://github.com/llvm/llvm-project/pull/79330), but for now, go with the most obvious fix.
2024-01-24 17:10:59 -08:00
Philip Reames
7386aa02ef [RISCV] Add test coverage for shuffle index > i8 cornercase
Triggered by discussion on https://github.com/llvm/llvm-project/pull/79330.  In the process of writing this, I realized one of my recent refactorings appears to have broken the legalization for the single source case here.  Fix to follow in a separate patch.
2024-01-24 17:03:29 -08:00
Michael Maitland
3967510032
[RISCV][GISel] First mask argument placed in v0 according to RISCV Vector CC. (#79343)
2024-01-24 16:03:38 -05:00
Jonas Paulsson
84dcf3d35b
[SystemZ] Require D12 for i128 accesses in isLegalAddressingMode() (#79221)
Machines with vector support handle i128 in vector registers and
therefore only have the small displacement available for memory
accesses. Update isLegalAddressingMode() to reflect this.
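
A rough Python sketch of the displacement rule, for illustration only. The function and parameter names are hypothetical, and the ranges are assumptions: an unsigned 12-bit "small" displacement (D12) for vector accesses and a signed 20-bit displacement otherwise:

```python
def is_legal_displacement(offset: int, i128_in_vector_reg: bool) -> bool:
    """Check whether a constant offset fits the assumed displacement field."""
    if i128_in_vector_reg:
        # Assumed D12: unsigned 12-bit displacement, 0..4095.
        return 0 <= offset < (1 << 12)
    # Assumed long displacement: signed 20-bit.
    return -(1 << 19) <= offset < (1 << 19)
```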
2024-01-24 20:16:05 +01:00
Alex MacLean
3b8539c9dc
[NVPTX] use incomplete aggregate initializers (#79062)
The PTX ISA specifies that initializers may be incomplete ([5.4.4.
Initializers](https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#initializers))
> As in C, array initializers may be incomplete, i.e., the number of
initializer elements may be less than the extent of the corresponding
array dimension, with remaining array locations initialized to the
default value for the specified array type.

Emitting initializers in this form is preferable because it reduces the
size of the PTX, in some cases significantly, and can improve compile
time of ptxas as a result.
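
The size saving comes from dropping trailing default values. A minimal Python sketch of that trimming step, assuming the default value is zero and using a hypothetical helper name (this is not the NVPTX AsmPrinter code):

```python
def trim_trailing_zeros(values):
    """Return the shortest prefix of values whose omitted tail is all zeros."""
    end = len(values)
    while end > 0 and values[end - 1] == 0:
        end -= 1
    return values[:end]
```

Only the tail can be dropped: a zero that sits before a non-zero element has to stay so that later elements keep their positions.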
2024-01-24 09:24:28 -08:00
Philip Reames
396b6bbc5e
[RISCV] Recurse on second operand of two operand shuffles (#79197)
This builds on bdc41106ee48dce59c500c9a3957af947f30c8c3.

This change completes the migration to a recursive shuffle lowering
strategy where when we encounter an unknown two argument shuffle, we
lower each operand as a single source permute, and then use a vselect
(i.e. a vmerge) to combine the results. This relies for code quality on
the post-isel combine which will aggressively fold that vmerge back into
the materialization of the second operand if possible.
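
The strategy can be modelled on plain lists. The Python sketch below is illustrative only: it assumes both operands have the same length and that the mask has no undef entries, and `two_source_shuffle` is a hypothetical name:

```python
def two_source_shuffle(v1, v2, mask):
    """Lower a two-operand shuffle as two single-source permutes plus a select."""
    n = len(v1)
    # Single-source permutes: lanes owned by the other operand use a
    # don't-care index (0 here), since the select discards them anyway.
    p1 = [v1[i] if 0 <= i < n else v1[0] for i in mask]
    p2 = [v2[i - n] if i >= n else v2[0] for i in mask]
    # vselect (i.e. vmerge): pick each lane from the operand the mask names.
    return [a if i < n else b for a, b, i in zip(p1, p2, mask)]
```

The result matches indexing into the concatenation of the two operands, which is the reference meaning of a two-operand shuffle.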

Note: The change includes only the most immediately obvious of the
stylistic cleanup. There's a bunch of code movement that this enables
that I'll do as a separate patch as rolling it into this creates an
unreadable diff.
2024-01-24 08:29:28 -08:00
quic-asaravan
dc5b4daae7
[HEXAGON] Inlining Division (#79021)
This patch inlines float division function calls for hexagon.

Co-authored-by: Awanish Pandey <awanpand@codeaurora.org>
2024-01-24 09:30:33 -06:00
Jay Foad
70fc970378
[AMDGPU] Move architected SGPR implementation into isel (#79120) 2024-01-24 15:06:20 +00:00