45278 Commits

Author SHA1 Message Date
Nicola Lancellotti
43fe14c056 [AArch64] Canonicalize ZERO_EXTEND to VSELECT
Differential Revision: https://reviews.llvm.org/D135596
2022-10-17 15:42:46 +01:00
Simon Pilgrim
efd0d66269 [AMDGPU] Add regression test cases reported on D136042 2022-10-17 14:54:27 +01:00
Simon Pilgrim
0aa9a7f8d9 [AMDGPU] Regenerate bfe-combine.ll and bfe-patterns.ll 2022-10-17 14:41:14 +01:00
Jay Foad
0c22f4f5fe [AMDGPU] Common up some generated checks in fnearbyint.ll
Also remove -mattr=-flat-for-global which is not needed for generated
checks.
2022-10-17 11:02:19 +01:00
Roman Lebedev
3c5a164994 [NFC][X86] Test commit, add test with bad mask vector legalization
Inspired by codegen of `@test`
from `llvm/test/Analysis/CostModel/X86/masked-interleaved-*-i16.ll`.
2022-10-16 22:22:10 +03:00
Simon Pilgrim
986ca95e06 [BPF] Add (failing) testcase for Issue #57872 2022-10-16 18:16:18 +01:00
Amara Emerson
13792ba417 [AArch64][GlobalISel] When lowering signext i1 parameters, don't zero-extend to s8 first.
Fixes https://github.com/llvm/llvm-project/issues/57181
2022-10-15 20:25:43 -07:00
Peter Rong
c2e7c9cb33 [CodeGen] Using ZExt for extractelement indices.
In https://github.com/llvm/llvm-project/issues/57452, we found that IRTranslator is translating `i1 true` into `i32 -1`.
This is because IRTranslator uses SExt for indices.

In this fix, we change the expected behavior of extractelement's index, moving from SExt to ZExt.
This change covers the documentation, SelectionDAG, and IRTranslator.
We also added a test for AMDGPU and updated tests for AArch64, Mips, PowerPC, RISCV, VE, WebAssembly, and X86.

This patch fixes issue #57452.

Differential Revision: https://reviews.llvm.org/D132978
2022-10-15 15:45:35 -07:00
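
The `i1 true` to `i32 -1` behavior described above follows directly from how the two extensions treat the top bit. A minimal sketch in Python, where `sext` and `zext` are hypothetical helpers written for illustration and are not LLVM APIs:

```python
def zext(value: int, from_bits: int) -> int:
    """Zero-extend: keep the low from_bits bits as an unsigned number."""
    return value & ((1 << from_bits) - 1)


def sext(value: int, from_bits: int) -> int:
    """Sign-extend: treat bit (from_bits - 1) as the sign bit."""
    value &= (1 << from_bits) - 1
    sign = 1 << (from_bits - 1)
    return (value ^ sign) - sign


i1_true = 1
print(sext(i1_true, 1))  # -1: the only bit of an i1 is its sign bit
print(zext(i1_true, 1))  # 1: the index a reader would expect
```

With sign extension a one-bit index of `true` turns into -1, while zero extension keeps it at 1.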
Simon Pilgrim
0b36d1ef1f [Mips] Regenerate unalignedload.ll 2022-10-15 18:29:54 +01:00
Simon Pilgrim
1901bd0404 [Mips] Regenerate return-struct.ll 2022-10-15 18:21:55 +01:00
Simon Pilgrim
f2c4204d8a [Mips] Regenerate load-store-left-right.ll 2022-10-15 18:21:54 +01:00
wanglei
506e936871 [LoongArch] Fix wrong VariantKind for MO_GOT_PC_{HI/LO} flags
Differential Revision: https://reviews.llvm.org/D135946
2022-10-15 17:45:08 +08:00
Kazushi (Jam) Marukawa
0278c9ceb6 [VE] Change the way to lower select
Change to use VEISD::CMOV in combineSelect for better optimization.
Also support VEISD::CMOV in combineTRUNCATE to optimize truncate.
Merge the functions that handle condition codes into VE.h, and add basic
CMOV patterns to VEInstrInfo.td.  Update regression tests as well.

Reviewed By: efocht

Differential Revision: https://reviews.llvm.org/D135878
2022-10-15 08:49:36 +09:00
Krzysztof Parzyszek
361a27c155 [Hexagon] Recognize idioms for fixed-point vector multiplication
Recognize Q.15*Q.15 and Q.31*Q.31, with and without rounding.
2022-10-14 15:22:25 -07:00
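
For readers unfamiliar with the notation, a Q.15*Q.15 product is an integer multiply followed by a right shift of 15 bits, and the rounding variant adds half of the last retained bit before shifting. The sketch below is a generic Python illustration under those assumptions. `fixed_mul` is a hypothetical helper, and the exact idioms the Hexagon backend matches (for example, whether saturation is involved) are not stated in the commit message.

```python
def fixed_mul(a: int, b: int, frac_bits: int, rounding: bool = False) -> int:
    """Multiply two fixed-point values that each have frac_bits fractional bits."""
    product = a * b  # the raw product has 2 * frac_bits fractional bits
    if rounding:
        product += 1 << (frac_bits - 1)  # add one half in the final format
    return product >> frac_bits  # drop the extra fractional bits


half_q15 = 1 << 14  # 0.5 in Q.15
print(fixed_mul(half_q15, half_q15, 15))             # 8192, i.e. 0.25 in Q.15
print(fixed_mul(1, half_q15, 15))                    # 0: truncated
print(fixed_mul(1, half_q15, 15, rounding=True))     # 1: rounded up
```

The same helper covers Q.31 by passing `frac_bits=31`.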
Philip Reames
d91b0d6816 [RISCV] Merge rv32 and rv64 fixed vector stepvector tests 2022-10-14 14:54:37 -07:00
Martin Storsjö
6eb205b257 Reapply [AArch64] Fix aligning the stack after calling __chkstk
Whenever a call to __chkstk was made, the frame lowering previously
omitted the aligning (as NumBytes was reset to zero before doing
alignment).

This fixes https://github.com/llvm/llvm-project/issues/56182.

The initial version of this produced invalid code for small
functions with no local stack allocations, if those functions
were marked with the "stackrealign" attribute. If building
with -mstack-alignment=16 (which otherwise mostly would be a
no-op), this attribute is added on the main function.

Differential Revision: https://reviews.llvm.org/D135687
2022-10-15 00:40:13 +03:00
Krzysztof Parzyszek
705e77abed [Hexagon] Lower funnel shifts for HVX
HVX v62+ has bidirectional shifts, which do not mask the shift amount to
the bit width. Instead, the shift amount is sign-extended from the log(BW)
bit value, and a negative value causes a shift in the other direction.
For the shift amount being -log(BW), this reversed shift will shift all
bits out, inserting 0s or sign bits depending on the type and direction.
2022-10-14 14:13:18 -07:00
Filipp Zhinkin
ef774bec63 [AArch64] Support SETCCCARRY lowering
Support SETCCCARRY lowering to SBCS instruction.

Related issue: https://github.com/llvm/llvm-project/issues/44629

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D135302
2022-10-14 22:29:31 +03:00
Krzysztof Parzyszek
7f4ce3f1eb [Hexagon] Introduce PS_vsplat[ir][bhw] pseudo instructions
HVX v60 only has splats that take a 32-bit word as input, while v62+
has splats that take 8- or 16-bit value. This makes writing output
patterns that need to use a splat annoying, because the entire output
pattern needs to be replicated for various versions of HVX.
To avoid this, the patterns will always use the pseudos, and then the
pseudos will be handled using a post-ISel hook.
2022-10-14 12:03:13 -07:00
Chris Bieneman
e530a1188e [DX] Add pass to pretty-print DXIL metadata in asm
When DXC prints IR output it adds a bunch of IR comments in a header
that describe the DXIL metadata in a more human-readable format. This
pass will serve that purpose for LLVM by printing out ahead of the IR
printer.

Reviewed By: python3kgae

Differential Revision: https://reviews.llvm.org/D135802
2022-10-14 13:32:59 -05:00
Anshil Gandhi
94ac8f3a8c [BranchRelaxation] Fix test for duplicate branch instruction
This patch is a follow up for D134557, inserting a check
for a duplicate unconditional branch to fall through.

Differential Revision: https://reviews.llvm.org/D135975
2022-10-14 12:21:26 -06:00
Caroline Concatto
60e2aad109 [AArch64]Change printVectorList to print SVE vector range
This patch changes the preferred disassembly for SVE vector lists.
For instance, instead of printing this assembly:
  ld4d { z1.d, z2.d, z3.d, z4.d }, p0/z, [x0]
it will print this:
  ld4d { z1.d-z4.d }, p0/z, [x0]

Differential Revision: https://reviews.llvm.org/D135952
2022-10-14 18:59:56 +01:00
Hassnaa Hamdi
2c72d90ecc [AArch64-SVE]: Force generating code compatible to streaming mode.
Add a compile-time flag for enabling streaming mode.
When streaming mode is enabled, lower basic loads and stores of fixed-width vectors
to generate code that is compatible with streaming mode.

Differential Revision: https://reviews.llvm.org/D133433
2022-10-14 17:46:56 +00:00
chenglin.bi
c1909d7337 [DAGCombiner] Fix crash for the merge stores with different value type
The crash case comes from #58350. It has two stores: one of type f32 and the other of type v1f32.
When we try to merge these two stores as v1f32, memVT is a vector type, so the old code uses ISD::EXTRACT_SUBVECTOR for the f32 value as well, and the compiler crashes.
This patch inserts a build_vector for the f32 store so that it also produces v1f32 when memVT is v1f32.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D135954
2022-10-15 01:16:35 +08:00
Amy Kwan
22e4203df8 [PowerPC][NFC] Pre-commit case for lowering vector shuffles to xxsplti32dx (64 bit)
This patch adds a test case for lowering vector shuffles to xxsplti32dx in
preparation for D135024. The test case added in this patch only adds the
64-bit CHECKs, as the 32-bit CHECKs cannot be generated (which D135024
aims to fix).
2022-10-14 10:15:34 -05:00
Sander de Smalen
02df03c5b7 [AArch64][SME] Add support for arm_locally_streaming functions.
Functions with `aarch64_sme_pstatesm_body` will emit a SMSTART at the start
of the function, and a SMSTOP at the end of the function, such that all
operations use the right value for vscale.

Because the placement of these nodes is critically important (i.e. no
vscale-dependent operations should be done before SMSTART has been issued),
we require glueing the CopyFromReg to the Entry node such that we can
insert the SMSTART as part of that glued chain.

More details about the SME attributes and design can be found
in D131562.

Reviewed By: aemerson

Differential Revision: https://reviews.llvm.org/D131582
2022-10-14 13:47:53 +00:00
chenglin.bi
85e41fcaac [AArch64] Select to CCMN when the CCMP's second operator is negative constant
The second operand of CCMP/CCMN supports constants from 0 to 31. When the second operand of a CCMP is in the range [-31, -1], we can replace the instruction with CCMN to avoid an extra mov.

Fix: #57034

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D135939
2022-10-14 21:41:25 +08:00
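
The range check described above can be sketched as a small decision function. This is an illustrative Python model only. `select_conditional_compare` is a hypothetical name, and the `"mov+ccmp"` label is simply shorthand here for the fallback of materializing the constant in a register first.

```python
def select_conditional_compare(imm: int) -> tuple[str, int]:
    """Pick an instruction for a conditional compare against the constant imm."""
    if 0 <= imm <= 31:
        return ("ccmp", imm)    # immediate fits directly
    if -31 <= imm <= -1:
        return ("ccmn", -imm)   # compare-negative with the negated immediate
    return ("mov+ccmp", imm)    # out of range: constant needs a register


print(select_conditional_compare(5))    # ('ccmp', 5)
print(select_conditional_compare(-5))   # ('ccmn', 5)
print(select_conditional_compare(100))  # ('mov+ccmp', 100)
```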
Martin Storsjö
f309f095e7 Revert "[AArch64] Fix aligning the stack after calling __chkstk"
This reverts commit 50e0aced4521260af842dba73f1d8c50d36314ea.

This could accidentally start producing invalid code in some
cases (in particular, if compiling with -mstack-alignment=16, which
one could expect to be a no-op for a target where the stack always
is aligned to 16 bytes anyway).
2022-10-14 11:55:59 +03:00
gonglingqin
e632bb6543 [LoongArch] Add codegen support for atomicrmw umin/umax operation on LA64
Furthermore, use `beqz $rd, .BB` instead of `beq $rd, $zero, .BB`.

Differential Revision: https://reviews.llvm.org/D135525
2022-10-14 15:24:43 +08:00
Leon Clark
6370bc2435 Add f16 nearbyint support.
Enable lowering of FNEARBYINT for f16 and extend existing tests.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D135124
2022-10-14 08:05:24 +01:00
Matt Arsenault
99dff82118 AMDGPU: Fix failing test with expensive checks
Fixes failure after d383adec4d3914492e67267462e6f00fdd4934af
2022-10-13 23:34:20 -07:00
Anshil Gandhi
d383adec4d [BranchRelaxation] Fall through only if block has no unconditional branches
Prior to inserting an unconditional branch from X to its
fall through basic block, check if X has any terminators to
avoid inserting additional branches.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D134557
2022-10-13 22:48:41 -06:00
chenglin.bi
07c5270043 [AArch64] add tests for ccmp with negative constant op1; NFC 2022-10-14 12:07:43 +08:00
Xiang1 Zhang
aad013de41 [InlineAsm][bugfix] Correct function addressing in inline asm
In the Linux PIC model, there are 4 cases of value/label addressing:
Case 1: Function call or Label jmp inside the module.
Case 2: Data access (such as global variable, static variable) inside the module.
Case 3: Function call or Label jmp outside the module.
Case 4: Data access (such as global variable) outside the module.

Because the current LLVM inline asm architecture is designed not to "recognize" the asm
code, it is difficult to treat memory addressing differently for the
same value/address used in different instructions.
For example, in the PIC model, a call to a function may go through the PLT or be directly PC-relative,
but a lea/mov of a function address may use the GOT.

This patch fixes/refines case 1 and case 2 in inline asm.
Because inline asm currently does not support a jmp to an outside label, this patch
mainly focuses on fixing the function call addressing bugs in inline asm.

Reviewed By: Pengfei, RKSimon

Differential Revision: https://reviews.llvm.org/D133914
2022-10-14 09:47:26 +08:00
Nemanja Ivanovic
0d253bbd33 [PowerPC] Change CRNOT to a code gen single operand instruction
Inputs to crnor can come from operands with chains so
if it is being used simply to negate such an operand,
the repeated input cannot be CSE'd. This patch just
adds a code-gen only instruction for this that takes
a single input and duplicates it in the encoding of
the underlying crnor.

Differential revision: https://reviews.llvm.org/D133577
2022-10-13 20:09:44 -05:00
Michal Paszkowski
14ea4f5bf2 [SPIRV] Fix formatting of function tests
Differential Revision: https://reviews.llvm.org/D135624
2022-10-14 01:55:27 +02:00
Jakub Chlanda
8407fdbd69 [NVPTX] Support neg{.ftz} for f16 and f16x2
Differential Revision: https://reviews.llvm.org/D135428
2022-10-13 10:48:33 -07:00
Craig Topper
e68b0d5875 [RISCV] Match (select C, -1, X)->(or -C, X) during lowerSelect
Same with (select C, X, -1), (select C, 0, X), and (select C, X, 0).

There's a DAGCombine after we turn the select into select_cc, but
that may introduce a setcc that didn't previously exist. We could
add more DAGCombines to remove the extra setcc, but this seemed lower
effort.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D135833
2022-10-13 09:06:12 -07:00
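
The identity in the title can be checked exhaustively for a boolean condition. The Python sketch below assumes C is 0 or 1 and models 64-bit registers with a mask. `select` and `select_all_ones_as_or` are hypothetical helpers, and only the first pattern is shown because the commit message does not spell out the rewritten forms of the other three.

```python
MASK = (1 << 64) - 1  # model a 64-bit register; all ones is -1


def select(c: int, t: int, f: int) -> int:
    """Reference semantics of (select C, T, F)."""
    return t if c else f


def select_all_ones_as_or(c: int, x: int) -> int:
    """Branch-free form (or -C, X), valid when c is 0 or 1."""
    neg_c = (-c) & MASK  # 0 stays 0, 1 becomes all ones
    return neg_c | x


for c in (0, 1):
    for x in (0, 1, 0x1234, MASK):
        assert select(c, MASK, x) == select_all_ones_as_or(c, x)
print("identity holds")  # identity holds
```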
David Green
16e4e4ab87 [CodeGenPrep] Handle constants in ConvertPhiType
This is a simple addition to the convertPhiTypes in CodeGenPrepare to
consider and convert constants as it converts the phi type. Someone
fixed the bug in the motivating example, so the undef is now a constant
0. This does mean converting between integer and floating point
constants, which may have different materialization.

Differential Revision: https://reviews.llvm.org/D135561
2022-10-13 16:41:44 +01:00
David Green
1e80201f7f [AArch64] Add ConvertPhiType constant tests. NFC 2022-10-13 16:23:34 +01:00
Nemanja Ivanovic
a77a70fa3c [PowerPC] Stash GPR to VSR if emergency spill slot is not reachable
When removing frame indices on PowerPC, we need to scavenge
a GPR to materialize a large constant if the stack offset
for the spill/reload cannot be reached by a D-Form
instruction. However, in a perfect storm of conditions,
we may not have GPRs available to scavenge, thereby
requiring an emergency spill. If such an emergency
spill also needs to be spilled to a location with a
large offset, it would itself require register scavenging
thereby creating an infinite loop.

This patch detects when the scavenger cannot scavenge
a register and the spill/reload is to a location with
a large offset. It then stashes a GPR into a VSR so
that it can use the GPR to materialize the constant
(rather than scavenging a GPR).

Fixes: https://github.com/llvm/llvm-project/issues/52894

Differential revision: https://reviews.llvm.org/D124841
2022-10-13 09:06:37 -05:00
Simon Pilgrim
fa9c12ed96 [X86] Attempt to combine binary shuffles where both operands come from the same larger vector
Allows us to use combineX86ShuffleChainWithExtract to combine targetshuffle(low_subvector(x),high_subvector(x)) -> low_subvector(targetshuffle(x)) style patterns

This is currently very limited (it must have a v2i64/v2f64 result), but while triaging I noticed we might be able to extend this to allow more types for targets with suitable variable cross lane shuffle support.

Fixes #58339
2022-10-13 14:34:11 +01:00
WANG Xuerui
f017e92c1c [LoongArch] Add support for llvm.trap and llvm.debugtrap
Similar to D69390 for RISCV, use a guaranteed non-existing insn for
llvm.trap and the break insn for llvm.debugtrap.

Differential Revision: https://reviews.llvm.org/D134365
2022-10-13 19:27:47 +08:00
WANG Xuerui
4e2dfd3589 [LoongArch] Updates for the LoongArch ELF psABI v2.01 revision
The e_flags of existing object files are all 0x3 which happens to be
compatible. From this commit on, all LoongArch objects produced with
upstream LLVM will be of object file ABI v1, which is already supported
by binutils' master branch (to be released as 2.40), and is allowed by
the same binutils version to interlink with v0 objects so the existing
distributions have time to migrate.

Differential Revision: https://reviews.llvm.org/D134601
2022-10-13 19:12:26 +08:00
Sheng
62fc58a61d [AArch64] Improve codegen for "trunc <4 x i64> to <4 x i8>" for all cases
To achieve this, we need this observation:

`uzp1` is just a `xtn` that operates on two registers

For example, given the following register with type v2i64:

LSB_______MSB

x0 x1	x2 x3

Applying xtn on it we get:

x0	x2

This is equivalent to bitcasting it to v4i32, and then applying uzp1 on it:

x0	x1	x2	x3
   |
  uzp1
   v
x0	x2	<value from other register>

We can transform xtn to uzp1 by this observation, and vice versa.

This observation only works on little-endian targets. Big-endian targets have
a problem: the uzp1 cannot be replaced by xtn since there is a discrepancy
in the behavior of uzp1 between little endian and big endian.

To illustrate, take the following for example:

LSB____________________MSB

x0	x1	x2	x3

On little endian, uzp1 grabs x0 and x2, which is right; on big endian, it
grabs x3 and x1, which doesn't match what I saw in the documentation. But, since
I'm new to AArch64, take my word with a pinch of salt. This behavior was
observed in gdb; maybe there is an issue in the order of the values it prints?

Whatever the reason is, the execution result given by qemu just doesn't match.
So I have disabled this on big-endian targets temporarily until we find the crux.

Fixes #57502

Reviewed By: dmgreen, mingmingl

Co-authored-by: Mingming Liu <mingmingl@google.com>

Differential Revision: https://reviews.llvm.org/D133850
2022-10-13 19:08:33 +08:00
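
The xtn/uzp1 observation in this commit message can be modeled in a few lines. This is a Python sketch that assumes little-endian lane order and 64-bit source lanes. The functions `xtn`, `bitcast_v2i64_to_v4i32`, and `uzp1` are simplified stand-ins for the instructions' lane behavior, not real bindings.

```python
LOW32 = 0xFFFFFFFF


def xtn(v2i64: list[int]) -> list[int]:
    """Narrow each 64-bit lane to its low 32 bits."""
    return [x & LOW32 for x in v2i64]


def bitcast_v2i64_to_v4i32(v2i64: list[int]) -> list[int]:
    """Reinterpret lanes on little endian: low half first, then high half."""
    out = []
    for x in v2i64:
        out.append(x & LOW32)
        out.append((x >> 32) & LOW32)
    return out


def uzp1(a: list[int], b: list[int]) -> list[int]:
    """Even-indexed elements of a, followed by even-indexed elements of b."""
    return a[0::2] + b[0::2]


v = [0x1111111122222222, 0x3333333344444444]
other = [0xAAAAAAAABBBBBBBB, 0xCCCCCCCCDDDDDDDD]

narrowed = xtn(v)
unzipped = uzp1(bitcast_v2i64_to_v4i32(v), bitcast_v2i64_to_v4i32(other))
assert unzipped[:2] == narrowed  # low half of uzp1 equals xtn of the first input
print([hex(x) for x in unzipped])
# ['0x22222222', '0x44444444', '0xbbbbbbbb', '0xdddddddd']
```

The upper half of the uzp1 result comes from the second register, which is what lets a single uzp1 narrow two registers at once.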
Simon Pilgrim
7055751115 [X86][AVX2] Add shuffle test case where we fail to merge vpunpcklqdq(vextracti128(x,0),vextracti128(x,1)) -> vpermq
These are likely to appear during truncation
2022-10-13 11:47:37 +01:00
Archibald Elliott
7d15212b8c [ARM] Support fp16/bf16 using w constraint
fp16 and bf16 values can be used in GCC's inline assembly using the "w"
constraint, which means "VFP floating-point registers d0-d31" - fp16 and
bf16 values are stored in S registers (which alias the D registers).

This change ensures that LLVM is compatible with GCC for programs that
use fp16 and the 'w' constraint.

Differential Revision: https://reviews.llvm.org/D135662
2022-10-13 10:32:06 +01:00
Simon Tatham
526ce9c929 Propagate tied operands when copying a MachineInstr.
MachineInstr's copy constructor works by calling the addOperand method
to add each operand of the old MachineInstr to the new one, one by
one. But addOperand deliberately avoids trying to replicate ties
between operands, on the grounds that the tie refers to operands by
index, and the indices aren't necessarily finalized yet.

This led to a code generation fault when the machine pipeliner cloned
an Arm conditional instruction, and lost the tie between the output
register and the input value to be used when the condition failed to
execute.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D135434
2022-10-13 09:40:35 +01:00
Leon Clark
98852a0f3d Precommit for SWDEV-353076: Add check directives to existing tests.
Add FileCheck directives to existing tests in preparation for new tests.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D135788
2022-10-13 08:02:37 +01:00
Martin Storsjö
cbd8464595 [MC] [Win64EH] Check that ARM64 prologs and epilogs have the right matching number of instructions
This matches what was done for the ARM implementation (where getting
the instruction sizes right is even more tricky, and hence needed
tighter testing).

This will allow catching any future cases where prologs and epilogs
don't match the instructions within them.

Differential Revision: https://reviews.llvm.org/D131394
2022-10-13 09:47:39 +03:00