6033 Commits

Author SHA1 Message Date
chenglin.bi
9403a8bc37 [GlobalISel][AArch64] Fix miscompile caused by wrong G_ZEXT selection in GISel
The miscompile case's G_ZEXT has a G_FREEZE source.  Similar to D127154, this patch removed isDef32, relying on the AArch64MIPeephole optimizer to remove redundant SUBREG_TO_REG nodes also in GISel.

Fix #58431

Reviewed By: paquette

Differential Revision: https://reviews.llvm.org/D136433
2022-10-26 09:54:13 +08:00
chenglin.bi
e95c74b423 [AArch64] Add precommit test for bcmp; NFC 2022-10-25 17:23:03 +08:00
Cullen Rhodes
1e02a29e47 [AArch64][SVE] Use more flag-setting instructions
If OP in PTEST(PG, OP(PG, ...)) has a flag-setting variant change the
opcode so the PTEST becomes redundant. This patch extends this existing
optimization in AArch64::optimizePTestInstr to cover all flag-setting
opcodes.

Reviewed By: peterwaller-arm

Differential Revision: https://reviews.llvm.org/D136083
2022-10-25 09:02:21 +00:00
Cullen Rhodes
5621caeb82 [AArch64][SVE] NFC: extend tests for flag-setting predicate instructions
A follow on patch will extend existing

  PTEST(PG, OP(PG, ...)) -> OP_FLAG_SETTING(PG, ...)

optimization in AArch64InstrInfo::optimizePTestInstr to cover more of
the flag-setting instructions

Reviewed By: peterwaller-arm

Differential Revision: https://reviews.llvm.org/D136161
2022-10-25 09:02:20 +00:00
Sander de Smalen
19b9e6204a [AArch64][SME] Fix chain for arm_locally_streaming functions.
The Chain wasn't set correctly in the DAG for functions marked
with aarch64_pstate_sm_body, which meant that SelectionDAG would
dead-code some of the CopyToReg's. This didn't show up in the
existing tests because all uses were in the same block, but when
adding some control-flow, suddenly things would break.

Reviewed By: kmclaughlin

Differential Revision: https://reviews.llvm.org/D136579
2022-10-25 08:14:51 +00:00
Ahmed Bougacha
718bb22c28 [AArch64][PAC] Select XPAC for ptrauth.strip intrinsic.
Differential Revision: https://reviews.llvm.org/D132385
2022-10-24 08:15:56 -07:00
Petar Avramovic
e6c778f861 GlobalISel: Artifact combine merge-like and unmerge into unmerge
Recognize when source could have been unmerged to pieces with DstTy
without having to split source to smaller elements
and then merge small elements into DstTy pieces.
This happens when vector was meant to be split to sub-vectors but there
was leftover. At this point artifact combiner have already dealt with
leftover and we can continue to use sub-vectors.

Differential Revision: https://reviews.llvm.org/D109241
2022-10-24 13:33:05 +02:00
Petar Avramovic
f1aa598046 GlobalISel: Artifact combine merge-like and unmerge into copy
Recognize copy that is represented as split of a source register to
elements that were reassembled to another register with the same type.

Differential Revision: https://reviews.llvm.org/D109240
2022-10-24 13:33:05 +02:00
Petar Avramovic
51b98db487 GlobalISel: Precommit for artifact combine patches
Differential Revision: https://reviews.llvm.org/D117655
2022-10-24 13:33:05 +02:00
Craig Topper
db25f51e37 Revert "[DAGCombiner] Fold (mul (sra X, BW-1), Y) -> (neg (and (sra X, BW-1), Y))"
This reverts commit e8b3ffa532b8ebac5dcdf17bb91b47817382c14d.

The AMDGPU/mad_64_32.ll seems to fail on some of the build bots but
passes locally. I'm really confused.
2022-10-22 22:50:43 -07:00
Craig Topper
e8b3ffa532 [DAGCombiner] Fold (mul (sra X, BW-1), Y) -> (neg (and (sra X, BW-1), Y))
(sra X, BW-1) is either 0 or -1. So the multiply is a conditional
negate of Y.

This pattern shows up when type legalizing wide multiplies involving
a sign extended value.

Fixes PR57549.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D133399
2022-10-22 21:51:45 -07:00
chenglin.bi
ec4db1d0dc [AAArch64][Windows] Fix the crash when running ninja check-asan
The crash comes from mismatch between load count in epilogue and seh instruction count.
Still because of the pass AArch64LoadStoreOpt. It remove some load in the epilogue but haven't remove the corresponding seh instruction.
This patch don't optimize the load in the epilogue to fix the issue.

Fix: #58516

Reviewed By: mstorsjo

Differential Revision: https://reviews.llvm.org/D136430
2022-10-21 22:11:54 +08:00
Eli Friedman
a6ac968360 [Arm64EC] Refer to dllimport'ed functions correctly.
Arm64EC has two different ways to refer to dllimport'ed functions in an
object file. One is using the usual __imp_ prefix, the other is using an
Arm64EC-specific prefix __imp_aux_. As far as I can tell, if a function
is in an x64 DLL, __imp_aux_ refers to the actual x64 address, while
__imp_ points to some linker-generated code that calls the exit thunk.
So __imp_aux_ is used to refer to the address in non-call contexts,
while __imp_ is used for calls to avoid the indirect call checker.

There's one twist to this, though: if an object refers to a symbol using
the __imp_aux_ prefix, the object file's symbol table must also contain
the symbol with the usual __imp_ prefix. The symbol doesn't actually
have to be used anywhere, it just has to exist; otherwise, the linker's
symbol lookup in x64 import libraries doesn't work correctly. Currently,
this is handled by emitting a .globl __imp_foo directive; we could try
to design some better way to handle this.

One minor quirk I haven't figured out: apparently, in Arm64EC mode, MSVC
prefers to use a linker-synthesized stub to call dllimport'ed functions,
instead of branching directly. The linker stub appears to do the same
thing that inline code would do, so not sure if it's just a code-size
optimization, or if the synthesized stub can actually do something other
than just load from the import table in some circumstances.

Differential Revision: https://reviews.llvm.org/D136202
2022-10-20 15:08:56 -07:00
Eli Friedman
decb743e80 [AArch64] Fix scheduler crash in fusion code.
Make sure we don't call getReg() on the first operand of instruction
without knowing that operand is actually a register.

(This codepath isn't enabled for most CPUs; only triggers on certain
CPUs, like Cortex-X1.)

Differential Revision: https://reviews.llvm.org/D136296
2022-10-20 10:47:44 -07:00
zhongyunde
74c2d4f602 [AArch64][SelectionDAG] Lower multiplication by a constant to shl+add+shl+add
Change the costmodel to lower a = b * C where C = (1 + 2^m) * (1 + 2^n) to
      add   w8, w0, w0, lsl #m
      add   w0, w8, w8, lsl #n
Note: The latency can vary depending on the shirt amount

Reviewed By: efriedma, dmgreen
Differential Revision: https://reviews.llvm.org/D135441
2022-10-21 00:33:49 +08:00
Sander de Smalen
5e5baf917b [AArch64][SME] Remove get.pstatesm intrinsic.
This intrinsic can be removed in favour of using a call to
__arm_sme_state() directly and testing the LSB of X0.

In IR that would look like:

   %pstate = call aarch64_sme_preservemost_from_x2 {i64, i64} @__arm_sme_state()
   %pstate.x0 = extractvalue {i64, i64} %pstate, 0
   %pstate.sm = and i64 %pstate.x0, 1
2022-10-20 12:25:32 +00:00
Caroline Concatto
2ecbe8c38c [AArch64] SME2 Single-multi vector ternary int/FP 2 and 4 registers
This patch adds the assembly/disassembly for the following instructions:

For INT:
    ADD(array results, multiple and single vector): Add replicated single
        vector to multi-vector with ZA array vector results.
    SUB(array results, multiple and single vector): Subtract replicated single
        vector from multi-vector with ZA array vector results.
For FP:
    FMLA (multiple and single vector): Multi-vector floating-point fused
          multiply-add by vector.
    FMLS (multiple and single vector): Multi-vector floating-point
          multiply-subtract long by vector.
The reference can be found here:

https://developer.arm.com/documentation/ddi0602/2022-09

The Matriz Operand has 2 new sizes 32(.s) and 64(.d) bits
(MatrixOp32 and MatrixOp64)

Depends on: D135448

Depends on:  D135952

Differential Revision: https://reviews.llvm.org/D135455
2022-10-19 17:49:48 +01:00
Caroline Concatto
579ca5e7e1 [AArch64] Replace sme-i64 by sme-i16i64 and sme-f64 by sme-f64f64
The names in developer.arm for these SME features are:
  HaveSMEI16I64 and HaveSMEF64F64
so the new flag names are consistent with the documentation page

Reviewed By: sdesmalen, c-rhodes

Differential Revision: https://reviews.llvm.org/D135974
2022-10-19 10:56:46 +01:00
Mingming Liu
34d18fd241 [AArch64] Enhance bit-field-positioning op matcher to see through 'any_extend' for pattern 'and(any_extend(shl(val, N)), shifted-mask)'
Before this patch (and refactor patch D135843), isBitfieldPositioningOp won't handle "and(any_extend(shl(val, N), shifted-mask)" (bail out if AND op is not SHL)

After this patch, isBitfieldPositioningOp will see through "any_extend" to find "shl" to find possible bit-field-positioning nodes.

https://gcc.godbolt.org/z/3ncGKbGW6 is a four-liner LLVM IR that could be optimized to UBFIZ (see added test case test_and_extended_shift_with_imm in llvm/test/CodeGen/AArch64/bitfield-insert.ll). One existing test case also improves.

Differential Revision: https://reviews.llvm.org/D135852
2022-10-18 09:07:14 -07:00
chenglin.bi
327c45da26 [AArch64] add test case for pattern ((X >> C) - Y) + Z; NFC 2022-10-18 19:18:23 +08:00
Mingming Liu
db0286a096 [AArch64]Enhance 'isBitfieldPositioningOp' to find pattern (shl(and(val,mask), N).
Before this patch (and D135844)

- Given DAG node shl(op, N), isBitfieldPositioningOp uses (optionally shifted [1] ) op as the Src (least significant bits of Src are inserted into DstLSB of Dst node).

After this patch

- If op is and(val, mask), isBitfieldPositioningOp tries to see through and and find if val is a simpler source than op.

It helps in a similar (probably symmetric) way how isSeveralBitsExtractOpFromShr [2] optimizes isBitfieldExtractOpFromShr

Existing test cases are improved without regressions.

[1] cbd8464595/llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp (L2546)
[2] cbd8464595/llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp (L2057)

Differential Revision: https://reviews.llvm.org/D135850
2022-10-17 09:01:29 -07:00
Nicola Lancellotti
43fe14c056 [AArch64] Canonicalize ZERO_EXTEND to VSELECT
Differential Revision: https://reviews.llvm.org/D135596
2022-10-17 15:42:46 +01:00
Amara Emerson
13792ba417 [AArch64][GlobalISel] When lowering signext i1 parameters, don't zero-extend to s8 first.
Fixes https://github.com/llvm/llvm-project/issues/57181
2022-10-15 20:25:43 -07:00
Peter Rong
c2e7c9cb33 [CodeGen] Using ZExt for extractelement indices.
In https://github.com/llvm/llvm-project/issues/57452, we found that IRTranslator is translating `i1 true` into `i32 -1`.
This is because IRTranslator uses SExt for indices.

In this fix, we change the expected behavior of extractelement's index, moving from SExt to ZExt.
This change includes both documentation, SelectionDAG and IRTranslator.
We also included a test for AMDGPU, updated tests for AArch64, Mips, PowerPC, RISCV, VE, WebAssembly and X86

This patch fixes issue #57452.

Differential Revision: https://reviews.llvm.org/D132978
2022-10-15 15:45:35 -07:00
Martin Storsjö
6eb205b257 Reapply [AArch64] Fix aligning the stack after calling __chkstk
Whenever a call to __chkstk was made, the frame lowering previously
omitted the aligning (as NumBytes was reset to zero before doing
alignment).

This fixes https://github.com/llvm/llvm-project/issues/56182.

The initial version of this produced invalid code for small
functions with no local stack allocations, if those functions
were marked with the "stackrealign" attribute. If building
with -mstack-alignment=16 (which otherwise mostly would be a
no-op), this attribute is added on the main function.

Differential Revision: https://reviews.llvm.org/D135687
2022-10-15 00:40:13 +03:00
Filipp Zhinkin
ef774bec63 [AArch64] Support SETCCCARRY lowering
Support SETCCCARRY lowering to SBCS instruction.

Related issue: https://github.com/llvm/llvm-project/issues/44629

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D135302
2022-10-14 22:29:31 +03:00
Caroline Concatto
60e2aad109 [AArch64]Change printVectorList to print SVE vector range
This patch has the prefered disassembly changed for SVE vector list.
For instance, instead of printing this assembly:
  ld4d { z1.d, z2.d, z3.d, z4.d }, p0/z, [x0]
it will print this:
  ld4d { z1.d-z4.d }, p0/z, [x0]

Differential Revision: https://reviews.llvm.org/D135952
2022-10-14 18:59:56 +01:00
Hassnaa Hamdi
2c72d90ecc [AArch64-SVE]: Force generating code compatible to streaming mode.
Add a compile-time flag for enabling streaming mode.
When streaming mode is enabled, lower basic loads and stores of fixed-width vectors;
to generate code that is compatible to streaming mode.

Differential Revision: https://reviews.llvm.org/D133433
2022-10-14 17:46:56 +00:00
chenglin.bi
c1909d7337 [DAGCombiner] Fix crash for the merge stores with different value type
The crash case comes from #58350. It have two stores, one store is type f32 and the other is v1f32.
When we try to merge these two stores on v1f32, the memVT is vector type so the old code will use ISD::EXTRACT_SUBVECTOR for type f32 also then compiler crash.
So this patch insert a build_vector for f32 store to generate v1f32 also when memVT is v1f32.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D135954
2022-10-15 01:16:35 +08:00
Sander de Smalen
02df03c5b7 [AArch64][SME] Add support for arm_locally_streaming functions.
Functions with `aarch64_sme_pstatesm_body` will emit a SMSTART at the start
of the function, and a SMSTOP at the end of the function, such that all
operations use the right value for vscale.

Because the placement of these nodes is critically important (i.e. no
vscale-dependent operations should be done before SMSTART has been issued),
we require glueing the CopyFromReg to the Entry node such that we can
insert the SMSTART as part of that glued chain.

More details about the SME attributes and design can be found
in D131562.

Reviewed By: aemerson

Differential Revision: https://reviews.llvm.org/D131582
2022-10-14 13:47:53 +00:00
chenglin.bi
85e41fcaac [AArch64] Select to CCMN when the CCMP's second operator is negative constant
CCMP/CCMN's second operator support const from 0 to 31. When the CCMP's second operator is in the range [-31, -1] we can replace it with CCMN to avoid extra mov.

Fix: #57034

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D135939
2022-10-14 21:41:25 +08:00
Martin Storsjö
f309f095e7 Revert "[AArch64] Fix aligning the stack after calling __chkstk"
This reverts commit 50e0aced4521260af842dba73f1d8c50d36314ea.

This could accidentally start producing invalid code in some
cases (in particular, if compiling with -mstack-alignment=16, which
one could expect to be a no-op for a target where the stack always
is aligned to 16 bytes anyway).
2022-10-14 11:55:59 +03:00
chenglin.bi
07c5270043 [AArch64] add tests for ccmp with negative constant op1; NFC 2022-10-14 12:07:43 +08:00
David Green
16e4e4ab87 [CodeGenPrep] Handle constants in ConvertPhiType
This is a simple addition to the convertPhiTypes in CodeGenPrepare to
consider and convert constants as it converts the phi type. Someone
fixed the bug in the motivating example, so the undef is now a constant
0. This does mean converting between integer and floating point
constants, which may have different materialization.

Differential Revision: https://reviews.llvm.org/D135561
2022-10-13 16:41:44 +01:00
David Green
1e80201f7f [AArch64] Add ConvertPhiType constant tests. NFC 2022-10-13 16:23:34 +01:00
Sheng
62fc58a61d [AArch64] Improve codegen for "trunc <4 x i64> to <4 x i8>" for all cases
To achieve this, we need this observation:

`uzp1` is just a `xtn` that operates on two registers

For example, given the following register with type v2i64:

LSB_______MSB

x0 x1	x2 x3

Applying xtn on it we get:

x0	x2

This is equivalent to bitcast it to v4i32, and then applying uzp1 on it:

x0	x1	x2	x3
   |
  uzp1
   v
x0	x2	<value from other register>

We can transform xtn to uzp1 by this observation, and vice versa.

This observation only works on little endian target. Big endian target has
a problem: the uzp1 cannot be replaced by xtn since there is a discrepancy
in the behavior of uzp1 between the little endian and big endian.

To illustrate, take the following for example:

LSB____________________MSB

x0	x1	x2	x3

On little endian, uzp1 grabs x0 and x2, which is right; on big endian, it
grabs x3 and x1, which doesn't match what I saw on the document. But, since
I'm new to AArch64, take my word with a pinch of salt. This bevavior is
observed on gdb, maybe there's issue in the order of the value printed by it ?

Whatever the reason is, the execution result given by qemu just doesn't match.
So I disable this on big endian target temporarily until we find the crux.

Fixes #57502

Reviewed By: dmgreen, mingmingl

Co-authored-by: Mingming Liu <mingmingl@google.com>

Differential Revision: https://reviews.llvm.org/D133850
2022-10-13 19:08:33 +08:00
Martin Storsjö
cbd8464595 [MC] [Win64EH] Check that ARM64 prologs and epilogs have the right matching number of instructions
This matches what was done for the ARM implementation (where getting
the instruction sizes right is even more tricky, and hence needed
tighter testing).

This will allow catching any future cases where prologs and epilogs
don't match the instructions within them.

Differential Revision: https://reviews.llvm.org/D131394
2022-10-13 09:47:39 +03:00
Martin Storsjö
50e0aced45 [AArch64] Fix aligning the stack after calling __chkstk
Whenever a call to __chkstk was made, the frame lowering previously
omitted the aligning (as NumBytes was reset to zero before doing
alignment).

This fixes https://github.com/llvm/llvm-project/issues/56182.

Differential Revision: https://reviews.llvm.org/D135687
2022-10-13 09:47:38 +03:00
Martin Storsjö
bd3fa31887 [AArch64] Generate SEH info for PAC instructions
Without this, unwinding through functions that does use PAC
would fail, if PAC actually was active.

Differential Revision: https://reviews.llvm.org/D135103
2022-10-12 22:21:03 +03:00
Mingming Liu
0849f0a5a3 [AArch64] Pre-commit test case to show sub-optimal codegen for Github issue #57502
Pre-commit test cases to show cases when UZP1 (TRUNC, TRUNC) could be
combined into TRUNC (UZP1) (with some proper bit conversions in the middle) to generate more efficient code.

Differential Revision: https://reviews.llvm.org/D133280
2022-10-12 10:29:09 -07:00
David Green
1e723b7ab3 Revert "[AArch64] Add support for 128-bit non temporal loads."
This reverts commit 661403b85c219a83baa37335a870d4d93dc4b1c3 as the
custom lowering of loads prevents expanding unaligned loads with
strict-align.
2022-10-12 11:11:32 +01:00
Martin Storsjö
a07787c9a5 [AArch64] Exclude instructions after setting the FP from SEH prologues
After setting up the FP, the rest of the prologue doesn't need to
be replayed for unwinding the stack frame.

This allows reverting the functional parts of
2f7fbf837625267193351cc334e506a3a9161958 (but fixing inconsistent
duplicate setting of HasWinCFI).

Differential Revision: https://reviews.llvm.org/D135686
2022-10-12 12:36:21 +03:00
Cullen Rhodes
a17fcb2230 [AArch64][SVE] Fix BRKNS bug in optimizePTestInstr
The BRKNS instruction is unlike the other instructions that set flags
since it has an all active implicit predicate, so the existing

  PTEST(PG, BRKN(PG, A, B)) -> BRKNS(PG, A, B)

in AArch64InstrInfo::optimizePTestInstr is incorrect, however

  PTEST(PTRUE_B(31), BRKN(PG, A, B)) -> BRKNS(PG, A, B)

is correct.

Spotted by @paulwalker-arm in D134946.

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D135655
2022-10-12 08:34:41 +00:00
Cullen Rhodes
495d9e1f3f [AArch64] NFC: Auto-generate llvm/test/CodeGen/AArch64/sve-ptest-removal-brk.ll 2022-10-12 08:34:41 +00:00
chenglin.bi
41f5bbe18b [AArch64][Windows] Check sret attribute also for inreg attribute
Fix the issue: #57684

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D135512
2022-10-12 09:58:50 +08:00
Craig Topper
ac9209751a Revert "[DAGCombiner] Fold (mul (sra X, BW-1), Y) -> (neg (and (sra X, BW-1), Y))"
This reverts commit 0148df8157f05ecf3b1064508e6f012aefb87dad.

Getting a lit test failures on AMDGPU but I can't reproduce it so far.
Reverting to investigate.
2022-10-11 16:30:40 -07:00
Craig Topper
0148df8157 [DAGCombiner] Fold (mul (sra X, BW-1), Y) -> (neg (and (sra X, BW-1), Y))
(sra X, BW-1) is either 0 or -1. So the multiply is a conditional
negate of Y.

This pattern shows up when type legalizing wide multiplies involving
a sign extended value.

Fixes PR57549.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D133399
2022-10-11 16:20:55 -07:00
Jessica Paquette
036a13065b [GlobalISel] Combine (X op Y) == X --> Y == 0
This matches patterns of the form

```
(X op Y) == X
```

And transforms them to

```
Y == 0
```

where appropriate.

Example: https://godbolt.org/z/hfW811c7W

Differential Revision: https://reviews.llvm.org/D135380
2022-10-11 09:52:48 -07:00
Weining Lu
42b70793a1 Reland "[Clang][LoongArch] Add inline asm support for constraints k/m/ZB/ZC"
Reference: https://gcc.gnu.org/onlinedocs/gccint/Machine-Constraints.html

k: A memory operand whose address is formed by a base register and
(optionally scaled) index register.

m: A memory operand whose address is formed by a base register and
offset that is suitable for use in instructions with the same
addressing mode as st.w and ld.w.

ZB: An address that is held in a general-purpose register. The offset
is zero.

ZC: A memory operand whose address is formed by a base register and
offset that is suitable for use in instructions with the same
addressing mode as ll.w and sc.w.

Note:
The INLINEASM SDNode flags in below tests are updated because the new
introduced enum `Constraint_k` is added before `Constraint_m`.
  llvm/test/CodeGen/AArch64/GlobalISel/irtranslator-inline-asm.ll
  llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-inline-asm.ll
  llvm/test/CodeGen/X86/callbr-asm-kill.mir

This patch passes `ninja check-all` on a X86 machine with all official
targets and the LoongArch target enabled.

Differential Revision: https://reviews.llvm.org/D134638
2022-10-11 19:51:48 +08:00
Martin Storsjö
018ac7847b [AArch64] Add SEH_Nop opcodes for BTI hints
These are harmless for the unwinder - the unwinder doesn't need to
handle them for being able to unwind correctly.

Only add the opcodes when the branch target is in a SEH prologue;
for jumptables e.g. within a function, we shouldn't add any SEH
opcodes.

Differential Revision: https://reviews.llvm.org/D135277
2022-10-11 14:32:01 +03:00