47896 Commits

Author SHA1 Message Date
Craig Topper
13fe673301 [RISCV] Move NTLH hint emission into RISCVAsmPrinter.cpp.
Rather than having a separate pass to add the hint instructions,
emit them directly into the streamer during asm printing.

Reviewed By: BeMg, kito-cheng

Differential Revision: https://reviews.llvm.org/D149511
2023-05-01 12:05:18 -07:00
Craig Topper
e56c6f3a8c [RISCV] Prevent lowerVectorStrictFSetcc from creating an ISD::AND with identical operands.
This AND immediately gets legalized to RISCVISD::VMAND_VL and we don't
yet have a DAG combine to optimize that away. So this is a quick fix to
improve generated code.
2023-04-29 21:42:45 -07:00
Ian Douglas Scott
34b37c00ab [M68k] Add instruction selection support for zext with PCD addressing
Instruction selection was failing when trying to zero extend a value
loaded from a PC-relative address. This adds support for zero extension
using the "program counter indirect with displacement" addressing mode.
It also adds a test with code that was previously failing to compile.

This fixes a compile error in Rust's libcore.

Differential Revision: https://reviews.llvm.org/D149034
2023-04-29 16:27:16 -07:00
David Green
f1961153c2 [ARM] Add predicated shift patterns
This uses the patterns defined in MVE_TwoOpPattern to add predicated patterns
for vshls/u instructions.

Differential Revision: https://reviews.llvm.org/D149366
2023-04-29 20:32:54 +01:00
Craig Topper
df017ba9d3 [TargetLowering] Don't use ISD::SELECT_CC in expandFP_TO_INT_SAT.
This function gets called for vectors and ISD::SELECT_CC was never
intended to support vectors. Some updates were made to support
it when this function started getting used for vectors.

Overall, using separate ISD::SETCC and ISD::SELECT looks like an
improvement even for scalar.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D149481
2023-04-29 10:23:08 -07:00
Joseph Huber
a1da746157 [AMDGPU] Place global constructors in .init_array and .fini_array
For the GPU, we emit external kernels that call the initializers and
constructors. However, if we had a persistent kernel, like the `_start`
kernel in the `libc` project, we could call constructors the standard
way. This patch adds new global variables containing
pointers to the constructors to be called. If these are placed in the
`.init_array` and `.fini_array` sections, then the backend will handle
them specially. The linker will then provide the `__init_array_*` and
`__fini_array_*` symbols to traverse them. An implementation would look
like this.

```
extern uintptr_t __init_array_start[];
extern uintptr_t __init_array_end[];
extern uintptr_t __fini_array_start[];
extern uintptr_t __fini_array_end[];

using InitCallback = void(int, char **, char **);
using FiniCallback = void(void);

extern "C" [[gnu::visibility("protected"), clang::amdgpu_kernel]] void
_start(int argc, char **argv, char **envp) {
  uint64_t init_array_size = __init_array_end - __init_array_start;
  for (uint64_t i = 0; i < init_array_size; ++i)
    reinterpret_cast<InitCallback *>(__init_array_start[i])(argc, argv, envp);
  uint64_t fini_array_size = __fini_array_end - __fini_array_start;
  for (uint64_t i = 0; i < fini_array_size; ++i)
    reinterpret_cast<FiniCallback *>(__fini_array_start[i])();
}
```

Reviewed By: yaxunl

Differential Revision: https://reviews.llvm.org/D149340
2023-04-29 08:40:19 -05:00
Matt Arsenault
bc37be1855 LangRef: Add "dynamic" option to "denormal-fp-math"
This is stricter than the default "ieee", and should probably be the
default. This patch leaves the default alone. I can change this in a
future patch.

There are non-reversible transforms I would like to perform which are
legal under IEEE denormal handling, but illegal with flush-to-zero
behavior. Namely, conversions between llvm.is.fpclass and fcmp with
zeroes.

Under "ieee" handling, it is legal to translate between
llvm.is.fpclass(x, fcZero) and fcmp x, 0.

Under "preserve-sign" handling, it is legal to translate between
llvm.is.fpclass(x, fcSubnormal|fcZero) and fcmp x, 0.

I would like to compile and distribute some math library functions in
a mode where it's callable from code with and without denormals
enabled, which requires not changing the compares with denormals or
zeroes.

If an IEEE function transforms an llvm.is.fpclass call into an fcmp 0,
it is no longer possible to call the function from code with denormals
enabled, or write an optimization to move the function into a denormal
flushing mode. For the original function, if x was a denormal, the
class would evaluate to false. If the function compiled with denormal
handling was converted to or called from a preserve-sign function, the
fcmp now evaluates to true.

This could also be of use for strictfp handling, where code may be
changing the denormal mode.

Alternative name could be "unknown".

Replaces the old AMDGPU custom inlining logic with more conservative
logic which tries to permit inlining for callees with dynamic handling
and avoids inlining other mismatched modes.
2023-04-29 08:44:59 -04:00
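To make the hazard described in bc37be1855 concrete, here is a minimal C++ sketch. It is a model only, not LLVM code: `DenormalMode`, `flushInput`, `fcmpEqZero`, `isClassZero`, and `isClassSubnormalOrZero` are hypothetical names, the compare is assumed to be an ordered-equal compare against zero, and "preserve-sign" is modeled by flushing a subnormal input to a zero of the same sign before the compare.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Hypothetical model of two "denormal-fp-math" settings.
enum class DenormalMode { IEEE, PreserveSign };

// Under the modeled "preserve-sign" mode, a subnormal input is treated as a
// zero with the same sign. Under "ieee" the input is left untouched.
inline float flushInput(float x, DenormalMode mode) {
  if (mode == DenormalMode::PreserveSign && std::fpclassify(x) == FP_SUBNORMAL)
    return std::copysign(0.0f, x);
  return x;
}

// Models an ordered-equal compare of x against 0 under the given mode.
inline bool fcmpEqZero(float x, DenormalMode mode) {
  return flushInput(x, mode) == 0.0f;
}

// Models llvm.is.fpclass(x, fcZero): a class test on the value as stored.
inline bool isClassZero(float x) { return std::fpclassify(x) == FP_ZERO; }

// Models llvm.is.fpclass(x, fcSubnormal|fcZero).
inline bool isClassSubnormalOrZero(float x) {
  const int c = std::fpclassify(x);
  return c == FP_ZERO || c == FP_SUBNORMAL;
}

// Shows where the class test and the compare agree and where they diverge.
inline void demo() {
  const float d = std::numeric_limits<float>::denorm_min();
  // "ieee": is.fpclass(x, fcZero) and the compare agree on a subnormal.
  assert(isClassZero(d) == fcmpEqZero(d, DenormalMode::IEEE));
  // "preserve-sign": the same rewrite changes the answer for a subnormal.
  assert(isClassZero(d) != fcmpEqZero(d, DenormalMode::PreserveSign));
  // "preserve-sign": the fcSubnormal|fcZero form matches the compare instead.
  assert(isClassSubnormalOrZero(d) == fcmpEqZero(d, DenormalMode::PreserveSign));
}
```

Under these assumptions, a function that had its class test rewritten to a compare while in "ieee" mode gives a different answer for subnormal inputs once it runs under "preserve-sign". That is the mismatch the "dynamic" option is meant to rule out.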
Luo, Yuanke
40222ddcf8 [X86] Fix the vnni machine combine issue.
The previous patch (D148980) didn't set the InstrIdxForVirtReg correctly
in genAlternativeDpCodeSequence(). It causes vnni lit test failure when
LLVM_ENABLE_EXPENSIVE_CHECKS is on.
2023-04-29 13:51:08 +08:00
Craig Topper
578413751c [RISCV] Add a DAG combine to fold (add (xor (setcc X, Y), 1) -1)->(neg (setcc X, Y)). 2023-04-28 16:52:55 -07:00
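The fold in 578413751c relies on setcc producing either 0 or 1. A small C++ sketch can check the identity for both values. The function names and the choice of a signed less-than comparison are assumptions for illustration only; the fold itself does not depend on which condition code is used.

```cpp
#include <cassert>
#include <cstdint>

// Models a setcc result: 1 when the comparison holds, 0 otherwise.
// The signed "<" condition is an arbitrary choice for this sketch.
constexpr std::int64_t setcc(std::int64_t x, std::int64_t y) {
  return x < y ? 1 : 0;
}

// Original pattern: (add (xor (setcc X, Y), 1), -1).
constexpr std::int64_t beforeFold(std::int64_t x, std::int64_t y) {
  return (setcc(x, y) ^ 1) + (-1);
}

// Folded pattern: (neg (setcc X, Y)).
constexpr std::int64_t afterFold(std::int64_t x, std::int64_t y) {
  return -setcc(x, y);
}

// setcc == 1: (1 ^ 1) - 1 == -1, and -(1) == -1.
static_assert(beforeFold(1, 2) == afterFold(1, 2), "true case must match");
// setcc == 0: (0 ^ 1) - 1 == 0, and -(0) == 0.
static_assert(beforeFold(2, 1) == afterFold(2, 1), "false case must match");
```

Since setcc has only two possible results here, the two `static_assert` lines cover every case.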
Philip Reames
d636bcb6ae [RISCV] Introduce unaligned-vector-mem feature
This allows us to model and thus test transforms which are legal only when a vector load with less than element alignment is supported. This was originally part of D126085, but was split out as we didn't have a good example of such a transform. As can be seen in the test diffs, we have the recently added concat_vector(loads) -> strided_load transform (from D147713) which now benefits from the unaligned support.

While making this change, I realized that we actually *do* support unaligned vector loads and stores of all types via conversion to i8 element type. For contiguous loads and stores without masking, we actually already implement this in the backend - though we don't tell the optimizer that. For indexed, lowering to i8 requires complicated addressing. For indexed and segmented, we'd have to use indexed. All around, doesn't seem worthwhile pursuing, but makes for an interesting observation.

Differential Revision: https://reviews.llvm.org/D149375
2023-04-28 08:28:08 -07:00
David Green
d321f3aa64 [ARM] Enable shouldFoldSelectWithIdentityConstant for MVE
We already have tablegen patterns for a lot of these, but performing the
combine earlier in DAG can help in a few extra cases.

Differential Revision: https://reviews.llvm.org/D149269
2023-04-28 14:57:51 +01:00
Jay Foad
56af0e913c [EarlyCSE] Do not CSE convergent calls in different basic blocks
"convergent" is documented as meaning that the call cannot be made
control-dependent on more values, but in practice we also require that
it cannot be made control-dependent on fewer values, e.g. it cannot be
hoisted out of the body of an "if" statement.

In code like this, if we allow CSE to combine the two calls:

  x = convergent_call();
  if (cond) {
    y = convergent_call();
    use y;
  }

then we get this:

  x = convergent_call();
  if (cond) {
    use x;
  }

This is conceptually equivalent to moving the second call out of the
body of the "if", up to the location of the first call, so it should be
disallowed.

Differential Revision: https://reviews.llvm.org/D149348
2023-04-28 14:50:48 +01:00
Jay Foad
5534d1d834 [CSE] Precommit an AMDGPU test case for D149348
Differential Revision: https://reviews.llvm.org/D149349
2023-04-28 14:50:48 +01:00
Daniel Kiss
d75e70d7ae [AArch64] Add preserve_all calling convention.
Clang accepts preserve_all for AArch64 while it is missing from the backend.

Fixes #58145

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D135652
2023-04-28 14:55:38 +02:00
David Green
5ff493df29 [ARM] Update and regenerate pred-selectop test. NFC
Shift and fdiv tests have been added to show the reverse transform.
2023-04-28 13:47:14 +01:00
Nikita Popov
0659000ff7 [LICM] Don't duplicate instructions just because they're free
D37076 makes LICM duplicate instructions into exit blocks if the
instruction is free. For GEPs, the motivation appears to be that
this allows the GEP to be folded into addressing modes, while
non-foldable users outside the loop might prevent this. TBH I don't
think LICM is the place to do this (why doesn't CGP apply this
heuristic itself?) but at least I understand the motivation.

However, the transform is also applied to all other "free"
instructions, which are just that (removed during lowering and not
"folded" in some way). For such instructions, this transform seems
somewhere between useless, counter-productive (undoing CSE/GVN) and
actively incorrect. For example, this transform can duplicate freeze
instructions, which is illegal.

This patch limits the transform to just foldable GEPs, though we
might want to drop it from LICM entirely as a followup.

This is a small compile-time improvement, because querying TTI cost
model for every single instruction is expensive.

Differential Revision: https://reviews.llvm.org/D149136
2023-04-28 14:31:23 +02:00
Luke Lau
32dbe0f5c0 [RISCV] Fix labels in fixed-vectors-fp test 2023-04-28 12:01:46 +01:00
Lawrence Benson
cd68e17bc2 [AArch64] Add support for efficient bitcast in vector truncate store.
Following the changes in D145301, we now also support the efficient bitcast
when storing the bool vector. Previously, this was expanded.

Differential Revision: https://reviews.llvm.org/D148316
2023-04-28 11:19:45 +01:00
Alexis Engelke
ab21beaccc [AArch64][FastISel] Handle CRC32 intrinsics
For reasons similar to D148023: some applications make heavy use of
the CRC32 intrinsic (e.g., as part of a hash function) and therefore
benefit from avoiding frequent SelectionDAG fallbacks. In our
application, we get a 2% compile-time improvement.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D148917
2023-04-28 11:29:23 +02:00
Luke Lau
bd6fa8656a [RISCV] Add tests for illegal fixed length vectors that need widened
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D148518
2023-04-28 10:19:01 +01:00
Enna1
d961f66b28 [hwasan] fix false positive when hwasan-match-all-tag flag is enabled and short granules are used
When the hwasan-match-all-tag flag is enabled and short granules are used, the tag from the pointer is stored in the X16 register at the point where we check whether this is a short tag case.
This breaks the assumption that the tag from shadow memory is stored in the X16 register, and causes a false positive.

Reviewed By: vitalybuka

Differential Revision: https://reviews.llvm.org/D149252
2023-04-28 17:00:26 +08:00
Enna1
9baa85271d [hwasan][test] add test for hwasan-check-memaccess when hwasan-match-all-tag flag and short granules both used
Reviewed By: vitalybuka

Differential Revision: https://reviews.llvm.org/D149399
2023-04-28 16:57:31 +08:00
Jordan Rupprecht
fbf42f1fe2 Revert "[CodeGenPrepare] Estimate liveness of loop invariants when checking for address folding profitability"
This reverts commit 5344d8e10bb7d8672d4bfae8adb010465470d51b.

It causes non-determinism when building clang. See the review thread on D143897.
2023-04-27 19:16:32 -07:00
Jeffrey Byrnes
7f0a881e6c [AMDGPU] Track liveins for max-ilp-sched-strategy
Even if optimizing for ILP, it is still useful to track RP to avoid spilling. Given that, we need to maintain a consistent liveness state with the RP tracker. This patch makes RP tracking consistent by updating for liveins.

Otherwise, we should completely eliminate RP tracking for this scheduler (checkScheduling, initCandidate).

Differential Revision: https://reviews.llvm.org/D149358
2023-04-27 16:45:45 -07:00
Nick Desaulniers
012ea747ed [CodeGen][MachineLastInstrsCleanup] fix INLINEASM_BR hazard
If the removable definition resides in an INLINEASM_BR target, the
reusable candidate might not dominate the INLINEASM_BR.

   bb0:
      INLINEASM_BR &"" %bb.1
      renamable $x8 = MOVi64imm 29273397577910035
      B %bb.2
      ...
    bb1:
      renamable $x8 = MOVi64imm 29273397577910035
      renamable $x8 = ADDXri killed renamable $x8, 2048, 0
    bb2:

Removing the second mov is a hazard when the inline asm branches to bb1.

Skip such replacements when the to-be-removed instruction is in the
target of such an INLINEASM_BR instruction.

We could get more aggressive about this in the future, but for now
simply abort.

This is causing a boot failure on linux-4.19.y branches of the LTS Linux
kernel for ARCH=arm64 with CONFIG_RANDOMIZE_BASE=y (KASLR) and
CONFIG_UNMAP_KERNEL_AT_EL0=y (KPTI).

Link: https://reviews.llvm.org/D123394
Link: https://github.com/ClangBuiltLinux/linux/issues/1837

Thanks to @nathanchance for the report, and @ardb for debugging.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D149191
2023-04-27 13:40:00 -07:00
Nick Desaulniers
095a0c67bb [CodeGen] precommit machine-latecleanup test
Demonstrates a hazard in machine-latecleanup.

Differential Revision: https://reviews.llvm.org/D149190
2023-04-27 13:39:48 -07:00
Changpeng Fang
1ab8b9ae15 AMDGPU: Define sub-class of SGPR_64 for tail call return
Summary:
Registers for tail call return should not be clobbered by the callee,
so we need a sub-class of SGPR_64 (excluding callee saved registers (CSR)) to hold
the tail call return address.

Because GFX and C calling conventions have different CSR, we need to define
the sub-class separately. This work is an extension of D147096 with the
consideration of GFX calling convention.

Based on the calling conventions, different instructions will be selected with
different sub-class of SGPR_64 as the input.

Reviewers: arsenm, cdevadas and sebastian-ne

Differential Revision: https://reviews.llvm.org/D148824
2023-04-27 10:45:11 -07:00
David Green
4249d609ac [AArch64] Regenerate trunc-to-tbl and zext-to-tbl tests. NFC
The -mattr=+global-isel is not valid syntax, so those lines have been removed.
With Global-ISel there is currently missing vector legalization for wide G_EXT,
and it does not support BE.
2023-04-27 17:21:13 +01:00
skc7
e016fb57b3 [AMDGPU] Legalize soffset of buffer instructions. Use Waterfall loop logic.
Legalize soffset of buffer instructions using waterfall loop.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D141030
2023-04-27 19:36:50 +05:30
Jingu Kang
044f27f62f [AArch64] Precommit tests for VECTOR_SHUFFLE 2023-04-27 14:44:09 +01:00
ManuelJBrito
8b56da5e9f [IR] Change shufflevector undef mask to poison
With this patch an undefined mask in a shufflevector will be printed as poison.
This change is done to support the new shufflevector semantics
for undefined mask elements.

Differential Revision: https://reviews.llvm.org/D149210
2023-04-27 14:41:10 +01:00
Alexis Engelke
7751a91465 [AArch64][FastISel] Handle call with multiple return regs
The code closely follows the X86 back-end. Applications that make heavy
use of {i64, i64} returns to use two registers strongly benefit from the
reduced number of SelectionDAG fallbacks.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D148346
2023-04-27 11:59:33 +02:00
Luo, Yuanke
8f7f9d86a7 [X86] Machine combine vnni instruction.
"vpmaddwd + vpaddd" can be combined to vpdpwssd and the latency is
reduced after combination. However, when vpdpwssd is on a critical path,
the combination yields less ILP. This happens when vpdpwssd is in a loop: the
vpmaddwd can be executed in parallel across iterations, while vpdpwssd
carries a data dependency from one iteration to the next. If vpaddd is on a critical path
while vpmaddwd is not, it is profitable to split vpdpwssd into "vpmaddwd
+ vpaddd".
This patch uses the machine combiner framework to decide
whether to perform the "vpmaddwd + vpaddd" combination. A typical example is shown
below.
```
__m256i foo(int cnt, __m256i c, __m256i b, __m256i *p) {

    for (int i = 0; i < cnt; ++i) {
        __m256i a = p[i];
        __m256i m = _mm256_madd_epi16 (b, a);
        c = _mm256_add_epi32(m, c);
    }

    return c;
}
```

Differential Revision: https://reviews.llvm.org/D148980
2023-04-27 16:42:04 +08:00
Jay Foad
47d3cbcf84 [BranchFolder] Skip redundant IMPLICIT_DEFs of subregs
Differential Revision: https://reviews.llvm.org/D148509
2023-04-27 09:40:06 +01:00
Jay Foad
12b70ad68c [BranchFolder] Precommit AMDGPU test case for D148509 2023-04-27 09:40:06 +01:00
Nicolai Hähnle
1e63f8272e AMDGPU: Fix an assertion in SIOptimizeVGPRLiveRange
As the comment notes, the shader results in an INSERT_SUBREG with
"undef" (dead) operand in the Endif block. The same can happen with
REG_SEQUENCE. The register is considered dead from a liveness
analysis perspective. The correct thing to do seems to be nothing:
we keep the undef use of the register, the register allocator should
still be able to take the liveness into account correctly.

Differential Revision: https://reviews.llvm.org/D149161
2023-04-27 09:39:44 +02:00
Noah Goldstein
ddfee6d0b6 [X86] Support X86ISD::PCMPEQ and X86ISD::PCMPGT in ComputeKnownBits
These functions were missing support but are used enough that it
makes sense to track them.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D148963
2023-04-26 23:48:24 -05:00
Yeting Kuo
1855c0a82a [RISCV] Support vector strict rounding operations.
The patch basically models the custom lowering of the base rounding operations, expanding
rounding by converting to integer and converting back to FP. The other thing
the patch does is convert an sNaN source to a qNaN.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D148519
2023-04-27 11:35:34 +08:00
Craig Topper
5854b39847 [RISCV] Remove the uret instruction.
This was part of the N extension which did not make it into
version 1.12 of the privilege specification.

Reviewed By: jrtc27

Differential Revision: https://reviews.llvm.org/D149308
2023-04-26 17:11:58 -07:00
Matt Arsenault
5b7fa4a48d VE: Register null MCTargetStreamer 2023-04-26 19:27:11 -04:00
Brad Smith
c30c291887 [SPARC] Lower BR_CC to BPr on 64-bit target whenever possible
On 64-bit targets, when doing an i64 BR_CC where one of the comparison operands is a
constant zero, try to fold the compare and BPcc into a BPr instruction.

For all integers, EQ and NE comparisons are available; additionally, for signed
integers, GT, GE, LT, and LE are also available.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D142461
2023-04-26 18:56:00 -04:00
Matt Arsenault
bbc7b30fbf AMDGPU: Remove invalid testcase for enqueue kernel
The call didn't have the right calling convention, but calls to
kernels are supposed to be illegal anyway.
2023-04-26 17:25:30 -04:00
David Green
d340ef697d [AArch64][SVE] Generate smull/umull instead of sve v2i64 mul
A NEON smull/umull should be preferred over an SVE v2i64 mul with two extends.
It uses both fewer instructions and a lower-cost multiply instruction.

Differential Revision: https://reviews.llvm.org/D148248
2023-04-26 22:12:00 +01:00
Craig Topper
3ce3ee6169 [RISCV] Make Zicntr and Zihpm imply Zicsr.
Zicntr and Zihpm are names for groups of CSRs so they should imply
that CSRs exist.

Reviewed By: asb, kito-cheng

Differential Revision: https://reviews.llvm.org/D148962
2023-04-26 10:11:14 -07:00
Craig Topper
236898f619 [RISCV] Accept zicntr and zihpm command line options
This change adds the definition of the two extensions, but does not either a) make any register definitions conditional on them or b) enable the extensions by default.

This is somewhat analogous to https://reviews.llvm.org/D143953, but with some key differences.  The best discussion I can find on status is here: https://github.com/riscv/riscv-profiles/issues/43.  These were removed between document version 2.1 and 2.2, but were not defined as new extensions in 2.2.  That addition came later - in March 2022.

According to https://drive.google.com/file/d/1qa57pePesOiDOrNzxuuGFhCL4Rbi9AYB/view these were ratified in March 2023.

Reviewed By: asb, reames

Differential Revision: https://reviews.llvm.org/D144215
2023-04-26 10:11:07 -07:00
Mingming Liu
9879e5865a [InlineAsm][AArch64]Add backend support for flag output parameters
- The set of flag is from https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html#Flag-Output-Operands

Before:
- ARM64 GCC supports flag output constraints, while Clang doesn't parse condition code, as shown in https://gcc.godbolt.org/z/7jzMEK796
- LLVM ISel won't lower them either (as shown in https://gcc.godbolt.org/z/Pv4PPf56c)

After:
- Given flag output constraints in LLVM IR, condition code is parsed and flag output is lowered to 'cset'.
- Clang parse is not added in this patch.

Differential Revision: https://reviews.llvm.org/D149032
2023-04-26 09:18:41 -07:00
Jay Foad
22516593ae [AMDGPU] Add GFX11 ds_min_f32 / ds_max_f32 tests 2023-04-26 17:09:12 +01:00
Paul Kirth
bface3947e [RISCV] Make SCS prologue interrupt safe on RISC-V
Prior to this patch the SCS prologue used the following instruction
sequence.

```
s[w|d]  ra, 0(gp)
addi    gp, gp, [4|8]
```

The problem with this sequence is that an interrupt occurring between the
store and the increment could clobber the value just written to the SCS.

https://reviews.llvm.org/D84414#inline-813203 pointed out a similar
issue that could have affected the epilogue.

This patch changes the instruction sequence in the prologue to:

```
addi    gp, gp, [4|8]
s[w|d]  ra, -[4|8](gp)
```

The downside to this is that there is now a data dependency between the
add and the store.

Reviewed By: asb

Differential Revision: https://reviews.llvm.org/D149099
2023-04-26 15:58:09 +00:00
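The ordering problem fixed by bface3947e can be shown with a toy C++ simulation. This is a sketch under stated assumptions, not the real prologue. `ShadowStack`, `interrupt`, `oldPrologue`, and `newPrologue` are hypothetical names, `gp` is modeled as an array index rather than a byte pointer, and the interrupt handler is assumed to push its own return address onto the same shadow call stack, starting at the slot `gp` points to on entry.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Toy shadow call stack; gp is an index standing in for the gp register.
struct ShadowStack {
  std::array<std::uint64_t, 8> slots{};
  std::size_t gp = 0;
};

// Hypothetical interrupt handler: pushes its own return address, then pops.
// It writes to whichever slot gp designates when the interrupt arrives.
inline void interrupt(ShadowStack &s, std::uint64_t handlerRA) {
  s.slots[s.gp] = handlerRA;
  s.gp += 1;
  s.gp -= 1;
}

// Old sequence: store, then increment. `at` selects where the interrupt
// lands: 0 = before the store, 1 = between the two instructions,
// 2 = after the increment. Returns what is left in this frame's slot.
inline std::uint64_t oldPrologue(ShadowStack &s, std::uint64_t ra, int at) {
  if (at == 0) interrupt(s, 0xBAD);
  s.slots[s.gp] = ra;                // s[w|d] ra, 0(gp)
  if (at == 1) interrupt(s, 0xBAD);  // overwrites the slot just written
  s.gp += 1;                         // addi gp, gp, [4|8]
  if (at == 2) interrupt(s, 0xBAD);
  return s.slots[s.gp - 1];
}

// New sequence: increment, then store below the new gp.
inline std::uint64_t newPrologue(ShadowStack &s, std::uint64_t ra, int at) {
  if (at == 0) interrupt(s, 0xBAD);
  s.gp += 1;                         // addi gp, gp, [4|8]
  if (at == 1) interrupt(s, 0xBAD);  // writes above the reserved slot
  s.slots[s.gp - 1] = ra;            // s[w|d] ra, -[4|8](gp)
  if (at == 2) interrupt(s, 0xBAD);
  return s.slots[s.gp - 1];
}
```

In this model only `oldPrologue` with `at == 1` loses the saved return address. `newPrologue` reserves the slot before writing it, so an interrupt at any of the three points leaves the saved value intact.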
Joe Nash
f8ec7a0944 [AMDGPU] Delete test for illegal v_cndmask_b16_dpp
There are no VOP2 or VOP2 with dpp forms of v_cndmask_b16. Delete the
test. NFC.

Reviewed By: critson

Differential Revision: https://reviews.llvm.org/D149184
2023-04-26 09:50:44 -04:00
Janek van Oirschot
124acb7ca3 [AMDGPU] Fix negative offset values interpretation in getMemOperandsWithOffset for DS
The offset values may result in an erroneous scheduling of a load before write for a memory location if the offset values are represented as negative values in MIR, despite actually being unsigned values. This representation in MIR happens as SelectionDAG::getConstant could go through APInt to represent the encoding which assumes the MSB of the encoding as a sign-bit, regardless of whether it is supposed to be a signed value. The 8-bit negative (interpreted) value gets cast to an unsigned 32 bit value in getMemOperandsWithOffset used for comparisons in areMemAccessesTriviallyDisjoint eventually leading to an erroneous schedule in the machine scheduler.

Reviewed By: arsenm, foad

Differential Revision: https://reviews.llvm.org/D149080
2023-04-26 14:10:25 +01:00
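The misinterpretation described in 124acb7ca3 can be reproduced in a few lines of C++. The sketch below is illustrative only. The helper names, the access widths, and the simple interval test are assumptions, and the code is not the logic of getMemOperandsWithOffset or areMemAccessesTriviallyDisjoint. It shows how an 8-bit offset whose top bit is set changes value when it is read as signed before being widened to 32 bits.

```cpp
#include <cassert>
#include <cstdint>

// Buggy reading: treat the encoded byte as signed, then widen to 32 bits.
constexpr std::uint32_t widenAsSigned(std::uint8_t encoded) {
  return static_cast<std::uint32_t>(
      static_cast<std::int32_t>(static_cast<std::int8_t>(encoded)));
}

// Intended reading: the encoded byte is an unsigned offset.
constexpr std::uint32_t widenAsUnsigned(std::uint8_t encoded) {
  return encoded;
}

// Half-open byte ranges [a, a + wa) and [b, b + wb) are disjoint when one
// ends at or before the start of the other. 64-bit sums avoid wraparound.
constexpr bool disjoint(std::uint64_t a, std::uint64_t wa,
                        std::uint64_t b, std::uint64_t wb) {
  return a + wa <= b || b + wb <= a;
}

// Encoded offset 0x80 means 128, but the signed reading yields 0xFFFFFF80.
static_assert(widenAsUnsigned(0x80) == 128u, "intended offset");
static_assert(widenAsSigned(0x80) == 0xFFFFFF80u, "sign-extended offset");

// Hypothetical pair: an 8-byte store at offset 124 and a 4-byte load at 128.
inline void demo() {
  // Correct offsets: [124, 132) and [128, 132) overlap.
  assert(!disjoint(widenAsUnsigned(0x80), 4, 124, 8));
  // Sign-extended offset: the load appears far away, so the pair looks
  // disjoint and a scheduler could wrongly reorder the two accesses.
  assert(disjoint(widenAsSigned(0x80), 4, 124, 8));
}
```

Offsets below 0x80 read the same either way, so only offsets with the top bit set are affected.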