42604 Commits

Author SHA1 Message Date
Stanislav Mekhanoshin
522b259976 [AMDGPU] Allow v_accvgpr_write to use SGPR src on gfx940
Differential Revision: https://reviews.llvm.org/D121843
2022-03-17 12:12:06 -07:00
Vang Thao
27e1931508 [AMDGPU] Fix PreRARematerialize scheduler pass sinking subreg defs
When collecting trivially rematerializable defs, skip any subreg defs. We do not want to sink these.

Differential Revision: https://reviews.llvm.org/D121874
2022-03-17 11:38:53 -07:00
Julian Lettner
22570bac69 Lower @llvm.global_dtors using __cxa_atexit on MachO
For MachO, lower `@llvm.global_dtors` into `@llvm_global_ctors` with
`__cxa_atexit` calls to avoid emitting the deprecated `__mod_term_func`.

Reuse the existing `WebAssemblyLowerGlobalDtors.cpp` to accomplish this.

Enable fallback to the old behavior via Clang driver flag
(`-fregister-global-dtors-with-atexit`) or llc / code generation flag
(`-lower-global-dtors-via-cxa-atexit`).  This escape hatch will be
removed in the future.

Differential Revision: https://reviews.llvm.org/D121736
2022-03-17 10:47:13 -07:00
Matt Arsenault
8d66603a48 Revert "RegAllocGreedy: Fix last chance recolor assert in impossible case"
This reverts commit c46aab01c002b7a04135b8b7f1f52d8c9ae23a58.

This evidently blocks compiling in some cases that used to work
before. I'm also not fully convinced this is the correct place to fix
this problem.
2022-03-17 13:12:01 -04:00
Craig Topper
bbd2ecf9f0 [RISCV] Add +experimental-zvfh extension to cover half types in vectors.
Currently we allow half types in vectors if the scalar Zfh extension
is enabled. This behavior is not inline with the vector spec. For f32
and f64 types, the Zve32f, Zve64f, Zve64d, and V explicitly control
the availablity of floating point types in vectors.

In order to make our compiler compliant, we either need to remove all support
for half in vectors or we need an extension to control it.

Draft spec here https://github.com/riscv/riscv-v-spec/pull/780

Reviewed By: kito-cheng

Differential Revision: https://reviews.llvm.org/D121345
2022-03-17 10:04:02 -07:00
Yonghong Song
d2b4a675a8 [BPF] Fix a bug in BPFAdjustOpt pass for icmp transformation
When checking a bcc issue related to bcc tool inject.py,
I found a bug in BPFAdjustOpt pass for icmp transformation,
caused by typo's. For the following condition:
  Cond2Op != ICmpInst::ICMP_SLT && Cond1Op != ICmpInst::ICMP_SLE
it should be
  Cond2Op != ICmpInst::ICMP_SLT && Cond2Op != ICmpInst::ICMP_SLE

This patch fixed the problem and a test case is added.

Differential Revision: https://reviews.llvm.org/D121883
2022-03-17 09:25:18 -07:00
David Green
0fa4aeb453 [AArch64] Add extra insert-subvector tests. NFC 2022-03-17 15:29:07 +00:00
Sanjay Patel
67e9151096 [x86] try harder to use shift instead of test if it can save some immediate bytes
We favor 'and' and 'test' in earlier phases of optimization,
and that's usually the better option, but we can save a few
instruction bytes by converting a mask constant to a shift here.

Differential Revision: https://reviews.llvm.org/D121147
2022-03-17 09:10:57 -04:00
David Green
0b6df40c52 [AArch64] Combine ISD::AND into AArch64ISD::ANDS
If we already have a AArch64ISD::ANDS node with identical operands, we
can merge any ISD::AND into it, reducing the instruction count by
calculating the value and the flags in a single operation. This code is
taken from the X86 backend, and could also handle AArch64ISD::ADDS and
AArch64ISD::SUBS, but I couldn't find any test cases where it came up.

Differential Revision: https://reviews.llvm.org/D118584
2022-03-17 09:44:11 +00:00
Lian Wang
214afc7116 [RISCV] Add patterns for vnsrl.wi and vnsra.wi instructions
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D121675
2022-03-17 07:22:32 +00:00
Abinav Puthan Purayil
f59cb41ba1 [AMDGPU] Select buffer_atomic_cmpswap* in tblgen
This change replaces the manual selection of buffer_atomic_cmpswap*
instructions in SelectionDAG and GlobalISel with a tblgen based
selection in BUFInstructions.td. This allows us to select the return and
no-return variants in tblgen.

Differential Revision: https://reviews.llvm.org/D121770
2022-03-17 10:12:32 +05:30
Heejin Ahn
b8038a916d [WebAssembly] Disable SimplifyDemandedVectorElts after legalization
This fixes a reported bug that caused an infinite loop during the
SelectionDAG optimization phase in ISel, by creating an overridable hook
in `TargetLowering` that allows us to bail out from running
`SimplifyDemandedVectorElts`.

Reviewed By: tlively

Differential Revision: https://reviews.llvm.org/D121869
2022-03-16 20:52:43 -07:00
Heejin Ahn
0ca2132067 [WebAssembly] Improve EH/SjLj error messages
This includes a function name and a relevant instruction in error
messages when possible, making them more helpful.

Reviewed By: dschuff

Differential Revision: https://reviews.llvm.org/D120678
2022-03-16 20:50:34 -07:00
Christudasan Devadasan
6dd21d1db1 [AMDGPU][SIFoldOperands] Consider the alignment constraints
Enforced an alignment check while folding the operands.
2022-03-17 08:27:53 +05:30
Christudasan Devadasan
af717d4aca [AMDGPU][MachineVerifier] Alignment check for fp32 packed math instructions
The fp32 packed math instructions are introduced in gfx90a.
If their vector register operands are not properly aligned, the
verifier should flag them. Currently, the verifier failed to
report it and the compiler ended up emitting a broken assembly.
This patch fixes that missed case in TII::verifyInstruction.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D121794
2022-03-17 08:21:35 +05:30
Craig Topper
74cf8575f7 [RISCV] Remove stale FIXME from a test. NFC 2022-03-16 14:55:11 -07:00
Craig Topper
2e10671ec7 [RISCV] Improve detection of when to skip (and (srl x, c2) c1) -> (srli (slli x, c3-c2), c3) isel.
We have a special case to skip this transform if c1 is 0xffffffff
and x is sext_inreg in order to use sraiw+zext.w. But we were only
checking that we have a sext_inreg opcode, not how many bits are
being sign extended.

This commit adds a check that it is a sext_inreg from i32 so we know for
sure that an sraiw can be created.
2022-03-16 14:54:34 -07:00
Arthur Eubanks
2371c5a0e0 [OpaquePtr][ARM] Use elementtype on ldrex/ldaex/stlex/strex
Includes verifier changes checking the elementtype, clang codegen
changes to emit the elementtype, and ISel changes using the elementtype.

Basically the same as D120527.

Reviewed By: #opaque-pointers, nikic

Differential Revision: https://reviews.llvm.org/D121847
2022-03-16 14:11:53 -07:00
Thomas Lively
7e8913d775 [WebAssembly] Fix names of SIMD instructions containing '_zero'
Fix the instruction names to match the WebAssembly spec:

 - `i32x4.trunc_sat_zero_f64x2_{s,u}` => `i32x4.trunc_sat_f64x2_{s,u}_zero`
 - `f32x4.demote_zero_f64x2` => `f32x4.demote_f64x2_zero`

Also rename related things like intrinsics, builtins, and test functions to
match.

Reviewed By: aheejin

Differential Revision: https://reviews.llvm.org/D121661
2022-03-16 13:34:57 -07:00
Yonghong Song
98e2274458 [BPF] fix a CO-RE bitfield relocation error with >8 record alignment
Jussi Maki reported a fatal error like below for a bitfield
CO-RE relocation:
  fatal error: error in backend: Unsupported field expression for
  llvm.bpf.preserve.field.info, requiring too big alignment
The failure is related to kernel struct thread_struct. The following
is a simplied example.

Suppose we have below structure:
  struct t2 {
    int a[8];
  } __attribute__((aligned(64))) __attribute__((preserve_access_index));
  struct t1 {
    int f1:1;
    int f2:2;
    struct t2 f3;
  } __attribute__((preserve_access_index));

Note that struct t2 has aligned 64, which is used sometimes in the
kernel to enforce cache line alignment.

The above struct will be encoded into BTF and the following is what
C code looks like and the struct will appear in the file like vmlinux.h.
  struct t2 {
        int a[8];
        long: 64;
        long: 64;
        long: 64;
        long: 64;
  } __attribute__((preserve_access_index));
  struct t1 {
        int f1: 1;
        int f2: 2;
        long: 61;
        long: 64;
        long: 64;
        long: 64;
        long: 64;
        long: 64;
        long: 64;
        long: 64;
        struct t2 f3;
  } __attribute__((preserve_access_index));

Note that after
  origin_source -> BTF -> new_source
transition, the new source has the same memory layout as the old one
but the alignment interpretation inside the compiler could be different.
The bpf program will use the later explicitly padded structure as in
vmlinux.h.

In the above case, the compiler internal ABI alignment for new struct t1
is 16 while it is 4 for old struct t1. I didn't do a thorough investigation
why the ABI alignment is 16 and I suspect it is related to anonymous padding
in the above.

Current BPF bitfield CO-RE handling requires alignment <= 8 so proper
bitfield operatin can be performed. Therefore, alignment 16 will cause
a compiler fatal error.

To fix the ABI alignment >=16, let us check whether the bitfield
can be held within a 8-byte-aligned range. If this is the case,
we can use alignment 8. Otherwise, a fatal error will be reported.

Differential Revision: https://reviews.llvm.org/D121821
2022-03-16 12:16:46 -07:00
Jessica Clarke
659363c0cc [RISCV] Ensure PseudoLA* can be hoisted
Since we mark the pseudos as mayLoad but do not provide any MMOs,
isSafeToMove conservatively returns false, stopping MachineLICM from
hoisting the instructions. PseudoLA_TLS_GD does not actually expand to a
load, so stop marking that as mayLoad to allow it to be hoisted, and for
the others make sure to add MMOs during lowering to indicate they're GOT
loads and thus can be freely moved.

Fixes https://github.com/llvm/llvm-project/issues/54372

Reviewed By: MaskRay, arichardson

Differential Revision: https://reviews.llvm.org/D121654
2022-03-16 18:45:36 +00:00
Jessica Clarke
883f755639 [NFC][RISCV] Pre-commit tests for hoisting of PseudoLLA/PseudoLA*
Only PseudoLLA is currently hoisted; this will be fixed in a subsequent
commit.
2022-03-16 18:45:19 +00:00
Jake Egan
c7dc9dbaff [VE] Remove output to /dev/stdout
Sending output to /dev/stdout on AIX gets an llc permission denied error, so this patch removes this from the tests.

Reviewed By: simoll, hubert.reinterpretcast

Differential Revision: https://reviews.llvm.org/D121799
2022-03-16 11:42:09 -04:00
Simon Pilgrim
e3deb7d88b [X86] computeKnownBitsForTargetNode - add X86ISD::AND KnownBits handling
Fixes #54171
2022-03-16 11:05:36 +00:00
Simon Pilgrim
330b532a34 [X86] Add PR54171 test case 2022-03-16 11:05:36 +00:00
Simon Moll
91fad1167a [VE] v512|256 f32|64 fneg isel and tests
fneg instruction isel and tests. We do this also in preparation of fused
negatate-multiple-add fp operations.

Reviewed By: kaz7

Differential Revision: https://reviews.llvm.org/D121620
2022-03-16 11:31:26 +01:00
Florian Hahn
e5822ded56
[FunctionAttrs] Infer argmemonly .
This patch adds initial argmemonly inference, by checking the underlying
objects of locations returned by MemoryLocation.

I think this should cover most cases, except function calls to other
argmemonly functions.

I'm not sure if there's a reason why we don't infer those yet.

Additional argmemonly can improve codegen in some cases. It also makes
it easier to come up with a C reproducer for 7662d1687b09 (already fixed,
but I'm trying to see if C/C++ fuzzing could help to uncover similar
issues.)

Compile-time impact:
NewPM-O3: +0.01%
NewPM-ReleaseThinLTO: +0.03%
NewPM-ReleaseLTO+g: +0.05%

https://llvm-compile-time-tracker.com/compare.php?from=067c035012fc061ad6378458774ac2df117283c6&to=fe209d4aab5b593bd62d18c0876732ddcca1614d&stat=instructions

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D121415
2022-03-16 10:24:33 +00:00
David Green
09a2b5b506 [AArch64] Regenerate and extend peephole-and-tst.ll tests. NFC 2022-03-16 09:44:20 +00:00
Matthias Gehre
09854f2af3 [SelectionDAG] Emit calls to __divei4 and friends for division/remainder of large integers
Emit calls to __divei4 and friends for divison/remainder of large integers.

This fixes https://github.com/llvm/llvm-project/issues/44994.

The overall RFC is in https://discourse.llvm.org/t/rfc-add-support-for-division-of-large-bitint-builtins-selectiondag-globalisel-clang/60329

The compiler-rt part is in https://reviews.llvm.org/D120327

Differential Revision: https://reviews.llvm.org/D120329
2022-03-16 09:36:28 +00:00
Nikita Popov
57d57b1afd [AAEval] Make compatible with opaque pointers
With opaque pointers, we cannot use the pointer element type to
determine the LocationSize for the AA query. Instead, -aa-eval
tests are now required to have an explicit load or store for any
pointer they want to compute alias results for, and the load/store
types are used to determine the location size.

This may affect ordering of results, and sorting within one result,
as the type is not considered part of the sorted string anymore.

To somewhat minimize the churn, printing still uses faux typed
pointer notation.
2022-03-16 10:02:11 +01:00
Haocong.Lu
6a54776fe0 [RISCV] Select SRLI+SLLI for AND with leading ones mask
Select SRLI+SLLI for and i64 %x, imm if the imm is a leading ones mask.
It's useful in RV64 when the mask exceeds simm32 (cannot be generated by LUI).

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D121598
2022-03-16 02:10:57 +00:00
Matthias Braun
84ef62126a X86ISelDAGToDAG: Transform TEST + MOV64ri to SHR + TEST
Optimize a pattern where a sequence of 8/16 or 32 bits is tested for
zero: LLVM normalizes this towards and `AND` with mask which is usually
good, but does not work well on X86 when the mask does not fit into a
64bit register. This DagToDAG peephole transforms sequences like:

```
movabsq $562941363486720, %rax # imm = 0x1FFFE00000000
testq %rax, %rdi
```

to

```
shrq $33, %rdi
testw %di, %di
```

The result has a shorter encoding and saves a register if the tested
value isn't used otherwise.

Differential Revision: https://reviews.llvm.org/D121320
2022-03-15 14:18:04 -07:00
Matthias Braun
baae814377 Add tests for D121320
Differential Revision: https://reviews.llvm.org/D121319
2022-03-15 14:18:04 -07:00
Stefan Pintilie
78406ac898 [PowerPC][P10] Add Vector pair calling convention
Add the calling convention for the vector pair registers.
These registers overlap with the vector registers.

Part of an original patch by: Lei Huang

Reviewed By: nemanjai, #powerpc

Differential Revision: https://reviews.llvm.org/D117225
2022-03-15 14:08:42 -05:00
Joe Nash
687d20de7f [AMDGPU] Regen checks again no-remat-indirect-mov
NFC. Update script does not behave right since the run lines have
identical output. Delete the duplicated check prefix added in
22cfbf7ecacdf7db47c2f65fe896bdf62ebcc0f3
2022-03-15 13:44:41 -04:00
Joe Nash
4cf86bd744 [AMDGPU] Regen checks for schedule-barrier
NFC. Hasn't been updated since script added check-next
2022-03-15 13:35:43 -04:00
Joe Nash
22cfbf7eca [AMDGPU] Regen checks for no-remat-indirect-mov
NFC. Hasn't been updated since the update script started adding
check-next.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D121719
2022-03-15 13:33:42 -04:00
Craig Topper
1bf4bbc492 [LegalizeTypes][RISCV][WebAssembly] Expand ABS in PromoteIntRes_ABS if it will expand to sra+xor+sub later.
If we promote the ABS and then Expand in LegalizeDAG, then both the
sra and the xor will have their inputs sign extended. This generates
extra code on RISCV which lacks an i8 or i16 sign extend instructon.
If we expand during type legalization, then only the sra will get its
input sign extended. RISCV is able to combine this with the sra by
doing a shift left followed by an sra.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D121664
2022-03-15 08:27:39 -07:00
Craig Topper
ad94dfb9a0 [DAGCombiner][RISCV] Adjust (aext (and (trunc x), cst)) -> (and x, cst) to sext cst based on target preference
RISCV strong prefers i32 values be sign extended to i64. This combine
was always zero extending the constant using APInt methods.

This adjusts the code so that it calls getNode using ISD::ANY_EXTEND instead.
getNode will call TLI.isSExtCheaperThanZExt to decide how to handle
the constant.

Tests were copied from D121598 where I noticed that we were creating
constants that were hard to materialize.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D121650
2022-03-15 08:26:47 -07:00
Simon Moll
6ac3d8ef9c [VE] strided v256.23 isel and tests
ISel for experimental.vp.strided.load|store for v256.32 types via
lowering to vvp_load|store SDNodes.

Reviewed By: kaz7

Differential Revision: https://reviews.llvm.org/D121616
2022-03-15 15:29:19 +01:00
Simon Pilgrim
7262eacd41 Revert rG9c542a5a4e1ba36c24e48185712779df52b7f7a6 "Lower @llvm.global_dtors using __cxa_atexit on MachO"
Mane of the build bots are complaining: Unknown command line argument '-lower-global-dtors'
2022-03-15 13:01:35 +00:00
Simon Pilgrim
f591231cad [X86] combineSelect - canonicalize (vXi1 bitcast(iX Cond)) with combineToExtendBoolVectorInReg before legalization
This replaces the attempt in 20af71f8ec47319d375a871db6fd3889c2487cbd to use combineToExtendBoolVectorInReg to create X86ISD::BLENDV masks directly, instead we use it to canonicalize the iX bitcast to a sign-extended mask and then truncate it back to vXi1 prior to legalization breaking it apart.

Fixes #53760
2022-03-15 12:16:11 +00:00
Qiu Chaofan
300e1293de [PowerPC] Disable perfect shuffle by default
We are going to remove the old 'perfect shuffle' optimization since it
brings performance penalty in hot loop around vectors. For example, in
following loop sharing the same mask:

  %v.1 = shufflevector ... <0,1,2,3,8,9,10,11,16,17,18,19,24,25,26,27>
  %v.2 = shufflevector ... <0,1,2,3,8,9,10,11,16,17,18,19,24,25,26,27>

The generated instructions will be `vmrglw-vmrghw-vmrglw-vmrghw` instead
of `vperm-vperm`. In some large loop cases, this causes 20%+ performance
penalty.

The original attempt to resolve this is to pre-record masks of every
shufflevector operation in DAG, but that is somewhat complex and brings
unnecessary computation (to scan all nodes) in optimization. Here we
disable it by default. There're indeed some cases becoming worse after
this, which will be fixed in a more careful way in future patches.

Reviewed By: jsji

Differential Revision: https://reviews.llvm.org/D121082
2022-03-15 15:52:24 +08:00
Julian Lettner
9c542a5a4e Lower @llvm.global_dtors using __cxa_atexit on MachO
For MachO, lower `@llvm.global_dtors` into `@llvm_global_ctors` with
`__cxa_atexit` calls to avoid emitting the deprecated `__mod_term_func`.

Reuse the existing `WebAssemblyLowerGlobalDtors.cpp` to accomplish this.

Enable fallback to the old behavior via Clang driver flag
(`-fregister-global-dtors-with-atexit`) or llc / code generation flag
(`-lower-global-dtors-via-cxa-atexit`).  This escape hatch will be
removed in the future.

Differential Revision: https://reviews.llvm.org/D121327
2022-03-14 17:51:18 -07:00
Stanislav Mekhanoshin
c4500de255 [AMDGPU] gfx940: disable OP_SEL on V_DOT instructions
Differential Revision: https://reviews.llvm.org/D121634
2022-03-14 17:02:00 -07:00
Stanislav Mekhanoshin
1f53f20fc1 [AMDGPU] Support gfx940 v_lshl_add_u64 instruction
Differential Revision: https://reviews.llvm.org/D121401
2022-03-14 15:45:42 -07:00
Stanislav Mekhanoshin
36fe3f13a9 [AMDGPU] flat scratch SVS addressing mode for gfx940
Both VADDR and SADDR are used in SVS mode.

Differential Revision: https://reviews.llvm.org/D121254
2022-03-14 15:23:36 -07:00
Stanislav Mekhanoshin
47bac63d3f [AMDGPU] gfx940 memory model
Differential Revision: https://reviews.llvm.org/D121242
2022-03-14 15:01:46 -07:00
Simon Pilgrim
edd3e705bb [X86] Fix avx512.mask.vpshld/vpshrd tests to correctly test maskz cases 2022-03-14 21:20:26 +00:00
Stanislav Mekhanoshin
72a9e5f891 [AMDGPU] Restrict machine copy propagation from creating unaligned classes
Fixes: SWDEV-326366

Differential Revision: https://reviews.llvm.org/D121491
2022-03-14 14:09:40 -07:00