When collecting trivially rematerializable defs, skip any subreg defs. We do not want to sink these.
Differential Revision: https://reviews.llvm.org/D121874
For MachO, lower `@llvm.global_dtors` into `@llvm_global_ctors` with
`__cxa_atexit` calls to avoid emitting the deprecated `__mod_term_func`.
Reuse the existing `WebAssemblyLowerGlobalDtors.cpp` to accomplish this.
Enable fallback to the old behavior via Clang driver flag
(`-fregister-global-dtors-with-atexit`) or llc / code generation flag
(`-lower-global-dtors-via-cxa-atexit`). This escape hatch will be
removed in the future.
Differential Revision: https://reviews.llvm.org/D121736
This reverts commit c46aab01c002b7a04135b8b7f1f52d8c9ae23a58.
This evidently blocks compiling in some cases that used to work
before. I'm also not fully convinced this is the correct place to fix
this problem.
Currently we allow half types in vectors if the scalar Zfh extension
is enabled. This behavior is not inline with the vector spec. For f32
and f64 types, the Zve32f, Zve64f, Zve64d, and V explicitly control
the availablity of floating point types in vectors.
In order to make our compiler compliant, we either need to remove all support
for half in vectors or we need an extension to control it.
Draft spec here https://github.com/riscv/riscv-v-spec/pull/780
Reviewed By: kito-cheng
Differential Revision: https://reviews.llvm.org/D121345
When checking a bcc issue related to bcc tool inject.py,
I found a bug in BPFAdjustOpt pass for icmp transformation,
caused by typo's. For the following condition:
Cond2Op != ICmpInst::ICMP_SLT && Cond1Op != ICmpInst::ICMP_SLE
it should be
Cond2Op != ICmpInst::ICMP_SLT && Cond2Op != ICmpInst::ICMP_SLE
This patch fixed the problem and a test case is added.
Differential Revision: https://reviews.llvm.org/D121883
We favor 'and' and 'test' in earlier phases of optimization,
and that's usually the better option, but we can save a few
instruction bytes by converting a mask constant to a shift here.
Differential Revision: https://reviews.llvm.org/D121147
If we already have a AArch64ISD::ANDS node with identical operands, we
can merge any ISD::AND into it, reducing the instruction count by
calculating the value and the flags in a single operation. This code is
taken from the X86 backend, and could also handle AArch64ISD::ADDS and
AArch64ISD::SUBS, but I couldn't find any test cases where it came up.
Differential Revision: https://reviews.llvm.org/D118584
This change replaces the manual selection of buffer_atomic_cmpswap*
instructions in SelectionDAG and GlobalISel with a tblgen based
selection in BUFInstructions.td. This allows us to select the return and
no-return variants in tblgen.
Differential Revision: https://reviews.llvm.org/D121770
This fixes a reported bug that caused an infinite loop during the
SelectionDAG optimization phase in ISel, by creating an overridable hook
in `TargetLowering` that allows us to bail out from running
`SimplifyDemandedVectorElts`.
Reviewed By: tlively
Differential Revision: https://reviews.llvm.org/D121869
This includes a function name and a relevant instruction in error
messages when possible, making them more helpful.
Reviewed By: dschuff
Differential Revision: https://reviews.llvm.org/D120678
The fp32 packed math instructions are introduced in gfx90a.
If their vector register operands are not properly aligned, the
verifier should flag them. Currently, the verifier failed to
report it and the compiler ended up emitting a broken assembly.
This patch fixes that missed case in TII::verifyInstruction.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D121794
We have a special case to skip this transform if c1 is 0xffffffff
and x is sext_inreg in order to use sraiw+zext.w. But we were only
checking that we have a sext_inreg opcode, not how many bits are
being sign extended.
This commit adds a check that it is a sext_inreg from i32 so we know for
sure that an sraiw can be created.
Includes verifier changes checking the elementtype, clang codegen
changes to emit the elementtype, and ISel changes using the elementtype.
Basically the same as D120527.
Reviewed By: #opaque-pointers, nikic
Differential Revision: https://reviews.llvm.org/D121847
Fix the instruction names to match the WebAssembly spec:
- `i32x4.trunc_sat_zero_f64x2_{s,u}` => `i32x4.trunc_sat_f64x2_{s,u}_zero`
- `f32x4.demote_zero_f64x2` => `f32x4.demote_f64x2_zero`
Also rename related things like intrinsics, builtins, and test functions to
match.
Reviewed By: aheejin
Differential Revision: https://reviews.llvm.org/D121661
Jussi Maki reported a fatal error like below for a bitfield
CO-RE relocation:
fatal error: error in backend: Unsupported field expression for
llvm.bpf.preserve.field.info, requiring too big alignment
The failure is related to kernel struct thread_struct. The following
is a simplied example.
Suppose we have below structure:
struct t2 {
int a[8];
} __attribute__((aligned(64))) __attribute__((preserve_access_index));
struct t1 {
int f1:1;
int f2:2;
struct t2 f3;
} __attribute__((preserve_access_index));
Note that struct t2 has aligned 64, which is used sometimes in the
kernel to enforce cache line alignment.
The above struct will be encoded into BTF and the following is what
C code looks like and the struct will appear in the file like vmlinux.h.
struct t2 {
int a[8];
long: 64;
long: 64;
long: 64;
long: 64;
} __attribute__((preserve_access_index));
struct t1 {
int f1: 1;
int f2: 2;
long: 61;
long: 64;
long: 64;
long: 64;
long: 64;
long: 64;
long: 64;
long: 64;
struct t2 f3;
} __attribute__((preserve_access_index));
Note that after
origin_source -> BTF -> new_source
transition, the new source has the same memory layout as the old one
but the alignment interpretation inside the compiler could be different.
The bpf program will use the later explicitly padded structure as in
vmlinux.h.
In the above case, the compiler internal ABI alignment for new struct t1
is 16 while it is 4 for old struct t1. I didn't do a thorough investigation
why the ABI alignment is 16 and I suspect it is related to anonymous padding
in the above.
Current BPF bitfield CO-RE handling requires alignment <= 8 so proper
bitfield operatin can be performed. Therefore, alignment 16 will cause
a compiler fatal error.
To fix the ABI alignment >=16, let us check whether the bitfield
can be held within a 8-byte-aligned range. If this is the case,
we can use alignment 8. Otherwise, a fatal error will be reported.
Differential Revision: https://reviews.llvm.org/D121821
Since we mark the pseudos as mayLoad but do not provide any MMOs,
isSafeToMove conservatively returns false, stopping MachineLICM from
hoisting the instructions. PseudoLA_TLS_GD does not actually expand to a
load, so stop marking that as mayLoad to allow it to be hoisted, and for
the others make sure to add MMOs during lowering to indicate they're GOT
loads and thus can be freely moved.
Fixes https://github.com/llvm/llvm-project/issues/54372
Reviewed By: MaskRay, arichardson
Differential Revision: https://reviews.llvm.org/D121654
Sending output to /dev/stdout on AIX gets an llc permission denied error, so this patch removes this from the tests.
Reviewed By: simoll, hubert.reinterpretcast
Differential Revision: https://reviews.llvm.org/D121799
fneg instruction isel and tests. We do this also in preparation of fused
negatate-multiple-add fp operations.
Reviewed By: kaz7
Differential Revision: https://reviews.llvm.org/D121620
This patch adds initial argmemonly inference, by checking the underlying
objects of locations returned by MemoryLocation.
I think this should cover most cases, except function calls to other
argmemonly functions.
I'm not sure if there's a reason why we don't infer those yet.
Additional argmemonly can improve codegen in some cases. It also makes
it easier to come up with a C reproducer for 7662d1687b09 (already fixed,
but I'm trying to see if C/C++ fuzzing could help to uncover similar
issues.)
Compile-time impact:
NewPM-O3: +0.01%
NewPM-ReleaseThinLTO: +0.03%
NewPM-ReleaseLTO+g: +0.05%
https://llvm-compile-time-tracker.com/compare.php?from=067c035012fc061ad6378458774ac2df117283c6&to=fe209d4aab5b593bd62d18c0876732ddcca1614d&stat=instructions
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D121415
With opaque pointers, we cannot use the pointer element type to
determine the LocationSize for the AA query. Instead, -aa-eval
tests are now required to have an explicit load or store for any
pointer they want to compute alias results for, and the load/store
types are used to determine the location size.
This may affect ordering of results, and sorting within one result,
as the type is not considered part of the sorted string anymore.
To somewhat minimize the churn, printing still uses faux typed
pointer notation.
Select SRLI+SLLI for and i64 %x, imm if the imm is a leading ones mask.
It's useful in RV64 when the mask exceeds simm32 (cannot be generated by LUI).
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D121598
Optimize a pattern where a sequence of 8/16 or 32 bits is tested for
zero: LLVM normalizes this towards and `AND` with mask which is usually
good, but does not work well on X86 when the mask does not fit into a
64bit register. This DagToDAG peephole transforms sequences like:
```
movabsq $562941363486720, %rax # imm = 0x1FFFE00000000
testq %rax, %rdi
```
to
```
shrq $33, %rdi
testw %di, %di
```
The result has a shorter encoding and saves a register if the tested
value isn't used otherwise.
Differential Revision: https://reviews.llvm.org/D121320
Add the calling convention for the vector pair registers.
These registers overlap with the vector registers.
Part of an original patch by: Lei Huang
Reviewed By: nemanjai, #powerpc
Differential Revision: https://reviews.llvm.org/D117225
NFC. Update script does not behave right since the run lines have
identical output. Delete the duplicated check prefix added in
22cfbf7ecacdf7db47c2f65fe896bdf62ebcc0f3
NFC. Hasn't been updated since the update script started adding
check-next.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D121719
If we promote the ABS and then Expand in LegalizeDAG, then both the
sra and the xor will have their inputs sign extended. This generates
extra code on RISCV which lacks an i8 or i16 sign extend instructon.
If we expand during type legalization, then only the sra will get its
input sign extended. RISCV is able to combine this with the sra by
doing a shift left followed by an sra.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D121664
RISCV strong prefers i32 values be sign extended to i64. This combine
was always zero extending the constant using APInt methods.
This adjusts the code so that it calls getNode using ISD::ANY_EXTEND instead.
getNode will call TLI.isSExtCheaperThanZExt to decide how to handle
the constant.
Tests were copied from D121598 where I noticed that we were creating
constants that were hard to materialize.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D121650
ISel for experimental.vp.strided.load|store for v256.32 types via
lowering to vvp_load|store SDNodes.
Reviewed By: kaz7
Differential Revision: https://reviews.llvm.org/D121616
This replaces the attempt in 20af71f8ec47319d375a871db6fd3889c2487cbd to use combineToExtendBoolVectorInReg to create X86ISD::BLENDV masks directly, instead we use it to canonicalize the iX bitcast to a sign-extended mask and then truncate it back to vXi1 prior to legalization breaking it apart.
Fixes#53760
We are going to remove the old 'perfect shuffle' optimization since it
brings performance penalty in hot loop around vectors. For example, in
following loop sharing the same mask:
%v.1 = shufflevector ... <0,1,2,3,8,9,10,11,16,17,18,19,24,25,26,27>
%v.2 = shufflevector ... <0,1,2,3,8,9,10,11,16,17,18,19,24,25,26,27>
The generated instructions will be `vmrglw-vmrghw-vmrglw-vmrghw` instead
of `vperm-vperm`. In some large loop cases, this causes 20%+ performance
penalty.
The original attempt to resolve this is to pre-record masks of every
shufflevector operation in DAG, but that is somewhat complex and brings
unnecessary computation (to scan all nodes) in optimization. Here we
disable it by default. There're indeed some cases becoming worse after
this, which will be fixed in a more careful way in future patches.
Reviewed By: jsji
Differential Revision: https://reviews.llvm.org/D121082
For MachO, lower `@llvm.global_dtors` into `@llvm_global_ctors` with
`__cxa_atexit` calls to avoid emitting the deprecated `__mod_term_func`.
Reuse the existing `WebAssemblyLowerGlobalDtors.cpp` to accomplish this.
Enable fallback to the old behavior via Clang driver flag
(`-fregister-global-dtors-with-atexit`) or llc / code generation flag
(`-lower-global-dtors-via-cxa-atexit`). This escape hatch will be
removed in the future.
Differential Revision: https://reviews.llvm.org/D121327