Switch RISC-V to legalize during LegalizeVectorOps instead of
LegalizeDAG. LegalizeDAG uses the OpVT for legalize action while
LegalizeVectorOps uses the result VT. We really should fix that.
ISD::CondCode is a separate num space from opcodes. isOperationLegalOrCustom
should take an opcode.
Reviewed By: barannikov88
Differential Revision: https://reviews.llvm.org/D149528
This function gets called for vectors and ISD::SELECT_CC was never
intended to support vectors. Some updates were made to support
it when this function started getting used for vectors.
Overall, using separate ISD::SETCC and ISD::SELECT looks like an
improvement even for scalar.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D149481
This is stricter than the default "ieee", and should probably be the
default. This patch leaves the default alone. I can change this in a
future patch.
There are non-reversible transforms I would like to perform which are
legal under IEEE denormal handling, but illegal with flushing zero
behavior. Namely, conversions between llvm.is.fpclass and fcmp with
zeroes.
Under "ieee" handling, it is legal to translate between
llvm.is.fpclass(x, fcZero) and fcmp x, 0.
Under "preserve-sign" handling, it is legal to translate between
llvm.is.fpclass(x, fcSubnormal|fcZero) and fcmp x, 0.
I would like to compile and distribute some math library functions in
a mode where it's callable from code with and without denormals
enabled, which requires not changing the compares with denormals or
zeroes.
If an IEEE function transforms an llvm.is.fpclass call into an fcmp 0,
it is no longer possible to call the function from code with denormals
enabled, or write an optimization to move the function into a denormal
flushing mode. For the original function, if x was a denormal, the
class would evaluate to false. If the function compiled with denormal
handling was converted to or called from a preserve-sign function, the
fcmp now evaluates to true.
This could also be of use for strictfp handling, where code may be
changing the denormal mode.
Alternative name could be "unknown".
Replaces the old AMDGPU custom inlining logic with more conservative
logic which tries to permit inlining for callees with dynamic handling
and avoids inlining other mismatched modes.
The previous patch (D148980) didn't set the InstrIdxForVirtReg correctly
in genAlternativeDpCodeSequence(). It causes vnni lit test failure when
LLVM_ENABLE_EXPENSIVE_CHECKS is on.
First, in CodeGenPrepare.cpp, line 6891, the VectorCond will always be false
because if not function will return at 6888.
Second, in SelectionDAGBuilder.cpp, line 5443, getSExtValue() will return
value as int type, but now we use unsigned Val to maintain it, which make the
if condition at 5452 meaningless.
Reviewed By: skan
Differential Revision: https://reviews.llvm.org/D149033
If the removable definition resides in an INLINEASM_BR target, the
reuseable candidate might not dominate the INLINEASM_BR.
bb0:
INLINEASM_BR &"" %bb.1
renamable $x8 = MOVi64imm 29273397577910035
B %bb.2
...
bb1:
renamable $x8 = MOVi64imm 29273397577910035
renamable $x8 = ADDXri killed renamable $x8, 2048, 0
bb2:
Removing the second mov is a hazard when the inline asm branches to bb1.
Skip such replacements when the to be removed instruction is in the
target of such an INLINEASM_BR instruction.
We could get more aggressive about this in the future, but for now
simply abort.
This is causing a boot failure on linux-4.19.y branches of the LTS Linux
kernel for ARCH=arm64 with CONFIG_RANDOMIZE_BASE=y (KASLR) and
CONFIG_UNMAP_KERNEL_AT_EL0=y (KPTI).
Link: https://reviews.llvm.org/D123394
Link: https://github.com/ClangBuiltLinux/linux/issues/1837
Thanks to @nathanchance for the report, and @ardb for debugging.
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D149191
Following the change in shufflevector semantics,
poison will be used to represent undefined elements in shufflevector masks.
Differential Revision: https://reviews.llvm.org/D149256
For functions with very large numbers of live variables, lookups into
LiveRegMap previously detoriated to linear searches.
This slightly increases memory usage, but that is barely measurable.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D149330
We were previously using the condition as the mask. By the semantics
of VP operations, that means that anywhere the condition is false
returns poison and not the false operand.
Use an all ones mask instead.
No tests are affected because RISC-V drops the mask when lowering.
Reviewed By: fakepaper56
Differential Revision: https://reviews.llvm.org/D149310
The only way known bits could help identify a known power of two is if
it knows exactly which power of two it is, i.e. if it is a known
constant. But in that case the value should have been simplified to a
constant already. So save some compile time by not calling
computeKnownBits.
Differential Revision: https://reviews.llvm.org/D149325
"vpmaddwd + vpaddd" can be combined to vpdpwssd and the latency is
reduced after combination. However when vpdpwssd is in a critical path
the combination get less ILP. It happens when vpdpwssd is in a loop, the
vpmaddwd can be executed in parallel in multi-iterations while vpdpwssd
has data dependency for each iterations. If vpaddd is in a critical path
while vpmaddwd is not, it is profitable to split vpdpwssd into "vpmaddwd
+ vpaddd".
This patch is based on the machine combiner framework to acheive decision
on "vpmaddwd + vpaddd" combination. The typical example code is as
below.
```
__m256i foo(int cnt, __m256i c, __m256i b, __m256i *p) {
for (int i = 0; i < cnt; ++i) {
__m256i a = p[i];
__m256i m = _mm256_madd_epi16 (b, a);
c = _mm256_add_epi32(m, c);
}
return c;
}
```
Differential Revision: https://reviews.llvm.org/D148980
Remove assert from AssignmentTrackingAnalysis that fires if a local variable
has non-alloca storage. The analysis can emit these locations but the
assignment tracking code in SelectionDAG isn't ready to handle non-alloca
storage for locals yet. The AssignmentTrackingPass (pass that adds assignment
tracking metadata) ignores non-alloca dbg.declares, so the only variables
affected are those who's backing storage is changed from an alloca during
optimisation, and the result is the variables are dropped.
Fixes: https://ci.chromium.org/ui/p/pigweed/builders/toolchain/
toolchain-ci-pigweed-linux/b8783274592206481489/overview
Reviewed By: StephenTozer
Differential Revision: https://reviews.llvm.org/D149135
The vectors being sorted here shouldn't contain duplicate entries. Prior to
this patch this was checked with an assert within the `std::sort`
predicate. However, `std::sort` may compare an element against itself which
causes the assert to fire (false positive). Move the assert outside of the sort
predicate to avoid such issues.
Reviewed By: StephenTozer
Differential Revision: https://reviews.llvm.org/D149045
If cloning a DBG_VALUE instruction, register uses in that instruction could
lead to constraining of a virtual register that would not happen if the
DBG_VALUE was not present at all. This lead to different code with/without
debug info.
Now we only do that register class constraining if we dealing with a non
debug instruction.
Differential Revision: https://reviews.llvm.org/D149146
The kernel and kext environments do not provide the `__cxa_atexit()`
function, so we can't use it for lowering global module destructors.
Unfortunately, just querying for "compiling for kernel/kext?" in the LTO
pipeline isn't possible (kernel/kext identifier isn't part of the triple
yet) so we need to pass down a CodeGen flag.
rdar://93536111
Differential Revision: https://reviews.llvm.org/D148967
In rare situations involving AVX intrinsics, it seems LLVM can be coaxed
into generating copies to arguments that look like this:
$xmm0 = VMOVAPSrr $xmm1, implicit-def $ymm0
CALL64 @something ymm0
This particular form of copy implicitly zeros the upper lanes of ymm0,
hence there's an implicit-def for the register in the copy. The X86
implementation of describeLoadedValue doesn't attempt to describe this sort
of copy which causes the generic implementation in
TargetInstrInfo::describeLoadedValue to fire an assertion saying it
expected the target hook to handle it.
Play it safe in the generic implementation and return the "no location /
value" return value, rather than asserting.
Differential Revision: https://reviews.llvm.org/D148626
This commit implements the two NTLH intrinsic functions.
```
type __riscv_ntl_load (type *ptr, int domain);
void __riscv_ntl_store (type *ptr, type val, int domain);
```
```
enum {
__RISCV_NTLH_INNERMOST_PRIVATE = 2,
__RISCV_NTLH_ALL_PRIVATE,
__RISCV_NTLH_INNERMOST_SHARED,
__RISCV_NTLH_ALL
};
```
We encode the non-temporal domain into MachineMemOperand flags.
1. Create the RISC-V built-in function with custom semantic checking.
2. Assume the domain argument is a compile time constant,
and make it as LLVM IR metadata (nontemp_node).
3. Encode domain value as two bits MachineMemOperand TargetMMOflag.
4. According to MachineMemOperand TargetMMOflag, select corrsponding ntlh instruction.
Currently, it supports scalar type and fixed-length vector type.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D143364
When checking the profitability of folding an address computation
into a memory instruction, the compiler tries to determine the liveness
of the values, comprising the address, at the point of the memory instruction.
This patch improves on the live variable estimates by including
the loop invariants which are references in the loop body.
Reviewed By: dmgreen
Differential Revision: https://reviews.llvm.org/D143897
This commit splits the generic part of `LoopInfo` into separate files.
These new `GenericLoopInfo` files are located in `llvm/Support` to be inline
with `GenericDomTree`.
Furthermore, this change ensures that MLIR's Bazel build does not have
to link against `LLVMAnalysis` just to use these template headers.
Depends on D148219
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D148235
These codes deleted are dead code, we never go into it.
1. In AggressiveAntiDepBreaker.cpp, have assert AntiDepReg != 0.
2. IfConversion.cpp, Kind can only be one unique value, so isFalse && isRev
can never be true.
3. DAGCombiner.cpp, at line 3675, we have considered the condition like
```
// fold (sub x, c) -> (add x, -c)
if (N1C) {
return DAG.getNode(ISD::ADD, DL, VT, N0,
DAG.getConstant(-N1C->getAPIntValue(), DL, VT));
}
```
4. ScheduleDAGSDNodes.cpp, we have Latency > 1 at line 663
5. MasmParser.cpp, code exists in a switch-case block which decided by
the value FirstTokenKind, at line 1621, FirstTokenKind could only be
one of AsmToken::Dollar, AsmToken::At and AsmToken::Identifier.
Reviewed By: skan
Differential Revision: https://reviews.llvm.org/D148610
Currently clangDriver passes -femulated-tls and -fno-emulated-tls to cc1.
cc1 forwards the option to LLVMCodeGen and ExplicitEmulatedTLS is used
to decide the value. Simplify this by moving the Clang decision to
clangDriver and moving the LLVM decision to InitTargetOptionsFromCodeGenFlags.
If the add node has wrap flags then they will be destroyed by converting to
sub/not. The flags can be useful in converting to rhadd, for example, but that
may be required late if the node types need to be legalized. This limits the
preferIncOfAddToSubOfNot fold until after legalize DAG if the node have flags
to allow more folding.
Differential Revision: https://reviews.llvm.org/D148809
The "nested" `AddressingModeMatcher`s in
`AddressingModeMatcher::isProfitableToFoldIntoAddressingMode` are constructed
using the original memory instruction, even though they check whether the
address operand of a differrent memory instructon is foldable. The memory
instruction is used only for a dominance check (when not checking for
profitability), and using the wrong memory instruction does not change the
outcome of the test - if an address is foldable, the dominance test afects which
of the two possible ways to fold is chosen, but this result is discarded.
As an example, in
target triple = "x86_64-linux"
declare i1 @check(i64, i64)
define i32 @f(i1 %cc, ptr %p, ptr %q, i64 %n) {
entry:
br label %loop
loop:
%iv = phi i64 [ %i, %C ], [ 0, %entry ]
%offs = mul i64 %iv, 4
%c.0 = icmp ult i64 %iv, %n
br i1 %c.0, label %A, label %fail
A:
br i1 %cc, label %B, label %C
C:
%u = phi i32 [0, %A], [%w, %B]
%i = add i64 %iv, 1
%a.0 = getelementptr i8, ptr %p, i64 %offs
%a.1 = getelementptr i8, ptr %a.0, i64 4
%v = load i32, ptr %a.1
%c.1 = icmp eq i32 %v, %u
br i1 %c.1, label %exit, label %loop
B:
%a.2 = getelementptr i8, ptr %p, i64 %offs
%a.3 = getelementptr i8, ptr %a.2, i64 4
%w = load i32, ptr %a.3
br label %C
exit:
ret i32 -1
fail:
ret i32 0
}
the dominance test is perfomed between `%i = ...` and `%v = ...` at the moment
we're checking whether `%a3 = ...` is foldable
Using the memory instruction, which uses the interesting address is "more
correct" and this change is needed by a future patch.
Reviewed By: mkazantsev
Differential Revision: https://reviews.llvm.org/D143896
This patch recommits 0827e2fa3fd15b49fd2d0fc676753f11abb60cab after
reverting it in ed7ada259f665a742561b88e9e6c078e9ea85224. Added
workround for `Targetlowering::AddrMode` no longer being an aggregate
in C++20.
`AArch64TargetLowering::isLegalAddressingMode` has a number of
defects, including accepting an addressing mode, which consists of
only an immediate operand, or not checking the offset range for an
addressing mode in the form `1*ScaledReg + Offs`.
This patch fixes the above issues.
Reviewed By: dmgreen
Differential Revision: https://reviews.llvm.org/D143895
Change-Id: I41a520c13ce21da503ca45019979bfceb8b648fa
This commit adds support for scalable vector types in theComplexDeinterleaving
pass, allowing it to recognize and handle `llvm.vector.interleave2` and
`llvm.vector.deinterleave2` intrinsics for both fixed and scalable vectors
Differential Revision: https://reviews.llvm.org/D147451
The pass uses ReachingDefAnalysis which has no information about
instructions in dead blocks.
So do not process them.
Reviewed By: pengfei
Differential Revision: https://reviews.llvm.org/D148329
Previously, `vecreduce_{and,or} vNi1` could lead to miscompilations because the legalizer first decides to `any_ext` the operand (which is correct for `vecreduce_{and,or}`) and then decides to use `vecreduce_u{min,max}` instead (for which `any_ext` is incorrect). This patch changes it so the `vecreduce_u{min,max}` operations use `sign_ext` instead of `any_ext`.
Issue: https://github.com/llvm/llvm-project/issues/62211
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D148672
`AArch64TargetLowering::isLegalAddressingMode` has a number of
defects, including accepting an addressing mode which consists of only
an immediate operand, or not checking the offset range for an
addressing mode in the form `1*ScaledReg + Offs`.
This patch fixes the above issues.
Reviewed By: dmgreen
Differential Revision: https://reviews.llvm.org/D143895
Change-Id: I756fa21941844ded44f082ac7eea4391219f9851
As reported on Issue #62234 - we weren't correctly using the SDValue operand to get its value type, resulting in a failure when it came from a SDNode with multiple results
We haven't been able to create a suitable upstream regression test, but its been confirmed by inspection by both myself and @topperc
Fixes#62234
It looks like it is still profitable to accept a transformable to a legal vector
type, not just a legal vector, as long as vector elements are the same between
two of those types.
Differential Revision: https://reviews.llvm.org/D148229
The placement is currently wrong in the presence of function entry related
instrumentations (prefixdata, -fpatchable-function-entry=, -fsanitize=kcfi,
etc).