scalar-to-vector (scalar binop (extractelt V, Idx), C) --> shuffle (vector binop V, C'), {Idx, -1, -1...}
We generally try to avoid ad-hoc vectorization in SDAG,
but the motivating case from issue #39482 escapes our
normal vectorization folds in IR. It seems like it should
always be a win to transform this pattern in cases where
we have the same vector type for input and output and the
target supports the vector operation. That avoids
transfers from vector to scalar and back.
In the x86 shift examples, we create the scalar-to-vector
node during legalization. I'm not sure if there's a more
general way to create the pattern for testing. (If so, I
could add tests for other targets.)
Differential Revision: https://reviews.llvm.org/D136713
This is not quite NFC because one of the users should
now avoid the DIVREM opcodes too, but I'm not sure
how to test that.
I used the same name as an analysis function in IR
in case we want to expand this to include other
operations.
Another potential use is proposed in D136713.
memcpy has clamped dst stack alignment to NaturalStackAlignment if
hasStackRealignment is false. We should also clamp stack alignment
for memset and memmove. If we don't clamp, SelectionDAG may first
do tail call optimization which requires no stack realignment. Then
memmove, memset in same function may be lowered to load/store with
larger alignment leading to PEI emit stack realignment code which
is absolutely not correct.
Reviewed By: LuoYuanke
Differential Revision: https://reviews.llvm.org/D136456
I'm unsure what the code does without the semicolon. On the surface it
seems like the assert below it would be considered part of the if
and thus the assert would only execute if DestReg is 0. But 0 isn't
considered a virtual register so the assert should fail.
Found by PVS Studio.
Reported https://pvs-studio.com/en/blog/posts/cpp/1003/ (N7)
Verify the LiveVariables analysis after a pass that claims to preserve
it, even if there are no further passes (apart from the verifier itself)
that would use the analysis.
Differential Revision: https://reviews.llvm.org/D129213
We can expect that the sequence of inserting-of-extracts-into-undef
will be successfully lowered back into widening of the source vector,
but it seems that at least for X86 mask vectors, we have a really hard time
recovering from inserting-into-zero.
I've looked into alternative fix injection points, and they are much more
involved, by the time of `LowerBUILD_VECTORvXi1()`/`LowerINSERT_VECTOR_ELT()`
the constants might be obscured, so it does not seem like we can easily
deal with this by lowering into bit math later on,
some other pieces are missing.
Instead, it seems like just clearing the padding away via an `AND`-mask
is at least not a worse choice. Why create a problem where there wasn't one.
Though yes, it is possible that there are cases where constants originate
from the source IR, so some other fix may still be needed.
Reviewed By: pengfei
Differential Revision: https://reviews.llvm.org/D136046
This reverts commit 65aaecca8842dec30d03734a7fe8ce33c5afec81.
There was an ordering problem in the calculation of the partial
remainder.
Original commit message:
If the divisor is even, we can first shift the dividend and divisor
right by the number of trailing zeros. Now the divisor is odd and we
can do the original algorithm to calculate a remainder. Then we shift
that remainder left by the number of trailing zeros and add the bits
that were shifted out of the dividend.
Differential Revision: https://reviews.llvm.org/D135541
If the upper half of an abs() is all sign bits, then we can perform the abs() using just the lower half and then zero extend.
I've limited the DAG combine to only sign_extend_inreg (and free truncate/zero_extend) to minimise any later promotion issues, but for legalization a similar fold can use ComputeNumSignBits to be more aggressive.
Alive2: https://alive2.llvm.org/ce/z/y32fS4Fixes#43370
Differential Revision: https://reviews.llvm.org/D136559
If the divisor is even, we can first shift the dividend and divisor
right by the number of trailing zeros. Now the divisor is odd and we
can do the original algorithm to calculate a remainder. Then we shift
that remainder left by the number of trailing zeros and add the bits
that were shifted out of the dividend.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D135541
This reverts commit e8b3ffa532b8ebac5dcdf17bb91b47817382c14d.
The AMDGPU/mad_64_32.ll seems to fail on some of the build bots but
passes locally. I'm really confused.
(sra X, BW-1) is either 0 or -1. So the multiply is a conditional
negate of Y.
This pattern shows up when type legalizing wide multiplies involving
a sign extended value.
Fixes PR57549.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D133399
The CanFoldNonConst doesn't work correctly with opaque constants
because getNode won't constant fold constants if one is opaque. Even
if the operation is AND/OR. This can lead to infinite loops.
This patch does the folding manually in the DAGCombine. Alternatively,
we could improve getNode but that seemed likely to have bigger impact
and possibly increase compile time for the additional checks. We wouldn't
want to directly constant fold because we need to preserve the opaque flag.
Fixes PR58511.
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D136472
When a CU attaches some ranges for a subprogram or an inlined code, the CU should be that of the subprogram/inlined code that was emitted.
If not, then these emitted ranges will use the incorrect base of the CU in `emitRangeList`.
A reproducible example is:
When linking these two LLVM IRs, dsymutil will report no mapping for range or inconsistent range data warnings.
`foo.swift`
```swift
import AppKit.NSLayoutConstraint
public class Foo {
public var c: Int {
get {
Int(NSLayoutConstraint().constant)
}
set {
}
}
}
```
`main.swift`
```swift
// no mapping for range
let f: Foo! = nil
// inconsistent range data
//let l: Foo = Foo()
```
Reviewed By: dblaikie
Differential Revision: https://reviews.llvm.org/D136039
This reverts commit d4650d0938e290ca9d6544d6b07da2fb7156762d.
Reverted because it causes several X86 CodeGen test failures in a build
with LLVM_ENABLE_EXPENSIVE_CHECKS=ON.
This doesn't touch objc-arc-contract because that's in the codegen pipeline.
However, this does move its corresponding initialize function into initializeCodegen().
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D135041
Verify the LiveVariables analysis after a pass that claims to preserve
it, even if there are no further passes (apart from the verifier itself)
that would use the analysis.
Differential Revision: https://reviews.llvm.org/D129213
D129213 improves verification of LiveVariables, and caused
CodeGen/X86/statepoint-cmp-sunk-past-statepoint.ll to fail with:
*** Bad machine code: LiveVariables: Block should not be in AliveBlocks ***
after Two-Address instruction pass.
Fix it by clearing AliveBlocks for a register which is no longer used.
Differential Revision: https://reviews.llvm.org/D136445
This is an alternative to fix PR57939 for RISC-V. It definitely
can be argued that the stack temporaries for RISC-V are being created
with an unnecessarily large alignment. But ignoring the alignment
in MachineFrameInfo also seems bad.
Looking at the test update that go with the current ID==0 check,
it was intending to exclude things like the NoAlloc stackid. So I'm
not sure if scalable vectors are intentionally being excluded.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D135913
@foad was right - this isn't actually going to help with D136042 as much as hoped, we need a better AMDGPU-specific solution as other targets are likely to make use of it
Helps with some of the AMDGPU regressions identified in D136042 where we were losing signed BFE patterns after sinking shifts behind logic ops.
Differential Revision: https://reviews.llvm.org/D136081
Implement CanLowerReturn and associated CallingConv changes for SPARC/SPARC64.
In particular, for SPARC64 there's new `RetCC_Sparc64_*` functions that handles the return case of the calling convention.
It uses the same analysis as `CC_Sparc64_*` family of funtions, but fails if the return value doesn't fit into the return registers.
This makes calls to functions with big return values converted to an sret function as expected, instead of crashing LLVM.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D132465
The code incorrectly checked for CTLZ_ZERO_UNDEF instead of
CTTZ_ZERO_UNDEF.
While I was there I flipped the condition into an early out.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D136010
This patch renames FuncletPadInst::getNumArgOperands to arg_size for
consistency with CallBase, where getNumArgOperands was removed in
favor of arg_size in commit 3e1c787b3160bed4146d3b2b5f922aeed3caafd7
Differential Revision: https://reviews.llvm.org/D136048
Instead of checking that an operand is constant/opaque before calling getNode() and then checking that the result is a constant, just use FoldConstantArithmetic which will just early-out if the operands are not constant foldable.
In https://github.com/llvm/llvm-project/issues/57452, we found that IRTranslator is translating `i1 true` into `i32 -1`.
This is because IRTranslator uses SExt for indices.
In this fix, we change the expected behavior of extractelement's index, moving from SExt to ZExt.
This change includes both documentation, SelectionDAG and IRTranslator.
We also included a test for AMDGPU, updated tests for AArch64, Mips, PowerPC, RISCV, VE, WebAssembly and X86
This patch fixes issue #57452.
Differential Revision: https://reviews.llvm.org/D132978
The crash case comes from #58350. It have two stores, one store is type f32 and the other is v1f32.
When we try to merge these two stores on v1f32, the memVT is vector type so the old code will use ISD::EXTRACT_SUBVECTOR for type f32 also then compiler crash.
So this patch insert a build_vector for f32 store to generate v1f32 also when memVT is v1f32.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D135954
Functions with `aarch64_sme_pstatesm_body` will emit a SMSTART at the start
of the function, and a SMSTOP at the end of the function, such that all
operations use the right value for vscale.
Because the placement of these nodes is critically important (i.e. no
vscale-dependent operations should be done before SMSTART has been issued),
we require glueing the CopyFromReg to the Entry node such that we can
insert the SMSTART as part of that glued chain.
More details about the SME attributes and design can be found
in D131562.
Reviewed By: aemerson
Differential Revision: https://reviews.llvm.org/D131582
Prior to inserting an unconditional branch from X to its
fall through basic block, check if X has any terminators to
avoid inserting additional branches.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D134557
This was scanning through def operands looking for the
symbol operand. This is pointless because the symbol is always
the first operand as enforced by the verifier, and all operands
are implicit.
In Linux PIC model, there are 4 cases about value/label addressing:
Case 1: Function call or Label jmp inside the module.
Case 2: Data access (such as global variable, static variable) inside the module.
Case 3: Function call or Label jmp outside the module.
Case 4: Data access (such as global variable) outside the module.
Due to current llvm inline asm architecture designed to not "recognize" the asm
code, there are quite troubles for us to treat mem addressing differently for
same value/adress used in different instuctions.
For example, in pic model, call a func may in plt way or direclty pc-related,
but lea/mov a function adress may use got.
This patch fix/refine the case 1 and case 2 in inline asm.
Due to currently inline asm didn't support jmp the outsider lable, this patch
mainly focus on fix the function call addressing bugs in inline asm.
Reviewed By: Pengfei, RKSimon
Differential Revision: https://reviews.llvm.org/D133914
This is a simple addition to the convertPhiTypes in CodeGenPrepare to
consider and convert constants as it converts the phi type. Someone
fixed the bug in the motivating example, so the undef is now a constant
0. This does mean converting between integer and floating point
constants, which may have different materialization.
Differential Revision: https://reviews.llvm.org/D135561
MachineInstr's copy constructor works by calling the addOperand method
to add each operand of the old MachineInstr to the new one, one by
one. But addOperand deliberately avoids trying to replicate ties
between operands, on the grounds that the tie refers to operands by
index, and the indices aren't necessarily finalized yet.
This led to a code generation fault when the machine pipeliner cloned
an Arm conditional instruction, and lost the tie between the output
register and the input value to be used when the condition failed to
execute.
Reviewed By: dmgreen
Differential Revision: https://reviews.llvm.org/D135434
This reverts commit 0148df8157f05ecf3b1064508e6f012aefb87dad.
Getting a lit test failures on AMDGPU but I can't reproduce it so far.
Reverting to investigate.
(sra X, BW-1) is either 0 or -1. So the multiply is a conditional
negate of Y.
This pattern shows up when type legalizing wide multiplies involving
a sign extended value.
Fixes PR57549.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D133399