This patch is a follow up to D43315, and adds the following new load/store
TLS specific instructions for integer and floating point scalar types:
```
LHAXTLS
LWAXTLS
LHAXTLS_32
LWAXTLS_32
LFSXTLS
LFDXTLS
STFSXTLS
STFDXTLS
```
These instructions can be used to optimized TLS sequences when D-Form
loads/stores follow an ADD_TLS instruction.
Duplicate versions of these instructions are also added within an isAsmParserOnly=1
block (similar to D47382) to allow llvm-mc to assemble these instructions.
Differential Revision: https://reviews.llvm.org/D153645
This is a follow on to D152740. The focus of this patch is on actually removing the old TA (unsuffixed) version. I realized we already had plumbing for combined TA/TU pseudos - used by some of the ternary instructions. As such, we can go ahead and fully remove the old TA, and rename the _TU variant to be unsuffixed. (The rename must happen in this patch for the table structure to work out as expected.)
The scheduling difference comes from an omission in D152740. If we selected a _MASK variant - either from manual ISEL or instrincs - we were going through doPeepholeMaskedRVV and still getting the TA variant. The use of the IsCombined flag in the MaskedPseudo table causes us to use the TU (now unsuffixed) variant instead.
Differential Revision: https://reviews.llvm.org/D153155
Add a missing check that ensures that ComplexDeinterleaving for reduction
is only analyzed for Real and Imaginary Instructions of the same type.
Differential Revision: https://reviews.llvm.org/D153862
The CPSR registers ops of the instructions constructed in ExpandTMOV32BitImm
were marked as kill, instead of define. Best to use the pre-existing
t1CondCodeOp fn to construct CPSRs.
Reviewed By: simonwallis2
Differential Revision: https://reviews.llvm.org/D153763
Add X86ISD::ANDNP handling to targetShrinkDemandedConstant as well, which allows us to replace a lot of truncated masks with (rematerializable) allones values
Remove DAGISel checks on calling conventions. GlobalISel doesn't have
these checks either and we prefer it that way (see D152794).
Add a simple test like the one introduced in D117479 for GlobalISel.
Differential Revision: https://reviews.llvm.org/D153535
In NVPTX `ReplaceVectorLoad()`, i1 and i8 types are promoted to i16,
followed by a truncate operation. Thus, v2i8 (or v2i1) and v2i16 will
have the same VTList, which causes a collision in CSEMap.
To differentiate the original VTList, let's add the size in generating
an ID. Otherwise the compiler crashes in refineAlignment:
`MMO->getSize() == getSize() && "Size mismatch!"`
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D153712
`NoMerge` attribute on machine instructions prevents certain
transformations from merging these instructions.
One of such transformations is 'llvm/lib/CodeGen/BranchFolding.cpp'.
This attribute should be copied from IR `call` instructions to machine
level instructions. See `X86TargetLowering::LowerCall` as another
example.
Differential Revision: https://reviews.llvm.org/D152987
If we don't find a node with either operand through
isEssentiallyExtractHighSubvector, there is little point
recreating the node with the same operands. Returning
SDValue better communicates that no changes were made.
This fixes#63491 by not recreating uabd nodes with swapped
operands. As noted in the ticket there are other fixes that
might be useful to make too, but this should prevent the
infinite combine.
No longer conservatively assume a load/store accesses the stack when we
can prove that we did not compute any stack-relative address up to this
point in the program.
We do this in a cheap not-quite-a-dataflow-analysis: Assume
`NoStackAddressUsed` when all predecessors of a block already guarantee
it. Process blocks in reverse post order to guarantee that except for
loop headers we have processed all predecessors of a block before
processing the block itself. For loops we accept the conservative answer
as they are unlikely to be shrink-wrappable anyway.
Differential Revision: https://reviews.llvm.org/D152213
Switch and update some tests to use `update_llc_test_checks` to reduce
clutter in upcoming change.
Differential Revision: https://reviews.llvm.org/D152215
If the index needs to be scaled by a scalable size, just give up.
Fixes#63459
Reviewed By: frasercrmck, RKSimon
Differential Revision: https://reviews.llvm.org/D153601
Volatile loads/stores of i64 are lowered to LDRD/STRD on ARMv5TE.
However, these instructions require the addresses to be aligned.
Unaligned loads/stores therefore should be ignored by this handling.
Differential Revision: https://reviews.llvm.org/D152790
The current implementation tries to handle the high and low halves
separately, but that's less efficient in most cases; use a wide SETCC
instead.
Differential Revision: https://reviews.llvm.org/D151358
Blend combines two discriminator values used by other ptrauth ops.
On AArch64 here, it does that by replacing the high 16 bits of the
LHS with the low 16 bits of the RHS.
Usually the RHS is a constant, which lets us do this efficiently in
a single MOVK. When the RHS isn't constant, we can do a BFI.
In a sense, this is implementing an ABI decision (how to lower the
software construct of "blend"), but if there are interesting variants to
consider, this could be made object-file-format-specific in some way.
Differential Revision: https://reviews.llvm.org/D132384
The extracted Constant and Constant::getSplatValue can both be any bitwidth - they don't necessarily match the original ConstantSDNode type
Fixes#63507
The comment about PSHUFLW+PSHUFHW+PSHUFD was outdated as that referred to a single input case, but that is now always handled earlier.
Another step towards removing premature combines to vector truncation combines to PACK.
We already handle vslide1{up,down}, so this extends it to vslide{up,down}.
This was unintentionally added in https://reviews.llvm.org/D150463 and
then removed in 37cfcfcef76bb615b941d7077ca81168bd7ad080, but unless I'm
missing something this should still be ok as the mask only controls what
destination elements are written to.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D153631
Currently vslide1{up,down}s can have vmerges folded into them, but not
vslide{up,down}s.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D153630
We were only checking for the previous insructions to write exactly
the register or a super register. We ignored writes to a subregister
and continued searching for the producing instruction. We need to
abort instead.
There's another check inside the if body to abort if the registers
don't match exactly. So we just need to check for overlap so we
enter the if body.
Reviewed By: fakepaper56
Differential Revision: https://reviews.llvm.org/D153490
This patch optimizes code generation by leveraging the zeroing behavior of the `maskeqz`/`masknez` instructions.
```
int sel(int a, int b)
{
return (a < b) ? a : 0;
}
```
```
slt $a1,$a0,$a1
masknez $a2,$r0,$a1
maskeqz $a0,$a0,$a1
or $a0,$a0,$a2
```
=>
```
slt $a1,$a0,$a1
maskeqz $a0,$a0,$a1
```
Reviewed By: SixWeining
Differential Revision: https://reviews.llvm.org/D153193
This implements the remaining overflow generating instructions in the AArch64
GlobalISel selector. Now wide add/sub operations do not fallback to SelectionDAG
anymore. We make use of PostSelectOptimize to cleanup the hereby generated
flag-setting operations when the carry-out is unused. Since we do not fallback
anymore when selecting add/sub atomics on O0 some test changes were required
there.
Fixes: https://github.com/llvm/llvm-project/issues/59407
Differential Revision: https://reviews.llvm.org/D153164
D137796 made these passes unused.
`opt --bpf-ir-peephole` is specified in one test. Add a `registerPipelineParsingCallback`
so that we can use change the test to use `opt --passes=bpf-ir-peephole` instead.
Following what is already performed for vector.reduce.fmin/fmax, this adds
lowering for the new vector.reduce.fminimum/fmaximum nodes to the SVE fminv
and fmaxv instructions via the existing FMINV_PRED/FMAXV_PRED nodes.
Differential Revision: https://reviews.llvm.org/D153288
Without the changes from D153598.
Original commit message:
For the same reasons as D151284, this requires custom lowering of the
truncate libcall on hard float ABIs (the normal libcall code path is
used on soft ABIs).
The extend operation is implemented by a shift just as in the standard
legalisation, but needs to be custom lowered because i32 isn't a legal
type on RV64.
This patch aims to make the minimal changes that result in correct
codegen for the bfloat.ll tests.
Differential Revision: https://reviews.llvm.org/D151663
This was committed with D153598 merged into it. Reverting to recommit as separate patches.
This reverts commit 690b1c847f0b188202a86dc25a0a76fd8c4618f4.
D63932 added a module flag to indicate that we are executing the regular
LTO post merge pipeline, so that GlobalDCE could perform more aggressive
optimization for Dead Virtual Function Elimination. This caused issues
trying to reuse bitcode that had already been through the LTO pipeline
(see context in D139816).
Instead support this by passing down a parameter flag to the
GlobalDCEPass constructor, which is the more usual way for indicating
this information.
Most test changes are to remove incidental uses of this flag. Of the 2
real uses, llvm/test/LTO/ARM/lto-linking-metadata.ll is now obsolete and
removed in this patch, and the virtual-functions-visibility-post-lto.ll
test is updated to use the regular LTO default pipeline where this
parameter is set to true.
Differential Revision: https://reviews.llvm.org/D153655
Fat LTO objects contain both LTO compatible IR, as well as generated
object code. This allows users to defer the choice of whether to use LTO
or not to link-time. This is a feature available in GCC for some time,
and makes the existing -ffat-lto-objects flag functional in the same
way as GCC's.
Within LLVM, we add a new EmbedBitcodePass that serializes the module to
the object file, and expose a new pass pipeline for compiling fat
objects. The new pipeline initially clones the module and runs the
selected (Thin)LTOPrelink pipeline, after which it will serialize the
module into a `.llvm.lto` section of an ELF file. When compiling for
(Thin)LTO, this normally the point at which the compiler would emit a
object file containing the bitcode and metadata.
After that point we compile the original module using the
PerModuleDefaultPipeline used for non-LTO compilation. We generate
standard object files at the end of this pipeline, which contain machine
code and the new `.llvm.lto` section containing bitcode.
Since the two pipelines operate on different copies of the module, we
can be sure that the bitcode in the `.llvm.lto` section and object code
in `.text` are congruent with the existing output produced by the
default and LTO pipelines.
Original RFC: https://discourse.llvm.org/t/rfc-ffat-lto-objects-support/63977
Earlier versions of this patch were missing REQUIRES lines for llc
related tests in Transforms/EmbedBitcode. Those tests are now under
CodeGen/X86, which should avoid running the check on unsupported
platforms.
Reviewed By: tejohnson, MaskRay, nikic
Differential Revision: https://reviews.llvm.org/D146776