52796 Commits

Author SHA1 Message Date
Craig Topper
6e4be7e12a [RISCV] Split double out of compress-float.ll. Add Zcf and Zcd RUN lines.
Make Zcf/Zcd depend on Zca.

Reviewed By: asb

Differential Revision: https://reviews.llvm.org/D153826
2023-06-27 09:44:51 -07:00
Amy Kwan
11b71ade51 [PowerPC][TLS] Add additional TLS X-Form loads/store instructions
This patch is a follow up to D43315, and adds the following new load/store
TLS specific instructions for integer and floating point scalar types:
```
LHAXTLS
LWAXTLS
LHAXTLS_32
LWAXTLS_32
LFSXTLS
LFDXTLS
STFSXTLS
STFDXTLS
```
These instructions can be used to optimized TLS sequences when D-Form
loads/stores follow an ADD_TLS instruction.

Duplicate versions of these instructions are also added within an isAsmParserOnly=1
block (similar to D47382) to allow llvm-mc to assemble these instructions.

Differential Revision: https://reviews.llvm.org/D153645
2023-06-27 11:33:38 -05:00
Philip Reames
65de6b0b0e [RISCV] Remove legacy TA/TU pseudo distinction for VID
This is a follow on to D152740. The focus of this patch is on actually removing the old TA (unsuffixed) version. I realized we already had plumbing for combined TA/TU pseudos - used by some of the ternary instructions. As such, we can go ahead and fully remove the old TA, and rename the _TU variant to be unsuffixed. (The rename must happen in this patch for the table structure to work out as expected.)

The scheduling difference comes from an omission in D152740. If we selected a _MASK variant - either from manual ISEL or instrincs - we were going through doPeepholeMaskedRVV and still getting the TA variant. The use of the IsCombined flag in the MaskedPseudo table causes us to use the TU (now unsuffixed) variant instead.

Differential Revision: https://reviews.llvm.org/D153155
2023-06-27 09:12:00 -07:00
Igor Kirillov
1fce8df53a Fix the ComplexDeinterleaving bug when handling mixed reductions.
Add a missing check that ensures that ComplexDeinterleaving for reduction
is only analyzed for Real and Imaginary Instructions of the same type.

Differential Revision: https://reviews.llvm.org/D153862
2023-06-27 14:40:49 +00:00
Ties Stuij
03db28edbb [ARM] in ExpandTMOV32BitImm, CPSR register ops should be Defined
The CPSR registers ops of the instructions constructed in ExpandTMOV32BitImm
were marked as kill, instead of define. Best to use the pre-existing
t1CondCodeOp fn to construct CPSRs.

Reviewed By: simonwallis2

Differential Revision: https://reviews.llvm.org/D153763
2023-06-27 14:58:22 +01:00
Simon Pilgrim
7b77dd6afd [X86] SimplifyDemandedBitsForTargetNode - add X86ISD::ANDNP handling
Add X86ISD::ANDNP handling to targetShrinkDemandedConstant as well, which allows us to replace a lot of truncated masks with (rematerializable) allones values
2023-06-27 14:23:00 +01:00
Diana Picus
d98e44b343 [AMDGPU][DAGISel] Be more flexible about what calls are allowed
Remove DAGISel checks on calling conventions. GlobalISel doesn't have
these checks either and we prefer it that way (see D152794).

Add a simple test like the one introduced in D117479 for GlobalISel.

Differential Revision: https://reviews.llvm.org/D153535
2023-06-27 09:49:38 +02:00
Alex MacLean
17aa37dd30 [SelectionDAG] Add memory size for CSEMap ID calculation
In NVPTX `ReplaceVectorLoad()`, i1 and i8 types are promoted to i16,
followed by a truncate operation. Thus, v2i8 (or v2i1) and v2i16 will
have the same VTList, which causes a collision in CSEMap.

To differentiate the original VTList, let's add the size in generating
an ID. Otherwise the compiler crashes in refineAlignment:
`MMO->getSize() == getSize() && "Size mismatch!"`

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D153712
2023-06-26 16:12:48 -07:00
Eduard Zingerman
6a6db74b77 [BPF] Propagate NoMerge attribute when lowering function calls
`NoMerge` attribute on machine instructions prevents certain
transformations from merging these instructions.
One of such transformations is 'llvm/lib/CodeGen/BranchFolding.cpp'.

This attribute should be copied from IR `call` instructions to machine
level instructions. See `X86TargetLowering::LowerCall` as another
example.

Differential Revision: https://reviews.llvm.org/D152987
2023-06-27 01:15:45 +03:00
David Green
aaca8e2c34 [AArch64] Don't recreate nodes in tryCombineLongOpWithDup
If we don't find a node with either operand through
isEssentiallyExtractHighSubvector, there is little point
recreating the node with the same operands. Returning
SDValue better communicates that no changes were made.

This fixes #63491 by not recreating uabd nodes with swapped
operands. As noted in the ticket there are other fixes that
might be useful to make too, but this should prevent the
infinite combine.
2023-06-26 22:41:18 +01:00
Matthias Braun
02ba5b8c6b Ignore load/store until stack address computation
No longer conservatively assume a load/store accesses the stack when we
can prove that we did not compute any stack-relative address up to this
point in the program.

We do this in a cheap not-quite-a-dataflow-analysis: Assume
`NoStackAddressUsed` when all predecessors of a block already guarantee
it. Process blocks in reverse post order to guarantee that except for
loop headers we have processed all predecessors of a block before
processing the block itself. For loops we accept the conservative answer
as they are unlikely to be shrink-wrappable anyway.

Differential Revision: https://reviews.llvm.org/D152213
2023-06-26 13:50:36 -07:00
Matthias Braun
759b217626 Switch tests to use update_llc_test_checks
Switch and update some tests to use `update_llc_test_checks` to reduce
clutter in upcoming change.

Differential Revision: https://reviews.llvm.org/D152215
2023-06-26 13:50:36 -07:00
Philip Reames
237efe7eaa [RISCV] Regen rvv/fixed-vectors-fmf.ll to avoid spurious test deltas 2023-06-26 12:53:33 -07:00
Craig Topper
4afa2ab7a5 [RISCV][SelectionDAGBuilder] Fix an implicit scalable TypeSize to fixed size conversion in getUniformBase.
If the index needs to be scaled by a scalable size, just give up.

Fixes #63459

Reviewed By: frasercrmck, RKSimon

Differential Revision: https://reviews.llvm.org/D153601
2023-06-26 11:56:17 -07:00
Matt Arsenault
f2596b754c SeparateConstOffsetFromGEP: Don't use SCEV
This was only using the SCEV expressions as a map key, which we can do
just as well with the value pointers. This also allows it to handle
vectors.
2023-06-26 13:58:06 -04:00
Maurice Heumann
249bd9eab0 [ARM] Fix codegen of unaligned volatile load/store of i64
Volatile loads/stores of i64 are lowered to LDRD/STRD on ARMv5TE.
However, these instructions require the addresses to be aligned.
Unaligned loads/stores therefore should be ignored by this handling.

Differential Revision: https://reviews.llvm.org/D152790
2023-06-26 10:45:41 -07:00
Eli Friedman
bc7f11ccb0 [SelectionDAG] Improve expansion of wide min/max
The current implementation tries to handle the high and low halves
separately, but that's less efficient in most cases; use a wide SETCC
instead.

Differential Revision: https://reviews.llvm.org/D151358
2023-06-26 10:45:41 -07:00
Ahmed Bougacha
b3272f5ddb [AArch64][PAC] Select MOVK for ptrauth.blend intrinsic.
Blend combines two discriminator values used by other ptrauth ops.
On AArch64 here, it does that by replacing the high 16 bits of the
LHS with the low 16 bits of the RHS.

Usually the RHS is a constant, which lets us do this efficiently in
a single MOVK.  When the RHS isn't constant, we can do a BFI.

In a sense, this is implementing an ABI decision (how to lower the
software construct of "blend"), but if there are interesting variants to
consider, this could be made object-file-format-specific in some way.

Differential Revision: https://reviews.llvm.org/D132384
2023-06-26 09:43:37 -07:00
Simon Pilgrim
868351f894 [X86] combineMul - ensure getTargetConstantFromNode splat extraction is the correct element width
The extracted Constant and Constant::getSplatValue can both be any bitwidth - they don't necessarily match the original ConstantSDNode type

Fixes #63507
2023-06-26 16:50:14 +01:00
Simon Pilgrim
6756947ac6 [X86] lowerV8I16Shuffle - use PACKSS(SEXT_INREG(X),SEXT_INREG(Y)) for pre-SSSE3 truncation shuffles
The comment about PSHUFLW+PSHUFHW+PSHUFD was outdated as that referred to a single input case, but that is now always handled earlier.

Another step towards removing premature combines to vector truncation combines to PACK.
2023-06-26 16:50:13 +01:00
Luke Lau
0e9384a6c6 [RISCV] Teach doPeepholeMaskedRVV to handle vslide{up,down}
We already handle vslide1{up,down}, so this extends it to vslide{up,down}.

This was unintentionally added in https://reviews.llvm.org/D150463 and
then removed in 37cfcfcef76bb615b941d7077ca81168bd7ad080, but unless I'm
missing something this should still be ok as the mask only controls what
destination elements are written to.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D153631
2023-06-26 09:36:03 +01:00
Luke Lau
0d0bfa8a14 [RISCV] Add test cases for vmerge peephole with vslides
Currently vslide1{up,down}s can have vmerges folded into them, but not
vslide{up,down}s.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D153630
2023-06-26 09:36:00 +01:00
Craig Topper
b105b3266f [RISCV] Properly handle partial writes in isConvertibleToVMV_V_V.
We were only checking for the previous insructions to write exactly
the register or a super register. We ignored writes to a subregister
and continued searching for the producing instruction. We need to
abort instead.

There's another check inside the if body to abort if the registers
don't match exactly. So we just need to check for overlap so we
enter the if body.

Reviewed By: fakepaper56

Differential Revision: https://reviews.llvm.org/D153490
2023-06-25 23:08:47 -07:00
Craig Topper
43e57bda4c [RISCV] Add test case for D153490. NFC 2023-06-25 22:58:18 -07:00
Wang Rui
4262ae20d8 [LoongArch] Optimize conditional selection of integer
This patch optimizes code generation by leveraging the zeroing behavior of the `maskeqz`/`masknez` instructions.

```
int sel(int a, int b)
{
    return (a < b) ? a : 0;
}
```

```
slt	$a1,$a0,$a1
masknez	$a2,$r0,$a1
maskeqz	$a0,$a0,$a1
or	$a0,$a0,$a2
```

=>

```
slt	$a1,$a0,$a1
maskeqz	$a0,$a0,$a1
```

Reviewed By: SixWeining

Differential Revision: https://reviews.llvm.org/D153193
2023-06-26 10:33:51 +08:00
Weining Lu
fb563717fb Revert "[LoongArch] Optimize conditional selection of integer"
This reverts commit 3dd319ecf3be64598ea84d1730033854cade7123.

Sorry, I forgot to amend the author name and email when merging this
patch.
2023-06-26 10:30:42 +08:00
Matt Arsenault
c3b27c236d RegAllocGreedy: Fix assert with remarks on unassigned subregisters
This tried to query the physical subregister on virtual registers
if they were left unassigned.
2023-06-25 19:26:25 -04:00
Matt Arsenault
9f274939db AMDGPU: Handle the easy parts of strict fptrunc
f64->f16 is hard. The expansion is all integer but we need
to raise exceptions. Also doesn't handle the illegal f16 targets.
2023-06-25 19:26:25 -04:00
Matt Arsenault
3d409e55a1 AMDGPU: Handle constrained fpext 2023-06-25 19:26:25 -04:00
Amaury Séchet
391a95fdb1 [NFC] Autogenerate CodeGen/AMDGPU/combine-reg-or-const.ll 2023-06-25 22:56:42 +00:00
Amaury Séchet
632a8aca07 [NFC] Autogenerate CodeGen/PowerPC/tail-dup-break-cfg.ll 2023-06-25 22:55:49 +00:00
Niwin Anto
10b1f58cba [AArch64][GlobalISel] IR translate support for a return instruction of type <1 x i8> or <1 x i16> when using GlobalISel.
Code generation for return instruction of type <1 x i8> or <1 x i16> when using GlobalISel causes internal compiler crash Could not handle ret ty.

Fixes: https://github.com/llvm/llvm-project/issues/58211

Differential Revision: https://reviews.llvm.org/D153300
2023-06-25 14:40:48 -07:00
Tobias Stadler
84a6a057e6 [AArch64][GlobalISel] Select G_UADDE/G_SADDE/G_USUBE/G_SSUBE
This implements the remaining overflow generating instructions in the AArch64
GlobalISel selector. Now wide add/sub operations do not fallback to SelectionDAG
anymore. We make use of PostSelectOptimize to cleanup the hereby generated
flag-setting operations when the carry-out is unused. Since we do not fallback
anymore when selecting add/sub atomics on O0 some test changes were required
there.

Fixes: https://github.com/llvm/llvm-project/issues/59407

Differential Revision: https://reviews.llvm.org/D153164
2023-06-25 14:32:00 -07:00
Amaury Séchet
e345b9ca7a [NFC] Autogenerate CodeGen/PowerPC/pr40922.ll 2023-06-25 21:05:06 +00:00
David Green
6fcc562fc7 [AArch64] Add SVE tests for double reducts of vector.reduce.fmaximum/fminimum. NFC
Now that the SVE parts are in, we can fill in the double reduction tests
without them causing problems.
2023-06-25 08:44:43 +01:00
Fangrui Song
2a61ceddb3 [BPF] Remove unused legacy passes after TargetMachine::adjustPassManager removal
D137796 made these passes unused.

`opt --bpf-ir-peephole` is specified in one test. Add a `registerPipelineParsingCallback`
so that we can use change the test to use `opt --passes=bpf-ir-peephole` instead.
2023-06-24 22:44:06 -07:00
Amaury Séchet
93af6bdcaf [NFC] Autogenerate CodeGen/PowerPC/select-i1-vs-i1.ll 2023-06-25 01:27:29 +00:00
Amaury Séchet
8412a17b79 [NFC] Autogenerate CodeGen/ARM/2013-07-29-vector-or-combine.ll 2023-06-25 01:05:21 +00:00
Amaury Séchet
7457acb842 [NFC] Autogenerate CodeGen/ARM/2011-03-15-LdStMultipleBug.ll 2023-06-25 01:02:49 +00:00
Amaury Séchet
e271a539c5 [NFC] Autogenerate CodeGen/ARM/and-sext-combine.ll 2023-06-25 00:55:03 +00:00
Amaury Séchet
78c1985f99 [NFC] Autogenerate CodeGen/ARM/machine-cse-cmp.ll 2023-06-25 00:44:30 +00:00
Amaury Séchet
2e8111d4c4 [NFC] Autogenerate CodeGen/ARM/pr35103.ll 2023-06-25 00:29:14 +00:00
Thorsten Schütt
e0e998f8d8 [GlobalIsel][X86]] Legalize G_CONSTANT_FOLD_BARRIER
Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D153684
2023-06-24 17:19:51 +02:00
David Green
f8003689f1 [AArch64] Add SVE lowering for vector.reduce.fminimum and fmaximum
Following what is already performed for vector.reduce.fmin/fmax, this adds
lowering for the new vector.reduce.fminimum/fmaximum nodes to the SVE fminv
and fmaxv instructions via the existing FMINV_PRED/FMAXV_PRED nodes.

Differential Revision: https://reviews.llvm.org/D153288
2023-06-24 11:12:58 +01:00
Thorsten Schütt
68ed9d9472 [GlobalIsel][X86] G_STORE extension
Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D153643
2023-06-24 05:51:30 +02:00
Alex Brachet
6085eb3084 Revert "Reland [llvm] Preliminary fat-lto-objects support"
This reverts commit 44265dc3554ef40920b587eeb787a400663af6c7.
2023-06-24 01:15:50 +00:00
Alex Bradbury
929124993a Recommit "[RISCV] Implement support for bf16 truncate/extend on hard FP targets"
Without the changes from D153598.

Original commit message:

For the same reasons as D151284, this requires custom lowering of the
truncate libcall on hard float ABIs (the normal libcall code path is
used on soft ABIs).

The extend operation is implemented by a shift just as in the standard
legalisation, but needs to be custom lowered because i32 isn't a legal
type on RV64.

This patch aims to make the minimal changes that result in correct
codegen for the bfloat.ll tests.

Differential Revision: https://reviews.llvm.org/D151663
2023-06-23 17:23:12 -07:00
Craig Topper
076759f068 Revert "[RISCV] Implement support for bf16 truncate/extend on hard FP targets"
This was committed with D153598 merged into it. Reverting to recommit as separate patches.

This reverts commit 690b1c847f0b188202a86dc25a0a76fd8c4618f4.
2023-06-23 17:23:12 -07:00
Teresa Johnson
200cc952a2 [LTO][GlobalDCE] Use pass parameter instead of module flag for LTO phase
D63932 added a module flag to indicate that we are executing the regular
LTO post merge pipeline, so that GlobalDCE could perform more aggressive
optimization for Dead Virtual Function Elimination. This caused issues
trying to reuse bitcode that had already been through the LTO pipeline
(see context in D139816).

Instead support this by passing down a parameter flag to the
GlobalDCEPass constructor, which is the more usual way for indicating
this information.

Most test changes are to remove incidental uses of this flag. Of the 2
real uses, llvm/test/LTO/ARM/lto-linking-metadata.ll is now obsolete and
removed in this patch, and the virtual-functions-visibility-post-lto.ll
test is updated to use the regular LTO default pipeline where this
parameter is set to true.

Differential Revision: https://reviews.llvm.org/D153655
2023-06-23 17:05:07 -07:00
Paul Kirth
44265dc355 Reland [llvm] Preliminary fat-lto-objects support
Fat LTO objects contain both LTO compatible IR, as well as generated
object code. This allows users to defer the choice of whether to use LTO
or not to link-time. This is a feature available in GCC for some time,
and makes the existing -ffat-lto-objects flag functional in the same
way as GCC's.

Within LLVM, we add a new EmbedBitcodePass that serializes the module to
the object file, and expose a new pass pipeline for compiling fat
objects. The new pipeline initially clones the module and runs the
selected (Thin)LTOPrelink pipeline, after which it will serialize the
module into a `.llvm.lto` section of an ELF file. When compiling for
(Thin)LTO, this normally the point at which the compiler would emit a
object file containing the bitcode and metadata.

After that point we compile the original module using the
PerModuleDefaultPipeline used for non-LTO compilation. We generate
standard object files at the end of this pipeline, which contain machine
code and the new `.llvm.lto` section containing bitcode.

Since the two pipelines operate on different copies of the module, we
can be sure that the bitcode in the `.llvm.lto` section and object code
in  `.text` are congruent with the existing output produced by the
default and LTO pipelines.

Original RFC: https://discourse.llvm.org/t/rfc-ffat-lto-objects-support/63977

Earlier versions of this patch were missing REQUIRES lines for llc
related tests in Transforms/EmbedBitcode. Those tests are now under
CodeGen/X86, which should avoid running the check on unsupported
platforms.

Reviewed By: tejohnson, MaskRay, nikic

Differential Revision: https://reviews.llvm.org/D146776
2023-06-23 23:23:58 +00:00