46631 Commits

Author SHA1 Message Date
David Sherwood
37f8ffc64c [AArch64][SME2] Add LLVM IR intrinsics for the vertical dot products
Adds intrinsics for the following SME2 instructions:

* BFVDOT (32-bit)
* FVDOT (32-bit)
* SVDOT (2-way) (32-bit)
* SVDOT (4-way) (32-bit and 64-bit)
* UVDOT (2-way) (32-bit)
* UVDOT (4-way) (32-bit and 64-bit)
* SUVDOT (32-bit)
* USVDOT (32-bit)

NOTE: These intrinsics are still in development and are subject to future changes.

Differential Revision: https://reviews.llvm.org/D142000
2023-01-20 13:01:03 +00:00
Kerry McLaughlin
2e35d684d7 [AArch64][SME2] Add multi-vector multiply-add long intrinsics.
Adds (single, multi & indexed) intrinsics for the following:
 - bfmlal/bfmlsl
 - fmlal/fmlsl
 - smlal/smlsl
 - umlal/umlsl

This patch also extends SelectSMETileSlice to handle scaled vector select offsets.

NOTE: These intrinsics are still in development and are subject to future changes.

Reviewed By: CarolineConcatto

Differential Revision: https://reviews.llvm.org/D142004
2023-01-20 11:33:29 +00:00
Kerry McLaughlin
cfd3a0e04a [AArch64][SME2] Add multi-vector fused multiply-add/subtract intrinsics
Adds intrinsics for the following:
 - fmla (single, multi & indexed)
 - fmls (single, multi & indexed)

NOTE: These intrinsics are still in development and are subject
to future changes.

Reviewed By: CarolineConcatto

Differential Revision: https://reviews.llvm.org/D141946
2023-01-20 11:14:41 +00:00
Craig Topper
f4fa34c359 Revert "[X86][WIP] Change precision control to FP80 during u64->fp32 conversion on Windows."
This reverts commit 928a1764d6bdf84073c9d85875f45c1716d6ff12.

Committed accidentally
2023-01-20 00:41:14 -08:00
Craig Topper
928a1764d6 [X86][WIP] Change precision control to FP80 during u64->fp32 conversion on Windows.
This is an alternative to D141074 to fix the problem by adjusting
the precision control dynamically.

This isn't quite complete yet. I want to support fadd with a load
folded into it too. That's the code we will usually generate.

Posting for early review so we can do some testing of this solution.

Differential Revision: https://reviews.llvm.org/D142178
2023-01-20 00:34:05 -08:00
Craig Topper
1692dff0b3 Revert "[X86] Avoid converting u64 to f32 using x87 on Windows"
This reverts commit a6e3027db7ebe6863e44bafcfeaacc16bdc88a3f.

Chrome and Halide are both reporting issues with importing builtins.

Maybe the better direction is to manually adjust FPCW for the inline
sequence on Windows.
2023-01-19 21:36:07 -08:00
Ben Shi
c919ea5b48 [AVR] Fix incorrectly printed global symbol operands in inline-asm
Fixes https://github.com/llvm/llvm-project/issues/58879

Reviewed By: aykevl

Differential Revision: https://reviews.llvm.org/D142096
2023-01-20 09:45:00 +08:00
Jeffrey Byrnes
1f08d3bc3a [AMDGPU] Further reduce attaching of implicit operands to spills
Extension of https://reviews.llvm.org/D141101 to further reduce the number of implicit operands we attach. The main benefit is to improve the capability of the post-RA scheduler and to reduce unneeded dependency resolution (e.g. inserting snops).

Unfortunately, if we (naively) minimize the number of implicit operands completely, we run into some regressions (e.g. dual_movs are replaced with multiple calls to v_mov). This is even more reason to switch to LiveRegUnits.

Nonetheless, this patch removes the operands which we can for free (more or less).

Change-Id: Ib4f409202b36bdbc59eed615bc2d19fa8bd8c057

Differential Revision: https://reviews.llvm.org/D141557

Change-Id: I8b039e3c0d39436b384083f8beb947ee1b1730b2
2023-01-19 14:31:07 -08:00
Evgenii Stepanov
bd3ee371e9 Revert "[AArch64][v8.3A] Avoid inserting implicit landing pads (PACI*SP)"
The Linux kernel sets SCTLR_EL1.BT0 and BT1 to 1 unconditionally, which
makes PACIASP equivalent to BTI C + PACIA LR,SP.

Use the shorter instruction sequence by default.

I'm not aware of anyone who needs the opposite. They are welcome to
revert to the current behavior under a subtarget feature or an
environment check.

This reverts commit 571c8c5263a79293aaadae07b11feb36726eaf53.

Differential Revision: https://reviews.llvm.org/D141978
2023-01-19 14:09:22 -08:00
Paul Kirth
af9a452e57 [llvm][codegen] Fix non-determinism in StackFrameLayoutAnalysisPass output
We were iterating over a SmallPtrSet when outputting slot variables.
This is still correct but made the test fail under reverse iteration.
This patch replaces the SmallPtrSet with a SmallVector.

Also remove the "Stack Frame Layout" lines from arm64-opt-remarks-lazy-bfi test,
since those also break under reverse iteration.

Reviewed By: nickdesaulniers

Differential Revision: https://reviews.llvm.org/D142127
2023-01-19 20:04:14 +00:00
Stanislav Mekhanoshin
63e7e9c875 [AMDGPU] Treat WMMA the same as MFMA for sched_barrier
MFMA and WMMA are essentially the same thing, but appear on different ASICs.

Differential Revision: https://reviews.llvm.org/D142062
2023-01-19 10:52:31 -08:00
Stanislav Mekhanoshin
e7f080b359 [AMDGPU] Introduce separate register limit bias in scheduler
Current implementation abuses ErrorMargin to apply an additional
bias to VGPR and SGPR limits under a high register pressure. The
ErrorMargin exists to account for inaccuracies of the RP tracker
and not to tackle an excess pressure. Introduce separate bias for
this purpose and also make it different for SGPRs and VGPRs as we
may want to use different values in the future.

This is supposed to be NFC, however there is a subtle difference
when subtracting a margin overflows the limit. Doing two subtractions
makes it less probable, although this manifests only in MIR tests with
an artificially small register budget.

Differential Revision: https://reviews.llvm.org/D142051
2023-01-19 10:51:40 -08:00
Zino Benaissa
68f45796ed [AARCH64][SVE] Do not optimize vector conversions
shuffle_vector instructions are serialized when targeting SVE fixed vectors; see
https://reviews.llvm.org/D139111. This patch disables the
optimizeExtendOrTruncateConversion peepholes that generate shuffle_vector.

Differential Revision: https://reviews.llvm.org/D141439
2023-01-19 16:50:31 +00:00
Jonas Paulsson
a9c5a98f81 [SystemZ] Improvement in tryRxSBG().
Only allow replacements of nodes that have a single user. This is better as
simple instructions (e.g. XGRK) are one cycle faster, and it helps in cases
where both inputs share a common node.

Review: Ulrich Weigand
2023-01-19 10:43:52 -06:00
Michal Paszkowski
2bcedd4643 [SPIR-V] Emit OpExecutionMode ContractionOff for no FP_CONTRACT metadata
This change makes the AsmPrinter emit OpExecutionMode ContractionOff
when neither opencl.enable.FP_CONTRACT nor spirv.ExecutionMode
metadata is present.

Differential Revision: https://reviews.llvm.org/D141734
2023-01-19 15:26:34 +01:00
Alex Brachet
67bd3c58c0 [X86] Add register definitions for cfi directives
Add {e,r}flags, {g,f}s.base registers so they can be referenced in cfi
directives. They are not otherwise usable in any instructions,
but can be implicitly pushed to the stack, as pushf does for
{e,r}flags.

Differential Revision: https://reviews.llvm.org/D141879
2023-01-19 14:10:31 +00:00
Amaury Séchet
7e5681cf29 [DAG] Peek through ZEXT/TRUNC in foldAddSubMasked1
Fix a regression in D141883

Depends on D141883

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D141884
2023-01-19 13:23:42 +00:00
Michal Paszkowski
786cb151d9 [SPIR-V] Add -opaque-pointers=0 to some LIT tests
Differential Revision: https://reviews.llvm.org/D142061
2023-01-19 14:02:14 +01:00
Amaury Séchet
2826869d7b [DAG] Do not combine any_ext when we combine and into zext.
This transform loses information that can be useful for other transforms.

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D141883
2023-01-19 12:37:05 +00:00
David Sherwood
871815e062 [AArch64][SVE2p1] Add SVE2.1 while (predicate-pair) intrinsics
Adds intrinsics for the following instructions:

* WHILEGE (predicate pair)
* WHILEGT (predicate pair)
* WHILEHI (predicate pair)
* WHILEHS (predicate pair)
* WHILELE (predicate pair)
* WHILELO (predicate pair)
* WHILELS (predicate pair)
* WHILELT (predicate pair)

I've added an opcode selector called SelectOpcodeFromVT to
AArch64ISelDAGToDAG.cpp that we will extend in future to
select opcodes from different MVTs. For now, the only use is
for selecting predicate types.

NOTE: These intrinsics are still in development and are subject
to future changes.

Differential Revision: https://reviews.llvm.org/D141936
2023-01-19 09:32:20 +00:00
wanglei
7fa0a3c923 [LoongArch] Add an option for MCInstPrinter to print numeric reg names
`-loongarch-numeric-reg` for llvm-mc and llc.
`-M numeric` (which matches GNU objdump) for llvm-objdump and llvm-mc.

Reviewed By: SixWeining

Differential Revision: https://reviews.llvm.org/D141743
2023-01-19 16:29:22 +08:00
icedrocket
a6e3027db7 [X86] Avoid converting u64 to f32 using x87 on Windows
The code below currently prints less accurate values only on Windows 32-bit. On Windows, the default precision control on x87 is only 53-bit, and FADD triggers rounding with that precision, so the final result may be less accurate. This revision avoids less accurate conversions by using library calls instead.

```
#include <stdint.h>
#include <stdio.h>

int main() {
    int64_t n = 0b0000000000111111111111111111111111011111111111111111111111111111;
    printf("%lld, %.0f, %.0f", n, (float)n, (float)(uint64_t)n);

    return 0;
}
```

Reviewed By: craig.topper, lebedev.ri

Differential Revision: https://reviews.llvm.org/D141074
2023-01-18 22:41:34 -08:00
Paul Kirth
557a5bc336 [codegen] Add StackFrameLayoutAnalysisPass
Issue #58168 describes the difficulty diagnosing stack size issues
identified by -Wframe-larger-than. For simple code, it's easy to
understand the stack layout and where space is being allocated, but in
more complex programs, where code may be heavily inlined, unrolled, and
have duplicated code paths, it is no longer easy to manually inspect the
source program and understand where stack space can be attributed.

This patch implements a machine function pass that emits remarks with a
textual representation of stack slots, and also outputs any available
debug information to map source variables to those slots.

The new behavior can be used by adding `-Rpass-analysis=stack-frame-layout`
to the compiler invocation. Like other remarks, the diagnostic
information can be saved to a file in a machine-readable format by
adding -fsave-optimization-record.

Fixes: #58168

Reviewed By: nickdesaulniers, thegameg

Differential Revision: https://reviews.llvm.org/D135488
2023-01-19 01:51:14 +00:00
Jeffrey Byrnes
f0e7ae085f [AMDGPU] Run autogen checks on test
Change-Id: I46f2ced9ceac592c2a93a00631014a806d4b0693
2023-01-18 16:12:18 -08:00
Tulio Magno Quites Machado Filho
1136cf1721 [SystemZ] Implement lowering of GET_ROUNDING
Add support for _FLT_ROUNDS_ in SystemZ.

Patch by Tulio Magno Quites Machado Filho.

Reviewed By: Ulrich Weigand

Differential Revision: https://reviews.llvm.org/D140988
2023-01-18 14:41:19 -06:00
Roman Lebedev
7460842fb2 [DAGCombiner] combineShuffleOfSplatVal(): don't assert that shuffle is non-undef
As per the test case from Steven Johnson in https://reviews.llvm.org/rGf8d9097168b7#1165311
we can indeed encounter such shuffles, that produce all-undef after folding,
before something else manages to optimize them away.
2023-01-18 18:45:08 +03:00
Samuel Parker
32af267447 [NFC][WebAssembly] Add tests
Add more variations to fpclamptosat.
2023-01-18 13:30:53 +00:00
Dmitry Bushev
3cba33c56f [RISCV][ISelLowering] Fix select lowering issue
Fix bug that leads to some pseudo instructions not being lowered.

Reviewed By: asb

Differential Revision: https://reviews.llvm.org/D141395
2023-01-18 15:09:40 +03:00
David Green
21df504399 [DAG][ARM][AArch64] Transform max(a,b) - min(a,b) -> abd(a,b)
This adds both signed and unsigned transforms for
max(a, b) - min(a, b) -> abd(a, b).
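
The identity behind this fold can be checked exhaustively on a narrow type. The following is only a sketch: it models abd as trunc(abs(ext(a) - ext(b))) on 8-bit values, which is an assumption about the node's semantics, and the helper names (maxMinusMin, absDiff, foldHoldsFor8Bit, checkAbdFold) are made up for illustration and are not part of the patch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Reference form: max(a, b) - min(a, b) with an 8-bit wrapping subtraction.
template <typename T> uint8_t maxMinusMin(T A, T B) {
  return static_cast<uint8_t>(static_cast<uint8_t>(std::max(A, B)) -
                              static_cast<uint8_t>(std::min(A, B)));
}

// abd modelled (assumption) as trunc(abs(ext(a) - ext(b))): widen, subtract,
// take the absolute value, truncate back to 8 bits.
template <typename T> uint8_t absDiff(T A, T B) {
  int Wide = static_cast<int>(A) - static_cast<int>(B);
  return static_cast<uint8_t>(Wide < 0 ? -Wide : Wide);
}

// Compare both forms over every pair of 8-bit values of type T.
template <typename T> bool foldHoldsFor8Bit() {
  for (int A = 0; A < 256; ++A)
    for (int B = 0; B < 256; ++B) {
      T X = static_cast<T>(static_cast<uint8_t>(A));
      T Y = static_cast<T>(static_cast<uint8_t>(B));
      if (maxMinusMin(X, Y) != absDiff(X, Y))
        return false;
    }
  return true;
}

// Example use: the identity holds for both the signed and unsigned reading.
void checkAbdFold() {
  assert(foldHoldsFor8Bit<int8_t>());
  assert(foldHoldsFor8Bit<uint8_t>());
}
```

Instantiating the template with int8_t corresponds to the signed transform and with uint8_t to the unsigned one; the alive2 links below remain the authoritative proofs.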

unsigned: https://alive2.llvm.org/ce/z/RF4jGQ
signed: https://alive2.llvm.org/ce/z/Cjr2zE

Fixes: #59894

Differential Revision: https://reviews.llvm.org/D141706
2023-01-18 11:44:26 +00:00
Tim Northover
3ed58d4df6 AArch64: allocate small fixed args properly in varargs functions.
On Darwin, function arguments occupy their real size when passed on the stack
(e.g. an i16 only consumes 2 bytes). This means that, even for fixed args in
varargs calls we need to keep track of the original type being passed before
any DAG/GISel promotions. Existing logic only applied this fix to the
non-varargs case leading to mismatch between caller & callee in those
situations.

On Linux & Windows these arguments always occupy a 64-bit slot anyway so
there's no special handling needed.
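
To make the caller/callee mismatch concrete, here is a rough sketch of how stack offsets differ between the two conventions. It assumes each packed argument is aligned to its own size and that every argument size is a power of two no larger than 8; stackOffsets and PackSmallArgs are hypothetical names, not LLVM API.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Byte offsets of stack-passed arguments, given their sizes in bytes.
// PackSmallArgs = true models "arguments occupy their real size";
// PackSmallArgs = false models "every argument gets a 64-bit slot".
// Assumption: sizes are powers of two no larger than 8.
std::vector<unsigned> stackOffsets(const std::vector<unsigned> &Sizes,
                                   bool PackSmallArgs) {
  std::vector<unsigned> Offsets;
  unsigned Next = 0;
  for (unsigned Size : Sizes) {
    unsigned Slot = PackSmallArgs ? Size : std::max(Size, 8u);
    Next = (Next + Slot - 1) / Slot * Slot; // round up to the slot alignment
    Offsets.push_back(Next);
    Next += Slot;
  }
  return Offsets;
}

// Example use: two i16 arguments followed by an i32.
void checkStackOffsets() {
  const std::vector<unsigned> Sizes = {2, 2, 4};
  const std::vector<unsigned> Packed = {0, 2, 4};
  const std::vector<unsigned> Slotted = {0, 8, 16};
  assert(stackOffsets(Sizes, true) == Packed);
  assert(stackOffsets(Sizes, false) == Slotted);
}
```

If the caller lays out promoted (slot-sized) arguments while the callee expects packed ones, the second and third arguments are read from the wrong offsets, which is the kind of mismatch described above.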
2023-01-18 11:35:24 +00:00
chenglin.bi
45299fb0f9 Reapply [AArch64] fold subs ugt/ult to ands when the second operand is mask/pow2
The original patch made a mistake: the reverse condition code of ugt should be ule.
Since ule C is generalized to ult C + 1, the new patch adds support for the ult Pow2 case.
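
The bit-level identities that make such a fold possible can be sketched on 8-bit values. This is an illustration of the arithmetic only, not the DAG combine itself, and all function names here are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// x >u Mask with Mask = 2^K - 1 holds exactly when x has a bit set above the
// mask, so the compare can be expressed as a test of (x & ~Mask).
bool ugtMaskViaAnd(uint8_t X, uint8_t Mask) {
  return (X & static_cast<uint8_t>(~Mask)) != 0;
}

// x <u Pow2 holds exactly when x has no bit set at or above Pow2.
bool ultPow2ViaAnd(uint8_t X, uint8_t Pow2) {
  return (X & static_cast<uint8_t>(~(Pow2 - 1))) == 0;
}

// Check the identities for every 8-bit x and every K in [0, 8).
bool identitiesHoldFor8Bit() {
  for (int K = 0; K < 8; ++K) {
    uint8_t Pow2 = static_cast<uint8_t>(1u << K);
    uint8_t Mask = static_cast<uint8_t>(Pow2 - 1);
    for (int I = 0; I < 256; ++I) {
      uint8_t X = static_cast<uint8_t>(I);
      if ((X > Mask) != ugtMaskViaAnd(X, Mask))
        return false;
      if ((X < Pow2) != ultPow2ViaAnd(X, Pow2))
        return false;
      // ule Mask is the same predicate as ult Mask + 1 (= Pow2).
      if ((X <= Mask) != (X < Pow2))
        return false;
    }
  }
  return true;
}

// Example use.
void checkMaskFolds() { assert(identitiesHoldFor8Bit()); }
```

The last comparison in the loop is the "ule C becomes ult C + 1" step: the reverse of ugt Mask is ule Mask, which is the same test as ult Pow2.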

https://alive2.llvm.org/ce/z/naBw5A

Reviewed By: samtebbs, chapuni

Differential Revision: https://reviews.llvm.org/D141829
2023-01-18 19:24:20 +08:00
Luke Lau
a0d80c2398 [RISCV] Generalize performFP_TO_INTCombine to vectors
Like in the scalar domain, combine calls to (fp_to_int (ftrunc X)) on
scalable and fixed-length vectors into a single vfcvt instruction.
For truncating rounds, the static vfcvt.rtz rounding mode is used.
Otherwise use the VFCVT_RM_ variants to set the rounding mode
dynamically.
Closes https://github.com/llvm/llvm-project/issues/56737

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D141599
2023-01-18 10:53:24 +00:00
Luke Lau
98b9340c07 [RISCV][NFC] Add test cases for rounding vfcvt
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D141600
2023-01-18 10:53:21 +00:00
David Green
e26ec330c4 [DAG][AArch64][ARM] Combine abd(sub(x, y)) to abd if the sub is nsw
This implements the fold (abs (sub nsw x, y)) -> abds(x, y). Providing
the sub is nsw this appears to be valid without the extensions that are
usually used for abds. https://alive2.llvm.org/ce/z/XHVaB3. The
equivalent abdu combine seems to not be valid.
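
A small exhaustive check on 8-bit values illustrates why the nsw flag matters. This is a sketch under the assumption that abds behaves as trunc(abs(sext(x) - sext(y))); the function names are invented for the example.

```cpp
#include <cassert>
#include <cstdint>

// abds modelled (assumption) as trunc(abs(sext(x) - sext(y))).
uint8_t abds8(int8_t X, int8_t Y) {
  int Wide = static_cast<int>(X) - static_cast<int>(Y);
  return static_cast<uint8_t>(Wide < 0 ? -Wide : Wide);
}

// abs(sub x, y) where the subtraction wraps at 8 bits.
uint8_t absOfWrappedSub8(int8_t X, int8_t Y) {
  int8_t Diff = static_cast<int8_t>(static_cast<uint8_t>(X) -
                                    static_cast<uint8_t>(Y));
  int Wide = Diff;
  return static_cast<uint8_t>(Wide < 0 ? -Wide : Wide);
}

// True when x - y fits in 8 signed bits, i.e. the sub could carry nsw.
bool subIsNSW8(int8_t X, int8_t Y) {
  int Wide = static_cast<int>(X) - static_cast<int>(Y);
  return Wide >= -128 && Wide <= 127;
}

// Whenever the sub does not overflow, both forms agree.
bool foldHoldsWhenNSW() {
  for (int I = -128; I <= 127; ++I)
    for (int J = -128; J <= 127; ++J) {
      int8_t X = static_cast<int8_t>(I);
      int8_t Y = static_cast<int8_t>(J);
      if (subIsNSW8(X, Y) && absOfWrappedSub8(X, Y) != abds8(X, Y))
        return false;
    }
  return true;
}

// Example use, including a pair where the sub overflows and the forms differ.
void checkAbsSubFold() {
  assert(foldHoldsWhenNSW());
  assert(absOfWrappedSub8(127, -128) == 1);
  assert(abds8(127, -128) == 255);
}
```

The pair (127, -128) shows what goes wrong without nsw: the 8-bit subtraction wraps to -1, so abs gives 1 while the widened absolute difference is 255.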

Differential Revision: https://reviews.llvm.org/D141665
2023-01-18 10:10:52 +00:00
Nikita Popov
9ed2f14c87 [AsmParser] Remove typed pointer auto-detection
IR is now always parsed in opaque pointer mode, unless
-opaque-pointers=0 is explicitly given. There is no automatic
detection of typed pointers anymore.

The -opaque-pointers=0 option is added to any remaining IR tests
that haven't been migrated yet.

Differential Revision: https://reviews.llvm.org/D141912
2023-01-18 09:58:32 +01:00
Pierre van Houtryve
fd3300123d [CodeGen] Prevent overlapping subregs in getCoveringSubRegIndexes
If `getCoveringSubRegIndexes` returns a set of subregister indexes where some subregisters overlap others, it can create unsatisfiable copy bundles that eventually cause VirtRegRewriter to error out due to "cycles in copy bundle".

We can simply prevent this by making the algorithm skip over subregister indexes that would cause an overlap with already-covered lanes.
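
The idea of skipping overlapping candidates can be sketched with plain lane bitmasks. This is a simplified greedy loop written for illustration; it is not the actual getCoveringSubRegIndexes implementation, and coverWithoutOverlap and its parameters are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Cover the lanes in Target using candidate lane masks, visited in the given
// preference order. A candidate is skipped if it reaches outside Target or
// overlaps lanes that are already covered. Returns the chosen masks, or an
// empty vector if the target could not be covered exactly.
std::vector<uint32_t> coverWithoutOverlap(uint32_t Target,
                                          const std::vector<uint32_t> &Candidates) {
  std::vector<uint32_t> Chosen;
  uint32_t Covered = 0;
  for (uint32_t Mask : Candidates) {
    if ((Mask & ~Target) != 0)
      continue; // touches lanes outside the register being split
    if ((Mask & Covered) != 0)
      continue; // would overlap an already-covered lane
    Chosen.push_back(Mask);
    Covered |= Mask;
  }
  if (Covered != Target)
    Chosen.clear();
  return Chosen;
}

// Example use: a 7-lane target with two 4-lane candidates that share lane 3.
void checkCover() {
  const std::vector<uint32_t> Candidates = {0x0F, 0x78, 0x30, 0x40};
  const std::vector<uint32_t> Expected = {0x0F, 0x30, 0x40};
  assert(coverWithoutOverlap(0x7F, Candidates) == Expected);
}
```

In the example, the second 4-lane candidate (0x78) is rejected because it shares lane 3 with the first, and the remaining lanes are filled by smaller, disjoint pieces.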

Note that in the case of AMDGPU, this problem is caused by the lack of subregister indexes for 13/14/15-register tuples. We have everything up until 12, then we have 16 and 32 but nothing between 12 and 16.
This means that the best candidate for minimizing the number of copies when splitting a 29-register tuple was to copy (e.g.) 0-15 and 14-29, causing an overlap.
With this change, getCoveringSubRegIndexes will now prefer using something like 0-15, 16-28 and 1

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D141576
2023-01-18 03:50:17 -05:00
Pierre van Houtryve
6a60a68e72 [AMDGPU] Precommit test for D141576
Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D141903
2023-01-18 03:49:37 -05:00
chendewen
d0942df43e [AArch64][SVE] Add more intrinsics in 'isZeroingInactiveLanes'.
The REINTERPRET_CAST operation generates redundant `and` and `ptrue` instructions.
For some instructions these are unnecessary, because their inactive lanes are zeroed by construction.
For example, codegen before:
```
facgt p2.d, p0/z, z4.d, z1.d
ptrue p1.d
and p1.b, p2/z, p2.b, p1.b
```
After:
```
facgt p1.d, p0/z, z4.d, z1.d
```
ref: https://reviews.llvm.org/D129851

Reviewed By: sdesmalen, paulwalker-arm

Differential Revision: https://reviews.llvm.org/D141469
2023-01-18 11:06:13 +08:00
Anshil Gandhi
5073a622a7 [MachineBasicBlock] Explicit FT branching param
Introduce a parameter in getFallThrough() to optionally
allow returning the fall through basic block in spite of
an explicit branch instruction to it. This parameter is
set to false by default.

Introduce getLogicalFallThrough() which calls
getFallThrough(false) to obtain the block while avoiding
insertion of a jump instruction to its immediate successor.

This patch also reverts the changes made by D134557 and
solves the case where a jump is inserted after another jump
(branch-relax-no-terminators.mir).

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D140790
2023-01-17 17:12:08 -07:00
Rahman Lavaee
3d6841b2b1 [Propeller] Use Fixed MBB ID instead of volatile MachineBasicBlock::Number.
Let Propeller use specialized IDs for basic blocks, instead of MBB number.

This allows optimizations not just prior to asm-printer, but throughout the entire codegen.
This patch only implements the functionality under the new `LLVM_BB_ADDR_MAP` version, but the old version is still being used. A later patch will change the used version.

#### Background
Today Propeller uses machine basic block (MBB) numbers, which already exist, to map native assembly to machine IR.  This is done as follows.
    - Basic block addresses are captured and dumped into the `LLVM_BB_ADDR_MAP` section just before the AsmPrinter pass which writes out object files. This ensures that we have a mapping that is close to assembly.
    - Profiling mapping works by taking a virtual address of an instruction and looking up the `LLVM_BB_ADDR_MAP` section to find the MBB number it corresponds to.
    - While this works well today, we need to do better when we scale Propeller to target other Machine IR optimizations like spill code optimization.  Register allocation happens earlier in the Machine IR pipeline and we need an annotation mechanism that is valid at that point.
    - The current scheme will not work in this scenario because the MBB number of a particular basic block is not fixed and changes over the course of codegen (via renumbering, adding, and removing the basic blocks).
    - In other words, the volatile MBB numbers do not provide a one-to-one correspondence throughout the lifetime of Machine IR.  Profile annotation using MBB numbers is restricted to a fixed point; only valid at the exact point where it was dumped.
    - Further, the object file can only be dumped before AsmPrinter and cannot be dumped at an arbitrary point in the Machine IR pass pipeline.  Hence, MBB numbers are not suitable and we need something else.
#### Solution
We propose using fixed unique incremental MBB IDs for basic blocks instead of volatile MBB numbers. These IDs are assigned upon the creation of machine basic blocks. We modify `MachineFunction::CreateMachineBasicBlock` to assign the fixed ID to every newly created basic block.  It assigns `MachineFunction::NextMBBID` to the MBB ID and then increments it, which ensures having unique IDs.

To ensure correct profile attribution, multiple equivalent compilations must generate the same Propeller IDs. This is guaranteed as long as the MachineFunction passes run in the same order. Since the `NextBBID` variable is scoped to `MachineFunction`, interleaving of codegen for different functions won't cause any inconsistencies.
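
The difference between a fixed ID and a positional number can be shown with a toy model. Everything in this sketch (the Block and Function types, createBlock, eraseAt) is hypothetical and only mimics the scheme described above; it does not mirror the real MachineFunction interface.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy basic block: FixedID is assigned once at creation, Number is positional
// and gets rewritten whenever blocks are renumbered.
struct Block {
  unsigned FixedID;
  int Number;
};

// Toy function owning its blocks and a per-function ID counter.
class Function {
  std::vector<Block> Blocks;
  unsigned NextID = 0; // scoped to this function, so functions are independent

public:
  // Create a block, give it the next unused fixed ID, and return that ID.
  unsigned createBlock() {
    Blocks.push_back(Block{NextID, static_cast<int>(Blocks.size())});
    return NextID++;
  }

  // Remove the block at Pos, then renumber the survivors by position.
  void eraseAt(std::size_t Pos) {
    Blocks.erase(Blocks.begin() + static_cast<std::ptrdiff_t>(Pos));
    for (std::size_t I = 0; I < Blocks.size(); ++I)
      Blocks[I].Number = static_cast<int>(I);
  }

  const Block &at(std::size_t Pos) const { return Blocks[Pos]; }
};

// Example use: erase the middle block and observe which value stays stable.
void checkFixedIDs() {
  Function F;
  F.createBlock();
  F.createBlock();
  F.createBlock();
  F.eraseAt(1);
  assert(F.at(1).FixedID == 2); // identity survives the removal
  assert(F.at(1).Number == 1);  // positional number was rewritten
  assert(F.createBlock() == 3); // IDs are never reused
}
```

After the middle block is erased, the last block's positional number changes from 2 to 1 while its fixed ID stays 2, which is the property a profile mapping needs across codegen passes.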

The new encoding is generated under the new version number 2 and we keep backward-compatibility with older versions.

#### Impact on Size of the `LLVM_BB_ADDR_MAP` Section
Emitting the Propeller ID results in a 23% increase in the size of the `LLVM_BB_ADDR_MAP` section for the clang binary.

Reviewed By: tmsriram

Differential Revision: https://reviews.llvm.org/D100808
2023-01-17 15:25:29 -08:00
Craig Topper
29f5e9e6f0 [RISCV] Use zeroext instead of signext in mask reduction tests. NFC
This is more consistent with ABI and how bools on RISC-V are
represented.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D141963
2023-01-17 15:15:45 -08:00
Mircea Trofin
5898be19e6 [mlgo] Remove the protobuf dependency
The dependency was due to the log format. This change switches to the
previously-introduced (D139370) "dependency-free" logger instead of the
protobuf-based one.

A subsequent change will clean out the unnecessary abstraction left
behind.

This change drops the logger unittest: we have sufficient test coverage
via lit tests, and a unit test would require adding, unnecessarily, a log
reader (the reader is expected to be Python, for the ML side, and there
is a reader for that under Analysis/models, used for tests).

Differential Revision: https://reviews.llvm.org/D141720
2023-01-17 13:12:27 -08:00
Craig Topper
b8b756c6f1 [RISCV] Add missing check prefixes to vreductions-mask.ll. NFC
There's a conflict between the riscv32 and riscv64 output for some
tests which caused the script to drop the check lines.

Add specific check prefixes for these cases.
2023-01-17 12:57:51 -08:00
Noah Goldstein
ca5d11751e Add additional tests for ctlz{_zero_undef} to test folding with xor; NFC
Reviewed By: pengfei

Differential Revision: https://reviews.llvm.org/D141549
2023-01-17 11:04:26 -08:00
David Green
1f2c37afbe [AArch64][SVE] Implement isVScaleKnownToBeAPowerOfTwo
According to https://developer.arm.com/documentation/102105/ia-00/?lang=en

> Arm is making a retrospective change to the SVE architecture to remove
> the capability of selecting a non-power-of-two vector length in
> non-Streaming SVE as well as in Streaming SVE mode. Specific updates as
> a result of this change will be communicated in due course.

This patch implements the isVScaleKnownToBeAPowerOfTwo method to teach
DAG Combines that VScale will be known to be a power of 2, which helps
reduce or simplify some expressions (notably the udiv in vector trip
count expressions).
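
As a generic illustration of why the power-of-two fact helps (not the actual DAG combine), an unsigned division or remainder by a power of two reduces to a shift or a mask. The helper names below are made up for this sketch.

```cpp
#include <cassert>
#include <cstdint>

// True when D has exactly one bit set.
bool isPowerOfTwo(uint64_t D) { return D != 0 && (D & (D - 1)) == 0; }

// log2 of a power of two: the index of its single set bit.
unsigned log2OfPow2(uint64_t D) {
  assert(isPowerOfTwo(D) && "precondition: D is a non-zero power of two");
  unsigned Shift = 0;
  while ((D >> Shift) != 1)
    ++Shift;
  return Shift;
}

// With a power-of-two divisor, unsigned division is a right shift and
// unsigned remainder is a mask.
uint64_t udivByPow2(uint64_t X, uint64_t D) { return X >> log2OfPow2(D); }
uint64_t uremByPow2(uint64_t X, uint64_t D) {
  assert(isPowerOfTwo(D) && "precondition: D is a non-zero power of two");
  return X & (D - 1);
}

// Compare against plain division for a range of values and divisors.
bool matchesPlainDivision() {
  for (uint64_t D = 1; D <= 64; D <<= 1)
    for (uint64_t X = 0; X < 1000; ++X)
      if (udivByPow2(X, D) != X / D || uremByPow2(X, D) != X % D)
        return false;
  return true;
}
```

Without the guarantee that the divisor is a power of two, neither rewrite is valid in general, so a real division has to be kept.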

Differential Revision: https://reviews.llvm.org/D141486
2023-01-17 15:49:29 +00:00
Francesco Petrogalli
229162d4d7 [MIScheduler] Print top/down cycle in the SUnit dump.
Add an extra command line option to `llc` that allows checking at what cycle an instruction has been scheduled by the machine scheduler.

Differential Revision: https://reviews.llvm.org/D141289
2023-01-17 15:55:43 +01:00
zhongyunde
2deb10c108 [AArch64][SVE] Fix crash for DestructiveBinaryComm zero merging
This fix is similar to D124325. The DestructiveBinaryComm operation type
may also be allocated the same register, so insert the LSL.

      movprfx       z0.s, p0/z, z0.s
      lsl z0.b, p0/m, z0.b, #0
      fmul z0.s, p0/m, z0.s, z0.s

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D141471
2023-01-17 20:45:59 +08:00
David Green
e49367e7f3 [ARM] Fix i1 shuffle lowering with multiple operands.
The existing lowering of i1 vector shuffle was only considering
single-source shuffles, always assuming the second was undef. This
extends that to properly handle both operands.
2023-01-17 11:29:51 +00:00
chenglin.bi
6c37fbdfcf Revert "[AArch64] fold subs ugt/ult to ands when the second operand is a mask"
This reverts commit 4a64024c1410692197e4b54e27e7b269a67c78f4.

The original commit made a mistake: the reverse of ugt should be ule.
2023-01-17 18:41:44 +08:00
Samuel Parker
bba9221d9f [NFC][WebAssembly] Update test
Run update_llc_test_checks.py on address-offsets.ll
2023-01-17 10:34:43 +00:00