52796 Commits

Author SHA1 Message Date
Qi Hu
ddd7d35c6c [RegAlloc] Fix assertion failure caused by inline assembly
When inline assembly code requests more registers than available, the
MachineInstr::emitError function in the RegAllocFast pass emits an error
but doesn't stop the pass, and then the compiler crashes later with an
assertion failure. This commit, mimicking the RegAllocGreedy pass, assigns
a random physical register, and therefore avoids the crash after producing
the diagnostic. This problem has been observed for both rustc and clang,
while it doesn't occur in gcc.
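
A toy model of the recovery strategy follows. It assumes a deliberately simplified allocator: `allocate_fast`, the register names and the diagnostic text are all hypothetical and are not RegAllocFast's API. It shows the shape of the fix. After reporting the error, the allocator still hands out a valid physical register, so later stages never see an unassigned virtual register.

```python
def allocate_fast(vregs, physregs):
    """Toy allocator: one physical register per virtual register, no spilling.

    Hypothetical model only; physregs must be non-empty.
    """
    assignment = {}
    diagnostics = []
    free = list(physregs)
    for vreg in vregs:
        if free:
            assignment[vreg] = free.pop(0)
        else:
            # Report the error (the role MachineInstr::emitError plays), then
            # keep the state consistent by assigning some register anyway.
            diagnostics.append(f"ran out of registers while allocating {vreg}")
            assignment[vreg] = physregs[0]
    return assignment, diagnostics
```

With three virtual registers and two physical ones, one diagnostic is produced, and every virtual register still ends up with a physical register.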
2023-07-25 19:21:03 -04:00
Corbin Robeck
7a4968b5a3 [AMDGPU] Add dynamic stack bit info to kernel-resource-usage Rpass output
In code object 5 (https://llvm.org/docs/AMDGPUUsage.html#code-object-v5-metadata) the AMDGPU backend added the .uses_dynamic_stack bit to the kernel metadata to identify kernels whose stack usage cannot be determined at compile time (mainly because of indirect function calls and recursion). This patch adds this information to the output of the kernel-resource-usage remarks.
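
The condition for setting the bit can be sketched with a simple call-graph model. The assumption is that a kernel needs a dynamic stack if some function reachable from it makes an indirect call or belongs to a recursive cycle. The graph encoding, the `INDIRECT` marker and `uses_dynamic_stack` are hypothetical, and the backend's real analysis tracks more detail.

```python
INDIRECT = "<indirect>"  # hypothetical marker for a call with an unknown target

def uses_dynamic_stack(kernel, callgraph):
    """True if stack usage reachable from `kernel` is not statically bounded.

    `callgraph` maps each function name to the list of functions it calls.
    """
    on_path = set()       # functions on the current DFS path (cycle detection)
    known_static = set()  # functions already proven to have bounded stack

    def visit(fn):
        if fn == INDIRECT or fn in on_path:
            return True  # unknown callee, or recursion
        if fn in known_static:
            return False
        on_path.add(fn)
        dynamic = any(visit(callee) for callee in callgraph.get(fn, []))
        on_path.discard(fn)
        if not dynamic:
            known_static.add(fn)
        return dynamic

    return visit(kernel)
```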

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D156040

Author:    Corbin Robeck <corbin.robeck@amd.com>
2023-07-25 12:20:13 -07:00
Kevin P. Neal
76c22b18ea [FPEnv][AMDGPU] Correct strictfp tests.
Correct AMDGPU strictfp tests to follow the rules documented in the LangRef:
https://llvm.org/docs/LangRef.html#constrained-floating-point-intrinsics

Mostly these tests just needed the strictfp attribute on function
definitions.  I've also removed the strictfp attribute from uses
of the constrained intrinsics because it comes by default since
D154991, but I only did this in tests I was changing anyway.

I also removed attributes added to the declarations of intrinsics.
The attributes of intrinsics cannot be changed in a test, so I
eliminated attempts to do so.

Test changes verified with D146845.
2023-07-25 13:24:46 -04:00
Craig Topper
f6dc75cdd8 [RISCV] Add DAG combine to pull xor with 1 through select idiom that uses czero_eqz/nez.
If we are selecting between two setccs that need to be legalized
with xor, the select will be legalized first. Detect this pattern
so we can pull the xor through to expose it to additional
optimizations.

We could generalize this to other operations, but those normally
get handled in DAG combine before select legalization.
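
The identity behind this combine can be checked exhaustively on 0/1 values. The sketch models the Zicond instructions: `czero.eqz` yields 0 when its condition register is zero and its first operand otherwise, and `czero.nez` does the reverse. It also models the select idiom built from the two instructions. The Python helper names are hypothetical.

```python
def czero_eqz(rs1, rs2):
    # czero.eqz rd, rs1, rs2: rd = 0 if rs2 == 0 else rs1
    return 0 if rs2 == 0 else rs1

def czero_nez(rs1, rs2):
    # czero.nez rd, rs1, rs2: rd = 0 if rs2 != 0 else rs1
    return 0 if rs2 != 0 else rs1

def select_idiom(cond, t, f):
    # select(cond, t, f) expanded as (czero.eqz t, cond) | (czero.nez f, cond)
    return czero_eqz(t, cond) | czero_nez(f, cond)

def xor_pull_through_holds():
    """select(c, a ^ 1, b ^ 1) == select(c, a, b) ^ 1 for 0/1 values a, b."""
    for cond in (0, 1, 7):
        for a in (0, 1):
            for b in (0, 1):
                if select_idiom(cond, a ^ 1, b ^ 1) != select_idiom(cond, a, b) ^ 1:
                    return False
    return True
```

After the rewrite, one xor follows the select instead of one xor on each input, and later combines can fold that single xor.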

Reviewed By: asb

Differential Revision: https://reviews.llvm.org/D156159
2023-07-25 09:13:24 -07:00
Craig Topper
b34a8b3a52 [RISCV] Generalize combineAddOfBooleanXor to support any boolean not just setcc.
Instead of checking for setcc, look for any 0/1 value.

Reviewed By: asb

Differential Revision: https://reviews.llvm.org/D156153
2023-07-25 09:04:49 -07:00
Craig Topper
5ff5dac852 [RISCV] Add simple DAG combine to pull xor with 1 through select_cc.
If we're selecting the result of two setccs that have been legalized
by introducing an xor with 1, we can pull the xor with 1 through the
select to enable more optimizations.

We could generalize this to other binary operators with identical
conditions, but those are usually caught before we legalize the select.
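
For context, the RISC-V base ISA only has set-less-than comparisons, so a setcc such as `a >= b` is typically legalized as `slt(a, b) ^ 1`. The following sketch, with hypothetical helper names and a select_cc modelled with a fixed `lhs < rhs` condition, checks that the xor can be hoisted past a select_cc when both operands were legalized that way.

```python
import itertools

def slt(a, b):
    # RISC-V slt: 1 if a < b (signed comparison), else 0
    return 1 if a < b else 0

def setge_legalized(a, b):
    # a >= b built from slt plus an xor with 1
    return slt(a, b) ^ 1

def select_cc(lhs, rhs, t, f):
    # Hypothetical model of a select_cc whose condition is lhs < rhs
    return t if lhs < rhs else f

def xor_pull_through_holds(values=range(-1, 2)):
    for x, y, a, b, c, d in itertools.product(values, repeat=6):
        before = select_cc(x, y, setge_legalized(a, b), setge_legalized(c, d))
        after = select_cc(x, y, slt(a, b), slt(c, d)) ^ 1  # one xor instead of two
        if before != after:
            return False
    return True
```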

Reviewed By: asb

Differential Revision: https://reviews.llvm.org/D156144
2023-07-25 09:03:45 -07:00
Weining Lu
212d6aa0da Revert "[LoongArch] Support -march=native and -mtune="
This reverts commit 92c06114b2ea9900a3364fb395988dfb065758f7.
2023-07-25 23:32:15 +08:00
Nikita Popov
0cab8d2041 Reapply [IR] Mark and/or constant expressions as undesirable
This reapplies the change for and, but also marks or as undesirable
at the same time. Only handling one of them can cause infinite
combine loops due to the asymmetric handling.

-----

In preparation for removing support for and/or expressions, mark
them as undesirable. As such, we will no longer implicitly create
such expressions, but they still exist.
2023-07-25 15:31:45 +02:00
Weining Lu
92c06114b2 [LoongArch] Support -march=native and -mtune=
As described in [1][2], `-mtune=` is used to select the type of target
microarchitecture and defaults to the value of `-march`. The set of
possible values should be a superset of `-march` values. Currently
possible values of `-march=` and `-mtune=` are `native`, `loongarch64`
and `la464`.

D136146 has supported `-march={loongarch64,la464}` and this patch adds
support for `-march=native` and `-mtune=`.

A new ProcessorModel called `loongarch64` is defined in LoongArch.td
to support `-mtune=loongarch64`.

`llvm::sys::getHostCPUName()` returns `generic` on unknown or future
LoongArch CPUs, e.g. the not yet added `la664`, leading to
`llvm::LoongArch::isValidArchName()` failing to parse the arch name.
In this case, use `loongarch64` as the default arch name for 64-bit
CPUs.

The following two preprocessor macros are also defined:
- __loongarch_arch
- __loongarch_tune
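
The name resolution described above can be sketched as follows. This assumes a 64-bit host, and `host_cpu` stands in for the result of `llvm::sys::getHostCPUName()`. The valid-name set contains only the values listed for this patch. Applying the same `generic` fallback to `-mtune=native` is an assumption, and the helper names are hypothetical, not the clang driver's code.

```python
VALID_ARCH_NAMES = {"loongarch64", "la464"}  # accepted names besides "native"

def native_name(host_cpu):
    # Unknown or future CPUs may be reported as "generic"; fall back to the
    # baseline 64-bit name so the result still parses as a valid arch name.
    return host_cpu if host_cpu in VALID_ARCH_NAMES else "loongarch64"

def resolve_cpu(march, mtune, host_cpu):
    """Return (arch, tune) for the given -march= and -mtune= values."""
    arch = native_name(host_cpu) if march == "native" else march
    if mtune is None:
        tune = arch  # -mtune defaults to the value of -march
    elif mtune == "native":
        tune = native_name(host_cpu)  # assumed to use the same fallback
    else:
        tune = mtune
    return arch, tune
```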

[1]: https://github.com/loongson/LoongArch-Documentation/blob/2023.04.20/docs/LoongArch-toolchain-conventions-EN.adoc
[2]: https://github.com/loongson/la-softdev-convention/blob/v0.1/la-softdev-convention.adoc

Differential Revision: https://reviews.llvm.org/D155824
2023-07-25 21:01:51 +08:00
Matt Arsenault
e3fd8f83a8 AMDGPU: Correctly expand f64 sqrt intrinsic
rocm-device-libs and llpc were avoiding using f64 sqrt
intrinsics in favor of their own expansions. Port the
expansion into the backend. Both of these users should be
updated to call the intrinsic instead.

The library and llpc expansions are slightly different.
llpc uses an ldexp to do the scale; the library uses a multiply.

Use ldexp to do the scale instead of the multiply.
I believe v_ldexp_f64 and v_mul_f64 are always the same number of
cycles, but it's cheaper to materialize the 32-bit integer constant
than the 64-bit double constant.

The libraries have another fast version of sqrt which will
be handled separately.

I am tempted to do this in an IR expansion instead. In the IR
we could take advantage of computeKnownFPClass to avoid
the 0-or-inf argument check.
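
The scaling idea can be sketched in a few lines under some stated assumptions. Inputs below 2**-767 count as small; they are scaled up by 2**256 and the result is scaled down by 2**-128, because sqrt halves the exponent. These constants are illustrative. `math.sqrt` stands in for the core f64 square-root sequence that the backend emits. The point of using ldexp is that the scale factors are small integers (256, -128), not 64-bit double constants.

```python
import math

SMALL = 2.0 ** -767  # assumed threshold below which the input is scaled up

def scaled_sqrt(x):
    """sqrt(x) for x >= 0, using ldexp-based exponent scaling."""
    scale = x < SMALL
    if scale:
        x = math.ldexp(x, 256)   # x * 2**256, exact for tiny inputs
    r = math.sqrt(x)             # stand-in for the core f64 sqrt sequence
    if scale:
        r = math.ldexp(r, -128)  # sqrt(2**256) == 2**128, so undo with -128
    return r
```

Because scaling by a power of two is exact in this range, the result matches a direct square root even for denormal inputs such as `5e-324`.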
2023-07-25 07:54:11 -04:00
Matt Arsenault
47b3ada432 AMDGPU: Add more sqrt f64 lowering tests
Almost all permutations of the flags are potentially relevant.
2023-07-25 07:54:11 -04:00
Paul Walker
74445d652d [SVE] Add vselect(mla/mls) patterns for cases where a multiplicand is used for the false lanes.
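
Lane by lane, the matched pattern is a select whose false operand is one of the multiplicands. SVE's predicated `MAD` (Zdn = Za + Zdn * Zm, with inactive lanes keeping Zdn) has exactly this shape, whereas `MLA` keeps the addend in inactive lanes. The mls case is analogous, using `MSB`. The following sketch shows the equivalence with Python lists as vectors; the helper names are hypothetical.

```python
def vselect_mla(pred, a, b, c):
    # vselect(pred, a + b * c, b): false lanes take the multiplicand b
    return [ai + bi * ci if p else bi for p, ai, bi, ci in zip(pred, a, b, c)]

def sve_mad(pg, zdn, zm, za):
    # Predicated MAD: active lanes compute za + zdn * zm, inactive keep zdn
    return [zai + d * m if p else d for p, d, m, zai in zip(pg, zdn, zm, za)]

def sve_mla(pg, zda, zn, zm):
    # Predicated MLA: active lanes compute zda + zn * zm, inactive keep zda
    return [d + n * m if p else d for p, d, n, m in zip(pg, zda, zn, zm)]
```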
Differential Revision: https://reviews.llvm.org/D155972
2023-07-25 10:02:32 +00:00
Jim Lin
4eff7fae60 [RISCV] Merge rv32/rv64 vector narrowing integer right shift intrinsic tests that have the same content. NFC. 2023-07-25 16:09:21 +08:00
LiaoChunyu
620e61c518 [RISCV] Match ext_vl+sra_vl/srl_vl+trunc_vector_vl to vnsra.wv/vnsrl.wv
Similar to D117454, add VL patterns and test cases.
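
Per element, the matched DAG truncates a right shift of a 2*SEW-wide source by an extended SEW-wide shift amount. `vnsrl.wv` performs this in one instruction, using only the low log2(2*SEW) bits of the shift amount; `vnsra.wv` is the arithmetic-shift analogue. The following sketch uses SEW=8, hypothetical helper names, and only in-range shift amounts, because an out-of-range `srl` is poison.

```python
SEW = 8  # narrow element width; source elements are 2*SEW bits wide
NARROW_MASK = (1 << SEW) - 1

def vnsrl_wv(vs2, vs1):
    # vd[i] = (vs2[i] >> (vs1[i] & (2*SEW - 1))) truncated to SEW bits
    return [(w >> (s & (2 * SEW - 1))) & NARROW_MASK for w, s in zip(vs2, vs1)]

def ext_srl_trunc(vs2, vs1):
    # ext_vl of the narrow shift amount, srl_vl at 2*SEW, then trunc_vector_vl
    out = []
    for w, s in zip(vs2, vs1):
        amt = s & NARROW_MASK  # zero-extend the SEW-bit shift amount
        out.append((w >> amt) & NARROW_MASK)
    return out
```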

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D155466
2023-07-25 14:21:02 +08:00
Freddy Ye
6d23a3faa4 [X86] Support -march=graniterapids-d and update -march=graniterapids
Reviewed By: pengfei, RKSimon, skan

Differential Revision: https://reviews.llvm.org/D155798
2023-07-25 13:48:31 +08:00
pvanhout
3cd4afce5b [AMDGPU] Allow vector access types in PromoteAllocaToVector
Depends on D152706
Solves SWDEV-408279

Reviewed By: #amdgpu, arsenm

Differential Revision: https://reviews.llvm.org/D155699
2023-07-25 07:44:48 +02:00
pvanhout
3890a3b113 [AMDGPU] Use SSAUpdater in PromoteAlloca
This means PromoteAlloca no longer relies on a second SROA run to remove the alloca completely; it now performs the full transformation directly.

Note that PromoteAlloca still relies on SROA running first to
canonicalize the IR. For instance, PromoteAlloca will no longer handle aggregate types, because those should be simplified by SROA before reaching the pass.
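
The following toy model covers straight-line code only. Loads and stores at fixed indices of a small alloca become a chain of immutable vector values: insertelement for stores and extractelement for loads. Everything here is hypothetical. In particular, the model sidesteps control flow, which is precisely where SSAUpdater is needed to place phi nodes.

```python
def promote(ops, size, undef=None):
    """Rewrite ('store', idx, val) / ('load', idx, name) ops on an alloca.

    Returns the final vector value and a dict mapping each load's result
    name to the value it reads. Straight-line code only.
    """
    current = tuple([undef] * size)  # initial contents: undef in every lane
    loaded = {}
    for op in ops:
        if op[0] == "store":
            _, idx, val = op
            # insertelement: a new SSA value replaces the previous one
            current = current[:idx] + (val,) + current[idx + 1:]
        else:
            _, idx, name = op
            loaded[name] = current[idx]  # extractelement from the current value
    return current, loaded
```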

Reviewed By: #amdgpu, arsenm

Differential Revision: https://reviews.llvm.org/D152706
2023-07-25 07:44:47 +02:00
WANG Rui
e7c9a99dfe [LoongArch] Implement isSExtCheaperThanZExt
Implement isSExtCheaperThanZExt.

Signed-off-by: WANG Rui <wangrui@loongson.cn>

Differential Revision: https://reviews.llvm.org/D154919
2023-07-25 09:41:32 +08:00
WANG Rui
1a3da0bc1e [LoongArch] Add test case showing suboptimal codegen when zero extending
Add test case showing suboptimal codegen when zero extending.

Signed-off-by: WANG Rui <wangrui@loongson.cn>

Reviewed By: xen0n

Differential Revision: https://reviews.llvm.org/D154918
2023-07-25 09:31:33 +08:00
chenli
d25c79dc70 [LoongArch] Support InlineAsm for LSX and LASX
The author of the following files is licongtian <licongtian@loongson.cn>:
- clang/lib/Basic/Targets/LoongArch.cpp
- llvm/lib/Target/LoongArch/LoongArchAsmPrinter.cpp
- llvm/lib/Target/LoongArch/LoongArchISelLowering.cpp

The files mentioned above implement InlineAsm for LSX and LASX as follows:
- Enable clang to parse LSX/LASX register names, such as $vr0.
- Support 128-bit and 256-bit operand types when the constraint
  is 'f'.
- Support specifying an LSX/LASX register directly in a constraint,
  such as "={$xr0}".
- Support the operand modifiers 'u' and 'w'.
- Support and legalize the data types and register classes involved in
  LSX/LASX in the lowering process.

Reviewed By: xen0n, SixWeining

Differential Revision: https://reviews.llvm.org/D154931
2023-07-25 09:02:29 +08:00
Kai Luo
f26af16e2c [PowerPC][AIX] Enable quadword atomics by default for AIX
On AIX, a libatomic that supports inline quadword atomic operations has now been released, so compatibility is no longer an issue and quadword atomics can be enabled by default.

Reviewed By: #powerpc, nemanjai

Differential Revision: https://reviews.llvm.org/D151312
2023-07-25 08:21:07 +08:00
Craig Topper
49429783b0 [RISCV] Add lowering for scalar fmaximum/fminimum.
Unlike fmaxnum and fminnum, these operations propagate nan and
consider -0.0 to be less than +0.0.

Without Zfa, we don't have a single instruction for this. The
lowering I've used forces the other input to nan if one input
is a nan. If both inputs are nan, they get swapped. Then use
the fmax or fmin instruction.

New ISD nodes are needed because fmaxnum/fminnum do not define
the order of -0.0 and +0.0.

This lowering ensures that snans are quieted (though that is probably
not required in the default environment). It also ensures that
non-canonical nans are canonicalized, though I'm not sure that's
needed either.

Another option could be to use fmax/fmin and then overwrite the
result based on the inputs being nan, but I'm not sure we can do
that with any less code.

Future work will handle the nnan fast-math flag and the case where
we can prove the input isn't nan.

This does fix the crash in #64022, but we need to do more work
to avoid scalarization.
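
The lowering can be modelled as follows. RISC-V `fmax.d` returns the non-nan operand when exactly one input is nan, returns the canonical nan when both are, and orders -0.0 below +0.0. Forcing both inputs to nan whenever either one is nan therefore turns it into fmaximum. The helper names are hypothetical; fminimum is symmetric with `fmin`.

```python
import math

def riscv_fmax(a, b):
    """Model of RISC-V fmax.d: avoids nan when possible, -0.0 < +0.0."""
    if math.isnan(a) and math.isnan(b):
        return math.nan  # both nan: canonical nan
    if math.isnan(a):
        return b
    if math.isnan(b):
        return a
    if a == 0.0 and b == 0.0:
        return a if math.copysign(1.0, a) > 0 else b  # prefer +0.0
    return a if a > b else b

def lower_fmaximum(x, y):
    # If one input is nan, force the other to nan too; if both are nan they
    # are swapped. Afterwards either both or neither input is nan.
    x2 = y if math.isnan(y) else x
    y2 = x if math.isnan(x) else y
    return riscv_fmax(x2, y2)

def reference_fmaximum(x, y):
    # fmaximum semantics: propagate nan, treat -0.0 as less than +0.0
    if math.isnan(x) or math.isnan(y):
        return math.nan
    if x == 0.0 and y == 0.0:
        return x if math.copysign(1.0, x) > 0 else y
    return max(x, y)

def agrees(x, y):
    r, e = lower_fmaximum(x, y), reference_fmaximum(x, y)
    if math.isnan(e):
        return math.isnan(r)
    return r == e and math.copysign(1.0, r) == math.copysign(1.0, e)
```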

Reviewed By: fakepaper56

Differential Revision: https://reviews.llvm.org/D156069
2023-07-24 13:46:35 -07:00
Philip Reames
a6cd1f623d [RISCV] Adjust memcpy lowering test coverage w/V
This is fixing a mistake in 4f4f49137.
2023-07-24 12:32:17 -07:00
Philip Reames
4f4f491375 [RISCV] Add memcpy lowering test coverage with and without V 2023-07-24 12:25:09 -07:00
Matt Arsenault
0d797b71eb RegisterCoalescer: Fix empty subrange verifier error
In this example an implicit def had live-out undef subrange
defs. After coalescing with the def from a previous block, the
undef-defed lanes are no longer live out of the block in the new
interval. An empty subrange was tentatively created for these lanes,
but it must be deleted.
2023-07-24 12:18:34 -04:00
Matt Arsenault
2a53b6c06b RegisterCoalescer: Fix verifier error on redef of subregister for live out implicit_defs
A live out implicit_def wasn't deleted, but the subranges weren't
correctly updated. The main range was correct but the def
corresponding to the initial main range def instruction was missing
from the lanes redefined in another block.

The written lanes are not quite the same as the valid lanes in the
case of an implicit_def.

Fixes a verifier error in blender. There is an additional verifier error in
some of the testcase variants where an empty subrange remains.
2023-07-24 12:18:34 -04:00
Matt Arsenault
e561e7cb48 AMDGPU: Implement combineRepeatedFPDivisors 2023-07-24 11:19:36 -04:00
Yuanqiang Liu
5e32f1da06 [NVPTX] Expand select_cc on bfloat16 type
Expand select_cc on bfloat16 and bfloat16v2 type.

Differential Revision: https://reviews.llvm.org/D156085
2023-07-24 17:01:31 +02:00
Craig Topper
5990199e2c [RISCV] Add CZERO_EQZ/CZERO_NEZ to ComputeNumSignBitsForTargetNode.
Reviewed By: wangpc

Differential Revision: https://reviews.llvm.org/D156082
2023-07-24 07:43:02 -07:00
Craig Topper
82686d7d55 [RISCV] Add test case for D156082 to condops.ll
This test is copied from select-cc.ll. It wasn't worth adding
Zicond RUN lines to that file.

Reviewed By: asb, wangpc

Differential Revision: https://reviews.llvm.org/D156083
2023-07-24 07:42:46 -07:00
Craig Topper
9da0db4dd8 [RISCV] Add CZERO_EQZ/CZERO_NEZ to computeKnownBitsForTargetNode.
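
`czero.eqz` and `czero.nez` produce either their data operand or zero. Any bit known to be zero in the operand is therefore also zero in the result, but no bit can be known one. Likewise, the result has at least as many sign bits as the operand, because 0 has all of its bits equal to the sign bit. The following sketch uses a (known_zero, known_one) mask pair in the style of LLVM's KnownBits; the helper names are hypothetical.

```python
WIDTH = 64

def known_bits_czero(op_known_zero, op_known_one):
    """(known_zero, known_one) masks for czero.eqz/czero.nez.

    Intersect the operand's knowledge with that of the constant 0
    (every bit known zero, no bit known one).
    """
    zero_known_zero = (1 << WIDTH) - 1
    zero_known_one = 0
    return op_known_zero & zero_known_zero, op_known_one & zero_known_one

def num_sign_bits_czero(op_sign_bits):
    # The constant 0 has WIDTH sign bits, so the minimum is the operand's count
    return min(op_sign_bits, WIDTH)
```
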
Reviewed By: wangpc

Differential Revision: https://reviews.llvm.org/D156081
2023-07-24 07:38:12 -07:00
Pravin Jagtap
d163b76ce3 [AMDGPU] Fix llvm.amdgcn.wave.reduce.umax/umin MIR tests
Fixes the MIR tests reported in https://lab.llvm.org/buildbot/#/builders/16/builds/51955

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D156125
2023-07-24 10:19:37 -04:00
Simon Pilgrim
1e86b63eb3 [X86] fpclamptosat.ll - add nounwind to get rid of cfi noise
Helps clean up D150372
2023-07-24 15:11:26 +01:00
David Green
c3c8f0025a [AArch64] Add vselect(fmin/fmax) SVE patterns
For both minnum/maxnum and minimum/maximum, this adds tablegen patterns for
vselect(fmin/fmax), creating predicated fminnm/fmaxnm/fmin/fmax nodes.

Differential Revision: https://reviews.llvm.org/D155872
2023-07-24 14:55:38 +01:00
David Green
0e30ca2ec9 [AArch64] Extra testing for vselect(fmin/max) patterns. NFC
See D155872.
2023-07-24 14:55:38 +01:00
Simon Pilgrim
de3f7f01fe [X86] combineConcatVectorOps - add concat(ctpop)/concat(ctlz)/concat(cttz) handling 2023-07-24 14:50:28 +01:00
Simon Pilgrim
bcf728ed8a [X86] Add some basic concat(ctpop)/concat(ctlz)/concat(cttz) widening tests 2023-07-24 14:50:28 +01:00
Simon Pilgrim
2773098ee3 [X86] combineConcatVectorOps - add basic concat(unpack(x,y),unpack(z,w)) -> unpack(concat(x,z),concat(y,w)) handling
Very limited support as we don't want to interfere with build_vector patterns
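
The fold is valid because x86 unpack instructions operate independently within each 128-bit lane. The following sketch models `unpcklps`-style interleaving of the low half of each lane, assuming 32-bit elements (four per 128-bit lane). `unpacklo` is a hypothetical helper.

```python
LANE = 4  # 32-bit elements per 128-bit lane

def unpacklo(a, b):
    """Interleave the low half of each 128-bit lane of a and b."""
    out = []
    for base in range(0, len(a), LANE):
        for i in range(LANE // 2):
            out += [a[base + i], b[base + i]]
    return out
```

Concatenating two 128-bit results gives the same vector as one 256-bit unpack of the concatenated inputs. A cross-lane interleave would not have this property.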
2023-07-24 13:50:15 +01:00
Sander de Smalen
c80976549a [AArch64] NFC: Move fadda tests to separate file.
We want to run the fadda tests with 'streaming-compatible' flags,
so that we can ensure no 'fadda' (not valid in streaming mode) is
generated.
2023-07-24 12:02:14 +00:00
Sander de Smalen
07d6502045 [AArch64][SME] NFC: Pass target feature on RUN line, instead of function attribute.
This is anticipating adding new RUN lines testing for +sme, alongside +sve/+sve2.
2023-07-24 12:02:13 +00:00
Joseph Huber
6a51997ccc [NVPTX] Fix lack of .noreturn on certain functions for aliases
Forgot to include this special handling on the declaration of the alias
function.

Reviewed By: tra

Differential Revision: https://reviews.llvm.org/D156012
2023-07-24 07:01:05 -05:00
Simon Pilgrim
454bea07b2 [X86] combineConcatVectorOps - add concat(psadbw(x,y),psadbw(z,w)) -> psadbw(concat(x,z),concat(y,w)) handling 2023-07-24 11:34:08 +01:00
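
As with unpack, this fold works because `psadbw` computes each 64-bit result from its own 8 bytes. Two 128-bit operations, concatenated, therefore equal one 256-bit operation on concatenated inputs. The following sketch uses lists of byte values; `psadbw` here is a hypothetical model, not an intrinsic.

```python
def psadbw(a, b):
    """Sum of absolute byte differences for each group of 8 bytes.

    Each 64-bit result element holds the (at most 16-bit) sum, zero-extended.
    """
    return [sum(abs(x - y) for x, y in zip(a[i:i + 8], b[i:i + 8]))
            for i in range(0, len(a), 8)]
```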
Simon Pilgrim
ce47e13bb9 [X86] Add reduce_add(ctpop(x)) 'count all bits in a vector' tests
Also add some basic buildvector variants: build_vector(reduce_add(ctpop(x0)), reduce_add(ctpop(x1)), ...)
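
The tested idiom counts all set bits in a vector: a per-element popcount followed by a horizontal add. The result equals the popcount of the whole vector viewed as one wide integer. The following sketch assumes unsigned element values; the helper names are hypothetical.

```python
def ctpop(x):
    # Population count of a non-negative integer
    return bin(x).count("1")

def count_all_bits(vec):
    # reduce_add(ctpop(x)): per-element popcount, then a horizontal add
    return sum(ctpop(e) for e in vec)

def as_wide_int(vec, elt_bits):
    # Concatenate elements (element 0 in the low bits) into one integer
    value = 0
    for i, e in enumerate(vec):
        value |= (e & ((1 << elt_bits) - 1)) << (i * elt_bits)
    return value
```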
2023-07-24 11:34:08 +01:00
Nikita Popov
e49103b279 [Mips] Fix argument lowering for illegal vector types (PR63608)
The Mips MSA ABI requires that legal vector types are passed in
scalar registers in packed representation. E.g. a type like v16i8
would be passed as two i64 registers.

The implementation attempts to do the same for illegal vectors with
non-power-of-two element counts or non-power-of-two element types.
However, the SDAG argument lowering code doesn't support this, and
it is not easy to extend it to support this (we would have to deal
with situations like passing v7i18 as two i64 values).

This patch instead opts to restrict the special argument lowering
to vectors with a power-of-two number of elements and round element
types. Everything else is lowered naively, that is, by passing each
element in a promoted register.
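
The rule can be summarized as a small decision function. This sketch assumes 64-bit argument registers, matching the v16i8 example, and treats i8/i16/i32/i64 as the round element types. Rounding a partially filled register up is an assumption, and `mips_vector_arg_regs` is a hypothetical helper.

```python
ROUND_ELEMENT_BITS = {8, 16, 32, 64}
REG_BITS = 64  # assumed register width, as in "v16i8 -> two i64 registers"

def is_power_of_two(n):
    return n > 0 and (n & (n - 1)) == 0

def mips_vector_arg_regs(num_elts, elt_bits):
    """Return (strategy, register count) for a vector argument."""
    if is_power_of_two(num_elts) and elt_bits in ROUND_ELEMENT_BITS:
        total = num_elts * elt_bits
        # Packed representation; rounding up a partial register is assumed.
        return "packed", -(-total // REG_BITS)
    # Naive lowering: one promoted value per element.
    return "per-element", num_elts
```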

Fixes https://github.com/llvm/llvm-project/issues/63608.

Differential Revision: https://reviews.llvm.org/D154445
2023-07-24 12:07:09 +02:00
Jay Foad
0f10850e51 [CodeGen] Add machine verification to some tests
This is to catch errors in an upcoming patch.
2023-07-24 11:04:10 +01:00
Luke Lau
5b95bba6fe [RISCV] Set Fast flag for unaligned memory accesses
The +unaligned-scalar-mem and +unaligned-vector-mem features were added in
D126085 and D149375 respectively to allow subtargets to indicate that
they support misaligned loads/stores with "sufficient" performance.
This is separate from whether or not the target actually supports
misaligned accesses, which could be determined from Zicclsm.

This patch enables the Fast flag under the assumption that any subtarget
that declares support for +unaligned-*-mem will want to opt into
optimisations that take advantage of misaligned scalar accesses, such as
store merging.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D150771
2023-07-24 10:58:57 +01:00
WANG Rui
9c21f95541 [LoongArch] Implement isZextFree
This returns true for 8-bit and 16-bit loads, allowing ld.bu/ld.hu to be selected and avoiding unnecessary masks.
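
The mask becomes unnecessary because a zero-extending byte or halfword load already produces the value that a sign-extending load followed by a mask would. The following sketch uses a little-endian byte buffer (LoongArch is little-endian) and hypothetical helpers named after `ld.b`, `ld.bu`, `ld.h` and `ld.hu`.

```python
MASK64 = (1 << 64) - 1

def load(mem, addr, size, signed):
    """Little-endian load of `size` bytes, extended to a 64-bit register value."""
    value = int.from_bytes(mem[addr:addr + size], "little", signed=signed)
    return value & MASK64  # two's-complement view of the 64-bit register

def ld_b(mem, addr):
    return load(mem, addr, 1, signed=True)   # sign-extending byte load

def ld_bu(mem, addr):
    return load(mem, addr, 1, signed=False)  # zero-extending byte load

def ld_h(mem, addr):
    return load(mem, addr, 2, signed=True)   # sign-extending halfword load

def ld_hu(mem, addr):
    return load(mem, addr, 2, signed=False)  # zero-extending halfword load
```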

Signed-off-by: WANG Rui <wangrui@loongson.cn>

Reviewed By: SixWeining, xen0n

Differential Revision: https://reviews.llvm.org/D154819
2023-07-24 17:49:25 +08:00
WANG Rui
90e08c2600 [LoongArch] Add test case showing suboptimal codegen when loading unsigned char/short
Implementing isZextFree will allow ld.bu or ld.hu to be selected rather than ld.b+mask and ld.h+mask.

Signed-off-by: WANG Rui <wangrui@loongson.cn>

Reviewed By: SixWeining, xen0n

Differential Revision: https://reviews.llvm.org/D154818
2023-07-24 17:48:16 +08:00
WANG Rui
899aaffcbc [LoongArch] Implement isLegalICmpImmediate
This causes a trivial improvement in the legalicmpimm.ll test case.

Signed-off-by: WANG Rui <wangrui@loongson.cn>

Reviewed By: SixWeining, xen0n

Differential Revision: https://reviews.llvm.org/D154811
2023-07-24 17:42:11 +08:00
WANG Rui
0cceea90bf [LoongArch][NFC] Add tests for (X & -256) == 256 -> (X >> 8) == 1
Add tests for (X & -256) == 256 -> (X >> 8) == 1.
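
The two forms are equivalent. Masking with -256 (every bit set except the low 8) gives the same value as shifting right by 8 and back left by 8. The masked value therefore equals 256 exactly when the shifted value equals 1. The following quick check over a small range relies on Python integers having two's-complement semantics for `&` and `>>`; the helper names are hypothetical.

```python
def masked_form(x):
    # (X & -256) == 256: clear the low 8 bits, compare with 256
    return (x & -256) == 256

def shifted_form(x):
    # (X >> 8) == 1: arithmetic shift right by 8, compare with 1
    return (x >> 8) == 1

def forms_agree(values):
    return all(masked_form(x) == shifted_form(x) for x in values)
```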

Signed-off-by: WANG Rui <wangrui@loongson.cn>

Reviewed By: xen0n

Differential Revision: https://reviews.llvm.org/D154810
2023-07-24 17:42:10 +08:00