llvm-project

Author	SHA1	Message	Date
Qi Hu	ddd7d35c6c	[RegAlloc] Fix assertion failure caused by inline assembly When inline assembly code requests more registers than available, the MachineInstr::emitError function in the RegAllocFast pass emits an error but doesn't stop the pass, and then the compiler crashes later with an assertion failure. This commit, mimicking the RegAllocGreedy pass, assigns a random physical register, and therefore avoids the crash after producing the diagnostic. This problem has been observed for both rustc and clang, while it doesn't occur in gcc.	2023-07-25 19:21:03 -04:00
Corbin Robeck	7a4968b5a3	[AMDGPU] Add dynamic stack bit info to kernel-resource-usage Rpass output In code object 5 (https://llvm.org/docs/AMDGPUUsage.html#code-object-v5-metadata) the AMDGPU backend added the .uses_dynamic_stack bit to the kernel metadata to identify kernels which have compile time indeterminable stack usage (indirect function calls and recursion mainly). This patch adds this information to the output of the kernel-resource-usage remarks. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D156040 Author: Corbin Robeck <corbin.robeck@amd.com>	2023-07-25 12:20:13 -07:00
Kevin P. Neal	76c22b18ea	[FPEnv][AMDGPU] Correct strictfp tests. Correct AMDGPU strictfp tests to follow the rules documented in the LangRef: https://llvm.org/docs/LangRef.html#constrained-floating-point-intrinsics Mostly these tests just needed the strictfp attribute on function definitions. I've also removed the strictfp attribute from uses of the constrained intrinsics because it comes by default since D154991, but I only did this in tests I was changing anyway. I also removed attributes added to declare lines of intrinsics. The attributes of intrinsics cannot be changed in a test so I eliminated attempts to do so. Test changes verified with D146845.	2023-07-25 13:24:46 -04:00
Craig Topper	f6dc75cdd8	[RISCV] Add DAG combine to pull xor with 1 through select idiom that uses czero_eqz/nez. If we are selecting between two setccs that need to be legalized with xor, the select will be legalized first. Detect this pattern so we can pull the xor through to expose it to additional optimizations. We could generalize this to other operations, but those normally get handled in DAG combine before select legalization. Reviewed By: asb Differential Revision: https://reviews.llvm.org/D156159	2023-07-25 09:13:24 -07:00
Craig Topper	b34a8b3a52	[RISCV] Generalize combineAddOfBooleanXor to support any boolean not just setcc. Instead of checking for setcc, look for any 0/1 value. Reviewed By: asb Differential Revision: https://reviews.llvm.org/D156153	2023-07-25 09:04:49 -07:00
Craig Topper	5ff5dac852	[RISCV] Add simple DAG combine to pull xor with 1 through select_cc. If we're selecting the result of two setccs that have been legalized by introducing an xor with 1, we can pull the xor with 1 through the select to enable more optimizations. We could generalize this to other binary operators with identical conditions, but those are usually caught before we legalize the select. Reviewed By: asb Differential Revision: https://reviews.llvm.org/D156144	2023-07-25 09:03:45 -07:00
Weining Lu	212d6aa0da	Revert "[LoongArch] Support -march=native and -mtune=" This reverts commit 92c06114b2ea9900a3364fb395988dfb065758f7.	2023-07-25 23:32:15 +08:00
Nikita Popov	0cab8d2041	Reapply [IR] Mark and/or constant expressions as undesirable This reapplies the change for and, but also marks or as undesirable at the same time. Only handling one of them can cause infinite combine loops due to the asymmetric handling. ----- In preparation for removing support for and/or expressions, mark them as undesirable. As such, we will no longer implicitly create such expressions, but they still exist.	2023-07-25 15:31:45 +02:00
Weining Lu	92c06114b2	[LoongArch] Support -march=native and -mtune= As described in [1][2], `-mtune=` is used to select the type of target microarchitecture, defaults to the value of `-march`. The set of possible values should be a superset of `-march` values. Currently possible values of `-march=` and `-mtune=` are `native`, `loongarch64` and `la464`. D136146 has supported `-march={loongarch64,la464}` and this patch adds support for `-march=native` and `-mtune=`. A new ProcessorModel called `loongarch64` is defined in LoongArch.td to support `-mtune=loongarch64`. `llvm::sys::getHostCPUName()` returns `generic` on unknown or future LoongArch CPUs, e.g. the not yet added `la664`, leading to `llvm::LoongArch::isValidArchName()` failing to parse the arch name. In this case, use `loongarch64` as the default arch name for 64-bit CPUs. And these two preprocessor macros are defined: - __loongarch_arch - __loongarch_tune [1]: https://github.com/loongson/LoongArch-Documentation/blob/2023.04.20/docs/LoongArch-toolchain-conventions-EN.adoc [2]: https://github.com/loongson/la-softdev-convention/blob/v0.1/la-softdev-convention.adoc Differential Revision: https://reviews.llvm.org/D155824	2023-07-25 21:01:51 +08:00
Matt Arsenault	e3fd8f83a8	AMDGPU: Correctly expand f64 sqrt intrinsic rocm-device-libs and llpc were avoiding using f64 sqrt intrinsics in favor of their own expansions. Port the expansion into the backend. Both of these users should be updated to call the intrinsic instead. The library and llpc expansions are slightly different. llpc uses an ldexp to do the scale; the library uses a multiply. Use ldexp to do the scale instead of the multiply. I believe v_ldexp_f64 and v_mul_f64 are always the same number of cycles, but it's cheaper to materialize the 32-bit integer constant than the 64-bit double constant. The libraries have another fast version of sqrt which will be handled separately. I am tempted to do this in an IR expansion instead. In the IR we could take advantage of computeKnownFPClass to avoid the 0-or-inf argument check.	2023-07-25 07:54:11 -04:00
Matt Arsenault	47b3ada432	AMDGPU: Add more sqrt f64 lowering tests Almost all permutations of the flags are potentially relevant.	2023-07-25 07:54:11 -04:00
Paul Walker	74445d652d	[SVE] Add vselect(mla/mls) patterns for cases where a multiplicand is used for the false lanes. Differential Revision: https://reviews.llvm.org/D155972	2023-07-25 10:02:32 +00:00
Jim Lin	4eff7fae60	[RISCV] Merge rv32/rv64 vector narrowing integer right shift intrinsic tests that have the same content. NFC.	2023-07-25 16:09:21 +08:00
LiaoChunyu	620e61c518	[RISCV] Match ext_vl+sra_vl/srl_vl+trunc_vector_vl to vnsra.wv/vnsrl.wv Similar to D117454, add VL patterns and test cases. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D155466	2023-07-25 14:21:02 +08:00
Freddy Ye	6d23a3faa4	[X86] Support -march=graniterapids-d and update -march=graniterapids Reviewed By: pengfei, RKSimon, skan Differential Revision: https://reviews.llvm.org/D155798	2023-07-25 13:48:31 +08:00
pvanhout	3cd4afce5b	[AMDGPU] Allow vector access types in PromoteAllocaToVector Depends on D152706 Solves SWDEV-408279 Reviewed By: #amdgpu, arsenm Differential Revision: https://reviews.llvm.org/D155699	2023-07-25 07:44:48 +02:00
pvanhout	3890a3b113	[AMDGPU] Use SSAUpdater in PromoteAlloca This allows PromoteAlloca to not be reliant on a second SROA run to remove the alloca completely. It just does the full transformation directly. Note PromoteAlloca is still reliant on SROA running first to canonicalize the IR. For instance, PromoteAlloca will no longer handle aggregate types because those should be simplified by SROA before reaching the pass. Reviewed By: #amdgpu, arsenm Differential Revision: https://reviews.llvm.org/D152706	2023-07-25 07:44:47 +02:00
WANG Rui	e7c9a99dfe	[LoongArch] Implement isSExtCheaperThanZExt Implement isSExtCheaperThanZExt. Signed-off-by: WANG Rui <wangrui@loongson.cn> Differential Revision: https://reviews.llvm.org/D154919	2023-07-25 09:41:32 +08:00
WANG Rui	1a3da0bc1e	[LoongArch] Add test case showing suboptimal codegen when zero extending Add test case showing suboptimal codegen when zero extending. Signed-off-by: WANG Rui <wangrui@loongson.cn> Reviewed By: xen0n Differential Revision: https://reviews.llvm.org/D154918	2023-07-25 09:31:33 +08:00
chenli	d25c79dc70	[LoongArch] Support InlineAsm for LSX and LASX The author of the following files is licongtian <licongtian@loongson.cn>: - clang/lib/Basic/Targets/LoongArch.cpp - llvm/lib/Target/LoongArch/LoongArchAsmPrinter.cpp - llvm/lib/Target/LoongArch/LoongArchISelLowering.cpp The files mentioned above implement InlineAsm for LSX and LASX as follows: - Enable clang to parse LSX/LASX register names, such as $vr0. - Support the case where the operand type is 128-bit or 256-bit when the constraint is 'f'. - Support specifying an LSX/LASX register through a constraint, such as "={$xr0}". - Support the operand modifiers 'u' and 'w'. - Support and legalize the data types and register classes involved in LSX/LASX in the lowering process. Reviewed By: xen0n, SixWeining Differential Revision: https://reviews.llvm.org/D154931	2023-07-25 09:02:29 +08:00
Kai Luo	f26af16e2c	[PowerPC][AIX] Enable quadword atomics by default for AIX On AIX, a libatomic supporting inline quadword atomic operations has been released, so compatibility is no longer an issue and we can enable quadword atomics by default. Reviewed By: #powerpc, nemanjai Differential Revision: https://reviews.llvm.org/D151312	2023-07-25 08:21:07 +08:00
Craig Topper	49429783b0	[RISCV] Add lowering for scalar fmaximum/fminimum. Unlike fmaxnum and fminnum, these operations propagate nan and consider -0.0 to be less than +0.0. Without Zfa, we don't have a single instruction for this. The lowering I've used forces the other input to nan if one input is a nan. If both inputs are nan, they get swapped. Then use the fmax or fmin instruction. New ISD nodes are needed because fmaxnum/fminnum do not define the order of -0.0 and +0.0. This lowering ensures the snans are quieted (though that is probably not required in the default environment). Also ensures non-canonical nans are canonicalized, though I'm also not sure that's needed. Another option could be to use fmax/fmin and then overwrite the result based on the inputs being nan, but I'm not sure we can do that with any less code. Future work will handle nonans FMF, and handling the case where we can prove the input isn't nan. This does fix the crash in #64022, but we need to do more work to avoid scalarization. Reviewed By: fakepaper56 Differential Revision: https://reviews.llvm.org/D156069	2023-07-24 13:46:35 -07:00
Philip Reames	a6cd1f623d	[RISCV] Adjust memcpy lowering test coverage w/V This is fixing a mistake in 4f4f49137.	2023-07-24 12:32:17 -07:00
Philip Reames	4f4f491375	[RISCV] Add memcpy lowering test coverage with and without V	2023-07-24 12:25:09 -07:00
Matt Arsenault	0d797b71eb	RegisterCoalescer: Fix empty subrange verifier error In this example an implicit def had live-out undef subrange defs. After coalescing with the def from a previous block, the undef-defed lanes are no longer live out of the block in the new interval. An empty subrange was tentatively created for these lanes, but it must be deleted.	2023-07-24 12:18:34 -04:00
Matt Arsenault	2a53b6c06b	RegisterCoalescer: Fix verifier error on redef of subregister for live out implicit_defs A live out implicit_def wasn't deleted, but the subranges weren't correctly updated. The main range was correct but the def corresponding to the initial main range def instruction was missing from the lanes redefined in another block. The written lanes are not quite the same as the valid lanes in the case of an implicit_def. Fixes verifier error in blender. There is an additional verifier error in some of the testcase variants where an empty subrange remains.	2023-07-24 12:18:34 -04:00
Matt Arsenault	e561e7cb48	AMDGPU: Implement combineRepeatedFPDivisors	2023-07-24 11:19:36 -04:00
Yuanqiang Liu	5e32f1da06	[NVPTX] Expand select_cc on bfloat16 type Expand select_cc on bfloat16 and bfloat16v2 type. Differential Revision: https://reviews.llvm.org/D156085	2023-07-24 17:01:31 +02:00
Craig Topper	5990199e2c	[RISCV] Add CZERO_EQZ/CZERO_NEZ to ComputeNumSignBitsForTargetNode. Reviewed By: wangpc Differential Revision: https://reviews.llvm.org/D156082	2023-07-24 07:43:02 -07:00
Craig Topper	82686d7d55	[RISCV] Add test case for D156082 to condops.ll This test is copied from select-cc.ll. It wasn't worth adding Zicond RUN lines to that file. Reviewed By: asb, wangpc Differential Revision: https://reviews.llvm.org/D156083	2023-07-24 07:42:46 -07:00
Craig Topper	9da0db4dd8	[RISCV] Add CZERO_EQZ/CZERO_NEZ to computeKnownBitsForTargetNode. Reviewed By: wangpc Differential Revision: https://reviews.llvm.org/D156081	2023-07-24 07:38:12 -07:00
Pravin Jagtap	d163b76ce3	[AMDGPU] Fix llvm.amdgcn.wave.reduce.umax/umin MIR tests Fixes the MIR tests reported in https://lab.llvm.org/buildbot/#/builders/16/builds/51955 Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D156125	2023-07-24 10:19:37 -04:00
Simon Pilgrim	1e86b63eb3	[X86] fpclamptosat.ll - add nounwind to get rid of cfi noise Helps cleanup D150372	2023-07-24 15:11:26 +01:00
David Green	c3c8f0025a	[AArch64] Add vselect(fmin/fmax) SVE patterns For both minnum/maxnum and minimum/maximum, this adds tablegen patterns for vselect(fmin/fmax), creating predicated fminnm/fmaxnm/fmin/fmax nodes. Differential Revision: https://reviews.llvm.org/D155872	2023-07-24 14:55:38 +01:00
David Green	0e30ca2ec9	[AArch64] Extra testing for vselect(fmin/fmax) patterns. NFC See D155872.	2023-07-24 14:55:38 +01:00
Simon Pilgrim	de3f7f01fe	[X86] combineConcatVectorOps - add concat(ctpop)/concat(ctlz)/concat(cttz) handling	2023-07-24 14:50:28 +01:00
Simon Pilgrim	bcf728ed8a	[X86] Add some basic concat(ctpop)/concat(ctlz)/concat(cttz) widening tests	2023-07-24 14:50:28 +01:00
Simon Pilgrim	2773098ee3	[X86] combineConcatVectorOps - add basic concat(unpack(x,y),unpack(z,w)) -> unpack(concat(x,z),concat(y,w)) handling Very limited support as we don't want to interfere with build_vector patterns	2023-07-24 13:50:15 +01:00
Sander de Smalen	c80976549a	[AArch64] NFC: Move fadda tests to separate file. We want to test the fadda tests with 'streaming-compatible' flags, such that we can ensure no 'fadda' (not valid in streaming mode) is generated.	2023-07-24 12:02:14 +00:00
Sander de Smalen	07d6502045	[AArch64][SME] NFC: Pass target feature on RUN line, instead of function attribute. This is anticipating adding new RUN lines testing for +sme, alongside +sve/+sve2.	2023-07-24 12:02:13 +00:00
Joseph Huber	6a51997ccc	[NVPTX] Fix lack of `.noreturn` on certain functions for aliases Forgot to include this special handling on the declaration of the alias function. Reviewed By: tra Differential Revision: https://reviews.llvm.org/D156012	2023-07-24 07:01:05 -05:00
Simon Pilgrim	454bea07b2	[X86] combineConcatVectorOps - add concat(psadbw(x,y),psadbw(z,w)) -> psadbw(concat(x,z),concat(y,w)) handling	2023-07-24 11:34:08 +01:00
Simon Pilgrim	ce47e13bb9	[X86] Add reduce_add(ctpop(x)) 'count all bits in a vector' tests Also add some basic buildvector variants: build_vector(reduce_add(ctpop(x0)), reduce_add(ctpop(x1)), ...)	2023-07-24 11:34:08 +01:00
Nikita Popov	e49103b279	[Mips] Fix argument lowering for illegal vector types (PR63608) The Mips MSA ABI requires that legal vector types are passed in scalar registers in packed representation. E.g. a type like v16i8 would be passed as two i64 registers. The implementation attempts to do the same for illegal vectors with non-power-of-two element counts or non-power-of-two element types. However, the SDAG argument lowering code doesn't support this, and it is not easy to extend it to support this (we would have to deal with situations like passing v7i18 as two i64 values). This patch instead opts to restrict the special argument lowering to only vectors with power-of-two elements and round element types. Everything else is lowered naively, that is by passing each element in promoted registers. Fixes https://github.com/llvm/llvm-project/issues/63608. Differential Revision: https://reviews.llvm.org/D154445	2023-07-24 12:07:09 +02:00
Jay Foad	0f10850e51	[CodeGen] Add machine verification to some tests This is to catch errors in an upcoming patch.	2023-07-24 11:04:10 +01:00
Luke Lau	5b95bba6fe	[RISCV] Set Fast flag for unaligned memory accesses The +unaligned-scalar-mem and +unaligned-vector-mem features were added in D126085 and D149375 respectively to allow subtargets to indicate that they supported misaligned loads/stores with "sufficient" performance. This is separate from whether or not the target actually supports misaligned accesses, which could be determined from Zicclsm. This patch enables the Fast flag under the assumption that any subtarget that declares support for +unaligned-*-mem will want to opt into optimisations that take advantage of misaligned scalar accesses, such as store merging. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D150771	2023-07-24 10:58:57 +01:00
WANG Rui	9c21f95541	[LoongArch] Implement isZextFree This returns true for 8-bit and 16-bit loads, allowing ld.bu/ld.hu to be selected and avoiding unnecessary masks. Signed-off-by: WANG Rui <wangrui@loongson.cn> Reviewed By: SixWeining, xen0n Differential Revision: https://reviews.llvm.org/D154819	2023-07-24 17:49:25 +08:00
WANG Rui	90e08c2600	[LoongArch] Add test case showing suboptimal codegen when loading unsigned char/short Implementing isZextFree will allow ld.bu or ld.hu to be selected rather than ld.b+mask and ld.h+mask. Signed-off-by: WANG Rui <wangrui@loongson.cn> Reviewed By: SixWeining, xen0n Differential Revision: https://reviews.llvm.org/D154818	2023-07-24 17:48:16 +08:00
WANG Rui	899aaffcbc	[LoongArch] Implement isLegalICmpImmediate This causes a trivial improvement in the legalicmpimm.ll test case. Signed-off-by: WANG Rui <wangrui@loongson.cn> Reviewed By: SixWeining, xen0n Differential Revision: https://reviews.llvm.org/D154811	2023-07-24 17:42:11 +08:00
WANG Rui	0cceea90bf	[LoongArch][NFC] Add tests for (X & -256) == 256 -> (X >> 8) == 1 Add tests for (X & -256) == 256 -> (X >> 8) == 1. Signed-off-by: WANG Rui <wangrui@loongson.cn> Reviewed By: xen0n Differential Revision: https://reviews.llvm.org/D154810	2023-07-24 17:42:10 +08:00
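
The rewrite named in commit 0cceea90bf, (X & -256) == 256 -> (X >> 8) == 1, can be checked with a small standalone sketch. This is not code from the commit: the function names are hypothetical, and the sketch assumes a 32-bit unsigned X and a logical right shift.

```cpp
#include <cassert>
#include <cstdint>

// Original form: clear the low 8 bits, then compare against 256.
// As a 32-bit mask, -256 is 0xFFFFFF00.
bool maskedCompare(std::uint32_t x) {
  return (x & static_cast<std::uint32_t>(-256)) == 256u;
}

// Rewritten form: drop the low 8 bits with a logical shift, then compare
// against 1. This needs no mask constant.
bool shiftedCompare(std::uint32_t x) { return (x >> 8) == 1u; }

// Returns true when both forms agree for every x in [lo, hi].
// The loop counter is 64-bit so that hi == UINT32_MAX does not wrap.
bool formsAgree(std::uint32_t lo, std::uint32_t hi) {
  for (std::uint64_t x = lo; x <= hi; ++x) {
    const auto v = static_cast<std::uint32_t>(x);
    if (maskedCompare(v) != shiftedCompare(v)) return false;
  }
  return true;
}

// Spot checks at the boundaries of the range [256, 511] where both are true.
void selfCheck() {
  assert(!maskedCompare(255u) && !shiftedCompare(255u));
  assert(maskedCompare(256u) && shiftedCompare(256u));
  assert(maskedCompare(511u) && shiftedCompare(511u));
  assert(!maskedCompare(512u) && !shiftedCompare(512u));
}
```

Both forms are true exactly when the bits above the low byte encode the value 1, which is why the comparison constant shrinks from 256 to 1 once the shift replaces the mask.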

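The NaN handling described in commit 49429783b0 (force the other input to NaN if one input is NaN, then use fmax) can be illustrated with a scalar sketch. This is not the backend code: `modelFmax` and `loweredFmaximum` are hypothetical names, and `modelFmax` is an assumed software model of an fmax instruction that ignores a single NaN input, returns a quiet NaN when both inputs are NaN, and orders -0.0 below +0.0.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Assumed model of the hardware fmax used by the lowering.
double modelFmax(double a, double b) {
  if (std::isnan(a) && std::isnan(b))
    return std::numeric_limits<double>::quiet_NaN();
  if (std::isnan(a)) return b;  // a single NaN input is ignored
  if (std::isnan(b)) return a;
  if (a == 0.0 && b == 0.0)
    return std::signbit(a) ? b : a;  // -0.0 orders below +0.0
  return a > b ? a : b;
}

// Sketch of the lowering: each operand is replaced by the other one when the
// other one is NaN, so a NaN on either side reaches fmax on both sides and
// therefore propagates. When both inputs are NaN they end up swapped.
double loweredFmaximum(double x, double y) {
  const double newX = std::isnan(y) ? y : x;
  const double newY = std::isnan(x) ? x : y;
  return modelFmax(newX, newY);
}
```

With one NaN input, both operands handed to `modelFmax` are NaN, so the result is NaN rather than the non-NaN operand that fmax alone would return. The fminimum case would mirror this with an fmin model.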
52796 Commits