llvm-project

Author	SHA1	Message	Date
Jay Foad	31fbfa57e7	[AMDGPU] Regenerate some spill checks	2023-06-07 17:50:51 +01:00
Vy Nguyen	e60b30d5e3	Reland "D144999 [MC][MachO]Only emits compact-unwind format for "canonical" personality symbols. For the rest, use DWARFs." Reasons for rolling forward: - the crash reported from Chromium was fixed in D151824 (not related to this patch at all) - since D152824 was committed, it should now be safe to roll this forward. New change: - add an additional _ in name check This reverts commit 4980eead4d0b4666d53dad07afb091375b3a13a0.	2023-06-07 10:03:50 -04:00
Phoebe Wang	2011ad0cbb	[X86][FP16] Do not generate VBROADCAST for fp16 We cannot lower VBROADCAST i16 under AVX1. Fixes #63114 Differential Revision: https://reviews.llvm.org/D152350	2023-06-07 20:54:56 +08:00
David Green	beb3a9a5e6	[AArch64][SVE] Add a commutative VSelectCommPredOrPassthruPatFrags This adds a commutative version of VSelectPredOrPassthruPatFrags (renamed from EitherVSelectOrPassthruPatFrags) that checks both variants for commutative operations like min/max. I have not attempted to handle fp operations that require fast-math flags. Differential Revision: https://reviews.llvm.org/D151084	2023-06-07 13:18:16 +01:00
Valery Pykhtin	342acfc9bb	[AMDGPU] Turn off pass to rewrite partially used virtual superregisters after RenameIndependentSubregs pass with registers of minimal size. There is a failure with this pass in the case when the target register class for a subregister isn't known from the instruction description (e.g. COPY). Currently in this situation the RC is obtained using TargetRegisterInfo::getSubRegisterClass, but in general this does not work. In order to fix this, two things should be done: 1. Stop processing a subregister if the target register class is unknown (conservative approach) 2. Improve deduction of the subregister's target register class (e.g. by processing the COPY chain) I was going to implement point 1 but my tests use implicit operands for S_NOP and they don't have an associated target register class, so all tests fail. Therefore I decided to turn off the pass now, implement point 1 and fix my tests. Reviewed By: arsenm, #amdgpu Differential Revision: https://reviews.llvm.org/D152291	2023-06-07 12:05:25 +02:00
Simon Pilgrim	49bd51d918	[X86] Add test case for Issue #63108	2023-06-07 10:19:14 +01:00
Juan Manuel MARTINEZ CAAMAÑO	abe6ecd7e5	[AsmPrinter][AMDGPU] Generate uwtable entries in .eh_frame Consider only targets where `MCAsmInfo::ExceptionsType == ExceptionHandling::None` and that support CFI (when `MCAsmInfo::UsesCFIForDebug` is set to true): currently, only AMDGPU. This patch enables the emission of CFI information in the .eh_frame section when the uwtable attribute is present on a function. Before, we could generate CFI information for debugging purposes only. This patch prepares AMDGPU to support collecting GPU stack traces in the future. I did a first implementation (https://reviews.llvm.org/D139024) but at the time I had not realized that no other platform used `UsesCFIForDebug`. Reviewed By: scott.linder Differential Revision: https://reviews.llvm.org/D151806	2023-06-07 09:54:47 +02:00
Weining Lu	47601815ec	[LoongArch] Define `ual` feature and override `allowsMisalignedMemoryAccesses` Some CPUs do not allow memory accesses to be unaligned, e.g. 2k1000la, which uses the la264 core, on which a misaligned access will trigger an exception. In this patch, a backend feature called `ual` is defined to describe whether the CPU supports unaligned memory accesses. This feature can be toggled by clang options `-m[no-]unaligned-access` or the aliases `-m[no-]strict-align`. When this feature is on, `allowsMisalignedMemoryAccesses` sets the speed number to 1 and returns true, which allows the codegen to generate unaligned memory access insns. Clang options `-m[no-]unaligned-access` are moved from `m_arm_Features_Group` to `m_Group` because more than one target now uses them. And a test is added to show that they remain unused on a target that does not support them. In addition, to stay compatible with gcc, a new alias `-mno-strict-align` is added which is equal to `-munaligned-access`. The feature name `ual` is consistent with linux kernel [1] and the output of `lscpu` or `/proc/cpuinfo` [2]. There is an `LLT` variant of `allowsMisalignedMemoryAccesses`, but it seems that currently it is only used in GlobalISel, which LoongArch doesn't support yet. So this variant is not implemented in this patch. [1]: https://github.com/torvalds/linux/blob/master/arch/loongarch/include/asm/cpu.h#L77 [2]: https://github.com/torvalds/linux/blob/master/arch/loongarch/kernel/proc.c#L75 Reviewed By: xen0n Differential Revision: https://reviews.llvm.org/D149946	2023-06-07 13:40:58 +08:00
Jeffrey Byrnes	db61927951	[AMDGPU][IGLP]: Add rules to SchedGroups Differential Revision: https://reviews.llvm.org/D146774 Change-Id: Icd7aaaa0b257a25713c22ead0813777cef7d5859	2023-06-06 19:19:21 -07:00
Craig Topper	2b09f53b32	[RISCV] Remove overly restrictive assert from negateFMAOpcode. It's possible that both multiplicands are being negated. This won't change the opcode, but we can delete the two negates. Allow this case to get through negateFMAOpcode. I think D152260 will also fix this test case, but in the future it may be possible for an fneg to appear after we've already converted to RISCVISD opcodes in which case D152260 won't help. Reviewed By: fakepaper56 Differential Revision: https://reviews.llvm.org/D152296	2023-06-06 18:55:58 -07:00
Florian Mayer	38f7c7eb1a	Revert "Revert "[RISCV] Add special case to selectImm for constants that can be created with (ADD (SLLI C, 32), C)."" Revert broke even more stuff. This reverts commit d5fbec30939f2c9f82475cf42c638619514b5c67.	2023-06-06 17:39:05 -07:00
Florian Mayer	d5fbec3093	Revert "[RISCV] Add special case to selectImm for constants that can be created with (ADD (SLLI C, 32), C)." Triggers UBSan error. This reverts commit 58b2d652af49ee9d9ff2af6edd7f67f23b26bfee.	2023-06-06 17:30:07 -07:00
Artem Belevich	ef8655adc8	[NVPTX] Adapt tests to make them usable with CUDA-12.x CUDA-12 no longer supports 32-bit compilation. Tests agnostic to 32/64 compilation mode are switched to use nvptx64. Tests that do care about it have 32-bit ptxas compilation disabled with cuda-12+. Differential Revision: https://reviews.llvm.org/D152199	2023-06-06 14:22:12 -07:00
Matt Arsenault	eece6ba283	IR: Add llvm.ldexp and llvm.experimental.constrained.ldexp intrinsics AMDGPU has native instructions and target intrinsics for this, but these really should be subject to legalization and generic optimizations. This will enable legalization of f16->f32 on targets without f16 support. Implement a somewhat horrible inline expansion for targets without libcall support. This could be better if we could introduce control flow (GlobalISel version not yet implemented). Support for strictfp legalization is less complete but works for the simple cases.	2023-06-06 17:07:18 -04:00
Matt Arsenault	5d361ad2a4	AMDGPU/GlobalISel: Fix broken / copy paste error in sext_inreg test	2023-06-06 17:07:18 -04:00
Craig Topper	bb10612587	[RISCV] Use PACK in RISCVMatInt for constants that have the same lower and upper 32 bits. This requires Zbkb. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D152293	2023-06-06 13:30:33 -07:00
Noah Goldstein	809b1d834d	[KnownBits] Return `0` for poison {s,u}div inputs It seems consistent to always return zero for known poison rather than varying the value. We do the same elsewhere. Differential Revision: https://reviews.llvm.org/D150922	2023-06-06 15:14:10 -05:00
David Green	2a8df8d0b9	[AArch64][SVE] Add one-use-check to EitherVSelectOrPassthruPatFrags As pointed out in D149968 vselect predicate patterns could do with a one-use check to prevent multiple operations being created. This updates the EitherVSelectOrPassthruPatFrags pattern frags used in creating predicates min/max. Differential Revision: https://reviews.llvm.org/D151080	2023-06-06 21:10:32 +01:00
Craig Topper	58b2d652af	[RISCV] Add special case to selectImm for constants that can be created with (ADD (SLLI C, 32), C). Where C is a simm32. This costs an extra temporary register, but avoids a constant pool. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D152236	2023-06-06 11:59:12 -07:00
Simon Pilgrim	9a81b69757	[AArch64] Regenerate tests with missing immediate hex asm comments Reduces diff in a future commit	2023-06-06 19:44:28 +01:00
Simon Pilgrim	a279a09ab9	Revert rG98061013e01207444cfd3980 - [X86] X86FixupVectorConstantsPass - attempt to replace full width fp vector constant loads with broadcasts on AVX+ targets Reverting while we address an existing issue exposed by this (Issue #63108)	2023-06-06 18:44:24 +01:00
Simon Pilgrim	78de45fd4a	Revert rGab4b924832ce26c21b88d7f82fcf4992ea8906bb - [X86] X86FixupVectorConstantsPass - attempt to replace full width integer vector constant loads with broadcasts on AVX2+ targets Reverting while we address an existing issue exposed by this (Issue #63108)	2023-06-06 18:07:33 +01:00
Jay Foad	a4a3ac10cb	[AMDGPU] Remove extract_subvector patterns Removing them seems to slightly increase code quality as well as simplifying both the tablegen and C++ parts of the code. Differential Revision: https://reviews.llvm.org/D149853	2023-06-06 14:04:50 +01:00
Ricardo Jesus	3a87c15026	[AArch64][NFC] Normalise name of indexed forms of SQRDMLAH/SQRDMLSH Most indexed vector instructions are suffixed with v<N><TY>_indexed. SQRDMLAH/SQRDMLSH are the exception, being suffixed with <TY>_indexed instead, which can complicate matching them slightly. Differential Revision: https://reviews.llvm.org/D152161	2023-06-06 13:02:36 +00:00
Simon Pilgrim	85b77b13e3	[GlobalISel][X86] Add G_IMPLICIT_DEF / G_CONSTANT legalization handling	2023-06-06 11:45:22 +01:00
Thorsten Schütt	60b8019ea0	[GlobalIsel][X86] Legalize G_ANYEXT, G_SEXT, and G_ZEXT Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D152243	2023-06-06 12:22:09 +02:00
wangpc	26e41a80d0	[RISCV] Handle "o" inline asm memory constraint This is the same as D100412. We just found the same crash when we tried to compile some packages like mariadb, php, etc. For constraint "o", it means "A memory operand is allowed, but only if the address is offsettable". So I think it can be handled just like constraint "m" for RISCV target. And we print verbose information when unsupported constraints occur. Reviewed By: asb Differential Revision: https://reviews.llvm.org/D151979	2023-06-06 17:50:40 +08:00
Carl Ritson	6afc4b0629	[AMDGPU] WQM: Ensure exact mode placement before branches Fix for D151797 where the change accidentally allowed exit to exact mode between branch instructions. Reviewed By: dstuttard Differential Revision: https://reviews.llvm.org/D152228	2023-06-06 18:11:35 +09:00
Serge Pavlov	10e7899818	[FPEnv] Get rid of extra moves in fpenv calls If intrinsic `get_fpenv` or `set_fpenv` is lowered to the form where FP environment is represented as a region in memory, extra moves can appear. For example the code: define void @func_01(ptr %ptr) { %env = call i256 @llvm.get.fpenv.i256() store i256 %env, ptr %ptr ret void } produces DAG: ch = get_fpenv_mem ch, memory_region val: i256, ch = load ch, memory_region ch = store ch, ptr, val In this case the extra moves can be avoided if `get_fpenv_mem` got a pointer to the memory where the FP environment should finally be placed. This change implements that optimization for this use case. Differential Revision: https://reviews.llvm.org/D150437	2023-06-06 14:54:52 +07:00
Carl Ritson	7275637505	[AMDGPU] Pre-commit test for D152228 (NFC)	2023-06-06 16:00:20 +09:00
Luo, Yuanke	787f3008be	[X86] Pre-commit test case for D152227.	2023-06-06 14:56:45 +08:00
Luo, Yuanke	60b7dbb670	[X86] Add test cases for D152227.	2023-06-06 14:24:46 +08:00
Paulo Matos	9571a28ee4	[WebAssembly] Add tests ensuring rotates persist Due to the nature of WebAssembly, it's always better to keep rotates instead of trying to optimize it. Commit 9485d983 disabled the generation of fsh for rotates, however these tests ensure that future changes don't change the behaviour for the Wasm backend that tends to have different optimization requirements than other architectures. Also see: https://github.com/llvm/llvm-project/issues/62703 Differential Revision: https://reviews.llvm.org/D152126	2023-06-06 07:48:35 +02:00
Ben Shi	b1f0cb89c1	[AVR][NFC][test] Supplement more tests of 8-bit rotation Reviewed By: Patryk27, jacquesguan Differential Revision: https://reviews.llvm.org/D152129	2023-06-06 11:24:18 +08:00
Jianjian GUAN	77da27b5e3	[RISCV] Improve selection for vector fpclass. Since the vfclass instruction will only set one single bit in the result, if we only want to check 1 fp class, we could use vmseq to do it. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D151967	2023-06-06 10:24:24 +08:00
Matt Arsenault	ecf30c31fb	AMDGPU: Fix broken test	2023-06-05 20:44:59 -04:00
NAKAMURA Takumi	d3777f20c5	test/AMDGPU: REQUIRES asserts (D148184)	2023-06-06 08:55:46 +09:00
Matt Arsenault	30bd96fa17	AMDGPU: Add baseline test for undoing mul add 1 reassociation Add some tests for combines to undo regressions caused by 0cfc6510323fbb5a56a5de23cbc65f7cc30fd34c.	2023-06-05 18:44:17 -04:00
Matt Arsenault	b25c001ad3	AMDGPU: Fold zext into result of v_mad_u16 on high zeroing targets Avoids regressions in future patch.	2023-06-05 18:41:07 -04:00
Matt Arsenault	db08f9a2d5	AMDGPU: Add baseline 16-bit mad matching tests	2023-06-05 18:41:07 -04:00
Matt Arsenault	cb4b7340b0	AMDGPU: Convert test to generated checks	2023-06-05 18:41:06 -04:00
Craig Topper	b64ddae8a2	[RISCV] Lower experimental_get_vector_length intrinsic to vsetvli for some cases. This patch lowers to vsetvli when the AVL is i32 or XLenVT and the VF is a power of 2 in the range [1, 64]. VLEN=32 is not supported as we don't have a valid type mapping for that. VF=1 is not supported with Zve32* only. The element width is used to set the SEW for the vsetvli if possible. Otherwise we use SEW=8. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D150824	2023-06-05 15:02:11 -07:00
Craig Topper	4157bfb230	[RISCV] Add RISCVISD nodes for vfwadd/vfwsub. Add a DAG combine to form these from FADD_VL/FSUB_VL and FP_EXTEND_VL. This makes it similar to other widening ops and allows us to handle using the same FP_EXTEND_VL for both operands. Differential Revision: https://reviews.llvm.org/D151969	2023-06-05 14:12:47 -07:00
Artem Belevich	73464e377b	[NVPTX] fixed vector-compare test. Apparently this test didn't actually test anything other than that the IR compiles.	2023-06-05 12:49:12 -07:00
Artem Belevich	dc90f42ea7	Coalesce 16-bit FP types to use integer register classes. i16/f16/bf16 will use the same .b16 registers and i32/v2f16 and v2bf16 will share .b32 registers. The changes are mostly mechanical, intended to remove unnecessary register classes which tend to produce redundant register moves. Differential Revision: https://reviews.llvm.org/D151601 v2f16 regtype conversion to i32	2023-06-05 12:21:52 -07:00
Krzysztof Drewniak	23098bd454	[AMDGPU] Add intrinsic for converting global pointers to resources Define the function @llvm.amdgcn.make.buffer.rsrc, which takes a 64-bit pointer, the 16-bit stride/swizzling constant that replaces the high 16 bits of an address in a buffer resource, the 32-bit extent/number of elements, and the 32-bit flags (the latter two being the 3rd and 4th words of the resource), and combines them into a ptr addrspace(8). This intrinsic is lowered during the early phases of the backend. This intrinsic is needed so that alias analysis can correctly infer that a certain buffer resource points to the same memory as some global pointer. Previous methods of constructing buffer resources, which relied on ptrtoint, would not allow for such an inference. Depends on D148184 Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D148957	2023-06-05 17:07:59 +00:00
Krzysztof Drewniak	ab37937812	[AMDGPU] Use resource base for buffer instruction MachineMemOperands 1. Remove the existing code that would encode the constant offsets (if there were any) on buffer intrinsic operations onto their `MachineMemOperand`s. As far as I can tell, this use of `offset` has no substantial impact on the generated code, especially since the same reasoning is performed by areMemAccessesTriviallyDisjoint(). 2. When a buffer resource intrinsic takes a pointer argument as the base resource/descriptor, place that memory argument in the value field of the MachineMemOperand attached to that intrinsic. This is more conservative than what would be produced by more typical LLVM code using GEP, as the Value (for alias analysis purposes) corresponding to accessing buffer[0] and buffer[1] is the same. However, the target-specific analysis of disjoint offsets covers a lot of the simple usecases. Despite this limitation, the new buffer intrinsics, combined with LLVM's existing pointer annotations, allow for non-trivial optimizations, as seen in the new tests, where marking two buffer descriptors "noalias" allows merging together loads and stores in a "load from A, modify loaded value, store to B" sequence, which would not be possible previously. Depends on D147547 Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D148184	2023-06-05 17:06:57 +00:00
Krzysztof Drewniak	faa2c678aa	[AMDGPU] Add buffer intrinsics that take resources as pointers In order to enable the LLVM frontend to better analyze buffer operations (and to potentially enable more precise analyses on the backend), define versions of the raw and structured buffer intrinsics that use `ptr addrspace(8)` instead of `<4 x i32>` to represent their rsrc arguments. The new intrinsics are named by replacing `buffer.` with `buffer.ptr`. One advantage to these intrinsic definitions is that, instead of specifying that a buffer load/store will read/write some memory, we can indicate that the memory read or written will be based on the pointer argument. This means that, for example, a read from a `noalias` buffer can be pulled out of a loop that is modifying a distinct buffer. In the future, we will define custom PseudoSourceValues that will allow us to package up the (buffer, index, offset) triples that buffer intrinsics contain and allow for more precise backend analysis. This work also enables creating address space 7, which represents manipulation of raw buffers using native LLVM load and store instructions. Where tests simply used a buffer intrinsic while testing some other code path (such as the tests for VGPR spills), they have been updated to use the new intrinsic form. Tests that are "about" buffer intrinsics (for instance, those that ensure that they codegen as expected) have been duplicated, either within existing files or into new ones. Depends on D145441 Reviewed By: arsenm, #amdgpu Differential Revision: https://reviews.llvm.org/D147547	2023-06-05 16:59:07 +00:00
JP Lehr	c9998ec145	Revert "[DAGCombine] Make sure combined nodes are added back to the worklist in topological order." This reverts commit e69fa03ddd85812be3143d79a0359c3e8d43bd45. This patch led to build timeouts on the AMDGPU OpenMP runtime buildbot.	2023-06-05 10:55:58 -04:00
Simon Pilgrim	c2926c6c4d	[GlobalISel][X86] Regenerate legalize-undef.mir	2023-06-05 14:41:40 +01:00
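
Two of the RISC-V entries above (58b2d652af and bb10612587) describe 64-bit constants that can be built from a single 32-bit piece: one as (ADD (SLLI C, 32), C) with C a simm32, the other via PACK when the lower and upper 32 bits are equal. The following is a minimal Python sketch of just the arithmetic conditions stated in those messages. It is not the LLVM implementation; the function names are hypothetical, and the real selectImm / RISCVMatInt code may apply further cost or feature checks that are not modelled here.

```python
MASK32 = 0xFFFF_FFFF
MASK64 = 0xFFFF_FFFF_FFFF_FFFF


def sign_extend32(x):
    """Interpret the low 32 bits of x as a signed 32-bit integer (simm32)."""
    x &= MASK32
    return x - (1 << 32) if x & 0x8000_0000 else x


def add_slli_candidate(value):
    """Hypothetical helper: return a simm32 C such that
    ((C << 32) + C) mod 2**64 == value, or None if no such C exists.

    C must match the low 32 bits of value, so it is the only candidate.
    """
    value &= MASK64
    c = sign_extend32(value)
    return c if ((c << 32) + c) & MASK64 == value else None


def pack_candidate(value):
    """Hypothetical helper: return the shared 32-bit half if the lower and
    upper 32 bits of value are identical, or None otherwise."""
    value &= MASK64
    lo = value & MASK32
    return lo if (value >> 32) == lo else None
```

When the low half is non-negative as a simm32, both conditions coincide (the halves must be equal). They differ when bit 31 is set: sign extension makes the ADD form require the upper half to be one less than the lower half, whereas the PACK condition still requires equal halves.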

48353 Commits