llvm-project

Author	SHA1	Message	Date
Noah Goldstein	809b1d834d	[KnownBits] Return `0` for poison {s,u}div inputs It seems consistent to always return zero for known poison rather than varying the value. We do the same elsewhere. Differential Revision: https://reviews.llvm.org/D150922	2023-06-06 15:14:10 -05:00
David Green	2a8df8d0b9	[AArch64][SVE] Add one-use-check to EitherVSelectOrPassthruPatFrags As pointed out in D149968 vselect predicate patterns could do with a one-use check to prevent multiple operations being created. This updates the EitherVSelectOrPassthruPatFrags pattern frags used in creating predicates min/max. Differential Revision: https://reviews.llvm.org/D151080	2023-06-06 21:10:32 +01:00
Craig Topper	58b2d652af	[RISCV] Add special case to selectImm for constants that can be created with (ADD (SLLI C, 32), C). Where C is a simm32. This costs an extra temporary register, but avoids a constant pool. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D152236	2023-06-06 11:59:12 -07:00
Simon Pilgrim	9a81b69757	[AArch64] Regenerate tests with missing immediate hex asm comments Reduces diff in a future commit	2023-06-06 19:44:28 +01:00
Simon Pilgrim	a279a09ab9	Revert rG98061013e01207444cfd3980 - [X86] X86FixupVectorConstantsPass - attempt to replace full width fp vector constant loads with broadcasts on AVX+ targets Reverting while we address an existing issue exposed by this (Issue #63108)	2023-06-06 18:44:24 +01:00
Simon Pilgrim	78de45fd4a	Revert rGab4b924832ce26c21b88d7f82fcf4992ea8906bb - [X86] X86FixupVectorConstantsPass - attempt to replace full width integer vector constant loads with broadcasts on AVX2+ targets Reverting while we address an existing issue exposed by this (Issue #63108)	2023-06-06 18:07:33 +01:00
Jay Foad	a4a3ac10cb	[AMDGPU] Remove extract_subvector patterns Removing them seems to slightly increase code quality as well as simplifying both the tablegen and C++ parts of the code. Differential Revision: https://reviews.llvm.org/D149853	2023-06-06 14:04:50 +01:00
Ricardo Jesus	3a87c15026	[AArch64][NFC] Normalise name of indexed forms of SQRDMLAH/SQRDMLSH Most indexed vector instructions are suffixed with v<N><TY>_indexed. SQRDMLAH/SQRDMLSH are the exception, being suffixed with <TY>_indexed instead, which can complicate matching them slightly. Differential Revision: https://reviews.llvm.org/D152161	2023-06-06 13:02:36 +00:00
Simon Pilgrim	85b77b13e3	[GlobalISel][X86] Add G_IMPLICIT_DEF / G_CONSTANT legalization handling	2023-06-06 11:45:22 +01:00
Thorsten Schütt	60b8019ea0	[GlobalIsel][X86] Legalize G_ANYEXT, G_SEXT, and G_ZEXT Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D152243	2023-06-06 12:22:09 +02:00
wangpc	26e41a80d0	[RISCV] Handle "o" inline asm memory constraint This is the same as D100412. We just found the same crash when we tried to compile some packages like mariadb, php, etc. For constraint "o", it means "A memory operand is allowed, but only if the address is offsettable". So I think it can be handled just like constraint "m" for RISCV target. And we print verbose information when unsupported constraints occur. Reviewed By: asb Differential Revision: https://reviews.llvm.org/D151979	2023-06-06 17:50:40 +08:00
Carl Ritson	6afc4b0629	[AMDGPU] WQM: Ensure exact mode placement before branches Fix for D151797 where the change accidentally allowed exit to exact mode between branch instructions. Reviewed By: dstuttard Differential Revision: https://reviews.llvm.org/D152228	2023-06-06 18:11:35 +09:00
Serge Pavlov	10e7899818	[FPEnv] Get rid of extra moves in fpenv calls If intrinsic `get_fpenv` or `set_fpenv` is lowered to the form where FP environment is represented as a region in memory, extra moves can appear. For example the code: define void @func_01(ptr %ptr) { %env = call i256 @llvm.get.fpenv.i256() store i256 %env, ptr %ptr ret void } produces DAG: ch = get_fpenv_mem ch, memory_region val: i256, ch = load ch, memory_region ch = store ch, ptr, val In this case the extra moves can be avoided if `get_fpenv_mem` is given a pointer to the memory where the FP environment should finally be placed. This change implements that optimization for this use case. Differential Revision: https://reviews.llvm.org/D150437	2023-06-06 14:54:52 +07:00
Carl Ritson	7275637505	[AMDGPU] Pre-commit test for D152228 (NFC)	2023-06-06 16:00:20 +09:00
Luo, Yuanke	787f3008be	[X86] Pre-commit test case for D152227.	2023-06-06 14:56:45 +08:00
Luo, Yuanke	60b7dbb670	[X86] Add test cases for D152227.	2023-06-06 14:24:46 +08:00
Paulo Matos	9571a28ee4	[WebAssembly] Add tests ensuring rotates persist Due to the nature of WebAssembly, it's always better to keep rotates instead of trying to optimize it. Commit 9485d983 disabled the generation of fsh for rotates, however these tests ensure that future changes don't change the behaviour for the Wasm backend that tends to have different optimization requirements than other architectures. Also see: https://github.com/llvm/llvm-project/issues/62703 Differential Revision: https://reviews.llvm.org/D152126	2023-06-06 07:48:35 +02:00
Ben Shi	b1f0cb89c1	[AVR][NFC][test] Supplement more tests of 8-bit rotation Reviewed By: Patryk27, jacquesguan Differential Revision: https://reviews.llvm.org/D152129	2023-06-06 11:24:18 +08:00
Jianjian GUAN	77da27b5e3	[RISCV] Improve selection for vector fpclass. Since the vfclass instruction sets only a single bit in the result, if we only want to check one fp class, we can use vmseq to do it. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D151967	2023-06-06 10:24:24 +08:00
Matt Arsenault	ecf30c31fb	AMDGPU: Fix broken test	2023-06-05 20:44:59 -04:00
NAKAMURA Takumi	d3777f20c5	test/AMDGPU: REQUIRES asserts (D148184)	2023-06-06 08:55:46 +09:00
Matt Arsenault	30bd96fa17	AMDGPU: Add baseline test for undoing mul add 1 reassociation Add some tests for combines to undo regressions caused by 0cfc6510323fbb5a56a5de23cbc65f7cc30fd34c.	2023-06-05 18:44:17 -04:00
Matt Arsenault	b25c001ad3	AMDGPU: Fold zext into result of v_mad_u16 on high zeroing targets Avoids regressions in future patch.	2023-06-05 18:41:07 -04:00
Matt Arsenault	db08f9a2d5	AMDGPU: Add baseline 16-bit mad matching tests	2023-06-05 18:41:07 -04:00
Matt Arsenault	cb4b7340b0	AMDGPU: Convert test to generated checks	2023-06-05 18:41:06 -04:00
Craig Topper	b64ddae8a2	[RISCV] Lower experimental_get_vector_length intrinsic to vsetvli for some cases. This patch lowers to vsetvli when the AVL is i32 or XLenVT and the VF is a power of 2 in the range [1, 64]. VLEN=32 is not supported as we don't have a valid type mapping for that. VF=1 is not supported with Zve32* only. The element width is used to set the SEW for the vsetvli if possible. Otherwise we use SEW=8. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D150824	2023-06-05 15:02:11 -07:00
Craig Topper	4157bfb230	[RISCV] Add RISCVISD nodes for vfwadd/vfwsub. Add a DAG combine to form these from FADD_VL/FSUB_VL and FP_EXTEND_VL. This makes it similar to other widening ops and allows us to handle using the same FP_EXTEND_VL for both operands. Differential Revision: https://reviews.llvm.org/D151969	2023-06-05 14:12:47 -07:00
Artem Belevich	73464e377b	[NVPTX] fixed vector-compare test. Apparently this test didn't actually test anything other than that the IR compiles.	2023-06-05 12:49:12 -07:00
Artem Belevich	dc90f42ea7	Coalesce 16-bit FP types to use integer register classes. i16/f16/bf16 will use the same .b16 registers and i32/v2f16 and v2bf16 will share .b32 registers. The changes are mostly mechanical, intended to remove unnecessary register classes which tend to produce redundant register moves. Differential Revision: https://reviews.llvm.org/D151601 v2f16 regtype conversion to i32	2023-06-05 12:21:52 -07:00
Krzysztof Drewniak	23098bd454	[AMDGPU] Add intrinsic for converting global pointers to resources Define the function @llvm.amdgcn.make.buffer.rsrc, which takes a 64-bit pointer, the 16-bit stride/swizzling constant that replaces the high 16 bits of an address in a buffer resource, the 32-bit extent/number of elements, and the 32-bit flags (the latter two being the 3rd and 4th words of the resource), and combines them into a ptr addrspace(8). This intrinsic is lowered during the early phases of the backend. This intrinsic is needed so that alias analysis can correctly infer that a certain buffer resource points to the same memory as some global pointer. Previous methods of constructing buffer resources, which relied on ptrtoint, would not allow for such an inference. Depends on D148184 Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D148957	2023-06-05 17:07:59 +00:00
Krzysztof Drewniak	ab37937812	[AMDGPU] Use resource base for buffer instruction MachineMemOperands 1. Remove the existing code that would encode the constant offsets (if there were any) on buffer intrinsic operations onto their `MachineMemOperand`s. As far as I can tell, this use of `offset` has no substantial impact on the generated code, especially since the same reasoning is performed by areMemAccessesTriviallyDisjoint(). 2. When a buffer resource intrinsic takes a pointer argument as the base resource/descriptor, place that memory argument in the value field of the MachineMemOperand attached to that intrinsic. This is more conservative than what would be produced by more typical LLVM code using GEP, as the Value (for alias analysis purposes) corresponding to accessing buffer[0] and buffer[1] is the same. However, the target-specific analysis of disjoint offsets covers a lot of the simple usecases. Despite this limitation, the new buffer intrinsics, combined with LLVM's existing pointer annotations, allow for non-trivial optimizations, as seen in the new tests, where marking two buffer descriptors "noalias" allows merging together loads and stores in a "load from A, modify loaded value, store to B" sequence, which would not be possible previously. Depends on D147547 Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D148184	2023-06-05 17:06:57 +00:00
Krzysztof Drewniak	faa2c678aa	[AMDGPU] Add buffer intrinsics that take resources as pointers In order to enable the LLVM frontend to better analyze buffer operations (and to potentially enable more precise analyses on the backend), define versions of the raw and structured buffer intrinsics that use `ptr addrspace(8)` instead of `<4 x i32>` to represent their rsrc arguments. The new intrinsics are named by replacing `buffer.` with `buffer.ptr`. One advantage to these intrinsic definitions is that, instead of specifying that a buffer load/store will read/write some memory, we can indicate that the memory read or written will be based on the pointer argument. This means that, for example, a read from a `noalias` buffer can be pulled out of a loop that is modifying a distinct buffer. In the future, we will define custom PseudoSourceValues that will allow us to package up the (buffer, index, offset) triples that buffer intrinsics contain and allow for more precise backend analysis. This work also enables creating address space 7, which represents manipulation of raw buffers using native LLVM load and store instructions. Where tests simply used a buffer intrinsic while testing some other code path (such as the tests for VGPR spills), they have been updated to use the new intrinsic form. Tests that are "about" buffer intrinsics (for instance, those that ensure that they codegen as expected) have been duplicated, either within existing files or into new ones. Depends on D145441 Reviewed By: arsenm, #amdgpu Differential Revision: https://reviews.llvm.org/D147547	2023-06-05 16:59:07 +00:00
JP Lehr	c9998ec145	Revert "[DAGCombine] Make sure combined nodes are added back to the worklist in topological order." This reverts commit e69fa03ddd85812be3143d79a0359c3e8d43bd45. This patch led to build timeouts on the AMDGPU OpenMP runtime buildbot.	2023-06-05 10:55:58 -04:00
Simon Pilgrim	c2926c6c4d	[GlobalISel][X86] Regenerate legalize-undef.mir	2023-06-05 14:41:40 +01:00
Simon Pilgrim	ca0caa23ce	[X86] Replace X32 test check prefix with X86 + add common CHECK prefix We try to only use X32 for gnux32 triple test cases	2023-06-05 14:41:40 +01:00
Simon Pilgrim	fcacc41a22	[X86] Replace X32 test check prefix with X86 We try to only use X32 for gnux32 triple test cases	2023-06-05 14:41:40 +01:00
Amaury Séchet	e69fa03ddd	[DAGCombine] Make sure combined nodes are added back to the worklist in topological order. Currently, a node and its users are added back to the worklist in reverse topological order after it is combined. This diff changes that order to be topological. This is part of a larger migration to get the DAGCombiner to process nodes in topological order. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D127115	2023-06-05 11:09:18 +00:00
Simon Pilgrim	b28bc5f5ad	[GlobalISel][X86] Add 128/256/512-bit vector and/or/xor test coverage Based off the legalize-add-v*.mir tests	2023-06-05 12:08:22 +01:00
Simon Pilgrim	dbd3695092	[GlobalISel][X86] Add illegal types and 32-bit target scalar and/or/xor test coverage Based off the legalize-add.mir tests	2023-06-05 12:08:22 +01:00
Jay Foad	9912bcc8ec	[AMDGPU] Regenerate some GlobalISel checks	2023-06-05 11:21:31 +01:00
Simon Pilgrim	d37bd544ff	[X86] canonicalizeShuffleWithBinOps - ensure a binary shuffle of binops have the same value type Fixes #63091	2023-06-05 11:18:28 +01:00
Simon Pilgrim	d75efc1d51	[X86] Add test case for Issue #63091	2023-06-05 11:18:27 +01:00
Simon Pilgrim	346ee549e5	[GlobalISel][X86] Add G_CTTZ_ZERO_UNDEF/G_CTTZ legalization handling G_CTTZ_ZERO_UNDEF is always legal using the BSF instruction, G_CTTZ requires the BMI1 TZCNT instruction	2023-06-05 11:18:27 +01:00
David Green	2b4807ba04	[AArch64][SVE] Predicated mla/mls patterns To go with D149267 and D149967, this adds predicated mla/mls patterns, selected from select(mask, add(a, mul(b, c)), a) -> mla(a, mask, b, c). The existing patterns are eventually removed by D149967. Differential Revision: https://reviews.llvm.org/D149969	2023-06-05 10:08:57 +01:00
Qiu Chaofan	9e17e08324	[PowerPC] Combine fptoint-store under strict cases Reviewed By: shchenz Differential Revision: https://reviews.llvm.org/D141249	2023-06-05 16:24:02 +08:00
esmeyi	6f57d8df2d	Revert "[XCOFF][DWARF] XCOFF64 should be able to select the dwarf format in intergrated-as mode." This reverts commit 4054c68644dfebbb584bca698a25d18d1d312bae, because the AIX system linker requires DWARF64 for XCOFF64.	2023-06-05 02:50:47 -04:00
Serge Pavlov	eecaeb6f10	[FPEnv] Intrinsics for access to FP environment The change implements intrinsics 'get_fpenv', 'set_fpenv' and 'reset_fpenv'. They are used to read floating-point environment, set it or reset to some default state. They do the same actions as C library functions 'fegetenv' and 'fesetenv'. By default these intrinsics are lowered to calls to these functions. The new intrinsics specify FP environment as a value of integer type, which is convenient for most targets where the FP state is the content of some register. Some targets however use long representations. On X86 the size of FP environment is 256 bits, and even half of this size is not a legal integer type. To facilitate legalization in such cases, two sets of DAG nodes are used. Nodes GET_FPENV and SET_FPENV are used when FP environment may be represented by a legal integer type. Nodes GET_FPENV_MEM and SET_FPENV_MEM consider FP environment as a region in memory, much like `fesetenv` and `fegetenv` do. They are used when the target has a long representation for floating-point state. Differential Revision: https://reviews.llvm.org/D71742	2023-06-05 13:10:01 +07:00
Qiu Chaofan	69bc8ff766	Reland "[PowerPC] Simplify fp-to-int store optimization" The build failure should be fixed by de681d53. Follow-up refactor will be done in future patches. This reverts commit e7c5ced0b9f0551ea17e1d2b48be86f03a772c59.	2023-06-05 13:53:08 +08:00
Ben Shi	53a7c254e4	[AVR][NFC][test] Supplement a test of the pseudo instruction RORBRd Reviewed By: aykevl, Patryk27 Differential Revision: https://reviews.llvm.org/D152087	2023-06-04 23:19:21 +08:00
Simon Pilgrim	9424a54201	[GlobalIsel][X86] Update legalization of G_AND/G_OR/G_XOR Replace the legacy G_AND/G_OR/G_XOR legalizer, this handles all scalar promotion and vector clamping (allows AVX1 to handle 256-bit logic ops).	2023-06-04 11:44:27 +01:00
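
Note on 58b2d652af ([RISCV] selectImm special case): the message describes constants that can be built as (ADD (SLLI C, 32), C) with C a simm32. A minimal sketch of how such a constant could be recognised follows. The helper name `isAddSlli32` is hypothetical and this is not the code from D152236; it only illustrates the arithmetic condition stated in the message.

```cpp
#include <cstdint>

// Hypothetical helper (not LLVM's selectImm): returns true when Imm can be
// written as (C << 32) + C for some signed 32-bit value C.
bool isAddSlli32(int64_t Imm) {
  // The low 32 bits of (C << 32) + C equal the low 32 bits of C, so the only
  // candidate for C is the sign-extended low half of Imm.
  const int32_t C = static_cast<int32_t>(static_cast<uint32_t>(Imm));
  // Rebuild the constant in unsigned arithmetic so wraparound is well defined.
  const uint64_t SExtC = static_cast<uint64_t>(static_cast<int64_t>(C));
  const uint64_t Rebuilt = (SExtC << 32) + SExtC;
  return Rebuilt == static_cast<uint64_t>(Imm);
}
```

Under this condition a value such as 0x1234567812345678 qualifies (C = 0x12345678), while 0x0000000100000002 does not. The real patch presumably applies further cost checks before choosing this sequence; those are not modelled here.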


48337 Commits
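
Note on 2b4807ba04 ([AArch64][SVE] Predicated mla/mls patterns): the message gives the matched form as select(mask, add(a, mul(b, c)), a) -> mla(a, mask, b, c). The sketch below is a scalar, per-lane reference model of that semantics, written only to make the equivalence concrete. The function name `predicatedMla` and the use of `std::vector` are assumptions for illustration and have nothing to do with the TableGen patterns in D149969.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical reference model: active lanes compute a + b * c, inactive
// lanes pass a through unchanged. All inputs are assumed to have equal length.
std::vector<uint32_t> predicatedMla(const std::vector<bool> &Mask,
                                    const std::vector<uint32_t> &A,
                                    const std::vector<uint32_t> &B,
                                    const std::vector<uint32_t> &C) {
  std::vector<uint32_t> Result(A.size());
  for (std::size_t I = 0; I < A.size(); ++I) {
    // select(mask, add(a, mul(b, c)), a), evaluated lane by lane.
    // Unsigned arithmetic wraps, matching fixed-width integer lanes.
    Result[I] = Mask[I] ? A[I] + B[I] * C[I] : A[I];
  }
  return Result;
}
```

For example, with mask {1, 0, 1}, a = {1, 2, 3}, b = {4, 5, 6} and c = {7, 8, 9}, the model yields {29, 2, 57}: the middle lane is inactive and keeps its accumulator value.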