llvm-project

Author	SHA1	Message	Date
Paweł Bylica	8f5b1d9e14	[test][DAGCombine] Add tests for cmp+add -> addcarry Tests for https://reviews.llvm.org/D118037.	2022-01-25 22:17:09 +01:00
Dávid Bolvanský	b35ef580d8	[NFC] Added test with select with unpredictable metadata; regenerate x86-cmov-converter.ll	2022-01-25 21:25:12 +01:00
David Green	c415ff186d	[AArch64] Add extra vecreduce.add tests, including extending reductions. NFC This is all the reductions from i8 -> i64 with either sign or zero extensions.	2022-01-25 18:10:09 +00:00
eopXD	b089e4072a	[RISCV] Don't allow i64 vector div by constant to use mulh with Zve64x EEW=64 of mulh and its variants requires V extension. Authored by: Craig Topper <craig.topper@sifive.com> @craig.topper Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D117947	2022-01-25 09:55:05 -08:00
Simon Pilgrim	ef0d90f682	[X86] Regenerate avx-vbroadcast.ll Remove '' around mattr to stop update script crash and use X86 prefixes instead of X32	2022-01-25 16:23:22 +00:00
Sean Fertile	a2505bd063	[PowerPC][AIX] Override markFunctionEnd() During fast-isel calling 'markFunctionEnd' in the base class will call tidyLandingPads. This can cause an issue where we have determined that we need ehinfo and emitted a traceback table with the bits set to indicate that we will be emitting the ehinfo, but the tidying deletes all landing pads. In this case we end up emitting a reference to __ehinfo.N symbol, but not emitting a definition to said symbol and the resulting file fails to assemble. Differential Revision: https://reviews.llvm.org/D117040	2022-01-25 10:08:53 -05:00
Simon Pilgrim	ea4b0489f5	[X86][AVX] Add PR47194 shuffle test case	2022-01-25 15:06:39 +00:00
Simon Pilgrim	fc15ab7b1b	[X86] Add folded load tests to PR46809 tests	2022-01-25 12:37:15 +00:00
Danila Malyutin	153b1e3cba	[AArch64] Add patterns for relaxed atomic ld/st into fp registers Adds patterns to match integer loads/stores bitcasted to fp values Fixes https://github.com/llvm/llvm-project/issues/52927 Differential Revision: https://reviews.llvm.org/D117573	2022-01-25 15:33:37 +03:00
Paul Walker	d95cf1f6cf	[SVE] Enable ISD::ABDS/U ISel for scalable vectors. NOTE: This patch also includes tests that highlight those cases where the existing DAG combine doesn't yet work well for SVE. Differential Revision: https://reviews.llvm.org/D117873	2022-01-25 12:14:53 +00:00
Bjorn Pettersson	109cc5adcc	[DAGCombine] Fold SRA of a load into a narrower sign-extending load An sra is basically sign-extending a narrower value. Fold away the shift by doing a sextload of a narrower value, when it is legal to reduce the load width accordingly. Differential Revision: https://reviews.llvm.org/D116930	2022-01-25 12:14:48 +01:00
Fraser Cormack	7cb452bfde	[SelectionDAG][VP] Add widening support for VP_MERGE This patch adds widening support for ISD::VP_MERGE, which widens identically to VP_SELECT and similarly to other select-like nodes. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D118030	2022-01-25 10:59:40 +00:00
Victor Perez	19d3dc6e22	[VP] Update CodeGen/RISCV/rvv/vpgather-sdnode.ll test	2022-01-25 10:49:05 +00:00
Fraser Cormack	5f5c5603ce	[SelectionDAG][VP] Add splitting support for VP_MERGE This patch adds splitting support for ISD::VP_MERGE, which splits identically to VP_SELECT and similarly to other select-like nodes. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D118032	2022-01-25 10:33:23 +00:00
Dávid Bolvanský	9fa6ad4c58	Revert "[NFC] Added test with select with unpredictable metadata; regenerate x86-cmov-converter.ll" This reverts commit e2f8d28afba0a6545284ad3b54a4b7532c3253b6.	2022-01-25 11:28:26 +01:00
Dávid Bolvanský	e2f8d28afb	[NFC] Added test with select with unpredictable metadata; regenerate x86-cmov-converter.ll	2022-01-25 11:13:20 +01:00
Victor Perez	2233befa5d	[LegalizeTypes][VP] Add splitting support for vp.gather and vp.scatter Split these nodes in a similar way as their masked versions. Reviewed By: frasercrmck, craig.topper Differential Revision: https://reviews.llvm.org/D117760	2022-01-25 10:08:07 +00:00
Simon Pilgrim	902184e6cc	[X86] combinePredicateReduction - generalize allof(cmpeq(x,0)) handling to allof(cmpeq(x,y)) There's no further reasons to limit this to cmpeq-with-zero, the outstanding regressions with lowering to PTEST have now been addressed Improves codegen for Issue #53379	2022-01-25 00:24:06 +00:00
Simon Pilgrim	11bb4a1111	[X86] combinePredicateReduction - split vXi16 allof(cmpeq()) to vXi8 allof(cmpeq()) vXi16 allof(cmp()) reduction patterns will have to pack the comparison results to vXi8 to use PMOVMSKB. If we're reducing cmpeq(), then we can compare the vXi8 halves directly - similar to what we already do for vXi64 -> vXi32 for cases without PCMPEQQ.	2022-01-24 22:43:29 +00:00
Simon Pilgrim	8d298355ca	[X86] combineSetCCMOVMSK - detect and(pcmpeq(),pcmpeq()) ptest pattern. Handle cases where we've split an allof(cmpeq()) pattern to a legal vector type	2022-01-24 21:42:03 +00:00
Quinn Pham	6a028296fe	[PowerPC] Emit warning when SP is clobbered by asm This patch emits a warning when the stack pointer register (`R1`) is found in the clobber list of an inline asm statement. Clobbering the stack pointer is not supported. Reviewed By: #powerpc, nemanjai Differential Revision: https://reviews.llvm.org/D112073	2022-01-24 15:12:23 -06:00
Stanislav Mekhanoshin	bb1fe36977	[AMDGPU] Make v8i16/v8f16 legal Differential Revision: https://reviews.llvm.org/D117721	2022-01-24 11:51:08 -08:00
Stanislav Mekhanoshin	c27f8fb968	[AMDGPU] Remove cndmask from readsExecAsData Differential Revision: https://reviews.llvm.org/D117909	2022-01-24 11:24:47 -08:00
Sander de Smalen	11cea7e5ce	[AArch64] NFC: Clarify and auto-generate some CodeGen tests. * For ext-narrow-index.ll, move vscale_range attribute closer to the function definition, rather than through indirect #<num> attribute. This makes the test a bit easier to read. * auto-generated CHECK lines for sve-cmp-select.ll and named-vector-shuffles-sve.ll. * re-generated CHECK lines for tests that had a mention they were auto-generated, but where the CHECK lines were out of date.	2022-01-24 17:42:37 +00:00
Simon Pilgrim	6997f4d07f	[X86] combineSetCCMOVMSK - fold allof(cmpeq(x,y)) -> ptest(sub(x,y)) (PR53379) As suggested on PR53379, for all-of icmp-eq patterns, we can use ptest(sub(x,y)) on SSE41+ targets. This is a generalization of the existing allof(cmpeq(x,0)) -> ptest(x) pattern. We can probably extend this further, in particular to handle 256-bit cases on pre-AVX2 targets, but this part of the generalization is pretty trivial. Fixes Issue #53379	2022-01-24 16:44:37 +00:00
Craig Topper	a43ed49f5b	[DAGCombiner][RISCV] Canonicalize (bswap(bitreverse(x))->bitreverse(bswap(x)). If the bitreverse gets expanded, it will introduce a new bswap. By putting a bswap before the bitreverse, we can ensure it gets cancelled out when this happens. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D118012	2022-01-24 08:31:53 -08:00
Craig Topper	b8c7cdcc81	[SelectionDAG][RISCV] Teach getNode to fold bswap(bswap(x))->x. This can show up when bitreverse is expanded to bswap and swap of bits within a byte. If the input is already a bswap, we should cancel them out before we further transform them in a way that makes it harder to see the redundancy. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D118007	2022-01-24 08:17:46 -08:00
Craig Topper	cd2a9ff397	[RISCV] Select int_riscv_vsll with shift of 1 to vadd.vv. Add might be faster than shift. We can't do this earlier without using a Freeze instruction. This is the intrinsic version of D106689. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D118013	2022-01-24 08:04:53 -08:00
Matt Arsenault	18aabae8e2	AMDGPU: Fix assertion on fixed stack objects with VGPR->AGPR spills These have negative / out of bounds frame index values and would assert when trying to set the BitVector. Fixed stack objects can't be colored away so ignore them.	2022-01-24 09:45:41 -05:00
Bjorn Pettersson	354b2c36ee	Pre-commit test cases for (sra (load)) -> (sextload) folds. NFC Add test case to show missing folds for (sra (load)) -> (sextload). Differential Revision: https://reviews.llvm.org/D116929	2022-01-24 15:30:55 +01:00
Matt Arsenault	99e8e17313	Reapply "Revert "GlobalISel: Add G_ASSERT_ALIGN hint instruction" This reverts commit a97e20a3a8a58be751f023e610758310d5664562.	2022-01-24 09:26:52 -05:00
Paul Walker	34aedbe90d	[AArch64] Regenerate CHECK lines for llvm/test/CodeGen/AArch64/sve2-int-mul.ll	2022-01-24 14:11:19 +00:00
Simon Pilgrim	0553f5e61a	[X86] Add cmp-equality bool reductions PR53379 test coverage	2022-01-24 14:05:10 +00:00
Simon Pilgrim	4436d4cd7c	[X86] Rename cmp-with-zero bool reductions Explicitly name them icmp0_* - I'm intending to add PR53379 test coverage shortly	2022-01-24 14:05:10 +00:00
Simon Pilgrim	f7079bf9ee	[X86] Fix v8i8 -> v8i16 typo in bool reductions We were supposed to be testing <8 x i16> reductions	2022-01-24 14:05:09 +00:00
Fraser Cormack	d42678b453	[RISCV] Add side-effect-free vsetvli intrinsics This patch introduces new intrinsics that enable the use of vsetvli in contexts where only the returned vector length is of interest. The pre-existing intrinsics are marked with side-effects, which prevents even trivial optimizations on/across them. These intrinsics are intended to be used in situations where the vector length is fed in turn to RVV intrinsics or to vector-predication intrinsics during loop vectorization, for example. Those codegen paths ensure that instructions are generated with their own implicit vsetvli, so the vector length and vtype can be relied upon to be correct. No corresponding C builtins are planned at this stage, though that is a possibility for the future if the need arises. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D117910	2022-01-24 13:52:08 +00:00
Simon Pilgrim	0e70dd858e	[X86] Add PR46249 test case showing poorly widened select predicate mask	2022-01-24 12:59:30 +00:00
SForeKeeper	70f83f3084	[RISCV] add support for zbkx subextension in MC layer. This patch adds support for zbkx extension from K extension(v1.0.0) in MC layer. Instructions with same functionality and same encoding are defined in the bitmanip extension. It defines {Xperm8, Xperm4} as instruction aliases for xperm.* in Zbp extension. When Zbkx is enabled while Zbp is not, xperm.h will not be available. When Zbkx and Zbp are both enabled, the instructions will be decoded in Zbp format. D94999 (https://reviews.llvm.org/D94999) is the patch that introduces xperm.* instructions. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D117889	2022-01-24 20:38:46 +08:00
Bjorn Pettersson	46cacdbb21	[DAGCombiner] Adjust some checks in DAGCombiner::reduceLoadWidth In code review for D117104 two slightly weird checks were found in DAGCombiner::reduceLoadWidth. They were typically checking if BitsA was a multiple of BitsB by looking at (BitsA & (BitsB - 1)), but such a comparison actually only makes sense if BitsB is a power of two. The checks were related to the code that attempted to shrink a load based on the fact that the loaded value would be right shifted. Afaict the legality of the value types is checked later (typically in isLegalNarrowLdSt), so the existing checks were both overly conservative and wrong whenever ExtVTBits wasn't a power of two. (The latter was a situation triggered by a number of lit tests, so we could not just assert on ExtVTBits being a power of two.) When attempting to simply remove the checks I found some problems that seem to have been guarded by the checks (maybe just out of luck). A typical example would be a pattern like this: t1 = load i96* ptr; t2 = srl t1, 64; t3 = truncate t2 to i64. When DAGCombine is visiting the truncate, reduceLoadWidth is called attempting to narrow the load to 64 bits (ExtVT := MVT::i64). Then the SRL is detected and we set ShAmt to 64. In the past we've bailed out due to i96 not being a multiple of 64. If we simply remove that check then we would end up replacing the load with a new load that would read 64 bits but with a base pointer adjusted by 64 bits. So we would read 32 bits that weren't accessed by the original load. This patch will instead utilize the fact that the logical right shift can be folded away by using a zextload. Thus, the pattern above will now be combined into t3 = load i32* ptr+offset, zext to i64. Another case is shown in the X86/shift-folding.ll test case: t1 = load i32* ptr; t2 = srl i32 t1, 8; t3 = truncate t2 to i16. In the past we bailed out due to the shift count (8) not being a multiple of 16. Now the narrowing kicks in and we get t3 = load i16* ptr+offset. Differential Revision: https://reviews.llvm.org/D117406	2022-01-24 12:22:04 +01:00
Bjorn Pettersson	12a499eb00	Pre-commit test case for trunc+lshr+load folds This is a pre-commit of test cases relevant for D117406. @srl_load_narrowing1 is showing a pattern that could be folded into a more narrow load. @srl_load_narrowing2 is showing a similar pattern that happens to be optimized already, but that happens in two steps (first triggering a combine based on SRL and later another combine based on TRUNCATE). Differential Revision: https://reviews.llvm.org/D117588	2022-01-24 12:22:03 +01:00
Fraser Cormack	af773a1818	[RISCV][VP] Lower VP_MERGE to RVV instructions This patch adds lowering of the llvm.vp.merge.* intrinsic (ISD::VP_MERGE) to RVV vmerge/vfmerge instructions. It introduces a special pseudo form of vmerge which allows a tied merge operand, allowing us to specify the tail elements as being equal to the "on false" operand, using a tied-def constraint and a "tail undisturbed" policy. While this strategy allows us to often lower the intrinsic to just one instruction, it may be less efficient in fixed-vector types as the number of tail elements may extend far beyond the length of the fixed vector. Another strategy could be to use a vmerge/vfmerge instruction with an AVL equal to the length of the vector type, and manipulate the condition operand such that mask elements greater than the operation's EVL are false. I've also observed inefficient codegen in which our 'VF' patterns don't match raw floating-point SPLAT_VECTORs, which occur in scalable-vector code. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D117561	2022-01-24 11:05:05 +00:00
Fraser Cormack	e7926e8d97	[RISCV] Match VF variants for masked VFRDIV/VFRSUB This patch follows up on D117697 to help the simple binary operations behave similarly in the presence of masks. It also enables CGP sinking support for vp.fdiv and vp.fsub intrinsics, now that VFRDIV and VFRSUB are consistently matched with a LHS splat for masked and unmasked variants. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D117783	2022-01-24 10:59:43 +00:00
Abinav Puthan Purayil	912af6b570	[AMDGPU][GlobalISel] Remove the post ':' part of vreg operands in fsh combine tests.	2022-01-24 16:30:40 +05:30
Jay Foad	aa50b93e7c	[AMDGPU][GlobalISel] Add more sign/zero/any-extension tests Add s1 to s16 cases, and for sgprs s1 to s64 and s32 to s64.	2022-01-24 10:16:51 +00:00
Jay Foad	906ebd5830	[AMDGPU][GlobalISel] Regenerate checks in inst-select-*ext.mir	2022-01-24 10:16:51 +00:00
Nikita Popov	0d1308a7b7	[AArch64][GlobalISel] Support returned argument with multiple registers The call lowering code assumed that a returned argument could only consist of one register. Pass an ArrayRef<Register> instead of Register to make sure that all parts get assigned. Fixes https://github.com/llvm/llvm-project/issues/53315. Differential Revision: https://reviews.llvm.org/D117866	2022-01-24 10:55:28 +01:00
Nikita Popov	e7c9a6cae0	[SDAG] Don't move DBG_VALUE instructions after insertion point during scheduling (PR53243) EmitSchedule() shouldn't be touching instructions after the provided insertion point. The change introduced in D83561 performs a scan to the end of the block, and thus may move unrelated instructions. In particular, this ends up moving instructions that have been produced by FastISel and will later be deleted. Moving them means that more instructions than intended are removed. Fix this by stopping the iteration when the insertion point is reached. Fixes https://github.com/llvm/llvm-project/issues/53243. Differential Revision: https://reviews.llvm.org/D117489	2022-01-24 10:50:49 +01:00
Sander de Smalen	4f8fdf7827	[ISEL] Canonicalise constant splats to RHS. SelectionDAG::getNode() canonicalises constants to the RHS if the operation is commutative, but it doesn't do so for constant splat vectors. Doing this early helps making certain folds on vector types, simplifying the code required for target DAGCombines that are enabled before Type legalization. Somewhat to my surprise, DAGCombine doesn't seem to traverse the DAG in a post-order DFS, so at the time of doing some custom fold where the input is a MUL, DAGCombiner::visitMUL hasn't yet reordered the constant splat to the RHS. This patch leads to a few improvements, but also a few minor regressions, which I traced down to D46492. When I tried reverting this change to see if the changes were still necessary, I ran into some segfaults. Not sure if there is some latent bug there. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D117794	2022-01-24 09:38:36 +00:00
Chenbing.Zheng	9aaa74aeef	[RISCV] Add patterns of SET[U]LT_VI for SETCC forms This patch optimizes "li a0, 5 vmsgt[u].vx v10, v8, a0" -> "vmsgt[u].vi v10, v8, 5" Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D118014	2022-01-24 08:50:49 +00:00
jacquesguan	ba16e3c31f	[RISCV] Decouple Zve* extensions and the V extension. According to the spec, there are some differences between V and Zve64d. For example, the vmulh integer multiply variants that return the high word of the product (vmulh.vv, vmulh.vx, vmulhu.vv, vmulhu.vx, vmulhsu.vv, vmulhsu.vx) are not included for EEW=64 in Zve64, but V extension does support these instructions. So we should decouple Zve extensions and the V extension. Differential Revision: https://reviews.llvm.org/D117854	2022-01-24 14:55:21 +08:00
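
Note on commit 46cacdbb21 (DAGCombiner::reduceLoadWidth): the following is a small illustrative sketch in Python, not LLVM code. It models the two points made in that commit message: the (BitsA & (BitsB - 1)) test only answers "is BitsA a multiple of BitsB" when BitsB is a power of two, and a truncate of a right-shifted load can be replaced by a narrower zero-extending load at an adjusted offset. All function names here are hypothetical, memory is assumed little-endian, and shift amounts are assumed to be whole bytes.

```python
def mask_says_multiple(bits_a: int, bits_b: int) -> bool:
    """The bit-mask test quoted in the commit message."""
    return (bits_a & (bits_b - 1)) == 0


def truly_multiple(bits_a: int, bits_b: int) -> bool:
    """The property the mask test was meant to check."""
    return bits_a % bits_b == 0


def load(mem: bytes, byte_offset: int, bits: int) -> int:
    """Little-endian load of a `bits`-wide unsigned integer (assumed layout)."""
    return int.from_bytes(mem[byte_offset:byte_offset + bits // 8], "little")


def wide_load_srl_trunc(mem: bytes, load_bits: int, shift: int, trunc_bits: int) -> int:
    """Reference pattern: t1 = load; t2 = srl t1, shift; t3 = truncate t2."""
    value = load(mem, 0, load_bits)
    return (value >> shift) & ((1 << trunc_bits) - 1)


def narrowed_load(mem: bytes, load_bits: int, shift: int, trunc_bits: int) -> int:
    """Narrowed form: read only the bits that survive the shift.

    Python integers are unbounded, so the zero extension to trunc_bits
    is implicit in the returned value.
    """
    narrow_bits = min(trunc_bits, load_bits - shift)
    return load(mem, shift // 8, narrow_bits)


if __name__ == "__main__":
    # With a power-of-two BitsB both tests agree.
    print(mask_says_multiple(96, 64), truly_multiple(96, 64))
    # With BitsB = 24 they disagree in both directions.
    print(mask_says_multiple(48, 24), truly_multiple(48, 24))
    print(mask_says_multiple(8, 24), truly_multiple(8, 24))

    mem = bytes(range(1, 13))  # 12 bytes backing an i96 value
    # i96 load, srl 64, truncate to i64: only the top 32 bits are needed.
    print(wide_load_srl_trunc(mem, 96, 64, 64) == narrowed_load(mem, 96, 64, 64))
    # i32 load, srl 8, truncate to i16: becomes an i16 load at byte offset 1.
    print(wide_load_srl_trunc(mem, 32, 8, 16) == narrowed_load(mem, 32, 8, 16))
```

In the i96 case the narrowed load reads 32 bits rather than 64, which is the point of the fix: a 64-bit load at the adjusted pointer would touch 4 bytes the original load never accessed.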

41828 Commits