llvm-project

Author	SHA1	Message	Date
Piyou Chen	a70aa5ea7c	[RISCV] precommit for removing useless copy from undef subreg testcase from https://github.com/llvm/llvm-project/issues/63554 Reviewed By: kito-cheng Differential Revision: https://reviews.llvm.org/D155039	2023-07-20 20:38:24 -07:00
Philip Reames	4f057f5296	[RISCV] Expand memset.inline test coverage [nfc] Add coverage for unaligned overlap cases, and for vector stores. Note that the vector memset here is coming from store combining, not memset lowering.	2023-07-20 17:37:36 -07:00
Matt Arsenault	d33ab05467	AMDGPU: Add flag to disable fdiv processing in IR pass We kind of have to have multiple implementations of fdiv split between the two selectors with some pre-processing. Add yet another test to check for consistency of interpretation of flag combinations. We have quite a bit of test redundancy here already, but there are so many possible interesting permutations it's unwieldy to cover every detail in any one of them. We have a number of overlapping fdiv tests but it's hard to follow everything going on as it is.	2023-07-20 19:51:15 -04:00
Matt Arsenault	b2d58b596c	AMDGPU: Expand rsq testing to cover contract flag The 1.0/sqrt(x) -> rsq(x) fold increases precision and probably needs a contract flag.	2023-07-20 19:51:15 -04:00
Matt Arsenault	fb54afd1b7	AMDGPU: Fold fsub [+-0] into fneg when folding source modifiers This isn't always folded to fneg for a freestanding fsub depending on the denormal mode. When matching source modifiers, we're implicitly canonicalizing the input so we can fold it here. Doesn't bother handling the VOP3P case since it's only relevant with DAZ, which nobody really uses with f16. For f64, tests show an existing bug where DAGCombiner tries to respect the denormal mode for fsub -0, x, but not after it's lowered to fadd -0, (fneg x). Either the fold is wrong or we shouldn't restrict the fsub case based on the denormal mode. https://reviews.llvm.org/D155652	2023-07-20 19:29:40 -04:00
Matt Arsenault	881e9f2934	AMDGPU: Regenerate test checks Mostly a workaround for recent reverts in update_test_checks	2023-07-20 19:26:35 -04:00
Matt Arsenault	ca34f1bdcd	AMDGPU: Add baseline test for folding fsub into fneg modifiers	2023-07-20 18:29:35 -04:00
Matt Arsenault	0295513238	AMDGPU: Filter out contract flags when lowering exp It is unsafe to contract the fsub into the fmul. It also increases code size by duplicating a constant.	2023-07-20 18:14:24 -04:00
Matt Arsenault	076bc374fc	AMDGPU: Add some new baseline tests for exp lowering	2023-07-20 18:14:24 -04:00
Philip Reames	34c01a6044	[RISCV] Add memset.inline test coverage with and without V [nfc]	2023-07-20 15:03:53 -07:00
Philip Reames	eb3f2fe467	[RISCV] Revise check names for unaligned memory op tests [nfc] This has come up a few times in review; the current ones seem to be universally confusing. Even I as the original author of most of these get confused. Switch to using the SLOW/FAST naming used by x86, hopefully that's a bit clearer.	2023-07-20 13:36:53 -07:00
Jingu Kang	351b4c17dd	Revert "[MachineLICM] Handle Subloops" This reverts commit 50dd383d08670960540fecb4b48c0f0429fbfba3.	2023-07-20 17:12:25 +01:00
Nikita Popov	9dc391e89c	Revert "[IR] Mark add constant expressions as undesirable" This reverts commit f8a36d8c3e264c4fccf8058e699201a452ea7bb7. I believe this is causing an assertion failure on the sanitizer-x86_64-linux buildbot: clang++: /b/sanitizer-x86_64-linux/build/llvm-project/llvm/include/llvm/Support/Casting.h:578: decltype(auto) llvm::cast(From *) [To = llvm::BinaryOperator, From = llvm::Value]: Assertion `isa<To>(Val) && "cast<Ty>() argument of incompatible type!"' failed. #10 0x000055bdd7e82408 canonicalizeLogicFirst(llvm::BinaryOperator&, llvm::IRBuilder<llvm::TargetFolder, llvm::IRBuilderCallbackInserter>&) /b/sanitizer-x86_64-linux/build/llvm-project/llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp:2131:5 #11 0x000055bdd7e80183 llvm::InstCombinerImpl::visitAnd(llvm::BinaryOperator&) /b/sanitizer-x86_64-linux/build/llvm-project/llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp:2661:20 Likely the code is encountering a constant expression in a case it didn't before.	2023-07-20 18:09:17 +02:00
Jingu Kang	50dd383d08	[MachineLICM] Handle Subloops Following discussion on https://reviews.llvm.org/D154205, make the MachineLICM pass handle subloops by visiting the outermost loop's blocks only once. Differential Revision: https://reviews.llvm.org/D154205	2023-07-20 16:39:13 +01:00
Jingu Kang	8bad7ad6d6	[AArch64] Reuse larger DUPLANE if available When combining DUP, try to reuse a larger DUPLANE. Differential Revision: https://reviews.llvm.org/D155592	2023-07-20 15:49:33 +01:00
Kevin P. Neal	95c2d01dfe	[FPEnv][RISCV] Correct strictfp tests. Correct RISC-V strictfp tests to follow the rules documented in the LangRef: https://llvm.org/docs/LangRef.html#constrained-floating-point-intrinsics Mostly these tests just needed the strictfp attribute on function definitions. I've also removed the strictfp attribute from uses of the constrained intrinsics because it comes by default since D154991, but I only did this in tests I was changing anyway. Test changes verified with D146845.	2023-07-20 10:16:56 -04:00
Jake Egan	311abf5fc0	Implement -frecord-command-line for XCOFF integrated assembler path The patch D153600 implemented `-frecord-command-line` for the XCOFF direct assembly path. This patch adds support for the XCOFF integrated assembly path. Reviewed By: scott.linder Differential Revision: https://reviews.llvm.org/D154921	2023-07-20 09:45:37 -04:00
Nikita Popov	f8a36d8c3e	[IR] Mark add constant expressions as undesirable In preparation for removing support for add expressions, mark them as undesirable. As such, we will no longer implicitly create such expressions, but they still exist.	2023-07-20 15:24:19 +02:00
Danila Malyutin	e1aa4e7b38	[Statepoint] Use correct RegisterClass for spilling Copy propagation might have changed the register class of the register Differential Revision: https://reviews.llvm.org/D155792	2023-07-20 16:00:00 +03:00
Simon Pilgrim	cc77da5020	[X86] LowerTRUNCATE - use LowerTruncateVecPackWithSignBits for prefer-256 bit AVX512 cases during type legalization If the AVX512 target will split the 512-bit vector truncation then try to use PACKSS/PACKUS first.	2023-07-20 13:55:28 +01:00
Thorsten Schütt	9d138baeb5	[GIsel][AArch64] extend legalization of G_INSERT_VECTOR_ELT Fixes https://github.com/llvm/llvm-project/issues/63826 Reviewed By: aemerson Differential Revision: https://reviews.llvm.org/D155274	2023-07-20 13:40:00 +02:00
Simon Pilgrim	7567b72f4d	[DAG] ShrinkDemandedConstant - early-out for empty DemandedBits/Elts Leave this to constant folding in SimplifyDemandedBits Fixes #63975	2023-07-20 12:18:10 +01:00
Simon Pilgrim	697f60598e	[DAG] hoistLogicOpWithSameOpcodeHands - ensure SIGN_EXTEND_INREG nodes have the same extension value type Fix bug in the check for matching SIGN_EXTEND_INREG types	2023-07-20 10:44:46 +01:00
Simon Pilgrim	f1cc7913f3	[X86] Add test case showing incorrect and(sextinreg(v0,i2),sextinreg(v1,i5)) -> sextinreg(and(v0,v1),i2) fold	2023-07-20 10:44:46 +01:00
David Green	0c41c59dee	[DAG][AArch64] Fix truncated vscale constant types It appears that vscale values truncated to i1 cause mismatches in the constant types when created in getNode. https://godbolt.org/z/TaaTo86ne. Differential Revision: https://reviews.llvm.org/D155626	2023-07-20 09:12:05 +01:00
Fangrui Song	82b4368f7f	[llvm-readobj] Print <null> for relocation target with an empty name For a relocation, we don't differentiate the two cases: * the symbol index is 0 * the symbol index is non zero, the type is not STT_SECTION, and the name is empty. Clang generates such local symbols for RISC-V linker relaxation. So we may print ``` Offset Info Type Symbol's Value Symbol's Name + Addend 000000000000001c 0000000100000039 R_RISCV_32_PCREL 0000000000000000 0 // llvm-readobj 0x1C R_RISCV_32_PCREL - 0x0 ``` while GNU readelf prints "<null>", which is clearer. Let's match the GNU behavior. Related to https://reviews.llvm.org/D81842 ``` 000000000000001c 0000000100000039 R_RISCV_32_PCREL 0000000000000000 <null> + 0 // llvm-readobj 0x1C R_RISCV_32_PCREL <null> 0x0 ``` Reviewed By: jhenderson, kito-cheng Differential Revision: https://reviews.llvm.org/D155353	2023-07-20 00:42:38 -07:00
Fangrui Song	94830bf56c	[WebAssembly] Use SetVector to stabilize iteration order after D120365 StringMap iteration order is not guaranteed to be deterministic (https://llvm.org/docs/ProgrammersManual.html#llvm-adt-stringmap-h).	2023-07-20 00:02:06 -07:00
Freddy Ye	1c154bd755	[X86] Add AVX-VNNI-INT16 instructions. For more details about these instructions, please refer to the latest ISE document: https://www.intel.com/content/www/us/en/develop/download/intel-architecture-instruction-set-extensions-programming-reference.html Reviewed By: pengfei, skan Differential Revision: https://reviews.llvm.org/D155145	2023-07-20 14:31:16 +08:00
Danila Malyutin	76fd79b9d5	[X86] Recognize standalone `(1 << nbits) - 1` pattern as bzhi This can be thought as a subcase of `x & ((1 << nbits) - 1)` where x == -1 Differential Revision: https://reviews.llvm.org/D155622	2023-07-20 09:18:23 +03:00
Danila Malyutin	c1013a6eee	[X86][AArch64] Add additional extract_lowbits test Check that vreg_width-1 mask is only removed for shifts Differential Revision: https://reviews.llvm.org/D155734	2023-07-20 09:18:19 +03:00
Freddy Ye	049d6a3f42	[X86] Add SM4 instructions. For more details about these instructions, please refer to the latest ISE document: https://www.intel.com/content/www/us/en/develop/download/intel-architecture-instruction-set-extensions-programming-reference.html Reviewed By: pengfei, skan Differential Revision: https://reviews.llvm.org/D155148	2023-07-20 13:35:15 +08:00
Freddy Ye	c6f66de21a	[X86] Add SM3 instructions. For more details about these instructions, please refer to the latest ISE document: https://www.intel.com/content/www/us/en/develop/download/intel-architecture-instruction-set-extensions-programming-reference.html Reviewed By: pengfei Differential Revision: https://reviews.llvm.org/D155147	2023-07-20 10:24:16 +08:00
Freddy Ye	fc3b7874b6	[X86] Add SHA512 instructions. For more details about this instruction, please refer to the latest ISE document: https://www.intel.com/content/www/us/en/develop/download/intel-architecture-instruction-set-extensions-programming-reference.html Reviewed By: RKSimon, skan Differential Revision: https://reviews.llvm.org/D155146	2023-07-20 09:44:44 +08:00
Amara Emerson	ccffc27050	[AArch64][GlobalISel] Widen (<2 x s16> = G_BUILD_VECTOR) to <2 x s32>. We don't support this as an argument or return type, it's always promoted to <2 x s32>. Performing the widening prevents us from having selection failures due to unsupported extends. Fixes https://github.com/llvm/llvm-project/issues/58274	2023-07-19 16:50:54 -07:00
Craig Topper	7dfe62327d	[RISCV] Add a DAG combine for (czero_eq X, (xor Y, 1)) -> (czero_ne X, Y) if Y is 0 or 1. This is an alternative to D155288 that can handle other sources of xori like FP compares. Unfortunately, it misses the i64 setge case on RV32 in condops.ll. Reviewed By: asb Differential Revision: https://reviews.llvm.org/D155328	2023-07-19 12:33:08 -07:00
Simon Pilgrim	310a9a4f28	[X86] matchBinaryShuffle - relax PACKSS for v2i64 -> v4i32 shuffle truncation pattern match. Similar to combineVectorSignBitsTruncation, we don't require all-signbits source inputs, just enough signbits to reach into the lowest i16 to safely use PACKSSDW.	2023-07-19 18:58:21 +01:00
Momchil Velikov	4c95f79cce	[CodeGenPrepare] Refactor optimizeSelectInst (NFC) Refactor to use BasicBlockUtils functions and make life easier for a subsequent patch for updating the dominator tree. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D154053	2023-07-19 18:56:44 +01:00
Johannes Doerfert	d015018cb7	[AMDGPUAttributor][FIX] No endless recursion for recursive initializers Fixes: https://github.com/llvm/llvm-project/issues/63956	2023-07-19 10:27:01 -07:00
Craig Topper	3055c5815a	[RISCV] Upgrade Zvfh version to 1.0 and move out of experimental state. This has been ratified according to https://wiki.riscv.org/display/HOME/Recently+Ratified+Extensions Differential Revision: https://reviews.llvm.org/D155668	2023-07-19 10:03:57 -07:00
Luke Lau	efedcbeeb8	[RISCV] Fold ops into vmv.v.v as vmerge with all-ones mask A vmv.v.v shares the same encoding as a vmerge that isn't masked, so we can also fold it into its operands if we treat it as a vmerge with an all-ones mask. We take care here not to actually transform the existing vmv into a vmerge, otherwise things like True.hasOneUse() become inaccurate. Instead this just returns an equivalent list of operands. This is an alternative to D153351. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D155101	2023-07-19 17:24:42 +01:00
Luke Lau	0f277ab361	[RISCV] Fold vmerge into its ops with smaller VL if known Currently when folding vmerge into its operands, we stop if the VLs aren't identical. However since the body of (vmerge (vop)) is the intersection of vmerge and vop's bodies, we can use the smaller of the two VLs if we know it ahead of time. This patch relaxes the constraint on VL if they are both constants, or if either of them are VLMAX. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D155071	2023-07-19 17:24:40 +01:00
Luke Lau	66dc29a82a	[RISCV] Add tests for merges with differing VLs that could be folded Reviewed By: reames Differential Revision: https://reviews.llvm.org/D155069	2023-07-19 17:24:38 +01:00
Simon Pilgrim	db50b77ed4	[X86] matchBinaryShuffle - match PACKSS for v2i64 -> v4i32 all-signbits shuffle truncation patterns. Ideally matchShuffleWithPACK should be able to handle this, but it needs a major rewrite to handle illegal types.	2023-07-19 17:02:11 +01:00
Djordje Todorovic	80e20c8a8d	[RISCV] Add DAG combine for CTTZ/CTLZ in the case of input 0 Within the AggressiveInstCombine Pass we have an analysis/optimization that matches the pattern of the Table Based CTZ. Some Targets do not support/define ctz(0), but since AggressiveInstCombine is just an extension of InstCombine, it should be a target-independent canonicalization Pass, and therefore, we decided to introduce several instructions, such as select and compare, that produce canonical IR even if the input is 0. The task for the Targets that do support that input is to handle such a case and to produce an optimal assembly. This patch optimizes the CTTZ/CTLZ instructions if the input is 0 by performing a DAG combine that generates the `cttz(x) & 0x1f` pattern (the same goes for ctlz as well). Differential Revision: https://reviews.llvm.org/D151449	2023-07-19 16:22:04 +02:00
Simon Pilgrim	6cf8bde056	[X86] getFauxShuffleMask - add SIGN_EXTEND_VECTOR_INREG handling for all-signbits sources Add support for shuffle combines (via combineEXTEND_VECTOR_INREG) to begin from SIGN_EXTEND_VECTOR_INREG nodes	2023-07-19 14:32:34 +01:00
Simon Pilgrim	32ed3031fa	[X86] Add test coverage for Issue #63946	2023-07-19 14:05:13 +01:00
Simon Pilgrim	70893b62cf	[X86] matchUnaryShuffle - match SIGN_EXTEND_VECTOR_INREG patterns for 'all-signbits' sources Adapt the existing ANY/ZERO_EXTEND_VECTOR_INREG shuffle matching to also recognise SIGN_EXTEND_VECTOR_INREG patterns to handle cases where we're effectively "splatting" all-signbits sources.	2023-07-19 14:05:13 +01:00
John Brawn	cee7e7b245	[ARM] Correctly handle execute-only in EmitStructByval Currently when compiling for an execute-only target without movt then EmitStructByval will generate a constant pool load which isn't compatible with execute-only. Handle this by emitting tMOVi32imm, and also simplify the existing movt handling by emitting t2MOVi32imm or MOVi32imm. Differential Revision: https://reviews.llvm.org/D154944	2023-07-19 13:56:36 +01:00
John Brawn	1b12b1a335	[ARM] Restructure MOVi32imm expansion to not do pointless instructions The expansion of the various MOVi32imm pseudo-instructions works by splitting the operand into components (either halfwords or bytes) and emitting instructions to combine those components into the final result. When the operand is an immediate with some components being zero this can result in pointless instructions that just add zero. Avoid this by restructuring things so that a separate function handles splitting the operand into components, then don't emit the component if it is a zero immediate. This is straightforward for movw/movt, where we just don't emit the movt if it's zero, but the thumb1 expansion using mov/add/lsl is more complex, as even when we don't emit a given byte we still need to get the shift correct. Differential Revision: https://reviews.llvm.org/D154943	2023-07-19 13:56:36 +01:00
Ties Stuij	84f888ca82	[ARM] don't emit constant pool for Thumb1 XO/stack guard combo Currently for armv6-m and armv8-m.baseline, we emit constant pool code when we use execute-only (XO) in combination with stack guards. XO is a new feature for armv6-m, and this patch is part of a series of patches that substitutes constant pool generation with the tMOVi32imm equivalent. However XO for armv8-m.baseline has been available for about 6 years, and so for armv8-m.baseline this is a bugfix. Reviewed By: simonwallis2, olista01 Differential Revision: https://reviews.llvm.org/D155170	2023-07-19 13:51:43 +01:00
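
Note on commit 76fd79b9d5 above: its message describes a standalone `(1 << nbits) - 1` as the `x == -1` subcase of `x & ((1 << nbits) - 1)`. The following is a minimal Python sketch of that identity, not code from the patch. `bzhi32`, `low_bits_mask`, and `MASK32` are hypothetical names introduced here, and `bzhi32` is only an assumed model of the 32-bit BZHI instruction (clear all bits at positions at or above the low 8 bits of the index, and leave the source unchanged when that index is 32 or more). `nbits` is restricted to 0..31 because a 32-bit shift by 32 is undefined in C.

```python
MASK32 = 0xFFFFFFFF  # all-ones 32-bit value, i.e. x == -1 as an unsigned word


def bzhi32(src: int, index: int) -> int:
    """Assumed model of 32-bit BZHI: zero the bits of src at positions >= index[7:0]."""
    n = index & 0xFF
    if n >= 32:
        # Index beyond the operand width: the source is returned unchanged.
        return src & MASK32
    return src & ((1 << n) - 1)


def low_bits_mask(nbits: int) -> int:
    """The standalone pattern from the commit title, truncated to 32 bits."""
    return ((1 << nbits) - 1) & MASK32


for nbits in range(32):
    # Standalone mask == BZHI applied to an all-ones source.
    assert low_bits_mask(nbits) == bzhi32(MASK32, nbits)
    # The general form x & mask == BZHI applied to x.
    assert (0x12345678 & low_bits_mask(nbits)) == bzhi32(0x12345678, nbits)
```

The loop checks both the general masking form and the all-ones special case against the same model, which is the relationship the commit message relies on.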

52796 Commits