llvm-project

Author	SHA1	Message	Date
Sanjay Patel	267400c9b0	[x86] add tests for fmul/fdiv with identity constant in select arm; NFC	2022-02-01 15:43:28 -05:00
Sanjay Patel	8191472246	[x86] add more tests for select with identity constant; NFC D118644	2022-02-01 15:43:27 -05:00
Stanislav Mekhanoshin	79606ee85c	[AMDGPU] Check atomics aliasing in the clobbering annotation MemorySSA considers any atomic a def to any operation it dominates just like a barrier or fence. That is correct from a memory state perspective, but not required for the no-clobber metadata since we are not using it for reordering. Skip such atomics during the scan just like a barrier if they do not alias with the load. Differential Revision: https://reviews.llvm.org/D118661	2022-02-01 12:33:25 -08:00
David Green	c89cfbd4dd	Revert "[DAG] Extend SearchForAndLoads with any_extend handling" This reverts commit 100763a88fe97b22cd5e3f69d203669aac3ed48f as it was making incorrect assumptions about implicit zero_extends.	2022-02-01 20:18:40 +00:00
Stanislav Mekhanoshin	c2b18a3cc5	[AMDGPU] Allow scalar loads after barrier Currently we cannot convert a vector load into a scalar load if there is a dominating barrier or fence. It is considered a clobbering memory access to prevent memory operations reordering. While reordering is not possible, the actual memory is not being clobbered by a barrier or fence and we can still use a scalar load for a uniform pointer. The solution is not to bail on the first clobbering access but to traverse MemorySSA to the root excluding barriers and fences. Differential Revision: https://reviews.llvm.org/D118419	2022-02-01 11:43:17 -08:00
Chris Bieneman	7a0cbe11fb	[NFC] These tests require a default target These test cases all rely on a default target being specified. Adding the requirement gets the tests properly skipped when LLVM_DEFAULT_TARGET_TRIPLE is unset.	2022-02-01 13:18:39 -06:00
Fangrui Song	1494d064fa	[AMDGPU][test] Add dso_local to prevent preemptible alias resolution	2022-02-01 10:23:45 -08:00
David Green	c40744d4d6	[AArch64] Add some CCMP testing. NFC	2022-02-01 18:15:34 +00:00
Krzysztof Parzyszek	c935f6e048	[Hexagon] Punt on registers without reaching defs in addr mode opt This fixes https://github.com/llvm/llvm-project/issues/52636.	2022-02-01 09:52:59 -08:00
Craig Topper	7eb7810727	[RISCV] Fix a vsetvli insertion bug involving loads/stores. The first phase of the analysis can avoid a vsetvli if an earlier instruction in the block used an SEW and LMUL that when combined with the EEW of the load/store would produce the desired EMUL. If we avoided a vsetvli, this will affect the global analysis we do in the second phase. The third phase where we really insert the vsetvlis needs to agree with the first phase. If it doesn't, we can insert vsetvlis that invalidate the global analysis. In the test case there is a VSETVLI in the preheader that sets SEW=64 and LMUL=1. Inside the loop there is a VADD with SEW=64 and LMUL=1. This VADD is followed by a store that wants SEW=32 LMUL=1/2. Because it has EEW=32 as part of the opcode the SEW=64 LMUL=1 from the VADD can become EMUL=1 for the store. So the first phase determines no vsetvli is needed. The third phase manages CurInfo differently than BBInfo.Change from the first phase. CurInfo is only updated when we see a vsetvli or insert a vsetvli. This was done to allow predecessor block information from the global analysis to be applied to multiple instructions. Since the loop body has no vsetvli we won't update CurInfo for either the VADD or the VSE. This prevented us from checking the store vsetvli elision for the VSE resulting in a vsetvli SEW=32 LMUL=1/2 being emitted which invalidated the global analysis. To mitigate this, I've added a BBLocalInfo variable that more closely matches the first phase propagation. This gets updated based on the VADD and prevents emitting a vsetvli for the store like we did in the first phase. I wonder if we should do an earlier phase to handle the load/store case by adding more pseudo opcodes and changing the SEW/LMUL for those instructions before the insertion analysis. That might be more robust than trying to guarantee two phases make the same decision. Fixes the test from D118629. 
Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D118667	2022-02-01 07:29:01 -08:00
David Green	d9b4577c45	[AArch64] Add signed version of uaddlv test. NFC	2022-02-01 14:51:23 +00:00
Amy Kwan	0d6e64755a	[PowerPC] Update P10 vector insert patterns to use refactored load/stores, and update handling of v4f32 vector insert. This patch updates the P10 patterns with a load feeding into an insertelt to utilize the refactored load and store infrastructure, as well as updating any tests that exhibit any codegen changes. Furthermore, custom legalization is added for v4f32 on Power9 and above to not only assist with adjusting the refactored load/stores for P10 vector insert, but also it enables the utilization of direct moves. Differential Revision: https://reviews.llvm.org/D115691	2022-02-01 08:48:37 -06:00
Nikita Popov	a1dc6d4b83	[AArch64] Do not use ABI alignment for mops.memset.tag Pointer element types do not imply that the pointer is ABI aligned. We should be using either an explicit align attribute here, or fall back to an alignment of 1. This fixes a new element type access introduced in D117764. I don't think this makes any practical difference though, as the lowering does not depend on alignment. Differential Revision: https://reviews.llvm.org/D118681	2022-02-01 14:37:53 +01:00
Alexander Shaposhnikov	d03076223b	[CodeGen][AArch64] Fix typo in legalizer-info-validation.mir	2022-02-01 12:14:25 +00:00
Alexander Shaposhnikov	80c27fbf94	[CodeGen][AArch64] Fix typo in arm64-zero-cycle-zeroing.ll	2022-02-01 12:08:06 +00:00
Simon Pilgrim	d83a96f59f	[DAG] Make it clear mul(x,x) knownbits bit[1] == 0 check should be for x is undef only As raised on rGffd0e464b4b9, if x is poison, this fold is still ok.	2022-02-01 11:32:14 +00:00
Fraser Cormack	e9ceeedf30	[RISCV][3/3] Switch undef -> poison in scalable-vector RVV tests	2022-02-01 11:06:56 +00:00
Fraser Cormack	8d1169cf74	[RISCV][2/3] Switch undef -> poison in fixed-vector RVV tests	2022-02-01 11:06:56 +00:00
Fraser Cormack	414f21ed23	[RISCV][1/3] Switch undef -> poison in VP RVV tests Inspired by a recent Discourse post on undef vs. poison usage, this series of patches should reduce the number of undefs in LLVM tests by around 10%. Only undef vector operands to insertelement/shufflevector have been handled, which are by far the most common we've got. The switchover is split into 3 fairly arbitrary clusters to make it slightly more manageable: vector predication, fixed-length vectors, scalable vectors.	2022-02-01 11:06:55 +00:00
Nikita Popov	ccda3d4ec1	[AArch64] Regenerate test checks (NFC) The check lines were in the wrong order.	2022-02-01 11:55:02 +01:00
Fraser Cormack	b00bce2a93	[RISCV] Add a test showing an incorrect VSETVLI insertion This test shows a loop, whose preheader uses a SEW=64, LMUL=1 vector operation. The loop body starts off with another SEW=64, LMUL=1 VADD vector operation, before switching to a SEW=32, LMUL=1/2 vector store instruction. We can see that the VSETVLI insertion pass omits a VSETVLI before the VADD (thinking it inherits its configuration from the preheader) but does place a SEW=32, LMUL=1/2 VSETVLI before the store. This results in a miscompilation as when the loop comes back around, the VADD is incorrectly configured with SEW=32, LMUL=1/2. It appears to be a bad load/store optimization, as replacing the vector store with an SEW=32, LMUL=1/2 VADD does correctly insert a VSETVLI. The issue is therefore possibly arising from canSkipVSETVLIForLoadStore. Differential Revision: https://reviews.llvm.org/D118629	2022-02-01 10:21:29 +00:00
Bjorn Pettersson	3885879046	[DAGCombine] Add simple folds for SSHLSAT/USHLSAT Do "simplifyShift" and "FoldConstantArithmetic" folds for the SSHLSAT and USHLSAT DAG nodes. This includes folds such as: (shlsat undef/poison, x) -> 0 (shlsat x, undef/poison) -> undef (shlsat x, too_large_shamt) -> undef (shlsat 0, x) -> 0 (shlsat x, 0) -> x (shlsat c1, c2) -> c3 Differential Revision: https://reviews.llvm.org/D118603	2022-02-01 10:51:35 +01:00
Bjorn Pettersson	06105f2ef1	Pre-commit test cases missing SSHLSAT/USHLSAT folds. NFC	2022-02-01 10:51:35 +01:00
David Sherwood	daa80339df	[CodeGen] Support folds of not(cmp(cc, ...)) -> cmp(!cc, ...) for scalable vectors I have updated TargetLowering::isConstTrueVal to also consider SPLAT_VECTOR nodes with constant integer operands. This allows the optimisation to also work for targets that support scalable vectors. Differential Revision: https://reviews.llvm.org/D117210	2022-02-01 09:50:00 +00:00
Jay Foad	d2e5d3512b	[StructurizeCFG] Clean up some boolean not instructions In some cases StructurizeCFG inserts i1 xor instructions to invert predicates. Add a quick loop to clean these up afterwards if we can get away with modifying an existing compare instruction instead. (StructurizeCFG is generally run late in the pipeline so instcombine does not clean them up for us.) Differential Revision: https://reviews.llvm.org/D118623	2022-02-01 09:35:37 +00:00
Changpeng Fang	1194b9cdda	AMDGPU {NFC}: Add code object v5 support and generate metadata for implicit kernel args Summary: Add code object v5 support (default is still v4) Generate metadata for implicit kernel args for the new ABI Set the metadata version to be 1.2 Reviewers: t-tye, b-sumner, arsenm, and bcahoon Fixes: SWDEV-307188, SWDEV-307189 Differential Revision: https://reviews.llvm.org/D118272	2022-01-31 18:07:47 -08:00
Mircea Trofin	9aa2c914b9	[mlgo][regalloc] Factor live interval feature calculation Factoring it out so we can subsequently cache it. This should be an NFC; however, for the float quantities, we see small errors in the least significant digits. This is because, before, we were summing up one by one. Now, we sum up results of sums. This shouldn't matter for ML, and will require rework when we do quantization (avoiding floats altogether), but meanwhile, it did require an update to the reference file used for testing. The patch also bumps the precision of the variables involved in this, to reduce the error (note they are cast back to float at the end by the SET macro, since we only work with float and not double in TF) Differential Revision: https://reviews.llvm.org/D118659	2022-01-31 15:19:15 -08:00
tyb0807	5aa08bf708	[AArch64][SelectionDAG] CodeGen for Armv8.8/9.3 MOPS New target SDNodes are added: AArch64ISD::MOPS_MEMSET, etc. Each intrinsic is translated to one of these in SelectionDAGBuilder via EmitTargetCodeForMOPS. A custom lowering routine for INTRINSIC_W_CHAIN is added to handle llvm.aarch64.mops.memset.tag. This takes a separate path from the common intrinsics but ultimately ends up in the same EmitMOPS(). This is part 4/4 of a series of patches split from https://reviews.llvm.org/D117405 to facilitate reviewing. Patch by Tomas Matheson, Lucas Prates and Son Tuan Vu. Differential Revision: https://reviews.llvm.org/D117764	2022-01-31 20:56:27 +00:00
tyb0807	78fd413cf7	[AArch64][GlobalISel] CodeGen for Armv8.8/9.3 MOPS This implements codegen for Armv8.8/9.3 Memory Operations extension (MOPS). Any memcpy/memset/memmove intrinsics will always be emitted as a series of three consecutive instructions P, M and E which perform the operation. The SelectionDAG implementation is split into a separate patch. AArch64LegalizerInfo will now consider the following generic opcodes if +mops is available, instead of legalising by expanding them to libcalls: G_BZERO, G_MEMCPY_INLINE, G_MEMCPY, G_MEMMOVE, G_MEMSET The s8 value of memset is legalised to s64 to match the pseudos. AArch64O0PreLegalizerCombinerInfo will still be able to combine G_MEMCPY_INLINE even if +mops is present, as it is unclear whether it is better to generate fixed length copies or MOPS instructions for the inline code of small or zero-sized memory operations, so we choose to be conservative for now. AArch64InstructionSelector will select the above as new pseudo instructions: AArch64::MOPSMemory{Copy/Move/Set/SetTagging} These are each expanded to a series of three instructions (e.g. SETP/SETM/SETE) which must be emitted together during code emission to avoid scheduler reordering. This is part 3/4 of a series of patches split from https://reviews.llvm.org/D117405 to facilitate reviewing. Patch by Tomas Matheson and Son Tuan Vu Differential Revision: https://reviews.llvm.org/D117763	2022-01-31 20:54:41 +00:00
Mircea Trofin	afbc7bdf98	[mlgo][regalloc][test] Add comprehensive log output testing	2022-01-31 12:46:18 -08:00
Sam Clegg	3e230d15eb	Revert "[WebAssembly] Refactor and fix emission of external IR global decls" This reverts commit 00bf4755e90c89963a135739218ef49c2417109f. This change broke the emscripten builder (among other things): https://ci.chromium.org/ui/p/emscripten-releases/builders/try/linux/b8823500584349280721/overview Sample failure: ``` test_unistd_unlink (test_core.core0) ... wasm-ld: error: symbol type mismatch: __stdio_write >>> defined as WASM_SYMBOL_TYPE_FUNCTION in /usr/local/google/home/sbc/dev/wasm/emscripten/cache/sysroot/lib/wasm32-emscripten/libc-debug.a(__stdio_write.o) >>> defined as WASM_SYMBOL_TYPE_DATA in /usr/local/google/home/sbc/dev/wasm/emscripten/cache/sysroot/lib/wasm32-emscripten/libc-debug.a(stderr.o) ```	2022-01-31 12:20:56 -08:00
Sanjay Patel	06fd721fe7	[x86] add tests for binop of select with identity constant; NFC	2022-01-31 15:08:00 -05:00
Paul Walker	bcda4c48c8	[SVE] By using SEL when orring predicates we forgo the need for a PTRUE. Differential Revision: https://reviews.llvm.org/D118463	2022-01-31 19:39:23 +00:00
Paul Walker	804915f5dc	[SVE] Extend isel pattern coverage for INCP & DECP. Adds patterns for: add(x, cntp(p, p)) -> incp(x, p) sub(x, cntp(p, p)) -> decp(x, p) Differential Revision: https://reviews.llvm.org/D118567	2022-01-31 19:05:05 +00:00
Florian Hahn	23091f7d50	[AArch64] Bail out for float operands in SetCC optimization. The optimization added in D118139 causes a crash on the added test case while trying to zero extend a vector of floats. Fix the crash by bailing out for floating point operands. Reviewed By: DavidTruby Differential Revision: https://reviews.llvm.org/D118615	2022-01-31 18:20:47 +00:00
Craig Topper	aae947e860	[RISCV] Separate the Zfhmin and Zfh extensions. The spec doesn't seem to be written as if Zfh implies Zfhmin. They seem to be separate extensions. This patch moves the instructions from Zfhmin to be enabled with either the Zfh or Zfhmin extensions. Reviewed By: achieveartificialintelligence Differential Revision: https://reviews.llvm.org/D118581	2022-01-31 09:06:43 -08:00
Jay Foad	8faad29634	Revert "[Local] invertCondition: try modifying an existing ICmpInst" This reverts commit a6b54ddaba2d5dc0f72dcc4591c92b9544eb0016. Apparently it is not safe to modify the condition even if it passes the hasOneUse test, because StructurizeCFG might have other references to the condition that are not manifest in the IR use-def chains.	2022-01-31 14:55:36 +00:00
Kerry McLaughlin	002b944dfa	[SVE] Fix TypeSize->uint64_t implicit conversion in visitAlloca() Fixes a crash ('Invalid size request on a scalable vector') in visitAlloca() when we call this function for a scalable alloca instruction, caused by the implicit conversion of TySize to uint64_t. This patch changes TySize to a TypeSize as returned by getTypeAllocSize() and ensures the allocation size is multiplied by vscale for scalable vectors. Reviewed By: sdesmalen, david-arm Differential Revision: https://reviews.llvm.org/D118372	2022-01-31 14:37:23 +00:00
Dávid Bolvanský	ae990a3cbd	[Analysis] Attribute noundef should not prevent tail call optimization Very similar to https://reviews.llvm.org/D101230 Fixes https://github.com/llvm/llvm-project/issues/53501	2022-01-31 15:13:52 +01:00
Simon Pilgrim	7ec8fc2932	[X86] combineAnd() - per-element simplification - call SimplifyDemandedBits using mask demanded bits if SimplifyDemandedVectorElts fails We already call SimplifyDemandedVectorElts using whether each vector mask element is zero/nonzero, this just extends this to also try SimplifyDemandedBits using the demanded bits mask generated from the nonzero elements. This also requires an additional TargetLowering::SimplifyDemandedBits DemandedBits/DemandedElts wrapper.	2022-01-31 13:58:00 +00:00
Simon Pilgrim	2d1390efbe	[DAG] SimplifyDemandedBits - mul(x,x) - if only demand bit[1] then fold to zero	2022-01-31 12:00:51 +00:00
Simon Pilgrim	48f45f6b25	[X86] Limit mul(x,x) knownbits tests with not undef/poison check We can only assume bit[1] == zero if it's the only demanded bit or the source is not undef/poison	2022-01-31 11:55:10 +00:00
Jay Foad	ae68b3a457	[AMDGPU] Add test for a problem with noclobber metadata If AMDGPUAnnotateUniformValues finds a load from a uniform pointer with no potentially clobbering stores between the kernel entry point and the load instruction, it adds noclobber metadata to the address. This is unsafe because it can get applied to other loads in the same which do have aliasing stores. Differential Revision: https://reviews.llvm.org/D118458	2022-01-31 11:09:34 +00:00
Simon Pilgrim	ffd0e464b4	[X86] Add mul(x,x) tests showing miscompile As raised by @efriedma on D117995 - the source must not be undef/poison to demand any bits in mul(x,x) other than bit[1] https://alive2.llvm.org/ce/z/Cxkjen	2022-01-31 11:07:03 +00:00
Jay Foad	a6b54ddaba	[Local] invertCondition: try modifying an existing ICmpInst This avoids various cases where StructurizeCFG would otherwise insert an xor i1 instruction, and since it generally runs late in the pipeline, instcombine does not clean up the xor-of-cmp pattern. Differential Revision: https://reviews.llvm.org/D118478	2022-01-31 10:44:17 +00:00
Paulo Matos	00bf4755e9	[WebAssembly] Refactor and fix emission of external IR global decls This patch fixes the visibility and linkage information of symbols referring to IR globals. Emission of external declarations is now done in the first execution of emitConstantPool rather than in emitLinkage (and a few other places). This is the point where we have already gathered information about used symbols (by running the MC Lower PrePass) and not yet started emitting any functions so that any declarations that need to be emitted are done so at the top of the file before any functions. This changes the order of a few directives in the final asm file which required an update to a few tests. Reviewed By: sbc100 Differential Revision: https://reviews.llvm.org/D118122	2022-01-31 11:42:21 +01:00
Craig Topper	e1075186a6	[RISCV] Custom lower brev8 intrinsic to RISCVISD::GREV. We can use the RISCVISD::GREV encoding that swaps the bits in each byte. This allows it to use the existing computeKnownBits support for RISCVISD::GREV.	2022-01-30 12:41:09 -08:00
Simon Pilgrim	156f83adc2	[X86] combineVectorTruncation - use PACKUSDW(BLENDW(X,0),BLENDW(Y,0)) for v8i32->v8i16 truncation Limit this to SSE41 - AVX1 targets to avoid UNPCKL(PSHUFB,PSHUFB), pre-SSE41 we don't have PACKUSDW/BLENDW and with AVX2 we can perform this as PERMQ(PSHUFB()).	2022-01-30 20:07:04 +00:00
Simon Pilgrim	2cdbaca394	[X86] Attempt to fold MOVMSK(CMPEQ(AND(X,C1),0)) -> MOVMSK(NOT(SHL(X,C2))) Allows pow2 mask tests to avoid an unnecessary constant load. Noticed while investigating how to extend MatchVectorAllZeroTest to support more allof/anyof patterns.	2022-01-30 15:53:21 +00:00
Simon Pilgrim	4e3ba526bf	[X86] Add tests showing failure to fold MOVMSK(CMPEQ(AND(X,C1),0)) -> MOVMSK(NOT(SHL(X,C2))) This would allow pow2 mask tests to avoid an unnecessary constant load. Noticed while investigating how to extend MatchVectorAllZeroTest to support more allof/anyof patterns.	2022-01-30 15:42:59 +00:00
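Several commits in this range (ffd0e464b4, 48f45f6b25, 2d1390efbe, d83a96f59f) depend on one known-bits fact. When both operands of mul(x,x) are the same value, bit[1] of the product is always zero, because x*x mod 4 is 0 for even x and 1 for odd x.

The Python sketch below is an illustration, not LLVM code. It checks the fact exhaustively for an assumed 8-bit width; the argument holds for any width of at least 2 bits, since 4 divides 2^n. It also models undef as an operand that may take a different value at each use, which is why the fact cannot be relied on when x is undef.

```python
# Illustration only: exhaustive check of the mul(x, x) bit[1] fact at an assumed 8-bit width.
BITS = 8
MASK = (1 << BITS) - 1


def square_wrapped(x):
    """x * x with the wraparound of a BITS-wide integer multiply."""
    return (x * x) & MASK


def bit(value, i):
    return (value >> i) & 1


# x*x mod 4 is 0 (x even) or 1 (x odd), so bit[1] of the square is always 0.
bit1_always_zero = all(bit(square_wrapped(x), 1) == 0 for x in range(1 << BITS))

# Modelling undef: each use of undef may observe a different value, so
# mul(undef, undef) behaves like mul(a, b) with a != b, and bit[1] can be set.
counterexample = next(
    (a, b)
    for a in range(1 << BITS)
    for b in range(1 << BITS)
    if bit((a * b) & MASK, 1) == 1
)

print(bit1_always_zero, counterexample)
```

Running it prints `True (1, 2)`. No square sets bit[1], but two independently chosen operands such as 1 and 2 do.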


41974 Commits
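Commit 2cdbaca394 (tests pre-committed in 4e3ba526bf) folds MOVMSK(CMPEQ(AND(X,C1),0)) into MOVMSK(NOT(SHL(X,C2))) when C1 is a power of two. Below is a minimal Python sketch of why the two forms agree per lane. It rests on three assumptions:

- lanes are 8 bits wide;
- MOVMSK is modelled as packing each lane's sign bit;
- C2 = lane width - 1 - log2(C1). The commit message does not state this relation.

Shifting the tested bit into the sign position and inverting it yields the same sign bit the compare-with-zero produces. The mask constant C1 therefore never needs to be loaded.

```python
# Per-lane model of MOVMSK(CMPEQ(AND(X, C1), 0)) vs MOVMSK(NOT(SHL(X, C2))).
# Assumptions: 8-bit lanes, C1 = 1 << k, C2 = LANE_BITS - 1 - k.
LANE_BITS = 8
LANE_MASK = (1 << LANE_BITS) - 1


def movmsk(lanes):
    """Pack the sign bit of each lane into an integer (lane 0 -> bit 0)."""
    mask = 0
    for i, v in enumerate(lanes):
        mask |= ((v >> (LANE_BITS - 1)) & 1) << i
    return mask


def original_form(lanes, k):
    """MOVMSK(CMPEQ(AND(X, C1), 0)) with C1 = 1 << k."""
    c1 = 1 << k
    # CMPEQ yields an all-ones lane where the tested bit is clear, else zero.
    cmp = [LANE_MASK if (x & c1) == 0 else 0 for x in lanes]
    return movmsk(cmp)


def folded_form(lanes, k):
    """MOVMSK(NOT(SHL(X, C2))), assuming C2 = LANE_BITS - 1 - k."""
    c2 = LANE_BITS - 1 - k
    # SHL moves the tested bit into the sign position; NOT inverts it.
    shifted = [~((x << c2) & LANE_MASK) & LANE_MASK for x in lanes]
    return movmsk(shifted)


# Compare both forms for every single-lane value and every tested bit position.
equivalent = all(
    original_form([x], k) == folded_form([x], k)
    for k in range(LANE_BITS)
    for x in range(1 << LANE_BITS)
)
print(equivalent)  # True
```

As a multi-lane example, testing bit 0 of lanes `[1, 0, 5, 8]` gives `0b1010` under both forms. Lanes 1 and 3 are the ones whose bit 0 is clear.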