llvm-project

Author	SHA1	Message	Date
Simon Pilgrim	328754474a	[DAG] SimplifySetCC - clang-format add/xor/sub with constant handling. NFC.	2022-04-04 13:30:17 +01:00
David Green	2abaa027d9	[AArch64] Teach the costmodel about widening muls A vector mul(sext, sext) or mul(zext, zext) will be code generated as a single smull or umull instruction. This most notably affects v2i64 multiplies, which are otherwise not legal and need to be expanded. The oneuse check has also been slightly changed, as it is already checked from the use of isWideningInstruction in getCastInstrCost. Differential Revision: https://reviews.llvm.org/D123006	2022-04-04 12:45:04 +01:00
Nikita Popov	3c9f3f76f1	[ConstantFold] Fold zero-index GEPs with opaque pointers With opaque pointers, we can eliminate zero-index GEPs even if they have multiple indices, as this no longer impacts the result type of the GEP. This optimization is already done for instructions in InstSimplify, but we were missing the corresponding constant expression handling. The constexpr transform is a bit more powerful, because it can produce a vector splat constant and also handles undef values -- it is an extension of an existing single-index transform.	2022-04-04 13:04:27 +02:00
Jeremy Morse	059d1f84d2	[DebugInfo] Correctly recognize bitfields when emitting dwarf Use the "isBitfield" flag for debug types to determine whether something is a bitfield, rather than trying to guess from its layout. Fixes https://bugs.llvm.org/show_bug.cgi?id=44601 Patch by: mahkoh Differential Revision: https://reviews.llvm.org/D96334	2022-04-04 11:14:13 +01:00
Simon Pilgrim	623d4b5787	[X86] Support optional NOT stages in the AND(SRL(X,Y),1) -> SETCC(BT(X,Y)) fold Extension to D122891, peek through NOT() ops, adjusting the condcode as we go.	2022-04-04 10:51:26 +01:00
Florian Hahn	1817c526e1	[VPlan] Update VPInterleavedAccessInfo to use getVectorLoopRegion. Update VPInterleavedAccessInfo to use the generic getVectorLoopRegion helper instead of relying on the entry block being the top-most vector loop region.	2022-04-04 10:26:39 +01:00
Martin Sebor	5ccfd5f6d4	[SimplifyLibCalls] Optimize memchr() with known char+str and unknown length If both the character and string are known, but the length potentially isn't, we can optimize the memchr() call to a select of either the known position of the character or null. Split off from https://reviews.llvm.org/D122836.	2022-04-04 11:01:33 +02:00
Martin Sebor	5197d2791f	[SimplifyLibCalls] Move handling of constant char earlier (NFC) Handle the simple constant char case before the bitmask optimization. This will allow extending the code to handle a non-constant size argument in a followup change. Split out from https://reviews.llvm.org/D122836.	2022-04-04 11:01:33 +02:00
Martin Sebor	d18991debf	[SimplifyLibCalls] Fold memchr() with size 1 If the memchr() size is 1, then we can convert the call into a single-byte comparison. This works even if both the string and the character are unknown. Split off from https://reviews.llvm.org/D122836.	2022-04-04 10:41:20 +02:00
Florian Hahn	8cd1892725	[VPlan] Remember previous loop and reset vector loop. At the moment this is NFC, but will be needed once nested loops are also modeled as regions. Preparation for D123005.	2022-04-04 09:27:15 +01:00
Nikita Popov	a5c3b5748c	[MemCpyOpt] Work around PR54682 As discussed on https://github.com/llvm/llvm-project/issues/54682, MemorySSA currently has a bug when computing the clobber of calls that access loop-varying locations. I think a "proper" fix for this on the MemorySSA side might be non-trivial, but we can easily work around this in MemCpyOpt: Currently, MemCpyOpt uses a location-less getClobberingMemoryAccess() call to find a clobber on either the src or dest location, and then refines it for the src and dest clobber. This was intended as an optimization, as the location-less API is cached, while the location-affected APIs are not. However, I don't think this really makes a difference in practice, because I don't think anything will use the cached clobbers on those calls later anyway. On CTMark, this patch seems to be very mildly positive actually. So I think this is a reasonable way to avoid the problem for now, though MemorySSA should also get a fix. Differential Revision: https://reviews.llvm.org/D122911	2022-04-04 10:19:51 +02:00
Nikita Popov	c0cc98251a	[Float2Int] Make sure dependent ranges are calculated first (PR54669) The range calculation in walkForwards() assumes that the ranges of the operands have already been calculated. With the used visit order, this is not necessarily the case when there are multiple roots. (There is nothing guaranteeing that instructions are visited in topological order.) Fix this by queuing instructions for reprocessing if the operand ranges haven't been calculated yet. Fixes https://github.com/llvm/llvm-project/issues/54669. Differential Revision: https://reviews.llvm.org/D122817	2022-04-04 10:18:39 +02:00
Min-Yih Hsu	fccdc5618d	[M68k] Adopt VarLenCodeEmitter for shift / rotate instructions This patch is covered by existing MC tests.	2022-04-03 22:52:32 -07:00
Argyrios Kyrtzidis	5877df735d	[Support/BLAKE3] CMake: Remove the workaround that checks for "CC=ccache /path/to/clang" The LLVM builders that were doing that have been updated to use "-DLLVM_CCACHE_BUILD=ON" instead.	2022-04-03 21:02:02 -07:00
Augie Fackler	603ae73146	AttributorAttributes: guard against TLI being nullptr I didn't dig into this very much because it appears to be totally valid (especially once these properties can come from attributes instead of only from hard-coded library functions) for TLI to not be defined, and nothing broke when I added this check, including with all my other patches applied. Differential Revision: https://reviews.llvm.org/D122917	2022-04-03 23:19:23 -04:00
Augie Fackler	e90bce8f91	CallBase: fix getFnAttr so it also checks the function Prior to this change, CallBase::hasFnAttr checked the called function to see if it had an attribute if it wasn't set on the CallBase, but getFnAttr didn't do the same delegation, which led to very confusing behavior. This patch fixes the issue by making CallBase::getFnAttr also check the function under the same circumstances. Test changes look (to me) like they're cleaning up redundant attributes which no longer get specified both on the callee and call. We also clean up the one ad-hoc implementation of this getter over in InlineCost.cpp. Differential Revision: https://reviews.llvm.org/D122821	2022-04-03 23:19:23 -04:00
Philip Reames	88de27e3fd	[LV] Handle non-integral types when considering interleave widening legality In general, anywhere we might need to insert a blind bitcast, we need to make sure the types are losslessly convertible. This fixes pr54634.	2022-04-03 20:16:20 -07:00
Philip Reames	7c51669c21	[memcpyopt] Restructure store(load src, dest) form of callslotopt for compile time The search for the clobbering call is fairly expensive if uses are not optimized at construction. Defer the clobber walk to the point in the implementation we need it; there are a bunch of bailouts before that point. (e.g. If the source pointer is not an alloca, we can't do callslotopt.) On a test case which involves a bunch of copies from argument pointers, this switches memcpyopt from > 1/2 second to < 10ms.	2022-04-03 20:16:20 -07:00
Xiang1 Zhang	f830392be7	Correct spelling error in TLS-Load-Hoist	2022-04-04 08:27:54 +08:00
David Green	3c88ff44c5	[AArch64] Remove unused WideningBaseCost. NFC The WideningBaseCost is always 0. This removes it to clean up the code.	2022-04-03 22:16:39 +01:00
Kazu Hirata	d3684c3359	[IR] Remove unused forward declarations (NFC)	2022-04-03 12:54:54 -07:00
Kazu Hirata	e5121be910	Revert "Apply clang-tidy fixes for readability-redundant-declaration in Debug.cpp (NFC)" This reverts commit 0fe01a9346658c0955b68b123f2b470b018114b1. The commit caused build failures like: llvm/lib/Support/Debug.cpp:65:3: error: ‘setCurrentDebugTypes’ was not declared in this scope; did you mean ‘setCurrentDebugType’?	2022-04-03 08:14:11 -07:00
Kazu Hirata	0fe01a9346	Apply clang-tidy fixes for readability-redundant-declaration in Debug.cpp (NFC)	2022-04-03 08:04:12 -07:00
Kazu Hirata	c45d369ced	Apply clang-tidy fixes for readability-redundant-member-init in YAMLParser.cpp (NFC)	2022-04-03 08:04:11 -07:00
Simon Pilgrim	fbfd78f7aa	[X86] lowerShuffleAsRepeatedMaskAndLanePermute - allow v16i32 sub-lane permutes for v64i8 shuffles Without VBMI, we are better off permuting v16i32 sub-lanes, even though it's a variable shuffle, if it allows us to then shuffle v64i8 inlane repeated masks (PSHUFB etc.) Fixes #54658	2022-04-03 10:05:10 +01:00
Alexander Shaposhnikov	6cf10b7e6e	[InstCombine] Fold srem(X, PowerOf2) == C into (X & Mask) == C for positive C This diff extends InstCombinerImpl::foldICmpSRemConstant to handle the cases srem(X, PowerOf2) == C and srem(X, PowerOf2) != C for positive C. This addresses the issue https://github.com/llvm/llvm-project/issues/54650 Differential revision: https://reviews.llvm.org/D122942 Test plan: make check-all	2022-04-03 03:57:05 +00:00
Sanjay Patel	5f8c2b884d	[InstCombine] limit icmp fold with sub if other sub user is a phi This is a hacky fix for: https://github.com/llvm/llvm-project/issues/54558 As discussed there, codegen regressed when we opened up this transform to allow extra uses ( 61580d0949fd3465 ), and it's not clear how to undo the transforms at the later stage of compilation. As noted in the code comments, there's a set of remaining folds that are still limited to one-use, so we can try harder to refine and expand the limitations on these folds, but it's likely to be an up-and-down battle as we find and overcome similar regressions. Differential Revision: https://reviews.llvm.org/D122909	2022-04-02 19:23:42 -04:00
Sanjay Patel	97ac0cd6c4	[InstCombine] fold fcmp with lossy casted constant (2nd try) This is a retry of 9397bdc67eb2 - that was reverted until we had a clang warning in place to alert users about a possible mistake in source. The warning was added with ab982eace6e4. This is noted as a missing clang warning in #54222, but it is also a missing optimization opportunity. Alive2 proofs: https://alive2.llvm.org/ce/z/Q8drDq https://alive2.llvm.org/ce/z/pE6LRt I don't see a single conversion for all predicates using "getFCmpCode" logic, so other predicates are left as a TODO item.	2022-04-02 19:23:01 -04:00
Roman Lebedev	308ca349cb	[InstCombine] Fold `(X | C2) ^ C1 --> (X & ~C2) ^ (C1^C2)` These two are equivalent, and i think the `and` form is more-ish canonical. General proof: https://alive2.llvm.org/ce/z/RrF5s6 If constant on the (outer) `xor` is an `undef`, the whole lane is dead: https://alive2.llvm.org/ce/z/mu4Sh2 However, if the constant on the (inner) `or` is an `undef`, we must sanitize it first: https://alive2.llvm.org/ce/z/MHYJL7 I guess, producing a zero `and`-mask is optimal in that case. alive-tv is happy about the entirety of `xor-of-or.ll`.	2022-04-03 00:12:56 +03:00
Martin Storsjö	578d85e924	[Support] [BLAKE3] Fix compilation with CMAKE_OSX_ARCHITECTURES With CMake, one can build for multiple macOS architectures at the same time by setting CMAKE_OSX_ARCHITECTURES to multiple architectures (avoiding needing to do two separate builds and gluing the binaries together after the build). In this case, while targeting x86_64 and arm64, neither IS_X64 nor IS_ARM64 is set, while compilation of the individual source files will hit those cases (in either architecture mode). Therefore, if we on the CMake level decide not to include the architecture specific SIMD implementation files, also tell the source this explicitly by passing the defines indicating that we don't expect to use them. Such a build clearly is less ideal than explicitly targeting one architecture at a time if it won't include all the SIMD optimizations, but that's a tradeoff that is up to the one deciding to do such an universal build. This also fixes builds for i386. The blake3 source code automatically enables the SIMD implementations when building for i386, but we don't provide the sources for that build configuration. Differential Revision: https://reviews.llvm.org/D122884	2022-04-03 00:02:59 +03:00
Martin Storsjö	d0abdc22b8	[Support] [BLAKE3] Remove .hidden directives from windows-gnu assembly sources COFF symbols don't have anything corresponding to a `.hidden` flag; both GNU binutils as and LLVM's built-in assembler error out on these directives. This reverts one part of 7f05aa2d4c36d6d53f97ac3e0db30ec600abbc62, fixing builds for mingw x86_64. Differential Revision: https://reviews.llvm.org/D122893	2022-04-02 23:58:31 +03:00
Florian Hahn	95b2aa511e	[VPlan] Set VPlan header block name to vector.body. This brings the VPlan block naming in line with the naming of the generated basic blocks.	2022-04-02 19:34:32 +01:00
Florian Hahn	5bedc1f093	[ConstraintElimination] Move logic to build worklist to helper (NFC). This refactor makes it easier to extend the logic to collect information from blocks in the future, without even further increasing the size of eliminateConstraints.	2022-04-02 16:55:05 +01:00
wanglei	cd85ea9431	[LoongArch] Fix instruction definition This patch fixes issue with the LU32I_D instruction, which did not have an input register operand. Differential Revision: https://reviews.llvm.org/D122970	2022-04-02 18:08:29 +08:00
Serge Pavlov	c625b6051c	Remove duplicate code from wouldInstructionBeTriviallyDead There is a similar check a few lines above in this function.	2022-04-02 16:04:39 +07:00
Craig Topper	d970e96c53	[RISCV] Add lowering for vp.fptoui and vp.uitofp. This is a straightforward extension of D122512 to unsigned integers.	2022-04-01 18:28:46 -07:00
Michael Gottesman	e24f534879	[debug-info] As an NFC commit, refactor EmitFuncArgumentDbgValue so that it can be extended to support llvm.dbg.addr. The reason why I am making this change is that before this commit, EmitFuncArgumentDbgValue relied on a boolean flag IsDbgDeclare both to signal that a DBG_VALUE should be made to be indirect /and/ that the original intrinsic was a dbg.declare. This is no longer always true if we add support for handling dbg.addr since we will have an indirect DBG_VALUE that is a different intrinsic from dbg.declare. With that in mind, in this NFC patch, we prepare for future fixes by introducing a 3 case-enum argument to EmitFuncArgumentDbgValue that allows the caller to explicitly specify how the argument's DBG_VALUE should be emitted. This then allows us to turn the indirect checks into a != FuncArgumentDbgValueKind::Value and prepare us for a future where we add support here for llvm.dbg.addr directly. rdar://83957028 Reviewed By: aprantl Differential Revision: https://reviews.llvm.org/D122945	2022-04-01 17:07:28 -07:00
Craig Topper	fa630e7594	[RISCV][AMDGPU][TargetLowering] Special case overflow expansion for (uaddo X, 1). If we expand (uaddo X, 1) we previously expanded the overflow calculation as (X + 1) <u X. This potentially increases the live range of X and can prevent X+1 from reusing the register that previously held X. Since we're adding 1, overflow only occurs if X was UINT_MAX in which case (X+1) would be 0. So this patch adds a special case to expand the overflow calculation to (X+1) == 0. This seems to help with uaddo intrinsics that get introduced by CodeGenPrepare after LSR. Alternatively, we could block the uaddo transform in CodeGenPrepare for this case. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D122933	2022-04-01 13:14:10 -07:00
Simon Pilgrim	76cd11f303	[DAG] Add llvm::isMinSignedConstant helper. NFC Pulled out of D122754	2022-04-01 17:47:34 +01:00
Simon Pilgrim	c64f37f818	[X86] matchAddressRecursively - add XOR(X, MIN_SIGNED_VALUE) handling Allows us to fold XOR(X, MIN_SIGNED_VALUE) == ADD(X, MIN_SIGNED_VALUE) into LEA patterns As mentioned on PR52267. Differential Revision: https://reviews.llvm.org/D122815	2022-04-01 17:26:29 +01:00
Simon Pilgrim	b8652fbcbb	[X86] Fold AND(SRL(X,Y),1) -> SETCC(BT(X,Y)) (RECOMMITTED) As noticed on PR39174, if we're extracting a single non-constant bit index, then try to use BT+SETCC instead to avoid messing around moving the shift amount to the ECX register, using slow x86 shift ops etc. Recommitted with a fix to ensure we zext/trunc the SETCC result to the original type. Differential Revision: https://reviews.llvm.org/D122891	2022-04-01 16:59:06 +01:00
Florian Hahn	f8101e4d68	Recommit "[LV] Remove unneeded createHeaderBranch.(NFCI)" This reverts commit 14e3650f01d158f7e4117c353927a07ceebdd504. The issue causing the revert were fixed independently in a08c90a4023f and 14e5f9785c9c.	2022-04-01 16:53:39 +01:00
Simon Pilgrim	5a457bd2fa	Revert rGa5f637bcbb7d1e08ce637f113fc117c3f4b2b110 "[X86] Fold AND(SRL(X,Y),1) -> SETCC(BT(X,Y))" Investigating a sanitizer-windows buildbot breakage	2022-04-01 16:48:24 +01:00
Simon Pilgrim	9afa6811ad	[X86] lowerShuffleAsRepeatedMaskAndLanePermute - allow 64-bit sublane shuffling on AVX512BW v64i8 shuffles We were only performing this on 256-bit vectors on AVX2 targets Noticed while triaging Issue #54658	2022-04-01 16:40:10 +01:00
Simon Pilgrim	a5f637bcbb	[X86] Fold AND(SRL(X,Y),1) -> SETCC(BT(X,Y)) As noticed on PR39174, if we're extracting a single non-constant bit index, then try to use BT+SETCC instead to avoid messing around moving the shift amount to the ECX register, using slow x86 shift ops etc. Differential Revision: https://reviews.llvm.org/D122891	2022-04-01 16:07:56 +01:00
Florian Hahn	14e5f9785c	[LV] Add SCEV workaround from 80e8025 to epilogue vector code path. This was exposed by 14e3650f. The recommit of 14e3650f will hit the problematic code path requiring the workaround. test case that crashes without the workaround.	2022-04-01 15:14:47 +01:00
Jay Foad	c246b7bd4a	[AMDGPU] Only count global-to-global as indirect accesses Previously any load (global, local or constant) feeding into a global load or store would be counted as an indirect access. This patch only counts global loads feeding into a global load or store. The rationale is that the latency for global loads is generally much larger than the other kinds. As a side effect this makes it easier to write small kernel test cases that are not counted as having indirect accesses, despite the fact that arguments to the kernel are accessed with an SMEM load. Differential Revision: https://reviews.llvm.org/D122804	2022-04-01 13:48:13 +01:00
Florian Hahn	a08c90a402	[LV] Re-use TripCount from EPI.TripCount. During skeleton construction for the epilogue vector loop, generic helpers use getOrCreateTripCount, which will re-expand the trip count computation. Instead, re-use the TripCount created during main loop vectorization.	2022-04-01 13:47:34 +01:00
Nikita Popov	792f80e166	[CoroSplit] Use freeze instead of bitcast for dummy instructions Not all types that can appear in arguments can be bitcasts -- in particular, bitcasts do not support struct types.	2022-04-01 13:07:25 +02:00
Xiang1 Zhang	a56f264958	Refine tls-load-hoista llvm option Reviewed By: LuoYuanke Differential Revision: https://reviews.llvm.org/D122890	2022-04-01 19:03:58 +08:00


156927 Commits
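Commit fa630e7594 (uaddo special case) rests on one identity: when the addend is 1, the wrapped sum is unsigned-less-than X exactly when it wrapped to 0. The minimal Python sketch below checks both expansions exhaustively against true unsigned overflow. It models an N-bit register as a masked Python int, and the function names are illustrative, not LLVM APIs.

```python
def uaddo_overflow_generic(x: int, bits: int) -> bool:
    """Generic expansion: overflow iff (X + 1) <u X after wrapping to `bits` bits."""
    mask = (1 << bits) - 1
    return ((x + 1) & mask) < x


def uaddo_overflow_special(x: int, bits: int) -> bool:
    """Special case for an addend of 1: overflow iff (X + 1) wraps to 0."""
    mask = (1 << bits) - 1
    return ((x + 1) & mask) == 0


def expansions_agree(bits: int) -> bool:
    """Exhaustively compare both expansions with true unsigned overflow."""
    mask = (1 << bits) - 1
    for x in range(1 << bits):
        true_overflow = x + 1 > mask  # only X == UINT_MAX overflows
        if uaddo_overflow_generic(x, bits) != true_overflow:
            return False
        if uaddo_overflow_special(x, bits) != true_overflow:
            return False
    return True


if __name__ == "__main__":
    print(expansions_agree(8), expansions_agree(16))  # prints: True True
```

The special form reads only the sum, never X. That is why X's register can be reused for X+1, which is the live-range benefit the commit describes.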
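Commit c64f37f818 treats XOR(X, MIN_SIGNED_VALUE) as ADD(X, MIN_SIGNED_VALUE). The two are equal because adding only the sign bit flips that bit, and any carry leaves the register. The sketch below is a minimal exhaustive check of that equivalence on fixed-width values. The helper names are hypothetical, and it says nothing about how the X86 backend matches LEA patterns.

```python
def xor_sign_bit(x: int, bits: int) -> int:
    """XOR(X, MIN_SIGNED_VALUE) on a `bits`-wide register."""
    return x ^ (1 << (bits - 1))


def add_sign_bit(x: int, bits: int) -> int:
    """ADD(X, MIN_SIGNED_VALUE), wrapping to `bits` bits (the carry out is dropped)."""
    return (x + (1 << (bits - 1))) & ((1 << bits) - 1)


def xor_equals_add(bits: int) -> bool:
    """Check the equivalence for every `bits`-wide value of X."""
    return all(xor_sign_bit(x, bits) == add_sign_bit(x, bits) for x in range(1 << bits))


if __name__ == "__main__":
    print(xor_equals_add(8), xor_equals_add(16))
```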
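The fold in commit 308ca349cb works because X | C2 equals (X & ~C2) ^ C2: the two operands of that XOR share no set bits. XOR-ing both sides with C1 then gives the `and` form. The minimal sketch below brute-forces the identity for small widths using concrete constants only. It does not model the `undef` lane handling that the commit discusses separately, and the function names are illustrative.

```python
def xor_of_or(x: int, c1: int, c2: int) -> int:
    """Original form: (X | C2) ^ C1."""
    return (x | c2) ^ c1


def and_form(x: int, c1: int, c2: int, bits: int) -> int:
    """Folded form: (X & ~C2) ^ (C1 ^ C2), with ~C2 masked to `bits` bits."""
    mask = (1 << bits) - 1
    return (x & (~c2 & mask)) ^ (c1 ^ c2)


def fold_holds(bits: int) -> bool:
    """Exhaustive check over every X, C1, C2 of the given width."""
    n = 1 << bits
    return all(
        xor_of_or(x, c1, c2) == and_form(x, c1, c2, bits)
        for x in range(n)
        for c1 in range(n)
        for c2 in range(n)
    )


if __name__ == "__main__":
    print(fold_holds(6))
```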
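Commit 6cf10b7e6e does not spell out the mask. The sketch below assumes it is the sign bit together with the low k bits of the power of two 2**k, and checks exhaustively that `srem(X, 2**k) == C` matches `(X & Mask) == C` for every positive C. It is a minimal model: `srem` truncates toward zero so the result takes the dividend's sign, and all names are hypothetical.

```python
def to_signed(x: int, bits: int) -> int:
    """Interpret a `bits`-wide bit pattern as a two's-complement value."""
    return x - (1 << bits) if x >> (bits - 1) else x


def srem(a: int, b: int) -> int:
    """srem-style remainder: truncating division, result has the sign of `a`."""
    r = abs(a) % abs(b)
    return -r if a < 0 else r


def srem_fold_holds(bits: int) -> bool:
    sign_bit = 1 << (bits - 1)
    for k in range(bits - 1):             # positive powers of two: 1 .. 2**(bits-2)
        pow2 = 1 << k
        mask = sign_bit | (pow2 - 1)      # assumed mask: sign bit plus the low k bits
        for c in range(1, sign_bit):      # positive constants only
            for x in range(1 << bits):
                lhs = srem(to_signed(x, bits), pow2) == c
                rhs = (x & mask) == c
                if lhs != rhs:
                    return False
    return True


if __name__ == "__main__":
    print(srem_fold_holds(8))
```

The sign bit in the mask matters. For example, with 8 bits, X = -3 (0xFD) and 2**k = 4, `srem` gives -3, but the low two bits alone equal 1. Keeping the sign bit makes `X & Mask` equal 0x81, which no positive C can match.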
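Commits d18991debf and 5ccfd5f6d4 describe two memchr() folds: a size-1 call becomes a single byte comparison, and a known string plus known character reduces to a select on the length. The sketch below is a minimal Python model of both, checked against a reference loop. Offsets stand in for the pointer `s + offset` and `None` for null. It assumes `n <= len(s)`, since reading past the object is undefined in C. The function names are hypothetical.

```python
from typing import Optional


def memchr_ref(s: bytes, c: int, n: int) -> Optional[int]:
    """Reference model: offset of the first byte equal to (unsigned char)c in s[:n]."""
    target = c & 0xFF
    for i in range(n):
        if s[i] == target:
            return i
    return None


def memchr_size1(s: bytes, c: int) -> Optional[int]:
    """Size-1 fold: one byte comparison selects between offset 0 and null."""
    return 0 if s[0] == (c & 0xFF) else None


def memchr_known_str_char(s: bytes, c: int, n: int) -> Optional[int]:
    """Known string and char, unknown n: find the position once, then select on n."""
    pos = s.find(bytes([c & 0xFF]))  # computed ahead of time; -1 if absent
    return pos if pos != -1 and n > pos else None


if __name__ == "__main__":
    print(memchr_known_str_char(b"hello", ord("l"), 3))  # prints: 2
```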
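Commit c0cc98251a fixes Float2Int by queuing an instruction for reprocessing when its operand ranges have not been computed yet. The minimal sketch below shows that deferral pattern on a tiny dependency graph with a non-topological visit order. The graph encoding, the interval-addition range rule and the no-progress guard are all hypothetical stand-ins, not Float2Int's actual data structures or per-opcode logic.

```python
from collections import deque
from typing import Dict, List, Tuple

Range = Tuple[int, int]


def compute_ranges(
    operands: Dict[str, List[str]],
    leaf_ranges: Dict[str, Range],
    visit_order: List[str],
) -> Dict[str, Range]:
    """Compute a range for each node in visit_order, deferring any node
    whose operand ranges are not known yet."""
    ranges: Dict[str, Range] = dict(leaf_ranges)
    worklist = deque(visit_order)
    stalled = 0  # consecutive re-queues without progress
    while worklist:
        node = worklist.popleft()
        if node in ranges:
            stalled = 0
            continue
        if any(op not in ranges for op in operands[node]):
            worklist.append(node)  # an operand is not ready: reprocess later
            stalled += 1
            if stalled > len(worklist):
                raise ValueError("no progress: cycle or missing operand")
            continue
        # Hypothetical range rule: interval addition over all operands.
        lo = sum(ranges[op][0] for op in operands[node])
        hi = sum(ranges[op][1] for op in operands[node])
        ranges[node] = (lo, hi)
        stalled = 0
    return ranges


if __name__ == "__main__":
    ops = {"c": ["a", "b"], "d": ["c", "a"]}
    leaves = {"a": (0, 1), "b": (2, 3)}
    # "d" is visited before its operand "c", so it is deferred once.
    print(compute_ranges(ops, leaves, ["d", "c"])["d"])  # prints: (2, 5)
```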