llvm-project

Author	SHA1	Message	Date
Anna Thomas	4fc52db116	[InstCombine] Remove weaker fence adjacent to a stronger fence We have an InstCombine rule to remove identical consecutive fences. We can extend this to remove weaker fences when we have a consecutive stronger fence. As stated in the LangRef, a fence with a stronger ordering also implies ordering weaker than itself: "A fence which has seq_cst ordering, in addition to having both acquire and release semantics specified above, participates in the global program order of other seq_cst operations and/or fences." Reviewed-By: reames Differential Revision: https://reviews.llvm.org/D118607	2022-02-01 11:05:34 -08:00
Nikita Popov	8d992862a0	[InstCombine] Remove some pointer element type accesses One of these is guarded against opaque pointers, and the others were accessing the call function type in a rather convoluted way.	2022-01-27 10:15:35 +01:00
Nikita Popov	aa97bc116d	[NFC] Remove uses of PointerType::getElementType() Instead use either Type::getPointerElementType() or Type::getNonOpaquePointerElementType(). This is part of D117885, in preparation for deprecating the API.	2022-01-25 09:44:52 +01:00
Sanjay Patel	2e26633af0	[IR] document and update ctlz/cttz intrinsics to optionally return poison rather than undef The behavior in Analysis (knownbits) implements poison semantics already, and we expect the transforms (for example, in instcombine) derived from those semantics, so this patch changes the LangRef and remaining code to be consistent. This is one more step in removing "undef" from LLVM. Without this, I think https://github.com/llvm/llvm-project/issues/53330 has a legitimate complaint because that report wants to allow subsequent code to mask off bits, and that is allowed with undef values. The clang builtins are not actually documented anywhere AFAICT, but we might want to add that to remove more uncertainty. Differential Revision: https://reviews.llvm.org/D117912	2022-01-23 11:22:48 -05:00
Caroline Concatto	ad43217a04	[InstCombine] Fold for masked gather when loading the same value each time. This patch checks whether the first operand of the masked gather is a splat and the mask is all ones, because in that case the masked gather reloads the same value each time. The patch replaces this pattern with a scalar load of the value and splats it into a vector. Differential Revision: https://reviews.llvm.org/D115726	2022-01-21 14:19:51 +00:00
Paweł Bylica	1d7604fdce	[InstCombine] Simplify bswap -> shift Simplify bswap(x) to shl(x) or lshr(x) if x has exactly one "active byte", i.e. all active bits are contained in boundaries of a single byte of x. https://alive2.llvm.org/ce/z/nvbbU5 https://alive2.llvm.org/ce/z/KiiL3J Reviewed By: spatel, craig.topper, lebedev.ri Differential Revision: https://reviews.llvm.org/D117680	2022-01-21 01:25:30 +01:00
Nikita Popov	c63a3175c2	[AttrBuilder] Remove ctor accepting AttributeList and Index Use the AttributeSet constructor instead. There's no good reason why AttrBuilder itself should extract the AttributeSet from the AttributeList. Moving this out of the AttrBuilder generally results in cleaner code.	2022-01-15 22:39:31 +01:00
Caroline Concatto	8e5a5b619d	[InstCombine] Fold for masked scatters to a uniform address When a masked scatter intrinsic does a uniform store to a destination address from a source vector and the mask is all ones, this patch replaces the masked scatter with an extracted element of the last lane of the source vector and stores it to the destination address. This patch also folds when the value in the masked scatter is a splat. In this case, the mask cannot be all zero, and it folds to a scalar store of the value in the destination pointer. Differential Revision: https://reviews.llvm.org/D115724	2022-01-14 09:44:34 +00:00
Philip Reames	5265ac72c6	[MemoryBuiltin] Add an API for checking if an unused allocation can be removed [NFC] Not all allocation functions are removable if unused. An example of a non-removable allocation would be a direct call to the replaceable global allocation function in C++. An example of a removable one - at least according to historical practice - would be malloc.	2022-01-10 15:43:39 -08:00
Bryce Wilson	fb936595fa	[MemoryBuiltins] Add field for alignment argument [NFC] There are a few places where the alignment argument for AlignedAllocLike functions was previously hardcoded. This patch adds a getAllocAlignment function and a change to the MemoryBuiltin table to allow alignment arguments to be found generically. This will shortly allow alignment inference on operator new's with align_val params and an extension to Attributor's HeapToStack. The former will follow shortly - I split Bryce's patch for purpose of having the large change be NFC. The latter will be reviewed separately. Differential Revision: https://reviews.llvm.org/D116851 (part 1 of 2)	2022-01-10 09:15:20 -08:00
Philip Reames	f4c54683d6	[instcombine] Infer alignment for aligned_alloc with potentially zero size This change removes a previous restriction where we had to prove the allocation performed by aligned_alloc was non-zero in size before using the align parameter to annotate the result. I believe this was conservatism around the C11 specification of this routine which allowed UB when size was not a multiple of alignment, but if so, it was a partial one at best. (ex: align 32, size 16 was equally UB, but not restricted) The spec has since been clarified to require nullptr return, not UB. A nullptr - the documented return for this function on failure for all cases after UB mentioned above was removed - is trivially aligned for any power of two. This isn't totally new behavior even for this transform, we'd previously annotate potentially failing allocs (e.g. huge sizes) meaning we were putting align on potentially null pointers anyways. This change simply does the same for all failure modes.	2022-01-10 08:48:49 -08:00
Serge Guelton	d2cc6c2d0c	Use a sorted array instead of a map to store AttrBuilder string attributes Using a std::map<SmallString, SmallString> for target dependent attributes is inefficient: it makes its constructor slightly heavier, and involves extra allocation for each new string attribute. Storing the attribute key/value as strings implies extra allocation/copy step. Use a sorted vector instead. Given the low number of attributes generally involved, this is cheaper, as showcased by https://llvm-compile-time-tracker.com/compare.php?from=5de322295f4ade692dc4f1823ae4450ad3c48af2&to=05bc480bf641a9e3b466619af43a2d123ee3f71d&stat=instructions Differential Revision: https://reviews.llvm.org/D116599	2022-01-10 14:49:53 +01:00
Philip Reames	2cafbcb560	[instcombine] Key deref vs deref_or_null annotation of allocation sites off nonnull attribute Goal is to remove use of isOpNewLike. I looked at a couple approaches to this, and this turned out to be the cheapest one. Just letting deref_or_null be generated causes a bunch of test diffs, and I couldn't convince myself there wasn't a real regression somewhere. A generic instcombine to convert deref_or_null + nonnull to deref is annoyingly complicated since you have to mix facts from callsite and declaration while manipulating only existing call site attributes. It just wasn't worth the code complexity. Note that the change in new-delete-itanium.ll is a real regression. If you have a callsite which overrides the builtin status of a nobuiltin declaration, and you don't put the appropriate attributes on that callsite, you may lose the deref fact. I decided this didn't matter; if anyone disagrees, you can add this case to the generic non-null inference.	2022-01-08 10:33:54 -08:00
Philip Reames	dcbc91f40c	[instcombine] Delete duplicate object size logic InstCombine appears to duplicate the allocation size logic used inside getObjectSize when figuring out which attributes are safe to place on the callsite. We can use the existing utility function instead. The test change is correct. With aligned_alloc, a zero alignment is required to return nullptr. As such, deref_or_null is a correct attribute to use. Differential Revision: https://reviews.llvm.org/D116816	2022-01-07 10:32:26 -08:00
Nick Desaulniers	95ba0e4563	[SimplifyLibCalls] propagate tail flags on CallInsts I noticed we weren't propagating tail flags on calls when FortifiedLibCallSimplifier.optimizeCall() was replacing calls to runtime checked calls to the non-checked routines (when safe to do so). Make sure to check this before replacing the original calls! Also, avoid any libcall transforms when notail/musttail is present. PR46734 Fixes: https://github.com/llvm/llvm-project/issues/46079 Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D107872	2021-12-13 11:18:30 -08:00
Zarko Todorovski	0d3add216f	[llvm][NFC] Inclusive language: Reword replace uses of sanity in llvm/lib/Transform comments and asserts Reworded some comments and asserts to avoid usage of `sanity check/test` Reviewed By: dblaikie Differential Revision: https://reviews.llvm.org/D114372	2021-11-23 13:22:55 -05:00
Itay Bookstein	f9059efa0d	[InstCombine] Extend stacksave/restore elimination Previously, InstCombine detected a pair of llvm.stacksave/stackrestore instructions that are adjacent modulo debug instructions in order to eliminate the llvm.stackrestore. This precludes situations where intervening instructions (e.g. loads) preclude the llvm.stacksave and llvm.stackrestore from becoming adjacent. This commit extends the logic and allows for eliminating the llvm.stackrestore when the range of instructions between them does not include any alloca or side-effect causing instructions. Signed-off-by: Itay Bookstein <itay.bookstein@nextsilicon.com> Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D113105	2021-11-10 10:41:58 +02:00
Itay Bookstein	fe7491d32f	[InstCombine][NFC] Refactor llvm.stackrestore handling Hoist the instruction classification logic outside the loop in preparation for reuse in a future commit. Signed-off-by: Itay Bookstein <itay.bookstein@nextsilicon.com> Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D113464	2021-11-10 10:41:56 +02:00
Hongtao Yu	098a0d8fbc	[CSSPGO] Unblock optimizations with pseudo probe instrumentation part 3. This patch continues unblocking optimizations that are blocked by pseudo probe instrumentation. Not exactly like DbgIntrinsics, PseudoProbe intrinsic has other attributes (such as mayread, maywrite, mayhaveSideEffect) that can block optimizations. The issues fixed are: - Flipped default param of getFirstNonPHIOrDbg API to skip pseudo probes - Unblocked CSE by avoiding pseudo probe from clobbering memory SSA - Unblocked induction variable simplification - Allow empty loop deletion by treating probe intrinsic isDroppable - Some refactoring. Reviewed By: wenlei Differential Revision: https://reviews.llvm.org/D110847	2021-10-12 09:44:12 -07:00
Jay Foad	a9bceb2b05	[APInt] Stop using soft-deprecated constructors and methods in llvm. NFC. Stop using APInt constructors and methods that were soft-deprecated in D109483. This fixes all the uses I found in llvm, except for the APInt unit tests which should still test the deprecated methods. Differential Revision: https://reviews.llvm.org/D110807	2021-10-04 08:57:44 +01:00
Kazu Hirata	4f0225f6d2	[Transforms] Migrate from getNumArgOperands to arg_size (NFC) Note that getNumArgOperands is considered a legacy name. See llvm/include/llvm/IR/InstrTypes.h for details.	2021-10-01 09:57:40 -07:00
Sanjay Patel	6063e6b499	[InstCombine] move add after min/max intrinsic This is another regression noted with the proposal to canonicalize to the min/max intrinsics in D98152. Here are Alive2 attempts to show correctness without specifying exact constants: https://alive2.llvm.org/ce/z/bvfCwh (smax) https://alive2.llvm.org/ce/z/of7eqy (smin) https://alive2.llvm.org/ce/z/2Xtxoh (umax) https://alive2.llvm.org/ce/z/Rm4Ad8 (umin) (if you comment out the assume and/or no-wrap, you should see failures) The different output for the umin test is due to a fold added with c4fc2cb5b2d98125 : // umin(x, 1) == zext(x != 0) We probably want to adjust that, so it applies more generally (umax --> sext or patterns where we can fold to select-of-constants). Some folds that were ok when starting with cmp+select may increase instruction count for the equivalent intrinsic, so we have to decide if it's worth altering a min/max. Differential Revision: https://reviews.llvm.org/D110038	2021-09-26 09:49:10 -04:00
Florian Hahn	e08a5dc86f	[InstCombine] Move InstCombineWorklist to Utils to allow reuse (NFC). InstCombine's worklist can be re-used by other passes like VectorCombine. Move it to llvm/Transform/Utils and rename it to InstructionWorklist. Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D110181	2021-09-22 08:47:21 +01:00
Usman Nadeem	f417d9d821	[InstCombine] Eliminate vector reverse if all inputs/outputs to an instruction are reverses Differential Revision: https://reviews.llvm.org/D109808 Change-Id: I1a10d2bc33acbe0ea353c6cb3d077851391fe73e	2021-09-20 18:32:24 -07:00
Dávid Bolvanský	a4a426c9e0	[InstCombine] Added llvm.powi optimizations If power is even: powi(-x, p) -> powi(x, p) powi(fabs(x), p) -> powi(x, p) powi(copysign(x, y), p) -> powi(x, p)	2021-09-16 19:42:21 +02:00
Chris Lattner	735f46715d	[APInt] Normalize naming on keep constructors / predicate methods. This renames the primary methods for creating a zero value to `getZero` instead of `getNullValue` and renames predicates like `isAllOnesValue` to simply `isAllOnes`. This achieves two things: 1) This starts standardizing predicates across the LLVM codebase, following (in this case) ConstantInt. The word "Value" doesn't convey anything of merit, and is missing in some of the other things. 2) Calling an integer "null" doesn't make any sense. The original sin here is mine and I've regretted it for years. This moves us to calling it "zero" instead, which is correct! APInt is widely used and I don't think anyone is keen to take massive source breakage on anything so core, at least not all in one go. As such, this doesn't actually delete any entrypoints, it "soft deprecates" them with a comment. Included in this patch are changes to a bunch of the codebase, but there are more. We should normalize SelectionDAG and other APIs as well, which would make the API change more mechanical. Differential Revision: https://reviews.llvm.org/D109483	2021-09-09 09:50:24 -07:00
Arthur Eubanks	b81fc14f2d	[NFC][InstCombine] Make check for sret in a vararg function clearer We're trying to get the parameter index of sret and see if it's part of a function's varargs. Reviewed By: rnk Differential Revision: https://reviews.llvm.org/D109335	2021-09-07 11:19:27 -07:00
Roman Lebedev	3f1f08f0ed	Revert @llvm.isnan intrinsic patchset. Please refer to https://lists.llvm.org/pipermail/llvm-dev/2021-September/152440.html (and that whole thread.) TLDR: the original patch had no prior RFC, yet it had some changes that really need a proper RFC discussion. It won't be productive to discuss such an RFC, once it's actually posted, while said patch is already committed, because that introduces bias towards already-committed stuff, and the tree is potentially in broken state meanwhile. While the end result of discussion may lead back to the current design, it may also not lead to the current design. Therefore i take it upon myself to revert the tree back to last known good state. This reverts commit 4c4093e6e39fe6601f9c95a95a6bc242ef648cd5. This reverts commit 0a2b1ba33ae6dcaedb81417f7c4cc714f72a5968. This reverts commit d9873711cb03ac7aedcaadcba42f82c66e962e6e. This reverts commit 791006fb8c6fff4f33c33cb513a96b1d3f94c767. This reverts commit c22b64ef66f7518abb6f022fcdfd86d16c764caf. This reverts commit 72ebcd3198327da12804305bda13d9b7088772a8. This reverts commit 5fa6039a5fc1b6392a3c9a3326a76604e0cb1001. This reverts commit 9efda541bfbd145de90f7db38d935db6246dc45a. This reverts commit 94d3ff09cfa8d7aecf480e54da9a5334e262e76b.	2021-09-02 13:53:56 +03:00
Sanjay Patel	8c7a7e1f67	[InstCombine] allow more min/max with 'not' folds for intrinsics isFreeToInvert allows min/max with 'not' on both operands, so easing the argument restriction catches the case where that operand has one use. We already handle the sub-patterns when there are fewer uses: https://alive2.llvm.org/ce/z/8Jatm_ ...but this is another step towards parity with the equivalent icmp+select idioms ( D98152 ). Differential Revision: https://reviews.llvm.org/D109059	2021-09-01 14:40:00 -04:00
Sanjay Patel	8a10f4a0f6	[InstCombine] use isFreeToInvert to generalize min/max with 'not' This mimics the code for the corresponding cmp-select idiom. This also prevents an infinite loop because isFreeToInvert does not match constant expressions. So this patch solves the same problem as D108814 and obsoletes it, but my main motivation is to enhance the pattern matching to allow more invertible ops. That change will be a follow-up patch on top of this one. Differential Revision: https://reviews.llvm.org/D109058	2021-09-01 14:34:22 -04:00
Arthur Eubanks	3f4d00bc3b	[NFC] More get/removeAttribute() cleanup	2021-08-17 21:05:41 -07:00
Sanjay Patel	50c1138796	[InstCombine] add TODO about another min/max fold; NFC Suggested in post-commit for d0975b7cb0e1	2021-08-17 14:14:25 -04:00
Sanjay Patel	e73f4e1123	[InstCombine] remove unused function argument; NFC	2021-08-17 08:10:42 -04:00
Sanjay Patel	d0975b7cb0	[InstCombine] fold signed min/max intrinsics with negated operands If both operands are negated, we can invert the min/max and do the negation after: smax (neg nsw X), (neg nsw Y) --> neg nsw (smin X, Y) smin (neg nsw X), (neg nsw Y) --> neg nsw (smax X, Y) This is visible as a remaining regression in D98152. I don't see a way to generalize this for 'unsigned' or adapt Negator to handle it. This only appears to be safe with 'nsw': https://alive2.llvm.org/ce/z/GUy1zJ Differential Revision: https://reviews.llvm.org/D108165	2021-08-17 08:10:42 -04:00
David Green	c6b7db015f	[InstCombine] Add call to matchSAddSubSat from min/max This adds a call to matchSAddSubSat from smin/smax intrinsics, allowing the same patterns to match if the canonical form of a min/max is an intrinsic, not an icmp/select. Differential Revision: https://reviews.llvm.org/D108077	2021-08-15 17:25:16 +01:00
Arthur Eubanks	80ea2bb574	[NFC] Rename AttributeList::getParam/Ret/FnAttributes() -> get*Attributes() This is more consistent with similar methods.	2021-08-13 11:16:52 -07:00
Arthur Eubanks	a0c42ca56c	[NFC] Remove AttributeList::hasParamAttribute() It's the same as AttributeList::hasParamAttr().	2021-08-13 10:58:21 -07:00
Sanjay Patel	14eefa57f2	[InstCombine] factorize min/max intrinsic ops with common operand (2nd try) This is a re-try of 6de1dbbd09c1 which was reverted because it missed a null check. Extra test for that failure added. Original commit message: This is an adaptation of D41603 and another step on the way to canonicalizing to the intrinsic forms of min/max. See D98152 for status.	2021-08-12 16:32:07 -04:00
Amy Huang	427520a8fa	Revert "[InstCombine] factorize min/max intrinsic ops with common operand" This reverts commit 6de1dbbd09c12abbec7eb187ffa1afbd47302dfa because it causes a compiler crash.	2021-08-12 12:36:25 -07:00
Sanjay Patel	cd44cc86e3	[InstCombine] remove unused function argument; NFC This was just added with 6de1dbbd09c1 , and I missed pulling the extra arg from the final revision.	2021-08-12 11:47:25 -04:00
Sanjay Patel	6de1dbbd09	[InstCombine] factorize min/max intrinsic ops with common operand This is an adaptation of D41603 and another step on the way to canonicalizing to the intrinsic forms of min/max. See D98152 for status.	2021-08-12 11:19:09 -04:00
Roman Lebedev	0a241e90d4	[NFC][InstCombine] `vector_reduce_xor(?ext(<n x i1>))` --> `?ext(vector_reduce_add(<n x i1>))` Instead of expanding it ourselves, we can just forward to `?ext(vector_reduce_add(<n x i1>))`, as per alive2: https://alive2.llvm.org/ce/z/ymz7zE (self) https://alive2.llvm.org/ce/z/eKu2v2 (skipped zext) https://alive2.llvm.org/ce/z/c3BXgc (skipped sext)	2021-08-07 17:31:33 +03:00
Roman Lebedev	c6ff867f92	[NFC][InstCombine] Simplify emitted IR for `vector_reduce_xor(?ext(<n x i1>))` Now that we canonicalize low bit splatting to the form we were emitting here ourselves, emit simpler IR that will be canonicalized later. See 1e801439be26569c9ede6fd309a645b00adb656c for proofs: https://alive2.llvm.org/ce/z/MjCm5W (self) https://alive2.llvm.org/ce/z/kgqF4M (skipped zext) https://alive2.llvm.org/ce/z/pgy3HP (skipped sext)	2021-08-07 17:31:24 +03:00
Serge Pavlov	4c4093e6e3	Introduce intrinsic llvm.isnan This is recommit of the patch 16ff91ebccda1128c43ff3cee104e2c603569fb2, reverted in 0c28a7c990c5218d6aec47c5052a51cba686ec5e because it had an error in call of getFastMathFlags (base type should be FPMathOperator but not Instruction). The original commit message is duplicated below: Clang has builtin function '__builtin_isnan', which implements C library function 'isnan'. This function now is implemented entirely in clang codegen, which expands the function into set of IR operations. There are three mechanisms by which the expansion can be made. * The most common mechanism is using an unordered comparison made by instruction 'fcmp uno'. This simple solution is target-independent and works well in most cases. It however is not suitable if floating point exceptions are tracked. Corresponding IEEE 754 operation and C function must never raise FP exception, even if the argument is a signaling NaN. Compare instructions usually does not have such property, they raise 'invalid' exception in such case. So this mechanism is unsuitable when exception behavior is strict. In particular it could result in unexpected trapping if argument is SNaN. * Another solution was implemented in https://reviews.llvm.org/D95948. It is used in the cases when raising FP exceptions by 'isnan' is not allowed. This solution implements 'isnan' using integer operations. It solves the problem of exceptions, but offers one solution for all targets, however some can do the check in more efficient way. * Solution implemented by https://reviews.llvm.org/D96568 introduced a hook 'clang::TargetCodeGenInfo::testFPKind', which injects target specific code into IR. Now only SystemZ implements this hook and it generates a call to target specific intrinsic function. 
Although these mechanisms allow to implement 'isnan' with enough efficiency, expanding 'isnan' in clang has drawbacks: * The operation 'isnan' is hidden behind generic integer operations or target-specific intrinsics. It complicates analysis and can prevent some optimizations. * IR can be created by tools other than clang, in this case treatment of 'isnan' has to be duplicated in that tool. Another issue with the current implementation of 'isnan' comes from the use of options '-ffast-math' or '-fno-honor-nans'. If such option is specified, 'fcmp uno' may be optimized to 'false'. It is valid optimization in general, but it results in 'isnan' always returning 'false'. For example, in some libc++ implementations the following code returns 'false': std::isnan(std::numeric_limits<float>::quiet_NaN()) The options '-ffast-math' and '-fno-honor-nans' imply that FP operation operands are never NaNs. This assumption however should not be applied to the functions that check FP number properties, including 'isnan'. If such function returns expected result instead of actually making checks, it becomes useless in many cases. The option '-ffast-math' is often used for performance critical code, as it can speed up execution by the expense of manual treatment of corner cases. If 'isnan' returns assumed result, a user cannot use it in the manual treatment of NaNs and has to invent replacements, like making the check using integer operations. There is a discussion in https://reviews.llvm.org/D18513#387418, which also expresses the opinion, that limitations imposed by '-ffast-math' should be applied only to 'math' functions but not to 'tests'. To overcome these drawbacks, this change introduces a new IR intrinsic function 'llvm.isnan', which realizes the check as specified by IEEE-754 and C standards in target-agnostic way. During IR transformations it does not undergo undesirable optimizations. It reaches instruction selection, where is lowered in target-dependent way. 
The lowering can vary depending on options like '-ffast-math' or '-ffp-model' so the resulting code satisfies requested semantics. Differential Revision: https://reviews.llvm.org/D104854	2021-08-06 14:32:27 +07:00
Serge Pavlov	0c28a7c990	Revert "Introduce intrinsic llvm.isnan" This reverts commit 16ff91ebccda1128c43ff3cee104e2c603569fb2. Several errors were reported mainly test-suite execution time. Reverted for investigation.	2021-08-04 17:18:15 +07:00
Serge Pavlov	16ff91ebcc	Introduce intrinsic llvm.isnan Clang has builtin function '__builtin_isnan', which implements C library function 'isnan'. This function now is implemented entirely in clang codegen, which expands the function into set of IR operations. There are three mechanisms by which the expansion can be made. * The most common mechanism is using an unordered comparison made by instruction 'fcmp uno'. This simple solution is target-independent and works well in most cases. It however is not suitable if floating point exceptions are tracked. Corresponding IEEE 754 operation and C function must never raise FP exception, even if the argument is a signaling NaN. Compare instructions usually does not have such property, they raise 'invalid' exception in such case. So this mechanism is unsuitable when exception behavior is strict. In particular it could result in unexpected trapping if argument is SNaN. * Another solution was implemented in https://reviews.llvm.org/D95948. It is used in the cases when raising FP exceptions by 'isnan' is not allowed. This solution implements 'isnan' using integer operations. It solves the problem of exceptions, but offers one solution for all targets, however some can do the check in more efficient way. * Solution implemented by https://reviews.llvm.org/D96568 introduced a hook 'clang::TargetCodeGenInfo::testFPKind', which injects target specific code into IR. Now only SystemZ implements this hook and it generates a call to target specific intrinsic function. Although these mechanisms allow to implement 'isnan' with enough efficiency, expanding 'isnan' in clang has drawbacks: * The operation 'isnan' is hidden behind generic integer operations or target-specific intrinsics. It complicates analysis and can prevent some optimizations. * IR can be created by tools other than clang, in this case treatment of 'isnan' has to be duplicated in that tool. 
Another issue with the current implementation of 'isnan' comes from the use of options '-ffast-math' or '-fno-honor-nans'. If such option is specified, 'fcmp uno' may be optimized to 'false'. It is valid optimization in general, but it results in 'isnan' always returning 'false'. For example, in some libc++ implementations the following code returns 'false': std::isnan(std::numeric_limits<float>::quiet_NaN()) The options '-ffast-math' and '-fno-honor-nans' imply that FP operation operands are never NaNs. This assumption however should not be applied to the functions that check FP number properties, including 'isnan'. If such function returns expected result instead of actually making checks, it becomes useless in many cases. The option '-ffast-math' is often used for performance critical code, as it can speed up execution by the expense of manual treatment of corner cases. If 'isnan' returns assumed result, a user cannot use it in the manual treatment of NaNs and has to invent replacements, like making the check using integer operations. There is a discussion in https://reviews.llvm.org/D18513#387418, which also expresses the opinion, that limitations imposed by '-ffast-math' should be applied only to 'math' functions but not to 'tests'. To overcome these drawbacks, this change introduces a new IR intrinsic function 'llvm.isnan', which realizes the check as specified by IEEE-754 and C standards in target-agnostic way. During IR transformations it does not undergo undesirable optimizations. It reaches instruction selection, where is lowered in target-dependent way. The lowering can vary depending on options like '-ffast-math' or '-ffp-model' so the resulting code satisfies requested semantics. Differential Revision: https://reviews.llvm.org/D104854	2021-08-04 15:27:49 +07:00
Roman Lebedev	4ba3326f17	[InstCombine] `vector_reduce_{or,and}(?ext(<n x i1>))` --> `?ext(vector_reduce_{or,and}(<n x i1>))` (PR51259) This allows the expansion logic to actually trigger if the argument was extended from i1 element type, like the rest of the reductions expect. Alive2 agrees: https://alive2.llvm.org/ce/z/wcfews (or zext) https://alive2.llvm.org/ce/z/FCXNFx (or sext) https://alive2.llvm.org/ce/z/f26zUY (and zext) https://alive2.llvm.org/ce/z/jprViN (and sext)	2021-08-03 00:54:35 +03:00
Roman Lebedev	554fc9ad0a	[InstCombine] `vector_reduce_smax(?ext(<n x i1>))` --> `?ext(vector_reduce_{and,or}(<n x i1>))` (PR51259) Alive2 agrees: https://alive2.llvm.org/ce/z/3oqir9 (self) https://alive2.llvm.org/ce/z/6cuI5m (zext) https://alive2.llvm.org/ce/z/4FL8rD (sext) We already handle `vector_reduce_and(<n x i1>)`, so let's just combine into the already-handled pattern and let the existing fold do the rest.	2021-08-03 00:29:06 +03:00
Roman Lebedev	f47b7b6d10	[InstCombine] `vector_reduce_smin(?ext(<n x i1>))` --> `?ext(vector_reduce_{or,and}(<n x i1>))` (PR51259) Alive2 agrees: https://alive2.llvm.org/ce/z/noXtZ8 (self) https://alive2.llvm.org/ce/z/JNrN6C (zext) https://alive2.llvm.org/ce/z/58snuN (sext) We already handle `vector_reduce_and(<n x i1>)`, so let's just combine into the already-handled pattern and let the existing fold do the rest.	2021-08-03 00:29:06 +03:00
Roman Lebedev	b9b7162b8b	[InstCombine] `vector_reduce_umax(?ext(<n x i1>))` --> `?ext(vector_reduce_or(<n x i1>))` (PR51259) Alive2 agrees: https://alive2.llvm.org/ce/z/NbBaeT (self) https://alive2.llvm.org/ce/z/iEaig4 (zext) https://alive2.llvm.org/ce/z/meGb3y (sext) We already handle `vector_reduce_and(<n x i1>)`, so let's just combine into the already-handled pattern and let the existing fold do the rest.	2021-08-02 23:02:23 +03:00

850 Commits