llvm-project

Author	SHA1	Message	Date
Caroline Concatto	2d728bbff5	[CodeGen][SelectionDAG]Add new intrinsic experimental.vector.reverse This patch adds a new intrinsic experimental.vector.reduce that takes a single vector and returns a vector of matching type but with the original lane order reversed. For example: ``` vector.reverse(<A,B,C,D>) ==> <D,C,B,A> ``` The new intrinsic supports fixed and scalable vectors types. The fixed-width vector relies on shufflevector to maintain existing behaviour. Scalable vector uses the new ISD node - VECTOR_REVERSE. This new intrinsic is one of the named shufflevector intrinsics proposed on the mailing-list in the RFC at [1]. Patch by Paul Walker (@paulwalker-arm). [1] https://lists.llvm.org/pipermail/llvm-dev/2020-November/146864.html Differential Revision: https://reviews.llvm.org/D94883	2021-02-15 13:39:43 +00:00
Arlo Siemsen	080866470d	Add ehcont section support In the future Windows will enable Control-flow Enforcement Technology (CET aka shadow stacks). To protect the path where the context is updated during exception handling, the binary is required to enumerate valid unwind entrypoints in a dedicated section which is validated when the context is being set during exception handling. This change allows llvm to generate the section that contains the appropriate symbol references in the form expected by the msvc linker. This feature is enabled through a new module flag, ehcontguard, which was modelled on the cfguard flag. The change includes a test that when the module flag is enabled the section is correctly generated. The set of exception continuation information includes returns from exceptional control flow (catchret in llvm). In order to collect catchret we: 1) Includes an additional flag on machine basic blocks to indicate that the given block is the target of a catchret operation, 2) Introduces a new machine function pass to insert and collect symbols at the start of each block, and 3) Combines these targets with the other EHCont targets that were already being collected. Change originally authored by Daniel Frampton <dframpto@microsoft.com> For more details, see MSVC documentation for `/guard:ehcont` https://docs.microsoft.com/en-us/cpp/build/reference/guard-enable-eh-continuation-metadata Reviewed By: pengfei Differential Revision: https://reviews.llvm.org/D94835	2021-02-15 14:27:12 +08:00
Akira Hatanaka	ed4718eccb	[ObjC][ARC] Use operand bundle 'clang.arc.attachedcall' instead of explicitly emitting retainRV or claimRV calls in the IR Background: This fixes a longstanding problem where llvm breaks ARC's autorelease optimization (see the link below) by separating calls from the marker instructions or retainRV/claimRV calls. The backend changes are in https://reviews.llvm.org/D92569. https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-autoreleasereturnvalue What this patch does to fix the problem: - The front-end adds operand bundle "clang.arc.attachedcall" to calls, which indicates the call is implicitly followed by a marker instruction and an implicit retainRV/claimRV call that consumes the call result. In addition, it emits a call to @llvm.objc.clang.arc.noop.use, which consumes the call result, to prevent the middle-end passes from changing the return type of the called function. This is currently done only when the target is arm64 and the optimization level is higher than -O0. - ARC optimizer temporarily emits retainRV/claimRV calls after the calls with the operand bundle in the IR and removes the inserted calls after processing the function. - ARC contract pass emits retainRV/claimRV calls after the call with the operand bundle. It doesn't remove the operand bundle on the call since the backend needs it to emit the marker instruction. The retainRV and claimRV calls are emitted late in the pipeline to prevent optimization passes from transforming the IR in a way that makes it harder for the ARC middle-end passes to figure out the def-use relationship between the call and the retainRV/claimRV calls (which is the cause of PR31925). - The function inliner removes an autoreleaseRV call in the callee if nothing in the callee prevents it from being paired up with the retainRV/claimRV call in the caller. It then inserts a release call if claimRV is attached to the call since autoreleaseRV+claimRV is equivalent to a release. If it cannot find an autoreleaseRV call, it tries to transfer the operand bundle to a function call in the callee. This is important since the ARC optimizer can remove the autoreleaseRV returning the callee result, which makes it impossible to pair it up with the retainRV/claimRV call in the caller. If that fails, it simply emits a retain call in the IR if retainRV is attached to the call and does nothing if claimRV is attached to it. - SCCP refrains from replacing the return value of a call with a constant value if the call has the operand bundle. This ensures the call always has at least one user (the call to @llvm.objc.clang.arc.noop.use). - This patch also fixes a bug in replaceUsesOfNonProtoConstant where multiple operand bundles of the same kind were being added to a call. Future work: - Use the operand bundle on x86-64. - Fix the auto upgrader to convert call+retainRV/claimRV pairs into calls with the operand bundles. rdar://71443534 Differential Revision: https://reviews.llvm.org/D92808	2021-02-12 09:51:57 -08:00
Thomas Preud'homme	bad0290ce3	Improve STRICT_FSETCC codegen in absence of no NaN As for SETCC, use a less expensive condition code when generating STRICT_FSETCC if the node is known not to have Nan. Reviewed By: SjoerdMeijer Differential Revision: https://reviews.llvm.org/D91972	2021-02-11 14:19:43 +00:00
Joe Ellis	67464dfe36	[DebugInfo] Only perform TypeSize -> unsigned cast when necessary This commit moves a line in SelectionDAGBuilder::handleDebugValue to avoid implicitly casting a TypeSize object to an unsigned earlier than necessary. It was possible that we bail out of the loop before the value is ever used, which means we could create a superfluous TypeSize warning. Reviewed By: DavidTruby Differential Revision: https://reviews.llvm.org/D96423	2021-02-11 13:54:09 +00:00
Hongtao Yu	1cb47a063e	[CSSPGO] Unblock optimizations with pseudo probe instrumentation. The IR/MIR pseudo probe intrinsics don't get materialized into real machine instructions and therefore they don't incur runtime cost directly. However, they come with indirect cost by blocking certain optimizations. Some of the blocking are intentional (such as blocking code merge) for better counts quality while the others are accidental. This change unblocks perf-critical optimizations that do not affect counts quality. They include: 1. IR InstCombine, sinking load operation to shorten lifetimes. 2. MIR LiveRangeShrink, similar to #1 3. MIR TwoAddressInstructionPass, i.e, opeq transform 4. MIR function argument copy elision 5. IR stack protection. (though not perf-critical but nice to have). Reviewed By: wmi Differential Revision: https://reviews.llvm.org/D95982	2021-02-10 12:43:17 -08:00
Kazu Hirata	7e75f6fc1d	[SelectionDAG] Use range-based for loops (NFC)	2021-02-09 22:14:30 -08:00
Nico Weber	de1966e542	Revert "[ObjC][ARC] Use operand bundle 'clang.arc.rv' instead of explicitly" This reverts commit 4a64d8fe392449b205e59031aad5424968cf7446. Makes clang crash when buildling trivial iOS programs, see comment after https://reviews.llvm.org/D92808#2551401	2021-02-09 11:06:32 -05:00
Thomas Preud'homme	a50ab8672d	Revert STRICT_FCMP nonan optimisation Summary: This reverts commit b7b61a7b5bc63df0d84f3722a1dcfa375c35ba30 which fails on some of the builders: http://lab.llvm.org:8011/#/builders/14/builds/5806 Reviewers: Subscribers:	2021-02-09 11:27:35 +00:00
Thomas Preud'homme	b7b61a7b5b	Improve STRICT_FSETCC codegen in absence of no NaN As for SETCC, use a less expensive condition code when generating STRICT_FSETCC if the node is known not to have Nan. Reviewed By: SjoerdMeijer Differential Revision: https://reviews.llvm.org/D91972	2021-02-09 11:18:16 +00:00
Akira Hatanaka	4a64d8fe39	[ObjC][ARC] Use operand bundle 'clang.arc.rv' instead of explicitly emitting retainRV or claimRV calls in the IR This reapplies 3fe3946d9a958b7af6130241996d9cfcecf559d4 without the changes made to lib/IR/AutoUpgrade.cpp, which was violating layering. Original commit message: Background: This patch makes changes to the front-end and middle-end that are needed to fix a longstanding problem where llvm breaks ARC's autorelease optimization (see the link below) by separating calls from the marker instructions or retainRV/claimRV calls. The backend changes are in https://reviews.llvm.org/D92569. https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-autoreleasereturnvalue What this patch does to fix the problem: - The front-end adds operand bundle "clang.arc.rv" to calls, which indicates the call is implicitly followed by a marker instruction and an implicit retainRV/claimRV call that consumes the call result. In addition, it emits a call to @llvm.objc.clang.arc.noop.use, which consumes the call result, to prevent the middle-end passes from changing the return type of the called function. This is currently done only when the target is arm64 and the optimization level is higher than -O0. - ARC optimizer temporarily emits retainRV/claimRV calls after the calls with the operand bundle in the IR and removes the inserted calls after processing the function. - ARC contract pass emits retainRV/claimRV calls after the call with the operand bundle. It doesn't remove the operand bundle on the call since the backend needs it to emit the marker instruction. The retainRV and claimRV calls are emitted late in the pipeline to prevent optimization passes from transforming the IR in a way that makes it harder for the ARC middle-end passes to figure out the def-use relationship between the call and the retainRV/claimRV calls (which is the cause of PR31925). - The function inliner removes an autoreleaseRV call in the callee if nothing in the callee prevents it from being paired up with the retainRV/claimRV call in the caller. It then inserts a release call if the call is annotated with claimRV since autoreleaseRV+claimRV is equivalent to a release. If it cannot find an autoreleaseRV call, it tries to transfer the operand bundle to a function call in the callee. This is important since ARC optimizer can remove the autoreleaseRV returning the callee result, which makes it impossible to pair it up with the retainRV/claimRV call in the caller. If that fails, it simply emits a retain call in the IR if the implicit call is a call to retainRV and does nothing if it's a call to claimRV. Future work: - Use the operand bundle on x86-64. - Fix the auto upgrader to convert call+retainRV/claimRV pairs into calls annotated with the operand bundles. rdar://71443534 Differential Revision: https://reviews.llvm.org/D92808	2021-02-05 06:09:42 -08:00
Akira Hatanaka	2fbbb18c1d	Revert "[ObjC][ARC] Use operand bundle 'clang.arc.rv' instead of explicitly" This reverts commit 3fe3946d9a958b7af6130241996d9cfcecf559d4. The commit violates layering by including a header from Analysis in lib/IR/AutoUpgrade.cpp.	2021-02-05 06:00:05 -08:00
Akira Hatanaka	3fe3946d9a	[ObjC][ARC] Use operand bundle 'clang.arc.rv' instead of explicitly emitting retainRV or claimRV calls in the IR Background: This patch makes changes to the front-end and middle-end that are needed to fix a longstanding problem where llvm breaks ARC's autorelease optimization (see the link below) by separating calls from the marker instructions or retainRV/claimRV calls. The backend changes are in https://reviews.llvm.org/D92569. https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-autoreleasereturnvalue What this patch does to fix the problem: - The front-end adds operand bundle "clang.arc.rv" to calls, which indicates the call is implicitly followed by a marker instruction and an implicit retainRV/claimRV call that consumes the call result. In addition, it emits a call to @llvm.objc.clang.arc.noop.use, which consumes the call result, to prevent the middle-end passes from changing the return type of the called function. This is currently done only when the target is arm64 and the optimization level is higher than -O0. - ARC optimizer temporarily emits retainRV/claimRV calls after the calls with the operand bundle in the IR and removes the inserted calls after processing the function. - ARC contract pass emits retainRV/claimRV calls after the call with the operand bundle. It doesn't remove the operand bundle on the call since the backend needs it to emit the marker instruction. The retainRV and claimRV calls are emitted late in the pipeline to prevent optimization passes from transforming the IR in a way that makes it harder for the ARC middle-end passes to figure out the def-use relationship between the call and the retainRV/claimRV calls (which is the cause of PR31925). - The function inliner removes an autoreleaseRV call in the callee if nothing in the callee prevents it from being paired up with the retainRV/claimRV call in the caller. It then inserts a release call if the call is annotated with claimRV since autoreleaseRV+claimRV is equivalent to a release. If it cannot find an autoreleaseRV call, it tries to transfer the operand bundle to a function call in the callee. This is important since ARC optimizer can remove the autoreleaseRV returning the callee result, which makes it impossible to pair it up with the retainRV/claimRV call in the caller. If that fails, it simply emits a retain call in the IR if the implicit call is a call to retainRV and does nothing if it's a call to claimRV. Future work: - Use the operand bundle on x86-64. - Fix the auto upgrader to convert call+retainRV/claimRV pairs into calls annotated with the operand bundles. rdar://71443534 Differential Revision: https://reviews.llvm.org/D92808	2021-02-05 05:55:18 -08:00
Kerry McLaughlin	9b4fcfaa9e	[SVE][CodeGen] Remove performMaskedGatherScatterCombine The AArch64 DAG combine added by D90945 & D91433 extends the index of a scalable masked gather or scatter to i32 if necessary. This patch removes the combine and instead adds shouldExtendGSIndex, which is used by visitMaskedGather/Scatter in SelectionDAGBuilder to query whether the index should be extended before calling getMaskedGather/Scatter. Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D94525	2021-02-01 14:10:00 +00:00
Serge Pavlov	bf416d166b	[FPEnv] Intrinsic for setting rounding mode To set non-default rounding mode user usually calls function 'fesetround' from standard C library. This way has some disadvantages. * It creates unnecessary dependency on libc. On the other hand, setting rounding mode requires few instructions and could be made by compiler. Sometimes standard C library even is not available, like in the case of GPU or AI cores that execute small kernels. * Compiler could generate more effective code if it knows that a particular call just sets rounding mode. This change introduces new IR intrinsic, namely 'llvm.set.rounding', which sets current rounding mode, similar to 'fesetround'. It however differs from the latter, because it is a lower level facility: * 'llvm.set.rounding' does not return any value, whereas 'fesetround' returns non-zero value in the case of failure. In glibc 'fesetround' reports failure if its argument is invalid or unsupported or if floating point operations are unavailable on the hardware. Compiler usually knows what core it generates code for and it can validate arguments in many cases. * Rounding mode is specified in 'fesetround' using constants like 'FE_TONEAREST', which are target dependent. It is inconvenient to work with such constants at IR level. C standard provides a target-independent way to specify rounding mode, it is used in FLT_ROUNDS, however it does not define standard way to set rounding mode using this encoding. This change implements only IR intrinsic. Lowering it to machine code is target-specific and will be implemented latter. Mapping of 'fesetround' to 'llvm.set.rounding' is also not implemented here. Differential Revision: https://reviews.llvm.org/D74729	2021-02-01 11:28:14 +07:00
Fangrui Song	d5bbaaaf95	[XRay] Make __xray_customevent support non-Linux	2021-01-25 00:48:21 -08:00
Kazu Hirata	16baad8f4e	[llvm] Use pop_back_val (NFC)	2021-01-24 12:18:57 -08:00
Craig Topper	79e798aca3	Recommit "[RISCV] Add a test of vector sadd.overflow to demonstrate intrinsics with multiple scalable vector results." This recommits 2c51bef76cbf0149101b9e7c7c658b4a58657929. I've fixed the broken check line from when I renamed the test function. Original commit message: This builds on D94142 where scalable vectors are allowed in structs. I did have to fix one scalable vector issue in the vector type creation for these intrinsics where we used getVectorNumElements instead of ElementCount.	2021-01-18 11:08:28 -08:00
Craig Topper	5d431c3d32	Revert "[RISCV] Add a test of vector sadd.overflow to demonstrate intrinsics with multiple scalable vector results." This reverts commit 2c51bef76cbf0149101b9e7c7c658b4a58657929. I seem to have messed up the check lines in the test.	2021-01-18 11:00:20 -08:00
Craig Topper	2c51bef76c	[RISCV] Add a test of vector sadd.overflow to demonstrate intrinsics with multiple scalable vector results. This builds on D94142 where scalable vectors are allowed in structs. I did have to fix one scalable vector issue in the vector type creation for these intrinsics where we used getVectorNumElements instead of ElementCount. Differential Revision: https://reviews.llvm.org/D94149	2021-01-18 10:41:36 -08:00
Jeroen Dobbelaere	668827b648	Introduce llvm.noalias.decl intrinsic The ``llvm.experimental.noalias.scope.decl`` intrinsic identifies where a noalias scope is declared. When the intrinsic is duplicated, a decision must also be made about the scope: depending on the reason of the duplication, the scope might need to be duplicated as well. Reviewed By: nikic, jdoerfert Differential Revision: https://reviews.llvm.org/D93039	2021-01-16 09:20:45 +01:00
Heejin Ahn	9e4eadeb13	[WebAssembly] Update basic EH instructions for the new spec This implements basic instructions for the new spec. - Adds new versions of instructions: `catch`, `catch_all`, and `rethrow` - Adds support for instruction selection for the new instructions - `catch` needs a custom routine for the same reason `throw` needs one, to encode `__cpp_exception` tag symbol. - Updates `WebAssembly::isCatch` utility function to include `catch_all` and Change code that compares an instruction's opcode with `catch` to use that function. - LateEHPrepare - Previously in LateEHPrepare we added `catch` instruction to both `catchpad`s (for user catches) and `cleanuppad`s (for destructors). In the new version `catch` is generated from `llvm.catch` intrinsic in instruction selection phase, so we only need to add `catch_all` to the beginning of cleanup pads. - `catch` is generated from instruction selection, but we need to hoist the `catch` instruction to the beginning of every EH pad, because `catch` can be in the middle of the EH pad or even in a split BB from it after various code transformations. - Removes `addExceptionExtraction` function, which was used to generate `br_on_exn` before. - CFGStackfiy: Deletes `fixUnwindMismatches` function. Running this function on the new instruction causes crashes, and the new version will be added in a later CL, whose contents will be completely different. So deleting the whole function will make the diff easier to read. - Reenables all disabled tests in exception.ll and eh-lsda.ll and a single basic test in cfg-stackify-eh.ll. - Updates existing tests to use the new assembly format. And deletes `br_on_exn` instructions from the tests and FileCheck lines. Reviewed By: dschuff, tlively Differential Revision: https://reviews.llvm.org/D94040	2021-01-09 01:48:06 -08:00
Heejin Ahn	7be271537e	[WebAssembly] Rename wasm_rethrow_in_catch intrinsic/builtin `wasm_rethrow_in_catch` intrinsic and builtin are used in order to rethrow an exception when the exception is caught but there is no matching clause within the current `catch`. For example, ``` try { foo(); } catch (int n) { ... } ``` If the caught exception does not correspond to C++ `int` type, it should be rethrown. These intrinsic/builtin were renamed `rethrow_in_catch` because at the time I thought there would be another intrinsic for C++'s `throw` keyword, which rethrows an exception. It turned out that `throw` keyword doesn't require wasm's `rethrow` instruction, so we rename `rethrow_in_catch` to just `rethrow` here. Reviewed By: dschuff, tlively Differential Revision: https://reviews.llvm.org/D94038	2021-01-08 06:55:04 -08:00
Juneyoung Lee	5cdf6ed744	[CodeGen] recognize select form of and/ors when splitting branch conditions Recently a few patches are made to move towards using select i1 instead of and/or i1 to represent "a && b"/"a \|\| b" in C/C++. "a && b" in C/C++ does not evaluate b if a is false whereas 'and a, b' in IR evaluates b and uses its result regardless of the result of a. This is problematic because it can cause miscompilation if b was an erroneous operation (https://llvm.org/pr48353). In C/C++, the result is simply false because b is not evaluated, but in IR the result is poison. The discussion at D93065 has more context about this. This patch makes two branch-splitting optimizations (one in SelectionDAGBuilder, one in CodeGenPrepare) recognize select form of and/or as well using m_LogicalAnd/Or. Since it is CodeGen, I think this is semantically ok (at least as safe as what codegen already did). Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D93853	2021-01-01 04:46:10 +09:00
Kazu Hirata	1e3ed09165	[CodeGen] Use llvm::append_range (NFC)	2020-12-28 19:55:16 -08:00
Bjorn Pettersson	a89d751fb4	Add intrinsics for saturating float to int casts This patch adds support for the fptoui.sat and fptosi.sat intrinsics, which provide basically the same functionality as the existing fptoui and fptosi instructions, but will saturate (or return 0 for NaN) on values unrepresentable in the target type, instead of returning poison. Related mailing list discussion can be found at: https://groups.google.com/d/msg/llvm-dev/cgDFaBmCnDQ/CZAIMj4IBAAJ The intrinsics have overloaded source and result type and support vector operands: i32 @llvm.fptoui.sat.i32.f32(float %f) i100 @llvm.fptoui.sat.i100.f64(double %f) <4 x i32> @llvm.fptoui.sat.v4i32.v4f16(half %f) // etc On the SelectionDAG layer two new ISD opcodes are added, FP_TO_UINT_SAT and FP_TO_SINT_SAT. These opcodes have two operands and one result. The second operand is an integer constant specifying the scalar saturation width. The idea here is that initially the second operand and the scalar width of the result type are the same, but they may change during type legalization. For example: i19 @llvm.fptsi.sat.i19.f32(float %f) // builds i19 fp_to_sint_sat f, 19 // type legalizes (through integer result promotion) i32 fp_to_sint_sat f, 19 I went for this approach, because saturated conversion does not compose well. There is no good way of "adjusting" a saturating conversion to i32 into one to i19 short of saturating twice. Specifying the saturation width separately allows directly saturating to the correct width. There are two baseline expansions for the fp_to_xint_sat opcodes. If the integer bounds can be exactly represented in the float type and fminnum/fmaxnum are legal, we can expand to something like: f = fmaxnum f, FP(MIN) f = fminnum f, FP(MAX) i = fptoxi f i = select f uo f, 0, i # unnecessary if unsigned as 0 = MIN If the bounds cannot be exactly represented, we expand to something like this instead: i = fptoxi f i = select f ult FP(MIN), MIN, i i = select f ogt FP(MAX), MAX, i i = select f uo f, 0, i # unnecessary if unsigned as 0 = MIN It should be noted that this expansion assumes a non-trapping fptoxi. Initial tests are for AArch64, x86_64 and ARM. This exercises all of the scalar and vector legalization. ARM is included to test float softening. Original patch by @nikic and @ebevhan (based on D54696). Differential Revision: https://reviews.llvm.org/D54749	2020-12-18 11:09:41 +01:00
Matt Arsenault	2e0e03c6a0	OpaquePtr: Require byval on x86_intrcc parameter 0 Currently the backend special cases x86_intrcc and treats the first parameter as byval. Make the IR require byval for this parameter to remove this special case, and avoid the dependence on the pointee element type. Fixes bug 46672. I'm not sure the IR is enforcing all the calling convention constraints. clang seems to ignore the attribute for empty parameter lists, but the IR tolerates it.	2020-12-14 16:34:37 -05:00
Kerry McLaughlin	4519ff4b6f	[SVE][CodeGen] Add the ExtensionType flag to MGATHER Adds the ExtensionType flag, which reflects the LoadExtType of a MaskedGatherSDNode. Also updated SelectionDAGDumper::print_details so that details of the gather load (is signed, is scaled & extension type) are printed. Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D91084	2020-12-09 11:19:08 +00:00
Joe Ellis	80c33de2d3	[SelectionDAG] Add llvm.vector.{extract,insert} intrinsics This commit adds two new intrinsics. - llvm.experimental.vector.insert: used to insert a vector into another vector starting at a given index. - llvm.experimental.vector.extract: used to extract a subvector from a larger vector starting from a given index. The codegen work for these intrinsics has already been completed; this commit is simply exposing the existing ISD nodes to LLVM IR. Reviewed By: cameron.mcinally Differential Revision: https://reviews.llvm.org/D91362	2020-12-09 11:08:41 +00:00
Simon Moll	3ffbc79357	[VP] Build VP SDNodes Translate VP intrinsics to VP_* SDNodes. The tests check whether a matching vp_* SDNode is emitted. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D91441	2020-12-09 11:36:51 +01:00
Tim Northover	c5978f42ec	UBSAN: emit distinctive traps Sometimes people get minimal crash reports after a UBSAN incident. This change tags each trap with an integer representing the kind of failure encountered, which can aid in tracking down the root cause of the problem.	2020-12-08 10:28:26 +00:00
Kerry McLaughlin	f6dd32fd35	[SVE][CodeGen] Lower scalable masked gathers Lowers the llvm.masked.gather intrinsics (scalar plus vector addressing mode only) Changes in this patch: - Add custom lowering for MGATHER, using getGatherVecOpcode() to choose the appropriate gather load opcode to use. - Improve codegen with refineIndexType/refineUniformBase, added in D90942 - Tests added for gather loads with 32 & 64-bit scaled & unscaled offsets. Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D91092	2020-12-07 12:20:41 +00:00
Kazu Hirata	a553ac9791	[CodeGen] llvm::erase_if (NFC)	2020-12-05 15:44:40 -08:00
Francesco Petrogalli	f6150aa41a	[SelectionDAGBuilder] Update signature of `getRegsAndSizes()`. The mapping between registers and relative size has been updated to use TypeSize to account for the size of scalable EVTs. The patch is a NFCI, if not for the fact that with this change the function `getUnderlyingArgRegs` does not raise a warning for implicit conversion of `TypeSize` to `unsigned` when generating machine code from the test added to the patch. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D92096	2020-11-30 17:38:51 +00:00
Nikita Popov	4df8efce80	[AA] Split up LocationSize::unknown() Currently, we have some confusion in the codebase regarding the meaning of LocationSize::unknown(): Some parts (including most of BasicAA) assume that LocationSize::unknown() only allows accesses after the base pointer. Some parts (various callers of AA) assume that LocationSize::unknown() allows accesses both before and after the base pointer (but within the underlying object). This patch splits up LocationSize::unknown() into LocationSize::afterPointer() and LocationSize::beforeOrAfterPointer() to make this completely unambiguous. I tried my best to determine which one is appropriate for all the existing uses. The test changes in cs-cs.ll in particular illustrate a previously clearly incorrect AA result: We were effectively assuming that argmemonly functions were only allowed to access their arguments after the passed pointer, but not before it. I'm pretty sure that this was not intentional, and it's certainly not specified by LangRef that way. Differential Revision: https://reviews.llvm.org/D91649	2020-11-26 18:39:55 +01:00
Craig Topper	2d6042937b	[SelectionDAGBuilder] Add SPF_NABS support to visitSelect We currently don't match this which limits the effectiveness of D91120 until InstCombine starts canonicalizing to llvm.abs. This should be easy to remove if/when we remove the SPF_ABS handling. Differential Revision: https://reviews.llvm.org/D92118	2020-11-25 14:54:26 -08:00
Hongtao Yu	d0e42037bf	[CSSPGO] MIR target-independent pseudo instruction for pseudo-probe intrinsic This change introduces a MIR target-independent pseudo instruction corresponding to the IR intrinsic llvm.pseudoprobe for pseudo-probe block instrumentation. Please refer to https://reviews.llvm.org/D86193 for the whole story. An `llvm.pseudoprobe` intrinsic call will be lowered into a target-independent operation named `PSEUDO_PROBE`. Given the following instrumented IR, ``` define internal void @foo2(i32 %x, void (i32)* %f) !dbg !4 { bb0: %cmp = icmp eq i32 %x, 0 call void @llvm.pseudoprobe(i64 837061429793323041, i64 1) br i1 %cmp, label %bb1, label %bb2 bb1: call void @llvm.pseudoprobe(i64 837061429793323041, i64 2) br label %bb3 bb2: call void @llvm.pseudoprobe(i64 837061429793323041, i64 3) br label %bb3 bb3: call void @llvm.pseudoprobe(i64 837061429793323041, i64 4) ret void } ``` the corresponding MIR is shown below. Note that block `bb3` is duplicated into `bb1` and `bb2` where its probe is duplicated too. This allows for an accurate execution count to be collected for `bb3`, which is basically the sum of the counts of `bb1` and `bb2`. ``` bb.0.bb0: frame-setup PUSH64r undef $rax, implicit-def $rsp, implicit $rsp TEST32rr killed renamable $edi, renamable $edi, implicit-def $eflags PSEUDO_PROBE 837061429793323041, 1, 0 $edi = MOV32ri 1, debug-location !13; test.c:0 JCC_1 %bb.1, 4, implicit $eflags bb.2.bb2: PSEUDO_PROBE 837061429793323041, 3, 0 PSEUDO_PROBE 837061429793323041, 4, 0 $rax = frame-destroy POP64r implicit-def $rsp, implicit $rsp RETQ bb.1.bb1: PSEUDO_PROBE 837061429793323041, 2, 0 PSEUDO_PROBE 837061429793323041, 4, 0 $rax = frame-destroy POP64r implicit-def $rsp, implicit $rsp RETQ ``` The target op PSEUDO_PROBE will be converted into a piece of binary data by the object emitter with no machine instructions generated. This is done in a different patch. Reviewed By: wmi Differential Revision: https://reviews.llvm.org/D86495	2020-11-20 10:52:43 -08:00
Craig Topper	a7eae62a42	[SelectionDAG][X86][PowerPC][Mips] Replace the default implementation of LowerOperationWrapper with the X86 and PowerPC version. The default version only works if the returned node has a single result. The X86 and PowerPC versions support multiple results and allow a single result to be returned from a node with multiple outputs. And allow a single result that is not result 0 of the node. Also replace the Mips version since the new version should work for it. The original version handled multiple results, but only if the new node and original node had the same number of results. Differential Revision: https://reviews.llvm.org/D91846	2020-11-20 10:06:53 -08:00
Nikita Popov	393b9e9db3	[MemLoc] Require LocationSize argument (NFC) When constructing a MemoryLocation by hand, require that a LocationSize is explicitly specified. D91649 will split up LocationSize::unknown() into two different states, and callers should make an explicit choice regarding the kind of MemoryLocation they want to have.	2020-11-19 21:45:52 +01:00
Leonard Chan	a97f62837f	[llvm][IR] Add dso_local_equivalent Constant The `dso_local_equivalent` constant is a wrapper for functions that represents a value which is functionally equivalent to the global passed to this. That is, if this accepts a function, calling this constant should have the same effects as calling the function directly. This could be a direct reference to the function, the `@plt` modifier on X86/AArch64, a thunk, or anything that's equivalent to the resolved function as a call target. When lowered, the returned address must have a constant offset at link time from some other symbol defined within the same binary. The address of this value is also insignificant. The name is leveraged from `dso_local` where use of a function or variable is resolved to a symbol in the same linkage unit. In this patch: - Addition of `dso_local_equivalent` and handling it - Update Constant::needsRelocation() to strip constant inbound GEPs and take advantage of `dso_local_equivalent` for relative references This is useful for the [Relative VTables C++ ABI](https://reviews.llvm.org/D72959) which makes vtables readonly. This works by replacing the dynamic relocations for function pointers in them with static relocations that represent the offset between the vtable and virtual functions. If a function is externally defined, `dso_local_equivalent` can be used as a generic wrapper for the function to still allow for this static offset calculation to be done. See [RFC](http://lists.llvm.org/pipermail/llvm-dev/2020-August/144469.html) for more details. Differential Revision: https://reviews.llvm.org/D77248	2020-11-19 10:26:17 -08:00
Florian Hahn	1983acce7c	[SelDAGBuilder] Do not require simple VTs for constraints. In some cases, the values passed to `asm sideeffect` calls cannot be mapped directly to simple MVTs. Currently, we crash in the backend if that happens. An example can be found in the @test_vector_too_large_r_m test case, where we pass <9 x float> vectors. In practice, this can happen in cases like the simple C example below. using vec = float __attribute__((ext_vector_type(9))); void f1 (vec m) { asm volatile("" : "+r,m"(m) : : "memory"); } One case that use "+r,m" constraints for arbitrary data types in practice is google-benchmark's DoNotOptimize. This patch updates visitInlineAsm so that it use MVT::Other for constraints with complex VTs. It looks like the rest of the backend correctly deals with that and properly legalizes the type. And we still report an error if there are no registers to satisfy the constraint. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D91710	2020-11-19 09:31:54 +00:00
Kerry McLaughlin	170947a5de	[SVE][CodeGen] Lower scalable masked scatters Lowers the llvm.masked.scatter intrinsics (scalar plus vector addressing mode only) Changes included in this patch: - Custom lowering for MSCATTER, which chooses the appropriate scatter store opcode to use. Floating-point scatters are cast to integer, with patterns added to match FP reinterpret_casts. - Added the getCanonicalIndexType function to convert redundant addressing modes (e.g. scaling is redundant when accessing bytes) - Tests with 32 & 64-bit scaled & unscaled offsets Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D90941	2020-11-11 11:50:22 +00:00
Kerry McLaughlin	ffbbfc76ca	[SVE][CodeGen] Add the isTruncatingStore flag to MSCATTER This patch adds the IsTruncatingStore flag to MaskedScatterSDNode, set by getMaskedScatter(). Updated SelectionDAGDumper::print_details for MaskedScatterSDNode to print the details of masked scatters (is truncating, signed or scaled). This is the first in a series of patches which adds support for scalable masked scatters Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D90939	2020-11-11 10:58:24 +00:00
Gaurav Jain	4634ad6c0b	[NFC] Set return type of getStackPointerRegisterToSaveRestore to Register Differential Revision: https://reviews.llvm.org/D89858	2020-10-21 16:19:38 -07:00
David Sherwood	35a531fb45	[SVE][CodeGen][NFC] Replace TypeSize comparison operators with their scalar equivalents In certain places in llvm/lib/CodeGen we were relying upon the TypeSize comparison operators when in fact the code was only ever expecting either scalar values or fixed width vectors. I've changed some of these places to use the equivalent scalar operator. Differential Revision: https://reviews.llvm.org/D88482	2020-10-19 08:30:31 +01:00
David Sherwood	f693f915a0	[SVE][CodeGen] Replace uses of TypeSize comparison operators In certain places in the code we can never end up in a situation where we're mixing fixed width and scalable vector types. For example, we can't have truncations and extends that change the lane count. Also, in other places such as GenWidenVectorStores and GenWidenVectorLoads we know from the behaviour of FindMemType that we can never choose a vector type with a different scalable property. In various places I have used EVT::bitsXY functions instead of TypeSize::isKnownXY, where it probably makes sense to keep an assert that scalable properties match. Differential Revision: https://reviews.llvm.org/D88654	2020-10-19 08:08:41 +01:00
Amara Emerson	e72cfd938f	Rename the VECREDUCE_STRICT_{FADD,FMUL} SDNodes to VECREDUCE_SEQ_{FADD,FMUL}. The STRICT was causing unnecessary confusion. I think SEQ is a more accurate name for what they actually do, and the other obvious option of "ORDERED" has the issue of already having a meaning in FP contexts. Differential Revision: https://reviews.llvm.org/D88791	2020-10-07 10:45:09 -07:00
Amara Emerson	322d0afd87	[llvm][mlir] Promote the experimental reduction intrinsics to be first class intrinsics. This change renames the intrinsics to not have "experimental" in the name. The autoupgrader will handle legacy intrinsics. Relevant ML thread: http://lists.llvm.org/pipermail/llvm-dev/2020-April/140729.html Differential Revision: https://reviews.llvm.org/D88787	2020-10-07 10:36:44 -07:00
Craig Topper	1127662c6d	[SelectionDAG] Make sure FMF are propagated when getSetcc canonicalizes FP constants to RHS. getNode handling for ISD:SETCC calls FoldSETCC which can canonicalize FP constants to the RHS. When this happens we should create the node with the FMF that was requested. By using FlagInserter when can ensure any calls to getNode/getSetcc during canonicalization will also get the flags. Differential Revision: https://reviews.llvm.org/D88063	2020-10-05 14:55:23 -07:00
Dávid Bolvanský	179e15d53a	[SystemZ] Optimize bcmp calls (PR47420) Solves https://bugs.llvm.org/show_bug.cgi?id=47420 Reviewed By: uweigand Differential Revision: https://reviews.llvm.org/D87988	2020-09-25 17:55:39 +02:00

... 2 3 4 5 6 ...

1772 Commits