llvm-project

Author	SHA1	Message	Date
Dávid Bolvanský	fb65aaf0be	[NFCI] Fixed missing colon in CHECK directives - part 2	2022-04-03 14:42:59 +02:00
Matt Arsenault	0fb6856aff	ARM/GlobalISel: Get pointer type from value instead of getPointerSize Avoid using getPointerSize and pass through the original value type.	2022-03-31 16:46:23 -04:00
Sanjay Patel	436b875e49	[SDAG] avoid libcalls to fmin/fmax for soft-float targets This is an extension of D70965 to avoid creating a mathlib call where it did not exist in the original source. Also see D70852 for discussion about an alternative proposal that was abandoned. In the motivating bug report: https://github.com/llvm/llvm-project/issues/54554 ...we also have a more general issue about handling "no-builtin" options. Differential Revision: https://reviews.llvm.org/D122610	2022-03-30 11:22:03 -04:00
Sanjay Patel	e18cc5277f	[SDAG] try to canonicalize logical shift after bswap When shifting by a byte-multiple: bswap (shl X, C) --> lshr (bswap X), C bswap (lshr X, C) --> shl (bswap X), C This is the backend version of D122010 and an alternative suggested in D120648. There's an extra check to make sure the shift amount is valid that was not in the rough draft. I'm not sure if there is a larger motivating case for RISCV (bug report?), but the ARM diffs show a benefit from having a late version of the transform (because we do not combine the loads in IR). Differential Revision: https://reviews.llvm.org/D122655	2022-03-30 09:29:32 -04:00
Julian Lettner	64902d335c	Reland "Lower `@llvm.global_dtors` using `__cxa_atexit` on MachO" For MachO, lower `@llvm.global_dtors` into `@llvm_global_ctors` with `__cxa_atexit` calls to avoid emitting the deprecated `__mod_term_func`. Reuse the existing `WebAssemblyLowerGlobalDtors.cpp` to accomplish this. Enable fallback to the old behavior via Clang driver flag (`-fregister-global-dtors-with-atexit`) or llc / code generation flag (`-lower-global-dtors-via-cxa-atexit`). This escape hatch will be removed in the future. Differential Revision: https://reviews.llvm.org/D121736	2022-03-23 18:36:55 -07:00
Zequan Wu	581dc3c729	Revert "Lower `@llvm.global_dtors` using `__cxa_atexit` on MachO" This reverts commit 22570bac694396514fff18dec926558951643fa6.	2022-03-23 16:11:54 -07:00
Simon Pilgrim	7f8572b8c3	[ARM] select_xform.ll - re-add and fix missing CHECK prefixes We were still checking test results with the CHECK prefix but they had bit-rotted since whenever it'd been removed from the --check-prefixes list	2022-03-22 17:35:10 +00:00
Eli Friedman	ddca66622c	[ARM] Fix shouldExpandAtomicLoadInIR for subtargets without ldrexd. Regression from 2f497ec3; we should not try to generate ldrexd on targets that don't have it. Also, while I'm here, fix shouldExpandAtomicStoreInIR, for consistency. That doesn't really have any practical effect, though. On Thumb targets where we need to use __sync_* libcalls, there is no libcall for stores, so SelectionDAG calls __sync_lock_test_and_set_8 anyway.	2022-03-18 15:54:38 -07:00
Eli Friedman	f10f16a6a9	Autogenerate llvm/test/CodeGen/ARM/atomic-load-store.ll	2022-03-18 15:54:38 -07:00
Eli Friedman	2f497ec3a0	[ARM] Fix ARM backend to correctly use atomic expansion routines. Without this patch, clang would generate calls to __sync_* routines on targets where it does not make sense; we can't assume the routines exist on unknown targets. Linux has special implementations of the routines that work on old ARM targets; other targets have no such routines. In general, atomics operations which aren't natively supported should go through libatomic (__atomic_) APIs, which can support arbitrary atomics through locks. ARM targets older than v6, where this patch makes a difference, are rare in practice, but not completely extinct. See, for example, discussion on D116088. This also affects Cortex-M0, but I don't think __sync_ routines actually exist in any Cortex-M0 libraries. So in practice this just leads to a slightly different linker error for those cases, I think. Mechanically, this patch does the following: - Ensures we run atomic expansion unconditionally; it never makes sense to completely skip it. - Fixes getMaxAtomicSizeInBitsSupported() so it returns an appropriate number on all ARM subtargets. - Fixes shouldExpandAtomicRMWInIR() and shouldExpandAtomicCmpXchgInIR() to correctly handle subtargets that don't have atomic instructions. Differential Revision: https://reviews.llvm.org/D120026	2022-03-18 12:43:57 -07:00
Julian Lettner	22570bac69	Lower `@llvm.global_dtors` using `__cxa_atexit` on MachO For MachO, lower `@llvm.global_dtors` into `@llvm_global_ctors` with `__cxa_atexit` calls to avoid emitting the deprecated `__mod_term_func`. Reuse the existing `WebAssemblyLowerGlobalDtors.cpp` to accomplish this. Enable fallback to the old behavior via Clang driver flag (`-fregister-global-dtors-with-atexit`) or llc / code generation flag (`-lower-global-dtors-via-cxa-atexit`). This escape hatch will be removed in the future. Differential Revision: https://reviews.llvm.org/D121736	2022-03-17 10:47:13 -07:00
Arthur Eubanks	2371c5a0e0	[OpaquePtr][ARM] Use elementtype on ldrex/ldaex/stlex/strex Includes verifier changes checking the elementtype, clang codegen changes to emit the elementtype, and ISel changes using the elementtype. Basically the same as D120527. Reviewed By: #opaque-pointers, nikic Differential Revision: https://reviews.llvm.org/D121847	2022-03-16 14:11:53 -07:00
Simon Pilgrim	7262eacd41	Revert rG9c542a5a4e1ba36c24e48185712779df52b7f7a6 "Lower `@llvm.global_dtors` using `__cxa_atexit` on MachO" Mane of the build bots are complaining: Unknown command line argument '-lower-global-dtors'	2022-03-15 13:01:35 +00:00
Julian Lettner	9c542a5a4e	Lower `@llvm.global_dtors` using `__cxa_atexit` on MachO For MachO, lower `@llvm.global_dtors` into `@llvm_global_ctors` with `__cxa_atexit` calls to avoid emitting the deprecated `__mod_term_func`. Reuse the existing `WebAssemblyLowerGlobalDtors.cpp` to accomplish this. Enable fallback to the old behavior via Clang driver flag (`-fregister-global-dtors-with-atexit`) or llc / code generation flag (`-lower-global-dtors-via-cxa-atexit`). This escape hatch will be removed in the future. Differential Revision: https://reviews.llvm.org/D121327	2022-03-14 17:51:18 -07:00
Zhiyao Ma	adc26b4eae	[ARM] Fix 8-bit immediate overflow in the instruction of segmented stack prologue. It fixes the overflow of 8-bit immediate field in the emitted instruction that allocates large stacklet. For thumb2 targets, load large immediate by a pair of movw and movt instruction. For thumb1 and ARM targets, load large immediate by reading from literal pool. Differential Revision: https://reviews.llvm.org/D118545	2022-03-10 15:15:24 -08:00
Xiang1 Zhang	c31014322c	TLS loads opimization (hoist) Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D120000	2022-03-10 09:29:06 +08:00
Sanjay Patel	341623653d	[SDAG] match rotate pattern with extra 'or' operation This is another fold generalized from D111530. We can find a common source for a rotate operation hidden inside an 'or': https://alive2.llvm.org/ce/z/9pV8hn Deciding when this is profitable vs. a funnel-shift is tricky, but this does not show any regressions: if a target has a rotate but it does not have a funnel-shift, then try to form the rotate here. That is why we don't have x86 test diffs for the scalar tests that are duplicated from AArch64 ( 74a65e3834d9487 ) - shld/shrd are available. That also makes it difficult to show vector diffs - the only case where I found a diff was on x86 AVX512 or XOP with i64 elements. There's an additional check for a legal type to avoid a problem seen with x86-32 where we form a 64-bit rotate but then it gets split inefficiently. We might avoid that by adding more rotate folds, but I didn't check to see what is missing on that path. This gets most of the motivating patterns for AArch64 / ARM that are in D111530. We still need a couple of enhancements to setcc pattern matching with rotate/funnel-shift to get the rest. Differential Revision: https://reviews.llvm.org/D120933	2022-03-09 13:19:00 -05:00
Craig Topper	8e132c5c1d	[LegalizeTypes][ARM][X86] Change ExpandIntRes_ABS to use sra+xor+sub. Previously we used sra+add+xor if ADDCARRY is supported. This changes to sra+xor+sub is SUBCARRY is available. This is consistent with the recent change to the default expansion in LegalizeDAG. Differential Revision: https://reviews.llvm.org/D121039	2022-03-07 11:28:32 -08:00
Xiang1 Zhang	65588a0776	Revert "TLS loads opimization (hoist)" Revert for more reviews This reverts commit 30e612ebdfb0f243eb63d93487790a53c26ae873.	2022-03-02 14:10:11 +08:00
Xiang1 Zhang	30e612ebdf	TLS loads opimization (hoist) Reviewed By: Wang Pheobe, Topper Craig Differential Revision: https://reviews.llvm.org/D120000	2022-03-02 10:37:24 +08:00
Sanjay Patel	acb96ffd14	[SDAG] fold bitwise logic with shifted operands LOGIC (LOGIC (SH X0, Y), Z), (SH X1, Y) --> LOGIC (SH (LOGIC X0, X1), Y), Z https://alive2.llvm.org/ce/z/QmR9rR This is a reassociation + factoring fold. The common shift operation is moved after a bitwise logic op on 2 input operands. We get simpler cases of these patterns in IR, but I suspect we would miss all of these exact tests in IR too. We also handle the simpler form of this plus several other folds in DAGCombiner::hoistLogicOpWithSameOpcodeHands(). This is a partial implementation of a transform suggested in D111530 (only handles 'or' bitwise logic as a first step - need to stamp out more tests for other opcodes). Several of the same tests added for D111530 are altered here (but not fully optimized). I'm not sure yet if this would help/hinder that patch, but this should be an improvement for all tests added with ecf606cb4329ae since it removes a shift operation in those examples. Differential Revision: https://reviews.llvm.org/D120516	2022-02-27 09:54:12 -05:00
David Green	a10789d6cd	[ARM] Recognize SSAT and USAT from SMIN/SMAX We have some recognition of SSAT and USAT from SELECT_CC at the moment. This extends the matching to SMIN/SMAX which can help catch more cases, either from min/max being the canonical form in instcombine or from some expanded nodes like fp_to_si_sat. Differential Revision: https://reviews.llvm.org/D119819	2022-02-23 08:55:54 +00:00
David Green	4d5b020d6e	[ARM] Addition SSAT/USAT tests for min/max patterns. NFC	2022-02-21 16:24:58 +00:00
Craig Topper	a6fb1bb306	[ARM] Remove unused lowerABS function. NFC This function was added in D49837, but no setOperationAction call was added with it. The code is equivalent to what is done by the default ExpandIntRes_ABS implementation when ADDCARRY is supported. Test case added to verify this. There was some existing coverage from Thumb2 MVE tests, but they started from vectors.	2022-02-20 22:43:23 -08:00
Jay Foad	f510045d82	[CodeGen] Remove unneeded regex escaping in FileCheck patterns. NFC. Take advantage of D117117 to simplify all {{\[}} to [ and {{\]}} to ]. Differential Revision: https://reviews.llvm.org/D117298	2022-02-18 16:10:56 +00:00
Nikita Popov	ff040eca93	[FastISel] Reuse register for bitcast that does not change MVT The current FastISel code reuses the register for a bitcast that doesn't change the IR type, but uses a reg-to-reg copy if it changes the IR type without changing the MVT. However, we can simply reuse the register in that case as well. In particular, this avoids unnecessary reg-to-reg copies for pointer bitcasts. This was found while inspecting O0 codegen differences between typed and opaque pointers. Differential Revision: https://reviews.llvm.org/D119432	2022-02-14 09:13:17 +01:00
Nikita Popov	ff5a9c3c65	[CodeGen] Regenerate test checks (NFC)	2022-02-10 14:10:00 +01:00
Oliver Stannard	a76620143c	[ARM] Patterns for vector conversion between half and float These patterns were omitted because clang only allows converting between these types using intrinsics, but other front-ends or optimisation passes may want to use them. Differential revision: https://reviews.llvm.org/D119354	2022-02-10 09:51:55 +00:00
Mark Murray	3d7662142d	[ARM] Undeprecate complex IT blocks AArch32/Armv8A introduced the performance deprecation of certain patterns of IT instructions. After some debate internal to ARM, this is now being reverted; i.e. no IT instruction patterns are performance deprecated anymore, as the perfomance degredation is not significant enough. This reverts the following: "ARMv8-A deprecates some uses of the T32 IT instruction. All uses of IT that apply to instructions other than a single subsequent 16-bit instruction from a restricted set are deprecated, as are explicit references to the PC within that single 16-bit instruction. This permits the non-deprecated forms of IT and subsequent instructions to be treated as a single 32-bit conditional instruction." The deprecation no longer applies, but the behaviour may be controlled by the -arm-restrict-it and -arm-no-restrict-it command-line options, with the latter being the default. No warnings about complex IT blocks will be generated. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D118044	2022-02-07 15:47:53 +00:00
Roman Lebedev	ee4ba9f3a1	Revert "[SimplifyCFG] Start redesigning `FoldTwoEntryPHINode()`." Unfortunately, it seems we really do need to take the long route; start from the "merge" block, find (all the) "dispatch" blocks, and deal with each "dispatch" block separately, instead of simply starting from each "dispatch" block like it would logically make sense, otherwise we run into a number of other missing folds around `switch` formation, missing sinking/hoisting and phase ordering. This reverts commit 85628ce75b3084dc0f185a320152baf85b59aba7. This reverts commit c5fff9095342a792bf4b9a077fe3c3a83c4e566c. This reverts commit 34a98e1046e3aa55e5f26ab20a15e96b4034d25a. This reverts commit 1e353f092288309d74d380367aa50bbd383780ed.	2022-02-03 12:32:50 +03:00
Roman Lebedev	1e353f0922	[SimplifyCFG] Start redesigning `FoldTwoEntryPHINode()`. The current `FoldTwoEntryPHINode()` is not quite designed correctly. It starts from the merge point, and then tries to detect the 'divergence' point. Because of that, it is limited to the simple two-predecessor case, where the PHI completely goes away. but that is rather pessimistic, and it doesn't make much sense from the costmodel side of things. For example if there is some other unrelated predecessor of the merge point, we could split the merge point so that the then/else blocks first branch to an empty block and then to the merge point, and then we'd be able to speculate the then/else code. But if we'd instead simply start at the divergence point, and look for the merge point, then we'll just natively support this case. There's also the fact that `SpeculativelyExecuteBB()` already does just that, but only if there is a single block to speculate, and with a much more restrictive cost model. But that also means we have code duplication. Now, sadly, while this is as much NFCI as possible, there is just no way to cleanly migrate to the proper implementation. The results are going to be different somewhat because of various phase ordering effects and SimplifyCFG block iteration strategy.	2022-02-02 17:53:56 +03:00
David Green	82973edfb7	[ARM][AArch64] Introduce qrdmlah and qrdmlsh intrinsics Since it's introduction, the qrdmlah has been represented as a qrdmulh and a sadd_sat. This doesn't produce the same result for all input values though. This patch fixes that by introducing a qrdmlah (and qrdmlsh) intrinsic specifically for the vqrdmlah and sqrdmlah instructions. The old test cases will now produce a qrdmulh and sqadd, as expected. Fixes #53120 and #50905 and #51761. Differential Revision: https://reviews.llvm.org/D117592	2022-01-27 19:19:46 +00:00
David Green	1fec2154b2	[ARM][AArch64] Cleanup and autogenerate v8.1a vqdrmlah tests. NFC	2022-01-27 18:43:06 +00:00
Simon Pilgrim	fdd3e2c943	[DAG] SelectionDAG::getNode(N1,N2) - detect N2 constant vector splats as well as scalars We already perform some basic folds (add/sub with zero etc.) on scalar types, this patch adds some basic support for constant splats as well in a few cases (we can add more with future test coverage). In the cases I've enabled, we can handle buildvector implicit truncation as we're not creating new constant nodes from the vector types - we're just returning existing nodes. This allows us to get a number of extra cases in the aarch64 tests. I haven't enabled support for undefs in buildvector splats, as we're often checking for zero/allones patterns that return the original constant and we shouldn't be returning undef elements in some of these cases - we can enable this later if we're OK with creating new constants. Differential Revision: https://reviews.llvm.org/D118264	2022-01-27 10:59:08 +00:00
David Green	57356d6bb7	[DAG] Create fptoui.sat from clamped fptoui This is the unsigned variant of D111976, where we convert a clamped fptoui to a fptoui.sat. Because we are unsigned, the condition this time is only UMIN of UINT_MAX. Similarly to D111976 it handles ISD::UMIN, ISD::SETCC/ISD::SELECT, ISD::VSELECT or ISD::SELECT_CC nodes. This especially helps on ARM/AArch64 where the vcvt instructions naturally saturate the result. Differential Revision: https://reviews.llvm.org/D114964	2022-01-26 08:37:44 +00:00
Bjorn Pettersson	46cacdbb21	[DAGCombiner] Adjust some checks in DAGCombiner::reduceLoadWidth In code review for D117104 two slightly weird checks were found in DAGCombiner::reduceLoadWidth. They were typically checking if BitsA was a mulitple of BitsB by looking at (BitsA & (BitsB - 1)), but such a comparison actually only make sense if BitsB is a power of two. The checks were related to the code that attempted to shrink a load based on the fact that the loaded value would be right shifted. Afaict the legality of the value types is checked later (typically in isLegalNarrowLdSt), so the existing checks were both overly conservative as well as being wrong whenever ExtVTBits wasn't a power of two. The latter was a situation triggered by a number of lit tests so we could not just assert on ExtVTBIts being a power of two). When attempting to simply remove the checks I found some problems, that seems to have been guarded by the checks (maybe just out of luck). A typical example would be a pattern like this: t1 = load i96* ptr t2 = srl t1, 64 t3 = truncate t2 to i64 When DAGCombine is visiting the truncate reduceLoadWidth is called attempting to narrow the load to 64 bits (ExtVT := MVT::i64). Then the SRL is detected and we set ShAmt to 64. In the past we've bailed out due to i96 not being a multiple of 64. If we simply remove that check then we would end up replacing the load with a new load that would read 64 bits but with a base pointer adjusted by 64 bits. So we would read 32 bits the wasn't accessed by the original load. This patch will instead utilize the fact that the logical left shift can be folded away by using a zextload. Thus, the pattern above will now be combined into t3 = load i32* ptr+offset, zext to i64 Another case is shown in the X86/shift-folding.ll test case: t1 = load i32* ptr t2 = srl i32 t1, 8 t3 = truncate t2 to i16 In the past we bailed out due to the shift count (8) not being a multiple of 16. Now the narrowing kicks in and we get t3 = load i16* ptr+offset Differential Revision: https://reviews.llvm.org/D117406	2022-01-24 12:22:04 +01:00
Fangrui Song	e6cdef187e	[XRay][test] Clean up llc RUN lines	2022-01-21 17:00:03 -08:00
Mircea Trofin	e67430cca4	[MLGO] ML Regalloc Eviction Advisor The bulk of the implementation is common between 'release' mode (==AOT-ed model) and 'development' mode (for training), the main difference is that in development mode, we may also log features (for training logs), inject scoring information (currently after the Virtual Register Rewriter) and then produce the log file. This patch also introduces the score injection pass, 'Register Allocation Pass Scoring', which is trivially just logging the score in development mode. Differential Revision: https://reviews.llvm.org/D117147	2022-01-19 11:00:32 -08:00
David Green	351edf1c47	[ARM] Remove FeaturePerfMon from armv7-m FeaturePerfMon relates to the PMU extensions available in armv7-a, and should not be available in v7-m (it requires loading from a system register with a mrc). Sink it down a level in the dependency map so that it isn't present in ARMv7m or HasV8MMainlineOps. It is also removed from the Neoverse-N2, as it will already be transitively included. Differential Revision: https://reviews.llvm.org/D117022	2022-01-12 09:44:53 +00:00
Nick Desaulniers	79ebc3b0dd	[llvm][test] rewrite callbr to use i rather than X constraint NFC In D115311, we're looking to modify clang to emit i constraints rather than X constraints for callbr's indirect destinations. Prior to doing so, update all of the existing tests in llvm/ to match. Reviewed By: void, jyknight Differential Revision: https://reviews.llvm.org/D115410	2022-01-11 11:31:08 -08:00
Tim Northover	0b5b35fdbd	ARM: make FastISel & GISel pass -1 to ADJCALLSTACKUP to signal no callee pop. The interface for these instructions changed with support for mandatory tail calls, and now -1 indicates the CalleePopAmount argument is not valid. Unfortunately I didn't realise FastISel or GISel did calls at the time so didn't update them.	2022-01-11 11:31:13 +00:00
Nikita Popov	f430c1eb64	[Tests] Add elementtype attribute to indirect inline asm operands (NFC) This updates LLVM tests for D116531 by adding elementtype attributes to operands that correspond to indirect asm constraints.	2022-01-06 14:23:51 +01:00
David Green	2ec3ca7477	[ARM] Extend IsCMPZCSINC to handle CMOV A 'CMOV 1, 0, CC, %cpsr, Cmp' is the same as a 'CSINC 0, 0, CC, Cmp', and can be treated the same in IsCMPZCSINC added in D114013. This allows us to remove the unnecessary CMOV in the same way that we could remove a CSINC. Differential Revision: https://reviews.llvm.org/D115188	2021-12-27 14:15:03 +00:00
Kristina Bessonova	81378f7e56	Revert "[DwarfDebug] Support emitting function-local declaration for a lexical block" & dependent patches Try to revert D113741 once again. This also reverts 0ac75e82fff93a80ca401d3db3541e8d1d9098f9 (D114705) as it causes LLDB's lldb-api.lang/cpp/nsimport.TestCppNsImport.py test failure w/o D113741. This reverts commit f9607d45f399e2afc39ec16222ea68b4e0831564. Differential Revision: https://reviews.llvm.org/D116225	2021-12-24 00:47:04 +02:00
David Green	26f6fbe2be	[ARM] Add AddrModeT2_i8neg addressing mode support for frame lowering. As reported from a failing firefox build, we can sometimes get frame indices with negative offsets from a t2LDRi8. This adds support for them, to prevent the crash.	2021-12-14 12:49:27 +00:00
Roman Lebedev	c1a36ba002	[DAGCombine][X86][ARM] EXTRACT_SUBVECTOR(VECTOR_SHUFFLE(?,?,Mask)) -> VECTOR_SHUFFLE(EXTRACT_SUBVECTOR(?, ?), EXTRACT_SUBVECTOR(?, ?), Mask') In most test changes this allows us to drop some broadcasts/shuffles. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D104156	2021-12-13 20:03:44 +03:00
Ties Stuij	63eb7ff47d	[ARM] Implement PAC return address signing mechanism for PACBTI-M This patch implements PAC return address signing for armv8-m. This patch roughly accomplishes the following things: - PAC and AUT instructions are generated. - They're part of the stack frame setup, so that shrink-wrapping can move them inwards to cover only part of a function - The auth code generated by PAC is saved across subroutine calls so that AUT can find it again to check - PAC is emitted before stacking registers (so that the SP it signs is the one on function entry). - The new pseudo-register ra_auth_code is mentioned in the DWARF frame data - With CMSE also in use: PAC is emitted before stacking FPCXTNS, and AUT validates the corresponding value of SP - Emit correct unwind information when PAC is replaced by PACBTI - Handle tail calls correctly Some notes: We make the assembler accept the `.save {ra_auth_code}` directive that is emitted by the compiler when it saves a register that contains a return address authentication code. For EHABI we need to have the `FrameSetup` flag on the instruction and handle the `t2PACBTI` opcode (identically to `t2PAC`), so we can emit `.save {ra_auth_code}`, instead of `.save {r12}`. For PACBTI-M, the instruction which computes return address PAC should use SP value before adjustment for the argument registers save are (used for variadic functions and when a parameter is is split between stack and register), but at the same it should be after the instruction that saves FPCXT when compiling a CMSE entry function. This patch moves the varargs SP adjustment after the FPCXT save (they are never enabled at the same time), so in a following patch handling of the `PAC` instruction can be placed between them. Epilogue emission code adjusted in a similar manner. PACBTI-M code generation should not emit any instructions for architectures v6-m, v8-m.base, and for A- and R-class cores. Diagnostic message for such cases is handled separately by a future ticket. note on tail calls: If the called function has four arguments that occupy registers `r0`-`r3`, the only option for holding the function pointer itself is `r12`, but this register is used to keep the PAC during function/prologue epilogue and clobbers the function pointer. When we do the tail call we need the five registers (`r0`-`r3` and `r12`) to keep six values - the four function arguments, the function pointer and the PAC, which is obviously impossible. One option would be to authenticate the return address before all callee-saved registers are restored, so we have a scratch register to temporarily keep the value of `r12`. The issue with this approach is that it violates a fundamental invariant that PAC is computed using CFA as a modifier. It would also mean using separate instructions to pop `lr` and the rest of the callee-saved registers, which would offset the advantages of doing a tail call. Instead, this patch disables indirect tail calls when the called function take four or more arguments and the return address sign and authentication is enabled for the caller function, conservatively assuming the caller function would spill LR. This patch is part of a series that adds support for the PACBTI-M extension of the Armv8.1-M architecture, as detailed here: https://community.arm.com/arm-community-blogs/b/architectures-and-processors-blog/posts/armv8-1-m-pointer-authentication-and-branch-target-identification-extension The PACBTI-M specification can be found in the Armv8-M Architecture Reference Manual: https://developer.arm.com/documentation/ddi0553/latest The following people contributed to this patch: - Momchil Velikov - Ties Stuij Reviewed By: danielkiss Differential Revision: https://reviews.llvm.org/D112429	2021-12-07 10:15:19 +00:00
Ties Stuij	0fbb17458a	[ARM] Implement setjmp BTI placement for PACBTI-M This patch intends to guard indirect branches performed by longjmp by inserting BTI instructions after calls to setjmp. Calls with 'returns-twice' are lowered to a new pseudo-instruction named t2CALL_BTI that is later expanded to a bundle of {tBL,t2BTI}. This patch is part of a series that adds support for the PACBTI-M extension of the Armv8.1-M architecture, as detailed here: https://community.arm.com/arm-community-blogs/b/architectures-and-processors-blog/posts/armv8-1-m-pointer-authentication-and-branch-target-identification-extension The PACBTI-M specification can be found in the Armv8-M Architecture Reference Manual: https://developer.arm.com/documentation/ddi0553/latest The following people contributed to this patch: - Alexandros Lamprineas - Ties Stuij Reviewed By: labrinea Differential Revision: https://reviews.llvm.org/D112427	2021-12-06 11:07:10 +00:00
Kristina Bessonova	0ac75e82ff	Reland [DwarfDebug] Move emission of global vars, types and imports to endModule() This patch proposes to move emission of global variables, types, imported entities, etc from DwarfDebug::beginModule() to DwarfDebug::endModule(). Effectively, this changes nothing but the order of debug entities which will be as follows: * subprograms (including related context, local variables/labels, local imported entities; related types can be created as a part of the emission of local entities of an abstract subprogram); * global variables (including related context and types); * retained types and enums; * non-local-scoped imported entities; * basic types; * other types left (as a part of local variables attributes emission). Note that the order of emitted compile units may also be changed as now we emit units that contain subprograms first and then all other non-empty units. The motivation behind this change is the following: (1) DwarfDebug::beginModule() is run at the very beginning of backend's pipeline, from this time IR can be significantly changed by target-specific passes. If it happens for debug metadata of global entities, those changes will not be reflected in the emitted DWARF. (2) imported subprogram names should refer to an abstract subprogram if it exists, but it isn't known in DwarfDebug::beginModule() (it's possible to make some guesses based on location info, but it's not quite reliable); (3) aforementioned entities if they are scoped within a bracketed block (subject of D113741) couldn't be emitted in DwarfDebug::beginModule() (they need parent emitted first). Another problem is if to try to gather some information about local entities and defer their emission (till subprogram's processing or DwarfDebug::endModule()) all the gathered details might be irrelevant / invalid by the time the entities are being emitted (because of (1)). Reviewed By: dblaikie Differential Revision: https://reviews.llvm.org/D114705	2021-12-05 13:56:45 +02:00
David Green	57ff805a6d	[DAG] Create fptoui.sat from clamped fptosi As an extension to D111976, this converts clamp fptosi, clamped between 0 and (2^n)-1 to a fptoui.sat. This can greatly help on targets with conversions that naturally saturate, such as Arm. X86 disables the transform as some of the test cases increases in size. A fptoui.sat necessitates a fp clamp without native support, so there is little use in converting if the instruction is just going to be expanded. Differential Revision: https://reviews.llvm.org/D112428	2021-12-05 09:25:52 +00:00

1 2 3 4 5 ...

4503 Commits