Follow-up to 28417e64 and the whole line of work started with 4b81dc7.
This change merges the handling for VPStore - currently in
lowerInterleavedVPStore - into the existing dedicated routine used in
the shuffle lowering path. This removes the last use of
lowerInterleavedVPStore, and thus we can remove it.
This contains two functional changes.
First, like in 28417e64, merging support for vp.store exposes the
strided store optimization for code using vp.store.
Second, it seems the strided store case had a significant missed
optimization. We were performing the strided store at the full unit
strided store type width (i.e. LMUL) rather than reducing it to match
the input width. This became obvious when I tried to use the mask
created by the helper routine as it caused a type incompatibility.
Normally, I'd try not to include an optimization in an API rework, but
structuring the code to both be correct for vp.store and not optimize
the existing case turned out to be more involved than seemed worthwhile.
I could pull this part out as a pre-change, but it's a bit awkward on
its own, as it turns out to be somewhat of a half step towards the
possible optimization; the full optimization is complex with the old
code structure.
---------
Co-authored-by: Craig Topper <craig.topper@sifive.com>
This continues in the direction started by commit 4b81dc7. We
essentially merge the handling for VPLoad - currently in
lowerInterleavedVPLoad - into the existing dedicated routine. This
removes the last use of the dedicated lowerInterleavedVPLoad and thus
we can remove it.
This isn't quite NFC, as the main callback has support for the strided
load optimization whereas the VPLoad-specific version didn't. So this
adds the ability to form a strided load for a vp.load deinterleave with
one shuffle used.
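As a rough C++ analogy (illustrative only, not code from the patch),
reading one field of an interleaved group is just a load with a
constant stride, which is what the strided load optimization forms:
```
#include <cstddef>
#include <vector>

// Illustration only: extracting a single field from interleaved data is
// equivalent to a strided load (stride == Factor, offset == Field), so no
// wide load plus deinterleave shuffle is needed.
std::vector<int> loadOneField(const int *Base, size_t Factor, size_t Field,
                              size_t NumGroups) {
  std::vector<int> Out(NumGroups);
  for (size_t I = 0; I != NumGroups; ++I)
    Out[I] = Base[I * Factor + Field];
  return Out;
}
```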
Add LLVMContext to getOptimalMemOpType and findOptimalMemOpLowering so
that we can use EVT::getVectorVT to generate an EVT in
getOptimalMemOpType.
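A minimal sketch of why the context is needed (assumed usage, not the
exact patch): EVT::getVectorVT requires an LLVMContext to build
extended vector types.
```
#include "llvm/CodeGen/ValueTypes.h"
#include "llvm/IR/LLVMContext.h"
using namespace llvm;

// Assumed sketch: with an LLVMContext available inside the hook, a target can
// build an arbitrary (possibly non-simple) vector EVT for the memory op type.
static EVT makeByteVectorVT(LLVMContext &Context, unsigned NumElts) {
  return EVT::getVectorVT(Context, MVT::i8, NumElts);
}
```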
Related to [#146673](https://github.com/llvm/llvm-project/pull/146673).
bics is available on ARM.
USAT regressions are to be fixed after this because that is an issue
with ARMISelLowering and should be addressed in another PR.
Note that opt optimizes those testcases to min/max intrinsics anyway so
this should have no real effect on codegen.
Proof: https://alive2.llvm.org/ce/z/kPVQ3_
These are module-level concepts, and attaching them to the
function-level subtarget is confusing. Similarly, these other
helpers that only operate on the triple should also be removed
from the subtarget.
This change adds new parameters to the method
`shouldFoldSelectWithIdentityConstant()`. The method now takes the
opcode of the select node and the non-identity operand of the select
node. To gain access to the appropriate arguments, the call of
`shouldFoldSelectWithIdentityConstant()` is moved after all other checks
have been performed. Moreover, this change adjusts the precondition of
the fold so that it also works for `SELECT` nodes in addition to
`VSELECT` nodes.
No functional change is intended because all implementations of
`shouldFoldSelectWithIdentityConstant()` are adjusted such that they
restrict the fold to a `VSELECT` node; the same restriction as before.
The rationale of this change is to allow more fine-grained decisions
about when to revert the InstCombine canonicalization of
`(select c (binop x y) y)` to `(binop (select c x idc) y)` in the
backends.
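A hedged sketch of what a target override might look like after this
change; the parameter names and order are assumptions, not the exact
upstream signature:
```
#include "llvm/CodeGen/ISDOpcodes.h"
#include "llvm/CodeGen/SelectionDAGNodes.h"
#include "llvm/CodeGen/ValueTypes.h"
using namespace llvm;

// Assumed sketch of a target's override body. Restricting the fold to VSELECT
// keeps the behaviour identical to before this change; a target could now also
// allow it for scalar SELECT on selected opcodes.
static bool shouldFoldSelectWithIdentityConstantSketch(unsigned BinOpcode,
                                                       EVT VT,
                                                       unsigned SelectOpcode,
                                                       SDValue X,
                                                       SDValue NonIdConstNode) {
  if (SelectOpcode != ISD::VSELECT)
    return false; // same restriction as before
  (void)BinOpcode; (void)X; (void)NonIdConstNode;
  return VT.isVector();
}
```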
With this change, targets are no longer required to put memory / strict-fp opcodes after special
`ISD::FIRST_TARGET_MEMORY_OPCODE`/`ISD::FIRST_TARGET_STRICTFP_OPCODE` markers.
This will also allow autogenerating `isTargetMemoryOpcode`/`isTargetStrictFPOpcode` (#119709).
Pull Request: https://github.com/llvm/llvm-project/pull/119969
Re-landing #116970 after fixing miscompilation error.
The original change made it possible for CMPZ to have multiple uses;
`ARMDAGToDAGISel::SelectCMPZ` was not prepared for this.
Pull Request: https://github.com/llvm/llvm-project/pull/118887
Original commit message:
Following #116547 and #116676, this PR changes the type of results and
operands of some nodes to accept / return a normal type instead of Glue.
Unfortunately, changing the result type of one node requires changing
the operand types of all potential consumer nodes, which in turn
requires changing the result types of all other possible producer nodes.
So this is a bulk change.
Pull Request: https://github.com/llvm/llvm-project/pull/116970
1. When two (or more) nodes are glued, the DAG scheduler will always
schedule them as one piece, i.e. it will not allow any instructions to
be scheduled between them. It does so because if nodes are glued, this
usually means that there is an implicit register dependency between
them, and an intervening node could clobber this physical register.
When emitting such nodes into machine IR, they will also be stuck
together, e.g.:
```
%9:gpr = MOVsrl_glue killed %8, implicit-def $cpsr
%10:gpr = RRX %3, implicit $cpsr
```
2. If a node has a Glue result, SelectionDAG will not try to CSE this
node. If it did, it would break the implicit physical register
dependency. In practice this means that if a node with a Glue result
has multiple uses, it has to be duplicated before each use. This is
the reason `ARMTargetLowering::duplicateCmp` exists.
When using normal data dependency, dependent nodes can freely be
scheduled around. If there is a physical register dependency between
nodes, the physical register will be copied to/from a virtual register,
allowing other nodes to intervene between them. The resulting machine IR
might look like this:
```
%9:gpr = LSRs1 killed %8, implicit-def $cpsr
%10:gpr = COPY $cpsr
%11:gpr = ORRrsi killed %9, %3, 242, 14 /* CC::al */, $noreg, $noreg
%12:gpr = BICri killed %11, -2147483648, 14 /* CC::al */, $noreg, $noreg
$cpsr = COPY %10
%13:gpr = RRX %3, implicit $cpsr
```
The two copies are likely to be eliminated by the register coalescer,
given that there are no instructions between them that clobber this
physical register. If the copies are unwanted in the first place (they
could be expensive or impossible), the DAG scheduler will try to avoid
inserting them wherever possible, and the resulting machine IR will
look like this:
```
%9:gpr = LSRs1 killed %8, implicit-def $cpsr
%10:gpr = ORRrsi killed %9, %3, 242, 14 /* CC::al */, $noreg, $noreg
%11:gpr = BICri killed %10, -2147483648, 14 /* CC::al */, $noreg, $noreg
%12:gpr = RRX %3, implicit $cpsr
```
On ARM, arithmetic operations and LSLS already use the new data flow
approach. This patch extends it to include 1-bit shifts.
Pull Request: https://github.com/llvm/llvm-project/pull/116547
We don't need to copy byval arguments to tail calls via a temporary, if
we can prove that we are not copying from the outgoing argument area.
This patch does this when the source of the argument is one of:
* Memory in the local stack frame, which can't be used for tail-call
arguments.
* A global variable.
We can also avoid doing the copy completely if the source and
destination are the same memory location, which is the case when the
caller and callee have the same signature, and pass some arguments
through unmodified.
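An illustrative source pattern (assumed example, not from the patch)
where the copy can be dropped entirely:
```
// Illustration only: with a tail call that forwards a by-value argument
// unmodified between functions of the same signature, the outgoing argument
// slot is the incoming one, so no temporary copy is needed at all.
struct Big { int Data[16]; };

int callee(Big B);

int caller(Big B) {
  return callee(B); // same signature, argument passed through unmodified
}
```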
Add support for using a thread-local variable with a specified offset
for holding the stack guard canary value. This supports both 32- and 64-
bit PowerPC targets.
This mirrors changes from #108942 but targeting PowerPC instead of
RISC-V. Because both of these PRs modify the same driver functions, this
series is stacked on top of the RISC-V one.
---------
Signed-off-by: Keith Packard <keithp@keithp.com>
Porting to TTI provides direct access to the instruction cost model,
which can enable instruction-cost-based sinking without introducing
code duplication.
The ABI mandates two things related to function calls:
- Function arguments must be sign- or zero-extended to the register
size by the caller.
- Return values must be sign- or zero-extended to the register size by
the callee.
As a consequence, callees can assume that function arguments have been
extended, and so can callers with regard to return values.
Here lies the problem: Nonsecure code might deliberately ignore this
mandate with the intent of attempting an exploit. It might try to pass
values that lie outside the expected type's value range in order to
trigger undefined behaviour, e.g. an out-of-bounds access.
With the mitigation implemented, Secure code always performs extension
of values passed by Nonsecure code.
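An illustrative example (assumed source, not from the patch) of the
kind of code the mitigation protects:
```
// Illustration only: a CMSE entry function with a parameter narrower than a
// register. With the mitigation, the Secure callee re-extends the incoming
// value itself instead of trusting the Nonsecure caller to have done so, so a
// hostile register value cannot index past the end of Table.
extern int Table[256];

__attribute__((cmse_nonsecure_entry)) int lookup(unsigned char Idx) {
  return Table[Idx];
}
```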
This addresses the vulnerability described in CVE-2024-0151.
Patches by Victor Campos.
---------
Co-authored-by: Victor Campos <victor.campos@arm.com>
ARMISD::SUBS is a duplicate of ARMISD::SUBC.
The node was introduced in 5745b6ac. This patch replaces SUBS with SUBC
and reverts changes in *.td files.
Pass in CallLoweringInfo (CLI) instead of passing in the various fields
directly. Also pass in CCState (CCInfo), which is computed in both the
caller and the callee for a minor efficiency saving. There may also be a
small correctness improvement for sibcalls with vectorcall, which has an
odd way of recomputing argument locations.
This is a step towards improving the handling of musttail on armv7,
which we have numerous issues filed about in our tracker.
I took inspiration for this from the RISCV tail call eligibility check,
which uses a similar prototype.
LLVM intrinsics `get_fpmode`, `set_fpmode` and `reset_fpmode` operate
on the control modes, the bits of the FP environment that affect FP
operations. On ARM, these bits are in FPSCR together with the status
bits. The
implementation of these intrinsics produces code close to that of
functions `fegetmode` and `fesetmode` from GLIBC.
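Conceptually (a hedged sketch, not the emitted code), updating only
the control modes is a read-modify-write of FPSCR that preserves the
status bits:
```
#include <cstdint>

// Hedged sketch of the fesetmode-style pattern: FPSCR holds both control
// modes and status flags, so writing new modes must preserve the status bits.
// ModeMask (which bits are control modes) is a target detail left symbolic.
uint32_t mergeFPModes(uint32_t OldFPSCR, uint32_t NewModes, uint32_t ModeMask) {
  return (OldFPSCR & ~ModeMask) | (NewModes & ModeMask);
}
```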
Pull request: https://github.com/llvm/llvm-project/pull/74054
Given a list of constraints for InlineAsm (e.g. "imr") I'm looking to
modify the order in which they are chosen. Before doing so, I noticed
a fair amount of logic is duplicated between SelectionDAGISel and
GlobalISel for this.
That is because SelectionDAGISel is also trying to lower immediates
during selection. If we detangle these concerns into:
1. choose the preferred constraint
2. attempt to lower that constraint
Then we can slide down the list of constraints until we find one that
can be lowered. That allows the implementation to be shared between
instruction selection frameworks.
This makes it so that later I might only need to adjust the priority of
constraints in one place, and have both selectors behave the same.
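A hedged sketch of the shared two-step shape described above; the
helper names are hypothetical, not actual LLVM APIs:
```
#include <string>
#include <vector>

// Hypothetical helpers, for illustration only.
bool tryLowerConstraint(const std::string &Code);                         // step 2
std::vector<std::string> sortByPreference(std::vector<std::string> Codes); // step 1

// Slide down the preference-ordered list until one constraint lowers; both
// SelectionDAGISel and GlobalISel can share this shape of loop.
std::string chooseConstraint(std::vector<std::string> Codes) {
  for (const std::string &Code : sortByPreference(std::move(Codes)))
    if (tryLowerConstraint(Code))
      return Code;
  return "";
}
```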
reland [InlineAsm] wrap ConstraintCode in enum class NFC (#66003)
This reverts commit ee643b706be2b6bef9980b25cc9cc988dab94bb5.
Fix up build failures in targets I missed in #66003
Kept as 3 commits so reviewers can better see what's changed. Will
squash when merging.
- reland [InlineAsm] wrap ConstraintCode in enum class NFC (#66003)
- fix all the targets I missed in #66003
- fix off by one found by llvm/test/CodeGen/SystemZ/inline-asm-addr.ll
This reverts commit 2ca4d136124d151216aac77a0403dcb5c5835bcd.
Also revert the followup, "[InlineAsm] fix botched merge conflict resolution"
This reverts commit 8b9bf3a9f715ee5dce96eb1194441850c3663da1.
There were SystemZ and Mips build errors, too many to fix forward.
Similar to
commit 2fad6e69851e ("[InlineAsm] wrap Kind in enum class NFC")
Fix the TODOs added in
commit 93bd428742f9 ("[InlineAsm] refactor InlineAsm class NFC
(#65649)")
Record the SP adjustment on entry to each basic block. This is almost
always zero except on targets like ARM which can split a basic block in
the middle of a call sequence.
This simplifies PEI::replaceFrameIndices which previously had to visit
basic blocks in a specific order and had special handling for
unreachable blocks. More importantly it paves the way for an equally
simple implementation of a backwards version of replaceFrameIndices,
which is required to fully convert PrologEpilogInserter to backwards
register scavenging, which is preferred because it does not rely on
accurate kill flags.
Differential Revision: https://reviews.llvm.org/D154281
This patch updates several functions in LLVM's IR generation code to accept
an IRBuilder object as an argument, rather than an Instruction that indicates
the insertion point for new instructions.
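A hedged before/after sketch with a hypothetical helper (not one of
the actual functions touched by this patch):
```
#include "llvm/IR/IRBuilder.h"
using namespace llvm;

// Before: callers picked an instruction to insert before.
Value *emitWidenedLoad(Value *Ptr, Instruction *InsertBefore);

// After: callers pass an IRBuilder that already carries the insertion point
// (plus debug location and fast-math flags), so the helper does not need to
// guess where new instructions should go.
Value *emitWidenedLoad(Value *Ptr, IRBuilder<> &Builder);
```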
This change is necessary to handle sophisticated -Ofast optimization cases
from D148558 where it's unclear which instructions should be used as the
insertion point for new operations.
Differential Revision: https://reviews.llvm.org/D148703
The corresponding function definition was removed by:
commit e891654a5855a43104a4f3744a754c5e028c03c7
Author: Evan Cheng <evan.cheng@apple.com>
Date: Tue Aug 30 01:34:54 2011 +0000
While we are at it, this patch removes ARMPCLabelIndex, for which the
host compiler issues an unused variable warning.
The unused declaration was introduced without a corresponding function
definition by:
commit bd41cf880c9f3a65c9366565fa4db2ddb6b57e1c
Author: Tim Northover <tnorthover@apple.com>
Date: Thu Jan 7 09:03:03 2016 +0000
This is a rework of:
- rG13e77db2df94 (r328395; MVT)
Since `LowLevelType.h` has been restored to `CodeGen`, `MachineValueType.h`
can be restored as well.
Depends on D148767
Differential Revision: https://reviews.llvm.org/D149024
We already have tablegen patterns for a lot of these, but performing the
combine earlier in the DAG can help in a few extra cases.
Differential Revision: https://reviews.llvm.org/D149269
This function was added for ARM targets, but aligning global/stack pointer
arguments passed to memcpy/memmove/memset can improve code size and
performance for all targets that don't have fast unaligned accesses.
This adds a generic implementation that adjusts the alignment to pointer
size if unaligned accesses are slow.
Review D134168 suggests that this significantly improves performance on
synthetic benchmarks such as Dhrystone on RV32 as it avoids memcpy() calls.
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D134282
So long as the operation is reassociative, we can reassociate the double
vecreduce from, for example, fadd(vecreduce(a), vecreduce(b)) to
vecreduce(fadd(a,b)). This will in general save a few instructions, but
some architectures (MVE) require the opposite fold, so a
shouldExpandReduction hook is added to account for it. Only targets that
use shouldExpandReduction will be affected.
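For example (a scalar C++ analogy, not compiler code), with
reassociation allowed the two reductions collapse into one:
```
#include <array>

// Scalar analogy only: fadd(vecreduce(a), vecreduce(b)) becomes
// vecreduce(fadd(a, b)), i.e. one final reduction instead of two.
float sumOfSums(const std::array<float, 8> &A, const std::array<float, 8> &B) {
  float Acc = 0.0f;
  for (unsigned I = 0; I != 8; ++I)
    Acc += A[I] + B[I]; // reduce over the elementwise fadd
  return Acc;
}
```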
Differential Revision: https://reviews.llvm.org/D141870
https://reviews.llvm.org/D140493 is going to teach SROA how to promote allocas
that have variably-indexed loads. That does bring up questions of cost model,
since that requires creating wide shifts.
Indeed, our legalization for them is not optimal.
We either split them into parts, or lower them into a libcall.
But if the shift amount is a multiple of CHAR_BIT,
we can also legalize the shift through the stack.
The basic idea is very simple:
1. Get a stack slot 2x the width of the shift type
2. store the value we are shifting into one half of the slot
3. pad the other half of the slot: for logical shifts, with zero; for arithmetic shifts, with the sign bit
4. index into the slot (starting from the base half into which we spilled, either upwards or downwards)
5. load
6. split loaded integer
This works for both little-endian and big-endian machines:
https://alive2.llvm.org/ce/z/YNVwd5
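As a concrete host-side illustration of the steps above (little-endian,
assuming a 64-bit shift type; the real lowering works on DAG stack
slots, but the arithmetic is the same):
```
#include <cstdint>
#include <cstring>

// Little-endian illustration of the expansion above for a logical right shift
// of a 64-bit value by a whole number of bytes (ShiftBytes <= 8).
uint64_t lshrViaStack(uint64_t Val, unsigned ShiftBytes) {
  uint8_t Slot[16];                        // 1. stack slot 2x the shift type width
  std::memcpy(Slot, &Val, 8);              // 2. store the value into one half
  std::memset(Slot + 8, 0, 8);             // 3. pad the other half (zero: logical shift)
  uint64_t Res;
  std::memcpy(&Res, Slot + ShiftBytes, 8); // 4./5. index into the slot and load
  return Res;                              // 6. no split needed for a single word
}
```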
And better yet, if the original shift amount was not a multiple of CHAR_BIT,
we can just shift by that remainder afterwards: https://alive2.llvm.org/ce/z/pz5G-K
I think, if we are going to perform shift -> shift-by-parts expansion more
than once, we should instead go through the stack, which is what this patch
does.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D140638
Address the inconsistency between the FLT_ROUNDS_ and SET_ROUNDING SDAG
nodes. Rename FLT_ROUNDS_ to GET_ROUNDING and add an llvm.get.rounding
intrinsic to replace flt.rounds.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D139507
A target can return whether a misaligned access is 'fast', as defined
by the target, or not. In reality there can be different levels
of 'fast' and 'slow'. This patch changes the boolean 'Fast'
argument of the allowsMisalignedMemoryAccesses family of functions
to an unsigned representing its speed.
A target can still define it as it wants, and the direct translation
of the current code uses 0 and 1 for the current false and true. This
makes the change NFC.
A subsequent patch will start using an actual speed value in
the load/store vectorizer to check whether a vectorized access is going
to be not just fast, but also not slower than before.
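A hedged sketch of how a caller reads the new argument, assuming the
signature keeps its current shape with 'unsigned *Fast' in place of
'bool *Fast'; any nonzero speed preserves the old boolean behaviour:
```
#include "llvm/CodeGen/TargetLowering.h"
using namespace llvm;

// Sketch only: a caller that treats any nonzero speed as the old 'true'.
static bool canUseMisaligned(const TargetLowering &TLI, EVT VT,
                             unsigned AddrSpace) {
  unsigned Fast = 0;
  return TLI.allowsMisalignedMemoryAccesses(VT, AddrSpace, Align(1),
                                            MachineMemOperand::MONone, &Fast) &&
         Fast != 0;
}
```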
Differential Revision: https://reviews.llvm.org/D124217
Adds the Complex Deinterleaving Pass implementing support for complex numbers in a target-independent manner, deferring to the TargetLowering for the given target to create a target-specific intrinsic.
Differential Revision: https://reviews.llvm.org/D114174
The `CodeGenPrepare` pass can sink a bitwise `and` used by a compare to
zero into the basic blocks where the users are. This operation is
guarded by a lowering hook, which is disabled for ARM. In ARM
architecture versions from v7-M up, these two operations can be folded
into a `tst rN, #imm` instruction. Sinking of the `and` can also enable
the cmov-to-bfi DAG combiner.
This patch fixes some benchmark regressions caused
by https://reviews.llvm.org/D129370 as well as scoring slightly better
overall.
Reviewed By: dmgreen
Differential Revision: https://reviews.llvm.org/D134360
This patch adds a Type operand to the TLI isCheapToSpeculateCttz/isCheapToSpeculateCtlz callbacks, allowing targets to decide whether branches should occur on a type-by-type/legality basis.
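A hedged sketch of a per-type decision a target could now make (free
function for illustration; the real hook is a TLI member):
```
#include "llvm/IR/Type.h"
using namespace llvm;

// Sketch only: with a Type operand, the decision can be made per type rather
// than globally for the whole target.
static bool isCheapToSpeculateCttzSketch(Type *Ty) {
  // e.g. speculate only for widths the target handles natively.
  return Ty->isIntegerTy(32) || Ty->isIntegerTy(64);
}
```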
For X86, this patch proposes to allow CTTZ speculation for i8/i16 types that will lower to promoted i32 BSF instructions by masking the operand above the msb (we already do something similar for i8/i16 TZCNT). This required a minor tweak to CTTZ lowering - if the src operand is known never zero (i.e. due to the promotion masking) we can remove the CMOV zero src handling.
Although BSF isn't very fast, most CPUs from the last 20 years don't do that bad a job with it, though there are some annoying passthrough EFLAGS dependencies. Additionally, now that we emit 'REP BSF' in most cases, we are tending towards assuming this will most likely be executed as a TZCNT instruction on any semi-modern CPU.
Differential Revision: https://reviews.llvm.org/D132520
TargetLowering had the last two InstructionCost-related members,
`getTypeLegalizationCost()` and `getScalingFactorCost()`, but all other
costs are processed in TTI. For example, it is not convenient to use
other TTI members in these two functions when they are overridden in a
target.
Minor refactoring: `getTypeLegalizationCost()` now doesn't need a
DataLayout parameter - it was always passed from TTI.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D117723