llvm-project

Author	SHA1	Message	Date
Alex MacLean	9affa177b5	[NVPTX] Add support for calling aliases (#81170 ) The current implementation of aliases tries to remove all the aliases in the module to prevent the generic version of `AsmPrinter` from emitting them incorrectly. Unfortunately, if the aliases are used this will fail. Instead let's override the function to print aliases directly. In addition, the declarations of the alias functions must occur before the uses. To fix this we emit alias declarations as part of `emitDeclarations` and only emit the `.alias` directives at the end (where we can assume the aliasee has also already been declared).	2024-02-08 17:14:13 -06:00
Francesco Petrogalli	fffcc5ca83	[CodeGen] Add ValueType v3i8 (NFCI). (#80826 )	2024-02-08 16:54:12 +01:00
Jeremy Morse	a643ab852a	[DebugInfo][RemoveDIs] Final omnibus test fixing for RemoveDIs (#81125 ) With this, I get a clean test suite running under RemoveDIs, the non-intrinsic representation of debug-info, including under asan. We've previously established that we generate identical binaries for some large projects, so this is just edge-case cleanup. The changes: * CodeGenPrepare fixups need to apply to dbg.assigns as well as dbg.values (a dbg.assign is a dbg.value). * Pin a test for constant-deletion to intrinsic debug-info: this very rare scenario uses a different kill-location sigil in dbg.value mode to RemoveDIs mode, which generates spurious test differences. * Suppress a memory leak in a unit test: the code for dealing with trailing debug-info in a block is necessarily fiddly, leading to this leak when testing it. Developer-facing interfaces for moving instructions around always deal with this behind the scenes. * SROA, when replacing some vector-loads, needs to insert the replacement loads ahead of any debug-info records so that their values remain dominated by a definition. Set the head-bit indicating our insertion should come before debug-info.	2024-02-08 11:49:04 +00:00
Simon Pilgrim	b35c519762	[DAG] tryToFoldExtendOfConstant - share the same SDLoc argument instead of recreating it over and over again.	2024-02-08 11:43:29 +00:00
Jeremy Morse	faa2f9658a	[DebugInfo] Handle dbg.assigns in FastISel (#80734 ) There are some rare circumstances where dbg.assign intrinsics can reach FastISel. They are a more specialised kind of dbg.value intrinsic with more information about the originating alloca. They only occur during optimisation, but might reach FastISel through always_inlining an optimised function into an optnone function. This is a slight problem as it's not safe (for debug-info accuracy) to ignore any intrinsics, and for RemoveDIs (the intrinsic-replacement project) it causes a crash through an unhandled switch case. To get around this, we can just treat the dbg.assign as a dbg.value (it's an actual subclass) and use the variable location information from the dbg.value fields. This loses a small amount of debug-info about stack locations, but is more accurate than just ignoring the intrinsic. (This has popped up deep in an LTO build of a large codebase while testing RemoveDIs, I figured it'd be good to fix it for the intrinsic-form at the same time, just to demonstrate the correct behaviour).	2024-02-08 10:44:43 +00:00
Luke Lau	ece66dbc60	[SelectionDAG] Add computeKnownBits support for ISD::STEP_VECTOR (#80452 ) This handles two cases where we can work out some known-zero bits for ISD::STEP_VECTOR. The first case handles when we know the low bits are zero because the step amount is a power of two. This is taken from https://reviews.llvm.org/D128159, and even though the original patch didn't end up landing this case due to it not having any test difference, I've included it here for completeness's sake. The second case handles the case when we have an upper bound on vscale_range. We can use this to work out the upper bound on the number of elements, and thus what the maximum step will be. From the maximum step we then know which hi bits are zero. On its own, computing the known hi bits results in some small improvements for RVV with -mrvv-vector-bits=zvl across the llvm-test-suite. However I'm hoping to be able to use this later to reduce the LMUL in index calculations for vrgather/indexed accesses. --------- Co-authored-by: Philip Reames <preames@rivosinc.com>	2024-02-08 10:04:55 +08:00
Arthur Eubanks	d05bd34a18	[NFC][NewPM/Codegen] Remove unused parameter from verifyMachineFunction The MachineFunctionAnalysisManager forward declaration is messing with upcoming changes.	2024-02-07 22:15:09 +00:00
Arthur Eubanks	bb531c9a00	[NewPM/Codegen] Move MachineModuleInfo ownership outside of analysis (#80937 ) With the legacy pass manager, MachineModuleInfoWrapperPass owned the MachineModuleInfo used in the codegen pipeline. It can do this since it's an ImmutablePass that doesn't get invalidated. However, with the new pass manager, it is legal for the ModuleAnalysisManager to clear all of its analyses, regardless of if the analysis does not want to be invalidated. So we must move ownership of the MachineModuleInfo outside of the analysis (this is similar to PassInstrumentation). For now, make the PassBuilder user register a MachineModuleAnalysis that returns a reference to a MachineModuleInfo that the user owns. Perhaps we can find a better place to own the MachineModuleInfo to make using the codegen pass manager less cumbersome in the future.	2024-02-07 09:15:43 -08:00
Craig Topper	79fec2f8ba	[AtomicExpand][RISCV] Call shouldExpandAtomicRMWInIR before widenPartwordAtomicRMW (#80947 ) This gives the target a chance to keep an atomicrmw op that is smaller than the minimum cmpxchg size. This is needed to support the Zabha extension for RISC-V which provides i8/i16 atomicrmw operations, but does not provide an i8/i16 cmpxchg or LR/SC instructions. This moves the widening until after the target requests LLSC/CmpXChg/MaskedIntrinsic expansion. Once we widen, we call shouldExpandAtomicRMWInIR again to give the target another chance to make a decision about the widened operation. I considered making the targets return AtomicExpansionKind::Expand or a new expansion kind for And/Or/Xor, but that required the targets to special case And/Or/Xor which they weren't currently doing.	2024-02-07 08:24:50 -08:00
Michael Maitland	c954986fec	[GISel] Add support for scalable vectors in getGCDType (#80307 ) This function can be called from buildCopyToRegs where at least one of the types is a scalable vector type. This function crashed because it did not know how to handle scalable vector types. This patch extends the functionality of getGCDType to handle when at least one of the types is a scalable vector. getGCDType between a fixed and scalable vector is not implemented since the docstring of the function explains that getGCDType is used to build MERGE/UNMERGE instructions and we will never build a MERGE/UNMERGE between fixed and scalable vectors. --------- Co-authored-by: Matt Arsenault <arsenm2@gmail.com>	2024-02-07 10:32:12 -05:00
Simon Pilgrim	de7beb06e7	[DAG] ExpandShiftWithKnownAmountBit - reduce number of repeated getOpcode / getOperand calls. NFC.	2024-02-07 14:07:02 +00:00
Simon Pilgrim	670c2529bb	[DAG] Use DAGCombiner::SimplifyDemandedBits wrappers with default (all) DemandedElts. NFC. Don't call TLI.SimplifyDemandedVectorElts directly from every SimplifyDemandedBits call, use the more expressive wrappers instead first. This reduces the number of places we call TLI.SimplifyDemandedVectorElts and CommitTargetLoweringOpt to make it easier to track. Part of the work to process DAG nodes in topological order.	2024-02-07 11:12:29 +00:00
Carl Ritson	9bda1de0b6	[TwoAddressInstruction] Propagate undef flags for partial defs (#79286 ) If part of a register (lowered from REG_SEQUENCE) is undefined then we should propagate undef flags to uses of those lanes. This is only performed when live intervals are present as it requires live intervals to correctly match uses to defs, and the primary goal is to allow precise computation of subrange intervals.	2024-02-07 16:46:00 +09:00
Wang Pengcheng	cb7561ac5a	[Sched] Add MacroFusion mutation if fusions are not empty (#72227 ) We can get the fusions list by `getMacroFusions` and if it is not empty, then we will add the MacroFusion mutation automatically.	2024-02-07 15:38:02 +08:00
Michael Maitland	055ac72ecc	[GISel] Add support for scalable vectors in getLCMType (#80306 ) This function can be called from buildCopyToRegs where at least one of the types is a scalable vector type. This function crashed because it did not know how to handle scalable vector types. This patch extends the functionality of getLCMType to handle when at least one of the types is a scalable vector. getLCMType between a fixed and scalable vector is not implemented since the docstring of the function explains that getLCMType is used to build MERGE/UNMERGE instructions and we will never build a MERGE/UNMERGE between fixed and scalable vectors.	2024-02-06 20:23:07 -05:00
Craig Topper	0fb9f68bae	[SelectionDAG] Use getRegisterType instead of getTypeToTransformTo in ComputePHILiveOutRegInfo. (#80773 ) Since we used getNumRegisters right before this, I think this is the correct interface we should be using here. I'm experimenting with making i32 legal on RISC-V 64, but using i64 for the register type between basic blocks. This was one of the first issues I found trying to do that.	2024-02-06 09:39:19 -08:00
Craig Topper	cca49663a5	[FastISel][X86] Use getTypeForExtReturn in GetReturnInfo. (#80803 ) The comment and code here seem to match getTypeForExtReturn. The history shows that at the time this code was added, similar code existed in SelectionDAGBuilder. The SelectionDAGBuilder code has since been refactored into getTypeForExtReturn. This patch makes FastISel match SelectionDAGBuilder. The test changes are because X86 has customization of getTypeForExtReturn. So now we only extend returns to i8. Stumbled onto this difference by accident.	2024-02-06 09:38:25 -08:00
Thorsten Schütt	364f781344	[GlobalIsel] Combine logic of icmps (#77855 ) Inspired by InstCombinerImpl::foldAndOrOfICmpsUsingRanges with some adaptations to MIR.	2024-02-06 15:58:02 +01:00
David Green	2e3de997ab	[DAG] Generalize setcc(setcc) fold to use known bits. If we have a `SETCC (SETCC), 0, NE` and ZeroOrOneBooleanContent, we can remove the outer setcc as it will produce the same value as the inner. This can be generalized to anything where the top bits are known to be 0, as the value will remain as 1 or 0.	2024-02-06 12:39:48 +00:00
Simon Pilgrim	b8cdc2638e	[DAG] visitCTPOP - if only the upper half of the ctpop operand is zero then see if it's profitable to only count the lower half. (#80473 )	2024-02-06 12:19:31 +00:00
paperchalice	c9fd738388	[CodeGen] Port DeadMachineInstructionElim to new pass manager (#80582 ) A simple enough op pass so we can test standard instrumentations in future.	2024-02-06 17:56:56 +08:00
Philip Reames	e722d9662d	[DAG] Avoid a crash when checking size of scalable type in visitANDLike Fixes https://github.com/llvm/llvm-project/issues/80744. This transform doesn't handle vectors at all. The fixed-length ones pass the first check, but would fail the constant operand checks which immediately follow. This patch takes the simplest approach, and just guards the transform for scalar integers.	2024-02-05 14:30:10 -08:00
Jay Foad	abea3b2799	[RDF] Skip over NoRegister. NFCI. (#80672 ) This just avoids useless work of adding NoRegister to BaseSet, for consistency with other places that iterate over all physical registers.	2024-02-05 14:15:55 +00:00
Petar Avramovic	06f711a906	AMDGPU/GlobalISelDivergenceLowering: select divergent i1 phis (#80003 ) Implement PhiLoweringHelper for GlobalISel in DivergenceLoweringHelper. Use machine uniformity analysis to find divergent i1 phis and select them as lane mask phis in same way SILowerI1Copies select VReg_1 phis. Note that divergent i1 phis include phis created by LCSSA and all cases of uses outside of cycle are actually covered by "lowering LCSSA phis". GlobalISel lane masks are registers with sgpr register class and S1 LLT. TODO: General goal is that instructions created in this pass are fully instruction-selected so that selection of lane mask phis is not split across multiple passes. patch 3 from: https://github.com/llvm/llvm-project/pull/73337	2024-02-05 14:07:01 +01:00
Craig Topper	6590d0fed5	[DAGCombiner][ARM] Teach reduceLoadWidth to handle (and (srl (load), C, ShiftedMask)) (#80342 ) If we have a shifted mask, we may be able to reduce the load width to the width of the non-zero part of the mask and use an offset to the base address to remove the srl. The offset is given by C+trailingzeros(ShiftedMask). Then we add a final shl to restore the trailing zero bits. I've used the ARM test because that's where the existing (and (srl (load))) tests were. The X86 test was modified to keep the H register.	2024-02-04 16:05:51 -08:00
Craig Topper	f72da9f4fd	[SelectionDAG] Use getShiftAmountConstant to simplify code. NFC (#80561 ) Replace calls to getShiftAmountTy+getConstant with getShiftAmountConstant.	2024-02-04 16:05:14 -08:00
Simon Pilgrim	114a33be47	[DAG] getStackAlignedMMO - return the getMachineMemOperand result directly (style). NFC.	2024-02-04 14:01:55 +00:00
Kazu Hirata	1b33b3f27f	[MIRParser] Simplify a string comparison (NFC)	2024-02-03 21:43:10 -08:00
Kazu Hirata	7d269a4841	[CodeGen] Use range-based for loops (NFC)	2024-02-03 21:43:01 -08:00
paperchalice	e7ec0c972e	[CodeGen] Port PrintMIR to new pass manager (#79440 ) The legacy version prints machine functions to a string stream, then outputs the module and string in `doFinalization`. This patch breaks `MIRPrintingPass` into two parts, `PrintMIRPreparePass` and `PrintMIRPass`. `PrintMIRPreparePass` outputs the original IR as a YAML string, and `PrintMIRPass` just prints the machine function, so we can avoid the `doFinalization`.	2024-02-03 16:52:54 +08:00
Michael Maitland	ad0acf9ef6	[GISEL] More accounting for scalable vectors when operating on LLTs (#80372 ) This is stacked on by #80377 and #80378	2024-02-02 14:26:39 -05:00
Manish Kausik H	a768bc6ef6	[SelectionDAG] Use unaligned store to move AVX registers onto stack for `extractelement` (#78422 ) Prior to this patch, SelectionDAG generated aligned moves onto the stack for AVX registers when the function was marked as a no-realign-stack function. This led to a mismatch between the stack alignment and the instruction generated. This patch fixes the issue. Fixes #77730	2024-02-02 22:49:31 +05:30
Harald van Dijk	274d1b000c	[NFC] Add useFPRegsForHalfType(). (#74147 ) Currently, half operations can be promoted in one of two ways. * If softPromoteHalfType() returns false, fp16 values are passed around in fp32 registers, and whole chains of fp16 operations are promoted to fp32 in one go. * If softPromoteHalfType() returns true, fp16 values are passed around in i16 registers, and individual fp16 operations are promoted to fp32 and the result truncated to fp16 right away. The softPromoteHalfType behavior is necessary for correctness, but changing this for an existing target breaks the ABI. Therefore, this commit adds a third option: * If softPromoteHalfType() returns true and useFPRegsForHalfType() returns true as well, fp16 values are passed around in fp32 registers, but individual fp16 operations are promoted to fp32 and the result truncated to fp16 right away. This change does not yet update any target to make use of it.	2024-02-02 14:05:13 +00:00
Rahman Lavaee	acec6419e8	[SHT_LLVM_BB_ADDR_MAP] Allow basic-block-sections and labels be used together by decoupling the handling of the two features. (#74128 ) Today `-split-machine-functions` and `-fbasic-block-sections={all,list}` cannot be combined with `-basic-block-sections=labels` (the labels option will be ignored). The inconsistency comes from the way basic block address map -- the underlying mechanism for basic block labels -- encodes basic block addresses (https://lists.llvm.org/pipermail/llvm-dev/2020-July/143512.html). Specifically, basic block offsets are computed relative to the function begin symbol. This relies on functions being contiguous, which is not the case for MFS and basic block section binaries. This means Propeller cannot use binary profiles collected from these binaries, which limits the applicability of Propeller for iterative optimization. To make the `SHT_LLVM_BB_ADDR_MAP` feature work with basic block section binaries, we propose modifying the encoding of this section as follows. First let us review the current encoding, which emits the address of each function and its number of basic blocks, followed by basic block entries for each basic block.

| | |
|--|--|
| Address of the function | Function Address |
| Number of basic blocks in this function | NumBlocks |
| BB entry 1 |
| BB entry 2 |
| ... |
| BB entry #NumBlocks |

To make this work for basic block sections, we treat each basic block section similar to a function, except that basic block sections of the same function must be encapsulated in the same structure so we can map all of them to their single function. We modify the encoding to first emit the number of basic block sections (BB ranges) in the function. Then we emit the address map of each basic block section as before: the base address of the section, its number of blocks, and BB entries for its basic blocks. The first section in the BB address map is always the function entry section.

| | |
|--|--|
| Number of sections for this function | NumBBRanges |
| Section 1 begin address | BaseAddress[1] |
| Number of basic blocks in section 1 | NumBlocks[1] |
| BB entries for Section 1 |
| .................. |
| Section #NumBBRanges begin address | BaseAddress[NumBBRanges] |
| Number of basic blocks in section #NumBBRanges | NumBlocks[NumBBRanges] |
| BB entries for Section #NumBBRanges |

The encoding of basic block entries remains as before with the minor change that each basic block offset is now computed relative to the begin symbol of its containing BB section. This patch adds a new boolean codegen option `-basic-block-address-map`. Correspondingly, the front-end flag `-fbasic-block-address-map` and LLD flag `--lto-basic-block-address-map` are introduced. Analogously, we add a new TargetOption field `BBAddrMap`. This means BB address maps are either generated for all functions in the compilation unit, or for none (depending on `TargetOptions::BBAddrMap`). This patch keeps the functionality of the old `-fbasic-block-sections=labels` option but does not remove it. A subsequent patch will remove the obsolete option. We refactor the `BasicBlockSections` pass by separating the BB address map and BB sections handling to their own functions (named `handleBBAddrMap` and `handleBBSections`). `handleBBSections` renumbers basic blocks and places them in their assigned sections. `handleBBAddrMap` is invoked after `handleBBSections` (if requested) and only renumbers the blocks.

- New tests added:
  - Two tests, basic-block-address-map-with-basic-block-sections.ll and basic-block-address-map-with-mfs.ll, to exercise the combination of `-basic-block-address-map` with `-basic-block-sections=list` and `-split-machine-functions`.
  - A driver sanity test for the `-fbasic-block-address-map` option (basic-block-address-map.c).
  - An LLD test for testing the `--lto-basic-block-address-map` option. This reuses the LLVM IR from `lld/test/ELF/lto/basic-block-sections.ll`.
- Renamed and modified the two existing codegen tests for basic block address map (`basic-block-sections-labels-functions-sections.ll` and `basic-block-sections-labels.ll`)
- Removed `SHT_LLVM_BB_ADDR_MAP_V0` tests. Full deprecation of `SHT_LLVM_BB_ADDR_MAP_V0` and `SHT_LLVM_BB_ADDR_MAP` versions less than 2 will happen in a separate PR in a few months.	2024-02-01 17:50:46 -08:00
Craig Topper	5cf0fb4317	[StackSlotColoring] Ignore non-spill objects in RemoveDeadStores. (#80242 ) The stack slot coloring pass is concerned with optimizing spill slots. If any change is made, a pass is made over the function to remove stack stores that use the same register and stack slot as an immediately preceding load. The register check is too simple for constant registers like AArch64 and RISC-V's zero register. This register can be used as the result of a load if we want to discard the result but still have the memory access performed, as for a volatile or atomic load. If the code sees a load from the zero register followed by a store of the zero register at the same stack slot, the pass mistakenly believes the store isn't needed. Since the main stack coloring optimization is only concerned with spill slots, it seems reasonable that RemoveDeadStores should only be concerned with spills. Since we never generate a reload of x0, this avoids the issue seen by RISC-V. Test case concept is adapted from pr30821.mir from X86. That test had to be updated to mark the stack slot as a spill slot. Fixes #80052.	2024-02-01 13:25:15 -08:00
Jiahan Xie	10c2d5ff7c	[RISCV][GISel] RegBank select and instruction select for vector G_ADD, G_SUB (#74114 ) RegisterBank Selection for scalable vector G_ADD and G_SUB by creating new mappings for different types of vector register banks. Then implement Instruction Selection for the same operations by choosing the correct RISC-V vector register class.	2024-02-01 15:06:43 -05:00
Quentin Dian	112fba974c	[MIRPrinter] Don't print line break when there is no instructions (NFC) (#80147 ) Per #80143, we can remove the extra line break when there is no instruction.	2024-02-01 22:10:52 +08:00
Kazu Hirata	39fa304866	[llvm] Use StringRef::starts_with (NFC)	2024-01-31 23:54:07 -08:00
wangpc	995d21bc6f	[SelectOpt] Print instruction instead of pointer Pull Request: https://github.com/llvm/llvm-project/pull/80125	2024-02-01 13:10:52 +08:00
Zaara Syeda	a03a6e9964	[AIX] [XCOFF] Add support for common and local common symbols in the TOC (#79530 ) This patch adds support for common and local common symbols in the TOC for AIX. Note that we need to update isVirtualSection: a common symbol in the TOC will have the symbol type XTY_CM and will be initialized when placed in the TOC, so sections with this type are no longer virtual. --------- Co-authored-by: Zaara Syeda <syzaara@ca.ibm.com>	2024-01-31 16:34:21 -05:00
Jay Foad	baf1b19763	[CodeGen] Use regunits instead of MCRegUnitIterator in RegisterClassInfo. NFC.	2024-01-31 16:27:54 +00:00
Jay Foad	e34fd2e193	[CodeGen] Simplify RegisterClassInfo BitVector comparisons. NFC.	2024-01-31 16:25:19 +00:00
Nikita Popov	f2df4bfe54	[AsmParser] Support non-consecutive global value numbers (#80013 ) https://github.com/llvm/llvm-project/pull/78171 added support for non-consecutive local value numbers. This extends the support for global value numbers (for globals and functions). This means that it is now possible to delete an unnamed global definition/declaration without breaking the IR. This is a lot less common than unnamed local values, but it seems like something we should support for consistency. (Unnamed globals are used a lot in Rust though.)	2024-01-31 17:04:30 +01:00
Quentin Dian	b7738e275d	[MIRPrinter] Don't print space when there is no successor (#80143 ) Extra space causes the checks generated by update_mir_test_checks to be unavailable.
```
# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py UTC_ARGS: --version 4
# RUN: llc -mtriple=x86_64-- -o - %s -run-pass=none -verify-machineinstrs -simplify-mir | FileCheck %s
---
name: foo
body: |
  ; CHECK-LABEL: name: foo
  ; CHECK: bb.0:
  ; CHECK-NEXT: successors:
  ; CHECK-NEXT: {{ $}}
  ; CHECK-NEXT: {{ $}}
  ; CHECK-NEXT: bb.1:
  ; CHECK-NEXT: RET 0, $eax
  bb.0:
    successors:
  bb.1:
    RET 0, $eax
...
```
The failure log is as follows:
```
llvm/test/CodeGen/MIR/X86/unreachable-block-print.mir:9:16: error: CHECK-NEXT: is on the same line as previous match
  ; CHECK-NEXT: {{ $}}
               ^
<stdin>:21:13: note: 'next' match was here
  successors:
            ^
<stdin>:21:13: note: previous match ended here
  successors:
```
	2024-01-31 22:35:41 +08:00
Simon Pilgrim	912cdd2179	[DAG] AddNodeIDCustom - call ShuffleVectorSDNode::getMask once instead of repeated getMaskElt calls. Use a simpler for-range loop to append all shuffle mask elements	2024-01-31 12:01:01 +00:00
Jay Foad	942cc9a222	Revert "[CodeGen] Don't include aliases in RegisterClassInfo::IgnoreCSRForAllocOrder (#80015 )" This reverts commit f8525030004f907cd108e7c18df255a6d3b23124. It was supposed to speed things up but llvm-compile-time-tracker.com showed a slight slow down.	2024-01-31 10:25:51 +00:00
Jay Foad	f852503000	[CodeGen] Don't include aliases in RegisterClassInfo::IgnoreCSRForAllocOrder (#80015 ) Previously we called ignoreCSRForAllocationOrder on every alias of every CSR which was expensive on targets like AMDGPU which define a very large number of overlapping register tuples. On such targets it is simpler and faster to call ignoreCSRForAllocationOrder once for every physical register. Differential Revision: https://reviews.llvm.org/D146735	2024-01-31 08:16:06 +00:00
Oskar Wirga	ff4636a4ab	Refactor recomputeLiveIns to converge on added MachineBasicBlocks (#79940 ) This is a fix for the regression seen in https://github.com/llvm/llvm-project/pull/79498 > Currently, recomputeLiveIns recomputes the live-in registers for a single MachineBasicBlock, but the order in which recomputeLiveIns is called matters, which can result in incorrect register allocations down the line. Now we do not recompute the entire CFG, but we do ensure that the newly added MBBs reach convergence.	2024-01-30 19:33:04 -08:00
PiJoules	a356e6ccad	[SelectionDAG] Expand fixed point multiplication into libcall (#79352 ) 32-bit ARMv6 with thumb doesn't support MULHS/MUL_LOHI as legal/custom nodes during expansion which will cause fixed point multiplication of _Accum types to fail with fixed point arithmetic. Prior to this, we just happen to use fixed point multiplication on platforms that happen to support these MULHS/MUL_LOHI. This patch attempts to check if the multiplication can be done via libcalls, which are provided by the arm runtime. These libcall attempts are made elsewhere, so this patch refactors that libcall logic into its own functions and the fixed point expansion calls and reuses that logic.	2024-01-30 13:58:55 -08:00
Jay Foad	77e5136ce4	[CodeGen] Use RegUnits in RegisterClassInfo::getLastCalleeSavedAlias (#79996 ) Change the implementation of getLastCalleeSavedAlias to use RegUnits instead of register aliases. This is much faster on targets like AMDGPU which define a very large number of overlapping register tuples. No functional change intended. If PhysReg overlaps multiple CSRs then getLastCalleeSavedAlias(PhysReg) could conceivably return a different arbitrary one, but currently it is only used for some debug printing anyway. Differential Revision: https://reviews.llvm.org/D146734	2024-01-30 14:06:45 +00:00
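
Note on ece66dbc60 ([SelectionDAG] computeKnownBits for ISD::STEP_VECTOR): the two cases in that commit message can be illustrated with a small stand-alone sketch. This is not the LLVM implementation. The function name `step_vector_known_zero` and its parameters are hypothetical, and it assumes a non-negative step and unsigned elements, where element `i` of the vector is `i * step`.

```python
from typing import Optional


def step_vector_known_zero(step: int, bit_width: int, min_elts: int,
                           vscale_max: Optional[int]) -> int:
    """Return a mask of bits that are zero in every element i * step.

    Hypothetical helper for illustration only; assumes step >= 0.
    min_elts is the element count at vscale == 1, vscale_max is an
    optional upper bound on vscale (None if unknown).
    """
    all_ones = (1 << bit_width) - 1
    known_zero = 0

    # Case 1: a power-of-two step leaves the low log2(step) bits zero.
    if step > 0 and step & (step - 1) == 0:
        known_zero |= step - 1

    # Case 2: an upper bound on vscale bounds the element count, and so
    # the largest value (max_elts - 1) * step; bits above it are zero.
    if vscale_max is not None:
        max_value = (min_elts * vscale_max - 1) * step
        if max_value <= all_ones:  # only valid if nothing wraps around
            known_zero |= all_ones & ~((1 << max_value.bit_length()) - 1)

    return known_zero
```

For example, with a step of 4, 32-bit elements, 4 elements per vscale unit and vscale at most 2, the largest element is 28, so bits 0-1 and bits 5-31 are reported as known zero.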

35296 Commits