llvm-project

Author	SHA1	Message	Date
Craig Topper	d5d1417659	[RISCV][GISel] Use libcalls for rint, nearbyint, trunc, round, and roundeven intrinsics. (#108779 )	2024-09-18 12:07:44 -07:00
Craig Topper	292ee93a87	[CodeGen] Use Register in SwitchLoweringUtils. NFC (#109092 ) Use an empty Register() instead of -1U.	2024-09-18 09:43:21 -07:00
Phoebe Wang	a10c9f994b	Revert "[X86][BF16] Add libcall for F80 -> BF16" (#109140 ) Reverts llvm/llvm-project#109116	2024-09-18 21:35:38 +08:00
Phoebe Wang	76eda76f9f	[X86][BF16] Add libcall for F80 -> BF16 (#109116 ) This fixes #108936, but the calling convention doesn't match with GCC. I doubt we have such a lib function for now, so leave the calling convention as is.	2024-09-18 21:23:10 +08:00
Craig Topper	9d3ab1c36e	[SelectionDAGBuilder] Use Register in more places. NFC	2024-09-17 23:49:58 -07:00
Craig Topper	fe012bd52d	[SelectionDAG] Use Register around RegisterSDNode related functions. NFC RegisterSDNode itself already stored a Register.	2024-09-17 23:26:56 -07:00
Craig Topper	ca0613e0fc	[LegalizeFloatTypes] Handle replacement for strict ops inside SoftPromoteHalfOp_FP_TO_XINT. NFC Return SDValue() so we can notify the caller we did all replacements. Restore the getNumValues() == 1 check in the assert in the caller now that all handlers only return nodes with a single result.	2024-09-17 16:25:10 -07:00
Michael Maitland	e08c2178ef	[MachineVerifier] Fix bug in MachineVerifier for G_INSERT_SUBVECTOR (#109048 )	2024-09-17 16:57:41 -04:00
Stephen Tozer	51a29b5f16	Revert2 "[DebugInfo][DWARF] Set is_stmt on first non-line-0 instruction in BB (#105524 )" Reverted due to large .debug_line size regressions for some configurations; work currently in place to improve the output of this behaviour in PR #108251. This patch also modifies two tests that were created or modified after the original commit landed and are affected by the revert: llvm/test/CodeGen/X86/pseudo_cmov_lower2.ll llvm/test/DebugInfo/X86/empty-line-info.ll This reverts commit 5fef40c2c477e92187bd4e5c18091eca6b8465cc.	2024-09-17 18:29:20 +01:00
Craig Topper	da46244e49	Revert "[LegalizeVectorOps] Make the AArch64 hack in ExpandFNEG more specific." This reverts commit 884ff9e3f9741ac282b6cf8087b8d3f62b8e138a. Regression was reported in Halide for arm32.	2024-09-17 09:04:43 -07:00
Craig Topper	f36580fcb5	[LegalizeVectorOps] Remove calls to DAG.UnrollVectorsOps from some expansion handlers. NFC (#108930 ) Instead, return SDValue() to tell the caller to do the unrolling. This is consistent with how some other handlers work, especially the handlers that live in TLI. ExpandBITREVERSE was rewritten to not take the Results vector as an argument.	2024-09-17 08:35:22 -07:00
David Green	2242cd2b6a	[DAG] Fold vecreduce.or(sext(x)) to sext(vecreduce.or(x)) (#108959 ) The same is true for and / xor reductions, where the sext / zext can be sunk down through the bitwise operation. https://alive2.llvm.org/ce/z/TvzCd5	2024-09-17 15:24:00 +01:00
Mikhail R. Gadelha	d2125e1db6	[RISCV] Support STRICT_UINT_TO_FP and STRICT_SINT_TO_FP (#102503 ) This patch adds support for the missing STRICT_UINT_TO_FP and STRICT_SINT_TO_FP for riscv and adds a test case for rv32 which was previously crashing. The code is in line with how other strict_* nodes are handled (e.g., getting op(1) instead of op(0) when it's a strict node, as op(0) in a strict node is the entry token).	2024-09-17 11:21:52 -03:00
Michael Maitland	ee2add0683	[GISEL] Fix bugs and clarify spec of G_EXTRACT_SUBVECTOR (#108848 ) The implementation was missing the fact that `G_EXTRACT_SUBVECTOR` destination and source vector can be different types. Also fix a bug in the MIR builder for `G_EXTRACT_SUBVECTOR` to generate the correct opcode. Clarify the G_EXTRACT_SUBVECTOR specification.	2024-09-17 10:08:39 -04:00
Thorsten Schütt	acfa294b5e	[GlobalIsel] Canonicalize G_FCMP (#108891 ) As a side-effect, we start constant folding fcmps.	2024-09-17 09:42:04 +02:00
Craig Topper	884ff9e3f9	[LegalizeVectorOps] Make the AArch64 hack in ExpandFNEG more specific. Only scalarize single element vectors when vector FSUB is not supported and scalar FNEG is supported.	2024-09-16 21:48:42 -07:00
David Green	960c975acd	[AArch64] Expand scmp/ucmp vector operations with sub (#108830 ) Unlike scalar, where AArch64 prefers expanding scmp/ucmp with select, under Neon we can use the arithmetic expansion to generate fewer instructions. Notably it also prevents the scalarization of vselect during vector-legalization.	2024-09-16 18:44:52 +01:00
nebulark	f5ba3e1fa6	[CodeView] Flatten cmd args in frontend for LF_BUILDINFO (#106369 )	2024-09-16 19:29:42 +02:00
Thorsten Schütt	5c348f692a	[GlobalIsel] Canonicalize G_ICMP (#108755 ) As a side-effect, we start constant folding icmps. Split out from https://github.com/llvm/llvm-project/pull/105991.	2024-09-16 19:25:34 +02:00
David Green	feac761f37	[GlobalISel][AArch64] Add G_FPTOSI_SAT/G_FPTOUI_SAT (#96297 ) This is an implementation of the saturating fp to int conversions for GlobalISel. On AArch64 the conversion instructions work this way, producing saturating results. LegalizerHelper::lowerFPTOINT_SAT is ported from SDAG. AArch64 has a lot of existing tests for fptosi_sat, covering a wide range of types. I have tried to make most of them work all at once, but a few fall back due to other missing features such as f128 handling for min/max.	2024-09-16 10:33:59 +01:00
ErikHogeman	e16ec9b45e	[SelectionDAG] Do not build illegal nodes with users (#108573 ) When we build a node with illegal type which has a user, it's possible that it can end up being processed by the DAG combiner later before it's removed, which can trigger an assert expecting the types to be legalized already.	2024-09-16 10:02:42 +01:00
Nikita Popov	dfa54298ff	[InitUndef] Enable the InitUndef pass on non-AMDGPU targets (#108353 ) The InitUndef pass works around a register allocation issue, where undef operands can be allocated to the same register as early-clobber result operands. This may lead to ISA constraint violations, where certain input and output registers are not allowed to overlap. Originally this pass was implemented for RISCV, and then extended to ARM in #77770. I've since removed the target-specific parts of the pass in #106744 and #107885. This PR reduces the pass to use a single requiresDisjointEarlyClobberAndUndef() target hook and enables it by default. The hook is disabled for AMDGPU, because overlapping early-clobber and undef operands are known to be safe for that target, and we get significant codegen diffs otherwise. The motivating case is the one in arm64-ldxr-stxr.ll, where we were previously incorrectly allocating a stxp input and output to the same register.	2024-09-16 09:48:25 +02:00
Craig Topper	a5b63b5cb7	[VirtRegMap] Store MCRegister in Virt2PhysMap. (#108775 ) Remove NO_PHYS_REG in favor of MCRegister() and converting MCRegister to bool.	2024-09-15 14:04:59 -07:00
Craig Topper	76b54df87a	[StackSlotColoring] Use Register for isLoadFromStackSlot/isStoreToStackSlot result. NFC	2024-09-15 12:05:28 -07:00
Craig Topper	23953798f3	[VirtRegMap] Remove unnecessary calls to Register::id() accessing IndexMaps. VirtReg2IndexFunctor already takes a Register.	2024-09-15 09:59:34 -07:00
Matt Arsenault	c49a1ae6d6	DAG: Reorder isFMAFasterThanFMulAndFAdd checks (NFC) Basic legality checks should be first.	2024-09-15 16:33:01 +04:00
Robert Dazi	8837898b8d	[DAGCombine] Count leading ones: refine post DAG/Type Legalisation if promotion (#102877 ) This PR is related to #99591. In this PR, instead of modifying how the legalisation occurs depending on surrounding instructions, we refine after legalisation. This PR has two parts: * `SDPatternMatch/MatchContext`: Slightly modify the code that matches Operands (used by `m_Node(...)`) and Unary/Binary/Ternary Patterns to make it compatible with `VPMatchContext`, instead of only `m_Opc` being supported. Some tests were added to ensure no regressions. * `DAGCombiner`: Add a `foldSubCtlzNot` which detects and rewrites the patterns using a matching context. Remaining Tasks: - [ ] GlobalISel - [ ] Currently the pattern matching will occur even before legalisation. Should I restrict it to specific stages instead? - [ ] Style: Add a visitVP_SUB? Move `foldSubCtlzNot` to another location for style consistency? @topperc --------- Co-authored-by: v01dxyz <v01dxyz@v01d.xyz>	2024-09-15 15:48:36 +04:00
Simon Pilgrim	5910e8d607	[DAG] visitUDIV - call SimplifyDemandedBits to handle hidden constant foldable cases Fixes #108728	2024-09-15 12:29:28 +01:00
Craig Topper	367c145e5f	[IRTranslator][RISCV] Support scalable vector zeroinitializer. (#108666 )	2024-09-14 15:46:18 -07:00
Craig Topper	947374c393	[IRTranslator] Simplify fixed vector ConstantAggregateZero handling. NFC (#108667 ) We don't need to loop through the elements, they're all the same zero. We can get the first element and create a splat build_vector.	2024-09-13 22:02:29 -07:00
Lawrence Benson	b74e779219	[x86] Add lowering for `@llvm.experimental.vector.compress` (#104904 ) This is a follow-up to #92289 that adds lowering of the new `@llvm.experimental.vector.compress` intrinsic on x86 with AVX512 instructions. This intrinsic maps directly to `vpcompress`.	2024-09-13 21:48:01 +02:00
Kazu Hirata	3a274584eb	[LiveDebugValues] Avoid repeated hash lookups (NFC) (#108484 )	2024-09-13 10:41:45 -07:00
Kazu Hirata	b9d85b1263	[CodeGen] Use DenseMap::operator[] (NFC) (#108489 ) Once we modernize CopyInfo with default member initializations, Copies.insert({Unit, ...}) becomes equivalent to: Copies.try_emplace(Unit) which we can simplify further down to Copies[Unit].	2024-09-13 10:04:33 -07:00
Simon Pilgrim	69a21154ca	[DAG] Fold trunc(srl(extract_elt(vec,c1),c2)) -> extract_elt(bitcast(vec),c3) (#107987 ) Extends existing trunc(extract_elt(vec,c1)) -> extract_elt(bitcast(vec),c3) fold. Noticed while working on #107404	2024-09-13 15:13:58 +01:00
Juan Manuel Martinez Caamaño	09a4c23eb4	[NFC][EarlyIfConverter] Turn SSAIfConv into a local variable (#107390 )	2024-09-13 10:43:33 +02:00
Matt Arsenault	9578db9c11	DAG: Handle atomic fsub in node dumper	2024-09-13 10:22:27 +04:00
Craig Topper	a30b1d5a38	[SelectionDAG] Use Register in a few places in InstrEmitter. NFC	2024-09-12 10:29:17 -07:00
Craig Topper	8c05515032	[LegalizeIntegerTypes] Simplify ExpandIntRes_FP_TO_XINT when operand needs to be SoftPromoted. (#107634 ) Create an FP_EXTEND instead of handling the soft promote directly. This FP_EXTEND will be visited and soft promoted itself. This removes a zero extend from the generated code when the f32 type is itself softened. Previously we softened it as an fp16_to_fp which sees the operand as an integer type so we extend it. When we soften the result as an fp_extend we see the source as f16 and don't extend. It only becomes an integer inside call lowering not by type legalization. If this extend is really necessary, then we have an issue when an f16->f32 fp_extend exists in the source and f32 needs to be softened. This simplifies part of #102503.	2024-09-12 08:28:06 -07:00
Joe Faulls	bf8101e4fd	[CodeGen] Clear InitUndef pass new register cache between pass runs (#90967 ) Multiple invocations of the pass could interfere with each other, preventing some undefs from being initialised. I found it very difficult to create a unit test for this due to it being dependent on particular allocations of a previous function. However, the bug can be observed here: https://godbolt.org/z/7xnMo41Gv with the creation of the illegal instruction `vnsrl.wi v9, v8, 0`	2024-09-12 15:01:55 +02:00
Nikita Popov	e2723c2a8a	[InitUndef] Only compute DeadLaneDetector if subreg liveness enabled (NFC) (#108279 ) InitUndef currently always computes DeadLaneDetector, but only actually uses it if subreg liveness is enabled for the target. Make the calculation optional to avoid an unnecessary compile-time impact for targets that don't enable subreg liveness.	2024-09-12 09:00:47 +02:00
Thorsten Schütt	ba4bcce5f5	[GlobalIsel] Combine trunc of binop (#107721 ) trunc (binop X, C) --> binop (trunc X, trunc C) --> binop (trunc X, C`) Try to narrow the width of math or bitwise logic instructions by pulling a truncate ahead of binary operators. Vx and Nx cores consider 32-bit and 64-bit basic arithmetic equal in costs.	2024-09-11 15:04:55 +02:00
Nikita Popov	1e3a24d2e4	[InitUndef] Don't use largest super class (#107885 ) The InitUndef pass currently uses the getLargestSuperClass() hook (which is only used by that pass) to choose the register to initialize. This was done to reduce the number of undef init pseudos needed, e.g. so that the vrnov0 regclass would use the same pseudo as v0. After #106744 we use a single generic pseudo, so this is no longer necessary.	2024-09-11 09:36:20 +02:00
YunQiang Su	5773adb0bf	SelectionDAG: Remove unneeded getSelectCC in expandFMINIMUMNUM_FMAXIMUMNUM (#107416 ) ISD::FCANONICALIZE is enough, which can process NaN or non-NaN correctly, thus getSelectCC is not needed here.	2024-09-11 09:53:04 +08:00
Craig Topper	d2f25e5405	[LegalizeTypes] Avoid creating an unused node in ExpandIntRes_ADDSUB. NFC The Hi result is sometimes calculated a different way and this node goes unused. Defer creation until we know for sure it is needed. The test changes are because the node creation order changed the names in the debug output.	2024-09-10 16:39:19 -07:00
Kyungwoo Lee	bf68403484	Attempt to fix [CGData][MachineOutliner] Global Outlining (#90074 ) (#108037 )	2024-09-10 08:21:25 -07:00
Kyungwoo Lee	0f52545289	[CGData][MachineOutliner] Global Outlining (#90074 ) This commit introduces support for outlining functions across modules using codegen data generated from previous codegen. The codegen data currently manages the outlined hash tree, which records outlining instances that occurred locally in the past. The machine outliner now operates in one of three modes: 1. CGDataMode::None: This is the default outliner mode that uses the suffix tree to identify (local) outlining candidates within a module. This mode is also used by (full)LTO to maintain optimal behavior with the combined module. 2. CGDataMode::Write (`-codegen-data-generate`): This mode is identical to the default mode, but it also publishes the stable hash sequences of instructions in the outlined functions into a local outlined hash tree. It then encodes this into the `__llvm_outline` section, which will be dead-stripped at link time. 3. CGDataMode::Read (`-codegen-data-use-path={.cgdata}`): This mode reads a codegen data file (.cgdata) and initializes a global outlined hash tree. This tree is used to generate global outlining candidates. Note that the codegen data file has been post-processed with the raw `__llvm_outline` sections from all native objects using the `llvm-cgdata` tool (or a linker, `LLD`, or a new ThinLTO pipeline later). This depends on https://github.com/llvm/llvm-project/pull/105398. After this PR, LLD (https://github.com/llvm/llvm-project/pull/90166) and Clang (https://github.com/llvm/llvm-project/pull/90304) will follow for each client side support. This is a patch for https://discourse.llvm.org/t/rfc-enhanced-machine-outliner-part-2-thinlto-nolto/78753.	2024-09-10 06:56:31 -07:00
Simon Pilgrim	7e07c1df67	[DAG] expandAVG - consistently use getShiftAmountConstant for constant shift amounts. NFC	2024-09-10 09:25:58 +01:00
Tobias Stadler	2d338bed00	[CodeGen] Refactor DeadMIElim isDead and GISel isTriviallyDead (#105956 ) Merge GlobalISel's isTriviallyDead and DeadMachineInstructionElim's isDead code and remove all unnecessary checks from the hot path by looping over the operands before doing any other checks. See #105950 for why DeadMIElim needs to remove LIFETIME markers even though they probably shouldn't generally be considered dead. x86 CTMark O3: -0.1% AArch64 GlobalISel CTMark O0: -0.6%, O2: -0.2%	2024-09-09 16:30:44 +02:00
Jeremy Morse	7a930ce327	[DWARF] Emit a minimal line-table for totally empty functions (#107267 ) In degenerate but legal inputs, we can have functions that have no source locations at all -- all the DebugLocs attached to instructions are empty. LLVM didn't produce any source location for the function; with this patch it will at least emit the function-scope source location. Demonstrated by empty-line-info.ll The XCOFF test modified has similar symptoms -- with this patch, the size of the ".dwline" section grows a bit, thus shifting some of the file internal offsets, which I've updated.	2024-09-09 12:54:45 +01:00
Craig Topper	f2b71491d1	[MC] Make MCRegisterInfo::getLLVMRegNum return std::optional<MCRegister>. NFC (#107776 )	2024-09-08 21:21:51 -07:00
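
The `trunc (binop X, C) --> binop (trunc X, trunc C)` rewrite described in ba4bcce5f5 above relies on truncation commuting with wrap-around arithmetic and bitwise logic. The following is a minimal sketch of that identity in Python, not the combiner's implementation: the helper names (`trunc`, `narrowing_holds`, `check_all`), the 64-to-32-bit widths, and the operator set are all assumptions chosen for illustration, and the set is not claimed to match what the GlobalISel combine actually handles.

```python
import operator
import random


def trunc(value: int, bits: int) -> int:
    """Keep only the low `bits` bits, as an integer truncate would."""
    return value & ((1 << bits) - 1)


# Illustrative operator set (an assumption, not the list the combiner handles).
BINOPS = {
    "add": operator.add,
    "sub": operator.sub,
    "mul": operator.mul,
    "and": operator.and_,
    "or": operator.or_,
    "xor": operator.xor,
}


def narrowing_holds(op, x: int, c: int, wide: int = 64, narrow: int = 32) -> bool:
    """Compare trunc(op(x, c)) with op(trunc(x), trunc(c)) at the narrow width."""
    # Wide form: compute at `wide` bits with wrap-around, then truncate.
    wide_first = trunc(trunc(op(x, c), wide), narrow)
    # Narrow form: truncate both operands first, then compute at `narrow` bits.
    narrow_first = trunc(op(trunc(x, narrow), trunc(c, narrow)), narrow)
    return wide_first == narrow_first


def check_all(samples: int = 1000, seed: int = 0) -> bool:
    """Randomly test every operator in BINOPS on 64-bit inputs."""
    rng = random.Random(seed)
    return all(
        narrowing_holds(op, rng.getrandbits(64), rng.getrandbits(64))
        for op in BINOPS.values()
        for _ in range(samples)
    )
```

Operations whose low result bits depend on high input bits do not commute with truncation: `narrowing_holds(operator.rshift, 1 << 32, 1)` is false, because the bit shifted down from position 32 is lost when the operand is truncated first.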

36459 Commits