This extends the known-bits computation for extending loads that have range
metadata: the range metadata is handled on the original memory type and then
extended to the correct BitWidth.
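A minimal sketch of the idea, in case it helps readers follow the change (the
helper name and exact shape are assumptions, not the in-tree code):

```cpp
#include "llvm/CodeGen/SelectionDAG.h"
#include "llvm/IR/ConstantRange.h"
#include "llvm/Support/KnownBits.h"

using namespace llvm;

// Hedged sketch: known bits for an extending load carrying !range metadata on
// the narrow memory type (assumes the metadata width matches the memory type).
static KnownBits knownBitsForExtLoad(const LoadSDNode *LD, unsigned BitWidth) {
  unsigned MemBits = LD->getMemoryVT().getScalarSizeInBits();
  KnownBits Known(MemBits);
  if (const MDNode *Ranges = LD->getRanges())
    Known = getConstantRangeFromMetadata(*Ranges).toKnownBits();
  // Extend the narrow-width facts to the loaded value's width.
  switch (LD->getExtensionType()) {
  case ISD::ZEXTLOAD:
    return Known.zext(BitWidth);
  case ISD::SEXTLOAD:
    return Known.sext(BitWidth);
  default: // Any-extending or non-extending load: high bits stay unknown.
    return Known.anyext(BitWidth);
  }
}
```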
This is another smaller step towards #70452, changing the signature of
computeAliasing() from optional<uint64_t> to LocationSize, with follow-up
changes in DAGCombiner::mayAlias(). There are some test changes due to
the previous AA->isNoAlias call incorrectly using an unknown size
(~uint64_t(0)). This should then be improved again in #70452 when the
types are known to be scalable.
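For illustration, here are the kinds of sizes LocationSize can carry, which an
optional<uint64_t> could not distinguish cleanly (a hedged sketch, not code
from the patch):

```cpp
#include "llvm/Analysis/MemoryLocation.h"
using namespace llvm;

// LocationSize distinguishes a precise byte count, an upper bound, and a
// fully unknown size, instead of smuggling "unknown" through ~uint64_t(0).
LocationSize Precise = LocationSize::precise(16);            // exactly 16 bytes
LocationSize Upper   = LocationSize::upperBound(64);         // at most 64 bytes
LocationSize Unknown = LocationSize::beforeOrAfterPointer(); // unknown extent
```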
MachineIR alias analysis assumes that only bytes after the pointer will
be accessed. This is incorrect if the stride is negative.
This is causing miscompiles in our downstream after SLP started making
strided loads.
Fixes #82657
For RISC-V, we were always choosing to sign extend when promoting
i32->i64. If the promoted inputs happen to be zero extended already, we
should use zero extend instead. This is what we do for SETCC.
The expansion previously used, derived from Hacker's Delight,
does not work correctly when the dividend is INT_MIN and the
divisor is a power of two. We now use an alternate derivation
of the A and Q constants specifically for the power-of-two divisor
case to avoid this problem. Credit to Fabian Giesen for the
new derivation.
Fixes https://github.com/llvm/llvm-project/issues/77169
The base case of these calls InferPtrInfo. This is dangerous due to
#82657, but it turns out none of these are used.
It seemed best to reduce the surface area until these are needed.
Patch 2 of 3 to add llvm.dbg.label support to the RemoveDIs project. The
patch stack adds the DPLabel class, which is the RemoveDIs llvm.dbg.label
equivalent.
1. Add DbgRecord base class for DPValue and the not-yet-added
DPLabel class.
2. Add the DPLabel class.
-> 3. Add support to passes.
The next patch, #82639, will enable conversion between dbg.labels and DPLabels.
AssignmentTrackingAnalysis support could have gone two ways:
1. Have the analysis store a DPLabel representation in its results -
SelectionDAGBuilder reads the analysis results and ignores all DbgRecord
kinds.
2. Ignore DPLabels in the analysis - SelectionDAGBuilder reads the analysis
results but still needs to iterate over DPLabels from the IR.
I went with option 2 because it's less work and is no less correct than 1. It's
worth noting that this causes labels to sink to the bottom of packs of debug
records. e.g., [value, label, value] becomes [value, value, label]. This
shouldn't be a problem because labels and variable locations don't have an
ordering requirement. The ordering between variable locations is maintained and
the label movement is deterministic.
This patch also moves the MatchContext framework from DAGCombiner into an
individual header file so that the framework can be used from other files in
llvm/lib/CodeGen/SelectionDAG/.
This matches how LRINT/LLRINT is queried for scalar types in
LegalizeDAG.
It's confusing if they do different things since a "Legal" vector
LRINT/LLRINT would get through to LegalizeDAG which would then consider
it illegal. This doesn't happen currently because RISC-V uses Custom.
If we have a sext and a zext nneg with the same types and operand,
we should combine them into the sext. We can't go the other way
because the nneg flag may only be valid in the context of the uses
of the zext nneg.
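A hedged sketch of the condition being described (helper name is made up; the
in-tree combine is structured differently):

```cpp
#include "llvm/CodeGen/SelectionDAG.h"
using namespace llvm;

// Given a sign extend SExt and a zero extend ZExt carrying the nneg flag,
// both of the same operand and result type, uses of ZExt can be redirected to
// SExt: for a non-negative value the two extends agree.  The reverse rewrite
// is not safe, because nneg is only known to hold where the zext is used.
static bool canReplaceZExtNNegWithSExt(const SDNode *ZExt, const SDNode *SExt) {
  return ZExt->getOpcode() == ISD::ZERO_EXTEND &&
         ZExt->getFlags().hasNonNeg() &&
         SExt->getOpcode() == ISD::SIGN_EXTEND &&
         ZExt->getOperand(0) == SExt->getOperand(0) &&
         ZExt->getValueType(0) == SExt->getValueType(0);
}
```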
The logic was supposed to be choosing between {0, 1, -1} as an
adjustment to the FP bit pattern. However, the adjustment itself was
used as the bit pattern instead, which resulted in garbage results.
We did something pretty naive:
- round FP64 -> BF16 by first rounding to FP32
- skip FP32 -> BF16 rounding entirely
- take the top 16 bits of an FP32, which will turn some NaNs into
infinities
Let's do this in a more principled way by rounding types with more
precision than FP32 to FP32 using round-inexact-to-odd, which avoids
double rounding issues.
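As a host-code model of just the FP32 -> BF16 half of this (the actual change
operates on DAG nodes and also performs the FP64 -> FP32 step with
round-to-odd; the helper below is an illustrative assumption):

```cpp
#include <cstdint>

// Round an FP32 bit pattern to BF16 with round-to-nearest-even instead of
// truncating the top 16 bits.  Truncation can drop all set mantissa bits of a
// NaN and hand back an infinity; rounding keeps NaNs as (quiet) NaNs.
static uint16_t fp32ToBF16RNE(uint32_t Bits) {
  if ((Bits & 0x7fffffffu) > 0x7f800000u)            // NaN: quiet it, keep sign
    return uint16_t((Bits >> 16) | 0x0040);
  uint32_t RoundBias = 0x7fff + ((Bits >> 16) & 1);  // nearest, ties to even
  return uint16_t((Bits + RoundBias) >> 16);
}
```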
This is another step in the direction of fixing the `Fixed(0) !=
Scalable(0)` bugbear; although it is weird, I don't believe it's causing
us any real issues.
LLVM function calls carry convergence control tokens as operand bundles, where
the tokens themselves are produced by convergence control intrinsics. This patch
implements convergence control tokens in MIR as follows:
1. Introduce target-independent ISD opcodes and MIR opcodes for convergence
control intrinsics.
2. Model token values as untyped virtual registers in MIR.
The change also introduces an additional ISD opcode CONVERGENCECTRL_GLUE and a
corresponding machine opcode with the same spelling. This glues the convergence
control token to SDNodes that represent calls to intrinsics. The glued token is
later translated to an implicit argument in the MIR.
The lowering of calls to user-defined functions is target-specific. On AMDGPU,
the convergence control operand bundle at a non-intrinsic call is translated to
an explicit argument to the SI_CALL_ISEL instruction. Post-selection adjustment
converts this explicit argument to an implicit argument on the SI_CALL
instruction.
Patch 1 of 3 to add llvm.dbg.label support to the RemoveDIs project. The
patch stack adds a new base class
-> 1. Add DbgRecord base class for DPValue and the not-yet-added
DPLabel class.
2. Add the DPLabel class.
3. Enable dbg.label conversion and add support to passes.
Patches 1 and 2 are NFC.
In the near future we also will rename DPValue to DbgVariableRecord and
DPLabel to DbgLabelRecord, at which point we'll overhaul the function
names too. The name DPLabel keeps things consistent for now.
This treats the zext nneg as a sext if X is known to have sufficient sign
bits to allow the zext, the truncate, or both to be removed. This code is
taken from the same optimization for sext.
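A rough sketch of the fold being described, mirroring the existing sext logic
(function name and structure are assumptions):

```cpp
#include "llvm/CodeGen/SelectionDAG.h"
using namespace llvm;

// For (zext nneg (truncate X)) to VT: if X has enough sign bits that the
// truncate only drops copies of the sign bit, the whole expression is just X
// converted to VT with a sext or truncate.
static SDValue foldZExtNNegOfTruncate(SDNode *N, SelectionDAG &DAG) {
  SDValue N0 = N->getOperand(0);
  EVT VT = N->getValueType(0);
  if (N->getOpcode() != ISD::ZERO_EXTEND || !N->getFlags().hasNonNeg() ||
      N0.getOpcode() != ISD::TRUNCATE)
    return SDValue();
  SDValue X = N0.getOperand(0);
  unsigned TruncBits = N0.getScalarValueSizeInBits();
  unsigned XBits = X.getScalarValueSizeInBits();
  if (DAG.ComputeNumSignBits(X) > XBits - TruncBits)
    return DAG.getSExtOrTrunc(X, SDLoc(N), VT);
  return SDValue();
}
```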
sext_inreg is our canonical form of the shift pair before op legalization, so
the DAG combiner will probably create it anyway. If it isn't legal,
LegalizeDAG will expand it to shifts later.
We already have the PtrOff factored into the MachinePointerInfo. Any calls
to getAlign on the new load will do commonAlignment with the
MachinePointerInfo offset and the base alignment.
The getAlign function for a load returns the commonAlignment of the
"base align" and the offset stored in the MachinePointerInfo.
We're splitting a load here, so we should take the base alignment from
the original load without any offset that may already exist in the
original load. The new load can then maintain its own alignment using
just the base alignment and its own offset.
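A hedged sketch of the intended pattern when building the split half (names
and the exact getLoad overload are approximate):

```cpp
#include "llvm/CodeGen/SelectionDAG.h"
using namespace llvm;

// Use the original load's *base* alignment (getOriginalAlign), not getAlign(),
// which already has the old MachinePointerInfo offset folded in via
// commonAlignment; the new load's own offset is folded in when queried.
static SDValue makeSplitLoad(SelectionDAG &DAG, LoadSDNode *LD, EVT NewVT,
                             SDValue NewPtr, int64_t Offset, const SDLoc &DL) {
  MachinePointerInfo NewPtrInfo = LD->getPointerInfo().getWithOffset(Offset);
  return DAG.getLoad(NewVT, DL, LD->getChain(), NewPtr, NewPtrInfo,
                     LD->getOriginalAlign(),
                     LD->getMemOperand()->getFlags());
}
```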
Noticed by inspection.
The `DemoteCatchSwitchPHIOnly` option in the `WinEHPrepare` pass was added in
99d60e0dab,
because Wasm EH uses `WinEHPrepare` but doesn't need to demote all
PHIs. PHIs in `catchswitch` BBs have to be removed (= demoted) because
`catchswitch`es are removed in ISel and `catchswitch` BBs are removed as
well, so they can't have other instructions.
But because Wasm EH doesn't use funclets, PHIs in `catchpad` or
`cleanuppad` BBs don't need to be demoted. That was the reason the
`DemoteCatchSwitchPHIOnly` option was added: to avoid demoting more
instructions than necessary.
The problem is it should have been set to `true` for Wasm EH. (Its
default value is `false` for WinEH.) I mistakenly set it to `false`
and wasn't aware of this for more than 5 years. This was not the end
of the world; it just means we've been demoting more instructions than
we should, possibly hurting code size. In practice I think it would've
had hardly any effect on real performance, given that PHIs in `catchpad`
or `cleanuppad` BBs are not very frequent and many people run other
optimizers like Binaryen anyway.
Prevents isel errors when trying to lower a gc relocate of an undef value
(which turns into a CopyToReg of a TargetConstant). Such relocates may occur
after DCE (e.g. after GVN removes some dead blocks) if no passes like
instcombine are scheduled afterwards to clean them up.
Fixes #80294
---------
Co-authored-by: Matt Arsenault <arsenm2@gmail.com>
Summary:
This patch adds a new intrinsic and builtin function mirroring the
existing `__builtin_readcyclecounter`. The difference is that this
implementation targets a separate counter that some targets have, which
returns a fixed-frequency clock that can be used to determine elapsed
time. This is different from the cycle counter, which often has a
variable frequency.
This patch only adds support for the NVPTX and AMDGPU targets.
This is done as a new and separate builtin rather than an argument to
`readcyclecounter` to avoid needing to change existing code and to make
the separation more explicit.
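A usage sketch; the builtin name below is an assumption inferred from the
description, by analogy with `__builtin_readcyclecounter`:

```cpp
extern void do_work(); // placeholder workload

void measure() {
  // Assumed builtin name: reads a fixed-frequency counter, so the tick delta
  // can be converted to elapsed time, unlike __builtin_readcyclecounter,
  // whose frequency may vary.
  unsigned long long Start = __builtin_readsteadycounter();
  do_work();
  unsigned long long End = __builtin_readsteadycounter();
  unsigned long long Ticks = End - Start;
  (void)Ticks;
}
```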
The load combine replaces a number of original loads with one new load
and also replaces the output chains of the original loads with the
output chain of the new load. This is incorrect if the original load is
retained (due to multi-use), as it may get incorrectly reordered.
Fix this by using makeEquivalentMemoryOrdering() instead, which will
create a TokenFactor with both chains.
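A sketch of the shape of the fix (variable names assumed):

```cpp
#include "llvm/CodeGen/SelectionDAG.h"
using namespace llvm;

// makeEquivalentMemoryOrdering builds a TokenFactor of the old and new chains
// and redirects users of the old chain to it, so an original load that
// survives (due to other users) cannot be reordered past users of the
// combined load's chain.
static void keepLoadsOrdered(SelectionDAG &DAG, LoadSDNode *OldLoad,
                             SDValue NewLoad) {
  DAG.makeEquivalentMemoryOrdering(OldLoad, NewLoad);
}
```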
Fixes https://github.com/llvm/llvm-project/issues/80911.
There are some rare circumstances where dbg.assign intrinsics can reach
FastISel. They are a more specialised kind of dbg.value intrinsic with
more information about the originating alloca. They only occur during
optimisation, but might reach FastISel through always_inlining an
optimised function into an optnone function.
This is a slight problem as it's not safe (for debug-info accuracy) to
ignore any intrinsics, and for RemoveDIs (the intrinsic-replacement
project) it causes a crash through an unhandled switch case. To get
around this, we can just treat the dbg.assign as a dbg.value (it's an
actual subclass) and use the variable location information from the
dbg.value fields. This loses a small amount of debug-info about stack
locations, but is more accurate than just ignoring the intrinsic.
(This has popped up deep in an LTO build of a large codebase while
testing RemoveDIs; I figured it'd be good to fix it for the
intrinsic form at the same time, just to demonstrate the correct
behaviour).
This handles two cases where we can work out some known-zero bits for
ISD::STEP_VECTOR.
The first case handles when we know the low bits are zero because the step
amount is a power of two. This is taken from
https://reviews.llvm.org/D128159, and even though the original patch didn't
end up landing this case due to it not having any test difference, I've
included it here for completeness's sake.
The second case handles when we have an upper bound on vscale_range. We can
use this to work out the upper bound on the number of elements, and thus
what the maximum step will be. From the maximum step we then know which
high bits are zero.
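A hedged sketch of the two cases as a standalone helper (for illustration
only; the in-tree code lives in the known-bits computation for
ISD::STEP_VECTOR):

```cpp
#include "llvm/ADT/APInt.h"
#include "llvm/Support/KnownBits.h"
using namespace llvm;

// Known bits for (step_vector Step) with element width BitWidth and at most
// MaxNumElts elements (assumes MaxNumElts >= 1; overflow ignored for brevity).
static KnownBits stepVectorKnownBits(const APInt &Step, uint64_t MaxNumElts,
                                     unsigned BitWidth) {
  KnownBits Known(BitWidth);
  // Case 1: a power-of-two step leaves the low log2(Step) bits of every
  // element zero, since each element is Step * Idx.
  if (Step.isPowerOf2())
    Known.Zero.setLowBits(Step.logBase2());
  // Case 2: the largest element is Step * (MaxNumElts - 1); everything above
  // its active bits must be zero.
  APInt MaxElt = Step * APInt(BitWidth, MaxNumElts - 1);
  Known.Zero.setHighBits(BitWidth - MaxElt.getActiveBits());
  return Known;
}
```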
On its own, computing the known high bits results in some small improvements
for RVV with -mrvv-vector-bits=zvl across the llvm-test-suite. However, I'm
hoping to be able to use this later to reduce the LMUL in index calculations
for vrgather/indexed accesses.
---------
Co-authored-by: Philip Reames <preames@rivosinc.com>
Don't call TLI.SimplifyDemandedVectorElts directly from every SimplifyDemandedBits call; use the more expressive wrappers first instead.
This reduces the number of places we call TLI.SimplifyDemandedVectorElts and CommitTargetLoweringOpt to make it easier to track.
Part of the work to process DAG nodes in topological order.
Since we used getNumRegisters right before this, I think this is the
correct interface we should be using here.
I'm experimenting with making i32 legal on RISC-V 64, but using i64 for
the register type between basic blocks. This was one of the first issues
I found trying to do that.
If we have a `SETCC (SETCC), 0, NE` and ZeroOrOneBooleanContent, we can remove
the outer setcc as it will produce the same value as the inner. This can be
generalized to anything where the top bits are known to be 0, as the value will
remain as 1 or 0.
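A hedged sketch of the generalized check (helper name and structure are
assumptions; it ignores result-type bookkeeping and assumes the outer setcc's
type matches X's type):

```cpp
#include "llvm/CodeGen/SelectionDAG.h"
#include "llvm/CodeGen/TargetLowering.h"
using namespace llvm;

// (setcc X, 0, ne) is just X when booleans are 0/1 and X is already known to
// be 0 or 1 (for example because X is itself a setcc).
static SDValue foldSetCCOfBoolean(SDValue X, SDValue RHS, ISD::CondCode CC,
                                  const TargetLowering &TLI,
                                  SelectionDAG &DAG) {
  if (CC == ISD::SETNE && isNullConstant(RHS) &&
      TLI.getBooleanContents(X.getValueType()) ==
          TargetLowering::ZeroOrOneBooleanContent &&
      DAG.MaskedValueIsZero(
          X, APInt::getBitsSetFrom(X.getScalarValueSizeInBits(), 1)))
    return X;
  return SDValue();
}
```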
Fixes https://github.com/llvm/llvm-project/issues/80744. This transform
doesn't handle vectors at all. The fixed-length ones pass the first
check, but would fail the constant operand checks which immediately follow.
This patch takes the simplest approach and just restricts the transform
to scalar integers.
If we have a shifted mask, we may be able to reduce the load width
to the width of the non-zero part of the mask and use an offset
to the base address to remove the srl. The offset is given by
C+trailingzeros(ShiftedMask).
Then we add a final shl to restore the trailing zero bits. For example
(little-endian), (and (srl (load i32 %p), 8), 0xFF00) can become
(shl (zextload i8 from %p+2), 8).
I've used the ARM test because that's where the existing (and (srl
(load))) tests were.
The X86 test was modified to keep the H register.
Prior to this patch, SelectionDAG generated aligned moves onto the stack for
AVX registers when the function was marked as a no-realign-stack
function. This led to a mismatch between the actual stack alignment and the
alignment assumed by the generated instruction. This patch fixes the issue.
Fixes #77730