llvm-project

Author	SHA1	Message	Date
Christian Kissig	730df5a437	[Support] Add KnownBits::computeForSubBorrow (#67788) - [Support] Add KnownBits::computeForSubBorrow - [CodeGen] Implement USUBC, USUBO_CARRY, and SSUBO_CARRY with KnownBits::computeForSubBorrow - [CodeGen] Compute unknown bits for Carry/Borrow for ADD/SUB - [CodeGen] Compute known bits of Carry/Borrow for UADDO, SADDO, USUBO, and SSUBO. Fixes #65893. Co-authored-by: Shafik Yaghmour <shafik@users.noreply.github.com>	2023-10-18 13:48:47 +01:00
Paul Walker	675231eb09	[SVE ACLE] Allow default zero initialisation for svcount_t. (#69321) This matches the behaviour of the other SVE ACLE types.	2023-10-18 10:40:07 +01:00
Noah Goldstein	112e49b381	[DAGCombiner] Transform `(icmp eq/ne (and X,C0),(shift X,C1))` to use rotate or to get better constants. If `C0` is a mask and `C1` shifts out all the masked bits (to essentially compare two subsets of `X`), we can arbitrarily re-order shift as `srl` or `shl`. If `C1` (shift amount) is a power of 2, we can replace the and+shift with a rotate. Otherwise, based on target preference we can arbitrarily swap `shl` and `srl` in/out to get better constants. On x86 we can use this re-ordering to: 1) get better `and` constants for `C0` (zero extended moves or avoid imm64). 2) convert `srl` to `shl` if `shl` will be implementable with `lea` or `add` (both of which can be preferable). Proofs: https://alive2.llvm.org/ce/z/qzGM_w Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D152116	2023-10-18 01:16:55 -05:00
Pierre van Houtryve	c464fea779	[DAG] Constant fold FMAD (#69324) This has very little effect on codegen in practice, but is a nice-to-have, I think. See #68315	2023-10-18 07:46:24 +02:00
Simon Pilgrim	2a40ec2d3e	[DAG] SimplifyDemandedBits - fix isOperationLegal typo in D146121. We need to check that the simplified ISD::SRL node is legal, not the old one. Noticed while trying to isolate the regressions in D155472	2023-10-17 17:50:12 +01:00
Guozhi Wei	760e7d00d1	[X86, Peephole] Enable FoldImmediate for X86 Enable FoldImmediate for X86 by implementing X86InstrInfo::FoldImmediate. Also enhanced peephole by deleting identical instructions after FoldImmediate. Differential Revision: https://reviews.llvm.org/D151848	2023-10-17 16:22:42 +00:00
Simon Pilgrim	2f329d88bc	[DAG] foldConstantFPMath - accept ArrayRef<SDValue> Ops instead of explicit N1/N2 ops First step towards adding unary/ternary fp ops handling, and not just binops	2023-10-17 16:31:46 +01:00
Arthur Eubanks	5fab20bc7e	[NFC] Move StableHashing.h from CodeGen to ADT (#67704)	2023-10-16 10:42:22 -07:00
Kazu Hirata	0b570ad969	[CodeGen] Remove LiveVariables::{isPHIJoin,setPHIJoin} (#69128) The last use of isPHIJoin was removed by: commit fac770b865f59cbe615241dad153ad20d5138b9e Author: Jakob Stoklund Olesen <stoklund@2pi.dk> Date: Sat Feb 9 00:04:07 2013 +0000 so there is no reason to maintain PHIJoins.	2023-10-16 09:31:09 -07:00
Björn Pettersson	4acb96c99f	[SelectionDAG] Tidy up around endianness and isConstantSplat (#68212) The BuildVectorSDNode::isConstantSplat function could depend on endianness, and it takes a bool argument that can be used to indicate if big or little endian should be considered when internally casting from a vector to a scalar. However, that argument defaults to false (= little endian). And in many situations, even in target generic code such as DAGCombiner, the endianness isn't specified when using the function. The intent with this patch is to highlight that endianness doesn't matter, depending on the context in which the function is used. In DAGCombiner the code is slightly refactored. Back when the code was written it wasn't possible to request a MinSplatBits size when calling isConstantSplat. Instead the code re-expanded the found SplatValue to match with the EltBitWidth. Now we can just provide EltBitWidth as MinSplatBits and remove the logic for doing the re-expand. While tidying up around isConstantSplat, this patch also adds an explicit check in BuildVectorSDNode::isConstantSplat to break out from the loop if trying to split an odd VecWidth into two halves. I haven't been able to prove that there could be miscompiles involved if not doing so. There are lit tests that trigger that scenario, although I think they happen to later discard the returned SplatValue for other reasons.	2023-10-16 14:53:53 +02:00
Nikita Popov	d4300154b6	Revert "[ValueTracking] Remove by-ref computeKnownBits() overloads (NFC)" This reverts commit b5743d4798b250506965e07ebab806a3c2d767cc. This causes some minor compile-time impact. Revert for now, better to do the change more gradually.	2023-10-16 14:04:09 +02:00
Nikita Popov	b5743d4798	[ValueTracking] Remove by-ref computeKnownBits() overloads (NFC) Remove the old overloads that accept KnownBits by reference, in favor of those that return it by value.	2023-10-16 13:00:31 +02:00
Carl Ritson	e1bb0598b2	[MachineBasicBlock] Fix use after free in SplitCriticalEdge (#68786) Remove use after free when attempting to update SlotIndexes in MachineBasicBlock::SplitCriticalEdge. Use MachineFunction delegate mechanism to capture target specific manipulations of branch instructions and update SlotIndexes.	2023-10-15 17:32:27 +09:00
Markus Böck	0ad92c0cbb	[StatepointLowering] Take return attributes of `gc.result` into account (#68439) The current lowering of statepoints does not take into account return attributes present on the `gc.result`, leading to different code being generated than if one were to not use statepoints. These return attributes can affect the ABI, which is why it is important that they are applied in the lowering.	2023-10-14 18:38:18 +02:00
Craig Topper	3750558ee1	[RISCV][GISel] Legalize G_SMULO/G_UMULO (#67635) Update `LegalizerHelper::widenScalarMulo` to not create a mulo if we aren't going to use the overflow flag. This prevents needing to legalize the widened operation. This generates better code when we need to make a libcall for multiply.	2023-10-13 20:34:45 -07:00
Kazu Hirata	6e8013a130	[llvm] Stop including llvm/ADT/StringMap.h (NFC) These source files do not use StringMap.	2023-10-13 20:09:33 -07:00
Yingwei Zheng	53c81a8c16	[RISCV][SDAG] Fix constant narrowing when narrowing loads (#69015) When narrowing logic ops (OR/XOR) with a constant rhs, `DAGCombiner` will fix up the constant rhs node. This is incorrect when the lhs is also a constant. For example, we will incorrectly replace `xor OpaqueConstant:i64<8191>, Constant:i64<-1>` with `xor (and OpaqueConstant:i64<8191>, Constant:i64<65535>), Constant:i64<-1>`. Fixes #68855.	2023-10-14 06:38:17 +08:00
Maurice Heumann	187e02fa2d	[CodeGenPrepare] Check types when unmerging GEPs across indirect branches (#68587) The optimization in CodeGenPrepare, where GEPs are unmerged across indirect branches, must respect the types of both GEPs and their sizes when adjusting the indices. The sample here shows the bug: https://godbolt.org/z/8e9o5sYPP The value `%elementValuePtr` addresses the second field of the `%struct.Blub`. It is therefore a GEP with index 1 and type i8. The value `%nextArrayElement` addresses the next array element. It is therefore a GEP with index 1 and type `%struct.Blub`. Both values point to completely different addresses, even if the indices are the same, due to the types being different. However, after CodeGenPrepare has run, `%nextArrayElement` is a bitcast from `%elementValuePtr`, meaning both were treated as equal. The cause for this is that the unmerging optimization does not take types into consideration. It sees both GEPs have `%currentArrayElement` as source operand and therefore tries to rewrite `%nextArrayElement` in terms of `%elementValuePtr`. It changes the index to the difference of the two GEPs. As both indices are `1`, the difference is `0`. As the index is `0`, the GEP is later replaced with a simple bitcast in CodeGenPrepare. Before adjusting the indices, the types of the GEPs would have to be aligned and the indices scaled accordingly for the optimization to be correct. Due to the size of the struct being `16` and the `%elementValuePtr` pointing to offset `1`, the correct index for the unmerged `%nextArrayElement` would be 15. I assume this bug emerged from the opaque pointer change, as GEPs like `%elementValuePtr` that access the struct field based on type i8 did not naturally occur before. In light of future migration to ptradd, simply not performing the optimization if the types mismatch should be sufficient.	2023-10-13 09:47:47 +02:00
Momchil Velikov	2ceabf6bdc	[MachineSink] Reduce the number of unnecessary invalidations of StoreInstrCache (NFC) (#68676) Don't invalidate the cache when erasing instructions which cannot ever appear in the cache.	2023-10-12 10:06:19 +01:00
Momchil Velikov	86d9faa5a9	[MachineSink] Use LLVM ADTs (NFC) (#68677) Replace a few uses of `std::map` with `llvm::DenseMap`.	2023-10-12 10:04:41 +01:00
Rahman Lavaee	28b9126879	[BasicBlockSections] Introduce the path cloning profile format to BasicBlockSectionsProfileReader. (#67214) Following up on prior RFC (https://lists.llvm.org/pipermail/llvm-dev/2020-September/145357.html) we can now improve upon our highly-optimized basic-block-sections binary (e.g., 2% for clang) by applying path cloning. Cloning can improve performance by reducing taken branches. This patch prepares the profile format for applying cloning actions. The basic block cloning profile format extends the basic block sections profile in two ways. 1. Specifies the cloning paths with a 'p' specifier. For example, `p 1 4 5` specifies that blocks with BB ids 4 and 5 must be cloned along the edge 1 --> 4. 2. For each cloned block, it will appear in the cluster info as `<bb_id>.<clone_id>` where `clone_id` is the id associated with this clone. For example, the following profile specifies one cloned block (2) and determines its cluster position as well: `f foo`, `p 1 2`, `c 0 1 2.1 3 2 5` (one directive per line). This patch keeps backward-compatibility (retains the behavior for old profile formats). This feature is only introduced for profile version >= 1.	2023-10-11 22:47:13 -07:00
weiguozhi	b6043f9867	[RA] Disable split around hint register if optimize for size (#68619) Splitting a virtual register with a hint may generate COPY instructions in multiple cold basic blocks and increase code size. So disable this split when the function is optimized for size.	2023-10-11 14:57:15 -07:00
Jay Foad	7ddf6e915c	[SlotIndexes] Use upper/lower bound terminology for MBB searches. NFC. (#68802) Rename advanceMBBIndex and findMBBIndex to getMBBLowerBound and add getMBBUpperBound. The motivations are: - Make it clear what kind of search is being done, using names inspired by std::upper/lower_bound. - Simplify getMBBFromIndex which really wants an upper bound search and previously had to work hard to get the result it wanted from a lower bound search.	2023-10-11 16:37:47 +01:00
chuongg3	d88d9834e9	[AArch64][GlobalISel] Support more types for TRUNC (#66927) G_TRUNC will get lowered into trunc(merge(trunc(unmerge), trunc(unmerge))) if the source is larger than 128 bits or the truncation is more than half of the current bit size. Now mirrors ZEXT/SEXT code more closely for vector types.	2023-10-11 16:05:25 +01:00
Jay Foad	fac4206e66	[EarlyIfConversion] Simplify condition after #65729	2023-10-11 10:53:12 +01:00
Jay Foad	05c16f40c9	[VirtRegMap] Simplify condition after #65729	2023-10-11 10:33:52 +01:00
Jay Foad	b78f3ea7df	Clean up strange uses of getAnalysisIfAvailable (#65729) After a pass calls addRequired<X>() it is strange to call getAnalysisIfAvailable<X>() because analysis X should always be available. Use getAnalysis<X>() instead.	2023-10-11 09:53:00 +01:00
Fangrui Song	2d854dd3e7	Move global namespace cl::opt inside llvm:: or internalize them	2023-10-10 19:58:03 -07:00
Serge Pavlov	462d5830da	[GlobalISel] Add support for *_fpmode intrinsics The change implements support of the intrinsics `get_fpmode`, `set_fpmode` and `reset_fpmode` in Global Instruction Selector. Now they are lowered into library function calls. Differential Revision: https://reviews.llvm.org/D158260	2023-10-09 21:14:07 +07:00
Hendrik Greving	2600aaab21	Revert "[MachineLICM] Relax overly conservative PHI check (#67186)" (#68580) This reverts commit 71a8d2e3064fcb3ff76565e6e8529613f90aa51b.	2023-10-09 05:26:58 -07:00
LiqinWeng	111c7c1d07	[VP] IR expansion for bitreverse/bswap (#68504)	2023-10-09 19:59:52 +08:00
Hendrik Greving	71a8d2e306	[MachineLICM] Relax overly conservative PHI check (#67186) Skip LICM if PHI belongs to the current loop, e.g. is in the loop's header. This prevents LICM from bailing for CFGs like: L1: R = LoopInvariant // can be LICM'd BR L1 L2: PHI(R, ..) BR L2	2023-10-09 04:49:11 -07:00
Jay Foad	7b3bbd83c0	Revert "[CodeGen] Really renumber slot indexes before register allocation (#67038)" This reverts commit 2501ae58e3bb9a70d279a56d7b3a0ed70a8a852c. Reverted due to various buildbot failures.	2023-10-09 12:31:32 +01:00
Jay Foad	2501ae58e3	[CodeGen] Really renumber slot indexes before register allocation (#67038) PR #66334 tried to renumber slot indexes before register allocation, but the numbering was still affected by list entries for instructions which had been erased. Fix this to make the register allocator's live range length heuristics even less dependent on the history of how instructions have been added to and removed from SlotIndexes's maps.	2023-10-09 11:44:41 +01:00
Jie Fu	573a083c1c	[DAG] Remove unused variable 'VT' in DAGCombiner.cpp (NFC) /llvm-project/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp:26896:7: error: unused variable 'VT' [-Werror,-Wunused-variable] EVT VT = N->getValueType(0); ^ 1 error generated.	2023-10-09 18:30:38 +08:00
Simon Pilgrim	072675f14e	[DAG] foldSelectOfBinops - correctly handle select of binops where ResNo != 0 Correctly handle cases where the select(cond, binop(x, y), binop(z, y)) --> binop(select(cond, x, z), y) fold is selecting ResNo != 0 results (UADDO flags etc.) Fixes #68539	2023-10-09 11:08:55 +01:00
Kazu Hirata	d7b18d5083	Use llvm::endianness{,::little,::native} (NFC) Now that llvm::support::endianness has been renamed to llvm::endianness, we can use the shorter form. This patch replaces llvm::support::endianness with llvm::endianness.	2023-10-09 00:54:47 -07:00
LiqinWeng	32f7197765	[VP] Use the interface of 'getFunctionalIntrinsicID' to get the non-predicated Intrinsic ID (#68508)	2023-10-08 18:14:48 +08:00
Amara Emerson	7510f32f90	[MachineSink] Fix crash due to use-after-free in a MachineInstr* cache. After the SinkAndFold optimization was enabled, we saw some crashes with GISel due to SinkAndFold erasing an MI while a reference was being held in a cache.	2023-10-06 15:02:39 -07:00
Kazu Hirata	e9fa18878c	[SelectionDAG] Fix an unused variable warning This patch fixes: llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp:10832:12: error: variable 'Changed' set but not used [-Werror,-Wunused-but-set-variable]	2023-10-06 09:27:35 -07:00
Ben Mudd	6d6b395b53	[DebugInfo][SelectionDAG] Add debug info salvaging for TRUNC nodes This patch adds support for salvaging TRUNC nodes during SelectionDAG, fixing LLVM issue #63076: https://github.com/llvm/llvm-project/issues/63076 Reviewed in: https://github.com/llvm/llvm-project/pull/66922	2023-10-06 16:10:33 +01:00
Petar Avramovic	2fa7d652d0	AMDGPU: Fix temporal divergence introduced by machine-sink (#67456) Temporal divergence that was present in input or introduced in IR transforms, like code-sinking or LICM, is handled in SIFixSGPRCopies by changing sgpr source instr to vgpr instr. After 5b657f5, which moved LICM after AMDGPUCodeGenPrepare, machine-sinking can introduce temporal divergence by sinking instructions outside of the cycle. Add isSafeToSink callback in TargetInstrInfo.	2023-10-06 15:00:08 +02:00
Petar Avramovic	ccf68ab432	Revert "MachineSink: Fix sinking VGPR def out of a divergent loop" This reverts commit 3f8ef57bede94445b1a1042c987cc914a886e7ff.	2023-10-06 15:00:08 +02:00
Matthias Braun	2e26d09106	BlockFrequencyInfo: Add PrintBlockFreq helper (#67512) - Refactor the (Machine)BlockFrequencyInfo::printBlockFreq functions into a `PrintBlockFreq()` function returning a `Printable` object. This simplifies usage as it can be directly piped to a `raw_ostream` like `dbgs() << PrintBlockFreq(MBFI, Freq) << '\n';`. - Previously there was an interesting behavior where `BlockFrequencyInfoImpl` stores frequencies both as a `Scaled64` number and as a `uint64_t`. Most algorithms use the `BlockFrequency` abstraction with the integers, but the print function for basic blocks printed the `Scaled64` number, potentially showing higher accuracy than was used by the algorithm. This changes things to only print `BlockFrequency` values. - Replace some instances of `dbgs() << Freq.getFrequency()` with the new function.	2023-10-05 18:26:50 -07:00
Matt Arsenault	5e15997291	MachineFunctionPass: Clear properties before running function (#67962) This ensures !isSSA checks in the function work if the input MIR happened to appear as SSA.	2023-10-05 15:11:47 -07:00
Nico Weber	f320065aeb	Revert "[LLVM][DWARF] Add support for monolithic types in .debug_names (#68131)" This reverts commit 9bbd2bf654634cd95dd0be7948ec8402c3c76e1e. Accidental commit: https://github.com/llvm/llvm-project/pull/68131#issuecomment-1749430207	2023-10-05 14:47:04 -04:00
Matthias Braun	5181156b37	Use BlockFrequency type in more places (NFC) (#68266) The `BlockFrequency` class abstracts `uint64_t` frequency values. Use it more consistently in various APIs and disable implicit conversion to make usage more consistent and explicit. - Use `BlockFrequency Freq` parameter for `setBlockFreq`, `getProfileCountFromFreq` and `setBlockFreqAndScale` functions. - Return `BlockFrequency` in `getEntryFreq()` functions. - While at it, change some `const BlockFrequency& Freq` parameters to plain `BlockFrequency Freq`. - Mark `BlockFrequency(uint64_t)` constructor as explicit. - Add missing `BlockFrequency::operator!=`. - Remove `uint64_t BlockFrequency::getMaxFrequency()`. - Add `BlockFrequency BlockFrequency::max()` function.	2023-10-05 11:40:17 -07:00
Alexander Yermolovich	9bbd2bf654	[LLVM][DWARF] Add support for monolithic types in .debug_names (#68131) Added support for Type Units in monolithic DWARF in .debug_names.	2023-10-05 11:14:18 -07:00
Alexey Bataev	e22818d5c9	[IR] Add NumSrcElts param to is..Mask static function in ShuffleVectorInst. Need to add NumSrcElts param to is..Mask functions in the ShuffleVectorInst class for better mask analysis. Mask.size() does not always match the sizes of the permuted vector(s). This allows better cost estimation in SLP and fixes uses of the functions in other cases. Differential Revision: https://reviews.llvm.org/D158449	2023-10-05 06:17:07 -07:00
Kirill Stoimenov	0a776996af	Revert "[DAG] Attempt shl narrowing in SimplifyDemandedBits" This reverts commit 7a8c04ef84ecdab4390b451d4c2fe17bc45a7b63.	2023-10-04 22:15:41 +00:00

34787 Commits