llvm-project

Author	SHA1	Message	Date
Matt Arsenault	bc7d88faf1	CodeGen: Disable isCopyInstrImpl if there are implicit operands This is a conservative workaround for broken liveness tracking of SUBREG_TO_REG to speculatively fix all targets. The current reported failures are on X86 only, but this issue should appear for all targets that use SUBREG_TO_REG. The next minimally correct refinement would be to disallow only implicit defs. The coalescer now introduces implicit-defs of the super register to track the dependency on other subregisters. If we see such an implicit operand, we cannot simply treat the subregister def as the result operand in case downstream users depend on the implicitly defined parts. Really target implementations should be considering the implicit defs and trying to interpret them appropriately (maybe with some generic helpers). The full implicit def could possibly be reported as the move result, rather than the subregister def but that requires additional work. Hopefully fixes #64060 as well. This needs to be applied to the release branch. https://reviews.llvm.org/D156346	2023-10-02 15:16:40 +03:00
Simon Pilgrim	2984e3529b	[X86] matchIndexRecursively - fold zext(addlike(shl_nuw(x,c1),c2)) patterns into LEA Pulled out of D155472 - handle zero-extended scaled address indices	2023-10-02 12:38:25 +01:00
Simon Pilgrim	2908142089	[X86] Add test coverage for zext(or(shl_nuw(x,c1),c2)) pointer math Additional test coverage for D155472	2023-10-02 12:38:25 +01:00
JP Lehr	e816c89c84	Revert "InlineSpiller: Consider if all subranges are the same when avoiding redundant spills" This reverts commit d8127b2ba8a87a610851b9a462f2fc2526c36e37.	2023-10-02 06:26:33 -05:00
Matt Arsenault	414ff812d6	RegisterCoalescer: Add implicit-def of super register when coalescing SUBREG_TO_REG Currently coalescing with SUBREG_TO_REG introduces an invisible load bearing undef. There is liveness for the super register not represented in the MIR. This is part 1 of a fix for regressions that appeared after b7836d856206ec39509d42529f958c920368166b. The allocator started recognizing undef-def subregister MOVs as copies. Since there was no representation for the dependency on the high bits, different undef segments of the super register ended up disconnected and downstream users ended up observing different undefs than they did previously. This does not yet fix the regression. The isCopyInstr handling needs to start handling implicit-defs on any instruction. I wanted to include an end to end IR test since the actual failure only appeared with an interaction between the coalescer and the allocator. It's a bit bigger than I'd like but I'm having a bit of trouble reducing it to something which definitely shows a diff that's meaningful. The same problem likely exists everywhere trying to do anything with SUBREG_TO_REG. I don't understand how this managed to be broken for so long. This needs to be applied to the release branch. https://reviews.llvm.org/D156345	2023-10-02 13:57:09 +03:00
Matt Arsenault	e28708d4f0	RegisterCoalescer: Avoid redundant implicit-def on rematerialize If this was coalescing a def of a subregister with a def of the super register, it was introducing a redundant super-register def and marking the subregister def as dead. Resulting in something like: dead $eax = MOVr0, implicit-def $rax, implicit-def $rax Avoid this by checking if the new instruction already has the super def, so we end up with this instead: dead $eax = MOVr0, implicit-def $rax The dead flag looks suspicious to me, seems like it's easy to buggily interpret dead def of subreg and a non-dead def of an aliasing register. It seems to be intentional though. https://reviews.llvm.org/D156343	2023-10-02 13:33:52 +03:00
Matt Arsenault	b1295dd5c9	RegisterCoalescer: Handle implicit-def of a super register when rematerializing Permit an implicit-def of a virtual register when rematerializing if it defines a super register of a subregister def. The rematerialization pre-legality check should really have been checking the implicit operands, but that should be fixed separately. https://reviews.llvm.org/D156331	2023-10-02 13:11:22 +03:00
Matt Arsenault	274ba2c910	RegisterCoalescer: Add new rematerializing with subregister tests None of the existing MIR tests seem to be directly targeting this situation.	2023-10-02 12:38:46 +03:00
David Green	aacefaf1cc	[AArch64] Move fcopysign to fcopysign-noneon. NFC	2023-10-02 08:03:34 +01:00
Philip Reames	f0505c3dbe	[RISCV] Form vredsum from explode_vector + scalar (left) reduce (#67821) This change adds two related DAG combines which together will take a left-reduce scalar add tree of an explode_vector, and will incrementally form a vector reduction of the vector prefix. If the entire vector is reduced, the result will be a reduction over the entire vector. Profitability wise, this relies on vredsum being cheaper than a pair of extracts and scalar add. Given vredsum is linear in LMUL, and the vslidedown required for the extract is also linear in LMUL, this is clearly true at higher index values. At N=2, it's a bit questionable, but I think the vredsum form is probably a better canonical form anyways. Note that this only matches left reduces. This happens to be the motivating example I have (from spec2017 x264). This approach could be generalized to handle right reduces without much effort, and could be generalized to handle any reduce whose tree starts with adjacent elements if desired. The approach fails for a reduce such as (A+C)+(B+D) because we can't find a root to start the reduce with without scanning the entire associative add expression. We could maybe explore using masked reduces for the root node, but that seems of questionable profitability. (As in, worth questioning - I haven't explored in any detail.) This is covering up a deficiency in SLP. If SLP encounters the scalar form of reduce_or(A) + reduce_sum(a) where a is some common vectorizable tree, SLP will sometimes fail to revisit one of the reductions after vectorizing the other. Fixing this in SLP is hard, and there's no good reason not to handle the easy cases in the backend. Another option here would be to do this in VectorCombine or generic DAG. I chose not to as the profitability of the non-legal typed prefix cases is very target dependent. I think this makes sense as a starting point, even if we move it elsewhere later.
This is currently restricted only to add reduces, but obviously makes sense for any associative reduction operator. Once this is approved, I plan to extend it in this manner. I'm simply staging work in case we decide to go in another direction.	2023-10-01 17:42:07 -07:00
Simon Pilgrim	632022e61c	[AArch64] aarch64-saturating-arithmetic.ll - refresh test missed in #67890	2023-10-01 15:39:24 +01:00
elhewaty	9103b1d68d	[DAG] Extend the computeOverflowForSignedSub/computeOverflowForUnsignedSub implementations with ConstantRange (#67890) - Add tests for computeOverflowFor*Sub functions - extend the computeOverflowForSignedSub/computeOverflowForUnsignedSub implementations with ConstantRange (#37109)	2023-10-01 14:57:34 +01:00
Simon Pilgrim	04b403d8cc	[X86] combineConcatVectorOps - only concatenate single-use subops We could maybe extend this by allowing the lowest subop to have multiple uses and extract the lowest subvector result of the concatenated op, but let's just get the fix in first. Fixes #67333	2023-10-01 14:27:55 +01:00
Matt Arsenault	d8127b2ba8	InlineSpiller: Consider if all subranges are the same when avoiding redundant spills This avoids some redundant spills of subranges, and avoids a compile failure. This greatly reduces the numbers of spills in a loop. The main range is not informative when multiple instructions are needed to fully define a register. A common scenario is a lowered reg_sequence where every subregister is sequentially defined, but each def changes the main range's value number. If we look at specific lanes at the use index, we can see the value is actually the same. In this testcase, there are a large number of materialized 64-bit constant defs which are hoisted outside of the loop by MachineLICM. These are feeding REG_SEQUENCES, which is not considered rematerializable inside the loop. After coalescing, the split constant defs produce main ranges with an apparent phi def. There's no phi def if you look at each individual subrange, and only half of the register is really redefined to a constant. Fixes: SWDEV-380865 https://reviews.llvm.org/D147079	2023-10-01 11:37:53 +03:00
Matt Arsenault	7252787dd9	RegAllocGreedy: Fix detection of lanes read by a bundle SplitKit creates questionably formed bundles of copies when it needs to copy a subset of live lanes and can't do it with a single subregister index. These are merely marked as part of a bundle, and don't start with a BUNDLE instruction. Queries for the slot index would give the first copy in the bundle, and we need to inspect the operands of all the other bundled copies. Also fix and simplify detection of read lane subsets. This causes some RISCV test regressions, but these look like accidentally beneficial splits. I don't see a subrange based reason to perform these splits. Avoids some really ugly regressions in a future patch. https://reviews.llvm.org/D146859	2023-10-01 11:37:48 +03:00
Christian Sigg	5b7a7ec5a2	[NVPTX] Fix code generation for `trap-unreachable`. (#67478) https://reviews.llvm.org/D152789 added an `exit` op before each `unreachable`. This means we never get to the `trap` instruction. This change limits the insertion of `exit` instructions to the cases where `unreachable` is not lowered to `trap`. Trap itself is changed to be emitted as `trap; exit;` to convey to `ptxas` that it exits the CFG.	2023-10-01 07:59:24 +02:00
Craig Topper	e39727d41f	[RISCV][GISel] Legalize G_SADDO/G_SSUBO/G_UADDO/G_USUBO. (#67615)	2023-09-30 11:15:05 -07:00
David Green	f71ad19c04	[AArch64] Add a target feature for AArch64StorePairSuppress The AArch64StorePairSuppress pass prevents the creation of STP under some heuristics. Unfortunately it often prevents the creation of STP in cases where it is obviously beneficial, and it doesn't match my understanding of scheduling/cpu pipelining to prevent the creation of STP. From some benchmarking, even on an in-order cpu where the scheduling is most important I don't see it giving better results. In general the lower instruction count for STP would be expected to give a slightly better cycle count. As the pass specifically mentions the cyclone cpu, this patch adds a target feature for FeatureStorePairSuppress, enabled for all the non-Arm cpus. This has the effect of disabling it for all Arm cpus. Differential Revision: https://reviews.llvm.org/D134646	2023-09-30 11:40:26 +01:00
Mircea Trofin	b6e568da66	[mlgo] fix test post #67826	2023-09-29 18:24:35 -07:00
Mircea Trofin	f179486204	[AsmPrint] Correctly factor function entry count when dumping MBB frequencies (#67826) The goal in #66818 was to capture function entry counts, but those are not the same as the frequency of the entry (machine) basic block. This fixes that, and adds explicit profiles to the test. We also increase the precision of `MachineBlockFrequencyInfo::getBlockFreqRelativeToEntryBlock` to double. Existing code uses it as float so should be unaffected.	2023-09-29 18:06:53 -07:00
Arthur Eubanks	b915f60678	[CodeGen] Don't treat thread local globals as large data (#67764) Otherwise they may mistakenly get the large section flag.	2023-09-29 12:56:53 -07:00
Visoiu Mistrih Francis	cc9ba5600e	[test] -march -> -mtriple (#67741) Similar to 806761a	2023-09-29 10:43:23 -07:00
Fangrui Song	d20190e684	[test] Change llc -march=aarch64|arm64 to -mtriple=aarch64|arm64 Similar to commit 806761a7629df268c8aed49657aeccffa6bca449 to avoid issues due to object file format differences. These tests are currently benign.	2023-09-29 10:13:06 -07:00
David Green	1610311a95	[AArch64] Fixes for BigEndian 128bit volatile, atomic and non-temporal loads/stores This fixes up the generation of 128bit atomic, volatile and non-temporal loads/stores, under the assumption that they should usually be the same as standard versions. https://godbolt.org/z/xxc89eMKE Fixes #64580 Closes #67413	2023-09-29 17:21:19 +01:00
Jay Foad	6e3d2a4b38	[ISel] Fix another crash in new FMA DAG combine (#67818) Following on from D135150, this patch fixes another crash caused by this DAG combine: fadd (fma A, B, (fmul C, D)), E --> fma A, B, (fma C, D, E) The combine calls ReplaceAllUsesOfValueWith to replace (fmul C, D) with (fma C, D, E). This can cause nodes to get CSEd. In D135150 the problem was that the (fma C, D, E) node got CSEd away. In this new case, the problem is that the outer fadd node gets CSEd away. To fix it we have to return SDValue(N, 0) from the combine and be careful not to add a deleted node to the worklist.	2023-09-29 17:18:23 +01:00
Matthew Devereau	6f5b372d59	[AArch64][SME2][SVE2p1] Add PNR_3b regclass (#67785) This patch adds the PNR_3b regclass for predicate-as-counter registers 0-7 and allows the Upl ASM constraint to use this register class.	2023-09-29 16:17:31 +01:00
Philip Reames	cd03d97043	[RISCV] Add test coverage for sum reduction recognition in DAG And adjust an existing test to not be a simple reduction to preserve test intent.	2023-09-29 07:54:55 -07:00
Nikita Popov	4251aa7a6f	[IRBuilder] Migrate most casts to folding API Migrate creation of most casts to use the FoldXYZ rather than CreateXYZ style APIs. This means that InstSimplifyFolder now works for these, which is what accounts for the AMDGPU test changes.	2023-09-29 12:40:38 +02:00
Mirko Brkušanin	2cd2445c21	[AMDGPU] Src1 of VOP3 DPP instructions can be SGPR on supported subtargets (#67461) In order to avoid duplicating every dpp pseudo opcode that has src1, we allow it for all opcodes and add manual checks on subtargets that do not support it.	2023-09-29 11:54:49 +02:00
Matthew Devereau	0d328e3875	[AArch64][SME] Use PNR Reg classes for predicate constraint (#67606) This patch fixes an error where ASM with constraints cannot select SME instructions which use the top eight predicate-as-counter registers.	2023-09-29 10:33:25 +01:00
Simon Pilgrim	5d7672b98e	[X86] combine-subo.ll - add common CHECK prefix	2023-09-29 10:31:38 +01:00
Simon Pilgrim	956ae7cf8d	[X86] combine-addo.ll - add common CHECK prefix	2023-09-29 10:31:38 +01:00
Momchil Velikov	b454b04d68	[AArch64] Fix a compiler crash in MachineSink (#67705) There were a couple of issues with maintaining register def/uses held in `MachineRegisterInfo`: * when an operand is changed from one register to another, the corresponding instruction must already be inserted into the function, or MRI won't be updated * when traversing the set of all uses of a register, that set must not change	2023-09-29 09:29:20 +01:00
David Green	7cc83c5a18	[AArch64] Don't expand RSHRN intrinsics to add+srl+trunc. We expand aarch64_neon_rshrn intrinsics to trunc(srl(add)), having tablegen patterns to combine the results back into rshrn. See D140297. Unfortunately, but perhaps not surprisingly, other combines can happen that prevent us converting back. For example sext(rshrn) becomes sext(trunc(srl(add))) which will turn into sext_inreg(srl(add)). This patch just prevents the expansion of rshrn intrinsics, reinstating the old tablegen patterns for selecting them. This should allow us to still recognize the rshrn instructions from trunc+shift+add, without performing any negative optimizations for the intrinsics. Closes #67451	2023-09-29 08:26:32 +01:00
Jakub Chlanda	3f8d4a8ef2	Reland [NVPTX] Add support for maxclusterrank in launch_bounds (#66496) (#67667) This reverts commit 0afbcb20fd908f8bf9073697423da097be7db592.	2023-09-29 08:39:31 +02:00
Yashwant Singh	7ac532efc8	[AMDGPU] Introduce AMDGPU::SGPR_SPILL asm comment flag (#67091) Use this flag to give more context to implicit def comments in assembly. Reviewed on phabricator: https://reviews.llvm.org/D153754	2023-09-29 11:15:01 +05:30
Tobias Stadler	305fbc1b32	Revert "[GlobalISel] LegalizationArtifactCombiner: Elide redundant G_AND" This reverts commit 3686a0b611c65f0d7190345b8e3e73cdca9fa657. This seems to have broken some sanitizer tests: https://lab.llvm.org/buildbot/#/builders/184/builds/7721	2023-09-29 03:35:40 +02:00
Tobias Stadler	3686a0b611	[GlobalISel] LegalizationArtifactCombiner: Elide redundant G_AND The legalizer currently generates lots of G_AND artifacts. For example between boolean uses and defs there is always a G_AND with a mask of 1, but when the target uses ZeroOrOneBooleanContents, this is unnecessary. Currently these artifacts have to be removed using post-legalize combines. Omitting these artifacts at their source in the artifact combiner has a few advantages: - We know that the emitted G_AND is very likely to be useless, so our KnownBits call is likely worth it. - The G_AND and G_CONSTANT can interrupt e.g. G_UADDE/... sequences generated during legalization of wide adds which makes it harder to detect these sequences in the instruction selector (e.g. useful to prevent unnecessary reloading of AArch64 NZCV register). - This cleans up a lot of legalizer output and even improves compilation-times. AArch64 CTMark geomean: `O0` -5.6% size..text; `O0` and `O3` ~-0.9% compilation-time (instruction count). Since this introduces KnownBits into code-paths used by `O0`, I reduced the default recursion depth. This doesn't seem to make a difference in CTMark, but should prevent excessive recursive calls in the worst case. Reviewed By: aemerson Differential Revision: https://reviews.llvm.org/D159140	2023-09-29 02:11:57 +02:00
Jay Foad	c3939eb827	[AMDGPU] Fix typo in scheduler option name (#67661) Fix: -amdgpu-disable-unclustred-high-rp-reschedule Now: -amdgpu-disable-unclustered-high-rp-reschedule	2023-09-28 20:54:57 +01:00
Noah Goldstein	de7881ebf5	[DAGCombiner] Combine `(select c, (and X, 1), 0)` -> `(and (zext c), X)` The middle end canonicalizes: `(and (zext c), X)` -> `(select c, (and X, 1), 0)` But the `and` + `zext` form gets better codegen.	2023-09-28 13:46:46 -05:00
Noah Goldstein	e3e9c94006	[X86][AArch64][RISCV] Add tests for combining `(select c, (and X, 1), 0)` -> `(and (zext c), X)`; NFC	2023-09-28 13:46:46 -05:00
Hiroshi Yamauchi	0ecd8846ae	[AArch64][Win] Emit SEH instructions for the swift async context-related instructions in the prologue and the epilogue. (#66967) This fixes an error from checkARM64Instructions() in MCWin64EH.cpp.	2023-09-28 09:43:39 -07:00
Jay Foad	fb32baf0ec	[ARM] Make some test checks more robust This makes some tests robust against minor codegen differences that will be caused by PR #67038.	2023-09-28 14:26:13 +01:00
Tuan Chuong Goh	c381cea873	[AArch64] Fixup test for G_VECREDUCE_ADD Fix test since the review was created	2023-09-28 12:52:17 +00:00
Jay Foad	01aa0c776d	[SPARC] Add a missing SPARC64-LABEL check	2023-09-28 13:15:09 +01:00
Jay Foad	a0a06b1804	[AMDGPU] Make a check slightly more robust Previously this was relying on [[RESULT]] having been defined in an earlier function.	2023-09-28 13:09:51 +01:00
chuongg3	140a094f5f	[AArch64][GlobalISel] More type support for G_VECREDUCE_ADD (#67433) G_VECREDUCE_ADD is now able to have v4i16 and v8i8 vector types as source registers	2023-09-28 11:47:26 +01:00
Luke Lau	b14f6eebc9	[RISCV] Fix crash when lowering fixed length insert_subvector into undef at 0 (#67535) This fixes a crash seen in https://github.com/openxla/iree/issues/15038 and elsewhere. We were reducing the LMUL for inserts into undef at 0 without inserting it back into the original LMUL at the end. But we don't actually perform the slidedown in this path, so we can just skip reducing LMUL here.	2023-09-28 10:22:16 +01:00
Kishan Parmar	696ea67f19	Disable call to fma for soft-float The PowerPC backend generates calls to libc functions for soft-float, regardless of the -nostdlib/-ffreestanding flag. fma is not a function provided by compiler-rt builtins and thus should not be generated here. PR: https://github.com/llvm/llvm-project/issues/55230 (#55230) Below is the patch given by @nemanjai Reviewed By: jhibbits Differential Revision: https://reviews.llvm.org/D156344	2023-09-28 14:06:54 +05:30
Qiu Chaofan	cc627828f5	Pre-commit some PowerPC test cases	2023-09-28 15:51:14 +08:00


50250 Commits