llvm-project

Author	SHA1	Message	Date
Zhiyao Ma	7e8af2fc0c	[ARM] Support -mexecute-only with -mlong-calls. Instead of using constant pools, use movw movt pair. Differential Revision: https://reviews.llvm.org/D136203	2022-10-24 11:41:24 -07:00
Craig Topper	db25f51e37	Revert "[DAGCombiner] Fold (mul (sra X, BW-1), Y) -> (neg (and (sra X, BW-1), Y))" This reverts commit e8b3ffa532b8ebac5dcdf17bb91b47817382c14d. The AMDGPU/mad_64_32.ll seems to fail on some of the build bots but passes locally. I'm really confused.	2022-10-22 22:50:43 -07:00
Craig Topper	e8b3ffa532	[DAGCombiner] Fold (mul (sra X, BW-1), Y) -> (neg (and (sra X, BW-1), Y)) (sra X, BW-1) is either 0 or -1. So the multiply is a conditional negate of Y. This pattern shows up when type legalizing wide multiplies involving a sign extended value. Fixes PR57549. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D133399	2022-10-22 21:51:45 -07:00
Simon Tatham	526ce9c929	Propagate tied operands when copying a MachineInstr. MachineInstr's copy constructor works by calling the addOperand method to add each operand of the old MachineInstr to the new one, one by one. But addOperand deliberately avoids trying to replicate ties between operands, on the grounds that the tie refers to operands by index, and the indices aren't necessarily finalized yet. This led to a code generation fault when the machine pipeliner cloned an Arm conditional instruction, and lost the tie between the output register and the input value to be used when the condition failed to execute. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D135434	2022-10-13 09:40:35 +01:00
Craig Topper	ac9209751a	Revert "[DAGCombiner] Fold (mul (sra X, BW-1), Y) -> (neg (and (sra X, BW-1), Y))" This reverts commit 0148df8157f05ecf3b1064508e6f012aefb87dad. Getting lit test failures on AMDGPU but I can't reproduce them so far. Reverting to investigate.	2022-10-11 16:30:40 -07:00
Craig Topper	0148df8157	[DAGCombiner] Fold (mul (sra X, BW-1), Y) -> (neg (and (sra X, BW-1), Y)) (sra X, BW-1) is either 0 or -1. So the multiply is a conditional negate of Y. This pattern shows up when type legalizing wide multiplies involving a sign extended value. Fixes PR57549. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D133399	2022-10-11 16:20:55 -07:00
Simon Tatham	0648e42e52	[NFC] Pre-commit tests for D135434. pipeliner-preserve-ties.mir demonstrates a current bug in which the output of the Modulo Software Pipelining pass has left off a tie between operands in the conditional `t2ADDri` instruction. It should look like this: %19:rgpr = t2ADDri %1, 1, 1 /* CC::ne */, $cpsr, $noreg, implicit %1(tied-def 0) in which the final input operand is tied to the output, because that's the input that will become the output value if the conditionalized add instruction does not execute, and hence, must necessarily be whatever was in the output register beforehand. In the input to the pipeliner, those `tied-def` specifications are present and correct. But when the pipeliner clones MachineInstrs, it loses them. pipeliner-inlineasm.mir does not demonstrate any bug: the output is already correct, because of compensation code in the machine pipeliner that applies only to INLINEASM instructions. But no test previously exercised that code, so I add one now before making changes in that area.	2022-10-11 13:27:20 +01:00
Filipp Zhinkin	945a1468c9	[ARM] Support all versions of AND, ORR, EOR and BIC in optimizeCompareInstr Combine cmp with zero and all versions of AND, ORR, EOR and BIC instructions into S-suffixed versions. Related issue: https://github.com/llvm/llvm-project/issues/57122 Reviewed By: efriedma, samtebbs Differential Revision: https://reviews.llvm.org/D131786	2022-10-01 12:41:37 +03:00
Momchil Velikov	6602110152	[ARM] Enable and/cmp0 folding The `CodeGenPrepare` pass can sink bitwise `and` used by compare to zero into the basic blocks where the users are. This operation is guarded by a lowering hook, which is disabled for ARM. In the ARM architecture versions from v7-M up these two operations can be folded into a `tst rN, #imm` instruction. Sinking of `and` can also enable the cmov-to-bfi DAG combiner. This patch fixes some benchmark regressions caused by https://reviews.llvm.org/D129370 as well as scoring slightly better overall. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D134360	2022-09-26 11:31:23 +01:00
Yuta Mukai	116838b151	[MachinePipeliner] Fix the interpretation of the scheduling model The method of counting resource consumption is modified to be based on "Cycles" value when DFA is not used. The calculation of ResMII is modified to total "Cycles" and divide it by the number of units for each resource. Previously, ResMII was excessive because it was assumed that resources were consumed for the cycles of "Latency" value. The method of resource reservation is modified similarly. When a value of "Cycles" is larger than 1, the resource is considered to be consumed by 1 for cycles of its length from the scheduled cycle. To realize this, ResourceManager maintains a resource table for all slots. Previously, resource consumption was always 1 for 1 cycle regardless of the value of "Cycles" or "Latency". In addition, the number of micro operations per cycle is modified to be constrained by "IssueWidth". To disable the constraint, --pipeliner-force-issue-width=100 can be used. For the case of using DFA, the scheduling results are unchanged. Reviewed By: dpenry Differential Revision: https://reviews.llvm.org/D133572	2022-09-16 09:51:48 +09:00
Matt Arsenault	e30271169f	RegAllocGreedy: Try local instruction splitting with subranges This was only trying this to relax register class constraints, but this can also help if there are subranges involved. This solves a compilation failure for AMDGPU when there is high pressure created by large register tuples. If one virtual register is using most of the available budget, we need to be able to evict subranges. This solves the immediate failure, but this solution leaves a lot to be desired. In the relevant testcases, we have 32-element tuples but most of the uses are operations on 1 element subranges of it. What we're now getting is a spill and restore of the full 1024 bits and an extract of the used 32-bits. It would be far better if we introduced a copy to a new virtual register with a smaller register class and used narrower spills. Furthermore, we could probably do a better job if the allocator were to introduce new subranges where none previously existed in the highest pressure scenarios. The block and region splits should also try to split specific subranges out. The mve-vst3.ll test changes look like noise to me, but instruction count increased by one. mve-vst4.ll looks like a solid improvement with several 16-byte spills eliminated. splitkit-copy-live-lanes.mir also shows a solid reduction in total spill count. This could use more tests but it's pretty tiring to come up with cases that fail on this.	2022-09-12 09:03:55 -04:00
David Penry	ced705c440	[ModuloSchedule] Add interface call to accept/reject SMS schedules This interface allows a target to reject a proposed SMS schedule. For Hexagon/PowerPC, all schedules are accepted, leaving behavior unchanged. For ARM, schedules which exceed register pressure limits are rejected. Also, two RegisterPressureTracker methods now need to be public so that register pressure can be computed by more callers. Reapplication of D128941/(reversion:D132037) with small fix. Differential Revision: https://reviews.llvm.org/D132170	2022-08-22 12:10:13 -07:00
David Penry	1c9f0408bc	Revert "[ModuloSchedule] Add interface call to accept/reject SMS schedules" This reverts commit 8c4aea438c310816bb4e4f9a32d783381ef3182e. Needed because buildbot failures (warnings) gave a clue that there was a functional bug in the ARM rejection logic. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D132037	2022-08-17 09:32:43 -07:00
David Penry	8c4aea438c	[ModuloSchedule] Add interface call to accept/reject SMS schedules This interface allows a target to reject a proposed SMS schedule. For Hexagon/PowerPC, all schedules are accepted, leaving behavior unchanged. For ARM, schedules which exceed register pressure limits are rejected. Also, two RegisterPressureTracker methods now need to be public so that register pressure can be computed by more callers. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D128941	2022-08-17 08:13:26 -07:00
Simon Pilgrim	e5e93b6130	[DAG] FoldConstantArithmetic - add initial support for undef elements in bitcasted binop constant folding FoldConstantArithmetic can fold constant vectors hidden behind bitcasts (e.g. vXi64 -> v2Xi32 on 32-bit platforms), but currently bails if either vector contains undef elements. These undefs can often occur due to SimplifyDemandedBits/VectorElts calls recognising that the upper bits are often unnecessary (e.g. funnel-shift/rotate implicit-modulo and AND masks). This patch adds a basic 'FoldValueWithUndef' handler that will attempt to constant fold if one or both of the ops are undef - so far this just handles the AND and MUL cases where we always fold to zero. The RISCV codegen increase is interesting - it looks like the BUILD_VECTOR lowering was loading a constant pool entry but now (with all elements defined constant) it can materialize the constant instead? Differential Revision: https://reviews.llvm.org/D130839	2022-08-08 11:53:56 +01:00
Sanjay Patel	02b3a35892	[InstSimplify] fold FP rounding intrinsic with rounded operand issue #56775 I rearranged the Thumb2 codegen test to avoid simplifying the chain of rounding instructions. I'm assuming the intent of the test is to verify lowering of each of those intrinsics.	2022-07-31 10:00:27 -04:00
Simon Pilgrim	529bd4f352	[DAG] SimplifyDemandedBits - don't early-out for multiple use values SimplifyDemandedBits currently early-outs for multi-use values beyond the root node (just returning the knownbits), which is missing a number of optimizations as there are plenty of cases where we can still simplify when initially demanding all elements/bits. @lenary has confirmed that the test cases in aea-erratum-fix.ll need refactoring and the current increase in codegen is not a major concern. Differential Revision: https://reviews.llvm.org/D129765	2022-07-27 10:54:06 +01:00
David Green	4704da1374	[ARM] Fix Thumb2 compare being emitted in ExpandCMP_SWAP Given a patch like D129506, using instructions not valid for the current target feature set becomes an error. This fixes an issue in ARMExpandPseudo::ExpandCMP_SWAP where Thumb2 compares were used in Thumb1Only code, such as thumbv8m.baseline targets. Differential Revision: https://reviews.llvm.org/D129695	2022-07-20 12:04:22 +01:00
Simon Pilgrim	71c502cbca	[DAG] Call SimplifyDemandedBits from ISD::MUL nodes Noticed while triaging D129765.	2022-07-19 14:11:04 +01:00
David Green	6cb9529001	[ARM] Remove VBICimm if no cleared bits are demanded If none of the bits of a VBICimm are demanded, we can remove the node entirely using the input operand instead. Differential Revision: https://reviews.llvm.org/D129966	2022-07-19 11:53:47 +01:00
David Green	cb806ce2aa	[ARM] Guard VMOVH and VINS patterns. These instructions are only available when fp is available, so cannot be used with just +mve. Add predicates to ensure we fall-back under the right circumstances.	2022-07-17 21:26:49 +01:00
Craig Topper	dcfc1fd26f	[SelectionDAG][RISCV][AMDGPU][ARM] Improve SimplifyDemandedBits for SHL with variable shift amount. If we have a variable shift amount and the demanded mask has leading zeros, we can propagate those leading zeros to not demand those bits from operand 0. This can allow zero_extend/sign_extend to become any_extend. This pattern can occur due to C integer promotion rules. This transform is already done by InstCombineSimplifyDemanded.cpp where sign_extend can be turned into zero_extend for example. Reviewed By: spatel, foad Differential Revision: https://reviews.llvm.org/D121833	2022-07-14 16:10:14 -07:00
David Green	6c46b3d65b	[ARM] Fix subtarget features for Thumb2 tests. NFC These mir tests were using instructions that require feature predicates that were not enabled.	2022-07-13 11:42:21 +01:00
John Brawn	ddd9485129	[MVE] Don't distribute add of vecreduce if it has more than one use If the add has more than one use then applying the transformation won't cause it to be removed, so we can end up applying it again causing an infinite loop. Differential Revision: https://reviews.llvm.org/D129361	2022-07-11 14:13:29 +01:00
David Green	0a11ad2aa8	[ARM] Expand MVE i1 fptoint and inttofp if mve.fp is not present. If MVE.fp is not present then we cannot select the vector i1 fp operations to VCMP instructions, so need to expand.	2022-07-11 13:03:30 +01:00
David Green	28b41237e6	[InterleaveAccessPass] Handle multi-use binop shuffles D89489 added some logic to the interleaved access pass to attempt to undo the folding of shuffles into binops that instcombine performs. If early-cse is run too, the binops may be commoned into a single operation with multiple shuffle uses. It is still profitable to reverse the transform though, so long as all the uses are shuffles. Differential Revision: https://reviews.llvm.org/D129419	2022-07-10 17:24:37 +01:00
David Green	6ce63e267a	[ARM][AArch64] Add additional test for multiuse vldn binop shuffles. NFC For D129419, these are the same as the existing test, but run through -early-cse.	2022-07-09 22:48:12 +01:00
Nikita Popov	07b185ed81	[Thumb2] Remove unneeded IR from MIR test (NFC) Apart from the global, the IR does not appear to be relevant for the test. Drop it, to remove the dependence on the sdiv constant expression.	2022-07-05 18:18:59 +02:00
David Green	979400be78	[ARM] Fix MVE gather/scatter merged gep offsets This fixes the combining of constant vector GEP operands in the optimization of MVE gather/scatter addresses, when opaque pointers are enabled. As opaque pointers reduce the number of bitcasts between geps, more can be folded than before. This can cause problems if the index types are now different between the two geps. This fixes that by making sure each constant is scaled appropriately, which has the effect of transforming the geps to have a scale of 1, changing [r0, q0, uxtw #1] gathers to [r0, q0] with a larger q0. This helps use a simpler instruction that doesn't need the extra uxtw. Differential Revision: https://reviews.llvm.org/D127733	2022-06-22 11:04:22 +01:00
David Green	76f60931e2	[ARM] Allow distributing postinc with PHI uses Although this doesn't usually come up, we can have uses of the BaseAccess of a distributed postinc being a PHI. This doesn't need the usual dominance check as we will dominate along the phi edge, allowing us to still create a postinc load/store. Differential Revision: https://reviews.llvm.org/D127676	2022-06-20 10:08:21 +01:00
David Green	e995e34469	[MachinePipeliner] Handle failing constrainRegClass The included test hits a verifier problem as one of the instructions: ``` %113:tgpreven, %114:tgprodd = MVE_VMLSLDAVas16 %12:tgpreven(tied-def 0), %11:tgprodd(tied-def 1), %7:mqpr, %8:mqpr, 0, $noreg, $noreg ``` has two inputs that come from different PHIs with the same base reg, but conflicting regclasses: ``` %11:tgprodd = PHI %103:gpr, %bb.1, %16:gpr, %bb.2 %12:tgpreven = PHI %103:gpr, %bb.1, %17:gpr, %bb.2 ``` The MachinePipeliner would attempt to use %103 for both the %11 and %12 operands in the prolog, constraining the register class to the common subset of both. Unfortunately there are no registers that are both odd and even, so the second constrainRegClass fails. Fix this situation by inserting a COPY for the second if the call to constrainRegClass fails. The register allocation can then fold that extra copy away. The register allocation of Q regs changed with this test, but the R regs were the same and no new instructions are needed in the final assembly. Differential Revision: https://reviews.llvm.org/D127971	2022-06-19 18:55:19 +01:00
Simon Pilgrim	db1be696c4	[DAG] SimplifyDemandedBits - add ISD::VSELECT handling	2022-06-19 15:18:25 +01:00
David Green	c5990d353e	Revert "[ARM] Add a pipeline test showing missing postinc generation. NFC" This reverts commit d9ef307e9bb3b636a18c4051a236f1aafd7600e6 as it is causing expensive check verification errors. Remove the test again until we can fix them.	2022-06-16 08:23:08 +01:00
David Green	d9ef307e9b	[ARM] Add a pipeline test showing missing postinc generation. NFC	2022-06-16 08:04:50 +01:00
David Green	1da6940275	[ARM] Add more opaque pointer gather/scatter tests. NFC Some of the newly added tests are incorrect, fixed in D127733.	2022-06-14 14:08:43 +01:00
Simon Pilgrim	a71ad6a3c8	[DAG] visitINSERT_VECTOR_ELT - fold insert_vector_elt(scalar_to_vector(x),v,i) -> build_vector() Allow scalar_to_vector nodes to be used for the start of a build_vector creation	2022-06-11 15:29:22 +01:00
David Sherwood	007917b95c	[MVE] Fold fadd(select(..., +0.0)) into a predicated fadd We already have patterns for matching fadd(select(..., -0.0)), but an upcoming patch will lead to patterns using +0.0 as the identity instead of -0.0. I'm adding support for these patterns now to avoid any regressions for MVE. Differential Revision: https://reviews.llvm.org/D127275	2022-06-10 11:09:55 +01:00
Craig Topper	4bcfc41846	[SelectionDAG] Teach computeKnownBits that a nsw self multiply produces a positive value. This matches what we do in IR. For the RISC-V test case, this allows us to use -8 for the AND mask instead of materializing a constant in a register. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D127335	2022-06-08 14:55:58 -07:00
Hendrik Greving	a92ed167f2	[ValueTypes] Define MVTs for v128i2/v64i4 as well as i2 and i4. Adds MVT::v128i2, MVT::v64i4, and implied MVT::i2, MVT::i4. Keeps MVT::i2, MVT::i4 lowering actions as expand, which should be removed once targets set this explicitly. Adjusts 11 lit tests to reflect slightly different behavior during DAG combine. Differential Revision: https://reviews.llvm.org/D125247	2022-06-02 00:49:11 +00:00
Hendrik Greving	e9d05cc7d8	Revert "[ValueTypes] Define MVTs for v128i2/v64i4 as well as i2 and i4." This reverts commit 430ac5c3029c52e391e584c6d4447e6e361fae99. Due to failures in Clang tests. Differential Revision: https://reviews.llvm.org/D125247	2022-06-01 13:27:49 -07:00
Hendrik Greving	430ac5c302	[ValueTypes] Define MVTs for v128i2/v64i4 as well as i2 and i4. Adds MVT::v128i2, MVT::v64i4, and implied MVT::i2, MVT::i4. Keeps MVT::i2, MVT::i4 lowering actions as `expand`, which should be removed once targets set this explicitly. Adjusts 11 lit tests to reflect slightly different behavior during DAG combine. Differential Revision: https://reviews.llvm.org/D125247	2022-06-01 12:48:01 -07:00
Simon Pilgrim	e1d02f6c37	[ARM][Thumb2] Refresh UXTB16 tests to match optimized IR from instcombine As discussed on D77804, instcombine will have already performed a similar SimplifyMultipleUseDemandedBits call which will break the UXTB16 pattern that was being matched in these DAG tests. I've updated the existing tests so that they match the instcombine IR (with a suitable FIXME) and added an equivalent test pattern suggested by @dmgreen	2022-06-01 15:28:19 +01:00
David Penry	917dc0749b	[ARM] Recognize t2LoopEnd for software pipelining - Add t2LoopEnd to TargetInstrInfo::analyzeBranch and related functions. As there are many side effects of analyzing a branch, only do so if software pipelining is enabled to maintain previous behavior when pipelining is not desired. - Make sure that t2LoopEndDec is immediately followed by a t2B when it is synthesized from a t2LoopEnd. This is done because the t2LoopEnd might have acquired a fall-through path, but IfConversion assumes that fall-through are only possible on analyzable branches. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D126322	2022-05-26 09:55:42 -07:00
David Green	18cb3b3506	[ARM] Fix vcvtb/t.f16 input liveness The `vcvtb.f16.f32 Sd, Sn` (and vcvtt.f16.f32) instructions convert an f32 into an f16, writing either the top or bottom half of the register. That means that half of the input register Sd is used in the output. This wasn't being modelled in the instructions, leading later analyses to believe that the registers were dead where they were not, generating invalid scheduling. Fix that by specifying the input Sda register for the instructions too, allowing them to be set for cases like vector inserts. Most of the changes are plumbing through the constraint string, cstr. Differential Revision: https://reviews.llvm.org/D126118	2022-05-25 12:16:26 +01:00
David Green	fc0229fd6b	[ARM] Clean up a test check from D125604. NFC The Arm test had an incorrect check line with the wrong offset. From the look of the code it should be -400*4 = 0xFFFFF9C0 = 4294965696	2022-05-18 16:12:08 +01:00
Simon Pilgrim	1ecc3d86ae	[DAG] Enable ISD::SHL SimplifyMultipleUseDemandedBits handling inside SimplifyDemandedBits Pulled out of D77804 as it's going to be easier to address the regressions individually. This patch allows SimplifyDemandedBits to call SimplifyMultipleUseDemandedBits in cases where the source operand has other uses, enabling us to peek through the shifted value if we don't demand all the bits/elts. The lost RISCV gorc2 fold shouldn't be a problem - instcombine would have already destroyed that pattern - see https://github.com/llvm/llvm-project/issues/50553 Differential Revision: https://reviews.llvm.org/D124839	2022-05-14 09:50:01 +01:00
Matthias Braun	3bf643eb12	Update test for changes in f0ea9c9cec7f7b632ef7894ff7b3859269de611b / D124552	2022-05-10 13:25:38 -07:00
David Green	115c188807	[DAG][PowerPC] Combine shuffle(bitcast(X), Mask) to bitcast(shuffle(X, Mask')) If the mask is made up of elements that form a mask in the higher type we can convert shuffle(bitcast into the bitcast type, simplifying the instruction sequence. A v4i32 2,3,0,1 for example can be treated as a 1,0 v2i64 shuffle. This helps clean up some of the AArch64 concat load combines, along with helping simplify a number of other tests. The PowerPC combine for v16i8 splat vector loads needed some fixes to keep it working for v16i8 vectors. This improves the handling of v2i64 shuffles to match too, hopefully improving them in general. Differential Revision: https://reviews.llvm.org/D123801	2022-05-06 10:50:31 +01:00
David Green	c7a6b11b7e	[ARM][AArch64] Add some extra shuffle conversion test coverage. NFC This adds a big endian run line for the AArch64 TRN tests and regenerates the check lines, along with adding an extra MVE VMOVN case and regenerating vector-DAGCombine.ll for easier updating.	2022-05-05 15:27:44 +01:00
David Green	f848798b7d	[ARM] Delay creation of MVE Imm shifts to legalization The reasoning for creating VSHLIMM/VSHRsIMM/VSHRuIMM nodes in a combine - because matching i64 constants is difficult - does not apply for MVE, as there are no v2i64 shifts. Delaying the creation of the nodes can allow extra transforms on target independent shl/shr.	2022-05-04 22:12:09 +01:00
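
The [DAGCombiner] fold that appears several times in this log, (mul (sra X, BW-1), Y) -> (neg (and (sra X, BW-1), Y)), rests on the observation that (sra X, BW-1) is either 0 or -1. The following is only an illustrative sketch in Python, not LLVM code: it models fixed-width integers with a mask, assumes BW = 32, and all helper names (`sra`, `mul_form`, `neg_and_form`) are hypothetical.

```python
BW = 32                    # assumed bit width for this sketch
MASK = (1 << BW) - 1       # keeps values in the range of a BW-bit register


def to_signed(value):
    """Interpret the low BW bits of value as a two's-complement integer."""
    value &= MASK
    return value - (1 << BW) if value >> (BW - 1) else value


def sra(x, amount):
    """Arithmetic shift right on a BW-bit value (Python's >> is arithmetic
    for negative ints), returned as an unsigned BW-bit pattern."""
    return (to_signed(x) >> amount) & MASK


def mul_form(x, y):
    """Original pattern: (mul (sra X, BW-1), Y)."""
    return (sra(x, BW - 1) * y) & MASK


def neg_and_form(x, y):
    """Folded pattern: (neg (and (sra X, BW-1), Y))."""
    return (-(sra(x, BW - 1) & y)) & MASK


if __name__ == "__main__":
    samples = [0, 1, 7, 0x7FFFFFFF, 0x80000000, 0xFFFFFFFB, MASK]
    for x in samples:
        for y in samples:
            assert mul_form(x, y) == neg_and_form(x, y)
    print("both forms agree on all samples")
```

When X is non-negative the shift yields 0 and both forms produce 0; when X is negative the shift yields all ones, so the multiply computes -Y and the AND passes Y through to the negate.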


1599 Commits