llvm-project

Author	SHA1	Message	Date
shamithoke	e3ef4612c1	Perform bitreverse using AVX512 GFNI for i32 and i64. (#81764) Currently, the lowering operation for bitreverse using Intel AVX512 GFNI only supports byte vectors. Extend the operation to i32 and i64. Co-authored-by: shami <shami_thoke@yahoo.com>	2024-04-10 20:22:44 +01:00
Simon Pilgrim	8b43c1be23	[X86] X86FixupVectorConstants - shrink vector load to movss/movsd/movd/movq 'zero upper' instructions (#79000) If we're loading a vector constant that is known to be zero in the upper elements, then attempt to shrink the constant and just scalar load the lower 32/64 bits. Always choose the vzload/broadcast with the smallest constant load, and prefer vzload over broadcasts for same bitwidth to avoid domain flips (mainly an AVX1 issue). Fixes #73783	2024-01-24 14:00:51 +00:00
Simon Pilgrim	4318b033bd	[MC][X86] Merge lane/element broadcast comment printers. (#79020 ) This is /almost/ NFC - the only annoyance is that for some reason we were using "<C1,C2,..>" for ConstantVector types unlike all other cases - these now use the same "[C1,C2,..]" format as the other constant printers.	2024-01-23 12:33:52 +00:00
Jay Foad	7b3bbd83c0	Revert "[CodeGen] Really renumber slot indexes before register allocation (#67038 )" This reverts commit 2501ae58e3bb9a70d279a56d7b3a0ed70a8a852c. Reverted due to various buildbot failures.	2023-10-09 12:31:32 +01:00
Jay Foad	2501ae58e3	[CodeGen] Really renumber slot indexes before register allocation (#67038 ) PR #66334 tried to renumber slot indexes before register allocation, but the numbering was still affected by list entries for instructions which had been erased. Fix this to make the register allocator's live range length heuristics even less dependent on the history of how instructions have been added to and removed from SlotIndexes's maps.	2023-10-09 11:44:41 +01:00
Jay Foad	e0919b189b	[CodeGen] Renumber slot indexes before register allocation (#66334 ) RegAllocGreedy uses SlotIndexes::getApproxInstrDistance to approximate the length of a live range for its heuristics. Renumbering all slot indexes with the default instruction distance ensures that this estimate will be as accurate as possible, and will not depend on the history of how instructions have been added to and removed from SlotIndexes's maps. This also means that enabling -early-live-intervals, which runs the SlotIndexes analysis earlier, will not cause large amounts of churn due to different register allocator decisions.	2023-09-19 11:18:12 +01:00
Simon Pilgrim	834cc88c5d	[X86] X86FixupVectorConstantsPass - attempt to replace full width fp vector constant loads with broadcasts on AVX+ targets (REAPPLIED) lowerBuildVectorAsBroadcast will not broadcast splat constants in all cases, resulting in a lot of situations where a full width vector load that has failed to fold but is loading splat constant values could use a broadcast load instruction just as cheaply, and save constant pool space. NOTE: SSE3 targets can use MOVDDUP but not all SSE era CPUs can perform this as cheaply as a vector load, we will need to add scheduler model checks if we want to pursue this. This is an updated commit of 98061013e01207444cfd3980cde17b5e75764fbe after being reverted at a279a09ab9524d1d74ef29b34618102d4b202e2f	2023-06-13 12:10:11 +01:00
Simon Pilgrim	a279a09ab9	Revert rG98061013e01207444cfd3980 - [X86] X86FixupVectorConstantsPass - attempt to replace full width fp vector constant loads with broadcasts on AVX+ targets Reverting while we address an existing issue exposed by this (Issue #63108)	2023-06-06 18:44:24 +01:00
Simon Pilgrim	98061013e0	[X86] X86FixupVectorConstantsPass - attempt to replace full width fp vector constant loads with broadcasts on AVX+ targets lowerBuildVectorAsBroadcast will not broadcast splat constants in all cases, resulting in a lot of situations where a full width vector load that has failed to fold but is loading splat constant values could use a broadcast load instruction just as cheaply, and save constant pool space. NOTE: SSE3 targets can use MOVDDUP but not all SSE era CPUs can perform this as cheaply as a vector load, we will need to add scheduler model checks if we want to pursue this.	2023-05-29 16:10:52 +01:00
Matthias Braun	189900eb14	X86: Stop assigning register costs for longer encodings. This stops reporting CostPerUse 1 for `R8`-`R15` and `XMM8`-`XMM31`. This was previously done because instruction encodings require a REX prefix when using them, resulting in longer instruction encodings. I found that this regresses the quality of the register allocation as the costs impose an ordering on eviction candidates. I also feel that there is a bit of an impedance mismatch as the actual costs occur when encoding instructions using those registers, but the order of VReg assignments is not primarily ordered by number of Defs+Uses. I did extensive measurements with the llvm-test-suite with SPEC2006 + SPEC2017 included, internal services showed similar patterns. Generally there are a lot of improvements but also a lot of regressions. But on average the allocation quality seems to improve at a small code size regression. Results for measuring static and dynamic instruction counts: Dynamic Counts (scaled by execution frequency) / Optimization Remarks: Spills+FoldedSpills -5.6% Reloads+FoldedReloads -4.2% Copies -0.1% Static / LLVM Statistics: regalloc.NumSpills mean -1.6%, geomean -2.8% regalloc.NumReloads mean -1.7%, geomean -3.1% size..text mean +0.4%, geomean +0.4% Static / LLVM Statistics: mean -2.2%, geomean -3.1%) regalloc.NumSpills mean -2.6%, geomean -3.9%) regalloc.NumReloads mean +0.6%, geomean +0.6%) size..text Static / LLVM Statistics: regalloc.NumSpills mean -3.0% regalloc.NumReloads mean -3.3% size..text mean +0.3%, geomean +0.3% Differential Revision: https://reviews.llvm.org/D133902	2022-09-30 16:01:33 -07:00
Simon Pilgrim	8bf04e9f2a	[X86] Add GFNI test coverage for bitreverse codegen We should be able to efficiently use the vector version for scalar bitreverse, like we do for XOP.	2022-09-13 11:23:03 +01:00
Simon Pilgrim	69d5a038b9	[DAG] Enable ISD::SRL SimplifyMultipleUseDemandedBits handling inside SimplifyDemandedBits This patch allows SimplifyDemandedBits to call SimplifyMultipleUseDemandedBits in cases where the ISD::SRL source operand has other uses, enabling us to peek through the shifted value if we don't demand all the bits/elts. This is another step towards removing SelectionDAG::GetDemandedBits and just using TargetLowering::SimplifyMultipleUseDemandedBits. There are a few cases where we end up with extra register moves which I think we can accept in exchange for the increased ILP. Differential Revision: https://reviews.llvm.org/D77804	2022-07-28 14:10:44 +01:00
Sanjay Patel	f0dd12ec5c	[x86] use zero-extending load of a byte outside of loops too (2nd try) The first attempt missed changing test files for tools (update_llc_test_checks.py). Original commit message: This implements the main suggested change from issue #56498. Using the shorter (non-extending) instruction with only -Oz ("minsize") rather than -Os ("optsize") is left as a possible follow-up. As noted in the bug report, the zero-extending load may have shorter latency/better throughput across a wide range of x86 micro-arches, and it avoids a potential false dependency. The cost is an extra instruction byte. This could cause perf ups and downs from secondary effects, but I don't think it is possible to account for those in advance, and that will likely also depend on exact micro-arch. This does bring LLVM x86 codegen more in line with existing gcc codegen, so if problems are exposed they are more likely to occur for both compilers. Differential Revision: https://reviews.llvm.org/D129775	2022-07-19 21:27:08 -04:00
Sanjay Patel	95401b0153	Revert "[x86] use zero-extending load of a byte outside of loops too" This reverts commit 9d1ea1774c51c44ddf0b5065bf600919988d7015. There are tests of update_llc_tests_checks.py that missed being updated.	2022-07-19 17:37:22 -04:00
Sanjay Patel	9d1ea1774c	[x86] use zero-extending load of a byte outside of loops too This implements the main suggested change from issue #56498. Using the shorter (non-extending) instruction with only -Oz ("minsize") rather than -Os ("optsize") is left as a possible follow-up. As noted in the bug report, the zero-extending load may have shorter latency/better throughput across a wide range of x86 micro-arches, and it avoids a potential false dependency. The cost is an extra instruction byte. This could cause perf ups and downs from secondary effects, but I don't think it is possible to account for those in advance, and that will likely also depend on exact micro-arch. This does bring LLVM x86 codegen more in line with existing gcc codegen, so if problems are exposed they are more likely to occur for both compilers. Differential Revision: https://reviews.llvm.org/D129775	2022-07-19 16:43:47 -04:00
Hendrik Greving	a92ed167f2	[ValueTypes] Define MVTs for v128i2/v64i4 as well as i2 and i4. Adds MVT::v128i2, MVT::v64i4, and implied MVT::i2, MVT::i4. Keeps MVT::i2, MVT::i4 lowering actions as expand, which should be removed once targets set this explicitly. Adjusts 11 lit tests to reflect slightly different behavior during DAG combine. Differential Revision: https://reviews.llvm.org/D125247	2022-06-02 00:49:11 +00:00
Hendrik Greving	e9d05cc7d8	Revert "[ValueTypes] Define MVTs for v128i2/v64i4 as well as i2 and i4." This reverts commit 430ac5c3029c52e391e584c6d4447e6e361fae99. Due to failures in Clang tests. Differential Revision: https://reviews.llvm.org/D125247	2022-06-01 13:27:49 -07:00
Hendrik Greving	430ac5c302	[ValueTypes] Define MVTs for v128i2/v64i4 as well as i2 and i4. Adds MVT::v128i2, MVT::v64i4, and implied MVT::i2, MVT::i4. Keeps MVT::i2, MVT::i4 lowering actions as `expand`, which should be removed once targets set this explicitly. Adjusts 11 lit tests to reflect slightly different behavior during DAG combine. Differential Revision: https://reviews.llvm.org/D125247	2022-06-01 12:48:01 -07:00
Guozhi Wei	f1d8345a2a	[TwoAddressInstructionPass] Create register mapping for registers with multiple uses in the current MBB Currently we create register mappings for registers used only once in current MBB. For registers with multiple uses, when all the uses are in the current MBB, we can also create mappings for them similarly according to the last use. For example %reg101 = ... = ... reg101 %reg103 = ADD %reg101, %reg102 We can create mapping between %reg101 and %reg103. Differential Revision: https://reviews.llvm.org/D113193	2021-11-29 19:01:59 -08:00
Craig Topper	ff37b1105d	[LegalizeVectorOps][X86] Don't defer BITREVERSE expansion to LegalizeDAG. By expanding early it allows the shifts to be custom lowered in LegalizeVectorOps. Then a DAG combine is able to run on them before LegalizeDAG handles the BUILD_VECTORS for the masks used. v16Xi8 shift lowering on X86 requires a mask to be applied to a v8i16 shift. The BITREVERSE expansion applied an AND mask before SHL ops and after SRL ops. This was done to share the same mask constant for both shifts. It looks like this patch allows DAG combine to remove the AND mask added after v16i8 SHL by X86 lowering. This maintains the mask sharing that BITREVERSE was trying to achieve. Prior to this patch it looks like we kept the mask after the SHL instead which required an extra constant pool or a PANDN to invert it. This is dependent on D112248 because RISCV will end up scalarizing the BSWAP portion of the BITREVERSE expansion if we don't disable BSWAP scalarization in LegalizeVectorOps first. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D112254	2021-10-21 15:23:23 -07:00
Guozhi Wei	6599961c17	[TwoAddressInstructionPass] Improve the SrcRegMap and DstRegMap computation This patch contains the following enhancements to SrcRegMap and DstRegMap: 1. In findOnlyInterestingUse, check not only whether the Reg is a two-address usage, but also whether it can be a two-address usage after commutation. 2. If a physical register is clobbered, remove SrcRegMap entries that are mapped to it. 3. In processTiedPairs, when creating a new COPY instruction, add a SrcRegMap entry only when the COPY instruction is coalescable (the COPY src is killed). With these enhancements isProfitableToCommute can make better commute decisions, and finally more register copies are removed. Differential Revision: https://reviews.llvm.org/D108731	2021-10-11 15:28:31 -07:00
Matt Arsenault	4a36e96c3f	RegAllocGreedy: Account for reserved registers in num regs heuristic This simple heuristic uses the estimated live range length combined with the number of registers in the class to switch which heuristic to use. This was taking the raw number of registers in the class, even though not all of them may be available. AMDGPU heavily relies on dynamically reserved numbers of registers based on user attributes to satisfy occupancy constraints, so the raw number is highly misleading. There are still a few problems here. In the original testcase that made me notice this, the live range size is incorrect after the scheduler rearranges instructions, since the instructions don't have the original InstrDist offsets. Additionally, I think it would be more appropriate to use the number of disjointly allocatable registers in the class. For the AMDGPU register tuples, there are a large number of registers in each tuple class, but only a small fraction can actually be allocated at the same time since they all overlap with each other. It seems we do not have a query that corresponds to the number of independently allocatable registers. Relatedly, I'm still debugging some allocation failures where overlapping tuples seem to not be handled correctly. The test changes are mostly noise. There are a handful of x86 tests that look like regressions with an additional spill, and a handful that now avoid a spill. The worst looking regression is likely test/Thumb2/mve-vld4.ll which introduces a few additional spills. test/CodeGen/AMDGPU/soft-clause-exceeds-register-budget.ll shows a massive improvement by completely eliminating a large number of spills inside a loop.	2021-09-14 21:00:29 -04:00
Craig Topper	8bb24289f3	[SelectionDAG] Optimize bitreverse expansion to minimize the number of mask constants. We can halve the number of mask constants by masking before shl and after srl. This can reduce the number of mov immediate or constant materializations. Or reduce the number of constant pool loads for X86 vectors. I think we might be able to do something similar for bswap. I'll look at it next. Differential Revision: https://reviews.llvm.org/D108738	2021-08-26 09:33:24 -07:00
Simon Pilgrim	307890f85b	[X86] Freeze vXi8 shl(x,1) -> add(x,x) vector fold (PR50468) We don't have any vXi8 shift instructions (other than on XOP which is handled separately), so replace the shl(x,1) -> add(x,x) fold with shl(x,1) -> add(freeze(x),freeze(x)) to avoid the undef issues identified in PR50468. Split off from D106675 as I'm still looking at whether we can fix the vXi16/i32/i64 issues with the D106679 alternative. Differential Revision: https://reviews.llvm.org/D108139	2021-08-24 16:08:24 +01:00
Roman Lebedev	0aef747b84	[NFC][X86][Codegen] Megacommit: mass-regenerate all check lines that were already autogenerated The motivation is that the update script has at least two deviations (`<...>@GOT`/`<...>@PLT`/ and not hiding pointer arithmetic) from what pretty much all the checklines were generated with, and most of the tests are still not updated, so each time one of the non-up-to-date tests is updated to see the effect of the code change, there is a lot of noise. Instead of having to deal with that each time, let's just deal with everything at once. This has been done via: ``` cd llvm-project/llvm/test/CodeGen/X86 grep -rl "; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py" | xargs -L1 <...>/llvm-project/llvm/utils/update_llc_test_checks.py --llc-binary <...>/llvm-project/build/bin/llc ``` Not all tests were regenerated, however.	2021-06-11 23:57:02 +03:00
Craig Topper	0248e24071	[X86][update_llc_test_checks] Use a less greedy regular expression for replacing constant pool labels in tests. While working on D97208 I noticed that these greedy regular expressions prevent tests from failing when (%rip) appears after a constant pool label when it didn't before. Reviewed By: RKSimon, pengfei Differential Revision: https://reviews.llvm.org/D99460	2021-03-28 11:39:46 -07:00
Craig Topper	57c0c4a275	[X86] Fix crash with i64 bitreverse on 32-bit targets with XOP. We unconditionally marked i64 as Custom, but did not install a handler in ReplaceNodeResults when i64 isn't a legal type. This leads to ReplaceNodeResults asserting. We have two options to fix this. Only mark i64 as Custom on 64-bit targets and let it expand to two i32 bitreverses which each need a VPPERM. Or the other option is to add the Custom handling to ReplaceNodeResults. This is what I went with.	2020-11-15 19:02:34 -08:00
Simon Pilgrim	2492075add	[X86][SSE] lowerShuffleAsBitRotate - lower to vXi8 shuffles to ROTL on pre-SSSE3 targets Without PSHUFB we are better using ROTL (expanding to OR(SHL,SRL)) than using the generic v16i8 shuffle lowering - but if we can widen to v8i16 or more then the existing shuffles are still the better option. REAPPLIED: Original commit rG11c16e71598d was reverted at rGde1d90299b16 as it wasn't accounting for later lowering. This version emits ROTLI or the OR(VSHLI/VSRLI) directly to avoid the issue.	2020-02-14 11:55:18 +00:00
Amy Huang	de1d90299b	Revert "[X86][SSE] lowerShuffleAsBitRotate - lower to vXi8 shuffles to ROTL on pre-SSSE3 targets" This reverts commit 11c16e71598d51f15b4cfd0f719c4dabcc0bebf7 because it causes a crash in chromium code. See https://reviews.llvm.org/rG11c16e71598d51f15b4cfd0f719c4dabcc0bebf7.	2020-02-12 17:00:37 -08:00
Simon Pilgrim	11c16e7159	[X86][SSE] lowerShuffleAsBitRotate - lower to vXi8 shuffles to ROTL on pre-SSSE3 targets Without PSHUFB we are better using ROTL (expanding to OR(SHL,SRL)) than using the generic v16i8 shuffle lowering - but if we can widen to v8i16 or more then the existing shuffles are still the better option.	2020-02-11 12:21:03 +00:00
Craig Topper	8b5f2ab2a4	Recommit r367901 "[X86] Enable -x86-experimental-vector-widening-legalization by default." The assert that caused this to be reverted should be fixed now. Original commit message: This patch changes our default legalization behavior for 16, 32, and 64 bit vectors with i8/i16/i32/i64 scalar types from promotion to widening. For example, v8i8 will now be widened to v16i8 instead of promoted to v8i16. This keeps the element widths the same and pads with undef elements. We believe this is a better legalization strategy. But it carries some issues due to the fragmented vector ISA. For example, i8 shifts and multiplies get widened and then later have to be promoted/split into vXi16 vectors. This has the potential to cause regressions so we wanted to get it in early in the 10.0 cycle so we have plenty of time to address them. Next steps will be to merge tests that explicitly test the command line option. And then we can remove the option and its associated code. llvm-svn: 368183	2019-08-07 16:24:26 +00:00
Mitch Phillips	bd0d97e1c4	Revert "[X86] Enable -x86-experimental-vector-widening-legalization by default." This reverts commit 3de33245d2c992c9e0af60372043540b60f3a810. This commit broke the MSan buildbots. See https://reviews.llvm.org/rL367901 for more information. llvm-svn: 368107	2019-08-06 23:00:43 +00:00
Craig Topper	3de33245d2	[X86] Enable -x86-experimental-vector-widening-legalization by default. This patch changes our default legalization behavior for 16, 32, and 64 bit vectors with i8/i16/i32/i64 scalar types from promotion to widening. For example, v8i8 will now be widened to v16i8 instead of promoted to v8i16. This keeps the element widths the same and pads with undef elements. We believe this is a better legalization strategy. But it carries some issues due to the fragmented vector ISA. For example, i8 shifts and multiplies get widened and then later have to be promoted/split into vXi16 vectors. This has the potential to cause regressions so we wanted to get it in early in the 10.0 cycle so we have plenty of time to address them. Next steps will be to merge tests that explicitly test the command line option. And then we can remove the option and its associated code. llvm-svn: 367901	2019-08-05 18:25:36 +00:00
Craig Topper	46e5052b8e	[X86FixupLEAs] Turn optIncDec into a generic two address LEA optimizer. Support LEA64_32r properly. INC/DEC is really a special case of a more generic issue. We should also turn leas into add reg/reg or add reg/imm regardless of the slow lea flags. This also supports LEA64_32 which has 64 bit input registers and 32 bit output registers. So we need to convert the 64 bit inputs to their 32 bit equivalents to check if they are equal to base reg. One thing to note, the original code preserved the kill flags by adding operands to the new instruction instead of using addReg. But I think tied operands aren't supposed to have the kill flag set. I dropped the kill flags, but I could probably try to preserve it in the add reg/reg case if we think it's important. Not sure which operand it's supposed to go on for the LEA64_32r instruction due to the super reg implicit uses. Though I'm also not sure those are needed since they were probably just created by an INSERT_SUBREG from a 32-bit input. Differential Revision: https://reviews.llvm.org/D61472 llvm-svn: 361691	2019-05-25 06:17:47 +00:00
Simon Pilgrim	95b8d9bbf8	[SelectionDAG] computeKnownBits - support constant pool values from target This patch adds the overridable TargetLowering::getTargetConstantFromLoad function which allows targets to return any constant value loaded by a LoadSDNode node - only X86 makes use of this so far but everything should be in place for other targets. computeKnownBits then uses this function to improve codegen, notably vector code after legalization. A future commit will do the same for ComputeNumSignBits but computeKnownBits sees the bigger benefit. This required a couple of fixes: * SimplifyDemandedBits must early-out for getTargetConstantFromLoad cases to prevent infinite loops of constant regeneration (similar to what we already do for BUILD_VECTOR). * Fix a DAGCombiner::visitTRUNCATE issue as we had trunc(shl(v8i32),v8i16) <-> shl(trunc(v8i16),v8i32) infinite loops after legalization on AVX512 targets. Differential Revision: https://reviews.llvm.org/D61887 llvm-svn: 361620	2019-05-24 10:03:11 +00:00
Craig Topper	572e94ca02	[X86] Enable 8-bit OR with disjoint bits to convert to LEA We already support 8-bits adds in convertToThreeAddress. But we can also support 8-bit OR if the bits are disjoint. We already do this for 16/32/64. Differential Revision: https://reviews.llvm.org/D58863 llvm-svn: 355423	2019-03-05 18:37:33 +00:00
Simon Pilgrim	933673d878	[X86][SSE] Canonicalize OR(AND(X,C),AND(Y,~C)) -> OR(AND(X,C),ANDNP(C,Y)) For constant bit select patterns, replace one AND with a ANDNP, allowing us to reuse the constant mask. Only do this if the mask has multiple uses (to avoid losing load folding) or if we have XOP as its VPCMOV can handle most folding commutations. This also requires computeKnownBitsForTargetNode support for X86ISD::ANDNP and X86ISD::FOR to prevent regressions in fabs/fcopysign patterns. Differential Revision: https://reviews.llvm.org/D55935 llvm-svn: 351819	2019-01-22 13:44:49 +00:00
Craig Topper	189e5b4ab6	[LegalizeTypes] Prevent an assertion from PromoteIntRes_BSWAP and PromoteIntRes_BITREVERSE if the shift amount is too large for the VT returned by getShiftAmountTy Summary: getShiftAmountTy for X86 returns MVT::i8. If a BSWAP or BITREVERSE is created that requires promotion and the difference between the original VT and the promoted VT is more than 255 then we won't be able to create the constant. This patch adds a check to replace the result from getShiftAmountTy with MVT::i32 if the difference won't fit. This should get legalized later when the shift is ultimately expanded since it's clearly an illegal type that we're only promoting to make it a power of 2 bit width. Alternatively we could base the decision completely on the largest shift amount the promoted VT could use. Vectors should be immune here because getShiftAmountTy always returns the incoming VT for vectors. Only the scalar shift amount can be changed by the targets. Reviewers: eli.friedman, RKSimon, spatel Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D53232 llvm-svn: 344460	2018-10-13 17:47:20 +00:00
Simon Pilgrim	2d0f20cc04	[X86] Handle COPYs of physregs better (regalloc hints) Enable enableMultipleCopyHints() on X86. Original Patch by @jonpa: While enabling the mischeduler for SystemZ, it was discovered that for some reason a test needed one extra seemingly needless COPY (test/CodeGen/SystemZ/call-03.ll). The handling for that resulted in this patch, which improves the register coalescing by providing not just one copy hint, but a sorted list of copy hints. On SystemZ, this gives ~12500 fewer register moves on SPEC, as well as marginally less spilling. Instead of improving just the SystemZ backend, the improvement has been implemented in common-code (calculateSpillWeightAndHint()). This gives a lot of test failures, but since this should be a general improvement I hope that the involved targets will help and review the test updates. Differential Revision: https://reviews.llvm.org/D38128 llvm-svn: 342578	2018-09-19 18:59:08 +00:00
Puyan Lotfi	43e94b15ea	Followup on Proposal to move MIR physical register namespace to '$' sigil. Discussed here: http://lists.llvm.org/pipermail/llvm-dev/2018-January/120320.html In preparation for adding support for named vregs we are changing the sigil for physical registers in MIR to '$' from '%'. This will prevent name clashes of named physical register with named vregs. llvm-svn: 323922	2018-01-31 22:04:26 +00:00
Francis Visoiu Mistrih	a8a83d150f	[CodeGen] Use MachineOperand::print in the MIRPrinter for MO_Register. Work towards the unification of MIR and debug output by refactoring the interfaces. For MachineOperand::print, keep a simple version that can be easily called from `dump()`, and a more complex one which will be called from both the MIRPrinter and MachineInstr::print. Add extra checks inside MachineOperand for detached operands (operands with getParent() == nullptr). https://reviews.llvm.org/D40836 * find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 | xargs -0 sed -i '' -E 's/kill: ([^ ]+) ([^ ]+)<def> ([^ ]+)/kill: \1 def \2 \3/g' find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 | xargs -0 sed -i '' -E 's/kill: ([^ ]+) ([^ ]+) ([^ ]+)<def>/kill: \1 \2 def \3/g' find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 | xargs -0 sed -i '' -E 's/kill: def ([^ ]+) ([^ ]+) ([^ ]+)<def>/kill: def \1 \2 def \3/g' find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 | xargs -0 sed -i '' -E 's/<def>//g' find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 | xargs -0 sed -i '' -E 's/([^ ]+)<kill>/killed \1/g' find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 | xargs -0 sed -i '' -E 's/([^ ]+)<imp-use,kill>/implicit killed \1/g' find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 | xargs -0 sed -i '' -E 's/([^ ]+)<dead>/dead \1/g' find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 | xargs -0 sed -i '' -E 's/([^ ]+)<def[ ]*,[ ]*dead>/dead \1/g' find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 | xargs -0 sed -i '' -E 's/([^ ]+)<imp-def[ ]*,[ ]*dead>/implicit-def dead \1/g' find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 | xargs -0 sed -i '' -E 's/([^ ]+)<imp-def>/implicit-def \1/g' find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 | xargs -0 sed -i '' -E 's/([^ ]+)<imp-use>/implicit \1/g' find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 | xargs -0 sed -i '' -E 's/([^ ]+)<internal>/internal \1/g' find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 | xargs -0 sed -i '' -E 's/([^ ]+)<undef>/undef \1/g' llvm-svn: 320022	2017-12-07 10:40:31 +00:00
Francis Visoiu Mistrih	25528d6de7	[CodeGen] Unify MBB reference format in both MIR and debug output As part of the unification of the debug format and the MIR format, print MBB references as '%bb.5'. The MIR printer prints the IR name of a MBB only for block definitions. * find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" \) -type f -print0 | xargs -0 sed -i '' -E 's/BB#" << ([a-zA-Z0-9_]+)->getNumber/" << printMBBReference(\1)/g' find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" \) -type f -print0 | xargs -0 sed -i '' -E 's/BB#" << ([a-zA-Z0-9_]+)\.getNumber/" << printMBBReference(\1)/g' * find . \( -name "*.txt" -o -name "*.s" -o -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" \) -type f -print0 | xargs -0 sed -i '' -E 's/BB#([0-9]+)/%bb.\1/g' * grep -nr 'BB#' and fix Differential Revision: https://reviews.llvm.org/D40422 llvm-svn: 319665	2017-12-04 17:18:51 +00:00
Francis Visoiu Mistrih	9d7bb0cb40	[CodeGen] Print register names in lowercase in both MIR and debug output As part of the unification of the debug format and the MIR format, always print registers as lowercase. * Only debug printing is affected. It now follows MIR. Differential Revision: https://reviews.llvm.org/D40417 llvm-svn: 319187	2017-11-28 17:15:09 +00:00
Nikolai Bozhenov	84af99b3b1	[X86FixupBWInsts] More precise register liveness if no <imp-use> on MOVs. Summary: Subregister liveness tracking is not implemented for X86 backend, so sometimes the whole super register is said to be live, when only a subregister is really live. That might happen if the def and the use are located in different MBBs, see added fixup-bw-isnt.mir test. However, using knowledge of the specific instructions handled by the bw-fixup-pass we can get more precise liveness information which this change does. Reviewers: MatzeB, DavidKreitzer, ab, andrew.w.kaylor, craig.topper Reviewed By: craig.topper Subscribers: n.bozhenov, myatsina, llvm-commits, hiraditya Patch by Andrei Elovikov <andrei.elovikov@intel.com> Differential Revision: https://reviews.llvm.org/D37559 llvm-svn: 313524	2017-09-18 10:17:59 +00:00
Simon Pilgrim	3a196cbc4f	[X86] Add extra BITREVERSE tests Test on 32-bit and 64-bit targets. Add bitreverse tests for i64, i32 and i16 llvm-svn: 297741	2017-03-14 14:03:16 +00:00
Simon Pilgrim	820f87a72d	[SelectionDAG] Optimization of BITREVERSE legalization for power-of-2 integer scalar/vector types An extension of D19978, this patch replaces the default BITREVERSE evaluation of individual bit masks+shifts with block mask+shifts when we have integer elements of power-of-2 bits in size. After calling BSWAP to reverse the order of the constituent bytes (which typically follows a similar approach), every neighbouring 4-bits, 2-bits and finally 1-bit pairs are masked off and swapped over with shifts. In doing so we can significantly reduce the number of operations required. Differential Revision: https://reviews.llvm.org/D21578 llvm-svn: 276432	2016-07-22 16:46:25 +00:00
Matthias Braun	152e7c8b12	VirtRegMap: Replace some identity copies with KILL instructions. An identity COPY like this: %AL = COPY %AL, %EAX<imp-def> has no semantic effect, but encodes liveness information: Further users of %EAX only depend on this instruction even though it does not define the full register. Replace the COPY with a KILL instruction in those cases to maintain this liveness information. (This reverts a small part of r238588 but this time adds a comment explaining why a KILL instruction is useful). llvm-svn: 274952	2016-07-09 00:19:07 +00:00
Simon Pilgrim	0a0e0d4e8e	[X86] Regenerated bitreverse tests to demonstrate what is going on. llvm-svn: 274915	2016-07-08 19:51:08 +00:00
Simon Pilgrim	aaaeedb8cb	[X86] Added bitreverse tests for non-legal types Requested on D21578 llvm-svn: 274914	2016-07-08 19:48:33 +00:00
Sanjay Patel	97c2c108fd	[x86] avoid printing unnecessary sign bits of hex immediates in asm comments (PR20347) It would be better to check the valid/expected size of the immediate operand, but this is generally better than what we print right now. Differential Revision: http://reviews.llvm.org/D20385 llvm-svn: 271114	2016-05-28 14:58:37 +00:00

53 Commits
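
Several of the commits above (820f87a72d, 8bb24289f3) describe the generic BITREVERSE expansion: byte-swap first, then swap neighbouring 4-bit, 2-bit and 1-bit groups, masking before each left shift and after each right shift so that both shifts share one mask constant. The following is a minimal Python sketch of that idea for a 32-bit value. It is an illustration only, not LLVM's SelectionDAG code, and the function name `bitreverse32` is hypothetical.

```python
def bitreverse32(x: int) -> int:
    """Reverse the bit order of a 32-bit unsigned value (illustrative sketch)."""
    x &= 0xFFFFFFFF
    # Step 1: BSWAP, i.e. reverse the order of the four bytes.
    x = int.from_bytes(x.to_bytes(4, "little"), "big")
    # Steps 2-4: swap neighbouring 4-bit, 2-bit and 1-bit groups.
    # Masking before the left shift and after the right shift lets both
    # halves of each step reuse the same mask constant.
    for shift, mask in ((4, 0x0F0F0F0F), (2, 0x33333333), (1, 0x55555555)):
        x = ((x & mask) << shift) | ((x >> shift) & mask)
    return x
```

Each step needs only one mask, so a 32-bit reversal uses three mask constants rather than six, which is the saving the 2021 commit message refers to.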