31205 Commits

Author SHA1 Message Date
Matt Arsenault
2875d3d484 RegAllocGreedy: Remove an unhelpful auto, and don't use a reference 2021-09-23 17:25:25 -04:00
Jay Foad
deb2ca566a Revert "[LiveIntervals] Fix repairOldRegInRange for simple def cases"
This reverts commit 8229cb74125322ff337cfe316ab35c6ebf412bde.

It was failing on buildbots with expensive checks enabled.
2021-09-23 17:55:05 +01:00
Jay Foad
8229cb7412 [LiveIntervals] Fix repairOldRegInRange for simple def cases
The fix applied in D23303 "LiveIntervalAnalysis: fix a crash in repairOldRegInRange"
was over-zealous. It would bail out when the end of the range to be
repaired was in the middle of the first segment of the live range of
Reg, which was always the case when the range contained a single def of
Reg.

This patch fixes it as suggested by Matthias Braun in post-commit review
on the original patch, and tests it by adding -early-live-intervals to
a selection of existing lit tests that now pass.

(Note that D23303 was originally applied to fix a crash in
SILoadStoreOptimizer, but that is now moot since D23814 updated
SILoadStoreOptimizer to run before scheduling so it no longer has to
update live intervals.)

Differential Revision: https://reviews.llvm.org/D110238
2021-09-23 17:16:14 +01:00
Craig Topper
d5c67bba62 [RegAlloc] Cast uint8_t to unsigned before printing it.
raw_ostream interprets uint8_t as wanting to print a character
with that ASCII value. In this case the uint8_t is an integer
that we want to print.
2021-09-23 08:49:44 -07:00
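The pitfall described in this commit can be reproduced outside LLVM. The sketch below uses `std::ostringstream` as a stand-in for `raw_ostream` (an assumption made so the example is self-contained; the standard stream has the same character overload for `unsigned char`). The helper names `printRaw` and `printAsNumber` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>

// Streams the byte unchanged. uint8_t is an alias for unsigned char on
// mainstream platforms, so the stream picks the character overload.
std::string printRaw(std::uint8_t Value) {
  std::ostringstream OS;
  OS << Value;
  return OS.str();
}

// Widens to unsigned first, so the stream formats a decimal number.
std::string printAsNumber(std::uint8_t Value) {
  std::ostringstream OS;
  OS << static_cast<unsigned>(Value);
  return OS.str();
}
```

With the value 65, `printRaw` yields the single character `A`, while `printAsNumber` yields the text `65`.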
Simon Pilgrim
2a5936faf0 [CodeGen] ProcessSDDbgValues - use const-ref value in for-range loop. NFCI.
Avoid unnecessary copies, reported by MSVC static analyzer.
2021-09-23 12:23:46 +01:00
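The copy that this kind of NFCI change avoids can be made visible with a small counter. This is an illustrative sketch only: `Tracked`, `sumByValue` and `sumByConstRef` are hypothetical names, not code from the patch.

```cpp
#include <cassert>
#include <vector>

// A value type that counts how many times it is copy-constructed.
struct Tracked {
  static int Copies;
  int Payload = 0;
  explicit Tracked(int P) : Payload(P) {}
  Tracked(const Tracked &Other) : Payload(Other.Payload) { ++Copies; }
};
int Tracked::Copies = 0;

// Iterating by value copy-constructs every element.
int sumByValue(const std::vector<Tracked> &Items) {
  int Sum = 0;
  for (Tracked Item : Items)
    Sum += Item.Payload;
  return Sum;
}

// Iterating by const reference binds to the element in place.
int sumByConstRef(const std::vector<Tracked> &Items) {
  int Sum = 0;
  for (const Tracked &Item : Items)
    Sum += Item.Payload;
  return Sum;
}
```

Both loops compute the same sum; only the by-value loop increments `Tracked::Copies`, once per element.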
Simon Pilgrim
5cabe4d9d3 [CodeGen] RegisterCoalescer::buildVRegToDbgValueMap - use const-ref value in for-range loop. NFCI.
Avoid unnecessary copies, reported by MSVC static analyzer.
2021-09-23 12:23:45 +01:00
Fraser Cormack
e7c879a69d [RISCV][VP] Add support for VP_REDUCE_* operations
This patch adds codegen support for lowering the vector-predicated
reduction intrinsics to RVV instructions. The process is similar to that
of the other reduction intrinsics, save for the fact that every VP
reduction has a start value. We reuse the existing custom "VL" nodes,
adding extra patterns where required to handle non-true masks.

To support these nodes, the `RISCVISD::VECREDUCE_*_VL` nodes have been
given an explicit "merge" operand. This is to facilitate the VP
reductions, where we must be careful to ensure that even if no operation
is performed (when VL=0) we still produce the start value. The RVV
reductions don't update the destination register under these conditions,
so we tie the splatted start value to the output register.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D107657
2021-09-23 11:11:05 +01:00
Jay Foad
6cef28ed2d [TII] Remove the MFI argument to convertToThreeAddress. NFC.
This simplifies the API and addresses a FIXME in
TwoAddressInstructionPass::convertInstTo3Addr.

Differential Revision: https://reviews.llvm.org/D110229
2021-09-23 08:58:46 +01:00
Bjorn Pettersson
c3ae8ecb52 [DAGCombiner] Rename isAlias as mayAlias. NFC
Differential Revision: https://reviews.llvm.org/D110062
2021-09-23 09:54:42 +02:00
Freddy Ye
13207a21a6 [NFC] Remove redundant setOperationAction.
[FROUND,FROUNDEVEN][f32, f64, f128] are set Expand twice.

Differential Revision: https://reviews.llvm.org/D110302
2021-09-23 10:28:21 +08:00
David Green
c49611f909 Mark CFG as preserved in TypePromotion and InterleaveAccess passes
Neither of these passes modify the CFG, allowing us to preserve DomTree
and LoopInfo across them by using setPreservesCFG.

Differential Revision: https://reviews.llvm.org/D110161
2021-09-22 18:58:00 +01:00
Daniil Fukalov
1a7b7d7ba2 [NFCI][CodeGen, AArch64] Fix inconsistent TargetCostKind types.
The pass uses different cost kinds to estimate the "old" and "interleaved" costs:
the default cost kind for all target overrides of `getInterleavedMemoryOpCost()` is
`TCK_SizeAndLatency`. Although the estimated `TCK_Latency` costs are currently
equal to the `TCK_SizeAndLatency` costs (so the change is NFC), this may change in the future.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D110100
2021-09-22 20:15:17 +03:00
Hongtao Yu
d9b511d8e8 [CSSPGO] Set PseudoProbeInserter as a default pass.
Currently PseudoProbeInserter is a pass conditioned on a target switch. It works well with a single clang invocation. It doesn't work so well when the backend is called separately (i.e., through the linker or llc), where the user always has to pass -pseudo-probe-for-profiling explicitly. I'm making the pass a default pass that requires no command-line argument to trigger, but whether it actually runs depends on whether the CU comes with `llvm.pseudo_probe_desc` metadata.

Reviewed By: wenlei

Differential Revision: https://reviews.llvm.org/D110209
2021-09-22 09:09:48 -07:00
Kazu Hirata
3c557cd7f9 [CodeGen] Remove redundant declaration MIRCanonicalizerID (NFC)
Note that MIRCanonicalizerID is declared in
llvm/include/llvm/CodeGen/Passes.h, which MIRCanonicalizerPass.cpp
includes.

Identified with readability-redundant-declaration.
2021-09-22 08:58:27 -07:00
Sander de Smalen
3e8d2008f7 [SelectionDAG] Remove PromoteIntOp_EXTRACT_SUBVECTOR.
This code seems untested and is likely obsolete, because this case
should already be handled by the code that legalizes the result type
of EXTRACT_SUBVECTOR.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D110061
2021-09-22 14:23:35 +01:00
Sander de Smalen
d5681f1d68 [SelectionDAG] Add PromoteIntOp_INSERT_SUBVECTOR.
This is required to codegen something like:
  <vscale x 8 x i16> @llvm.experimental.vector.insert(<vscale x 8 x i16> %vec,
                                                      <vscale x 2 x i16> %subvec,
                                                      i64 %idx)
where the output vector is legal, but the input vector needs promoting.

It implements this by performing the whole operation on the promoted type,
and then truncating the result.

Reviewed By: david-arm, craig.topper

Differential Revision: https://reviews.llvm.org/D110059
2021-09-22 13:32:36 +01:00
Sander de Smalen
4ca1fbe361 [SelectionDAG] Make WidenVecRes_Convert work for scalable vectors.
Most of the code wasn't yet scalable safe, although most of the
code conceptually just works for scalable vectors. This change
makes the algorithm work on ElementCount, where appropriate,
and leaves the fixed-width only code to use `getFixedNumElements`.

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D110058
2021-09-22 10:58:38 +01:00
Arthur Eubanks
e42234383e Make DiagnosticInfoResourceLimit's limit param required
And always print it.

This makes some LLVM diagnostics match up better with Clang's diagnostics.

Updated some AMDGPU uses of DiagnosticInfoResourceLimit and now we print
better diagnostics for those.

Reviewed By: dblaikie

Differential Revision: https://reviews.llvm.org/D110204
2021-09-21 15:27:58 -07:00
Craig Topper
aeb63d464f [RISCV] Teach RISCVTargetLowering::shouldSinkOperands to sink splats for and/or/xor.
This requires a minor change to CodeGenPrepare to ensure that
shouldSinkOperands will be called for And.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D110106
2021-09-21 10:07:29 -07:00
Michael Liao
5fb3ae525f [SelectionDAG] Re-calculate scoped AA metadata when merging stores.
Reviewed By: jeroen.dobbelaere

Differential Revision: https://reviews.llvm.org/D102821
2021-09-21 11:41:17 -04:00
Aleksandr Bezzubikov
624e4d087e [GlobalISel] Support ConstantAsMetadata in IRTranslator
When using instructions which have a MetadataAsValue argument
(e.g. some target-specific intrinsics) MD canonicalization strips
internal MDNodes with a single ConstantAsMetadata child. That
prevented IRTranslator from properly translating such calls.
2021-09-21 11:24:56 -04:00
Simon Pilgrim
20b58855e0 [CodeGen] SelectionDAGBuilder - Use const-ref iterator in for-range loops. NFCI.
Avoid unnecessary copies, reported by MSVC static analyzer.
2021-09-21 13:01:08 +01:00
Simon Pilgrim
0f83456cf5 [CodeGen] SDDbgValue::getSDNodes() - use const-ref to avoid unnecessary copies. NFCI.
Reported by MSVC static analyzer.
2021-09-21 13:01:08 +01:00
Petar Avramovic
8bc7185668 GlobalISel/Utils: Refactor constant splat match functions
Add a generic helper function that matches a constant splat. It has an option to
match a constant splat with undef (some elements can be undef, but not all).
Add a util function and matcher for G_FCONSTANT splat.

Differential Revision: https://reviews.llvm.org/D104410
2021-09-21 12:09:35 +02:00
Amara Emerson
7091a7f781 [GlobalISel][Legalizer] Don't use eraseFromParentAndMarkDBGValuesForRemoval() for some artifacts.
For artifacts excluding G_TRUNC/G_SEXT, which have IR counterparts, we don't
seem to have debug users of defs. However, in the legalizer we're always calling
MachineInstr::eraseFromParentAndMarkDBGValuesForRemoval() which is expensive.
In some rare cases, this contributes significantly to unreasonably long compile
times when we have lots of artifact combiner activity.

To verify this, I added asserts to that function when it actually replaced a debug
use operand with undef for these artifacts. On CTMark with both -O0 and -Os and
debug info enabled, I didn't see a single case where it triggered.

In my measurements I saw around a 0.5% geomean compile-time improvement on -g -O0
for AArch64 with this change.

Differential Revision: https://reviews.llvm.org/D109750
2021-09-20 23:34:42 -07:00
Amara Emerson
f9d69a0ab0 [GlobalISel] Implement support for the "trap-func-name" attribute.
This attribute calls a function instead of emitting a trap instruction.

Differential Revision: https://reviews.llvm.org/D110098
2021-09-20 14:32:01 -07:00
Petar Avramovic
e4c46ddd91 [GlobalISel] Improve elimination of dead instructions in legalizer
Add eraseInstr(s) utility functions. Before deleting an instruction, they
collect its use instructions. After deletion, they delete the use instructions
that became trivially dead.
This patch clears all dead instructions in existing legalizer MIR tests.

Differential Revision: https://reviews.llvm.org/D109154
2021-09-20 13:00:58 +02:00
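The erase-then-clean-up idea in this commit can be sketched on a toy instruction list. Everything here is a simplified model under stated assumptions: `ToyInstr`, `ToyFunction` and their members are hypothetical, "use instructions" is read as the instructions that define the erased instruction's operands, and the worklist that follows the chain transitively is an assumption about how far the cleanup goes.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Toy instruction: operands are indices of the instructions defining them.
struct ToyInstr {
  std::vector<int> Operands;
  bool HasSideEffects = false;
  bool Erased = false;
};

struct ToyFunction {
  std::vector<ToyInstr> Instrs;

  int add(std::vector<int> Operands, bool HasSideEffects = false) {
    ToyInstr I;
    I.Operands = std::move(Operands);
    I.HasSideEffects = HasSideEffects;
    Instrs.push_back(I);
    return static_cast<int>(Instrs.size()) - 1;
  }

  // True if some instruction that is still present uses the result of Def.
  bool hasLiveUsers(int Def) const {
    for (const ToyInstr &I : Instrs) {
      if (I.Erased)
        continue;
      for (int Op : I.Operands)
        if (Op == Def)
          return true;
    }
    return false;
  }

  bool isTriviallyDead(int Idx) const {
    const ToyInstr &I = Instrs[Idx];
    return !I.Erased && !I.HasSideEffects && !hasLiveUsers(Idx);
  }

  // Erase Idx, then erase operand definitions that became trivially dead.
  void eraseInstr(int Idx) {
    std::vector<int> Worklist{Idx};
    while (!Worklist.empty()) {
      int Cur = Worklist.back();
      Worklist.pop_back();
      if (Instrs[Cur].Erased)
        continue;
      // Collect the operand definitions before deleting the instruction.
      std::vector<int> Defs = Instrs[Cur].Operands;
      Instrs[Cur].Erased = true;
      for (int Def : Defs)
        if (isTriviallyDead(Def))
          Worklist.push_back(Def);
    }
  }
};
```

Erasing the last instruction of a chain removes the whole chain, up to the first definition that still has another live user.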
Kazu Hirata
84b07c9b3a [llvm] Use pop_back_val (NFC) 2021-09-19 13:44:23 -07:00
Kazu Hirata
48719e3b18 [CodeGen] Use make_early_inc_range (NFC) 2021-09-18 09:29:24 -07:00
Kazu Hirata
e2febc2ed4 [llvm] Use drop_begin (NFC) 2021-09-17 09:16:40 -07:00
Simon Pilgrim
4af7643470 [CodeGen] LiveDebug - Use const-ref iterator in for-range loop. NFCI.
Avoid unnecessary copies, reported by MSVC static analyzer.
2021-09-17 14:04:54 +01:00
Simon Pilgrim
9e70d4e5f2 [AsmPrinter] DebugLocEntry::dump() - Use const-ref iterator in for-range loop. NFCI.
Avoid unnecessary copies, reported by MSVC static analyzer.
2021-09-17 12:11:54 +01:00
Petar Avramovic
d477a7c2e7 GlobalISel/Utils: Refactor integer/float constant match functions
Rework getConstantVRegValWithLookThrough in order to make it clear if we
are matching integer/float constant only or any constant (default).
Add helper functions that get DefVReg and APInt/APFloat from constant instr:
getIConstantVRegValWithLookThrough: integer constant, only G_CONSTANT
getFConstantVRegValWithLookThrough: float constant, only G_FCONSTANT
getAnyConstantVRegValWithLookThrough: either G_CONSTANT or G_FCONSTANT

Rename getConstantVRegVal and getConstantVRegSExtVal to getIConstantVRegVal
and getIConstantVRegSExtVal. These now only match G_CONSTANT as described
in comment.

Relevant matchers now return both DefVReg and APInt/APFloat.

Replace existing uses of getConstantVRegValWithLookThrough and
getConstantVRegVal with new helper functions. Any constant match is
only required in:
ConstantFoldBinOp: for constant argument that was bit-cast of float to int
getAArch64VectorSplat: AArch64::G_DUP operands can be any constant
amdgpu select for G_BUILD_VECTOR_TRUNC: operands can be any constant

In other places use integer only constant match.

Differential Revision: https://reviews.llvm.org/D104409
2021-09-17 11:22:13 +02:00
Nikita Popov
0fc624f029 [IR] Return AAMDNodes from Instruction::getMetadata() (NFC)
getMetadata() currently uses a weird API where it populates a
structure passed to it, and optionally merges into it. Instead,
we can return the AAMDNodes and provide a separate merge() API.
This makes usages more compact.

Differential Revision: https://reviews.llvm.org/D109852
2021-09-16 21:06:57 +02:00
Kazu Hirata
cfc7402419 [llvm] Use drop_begin (NFC) 2021-09-16 08:46:26 -07:00
Doug Gregor
a773db7d76 Add a command-line flag to control the Swift extended async frame info.
Introduce a new command-line flag `-swift-async-fp={auto|always|never}`
that controls how code generation sets the Swift extended async frame
info bit. There are three possibilities:

* `auto`: which determines how to set the bit based on deployment target, either
statically or dynamically via `swift_async_extendedFramePointerFlags`.
* `always`: the default, always set the bit statically, regardless of deployment
target.
* `never`: never set the bit, regardless of deployment target.

Patch by Doug Gregor <dgregor@apple.com>

Reviewed By: doug.gregor

Differential Revision: https://reviews.llvm.org/D109392
2021-09-16 06:57:45 -07:00
Konstantin Schwarz
d2e66d7fa4 [GlobalISel] Add a combine for and(load , mask) -> zextload
This only handles simple masks, not shifted masks, for now.

Reviewed By: aemerson

Differential Revision: https://reviews.llvm.org/D109357
2021-09-16 10:42:46 +02:00
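Why `and(load, mask)` can become a zero-extending load is easy to check on plain bytes. The sketch below is not the GlobalISel combine itself: it assumes little-endian byte order, models memory as a byte array, and treats only `0xFF` and `0xFFFF` as "simple" masks. All function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Full 32-bit little-endian load, written out byte by byte so the result
// does not depend on the host's endianness.
std::uint32_t load32LE(const std::uint8_t *P) {
  return std::uint32_t(P[0]) | (std::uint32_t(P[1]) << 8) |
         (std::uint32_t(P[2]) << 16) | (std::uint32_t(P[3]) << 24);
}

// Narrow little-endian load of NumBytes, zero-extended to 32 bits.
std::uint32_t zextLoadLE(const std::uint8_t *P, unsigned NumBytes) {
  assert(NumBytes >= 1 && NumBytes <= 4);
  std::uint32_t V = 0;
  for (unsigned I = 0; I < NumBytes; ++I)
    V |= std::uint32_t(P[I]) << (8 * I);
  return V;
}

// Number of low bytes kept by a simple mask, or 0 for any other mask
// (including shifted masks such as 0xFF00).
unsigned simpleMaskBytes(std::uint32_t Mask) {
  if (Mask == 0xFFu)
    return 1;
  if (Mask == 0xFFFFu)
    return 2;
  return 0;
}

// and(load, mask): use the narrow zero-extending load when the mask allows
// it, otherwise fall back to the wide load followed by the AND.
std::uint32_t andOfLoad(const std::uint8_t *P, std::uint32_t Mask) {
  if (unsigned Bytes = simpleMaskBytes(Mask))
    return zextLoadLE(P, Bytes);
  return load32LE(P) & Mask;
}
```

For the bytes `78 56 34 12`, masking with `0xFF` or `0xFFFF` gives the same value through either path, and a shifted mask such as `0xFF00` takes the fallback.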
Sam Parker
c98a8a09b5 [HardwareLoops] Loop guard intrinsic to recognise zext
If a loop count was initially represented by a 32-bit unsigned int in C
then the hardware-loop pass can recognise the loop guard and insert
the llvm.test.set.loop.iterations intrinsic. If it was instead an
unsigned short/char then clang inserts a zext instruction to expand
the loop count to an i32. This patch adds the necessary pattern
matching to enable the use of llvm.test.set.loop.iterations in those
cases.

Patch by: sherwin-dc

Differential Revision: https://reviews.llvm.org/D109631
2021-09-16 08:33:16 +01:00
Alok Kumar Sharma
a5b72abc9e [DebugInfo] Enhance DIImportedEntity to accept children entities
New field `elements` is added to '!DIImportedEntity', representing
list of aliased entities.
This is needed to dump optimized debugging information where all names
in a module are imported, but a few names are imported with overriding
aliases.

Reviewed By: dblaikie

Differential Revision: https://reviews.llvm.org/D109343
2021-09-16 10:41:55 +05:30
Ahmed Bougacha
94a2f9cdb6 [GlobalISel] Fix CombinerHelper::isPredecessor for same def/use MI.
The doc comment for isPredecessor says:
  Returns true if \p DefMI precedes \p UseMI or they are the same
  instruction.
And dominates relies on that behavior for its own:
  Returns true if \p DefMI dominates \p UseMI. By definition an
  instruction dominates itself.

Make both statements correct by fixing isPredecessor.
Found by inspection.
2021-09-15 16:45:27 -07:00
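The documented contract quoted in this commit (a predecessor check that also accepts the same instruction) can be sketched on a block modeled as an ordered list of instruction ids. This is a toy model, not the `CombinerHelper` code; the signature below is hypothetical and assumes both ids occur exactly once in the block.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Returns true if DefMI precedes UseMI in Block, or if they are the same
// instruction.
bool isPredecessor(const std::vector<int> &Block, int DefMI, int UseMI) {
  auto DefIt = std::find(Block.begin(), Block.end(), DefMI);
  auto UseIt = std::find(Block.begin(), Block.end(), UseMI);
  assert(DefIt != Block.end() && UseIt != Block.end());
  // "<=" rather than "<" is what makes an instruction its own predecessor.
  return DefIt <= UseIt;
}
```

A dominance query built on top of this then satisfies "an instruction dominates itself" without a special case.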
Matt Arsenault
87c00878d3 SplitKit: Remove decade old live interval hack
This was trying to fixup broken live intervals coming out of the
coalescer. The verifier is more complete now and no tests seem to fail
without this.
2021-09-15 17:35:59 -04:00
Amara Emerson
5ec1845cad [AArch64][GlobalISel] Add a new reassociation for G_PTR_ADDs.
(G_PTR_ADD (G_PTR_ADD X, C), Y) -> (G_PTR_ADD (G_PTR_ADD X, Y), C)

Improves CTMark -Os on AArch64:

Program            before after  diff
           sqlite3 286932 287024  0.0%
                kc 432512 432508 -0.0%
             SPASS 412788 412764 -0.0%
    pairlocalalign 249460 249416 -0.0%
            bullet 475740 475512 -0.0%
    7zip-benchmark 568864 568356 -0.1%
  consumer-typeset 419088 418648 -0.1%
        tramp3d-v4 367628 367224 -0.1%
          clamscan 383184 382732 -0.1%
            lencod 430028 429284 -0.2%
Geomean difference               -0.1%

Differential Revision: https://reviews.llvm.org/D109528
2021-09-14 23:57:41 -07:00
Matt Arsenault
54d755a034 DAG: Fix incorrect folding of fmul -1 to fneg
The fmul is a canonicalizing operation, and fneg is not, so this would
break denormals that need flushing and also would not quiet signaling
NaNs. Fold to fsub instead, which is also canonicalizing.
2021-09-14 21:25:02 -04:00
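The difference between a sign-bit flip and a canonicalizing negation can be shown at the bit level. The sketch below is a model, not the DAG combine: it assumes IEEE-754 binary32 with the quiet bit as the most significant mantissa bit, and `canonicalNegBits` is a hypothetical stand-in for what a canonicalizing operation such as fmul or fsub does to special inputs.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint32_t SignBit = 0x80000000u;
constexpr std::uint32_t ExpMask = 0x7F800000u;
constexpr std::uint32_t MantMask = 0x007FFFFFu;
constexpr std::uint32_t QuietBit = 0x00400000u;

bool isNaNBits(std::uint32_t Bits) {
  return (Bits & ExpMask) == ExpMask && (Bits & MantMask) != 0;
}

bool isSignalingNaNBits(std::uint32_t Bits) {
  return isNaNBits(Bits) && (Bits & QuietBit) == 0;
}

bool isDenormalBits(std::uint32_t Bits) {
  return (Bits & ExpMask) == 0 && (Bits & MantMask) != 0;
}

// fneg: flips the sign bit and touches nothing else.
std::uint32_t fnegBits(std::uint32_t Bits) { return Bits ^ SignBit; }

// Modeled canonicalizing negation: quiets NaNs and, when FlushDenormals is
// set, flushes a denormal input to zero before negating.
std::uint32_t canonicalNegBits(std::uint32_t Bits, bool FlushDenormals) {
  if (isNaNBits(Bits))
    return Bits | QuietBit;
  if (FlushDenormals && isDenormalBits(Bits))
    Bits &= SignBit; // keep only the sign: +0.0 or -0.0
  return Bits ^ SignBit;
}
```

For ordinary values the two functions agree; they differ exactly on signaling NaNs and on denormals in a flushing mode, which are the cases the commit message names.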
Matt Arsenault
4a36e96c3f RegAllocGreedy: Account for reserved registers in num regs heuristic
This simple heuristic uses the estimated live range length combined
with the number of registers in the class to switch which heuristic to
use. This was taking the raw number of registers in the class, even
though not all of them may be available. AMDGPU heavily relies on
dynamically reserved numbers of registers based on user attributes to
satisfy occupancy constraints, so the raw number is highly misleading.

There are still a few problems here. In the original testcase that
made me notice this, the live range size is incorrect after the
scheduler rearranges instructions, since the instructions don't have
the original InstrDist offsets. Additionally, I think it would be more
appropriate to use the number of disjointly allocatable registers in
the class. For the AMDGPU register tuples, there are a large number of
registers in each tuple class, but only a small fraction can actually
be allocated at the same time since they all overlap with each
other. It seems we do not have a query that corresponds to the number
of independently allocatable registers. Relatedly, I'm still debugging
some allocation failures where overlapping tuples seem to not be
handled correctly.

The test changes are mostly noise. There are a handful of x86 tests
that look like regressions with an additional spill, and a handful
that now avoid a spill. The worst looking regression is likely
test/Thumb2/mve-vld4.ll which introduces a few additional
spills. test/CodeGen/AMDGPU/soft-clause-exceeds-register-budget.ll
shows a massive improvement by completely eliminating a large number
of spills inside a loop.
2021-09-14 21:00:29 -04:00
Bjorn Pettersson
cd2bff1ef1 [StackColoring] Fix a debug invariance problem
Ignore dbg instructions when collecting stack slot markers. This is
to make sure the coloring is invariant regarding presence of dbg
instructions (even in cases when the dbg instructions might be
badly placed in the input).

Differential Revision: https://reviews.llvm.org/D109758
2021-09-14 19:21:56 +02:00
vnalamot
726b5d3416 [RegScavenger][NFC] Refer to the already initialized local variable for spill slot index
Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D109501
2021-09-13 21:55:33 +05:30
Simon Pilgrim
9db20822f7 [APInt] Add APIntOps::ScaleBitMask helper
APInt is used to describe a bit mask in a variety of value tracking and demanded bits/elts functions.

When traversing through dst/src operands, we have a number of places where these masks need to be widened/narrowed to translate through bitcasts, reductions etc. to a different type.

This patch adds an APIntOps::ScaleBitMask common helper, adds unit test coverage, and updates a number of cases to use the helper instead of their own implementation.

This came up on D109065 where we currently have to add yet another implementation of the same code.

Differential Revision: https://reviews.llvm.org/D109683
2021-09-13 16:27:12 +01:00
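The widening and narrowing of a bit mask can be sketched on a plain 64-bit integer. This is a simplified model and not the APInt implementation: the function `scaleBitMask` and its explicit width parameters are hypothetical, and the semantics (replicate each bit when widening, OR neighboring bits when narrowing) are an assumption about how the helper behaves.

```cpp
#include <cassert>
#include <cstdint>

// Scales a mask of OldWidth bits to NewWidth bits. One width must be a
// whole multiple of the other, and both must fit in 64 bits.
std::uint64_t scaleBitMask(std::uint64_t Mask, unsigned OldWidth,
                           unsigned NewWidth) {
  assert(OldWidth > 0 && OldWidth <= 64 && NewWidth > 0 && NewWidth <= 64);
  if (OldWidth == NewWidth)
    return Mask;

  std::uint64_t Result = 0;
  if (NewWidth > OldWidth) {
    // Widening: every set source bit becomes a run of Scale set bits.
    assert(NewWidth % OldWidth == 0);
    unsigned Scale = NewWidth / OldWidth;
    for (unsigned I = 0; I < OldWidth; ++I)
      if ((Mask >> I) & 1)
        for (unsigned J = 0; J < Scale; ++J)
          Result |= std::uint64_t(1) << (I * Scale + J);
  } else {
    // Narrowing: a result bit is set if any bit of its source group is set.
    assert(OldWidth % NewWidth == 0);
    unsigned Scale = OldWidth / NewWidth;
    for (unsigned I = 0; I < NewWidth; ++I)
      for (unsigned J = 0; J < Scale; ++J)
        if ((Mask >> (I * Scale + J)) & 1) {
          Result |= std::uint64_t(1) << I;
          break;
        }
  }
  return Result;
}
```

Under this model, widening `0b0101` from 4 to 8 bits gives `0b00110011`, and narrowing `0b00011011` from 8 to 4 bits gives `0b0111`.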
vnalamot
0fc3ebb70a [SelectionDAG][NFC] Fix typo in VerifyDAGDiverence() function name
Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D109674
2021-09-13 20:48:04 +05:30
David Truby
915e9e76bf [llvm][sve] Lowering for VLS masked extending loads
This extends the custom lowering for extending loads on
fixed length vectors in SVE to support masked extending loads.

The existing tests for correct behaviour of masked extending loads
exhibit bad code generation due to the legalisation of i1 vectors.
They have been left as-is and new tests have been added that do not
exhibit this behaviour.

Differential Revision: https://reviews.llvm.org/D108200
2021-09-13 11:13:25 +01:00
Nikita Popov
4189e5fe12 [CGP] Support opaque pointers in address mode fold
Rather than inspecting the pointer element type, use the access
type of the load/store/atomicrmw/cmpxchg.

In the process of doing this, simplify the logic by storing the
address + type in MemoryUses, rather than an Instruction + Operand
pair (which was then used to fetch the address).
2021-09-12 17:43:37 +02:00