32115 Commits

Author SHA1 Message Date
Kazu Hirata
627b5bda11 [llvm] Add missing header guards (NFC)
Identified with llvm-header-guard.
2021-01-30 09:53:42 -08:00
Kazu Hirata
1a2d67fa23 [llvm] Use llvm::lower_bound and llvm::upper_bound (NFC) 2021-01-29 23:23:36 -08:00
Sriraman Tallam
c32f399802 Detect Source Drift with Propeller.
Source Drift happens when the sources are updated after profiling the binary
but before building the final optimized binary. If the source has changed since
the profiles were obtained, optimizing basic blocks might be sub-optimal. This
only applies to BasicBlockSection::List as it creates clusters of basic blocks
using basic block ids. Source drift can invalidate these groupings leading to
sub-optimal code generation with regards to performance.

PGO source drift for a particular function can be detected using function
metadata added in D95495.

When source drift is deected, disable basic block clusters by default
which can be re-enabled with  -mllvm option
bbsections-detect-source-drift=false.

Differential Revision: https://reviews.llvm.org/D95593
2021-01-29 18:47:26 -08:00
Roman Lebedev
ddc4b56eef
[ExpandMemCmpPass] Preserve Dominator Tree, if available
This finishes getting rid of all the avoidable Dominator Tree recalculations
in X86 optimized codegen pipeline.
2021-01-30 01:14:51 +03:00
Roman Lebedev
c2534a7097
[ShadowStackGCLowering] Preserve Dominator Tree, if avaliable
This doesn't help avoid any Dominator Tree recalculations just yet,
there's one more pass to go..
2021-01-30 01:14:51 +03:00
Jessica Paquette
d6656c3b25 [GlobalISel] Remove hint instructions in generic InstructionSelect code.
I think every target will want to remove these in the same way. Rather than
making them all implement the same code, let's just put this in
InstructionSelect.

Differential Revision: https://reviews.llvm.org/D95652
2021-01-29 11:20:07 -08:00
Jay Foad
5cf6412a27 [GlobalISel] Fix modifying a G_OR without notifying the observer
Remove the call to setFlags in favour of creating the instruction with
the correct flags in the first place, so we don't have to explicitly
notify the observer.

Differential Revision: https://reviews.llvm.org/D95681
2021-01-29 16:32:24 +00:00
Sjoerd Meijer
f03f3a8474 [MachineLICM] Fix wrong and confusing comment. NFC. 2021-01-29 13:39:07 +00:00
Florian Hahn
f3a710cade [LTO] Update splitCodeGen to take a reference to the module. (NFC)
splitCodeGen does not need to take ownership of the module, as it
currently clones the original module for each split operation.

There is an ~4 year old fixme to change that, but until this is
addressed, the function can just take a reference to the module.

This makes the transition of LTOCodeGenerator to use LTOBackend a bit
easier, because under some circumstances, LTOCodeGenerator needs to
write the original module back after codegen.

Reviewed By: tejohnson

Differential Revision: https://reviews.llvm.org/D95222
2021-01-29 11:53:11 +00:00
Kazu Hirata
7925aa091d [llvm] Populate SmallVector at construction time (NFC) 2021-01-28 22:21:14 -08:00
Wei Mi
e15ae67a0a [LiveDebugVariables] Add cache for SkipPHIsLabelsAndDebug to prevent
iterating the same PHI/LABEL/Debug instructions repeatedly.

We run into a compiling timeout problem when building a target after its
SampleFDO profile is updated. It is because some very large blocks with
a bunch of PHIs at the beginning. LiveDebugVariables::emitDebugValues
called during VirtRegRewriter phase searchs the insertion point for those
large BBs repeatedly in SkipPHIsLabelsAndDebug, and each time
SkipPHIsLabelsAndDebug needs to go through the same set of PHIs before it
can find the first non PHI/Label/Debug instruction. This patch adds a cache
to save the last position for the sequence which has been checked in the
previous call of SkipPHIsLabelsAndDebug.

Differential Revision: https://reviews.llvm.org/D94981
2021-01-28 21:58:17 -08:00
Christudasan Devadasan
892e4567e1 Support a list of CostPerUse values
This patch allows targets to define multiple cost
values for each register so that the cost model
can be more flexible and better used during the
register allocation as per the target requirements.

For AMDGPU the VGPR allocation will be more efficient
if the register cost can be associated dynamically
based on the calling convention.

Reviewed By: qcolombet

Differential Revision: https://reviews.llvm.org/D86836
2021-01-29 10:14:52 +05:30
Jessica Paquette
d5736a2746 [GlobalISel] Implement regbankselect for G_ASSERT_ZEXT
This adds generic regbankselect support for G_ASSERT_ZEXT.

It inherits whatever register bank the source was given, always, on all targets.

I think that at the point where we run into these, the source register bank
should be decided.

This also adds some AArch64-specific code which makes sure we can handle
G_ASSERT_ZEXT when deciding on register banks for G_STORE, G_PHI, ... etc.

Differential Revision: https://reviews.llvm.org/D95649
2021-01-28 16:56:14 -08:00
Jessica Paquette
f19971d1de [GlobalISel] Implement computeKnownBits for G_ASSERT_ZEXT
It's the same as the ZEXT/TRUNC case, except SrcBitWidth is given by the
immediate operand.

Update KnownBitsTest.cpp and a MIR test for a concrete example.

Differential Revision: https://reviews.llvm.org/D95566
2021-01-28 16:34:34 -08:00
Jessica Paquette
daffab1985 Recommit "[GlobalISel] Walk through hints in getDefIgnoringCopies et al"
Recommit of 4580acf6752ea3cc884657b5aa3e174bed86fc8c

`Opc = DefMI->getOpcode()` was in the wrong place.
2021-01-28 14:43:00 -08:00
Jessica Paquette
dcb5b5f1f2 Revert "[GlobalISel] Walk through hints in getDefIgnoringCopies et al"
This reverts commit 4580acf6752ea3cc884657b5aa3e174bed86fc8c.

Reverting while looking into some test failures.
2021-01-28 14:37:57 -08:00
Jessica Paquette
4580acf675 [GlobalISel] Walk through hints in getDefIgnoringCopies et al
Treat hint instructions like G_ASSERT_ZEXT like COPY instructions in helpers
which walk through copies.

This ensures that instructions like G_ASSERT_ZEXT won't impact any optimizations
that rely on these helpers.

Differential Revision: https://reviews.llvm.org/D95577
2021-01-28 14:27:00 -08:00
Cassie Jones
f22f4557a7 [GlobalISel] Implement widenScalar for carry-in add/sub
These are widened to a wider UADDE/USUBE, with the overflow value
unused, and with the same synthesis of a new overflow value as for the
O operations.

Reviewed By: paquette

Differential Revision: https://reviews.llvm.org/D95326
2021-01-28 17:06:24 -05:00
Jessica Paquette
24261729a4 [GlobalISel] Add G_ASSERT_ZEXT
This adds a generic opcode which communicates that a type has already been
zero-extended from a narrower type.

This is intended to be similar to AssertZext in SelectionDAG.

For example,

```
%x_was_extended:_(s64) = G_ASSERT_ZEXT %x, 16
```

Signifies that the top 48 bits of %x are known to be 0.

This is useful in cases like this:

```
define i1 @zeroext_param(i8 zeroext %x) {
  %cmp = icmp ult i8 %x, -20
  ret i1 %cmp
}
```

In AArch64, `%x` must use a 32-bit register, which is then truncated to a 8-bit
value.

If we know that `%x` is already zero-ed out in the relevant high bits, we can
avoid the truncate.

Currently, in GISel, this looks like this:

```
_zeroext_param:
  and w8, w0, #0xff ; We don't actually need this!
  cmp w8, #236
  cset w0, lo
  ret
```

While SDAG does not produce the truncation, since it knows that it's
unnecessary:

```
_zeroext_param:
  cmp w0, #236
  cset w0, lo
  ret
```

This patch

- Adds G_ASSERT_ZEXT
- Adds MIRBuilder support for it
- Adds MachineVerifier support for it
- Documents it

It also puts G_ASSERT_ZEXT into its own class of "hint instruction." (There
should be a G_ASSERT_SEXT in the future, maybe a G_ASSERT_ALIGN as well.)

This allows us to skip over hints in the legalizer etc. These can then later
be selected like COPY instructions or removed.

Differential Revision: https://reviews.llvm.org/D95564
2021-01-28 13:58:37 -08:00
David Blaikie
85b7b5625a Fix memory leak in 4318028cd2d7633a0cdeb0b5d4d2ed81fab87864 2021-01-28 12:08:23 -08:00
David Blaikie
4318028cd2 DebugInfo: Add a DWARF FORM extension for addrx+offset references to reduce relocations
This is an alternative to the use of complex DWARF expressions for
addresses - shaving off a few extra bytes of expression overhead.
2021-01-28 10:20:02 -08:00
Shaurya Gupta
e29552c5af Revert "[DWARF] Create subprogram's DIE in DISubprogram's unit"
This reverts commit ef0dcb506300dc9644e8000c6028d14214be9d97.

This change is causing a lot of compiler crashes inside, sorry I don't have a
small repro/stacktrace with symbols to share right now.

Differential Revision: https://reviews.llvm.org/D95622
2021-01-28 16:39:01 +00:00
Roman Lebedev
6617529a1d
[CodeGen][DwarfEHPrepare] Preserve Dominator Tree
Now that D94827 has flipped the switch, and SimplifyCFG is officially marked
as production-ready regarding Dominator Tree preservation,
we can update this user pass to also preserve Dominator Tree.

This is a geomean compile-time win of `-0.05%`..`-0.08%`.
https://llvm-compile-time-tracker.com/compare.php?from=51a25846c198cff00abad0936f975167357afa6f&to=082499aac236a5c141e50a9e77870d5be2de5f0b&stat=instructions

Differential Revision: https://reviews.llvm.org/D95548
2021-01-28 14:11:34 +03:00
Tomas Matheson
b9ed8ebe0e [ARM][RegisterScavenging] Don't consider LR liveout if it is not reloaded
https://bugs.llvm.org/show_bug.cgi?id=48232

When PrologEpilogInserter writes callee-saved registers to the stack, LR is not reloaded but is instead loaded directly into PC.
This was not taken into account when determining if each callee-saved register was liveout for the block.
When frame elimination inserts virtual registers, and the register scavenger tries to scavenge LR, it considers it liveout and tries to spill again.
However there is no emergency spill slot to use, and it fails with an error:

    fatal error: error in backend: Error while trying to spill LR from class GPR: Cannot scavenge register without an emergency spill slot!

This patch pervents any callee-saved registers which are not reloaded (including LR) from being marked liveout.
They are therefore available to scavenge without requiring an extra spill.
2021-01-28 09:22:55 +00:00
Kazu Hirata
0da15ea581 [llvm] Use append_range (NFC) 2021-01-27 23:25:41 -08:00
David Blaikie
dd7297e1bf DebugInfo: Fix bug in addr+offset exprloc to use DWARFv5 addrx op instead of DWARFv4 GNU extension 2021-01-27 18:39:44 -08:00
Roman Lebedev
7e88942d25
[CodeGen] IndirectBrExpandPass: preserve Dominator Tree, if available
This fully de-pessimizes the common case of no indirectbr's,
(where we don't actually need to do anything to preserve domtree)
and avoids domtree recomputation in the case there were indirectbr's.

Note that two indirectbr's could have a common successor, and not all
successors of an indirectbr's are meant to survive the expansion.

Though, the code assumes that an indirectbr's doesn't have
duplicate successors, those *should* have been deduplicated
by simplifycfg or something already.
2021-01-28 01:58:53 +03:00
David Blaikie
7e6c87ee04 DebugInfo: Deduplicate addresses in debug_addr
Experimental, using non-existent DWARF support to use an expr for the
location involving an addr_index (to compute address + offset so
addresses can be reused in more places).

The global variable debug info had to be deferred until the end of the
module (so bss variables would all be emitted first - so their labels
would have the relevant section). Non-bss variables seemed to not have
their label assigned to a section even at the end of the module, so I
didn't know what to do there.

Also, the hashing code is broken - doesn't know how to hash these
expressions (& isn't hashing anything inside subprograms, which seems
problematic), so for test purposes this change just skips the hash
computation. (GCC's actually overly sensitive in its hash function, it
seems - I'm forgetting the specific case right now - anyway, we might
want to just use the frontend-known file hash and give up on optimistic
.dwo/.dwp reuse)
2021-01-27 14:00:43 -08:00
Craig Topper
0b50fa9945 [FaultsMaps][llvm-objdump] Move FaultMapParser to Object/. Remove CodeGen dependency from llvm-objdump
FaultsMapParser lived in CodeGen and was forcing llvm-objdump to
link CodeGen and everything CodeGen depends on.

This was previously attempted in r240364 to fix a link failure.
The CodeGen dependency was independently added to fix the same
link failure, and that ended up being kept.

Removing the dependency seems like the correct layering for
llvm-objdump.

Reviewed By: MaskRay, jhenderson

Differential Revision: https://reviews.llvm.org/D95414
2021-01-27 10:39:59 -08:00
Simon Pilgrim
5ded5ab78f ExecutionDomainFix.cpp - use const refs in for-range loops. NFCI.
Avoid unnecessary copies. Reported by clang-tidy.
2021-01-27 15:39:32 +00:00
Roman Lebedev
51a25846c1
[CodeGen] SafeStack: preserve DominatorTree if it is avaliable
While this is mostly NFC right now, because only ARM happens
to run this pass with DomTree available before it,
and required after it, more backends will be affected once
the SimplifyCFG's switch for domtree preservation is flipped,
and DwarfEHPrepare also preserves the domtree.
2021-01-27 18:32:35 +03:00
Roman Lebedev
4de3bdd65f
[NFC] StackProtector: be consistent and to initialize DominatorTreeWrapperPass
We already ask for it, so it might be good to ensure that it is
actually initialized before us. Doesn't seem to matter in practice though.
2021-01-27 18:32:35 +03:00
Jeremy Morse
ef0dcb5063 [DWARF] Create subprogram's DIE in DISubprogram's unit
This is a fix for PR48790. Over in D70350, subprogram DIEs were permitted
to be shared between CUs. However, the creation of a subprogram DIE can be
triggered early, from other CUs. The subprogram definition is then created
in one CU, and when the function is actually emitted children are attached
to the subprogram that expect to be in another CU. This breaks internal CU
references in the children.

Fix this by redirecting the creation of subprogram DIEs in
getOrCreateContextDIE to the CU specified by it's DISubprogram definition.
This ensures that the subprogram DIE is always created in the correct CU.

Differential Revision: https://reviews.llvm.org/D94976
2021-01-27 12:36:14 +00:00
Sjoerd Meijer
48ecba350e [MachineLICM][MachineSink] Move SinkIntoLoop to MachineSink.
This moves SinkIntoLoop from MachineLICM to MachineSink. The motivation for
this work is that hoisting is a canonicalisation transformation, but we do not
really have a good story to sink instructions back if that is better, e.g. to
reduce live-ranges, register pressure and spilling. This has been discussed a
few times on the list, the latest thread is:

https://lists.llvm.org/pipermail/llvm-dev/2020-December/147184.html

There it was pointed out that we have the LoopSink IR pass, but that works on
IR, lacks register pressure informatiom, and is focused on profile guided
optimisations, and then we have MachineLICM and MachineSink that both perform
sinking. MachineLICM is more about hoisting and CSE'ing of hoisted
instructions. It also contained a very incomplete and disabled-by-default
SinkIntoLoop feature, which we now move to MachineSink.

Getting loop-sinking to do something useful is going to be at least a 3-step
approach:

1) This is just moving the code and is almost a NFC, but contains a bug fix.
This uses helper function `isLoopInvariant` that was factored out in D94082 and
added to MachineLoop.
2) A first functional change to make loop-sink a little bit less restrictive,
which it really is at the moment, is the change in D94308. This lets it do
more (alias) analysis using functions in MachineSink, making it a bit more
powerful. Nothing changes much: still off by default. But it shows that
MachineSink is a better home for this, and it starts using its functionality
like `hasStoreBetween`, and in the next step we can use `isProfitableToSinkTo`.
3) This is the going to be he interesting step: decision making when and how
many instructions to sink. This will be driven by the register pressure, and
deciding if reducing live-ranges and loop sinking will help in better
performance.
4) Once we are happy with 3), this should be enabled by default, that should be
the end goal of this exercise.

Differential Revision: https://reviews.llvm.org/D93694
2021-01-27 10:49:56 +00:00
Jessica Paquette
f36007e811 [GlobalISel] Implement computeKnownBits for G_SEXT_INREG
Just use the existing `Known.sextInReg` implementation.

- Update KnownBitsTest.cpp.
- Update combine-redundant-and.mir for a more concrete example.

Differential Revision: https://reviews.llvm.org/D95484
2021-01-26 15:01:38 -08:00
Amara Emerson
cbed865e1e [GlobalISel][IRTranslator] Ignore the llvm.experimental.noalias.scope.decl intrinsic.
These don't generate any code.
2021-01-26 13:04:11 -08:00
Fangrui Song
34b60d8a56 Add -fbinutils-version= to gate ELF features on the specified binutils version
There are two use cases.

Assembler
We have accrued some code gated on MCAsmInfo::useIntegratedAssembler().  Some
features are supported by latest GNU as, but we have to use
MCAsmInfo::useIntegratedAs() because the newer versions have not been widely
adopted (e.g. SHF_LINK_ORDER 'o' and 'unique' linkage in 2.35, --compress-debug-sections= in 2.26).

Linker
We want to use features supported only by LLD or very new GNU ld, or don't want
to work around older GNU ld. We currently can't represent that "we don't care
about old GNU ld".  You can find such workarounds in a few other places, e.g.
Mips/MipsAsmprinter.cpp PowerPC/PPCTOCRegDeps.cpp X86/X86MCInstrLower.cpp
AArch64 TLS workaround for R_AARCH64_TLSLD_MOVW_DTPREL_* (PR ld/18276),
R_AARCH64_TLSLE_LDST8_TPREL_LO12 (https://bugs.llvm.org/show_bug.cgi?id=36727 https://sourceware.org/bugzilla/show_bug.cgi?id=22969)

Mixed SHF_LINK_ORDER and non-SHF_LINK_ORDER components (supported by LLD in D84001;
GNU ld feature request https://sourceware.org/bugzilla/show_bug.cgi?id=16833 may take a while before available).
This feature allows to garbage collect some unused sections (e.g. fragmented .gcc_except_table).

This patch adds `-fbinutils-version=` to clang and `-binutils-version` to llc.
It changes one codegen place in SHF_MERGE to demonstrate its usage.
`-fbinutils-version=2.35` means the produced object file does not care about GNU
ld<2.35 compatibility. When `-fno-integrated-as` is specified, the produced
assembly can be consumed by GNU as>=2.35, but older versions may not work.

`-fbinutils-version=none` means that we can use all ELF features, regardless of
GNU as/ld support.

Both clang and llc need `parseBinutilsVersion`. Such command line parsing is
usually implemented in `llvm/lib/CodeGen/CommandFlags.cpp` (LLVMCodeGen),
however, ClangCodeGen does not depend on LLVMCodeGen. So I add
`parseBinutilsVersion` to `llvm/lib/Target/TargetMachine.cpp` (LLVMTarget).

Differential Revision: https://reviews.llvm.org/D85474
2021-01-26 12:28:23 -08:00
Freddy Ye
b3b0acdc6f [NFC] Refine some uninitialized used variables.
These warning are reported by static code analysis tool: Klocwork

Reviewed By: pengfei

Differential Revision: https://reviews.llvm.org/D95421
2021-01-26 16:51:05 +08:00
Amara Emerson
03bce0bf4e [GlobalISel][Localizer] Don't localize phi operands which are used more than once in the phi.
The current algorithm just tries to localize defs as far as they can go, and in
the case of G_PHI operands, it clones the def into the predecessor block for
each incoming edge. When multiple edges have the same register value, this can
cause unnecessary code bloat, and inhibit later optimizations.

This change checks if a given phi operand is unique in the phi, if not the
def of that register is not localized to the predecessor.

Differential Revision: https://reviews.llvm.org/D95406
2021-01-25 17:48:04 -08:00
Craig Topper
ea87cf2acd [TargetLowering][RISCV] Don't transform (seteq/ne (sext_inreg X, VT), C1) -> (seteq/ne (zext_inreg X, VT), C1) if the sext_inreg is cheaper
RISCV has to use 2 shifts for (i64 (zext_inreg X, i32)), but we
can use addiw rd, rs1, x0 for sext_inreg. We already understood this
when type legalizing i32 seteq/ne on rv64. But this transform in
SimplifySetCC would sometimes undo it.

Reviewed By: luismarques

Differential Revision: https://reviews.llvm.org/D95289
2021-01-25 16:37:21 -08:00
David Blaikie
70e251497c DebugInfo: Generalize the .debug_addr minimization flag to pave the way for including other strategies 2021-01-25 16:24:35 -08:00
Mitch Phillips
c9466ede7e Revert "Revert "[GlobalISel] LegalizerHelper - Extract widenScalarAddoSubo method""
This reverts commit 554b3211fefd09b56b64357b9edd66c78ae200b5.

Differential Revision: https://reviews.llvm.org/D95035
2021-01-25 16:22:22 -08:00
Cassie Jones
aa8f3677f7 Recommit "[AArch64][GlobalISel] Implement widenScalar for signed overflow"
Implement widening for G_SADDO and G_SSUBO.
Add legalize-add/sub tests for narrow overflowing add/sub on AArch64.

Differential Revision: https://reviews.llvm.org/D95034
2021-01-25 16:57:20 -05:00
Fraser Cormack
fde2466171 [SelectionDAG] Support scalable-vector splats in more cases
This patch adds support for scalable-vector splats in DAGCombiner's
`isConstantOrConstantVector` and `ISD::matchUnaryPredicate` functions,
which enable the SelectionDAG div/rem-by-constant optimizations for
scalable vector types.

It also fixes up one case where the UDIV optimization was generating a
SETCC without first consulting the target for its preferred SETCC result
type.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D94501
2021-01-25 10:58:15 +00:00
Fangrui Song
d745b82de1 [XRay] Support DW_TAG_call_site and delete unneeded PATCHABLE_EVENT_CALL/PATCHABLE_TYPED_EVENT_CALL lowering 2021-01-25 00:49:18 -08:00
Fangrui Song
d5bbaaaf95 [XRay] Make __xray_customevent support non-Linux 2021-01-25 00:48:21 -08:00
QingShan Zhang
ffc3e800c6 [NFC] [DAGCombine] Correct the result for sqrt even the iteration is zero
For now, we correct the result for sqrt if iteration > 0. This doesn't make
sense as they are not strict relative.

Reviewed By: dmgreen, spatel, RKSimon

Differential Revision: https://reviews.llvm.org/D94480
2021-01-25 04:02:44 +00:00
Chen Zheng
0ed4cf4bf3 [PowerPC] support register pressure reduction in machine combiner.
Reassociating some patterns to generate more fma instructions to
reduce register pressure.

Reviewed By: jsji

Differential Revision: https://reviews.llvm.org/D92071
2021-01-24 21:28:21 -05:00
Kazu Hirata
16baad8f4e [llvm] Use pop_back_val (NFC) 2021-01-24 12:18:57 -08:00
Kazu Hirata
d44ca0cf2f [CodeGen] Forward-declare TargetMachine (NFC)
InstrEmitter.h needs TargetMachine but relies on a forward declaration
of TargetMachine in MachineOperand.h.  This patch adds a forward
declaration right in InstrEmitter.h.

While we are at it, this patch removes the one in MachineOperand.h,
where it is unnecessary.
2021-01-24 12:18:54 -08:00