679 Commits

Author SHA1 Message Date
Fangrui Song
a24418375a
[CodeLayout] cache-directed sort: limit max chain size (#69039)
When linking an executable with a slightly larger executable,
ld.lld --call-graph-profile-sort=cdsort can be very slow (see #68638).
```
   4.6%  20.7Mi    .text.hot
   3.5%  15.9Mi    .text
   3.4%  15.2Mi    .text.unknown
```

Add cl option `cdsort-max-chain-size`, which is similar to
`ext-tsp-max-chain-size`, and set it to 128, to improve performance.

In `ld.lld @response.txt --threads=4 --call-graph-profile-sort=cdsort
--time-trace"
builds, the "Total Sort sections" time is measured as follows:

* -mllvm  -cdsort-max-chain-size=64: 1.321813
* -mllvm -cdsort-max-chain-size=128: 2.030425
* -mllvm -cdsort-max-chain-size=256: 2.927684
* -mllvm -cdsort-max-chain-size=512: 5.493106
* unlimited: 9 minutes

The rest part takes 6.8s.
2023-10-22 16:50:03 -07:00
Dominik Adamski
eee8dd9088
[CodeExtractor] Allow to use 0 addr space for aggregate arg (#66998)
The user of CodeExtractor should be able to specify that
the aggregate argument should be passed as a pointer in zero address
space.

CodeExtractor is used to generate outlined functions required by OpenMP
runtime. The arguments of the outlined functions for OpenMP GPU code
are in 0 address space. 0 address space does not need to be the default
address space for GPU device. That's why there is a need to allow
the user of CodeExtractor to specify, that the allocated aggregate parameter
is passed as pointer in zero address space.
2023-10-18 20:12:31 +02:00
Matthias Braun
5181156b37
Use BlockFrequency type in more places (NFC) (#68266)
The `BlockFrequency` class abstracts `uint64_t` frequency values. Use it
more consistently in various APIs and disable implicit conversion to
make usage more consistent and explicit.

- Use `BlockFrequency Freq` parameter for `setBlockFreq`,
`getProfileCountFromFreq` and `setBlockFreqAndScale` functions.
- Return `BlockFrequency` in `getEntryFreq()` functions.
- While on it change some `const BlockFrequency& Freq` parameters to
plain `BlockFreqency Freq`.
- Mark `BlockFrequency(uint64_t)` constructor as explicit.
- Add missing `BlockFrequency::operator!=`.
- Remove `uint64_t BlockFreqency::getMaxFrequency()`.
- Add `BlockFrequency BlockFrequency::max()` function.
2023-10-05 11:40:17 -07:00
Kazu Hirata
3b34c117db [llvm] Remove unused using decls (NFC)
Identified with misc-unused-using-decls.
2023-10-03 23:21:50 -07:00
Hans Wennborg
eee1f7cef8 Revert "[DebugMetadata][DwarfDebug] Support function-local types in lexical block scopes (4/7)"
This caused asserts:

  llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp:2331:
  virtual void llvm::DwarfDebug::endFunctionImpl(const llvm::MachineFunction *):
  Assertion `LScopes.getAbstractScopesList().size() == NumAbstractSubprograms &&
  "getOrCreateAbstractScope() inserted an abstract subprogram scope"' failed.

See comment on the code review for reproducer.

> RFC https://discourse.llvm.org/t/rfc-dwarfdebug-fix-and-improve-handling-imported-entities-types-and-static-local-in-subprogram-and-lexical-block-scopes/68544
>
> Similar to imported declarations, the patch tracks function-local types in
> DISubprogram's 'retainedNodes' field. DwarfDebug is adjusted in accordance with
> the aforementioned metadata change and provided a support of function-local
> types scoped within a lexical block.
>
> The patch assumes that DICompileUnit's 'enums field' no longer tracks local
> types and DwarfDebug would assert if any locally-scoped types get placed there.
>
> Reviewed By: jmmartinez
>
> Differential Revision: https://reviews.llvm.org/D144006

This reverts commit f8aab289b5549086062588fba627b0e4d3a5ab15.
2023-09-29 14:23:31 +02:00
Fangrui Song
e705b37a77 [CodeLayout] Add unittest for cache-directed sort
The function reordering algorithm added by https://reviews.llvm.org/D152834 and
used by BOLT (https://reviews.llvm.org/D153039) is untested.

Add some tests at the appropriate layer.

Depends on D159526

Differential Revision: https://reviews.llvm.org/D159527
2023-09-27 10:52:12 -07:00
Vladislav Dzhidzhoev
f8aab289b5 [DebugMetadata][DwarfDebug] Support function-local types in lexical block scopes (4/7)
RFC https://discourse.llvm.org/t/rfc-dwarfdebug-fix-and-improve-handling-imported-entities-types-and-static-local-in-subprogram-and-lexical-block-scopes/68544

Similar to imported declarations, the patch tracks function-local types in
DISubprogram's 'retainedNodes' field. DwarfDebug is adjusted in accordance with
the aforementioned metadata change and provided a support of function-local
types scoped within a lexical block.

The patch assumes that DICompileUnit's 'enums field' no longer tracks local
types and DwarfDebug would assert if any locally-scoped types get placed there.

Reviewed By: jmmartinez

Differential Revision: https://reviews.llvm.org/D144006
2023-09-26 23:07:29 +04:00
Florian Hahn
4b0df112da
[VPlan] Fix invalid IR in unit test input, run verifier.
Some tests were passing invalid IR to the VPlan construction logic. Fix
the invalid IR and run the verifier on the input to avoid issues in the
future.
2023-09-22 21:12:09 +01:00
Nikita Popov
2d8d622c73 [SCEV] Require that addrec operands dominate the loop
SCEVExpander currently has special handling for the case where the
start or the step of an addrec do not dominate the loop header,
which is not used by any lit test.

Initially I thought that this is entirely dead code, because
addrec operands are required to be loop invariant. However,
SCEV currently allows creating an addrec with operands that are
loop invariant but defined *after* the loop.

This doesn't seem like a useful case to allow, and we don't
appear to be using this outside a single easy to adjust unit test.
2023-09-22 09:02:54 +02:00
Bjorn Pettersson
4d5906e0bf [llvm][unittests] Remove unneeded header includes 2023-09-12 18:47:44 +02:00
Mel Chen
26aed5b9a8 [VPlan][LoopUtils] Remove unused parameter TTI
This patch removes the member TTI from VPReductionRecipe, as the
generation of reduction operations no longer requires TTI.

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D158148
2023-09-04 05:30:37 -07:00
Mel Chen
463e7cb892 [LV][VPlan] Refactor VPReductionRecipe to use reference for member RdxDesc
This commit refactors the implementation of VPReductionRecipe to use
reference instead of pointer for member RdxDesc. Because the member
RdxDesc in VPReductionRecipe should not be a nullptr, using a reference
will provide clearer semantics.

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D158058
2023-08-16 19:37:49 -07:00
Bjorn Pettersson
e53b28c833 [llvm] Drop some bitcasts and references related to typed pointers
Differential Revision: https://reviews.llvm.org/D157551
2023-08-10 15:07:07 +02:00
Alexandros Lamprineas
d1b376fd7b [FuncSpec] Rework the discardment logic for unprofitable specializations.
Currently we make an arbitrary comparison between codesize and latency
in order to decide whether to keep a specialization or not. Sometimes
the latency savings are biased in favor of loops because of imprecise
block frequencies, therefore this metric contains a lot of noise. This
patch tries to address the problem as follows:

* Reject specializations whose codesize savings are less than X% of
  the original function size.
* Reject specializations whose latency savings are less than Y% of
  the original function size.
* Reject specializations whose inlining bonus is less than Z% of
  the original function size.

I am not saying this is super precise, but at least X, Y and Z are
configurable, allowing us to tweak the cost model. Moreover, it lets
us prioritize codesize over latency, which is a less noisy metric.

I am also increasing the minimum size a function should have to be
considered a candidate for specialization. Initially the cost of
a function was calculated as

  CodeMetrics::NumInsts * InlineConstants::getInstrCost()

which later in D150464 was altered into CodeMetrics::NumInsts since
the metric is supposed to model TargetTransformInfo::TCK_CodeSize.
However, we omitted adjusting MinFunctionSize in that commit.

Differential Revision: https://reviews.llvm.org/D157123
2023-08-09 10:28:46 +01:00
Florian Hahn
93c5bae00e
[VPlan] Use printOperands for VPInstruction.
Use the printOperands for printing VPInstruction's operands to be more
in line with other recipes and ensure consistent printing after D15719.

Also removes some stray spaces in print output.
2023-08-08 11:31:21 +01:00
Alexandros Lamprineas
c2d19002ae [FuncSpec] Estimate dead blocks more accurately.
Currently we only consider basic blocks with a unique predecessor when
estimating the size of dead code. However, we could expand to this to
consider blocks with a back-edge, or blocks preceded by dead blocks.

Differential Revision: https://reviews.llvm.org/D156903
2023-08-07 11:04:23 +01:00
Alexandros Lamprineas
5bfefff1c4 Reland [FuncSpec] Split the specialization bonus into CodeSize and Latency.
Currently we use a combined metric TargetTransformInfo::TCK_SizeAndLatency
when estimating the specialization bonus. This is suboptimal, and in some
cases erroneous. For example we shouldn't be weighting the codesize decrease
attributed to constant propagation by the block frequency of the dead code.
Instead only the latency savings should be weighted by block frequency. The
total codesize savings from all the specialization arguments should be
deducted from the specialization cost.

Differential Revision: https://reviews.llvm.org/D155103
2023-08-02 12:41:13 +01:00
Bjorn Pettersson
fd05c34b18 Stop using legacy helpers indicating typed pointer types. NFC
Since we no longer support typed LLVM IR pointer types, the code can
be simplified into for example using PointerType::get directly instead
of using Type::getInt8PtrTy and Type::getInt32PtrTy etc.

Differential Revision: https://reviews.llvm.org/D156733
2023-08-02 12:08:37 +02:00
Alexandros Lamprineas
893d3a61c0 Reland [FuncSpec] Add Phi nodes to the InstCostVisitor.
This patch allows constant folding of PHIs when estimating the user
bonus. Phi nodes are a special case since some of their inputs may
remain unresolved until all the specialization arguments have been
processed by the InstCostVisitor. Therefore, we keep a list of dead
basic blocks and then lazily visit the Phi nodes once the user bonus
has been computed for all the specialization arguments.

Differential Revision: https://reviews.llvm.org/D154852
2023-07-31 08:25:48 +01:00
Douglas Yung
32683b231e Revert "[FuncSpec] Add Phi nodes to the InstCostVisitor."
This reverts commit 96ff464dd3aac255adc52787a1e28487a9cd4c35.

The test in this change was failing on many buildbots:

https://lab.llvm.org/buildbot/#/builders/164/builds/41292
https://lab.llvm.org/buildbot/#/builders/258/builds/4491
https://lab.llvm.org/buildbot/#/builders/192/builds/3566
https://lab.llvm.org/buildbot/#/builders/123/builds/20411
https://lab.llvm.org/buildbot/#/builders/58/builds/42553
https://lab.llvm.org/buildbot/#/builders/247/builds/7037
https://lab.llvm.org/buildbot/#/builders/139/builds/46259
https://lab.llvm.org/buildbot/#/builders/216/builds/24650
https://lab.llvm.org/buildbot/#/builders/234/builds/12571
https://lab.llvm.org/buildbot/#/builders/232/builds/12574
https://lab.llvm.org/buildbot/#/builders/235/builds/975
2023-07-27 13:47:52 -07:00
Alexandros Lamprineas
96ff464dd3 [FuncSpec] Add Phi nodes to the InstCostVisitor.
This patch allows constant folding of PHIs when estimating the user
bonus. Phi nodes are a special case since some of their inputs may
remain unresolved until all the specialization arguments have been
processed by the InstCostVisitor. Therefore, we keep a list of dead
basic blocks and then lazily visit the Phi nodes once the user bonus
has been computed for all the specialization arguments.

In addition to the last revision this one fixes the bug reported on
Phabricator.

Differential Revision: https://reviews.llvm.org/D154852
2023-07-27 19:24:11 +01:00
Alexandros Lamprineas
2e00eba232 [FuncSpec][NFC] Remove SSA copy intrinsics in the unittests.
Those are added by the SCCP Solver before invoking the Specializer.
They need to be removed otherwise the destructor of PredicateInfo
complains.

Differential Revision: https://reviews.llvm.org/D156365
2023-07-27 08:37:33 +01:00
Alexandros Lamprineas
c52ab9ea2f Revert "[FuncSpec] Add Phi nodes to the InstCostVisitor."
Reverting due to the crash reported in D154852.

Also reverting the subsequent commit as collateral damage:

"[FuncSpec] Split the specialization bonus into CodeSize and Latency."
2023-07-26 12:33:41 +01:00
Alexandros Lamprineas
20c8f58c11 [FuncSpec] Split the specialization bonus into CodeSize and Latency.
Currently we use a combined metric TargetTransformInfo::TCK_SizeAndLatency
when estimating the specialization bonus. This is suboptimal, and in some
cases erroneous. For example we shouldn't be weighting the codesize decrease
attributed to constant propagation by the block frequency of the dead code.
Instead only the latency savings should be weighted by block frequency. The
total codesize savings from all the specialization arguments should be
deducted from the specialization cost.

Differential Revision: https://reviews.llvm.org/D155103
2023-07-26 12:03:46 +01:00
Alexandros Lamprineas
03f1d09fe4 [FuncSpec] Add Phi nodes to the InstCostVisitor.
This patch allows constant folding of PHIs when estimating the user
bonus. Phi nodes are a special case since some of their inputs may
remain unresolved until all the specialization arguments have been
processed by the InstCostVisitor. Therefore, we keep a list of dead
basic blocks and then lazily visit the Phi nodes once the user bonus
has been computed for all the specialization arguments.

Differential Revision: https://reviews.llvm.org/D154852
2023-07-25 11:00:20 +01:00
Alexandros Lamprineas
1d0476cb4d [FuncSpec] Prefer DataLayout-aware constant folding of GEPs.
As shown in D154820, the DataLayout-independent constant folding
interface is not good enough for handling GEPs. Instead we should
be using the DataLayout-aware constant folding interface. Since
there isn't a method to specifically handle GEPs we can use the
one which folds generic instruction operands.

Differential Revision: https://reviews.llvm.org/D154821
2023-07-11 13:24:26 +01:00
Alexandros Lamprineas
cae00b2a9b [FuncSpec][NFC] Improve the unittest coverage for constant folding of GEPs.
The InstCostVisitor is currently using the DataLayout-independent constant
folding interface. This is a workaround since we can't directly call
ConstantExpr::getGetElementPtr due to deprecation. This patch shows that
the constant folding interface we are using is not good enough.

Differential Revision: https://reviews.llvm.org/D154820
2023-07-11 13:24:12 +01:00
Johannes Doerfert
e9fc399db3 [Attributor][NFCI] Use pointers to pass around AAs
This will make it easier to create less trivial AAs in the future as we
can simply return `nullptr` rather than an AA with in invalid state.
2023-06-23 17:21:20 -07:00
Alexandros Lamprineas
5400257ded [FuncSpec] Add Freeze and CallBase to the InstCostVisitor.
Allows constant folding of such instructions when estimating user bonus.

Differential Revision: https://reviews.llvm.org/D153036
2023-06-19 10:53:08 +01:00
Alexandros Lamprineas
f11d8c88dd [FuncSpec][NFC] Improve the unittest coverage.
The specialization bonus is zero in some unittests because the basic blocks
containing the users of the constant arguments are executed less frequently
than the entry block. Sinking them into loops solves that.

Differential Revision: https://reviews.llvm.org/D153230
2023-06-19 09:43:26 +01:00
Arthur Eubanks
3e39cfe5b4 Revert "Revert "InstSimplify: Require instruction be parented""
This reverts commit 0c03f48480f69b854f86d31235425b5cb71ac921.

Going to fix forward size regression instead due to more dependent patches needing to be reverted otherwise.
2023-06-16 13:53:31 -07:00
Arthur Eubanks
0c03f48480 Revert "InstSimplify: Require instruction be parented"
This reverts commit 1536e299e63d7788f38117b0212ca50eb76d7a3b.

Causes large binary size regressions, see comments on https://reviews.llvm.org/rG1536e299e63d7788f38117b0212ca50eb76d7a3b.
2023-06-16 11:24:29 -07:00
Alan Zhao
d6b4f6786b Revert "Revert "InstSimplify: Require instruction be parented""
This reverts commit 00264eac4d0938ae8a0826da38e4777be269124c.

Reason: caused a bunch of bots to break
2023-06-16 10:58:54 -07:00
Alan Zhao
00264eac4d Revert "InstSimplify: Require instruction be parented"
This reverts commit 1536e299e63d7788f38117b0212ca50eb76d7a3b.

Reason: causes a regression in the inliner (see https://crbug.com/1454531 and https://reviews.llvm.org/rG1536e299e63d7788f38117b0212ca50eb76d7a3b#1217141)
2023-06-16 10:36:49 -07:00
Alexandros Lamprineas
4d13896d8a Reland "[FuncSpec] Improve the accuracy of the cost model"
Instead of blindly traversing the use-def chain of constant arguments,
compute known constants along the way. Stop as soon as a user cannot
be replaced by a constant. Keep it light-weight by handling some basic
instruction types.

Differential Revision: https://reviews.llvm.org/D150464
2023-06-08 17:44:48 +01:00
Jim Lin
d4c5b45293 [NFC] Remove unneeded semicolon after function definition 2023-06-07 09:29:49 +08:00
Matt Arsenault
1536e299e6 InstSimplify: Require instruction be parented
Unlike every other analysis and transform, simplifyInstruction
permitted operating on instructions which are not inserted
into a function. This created an edge case no other code needs
to really worry about, and limited transforms in cases that
can make use of the context function. Only the inliner and a handful
of other utilities were making use of this, so just fix up these
edge cases. Results in some IR ordering differences since
cloned blocks are inserted eagerly now. Plus some additional
simplifications trigger (e.g. some add 0s now folded out that
previously didn't).
2023-06-02 18:14:28 -04:00
Nikita Popov
96a14f388b Revert "[FuncSpec] Replace LoopInfo with BlockFrequencyInfo"
As reported on https://reviews.llvm.org/D150375#4367861 and
following, this change causes PDT invalidation issues. Revert
it and dependent commits.

This reverts commit 0524534d5220da5ecb2cd424a46520184d2be366.
This reverts commit ced90d1ff64a89a13479a37a3b17a411a3259f9f.
This reverts commit 9f992cc9350a7f7072a6dbf018ea07142ea7a7ed.
This reverts commit 1b1232047e83b69561fd64b9547cb0a0d374473a.
2023-05-30 14:49:03 +02:00
Alexandros Lamprineas
0524534d52 [FuncSpec] Enable specialization of literal constants.
To do so we have to tweak the cost model such that specialization
does not trigger excessively.

Differential Revision: https://reviews.llvm.org/D150649
2023-05-25 09:55:46 +01:00
Alexandros Lamprineas
ced90d1ff6 [FuncSpec] Improve the accuracy of the cost model.
Instead of blindly traversing the use-def chain of constant arguments,
compute known constants along the way. Stop as soon as a user cannot
be replaced by a constant. Keep it light-weight by handling some basic
instruction types.

Differential Revision: https://reviews.llvm.org/D150464
2023-05-24 11:40:12 +01:00
Matt Arsenault
7d7c82ad84 Convert unit test to opaque pointers 2023-05-24 08:49:03 +01:00
Florian Hahn
701f7230cd
[VPlan] Use VPRecipeWithIRFlags for VPReplicateRecipe, retire poison map
Update VPReplicateRecipe to use VPRecipeWithIRFlags for IR flag
handling. Retire separate MayGeneratePoisonRecipes map.

Depends on D149082.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D150027
2023-05-15 11:49:20 +01:00
Florian Hahn
c2bef381fa
[VPlan] Remove setEntry to avoid leaks when replacing entry.
Update the HCFG builder to directly connect the created CFG to the
existing Plan's entry. This allows removing `setEntry`, which can cause
leaks when the existing entry is replaced.

Should fix
https://lab.llvm.org/buildbot/#/builders/5/builds/33455/steps/13/logs/stdio
2023-05-04 19:12:02 +01:00
Florian Hahn
b85a402dd8
[VPlan] Introduce new entry block to VPlan for early SCEV expansion.
This patch adds a new preheader block the VPlan to place SCEV expansions
expansions like the trip count. This preheader block is disconnected
at the moment, as the bypass blocks of the skeleton are not yet modeled
in VPlan.

The preheader block is executed before skeleton creation, so the SCEV
expansion results can be used during skeleton creation. At the moment,
the trip count expression and induction steps are expanded in the new
preheader. The remainder of SCEV expansions will be moved gradually in
the future.

D147965 will update skeleton creation to use the steps expanded in the
pre-header to fix #58811.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D147964
2023-05-04 14:00:13 +01:00
Florian Hahn
6303fa369c
[VPlan] Remove DeadInsts arg from VPInstructionsToVPRecipes (NFC)
The argument isn't used. VPlan-based dead recipe removal can be used
instead.
2023-05-01 15:03:29 +01:00
Florian Hahn
2c9d21a2a3
[VPlan] Turn Plan entry node into VPBasicBlock (NFCI).
The entry to the plan is the preheader of the vector loop and
guaranteed to be a VPBasicBlock. Make sure this is the case by
adjusting the type.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D149005
2023-04-28 12:29:06 +01:00
OCHyams
3feea34d77 [DebugInfo] Do not delete debug intrinsics with empty metadata operands
A ValueAsMetadata may be replaced with nullptr for several reasons including
deleting (certain) values and value remapping a use-before-def. In the case of
a MetadataAsValue user, handleChangedOperand intercepts and replaces the
metadata with an empty tuple (!{}).

At the moment, an empty metadata operand in a debug intrinsics signals that it
can be deleted.

Given that we end up with empty metadata operands in circumstances where the
Value has been "lost" the current behaviour can lead to incorrect variable
locations. Instead, we should treat empty metadata as meaning "there is no
location for the variable" (the same as we currently treat undef operands).

This patch removes the deletion logic from wouldInstructionBeTriviallyDead.

Related to https://discourse.llvm.org/t/auto-undef-debug-uses-of-a-deleted-value

Reviewed By: StephenTozer

Differential Revision: https://reviews.llvm.org/D140901
2023-04-26 09:58:31 +01:00
Bing1 Yu
c2f29f24c2 [ValueMapper] allow mapping ConstantTargetNone to its layout type
zeroinitializer is allowed for spirv TargetExtType.
This PR allows ValueMapper to map TargetExtType ConstantTargetNone to '0' constant of its layout type.

Reviewed By: jcranmer-intel

Differential Revision: https://reviews.llvm.org/D148774
2023-04-25 15:48:28 +08:00
OCHyams
ea60ffc6d1 [NFC] Return unique dbg intrinsics from findDbgValues and findDbgUsers
The out-param vector from findDbgValues and findDbgUsers should not include
duplicates, which is possible if the debug intrinsic uses the value multiple
times. This filter is already in place for multiple uses in a `DIArgLists`;
extend it to cover dbg.assigns too because a Value may be used in both the
address and value components.

Additionally, refactor the duplicated functionality between findDbgValues and
FindDbgUsers into a new function findDbgIntrinsics.

Reviewed By: jmorse, StephenTozer

Differential Revision: https://reviews.llvm.org/D148788
2023-04-20 14:18:46 +01:00
Florian Hahn
ff0ec4f42e
Recommit "[VPlan] Unify Value2VPValue and VPExternalDefs maps (NFCI)."
This reverts the revert commit 8c2276f89887d0a27298a1bbbd2181fa54bbb509.

The updated patch re-orders the getDefiningRecipe check in getVPalue to avoid
a use-after-free.

Original commit message:

    Before this patch, a VPlan contained 2 mappings for Values -> VPValue:
    1) Value2VPValue and 2) VPExternalDefs.

    This duplication is unnecessary and there are already cases where
    external defs are added to Value2VPValue. This patch replaces all uses
    of VPExternalDefs with Value2VPValue.

    It clarifies the naming of getOrAddVPValue (to getOrAddExternalVPValue)
    and addVPValue (to addExternalVPValue).

    At the moment, this is NFC, but will enable additional simplifications
    in D147783.

    Depends on D147891.

    Reviewed By: Ayal

    Differential Revision: https://reviews.llvm.org/D147892
2023-04-18 10:29:31 +01:00