52324 Commits

Author SHA1 Message Date
Matt Arsenault
d6f9428e46 GlobalISel: Pass MachineIRBuilder to applyMappingImpl
The target should not have to construct MachineIRBuilders during
RegBankSelect (we should perhaps hide the constructors for it). The
pass should own the builder setup with the desired CSE configuration
(although currently the pass does not use the CSE builder, which is
what I want to fix).

https://reviews.llvm.org/D156479
2023-07-31 10:03:38 -04:00
Vedant Paranjape
259d56d41d [LoopAccessAnalysis] Add a const qualifier to getMaxSafeDepDistBytes()
Add a const qualifier to this API call, since this is a member of
MemoryDepChecker and LoopAccessInfo returns an object of this class as a
const, as follows:

const MemoryDepChecker &getDepChecker() const { return *DepChecker; }

If one tries to use function as follows:

LAI->getDepChecker().getMaxSafeDepDistBytes()

results in the following error:

passing ‘const llvm::MemoryDepChecker’ as ‘this’ argument discards
qualifiers

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D156304
2023-07-31 09:45:01 +00:00
Nikita Popov
41895843b5 [InstCombine] Only perform one iteration
InstCombine is a worklist-driven algorithm, which works roughly
as follows:

* All instructions are initially pushed to the worklist.
  The initial order is in RPO program order.
* All newly inserted instructions get added to the worklist.
* When an instruction is folded, its users get added back to the
  worklist.
* When the use-count of an instruction decreases, it gets added
  back to the worklist.
* And a few of other heuristics on when we should revisit
  instructions.

On top of the worklist algorithm, InstCombine layers an additional
fix-point iteration: If any fold was performed in the previous
iteration, then InstCombine will re-populate the worklist from
scratch and fold the entire function again. This continues until
a fix-point is reached.

In the vast majority of cases, InstCombine will reach a fix-point
within a single iteration: However, a second iteration is performed
to verify that this is indeed the fixpoint. We can see this in the
statistics for llvm-test-suite:

    "instcombine.NumOneIteration": 411380,
    "instcombine.NumTwoIterations": 117921,
    "instcombine.NumThreeIterations": 236,
    "instcombine.NumFourOrMoreIterations": 2,

The way to read these numbers is that in 411380 cases, InstCombine
performs no folds. In 117921 cases it performs a fold and reaches
the fix-point within one iteration (the second iteration verifies
the fixpoint). In the remaining 238 cases, more than one iteration
is needed to reach the fixpoint.

In other words, only in 0.04% of cases are additional iterations
needed to reach a fixpoint. Conversely, in 22.3% of cases InstCombine
performs a completely useless extra iteration to verify the fix point.

This patch removes the fixpoint iteration from InstCombine, and always
only perform a single iteration. This results in a major compile-time
improvement of around 4% at negligible codegen impact.

This explicitly does accept that we will not reach a fixpoint in all
cases. However, this is mitigated by two factors: First, the data
suggests that this happens very rarely in practice. Second,
InstCombine runs many times during the optimization pipeline
(8 times even without LTO), so there are many chances to recover
such cases.

In order to prevent accidental optimization regressions in the
future, this implements a verify-fixpoint option, which is enabled
by default when instcombine is specified in -passes and disabled
when InstCombinePass() is constructed from C++. This means that
test cases need to explicitly use the no-verify-fixpoint option
if they fail to reach a fixed point (for a well understand reason
we cannot / do not want to avoid).

Differential Revision: https://reviews.llvm.org/D154579
2023-07-31 10:56:49 +02:00
Francesco Petrogalli
c4b21d57bc [llc] Add the command line option -sched-model-force-enable-intervals.
The option is used to force the use of resource intervals
in the machine scheduler, effectively ignoring the value of
`EnableIntervals` in the instance of the `SchedMachineModel`.

Reviewed By: anemet

Differential Revision: https://reviews.llvm.org/D156540
2023-07-31 10:10:18 +02:00
Alexandros Lamprineas
893d3a61c0 Reland [FuncSpec] Add Phi nodes to the InstCostVisitor.
This patch allows constant folding of PHIs when estimating the user
bonus. Phi nodes are a special case since some of their inputs may
remain unresolved until all the specialization arguments have been
processed by the InstCostVisitor. Therefore, we keep a list of dead
basic blocks and then lazily visit the Phi nodes once the user bonus
has been computed for all the specialization arguments.

Differential Revision: https://reviews.llvm.org/D154852
2023-07-31 08:25:48 +01:00
Mel Chen
5962942902 [LV][NFC] Refine comments related to reduction idioms. 2023-07-31 00:06:45 -07:00
Sameer Sahasrabuddhe
d9847cde48 [GlobalISel] convergent intrinsics
Introduced the convergent equivalent of the existing G_INTRINSIC opcodes:

- G_INTRINSIC_CONVERGENT
- G_INTRINSIC_CONVERGENT_W_SIDE_EFFECTS

Out of the targets that currently have some support for GlobalISel, the patch
assumes that the convergent intrinsics only relevant to SPIRV and AMDGPU.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D154766
2023-07-31 12:15:39 +05:30
Carl Ritson
2e6530c8e0 [LiveIntervals] Fix comment to match code for getNextValue (NFC)
Comment mentions MIIdx, but the actual parameter is def.
Fix comment, but also rename parameter to Def to match current
coding standards while touching the code.
2023-07-31 14:06:25 +09:00
Lang Hames
7bd481d9af [ORC] Add ExecutionSession::removeJITDylibs (plural), use it in endSession.
The ExecutionSession::removeJITDylibs operation will remove all JITDylibs in
the given list (i.e. first clear them, then remove them from the session).

ExecutionSession::endSession is updated to remove JITDylibs rather than just
clearing them. This prevents new code from being added to any JITDylib once
endSession has been called.
2023-07-30 08:51:42 -07:00
Jay Foad
e2e3f06813 Revert "[MachineScheduler] Track physical register dependencies per-regunit"
This reverts commit 1a54671d5405a39de362e9692ce963c0638023bc.

It was causing lit test failures in a LLVM_ENABLE_EXPENSIVE_CHECKS
build.
2023-07-29 18:05:25 +01:00
Jay Foad
1a54671d54 [MachineScheduler] Track physical register dependencies per-regunit
Change the scheduler's physical register dependency tracking from
registers-and-their-aliases to regunits. This has a couple of advantages
when subregisters are used:

- The dependency tracking is more accurate and creates fewer useless
  edges in the dependency graph. An AMDGPU example, edited for clarity:

    SU(0): $vgpr1 = V_MOV_B32 $sgpr0
    SU(1): $vgpr1 = V_ADDC_U32 0, $vgpr1
    SU(2): $vgpr0_vgpr1 = FLAT_LOAD_DWORDX2 $vgpr0_vgpr1, 0, 0

  There is a data dependency on $vgpr1 from SU(0) to SU(1) and from
  SU(1) to SU(2). But the old dependency tracking code also added a
  useless edge from SU(0) to SU(2) because it thought that SU(0)'s def
  of $vgpr1 aliased with SU(2)'s use of $vgpr0_vgpr1.

- On targets like AMDGPU that make heavy use of subregisters, each
  register can have a huge number of aliases - it can be quadratic in
  the size of the largest defined register tuple. There is a much lower
  bound on the number of regunits per register, so iterating over
  regunits is faster than iterating over aliases.

The LLVM compile-time tracker shows a tiny overall improvement of 0.03%
on X86. I expect a larger compile-time improvement on targets like
AMDGPU.

Differential Revision: https://reviews.llvm.org/D156552
2023-07-29 15:34:53 +01:00
Nikita Popov
ed23609bc2 [PatternMatch] Do not match constant expressions for binops
Currently, m_Mul() style matchers also match constant expressions.
This is a regular source of assertion failures (usually by trying
to do a match and then cast to Instruction or BinaryOperator) and
infinite combine loops. At the same time, I don't think this provides
useful optimization capabilities (all of the tests affected here are
regression tests for crashes / infinite loops).

Long term, all of these constant expressions (apart from possibly
add/sub) are slated for removal per
https://discourse.llvm.org/t/rfc-remove-most-constant-expressions/63179
-- but doing those removals can itself expose new crashes and
infinite loops due to the current PatternMatch behavior.

Differential Revision: https://reviews.llvm.org/D156401
2023-07-29 11:21:22 +02:00
Matt Arsenault
3240ae7034 AMDGPU/GlobalISel: Set dead on scc on manually selected instructions
In SelectionDAG InstrEmitter automatically puts dead flags on unused
physreg defs everywhere. The generated selectors should also set dead
on physreg defs that were not used in the pattern.
2023-07-28 14:14:06 -04:00
Aaron Ballman
1a53b5c367 Revert "[llvm-profdata] Refactoring Sample Profile Reader to increase FDO build speed using MD5 as key to Sample Profile map"
This reverts commit 66ba71d913df7f7cd75e92c0c4265932b7c93292.

Addressing issues found by:
https://lab.llvm.org/buildbot/#/builders/245/builds/11732
https://lab.llvm.org/buildbot/#/builders/187/builds/12251
https://lab.llvm.org/buildbot/#/builders/186/builds/11099
https://lab.llvm.org/buildbot/#/builders/182/builds/6976
2023-07-28 09:41:38 -04:00
Job Noorman
76f023bddf [RISCV] Make mapping symbols SF_FormatSpecific
This ensures that llvm-symbolizer ignores them for symbolization.

Note: unlike aarch64-mapping-symbol.s, the test included here does not
test if the mapping symbols are actually in the symbol table. The reason
is that llvm-mc support for RISC-V mapping symbols (D153260) has not
landed yet, so the mapping symbols simply aren't there. However, D153260
would like to depend on this patch together with D156190 to avoid having
to update a large amount of tests.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D156236
2023-07-28 10:21:55 +02:00
Freddy Ye
c9d92e6638 [X86] Support -march=arrowlake,arrowlake-s,lunarlake
Reviewed By: pengfei

Differential Revision: https://reviews.llvm.org/D156239
2023-07-28 15:05:54 +08:00
Fangrui Song
9ea44c6894 [llvm-objdump] -d: don't display mapping symbols as labels
Similar to D96617 for llvm-symbolizer.

This patch matches the GNU objdump -d behavior to suppress printing
labels for mapping symbols. Mapping symbol names don't convey much
information.

When --show-all-symbols (not in GNU) is specified, we still print
mapping symbols.

Note: the `for (size_t SI = 0, SE = Symbols.size(); SI != SE;)` loops
needs to iterate all mapping symbols, even if they are not displayed.
We use the new field `IsMappingSymbol` to recognize mapping symbols.
This field also enables simplification after D139131.

ELF/ARM/disassemble-all-mapping-symbols.s is enhanced to add `.space 2`.
If `End = std::min(End, Symbols[SI].Addr);` is not correctly set, we
would print a `.word`.

Reviewed By: jhenderson, jobnoorman, peter.smith

Differential Revision: https://reviews.llvm.org/D156190
2023-07-27 20:51:42 -07:00
Fangrui Song
c06a314150 [CSKY] Make mapping symbols SF_FormatSpecific
and omit them from llvm-nm output unless --special-syms is specified,
similar to ARM and AArch64.

This is a prerequisite of D156190 as llvm-objdump will only perform
mapping symbol recognition for SF_FormatSpecific symbols.
2023-07-27 20:37:43 -07:00
Fangrui Song
e09a1b51ab [MCDisassembler] Reorder XCOFF specific constructor parameters. NFC
to prevent overload resolution confusion. In particular, if we add
another parameter to the generic constructor, MCDisassemblerTest.cpp
specified constructors will be resolve to the generic constructor, which
is unintended.
2023-07-27 19:59:26 -07:00
Jun Sha (Joshua)
2b6df4a336 [RISCV] Add codegen support for bf16 vector
This patch adds codegen support for vector with bfloat16 type in llvm backend.
With this patch, Zvbfmin/Zvbfwma instructions as well as vle16/vse16 can generated from newly added bf16 IR intrinsics.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D156287
2023-07-28 09:54:23 +08:00
William Huang
66ba71d913 [llvm-profdata] Refactoring Sample Profile Reader to increase FDO build speed using MD5 as key to Sample Profile map
This is phase 1 of multiple planned improvements on the sample profile loader.   The major change is to use MD5 hash code ((instead of the function itself) as the key to look up the function offset table and the profiles, which significantly reduce the time it takes to construct the map.

The optimization is based on the fact that many practical sample profiles are using MD5 values for function names to reduce profile size, so we shouldn't need to convert the MD5 to a string and then to a SampleContext and use it as the map's key, because it's extremely slow.

Several changes to note:

(1) For non-CS SampleContext, if it is already MD5 string, the hash value will be its integral value, instead of hashing the MD5 again. In phase 2 this is going to be optimized further using a union to represent MD5 function (without converting it to string) and regular function names.

(2) The SampleProfileMap is a wrapper to *map<uint64_t, FunctionSamples>, while providing interface allowing using SampleContext as key, so that existing code still work. It will check for MD5 collision (unlikely but not too unlikely, since we only takes the lower 64 bits) and handle it to at least guarantee compilation correctness (conflicting old profile is dropped, instead of returning an old profile with inconsistent context). Other code should not try to use MD5 as key to access the map directly, because it will not be able to handle MD5 collision at all. (see exception at (5) )

(3) Any SampleProfileMap::emplace() followed by SampleContext assignment if newly inserted, should be replaced with SampleProfileMap::Create(), which does the same thing.

(4) Previously we ensure an invariant that in SampleProfileMap, the key is equal to the Context of the value, for profile map that is eventually being used for output (as in llvm-profdata/llvm-profgen). Since the key became MD5 hash, only the value keeps the context now, in several places where an intermediate SampleProfileMap is created, each new FunctionSample's context is set immediately after insertion, which is necessary to "remember" the context otherwise irretrievable.

(5) When reading a profile, we cache the MD5 values of all functions, because they are used at least twice (one to index into FuncOffsetTable, the other into SampleProfileMap, more if there are additional sections), in this case the SampleProfileMap is directly accessed with MD5 value so that we don't recalculate it each time (expensive)

Performance impact:
When reading a ~1GB extbinary profile (fixed length MD5, not compressed) with 10 million function names and 2.5 million top level functions (non CS functions, each function has varying nesting level from 0 to 20), this patch improves the function offset table loading time by 20%, and improves full profile read by 5%.

Reviewed By: davidxl, snehasish

Differential Revision: https://reviews.llvm.org/D147740
2023-07-27 23:08:27 +00:00
Derek Schuff
1b21067cf2 [WebAssembly][Objcopy] Write output section headers identically to inputs
Previously when objcopy generated section headers, it padded the LEB
that encodes the section size out to 5 bytes, matching the behavior of
clang. This is correct, but results in a binary that differs from the
input. This can sometimes have undesirable consequences (e.g. breaking
source maps).

This change makes the object reader remember the size of the LEB
encoding in the section header, so that llvm-objcopy can reproduce it
exactly. For sections not read from an object file (e.g. that
llvm-objcopy is adding itself), pad to 5 bytes.

Reviewed By: jhenderson

Differential Revision: https://reviews.llvm.org/D155535
2023-07-27 15:43:51 -07:00
Douglas Yung
32683b231e Revert "[FuncSpec] Add Phi nodes to the InstCostVisitor."
This reverts commit 96ff464dd3aac255adc52787a1e28487a9cd4c35.

The test in this change was failing on many buildbots:

https://lab.llvm.org/buildbot/#/builders/164/builds/41292
https://lab.llvm.org/buildbot/#/builders/258/builds/4491
https://lab.llvm.org/buildbot/#/builders/192/builds/3566
https://lab.llvm.org/buildbot/#/builders/123/builds/20411
https://lab.llvm.org/buildbot/#/builders/58/builds/42553
https://lab.llvm.org/buildbot/#/builders/247/builds/7037
https://lab.llvm.org/buildbot/#/builders/139/builds/46259
https://lab.llvm.org/buildbot/#/builders/216/builds/24650
https://lab.llvm.org/buildbot/#/builders/234/builds/12571
https://lab.llvm.org/buildbot/#/builders/232/builds/12574
https://lab.llvm.org/buildbot/#/builders/235/builds/975
2023-07-27 13:47:52 -07:00
Alexandros Lamprineas
96ff464dd3 [FuncSpec] Add Phi nodes to the InstCostVisitor.
This patch allows constant folding of PHIs when estimating the user
bonus. Phi nodes are a special case since some of their inputs may
remain unresolved until all the specialization arguments have been
processed by the InstCostVisitor. Therefore, we keep a list of dead
basic blocks and then lazily visit the Phi nodes once the user bonus
has been computed for all the specialization arguments.

In addition to the last revision this one fixes the bug reported on
Phabricator.

Differential Revision: https://reviews.llvm.org/D154852
2023-07-27 19:24:11 +01:00
spupyrev
bc59faa863 A new code layout algorithm for function reordering [2/3]
We are bringing a new algorithm for function layout (reordering) based on the
call graph (extracted from a profile data). The algorithm is an improvement of
top of a known heuristic, C^3. It tries to co-locate hot and frequently executed
together functions in the resulting ordering. Unlike C^3, it explores a larger
search space and have an objective closely tied to the performance of
instruction and i-TLB caches. Hence, the name CDS = Cache-Directed Sort.
The algorithm can be used at the linking or post-linking (e.g., BOLT) stage.

The algorithm shares some similarities with C^3 and an approach for basic block
reordering (ext-tsp). It works with chains (ordered lists)
of functions. Initially all chains are isolated functions. On every iteration,
we pick a pair of chains whose merging yields the biggest increase in the
objective, which is a weighted combination of frequency-based and distance-based
locality. That is, we try to co-locate hot functions together (so they can share
the cache lines) and functions frequently executed together. The merging process
stops when there is only one chain left, or when merging does not improve the
objective. In the latter case, the remaining chains are sorted by density in the
decreasing order.

**Complexity**
We regularly apply the algorithm for large data-center binaries containing 10K+
(hot) functions, and the algorithm takes only a few seconds. For some extreme
cases with 100K-1M nodes, the runtime is within minutes.

**Perf-impact**
We extensively tested the implementation extensively on a benchmark of isolated
binaries and prod services. The impact is measurable for "larger" binaries that
are front-end bound: the cpu time improvement (on top of C^3) is in the range
of [0% .. 1%], which is a result of a reduced i-TLB miss rate (by up to 20%) and
i-cache miss rate (up to 5%).

Reviewed By: rahmanl

Differential Revision: https://reviews.llvm.org/D152834
2023-07-27 09:20:53 -07:00
Cyndy Ishida
27459a3a2b [TextAPI] Update missing enum cases & utility functions
* Expand understood `FileType`s that InterfaceFile class can represent.
* Add `hasTarget` function.
* Cleanup symbol `<` comparator to account for SymbolSet operations.
2023-07-27 08:24:42 -07:00
Nikita Popov
70aca7b122 [InstCombine] Explicitly track dead edges
This allows us to handle dead blocks with multiple incoming edges,
where we can determine that all of those edges are dead (or cycles).

This allows InstCombine to handle certain dead code patterns that
can be produced by LoopVectorize in a single iteration.

This is in preparation for D154579.
2023-07-27 16:41:03 +02:00
Jay Foad
2dcf051259 [CodeGen] Store call frame size in MachineBasicBlock
Record the call frame size on entry to each basic block. This is usually
zero except when a basic block has been split in the middle of a call
sequence.

This simplifies PEI::replaceFrameIndices which previously had to visit
basic blocks in a specific order and had special handling for
unreachable blocks. More importantly it paves the way for an equally
simple implementation of a backwards version of replaceFrameIndices,
which is required to fully convert PrologEpilogInserter to backwards
register scavenging, which is preferred because it does not rely on
accurate kill flags.

Differential Revision: https://reviews.llvm.org/D156113
2023-07-27 10:32:00 +01:00
Sameer Sahasrabuddhe
7c760b224b Restore "[GlobalISel] GIntrinsic subclass to represent intrinsics in Generic Machine IR"
Some opcodes in generic MIR represent calls to intrinsics, where the intrinsic
ID is the first non-def operand to the instruction. These are now represented as
a subclass of GenericMachineInstr, and the method MachineInstr::getIntrinsicID()
is now moved to this subclass GIntrinsic.

Some target-defined instructions behave like GMIR intrinsics, and have an
Intrinsic::ID operand. But they should not be recognized as generic intrinsics,
and should not use GIntrinsic::getIntrinsicID(). Separated these out by
introducing a new AMDGPU::getIntrinsicID().

Reviewed By: arsenm, Pierre-vh

Differential Revision: https://reviews.llvm.org/D155556

This restores commit baa3386edb11a2f9bcadda8cf58d56f3707c39fa.
Originally reverted in d0f7850b01cf17e50a4f4b00e3b84dded94df6b8.
2023-07-27 14:49:17 +05:30
Vitaly Buka
a496c8be6e Revert "[CodeGen]Allow targets to use target specific COPY instructions for live range splitting"
And dependent commits.

Details in D150388.

This reverts commit 825b7f0ca5f2211ec3c93139f98d1e24048c225c.
This reverts commit 7a98f084c4d121244ef7286bc6503b6a181d446e.
This reverts commit b4a62b1fa546312d882fa12dfdcd015177d66826.
This reverts commit b7836d856206ec39509d42529f958c920368166b.

No conflicts in the code, few tests had conflicts in autogenerated CHECKs:
llvm/test/CodeGen/Thumb2/mve-float32regloops.ll
llvm/test/CodeGen/AMDGPU/fix-frame-reg-in-custom-csr-spills.ll

Reviewed By: alexfh

Differential Revision: https://reviews.llvm.org/D156381
2023-07-26 22:13:32 -07:00
Sameer Sahasrabuddhe
d0f7850b01 Revert "[GlobalISel] GIntrinsic subclass to represent intrinsics in Generic Machine IR"
This reverts commit baa3386edb11a2f9bcadda8cf58d56f3707c39fa.

The changes did not cover all occurrences of the deteleted function
MachineInstr::getIntrinsicID().
2023-07-27 10:14:24 +05:30
Sameer Sahasrabuddhe
baa3386edb [GlobalISel] GIntrinsic subclass to represent intrinsics in Generic Machine IR
Some opcodes in generic MIR represent calls to intrinsics, where the intrinsic
ID is the first non-def operand to the instruction. These are now represented as
a subclass of GenericMachineInstr, and the method MachineInstr::getIntrinsicID()
is now moved to this subclass GIntrinsic.

Some target-defined instructions behave like GMIR intrinsics, and have an
Intrinsic::ID operand. But they should not be recognized as generic intrinsics,
and should not use GIntrinsic::getIntrinsicID(). Separated these out by
introducing a new AMDGPU::getIntrinsicID().

Reviewed By: arsenm, Pierre-vh

Differential Revision: https://reviews.llvm.org/D155556
2023-07-27 10:00:45 +05:30
Sameer Sahasrabuddhe
b14e30f10d [LLVM] refactor GenericSSAContext and its specializations
Fix the GenericSSAContext template so that it actually declares all the
necessary typenames and the methods that must be implemented by its
specializations SSAContext and MachineSSAContext.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D156288
2023-07-27 09:54:50 +05:30
Jeff Niu
e76ac8074f [llvm][orc] Consider other ELF init sections as well
ELF object files can contain `.ctors` and `.dtors` sections that also
participate as initializers.

Reviewed By: lhames

Differential Revision: https://reviews.llvm.org/D154802
2023-07-26 13:44:41 -07:00
Jessica Del
93dc66a289 [AMDGPU] - Mark inverse.ballot as not convergent
`inverse.ballot` checks if a cc bit is set for the current
lane. Therefore, it is not convergent.

Reviewed By: sameerds

Differential Revision: https://reviews.llvm.org/D156088
2023-07-26 19:52:52 +02:00
Shilei Tian
10068cd654 [OpenMP] Introduce kernel environment
This patch introduces per kernel environment. Previously, flags such as execution mode are set through global variables with name like `__kernel_name_exec_mode`. They are accessible on the host by reading the corresponding global variable, but not from the device. Besides, some assumptions, such as no nested parallelism, are not per kernel basis, preventing us applying per kernel optimization in the device runtime.

This is a combination and refinement of patch series D116908, D116909, and D116910.

Depend on D155886.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D142569
2023-07-26 13:35:14 -04:00
Craig Topper
e5df0481f0 [FunctionSpecialization] Use SmallVector::operator== to simplify some code. NFC
Reviewed By: labrinea

Differential Revision: https://reviews.llvm.org/D156260
2023-07-26 09:46:37 -07:00
Ivan Kosarev
e9df4c9892 [ADT] Support iterating size-based integer ranges.
It seems the ranges start with 0 in most cases.

Reviewed By: dblaikie, gchatelet

Differential Revision: https://reviews.llvm.org/D156135
2023-07-26 16:28:41 +01:00
Alexandros Lamprineas
c52ab9ea2f Revert "[FuncSpec] Add Phi nodes to the InstCostVisitor."
Reverting due to the crash reported in D154852.

Also reverting the subsequent commit as collateral damage:

"[FuncSpec] Split the specialization bonus into CodeSize and Latency."
2023-07-26 12:33:41 +01:00
Alexandros Lamprineas
20c8f58c11 [FuncSpec] Split the specialization bonus into CodeSize and Latency.
Currently we use a combined metric TargetTransformInfo::TCK_SizeAndLatency
when estimating the specialization bonus. This is suboptimal, and in some
cases erroneous. For example we shouldn't be weighting the codesize decrease
attributed to constant propagation by the block frequency of the dead code.
Instead only the latency savings should be weighted by block frequency. The
total codesize savings from all the specialization arguments should be
deducted from the specialization cost.

Differential Revision: https://reviews.llvm.org/D155103
2023-07-26 12:03:46 +01:00
Johannes Doerfert
b3fec1067a [Attributor] Improve NonNull deduction
We can improve our deduction if we stop at PHI and select instructions
and also iterate the returned values explicitly. The latter helps with
isImpliedByIR deductions.
2023-07-25 20:31:21 -07:00
Kai Luo
11a02de782 [JITLink][PowerPC] Change method to check if a symbol is external to current object
After PrePrunePass `claimOrExternalizeWeakAndCommonSymbols`, a defined symbol might become external. So determine a function call is external or not when building the linkgraph is not accurate. This largely affects updating TOC pointer on PowerPC. TOC pointer is supposed to be the same in one object file(if no mulitple TOC appears) and is updated when control flow transferred to another object file.

This patch defers checking a function call is external or not, in `buildTables_ELF_ppc64` which is a PostPrunePass.

This patch fixes failures when `jitlink -orc-runtime=/path/to/libort_rt.a` is used.

Reviewed By: lhames

Differential Revision: https://reviews.llvm.org/D155925
2023-07-26 03:20:56 +00:00
Weining Lu
c56514f21b Reland "[LoongArch] Support -march=native and -mtune="
As described in [1][2], `-mtune=` is used to select the type of target
microarchitecture, defaults to the value of `-march`. The set of
possible values should be a superset of `-march` values. Currently
possible values of `-march=` and `-mtune=` are `native`, `loongarch64`
and `la464`.

D136146 has supported `-march={loongarch64,la464}` and this patch adds
support for `-march=native` and `-mtune=`.

A new ProcessorModel called `loongarch64` is defined in LoongArch.td
to support `-mtune=loongarch64`.

`llvm::sys::getHostCPUName()` returns `generic` on unknown or future
LoongArch CPUs, e.g. the not yet added `la664`, leading to
`llvm::LoongArch::isValidArchName()` failing to parse the arch name.
In this case, use `loongarch64` as the default arch name for 64-bit
CPUs.

And these two preprocessor macros are defined:
- __loongarch_arch
- __loongarch_tune

[1]: https://github.com/loongson/LoongArch-Documentation/blob/2023.04.20/docs/LoongArch-toolchain-conventions-EN.adoc
[2]: https://github.com/loongson/la-softdev-convention/blob/v0.1/la-softdev-convention.adoc

Reviewed By: xen0n, wangleiat

Differential Revision: https://reviews.llvm.org/D155824
2023-07-26 10:26:38 +08:00
Fangrui Song
4553dc46a0 [Support] Rewrite GlobPattern
The current implementation has two primary issues:

* Matching `a*a*a*b` against `aaaaaa` has exponential complexity.
* BitVector harms data cache and is inefficient for literal matching.

and a minor issue that `\` at the end may cause an out of bounds access
in `StringRef::operator[]`.

Switch to an O(|S|*|P|) greedy algorithm instead: factor the pattern
into segments split by '*'. The segment is matched sequentianlly by
finding the first occurrence past the end of the previous match. This
algorithm is used by lots of fnmatch implementations, including musl and
NetBSD's.

In addition, `optional<StringRef> Exact, Suffix, Prefix` wastes space.
Instead, match the non-metacharacter prefix against the haystack, then
match the pattern with the rest.

In practice `*suffix` style patterns are less common and our new
algorithm is fast enough, so don't bother storing the non-metacharacter
suffix.

Note: brace expansions (D153587) can leverage the `matchOne` function.

Differential Revision: https://reviews.llvm.org/D156046
2023-07-25 18:46:55 -07:00
Johannes Doerfert
4223c9b354 [Attributor] Always deduce nosync from readonly + non-convergent
This adds the deduction also if the function is not IPO amendable.
2023-07-25 17:47:33 -07:00
Fangrui Song
6a684dbc44 [Support] Remove llvm::is_trivially_{copy/move}_constructible
This restores D132311, which was reverted in
29c841ce93e087fa4e0c5f3abae94edd460bc24a (Sep 2022) due to certain files
not buildable with GCC 7.3.0. The previous attempt was reverted by
6cd9608fb37ca2418fb44b57ec955bb5efe10689 (Dec 2020).

This time, GCC 7.3.0 has existing build errors for a long time due to
structured bindings for many files, e.g.

```
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:9098:13: error: cannot decompose class type ‘std::pair<llvm::Value*, const llvm::SCEV*>’: both it and it
s base class ‘std::pair<llvm::Value*, const llvm::SCEV*>’ have non-static data members
   for (auto [_, Stride] : Legal->getLAI()->getSymbolicStrides()) {
             ^~~~~~~~~~~
```

... and also some `error: duplicate initialization of` instances due to llvm/Transforms/IPO/Attributor.h.

---

GCC 7.5.0 has a bug that, without this change, certain `SmallVector` with a `std::pair` element type like `SmallVector<std::pair<Instruction * const, Info>, 0> X;` lead to spurious

```
/tmp/opt/gcc-7.5.0/include/c++/7.5.0/type_traits:878:48: error: constructor required before non-static data member for ‘...’ has been parsed
```

Switching to std::is_trivially_{copy/move}_constructible fixes the error.
2023-07-25 17:21:16 -07:00
Shubham Sandeep Rastogi
b66b176dc3 Emit DW_RLE_base_addressx + DW_RLE_offset_pairs
instead of DW_ELE_start_length in debug_rnglists section

This patch tries to reduce the size of the debug_rnglist section by
replacing the DW_RLE_start_length opcodes currently emitted by dsymutil
in favor of using DW_RLE_base_addressx + DW_RLE_offset_pair instead.

The DW_RLE_start_length is one AddressSize followed by a ULEB per entry,
whereas, the DW_RLE_base_addressx + DW_RLE_offset_pair will use one ULEB
for the base address, and then the DW_RLE_offset_pair is a pair of
ULEBs. This will be more efficient.

Differential Revision: https://reviews.llvm.org/D156166
2023-07-25 16:17:53 -07:00
Fangrui Song
1b162fabe8 [Support] Change SetVector's default template parameter to SmallVector<*, 0>
Similar to D156016 for MapVector.

This brings back commit fae7b98c221b5b28797f7b56b656b6b819d99f27 with a
fix to llvm/unittests/Support/ThreadPool.cpp's `_WIN32` code path.
2023-07-25 13:13:35 -07:00
Johannes Doerfert
08a220764b Reapply "[OpenMP] Add the ompx_attribute clause for target directives"
This reverts commit 0d12683046ca75fb08e285f4622f2af5c82609dc and
reapplies ef9ec4bbcca2fa4f64df47bc426f1d1c59ea47e2 with an extension to
fix the Flang build.

Differential Revision: https://reviews.llvm.org/D156184
2023-07-25 10:40:35 -07:00
Thomas Köppe
6f1395a1fe [llvm-objcopy] --set-section-flags: allow "large" to add SHF_X86_64_LARGE
Currently, objcopy cannot set the new flag SHF_X86_64_LARGE. This change introduces the named flag "large" which translates to that section flag.

An "invalid argument" error is produced if a user attempts to set the flag on an architecture other than X86_64.

Reviewed By: jhenderson, MaskRay

Differential Revision: https://reviews.llvm.org/D153262
2023-07-25 09:47:02 -07:00