This patch permits Swing Modulo Scheduling for ARM targets
turns it on by default for the Cortex-M7. The t2Bcc
instruction is recognized as a loop-ending branch.
MachinePipeliner is extended by adding support for
"unpipelineable" instructions. These instructions are
those which contribute to the loop exit test; in the SMS
papers they are removed before creating the dependence graph
and then inserted into the final schedule of the kernel and
prologues. Support for these instructions was not previously
necessary because current targets supporting SMS have only
supported it for hardware loop branches, which have no
loop-exit-contributing instructions in the loop body.
The current structure of the MachinePipeliner makes it difficult
to remove/exclude these instructions from the dependence graph.
Therefore, this patch leaves them in the graph, but adds a
"normalization" method which moves them in the schedule to
stage 0, which causes them to appear properly in kernel and
prologues.
It was also necessary to be more careful about boundary nodes
when iterating across successors in the dependence graph because
the loop exit branch is now a non-artificial successor to
instructions in the graph. In additional, schedules with physical
use/def pairs in the same cycle should be treated as creating an
invalid schedule because the scheduling logic doesn't respect
physical register dependence once scheduled to the same cycle.
Reviewed By: dmgreen
Differential Revision: https://reviews.llvm.org/D122672
The result is a data bag, this makes sure it's signaled to a user that
the data can't be mutated when, for example, doing something like:
auto &R = FAM.getResult<FunctionPropertiesAnalysis>(F)
...
R.Uses++
For the AIX linker, under default options, global or weak symbols which
have no visibility bits set to zero (i.e. no visibility, similar to ELF
default) are only exported if specified on an export list provided to
the linker. So AIX has an additional visibility style called
"exported" which indicates to the linker that the symbol should
be explicitly globally exported.
This change maps "dllexport" in the LLVM IR to correspond to XCOFF
exported as we feel this best models the intended semantic (discussion
on the discourse RFC thread: https://discourse.llvm.org/t/rfc-adding-exported-visibility-style-to-the-ir-to-model-xcoff-exported-visibility/61853)
and allows us to enable writing this visibility for the AIX target
in the assembly path.
Reviewed By: DiggerLin
Differential Revision: https://reviews.llvm.org/D123951
Introduced masks where they are not added and improved target dependent
cost models to avoid returning of the incorrect cost results after
adding masks.
Differential Revision: https://reviews.llvm.org/D100486
The condition should be 'ArgParts.size() > MaxElements', so that if we
have exactly 3 elements in the 'ArgParts' vector, the promotion should
be allowed because the 'MaxElement' threshold is not exceeded yet.
The default value for 'MaxElement' has been decreased to 2 in order
to avoid an actual change in argument promoting behavior. However,
this changes byval argument transformation behavior by allowing
adding not more than 2 arguments to the function instead of 3 allowed
before.
Reviewed By: aeubanks
Differential Revision: https://reviews.llvm.org/D124178
Some loop counters ('i', 'e') and variables ('type') were named not
in accordance with the code style and clang-tidy issues warnings
about the using of such variables. This patch renames the variables
and fixes some typos in the comments within the source file.
Differential Revision: https://reviews.llvm.org/D123662
Reland 3f2b76ec90b5f108272a3072a1345ba55d8ec75b with the test corrected
to require x86-registered-target.
Differential Revision: https://reviews.llvm.org/D120169
Reland commit 74273d575f9938d751a1c67862cffe553fe2de8b following a fix
for a memory leak. The DVIRecoveryRecord vectors now use unique_ptr.
Differential Revision: https://reviews.llvm.org/D120169
When using opaque pointers, convert GEPs into offset representation
of the form P + V1 * Scale1 + V2 * Scale2 + ... + ConstantOffset.
This allows us to recognize equivalent address calculations even if
the GEPs don't use the same source element type.
This fixes an opaque pointer codegen regression seen in rustc.
Differential Revision: https://reviews.llvm.org/D124527
They can already be available, and even if not, DT/LI can be available.
We should not recompute them. Old PM is unchanged because it would
require changing dependencies, and we don't care enough about it.
Differential Revision: https://reviews.llvm.org/D124439
Reviewed By: nikic, aeubanks
Using the legacy PM for the optimization pipeline was deprecated in 13.0.0.
Following recent changes to remove non-core features of the legacy
PM/optimization pipeline, remove ThreadSanitizerLegacyPass.
Reviewed By: #sanitizers, vitalybuka
Differential Revision: https://reviews.llvm.org/D124209
The current testcase I'm trying to reduce only reproduces with IPRA
enabled and requires handling multiple functions.
The only real difference vs. the IR is the extra indirect to look for
the underlying MachineFunction, so treat the ReduceWorkItem as the
module instead of the function.
The ugliest piece of this is really the ugliness of
MachineModuleInfo. It not only tracks actual module state, but has a
number of transient fields used for isel and/or the asm printer. These
shouldn't do any harm for the use here, though they should be
separated out.
Right now, if we want to dump symbol at specified offset, we need to use `grep`.
And it can only show surrounding symbols in layout (not in lexical scope sense).
This adds similar options to `dump` command as `llvm-dwarfdump` to allow users
to dump symbol record at specified offset and its parents or children with
spcified depth.
`--symbol-offset=` must be used with `--modi` to dump only one symbol at given
offset.
`--show-parents`/`--show-children` must be used with `--symbol-offset` to
dump all symbols that are parents/children of the symbol at given offset.
`--parent-recurse-depth`/`--children-recurse-depth` must be used with
`--show-parents`/`--show-children` to specify the max up/down depth.
Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D124317
Introduced masks where they are not added and improved target dependent
cost models to avoid returning of the incorrect cost results after
adding masks.
Differential Revision: https://reviews.llvm.org/D100486
Cuurently we always export STATEPOINT results (GC pointers lowered via VRegs)
to virtual registers. When processing gc.relocate instructions we have to
generate CopyFromRegs node and then export it to VReg again if gc.relocate
is used in other basic blocks. This results in generation of extra COPY MIR
instruction if statepoint and its gc.relocate are in the same BB, but gc.relocate
result is used in other blocks.
This patch changes this behavior to export statepoint results only if used
in other basic blocks. For local uses StatepointLoweringState.(get|set)Location()
API is used to communicate appropriate statepoint result from `LowerStatepoint()`
to `visitGCRelocate()`
This is NFC and is purely compile time optimization. On big methids it can improve
codegen compile time up to 10%.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D124444
This relands commit 8f550368b169b0a3d4ebc3e65efec49ec1e0798a.
The test is amended with REQUIRES: x86-registered-target, in line with
the other debuginfo-scev-salvage tests.
Differential Revision: https://reviews.llvm.org/D120169
Second of two patches to extend SCEV-based salvaging to dbg.value
intrinsics that have multiple location ops pre-LSR. This second patch
adds the core implementation.
Reviewers: @StephenTozer, @djtodoro
Differential Revision: https://reviews.llvm.org/D120169
This adds some basic fptosi_sat and fptoui_sat target independent cost
modelling. The fptosi_sat is modelled as a fmin/fmax to saturate the
value, followed by a fp convert. The signed values then have an
additional fcmp+select for handling Nan correctly.
The AArch64/Arm costs may be more incorrect, as the instruction exist
natively. This can be fixed with target specific cost updates.
Differential Revision: https://reviews.llvm.org/D124269
Rather than listing these by hand, include all enum attribute
keywords from Attributes.inc. This reduces the number of places
one has to update whenever an enum attribute is added.
Differential Revision: https://reviews.llvm.org/D124465
This allows for navigating to included files on click, and also provides hover
information about the include file (similarly to clangd).
Differential Revision: https://reviews.llvm.org/D124077
This diff factors out the check "isCrash" from the static method "throwIfCrash".
This is a helper function that can be useful in debugging / analysis, in particular,
I'm planning to use it in the future patches for lld-fuzzer.
Test plan:
1/ ninja check-all
2/ export LLD_IN_TEST=5 ninja check-lld
Differential revision: https://reviews.llvm.org/D124414
This is a follow on to the revert of D84265 to add an error if we'd need
to write a non-zero visibility type in the xcoff object file. We can't
currently do that because we lack the auxilary header to interpret the
bits in XCOFF32. This is important because visibility is being enabled
in the assembly writing path, and without this error the visibility
could be silently ignored.
Differential Revision: https://reviews.llvm.org/D124392
The callback is expected to create a branch to the ContinuationBB (sometimes called FiniBB in some lambdas) argument when finishing. This creates problems:
1. The InsertPoint used for CodeGenIP does not need to be the end of a block. If it is not, a naive callback will insert a branch instruction into the middle of the block.
2. The BasicBlock the CodeGenIP is pointing to may or may not have a terminator. There is an conflict where to branch to if the block already has a terminator.
3. Some API functions work only with block having a terminator. Some workarounds have been used to insert a temporary terminator that is removed again.
4. Some callbacks are sensitive to whether the BasicBlock has a terminator or not. This creates a callback ordering problem where different callback may have different behaviour depending on whether a previous callback created a terminator or not. The problem also exists for FinalizeCallbackTy where some callbacks do create branch to another "continue" block, but unlike BodyGenCallbackTy does not receive the target as argument. This is not addressed in this patch.
With this patch, the callback receives an CodeGenIP into a BasicBlock where to insert instructions. If it has to insert control flow, it can split the block at that position as needed but otherwise no separate ContinuationBB is needed. In particular, a callback can be empty without breaking the emitted IR. If the caller needs the control flow to branch to a specific target, it can insert the branch instruction itself and pass an InsertPoint before the terminator to the callback.
Certain frontends such as Clang may expect the current IRBuilder position to be at the end of a basic block. In this case its callbacks must split the block at CodeGenIP before setting the IRBuilder position such that the instructions after CodeGenIP are moved to another basic block and before returning create a new branch instruction to the split block.
Some utility functions such as `splitBB` are supporting correct splitting of BasicBlocks, independent of whether they have a terminator or not, returning/setting the InsertPoint of an IRBuilder to the end of split predecessor block, and optionally omitting creating a branch to the split successor block to be added later.
Reviewed By: kiranchandramohan
Differential Revision: https://reviews.llvm.org/D118409
Default behavior for .file directory was changed in D105856, but
ptxas (CUDA 11.5 release) refuses to parse it:
$ llc -march=nvptx64 llvm/test/DebugInfo/NVPTX/debug-file-loc.ll
$ ptxas debug-file-loc.s
ptxas debug-file-loc.s, line 42; fatal : Parsing error near
'"foo.h"': syntax error
Added a new field to MCAsmInfo to control default value of
UseDwarfDirectory. This value is used if -dwarf-directory command line
option is not specified.
Differential Revision: https://reviews.llvm.org/D121299
Before this patch `Args` was used to pass a broadcat's arguments by SLP.
This patch changes this. `Args` is now used for passing the operands of
the shuffle.
Differential Revision: https://reviews.llvm.org/D124202
This continues the push away from hard-coded knowledge about functions
towards attributes. We'll use this to annotate free(), realloc() and
cousins and obviate the hard-coded list of free functions.
Differential Revision: https://reviews.llvm.org/D123083
Every non-testcase use of OutputBuffer contains code to allocate an
initial buffer (using either 128 or 1024 as initial guesses). There's
now no need to do that, given recent changes to the buffer extension
heuristics -- it allocates a 1k(ish) buffer on first need.
Just pass in a buffer (if any) to the constructor. Thus the
OutputBuffer's ownership of the buffer starts at its own lifetime
start. We can reduce the lifetime of this object in several cases.
That new constructor takes a 'size_t *' for the size argument, as all
uses with a non-null buffer are passing through a malloc'd buffer from
their own caller in this manner.
The buffer reset member function is never used, and is deleted.
The original buffer initialization code would return a failure code if
that first malloc failed. Existing code either ignored that, called
std::terminate with a FIXME, or returned an error code.
But that's not foolproof anyway, as a subsequent buffer extension
failure ends up calling std::terminate. I am working on addressing
that unfortunate failure mode in a manner more consistent with the C++
ABI design.
Reviewed By: dblaikie
Differential Revision: https://reviews.llvm.org/D122604
llvm-gsymutil has an implementation of AddressRange and AddressRanges
classes. That implementation might be reused in other parts of llvm.
This patch moves AddressRange and AddressRanges classes into llvm/ADT.
Differential Revision: https://reviews.llvm.org/D124350
This change introduces a new intrinsic, `llvm.is.fpclass`, which checks
if the provided floating-point number belongs to any of the the specified
value classes. The intrinsic implements the checks made by C standard
library functions `isnan`, `isinf`, `isfinite`, `isnormal`, `issubnormal`,
`issignaling` and corresponding IEEE-754 operations.
The primary motivation for this intrinsic is the support of strict FP
mode. In this mode using compare instructions or other FP operations is
not possible, because if the value is a signaling NaN, floating-point
exception `Invalid` is raised, but the aforementioned functions must
never raise exceptions.
Currently there are two solutions for this problem, both are
implemented partially. One of them is using integer operations to
implement the check. It was implemented in https://reviews.llvm.org/D95948
for `isnan`. It solves the problem of exceptions, but offers one
solution for all targets, although some can do the check in more
efficient way.
The other, implemented in https://reviews.llvm.org/D96568, introduced a
hook 'clang::TargetCodeGenInfo::testFPKind', which injects a target
specific code into IR to implement `isnan` and some other functions. It is
convenient for targets that have dedicated instruction to determine FP data
class. However using target-specific intrinsic complicates analysis and can
prevent some optimizations.
A special intrinsic for value class checks allows representing data class
tests with enough flexibility. During IR transformations it represents the
check in target-independent way and saves it from undesired transformations.
In the instruction selector it allows efficient lowering depending on the
used target and mode.
This implementation is an extended variant of `llvm.isnan` introduced
in https://reviews.llvm.org/D104854. It is limited to minimal intrinsic
support. Target-specific treatment will be implemented in separate
patches.
Differential Revision: https://reviews.llvm.org/D112025
This is a simple datatype with a few JSON utilities, and is independent
of the underlying executor. The main motivation is to allow taking a
dependency on it on the AOT side, and allow us build a correctly-sized
buffer in the cases when the requested feature isn't supported by the
model. This, in turn, allows us to grow the feature set supported by the
compiler in a backward-compatible way; and also collect traces exposing
the new features, but starting off the older model, and continue
training from those new traces.
Differential Revision: https://reviews.llvm.org/D124417
As implemented this patch assumes that Typed pointer support remains in
the llvm::PointerType class, however this could be modified to use a
different subclass of llvm::Type that could be disallowed from use in
other contexts.
This does not rely on inserting typed pointers into the Module, it just
uses the llvm::PointerType class to track and unique types.
Fixes#54918
Reviewed By: kuhar
Differential Revision: https://reviews.llvm.org/D122268
Using the legacy PM for the optimization pipeline was deprecated in 13.0.0.
Following recent changes to remove non-core features of the legacy
PM/optimization pipeline, remove AddressSanitizerLegacyPass...
...,
ModuleAddressSanitizerLegacyPass, and ASanGlobalsMetadataWrapperPass.
MemorySanitizerLegacyPass was removed in D123894.
AddressSanitizerLegacyPass was removed in D124216.
Reviewed By: #sanitizers, vitalybuka
Differential Revision: https://reviews.llvm.org/D124337