This is to enable a transition of LoopIdiomRecognize to selecting the
llvm.experimental.memset.pattern intrinsic as requested in #118632 (as
opposed to supporting selection of the libcall or the intrinsic). As
such, although it _is_ a TODO to add costing considerations on whether
to lower to the libcall (when available) or expand directly, lacking
such logic is helpful at this stage in order to minimise any unexpected
code gen changes in this transition.
Adds `mapa` and `mapa.shared.cluster` MLIR Ops to generate mapa
instructions.
`mapa` - Map the address of the shared variable in the target CTA.
- `mapa` - source is a register containing generic address pointing to
shared memory.
- `mapa.shared.cluster` - source is a shared memory variable or a
register containing a valid shared memory address.
PTX Spec Reference:
https://docs.nvidia.com/cuda/parallel-thread-execution/#data-movement-and-conversion-instructions-mapa
This adds the plumbing between -fsanitize-skip-hot-cutoff (introduced in
https://github.com/llvm/llvm-project/pull/121619) and
LowerAllowCheckPass<cutoffs> (introduced in
https://github.com/llvm/llvm-project/pull/124211).
The net effect is that -fsanitize-skip-hot-cutoff now combines the
functionality of -ubsan-guard-checks and
-lower-allow-check-percentile-cutoff (though this patch does not remove
those yet), and generalizes the latter to allow per-sanitizer cutoffs.
Note: this patch replaces Intrinsic::allow_ubsan_check's
SanitizerHandler parameter with SanitizerOrdinal; this is necessary
because the hot cutoffs are specified in terms of SanitizerOrdinal
(e.g., null, alignment), not SanitizerHandler (e.g., TypeMismatch).
Likewise, CodeGenFunction::EmitCheck is changed to emit
allow_ubsan_check() for each individual check.
---------
Co-authored-by: Vitaly Buka <vitalybuka@gmail.com>
Co-authored-by: Vitaly Buka <vitalybuka@google.com>
This patch fixes two bugs:
https://github.com/llvm/llvm-project/issues/41488https://github.com/llvm/llvm-project/issues/53942
The dependence analysis assumes that the base address of array accesses
is invariant across loop iterations. In both bugs the base address
evolves following loop iterations: the base address flip-flops between
two different memory objects.
The patch uses the cross iteration mode of alias analysis to disambiguate the
base objects.
In the discussion around #116792, @rjmccall mentioned that ARCMigrate
has been obsoleted and that we could go ahead and remove it from Clang,
so this patch does just that.
Invalidation is already handled in the passes loop for MFAM, so all of
the rest analyses are preserved. (See `PassManager::run()`)
This won't change the number of invalidations, but will prevent needless
`MFAM::Invalidator::invalidate()` invocations made by results depending
on other results (since the invalidate shorts if `<AllAnalysesOn<MF>>`
is preserved)
I'm guessing based on tablegen definitions. I also don't
really understand how this could have been missing.
This defends against regressions in a future peephole-opt
patch.
Previously this combine would undo AMDGPU's new custom legalization of
wide vector shuffles into 2 element pieces. The comment also
states that this combine is only done before legalization,
but the case with a build_vector source was unconditional.
We probably don't want to do this if the multiple uses are full
scalarization of the vector, but this seems to work well enough.
Scalarizing extracts should have folded out pre-legalize.
This reapplies 4f0325873fa (and follow up patches 26fc07d5d88, a001cc0e6cdc,
c9bc242e387, and fd174f0ff3e), which were reverted in 212cdc9a377 to
investigate bot failures (e.g.
https://lab.llvm.org/buildbot/#/builders/108/builds/8502)
The fix to address the bot failures was landed in d0052ebbe2e. This patch also
restricts construction of the UnwindInfoManager object to Apple platforms (as
it won't be used on other platforms).
When we apply cloning decisions in the ThinLTO backend, we need to find
the corresponding summary for each function in the IR, and in some cases
for callee functions. This is complicated when the function was a
promoted local, in which case the GUID was formed from the hash of the
original source file prepended to the function name. Those functions
can be identified by the fact that they were given a ".llvm." suffix
during promotion.
We previously didn't do this correctly for promoted locals imported from
other modules, as we only tried the current module source name. This led
to crashes, in particular when the current module also had an local
function of the same original name. In particular, we were attempting to
iterate through the wrong summary's callsites, and there were fewer than
in the actual function so we accessed data off the end (in a release
build with assertion checking off - with assertion checking on we double
check the stack ids and that would have failed). Even if we hadn't
crashed or hit an assert, we could have applied the wrong cloning
decisions, leading to unsats at link time.
Luckily, function importing attaches thinlto_src_file metadata
containing the original source file name to all imported functions. It
normally doesn't do this by default, however, it always does if MemProf
context disambiguation is enabled. Therefore, we can just look to see if
the function contains this metadata and if so use it to recreate the
original GUID.
A similar issue can occur when looking for the ValueInfo / GUID of
a direct tail call to see if we synthesized a callsite record for a
missing tail call frame. In that case, the callee function may be a
declaration, if we imported its caller but not the callee function
definition. Because imported declarations don't get the thinlto_src_file
metadata, we instead look at its caller (which works because this
happens very early in the backend before any inlining).
This patch extends 71d4f34 to all trig functions that take complex
arguments. On AIX, the `libm` routines are called in compile time
folding instead of the STL routines.
A DeclRefExpr could never refer to a Decomposition in valid C++ code,
but somehow the Analyzer creates these entities and then it tries to
print them.
There is no sensible answer here, so we print 'decomposition' followed
by the names of all of its bindings, separated by dashes.
Using ccache relies on the GitHub Actions Cache, which may be
susceptible to cache poisoning. See
https://adnanthekhan.com/2024/05/06/the-monsters-in-your-build-cache-github-actions-cache-poisoning/
Even though these attacks may be difficult, it's better to err on the
side of caution and ensure that the build environment for our releases
is as isolated as possible. Additionally, ccache was only being used for
the stage1 build, which is a small part of the overall build, so the
speed up from using it was not that large.
Implement the integer range inference niterface for memref.dim and
tetnor.dim using shared code. The inference will infer the `dim` of
dynamic dimensions to [0, index_max] and take the union of all the
dimensions that the `dim` argument could be validly referring to.
Add a new AutoUpgrade function to convert some legacy nvvm.annotations
metadata to function level attributes. These attributes are quicker to
look-up so improve compile time and are more idiomatic than using
metadata which should not include required information that changes the
meaning of the program.
Currently supported annotations are:
- !"kernel" -> ptx_kernel calling convention
- !"align" -> alignstack parameter attributes (return not yet supported)
The only architecture currently being supported (still a WIP) is
x86_64. Other UEFI triples targeting other architectures will now
report an `unknown target triple` error.
This patch initializes AllocInfoIter and CallSitesIter to their
respective end(). I'm doing this not because I'm worried about
uninitialized iterators, but because the resulting code looks shorter
and makes it clear which data structure each iterator is associated
with.
Reverts llvm/llvm-project#124129 as its currently causing a regression at #124499 - avoids the regression until a proper fix can be added to getSpillCost
Update to use getConstShapeValue to collect shape information along the
graph.
Change-Id: Ic6fc2341e3bcfbec06a1d08986e26dd08573bd9c
Co-authored-by: TatWai Chong <tatwai.chong@arm.com>
Modify the DA pretty printer to match the output of other analysis
passes. This enables update_analyze_test_checks.py to also work on DA
tests. Auto generate all the Dependence Analysis tests.
This patch addresses post commit review comments from #124859.
The extra compile definition is not necessary and goes against the
effort to separate the runtimes from the flang compiler itself. The
function declaration for `CUFInit` can be accessed anyway since the
header are always present. The insertion of the call is only based on
the language feature options from the folding context.
A program compiled with cuda enabled but no cufruntime would just fail
at link time as expected.
Parse METADIRECTIVE as a standalone executable directive at the moment.
This will allow testing the parser code.
There is no lowering, not even clause conversion yet. There is also no
verification of the allowed values for trait sets, trait properties.
This is an implementation of P1061 Structure Bindings Introduce a Pack
without the ability to use packs outside of templates. There is a couple
of ways the AST could have been sliced so let me know what you think.
The only part of this change that I am unsure of is the
serialization/deserialization stuff. I followed the implementation of
other Exprs, but I do not really know how it is tested. Thank you for
your time considering this.
---------
Co-authored-by: Yanzuo Liu <zwuis@outlook.com>
This refactors the standard stream implementation in multiple ways:
- The streams are now `stream_data` structs, which contain all the data
required for a stream
- The windows mangling is generated via a macro instead of having magic
strings for the different streams. (i.e. it's now only partially magic)
This fixes instantiation of definition for friend function templates,
when the declaration found and the one containing the definition have
different template contexts.
In these cases, the the function declaration corresponding to the
definition is not available; it may not even be instantiated at all.
So this patch adds a bit which tracks which function template
declaration was instantiated from the member template. It's used to find
which primary template serves as a context for the purpose of
obtainining the template arguments needed to instantiate the definition.
Fixes#55509
Relanding patch, with no changes, after it was reverted due to revert of
commit this patch depended on.