This PR include two changes:
1. Change debuginfod cache file name to include origin file name, the
new file name would be something like:
llvmcache-13267c5f5d2e3df472c133c8efa45fb3331ef1ea-liblzma.so.5.2.2.debuginfo.dwp
So it will provide more information in image list instead of a plain
llvmcache-123
2. Switch debuginfod cache to use lldb index cache settings. Currently
we don't have proper settings for setting the cache path or the cache
expiration time for debuginfod cache. We want to use the lldb index
cache settings, as they make sense to be in the same place and have the
same TTL.
---------
Co-authored-by: George Hu <georgehuyubo@gmail.com>
We have a textual representation of contextual profiles for test scenarios, mainly. This patch moves that to YAML instead of JSON. YAML is more succinct and readable (some of the .ll tests should be illustrative). In addition, JSON is parse-able by the YAML reader.
A subsequent patch will address deserialization.
(thanks, @kazutakahirata, for showing me how to use the llvm YAML reader/writer APIs, which I incorrectly thought to be more low-level than the JSON ones!)
* Construct frequently-accessed TimerLock/DefaultTimerGroup early to
reduce overhead.
* Rename `aquireDefaultGroup` to `acquireTimerGlobals` and restore
ManagedStatic::claim. https://reviews.llvm.org/D76099
* Drop mtg::. We use internal linkage, so mtg:: is unneeded and might
mislead users. In addition, llvm/ code almost never introduces a named
namespace not in llvm::. Drop mtg::.
* Replace some unique_ptr with optional to reduce overhead.
* Switch to `functionName()`.
* Simplify `llvm::initTimerOptions` and `TimerGroup::constructForStatistics()`
Pull Request: https://github.com/llvm/llvm-project/pull/122429
## Changes
- Delete DirectX length intrinsic
- Delete HLSL length lang builtin
- Implement length algorithm entirely in the header.
## History
- In the past if an HLSL intrinsic lowered to either a spirv op code or
a DXIL opcode we represented it with intrinsics
## Why we are moving away?
- To make HLSL apis more portable the team decided that it makes sense
for some intrinsics to be defined only in the header.
- Since there tends to be more SPIRV opcodes than DXIL opcodes the plan
is to support SPIRV opcodes either with target specific builtins or via
pattern matching.
PR #96083 added intrinsics for async copy of 'tensor' data
using TMA. Following a similar design, this PR adds intrinsics
for async copy of bulk data (non-tensor variants) through TMA.
* These intrinsics optionally support multicast and cache_hints,
as indicated by the boolean arguments at the end of the intrinsics.
* The backend looks through these flag arguments and lowers to the
appropriate PTX instructions.
* Lit tests are added for all combinations of these intrinsics in
cp-async-bulk.ll.
* The generated PTX is verified with a 12.3 ptxas executable.
* Added docs for these intrinsics in NVPTXUsage.rst file.
PTX Spec reference:
https://docs.nvidia.com/cuda/parallel-thread-execution/#data-movement-and-conversion-instructions-cp-async-bulk
Signed-off-by: Durgadoss R <durgadossr@nvidia.com>
This takes inspiration from AArch64 which does the same thing to assist
with zip/trn/etc.. Doing this recursion unconditionally when the mask
allows is slightly questionable, but seems to work out okay in practice.
As a bit of context, it's helpful to realize that we have existing logic
in both DAGCombine and InstCombine which mutates the element width of in
an analogous manner. However, that code has two restriction which
prevent it from handling the motivating cases here. First, it only
triggers if there is a bitcast involving a different element type.
Second, the matcher used considers a partially undef wide element to be
a non-match. I considered trying to relax those assumptions, but the
information loss for undef in mid-level opt seemed more likely to open a
can of worms than I wanted.
Two instances of `PointerUnion` with different active members and null
value compare unequal. Currently, this results in counterintuitive
behavior when using functions from `Casting.h`, e.g.:
```C++
PointerUnion<int *, float *> U;
// U = (int *)nullptr;
dyn_cast<int *>(U); // Aborts
dyn_cast<float *>(U); // Aborts
U = (float *)nullptr;
dyn_cast<int *>(U); // OK
dyn_cast<float *>(U); // OK
```
`dyn_cast` should abort in all cases because the argument is null.
Currently, it aborts only if the first member is active. This happens
because the partial template specialization of `ValueIsPresent` for
nullable types compares the union with a union constructed from nullptr,
and the two unions compare equal only if their active members are the
same.
This patch changed the specialization of `ValueIsPresent` for nullable
types to make `isPresent()` return false for all possible null values of
a PointerUnion, and fixes two places where the old behavior was
exploited.
Pull Request: https://github.com/llvm/llvm-project/pull/121847
DeferredAAs should only capture bootstrap actions, but after 30b73ed7bd it was
capturing all actions, including those from other plugins. This is problematic
as other plugins may introduce actions that need to run before the platform
actions (e.g. on arm64e we need pointer signing to run before we access any
global pointers in the graph).
Note that this effectively undoes 30b73ed7bd, which was a buggy attempt to
synchronize writes to the DeferredAAs vector. This patch fixes that issue the
obvious way by locking the bootstrap mutex while accessing the DeferredAAs
vector.
No testcase yet: So far I've only seen this fail during bootstrap of arm64e
JIT'd programs.
The `EraseCallbackID` field is not always initialized in the ctor for
SeedCollector; if not, it will be used uninitialized by its dtor. This
could potentially lead to the erasure of a random callback, leading to a
bug.
Fixed by making `CallbackID` an opaque type, which is always
default-initialized to an invalid ID.
This reapplies 6d72bf47606, which was reverted in 57447d3ddf to investigate
build failures, e.g. https://lab.llvm.org/buildbot/#/builders/3/builds/10114.
The original patch contained an invalid unused friend declaration of
std::make_shared. This has been removed.
Also adds a new IdleTask type and updates DynamicThreadPoolTaskDispatcher to
schedule IdleTasks whenever the total number of threads running is less than
the maximum number of MaterializationThreads.
A SimpleLazyReexportsSpeculator instance maintains a list of speculation
suggestions ((JITDylib, Function) pairs) and registered lazy reexports. When
speculation opportunities are available (having been added via
addSpeculationSuggestions or when lazy reexports were created) it schedules
an IdleTask that triggers the next speculative lookup as soon as resources
are available. Speculation suggestions are processed first, followed by
lookups for lazy reexport bodies. A callback can be registered at object
construction time to record lazy reexport executions as they happen, and these
executions can be fed back into the speculator as suggestions on subsequent
executions.
The llvm-jitlink tool is updated to support speculation when lazy linking is
used via three new arguments:
-speculate=[none|simple] : When the 'simple' value is specified a
SimpleLazyReexportsSpeculator instances is used
for speculation.
-speculate-order <path> : Specifies a path to a CSV containing
(jit dylib name, function name) triples to use
as speculative suggestions in the current run.
-record-lazy-execs <path> : Specifies a path in which to record lazy function
executions as a CSV of (jit dylib name, function
name) pairs, suitable for use with
-speculate-order.
The same path can be passed to -speculate-order and -record-lazy-execs, in
which case the file will be overwritten at the end of the execution.
No testcase yet: Speculative linking is difficult to test (since by definition
execution behavior should be unaffected by speculation) and this is an new
prototype of the concept*. Tests will be added in the future once the interface
and behavior settle down.
* An earlier implementation of the speculation concept can be found in
llvm/include/llvm/ExecutionEngine/Orc/Speculation.h. Both systems have the
same goal (hiding compilation latency) but different mechanisms. This patch
relies entirely on information available in the controller, where the old
system could receive additional information from the JIT'd runtime via
callbacks. I aim to combine the two in the future, but want to gain more
practical experience with speculation first.
The caller `AsmPrinter::emitJumpTableInfo` checks [1] `MJTI` is not a
null pointer before calling `emitJumpTableEntry` or
`emitJumpTableSizesSection`.
This patch updates callee function's signature to accept const
reference, this way it's explicit `MJTI` won't be nullptr inside the
callee.
[1]
9d5299eb61/llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp (L2857)
With this patch we switch from the temporary dummy seeds to actual seeds
provided by the seed collector.
The seeds get sliced and each slice is used as the starting point for
vectorization.
## Changes
- Delete DirectX length intrinsic
- Delete HLSL length lang builtin
- Implement length algorithm entirely in the header.
## History
- In the past if an HLSL intrinsic lowered to either a spirv op code or
a DXIL opcode we represented it with intrinsics
## Why we are moving away?
- To make HLSL apis more portable the team decided that it makes sense
for some intrinsics to be defined only in the header.
- Since there tends to be more SPIRV opcodes than DXIL opcodes the plan
is to support SPIRV opcodes either with target specific builtins or via
pattern matching.
Rearrange the DepDistanceAndSizeInfo struct in preparation to scale
strides. getDependenceDistanceStrideAndSize now returns the data of
CommonStride, MaxStride, and clarifies when to retry with runtime
checks, in place of (unscaled) strides.
This is intended to help with flang `-ftime-report` support:
- #107270.
With this change, I was able to cherry-pick #107270, uncomment
`llvm::TimePassesIsEnabled = true;` and compile with `-ftime-report`.
I also noticed that `clang/lib/Driver/OffloadBundler.cpp` was statically
constructing a `TimerGroup` and changed it to lazily construct via
ManagedStatic.
This is a split-off from #109833 and only adds code relating to checking
if a struct-returning call can be vectorized.
This initial patch only allows the case where all users of the struct
return are `extractvalue` operations that can be widened.
```
%call = tail call { float, float } @foo(float %in_val)
%extract_a = extractvalue { float, float } %call, 0
%extract_b = extractvalue { float, float } %call, 1
```
Note: The tests require the VFABI changes from #119000 to pass.
This moves combineAAMetadata() into Local and implements it via a new
AAOnly flag, which will intersect only AA metadata and keep other known
metadata.
The existing KnownIDs list is dropped, because it is redundant with the
switch in combineMetadata(), which already drops unknown metadata.
I tried a few variants of this, and ultimately went with the AAOnly flag
because this way we make an explicit choice for each metadata kind
supported by combineMetadata(), and ignoring the flag gives you
conservatively correct behavior.
I checked that the memcpy tests still pass if we adjust the logic for
MD_memprof/MD_callsite to drop the metadata instead of arbitrarily
picking one.
Fixes https://github.com/llvm/llvm-project/issues/121495.
This introduces `@llvm.dx.resource.load.rawbuffer` and generalizes the
buffer load docs under DirectX/DXILResources.
This resolves the "load" parts of #106188
Threads created by DynamicThreadPoolTaskDispatcher::dispatch had been holding a
unique_ptr to the most recent Task, meaning that the Task would be destroyed
when the thread object was destroyed, but this would happen *after* the thread
signaled the Dispatcher that it was finished. This could cause
DynamicThreadPoolTaskDispatcher::shutdown to return (and consequently
ExecutionSession to be destroyed) before all Tasks were destroyed, with Task
destructors accessing ExecutionSession and related objects after they were
freed.
The fix is to reset the Task pointer immediately after it is run to trigger
cleanup, *then* (if there are no other tasks to run) signal the Dispatcher that
the thread is finished.
This patch also updates DynamicThreadPoolTaskDispatcher::dispatch to reject any
new Tasks dispatched after DynamicThreadPoolTaskDispatcher::shutdown is called.
Currently, the more features a version has, the higher its priority is.
We are changing ACLE https://github.com/ARM-software/acle/pull/370 as
follows:
"Among any two versions, the higher priority version is determined by
identifying the highest priority feature that is specified in exactly
one of the versions, and selecting that version."
While VALUES is not actually used by LLVM_MAKE_OPT_ID_WITH_ID_PREFIX
threading the correct value through is clearer and avoids the potential
for strange bugs if this ever changes.
Embedded development often needs to use a different C standard library,
replacing the existing one normally passed as -internal-externc-isystem.
This works fine for an apple-macos target, but apple-none-macho doesn't
work because the MachO driver doesn't implement
AddClangSystemIncludeArgs to add the resource directory as
-internal-isystem like most other drivers do. Move most of the search
path logic from Darwin and DarwinClang down into an AppleMachO toolchain
between the MachO and Darwin toolchains.
Also define __MACH__ for apple-none-macho, as Swift expects all MachO
targets to have that defined.
As mentioned in https://github.com/llvm/llvm-project/pull/118989, all
sanitizers but tsan are converted to just module pass for easier
maintenance.
This patch removes the TySan function pass, convert TySan from
function+module pass to just module pass.
Explicitly disable copy CTOR/assigment for SchedBundle to avoid
acsidentional
usage of default versions that do not handle Nodes copies properly.
A developer will need to implement them once required.
The `IPOAmendableCB`'s type is `llvm::function_ref`, it is error-prone
to write code (e.g.
5656cbca52/llvm/lib/Transforms/IPO/OpenMPOpt.cpp (L5812))
that assign a temporary lambda to an `IPOAmendableCB` object, which is a
use-after-free issue.
This patch changes the `IPOAmendableCB` to `std::function`, to avoid the
dangling issue.