This fixes a regression reported here
https://github.com/llvm/llvm-project/pull/147835#issuecomment-3181811371,
where getTrivialTemplateArgumentLoc can't see through template name
sugar when producing a trivial TemplateArgumentLoc for template template
arguments.
Since this regression was never released, there are no release notes.
This PR improves the readability of the files under the
`llvm-project/libc/src/wchar` directory.
---------
Co-authored-by: Jin Huang <jingold@google.com>
This PR fixes a crash in `GpuKernelOutliningPass` that occurred when
encountering a symbol that was not a `FlatSymbolRefAttr`, enabling
outlining of nested `gpu.launch` operations. Fixes #149318.
Use LIBC_ERRNO_MODE_SYSTEM_INLINE instead as the default for the "public
packaging" (i.e. release mode) of an overlay build. The Bazel build has
already switched to use it by default in
5ccc734fa0355f971f8f515457a0bece33ab6642. This should be a safe change,
as LIBC_ERRNO_MODE_SYSTEM_INLINE works as a drop-in (but simpler)
LIBC_ERRNO_MODE_SYSTEM replacement. Remove the associated code paths and
config settings.
Fixes issue #143454.
Cause:
1. An `implicit_def` inside a bundle does not count as a definition of
the register in the machine instruction verifier.
2. Including the `implicit_def` therefore leaves the related register
undefined, resulting in `Bad machine code: Using an undefined physical
register` in the machine instruction verifier.
Fixes https://github.com/llvm/llvm-project/issues/139102
---------
Co-authored-by: Matt Arsenault <Matthew.Arsenault@amd.com>
It might have been a bug that these were previously not included,
but they don't appear to have ever been used:
https://godbolt.org/z/zE6zs8xxa
If these really exist, they probably should be included. Removes 4
unused entries from the set of libcall impls.
We almost always have only one header mask, except with the data tail
folding style, i.e. with VPInstruction::ActiveLaneMask.
All we need to do is make sure to erase the old icmp-based header mask
when replacing it.
The current instrumentation has false positives: if there is a single uninitialized bit in any of the operands, the entire output is poisoned. This does not take into account that multiplying an uninitialized value with zero results in an initialized zero value.
This step allows elements that are zero to clear the corresponding shadow during the multiplication step. The horizontal add step and accumulation step (if any) are modeled using bitwise OR.
Future work can apply this improved handler to the AVX512 equivalent intrinsics (x86_avx512_pmaddw_d_512, x86_avx512_pmaddubs_w_512) and AVX VNNI intrinsics.
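As an illustration of the shadow rule described above, here is a minimal scalar sketch of a pmaddwd-style operation. It is not the actual MemorySanitizer handler: the function name `pmaddShadow`, the vector-based layout, and the assumption that only a fully initialized zero clears the shadow are all hypothetical choices made for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Simplified model: 16-bit lanes are multiplied pairwise and adjacent
// products are summed into 32-bit lanes. A non-zero shadow means the lane
// contains uninitialized bits.
// Assumption: a lane clears the shadow only if it is an initialized zero.
inline std::vector<std::uint32_t>
pmaddShadow(const std::vector<std::int16_t> &a,
            const std::vector<std::uint16_t> &aShadow,
            const std::vector<std::int16_t> &b,
            const std::vector<std::uint16_t> &bShadow) {
  const std::size_t n = a.size();
  assert(n % 2 == 0 && aShadow.size() == n && b.size() == n &&
         bShadow.size() == n);
  std::vector<std::uint32_t> out(n / 2, 0u);
  for (std::size_t i = 0; i < n; ++i) {
    const bool aIsInitZero = a[i] == 0 && aShadow[i] == 0;
    const bool bIsInitZero = b[i] == 0 && bShadow[i] == 0;
    const bool anyPoison = aShadow[i] != 0 || bShadow[i] != 0;
    // Multiplication step: an initialized zero yields an initialized product.
    const std::uint32_t productShadow =
        (!aIsInitZero && !bIsInitZero && anyPoison) ? 0xFFFFFFFFu : 0u;
    // Horizontal add step: modeled as a bitwise OR of the product shadows.
    out[i / 2] |= productShadow;
  }
  return out;
}
```

In this model, a poisoned operand multiplied by an initialized zero leaves the output lane clean, while any other poisoned operand still poisons the whole 32-bit result.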
Use std::numeric_limits<uint32_t>::max() for all overflow checks in
ObjectFileWasm and fix a few locations where I incorrectly used `>=`
instead of `>`.
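The off-by-one between `>=` and `>` can be sketched as follows. The helper names `overflowsU32` and `toU32` are hypothetical and are not taken from ObjectFileWasm; the point is only that the maximum `uint32_t` value is itself representable and must not be rejected.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical helper: true if `value` cannot be stored in a uint32_t.
// The comparison must be `>`; `>=` would wrongly reject the maximum value.
constexpr bool overflowsU32(std::uint64_t value) {
  return value > std::numeric_limits<std::uint32_t>::max();
}

// Hypothetical helper: narrowing conversion whose precondition is that the
// caller has already ruled out overflow.
inline std::uint32_t toU32(std::uint64_t value) {
  assert(!overflowsU32(value) && "caller must check for overflow first");
  return static_cast<std::uint32_t>(value);
}
```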
An earlier draft of DoubleAPFloat::convertToSignExtendedInteger had
arranged for overflow to be handled in a different way. However, these
assertions are now possible if Hi+Lo are out of range and Lo != 0.
A test has been added to defend against a regression.
Reverts llvm/llvm-project#153119 because with
`LLDB_USE_LIBEDIT_READLINE_COMPAT_MODULE`, we're using
`PyImport_Inittab` which isn't part of the stable API.
* This adjusts the `Request`/`Response` types to have an `id` that is
either a string or a number.
* Merges `Error` into `Response` to have a single response type that
represents both errors and results.
* Adjusts the `Error.data` field to be any JSON value.
* Adds `operator==` support to the base protocol types and simplifies
the tests.
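A rough sketch of the shape these changes suggest is shown below. The type names `Id`, `ErrorInfo`, and `Response`, their fields, and the use of a serialized string in place of a JSON value are all assumptions made for illustration; they do not mirror the actual protocol definitions.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <variant>

// Hypothetical: an id that is either a number or a string.
using Id = std::variant<std::int64_t, std::string>;

struct ErrorInfo {
  std::int64_t code = 0;
  std::string message;
  // The described field may hold any JSON value; a serialized string keeps
  // this sketch free of a JSON library dependency.
  std::optional<std::string> data;
};
inline bool operator==(const ErrorInfo &a, const ErrorInfo &b) {
  return a.code == b.code && a.message == b.message && a.data == b.data;
}

// Hypothetical: one response type that carries either a result or an error.
struct Response {
  Id id;
  std::variant<std::string, ErrorInfo> result;
};
inline bool operator==(const Response &a, const Response &b) {
  return a.id == b.id && a.result == b.result;
}
```

With `operator==` defined on the types themselves, a test can compare an expected and an actual `Response` directly instead of checking each field.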
The prior implementation did not consider that the Lo component may
underflow when it undergoes scaling. This means that we need to
carefully handle cases such as binade crossings, or roundTowardZero
when Hi and Lo have different signs.
Particularly annoying is roundTiesToAway when Hi and Lo have different
signs. It basically requires us to implement roundTiesTowardZero.
The profiling-related metadata for the hoisted conditional branch should be copied from the original branch, not from the current terminator of the block it's hoisted to.
The patch adds a way to disable the fix just so we can do an ablation test, after which the flag will be removed. The same flag will be reused for other similar fixes.
(This was identified through `profcheck` (see Issue #147390), and this PR addresses most of the test failures (when running under profcheck) under `Transforms/LICM`.)
Properly cast the selector to `i64` regardless of its integer type.
Previously, we always generated `llvm.trunc`.
We have to use `i64` since the case values may exceed INT_MAX.
Fixes #153050.
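The width-dependent choice can be sketched as below. The enum `SelectorCast` and the function `castToI64` are hypothetical names for this example, and whether the extension is signed or unsigned is deliberately left open because the description does not say.

```cpp
#include <cassert>

// Hypothetical classification of the cast needed to bring a selector of the
// given integer bit width to i64.
enum class SelectorCast { None, Extend, Truncate };

constexpr SelectorCast castToI64(unsigned bitWidth) {
  if (bitWidth < 64)
    return SelectorCast::Extend;   // e.g. i8, i16, i32 selectors
  if (bitWidth > 64)
    return SelectorCast::Truncate; // e.g. an i128 selector
  return SelectorCast::None;       // already i64
}
```

Always emitting a truncation corresponds to taking the `Truncate` branch unconditionally, which is only valid for selectors wider than 64 bits.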
Recently my change to avoid duplicate `dontcall` attribute errors
(#152810) caused the Clang `Frontend/backend-attribute-error-warning.c`
test to fail on Arm32:
<https://lab.llvm.org/buildbot/#/builders/154/builds/20134>
The root cause is that, if the default `FastISel` path bails, then
targets are given the opportunity to lower instructions via
`fastSelectInstruction`. That's the path taken by Arm32 and since its
implementation of `selectCall` didn't call `diagnoseDontCall` no error
was emitted.
I've checked the other implementations of `fastSelectInstruction` and
the only other one that lowers call instructions is WebAssembly's, so
I've fixed that too.
Construct the target triple with `ObjectFile::makeTriple` instead of
using only `Arch` and leaving the rest unknown. Also create the
subtarget with the `CPU`. AMDGPU needs the full triple and `CPU` to
disassemble correctly.
To run a full test, also fixed a failure in `SIPreAllocateWWMRegs` with
the `$noreg` operand in `DBG_VALUE`.
---------
Co-authored-by: Matt Arsenault <arsenm2@gmail.com>
This adds support for printing the signature sections as part of the
`-p` flag for printing private headers.
The formatting aims to roughly match the formatting used by DXC's
`/dumpbin` flag.
Resolves #152380.
This PR implements "automatic" location inference in the bindings. The
way it works is it walks the frame stack collecting source locations
(Python captures these in the frame itself). It is inspired by JAX's
[implementation](523ddcfbca/jax/_src/interpreters/mlir.py (L462))
but moves the frame stack traversal into the bindings for better
performance.
The system supports registering "included" and "excluded" filenames;
frames originating from functions in included filenames **will not** be
filtered and frames originating from functions in excluded filenames
**will** be filtered (in that order). This allows excluding all the
generated `*_ops_gen.py` files.
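The ordering of the two lists can be sketched as follows. This is a hypothetical model, not the bindings' code: the function names are invented, a filename is assumed to match a registered entry by substring, and frames matching neither list are assumed to be kept.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Assumption: an entry matches when it occurs as a substring of the filename.
inline bool matchesAny(const std::string &filename,
                       const std::vector<std::string> &patterns) {
  for (const std::string &p : patterns)
    if (filename.find(p) != std::string::npos)
      return true;
  return false;
}

// Returns true if a frame from `filename` should stay in the traceback.
// Included entries are consulted first, so they win over excluded ones.
inline bool keepFrame(const std::string &filename,
                      const std::vector<std::string> &included,
                      const std::vector<std::string> &excluded) {
  if (matchesAny(filename, included))
    return true;
  if (matchesAny(filename, excluded))
    return false;
  return true; // Assumption: frames matching neither list are kept.
}
```

Checking the included list first means a project can exclude a broad pattern such as generated op files while still keeping frames from its own sources that happen to match it.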
The system is also "toggleable" and off by default to save people who
have their own systems (such as JAX) from the added cost.
Note, the system stores the entire stacktrace (subject to
`locTracebackFramesLimit`) in the `Location` using specifically a
`CallSiteLoc`. This can be useful for profiling tools (flamegraphs
etc.).
Shoutout to the folks at JAX for coming up with a good system.
---------
Co-authored-by: Jacques Pienaar <jpienaar@google.com>
Fixes #135572
Two problems cause this issue. First, register types are copied from
older registers instead of evaluating the SPIR-V types. Second, given
the way OpSelect is defined in SPIRVInstrInfo.td, we always default to
integer for TernOpTyped. There seems to be a problem of multiple matches
in the getMatchTable, so when executeMatchTable runs we aren't getting
the right OpSelect.
Correcting the TableGen definition wasn't very easy, so instead I
created an emitter for Select that evaluates the register types. This
passes the original llvm/test/CodeGen/SPIRV/instructions/select.ll tests
and the new float ones I'm adding in issue-135572-emit-float-opselect.ll.
When determining what arguments to pass to `clang-linker-wrapper` as
device linker args, don't forward `-mllvm` args if the offloading
toolchain doesn't have native LLVM support.
I saw this when working with SPIR-V, where we were trying to pass
`-mllvm` to `spirv-link`.
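The filtering rule can be sketched as below. This is a simplified stand-in rather than the driver's code: `filterDeviceLinkerArgs` and the `hasNativeLLVMSupport` parameter are hypothetical names, and arguments are modeled as plain strings where `-mllvm` is followed by its value as a separate argument.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: drops every `-mllvm <value>` pair when the offloading
// toolchain has no native LLVM support; otherwise forwards all arguments.
inline std::vector<std::string>
filterDeviceLinkerArgs(const std::vector<std::string> &args,
                       bool hasNativeLLVMSupport) {
  std::vector<std::string> out;
  for (std::size_t i = 0; i < args.size(); ++i) {
    if (!hasNativeLLVMSupport && args[i] == "-mllvm") {
      ++i; // Also skip the value that follows `-mllvm`.
      continue;
    }
    out.push_back(args[i]);
  }
  return out;
}
```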
Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>
Remove the type alias now that the std::variant aspect is gone; directly
using std::vector in the few places that need it is more idiomatic.
Move a routine from a core header to its single user.
`f16` is passed and returned in vector registers on both x86 and AArch64,
the same calling convention as `f32`, so it is a straightforward type to
support. The calling convention support already exists, added as part of
a6065f0fa55a ("Arm64EC entry/exit thunks, consolidated. (#79067)").
Thus, add mangling and remove the error in order to make `half` work.
MSVC does not yet support `_Float16`, so for now this will remain an
LLVM-only extension.
Fixes the `f16` portion of
https://github.com/llvm/llvm-project/issues/94434
This reverts commit 1c7c8e3ad39957285524ff116d9a6aec0d9b62f9.
Recommit with a fix for the verifier error caused for EVL recipes.
Extra test coverage added in 6f939da60e.