This reverts commit 8a115b6934a90441d77ea54af73e7aaaa1394b38.
This broke premerge. https://lab.llvm.org/staging/#/builders/192/builds/13326
/home/gha/llvm-project/clang/test/Frontend/optimization-remark-options.c:10:11: remark: loop not vectorized: cannot prove it is safe to reorder floating-point operations; allow reordering by specifying '#pragma clang loop vectorize(enable)' before the loop or by providing the compiler option '-ffast-math'
Instead of getting a lock and then checking/modifying the Initialization
variable, make it an atomic. Doing this, we can remove one of the
mutexes in shared TSDs and avoid any potential lock contention in both
shared TSDs and exclusive TSDs if multiple threads do allocation
operations at the same time.
Add two new tests that make sure no crashes occur if multiple threads
try to do allocations at the same time.
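As a rough illustration of checking and updating an initialization flag without taking a lock, here is a minimal sketch. It is not the actual TSD code: `InitState`, `LazyInit`, and `initOnce` are hypothetical names, and the real change may use different states and memory orderings.

```cpp
#include <atomic>
#include <thread>

// Hypothetical three-state flag standing in for the Initialization variable.
enum class InitState { Uninitialized, Initializing, Initialized };

class LazyInit {
public:
  // Returns true only for the single caller that performed the initialization.
  bool initOnce() {
    InitState Expected = InitState::Uninitialized;
    if (State.compare_exchange_strong(Expected, InitState::Initializing,
                                      std::memory_order_acq_rel,
                                      std::memory_order_acquire)) {
      ++InitCount; // Stand-in for the real one-time setup work.
      State.store(InitState::Initialized, std::memory_order_release);
      return true;
    }
    // Another caller won the race; wait until its setup is visible.
    while (State.load(std::memory_order_acquire) != InitState::Initialized)
      std::this_thread::yield();
    return false;
  }

  int initCount() const { return InitCount; }

private:
  std::atomic<InitState> State{InitState::Uninitialized};
  int InitCount = 0; // Written only by the winner, before the release store.
};
```

Any number of threads can call `initOnce()` concurrently; only the caller whose compare-exchange succeeds runs the setup, and no mutex is involved on the fast path.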
#170809 added the child_stdin_fd_ field on SymbolizerProcess to allow
the parent process to hold on to the read end of the child's stdin pipe.
This was to avoid SIGPIPE.
However, the `StartSubprocess` path still closes the stdin fd in the
parent here:
7f5ed91684/compiler-rt/lib/sanitizer_common/sanitizer_posix_libcdep.cpp (L525-L535)
This could cause a double-close of this fd (problematic in the case of
fd reuse).
This moves the `child_stdin_fd_` field to only be initialized on the
posix_spawn path. This should ensure #170809 only truly affects Darwin.
uint64_t and size_t are not the same across all platforms. This was
causing build failures when building this file for wasm:
llvm-project/mlir/lib/Bytecode/Reader/BytecodeReader.cpp:1323:19: error:
out-of-line definition of 'resolveEntry' does not match any declaration
in '(anonymous namespace)::AttrTypeReader'
1323 | T AttrTypeReader::resolveEntry(SmallVectorImpl<Entry<T>>
&entries, size_t index,
| ^~~~~~~~~~~~
third_party/llvm/llvm-project/mlir/lib/Bytecode/Reader/BytecodeReader.cpp:851:7:
note: AttrTypeReader defined here
851 | class AttrTypeReader {
| ^~~~~~~~~~~~~~
1 error generated.
Use uint64_t everywhere to ensure portability.
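To illustrate the class of error (a simplified sketch, not the BytecodeReader code; `EntryTable` and its `resolveEntry` are hypothetical stand-ins): the in-class declaration and the out-of-line definition must name the same parameter type, and `size_t` is only the same type as `uint64_t` on some targets.

```cpp
#include <cstddef>
#include <cstdint>
#include <type_traits>
#include <vector>

// True on many 64-bit hosts, false on targets where size_t is a different
// type (the source reports wasm as one such target).
constexpr bool SizeTIsUint64 =
    std::is_same<std::size_t, std::uint64_t>::value;

class EntryTable {
public:
  // Declared with uint64_t ...
  int resolveEntry(const std::vector<int> &Entries, std::uint64_t Index) const;
};

// ... so the definition must also say uint64_t. Writing size_t here would
// only compile on targets where SizeTIsUint64 happens to be true.
int EntryTable::resolveEntry(const std::vector<int> &Entries,
                             std::uint64_t Index) const {
  if (Index >= Entries.size())
    return -1; // Hypothetical "not found" value for this sketch.
  return Entries[static_cast<std::size_t>(Index)];
}
```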
This PR is very similar to #167235, but applied to `trn` rather than
`zip`. There are two further differences:
- The `@combine_v8i16_8first` and `@combine_v8i16_8firstundef` test
cases in `arm64-zip.ll` didn't have equivalents in `arm64-trn.ll`, so
this PR adds new test cases `@vtrni8_8first`, `@vtrni8_9first`,
`@vtrni8_89first_undef`.
- `AArch64TTIImpl::getShuffleCost` calls `isZIPMask`, but not
`isTRNMask`. It relies on `Kind == TTI::SK_Transpose` instead (which
in turn is based on `ShuffleVectorInst::isTransposeMask` through
`improveShuffleKindFromMask`).
Therefore, this PR does not itself influence the slp-vectorizer. In a
follow-up PR, I intend to override
`AArch64TTIImpl::improveShuffleKindFromMask` to ensure we get
`ShuffleKind::SK_Transpose` based on the new `isTRNMask`. In fact, that
follow-up change is the actual motivation for this PR, as it will result
in
```C++
int8x16_t g(int8_t x)
{
  return (int8x16_t) { 0, x, 1, x, 2, x, 3, x,
                       4, x, 5, x, 6, x, 7, x };
}
```
from #137447 being optimised by the slp-vectorizer.
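For readers unfamiliar with the mask shape being matched, here is a minimal standalone sketch of a transpose-mask check. It is not the `isTRNMask` implementation from this PR: `matchTrnMask` is a hypothetical helper, it assumes `-1` marks an undef lane, and it only covers the basic `<0, N, 2, N+2, ...>` (TRN1) and `<1, N+1, 3, N+3, ...>` (TRN2) patterns.

```cpp
#include <cstddef>
#include <vector>

// Returns 0 if Mask matches the TRN1 pattern, 1 for TRN2, and -1 otherwise.
// Undef lanes (negative values) match anything, so leading undefs do not
// prevent a match.
int matchTrnMask(const std::vector<int> &Mask) {
  const std::size_t NumElts = Mask.size();
  if (NumElts == 0 || NumElts % 2 != 0)
    return -1;
  for (int Which = 0; Which < 2; ++Which) {
    bool Matches = true;
    for (std::size_t I = 0; I < NumElts && Matches; I += 2) {
      const int FromFirst = static_cast<int>(I) + Which;
      const int FromSecond = static_cast<int>(I + NumElts) + Which;
      if (Mask[I] >= 0 && Mask[I] != FromFirst)
        Matches = false;
      if (Mask[I + 1] >= 0 && Mask[I + 1] != FromSecond)
        Matches = false;
    }
    if (Matches)
      return Which;
  }
  return -1;
}
```

Trying both variants, instead of picking one from the first lane, is one way to stay correct when the first lane is undef; whether the PR does it this way is not stated above.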
When GeneratedRTChecks::create bails out due to exceeding the cost
threshold, no runtime checks are generated and we must not proceed
assuming checks have been generated.
Mark the checks as never succeeding, to make sure we don't try to
vectorize assuming the runtime checks hold. This fixes a case where we
previously incorrectly vectorized assuming runtime checks had been
generated when forcing vectorization via metadata.
Fixes the mis-compile mentioned in
https://github.com/llvm/llvm-project/pull/166247#issuecomment-3631471588
Generalize the check for recognizing [[Obj alloc] init] to also
recognize [allocObj() init]. We do this by using the isAllocInit
function in RetainPtrCtorAdoptChecker.
Due to a legacy incompatibility with `atos`, we were allocating a pty
whenever we spawned the symbolizer. This is no longer necessary and we
can use a regular ol' pipe.
This PR is split into two commits:
- The first removes the pty allocation and replaces it with a pipe. This
relocates the `CreateTwoHighNumberedPipes` call to be common to the
`posix_spawn` and `StartSubprocess` path.
- The second commit adds the `child_stdin_fd_` field to
`SymbolizerProcess`, storing the read end of the stdin pipe. By holding
on to this fd for the lifetime of the symbolizer, we are able to avoid
getting SIGPIPE (which would occur when we write to a pipe whose
read-end had been closed due to the death of the symbolizer). This will
be very close to solving #120915, but this PR is intentionally not
touching the non-posix_spawn path.
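The effect of holding on to the read end can be seen in a small single-process POSIX sketch. This is not the sanitizer code: `writeByte` and `pipeDemo` are hypothetical helpers, and SIGPIPE is temporarily ignored here so the failure shows up as `EPIPE` instead of killing the process.

```cpp
#include <cerrno>
#include <signal.h>
#include <unistd.h>

// Returns 0 if one byte was written to Fd, otherwise the errno value.
int writeByte(int Fd) {
  const char C = 'x';
  return ::write(Fd, &C, 1) == 1 ? 0 : errno;
}

struct PipeDemoResult {
  int WithReaderHeld;    // Write result while a read end is still open.
  int AfterReaderClosed; // Write result once every read end is closed.
};

PipeDemoResult pipeDemo() {
  PipeDemoResult R{-1, -1};
  // Ignore SIGPIPE for the duration of the demo; restore it afterwards.
  auto OldHandler = ::signal(SIGPIPE, SIG_IGN);
  int Fds[2];
  if (::pipe(Fds) == 0) {
    R.WithReaderHeld = writeByte(Fds[1]);
    ::close(Fds[0]); // Simulates the last reader (the symbolizer) going away.
    R.AfterReaderClosed = writeByte(Fds[1]);
    ::close(Fds[1]);
  }
  ::signal(SIGPIPE, OldHandler);
  return R;
}
```

While any descriptor for the read end stays open, the write succeeds; once the last one is closed, the write fails with `EPIPE` (and would raise SIGPIPE if the signal were not ignored).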
rdar://165894284
If the subtarget supports flat scratch SVS mode and there is no SGPR
available to replace a frame index, convert a scratch instruction in SS
form into SV form and replace the frame index with a scavenged VGPR.
Resolves #155902
Co-authored-by: Matt Arsenault <matthew.arsenault@amd.com>
This LLVM IR
https://godbolt.org/z/5bM1vrMY1
```llvm
define <4 x i32> @masked(<2 x double> %a, <4 x i32> %src, i8 noundef zeroext %mask) unnamed_addr #0 {
%r = tail call <4 x i32> @llvm.x86.avx10.mask.vcvttpd2udqs.128(<2 x double> %a, <4 x i32> %src, i8 noundef %mask)
ret <4 x i32> %r
}
define <4 x i32> @unmasked(<2 x double> %a) unnamed_addr #0 {
%r = tail call <4 x i32> @llvm.x86.avx10.mask.vcvttpd2udqs.128(<2 x double> %a, <4 x i32> zeroinitializer, i8 noundef -1)
ret <4 x i32> %r
}
declare <4 x i32> @llvm.x86.avx10.mask.vcvttpd2udqs.128(<2 x double>, <4 x i32>, i8) unnamed_addr
attributes #0 = { mustprogress nofree norecurse nosync nounwind nonlazybind willreturn memory(none) uwtable "probe-stack"="inline-asm" "target-cpu"="x86-64" "target-features"="+avx10.2-512" }
```
produces
```asm
masked: # @masked
kmovd k1, edi
vcvttpd2dqs xmm1 {k1}, xmm0
vmovaps xmm0, xmm1
ret
unmasked: # @unmasked
vcvttpd2udqs xmm0, xmm0
ret
```
So, when a mask is used, somehow the signed version of this instruction
is selected. I suspect this is a typo.
Similar to the other PRs, this runs the `std::optional` test with PDB.
Since we don't know that variables use typedefs, we check for the full
name when testing PDB.
Add the ability for a parser to defer its own parsing by re-enqueueing
itself. This lets CallSiteLoc parsing avoid deep recursion, which could
previously overflow the stack, especially on large inputs in debug mode.
Add a default depth cutoff; this could become a parameter later if
needed.
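The general idea of deferring and re-enqueueing instead of recursing can be sketched as follows. This is a simplified illustration, not the parser code: `Entry` and `resolveAll` are hypothetical, dependencies are assumed to be acyclic, and the depth cutoff is left out.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical entry that can only be resolved after its dependency
// (for example, a call-site location that refers to another location).
struct Entry {
  int Dep; // Index of the entry this one depends on, or -1 for none.
  bool Resolved = false;
};

// Resolves Root and everything it depends on with an explicit worklist.
// Returns the indices in the order they were resolved.
std::vector<std::size_t> resolveAll(std::vector<Entry> &Entries,
                                    std::size_t Root) {
  std::vector<std::size_t> Order;
  std::vector<std::size_t> Worklist{Root};
  while (!Worklist.empty()) {
    const std::size_t Cur = Worklist.back();
    Worklist.pop_back();
    Entry &E = Entries[Cur];
    if (E.Resolved)
      continue;
    if (E.Dep >= 0 && !Entries[static_cast<std::size_t>(E.Dep)].Resolved) {
      // Defer: re-enqueue this entry, then its dependency on top of it.
      Worklist.push_back(Cur);
      Worklist.push_back(static_cast<std::size_t>(E.Dep));
      continue;
    }
    E.Resolved = true;
    Order.push_back(Cur);
  }
  return Order;
}
```

Because the pending work lives in a heap-allocated vector, a chain of hundreds of thousands of nested entries does not grow the call stack.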
Since PDB doesn't have template information, we need to get the element
type from somewhere else. I'm using the type of `_Myval` in a list node,
which holds the element type.
Define `LHS.subsetOf(RHS)` as a more descriptive name for `!LHS.test(RHS)`
and update the existing callers to use that name.
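A small sketch of the equivalence, using `std::bitset` in place of the LLVM bit vector types. It assumes `LHS.test(RHS)` means "some bit is set in LHS that is not set in RHS"; `anyBitOutside` is a hypothetical stand-in for that operation.

```cpp
#include <bitset>
#include <cstddef>

// Stand-in for LHS.test(RHS): true if LHS has a bit set that RHS lacks.
template <std::size_t N>
bool anyBitOutside(const std::bitset<N> &LHS, const std::bitset<N> &RHS) {
  return (LHS & ~RHS).any();
}

// The more descriptive spelling: every bit set in LHS is also set in RHS.
template <std::size_t N>
bool subsetOf(const std::bitset<N> &LHS, const std::bitset<N> &RHS) {
  return !anyBitOutside(LHS, RHS);
}
```

With this reading, `{1, 3}` is a subset of `{0, 1, 3}`, the empty set is a subset of everything, and `{1, 2}` is not a subset of `{1, 3}`.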
Co-authored-by: Jakub Kuderski <jakub@nod-labs.com>
Previously, even when MSVC compatibility was not requested, inline move
constructors in dllexport-ed templates were not exported, which was
seemingly unintended.
On non-MSVC targets (MinGW, Cygwin, and PS), such move constructors
should be exported consistently with copy constructors and with the
behavior of modern MSVC.
RST tries to resolve text in single backticks as a reference.
Double backticks mean literal text.
Fixes these warnings:
map.rst:803: WARNING: 'any' reference target not found: target.prefer-dynamic-value
map.rst:814: WARNING: 'any' reference target not found: expr
Added masked compress builtin in CIR.
Note: This is my first PR to llvm. Looking forward to corrections
---------
Co-authored-by: bhuvan1527 <balabhuvanvarma@gmail.com>
For the extended image insts amdgcn_image_sample_*_/gather4_* builtins,
use 'x' in the builtin def so that they take _Float16 for both
HIP/C++ and OpenCL.
LLVM has pretty thorough support for `int128`, and it has started seeing
some use. Even though we already have support for the
`SPV_ALTERA_arbitrary_precision_integers` extension, the BE was oddly
capping integer width to 64 bits. This patch adds partial support for
lowering 128-bit integers to `OpTypeInt 128`. Some work remains to be
done around legalisation support and validating constant uses (e.g.
cases that get lowered to `OpSpecConstantOp`).
Fixed the argument types of the following intrinsics to match with the
ISA:
- vpdpwssd_128, vpdpwssd_256, vpdpwssd_512
- vpdpwssds_128, vpdpwssds_256, vpdpwssds_512
- vpdpwsud_128, vpdpwsud_256, vpdpwsud_512
- vpdpwsuds_128, vpdpwsuds_256, vpdpwsuds_512
- vpdpwusd_128, vpdpwusd_256, vpdpwusd_512
- vpdpwusds_128, vpdpwusds_256, vpdpwusds_512
- vpdpwuud_128, vpdpwuud_256, vpdpwuud_512
- vpdpwuuds_128, vpdpwuuds_256, vpdpwuuds_512
Fixes #97271. Note that this is the last PR for the issue.
This change introduces a new IR pass in the llc pipeline for NVPTX that
transforms sequences of FMUL followed by FADD or FSUB into a single FMA
instruction.
Currently, all FMA folding for NVPTX occurs at the DAGCombine stage,
which is too late for any IR-level passes that might want to optimize or
analyze FMAs. By moving this transformation earlier into the IR phase,
we enable more opportunities for FMA folding, including across basic
blocks.
Additionally, this new pass relies on the contract instruction level
fast-math flag to perform these transformations, rather than depending
on the -fp-contract=fast or -enable-unsafe-fp-math options passed to
llc.
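As background on why contraction is gated on a fast-math flag, the following standalone sketch (plain host C++, unrelated to the pass itself; `mulThenAdd` and `fused` are hypothetical helpers) shows that a separate multiply and add can round differently from a fused multiply-add.

```cpp
#include <cmath>

// Computes A * B + C with the product rounded to double before the add.
double mulThenAdd(double A, double B, double C) {
  volatile double Product = A * B; // volatile keeps the two steps separate.
  return Product + C;
}

// Computes A * B + C with a single rounding at the end.
double fused(double A, double B, double C) { return std::fma(A, B, C); }
```

With IEEE-754 doubles, taking `A = 1 + 2^-27`, `B = 1 - 2^-27`, and `C = -1`, the exact product is `1 - 2^-54`, which rounds to `1.0`, so `mulThenAdd` yields `0.0` while `fused` yields `-2^-54`.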
Runs the `std::shared/unique_ptr` tests with PDB with two changes:
- PDB uses the "full" name, so `std::string` is `std::basic_string<char,
std::char_traits<char>, std::allocator<char>>`
- The type of the pointer inside the shared/unique_ptr isn't the
`element_type` typedef
This adds stubs that issue NYI errors for any visitor that is present in
the ClangIR incubator but missing in the upstream implementation. This
will make it easier to find the correct locations to implement missing
functionality.
This moves a couple of statement emitters that were incorrectly
implemented in the middle of a switch statement where all cases in the
final group are intended to fall through to a handler that emits an NYI
error message. The placement of these implementations was causing some
statement types that should have emitted the NYI error to instead go to
a handler for a different statement type.
The inline assembly handling in SelectionDAG uses the first type
for the register class as the type at the input/output of the
inline assembly. If this isn't the type for the surrounding DAG,
it needs to be converted.
nxv8i8 is the first type for the VR and VRNoV0 register classes.
So we currently generate insert/extract_subvector and bitcasts to
convert to/from nxv8i8.
I believe some of the special casing we have for this in
splitValueIntoRegisterParts and joinRegisterPartsIntoValue is causing
us to also generate incorrect code for arguments with nxv16i4 types
that should be any extended to nxv16i8. Instead we widen them to nxv32i4
and bitcast to nxv16i8.
This patch uses VM and VMNoV0 for masks, which have nxv64i1 as their
first type. This means we will only emit an insert/extract_subvector
without any bitcasts. This will allow me to fix
splitValueIntoRegisterParts and joinRegisterPartsIntoValue to fix the
nxv16i4 argument issue without breaking inline assembly.
I may need to add more register classes to cover fractional LMULs,
but I'm not sure yet.
We're considering modifying the ObjC runtime's class_rw_t structure to
remove the firstSubclass and nextSiblingClass fields in some cases. LLDB
is currently reading those but not actually using them. Stop doing that
to avoid issues if they are removed by the runtime.
rdar://166084122
This PR extends XeGPU layout propagation and distribution for
vector.broadcast operation.
It relaxes the restriction of layout propagation to allow low-rank and
scalar source input, and adds a pattern in sg-to-wi distribution to
support the lowering.
Fixes a crash in `ReorderCastOpsOnBroadcast` by ensuring the cast result
is a `VectorType` before applying the pattern.
A regression test has been added to
mlir/test/Dialect/Vector/vector-sink.mlir.
Fixes: #126371
"All tests passed" is too easily interpreted as every possible test was
run and was fine. A lot of the time it means all the tests that didn't
fail to build ran and were fine.
Maybe the wording is still too subtle but at least it hints at the idea
that the tests run might be fewer than if the build had no compilation
errors.
This fixes the buildbot failures from
https://github.com/llvm/llvm-project/pull/150267.
I could not reproduce them locally but my intuition suggests that the
-O3 option on the RUN line behaves inconsistently on different hosts
judging from the error logs.
My intention was to run an integration test which will use llvm's
globalopt pass, but there's no need actually. We have unittests in place
for it.