The Generic_GCC::GCCInstallationDetector class picks the GCC installation directory with the largest version number. Since the location of the libstdc++ include directories is tied to the GCC version, this can break C++ compilation if the libstdc++ headers for this particular GCC version are not available. Linux distributions tend to package the libstdc++ headers separately from GCC. This frequently leads to situations in which a newer version of GCC gets installed as a dependency of another package without installing the corresponding libstdc++ package. Clang then fails to compile C++ code because it cannot find the libstdc++ headers. Since libstdc++ headers are in fact installed on the system, the GCC installation continues to work, the user may not be aware of the details of the GCC detection, and the compiler does not recognize the situation and emit a warning, this behavior can be hard to understand - as witnessed by many related bug reports over the years.
The goal of this work is to change the GCC detection to prefer GCC installations that contain libstdc++ include directories over those which do not. This should happen regardless of the input language since picking different GCC installations for a build that mixes C and C++ might lead to incompatibilities.
Any change to the GCC installation detection will probably have a negative impact on some users. For instance, for a C user who relies on using the GCC installation with the largest version number, it might become necessary to use the --gcc-install-dir option to ensure that this GCC version is selected.
This seems like an acceptable trade-off given that the situation for users who do not have any special demands on the particular GCC installation directory would be improved significantly.
This patch does not yet change the automatic GCC installation directory choice. Instead, it does introduce a warning that informs the user about the future change if the chosen GCC installation directory differs from the one that would be chosen if the libstdc++ headers are taken into account.
See also this related Discourse discussion: https://discourse.llvm.org/t/rfc-take-libstdc-into-account-during-gcc-detection/86992.
This patch reapplies #145056. The test in the original PR did not specify a target in the clang RUN line and used a wrong way of piping to FileCheck.
Start considering !amdgpu.no.remote.memory.access and
!amdgpu.no.fine.grained.host.memory metadata when deciding to expand
integer atomic operations. This does not yet attempt to accurately
handle fadd/fmin/fmax, which are trickier and require migrating the
old "amdgpu-unsafe-fp-atomics" attribute.
System scope atomics need to use cmpxchg loops if we know
nothing about the allocation the address is from.
aea5980e26e6a87dab9f8acb10eb3a59dd143cb1 started this, this
expands the set to cover the remaining integer operations.
Don't expand xchg and add, those theoretically should work over PCIe.
This is a pre-commit which will introduce performance regressions.
Subsequent changes will add handling of new atomicrmw metadata, which
will avoid the expansion.
Note this still isn't conservative enough; we do need to expand
some device scope atomics if the memory is in fine-grained remote
memory.
`NumFiltered` is the number of elements in all vectors in a map.
It is ever compared to 1, which is equivalent to checking if the map
contains exactly one vector with exactly one element.
All targets are included by Intrinsics.td so we should name things
carefully to avoid interfering with other targets.
Copy one class that LoongArch was also using.
### Problem
PR #142944 introduced a new canonicalization pattern which caused
failures in the following GPU-related integration tests:
-
mlir/test/Integration/GPU/CUDA/TensorCore/sm80/transform-mma-sync-matmul-f16-f16-accum.mlir
-
mlir/test/Integration/GPU/CUDA/TensorCore/sm80/transform-mma-sync-matmul-f32.mlir
The issue occurs because the new canonicalization pattern can generate
multi-dimensional `vector.from_elements` operations (rank > 1), but the
GPU lowering pipelines were not equipped to handle these during the
conversion to LLVM.
### Fix
This PR adds `vector::populateVectorFromElementsLoweringPatterns` to the
GPU lowering passes that are integrated in `gpu-lower-to-nvvm-pipeline`:
- `GpuToLLVMConversionPass`: the general GPU-to-LLVM conversion pass.
- `LowerGpuOpsToNVVMOpsPass`: the NVVM-specific lowering pass.
Co-authored-by: Yang Bai <yangb@nvidia.com>
Parse the `Inst` and `SoftField` fields once and store them in
`InstructionEncoding` so that we don't parse them every time
`getMandatoryEncodingBits()` is called.
Adds support for the Error class, Expected class template, and related
APIs that will be used for error propagation and handling in the new ORC
runtime.
The implementations of these types are cut-down versions of similar APIs
in llvm/Support/Error.h. Most advice on llvm::Error and llvm::Expected
(e.g. from the LLVM Programmer's manual) applies equally to
orc_rt::Error and orc_rt::Expected.
Ported from the old ORC runtime at compiler-rt/lib/orc.
Currently we only allow folding not (cmp eq) -> icmp ne if the not is
the only user of the compare.
However a common scenario is that some select might also use the
compare. We can still fold the not if we also swizzle the arms of the
selects.
This helps avoid regressions in #150368
libclc sequential build issue addressed in commit 0c21d6b4c8ad is
specific to cmake MSVC generator. Therefore, this PR avoids creating a
large number of targets when a non-MSVC generator is used, such as the
Ninja generator, which is used in pre-merge CI on Windows in
llvm-project repo. We plan to migrate from MSVC generator to Ninja
generator in our downstream CI to fix flaky cmake bug `Cannot restore
timestamp`, which might be related to the large number of targets.
This patch adds setters to the SBStruturedData class to be able to
initialize said object from the client side directly.
Signed-off-by: Med Ismail Bennani <ismail@bennani.ma>
The CrossDSOCFI pass runs on the full LTO module and fills in the
body of __cfi_check. This function must have the correct attributes in
order to be compatible with the rest of the program. For example, when
building with -mbranch-protection=standard, the function must have the
branch-target-enforcement attribute, which is normally added by Clang.
When __cfi_check is missing, CrossDSOCFI will give it the default set
of attributes, which are likely incorrect. Therefore, emit __cfi_check
to the full LTO part, where CrossDSOCFI will see it.
Reviewers: efriedma-quic, vitalybuka, fmayer
Reviewed By: efriedma-quic
Pull Request: https://github.com/llvm/llvm-project/pull/154833
Flang currently doesn't build in debug mode on GCC 15 due to missing
dynamic libraries in some CMakeLists.txt files, and OpenMP doesn't link
in debug mode due to the atomic library pulling in libstdc++ despite an
incomplete attempt in the CMakeLists.txt to disable glibcxx assertions.
This PR fixes these issues and allows Flang and the OpenMP runtime to
build and link on GCC 15 in debug mode.
---------
Co-authored-by: ronlieb <ron.lieberman@amd.com>
As the FIXME says, we might generate the wrong code to decode an
instruction if it had an operand with no encoding bits. An example is
M68k's `MOV16ds` that is defined as follows:
```
dag OutOperandList = (outs MxDRD16:$dst);
dag InOperandList = (ins SRC:$src);
list<Register> Uses = [SR];
string AsmString = "move.w\t$src, $dst"
dag Inst = (descend { 0, 1, 0, 0, 0, 0, 0, 0, 1, 1 },
(descend { 0, 0, 0 }, (operand "$dst", 3)));
```
The `$src` operand is not encoded, but what we see in the decoder is:
```C++
tmp = fieldFromInstruction(insn, 0, 3);
if (!Check(S, DecodeDR16RegisterClass(MI, tmp, Address, Decoder)))
{ return MCDisassembler::Fail; }
if (!Check(S, DecodeSRCRegisterClass(MI, insn, Address, Decoder)))
{ return MCDisassembler::Fail; }
return S;
```
This calls DecodeSRCRegisterClass passing it `insn` instead of the value
of a field that doesn't exist. DecodeSRCRegisterClass has an
unconditional llvm_unreachable inside it.
New decoder looks like:
```C++
tmp = fieldFromInstruction(insn, 0, 3);
if (!Check(S, DecodeDR16RegisterClass(MI, tmp, Address, Decoder)))
{ return MCDisassembler::Fail; }
return S;
```
We're still not disassembling this instruction right, but at least we no
longer have to provide a weird operand decoder method that accepts
instruction bits instead of operand bits.
See #154477 for the origins of the FIXME.
I'm trying to remove the redirection in SmallSet.h:
template <typename PointeeType, unsigned N>
class SmallSet<PointeeType*, N> : public SmallPtrSet<PointeeType*, N>
{};
to make it clear that we are using SmallPtrSet. There are only
handful places that rely on this redirection.
Now, this unit test is unique in that supply multiple key types via
TYPED_TESTS.
This patch adds UniversalSmallSet to work around the problem.
I'm trying to remove the redirection in SmallSet.h:
template <typename PointeeType, unsigned N>
class SmallSet<PointeeType*, N> : public SmallPtrSet<PointeeType*, N>
{};
to make it clear that we are using SmallPtrSet. There are only
handful places that rely on this redirection.
This patch replaces SmallSet to SmallPtrSet where the element type is
a pointer.
insertParsePoints collects live variables and then deduplicate them
while retaining the original insertion order, which is exactly what
SetVector is designed for. This patch replaces SmallVector with
SetSmallVector while deleting unique_unsorted.
While we are at it, this patch reduces the number of inline elements
to a reasonable level for linear search.
When multiple assumed size variable are used in a kernel with dynamic
shared memory, each variable use the 0 offset. Update the pass to
account for that.
```
attributes(global) subroutine testany( a )
real(4), shared :: smasks(*)
real(8), shared :: dmasks(*)
end subroutine
```
If an encoding has a custom decoder, the decoder is assumed to be
"complete" (always succeed) if hasCompleteDecoder field is true. We
determine this when constructing InstructionEncoding.
If the decoder for an encoding is *generated*, it always succeeds if
none of the operand decoders can fail. The latter is determined based on
the value of operands' DecoderMethod/hasCompleteDecoder. This happens
late, at table construction time, making the code harder to follow.
This change moves this logic to the InstructionEncoding constructor.
Instead of interleaving the pseudo definitions and their patterns,
define all the pseudos together and all the patterns together.
Add IsRV32 predicate to the patterns.
Summary:
The `__builtin_shufflevector` call would return a GCC vector in all
cases where the vector type was increased. Change this to preserve
whether or not this was an extended vector.
Fixes: https://github.com/llvm/llvm-project/issues/107981