Win/ASan relies on the runtime's functions being 16-byte aligned so it
can intercept them with hotpatching. This used to be true (but not
guaranteed) until #149444.
Passing /hotpatch will give us enough alignment and generally ensure
that the functions are hotpatchable.
Drop poison-generating flags on trunc when distributing trunc over
add/sub/or. We need to do this since, for example,
(add (trunc nuw A), (trunc nuw B)) is more poisonous than
(trunc nuw (add A, B)). Concretely, with i16 A = 256 and B = 65280,
(add A, B) wraps to 0, so (trunc nuw (add A, B)) to i8 is simply 0,
while (trunc nuw A) is already poison because the truncated bits of A
are non-zero.
In some situations it is pessimistic to drop the flags, such as when
the add in the example above also has the nuw flag. For now we keep it
simple and always drop the flags.
Worth mentioning is that we drop the flags when cloning instructions
and rebuilding the chain. This is done after the "allowsPreservingNUW"
checks in ConstantOffsetExtractor::Extract, so we still take the
"trunc nuw" into consideration when determining whether nuw can be
preserved in the gep (which should be fine since that check also
requires that all the involved binary operations have nuw).
Fixes #154116
This handles constant folding for the AVX2 per-element shift
intrinsics, which handle out-of-bounds shift amounts (logical result =
0, arithmetic result = sign-bit splat).
AVX512 intrinsics will follow in follow-up patches.
First stage of #154287
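For reference, a minimal scalar sketch of the semantics being folded
(illustrative C++ modeling one lane of a vpsrlvd/vpsravd-style shift;
`foldSrlv`/`foldSrav` are made-up names, not the actual folding code):

```cpp
#include <cstdint>

// One lane of a logical variable shift (vpsrlvd-style): out-of-range
// amounts fold to 0.
uint32_t foldSrlv(uint32_t Elt, uint32_t Amt) {
  return Amt > 31 ? 0u : Elt >> Amt;
}

// One lane of an arithmetic variable shift (vpsravd-style): out-of-range
// amounts fold to a sign-bit splat (all-ones for negative inputs).
int32_t foldSrav(int32_t Elt, uint32_t Amt) {
  if (Amt > 31)
    return Elt < 0 ? -1 : 0;
  return Elt >> Amt; // arithmetic shift (guaranteed since C++20)
}
```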
This is a weird special case added in 2015, simplifying an even older
condition. It is a no-op for ELF (isExternal is always false) and seems
unneeded for non-ELF.
Similar to 94655dc8ae.
The difference is that in LoongArch, the ALIGN is synthesized when the
alignment is >4 (instead of >=4), and the number of bytes inserted is
`sec->addralign - 4`.
extractSubregFromImm previously would sign-extend the 16-bit
subregister extracts, but not the 32-bit ones. We try to consistently
store immediates as sign-extended, since not doing so can result in
misreported isInlineImmediate checks.
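A minimal sketch of the intended behavior, using LLVM's SignExtend64
helper (`extractLo16`/`extractLo32` are illustrative stand-ins for the
real helper):

```cpp
#include <cstdint>
#include "llvm/Support/MathExtras.h"

// Both widths now return sign-extended values; previously only the
// 16-bit extract did.
int64_t extractLo16(uint64_t Imm) { return llvm::SignExtend64<16>(Imm); }
int64_t extractLo32(uint64_t Imm) { return llvm::SignExtend64<32>(Imm); }
```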
Since the src_{private|shared}_{base|limit} registers were added and
are not marked artificial, the compiler happily uses them when it can.
In HW these registers do not exist; the encoding belongs to their
64-bit super-register or 32-bit low register. The same instructions
will produce relocations if run through the assembler.
This patch updates the TMA Tensor prefetch Op
to add support for im2col_w/w128 and tile_gather4 modes.
This completes support for all modes available in Blackwell.
* lit tests are added for all possible combinations.
* The invalid tests are moved to a separate file with more coverage.
Signed-off-by: Durgadoss R <durgadossr@nvidia.com>
This patch adds basic assembler and MC layer infrastructure for
RISC-V big-endian targets (riscv32be/riscv64be):
- Register big-endian targets in RISCVTargetMachine
- Add big-endian data layout strings
- Implement endianness-aware fixup application in the assembler backend
- Add byte swapping for data fixups on BE cores
- Update MC layer components (AsmInfo, MCTargetDesc, Disassembler, AsmParser)
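A minimal sketch of the endianness-aware fixup write (illustrative
only; `writeDataFixup32` is a made-up stand-in for the backend
plumbing):

```cpp
#include <cstdint>
#include "llvm/Support/Endian.h"

// Data fixups are emitted in the target's byte order, so a BE core
// byte-swaps relative to the LE default.
static void writeDataFixup32(uint8_t *Loc, uint32_t Value, bool IsBigEndian) {
  if (IsBigEndian)
    llvm::support::endian::write32be(Loc, Value);
  else
    llvm::support::endian::write32le(Loc, Value);
}
```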
This provides the foundation for BE support but does not yet include:
- Codegen patterns for BE
- Load/store instruction handling
- BE-specific subtarget features
These are almost all for internal developer use only, so "look at
debugserver.cpp" wasn't unreasonable, but we rarely add any new
options, so a simple list of all recognized options isn't a burden to
throw in the help method.
This change addresses CodeQL format-string warnings across multiple
sanitizer libraries by adding explicit casts to ensure that printf-style
format specifiers match the actual argument types.
Key updates:
- Cast pointer arguments to (void*) when used with %p.
- Use appropriate integer types and specifiers (e.g., size_t -> %zu,
ssize_t -> %zd) to avoid mismatches.
- Fix format specifier mismatches across xray, memprof, lsan, hwasan,
dfsan.
These changes are no-ops at runtime but improve type safety, silence
static analysis warnings, and reduce the risk of UB in variadic calls.
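An illustrative before/after (plain printf here; the sanitizers use
their internal Printf, but the mismatch and the fix look the same):

```cpp
#include <cstddef>
#include <cstdio>

void report(const char *Name, const int *Ptr, size_t Size) {
  // Before (mismatched): printf("%s: ptr=%p size=%d\n", Name, Ptr, Size);
  // %p requires a void pointer and %d does not match size_t.
  std::printf("%s: ptr=%p size=%zu\n", Name, (void *)Ptr, Size);
}
```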
In 88f409194, we changed the way the crashlog scripted process was
launched, since the previous approach required parsing the file twice,
by stopping at entry, setting the crashlog object in the middle of the
scripted process launch, and resuming it.
Since then, we've introduced SBScriptObject, which allows passing any
arbitrary python object across the SB API boundary to another scripted
affordance.
This patch makes use of that to include the parsed crashlog object in
the scripted process launch info dictionary, which eliminates the need
to stop at entry.
Signed-off-by: Med Ismail Bennani <ismail@bennani.ma>
libstdc++ can be configured to default to a different C++11 ABI, and
when the system used to build clang has a different default than the
system used to build a clang plugin, the two end up using different
ABIs, leading to breakage (missing symbols) when using clang APIs that
involve types like std::string.
We arbitrarily choose to default to the old ABI, but the user can opt
in to the new ABI. The important part is that whichever is picked is
reflected in llvm-config's output.
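For context, the documented libstdc++ dual-ABI switch involved (this
macro is libstdc++'s, not something added by this patch):

```cpp
// Selecting the old (pre-C++11) ABI; must appear before any standard
// header is included. 0 = old ABI, 1 = new C++11 ABI.
#define _GLIBCXX_USE_CXX11_ABI 0
#include <string>

// With the old ABI, std::string does not mangle through the
// std::__cxx11 inline namespace, so clang and a plugin built the same
// way agree on symbol names.
std::string makeName() { return "example"; }
```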
The Generic_GCC::GCCInstallationDetector class picks the GCC installation directory with the largest version number. Since the location of the libstdc++ include directories is tied to the GCC version, this can break C++ compilation if the libstdc++ headers for this particular GCC version are not available. Linux distributions tend to package the libstdc++ headers separately from GCC, which frequently leads to situations in which a newer version of GCC gets installed as a dependency of another package without the corresponding libstdc++ package. Clang then fails to compile C++ code because it cannot find the libstdc++ headers. This behavior can be hard to understand: libstdc++ headers are in fact installed on the system (just for another GCC version), the GCC installation itself continues to work, the user may not be aware of the details of the GCC detection, and the compiler does not recognize the situation or emit a warning - as witnessed by many related bug reports over the years.
The goal of this work is to change the GCC detection to prefer GCC installations that contain libstdc++ include directories over those which do not. This should happen regardless of the input language since picking different GCC installations for a build that mixes C and C++ might lead to incompatibilities.
Any change to the GCC installation detection will probably have a negative impact on some users. For instance, for a C user who relies on using the GCC installation with the largest version number, it might become necessary to use the --gcc-install-dir option to ensure that this GCC version is selected.
This seems like an acceptable trade-off given that the situation for users who do not have any special demands on the particular GCC installation directory would be improved significantly.
This patch does not yet change the automatic GCC installation directory choice. Instead, it introduces a warning that informs the user about the future change if the chosen GCC installation directory differs from the one that would be chosen if the libstdc++ headers were taken into account.
See also this related Discourse discussion: https://discourse.llvm.org/t/rfc-take-libstdc-into-account-during-gcc-detection/86992.
This patch reapplies #145056. The test in the original PR did not specify a target in the clang RUN line and used an incorrect way of piping to FileCheck.
Start considering !amdgpu.no.remote.memory.access and
!amdgpu.no.fine.grained.host.memory metadata when deciding to expand
integer atomic operations. This does not yet attempt to accurately
handle fadd/fmin/fmax, which are trickier and require migrating the
old "amdgpu-unsafe-fp-atomics" attribute.
System scope atomics need to use cmpxchg loops if we know
nothing about the allocation the address is from.
aea5980e26e6a87dab9f8acb10eb3a59dd143cb1 started this; this change
expands the set to cover the remaining integer operations.
Don't expand xchg and add; those should theoretically work over PCIe.
This is a pre-commit which will introduce performance regressions.
Subsequent changes will add handling of new atomicrmw metadata, which
will avoid the expansion.
Note this still isn't conservative enough; we do need to expand
some device scope atomics if the memory is in fine-grained remote
memory.
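As a sketch, the kind of query the subsequent changes can make
(metadata kind names are taken from this description; the real
decision logic also exempts xchg/add and considers the sync scope):

```cpp
#include "llvm/IR/Instructions.h"

// Simplified: if neither no-remote nor no-fine-grained is known about
// the underlying allocation, the integer atomicrmw is expanded to a
// cmpxchg loop.
static bool mustExpandConservatively(const llvm::AtomicRMWInst &RMW) {
  bool NoRemote = RMW.hasMetadata("amdgpu.no.remote.memory.access");
  bool NoFineGrained = RMW.hasMetadata("amdgpu.no.fine.grained.host.memory");
  return !NoRemote || !NoFineGrained;
}
```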
`NumFiltered` is the number of elements across all vectors in a map.
It is only ever compared to 1, which is equivalent to checking whether
the map contains exactly one vector with exactly one element.
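In other words (a sketch with an illustrative map type):

```cpp
#include <map>
#include <vector>

// NumFiltered == 1 iff there is exactly one key whose vector has
// exactly one element.
bool hasExactlyOne(const std::map<int, std::vector<int>> &Filtered) {
  return Filtered.size() == 1 && Filtered.begin()->second.size() == 1;
}
```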
All targets are included by Intrinsics.td so we should name things
carefully to avoid interfering with other targets.
Copy one class that LoongArch was also using.
### Problem
PR #142944 introduced a new canonicalization pattern which caused
failures in the following GPU-related integration tests:
- mlir/test/Integration/GPU/CUDA/TensorCore/sm80/transform-mma-sync-matmul-f16-f16-accum.mlir
- mlir/test/Integration/GPU/CUDA/TensorCore/sm80/transform-mma-sync-matmul-f32.mlir
The issue occurs because the new canonicalization pattern can generate
multi-dimensional `vector.from_elements` operations (rank > 1), but the
GPU lowering pipelines were not equipped to handle these during the
conversion to LLVM.
### Fix
This PR adds `vector::populateVectorFromElementsLoweringPatterns` to the
GPU lowering passes that are integrated in `gpu-lower-to-nvvm-pipeline`:
- `GpuToLLVMConversionPass`: the general GPU-to-LLVM conversion pass.
- `LowerGpuOpsToNVVMOpsPass`: the NVVM-specific lowering pass.
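The registration itself is roughly one line in each pass's pattern
collection (sketch; the header path and surrounding pass setup are
assumptions):

```cpp
#include "mlir/Dialect/Vector/Transforms/LoweringPatterns.h" // path assumed
#include "mlir/IR/PatternMatch.h"

// Added alongside the passes' existing vector-lowering patterns so
// rank > 1 vector.from_elements ops get unrolled before LLVM conversion.
void addFromElementsLowering(mlir::RewritePatternSet &patterns) {
  mlir::vector::populateVectorFromElementsLoweringPatterns(patterns);
}
```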
Co-authored-by: Yang Bai <yangb@nvidia.com>
Parse the `Inst` and `SoftField` fields once and store them in
`InstructionEncoding` so that we don't parse them every time
`getMandatoryEncodingBits()` is called.
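Generically, the shape of the change (illustrative types, not the
TableGen backend's real ones):

```cpp
#include <string>

// Parse-once pattern: the expensive parse moves into the constructor
// and the hot query returns a cached result.
struct EncodingBits { std::string Bits; };

static EncodingBits parseField(const std::string &Record) {
  return EncodingBits{Record}; // stand-in for real `Inst`/`SoftField` parsing
}

class InstructionEncoding {
  EncodingBits Inst; // parsed once, at construction
public:
  explicit InstructionEncoding(const std::string &Record)
      : Inst(parseField(Record)) {}
  // No longer re-parses on every call.
  const EncodingBits &getMandatoryEncodingBits() const { return Inst; }
};
```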
Adds support for the Error class, Expected class template, and related
APIs that will be used for error propagation and handling in the new ORC
runtime.
The implementations of these types are cut-down versions of similar APIs
in llvm/Support/Error.h. Most advice on llvm::Error and llvm::Expected
(e.g. from the LLVM Programmer's manual) applies equally to
orc_rt::Error and orc_rt::Expected.
Ported from the old ORC runtime at compiler-rt/lib/orc.
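A usage sketch mirroring the llvm::Error idioms that, per the above,
carry over (the orc_rt header path and exact helper spellings are
assumptions):

```cpp
#include <cstdlib>
#include "orc-rt/Error.h" // header path assumed

// Returning either a value or an Error from a fallible function:
orc_rt::Expected<int> parseCount(const char *S) {
  int N = std::atoi(S);
  if (N <= 0)
    return orc_rt::make_error<orc_rt::StringError>("not a positive count");
  return N;
}

void example() {
  if (auto CountOrErr = parseCount("42")) {
    int Count = *CountOrErr; // success: dereference to get the value
    (void)Count;
  } else {
    // Failure: the Error must be consumed (handled, logged, or returned).
    orc_rt::consumeError(CountOrErr.takeError());
  }
}
```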