This intrinsic emits a BFD_RELOC_NONE relocation at the point of call,
which allows optimizations and languages to explicitly pull in symbols
from static libraries without there being any code or data that has an
effectual relocation against such a symbol.
See issue #146159 for context.
Not all SPIR-V extensions are allows in every environment. When we use
the `-spirv-ext=all` option, the backend currently believes that all
extensions can be used.
This commit filters out the extensions on the command line to remove
those that are not known to be allowed for the current environment.
Alternatives considered: I considered modifying the
SPIRVExtensionsParser::parse to use a different list of extensions for
"all" depending on the target triple. However that does not work because
the target triple is not available, and cannot be made available in a
reasonable way.
Fixes#147717
---------
Co-authored-by: Victor Lomuller <victor@codeplay.com>
This applies the pmadd handler (recently improved in https://github.com/llvm/llvm-project/pull/153353) to the Avx512
equivalent of the pmaddw and pmaddubs intrinsics:
<16 x i32> @llvm.x86.avx512.pmaddw.d.512(<32 x i16>, <32 x i16>)
<32 x i16> @llvm.x86.avx512.pmaddubs.w.512(<64 x i8>, <64 x i8>)
Resolve the TODO: on RV32, when constructing the double-precision
constant `+0.0` for `s64`, `BuildPairF64Pseudo` can be optimized to use
the `fcvt.d.w` instruction to generate the result directly.
Add a default off option to the inline cost calculation to always inline
all viable calls regardless of the cost/benefit and cost/threshold
calculations.
For performance reasons, some users require that all calls be inlined.
Rather than forcing them to adjust the inlining threshold to an
arbitrarily high value, offer an option to inline all calls.
Fixes#154056.
The fat buffer lowering pass was erroniously detecting that it did not
need to run on functions that only load/store to the null constant (or
other such constants). We thought this would be covered by specializing
constants out to instructions, but that doesn't account foc trivial
constants like null. Therefore, we check the operands of instructions
for buffer fat pointers in order to find such constants and ensure the
pass runs.
---------
Co-authored-by: Nikita Popov <github@npopov.com>
If ExtraAnalysis is requested, emit all remarks caused by unvectorizable instructions - instead of only the first.
This is in line with how other places handle DoExtraAnalysis and it can be quite helpful to get info about all instructions in a loop that prevent vectorization.
This PR adds pattern to distribute the load/store/prefetch nd ops with
offsets from workgroup to subgroup IR. This PR is part of the transition
to move offsets from create_nd to load/store/prefetch nd ops.
Create_nd PR : #152351
Remove:
* DescriptorType enum - this almost exactly shadowed the ResourceClass
enum
* ClauseType aliased ResourceClass
Although these were introduced to make the HLSL root signature handling
code a bit cleaner, they were ultimately causing confusion as they
appeared to be unique enums that needed to be converted between each
other.
Closes#153890
Adds support for compact serialization of Formulas, and a corresponding
parse function. Extends Environment and AnalysisContext with necessary
functions for serializing and deserializing all formula-related parts of
the environment.
addLocalFloorDiv currently returns void and requires the caller to know
that the newly added local variable is in a particular index. This
commit returns the index of the newly added variable so that callers
need not tie themselves to this implementation detail.
I found one relevant callsite demonstrating this and updated it. I am
using this API out of tree and wanted to make our out-of-tree code a bit
more resilient to upstream changes.
This reverts commit e9de32fd159d30cfd6fcc861b57b7e99ec2742ab due to
multiple performance regressions observed across downstream Numba
benchmarks (https://github.com/llvm/llvm-project/issues/138509#issuecomment-3193855772).
While avoiding non-trivial unswitches on newly-cloned loops helps
mitigate the pathological case reported in https://github.com/llvm/llvm-project/issues/138509,
it may as well make the IR less friendly to vectorization / loop-
canonicalization (in the test reported, previously no select with
loop-carried dependence existed in the new specialized loops),
leading the abovementioned approach to be reconsidered.
PushedRegisters in this patch needs to be of type int64_t because iot is
grabbing registers from immediate operands of pseudo instructions.
However, we then compare to an actual register type later, which relies
on the implicit conversion within Register to int, which can result in
build failures in some configurations.
- llvm.nvvm.reflect - Use a PureIntrinsic for (adding speculatable),
this will be replaced by a constant prior to lowering so speculation is
fine.
- llvm.nvvm.tex.* - Add [IntrNoCallback, IntrNoFree, IntrWillReturn]
- llvm.nvvm.suld.* - Add [IntrNoCallback, IntrNoFree] and
[IntrWillReturn] when not using "clamp" mode
- llvm.nvvm.sust.* - Add [IntrNoCallback, IntrNoFree, IntrWriteMem] and
[IntrWillReturn] when not using "clamp" mode
- llvm.nvvm.[suq|txq|istypep].* - Use DefaultAttrsIntrinsic
- llvm.nvvm.read.ptx.sreg.* - Add [IntrNoFree, IntrWillReturn] to
non-constant reads as well.
This PR upstreams `GotoOp`. It moves some tests from the `goto` test
file to the `label` test file, and adds verify logic to `FuncOp`. The
gotosSolver, required for lowering, will be implemented in a future PR.
This has been a long-standing problem, but we didn't use to call the
destructors of items on the stack unless we explicitly `pop()` or
`discard()` them.
When interpretation was interrupted midway-through (because something
failed), we left `Pointer`s on the stack. Since all `Block`s track what
`Pointer`s point to them (via a doubly-linked list in the `Pointer`),
that meant we potentially leave deallocated pointers in that list. We
used to work around this by removing the `Pointer` from the list before
deallocating the block.
However, we now want to track pointers to global blocks as well, which
poses a problem since the blocks are never deallocated and thus those
pointers are always left dangling.
I've tried a few different approaches to fixing this but in the end I
just gave up on the idea of never knowing what items are in the stack.
We already have an `ItemTypes` vector that we use for debugging
assertions. This patch simply enables this vector unconditionally and
uses it in the abort case to properly `discard()` all elements from the
stack. That's a little sad IMO but I don't know of another way of
solving this problem.
As expected, this is a slight hit to compile times:
https://llvm-compile-time-tracker.com/compare.php?from=574d0a92060bf4808776b7a0239ffe91a092b15d&to=0317105f559093cfb909bfb01857a6b837991940&stat=instructions:u
Fixes build bot assertion.
I forgot to include logic that will be added in a future PR that handles
-1 correctly. For now, let's just return nullptr like we used to.
Resolves#105430
- Implement all required pieces of P3168R2
- Leverage existing `wrap_iter` and `bounded_iter` classes to implement
the `optional` regular and hardened iterator type, respectively
- Update documentation to match
Lowering in GlobalISel for AMDGPU previously always narrows to i32 on
truncating store regardless of mem size or scalar size, causing issues
with types like i65 which is first extended to i128 then stored as i64 +
i8 to i128 locations. Narrowing only on store to pow of 2 mem location
ensures only narrowing to mem size near end of legalization.
This LLVM defect was identified via the AMD Fuzzing project.
A llvm.vector.reduce.fadd(float, <1 x float>) will be translated to
G_VECREDUCE_SEQ_FADD with two scalar operands, which is illegal
according to the verifier. This makes sure we generate a fadd/fmul
instead.
Like a big dummy, I completely skipped running this test locally and
forgot it would need check lines. *sigh*, Looks like SOMEONE has a case
of the Mondays!
Anyway, this patch fixes it by adding the proper verify lines.
As reported, OpenACC's variable declaration handling was assuming some
semblence of legality in the example, so it didn't properly handle an
error case. This patch fixes its assumptions so that we don't crash.
Fixes#154008
This breaks a ton of libc++ tests otherwise, since calling
std::destroy_at will currently end the lifetime of the entire array not
just the given element.
See https://github.com/llvm/llvm-project/issues/147528
The original patch to implement basic lowering for firstprivate didn't
have the Sema work to change the name of the variable being generated
from openacc.private.init to openacc.firstprivate.init. I forgot about
that when I merged the Sema changes this morning, so the tests now
failed. This patch fixes those up.
Additionally, Suggested on #153622 post-commit, it seems like a good idea to
use a size of APInt that matches the size-type, so this changes us to use that
instead.
Incompatible pointer to integer conversion diagnostic checks would
trigger an assertion when the designated initializer is for an array of
unknown bounds.
Fixes#154046