The old syncscope parameter never really worked correctly, but
effectively gave "workgroup" scope. Use something faster than system
but more correct than before.
https://reviews.llvm.org/D157389
This allows use with non-0 address space stacks. llvm_ptr_ty should
never be used. This could use some more percolation up through mlir,
but this is enough to fix existing tests.
https://reviews.llvm.org/D156666
The test migration to opaque pointers has finished, so we can finally
drop typed pointer support from LLVM \o/
This removes the ability to disable typed pointers, as well as the
-opaque-pointers option, but otherwise doesn't yet touch any API
surface. I'll leave deprecation/removal of compatibility APIs to
future changes.
This also drops a few tests: These are either testing errors that
only occur with typed pointers, or type linking behavior that, to
the best of my knowledge, only applies to typed pointers.
Note that this will break some tests in the experimental SPIRV
backend, because the maintainers have failed to update their tests
in a reasonable time-frame, despite multiple warnings. In accordance
with our experimental target policy, this is not a blocking concern.
This issue is tracked at https://github.com/llvm/llvm-project/issues/60133.
Differential Revision: https://reviews.llvm.org/D155079
This is a reboot of the original design and implementation by
Nicolai Haehnle <nicolai.haehnle@amd.com>:
https://reviews.llvm.org/D85603
This change also obsoletes an earlier attempt at restarting the work on
convergence tokens:
https://reviews.llvm.org/D104504
Changes relative to D85603:
1. Clean up the definition of a "convergent operation", a convergent
call and convergent function.
2. Clean up the relationship between dynamic instances, sets of threads and
convergence tokens.
3. Redistribute the formal rules into the definitions of the convergence
intrinsics.
4. Expand on the semantics of entering a function from outside LLVM,
and the environment-defined outcome of the entry intrinsic.
5. Replace the term "cycle" with "closed path". The static rules are defined
in terms of closed paths, and then a relation is established with cycles.
6. Specify that if a function contains a controlled convergent operation, then
all convergent operations in that function must be controlled.
7. Describe an optional procedure to infer tokens for uncontrolled convergent
operations.
8. Introduce controlled maximal convergence-before and controlled m-converged
property as an update to the original properties in UniformityAnalysis.
9. Additional constraint that a cycle heart can only occur in the header of a
reducible cycle (natural loop).
Reviewed By: nhaehnle
Differential Revision: https://reviews.llvm.org/D147116
Test "local-type-as-template-parameter.ll" is now enabled only for
x86_64.
Authored-by: Kristina Bessonova <kbessonova@accesssoftek.com>
Differential Revision: https://reviews.llvm.org/D144006
Depends on D144005
Test "local-type-as-template-parameter.ll" now requires linux-system.
Authored-by: Kristina Bessonova <kbessonova@accesssoftek.com>
Differential Revision: https://reviews.llvm.org/D144006
Depends on D144005
This reverts commit d80fdc6fc1a6e717af1bcd7a7313e65de433ba85.
split-dwarf-local-impor3.ll fails because of an issue with
Dwo sections emission on Windows platform.
RFC https://discourse.llvm.org/t/rfc-dwarfdebug-fix-and-improve-handling-imported-entities-types-and-static-local-in-subprogram-and-lexical-block-scopes/68544
Fixed PR51501 (tests from D112337).
1. Reuse of DISubprogram's 'retainedNodes' to track other function-local
entities together with local variables and labels (this patch cares about
function-local import while D144006 and D144008 use the same approach for
local types and static variables). So, effectively this patch moves ownership
of tracking local import from DICompileUnit's 'imports' field to DISubprogram's
'retainedNodes' and adjusts DWARF emitter for the new layout. The old layout
is considered unsupported (DwarfDebug would assert on such debug metadata).
DICompileUnit's 'imports' field is supposed to track global imported
declarations as it does before.
This addresses various FIXMEs and simplifies the next part of the patch.
2. Postpone emission of function-local imported entities from
`DwarfDebug::endFunctionImpl()` to `DwarfDebug::endModule()`.
While in `DwarfDebug::endFunctionImpl()` we do not have all the
information about a parent subprogram or a referring subprogram
(whether a subprogram inlined or not), so we can't guarantee we emit
an imported entity correctly and place it in a proper subprogram tree.
So now, we just gather needed details about the import itself and its
parent entity (either a Subprogram or a LexicalBlock) during
processing in `DwarfDebug::endFunctionImpl()`, but all the real work is
done in `DwarfDebug::endModule()` when we have all the required
information to make proper emission.
Authored-by: Kristina Bessonova <kbessonova@accesssoftek.com>
Differential Revision: https://reviews.llvm.org/D144004
This reverts commit ed578f02cf44a52adde16647150e7421f3ef70f3.
Tests llvm/test/DebugInfo/Generic/split-dwarf-local-import*.ll fail
when x86_64 target is not registered.
RFC https://discourse.llvm.org/t/rfc-dwarfdebug-fix-and-improve-handling-imported-entities-types-and-static-local-in-subprogram-and-lexical-block-scopes/68544
Fixed PR51501 (tests from D112337).
1. Reuse of DISubprogram's 'retainedNodes' to track other function-local
entities together with local variables and labels (this patch cares about
function-local import while D144006 and D144008 use the same approach for
local types and static variables). So, effectively this patch moves ownership
of tracking local import from DICompileUnit's 'imports' field to DISubprogram's
'retainedNodes' and adjusts DWARF emitter for the new layout. The old layout
is considered unsupported (DwarfDebug would assert on such debug metadata).
DICompileUnit's 'imports' field is supposed to track global imported
declarations as it does before.
This addresses various FIXMEs and simplifies the next part of the patch.
2. Postpone emission of function-local imported entities from
`DwarfDebug::endFunctionImpl()` to `DwarfDebug::endModule()`.
While in `DwarfDebug::endFunctionImpl()` we do not have all the
information about a parent subprogram or a referring subprogram
(whether a subprogram inlined or not), so we can't guarantee we emit
an imported entity correctly and place it in a proper subprogram tree.
So now, we just gather needed details about the import itself and its
parent entity (either a Subprogram or a LexicalBlock) during
processing in `DwarfDebug::endFunctionImpl()`, but all the real work is
done in `DwarfDebug::endModule()` when we have all the required
information to make proper emission.
Authored-by: Kristina Bessonova <kbessonova@accesssoftek.com>
Differential Revision: https://reviews.llvm.org/D144004
RFC https://discourse.llvm.org/t/rfc-dwarfdebug-fix-and-improve-handling-imported-entities-types-and-static-local-in-subprogram-and-lexical-block-scopes/68544
Fixed PR51501 (tests from D112337).
1. Reuse of DISubprogram's 'retainedNodes' to track other function-local
entities together with local variables and labels (this patch cares about
function-local import while D144006 and D144008 use the same approach for
local types and static variables). So, effectively this patch moves ownership
of tracking local import from DICompileUnit's 'imports' field to DISubprogram's
'retainedNodes' and adjusts DWARF emitter for the new layout. The old layout
is considered unsupported (DwarfDebug would assert on such debug metadata).
DICompileUnit's 'imports' field is supposed to track global imported
declarations as it does before.
This addresses various FIXMEs and simplifies the next part of the patch.
2. Postpone emission of function-local imported entities from
`DwarfDebug::endFunctionImpl()` to `DwarfDebug::endModule()`.
While in `DwarfDebug::endFunctionImpl()` we do not have all the
information about a parent subprogram or a referring subprogram
(whether a subprogram inlined or not), so we can't guarantee we emit
an imported entity correctly and place it in a proper subprogram tree.
So now, we just gather needed details about the import itself and its
parent entity (either a Subprogram or a LexicalBlock) during
processing in `DwarfDebug::endFunctionImpl()`, but all the real work is
done in `DwarfDebug::endModule()` when we have all the required
information to make proper emission.
Authored-by: Kristina Bessonova <kbessonova@accesssoftek.com>
Differential Revision: https://reviews.llvm.org/D144004
This reverts commit aae8524bcc26cf04729f2bbc02ecb54233a587e4, which was
found to cause a few unexpected benchmark performance differences that
need investigation.
As pointed out in
https://discourse.llvm.org/t/undeterministic-thin-index-file/69985, the
block count added to distributed ThinLTO index files breaks incremental
builds on ThinLTO - if any linked file has a different number of BBs,
then the accumulated sum placed in the index files will change, causing
all ThinLTO backend compiles to be redone.
This was only used for partial sample profiles, and was therefore
removed for other cases (3adc6e03080c6d38a51f5c5b6744b7c0d9c7541b).
Subsequent testing did not show a performance effect of disabling this
feature even for partial sample profiles. Therefore, switch the default
to false. If this does not cause a noticeable performance degradation
after the default flip, we can remove this support completely.
Differential Revision: https://reviews.llvm.org/D151249
As pointed out in
https://discourse.llvm.org/t/undeterministic-thin-index-file/69985, the
block count added to distributed ThinLTO index files breaks incremental
builds on ThinLTO - if any linked file has a different number of BBs,
then the accumulated sum placed in the index files will change, causing
all ThinLTO backend compiles to be redone.
The block count is only used for scaling of partial sample profiles, and
was added in D80403 for D79831.
This patch simply removes this field from the index files of non partial
sample profile compiles, which is NFC on the output of the compiler.
We subsequently need to see if this can be removed for partial sample
profiles without signficant performance loss, or redesigned in a way
that does not destroy caching.
Differential Revision: https://reviews.llvm.org/D148746
This carries a bitmask indicating forbidden floating-point value kinds
in the argument or return value. This will enable interprocedural
-ffinite-math-only optimizations. This is primarily to cover the
no-nans and no-infinities cases, but also covers the other floating
point classes for free. Textually, this provides a number of names
corresponding to bits in FPClassTest, e.g.
call nofpclass(nan inf) @must_be_finite()
call nofpclass(snan) @cannot_be_snan()
This is more expressive than the existing nnan and ninf fast math
flags. As an added bonus, you can represent fun things like nanf:
declare nofpclass(inf zero sub norm) float @only_nans()
Compared to nnan/ninf:
- Can be applied to individual call operands as well as the return value
- Can distinguish signaling and quiet nans
- Distinguishes the sign of infinities
- Can be safely propagated since it doesn't imply anything about
other operands.
- Does not apply to FP instructions; it's not a flag
This is one step closer to being able to retire "no-nans-fp-math" and
"no-infs-fp-math". The one remaining situation where we have no way to
represent no-nans/infs is for loads (if we wanted to solve this we
could introduce !nofpclass metadata, following along with
noundef/!noundef).
This is to help simplify the GPU builtin math library
distribution. Currently the library code has explicit finite math only
checks, read from global constants the compiler driver needs to set
based on the compiler flags during linking. We end up having to
internalize the library into each translation unit in case different
linked modules have different math flags. By propagating known-not-nan
and known-not-infinity information, we can automatically prune the
edge case handling in most functions if the function is only reached
from fast math uses.
These are essentially add/sub 1 with a clamping value.
AMDGPU has instructions for these. CUDA/HIP expose these as
atomicInc/atomicDec. Currently we use target intrinsics for these,
but those do no carry the ordering and syncscope. Add these to
atomicrmw so we can carry these and benefit from the regular
legalization processes.
!nonnull expectes an empty metadata argument, so check that this
is the case in the verifier. This came up as a problem in
https://reviews.llvm.org/D141386.
This requires dropping the verifier call in the compatibility-6.0.ll
test (which is not present in any of the other bitcode compatibility
tests). The original input unfortunately used typo'd nonnull
metadata.
Use the existing mechanism to change the data layout using callbacks.
Before this patch, we had a callback type DataLayoutCallbackTy that receives
a single StringRef specifying the target triple, and optionally returns
the data layout string to be used. Module loaders (both IR and BC) then
apply the callback to potentially override the module's data layout,
after first having imported and parsed the data layout from the file.
We can't do the same to fix invalid data layouts, because the import will already
fail, before the callback has a chance to fix it.
Instead, module loaders now tentatively parse the data layout into a string,
wait until the target triple has been parsed, apply the override callback
to the imported string and only then parse the tentative string as a data layout.
Moreover, add the old data layout string S as second argument to the callback,
in addition to the already existing target triple argument.
S is either the default data layout string in case none is specified, or the data
layout string specified in the module, possibly after auto-upgrades (for the BitcodeReader).
This allows callbacks to inspect the old data layout string,
and fix it instead of setting a fixed data layout.
Also allow to pass data layout override callbacks to lazy bitcode module
loader functions.
Differential Revision: https://reviews.llvm.org/D140985
Address the inconsistency between FLT_ROUNDS_ and SET_ROUNDING SDAG
node. Rename FLT_ROUNDS_ to GET_ROUNDING and add llvm.get.rounding
intrinsic to replace flt.rounds.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D139507
Always read bitcode according to the -opaque-pointers mode. Do not
perform auto-detection to implicitly switch to typed pointers.
This is a step towards removing typed pointer support, and also
eliminates the class of problems where linking may fail if a typed
pointer module is loaded before an opaque pointer module. (The
latest place where this was encountered is D139924, but this has
previously been fixed in other places doing bitcode linking as well.)
Differential Revision: https://reviews.llvm.org/D139940
Over the past day or so, i've took a large swing at our tests,
and reduced the number of tests that were still using the old syntax
from ~1800 to just 200.
Left to handle: (as it is seen in this patch)
* Transforms/LSR
* Transforms/CGP
* Transforms/TypePromotion
* Transforms/HardwareLoops
* Analysis/*
* some misc.
I think this is the right point to start actively refusing
to honor the old syntax, except for the old tests,
to prevent the old syntax from creeping back in.
Thus, let's add temporary default-off flag,
and if it is not passed refuse to accept old syntax.
The tests that still need porting are annotated with this flag.
Reviewed By: aeubanks
Differential Revision: https://reviews.llvm.org/D139647
The global constant arguments could be in a different address space
than the first argument, so we have to add another overloaded argument.
This patch was originally made for CHERI LLVM (where globals can be in
address space 200), but it also appears to be useful for in-tree targets
as can be seen from the test diffs.
Differential Revision: https://reviews.llvm.org/D138722
Almost all of the other SVE LLVM IR intrinsics take i32 values
for lane indices or other immediates. We should bring the bfloat
intrinsics in line with that. It will also make it easier to
add support for the SVE2.1 float intrinsics in future, since
they reuse the same underlying instruction classes.
I've maintained backwards compatibility with the old i64 variants
and used the autoupgrade mechanism.
Differential Revision: https://reviews.llvm.org/D138788