This allows for removing llvm-project-tests.yml. This significantly
reduces the complexity of this workflow (including the complexity of
llvm-project-tests.yml) at the cost of a little bit of duplication with
the other workflows that were also using llvm-project-tests.yml.
Reviewers: tstellar, DeinAlptraum
Reviewed By: DeinAlptraum
Pull Request: https://github.com/llvm/llvm-project/pull/153876
This will eventually allow for removing llvm-project-tests.yml. This
should significantly reduce the complexity of this workflow (including
the complexity of llvm-project-tests.yml) at the cost of a little bit of
duplication.
Reviewers: IgWod-IMG, kuhar
Reviewed By: kuhar
Pull Request: https://github.com/llvm/llvm-project/pull/153871
This patch replaces SmallSet<T *, N> with SmallPtrSet<T *, N>. Note
that SmallSet.h "redirects" SmallSet to SmallPtrSet for pointer
element types:
template <typename PointeeType, unsigned N>
class SmallSet<PointeeType*, N> : public SmallPtrSet<PointeeType*, N>
{};
We only have 140 instances that rely on this "redirection", with the
vast majority of them under llvm/. Since relying on the redirection
doesn't improve readability, this patch replaces SmallSet with
SmallPtrSet for pointer element types.
…Expr
Two tests have new warnings because `warn_unused_result` is now
respected for constructor temporaries. These tests were newly added in
#112521 last year. This is good because the new behavior is better than
the old.
@Sirraide and @Mick235711 what do you think about it?
This patch removes RSP dependencies from push and pop instructions to
pretend that we have a stack engine. This does not model details like
sync uops that are relevant implementation details due to complexity.
This is just enabled on all X86 CPUs given LLVM does not have a
scheduling model for any X86 CPU that does not have a stack engine.
This fixes#152008.
The existing `amdgpu-kernarg-preload-count` can't be used as a switch to
turn it off if it is set to 0. This PR adds an extra option to turn it
off.
Fixes SWDEV-550147.
For instructions in the `SYS` alias encoding space which take no
register operands, and where the unused 5 register bits are not all set
(0x31, 0b11111), then disassemble to a `SYS` alias and not the
instruction, since it is not considered valid.
This is because it is specified in the Arm ARM in text similar to this
(e.g. page C5-1037 of DDI0487L.b for `TLBI ALLE1`, or page C5-1585 for
`GCSPOPX`):
```
Rt should be encoded as 0b11111. If the Rt field is not set to 0b11111,
it is CONSTRAINED UNPREDICTABLE whether:
* The instruction is UNDEFINED.
* The instruction behaves as if the Rt field is set to 0b11111.
```
Since we want to follow "should" directives, and not encourage undefined
behaviour, only assemble or disassemble instructions considered valid.
Add an extra test-case for this, and all existing test-cases are
continuing to pass.
This patch implements the basic lowering infrastructure, but does not
quite implement the copy initialization, which requires #153622.
It does however pass verification for the 'copy' section, which just
contains a yield.
This false dependency issue was fixed in CannonLake looking at the data
from uops.info. This is confirmed not to be an issue based on
benchmarking data in #153983. Setting this can potentially lead to extra
xor instructions whihc could consume extra frontend/renaming resources.
None of the other CPUs that have had this fixed have the tuning flag.
Fixes#153983.
Add a debugging flag to the dialect conversion to dump the
materialization kind. This flag is useful to find out whether a missing
materialization rule is for source or target materializations.
Also add missing test coverage for the `buildMaterializations` flag.
This PR introduces two new ops in omp dialect, omp.target_allocmem and
omp.target_freemem.
omp.target_allocmem: Allocates heap memory on device. Will be lowered to
omp_target_alloc call in llvm.
omp.target_freemem: Deallocates heap memory on device. Will be lowered
to omp+target_free call in llvm.
Example:
%1 = omp.target_allocmem %device : i32, i64
omp.target_freemem %device, %1 : i32, i64
The work in this PR is C-P/inspired from @ivanradanov commit from
coexecute implementation:
[Add fir omp target alloc and free
ops](be860ac8ba)
[Lower omp_target_{alloc,free} to
llvm](6e2d584dc9)
LiveVariables will mark instructions with their implicit subregister
uses. However, it will also mark the subregister as an implicit if its
own definition is a subregister of it, i.e. `$r3 = OP val, implicit-def
$r0_r1_r2_r3, ..., implicit $r2_r3`, even if it is otherwise unused,
which defines $r3 on the same line it is used.
This change ensures such uses are marked without implicit, i.e. `$r3 =
OP val, implicit-def $r0_r1_r2_r3, ..., $r2_r3`.
---------
Co-authored-by: Matt Arsenault <arsenm2@gmail.com>
This commit changes all relocations to be relocated with symbols.
Without this commit, errors may occur in some cases, such as when using
`llc/lto+relax`, or combining relaxed and norelaxed object files using
`ld -r`.
Some tests updated.
Reland of #142941
Squashed with fixes for #150004, #149585
This pattern matches gather-like patterns where
values are loaded per lane into neon registers, and
replaces it with loads into 2 separate registers, which
will be combined with a zip instruction. This decreases
the critical path length and improves Memory Level
Parallelism.
rdar://151851094
Implement use-after-free detection in the lifetime safety analysis with two warning levels.
- Added a `LifetimeSafetyReporter` interface for reporting lifetime safety issues
- Created two warning levels:
- Definite errors (reported with `-Wexperimental-lifetime-safety-permissive`)
- Potential errors (reported with `-Wexperimental-lifetime-safety-strict`)
- Implemented a `LifetimeChecker` class that analyzes loan propagation and expired loans to detect use-after-free issues.
- Added tracking of use sites through a new `UseFact` class.
- Enhanced the `ExpireFact` to track the expressions where objects are destroyed.
- Added test cases for both definite and potential use-after-free scenarios.
The implementation now tracks pointer uses and can determine when a pointer is dereferenced after its loan has been expired, with appropriate diagnostics.
The two warning levels provide flexibility - definite errors for high-confidence issues and potential errors for cases that depend on control flow.
Fixes a regression uncovered by Fujitsu test 0686_0024.f90. In
particular, verifies that a pre-determined symbol is only privatized by
its defining evaluation (e.g. the loop for which the symbol was marked
as pre-determined).
An operand of the nested yield op can be null and hasn't been verified
yet when processing the enclosing operation. Using `getResultTypes()`
will dereference this null Value and crash in the verifier.
Updates SimplifyCFG to avoid jump threading through loop headers if
-keep-loops is requested. Canonical loop form requires a loop header
that dominates all blocks in the loop. If we thread through a header, we
risk breaking its domination of the loop. This change avoids this issue
by conservatively avoiding threading through headers entirely.
Fixes: https://github.com/llvm/llvm-project/issues/151144
The early return for lamda expressions with deduced return types in
Sema::ActOnCapScopeReturnStmt meant that we were not actually perform
the required return type deduction for such lambdas when in a discarded
context.
This PR removes that early return allowing the existing return type
deduction steps to be performed.
Fixes#153884
Fix developed by, and
Co-authored-by: Corentin Jabot <corentinjabot@gmail.com>
This is in preparation of a follow-up change to stop traversing
unreachable blocks.
This is not NFC because of a subtlety of the early_inc. On a test case
like:
```
scf.if %cond {
"test.move_after_parent_op"() ({
"test.any_attr_of_i32_str"() {attr = 0 : i32} : () -> ()
}) : () -> ()
}
```
We recursively traverse the nested regions, and process an op when the
region is done (post-order).
We need to pre-increment the iterator before processing an operation in
case it gets deleted. However
we can do this before or after processing the nested region. This
implementation does the latter.
Operations like:
%add = arith.addi %add, %add : i64
are legal in unreachable code. Unfortunately many patterns would be
unsafe to apply on such IR and can lead to crashes or infinite loops. To
avoid this we can remove unreachable blocks before attempting to apply
patterns.
We may have to do this also whenever the CFG is changed by a pattern, it
is left up for future work right now.
Fixes#153732
Using hex format allows to better interpret IDs:
the first digits represent the thread number, the last digits represent
the ID within a thread
The main change is in callback.h: PRIu64 -> PRIx64
The patch also guards RUN/CHECK lines in openmp/runtime/tests/ompt with clang-format on/off comments and clang-formats the directory.
---------
Co-authored-by: Kaloyan Ignatov <kaloyan.ignatov@rwth-aachen.de>
This option is confusingly named. What it actually controls is whether,
under the default of `-ffloat16-excess-precision=standard`, it is
beneficial for performance to perform calculations on float (without
intermediate rounding) or not. For `-ffloat16-excess-precision=none` the
LLVM `half` type will always be used, and all backends are expected to
legalize it correctly.