This will eventually allow for removing llvm-project-tests.yml. This
should significantly reduce the complexity of these workflows at the
cost of a little bit of duplication standard to github actions.
Reviewers: michalpaszkowski, sudonatalie
Reviewed By: sudonatalie
Pull Request: https://github.com/llvm/llvm-project/pull/153869
The SymContentsTargetCommon kind introduced by
https://reviews.llvm.org/D61493 lackes significant and should be treated
as a regular common symbol with a different section index.
Update ELFObjectWriter to respect the specified section index.
The new representation also works with Hexagon's SHN_HEXAGON_SCOMMON.
The names "SymbolContents" and "SymContents*" members are confusing.
Rename to kind and Kind::XXX similar to lld/ELF/Symbols.h
Rename SymContentsVariable to Kind::Equated as the former term is
"equated symbol", not "variable".
Static analysis flagged that we were not checking the return value of
DiagnoseClassNameShadow when we did so everywhere else. Modifying this
case to match how other places uses it makes sense and does not change
behavior. Likely if this check fails later actions will fail as well but
it is more correct to exit early.
This patch makes GPU throughput benchmark results more comparable across
targets by disabling loop unrolling in the benchmark loop.
Motivation:
* PTX (post-LTO) evidence on NVPTX: for libc `sin`, the generated PTX
shows the `throughput` loop unrolled 8x at `N=128` (one iteration
advances the input pointer by 64 bytes = 8 doubles), interleaving eight
independent chains before the back-edge. This hides latency and
significantly reduces cycles/call as the batch size `N` grows.
* Observed scaling (NVPTX measurements): with unrolling enabled, `sin`
dropped from ~3,100 cycles/call at `N=1` to ~360 at `N=128`. After
enforcing `#pragma clang loop unroll(disable)`, results stabilized
(e.g., from ~3100 cycles/call at `N=1` to ~2700 at `N=128`).
* libdevice contrast: the libdevice `sin` path did not exhibit a similar
drop in our measurements, and the PTX appears as compact internal calls
rather than a long FMA chain, leaving less ILP for the outer loop to
extract.
What this change does:
* Applies `#pragma clang loop unroll(disable)` to the GPU `throughput()`
loop in both NVPTX and AMDGPU backends.
Leaving unrolling entirely to the optimizer makes apples-to-apples
comparisons uneven (e.g., libc vs. vendor). Disabling unrolling yields
fairer, more consistent numbers.
Dissolving the hierarchical VPlan CFG and converting abstract to
concrete recipes can expose additional simplification opportunities.
Do a final run of simplifyRecipes before executing the VPlan.
We might create a local temporary variable for a ParmVarDecl, in which
case a DeclRefExpr for that ParmVarDecl should _still_ result in us
choosing the parameter, not that local.
In https://github.com/llvm/llvm-project/pull/100281, we use
`TaggedBlocks` to record blocks modified by `CheckIfPHIMatche()`, so do
not need to clear every block in `BlockList` if `CheckIfPHIMatches()`
match failed.
If `CheckIfPHIMatches()` match succeed, we can reuse `TaggedBlocks` to
only record matching PHIs for modified blocks, avoid checking every
block in `BlockList` to see if `PHITag` is set.
Add support for 1:N type conversions to the `ControlFlowToLLVM` lowering
patterns. Not applicable to `cf.switch` and `cf.assert`.
---------
Co-authored-by: Tobias Gysi <tobias.gysi@nextsilicon.com>
Add support for 1:N type conversions to the `FuncToLLVM` lowering
patterns. This commit does not change the lowering of any types (such as
`MemRefType`). It just sets up the infrastructure, such that 1:N type
conversions can be used during `FuncToLLVM`.
Note: When the converted result types of a `func.func` have more than 1
type, then the results are wrapped in an `llvm.struct`. That's because
`llvm.func` does not support multiple result values. This "wrapping" was
already implemented for cases where the original `func.func` has
multiple results. With 1:N conversions, even a single result can now
expand to multiple converted results, triggering the same wrapping
mechanism.
The test cases are exercised with both the old and the new no-rollback
conversion driver.
These are implicit vardecls which its type was never written in source
code. Don't create a TypeLoc and give it a fake source location.
The fake as-written type also didn't match the actual type, which after
fixing this gives some unrelated test churn on a CFG dump, since
statement printing prefers type source info if thats available.
Fixes https://github.com/llvm/llvm-project/issues/153649
This is a regression introduced in
https://github.com/llvm/llvm-project/pull/147835
This regression was never released, so no release notes are added.
Without this patch, we use NumNonEmpty, which keeps track of the
number of valid entries plus tombstones even though we have a separate
variable to keep track of the number of tombstones.
This patch simplifies the metadata. Specifically, it changes the name
and semantics of the variable to NumEntries to keep track of the
number of valid entries.
The difference in semantics requires some code changes aside from
mechanical replacements:
- size() just returns NumEntries.
- erase_imp() and remove_if() need to decrement NumEntries in the
large mode.
- insert_imp_big() increments NumEntries for successful insertions,
regardless of whether a tombstone is being replaced with a valid
entry. It also computes the number of non-tombstone empty slots as:
CurArraySize - NumEntries - NumTombstones
- Grow() no longer needs NumNonEmpty -= NumTombstones.
Overall, the resulting code should look more intuitive and more
consistent with DenseMapSet.
llvm.x86.sse.pshuf.w(<1 x i64>, i8) and llvm.x86.avx512.pshuf.b.512(<64
x i8>, <64 x i8>) are currently handled strictly, which is suboptimal.
llvm.x86.ssse3.pshuf.b(<1 x i64>, <1 x i64>)
llvm.x86.ssse3.pshuf.b.128(<16 x i8>, <16 x i8>) and
llvm.x86.avx2.pshuf.b(<32 x i8>, <32 x i8>) are currently heuristically
handled using maybeHandleSimpleNomemIntrinsic, which is incorrect.
Since the second argument is the shuffle order, we instrument all these
intrinsics using `handleIntrinsicByApplyingToShadow(...,
/*trailingVerbatimArgs=*/1)`
(https://github.com/llvm/llvm-project/pull/114490).
Auto-generated decoder fails to add the $sgp10 operand because it has no
encoding bits.
Work around this by adding the missing operand after decoding is
complete.
Fixes#153829.
This patch refactors DXILABI to remove the dependency on scope printer.
Closes: #153827
---------
Co-authored-by: Joao Saffran <{ID}+{username}@users.noreply.github.com>
As a member of github.com/orgs/llvm/teams/pr-subscribers-llvm-mc , I was
not notified about PR #149935.
This commit introduces the `llvm:mc` label to cover the generic MC
interface, excluding target-specific MCTargetDesc files.
- Rename the `mc` label to `llvm:mc` for consistency with other LLVM
subdirectory labels.
- Exclude `llvm/test/MC` from the label scope, as it contains many
target-specific directories.
Admin: please change the name of
https://github.com/orgs/llvm/teams/pr-subscribers-llvm-mc
to "pr-subscribers-llvm:mc", similar to pr-subscribers-llvm:ir
This patch provides cleanups and improvements for the GPU benchmarking
infrastructure. The key changes are:
- Fix benchmark convergence bug: Round up the scaled iteration count
(ceil) to ensure it grows properly. The previous truncation logic causes
the iteration count to get stuck.
- Resolve remaining compiler warning.
- Remove unused `BenchmarkLogger` files: This is dead code that added
maintenance and cognitive overhead without providing functionality.
- Improve build hygiene: Clean up headers and CMake dependencies to
strictly follow the 'include what you use' (IWYU) principle.
This simply updates the pass's cognizance of these instructions, and for
the
most part the hazards where they might be encountered do not exist for
gfx12.
Nonetheless, encountering them has to be checked for as doing so would
indicate
a compiler error.
Co-authored-by: Stephen Thomas <Stephen.Thomas@amd.com>
---------
Co-authored-by: Stephen Thomas <Stephen.Thomas@amd.com>
This commits makes the docs more precise, clarifying how scopes affect
the result of a method, as well as documenting a parameter of a
different method.
This reverts commit cf002847a464c004a57ca4777251b1aafc33d958 i.e.,
relands ba603b5e4d44f1a25207a2a00196471d2ba93424. It was reverted
because it was subtly wrong: multiplying an uninitialized zero should
not result in an initialized zero.
This reland fixes the issue by using instrumentation analogous to
visitAnd (bitwise AND of an initialized zero and an uninitialized value
results in an initialized value). Additionally, this reland expands a
test case; fixes the commit message; and optimizes the change to avoid
the need for horizontalReduce.
The current instrumentation has false positives: it does not take into
account that multiplying an initialized zero value with an uninitialized
value results in an initialized zero value This change fixes the issue
during the multiplication step. The horizontal add step is modeled using
bitwise OR.
Future work can apply this improved handler to the AVX512 equivalent
intrinsics (x86_avx512_pmaddw_d_512, x86_avx512_pmaddubs_w_512.) and AVX
VNNI intrinsics.
Per discussion with @ojhunt and @AaronBallman we are moving towards
predefined macros and away from __has_feature and __has_extension
for detecting sanitizers and other similar features. The rationale
is that __has_feature is only really meant for standardized features
(see the comment at the top of clang/include/clang/Basic/Features.def),
and __has_extension has the issues discovered as part of #153104.
Let's start by defining macros for ASan, HWASan and TSan, consistently
with gcc.
Reviewers: vitalybuka, ojhunt, AaronBallman, fmayer
Reviewed By: fmayer, vitalybuka
Pull Request: https://github.com/llvm/llvm-project/pull/153888
This adds support for printing the signature sections as part of the
`-p` flag for printing private headers.
The formatting aims to roughly match the formatting used by DXC's
`/dumpbin` flag.
The original version's printed output left some trailing whitespace on
lines, which caused the tests to fail with the strict whitespace
matching.
Re-lands #152531.
Resolves#152380.
In relation to the approval and merge of the
[PRIF](https://github.com/llvm/llvm-project/pull/76088) specification
about multi-image features in Flang, here is a first PR to add support
for the `-fcoarray` compilation flag and the initialization of the PRIF
environment.
Other PRs will follow for adding support of lowering to PRIF.