This is a step towards separating the set of available libcalls
from the lowering decision of which call to use. Libcall recognition
now directly checks availability instead of indirectly checking through
the lowering table.
This will eventually allow for removing llvm-project-tests.yml. This
should significantly reduce the complexity of these workflows at the
cost of a small amount of duplication, which is standard for GitHub
Actions.
Reviewers: michalpaszkowski, sudonatalie
Reviewed By: sudonatalie
Pull Request: https://github.com/llvm/llvm-project/pull/153869
The SymContentsTargetCommon kind introduced by
https://reviews.llvm.org/D61493 lacks significance and should be treated
as a regular common symbol with a different section index.
Update ELFObjectWriter to respect the specified section index.
The new representation also works with Hexagon's SHN_HEXAGON_SCOMMON.
The "SymbolContents" name and the "SymContents*" members are confusing.
Rename them to kind and Kind::XXX, similar to lld/ELF/Symbols.h.
Rename SymContentsVariable to Kind::Equated, as the proper term is
"equated symbol", not "variable".
Static analysis flagged that we were not checking the return value of
DiagnoseClassNameShadow here, even though we do so everywhere else.
Modifying this case to match how other call sites use it makes sense and
does not change behavior. If this check fails, later actions will likely
fail as well, but it is more correct to exit early.
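A minimal sketch of the pattern adopted here, assuming the current bool-returning DiagnoseClassNameShadow signature; the surrounding caller is hypothetical:

```cpp
#include "clang/Sema/Sema.h"

using namespace clang;

// Hypothetical caller illustrating the early exit described above: if the
// helper reports that the name shadows the class, bail out rather than
// continuing with a declaration we already know is problematic.
static bool checkDeclName(Sema &S, DeclContext *DC, DeclarationNameInfo Name) {
  if (S.DiagnoseClassNameShadow(DC, Name))
    return true; // exit early, matching the other call sites
  // ... continue with the usual processing of the declaration ...
  return false;
}
```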
This patch makes GPU throughput benchmark results more comparable across
targets by disabling loop unrolling in the benchmark loop.
Motivation:
* PTX (post-LTO) evidence on NVPTX: for libc `sin`, the generated PTX
shows the `throughput` loop unrolled 8x at `N=128` (one iteration
advances the input pointer by 64 bytes = 8 doubles), interleaving eight
independent chains before the back-edge. This hides latency and
significantly reduces cycles/call as the batch size `N` grows.
* Observed scaling (NVPTX measurements): with unrolling enabled, `sin`
dropped from ~3,100 cycles/call at `N=1` to ~360 at `N=128`. After
enforcing `#pragma clang loop unroll(disable)`, results stabilized
(e.g., from ~3,100 cycles/call at `N=1` to ~2,700 at `N=128`).
* libdevice contrast: the libdevice `sin` path did not exhibit a similar
drop in our measurements, and the PTX appears as compact internal calls
rather than a long FMA chain, leaving less ILP for the outer loop to
extract.
What this change does:
* Applies `#pragma clang loop unroll(disable)` to the GPU `throughput()`
loop in both NVPTX and AMDGPU backends.
Leaving unrolling entirely to the optimizer makes apples-to-apples
comparisons uneven (e.g., libc vs. vendor). Disabling unrolling yields
fairer, more consistent numbers.
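A minimal sketch of the resulting loop shape; `read_cycle_counter` and `do_not_optimize` are stand-ins, not the actual benchmark helpers:

```cpp
#include <cstddef>
#include <cstdint>

// Stand-ins for the benchmark's timing and sink utilities (assumptions, not
// the real GPU benchmarking infrastructure).
extern "C" uint64_t read_cycle_counter();
template <typename T> void do_not_optimize(const T &value) {
  asm volatile("" : : "r,m"(value) : "memory");
}

// The measurement loop stays rolled: every target executes the same loop
// structure instead of letting the optimizer unroll it and interleave
// independent call chains.
template <typename Func>
uint64_t measure_throughput(Func f, const double *inputs, size_t n) {
  const uint64_t start = read_cycle_counter();
#pragma clang loop unroll(disable)
  for (size_t i = 0; i < n; ++i)
    do_not_optimize(f(inputs[i]));
  return (read_cycle_counter() - start) / n; // cycles per call
}
```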
Dissolving the hierarchical VPlan CFG and converting abstract to
concrete recipes can expose additional simplification opportunities.
Do a final run of simplifyRecipes before executing the VPlan.
We might create a local temporary variable for a ParmVarDecl, in which
case a DeclRefExpr for that ParmVarDecl should _still_ result in us
choosing the parameter, not that local.
In https://github.com/llvm/llvm-project/pull/100281, we use
`TaggedBlocks` to record the blocks modified by `CheckIfPHIMatches()`, so
we do not need to clear every block in `BlockList` when the
`CheckIfPHIMatches()` match fails.
If the `CheckIfPHIMatches()` match succeeds, we can reuse `TaggedBlocks`
to record matching PHIs only for the modified blocks, avoiding a scan of
every block in `BlockList` to see whether `PHITag` is set.
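A simplified illustration of the bookkeeping change; `BlockInfo`, `PHITag`, and `TaggedBlocks` follow the names above and are not the actual implementation:

```cpp
#include <vector>

// Only blocks touched while matching are recorded in TaggedBlocks, so the
// cleanup after a failed match (and the PHI recording after a successful one)
// walks TaggedBlocks instead of the entire BlockList.
struct BlockInfo {
  void *PHITag = nullptr; // candidate PHI tagged during matching
};

void clearTagsAfterFailedMatch(std::vector<BlockInfo *> &TaggedBlocks) {
  for (BlockInfo *Info : TaggedBlocks)
    Info->PHITag = nullptr; // reset only what was modified
  TaggedBlocks.clear();
}
```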
Add support for 1:N type conversions to the `ControlFlowToLLVM` lowering
patterns. Not applicable to `cf.switch` and `cf.assert`.
---------
Co-authored-by: Tobias Gysi <tobias.gysi@nextsilicon.com>
Add support for 1:N type conversions to the `FuncToLLVM` lowering
patterns. This commit does not change the lowering of any types (such as
`MemRefType`). It just sets up the infrastructure, such that 1:N type
conversions can be used during `FuncToLLVM`.
Note: When the converted result types of a `func.func` have more than 1
type, then the results are wrapped in an `llvm.struct`. That's because
`llvm.func` does not support multiple result values. This "wrapping" was
already implemented for cases where the original `func.func` has
multiple results. With 1:N conversions, even a single result can now
expand to multiple converted results, triggering the same wrapping
mechanism.
The test cases are exercised with both the old and the new no-rollback
conversion driver.
These are implicit VarDecls whose type was never written in the source
code. Don't create a TypeLoc with a fake source location for them.
The fake as-written type also didn't match the actual type; fixing this
causes some unrelated test churn in a CFG dump, since statement printing
prefers type source info when it's available.
Fixes https://github.com/llvm/llvm-project/issues/153649
This is a regression introduced in
https://github.com/llvm/llvm-project/pull/147835
This regression was never released, so no release notes are added.
Without this patch, we use NumNonEmpty, which keeps track of the
number of valid entries plus tombstones even though we have a separate
variable to keep track of the number of tombstones.
This patch simplifies the metadata. Specifically, it changes the name
and semantics of the variable to NumEntries to keep track of the
number of valid entries.
The difference in semantics requires some code changes aside from
mechanical replacements:
- size() just returns NumEntries.
- erase_imp() and remove_if() need to decrement NumEntries in the
large mode.
- insert_imp_big() increments NumEntries for successful insertions,
regardless of whether a tombstone is being replaced with a valid
entry. It also computes the number of non-tombstone empty slots as:
CurArraySize - NumEntries - NumTombstones
- Grow() no longer needs NumNonEmpty -= NumTombstones.
Overall, the resulting code should look more intuitive and more
consistent with DenseMap.
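A small model of the new bookkeeping, using the member names from the description above; this is an illustration, not the actual SmallPtrSet code:

```cpp
#include <cassert>

// NumEntries counts only valid entries; tombstones are tracked separately, so
// the number of non-tombstone empty slots falls out of simple arithmetic.
struct SetCounters {
  unsigned CurArraySize = 16; // total slots in the current array
  unsigned NumEntries = 0;    // valid entries only
  unsigned NumTombstones = 0; // erased slots awaiting reuse

  unsigned size() const { return NumEntries; } // now a direct read

  unsigned numEmptySlots() const { // used when deciding whether to grow
    assert(CurArraySize >= NumEntries + NumTombstones);
    return CurArraySize - NumEntries - NumTombstones;
  }

  void insertReusingTombstone() { ++NumEntries; --NumTombstones; }
  void insertIntoEmptySlot() { ++NumEntries; }
  void eraseInLargeMode() { --NumEntries; ++NumTombstones; }
};
```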
llvm.x86.sse.pshuf.w(<1 x i64>, i8) and llvm.x86.avx512.pshuf.b.512(<64
x i8>, <64 x i8>) are currently handled strictly, which is suboptimal.
llvm.x86.ssse3.pshuf.b(<1 x i64>, <1 x i64>),
llvm.x86.ssse3.pshuf.b.128(<16 x i8>, <16 x i8>), and
llvm.x86.avx2.pshuf.b(<32 x i8>, <32 x i8>) are currently handled
heuristically using maybeHandleSimpleNomemIntrinsic, which is incorrect.
Since the second argument is the shuffle order, we instrument all these
intrinsics using `handleIntrinsicByApplyingToShadow(...,
/*trailingVerbatimArgs=*/1)`
(https://github.com/llvm/llvm-project/pull/114490).
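A toy model of why this propagation rule is sound for byte shuffles; the C++ below is illustrative and is not the sanitizer code:

```cpp
#include <array>
#include <cstdint>

using Bytes16 = std::array<uint8_t, 16>;

// Toy model of 128-bit pshufb: each result byte is either zero (order byte
// has its high bit set) or a copy of one input byte.
static Bytes16 pshufb(const Bytes16 &data, const Bytes16 &order) {
  Bytes16 out{};
  for (int i = 0; i < 16; ++i)
    out[i] = (order[i] & 0x80) ? 0 : data[order[i] & 0x0F];
  return out;
}

// Because every result byte is a byte-for-byte copy (or a constant zero),
// shuffling the *shadow* with the same, verbatim order operand yields the
// correct result shadow. This mirrors what applying the intrinsic to the
// shadow with one trailing verbatim argument does at the IR level.
static Bytes16 propagateShadow(const Bytes16 &shadow, const Bytes16 &order) {
  return pshufb(shadow, order);
}
```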
The auto-generated decoder fails to add the $sgp10 operand because it
has no encoding bits.
Work around this by adding the missing operand after decoding is
complete.
Fixes #153829.
This patch refactors DXILABI to remove the dependency on scope printer.
Closes: #153827
---------
Co-authored-by: Joao Saffran <{ID}+{username}@users.noreply.github.com>
As a member of github.com/orgs/llvm/teams/pr-subscribers-llvm-mc, I was
not notified about PR #149935.
This commit introduces the `llvm:mc` label to cover the generic MC
interface, excluding target-specific MCTargetDesc files.
- Rename the `mc` label to `llvm:mc` for consistency with other LLVM
subdirectory labels.
- Exclude `llvm/test/MC` from the label scope, as it contains many
target-specific directories.
Admin: please change the name of
https://github.com/orgs/llvm/teams/pr-subscribers-llvm-mc
to "pr-subscribers-llvm:mc", similar to pr-subscribers-llvm:ir
This patch provides cleanups and improvements for the GPU benchmarking
infrastructure. The key changes are:
- Fix a benchmark convergence bug: Round up the scaled iteration count
(ceil) to ensure it grows properly. The previous truncation logic caused
the iteration count to get stuck (see the sketch after this list).
- Resolve a remaining compiler warning.
- Remove unused `BenchmarkLogger` files: This is dead code that added
maintenance and cognitive overhead without providing functionality.
- Improve build hygiene: Clean up headers and CMake dependencies to
strictly follow the 'include what you use' (IWYU) principle.
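A minimal sketch of the rounding fix mentioned above; function and parameter names are illustrative:

```cpp
#include <cmath>
#include <cstdint>

// Scale the iteration count upward between benchmark rounds. Truncation can
// return the same value and stall (e.g., 10 * 1.05 truncates back to 10),
// while rounding up guarantees the count keeps growing.
inline uint32_t next_iteration_count(uint32_t current, double scale) {
  return static_cast<uint32_t>(std::ceil(current * scale));
}
```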
This simply updates the pass's cognizance of these instructions; for the
most part, the hazards where they might be encountered do not exist for
gfx12. Nonetheless, encountering them still has to be checked for, since
doing so would indicate a compiler error.
---------
Co-authored-by: Stephen Thomas <Stephen.Thomas@amd.com>
This commit makes the docs more precise, clarifying how scopes affect
the result of a method, as well as documenting a parameter of a
different method.