This patch intersects attributes of two calls to avoid introducing UB.
It also skips incompatible call pairs in GVN/NewGVN. However, I cannot
provide negative tests for these changes.
Fixes https://github.com/llvm/llvm-project/issues/113997.
This patch adds basic support for `hypot`. Constant folding support will
be submitted in a subsequent patch.
Related issue: https://github.com/llvm/llvm-project/issues/113711
Note: It's my first time contributing to the LLVM with encouragement
from one of my friends, @fawdlstty. I learned a lot from
https://github.com/llvm/llvm-project/pull/99611, and thanks for that.
Note: I had created the same PR and merged
(https://github.com/llvm/llvm-project/pull/113724), but reverted caused
by the merging issue. (The CI issue happened in 3 A.M. at my timezone.
So, I need to fall asleep again after I replied about why issue
happened.) So, I rebased to the latest main branch and recreate the PR
and hope I won't have the third time to create the same PR.
I hope @arsenm can help me review the code again. I’m sorry for that.
Kenji Mouri
This patch is a part of step-by-step refactoring of CloneFunctionInto.
The goal is to extract reusable pieces out of it that will be later used
to optimize function cloning e.g. in coroutine processing.
Extracted from #109032 (commit 2)
The spec for llvm.experimental.convergence.entry says that is must be in
the entry block for a function, and must preceed any other convergent
operation. It does not have to be the first instruction in the entry
block.
Inlining assumes that the call to llvm.experimental.convergence.entry
will be the first instruction after any phi instructions. This commit
modifies inlining to search the entire block for the call.
# What
This PR renames the newly-introduced llvm attribute
`sanitize_realtime_unsafe` to `sanitize_realtime_blocking`. Likewise,
sibling variables such as `SanitizeRealtimeUnsafe` are renamed to
`SanitizeRealtimeBlocking` respectively. There are no other functional
changes.
# Why?
- There are a number of problems that can cause a function to be
real-time "unsafe",
- we wish to communicate what problems rtsan detects and *why* they're
unsafe, and
- a generic "unsafe" attribute is, in our opinion, too broad a net -
which may lead to future implementations that need extra contextual
information passed through them in order to communicate meaningful
reasons to users.
- We want to avoid this situation and make the runtime library boundary
API/ABI as simple as possible, and
- we believe that restricting the scope of attributes to names like
`sanitize_realtime_blocking` is an effective means of doing so.
We also feel that the symmetry between `[[clang::blocking]]` and
`sanitize_realtime_blocking` is easier to follow as a developer.
# Concerns
- I'm aware that the LLVM attribute `sanitize_realtime_unsafe` has been
part of the tree for a few weeks now (introduced here:
https://github.com/llvm/llvm-project/pull/106754). Given that it hasn't
been released in version 20 yet, am I correct in considering this to not
be a breaking change?
The IR lowering of memcpy/memmove intrinsics uses a target-specific type
for its load/store operations. So far, the loaded and stored addresses
are computed with GEPs based on this type. That is wrong if the
allocation size of the type differs from its store size: The width of
the accesses is determined by the store size, while the GEP stride is
determined by the allocation size. If the allocation size is greater
than the store size, some bytes are not copied/moved.
This patch changes the GEPs to use i8 addressing, with offsets based on
the type's store size. The correctness of the lowering therefore no
longer depends on the type's allocation size.
This is in support of PR #112332, which allows adjusting the memcpy loop
lowering type through a command line argument in the AMDGPU backend.
Add the llvm-canon tool. Description from the [original
PR](https://reviews.llvm.org/D66029#change-wZv3yOpDdxIu):
> Added a new llvm-canon tool which aims to transform LLVM Modules into
a canonical form by reordering and renaming instructions while
preserving the same semantics. This tool makes it easier to spot
semantic differences while diffing two modules which have undergone
different transformation passes.
The current version of this tool can:
- Reorder instructions within a function.
- Rename instructions based on the operands.
- Sort commutative operands.
This code was originally written by @michalpaszkowski and [submitted to
mainline
LLVM](14d358537f).
However, it was quickly
[reverted](335de55fa3)
to do BuildBot errors.
Michal presented his version of the tool in [LLVM-Canon: Shooting for
Clear Diffs](https://www.youtube.com/watch?v=c9WMijSOEUg).
@AidanGoldfarb and I ported the code to the new pass manager, added more
tests, and fixed some bugs related to PHI nodes that may have been the
root cause of the BuildBot errors that caused the patch to be reverted.
Additionally, we rewrote the implementation of instruction reordering to
fix cases where the original algorithm would break use-def chains.
Note that this is @AidanGoldfarb and I's first time submitting to LLVM.
Please liberally critique the PR!
CC @plotfi for initial review.
---------
Co-authored-by: Aidan <aidan.goldfarb@mail.mcgill.ca>
This patch adds basic support for `scalbln, scalblnf, scalblnl, scalbn,
scalbnf, scalbnl`. Constant folding support will be submitted in a
subsequent patch.
Related issue: <#112631>
In a variety of places we change the bitwidth of a parameter but don't
update the attributes.
The issue in this case is from the `range` attribute when inlining
`__memset_chk`. `optimizeMemSetChk` will replace an `i32` with an
`i8`, and if the `i32` had a `range` attr assosiated it will cause an
error.
Fixes#112633
This fixes all the places that hit the new assertion added in
https://github.com/llvm/llvm-project/pull/106524 in tests. That is,
cases where the value passed to the APInt constructor is not an N-bit
signed/unsigned integer, where N is the bit width and signedness is
determined by the isSigned flag.
The fixes either set the correct value for isSigned, set the
implicitTrunc flag, or perform more calculations inside APInt.
Note that the assertion is currently still disabled by default, so this
patch is mostly NFC.
- **[Inliner] Add tests for propagating more parameter attributes; NFC**
- **[Inliner] Propagate more attributes to params when inlining**
Add support for propagating:
- `derefereancable`
- `derefereancable_or_null`
- `align`
- `nonnull`
- `range`
These are only propagated if the parameter to the to-be-inlined callsite
match the exact parameter used in the to-be-inlined function.
- **[Inliner] Add tests for bad propagationg of access attr for `byval`
param; NFC**
- **[Inliner] Don't propagate access attr to `byval` params**
We previously only handled the case where the `byval` attr was in the
callbase's param attr list. This PR also handles the case if the
`ByVal` was a param attr on the function's param attr list.
A call to a function that has this attribute is not a source of
divergence, as used by UniformityAnalysis. That allows a front-end to
use known-name calls as an instruction extension mechanism (e.g.
https://github.com/GPUOpen-Drivers/llvm-dialects ) without such a call
being a source of divergence.
Rename the function to reflect its correct behavior and to be consistent
with `Module::getOrInsertFunction`. This is also in preparation of
adding a new `Intrinsic::getDeclaration` that will have behavior similar
to `Module::getFunction` (i.e, just lookup, no creation).
Similar to 112aac4e8961b9626bb84f36deeaa5a674f03f5a, this converts log
libcalls to llvm.log.f64 intrinsics if we know they do not set errno, as
the input is not zero and not negative. As log will produce errno if the
input is 0 (returning -inf) or if the input is negative (returning nan),
we also perform the conversion when we have noinf and nonan.
In some pathological cases this optimization can spend an unreasonable
amount of time populating the set for predecessors of the successor
block. This change sinks some of that initializing to the point where
it's actually necessary so we can take advantage of the existing
early-exits.
rdar://137063034
We weren't flagging inlined callee functions with callsite but not
memprof metadata correctly, leading to the callsite metadata not being
stripped when that function was inlined into a callsite that didn't
itself have callsite metadata.
In practice, this meant that we went into the LTO link with many more
calls than necessary having callsite metadata / summary records, which
in turn made the graph larger than necessary.
Fixing this oversight resulted in huge reductions in the thin link of a
large target:
99% fewer duplicated context ids (recall we have to duplicate when
callsites containing the same stack ids are in different functions)
71% fewer graph edges
17% fewer graph nodes
13% fewer functions cloned
44% smaller peak memory
47% smaller time
This macros is always defined: either 0 or 1. The correct pattern is to
use #if.
Re-apply #110185 with more fixes for debug build with the ABI breaking
checks disabled.
…ead of #ifdef (#110883)"
This reverts commit 1905cdbf4ef15565504036c52725cb0622ee64ef, which
causes lots of failures where LLVM doesn't have the right header guards.
The errors can be seen on
[BuildKite](https://buildkite.com/llvm-project/upstream-bazel/builds/112362#01924eae-231c-4d06-ba87-2c538cf40e04),
where the source uses `#ifndef NDEBUG`, but the content in question is
defined when `LLVM_ENABLE_ABI_BREAKING_CHECKS == 1`.
For example, `llvm/include/llvm/Support/GenericDomTreeConstruction.h`
has the following:
```cpp
// Helper struct used during edge insertions.
struct InsertionInfo {
// ...
#ifdef LLVM_ENABLE_ABI_BREAKING_CHECKS
SmallVector<TreeNodePtr, 8> VisitedUnaffected;
#endif
};
// ...
InsertionInfo II;
// ...
#ifndef NDEBUG
II.VisitedUnaffected.push_back(SuccTN);
#endif
```
When expanding SCEV adds to geps, transfer the nuw flag to the resulting
gep. (Note that this doesn't apply to IV increment GEPs, which go
through a different code path.)
Some (many) attributes can safely be dropped to enable sinking. For
example removing `nonnull` on a return/param can't affect correctness.
Closes#109472
Since no passes compute DependenceAnalysis via the PassManager, there is
no value in preserving it here. Hence, strip the unnecessary dependency
on DependenceAnalysis.
As pointed out in the review of #102133, SCEVExpander currently
incorrectly reuses GEP instructions that have poison-generating flags
set. Fix this by clearing the flags on the reused instruction.
Currently DSE unconditionally emits `calloc` as returning a pointer to
AS0. However, this is incorrect for targets that have a non-zero default
AS, as it'd not match the `malloc` signature. This patch addresses that
by piping through the AS for the pointer returned by `malloc` into the
`calloc` insertion call.
This replaces some of the most frequent offenders of using a DenseMap that
cause a malloc, where the typical element-count is small enough to fit in
an initial stack allocation.
Most of these are fairly obvious, one to highlight is the collectOffset
method of GEP instructions: if there's a GEP, of course it's going to have
at least one offset, but every time we've called collectOffset we end up
calling malloc as well for the DenseMap in the MapVector.