16 Commits

Author SHA1 Message Date
Drew Kersnar
90e8c8e718
[InferAlignment] Propagate alignment between loads/stores of the same base pointer (#145733)
We can derive and upgrade alignment for loads/stores using other
well-aligned loads/stores. This optimization does a single forward pass through
each basic block and uses loads/stores (the alignment and the offset) to
derive the best possible alignment for a base pointer, caching the
result. If it encounters another load/store based on that pointer, it
tries to upgrade the alignment. The optimization must be a forward pass within a basic
block because control flow and exception throwing can impact alignment guarantees.

---------

Co-authored-by: Nikita Popov <github@npopov.com>
2025-08-08 12:05:29 -05:00
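
The forward pass described in the InferAlignment commit above can be sketched in a few lines. This is a toy model under stated assumptions, not the pass's actual code: a basic block is a list of `(base, offset, align)` accesses with non-negative constant byte offsets, and the helper names `common_alignment` and `propagate_alignment` are hypothetical.

```python
def common_alignment(align, offset):
    """Alignment guaranteed `offset` bytes away from an `align`-aligned address."""
    if offset == 0:
        return align
    lowest_set_bit = offset & -offset  # largest power of two dividing offset
    return min(align, lowest_set_bit)


def propagate_alignment(block):
    """Single forward pass over one block; returns accesses with upgraded alignment."""
    best_base_align = {}  # base pointer -> best alignment derived so far
    upgraded = []
    for base, offset, align in block:
        # What this access alone proves about the base pointer.
        derived = common_alignment(align, offset)
        cached = best_base_align.get(base)
        if cached is not None and cached > derived:
            # An earlier access proved more; try to upgrade this one (never downgrade).
            align = max(align, common_alignment(cached, offset))
        else:
            best_base_align[base] = derived
        upgraded.append((base, offset, align))
    return upgraded


block = [("p", 0, 16), ("p", 8, 1), ("p", 4, 4), ("q", 4, 2)]
print(propagate_alignment(block))
# [('p', 0, 16), ('p', 8, 8), ('p', 4, 4), ('q', 4, 2)]
```

Because the pass only looks backward at accesses it has already seen, reversing the first two accesses would leave the `align 1` access untouched, which mirrors the forward-only constraint stated in the commit message.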
goldsteinn
69a798a996
Reapply "[Inliner] Propagate more attributes to params when inlining (#91101)" (2nd Attempt) (#112749)
The root cause of the bug was code holding onto the `range` attribute after
changing BitWidth. This was fixed in PR #112633.
2024-10-17 20:28:47 -05:00
Arthur Eubanks
9e6d24f61f Revert "[Inliner] Propagate more attributes to params when inlining (#91101)"
This reverts commit ae778ae7ce72219270c30d5c8b3d88c9a4803f81.

Creates broken IR, see comments in #91101.
2024-10-16 21:21:34 +00:00
goldsteinn
ae778ae7ce
[Inliner] Propagate more attributes to params when inlining (#91101)
- **[Inliner] Add tests for propagating more parameter attributes; NFC**
- **[Inliner] Propagate more attributes to params when inlining**

Add support for propagating:
- `dereferenceable`
- `dereferenceable_or_null`
- `align`
- `nonnull`
- `range`

These are only propagated if the parameter to the to-be-inlined callsite
matches the exact parameter used in the to-be-inlined function.
2024-10-16 11:53:21 -05:00
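
The "exact parameter" condition in the Inliner commit above can be illustrated with a toy model. Everything here is an assumption made for illustration: values are plain strings, attributes are dictionaries, the function name `attrs_for_inner_call` is hypothetical, and the sketch covers only the attributes named in the commit message while ignoring any further legality checks the real inliner performs.

```python
# Attributes named in the commit message (this sketch handles only these).
PROPAGATED_ATTRS = {
    "dereferenceable",
    "dereferenceable_or_null",
    "align",
    "nonnull",
    "range",
}


def attrs_for_inner_call(callee_params, outer_call_attrs, inner_call_args):
    """Return, per argument of a call inside the callee, the attributes that
    may be copied from the callsite being inlined."""
    propagated = []
    for arg in inner_call_args:
        attrs = {}
        # Only an argument that *is* a callee parameter qualifies; a value
        # derived from a parameter (e.g. an offset pointer) gets nothing.
        if arg in callee_params:
            index = callee_params.index(arg)
            attrs = {
                name: value
                for name, value in outer_call_attrs[index].items()
                if name in PROPAGATED_ATTRS
            }
        propagated.append(attrs)
    return propagated


# Callee foo(%p, %q) is called as foo(nonnull align 8 %x, %y) and itself
# calls bar(%p, %p.off, %q), where %p.off is derived from %p.
result = attrs_for_inner_call(
    callee_params=["%p", "%q"],
    outer_call_attrs=[{"nonnull": True, "align": 8}, {}],
    inner_call_args=["%p", "%p.off", "%q"],
)
print(result)
# [{'nonnull': True, 'align': 8}, {}, {}]
```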
Nikita Popov
1c298c9274 [InstCombine] Preserve nuw flags when merging geps
These transforms all perform a variant of (gep (gep p, x), y)
to (gep p, (x + y)). We can preserve both inbounds and nuw
during such transforms (https://alive2.llvm.org/ce/z/Stu4cN), but
not nusw, which would require proving that the new add is nsw.

For the constant offset case, I've conservatively retained the
logic that checks for negative intermediate offsets, though I'm
not sure it's still reachable nowadays.
2024-09-13 11:15:22 +02:00
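
The flag reasoning in the InstCombine commit above can be checked by brute force on a tiny address space. This is a simplified model, not LLVM's semantics in full: pointers and offsets are 4-bit integers, each GEP has a single byte offset, and the predicate names are hypothetical. It checks whether the `add` in (gep p, (x + y)) inherits the no-wrap property from the original chain (gep (gep p, x), y).

```python
BITS = 4
MOD = 1 << BITS


def to_signed(value):
    """Reinterpret a BITS-wide unsigned value as two's-complement signed."""
    return value - MOD if value >= MOD // 2 else value


def chain_is_nuw(p, x, y):
    # Both steps add an unsigned offset without wrapping the address.
    return p + x < MOD and p + x + y < MOD


def chain_is_nusw(p, x, y):
    # Both steps add a signed offset to an unsigned address without wrapping.
    sx, sy = to_signed(x), to_signed(y)
    return 0 <= p + sx < MOD and 0 <= p + sx + sy < MOD


def merged_add_is_nuw(x, y):
    return x + y < MOD


def merged_add_is_nsw(x, y):
    return -(MOD // 2) <= to_signed(x) + to_signed(y) < MOD // 2


values = range(MOD)

# nuw: every valid chain yields a non-wrapping unsigned x + y.
nuw_always_holds = all(
    merged_add_is_nuw(x, y)
    for p in values for x in values for y in values
    if chain_is_nuw(p, x, y)
)

# nusw: some valid chains yield an x + y that overflows in the signed sense.
nusw_counterexamples = [
    (p, x, y)
    for p in values for x in values for y in values
    if chain_is_nusw(p, x, y) and not merged_add_is_nsw(x, y)
]

print(nuw_always_holds)                   # True
print((0, 7, 7) in nusw_counterexamples)  # True
```

In this model, p = 0 with x = y = 7 is a valid nusw chain (addresses 7 and 14 both fit), yet 7 + 7 overflows the signed 4-bit range, which is why the merged add would need a separate nsw proof.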
Hari Limaye
94473f4db6
[IRBuilder] Generate nuw GEPs for struct member accesses (#99538)
Generate nuw GEPs for struct member accesses, as inbounds + non-negative
implies nuw.

Regression tests are updated using update scripts where possible, and by
find + replace where not.
2024-08-09 13:25:04 +01:00
Nikita Popov
90ba33099c
[InstCombine] Canonicalize constant GEPs to i8 source element type (#68882)
This patch canonicalizes getelementptr instructions with constant
indices to use the `i8` source element type. This makes it easier for
optimizations to recognize that two GEPs are identical, because they
don't need to see past many different ways to express the same offset.

This is a first step towards
https://discourse.llvm.org/t/rfc-replacing-getelementptr-with-ptradd/68699.
This is limited to constant GEPs only for now, as they have a clear
canonical form, while we're not yet sure how exactly to deal with
variable indices.

The test llvm/test/Transforms/PhaseOrdering/switch_with_geps.ll gives
two representative examples of the kind of optimization improvement we
expect from this change. In the first test SimplifyCFG can now realize
that all switch branches are actually the same. In the second test it
can convert it into simple arithmetic. These are representative of
common optimization failures we see in Rust.

Fixes https://github.com/llvm/llvm-project/issues/69841.
2024-01-24 15:25:29 +01:00
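
To see why an `i8` source element type gives constant GEPs a single canonical form, the sketch below reduces several differently typed GEPs to one byte offset. It is an illustrative model only: types are nested tuples, structs are assumed to have no padding, and `size_of` and `constant_gep_offset` are hypothetical helpers rather than LLVM APIs.

```python
def size_of(ty):
    """Size in bytes of a toy type: ("int", n), ("array", elem, count), ("struct", [fields])."""
    kind = ty[0]
    if kind == "int":
        return ty[1]
    if kind == "array":
        return ty[2] * size_of(ty[1])
    if kind == "struct":  # assumption: no padding between fields
        return sum(size_of(field) for field in ty[1])
    raise ValueError(f"unknown type kind: {kind}")


def constant_gep_offset(source_ty, indices):
    """Byte offset computed by a GEP with all-constant indices."""
    # The first index steps over whole objects of the source element type.
    offset = indices[0] * size_of(source_ty)
    ty = source_ty
    for index in indices[1:]:
        if ty[0] == "array":
            ty = ty[1]
            offset += index * size_of(ty)
        elif ty[0] == "struct":
            offset += sum(size_of(field) for field in ty[1][:index])
            ty = ty[1][index]
        else:
            raise ValueError("cannot index into a scalar type")
    return offset


I32 = ("int", 4)
ARR = ("array", I32, 4)      # [4 x i32]
REC = ("struct", [I32, ARR])  # { i32, [4 x i32] }

# Three different spellings of the same address computation.
print(constant_gep_offset(REC, [0, 1, 2]))  # 12
print(constant_gep_offset(ARR, [0, 3]))     # 12
print(constant_gep_offset(I32, [3]))        # 12
```

All three reduce to an offset of 12 bytes from the base pointer, so in the canonical form they would be written identically and become trivially comparable.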
Nikita Popov
b31cd07de5 [Clang] Regenerate test checks (NFC)
The UTC output has changed slightly, regenerate tests to minimize
future diffs.
2023-11-28 09:58:30 +01:00
Johannes Doerfert
d346c82435
[OpenMP] Associate the KernelEnvironment with the GenericKernelTy (#70383)
By associating the kernel environment with the generic kernel we can
access middle-end information easily, including the launch bounds ranges
that are acceptable. By constraining the number of threads accordingly,
we now obey the user-provided bounds that were passed via attributes.
2023-10-29 11:35:34 -07:00
Johannes Doerfert
31b91213bd [OpenMP] Unify the min/max thread/teams pathways
We used to pass the min/max threads/teams values through different paths
from the frontend to the middle end. This simplifies the situation by
passing the values once, only when we will create the KernelEnvironment,
which contains the values. At that point we also manifest the metadata,
as appropriate. Some footguns have also been removed, e.g., our target
check is now triple-based, not calling convention-based, as the latter
is dependent on the ordering of operations. The types of the values have
been unified to int32_t.
2023-10-29 10:53:20 -07:00
Amara Emerson
1a2e77cf9e Revert "Revert "Inlining: Run the legacy AlwaysInliner before the regular inliner.""
This reverts commit 86bfeb906e3a95ae428f3e97d78d3d22a7c839f3.

This is a long-overdue reapplication of a change that was originally reverted due to
regressions unrelated to the actual inlining change. These regressions have since
been fixed by another long-in-the-making change, a66051c6, landing.

Original commit message for reference:
---
    We have several situations where it's beneficial for code size to ensure that every
    call to an always-inline function is inlined before normal inlining decisions are
    made. While the normal inliner runs in a "MandatoryOnly" mode to try to do this,
    it only does it on a per-SCC basis, rather than the whole module. Ensuring that
    all mandatory inlinings are done before any heuristic-based decisions are made
    just makes sense.

    Despite being referred to as the "legacy" AlwaysInliner pass, it's already necessary
    for -O0 because the CGSCC inliner is too expensive in compile time to run at -O0.

    This also fixes an exponential compile time blow up in
    https://github.com/llvm/llvm-project/issues/59126

    Differential Revision: https://reviews.llvm.org/D143624
---
2023-10-28 23:21:11 -07:00
Johannes Doerfert
5a64ae75b5 [OpenMP][NFC] Update clang OpenMP tests
Just re-running the script to make future updates easier
2023-08-23 10:40:31 -07:00
Matt Arsenault
a709c49d75 clang: Regenerate OpenMP tests
Avoid diffs from no longer hardcoding metadata checks
2023-07-11 18:28:10 -04:00
David Green
86bfeb906e Revert "Inlining: Run the legacy AlwaysInliner before the regular inliner."
This seems to cause large regressions in existing code, as much as 75% slower
(4x the time taken). Small always inline functions seem to be used a lot in the
cmsis-dsp library.

I would add a phase ordering test to show the problems, but one already exists!
The llvm/test/Transforms/PhaseOrdering/ARM/arm_mult_q15.ll was just changed by
removing alwaysinline to hide the problems that existed.

This reverts commit cae033dcf227aeecf58fca5af6fc7fde1fd2fb4f.
This reverts commit 8e33c41e72ad42e4c27f8cbc3ad2e02b169637a1.
2023-02-10 15:01:49 +00:00
Amara Emerson
cae033dcf2 Inlining: Run the legacy AlwaysInliner before the regular inliner.
We have several situations where it's beneficial for code size to ensure that every
call to an always-inline function is inlined before normal inlining decisions are
made. While the normal inliner runs in a "MandatoryOnly" mode to try to do this,
it only does it on a per-SCC basis, rather than the whole module. Ensuring that
all mandatory inlinings are done before any heuristic-based decisions are made
just makes sense.

Despite being referred to as the "legacy" AlwaysInliner pass, it's already necessary
for -O0 because the CGSCC inliner is too expensive in compile time to run at -O0.

This also fixes an exponential compile time blow up in
https://github.com/llvm/llvm-project/issues/59126

Differential Revision: https://reviews.llvm.org/D143624
2023-02-09 16:49:29 -08:00
Shilei Tian
ae53c7f4a2 [Clang][OpenMP] Fix the issue that a functor is not captured properly in a task region
This patch fixes the issue that a functor is not captured properly if
it is used in a task region. The issue was introduced by https://reviews.llvm.org/D114546,
where `CallExpr` is treated specially, but the callee itself is not properly visited.
https://reviews.llvm.org/D115902 already fixed one case. This patch
fixes another case.

Fix #57757.

Reviewed By: ABataev

Differential Revision: https://reviews.llvm.org/D141873
2023-01-16 22:35:05 -05:00