7688 Commits

Author SHA1 Message Date
Harald van Dijk
23209eb1d9 Revert "[DebugInfo] Update DIBuilder insertion to take InsertPosition (#126059)"
This reverts commit 3ec9f7494b31f2fe51d5ed0e07adcf4b7199def6.
2025-02-12 17:50:39 +00:00
Harald van Dijk
3ec9f7494b
[DebugInfo] Update DIBuilder insertion to take InsertPosition (#126059)
After #124287 updated several functions to return iterators rather than
Instruction *, it was no longer straightforward to pass their result to
DIBuilder. This commit updates DIBuilder methods to accept an
InsertPosition instead, so that they can be called with an iterator
(preferred), or with a deprecation warning an Instruction *, or a
BasicBlock *. This commit also updates the existing calls to the
DIBuilder methods to pass in iterators.
2025-02-12 17:38:59 +00:00
Alireza Torabian
3c74430320
[DependenceAnalysis][NFC] Removing PossiblyLoopIndependent parameter (#124615)
Parameter PossiblyLoopIndependent has lost its intended purpose. This
flag is always set to true in all cases when depends() is called, hence
we want to reconsider the utility of this variable and remove it from
the function signature entirely. This is an NFC patch.
2025-02-11 16:23:28 -05:00
Florian Hahn
e258bca950
[VPlan] Only skip expansion for SCEVUnknown if it isn't an instruction. (#125235)
Update getOrCreateVPValueForSCEVExpr to only skip expansion of
SCEVUnknown if the underlying value isn't an instruction. Instructions
may be defined in a loop and using them without expansion may break
LCSSA form. SCEVExpander will take care of preserving LCSSA if needed.

We could also try to pass LoopInfo, but there are some users of the
function where it won't be available and main benefit from skipping
expansion is slightly more concise VPlans.

Note that SCEVExpander is now used to expand SCEVUnknown with floats.
Adjust the check in expandCodeFor to only check the types and casts if
the type of the value is different to the requested type. Otherwise we
crash when trying to expand a float and requesting a float type.

Fixes https://github.com/llvm/llvm-project/issues/121518.

PR: https://github.com/llvm/llvm-project/pull/125235
2025-02-11 13:03:12 +01:00
Ramkumar Ramachandra
d0f472c246
SimplifyIndVar: teach widenLoopCompare about samesign (#125851)
Proof: https://alive2.llvm.org/ce/z/NVXaeo
2025-02-06 11:13:01 +00:00
Yingwei Zheng
9fbd5fbcc6
[IR][NFC] Switch to use LifetimeIntrinsic (#125528) 2025-02-04 02:18:33 +08:00
Yingwei Zheng
84ba55d18b
[NFC][SimplifyCFG] Refactor passingValueIsAlwaysUndefined to work on Use (#125519)
Address comment
https://github.com/llvm/llvm-project/pull/125383#discussion_r1938436526
2025-02-04 00:42:25 +08:00
Ramkumar Ramachandra
88f858d858
IndVarSimplify: thread CmpPredicate to SCEV (NFC) (#125240)
Relevant parts of ScalarEvolution's API accept a CmpPredicate instead of
a CmpInst::Predicate after 60dc450 (SCEV: migrate to CmpPredicate
(NFC)). After auditing the callers of these APIs, it was found that
IndVarSimplify was dropping samesign information. Fix this.
2025-01-31 18:07:44 +00:00
Lou
c75b251103
[GVN] Load-store forwaring of scalable store to fixed load. (#124748)
When storing a scalable vector and loading a fixed-size vector, where
the
scalable vector is known to be larger based on vscale_range, perform
store-to-load forwarding through temporary @llvm.vector.extract calls.
InstCombine then folds the insert/extract pair away.

The usecase is shown in https://godbolt.org/z/KT3sMrMbd, which shows
that clang generates IR that matches this pattern when the
"arm_sve_vector_bits" attribute is used:

```c
typedef svfloat32_t svfloat32_fixed_t
__attribute__((arm_sve_vector_bits(512)));

struct svfloat32_wrapped_t {
  svfloat32_fixed_t v;
};

static inline svfloat32_wrapped_t
add(svfloat32_wrapped_t a, svfloat32_wrapped_t b) {
  return {svadd_f32_x(svptrue_b32(), a.v, b.v)};
}

svfloat32_wrapped_t
foo(svfloat32_wrapped_t a, svfloat32_wrapped_t b) {
  // The IR pattern this patch matches is generated for this return:
  return add(a, b);
}
```
2025-01-30 11:49:42 +01:00
Nikita Popov
29441e4f5f
[IR] Convert from nocapture to captures(none) (#123181)
This PR removes the old `nocapture` attribute, replacing it with the new
`captures` attribute introduced in #116990. This change is
intended to be essentially NFC, replacing existing uses of `nocapture`
with `captures(none)` without adding any new analysis capabilities.
Making use of non-`none` values is left for a followup.

Some notes:
* `nocapture` will be upgraded to `captures(none)` by the bitcode
   reader.
* `nocapture` will also be upgraded by the textual IR reader. This is to
   make it easier to use old IR files and somewhat reduce the test churn in
   this PR.
* Helper APIs like `doesNotCapture()` will check for `captures(none)`.
* MLIR import will convert `captures(none)` into an `llvm.nocapture`
   attribute. The representation in the LLVM IR dialect should be updated
   separately.
2025-01-29 16:56:47 +01:00
Florian Hahn
ad9da92cf6
[LoopUnroll] Add RuntimeUnrollMultiExit to loop unroll options (NFC) (#124462)
Add an extra knob to RuntimeUnrollMultiExit to let backends control
whether to allow multi-exit unrolling on a per-loop basis.

This gives backends more fine-grained control on deciding if multi-exit
unrolling is profitable for a given loop and uarch. Similar to
4226e0a0c75.

PR: https://github.com/llvm/llvm-project/pull/124462
2025-01-27 21:20:04 +00:00
Jeremy Morse
34b139594a
[NFC][DebugInfo] Switch more call-sites to using iterator-insertion (#124283)
To finalise the "RemoveDIs" work removing debug intrinsics, we're
updating call sites that insert instructions to use iterators instead.
This set of changes are those where it's not immediately obvious that
just calling getIterator to fetch an iterator is correct, and one or two
places where more than one line needs to change.

Overall the same rule holds though: iterators generated for the start of
a block such as getFirstNonPHIIt need to be passed into insert/move
methods without being unwrapped/rewrapped, everything else can use
getIterator.
2025-01-27 16:44:14 +00:00
Jeremy Morse
81d18ad864
[NFC][DebugInfo] Make some block-start-position methods return iterators (#124287)
As part of the "RemoveDIs" work to eliminate debug intrinsics, we're
replacing methods that use Instruction*'s as positions with iterators. A
number of these (such as getFirstNonPHIOrDbg) are sufficiently
infrequently used that we can just replace the pointer-returning version
with an iterator-returning version, hopefully without much/any
disruption.

Thus this patch has getFirstNonPHIOrDbg and
getFirstNonPHIOrDbgOrLifetime return an iterator, and updates all
call-sites. There are no concerns about the iterators returned being
converted to Instruction*'s and losing the debug-info bit: because the
methods skip debug intrinsics, the iterator head bit is always false
anyway.
2025-01-27 16:27:54 +00:00
Jeremy Morse
e14962a39c
[NFC][DebugInfo] Use iterators for instruction insertion in more places (#124291)
As part of the "RemoveDIs" work to eliminate debug intrinsics, we're
replacing methods that use Instruction*'s as positions with iterators.
This patch changes some more complex call-sites, those crossing file
boundaries and where I've had to perform some minor rewrites.
2025-01-27 15:25:17 +00:00
Stephen Long
ab976a1712
PreISelIntrinsicLowering: Lower llvm.exp/llvm.exp2 to a loop if scalable vec arg (#117568) 2025-01-24 14:02:06 -05:00
Jeremy Morse
6292a808b3
[NFC][DebugInfo] Use iterator-flavour getFirstNonPHI at many call-sites (#123737)
As part of the "RemoveDIs" project, BasicBlock::iterator now carries a
debug-info bit that's needed when getFirstNonPHI and similar feed into
instruction insertion positions. Call-sites where that's necessary were
updated a year ago; but to ensure some type safety however, we'd like to
have all calls to getFirstNonPHI use the iterator-returning version.

This patch changes a bunch of call-sites calling getFirstNonPHI to use
getFirstNonPHIIt, which returns an iterator. All these call sites are
where it's obviously safe to fetch the iterator then dereference it. A
follow-up patch will contain less-obviously-safe changes.

We'll eventually deprecate and remove the instruction-pointer
getFirstNonPHI, but not before adding concise documentation of what
considerations are needed (very few).

---------

Co-authored-by: Stephen Tozer <Melamoto@gmail.com>
2025-01-24 13:27:56 +00:00
Jeremy Morse
8e70273509
[NFC][DebugInfo] Use iterator moveBefore at many call-sites (#123583)
As part of the "RemoveDIs" project, BasicBlock::iterator now carries a
debug-info bit that's needed when getFirstNonPHI and similar feed into
instruction insertion positions. Call-sites where that's necessary were
updated a year ago; but to ensure some type safety however, we'd like to
have all calls to moveBefore use iterators.

This patch adds a (guaranteed dereferenceable) iterator-taking
moveBefore, and changes a bunch of call-sites where it's obviously safe
to change to use it by just calling getIterator() on an instruction
pointer. A follow-up patch will contain less-obviously-safe changes.

We'll eventually deprecate and remove the instruction-pointer
insertBefore, but not before adding concise documentation of what
considerations are needed (very few).
2025-01-24 10:53:11 +00:00
Artem Pianykh
196f7c2a4f
[Utils] Identity map module-level debug info on first use in CloneFunction* (#118627)
Summary:
To avoid cloning module-level debug info (owned by the module rather
than the function), CloneFunction implementation used to eagerly
identity map such debug info into ValueMap's MD map. In larger modules
with meaningful volume of debug info this gets very expensive.

By passing such debug info metadata via an IdentityMD set for the
ValueMapper to map on first use, we get several benefits:

1. Mapping metadata is not cheap, particularly because of tracking. When
   cloning a Function we identity map lots of global module-level
   metadata to avoid cloning it, while only a fraction of it is actually
   used by the function. Mapping on first use is a lot faster for
   modules with meaningful amount of debug info.

2. Eagerly identity mapping metadata makes it harder to cache
   module-level data (e.g. a set of metadata nodes in a \a DICompileUnit).
   With this patch we can cache certain module-level metadata
   calculations to speed things up further.

Anecdata from compiling a sample cpp file with full debug info shows that this moderately speeds up
CoroSplitPass which is one of the heavier users of cloning:

|                 | Baseline | IdentityMD set |
|-----------------|----------|----------------|
| CoroSplitPass   | 306ms    | 221ms          |
| CoroCloner      | 101ms    | 72ms           |
| Speed up        | 1x       | 1.4x           |

Test Plan:
ninja check-llvm-unit
ninja check-llvm
2025-01-24 08:38:38 +00:00
David Green
775d0f36f7
[GVN] Handle scalable vectors with the same size in VNCoercion (#123984)
This allows us to forward to a load even if the types do not match
(nxv4i32 vs nxv2i64 for example). Scalable types are allowed in
canCoerceMustAliasedValueToLoad so long as the size (minelts *
scalarsize) is the same, and some follow-on code is adjusted to make
sure it handles scalable sizes correctly. Methods like
analyzeLoadFromClobberingWrite and analyzeLoadFromClobberingStore still
do nothing for scalable vectors, as Offsets and mismatching types are
not supported.
2025-01-23 18:43:50 +00:00
Joshua Cranmer
b0d35cf22e
[SSAUpdater] Avoid scanning basic blocks to find instruction order. (#123803)
This fixes a compile-time regression caused by #116645, where an entry
basic block with a very large number of allocas and other instructions
caused SROA to take ~100× its expected runtime, as every alloca (with ~2
uses) now calls this method to find the order of those few instructions,
rescanning the very large basic block every single time.

Since this code was originally written, Instructions now have ordering
numbers available to determine relative order without unnecessarily
scanning the basic block.
2025-01-22 11:11:41 -05:00
Mats Larsen
b95ed30ea2
[IR] Remove unused variables from #123617
Failed to notice them when landing that patch - apologies!
2025-01-21 01:47:38 +09:00
Mats Jun Larsen
416f1c465d
[IR] Replace of PointerType::get(Type) with opaque version (NFC) (#123617)
In accordance with https://github.com/llvm/llvm-project/issues/123569

In order to keep the patch at reasonable size, this PR only covers for
the llvm subproject, unittests excluded.
2025-01-21 00:32:56 +09:00
Nikita Popov
193ea83dd7 [SimplifyLibCalls] Don't infer call-site nocapture on atoi
This is already inferred on the function declaration by BLC, there
is no need to also do it at the call-site.
2025-01-14 17:13:45 +01:00
Pedro Lobo
7e19103895
[DebugInfo] Map VAM args to poison instead of undef [NFC] (#122756)
If an argument cannot be mapped in `Mapper::mapValue`, it can be mapped
to `poison` instead of `undef`.
2025-01-13 21:38:40 +00:00
Nikita Popov
22e9024c9f
[IR] Introduce captures attribute (#116990)
This introduces the `captures` attribute as described in:
https://discourse.llvm.org/t/rfc-improvements-to-capture-tracking/81420

This initial patch only introduces the IR/bitcode support for the
attribute and its in-memory representation as `CaptureInfo`. This will
be followed by a patch to upgrade and remove the `nocapture` attribute,
and then by actual inference/analysis support.

Based on the RFC feedback, I've used a syntax similar to the `memory`
attribute, though the only "location" that can be specified is `ret`.

I've added some pretty extensive documentation to LangRef on the
semantics. One non-obvious bit here is that using ptrtoint will not
result in a "return-only" capture, even if the ptrtoint result is only
used in the return value. Without this requirement we wouldn't be able
to continue ordinary capture analysis on the return value.
2025-01-13 14:40:25 +01:00
Nikita Popov
71f7b972c3
[Local] Make combineAAMetadata() more principled (#122091)
This moves combineAAMetadata() into Local and implements it via a new
AAOnly flag, which will intersect only AA metadata and keep other known
metadata.

The existing KnownIDs list is dropped, because it is redundant with the
switch in combineMetadata(), which already drops unknown metadata.

I tried a few variants of this, and ultimately went with the AAOnly flag
because this way we make an explicit choice for each metadata kind
supported by combineMetadata(), and ignoring the flag gives you
conservatively correct behavior.

I checked that the memcpy tests still pass if we adjust the logic for
MD_memprof/MD_callsite to drop the metadata instead of arbitrarily
picking one.

Fixes https://github.com/llvm/llvm-project/issues/121495.
2025-01-09 09:34:46 +01:00
Akshat Oke
f6c76d5180
[PM] Remove is_analysis label for LoopSimplify (#121433)
This reverts part of the changes in #118779
2025-01-09 10:11:14 +05:30
Ryan Mansfield
67efbd0bf1
[LLVM] Fix various cl::desc typos and whitespace issues (NFC) (#121955) 2025-01-08 11:07:23 +01:00
Mircea Trofin
4312075efa
[nfc][thinlto] remove unnecessary return from renameModuleForThinLTO (#121851)
Same goes for `FunctionImportGlobalProcessing::run`.

The return value was used, but it was always `false`.
2025-01-06 15:19:09 -08:00
Yingwei Zheng
a77346bad0
[IRBuilder] Refactor FMF interface (#121657)
Up to now, the only way to set specified FMF flags in IRBuilder is to
use `FastMathFlagGuard`. It makes the code ugly and hard to maintain.

This patch introduces a helper class `FMFSource` to replace the original
parameter `Instruction *FMFSource` in IRBuilder. To maximize the
compatibility, it accepts an instruction or a specified FMF.
This patch also removes the use of `FastMathFlagGuard` in some simple
cases.

Compile-time impact:
https://llvm-compile-time-tracker.com/compare.php?from=f87a9db8322643ccbc324e317a75b55903129b55&to=9397e712f6010be15ccf62f12740e9b4a67de2f4&stat=instructions%3Au
2025-01-06 14:37:04 +08:00
Fangrui Song
e6f76378c2
EntryExitInstrumenter: skip available_externally linkage
gnu::always_inline functions, which lower to available_externally, may
not have definitions external to the module. -finstrument-function
family options instrumentating the function (which takes the function
address) may lead to a linker error if the function is not optimized
out, e.g.

```
// -std=c++17 or above with libstdc++
 #include <string>
std::string str;
int main() {}
```

Simplified reproduce:
```
template <typename T>
struct A {
  [[gnu::always_inline]] T bar(T a) { return a * 2; }
};
extern template class A<int>;
int main(int argc, char **argv) {
  return A<int>().bar(argc);
}
```

GCC's -finstrument-function instrumentation skips such functions
(https://gcc.gnu.org/PR78333). Let's skip such functions
(available_externally) as well.

Fix #50742

Pull Request: https://github.com/llvm/llvm-project/pull/121452
2025-01-03 09:25:08 -08:00
Teresa Johnson
3a423a10ff
[MemProf][PGO] Prevent dropping of profile metadata during optimization (#121359)
This patch fixes a couple of places where memprof-related metadata
(!memprof and !callsite) were being dropped, and one place where PGO
metadata (!prof) was being dropped.

All were due to instances of combineMetadata() being invoked. That
function drops all metadata not in the list provided by the client, and
also drops any not in its switch statement.

Memprof metadata needed a case in the combineMetadata switch statement.
For now we simply keep the metadata of the instruction being kept, which
doesn't retain all the profile information when two calls with
memprof metadata are being combined, but at least retains some.

For the memprof metadata being dropped during call CSE, add memprof and
callsite metadata to the list of known ids in combineMetadataForCSE.

Neither memprof nor regular prof metadata were in the list of known ids
for the callsite in MemCpyOptimizer, which was added to combine AA
metadata after optimization of byval arguments fed by memcpy
instructions, and similar types of optimizations of memcpy uses.

There is one other callsite of combineMetadata, but it is only invoked
on load instructions, which do not carry these types of metadata.
2025-01-02 12:11:59 -08:00
Yingwei Zheng
eafbab6fac
[EntryExitInstrumenter][AArch64][RISCV][LoongArch] Pass __builtin_return_address(0) into _mcount (#121107)
On RISC-V, AArch64, and LoongArch, the `_mcount` function takes
`__builtin_return_address(0)` as an argument since
`__builtin_return_address(1)` is not available on these platforms. This
patch fixes the argument passing to match the behavior of glibc/gcc.

Closes https://github.com/llvm/llvm-project/issues/121103.
2025-01-01 15:02:08 +08:00
DaPorkchop_
cea738bc9a
[SimplifyCFG] Replace unreachable switch lookup table holes with poison (#94990)
As discussed in #94468, this causes switch lookup table entries which
are unreachable to be poison instead of filling them with a value from
one of the reachable cases.

---------

Co-authored-by: DianQK <dianqk@dianqk.net>
2024-12-26 07:47:26 +08:00
Owen Anderson
bc8fa9c443
Revert "SimplifyLibCalls: Use default globals address space when building new global strings. (#118729)" (#119616)
This reverts commit cfa582e8aaa791b52110791f5e6504121aaf62bf.
2024-12-21 09:33:39 +13:00
Dominik Steenken
fa9cef50b1
Only guard loop metadata that has non-debug info in it (#118825)
This PR is motivated by a mismatch we discovered between compilation
results with vs. without `-g3`. We noticed this when compiling SPEC2017
testcases. The specific instance we saw is fixed in this PR by modifying
a guard (see below), but it is likely similar instances exist elsewhere
in the codebase.

The specific case fixed in this PR manifests itself in the `SimplifyCFG`
pass doing different things depending on whether DebugInfo is generated
or not. At the end of this comment, there is reduced example code that
shows the behavior in question.

The differing behavior has two root causes:
1. Commit https://github.com/llvm/llvm-project/commit/c07e19b adds loop
metadata including debug locations to loops that otherwise would not
have loop metadata
2. Commit https://github.com/llvm/llvm-project/commit/ac28efa6c100 adds
a guard to a simplification action in `SImplifyCFG` that prevents it
from simplifying away loop metadata

So, the change in 2. does not consider that when compiling with debug
symbols, loops that otherwise would not have metadata that needs
preserving, now have debug locations in their loop metadata. Thus, with
`-g3`, `SimplifyCFG` behaves differently than without it.

The larger issue is that while debug info is not supposed to influence
the final compilation result, commits like 1. blur the line between what
is and is not debug info, and not all optimization passes account for
this.

This PR does not address that and rather just modifies this particular
guard in order to restore equivalent behavior between debug and
non-debug builds in this one instance.

---

Here is a reduced version of a file from `f526.blender_r` that showcases
the behavior in question:
```C
struct LinkNode;
typedef struct LinkNode {
 struct LinkNode *next;
 void *link;
} LinkNode;

void do_projectpaint_thread_ph_v_state() {
  int *ps = do_projectpaint_thread_ph_v_state;
  LinkNode *node;
  while (do_projectpaint_thread_ph_v_state)
    for (node = ps; node; node = node->next)
      ;
}
```
Compiling this with and without DebugInfo, and then disassembling the
results, leads to different outcomes (tested on SystemZ and X86). The
reason for this is that the `SimplifyCFG` pass does different things in
either case.
2024-12-20 15:15:51 +01:00
Abhay Kanhere
cc246d4a29
[Transforms][CodeExtraction] bug fix regions with stackrestore (#118564)
Ensure code extraction for outlining to a function does not create a function with stacksave of caller to restore stack (e.g. tail call).
2024-12-19 09:19:11 -07:00
Florian Hahn
a487b792e2
[TySan] Add initial Type Sanitizer (LLVM) (#76259)
This patch introduces the LLVM components of a type sanitizer: a
sanitizer for type-based aliasing violations.

It is based on Hal Finkel's https://reviews.llvm.org/D32198.

C/C++ have type-based aliasing rules, and LLVM's optimizer can exploit
these given TBAA metadata added by Clang. Roughly, a pointer of given
type cannot be used to access an object of a different type (with, of
course, certain exceptions). Unfortunately, there's a lot of code in the
wild that violates these rules (e.g. for type punning), and such code
often must be built with -fno-strict-aliasing. Performance is often
sacrificed as a result. Part of the problem is the difficulty of finding
TBAA violations. Hopefully, this sanitizer will help.

For each TBAA type-access descriptor, encoded in LLVM's IR using
metadata, the corresponding instrumentation pass generates descriptor
tables. Thus, for each type (and access descriptor), we have a unique
pointer representation. Excepting anonymous-namespace types, these
tables are comdat, so the pointer values should be unique across the
program. The descriptors refer to other descriptors to form a type
aliasing tree (just like LLVM's TBAA metadata does). The instrumentation
handles the "fast path" (where the types match exactly and no
partial-overlaps are detected), and defers to the runtime to handle all
of the more-complicated cases. The runtime, of course, is also
responsible for reporting errors when those are detected.

The runtime uses essentially the same shadow memory region as tsan, and
we use 8 bytes of shadow memory, the size of the pointer to the type
descriptor, for every byte of accessed data in the program. The value 0
is used to represent an unknown type. The value -1 is used to represent
an interior byte (a byte that is part of a type, but not the first
byte). The instrumentation first checks for an exact match between the
type of the current access and the type for that address recorded in the
shadow memory. If it matches, it then checks the shadow for the
remainder of the bytes in the type to make sure that they're all -1. If
not, we call the runtime. If the exact match fails, we next check if the
value is 0 (i.e. unknown). If it is, then we check the shadow for the
remainder of the byes in the type (to make sure they're all 0). If
they're not, we call the runtime. We then set the shadow for the access
address and set the shadow for the remaining bytes in the type to -1
(i.e. marking them as interior bytes). If the type indicated by the
shadow memory for the access address is neither an exact match nor 0, we
call the runtime.

The instrumentation pass inserts calls to the memset intrinsic to set
the memory updated by memset, memcpy, and memmove, as well as
allocas/byval (and for lifetime.start/end) to reset the shadow memory to
reflect that the type is now unknown. The runtime intercepts memset,
memcpy, etc. to perform the same function for the library calls.

The runtime essentially repeats these checks, but uses the full TBAA
algorithm, just as the compiler does, to determine when two types are
permitted to alias. In a situation where access overlap has occurred and
aliasing is not permitted, an error is generated.

Clang's TBAA representation currently has a problem representing unions,
as demonstrated by the one XFAIL'd test in the runtime patch. We'll
update the TBAA representation to fix this, and at the same time, update
the sanitizer.

When the sanitizer is active, we disable actually using the TBAA
metadata for AA. This way we're less likely to use TBAA to remove memory
accesses that we'd like to verify.

As a note, this implementation does not use the compressed shadow-memory
scheme discussed previously
(http://lists.llvm.org/pipermail/llvm-dev/2017-April/111766.html). That
scheme would not handle the struct-path (i.e. structure offset)
information that our TBAA represents. I expect we'll want to further
work on compressing the shadow-memory representation, but I think it
makes sense to do that as follow-up work.

It goes together with the corresponding clang changes
(https://github.com/llvm/llvm-project/pull/76260) and compiler-rt
changes (https://github.com/llvm/llvm-project/pull/76261)

PR: https://github.com/llvm/llvm-project/pull/76259
2024-12-17 13:57:34 +00:00
Artem Pianykh
fbdbb13d5b
[NFC][Utils] Eliminate DISubprogram set from BuildDebugInfoMDMap (#118625)
Summary:
Previously, we'd add all SPs distinct from the cloned one into a set.
Then when cloning a local scope we'd check if it's from one of those
'distinct' SPs by checking if it's in the set. We don't need to do that.
We can just check against the cloned SP directly and drop the set.

Test Plan:
ninja check-llvm-unit check-llvm
2024-12-17 08:57:59 +00:00
Artem Pianykh
8402a0fab0
[NFC][Utils] Extract CloneFunctionBodyInto from CloneFunctionInto (#118624)
Summary:
This and previously extracted `CloneFunction*Into` functions will be used in later diffs.

Test Plan:
ninja check-llvm-unit check-llvm
2024-12-16 22:30:56 +00:00
Artem Pianykh
a9237b1a10
[NFC][Utils] Extract CloneFunctionMetadataInto from CloneFunctionInto (#118623)
Summary:
The new API expects the caller to populate the VMap. We need it this way
for a subsequent change around coroutine cloning.

Test Plan:
ninja check-llvm-unit check-llvm
2024-12-16 20:50:05 +00:00
Vedant Paranjape
b21fa18b44
[LoopVersioning] Add a check to see if the input loop is in LCSSA form (#116443)
Loop Optimizations expect the input loop to be in LCSSA form. But it
seems that LoopVersioning doesn't have any check to see if the loop is
actually in LCSSA form. As a result, if we give it a loop which is not
in LCSSA form but still correct semantically, the resulting
transformation fails to pass through verifier pass with the following
error.

Instruction does not dominate all uses!
%inc = add nsw i16 undef, 1
store i16 %inc, ptr @c, align 1

As the loop is not in LCSSA form, LoopVersioning's transformations leads
to invalid IR! As some instructions do not dominate all their uses.

This patch checks if a loop is in LCSSA form, if not it will call
formLCSSARecursively on the loop before passing it to LoopVersioning.

Fixes: #36998
2024-12-16 11:55:19 -05:00
David Green
0032c151dc [SROA] Optimize reloaded values in allocas that escape into readonly nocapture calls. (#116645)
Given an alloca that potentially has many uses in big complex code and
escapes into a call that is readonly+nocapture, we cannot easily split
up the alloca. There are several optimizations that will attempt to take
a value that is stored and a reload, and replace the load with the
original stored value. Instcombine has some simple heuristics, GVN can
sometimes do it, as can CSE in limited situations. They all suffer from
the same issue with complex code - they start from a load/store and need
to prove no-alias for all code between, which in complex cases might be
a lot to look through. Especially if the ptr is an alloca with many uses
that is over the normal escape capture limits.

The pass that does do well with allocas is SROA, as it has a complete
view of all of the uses. This patch adds a case to SROA where it can
detect allocas that are passed into calls that are no-capture readonly.
It can then optimize the reloaded values inside the alloca slice with
the stored value knowing that it is valid no matter the location of the
loads/stores from the no-escaping nature of the alloca.
2024-12-14 18:07:21 +00:00
Florian Hahn
c4a78b6fe3
[SimplifyCFG] Always allow hoisting if all instructions match. (#97158)
Generalize hoistCommonCodeFromSuccessors's `EqTermsOnly` to
`AllInstsEqOnly` and always allow hoisting if all instructions match.

In that case, all instructions can be hoisted and the
original branch will be replaced and selects for PHIs are added. This
allows preserving metadata in more cases, using the existing hoisting
logic, whereas previously FoldTwoEntryPHINode would drop the metadata.


https://llvm-compile-time-tracker.com/compare.php?from=716360367fbdabac2c374c19b8746f4de49a5599&to=986b2c47df516b31d998c055400e4f62aa76edc6&stat=instructions:u

PR: https://github.com/llvm/llvm-project/pull/97158
2024-12-13 21:26:27 +00:00
Ramkumar Ramachandra
4a0d53a0b0
PatternMatch: migrate to CmpPredicate (#118534)
With the introduction of CmpPredicate in 51a895a (IR: introduce struct
with CmpInst::Predicate and samesign), PatternMatch is one of the first
key pieces of infrastructure that must be updated to match a CmpInst
respecting samesign information. Implement this change to Cmp-matchers.

This is a preparatory step in migrating the codebase over to
CmpPredicate. Since we no functional changes are desired at this stage,
we have chosen not to migrate CmpPredicate::operator==(CmpPredicate)
calls to use CmpPredicate::getMatching(), as that would have visible
impact on tests that are not yet written: instead, we call
CmpPredicate::operator==(Predicate), preserving the old behavior, while
also inserting a few FIXME comments for follow-ups.
2024-12-13 14:18:33 +00:00
Antonio Frighetto
d26df32255 [SimplifyCFG] Consider preds to switch in simplifyDuplicateSwitchArms
Allow a duplicate basic block with multiple predecessors to the
jump table to be simplified, by considering that the same basic
block may appear in more switch cases.
2024-12-13 09:07:24 +01:00
Kirill Stoimenov
e3676aa21f Revert "[SROA] Optimize reloaded values in allocas that escape into readonly nocapture calls. (#116645)"
Causing buffer overflow:

SUMMARY: AddressSanitizer: heap-buffer-overflow llvm/lib/Transforms/Scalar/SROA.cpp:5552:35

This reverts commit 5e247d726d7a54cf0acc997bc17b50e7494e6fa3.
2024-12-12 21:32:35 +00:00
David Green
5e247d726d
[SROA] Optimize reloaded values in allocas that escape into readonly nocapture calls. (#116645)
Given an alloca that potentially has many uses in big complex code and
escapes into a call that is readonly+nocapture, we cannot easily split
up the alloca. There are several optimizations that will attempt to take
a value that is stored and a reload, and replace the load with the
original stored value. Instcombine has some simple heuristics, GVN can
sometimes do it, as can CSE in limited situations. They all suffer from
the same issue with complex code - they start from a load/store and need
to prove no-alias for all code between, which in complex cases might be
a lot to look through. Especially if the ptr is an alloca with many uses
that is over the normal escape capture limits.

The pass that does do well with allocas is SROA, as it has a complete
view of all of the uses. This patch adds a case to SROA where it can
detect allocas that are passed into calls that are no-capture readonly.
It can then optimize the reloaded values inside the alloca slice with
the stored value knowing that it is valid no matter the location of the
loads/stores from the no-escaping nature of the alloca.
2024-12-12 10:27:27 +00:00
Nikita Popov
5013c81b78
[GlobalOpt][Evaluator] Don't evaluate calls with signature mismatch (#119548)
The global ctor evaluator tries to evalute function calls where the call
function type and function type do not match, by performing bitcasts.
This currently causes a crash when calling a void function with non-void
return type.

I've opted to remove this functionality entirely rather than fixing this
specific case. With opaque pointers, there shouldn't be a legitimate use
case for this anymore, as we don't need to look through pointer type
casts. Doing other bitcasts is very iffy because it ignores ABI
considerations. We should at least leave adjusting the signatures to
make them line up to InstCombine (which also does some iffy things, but
is at least somewhat more constrained).

Fixes https://github.com/llvm/llvm-project/issues/118725.
2024-12-12 10:44:52 +01:00
Mel Chen
b3cba9be41
[LoopVectorize] Vectorize select-cmp reduction pattern for increasing integer induction variable (#67812)
Consider the following loop:
```
  int rdx = init;
  for (int i = 0; i < n; ++i)
    rdx = (a[i] > b[i]) ? i : rdx;
```
We can vectorize this loop if `i` is an increasing induction variable.
The final reduced value will be the maximum of `i` that the condition
`a[i] > b[i]` is satisfied, or the start value `init`.

This patch added new RecurKind enums - IFindLastIV and FFindLastIV.

---------

Co-authored-by: Alexey Bataev <5361294+alexey-bataev@users.noreply.github.com>
2024-12-12 16:48:31 +08:00