7663 Commits

Author SHA1 Message Date
Nikita Popov
71f7b972c3
[Local] Make combineAAMetadata() more principled (#122091)
This moves combineAAMetadata() into Local and implements it via a new
AAOnly flag, which will intersect only AA metadata and keep other known
metadata.

The existing KnownIDs list is dropped, because it is redundant with the
switch in combineMetadata(), which already drops unknown metadata.

I tried a few variants of this, and ultimately went with the AAOnly flag
because this way we make an explicit choice for each metadata kind
supported by combineMetadata(), and ignoring the flag gives you
conservatively correct behavior.

I checked that the memcpy tests still pass if we adjust the logic for
MD_memprof/MD_callsite to drop the metadata instead of arbitrarily
picking one.

Fixes https://github.com/llvm/llvm-project/issues/121495.
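
As a rough illustration of the AAOnly design described above, here is a minimal standalone sketch. It is not the LLVM implementation: `Kind`, `MetadataMap`, and `combine` are hypothetical stand-ins, and both "intersect" and "merge" are simplified to "keep the value only if both sides agree".

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical metadata kinds; the real switch covers many more.
enum class Kind { TBAA, AliasScope, Range, Unknown };
using MetadataMap = std::map<Kind, std::string>;

// Combine J's metadata into K's. "Intersect" and "merge" are both modelled
// as "keep the value only if J carries the same one" (an assumption).
MetadataMap combine(const MetadataMap &K, const MetadataMap &J, bool AAOnly) {
  MetadataMap Result;
  for (const auto &Entry : K) {
    auto JIt = J.find(Entry.first);
    bool SameInJ = JIt != J.end() && JIt->second == Entry.second;
    switch (Entry.first) {
    case Kind::TBAA:
    case Kind::AliasScope:
      // AA metadata is intersected regardless of the flag.
      if (SameInJ)
        Result.insert(Entry);
      break;
    case Kind::Range:
      // Known non-AA metadata: kept untouched under AAOnly, merged otherwise.
      if (AAOnly || SameInJ)
        Result.insert(Entry);
      break;
    case Kind::Unknown:
      // Unknown kinds are dropped in both modes.
      break;
    }
  }
  return Result;
}

// Usage: only the AAOnly call keeps K's range metadata.
void combineExample() {
  MetadataMap K = {{Kind::TBAA, "int"}, {Kind::Range, "[0, 10)"}};
  MetadataMap J = {{Kind::TBAA, "int"}};
  assert(combine(K, J, /*AAOnly=*/true).count(Kind::Range) == 1);
  assert(combine(K, J, /*AAOnly=*/false).count(Kind::Range) == 0);
}
```

In this model a case that forgets to look at the flag falls back to the merging path, which is the conservative direction.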
2025-01-09 09:34:46 +01:00
Akshat Oke
f6c76d5180
[PM] Remove is_analysis label for LoopSimplify (#121433)
This reverts part of the changes in #118779
2025-01-09 10:11:14 +05:30
Ryan Mansfield
67efbd0bf1
[LLVM] Fix various cl::desc typos and whitespace issues (NFC) (#121955)
2025-01-08 11:07:23 +01:00
Mircea Trofin
4312075efa
[nfc][thinlto] remove unnecessary return from renameModuleForThinLTO (#121851)
Same goes for `FunctionImportGlobalProcessing::run`.

The return value was used, but it was always `false`.
2025-01-06 15:19:09 -08:00
Yingwei Zheng
a77346bad0
[IRBuilder] Refactor FMF interface (#121657)
Up to now, the only way to set specified FMF flags in IRBuilder is to
use `FastMathFlagGuard`. It makes the code ugly and hard to maintain.

This patch introduces a helper class `FMFSource` to replace the original
parameter `Instruction *FMFSource` in IRBuilder. To maximize the
compatibility, it accepts an instruction or a specified FMF.
This patch also removes the use of `FastMathFlagGuard` in some simple
cases.
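
To make the "instruction or specified FMF" idea concrete, here is a simplified standalone sketch. The `FastMathFlags` and `Instruction` structs are toy stand-ins for the LLVM classes, and `flagsForNewInst` is a hypothetical builder-style helper; the real `FMFSource` may differ in detail.

```cpp
#include <cassert>

// Toy stand-ins for llvm::FastMathFlags and llvm::Instruction.
struct FastMathFlags {
  bool NoNaNs = false;
  bool NoInfs = false;
};
struct Instruction {
  FastMathFlags FMF;
};

// Accepts either an instruction (whose flags are copied) or explicit flags.
class FMFSource {
  bool HasFlags = false;
  FastMathFlags Flags;

public:
  FMFSource() = default;
  FMFSource(const Instruction *Source) {
    if (Source) {
      HasFlags = true;
      Flags = Source->FMF;
    }
  }
  FMFSource(FastMathFlags FMF) : HasFlags(true), Flags(FMF) {}

  // Fall back to the builder's default flags when no source was given.
  FastMathFlags get(FastMathFlags Default) const {
    return HasFlags ? Flags : Default;
  }
};

// Hypothetical builder-style helper: the caller passes flags directly
// instead of wrapping the call in a guard object.
FastMathFlags flagsForNewInst(FMFSource Source, FastMathFlags BuilderDefault) {
  return Source.get(BuilderDefault);
}

// Usage: all three ways of supplying flags go through the same parameter.
void fmfSourceExample() {
  FastMathFlags BuilderDefault;
  Instruction I;
  I.FMF.NoNaNs = true;
  FastMathFlags Explicit;
  Explicit.NoInfs = true;
  assert(flagsForNewInst(&I, BuilderDefault).NoNaNs);
  assert(flagsForNewInst(Explicit, BuilderDefault).NoInfs);
  assert(!flagsForNewInst(FMFSource(), BuilderDefault).NoNaNs);
}
```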

Compile-time impact:
https://llvm-compile-time-tracker.com/compare.php?from=f87a9db8322643ccbc324e317a75b55903129b55&to=9397e712f6010be15ccf62f12740e9b4a67de2f4&stat=instructions%3Au
2025-01-06 14:37:04 +08:00
Fangrui Song
e6f76378c2
EntryExitInstrumenter: skip available_externally linkage
gnu::always_inline functions, which lower to available_externally, may
not have definitions external to the module. -finstrument-function
family options instrumenting the function (which takes the function
address) may lead to a linker error if the function is not optimized
out, e.g.

```
// -std=c++17 or above with libstdc++
#include <string>
std::string str;
int main() {}
```

Simplified reproducer:
```
template <typename T>
struct A {
  [[gnu::always_inline]] T bar(T a) { return a * 2; }
};
extern template class A<int>;
int main(int argc, char **argv) {
  return A<int>().bar(argc);
}
```

GCC's -finstrument-function instrumentation skips such functions
(https://gcc.gnu.org/PR78333). Let's skip such functions
(available_externally) as well.

Fix #50742

Pull Request: https://github.com/llvm/llvm-project/pull/121452
2025-01-03 09:25:08 -08:00
Teresa Johnson
3a423a10ff
[MemProf][PGO] Prevent dropping of profile metadata during optimization (#121359)
This patch fixes a couple of places where memprof-related metadata
(!memprof and !callsite) were being dropped, and one place where PGO
metadata (!prof) was being dropped.

All were due to instances of combineMetadata() being invoked. That
function drops all metadata not in the list provided by the client, and
also drops any not in its switch statement.

Memprof metadata needed a case in the combineMetadata switch statement.
For now we simply keep the metadata of the instruction being kept, which
doesn't retain all the profile information when two calls with
memprof metadata are being combined, but at least retains some.

For the memprof metadata being dropped during call CSE, add memprof and
callsite metadata to the list of known ids in combineMetadataForCSE.

Neither memprof nor regular prof metadata were in the list of known ids
for the callsite in MemCpyOptimizer, which was added to combine AA
metadata after optimization of byval arguments fed by memcpy
instructions, and similar types of optimizations of memcpy uses.

There is one other callsite of combineMetadata, but it is only invoked
on load instructions, which do not carry these types of metadata.
2025-01-02 12:11:59 -08:00
Yingwei Zheng
eafbab6fac
[EntryExitInstrumenter][AArch64][RISCV][LoongArch] Pass __builtin_return_address(0) into _mcount (#121107)
On RISC-V, AArch64, and LoongArch, the `_mcount` function takes
`__builtin_return_address(0)` as an argument since
`__builtin_return_address(1)` is not available on these platforms. This
patch fixes the argument passing to match the behavior of glibc/gcc.

Closes https://github.com/llvm/llvm-project/issues/121103.
2025-01-01 15:02:08 +08:00
DaPorkchop_
cea738bc9a
[SimplifyCFG] Replace unreachable switch lookup table holes with poison (#94990)
As discussed in #94468, this causes switch lookup table entries which
are unreachable to be poison instead of filling them with a value from
one of the reachable cases.
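
A minimal sketch of the idea, assuming the default destination is unreachable so the holes can never be read. `TableEntry` and `buildLookupTable` are hypothetical names, and poison is modelled as a flag rather than an IR constant.

```cpp
#include <cassert>
#include <map>
#include <vector>

struct TableEntry {
  bool IsPoison; // true for holes that no reachable case maps to
  int Value;     // meaningful only when IsPoison is false
};

// Build a dense table covering [smallest case, largest case] from the
// sparse case-value -> result mapping of a switch.
std::vector<TableEntry> buildLookupTable(const std::map<int, int> &Cases) {
  assert(!Cases.empty() && "need at least one reachable case");
  int Min = Cases.begin()->first;
  int Max = Cases.rbegin()->first;
  // Every slot starts out as a poison hole.
  std::vector<TableEntry> Table(Max - Min + 1, TableEntry{true, 0});
  // Reachable cases overwrite their slot with the real result.
  for (const auto &Case : Cases)
    Table[Case.first - Min] = TableEntry{false, Case.second};
  return Table;
}
```

For cases 0, 1 and 3, slot 2 stays marked as poison instead of borrowing the value of a neighbouring case.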

---------

Co-authored-by: DianQK <dianqk@dianqk.net>
2024-12-26 07:47:26 +08:00
Owen Anderson
bc8fa9c443
Revert "SimplifyLibCalls: Use default globals address space when building new global strings. (#118729)" (#119616)
This reverts commit cfa582e8aaa791b52110791f5e6504121aaf62bf.
2024-12-21 09:33:39 +13:00
Dominik Steenken
fa9cef50b1
Only guard loop metadata that has non-debug info in it (#118825)
This PR is motivated by a mismatch we discovered between compilation
results with vs. without `-g3`. We noticed this when compiling SPEC2017
testcases. The specific instance we saw is fixed in this PR by modifying
a guard (see below), but it is likely that similar instances exist elsewhere
in the codebase.

The specific case fixed in this PR manifests itself in the `SimplifyCFG`
pass doing different things depending on whether DebugInfo is generated
or not. At the end of this comment, there is reduced example code that
shows the behavior in question.

The differing behavior has two root causes:
1. Commit https://github.com/llvm/llvm-project/commit/c07e19b adds loop
metadata including debug locations to loops that otherwise would not
have loop metadata
2. Commit https://github.com/llvm/llvm-project/commit/ac28efa6c100 adds
a guard to a simplification action in `SimplifyCFG` that prevents it
from simplifying away loop metadata

So, the change in 2. does not consider that when compiling with debug
symbols, loops that otherwise would not have metadata that needs
preserving, now have debug locations in their loop metadata. Thus, with
`-g3`, `SimplifyCFG` behaves differently than without it.

The larger issue is that while debug info is not supposed to influence
the final compilation result, commits like 1. blur the line between what
is and is not debug info, and not all optimization passes account for
this.

This PR does not address that and rather just modifies this particular
guard in order to restore equivalent behavior between debug and
non-debug builds in this one instance.
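
The difference between the two guards can be modelled in a few lines. This is a standalone sketch, not the code in `SimplifyCFG`: `LoopMDOperand`, `oldGuardBlocks`, and `newGuardBlocks` are hypothetical names, and loop metadata is reduced to a flat list of operands.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// One operand of a loop metadata node: either a debug location or some
// other property such as an unroll hint.
struct LoopMDOperand {
  bool IsDebugLocation;
  std::string Name;
};
using LoopMD = std::vector<LoopMDOperand>;

// Old behaviour: any loop metadata at all blocks the simplification.
bool oldGuardBlocks(const LoopMD &Ops) { return !Ops.empty(); }

// New behaviour: only metadata with non-debug content blocks it.
bool newGuardBlocks(const LoopMD &Ops) {
  return std::any_of(Ops.begin(), Ops.end(), [](const LoopMDOperand &Op) {
    return !Op.IsDebugLocation;
  });
}

// Usage: a loop that only carries debug locations no longer trips the guard,
// so builds with and without debug info take the same path.
void guardExample() {
  LoopMD OnlyDebug;
  OnlyDebug.push_back(LoopMDOperand{true, "DILocation"});
  assert(oldGuardBlocks(OnlyDebug));
  assert(!newGuardBlocks(OnlyDebug));
}
```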

---

Here is a reduced version of a file from `f526.blender_r` that showcases
the behavior in question:
```C
struct LinkNode;
typedef struct LinkNode {
 struct LinkNode *next;
 void *link;
} LinkNode;

void do_projectpaint_thread_ph_v_state() {
  int *ps = do_projectpaint_thread_ph_v_state;
  LinkNode *node;
  while (do_projectpaint_thread_ph_v_state)
    for (node = ps; node; node = node->next)
      ;
}
```
Compiling this with and without DebugInfo, and then disassembling the
results, leads to different outcomes (tested on SystemZ and X86). The
reason for this is that the `SimplifyCFG` pass does different things in
either case.
2024-12-20 15:15:51 +01:00
Abhay Kanhere
cc246d4a29
[Transforms][CodeExtraction] bug fix regions with stackrestore (#118564)
Ensure code extraction for outlining to a function does not create a function with stacksave of caller to restore stack (e.g. tail call).
2024-12-19 09:19:11 -07:00
Florian Hahn
a487b792e2
[TySan] Add initial Type Sanitizer (LLVM) (#76259)
This patch introduces the LLVM components of a type sanitizer: a
sanitizer for type-based aliasing violations.

It is based on Hal Finkel's https://reviews.llvm.org/D32198.

C/C++ have type-based aliasing rules, and LLVM's optimizer can exploit
these given TBAA metadata added by Clang. Roughly, a pointer of given
type cannot be used to access an object of a different type (with, of
course, certain exceptions). Unfortunately, there's a lot of code in the
wild that violates these rules (e.g. for type punning), and such code
often must be built with -fno-strict-aliasing. Performance is often
sacrificed as a result. Part of the problem is the difficulty of finding
TBAA violations. Hopefully, this sanitizer will help.

For each TBAA type-access descriptor, encoded in LLVM's IR using
metadata, the corresponding instrumentation pass generates descriptor
tables. Thus, for each type (and access descriptor), we have a unique
pointer representation. Excepting anonymous-namespace types, these
tables are comdat, so the pointer values should be unique across the
program. The descriptors refer to other descriptors to form a type
aliasing tree (just like LLVM's TBAA metadata does). The instrumentation
handles the "fast path" (where the types match exactly and no
partial-overlaps are detected), and defers to the runtime to handle all
of the more-complicated cases. The runtime, of course, is also
responsible for reporting errors when those are detected.

The runtime uses essentially the same shadow memory region as tsan, and
we use 8 bytes of shadow memory, the size of the pointer to the type
descriptor, for every byte of accessed data in the program. The value 0
is used to represent an unknown type. The value -1 is used to represent
an interior byte (a byte that is part of a type, but not the first
byte). The instrumentation first checks for an exact match between the
type of the current access and the type for that address recorded in the
shadow memory. If it matches, it then checks the shadow for the
remainder of the bytes in the type to make sure that they're all -1. If
not, we call the runtime. If the exact match fails, we next check if the
value is 0 (i.e. unknown). If it is, then we check the shadow for the
remainder of the bytes in the type (to make sure they're all 0). If
they're not, we call the runtime. We then set the shadow for the access
address and set the shadow for the remaining bytes in the type to -1
(i.e. marking them as interior bytes). If the type indicated by the
shadow memory for the access address is neither an exact match nor 0, we
call the runtime.
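
The fast-path check described above can be sketched on a simulated shadow array. This is only a model under stated assumptions: `Descriptor`, `CheckResult`, and `checkAccess` are hypothetical names, addresses are plain indices, and the model leaves the shadow untouched whenever it defers to the runtime.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Descriptor = std::intptr_t;
const Descriptor Unknown = 0;   // no type recorded for this byte yet
const Descriptor Interior = -1; // byte belongs to a type, but is not its first

enum class CheckResult { FastPathOk, CallRuntime };

// Check an access of Size bytes at Addr with type descriptor Type against
// the shadow, updating the shadow when the accessed bytes were unknown.
CheckResult checkAccess(std::vector<Descriptor> &Shadow, std::size_t Addr,
                        std::size_t Size, Descriptor Type) {
  assert(Size > 0 && Addr + Size <= Shadow.size() && "access out of range");
  assert(Type != Unknown && Type != Interior && "reserved descriptor value");
  Descriptor First = Shadow[Addr];
  if (First == Type) {
    // Exact match: the remaining bytes must all be interior bytes.
    for (std::size_t I = 1; I < Size; ++I)
      if (Shadow[Addr + I] != Interior)
        return CheckResult::CallRuntime;
    return CheckResult::FastPathOk;
  }
  if (First == Unknown) {
    // Unknown: the remaining bytes must be unknown as well.
    for (std::size_t I = 1; I < Size; ++I)
      if (Shadow[Addr + I] != Unknown)
        return CheckResult::CallRuntime;
    // Record the type for the first byte and mark the rest as interior.
    Shadow[Addr] = Type;
    for (std::size_t I = 1; I < Size; ++I)
      Shadow[Addr + I] = Interior;
    return CheckResult::FastPathOk;
  }
  // Neither an exact match nor unknown: defer to the runtime.
  return CheckResult::CallRuntime;
}
```

In this model, a first 4-byte access with descriptor 100 at offset 0 records the type; repeating it takes the exact-match path, while an access with a different descriptor or one starting at an interior byte is deferred to the runtime.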

The instrumentation pass inserts calls to the memset intrinsic to set
the memory updated by memset, memcpy, and memmove, as well as
allocas/byval (and for lifetime.start/end) to reset the shadow memory to
reflect that the type is now unknown. The runtime intercepts memset,
memcpy, etc. to perform the same function for the library calls.

The runtime essentially repeats these checks, but uses the full TBAA
algorithm, just as the compiler does, to determine when two types are
permitted to alias. In a situation where access overlap has occurred and
aliasing is not permitted, an error is generated.

Clang's TBAA representation currently has a problem representing unions,
as demonstrated by the one XFAIL'd test in the runtime patch. We'll
update the TBAA representation to fix this, and at the same time, update
the sanitizer.

When the sanitizer is active, we disable actually using the TBAA
metadata for AA. This way we're less likely to use TBAA to remove memory
accesses that we'd like to verify.

As a note, this implementation does not use the compressed shadow-memory
scheme discussed previously
(http://lists.llvm.org/pipermail/llvm-dev/2017-April/111766.html). That
scheme would not handle the struct-path (i.e. structure offset)
information that our TBAA represents. I expect we'll want to further
work on compressing the shadow-memory representation, but I think it
makes sense to do that as follow-up work.

It goes together with the corresponding clang changes
(https://github.com/llvm/llvm-project/pull/76260) and compiler-rt
changes (https://github.com/llvm/llvm-project/pull/76261)

PR: https://github.com/llvm/llvm-project/pull/76259
2024-12-17 13:57:34 +00:00
Artem Pianykh
fbdbb13d5b
[NFC][Utils] Eliminate DISubprogram set from BuildDebugInfoMDMap (#118625)
Summary:
Previously, we'd add all SPs distinct from the cloned one into a set.
Then when cloning a local scope we'd check if it's from one of those
'distinct' SPs by checking if it's in the set. We don't need to do that.
We can just check against the cloned SP directly and drop the set.

Test Plan:
ninja check-llvm-unit check-llvm
2024-12-17 08:57:59 +00:00
Artem Pianykh
8402a0fab0
[NFC][Utils] Extract CloneFunctionBodyInto from CloneFunctionInto (#118624)
Summary:
This and previously extracted `CloneFunction*Into` functions will be used in later diffs.

Test Plan:
ninja check-llvm-unit check-llvm
2024-12-16 22:30:56 +00:00
Artem Pianykh
a9237b1a10
[NFC][Utils] Extract CloneFunctionMetadataInto from CloneFunctionInto (#118623)
Summary:
The new API expects the caller to populate the VMap. We need it this way
for a subsequent change around coroutine cloning.

Test Plan:
ninja check-llvm-unit check-llvm
2024-12-16 20:50:05 +00:00
Vedant Paranjape
b21fa18b44
[LoopVersioning] Add a check to see if the input loop is in LCSSA form (#116443)
Loop Optimizations expect the input loop to be in LCSSA form. But it
seems that LoopVersioning doesn't have any check to see if the loop is
actually in LCSSA form. As a result, if we give it a loop which is not
in LCSSA form but still correct semantically, the resulting
transformation fails to pass through verifier pass with the following
error.

Instruction does not dominate all uses!
%inc = add nsw i16 undef, 1
store i16 %inc, ptr @c, align 1

As the loop is not in LCSSA form, LoopVersioning's transformations lead
to invalid IR, because some instructions do not dominate all their uses.

This patch checks whether a loop is in LCSSA form; if not, it calls
formLCSSARecursively on the loop before passing it to LoopVersioning.

Fixes: #36998
2024-12-16 11:55:19 -05:00
David Green
0032c151dc
[SROA] Optimize reloaded values in allocas that escape into readonly nocapture calls. (#116645)
Given an alloca that potentially has many uses in big complex code and
escapes into a call that is readonly+nocapture, we cannot easily split
up the alloca. There are several optimizations that will attempt to take
a value that is stored and a reload, and replace the load with the
original stored value. Instcombine has some simple heuristics, GVN can
sometimes do it, as can CSE in limited situations. They all suffer from
the same issue with complex code - they start from a load/store and need
to prove no-alias for all code between, which in complex cases might be
a lot to look through. Especially if the ptr is an alloca with many uses
that is over the normal escape capture limits.

The pass that does do well with allocas is SROA, as it has a complete
view of all of the uses. This patch adds a case to SROA where it can
detect allocas that are passed into calls that are no-capture readonly.
It can then optimize the reloaded values inside the alloca slice with
the stored value knowing that it is valid no matter the location of the
loads/stores from the no-escaping nature of the alloca.
2024-12-14 18:07:21 +00:00
Florian Hahn
c4a78b6fe3
[SimplifyCFG] Always allow hoisting if all instructions match. (#97158)
Generalize hoistCommonCodeFromSuccessors's `EqTermsOnly` to
`AllInstsEqOnly` and always allow hoisting if all instructions match.

In that case, all instructions can be hoisted and the
original branch will be replaced and selects for PHIs are added. This
allows preserving metadata in more cases, using the existing hoisting
logic, whereas previously FoldTwoEntryPHINode would drop the metadata.


https://llvm-compile-time-tracker.com/compare.php?from=716360367fbdabac2c374c19b8746f4de49a5599&to=986b2c47df516b31d998c055400e4f62aa76edc6&stat=instructions:u

PR: https://github.com/llvm/llvm-project/pull/97158
2024-12-13 21:26:27 +00:00
Ramkumar Ramachandra
4a0d53a0b0
PatternMatch: migrate to CmpPredicate (#118534)
With the introduction of CmpPredicate in 51a895a (IR: introduce struct
with CmpInst::Predicate and samesign), PatternMatch is one of the first
key pieces of infrastructure that must be updated to match a CmpInst
respecting samesign information. Implement this change to Cmp-matchers.

This is a preparatory step in migrating the codebase over to
CmpPredicate. Since no functional changes are desired at this stage,
we have chosen not to migrate CmpPredicate::operator==(CmpPredicate)
calls to use CmpPredicate::getMatching(), as that would have visible
impact on tests that are not yet written: instead, we call
CmpPredicate::operator==(Predicate), preserving the old behavior, while
also inserting a few FIXME comments for follow-ups.
2024-12-13 14:18:33 +00:00
Antonio Frighetto
d26df32255
[SimplifyCFG] Consider preds to switch in simplifyDuplicateSwitchArms
Allow a duplicate basic block with multiple predecessors to the
jump table to be simplified, by considering that the same basic
block may appear in more switch cases.
2024-12-13 09:07:24 +01:00
Kirill Stoimenov
e3676aa21f
Revert "[SROA] Optimize reloaded values in allocas that escape into readonly nocapture calls. (#116645)"
Causing buffer overflow:

SUMMARY: AddressSanitizer: heap-buffer-overflow llvm/lib/Transforms/Scalar/SROA.cpp:5552:35

This reverts commit 5e247d726d7a54cf0acc997bc17b50e7494e6fa3.
2024-12-12 21:32:35 +00:00
David Green
5e247d726d
[SROA] Optimize reloaded values in allocas that escape into readonly nocapture calls. (#116645)
Given an alloca that potentially has many uses in big complex code and
escapes into a call that is readonly+nocapture, we cannot easily split
up the alloca. There are several optimizations that will attempt to take
a value that is stored and a reload, and replace the load with the
original stored value. Instcombine has some simple heuristics, GVN can
sometimes do it, as can CSE in limited situations. They all suffer from
the same issue with complex code - they start from a load/store and need
to prove no-alias for all code between, which in complex cases might be
a lot to look through. Especially if the ptr is an alloca with many uses
that is over the normal escape capture limits.

The pass that does do well with allocas is SROA, as it has a complete
view of all of the uses. This patch adds a case to SROA where it can
detect allocas that are passed into calls that are no-capture readonly.
It can then optimize the reloaded values inside the alloca slice with
the stored value knowing that it is valid no matter the location of the
loads/stores from the no-escaping nature of the alloca.
2024-12-12 10:27:27 +00:00
Nikita Popov
5013c81b78
[GlobalOpt][Evaluator] Don't evaluate calls with signature mismatch (#119548)
The global ctor evaluator tries to evaluate function calls where the call
function type and function type do not match, by performing bitcasts.
This currently causes a crash when calling a void function with non-void
return type.

I've opted to remove this functionality entirely rather than fixing this
specific case. With opaque pointers, there shouldn't be a legitimate use
case for this anymore, as we don't need to look through pointer type
casts. Doing other bitcasts is very iffy because it ignores ABI
considerations. We should at least leave adjusting the signatures to
make them line up to InstCombine (which also does some iffy things, but
is at least somewhat more constrained).

Fixes https://github.com/llvm/llvm-project/issues/118725.
2024-12-12 10:44:52 +01:00
Mel Chen
b3cba9be41
[LoopVectorize] Vectorize select-cmp reduction pattern for increasing integer induction variable (#67812)
Consider the following loop:
```
  int rdx = init;
  for (int i = 0; i < n; ++i)
    rdx = (a[i] > b[i]) ? i : rdx;
```
We can vectorize this loop if `i` is an increasing induction variable.
The final reduced value will be the maximum `i` for which the condition
`a[i] > b[i]` is satisfied, or the start value `init` if it never is.

This patch added new RecurKind enums - IFindLastIV and FFindLastIV.
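
To see why the loop reduces to a maximum over `i`, here is a standalone sketch comparing the scalar loop with a lane-wise version. The function names, the fixed width of 4 lanes, and the `INT_MIN` sentinel for "no match" are assumptions made for illustration; the vectorizer's actual code generation may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <climits>
#include <cstddef>
#include <vector>

// The original scalar loop.
int findLastScalar(const std::vector<int> &A, const std::vector<int> &B,
                   int Init) {
  assert(A.size() == B.size() && "inputs must have equal length");
  int Rdx = Init;
  for (std::size_t I = 0; I < A.size(); ++I)
    Rdx = (A[I] > B[I]) ? static_cast<int>(I) : Rdx;
  return Rdx;
}

// Lane-wise model: each lane remembers the largest matching index it saw,
// and a final max-reduction across lanes yields the last matching index.
int findLastLanes(const std::vector<int> &A, const std::vector<int> &B,
                  int Init) {
  assert(A.size() == B.size() && "inputs must have equal length");
  const std::size_t VF = 4;     // assumed vectorization factor
  const int Sentinel = INT_MIN; // assumed marker for "no match yet"
  int Lanes[VF];
  std::fill(Lanes, Lanes + VF, Sentinel);
  for (std::size_t I = 0; I < A.size(); ++I)
    if (A[I] > B[I])
      Lanes[I % VF] = std::max(Lanes[I % VF], static_cast<int>(I));
  int Last = *std::max_element(Lanes, Lanes + VF);
  // If no lane ever matched, fall back to the start value.
  return Last == Sentinel ? Init : Last;
}
```

Because `i` only increases, the last index that satisfies the condition is also the largest one, which is what lets the select chain be replaced by a max-reduction.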

---------

Co-authored-by: Alexey Bataev <5361294+alexey-bataev@users.noreply.github.com>
2024-12-12 16:48:31 +08:00
Owen Anderson
22f0ebb19c
TargetLibraryInfo: Use pointer index size to determine getSizeTSize(). (#118747)
When using non-integral pointer types, such as on CHERI targets, size_t
is equivalent to the index size, which is allowed to be smaller than the
size of the pointer.
2024-12-12 15:45:44 +13:00
Owen Anderson
ab15976173
CallPromotionUtils: Correctly use IndexSize when determining the bit width of pointer offsets. (#119483)
This reapplies #119138 with a defensive fix for the assertion failure
when building libcxx. Unfortunately the failure does not reproduce on my
machine, so I am not able to extract a test case.

The key insight for the fix comes from Jessica Clarke, who observes that
`VTablePtr` may, in fact, not be a pointer on return from
`FindAvailableLoadedValue`.

Co-authored-by: Alexander Richardson <alexander.richardson@cl.cam.ac.uk>
2024-12-11 16:49:48 +13:00
Owen Anderson
9b6bb83860
Revert "CallPromotionUtils: Correctly use IndexSize when determining the bit width of pointer offsets. (#119138)"
Reverting due to ASAN bootstrap failures.

This reverts commit 4027e2f248044d944aaf3d9bc9c8eb6928506d44.
2024-12-11 13:20:17 +13:00
Owen Anderson
4027e2f248
CallPromotionUtils: Correctly use IndexSize when determining the bit width of pointer offsets. (#119138)
Co-authored-by: Alexander Richardson <alexander.richardson@cl.cam.ac.uk>
2024-12-11 12:43:40 +13:00
Pedro Lobo
d7c12ea29e
[LoopRotate] Use poison instead of undef as placeholder in debug info [NFC] (#119135)
The `poison` values are used to substitute debug information of values
moved from the original header into the preheader that are no longer
available in the former.
2024-12-10 15:06:48 +00:00
Artem Pianykh
eadc0c901b
[NFC][Utils] Extract BuildDebugInfoMDMap from CloneFunctionInto (#118622)
Summary:
Extract the logic to build up a metadata map to use in metadata cloning
into a separate function.

Test Plan:
ninja check-llvm-unit check-llvm
2024-12-10 17:10:22 +09:00
Artem Pianykh
e529681ad5
[NFC][Utils] Clone basic blocks after we're done with metadata in CloneFunctionInto (#118621)
Summary:
Moving the cloning of BBs after the metadata makes the flow of the
function a bit more straightforward and makes it easier to extract more
into helper functions.

Test Plan:
ninja check-llvm-unit check-llvm
2024-12-09 21:40:04 +09:00
Artem Pianykh
a202a35e79
[NFC][Utils] Remove DebugInfoFinder parameter from CloneBasicBlock (#118620)
Summary:
There was a single usage of CloneBasicBlock with non-default
DebugInfoFinder inside CloneFunctionInto, which has been refactored to
be more focused.

Test Plan:
ninja check-llvm-unit check-llvm
2024-12-06 21:41:29 +09:00
Nikita Popov
9a24f2198e
[MergeFuncs] Handle ConstantRangeList attributes
Support comparison of ConstantRangeList attributes in
FunctionComparator.
2024-12-06 12:21:45 +01:00
Akshat Oke
49abcd207f
[CodeGen][PM] Initialize analyses with isAnalysis=true (#118779)
Analyses should be marked as analyses.

Otherwise they are prone to get ignored by the legacy analysis cache mechanism and get scheduled redundantly.
2024-12-06 15:25:54 +05:30
Nikita Popov
b569ec6de6
[SCCP] Infer nuw for gep nusw with non-negative offsets (#118819)
If the GEP is nusw/inbounds and has all-non-negative offsets infer nuw
as well.

This doesn't have measurable compile-time impact.

Proof: https://alive2.llvm.org/ce/z/ihztLy
2024-12-06 09:52:32 +01:00
Owen Anderson
cfa582e8aa
SimplifyLibCalls: Use default globals address space when building new global strings. (#118729)
Writing a test for this transitively exposed a number of places in
BuildLibCalls where we were failing to propagate address spaces properly,
which are additionally fixed.
2024-12-06 10:51:14 +13:00
Florian Hahn
4226e0a0c7
[TTI] Add SCEVExpansionBudget to loop unrolling options. (#118316)
Add an extra knob to UnrollingPreferences to let backends control the
maximum budget for SCEV expansions.

This gives backends more fine-grained control on the cost of the runtime
checks for runtime unrolling.

PR: https://github.com/llvm/llvm-project/pull/118316
2024-12-02 21:35:00 +00:00
AdityaK
39601a6e54
Bail out jump threading on indirect branches only (#117778)
Remove the check for a PHI in pred, as pointed out in #103688.
Reduced the testcase to remove the redundant phi in pred.

Fixes: #102351
2024-11-26 14:57:28 -08:00
Florian Hahn
46a08579f2
[Local] Only intersect alias.scope,noalias & parallel_loop if inst moves (#117716)
Preserve !alias.scope, !noalias and !mem.parallel_loop_access metadata
on the replacement instruction, if it does not move. In that case, the
program would be UB, if the aliasing property encoded in the metadata
does not hold. This makes use of the clarification re aliasing metadata
implying UB if the property does not hold: #116220

Same as #115868, but for !alias.scope, !noalias and
!mem.parallel_loop_access.


PR: https://github.com/llvm/llvm-project/pull/117716
2024-11-26 20:39:53 +00:00
Matt Arsenault
4028bb10c3
Local: Handle noalias_addrspace in combineMetadata (#103938)
This should act like range.

Previously ConstantRangeList assumed a 64-bit range. Now query from the
actual entries. This also means that the empty range has no bitwidth, so
move asserts to avoid checking the bitwidth of empty ranges.
2024-11-26 09:13:34 -05:00
David Green
18abc7e0c5
[PatternMatch] Introduce m_c_Select (#114328)
This matches m_Select(m_Value(), L, R) or m_Select(m_Value(), R, L).
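
As a toy illustration of what the commutative matcher accepts, the sketch below models a select as three integer operand ids. `Select`, `matchSelect`, and `matchCommutativeSelect` are hypothetical names; the real matchers are templates over sub-patterns, and the condition is left unconstrained here just as `m_Value()` leaves it above.

```cpp
#include <cassert>

// A select instruction reduced to three operand ids.
struct Select {
  int Cond;
  int TrueVal;
  int FalseVal;
};

// Ordered match: true operand must be L and false operand must be R.
bool matchSelect(const Select &S, int L, int R) {
  return S.TrueVal == L && S.FalseVal == R;
}

// Commutative match: accept the operands in either order.
bool matchCommutativeSelect(const Select &S, int L, int R) {
  return matchSelect(S, L, R) || matchSelect(S, R, L);
}

// Usage: the swapped operand order only matches commutatively.
void selectExample() {
  Select S = {/*Cond=*/0, /*TrueVal=*/7, /*FalseVal=*/9};
  assert(!matchSelect(S, 9, 7));
  assert(matchCommutativeSelect(S, 9, 7));
}
```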
2024-11-25 13:47:23 +00:00
Phoebe Wang
2568e52a73
[X86,SimplifyCFG] Support hoisting load/store with conditional faulting (Part II) (#108812)
This is a follow-up to #96878 to support hoisting load/store from BBs
that have the same predecessor, if load/store are the only instructions
and the branch is unpredictable, e.g.:

```
void test (int a, int *c, int *d) {
  if (a)
   *c = a;
  else
   *d = a;
}
```
2024-11-25 15:19:28 +08:00
Jay Foad
d6fc7d3ab1
Fix typo "intead"
2024-11-21 14:48:38 +00:00
Artem Pianykh
f5002a0fae
[Utils] Extract CollectDebugInfoForCloning from CloneFunctionInto (#114537)
Summary:
Consolidate the logic in a single function. We do an extra pass over
Instructions but this is necessary to untangle things and extract
metadata cloning in a future diff.

Test Plan:
```
$ ninja check-llvm-unit check-llvm
[211/213] Running the LLVM regression tests

Testing Time: 106.06s

Total Discovered Tests: 62601
  Skipped          :    17 (0.03%)
  Unsupported      :  2518 (4.02%)
  Passed           : 59911 (95.70%)
  Expectedly Failed:   155 (0.25%)
[212/213] Running lit suite 

Testing Time: 12.47s

Total Discovered Tests: 8474
  Skipped:   17 (0.20%)
  Passed : 8457 (99.80%)
```

Extracted from #109032 (commit 3) (there are more refactors and cleanups
in subsequent commits)
2024-11-20 23:36:55 +00:00
Florian Hahn
0bb1b68330
[Local] Only intersect tbaa metadata if instr moves. (#116682)
Preserve tbaa metadata on the replacement instruction, if it does not
move. In that case, the program would be UB, if the aliasing property
encoded in the metadata does not hold.

This makes use of the clarification re tbaa metadata implying UB if the
property does not hold: https://github.com/llvm/llvm-project/pull/116220

Same as https://github.com/llvm/llvm-project/pull/115868, but for !tbaa

PR: https://github.com/llvm/llvm-project/pull/116682
2024-11-20 19:31:16 +00:00
Florian Hahn
076513646c
[Local] Only intersect llvm.access.group metadata if instr moves. (#115868)
Preserve llvm.access.group metadata on the replacement instruction, if
it does not move. In that case, the program would be UB, if the parallel
property encoded in the metadata does not hold.

This matches the LangRef recently updated in #116220

PR https://github.com/llvm/llvm-project/pull/115868
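
The rule can be modelled with access groups as plain sets. This is a hedged sketch, not the code in Local: `AccessGroups` and `combineAccessGroups` are hypothetical names, and `K` stands for the instruction that is kept while `J` is the one being replaced.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <set>
#include <string>

using AccessGroups = std::set<std::string>;

// Metadata for the kept instruction K after it replaces J.
AccessGroups combineAccessGroups(const AccessGroups &K, const AccessGroups &J,
                                 bool DoesKMove) {
  // K stays in place: its own metadata remains valid, so keep all of it.
  if (!DoesKMove)
    return K;
  // K moves: only groups that both instructions agree on are kept.
  AccessGroups Result;
  std::set_intersection(K.begin(), K.end(), J.begin(), J.end(),
                        std::inserter(Result, Result.begin()));
  return Result;
}

// Usage: the intersection only happens when K moves.
void accessGroupExample() {
  AccessGroups K = {"outer", "inner"};
  AccessGroups J = {"outer"};
  assert(combineAccessGroups(K, J, /*DoesKMove=*/false) == K);
  assert(combineAccessGroups(K, J, /*DoesKMove=*/true) == J);
}
```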
2024-11-19 22:01:16 +00:00
Stephen Tozer
2188a56a75
[DebugInfo][SimplifyCFG] Fully propagate merged invoke DILocations (#114235)
Currently when we merge invokes as part of SimplifyCFG we apply a merge
of the invoke DILocations to the merged invoke. We also insert an
unconditional branch to the merged invoke at the positions previously
occupied by the original invokes; as this branch is part of the
substitution for the invoke it has replaced, we should propagate the
original invoke DebugLoc to it.
2024-11-15 17:20:55 +00:00
Alex Bradbury
298127dcbe
Reapply [IR] Initial introduction of llvm.experimental.memset_pattern (#97583)
Relands 7ff3a9acd84654c9ec2939f45ba27f162ae7fbc3 after regenerating the
test case.

Supersedes the draft PR #94992, taking a different approach following
feedback:
* Lower in PreISelIntrinsicLowering
* Don't require that the number of bytes to set is a compile-time
constant
* Define llvm.memset_pattern rather than llvm.memset_pattern.inline

As discussed in the [RFC
thread](https://discourse.llvm.org/t/rfc-introducing-an-llvm-memset-pattern-inline-intrinsic/79496),
the intent is that the intrinsic will be lowered to loops, a sequence of
stores, or libcalls depending on the expected cost and availability of
libcalls on the target. Right now, there's just a single lowering path
that aims to handle all cases. My intent would be to follow up with
additional PRs that add additional optimisations when possible (e.g.
when libcalls are available, when arguments are known to be constant
etc).
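
For orientation, the loop-shaped lowering mentioned above behaves roughly like the following standalone C++ function. This is a sketch of the semantics as I read them, not the generated IR: `memsetPatternLoop` is a hypothetical name, the pattern is fixed to 32 bits, and `Count` is assumed to be the number of pattern copies (a runtime value) rather than a byte count.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Store Pattern into Dest Count times, one copy per loop iteration.
void memsetPatternLoop(void *Dest, std::uint32_t Pattern, std::size_t Count) {
  assert((Dest != nullptr || Count == 0) && "null destination");
  unsigned char *Out = static_cast<unsigned char *>(Dest);
  for (std::size_t I = 0; I < Count; ++I)
    // memcpy avoids alignment assumptions about Dest.
    std::memcpy(Out + I * sizeof(Pattern), &Pattern, sizeof(Pattern));
}
```

A target with a suitable libcall or a constant `Count` could replace this loop with a call or a short store sequence, which is the kind of follow-up optimisation the message refers to.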
2024-11-15 15:21:39 +00:00
Alex Bradbury
0fb8fac5d6
Revert "[IR] Initial introduction of llvm.experimental.memset_pattern (#97583)"
This reverts commit 7ff3a9acd84654c9ec2939f45ba27f162ae7fbc3.

Recent scheduling changes mean tests need to be re-generated. Reverting
to green while I do that.
2024-11-15 14:48:32 +00:00