37965 Commits

Author SHA1 Message Date
Hari Limaye
e19a5fc6d3
[FuncSpec] Improve accounting of specialization codesize growth (#113448)
Only accumulate the codesize increase of functions that are actually
specialized, rather than for every candidate specialization that we
analyse.

This fixes a subtle bug where prior analysis of candidate
specializations that were deemed unprofitable could prevent subsequent
profitable candidates from being recognised.
2024-10-29 11:53:12 +00:00
Jay Foad
2443549b85
[IR] Remove some uses of StructType::setBody. NFC. (#113685)
It is simple to create the struct body up front, now that we have
transitioned to opaque pointers.
2024-10-29 11:44:53 +00:00
Hari Limaye
06664fdc76
[FuncSpec] Enable SpecializeLiteralConstant by default (#113442)
Enable specialization on literal constant arguments by default in
Function Specialization.

---------

Co-authored-by: Alexandros Lamprineas <alexandros.lamprineas@arm.com>
2024-10-29 11:41:25 +00:00
Yingwei Zheng
18311093ab
[InstCombine] Do not fold shufflevector(select) if the select condition is a vector (#113993)
Since `shufflevector` is not element-wise, we cannot do fold it into
select when the select condition is a vector.
For shufflevector that doesn't change the length, it doesn't crash, but
it is still a miscompilation: https://alive2.llvm.org/ce/z/s8saCx

Fixes https://github.com/llvm/llvm-project/issues/113986.
2024-10-29 10:39:07 +08:00
c8ef
0c1c37bfbe
[TLI] Add support for the tgamma libcall. (#113791)
This patch adds the `tgamma` libcall.
2024-10-29 10:08:38 +08:00
vporpo
a461869db3
[SandboxIR][Pass] Implement Analyses class (#113962)
The Analyses class provides a way to pass around commonly used Analyses
to SandboxIR passes throught `runOnFunction()` and `runOnRegion()`
functions.
2024-10-28 18:00:52 -07:00
Igor Kudrin
757d0e4764
Revert "[CFI][LowerTypeTests] Fix indirect call with alias" (#113978)
Reverts llvm/llvm-project#106185

This is breaking Sanitizer bots:
https://lab.llvm.org/buildbot/#/builders/66/builds/5449/steps/8/logs/stdio
2024-10-28 16:13:32 -07:00
David Majnemer
902acde341 [InstCombine] Optimize away certain additions using modular arithmetic
We can turn:
```
  %add = add i8 %arg, C1
  %and = and i8 %add, C2
  %cmp = icmp eq i1 %and, C3
```

into:
```
  %and = and i8 %arg, C2
  %cmp = icmp eq i1 %and, (C3 - C1) & C2
```

This is only worth doing if the sequence is the sole user of the addition
operation.
2024-10-28 22:51:35 +00:00
Matthias Braun
5903c6af44
InstCombine: Fold shufflevector(select) and shufflevector(phi) (#113746)
- Transform `shufflevector(select(c, x, y), C)` to
  `select(c, shufflevector(x, C), shufflevector(y, C))` by re-using
  the `FoldOpIntoSelect` helper.
- Transform `shufflevector(phi(x, y), C)` to
  `phi(shufflevector(x, C), shufflevector(y, C))` by re-using the
  `foldOpInotPhi` helper.
2024-10-28 15:35:17 -07:00
vporpo
bf4b31ad54
[SandboxVec][Legality] Check Fastmath flags (#113967) 2024-10-28 15:32:20 -07:00
vporpo
5ea694816b
[SandboxVec][Legality] Check opcodes and types (#113741) 2024-10-28 14:05:58 -07:00
Igor Kudrin
67bcce2141
[CFI][LowerTypeTests] Fix indirect call with alias (#106185)
Motivation example:
```
> cat test.cpp
extern "C" [[gnu::weak]] void f() {}
void alias() __attribute__((alias("f")));
int main() { auto p = alias; p(); }
> clang test.cpp -fsanitize=cfi-icall -flto=thin -fuse-ld=lld
> ./a.out
[1]    1868 illegal hardware instruction  ./a.out
```

If the address of a function was only taken through its alias, the
function was not considered exported and therefore was not included
in the CFI jumptable. This resulted in `@llvm.type.test()` being lowered
to `false`, and consequently the indirect call to the function was
eventually optimized to `ubsantrap()`.
2024-10-28 13:14:42 -07:00
Florian Hahn
0d0abb351b
[VPlan] Use ResumePhi to create reduction resume phis. (#110004)
Use VPInstruction::ResumePhi to create phi nodes for reduction resume
values in the scalar preheader, similar to how ResumePhis are used for
first-order recurrence resume values after 9a5a8731e77.

This allows simplifying createAndCollectMergePhiForReduction to only
collect reduction resume phis when vectorizing epilogue loops and adding
extra incoming edges from the main vector loop. Updating phis for the
epilogue vector loops requires special attention, because additional
incoming values from the bypass blocks need to be added.

PR: https://github.com/llvm/llvm-project/pull/110004
2024-10-28 20:14:08 +01:00
Lei Wang
e517cfc531
[InstrPGO] Support cold function coverage instrumentation (#109837)
This patch adds support for cold function coverage instrumentation based
on sampling PGO counts. The major motivation is to detect dead functions
for the services that are optimized with sampling PGO. If a function is
covered by sampling profile count (e.g., those with an entry count > 0),
we choose to skip instrumenting those functions, which significantly
reduces the instrumentation overhead.

More details about the implementation and flags:
- Added a flag `--pgo-instrument-cold-function-only` in
`PGOInstrumentation.cpp` as the main switch to control skipping the
instrumentation.
- Built the extra instrumentation passes(a bundle of passes in
`addPGOInstrPasses`) under sampling PGO pipeline. This is controlled by
`--instrument-cold-function-only-path` flag.
- Added a driver flag `-fprofile-generate-cold-function-coverage`: 
- 1) Config the flags in one place, i,e. adding
`--instrument-cold-function-only-path=<...>` and
`--pgo-function-entry-coverage`. Note that the instrumentation file path
is passed through `--instrument-sample-cold-function-path`, because we
cannot use the `PGOOptions.ProfileFile` as it's already used by
`-fprofile-sample-use=<...>`.
- 2) makes linker to link `compiler_rt.profile` lib(see
[ToolChain.cpp#L1125-L1131](https://github.com/llvm/llvm-project/blob/main/clang/lib/Driver/ToolChain.cpp#L1125-L1131)
).
- Added a flag(`--pgo-cold-instrument-entry-threshold`) to config entry
count to determine cold function.

Overall, the full command is like:

```
clang++ -O2 -fprofile-generate-cold-function-coverage=<...> -fprofile-sample-use=<...>  code.cc -o code
```
2024-10-28 10:13:45 -07:00
Ellis Hoag
6ab26eab4f
Check hasOptSize() in shouldOptimizeForSize() (#112626) 2024-10-28 09:45:03 -07:00
Alexey Bataev
7152bf3bc8 [SLP]Do not create new vector node if scalars fully overlap with the existing one
If the list of scalars vectorized as the part of the same vector node,
no need to generate vector node again, it will be handled as part of
overlapping matching.

Fixes #113810
2024-10-28 06:59:41 -07:00
Yingwei Zheng
f78610af3f
[InstCombine] Add function attribute instcombine-no-verify-fixpoint (#113822)
This patch introduces a function attribute
`instcombine-no-verify-fixpoint` to avoids disabling fix-point
verification for unrelated tests in the same file.
Address comment
https://github.com/llvm/llvm-project/pull/112642#discussion_r1804714387.
2024-10-28 17:45:08 +08:00
Yingwei Zheng
5155c38cee
[InstCombine] Don't check uses of constant exprs (#113684)
This patch skips constant expressions to avoid iterating over uses on
other functions.

Fix crash reported in
https://github.com/llvm/llvm-project/pull/105510#issuecomment-2437521147.
2024-10-28 15:09:20 +08:00
David Majnemer
5d4a0d54b5 [InstCombine] Teach takeLog2 about right shifts, truncation and bitwise-and
We left some easy opportunities for further simplifications.

log2(trunc(x)) is simply trunc(log2(x)). This is safe if we know that
trunc is NUW because it means that the truncation didn't drop any bits.
It is also safe if the caller is OK with zero as a possible answer.

log2(x >>u y) is simply `log2(x) - y`.

log2(x & y) is a funny one. It comes up when doing something like:
```
unsigned int f(unsigned int x, unsigned int y) {
  unsigned char a = 1u << x;
  return y / a;
}
```

LLVM would canonicalize this to:
```
  %shl = shl nuw i32 1, %x
  %conv1 = and i32 %shl, 255
  %div = udiv i32 %y, %conv1
```

In cases like these, we can ignore the mask entirely.
This is equivalent to `y >> x`.
2024-10-28 05:13:04 +00:00
Florian Hahn
7fe149cdf0
[VPlan] Replace getIRBasicBlock with IRBB in VPIRBB::execute (NFC).
Suggested in https://github.com/llvm/llvm-project/pull/109975. This
makes the function consistent throughout.
2024-10-27 16:22:18 +01:00
Eirik Byrkjeflot Anonsen
d2e9532fe1
[DemoteRegToStack] Use correct variable for branch instructions in DemoteRegToStack (#113798)
I happened to see this code, and it seems "obviously" wrong to me. So
here's what I think this code is supposed to look like.
2024-10-27 17:09:39 +08:00
Teresa Johnson
355e6948d4
[MemProf] Fix clone edge comparison (#113753)
The issue fixed in PR113337 exposed a bug in the comparisons done in
allocTypesMatch, which compares a vector of alloc types to those in the
given vector of Edges. The form of std::equal used, which didn't provide
the end iterator for the Edges vector, will iterate through as many
entries in the Edges vector as in the InAllocTypes vector, which can
fail if there are fewer entries in the Edges vector, because we may
dereference a bogus Edge pointer. This function is called twice, once
for the Node, with its callee edges, in which case the number of edges
should always match the number of entries in allocTypesMatch, which is
computed from the Node's callee edges. It was also called for Node's
clones, and it turns out that after cloning and edge modifications done
for other allocations, the number of callee edges in Node and its clones
may no longer match. In some cases, more common with memprof ICP before
the PR113337, the number of clone edges can be smaller leading to a bad
dereference. I found for a large application even before adding memprof
ICP support we sometimes call this with fewer entries in the clone's
callee edges, but were getting lucky as they had allocation type None,
and we didn't end up attempting to dereference the bad edge pointer.

Fix this by passing Edges.end() to std::equal, which means std::equal
will fail if the number of entries in the 2 vectors are not equal.
However, this is too conservative, as clone edges may have been added or
removed since it was initially cloned, and in fact can be wrong as we
may not be comparing allocation types corresponding to the same callee.

Therefore, a couple of enhancements are made to avoid regressing and
improve the checking and cloning:
- Don't bother calling the alloc type comparison when the clone and the
  Node's alloc type for the current allocation are precise (have a
  single allocation type) and are the same (which is guaranteed by an
  earlier check, and an assert is added to confirm that). In that case
  we can trivially determine that the clone can be used.
- Split the alloc type matching handling into a separate function for
  the clone case. In that case, for each of the InAllocType entries,
  attempt to find and compare to the clone callee edge with the same
  callee as the corresponding original node callee.

To create a test case I needed to take a spec application (xalancbmk),
and repeatedly apply random hot/cold-ness to the memprof contexts
when building, until I hit the problematic case. I then reduced that
full LTO IR using llvm-reduce and then manually.
2024-10-26 20:53:20 -07:00
Kyungwoo Lee
1941c5180b Reland (2nd attempt) [StructuralHash] Refactor (#112621)
This is largely NFC, and it prepares for #112638.
 - Use stable_hash instead of uint64_t
 - Rename update* to hash* functions. They compute stable_hash locally and return it.

This is a patch for
https://discourse.llvm.org/t/rfc-global-function-merging/82608.
2024-10-26 16:12:28 -07:00
Kyungwoo Lee
d104b8e827 Revert "Reland [StructuralHash] Refactor (#112621)"
This reverts commit 98ca9a635bd2fb98cee473a9558687a5b522e219.
2024-10-26 13:55:46 -07:00
Kyungwoo Lee
98ca9a635b Reland [StructuralHash] Refactor (#112621)
This is largely NFC, and it prepares for #112638.
 - Use stable_hash instead of uint64_t
 - Rename update* to hash* functions. They compute stable_hash locally and return it.

This is a patch for
https://discourse.llvm.org/t/rfc-global-function-merging/82608.
2024-10-26 12:07:57 -07:00
Kyungwoo Lee
9672375623 Revert "[StructuralHash] Refactor (#112621)"
This reverts commit b667d161f0a9ff6b29cda0ccdb0081610c1e8b8c.
2024-10-26 09:55:21 -07:00
Kyungwoo Lee
b667d161f0
[StructuralHash] Refactor (#112621)
This is largely NFC, and it prepares for #112638.
 - Use stable_hash instead of uint64_t
- Rename update* to hash* functions. They compute stable_hash locally and return it.
 
This is a patch for
https://discourse.llvm.org/t/rfc-global-function-merging/82608.
2024-10-26 09:20:26 -07:00
Shih-Po Hung
266ff98cba
[LV][VPlan] Use VF VPValue in VPVectorPointerRecipe (#110974)
Refactors VPVectorPointerRecipe to use the VF VPValue to obtain the
runtime VF, similar to #95305.

Since only reverse vector pointers require the runtime VF, the patch
sets VPUnrollPart::PartOpIndex to 1 for vector pointers and 2 for
reverse vector pointers. As a result, the generation of reverse vector
pointers is moved into a separate recipe.
2024-10-26 23:18:50 +08:00
davidtrevelyan
4102625380
[rtsan][llvm][NFC] Rename sanitize_realtime_unsafe attr to sanitize_realtime_blocking (#113155)
# What

This PR renames the newly-introduced llvm attribute
`sanitize_realtime_unsafe` to `sanitize_realtime_blocking`. Likewise,
sibling variables such as `SanitizeRealtimeUnsafe` are renamed to
`SanitizeRealtimeBlocking` respectively. There are no other functional
changes.


# Why?

- There are a number of problems that can cause a function to be
real-time "unsafe",
- we wish to communicate what problems rtsan detects and *why* they're
unsafe, and
- a generic "unsafe" attribute is, in our opinion, too broad a net -
which may lead to future implementations that need extra contextual
information passed through them in order to communicate meaningful
reasons to users.
- We want to avoid this situation and make the runtime library boundary
API/ABI as simple as possible, and
- we believe that restricting the scope of attributes to names like
`sanitize_realtime_blocking` is an effective means of doing so.

We also feel that the symmetry between `[[clang::blocking]]` and
`sanitize_realtime_blocking` is easier to follow as a developer.

# Concerns

- I'm aware that the LLVM attribute `sanitize_realtime_unsafe` has been
part of the tree for a few weeks now (introduced here:
https://github.com/llvm/llvm-project/pull/106754). Given that it hasn't
been released in version 20 yet, am I correct in considering this to not
be a breaking change?
2024-10-26 13:06:11 +01:00
Vasileios Porpodas
1540f772c7 Reapply "[SandboxVec][Legality] Reject non-instructions (#113190)"
This reverts commit eb9f4756bc3daaa4b19f4f46521dc05180814de4.
2024-10-25 12:55:58 -07:00
Vasileios Porpodas
eb9f4756bc Revert "[SandboxVec][Legality] Reject non-instructions (#113190)"
This reverts commit 6c9bbbc818ae8a0d2849dbc1ebd84a220cc27d20.
2024-10-25 12:52:31 -07:00
vporpo
6c9bbbc818
[SandboxVec][Legality] Reject non-instructions (#113190) 2024-10-25 12:47:19 -07:00
Florian Hahn
e724226da7
[VPlan] Return cost of 0 for VPWidenCastRecipe without underlying value.
In some cases, VPWidenCastRecipes are created but not considered in the
legacy cost model, including truncates/extends when evaluating a reduction
in a smaller type. Return 0 for such casts for now, to avoid divergences
between VPlan and legacy cost models.

Fixes https://github.com/llvm/llvm-project/issues/113526.
2024-10-25 21:25:44 +02:00
Florian Hahn
9648271a3c
[LV] Pass flag indicating epilogue is vectorized to executePlan (NFC)
This clarifies the flag, which is now only passed if the epilogue loop
is being vectorized.
2024-10-25 20:39:47 +02:00
Teresa Johnson
144ddca9ed
[MemProf] Avoid duplicate edges between nodes (#113337)
The recent change to add support for cloning indirect calls
inadvertantly caused duplicate edges to be created between the same
caller/callee pair. This is due to the new moveCalleeEdgeToNewCaller
not properly guarding the addition of a new edge (ironically I was
testing for that in an assertion, but failed to handle that case
specially otherwise). Now simply move the context ids over to any
existing edge.

This issue in turn led to some assumptions in cloning being violated,
resulting in a later crash.

Add a test for this case to checkNode.
2024-10-25 11:09:57 -07:00
Jay Foad
90cdc03e7f
[IR] Fix undiagnosed cases of structs containing scalable vectors (#113455)
Type::isScalableTy and StructType::containsScalableVectorType failed to
detect some cases of structs containing scalable vectors because
containsScalableVectorType did not call back into isScalableTy to check
the element types. Fix this, which requires sharing the same Visited set
in both functions. Also change the external API so that callers are
never required to pass in a Visited set, and normalize the naming to
isScalableTy.
2024-10-25 12:56:10 +01:00
Sergio Afonso
d87964de78
[OpenMP][OMPIRBuilder] Error propagation across callbacks (#112533)
This patch implements an approach to communicate errors between the
OMPIRBuilder and its users. It introduces `llvm::Error` and
`llvm::Expected` objects to replace the values returned by callbacks
passed to `OMPIRBuilder` codegen functions. These functions then check
the result for errors when callbacks are called and forward them back to
the caller, which has the flexibility to recover, exit cleanly or dump a
stack trace.

This prevents a failed callback to leave the IR in an invalid state and
still continue the codegen process, triggering unrelated assertions or
segmentation faults. In the case of MLIR to LLVM IR translation of the
'omp' dialect, this change results in the compiler emitting errors and
exiting early instead of triggering a crash for not-yet-implemented
errors. The behavior in Clang and openmp-opt stays unchanged, since
callbacks will continue always returning 'success'.
2024-10-25 11:30:16 +01:00
Farzon Lotfi
21b3769d1d
[Scalarizer] Fix to only scalarize if intrinsic was marked as isTriviallyScalarizable (#113625)
fixes #113624
2024-10-24 23:26:02 -07:00
Vitaly Buka
cf8d24531e
[msan] Reduces overhead of #113200, by 10% (#113201)
CTMark #113200 size overhead was 5.3%, now it's 4.7%.

The patch affects only signed integers.

https://alive2.llvm.org/ce/z/Lv5hyi

* The patch replaces code which extracted sign bit,
maximized/minimized it, then packed it back, with
simple sign bit flip. The another way to think about
transformation is as a subtraction of MIN_SINT from
A/B. Then we map MIN_SINT to 0, 0 to -MIN_SINT, and
MAX_SINT to MAX_UINT.

* Then to maximize/minimize A/B we don't need
to extract sign bit, we can apply shadow the
same way as to other bits.

* After sign bit flip, we had to switch to unsigned
version of the predicates.

* After change above  getHighestPossibleValue/getLowestPossibleValue
became very similar, so we can combine into a single function.

* Because the function does sign bit flip and
requires unsigned predicates used for returned values,
there is no point in keeping it as a member of class,
to hide, we switch to function local lambda.
2024-10-24 20:46:49 -07:00
Haopeng Liu
a31ce36f56
Apply initializes attribute to DSE (#113630)
retry #107282

Fixed with `MadeChange |= Changed;` and confirmed it works.

```
cmake -DLLVM_CCACHE_BUILD=ON -DLLVM_ENABLE_EXPENSIVE_CHECKS=ON -DLLVM_ENABLE_WERROR=OFF -DCMAKE_BUILD_TYPE=Release -DCMAKE_CXX_FLAGS=-U_GLIBCXX_DEBUG '-DLLVM_LIT_ARGS=-v -vv -j96' '-DLLVM_ENABLE_PROJECTS=llvm;lld' -DLLVM_ENABLE_ASSERTIONS=ON -GNinja ../llvm

ninja check-llvm
```
2024-10-24 18:43:20 -07:00
Florian Hahn
7b9f988a53
[VPlan] Limit stride replacement to vector region and middle VPBB (NFC).
At the moment this in NFC, but ensures we only replace uses that are
dominated by runtime checks as we model more of the skeleton in VPlan.
2024-10-24 15:08:36 -07:00
Alexey Bataev
e914421d7f [SLP]Do correct signedness analysis for externally used scalars
If the scalars is used externally is in the root node, it may have
incorrect signedness info because of the conflict with the demanded bits
analysis. Need to perform exact signedness analysis and compute it
rather than rely on the precomputed value, which might be incorrect for
alternate zext/sext nodes.

Fixes #113520
2024-10-24 08:59:24 -07:00
Arthur Eubanks
3cec720449
Revert "[DSE] Apply initializes attribute to DSE" (#113589)
Reverts llvm/llvm-project#107282

Seems to be causing invalid analysis caching as mentioned in
https://github.com/llvm/llvm-project/pull/107282#issuecomment-2435083978.
2024-10-24 08:51:31 -07:00
Alexey Bataev
d2e7ee77d3 [SLP]Do not check for clustered loads only
Since SLP support "clusterization" of the non-load instructions, the
restriction for reduced values for loads only should be removed to avoid
compiler crash.

Fixes #113516
2024-10-24 08:16:42 -07:00
Alexey Bataev
cb5046da26 [SLP]Do not ignore undefs when trying to replace with "poisonous" shuffles
Need to consider undefs correctly, when trying to replace them with
potentially poisonous values in shuffles. Such elements should not be
silently replaced by poison values, instead complex analysis should be
implemented to see if it is safe to do it.

Fixes #113425
2024-10-24 07:47:23 -07:00
Nashe Mncube
e37d736def
Recommit: [llvm][ARM][GlobalOpt]Add widen global arrays pass (#113289)
This is a recommit of #107120 . The original PR was approved but failed
buildbot. The newly added tests should only be run for compilers that
support the ARM target. This has been resolved by adding a config file
for these tests.

- Pass optimizes memcpy's by padding out destinations and sources to a
  full word to make ARM backend generate full word loads instead of
  loading a single byte (ldrb) and/or half word (ldrh). Only pads
  destination when it's a stack allocated constant size array and source
  when it's constant string. Heuristic to decide whether to pad or not
  is very basic and could be improved to allow more examples to be
  padded.
- Pass works at the midend level
2024-10-24 10:12:01 +01:00
Mingming Liu
60944177b8
[WPD][ThinLTO]Add cutoff option for WPD (#113383)
This option applies for _import_ WPD (i.e., when `DevirtModule` pass
de-virtualizes according to an imported summary, in ThinLTO backend
pipeline). It's meant for debugging (e.g., bisection).
2024-10-23 23:47:27 -07:00
Haopeng Liu
089237c0d0
[DSE] Apply initializes attribute to DSE (#107282)
Apply the initializes attribute to DSE and guard with a flag,
"enable-dse-initializes-attr-improvement".

The attribute support has been landed in:
https://github.com/llvm/llvm-project/pull/84803
The attribute inference will be landed after this PR:
https://github.com/llvm/llvm-project/pull/97373
2024-10-23 22:18:59 -07:00
Thomas Fransham
b8fddca7bd
[llvm] Support llvm::Any across shared libraries on windows (#108051)
This is part of the effort to support for enabling plugins on windows by
adding better support for building llvm as a DLL. The export macros used
here were added in #96630

Since shared library symbols aren't deduplicated across multiple
libraries on windows like Linux we have to manually explicitly import
and export `Any::TypeId` template instantiations for the uses of
`llvm::Any` in the LLVM codebase to support LLVM Windows shared library
builds.
This change ensures that external code, including LLVM's own tests, can
use PassManager callbacks when LLVM is built as a DLL.

I also removed the only use of llvm::Any for LoopNest that only existed
in debug code and there also doesn't seem to be any code creating
`Any<LoopNest>`
2024-10-24 08:07:13 +03:00
Florian Hahn
ef217a0f6b
[VPlan] Introduce and use getVectorPreheader (NFC).
Introduce a dedicated function to retrieve the vector preheader. This
ensures the correct block is used, even if the skeleton is exetended.
2024-10-23 21:01:52 -07:00