This adds a general function that handles intrinsics by applying the
intrinsic to the shadows, and applies it to the specific case of Arm
NEON TBL/TBX intrinsics.
This also updates the tests from
https://github.com/llvm/llvm-project/pull/114462
This pull request corrects multiple occurrences of the typo "avaliable"
to "available" across the LLVM and Clang codebase. These changes improve
the clarity and accuracy of comments and documentation. Specific
modifications are in the following files:
1. clang-tools-extra/clang-tidy/readability/FunctionCognitiveComplexityCheck.cpp:
Updated comments in readability checks for cognitive complexity.
2. llvm/include/llvm/ExecutionEngine/Orc/ExecutionUtils.h: Corrected
documentation for JITDylib responsibilities.
3. llvm/include/llvm/Target/TargetMacroFusion.td: Fixed descriptions for
FusionPredicate variables.
4. llvm/lib/CodeGen/SafeStack.cpp: Improved comments on DominatorTree
availability.
5. llvm/lib/Target/RISCV/RISCVSchedSiFive7.td: Enhanced resource usage
descriptions for vector units.
6. llvm/lib/Transforms/Scalar/LoopIdiomRecognize.cpp: Updated invariant
description in shift-detect idiom logic.
7. llvm/test/MC/ARM/mve-fp-registers.s: Amended ARM MVE register
availability notes.
8. mlir/lib/Bytecode/Reader/BytecodeReader.cpp: Adjusted forward
reference descriptions for bytecode reader operations.
These changes have no impact on code functionality, focusing solely on
documentation clarity.
Co-authored-by: wangqiang <wangqiang1@kylinos.cn>
This work is in preparation for PRs #112138 and #88385 where
the middle block is not guaranteed to be the immediate successor
to the region block. I've simply add new getMiddleBlock()
interfaces to VPlan that for now just return
cast<VPBasicBlock>(VectorRegion->getSingleSuccessor())
Once PR #112138 lands we'll need to do more work to discover
the middle block.
This patch intersects attributes of two calls to avoid introducing UB.
It also skips incompatible call pairs in GVN/NewGVN. However, I cannot
provide negative tests for these changes.
Fixes https://github.com/llvm/llvm-project/issues/113997.
In https://github.com/llvm/llvm-project/pull/109837, it sets a global
variable(`PGOInstrumentColdFunctionOnly`) in PassBuilderPipelines.cpp
which introduced a data race detected by TSan. To fix this, I decouple
the flag setting, the flags are now set
separately(`instrument-cold-function-only-path` is required to be used
with `--pgo-instrument-cold-function-only`).
Relands #114356. Compared to the last version, this patch only merges
poison-generating/nsz flags from the select to fix LV regression in
`llvm/test/Transforms/PhaseOrdering/AArch64/predicated-reduction.ll`.
Update VPlan to include the scalar loop header. This allows retiring
VPLiveOut, as the remaining live-outs can now be handled by adding
operands to the wrapped phis in the scalar loop header.
Note that the current version only includes the scalar loop header, no
other loop blocks and also does not wrap it in a region block.
PR: https://github.com/llvm/llvm-project/pull/109975
This is a fixed version of #106185, which was reverted in #113978 due to
a buildbot failure.
Motivation example:
```
> cat test.cpp
extern "C" [[gnu::weak]] void f() {}
void alias() __attribute__((alias("f")));
int main() { auto p = alias; p(); }
> clang test.cpp -fsanitize=cfi-icall -flto=thin -fuse-ld=lld
> ./a.out
[1] 1868 illegal hardware instruction ./a.out
```
If the address of a function was only taken through its alias, the
function was not considered exported and therefore was not included in
the CFI jumptable. This resulted in `@llvm.type.test()` being lowered to
`false`, and consequently the indirect call to the function was
eventually optimized to `ubsantrap()`.
The code in EH and non-returning blocks can be skipped by the
vectorizer, since it does not add to the perfromance, just consumes
compile/link time.
Reviewers: RKSimon
Reviewed By: RKSimon
Pull Request: https://github.com/llvm/llvm-project/pull/112221
If the instruction is vectorized and it is a part of the reduced values
gather/buildvector node, it should replaced in reduced operation
instructions before removal properly, to avoid compiler crash.
Fixes#114371
This patch adds basic support for `hypot`. Constant folding support will
be submitted in a subsequent patch.
Related issue: https://github.com/llvm/llvm-project/issues/113711
Note: It's my first time contributing to the LLVM with encouragement
from one of my friends, @fawdlstty. I learned a lot from
https://github.com/llvm/llvm-project/pull/99611, and thanks for that.
Note: I had created the same PR and merged
(https://github.com/llvm/llvm-project/pull/113724), but reverted caused
by the merging issue. (The CI issue happened in 3 A.M. at my timezone.
So, I need to fall asleep again after I replied about why issue
happened.) So, I rebased to the latest main branch and recreate the PR
and hope I won't have the third time to create the same PR.
I hope @arsenm can help me review the code again. I’m sorry for that.
Kenji Mouri
Currently, the `DropTypeTests` parameter only fully works with phi nodes
and llvm.assume instructions. However, we'd like CFI to work in
conjunction with FatLTO, in so far as the bitcode section should be able
to contain the CFI instrumentation, while any incompatible bits are
dropped when compiling the object code.
To do that, we need to drop the llvm.type.test instructions everywhere,
and not just their uses in phi nodes. This patch updates the
LowerTypeTest pass so that uses are removed, and replaced with `true` in
all cases, and not just in phi nodes.
Addressing this will allow us to fix#112053 by modifying the FatLTO
pipeline.
Reviewers: pcc, nikic
Reviewed By: pcc
Pull Request: https://github.com/llvm/llvm-project/pull/112787
This patch is a part of step-by-step refactoring of CloneFunctionInto.
The goal is to extract reusable pieces out of it that will be later used
to optimize function cloning e.g. in coroutine processing.
Extracted from #109032 (commit 2)
A helper (2 overloads) that consolidates corocloner creation and the
actual cloning. The helpers create a TimeTraceScope to make it easier to
see how long the cloning takes.
Extracted from #109032 (commit 1)
The spec for llvm.experimental.convergence.entry says that is must be in
the entry block for a function, and must preceed any other convergent
operation. It does not have to be the first instruction in the entry
block.
Inlining assumes that the call to llvm.experimental.convergence.entry
will be the first instruction after any phi instructions. This commit
modifies inlining to search the entire block for the call.
Infers member MayReadFromMemory, MayWriteToMemory, and
MayHaveSideEffects based on intrinsic attributes.
---------
Co-authored-by: Florian Hahn <flo@fhahn.com>
Only accumulate the codesize increase of functions that are actually
specialized, rather than for every candidate specialization that we
analyse.
This fixes a subtle bug where prior analysis of candidate
specializations that were deemed unprofitable could prevent subsequent
profitable candidates from being recognised.
Enable specialization on literal constant arguments by default in
Function Specialization.
---------
Co-authored-by: Alexandros Lamprineas <alexandros.lamprineas@arm.com>
We can turn:
```
%add = add i8 %arg, C1
%and = and i8 %add, C2
%cmp = icmp eq i1 %and, C3
```
into:
```
%and = and i8 %arg, C2
%cmp = icmp eq i1 %and, (C3 - C1) & C2
```
This is only worth doing if the sequence is the sole user of the addition
operation.