In replaceSignedInst, if a signed instruction can be repalced with
unsigned instruction, we created a new instruction and removed the old
instruction's value state. If the following instructions has this new
instruction as a use operand, transformations like replaceSignedInst and
refineInstruction would be blocked. The reason is there is no value
state for the new instrution.
This patch set the new instruction's value state with the removed
instruction's value state. I believe it is correct bacause when we
repalce a signed instruction with unsigned instruction, the value state
is not changed.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D152337
The last use was removed by:
commit ae0987d242e266847f21f5fa1bffa97ce3eff586
Author: Kazu Hirata <kazu@google.com>
Date: Sat Jun 10 13:51:35 2023 -0700
Differential Revision: https://reviews.llvm.org/D152636
This is a quick fix for an assert when the source and dest have
different address spaces. The pointer compare needs to have matching
types, but we can't generically introduce addrspacecast and we don't
know if the address spaces alias.
This is an alternative to currently existing hostcall implementation and uses printf buffer similar to OpenCL,
The data stored in the buffer (i.e the data frame) for each printf call are as follows,
1. Control DWord - contains info regarding stream, format string constness and size of data frame
2. Hash of the format string (if constant) else the format string itself
3. Printf arguments (each aligned to 8 byte boundary)
The format string Hash is generated using LLVM's MD5 Message-Digest Algorithm implementation and only low 64 bits are used.
The implementation still uses amdhsa metadata and hash is stored as part of format string itself to ensure
minimal changes in runtime.
Differential Revision: https://reviews.llvm.org/D150427
For constant range supported intrinsics, we got consantrange from args
no matter if they are unknown or undef. And the constant range computed
from unknown or undef value state is full range.
I think compute with full constant range is harmful since although we
can do mergeIn after these args value state are changed, the merge
operation of two constant ranges is union.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D152499
Using AvgLoopIters on any loop is too imprecise making the cost model
favor users inside loop nests regardless of the actual tripcount.
Differential Revision: https://reviews.llvm.org/D150375
Since `MD_annotation` metadata now supports having mutliple strings in the annotation node. casing Operand to string directly will cause a crash. When checking if `MDOperand` equals str you can use `equalsStr` method.
Reviewed By: serge-sans-paille
Differential Revision: https://reviews.llvm.org/D152372
Reapply with extra check for struct types, which caused buildbot
failures last time.
-----
The freeze instruction has not been handled by SCCPInstVisitor.
This patch adds SCCPInstVisitor::visitFreezeInst(FreezeInst &I)
method to handle freeze instructions.
Differential Revision: https://reviews.llvm.org/D151659
Unlike every other analysis and transform, simplifyInstruction
permitted operating on instructions which are not inserted
into a function. This created an edge case no other code needs
to really worry about, and limited transforms in cases that
can make use of the context function. Only the inliner and a handful
of other utilities were making use of this, so just fix up these
edge cases. Results in some IR ordering differences since
cloned blocks are inserted eagerly now. Plus some additional
simplifications trigger (e.g. some add 0s now folded out that
previously didn't).
The SCCPSolver is using a structure (AnalysisResultsForFn) where it keeps
pointers to various analyses needed by the IPSCCP pass. These analyses are
requested all at the same time, which can become problematic in some cases.
For example one could be retrieved via getCachedAnalysis() prior to the
actual execution of the analysis. In more detail:
The IPSCCP pass uses a DomTreeUpdater to preserve the PostDominatorTree
in case the PostDominatorTreeAnalysis had run before IPSCCP. Starting with
commit 1b1232047e83b the IPSCCP pass may use BlockFrequencyAnalysis for
some functions in the module. As a result, the PostDominatorTreeAnalysis
may not run until the BlockFrequencyAnalysis has run, since the latter
analysis depends on the former. Currently, we setup the DomTreeUpdater
using getCachedAnalysis to retrieve a PostDominatorTree. This happens
before BlockFrequencyAnalysis has run, therefore the cached analysis can
become invalid by the time we use it.
Differential Revision: https://reviews.llvm.org/D151666
The freeze instruction has not been handled by SCCPInstVisitor.
This patch adds SCCPInstVisitor::visitFreezeInst(FreezeInst &I)
method to handle freeze instructions.
Differential Revision: https://reviews.llvm.org/D151659
As reported on https://reviews.llvm.org/D150375#4367861 and
following, this change causes PDT invalidation issues. Revert
it and dependent commits.
This reverts commit 0524534d5220da5ecb2cd424a46520184d2be366.
This reverts commit ced90d1ff64a89a13479a37a3b17a411a3259f9f.
This reverts commit 9f992cc9350a7f7072a6dbf018ea07142ea7a7ed.
This reverts commit 1b1232047e83b69561fd64b9547cb0a0d374473a.
For pseudo probes we would like to keep their original dwarf discriminator (either a zero or null) until the first FS-discriminator pass. The inliner is a violation of that, given that it assigns inlinee instructions with no debug info with the that of the callsite. This is being disabled in this patch.
Reviewed By: wenlei
Differential Revision: https://reviews.llvm.org/D151568
SCEVExpander keeps track of all instructions it inserted. However,
it currently misses some phi nodes created during LCSSA construction.
Fix this by collecting these into another argument.
This also removes the IRBuilder argument, which was added for
essentially the same purpose, but only handles the root LCSSA nodes,
not those inserted by SSAUpdater.
This was reported as a regression on D149344, but the reduced test
case also reproduces without it.
Differential Revision: https://reviews.llvm.org/D150681
This also allows us to peel loops with a `select`:
```
for (int i = 0; i <= N; ++i);
f3(i == 0 ? a : b); // select instruction
```
into:
```
f3(a); // peel one iteration
for (int i = 1; i <= N; ++i)
f3(b);
```
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D151052
The method is marked for deprecation. Delete the method and move all of
its consumers to use the DomTreeUpdater version.
Reviewed By: foad
Differential Revision: https://reviews.llvm.org/D149428
salvageDebugInfo is a function that allows us to reatin debug info for
instructions that have been optimized out. Currently, it doesn't support
salvaging the debug information from icmp instrcutions, but DWARF
expressions can emulate an icmp by using the DWARF conditional
expressions. This patch adds support for salvaging debug information
from icmp instructions.
Differential Revision: https://reviews.llvm.org/D150216
Using AvgLoopIters on any loop is too imprecise making the cost model
favor users inside loop nests regardless of the actual tripcount.
Differential Revision: https://reviews.llvm.org/D150375
This patch-set aims to simplify the existing RVV segment load/store
intrinsics to use a type that represents a tuple of vectors instead.
To achieve this, first we need to relax the current limitation for an
aggregate type to be a target of load/store/alloca when the aggregate
type contains homogeneous scalable vector types. Then to adjust the
prolog of an LLVM function during lowering to clang. Finally we
re-define the RVV segment load/store intrinsics to use the tuple types.
The pull request under the RVV intrinsic specification is
riscv-non-isa/rvv-intrinsic-doc#198
---
This is the 1st patch of the patch-set. This patch is originated from
D98169.
This patch allows aggregate type (StructType) that contains homogeneous
scalable vector types to be a target of load/store/alloca. The RFC of
this patch was posted in LLVM Discourse.
https://discourse.llvm.org/t/rfc-ir-permit-load-store-alloca-for-struct-of-the-same-scalable-vector-type/69527
The main changes in this patch are:
Extend `StructLayout::StructSize` from `uint64_t` to `TypeSize` to
accommodate an expression of scalable size.
Allow `StructType:isSized` to also return true for homogeneous
scalable vector types.
Let `Type::isScalableTy` return true when `Type` is `StructType`
and contains scalable vectors
Extra description is added in the LLVM Language Reference Manual on the
relaxation of this patch.
Authored-by: Hsiangkai Wang <kai.wang@sifive.com>
Co-Authored-by: eop Chen <eop.chen@sifive.com>
Reviewed By: craig.topper, nikic
Differential Revision: https://reviews.llvm.org/D146872
For some reason the inliner calls simplifyInstruction with disembodied
instructions. I consider this to be an API defect. Either the instruction
should always be inserted prior to simplification, or we at least
should pass in the new function for the context.
This remove a bunch of #include statements in Scalar.cpp. I do not
think those should be needed any longer (assuming that they once
upon a time possibly were needed for legacy PM C bindings, but
that is not supported any longer).
Also removing some other #include statements not needed any longer
due to deprecation of legacy PM.
Differential Revision: https://reviews.llvm.org/D149438
This commit removes constness from DILocation::getMergedLocation and
fixes all its users accordingly.
Having constness on the parameters forced the return type to be const
as well, which does force usage of `const_cast` when the location needs
to be used in metadata nodes.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D149942
Adds support to set the hot/cold new hint values with an option. Change
the defaults slightly to make it easier to distinguish between compiler
synthesized vs manually inserted calls to the interface.
Differential Revision: https://reviews.llvm.org/D150488
This diff facilitates a new stale profile matching in BOLT: D144500
This is a no-op for existing usages of profi (CSSPGO).
Reviewed By: hoy
Differential Revision: https://reviews.llvm.org/D150466
A pseudo probe is created with dwarf line information shared with its nearest instruction. If the instruction comes with a dwarf discriminator, it will be shared with the probe as well. This can confuse the later FS-AFDO discriminator assignment pass. To fix this, I'm cleaning up the discriminator fields for probes when they are inserted.
I also notice another possibility to change the discriminator field of pseudo probes in the pipeline before the FS discriminator assignment pass. That is the loop unroller, which assigns duplication factor to instruction being vectorized. I'm disabling that for pseudo probe intrinsics specifically, also for callsites with probes.
Reviewed By: wenlei
Differential Revision: https://reviews.llvm.org/D148569
Annotation metadata supports adding singular annotation strings to annotation block. This patch adds the ability to insert a tuple of strings into the metadata array.
The idea here is that each tuple of strings represents a piece of information that can be all related. It makes it easier to parse through related metadata information given it will be contained in one tuple.
For example in remarks any pass that implements annotation remarks can have different type of remarks and pass additional information for each.
The original behaviour of annotation remarks is preserved here and we can mix tuple annotations and single annotations for the same instruction.
Reviewed By: paquette
Differential Revision: https://reviews.llvm.org/D148328
The patch adds new member MaybeEVL into InterestingMemoryOperand to represent
the effective vector length for vp intrinsics. It may be extended for some target intrinsics in the future.
Reviewed By: kito-cheng
Differential Revision: https://reviews.llvm.org/D146208
We are simplifying the loop and all its children. Each time, we
invalidate the top-most loop. The top-most loop is going to be
the same every time. The cost of SCEV invalidation is largely
independent from how data about the loop is actually cached, so
we should avoid redundant invalidations.
The asm in a naked function may reasonably expect the argument registers and the
return address register (if present) to be live.
When using -pg and -finstrument-functions, functions are instrumented by adding
a function call to `_mcount/__cyg_profile_func_enter/__cyg_profile_func_enter_bare`/etc,
which will clobber these registers. If the return address register is clobbered,
the function will be unable to return to the caller, possibly causing an
infinite loop.
```
__attribute__((naked)) void g() {
#if defined(__arm__)
__asm__("bx lr");
#else
__asm__("ret");
#endif
}
int main() { g(); }
```
It seems that the only one reasonable way to handle the combination is to
disable instrumenting for naked functions.
GCC PR: https://gcc.gnu.org/PR109707
Close https://github.com/llvm/llvm-project/issues/62504
Reviewed By: hans
Differential Revision: https://reviews.llvm.org/D149721