KMSAN defaults to `msan-handle-asm-conservative`, which inserts
`__msan_instrument_asm_store` calls to unpoison indirect outputs in
inline assembly (e.g. `=m` constraints in source).
```c
unsigned f() {
  unsigned v;
  // __msan_instrument_asm_store unpoisons v before invoking the asm.
  asm("movl $1,%0" : "=m"(v));
  return v;
}
```
Extend the mechanism to userspace, but require explicit
`-mllvm -msan-handle-asm-conservative` for experiments for now.
As
https://docs.kernel.org/dev-tools/kmsan.html#inline-assembly-instrumentation
says, this approach may mask certain errors (an indirect output may not
actually be initialized), but it also helps to avoid a lot of false
positives.
Link: https://github.com/google/sanitizers/issues/192
This commit modifies `LoopDeletion::deleteLoopIfDead` to check if the
exit block of a loop is an EH pad before checking if the loop gets
executed. This handles the case where an unreachable loop has a
landingpad as an exit block and the loop gets deleted, leaving
the landingpad without an edge from an unwind clause.
Fixes #76852.
Instead of using the debug location of the underlying instruction, use
the debug location from the recipe. This removes an unneeded dependency
on the underlying instruction.
This patch introduces a new common base class for recipes defining a
single result VPValue. This has been discussed/mentioned at various
previous reviews as potential follow-up and helps to replace various
getVPSingleValue calls.
PR: https://github.com/llvm/llvm-project/pull/77023
`ResumeIndex` isn't part of the frame struct header, so it necessarily
appears after the promise.
Co-authored-by: Yoni Lavi <yoni.lavi@nextsilicon.com>
LoopVectorizer is aware when a target can replace a scalable frem
instruction with a vector library call for a given VF and it returns the
relevant cost. Otherwise, it returns an invalid cost (as previously).
Add tests that check costs on AArch64, when there is no vector library
available and when there is (with and without tail-folding).
NOTE: Invoking CostModel directly (not through LV) would still return
invalid costs.
In preparation for the major chunk of the assignment tracking
implementation, this patch adds a new set of overloaded versions of
existing functions that take DbgVariableIntrinsics, with the overloads
taking DPValues. This is used specifically to allow us to use generic code
to handle both DbgVariableIntrinsics and DPValues, reducing code
duplication. This patch doesn't actually add the uses of these functions.
This changes the AliasSetTracker to track memory locations instead of
pointers in its alias sets. The motivation for this is outlined in an RFC
posted on LLVM discourse:
https://discourse.llvm.org/t/rfc-dont-merge-memory-locations-in-aliassettracker/73336
In the data structures of the AST implementation, I made the choice to
replace the linked list of `PointerRec` entries (that had to go anyway)
with a simple flat vector of `MemoryLocation` objects, but for the
`AliasSet` objects referenced from a lookup table, I retained the
mechanism of a linked list, reference counting, forwarding, etc. The
data structures could be revised in a follow-up change.
Fixes https://github.com/llvm/llvm-project/issues/76549
The optimization was missed because:
1. `optimizePow` converts almost-integer FP exponents to integers,
turning `pow` into `powi`.
2. `optimizeLog` does not accept `Intrinsic::powi` as a target.
This patch converts constantInt back to constantFP where applicable and
adds a test.
As suggested in https://github.com/llvm/llvm-project/pull/75823, to
avoid confusion with std::function_ref, qualify all uses with llvm::
(we were already using the llvm version, but this avoids ambiguity).
!isa<Constant>(GEPIdx)' failed.
A non-constant index might be folded to a constant during earlier stages
of vectorization. Need to consider this case and filter out GEPs
with constant indices from the candidates list.
Those are probably leftovers from an old name of the same attribute.
Fixed for the sake of consistency.
Co-authored-by: Yoni Lavi <yoni.lavi@nextsilicon.com>
times during reduction vectorization.
If the external value was replaced in the vectorizer several times during reduction vectorization, need to find the original value to correctly handle external uses and emit extractelement instructions properly.
Fixes #78049
This patch:
- Ignores unreachable predecessors when looking for the nearest common
dominator.
- Checks for catchswitch with `getFirstNonPHI` instead of
`getFirstInsertionPt`. The latter skips EH pads.
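The first point can be sketched in plain C. This is a toy model and not the LLVM implementation: blocks are integers, `idom[]` is a hypothetical immediate-dominator table in which the entry block is its own idom and `-1` marks an unreachable block, and all function names are made up for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Nearest common dominator of two reachable blocks in the toy table. */
static int ncd(const int *idom, int a, int b)
{
    assert(idom[a] >= 0 && idom[b] >= 0); /* both must be reachable */
    int depth_a = 0, depth_b = 0;
    for (int n = a; idom[n] != n; n = idom[n])
        depth_a++;
    for (int n = b; idom[n] != n; n = idom[n])
        depth_b++;
    /* Bring both nodes to the same depth, then walk up together. */
    while (depth_a > depth_b) { a = idom[a]; depth_a--; }
    while (depth_b > depth_a) { b = idom[b]; depth_b--; }
    while (a != b) { a = idom[a]; b = idom[b]; }
    return a;
}

/* Fold ncd() over the predecessors, skipping unreachable ones.
 * Returns -1 if no predecessor is reachable. */
static int ncd_of_reachable_preds(const int *idom, const int *preds,
                                  size_t count)
{
    int result = -1;
    for (size_t i = 0; i < count; i++) {
        int p = preds[i];
        if (idom[p] < 0)
            continue; /* unreachable: it has no place in the tree */
        result = (result < 0) ? p : ncd(idom, result, p);
    }
    return result;
}

/* Example CFG: 0 is the entry, 1 is dominated by 0, 2 and 3 are
 * dominated by 1, and 4 is unreachable. */
static int example_ncd(void)
{
    const int idom[] = {0, 0, 1, 1, -1};
    const int preds[] = {2, 3, 4};
    return ncd_of_reachable_preds(idom, preds, 3);
}
```

In this example the unreachable predecessor 4 is skipped and the result is block 1.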
The following internal error occurred when using the native VPlan path to
vectorize a program with debug info generation enabled.
Assertion `!isa<DbgInfoIntrinsic>(CI) && "DbgInfoIntrinsic should have been dropped during VPlan construction"' failed.
This patch ignores all debug instructions to fix the error when the native
VPlan path is enabled.
Fixes the buildbot failure in
https://github.com/llvm/llvm-project/pull/78134#issuecomment-1892195197
When we meet a path with a single `determinator`, the determinator
actually takes itself as a predecessor. Thus, we need to let `Prev` be
the determinator when `PathBBs` has only one element.
`(ctpop (not x))` <-> `(sub nuw nsw BitWidth(x), (ctpop x))`. The
`sub` expression can sometimes be constant folded depending on the use
case of `(ctpop (not x))`.
This patch adds fold for the following cases:
`(add/sub/disjoint_or C, (ctpop (not x)))`
-> `(add/sub/disjoint_or C', (ctpop x))`
`(cmp pred C, (ctpop (not x)))`
-> `(cmp swapped_pred C', (ctpop x))`
Where `C'` depends on how we constant fold `C` with `BitWidth(x)` for
the given opcode.
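As an illustration only (not code from the patch), the identity and two of the folds can be checked in plain C for a 32-bit `x`. All helper names are hypothetical, and `popcount32` stands in for `ctpop`. In this sketch the `add` turns into a `sub` with `C' = C + 32`, and `ult` turns into `ugt` with `C' = 32 - C`.

```c
#include <assert.h>
#include <stdint.h>

/* Portable stand-in for ctpop on a 32-bit value. */
static unsigned popcount32(uint32_t x)
{
    unsigned n = 0;
    while (x) {
        x &= x - 1; /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* Unfolded: (add C, (ctpop (not x))). */
static unsigned add_unfolded(unsigned c, uint32_t x)
{
    return c + popcount32(~x);
}

/* Folded: (sub C', (ctpop x)) with C' = C + 32. */
static unsigned add_folded(unsigned c, uint32_t x)
{
    return (c + 32u) - popcount32(x);
}

/* Unfolded: (cmp ult C, (ctpop (not x))). */
static int ult_unfolded(unsigned c, uint32_t x)
{
    return c < popcount32(~x);
}

/* Folded: (cmp ugt C', (ctpop x)) with C' = 32 - C. */
static int ult_folded(unsigned c, uint32_t x)
{
    assert(c <= 32u); /* assumption: keeps 32 - C from wrapping */
    return (32u - c) > popcount32(x);
}
```

For `x = 0xF0` (four bits set), `add_unfolded(5, x)` and `add_folded(5, x)` both give 33.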
Proofs: https://alive2.llvm.org/ce/z/qUgfF3
Closes #77859
This patch follows on from comments on
https://github.com/llvm/llvm-project/pull/73498, implementing the
proposed split of findDbgDeclares into two separate functions for
DbgDeclareInsts and DPVDeclares, which return containers rather than
taking containers by reference.
the second vector.
Need to transform all elements in the long mask if we decide to
produce a shorter version; otherwise some elements may still have
incorrect indices after the transformation for the first vector in the
permutation.
https://github.com/llvm/llvm-project/pull/76669 taught SimplifyCFG to
handle switches when `default` has only one case. When the `switch`'s
condition is wider than 64 bits, the current implementation can calculate
the wrong default value. This PR skips cases where the condition is too
wide.
StackSafetyAnalysis determines whether stack-allocated variables are
guaranteed to be safe from memory access bugs and enables the removal of
certain unneeded instrumentations.
(hwasan enables StackSafetyAnalysis in https://reviews.llvm.org/D108381)
In a release build of clang, text sections are 9% smaller.
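A rough C illustration of the distinction, with hypothetical function names. Whether the analysis actually proves a given access safe depends on the IR it sees, so treat the comments as the intended behavior rather than a guarantee.

```c
/* Every access to `buf` uses a constant, in-bounds index, so a stack
 * safety analysis can prove the accesses safe and the instrumentation
 * for them can be dropped. */
int sum_ends(void)
{
    int buf[4] = {1, 2, 3, 4};
    return buf[0] + buf[3];
}

/* The index comes from the caller, so the access cannot be proven
 * in-bounds from this function alone and stays instrumented.
 * Callers must pass i < 4. */
int pick(unsigned i)
{
    int buf[4] = {1, 2, 3, 4};
    return buf[i];
}
```

This is also why some of the tests listed below switch to an indexed store or add a load: it keeps the variable from being proven safe.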
Test updates:
* asan-stack-safety.ll: test the -asan-use-stack-safety=1 default
* lifetime-uar-uas.ll: switch to an indexed store to prevent
StackSafetyAnalysis from optimizing out instrumentation for %c
* alloca_vla_interact.cpp: add a load to prevent StackSafetyAnalysis
from optimizing out `__asan_alloca_poison` for the VLA `array`
* scariness_score_test.cpp: add -asan-use-stack-safety=0 to make a load
of a `__asan_poison_memory_region`-poisoned local variable fail as
intended.
* other .ll tests: add -asan-use-stack-safety=0
Reviewed By: kstoimenov
Pull Request: https://github.com/llvm/llvm-project/pull/77210
The term folding logic needs to prove that the induction variable does
not cycle through the same set of values so that testing for the value
of the IV on the exiting iteration is guaranteed to trigger only on that
iteration. The prior code checked the no-self-wrap property on the IV,
but this is insufficient as a zero step is trivially no-self-wrap per
SCEV's definition but does repeat the same series of values.
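To see why no-self-wrap alone is not enough, here is a small C model (a hypothetical helper, not LSR code) that finds the first iteration on which the IV equals its value on the exiting iteration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Model an IV {start,+,step} over `trip_count` iterations and return the
 * first iteration (0-based) whose IV value equals the value on the last
 * iteration. Testing for the final value is only sound if that first
 * match is the last iteration.
 */
static unsigned first_match_iteration(uint32_t start, uint32_t step,
                                      unsigned trip_count)
{
    assert(trip_count >= 1);
    uint32_t final_value = start + step * (uint32_t)(trip_count - 1);
    uint32_t iv = start;
    for (unsigned i = 0; i < trip_count; i++) {
        if (iv == final_value)
            return i;
        iv += step;
    }
    return trip_count; /* not reached: the last iteration always matches */
}
```

With a step of 4 and 10 iterations the first match is iteration 9, the exiting one. With a step of 0 the match is already on iteration 0, so an exit test on the final IV value would fire too early.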
In its current form, this basically disables LSR's
term-folding for all non-constant strides. This is still a net
improvement: we've disabled term-folding entirely, so being able to
enable it for constant strides is a step forward.
As future work, there are two SCEV weaknesses worth investigating.
First, sext (or i32 %a, 1) to i64 does not return true for
isKnownNonZero. This is because we check only the unsigned range in that
query. We could either do query pushdown, or check the signed range as
well. I tried the second locally and it has very broad impact - i.e. we
have a bunch of missing optimizations here.
Second, zext (or i32 %a, 1) to i64 as the increment to the IV in
expensive_expand_short_tc causes the addrec to no longer be provably
no-self-wrap. I didn't investigate this so it might be necessary, but
the loop structure is such that I find this result surprising.
- [DebugMetadata][DwarfDebug] Support function-local types in lexical
block scopes (4/7)
- [CloneFunction][DebugInfo] Avoid cloning DILocalVariables of inlined
functions
This is a follow-up for https://reviews.llvm.org/D144006, fixing a crash
reported in Chromium (https://reviews.llvm.org/D144006#4651955).
The first commit is added for convenience, as it has already been
accepted.
If a DISubprogram was not cloned (e.g. we are cloning a function that has
other functions inlined into it, and subprograms of the inlined functions
are not supposed to be cloned), it doesn't make sense to clone its
DILocalVariables either.
Otherwise we get duplicated DILocalVariables that are not tracked in their
subprogram's retainedNodes, which crash LTO with Chromium.
This is meant to be committed along with
https://reviews.llvm.org/D144006.