When exception handling with setjmp/longjmp (exception-mode=eh_sjlj) is enabled, the eh_sjlj_callsite intrinsic is inserted in the same basic block as the throwing/exception instruction. This fix ensures that the stack protector insertion code does not split the block and move the two apart into different basic blocks.
This is a reland for 99e53cb4139eda491f97cb33ee42ea424d352200 with the
appropriate test fixes.
It's possible for __stack_chk_fail to be an alias when using CrossDSOCFI, since it will make a jump table entry for this function and replace it with an alias. StackProtector can crash because it always expects this to be a regular function. Instead, add the noreturn attribute to the call.
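A minimal sketch of the approach, assuming the handler is reached through a FunctionCallee (the exact insertion point in the pass differs):

```cpp
#include "llvm/IR/IRBuilder.h"
using namespace llvm;

// __stack_chk_fail may be a GlobalAlias under CrossDSO-CFI, so don't set
// attributes on the callee; mark the call site itself noreturn instead.
static void emitStackChkFail(IRBuilder<> &B, FunctionCallee StackChkFail) {
  CallInst *Call = B.CreateCall(StackChkFail);
  Call->addFnAttr(Attribute::NoReturn);
  B.CreateUnreachable();
}
```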
With the advent of intrinsic-less debug-info, we no longer need to scatter calls to getPrevNonDebugInstruction around the codebase. Remove most of them; there are one or two that have the "SkipPseudoOp" flag turned on, but they don't seem to be in positions where skipping anything would be reasonable.
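For illustration, the shape of the change at a typical call site (a sketch; the helper name prevInst is made up):

```cpp
#include "llvm/IR/Instruction.h"
using namespace llvm;

Instruction *prevInst(Instruction &I) {
  // Before: I.getPrevNonDebugInstruction(), skipping llvm.dbg.* calls.
  // With intrinsic-less debug info, debug records are no longer
  // instructions, so a plain predecessor walk gives the same result.
  return I.getPrevNode();
}
```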
The compiler should not introduce calls to arbitrary strings
that aren't defined in RuntimeLibcalls. Previously OpenBSD was
disabling the default __stack_chk_fail, but there was no record
of the alternative __stack_smash_handler function it emits instead.
This also avoids a random triple check in the pass.
These are identified by misc-include-cleaner. I've filtered out those that break builds. I'm also staying away from llvm-config.h, config.h, and Compiler.h, which would likely cause platform- or compiler-specific build failures.
The issue was caused by [D133860](https://reviews.llvm.org/D133860): in some cases the guard would be inserted in the wrong place, as in the test case shown below.
This patch fixes the issue by using `isInTailCallPosition()` to verify that the tail call is in the right position.
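A hedged sketch of the check (simplified; the pass has more context around it):

```cpp
#include "llvm/CodeGen/Analysis.h"
#include "llvm/IR/Instructions.h"
using namespace llvm;

// Only treat a call as a guard-check insertion point if it is marked as a
// tail call *and* actually sits in tail-call position.
static bool isGuardRelevantTailCall(const CallInst &CI,
                                    const TargetMachine &TM) {
  return CI.isTailCall() && isInTailCallPosition(CI, TM);
}
```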
The module currently stores the target triple as a string. This means
that any code that wants to actually use the triple first has to
instantiate a Triple, which is somewhat expensive. The change in #121652
caused a moderate compile-time regression due to this. While it would be
easy enough to work around, I think that architecturally, it makes more
sense to store the parsed Triple in the module, so that it can always be
directly queried.
For this change, I've opted not to add any magic conversions between std::string and Triple for backwards-compatibility purposes, and instead write out the needed Triple()s or str()s explicitly. This is because I think a decent number of them should be changed to work on Triple as well, to avoid unnecessary conversions back and forth.
The only interesting part of this patch is that the default triple is Triple("") instead of Triple() to preserve existing behavior. The former defaults to the ELF object format instead of the unknown object format. We should fix that as well.
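A small sketch of what the explicit conversions look like, assuming the post-change signatures setTargetTriple(Triple) and getTargetTriple() returning a parsed Triple:

```cpp
#include "llvm/IR/Module.h"
#include "llvm/TargetParser/Triple.h"
using namespace llvm;

void configure(Module &M, StringRef TT) {
  // Parse once at the boundary instead of storing a raw string.
  M.setTargetTriple(Triple(TT));
  // Queries no longer need to re-instantiate a Triple.
  const Triple &T = M.getTargetTriple();
  if (T.isOSBinFormatELF()) {
    // ... object-format-specific setup ...
  }
  // Convert back to a string explicitly where one is still needed.
  std::string Str = T.str();
  (void)Str;
}
```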
Despite the name, the HasAddressTaken() heuristic identifies not only
allocas that have their address taken, but also those that have accesses
that cannot be proven to be in-bounds.
However, the current handling of phi nodes is incorrect. Phi nodes are only visited once, and the analysis uses whichever (remaining) allocation size happens to be passed the first time the phi node is visited. If it is later visited with a smaller remaining size, which may permit out-of-bounds accesses, this will not be detected.
Fix this by keeping track of the smallest remaining allocation size seen so far and redoing the analysis when it decreases. To avoid degenerate cases (including via loops), limit the number of allowed decreases to a small constant.
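A minimal sketch of that bookkeeping; the names SmallestSeen, NumDecreases, and MaxDecreases are illustrative, not the exact upstream code:

```cpp
#include "llvm/ADT/DenseMap.h"
#include "llvm/IR/Instructions.h"
#include "llvm/Support/TypeSize.h"
using namespace llvm;

constexpr unsigned MaxDecreases = 3; // cap to avoid degenerate cases/loops

// Returns true if the phi should be (re)analyzed with this remaining size.
static bool shouldVisitPHI(DenseMap<const PHINode *, TypeSize> &SmallestSeen,
                           unsigned &NumDecreases, const PHINode *PN,
                           TypeSize Remaining) {
  auto [It, Inserted] = SmallestSeen.try_emplace(PN, Remaining);
  if (Inserted)
    return true; // first visit: analyze with this size
  if (TypeSize::isKnownGE(Remaining, It->second))
    return false; // already analyzed with a size at least this small
  if (++NumDecreases > MaxDecreases)
    return false; // give up; caller should conservatively assume unsafe
  It->second = Remaining; // smaller size seen: redo the analysis
  return true;
}
```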
Atomicrmw xchg can directly take a pointer operand, so we should treat it similarly to store or cmpxchg.
In practice, I believe that all targets that support stack protectors will convert this to an integer atomicrmw xchg in AtomicExpand, so there is no issue today. We should still handle it correctly if that doesn't happen.
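A sketch of the distinction (simplified; the pass's escape check has many more cases):

```cpp
#include "llvm/IR/Instructions.h"
using namespace llvm;

// For atomicrmw xchg, the alloca's address escapes only when it is the
// value being stored, not when it is the pointer being accessed; the
// same distinction already made for store and cmpxchg.
static bool xchgTakesAddress(const AtomicRMWInst &RMW,
                             const Value *AllocaPtr) {
  return RMW.getOperation() == AtomicRMWInst::Xchg &&
         RMW.getValOperand() == AllocaPtr;
}
```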
Rename the function to reflect its correct behavior and to be consistent with `Module::getOrInsertFunction`. This is also in preparation for adding a new `Intrinsic::getDeclaration` that will behave like `Module::getFunction` (i.e., just lookup, no creation).
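Before and after at a typical call site (a sketch using the renamed API):

```cpp
#include "llvm/IR/Intrinsics.h"
#include "llvm/IR/Module.h"
using namespace llvm;

Function *getStackProtectorDecl(Module *M) {
  // Previously: Intrinsic::getDeclaration(M, Intrinsic::stackprotector);
  // The new name makes the insert-if-missing behavior explicit:
  return Intrinsic::getOrInsertDeclaration(M, Intrinsic::stackprotector);
}
```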
The original `StackProtector` is both a transform and an analysis pass; break it into two passes. `getAnalysis<StackProtector>()` can now be replaced by `FAM.getResult<SSPLayoutAnalysis>(F)` in the new pass manager.
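A sketch of the new-pass-manager side for a hypothetical consumer pass:

```cpp
#include "llvm/CodeGen/StackProtector.h"
#include "llvm/IR/PassManager.h"
using namespace llvm;

struct MyPass : PassInfoMixin<MyPass> { // hypothetical consumer
  PreservedAnalyses run(Function &F, FunctionAnalysisManager &FAM) {
    // Legacy PM: getAnalysis<StackProtector>() on the combined pass.
    // New PM: query the split-out analysis directly.
    auto &Info = FAM.getResult<SSPLayoutAnalysis>(F);
    (void)Info; // ... use the SSP layout information ...
    return PreservedAnalyses::all();
  }
};
```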
It seems TypeSize is currently broken in the sense that:
TypeSize::Fixed(4) + TypeSize::Scalable(4) => TypeSize::Fixed(8)
without failing its assert that explicitly tests for this case:
assert(LHS.Scalable == RHS.Scalable && ...);
The reason this fails is that `Scalable` is a static method of class TypeSize, and LHS and RHS are both objects of class TypeSize. So the assert compares the pointer to the function `Scalable` with the pointer to the function `Scalable`, which is always true because LHS and RHS have the same class.
This patch fixes the issue by renaming `TypeSize::Scalable` to `TypeSize::getScalable` and `TypeSize::Fixed` to `TypeSize::getFixed`, so that the names no longer clash with the variable in FixedOrScalableQuantity.
The new methods now also better match the coding standard, which
specifies that:
* Variable names should be nouns (as they represent state)
* Function names should be verb phrases (as they represent actions)
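A self-contained illustration of the gotcha described above (the class names here are stand-ins, not the real LLVM types):

```cpp
#include <cassert>

struct Base {
  bool Scalable = false; // the flag the assert was meant to compare
};

struct TypeSizeLike : Base {
  // Hides Base::Scalable in name lookup, as the old static factory did.
  static TypeSizeLike Scalable(unsigned MinVal) { (void)MinVal; return {}; }
};

void check(const TypeSizeLike &LHS, const TypeSizeLike &RHS) {
  // Name lookup finds the static function, not Base::Scalable, so this
  // compares the function with itself and is therefore always true.
  assert(LHS.Scalable == RHS.Scalable && "mixing fixed and scalable");
}
```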
This is the first of a series of patches to improve Alias Analysis on scalable quantities. Keep the Scalable information from TypeSize, which will be used in Alias Analysis.
Computing EH-related information was only relevant for analysis passes so far. Lifting it to IR will allow the IR Verifier to calculate EH funclet coloring and validate funclet operand bundles in a follow-up step.
Reviewed By: rnk, compnerd
Differential Revision: https://reviews.llvm.org/D138122
The most common use of string attributes is parsing them as integers. We don't have a convenient way to do this, and as a result we have inconsistent handling of missing and invalid attributes scattered around. We also have inconsistent radix usage with getAsInteger; some places use the default 0 and others use base 10.
Update a few of the uses, but there are quite a lot of these.
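The kind of ad-hoc pattern being described, with a hypothetical attribute name:

```cpp
#include "llvm/IR/Function.h"
using namespace llvm;

unsigned getThreshold(const Function &F, unsigned Default) {
  Attribute A = F.getFnAttribute("my-threshold"); // hypothetical attribute
  if (!A.isStringAttribute())
    return Default; // missing-attribute policy is one ad-hoc choice...
  unsigned Value;
  // ...and the radix is another: some call sites pass 0 (auto-detect),
  // others pass 10, and failure handling varies too.
  if (A.getValueAsString().getAsInteger(10, Value))
    return Default; // invalid attribute silently ignored here
  return Value;
}
```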
The IR stack protector pass should insert stack checks before all tail calls, not only musttail calls, so that the `sspreq` attribute and the `tail call` marker, which are emitted by llvm-opt, can both be honored by llvm-llc.
Reviewed By: compnerd
Differential Revision: https://reviews.llvm.org/D133860
This reverts commit 7f230feeeac8a67b335f52bd2e900a05c6098f20.
Breaks CodeGenCUDA/link-device-bitcode.cu in check-clang and many LLVM tests; see the comments on https://reviews.llvm.org/D121169
Fix a couple of things that were causing stack protection to not work
correctly in functions that have scalable vectors on the stack:
* Use TypeSize when determining if accesses to a variable are considered out-of-bounds, so that the behaviour is correct for scalable vectors (see the sketch after this list).
* When stack protection is enabled move the stack protector location
to the top of the SVE locals, so that any overflow in them (or the
other locals which are below that) will be detected.
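A minimal sketch of the first point, assuming TypeSize operands rather than raw integers (not the pass's exact logic):

```cpp
#include "llvm/Support/TypeSize.h"
using namespace llvm;

// An access is only provably in bounds if both sizes are comparable; a
// scalable access into a fixed-size allocation (or vice versa) must be
// treated as potentially out of bounds.
static bool provablyInBounds(TypeSize AccessEnd, TypeSize AllocSize) {
  if (AccessEnd.isScalable() != AllocSize.isScalable())
    return false;
  return TypeSize::isKnownLE(AccessEnd, AllocSize);
}
```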
Fixes: https://github.com/llvm/llvm-project/issues/51137
Differential Revision: https://reviews.llvm.org/D111631
This is a port of the feature that allows the StackProtector pass to omit
checking code for stack canary checks, and rely on SelectionDAG to do it at a
later stage. The reasoning behind this seems to be to prevent the IR checking
instructions from hindering tail-call optimizations during codegen.
Here we allow GlobalISel to also use that scheme. Doing so requires some analysis, using factored-out code, to determine where to generate code for the epilogs.
Not every case is handled in this patch since we don't have support for all
targets that exercise different stack protector schemes.
Differential Revision: https://reviews.llvm.org/D98200
D88631 added initial support for:
- -mstack-protector-guard=
- -mstack-protector-guard-reg=
- -mstack-protector-guard-offset=
flags, and D100919 extended these to AArch64. Unfortunately, these flags
aren't retained for LTO. Make them module attributes rather than
TargetOptions.
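Reading a setting back as a module flag might look like the sketch below; the flag name follows the option name above, though the exact encoding upstream may differ:

```cpp
#include "llvm/IR/Metadata.h"
#include "llvm/IR/Module.h"
using namespace llvm;

StringRef getStackProtectorGuard(const Module &M) {
  // Stored on the module so LTO merges and preserves it, unlike the old
  // TargetOptions-based plumbing.
  if (auto *S = dyn_cast_or_null<MDString>(
          M.getModuleFlag("stack-protector-guard")))
    return S->getString();
  return ""; // no explicit setting: fall back to the target default
}
```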
Link: https://github.com/ClangBuiltLinux/linux/issues/1378
Reviewed By: tejohnson
Differential Revision: https://reviews.llvm.org/D102742
It used to be that all of our intrinsics were call instructions, but over time we've added more and more invokable intrinsics. According to the verifier, we're up to 8 right now. As IntrinsicInst is a sub-class of CallInst, this puts us in an awkward spot where the idiomatic means of checking for an intrinsic has a false negative if the intrinsic is invoked.
This change switches IntrinsicInst from being a sub-class of CallInst to being a subclass of CallBase. This allows invoked intrinsics to be instances of IntrinsicInst, at the cost of requiring a few more casts to CallInst in places where the intrinsic really is known to be a call, not an invoke.
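The idiomatic check after the change (sketch):

```cpp
#include "llvm/IR/InstIterator.h"
#include "llvm/IR/IntrinsicInst.h"
using namespace llvm;

void visitIntrinsics(Function &F) {
  for (Instruction &I : instructions(F)) {
    // Now matches intrinsics reached via invoke as well as call.
    if (auto *II = dyn_cast<IntrinsicInst>(&I)) {
      // Cast further only where a plain call is genuinely required.
      if (auto *CI = dyn_cast<CallInst>(II)) {
        (void)CI; // ... call-only handling ...
      }
    }
  }
}
```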
After this lands and has baked for a couple days, planned cleanups:
Make GCStatepointInst an IntrinsicInst subclass.
Merge intrinsic handling in InstCombine and use idiomatic visitIntrinsicInst entry point for InstVisitor.
Do the same in SelectionDAG.
Do the same in FastISel.
Differential Revision: https://reviews.llvm.org/D99976
The IR stack protector pass must insert stack checks before the call instead of between it and the return.
Similarly, the SDAG pass should recognize that ADJCALLFRAME instructions can be part of the terminal sequence of a tail call. In that case, because such call frames cannot be nested in LLVM, the stack protection code must skip over the whole sequence (or risk clobbering argument registers).
The IR/MIR pseudo probe intrinsics don't get materialized into real machine instructions and therefore they don't incur runtime cost directly. However, they come with indirect cost by blocking certain optimizations. Some of this blocking is intentional (such as blocking code merge) for better counts quality, while the rest is accidental. This change unblocks perf-critical optimizations that do not affect counts quality. They include:
1. IR InstCombine, sinking load operation to shorten lifetimes.
2. MIR LiveRangeShrink, similar to #1
3. MIR TwoAddressInstructionPass, i.e., the opeq transform
4. MIR function argument copy elision
5. IR stack protection (not perf-critical, but nice to have).
Reviewed By: wmi
Differential Revision: https://reviews.llvm.org/D95982