
The current implementation of -fsanitize=function places two words (the prolog
signature and the RTTI proxy) at the function entry, which makes the feature
incompatible with Intel Indirect Branch Tracking (IBT) that needs an ENDBR
instruction at the function entry. To allow the combination, move the two words
before the function entry, similar to -fsanitize=kcfi.

Armv8.5 Branch Target Identification (BTI) has a similar requirement.

Note: for IBT and BTI, whether a function gets a marker instruction at the entry
generally cannot be assumed (it can be disabled by a function attribute or
stronger LTO optimizations).

It is extremely unlikely for two words preceding a function entry to be
inaccessible. One way to achieve this is by ensuring that a function is aligned
at a page boundary and making the preceding page unmapped or unreadable.
This is not reasonable for application or library code.
(Think: the first text section has crt* code not instrumented by
-fsanitize=function.)

We use 0xc105cafe for all targets. .long 0xc105cafe disassembles to invalid
instructions on all architectures I have tested, except Power where it is
`lfs 8, -13570(5)` (Load Floating-Point with a weird offset, unlikely to be
used in real code).

---

For the removed function in AsmPrinter.cpp, remove an assert: `mdconst::extract`
already asserts non-nullness.

For compiler-rt/test/ubsan/TestCases/TypeCheck/Function/function.cpp,
when the function doesn't have prolog/epilog (-O1 and above), after moving the
two words, the address of the function equals the address of ret instruction,
so symbolizing the function will additionally get a non-zero column number.
Adjust the test to allow an optional column number.

```
  .long   3238382334
  .long   .L__llvm_rtti_proxy-_Z1fv
_Z1fv:   // symbolizing here retrieves the line table entry from the second .loc
  .file   0 ...
  .loc    0 1 0
  .cfi_startproc
  .loc    0 2 1 prologue_end
  retq
```

Reviewed By: peter.smith

Differential Revision: https://reviews.llvm.org/D148665
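
For illustration only, the layout described in this message (signature word at entry - 8, RTTI proxy offset at entry - 4) can be modelled in plain C on a byte buffer. This is a sketch, not the instrumentation that Clang emits: the helper names `load_word_before`, `has_function_signature`, and `make_simulated_function` are hypothetical, the proxy offset is an arbitrary value, and a buffer stands in for real code so that nothing is read from before an actual function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Signature value quoted in the commit message (decimal 3238382334). */
#define FUNCTION_SIGNATURE 0xc105cafeu

/* Hypothetical helper: load the 32-bit word that starts `bytes_before`
   bytes ahead of the simulated function entry. */
uint32_t load_word_before(const unsigned char *entry, size_t bytes_before) {
    uint32_t word;
    memcpy(&word, entry - bytes_before, sizeof word);
    return word;
}

/* Hypothetical check: the first of the two words that precede the entry
   must hold the signature. */
int has_function_signature(const unsigned char *entry) {
    return load_word_before(entry, 8) == FUNCTION_SIGNATURE;
}

/* Hypothetical builder: lay out [signature][proxy offset][code] in `buf`
   and return a pointer to the simulated entry. */
const unsigned char *make_simulated_function(unsigned char *buf,
                                             size_t buf_size,
                                             int32_t proxy_offset) {
    uint32_t signature = FUNCTION_SIGNATURE;
    assert(buf_size >= 9);
    memcpy(buf, &signature, sizeof signature);
    memcpy(buf + 4, &proxy_offset, sizeof proxy_offset);
    buf[8] = 0xc3; /* x86 ret as the first byte at the entry */
    return buf + 8;
}
```

With this layout the first byte at the entry stays free for whatever the target needs there (an ENDBR or BTI marker, or simply the first real instruction), which is the point of moving the two words.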
IRgen optimization opportunities.

//===---------------------------------------------------------------------===//

The common pattern of
--
short x; // or char, etc
(x == 10)
--
generates a zext/sext of x which can easily be avoided.

//===---------------------------------------------------------------------===//

Bitfield accesses can be shifted to simplify masking and sign
extension. For example, if the bitfield width is 8 and it is
appropriately aligned then it is a lot shorter to just load the char
directly.

//===---------------------------------------------------------------------===//

It may be worth avoiding creation of allocas for formal arguments
for the common situation where the argument is never written to or has
its address taken. The idea would be to begin generating code by using
the argument directly and if its address is taken or it is stored to
then generate the alloca and patch up the existing code.

In theory, the same optimization could be a win for block local
variables as long as the declaration dominates all statements in the
block.

NOTE: The main case we care about this for is for -O0 -g compile time
performance, and in that scenario we will need to emit the alloca
anyway currently to emit proper debug info. So this is blocked by
being able to emit debug information which refers to an LLVM
temporary, not an alloca.

//===---------------------------------------------------------------------===//

We should try to avoid generating basic blocks which only contain
jumps. At -O0, this penalizes us all the way from IRgen (malloc &
instruction overhead), all the way down through code generation and
assembly time.

On 176.gcc:expr.ll, it looks like over 12% of basic blocks are just
direct branches!

//===---------------------------------------------------------------------===//
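
As a source-level illustration of the first item above (the `short x; (x == 10)` pattern), the sketch below compares the promoted form with a comparison on the low 16 bits only. It is an analogy written in C, not what IRgen emits; the function names `eq_promoted`, `eq_narrow`, and `check_all_shorts` are hypothetical. The two forms agree because the constant 10 is representable in `short`, so no extension of x is needed to decide equality.

```c
#include <assert.h>
#include <limits.h>

/* Comparison as C defines it: x is promoted (sign-extended) to int. */
int eq_promoted(short x) {
    return (int)x == 10;
}

/* Comparison on the low 16 bits only. No sign extension is required
   because the constant 10 fits in a short. */
int eq_narrow(short x) {
    return ((unsigned)x & 0xffffu) == 10u;
}

/* Exhaustively checks that both forms agree on every short value. */
void check_all_shorts(void) {
    for (int v = SHRT_MIN; v <= SHRT_MAX; ++v) {
        short x = (short)v;
        assert(eq_promoted(x) == eq_narrow(x));
    }
}
```

Calling `check_all_shorts()` walks the whole range of `short`, so the equivalence is checked exhaustively rather than on a handful of samples.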