When looking for the largest legal integer type for a target,
`TargetLowering::findOptimalMemOpLowering` assumes that `MVT::i64` is
the largest possible integer type. The patch removes this assumption and
uses `MVT::LAST_INTEGER_VALUETYPE` instead.
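A minimal sketch of the idea, assuming a toy enum in place of the real
MVT list. `SimpleInt`, `LastInteger` and `largestLegalInt` are
hypothetical names used only for illustration; the search simply starts
at the last integer type rather than at a hard-coded 64-bit type.

```cpp
#include <array>
#include <cstddef>
#include <optional>

// Hypothetical stand-in for a small slice of the integer value types,
// ordered from narrowest to widest.
enum SimpleInt : std::size_t { I8, I16, I32, I64, I128, LastInteger = I128 };

// Walk down from the last integer type instead of starting at a
// hard-coded I64, and return the widest type the toy target marks legal.
std::optional<SimpleInt>
largestLegalInt(const std::array<bool, LastInteger + 1> &Legal) {
  for (std::size_t T = LastInteger + 1; T-- > 0;)
    if (Legal[T])
      return static_cast<SimpleInt>(T);
  return std::nullopt; // no legal integer type at all
}
```

With this shape, a target that marks a 128-bit integer type as legal is
no longer capped at 64 bits.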
- Add `LiveIntervalsAnalysis`.
- Add `LiveIntervalsPrinterPass`.
- Use `LiveIntervalsWrapperPass` in legacy pass manager.
- Use `std::unique_ptr` instead of a raw pointer for `LICalc`, so the
destructor and the default move constructor can handle it correctly.
This would be the last analysis required by `PHIElimination`.
So far, the branch protection, sign return address, and guarded control
stack attributes have only been emitted as module flags to indicate that
the functions need to be generated with those features.
The problem is that in an LTO build the module flags are merged with the
`min` rule, which means that if one of the modules is not built with
sign return address, the features are turned off for all functions,
because the functions take the branch-protection and
sign-return-address features from the module flags. Sign-return-address
is a function-level option, so functions from files that are compiled
with -mbranch-protection=pac-ret are expected to be protected.
The inliner might inline functions with different sets of flags, as it
doesn't consider the module flags.
This patch adds the attributes to all functions and drops the checking
of the module flags for code generation.
The module flag is still used for generating the ELF markers.
It also drops the "true"/"false" values from the
branch-protection-enforcement, branch-protection-pauth-lr, and
guarded-control-stack attributes: presence of the attribute means it is
on, absence means it is off, and there is no other option.
Reland with test fixes.
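A toy model of the LTO problem, assuming each module contributes a
single integer flag (1 = requested, 0 = not requested).
`mergeModuleFlagMin`, `Function` and `isProtected` are hypothetical
names for illustration only, not LLVM APIs.

```cpp
#include <algorithm>
#include <vector>

// Merging with a "min" rule keeps the smallest value across all modules.
// Precondition: ModuleFlags is not empty.
int mergeModuleFlagMin(const std::vector<int> &ModuleFlags) {
  return *std::min_element(ModuleFlags.begin(), ModuleFlags.end());
}

// With per-function attributes, each function keeps its own setting
// regardless of what other modules in the link requested.
struct Function {
  bool SignReturnAddress; // models presence of the function attribute
};

bool isProtected(const Function &F) { return F.SignReturnAddress; }
```

One module built without the feature is enough to make the merged flag
0, while a function carrying its own attribute stays protected.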
clang-format will format the entire class when `class PHIElimination :
public MachineFunctionPass {` is changed. Format it first to reduce
unnecessary changes when porting it to the new pass manager.
Menezes, Evandro, Sebastian Pop, and Aditya Kumar. "Clustering case
statements for indirect branch predictors." arXiv preprint
arXiv:1910.02351 (2019).
https://arxiv.org/pdf/1910.02351v2
AArch64 uses MCAsmInfo::AssemblerDialect to control the style of emitted
Neon assembly. E.g. Apple platforms use AsmWriterVariantTy::Apple by
default which collides with InlineAsm::AD_Intel (both value 1). Checking
for inlineasm dialects on non-X86 platforms can thus lead to problems.
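The collision can be pictured with two unrelated enums that share a
numeric value. The enum and function names below are illustrative
stand-ins, and the guard is only a sketch of restricting the dialect
check to X86, not the actual change.

```cpp
// Two independent enumerations whose members happen to share the value 1.
enum AsmWriterVariant { Generic = 0, Apple = 1 };
enum InlineAsmDialect { AD_ATT = 0, AD_Intel = 1 };

// Hypothetical architecture tag for the sketch.
enum class Arch { X86, AArch64 };

// Only interpret the dialect number as an inline-asm dialect on X86; on
// other targets the same number may mean something else entirely.
bool isIntelInlineAsm(Arch A, unsigned Dialect) {
  return A == Arch::X86 && Dialect == static_cast<unsigned>(AD_Intel);
}
```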
4e0bd3f improved early MachineLICM's capabilities to hoist COPY from
physical registers out of a loop. However, it accidentally broke one of
MachineSink's preconditions on sinking cheap instructions (in this case,
COPY), which considers those instructions profitable to sink only
when there are at least two of them in the same def-use chain in the
same basic block. So if early MachineLICM hoists one of them out,
MachineSink no longer sinks the rest of the cheap instructions. This
results in redundant load immediate instructions in the motivating
example we've seen on RISC-V.
This patch fixes this by teaching MachineSink that if there is more than
one demand to sink a register into the same block from different
critical edges, it should be considered profitable as it increases the
CSE opportunities.
This change also improves two of the AArch64 cases.
As pointed out by @preames, ConstantRange can wrap so it's possible
for zero to be in a range without zero being the minimum. This fixes
this by checking contains instead.
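A small sketch of why a bound check is not enough, assuming a toy 8-bit
half-open range `[Lower, Upper)` that may wrap. `WrappedRange8` is an
illustrative type, not LLVM's `ConstantRange`, and the empty/full cases
(`Lower == Upper`) are not modelled.

```cpp
#include <cstdint>

// Toy half-open range [Lower, Upper) over 8-bit values that may wrap
// around past 255 back to 0.
struct WrappedRange8 {
  std::uint8_t Lower;
  std::uint8_t Upper;

  bool isWrapped() const { return Lower > Upper; }

  bool contains(std::uint8_t V) const {
    if (!isWrapped())
      return Lower <= V && V < Upper;
    // Wrapped: the range covers [Lower, 255] and [0, Upper).
    return V >= Lower || V < Upper;
  }
};
```

For the wrapped range `[250, 5)`, the lower bound is 250, yet
`contains(0)` is true, so only a membership test answers the question
reliably.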
Previously we had the same instructions being generated for `ISD::CTLZ` and `ISD::CTLZ_ZERO_UNDEF` which did not take advantage of the fact that zero is an invalid input for `ISD::CTLZ_ZERO_UNDEF`. This commit separates codegen for the two cases to allow for the optimization for the latter case.
The details of the optimization are outlined in #82075.
Fixes #82075
Co-authored-by: Manish Kausik H <hmamishkausik@gmail.com>
This re-commits d1a4f0c9fb559eb4c2fb56112e56343bcd333edc after
an issue was fixed in f92bfca9fc217cad9026598ef6755e711c0be070
("[AArch64] All bits of an exact right shift are demanded (#97448)").
If a function's address is taken, which means it may be called via a function pointer,
we need the function descriptor for it.
Otherwise, the function descriptor can be omitted for external symbols.
GNU assembler 2.26 introduced the .cfi_label directive. It does not
expand to any CFI instructions, but defines a label in
.eh_frame/.debug_frame, which can be used by runtime patching code to
locate the FDE. .cfi_label is not allowed for CIE's initial
instructions, and can therefore be used to force the next instruction to
be placed in a FDE instead of a CIE.
In glibc since 2018, sysdeps/riscv/start.S utilizes .cfi_label to force
DW_CFA_undefined to be placed in a FDE. arc/csky/loongarch ports have
copied this use.
```
.cfi_startproc
// DW_CFA_undefined is allowed for CIE's initial instructions.
// Without .cfi_label, gas would place DW_CFA_undefined in a CIE.
.cfi_label .Ldummy
.cfi_undefined ra
.cfi_endproc
```
No CFI instruction is associated with .cfi_label, so the `case
MCCFIInstruction::OpLabel:` code in BOLT is unreachable and is only
there to make -Wswitch happy.
Close #97222
Pull Request: https://github.com/llvm/llvm-project/pull/97922
Since `raw_string_ostream` doesn't own the string buffer, it is
desirable (in terms of memory safety) for users to directly reference
the string buffer rather than use `raw_string_ostream::str()`.
Work towards TODO item to remove `raw_string_ostream::str()`.
Add simple support for looking through ZEXT/ANYEXT/SEXT when doing
ComputeKnownSignBits for SHL. This is valid for the case when all
extended bits are shifted out, because then the number of sign bits
can be found by analysing the EXT operand.
A future improvement could be to pass along the "shifted left by"
information in the recursive calls to ComputeKnownSignBits, allowing
us to handle this more generically.
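A worked sketch of the reasoning, assuming an i8 operand sign-extended
to i32 and a shift amount of at least 24, so that all 24 extended bits
are shifted out. All function names are hypothetical. Since the
extended bits disappear, the kind of extension does not affect the
result. The derived count is a safe lower bound, not necessarily the
exact count.

```cpp
#include <cstdint>

// Number of identical leading bits (including the sign bit) in the low
// `Width` bits of V. Width must be in [1, 32].
unsigned numSignBits(std::uint32_t V, unsigned Width) {
  unsigned Sign = (V >> (Width - 1)) & 1u;
  unsigned Count = 1;
  for (unsigned Bit = Width - 1; Bit-- > 0;) {
    if (((V >> Bit) & 1u) != Sign)
      break;
    ++Count;
  }
  return Count;
}

// Sign bits of (shl (sext i8 X to i32), ShAmt), derived from X alone.
// Assumes 24 <= ShAmt < 32, i.e. all extended bits are shifted out.
unsigned signBitsOfShlSExt(std::uint8_t X, unsigned ShAmt) {
  unsigned XSignBits = numSignBits(X, 8);
  unsigned ShiftedOutOfX = ShAmt - 24; // bits of X itself that are lost
  return XSignBits > ShiftedOutOfX ? XSignBits - ShiftedOutOfX : 1;
}

// Reference: materialise the 32-bit value and count its sign bits.
unsigned signBitsByEvaluation(std::uint8_t X, unsigned ShAmt) {
  std::uint32_t Ext = static_cast<std::uint32_t>(
      static_cast<std::int32_t>(static_cast<std::int8_t>(X)));
  return numSignBits(Ext << ShAmt, 32);
}

// Exhaustively confirm the derived count never exceeds the real one.
bool derivedCountIsSafeLowerBound() {
  for (unsigned X = 0; X < 256; ++X)
    for (unsigned ShAmt = 24; ShAmt < 32; ++ShAmt)
      if (signBitsOfShlSExt(static_cast<std::uint8_t>(X), ShAmt) >
          signBitsByEvaluation(static_cast<std::uint8_t>(X), ShAmt))
        return false;
  return true;
}
```

For example, `X = 0xF0` has 4 sign bits as an i8; shifting the extended
value left by 24 keeps all 4, and shifting by 26 leaves 2.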
VSCALE is by definition greater than zero, but this checks it via
getVScaleRange anyway.
The motivation for this is to be able to check if the EVL for a VP
strided load is non-zero in #97394.
I added the tests to the RISC-V backend since the existing X86
known-never-zero.ll test crashed when trying to lower vscale for the
+sse2 RUN line.
The same assert appears in the TargetLowering function.
Refine comment to describe as a convenience wrapper and leave it to
TargetLowering documentation to explain.
#97645 proposed to remove LegalTypes from getShiftAmountTy. This patch
removes it from getShiftAmountConstant, which is one of the callers of
getShiftAmountTy.
In #97645, I proposed removing the LegalTypes operand to
TargetLowering::getShiftAmountTy. This means we don't need to use the
DAGCombiner wrapper for getShiftAmountTy that manages this flag. Now we
can use getShiftAmountConstant and let it call
TargetLowering::getShiftAmountTy.
When this flag was false, `getShiftAmountTy` would return `PointerTy`
instead of the target's preferred shift amount type for scalar shifts.
This used to be needed when the target's preferred type wasn't large
enough to support the shift amount needed for an illegal type. For
example, any scalar type larger than i256 on X86 since X86's preferred
shift amount type is i8.
For a while now, we've had code that uses `MVT::i32` if `LegalTypes` is
true, but the target's preferred type is too small. This fixed a
repeated cause of crashes where the `LegalTypes` flag wasn't set to
false when illegal types could be present.
This has made it unnecessary to set the `LegalTypes` flag correctly, and
as a result more and more places don't. So I think it's time for this
flag to go away.
This first patch just disconnects the flag. The interface and all
callers will be cleaned up in follow-up patches.
The X86 test change is because we now have the same shift type for both
shifts in a (srl (sub C, (shl X, 32)), 32) sequence. This makes the
shift amounts appear equal in value and type, which is needed to enable
a combine.
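The `MVT::i32` fallback mentioned above can be pictured with a toy
model that works on bit widths only. `bitsNeededFor` and
`shiftAmountBits` are hypothetical helpers, and the exact condition
used in LLVM is an assumption here.

```cpp
// Smallest number of bits needed to represent every value in [0, N).
unsigned bitsNeededFor(unsigned N) {
  unsigned Bits = 0;
  while ((1ull << Bits) < N)
    ++Bits;
  return Bits;
}

// Use the target's preferred shift amount width unless it cannot encode
// every in-range shift amount for a ValueBits-wide value; in that case
// fall back to 32 bits.
unsigned shiftAmountBits(unsigned ValueBits, unsigned PreferredBits) {
  return PreferredBits < bitsNeededFor(ValueBits) ? 32 : PreferredBits;
}
```

With an 8-bit preferred type, a 256-bit value still fits (shift amounts
0 to 255), while a 512-bit value needs the wider fallback.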
getSectionIDNum may return the same value for two different
MBBSectionIDs, e.g. a Cold type MBBSectionID with number 0 and a Default
type MBBSectionID with number 2 both get the value 2 from
getSectionIDNum. This may lead to entries in MBBSectionRanges being
overwritten. Using the MBBSectionID itself as the DenseMap key is a
better choice.
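A minimal sketch of the collision, using `std::map` in place of
DenseMap. `SectionID` and `lossySectionNum` are hypothetical, and the
"number plus type" encoding is an assumption chosen only because it
reproduces the example above.

```cpp
#include <cstddef>
#include <map>
#include <tuple>
#include <vector>

// Hypothetical stand-in for MBBSectionID: a section kind plus a number.
enum class SectionType { Default = 0, Exception = 1, Cold = 2 };

struct SectionID {
  SectionType Type;
  unsigned Number;
  bool operator<(const SectionID &O) const {
    return std::tie(Type, Number) < std::tie(O.Type, O.Number);
  }
};

// Assumed lossy encoding: distinct (Type, Number) pairs can collide.
unsigned lossySectionNum(const SectionID &ID) {
  return ID.Number + static_cast<unsigned>(ID.Type);
}

// Number of surviving entries when keyed by the lossy number.
std::size_t countByLossyKey(const std::vector<SectionID> &IDs) {
  std::map<unsigned, SectionID> Ranges;
  for (const SectionID &ID : IDs)
    Ranges[lossySectionNum(ID)] = ID; // a later entry overwrites an earlier one
  return Ranges.size();
}

// Number of surviving entries when keyed by the full section ID.
std::size_t countByFullKey(const std::vector<SectionID> &IDs) {
  std::map<SectionID, unsigned> Ranges;
  for (const SectionID &ID : IDs)
    Ranges[ID] = lossySectionNum(ID);
  return Ranges.size();
}
```

Keying by the full ID keeps both sections, whereas the lossy key leaves
a single entry.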
This patch fixes a miscompilation when `N` gets CSEed to `Existing`:
```
Existing: t5: i32 = sub nuw Constant:i32<0>, t3
N: t30: i32 = sub Constant:i32<0>, t3
```
Fixes https://github.com/llvm/llvm-project/issues/96366.
PHIElim deduplicates identical PHI nodes to reduce the number of copies
inserted. There are two cases:
1. Identical PHI nodes are in different blocks. That's the reason for
this optimization; this can't be avoided at SSA-level. A necessary
prerequisite for this is that the predecessors of all basic blocks
(where such a PHI node could occur) are the same. This implies that
all (>= 2) predecessors must have multiple successors, i.e. all edges
into the block are critical edges.
2. Identical PHI nodes are in the same block. CSE can remove these.
There are a few cases, however, where they still occur regardless:
- expand-large-div-rem creates PHI nodes with large integers, which
get lowered into one PHI per MVT. Later, some identical values
(zeroes) get folded, resulting in identical PHI nodes.
- peephole-opt occasionally inserts PHIs for the same value.
- Some pseudo instruction emitters create redundant PHI nodes (e.g.,
AVR's insertShift), merging the same values more than once.
In any case, this happens rarely and MachineCSE handles most cases
anyway, so that PHIElim only gets to see very few of such cases (see
changed test files).
Currently, all PHI nodes are inserted into a DenseMap that checks
equality not by pointer but by operands. This hash map is pretty
expensive (hashing itself and the hash map), but only really useful in
the first case.
Avoid this expensive hashing most of the time by restricting it to basic
blocks with only critical input edges. This improves performance for
code with many PHI nodes, especially at -O0. (Note that Clang often
doesn't generate PHI nodes and -O0 includes no mem2reg. Other
compilers always generate PHI nodes.)
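The restriction can be sketched on a toy CFG, assuming blocks are
indices and `Succs[B]` lists the successors of block `B`.
`allInputEdgesCritical` is a hypothetical helper, not the code in
PHIElim.

```cpp
#include <cstddef>
#include <vector>

// Toy CFG: Succs[B] lists the successor block indices of block B.
using CFG = std::vector<std::vector<std::size_t>>;

// True if block B has at least two incoming edges and every predecessor
// has more than one successor, i.e. every edge into B is critical.
bool allInputEdgesCritical(const CFG &Succs, std::size_t B) {
  std::size_t NumPreds = 0;
  for (const auto &Out : Succs) {
    for (std::size_t S : Out) {
      if (S != B)
        continue;
      ++NumPreds;
      if (Out.size() < 2)
        return false; // edge from a single-successor block is not critical
    }
  }
  return NumPreds >= 2;
}
```

Only blocks for which this holds would need the operand-based hashing;
in a plain diamond, the join block fails the test because its
predecessors each have a single successor.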
…ain cases" (#97246)
This reverts commit e6a961dbef773b16bda2cebc4bf9f3d1e0da42fc.
There is no difference from the original change. I re-ran the failed
test and it passed. So the failure wasn't caused by this change.
test result: https://lab.llvm.org/buildbot/#/builders/176/builds/585
With `--trap-unreachable`, `clang` can emit double `TRAP` instructions
for code that contains a call to `__builtin_trap()`:
```
> cat test.c
void test() { __builtin_trap(); }
> clang test.c --target=x86_64 -mllvm --trap-unreachable -O1 -S -o -
...
test:
...
ud2
ud2
...
```
`SimplifyCFGPass` inserts `unreachable` after a call to a `noreturn`
function, and later this instruction causes `TRAP/G_TRAP` to be emitted
in `SelectionDAGBuilder::visitUnreachable()` or
`IRTranslator::translateUnreachable()` if
`TargetOptions.TrapUnreachable` is set.
The patch checks the instruction before `unreachable` and avoids
inserting an additional trap.
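The check can be sketched on a toy instruction list, assuming
instructions are plain strings and the trap lowers to `ud2`.
`lowerBlock` and the string opcodes are hypothetical and stand in for
the real SelectionDAG and GlobalISel paths.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Toy lowering with trap-on-unreachable enabled: a trap call becomes
// "ud2", and "unreachable" becomes "ud2" only if the instruction right
// before it was not already a trap call.
std::vector<std::string> lowerBlock(const std::vector<std::string> &IR) {
  std::vector<std::string> Out;
  for (std::size_t I = 0; I < IR.size(); ++I) {
    if (IR[I] == "call __builtin_trap") {
      Out.push_back("ud2");
    } else if (IR[I] == "unreachable") {
      bool PrevIsTrap = I > 0 && IR[I - 1] == "call __builtin_trap";
      if (!PrevIsTrap)
        Out.push_back("ud2");
    } else {
      Out.push_back(IR[I]);
    }
  }
  return Out;
}
```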
Since `raw_string_ostream` doesn't own the string buffer, it is
desirable (in terms of memory safety) for users to directly reference
the string buffer rather than use `raw_string_ostream::str()`.
Work towards TODO comment to remove `raw_string_ostream::str()`.
`SelectionDAG::getBitcastedAnyExtOrTrunc` assumes that there is always a
valid integer type corresponding to another type, which is not always
true when it comes to vector types. For example, `<3 x i8>` doesn't have
a corresponding integer type.
Fix SWDEV-464698.
Timers are an out-of-line function call and a global variable access,
here twice per emitted instruction. At this granularity, not only do the
timing results become skewed, but the timers also add a performance
overhead when profiling is disabled. Even outside of the innermost loop,
timers add a measurable overhead. As this is quite expensive for a
mostly unused profiling facility, remove the timers.
Fixes #39650.