This fixes issue #114194
The issue happens during the `LiveRangeShrink` pass, which runs early,
before phi elimination. LandingPads, which are lowered to EHLabels, need
to be the first non phi instruction in an EHPad. In case of a phi node
being in front of the EHLabel and a use being after the EHLabel, we
hoist the use in front of the label.
This results in a portion of the landingpad missing due to being hoisted
in front of the label.
If we've handled ==, and < above, the only case left can be >. We don't
need to branch on this, and can instead assert and reduce indentation,
and simplify reasoning about the fallthrough path.
This pull request corrects multiple occurrences of the typo "avaliable"
to "available" across the LLVM and Clang codebase. These changes improve
the clarity and accuracy of comments and documentation. Specific
modifications are in the following files:
1. clang-tools-extra/clang-tidy/readability/FunctionCognitiveComplexityCheck.cpp:
Updated comments in readability checks for cognitive complexity.
2. llvm/include/llvm/ExecutionEngine/Orc/ExecutionUtils.h: Corrected
documentation for JITDylib responsibilities.
3. llvm/include/llvm/Target/TargetMacroFusion.td: Fixed descriptions for
FusionPredicate variables.
4. llvm/lib/CodeGen/SafeStack.cpp: Improved comments on DominatorTree
availability.
5. llvm/lib/Target/RISCV/RISCVSchedSiFive7.td: Enhanced resource usage
descriptions for vector units.
6. llvm/lib/Transforms/Scalar/LoopIdiomRecognize.cpp: Updated invariant
description in shift-detect idiom logic.
7. llvm/test/MC/ARM/mve-fp-registers.s: Amended ARM MVE register
availability notes.
8. mlir/lib/Bytecode/Reader/BytecodeReader.cpp: Adjusted forward
reference descriptions for bytecode reader operations.
These changes have no impact on code functionality, focusing solely on
documentation clarity.
Co-authored-by: wangqiang <wangqiang1@kylinos.cn>
When expanding an atomicrmw with a cmpxchg, preserve any metadata
attached to it. This will avoid unwanted double expansions
in a future commit.
The initial load should also probably receive the same metadata
(which for some reason is not emitted as an atomic).
Check to see if we are only demanding (shifted) signbits from a SRL node that are also signbits in the source node.
We can't demand any upper zero bits that the SRL will shift in (up to max shift amount), and the lower demanded bits bound must already be all signbits.
In buildLocationList, with basic block sections, we iterate over
every basic block twice to detect section start and end. This is
sub-optimal and shows up as significantly time consuming when
compiling large functions.
This patch uses the set of sections already stored in MBBSectionRanges
and iterates over sections rather than basic blocks.
When detecting if loclists can be merged, the end label of an entry is
matched with the beginning label of the next entry. For the section
corresponding to the entry basic block, this is skipped. This is
because the loc list uses the end label corresponding to the function
whereas the MBBSectionRanges map uses the function end label.
For example:
.Lfunc_begin0:
.file
.loc 0 4 0 # ex2.cc:4:0
.cfi_startproc
.Ltmp0:
.loc 0 8 5 prologue_end # ex2.cc:8:5
....
.LBB_END0_0:
.cfi_endproc
.section .text._Z4testv,"ax",@progbits,unique,1
...
.Lfunc_end0:
.size _Z4testv, .Lfunc_end0-_Z4testv
The debug loc uses ".LBB_END0_0" for the end of the section whereas
MBBSectionRanges uses ".Lfunc_end0".
It is alright to skip this as we already check the section corresponding
to the debugloc entry.
Added a new test case to check that if this works correctly when the
variable's value is mutated in the entry section.
This extends the existing fold:
```
for(i = Start; i < End; ++i)
Rem = (i nuw+- IncrLoopInvariant) u% RemAmtLoopInvariant;
```
->
```
Rem = (Start nuw+- IncrLoopInvariant) % RemAmtLoopInvariant;
for(i = Start; i < End; ++i, ++rem)
Rem = rem == RemAmtLoopInvariant ? 0 : Rem;
```
To work with a non-zero `IncrLoopInvariant`.
This is a common usage in cases such as:
```
for(i = 0; i < N; ++i)
if ((i + 1) % X) == 0)
do_something_occasionally_but_not_first_iter();
```
Alive2 w/ i4/unrolled 6x (needs to be ran locally due to timeout):
https://alive2.llvm.org/ce/z/6tgyN3
Exhaust proof over all uint8_t combinations in C++:
https://godbolt.org/z/WYa561388
This shares most of its code with the scalar sincos expansion. It allows
expanding vector FSINCOS nodes to a library call from the specified
`-vector-library`. The upside of this is it will mean the vectorizer
only needs to handle the sincos intrinsic, which has no memory effects,
and this can handle lowering the intrinsic to a call that takes output
pointers.
This teaches dagcombiner to fold:
`(asr (add nsw x, y), 1) -> (avgfloors x, y)`
`(lsr (add nuw x, y), 1) -> (avgflooru x, y)`
as well the combine them to a ceil variant:
`(avgfloors (add nsw x, y), 1) -> (avgceils x, y)`
`(avgflooru (add nuw x, y), 1) -> (avgceilu x, y)`
iff valid for the target.
Removes some of the ARM MVE patterns that are now dead code.
It adds the avg opcodes to `IsQRMVEInstruction` as to preserve the
immediate splatting as before.
Allow construction of DstOp with VRegAttrs.
Also allow construction with register class or bank and LLT.
Intended to be used in lowering code for reg-bank-select where
new registers need to have both register bank and LLT.
Add support for new type of DstOp in CSEMIRBuilder.
On ANDNOT capable targets we can always do this profitably, without ANDNOT we only attempt this if we don't introduce an additional NOT
Followup to #112547
Fix bug that `mir-strip-debug` pass does not remove debug location from
bundled instructions.
Problem arises during testing that debug info does not affect
optimization passes output (`llvm-lit` with ` -Dllc="llc
-debugify-and-strip-all-safe"`), when pass operates on MIR with bundled
instructions + memory operands.
Let mir test check looks like:
```
CHECK-NEXT: BUNDLE {
CHECK-NEXT: $r3 = LD $r1, $r2 :: (load (s64) from %ir.a, !tbaa !2)
CHECK-NEXT: }
```
So as `mir-strip-debug` pass does not process bundled instructions,
running `llc -debugify-and-strip-all-safe` on the test will produce the
following output:
```
BUNDLE {
$r3 = LD $r1, $r2, debug-location !DILocation(line: 3, column: 1, scope: <0x608cb2b99b10>) :: (load (s64) from %ir.a, !tbaa !2)
}
```
And test will fail, but it shouldn't.
Seems like the root cause is that `mir-strip-debug` pass should remove
debug location from bundled instructions.
This adds the `llvm.sincos` intrinsic, legalization, and lowering.
The `llvm.sincos` intrinsic takes a floating-point value and returns
both the sine and cosine (as a struct).
```
declare { float, float } @llvm.sincos.f32(float %Val)
declare { double, double } @llvm.sincos.f64(double %Val)
declare { x86_fp80, x86_fp80 } @llvm.sincos.f80(x86_fp80 %Val)
declare { fp128, fp128 } @llvm.sincos.f128(fp128 %Val)
declare { ppc_fp128, ppc_fp128 } @llvm.sincos.ppcf128(ppc_fp128 %Val)
declare { <4 x float>, <4 x float> } @llvm.sincos.v4f32(<4 x float> %Val)
```
The lowering is built on top of the existing FSINCOS ISD node, with
additional type legalization to allow for f16, f128, and vector values.
If an undef subregister def is live into another block, we need to
maintain a physreg def to track the liveness of those lanes. This
would manifest a verifier error after branch folding, when the cloned
tail block use no longer had a def.
We need to detect interference with other assigned intervals to avoid
clobbering the undef lanes defined in other intervals, since the undef
def didn't count as interference. This is pretty ugly and adds a new
dependency on LiveRegMatrix, keeping it live for one more pass. It also
adds a lot of implicit operand spam (we really should have a better
representation for this).
There is a missing verifier check for this situation. Added an xfailed
test that demonstrates this. We may also be able to revert the changes
in 47d3cbcf842a036c20c3f1c74255cdfc213f41c2.
It might be better to insert an IMPLICIT_DEF before the instruction
rather than using the implicit-def operand.
Fixes#98474
There was a missing break, which led to an unannotated fallthrough when
merging #112171. This has caused sanitizer builds to fail.
This adds the missing break in the switch statement to ensure that the
fallthrough does not occur.
As part of FEAT_PAuthLR, a new DWARF Frame Instruction was introduced,
`DW_CFA_AARCH64_negate_ra_state_with_pc`. This instructs Libunwind that
the PC has been used with the signing instruction. This change includes
three commits
- Libunwind support for the newly introduced DWARF Instruction
- CodeGen Support for the DWARF Instructions
- Reversing the changes made in #96377. Due to
`DW_CFA_AARCH64_negate_ra_state_with_pc`'s requirements to be placed
immediately after the signing instruction, this would mean the CFI
Instruction location was not consistent with the generated location when
not using FEAT_PAuthLR. The commit reverses the changes and makes the
location consistent across the different branch protection options.
While this does have a code size effect, this is a negligible one.
For the ABI information, see here:
853286c7ab/aadwarf64/aadwarf64.rst (id23)
This patch makes it so that specifying all or none for -pgo-analysis-map
along with an explicit option causes an error as this set of options
does not really make sense.
This patch adds none and all options to the -pgo-analysis-map flag,
which do basically what they say on the tin. The none option is added to
enable forcing the pgo-analysis-map by overriding an earlier invocation
of the flag. The all option is just added for convenience.
Both are based on MachineLICMBase, and the functionality there is
"switched" based on a PreRegAlloc flag. This commit is simply about
trusting the original value of that flag, defined by the `MachineLICM`
and `EarlyMachineLICM` classes.
The `PreRegAlloc` flag used to be overwritten it based on MRI.isSSA(),
which is un-reliable due to how it is inferred by the MIRParser. I see
that we can now define isSSA in MIR (thanks @gargaroff ), meaning the
fix isn’t really needed anymore, but redefining that flag still feels
wrong.
Note that I'm looking into upstreaming more changes to MachineLICM, see
[the discourse
thread](https://discourse.llvm.org/t/extending-post-regalloc-machinelicm/82725).
This change is part of this proposal:
https://discourse.llvm.org/t/rfc-all-the-math-intrinsics/78294
- `VecFuncs.def`: define intrinsic to sleef/armpl mapping
- `LegalizerHelper.cpp`: add missing fewerElementsVector handling for
the new atan2 intrinsic
- `AArch64ISelLowering.cpp`: Add arch64 specializations for lowering
like neon instructions
- `AArch64LegalizerInfo.cpp`: Legalize atan2.
Part 5 for Implement the atan2 HLSL Function #70096.
In SelectionDAG, `TargetTransformInfo::hasBranchDivergence()` can be
called when both `NDEBUG` and `LLVM_ENABLE_ABI_BREAKING_CHECKS` are
enabled. In that case, the class member `TTI` is still initialized to
`nullptr`, causing a segfault.
Fix this by ensuring that all the calls to `hasBranchDivergence` and
`VerifyDAGDivergence` only occur when `NDEBUG` is disabled, and
`LLVM_ENABLE_ABI_BREAKING_CHECKS` is enabled.
Symbols that get mapped into the read-only section are loaded as part of
the text segment and will always need a TOC entry to be addressable. Add
an option to aggressively merge these read only globals to reduce TOC
usage.
Before this patch, redundant COPY couldn't be removed for the following
case:
```
$R0 = OP ...
... // Read of %R0
$R1 = COPY killed $R0
```
This patch adds support for tracking the users of the source register
during backward propagation, so that we can remove the redundant COPY in
the above case and optimize it to:
```
$R1 = OP ...
... // Replace all uses of %R0 with $R1
```
Store Swift mangled names in DW_AT_linkage_name. The Swift compiler
emits only the type mangled name in debug information, and LLDB uses
those mangled names as keys to look up size, alignment, fields, etc
from either reflection metadata or Swift modules.
Additionally, emit types linkage names for types into the accelerator
table if they exist and they're different from the display name.