This merges the logic for expanding both FFREXP and FSINCOS into one
method `DAG.expandMultipleResultFPLibCall()`. This reduces duplication
and also allows FFREXP to benefit from the stack slot elimination
implemented for FSINCOS. This method will also be used in future to
implement more multiple-result intrinsics (such as modf and sincospi).
So far we have assumed that we only rethrow the exception caught in the
innermost EH pad. This is true in code we directly generate, but after
inlining this may not be the case. For example, consider this code:
```ll
ehcleanup:
%0 = cleanuppad ...
call @destructor
cleanupret from %0 unwind label %catch.dispatch
```
If `destructor` gets inlined into this function, the code can be like
```ll
ehcleanup:
%0 = cleanuppad ...
invoke @throwing_func
to label %unreachale unwind label %catch.dispatch.i
catch.dispatch.i:
catchswitch ... [ label %catch.start.i ]
catch.start.i:
%1 = catchpad ...
invoke @some_function
to label %invoke.cont.i unwind label %terminate.i
invoke.cont.i:
catchret from %1 to label %destructor.exit
destructor.exit:
cleanupret from %0 unwind label %catch.dispatch
```
We lower a `cleanupret` into `rethrow`, which assumes it rethrows the
exception caught by the nearest dominating EH pad. But after the
inlining, the nearest dominating EH pad is not `ehcleanup` but
`catch.start.i`.
The problem exists in the same manner in the new (exnref) EH, because it
assumes the exception comes from the nearest EH pad and saves an exnref
from that EH pad and rethrows it (using `throw_ref`).
This problem can be fixed easily if `cleanupret` has the basic block
where its matching `cleanuppad` is. The bitcode instruction `cleanupret`
kind of has that info (it has a token from the `cleanuppad`), but that
info is lost when when we enter ISel, because `TargetSelectionDAG.td`'s
`cleanupret` node does not have any arguments:
5091a359d9/llvm/include/llvm/Target/TargetSelectionDAG.td (L700)
Note that `catchret` already has two basic block arguments, even though
neither of them means `catchpad`'s BB.
This PR adds the `cleanuppad`'s BB as an argument to `cleanupret` node
in ISel and uses it in the Wasm backend. Because this node is also used
in X86 backend we need to note its argument there too but nothing more
needs to change there as long as X86 doesn't need it.
---
- Details about changes in the Wasm backend:
After this PR, our pseudo `RETHROW` instruction takes a BB, which means
the EH pad whose exception it needs to rethrow. There are currently two
ways to generate a `RETHROW`: one is from `llvm.wasm.rethrow` intrinsic
and the other is from `CLEANUPRET` we discussed above. In case of
`llvm.wasm.rethrow`, we add a '0' as a placeholder argument when it is
lowered to a `RETHROW`, and change it to a BB in LateEHPrepare. As
written in the comments, this PR doesn't change how this BB is computed.
The BB argument will be converted to an immediate argument as with other
control flow instructions in CFGStackify.
In case of `CLEANUPRET`, it already has a BB argument pointing to an EH
pad, so it is just converted to a `RETHROW` with the same BB argument in
LateEHPrepare. This will also be lowered to an immediate in CFGStackify
with other control flow instructions.
---
Fixes#114600.
... matching what we have in the disassembler. This isn't turned on by
default since several of the scheduling models are not completely
accurate, and we don't want to be misleading.
In #70357, I changed a isLegalOrCustom to isLegalOrCustomOrPromote in
visitSelect to enable integer min/max to be formed when the operation
was promoted. Unfortunately, this also affected floating point. For
floating point, fmaxnum may require a libcall so we also need to check
if the operation on the promoted type is legal or custom.
Other changes to RISC-V have seen made the original change untested so
this patch restores the original isLegalOrCustom.
Fixes#114520.
The dpbusd_const.ll test change is due to us losing the expanded add reduction pattern as one of the elements is known to be zero (removing one of the adds from the reduction pyramid). I don't think its of concern.
Noticed while working on #107423
When LiveRangeEdit::eliminateDeadDef() converts an MI to a KILL instruction,
it should also call dropMemRefs() in order to erase any MachineMemOperand
present.
This was discovered in testing as the MachineVerifier does not accept an MMO
without the corresponding MI mayLoad/mayStore flag, which the KILL opcode
lacks.
If we're masking the LSB of a SRL node result and that is shifting down an extended sign bit, see if we can change the SRL to shift down the MSB directly.
These patterns can occur during legalisation when we've sign extended to a wider type but the SRL is still shifting from the subreg.
Alternative to #114967
Fixes the remaining regression in #112588
This fixes a bug that started triggering after #111730, where we could
remove a load with multiple uses. It looks like the match should be
checking the other register in a one-use check.
%SrcReg = load..
%DstReg = sign_extend_inreg %SrcReg
This matches InstCombine and DAGCombine.
RISC-V only has an ADDI instruction so without this we need additional
patterns to do the conversion.
Some of the AMDGPU tests look like possible regressions. Maybe some
patterns from isel aren't imported.
64-bit flat cmpxchg instructions do not work correctly for scratch
addresses, and need to be expanded as non-atomic.
Allow custom expansion of cmpxchg in AtomicExpand, as is
already the case for atomicrmw.
Check to see if we are only demanding (shifted) signbits from a SRL node that are also signbits in the source node.
We can't demand any upper zero bits that the SRL will shift in (up to max shift amount), and the lower demanded bits bound must already be all signbits.
Same fold as #114389 which added this for SimplifyMultipleUseDemandedBits
When trying to evaluate an expression in a narrower type, the
DAGCombine should propagate the disjoint flag, as it's equally
valid on the narrower expression.
This helps improve better use of addressing modes for some
Arm SME instructions, for example.
Original commit: 19c8475871faee5ebb06281034872a85a2552675
Multiple 2-stage sanitizer buildbots were reporting failures, the issue
has been addressed separately in 29246a92aee87e86cbc2bb64ee520d7385644f34.
Crashes on ARM builds
https://lab.llvm.org/buildbot/#/builders/85/builds/2548
```
DAGCombiner.cpp:16157: SDValue (anonymous
namespace)::DAGCombiner::visitFREEZE(SDNode *):
Assertion `DAG.isGuaranteedNotToBeUndefOrPoison(R,
false) && "Can't create node that may be
undef/poison!"' failed.
```
Issue #114648
This reverts commit 19c8475871faee5ebb06281034872a85a2552675.
This fixes issue #114194
The issue happens during the `LiveRangeShrink` pass, which runs early,
before phi elimination. LandingPads, which are lowered to EHLabels, need
to be the first non phi instruction in an EHPad. In case of a phi node
being in front of the EHLabel and a use being after the EHLabel, we
hoist the use in front of the label.
This results in a portion of the landingpad missing due to being hoisted
in front of the label.
If we've handled ==, and < above, the only case left can be >. We don't
need to branch on this, and can instead assert and reduce indentation,
and simplify reasoning about the fallthrough path.
This pull request corrects multiple occurrences of the typo "avaliable"
to "available" across the LLVM and Clang codebase. These changes improve
the clarity and accuracy of comments and documentation. Specific
modifications are in the following files:
1. clang-tools-extra/clang-tidy/readability/FunctionCognitiveComplexityCheck.cpp:
Updated comments in readability checks for cognitive complexity.
2. llvm/include/llvm/ExecutionEngine/Orc/ExecutionUtils.h: Corrected
documentation for JITDylib responsibilities.
3. llvm/include/llvm/Target/TargetMacroFusion.td: Fixed descriptions for
FusionPredicate variables.
4. llvm/lib/CodeGen/SafeStack.cpp: Improved comments on DominatorTree
availability.
5. llvm/lib/Target/RISCV/RISCVSchedSiFive7.td: Enhanced resource usage
descriptions for vector units.
6. llvm/lib/Transforms/Scalar/LoopIdiomRecognize.cpp: Updated invariant
description in shift-detect idiom logic.
7. llvm/test/MC/ARM/mve-fp-registers.s: Amended ARM MVE register
availability notes.
8. mlir/lib/Bytecode/Reader/BytecodeReader.cpp: Adjusted forward
reference descriptions for bytecode reader operations.
These changes have no impact on code functionality, focusing solely on
documentation clarity.
Co-authored-by: wangqiang <wangqiang1@kylinos.cn>
When expanding an atomicrmw with a cmpxchg, preserve any metadata
attached to it. This will avoid unwanted double expansions
in a future commit.
The initial load should also probably receive the same metadata
(which for some reason is not emitted as an atomic).
Check to see if we are only demanding (shifted) signbits from a SRL node that are also signbits in the source node.
We can't demand any upper zero bits that the SRL will shift in (up to max shift amount), and the lower demanded bits bound must already be all signbits.
In buildLocationList, with basic block sections, we iterate over
every basic block twice to detect section start and end. This is
sub-optimal and shows up as significantly time consuming when
compiling large functions.
This patch uses the set of sections already stored in MBBSectionRanges
and iterates over sections rather than basic blocks.
When detecting if loclists can be merged, the end label of an entry is
matched with the beginning label of the next entry. For the section
corresponding to the entry basic block, this is skipped. This is
because the loc list uses the end label corresponding to the function
whereas the MBBSectionRanges map uses the function end label.
For example:
.Lfunc_begin0:
.file
.loc 0 4 0 # ex2.cc:4:0
.cfi_startproc
.Ltmp0:
.loc 0 8 5 prologue_end # ex2.cc:8:5
....
.LBB_END0_0:
.cfi_endproc
.section .text._Z4testv,"ax",@progbits,unique,1
...
.Lfunc_end0:
.size _Z4testv, .Lfunc_end0-_Z4testv
The debug loc uses ".LBB_END0_0" for the end of the section whereas
MBBSectionRanges uses ".Lfunc_end0".
It is alright to skip this as we already check the section corresponding
to the debugloc entry.
Added a new test case to check that if this works correctly when the
variable's value is mutated in the entry section.
This extends the existing fold:
```
for(i = Start; i < End; ++i)
Rem = (i nuw+- IncrLoopInvariant) u% RemAmtLoopInvariant;
```
->
```
Rem = (Start nuw+- IncrLoopInvariant) % RemAmtLoopInvariant;
for(i = Start; i < End; ++i, ++rem)
Rem = rem == RemAmtLoopInvariant ? 0 : Rem;
```
To work with a non-zero `IncrLoopInvariant`.
This is a common usage in cases such as:
```
for(i = 0; i < N; ++i)
if ((i + 1) % X) == 0)
do_something_occasionally_but_not_first_iter();
```
Alive2 w/ i4/unrolled 6x (needs to be ran locally due to timeout):
https://alive2.llvm.org/ce/z/6tgyN3
Exhaust proof over all uint8_t combinations in C++:
https://godbolt.org/z/WYa561388
This shares most of its code with the scalar sincos expansion. It allows
expanding vector FSINCOS nodes to a library call from the specified
`-vector-library`. The upside of this is it will mean the vectorizer
only needs to handle the sincos intrinsic, which has no memory effects,
and this can handle lowering the intrinsic to a call that takes output
pointers.
This teaches dagcombiner to fold:
`(asr (add nsw x, y), 1) -> (avgfloors x, y)`
`(lsr (add nuw x, y), 1) -> (avgflooru x, y)`
as well the combine them to a ceil variant:
`(avgfloors (add nsw x, y), 1) -> (avgceils x, y)`
`(avgflooru (add nuw x, y), 1) -> (avgceilu x, y)`
iff valid for the target.
Removes some of the ARM MVE patterns that are now dead code.
It adds the avg opcodes to `IsQRMVEInstruction` as to preserve the
immediate splatting as before.
Allow construction of DstOp with VRegAttrs.
Also allow construction with register class or bank and LLT.
Intended to be used in lowering code for reg-bank-select where
new registers need to have both register bank and LLT.
Add support for new type of DstOp in CSEMIRBuilder.
On ANDNOT capable targets we can always do this profitably, without ANDNOT we only attempt this if we don't introduce an additional NOT
Followup to #112547