Once we get to SelectionDAG the IR should not be changing anymore, so we
can use BatchAAResults rather than AAResults to cache AA queries.
This should be a NFC change for targets that enable AA during codegen
(such as AArch64), but also give a nice compile-time improvement in some
cases. See:
https://github.com/llvm/llvm-project/pull/123787#issuecomment-2606797041
Note: This follows Nikita's suggestion on #123787.
Based on feedback from the clastb codegen PR, I'm refactoring basic codegen for the vector.extract.last.active intrinsic to lower to an ISD node in SelectionDAGBuilder then expand in LegalizeVectorOps, instead of doing everything in the builder.
The new ISD node (vector_find_last_active) only covers finding the index of the last active element of the mask, and extracting the element + handling passthru is left to existing ISD nodes.
Currently LLVMContext::emitError emits any error as an "inline asm"
error which does not make any sense. InlineAsm appears to be special,
in that it uses a "LocCookie" from srcloc metadata, which looks like
a parallel mechanism to ordinary source line locations. This meant
that other types of failures had degraded source information reported
when available.
Introduce some new generic error types, and only use inline asm
in the appropriate contexts. The DiagnosticInfo types are still
a bit of a mess, and I'm not sure why DiagnosticInfoWithLocationBase
exists instead of just having an optional DiagnosticLocation in the
base class.
DK_Generic is for any error that derives from an IR level instruction,
and thus can pull debug locations directly from it. DK_GenericWithLoc
is functionally the generic codegen error, since it does not depend
on the IR and instead can construct a DiagnosticLocation from the
MI debug location.
It doesn't make sense to add a new generic ISD to handle riscv tuple
type. Instead we use `SPLAT_VECTOR` for ISD and further lower to
`VMV_V_X`.
Note: If there's `visitSPLAT_VECTOR` in generic DAG combiner, it needs
to skip riscv vector tuple type.
Stack on https://github.com/llvm/llvm-project/pull/114329
Handle \@llvm.expect.with.probability in SelectionDAGBuilder, FastISel,
and IntrinsicLowering in the same way \@llvm.expect is handled, where
the value is passed through as-is. This can be reached if the intrinsic
is used without optimizations, where it would otherwise be properly
transformed out.
Fixes#115411 for SelectionDAG. A similar patch is likely needed for
GlobalISel.
As discussed in #112738, it may be better to have an intrinsic to represent vector element extracts based on mask bits. This intrinsic is for the case of extracting the last active element, if any, or a default value if the mask is all-false.
The target-agnostic SelectionDAG lowering is similar to the IR in #106560.
This patch introduces an experimental intrinsic for matching the
elements of one vector against the elements of another.
For AArch64 targets that support SVE2, the intrinsic lowers to a MATCH
instruction for supported fixed and scalar vector types.
So far we have assumed that we only rethrow the exception caught in the
innermost EH pad. This is true in code we directly generate, but after
inlining this may not be the case. For example, consider this code:
```ll
ehcleanup:
%0 = cleanuppad ...
call @destructor
cleanupret from %0 unwind label %catch.dispatch
```
If `destructor` gets inlined into this function, the code can be like
```ll
ehcleanup:
%0 = cleanuppad ...
invoke @throwing_func
to label %unreachale unwind label %catch.dispatch.i
catch.dispatch.i:
catchswitch ... [ label %catch.start.i ]
catch.start.i:
%1 = catchpad ...
invoke @some_function
to label %invoke.cont.i unwind label %terminate.i
invoke.cont.i:
catchret from %1 to label %destructor.exit
destructor.exit:
cleanupret from %0 unwind label %catch.dispatch
```
We lower a `cleanupret` into `rethrow`, which assumes it rethrows the
exception caught by the nearest dominating EH pad. But after the
inlining, the nearest dominating EH pad is not `ehcleanup` but
`catch.start.i`.
The problem exists in the same manner in the new (exnref) EH, because it
assumes the exception comes from the nearest EH pad and saves an exnref
from that EH pad and rethrows it (using `throw_ref`).
This problem can be fixed easily if `cleanupret` has the basic block
where its matching `cleanuppad` is. The bitcode instruction `cleanupret`
kind of has that info (it has a token from the `cleanuppad`), but that
info is lost when when we enter ISel, because `TargetSelectionDAG.td`'s
`cleanupret` node does not have any arguments:
5091a359d9/llvm/include/llvm/Target/TargetSelectionDAG.td (L700)
Note that `catchret` already has two basic block arguments, even though
neither of them means `catchpad`'s BB.
This PR adds the `cleanuppad`'s BB as an argument to `cleanupret` node
in ISel and uses it in the Wasm backend. Because this node is also used
in X86 backend we need to note its argument there too but nothing more
needs to change there as long as X86 doesn't need it.
---
- Details about changes in the Wasm backend:
After this PR, our pseudo `RETHROW` instruction takes a BB, which means
the EH pad whose exception it needs to rethrow. There are currently two
ways to generate a `RETHROW`: one is from `llvm.wasm.rethrow` intrinsic
and the other is from `CLEANUPRET` we discussed above. In case of
`llvm.wasm.rethrow`, we add a '0' as a placeholder argument when it is
lowered to a `RETHROW`, and change it to a BB in LateEHPrepare. As
written in the comments, this PR doesn't change how this BB is computed.
The BB argument will be converted to an immediate argument as with other
control flow instructions in CFGStackify.
In case of `CLEANUPRET`, it already has a BB argument pointing to an EH
pad, so it is just converted to a `RETHROW` with the same BB argument in
LateEHPrepare. This will also be lowered to an immediate in CFGStackify
with other control flow instructions.
---
Fixes#114600.
In #70357, I changed a isLegalOrCustom to isLegalOrCustomOrPromote in
visitSelect to enable integer min/max to be formed when the operation
was promoted. Unfortunately, this also affected floating point. For
floating point, fmaxnum may require a libcall so we also need to check
if the operation on the promoted type is legal or custom.
Other changes to RISC-V have seen made the original change untested so
this patch restores the original isLegalOrCustom.
Fixes#114520.
Original commit: 19c8475871faee5ebb06281034872a85a2552675
Multiple 2-stage sanitizer buildbots were reporting failures, the issue
has been addressed separately in 29246a92aee87e86cbc2bb64ee520d7385644f34.
Crashes on ARM builds
https://lab.llvm.org/buildbot/#/builders/85/builds/2548
```
DAGCombiner.cpp:16157: SDValue (anonymous
namespace)::DAGCombiner::visitFREEZE(SDNode *):
Assertion `DAG.isGuaranteedNotToBeUndefOrPoison(R,
false) && "Can't create node that may be
undef/poison!"' failed.
```
Issue #114648
This reverts commit 19c8475871faee5ebb06281034872a85a2552675.
If we've handled ==, and < above, the only case left can be >. We don't
need to branch on this, and can instead assert and reduce indentation,
and simplify reasoning about the fallthrough path.
This adds the `llvm.sincos` intrinsic, legalization, and lowering.
The `llvm.sincos` intrinsic takes a floating-point value and returns
both the sine and cosine (as a struct).
```
declare { float, float } @llvm.sincos.f32(float %Val)
declare { double, double } @llvm.sincos.f64(double %Val)
declare { x86_fp80, x86_fp80 } @llvm.sincos.f80(x86_fp80 %Val)
declare { fp128, fp128 } @llvm.sincos.f128(fp128 %Val)
declare { ppc_fp128, ppc_fp128 } @llvm.sincos.ppcf128(ppc_fp128 %Val)
declare { <4 x float>, <4 x float> } @llvm.sincos.v4f32(<4 x float> %Val)
```
The lowering is built on top of the existing FSINCOS ISD node, with
additional type legalization to allow for f16, f128, and vector values.
Add support for using a thread-local variable with a specified offset
for holding the stack guard canary value. This supports both 32- and 64-
bit PowerPC targets.
This mirrors changes from #108942 but targeting PowerPC instead of
RISCV. Because both of these PRs modify the same driver functions, this
series is stack on top of the RISC-V one.
---------
Signed-off-by: Keith Packard <keithp@keithp.com>
This fixes all the places that hit the new assertion added in
https://github.com/llvm/llvm-project/pull/106524 in tests. That is,
cases where the value passed to the APInt constructor is not an N-bit
signed/unsigned integer, where N is the bit width and signedness is
determined by the isSigned flag.
The fixes either set the correct value for isSigned, set the
implicitTrunc flag, or perform more calculations inside APInt.
Note that the assertion is currently still disabled by default, so this
patch is mostly NFC.
This change is part of this proposal:
https://discourse.llvm.org/t/rfc-all-the-math-intrinsics/78294
Based on example PR #96222 and fix PR #101268, with some differences due
to 2-arg intrinsic and intermediate refactor (RuntimeLibCalls.cpp).
- Add llvm.experimental.constrained.atan2 - Intrinsics.td,
ConstrainedOps.def, LangRef.rst
- Add to ISDOpcodes.h and TargetSelectionDAG.td, connect to intrinsic in
BasicTTIImpl.h, and LibFunc_ in SelectionDAGBuilder.cpp
- Update LegalizeDAG.cpp, LegalizeFloatTypes.cpp, LegalizeVectorOps.cpp,
and LegalizeVectorTypes.cpp
- Update isKnownNeverNaN in SelectionDAG.cpp
- Update SelectionDAGDumper.cpp
- Update libcalls - RuntimeLibcalls.def, RuntimeLibcalls.cpp
- TargetLoweringBase.cpp - Expand for vectors, promote f16
- X86ISelLowering.cpp - Expand f80, promote f32 to f64 for MSVC
Part 4 for Implement the atan2 HLSL Function #70096.
It is almost always simpler to use {} instead of std::nullopt to
initialize an empty ArrayRef. This patch changes all occurrences I could
find in LLVM itself. In future the ArrayRef(std::nullopt_t) constructor
could be deprecated or removed.
For the purpose of verifying proper arguments extensions per the target's ABI,
introduce the NoExt attribute that may be used by a target when neither sign-
or zeroextension is required (e.g. with a struct in register). The purpose of
doing so is to be able to verify that there is always one of these attributes
present and by this detecting cases where sign/zero extension is actually
missing.
As a first step, this patch has the verification step done for the SystemZ
backend only, but left off by default until all known issues have been
addressed.
Other targets/front-ends can now also add NoExt attribute where needed and do
this check in the backend.
This patch introduces lowering of the partial add reduction intrinsic to
a udot or svdot for AArch64. This also involves adding a
`shouldExpandPartialReductionIntrinsic` target hook, which AArch64 will
return false from in the cases that it can be lowered.
GCC supports code like "asm volatile ("" : "=r" (i) : "0" (f))" where i
is integer type and f is floating point type. Currently this code
produces an error with Clang. The change allows mixed scalar types
between input and output constraints.
Co-authored-by: Matt Arsenault <Matthew.Arsenault@amd.com>
This patch is part of a set of patches that add an `-fextend-lifetimes`
flag to clang, which extends the lifetimes of local variables and
parameters for improved debuggability. In addition to that flag, the
patch series adds a pragma to selectively disable `-fextend-lifetimes`,
and an `-fextend-this-ptr` flag which functions as `-fextend-lifetimes`
for this pointers only. All changes and tests in these patches were
written by Wolfgang Pieb (@wolfy1961), while Stephen Tozer (@SLTozer)
has handled review and merging. The extend lifetimes flag is intended to
eventually be set on by `-Og`, as discussed in the RFC
here:
https://discourse.llvm.org/t/rfc-redefine-og-o1-and-add-a-new-level-of-og/72850
This patch implements a new intrinsic instruction in LLVM,
`llvm.fake.use` in IR and `FAKE_USE` in MIR, that takes a single operand
and has no effect other than "using" its operand, to ensure that its
operand remains live until after the fake use. This patch does not emit
fake uses anywhere; the next patch in this sequence causes them to be
emitted from the clang frontend, such that for each variable (or this) a
fake.use operand is inserted at the end of that variable's scope, using
that variable's value. This patch covers everything post-frontend, which
is largely just the basic plumbing for a new intrinsic/instruction,
along with a few steps to preserve the fake uses through optimizations
(such as moving them ahead of a tail call or translating them through
SROA).
Co-authored-by: Stephen Tozer <stephen.tozer@sony.com>
For some reason, isOperationLegalOrCustom is not the same as
isOperationLegal || isOperationCustom. Unfortunately, it checks
if the type is legal which makes it uesless for custom lowering
on non-legal types (which is always ppcf128).
Really the DAG builder shouldn't be going to expand this in the
builder, it makes it difficult to work with. It's only here to work
around the DAG requiring legal integer types the same size as
the FP type after type legalization.
This patch is moving out stepvector intrinsic from the experimental
namespace.
This intrinsic exists in LLVM for several years now, and is widely used.
This change avoids deleting `!willReturn` intrinsics for which the
return value is unused when building the SDAG. Currently, calls to
read-only intrinsics not marked with `IntrWillReturn` cannot be deleted
at the LLVM IR level but may be deleted when building the SDAG.
These calls are unsafe to remove from the IR because the functions are
`!willReturn` and should also be unsafe to remove fromthe SDAG for
the same reason. This change aligns the behavior of the SDAG to that
of LLVM IR. This change also requires that intrinsics not have the
`Throws` attribute to be treated as loads for the same reason.
PR #80309 proposes to have users of APInt's uint64_t
constructor opt-in to implicit truncation. Currently, that patch
requires SelectionDAG::getConstant to opt-in.
This patch adds getSignedConstant so we can start fixing some of the
cases that require implicit truncation.
C23 introduced new functions fminimum_num and fmaximum_num, and they
follow the minimumNumber and maximumNumber of IEEE754-2019. Let's
introduce new intrinsics to support them.
This patch introduces support only support for scalar values. The
support of
vector (vp, vp.reduce, vector.reduce),
experimental.constrained
will be added in future patches.
With this patch, MIPSr6 and LoongArch can work out of box with
fcanonical and fmax/fmin.
Aarch64/PowerPC64 can use the same login as MIPSr6 and LoongArch, while
they have no fcanonical support yet.
I will add it in future patches.
The FMIN/FMAX of RISC-V instructions follows the
minimumNumber/maximumNumber of IEEE754-2019. We can just add it in
future patch.
Background
https://discourse.llvm.org/t/rfc-fix-llvm-min-f-and-llvm-max-f-intrinsics/79735
Currently we have fminnum/fmaxnum, which have different behavior on
different platform for NUM vs sNaN:
1) Fallback to fmin(3)/fmax(3): return qNaN.
2) ARM64/ARM32+Neon: same as libc.
3) MIPSr6/LoongArch/RISC-V: return NUM.
And the fix of fminnum/fmaxnum to follow minNUM/maxNUM of IEEE754-2008
will submit as separated patches.
Add custom lowering for `BR_JT` DAG nodes to the `brx.idx` PTX
instruction ([PTX ISA 9.7.13.4. Control Flow Instructions: brx.idx]
(https://docs.nvidia.com/cuda/parallel-thread-execution/#control-flow-instructions-brx-idx)).
Depending on the heuristics in DAG selection, `switch` statements may
now be lowered using `brx.idx`.
Note: this fixes the previous issue in #102400 by adding the isBarrier
attribute to BRX_END