This reverts commit 90fd99c0795711e1cf762a02b29b0a702f86a264.
This reverts commit 43e6f46936e177e47de6627a74b047ba27561b44.
Causes crashes, see comments on https://github.com/llvm/llvm-project/pull/92555.
This patch makes the final major change of the RemoveDIs project, changing the
default IR output from debug intrinsics to debug records. This is expected to
break a large number of tests: every single one that tests for uses or
declarations of debug intrinsics and does not explicitly disable writing
records.
If this patch has broken your downstream tests (or upstream tests on a
configuration I wasn't able to run):
1. If you need to immediately unblock a build, pass
`--write-experimental-debuginfo=false` to LLVM's option processing for all
failing tests (remember to use `-mllvm` for clang/flang to forward arguments to
LLVM).
2. For most test failures, the changes are trivial and mechanical, enough that
they can be done by script; see the migration guide for a guide on how to do
this: https://llvm.org/docs/RemoveDIsDebugInfo.html#test-updates
3. If any tests fail for reasons other than FileCheck check lines that need
updating, such as assertion failures, that is most likely a real bug with this
patch and should be reported as such.
For more information, see the recent PSA:
https://discourse.llvm.org/t/psa-ir-output-changing-from-debug-intrinsics-to-debug-records/79578
This reverts commit 46080abe9b136821eda2a1a27d8a13ceac349f8c.
Extra tests have been added in 52d29eb287.
Original message:
This adds a new interface to compute the cost of recipes, VPBasicBlocks,
VPRegionBlocks and VPlan, initially falling back to the legacy cost model
for all recipes. Follow-up patches will gradually migrate recipes to
compute their own costs step-by-step.
It also adds getBestPlan function to LVP which computes the cost of all
VPlans and picks the most profitable one together with the most
profitable VF.
The VPlan selected by the VPlan cost model is executed and there is an
assert to catch cases where the VPlan cost model and the legacy cost
model disagree. Even though I checked a number of different build
configurations on AArch64 and X86, there may be some differences
that have been missed.
Additional discussions and context can be found in @arcbbb's
https://github.com/llvm/llvm-project/pull/67647 and
https://github.com/llvm/llvm-project/pull/67934 which is an earlier
version of the current PR.
PR: https://github.com/llvm/llvm-project/pull/92555
This adds a new interface to compute the cost of recipes, VPBasicBlocks,
VPRegionBlocks and VPlan, initially falling back to the legacy cost model
for all recipes. Follow-up patches will gradually migrate recipes to
compute their own costs step-by-step.
It also adds getBestPlan function to LVP which computes the cost of all
VPlans and picks the most profitable one together with the most
profitable VF.
The VPlan selected by the VPlan cost model is executed and there is an
assert to catch cases where the VPlan cost model and the legacy cost
model disagree. Even though I checked a number of different build
configurations on AArch64 and X86, there may be some differences
that have been missed.
Additional discussions and context can be found in @arcbbb's
https://github.com/llvm/llvm-project/pull/67647 and
https://github.com/llvm/llvm-project/pull/67934 which is an earlier
version of the current PR.
PR: https://github.com/llvm/llvm-project/pull/92555
Now that FOR exit and resume value creation is explicitly modeled in
VPlan (05e1b5340b0caf1, 07b330132c0b) it doesn't depend on the first
order recurrence splice being preserved and it can now be marked as not
having side-effects. This allows removal of first-order-recurrence-splce
if the FOR is only used in the exit or as scalar ph resume value.
Apply the onlyFirstPartUsed logic generally to all per-part
VPInstructions. Note that the test changes remove the second part
of an unsued first-order recurrence splice.
This change is an implementation of
https://github.com/llvm/llvm-project/issues/87367's investigation on
supporting IEEE math operations as intrinsics.
Which was discussed in this RFC:
https://discourse.llvm.org/t/rfc-all-the-math-intrinsics/78294
This PR is just for Tan.
Now that x86 tan backend landed:
https://github.com/llvm/llvm-project/pull/90503 we can add other
backends since the shared pieces are in tree now.
Changes:
- `llvm/include/llvm/Analysis/VecFuncs.def` - vectorization of tan for
arm64 backends.
- `llvm/lib/Target/AArch64/AArch64FastISel.cpp` - Add tan to the libcall
table
- `llvm/lib/Target/AArch64/AArch64ISelLowering.cpp` - Add tan expansion
for f128, f16, and vector\neon operations
- `llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp` define
`G_FTAN` as a legal arm64 instruction
resolves#94755
This change is an implementation of #87367's investigation on supporting
IEEE math operations as intrinsics.
Which was discussed in this RFC:
https://discourse.llvm.org/t/rfc-all-the-math-intrinsics/78294
Much of this change was following how G_FSIN and G_FCOS were used.
Changes:
- `llvm/docs/GlobalISel/GenericOpcode.rst` - Document the `G_FTAN`
opcode
- `llvm/docs/LangRef.rst` - Document the tan intrinsic
- `llvm/include/llvm/Analysis/VecFuncs.def` - Associate the tan
intrinsic as a vector function similar to the tanf libcall.
- `llvm/include/llvm/CodeGen/BasicTTIImpl.h` - Map the tan intrinsic to
`ISD::FTAN`
- `llvm/include/llvm/CodeGen/ISDOpcodes.h` - Define ISD opcodes for
`FTAN` and `STRICT_FTAN`
- `llvm/include/llvm/IR/Intrinsics.td` - Create the tan intrinsic
- `llvm/include/llvm/IR/RuntimeLibcalls.def` - Define tan libcall
mappings
- `llvm/include/llvm/Target/GenericOpcodes.td` - Define the `G_FTAN`
Opcode
- `llvm/include/llvm/Support/TargetOpcodes.def` - Create a `G_FTAN`
Opcode handler
- `llvm/include/llvm/Target/GlobalISel/SelectionDAGCompat.td` - Map
`G_FTAN` to `ftan`
- `llvm/include/llvm/Target/TargetSelectionDAG.td` - Define `ftan`,
`strict_ftan`, and `any_ftan` and map them to the ISD opcodes for `FTAN`
and `STRICT_FTAN`
- `llvm/lib/Analysis/VectorUtils.cpp` - Associate the tan intrinsic as a
vector intrinsic
- `llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp` Map the tan intrinsic
to `G_FTAN` Opcode
- `llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp` - Add `G_FTAN` to
the list of floating point math operations also associate `G_FTAN` with
the `TAN_F` runtime lib.
- `llvm/lib/CodeGen/GlobalISel/Utils.cpp` - More floating point math
operation common behaviors.
- llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp - List the function
expansion operations for `FTAN` and `STRICT_FTAN`. Also define both
opcodes in `PromoteNode`.
- `llvm/lib/CodeGen/SelectionDAG/LegalizeFloatTypes.cpp` - More `FTAN`
and `STRICT_FTAN` handling in the legalizer
- `llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h` - Define
`SoftenFloatRes_FTAN` and `ExpandFloatRes_FTAN`.
- `llvm/lib/CodeGen/SelectionDAG/LegalizeVectorOps.cpp` - Define `FTAN`
as a legal vector operation.
- `llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp` - Define
`FTAN` as a legal vector operation.
- `llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp` - define tan as an
intrinsic that doesn't return NaN.
- `llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp` Map
`LibFunc_tan`, `LibFunc_tanf`, and `LibFunc_tanl` to `ISD::FTAN`. Map
`Intrinsic::tan` to `ISD::FTAN` and add selection dag handling for
`Intrinsic::tan`.
- `llvm/lib/CodeGen/SelectionDAG/SelectionDAGDumper.cpp` - Define `ftan`
and `strict_ftan` names for the equivalent ISD opcodes.
- `llvm/lib/CodeGen/TargetLoweringBase.cpp` -Define a Tan128 libcall and
ISD::FTAN as a target lowering action.
- `llvm/lib/Target/X86/X86ISelLowering.cpp` - Add x86_64 lowering for
tan intrinsic
resolves https://github.com/llvm/llvm-project/issues/70082
This patch uses the ExtractFromEnd VPInstruction opcode
to extract the value of a FOR to be used as resume value for the ph in
the scalar loop.
It adds a new live-out that temporarily wraps the FOR phi in the scalar
loop. fixFixedOrderRecurrence will process live outs for fixed order
recurrence phis by creating a new phi node in the scalar preheader,
using the generated value for the live-out as incoming value from the
middle block and the original start value as incoming value for the
other edge. Creation of the phi in the preheader, as well as updating
the phi in the scalar loop will also be moved to VPlan in the future,
eventually retiring fixFixedOrderRecurrence
Depends on https://github.com/llvm/llvm-project/pull/93395
PR: https://github.com/llvm/llvm-project/pull/93396
Update LAA to use PSE::getSymbolicMaxBackedgeTakenCount which returns
the minimum of the countable exits.
When analyzing dependences and computing runtime checks, we need the
smallest upper bound on the number of iterations. In terms of memory
safety, it shouldn't matter if any uncomputable exits leave the loop,
as long as we prove that there are no dependences given the minimum of
the countable exits. The same should apply also for generating runtime
checks.
Note that this shifts the responsiblity of checking whether all exit
counts are computable or handling early-exits to the users of LAA.
Depends on https://github.com/llvm/llvm-project/pull/93498
PR: https://github.com/llvm/llvm-project/pull/93499
VPTypeAnalysis::inferScalarTypeForRecipe is missing the case for
VPInstruction::LogicalAnd, due to which the test
vplan-incomplete-cases.ll crashes. Add this missing case, and move the
test in vplan-infer-not-or-type.ll to vplan-incomplete-cases.ll, showing
correct codegen for trip-counts 2 and 3.
This patch introduces a new ExtractFromEnd VPInstruction opcode to
extract the value of a FOR for users outside the loop (i.e. in the
scalar loop's exits). This moves the first part of fixing first order
recurrences to VPlan, and removes some additional code to patch up
live-outs, which is now handled automatically.
The majority of test changes is due to changes in the order of which the
extracts are generated now. As we are now using VPTransformState to
generate the extracts, we may be able to re-use existing extracts in the
loop body in some cases. For scalable vectors, in some cases we now have
to compute the runtime VF twice, as each extract is now independent, but
those should be trivial to clean up for later passes (and in line with
other places in the code that also liberally re-compute runtime VFs).
PR: https://github.com/llvm/llvm-project/pull/93395
For interleave groups we only create a pointer for the start of the
interleave group, not all original loads/stores. Mark single-use ops
feeding interleave group mem ops as free when vectorizing.
Generalize LoopVectorizationPlanner::isMoreProfitable smoothly across
the fixed-vector and scalable-vector cases, taking the trip-count into
account, and fixing logical pitfalls that arise from a lack of
generality.
As a follow-up to b2f65e80, use the DTU to also update and preserve
the DT in the native path. This should also allow preserving SCEV in the
native path
PR: https://github.com/llvm/llvm-project/pull/93287
Update SCEVUnionPredicate::add to only add predicates from another union
predicate, if they aren't alread implied by the union predicate we add
them to.
Note that there exists logic elsewhere to avoid adding predicates if
they are already implied, but this logic misses cases when only some
predicates of a union predicate are implied by the current set of
predicates.
PR: https://github.com/llvm/llvm-project/pull/93397
The transform updates all users of inductions to work based on EVL,
instead
of the VF directly. At the moment, widened inductions cannot be updated,
so
bail out if the plan contains any.
This patch introduces a check before applying EVL transform. If any
recipes in loop rely on RuntimeVF, the plan is discarded.
In LoopVectorizationCostModel::getInstructionCost(), when the condition
canTruncateToMinimalBitwidth() is satisfied, for a trunc, the source
type is computed as the smallest type of the source vector and the
destination vector, and the destination type is computed as the largest
type of the instruction and destination type. This is clearly a logical
error, as the original source vector type could be smaller than the
original destination vector type, and the trunc semantics are broken
because we're attempting to widen.
Fixes#47665.
The behaviour of the flag should be equivalent to
__arm_streaming_compatible.
At the moment, the name suggests that '-force-streaming-compatible-sve'
on its own (i.e. without specifying `+sve`) enables the compiler to use
the streaming-compatible subset of SVE instructions, but the semantics
merely are that the function can be called with either PSTATE.SM=0 or
PSTATE.SM=1.
This patch canonicalizes constant expression GEPs to use i8 source
element type, aka ptradd. This is the ConstantFolding equivalent of the
InstCombine canonicalization introduced in #68882.
I believe all our optimizations working on constant expression GEPs
(like GlobalOpt etc) have already been switched to work on offsets, so I
don't expect any significant fallout from this change.
This is part of:
https://discourse.llvm.org/t/rfc-replacing-getelementptr-with-ptradd/68699
isLegalInterleavedAccessType expects the subvector type, but
getInterleavedMemoryOpCost is called with the full vector type. So we
need to divide by Factor.
Update VPBlendRecipe::execute to support generating code for first-lane
only. This fixes a crash in the newly added test
@test_not_first_lane_only_wide_compare_incoming_order_swapped.