784 Commits

Author SHA1 Message Date
Florian Hahn
c48a1ebec1
[LV] Remove force-vector-width/force-vector-interleave from X86 test.
Update target-specific test to not force VF/UF, but instead use the
cost-model. There are similar tests arleady outside X86 and those force
VF & UF.

With this change, the target specific test checks the cost model.
Changes in picked VF/UF are limited to test_pr62954_scalar_epilogue_required,
and should preserve the original spirit of the test.
2024-09-17 08:59:24 +01:00
Philip Reames
2c7786e94a
Prefer use of 0.0 over -0.0 for fadd reductions w/nsz (in IR) (#106770)
This is a follow up to 924907bc6, and is mostly motivated by consistency
but does include one additional optimization. In general, we prefer 0.0
over -0.0 as the identity value for an fadd. We use that value in
several places, but don't in others. So, let's be consistent and use the
same identity (when nsz allows) everywhere.

This creates a bunch of test churn, but due to 924907bc6, most of that
churn doesn't actually indicate a change in codegen. The exception is
that this change enables the use of 0.0 for nsz, but *not* reasoc, fadd
reductions. Or said differently, it allows the neutral value of an
ordered fadd reduction to be 0.0.
2024-09-03 09:16:37 -07:00
Simon Pilgrim
6c8746b6e3
[Analysis] getIntrinsicForCallSite - add vectorization support for acos/asin/atan and cosh/sinh/tanh libcalls (#106844)
Followup to #106584 - ensure acos/asin/atan and cosh/sinh/tanh libcalls correctly map to the llvm intrinsic equivalents
2024-09-03 10:05:56 +01:00
Simon Pilgrim
4d412bedcc [LoopVectorize][X86] amdlibm-calls.ll - add missing sinh and f64 test coverage to all functions
Shows failure to vectorise acos/asin/atan and cosh/sinh/tanh libcalls if they don't have a corresponding veclib mapping
2024-08-31 11:48:22 +01:00
Philip Reames
4b553f4916 Regen a bunch of vectorizer tests to avoid naming churn in upcoming review 2024-08-30 10:13:02 -07:00
Simon Pilgrim
d58d105cda
[Analysis] isTriviallyVectorizable - add vectorization support for acos/asin/atan and cosh/sinh/tanh intrinsics (#106584)
Show fallback cases in amdlibm tests where it doesn't have that specific op
2024-08-30 16:49:23 +01:00
Simon Pilgrim
81acc84997 [LoopVectorize][X86] amdlibm-calls.ll - add 2/4/8/16 vector widths test checks for fallback to llvm intrinsics
Check for cases where there isn't a amdlib call but it still vectorises the math call
2024-08-29 17:31:55 +01:00
Simon Pilgrim
2f95298727 [LoopVectorize][X86] amdlibm-calls.ll - add additional 2/4/8/16 vector widths test checks
This should cover most amdlibm functions, but still not added every VF combo (e.g. 2f32/16f64 often vectorises to the llvm intrinsic for that vector type)
2024-08-29 14:27:31 +01:00
Simon Pilgrim
c57abc66e2 [LoopVectorize][X86] amdlibm-calls.ll - cleanup test checks for 2/4/8/16 vector widths
This cleans up the existing tests and shows the gaps in the test checks (for instance we're often testing VF4 + VF16 but not VF8 even though amdlibm supports it).
2024-08-29 14:27:31 +01:00
Florian Hahn
0a272d3a17
[LV] Use SCEV to analyze second operand for cost query.
Improve operand analysis using SCEV for cost purposes. This fixes a
divergence between legacy and VPlan-based cost-modeling after
533e6bbd0d34.

Fixes https://github.com/llvm/llvm-project/issues/106248.
2024-08-29 12:08:27 +01:00
Florian Hahn
533e6bbd0d
[VPlan] Simplify live-ins if they are SCEVConstant.
The legacy cost model in some parts checks if any of the operands are
constants via SCEV. Update VPlan construction to replace live-ins that
are constants via SCEV with such constants. This means VPlans (and
codegen) reflects what we computing the cost of and removes another case
where the legacy and VPlan cost model diverged.

Fixes https://github.com/llvm/llvm-project/issues/105722.
2024-08-26 09:15:58 +01:00
Nikita Popov
a105877646
[InstCombine] Remove some of the complexity-based canonicalization (#91185)
The idea behind this canonicalization is that it allows us to handle less
patterns, because we know that some will be canonicalized away. This is
indeed very useful to e.g. know that constants are always on the right.

However, this is only useful if the canonicalization is actually
reliable. This is the case for constants, but not for arguments: Moving
these to the right makes it look like the "more complex" expression is
guaranteed to be on the left, but this is not actually the case in
practice. It fails as soon as you replace the argument with another
instruction.

The end result is that it looks like things correctly work in tests,
while they actually don't. We use the "thwart complexity-based
canonicalization" trick to handle this in tests, but it's often a
challenge for new contributors to get this right, and based on the
regressions this PR originally exposed, we clearly don't get this right
in many cases.

For this reason, I think that it's better to remove this complexity
canonicalization. It will make it much easier to write tests for
commuted cases and make sure that they are handled.
2024-08-21 12:02:54 +02:00
Florian Hahn
2ab910c08c
[LV] Check pointer user are in loop when checking for uniform pointers.
Widening decisions are not set for users outside the loop. Avoid
crashing by only calling isVectorizedMemAccessUse for users in the loop.

Fixes https://github.com/llvm/llvm-project/issues/102934.
2024-08-13 09:23:44 +01:00
Florian Hahn
cd08fadd03
[LV] Include chains feeding inductions in cost precomputation.
Include chain of ops feeding inductions in cost precomputation for
inductions, not just the induction increment. In VPlan, those
instructions will be cleaned up, as both phi and increment are generated
by VPWidenIntOrFpInductionRecipe independently.

Fixes https://github.com/llvm/llvm-project/issues/101337.
2024-08-12 14:45:43 +01:00
Florian Hahn
db0603cb7b
[LV] Only OR unique edges when creating block-in masks.
This removes redundant ORs of matching masks.

Follow-up to f0df4fbd0c7b to reduce the number of redundant ORs for
masks.
2024-08-12 10:17:40 +01:00
Florian Hahn
5a42a677aa
[VPlan] Mark VPVectorPointer as only using the first part of the ptr.
VPVectorPointerRecipe only uses the first part of the pointer operand,
so mark it accordingly.

Follow-up suggested as part of
https://github.com/llvm/llvm-project/pull/99808.
2024-08-12 08:46:55 +01:00
Farzon Lotfi
efc6b50d2d
[LoopVectorize][X86][AMDLibm] Add Missing AMD LibM trig vector intrinsics (#101125)
Adding the following linked to their docs:
-
[amd_vrs16_acosf](9c0b67293b/scripts/libalm.def (L221))
-
[amd_vrd2_cosh](9c0b67293b/scripts/libalm.def (L124))
-
[amd_vrs16_tanhf](9c0b67293b/scripts/libalm.def (L224))
2024-08-11 22:11:09 -04:00
Florian Hahn
60680f7181
[LV] Handle SwitchInst in ::isPredicatedInst.
After f0df4fbd0c7b, isPredicatedInst needs to handle SwitchInst as well.
Handle it the same as BranchInst.

This fixes a crash in the newly added test and improves the results for
one of the existing tests in predicate-switch.ll

Should fix https://lab.llvm.org/buildbot/#/builders/113/builds/2099.
2024-08-11 20:56:58 +01:00
Florian Hahn
f0df4fbd0c
[LV] Support generating masks for switch terminators. (#99808)
Update createEdgeMask to created masks where the terminator in Src is a
switch. We need to handle 2 separate cases:

1. Dst is not the default desintation. Dst is reached if any of the
cases with destination == Dst are taken. Join the conditions for each
case where destination == Dst using a logical OR.
2. Dst is the default destination. Dst is reached if none of the cases
with destination != Dst are taken. Join the conditions for each case
where the destination is != Dst using a logical OR and negate it.

Edge masks are created for every destination of cases and/or 
default when requesting a mask where the source is a switch.

Fixes https://github.com/llvm/llvm-project/issues/48188.

PR: https://github.com/llvm/llvm-project/pull/99808
2024-08-11 20:38:36 +02:00
Florian Hahn
fdb9f96fa2
[LV] Consider earlier stores to invariant reduction address as dead.
For invariant stores to an address of a reduction, only the latest store
will be generated outside the loop. Consider earlier stores as dead.

This fixes a difference between the legacy and VPlan-based cost model.

Fixes https://github.com/llvm/llvm-project/issues/96294.
2024-08-04 20:54:26 +01:00
Florian Hahn
855703537e
[LV] Add more tests with switches.
Extra tests for
https://github.com/llvm/llvm-project/pull/99808, including cost model
tests.
2024-08-01 19:30:48 +01:00
Farzon Lotfi
378fe2fc23
[X86][LoopVectorize] Add support for arc and hyperbolic trig functions (#99383)
This change is part 2 x86 Loop Vectorization of :
https://github.com/llvm/llvm-project/pull/96222

It also has veclib call loop vectorization hence the test cases in
`llvm/test/Transforms/LoopVectorize/X86/veclib-calls.ll`

finally the last pr missed tests for
`llvm/test/CodeGen/X86/fp-strict-libcalls-msvc32.ll` and
`llvm/test/CodeGen/X86/vec-libcalls.ll` so added those aswell.

No evidence was found for arc and hyperbolic trig glibc vector math
functions

https://github.com/lattera/glibc/blob/master/sysdeps/x86/fpu/bits/math-vector.h
so no  new `_ZGVbN2v_*` and  `_ZGVdN4v_*` .
So no new tests in
`llvm/test/Transforms/LoopVectorize/X86/libm-vector-calls-VF2-VF8.ll`

Also no new svml and no new tests to:
`llvm/test/Transforms/LoopVectorize/X86/svml-calls.ll`
There was not enough evidence that there were svml arc and hyperbolic
trig vector implementations, Documentation was scarces so looked at test
cases in
[numpy](32bf2a9842/linux/avx512/svml_z0_acos_d_la.s (L8)).
Someone with more experience with svml should investigate.

## Note 
amd libm doesn't have a vector hyperbolic sine api hence why youi might
notice there are no tests for `sinh`.

## History
This change is part of
https://github.com/llvm/llvm-project/issues/87367's investigation on
supporting IEEE math operations as intrinsics.
Which was discussed in this RFC:
https://discourse.llvm.org/t/rfc-all-the-math-intrinsics/78294

This change adds loop vectorization for `acos`, `asin`, `atan`, `cosh`,
`sinh`, and `tanh`.
resolves #70079
resolves #70080
resolves #70081
resolves #70083
resolves #70084
resolves #95966
2024-07-28 20:57:43 -04:00
Florian Hahn
a3092152ac
[VPlan] Don't create live-outs for induction increments.
Follow up to fc9cd3272b5 to also skip creating live-outs for IV
increments, as those are also generated independent of VPlan for now.
2024-07-25 21:34:55 +01:00
Simon Pilgrim
010dcfd85f [CostModel][X86] Improve add/sub/mul overflow intrinsic costs
Noticed due to x86 changes in #97463
2024-07-25 16:01:05 +01:00
Florian Hahn
72532c9219
[LV] Don't predicate divs with invariant divisor when folding tail (#98904)
When folding the tail, at least one of the lanes must execute
unconditionally. If the divisor is loop-invariant no predication is
needed, as predication would not prevent the divide-by-0 on the executed
lane.

Depends on https://github.com/llvm/llvm-project/pull/98892.

PR: https://github.com/llvm/llvm-project/pull/98904
2024-07-25 12:21:09 +01:00
Florian Hahn
05f986e143
[LV] Add tests for loops with switches. 2024-07-21 10:11:38 +01:00
Florian Hahn
710dab6e18
[VPlan] Remove VPPredInstPHIRecipes without users after region merging.
After merging replicate regions, VPPredInstPHIRecipes may become unused.
Remove them directly instead of moving them to the merged region.
2024-07-20 13:21:32 +01:00
Florian Hahn
008df3cf85
[LV] Check isPredInst instead of isScalarWithPred in uniform analysis. (#98892)
Any instruction marked as uniform will result in a uniform
VPReplicateRecipe. If it requires predication, it will be placed in a
replicate region, even if isScalarWithPredication returns false.

Check isPredicatedInst instead of isScalarWithPredication to avoid
generating uniform VPReplicateRecipes placed inside a replicate region.
This fixes an assertion when using scalable VFs.

Fixes https://github.com/llvm/llvm-project/issues/80416. 
Fixes https://github.com/llvm/llvm-project/issues/94328.
Fixes https://github.com/llvm/llvm-project/issues/99625.

PR: https://github.com/llvm/llvm-project/pull/98892
2024-07-19 12:02:25 +01:00
Florian Hahn
2bb65660ae
[LV] Allow re-processing of operands of instrs feeding interleave group
Follow up to d216615518 to update dead interleave group pointer detection
to allow re-processing of operands of instructions determined to only feed
interleave groups.

This is needed because instructions feeding interleave group pointers
can become dead in any order, as per the newly added test case.
2024-07-17 21:37:28 +01:00
Florian Hahn
d216615518
[LV] Process dead interleave pointer ops in reverse order.
Process dead interleave pointer ops in reverse order. This also catches
cases where the same base pointer is used by multiple different
interleave groups.

This fixes another case where the legacy cost model inaccuarately
estimates cost, surfaced by b841e2eca3b5c8.
2024-07-17 11:43:42 +01:00
Florian Hahn
967eba0754
[LV] Add test cases for tail-folding sdiv/udiv/urem feeding geps.
Based on reduced tests from
https://github.com/llvm/llvm-project/issues/94328.
2024-07-15 11:45:07 +01:00
Florian Hahn
8fcb822da6
[LV] Add uses of result to pointer-runtime-checks-unprofitable.ll test.
Otherwise %p.2 is not used and will be removed by VPlan transforms,
leading to a difference between legacy and VPlan-based cost.
2024-07-15 09:59:46 +01:00
Florian Hahn
fc9cd3272b
[VPlan] Don't add live-outs for IV phis.
Resume and exit values for inductions are currently still created
outside of VPlan and independent of the induction recipes. Don't add
live-outs for now, as the additional unneeded users can pessimize other
anlysis.

Fixes https://github.com/llvm/llvm-project/issues/98660.
2024-07-14 20:49:03 +01:00
Florian Hahn
7a49d80f58
[VPlan] Skip users outside loop in check for exit pre-compute candidates
When collecting candidates to pre-compute cost for operands of exit
conditions, skip users outside the loop when checking if they are in
ExistInstrs. The users outside the loop should be ignored, as they won't
make a value live in the VPlan.

This fixes a failure when building for X86 with sanitizers on macOS
after b841e2eca3b5c
(https://green.lab.llvm.org/job/llvm.org/job/clang-stage2-cmake-RgSan/287/)
2024-07-11 22:04:39 +01:00
Florian Hahn
9a5a8731e7
[VPlan] Introduce ResumePhi VPInstruction, use to create phi for FOR. (#94760)
This patch introduces a new ResumePhi VPInstruction which creates a phi
in a leaf block of a VPlan. The first use is to create the phi node for
fixed-order recurrence resume values in the scalar preheader.

The VPInstruction takes 2 operands: 1) the incoming value from the
middle-block and a default value to be used for all other incoming
blocks.

In follow-up changes, it will also be used to create phis for reduction
and induction resume values.

Depends on https://github.com/llvm/llvm-project/pull/92651

PR: https://github.com/llvm/llvm-project/pull/94760
2024-07-11 16:08:04 +01:00
Florian Hahn
ef89e3efa9
[VPlan] Collect ephemeral values for VPlan.
Port collectEphemeralValues to VPlan as collectEphemeralRecipesForVPlan,
use it in willGenerateVectors. This fixes a regression caused by
29b8b72117 for loops where the only vector values are ephemeral.
2024-07-09 21:34:49 +01:00
Florian Hahn
27ccc8835e
[LV] Add tests with ephemeral values that are widened.
Add tests with loops with ephemeral values that are widened.
After 29b8b72117, @ephemeral_load_and_compare_another_load_used_outside
is vectorized even though the only vector values that are generated are
ephemeral.
2024-07-08 13:15:39 +01:00
Florian Hahn
29b8b72117
[LV] Move check if any vector insts will be generated to VPlan. (#96622)
This patch moves the check if any vector instructions will be generated
from getInstructionCost to be based on VPlan. This simplifies
getInstructionCost, is more accurate as we check the final result and
also allows us to exit early once we visit a recipe that generates
vector instructions.

The helper can then be re-used by the VPlan-based cost model to match
the legacy selectVectorizationFactor behavior, this fixing a crash and
paving the way to recommit
https://github.com/llvm/llvm-project/pull/92555.

PR: https://github.com/llvm/llvm-project/pull/96622
2024-07-07 20:08:01 +01:00
Florian Hahn
99d6c6d936
[VPlan] Model branch cond to enter scalar epilogue in VPlan. (#92651)
This patch moves branch condition creation to enter the scalar epilogue
loop to VPlan. Modeling the branch in the middle block also requires
modeling the successor blocks. This is done using the recently
introduced VPIRBasicBlock.

Note that the middle.block is still created as part of the skeleton and
then patched in during VPlan execution. Unfortunately the skeleton needs
to create the middle.block early on, as it is also used for induction
resume value creation and is also needed to properly update the
dominator tree during skeleton creation.

After this patch lands, I plan to move induction resume value and phi
node creation in the scalar preheader to VPlan. Once that is done, we
should be able to create the middle.block in VPlan directly.

This is a re-worked version based on the earlier
https://reviews.llvm.org/D150398 and the main change is the use of
VPIRBasicBlock.

Depends on https://github.com/llvm/llvm-project/pull/92525

PR: https://github.com/llvm/llvm-project/pull/92651
2024-07-05 10:08:42 +01:00
Noah Goldstein
7c96469ea8 [ValueTracking] Extend LHS/RHS with matching operand to work without constants.
Previously we only handled the `L0 == R0` case if both `L1` and `R1`
where constant.

We can get more out of the analysis using general constant ranges
instead.

For example, `X u> Y` implies `X != 0`.

In general, any strict comparison on `X` implies that `X` is not equal
to the boundary value for the sign and constant ranges with/without
sign bits can be useful in deducing implications.

Closes #85557
2024-07-03 20:18:51 +08:00
David Green
352a836176
[InstCombine] Canonicalize non-i8 gep of mul to i8 (#96606)
This is a small canonicalization for `gep i32, p, (mul x, C)` -> `gep
i8, p, (mul x, C*4)`, so that the mul can combine both of the constant
multiplications, and we take a small step towards canonicalizing more
geps to i8.

It currently doesn't attempt to check for multiple uses on the mul, but
that should be possible if it sounds better. Let me know what you think
of the idea in general.
2024-06-26 14:25:54 +01:00
Florian Hahn
8681bb8bed
[LV] Add additional test coverage for cost modeling.
Add missing tests uncovered by
https://github.com/llvm/llvm-project/pull/92555.

Includes test for https://github.com/llvm/llvm-project/issues/96294 and
https://github.com/llvm/llvm-project/issues/96328
2024-06-26 10:18:01 +01:00
Nikita Popov
eeb0884e66 [LoopUnroll] Use poison instead of undef for preheader value 2024-06-25 12:09:58 +02:00
Florian Hahn
3808ba78de
[VPlan] Model middle block via VPIRBasicBlock. (#95816)
Use VPIRBasicBlock to wrap the middle block and implement patching up
branches in predecessors in VPIRBasicBlock::execute. The IR middle block
is only created after skeleton creation. Initially a regular
VPBasicBlock is created, which will later be replaced by a
VPIRBasicBlock once the middle IR basic block has been created.

Note that this slightly changes the order of instructions created in the
middle block; code generated by recipe execution in the middle block
will now be inserted before the terminator (and in between the compare
to used by the terminator). The original order will be restored in
https://github.com/llvm/llvm-project/pull/92651.


PR: https://github.com/llvm/llvm-project/pull/95816
2024-06-20 13:42:20 +01:00
Florian Hahn
b9702bb12f
[LV] Consider insts feeding interleave group pointers free.
For interleave groups, we only generate a pointer for the start of the
interleave group (the instruction at the insert position). The other
addresses for other members are alreayd considered free, but so are
their operands, if they are only used in address computations for
other interleave group members.
2024-06-19 17:06:52 +01:00
Florian Hahn
3be7312f81
[LV] Add more masked store cost tests with different masks.
Add additional masked store tests which caused crashes with earlier
versions of https://github.com/llvm/llvm-project/pull/92555.
2024-06-19 15:34:03 +01:00
Florian Hahn
fb86cb7ec1
[LV] Add extra tests for interleave-group, reduction store costing.
Add extra cost model tests exposed by VPlan cost-model transition,
causing revert in 6f538f6a2d3224efda985e9eb09012fa4275ea92
2024-06-18 14:35:51 +01:00
Florian Hahn
52d29eb287
[LV] Add extra cost model tests with truncated inductions.
Extra test cases that caused revert of
https://github.com/llvm/llvm-project/pull/92555
2024-06-13 20:42:53 +01:00
Florian Hahn
2e4c06780c
[LV] Add extra X86 cost tests for any_of reduction and multi-exit loops.
Add extra test coverage to ensure decisions do not change when
transitioning to a VPlan-based cost model.
2024-06-10 13:13:04 +01:00
Florian Hahn
998c33e5fc
[VPlan] Mark FirstOrderRecurrenceSplice as not having side-effects.
Now that FOR exit and resume value creation is explicitly modeled in
VPlan (05e1b5340b0caf1, 07b330132c0b) it doesn't depend on the first
order recurrence splice being preserved and it can now be marked as not
having side-effects. This allows removal of first-order-recurrence-splce
if the FOR is only used in the exit or as scalar ph resume value.
2024-06-08 21:40:30 +01:00