23401 Commits

Author SHA1 Message Date
Sanjay Patel
7d3a37a4b4 [InstCombine] add tests for demanded bits of sub; NFC 2022-10-26 17:23:33 -04:00
Sanjay Patel
1bd856fbe5 [InstCombine] add tests for demanded bits of sub; NFC 2022-10-26 14:04:46 -04:00
Johannes Rudolf Doerfert
41a278f56a [OpenMP][FIX] Do not add custom state machine eagerly in LTO runs
If we run LTO optimization we migth end up introducing a custom state machine
and later transforming the region into SPMD. This is a problem. While a follow
up will introduce a check for the SPMD conversion, this already prevents the
eager custom state machine generation. Only if the kernel init function is
defined, rather then declared, we will emit a custom state machine. SPMD-zation
can happen eagerly though. Tests are adjusted via a weak definition. The LTO
test was added to verify this works as expected.

Differential Revision: https://reviews.llvm.org/D136740
2022-10-26 10:40:11 -07:00
Alex Brachet
443e2a10f6 Reland "[PGO] Make emitted symbols hidden"
This was reverted because it was breaking when targeting Darwin which
tried to export these symbols which are now hidden. It should be safe
to just stop attempting to export these symbols in the clang driver,
though Apple folks will need to change their TAPI allow list described
in the commit where these symbols were originally exported
f538018562

Then reverted again because it broke tests on MacOS, they should be
fixed now.

Bug: https://github.com/llvm/llvm-project/issues/58265

Differential Revision: https://reviews.llvm.org/D135340
2022-10-26 17:13:05 +00:00
Momchil Velikov
5ea8951b88 [FuncSpec] Add a testcase for the treatment of constant and unused arguments
Increase test coverage - check that functions are not specialised on
constant or unused arguments.

Reviewed By: SjoerdMeijer

Differential Revision: https://reviews.llvm.org/D136184
2022-10-26 17:25:18 +01:00
Momchil Velikov
9901583968 Revert "[FuncSpec] Fix specialisation based on literals"
This reverts commit a8b0f580170089fcd555ade5565ceff0ec60f609 because
of "reverse-iteration" buildbot failure.
2022-10-26 13:54:12 +01:00
Momchil Velikov
606d25e545 [FuncSpec] Compute specialisation gain even when forcing specialisation
When rewriting the call sites to call the new specialised functions, a
single call site can be matched by two different specialisations - a
"less specialised" version of the function and a "more specialised"
version of the function, e.g.  for a function

    void f(int x, int y)

the call like `f(1, 2)` could be matched by either

    void f.1(int x /* int y == 2 */);

or

    void f.2(/* int x == 1, int y == 2 */);

The `FunctionSpecialisation` pass tries to match specialisation in the
order of decreasing gain, so "more specialised" functions are
preferred to "less specialised" functions. This breaks, however, when
using the flag `-force-function-specialization`, in which case the
cost/benefit analysis is not performed and all the specialisations are
equally preferable.

This patch makes the pass calculate specialisation gain and order the
specialisations accordingly even when `-force-function-specialization`
is used, under the assumption that this flag has purely debugging
purpose and it is reasonable to ignore the extra computing effort it
incurs.

Reviewed By: ChuanqiXu, labrinea

Differential Revision: https://reviews.llvm.org/D136180
2022-10-26 10:08:03 +01:00
Momchil Velikov
a8b0f58017 [FuncSpec] Fix specialisation based on literals
The `FunctionSpecialization` pass has support for specialising
functions, which are called with literal arguments. This functionality
is disabled by default and is enabled with the option
`-function-specialization-for-literal-constant` .  There are a few
issues with the implementation, though:

* even with the default, the pass will still specialise based on
   floating-point literals

* even when it's enabled, the pass will specialise only for the `i1`
    type (or `i2` if all of the possible 4 values occur, or `i3` if all
    of the possible 8 values occur, etc)

The reason for this is incorrect check of the lattice value of the
function formal parameter. The lattice value is `overdefined` when the
constant range of the possible arguments is the full set, and this is
the reason for the specialisation to trigger. However, if the set of
the possible arguments is not the full set, that must not prevent the
specialisation.

This patch changes the pass to NOT consider a formal parameter when
specialising a function if the lattice value for that parameter is:

* unknown or undef
* a constant
* a constant range with a single element

on the basis that specialisation is pointless for those cases.

Is also changes the criteria for picking up an actual argument to
specialise if the argument is:

* a LLVM IR constant
* has `constant` lattice value
 has `constantrange` lattice value with a single element.

Reviewed By: ChuanqiXu

Differential Revision: https://reviews.llvm.org/D135893
2022-10-26 09:55:33 +01:00
Matt Arsenault
8acddef90d SimplifyLibCalls: Add missing testcase for sincospi
Part of issue 58604. Test should have been part of
50fe87a5c8597eb72e6055356fa7dad364756ff7
2022-10-25 17:06:08 -07:00
Momchil Velikov
1a525dec7f [FuncSpec] Fix missed opportunities for function specialisation
When collecting the possible constant arguments to
specialise a function the compiler will abandon the search
on the first argument that is for some reason unsuitable as
a specialisation constant. Thus, depending on the traversal
order of the functions and call sites, the compiler can end
up with a different set of possible constants, hence with
different set of specialisations.

With this patch, the compiler will skip unsuitable
constants, but nevertheless will continue searching for
more.

Reviewed By: ChuanqiXu

Differential Revision: https://reviews.llvm.org/D135867
2022-10-25 23:19:48 +01:00
Philip Reames
269bc684e7 [LV][RISCV] Disable vectorization of epilogue loops
Epilogue loop vectorization is a feature in the vectorize intended to avoid running fully scalar code when the vector length of the main loop turns out to be either longer than the trip count of the actual loop, or with a huge remainder.

In practice, this feature appears to not have been well tuned. I honestly don't think it should be on by default at all, but it definitely shouldn't be on for RISCV. Note that other targets have also disabled it, but they've done so via disabling interleaving - which is, well, completely unrelated - and we don't want to do that for RISCV.

In the near term, many examples I'm seeing have terrible codegen for epilogue vectorization. We are greatly increasing code size for little value at reasonable VLEN values for small types. In the long term, the cases that epilogue vectorization are intended to handle are likely better handled via tail folding on RISCV.

As an aside, I also don't really trust the correctness of epilogue vectorization. The code structure is such that otherwise straight forward changes sometimes break only epilogue vectorization. The reuse of an existing vplan without careful validation opens significant room for nasty bugs. Given how rarely the code is exercised, that is not a good combination.

As such, this patch introduces a TTI hook, and completely disables epilogue vectorization on RISCV.

Differential Revision: https://reviews.llvm.org/D136695
2022-10-25 14:28:02 -07:00
Alina Sbirlea
d1b19da854 [LoopPeeling] Add flag to disable support for peeling loops with non-latch exits
Add a flag to allow disabling the changes in
https://reviews.llvm.org/D134803.

Differential Revision: https://reviews.llvm.org/D136643
2022-10-25 12:19:14 -07:00
Momchil Velikov
c47739b45c [FuncSpec] Consider small noinline functions for specialisation
Small functions with size under a given threshold are not
considered for specialisaion on the presumption that they
are easy to inline. This does not apply to `noinline`
functions, though.

Reviewed By: ChuanqiXu

Differential Revision: https://reviews.llvm.org/D135862
2022-10-25 19:49:04 +01:00
Yaxun (Sam) Liu
9d5adc7e49 Revert "reland e5581df60a35 [SimplifyCFG] accumulate bonus insts cost"
This reverts commit bd7949bcd86633bd4203b2ba6f891aea00fce4d1.

Revert this patch since reviwers have different opinions regarding
the approach in post-commit review.

Will open RFC for further discussion.

Differential Revision: https://reviews.llvm.org/D132408
2022-10-25 12:15:39 -04:00
zhongyunde
620cff096a [InstCombine] Fold series of instructions into mull for more types
Relax the constraint of wide/vectors types.
Address the comment https://reviews.llvm.org/D136015?id=469189#inline-1314520

Reviewed By: spatel, chfast
Differential Revision: https://reviews.llvm.org/D136661
2022-10-25 23:04:46 +08:00
Nico Weber
76745d2b58 Revert "[PGO] Make emitted symbols hidden"
This reverts commit 04877284b4592e9286cab43467662c1b4ff81861.
Looks like this is still breaking the test
Profile-x86_64 :: instrprof-darwin-dead-strip.c
(see comment on https://reviews.llvm.org/D135340).
2022-10-25 08:54:47 -04:00
Sanjay Patel
5dcfc32822 [InstCombine] allow more commutative matches for logical-and to select fold
This is a sibling transform to the fold just above it. That was changed
to allow the corresponding commuted patterns with:
307307456277
e1bd759ea567
8628e6df7000
2022-10-24 16:40:43 -04:00
Philip Reames
c782a93773 [Instcombine] Add coverage for demanded bits of insertelement 2022-10-24 13:17:33 -07:00
Yaxun (Sam) Liu
bd7949bcd8 reland e5581df60a35 [SimplifyCFG] accumulate bonus insts cost
Fixed compile time increase due to always constructing LocalCostTracker.
Now only construct LocalCostTracker when needed.
2022-10-24 15:43:53 -04:00
Craig Topper
1edc51b56a [InstCombine] Explicitly check for scalable TypeSize.
Instead of assuming it is a fixed size.

Reviewed By: peterwaller-arm

Differential Revision: https://reviews.llvm.org/D136517
2022-10-24 12:29:06 -07:00
Alex Brachet
04877284b4 [PGO] Make emitted symbols hidden
This was reverted because it was breaking when targeting Darwin which
tried to export these symbols which are now hidden. It should be safe
to just stop attempting to export these symbols in the clang driver,
though Apple folks will need to change their TAPI allow list described
in the commit where these symbols were originally exported
f538018562

Bug: https://github.com/llvm/llvm-project/issues/58265

Differential Revision: https://reviews.llvm.org/D135340
2022-10-24 19:05:10 +00:00
David Green
093b4011e8 [ARM] Add a test demonstrating reductions with reused extend. NFC
D136227 showed that tests for this case in getReductionPatternCost were
missing.
2022-10-24 19:38:19 +01:00
zhongyunde
81713e893a [InstCombine] Fold series of instructions into mull
The following sequence should be folded into in0 * in1
      In0Lo = in0 & 0xffffffff; In0Hi = in0 >> 32;
      In1Lo = in1 & 0xffffffff; In1Hi = in1 >> 32;
      m01 = In1Hi * In0Lo; m10 = In1Lo * In0Hi; m00 = In1Lo * In0Lo;
      addc = m01 + m10;
      ResLo = m00 + (addc >> 32);

Reviewed By: spatel, RKSimon
Differential Revision: https://reviews.llvm.org/D136015
2022-10-25 01:09:37 +08:00
Kevin P. Neal
cfb88ee3ba [StrictFP][IPSCCP] Constant fold intrinsics with metadata arguments
This teaches the SCCP Solver how to constant fold more intrinsics. Constant
folding appears to be just as good as D115737 but much, much lower in code
change impact as suggested by nikic.

The constrained floating-point intrinsics all take at least one metadata
argument and were the motivation for the change.

Differential Revision: https://reviews.llvm.org/D136466
2022-10-24 11:43:20 -04:00
Ahmed Bougacha
bddd9b6b91 [InstCombine] Combine ptrauth sign/resign + auth/resign intrinsics.
(sign|resign) + (auth|resign) can be folded by omitting the middle
sign+auth component if the key and discriminator match.

Differential Revision: https://reviews.llvm.org/D132383
2022-10-24 08:03:14 -07:00
Sanjay Patel
56c6b612ae [InstCombine] vary commuted patterns for mul fold; NFC
Try to get better coverage for the pattern-matching
possibilities in D136015.
2022-10-24 09:14:46 -04:00
Sanjay Patel
41c42f5b18 [InstCombine] adjust mul tests to avoid reliance on other folds; NFC
This gets the tests closer to the form that we are
trying to test in D136015. Note that the IR has
changed, but the check lines have not changed.

This also shows that the desired commuted pattern
coverage is not as expected.
2022-10-24 09:14:46 -04:00
Benjamin Maxwell
fc28971fb9 Add nocapture to pointer parameters of masked stores/loads
The lack of this attribute (particularly on the load intrinsics)
prevented InstCombine from optimizing away allocas and memcpys
for arrays that could be read directly from rodata.

This now also includes a new test to check the masked load/store
intrinsics have the expected attributes (specifically nocapture).

Differential Revision: https://reviews.llvm.org/D135656
2022-10-24 11:15:55 +00:00
Arthur Eubanks
c8ca8fb978 [test] Use new pass manager syntax in some tests
To prepare for upcoming change.
2022-10-23 13:44:20 -07:00
skc7
e98501e27e [SLP][NFC] Added test to check resulting mask in shufflevector as per order of phinodes
Reviewed By: ABataev

Differential Revision: https://reviews.llvm.org/D136553
2022-10-23 17:20:47 +00:00
zhongyunde
770d5e89ba [tests] precommit tests for D136015
Address the commit https://reviews.llvm.org/D136015#inline-1313479

Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D136340
2022-10-23 21:36:43 +08:00
Mike Hommey
86e57e66da [InstCombine] Bail out of casting calls when a conversion from/to byval is involved.
Fixes #58307

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D135738
2022-10-23 09:49:48 +02:00
Sanjay Patel
8628e6df70 [InstCombine] use freeze to enable poison-safe logic->select fold
Without a freeze, this transform can leak poison to the output:
https://alive2.llvm.org/ce/z/GJuF9i

This makes the transform as uniform as possible, and it can help
reduce patterns like issue #58313 (although that particular
example probably still needs another transform).

Differential Revision: https://reviews.llvm.org/D136527
2022-10-22 10:42:14 -04:00
Sanjay Patel
db40f9b774 [InstCombine] add test for logical-ands to select; NFC 2022-10-22 10:42:14 -04:00
Paweł Bylica
119c34e7f9
[InstCombine][test] Add tests for mul combinations
Tests taken from https://reviews.llvm.org/D56214 and ported to
InstCombine for https://reviews.llvm.org/D136015.
2022-10-22 16:25:50 +02:00
Sanjay Patel
e1bd759ea5 [InstCombine] allow more matches for logical-ands --> select
This allows patterns with real 'and' instructions because
those are safe to transform:
https://alive2.llvm.org/ce/z/7-U_Ak
2022-10-22 08:15:50 -04:00
Florian Hahn
0b574650d6
[ConstraintElim] Add additional GEP subtraction tests. 2022-10-21 18:58:15 +01:00
Sanjay Patel
3073074562 [InstCombine] allow more commutative matches for logical-and to select fold
When the common value is part of either select condition,
this is safe to reduce. Otherwise, it is not poison-safe
(with the select form of the pattern):
https://alive2.llvm.org/ce/z/FxQTzB

This is another patch motivated by issue #58313.
2022-10-21 13:29:13 -04:00
Sanjay Patel
d7fecf26f4 [InstCombine] allow some commutative matches for logical-and to select fold
This is obviously correct for real logic instructions,
and it also works for the poison-safe variants that use
selects:
https://alive2.llvm.org/ce/z/wyHiwX

This is motivated by the lack of 'xor' folding seen in issue #58313.
This more general fold should help reduce some of those patterns,
but I'm not sure if this specific case does anything for that
particular example.
2022-10-21 11:28:38 -04:00
Bjorn Pettersson
211cf8a384 [test] Use -passes in more Transforms tests
Another step towards getting rid of dependencies to the legacy
pass manager.

Primary change here is to just do -passes=foo instead of -foo in
simple situations (when running a single transform pass). But also
updated a few test running multiple passes.

Also removed some "duplicated" RUN lines in a few tests that where
using both -foo and -passes=foo syntax. No need to do the same kind
of testing twice.
2022-10-21 17:02:02 +02:00
Philip Reames
656e53e544 [instcombine] Add basic test coverage for demanded bits of scalable vectors 2022-10-21 07:59:04 -07:00
Sanjay Patel
bf75e937bb [InstCombine] match logical and/or more generally in fold to select
This allows the regular bitwise logic opcodes in addition to the
poison-safe select variants:
https://alive2.llvm.org/ce/z/8xB9gy

Handling commuted variants safely is likely trickier, so that's
left to another patch.
2022-10-21 09:03:36 -04:00
Sanjay Patel
5a53fe846b [InstCombine] add tests for logical selects; NFC 2022-10-21 09:03:35 -04:00
Florian Hahn
fd236772f5
[IndVars] Forget SCEV for value after simplifying condition.
Additional SCEV verification highlighted a case where the cached loop
dispositions where incorrect after simplifying a condition in IndVars
and moving the user in LoopDeletion. Fix it by invalidating ICmp and all
its users.

Fixes #58515.
2022-10-21 11:18:01 +01:00
Florian Hahn
7eb4ec1c75
[VPlan] Print predicates for widened cmp instructions (NFC). 2022-10-21 08:54:11 +01:00
William Huang
6c767cef5a [InstCombine] Canonicalize GEP of GEP by swapping constant-indexed GEP to the back
Canonicalize GEP of GEP by swapping GEP with some suffix constant indices to the back (and GEP with all constant indices to the back of that), this allows more constant index GEP merging to happen. Exceptions are: If swapping violates use-def relations, or anti-optimizes LICM

For constant indexed GEP of GEP, if they cannot be merged directly, they will be casted to i8* and merged.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D125845
2022-10-20 17:41:26 +00:00
Florian Hahn
e25ed058bc
[LV] Use buildScalarSteps to also handle VF = 1. (NFCI)
The code in buildScalarSteps already properly handles creating the
scalar induction values with VF = 1. Use it directly instead of using
extra code to handle that case.

Suggested by @Ayal in D133760.
2022-10-20 14:30:01 +01:00
Nabeel Omer
e1fd6d49a3 [InstCombine] Fix assert condition in foldSelectShuffleOfSelectShuffle
Bug introduced in e239198cdbbf.

The assert() is making an assumption that the resulting shuffle mask
will always select elements from both vectors, this is untrue in the
case of two shuffles being folded if the former shuffle has a mask with
undef elements in it. In such a case folding the shuffles might result
in a mask which only selects from one of the vectors because the other
elements (in the mask) are undef.

Differential Revision: https://reviews.llvm.org/D136256
2022-10-20 12:10:54 +00:00
Florian Hahn
3a4aa24fd1
[LoopSimplifyCFG] Forget loop and block dispos after merging blocks.
This fixes another case where block and loop dispositions weren't
properly invalidate after changing the CFG.

Fixes #58489.
2022-10-20 11:23:29 +01:00
Nikita Popov
f2fe289374 [FunctionAttrs] Volatile operations can access inaccessible memory
Per LangRef, volatile operations are allowed to access the location
of their pointer argument, plus inaccessible memory:

> Any volatile operation can have side effects, and any volatile
> operation can read and/or modify state which is not accessible
> via a regular load or store in this module.
> [...]
> The allowed side-effects for volatile accesses are limited. If
> a non-volatile store to a given address would be legal, a volatile
> operation may modify the memory at that address. A volatile
> operation may not modify any other memory accessible by the
> module being compiled. A volatile operation may not call any
> code in the current module.

FunctionAttrs currently does not model this and ends up marking
functions with volatile accesses on arguments as argmemonly,
even though they should be inaccessiblemem_or_argmemonly.

Differential Revision: https://reviews.llvm.org/D135863
2022-10-20 11:57:10 +02:00