91181 Commits

Author SHA1 Message Date
Simon Pilgrim
9db1eb13b6 [Thumb2] Regenerate thumb2-teq2 tests 2022-04-04 12:48:20 +01:00
David Green
2abaa027d9 [AArch64] Teach the costmodel about widening muls
A vector mul(sext, sext) or mul(zext, zext) will be code generated as a
single smull or umull instruction. This most notably effects v2i64
multiplies, which are otherwise not legal and need to be expanded.

The oneuse check has also been slightly changed, as it is already
checked from the use of isWideningInstruction in getCastInstrCost.

Differential Revision: https://reviews.llvm.org/D123006
2022-04-04 12:45:04 +01:00
Simon Pilgrim
ec93435ba0 [Thumb2] Regenerate thumb2-teq tests 2022-04-04 12:24:35 +01:00
Simon Pilgrim
d4cdaa24fd [MIPS] Regenerate countleading tests with common check prefixes 2022-04-04 12:19:57 +01:00
David Green
2e2f38a1ac [AArch64] Add widening arithmetic cost tests. NFC 2022-04-04 12:19:45 +01:00
Nikita Popov
3c9f3f76f1 [ConstantFold] Fold zero-index GEPs with opaque pointers
With opaque pointers, we can eliminate zero-index GEPs even if
they have multiple indices, as this no longer impacts the result
type of the GEP.

This optimization is already done for instructions in InstSimplify,
but we were missing the corresponding constant expression handling.

The constexpr transform is a bit more powerful, because it can
produce a vector splat constant and also handles undef values --
it is an extension of an existing single-index transform.
2022-04-04 13:04:27 +02:00
Nikita Popov
d092df42f3 [InstSimplify] Add tests for zero-offset opaque ptr constexpr GEP (NFC) 2022-04-04 13:04:26 +02:00
Simon Pilgrim
ad59bd0be9 [X86] Regenerate peep tests checks 2022-04-04 12:02:33 +01:00
Muhammad Omair Javaid
a96638e50e Revert "[NFCI] Regenerate PhaseOrdering test checks"
This reverts commit e91fe08999d5f5d7e7777837c529bac692d06c1b.

Breaks following buildbots: https://lab.llvm.org/buildbot/#/builders/171
2022-04-04 15:30:57 +05:00
Jeremy Morse
059d1f84d2 [DebugInfo] Correctly recognize bitfields when emitting dwarf
Use the "isBitfield" flag for debug types to determine whether something is
a bitfield, rather than trying to guess from it's layout. Fixes
https://bugs.llvm.org/show_bug.cgi?id=44601

Patch by: mahkoh

Differential Revision: https://reviews.llvm.org/D96334
2022-04-04 11:14:13 +01:00
Simon Pilgrim
623d4b5787 [X86] Support optional NOT stages in the AND(SRL(X,Y),1) -> SETCC(BT(X,Y)) fold
Extension to D122891, peek through NOT() ops, adjusting the condcode as we go.
2022-04-04 10:51:26 +01:00
Simon Pilgrim
842175676c [X86] Add additional test cases for NOT(AND(SRL(X,Y),1))/AND(SRL(NOT(X(,Y),1) -> SETCC(BT(X,Y))
As suggested in post review on D122891
2022-04-04 10:29:33 +01:00
Martin Sebor
5ccfd5f6d4 [SimplifyLibCalls] Optimize memchr() with known char+str and unknown length
If both the character and string are known, but the length
potentially isn't, we can optimize the memchr() call to a select
of either the known position of the character or null.

Split off from https://reviews.llvm.org/D122836.
2022-04-04 11:01:33 +02:00
Martin Sebor
d18991debf [SimplifyLibCalls] Fold memchr() with size 1
If the memchr() size is 1, then we can convert the call into a
single-byte comparison. This works even if both the string and the
character are unknown.

Split off from https://reviews.llvm.org/D122836.
2022-04-04 10:41:20 +02:00
Martin Sebor
0f08875744 [InstCombine] Add additional memchr test (NFC)
And fix some test names / comments.
2022-04-04 10:41:20 +02:00
Nikita Popov
a5c3b5748c [MemCpyOpt] Work around PR54682
As discussed on https://github.com/llvm/llvm-project/issues/54682,
MemorySSA currently has a bug when computing the clobber of calls
that access loop-varying locations. I think a "proper" fix for this
on the MemorySSA side might be non-trivial, but we can easily work
around this in MemCpyOpt:

Currently, MemCpyOpt uses a location-less getClobberingMemoryAccess()
call to find a clobber on either the src or dest location, and then
refines it for the src and dest clobber. This was intended as an
optimization, as the location-less API is cached, while the
location-affected APIs are not.

However, I don't think this really makes a difference in practice,
because I don't think anything will use the cached clobbers on
those calls later anyway. On CTMark, this patch seems to be very
mildly positive actually.

So I think this is a reasonable way to avoid the problem for now,
though MemorySSA should also get a fix.

Differential Revision: https://reviews.llvm.org/D122911
2022-04-04 10:19:51 +02:00
Nikita Popov
c0cc98251a [Float2Int] Make sure dependent ranges are calculated first (PR54669)
The range calculation in walkForwards() assumes that the ranges of
the operands have already been calculated. With the used visit
order, this is not necessarily the case when there are multiple
roots. (There is nothing guaranteeing that instructions are visited
in topological order.)

Fix this by queuing instructions for reprocessing if the operand
ranges haven't been calculated yet.

Fixes https://github.com/llvm/llvm-project/issues/54669.

Differential Revision: https://reviews.llvm.org/D122817
2022-04-04 10:18:39 +02:00
Min-Yih Hsu
fccdc5618d [M68k] Adopt VarLenCodeEmitter for shift / rotate instructions
This patch is covered by existing MC tests.
2022-04-03 22:52:32 -07:00
Min-Yih Hsu
22201f499d [M68k][test] Remove redundant CHECK-LABEL directive
The associated test had a redundant CHECK-LABEL directive that might fail
the test since the inception, but this issue was "burried" by a missing
colon, which was addressed in fb65aaf0be09936e657d339f3dc8e62666a41956.
Thus, the test finally failed after the said commit.

This patch remove that CHECK-LABEL directive.
2022-04-03 22:51:03 -07:00
Augie Fackler
e90bce8f91 CallBase: fix getFnAttr so it also checks the function
Prior to this change, CallBase::hasFnAttr checked the called function to
see if it had an attribute if it wasn't set on the CallBase, but
getFnAttr didn't do the same delegation, which led to very confusing
behavior. This patch fixes the issue by making CallBase::getFnAttr also
check the function under the same circumstances.

Test changes look (to me) like they're cleaning up redundant attributes
which no longer get specified both on the callee and call. We also clean
up the one ad-hoc implementation of this getter over in InlineCost.cpp.

Differential Revision: https://reviews.llvm.org/D122821
2022-04-03 23:19:23 -04:00
Philip Reames
88de27e3fd [LV] Handle non-integral types when considering interleave widening legality
In general, anywhere we might need to insert a blind bitcast, we need to make sure the types are losslessly convertible.

This fixes pr54634.
2022-04-03 20:16:20 -07:00
Dávid Bolvanský
872f7000fc Revert "[NFCI] Regenerate SROA/LoopVectorize test checks"
This reverts commit 14e3450fb57305aa9ff3e9e60687b458e43835c9.
2022-04-04 01:15:30 +02:00
Dávid Bolvanský
14e3450fb5 [NFCI] Regenerate SROA test checks 2022-04-04 00:55:54 +02:00
Dávid Bolvanský
e91fe08999 [NFCI] Regenerate PhaseOrdering test checks 2022-04-04 00:28:57 +02:00
Dávid Bolvanský
260679b000 [NFCI] Regenerate LoopIdiomRecognize test checks 2022-04-04 00:21:26 +02:00
Dávid Bolvanský
a113a582b1 [NFCI] Regenerate LoopVectorize test checks 2022-04-03 21:56:24 +02:00
Dávid Bolvanský
11b41910dd [NFCI] Regenerate instsimplify test checks 2022-04-03 20:55:15 +02:00
Hirochika Matsumoto
f138a9964b Reapply "[InstSimplify][NFC] Add baseline tests for folds of icmp with ctpop"
This change was previously reverted because I forgot rerunning
update_test_checks.py and tests were not actually baseline.

Extracted from: https://reviews.llvm.org/D122757
2022-04-03 22:07:04 +09:00
Dávid Bolvanský
fb65aaf0be [NFCI] Fixed missing colon in CHECK directives - part 2 2022-04-03 14:42:59 +02:00
Dávid Bolvanský
f02a0a69af [NFCI] Fixed missing colon in CHECK directives 2022-04-03 11:52:38 +02:00
Simon Pilgrim
fbfd78f7aa [X86] lowerShuffleAsRepeatedMaskAndLanePermute - allow v16i32 sub-lane permutes for v64i8 shuffles
Without VBMI, we are better off permuting v16i32 sub-lanes, even though its a variable shuffle, if it allows us to then shuffle v64i8 inlane repeated masks (PSHUFB etc.)

Fixes #54658
2022-04-03 10:05:10 +01:00
Alexander Shaposhnikov
6cf10b7e6e [InstCombine] Fold srem(X, PowerOf2) == C into (X & Mask) == C for positive C
This diff extends InstCombinerImpl::foldICmpSRemConstant to handle the cases
srem(X, PowerOf2) == C and
srem(X, PowerOf2) != C
for positive C.
This addresses the issue https://github.com/llvm/llvm-project/issues/54650

Differential revision: https://reviews.llvm.org/D122942

Test plan: make check-all
2022-04-03 03:57:05 +00:00
Alexander Shaposhnikov
911cfcd7f5 [InstCombine][NFC] Add baseline tests for folds of srem(X, PowerOf2) == C
Extracted from: https://reviews.llvm.org/D122942

Test plan: make check-all
2022-04-03 03:26:47 +00:00
Sanjay Patel
5f8c2b884d [InstCombine] limit icmp fold with sub if other sub user is a phi
This is a hacky fix for:
https://github.com/llvm/llvm-project/issues/54558

As discussed there, codegen regressed when we opened up this transform
to allow extra uses ( 61580d0949fd3465 ), and it's not clear how to
undo the transforms at the later stage of compilation.

As noted in the code comments, there's a set of remaining folds that
are still limited to one-use, so we can try harder to refine and
expand the limitations on these folds, but it's likely to be an
up-and-down battle as we find and overcome similar regressions.

Differential Revision: https://reviews.llvm.org/D122909
2022-04-02 19:23:42 -04:00
Sanjay Patel
97ac0cd6c4 [InstCombine] fold fcmp with lossy casted constant (2nd try)
This is a retry of 9397bdc67eb2 - that was reverted until
we had a clang warning in place to alert users about a
possible mistake in source. The warning was added with
ab982eace6e4.

This is noted as a missing clang warning in #54222,
but it is also a missing optimization opportunity.

Alive2 proofs:
https://alive2.llvm.org/ce/z/Q8drDq
https://alive2.llvm.org/ce/z/pE6LRt

I don't see a single conversion for all predicates
using "getFCmpCode" logic, so other predicates are
left as a TODO item.
2022-04-02 19:23:01 -04:00
Roman Lebedev
308ca349cb
[InstCombine] Fold (X | C2) ^ C1 --> (X & ~C2) ^ (C1^C2)
These two are equivalent,
and i *think* the `and` form is more-ish canonical.

General proof: https://alive2.llvm.org/ce/z/RrF5s6

If constant on the (outer) `xor` is an `undef`,
the whole lane is dead: https://alive2.llvm.org/ce/z/mu4Sh2

However, if the constant on the (inner) `or` is an `undef`,
we must sanitize it first: https://alive2.llvm.org/ce/z/MHYJL7
I guess, producing a zero `and`-mask is optimal in that case.

alive-tv is happy about the entirety of `xor-of-or.ll`.
2022-04-03 00:12:56 +03:00
Roman Lebedev
3ae08dac8f
[NFC][InstCombine] Autogenerate check lines in a test affected by the future change 2022-04-03 00:12:56 +03:00
Roman Lebedev
b3fca02a6d
[NFC][InstCombine] Add some tests for (X | C2) ^ C1 pattern 2022-04-03 00:12:48 +03:00
Florian Hahn
95b2aa511e
[VPlan] Set VPlan header block name to vector.body.
This brings the VPlan block naming in line with the naming of the
generated basic blocks.
2022-04-02 19:34:32 +01:00
Hirochika Matsumoto
f65c78a094 Revert "[InstSimplify][NFC] Add baseline tests for folds of icmp with ctpop"
This reverts commit b48abeea44ac3c7860b13b863210116e8db1d978.

Accidentally added already optimized tests, not baseline tests.
2022-04-03 02:27:59 +09:00
Hirochika Matsumoto
b48abeea44 [InstSimplify][NFC] Add baseline tests for folds of icmp with ctpop
Extracted from: https://reviews.llvm.org/D122757
2022-04-03 02:19:24 +09:00
wanglei
cd85ea9431 [LoongArch] Fix instruction definition
This patch fixes issue with the LU32I_D instruction, which did not have
an input register operand.

Differential Revision: https://reviews.llvm.org/D122970
2022-04-02 18:08:29 +08:00
Jake Egan
3db9fd51b5 [AIX] XFAIL tests because of no big archive writer operation support
Big archive writer operation is not currently supported so mark these tests XFAIL for now.

Reviewed By: jsji

Differential Revision: https://reviews.llvm.org/D122949
2022-04-01 22:40:22 -04:00
Craig Topper
d970e96c53 [RISCV] Add lowering for vp.fptoui and vp.uitofp.
This is a straightforward extension of D122512 to unsigned integers.
2022-04-01 18:28:46 -07:00
Craig Topper
fa630e7594 [RISCV][AMDGPU][TargetLowering] Special case overflow expansion for (uaddo X, 1).
If we expand (uaddo X, 1) we previously expanded the overflow calculation
as (X + 1) <u X. This potentially increases the live range of X and
can prevent X+1 from reusing the register that previously held X.

Since we're adding 1, overflow only occurs if X was UINT_MAX in which
case (X+1) would be 0. So this patch adds a special case to expand
the overflow calculation to (X+1) == 0.

This seems to help with uaddo intrinsics that get introduced by
CodeGenPrepare after LSR. Alternatively, we could block the uaddo
transform in CodeGenPrepare for this case.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D122933
2022-04-01 13:14:10 -07:00
Craig Topper
31b8a1dc46 [RISCV] Add tests for uaddo with a constant 1. NFC
The overflow calculation can be optimized to check if the add
result is 0.
2022-04-01 12:29:08 -07:00
zhijian
b36be2f77f Addressed post-commit comment https://reviews.llvm.org/D122746#inline-1175831 2022-04-01 14:10:22 -04:00
Sanjay Patel
ec0b332cd8 [AArch64] add tests for funnel+or == 0; NFC
These are copied from x86 ( 1074bdfb52b2e1753e51472 ) to
provide more coverage for a potential generic combine.
2022-04-01 13:39:25 -04:00
Sanjay Patel
2c6f78dc2c [InstCombine] add tests for icmp with sub with multiple uses; NFC
Issue #54558
2022-04-01 13:39:24 -04:00
Simon Pilgrim
c64f37f818 [X86] matchAddressRecursively - add XOR(X, MIN_SIGNED_VALUE) handling
Allows us to fold XOR(X, MIN_SIGNED_VALUE) == ADD(X, MIN_SIGNED_VALUE) into LEA patterns

As mentioned on PR52267.

Differential Revision: https://reviews.llvm.org/D122815
2022-04-01 17:26:29 +01:00