42810 Commits

Author SHA1 Message Date
Craig Topper
4477500533 [RISCV] ISel (and (shift X, C1), C2)) to shift pair in more cases
Previously, these isel optimizations were disabled if the AND could
be selected as a ANDI instruction. This patch disables the optimizations
only if the immediate is valid for C.ANDI. If we can't use C.ANDI,
we might be able to compress the shift instructions instead.

I'm not checking the C extension since we have relatively poor test
coverage of the C extension. Without C extension the code size
should be equal. My only concern would be if the shift+andi had
better latency/throughput on a particular CPU.

I did have to add a peephole to match SRLIW if the input is zexti32
to prevent a regression in rv64zbp.ll.

Reviewed By: luismarques

Differential Revision: https://reviews.llvm.org/D122701
2022-03-30 11:46:42 -07:00
Craig Topper
7417eb29ce [RISCV] Use getSplatBuildVector instead of getSplatVector for fixed vectors.
The splat_vector will be legalized to build_vector eventually
anyway. This patch makes it take fewer steps.

Unfortunately, this results in some codegen changes. It looks
like it comes down to how the nodes were ordered in the topological
sort for isel. Because the build_vector is created earlier we end up
with a different ordering of nodes.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D122185
2022-03-30 11:36:34 -07:00
Fangrui Song
e78cea0a91 [X86][test] Precommit D122541 tests for prologue/epilogue CFI
Currently there is no CFI_INSTRUCTION MIR test with .ll input. This patch
adds some -stop-after=prologepilog tests.
2022-03-30 09:02:23 -07:00
Sanjay Patel
436b875e49 [SDAG] avoid libcalls to fmin/fmax for soft-float targets
This is an extension of D70965 to avoid creating a mathlib
call where it did not exist in the original source. Also see
D70852 for discussion about an alternative proposal that was
abandoned.

In the motivating bug report:
https://github.com/llvm/llvm-project/issues/54554
...we also have a more general issue about handling "no-builtin" options.

Differential Revision: https://reviews.llvm.org/D122610
2022-03-30 11:22:03 -04:00
Sanjay Patel
e18cc5277f [SDAG] try to canonicalize logical shift after bswap
When shifting by a byte-multiple:
bswap (shl X, C) --> lshr (bswap X), C
bswap (lshr X, C) --> shl (bswap X), C

This is the backend version of D122010 and an alternative
suggested in D120648.
There's an extra check to make sure the shift amount is
valid that was not in the rough draft.

I'm not sure if there is a larger motivating case for RISCV (bug report?),
but the ARM diffs show a benefit from having a late version of the
transform (because we do not combine the loads in IR).

Differential Revision: https://reviews.llvm.org/D122655
2022-03-30 09:29:32 -04:00
Sanjay Patel
849d577e56 [x86] add tests for fcmp with 0.0 operand; NFC 2022-03-30 08:37:15 -04:00
Sanjay Patel
5b4bbaa8d8 [SystemZ] generate full checks for tests; NFC
These may change if we transform the fcmp (setcc) to avoid a constant operand.
2022-03-30 08:37:15 -04:00
Simon Pilgrim
14a89d00c7 [X86] Extend xor-lea test coverage
Add XOR(ADD/SUB(X,Y),MIN_SIGNED_VALUE) tests and adjust some XOR(SHL(X,C),MIN_SIGNED_VALUE) shifts to better match LEA scales
2022-03-30 13:34:32 +01:00
Fraser Cormack
43a91a8474 [SelectionDAG] Don't create illegally-typed nodes while constant folding
This patch fixes a (seemingly very rare) crash during vector constant
folding introduced in D113300.

Normally, during legalization, if we create an illegally-typed node during
a failed attempt at constant folding it's cleaned up before being
visited, due to it having no uses.

If, however, an illegally-typed node is created during one round of
legalization and isn't cleaned up, it's possible for a second round of
legalization to create new illegally-typed nodes which add extra uses to
the old illegal nodes. This means that we can end up visiting the old
nodes before they're known to be dead, at which point we crash.

I'm not happy about this fix. Creating illegal types at all seems like a
bad idea, but we all-too-often rely on illegal constants being
successfully folded and being fixed up afterwards. However, we can't
rely on constant folding actually happening, and we don't have a
foolproof way of peering into the future.

Perhaps the correct fix is to revisit the node-iteration order during
legalization, ensuring we visit all uses of nodes before the nodes
themselves. Or alternatively we could try and clean up dead nodes
immediately after failing constant folding.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D122382
2022-03-30 13:17:55 +01:00
Simon Pilgrim
e000dbc39f [X86] Add test coverage based off Issue #51609 2022-03-30 12:57:22 +01:00
Luo, Yuanke
7471d8b13c [X86][AMX] Pre-checkin the test case for AMX undef and zero 2022-03-30 17:53:01 +08:00
Luo, Yuanke
1141c8b6fc [X86][AMX] Fix bug for amx cast tranform
After combining amx cast operation, some amx cast intrinsic may be dead
code. This patch is to delete such dead code and avoid crash.
2022-03-30 17:22:30 +08:00
Simon Pilgrim
bce954321a [X86] Add PR47857 test case 2022-03-30 09:51:36 +01:00
Liqin Weng
4cb85da811 [RISCV] Add CMIX isel pattern for (xor (and (xor rs1, rs3), rs2), rs3)
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D122702
2022-03-30 16:51:09 +08:00
Simon Pilgrim
6697e3354f [X86] combineADC - fold ADC(C1,C2,Carry) -> ADC(0,C1+C2,Carry)
If we're not relying on the flag result, we can fold the constants together into the RHS immediate operand and set the LHS operand to zero, simplifying for further folds.

We could do something similar if the flag result is in use and the constant fold doesn't affect it, but I don't have any real test cases for this yet.

As suggested by @davezarzycki on Issue #35256

Differential Revision: https://reviews.llvm.org/D122482
2022-03-30 09:11:55 +01:00
Nikita Popov
8a72391f60 [IR] Require intrinsic struct return type to be anonymous
This is an alternative to D122376. Rather than working around the
problem, this patch requires that struct return types in intrinsics
are anonymous/literal and adds auto-upgrade code to convert
existing uses of intrinsics with named struct types.

This ensures that the mapping between intrinsic name and
intrinsic function type is actually bijective, as it is supposed
to be.

This also fixes https://github.com/llvm/llvm-project/issues/37891.

Differential Revision: https://reviews.llvm.org/D122471
2022-03-30 09:51:24 +02:00
Liqin Weng
7f81765898 [RISCV][NFC] Add immediate tests for the icmp instruction
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D122651
2022-03-30 02:51:26 +00:00
Chenbing Zheng
780eb9f586 [DAGCombine] add tests for bitreverse-shift optimization
This patch add some tests to show some optimization opportunities
for bitreverse-shift.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D121507
2022-03-30 09:50:28 +08:00
Zakk Chen
b578330754 [RISCV] Use maskedoff to decide mask policy for masked compare and vmsbf/vmsif/vmsof.
masked compare and vmsbf/vmsif/vmsof are always tail agnostic, we could
check maskedoff value to decide mask policy rather than have a addtional
policy operand.

Reviewed By: craig.topper, arcbbb

Differential Revision: https://reviews.llvm.org/D122456
2022-03-29 18:05:33 -07:00
Zakk Chen
10b2760da0 Revert "[RISCV] Add policy operand for masked compare and vmsbf/vmsif/vmsof IR"
This reverts commit 10fd2822b77e12215b4ea82fc6d0a052961eb9d9.

I have a better implementation for those operations without the
additional policy operand.
masked compare and vmsbf/vmsif/vmsof are always tail agnostic so we could
assume undef maskedoff is mask agnostic.

Differential Revision: https://reviews.llvm.org/D122455
2022-03-29 18:05:33 -07:00
Yonghong Song
5898979387 BPF: support inlining __builtin_memcmp intrinsic call
Delyan Kratunov reported an issue where __builtin_memcmp is
not inlined into simple load/compare instructions.
This is a known issue. In the current state, __builtin_memcmp
will be converted to memcmp call which won't work for
bpf programs.

This patch added support for expanding __builtin_memcmp with
actual loads and compares up to currently maximum 128 total loads.
The implementation is identical to PowerPC.

Differential Revision: https://reviews.llvm.org/D122676
2022-03-29 15:03:26 -07:00
Eli Friedman
a8ebd85e46 [MC] Make MCAsmInfo::isAcceptableChar reflect MCAsmInfo::doesAllowAtInName
On targets which don't allow "@" in unquoted identifiers, make sure we
don't emit them; otherwise, we can't parse our own output.

Differential Revision: https://reviews.llvm.org/D122516
2022-03-29 14:01:32 -07:00
Sanjay Patel
7f5c2f6a76 [x86] consolidate tests and auto-gen complete check lines; NFC
The same test was duplicated in 2 files.
2022-03-29 14:55:04 -04:00
Stanislav Mekhanoshin
f311f934e1 [AMDGPU] gfx940 VALU hazard recognizer
Differntial Revision: https://reviews.llvm.org/D122339
2022-03-29 10:57:54 -07:00
Simon Pilgrim
d32d65b903 [X86] Regenerate x86-interleaved-access.ll with AVX1OR2 common check-prefix to reduce duplication 2022-03-29 12:24:46 +01:00
Jay Foad
2b754384ad [AMDGPU] Generate checks in atomic_optimizations_*.ll
This had already been done for some of these files but not all.
2022-03-29 11:05:23 +01:00
David Green
60f57b3658 [AArch64] Ensure fixed point fptoi_sat has correct saturation width
D113200 introduced an error where it was converting FP_TO_SI_SAT with
multiply to a fixed point floating point convert. The saturation
bitwidth needs to be equal to the floating point width, or else the
routine would truncate the result as opposed to saturating it.

Fixes #54601
2022-03-29 10:12:44 +01:00
Chenbing Zheng
6a01b676cf [DAGCombine] add tests for bswap-shift optimization
Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D121504
2022-03-29 16:34:52 +08:00
Zi Xuan Wu
0365c54ca3 [CSKY] Add CSKYTargetObjectFile to support exception handling
Initialize TargetLoweringObjectFileELF and EH header.
2022-03-29 16:05:30 +08:00
Zi Xuan Wu
27c18558e6 [CSKY] Add missing codegen pattern for 16-bit instruction
In generic cpu model, there are only low 16 registers and little 32-bit instruction. CK801 is the cpu
family with least basic features like generic model.

Add test run and check for generic cpu model in original test case to cover basic LLVM IR functionality.
2022-03-29 16:05:30 +08:00
Lian Wang
2c503dcb4f [RISCV][NFC] Remove redundant check and rename functions in some IR tests
Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D122204
2022-03-29 07:29:58 +00:00
Liqin Weng
d660c0d793 [RISCV] Optimize LI+SLT to SLTI+XORI for immediates in specific range
This transform will reduce one GPR.

Reviewed By: craig.topper, benshi001

Differential Revision: https://reviews.llvm.org/D122051
2022-03-29 14:46:49 +08:00
zhongyunde
2b3becb41d [AArch64][GlobalISel] Add new MOVI pattern for fp constants
GlobalISel is used in option -O0, so add MOVI pattern for it,
which is done similar in gcc.(https://godbolt.org/z/8j6fzG3h6)

Fix https://github.com/llvm/llvm-project/issues/53651

Reviewed By: dmgreen, paquette

Differential Revision: https://reviews.llvm.org/D122559
2022-03-29 10:57:22 +08:00
Craig Topper
01203918d1 [RISCV] Add computeKnownBits support for RISCVISD::GORC.
Reviewed By: luismarques

Differential Revision: https://reviews.llvm.org/D121575
2022-03-28 16:56:33 -07:00
Craig Topper
e68257fcee [RISCV][SelectionDAG] Enable TargetLowering::hasBitTest for masks that fit in ANDI.
Modified DAGCombiner to pass the shift the bittest input and the shift amount
to hasBitTest. This matches the other call to hasBitTest in TargetLowering.h

This is an alternative to D122454.

Reviewed By: luismarques

Differential Revision: https://reviews.llvm.org/D122458
2022-03-28 12:46:36 -07:00
Sanjay Patel
382de90896 [RISCV] add tests for minnum/maxnum; NFC
Issue #54554
2022-03-28 15:40:23 -04:00
Craig Topper
cfe533da05 [RISCV] Add lowering for vp.fptosi and vp.sitofp.
This as an alternative version of D120641. Starting from the code here
https://repo.hca.bsc.es/gitlab/rferrer/llvm-epi/-/raw/EPI/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
but with some modifications to how the interim types are calculated,
and adding support for f16.

Still need to add fptosi for mask vectors.

Lots of masked isel patterns added so we can pass the mask through
the type changes.

Reviewed By: frasercrmck, arcbbb

Differential Revision: https://reviews.llvm.org/D122512
2022-03-28 11:06:41 -07:00
Simon Pilgrim
8a1956dfa5 [X86] lowerV64I8Shuffle - attempt to match with lowerShuffleAsLanePermuteAndPermute
Fixes #54562
2022-03-28 17:21:27 +01:00
Thomas Symalla
3bd15c03c6 [AMDGPU] Fix adding modifiers when creating v_cmpx instructions.
Revision https://reviews.llvm.org/D122332 added a pattern transformation
where v_cmpx instructions are introduced. However, the modifiers are
not correctly inherited from the original operands. The patch
adds the source modifiers, if they are exist, or sets them to 0.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D122489
2022-03-28 17:52:53 +02:00
Jyotsna Verma
65a2f6ad9c [Hexagon] Create an intrinsic to profile using a custom handler
The intrinsic is lowered into a hexagon pseudo instruction which
after register allocation is expanded into A2_tfrsi and J2_call.
2022-03-28 10:31:41 -05:00
Daniil Kovalev
a8c277041a [NVPTX] Fix poorly designed assertion introduced in D120129
NVPTXTargetLowering::getFunctionParamOptimizedAlign, which was introduces in
D120129, contained a poorly designed assertion checking that a function with
internal or private linkage is not a kernel. It relied on invariants that
were not actually guaranteed, and that resulted in compiler crash with some
CUDA versions (see discussion with @jdoerfert in D120129). This patch changes
that assertion and makes it use isKernelFunction which is designed exactly for
such checks. This patch also includes a test with IR that caused compiler crash
before.

Differential Revision: https://reviews.llvm.org/D122562
2022-03-28 17:34:58 +03:00
Simon Pilgrim
614363ecc0 [X86] Add shuffle tests from Issue #54562 2022-03-28 13:54:17 +01:00
Carl Ritson
1f52d02ceb [AMDGPU] Split waterfall loop exec manipulation
Split waterfall loops into multiple blocks so that exec mask
manipulation (s_and_saveexec) does not occur in the middle of
a block.

VGPR live range optimizer is updated to handle waterfall loops
spanning multiple blocks.

Reviewed By: ruiling

Differential Revision: https://reviews.llvm.org/D122200
2022-03-28 17:44:54 +09:00
zhongyunde
c3fe025bd4 [AArch64][SelectionDAG] Refactor to support more scalable vector extending loads
Accord the discussion in D120953, we should firstly exclude all scalable vector
extending loads and then selectively enable those which we directly support.

This patch is intend to refactor for above (truncating stores is not touched),and
more scalable vector types will try to reduce the number of masked loads in favour
of more unpklo/hi instructions.

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D122281
2022-03-27 21:18:01 +08:00
David Green
693d3b7e76 [AArch64] Lower 3 and 4 sources buildvectors to TBL
The default expansion for buildvectors is to extract each element and
insert them into a new vector. That involves a lot of copying to/from
the GPR registers. TLB3 and TLB4 can be relatively slow instructions
with the mask needing to be loaded from a constant pool, but they should
always be better than all the moves to/from GPRs.

Differential Revision: https://reviews.llvm.org/D121137
2022-03-26 21:10:43 +00:00
zhongyunde
758be63ac6 [test][AArch64] Add a test case for D121180 NFC
Now, perform last active true vector combine only where
we're extracting from a flag-setting operation. But in
fact, the last active extracting will output LASTB + WHILELS,
and the WHILELS itself is a flag-setting operation, so
precommit this case to test the potentially further optimization.

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D122453
2022-03-26 19:12:16 +08:00
Ben Shi
bce2e208e0 [AVR] Optimize int16 airthmetic right shift for shift amount 7/14/15
Reviewed By: aykevl

Differential Revision: https://reviews.llvm.org/D115618
2022-03-26 06:53:27 +00:00
Ben Shi
49b0b5f0fa [AVR][NFC] Fix incorrect register states in expanding pseudo instructions
Reviewed By: aykevl

Differential Revision: https://reviews.llvm.org/D118354
2022-03-25 16:02:15 +00:00
Johannes Doerfert
a81fff8afd Reapply "[Intrinsics] Add nocallback to the default intrinsic attributes"
This reverts commit c5f789050daab25aad6770790987e2b7c0395936 and
reapplies 7aea3ea8c3b33c9bb338d5d6c0e4832be1d09ac3 with additional test
changes.
2022-03-25 09:36:50 -05:00
Simon Pilgrim
f84b5c11dd [X86] Add test showing failure to fold multiple constant args in ADC
As noticed on Issue #35256
2022-03-25 13:42:08 +00:00