203 Commits

Author SHA1 Message Date
Craig Topper
f668a08e00
[DAGCombiner][RISCV] Optimize (zext nneg (truncate X)) if X has known sign bits. (#82227)
This treats the zext nneg as sext if X is known to have sufficient sign
bits to allow the zext or truncate or both to removed. This code is
taken from the same optimization for sext.
2024-02-19 10:45:11 -08:00
Nikita Popov
ff9af4c43a [CodeGen] Convert tests to opaque pointers (NFC) 2024-02-05 14:07:09 +01:00
Matthias Braun
e3cf80c5c1
BlockFrequencyInfoImpl: Avoid big numbers, increase precision for small spreads
BlockFrequencyInfo calculates block frequencies as Scaled64 numbers but as a last step converts them to unsigned 64bit integers (`BlockFrequency`). This improves the factors picked for this conversion so that:

* Avoid big numbers close to UINT64_MAX to avoid users overflowing/saturating when adding multiply frequencies together or when multiplying with integers. This leaves the topmost 10 bits unused to allow for some room.
* Spread the difference between hottest/coldest block as much as possible to increase precision.
* If the hot/cold spread cannot be represented loose precision at the lower end, but keep the frequencies at the upper end for hot blocks differentiable.
2023-10-24 20:27:39 -07:00
Jay Foad
7b3bbd83c0 Revert "[CodeGen] Really renumber slot indexes before register allocation (#67038)"
This reverts commit 2501ae58e3bb9a70d279a56d7b3a0ed70a8a852c.

Reverted due to various buildbot failures.
2023-10-09 12:31:32 +01:00
Jay Foad
2501ae58e3
[CodeGen] Really renumber slot indexes before register allocation (#67038)
PR #66334 tried to renumber slot indexes before register allocation, but
the numbering was still affected by list entries for instructions which
had been erased. Fix this to make the register allocator's live range
length heuristics even less dependent on the history of how instructions
have been added to and removed from SlotIndexes's maps.
2023-10-09 11:44:41 +01:00
Simon Pilgrim
4cd1c07491 [DAG] SimplifyDemandedBits - if we're only demanding the msb, a UMIN/UMAX node can be simplified to a AND/OR node respectively.
Alive2: https://alive2.llvm.org/ce/z/qnvmc6
2023-08-18 12:12:22 +01:00
Kazushi (Jam) Marukawa
2e2395651e [VE] Change the way of lowering store
Change lowering store iff the data operand is leagalized.  In this way,
llvm can lower only operands first, then lower store instruction later.

Reviewed By: efocht

Differential Revision: https://reviews.llvm.org/D158253
2023-08-18 17:13:55 +09:00
Kazushi (Jam) Marukawa
922ac64b04 [VE] Avoid vectorizing store/load in scalar mode
Avoid vectorizing store and load instructions in scalar mode.

Reviewed By: efocht

Differential Revision: https://reviews.llvm.org/D158049
2023-08-17 02:15:54 +09:00
Eli Friedman
bc7f11ccb0 [SelectionDAG] Improve expansion of wide min/max
The current implementation tries to handle the high and low halves
separately, but that's less efficient in most cases; use a wide SETCC
instead.

Differential Revision: https://reviews.llvm.org/D151358
2023-06-26 10:45:41 -07:00
Amaury Séchet
61f9cb002d [NFC] Regenerate several VE codegen tests. 2023-06-14 16:20:37 +00:00
Craig Topper
139392c0a5 [LegalizeTypes][ARM][AArch6][RISCV][VE][WebAssembly] Add special case for smin(X, -1) and smax(X, 0) to ExpandIntRes_MINMAX.
We can compute a simpler expression for Lo for these cases. This
is an alternative for the test cases in D151180 that works for
more targets.

This is similar to some of the special cases we have for expanding
setcc operands.

Differential Revision: https://reviews.llvm.org/D151182
2023-05-23 09:19:55 -07:00
Tobias Hieta
f84bac329b
[NFC][Py Reformat] Reformat lit.local.cfg python files in llvm
This is a follow-up to b71edfaa4ec3c998aadb35255ce2f60bba2940b0
since I forgot the lit.local.cfg files in that one.

Reformatting is done with `black`.

If you end up having problems merging this commit because you
have made changes to a python file, the best way to handle that
is to run git checkout --ours <yourfile> and then reformat it
with black.

If you run into any problems, post to discourse about it and
we will try to help.

RFC Thread below:

https://discourse.llvm.org/t/rfc-document-and-standardize-python-code-style

Reviewed By: barannikov88, kwk

Differential Revision: https://reviews.llvm.org/D150762
2023-05-17 17:03:15 +02:00
Craig Topper
a983ef2c17 [DAGCombiner][AArch64][VE] Teach BuildUDIV/SDIV to use 2x mul when mulh/mul_lohi are not available.
Correct the legality of i32 mul_lohi on AArch64.

Previously, AArch64 incorrectly reported i32 mul_lohi as Legal.
This allowed BuildUDIV/SDIV to use them. A later DAGCombiner would
replace them with MULHS/MULHU because only the high half was used.
This conversion does not check the legality of MULHS/MULHU under
the assumption that LegalizeDAG can turn it back into MUL_LOHI later.

After they are converted to MULHS/MULHU, DAGCombine ran and saw that
these operations aren't supported but an i64 MUL is. So they get
converted to that plus a shift. Without this, LegalizeDAG would
convert back MUL_LOHI and isel would fail to find a pattern.

This patch teaches BuildUDIV/SDIV to create the wide mul and shift
so that we can report the correct operation legality on AArch64. It
also enables div by constant folding for more cases on VE.

I don't know if VE wants this div by constant optimization or not. If they
don't want it, they can use the isIntDivCheap hook to disable it.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D150333
2023-05-12 09:06:17 -07:00
Matt Arsenault
5b7fa4a48d VE: Register null MCTargetStreamer 2023-04-26 19:27:11 -04:00
Andrew Savonichev
c65b4d64d4 [SelectionDAG] Do not second-guess alignment for alloca
Alignment of an alloca in IR can be lower than the preferred alignment
on purpose, but this override essentially treats the preferred
alignment as the minimum alignment.

The patch changes this behavior to always use the specified
alignment. If alignment is not set explicitly in LLVM IR, it is set to
DL.getPrefTypeAlign(Ty) in computeAllocaDefaultAlign.

Tests are changed as well: explicit alignment is increased to match
the preferred alignment if it changes output, or omitted when it is
hard to determine the right value (e.g. for pointers, some structs, or
weird types).

Differential Revision: https://reviews.llvm.org/D135462
2023-02-09 18:45:20 +03:00
Matt Arsenault
778cf5431c IR: Add atomicrmw uinc_wrap and udec_wrap
These are essentially add/sub 1 with a clamping value.

AMDGPU has instructions for these. CUDA/HIP expose these as
atomicInc/atomicDec. Currently we use target intrinsics for these,
but those do no carry the ordering and syncscope. Add these to
atomicrmw so we can carry these and benefit from the regular
legalization processes.
2023-01-24 17:55:11 -04:00
Nikita Popov
4861a58769 [VE] Convert test to opaque pointers (NFC)
There is a minor codegen regression here (an extra and instruction).
The reason is that CGP only eliminates fallthrough branches if it
has made some other kind of change, and with opaque pointers that
other change does not occur.

Ideally, we should probably always try to eliminate fallthroughs,
but this runs into the problem that performing a dummy fallthrough
is a common pattern in tests for forcing SDAG to select them
separately, so it's not quite that simple.
2022-12-23 12:51:06 +01:00
Nikita Popov
ce5ef7d1d5 [VE] Name instructions in test (NFC) 2022-12-23 11:43:01 +01:00
Nikita Popov
b006b60dc9 [VE] Convert some tests to opaque pointers (NFC) 2022-12-19 13:06:34 +01:00
Ron Lieberman
38f1abef86 Revert "[SelectionDAG] Do not second-guess alignment for alloca"
Breaks amdgpu buildbot https://lab.llvm.org/buildbot/#/builders/193
 23491

This reverts commit ffedf47d8b793e07317f82f9c2a5f5425ebb71ad.
2022-12-15 10:55:18 -06:00
Andrew Savonichev
ffedf47d8b [SelectionDAG] Do not second-guess alignment for alloca
Alignment of an alloca in IR can be lower than the preferred alignment
on purpose, but this override essentially treats the preferred
alignment as the minimum alignment.

The patch changes this behavior to always use the specified
alignment. If alignment is not set explicitly in LLVM IR, it is set to
DL.getPrefTypeAlign(Ty) in computeAllocaDefaultAlign.

Tests are changed as well: explicit alignment is increased to match
the preferred alignment if it changes output, or omitted when it is
hard to determine the right value (e.g. for pointers, some structs, or
weird types).

Differential Revision: https://reviews.llvm.org/D135462
2022-12-15 18:18:12 +03:00
Sanjay Patel
fe05a0a3dd [SDAG] avoid udiv/urem transform for vector/scalar type mismatches
This solves the crashing from issue #58994.
I don't know anything about VE, so I don't know if the output
is as expected or even correct.
2022-11-15 11:01:18 -05:00
Kazushi (Jam) Marukawa
33dda45dde [VE] Change the way to lower selectcc
Change to use VEISD::CMPI/CMPU/CMPF/CMPQ and VEISD::CMOV in combineSelectCC
for better optimization.  Support VEISD::CMPI/CMPU in combineTRUNCATE also
to optimize truncate.  Remove obsolete lower patterns from VEInstrInfo.td.
Update regression tests also.

Reviewed By: efocht

Differential Revision: https://reviews.llvm.org/D136049
2022-10-20 08:08:59 +09:00
Peter Rong
c2e7c9cb33 [CodeGen] Using ZExt for extractelement indices.
In https://github.com/llvm/llvm-project/issues/57452, we found that IRTranslator is translating `i1 true` into `i32 -1`.
This is because IRTranslator uses SExt for indices.

In this fix, we change the expected behavior of extractelement's index, moving from SExt to ZExt.
This change includes both documentation, SelectionDAG and IRTranslator.
We also included a test for AMDGPU, updated tests for AArch64, Mips, PowerPC, RISCV, VE, WebAssembly and X86

This patch fixes issue #57452.

Differential Revision: https://reviews.llvm.org/D132978
2022-10-15 15:45:35 -07:00
Kazushi (Jam) Marukawa
0278c9ceb6 [VE] Change the way to lower select
Change to use VEISD::CMOV in combineSelect for better optimization.
Support VEISD::CMOV in combineTRUNCATE also to optimize trancate.
Merge functions to handle condition codes to VE.h.  And add basic
CMOV patterns to VEInstrInfo.td.  Update regression tests also.

Reviewed By: efocht

Differential Revision: https://reviews.llvm.org/D135878
2022-10-15 08:49:36 +09:00
Matt Arsenault
a61c3455c0 AtomicExpand: Use llvm.ptrmask instead of ptrtoint
This removes the ptrtoint from the load's pointer operand, although we
can't entirely eliminate these to get the LSB shift. In a future
patch, this will avoid ptrtoint in the case where the atomic is
overaligned to the word size.
2022-09-28 12:51:30 -04:00
Kazushi (Jam) Marukawa
de8013201f [VE] Change to expand FPOW
VE doesn't have FPOW instruction, so this patch makes llvm expand it.

Reviewed By: efocht

Differential Revision: https://reviews.llvm.org/D134695
2022-09-27 20:03:10 +09:00
Kazushi (Jam) Marukawa
1cef30b9d3 [VE] Disable automatic maxnum/minnum selection
Disable FMAX/FMIN selection from select_cc in VEInstrInfo.td because of
the lack of NaN consideration.  This patch removes such selection from
VEInstrInfo.td and lets llvm work on it in combineMinNumMaxNum.

Reviewed By: efocht

Differential Revision: https://reviews.llvm.org/D134595
2022-09-26 22:04:02 +09:00
Kazushi (Jam) Marukawa
76c76e9ab4 [VE] Support smax/smin
Support smax/smin in VEInstrInfo.td.  Remove obsolete patterns for
smax/smin.  Add regression tests for smax/smin/umax/umin.

Reviewed By: efocht

Differential Revision: https://reviews.llvm.org/D134583
2022-09-26 22:02:57 +09:00
Kazushi (Jam) Marukawa
337e54ec95 [VE] Add maxnum and minnum
Add maxnum and minnum for float and double.  Lowering is already
implemented, so this patch changes them legal and adds regression
tests.

Reviewed By: efocht

Differential Revision: https://reviews.llvm.org/D134108
2022-09-21 18:03:49 +09:00
Kazushi (Jam) Marukawa
3ee64ea5cf [VE] Change to expand FMA
VE has fused multiply-add instruction for only vector calculations.  This
patch forces to expand scalar FMA to multiply and add instructions.
This patch also adds regression test.

Reviewed By: efocht

Differential Revision: https://reviews.llvm.org/D134107
2022-09-21 18:02:55 +09:00
Matt Arsenault
230dbe0857 VE: Use generated checks for a copy-pasted output test 2022-09-20 16:51:04 -04:00
Craig Topper
1121eca685 [VP][VE] Default VP_SREM/UREM to Expand and add generic expansion using VP_SDIV/UDIV+VP_MUL+VP_SUB.
I want to default all VP operations to Expand. These 2 were blocking
because VE doesn't support them and the tests were expecting them
to fail a specific way. Using Expand caused them to fail differently.

Seemed better to emulate them using operations that are supported.

@simoll mentioned on Discord that VE has some expansion downstream. Not
sure if its done like this or in the VE target.

Reviewed By: frasercrmck, efocht

Differential Revision: https://reviews.llvm.org/D133514
2022-09-16 13:19:02 -07:00
Craig Topper
38ffa2bb96 [LegalizeTypes] Improve splitting for urem/udiv by constant for some constants.
For remainder:
If (1 << (Bitwidth / 2)) % Divisor == 1, we can add the high and low halves
together and use a (Bitwidth / 2) urem. If (BitWidth /2) is a legal integer
type, this urem will be expand by DAGCombiner using multiply by magic
constant. We do have to take into account that adding high and low
together can produce a carry, making it a (BitWidth / 2)+1 bit number.
So we need to also add back in the carry from the first addition.

For division:
We can use the above trick to compute the remainder, subtract that
remainder from the dividend, then multiply by the multiplicative
inverse of the Divisor modulo (1 << BitWidth).

This is based on the section "Remainder by Summing Digits" in
Hacker's delight.

The remainder trick is similar to a trick you may have learned for
determining if a decimal number is divisible by 3. You can add all the
digits together and see if the sum is divisible by 3. If you're not sure
if the sum is divisible by 3, you can add its digits together. This
can be repeated until you have a single decimal digit. If that digit
is 3, 6, or 9, then the original number is divisible by 3. This works
because 10 % 3 == 1.

gcc already does this same trick. There are additional tricks gcc
does urem as well as srem, udiv, and sdiv that I plan to add in
future patches.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D130862
2022-09-12 10:34:52 -07:00
Alex Richardson
2616e00949 [update_llc_test_checks][VE] Handle .Lfoo$local in function regex
While working on https://reviews.llvm.org/D131429, I got a test diff in
one of the VE tests and running update_llc_test_checks.py deleted all the
code for that function. This updates the regex to handle this new output.

Reviewed By: kaz7

Differential Revision: https://reviews.llvm.org/D131431
2022-08-24 14:16:20 +00:00
Kazushi (Jam) Marukawa
b88aba9d7d [VE] Support inlineasm memory operand
Support inline asm memory operand for VE.  Add regression tests also.

Reviewed By: efocht

Differential Revision: https://reviews.llvm.org/D132380
2022-08-23 13:44:03 +09:00
Amaury Séchet
06da353748 [NFC] Automatically generate CodeGen/VE/Scalar/atomic.ll 2022-07-27 23:52:00 +00:00
Kazushi (Jam) Marukawa
469044cfd3 [VE] Support load/store/spill of vector mask registers
Support load/store/spill of vector mask registers and add regression
tests.

Reviewed By: efocht

Differential Revision: https://reviews.llvm.org/D129415
2022-07-19 10:29:21 +09:00
Kazushi (Jam) Marukawa
da5a6b2bf5 [VE] Restructure eliminateFrameIndex
Restructure the current implementation of eliminateFrameIndex function
in order to support more instructions.

Reviewed By: efocht

Differential Revision: https://reviews.llvm.org/D129034
2022-07-05 20:00:19 +09:00
Kazushi (Jam) Marukawa
9ad38e5288 Revert "[VE] Restructure eliminateFrameIndex"
This reverts commit 98e52e8bff525b1fb2b269f74b27f0a984588c9c.
2022-07-05 19:35:12 +09:00
Kazushi (Jam) Marukawa
98e52e8bff [VE] Restructure eliminateFrameIndex
Restructure the current implementation of eliminateFrameIndex function
in order to support more instructions.

Reviewed By: efocht

Differential Revision: https://reviews.llvm.org/D129034
2022-07-05 19:28:11 +09:00
Kazushi (Jam) Marukawa
adbb46ea65 [VE] Support load/store vm regsiters
Support load/store vm registers to memory location as a first step.
As a next step, support load/store vm registers to stack location.
This patch also adds several regression tests for not only load/store
vm registers but also missing load/store for vr registers.

Reviewed By: efocht

Differential Revision: https://reviews.llvm.org/D128610
2022-07-01 08:25:24 +09:00
Daniil Kovalev
62a983ebc5 Revert "[CodeGen] Place SDNode debug ID declaration under appropriate #if"
This reverts commit 83a798d4b0e17ac41d5430f1290d3661343eee1e.

As discussed in D120714 with @thakis, the patch added unneeded complexity
without noticeable benefits.
2022-04-06 20:32:53 +03:00
Daniil Kovalev
83a798d4b0 [CodeGen] Place SDNode debug ID declaration under appropriate #if
Place PersistentId declaration under #if LLVM_ENABLE_ABI_BREAKING_CHECKS to
reduce memory usage when it is not needed.

Differential Revision: https://reviews.llvm.org/D120714
2022-04-06 14:09:32 +03:00
Craig Topper
49c2206b3b [VP] Preserve address space of pointer for strided load/store intrinsics.
This adds LLVMAnyPointerToElt to use instead of LLVMPointerToElt.
This allows us to preserve the address space as part of the type
overload for the intrinsic, but still require the vector element
type to match the pointer type.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D122042
2022-03-22 09:52:54 -07:00
Jake Egan
c7dc9dbaff [VE] Remove output to /dev/stdout
Sending output to /dev/stdout on AIX gets an llc permission denied error, so this patch removes this from the tests.

Reviewed By: simoll, hubert.reinterpretcast

Differential Revision: https://reviews.llvm.org/D121799
2022-03-16 11:42:09 -04:00
Simon Moll
91fad1167a [VE] v512|256 f32|64 fneg isel and tests
fneg instruction isel and tests. We do this also in preparation of fused
negatate-multiple-add fp operations.

Reviewed By: kaz7

Differential Revision: https://reviews.llvm.org/D121620
2022-03-16 11:31:26 +01:00
Simon Moll
6ac3d8ef9c [VE] strided v256.23 isel and tests
ISel for experimental.vp.strided.load|store for v256.32 types via
lowering to vvp_load|store SDNodes.

Reviewed By: kaz7

Differential Revision: https://reviews.llvm.org/D121616
2022-03-15 15:29:19 +01:00
Simon Moll
3297571e32 [VE] v256f32|64 fma isel
llvm.fma|fmuladd vp.fma isel and tests

Reviewed By: kaz7

Differential Revision: https://reviews.llvm.org/D121477
2022-03-14 15:59:13 +01:00
Kazushi (Jam) Marukawa
9260592141 [VE] Support more intrinsics
Support new intrinsics for following instrauctions.
  - VLDZ, VPCNT, VBRV
  - LCR, SCR, TSCR, FIDCR
  - FENCE
Also clean the intrinsics implementation of a following instruction.
  - SVOB

Reviewed By: simoll

Differential Revision: https://reviews.llvm.org/D121509
2022-03-14 19:17:15 +09:00