1866 Commits

Author SHA1 Message Date
Simon Pilgrim
8757ce4901 [PowerPC] Replace PPCISD::VABSD cases with generic ISD::ABDU(X,Y) node
A move towards using the generic ISD::ABDU nodes on more backends

Also support ISD::ABDS for v4i32 types using the existing signbit flip trick

PowerPC has a select(icmp_ugt(x,y),sub(x,y),sub(y,x)) -> abdu(x,y) combine that I intend to move to DAGCombiner in a future patch.

The ABS(SUB(X,Y)) -> PPCISD::VABSD(X,Y,1) v4i32 combine wasn't legal (https://alive2.llvm.org/ce/z/jc2hLU), so I've removed it, having already added tests for the equivalent legal sub nsw case.
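As a scalar illustration of these identities, here is a minimal sketch of one i32 lane in plain C++. It is not LLVM code, and all helper names are hypothetical. It spells out the select form of abdu, the sign-bit flip that turns an unsigned absolute difference into a signed one, and a case where abs(sub(x, y)) with a wrapping subtraction is not the absolute difference.

```cpp
#include <cassert>
#include <cstdint>

// Unsigned absolute difference, written as the
// select(icmp_ugt(x,y), sub(x,y), sub(y,x)) pattern.
uint32_t abdu(uint32_t x, uint32_t y) { return x > y ? x - y : y - x; }

// Signed absolute difference via the sign-bit flip trick: XOR-ing bit 31
// maps signed order onto unsigned order, so abds(x,y) == abdu(x^S, y^S).
uint32_t abdsViaFlip(int32_t x, int32_t y) {
  const uint32_t SignBit = 0x80000000u;
  return abdu(static_cast<uint32_t>(x) ^ SignBit,
              static_cast<uint32_t>(y) ^ SignBit);
}

// Reference signed absolute difference, computed in 64 bits so it cannot overflow.
uint32_t abdsRef(int32_t x, int32_t y) {
  int64_t d = static_cast<int64_t>(x) - static_cast<int64_t>(y);
  return static_cast<uint32_t>(d < 0 ? -d : d);
}

// abs(sub(x, y)) with wrapping i32 arithmetic, i.e. a sub without nsw.
uint32_t absOfWrappingSub(int32_t x, int32_t y) {
  uint32_t d = static_cast<uint32_t>(x) - static_cast<uint32_t>(y);
  // Two's-complement abs: negate when the sign bit is set (INT_MIN stays INT_MIN).
  return (d & 0x80000000u) ? 0u - d : d;
}
```

For x = INT32_MAX and y = INT32_MIN, the flipped abdu and the 64-bit reference both give 0xFFFFFFFF, while the wrapping abs(sub) gives 1; a sub nsw rules this overflowing case out.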

Differential Revision: https://reviews.llvm.org/D142313
2023-02-25 20:17:17 +00:00
Ting Wang
d567e06946 [PowerPC][NFC] refactor eligible check for tail call optimization
The check logic for TCO is scattered across two functions,
IsEligibleForTailCallOptimization_64SVR4() and IsEligibleForTailCallOptimization(),
and currently serves only the instruction selection phase.

This patch refactors the existing logic to export an API for querying TCO
eligibility before the instruction selection phase.

Reviewed By: shchenz, nemanjai

Differential Revision: https://reviews.llvm.org/D141673
2023-02-21 06:14:47 -05:00
esmeyi
fd226142fc [AIX] Lower some memory intrinsics to millicode functions on AIX
Summary: Currently we lower MEMCPY/MEMMOVE/MEMSET/BZERO to the corresponding libc functions. And the libc functions call the millicode functions on AIX. We can lower these intrinsics directly to save one call layer.

Reviewed By: shchenz

Differential Revision: https://reviews.llvm.org/D143997
2023-02-20 22:25:49 -05:00
Kazu Hirata
f8f3db2756 Use APInt::count{l,r}_{zero,one} (NFC) 2023-02-19 22:04:47 -08:00
Kazu Hirata
7e6e636fb6 Use llvm::has_single_bit<uint32_t> (NFC)
This patch replaces isPowerOf2_32 with llvm::has_single_bit<uint32_t>
where the argument is wider than uint32_t.
2023-02-15 22:17:27 -08:00
Matt Arsenault
09dd4d870e DAG: Remove hasBitPreservingFPLogic
This doesn't make sense as an option. fneg and fabs are bit-preserving
by definition. If a target has an fneg or fabs instruction that is
not bit-preserving, it's incorrect to lower fneg/fabs to use it.
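Because IEEE-754 fneg and fabs only touch the sign bit, both can be written as pure integer bit operations that leave every other bit, including NaN payloads, unchanged. A small sketch for 32-bit floats (helper names are illustrative, not LLVM APIs):

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Reinterpret a float's bits (memcpy is the portable pre-C++20 bit_cast).
uint32_t bitsOf(float f) { uint32_t u; std::memcpy(&u, &f, sizeof u); return u; }
float fromBits(uint32_t u) { float f; std::memcpy(&f, &u, sizeof f); return f; }

// fneg: flip only the sign bit.
float fnegBits(float f) { return fromBits(bitsOf(f) ^ 0x80000000u); }

// fabs: clear only the sign bit.
float fabsBits(float f) { return fromBits(bitsOf(f) & 0x7FFFFFFFu); }
```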
2023-02-14 10:25:24 -04:00
Kazu Hirata
64dad4ba9a Use llvm::bit_cast (NFC) 2023-02-14 01:22:12 -08:00
Philip Reames
3be1ae24fb [CodeGen] Add standard print/debug utilities to MVT
Doing so makes it easier to do printf-style debugging in an idiomatic manner. I followed the code structure of Value, with only the definition of dump being #ifdef'd out in non-debug builds. Not sure if this is the "right" option; we don't seem to have any single consistent scheme for how dump is handled.
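A minimal sketch of the print/dump layout described here, using a hypothetical SimpleVT type and std::ostream in place of LLVM's raw_ostream. The guard macro follows the usual LLVM convention for dump methods; this is an illustration, not the MVT code itself.

```cpp
#include <cassert>
#include <iostream>
#include <sstream>
#include <string>

// Hypothetical stand-in for a value type such as MVT; not LLVM's class.
struct SimpleVT {
  std::string Name;

  // Always available: print into any output stream.
  void print(std::ostream &OS) const { OS << Name; }

  // Declared unconditionally...
  void dump() const;
};

// ...but only defined in debug builds (or when LLVM_ENABLE_DUMP is set).
#if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
void SimpleVT::dump() const {
  print(std::cerr);
  std::cerr << '\n';
}
#endif

// operator<< lets the type be streamed directly while debugging.
inline std::ostream &operator<<(std::ostream &OS, const SimpleVT &VT) {
  VT.print(OS);
  return OS;
}
```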

Note: This is a follow up to D143454 which did the same for EVT.

Differential Revision: https://reviews.llvm.org/D143511
2023-02-07 10:50:14 -08:00
Kazu Hirata
e078201835 [Target] Use llvm::count{l,r}_{zero,one} (NFC) 2023-01-28 09:23:07 -08:00
Kazu Hirata
f20b5071f3 [llvm] Use llvm::bit_floor instead of llvm::PowerOf2Floor (NFC) 2023-01-28 09:06:31 -08:00
Matt Arsenault
778cf5431c IR: Add atomicrmw uinc_wrap and udec_wrap
These are essentially add/sub 1 with a clamping value.

AMDGPU has instructions for these. CUDA/HIP expose these as
atomicInc/atomicDec. Currently we use target intrinsics for these,
but those do not carry the ordering and syncscope. Add these to
atomicrmw so we can carry these and benefit from the regular
legalization processes.
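The semantics of the two new operations (uinc_wrap: old u>= val ? 0 : old + 1; udec_wrap: (old == 0 || old u> val) ? val : old - 1) can be modelled with a compare-exchange loop on std::atomic. This is a sketch of the semantics only, assuming sequentially consistent ordering; it is not how any particular backend lowers them, and the function names are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// atomicrmw uinc_wrap: new = old >= Val ? 0 : old + 1. Returns the old value.
uint32_t atomicUIncWrap(std::atomic<uint32_t> &A, uint32_t Val) {
  uint32_t Old = A.load();
  uint32_t New;
  do {
    New = Old >= Val ? 0 : Old + 1;
  } while (!A.compare_exchange_weak(Old, New)); // reloads Old on failure
  return Old;
}

// atomicrmw udec_wrap: new = (old == 0 || old > Val) ? Val : old - 1.
uint32_t atomicUDecWrap(std::atomic<uint32_t> &A, uint32_t Val) {
  uint32_t Old = A.load();
  uint32_t New;
  do {
    New = (Old == 0 || Old > Val) ? Val : Old - 1;
  } while (!A.compare_exchange_weak(Old, New));
  return Old;
}
```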
2023-01-24 17:55:11 -04:00
Guillaume Chatelet
8b1d86aedf [NFC] Deprecate SelectionDAG::getLoad overload that takes alignment as unsigned
2023-01-24 09:42:36 +00:00
Craig Topper
79858d1908 [CodeGen][Target] Remove uses of Register::isPhysicalRegister/isVirtualRegister. NFC
Use isPhysical/isVirtual methods.
2023-01-13 23:12:48 -08:00
Guillaume Chatelet
8fd5558b29 [NFC] Use TypeSize::getFixedValue() instead of TypeSize::getFixedSize()
This change is one of a series to implement the discussion from
https://reviews.llvm.org/D141134.
2023-01-11 16:49:38 +00:00
serge-sans-paille
38818b60c5 Move from llvm::makeArrayRef to ArrayRef deduction guides - llvm/ part
Use deduction guides instead of helper functions.

The only non-automatic changes have been:

1. ArrayRef(some_uint8_pointer, 0) needs to be changed into ArrayRef(some_uint8_pointer, (size_t)0) to avoid an ambiguous call with ArrayRef((uint8_t*), (uint8_t*))
2. CVSymbol sym(makeArrayRef(symStorage)); needed to be rewritten as CVSymbol sym{ArrayRef(symStorage)}; otherwise the compiler is confused and thinks we have a (bad) function prototype. There were a few similar situations across the codebase.
3. ADL doesn't seem to work the same way for deduction guides as for functions, so in some places the llvm namespace must be stated explicitly.
4. The "reference mode" of makeArrayRef(ArrayRef<T> &) that acts as no-op is not supported (a constructor cannot achieve that).
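Point 1 can be reproduced with a hypothetical, stripped-down ArrayRef-like class (MiniArrayRef below is not llvm::ArrayRef). Class template argument deduction picks T from the pointer, but the constructor call that follows still has to choose between (pointer, length) and (pointer, pointer), and a literal 0 converts equally well to both. A typed zero is not a null pointer constant, so it selects the length constructor.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical, stripped-down ArrayRef-like view; not llvm::ArrayRef.
template <typename T> class MiniArrayRef {
  const T *Data = nullptr;
  std::size_t Length = 0;

public:
  MiniArrayRef(const T *D, std::size_t L) : Data(D), Length(L) {}
  MiniArrayRef(const T *Begin, const T *End)
      : Data(Begin), Length(static_cast<std::size_t>(End - Begin)) {}
  std::size_t size() const { return Length; }
  const T *data() const { return Data; }
};

// Deduction guides playing the role of the old makeArrayRef helpers.
template <typename T> MiniArrayRef(const T *, std::size_t) -> MiniArrayRef<T>;
template <typename T> MiniArrayRef(const T *, const T *) -> MiniArrayRef<T>;

// MiniArrayRef(P, 0) would not compile: once T is deduced, the literal 0
// converts equally well to std::size_t and to const T * (a null pointer),
// so the constructor call is ambiguous. A typed zero avoids that.
inline MiniArrayRef<uint8_t> emptyView(const uint8_t *P) {
  return MiniArrayRef(P, static_cast<std::size_t>(0));
}
```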

Per reviewers' comments, some unnecessary makeArrayRef calls have been removed in the process.

This is a follow-up to https://reviews.llvm.org/D140896 that introduced
the deduction guides.

Differential Revision: https://reviews.llvm.org/D140955
2023-01-05 14:11:08 +01:00
Stefan Pintilie
c1d0118459 [PowerPC] Materialize floats in the range [-16.0, 15.0].
Prior to this patch we only materialized 0.0, and all other floating-point
values were loaded from the TOC. This patch adds materialization for the
floating-point values that can be represented as integers in [-16.0, 15.0].

For example we will now materialize 3.0 and -5.0 but not 4.7.
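The selection rule can be written as a small predicate. The name isMaterializableFP is hypothetical, and the sketch encodes only the rule as stated (an integral value within [-16.0, 15.0], which is also the range of a 5-bit signed integer); it does not try to reproduce how the patch treats special values such as -0.0.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: true for finite, integral values in [-16.0, 15.0].
bool isMaterializableFP(double V) {
  return std::isfinite(V) && V == std::trunc(V) && V >= -16.0 && V <= 15.0;
}
```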

Reviewed By: nemanjai, lei, #powerpc

Differential Revision: https://reviews.llvm.org/D138844
2023-01-04 12:52:30 -06:00
Lei Huang
7a7e9109a2 [PowerPC] Implement P10 Byte Reverse Instructions
Generate brh, brw and brd instructions for byte-swap operations
on P10, and generate a single instruction for a 32-bit swap followed
by a 16-bit right shift.
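The byte-reverse operations can be modelled on a 64-bit register value: brh reverses bytes within each halfword, brw within each word, and brd across the whole doubleword. The final check rests on an assumption about where the bswap + shift pattern comes from (for example an i16 byte swap promoted to i32; the message does not say): bswap32(x) >> 16 equals a halfword byte reverse only when the upper halfword of x is zero. Helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// brh: reverse the bytes within each halfword.
uint64_t brh(uint64_t X) {
  const uint64_t M = 0x00FF00FF00FF00FFull;
  return ((X & M) << 8) | ((X >> 8) & M);
}

// brw: reverse the bytes within each word (bytes in halfwords, then halfwords).
uint64_t brw(uint64_t X) {
  const uint64_t M = 0x0000FFFF0000FFFFull;
  X = brh(X);
  return ((X & M) << 16) | ((X >> 16) & M);
}

// brd: reverse all eight bytes (bytes in words, then swap the two words).
uint64_t brd(uint64_t X) {
  X = brw(X);
  return (X << 32) | (X >> 32);
}

// Reference 32-bit byte swap, used to check the bswap + shift pattern.
uint32_t bswap32(uint32_t X) {
  return (X >> 24) | ((X >> 8) & 0xFF00u) | ((X << 8) & 0xFF0000u) | (X << 24);
}
```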

Reviewed By: stefanp

Differential Revision: https://reviews.llvm.org/D140414
2022-12-21 09:15:57 -06:00
Qiu Chaofan
a40ef656d8 [Intrinsic] Rename flt.rounds intrinsic to get.rounding
Address the inconsistency between FLT_ROUNDS_ and SET_ROUNDING SDAG
node. Rename FLT_ROUNDS_ to GET_ROUNDING and add llvm.get.rounding
intrinsic to replace flt.rounds.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D139507
2022-12-19 15:22:39 +08:00
Nemanja Ivanovic
cb3f415cd2 [PowerPC] Fix up memory ordering after combining BV to a load
The combiner for BUILD_VECTOR that merges consecutive
loads into a wide load had two issues:

- It didn't check that the input loads all have the
  same input chain
- It didn't update nodes that are chained to the original
  loads to be chained to the new load

This caused issues with bootstrap when
3c4d2a03968ccf5889bacffe02d6fa2443b0260f was committed.
This patch fixes the issue, unblocking that commit.

Differential revision: https://reviews.llvm.org/D140046
2022-12-16 08:57:36 -06:00
Matt Arsenault
c16a58b36c Attributes: Add function getter to parse integer string attributes
The most common case for string attributes parses them as integers. We
don't have a convenient way to do this, and as a result we have
inconsistent missing attribute and invalid attribute handling
scattered around. We also have inconsistent radix usage in calls to
getAsInteger; some places use the default radix 0 and others use base 10.

Update a few of the uses, but there are quite a lot of these.
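The idea of centralising the parsing can be sketched outside LLVM. parseIntAttr below is a hypothetical helper, not the getter this commit adds: it takes an optional attribute value, always parses in base 10, and falls back to a default when the attribute is missing or malformed, so every caller gets the same behaviour.

```cpp
#include <cassert>
#include <charconv>
#include <cstdint>
#include <optional>
#include <string_view>
#include <system_error>

// Hypothetical helper: base-10 parse with one policy for missing/invalid values.
uint64_t parseIntAttr(std::optional<std::string_view> Value, uint64_t Default) {
  if (!Value)
    return Default; // attribute absent
  uint64_t Result = 0;
  const char *First = Value->data();
  const char *Last = First + Value->size();
  auto [Ptr, Ec] = std::from_chars(First, Last, Result, 10);
  if (Ec != std::errc() || Ptr != Last)
    return Default; // not a number, out of range, or trailing characters
  return Result;
}
```

With a fixed radix of 10, a value such as "0x10" is treated as invalid instead of being read as 16, which is what radix 0 would do.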
2022-12-14 13:12:35 -05:00
Kazu Hirata
f7dffc28b3 Don't include None.h (NFC)
I've converted all known uses of None to std::nullopt, so we no longer
need to include None.h.

This is part of an effort to migrate from llvm::Optional to
std::optional:

https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
2022-12-10 11:24:26 -08:00
Qiu Chaofan
62f20f51ce [PowerPC] Support test data class intrinsic of 128-bit float
We've exploited test data class instructions introduced in ISA 3.0.
This change unifies the scalar intrinsics into ppc_test_data_class
and adds support for 128-bit precision float values using xststdcqp.

Vector versions of the intrinsic can't be unified because they return
vector int instead of int.

Reviewed By: shchenz

Differential Revision: https://reviews.llvm.org/D138105
2022-12-07 16:44:12 +08:00
Kazu Hirata
20cde15415 [Target] Use std::nullopt instead of None (NFC)
This patch mechanically replaces None with std::nullopt where the
compiler would warn if None were deprecated.  The intent is to reduce
the amount of manual work required in migrating from Optional to
std::optional.

This is part of an effort to migrate from llvm::Optional to
std::optional:

https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
2022-12-02 20:36:06 -08:00
Krzysztof Parzyszek
864aaa21b4 TargetLowering: convert Optional to std::optional 2022-12-01 16:19:10 -08:00
Maryam Moghadas
7614ba0a5d [PowerPC] Fix vperm codegen
Commit rG934d5fa2b8672695c335deed0e19d0e777c98403 changed the vperm codegen
for cases where vperm is not replaced by xxperm; this patch reverts that change.

Reviewed By: stefanp

Differential Revision: https://reviews.llvm.org/D138736
2022-11-29 15:47:32 -06:00
Benjamin Kramer
bfc812a2f3 [PowerPC][NFC] Merge LLVM_DEBUG statements to avoid unused variable warnings 2022-11-23 21:09:33 +01:00
Maryam Moghadas
934d5fa2b8 [PowerPC] Exploit xxperm, check for dead vectors and substitute vperm with xxperm
The vperm instruction requires its data to be in the Altivec registers. If one of
the vector operands is not used after the vperm, the vperm can be replaced by
xxperm, whose target register also serves as one of its inputs. xxperm can use
any VSX register, which doubles the number of available registers.

Reviewed By: stefanp

Differential Revision: https://reviews.llvm.org/D133700
2022-11-23 13:28:12 -06:00
Stefan Pintilie
1ac6956b52 [PowerPC] Add handling for WACC register spilling.
This patch adds spilling for the new WACC registers.

To get the spilling test to work, the MMA instructions from Power 10 are
now supported for the Future CPU, except that they all use the new WACC
registers instead of the ACC registers from Power 10.

Reviewed By: amyk, saghir

Differential Revision: https://reviews.llvm.org/D136728
2022-11-22 09:37:52 -06:00
Stanislav Mekhanoshin
bcaf31ec3f [AMDGPU] Allow finer grain control of an unaligned access speed
A target can report whether a misaligned access is 'fast', as defined
by the target. In reality there can be different levels
of 'fast' and 'slow'. This patch changes the boolean 'Fast'
argument of the allowsMisalignedMemoryAccesses family of functions
to an unsigned representing its speed.

A target can still define it as it wants, and the direct translation
of the current code uses 0 and 1 for the current false and true. This
makes the change NFC.

A subsequent patch will start using the actual speed value in
the load/store vectorizer to check whether a vectorized access is going
to be not just fast, but also no slower than before.
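A hypothetical, standalone illustration of the new convention (not LLVM's actual signature, and with a made-up speed table): the hook reports a graded speed, and a client compares speeds instead of testing a single bool. Mapping the old false/true to 0/1 keeps the previous behaviour.

```cpp
#include <cassert>

// Hypothetical hook: is a misaligned access of SizeInBits allowed, and how
// fast is it? Speed is an unsigned (0 = slow, larger = faster) instead of a
// bool. The speed table below is invented purely for illustration.
bool allowsMisalignedAccess(unsigned SizeInBits, unsigned *Fast) {
  if (Fast)
    *Fast = SizeInBits <= 32 ? 2 : (SizeInBits <= 64 ? 1 : 0);
  return SizeInBits <= 128;
}

// With graded speeds, a client can ask "is the wide access no slower than
// the narrow one?" rather than only "is it fast?".
bool wideAccessNotSlower(unsigned NarrowBits, unsigned WideBits) {
  unsigned NarrowSpeed = 0, WideSpeed = 0;
  if (!allowsMisalignedAccess(NarrowBits, &NarrowSpeed) ||
      !allowsMisalignedAccess(WideBits, &WideSpeed))
    return false;
  return WideSpeed >= NarrowSpeed;
}
```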

Differential Revision: https://reviews.llvm.org/D124217
2022-11-17 09:23:53 -08:00
Qiu Chaofan
5d19fea81f [PowerPC] Fix strict load-conversion recognition
Direct-move instructions are usually more efficient for conversion than going
through memory (a store followed by a load). But direct moves are not needed
when the source register was just loaded from some address.

The pattern was already recognized, but the source value of a strict node
is not its first operand (that's the chain) but its second.
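The operand-numbering point can be shown with a hypothetical mini node type (not SelectionDAG): for strict FP nodes operand 0 is the incoming chain, so the value being converted sits at operand 1, whereas for the ordinary node it is operand 0.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical mini node, only to illustrate operand numbering.
struct ConvNode {
  std::string Opcode;
  bool IsStrictFP;                   // e.g. a STRICT_* conversion node
  std::vector<std::string> Operands; // operand names
};

// Strict nodes carry the chain first, so the source value is operand 1.
const std::string &sourceValue(const ConvNode &N) {
  return N.Operands.at(N.IsStrictFP ? 1 : 0);
}
```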

Reviewed By: shchenz

Differential Revision: https://reviews.llvm.org/D138011
2022-11-16 10:02:10 +08:00
Amy Kwan
715301056e [PowerPC] Fix invalid cast for vector shuffles when lowering to the xxsplti32dx instruction.
When lowering vector shuffles into the xxsplti32dx instruction on Power10, we
canonicalize the right operand to be a BUILD_VECTOR and as a result, get the
commuted vector shuffle node.

However, commuting a vector shuffle does not always produce a vector
shuffle node. To handle this, this patch changes the original cast<> of the
shuffle into a dyn_cast<> and checks that the result is a valid vector
shuffle node before obtaining the commuted shuffle mask.
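The cast<> to dyn_cast<> change follows a general pattern, sketched here with standard dynamic_cast and a hypothetical node hierarchy standing in for SDNode and ShuffleVectorSDNode: check the node kind before reading shuffle-specific data, and bail out when the check fails.

```cpp
#include <cassert>
#include <vector>

// Hypothetical node hierarchy standing in for SDNode / ShuffleVectorSDNode.
struct Node { virtual ~Node() = default; };
struct ShuffleNode : Node { std::vector<int> Mask; };

// Commuting may hand back some other kind of node, so test the kind
// (dyn_cast in LLVM, dynamic_cast here) instead of assuming it.
const std::vector<int> *getShuffleMask(const Node *N) {
  if (const auto *SN = dynamic_cast<const ShuffleNode *>(N))
    return &SN->Mask;
  return nullptr; // not a shuffle: the caller should give up on this lowering
}
```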

This patch also adds a new test case that demonstrates this scenario (primarily
seen on 32-bit), which crashed prior to this fix.

Differential Revision: https://reviews.llvm.org/D135024
2022-10-24 09:56:54 -05:00
Nemanja Ivanovic
4ea121c904 [PowerPC] Fix a number of inefficiencies and issues with atomic code gen
There are a few issues with the code we generate for atomic operations and the way we generate it:

- Hard coded CR0 for compares
- Order of operands for compares not conducive to
  emitting compare-immediate or for CSE of compares
- Missing MachineMemOperand for st[bhwd]cx intrinsics
- Missing intrinsic properties for the same
- Unnecessary blocks with store conditional
  instructions to clear reservation (which ends
  up hindering performance)
- Move from CR instructions just to compare the
  result of a store conditional with zero (even
  though it is a record-form)

This patch aims to resolve all of those issues.

Differential revision: https://reviews.llvm.org/D134783
2022-10-03 19:55:29 -05:00
Paul Scoropan
ce004fb4f2 [PowerPC] XCOFF exception section support on the direct assembler path
This patch implements support for making entries in the exception section
on XCOFF on the direct assembly path using the ".except" pseudo-op. It also
provides functionality to lower entries (consisting of language and reason
codes) into the exception section through the use of annotation metadata
attached to llvm.ppc.trap/trapd/tw/tdw intrinsics. Integrated assembler
support will be provided in another review. https://reviews.llvm.org/D133030
needs to land first for the LIT tests.

Reviewed By: shchenz, RKSimon

Differential Revision: https://reviews.llvm.org/D132146
2022-09-26 22:24:20 -04:00
Josh Stone
4dcfb09e40 [NFC][CodeGen] Use const MF in TargetLowering stack probe functions
This makes them callable from places like canUseAsPrologue.

Differential Revision: https://reviews.llvm.org/D134492
2022-09-23 09:30:32 -07:00
Sergei Barannikov
c6acb4eb0f [SDAG] Add getCALLSEQ_END overload taking uint64_ts
All in-tree targets pass pointer-sized ConstantSDNodes to the
method. This overload reduces the amount of boilerplate code a bit. It
also makes getCALLSEQ_END consistent with getCALLSEQ_START, which
already takes uint64_ts.
2022-09-15 14:02:12 -04:00
Joe Loser
5e96cea1db [llvm] Use std::size instead of llvm::array_lengthof
LLVM contains a helpful function for getting the size of a C-style
array: `llvm::array_lengthof`. This is useful prior to C++17, but not as
helpful for C++17 or later: `std::size` already has support for C-style
arrays.

Change call sites to use `std::size` instead.
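For reference, std::size (from <iterator>, C++17) yields the element count of a C-style array and is constexpr, so it covers the compile-time uses of llvm::array_lengthof. The table and function below are illustrative only.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator> // std::size

// An illustrative C-style array.
constexpr int Table[] = {2, 3, 5, 7, 11};

// Formerly llvm::array_lengthof(Table); std::size is usable in constant expressions.
static_assert(std::size(Table) == 5, "five entries");

std::size_t sumTable() {
  std::size_t Sum = 0;
  for (std::size_t I = 0; I != std::size(Table); ++I)
    Sum += static_cast<std::size_t>(Table[I]);
  return Sum;
}
```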

Differential Revision: https://reviews.llvm.org/D133429
2022-09-08 09:01:53 -06:00
Kazu Hirata
7d8c2d17eb [llvm] Use range-based for loops (NFC)
Identified with modernize-loop-convert.
2022-09-03 23:27:25 -07:00
Stefan Pintilie
1492c88f49 [PowerPC] Fix bugs in sign-/zero-extension elimination
This patch fixes the following two bugs in the `PPCInstrInfo::isSignOrZeroExtended` helper, which is used by sign-/zero-extension elimination in the PPCMIPeephole pass.
- Registers defined by a load with update (e.g. LBZU) were identified as already sign- or zero-extended. But that is true only for the first def (the loaded value), not for the second def (the updated pointer).
- Registers defined by ORIS/XORIS were identified as already sign-extended. But whether the result is sign-extended depends on the immediate (zero extension is preserved either way).

To handle the first case, the parameter of the helpers is changed from `MachineInstr` to a register number so that the first and second defs can be distinguished. Also, this patch moves the initialization of PPCMIPeepholePass to allow a MIR test case.
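The ORIS case can be illustrated on plain 64-bit integers (helper names are hypothetical). ORIS ORs its 16-bit immediate shifted left by 16 into the register, touching only bits 16-31, so it can never disturb zero extension. An immediate with its top bit set can, however, turn bit 31 on while the upper word stays zero, which breaks sign extension. XORIS can flip bit 31 in either direction, with the same consequence.

```cpp
#include <cassert>
#include <cstdint>

// Sign-extended from 32 bits: the upper 32 bits are copies of bit 31.
bool isSExt32(uint64_t V) {
  return static_cast<int64_t>(static_cast<int32_t>(V)) == static_cast<int64_t>(V);
}

// Zero-extended from 32 bits: the upper 32 bits are all zero.
bool isZExt32(uint64_t V) { return (V >> 32) == 0; }

// ORIS RA,RS,UI: RA = RS | (UI << 16); the upper 32 bits are untouched.
uint64_t oris(uint64_t RS, uint16_t UI) {
  return RS | (static_cast<uint64_t>(UI) << 16);
}
```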

Reviewed By: nemanjai

Differential Revision: https://reviews.llvm.org/D40554
2022-08-19 07:05:40 -05:00
Justin Hibbits
f43b228581 PowerPC: Don't hoist float multiply + add to fused operation on SPE
SPE doesn't have an fmadd instruction, so don't bother hoisting a
multiply and add sequence to this, as it'd become just a library call.
Hoisting happens too late for the CTR usability test to veto using the
CTR in a loop, and results in an assert "Invalid PPC CTR loop!".
2022-08-10 11:04:27 -04:00
Chen Zheng
d9004dfbab [PowerPC] Map hardware loop intrinsics to PowerPC pseudos
Map hardware loop intrinsics loop_decrement and set_loop_iteration
to the new PowerPC pseudo instructions, so that the hardware loop
intrinsics will be expanded to normal cmp+branch form or ctrloop
form based on the CTR register usage on MIR level.

Reviewed By: lkail

Differential Revision: https://reviews.llvm.org/D123366
2022-08-08 21:34:20 -04:00
Fangrui Song
de9d80c1c5 [llvm] LLVM_FALLTHROUGH => [[fallthrough]]. NFC
With C++17 there is no Clang pedantic warning or MSVC C5051.
2022-08-08 11:24:15 -07:00
Dawid Jurczak
1bd31a6898 [NFC] Add SmallVector constructor to allow creation of SmallVector<T> from ArrayRef of items convertible to type T
Extracted from https://reviews.llvm.org/D129781 and address comment:
https://reviews.llvm.org/D129781#3655571

Differential Revision: https://reviews.llvm.org/D130268
2022-08-05 13:35:41 +02:00
David Truby
9a976f3661 [llvm] Always use TargetConstant for FP_ROUND ISD Nodes
This patch ensures consistency in the construction of FP_ROUND nodes
such that they always use ISD::TargetConstant instead of ISD::Constant.

This additionally fixes a bug in the AArch64 SVE backend where patterns
were matching against TargetConstant nodes and sometimes failing when
passed a Constant node.

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D130370
2022-08-03 14:02:11 +01:00
Eli Friedman
1a6d82b93f Fix misc uses of "long" variables to use "int64_t".
I don't have any evidence these particular uses are actually causing any
issues, but we should avoid accidentally truncating immediate values
depending on the host.
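A small sketch of the hazard (plain C++, hypothetical helper name): `long` is 64 bits on LP64 hosts such as Linux and macOS but 32 bits on LLP64 hosts such as 64-bit Windows. The helper simulates what a 32-bit `long` would keep of a 64-bit immediate, whereas int64_t is 64 bits on every host.

```cpp
#include <cassert>
#include <cstdint>

static_assert(sizeof(int64_t) == 8, "int64_t is 64 bits on every host");

// Simulates storing a 64-bit immediate in a 32-bit long and reading it back.
int64_t roundTripThrough32Bits(int64_t Imm) {
  return static_cast<int64_t>(static_cast<int32_t>(Imm));
}
```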
2022-07-27 09:47:19 -07:00
Masoud Ataei
96515df816 [PowerPC] Fix the check for scalar MASS conversion
Proposing to move the check for scalar MASS conversion from constructor
of PPCTargetLowering to the lowerLibCallBase function which decides
about the lowering.

The target machine option Options.PPCGenScalarMASSEntries is set in
PPCTargetMachine.cpp. But an object of the class PPCTargetLowering
is created in one of the included header files, so its constructor runs
before PPCGenScalarMASSEntries is set to the correct value. Therefore we
cannot check this option in the constructor.
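The ordering problem can be reduced to a few lines. In this sketch the option and both structs are hypothetical: an object that caches an option in its constructor keeps whatever value the option had at construction time, while one that reads the option when the lowering decision is made sees the value that was set later.

```cpp
#include <cassert>

// Hypothetical global option that is only set after some objects exist.
bool GenScalarMASSEntries = false;

// Caches the option at construction time: may capture a stale default.
struct EagerLowering {
  bool UseMASS;
  EagerLowering() : UseMASS(GenScalarMASSEntries) {}
  bool shouldUseMASS() const { return UseMASS; }
};

// Reads the option where the decision is made: sees the final value.
struct LazyLowering {
  bool shouldUseMASS() const { return GenScalarMASSEntries; }
};
```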

Differential: https://reviews.llvm.org/D128653
Reviewer: @bmahjour
2022-07-06 11:44:00 -07:00
Ting Wang
88b6d22791 [PowerPC] Improve getNormalLoadInput to reach more splat load opportunities
There are straightforward splat load opportunities blocked by
getNormalLoadInput(), since those cases involve consecutive bitcasts.
Improve this by looking through bitcasts.
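Looking through bitcasts amounts to a loop that skips any number of consecutive bitcast nodes before testing for a load. The node type and helper names here are hypothetical stand-ins, not the SelectionDAG API.

```cpp
#include <cassert>

// Hypothetical mini node: an opcode and at most one operand.
enum class Op { Load, Bitcast, Other };
struct DagNode {
  Op Opcode;
  const DagNode *Operand = nullptr;
};

// Skip any chain of bitcasts, so bitcast(bitcast(load)) reaches the load.
const DagNode *skipBitcasts(const DagNode *N) {
  while (N && N->Opcode == Op::Bitcast)
    N = N->Operand;
  return N;
}

bool comesFromLoad(const DagNode *N) {
  const DagNode *Src = skipBitcasts(N);
  return Src && Src->Opcode == Op::Load;
}
```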

Reviewed By: nemanjai

Differential Revision: https://reviews.llvm.org/D128703
2022-06-28 08:02:49 -04:00
Nemanja Ivanovic
e09f6ff3c1 [PowerPC] Disable automatic generation of STXVP
There are instances where using paired vector stores leads to significant
performance degradation due to issues with store forwarding. To avoid falling
into this trap with compiler-generated code, we will not emit these
instructions unless the user requests them explicitly (with a builtin or by
specifying the option).

Reviewed By: lei, amyk, saghir

Differential Revision: https://reviews.llvm.org/D127218
2022-06-20 14:30:29 -05:00
Amy Kwan
34033a84b8 [PowerPC] Skip combine for vector_shuffles when two scalar_to_vector nodes are different vector types.
Currently in `combineVectorShuffle()`, we update the shuffle mask if either
input vector comes from a scalar_to_vector, and we keep the respective input
vectors in their permuted form by producing PPCISD::SCALAR_TO_VECTOR_PERMUTED.
However, it is possible that we end up in a situation where both input vectors
to the vector_shuffle are scalar_to_vector, and are different vector types.
In situations like this, the shuffle mask is updated incorrectly as the current
code assumes both scalar_to_vector inputs are the same vector type.

This patch skips the combines for vector_shuffle if both input vectors are
scalar_to_vector, and if they are of different vector types. A follow up patch
will focus on fixing this issue afterwards, in order to correctly update the
shuffle mask.

Differential Revision: https://reviews.llvm.org/D127818
2022-06-15 14:12:18 -05:00
Quinn Pham
335e8bf100 [PowerPC] emit VSX instructions instead of VMX instructions for vector loads and stores
This patch changes the PowerPC backend to generate VSX load/store instructions
for all vector loads/stores on Power8 and earlier (LE) instead of VMX
load/store instructions. This is needed because VMX instructions require the
vector to be 16-byte aligned: they ignore the low four bits of the address, so
a misaligned vector load/store with VMX instructions accesses the wrong memory.
Also, `gcc` generates VSX instructions in this situation; they allow unaligned
access but require a swap instruction after loading/before storing. This is not
an issue for BE, where we already emit VSX instructions since no swap is
required. Nor is it an issue on Power9 and up, since we have access to
`lxv[x]`/`stxv[x]`, which allow unaligned access and do not require swaps.
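To make the alignment requirement concrete: VMX loads such as lvx ignore the low four bits of the effective address, so a misaligned address silently reads the enclosing aligned 16 bytes instead of trapping. The sketch works over a byte array that models memory starting at address 0; helper names are hypothetical, and the LE element swaps mentioned above are not modelled.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>

// lvx-style load: the effective address is rounded down to 16 bytes.
std::array<uint8_t, 16> loadLikeLvx(const uint8_t *Mem, uint64_t EA) {
  std::array<uint8_t, 16> V{};
  std::memcpy(V.data(), Mem + (EA & ~uint64_t(15)), 16);
  return V;
}

// Unaligned-capable load: reads exactly the 16 bytes starting at EA.
std::array<uint8_t, 16> loadUnaligned(const uint8_t *Mem, uint64_t EA) {
  std::array<uint8_t, 16> V{};
  std::memcpy(V.data(), Mem + EA, 16);
  return V;
}
```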

This patch also delays the VSX load/store for LE combines until after
LegalizeOps to prioritize other load/store combines.

Reviewed By: #powerpc, stefanp

Differential Revision: https://reviews.llvm.org/D127309
2022-06-15 12:06:04 -05:00
Stefan Pintilie
263f1b2f5d [PowerPC] Fix combine step for shufflevector.
The combine step for shufflevector will sometimes replace undef in the mask
with a defined value. This can cause an infinite loop in some cases as another
combine will then put the undef back in the mask.

This patch fixes the issue so that undefs are not replaced when doing a combine.

Reviewed By: ZarkoCA, amyk, quinnp, saghir

Differential Revision: https://reviews.llvm.org/D127439
2022-06-14 11:31:24 -05:00