1820 Commits

Author SHA1 Message Date
Kai Luo
56414220df
[PowerPC] Use 'sync; ld; cmp; bc; isync' for atomic load seq-cst on 32-bit platform (#75905)
`cmp; bc; isync` is more performant than `lwsync` theoretically.

64-bit platform already features it, now implement it for 32-bit
platform.
2023-12-20 10:01:02 +08:00
Chen Zheng
4b932d84f4
[PowerPC] redesign the target flags (#69695)
12 bit is not enough for PPC's target specific flags. If 8 bit for the
bitmask flags, 4 bit for the direct mask, PPC can total have 16 direct
mask and 8 bitmask. Not enough for PPC, see this issue in
https://github.com/llvm/llvm-project/pull/66316

Redesign how PPC target set the target specific flags. With this patch,
all ppc target flags are direct flags. No bitmask flag in PPC anymore.

This patch aligns with some targets like X86 which also has many target
specific flags.

The patch also fixes a bug related to flag `MO_TLSGDM_FLAG` and `MO_LO`.
They are the same value and the test case changes in this PR shows the
bug.
2023-12-07 12:47:25 +08:00
Sander de Smalen
81b7f115fb
[llvm][TypeSize] Fix addition/subtraction in TypeSize. (#72979)
It seems TypeSize is currently broken in the sense that:

  TypeSize::Fixed(4) + TypeSize::Scalable(4) => TypeSize::Fixed(8)

without failing its assert that explicitly tests for this case:

  assert(LHS.Scalable == RHS.Scalable && ...);

The reason this fails is that `Scalable` is a static method of class
TypeSize,
and LHS and RHS are both objects of class TypeSize. So this is
evaluating
if the pointer to the function Scalable == the pointer to the function
Scalable,
which is always true because LHS and RHS have the same class.

This patch fixes the issue by renaming `TypeSize::Scalable` ->
`TypeSize::getScalable`, as well as `TypeSize::Fixed` to
`TypeSize::getFixed`,
so that it no longer clashes with the variable in
FixedOrScalableQuantity.

The new methods now also better match the coding standard, which
specifies that:
* Variable names should be nouns (as they represent state)
* Function names should be verb phrases (as they represent actions)
2023-11-22 08:52:53 +00:00
Qiu Chaofan
426ad99bb2
[PowerPC] Forbid f128 SELECT_CC optimized into fsel (#71497) 2023-11-15 12:20:06 +08:00
Qiu Chaofan
5f295552f1
[PowerPC] Fix incorrect symbol name of frexp libcall (#71626)
frexpl is for ppc_fp128. The correct symbol name for f128 is frexpf128.
2023-11-08 14:41:19 +08:00
Paulo Matos
7b9d73c2f9
[NFC] Remove Type::getInt8PtrTy (#71029)
Replace this with PointerType::getUnqual().
Followup to the opaque pointer transition. Fixes an in-code TODO item.
2023-11-07 17:26:26 +01:00
Nikita Popov
127ed9ae26
[PowerPC] Use zext instead of anyext in custom and combine (#68784)
This custom combine currently converts `and(anyext(x),c)` into
`anyext(and(x,c))`. This is not correct, because the original expression
guaranteed that the high bits are zero, while the new one sets them to
undef.

Emit `zext(and(x,c))` instead.

Fixes https://github.com/llvm/llvm-project/issues/68783.
2023-10-12 09:32:17 +02:00
Kishan Parmar
696ea67f19 Disable call to fma for soft-float
PowerPC backend generate calls to libc function calls
for soft-float, regardless of the -nostdlib /-ffreestanding flag.
fma is not a function provided by compiler-rt builtins and
thus should not be generated here.
PR : [[ https://github.com/llvm/llvm-project/issues/55230 | #55230 ]]

Below is patch given by @nemanjai

Reviewed By: jhibbits

Differential Revision: https://reviews.llvm.org/D156344
2023-09-28 14:06:54 +05:30
Nick Desaulniers
330fa7d2a4
[TargetLowering] Deduplicate choosing InlineAsm constraint between ISels (#67057)
Given a list of constraints for InlineAsm (ex. "imr") I'm looking to
modify the order in which they are chosen. Before doing so, I noticed a
fair
amount of logic is duplicated between SelectionDAGISel and GlobalISel
for this.

That is because SelectionDAGISel is also trying to lower immediates
during selection. If we detangle these concerns into:
1. choose the preferred constraint
2. attempt to lower that constraint

Then we can slide down the list of constraints until we find one that
can be lowered. That allows the implementation to be shared between
instruction selection frameworks.

This makes it so that later I might only need to adjust the priority of
constraints in one place, and have both selectors behave the same.
2023-09-25 08:53:03 -07:00
Arthur Eubanks
0a1aa6cda2
[NFC][CodeGen] Change CodeGenOpt::Level/CodeGenFileType into enum classes (#66295)
This will make it easy for callers to see issues with and fix up calls
to createTargetMachine after a future change to the params of
TargetMachine.

This matches other nearby enums.

For downstream users, this should be a fairly straightforward
replacement,
e.g. s/CodeGenOpt::Aggressive/CodeGenOptLevel::Aggressive
or s/CGFT_/CodeGenFileType::
2023-09-14 14:10:14 -07:00
Maryam Moghadas
7b021f2e64 [PowerPC] Optimize VPERM and fix code order for swapping vector operands on LE
This patch reverts commit 7614ba0a5db8 to optimize VPERM when one of its
vector operands is XXSWAPD, similar to XXPERM. It also reorganizes the
little-endian swap code on LE, swapping the vector operand after
adjusting the mask operand. This ensures that the vector operand is
swapped at the correct point in the code, resulting in a valid
constant pool for the mask operand.

Reviewed By: stefanp

Differential Revision: https://reviews.llvm.org/D149083
2023-09-13 15:00:49 -05:00
Nick Desaulniers
93bd428742
[InlineAsm] refactor InlineAsm class NFC (#65649)
I would like to steal one of these bits to denote whether a kind may be
spilled by the register allocator or not, but I'm afraid to touch of any
this code using bitwise operands.

Make flags a first class type using bitfields, rather than launder data
around via `unsigned`.
2023-09-11 09:27:37 -07:00
Amy Kwan
3f46e5453d [AIX][TLS] Produce a faster local-exec access sequence with -maix-small-local-exec-tls (And optimize when load/store offsets are 0)
This patch utilizes the -maix-small-local-exec-tls option added in
D155544 to produce a faster access sequence for the local-exec TLS
model, where loading from the TOC can be avoided.

The patch either produces an addi/la with a displacement off of r13
(the thread pointer) when the address is calculated, or it produces an
addi/la followed by a load/store when the address is calculated and
used for further accesses.

This patch also optimizes this sequence a bit more where we can remove
the addi/la when the load/store offset is 0. A follow up patch will
be posted to account for when the load/store offset is non-zero, and
currently in these situations we keep the addi/la that precedes the
load/store.

Furthermore, this access sequence is only performed for TLS variables
that are less than ~32KB in size.

Differential Revision: https://reviews.llvm.org/D155600
2023-09-07 20:05:29 -05:00
Ting Wang
71be020dda [SelectionDAG][PowerPC] Memset reuse vector element for tail store
On PPC there are instructions to store element from vector(e.g.
stxsdx/stxsiwx), and these instructions can be leveraged to avoid tail
constant in memset and constant splat array initialization.

This patch tries to explore these opportunities.

Reviewed By: shchenz

Differential Revision: https://reviews.llvm.org/D138883
2023-09-06 01:52:38 -04:00
Fangrui Song
111fcb0df0 [llvm] Fix duplicate word typos. NFC
Those fixes were taken from https://reviews.llvm.org/D137338
2023-09-01 18:25:16 -07:00
Matt Arsenault
ad9d13d535 SelectionDAG: Swap operands of atomic_store
Irritatingly, atomic_store had operands in the opposite order from
regular store. This made it difficult to share patterns between
regular and atomic stores.

There was a previous incomplete attempt to move atomic_store into the
regular StoreSDNode which would be better.

I think it was a mistake for all atomicrmw to swap the operand order,
so maybe it's better to take this one step further.

https://reviews.llvm.org/D123143
2023-08-31 17:30:10 -04:00
Nick Desaulniers
2fad6e6985 [InlineAsm] wrap Kind in enum class NFC
Should add some minor type safety to the use of this information, since
there's quite a bit of metadata being laundered through an `unsigned`.

I'm looking to potentially add more bitfields to that `unsigned`, but I
find InlineAsm's big ol' bag of enum values and usage of `unsigned`
confusing, type-unsafe, and un-ergonomic. These can probably be better
abstracted.

I think the lack of static_cast outside of InlineAsm indicates the prior
code smell fixed here.

Reviewed By: qcolombet

Differential Revision: https://reviews.llvm.org/D159242
2023-08-31 08:54:51 -07:00
Qiu Chaofan
21bea1a208 [PowerPC] Support initial-exec TLS relocation on AIX
Add TLS_IE relocation type to XCOFF writer, and emit code sequence for
initial-exec TLS variables.

Reviewed By: lkail

Differential Revision: https://reviews.llvm.org/D156292
2023-08-30 16:22:16 +08:00
Chen Zheng
732f63d96d [PowerPC]set default min-jump-table-entries to 64 on PPC
Reviewed By: nemanjai

Differential Revision: https://reviews.llvm.org/D159050
2023-08-29 21:42:22 -04:00
Bjorn Pettersson
e53b28c833 [llvm] Drop some bitcasts and references related to typed pointers
Differential Revision: https://reviews.llvm.org/D157551
2023-08-10 15:07:07 +02:00
Kai Luo
f26af16e2c [PowerPC][AIX] Enable quadword atomics by default for AIX
On AIX, a libatomic supporting inline quadword atomic operations has been released, so that compatibility is not an issue now, we can enable quadword atomics by default.

Reviewed By: #powerpc, nemanjai

Differential Revision: https://reviews.llvm.org/D151312
2023-07-25 08:21:07 +08:00
Brad Smith
a3e524df90 [PowerPC] Reorder setMaxAtomicSizeInBitsSupported(). NFC
Reorder setMaxAtomicSizeInBitsSupported() in numerical and more logical order.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D155379
2023-07-22 20:01:27 -04:00
Kamau Bridgeman
62c1cf7c63 [PowerPC][Future] Enable __builtin_mma_xxm[t|f]acc
Future cpu instructions dmxxinstdmr512 and dmxxextfdmr512 insert and extract
quad vectors from the new wide accumulator(wacc) register class.
The introduction of these new instructions renders the p10 instructions
xxmtacc and xxmfacc obsolete since the new wacc register class is a better
choice for handing quad vector operations. This patch ensures that, for
future cpu, instructions dmxxinstdmr512 and dmxxextfdmr512 are generated
by custom lowering the intrinsics for xxm[t|f]acc to produce no instructions.

Reviewed By: amyk, lei

Differential Revision: https://reviews.llvm.org/D153034
2023-07-14 13:38:40 -05:00
Nemanja Ivanovic
b0e249d5e2 Reland "[PowerPC] Remove extend between shift and and"
The commit originally caused a bootstrap failure on the big endian
PPC bot as the combine was interfering with the legalizer when
applied on illegal types. This update restricts the combine to
the only types for which it is actually needed. Tested on PPC BE
bootstrap locally.
2023-07-07 14:45:05 -04:00
Nemanja Ivanovic
7cd9084c69 Revert "[PowerPC] Remove extend between shift and and"
This reverts commit a57236de4eb8f38b4201647b10146941cbbb5c0b.
Causes a bootstrap failure on ppc64be.
2023-07-05 20:04:49 -04:00
Nemanja Ivanovic
a57236de4e [PowerPC] Remove extend between shift and and
The SDAG will sometimes insert an extend between
the shift and an and (immediate) even though the
immediate is narrower than the narrow size.
This does not allow us to produce a rotate
instruction (such as rlwinm).
This patch just adds a combine to move the extend
onto the and.

Differential revision: https://reviews.llvm.org/D152911
2023-07-05 16:33:07 -04:00
Elliot Goodrich
b0abd4893f [llvm] Add missing StringExtras.h includes
In preparation for removing the `#include "llvm/ADT/StringExtras.h"`
from the header to source file of `llvm/Support/Error.h`, first add in
all the missing includes that were previously included transitively
through this header.
2023-06-25 15:42:22 +01:00
Amy Kwan
f5ae075048 [AIX][TLS] Generate 32-bit local-exec access code sequence
This patch adds support for the TLS local-exec access model on AIX to allow
for the ability to generate the 32-bit (specifically, non-optimized) code sequence.
This work is a follow up of D149722.

The particular sequence that is generated for this sequence is as follows:
```
.tc var[TC],var[TL]@le.   // variable offset, with the le relocation specifier

bla .__get_tpointer()     // get the thread pointer, modifies r3
lwz reg1, var[TC](2)      // load the variable offset
add reg2, r3, reg1        // add the variable offset to the retrieved thread pointer
```

Differential Revision: https://reviews.llvm.org/D152669
2023-06-20 11:57:38 -05:00
Amy Kwan
d5659808b2 [AIX][TLS] Generate 64-bit local-exec access code sequence
This patch adds support for the TLS local-exec access model on AIX to allow
for the ability to generate the 64-bit (specifically, non-optimized) code sequence.

For this patch in particular, the sequence that is generated involves a load of the
variable offset, followed by an add of the loaded variable offset to r13 (which is
thread pointer, respectively). This code sequence looks like the following:
```
ld reg1,var[TC](2)
add reg2, reg1, r13     // r13 contains the thread pointer
```
The TOC (.tc pseudo-op) entries generated in the assembly files are also
changed where we add the @le relocation for the variable offset.

Differential Revision: https://reviews.llvm.org/D149722
2023-06-19 12:17:30 -05:00
Matt Arsenault
eece6ba283 IR: Add llvm.ldexp and llvm.experimental.constrained.ldexp intrinsics
AMDGPU has native instructions and target intrinsics for this, but
these really should be subject to legalization and generic
optimizations. This will enable legalization of f16->f32 on targets
without f16 support.

Implement a somewhat horrible inline expansion for targets without
libcall support. This could be better if we could introduce control
flow (GlobalISel version not yet implemented). Support for strictfp
legalization is less complete but works for the simple cases.
2023-06-06 17:07:18 -04:00
Qiu Chaofan
9e17e08324 [PowerPC] Combine fptoint-store under strict cases
Reviewed By: shchenz

Differential Revision: https://reviews.llvm.org/D141249
2023-06-05 16:24:02 +08:00
Qiu Chaofan
590c6a1727 [PowerPC] Require FPCVT for store fptoi combination 2023-06-05 14:26:32 +08:00
Qiu Chaofan
69bc8ff766 Reland "[PowerPC] Simplify fp-to-int store optimization"
The build failure should be fixed by de681d53. Follow-up refactor will
be done in future patches.

This reverts commit e7c5ced0b9f0551ea17e1d2b48be86f03a772c59.
2023-06-05 13:53:08 +08:00
Craig Topper
6006d43e2d LLVM_FALLTHROUGH => [[fallthrough]]. NFC
Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D150996
2023-05-24 12:40:10 -07:00
Vitaly Buka
e7c5ced0b9 Revert "[PowerPC] Simplify fp-to-int store optimization"
Breaks https://lab.llvm.org/buildbot/#/builders/18/builds/9118

This reverts commit 8064caf83fb166b709bfe0e7641c5181341cb064.
2023-05-24 10:05:28 -07:00
Nemanja Ivanovic
de681d53ba [PowerPC] Do not attempt to combine fptoui without FPCVT
Commit 8064caf83fb166b709bfe0e7641c5181341cb064 added a call
to a function that performs this combine without checking whether
the target supports FPCVT. This caused asserts to trip on BE bots
as the default target does not have this feature.
2023-05-24 11:14:26 -05:00
Krasimir Georgiev
c37ced7d02 silence an unused variable warning after 8064caf83fb166b709bfe0e7641c5181341cb064 2023-05-23 12:47:13 +00:00
Qiu Chaofan
8064caf83f [PowerPC] Simplify fp-to-int store optimization
On PowerPC VSX targets, fp-to-int will be transformed into xscv with
mfvsr. When the result is to be stored, mfvsr can be replaced by a
direct store.

This change simplifies the optimization by using existing fp-to-int
code, which helps CSE and handling strictfp cases.

Reviewed By: shchenz

Differential Revision: https://reviews.llvm.org/D141473
2023-05-23 16:40:54 +08:00
Sergei Barannikov
da42b2846c [CodeGen] Support allocating of arguments by decreasing offsets
Previously, `CCState::AllocateStack` always allocated stack space by increasing
offsets. For targets with stack growing up (away from zero) it is more
convenient to allocate arguments by decreasing offsets, so that the first
argument is at the top of the stack. This is important when calling a function
with variable number of arguments: the callee does not know the size of the
stack, but must be able to access "fixed" arguments. For that to work, the
"fixed" arguments should have fixed offsets relative to the stack top, i.e. the
variadic arguments area should be at the stack bottom (at lowest addresses).

The in-tree target with stack growing up is AMDGPU, but it allocates
arguments by increasing addresses. It does not support variadic arguments.

A drive-by change is to promote stack size/offset to 64-bit integer.
This is what MachineFrameInfo expects.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D149575
2023-05-17 21:51:52 +03:00
Sergei Barannikov
01a7967447 [CodeGen] Replace CCState's getNextStackOffset with getStackSize (NFC)
The term "next stack offset" is misleading because the next argument is
not necessarily allocated at this offset due to alignment constrains.
It also does not make much sense when allocating arguments at negative
offsets (introduced in a follow-up patch), because the returned offset
would be past the end of the next argument.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D149566
2023-05-17 21:51:45 +03:00
Zequan Wu
3977b77a6b [CodeGen] Fix nomerge attribute not working in tail calls.
In D79537, `nomerge` was made to only apply to non-tail calls. This fixes it by also applying it to tail calls.

For ARM, I only made the new MI to inherit the flag under `TCRETURNdi` and `TCRETURNri`, because that's the place tail calls got replaced. Not sure if there's any other place needed.

Fixes #61545.

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D146749
2023-05-10 14:25:11 -04:00
NAKAMURA Takumi
c1221251fb Restore CodeGen/MachineValueType.h from Support
This is rework of;

  - rG13e77db2df94 (r328395; MVT)

Since `LowLevelType.h` has been restored to `CodeGen`, `MachinveValueType.h`
can be restored as well.

Depends on D148767

Differential Revision: https://reviews.llvm.org/D149024
2023-05-03 00:13:20 +09:00
Kai Luo
eee024bf1b [PowerPC] Update incr after resetting the register in MI
After performing signed extension, we update the register in MI. We should also update `incr` register which is tracking the register in `MI`.

Fixes https://github.com/llvm/llvm-project/issues/61882.

Reviewed By: #powerpc, shchenz

Differential Revision: https://reviews.llvm.org/D147594
2023-04-14 17:36:30 +08:00
Maryam Moghadas
cf0395f816 [PowerPC] Fix the xxperm swap requirements
This patch is to fix the xxperm vector operand swap condition so that the
single-use operand is in V2 to prevent copying, it also fixes the subtarget
condition to exploit the xpperm.

Reviewed By: stefanp

Differential Revision: https://reviews.llvm.org/D146632
2023-04-05 20:13:40 -05:00
Qiu Chaofan
5b8ea2d0e1 [PowerPC] Lower IS_FPCLASS by test data class instruction
Power ISA 3.0 introduced new 'test data class' instructions, which
accept flags for: NaN/Infinity/Zero/Denormal. This instruction can be
used to implement custom lowering for llvm.is.fpclass, but some extra
bits provided by the intrinsic are missing (normal and QNaN/SNaN).

For those categories not natively supported, this patch uses a two-way
or three-way combination to implement correct behavior.

Reviewed By: sepavloff, shchenz

Differential Revision: https://reviews.llvm.org/D140381
2023-04-03 11:37:17 +08:00
Craig Topper
219ff07f72 [Targets] Rename Flag->Glue. NFC
Long long ago Glue was called Flag, and it was never completely
renamed.
2023-04-02 19:28:51 -07:00
Simon Pilgrim
8153b92d9b [DAG] Add SelectionDAG::SplitScalar helper
Similar to the existing SelectionDAG::SplitVector helper, this helper creates the EXTRACT_ELEMENT nodes for the LO/HI halves of the scalar source.

Differential Revision: https://reviews.llvm.org/D147264
2023-03-31 18:35:40 +01:00
Amy Kwan
6126356d82 [PowerPC] Implement 64-bit ELFv2 Calling Convention in TableGen (for integers/floats/vectors in registers)
This patch partially implements the parameter passing rules outlined in the
ELFv2 ABI within TableGen. Specifically, it implements the parameter assignment
of integers, floats, and vectors within registers - where the GPR numbering will
be "skipped" depending on the ordering of floats and vectors that appear within
a parameter list.

As we begin to adopt GlobalISel to the PowerPC backend, there is a need for a
TableGen definition that encapsulates the ELFv2 parameter passing rules. Thus,
this patch also changes the default calling convention that is returned within
the ccAssignFnForCall() function used in our GlobalISel implementation, and also
adds some additional testing of the calling convention that is implemented.

Future patches that build on top of this initial TableGen definition will aim to
add more of the ABI complexities, including support for additional types and
also in-memory arguments.

Differential Revision: https://reviews.llvm.org/D137504
2023-03-27 08:23:04 -05:00
Kazu Hirata
7bb6d1b32e [llvm] Skip getAPIntValue (NFC)
ConstantSDNode provides some convenience functions like isZero,
getZExtValue, and isMinSignedValue that are named identically to those
provided by APInt, so we can "skip" getAPIntValue.
2023-03-22 22:10:25 -07:00
Simon Pilgrim
da570ef1b4 [DAG] Match select(icmp(x,y),sub(x,y),sub(y,x)) -> abd(x,y) patterns
Pulled out of PowerPC, and added ABDS support as well (hence the additional v4i32 PPC matches)

Differential Revision: https://reviews.llvm.org/D144789
2023-03-14 15:10:30 +00:00