1837 Commits

Author SHA1 Message Date
Chen Zheng
f33a6dcf95
[PPC][NFC] add an option for GatherAllAliasesMaxDepth (#87071)
GatherAllAliases is time consuming. Add a debug option on PPC to
control the complexity of the function. This is useful when debugging
compile-time-related issues.
2024-04-02 08:40:28 +08:00
Amy Kwan
a3efc53f16
[AIX][TLS] Produce a faster local-exec access sequence for the "aix-small-tls" global variable attribute (#83053)
Similar to 3f46e5453d9310b15d974e876f6132e3cf50c4b1, this patch allows
the backend to produce a faster access sequence for the local-exec TLS
model, where loading from the TOC can be avoided, for local-exec TLS
variables that are annotated with the "aix-small-tls" attribute.

The expectation is for local-exec TLS variables to be set with this
attribute through PGO. Furthermore, the optimized access sequence is
only generated for local-exec TLS variables that are annotated with
"aix-small-tls" and are less than ~32KB in size.
2024-03-28 09:18:45 -04:00
Qiu Chaofan
e5b20c83e5
[PowerPC] Update chain uses when emitting lxsizx (#84892) 2024-03-18 22:31:05 +08:00
Qiu Chaofan
65ae09eeb6
[PowerPC] Fix behavior of rldimi/rlwimi/rlwnm builtins (#85040)
rldimi is a 64-bit instruction, so the corresponding builtin should not
be available in 32-bit mode. The rotate amount should be in range, and
cases where the mask is zero need special handling.

This change also swaps the first and second operands of rldimi/rlwimi
to match previous behavior. For masks not ending at bit 63-SH,
rotation will be inserted before rldimi.
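
As a rough illustration of the rotate-then-insert behavior discussed above, here is a minimal portable sketch of what rlwimi computes. The function names and the operand order are hypothetical and do not reflect the builtin's signature; the mask is passed as a plain 32-bit value for simplicity.

```cpp
#include <cassert>
#include <cstdint>

// Rotate a 32-bit value left. The rotate amount must be in range.
uint32_t rotl32(uint32_t V, unsigned SH) {
  assert(SH < 32 && "rotate amount out of range");
  return SH == 0 ? V : (V << SH) | (V >> (32 - SH));
}

// Hypothetical model of rlwimi: rotate Src left by SH, then insert the
// bits selected by Mask into Dst, leaving the other bits of Dst intact.
uint32_t rlwimiModel(uint32_t Src, uint32_t Dst, unsigned SH, uint32_t Mask) {
  return (rotl32(Src, SH) & Mask) | (Dst & ~Mask);
}
```

With a zero mask the model returns Dst unchanged, which is the kind of corner case that needs special handling.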
2024-03-18 14:17:16 +08:00
David Green
601e102bdb
[CodeGen] Use LocationSize for MMO getSize (#84751)
This is part of #70452 that changes the type used for the external
interface of MMO to LocationSize as opposed to uint64_t. This means the
constructors take LocationSize, and convert ~UINT64_C(0) to
LocationSize::beforeOrAfter(). The getSize methods return a
LocationSize.

This allows us to be more precise with unknown sizes, not accidentally
treating them as unsigned values, and in the future should allow us to
add proper scalable vector support but none of that is included in this
patch. It should mostly be an NFC.

Global ISel is still expected to use the underlying LLT as it needs, and
is not expected to see unknown sizes for generic operations. Most of
the changes are hopefully fairly mechanical, adding a lot of getValue()
calls and protecting them with hasValue() where needed.
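
A minimal sketch of the idea, assuming a stand-in class (`SizeSketch`, `fromLegacySize` and `sizeOrZero` are hypothetical names, not the real LocationSize API): the all-ones sentinel is mapped to an explicit "unknown" state, and callers guard getValue() with hasValue().

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical stand-in for LocationSize; not the real LLVM class.
class SizeSketch {
  std::optional<uint64_t> Value; // empty means "unknown size"

public:
  static SizeSketch precise(uint64_t Bytes) {
    SizeSketch S;
    S.Value = Bytes;
    return S;
  }
  static SizeSketch unknown() { return SizeSketch(); }

  bool hasValue() const { return Value.has_value(); }
  uint64_t getValue() const {
    assert(hasValue() && "querying an unknown size");
    return *Value;
  }
};

// Map the legacy all-ones sentinel onto the explicit "unknown" state.
SizeSketch fromLegacySize(uint64_t Raw) {
  return Raw == ~UINT64_C(0) ? SizeSketch::unknown() : SizeSketch::precise(Raw);
}

// Callers protect getValue() with hasValue() instead of comparing integers.
uint64_t sizeOrZero(const SizeSketch &S) {
  return S.hasValue() ? S.getValue() : 0;
}
```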
2024-03-17 18:15:56 +00:00
Arthur Eubanks
94c988bcfd [NFC] Remove unused parameter from shouldAssumeDSOLocal() 2024-03-11 19:48:17 +00:00
Qiu Chaofan
906580bad3
[PowerPC] Add intrinsics for rldimi/rlwimi/rlwnm (#82968)
These builtins are already there in Clang, however current codegen may
produce suboptimal results due to their complex behavior. Implement them
as intrinsics to ensure expected instructions are emitted.
2024-03-04 21:13:59 +08:00
Felix (Ting Wang)
5b05870953
[PowerPC] Support local-dynamic TLS relocation on AIX (#66316)
Supports TLS local-dynamic on AIX, generating the following code sequence:

```
.tc foo[TC],foo[TL]@ld # Variable offset, ld relocation specifier
.tc mh[TC],mh[TC]@ml # Module handle for the caller
lwz 3,mh[TC](2) # For 64-bit: ld 3,mh[TC](2)
bla .__tls_get_mod # Modifies r0,r3,r4,r5,r11,lr,cr0
#r3 = &TLS for module
lwz 4,foo[TC](2) # For 64-bit: ld 4,foo[TC](2)
add 5,3,4 # Compute &foo
.rename mh[TC], "_$TLSML" # Symbol for the module handle must have the name "_$TLSML"
```

---------

Co-authored-by: tingwang <tingwang@tingwangs-MBP.lan>
Co-authored-by: tingwang <tingwang@tingwangs-MacBook-Pro.local>
2024-03-01 08:09:40 +08:00
Kai Luo
d1924f0474
[PowerPC] Do not generate isel instruction if target doesn't have this instruction (#72845)
When expanding `select_cc` in finalize-isel, we should not generate `isel`
for targets that do not feature it.
2024-03-01 08:03:06 +08:00
Kazu Hirata
ae46855f53 [Target] Use getConstantOperand (NFC) 2024-01-28 18:03:38 -08:00
Kazu Hirata
1f5934a901 [PowerPC] Directly call Instruction::getMetadata (NFC) 2024-01-28 18:03:36 -08:00
Nico Weber
184ca39529
[llvm] Move CodeGenTypes library to its own directory (#79444)
Finally addresses https://reviews.llvm.org/D148769#4311232 :)

No behavior change.
2024-01-25 12:01:31 -05:00
Nikita Popov
87bc91d425
[PowerPC] Fix shuffle combine with undef elements (#77787)
This custom DAG combine works on a shuffle where one source vector is a
zero splat, which means we can adjust the shuffle indices to refer to
any element of the splat -- as long as we stay in the same vector.

In the case where an undef (-1) index into the non-splat vector was
used, we ended up adjusting the splat index to -1+NumElements, which
points into the wrong vector.

Fix this by using the first element from the splat if the other one is undef.
There are four cases this theoretically affects, but in practice I only
managed to demonstrate a miscompile with one of them. I think two of
these are effectively dead due to the operand canonicalization at the
start of the transform.

Fixes https://github.com/llvm/llvm-project/issues/77748.
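
The index arithmetic can be sketched as follows, assuming the splat is the second source vector and its index is derived by offsetting the other index by the element count. The helper names are hypothetical and the real combine handles more cases.

```cpp
#include <cassert>

// Mask indices: 0..N-1 select from the non-splat vector, N..2N-1 select
// from the splat vector, and -1 means undef.
int buggySplatIndex(int OtherIdx, int NumElements) {
  // With OtherIdx == -1 this yields N-1, which is in the non-splat vector.
  return OtherIdx + NumElements;
}

int fixedSplatIndex(int OtherIdx, int NumElements) {
  assert(NumElements > 0 && "vector must have elements");
  // Use the first element of the splat when the other index is undef.
  return OtherIdx < 0 ? NumElements : OtherIdx + NumElements;
}
```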
2024-01-15 10:12:33 +01:00
Alex Bradbury
2d54ec36f7
[SelectionDAG] Add and use SDNode::getAsAPIntVal() helper (#77455)
This is the logical equivalent for #76710 for APInt and uses the same
naming scheme.

Converted existing users through:
`git grep -l "cast<ConstantSDNode>\(.*\).*getAPIntValue" | xargs
sed -E -i
's/cast<ConstantSDNode>\((.*)\)->getAPIntValue/\1->getAsAPIntVal/'`
2024-01-09 14:27:07 +00:00
Alex Bradbury
197214e39b
[RFC][SelectionDAG] Add and use SDNode::getAsZExtVal() helper (#76710)
This follows on from #76708, allowing
`cast<ConstantSDNode>(N)->getZExtValue()` to be replaced with just
`N->getAsZExtVal();`

Introduced via `git grep -l "cast<ConstantSDNode>\(.*\).*getZExtValue" |
xargs sed -E -i
's/cast<ConstantSDNode>\((.*)\)->getZExtValue/\1->getAsZExtVal/'` and
then using `git clang-format` on the result.
2024-01-09 12:25:17 +00:00
Alex Bradbury
80aeb62211
[llvm][NFC] Use SDValue::getConstantOperandVal(i) where possible (#76708)
This helper function shortens examples like
`cast<ConstantSDNode>(Node->getOperand(1))->getZExtValue();` to
`Node->getConstantOperandVal(1);`.

Implemented with:
`git grep -l
"cast<ConstantSDNode>\(.*->getOperand\(.*\)\)->getZExtValue\(\)" | xargs
sed -E -i
's/cast<ConstantSDNode>\((.*)->getOperand\((.*)\)\)->getZExtValue\(\)/\1->getConstantOperandVal(\2)/'`
and `git grep -l
"cast<ConstantSDNode>\(.*\.getOperand\(.*\)\)->getZExtValue\(\)" | xargs
sed -E -i
's/cast<ConstantSDNode>\((.*)\.getOperand\((.*)\)\)->getZExtValue\(\)/\1.getConstantOperandVal(\2)/'`.
A couple of simple manual fixes were needed. The result was then
processed by `git clang-format`.
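
For readers who want to try the first substitution on a single line, here is a small sketch of the same rewrite using `std::regex` instead of sed. The function name `rewriteConstantOperand` is hypothetical, and this is only an approximation of what the sed invocation does across the tree.

```cpp
#include <regex>
#include <string>

// Apply the "->getOperand(i)" form of the rewrite to one line of source.
std::string rewriteConstantOperand(const std::string &Line) {
  static const std::regex Pattern(
      R"(cast<ConstantSDNode>\((.*)->getOperand\((.*)\)\)->getZExtValue\(\))");
  // $1 is the node expression, $2 is the operand index.
  return std::regex_replace(Line, Pattern, "$1->getConstantOperandVal($2)");
}
```

Lines that do not match the pattern are returned unchanged.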
2024-01-02 13:14:28 +00:00
Qiu Chaofan
c97a7675ee
[PowerPC] Expand FSINCOS of fp128 (#76494) 2023-12-29 11:27:06 +08:00
Kai Luo
56414220df
[PowerPC] Use 'sync; ld; cmp; bc; isync' for atomic load seq-cst on 32-bit platform (#75905)
`cmp; bc; isync` is more performant than `lwsync` theoretically.

The 64-bit platform already uses this sequence; now implement it for the
32-bit platform.
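
For context, this is the kind of source-level operation whose lowering is affected: a sequentially consistent atomic load. The helper names are hypothetical; the instruction sequence itself is chosen by the backend and is not visible in this sketch.

```cpp
#include <atomic>

// A seq-cst atomic load; on a 32-bit PowerPC target this is the operation
// the backend lowers to a barrier-based instruction sequence.
int loadSeqCst(const std::atomic<int> &Flag) {
  return Flag.load(std::memory_order_seq_cst);
}

// Small helper so the load can be exercised without external state.
int roundTrip(int V) {
  std::atomic<int> A{V};
  return loadSeqCst(A);
}
```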
2023-12-20 10:01:02 +08:00
Chen Zheng
4b932d84f4
[PowerPC] redesign the target flags (#69695)
12 bits are not enough for PPC's target-specific flags. With 8 bits for
the bitmask flags and 4 bits for the direct mask, PPC can have at most
16 direct masks and 8 bitmasks in total. That is not enough for PPC; see
the issue in
https://github.com/llvm/llvm-project/pull/66316

Redesign how the PPC target sets the target-specific flags. With this
patch, all PPC target flags are direct flags; there are no bitmask flags
in PPC anymore.

This patch aligns with some targets like X86, which also have many
target-specific flags.

The patch also fixes a bug related to the flags `MO_TLSGDM_FLAG` and
`MO_LO`. They have the same value, and the test case changes in this PR
show the bug.
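
The capacity argument can be checked with a few lines of arithmetic. This is only a sketch of the counting; the helper names are hypothetical and nothing here mirrors the actual flag encoding.

```cpp
#include <cstdint>

// Number of distinct values an N-bit direct (enumerated) field can hold.
constexpr uint32_t directValues(unsigned Bits) { return UINT32_C(1) << Bits; }

// Number of independent flags an N-bit bitmask field can hold.
constexpr unsigned bitmaskFlags(unsigned Bits) { return Bits; }

// Old split: 4 direct bits plus 8 bitmask bits.
static_assert(directValues(4) == 16 && bitmaskFlags(8) == 8, "old 4+8 split");
// New scheme: all 12 bits used as a direct value.
static_assert(directValues(12) == 4096, "all 12 bits as direct flags");
```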
2023-12-07 12:47:25 +08:00
Sander de Smalen
81b7f115fb
[llvm][TypeSize] Fix addition/subtraction in TypeSize. (#72979)
It seems TypeSize is currently broken in the sense that:

  TypeSize::Fixed(4) + TypeSize::Scalable(4) => TypeSize::Fixed(8)

without failing its assert that explicitly tests for this case:

  assert(LHS.Scalable == RHS.Scalable && ...);

The reason this fails is that `Scalable` is a static method of class
TypeSize, and LHS and RHS are both objects of class TypeSize. So this is
evaluating if the pointer to the function Scalable == the pointer to the
function Scalable, which is always true because LHS and RHS have the
same class.

This patch fixes the issue by renaming `TypeSize::Scalable` ->
`TypeSize::getScalable`, as well as `TypeSize::Fixed` to
`TypeSize::getFixed`, so that it no longer clashes with the variable in
FixedOrScalableQuantity.

The new methods now also better match the coding standard, which
specifies that:
* Variable names should be nouns (as they represent state)
* Function names should be verb phrases (as they represent actions)
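
The pitfall can be reproduced in a few lines. This is a simplified sketch with hypothetical types (`Quantity`, `BrokenSize`, `RenamedSize`), not the real TypeSize hierarchy: a static factory named `Scalable` hides the inherited data member, so the comparison silently compares a function with itself.

```cpp
#include <cstdint>

struct Quantity {
  uint64_t Value = 0;
  bool Scalable = false; // state: is this a scalable quantity?
};

// Broken: the static factory `Scalable` hides the inherited data member.
struct BrokenSize : Quantity {
  static BrokenSize Fixed(uint64_t V) {
    BrokenSize S;
    S.Value = V;
    return S;
  }
  static BrokenSize Scalable(uint64_t V) {
    BrokenSize S;
    S.Value = V;
    S.Quantity::Scalable = true;
    return S;
  }
};

// Name lookup finds the static function, so this compares the address of
// BrokenSize::Scalable with itself and is always true.
bool brokenKindsMatch(const BrokenSize &L, const BrokenSize &R) {
  return L.Scalable == R.Scalable;
}

// Fixed: verb-phrase factory names no longer clash with the data member.
struct RenamedSize : Quantity {
  static RenamedSize getFixed(uint64_t V) {
    RenamedSize S;
    S.Value = V;
    return S;
  }
  static RenamedSize getScalable(uint64_t V) {
    RenamedSize S;
    S.Value = V;
    S.Scalable = true;
    return S;
  }
};

// Now the comparison reads the bool data member of each object.
bool kindsMatch(const RenamedSize &L, const RenamedSize &R) {
  return L.Scalable == R.Scalable;
}
```

An assert built on the broken comparison can never fire, which matches the behavior described above.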
2023-11-22 08:52:53 +00:00
Qiu Chaofan
426ad99bb2
[PowerPC] Forbid f128 SELECT_CC optimized into fsel (#71497) 2023-11-15 12:20:06 +08:00
Qiu Chaofan
5f295552f1
[PowerPC] Fix incorrect symbol name of frexp libcall (#71626)
frexpl is for ppc_fp128. The correct symbol name for f128 is frexpf128.
2023-11-08 14:41:19 +08:00
Paulo Matos
7b9d73c2f9
[NFC] Remove Type::getInt8PtrTy (#71029)
Replace this with PointerType::getUnqual().
Followup to the opaque pointer transition. Fixes an in-code TODO item.
2023-11-07 17:26:26 +01:00
Nikita Popov
127ed9ae26
[PowerPC] Use zext instead of anyext in custom and combine (#68784)
This custom combine currently converts `and(anyext(x),c)` into
`anyext(and(x,c))`. This is not correct, because the original expression
guaranteed that the high bits are zero, while the new one sets them to
undef.

Emit `zext(and(x,c))` instead.

Fixes https://github.com/llvm/llvm-project/issues/68783.
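
A small integer model shows the difference. In this sketch, which uses hypothetical helper names, anyext is modeled as a 16-to-32-bit extension whose high bits are arbitrary (`Junk`), and the constant is assumed to fit in the narrow type.

```cpp
#include <cstdint>

// anyext: low 16 bits are X, high 16 bits are unspecified (modeled by Junk).
constexpr uint32_t anyext(uint16_t X, uint16_t Junk) {
  return (uint32_t(Junk) << 16) | X;
}
// zext: high 16 bits are guaranteed to be zero.
constexpr uint32_t zext(uint16_t X) { return X; }

// Original expression: the and clears the unspecified high bits.
constexpr uint32_t original(uint16_t X, uint16_t Junk, uint16_t C) {
  return anyext(X, Junk) & uint32_t(C);
}
// Incorrect combine: the high bits become unspecified again.
constexpr uint32_t wrongCombine(uint16_t X, uint16_t Junk, uint16_t C) {
  return anyext(uint16_t(X & C), Junk);
}
// Corrected combine: the high bits stay zero.
constexpr uint32_t fixedCombine(uint16_t X, uint16_t C) {
  return zext(uint16_t(X & C));
}
```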
2023-10-12 09:32:17 +02:00
Kishan Parmar
696ea67f19 Disable call to fma for soft-float
The PowerPC backend generates libc function calls
for soft-float, regardless of the -nostdlib/-ffreestanding flags.
fma is not a function provided by compiler-rt builtins and
thus should not be generated here.
PR: https://github.com/llvm/llvm-project/issues/55230 (#55230)

Below is the patch given by @nemanjai.

Reviewed By: jhibbits

Differential Revision: https://reviews.llvm.org/D156344
2023-09-28 14:06:54 +05:30
Nick Desaulniers
330fa7d2a4
[TargetLowering] Deduplicate choosing InlineAsm constraint between ISels (#67057)
Given a list of constraints for InlineAsm (ex. "imr") I'm looking to
modify the order in which they are chosen. Before doing so, I noticed a
fair amount of logic is duplicated between SelectionDAGISel and
GlobalISel for this.

That is because SelectionDAGISel is also trying to lower immediates
during selection. If we detangle these concerns into:
1. choose the preferred constraint
2. attempt to lower that constraint

Then we can slide down the list of constraints until we find one that
can be lowered. That allows the implementation to be shared between
instruction selection frameworks.

This makes it so that later I might only need to adjust the priority of
constraints in one place, and have both selectors behave the same.
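
The two-step scheme can be sketched roughly like this. `chooseConstraint` and the `CanLower` callback are hypothetical names for illustration, not the TargetLowering interface.

```cpp
#include <functional>
#include <optional>
#include <string>
#include <vector>

// Hypothetical helper: ByPreference is already sorted, most preferred first.
// Slide down the list until a constraint can actually be lowered.
std::optional<std::string>
chooseConstraint(const std::vector<std::string> &ByPreference,
                 const std::function<bool(const std::string &)> &CanLower) {
  for (const std::string &C : ByPreference)
    if (CanLower(C))
      return C; // first preferred constraint that can be lowered
  return std::nullopt; // nothing in the list could be lowered
}
```

Because the preference order and the lowering attempt are separate, both selectors could share the first step and differ only in the callback.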
2023-09-25 08:53:03 -07:00
Arthur Eubanks
0a1aa6cda2
[NFC][CodeGen] Change CodeGenOpt::Level/CodeGenFileType into enum classes (#66295)
This will make it easy for callers to see issues with and fix up calls
to createTargetMachine after a future change to the params of
TargetMachine.

This matches other nearby enums.

For downstream users, this should be a fairly straightforward
replacement, e.g. s/CodeGenOpt::Aggressive/CodeGenOptLevel::Aggressive
or s/CGFT_/CodeGenFileType::
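
To illustrate why the scoped form makes mismatched calls visible, here is a standalone sketch. The `legacy` namespace and the two constants are hypothetical stand-ins and do not include any LLVM headers.

```cpp
#include <type_traits>

namespace legacy {
namespace CodeGenOpt {
// Unscoped enum: values convert implicitly to int, so a misplaced argument
// can compile without complaint.
enum Level { None, Less, Default, Aggressive };
} // namespace CodeGenOpt
} // namespace legacy

// Scoped enum: no implicit conversion, so a mismatch is a compile error.
enum class CodeGenOptLevel { None, Less, Default, Aggressive };

constexpr bool LegacyConvertsToInt =
    std::is_convertible<legacy::CodeGenOpt::Level, int>::value;
constexpr bool ScopedConvertsToInt =
    std::is_convertible<CodeGenOptLevel, int>::value;
```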
2023-09-14 14:10:14 -07:00
Maryam Moghadas
7b021f2e64 [PowerPC] Optimize VPERM and fix code order for swapping vector operands on LE
This patch reverts commit 7614ba0a5db8 to optimize VPERM when one of its
vector operands is XXSWAPD, similar to XXPERM. It also reorganizes the
swap code on LE, swapping the vector operand after
adjusting the mask operand. This ensures that the vector operand is
swapped at the correct point in the code, resulting in a valid
constant pool for the mask operand.

Reviewed By: stefanp

Differential Revision: https://reviews.llvm.org/D149083
2023-09-13 15:00:49 -05:00
Nick Desaulniers
93bd428742
[InlineAsm] refactor InlineAsm class NFC (#65649)
I would like to steal one of these bits to denote whether a kind may be
spilled by the register allocator or not, but I'm afraid to touch any of
this code using bitwise operands.

Make flags a first class type using bitfields, rather than launder data
around via `unsigned`.
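
A sketch of the bitfield approach, assuming a made-up layout (the field names and widths in `FlagSketch` are hypothetical and may not match the real InlineAsm flag word):

```cpp
#include <cassert>

// Hypothetical layout for illustration only.
struct FlagSketch {
  unsigned Kind : 3;         // which kind of operand this is
  unsigned NumOperands : 13; // how many operands follow
  unsigned MayBeSpilled : 1; // hypothetical bit reserved for a new purpose
};

// Build a flag from named fields instead of shifting and masking by hand.
FlagSketch makeFlag(unsigned Kind, unsigned NumOperands) {
  assert(Kind < 8 && NumOperands < (1u << 13) && "field overflow");
  FlagSketch F{};
  F.Kind = Kind;
  F.NumOperands = NumOperands;
  F.MayBeSpilled = 0;
  return F;
}
```

Named fields make it harder to clobber a neighboring field than manual shifts on an `unsigned`.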
2023-09-11 09:27:37 -07:00
Amy Kwan
3f46e5453d [AIX][TLS] Produce a faster local-exec access sequence with -maix-small-local-exec-tls (And optimize when load/store offsets are 0)
This patch utilizes the -maix-small-local-exec-tls option added in
D155544 to produce a faster access sequence for the local-exec TLS
model, where loading from the TOC can be avoided.

The patch either produces an addi/la with a displacement off of r13
(the thread pointer) when the address is calculated, or it produces an
addi/la followed by a load/store when the address is calculated and
used for further accesses.

This patch also optimizes this sequence a bit more where we can remove
the addi/la when the load/store offset is 0. A follow up patch will
be posted to account for when the load/store offset is non-zero, and
currently in these situations we keep the addi/la that precedes the
load/store.

Furthermore, this access sequence is only performed for TLS variables
that are less than ~32KB in size.
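
The ~32KB figure is consistent with the signed 16-bit immediate of addi/la. As an assumption about the rationale (not the backend's actual predicate), a range check of that kind could look like this, with `fitsSigned16` being a hypothetical name:

```cpp
#include <cstdint>

// True if Offset fits in a signed 16-bit displacement field.
constexpr bool fitsSigned16(int64_t Offset) {
  return Offset >= INT16_MIN && Offset <= INT16_MAX;
}
```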

Differential Revision: https://reviews.llvm.org/D155600
2023-09-07 20:05:29 -05:00
Ting Wang
71be020dda [SelectionDAG][PowerPC] Memset reuse vector element for tail store
On PPC there are instructions to store an element from a vector (e.g.
stxsdx/stxsiwx), and these instructions can be leveraged to avoid the tail
constant in memset and constant splat array initialization.

This patch tries to explore these opportunities.

Reviewed By: shchenz

Differential Revision: https://reviews.llvm.org/D138883
2023-09-06 01:52:38 -04:00
Fangrui Song
111fcb0df0 [llvm] Fix duplicate word typos. NFC
Those fixes were taken from https://reviews.llvm.org/D137338
2023-09-01 18:25:16 -07:00
Matt Arsenault
ad9d13d535 SelectionDAG: Swap operands of atomic_store
Irritatingly, atomic_store had operands in the opposite order from
regular store. This made it difficult to share patterns between
regular and atomic stores.

There was a previous incomplete attempt to move atomic_store into the
regular StoreSDNode which would be better.

I think it was a mistake for all atomicrmw to swap the operand order,
so maybe it's better to take this one step further.

https://reviews.llvm.org/D123143
2023-08-31 17:30:10 -04:00
Nick Desaulniers
2fad6e6985 [InlineAsm] wrap Kind in enum class NFC
Should add some minor type safety to the use of this information, since
there's quite a bit of metadata being laundered through an `unsigned`.

I'm looking to potentially add more bitfields to that `unsigned`, but I
find InlineAsm's big ol' bag of enum values and usage of `unsigned`
confusing, type-unsafe, and un-ergonomic. These can probably be better
abstracted.

I think the lack of static_cast outside of InlineAsm indicates the prior
code smell fixed here.

Reviewed By: qcolombet

Differential Revision: https://reviews.llvm.org/D159242
2023-08-31 08:54:51 -07:00
Qiu Chaofan
21bea1a208 [PowerPC] Support initial-exec TLS relocation on AIX
Add TLS_IE relocation type to XCOFF writer, and emit code sequence for
initial-exec TLS variables.

Reviewed By: lkail

Differential Revision: https://reviews.llvm.org/D156292
2023-08-30 16:22:16 +08:00
Chen Zheng
732f63d96d [PowerPC]set default min-jump-table-entries to 64 on PPC
Reviewed By: nemanjai

Differential Revision: https://reviews.llvm.org/D159050
2023-08-29 21:42:22 -04:00
Bjorn Pettersson
e53b28c833 [llvm] Drop some bitcasts and references related to typed pointers
Differential Revision: https://reviews.llvm.org/D157551
2023-08-10 15:07:07 +02:00
Kai Luo
f26af16e2c [PowerPC][AIX] Enable quadword atomics by default for AIX
On AIX, a libatomic supporting inline quadword atomic operations has been released, so compatibility is no longer an issue and we can enable quadword atomics by default.

Reviewed By: #powerpc, nemanjai

Differential Revision: https://reviews.llvm.org/D151312
2023-07-25 08:21:07 +08:00
Brad Smith
a3e524df90 [PowerPC] Reorder setMaxAtomicSizeInBitsSupported(). NFC
Reorder setMaxAtomicSizeInBitsSupported() in numerical and more logical order.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D155379
2023-07-22 20:01:27 -04:00
Kamau Bridgeman
62c1cf7c63 [PowerPC][Future] Enable __builtin_mma_xxm[t|f]acc
Future cpu instructions dmxxinstdmr512 and dmxxextfdmr512 insert and extract
quad vectors from the new wide accumulator(wacc) register class.
The introduction of these new instructions renders the p10 instructions
xxmtacc and xxmfacc obsolete since the new wacc register class is a better
choice for handling quad vector operations. This patch ensures that, for
future cpu, instructions dmxxinstdmr512 and dmxxextfdmr512 are generated
by custom lowering the intrinsics for xxm[t|f]acc to produce no instructions.

Reviewed By: amyk, lei

Differential Revision: https://reviews.llvm.org/D153034
2023-07-14 13:38:40 -05:00
Nemanja Ivanovic
b0e249d5e2 Reland "[PowerPC] Remove extend between shift and and"
The commit originally caused a bootstrap failure on the big endian
PPC bot as the combine was interfering with the legalizer when
applied on illegal types. This update restricts the combine to
the only types for which it is actually needed. Tested on PPC BE
bootstrap locally.
2023-07-07 14:45:05 -04:00
Nemanja Ivanovic
7cd9084c69 Revert "[PowerPC] Remove extend between shift and and"
This reverts commit a57236de4eb8f38b4201647b10146941cbbb5c0b.
Causes a bootstrap failure on ppc64be.
2023-07-05 20:04:49 -04:00
Nemanja Ivanovic
a57236de4e [PowerPC] Remove extend between shift and and
The SDAG will sometimes insert an extend between
the shift and an and (immediate) even though the
immediate is narrower than the narrow size.
This does not allow us to produce a rotate
instruction (such as rlwinm).
This patch just adds a combine to move the extend
onto the and.

Differential revision: https://reviews.llvm.org/D152911
2023-07-05 16:33:07 -04:00
Elliot Goodrich
b0abd4893f [llvm] Add missing StringExtras.h includes
In preparation for moving the `#include "llvm/ADT/StringExtras.h"`
from the header to the source file of `llvm/Support/Error.h`, first add in
all the missing includes that were previously included transitively
through this header.
2023-06-25 15:42:22 +01:00
Amy Kwan
f5ae075048 [AIX][TLS] Generate 32-bit local-exec access code sequence
This patch adds support for the TLS local-exec access model on AIX to allow
for the ability to generate the 32-bit (specifically, non-optimized) code sequence.
This work is a follow up of D149722.

The particular code sequence that is generated is as follows:
```
.tc var[TC],var[TL]@le.   // variable offset, with the le relocation specifier

bla .__get_tpointer()     // get the thread pointer, modifies r3
lwz reg1, var[TC](2)      // load the variable offset
add reg2, r3, reg1        // add the variable offset to the retrieved thread pointer
```

Differential Revision: https://reviews.llvm.org/D152669
2023-06-20 11:57:38 -05:00
Amy Kwan
d5659808b2 [AIX][TLS] Generate 64-bit local-exec access code sequence
This patch adds support for the TLS local-exec access model on AIX to allow
for the ability to generate the 64-bit (specifically, non-optimized) code sequence.

For this patch in particular, the sequence that is generated involves a load of the
variable offset, followed by an add of the loaded variable offset to r13 (which is
the thread pointer). This code sequence looks like the following:
```
ld reg1,var[TC](2)
add reg2, reg1, r13     // r13 contains the thread pointer
```
The TOC (.tc pseudo-op) entries generated in the assembly files are also
changed where we add the @le relocation for the variable offset.

Differential Revision: https://reviews.llvm.org/D149722
2023-06-19 12:17:30 -05:00
Matt Arsenault
eece6ba283 IR: Add llvm.ldexp and llvm.experimental.constrained.ldexp intrinsics
AMDGPU has native instructions and target intrinsics for this, but
these really should be subject to legalization and generic
optimizations. This will enable legalization of f16->f32 on targets
without f16 support.

Implement a somewhat horrible inline expansion for targets without
libcall support. This could be better if we could introduce control
flow (GlobalISel version not yet implemented). Support for strictfp
legalization is less complete but works for the simple cases.
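
For reference, the operation these intrinsics model is scaling by a power of two, which the C++ standard library already exposes as `std::ldexp`. The wrapper name below is hypothetical and only shows the semantics, not the inline expansion.

```cpp
#include <cmath>

// ldexp(X, Exp) computes X * 2^Exp.
double scaleByPowerOfTwo(double X, int Exp) { return std::ldexp(X, Exp); }
```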
2023-06-06 17:07:18 -04:00
Qiu Chaofan
9e17e08324 [PowerPC] Combine fptoint-store under strict cases
Reviewed By: shchenz

Differential Revision: https://reviews.llvm.org/D141249
2023-06-05 16:24:02 +08:00
Qiu Chaofan
590c6a1727 [PowerPC] Require FPCVT for store fptoi combination 2023-06-05 14:26:32 +08:00
Qiu Chaofan
69bc8ff766 Reland "[PowerPC] Simplify fp-to-int store optimization"
The build failure should be fixed by de681d53. Follow-up refactor will
be done in future patches.

This reverts commit e7c5ced0b9f0551ea17e1d2b48be86f03a772c59.
2023-06-05 13:53:08 +08:00