1866 Commits

Author SHA1 Message Date
Zaara Syeda
22067a8eb4
[PowerPC] Fix assert exposed by PR 95931 in LowerBITCAST (#108062)
Hit Assertion failed: Num < NumOperands && "Invalid child # of SDNode!" 
Fix by checking opcode and value type before calling getOperand.
2024-09-10 14:14:01 -04:00
Qiu Chaofan
06c331163e
[PowerPC] Implement llvm.set.rounding intrinsic (#67302) 2024-09-10 14:30:31 +08:00
anjenner
4af249fe6e
Add usub_cond and usub_sat operations to atomicrmw (#105568)
These both perform conditional subtraction, returning the minuend and
zero respectively, if the difference is negative.
2024-09-06 16:19:20 +01:00
RolandF77
26ba186bd0
[PowerPC] Improve pwr7 codegen for v4i8 load (#104507)
There are no partial vector loads on pwr7 so current v4i8 codegen is an
int load then store to vector sized temp and re-load as vector. Try to
use lfiwax to load 32 bits into an FP reg and take advantage of VSX FP
and vector reg sharing to move the result to the right vector position.
2024-09-04 12:55:27 -04:00
Matt Arsenault
911b96058a
PPC: Custom lower ppcf128 is_fpclass if is_fpclass is custom (#105540)
Unfortunately expandIS_FPCLASS is called directly in SelectionDAGBuilder
depending on whether IS_FPCLASS is custom or not. This helps avoid ppc test
regressions in a future patch where the custom lowering would be bypassed.
2024-08-29 14:01:54 +04:00
RolandF77
89bbcbe285
[PowerPC] fix legalization crash (#105563)
If v2i64 scalar_to_vector is made custom, llc can crash in certain
legalization cases where v2i64 vectors are injected, even if they
weren't otherwise present. The code generated would be fine, but that
operation is not handled in ReplaceNodeResults. Add handling.
2024-08-28 11:22:23 -04:00
Craig Topper
4b0c0ec6b8
[CodeGen] Use MCRegister for CCState::AllocateReg and CCValAssign::getReg. NFC (#106032) 2024-08-26 11:40:25 -07:00
Craig Topper
e994494a59
[PowerPC] Use MathExtras helpers to simplify code. NFC (#104691) 2024-08-17 23:12:52 -07:00
Kazu Hirata
3080c80671
[PowerPC] Use range-based for loops (NFC) (#104410) 2024-08-15 17:58:46 -07:00
RolandF77
8b6e9de3dd
[PowerPC] improve P10 store forwarding on P7 scalar to vector (#102330)
Try to make P7 code with scalar to vector operations that use store/re-load to run smoother on P10 by supplying enough store width to cover the load and allow hardware store forwarding.
2024-08-12 12:30:06 -04:00
Kazu Hirata
f4fb735840
[llvm] Construct SmallVector<SDValue> with ArrayRef (NFC) (#102578) 2024-08-09 09:15:42 -07:00
Tim Gymnich
408d82d352
[PowerPC] Respect endianness when bitcasting to fp128 (#95931)
Fixes #92246

Match the behaviour of `bitcast v2i64 (BUILD_PAIR %lo %hi)` when
encountering `bitcast fp128 (BUILD_PAIR %lo $hi)`.
by inserting a missing swap of the arguments based on endianness.

### Current behaviour: 
**fp128**
bitcast fp128 (BUILD_PAIR %lo $hi) => BUILD_FP128 %lo %hi
BUILD_FP128 %lo %hi => MTVSRDD %hi %lo

**v2i64**
bitcast v2i64 (BUILD_PAIR %lo %hi) => BUILD_VECTOR %hi %lo
BUILD_VECTOR %hi %lo => MTVSRDD %lo %hi
2024-08-08 08:51:04 +08:00
Qiu Chaofan
20957d2091
[AIX] Add -msave-reg-params to save arguments to stack (#97524)
In PowerPC ABI, a few initial arguments are passed through registers,
but their places in parameter save area are reserved, arguments passed
by memory goes after the reserved location.

For debugging purpose, we may want to save copy of the pass-by-reg
arguments into correct places on stack. The new option achieves by
adding new function level attribute and make argument lowering part
aware of it.
2024-07-24 20:58:37 +08:00
azhan92
1df4d866cc
[PowerPC] Add support for -mcpu=pwr11 / -mtune=pwr11 (#99511)
This PR adds support for -mcpu=pwr11/power11 and -mtune=pwr11/power11 in
clang and llvm.
2024-07-23 09:49:41 -04:00
Joseph Huber
615b7eeaa9 Reapply "[LLVM][LTO] Factor out RTLib calls and allow them to be dropped (#98512)"
This reverts commit 740161a9b98c9920dedf1852b5f1c94d0a683af5.

I moved the `ISD` dependencies into the CodeGen portion of the handling,
it's a little awkward but it's the easiest solution I can think of for
now.
2024-07-20 09:29:31 -05:00
NAKAMURA Takumi
740161a9b9 Revert "[LLVM][LTO] Factor out RTLib calls and allow them to be dropped (#98512)"
This reverts commit c05126bdfc3b02daa37d11056fa43db1a6cdef69.
(llvmorg-19-init-17714-gc05126bdfc3b)
See #99610
2024-07-20 12:36:57 +09:00
Matt Arsenault
0f0cfcff2c
CodeGen: Avoid some references to MachineFunction's getMMI (#99652)
MachineFunction's probably should not include a backreference to
the owning MachineModuleInfo. Most of these references were used
just to query the MCContext, which MachineFunction already directly
stores. Other contexts are using it to query the LLVMContext, which
can already be accessed through the IR function reference.
2024-07-19 22:09:05 +04:00
Amara Emerson
f270a4dd66
[AArch64] Don't tail call memset if it would convert to a bzero. (#98969)
Well, not quite that simple. We can tc memset since it returns the first
argument but bzero doesn't do that and therefore we can end up
miscompiling.

This patch also refactors the logic out of isInTailCallPosition() into the callers.
As a result memcpy and memmove are also modified to do the same thing
for consistency.

rdar://131419786
2024-07-17 01:31:52 -07:00
Joseph Huber
c05126bdfc
[LLVM][LTO] Factor out RTLib calls and allow them to be dropped (#98512)
Summary:
The LTO pass and LLD linker have logic in them that forces extraction
and prevent internalization of needed runtime calls. However, these
currently take all RTLibcalls into account, even if the target does not
support them. The target opts-out of a libcall if it sets its name to
nullptr. This patch pulls this logic out into a class in the header so
that LTO / lld can use it to determine if a symbol actually needs to be
kept.

This is important for targets like AMDGPU that want to be able to use
`lld` to perform the final link step, but does not want the overhead of
uncalled functions. (This adds like a second to the link time trivially)
2024-07-16 06:22:09 -05:00
Joseph Huber
3f1a767572
[LLVM] Factor disabled Libcalls into the initializer (#98421)
Summary:
These Libcalls represent which functions are available to the backend.
If a runtime call is not available, the target sets the the name to
`nullptr`. Currently, this logic is spread around the various targets.
This patch pulls all of the locations that disable libcalls into the
intializer. This patch is effectively NFC.

The motivation behind this patch is that currently the LTO handling uses
the list of all runtime calls to determine which functions cannot be
internalized and must be extracted from static libraries. We do not want
this to happen for libcalls that are not emitted by the backend. A
follow-up patch will move out this logic so the LTO pass can know which
rtlib calls are actually used by the backend.
2024-07-11 12:59:25 -05:00
Nikita Popov
9df71d7673
[IR] Add getDataLayout() helpers to Function and GlobalValue (#96919)
Similar to https://github.com/llvm/llvm-project/pull/96902, this adds
`getDataLayout()` helpers to Function and GlobalValue, replacing the
current `getParent()->getDataLayout()` pattern.
2024-06-28 08:36:49 +02:00
Kazu Hirata
0a5292ebbb
[PowerPC] Remove extraneous ArrayRef (NFC) (#96092)
ArrayRef can be implicitly constructed from a C array while inferring
its size.
2024-06-19 13:38:46 -07:00
Zarko Todorovski
0295c2ada4
[PowerPC][AIX] Support ByVals with greater alignment then pointer size (#93341)
Implementation is NOT compatible with IBM XL C 16.1 and earlier but is
compatible with GCC.
It handles all ByVals with greater alignment then pointer width the same
way IBM XL C handles Byvals
that have vector members. For overaligned objects that do not contain
vectors IBM XL C does not align them properly if they are passed in the
GPR
argument registers.

This patch was originally written by Sean Fertile @mandlebug. 

Previously on Phabricator https://reviews.llvm.org/D105659
2024-06-05 12:19:16 -04:00
zhijian lin
6127f15e5b
[PowerPC] option -msoft-float should not block the PC-relative address instruction (#92543)
The Prefix instruction is introduced on PowerPC ISA3_1.

In the PR,  
1. The `FeaturePrefixInstrs` do not imply the `FeatureP8Vector`
,`FeatureP9Vector` .
2. `FeaturePrefixInstrs`  implies only the FeatureISA3_1.
3. For the prefix instructions `paddi` and `pli` , they have `Predicates
= [PrefixInstrs] `
4. For the prefix instructions `plfs` and `plfd`, they have `Predicates
= [PrefixInstrs, HasFPU] `
5. For the prefix instructions "plxv` , "plxssp` and `plxsd` , they have
`Predicates = [PrefixInstrs, HasP10Vector]`

Fixes #62372
2024-05-29 10:53:00 -04:00
Chen Zheng
2143b7cd7d [PowerPC]perform bitcast lowering only at 64 bit
Perform bitcast lowering requires 64-bit to be native supported,
However this is not true on 32-bit targets. Explicitly require
64-bit target.

Fixes #92233
2024-05-20 03:17:21 -04:00
Jay Foad
1650f1b3d7
Fix typo "indicies" (#92232) 2024-05-15 13:10:16 +01:00
Felix (Ting Wang)
ea126aebdc
[PowerPC] Tune AIX shared library TLS model at function level (#84132)
Under some circumstance (library loaded with the main program), TLS
initial-exec model can be applied to local-dynamic access(es). We
could use some simple heuristic to decide the update at function level:
* If there is equal or less than a number of TLS local-dynamic access(es)
in the function, use TLS initial-exec model. (the threshold which default to
1 is controlled by hidden option)
2024-05-09 09:50:36 +08:00
Felix (Ting Wang)
09d51a841d
[PowerPC][AIX] Enable aix-small-local-dynamic-tls target attribute (#86641)
Following the aix-small-local-exec-tls target attribute, this patch adds
a target attribute for an AIX-specific option in llc that informs the
compiler that it can use a faster access sequence for the local-dynamic
TLS model (formally named aix-small-local-dynamic-tls) when TLS
variables are less than ~32KB in size.

The patch either produces an addi/la with a displacement off of module
handle (return value from .__tls_get_mod) when the address is
calculated, or it produces an addi/la followed by a load/store when the
address is calculated and used for further accesses.

---------

Co-authored-by: Amy Kwan <amy.kwan1@ibm.com>
2024-04-12 08:18:01 +08:00
Chen Zheng
053750c3b4 [PowerPC] Fix the undef register for VECINSERT
If the V2 of the vector_shuffle is undef, the two vector inputs are
expected to be the same when do the VECINSERT transformation. For now
the first operand of VECINSERT is set to undef which is not right.
This patch fixes this bug.
2024-04-11 04:01:07 -04:00
Chen Zheng
f33a6dcf95
[PPC][NFC] add an option for GatherAllAliasesMaxDepth (#87071)
GatherAllAliases is time consuming. Add an debug option on PPC to
control the complexity of the function. This is useful when debuging
compile time related issues.
2024-04-02 08:40:28 +08:00
Amy Kwan
a3efc53f16
[AIX][TLS] Produce a faster local-exec access sequence for the "aix-small-tls" global variable attribute (#83053)
Similar to 3f46e5453d9310b15d974e876f6132e3cf50c4b1, this patch allows
the backend to produce a faster access sequence for the local-exec TLS
model, where loading from the TOC can be avoided, for local-exec TLS
variables that are annotated with the "aix-small-tls" attribute.

The expectation is for local-exec TLS variables to be set with this
attribute through PGO. Furthermore, the optimized access sequence is
only generated for local-exec TLS variables annotated with
"aix-small-tls", only if they are less than ~32KB in size.
2024-03-28 09:18:45 -04:00
Qiu Chaofan
e5b20c83e5
[PowerPC] Update chain uses when emitting lxsizx (#84892) 2024-03-18 22:31:05 +08:00
Qiu Chaofan
65ae09eeb6
[PowerPC] Fix behavior of rldimi/rlwimi/rlwnm builtins (#85040)
rldimi is 64-bit instruction, so the corresponding builtin should not
be available in 32-bit mode. Rotate amount should be in range and
cases when mask is zero needs special handling.

This change also swaps the first and second operands of rldimi/rlwimi
to match previous behavior. For masks not ending at bit 63-SH,
rotation will be inserted before rldimi.
2024-03-18 14:17:16 +08:00
David Green
601e102bdb
[CodeGen] Use LocationSize for MMO getSize (#84751)
This is part of #70452 that changes the type used for the external
interface of MMO to LocationSize as opposed to uint64_t. This means the
constructors take LocationSize, and convert ~UINT64_C(0) to
LocationSize::beforeOrAfter(). The getSize methods return a
LocationSize.

This allows us to be more precise with unknown sizes, not accidentally
treating them as unsigned values, and in the future should allow us to
add proper scalable vector support but none of that is included in this
patch. It should mostly be an NFC.

Global ISel is still expected to use the underlying LLT as it needs, and
are not expected to see unknown sizes for generic operations. Most of
the changes are hopefully fairly mechanical, adding a lot of getValue()
calls and protecting them with hasValue() where needed.
2024-03-17 18:15:56 +00:00
Arthur Eubanks
94c988bcfd [NFC] Remove unused parameter from shouldAssumeDSOLocal() 2024-03-11 19:48:17 +00:00
Qiu Chaofan
906580bad3
[PowerPC] Add intrinsics for rldimi/rlwimi/rlwnm (#82968)
These builtins are already there in Clang, however current codegen may
produce suboptimal results due to their complex behavior. Implement them
as intrinsics to ensure expected instructions are emitted.
2024-03-04 21:13:59 +08:00
Felix (Ting Wang)
5b05870953
[PowerPC] Support local-dynamic TLS relocation on AIX (#66316)
Supports TLS local-dynamic on AIX, generates below sequence of code:

```
.tc foo[TC],foo[TL]@ld # Variable offset, ld relocation specifier
.tc mh[TC],mh[TC]@ml # Module handle for the caller
lwz 3,mh[TC]\(2\) $$ For 64-bit: ld 3,mh[TC]\(2\)
bla .__tls_get_mod # Modifies r0,r3,r4,r5,r11,lr,cr0
#r3 = &TLS for module
lwz 4,foo[TC]\(2\) $$ For 64-bit: ld 4,foo[TC]\(2\)
add 5,3,4 # Compute &foo
.rename mh[TC], "\_$TLSML" # Symbol for the module handle must have the name "_$TLSML"
```

---------

Co-authored-by: tingwang <tingwang@tingwangs-MBP.lan>
Co-authored-by: tingwang <tingwang@tingwangs-MacBook-Pro.local>
2024-03-01 08:09:40 +08:00
Kai Luo
d1924f0474
[PowerPC] Do not generate isel instruction if target doesn't have this instruction (#72845)
When expand `select_cc` in finalize-isel, we should not generate `isel`
for targets not feature it.
2024-03-01 08:03:06 +08:00
Kazu Hirata
ae46855f53 [Target] Use getConstantOperand (NFC) 2024-01-28 18:03:38 -08:00
Kazu Hirata
1f5934a901 [PowerPC] Directly call Instruction::getMetadata (NFC) 2024-01-28 18:03:36 -08:00
Nico Weber
184ca39529
[llvm] Move CodeGenTypes library to its own directory (#79444)
Finally addresses https://reviews.llvm.org/D148769#4311232 :)

No behavior change.
2024-01-25 12:01:31 -05:00
Nikita Popov
87bc91d425
[PowerPC] Fix shuffle combine with undef elements (#77787)
This custom DAG combine works on a shuffle where one source vector is a
zero splat, which means we can adjust the shuffle indices to refer to
any element of the splat -- as long as we stay in the same vector.

In the case where an undef (-1) index into the non-splat vector was
used, we ended up adjusting the splat index to -1+NumElements, which
points into the wrong vector.

Fix this by using the first element from the splat if the other one is undef.
There are four cases this theoretically affects, but in practice I only
managed to demonstrate a miscompile with one of them. I think two of
theses are effectively dead due to the operand canonicalization at the
start of the transform.

Fixes https://github.com/llvm/llvm-project/issues/77748.
2024-01-15 10:12:33 +01:00
Alex Bradbury
2d54ec36f7
[SelectionDAG] Add and use SDNode::getAsAPIntVal() helper (#77455)
This is the logical equivalent for #76710 for APInt and uses the same
naming scheme.

Converted existing users through:
`git grep -l "cast<ConstantSDNode>\(.*\).*getAPIntValueValue" | xargs
sed -E -i
's/cast<ConstantSDNode>\((.*)\)->getAPIntValue/\1->getAsAPIntVal/'`
2024-01-09 14:27:07 +00:00
Alex Bradbury
197214e39b
[RFC][SelectionDAG] Add and use SDNode::getAsZExtVal() helper (#76710)
This follows on from #76708, allowing
`cast<ConstantSDNode>(N)->getZExtValue()` to be replaced with just
`N->getAsZextVal();`
    
Introduced via `git grep -l "cast<ConstantSDNode>\(.*\).*getZExtValue" |
xargs sed -E -i
's/cast<ConstantSDNode>\((.*)\)->getZExtValue/\1->getAsZExtVal/'` and
then using `git clang-format` on the result.
2024-01-09 12:25:17 +00:00
Alex Bradbury
80aeb62211
[llvm][NFC] Use SDValue::getConstantOperandVal(i) where possible (#76708)
This helper function shortens examples like
`cast<ConstantSDNode>(Node->getOperand(1))->getZExtValue();` to
`Node->getConstantOperandVal(1);`.

Implemented with:
`git grep -l
"cast<ConstantSDNode>\(.*->getOperand\(.*\)\)->getZExtValue\(\)" | xargs
sed -E -i

's/cast<ConstantSDNode>\((.*)->getOperand\((.*)\)\)->getZExtValue\(\)/\1->getConstantOperandVal(\2)/`
and `git grep -l
"cast<ConstantSDNode>\(.*\.getOperand\(.*\)\)->getZExtValue\(\)" | xargs
sed -E -i

's/cast<ConstantSDNode>\((.*)\.getOperand\((.*)\)\)->getZExtValue\(\)/\1.getConstantOperandVal(\2)/'`.
With a couple of simple manual fixes needed. Result then processed by
`git clang-format`.
2024-01-02 13:14:28 +00:00
Qiu Chaofan
c97a7675ee
[PowerPC] Expand FSINCOS of fp128 (#76494) 2023-12-29 11:27:06 +08:00
Kai Luo
56414220df
[PowerPC] Use 'sync; ld; cmp; bc; isync' for atomic load seq-cst on 32-bit platform (#75905)
`cmp; bc; isync` is more performant than `lwsync` theoretically.

64-bit platform already features it, now implement it for 32-bit
platform.
2023-12-20 10:01:02 +08:00
Chen Zheng
4b932d84f4
[PowerPC] redesign the target flags (#69695)
12 bit is not enough for PPC's target specific flags. If 8 bit for the
bitmask flags, 4 bit for the direct mask, PPC can total have 16 direct
mask and 8 bitmask. Not enough for PPC, see this issue in
https://github.com/llvm/llvm-project/pull/66316

Redesign how PPC target set the target specific flags. With this patch,
all ppc target flags are direct flags. No bitmask flag in PPC anymore.

This patch aligns with some targets like X86 which also has many target
specific flags.

The patch also fixes a bug related to flag `MO_TLSGDM_FLAG` and `MO_LO`.
They are the same value and the test case changes in this PR shows the
bug.
2023-12-07 12:47:25 +08:00
Sander de Smalen
81b7f115fb
[llvm][TypeSize] Fix addition/subtraction in TypeSize. (#72979)
It seems TypeSize is currently broken in the sense that:

  TypeSize::Fixed(4) + TypeSize::Scalable(4) => TypeSize::Fixed(8)

without failing its assert that explicitly tests for this case:

  assert(LHS.Scalable == RHS.Scalable && ...);

The reason this fails is that `Scalable` is a static method of class
TypeSize,
and LHS and RHS are both objects of class TypeSize. So this is
evaluating
if the pointer to the function Scalable == the pointer to the function
Scalable,
which is always true because LHS and RHS have the same class.

This patch fixes the issue by renaming `TypeSize::Scalable` ->
`TypeSize::getScalable`, as well as `TypeSize::Fixed` to
`TypeSize::getFixed`,
so that it no longer clashes with the variable in
FixedOrScalableQuantity.

The new methods now also better match the coding standard, which
specifies that:
* Variable names should be nouns (as they represent state)
* Function names should be verb phrases (as they represent actions)
2023-11-22 08:52:53 +00:00
Qiu Chaofan
426ad99bb2
[PowerPC] Forbid f128 SELECT_CC optimized into fsel (#71497) 2023-11-15 12:20:06 +08:00