3809 Commits

Author SHA1 Message Date
Qiu Chaofan
e5b20c83e5
[PowerPC] Update chain uses when emitting lxsizx (#84892) 2024-03-18 22:31:05 +08:00
Yingwei Zheng
38a44bdc93
[CodeGenPrepare] Reverse the canonicalization of isInf/isNanOrInf (#81572)
In commit
2b582440c1,
we canonicalize the isInf/isNanOrInf idiom into fabs+fcmp for better
analysis/codegen (See also the discussion in
https://github.com/llvm/llvm-project/pull/76338).

This patch reverses the fabs+fcmp to `is.fpclass`. If the `is.fpclass`
is not supported by the target, it will be expanded by TLI.

Fixes the regression introduced by
2b582440c1
and
https://github.com/llvm/llvm-project/pull/80414#issuecomment-1936374206.
2024-03-18 18:27:45 +08:00
Qiu Chaofan
65ae09eeb6
[PowerPC] Fix behavior of rldimi/rlwimi/rlwnm builtins (#85040)
rldimi is 64-bit instruction, so the corresponding builtin should not
be available in 32-bit mode. Rotate amount should be in range and
cases when mask is zero needs special handling.

This change also swaps the first and second operands of rldimi/rlwimi
to match previous behavior. For masks not ending at bit 63-SH,
rotation will be inserted before rldimi.
2024-03-18 14:17:16 +08:00
Sean Fertile
2d80505401
[AIX] Support per global code model. (#79202)
Exploit the per global code model attribute on AIX. On AIX we need to
update both the code sequence used to access the global (either 1 or 2
instructions for small and large code model respectively) and the
storage mapping class that we emit the toc entry.

---------

Co-authored-by: Amy Kwan <akwan0907@gmail.com>
2024-03-15 12:52:04 -04:00
Kevin P. Neal
ea628f087e [FPEnv][PowerPC] Correct strictfp test.
Correct llvm-reduce strictfp test to follow the rules documented in the
LangRef:
https://llvm.org/docs/LangRef.html#constrained-floating-point-intrinsics

This test needed the strictfp attribute added to function definitions.

Test changes verified with D146845.
2024-03-15 12:08:09 -04:00
Zaara Syeda
cc761a7c35
[PowerPC][NFC] Rename ADDItocL to match the 64-bit naming convention (#85099)
In preparation of adding a similar instruction for large code model on
AIX for 32-bit, rename the exisitng ADDItocL 64-instruction to ADDItocL8
to match the naming convention of other instructions with 32-bit and
64-bit variants.
2024-03-13 11:57:07 -04:00
Zaara Syeda
37b5eb0a0a
[AIX][TOC] Add -mtocdata/-mno-tocdata options on AIX (#67999)
This patch enables support that the XL compiler had for AIX under
-qdatalocal/-qdataimported.
2024-03-13 10:26:31 -04:00
Chen Zheng
cc34e56b86 [PPC][NFC] add an option to expose the bug in 74951 2024-03-07 20:52:44 -05:00
Chen Zheng
e7a22e72de [PPC] precommit cases for issue 74915 2024-03-07 20:22:26 -05:00
Sameer Sahasrabuddhe
60822637bf Restore "Implement convergence control in MIR using SelectionDAG (#71785)"
This restores commit c7fdd8c11e54585dc9d15d63de9742067e0506b9.
Previously reverted in f010b1bef4dda2c7082cbb41dbabf1f149cce306.

LLVM function calls carry convergence control tokens as operand bundles, where
the tokens themselves are produced by convergence control intrinsics. This patch
implements convergence control tokens in MIR as follows:

1. Introduce target-independent ISD opcodes and MIR opcodes for convergence
   control intrinsics.
2. Model token values as untyped virtual registers in MIR.

The change also introduces an additional ISD opcode CONVERGENCECTRL_GLUE and a
corresponding machine opcode with the same spelling. This glues the convergence
control token to SDNodes that represent calls to intrinsics. The glued token is
later translated to an implicit argument in the MIR.

The lowering of calls to user-defined functions is target-specific. On AMDGPU,
the convergence control operand bundle at a non-intrinsic call is translated to
an explicit argument to the SI_CALL_ISEL instruction. Post-selection adjustment
converts this explicit argument to an implicit argument on the SI_CALL
instruction.
2024-03-06 12:19:32 +05:30
Felix (Ting Wang)
ed6275868b
[PowerPC][NFC] Update aix-tls-xcoff-reloc.ll (#83764)
Update test case changed by #66316
2024-03-05 14:07:47 +08:00
Mitch Phillips
f010b1bef4 Revert "Restore "Implement convergence control in MIR using SelectionDAG (#71785)""
This reverts commit c7fdd8c11e54585dc9d15d63de9742067e0506b9.

Reason: Broke the sanitizer buildbots. See the comments at
https://github.com/llvm/llvm-project/pull/71785
for more information.
2024-03-04 17:05:34 +01:00
Qiu Chaofan
906580bad3
[PowerPC] Add intrinsics for rldimi/rlwimi/rlwnm (#82968)
These builtins are already there in Clang, however current codegen may
produce suboptimal results due to their complex behavior. Implement them
as intrinsics to ensure expected instructions are emitted.
2024-03-04 21:13:59 +08:00
Sameer Sahasrabuddhe
c7fdd8c11e Restore "Implement convergence control in MIR using SelectionDAG (#71785)"
Original commit 79889734b940356ab3381423c93ae06f22e772c9.
Perviously reverted in commit a2afcd5721869d1d03c8146bae3885b3385ba15e.

LLVM function calls carry convergence control tokens as operand bundles, where
the tokens themselves are produced by convergence control intrinsics. This patch
implements convergence control tokens in MIR as follows:

1. Introduce target-independent ISD opcodes and MIR opcodes for convergence
   control intrinsics.
2. Model token values as untyped virtual registers in MIR.

The change also introduces an additional ISD opcode CONVERGENCECTRL_GLUE and a
corresponding machine opcode with the same spelling. This glues the convergence
control token to SDNodes that represent calls to intrinsics. The glued token is
later translated to an implicit argument in the MIR.

The lowering of calls to user-defined functions is target-specific. On AMDGPU,
the convergence control operand bundle at a non-intrinsic call is translated to
an explicit argument to the SI_CALL_ISEL instruction. Post-selection adjustment
converts this explicit argument to an implicit argument on the SI_CALL
instruction.
2024-03-04 13:28:04 +05:30
George Koehler
6b70c5d79f
[PowerPC] provide CFI for ELF32 to unwind cr2, cr3, cr4 (#83098)
Delete the code that skips the CFI for the condition register on ELF32.
The code checked !MustSaveCR, which happened only when
Subtarget.is32BitELFABI(), where spillCalleeSavedRegisters is spilling
cr in a different way. The spill was missing CFI. After deleting this
code, a spill of cr2 to cr4 gets CFI in the same way as a spill of r14
to r31.

Fixes #83094
2024-03-02 22:18:24 -05:00
Felix (Ting Wang)
5b05870953
[PowerPC] Support local-dynamic TLS relocation on AIX (#66316)
Supports TLS local-dynamic on AIX, generates below sequence of code:

```
.tc foo[TC],foo[TL]@ld # Variable offset, ld relocation specifier
.tc mh[TC],mh[TC]@ml # Module handle for the caller
lwz 3,mh[TC]\(2\) $$ For 64-bit: ld 3,mh[TC]\(2\)
bla .__tls_get_mod # Modifies r0,r3,r4,r5,r11,lr,cr0
#r3 = &TLS for module
lwz 4,foo[TC]\(2\) $$ For 64-bit: ld 4,foo[TC]\(2\)
add 5,3,4 # Compute &foo
.rename mh[TC], "\_$TLSML" # Symbol for the module handle must have the name "_$TLSML"
```

---------

Co-authored-by: tingwang <tingwang@tingwangs-MBP.lan>
Co-authored-by: tingwang <tingwang@tingwangs-MacBook-Pro.local>
2024-03-01 08:09:40 +08:00
Kai Luo
d1924f0474
[PowerPC] Do not generate isel instruction if target doesn't have this instruction (#72845)
When expand `select_cc` in finalize-isel, we should not generate `isel`
for targets not feature it.
2024-03-01 08:03:06 +08:00
Chen Zheng
3196005f6b [NFC][PowerPC] use script to regenerate the CHECK lines 2024-02-29 04:49:37 -05:00
Jack Styles
28233408a2
[CodeGen] [ARM] Make RISC-V Init Undef Pass Target Independent and add support for the ARM Architecture. (#77770)
When using Greedy Register Allocation, there are times where
early-clobber values are ignored, and assigned the same register. This
is illeagal behaviour for these intructions. To get around this, using
Pseudo instructions for early-clobber registers gives them a definition
and allows Greedy to assign them to a different register. This then
meets the ARM Architecture Reference Manual and matches the defined
behaviour.

This patch takes the existing RISC-V patch and makes it target
independent, then adds support for the ARM Architecture. Doing this will
ensure early-clobber restraints are followed when using the ARM
Architecture. Making the pass target independent will also open up
possibility that support other architectures can be added in the future.
2024-02-26 12:12:31 +00:00
Sameer Sahasrabuddhe
a2afcd5721 Revert "Implement convergence control in MIR using SelectionDAG (#71785)"
This reverts commit 79889734b940356ab3381423c93ae06f22e772c9.

Encountered multiple buildbot failures.
2024-02-21 11:07:02 +05:30
Sameer Sahasrabuddhe
79889734b9
Implement convergence control in MIR using SelectionDAG (#71785)
LLVM function calls carry convergence control tokens as operand bundles, where
the tokens themselves are produced by convergence control intrinsics. This patch
implements convergence control tokens in MIR as follows:

1. Introduce target-independent ISD opcodes and MIR opcodes for convergence
   control intrinsics.
2. Model token values as untyped virtual registers in MIR.

The change also introduces an additional ISD opcode CONVERGENCECTRL_GLUE and a
corresponding machine opcode with the same spelling. This glues the convergence
control token to SDNodes that represent calls to intrinsics. The glued token is
later translated to an implicit argument in the MIR.

The lowering of calls to user-defined functions is target-specific. On AMDGPU,
the convergence control operand bundle at a non-intrinsic call is translated to
an explicit argument to the SI_CALL_ISEL instruction. Post-selection adjustment
converts this explicit argument to an implicit argument on the SI_CALL
instruction.
2024-02-21 10:06:37 +05:30
stephenpeckham
26db845536
[XCOFF] Support the subtype flag in DWARF section headers (#81667)
The section headers for XCOFF files have a subtype flag for Dwarf
sections. This PR updates obj2yaml, yaml2obj, and llvm-readobj so that
they recognize the subtype.
2024-02-20 08:42:12 -06:00
Jeffrey Byrnes
7180c23cf6
[SeparateConstOffsetFromGEP] Reland: Reorder trivial GEP chains to separate constants (#81671)
Actually update tests w.r.t
9e5a77f252
and reland https://github.com/llvm/llvm-project/pull/73056
2024-02-13 17:10:23 -08:00
Philip Reames
99c5a66c62 Revert "[SeparateConstOffsetFromGEP] Reorder trivial GEP chains to separate constants (#73056)" and follow ups
"ninja check-llvm" is failing on tip of tree.

This reverts commit ec0aa1646e9953d1a8d0d15dc381d3250c854572.
This reverts commit 1b65742f8c71f576381fe85d5e34579b24f2d874.
2024-02-13 13:29:23 -08:00
Jeffrey Byrnes
1b65742f8c
[SeparateConstOffsetFromGEP] Reorder trivial GEP chains to separate constants (#73056)
In this case, a trivial GEP chain has the form:

```
%ptr = getelementptr sameType, %base, constant
%val = getelementptr sameType, %ptr, %variable
```

That is, a one-index GEP consumes another (of the same basis and result
type) one-index GEP, where the inner GEP uses a constant index and the
outer GEP uses a variable index. For chains of this type, it is trivial
to reorder them (by simply swapping the indexes). The result of doing so
is better AddrMode matching for users of the ultimate ptr produced by
GEP chain.

Future patches can extend this to support non-trivial GEP chains (e.g.
those with different basis types and/or multiple indices).
2024-02-13 11:22:49 -08:00
stephenpeckham
90e8dc0f7c
Fix failing testcases (#80902) 2024-02-06 15:35:21 -05:00
stephenpeckham
b1acb7a315
[XCOFF] Add compiler version to an auxiliary symbol table entry (#80162)
C_FILE symbols. To match the behavior of the assembler and the legacy
compiler, this includes using the generic ".file" name for the C_FILE
symbol and generating the actual file name in an auxiliary entry.
2024-02-06 09:08:18 -06:00
Qiu Chaofan
292d9e869f
[PowerPC] Mask constant operands in ValueBit tracking (#67653)
In IR or C code, shift amount larger than value size is undefined
behavior. But in practice, backend lowering for shift_parts produces
add/sub of shift amounts, thus constant shift amounts might be
negative or larger than value size, which depends on ISA definition.

PowerPC ISA says, the lowest 7 bits (6 bits for 32-bit instruction)
will be taken, and if the highest among them is 1, result will be
zero, otherwise the low 6 bits (or 5 on 32-bit) are used as shift
amount.

This commit emulates the behavior and avoids array overflow in bit
permutation's value bits calculator.
2024-02-06 18:37:31 +08:00
Nikita Popov
ff9af4c43a [CodeGen] Convert tests to opaque pointers (NFC) 2024-02-05 14:07:09 +01:00
Amy Kwan
2a50921553
[AIX][TLS] Optimize the small local-exec access sequence for non-zero offsets (#71485)
This patch utilizes the -maix-small-local-exec-tls option to produce a
faster,
non-TOC-based access sequence for the local-exec TLS model.
Specifically, for
when the offsets from the TLS variable are non-zero.

In particular, this patch produces either a single:
- addi/la with a displacement off of R13 plus a non-zero offset for when
an address is calculated, or
- load or store off of R13 plus a non-zero offset for when an address is
calculated and used for further
  access where R13 is the thread pointer, respectively.

In order to produce a single addi or load/store off of the thread
pointer with a non-zero offset,
this patch also adds the necessary support in the assembly printer when
printing these instructions.

Specifically:
- The non-zero offset is added to the TLS variable address when the
address of the
  TLS variable + it's offset is less than 32KB.
- Otherwise, when the address of the TLS variable + its offset is
greater than 32KB, the
non-zero offset (and a multiple of 64KB) is subtracted from the TLS
address.

This handling in the assembly printer is necessary to ensure that the
TLS address + the non-zero offset
is between [-32768, 32768), so that the total displacement can fit
within the addi/load/store instructions.

This patch is meant to be a follow-up to
3f46e5453d9310b15d974e876f6132e3cf50c4b1 (where the
optimization occurs for when the offset is zero).
2024-02-01 09:29:21 -05:00
Quentin Dian
112fba974c
[MIRPrinter] Don't print line break when there is no instructions (NFC) (#80147)
Per #80143, we can remove the extra line break when there is no
instruction.
2024-02-01 22:10:52 +08:00
Zaara Syeda
a03a6e9964
[AIX] [XCOFF] Add support for common and local common symbols in the TOC (#79530)
This patch adds support for common and local symbols in the TOC for AIX.
Note that we need to update isVirtualSection so as a common symbol in
TOC will have the symbol type XTY_CM and will be initialized when placed
in the TOC so sections with this type are no longer virtual.

---------

Co-authored-by: Zaara Syeda <syzaara@ca.ibm.com>
2024-01-31 16:34:21 -05:00
Shimin Cui
1bab570e9b
Move the PowerPC/PPCMergeStringPool work to initializer (#77352)
Currently, the `PPCMergeStringPool` merges the global variable after the
`AsmPrinter` initializer adds the global variables to its symbol list.
This is to move the merging work of `PPCMergeStringPool` to its
initializer, just like what GlobalMerge does, to avoid adding merged
global variables to the `AsmPrinter` symbol lis.
2024-01-31 10:27:07 -05:00
Amy Kwan
d5fe1bd081
[AIX][TLS] Disallow the use of -maix-small-local-exec-tls and -fno-data-sections (#79252)
This patch disallows the use of the -maix-small-local-exec-tls and
-fno-data-sections options within clang, and also disallows the use of
the aix-small-local-exec-tls attribute with the -data-sections=false
option in llc.

This is because having data sections off when using the
aix-small-local-exec-tls feature is not ideal for performance. As the
small-local-exec-tls region is a limited resource, this space should not
used for variables that may be replaced.

Note, that on AIX, data sections is turned on by default, so this patch
makes it so that a diagnostic is emitted when users explicitly turn off
data sections while using the aix-small-local-exec-tls feature.
2024-01-26 12:39:25 -05:00
Nemanja Ivanovic
67c1c1dbb6
[PowerPC][X86] Make cpu id builtins target independent and lower for PPC (#68919)
Make __builtin_cpu_{init|supports|is} target independent and provide an
opt-in query for targets that want to support it. Each target is still
responsible for their specific lowering/code-gen. Also provide code-gen
for PowerPC.

I originally proposed this in https://reviews.llvm.org/D152914 and this
addresses the comments I received there.

---------

Co-authored-by: Nemanja Ivanovic <nemanjaivanovic@nemanjas-air.kpn>
Co-authored-by: Nemanja Ivanovic <nemanja@synopsys.com>
2024-01-26 11:24:50 -05:00
Krzysztof Drewniak
63fe80fb18
[SeperateConstOffsetFromGEP] Handle or disjoint flags (#76997)
This commit extends separate-const-offset-from-gep to look at the
newly-added `disjoint` flag on `or` instructions so as to preserve
additional opportunities for optimization.

The tests were pre-committed in #76972.
2024-01-26 09:56:06 -06:00
Shimin Cui
e278c67096
Add support to meger strings used by metadata (#77364)
Currently if the merged string is used by metadata, its metadata uses
are not replaced if the string is merged. This is to add code support
for the metadata use replacement.
2024-01-26 09:22:37 -05:00
Douglas Yung
cc2c8ab21f Require asserts for llvm/test/CodeGen/PowerPC/sms-regpress.mir. 2024-01-22 13:51:03 -08:00
Ryotaro KASUGA
7556626dcf
[CodeGen][MachinePipeliner] Limit register pressure when scheduling (#74807)
In software pipelining, when searching for the Initiation Interval (II),
`MachinePipeliner` tries to reduce register pressure, but doesn't check
how many variables can actually be alive at the same time. As a result,
a lot of register spills/fills can be generated after register
allocation, which might cause performance degradation. To prevent such
cases, this patch adds a check phase that calculates the maximum
register pressure of the scheduled loop and reject it if the pressure is
too high. This can be enabled this by specifying
`pipeliner-register-pressure`. Additionally, an II search range is
currently fixed at 10, which is too small to find a schedule when the
above algorithm is applied. Therefore this patch also adds a new option
`pipeliner-ii-search-range` to specify the length of the range to
search. There is one more new option
`pipeliner-register-pressure-margin`, which can be used to estimate a
register pressure limit less than actual for conservative analysis.

Discourse thread:
https://discourse.llvm.org/t/considering-register-pressure-when-deciding-initiation-interval-in-machinepipeliner/74725
2024-01-22 17:06:37 +09:00
Hans Wennborg
677ced8af2 Require asserts for llvm/test/CodeGen/PowerPC/fence.ll 2024-01-15 17:25:49 +01:00
Nikita Popov
87bc91d425
[PowerPC] Fix shuffle combine with undef elements (#77787)
This custom DAG combine works on a shuffle where one source vector is a
zero splat, which means we can adjust the shuffle indices to refer to
any element of the splat -- as long as we stay in the same vector.

In the case where an undef (-1) index into the non-splat vector was
used, we ended up adjusting the splat index to -1+NumElements, which
points into the wrong vector.

Fix this by using the first element from the splat if the other one is undef.
There are four cases this theoretically affects, but in practice I only
managed to demonstrate a miscompile with one of them. I think two of
theses are effectively dead due to the operand canonicalization at the
start of the transform.

Fixes https://github.com/llvm/llvm-project/issues/77748.
2024-01-15 10:12:33 +01:00
Qiu Chaofan
ce1f9465b0 [NFC] Pre-commit case of ppcf128 extractelt soften 2024-01-15 15:27:36 +08:00
Qiu Chaofan
85071a3c74
[PowerPC] Implement fence builtin (#76495) 2024-01-15 11:19:16 +08:00
Philip Reames
e4d01bb227
[SCEV] Special case sext in isKnownNonZero (#77834)
The existing logic in isKnownNonZero relies on unsigned ranges, which
can be problematic when our range calculation is imprecise. Consider the
following:
  %offset.nonzero = or i32 %offset, 1
  -->  %offset.nonzero U: [1,0) S: [1,0)
  %offset.i64 = sext i32 %offset.nonzero to i64
  -->  (sext i32 %offset.nonzero to i64) U: [-2147483648,2147483648)
                                         S: [-2147483648,2147483648)

Note that the unsigned range for the sext does contain zero in this case
despite the fact that it can never actually be zero.

Instead, we can push the query down one level - relying on the fact that
the sext is an invertible operation and that the result can only be zero
if the input is. We could likely generalize this reasoning for other
invertible operations, but special casing sext seems worthwhile.
2024-01-12 07:45:28 -08:00
Nikita Popov
13b5882ee6 [PowerPC] Add test for #77748 (NFC) 2024-01-11 15:45:52 +01:00
Kai Luo
6615581526
[PowerPC] Make verifier happy when lowering llvm.trap (#77266)
`llvm.trap` is lowered to `PPC::TRAP` and `PPC::TRAP` is set as
terminator. Verifier complains about terminator should not lie in the
middle of an MBB. See #77095.

Fix it by removing `isTerminator` and `isBarrier` and then set `isTrap`
which was introduced by https://reviews.llvm.org/D48836# and is being
used by X86 and AArch64.

`PPC::TRAP` is not a hardware memory barrier and `llvm.trap` doesn't
indicate a memory barrier either.
2024-01-10 09:23:30 +08:00
Fangrui Song
f972e4d343 [MC,ELF] .section: unconditionally print section flag 'G' after 'o'
* Placing 'G' before 'M' (SHF_MERGE) can be misleading as the sh_entsize
  argument goes before the section group name, if a reader doesn't know
  that the order of extra arguments is not affected by the order of flags.
* 'a', 'w', and 'x' indicate basic permission-related flags. Separating
  them with 'G' is kinda ugly.

Simplify code and move 'G' after 'o'. The new output is more similar to
GCC.
2024-01-09 10:48:23 -08:00
Kai Luo
225e2704af [PowerPC] Precommit test for lowering llvm.trap on ppc64le. NFC. 2024-01-08 10:20:01 +08:00
Chen Zheng
d6aef863d8
[PowerPC] make LR/LR8 CTR/CTR8 aliased (#76926)
fixes https://github.com/llvm/llvm-project/issues/47156 
fixes https://github.com/llvm/llvm-project/issues/47155
2024-01-08 09:37:40 +08:00
Chen Zheng
dd4dc2111e nfc add cases for pr47156 and pr47155 2024-01-04 03:56:40 -05:00