3802 Commits

Author SHA1 Message Date
Chen Zheng
cc34e56b86 [PPC][NFC] add an option to expose the bug in 74951 2024-03-07 20:52:44 -05:00
Chen Zheng
e7a22e72de [PPC] precommit cases for issue 74915 2024-03-07 20:22:26 -05:00
Sameer Sahasrabuddhe
60822637bf Restore "Implement convergence control in MIR using SelectionDAG (#71785)"
This restores commit c7fdd8c11e54585dc9d15d63de9742067e0506b9.
Previously reverted in f010b1bef4dda2c7082cbb41dbabf1f149cce306.

LLVM function calls carry convergence control tokens as operand bundles, where
the tokens themselves are produced by convergence control intrinsics. This patch
implements convergence control tokens in MIR as follows:

1. Introduce target-independent ISD opcodes and MIR opcodes for convergence
   control intrinsics.
2. Model token values as untyped virtual registers in MIR.

The change also introduces an additional ISD opcode CONVERGENCECTRL_GLUE and a
corresponding machine opcode with the same spelling. This glues the convergence
control token to SDNodes that represent calls to intrinsics. The glued token is
later translated to an implicit argument in the MIR.

The lowering of calls to user-defined functions is target-specific. On AMDGPU,
the convergence control operand bundle at a non-intrinsic call is translated to
an explicit argument to the SI_CALL_ISEL instruction. Post-selection adjustment
converts this explicit argument to an implicit argument on the SI_CALL
instruction.
2024-03-06 12:19:32 +05:30
Felix (Ting Wang)
ed6275868b
[PowerPC][NFC] Update aix-tls-xcoff-reloc.ll (#83764)
Update test case changed by #66316
2024-03-05 14:07:47 +08:00
Mitch Phillips
f010b1bef4 Revert "Restore "Implement convergence control in MIR using SelectionDAG (#71785)""
This reverts commit c7fdd8c11e54585dc9d15d63de9742067e0506b9.

Reason: Broke the sanitizer buildbots. See the comments at
https://github.com/llvm/llvm-project/pull/71785
for more information.
2024-03-04 17:05:34 +01:00
Qiu Chaofan
906580bad3
[PowerPC] Add intrinsics for rldimi/rlwimi/rlwnm (#82968)
These builtins are already there in Clang, however current codegen may
produce suboptimal results due to their complex behavior. Implement them
as intrinsics to ensure expected instructions are emitted.
2024-03-04 21:13:59 +08:00
Sameer Sahasrabuddhe
c7fdd8c11e Restore "Implement convergence control in MIR using SelectionDAG (#71785)"
Original commit 79889734b940356ab3381423c93ae06f22e772c9.
Perviously reverted in commit a2afcd5721869d1d03c8146bae3885b3385ba15e.

LLVM function calls carry convergence control tokens as operand bundles, where
the tokens themselves are produced by convergence control intrinsics. This patch
implements convergence control tokens in MIR as follows:

1. Introduce target-independent ISD opcodes and MIR opcodes for convergence
   control intrinsics.
2. Model token values as untyped virtual registers in MIR.

The change also introduces an additional ISD opcode CONVERGENCECTRL_GLUE and a
corresponding machine opcode with the same spelling. This glues the convergence
control token to SDNodes that represent calls to intrinsics. The glued token is
later translated to an implicit argument in the MIR.

The lowering of calls to user-defined functions is target-specific. On AMDGPU,
the convergence control operand bundle at a non-intrinsic call is translated to
an explicit argument to the SI_CALL_ISEL instruction. Post-selection adjustment
converts this explicit argument to an implicit argument on the SI_CALL
instruction.
2024-03-04 13:28:04 +05:30
George Koehler
6b70c5d79f
[PowerPC] provide CFI for ELF32 to unwind cr2, cr3, cr4 (#83098)
Delete the code that skips the CFI for the condition register on ELF32.
The code checked !MustSaveCR, which happened only when
Subtarget.is32BitELFABI(), where spillCalleeSavedRegisters is spilling
cr in a different way. The spill was missing CFI. After deleting this
code, a spill of cr2 to cr4 gets CFI in the same way as a spill of r14
to r31.

Fixes #83094
2024-03-02 22:18:24 -05:00
Felix (Ting Wang)
5b05870953
[PowerPC] Support local-dynamic TLS relocation on AIX (#66316)
Supports TLS local-dynamic on AIX, generates below sequence of code:

```
.tc foo[TC],foo[TL]@ld # Variable offset, ld relocation specifier
.tc mh[TC],mh[TC]@ml # Module handle for the caller
lwz 3,mh[TC]\(2\) $$ For 64-bit: ld 3,mh[TC]\(2\)
bla .__tls_get_mod # Modifies r0,r3,r4,r5,r11,lr,cr0
#r3 = &TLS for module
lwz 4,foo[TC]\(2\) $$ For 64-bit: ld 4,foo[TC]\(2\)
add 5,3,4 # Compute &foo
.rename mh[TC], "\_$TLSML" # Symbol for the module handle must have the name "_$TLSML"
```

---------

Co-authored-by: tingwang <tingwang@tingwangs-MBP.lan>
Co-authored-by: tingwang <tingwang@tingwangs-MacBook-Pro.local>
2024-03-01 08:09:40 +08:00
Kai Luo
d1924f0474
[PowerPC] Do not generate isel instruction if target doesn't have this instruction (#72845)
When expand `select_cc` in finalize-isel, we should not generate `isel`
for targets not feature it.
2024-03-01 08:03:06 +08:00
Chen Zheng
3196005f6b [NFC][PowerPC] use script to regenerate the CHECK lines 2024-02-29 04:49:37 -05:00
Jack Styles
28233408a2
[CodeGen] [ARM] Make RISC-V Init Undef Pass Target Independent and add support for the ARM Architecture. (#77770)
When using Greedy Register Allocation, there are times where
early-clobber values are ignored, and assigned the same register. This
is illeagal behaviour for these intructions. To get around this, using
Pseudo instructions for early-clobber registers gives them a definition
and allows Greedy to assign them to a different register. This then
meets the ARM Architecture Reference Manual and matches the defined
behaviour.

This patch takes the existing RISC-V patch and makes it target
independent, then adds support for the ARM Architecture. Doing this will
ensure early-clobber restraints are followed when using the ARM
Architecture. Making the pass target independent will also open up
possibility that support other architectures can be added in the future.
2024-02-26 12:12:31 +00:00
Sameer Sahasrabuddhe
a2afcd5721 Revert "Implement convergence control in MIR using SelectionDAG (#71785)"
This reverts commit 79889734b940356ab3381423c93ae06f22e772c9.

Encountered multiple buildbot failures.
2024-02-21 11:07:02 +05:30
Sameer Sahasrabuddhe
79889734b9
Implement convergence control in MIR using SelectionDAG (#71785)
LLVM function calls carry convergence control tokens as operand bundles, where
the tokens themselves are produced by convergence control intrinsics. This patch
implements convergence control tokens in MIR as follows:

1. Introduce target-independent ISD opcodes and MIR opcodes for convergence
   control intrinsics.
2. Model token values as untyped virtual registers in MIR.

The change also introduces an additional ISD opcode CONVERGENCECTRL_GLUE and a
corresponding machine opcode with the same spelling. This glues the convergence
control token to SDNodes that represent calls to intrinsics. The glued token is
later translated to an implicit argument in the MIR.

The lowering of calls to user-defined functions is target-specific. On AMDGPU,
the convergence control operand bundle at a non-intrinsic call is translated to
an explicit argument to the SI_CALL_ISEL instruction. Post-selection adjustment
converts this explicit argument to an implicit argument on the SI_CALL
instruction.
2024-02-21 10:06:37 +05:30
stephenpeckham
26db845536
[XCOFF] Support the subtype flag in DWARF section headers (#81667)
The section headers for XCOFF files have a subtype flag for Dwarf
sections. This PR updates obj2yaml, yaml2obj, and llvm-readobj so that
they recognize the subtype.
2024-02-20 08:42:12 -06:00
Jeffrey Byrnes
7180c23cf6
[SeparateConstOffsetFromGEP] Reland: Reorder trivial GEP chains to separate constants (#81671)
Actually update tests w.r.t
9e5a77f252
and reland https://github.com/llvm/llvm-project/pull/73056
2024-02-13 17:10:23 -08:00
Philip Reames
99c5a66c62 Revert "[SeparateConstOffsetFromGEP] Reorder trivial GEP chains to separate constants (#73056)" and follow ups
"ninja check-llvm" is failing on tip of tree.

This reverts commit ec0aa1646e9953d1a8d0d15dc381d3250c854572.
This reverts commit 1b65742f8c71f576381fe85d5e34579b24f2d874.
2024-02-13 13:29:23 -08:00
Jeffrey Byrnes
1b65742f8c
[SeparateConstOffsetFromGEP] Reorder trivial GEP chains to separate constants (#73056)
In this case, a trivial GEP chain has the form:

```
%ptr = getelementptr sameType, %base, constant
%val = getelementptr sameType, %ptr, %variable
```

That is, a one-index GEP consumes another (of the same basis and result
type) one-index GEP, where the inner GEP uses a constant index and the
outer GEP uses a variable index. For chains of this type, it is trivial
to reorder them (by simply swapping the indexes). The result of doing so
is better AddrMode matching for users of the ultimate ptr produced by
GEP chain.

Future patches can extend this to support non-trivial GEP chains (e.g.
those with different basis types and/or multiple indices).
2024-02-13 11:22:49 -08:00
stephenpeckham
90e8dc0f7c
Fix failing testcases (#80902) 2024-02-06 15:35:21 -05:00
stephenpeckham
b1acb7a315
[XCOFF] Add compiler version to an auxiliary symbol table entry (#80162)
C_FILE symbols. To match the behavior of the assembler and the legacy
compiler, this includes using the generic ".file" name for the C_FILE
symbol and generating the actual file name in an auxiliary entry.
2024-02-06 09:08:18 -06:00
Qiu Chaofan
292d9e869f
[PowerPC] Mask constant operands in ValueBit tracking (#67653)
In IR or C code, shift amount larger than value size is undefined
behavior. But in practice, backend lowering for shift_parts produces
add/sub of shift amounts, thus constant shift amounts might be
negative or larger than value size, which depends on ISA definition.

PowerPC ISA says, the lowest 7 bits (6 bits for 32-bit instruction)
will be taken, and if the highest among them is 1, result will be
zero, otherwise the low 6 bits (or 5 on 32-bit) are used as shift
amount.

This commit emulates the behavior and avoids array overflow in bit
permutation's value bits calculator.
2024-02-06 18:37:31 +08:00
Nikita Popov
ff9af4c43a [CodeGen] Convert tests to opaque pointers (NFC) 2024-02-05 14:07:09 +01:00
Amy Kwan
2a50921553
[AIX][TLS] Optimize the small local-exec access sequence for non-zero offsets (#71485)
This patch utilizes the -maix-small-local-exec-tls option to produce a
faster,
non-TOC-based access sequence for the local-exec TLS model.
Specifically, for
when the offsets from the TLS variable are non-zero.

In particular, this patch produces either a single:
- addi/la with a displacement off of R13 plus a non-zero offset for when
an address is calculated, or
- load or store off of R13 plus a non-zero offset for when an address is
calculated and used for further
  access where R13 is the thread pointer, respectively.

In order to produce a single addi or load/store off of the thread
pointer with a non-zero offset,
this patch also adds the necessary support in the assembly printer when
printing these instructions.

Specifically:
- The non-zero offset is added to the TLS variable address when the
address of the
  TLS variable + it's offset is less than 32KB.
- Otherwise, when the address of the TLS variable + its offset is
greater than 32KB, the
non-zero offset (and a multiple of 64KB) is subtracted from the TLS
address.

This handling in the assembly printer is necessary to ensure that the
TLS address + the non-zero offset
is between [-32768, 32768), so that the total displacement can fit
within the addi/load/store instructions.

This patch is meant to be a follow-up to
3f46e5453d9310b15d974e876f6132e3cf50c4b1 (where the
optimization occurs for when the offset is zero).
2024-02-01 09:29:21 -05:00
Quentin Dian
112fba974c
[MIRPrinter] Don't print line break when there is no instructions (NFC) (#80147)
Per #80143, we can remove the extra line break when there is no
instruction.
2024-02-01 22:10:52 +08:00
Zaara Syeda
a03a6e9964
[AIX] [XCOFF] Add support for common and local common symbols in the TOC (#79530)
This patch adds support for common and local symbols in the TOC for AIX.
Note that we need to update isVirtualSection so as a common symbol in
TOC will have the symbol type XTY_CM and will be initialized when placed
in the TOC so sections with this type are no longer virtual.

---------

Co-authored-by: Zaara Syeda <syzaara@ca.ibm.com>
2024-01-31 16:34:21 -05:00
Shimin Cui
1bab570e9b
Move the PowerPC/PPCMergeStringPool work to initializer (#77352)
Currently, the `PPCMergeStringPool` merges the global variable after the
`AsmPrinter` initializer adds the global variables to its symbol list.
This is to move the merging work of `PPCMergeStringPool` to its
initializer, just like what GlobalMerge does, to avoid adding merged
global variables to the `AsmPrinter` symbol lis.
2024-01-31 10:27:07 -05:00
Amy Kwan
d5fe1bd081
[AIX][TLS] Disallow the use of -maix-small-local-exec-tls and -fno-data-sections (#79252)
This patch disallows the use of the -maix-small-local-exec-tls and
-fno-data-sections options within clang, and also disallows the use of
the aix-small-local-exec-tls attribute with the -data-sections=false
option in llc.

This is because having data sections off when using the
aix-small-local-exec-tls feature is not ideal for performance. As the
small-local-exec-tls region is a limited resource, this space should not
used for variables that may be replaced.

Note, that on AIX, data sections is turned on by default, so this patch
makes it so that a diagnostic is emitted when users explicitly turn off
data sections while using the aix-small-local-exec-tls feature.
2024-01-26 12:39:25 -05:00
Nemanja Ivanovic
67c1c1dbb6
[PowerPC][X86] Make cpu id builtins target independent and lower for PPC (#68919)
Make __builtin_cpu_{init|supports|is} target independent and provide an
opt-in query for targets that want to support it. Each target is still
responsible for their specific lowering/code-gen. Also provide code-gen
for PowerPC.

I originally proposed this in https://reviews.llvm.org/D152914 and this
addresses the comments I received there.

---------

Co-authored-by: Nemanja Ivanovic <nemanjaivanovic@nemanjas-air.kpn>
Co-authored-by: Nemanja Ivanovic <nemanja@synopsys.com>
2024-01-26 11:24:50 -05:00
Krzysztof Drewniak
63fe80fb18
[SeperateConstOffsetFromGEP] Handle or disjoint flags (#76997)
This commit extends separate-const-offset-from-gep to look at the
newly-added `disjoint` flag on `or` instructions so as to preserve
additional opportunities for optimization.

The tests were pre-committed in #76972.
2024-01-26 09:56:06 -06:00
Shimin Cui
e278c67096
Add support to meger strings used by metadata (#77364)
Currently if the merged string is used by metadata, its metadata uses
are not replaced if the string is merged. This is to add code support
for the metadata use replacement.
2024-01-26 09:22:37 -05:00
Douglas Yung
cc2c8ab21f Require asserts for llvm/test/CodeGen/PowerPC/sms-regpress.mir. 2024-01-22 13:51:03 -08:00
Ryotaro KASUGA
7556626dcf
[CodeGen][MachinePipeliner] Limit register pressure when scheduling (#74807)
In software pipelining, when searching for the Initiation Interval (II),
`MachinePipeliner` tries to reduce register pressure, but doesn't check
how many variables can actually be alive at the same time. As a result,
a lot of register spills/fills can be generated after register
allocation, which might cause performance degradation. To prevent such
cases, this patch adds a check phase that calculates the maximum
register pressure of the scheduled loop and reject it if the pressure is
too high. This can be enabled this by specifying
`pipeliner-register-pressure`. Additionally, an II search range is
currently fixed at 10, which is too small to find a schedule when the
above algorithm is applied. Therefore this patch also adds a new option
`pipeliner-ii-search-range` to specify the length of the range to
search. There is one more new option
`pipeliner-register-pressure-margin`, which can be used to estimate a
register pressure limit less than actual for conservative analysis.

Discourse thread:
https://discourse.llvm.org/t/considering-register-pressure-when-deciding-initiation-interval-in-machinepipeliner/74725
2024-01-22 17:06:37 +09:00
Hans Wennborg
677ced8af2 Require asserts for llvm/test/CodeGen/PowerPC/fence.ll 2024-01-15 17:25:49 +01:00
Nikita Popov
87bc91d425
[PowerPC] Fix shuffle combine with undef elements (#77787)
This custom DAG combine works on a shuffle where one source vector is a
zero splat, which means we can adjust the shuffle indices to refer to
any element of the splat -- as long as we stay in the same vector.

In the case where an undef (-1) index into the non-splat vector was
used, we ended up adjusting the splat index to -1+NumElements, which
points into the wrong vector.

Fix this by using the first element from the splat if the other one is undef.
There are four cases this theoretically affects, but in practice I only
managed to demonstrate a miscompile with one of them. I think two of
theses are effectively dead due to the operand canonicalization at the
start of the transform.

Fixes https://github.com/llvm/llvm-project/issues/77748.
2024-01-15 10:12:33 +01:00
Qiu Chaofan
ce1f9465b0 [NFC] Pre-commit case of ppcf128 extractelt soften 2024-01-15 15:27:36 +08:00
Qiu Chaofan
85071a3c74
[PowerPC] Implement fence builtin (#76495) 2024-01-15 11:19:16 +08:00
Philip Reames
e4d01bb227
[SCEV] Special case sext in isKnownNonZero (#77834)
The existing logic in isKnownNonZero relies on unsigned ranges, which
can be problematic when our range calculation is imprecise. Consider the
following:
  %offset.nonzero = or i32 %offset, 1
  -->  %offset.nonzero U: [1,0) S: [1,0)
  %offset.i64 = sext i32 %offset.nonzero to i64
  -->  (sext i32 %offset.nonzero to i64) U: [-2147483648,2147483648)
                                         S: [-2147483648,2147483648)

Note that the unsigned range for the sext does contain zero in this case
despite the fact that it can never actually be zero.

Instead, we can push the query down one level - relying on the fact that
the sext is an invertible operation and that the result can only be zero
if the input is. We could likely generalize this reasoning for other
invertible operations, but special casing sext seems worthwhile.
2024-01-12 07:45:28 -08:00
Nikita Popov
13b5882ee6 [PowerPC] Add test for #77748 (NFC) 2024-01-11 15:45:52 +01:00
Kai Luo
6615581526
[PowerPC] Make verifier happy when lowering llvm.trap (#77266)
`llvm.trap` is lowered to `PPC::TRAP` and `PPC::TRAP` is set as
terminator. Verifier complains about terminator should not lie in the
middle of an MBB. See #77095.

Fix it by removing `isTerminator` and `isBarrier` and then set `isTrap`
which was introduced by https://reviews.llvm.org/D48836# and is being
used by X86 and AArch64.

`PPC::TRAP` is not a hardware memory barrier and `llvm.trap` doesn't
indicate a memory barrier either.
2024-01-10 09:23:30 +08:00
Fangrui Song
f972e4d343 [MC,ELF] .section: unconditionally print section flag 'G' after 'o'
* Placing 'G' before 'M' (SHF_MERGE) can be misleading as the sh_entsize
  argument goes before the section group name, if a reader doesn't know
  that the order of extra arguments is not affected by the order of flags.
* 'a', 'w', and 'x' indicate basic permission-related flags. Separating
  them with 'G' is kinda ugly.

Simplify code and move 'G' after 'o'. The new output is more similar to
GCC.
2024-01-09 10:48:23 -08:00
Kai Luo
225e2704af [PowerPC] Precommit test for lowering llvm.trap on ppc64le. NFC. 2024-01-08 10:20:01 +08:00
Chen Zheng
d6aef863d8
[PowerPC] make LR/LR8 CTR/CTR8 aliased (#76926)
fixes https://github.com/llvm/llvm-project/issues/47156 
fixes https://github.com/llvm/llvm-project/issues/47155
2024-01-08 09:37:40 +08:00
Chen Zheng
dd4dc2111e nfc add cases for pr47156 and pr47155 2024-01-04 03:56:40 -05:00
Arthur Eubanks
ece1359857 Revert "[PowerPC] Add test after #75271 on PPC. NFC. (#75616)"
This reverts commit 5cfc7b3342ce4de0bbe182b38baa8a71fc83f8f8.

This depends on 0e46b49de43349f8cbb2a7d4c6badef6d16e31ae which is being reverted.
2024-01-03 17:09:45 +00:00
Kai Luo
8ae73fea3a [PowerPC] Precommit test for #72845. NFC. 2024-01-03 03:03:48 +00:00
Qiu Chaofan
c97a7675ee
[PowerPC] Expand FSINCOS of fp128 (#76494) 2023-12-29 11:27:06 +08:00
Kai Luo
5cfc7b3342
[PowerPC] Add test after #75271 on PPC. NFC. (#75616)
Demonstrate `IMPLICIT_DEF implicit-def ...` can be generated after
coalescing on PPC.

The case is reduced from failure in #75570. The failure is triggered
after #75271 .
2023-12-26 00:21:56 +08:00
stephenpeckham
7026086073
[XCOFF] Use RLDs to print branches even without -r (#74342)
This presents misleading and confusing output. If you have a function
defined at the beginning of an XCOFF object file, and you have a
function call to an external function, the function call disassembles as
a branch to the local function. That is,

`void f() { f(); g();}`

disassembles as 
>00000000 <.f>:
       0: 7c 08 02 a6   mflr 0
4: 94 21 ff c0 stwu 1, -64(1)
       8: 90 01 00 48   stw 0, 72(1)
      c: 4b ff ff f5   bl 0x0 <.f>
      10: 4b ff ff f1   bl 0x0 <.f> 

With this PR, the second call will display:

`10: 4b ff ff f1   bl 0x0 <.g>  `

Using -r can help, but you still get the confusing output:

>10: 4b ff ff f1   bl 0x0 <.f>
      00000010:  R_RBR        .g
2023-12-21 08:17:32 -06:00
Kai Luo
56414220df
[PowerPC] Use 'sync; ld; cmp; bc; isync' for atomic load seq-cst on 32-bit platform (#75905)
`cmp; bc; isync` is more performant than `lwsync` theoretically.

64-bit platform already features it, now implement it for 32-bit
platform.
2023-12-20 10:01:02 +08:00
Paul Kirth
9a578a9f60
Revert "[StackColoring] Delete dead stack slots (#75351)" (#75655)
This reverts commit 08b306dc8e7c0b2498f4f194a3c51686d56dbd20.

it causes the following assertion failure:
llvm/include/llvm/CodeGen/MachineFrameInfo.h:530: int64_t
llvm::MachineFrameInfo::getObjectOffset(int) const: Assertion
`!isDeadObjectIndex(ObjectIdx) && "Getting frame offset for a dead
object?"' failed.
2023-12-15 13:32:39 -08:00