llvm-project

Author	SHA1	Message	Date
Stephen Peckham	ac5d5351d4	Use empty symbol name for XCOFF text csect When generating XCOFF, the compiler generates a csect with an internal name. Each function results in a label within the csect. This patch replaces the internal name ".text" with an empty string "". This avoids adding special code to handle a function text() in the source file, and works better with some XCOFF tools that are confused when the csect and the first function have the same address. Reviewed By: hubert.reinterpretcast Differential Revision: https://reviews.llvm.org/D154854	2023-07-15 16:13:48 -04:00
Kamau Bridgeman	62c1cf7c63	[PowerPC][Future] Enable __builtin_mma_xxm[t\|f]acc Future cpu instructions dmxxinstdmr512 and dmxxextfdmr512 insert and extract quad vectors from the new wide accumulator (wacc) register class. The introduction of these new instructions renders the p10 instructions xxmtacc and xxmfacc obsolete since the new wacc register class is a better choice for handling quad vector operations. This patch ensures that, for future cpu, instructions dmxxinstdmr512 and dmxxextfdmr512 are generated by custom lowering the intrinsics for xxm[t\|f]acc to produce no instructions. Reviewed By: amyk, lei Differential Revision: https://reviews.llvm.org/D153034	2023-07-14 13:38:40 -05:00
Sean Fertile	5e28d30f1f	[XCOFF][AIX] Peephole optimization for toc-data. Followup to D101178 - peephole optimization that converts a load address instruction and a consuming load/store into just the load/store when it's safe to do so. eg: converts the 2 instruction code sequence la 4, i[TD](2) stw 3, 0(4) to stw 3, i[TD](2) Differential Revision: https://reviews.llvm.org/D101470	2023-07-13 20:40:09 -04:00
Nemanja Ivanovic	329b8cd3e3	[PowerPC] Improve code gen for vector add Improve codegen for vectors modulo additions. Reviewed By: nemanjai Differential Revision: https://reviews.llvm.org/D154447	2023-07-13 15:21:49 -04:00
Nikita Popov	edb2fc6dab	[llvm] Remove explicit -opaque-pointers flag from tests (NFC) Opaque pointers mode is enabled by default, no need to explicitly enable it.	2023-07-12 14:35:55 +02:00
Jake Egan	bbd0d123d3	Implement -frecord-command-line for XCOFF This patch extends support of the option `-frecord-command-line` to XCOFF. XCOFF doesn’t have custom sections like ELF, so the command line data is emitted to a .info section instead. A C_INFO symbol is generated with the .info section to preserve the command line data past the link step. Multiple command lines are separated by newlines and null bytes. The command line data can be retrieved on AIX with command `what file_name`. Reviewed By: scott.linder Differential Revision: https://reviews.llvm.org/D153600	2023-07-10 12:47:07 -04:00
Matt Arsenault	310f839612	DAG: Lower is.fpclass fcInf to fcmp of fabs InstCombine should have taken care of this, but I think this is more useful in the future when the expansion tries to handle multiple cases at a time with fcmp. x87 looks worse to me but the only thing I know about it is that I aggressively do not care about it. https://reviews.llvm.org/D143198	2023-07-07 17:00:10 -04:00
Nemanja Ivanovic	b0e249d5e2	Reland "[PowerPC] Remove extend between shift and and" The commit originally caused a bootstrap failure on the big endian PPC bot as the combine was interfering with the legalizer when applied on illegal types. This update restricts the combine to the only types for which it is actually needed. Tested on PPC BE bootstrap locally.	2023-07-07 14:45:05 -04:00
Qiu Chaofan	a2b5117df7	[PowerPC] Update InputOps of Power10 SchedModel The count of input operands affects pipeline forwarding in the scheduling model. The previous Power10 model definition arranges some instructions into incorrect groups by counting the wrong number of input operands. This patch updates the model, setting the input operand count correctly by excluding irrelevant immediate operands and counting memory operands of load instructions correctly. Reviewed By: shchenz Differential Revision: https://reviews.llvm.org/D153842	2023-07-07 22:46:22 +08:00
zhijian	d6d7f7b1d2	[AIX][XCOFF] print out the traceback info Summary: Adding a new option -traceback-table to print out the traceback info of an XCOFF object file. Reviewers: James Henderson, Fangrui Song, Stephen Peckham, Xing Xue Differential Revision: https://reviews.llvm.org/D89049	2023-07-06 11:47:08 -04:00
Amy Kwan	598cccea80	[AIX][TLS] Generate optimized local-exec access code sequence using X-Form loads/stores This patch is a follow up to D149722, D152669 and D153645, where a slightly more optimized code sequence is generated for 64-bit and 32-bit local-exec accesses when optimizations are turned on. Handling is added to PPCISelDAGToDAG.cpp in order to check if any D-form loads or stores that follow a PPCISD::ADD_TLS can be optimized to use an X-Form load or store. In this particular situation, this allows the ADD_TLS node to be removed completely. Differential Revision: https://reviews.llvm.org/D150367	2023-07-06 07:57:05 -05:00
Nemanja Ivanovic	7cd9084c69	Revert "[PowerPC] Remove extend between shift and and" This reverts commit a57236de4eb8f38b4201647b10146941cbbb5c0b. Causes a bootstrap failure on ppc64be.	2023-07-05 20:04:49 -04:00
Nemanja Ivanovic	a57236de4e	[PowerPC] Remove extend between shift and and The SDAG will sometimes insert an extend between the shift and an and (immediate) even though the immediate is narrower than the narrow size. This does not allow us to produce a rotate instruction (such as rlwinm). This patch just adds a combine to move the extend onto the and. Differential revision: https://reviews.llvm.org/D152911	2023-07-05 16:33:07 -04:00
esmeyi	2d74cf1f24	[XCOFF] Force recording a relocation for weak symbol label. Summary: Currently, if there are multiple definitions of the same symbol declared with weak linkage, the linker may choose the wrong one when they are compiled with integrated-as. This patch fixes the issue. If the target symbol is a weak label we must not attempt to resolve the fixup directly. Emit a relocation and leave resolution of the final target address to the linker. Reviewed By: shchenz Differential Revision: https://reviews.llvm.org/D153839	2023-07-05 01:58:18 -04:00
Lei Huang	c7c3d71414	[PowerPC] add testcase for vector add and shift	2023-07-04 10:45:19 -04:00
Ting Wang	0b955fee90	[PowerPC][NFC] add SADDO/SSUBO test case Differential Revision: https://reviews.llvm.org/D152339 Reviewed By: qiucf	2023-06-29 20:35:59 -04:00
Ting Wang	919588fd10	[PowerPC][NFC] expose issue on absol-jump-table-enabled.ll (relocation-model=pic + ppc-use-absolute-jumptables) Differential Revision: https://reviews.llvm.org/D154047	2023-06-29 20:32:15 -04:00
Matt Arsenault	003b58f65b	IR: Add llvm.frexp intrinsic Add an intrinsic which returns the two pieces as multiple return values. Alternatively could introduce a pair of intrinsics to separately return the fractional and exponent parts. AMDGPU has native instructions to return the two halves, but could use some generic legalization and optimization handling. For example, we should be able to handle legalization of f16 on older targets, and for bf16. Additionally antique targets need a hardware workaround which would be better handled in the backend rather than in library code where it is now.	2023-06-28 14:50:16 -04:00
Amy Kwan	11b71ade51	[PowerPC][TLS] Add additional TLS X-Form loads/store instructions This patch is a follow up to D43315, and adds the following new load/store TLS specific instructions for integer and floating point scalar types: ``` LHAXTLS LWAXTLS LHAXTLS_32 LWAXTLS_32 LFSXTLS LFDXTLS STFSXTLS STFDXTLS ``` These instructions can be used to optimize TLS sequences when D-Form loads/stores follow an ADD_TLS instruction. Duplicate versions of these instructions are also added within an isAsmParserOnly=1 block (similar to D47382) to allow llvm-mc to assemble these instructions. Differential Revision: https://reviews.llvm.org/D153645	2023-06-27 11:33:38 -05:00
Matthias Braun	02ba5b8c6b	Ignore load/store until stack address computation No longer conservatively assume a load/store accesses the stack when we can prove that we did not compute any stack-relative address up to this point in the program. We do this in a cheap not-quite-a-dataflow-analysis: Assume `NoStackAddressUsed` when all predecessors of a block already guarantee it. Process blocks in reverse post order to guarantee that except for loop headers we have processed all predecessors of a block before processing the block itself. For loops we accept the conservative answer as they are unlikely to be shrink-wrappable anyway. Differential Revision: https://reviews.llvm.org/D152213	2023-06-26 13:50:36 -07:00
Matthias Braun	759b217626	Switch tests to use update_llc_test_checks Switch and update some tests to use `update_llc_test_checks` to reduce clutter in upcoming change. Differential Revision: https://reviews.llvm.org/D152215	2023-06-26 13:50:36 -07:00
Matt Arsenault	f2596b754c	SeparateConstOffsetFromGEP: Don't use SCEV This was only using the SCEV expressions as a map key, which we can do just as well with the value pointers. This also allows it to handle vectors.	2023-06-26 13:58:06 -04:00
Amaury Séchet	632a8aca07	[NFC] Autogenerate CodeGen/PowerPC/tail-dup-break-cfg.ll	2023-06-25 22:55:49 +00:00
Amaury Séchet	e345b9ca7a	[NFC] Autogenerate CodeGen/PowerPC/pr40922.ll	2023-06-25 21:05:06 +00:00
Amaury Séchet	93af6bdcaf	[NFC] Autogenerate CodeGen/PowerPC/select-i1-vs-i1.ll	2023-06-25 01:27:29 +00:00
Matt Arsenault	80e2c26dfd	RegisterCoalescer: Fix name of pass I finally snapped and fixed this inconsistency.	2023-06-21 10:30:43 -04:00
Kishan Parmar	c42f0a6e64	PowerPC/SPE: Add phony registers for high halves of SPE SuperRegs The intent of this patch is to make upper halves of SPE SuperRegs(s0,..,s31) as artificial regs, similar to how X86 has done it. And emit store /reload instructions for the required halves. PR : https://github.com/llvm/llvm-project/issues/57307 Reviewed By: jhibbits Differential Revision: https://reviews.llvm.org/D152437	2023-06-21 10:24:40 +00:00
tianleli	1c27275813	[DAG] Unroll and expand illegal result of LDEXP and POWI instead of widen. Reviewed By: pengfei Differential Revision: https://reviews.llvm.org/D153104	2023-06-21 14:27:39 +08:00
Fangrui Song	e0a6561ec9	[XRay] Make xray_fn_idx entries PC-relative As mentioned by commit c5d38924dc6688c15b3fa133abeb3626e8f0767c (Apr 2020), PC-relative entries avoid dynamic relocations and can therefore make the section read-only. This is similar to D78082 and D78590. We cannot commit to support compiler/runtime built at different versions, so just don't play with versions. For Mach-O support (incomplete yet), we use non-temporary `lxray_fn_idx[0-9]+` symbols. Label differences are represented as a pair of UNSIGNED and SUBTRACTOR relocations. The SUBTRACTOR external relocation requires r_extern==1 (needs to reference a symbol table entry) which can be satisfied by `lxray_fn_idx[0-9]+`. A `lxray_fn_idx[0-9]+` symbol also serves as the atom for this dead-strippable section (follow-up to commit b9a134aa629de23a1dcf4be32e946e4e308fc64d). Differential Revision: https://reviews.llvm.org/D152661	2023-06-20 22:40:56 -07:00
Amy Kwan	f5ae075048	[AIX][TLS] Generate 32-bit local-exec access code sequence This patch adds support for the TLS local-exec access model on AIX to allow for the ability to generate the 32-bit (specifically, non-optimized) code sequence. This work is a follow up of D149722. The particular code sequence that is generated is as follows: ``` .tc var[TC],var[TL]@le. // variable offset, with the le relocation specifier bla .__get_tpointer() // get the thread pointer, modifies r3 lwz reg1, var[TC](2) // load the variable offset add reg2, r3, reg1 // add the variable offset to the retrieved thread pointer ``` Differential Revision: https://reviews.llvm.org/D152669	2023-06-20 11:57:38 -05:00
Simon Pilgrim	ff23856c1c	[DAG] Fold (abds x, y) -> (abdu x, y) iff both args are known positive This is a generic DAG combine version of D151055 which recognizes when a signed ABDS can be safely replaced with an unsigned ABDU instruction if it is legal. Alive2: https://alive2.llvm.org/ce/z/pb5BjG Differential Revision: https://reviews.llvm.org/D153328	2023-06-20 15:31:22 +01:00
Amy Kwan	d5659808b2	[AIX][TLS] Generate 64-bit local-exec access code sequence This patch adds support for the TLS local-exec access model on AIX to allow for the ability to generate the 64-bit (specifically, non-optimized) code sequence. For this patch in particular, the sequence that is generated involves a load of the variable offset, followed by an add of the loaded variable offset to r13 (which is the thread pointer). This code sequence looks like the following: ``` ld reg1,var[TC](2) add reg2, reg1, r13 // r13 contains the thread pointer ``` The TOC (.tc pseudo-op) entries generated in the assembly files are also changed where we add the @le relocation for the variable offset. Differential Revision: https://reviews.llvm.org/D149722	2023-06-19 12:17:30 -05:00
Fangrui Song	49b61ead47	[XRay][test] Make tests less sensitive to .Ltmp/Ltmp label changes	2023-06-18 13:32:40 -07:00
esmeyi	028a261350	[XCOFF] FixupOffsetInCsect should be 0 for R_REF relocation. Summary: The FixupOffsetInCsect should be 0 for R_REF relocation since it specifies a nonrelocating reference. Otherwise the linker would try to relocate the symbol through its address, and an error like the following would occur. ``` ld: 0711-547 SEVERE ERROR: Object /tmp/1-2a7ea1.o cannot be processed. RLD address 0x65 for section 2 (.data) is not contained in the section. ``` Reviewed By: shchenz Differential Revision: https://reviews.llvm.org/D152777	2023-06-15 01:28:45 -04:00
Amaury Séchet	a70d5e25f3	[DAGCombine] Make sure combined nodes are added back to the worklist in topological order. Currently, a node and its users are added back to the worklist in reverse topological order after it is combined. This diff changes that order to be topological. This is part of a larger migration to get the DAGCombiner to process nodes in topological order. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D127115	2023-06-13 09:14:37 +00:00
Matt Arsenault	eece6ba283	IR: Add llvm.ldexp and llvm.experimental.constrained.ldexp intrinsics AMDGPU has native instructions and target intrinsics for this, but these really should be subject to legalization and generic optimizations. This will enable legalization of f16->f32 on targets without f16 support. Implement a somewhat horrible inline expansion for targets without libcall support. This could be better if we could introduce control flow (GlobalISel version not yet implemented). Support for strictfp legalization is less complete but works for the simple cases.	2023-06-06 17:07:18 -04:00
JP Lehr	c9998ec145	Revert "[DAGCombine] Make sure combined nodes are added back to the worklist in topological order." This reverts commit e69fa03ddd85812be3143d79a0359c3e8d43bd45. This patch led to build timeouts on the AMDGPU OpenMP runtime buildbot.	2023-06-05 10:55:58 -04:00
Amaury Séchet	e69fa03ddd	[DAGCombine] Make sure combined nodes are added back to the worklist in topological order. Currently, a node and its users are added back to the worklist in reverse topological order after it is combined. This diff changes that order to be topological. This is part of a larger migration to get the DAGCombiner to process nodes in topological order. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D127115	2023-06-05 11:09:18 +00:00
Qiu Chaofan	9e17e08324	[PowerPC] Combine fptoint-store under strict cases Reviewed By: shchenz Differential Revision: https://reviews.llvm.org/D141249	2023-06-05 16:24:02 +08:00
esmeyi	6f57d8df2d	Revert "[XCOFF][DWARF] XCOFF64 should be able to select the dwarf format in intergrated-as mode." This reverts commit 4054c68644dfebbb584bca698a25d18d1d312bae, because the AIX system linker requires DWARF64 for XCOFF64.	2023-06-05 02:50:47 -04:00
Qiu Chaofan	69bc8ff766	Reland "[PowerPC] Simplify fp-to-int store optimization" The build failure should be fixed by de681d53. Follow-up refactor will be done in future patches. This reverts commit e7c5ced0b9f0551ea17e1d2b48be86f03a772c59.	2023-06-05 13:53:08 +08:00
sgokhale	c4a60c9d34	[CodeGen][ShrinkWrap] Enable PostShrinkWrap by default This is an attempt to reland D42600 and enable this optimisation by default. This also resolves the issue pointed out in the context of PGO build. Differential Revision: https://reviews.llvm.org/D42600	2023-05-25 13:56:29 +05:30
Vitaly Buka	e7c5ced0b9	Revert "[PowerPC] Simplify fp-to-int store optimization" Breaks https://lab.llvm.org/buildbot/#/builders/18/builds/9118 This reverts commit 8064caf83fb166b709bfe0e7641c5181341cb064.	2023-05-24 10:05:28 -07:00
Nemanja Ivanovic	de681d53ba	[PowerPC] Do not attempt to combine fptoui without FPCVT Commit 8064caf83fb166b709bfe0e7641c5181341cb064 added a call to a function that performs this combine without checking whether the target supports FPCVT. This caused asserts to trip on BE bots as the default target does not have this feature.	2023-05-24 11:14:26 -05:00
Fangrui Song	e018cbf720	[IR] Make stack protector symbol dso_local according to -f[no-]direct-access-external-data There are two motivations. `-fno-pic -fstack-protector -mstack-protector-guard=global` created `__stack_chk_guard` is referenced directly on all ELF OSes except FreeBSD. This patch allows referencing the symbol indirectly with -fno-direct-access-external-data. Some Linux kernel folks want `-fno-pic -fstack-protector -mstack-protector-guard-reg=gs -mstack-protector-guard-symbol=__stack_chk_guard` created `__stack_chk_guard` to be referenced directly, avoiding R_X86_64_REX_GOTPCRELX (even if the relocation may be optimized out by the linker). https://github.com/llvm/llvm-project/issues/60116 Why they need this isn't so clear to me. --- Add module flag "direct-access-external-data" and set the dso_local property of the stack protector symbol. The module flag can benefit other LLVMCodeGen synthesized symbols that are not represented in LLVM IR. Nowadays, with `-fno-pic` being uncommon, ideally we should set "direct-access-external-data" when it is true. However, doing so would require ~90 clang/test tests to be updated, which are too much. As a compromise, we set "direct-access-external-data" only when it's different from the implied default value. Reviewed By: nickdesaulniers Differential Revision: https://reviews.llvm.org/D150841	2023-05-23 09:49:57 -07:00
Qiu Chaofan	8064caf83f	[PowerPC] Simplify fp-to-int store optimization On PowerPC VSX targets, fp-to-int will be transformed into xscv with mfvsr. When the result is to be stored, mfvsr can be replaced by a direct store. This change simplifies the optimization by using existing fp-to-int code, which helps CSE and handling strictfp cases. Reviewed By: shchenz Differential Revision: https://reviews.llvm.org/D141473	2023-05-23 16:40:54 +08:00
Kai Luo	330319557f	[PowerPC] Precommit test for D151055. NFC.	2023-05-22 12:14:22 +08:00
Tobias Hieta	f84bac329b	[NFC][Py Reformat] Reformat lit.local.cfg python files in llvm This is a follow-up to b71edfaa4ec3c998aadb35255ce2f60bba2940b0 since I forgot the lit.local.cfg files in that one. Reformatting is done with `black`. If you end up having problems merging this commit because you have made changes to a python file, the best way to handle that is to run git checkout --ours <yourfile> and then reformat it with black. If you run into any problems, post to discourse about it and we will try to help. RFC Thread below: https://discourse.llvm.org/t/rfc-document-and-standardize-python-code-style Reviewed By: barannikov88, kwk Differential Revision: https://reviews.llvm.org/D150762	2023-05-17 17:03:15 +02:00
esmeyi	4054c68644	[XCOFF][DWARF] XCOFF64 should be able to select the dwarf format in intergrated-as mode. Summary: DWARF32 is not supported for XCOFF64 under non-integrated-as mode on AIX, because the system assembler will fill the debug section lengths according to the DWARF64 format. In integrated-as mode, however, XCOFF64 should be able to select the DWARF format. Reviewed By: shchenz Differential Revision: https://reviews.llvm.org/D150181	2023-05-16 03:02:00 -04:00
Florian Hahn	5d57a9fd2b	[PowerPC] Adjust tests after e351b9b66da088. Those tests were missed when landing e351b9b66da088.	2023-05-12 20:20:13 +01:00


3653 Commits