llvm-project

Author	SHA1	Message	Date
Qiu Chaofan	a4558a4a53	[PowerPC] Implement 32-bit expansion for rldimi (#86783 ) rldimi is a 64-bit instruction, so for backward compatibility it needs to be expanded into a series of rotates and masks in 32-bit environments. In the future, we may improve the bit permutation selector and remove such direct codegen.	2024-04-09 16:43:49 +08:00
Qiu Chaofan	71eda17a06	[Legalizer] Soften EXTRACT_ELEMENT on ppcf128 (#77412 ) ppc_fp128 values are always split into two f64. Implement soften operation in soft-float mode to handle output f64 correctly.	2024-04-09 10:26:24 +08:00
Chen Zheng	29c7d1a60c	[PPC] [NFC] add testcase for more store forwarding	2024-04-03 04:46:29 -04:00
Ryotaro KASUGA	ea4a11926b	Reapply "[CodeGen] Fix register pressure computation in MachinePipeliner (#87030)" (#87312 ) Fix broken test. This reverts commit b8ead2198f27924f91b90b6c104c1234ccc8972e.	2024-04-03 09:28:09 +09:00
Gulfem Savrun Yeniceri	b8ead2198f	Revert "[CodeGen] Fix register pressure computation in MachinePipeliner (#87030 )" This reverts commit a4dec9d6bc67c4d8fbd4a4f54ffaa0399def9627 because the test failed in the following builder: https://luci-milo.appspot.com/ui/p/fuchsia/builders/prod/clang-linux-x64/b8751864477467126481/overview	2024-04-01 18:27:41 +00:00
Ryotaro KASUGA	a4dec9d6bc	[CodeGen] Fix register pressure computation in MachinePipeliner (#87030 ) `RegisterClassInfo::getRegPressureSetLimit` has been changed to return a smaller value than before so the limit may become negative in later calculations. As a workaround, change to use `TargetRegisterInfo::getRegPressureSetLimit`. Also improve tests.	2024-04-01 17:04:44 +09:00
Craig Topper	23d45e55ed	[MCP] Remove dead copies from basic blocks with successors. (#86973 ) Previously we wouldn't remove dead copies from basic blocks with successors. The comment said we didn't want to trust the live-in lists. The comment is very old so I'm not sure if that's still a concern today. This patch checks the live-in lists and removes copies from MaybeDeadCopies if they are referenced by any live-ins in any successors. We only do this if the tracksLiveness property is set. If that property is not set, we retain the old behavior.	2024-03-28 14:43:49 -07:00
Zaara Syeda	6582509daa	[AIX] Handle toc-data offset overflowing 16-bits (#80092 ) When the toc-data offset overflows the 16-bits, we can truncate the value to the 16-bit value as the linker will handle overflow through fixup code.	2024-03-28 13:55:13 -04:00
Amy Kwan	a3efc53f16	[AIX][TLS] Produce a faster local-exec access sequence for the "aix-small-tls" global variable attribute (#83053 ) Similar to 3f46e5453d9310b15d974e876f6132e3cf50c4b1, this patch allows the backend to produce a faster access sequence for the local-exec TLS model, where loading from the TOC can be avoided, for local-exec TLS variables that are annotated with the "aix-small-tls" attribute. The expectation is for local-exec TLS variables to be set with this attribute through PGO. Furthermore, the optimized access sequence is only generated for local-exec TLS variables annotated with "aix-small-tls", only if they are less than ~32KB in size.	2024-03-28 09:18:45 -04:00
Simon Pilgrim	78f0871bee	Revert rG58de1e2c5eee548a9b365e3b1554d87317072ad9 "Fix stack layout for frames larger than 2gb (#84114 )" This is failing on some EXPENSIVE_CHECKS buildbots	2024-03-27 16:16:15 +00:00
Wesley Wiser	58de1e2c5e	Fix stack layout for frames larger than 2gb (#84114 ) For very large stack frames, the offset from the stack pointer to a local can be more than 2^31 which overflows various `int` offsets in the frame lowering code. This patch updates the frame lowering code to calculate the offsets as 64-bit values and resolves the overflows, resulting in the correct codegen for very large frames. Fixes #48911	2024-03-27 15:05:58 +00:00
Felix (Ting Wang)	90a7fc366a	[PowerPC][NFC] Add base test case for small-local-dynamic-tls on AIX (#84711 )	2024-03-24 08:46:45 +08:00
Chen Zheng	90454a6098	[PowerPC][AIX] support explicit sections for -ffunction-sections (#85351 ) Fix crashes in https://godbolt.org/z/6voEa1o6Y	2024-03-22 13:23:36 +08:00
Qiu Chaofan	e5b20c83e5	[PowerPC] Update chain uses when emitting lxsizx (#84892 )	2024-03-18 22:31:05 +08:00
Yingwei Zheng	38a44bdc93	[CodeGenPrepare] Reverse the canonicalization of isInf/isNanOrInf (#81572 ) In commit `2b582440c1`, we canonicalize the isInf/isNanOrInf idiom into fabs+fcmp for better analysis/codegen (See also the discussion in https://github.com/llvm/llvm-project/pull/76338). This patch reverses the fabs+fcmp to `is.fpclass`. If the `is.fpclass` is not supported by the target, it will be expanded by TLI. Fixes the regression introduced by `2b582440c1` and https://github.com/llvm/llvm-project/pull/80414#issuecomment-1936374206.	2024-03-18 18:27:45 +08:00
Qiu Chaofan	65ae09eeb6	[PowerPC] Fix behavior of rldimi/rlwimi/rlwnm builtins (#85040 ) rldimi is a 64-bit instruction, so the corresponding builtin should not be available in 32-bit mode. The rotate amount should be in range, and cases where the mask is zero need special handling. This change also swaps the first and second operands of rldimi/rlwimi to match previous behavior. For masks not ending at bit 63-SH, a rotation will be inserted before rldimi.	2024-03-18 14:17:16 +08:00
Sean Fertile	2d80505401	[AIX] Support per global code model. (#79202 ) Exploit the per global code model attribute on AIX. On AIX we need to update both the code sequence used to access the global (either 1 or 2 instructions for small and large code model respectively) and the storage mapping class that we emit the toc entry. --------- Co-authored-by: Amy Kwan <akwan0907@gmail.com>	2024-03-15 12:52:04 -04:00
Kevin P. Neal	ea628f087e	[FPEnv][PowerPC] Correct strictfp test. Correct llvm-reduce strictfp test to follow the rules documented in the LangRef: https://llvm.org/docs/LangRef.html#constrained-floating-point-intrinsics This test needed the strictfp attribute added to function definitions. Test changes verified with D146845.	2024-03-15 12:08:09 -04:00
Zaara Syeda	cc761a7c35	[PowerPC][NFC] Rename ADDItocL to match the 64-bit naming convention (#85099 ) In preparation for adding a similar instruction for large code model on AIX for 32-bit, rename the existing ADDItocL 64-bit instruction to ADDItocL8 to match the naming convention of other instructions with 32-bit and 64-bit variants.	2024-03-13 11:57:07 -04:00
Zaara Syeda	37b5eb0a0a	[AIX][TOC] Add -mtocdata/-mno-tocdata options on AIX (#67999 ) This patch enables support that the XL compiler had for AIX under -qdatalocal/-qdataimported.	2024-03-13 10:26:31 -04:00
Chen Zheng	cc34e56b86	[PPC][NFC] add an option to expose the bug in 74951	2024-03-07 20:52:44 -05:00
Chen Zheng	e7a22e72de	[PPC] precommit cases for issue 74915	2024-03-07 20:22:26 -05:00
Sameer Sahasrabuddhe	60822637bf	Restore "Implement convergence control in MIR using SelectionDAG (#71785 )" This restores commit c7fdd8c11e54585dc9d15d63de9742067e0506b9. Previously reverted in f010b1bef4dda2c7082cbb41dbabf1f149cce306. LLVM function calls carry convergence control tokens as operand bundles, where the tokens themselves are produced by convergence control intrinsics. This patch implements convergence control tokens in MIR as follows: 1. Introduce target-independent ISD opcodes and MIR opcodes for convergence control intrinsics. 2. Model token values as untyped virtual registers in MIR. The change also introduces an additional ISD opcode CONVERGENCECTRL_GLUE and a corresponding machine opcode with the same spelling. This glues the convergence control token to SDNodes that represent calls to intrinsics. The glued token is later translated to an implicit argument in the MIR. The lowering of calls to user-defined functions is target-specific. On AMDGPU, the convergence control operand bundle at a non-intrinsic call is translated to an explicit argument to the SI_CALL_ISEL instruction. Post-selection adjustment converts this explicit argument to an implicit argument on the SI_CALL instruction.	2024-03-06 12:19:32 +05:30
Felix (Ting Wang)	ed6275868b	[PowerPC][NFC] Update aix-tls-xcoff-reloc.ll (#83764 ) Update test case changed by #66316	2024-03-05 14:07:47 +08:00
Mitch Phillips	f010b1bef4	Revert "Restore "Implement convergence control in MIR using SelectionDAG (#71785 )"" This reverts commit c7fdd8c11e54585dc9d15d63de9742067e0506b9. Reason: Broke the sanitizer buildbots. See the comments at https://github.com/llvm/llvm-project/pull/71785 for more information.	2024-03-04 17:05:34 +01:00
Qiu Chaofan	906580bad3	[PowerPC] Add intrinsics for rldimi/rlwimi/rlwnm (#82968 ) These builtins are already there in Clang, however current codegen may produce suboptimal results due to their complex behavior. Implement them as intrinsics to ensure expected instructions are emitted.	2024-03-04 21:13:59 +08:00
Sameer Sahasrabuddhe	c7fdd8c11e	Restore "Implement convergence control in MIR using SelectionDAG (#71785 )" Original commit 79889734b940356ab3381423c93ae06f22e772c9. Previously reverted in commit a2afcd5721869d1d03c8146bae3885b3385ba15e. LLVM function calls carry convergence control tokens as operand bundles, where the tokens themselves are produced by convergence control intrinsics. This patch implements convergence control tokens in MIR as follows: 1. Introduce target-independent ISD opcodes and MIR opcodes for convergence control intrinsics. 2. Model token values as untyped virtual registers in MIR. The change also introduces an additional ISD opcode CONVERGENCECTRL_GLUE and a corresponding machine opcode with the same spelling. This glues the convergence control token to SDNodes that represent calls to intrinsics. The glued token is later translated to an implicit argument in the MIR. The lowering of calls to user-defined functions is target-specific. On AMDGPU, the convergence control operand bundle at a non-intrinsic call is translated to an explicit argument to the SI_CALL_ISEL instruction. Post-selection adjustment converts this explicit argument to an implicit argument on the SI_CALL instruction.	2024-03-04 13:28:04 +05:30
George Koehler	6b70c5d79f	[PowerPC] provide CFI for ELF32 to unwind cr2, cr3, cr4 (#83098 ) Delete the code that skips the CFI for the condition register on ELF32. The code checked !MustSaveCR, which happened only when Subtarget.is32BitELFABI(), where spillCalleeSavedRegisters is spilling cr in a different way. The spill was missing CFI. After deleting this code, a spill of cr2 to cr4 gets CFI in the same way as a spill of r14 to r31. Fixes #83094	2024-03-02 22:18:24 -05:00
Felix (Ting Wang)	5b05870953	[PowerPC] Support local-dynamic TLS relocation on AIX (#66316 ) Supports TLS local-dynamic on AIX, generates below sequence of code: ``` .tc foo[TC],foo[TL]@ld # Variable offset, ld relocation specifier .tc mh[TC],mh[TC]@ml # Module handle for the caller lwz 3,mh[TC](2) $$ For 64-bit: ld 3,mh[TC](2) bla .__tls_get_mod # Modifies r0,r3,r4,r5,r11,lr,cr0 #r3 = &TLS for module lwz 4,foo[TC](2) $$ For 64-bit: ld 4,foo[TC](2) add 5,3,4 # Compute &foo .rename mh[TC], "\_$TLSML" # Symbol for the module handle must have the name "_$TLSML" ``` --------- Co-authored-by: tingwang <tingwang@tingwangs-MBP.lan> Co-authored-by: tingwang <tingwang@tingwangs-MacBook-Pro.local>	2024-03-01 08:09:40 +08:00
Kai Luo	d1924f0474	[PowerPC] Do not generate `isel` instruction if target doesn't have this instruction (#72845 ) When expanding `select_cc` in finalize-isel, we should not generate `isel` for targets that do not feature it.	2024-03-01 08:03:06 +08:00
Chen Zheng	3196005f6b	[NFC][PowerPC] use script to regenerate the CHECK lines	2024-02-29 04:49:37 -05:00
Jack Styles	28233408a2	[CodeGen] [ARM] Make RISC-V Init Undef Pass Target Independent and add support for the ARM Architecture. (#77770 ) When using Greedy Register Allocation, there are times where early-clobber values are ignored and assigned the same register. This is illegal behaviour for these instructions. To get around this, using Pseudo instructions for early-clobber registers gives them a definition and allows Greedy to assign them to a different register. This then meets the ARM Architecture Reference Manual and matches the defined behaviour. This patch takes the existing RISC-V patch and makes it target independent, then adds support for the ARM Architecture. Doing this will ensure early-clobber constraints are followed when using the ARM Architecture. Making the pass target independent will also open up the possibility that support for other architectures can be added in the future.	2024-02-26 12:12:31 +00:00
Sameer Sahasrabuddhe	a2afcd5721	Revert "Implement convergence control in MIR using SelectionDAG (#71785 )" This reverts commit 79889734b940356ab3381423c93ae06f22e772c9. Encountered multiple buildbot failures.	2024-02-21 11:07:02 +05:30
Sameer Sahasrabuddhe	79889734b9	Implement convergence control in MIR using SelectionDAG (#71785 ) LLVM function calls carry convergence control tokens as operand bundles, where the tokens themselves are produced by convergence control intrinsics. This patch implements convergence control tokens in MIR as follows: 1. Introduce target-independent ISD opcodes and MIR opcodes for convergence control intrinsics. 2. Model token values as untyped virtual registers in MIR. The change also introduces an additional ISD opcode CONVERGENCECTRL_GLUE and a corresponding machine opcode with the same spelling. This glues the convergence control token to SDNodes that represent calls to intrinsics. The glued token is later translated to an implicit argument in the MIR. The lowering of calls to user-defined functions is target-specific. On AMDGPU, the convergence control operand bundle at a non-intrinsic call is translated to an explicit argument to the SI_CALL_ISEL instruction. Post-selection adjustment converts this explicit argument to an implicit argument on the SI_CALL instruction.	2024-02-21 10:06:37 +05:30
stephenpeckham	26db845536	[XCOFF] Support the subtype flag in DWARF section headers (#81667 ) The section headers for XCOFF files have a subtype flag for Dwarf sections. This PR updates obj2yaml, yaml2obj, and llvm-readobj so that they recognize the subtype.	2024-02-20 08:42:12 -06:00
Jeffrey Byrnes	7180c23cf6	[SeparateConstOffsetFromGEP] Reland: Reorder trivial GEP chains to separate constants (#81671 ) Actually update tests w.r.t `9e5a77f252` and reland https://github.com/llvm/llvm-project/pull/73056	2024-02-13 17:10:23 -08:00
Philip Reames	99c5a66c62	Revert "[SeparateConstOffsetFromGEP] Reorder trivial GEP chains to separate constants (#73056 )" and follow ups "ninja check-llvm" is failing on tip of tree. This reverts commit ec0aa1646e9953d1a8d0d15dc381d3250c854572. This reverts commit 1b65742f8c71f576381fe85d5e34579b24f2d874.	2024-02-13 13:29:23 -08:00
Jeffrey Byrnes	1b65742f8c	[SeparateConstOffsetFromGEP] Reorder trivial GEP chains to separate constants (#73056 ) In this case, a trivial GEP chain has the form: ``` %ptr = getelementptr sameType, %base, constant %val = getelementptr sameType, %ptr, %variable ``` That is, a one-index GEP consumes another (of the same basis and result type) one-index GEP, where the inner GEP uses a constant index and the outer GEP uses a variable index. For chains of this type, it is trivial to reorder them (by simply swapping the indexes). The result of doing so is better AddrMode matching for users of the ultimate ptr produced by GEP chain. Future patches can extend this to support non-trivial GEP chains (e.g. those with different basis types and/or multiple indices).	2024-02-13 11:22:49 -08:00
stephenpeckham	90e8dc0f7c	Fix failing testcases (#80902 )	2024-02-06 15:35:21 -05:00
stephenpeckham	b1acb7a315	[XCOFF] Add compiler version to an auxiliary symbol table entry (#80162 ) C_FILE symbols. To match the behavior of the assembler and the legacy compiler, this includes using the generic ".file" name for the C_FILE symbol and generating the actual file name in an auxiliary entry.	2024-02-06 09:08:18 -06:00
Qiu Chaofan	292d9e869f	[PowerPC] Mask constant operands in ValueBit tracking (#67653 ) In IR or C code, shift amount larger than value size is undefined behavior. But in practice, backend lowering for shift_parts produces add/sub of shift amounts, thus constant shift amounts might be negative or larger than value size, which depends on ISA definition. PowerPC ISA says, the lowest 7 bits (6 bits for 32-bit instruction) will be taken, and if the highest among them is 1, result will be zero, otherwise the low 6 bits (or 5 on 32-bit) are used as shift amount. This commit emulates the behavior and avoids array overflow in bit permutation's value bits calculator.	2024-02-06 18:37:31 +08:00
Nikita Popov	ff9af4c43a	[CodeGen] Convert tests to opaque pointers (NFC)	2024-02-05 14:07:09 +01:00
Amy Kwan	2a50921553	[AIX][TLS] Optimize the small local-exec access sequence for non-zero offsets (#71485 ) This patch utilizes the -maix-small-local-exec-tls option to produce a faster, non-TOC-based access sequence for the local-exec TLS model. Specifically, for when the offsets from the TLS variable are non-zero. In particular, this patch produces either a single: - addi/la with a displacement off of R13 plus a non-zero offset for when an address is calculated, or - load or store off of R13 plus a non-zero offset for when an address is calculated and used for further access where R13 is the thread pointer, respectively. In order to produce a single addi or load/store off of the thread pointer with a non-zero offset, this patch also adds the necessary support in the assembly printer when printing these instructions. Specifically: - The non-zero offset is added to the TLS variable address when the address of the TLS variable + its offset is less than 32KB. - Otherwise, when the address of the TLS variable + its offset is greater than 32KB, the non-zero offset (and a multiple of 64KB) is subtracted from the TLS address. This handling in the assembly printer is necessary to ensure that the TLS address + the non-zero offset is between [-32768, 32768), so that the total displacement can fit within the addi/load/store instructions. This patch is meant to be a follow-up to 3f46e5453d9310b15d974e876f6132e3cf50c4b1 (where the optimization occurs for when the offset is zero).	2024-02-01 09:29:21 -05:00
Quentin Dian	112fba974c	[MIRPrinter] Don't print line break when there are no instructions (NFC) (#80147 ) Per #80143, we can remove the extra line break when there are no instructions.	2024-02-01 22:10:52 +08:00
Zaara Syeda	a03a6e9964	[AIX] [XCOFF] Add support for common and local common symbols in the TOC (#79530 ) This patch adds support for common and local symbols in the TOC for AIX. Note that we need to update isVirtualSection so as a common symbol in TOC will have the symbol type XTY_CM and will be initialized when placed in the TOC so sections with this type are no longer virtual. --------- Co-authored-by: Zaara Syeda <syzaara@ca.ibm.com>	2024-01-31 16:34:21 -05:00
Shimin Cui	1bab570e9b	Move the PowerPC/PPCMergeStringPool work to initializer (#77352 ) Currently, `PPCMergeStringPool` merges the global variables after the `AsmPrinter` initializer adds the global variables to its symbol list. This moves the merging work of `PPCMergeStringPool` to its initializer, just like what GlobalMerge does, to avoid adding merged global variables to the `AsmPrinter` symbol list.	2024-01-31 10:27:07 -05:00
Amy Kwan	d5fe1bd081	[AIX][TLS] Disallow the use of -maix-small-local-exec-tls and -fno-data-sections (#79252 ) This patch disallows the use of the -maix-small-local-exec-tls and -fno-data-sections options within clang, and also disallows the use of the aix-small-local-exec-tls attribute with the -data-sections=false option in llc. This is because having data sections off when using the aix-small-local-exec-tls feature is not ideal for performance. As the small-local-exec-tls region is a limited resource, this space should not be used for variables that may be replaced. Note that on AIX, data sections are turned on by default, so this patch makes it so that a diagnostic is emitted when users explicitly turn off data sections while using the aix-small-local-exec-tls feature.	2024-01-26 12:39:25 -05:00
Nemanja Ivanovic	67c1c1dbb6	[PowerPC][X86] Make cpu id builtins target independent and lower for PPC (#68919 ) Make __builtin_cpu_{init|supports|is} target independent and provide an opt-in query for targets that want to support it. Each target is still responsible for their specific lowering/code-gen. Also provide code-gen for PowerPC. I originally proposed this in https://reviews.llvm.org/D152914 and this addresses the comments I received there. --------- Co-authored-by: Nemanja Ivanovic <nemanjaivanovic@nemanjas-air.kpn> Co-authored-by: Nemanja Ivanovic <nemanja@synopsys.com>	2024-01-26 11:24:50 -05:00
Krzysztof Drewniak	63fe80fb18	[SeparateConstOffsetFromGEP] Handle `or disjoint` flags (#76997 ) This commit extends separate-const-offset-from-gep to look at the newly-added `disjoint` flag on `or` instructions so as to preserve additional opportunities for optimization. The tests were pre-committed in #76972.	2024-01-26 09:56:06 -06:00
Shimin Cui	e278c67096	Add support to merge strings used by metadata (#77364 ) Currently, if a string used by metadata is merged, its metadata uses are not replaced. This adds support for replacing the metadata uses.	2024-01-26 09:22:37 -05:00


3822 Commits