llvm-project

Author	SHA1	Message	Date
Jake Egan	d9db266499	[PowerPC][test] Catch any exception when retrieving git revision (#92004 ) This makes the `vc-rev-enabled` feature unsupported if we fail to retrieve the git revision for any reason, such as if git is not installed.	2024-05-14 10:32:30 -04:00
Simon Pilgrim	31fb0ae23d	[PowerPC] Regenerate and_sext.ll with test checks I've kept the grep checks for extsh/extsb instructions, but we can now see the actual codegen as well	2024-05-14 11:58:48 +01:00
Chen Zheng	662267daea	[PPC] add testcase, nfc	2024-05-13 01:49:00 -04:00
Matt Arsenault	6a8d30b1c1	DAG: Skip 0 sign handling in minimum/maximum lowering for _ieee case (#91326 ) dc9664a8adae17f2083fbcc8e96cfce606c56d57 changed the documentation to assume these order -0 as less than +0.	2024-05-09 14:41:13 +02:00
Nikita Popov	3a3aeb8eba	[PPCMergeStringPool] Avoid replacing constant with instruction (#88846 ) String pool merging currently, for a reason that's not entirely clear to me, tries to create GEP instructions instead of GEP constant expressions when replacing constant references. It only uses constant expressions in cases where this is required. However, it does not catch all cases where such a requirement exists. For example, the landingpad catch clause has to be a constant. Fix this by always using the constant expression variant, which also makes the implementation simpler. Additionally, there are some edge cases where even replacement with a constant GEP is not legal. The one I am aware of is the llvm.eh.typeid.for intrinsic, so add a special case to forbid replacements for it. Fixes https://github.com/llvm/llvm-project/issues/88844.	2024-05-09 13:27:20 +09:00
Felix (Ting Wang)	ea126aebdc	[PowerPC] Tune AIX shared library TLS model at function level (#84132 ) Under some circumstances (library loaded with the main program), the TLS initial-exec model can be applied to local-dynamic access(es). We could use a simple heuristic to decide the update at function level: * If the number of TLS local-dynamic access(es) in the function is less than or equal to a threshold, use the TLS initial-exec model. (The threshold, which defaults to 1, is controlled by a hidden option.)	2024-05-09 09:50:36 +08:00
Felix (Ting Wang)	19220110ac	[PowerPC][AIX] Refactor existing logic to handle non-zero offsets for aix-small-local-dynamic-tls (#89182 ) To enable optimized small local-dynamic access sequence for non-zero offsets, this patch refactors existing 2a50921553798d2db52ca6330c89f0f8a5bc2215.	2024-05-08 18:37:51 +08:00
Maryam Moghadas	9a28814f59	[PowerPC] Spill non-volatile registers required for traceback table (#71115 ) On AIX we need to spill all [rfv]N-[rfv]31 when a function clobbers [rfv]N so that the traceback table contains accurate information.	2024-05-07 16:23:37 -04:00
Jake Egan	8cde1cfc60	[AIX] Add git revision to .file string (#88164 ) If `LLVM_APPEND_VC_REV` is on, add the git revision to the `.file` string. The revision can be set with `LLVM_FORCE_VC_REVISION`. Before: `.file "git_revision.cpp",,"LLVM version 19.0.0git"` After: `.file "git_revision.cpp",,"LLVM version 19.0.0git (LLVM_REVISION)"`	2024-04-30 20:37:35 -04:00
zhijian lin	70ada5b178	NFC add a new precommit test case for PPCMIpeephole (#90656 ) Add pre-commit MIR test for PR "[Promote Pseudo Opcode from 32-bit to 64-bit after eliminating the extsw instruction in PPCMIPeepholes optimization](https://github.com/llvm/llvm-project/pull/85451)" which fixes bug reported in the issue "[Inconsistent Output at -O1 and -O2 Optimization Levels on PowerPC64 Due to Complex Type Casting and Nested Loop Structure](https://github.com/llvm/llvm-project/issues/71030)".	2024-04-30 16:27:34 -04:00
Qiu Chaofan	4a8f2f2e1a	[Legalizer] Expand fmaximum and fminimum (#67301 ) According to langref, llvm.maximum/minimum has -0.0 < +0.0 semantics and propagates NaN. Expand the nodes on targets not supporting the operation, by adding extra check for NaN and using is_fpclass to check zero signs.	2024-04-29 15:09:54 +08:00
Chen Zheng	0a0f1f9f1d	[PPC]add DEBUG_COUNTER for PPCMIPeephole pass	2024-04-28 05:38:40 -04:00
Chen Zheng	1d77eb49a4	Revert "[PPC] [NFC] add testcase for more store forwarding" This reverts commit 29c7d1a60c9d45e82f08cd7487178846ed5f9c6d. The store forwarding patch https://github.com/llvm/llvm-project/pull/87465 is closed.	2024-04-24 20:54:45 -04:00
Matthias Braun	6bbccd2516	GlobalsModRef, ValueTracking: Look through threadlocal.address intrinsic (#88418 ) This improves handling of `threadlocal.address` intrinsic in analyses: The thread-id cannot change within a function with the exception of suspend points of pre-split coroutines. This changes `llvm::getUnderlyingObject` to look through `threadlocal.address` in these cases. `GlobalsAAResult::AnalyzeUsesOfPointer` checks whether an address can be traced to simple loads/stores or escapes to other places. Starting the analysis from a thread-local `GlobalValue` the `threadlocal.address` intrinsic is safe to skip here. This improves issue #87437	2024-04-19 10:01:42 -07:00
Zaara Syeda	76ad289748	[PowerPC] 32-bit large code-model support for toc-data (#85129 ) This patch adds the pseudo op ADDItocL for 32-bit large code-model support for toc-data.	2024-04-17 09:24:53 -04:00
Björn Pettersson	33e6b488be	[SelectionDAG] Fix and improve TargetLowering::SimplifySetCC (#87646 ) The load narrowing part of TargetLowering::SimplifySetCC is updated according to this: 1) The offset calculation (for big endian) did not work properly for non byte-sized types. This is basically solved by an early exit if the memory type isn't byte-sized. But the code is also corrected to use the store size when calculating the offset. 2) To still allow some optimizations for non-byte-sized types the TargetLowering::isPaddedAtMostSignificantBitsWhenStored hook is added. By default it assumes that scalar integer types are padded starting at the most significant bits, if the type needs padding when being stored to memory. 3) Allow optimizing when isPaddedAtMostSignificantBitsWhenStored is true, as that hook makes it possible for TargetLowering to know how the non byte-sized value is aligned in memory. 4) Update the algorithm to always search for a narrowed load with a power-of-2 byte-sized type. In the past the algorithm started with the width of the original load, and then divided it by two for each iteration. But for a type such as i48 that would just end up trying to narrow the load into a i24 or i12 load, and then we would fail sooner or later due to not finding a newVT that fulfilled newVT.isRound(). With this new approach we can narrow the i48 load into either an i8, i16 or i32 load. By checking if such a load is allowed (e.g. alignment wise) for any "multiple of 8 offset", then we can find more opportunities for the optimization to trigger. So even for a byte-sized type such as i32 we may now end up narrowing the load into loading the 16 bits starting at offset 8 (if that is allowed by the target). The old algorithm did not even consider that case. 5) Also start using getObjectPtrOffset instead of getMemBasePlusOffset when creating the new ptr. This way we get "nsw" on the add.	2024-04-12 16:18:12 +02:00
Bjorn Pettersson	bcf047a4ed	[ARM][PowerPC] Add regression tests for narrowing load in TargetLowering::SimplifySetCC These test cases show some miscompiles for big-endian when dealing with non byte-sized loads. One part of the problem is that LLVM IR isn't really telling where the padding goes for non byte-sized loads/stores. So currently TargetLowering::SimplifySetCC can't assume anything about it. But the implementation also does not consider that the TypeStoreSize could be larger than the TypeSize, resulting in the offset calculation being wrong for big-endian. Pre-commit for https://github.com/llvm/llvm-project/pull/87646	2024-04-12 16:14:39 +02:00
Felix (Ting Wang)	09d51a841d	[PowerPC][AIX] Enable aix-small-local-dynamic-tls target attribute (#86641 ) Following the aix-small-local-exec-tls target attribute, this patch adds a target attribute for an AIX-specific option in llc that informs the compiler that it can use a faster access sequence for the local-dynamic TLS model (formally named aix-small-local-dynamic-tls) when TLS variables are less than ~32KB in size. The patch either produces an addi/la with a displacement off of module handle (return value from .__tls_get_mod) when the address is calculated, or it produces an addi/la followed by a load/store when the address is calculated and used for further accesses. --------- Co-authored-by: Amy Kwan <amy.kwan1@ibm.com>	2024-04-12 08:18:01 +08:00
Chen Zheng	053750c3b4	[PowerPC] Fix the undef register for VECINSERT If the V2 of the vector_shuffle is undef, the two vector inputs are expected to be the same when doing the VECINSERT transformation. For now the first operand of VECINSERT is set to undef, which is not right. This patch fixes this bug.	2024-04-11 04:01:07 -04:00
Chen Zheng	d7e0ea205f	[PowerPC] add testcase for a xxinsertw bug, NFC	2024-04-11 04:01:01 -04:00
Qiu Chaofan	a4558a4a53	[PowerPC] Implement 32-bit expansion for rldimi (#86783 ) rldimi is a 64-bit instruction; for backward compatibility, it needs to be expanded into a series of rotate and mask operations in a 32-bit environment. In the future, we may improve the bit permutation selector and remove such direct codegen.	2024-04-09 16:43:49 +08:00
Qiu Chaofan	71eda17a06	[Legalizer] Soften EXTRACT_ELEMENT on ppcf128 (#77412 ) ppc_fp128 values are always split into two f64. Implement soften operation in soft-float mode to handle output f64 correctly.	2024-04-09 10:26:24 +08:00
Chen Zheng	29c7d1a60c	[PPC] [NFC] add testcase for more store forwarding	2024-04-03 04:46:29 -04:00
Ryotaro KASUGA	ea4a11926b	Reapply "[CodeGen] Fix register pressure computation in MachinePipeli… (#87312 ) …ner (#87030)" Fix broken test. This reverts commit b8ead2198f27924f91b90b6c104c1234ccc8972e.	2024-04-03 09:28:09 +09:00
Gulfem Savrun Yeniceri	b8ead2198f	Revert "[CodeGen] Fix register pressure computation in MachinePipeliner (#87030 )" This reverts commit a4dec9d6bc67c4d8fbd4a4f54ffaa0399def9627 because the test failed in the following builder: https://luci-milo.appspot.com/ui/p/fuchsia/builders/prod/clang-linux-x64/b8751864477467126481/overview	2024-04-01 18:27:41 +00:00
Ryotaro KASUGA	a4dec9d6bc	[CodeGen] Fix register pressure computation in MachinePipeliner (#87030 ) `RegisterClassInfo::getRegPressureSetLimit` has been changed to return a smaller value than before so the limit may become negative in later calculations. As a workaround, change to use `TargetRegisterInfo::getRegPressureSetLimit`. Also improve tests.	2024-04-01 17:04:44 +09:00
Craig Topper	23d45e55ed	[MCP] Remove dead copies from basic blocks with successors. (#86973 ) Previously we wouldn't remove dead copies from basic blocks with successors. The comment said we didn't want to trust the live-in lists. The comment is very old so I'm not sure if that's still a concern today. This patch checks the live-in lists and removes copies from MaybeDeadCopies if they are referenced by any live-ins in any successors. We only do this if the tracksLiveness property is set. If that property is not set, we retain the old behavior.	2024-03-28 14:43:49 -07:00
Zaara Syeda	6582509daa	[AIX] Handle toc-data offset overflowing 16-bits (#80092 ) When the toc-data offset overflows the 16-bits, we can truncate the value to the 16-bit value as the linker will handle overflow through fixup code.	2024-03-28 13:55:13 -04:00
Amy Kwan	a3efc53f16	[AIX][TLS] Produce a faster local-exec access sequence for the "aix-small-tls" global variable attribute (#83053 ) Similar to 3f46e5453d9310b15d974e876f6132e3cf50c4b1, this patch allows the backend to produce a faster access sequence for the local-exec TLS model, where loading from the TOC can be avoided, for local-exec TLS variables that are annotated with the "aix-small-tls" attribute. The expectation is for local-exec TLS variables to be set with this attribute through PGO. Furthermore, the optimized access sequence is only generated for local-exec TLS variables annotated with "aix-small-tls", only if they are less than ~32KB in size.	2024-03-28 09:18:45 -04:00
Simon Pilgrim	78f0871bee	Revert rG58de1e2c5eee548a9b365e3b1554d87317072ad9 "Fix stack layout for frames larger than 2gb (#84114 )" This is failing on some EXPENSIVE_CHECKS buildbots	2024-03-27 16:16:15 +00:00
Wesley Wiser	58de1e2c5e	Fix stack layout for frames larger than 2gb (#84114 ) For very large stack frames, the offset from the stack pointer to a local can be more than 2^31 which overflows various `int` offsets in the frame lowering code. This patch updates the frame lowering code to calculate the offsets as 64-bit values and resolves the overflows, resulting in the correct codegen for very large frames. Fixes #48911	2024-03-27 15:05:58 +00:00
Felix (Ting Wang)	90a7fc366a	[PowerPC][NFC] Add base test case for small-local-dynamic-tls on AIX (#84711 )	2024-03-24 08:46:45 +08:00
Chen Zheng	90454a6098	[PowerPC][AIX] support explicit sections for -ffunction-sections (#85351 ) Fix crashes in https://godbolt.org/z/6voEa1o6Y	2024-03-22 13:23:36 +08:00
Qiu Chaofan	e5b20c83e5	[PowerPC] Update chain uses when emitting lxsizx (#84892 )	2024-03-18 22:31:05 +08:00
Yingwei Zheng	38a44bdc93	[CodeGenPrepare] Reverse the canonicalization of isInf/isNanOrInf (#81572 ) In commit `2b582440c1`, we canonicalize the isInf/isNanOrInf idiom into fabs+fcmp for better analysis/codegen (See also the discussion in https://github.com/llvm/llvm-project/pull/76338). This patch reverses the fabs+fcmp to `is.fpclass`. If the `is.fpclass` is not supported by the target, it will be expanded by TLI. Fixes the regression introduced by `2b582440c1` and https://github.com/llvm/llvm-project/pull/80414#issuecomment-1936374206.	2024-03-18 18:27:45 +08:00
Qiu Chaofan	65ae09eeb6	[PowerPC] Fix behavior of rldimi/rlwimi/rlwnm builtins (#85040 ) rldimi is a 64-bit instruction, so the corresponding builtin should not be available in 32-bit mode. The rotate amount should be in range, and cases where the mask is zero need special handling. This change also swaps the first and second operands of rldimi/rlwimi to match previous behavior. For masks not ending at bit 63-SH, a rotation will be inserted before rldimi.	2024-03-18 14:17:16 +08:00
Sean Fertile	2d80505401	[AIX] Support per global code model. (#79202 ) Exploit the per global code model attribute on AIX. On AIX we need to update both the code sequence used to access the global (either 1 or 2 instructions for small and large code model respectively) and the storage mapping class that we emit the toc entry. --------- Co-authored-by: Amy Kwan <akwan0907@gmail.com>	2024-03-15 12:52:04 -04:00
Kevin P. Neal	ea628f087e	[FPEnv][PowerPC] Correct strictfp test. Correct llvm-reduce strictfp test to follow the rules documented in the LangRef: https://llvm.org/docs/LangRef.html#constrained-floating-point-intrinsics This test needed the strictfp attribute added to function definitions. Test changes verified with D146845.	2024-03-15 12:08:09 -04:00
Zaara Syeda	cc761a7c35	[PowerPC][NFC] Rename ADDItocL to match the 64-bit naming convention (#85099 ) In preparation for adding a similar instruction for large code model on AIX for 32-bit, rename the existing ADDItocL 64-bit instruction to ADDItocL8 to match the naming convention of other instructions with 32-bit and 64-bit variants.	2024-03-13 11:57:07 -04:00
Zaara Syeda	37b5eb0a0a	[AIX][TOC] Add -mtocdata/-mno-tocdata options on AIX (#67999 ) This patch enables support that the XL compiler had for AIX under -qdatalocal/-qdataimported.	2024-03-13 10:26:31 -04:00
Chen Zheng	cc34e56b86	[PPC][NFC] add an option to expose the bug in 74951	2024-03-07 20:52:44 -05:00
Chen Zheng	e7a22e72de	[PPC] precommit cases for issue 74915	2024-03-07 20:22:26 -05:00
Sameer Sahasrabuddhe	60822637bf	Restore "Implement convergence control in MIR using SelectionDAG (#71785 )" This restores commit c7fdd8c11e54585dc9d15d63de9742067e0506b9. Previously reverted in f010b1bef4dda2c7082cbb41dbabf1f149cce306. LLVM function calls carry convergence control tokens as operand bundles, where the tokens themselves are produced by convergence control intrinsics. This patch implements convergence control tokens in MIR as follows: 1. Introduce target-independent ISD opcodes and MIR opcodes for convergence control intrinsics. 2. Model token values as untyped virtual registers in MIR. The change also introduces an additional ISD opcode CONVERGENCECTRL_GLUE and a corresponding machine opcode with the same spelling. This glues the convergence control token to SDNodes that represent calls to intrinsics. The glued token is later translated to an implicit argument in the MIR. The lowering of calls to user-defined functions is target-specific. On AMDGPU, the convergence control operand bundle at a non-intrinsic call is translated to an explicit argument to the SI_CALL_ISEL instruction. Post-selection adjustment converts this explicit argument to an implicit argument on the SI_CALL instruction.	2024-03-06 12:19:32 +05:30
Felix (Ting Wang)	ed6275868b	[PowerPC][NFC] Update aix-tls-xcoff-reloc.ll (#83764 ) Update test case changed by #66316	2024-03-05 14:07:47 +08:00
Mitch Phillips	f010b1bef4	Revert "Restore "Implement convergence control in MIR using SelectionDAG (#71785 )"" This reverts commit c7fdd8c11e54585dc9d15d63de9742067e0506b9. Reason: Broke the sanitizer buildbots. See the comments at https://github.com/llvm/llvm-project/pull/71785 for more information.	2024-03-04 17:05:34 +01:00
Qiu Chaofan	906580bad3	[PowerPC] Add intrinsics for rldimi/rlwimi/rlwnm (#82968 ) These builtins are already there in Clang, however current codegen may produce suboptimal results due to their complex behavior. Implement them as intrinsics to ensure expected instructions are emitted.	2024-03-04 21:13:59 +08:00
Sameer Sahasrabuddhe	c7fdd8c11e	Restore "Implement convergence control in MIR using SelectionDAG (#71785 )" Original commit 79889734b940356ab3381423c93ae06f22e772c9. Previously reverted in commit a2afcd5721869d1d03c8146bae3885b3385ba15e. LLVM function calls carry convergence control tokens as operand bundles, where the tokens themselves are produced by convergence control intrinsics. This patch implements convergence control tokens in MIR as follows: 1. Introduce target-independent ISD opcodes and MIR opcodes for convergence control intrinsics. 2. Model token values as untyped virtual registers in MIR. The change also introduces an additional ISD opcode CONVERGENCECTRL_GLUE and a corresponding machine opcode with the same spelling. This glues the convergence control token to SDNodes that represent calls to intrinsics. The glued token is later translated to an implicit argument in the MIR. The lowering of calls to user-defined functions is target-specific. On AMDGPU, the convergence control operand bundle at a non-intrinsic call is translated to an explicit argument to the SI_CALL_ISEL instruction. Post-selection adjustment converts this explicit argument to an implicit argument on the SI_CALL instruction.	2024-03-04 13:28:04 +05:30
George Koehler	6b70c5d79f	[PowerPC] provide CFI for ELF32 to unwind cr2, cr3, cr4 (#83098 ) Delete the code that skips the CFI for the condition register on ELF32. The code checked !MustSaveCR, which happened only when Subtarget.is32BitELFABI(), where spillCalleeSavedRegisters is spilling cr in a different way. The spill was missing CFI. After deleting this code, a spill of cr2 to cr4 gets CFI in the same way as a spill of r14 to r31. Fixes #83094	2024-03-02 22:18:24 -05:00
Felix (Ting Wang)	5b05870953	[PowerPC] Support local-dynamic TLS relocation on AIX (#66316 ) Supports TLS local-dynamic on AIX, generates below sequence of code: ``` .tc foo[TC],foo[TL]@ld # Variable offset, ld relocation specifier .tc mh[TC],mh[TC]@ml # Module handle for the caller lwz 3,mh[TC](2) $$ For 64-bit: ld 3,mh[TC](2) bla .__tls_get_mod # Modifies r0,r3,r4,r5,r11,lr,cr0 #r3 = &TLS for module lwz 4,foo[TC](2) $$ For 64-bit: ld 4,foo[TC](2) add 5,3,4 # Compute &foo .rename mh[TC], "_$TLSML" # Symbol for the module handle must have the name "_$TLSML" ``` --------- Co-authored-by: tingwang <tingwang@tingwangs-MBP.lan> Co-authored-by: tingwang <tingwang@tingwangs-MacBook-Pro.local>	2024-03-01 08:09:40 +08:00
Kai Luo	d1924f0474	[PowerPC] Do not generate `isel` instruction if target doesn't have this instruction (#72845 ) When expanding `select_cc` in finalize-isel, we should not generate `isel` for targets that do not feature it.	2024-03-01 08:03:06 +08:00
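
Note on 4a8f2f2e1a ([Legalizer] Expand fmaximum and fminimum): the message describes the semantics as "-0.0 < +0.0" with NaN propagation, and an expansion that adds a NaN check and inspects zero signs. The Python below is only a scalar model of those semantics, not the legalizer code itself; the function names `fmaximum` and `fminimum` are hypothetical, and the step ordering is an assumption made for readability.

```python
import math


def fmaximum(a: float, b: float) -> float:
    """Scalar model of the llvm.maximum semantics (illustrative only)."""
    # NaN check: any NaN operand makes the result NaN.
    if math.isnan(a) or math.isnan(b):
        return math.nan
    # Both operands are zeros: compare sign bits, since -0.0 == +0.0
    # under an ordinary comparison. For maximum, +0.0 wins.
    if a == 0.0 and b == 0.0:
        return b if math.copysign(1.0, a) < 0.0 else a
    # Otherwise an ordinary ordered comparison is enough.
    return a if a > b else b


def fminimum(a: float, b: float) -> float:
    """Scalar model of the llvm.minimum semantics (illustrative only)."""
    if math.isnan(a) or math.isnan(b):
        return math.nan
    # For minimum, -0.0 wins when both operands are zeros.
    if a == 0.0 and b == 0.0:
        return a if math.copysign(1.0, a) < 0.0 else b
    return a if a < b else b
```

The sign test uses `math.copysign` because `-0.0 < 0.0` is false in an ordinary floating-point comparison, which is the reason the expansion needs a separate zero-sign check at all.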


3842 Commits