llvm-project

Author	SHA1	Message	Date
stephenpeckham	1d1fede493	[XCOFF] Ensure .file is emitted before any .info pseudo-ops (#71577 ) When generating the assembly code for AIX/XCOFF, the .file pseudo-op needs to be emitted first, before any csects are generated. Otherwise, information such as the embedded command line will be associated with part of the object file rather than the entire object file.	2023-11-09 16:03:45 -06:00
Juergen Ributzka	6d1d7be133	Obsolete WebKit Calling Convention (#71567 ) The WebKit Calling Convention was created specifically for the WebKit FTL. FTL doesn't use LLVM anymore and therefore this calling convention is obsolete. This commit removes the WebKit CC, its associated tests, and documentation.	2023-11-09 09:08:41 -08:00
Qiu Chaofan	5f295552f1	[PowerPC] Fix incorrect symbol name of frexp libcall (#71626 ) frexpl is for ppc_fp128. The correct symbol name for f128 is frexpf128.	2023-11-08 14:41:19 +08:00
Qiu Chaofan	d199fd76f7	[NFC] Add f128 frexp intrinsics for PowerPC	2023-11-08 11:27:40 +08:00
Nikita Popov	e4a4122eb6	[IR] Remove zext and sext constant expressions (#71040 ) Remove support for zext and sext constant expressions. All places creating them have been removed beforehand, so this just removes the APIs and uses of these constant expressions in tests. There is some additional cleanup that can be done on top of this, e.g. we can remove the ZExtInst vs ZExtOperator footgun. This is part of https://discourse.llvm.org/t/rfc-remove-most-constant-expressions/63179.	2023-11-03 10:46:07 +01:00
Nikita Popov	060de415af	Reapply [InstCombine] Simplify and/or of icmp eq with op replacement (#70335 ) Relative to the first attempt, this contains two changes: First, we only handle the case where one side simplifies to true or false, instead of calling simplification recursively. The previous approach would return poison if one operand simplified to poison (under the equality assumption), which is incorrect. Second, we do not fold llvm.is.constant in simplifyWithOpReplaced(). We may be assuming that a value is constant, if the equality holds, but it may not actually be constant. This is nominally just a QoI issue, but the std::list implementation in libstdc++ relies on the precise behavior in a way that causes miscompiles. ----- and/or in logical (select) form benefit from generic simplifications via simplifyWithOpReplaced(). However, the corresponding fold for plain and/or currently does not exist. Similar to selects, there are two general cases for this fold (illustrated with `and`, but there are `or` conjugates). The basic case is something like `(a == b) & c`, where the replacement of a with b or b with a inside c allows it to fold to true or false. Then the whole operation will fold to either false or `a == b`. The second case is something like `(a != b) & c`, where the replacement inside c allows it to fold to false. In that case, the operand can be replaced with c, because in the case where a == b (and thus the icmp is false), c itself will already be false. As the test diffs show, this catches quite a lot of patterns in existing test coverage. This also obsoletes quite a few existing special-case and/or of icmp folds we have (e.g. simplifyAndOrOfICmpsWithLimitConst), but I haven't removed anything as part of this patch in the interest of risk mitigation. Fixes #69050. Fixes #69091.	2023-11-03 10:16:15 +01:00
Kai Luo	7b5505b0d5	[PowerPC] Change registers used in test due to ABI breakage. NFC. (#70758 ) Usage of `r30` and `r31` has broken current traceback table's convention on AIX. Avoid using CSRs in livein list.	2023-11-03 07:08:33 +08:00
Qiu Chaofan	b46e768455	[DAGCombine] Fold setcc_eq infinity into is.fpclass (#67829 )	2023-11-01 11:51:15 +09:00
Nikita Popov	e46dd6fbc0	Revert "[InstCombine] Simplify and/or of icmp eq with op replacement (#70335 )" This reverts commit 1770a2e325192f1665018e21200596da1904a330. Stage 2 llvm-tblgen crashes when generating X86GenAsmWriter.inc and other files.	2023-10-30 18:33:03 +01:00
Nikita Popov	1770a2e325	[InstCombine] Simplify and/or of icmp eq with op replacement (#70335 ) and/or in logical (select) form benefit from generic simplifications via simplifyWithOpReplaced(). However, the corresponding fold for plain and/or currently does not exist. Similar to selects, there are two general cases for this fold (illustrated with `and`, but there are `or` conjugates). The basic case is something like `(a == b) & c`, where the replacement of a with b or b with a inside c allows it to fold to true or false. Then the whole operation will fold to either false or `a == b`. The second case is something like `(a != b) & c`, where the replacement inside c allows it to fold to false. In that case, the operand can be replaced with c, because in the case where a == b (and thus the icmp is false), c itself will already be false. As the test diffs show, this catches quite a lot of patterns in existing test coverage. This also obsoletes quite a few existing special-case and/or of icmp folds we have (e.g. simplifyAndOrOfICmpsWithLimitConst), but I haven't removed anything as part of this patch in the interest of risk mitigation. Fixes #69050. Fixes #69091.	2023-10-30 10:05:39 +01:00
Simon Pilgrim	c9c9bf0f20	[DAG] WidenVectorOperand - add basic handling for *_EXTEND_VECTOR_INREG nodes Fixes Issue #70208	2023-10-25 16:52:15 +01:00
Matthias Braun	e3cf80c5c1	BlockFrequencyInfoImpl: Avoid big numbers, increase precision for small spreads BlockFrequencyInfo calculates block frequencies as Scaled64 numbers but as a last step converts them to unsigned 64bit integers (`BlockFrequency`). This improves the factors picked for this conversion so that: * Avoid big numbers close to UINT64_MAX to avoid users overflowing/saturating when adding multiple frequencies together or when multiplying with integers. This leaves the topmost 10 bits unused to allow for some room. * Spread the difference between hottest/coldest block as much as possible to increase precision. * If the hot/cold spread cannot be represented, lose precision at the lower end, but keep the frequencies at the upper end for hot blocks differentiable.	2023-10-24 20:27:39 -07:00
Ramkumar Ramachandra	98c90a13c6	ISel: introduce vector ISD::LRINT, ISD::LLRINT; custom RISCV lowering (#66924 ) The issue #55208 noticed that std::rint is vectorized by the SLPVectorizer, but a very similar function, std::lrint, is not. std::lrint corresponds to ISD::LRINT in the SelectionDAG, and std::llrint is a familiar cousin corresponding to ISD::LLRINT. Now, neither ISD::LRINT nor ISD::LLRINT have a corresponding vector variant, and the LangRef makes this clear in the documentation of llvm.lrint.* and llvm.llrint.*. This patch extends the LangRef to include vector variants of llvm.lrint.* and llvm.llrint.*, and lays the necessary ground-work of scalarizing it for all targets. However, this patch would be devoid of motivation unless we show the utility of these new vector variants. Hence, the RISCV target has been chosen to implement a custom lowering to the vfcvt.x.f.v instruction. The patch also includes a CostModel for RISCV, and a trivial follow-up can potentially enable the SLPVectorizer to vectorize std::lrint and std::llrint, fixing #55208. The patch includes tests, obviously for the RISCV target, but also for the X86, AArch64, and PowerPC targets to justify the addition of the vector variants to the LangRef.	2023-10-19 13:05:04 +01:00
Kai Luo	b42738805a	[PowerPC] Auto gen test checks for #69299 . NFC.	2023-10-18 02:21:22 +00:00
Kai Luo	3104681686	[PowerPC][Atomics] Remove redundant block to clear reservation (#68430 ) This PR follows what https://reviews.llvm.org/D134783 does for quadword CAS.	2023-10-13 10:59:27 +08:00
Nikita Popov	127ed9ae26	[PowerPC] Use zext instead of anyext in custom and combine (#68784 ) This custom combine currently converts `and(anyext(x),c)` into `anyext(and(x,c))`. This is not correct, because the original expression guaranteed that the high bits are zero, while the new one sets them to undef. Emit `zext(and(x,c))` instead. Fixes https://github.com/llvm/llvm-project/issues/68783.	2023-10-12 09:32:17 +02:00
Nikita Popov	0ead1faef0	[PowerPC] Add test for #68783 (NFC)	2023-10-11 12:15:26 +02:00
Jay Foad	7b3bbd83c0	Revert "[CodeGen] Really renumber slot indexes before register allocation (#67038 )" This reverts commit 2501ae58e3bb9a70d279a56d7b3a0ed70a8a852c. Reverted due to various buildbot failures.	2023-10-09 12:31:32 +01:00
Jay Foad	2501ae58e3	[CodeGen] Really renumber slot indexes before register allocation (#67038 ) PR #66334 tried to renumber slot indexes before register allocation, but the numbering was still affected by list entries for instructions which had been erased. Fix this to make the register allocator's live range length heuristics even less dependent on the history of how instructions have been added to and removed from SlotIndexes's maps.	2023-10-09 11:44:41 +01:00
Lei	529ad40e05	[PowerPC] Fix missing kill flag update for XVCVDPSP transformations (#67997 ) Add transformed register to kill flag work list for XVCVDPSP transformations. Ref: reviews.llvm.org/D133103	2023-10-06 10:24:54 -04:00
Kishan Parmar	696ea67f19	Disable call to fma for soft-float PowerPC backend generates libc function calls for soft-float, regardless of the -nostdlib/-ffreestanding flags. fma is not a function provided by compiler-rt builtins and thus should not be generated here. PR: https://github.com/llvm/llvm-project/issues/55230 (#55230) Below is the patch given by @nemanjai Reviewed By: jhibbits Differential Revision: https://reviews.llvm.org/D156344	2023-09-28 14:06:54 +05:30
Qiu Chaofan	cc627828f5	Pre-commit some PowerPC test cases	2023-09-28 15:51:14 +08:00
Wael Yehia	da55b1b52f	[XCOFF] Do not generate the special .ref for zero-length sections (#66805 ) Co-authored-by: Wael Yehia <wyehia@ca.ibm.com>	2023-09-28 01:33:41 -04:00
esmeyi	d7195c57d8	Reland https://reviews.llvm.org/D159073 . The patch failed in test-suite due to a liveness error after rebasing on https://reviews.llvm.org/D133103, and now it's fixed. ``` [PowerPC][Peephole] Combine rldicl/rldicr and andi/andis after isel. Summary: rldicl/rldicr can be eliminated if it's used to clear the high-order or low-order n bits and all bits cleared will be ANDed with 0 by andi/andis. Or they can be folded to `andi 0` if all bits to AND are already zero in the input. Reviewed By: qiucf, shchenz Differential Revision: https://reviews.llvm.org/D159073 ```	2023-09-26 06:24:47 -04:00
Kai Luo	5fabc8ba22	[PowerPC] Add test to show wrong target flags printed at MO_TLSGDM_FLAG operand. NFC.	2023-09-26 05:13:26 +00:00
esmeyi	77147a95b8	Revert "[PowerPC][Peephole] Combine rldicl/rldicr and andi/andis after isel." This reverts commit 2de74e1bd4d540063d7495fa6254781abd41e179. A test-suite failure occurs due to this commit, will fix soon.	2023-09-25 23:31:34 -04:00
esmeyi	2de74e1bd4	[PowerPC][Peephole] Combine rldicl/rldicr and andi/andis after isel. Summary: rldicl/rldicr can be eliminated if it's used to clear the high-order or low-order n bits and all bits cleared will be ANDed with 0 by andi/andis. Or they can be folded to `andi 0` if all bits to AND are already zero in the input. Reviewed By: qiucf, shchenz Differential Revision: https://reviews.llvm.org/D159073	2023-09-25 23:11:34 -04:00
Matthias Braun	740ee00a4c	PPCBranchCoalescing: Fix invalid branch weights (#67211 ) Re-normalize branch-weights after removing a block successor to avoid branch-weights not adding up to 100%. This changes MIR for the `test/CodeGen/PowerPC/branch_coalesce.ll` test like this: ```diff - successors: %bb.6(0x40000000); %bb.6(50.00%) + successors: %bb.6(0x80000000); %bb.6(100.00%) ``` This doesn't affect codegen on its own but fixing this helps with fluctuations I have with some of my upcoming changes.	2023-09-25 10:41:04 -07:00
Nemanja Ivanovic	46d5d264fc	[PowerPC] Improve kill flag computation and add verification after MI peephole The MI Peephole pass has grown to include a large number of transformations over the years. Many of the transformations require re-computation of kill flags but don't do a good job of re-computing them. This causes us to have very common failures when the compiler is built with expensive checks. Over time, we added and augmented a function that is supposed to go and fix up kill flags after each transformation but we keep missing cases. This patch does the following: - Removes the function to re-compute kill flags - Adds LiveVariables to compute and maintain kill flags while transforming code - Adds re-computation of kill flags for the post-RA peepholes for each block that contains a transformed instruction Reviewed By: stefanp Differential Revision: https://reviews.llvm.org/D133103	2023-09-22 15:26:39 -04:00
Jay Foad	e0919b189b	[CodeGen] Renumber slot indexes before register allocation (#66334 ) RegAllocGreedy uses SlotIndexes::getApproxInstrDistance to approximate the length of a live range for its heuristics. Renumbering all slot indexes with the default instruction distance ensures that this estimate will be as accurate as possible, and will not depend on the history of how instructions have been added to and removed from SlotIndexes's maps. This also means that enabling -early-live-intervals, which runs the SlotIndexes analysis earlier, will not cause large amounts of churn due to different register allocator decisions.	2023-09-19 11:18:12 +01:00
Craig Topper	f71a9e8bb7	[SelectionDAG][RISCV][PowerPC][X86] Use TargetConstant for immediates for ISD::PREFETCH. (#66601 ) The intrinsic uses ImmArg so TargetConstant would be consistent with how other intrinsics are handled. This hides the constants from type legalization so we can remove the promotion support. isel patterns are updated accordingly.	2023-09-18 08:58:50 -07:00
Guozhi Wei	cbdccb30c2	[RA] Split a virtual register in cold blocks if it is not assigned preferred physical register If a virtual register is not assigned preferred physical register, it means some COPY instructions will be changed to real register move instructions. In this case we can try to split the virtual register in colder blocks, if success, the original COPY instructions can be deleted, and the new COPY instructions in colder blocks will be generated as register move instructions. It results in fewer dynamic register move instructions executed. The new test case split-reg-with-hint.ll gives an example, the hot path contains 24 instructions without this patch, now it is only 4 instructions with this patch. Differential Revision: https://reviews.llvm.org/D156491	2023-09-15 19:52:50 +00:00
Maryam Moghadas	7b021f2e64	[PowerPC] Optimize VPERM and fix code order for swapping vector operands on LE This patch reverts commit 7614ba0a5db8 to optimize VPERM when one of its vector operands is XXSWAPD, similar to XXPERM. It also reorganizes the little-endian swap code on LE, swapping the vector operand after adjusting the mask operand. This ensures that the vector operand is swapped at the correct point in the code, resulting in a valid constant pool for the mask operand. Reviewed By: stefanp Differential Revision: https://reviews.llvm.org/D149083	2023-09-13 15:00:49 -05:00
Simon Pilgrim	e6b85c3027	[DAG] FoldSetCC - add missing icmp(X,undef) -> isTrueWhenEqual case (REAPPLIED) Followup to D59363 which failed to handle the icmp(X,undef) -> isTrueWhenEqual case - similar to llvm::ConstantFoldCompareInstruction As discussed on the review, this is affecting some previously reduced test cases, but will also prevent reductions from relying on this inconsistent behaviour in the future. Reapplied after reversion at e1e3c75c7dad72 with a tweak to the pseudo-probe-peep.ll test Differential Revision: https://reviews.llvm.org/D158068	2023-09-13 12:33:39 +01:00
Simon Pilgrim	e1e3c75c7d	Revert rG6c56cf71ee82ec3a28e0dfc2b751bd10c16929da "[DAG] FoldSetCC - add missing icmp(X,undef) -> isTrueWhenEqual case" Need to address a missed test change	2023-09-13 11:27:47 +01:00
Simon Pilgrim	6c56cf71ee	[DAG] FoldSetCC - add missing icmp(X,undef) -> isTrueWhenEqual case Followup to D59363 which failed to handle the icmp(X,undef) -> isTrueWhenEqual case - similar to llvm::ConstantFoldCompareInstruction As discussed on the review, this is affecting some previously reduced test cases, but will also prevent reductions from relying on this inconsistent behaviour in the future. Differential Revision: https://reviews.llvm.org/D158068	2023-09-13 11:01:58 +01:00
Qiu Chaofan	69b056d563	[PowerPC] Implement SchedModel for Power7 Reviewed By: shchenz Differential Revision: https://reviews.llvm.org/D158704	2023-09-13 14:55:07 +08:00
Qiu Chaofan	d4d0b5eaab	Fix MIR failure after b922a362	2023-09-08 16:33:45 +08:00
Qiu Chaofan	b922a36211	[PowerPC] Define SchedModel for Power8 PowerPC subtargets prior to Power9 use the 'legacy' itinerary way to provide scheduling information. This patch re-writes the tablegen file to define the scheduling information in the new SchedModel way, which can bring improvements to some benchmarks. Reviewed By: shchenz Differential Revision: https://reviews.llvm.org/D154488	2023-09-08 15:43:21 +08:00
bzEq	d9efcb54c9	[PEI][PowerPC] Fix false alarm of stack size limit (#65559 ) PPC64 allows stack size up to ((2^63)-1) bytes. Currently llc reports ``` warning: stack frame size (4294967568) exceeds limit (4294967295) in function 'main' ``` if the stack allocated is larger than 4G.	2023-09-08 15:16:00 +08:00
Amy Kwan	3f46e5453d	[AIX][TLS] Produce a faster local-exec access sequence with -maix-small-local-exec-tls (And optimize when load/store offsets are 0) This patch utilizes the -maix-small-local-exec-tls option added in D155544 to produce a faster access sequence for the local-exec TLS model, where loading from the TOC can be avoided. The patch either produces an addi/la with a displacement off of r13 (the thread pointer) when the address is calculated, or it produces an addi/la followed by a load/store when the address is calculated and used for further accesses. This patch also optimizes this sequence a bit more where we can remove the addi/la when the load/store offset is 0. A follow up patch will be posted to account for when the load/store offset is non-zero, and currently in these situations we keep the addi/la that precedes the load/store. Furthermore, this access sequence is only performed for TLS variables that are less than ~32KB in size. Differential Revision: https://reviews.llvm.org/D155600	2023-09-07 20:05:29 -05:00
Amy Kwan	8bdbee8aaa	[AIX][TLS] Add target attribute for -maix-small-local-exec-tls option. This patch adds a target attribute for an AIX-specific option that informs the compiler that it can use a faster access sequence for the local-exec TLS model (formally named aix-small-local-exec-tls). The Clang portion of this option is in D155544. The initial implementation to generate the faster access sequence is in D155600. Differential Revision: https://reviews.llvm.org/D156203	2023-09-07 20:05:29 -05:00
stefanp-ibm	0a4a8bec34	[PowerPC] Turn string pooling on by default. (#65628 ) This patch turns the string pooling pass on by default. Some tests are updated as required.	2023-09-07 16:49:31 -04:00
Wael Yehia	11d5c7bd28	[AIX] Add threadId and use nanosecond timestamp in sinit/sterm symbols With ThinLTO, when compiling SPEC 2017 omnetpp_r with -threads=4, two small modules can end up with the same timestamp in their sinit symbols when calculating time in seconds, creating duplicate definitions. This patch uses a timestamp in nanoseconds. Because the race can be between threads, embed the thread ID as well. Reviewed By: xingxue, daltenty Differential Revision: https://reviews.llvm.org/D159319	2023-09-07 17:46:41 +00:00
Amy Kwan	f94f85348d	Revert "[AIX][TLS] Generate .extern and .ref references to __tls_get_addr for local-exec accesses." This reverts commit f0b2f6954101c9052763a99a1e7ac135770e779a. The implementation is incorrect and breaks compiling local-exec programs.	2023-09-07 12:10:37 -05:00
esmeyi	b85a9b3093	[PowerPC] Try to use less instructions to materialize 64-bit constant when High32=Low32. Summary: Materializing a 64-bit constant with High32=Low32 only requires 2 instructions instead of 3 when Low32 can be materialized in 1 instruction. Reviewed By: qiucf Differential Revision: https://reviews.llvm.org/D158495	2023-09-07 13:03:17 -04:00
Stefan Pintilie	84e2fd7ee4	[PowerPC] Add a pass to merge all of the constant global arrays into one pool. On PowerPC the number of TOC entries must be kept low for large applications. In order to reduce the number of constant global arrays we can pool them into one structure and then access them as the base address of that structure plus some offset. The constant global arrays may be arrays of `i8` which are constant strings but they may also be arrays of `i32, i64, etc...`. Reviewed By: lei, amyk Differential Revision: https://reviews.llvm.org/D155730	2023-09-07 11:14:56 -04:00
Stefan Pintilie	492c1f3d7c	[PowerPC] Merge rotate and clear into single instruction. This patch tries to catch a codegen opportunity where the rotate and mask can be merged into a single RLDCL instruction. Reviewed By: lei, amyk Differential Revision: https://reviews.llvm.org/D158328	2023-09-07 09:25:41 -04:00
Ting Wang	71be020dda	[SelectionDAG][PowerPC] Memset reuse vector element for tail store On PPC there are instructions to store element from vector(e.g. stxsdx/stxsiwx), and these instructions can be leveraged to avoid tail constant in memset and constant splat array initialization. This patch tries to explore these opportunities. Reviewed By: shchenz Differential Revision: https://reviews.llvm.org/D138883	2023-09-06 01:52:38 -04:00
Amy Kwan	f0b2f69541	[AIX][TLS] Generate .extern and .ref references to __tls_get_addr for local-exec accesses. Compiling with TLS variables requires -pthread, but if the user omits this option, the compiler will not show any obvious indication during compilation that -pthread is needed for programs using TLS variables. Instead, the user will experience a segmentation fault when running programs with TLS variables in them and without specifying -pthread. This patch aims to generate .extern/.ref references to __tls_get_addr[DS] for local-exec accesses, in order to trigger an error from the linker to indicate that there is an undefined symbol to __tls_get_addr. Doing so will remind the user to compile/link with -pthread. Differential Revision: https://reviews.llvm.org/D151335	2023-09-05 12:15:14 -05:00
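
The two fold cases described in the InstCombine and/or commit above (1770a2e325, reapplied as 060de415af) can be checked by brute force on small integers. The sketch below is not LLVM code and does not call simplifyWithOpReplaced(); it only models the reasoning in plain Python. The choice of `c` in each case (`a <= b` and `a < b`), the 4-bit value range, and all function names are assumptions made for illustration.

```python
from itertools import product

# Illustrative 4-bit unsigned domain; small enough to check exhaustively.
BITS = 4
VALUES = range(1 << BITS)


def fold_case_eq(a, b):
    """Case 1: (a == b) & c, where c = (a <= b).

    Replacing a with b inside c gives (b <= b), which is always true,
    so the whole expression should fold to (a == b).
    """
    original = (a == b) and (a <= b)
    folded = (a == b)
    return original, folded


def fold_case_ne(a, b):
    """Case 2: (a != b) & c, where c = (a < b).

    Replacing a with b inside c gives (b < b), which is always false,
    so c is already false whenever a == b and the expression folds to c.
    """
    original = (a != b) and (a < b)
    folded = (a < b)
    return original, folded


def check_all():
    """Return True if both folds agree with the original on every input."""
    for a, b in product(VALUES, repeat=2):
        for case in (fold_case_eq, fold_case_ne):
            original, folded = case(a, b)
            if original != folded:
                return False
    return True


if __name__ == "__main__":
    print(check_all())
```

An exhaustive check over a small domain only illustrates why the fold is sound for these two hypothetical choices of `c`; it says nothing about poison handling or llvm.is.constant, which are the issues the reapplied commit addresses.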

3729 Commits