llvm-project

Author	SHA1	Message	Date
Kamau Bridgeman	62c1cf7c63	[PowerPC][Future] Enable __builtin_mma_xxm[t|f]acc. Future CPU instructions dmxxinstdmr512 and dmxxextfdmr512 insert and extract quad vectors from the new wide accumulator (wacc) register class. The introduction of these new instructions renders the P10 instructions xxmtacc and xxmfacc obsolete, since the new wacc register class is a better choice for handling quad vector operations. This patch ensures that, for the Future CPU, instructions dmxxinstdmr512 and dmxxextfdmr512 are generated by custom lowering the intrinsics for xxm[t|f]acc to produce no instructions. Reviewed By: amyk, lei Differential Revision: https://reviews.llvm.org/D153034	2023-07-14 13:38:40 -05:00
Nemanja Ivanovic	b0e249d5e2	Reland "[PowerPC] Remove extend between shift and and". The commit originally caused a bootstrap failure on the big-endian PPC bot because the combine was interfering with the legalizer when applied to illegal types. This update restricts the combine to the only types for which it is actually needed. Tested on PPC BE bootstrap locally.	2023-07-07 14:45:05 -04:00
Nemanja Ivanovic	7cd9084c69	Revert "[PowerPC] Remove extend between shift and and". This reverts commit a57236de4eb8f38b4201647b10146941cbbb5c0b. Causes a bootstrap failure on ppc64be.	2023-07-05 20:04:49 -04:00
Nemanja Ivanovic	a57236de4e	[PowerPC] Remove extend between shift and and. The SDAG will sometimes insert an extend between the shift and an and (immediate) even though the immediate is narrower than the narrow size. This prevents us from producing a rotate instruction (such as rlwinm). This patch just adds a combine to move the extend onto the and. Differential revision: https://reviews.llvm.org/D152911	2023-07-05 16:33:07 -04:00
Elliot Goodrich	b0abd4893f	[llvm] Add missing StringExtras.h includes. In preparation for moving the `#include "llvm/ADT/StringExtras.h"` from the header to the source file of `llvm/Support/Error.h`, first add all the missing includes that were previously included transitively through this header.	2023-06-25 15:42:22 +01:00
Amy Kwan	f5ae075048	[AIX][TLS] Generate 32-bit local-exec access code sequence. This patch adds support for the TLS local-exec access model on AIX so that the 32-bit (specifically, non-optimized) code sequence can be generated. This work is a follow-up to D149722. The generated sequence is as follows: `.tc var[TC],var[TL]@le` (variable offset, with the le relocation specifier); `bla .__get_tpointer()` (get the thread pointer, modifies r3); `lwz reg1, var[TC](2)` (load the variable offset); `add reg2, r3, reg1` (add the variable offset to the retrieved thread pointer). Differential Revision: https://reviews.llvm.org/D152669	2023-06-20 11:57:38 -05:00
Amy Kwan	d5659808b2	[AIX][TLS] Generate 64-bit local-exec access code sequence. This patch adds support for the TLS local-exec access model on AIX so that the 64-bit (specifically, non-optimized) code sequence can be generated. The generated sequence loads the variable offset and then adds it to r13, which holds the thread pointer: `ld reg1,var[TC](2)` followed by `add reg2, reg1, r13`. The TOC (.tc pseudo-op) entries generated in the assembly files are also changed to add the @le relocation for the variable offset. Differential Revision: https://reviews.llvm.org/D149722	2023-06-19 12:17:30 -05:00
Matt Arsenault	eece6ba283	IR: Add llvm.ldexp and llvm.experimental.constrained.ldexp intrinsics. AMDGPU has native instructions and target intrinsics for this, but these really should be subject to legalization and generic optimizations. This will enable legalization of f16->f32 on targets without f16 support. Implement a somewhat horrible inline expansion for targets without libcall support. This could be better if we could introduce control flow (GlobalISel version not yet implemented). Support for strictfp legalization is less complete but works for the simple cases.	2023-06-06 17:07:18 -04:00
Qiu Chaofan	9e17e08324	[PowerPC] Combine fptoint-store under strict cases. Reviewed By: shchenz Differential Revision: https://reviews.llvm.org/D141249	2023-06-05 16:24:02 +08:00
Qiu Chaofan	590c6a1727	[PowerPC] Require FPCVT for store fptoi combination	2023-06-05 14:26:32 +08:00
Qiu Chaofan	69bc8ff766	Reland "[PowerPC] Simplify fp-to-int store optimization". The build failure should be fixed by de681d53. Follow-up refactoring will be done in future patches. This reverts commit e7c5ced0b9f0551ea17e1d2b48be86f03a772c59.	2023-06-05 13:53:08 +08:00
Craig Topper	6006d43e2d	LLVM_FALLTHROUGH => [[fallthrough]]. NFC. Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D150996	2023-05-24 12:40:10 -07:00
Vitaly Buka	e7c5ced0b9	Revert "[PowerPC] Simplify fp-to-int store optimization". Breaks https://lab.llvm.org/buildbot/#/builders/18/builds/9118. This reverts commit 8064caf83fb166b709bfe0e7641c5181341cb064.	2023-05-24 10:05:28 -07:00
Nemanja Ivanovic	de681d53ba	[PowerPC] Do not attempt to combine fptoui without FPCVT. Commit 8064caf83fb166b709bfe0e7641c5181341cb064 added a call to a function that performs this combine without checking whether the target supports FPCVT. This caused asserts to trip on BE bots, as the default target does not have this feature.	2023-05-24 11:14:26 -05:00
Krasimir Georgiev	c37ced7d02	silence an unused variable warning after 8064caf83fb166b709bfe0e7641c5181341cb064	2023-05-23 12:47:13 +00:00
Qiu Chaofan	8064caf83f	[PowerPC] Simplify fp-to-int store optimization. On PowerPC VSX targets, fp-to-int will be transformed into xscv with mfvsr. When the result is to be stored, mfvsr can be replaced by a direct store. This change simplifies the optimization by using existing fp-to-int code, which helps CSE and the handling of strictfp cases. Reviewed By: shchenz Differential Revision: https://reviews.llvm.org/D141473	2023-05-23 16:40:54 +08:00
Sergei Barannikov	da42b2846c	[CodeGen] Support allocating of arguments by decreasing offsets. Previously, `CCState::AllocateStack` always allocated stack space by increasing offsets. For targets with stack growing up (away from zero), it is more convenient to allocate arguments by decreasing offsets, so that the first argument is at the top of the stack. This is important when calling a function with a variable number of arguments: the callee does not know the size of the stack, but must be able to access "fixed" arguments. For that to work, the "fixed" arguments should have fixed offsets relative to the stack top, i.e., the variadic arguments area should be at the stack bottom (at lowest addresses). The in-tree target with stack growing up is AMDGPU, but it allocates arguments by increasing addresses. It does not support variadic arguments. A drive-by change is to promote stack size/offset to a 64-bit integer. This is what MachineFrameInfo expects. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D149575	2023-05-17 21:51:52 +03:00
Sergei Barannikov	01a7967447	[CodeGen] Replace CCState's getNextStackOffset with getStackSize (NFC). The term "next stack offset" is misleading because the next argument is not necessarily allocated at this offset due to alignment constraints. It also does not make much sense when allocating arguments at negative offsets (introduced in a follow-up patch), because the returned offset would be past the end of the next argument. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D149566	2023-05-17 21:51:45 +03:00
Zequan Wu	3977b77a6b	[CodeGen] Fix nomerge attribute not working in tail calls. In D79537, `nomerge` was made to only apply to non-tail calls. This fixes it by also applying it to tail calls. For ARM, I only made the new MI inherit the flag under `TCRETURNdi` and `TCRETURNri`, because that's where tail calls get replaced. Not sure if any other place needs it. Fixes #61545. Reviewed By: rnk Differential Revision: https://reviews.llvm.org/D146749	2023-05-10 14:25:11 -04:00
NAKAMURA Takumi	c1221251fb	Restore CodeGen/MachineValueType.h from `Support`. This is a rework of rG13e77db2df94 (r328395; MVT). Since `LowLevelType.h` has been restored to `CodeGen`, `MachineValueType.h` can be restored as well. Depends on D148767 Differential Revision: https://reviews.llvm.org/D149024	2023-05-03 00:13:20 +09:00
Kai Luo	eee024bf1b	[PowerPC] Update `incr` after resetting the register in MI. After performing sign extension, we update the register in MI. We should also update the `incr` register, which tracks the register in `MI`. Fixes https://github.com/llvm/llvm-project/issues/61882. Reviewed By: #powerpc, shchenz Differential Revision: https://reviews.llvm.org/D147594	2023-04-14 17:36:30 +08:00
Maryam Moghadas	cf0395f816	[PowerPC] Fix the xxperm swap requirements. This patch fixes the xxperm vector operand swap condition so that the single-use operand is in V2 to prevent copying; it also fixes the subtarget condition to exploit xxperm. Reviewed By: stefanp Differential Revision: https://reviews.llvm.org/D146632	2023-04-05 20:13:40 -05:00
Qiu Chaofan	5b8ea2d0e1	[PowerPC] Lower IS_FPCLASS by test data class instruction. Power ISA 3.0 introduced new 'test data class' instructions, which accept flags for NaN/Infinity/Zero/Denormal. This instruction can be used to implement custom lowering for llvm.is.fpclass, but some extra bits provided by the intrinsic are missing (normal and QNaN/SNaN). For those categories not natively supported, this patch uses a two-way or three-way combination to implement correct behavior. Reviewed By: sepavloff, shchenz Differential Revision: https://reviews.llvm.org/D140381	2023-04-03 11:37:17 +08:00
Craig Topper	219ff07f72	[Targets] Rename Flag->Glue. NFC. Long, long ago, Glue was called Flag, and it was never completely renamed.	2023-04-02 19:28:51 -07:00
Simon Pilgrim	8153b92d9b	[DAG] Add SelectionDAG::SplitScalar helper. Similar to the existing SelectionDAG::SplitVector helper, this helper creates the EXTRACT_ELEMENT nodes for the LO/HI halves of the scalar source. Differential Revision: https://reviews.llvm.org/D147264	2023-03-31 18:35:40 +01:00
Amy Kwan	6126356d82	[PowerPC] Implement 64-bit ELFv2 Calling Convention in TableGen (for integers/floats/vectors in registers). This patch partially implements the parameter passing rules outlined in the ELFv2 ABI within TableGen. Specifically, it implements the parameter assignment of integers, floats, and vectors within registers, where the GPR numbering will be "skipped" depending on the ordering of floats and vectors that appear within a parameter list. As we begin to adopt GlobalISel in the PowerPC backend, there is a need for a TableGen definition that encapsulates the ELFv2 parameter passing rules. Thus, this patch also changes the default calling convention that is returned within the ccAssignFnForCall() function used in our GlobalISel implementation, and also adds some additional testing of the calling convention that is implemented. Future patches that build on top of this initial TableGen definition will aim to add more of the ABI complexities, including support for additional types and also in-memory arguments. Differential Revision: https://reviews.llvm.org/D137504	2023-03-27 08:23:04 -05:00
Kazu Hirata	7bb6d1b32e	[llvm] Skip getAPIntValue (NFC). ConstantSDNode provides some convenience functions like isZero, getZExtValue, and isMinSignedValue that are named identically to those provided by APInt, so we can "skip" getAPIntValue.	2023-03-22 22:10:25 -07:00
Simon Pilgrim	da570ef1b4	[DAG] Match select(icmp(x,y),sub(x,y),sub(y,x)) -> abd(x,y) patterns. Pulled out of PowerPC, and added ABDS support as well (hence the additional v4i32 PPC matches). Differential Revision: https://reviews.llvm.org/D144789	2023-03-14 15:10:30 +00:00
Chen Zheng	a3b57bca97	[PowerPC] remove side effect for some cases for saturate instructions. Fixes #60684 Reviewed By: nemanjai Differential Revision: https://reviews.llvm.org/D145353	2023-03-13 21:37:56 -04:00
Yuanfang Chen	9aae408d55	[NFC] fix typo `funciton` -> `function`. Credits to @jmagee	2023-03-10 18:05:25 -08:00
Ting Wang	bd4562976c	[PowerPC][NFC] cleanup isEligibleForTCO. The input parameter IsByValArg to isEligibleForTCO() is false in all cases, so it is considered redundant and should be removed. Reviewed By: shchenz Differential Revision: https://reviews.llvm.org/D145028	2023-03-02 23:04:19 -05:00
Ting Wang	65f68812d3	[PowerPC] update PPCTTIImpl::supportsTailCallFor() check conditions. This patch reuses `PPCTargetLowering::isEligibleForTCO()` to check `PPCTTIImpl::supportsTailCallFor()`. Fixes #59315 Reviewed By: shchenz Differential Revision: https://reviews.llvm.org/D140369	2023-02-28 22:29:16 -05:00
Simon Pilgrim	8757ce4901	[PowerPC] Replace PPCISD::VABSD cases with generic ISD::ABDU(X,Y) node. A move towards using the generic ISD::ABDU nodes on more backends. Also support ISD::ABDS for v4i32 types using the existing signbit flip trick. PowerPC has a select(icmp_ugt(x,y),sub(x,y),sub(y,x)) -> abdu(x,y) combine that I intend to move to DAGCombiner in a future patch. The ABS(SUB(X,Y)) -> PPCISD::VABSD(X,Y,1) v4i32 combine wasn't legal (https://alive2.llvm.org/ce/z/jc2hLU), so I've removed it, having already added the equivalent legal sub nsw tests. Differential Revision: https://reviews.llvm.org/D142313	2023-02-25 20:17:17 +00:00
Ting Wang	d567e06946	[PowerPC][NFC] refactor eligible check for tail call optimization. The check logic for TCO is scattered across two functions, IsEligibleForTailCallOptimization_64SVR4() and IsEligibleForTailCallOptimization(), and currently serves only the instruction selection phase. This patch refactors the existing logic to export an API for querying TCO eligibility before the instruction selection phase. Reviewed By: shchenz, nemanjai Differential Revision: https://reviews.llvm.org/D141673	2023-02-21 06:14:47 -05:00
esmeyi	fd226142fc	[AIX] Lower some memory intrinsics to millicode functions on AIX. Summary: Currently we lower MEMCPY/MEMMOVE/MEMSET/BZERO to the corresponding libc functions, and the libc functions call the millicode functions on AIX. We can lower these intrinsics directly to save one call layer. Reviewed By: shchenz Differential Revision: https://reviews.llvm.org/D143997	2023-02-20 22:25:49 -05:00
Kazu Hirata	f8f3db2756	Use APInt::count{l,r}_{zero,one} (NFC)	2023-02-19 22:04:47 -08:00
Kazu Hirata	7e6e636fb6	Use llvm::has_single_bit<uint32_t> (NFC). This patch replaces isPowerOf2_32 with llvm::has_single_bit<uint32_t> where the argument is wider than uint32_t.	2023-02-15 22:17:27 -08:00
Matt Arsenault	09dd4d870e	DAG: Remove hasBitPreservingFPLogic. This doesn't make sense as an option. fneg and fabs are bit-preserving by definition. If a target has some fneg or fabs instructions that are not bit-preserving, it's incorrect to lower fneg/fabs to use them.	2023-02-14 10:25:24 -04:00
Kazu Hirata	64dad4ba9a	Use llvm::bit_cast (NFC)	2023-02-14 01:22:12 -08:00
Philip Reames	3be1ae24fb	[CodeGen] Add standard print/debug utilities to MVT. Doing so makes it easier to do printf-style debugging in an idiomatic manner. I followed the code structure of Value, with only the definition of dump being #ifdef'd out in non-debug builds. Not sure if this is the "right" option; we don't seem to have any single consistent scheme for how dump is handled. Note: This is a follow-up to D143454, which did the same for EVT. Differential Revision: https://reviews.llvm.org/D143511	2023-02-07 10:50:14 -08:00
Kazu Hirata	e078201835	[Target] Use llvm::count{l,r}_{zero,one} (NFC)	2023-01-28 09:23:07 -08:00
Kazu Hirata	f20b5071f3	[llvm] Use llvm::bit_floor instead of llvm::PowerOf2Floor (NFC)	2023-01-28 09:06:31 -08:00
Matt Arsenault	778cf5431c	IR: Add atomicrmw uinc_wrap and udec_wrap. These are essentially add/sub 1 with a clamping value. AMDGPU has instructions for these. CUDA/HIP expose these as atomicInc/atomicDec. Currently we use target intrinsics for these, but those do not carry the ordering and syncscope. Add these to atomicrmw so we can carry these and benefit from the regular legalization processes.	2023-01-24 17:55:11 -04:00
Guillaume Chatelet	8b1d86aedf	[NFC] Deprecate SelectionDAG::getLoad that takes alignment as unsigned	2023-01-24 09:42:36 +00:00
Craig Topper	79858d1908	[CodeGen][Target] Remove uses of Register::isPhysicalRegister/isVirtualRegister. NFC. Use isPhysical/isVirtual methods.	2023-01-13 23:12:48 -08:00
Guillaume Chatelet	8fd5558b29	[NFC] Use TypeSize::getFixedValue() instead of TypeSize::getFixedSize(). This change is one of a series to implement the discussion from https://reviews.llvm.org/D141134.	2023-01-11 16:49:38 +00:00
serge-sans-paille	38818b60c5	Move from llvm::makeArrayRef to ArrayRef deduction guides - llvm/ part. Use deduction guides instead of helper functions. The only non-automatic changes have been: 1. ArrayRef(some_uint8_pointer, 0) needs to be changed into ArrayRef(some_uint8_pointer, (size_t)0) to avoid an ambiguous call with ArrayRef((uint8_t), (uint8_t)). 2. CVSymbol sym(makeArrayRef(symStorage)); needed to be rewritten as CVSymbol sym{ArrayRef(symStorage)}; otherwise the compiler is confused and thinks we have a (bad) function prototype. There were a few similar situations across the codebase. 3. ADL doesn't seem to work the same for deduction guides and functions, so at some point the llvm namespace must be explicitly stated. 4. The "reference mode" of makeArrayRef(ArrayRef<T> &) that acts as a no-op is not supported (a constructor cannot achieve that). Per reviewers' comments, some useless makeArrayRef calls have been removed in the process. This is a follow-up to https://reviews.llvm.org/D140896, which introduced the deduction guides. Differential Revision: https://reviews.llvm.org/D140955	2023-01-05 14:11:08 +01:00
Stefan Pintilie	c1d0118459	[PowerPC] Materialize floats in the range [-16.0, 15.0]. Prior to this patch, we only materialized 0.0; all other floating-point values were loaded from the TOC. This patch adds materialization for the floating-point values that can be represented as integers in [-16.0, 15.0]. For example, we will now materialize 3.0 and -5.0 but not 4.7. Reviewed By: nemanjai, lei, #powerpc Differential Revision: https://reviews.llvm.org/D138844	2023-01-04 12:52:30 -06:00
Lei Huang	7a7e9109a2	[PowerPC] Implement P10 Byte Reverse Instructions. Generate brh, brw and brd instructions for byte-swap operations on P10, and generate a single instruction for a 32-bit swap followed by a 16-bit right shift. Reviewed By: stefanp Differential Revision: https://reviews.llvm.org/D140414	2022-12-21 09:15:57 -06:00
Qiu Chaofan	a40ef656d8	[Intrinsic] Rename flt.rounds intrinsic to get.rounding. Address the inconsistency between the FLT_ROUNDS_ and SET_ROUNDING SDAG nodes. Rename FLT_ROUNDS_ to GET_ROUNDING and add the llvm.get.rounding intrinsic to replace flt.rounds. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D139507	2022-12-19 15:22:39 +08:00
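
Commit 778cf5431c describes the new atomicrmw operations uinc_wrap and udec_wrap as "add/sub 1 with a clamping value". The C++ sketch below spells out the value each operation stores. It assumes the LangRef definitions:

- uinc_wrap stores `old >= val ? 0 : old + 1`.
- udec_wrap stores `(old == 0 || old > val) ? val : old - 1`.

It also shows one way to emulate the read-modify-write with a `std::atomic` compare-exchange loop on a target with no native instruction. The helper names are hypothetical and are not LLVM APIs. The emulation is an illustration, not LLVM's own expansion.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical helpers (not LLVM APIs): the new value that atomicrmw
// uinc_wrap / udec_wrap would store, given the old memory value and the
// operand `val`, assuming the LangRef semantics.

// uinc_wrap: add 1, but wrap to 0 once the old value reaches `val`.
constexpr uint32_t uincWrap(uint32_t old, uint32_t val) {
  return old >= val ? 0u : old + 1u;
}

// udec_wrap: subtract 1, but wrap to `val` when the old value is 0 or
// already above `val`.
constexpr uint32_t udecWrap(uint32_t old, uint32_t val) {
  return (old == 0u || old > val) ? val : old - 1u;
}

// Emulates the atomic read-modify-write with a compare-exchange loop.
// Like atomicrmw, it returns the value that was in memory before the update.
uint32_t atomicWrapRMW(std::atomic<uint32_t> &mem, uint32_t val,
                       uint32_t (*op)(uint32_t, uint32_t)) {
  uint32_t old = mem.load(std::memory_order_relaxed);
  // On failure, compare_exchange_weak stores the current contents into
  // `old`, so the next iteration recomputes the new value from them.
  while (!mem.compare_exchange_weak(old, op(old, val),
                                    std::memory_order_seq_cst,
                                    std::memory_order_relaxed)) {
  }
  return old;
}
```

For example, `atomicWrapRMW(counter, 3u, uincWrap)` applied repeatedly to a counter that starts at 0 cycles it through 0, 1, 2, 3, 0, and so on. This is the wrap-around counter behavior the commit attributes to CUDA/HIP atomicInc.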

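The ABD commits (da570ef1b4, 8757ce4901) rely on the identity select(icmp ugt(x, y), sub(x, y), sub(y, x)) == abdu(x, y). The C++ sketch below illustrates why it holds, assuming 8-bit unsigned lanes:

- Both subtractions are computed modulo 2^8, as the two ISD::SUB nodes would be.
- The unsigned comparison selects the one that did not wrap.
- The reference result computes the absolute difference in a wider type, where nothing wraps.

The function names are hypothetical. This is not the DAGCombiner implementation.

```cpp
#include <cstdint>

// Reference abdu: absolute difference computed in int, where nothing wraps.
constexpr uint8_t abduRef(uint8_t x, uint8_t y) {
  int d = int(x) - int(y);
  return static_cast<uint8_t>(d < 0 ? -d : d);
}

// The matched pattern: both subtractions wrap modulo 2^8, and the unsigned
// compare picks the one that did not underflow.
constexpr uint8_t selectSubPattern(uint8_t x, uint8_t y) {
  const uint8_t xMinusY = static_cast<uint8_t>(x - y);
  const uint8_t yMinusX = static_cast<uint8_t>(y - x);
  return x > y ? xMinusY : yMinusX;
}

// Exhaustive comparison of the two forms over all 65536 (x, y) pairs.
bool patternMatchesAbduForAllU8() {
  for (unsigned x = 0; x < 256; ++x)
    for (unsigned y = 0; y < 256; ++y)
      if (selectSubPattern(uint8_t(x), uint8_t(y)) !=
          abduRef(uint8_t(x), uint8_t(y)))
        return false;
  return true;
}
```

`patternMatchesAbduForAllU8()` returns true. Brute force is cheap for 8-bit lanes. Wider types call for a proof tool such as Alive2, which the 8757ce4901 message cites when explaining why the old ABS(SUB) combine was removed.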

1798 Commits