llvm-project

Author	SHA1	Message	Date
Craig Topper	a64b3e92c7	[RISCV] Re-define sha256, Zksed, and Zksh intrinsics to use i32 types. Previously we returned i32 on RV32 and i64 on RV64. The instructions only consume 32 bits and only produce 32 bits. For RV64, the result is sign extended to 64 bits like *W instructions. This patch removes this detail from the interface to improve portability and consistency. This matches the proposal for scalar intrinsics here https://github.com/riscv-non-isa/riscv-c-api-doc/pull/44 I've included IR autoupgrade support as well. I'll be doing this for other builtins/intrinsics that currently use 'long' in other patches. Reviewed By: VincentWu Differential Revision: https://reviews.llvm.org/D154647	2023-07-17 08:58:29 -07:00
Craig Topper	fda45d9198	[RISCV] Add FP compare test to condops.ll to show a missed opportunity to remove an xori. NFC This is a case that D155288 won't get. Reviewed By: asb Differential Revision: https://reviews.llvm.org/D155327	2023-07-17 08:47:42 -07:00
Simon Pilgrim	e9caa37e9c	[DAG] Move lshr narrowing from visitANDLike to SimplifyDemandedBits Inspired by some of the cases from D145468 Let SimplifyDemandedBits handle the narrowing of lshr to half-width if we don't require the upper bits, the narrowed shift is profitable and the zext/trunc are free. A future patch will propose the equivalent shl narrowing combine. Differential Revision: https://reviews.llvm.org/D146121	2023-07-17 15:50:09 +01:00
Amaury Séchet	a23d6c760c	[NFC] Add test case for D154533.	2023-07-17 14:19:15 +00:00
Jay Foad	92542f2a40	[AMDGPU] Add targets gfx1150 and gfx1151 This is the target definition only. Currently they are treated the same as GFX 11.0.x. Differential Revision: https://reviews.llvm.org/D155429	2023-07-17 13:06:12 +01:00
Jay Foad	a2453c6130	[AMDGPU] Add test case for zext of f16 to i32 Preserve the test case from this abandoned review: D51925 [AMDGPU] Fix issue for zext of f16 to i32	2023-07-17 12:55:29 +01:00
Simon Pilgrim	fd2de54920	[X86] Canonicalize vXi64 SIGN_EXTEND_INREG vXi1 to use v2Xi32 splatted shifts instead If somehow a vXi64 bool sign_extend_inreg pattern has been lowered to vector shifts (without PSRAQ support), then try to canonicalize to vXi32 shifts to improve the likelihood of value tracking being able to fold them away. Using a PSLLQ and a bitcasted PSRAD node makes it very difficult for later folds to recover from this.	2023-07-17 10:18:03 +01:00
David Sherwood	cc68e05bd2	[SVE][CodeGen] Improve codegen for some zero-extends of masked loads When doing a masked load of an illegal unpacked type and then zero-extending to some illegal wider types we sometimes end up with pointless 'and' instructions that are trying to zero bits that we already know are zero. This patch fixes that by adding more cases to performSVEAndCombine. Differential Revision: https://reviews.llvm.org/D155281	2023-07-17 08:19:27 +00:00
Luke Lau	b5bcd4f60b	[RISCV] Add VL nodes and VP patterns for unary zvbb instructions This follows the pattern of lowering VP nodes to equivalent RISCVISD::*_VL nodes. The nodes are modelled after the VP ISD nodes rather than the actual zvbb instructions, and I've included a merge operand to be consistent with the underlying pseudos that were recently refactored. I've defined the nodes in RISCVInstrInfoVVLpatterns.td as the nodes aren't Zvk specific, but the patterns are in RISCVInstrInfoZvk.td. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D155229	2023-07-17 09:17:58 +01:00
David Sherwood	6a036316b3	[SVE][CodeGen] Add more test cases for zero-extends of masked loads This patch adds test cases for extending masked loads of illegal unpacked types into illegal wider types. Pre-commits tests for D155281	2023-07-17 08:06:15 +00:00
Piyou Chen	7ce4e933ea	[RISCV] Implement prefetch locality by NTLH We add the MemOperand, and then the backend generates the NTLH hint automatically. ``` __builtin_prefetch(ptr, 0 /* rw==read */, 0 /* locality */); => ntl.all + prefetch.r (ptr) __builtin_prefetch(ptr, 0 /* rw==read */, 1 /* locality */); => ntl.pall + prefetch.r (ptr) __builtin_prefetch(ptr, 0 /* rw==read */, 2 /* locality */); => ntl.p1 + prefetch.r (ptr) __builtin_prefetch(ptr, 0 /* rw==read */, 3 /* locality */); => prefetch.r (ptr) ``` Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D154691	2023-07-16 20:32:46 -07:00
Jay Foad	a1a9c53ae7	[GlobalISel] Fix infinite loop in reassociation combine Don't reassociate (C1+C2)+Y -> C1+(C2+Y). Fixes https://github.com/llvm/llvm-project/issues/63849 Differential Revision: https://reviews.llvm.org/D155284	2023-07-16 14:15:24 +01:00
Jim Lin	348c67e254	[RISCV] Merge rv32/rv64 vector single-width shift intrinsic tests that have the same content. NFC.	2023-07-16 13:09:10 +08:00
Brad Smith	7973d51965	[Mips] Set setMaxAtomicSizeInBitsSupported Set setMaxAtomicSizeInBitsSupported for Mips. Set the value as appropriate for 64-bit MIPS vs 32-bit. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D141189	2023-07-15 17:29:25 -04:00
Jon Chesterfield	6043d4dfec	[amdgpu] Accept an optional max to amdgpu-lds-size attribute for use in PromoteAlloca	2023-07-15 21:37:21 +01:00
Stephen Peckham	ac5d5351d4	Use empty symbol name for XCOFF text csect When generating XCOFF, the compiler generates a csect with an internal name. Each function results in a label within the csect. This patch replaces the internal name ".text" with an empty string "". This avoids adding special code to handle a function text() in the source file, and works better with some XCOFF tools that are confused when the csect and the first function have the same address. Reviewed By: hubert.reinterpretcast Differential Revision: https://reviews.llvm.org/D154854	2023-07-15 16:13:48 -04:00
Nitin John Raj	6a35ceaacf	[RISCV][GlobalISel] Legalize add, sub and binary logical instructions for narrow types For rv32, we test the legalization of i8, i16 and i32. For rv64, we additionally test the legalization of i64. This is the first of a series of commits aiming to legalize arithmetic instructions for RISCV. Reviewed By: craig.topper, arsenm Differential Revision: https://reviews.llvm.org/D154978	2023-07-14 18:22:53 -07:00
Matt Arsenault	ef4a2b6096	AMDGPU: Expand testing of AMDGPUCodeGenPrepare fdiv handling - Switch to generated checks - Use a different run line per denormal mode to reduce test duplication - Add test coverage for rsqrt cases - Add test coverage for repeated arcp denominator - Fix the optnone test	2023-07-14 18:57:40 -04:00
Maurice Heumann	a1cdb323e2	[ARM] Adjust strd/ldrd codegen alignment requirements In change https://reviews.llvm.org/D152790, it was discovered that the alignment requirement calculation for LDRD/STRD codegen was suboptimal, and the calculation for volatile loads and stores was adjusted. This change adopts the same calculation for the remaining non-volatile occurrences. Recommitting after the undefined-behavior fix in D155093. Differential Revision: https://reviews.llvm.org/D153800	2023-07-14 12:54:18 -07:00
Konstantina Mitropoulou	21ca892f69	[NFC][AMDGPU] Add automated tests in or.ll Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D155265	2023-07-14 12:39:14 -07:00
Mikhail Gudim	c158ddd99e	Reapply [RISCV] Fold binary op into select if profitable. This fixes some bugs in the original commit: (1) Operands are passed in correct order when creating new constant and the binary operator. New tests were added to cover these cases. (2) Check was added to see if it is safe to commute the select and the binary operator. Reviewed By: Craig Topper Differential Revision: https://reviews.llvm.org/D152147	2023-07-14 15:30:54 -04:00
Kamau Bridgeman	62c1cf7c63	[PowerPC][Future] Enable __builtin_mma_xxm[t|f]acc Future cpu instructions dmxxinstdmr512 and dmxxextfdmr512 insert and extract quad vectors from the new wide accumulator (wacc) register class. The introduction of these new instructions renders the p10 instructions xxmtacc and xxmfacc obsolete, since the new wacc register class is a better choice for handling quad vector operations. This patch ensures that, for future cpu, instructions dmxxinstdmr512 and dmxxextfdmr512 are generated by custom lowering the intrinsics for xxm[t|f]acc to produce no instructions. Reviewed By: amyk, lei Differential Revision: https://reviews.llvm.org/D153034	2023-07-14 13:38:40 -05:00
Craig Topper	3a0a25f9b6	[RISCV] Support i32 clmul* intrinsics on RV64. We can use an i64 clmul to emulate i32 clmul. For clmulh and clmulr we need to zero extend the 32 bit input to 64 bits then extract either bits [63:32] or [62:31]. Unfortunately, without Zba we need to use 2 shifts for the zero extends. These can be optimized out later if the producing instruction already zeroed the upper bits or if we can use lwu. There are alternative sequences we can use for clmulh/clmulr when the zero extend isn't free, but those are best handled by a DAG combine to give the best opportunity for removing the extend. This allows us to implement i32 clmul C intrinsics proposed in https://github.com/riscv-non-isa/riscv-c-api-doc/pull/44. Reviewed By: asb Differential Revision: https://reviews.llvm.org/D154729	2023-07-14 11:20:03 -07:00
Simon Cook	4083ecfd7f	[RISCV] Cleanups in CORE-V (xcv) extensions This is a mostly NFC change cleaning up and clarifying components of the in-tree CORE-V (xcv*) extensions following discussions on the remaining extensions. This makes the following changes to the xcbitmanip and xcvmac support: 1. Add missing extensions from RISCVISAInfo, such that they can be supported in clang's -march option. 2. Clarify the extension version number is 1.0.0 in documentation. 3. Clarify the extensions are by OpenHW Group, and the capitalization of the CORE-V extension family. 4. Add CORE-V to extension name in RISCVFeatures, both to be consistent with other vendors, and also better distinguish e.g. CORE-V bit manipulation vs RISC-V's standard Zb extensions. Differential Revision: https://reviews.llvm.org/D155283	2023-07-14 18:21:08 +01:00
Serge Pavlov	be794e3d92	[X86][FPEnv] Lowering of {get,set,reset}_fpenv The change implements lowering of `get_fpenv`, `set_fpenv` and `reset_fpenv`. Differential Revision: https://reviews.llvm.org/D81833	2023-07-14 22:10:53 +07:00
Serge Pavlov	25a81a1871	Precommit tests on lowering *_fpenv on X86	2023-07-14 22:10:12 +07:00
Alex Bradbury	95075d3d2c	[RISCV][test] Add RV32I and RV64I RUN lines to condops.ll test Some of these test cases will be changed by upcoming combines, even in the non-zicond case.	2023-07-14 13:29:40 +01:00
Alex Bradbury	5c5a1a2927	[RISCV] Introduce RISCVISD::CZERO_{EQZ,NEZ} nodes and produce them when zicond is present in lowerSELECT This patch is a step towards altering how we handle the emission of condops. Marking ISD::SELECT as legal is a major change in the codegen path, and gives few options for maintaining the old codegen path when it is believed to be better (e.g. a better branchless sequence is possible using non-zicond instructions, or the branch-based sequence is preferable). This removes the existing SelectionDAG patterns and moves the logic into lowerSELECT. Alongside some small codegen changes, you'll note a few minor regressions in the generated code quality; these are due to the fact that by lowering the SELECT node early we miss out on combines that would kick in later when setcc condcodes that aren't natively supported have been expanded (thus exposing opportunities for optimisation by performing logical negation and swapping truev/falsev). I've opted to split out work that addresses these into follow-on patches (especially as zicond is still 'experimental'). matchSetCC is a straightforward translation from the version in RISCVISelDAGToDAG. Ideally, in the future it can be converted to a helper shared between both files. Differential Revision: https://reviews.llvm.org/D155083	2023-07-14 11:31:27 +01:00
Simon Pilgrim	720debcf64	[X86] Fold PACKSS(NOT(X),NOT(Y)) -> NOT(PACKSS(X,Y))	2023-07-14 10:36:21 +01:00
David Green	edf9f88566	[AArch64] Handle 64bit vector s/umull from extracts This is similar to D153632, but for mul nodes instead of add/sub. They get recognised in LowerMUL in order to detect the mul(ext, ext), in a way that will work for i64 nodes as well as i16/i32. This extends it to look for mul(subvector_extract(ext(x), 0), subvector_extract(ext(y), 0)), generating a subvector_extract(mull(x,y)) if it matches. Differential Revision: https://reviews.llvm.org/D154063	2023-07-14 10:25:12 +01:00
Yeting Kuo	2ac99205ee	[RISCV] Narrow types of index operand matched pattern (shl (zext), C). (shl (zext to iXLenVec), C) is a possible pattern in auto-vectorized code for indexed loads/stores. But extending to iXLen may be too aggressive, since RVV indexed load/store instructions already zero-extend their index operand to XLEN. The patch tries to narrow the type of the zero extension, which helps decrease register pressure. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D154687	2023-07-14 15:45:44 +08:00
XinWang10	2d6a5ab5eb	[X86]Recommit D154193 - Remove TEST in AND32ri+TEST16rr in peephole-opt Previously we removed the TEST in a pattern like: %reg = and32ri %in_reg, 5 ... // EFLAGS not changed. %src_reg = subreg_to_reg 0, %reg, %subreg.sub_index test64rr %src_reg, %src_reg, implicit-def $eflags The test64rr can be removed since it has the same effect as the and, but the subreg_to_reg blocked the optimization in the previous code, so we handled this case specially. The following case can be optimized for the same reason: %reg = and32ri %in_reg, 5 ... // EFLAGS not changed. %src_reg = copy %reg.sub_16bit:gr32 test16rr %src_reg, %src_reg, implicit-def $eflags The COPY from gr32 to gr16 also blocks the optimization in the previous code, so we handle it specially, as we did for test64rr. Reviewed By: skan Differential Revision: https://reviews.llvm.org/D154193	2023-07-14 03:42:42 -04:00
pvanhout	e5296c52e5	[AMDGPU] Relax restrictions on unbreakable PHI users in BreakLargePHis The previous heuristic rejected a PHI if one of its users was an unbreakable PHI, no matter what the other users were. This worked well in most cases, but there's one case in rocRAND where it doesn't work. In that case, a PHI node has 2 PHI users where one is breakable but not the other. When that PHI node isn't broken, performance falls by 35%. Relaxing the restriction to "require that half of the PHI node users are breakable" fixes the issue, and seems like a sensible change. Solves SWDEV-409648, SWDEV-398393 Reviewed By: #amdgpu, arsenm Differential Revision: https://reviews.llvm.org/D155184	2023-07-14 09:02:51 +02:00
Weining Lu	ef33d6cbfc	[XRay] Add initial support for loongarch64 Only support patching FunctionEntry/FunctionExit/FunctionTailExit for now. Reviewed By: MaskRay, xen0n Co-Authored-By: zhanglimin <zhanglimin@loongson.cn> Differential Revision: https://reviews.llvm.org/D140727	2023-07-14 09:27:13 +08:00
Sean Fertile	5e28d30f1f	[XCOFF][AIX] Peephole optimization for toc-data. Followup to D101178 - peephole optimization that converts a load address instruction and a consuming load/store into just the load/store when it's safe to do so. e.g. converts the 2-instruction code sequence la 4, i[TD](2) stw 3, 0(4) to stw 3, i[TD](2) Differential Revision: https://reviews.llvm.org/D101470	2023-07-13 20:40:09 -04:00
Jon Chesterfield	d3316bc111	[amdgpu] Delete elide-module-lds attribute Requires D155190 Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D155238	2023-07-14 00:36:33 +01:00
Jon Chesterfield	74e928a081	[amdgpu][lds] Remove recalculation of LDS frame from backend Do the LDS frame calculation once, in the IR pass, instead of repeating the work in the backend. Prior to this patch: The IR lowering pass sets up a per-kernel LDS frame and annotates the variables with absolute_symbol metadata so that the assembler can build lookup tables out of it. There is a fragile association between kernel functions and named structs which is used to recompute the frame layout in the backend, with fatal_errors catching inconsistencies in the second calculation. After this patch: The IR lowering pass additionally sets a frame size attribute on kernels. The backend uses the same absolute_symbol metadata that the assembler uses to place objects within that frame size. Deleted the now dead allocation code from the backend. Left for a later cleanup: - enabling lowering for anonymous functions - removing the elide-module-lds attribute (test churn, it's not used by llc any more) - adjusting the dynamic alignment check to not use symbol names Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D155190	2023-07-13 23:54:38 +01:00
Craig Topper	0aecddcee9	[RISCV] Add Zce extension. According to the spec, Zce is an alias for Zca, Zcb, Zcmp, and Zcmt. If F is enabled on RV32 it also includes Zcf. This patch adds the Zce and the implication rule which unfortunately requires custom handling for adding Zcf. I've also made all the Zc* extensions imply Zca. I've also added an error for Zcf without RV32. Reviewed By: asb Differential Revision: https://reviews.llvm.org/D153742	2023-07-13 12:22:06 -07:00
Nemanja Ivanovic	329b8cd3e3	[PowerPC] Improve code gen for vector add Improve codegen for vectors modulo additions. Reviewed By: nemanjai Differential Revision: https://reviews.llvm.org/D154447	2023-07-13 15:21:49 -04:00
Jeffrey Byrnes	6b7805fcb1	[AMDGPU][IGLP] Add iglp_opt(1) strategy for single wave gemms This adds the IGLP strategy for single-wave gemms. The SchedGroup pipeline is laid out in multiple phases, with each phase corresponding to a distinct pattern present in gemm kernels. The resilience of the optimization is dependent upon IR (as seen by pre-RA scheduling) continuing to have these patterns (as defined by instruction class and dependencies) in their current relative ordering. The kernels of interest have these specific phases: NT: 1, 2a, 2c NN: 1, 2a, 2b TT: 1, 2b, 2c TN: 1, 2b The general approach taken was to have a long SchedGroup pipeline. In this way the scheduler will have less capability of doing the wrong thing. In order to resolve the challenge of correctly fitting these long pipelines, we leverage the rules infrastructure to help the solver. Differential Revision: https://reviews.llvm.org/D149773 Change-Id: I1a35962a95b4bdf740602b8f110d3297c6fb9d96	2023-07-13 12:03:04 -07:00
Ivan Kosarev	289ae6525d	[AMDGPU][MC] Fix handling of A16 operands in intersect_ray instructions. The patch adds the support for 'noa16' operands in non-A16 variants of the instructions, fixes validation of A16 operands and eliminates the custom conversion to MCInst. Part of <https://github.com/llvm/llvm-project/issues/62629>. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D155057	2023-07-13 19:46:03 +01:00
Luke Lau	55e2772e9f	[RISCV] Add initial SDNode patterns for unary zvbb instructions This patch adds pseudos and SDNode patterns for vbrev.v, vrev8.v, vclz.v, vctz.v and vcpop.v. I've only added them for integer element types so far since we're lacking tests for floats. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D155216	2023-07-13 19:39:04 +01:00
Amara Emerson	432338a673	Don't assert on a non-pointer value being used for a "p" inline asm constraint. GCC and existing codebases allow integral values to be used with this constraint. A recent change D133914 in this area started causing asserts. Removing the assert is enough, as the rest of the code works fine. rdar://109675485 Differential Revision: https://reviews.llvm.org/D155023	2023-07-13 10:45:56 -07:00
Simon Pilgrim	c660a2f0ab	[X86] Fold ANDNP(X,NOT(Y)) -> NOT(OR(X,Y)) Removing the x86-specific node helps further folding and improves commutativity	2023-07-13 16:56:20 +01:00
Mateja Marjanovic	fa46feb314	[AMDGPU] Use V_FMA_MIX* more often Combine mul (f32) + fptrunc (f32->f16) to "v_fma_mixlo_f16 mulSrc1, mulSrc2, 0". Differential Revision: https://reviews.llvm.org/D153544 Reviewers: arsenm, foad	2023-07-13 16:56:16 +02:00
Mateja Marjanovic	d3140f9363	Precommit for more usage of V_FMA/MAD_MIX* Make fdiv.f16.ll autogenerated.	2023-07-13 16:26:21 +02:00
pvanhout	07c5920487	Reland "[AMDGPU] Wave32 CodeGen for amdgcn.ballot.i64" This time without the extra `->dump()` A recent addition to the device libs, `__ockl_dm_trim`, caused a series of failures at O0 due to an i64 ballot intrinsic being inlined into a wave32 function. The quick fix for this is to support codegen for this rare case. A proper long-term fix for this type of issue is still being discussed. Fixes SWDEV-408929, SWDEV-408957, SWDEV-409885, SWDEV-410193 Reviewed By: #amdgpu, arsenm Differential Revision: https://reviews.llvm.org/D155050	2023-07-13 15:58:48 +02:00
pvanhout	aec971adec	Revert "[AMDGPU] Wave32 CodeGen for amdgcn.ballot.i64" This reverts commit cfa2d0a3aa0beb5422107dc9943cb0eae6d93896.	2023-07-13 15:52:27 +02:00
Mateja Marjanovic	701c4adcea	Check for denormal flushing when selecting V_FMA/MAD_MIX*	2023-07-13 15:26:20 +02:00
Oliver Stannard	aea8db8eb9	Revert "[CodeGen] Store SP adjustment in MachineBasicBlock. NFCI." This reverts commit 58d1eaa3b6ce4f7285c51f83faff7a3ac374c746.	2023-07-13 14:25:39 +01:00
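
The Python sketch below is an illustration only, not LLVM code. It models what commit a64b3e92c7 means by giving the sha256 intrinsics an i32 type. The operation consumes and produces 32 bits. On RV64, the destination register holds that 32-bit result sign-extended to 64 bits, as with the *W instructions. `sha256sum0` serves as a representative and is written from the SHA-256 Σ0 definition (the XOR of right-rotations by 2, 13, and 22). The helper names are hypothetical.

```python
MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1


def ror32(x: int, n: int) -> int:
    """Rotate a 32-bit value right by n bits (0 < n < 32)."""
    x &= MASK32
    return ((x >> n) | (x << (32 - n))) & MASK32


def sha256sum0(x: int) -> int:
    """SHA-256 Sigma0 on one 32-bit word: the i32 -> i32 view of the operation."""
    return ror32(x, 2) ^ ror32(x, 13) ^ ror32(x, 22)


def rv64_register_image(x32: int) -> int:
    """64-bit register contents on RV64: the 32-bit result, sign-extended like *W ops."""
    x32 &= MASK32
    signed = x32 - (1 << 32) if x32 & 0x8000_0000 else x32
    return signed & MASK64


result = sha256sum0(2)
print(hex(result), hex(rv64_register_image(result)))  # prints 0x80100800 0xffffffff80100800
```

Only the low 32 bits carry information, so an i32 type states the contract directly. The RV64 sign extension then becomes a lowering detail rather than part of the interface.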

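As a sanity check on the bit ranges quoted in commit 3a0a25f9b6, this Python sketch (a model, not LLVM's lowering code) emulates 32-bit clmul, clmulh, and clmulr with a single 64-bit clmul. Masking each input to 32 bits stands in for the zero extension. The carry-less product of two 32-bit values has degree at most 62, so it fits entirely in 64 bits: clmulh is bits [63:32] and clmulr is bits [62:31]. The `*_ref` functions are written from the usual XLEN=32 definitions of these operations. They, and all other function names here, are assumptions of this sketch.

```python
MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1


def clmul64(a: int, b: int) -> int:
    """Low 64 bits of the carry-less product, i.e. an RV64 clmul."""
    out = 0
    for i in range(64):
        if (b >> i) & 1:
            out ^= a << i
    return out & MASK64


# 32-bit emulation: masking the inputs models the zero extension to 64 bits.
def clmul32_via64(a: int, b: int) -> int:
    return clmul64(a & MASK32, b & MASK32) & MASK32


def clmulh32_via64(a: int, b: int) -> int:
    return (clmul64(a & MASK32, b & MASK32) >> 32) & MASK32  # bits [63:32]


def clmulr32_via64(a: int, b: int) -> int:
    return (clmul64(a & MASK32, b & MASK32) >> 31) & MASK32  # bits [62:31]


# Reference definitions written directly for XLEN = 32.
def clmul32_ref(a: int, b: int) -> int:
    a &= MASK32
    out = 0
    for i in range(32):
        if (b >> i) & 1:
            out ^= a << i
    return out & MASK32


def clmulh32_ref(a: int, b: int) -> int:
    a &= MASK32
    out = 0
    for i in range(1, 32):
        if (b >> i) & 1:
            out ^= a >> (32 - i)
    return out & MASK32


def clmulr32_ref(a: int, b: int) -> int:
    a &= MASK32
    out = 0
    for i in range(32):
        if (b >> i) & 1:
            out ^= a >> (31 - i)
    return out & MASK32
```

For plain clmul, stray upper bits in an RV64 source register only move further up, so the low 32 bits are unaffected. For clmulh and clmulr, those bits would leak into bits [63:31]. This is consistent with the commit requiring the zero extension only for clmulh and clmulr.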

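The fold in commit 720debcf64 relies on signed saturation commuting with bitwise NOT. In two's complement `~v == -1 - v`, and that map sends the int8 range [-128, 127] onto itself. Clamping before or after the NOT therefore gives the same result. The Python model below checks this exhaustively for the 16-to-8-bit case (PACKSSWB). It models a single 128-bit lane, and the function names are hypothetical.

```python
def sat_i8(v: int) -> int:
    """Signed saturation of a 16-bit value to int8, as PACKSSWB does per element."""
    return max(-128, min(127, v))


def packsswb(xs: list[int], ys: list[int]) -> list[int]:
    """One 128-bit lane of PACKSSWB: saturated words of X, then saturated words of Y."""
    return [sat_i8(v) for v in xs] + [sat_i8(v) for v in ys]


def fold_holds(xs: list[int], ys: list[int]) -> bool:
    """PACKSS(NOT(X), NOT(Y)) == NOT(PACKSS(X, Y)), using ~v == -1 - v."""
    lhs = packsswb([~v for v in xs], [~v for v in ys])
    rhs = [~v for v in packsswb(xs, ys)]
    return lhs == rhs


# Exhaustive check over every int16 value, 8 elements at a time.
all_i16 = list(range(-32768, 32768))
print(all(fold_holds(all_i16[i:i + 8], all_i16[i:i + 8]) for i in range(0, len(all_i16), 8)))  # True
```
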
48997 Commits
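
X86ISD::ANDNP(X, Y) computes (~X) & Y, so the fold in commit c660a2f0ab is De Morgan's law: (~X) & (~Y) == ~(X | Y). The minimal Python check below (an illustration with hypothetical names) verifies this exhaustively for 8-bit values. The same identity holds bitwise at any width.

```python
MASK8 = 0xFF


def andnp(x: int, y: int) -> int:
    """Model of X86ISD::ANDNP on 8-bit values: (~x) & y."""
    return ~x & y & MASK8


def folded(x: int, y: int) -> int:
    """Right-hand side of the fold: NOT(OR(x, y)), truncated to 8 bits."""
    return ~(x | y) & MASK8


# ANDNP(X, NOT(Y)) == NOT(OR(X, Y)) for every pair of bytes.
print(all(andnp(x, ~y & MASK8) == folded(x, y) for x in range(256) for y in range(256)))  # True
```

Expressing the result with generic NOT and OR nodes, rather than a target-specific one, is presumably what gives later generic combines the extra folding and commutativity opportunities the commit mentions.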