llvm-project

Author	SHA1	Message	Date
Carl Ritson	99c790dc21	[AMDGPU] Make BVH isel consistent with other MIMG opcodes Suffix opcodes with _gfx10. Remove direct references to architecture specific opcodes. Add a BVH flag and apply this to disassembly. Fix a number of disassembly errors on gfx90a target caused by previous incorrect BVH detection code. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D108117	2021-08-17 10:42:22 +09:00
Min-Yih Hsu	eec3495a9d	[M68k] Do not pass llvm::Function& to M68kCCState Previously we're passing `llvm::Function&` into `M68kCCState` to lower arguments in fastcc. However, that reference might not be available if it's a library call and we only need its argument types. Therefore, now we're simply passing a list of argument llvm::Type-s. This fixes PR-50752. Differential Revision: https://reviews.llvm.org/D108101	2021-08-16 15:33:08 -07:00
David Green	9236dea255	[ARM] Create MQQPR and MQQQQPR register classes Similar to the MQPR register class as the MVE equivalent to QPR, this adds MQQPR and MQQQQPR register classes for the MVE equivalents of QQPR and QQQQPR registers. The MVE MQPR seems to have worked out quite well, and adding MQQPR and MQQQQPR allows us to a little more accurately specify the number of registers, calculating register pressure limits a little better. Differential Revision: https://reviews.llvm.org/D107463	2021-08-16 22:58:12 +01:00
Jordan Rupprecht	4357562067	[NFC][AArch64] Fix unused var in release build	2021-08-16 10:04:32 -07:00
Simon Pilgrim	d6fe8d37c6	[DAG] Fold concat_vectors(concat_vectors(x,y),concat_vectors(a,b)) -> concat_vectors(x,y,a,b) Follow-up to D107068, attempt to fold nested concat_vectors/undefs, as long as both the vector and inner subvector types are legal. This exposed the same issue in ARM's MVE LowerCONCAT_VECTORS_i1 (raised as PR51365) and AArch64's performConcatVectorsCombine which both assumed concat_vectors only took 2 subvector operands. Differential Revision: https://reviews.llvm.org/D107597	2021-08-16 16:06:54 +01:00
Cullen Rhodes	09507b5325	[AArch64][SME] Disable NEON in streaming mode In streaming mode most of the NEON instruction set is illegal, disable NEON when compiling with `+streaming-sve`, unless NEON is explicitly requested. Subsequent patches will add support for the small subset of NEON instructions that are legal in streaming mode. Reviewed By: paulwalker-arm, david-arm Differential Revision: https://reviews.llvm.org/D107902	2021-08-16 07:56:48 +00:00
Craig Topper	b82ce77b2b	[X86] Support avx512fp16 compare instructions in the IntelInstPrinter. This enables printing of the mnemonics that contain the predicate in the Intel printer. This requires accounting for the memory size that is explicitly printed in Intel syntax. Those changes have been synced to the ATT printer as well. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D108093	2021-08-16 12:31:36 +08:00
Craig Topper	ff95d2524a	[X86] Prevent accidentally accepting cmpeqsh as a valid mnemonic. We should only accept as vcmpeqsh. Same for all the other 31 comparison values.	2021-08-15 12:00:56 -07:00
Craig Topper	819818f7d5	[X86] Modify the commuted load isel pattern for VCMPSHZrm to match VCMPSSZrm/VCMPSDZrm. This allows commuting any immediate value. The previous code only commuted equality immediates. This was inherited from an earlier version of VCMPSSZrm/VCMPSDZrm.	2021-08-15 11:43:56 -07:00
Craig Topper	786b8fcc9b	[X86] Add vcmpsh/vcmpph to X86InstrInfo::commuteInstructionImpl. They were already added to findCommuteOpIndices, but they also need to be in X86InstrInfo::commuteInstructionImpl in order to adjust the immediate control.	2021-08-15 11:36:13 -07:00
Nikita Popov	81b106584f	[AArch64] Fix comparison peephole opt with non-0/1 immediate (PR51476) This is a non-intrusive fix for https://bugs.llvm.org/show_bug.cgi?id=51476 intended for backport to the 13.x release branch. It expands on the current hack by distinguishing between CmpValue of 0, 1 and 2, where 0 and 1 have the obvious meaning and 2 means "anything else". The new optimization from D98564 should only be performed for CmpValue of 0 or 1. For main, I think we should switch the analyzeCompare() and optimizeCompare() APIs to use int64_t instead of int, which is in line with MachineOperand's notion of an immediate, and avoids this problem altogether. Differential Revision: https://reviews.llvm.org/D108076	2021-08-15 12:35:52 +02:00
Wang, Pengfei	f1de9d6dae	[X86] AVX512FP16 instructions enabling 2/6 Enable FP16 binary operator instructions. Ref.: https://software.intel.com/content/www/us/en/develop/download/intel-avx512-fp16-architecture-specification.html Reviewed By: LuoYuanke Differential Revision: https://reviews.llvm.org/D105264	2021-08-15 08:56:33 +08:00
Kazu Hirata	915cc69259	[Aarch64] Remove redundant c_str (NFC) Identified with readability-redundant-string-cstr.	2021-08-14 08:49:40 -07:00
Craig Topper	d63f117210	[RISCV] Support RISCVISD::SELECT_CC in ComputeNumSignBitsForTargetNode.	2021-08-13 18:00:09 -07:00
Matt Arsenault	a77ae4aa6a	AMDGPU: Stop attributor adding attributes to intrinsic declarations	2021-08-13 20:51:48 -04:00
Matt Arsenault	5beb9a0e6a	AMDGPU: Respect compute ABI attributes with unknown OS Unfortunately Mesa is still using amdgcn-- as the triple for OpenGL, so we still have the awkward unknown OS case to deal with. Previously if the HSA ABI intrinsics appeared, we would not add the ABI registers to the function. We would emit an error later, but we still need to produce some compile result. Start adding the registers to any compute function, regardless of the OS. This keeps the internal state more consistent, and will help avoid numerous test crashes in a future patch which starts assuming the ABI inputs are present on functions by default.	2021-08-13 20:44:46 -04:00
Arthur Eubanks	16e8134e7c	[NFC] One more AttributeList::getAttribute(FunctionIndex) -> getFnAttr()	2021-08-13 16:56:42 -07:00
Arthur Eubanks	d7593ebaee	[NFC] Clean up users of AttributeList::hasAttribute() AttributeList::hasAttribute() is confusing, use clearer methods like hasParamAttr()/hasRetAttr(). Add hasRetAttr() since it was missing from AttributeList.	2021-08-13 11:59:18 -07:00
Arthur Eubanks	80ea2bb574	[NFC] Rename AttributeList::getParam/Ret/FnAttributes() -> get*Attributes() This is more consistent with similar methods.	2021-08-13 11:16:52 -07:00
Arthur Eubanks	92ce6db9ee	[NFC] Rename AttributeList::hasFnAttribute() -> hasFnAttr() This is more consistent with similar methods.	2021-08-13 11:09:18 -07:00
Arthur Eubanks	a0c42ca56c	[NFC] Remove AttributeList::hasParamAttribute() It's the same as AttributeList::hasParamAttr().	2021-08-13 10:58:21 -07:00
Amy Kwan	581a80304c	[PowerPC] Disable CTR Loop generate for fma with the PPC double double type. It is possible to generate the llvm.fmuladd.ppcf128 intrinsic, and there is no actual FMA instruction that corresponds to this intrinsic call for ppcf128. Thus, this intrinsic needs to remain as a call as it cannot be lowered to any instruction, which also means we need to disable CTR loop generation for fma involving the ppcf128 type. This patch accomplishes this behaviour. Differential Revision: https://reviews.llvm.org/D107914	2021-08-13 12:27:24 -05:00
Jessica Paquette	ccfc079047	[AArch64][GlobalISel] Legalize scalar G_SSUBSAT + G_SADDSAT These are lowered, matching SDAG behaviour. (See llvm/test/CodeGen/AArch64/ssub_sat.ll and llvm/test/CodeGen/AArch64/sadd_sat.ll) These fall back ~159 times on a build of clang with GISel enabled. Differential Revision: https://reviews.llvm.org/D107777	2021-08-13 09:02:25 -07:00
Shivam Gupta	835ea22b37	[AVR] Enable machine verifier Reviewed By: mhjacobson, benshi001 Differential Revision: https://reviews.llvm.org/D107853	2021-08-13 12:11:22 +08:00
Heejin Ahn	adb96d2e76	[WebAssembly] Fix leak in Emscripten SjLj For SjLj, we allocate a table to record setjmp buffer info in the entry of each setjmp-calling function by inserting a `malloc` call, and insert a `free` call to free the buffer before each `ret` instruction. But this is not sufficient; we have to free the buffer before we throw. In SjLj handling, normal functions that can possibly throw or longjmp are wrapped with an invoke and caught within the function so they don't end up escaping the function. But three functions throw and escape the function: - `__resumeException` (Emscripten library function used for Emscripten EH) - `emscripten_longjmp` (Emscripten library function used for Emscripten SjLj) - `__cxa_throw` (libc++abi function called for the C++ `throw` keyword) The first two functions are used to rethrow the current exception/longjmp when the caught exception/longjmp is not for the current function. `__cxa_throw` is used for exception, and because we consider that a function that cannot longjmp, it escapes the function right away, before which we should free the buffer. Currently `lsan.test_longjmp3` and `lsan.test_exceptions_longjmp3` fail in Emscripten; this CL fixes these. Reviewed By: dschuff Differential Revision: https://reviews.llvm.org/D107852	2021-08-12 16:32:46 -07:00
Heejin Ahn	aca198cf74	[WebAssembly] Error out when Emscripten SjLj setjmp is used with Wasm EH Currently, when Wasm EH is used with Emscripten SjLj, Emscripten SjLj cannot handle `invoke` instructions - it assumes all `invoke`s have been lowered away with Emscripten EH. But in Wasm EH they are lowered in instruction selection, so they are still present in the IR stage. This happens when 1. Wasm EH and Emscripten SjLj are used together 2. A function that calls `setjmp` uses exceptions, i.e., has `invoke`s We were already erroring out with an assertion failure in this case, but this CL makes it error out more properly with a valid error message. Wasm EH + Wasm SjLj will not have this restriction. (it will have another restriction though, e.g., `setjmp` cannot be called within `catch`. But why would anyone do that..) Reviewed By: dschuff Differential Revision: https://reviews.llvm.org/D107687	2021-08-12 16:19:04 -07:00
Lei Huang	8930af45c3	[PowerPC] Implement XL compatibility builtin __addex Add builtin and intrinsic for `__addex`. This patch is part of a series of patches to provide builtins for compatibility with the XL compiler. Reviewed By: stefanp, nemanjai, NeHuang Differential Revision: https://reviews.llvm.org/D107002	2021-08-12 16:38:21 -05:00
Heejin Ahn	78e87970af	[WebAssembly] Disable offset folding for function addresses Wasm does not support function addresses with offsets, but isel can generate folded SDValues in the form of (@func + offset) without this patch. Fixes https://bugs.llvm.org/show_bug.cgi?id=43133. Reviewed By: dschuff, sbc100 Differential Revision: https://reviews.llvm.org/D107940	2021-08-12 13:40:41 -07:00
Craig Topper	79fbddbea0	[RISCV] Teach vsetvli insertion pass that it doesn't need to insert vsetvli for unit-stride or strided loads/stores in some cases. For unit-stride and strided load/stores we set the SEW operand of the pseudo instruction equal to the EEW in the opcode. The LMUL of the pseudo instruction is the LMUL we want. These instructions calculate EMUL=(EEW/SEW) * LMUL. We can use this to avoid changing vtype if the SEW/LMUL of the previous vtype matches the EEW/EMUL ratio we need for the instruction. Due to how the global analysis works, we can only do this optimization when the previous vsetvli was produced in the block containing the store. We need to know in the first phase if the vsetvli will be inserted so we can propagate information to the successors in the second phase correctly. This means we can't depend on predecessors. Reviewed By: rogfer01 Differential Revision: https://reviews.llvm.org/D106601	2021-08-12 10:05:27 -07:00
David Green	ae9a346ef8	[ARM] Fix DAG combine loop in reduction distribution Given a constant operand, the MVE and DAGCombine combines could fight, each redistributing in the opposite order. Add a guard to the MVE vecreduce distribution to prevent that.	2021-08-12 16:37:39 +01:00
Victor Huang	99e00663d4	[PowerPC] Fix return address computation for "__builtin_return_address" When depth > 0, callee frame address is used to compute the return address of callee producing improper return address. This patch adds the fix to use caller frame address to compute the return address of callee. Reviewed By: nemanjai, #powerpc Differential revision: https://reviews.llvm.org/D107646	2021-08-12 09:44:49 -05:00
David Truby	9c47d6b48d	[llvm][sve] Lowering for VLS extending loads This patch enables extending loads for fixed length SVE code generation. There is a slight regression here in the mulh tests; since these tests load the parameter and then extend it these are treated as extending loads which are merged, preventing the mulh instruction from being generated. As this affects scalable SVE codegen as well this should be addressed in a separate patch. Reviewed By: bsmith Differential Revision: https://reviews.llvm.org/D107057	2021-08-12 09:43:39 +00:00
Cullen Rhodes	419deccfd1	[AArch64] NFC: Remove register decoder tables in disassembler The register classes are generated by TableGen, use them instead of handwritten tables. Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D107763	2021-08-12 07:28:56 +00:00
Amara Emerson	73056f239e	[AArch64][GlobalISel] Simplify/nuke the merge/unmerge legalizer rules. These rules were originally written when the new predicate based legalizer was introduced in an attempt to preserve existing behaviour. It wasn't properly kept up to date as things like vector support was split out into G_CONCAT_VECTORS, and frankly, even if it was, it was too complex. It's much easier to start from scratch with what we can actually support, which is just a few type combinations. Anything illegal we should either legalize, or should be eliminated as a side effect of artifact combination. Differential Revision: https://reviews.llvm.org/D107937	2021-08-11 16:45:23 -07:00
Usman Nadeem	9396c3ec7b	[AArch64][SVE] Remove assertion/range check for i16 values during immediate selection The assertion can fail in some cases when an i16 constant is promoted to i32. e.g. in the added test case the value `i16 -32768` is within the range of i16 but the assert fails when the constant is promoted to positive `i32 32768` by an earlier call to DAG.getConstant(). Differential Revision: https://reviews.llvm.org/D107880 Change-Id: I2f6179783cbc9630e6acab149a762b43c65664de	2021-08-11 14:50:20 -07:00
Amara Emerson	2c1789bc8c	[AArch64][GlobalISel] Add ptradd_immed_chain combine to post-legalizer combiner.	2021-08-11 13:59:23 -07:00
David Green	8c50b5fbfe	[ARM] Add extra debug messages for validating live outs. NFC We are running into more and more cases where the liveouts of low overhead loops do not validate. Add some extra debug messages to make it clearer why.	2021-08-11 10:35:53 +01:00
Cullen Rhodes	1fe0e6a380	[AArch64][SME] Support ptrue(s) in streaming mode The ptrue and ptrues instructions are legal in streaming mode, missed in D106272. The reference can be found here: https://developer.arm.com/documentation/ddi0602/2021-06/SVE-Instructions Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D107807	2021-08-11 07:49:36 +00:00
Christopher Di Bella	c874dd5362	[llvm][clang][NFC] updates inline licence info Some files still contained the old University of Illinois Open Source Licence header. This patch replaces that with the Apache 2 with LLVM Exception licence. Differential Revision: https://reviews.llvm.org/D107528	2021-08-11 02:48:53 +00:00
Sushma Unnibhavi	7bdce6bcbd	[M68k][GlobalISel] RegBankSelect implementation Implementation of RegBankSelect for the M68k backend. Differential Revision: https://reviews.llvm.org/D107542	2021-08-10 15:24:43 -07:00
Thomas Johnson	b821086876	[ARC] Add codegen for count trailing zeros intrinsic for the ARC backend Differential Revision: https://reviews.llvm.org/D107828	2021-08-10 12:07:35 -07:00
Matt Arsenault	d719f1c3cc	AMDGPU: Add alloc priority to global ranges The requested register class priorities weren't respected globally. Not sure why this is a target option, and not just the expected behavior (recently added in 1a6dc92be7d68611077f0fb0b723b361817c950c). This avoids an allocation failure when many wide tuple spills are introduced. I think this is a workaround since I would not expect the allocation priority to be required, and only a performance hint. The allocator should be smarter about when only a subregister needs to be spilled and restored. This does regress a couple of degenerate store stress lit tests which shouldn't be too important.	2021-08-10 13:12:34 -04:00
Craig Topper	6f5edc3487	[RISCV] Fold (add (select lhs, rhs, cc, 0, y), x) -> (select lhs, rhs, cc, x, (add x, y)) Similar for sub except sub isn't commutative. Modify the existing and/or/xor folds to also work on ISD::SELECT and not just RISCVISD::SELECT_CC. This is needed to make sure we do this transform before type legalization turns i32 add/sub into add/sub+sign_extend_inreg on RV64. If we don't do this before that, the sign_extend_inreg will still be after the select. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D107603	2021-08-10 09:02:56 -07:00
David Green	013030a0b2	[AArch64] Correct sinking of shuffles to adds/subs This was checking extends as shuffles, whereas we should be checking the operands. This helps sink the shuffles, creating more addl/subl instructions. Differential Revision: https://reviews.llvm.org/D107623	2021-08-10 13:25:42 +01:00
Tim Northover	5ad0860899	AArch64: support @llvm.va_copy in GISel	2021-08-10 13:11:03 +01:00
David Green	c140ff493e	[ARM] Change a couple of instances of LiveRegs.contains to !LiveRegs.available This changes a couple of calls to LiveRegs.contains to !LiveRegs.available, one in Thumb1FrameLoweringInfo (which modifies a test to look more correct to me, given r7 should be the frame pointer so is not available), and another in the ARMLoadStoreOptimizer, that I don't have a test for, it was just found by inspection. Differential Revision: https://reviews.llvm.org/D107454	2021-08-10 09:53:26 +01:00
Tony Tye	53eb469195	[AMDGPU] Support non-strictly stronger memory orderings in SIMemoryLegalizer C++20 no longer requires the failure memory ordering to be no stronger than the success memory ordering. Adjust assert in AMD GPU SIMemoryLegalizer, and merge instruction memory orderings Add common operation to merge memory orders that allows non strict memory orderings to be combined. Use it in SIMemoryLegalizer and MachineMemOperand::getMergedOrdering. Reviewed By: efriedma, rampitec Differential Revision: https://reviews.llvm.org/D106729	2021-08-10 08:43:03 +00:00
Cullen Rhodes	81f057c253	[AArch64][SVE] NFC: Remove unused p0-p7 with element size predicates Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D107752	2021-08-10 07:56:22 +00:00
Wang, Pengfei	6f7f5b54c8	[X86] AVX512FP16 instructions enabling 1/6 1. Enable FP16 type support and basic declarations used by following patches. 2. Enable new instructions VMOVW and VMOVSH. Ref.: https://software.intel.com/content/www/us/en/develop/download/intel-avx512-fp16-architecture-specification.html Reviewed By: LuoYuanke Differential Revision: https://reviews.llvm.org/D105263	2021-08-10 12:46:01 +08:00
Usman Nadeem	5420fc4a27	[AArch64][SVE][InstCombine] Unpack of a splat vector -> Scalar extend Replace vector unpack operation with a scalar extend operation. unpack(splat(X)) --> splat(extend(X)) If we have both, unpkhi and unpklo, for the same vector then we may save a register in some cases, e.g: Hi = unpkhi (splat(X)) Lo = unpklo(splat(X)) --> Hi = Lo = splat(extend(X)) Differential Revision: https://reviews.llvm.org/D106929 Change-Id: I77c5c201131e3a50de1cdccbdcf84420f5b2244b	2021-08-09 14:58:54 -07:00
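
The unpack(splat(X)) --> splat(extend(X)) rewrite described in commit 5420fc4a27 can be checked on a small model. The sketch below is only an illustration, not the InstCombine code from D106929: it assumes a fixed lane count (real SVE vectors are scalable), models only the sign-extending unpack, and every function name in it is hypothetical.

```python
# Toy model of unpack(splat(X)) == splat(extend(X)).
# Assumptions: fixed lane count, signed unpack only, hypothetical names.

def sign_extend(value, from_bits):
    """Interpret the low `from_bits` bits of `value` as a signed integer."""
    value &= (1 << from_bits) - 1
    if value & (1 << (from_bits - 1)):
        return value - (1 << from_bits)
    return value

def splat(x, lanes):
    """Vector with every lane equal to x."""
    return [x] * lanes

def unpklo(vec, from_bits):
    """Sign-extend the low half of the lanes to the wider element type."""
    return [sign_extend(v, from_bits) for v in vec[: len(vec) // 2]]

def unpkhi(vec, from_bits):
    """Sign-extend the high half of the lanes to the wider element type."""
    return [sign_extend(v, from_bits) for v in vec[len(vec) // 2 :]]

def fold_holds(x, lanes=8, from_bits=8):
    """True if both unpacks of splat(x) equal splat(extend(x))."""
    vec = splat(x, lanes)
    folded = splat(sign_extend(x, from_bits), lanes // 2)
    return unpklo(vec, from_bits) == folded and unpkhi(vec, from_bits) == folded
```

Because every lane of a splat holds the same value, the low and high unpacks produce identical results, which is why the commit notes that Hi and Lo can share one splat(extend(X)) and may save a register.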


63852 Commits