llvm-project

Author	SHA1	Message	Date
Craig Topper	2ea9db0c49	[AArch64] Fix -Wparentheses warning with gcc 5.4. NFC	2021-07-26 21:08:56 -07:00
Carl Ritson	fbaa35e169	[AMDGPU] Add SelectionDAG support for insert_subvector on v4f64 Enable custom insert_subvector for larger vector types. This is necessary now that SelectionDAG can attempt v3f64 insert to v4f64, etc. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D105385	2021-07-27 10:11:34 +09:00
Nemanja Ivanovic	9654cfd5bb	[PowerPC] Fix materialization of SP float values on Power10 All floating point values in registers are in double precision representation. In order to materialize the correct single precision value, we need to convert the APFloat that represents the value to double precision first. Reviewed By: amyk, NeHuang Differential Revision: https://reviews.llvm.org/D106812	2021-07-26 19:43:10 -05:00
Jon Roelofs	f2e8e46d78	Revert "[AArch64][GlobalISel] Legalize ctpop s128" This reverts commit 97e95fea53fc403c2a12e356dc835fc922123575. It broke test/CodeGen/Mips/GlobalISel/llvm-ir/ctpop.ll. Not sure why I didn't see that.	2021-07-26 17:06:43 -07:00
Jon Roelofs	97e95fea53	[AArch64][GlobalISel] Legalize ctpop s128 Differential revision: https://reviews.llvm.org/D106494	2021-07-26 16:33:50 -07:00
Masoud Ataei	45951ad323	[PowerPC] Add pwr7 and pwr10 support to IBM MASSV pass on AIX Before MASSV only supported P8 and P9 on AIX ans Linux . This patch proposes MASSV to add support of P7 and P10 only on AIX too. Differential: https://reviews.llvm.org/D106678	2021-07-26 23:21:38 +00:00
Amara Emerson	172051a1f4	[AArch64][GlobalISel] Add identity combines to post-legal combiner. We see some shifts of zero emitted during legalization. Differential Revision: https://reviews.llvm.org/D106816	2021-07-26 15:17:11 -07:00
Amara Emerson	c658b472f3	[GlobalISel] Add a constant folding combine. Use it AArch64 post-legal combiner. These don't always get folded because when the instructions are created the constants are obscured by artifacts. Differential Revision: https://reviews.llvm.org/D106776	2021-07-26 14:53:33 -07:00
Heejin Ahn	c285a11efd	[WebAssembly] Make Emscripten EH work with Emscripten SjLj When Emscripten EH mixes with Emscripten SjLj, we are not currently handling some of them correctly. There are three cases: 1. The current function calls `setjmp` and there is an `invoke` to a function that can either throw or longjmp. In this case, we have to check both for exception and longjmp. We are currently handling this case correctly: `0c0eb76782/llvm/lib/Target/WebAssembly/WebAssemblyLowerEmscriptenEHSjLj.cpp (L1058-L1090)` When inserting routines for functions that can longjmp, which we do only for setjmp-calling functions, we check if the function was previously an `invoke` and handle it correctly. 2. The current function does NOT call `setjmp` and there is an `invoke` to a function that can either throw or longjmp. Because there is no `setjmp` call, we haven't been doing any check for functions that can longjmp. But in that case, for `invoke`, we only check for an exception and if it is not an exception we reset `__THREW__` to 0, which can silently swallow the longjmp: `0c0eb76782/llvm/lib/Target/WebAssembly/WebAssemblyLowerEmscriptenEHSjLj.cpp (L70-L80)` This CL fixes this. 3. The current function calls `setjmp` and there is no `invoke`. Because it is not an `invoke`, we haven't been doing any check for functions that can throw, and only insert longjmp-checking routines for functions that can longjmp. But in that case, if a longjmpable function throws, we only check for a longjmp so if it is not a longjmp we reset `__THREW__` to 0, which can silently swallow the exception: `0c0eb76782/llvm/lib/Target/WebAssembly/WebAssemblyLowerEmscriptenEHSjLj.cpp (L156-L169)` This CL fixes this. To do that, this moves around some code, so we register necessary functions for both EH and SjLj and precompute some data (the set of functions that contains `setjmp`) before doing actual EH or SjLj transformation. This CL makes 2nd and 3rd tests in https://github.com/emscripten-core/emscripten/pull/14732 work. Reviewed By: dschuff Differential Revision: https://reviews.llvm.org/D106525	2021-07-26 13:48:31 -07:00
Lei Huang	64a15817a0	[PowerPC]Add addex instruction definition and MC tests Add td definitions and asm/disasm tests for the addex instruction introduced in ISA 3.0. Reviewed By: nemanjai, amyk, NeHuang Differential Revision: https://reviews.llvm.org/D106666	2021-07-26 14:55:38 -05:00
Lei Huang	2d788959ed	[PowerPC] Add implicit-def RM to instructions mtfsb[01] This is a followup patch for D105930 to add implicit-def of RM for mtfsb[01] instructions as per review comments. Reviewed By: nemanjai Differential Revision: https://reviews.llvm.org/D106603	2021-07-26 14:07:08 -05:00
Michael Liao	b0402a35fc	[amdgpu] Add 64-bit PC support when expanding unconditional branches. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D106445	2021-07-26 14:50:30 -04:00
Amara Emerson	6af8d36054	[AArch4][GlobalISel] Post-legalize combine s64 = G_MERGE s32, 0 -> G_ZEXT. These are generated as a byproduce of legalization. Differential Revision: https://reviews.llvm.org/D106768	2021-07-26 10:58:04 -07:00
Amara Emerson	0d41d21929	[AArch64][GlobalISel] Enable some select combines after legalization. The legalizer generates selects for some operations, which can have constant condition values, resulting in lots of dead code if it's not folded away. Differential Revision: https://reviews.llvm.org/D106762	2021-07-26 10:40:32 -07:00
Amara Emerson	dec34104bf	[GlobalISel] Add combine for merge(unmerge) and use AArch64 postlegal-combiner. Differential Revision: https://reviews.llvm.org/D106761	2021-07-26 10:37:31 -07:00
Heejin Ahn	6b9aba43a2	[WebAssembly] Improve pseudocode in LowerEmscriptenEHSjLj Both `__THREW__` and `__threwValue` are global variables, and we have been distinguishing the global variable `__THREW__` and the loaded value `%__THREW__.val` in comments but not doing it for `__threwValue`. Made the pseudocode comments consistent for both variables. Reviewed By: dschuff Differential Revision: https://reviews.llvm.org/D106524	2021-07-26 10:13:28 -07:00
Paul Walker	3b77e2737c	[SVE] Use reg+reg addressing mode for immediate offsets. For reg+imm SVE addressing mode imm is implictly scaled by VL, making them impractical for truely immediate offsets. However, if the offset can be unscaled based on the storage element type we can use the reg+reg SVE addressing mode and thus either reduce the number of generate add instructions or replace them with a mov instruction that can be hoisted from the hot code path. Differential Revision: https://reviews.llvm.org/D106744	2021-07-26 16:24:16 +01:00
Bradley Smith	81eafb8a37	[AArch64][SVE] Break false dependencies for inactive lanes of unary operations Differential Revision: https://reviews.llvm.org/D105889	2021-07-26 15:01:21 +00:00
Ulrich Weigand	8cd8120a7b	[SystemZ] Add support for new cpu architecture - arch14 This patch adds support for the next-generation arch14 CPU architecture to the SystemZ backend. This includes: - Basic support for the new processor and its features. - Detection of arch14 as host processor. - Assembler/disassembler support for new instructions. - New LLVM intrinsics for certain new instructions. - Support for low-level builtins mapped to new LLVM intrinsics. - New high-level intrinsics in vecintrin.h. - Indicate support by defining __VEC__ == 10304. Note: No currently available Z system supports the arch14 architecture. Once new systems become available, the official system name will be added as supported -march name.	2021-07-26 16:57:28 +02:00
Jay Foad	59f6865231	[AMDGPU][GISel] Fix MMO for raw/struct buffer access with non-constant offset Codegen for the raw/struct buffer access intrinsics would update the offset in the MMO to reflect the combined offset, if it was known to be constant. If the combined offset was not known to be constant, or if there was an index, it would set the offset in the MMO to 0. This is unsafe because it makes it look like the access does not alias with another access with a fixed non-zero offset. Fix these cases by setting the pointer in the MMO to null, to reflect the fact that we do not have any known IR value pointer + constant offset for the access. D106284 did this for SelectionDAG. This is the corresponding fix for GlobalISel. Differential Revision: https://reviews.llvm.org/D106451	2021-07-26 14:27:30 +01:00
Jay Foad	9ac10658ae	[AMDGPU] Fix MMO for raw/struct buffer access with non-constant offset Codegen for the raw/struct buffer access intrinsics would update the offset in the MMO to reflect the combined offset, if it was known to be constant. If the combined offset was not known to be constant, or if there was an index, it would set the offset in the MMO to 0. This is unsafe because it makes it look like the access does not alias with another access with a fixed non-zero offset. Fix these cases by setting the pointer in the MMO to null, to reflect the fact that we do not have any known IR value pointer + constant offset for the access. Differential Revision: https://reviews.llvm.org/D106284	2021-07-26 14:27:30 +01:00
David Green	010f8e3057	[ARM] Ensure correct regclass in distributing postinc The register class required for some MVE loads/stores is more constrained than the register we use when creating postinc. Make sure we constrain the register class to keep the code correct.	2021-07-26 14:26:38 +01:00
Tim Northover	a487a49acc	AArch64: support i128 (& larger) returns in GlobalISel	2021-07-26 14:16:35 +01:00
Caroline Concatto	bf28111ebd	[AArch65][SVE] Remove vector_splice from AddedComplexity pattern The pattern for vector_splice with Index equal or bigger than zero was misplaced in the AddedComplexity = 1 pattern in the AArch64 tablegen file. This patch fixes it by removing vector_splice pattern from inside AddedComplexity = 1.	2021-07-26 13:35:51 +01:00
Caroline Concatto	0bfc26e3a4	[SVE][AArch64] Improve code generation for vector_splice for Imm > 0 This patch implements vector_splice in tablegen for all cases when the Immediate is positive and lower than the known minimum value of a scalable vector. Vector_splice can be implemented using SVE instruction EXT. For instance : @llvm.experimental.vector.splice(Vector_1, Vector_2, Imm) @llvm.experimental.vector.splice(<A,B,C,D>, <E,F,G,H>, 1) ==> <B, C, D, E> EXT Vector_1, Vector_2, Imm // Vector_1 = B, C, D + Vector_2 = E Depends on D105633 Differential Revision: https://reviews.llvm.org/D106273	2021-07-26 11:45:46 +01:00
Caroline Concatto	73e4e9cd00	[AArch64][SVE] Improve code generation for vector_splice for Imm == -1 This patch implements vector_splice in tablegen for: a) when the immediate is equal to -1 (Imm==1) and uses: INSR + LASTB For instance : @llvm.experimental.vector.splice(Vector_1, Vector_2, -1) @llvm.experimental.vector.splice(<A,B,C,D>, <E,F,G,H>, 1) ==> <D, E, F, G> LAST RegLast, Vector_1 // RegLast = D INSR Res, (Vector_1 >> 1), RegLast // Res = D + E, F, G Differential Revision: https://reviews.llvm.org/D105633	2021-07-26 11:25:01 +01:00
Simon Pilgrim	c8472db0a8	[X86][AVX] Prefer vinsertf128 to vperm2f128 on AVX1 targets Splatting the lower xmm with vinsertf128 is at least as quick as vperm2f128, and a lot faster on some AMD targets. First step towards PR50053	2021-07-26 11:11:56 +01:00
Cullen Rhodes	e6ff9179ce	[AArch64][AsmParser] NFC: Parser.getTok().getLoc() -> getLoc() Reviewed By: tmatheson Differential Revision: https://reviews.llvm.org/D106635	2021-07-26 09:36:34 +00:00
David Sherwood	0aff1798b5	[Analysis] Add simple cost model for strict (in-order) reductions I have added a new FastMathFlags parameter to getArithmeticReductionCost to indicate what type of reduction we are performing: 1. Tree-wise. This is the typical fast-math reduction that involves continually splitting a vector up into halves and adding each half together until we get a scalar result. This is the default behaviour for integers, whereas for floating point we only do this if reassociation is allowed. 2. Ordered. This now allows us to estimate the cost of performing a strict vector reduction by treating it as a series of scalar operations in lane order. This is the case when FP reassociation is not permitted. For scalable vectors this is more difficult because at compile time we do not know how many lanes there are, and so we use the worst case maximum vscale value. I have also fixed getTypeBasedIntrinsicInstrCost to pass in the FastMathFlags, which meant fixing up some X86 tests where we always assumed the vector.reduce.fadd/mul intrinsics were 'fast'. New tests have been added here: Analysis/CostModel/AArch64/reduce-fadd.ll Analysis/CostModel/AArch64/sve-intrinsics.ll Transforms/LoopVectorize/AArch64/strict-fadd-cost.ll Transforms/LoopVectorize/AArch64/sve-strict-fadd-cost.ll Differential Revision: https://reviews.llvm.org/D105432	2021-07-26 10:26:06 +01:00
Simon Pilgrim	1cfecf4fc4	[X86][AVX] Add getBROADCAST_LOAD helper function. NFCI. Begin replacing individual getMemIntrinsicNode calls and setup (for X86ISD::VBROADCAST_LOAD + X86ISD::SUBV_BROADCAST_LOAD opcodes) with this getBROADCAST_LOAD helper.	2021-07-25 20:37:58 +01:00
Kyungwoo Lee	6530ea4095	[AArch64] Fix Local Deallocation for Homogeneous Prolog/Epilog The stack adjustment for local deallocation was incorrectly ported. Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D106760	2021-07-25 10:51:11 -07:00
Simon Pilgrim	b95f66ad78	[X86][SSE] LowerRotate - perform modulo on the amount splat source directly. If the rotation amount is a known splat, perform the modulo on the splat source, and then perform the splat. That way the amount-extension performed later by LowerScalarVariableShift can fold the splats away without any multiple-use issues. Fixes one of the concerns raised on D104156	2021-07-25 17:30:32 +01:00
Sanjay Patel	1ce05ad619	[x86] improve CMOV codegen by pushing add into operands, part 2 This is a minimum extension of D106607 to allow folding for 2 non-zero constantsi that can be materialized as immediates.. In the reduced test examples, we save 1 instruction by rolling the constants into LEA/ADD. In the motivating test from the bullet benchmark, we absorb both of the constant moves into add ops via LEA magic, so we reduce by 2 instructions. Differential Revision: https://reviews.llvm.org/D106684	2021-07-25 10:05:41 -04:00
Simon Pilgrim	15b883f457	[X86][AVX] Adjust AllowBWIVPERMV3 tolerance to account for VariableCrossLaneShuffleDepth As noticed on D105390 - we were hardwiring the depth limit for combining to VPERMI2W/VPERMI2B instructions. Not only had we made the limit too low, we hadn't accounted for slow/fast shuffles via the VariableCrossLaneShuffleDepth control	2021-07-25 14:05:11 +01:00
Amara Emerson	acbc0c5f0e	[AArch64][GlobalISel] Widen non-pow-2 types for shifts before clamping. For types like s96, we don't want to clamp to s64, we want to first widen to s128 and then narrow it. Otherwise we end up with impossible to legalize types.	2021-07-24 15:50:43 -07:00
Craig Topper	c63dbd8501	[RISCV] Custom lower (i32 (fptoui/fptosi X)). I stumbled onto a case where our (sext_inreg (assertzexti32 (fptoui X)), i32) isel pattern can cause an fcvt.wu and fcvt.lu to be emitted if the assertzexti32 has an additional user. If we add a one use check it would just cause a fcvt.lu followed by a sext.w when only need a fcvt.wu to satisfy both users. To mitigate this I've added custom isel and new ISD opcodes for fcvt.wu. This allows us to keep know it started life as a conversion to i32 without needing to match multiple nodes. ComputeNumSignBits has been taught that this new nodes produces 33 sign bits. To prevent regressions when we need to zero extend the result of an (i32 (fptoui X)), I've added a DAG combine to convert it to an (i64 (fptoui X)) before type legalization. In most cases this would happen in InstCombine, but a zero_extend can be created for function returns or arguments. To keep everything consistent I've added new nodes for fptosi as well. Reviewed By: luismarques Differential Revision: https://reviews.llvm.org/D106346	2021-07-24 10:50:43 -07:00
Ayke van Laethem	4d7f5c0a85	[AVR] Only support sp, r0 and r1 in llvm.read_register Most other registers are allocatable and therefore cannot be used. This issue was flagged by the machine verifier, because reading other registers is considered reading from an undefined register. Differential Revision: https://reviews.llvm.org/D96969	2021-07-24 14:03:27 +02:00
Ayke van Laethem	41f905b211	[AVR] Fix rotate instructions This patch fixes some issues with the RORB pseudo instruction. - A minor issue in which the instructions were said to use the SREG, which is not true. - An issue with the BLD instruction, which did not have an output operand. - A major issue in which invalid instructions were generated. The fix also reduce RORB from 4 to 3 instructions, so it's also a small optimization. These issues were flagged by the machine verifier. Differential Revision: https://reviews.llvm.org/D96957	2021-07-24 14:03:26 +02:00
Ayke van Laethem	6aa9e746eb	[AVR] Expand large shifts early in IR This patch makes sure shift instructions such as this one: %result = shl i32 %n, %amount are expanded just before the IR to SelectionDAG conversion to a loop so that calls to non-existing library functions such as __ashlsi3 are avoided. The generated code is currently pretty bad but there's a lot of room for improvement: the shift itself can be done in just four instructions. Differential Revision: https://reviews.llvm.org/D96677	2021-07-24 14:03:26 +02:00
Ayke van Laethem	431a941465	[AVR] Improve 8/16 bit atomic operations There were some serious issues with atomic operations. This patch should fix the biggest issues. For details on the issue take a look at this Compiler Explorer sample: https://godbolt.org/z/n3ndhn Code: void atomicadd(_Atomic char val) { val += 5; } Output: atomicadd: movw r26, r24 ldi r24, 5 ; 'operand' register in r0, 63 cli ld r24, X ; load value add r24, r26 ; value += X st X, r24 ; store value back out 63, r0 ret ; return the wrong value (in r24) There are various problems with this. - The value to add (5) is stored in r24. However, the value to add to is loaded in the same register: r24. - The `add` instruction adds half of the pointer to the loaded value, instead of (attempting to) add the operand with value 5. - The output value of the cmpxchg instruction (which is not used in this code sample) is the new value with 5 added, not the old value. The LangRef specifies that it has to be the old value, before the operation. This patch fixes the first two and leaves the third problem to be fixed at a later date. I believe atomics were mostly broken before this patch, with this patch they should become usable as long as you ignore the output of the atomic operation. In particular it fixes the following things: - It sets the earlyclobber flag for the input ('$operand' operand) so that the register allocator puts it in a different register than the output value. - It fixes a number of issues with the pseudo op expansion pass, for example now it adds the $operand field instead of the pointer. This fixes most machine instruction verifier issues (other flagged issues are unrelated to atomics). Differential Revision: https://reviews.llvm.org/D97127	2021-07-24 14:03:26 +02:00
Ayke van Laethem	8544ce80f8	[AVR] Set R31R30 as clobbered after ADJCALLSTACKDOWN In most cases, using R31R30 is fine because the call (which always precedes ADJCALLSTACKDOWN) will clobber R31R30 anyway. However, in some rare cases the register allocator might insert an instruction between the call and the ADJCALLSTACKDOWN instruction and expect the register pair to be live afterwards. I think this happens as a result of rematerialization. Therefore, to fix this, the instruction needs to have Defs set to R31R30. Setting the Defs field does have the effect of making the instruction look dead, which it certainly is not. This is fixed by setting hasSideEffects to true. Differential Revision: https://reviews.llvm.org/D97745	2021-07-24 14:03:26 +02:00
Ayke van Laethem	feda08b70a	[AVR] Do not chain stores in call frame setup Previously, AVRTargetLowering::LowerCall attempted to keep stack stores in order with chains. Perhaps this worked in the past, but it does not work now: it appears that the SelectionDAG legalization phase removes these chains. Therefore, I've removed these chains entirely to match X86 (which, similar to AVR, also prefers to use push instructions over stack-relative stores to set up a call frame). With this change, all the stack stores are in a somewhat reasonable order. Differential Revision: https://reviews.llvm.org/D97853	2021-07-24 14:03:26 +02:00
Alexander Belyaev	edb05d555e	[llvm] Inline getAssociatedFunction() in LLVM_DEBUG. Function* F is used only inside LLVM_DEBUG, so that it causes unused variable warning.	2021-07-24 11:49:21 +02:00
Amara Emerson	5ec0f051c8	[GlobalISel] Add GUnmerge, GMerge, GConcatVectors, GBuildVector abstractions. NFC. Use these to slightly simplify some code in the artifact combiner.	2021-07-23 22:32:26 -07:00
Kuter Dinel	96709823ec	[AMDGPU] Deduce attributes with the Attributor This patch introduces a pass that uses the Attributor to deduce AMDGPU specific attributes. Reviewed By: jdoerfert, arsenm Differential Revision: https://reviews.llvm.org/D104997	2021-07-24 06:07:15 +03:00
Thomas Lively	85157c0079	[WebAssembly] Codegen for pmin and pmax Replace the clang builtins and LLVM intrinsics for {f32x4,f64x2}.{pmin,pmax} with standard codegen patterns. Since wasm_simd128.h uses an integer vector as the standard single vector type, the IR for the pmin and pmax intrinsic functions contains bitcasts that would not be there otherwise. Add extra codegen patterns that can still select the pmin and pmax instructions in the presence of these bitcasts. Differential Revision: https://reviews.llvm.org/D106612	2021-07-23 14:49:21 -07:00
Thomas Lively	39c0e4afce	[WebAssembly][NFC] Simplify SIMD bitconvert pattern Differential Revision: https://reviews.llvm.org/D106680	2021-07-23 14:43:48 -07:00
Craig Topper	5edccc4581	[RISCV] Avoid using x0,x0 vsetvli for vmv.x.s and vfmv.f.s unless we know the sew/lmul ratio is constant. Since we're changing VTYPE, we may change VLMAX which could invalidate the previous VL. If we can't tell if it is safe we should use an AVL of 1 instead of keeping the old VL. This is a quick fix. We may want to thread VL to the pseudo instruction instead of making up a value. That will require ISD opcode changes and changes to the C intrinsic interface. This fixes the issue raised in D106286. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D106403	2021-07-23 09:12:05 -07:00
Craig Topper	cc6d302c91	[X86] Fix a bug in TEST with immediate creation This code tries to form a TEST from CMP+AND with an optional truncate in between. If we looked through the truncate, we may have extra bits in the AND mask that shouldn't participate in the checks. Normally SimplifyDemendedBits takes care of this, but the AND may have another user. So manually mask out any extra bits. Fixes PR51175. Differential Revision: https://reviews.llvm.org/D106634	2021-07-23 09:03:53 -07:00
Benjamin Kramer	dd70cd089a	[llvm][sve] Silence unused variable warning in Release builds. NFC	2021-07-23 16:16:35 +02:00

1 2 3 4 5 ...

63598 Commits