llvm-project

Author	SHA1	Message	Date
serge-sans-paille	38818b60c5	Move from llvm::makeArrayRef to ArrayRef deduction guides - llvm/ part Use deduction guides instead of helper functions. The only non-automatic changes have been: 1. ArrayRef(some_uint8_pointer, 0) needs to be changed into ArrayRef(some_uint8_pointer, (size_t)0) to avoid an ambiguous call with ArrayRef((uint8_t*), (uint8_t*)) 2. CVSymbol sym(makeArrayRef(symStorage)); needed to be rewritten as CVSymbol sym{ArrayRef(symStorage)}; otherwise the compiler is confused and thinks we have a (bad) function prototype. There were a few similar situations across the codebase. 3. ADL doesn't seem to work the same for deduction-guides and functions, so at some point the llvm namespace must be explicitly stated. 4. The "reference mode" of makeArrayRef(ArrayRef<T> &) that acts as no-op is not supported (a constructor cannot achieve that). Per reviewers' comment, some useless makeArrayRef have been removed in the process. This is a follow-up to https://reviews.llvm.org/D140896 that introduced the deduction guides. Differential Revision: https://reviews.llvm.org/D140955	2023-01-05 14:11:08 +01:00
Roman Lebedev	dbce1110f1	[NFC][DAG] Move `getOpcode_EXTEND*()` helpers from X86 into SelectionDAG To be used in an upcoming patch.	2023-01-05 01:12:30 +03:00
Roman Lebedev	e4b260efb2	[Codegen][X86] `LowerBUILD_VECTOR()`: improve lowering w/ multiple FREEZE-UNDEF ops While we have great handling for UNDEF operands, FREEZE-UNDEF operands are effectively normal operands. We are better off "interleaving" such BUILD_VECTORS into a blend between a splat of FREEZE-UNDEF, and "thawed" source BUILD_VECTOR, both of which are more natural for us to handle. Refs. `f738ab9075 (r95017306)`	2023-01-04 21:16:11 +03:00
Thomas Köppe	82be8a1d2b	[X86] Emit RIP-relative access to local function in PIC medium code model Currently, the medium code model for x86_64 emits position-dependent relocations (R_X86_64_64) for local functions, regardless of PIC or no-PIC mode. (This means generically that code compiled with the medium model cannot be linked into a position-independent executable.) Example: ``` static int g(int n) { return 2 * n + 3; } void f(int(**p)(int)) { *p = g; } ``` This results in: ``` Disassembly of section .text: 0000000000000000 <f>: 0: 48 b8 00 00 00 00 00 00 00 00 movabs rax, 0x0 a: 48 89 07 mov qword ptr [rdi], rax d: c3 ret ``` ``` Relocation section '.rela.text' at offset 0xf0 contains 1 entries: Offset Info Type Symbol's Value Symbol's Name + Addend 0000000000000002 0000000200000001 R_X86_64_64 0000000000000000 .text + 10 ``` This patch changes the behaviour to unconditionally emit a RIP-relative access, both in PIC and non-PIC mode. This fixes PIC mode, and is perhaps an improvement in non-PIC mode, too, since it results in a shorter instruction. A 32-bit relocation should suffice since the medium memory model demands that all code fit within 2GiB. Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D140593	2022-12-28 11:14:39 -08:00
Evgenii Kudriashov	15dd5ed96c	[X86] Support ANDNP combine through vector_shuffle Combine ``` and (vector_shuffle<Z,...,Z> (insert_vector_elt undef, (xor X, -1), Z), undef), Y -> andnp (vector_shuffle<Z,...,Z> (insert_vector_elt undef, X, Z), undef), Y ``` Reviewed By: RKSimon, pengfei Differential Revision: https://reviews.llvm.org/D138521	2022-12-22 16:55:14 +08:00
Craig Topper	eeb8de9363	[X86] Replace getOperand calls with an existing variable. NFC	2022-12-20 19:27:11 -08:00
Qiu Chaofan	a40ef656d8	[Intrinsic] Rename flt.rounds intrinsic to get.rounding Address the inconsistency between FLT_ROUNDS_ and SET_ROUNDING SDAG node. Rename FLT_ROUNDS_ to GET_ROUNDING and add llvm.get.rounding intrinsic to replace flt.rounds. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D139507	2022-12-19 15:22:39 +08:00
Simon Pilgrim	37c3b83bd8	[X86] combineBitcastvxi1 - handle boolmask sign-extension through vselect See if we can freely sign-extend both sources of a vselect operand, also handle allones constant build vectors (easily rematerializable and uses in the test case). Fixes #59526	2022-12-15 16:40:44 +00:00
Matt Arsenault	c16a58b36c	Attributes: Add function getter to parse integer string attributes The most common case for string attributes parses them as integers. We don't have a convenient way to do this, and as a result we have inconsistent missing attribute and invalid attribute handling scattered around. We also have inconsistent radix usage to getAsInteger; some places use the default 0 and others use base 10. Update a few of the uses, but there are quite a lot of these.	2022-12-14 13:12:35 -05:00
Simon Pilgrim	463910ab2a	[X86] Don't fold scalar_to_vector(i64 C) -> vzext_movl(scalar_to_vector(i32 C)) Fixes constant-folding infinite loop reported by @uabelho on rG5ca77541446d	2022-12-14 12:11:06 +00:00
Simon Pilgrim	4f41ea2016	[X86] lowerShuffleAsVTRUNC - bit shift the offset elements into place instead of shuffle This helps avoid issues on non-BWI targets which can end up splitting the shuffles to 2 x 256-bit bitshifts of a smaller scalar width	2022-12-14 11:41:14 +00:00
Simon Pilgrim	b3eaf40166	[X86] lowerShuffleAsVTRUNC - improve detection of cheap/free vector concatenation Handle the case where the lo/hi subvectors are a split load.	2022-12-14 10:49:44 +00:00
Phoebe Wang	57f71dccd3	[NFC] Fix duplicated `Src`	2022-12-13 22:44:28 +08:00
Simon Pilgrim	4177e6cd4f	[X86] lowerShuffleAsVTRUNC - support offseted truncations Extend the <0,Scale,2Scale,..> pattern to allow for a fixed offset <Offset,Offset+Scale,Offset+2Scale,..> pattern, which will lower to a single additional bitshift/pshufd. At the moment I've limited this to cases where the LHS/RHS operands are concatenated for free, but this is only to avoid a couple of regressions that should be easily addressable in followups.	2022-12-13 14:00:35 +00:00
Kazu Hirata	20cde15415	[Target] Use std::nullopt instead of None (NFC) This patch mechanically replaces None with std::nullopt where the compiler would warn if None were deprecated. The intent is to reduce the amount of manual work required in migrating from Optional to std::optional. This is part of an effort to migrate from llvm::Optional to std::optional: https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716	2022-12-02 20:36:06 -08:00
Krzysztof Parzyszek	864aaa21b4	TargetLowering: convert Optional to std::optional	2022-12-01 16:19:10 -08:00
Phoebe Wang	54ebf1c4a1	[X86][FP16] Do not combine fminnum/fmaxnum for FP16 emulation Under the emulation situation, we lack native fmin/fmax instruction support. Fixes #59258 Reviewed By: skan, spatel Differential Revision: https://reviews.llvm.org/D139078	2022-12-01 23:24:40 +08:00
Freddy Ye	89f36dd8f3	[X86] Add ExpandLargeFpConvert Pass and enable for X86 As stated in https://discourse.llvm.org/t/rfc-llc-add-expandlargeintfpconvert-pass-for-fp-int-conversion-of-large-bitint/65528, this implementation is very similar to ExpandLargeDivRem, which expands ‘fptoui .. to’, ‘fptosi .. to’, ‘uitofp .. to’, ‘sitofp .. to’ instructions with a bitwidth above a threshold into auto-generated functions. This is useful for targets like x86_64 that cannot lower fp conversions with more than 128 bits. The expanded nodes are based on the IR generated by `compiler-rt/lib/builtins/floattidf.c`, `compiler-rt/lib/builtins/fixdfti.c`, etc. Corner cases: 1. For fp16: there are no related builtins in compiler-rt, so I mainly used the fp32 <-> fp16 lib calls to implement it. 2. For fp80: this pass is soft fp emulation and no fp80 instructions can help with this problem. I recommend users to deprecate this usage. For now, the implementation uses fp128 as the temporary conversion type and inserts fptrunc/ext at top/end of the function. 3. For bf16: as clang FE currently doesn't support bf16 algorithm operations (convert to int, float, +, -, *, ...), this patch doesn't consider bf16 for now. 4. For unsigned FPToI: both default hardware behaviors and libgcc ignore the "returns 0 for negative input" spec, so this pass follows this old way and ignores it for unsigned FPToI as well. See this example: https://gcc.godbolt.org/z/bnv3jqW1M The end-to-end tests are uploaded at https://reviews.llvm.org/D138261 Reviewed By: LuoYuanke, mgehre-amd Differential Revision: https://reviews.llvm.org/D137241	2022-12-01 13:47:43 +08:00
Simon Pilgrim	c757780c62	[X86] lowerShuffleAsDecomposedShuffleMerge - try to match unpck(permute(x),permute(y)) for v4i32/v2i64 shuffles We're using lowerShuffleAsPermuteAndUnpack, which can probably be improved to handle 256/512-bit types pretty easily. First step towards trying to address the poor vector-shuffle-sse4a.ll pre-SSSE3 codegen mentioned on D127115	2022-11-25 16:24:56 +00:00
Simon Pilgrim	38275ab1b3	[X86] Move lowerShuffleAsPermuteAndUnpack earlier in the source next to similar helpers. NFC. I'm currently investigating using this inside lowerShuffleAsDecomposedShuffleMerge	2022-11-25 14:56:38 +00:00
Simon Pilgrim	6fd0ae39be	[X86] combineScalarAndWithMaskSetcc - handle (concat_vectors (and (vYi1 setcc, vYi1 x), undef)) patterns If one of the AND operands is a setcc then we're implicitly zeroing the upper mask bits Similar pattern to regressions identified in D127115 (masked comparisons)	2022-11-25 11:16:24 +00:00
Simon Pilgrim	dbe2f44316	[X86] combineScalarAndWithMaskSetcc - optionally peek through (oneuse) any_extend node Extend pass to handle: (and (any_extend (bitcast (vXi1 (concat_vectors (vYi1 setcc), undef,)))), C) Fixes several regressions identified in D127115	2022-11-24 16:26:35 +00:00
Phoebe Wang	7218103bca	[X86] Use lock add/sub/or/and/xor for cases that we only care about the EFLAGS (negated cases) This fixes #58685 Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D138428	2022-11-23 09:39:04 +08:00
Davide Italiano	0c011335c9	[X86] Don't lower f16->f80 fpext to libcall on darwin. We don't provide __extendhfxf2, and only have the soft-float __extendhfsf2 in compiler-rt. This only changed recently with 655ba9c8a1d2, so this patch reverts back to the previous behavior. However, the f80->f16 fptrunc is not easily implementable without the compiler-rt __truncxfhf2, but that has always been true, and isn't an immediate regression. Patch by Ahmed Bougacha. rdar://102194995	2022-11-22 12:32:22 -08:00
Phoebe Wang	b39b76f2ef	[X86] Allow no X87 on 32-bit This patch is an alternative of D100091. It solved the problems in `f80` type lowering. Reviewed By: LuoYuanke Differential Revision: https://reviews.llvm.org/D137946	2022-11-22 10:47:47 +08:00
Benjamin Kramer	e2bff1e489	[X86] Fix atomic rmw intrinsic expansion for non-opaque pointers This is a bit annoying, but there are still users out there that got broken by this (this time it was numba). We need to keep some barebones support around until non-opaque pointers are completely gone.	2022-11-20 15:39:30 +01:00
Phoebe Wang	510e5fba16	[X86] Use lock or/and/xor for cases that we only care about the EFLAGS This is a follow up of D137711 to fix the rest of #58685. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D138294	2022-11-20 10:42:48 +08:00
Phoebe Wang	d558255650	[X86] Use lock add/sub for cases that we only care about the EFLAGS This fixes #36373, #36905 and part of #58685. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D137711	2022-11-18 21:43:47 +08:00
Stanislav Mekhanoshin	bcaf31ec3f	[AMDGPU] Allow finer grain control of an unaligned access speed A target can return if a misaligned access is 'fast' as defined by the target or not. In reality there can be different levels of 'fast' and 'slow'. This patch changes the boolean 'Fast' argument of the allowsMisalignedMemoryAccesses family of functions to an unsigned representing its speed. A target can still define it as it wants and the direct translation of the current code uses 0 and 1 for current false and true. This makes the change an NFC. Subsequent patch will start using an actual value of speed in the load/store vectorizer to compare if a vectorized access going to be not just fast, but not slower than before. Differential Revision: https://reviews.llvm.org/D124217	2022-11-17 09:23:53 -08:00
Simon Pilgrim	ff252e6b13	[X86] combineConcatVectorOps - don't concat(vselect,vselect) if the concatenated selection mask isn't legal One of the crash regression tests now exposes an existing issue with SelectionDAG::simplifySelect not folding vselect with constant masks Fixes #59003	2022-11-16 11:49:14 +00:00
Tim Northover	2bcf51c7f8	X86: call fp16-conversion functions soft-float on Darwin. We've been shipping implementations of these with a soft-float ABI since MacOS 10.10 in 2014 and there's evidence they're in binaries now, so we can't easily switch to %xmm0. This emits special libcalls with casts in place to restore the soft-float ABI for __truncdfhf2, __truncsfhf2, and __extendhfsf2.	2022-11-10 10:00:01 +00:00
Nathan James	6aa050a690	Reland "[llvm][NFC] Use c++17 style variable type traits" This reverts commit 632a389f96355cbe7ed8fa7b8d2ed6267c92457c. This relands commit 1834a310d060d55748ca38d4ae0482864c2047d8. Differential Revision: https://reviews.llvm.org/D137493	2022-11-08 14:15:15 +00:00
Nathan James	632a389f96	Revert "[llvm][NFC] Use c++17 style variable type traits" This reverts commit 1834a310d060d55748ca38d4ae0482864c2047d8.	2022-11-08 13:11:41 +00:00
Nathan James	1834a310d0	[llvm][NFC] Use c++17 style variable type traits This was done as a test for D137302 and it makes sense to push these changes Reviewed By: dblaikie Differential Revision: https://reviews.llvm.org/D137493	2022-11-08 12:22:52 +00:00
Simon Pilgrim	90ec51a9ab	[X86] combineConcatVectorOps - fold 512-bit concat(GF2P8AFFINEQB(x,y,c),GF2P8AFFINEQB(z,w,c)) -> GF2P8AFFINEQB(concat(x,z),concat(y,w),c) Now that D137036 has landed, we just need AVX512F support to generate 512-bit GF2P8AFFINEQB ops	2022-11-01 12:06:46 +00:00
Freddy Ye	aee2a35ac4	[X86] Add AVX-NE-CONVERT instructions. For more details about these instructions, please refer to the latest ISE document: https://www.intel.com/content/www/us/en/develop/download/intel-architecture-instruction-set-extensions-programming-reference.html Reviewed By: pengfei Differential Revision: https://reviews.llvm.org/D135930	2022-10-31 23:39:38 +08:00
Simon Pilgrim	b172c7e193	[X86] combineConcatVectorOps - fold concat(GF2P8AFFINEQB(x,y,c),GF2P8AFFINEQB(z,w,c)) -> GF2P8AFFINEQB(concat(x,z),concat(y,w),c) Pulled out of D137026	2022-10-31 12:27:57 +00:00
Freddy Ye	23f02693ec	[X86] Add AVX-VNNI-INT8 instructions. For more details about these instructions, please refer to the latest ISE document: https://www.intel.com/content/www/us/en/develop/download/intel-architecture-instruction-set-extensions-programming-reference.html Reviewed By: pengfei, skan Differential Revision: https://reviews.llvm.org/D135938	2022-10-28 10:39:54 +08:00
Phoebe Wang	b51b90d6e2	[X86][1/2] SUPPORT RAO-INT For more details about these instructions, please refer to the latest ISE document: https://www.intel.com/content/www/us/en/develop/download/intel-architecture-instruction-set-extensions-programming-reference.html Initial authored by Liu Chen (@LiuChen3) Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D135951	2022-10-27 17:20:07 +08:00
Simon Pilgrim	ed1b0da557	[X86] combineConcatVectorOps - fold v4i64/v8x32 concat(broadcast(),broadcast()) -> permilps(concat()) Extend the existing v4f64 fold to handle v4i64/v8f32/v8i32 as well Fixes #58585	2022-10-25 15:37:42 +01:00
Simon Pilgrim	c4051b2606	[X86] Fold vbroadcast(bitcast(vbroadcast(src))) -> bitcast(vbroadcast(vbroadcast(src))) If the inner broadcast scalar type is smaller/same width as the outer broadcast scalar type then we can broadcast using the same inner type directly. Works for vbroadcast_load as well.	2022-10-25 14:03:43 +01:00
Freddy Ye	fdac4c4e92	[X86] Add CMPCCXADD instructions. For more details about these instructions, please refer to the latest ISE document: https://www.intel.com/content/www/us/en/develop/download/intel-architecture-instruction-set-extensions-programming-reference.html Reviewed By: pengfei, skan Differential Revision: https://reviews.llvm.org/D135933	2022-10-25 14:33:39 +08:00
Simon Pilgrim	4e8f847676	[X86][AVX512] Fold extract_element(bitcast(<X x i1>) -> bitcast(extract_subvector()) On AVX512, extract legal bool vectors as bool subvectors before bitcasting to scalars to avoid spilling to stack. This helps rust which internally represents bool vectors as bool arrays It also exposes more missed opportunities to use the KADD instruction to add masks together before moving to gpr Fixes #58546	2022-10-23 14:47:24 +01:00
Simon Pilgrim	c175d880a4	[X86] Add freeze(pshufd/permilps(x,imm)) -> pshufd/permilps(freeze(x),imm) folding Add X86 isGuaranteedNotToBeUndefOrPoisonForTargetNode / canCreateUndefOrPoisonForTargetNode overrides and add X86ISD::PSHUFD/VPERMILPI handling.	2022-10-23 10:39:12 +01:00
Kazu Hirata	5bb00cd309	[llvm] Use llvm::is_contained (NFC)	2022-10-22 08:57:37 -07:00
Xiang1 Zhang	661881d436	[X86] Add AMX-FP16 instructions. Differential Revision: https://reviews.llvm.org/D135941	2022-10-22 08:05:22 +08:00
Simon Pilgrim	5ca7754144	[X86] Fold scalar_to_vector(i64 zext(x)) -> bitcast(vzext_movl(scalar_to_vector(i32 x))) Extends existing anyextend fold to make use of the implicit zero-extension of the movd instruction This also helps replace some nasty xmm->gpr->xmm traffic with a shuffle pattern instead Noticed while looking at D130953	2022-10-21 10:40:13 +01:00
Phoebe Wang	bc1819389f	[X86][RFC] Using `__bf16` for AVX512_BF16 intrinsics This is an alternative of D120395 and D120411. Previously we use `__bfloat16` as a typedef of `unsigned short`. The name may give user an impression it is a brand new type to represent BF16. So that they may use it in arithmetic operations and we don't have a good way to block it. To solve the problem, we introduced `__bf16` to X86 psABI and landed the support in Clang by D130964. Now we can solve the problem by switching intrinsics to the new type. Reviewed By: LuoYuanke, RKSimon Differential Revision: https://reviews.llvm.org/D132329	2022-10-19 23:47:04 +08:00
Han Zhu	d0d48a91f8	[X86] Lower vector interleave into unpck and perm [This Godbolt link](https://godbolt.org/z/s17Kv1s9T) shows different codegen between clang and gcc for a transpose operation. clang result: ``` vmovdqu xmm0, xmmword ptr [rcx + rax] vmovdqu xmm1, xmmword ptr [rcx + rax + 16] vmovdqu xmm2, xmmword ptr [r8 + rax] vmovdqu xmm3, xmmword ptr [r8 + rax + 16] vpunpckhbw xmm4, xmm2, xmm0 vpunpcklbw xmm0, xmm2, xmm0 vpunpcklbw xmm2, xmm3, xmm1 vpunpckhbw xmm1, xmm3, xmm1 vmovdqu xmmword ptr [rdi + 2*rax + 48], xmm1 vmovdqu xmmword ptr [rdi + 2*rax + 32], xmm2 vmovdqu xmmword ptr [rdi + 2*rax], xmm0 vmovdqu xmmword ptr [rdi + 2*rax + 16], xmm4 ``` gcc result: ``` vmovdqu ymm3, YMMWORD PTR [rdi+rax] vpunpcklbw ymm1, ymm3, YMMWORD PTR [rsi+rax] vpunpckhbw ymm0, ymm3, YMMWORD PTR [rsi+rax] vperm2i128 ymm2, ymm1, ymm0, 32 vperm2i128 ymm1, ymm1, ymm0, 49 vmovdqu YMMWORD PTR [rcx+rax*2], ymm2 vmovdqu YMMWORD PTR [rcx+32+rax*2], ymm1 ``` clang's code is roughly 15% slower than gcc's when evaluated on an internal compression benchmark. 
The loop vectorizer generates the following shufflevector intrinsic: ``` %interleaved.vec = shufflevector <32 x i8> %a, <32 x i8> %b, <64 x i32> <i32 0, i32 32, i32 1, i32 33, i32 2, i32 34, i32 3, i32 35, i32 4, i32 36, i32 5, i32 37, i32 6, i32 38, i32 7, i32 39, i32 8, i32 40, i32 9, i32 41, i32 10, i32 42, i32 11, i32 43, i32 12, i32 44, i32 13, i32 45, i32 14, i32 46, i32 15, i32 47, i32 16, i32 48, i32 17, i32 49, i32 18, i32 50, i32 19, i32 51, i32 20, i32 52, i32 21, i32 53, i32 22, i32 54, i32 23, i32 55, i32 24, i32 56, i32 25, i32 57, i32 26, i32 58, i32 27, i32 59, i32 28, i32 60, i32 29, i32 61, i32 30, i32 62, i32 31, i32 63> ``` which is lowered to SelectionDAG: ``` t2: v32i8,ch = CopyFromReg t0, Register:v32i8 %0 t6: v64i8 = concat_vectors t2, undef:v32i8 t4: v32i8,ch = CopyFromReg t0, Register:v32i8 %1 t7: v64i8 = concat_vectors t4, undef:v32i8 t8: v64i8 = vector_shuffle<0,64,1,65,2,66,3,67,4,68,5,69,6,70,7,71,8,72,9,73,10,74,11,75,12,76,13,77,14,78,15,79,16,80,17,81,18,82,19,83,20,84,21,85,22,86,23,87,24,88,25,89,26,90,27,91,28,92,29,93,30,94,31,95> t6, t7 ``` So far this `vector_shuffle` is good enough for us to pattern-match and transform, but as we go down the SelectionDAG pipeline, it got split into smaller shuffles. During dagcombine1, the shuffle is split by `foldShuffleOfConcatUndefs`. 
``` // shuffle (concat X, undef), (concat Y, undef), Mask --> // concat (shuffle X, Y, Mask0), (shuffle X, Y, Mask1) t2: v32i8,ch = CopyFromReg t0, Register:v32i8 %0 t4: v32i8,ch = CopyFromReg t0, Register:v32i8 %1 t19: v32i8 = vector_shuffle<0,32,1,33,2,34,3,35,4,36,5,37,6,38,7,39,8,40,9,41,10,42,11,43,12,44,13,45,14,46,15,47> t2, t4 t15: ch,glue = CopyToReg t0, Register:v32i8 $ymm0, t19 t20: v32i8 = vector_shuffle<16,48,17,49,18,50,19,51,20,52,21,53,22,54,23,55,24,56,25,57,26,58,27,59,28,60,29,61,30,62,31,63> t2, t4 t17: ch,glue = CopyToReg t15, Register:v32i8 $ymm1, t20, t15:1 ``` With `foldShuffleOfConcatUndefs` commented out, the vector is still split later by the type legalizer, which comes after dagcombine1, because v64i8 is not a legal type in AVX2 (64 * 8 = 512 bits while ymm = 256 bits). There doesn't seem to be a good way to avoid this split. Lowering the `vector_shuffle` into unpck and perm during dagcombine1 is too early. Therefore, although somewhat inconvenient, we decided to go with pattern-matching a pair of vector shuffles later in the SelectionDAG pipeline, as part of `lowerV32I8Shuffle`. The code looks at the two operands of the first shuffle it encounters, iterates through the users of the operands, and tries to find two shuffles that are consecutive interleaves. Once the pattern is found, it lowers them into unpcks and perms. It returns the perm for the shuffle that's currently being lowered (to have ISel modify the DAG), and replaces the other shuffle in place. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D134477	2022-10-17 11:39:27 -07:00
Xiang1 Zhang	aad013de41	[InlineAsm][bugfix] Correct function addressing in inline asm In the Linux PIC model, there are 4 cases of value/label addressing: Case 1: Function call or label jmp inside the module. Case 2: Data access (such as global variable, static variable) inside the module. Case 3: Function call or label jmp outside the module. Case 4: Data access (such as global variable) outside the module. Because the current LLVM inline asm architecture is designed not to "recognize" the asm code, it is difficult to treat memory addressing differently for the same value/address used in different instructions. For example, in the PIC model, a function call may go through the PLT or be directly PC-relative, but a lea/mov of a function address may use the GOT. This patch fixes/refines case 1 and case 2 in inline asm. Because inline asm currently does not support jmp to an outside label, this patch mainly focuses on fixing the function call addressing bugs in inline asm. Reviewed By: Pengfei, RKSimon Differential Revision: https://reviews.llvm.org/D133914	2022-10-14 09:47:26 +08:00


8239 Commits