llvm-project

Author	SHA1	Message	Date
Stanislav Mekhanoshin	522b259976	[AMDGPU] Allow v_accvgpr_write to use SGPR src on gfx940 Differential Revision: https://reviews.llvm.org/D121843	2022-03-17 12:12:06 -07:00
Vang Thao	27e1931508	[AMDGPU] Fix PreRARematerialize scheduler pass sinking subreg defs When collecting trivially rematerializable defs, skip any subreg defs. We do not want to sink these. Differential Revision: https://reviews.llvm.org/D121874	2022-03-17 11:38:53 -07:00
Julian Lettner	22570bac69	Lower `@llvm.global_dtors` using `__cxa_atexit` on MachO For MachO, lower `@llvm.global_dtors` into `@llvm.global_ctors` with `__cxa_atexit` calls to avoid emitting the deprecated `__mod_term_func`. Reuse the existing `WebAssemblyLowerGlobalDtors.cpp` to accomplish this. Enable fallback to the old behavior via Clang driver flag (`-fregister-global-dtors-with-atexit`) or llc / code generation flag (`-lower-global-dtors-via-cxa-atexit`). This escape hatch will be removed in the future. Differential Revision: https://reviews.llvm.org/D121736	2022-03-17 10:47:13 -07:00
Matt Arsenault	8d66603a48	Revert "RegAllocGreedy: Fix last chance recolor assert in impossible case" This reverts commit c46aab01c002b7a04135b8b7f1f52d8c9ae23a58. This evidently blocks compiling in some cases that used to work before. I'm also not fully convinced this is the correct place to fix this problem.	2022-03-17 13:12:01 -04:00
Craig Topper	bbd2ecf9f0	[RISCV] Add +experimental-zvfh extension to cover half types in vectors. Currently we allow half types in vectors if the scalar Zfh extension is enabled. This behavior is not in line with the vector spec. For f32 and f64 types, the Zve32f, Zve64f, Zve64d, and V extensions explicitly control the availability of floating point types in vectors. In order to make our compiler compliant, we either need to remove all support for half in vectors or we need an extension to control it. Draft spec here https://github.com/riscv/riscv-v-spec/pull/780 Reviewed By: kito-cheng Differential Revision: https://reviews.llvm.org/D121345	2022-03-17 10:04:02 -07:00
Yonghong Song	d2b4a675a8	[BPF] Fix a bug in BPFAdjustOpt pass for icmp transformation When checking a bcc issue related to bcc tool inject.py, I found a bug in BPFAdjustOpt pass for icmp transformation, caused by typos. For the following condition: Cond2Op != ICmpInst::ICMP_SLT && Cond1Op != ICmpInst::ICMP_SLE it should be Cond2Op != ICmpInst::ICMP_SLT && Cond2Op != ICmpInst::ICMP_SLE This patch fixed the problem and a test case is added. Differential Revision: https://reviews.llvm.org/D121883	2022-03-17 09:25:18 -07:00
David Green	0fa4aeb453	[AArch64] Add extra insert-subvector tests. NFC	2022-03-17 15:29:07 +00:00
Sanjay Patel	67e9151096	[x86] try harder to use shift instead of test if it can save some immediate bytes We favor 'and' and 'test' in earlier phases of optimization, and that's usually the better option, but we can save a few instruction bytes by converting a mask constant to a shift here. Differential Revision: https://reviews.llvm.org/D121147	2022-03-17 09:10:57 -04:00
David Green	0b6df40c52	[AArch64] Combine ISD::AND into AArch64ISD::ANDS If we already have an AArch64ISD::ANDS node with identical operands, we can merge any ISD::AND into it, reducing the instruction count by calculating the value and the flags in a single operation. This code is taken from the X86 backend, and could also handle AArch64ISD::ADDS and AArch64ISD::SUBS, but I couldn't find any test cases where it came up. Differential Revision: https://reviews.llvm.org/D118584	2022-03-17 09:44:11 +00:00
Lian Wang	214afc7116	[RISCV] Add patterns for vnsrl.wi and vnsra.wi instructions Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D121675	2022-03-17 07:22:32 +00:00
Abinav Puthan Purayil	f59cb41ba1	[AMDGPU] Select buffer_atomic_cmpswap* in tblgen This change replaces the manual selection of buffer_atomic_cmpswap* instructions in SelectionDAG and GlobalISel with a tblgen based selection in BUFInstructions.td. This allows us to select the return and no-return variants in tblgen. Differential Revision: https://reviews.llvm.org/D121770	2022-03-17 10:12:32 +05:30
Heejin Ahn	b8038a916d	[WebAssembly] Disable SimplifyDemandedVectorElts after legalization This fixes a reported bug that caused an infinite loop during the SelectionDAG optimization phase in ISel, by creating an overridable hook in `TargetLowering` that allows us to bail out from running `SimplifyDemandedVectorElts`. Reviewed By: tlively Differential Revision: https://reviews.llvm.org/D121869	2022-03-16 20:52:43 -07:00
Heejin Ahn	0ca2132067	[WebAssembly] Improve EH/SjLj error messages This includes a function name and a relevant instruction in error messages when possible, making them more helpful. Reviewed By: dschuff Differential Revision: https://reviews.llvm.org/D120678	2022-03-16 20:50:34 -07:00
Christudasan Devadasan	6dd21d1db1	[AMDGPU][SIFoldOperands] Consider the alignment constraints Enforced an alignment check while folding the operands.	2022-03-17 08:27:53 +05:30
Christudasan Devadasan	af717d4aca	[AMDGPU][MachineVerifier] Alignment check for fp32 packed math instructions The fp32 packed math instructions are introduced in gfx90a. If their vector register operands are not properly aligned, the verifier should flag them. Previously, the verifier failed to report it and the compiler ended up emitting broken assembly. This patch fixes that missed case in TII::verifyInstruction. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D121794	2022-03-17 08:21:35 +05:30
Craig Topper	74cf8575f7	[RISCV] Remove stale FIXME from a test. NFC	2022-03-16 14:55:11 -07:00
Craig Topper	2e10671ec7	[RISCV] Improve detection of when to skip (and (srl x, c2) c1) -> (srli (slli x, c3-c2), c3) isel. We have a special case to skip this transform if c1 is 0xffffffff and x is sext_inreg in order to use sraiw+zext.w. But we were only checking that we have a sext_inreg opcode, not how many bits are being sign extended. This commit adds a check that it is a sext_inreg from i32 so we know for sure that an sraiw can be created.	2022-03-16 14:54:34 -07:00
Arthur Eubanks	2371c5a0e0	[OpaquePtr][ARM] Use elementtype on ldrex/ldaex/stlex/strex Includes verifier changes checking the elementtype, clang codegen changes to emit the elementtype, and ISel changes using the elementtype. Basically the same as D120527. Reviewed By: #opaque-pointers, nikic Differential Revision: https://reviews.llvm.org/D121847	2022-03-16 14:11:53 -07:00
Thomas Lively	7e8913d775	[WebAssembly] Fix names of SIMD instructions containing '_zero' Fix the instruction names to match the WebAssembly spec: - `i32x4.trunc_sat_zero_f64x2_{s,u}` => `i32x4.trunc_sat_f64x2_{s,u}_zero` - `f32x4.demote_zero_f64x2` => `f32x4.demote_f64x2_zero` Also rename related things like intrinsics, builtins, and test functions to match. Reviewed By: aheejin Differential Revision: https://reviews.llvm.org/D121661	2022-03-16 13:34:57 -07:00
Yonghong Song	98e2274458	[BPF] fix a CO-RE bitfield relocation error with >8 record alignment Jussi Maki reported a fatal error like below for a bitfield CO-RE relocation: fatal error: error in backend: Unsupported field expression for llvm.bpf.preserve.field.info, requiring too big alignment The failure is related to kernel struct thread_struct. The following is a simplified example. Suppose we have below structure: struct t2 { int a[8]; } __attribute__((aligned(64))) __attribute__((preserve_access_index)); struct t1 { int f1:1; int f2:2; struct t2 f3; } __attribute__((preserve_access_index)); Note that struct t2 has aligned(64), which is used sometimes in the kernel to enforce cache line alignment. The above struct will be encoded into BTF and the following is what C code looks like and the struct will appear in the file like vmlinux.h. struct t2 { int a[8]; long: 64; long: 64; long: 64; long: 64; } __attribute__((preserve_access_index)); struct t1 { int f1: 1; int f2: 2; long: 61; long: 64; long: 64; long: 64; long: 64; long: 64; long: 64; long: 64; struct t2 f3; } __attribute__((preserve_access_index)); Note that after origin_source -> BTF -> new_source transition, the new source has the same memory layout as the old one but the alignment interpretation inside the compiler could be different. The bpf program will use the latter explicitly padded structure as in vmlinux.h. In the above case, the compiler internal ABI alignment for new struct t1 is 16 while it is 4 for old struct t1. I didn't do a thorough investigation why the ABI alignment is 16 and I suspect it is related to anonymous padding in the above. Current BPF bitfield CO-RE handling requires alignment <= 8 so proper bitfield operation can be performed. Therefore, alignment 16 will cause a compiler fatal error. To fix the ABI alignment >=16, let us check whether the bitfield can be held within an 8-byte-aligned range. If this is the case, we can use alignment 8. Otherwise, a fatal error will be reported. Differential Revision: https://reviews.llvm.org/D121821	2022-03-16 12:16:46 -07:00
Jessica Clarke	659363c0cc	[RISCV] Ensure PseudoLA* can be hoisted Since we mark the pseudos as mayLoad but do not provide any MMOs, isSafeToMove conservatively returns false, stopping MachineLICM from hoisting the instructions. PseudoLA_TLS_GD does not actually expand to a load, so stop marking that as mayLoad to allow it to be hoisted, and for the others make sure to add MMOs during lowering to indicate they're GOT loads and thus can be freely moved. Fixes https://github.com/llvm/llvm-project/issues/54372 Reviewed By: MaskRay, arichardson Differential Revision: https://reviews.llvm.org/D121654	2022-03-16 18:45:36 +00:00
Jessica Clarke	883f755639	[NFC][RISCV] Pre-commit tests for hoisting of PseudoLLA/PseudoLA* Only PseudoLLA is currently hoisted; this will be fixed in a subsequent commit.	2022-03-16 18:45:19 +00:00
Jake Egan	c7dc9dbaff	[VE] Remove output to /dev/stdout Sending output to /dev/stdout on AIX gets an llc permission denied error, so this patch removes this from the tests. Reviewed By: simoll, hubert.reinterpretcast Differential Revision: https://reviews.llvm.org/D121799	2022-03-16 11:42:09 -04:00
Simon Pilgrim	e3deb7d88b	[X86] computeKnownBitsForTargetNode - add X86ISD::AND KnownBits handling Fixes #54171	2022-03-16 11:05:36 +00:00
Simon Pilgrim	330b532a34	[X86] Add PR54171 test case	2022-03-16 11:05:36 +00:00
Simon Moll	91fad1167a	[VE] v512|256 f32|64 fneg isel and tests fneg instruction isel and tests. We do this also in preparation of fused negate-multiply-add fp operations. Reviewed By: kaz7 Differential Revision: https://reviews.llvm.org/D121620	2022-03-16 11:31:26 +01:00
Florian Hahn	e5822ded56	[FunctionAttrs] Infer argmemonly. This patch adds initial argmemonly inference, by checking the underlying objects of locations returned by MemoryLocation. I think this should cover most cases, except function calls to other argmemonly functions. I'm not sure if there's a reason why we don't infer those yet. Additional argmemonly can improve codegen in some cases. It also makes it easier to come up with a C reproducer for 7662d1687b09 (already fixed, but I'm trying to see if C/C++ fuzzing could help to uncover similar issues.) Compile-time impact: NewPM-O3: +0.01% NewPM-ReleaseThinLTO: +0.03% NewPM-ReleaseLTO+g: +0.05% https://llvm-compile-time-tracker.com/compare.php?from=067c035012fc061ad6378458774ac2df117283c6&to=fe209d4aab5b593bd62d18c0876732ddcca1614d&stat=instructions Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D121415	2022-03-16 10:24:33 +00:00
David Green	09a2b5b506	[AArch64] Regenerate and extend peephole-and-tst.ll tests. NFC	2022-03-16 09:44:20 +00:00
Matthias Gehre	09854f2af3	[SelectionDAG] Emit calls to __divei4 and friends for division/remainder of large integers Emit calls to __divei4 and friends for division/remainder of large integers. This fixes https://github.com/llvm/llvm-project/issues/44994. The overall RFC is in https://discourse.llvm.org/t/rfc-add-support-for-division-of-large-bitint-builtins-selectiondag-globalisel-clang/60329 The compiler-rt part is in https://reviews.llvm.org/D120327 Differential Revision: https://reviews.llvm.org/D120329	2022-03-16 09:36:28 +00:00
Nikita Popov	57d57b1afd	[AAEval] Make compatible with opaque pointers With opaque pointers, we cannot use the pointer element type to determine the LocationSize for the AA query. Instead, -aa-eval tests are now required to have an explicit load or store for any pointer they want to compute alias results for, and the load/store types are used to determine the location size. This may affect ordering of results, and sorting within one result, as the type is not considered part of the sorted string anymore. To somewhat minimize the churn, printing still uses faux typed pointer notation.	2022-03-16 10:02:11 +01:00
Haocong.Lu	6a54776fe0	[RISCV] Select SRLI+SLLI for AND with leading ones mask Select SRLI+SLLI for and i64 %x, imm if the imm is a leading ones mask. It's useful in RV64 when the mask exceeds simm32 (cannot be generated by LUI). Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D121598	2022-03-16 02:10:57 +00:00
Matthias Braun	84ef62126a	X86ISelDAGToDAG: Transform TEST + MOV64ri to SHR + TEST Optimize a pattern where a sequence of 8/16 or 32 bits is tested for zero: LLVM normalizes this towards an `AND` with mask which is usually good, but does not work well on X86 when the mask does not fit into a 32-bit immediate. This DagToDAG peephole transforms sequences like `movabsq $562941363486720, %rax` (imm = 0x1FFFE00000000) followed by `testq %rax, %rdi` to `shrq $33, %rdi` followed by `testw %di, %di`. The result has a shorter encoding and saves a register if the tested value isn't used otherwise. Differential Revision: https://reviews.llvm.org/D121320	2022-03-15 14:18:04 -07:00
Matthias Braun	baae814377	Add tests for D121320 Differential Revision: https://reviews.llvm.org/D121319	2022-03-15 14:18:04 -07:00
Stefan Pintilie	78406ac898	[PowerPC][P10] Add Vector pair calling convention Add the calling convention for the vector pair registers. These registers overlap with the vector registers. Part of an original patch by: Lei Huang Reviewed By: nemanjai, #powerpc Differential Revision: https://reviews.llvm.org/D117225	2022-03-15 14:08:42 -05:00
Joe Nash	687d20de7f	[AMDGPU] Regen checks again no-remat-indirect-mov NFC. Update script does not behave right since the run lines have identical output. Delete the duplicated check prefix added in 22cfbf7ecacdf7db47c2f65fe896bdf62ebcc0f3	2022-03-15 13:44:41 -04:00
Joe Nash	4cf86bd744	[AMDGPU] Regen checks for schedule-barrier NFC. Hasn't been updated since script added check-next	2022-03-15 13:35:43 -04:00
Joe Nash	22cfbf7eca	[AMDGPU] Regen checks for no-remat-indirect-mov NFC. Hasn't been updated since the update script started adding check-next. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D121719	2022-03-15 13:33:42 -04:00
Craig Topper	1bf4bbc492	[LegalizeTypes][RISCV][WebAssembly] Expand ABS in PromoteIntRes_ABS if it will expand to sra+xor+sub later. If we promote the ABS and then Expand in LegalizeDAG, then both the sra and the xor will have their inputs sign extended. This generates extra code on RISCV which lacks an i8 or i16 sign extend instruction. If we expand during type legalization, then only the sra will get its input sign extended. RISCV is able to combine this with the sra by doing a shift left followed by an sra. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D121664	2022-03-15 08:27:39 -07:00
Craig Topper	ad94dfb9a0	[DAGCombiner][RISCV] Adjust (aext (and (trunc x), cst)) -> (and x, cst) to sext cst based on target preference RISCV strongly prefers i32 values be sign extended to i64. This combine was always zero extending the constant using APInt methods. This adjusts the code so that it calls getNode using ISD::ANY_EXTEND instead. getNode will call TLI.isSExtCheaperThanZExt to decide how to handle the constant. Tests were copied from D121598 where I noticed that we were creating constants that were hard to materialize. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D121650	2022-03-15 08:26:47 -07:00
Simon Moll	6ac3d8ef9c	[VE] strided v256.32 isel and tests ISel for experimental.vp.strided.load|store for v256.32 types via lowering to vvp_load|store SDNodes. Reviewed By: kaz7 Differential Revision: https://reviews.llvm.org/D121616	2022-03-15 15:29:19 +01:00
Simon Pilgrim	7262eacd41	Revert rG9c542a5a4e1ba36c24e48185712779df52b7f7a6 "Lower `@llvm.global_dtors` using `__cxa_atexit` on MachO" Many of the build bots are complaining: Unknown command line argument '-lower-global-dtors'	2022-03-15 13:01:35 +00:00
Simon Pilgrim	f591231cad	[X86] combineSelect - canonicalize (vXi1 bitcast(iX Cond)) with combineToExtendBoolVectorInReg before legalization This replaces the attempt in 20af71f8ec47319d375a871db6fd3889c2487cbd to use combineToExtendBoolVectorInReg to create X86ISD::BLENDV masks directly, instead we use it to canonicalize the iX bitcast to a sign-extended mask and then truncate it back to vXi1 prior to legalization breaking it apart. Fixes #53760	2022-03-15 12:16:11 +00:00
Qiu Chaofan	300e1293de	[PowerPC] Disable perfect shuffle by default We are going to remove the old 'perfect shuffle' optimization since it brings a performance penalty in hot loops over vectors. For example, in the following loop sharing the same mask: %v.1 = shufflevector ... <0,1,2,3,8,9,10,11,16,17,18,19,24,25,26,27> %v.2 = shufflevector ... <0,1,2,3,8,9,10,11,16,17,18,19,24,25,26,27> The generated instructions will be `vmrglw-vmrghw-vmrglw-vmrghw` instead of `vperm-vperm`. In some large loop cases, this causes a 20%+ performance penalty. The original attempt to resolve this is to pre-record masks of every shufflevector operation in DAG, but that is somewhat complex and brings unnecessary computation (to scan all nodes) in optimization. Here we disable it by default. Some cases do become worse after this; they will be fixed in a more careful way in future patches. Reviewed By: jsji Differential Revision: https://reviews.llvm.org/D121082	2022-03-15 15:52:24 +08:00
Julian Lettner	9c542a5a4e	Lower `@llvm.global_dtors` using `__cxa_atexit` on MachO For MachO, lower `@llvm.global_dtors` into `@llvm.global_ctors` with `__cxa_atexit` calls to avoid emitting the deprecated `__mod_term_func`. Reuse the existing `WebAssemblyLowerGlobalDtors.cpp` to accomplish this. Enable fallback to the old behavior via Clang driver flag (`-fregister-global-dtors-with-atexit`) or llc / code generation flag (`-lower-global-dtors-via-cxa-atexit`). This escape hatch will be removed in the future. Differential Revision: https://reviews.llvm.org/D121327	2022-03-14 17:51:18 -07:00
Stanislav Mekhanoshin	c4500de255	[AMDGPU] gfx940: disable OP_SEL on V_DOT instructions Differential Revision: https://reviews.llvm.org/D121634	2022-03-14 17:02:00 -07:00
Stanislav Mekhanoshin	1f53f20fc1	[AMDGPU] Support gfx940 v_lshl_add_u64 instruction Differential Revision: https://reviews.llvm.org/D121401	2022-03-14 15:45:42 -07:00
Stanislav Mekhanoshin	36fe3f13a9	[AMDGPU] flat scratch SVS addressing mode for gfx940 Both VADDR and SADDR are used in SVS mode. Differential Revision: https://reviews.llvm.org/D121254	2022-03-14 15:23:36 -07:00
Stanislav Mekhanoshin	47bac63d3f	[AMDGPU] gfx940 memory model Differential Revision: https://reviews.llvm.org/D121242	2022-03-14 15:01:46 -07:00
Simon Pilgrim	edd3e705bb	[X86] Fix avx512.mask.vpshld/vpshrd tests to correctly test maskz cases	2022-03-14 21:20:26 +00:00
Stanislav Mekhanoshin	72a9e5f891	[AMDGPU] Restrict machine copy propagation from creating unaligned classes Fixes: SWDEV-326366 Differential Revision: https://reviews.llvm.org/D121491	2022-03-14 14:09:40 -07:00
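The BPF CO-RE bitfield fix (98e2274458) accepts a record whose ABI alignment is 16 or more as long as the bitfield can be held within an 8-byte-aligned range, in which case alignment 8 is used. One plausible way to express that condition, assuming bit offsets are measured from the start of an 8-byte-aligned record, is to require that the first and last bit of the field fall in the same 64-bit word. The Python sketch below is a hypothetical model of that check; `fits_in_aligned_8_bytes` and `access_alignment` are illustrative names, not LLVM functions.

```python
def fits_in_aligned_8_bytes(bit_offset, bit_size):
    """True if bits [bit_offset, bit_offset + bit_size) lie inside one 8-byte-aligned word."""
    first_word = bit_offset // 64
    last_word = (bit_offset + bit_size - 1) // 64
    return first_word == last_word


def access_alignment(record_align, bit_offset, bit_size):
    # Hypothetical model: alignments <= 8 are already supported as-is; larger
    # ones fall back to 8 when the bitfield fits in an 8-byte-aligned range.
    if record_align <= 8:
        return record_align
    if fits_in_aligned_8_bytes(bit_offset, bit_size):
        return 8
    raise ValueError("bitfield would require too big alignment")


# struct t1 from the commit message: f1 occupies bits [0, 1), f2 bits [1, 3).
print(access_alignment(16, 0, 1), access_alignment(16, 1, 2))  # 8 8
```

A field that straddles a 64-bit boundary (for example 8 bits starting at bit 60) still hits the error path in this model, mirroring the commit's "Otherwise, a fatal error will be reported."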

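The RISC-V commit selecting SRLI+SLLI for an AND with a leading-ones mask (6a54776fe0) rests on the identity `x & (~0 << k) == (x >> k) << k` with logical shifts: shifting right and then left by `k` clears exactly the low `k` bits. A mask such as `0xFFFFFFFF00000000` is `-2**32` as a signed value, outside simm32, so it cannot be built with LUI, but the shift pair never needs the mask at all. The Python sketch below assumes XLEN = 64; `leading_ones_shift` and `and_via_srli_slli` are hypothetical helper names.

```python
XLEN = 64
MASK = (1 << XLEN) - 1


def leading_ones_shift(imm):
    """Return k if imm has ones in bits k..XLEN-1 and zeros below (0 < k < XLEN), else None."""
    imm &= MASK
    low_zeros = ~imm & MASK  # a leading-ones mask inverts to 2**k - 1
    if low_zeros & (low_zeros + 1):
        return None  # the zero bits are not all at the bottom
    k = low_zeros.bit_length()
    return k if 0 < k < XLEN else None  # must be a valid, non-trivial shift amount


def and_via_srli_slli(x, k):
    # srli x, k ; slli x, k  (both logical shifts on an XLEN-bit register)
    return (((x & MASK) >> k) << k) & MASK


imm = 0xFFFFFFFF00000000  # -2**32 when read as signed: outside simm32
k = leading_ones_shift(imm)
print(k)  # 32
x = 0x123456789ABCDEF0
assert and_via_srli_slli(x, k) == x & imm
```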

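The X86ISelDAGToDAG peephole (84ef62126a) relies on a simple identity: when the mask is one contiguous run of 8, 16 or 32 set bits starting at bit `s`, testing `x & mask` for zero gives the same result as shifting `x` right by `s` and testing only the low 8, 16 or 32 bits. The Python sketch below checks this on the mask from the commit message, `0x1FFFE00000000`, which is 16 ones starting at bit 33. The helper names are hypothetical, and the model only compares the zero flag (ZF), not LLVM's actual pattern matching.

```python
MASK64 = (1 << 64) - 1


def shift_and_width(mask):
    """Return (shift, width) if mask is a single run of 8, 16 or 32 set bits, else None."""
    if mask == 0:
        return None
    shift = (mask & -mask).bit_length() - 1  # index of the lowest set bit
    run = mask >> shift
    width = run.bit_length()
    if run != (1 << width) - 1 or width not in (8, 16, 32):
        return None  # not contiguous, or no matching test{b,w,l} width
    return shift, width


def zf_after_testq(x, mask):
    # Models `movabsq $mask, %rax; testq %rax, %rdi`: ZF is set when x & mask == 0.
    return (x & mask) == 0


def zf_after_shr_test(x, shift, width):
    # Models `shrq $shift, %rdi; test{b,w,l}`: only the low `width` bits are tested.
    return (((x & MASK64) >> shift) & ((1 << width) - 1)) == 0


mask = 0x1FFFE00000000  # 562941363486720, from the commit message
shift, width = shift_and_width(mask)
print(shift, width)  # 33 16, i.e. `shrq $33` followed by `testw`
for x in (0, 1 << 33, 1 << 48, 1 << 49, (1 << 33) - 1, MASK64):
    assert zf_after_testq(x, mask) == zf_after_shr_test(x, shift, width)
```

Bits above the run (bit 49 and up in this example) end up above the tested width after the shift, which is why the narrower `testw` is sufficient.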
42604 Commits
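The `PromoteIntRes_ABS` change (1bf4bbc492) expands `abs` into the branch-free sequence `sign = x >>s (n-1); abs = (x ^ sign) - sign`. Expanding before promotion helps because only the arithmetic shift needs a correctly sign-extended input: the low `n` bits of an `xor` or `sub` depend only on the low `n` bits of their operands. The Python sketch below models an i8 `abs` held in a 64-bit RISC-V register whose upper bits are arbitrary, forming the sign mask with a shift left followed by an arithmetic shift right, as the commit message describes. `to_signed` and `abs_expanded` are hypothetical names; results wrap like two's-complement `abs`, so i8 `abs(-128)` stays `-128`.

```python
MASK64 = (1 << 64) - 1


def to_signed(v, bits):
    """Interpret the low `bits` bits of v as a two's-complement integer."""
    v &= (1 << bits) - 1
    return v - (1 << bits) if v >> (bits - 1) else v


def abs_expanded(reg, bits=8):
    # reg models a 64-bit register whose low `bits` bits hold x; upper bits are don't-care.
    # slli by (64 - bits), then srai by 63: copies bit (bits - 1) across the register,
    # giving 0 for non-negative x and -1 for negative x.
    sign = to_signed((reg << (64 - bits)) & MASK64, 64) >> 63
    # xor and sub need no sign-extended input: the low `bits` bits of the result
    # depend only on the low `bits` bits of reg and sign.
    return to_signed((reg ^ sign) - sign, bits)


print(abs_expanded(0xABCDEF00000000FB))  # low byte 0xFB is -5, prints 5
print(abs_expanded(0x80))                # i8 abs(-128) wraps to -128
```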