llvm-project

Author	SHA1	Message	Date
Farzon Lotfi	36b86438d7	[DXIL] Implement pow lowering (#86733 ) closes #86179 - `DXILIntrinsicExpansion.cpp` - add the pow expansion to exp2(y*log2(x))	2024-03-28 12:32:28 -04:00
Craig Topper	f90813543b	[MCP] Use MachineInstr::all_defs instead of MachineInstr::defs in hasOverlappingMultipleDef. (#86889 ) defs does not return the defs for inline assembly. We need to use all_defs to find them. Fixes #86880.	2024-03-28 08:37:19 -07:00
Amy Kwan	a3efc53f16	[AIX][TLS] Produce a faster local-exec access sequence for the "aix-small-tls" global variable attribute (#83053 ) Similar to 3f46e5453d9310b15d974e876f6132e3cf50c4b1, this patch allows the backend to produce a faster access sequence for the local-exec TLS model, where loading from the TOC can be avoided, for local-exec TLS variables that are annotated with the "aix-small-tls" attribute. The expectation is for local-exec TLS variables to be set with this attribute through PGO. Furthermore, the optimized access sequence is only generated for local-exec TLS variables annotated with "aix-small-tls", only if they are less than ~32KB in size.	2024-03-28 09:18:45 -04:00
Freddy Ye	36b4b9d988	[X86] Support immediate folding for CCMP/CTEST (#86616 ) E.g. %0:gr32 = MOV32ri 81 CTEST32rr %0, %1, 2, 10, implicit-def $eflags, implicit $eflags => CTEST32ri %1, 81, 2, 10, implicit-def $eflags, implicit $eflags	2024-03-28 18:54:32 +08:00
bvlgah	e640d9e725	[RISCV][GlobalISel] Fix legalizing ‘llvm.va_copy’ intrinsic (#86863 ) Hi, I spotted a problem when running benchmarking programs on a RISCV64 device. ## Issue Segmentation faults only occurred while running the programs compiled with `GlobalISel` enabled. Here is a small but complete example (it is adopted from [Google's benchmark framework](`95a9f0d0b4/MicroBenchmarks/libs/benchmark/src/colorprint.cc (L85-L119)`) to reproduce the issue, ```cpp #include <cstdarg> #include <cstdio> #include <iostream> #include <memory> #include <string> std::string FormatString(const char* msg, va_list args) { // we might need a second shot at this, so pre-emptivly make a copy va_list args_cp; va_copy(args_cp, args); std::size_t size = 256; char local_buff[256]; auto ret = vsnprintf(local_buff, size, msg, args_cp); va_end(args_cp); // currently there is no error handling for failure, so this is hack. // BM_CHECK(ret >= 0); if (ret == 0) // handle empty expansion return {}; else if (static_cast<size_t>(ret) < size) return local_buff; else { // we did not provide a long enough buffer on our first attempt. size = static_cast<size_t>(ret) + 1; // + 1 for the null byte std::unique_ptr<char[]> buff(new char[size]); ret = vsnprintf(buff.get(), size, msg, args); // BM_CHECK(ret > 0 && (static_cast<size_t>(ret)) < size); return buff.get(); } } std::string FormatString(const char* msg, ...) { va_list args; va_start(args, msg); auto tmp = FormatString(msg, args); va_end(args); return tmp; } int main() { std::string Str = FormatString("%-*s %13s %15s %12s", static_cast<int>(20), "Benchmark", "Time", "CPU", "Iterations"); std::cout << Str << std::endl; } ``` Use `clang++ -fglobal-isel -o main main.cpp` to compile it. ## Cause I have examined MIR, it shows that these segmentation faults resulted from a small mistake about legalizing the intrinsic function `llvm.va_copy`. 
`36e74cfdbd/llvm/lib/Target/RISCV/GISel/RISCVLegalizerInfo.cpp (L451-L453)` `DstLst` and `Tmp` are placed in the wrong order. ## Changes I have tweaked the test case `CodeGen/RISCV/GlobalISel/vararg.ll` so that `s0` is used as the frame pointer (not in all checks) which points to the starting address of the save area. I believe that it helps reason about how `llvm.va_copy` is handled.	2024-03-28 13:09:18 +03:00
Luke Lau	eff4593a64	[RISCV] Add test case for missed vwaddu.vv due to add->or combine. NFC We should be able to recover this with combineBinOp_VLToVWBinOp_VL if we check that the or has the disjoint flag set.	2024-03-28 16:58:52 +08:00
Vyacheslav Levytskyy	b7ac8fddb5	[SPIR-V] Improve type inference: deduce types of composite data structures (#86782 ) This PR improves type inference in general and deduces types of composite data structures in particular. It also adds a way to insert a bitcast that makes a function call valid when argument types mismatch due to opaque-pointer type inference. The attached test `pointers/nested-struct-opaque-pointers.ll` demonstrates the new capabilities: the SPIRV code emitted for this test is now (1) valid in the sense of data field types and (2) accepted by `spirv-val`. Stricter LIT checks, support for more composite data structures, and improved type correctness of function calls are the main TODOs at the moment.	2024-03-28 08:08:06 +01:00
Heejin Ahn	6b7ecc7979	Revert "[WebAssembly] Remove threwValue comparison after __wasm_setjmp_test (#86633 )" This reverts commit 52431fdb1ab8d29be078edd55250e06381e4b6b0. The PR assumed `__threwValue` couldn't be 0, but it could be when the thrown thing is not a longjmp but an exception, so that `if` check was actually necessary.	2024-03-28 04:41:29 +00:00
Eli Friedman	036e7ee9d1	[NFC][AArch64] Regenerate regression tests.	2024-03-27 17:08:02 -07:00
Philip Reames	8881281902	[RISCV] Add test coverage for (add (shl Z, c1), Y, (shl Z, c2)) variants Basically, testing for interaction of shNadd matching with one step of reassociation in the add.	2024-03-27 14:47:45 -07:00
Shilei Tian	0a43ca731b	[AMDGPU] Fix missing `IsExact` flag when expanding vector binary operator (#86712 )	2024-03-27 17:40:58 -04:00
Florian Hahn	b9cd48f96a	Revert "[TBAA] Add verifier for tbaa.struct metadata (#86709 )" This reverts commit df75183d70e029352a49c93f275db703c81a65c1. Revert for now as this appears to cause failures on some buildbots, e.g.: https://lab.llvm.org/buildbot/#/builders/93/builds/19428/steps/10/logs/stdio	2024-03-27 21:22:15 +00:00
Craig Topper	acab142751	[LegalizeDAG] Freeze index when converting insert_elt/insert_subvector to load/store on stack. We try to clamp the index to be within the bounds of the stack object we create, but if we don't freeze it, poison can propagate into the clamp code. This can cause the access to leave the bounds of the stack object. We have other instances of this issue in type legalization and extract_elt/subvector, but posting this patch first for direction check. Fixes #86717	2024-03-27 13:01:23 -07:00
Craig Topper	0d7ea50d20	[AArch64] Pre-commit test for #86717 . NFC	2024-03-27 13:01:23 -07:00
David Green	36e74cfdbd	[AArch64] Clear kill flags when removing FMOVDr. (#86308 ) The uses of OldDef/NewDef may not be killed in the same place they previously were after they are replaced, and so need to be cleared.	2024-03-27 18:36:02 +00:00
Heejin Ahn	52431fdb1a	[WebAssembly] Remove threwValue comparison after __wasm_setjmp_test (#86633 ) Currently the code thinks a `longjmp` occurred if both `__THREW__` and `__threwValue` are nonzero. But `__threwValue` can be 0, and the `longjmp` library function should change it to 1 in case it is 0: https://en.cppreference.com/w/c/program/longjmp Emscripten libraries were not consistent about that, but after https://github.com/emscripten-core/emscripten/pull/21493 and https://github.com/emscripten-core/emscripten/pull/21502, we correctly pass 1 in case the input is 0. So there will be no case where `__threwValue` is 0. And regardless of what the `longjmp` library function does, treating `longjmp`'s 0 input to its second argument as "not longjmping" doesn't seem right. I'm not sure where that `__threwValue` check came from, but I was probably porting fastcomp's implementation at the time and moved this part over verbatim: `9bdc7bb4fc/lib/Target/JSBackend/CallHandlers.h (L274-L278)` For context, this is how it was discovered: https://github.com/emscripten-core/emscripten/pull/21502#pullrequestreview-1942160300	2024-03-27 11:11:16 -07:00
Simon Pilgrim	5d3ef06509	[X86] combine-pavg.ll - add demandedelts test coverage for #86284	2024-03-27 17:15:48 +00:00
Simon Pilgrim	dcd0f2b610	[X86] combineExtractFromVectorLoad support extraction from vector of different types to the extraction type/index combineExtractFromVectorLoad no longer uses the vector we're extracting from to determine the pointer offset calculation, allowing us to extract from types that have been bitcast to work with specific target shuffles. Fixes #85419	2024-03-27 17:01:41 +00:00
Simon Pilgrim	f92fa7e2cf	[X86] Add -verify-machineinstrs to huge stack tests Help identify EXPENSIVE_CHECKS regressions identified in #84114	2024-03-27 16:26:10 +00:00
Simon Pilgrim	78f0871bee	Revert rG58de1e2c5eee548a9b365e3b1554d87317072ad9 "Fix stack layout for frames larger than 2gb (#84114 )" This is failing on some EXPENSIVE_CHECKS buildbots	2024-03-27 16:16:15 +00:00
David Green	313bf28f98	[ARM][MVE] Remove kill flags when reusing VPR register. (#86300 ) The vpr register may no longer be killed where it was, so we should be removing the kill flags.	2024-03-27 16:04:48 +00:00
Wesley Wiser	58de1e2c5e	Fix stack layout for frames larger than 2gb (#84114 ) For very large stack frames, the offset from the stack pointer to a local can be more than 2^31 which overflows various `int` offsets in the frame lowering code. This patch updates the frame lowering code to calculate the offsets as 64-bit values and resolves the overflows, resulting in the correct codegen for very large frames. Fixes #48911	2024-03-27 15:05:58 +00:00
Brandon Wu	91896607ff	[RISCV] RISCV vector calling convention (1/2) (#77560 ) This is the vector calling convention based on https://github.com/riscv-non-isa/riscv-elf-psabi-doc. The idea is to split between "scalar" callee-saved registers and "vector" callee-saved registers. The "scalar" ones keep the original strategy, while the "vector" ones are handled together with RVV objects. The stack layout would be: |--------------------------| <-- FP | callee-allocated save | | area for register varargs| |--------------------------| | callee-saved registers | <-- scalar callee-saved | (scalar) | |--------------------------| | RVV alignment padding | |--------------------------| | callee-saved registers | <-- vector callee-saved | (vector) | |--------------------------| | RVV objects | |--------------------------| | padding before RVV | |--------------------------| | scalar local variables | |--------------------------| <-- BP | variable size objects | |--------------------------| <-- SP Note: This patch doesn't contain "tuple" type, e.g. vint32m1x2. It will be handled in https://github.com/riscv-non-isa/riscv-elf-psabi-doc (2/2). Differential Revision: https://reviews.llvm.org/D154576	2024-03-27 23:03:13 +08:00
Simon Pilgrim	6d3ec56d3c	[X86] combineExtractWithShuffle - use combineExtractFromVectorLoad to extract scalar load from shuffled vector load Improves #85419	2024-03-27 14:54:25 +00:00
Kevin P. Neal	f5296df97c	[FPEnv][AMDGPU] Correct AMDGPUSimplifyLibCalls handling of strictfp attribute. (#86705 ) The AMDGPUSimplifyLibCalls pass was lowering function calls with the strictfp attribute to sequences that included function calls incorrectly lacking the attribute. This patch corrects that. The pass now also emits the correct constrained fp call instead of normal FP instructions when in a function with the strictfp attribute. Replacing non-constrained calls with constrained calls when required is still on the IRBuilder's TODO list.	2024-03-27 10:20:00 -04:00
Justin Cady	26464f2662	[FreeBSD] Mark __stack_chk_guard dso_local except for PPC64 (#86665 ) Adjust logic of 1cb9f37a17ab to match freebsd/freebsd-src@9a4d48a645. D113443 is the original attempt to bring this FreeBSD patch to llvm-project, but it never landed. This change is required to build FreeBSD kernel modules with -fstack-protector using a standard LLVM toolchain. The FreeBSD kernel loader does not handle R_X86_64_REX_GOTPCRELX relocations. Fixes #50932.	2024-03-27 09:03:46 -04:00
Simon Pilgrim	e82765bf07	[X86] masked_store.ll - add nounwind to remove cfi noise	2024-03-27 12:22:31 +00:00
Matt Arsenault	ef316da4a2	AMDGPU: Fix dead check prefixes in test	2024-03-27 14:42:47 +03:00
Luke Lau	f15b7deeaa	[RISCV] Add test case to show missing vmerge fold on tied pseudos. NFC Note we can't use vwaddu.wv because it will get combined away with #78403	2024-03-27 17:42:45 +08:00
Julian Nagele	df75183d70	[TBAA] Add verifier for tbaa.struct metadata (#86709 ) Adds logic to the IR verifier that checks whether !tbaa.struct nodes are well-formed. That is, it checks that the operands of !tbaa.struct nodes are in groups of three, that each group of three operands consists of two integers and a valid tbaa node, and that the regions described by the offset and size operands are non-overlapping. PR: https://github.com/llvm/llvm-project/pull/86709	2024-03-27 10:30:27 +01:00
Luke Lau	6d13263d4a	[RISCV] Add tests for combineBinOpOfZExts. NFC (#86689 ) Unlike add, sub and mul, we don't have widening instructions for div, rem and logical ops, so we don't have any test coverage if we were to extend combineBinOpOfZExts to handle them. Adding tests coincidentally revealed that logical ops are already narrowed as a generic DAG combine via DAGCombiner::hoistLogicOpWithSameOpcodeHands. So we don't actually need to run combineBinOpOfZExts on them.	2024-03-27 15:23:34 +08:00
Yeting Kuo	22bfc58cd0	[RISCV] Teach RISCVMakeCompressible handle byte/half load/store for Zcb. (#83375 ) For targets with Zcb, this patch makes LLVM generate more compressed c.lb/lbu/lh/lhu/sb/sh instructions.	2024-03-27 13:40:38 +08:00
Craig Topper	8a9c170170	[RISCV] Align stack size down to a multiple of 16 before using cm.push/pop. (#86073 ) This is an alternative to #84935 that fixes the miscompile but is not optimal. The immediate for cm.push/pop must be a multiple of 16. For RVE, it might not be. It's not easy to increase the stack size without messing up cfa directives and maybe other things. This patch rounds the stack size down to a multiple of 16 before clamping it to 48. This causes an extra addi to be emitted to handle the remainder. Once this is committed, I can commit #84989 to add verification for these instructions being generated with valid offsets.	2024-03-26 21:37:19 -07:00
Craig Topper	4d03a9ecc6	[RISCV] Preserve MMO when expanding PseudoRV32ZdinxSD/PseudoRV32ZdinxLD. (#85877 ) This allows the asm printer to print the stack spill/reload messages.	2024-03-26 20:42:14 -07:00
Michael Maitland	54a9f0e441	[RISCV][GISEL] Legalize, regbankselect, and instruction-select G_VSCALE (#85967 ) G_VSCALE should be lowered using VLENB. If the type is not sXLen it should be lowered using a G_VSCALE on the narrow type and a G_MUL. regbank select and instruction select are straightforward so we really only need to add tests to show it works.	2024-03-26 20:17:22 -04:00
Björn Pettersson	3e6e54eb79	[X86] Fix miscompile in combineShiftRightArithmetic (#86597 ) When folding (ashr (shl, x, c1), c2) we need to treat c1 and c2 as unsigned to find out if the combined shift should be a left or right shift. Also do an early out during pre-legalization in case c1 and c2 have different types, as that would otherwise complicate the comparison of c1 and c2 a bit.	2024-03-26 20:53:34 +01:00
Bjorn Pettersson	982ebeb212	[X86] Pre-commit test case for bug in combineShiftRightArithmetic It has been noticed that combineShiftRightArithmetic isn't dealing properly with large shift amounts, as demonstrated by the test case added in this commit. I think the problem is partly related to X86 using i8 as the shift amount type during ISel, so shift amounts larger than 127 may be treated as negative shift amounts if we are not careful.	2024-03-26 20:49:15 +01:00
Thorsten Schütt	da6cc4a24f	[CodeGen] Add nneg and disjoint flags (#86650 ) MachineInstr learned the new flags.	2024-03-26 18:44:34 +01:00
Farzon Lotfi	5cf1e2e2ec	[DXIL] Implement log intrinsic Lowering (#86569 ) Completes #86192 `DXIL.td` - add log2 to dxilop lowering `DXILIntrinsicExpansion.cpp` - add log and log10 to log2 expansions	2024-03-26 12:46:11 -04:00
Simon Pilgrim	c8b85add2e	[X86] extractelement-load.ll - add test case for #85419	2024-03-26 16:14:11 +00:00
Simon Pilgrim	3140d138e4	[X86] extractelement-load.ll - use X86 instead of X32 check prefix. NFC X32 should be used for gnux32 triples	2024-03-26 16:14:11 +00:00
Luke Lau	87519a2830	[RISCV] Combine (mul (zext, zext)) -> (zext (mul (zext, zext))) (#86465 ) Building on #86248, we can also narrow the width of a mul of zexts. This is specifically legal because on RVV we always extend to the next power of 2 width, and multiplying two N bit integers produces a maximum value of 2*N bits. So as long as we keep an inner zext of 2*N, we will have enough space for the multiply and won't overflow. Alive2 proof: https://alive2.llvm.org/ce/z/XteYyb	2024-03-26 23:28:04 +08:00
Simon Pilgrim	d18bee2313	[X86] combineConcatVectorOps - concatenate FADD/FSUB/FMUL ops if we don't increase the number of INSERT_SUBVECTOR nodes. FADD/FSUB/FMUL are usually less port-bound than INSERT_SUBVECTOR, so only concatenate if it reduces the instruction count and doesn't introduce extra INSERT_SUBVECTOR nodes.	2024-03-26 15:03:41 +00:00
Simon Pilgrim	e933c05cd2	[X86] Add fadd/fsub/fmul tests showing failure to concat operands together and perform as a wider vector We don't want to concat fadd/fsub/fmul if both operands would need concatenating (as the fp op is usually cheaper than the concat), but if at least one operand is free to concat (i.e. constant or extracted from a wider vector), then we should try to concat the fp op.	2024-03-26 15:03:41 +00:00
Il-Capitano	308ed0233a	[Intrinsics] Make `patchpoint.i64` generic on its return type (#85911 ) Currently patchpoints can only have two result types, `void` and `i64`. This limits the result to general purpose registers. This patch makes `patchpoint.i64` an overloadable intrinsic, allowing result values that can fit in a single register (e.g. integers, pointers, floats).	2024-03-26 19:08:52 +05:30
Sander de Smalen	f914e8e77c	[AArch64][SME] Add coalescer barrier for args/results in locally streaming functions. (#85388 ) Similar to how we protected FP/fixed-vector arguments and results from calls, we should do the same for arguments/results from locally-streaming functions such that those are not spilled/filled as ZPR registers. This may cause a small regression (additional spills/fills), which is addressed by #85386.	2024-03-26 11:40:31 +00:00
Simon Pilgrim	5b544b511c	[Mips] ctpop.mir - regenerate checks to improve codegen diff in #86505	2024-03-26 10:43:29 +00:00
Simon Pilgrim	5fc619b5ee	[DAG] Update ISD::AVG folds to use hasOperation to allow Custom matching prior to legalization Fixes issue where AVX1 targets weren't matching 256-bit AVGCEILU cases.	2024-03-26 10:41:07 +00:00
Michal Paszkowski	d06ba37683	[SPIR-V] Support extension toggling and enabling all (#85503 )	2024-03-26 03:04:49 -07:00
Simon Pilgrim	c7198e0af3	[DAG] Fold insert_subvector(N0, extract_subvector(N0, N2), N2) --> N0 (#86487 ) Handle the case where we've ended up inserting back into the source vector we extracted the subvector from.	2024-03-26 10:03:42 +00:00
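
The DXIL expansions listed above (36b86438d7 for pow, 5cf1e2e2ec for log and log10) rest on standard identities: pow(x, y) = exp2(y * log2(x)) for x > 0, log(x) = log2(x) * ln(2), and log10(x) = log2(x) * log10(2). The following is a minimal C++ sketch of those identities only. The function names are hypothetical, and this is not the code in `DXILIntrinsicExpansion.cpp`, which rewrites LLVM IR rather than computing values.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helpers that illustrate the identities; not LLVM code.

// pow(x, y) == exp2(y * log2(x)); only meaningful for X > 0.
double powViaExp2(double X, double Y) { return std::exp2(Y * std::log2(X)); }

// log(x) == log2(x) * ln(2).
double logViaLog2(double X) { return std::log2(X) * std::log(2.0); }

// log10(x) == log2(x) * log10(2).
double log10ViaLog2(double X) { return std::log2(X) * std::log10(2.0); }

// True when A and B agree to within a small relative tolerance.
bool closeTo(double A, double B) {
  return std::fabs(A - B) <= 1e-12 * std::fmax(1.0, std::fabs(B));
}
```

Because the results go through `log2` and `exp2`, they can differ from a direct `std::pow` or `std::log` call in the last few bits, so comparisons should use a tolerance such as `closeTo` rather than exact equality.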
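
The combineShiftRightArithmetic entries above (982ebeb212 and 3e6e54eb79) turn on whether two 8-bit shift amounts are compared as signed or unsigned values. The sketch below shows, under that assumption, how the same bit patterns can order differently in the two interpretations. Both helpers are hypothetical illustrations and do not reproduce the X86 backend code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers; they only model the comparison of shift amounts.

// Decide whether (ashr (shl x, C1), C2) nets out as a right shift,
// comparing the 8-bit amounts as unsigned values.
bool netRightShiftUnsigned(uint8_t C1, uint8_t C2) { return C2 > C1; }

// The problematic variant: the same bits reinterpreted as signed, so any
// amount above 127 becomes negative (200 reads as -56 on common targets).
bool netRightShiftSigned(uint8_t C1, uint8_t C2) {
  return static_cast<int8_t>(C2) > static_cast<int8_t>(C1);
}
```

With `C1 = 1` and `C2 = 200`, the unsigned comparison reports a net right shift while the signed one does not. For amounts of 127 or less the two helpers agree.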
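
The argument in 87519a2830 above is that the product of two N-bit unsigned values always fits in 2N bits, so the multiply can be done at the narrower width and zero-extended afterwards. A small C++ sketch for N = 8 checks this exhaustively. The function names are hypothetical, and the code models the arithmetic only, not the RISC-V DAG combine.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers; they model the arithmetic, not the DAG combine.

// Wide form: zero-extend both 8-bit operands to 32 bits, then multiply.
uint32_t mulWide(uint8_t A, uint8_t B) {
  return static_cast<uint32_t>(A) * static_cast<uint32_t>(B);
}

// Narrowed form: multiply at 16 bits, then zero-extend the result to 32 bits.
// The largest product is 255 * 255 = 65025, which is below 2^16.
uint32_t mulNarrowThenExtend(uint8_t A, uint8_t B) {
  uint16_t Product = static_cast<uint16_t>(static_cast<uint16_t>(A) *
                                           static_cast<uint16_t>(B));
  return static_cast<uint32_t>(Product);
}

// Exhaustively checks that both forms agree for every pair of 8-bit inputs.
bool formsAgreeForAllInputs() {
  for (unsigned A = 0; A <= 0xFF; ++A)
    for (unsigned B = 0; B <= 0xFF; ++B)
      if (mulWide(static_cast<uint8_t>(A), static_cast<uint8_t>(B)) !=
          mulNarrowThenExtend(static_cast<uint8_t>(A), static_cast<uint8_t>(B)))
        return false;
  return true;
}
```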

52796 Commits