llvm-project

Author	SHA1	Message	Date
shamithoke	e3ef4612c1	Perform bitreverse using AVX512 GFNI for i32 and i64. (#81764) Currently, the lowering operation for bitreverse using Intel AVX512 GFNI only supports byte vectors. Extend the operation to i32 and i64. --------- Co-authored-by: shami <shami_thoke@yahoo.com>	2024-04-10 20:22:44 +01:00
Simon Pilgrim	0e7d14d2e8	[X86] Regenerate mmx-intrinsics.ll test checks	2024-04-10 10:42:01 +01:00
Noah Goldstein	6c40d463c2	[X86] Use `nneg` flag when trying to convert `uitofp` -> `sitofp` Closes #86694	2024-04-09 23:06:55 -05:00
Noah Goldstein	84a5332a68	[X86] Add tests for `uitofp nneg` -> `sitofp`; NFC	2024-04-09 23:06:55 -05:00
Simon Pilgrim	961d91abd3	[X86] shuffle-vs-trunc-128.ll - add common AVX2 check prefix	2024-04-09 14:14:01 +01:00
Simon Pilgrim	a4cf479cdf	[X86] shuffle-vs-trunc-128.ll - add BWVL-ONLY/VBMI/VBMI-FAST/VBMI-SLOW check prefixes to recover missing test checks It is VERY annoying that update_llc_test_checks.py silently fails instead of correctly warning when this happens :(	2024-04-09 13:44:01 +01:00
Simon Pilgrim	866a1bc814	[X86] Add test coverage for #88030	2024-04-09 13:23:44 +01:00
Simon Pilgrim	4023329bbf	[X86] collectConcatOps - add ability to recurse through insert_subvector chains Allows us to match insert_subvector(insert_subvector(undef, insert_subvector(insert_subvector(undef, x, 0), y, 1), 0), 0), insert_subvector(insert_subvector(undef, z, 0), w, 1), 2)	2024-04-09 13:23:44 +01:00
Simon Pilgrim	0bbe953aa3	[X86] Fold extract_subvector(cvtps2dq(x),c) -> cvtps2dq(extract_subvector(x,c)) Help unblock #83402	2024-04-09 11:06:18 +01:00
Alexandre Ganea	ec1af63dde	[Codegen][X86] Fix /HOTPATCH with clang-cl and inline asm (#87639 ) This fixes an edge case where functions starting with inline assembly would assert while trying to lower that inline asm instruction. After this PR, for now we always add a no-op (xchgw in this case) without considering the size of the next inline asm instruction. We might want to revisit this in the future. This fixes Unreal Engine 5.3.2 compilation with clang-cl and /HOTPATCH. Should close https://github.com/llvm/llvm-project/issues/56234	2024-04-08 20:02:19 -04:00
Arthur Eubanks	922700df44	Revert "[X86] Change how we treat functions with explicit sections as small/large (#87838 )" This reverts commit e27c3736f975ca463476223c465e4777186f603f. Breaks ExecutionEngine/MCJIT/test-global-ctors.ll on windows, e.g. https://lab.llvm.org/buildbot/#/builders/117/builds/18749.	2024-04-08 23:00:01 +00:00
Arthur Eubanks	e27c3736f9	[X86] Change how we treat functions with explicit sections as small/large (#87838 ) Following #78348, we should treat functions with an explicit section as small, unless the section name is (or has the prefix) ".ltext". Clang emits global initializers into a ".text.startup" section on Linux. If we mix small/medium code model object files with large code model object files, we'll end up mixing sections with and without the large section flag.	2024-04-08 15:40:19 -07:00
Leonard Grey	c23135c548	-fsanitize=function: fix .subsections_via_symbols (#87527 ) -fsanitize=function emits a signature and function hash before a function. Similar to 7f6e2c9, these can be sheared off when `.subsections_via_symbols` is used. This change uses the same technique 7f6e2c9 introduced for prefixes: emitting a symbol for the metadata, then marking the actual function entry as an .alt_entry symbol.	2024-04-08 16:05:52 -04:00
Kevin P. Neal	eeedb1e962	[FPEnv][X86] Correct one more strictfp test. Correct a strictfp test to follow the rules documented in the LangRef: https://llvm.org/docs/LangRef.html#constrained-floating-point-intrinsics This test needed the strictfp attribute added to some function definitions. FP wait instructions now appear as a result. The need for the wait instructions is explained by Andy Kaylor in PR#87791: https://github.com/llvm/llvm-project/pull/87791 Test changes verified with D146845.	2024-04-08 14:39:08 -04:00
Kevin P. Neal	8ccf1c117b	[FPEnv][X86] Correct strictfp tests. (#87791 ) Correct strictfp tests to follow the rules documented in the LangRef: https://llvm.org/docs/LangRef.html#constrained-floating-point-intrinsics These tests needed the strictfp attribute added to some function definitions. FP wait instructions now appear as a result. Test changes verified with D146845.	2024-04-08 10:14:02 -04:00
Matt Arsenault	8cb642bf18	GlobalISel: Regenerate test checks	2024-04-08 08:32:04 -04:00
Simon Pilgrim	170c525d79	[X86] combineExtractVectorElt - fold extract(trunc(x),c) -> trunc(extract(x,c))	2024-04-08 11:01:19 +01:00
Haohai Wen	cebf77fb93	[CodeGen][DebugInfo] Add missing DebugLoc for SplitCriticalEdge (#72192) In SplitCriticalEdge, the DebugLoc of the branch instruction in the newly created MBB was set to empty. It should be set, and we can find a proper DebugLoc for it in most cases. This patch sets it to the non-empty merged DebugLoc of the current MBB's branches.	2024-04-08 09:44:34 +08:00
AtariDreams	8389b3bf60	[X86] Fix typo: QWORD alignment is greater than or equal to 8, not greater than 8 (#87819 ) Align(8) is QWORD aligned, but this was checking to see if alignment was greater than that, when it should have been checking for being greater than OR EQUAL to Align(8). This bug was introduced in https://github.com/llvm/llvm-project/commit/6a6af30d433d7 during the transition to the Align type.	2024-04-07 08:43:13 +08:00
Matt Arsenault	4cb110a84f	[RFC] IR: Support atomicrmw FP ops with vector types (#86796 ) Allow using atomicrmw fadd, fsub, fmin, and fmax with vectors of floating-point type. AMDGPU supports atomic fadd for <2 x half> and <2 x bfloat> on some targets and address spaces. Note this only supports the proper floating-point operations; float vector typed xchg is still not supported. cmpxchg still only supports integers, so this inserts bitcasts for the loop expansion. I have support for fp vector typed xchg, and vector of int/ptr separately implemented but I don't have an immediate need for those beyond feature consistency.	2024-04-06 15:27:45 -04:00
Simon Pilgrim	b861e2736a	[X86] pr45995.ll - add nounwind to silence cfi noise	2024-04-05 16:36:35 +01:00
Simon Pilgrim	6a6335fa39	[X86] bool-vector.ll - add nounwind to silence cfi noise	2024-04-05 16:36:34 +01:00
Craig Topper	51f1cb5355	[X86] Add or_is_add patterns for INC. (#87584 ) Should fix the cases noted in #86857	2024-04-04 08:04:21 -07:00
Simon Pilgrim	c1742525d0	[X86] evex-to-vex-compress.mir - update test checks missed in #87636	2024-04-04 15:42:29 +01:00
Simon Pilgrim	2d0087424f	[DAG] Remove extract_vector_elt(freeze(x)), idx -> freeze(extract_vector_elt(x), idx) fold (#87480) Reverse the fold with handling inside canCreateUndefOrPoison for cases where we know that the extract index is in bounds. This exposed a number of regressions, and required some initial freeze handling of SCALAR_TO_VECTOR, which will require us to properly improve demandedelts support to handle its undef upper elements. There is still one outstanding regression to be addressed in the future - how do we want to handle folds involving frozen loads? Fixes #86968	2024-04-04 11:10:55 +01:00
Amaury Séchet	1aedf949e0	[NFC] Automatically generate indirect-branch-tracking-eh2.ll	2024-04-03 15:22:23 +00:00
aniplcc	d650fcd6bf	[DAG] SimplifyDemandedVectorElts - add ISD::AVGCEILS/AVGCEILU/AVGFLOORS/AVGFLOORU nodes (#86284 ) Fixes #84768	2024-04-03 15:00:50 +01:00
Simon Pilgrim	2bf7ddf06f	[X86] Add vector truncation tests for nsw/nuw flags Based off #85592 - our truncation -> PACKSS/PACKUS folds should be able to use the nsw/nuw flags to recognise when we don't need to mask/sext_inreg prior to the PACKSS/PACKUS nodes.	2024-04-03 13:35:55 +01:00
Gleb Popov	0356d0cfdc	Print more descriptive error message when trying to link a global with appending linkage (#69613 ) This is a proper fix for https://github.com/llvm/llvm-project/issues/40308	2024-04-03 12:26:12 +01:00
Simon Pilgrim	8bc2d19c13	[X86] canonicalizeShuffleWithOp - don't fold VPERMI(BINOP(X,Y)) -> BINOP(VPERMI(X),VPERMI(Y)) VPERMI (VPERMQ/PD) is nearly always lane-crossing and poorly merges with target shuffles (other than itself). For now, I've restricted VPERMI to only merge with itself, constants, loads and splats. We might be able to merge with a few other special cases (AND/ANDNP with constant?), which could help the shuffle-vs-trunc-256.ll AVX512VL regression, but since that now gives similar codegen to the other AVX512 variants, I'd prefer to improve the shuffle lowering for that properly.	2024-04-02 18:38:37 +01:00
Vitaly Buka	20f56e1f8e	[CodeGen] Add default lowering for llvm.allow.{runtime,ubsan}.check() (#86049 ) RFC: https://discourse.llvm.org/t/rfc-add-llvm-experimental-hot-intrinsic-or-llvm-hot/77641	2024-03-31 22:19:33 -07:00
Craig Topper	23d45e55ed	[MCP] Remove dead copies from basic blocks with successors. (#86973 ) Previously we wouldn't remove dead copies from basic blocks with successors. The comment said we didn't want to trust the live-in lists. The comment is very old so I'm not sure if that's still a concern today. This patch checks the live-in lists and removes copies from MaybeDeadCopies if they are referenced by any live-ins in any successors. We only do this if the tracksLiveness property is set. If that property is not set, we retain the old behavior.	2024-03-28 14:43:49 -07:00
Craig Topper	f90813543b	[MCP] Use MachineInstr::all_defs instead of MachineInstr::defs in hasOverlappingMultipleDef. (#86889 ) defs does not return the defs for inline assembly. We need to use all_defs to find them. Fixes #86880.	2024-03-28 08:37:19 -07:00
Freddy Ye	36b4b9d988	[X86] Support immediate folding for CCMP/CTEST (#86616 ) E.g. %0:gr32 = MOV32ri 81 CTEST32rr %0, %1, 2, 10, implicit-def $eflags, implicit $eflags => CTEST32ri %1, 81, 2, 10, implicit-def $eflags, implicit $eflags	2024-03-28 18:54:32 +08:00
Craig Topper	acab142751	[LegalizeDAG] Freeze index when converting insert_elt/insert_subvector to load/store on stack. We try to clamp the index to be within the bounds of the stack object we create, but if we don't freeze it, poison can propagate into the clamp code. This can cause the access to leave the bounds of the stack object. We have other instances of this issue in type legalization and extract_elt/subvector, but posting this patch first for direction check. Fixes #86717	2024-03-27 13:01:23 -07:00
Simon Pilgrim	5d3ef06509	[X86] combine-pavg.ll - add demandedelts test coverage for #86284	2024-03-27 17:15:48 +00:00
Simon Pilgrim	dcd0f2b610	[X86] combineExtractFromVectorLoad support extraction from vector of different types to the extraction type/index combineExtractFromVectorLoad no longer uses the vector we're extracting from to determine the pointer offset calculation, allowing us to extract from types that have been bitcast to work with specific target shuffles. Fixes #85419	2024-03-27 17:01:41 +00:00
Simon Pilgrim	f92fa7e2cf	[X86] Add -verify-machineinstrs to huge stack tests Help identify EXPENSIVE_CHECKS regressions identified in #84114	2024-03-27 16:26:10 +00:00
Simon Pilgrim	78f0871bee	Revert rG58de1e2c5eee548a9b365e3b1554d87317072ad9 "Fix stack layout for frames larger than 2gb (#84114 )" This is failing on some EXPENSIVE_CHECKS buildbots	2024-03-27 16:16:15 +00:00
Wesley Wiser	58de1e2c5e	Fix stack layout for frames larger than 2gb (#84114 ) For very large stack frames, the offset from the stack pointer to a local can be more than 2^31 which overflows various `int` offsets in the frame lowering code. This patch updates the frame lowering code to calculate the offsets as 64-bit values and resolves the overflows, resulting in the correct codegen for very large frames. Fixes #48911	2024-03-27 15:05:58 +00:00
Simon Pilgrim	6d3ec56d3c	[X86] combineExtractWithShuffle - use combineExtractFromVectorLoad to extract scalar load from shuffled vector load Improves #85419	2024-03-27 14:54:25 +00:00
Justin Cady	26464f2662	[FreeBSD] Mark __stack_chk_guard dso_local except for PPC64 (#86665 ) Adjust logic of 1cb9f37a17ab to match freebsd/freebsd-src@9a4d48a645. D113443 is the original attempt to bring this FreeBSD patch to llvm-project, but it never landed. This change is required to build FreeBSD kernel modules with -fstack-protector using a standard LLVM toolchain. The FreeBSD kernel loader does not handle R_X86_64_REX_GOTPCRELX relocations. Fixes #50932.	2024-03-27 09:03:46 -04:00
Simon Pilgrim	e82765bf07	[X86] masked_store.ll - add nounwind to remove cfi noise	2024-03-27 12:22:31 +00:00
Björn Pettersson	3e6e54eb79	[X86] Fix miscompile in combineShiftRightArithmetic (#86597) When folding (ashr (shl, x, c1), c2) we need to treat c1 and c2 as unsigned to find out if the combined shift should be a left or right shift. Also do an early out during pre-legalization in case c1 and c2 have different types, as that would otherwise complicate the comparison of c1 and c2 a bit.	2024-03-26 20:53:34 +01:00
Bjorn Pettersson	982ebeb212	[X86] Pre-commit test case for bug in combineShiftRightArithmetic It has been noticed that combineShiftRightArithmetic isn't dealing properly with large shift amounts, as demonstrated by the test case added in this commit. I think the problem is partly related to X86 using i8 as the shift amount type during ISel. So shift amounts larger than 127 may be treated as negative shift amounts if we are not careful.	2024-03-26 20:49:15 +01:00
Simon Pilgrim	c8b85add2e	[X86] extractelement-load.ll - add test case for #85419	2024-03-26 16:14:11 +00:00
Simon Pilgrim	3140d138e4	[X86] extractelement-load.ll - use X86 instead of X32 check prefix. NFC X32 should be used for gnux32 triples	2024-03-26 16:14:11 +00:00
Simon Pilgrim	d18bee2313	[X86] combineConcatVectorOps - concatenate FADD/FSUB/FMUL ops if we don't increase the number of INSERT_SUBVECTOR nodes. FADD/FSUB/FMUL are usually less port-bound than INSERT_SUBVECTOR, so only concatenate if it reduces the instruction count and doesn't introduce extra INSERT_SUBVECTOR nodes.	2024-03-26 15:03:41 +00:00
Simon Pilgrim	e933c05cd2	[X86] Add fadd/fsub/fmul tests showing failure to concat operands together and perform as a wider vector We don't want to concat fadd/fsub/fmul if both operands would need concatenating (as the fp op is usually cheaper than the concat), but if at least one operand is free to concat (i.e. constant or extracted from a wider vector), then we should try to concat the fp op.	2024-03-26 15:03:41 +00:00
Simon Pilgrim	5fc619b5ee	[DAG] Update ISD::AVG folds to use hasOperation to allow Custom matching prior to legalization Fixes issue where AVX1 targets weren't matching 256-bit AVGCEILU cases.	2024-03-26 10:41:07 +00:00
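
The two combineShiftRightArithmetic entries above (3e6e54eb79 and 982ebeb212) describe shift amounts larger than 127 being read as negative when held in an 8-bit type. The following is a minimal sketch of that pitfall only, not the LLVM code: the helper names `asSigned8`, `netLeftShiftSigned`, and `netLeftShiftUnsigned` are hypothetical, and the "c1 > c2 means a net left shift" rule is a simplification assumed for illustration.

```cpp
#include <cstdint>

// Hypothetical helper: reinterpret an 8-bit shift amount as a signed value,
// the way a signed comparison of i8 constants would see it.
constexpr int asSigned8(std::uint8_t v) {
  return v >= 128 ? static_cast<int>(v) - 256 : static_cast<int>(v);
}

// Assumed simplification: for (ashr (shl x, c1), c2), treat the result as a
// net left shift when c1 > c2.

// Signed reading of the amounts: 200 becomes -56, so the comparison flips.
constexpr bool netLeftShiftSigned(std::uint8_t c1, std::uint8_t c2) {
  return asSigned8(c1) > asSigned8(c2);
}

// Unsigned reading of the amounts, as the fix's commit message calls for.
constexpr bool netLeftShiftUnsigned(std::uint8_t c1, std::uint8_t c2) {
  return c1 > c2;
}
```

With c1 = 200 and c2 = 8, the signed variant compares -56 against 8 and reports a net right shift, while the unsigned variant compares 200 against 8 and reports a net left shift. For amounts below 128 the two variants agree.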

20098 Commits