llvm-project

Author	SHA1	Message	Date
Freddy Ye	f4509cf284	[X86][MC] Support enc/dec for SETZUCC and promoted SETCC. (#86473 ) apx-spec: https://cdrdv2.intel.com/v1/dl/getContent/784266 apx-syntax-recommendation: https://cdrdv2.intel.com/v1/dl/getContent/817241	2024-04-11 10:18:29 +08:00
shamithoke	e3ef4612c1	Perform bitreverse using AVX512 GFNI for i32 and i64. (#81764 ) Currently, the lowering operation for bitreverse using Intel AVX512 GFNI only supports byte vectors. Extend the operation to i32 and i64. --------- Co-authored-by: shami <shami_thoke@yahoo.com>	2024-04-10 20:22:44 +01:00
Noah Goldstein	6c40d463c2	[X86] Use `nneg` flag when trying to convert `uitofp` -> `sitofp` Closes #86694	2024-04-09 23:06:55 -05:00
David Green	4ac2721e51	[AArch64] Add costs for ST3 and ST4 instructions, modelled as store(shuffle). (#87934 ) This tries to add some costs for the shuffle in a ST3/ST4 instruction, which are represented in LLVM IR as store(interleaving shuffle). In order to detect the store, it needs to add a CxtI context instruction to check the users of the shuffle. LD3 and LD4 are added, LD2 should be a zip1 shuffle, which will be added in another patch. It should help fix some of the regressions from #87510.	2024-04-09 16:36:08 +01:00
Simon Pilgrim	4023329bbf	[X86] collectConcatOps - add ability to recurse through insert_subvector chains Allows us to match insert_subvector(insert_subvector(undef, insert_subvector(insert_subvector(undef, x, 0), y, 1), 0), 0), insert_subvector(insert_subvector(undef, z, 0), w, 1), 2)	2024-04-09 13:23:44 +01:00
Simon Pilgrim	0bbe953aa3	[X86] Fold extract_subvector(cvtps2dq(x),c) -> cvtps2dq(extract_subvector(x,c)) Help unblock #83402	2024-04-09 11:06:18 +01:00
Alexandre Ganea	ec1af63dde	[Codegen][X86] Fix /HOTPATCH with clang-cl and inline asm (#87639 ) This fixes an edge case where functions starting with inline assembly would assert while trying to lower that inline asm instruction. After this PR, for now we always add a no-op (xchgw in this case) without considering the size of the next inline asm instruction. We might want to revisit this in the future. This fixes Unreal Engine 5.3.2 compilation with clang-cl and /HOTPATCH. Should close https://github.com/llvm/llvm-project/issues/56234	2024-04-08 20:02:19 -04:00
Simon Pilgrim	170c525d79	[X86] combineExtractVectorElt - fold extract(trunc(x),c) -> trunc(extract(x,c))	2024-04-08 11:01:19 +01:00
AtariDreams	8389b3bf60	[X86] Fix typo: QWORD alignment is greater than or equal to 8, not greater than 8 (#87819 ) Align(8) is QWORD aligned, but this was checking to see if alignment was greater than that, when it should have been checking for being greater than OR EQUAL to Align(8). This bug was introduced in https://github.com/llvm/llvm-project/commit/6a6af30d433d7 during the transition to the Align type.	2024-04-07 08:43:13 +08:00
Simon Pilgrim	f09f9bc0aa	[X86] Add TODO to remove getGSScalarCost and use BaseT / getCommonMaskedMemoryOpCost directly There are only a few differences in the use of AddressSpace and getScalarizationOverhead that need to be handled.	2024-04-05 13:16:27 +01:00
Simon Pilgrim	1b4c37fec2	[TTI][X86] getGSVectorCost/getGSScalarCost - add CostKind to the function arguments. Initial refactor - only getGSScalarCost can actually use CostKind so far, and currently both are only ever set to TCK_RecipThroughput.	2024-04-05 11:15:46 +01:00
Simon Pilgrim	ed41249498	[CostModel][X86] Update AVX1 sext v4i1 -> v4i64 cost based off worst case llvm-mca numbers We were using raw instruction count which overestimated the costs for #67803	2024-04-04 17:17:55 +01:00
Simon Pilgrim	5fd9babbfc	[X86] Rename Zn3FPP# ports -> Zn3FP#. NFC Matches Zn4FP# (which is mostly a copy) and avoids an issue in llvm-exegesis which is terrible at choosing the right portname when they have aliases.	2024-04-04 16:54:33 +01:00
Craig Topper	51f1cb5355	[X86] Add or_is_add patterns for INC. (#87584 ) Should fix the cases noted in #86857	2024-04-04 08:04:21 -07:00
Simon Pilgrim	ecb34599bd	[X86] Add missing immediate qualifier to the (V)ROUND instructions (#87636 ) Makes it easier to algorithmically recreate the instruction name in various analysis scripts I'm working on	2024-04-04 15:20:16 +01:00
Simon Pilgrim	a69673615b	[X86] Haswell/Broadwell - fix (V)ROUNDri sched behaviours to use 2Port1 We were only using the Port23 memory ports and were missing the 2*Port1 uops entirely. Confirmed by Agner + uops.info/uica	2024-04-04 15:19:07 +01:00
Simon Pilgrim	3871eaba6b	[CostModel][X86] Update AVX1 sext v8i1 -> v8i32 cost based off worst case llvm-mca numbers We were using raw instruction count which overestimated the costs for #67803	2024-04-04 12:26:35 +01:00
Simon Pilgrim	2d0087424f	[DAG] Remove extract_vector_elt(freeze(x)), idx -> freeze(extract_vector_elt(x), idx) fold (#87480 ) Reverse the fold with handling inside canCreateUndefOrPoison for cases where we know that the extract index is in bounds. This exposed a number of regressions, and required some initial freeze handling of SCALAR_TO_VECTOR, which will require us to properly improve demandedelts support to handle its undef upper elements. There is still one outstanding regression to be addressed in the future - how do we want to handle folds involving frozen loads? Fixes #86968	2024-04-04 11:10:55 +01:00
Simon Pilgrim	1f7c3d609b	[X86] getEffectiveX86CodeModel - take a Triple argument instead of just a Is64Bit flag. NFC. (#87479 ) Matches what most other targets do and makes it easier to specify code model based off other triple settings in the future.	2024-04-03 15:06:54 +01:00
Simon Pilgrim	51107be7dd	[X86] Haswell/Broadwell/Skylake DPPS folded instructions use an extra port06 resource This is an extension to 07151f0241d3f893cb36eb2dbc395d4098f74a87 which handled SandyBridge so we at least model the regression identified in #14640 Confirmed by Agner + uops.info/uica (SkylakeServer also had an incorrect use of Port015 instead of just Port01) I raised #86669 as a proposal for a 'x86 unfold' pass that can unfold these (if we have the free registers) driven by the scheduler model.	2024-04-03 12:28:46 +01:00
Prabhuk	212b1a84a6	[CallSiteInfo][NFC] CallSiteInfo -> CallSiteInfo.ArgRegPairs (#86842 ) CallSiteInfo is originally used only for argument - register pairs. Make it a struct, in which we can store additional data for call sites. Also, the variables/methods used for CallSiteInfo are named for its original use case, e.g., CallFwdRegsInfo. Refactor these for the upcoming use, e.g. addCallArgsForwardingRegs() -> addCallSiteInfo(). An upcoming patch will add type ids for indirect calls to propagate them from middle-end to the back-end. The type ids will then be used to emit the call graph section. Original RFC: https://lists.llvm.org/pipermail/llvm-dev/2021-June/151044.html Updated RFC: https://lists.llvm.org/pipermail/llvm-dev/2021-July/151739.html Differential Revision: https://reviews.llvm.org/D107109?id=362888 Co-authored-by: Necip Fazil Yildiran <necip@google.com>	2024-04-02 13:05:16 -07:00
Simon Pilgrim	8bc2d19c13	[X86] canonicalizeShuffleWithOp - don't fold VPERMI(BINOP(X,Y)) -> BINOP(VPERMI(X),VPERMI(Y)) VPERMI (VPERMQ/PD) is nearly always lane-crossing and poorly merges with target shuffles (other than itself). For now, I've restricted VPERMI to only merge with itself, constants, loads and splats. We might be able to merge with a few other special cases (AND/ANDNP with constant?), which could help the shuffle-vs-trunc-256.ll AVX512VL regression, but since that now gives similar codegen to the other AVX512 variants, I'd prefer to improve the shuffle lowering for that properly.	2024-04-02 18:38:37 +01:00
Freddy Ye	db7d243978	[X86][MC] Support enc/dec for IMULZU. (#86653 ) apx-spec: https://cdrdv2.intel.com/v1/dl/getContent/784266 apx-syntax-recommendation: https://cdrdv2.intel.com/v1/dl/getContent/817241	2024-03-29 15:52:41 +08:00
Simon Pilgrim	5b06de7f99	[X86] Add isLogicOp helper to match ISD::AND/OR/XOR and X86ISD::ANDNP We could easily support the X86ISD 'float' variants of the logic ops as well, but we don't have good test coverage at the moment (they're mainly for SSE1 targets).	2024-03-28 19:39:17 +00:00
Freddy Ye	36b4b9d988	[X86] Support immediate folding for CCMP/CTEST (#86616 ) E.g. %0:gr32 = MOV32ri 81 CTEST32rr %0, %1, 2, 10, implicit-def $eflags, implicit $eflags => CTEST32ri %1, 81, 2, 10, implicit-def $eflags, implicit $eflags	2024-03-28 18:54:32 +08:00
Simon Pilgrim	dcd0f2b610	[X86] combineExtractFromVectorLoad support extraction from vector of different types to the extraction type/index combineExtractFromVectorLoad no longer uses the vector we're extracting from to determine the pointer offset calculation, allowing us to extract from types that have been bitcast to work with specific target shuffles. Fixes #85419	2024-03-27 17:01:41 +00:00
Simon Pilgrim	78f0871bee	Revert rG58de1e2c5eee548a9b365e3b1554d87317072ad9 "Fix stack layout for frames larger than 2gb (#84114 )" This is failing on some EXPENSIVE_CHECKS buildbots	2024-03-27 16:16:15 +00:00
Wesley Wiser	58de1e2c5e	Fix stack layout for frames larger than 2gb (#84114 ) For very large stack frames, the offset from the stack pointer to a local can be more than 2^31 which overflows various `int` offsets in the frame lowering code. This patch updates the frame lowering code to calculate the offsets as 64-bit values and resolves the overflows, resulting in the correct codegen for very large frames. Fixes #48911	2024-03-27 15:05:58 +00:00
Simon Pilgrim	6d3ec56d3c	[X86] combineExtractWithShuffle - use combineExtractFromVectorLoad to extract scalar load from shuffled vector load Improves #85419	2024-03-27 14:54:25 +00:00
Simon Pilgrim	875aed17b9	[X86] Add combineExtractFromVectorLoad helper - pulled out of combineExtractVectorElt Prep work for #85419 to make it easier to reuse in other combines	2024-03-27 12:22:31 +00:00
Björn Pettersson	3e6e54eb79	[X86] Fix miscompile in combineShiftRightArithmetic (#86597 ) When folding (ashr (shl, x, c1), c2) we need to treat c1 and c2 as unsigned to find out if the combined shift should be a left or right shift. Also do an early out during pre-legalization in case c1 and c2 have different types, as that would otherwise complicate the comparison of c1 and c2 a bit.	2024-03-26 20:53:34 +01:00
Simon Pilgrim	d18bee2313	[X86] combineConcatVectorOps - concatenate FADD/FSUB/FMUL ops if we don't increase the number of INSERT_SUBVECTOR nodes. FADD/FSUB/FMUL are usually less port-bound than INSERT_SUBVECTOR, so only concatenate if it reduces the instruction count and doesn't introduce extra INSERT_SUBVECTOR nodes.	2024-03-26 15:03:41 +00:00
Il-Capitano	308ed0233a	[Intrinsics] Make `patchpoint.i64` generic on its return type (#85911 ) Currently patchpoints can only have two result types, `void` and `i64`. This limits the result to general purpose registers. This patch makes `patchpoint.i64` an overloadable intrinsic, allowing result values that can fit in a single register (e.g. integers, pointers, floats).	2024-03-26 19:08:52 +05:30
Simon Pilgrim	5d7e7abc82	[X86] ICX - vector XMM splat use Port 1 or 5 when broadcasting the shift amount Noticed while trying to compare splat vs per-element shift perf stats for #39424 Confirmed with uops.info	2024-03-26 10:07:07 +00:00
Simon Pilgrim	3dcf62b5ee	[X86] HSW/BDW - vector splat shifts don't use Port5 when loading the shift amount Noticed while trying to compare splat vs per-element shift perf stats for #39424 Confirmed with uops.info	2024-03-25 18:22:29 +00:00
Evgenii Kudriashov	fb394562a3	[X86][GlobalISel] Fix referencing nonexistent operand in G_ICMP (#86221 ) Fixes #86203	2024-03-25 16:46:12 +01:00
Sergei Barannikov	5e5b656102	[MC] Make `MCParsedAsmOperand::getReg()` return `MCRegister` (#86444 )	2024-03-25 05:13:48 +03:00
Phoebe Wang	2e4e04c590	[X86][BF16] Do not lower to VCVTNEPS2BF16 without AVX512VL (#86395 ) Fixes: #86305	2024-03-25 10:06:12 +08:00
Evgenii Kudriashov	d365a45cb3	[GlobalISel] Introduce G_TRAP, G_DEBUGTRAP, G_UBSANTRAP (#84941 ) Here we introduce three new GMIR instructions to cover a set of trap intrinsics. The idea behind it is that generic intrinsics shouldn't be used with G_INTRINSIC opcode. These new instructions can match perfectly with existing trap ISD nodes. It allows X86, AArch64, RISCV and Mips to reuse SelectionDAG patterns for selection and avoid manual selection. However, AMDGPU is an exception. It selects traps during legalization regardless of SelectionDAG or GlobalISel. Since there are not many places where traps are used, this change attempts to clean up all the usages of G_INTRINSIC with trap intrinsics. So, there is no stage when both G_TRAP and G_INTRINSIC_W_SIDE_EFFECTS(@llvm.trap) are allowed.	2024-03-23 13:12:44 +01:00
Freddy Ye	3e4caa9da4	[X86] Support DomainReassignment for APX NDD instructions (#85737 )	2024-03-22 08:52:40 +08:00
Simon Pilgrim	ee5e027cc6	[X86] getShuffleCost - recognise concat_vector(X,Y) shuffle as InsertSubvector instead of PermuteTwoSrc We don't have a concat_vector shuffle kind and improveShuffleKindFromMask won't alter the base type to match it as InsertSubvector. But since this is how X86 will lower concat_vector anyhow, just recognise it explicitly. Another step for #67803	2024-03-21 09:29:39 +00:00
Simon Pilgrim	7ccb31a5bc	[X86] splitVectorOp - share the same SDLoc argument instead of recreating it over and over again.	2024-03-20 11:21:09 +00:00
Jeremy Morse	b9d83eff25	[NFC][RemoveDIs] Use iterators for insertion at various call-sites (#84736 ) These are the last remaining "trivial" changes to passes that use Instruction pointers for insertion. All of this should be NFC, it's just changing the spelling of how we identify a position. In one or two locations, I'm also switching uses of getNextNode etc to using std::next with iterators. This too should be NFC. --------- Merged by: Stephen Tozer <stephen.tozer@sony.com>	2024-03-19 16:36:29 +00:00
Jonas Paulsson	09bc6abba6	[MachineFrameInfo] Refactoring around computeMaxCallFrameSize() (NFC) (#78001 ) - Use computeMaxCallFrameSize() in PEI::calculateCallFrameInfo() instead of duplicating the code. - Set AdjustsStack in FinalizeISel instead of in computeMaxCallFrameSize().	2024-03-18 10:37:59 -04:00
David Green	601e102bdb	[CodeGen] Use LocationSize for MMO getSize (#84751 ) This is part of #70452 that changes the type used for the external interface of MMO to LocationSize as opposed to uint64_t. This means the constructors take LocationSize, and convert ~UINT64_C(0) to LocationSize::beforeOrAfter(). The getSize methods return a LocationSize. This allows us to be more precise with unknown sizes, not accidentally treating them as unsigned values, and in the future should allow us to add proper scalable vector support but none of that is included in this patch. It should mostly be an NFC. Global ISel is still expected to use the underlying LLT as it needs, and is not expected to see unknown sizes for generic operations. Most of the changes are hopefully fairly mechanical, adding a lot of getValue() calls and protecting them with hasValue() where needed.	2024-03-17 18:15:56 +00:00
XinWang10	7b766a6f50	[X86] Support APX CMOV/CFCMOV instructions (#82592 ) This patch supports ND CMOV instructions and CFCMOV instructions. RFC: https://discourse.llvm.org/t/rfc-design-for-apx-feature-egpr-and-ndd-support/73031/4	2024-03-17 20:18:56 +08:00
SahilPatidar	186565513c	[X86][AVX] Fix handling of out-of-bounds SRA shift amounts in AVX2 vector shift nodes (#84426 )	2024-03-15 16:34:33 +00:00
Simon Pilgrim	c957715d72	[X86] isGuaranteedNotToBeUndefOrPoisonForTargetNode - generalize shuffle decoding to support more target shuffles in the future.	2024-03-15 14:13:14 +00:00
Phoebe Wang	f4676b6be6	[X86] Add Support for X86 TLSDESC Relocations (#83136 )	2024-03-15 22:09:56 +08:00
Ganesh	61fadd0b09	[X86] Fast AVX-512-VNNI vpdpwssd tuning (#85375 ) Adding a tuning feature to fix https://github.com/llvm/llvm-project/issues/84182 Generates vpdpwssd (instead of vpmaddwd + vpaddd sequence)	2024-03-15 16:45:41 +05:30
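
As a rough illustration of the overflow described in 58de1e2c5e above (a sketch only, not LLVM's frame lowering code; the helper names `as_int32` and `as_int64` are hypothetical), an offset above 2^31 - 1 wraps to a negative value when held in a 32-bit signed integer, while a 64-bit value holds it unchanged:

```python
import ctypes


def as_int32(offset: int) -> int:
    """Hypothetical helper: truncate to a 32-bit signed value, as a C `int` typically would."""
    return ctypes.c_int32(offset).value


def as_int64(offset: int) -> int:
    """Hypothetical helper: keep the offset as a 64-bit signed value."""
    return ctypes.c_int64(offset).value


# An assumed frame offset just past the 32-bit signed range.
frame_offset = 2**31 + 16

print(as_int32(frame_offset))  # wraps to a negative offset
print(as_int64(frame_offset))  # preserved
```

Offsets that fit in 32 bits give the same result from both helpers, which is why the problem only shows up for very large frames.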

24419 Commits