llvm-project

Author	SHA1	Message	Date
chenglin.bi	9403a8bc37	[GlobalISel][AArch64] Fix miscompile caused by wrong G_ZEXT selection in GISel The miscompile case's G_ZEXT has a G_FREEZE source. Similar to D127154, this patch removed isDef32, relying on the AArch64MIPeephole optimizer to remove redundant SUBREG_TO_REG nodes also in GISel. Fix #58431 Reviewed By: paquette Differential Revision: https://reviews.llvm.org/D136433	2022-10-26 09:54:13 +08:00
chenglin.bi	e95c74b423	[AArch64] Add precommit test for bcmp; NFC	2022-10-25 17:23:03 +08:00
Cullen Rhodes	1e02a29e47	[AArch64][SVE] Use more flag-setting instructions If OP in PTEST(PG, OP(PG, ...)) has a flag-setting variant change the opcode so the PTEST becomes redundant. This patch extends this existing optimization in AArch64::optimizePTestInstr to cover all flag-setting opcodes. Reviewed By: peterwaller-arm Differential Revision: https://reviews.llvm.org/D136083	2022-10-25 09:02:21 +00:00
Cullen Rhodes	5621caeb82	[AArch64][SVE] NFC: extend tests for flag-setting predicate instructions A follow on patch will extend existing PTEST(PG, OP(PG, ...)) -> OP_FLAG_SETTING(PG, ...) optimization in AArch64InstrInfo::optimizePTestInstr to cover more of the flag-setting instructions Reviewed By: peterwaller-arm Differential Revision: https://reviews.llvm.org/D136161	2022-10-25 09:02:20 +00:00
Sander de Smalen	19b9e6204a	[AArch64][SME] Fix chain for arm_locally_streaming functions. The Chain wasn't set correctly in the DAG for functions marked with aarch64_pstate_sm_body, which meant that SelectionDAG would dead-code some of the CopyToReg's. This didn't show up in the existing tests because all uses were in the same block, but when adding some control-flow, suddenly things would break. Reviewed By: kmclaughlin Differential Revision: https://reviews.llvm.org/D136579	2022-10-25 08:14:51 +00:00
Ahmed Bougacha	718bb22c28	[AArch64][PAC] Select XPAC for ptrauth.strip intrinsic. Differential Revision: https://reviews.llvm.org/D132385	2022-10-24 08:15:56 -07:00
Petar Avramovic	e6c778f861	GlobalISel: Artifact combine merge-like and unmerge into unmerge Recognize when the source could have been unmerged to pieces with DstTy without having to split the source to smaller elements and then merge the small elements into DstTy pieces. This happens when a vector was meant to be split into sub-vectors but there was a leftover. At this point the artifact combiner has already dealt with the leftover and we can continue to use sub-vectors. Differential Revision: https://reviews.llvm.org/D109241	2022-10-24 13:33:05 +02:00
Petar Avramovic	f1aa598046	GlobalISel: Artifact combine merge-like and unmerge into copy Recognize copy that is represented as split of a source register to elements that were reassembled to another register with the same type. Differential Revision: https://reviews.llvm.org/D109240	2022-10-24 13:33:05 +02:00
Petar Avramovic	51b98db487	GlobalISel: Precommit for artifact combine patches Differential Revision: https://reviews.llvm.org/D117655	2022-10-24 13:33:05 +02:00
Craig Topper	db25f51e37	Revert "[DAGCombiner] Fold (mul (sra X, BW-1), Y) -> (neg (and (sra X, BW-1), Y))" This reverts commit e8b3ffa532b8ebac5dcdf17bb91b47817382c14d. The AMDGPU/mad_64_32.ll seems to fail on some of the build bots but passes locally. I'm really confused.	2022-10-22 22:50:43 -07:00
Craig Topper	e8b3ffa532	[DAGCombiner] Fold (mul (sra X, BW-1), Y) -> (neg (and (sra X, BW-1), Y)) (sra X, BW-1) is either 0 or -1. So the multiply is a conditional negate of Y. This pattern shows up when type legalizing wide multiplies involving a sign extended value. Fixes PR57549. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D133399	2022-10-22 21:51:45 -07:00
chenglin.bi	ec4db1d0dc	[AArch64][Windows] Fix the crash when running ninja check-asan The crash comes from a mismatch between the load count in the epilogue and the SEH instruction count, again caused by the AArch64LoadStoreOpt pass. It removes some loads in the epilogue but does not remove the corresponding SEH instructions. This patch fixes the issue by not optimizing loads in the epilogue. Fix: #58516 Reviewed By: mstorsjo Differential Revision: https://reviews.llvm.org/D136430	2022-10-21 22:11:54 +08:00
Eli Friedman	a6ac968360	[Arm64EC] Refer to dllimport'ed functions correctly. Arm64EC has two different ways to refer to dllimport'ed functions in an object file. One is using the usual __imp_ prefix, the other is using an Arm64EC-specific prefix __imp_aux_. As far as I can tell, if a function is in an x64 DLL, __imp_aux_ refers to the actual x64 address, while __imp_ points to some linker-generated code that calls the exit thunk. So __imp_aux_ is used to refer to the address in non-call contexts, while __imp_ is used for calls to avoid the indirect call checker. There's one twist to this, though: if an object refers to a symbol using the __imp_aux_ prefix, the object file's symbol table must also contain the symbol with the usual __imp_ prefix. The symbol doesn't actually have to be used anywhere, it just has to exist; otherwise, the linker's symbol lookup in x64 import libraries doesn't work correctly. Currently, this is handled by emitting a .globl __imp_foo directive; we could try to design some better way to handle this. One minor quirk I haven't figured out: apparently, in Arm64EC mode, MSVC prefers to use a linker-synthesized stub to call dllimport'ed functions, instead of branching directly. The linker stub appears to do the same thing that inline code would do, so not sure if it's just a code-size optimization, or if the synthesized stub can actually do something other than just load from the import table in some circumstances. Differential Revision: https://reviews.llvm.org/D136202	2022-10-20 15:08:56 -07:00
Eli Friedman	decb743e80	[AArch64] Fix scheduler crash in fusion code. Make sure we don't call getReg() on the first operand of instruction without knowing that operand is actually a register. (This codepath isn't enabled for most CPUs; only triggers on certain CPUs, like Cortex-X1.) Differential Revision: https://reviews.llvm.org/D136296	2022-10-20 10:47:44 -07:00
zhongyunde	74c2d4f602	[AArch64][SelectionDAG] Lower multiplication by a constant to shl+add+shl+add Change the costmodel to lower a = b * C where C = (1 + 2^m) * (1 + 2^n) to add w8, w0, w0, lsl #m add w0, w8, w8, lsl #n Note: The latency can vary depending on the shift amount Reviewed By: efriedma, dmgreen Differential Revision: https://reviews.llvm.org/D135441	2022-10-21 00:33:49 +08:00
Sander de Smalen	5e5baf917b	[AArch64][SME] Remove get.pstatesm intrinsic. This intrinsic can be removed in favour of using a call to __arm_sme_state() directly and testing the LSB of X0. In IR that would look like: %pstate = call aarch64_sme_preservemost_from_x2 {i64, i64} @__arm_sme_state() %pstate.x0 = extractvalue {i64, i64} %pstate, 0 %pstate.sm = and i64 %pstate.x0, 1	2022-10-20 12:25:32 +00:00
Caroline Concatto	2ecbe8c38c	[AArch64] SME2 Single-multi vector ternary int/FP 2 and 4 registers This patch adds the assembly/disassembly for the following instructions: For INT: ADD(array results, multiple and single vector): Add replicated single vector to multi-vector with ZA array vector results. SUB(array results, multiple and single vector): Subtract replicated single vector from multi-vector with ZA array vector results. For FP: FMLA (multiple and single vector): Multi-vector floating-point fused multiply-add by vector. FMLS (multiple and single vector): Multi-vector floating-point multiply-subtract long by vector. The reference can be found here: https://developer.arm.com/documentation/ddi0602/2022-09 The Matrix Operand has 2 new sizes 32(.s) and 64(.d) bits (MatrixOp32 and MatrixOp64) Depends on: D135448 Depends on: D135952 Differential Revision: https://reviews.llvm.org/D135455	2022-10-19 17:49:48 +01:00
Caroline Concatto	579ca5e7e1	[AArch64] Replace sme-i64 by sme-i16i64 and sme-f64 by sme-f64f64 The names in developer.arm for these SME features are: HaveSMEI16I64 and HaveSMEF64F64 so the new flag names are consistent with the documentation page Reviewed By: sdesmalen, c-rhodes Differential Revision: https://reviews.llvm.org/D135974	2022-10-19 10:56:46 +01:00
Mingming Liu	34d18fd241	[AArch64] Enhance bit-field-positioning op matcher to see through 'any_extend' for pattern 'and(any_extend(shl(val, N)), shifted-mask)' Before this patch (and refactor patch D135843), isBitfieldPositioningOp did not handle "and(any_extend(shl(val, N)), shifted-mask)" (it bailed out if the AND operand was not a SHL). After this patch, isBitfieldPositioningOp sees through "any_extend" to find "shl" and so finds possible bit-field-positioning nodes. https://gcc.godbolt.org/z/3ncGKbGW6 is a four-liner LLVM IR that could be optimized to UBFIZ (see added test case test_and_extended_shift_with_imm in llvm/test/CodeGen/AArch64/bitfield-insert.ll). One existing test case also improves. Differential Revision: https://reviews.llvm.org/D135852	2022-10-18 09:07:14 -07:00
chenglin.bi	327c45da26	[AArch64] add test case for pattern ((X >> C) - Y) + Z; NFC	2022-10-18 19:18:23 +08:00
Mingming Liu	db0286a096	[AArch64] Enhance 'isBitfieldPositioningOp' to find pattern shl(and(val, mask), N). Before this patch (and D135844) - Given DAG node shl(op, N), isBitfieldPositioningOp uses (optionally shifted [1] ) op as the Src (least significant bits of Src are inserted into DstLSB of Dst node). After this patch - If op is and(val, mask), isBitfieldPositioningOp tries to see through the `and` and find whether val is a simpler source than op. This helps in a similar (probably symmetric) way to how isSeveralBitsExtractOpFromShr [2] optimizes isBitfieldExtractOpFromShr. Existing test cases are improved without regressions. [1] `cbd8464595/llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp (L2546)` [2] `cbd8464595/llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp (L2057)` Differential Revision: https://reviews.llvm.org/D135850	2022-10-17 09:01:29 -07:00
Nicola Lancellotti	43fe14c056	[AArch64] Canonicalize ZERO_EXTEND to VSELECT Differential Revision: https://reviews.llvm.org/D135596	2022-10-17 15:42:46 +01:00
Amara Emerson	13792ba417	[AArch64][GlobalISel] When lowering signext i1 parameters, don't zero-extend to s8 first. Fixes https://github.com/llvm/llvm-project/issues/57181	2022-10-15 20:25:43 -07:00
Peter Rong	c2e7c9cb33	[CodeGen] Using ZExt for extractelement indices. In https://github.com/llvm/llvm-project/issues/57452, we found that IRTranslator is translating `i1 true` into `i32 -1`. This is because IRTranslator uses SExt for indices. In this fix, we change the expected behavior of extractelement's index, moving from SExt to ZExt. This change includes both documentation, SelectionDAG and IRTranslator. We also included a test for AMDGPU, updated tests for AArch64, Mips, PowerPC, RISCV, VE, WebAssembly and X86 This patch fixes issue #57452. Differential Revision: https://reviews.llvm.org/D132978	2022-10-15 15:45:35 -07:00
Martin Storsjö	6eb205b257	Reapply [AArch64] Fix aligning the stack after calling __chkstk Whenever a call to __chkstk was made, the frame lowering previously omitted the aligning (as NumBytes was reset to zero before doing alignment). This fixes https://github.com/llvm/llvm-project/issues/56182. The initial version of this produced invalid code for small functions with no local stack allocations, if those functions were marked with the "stackrealign" attribute. If building with -mstack-alignment=16 (which otherwise mostly would be a no-op), this attribute is added on the main function. Differential Revision: https://reviews.llvm.org/D135687	2022-10-15 00:40:13 +03:00
Filipp Zhinkin	ef774bec63	[AArch64] Support SETCCCARRY lowering Support SETCCCARRY lowering to SBCS instruction. Related issue: https://github.com/llvm/llvm-project/issues/44629 Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D135302	2022-10-14 22:29:31 +03:00
Caroline Concatto	60e2aad109	[AArch64] Change printVectorList to print SVE vector range This patch changes the preferred disassembly for SVE vector lists. For instance, instead of printing this assembly: ld4d { z1.d, z2.d, z3.d, z4.d }, p0/z, [x0] it will print this: ld4d { z1.d-z4.d }, p0/z, [x0] Differential Revision: https://reviews.llvm.org/D135952	2022-10-14 18:59:56 +01:00
Hassnaa Hamdi	2c72d90ecc	[AArch64-SVE]: Force generating code compatible with streaming mode. Add a compile-time flag for enabling streaming mode. When streaming mode is enabled, lower basic loads and stores of fixed-width vectors to generate code that is compatible with streaming mode. Differential Revision: https://reviews.llvm.org/D133433	2022-10-14 17:46:56 +00:00
chenglin.bi	c1909d7337	[DAGCombiner] Fix crash when merging stores with different value types The crash case comes from #58350. It has two stores: one is of type f32 and the other is v1f32. When we try to merge these two stores as v1f32, the memVT is a vector type, so the old code also uses ISD::EXTRACT_SUBVECTOR for the f32 store and the compiler crashes. This patch inserts a build_vector for the f32 store so that it also produces v1f32 when memVT is v1f32. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D135954	2022-10-15 01:16:35 +08:00
Sander de Smalen	02df03c5b7	[AArch64][SME] Add support for arm_locally_streaming functions. Functions with `aarch64_sme_pstatesm_body` will emit a SMSTART at the start of the function, and a SMSTOP at the end of the function, such that all operations use the right value for vscale. Because the placement of these nodes is critically important (i.e. no vscale-dependent operations should be done before SMSTART has been issued), we require glueing the CopyFromReg to the Entry node such that we can insert the SMSTART as part of that glued chain. More details about the SME attributes and design can be found in D131562. Reviewed By: aemerson Differential Revision: https://reviews.llvm.org/D131582	2022-10-14 13:47:53 +00:00
chenglin.bi	85e41fcaac	[AArch64] Select to CCMN when the CCMP's second operand is a negative constant CCMP/CCMN's second operand supports constants from 0 to 31. When the CCMP's second operand is in the range [-31, -1] we can replace it with CCMN to avoid an extra mov. Fix: #57034 Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D135939	2022-10-14 21:41:25 +08:00
Martin Storsjö	f309f095e7	Revert "[AArch64] Fix aligning the stack after calling __chkstk" This reverts commit 50e0aced4521260af842dba73f1d8c50d36314ea. This could accidentally start producing invalid code in some cases (in particular, if compiling with -mstack-alignment=16, which one could expect to be a no-op for a target where the stack always is aligned to 16 bytes anyway).	2022-10-14 11:55:59 +03:00
chenglin.bi	07c5270043	[AArch64] add tests for ccmp with negative constant op1; NFC	2022-10-14 12:07:43 +08:00
David Green	16e4e4ab87	[CodeGenPrep] Handle constants in ConvertPhiType This is a simple addition to the convertPhiTypes in CodeGenPrepare to consider and convert constants as it converts the phi type. Someone fixed the bug in the motivating example, so the undef is now a constant 0. This does mean converting between integer and floating point constants, which may have different materialization. Differential Revision: https://reviews.llvm.org/D135561	2022-10-13 16:41:44 +01:00
David Green	1e80201f7f	[AArch64] Add ConvertPhiType constant tests. NFC	2022-10-13 16:23:34 +01:00
Sheng	62fc58a61d	[AArch64] Improve codegen for "trunc <4 x i64> to <4 x i8>" for all cases To achieve this, we need this observation: `uzp1` is just a `xtn` that operates on two registers For example, given the following register with type v2i64: LSB_______MSB x0 x1 x2 x3 Applying xtn on it we get: x0 x2 This is equivalent to bitcast it to v4i32, and then applying uzp1 on it: x0 x1 x2 x3 \| uzp1 v x0 x2 <value from other register> We can transform xtn to uzp1 by this observation, and vice versa. This observation only works on little endian targets. Big endian targets have a problem: the uzp1 cannot be replaced by xtn since there is a discrepancy in the behavior of uzp1 between little endian and big endian. To illustrate, take the following for example: LSB____________________MSB x0 x1 x2 x3 On little endian, uzp1 grabs x0 and x2, which is right; on big endian, it grabs x3 and x1, which doesn't match what I saw in the documentation. But, since I'm new to AArch64, take my word with a pinch of salt. This behavior is observed in gdb; maybe there's an issue in the order of the values printed by it? Whatever the reason is, the execution result given by qemu just doesn't match. So I disable this on big endian targets temporarily until we find the crux. Fixes #57502 Reviewed By: dmgreen, mingmingl Co-authored-by: Mingming Liu <mingmingl@google.com> Differential Revision: https://reviews.llvm.org/D133850	2022-10-13 19:08:33 +08:00
Martin Storsjö	cbd8464595	[MC] [Win64EH] Check that ARM64 prologs and epilogs have the right matching number of instructions This matches what was done for the ARM implementation (where getting the instruction sizes right is even more tricky, and hence needed tighter testing). This will allow catching any future cases where prologs and epilogs don't match the instructions within them. Differential Revision: https://reviews.llvm.org/D131394	2022-10-13 09:47:39 +03:00
Martin Storsjö	50e0aced45	[AArch64] Fix aligning the stack after calling __chkstk Whenever a call to __chkstk was made, the frame lowering previously omitted the aligning (as NumBytes was reset to zero before doing alignment). This fixes https://github.com/llvm/llvm-project/issues/56182. Differential Revision: https://reviews.llvm.org/D135687	2022-10-13 09:47:38 +03:00
Martin Storsjö	bd3fa31887	[AArch64] Generate SEH info for PAC instructions Without this, unwinding through functions that use PAC would fail, if PAC actually was active. Differential Revision: https://reviews.llvm.org/D135103	2022-10-12 22:21:03 +03:00
Mingming Liu	0849f0a5a3	[AArch64] Pre-commit test case to show sub-optimal codegen for Github issue #57502 Pre-commit test cases to show cases when UZP1 (TRUNC, TRUNC) could be combined into TRUNC (UZP1) (with some proper bit conversions in the middle) to generate more efficient code. Differential Revision: https://reviews.llvm.org/D133280	2022-10-12 10:29:09 -07:00
David Green	1e723b7ab3	Revert "[AArch64] Add support for 128-bit non temporal loads." This reverts commit 661403b85c219a83baa37335a870d4d93dc4b1c3 as the custom lowering of loads prevents expanding unaligned loads with strict-align.	2022-10-12 11:11:32 +01:00
Martin Storsjö	a07787c9a5	[AArch64] Exclude instructions after setting the FP from SEH prologues After setting up the FP, the rest of the prologue doesn't need to be replayed for unwinding the stack frame. This allows reverting the functional parts of 2f7fbf837625267193351cc334e506a3a9161958 (but fixing inconsistent duplicate setting of HasWinCFI). Differential Revision: https://reviews.llvm.org/D135686	2022-10-12 12:36:21 +03:00
Cullen Rhodes	a17fcb2230	[AArch64][SVE] Fix BRKNS bug in optimizePTestInstr The BRKNS instruction is unlike the other instructions that set flags since it has an all active implicit predicate, so the existing PTEST(PG, BRKN(PG, A, B)) -> BRKNS(PG, A, B) in AArch64InstrInfo::optimizePTestInstr is incorrect, however PTEST(PTRUE_B(31), BRKN(PG, A, B)) -> BRKNS(PG, A, B) is correct. Spotted by @paulwalker-arm in D134946. Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D135655	2022-10-12 08:34:41 +00:00
Cullen Rhodes	495d9e1f3f	[AArch64] NFC: Auto-generate llvm/test/CodeGen/AArch64/sve-ptest-removal-brk.ll	2022-10-12 08:34:41 +00:00
chenglin.bi	41f5bbe18b	[AArch64][Windows] Check sret attribute also for inreg attribute Fix the issue: #57684 Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D135512	2022-10-12 09:58:50 +08:00
Craig Topper	ac9209751a	Revert "[DAGCombiner] Fold (mul (sra X, BW-1), Y) -> (neg (and (sra X, BW-1), Y))" This reverts commit 0148df8157f05ecf3b1064508e6f012aefb87dad. Getting lit test failures on AMDGPU but I can't reproduce them so far. Reverting to investigate.	2022-10-11 16:30:40 -07:00
Craig Topper	0148df8157	[DAGCombiner] Fold (mul (sra X, BW-1), Y) -> (neg (and (sra X, BW-1), Y)) (sra X, BW-1) is either 0 or -1. So the multiply is a conditional negate of Y. This pattern shows up when type legalizing wide multiplies involving a sign extended value. Fixes PR57549. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D133399	2022-10-11 16:20:55 -07:00
Jessica Paquette	036a13065b	[GlobalISel] Combine (X op Y) == X --> Y == 0 This matches patterns of the form ``` (X op Y) == X ``` And transforms them to ``` Y == 0 ``` where appropriate. Example: https://godbolt.org/z/hfW811c7W Differential Revision: https://reviews.llvm.org/D135380	2022-10-11 09:52:48 -07:00
Weining Lu	42b70793a1	Reland "[Clang][LoongArch] Add inline asm support for constraints k/m/ZB/ZC" Reference: https://gcc.gnu.org/onlinedocs/gccint/Machine-Constraints.html k: A memory operand whose address is formed by a base register and (optionally scaled) index register. m: A memory operand whose address is formed by a base register and offset that is suitable for use in instructions with the same addressing mode as st.w and ld.w. ZB: An address that is held in a general-purpose register. The offset is zero. ZC: A memory operand whose address is formed by a base register and offset that is suitable for use in instructions with the same addressing mode as ll.w and sc.w. Note: The INLINEASM SDNode flags in below tests are updated because the new introduced enum `Constraint_k` is added before `Constraint_m`. llvm/test/CodeGen/AArch64/GlobalISel/irtranslator-inline-asm.ll llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-inline-asm.ll llvm/test/CodeGen/X86/callbr-asm-kill.mir This patch passes `ninja check-all` on a X86 machine with all official targets and the LoongArch target enabled. Differential Revision: https://reviews.llvm.org/D134638	2022-10-11 19:51:48 +08:00
Martin Storsjö	018ac7847b	[AArch64] Add SEH_Nop opcodes for BTI hints These are harmless for the unwinder - the unwinder doesn't need to handle them for being able to unwind correctly. Only add the opcodes when the branch target is in a SEH prologue; for jumptables e.g. within a function, we shouldn't add any SEH opcodes. Differential Revision: https://reviews.llvm.org/D135277	2022-10-11 14:32:01 +03:00
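
Two of the commits above rest on small arithmetic identities: the DAGCombiner fold (mul (sra X, BW-1), Y) -> (neg (and (sra X, BW-1), Y)) in e8b3ffa532, and the shl+add+shl+add lowering of b * (1 + 2^m) * (1 + 2^n) in 74c2d4f602. The sketch below only checks those identities on a few sample values. It is not LLVM code: the 32-bit width, the helper names, and the sample values are assumptions chosen for illustration.

```python
BW = 32                 # assumed bit width for this illustration
MASK = (1 << BW) - 1    # values are treated as unsigned BW-bit patterns


def sra_sign(x):
    """(sra X, BW-1): all ones if the sign bit of X is set, otherwise zero."""
    return MASK if (x >> (BW - 1)) & 1 else 0


def mul_form(x, y):
    """(mul (sra X, BW-1), Y), reduced modulo 2**BW."""
    return (sra_sign(x) * y) & MASK


def neg_and_form(x, y):
    """(neg (and (sra X, BW-1), Y)), reduced modulo 2**BW."""
    return (-(sra_sign(x) & y)) & MASK


def mul_by_shift_add(b, m, n):
    """b * (1 + 2**m) * (1 + 2**n) computed as two shifted adds."""
    t = (b + (b << m)) & MASK        # add w8, w0, w0, lsl #m
    return (t + (t << n)) & MASK     # add w0, w8, w8, lsl #n


if __name__ == "__main__":
    samples = [0, 1, 5, 0x7FFFFFFF, 0x80000000, 0xFFFFFFFF, 0xDEADBEEF]
    for x in samples:
        for y in samples:
            assert mul_form(x, y) == neg_and_form(x, y)
    for b in samples:
        for m, n in [(1, 2), (3, 4), (2, 2)]:
            expected = (b * (1 + (1 << m)) * (1 + (1 << n))) & MASK
            assert mul_by_shift_add(b, m, n) == expected
    print("identities hold on all samples")
```

Both checks work modulo 2**BW, matching fixed-width machine arithmetic. They say nothing about when either transform is profitable, which is what the cost-model change in 74c2d4f602 is about.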

6033 Commits