llvm-project

Author	SHA1	Message	Date
Matt Arsenault	b59022b42e	DAG: Handle lowering of unordered fcZero|fcSubnormal to fcmp	2023-07-11 18:30:15 -04:00
Brendan Dahl	220fe00a7c	[WebAssembly] Support `annotate` clang attributes for marking functions. Annotation attributes may be attached to a function to mark it with custom data that will be contained in the final Wasm file. The annotation causes a custom section named "func_attr.annotate.<name>.<arg0>.<arg1>..." to be created that will contain each function's index value that was marked with the annotation. A new patchable relocation type for function indexes had to be created so the custom section could be updated during linking. Reviewed By: sbc100 Differential Revision: https://reviews.llvm.org/D150803	2023-07-11 15:17:26 -07:00
Jon Chesterfield	980cd18354	[amdgpu][nfc] Drop lds strategy noise from some tests	2023-07-11 21:18:49 +01:00
Eduard Zingerman	18e13739b8	[BPF] Undo transformation for LICM.cpp:hoistMinMax() Extended BPFCheckAndAdjustIR pass with sinkMinMax() transformation that undoes LICM hoistMinMax pass. The undo transformation converts the following patterns: x < min(a, b) -> x < a && x < b; x > min(a, b) -> x > a || x > b; x < max(a, b) -> x < a || x < b; x > max(a, b) -> x > a && x > b. Where 'a' or 'b' is a constant. Also supports `sext min(...) ...` and `zext min(...) ...`. ~~~ This was previously committed as 09feee559a29 and reverted in 0bf9bfeacc8c because of the testbot memory leak report: https://lab.llvm.org/buildbot/#/builders/5/builds/34931 The memory leak issue was caused by an incorrect instruction removal sequence in skinMinMaxBB(): `I->dropAllReferences(); I->removeFromParent();` was fixed to `I->eraseFromParent();`. Differential Revision: https://reviews.llvm.org/D147990	2023-07-11 22:30:34 +03:00
Matt Arsenault	fbe4ff8149	AMDGPU: Partially fix not respecting dynamic denormal mode The most notable issue was producing v_mad_f32 in functions with the dynamic mode, since it just ignores the mode. fdiv lowering is still somewhat broken because it involves a mode switch and we need to query the original mode.	2023-07-11 15:14:52 -04:00
Philip Reames	5cd41dc62d	[RISCV] Remove legacy TA/TU pseudo distinction for binary instructions This change continues with the line of work discussed in https://discourse.llvm.org/t/riscv-transition-in-vector-pseudo-structure-policy-variants/71295. This change handles most of the binary pseudos. I excluded pseudos which have _TIED variants, and those that produce mask results. Both are a bit different in functionality, and deserve their own change and review. As with previous changes in the series, we replace the existing TA and TU forms with a single unified pseudo with a passthru (which may be implicit_def) and a policy operand. As before, we see codegen changes (some improvements and some regressions) due to scheduling differences caused by the extra implicit_def instructions. Differential Revision: https://reviews.llvm.org/D154245	2023-07-11 10:21:42 -07:00
David Green	d96387f005	[AArch64] Extra tests for smull/umull, especially of smaller vector size. NFC See D153632 and D154063	2023-07-11 18:16:23 +01:00
Simon Pilgrim	72fddda8af	[X86] ReplaceNodeResults - widen vector truncate nodes on pre-SSSE3 targets Building on the support for wider input vector types from D154592, try to more aggressively widen inputs instead of scalarizing them.	2023-07-11 17:21:20 +01:00
David Mo	ef7ca14fa5	[WebAssembly] Report error for inline assembly with unsupported opcodes For inline WebAssembly, passing a numeric operand to global.get is unsupported. This causes encodeInstruction to reach an llvm_unreachable call, leading to undefined behaviors. This patch fixes the issue for this invalid instruction encoding, making it report an error by adding an MCContext field in class WebAssemblyMCCodeEmitter. Reviewed By: sbc100, bryanpkc Differential Revision: https://reviews.llvm.org/D154734	2023-07-11 10:36:25 -04:00
Phoebe Wang	db8b624de1	[X86][FP16] Fix mis-combination from FMULC to FCMULC The combination was designed to combine a negative imaginary value rather than a full negative complex value. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D154213	2023-07-11 22:26:20 +08:00
Georgi Mirazchiyski	103df542b5	Fix shr/and pair replace with bfe Co-Authored-By: Aidan Belton <aidan.belton@codeplay.com> Reviewed By: tra Differential Revision: https://reviews.llvm.org/D117118	2023-07-11 14:43:35 +01:00
Simon Wallis	82458ce69e	[ARM] mark tMOVi32imm as killing flags Mark the tMOVi32imm pseudo instr as killing the flags register. The pseudo instruction expands to a sequence of 7 movs/lsls/adds instructions, which are all Thumb-1 flag setting instructions. For a test case, take an existing arm test which checks for "Don't CSE a cmp across a call that clobbers CPSR." and retarget it at thumbv6m execute-only. Reviewed By: stuij Differential Revision: https://reviews.llvm.org/D154845 Change-Id: I8f8209fbc40a833f8875629937b9606c1e2c021d	2023-07-11 14:42:07 +01:00
Simon Pilgrim	6656c8d01f	[X86] shuffle-vs-trunc-256.ll - move comment outside test. NFC.	2023-07-11 14:31:50 +01:00
Jim Lin	b515133088	[RISCV] Merge rv32/rv64 vector reduction intrinsic tests that have the same content. NFC.	2023-07-11 19:04:56 +08:00
Simon Pilgrim	77b3f890cc	[X86] combineAndMaskToShift - match constant splat with X86::isConstantSplat Using X86::isConstantSplat instead of ISD::isConstantSplatVector allows us to detect constant masks after they've been lowered to constant pool loads. Addresses regression from D154592	2023-07-11 11:25:34 +01:00
Ties Stuij	f0ae3c23b5	[ARM] in LowerConstantFP, make sure we cover armv6-m execute-only Currently in LowerConstantFP, when we compile for execute-only (XO) we don't check what architecture we're compiling for (v6m=< or >v6m). We shouldn't get here for v6m, so put in an assert. Reviewed By: simonwallis2, dmgreen Differential Revision: https://reviews.llvm.org/D154506	2023-07-11 10:42:15 +01:00
Simon Pilgrim	842a6728d9	[X86] LowerTRUNCATE - improve handling during type legalization to PACKSS/PACKUS patterns Extend coverage for lowering wide vector types during type legalization to allow us to use PACKSS/PACKUS patterns instead of dropping down to shuffle lowering. First step towards avoiding premature folds of TRUNCATE to PACKSS/PACKUS nodes as described on Issue #63710 - which causes a large number of regressions on D152928 - we will next need to tweak the TRUNCATE widening in ReplaceNodeResults Differential Revision: https://reviews.llvm.org/D154592	2023-07-11 10:39:44 +01:00
pvanhout	655714a300	[AArch64] Use GlobalISel MatchTable Combiner Backend Only a few minor test changes needed because I removed the "helper" suffix from the combiner name, as it's not really a helper anymore but more like the implementation itself. Depends on D153757 NOTE: This would land iff D153757 (RFC) lands too. Reviewed By: aemerson Differential Revision: https://reviews.llvm.org/D153850	2023-07-11 11:27:14 +02:00
Jim Lin	0fd212c91d	[RISCV] Merge rv32/rv64 vector widening intrinsic tests that have the same content. NFC.	2023-07-11 16:25:12 +08:00
Nabeel Omer	e148899ad9	[X86] Preserve volatile ATOMIC_LOAD_OR nodes Fixes #63692. In reference to volatile memory accesses, the langref says: > the backend should never split or merge target-legal volatile load/store instructions. Differential Revision: https://reviews.llvm.org/D154609	2023-07-11 08:05:38 +00:00
Zi Xuan Wu (Zeson)	2ccb2dbc8d	[RISCV] Don't fold RISCVISD::VMV_V_X_VL series node and scalar load to vector load when scalar load is update load We try to fold RISCVISD::VMV_V_X_VL series node + scalar load -> vector load. But if scalar load is indexed load (load update form), it's not profitable to fold because load update node can't be removed after fold. Differential Revision: https://reviews.llvm.org/D152222	2023-07-11 15:56:31 +08:00
wangpc	99809f4377	[RISCV] Simplify the definitions of interrupt CSRs For `CSR_Interrupt`, we can generate the register list via a single `sequence`. For `CSR_XLEN_F32_Interrupt` and `CSR_XLEN_F64_Interrupt`, I don't see the reason why we need to keep the order the same as how we used to allocate registers (and we have changed the order in D146488), so I fold them into one `sequence`. There are some *.ll changes because of the order change. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D154837	2023-07-11 11:20:24 +08:00
Piyou Chen	299b2c2d93	[RISCV] precommit for prefetch locality support Reviewed By: asb Differential Revision: https://reviews.llvm.org/D154690	2023-07-10 20:07:18 -07:00
Matt Arsenault	1d92b68ead	DAG: Correct chain management for frexp libcalls We need to replace the other uses of the call chain with the new load chain. Fixes not preserving the return def with unused x86_fp80 results. Regression reported here: https://reviews.llvm.org/rGb15bf305ca3e9ce63aaef7247d32fb3a75174531#1224999	2023-07-10 21:39:15 -04:00
Wang Rui	f9e0845ef2	[LoongArch] Explicitly specify instruction properties This revision explicitly specifies the machine instruction properties instead of relying on guesswork. This is because guessing instruction properties has proven to be inaccurate, such as the machine LICM not working: ``` void func(char *a, char *b) { int i; for (i = 0; i != 72526; i++) a[i] = b[i]; } ``` Guessing instruction properties: ``` func: # @func move $a2, $zero .LBB0_1: # =>This Inner Loop Header: Depth=1 ldx.b $a3, $a1, $a2 stx.b $a3, $a0, $a2 addi.d $a2, $a2, 1 lu12i.w $a3, 17 ori $a3, $a3, 2894 bne $a2, $a3, .LBB0_1 ret .Lfunc_end0: ``` Explicitly specify instruction properties: ``` func: # @func lu12i.w $a2, 17 ori $a2, $a2, 2894 move $a3, $zero .LBB0_1: # =>This Inner Loop Header: Depth=1 ldx.b $a4, $a1, $a3 stx.b $a4, $a0, $a3 addi.d $a3, $a3, 1 bne $a3, $a2, .LBB0_1 ret .Lfunc_end0: ``` Reviewed By: SixWeining, xen0n Differential Revision: https://reviews.llvm.org/D154192	2023-07-11 08:45:24 +08:00
Han Shen	8df75969ae	[CodeGen] Fine tune MachineFunctionSplitPass (MFS) for FSAFDO. The original MFS work D85368 shows good performance improvement with Instrumented FDO. However, AutoFDO or Flow-Sensitive AutoFDO (FSAFDO) does not show performance gain. This is mainly caused by a less accurate profile compared to the iFDO profile. For the past few months, we have been working to improve FSAFDO quality, like in D145171. Taking advantage of this improvement, MFS now shows performance improvements over FSAFDO profiles. That being said, 2 minor changes need to be made, 1) An FS-AutoFDO profile generation pass needs to be added right before MFS pass and an FSAFDO profile load pass is needed when FS-AutoFDO is enabled and the MFS flag is present. 2) MFS only applies to hot functions, because we believe (and experiment also shows) FS-AutoFDO is more accurate about functions that have plenty of samples than those with no or very few samples. With this improvement, we see a 1.2% performance improvement in clang benchmark, 0.9% QPS improvement in our internal search benchmark, and 3%-5% improvement in internal storage benchmark. This is #1 of the two patches that enables the improvement. Reviewed By: wenlei, snehasish, xur Differential Revision: https://reviews.llvm.org/D152399	2023-07-10 16:00:30 -07:00
David Green	44479b80a6	[AArch64] Ensure constrained register class in INS peephole. Ensure we constrain the register class of the NewDef to that of OldDef, in case they do not match. Fixes #63777	2023-07-10 22:48:10 +01:00
Craig Topper	dbd47c4489	[RISCV] Don't allow X0 to be used for 'r' constraint in inline assembly Some instructions treat x0 as a special encoding rather than as a value of 0. Since we don't parse the inline assembly to know what the instruction is, choose the safest option of never using x0. Fixes #63747. Reviewed By: asb Differential Revision: https://reviews.llvm.org/D154744	2023-07-10 13:25:17 -07:00
Jake Egan	bbd0d123d3	Implement -frecord-command-line for XCOFF This patch extends support of the option `-frecord-command-line` to XCOFF. XCOFF doesn’t have custom sections like ELF, so the command line data is emitted to a .info section instead. A C_INFO symbol is generated with the .info section to preserve the command line data past the link step. Multiple command lines are separated by newlines and null bytes. The command line data can be retrieved on AIX with command `what file_name`. Reviewed By: scott.linder Differential Revision: https://reviews.llvm.org/D153600	2023-07-10 12:47:07 -04:00
Simon Pilgrim	1ab442464b	[X86] Regenerate or-address.ll test checks	2023-07-10 16:55:52 +01:00
Nabeel Omer	862e5dcb7e	[X86] Pre-commit test for lowerAtomicArith Test for https://reviews.llvm.org/D154609	2023-07-10 14:21:20 +00:00
Jingu Kang	041db60bc1	[tests] Update precommit test for MachineLICM subloops precommit test for https://reviews.llvm.org/D154205	2023-07-10 14:32:45 +01:00
Igor Kirillov	0aecf7ff0d	[CodeGen] Fix incorrectly detected reduction bug in ComplexDeinterleaving pass Using ACLE intrinsics, it is possible to create a loop that the deinterleaving pass incorrectly classified as a reduction loop. For example, for fixed-width vectors the loop was like below: vector.body: %a = phi <4 x float> [ %init.a, %entry ], [ %updated.a, %vector.body ] %b = phi <4 x float> [ %init.b, %entry ], [ %updated.b, %vector.body ] ... ; Does not depend on %a or %b: %updated.a = ... %updated.b = ... Differential Revision: https://reviews.llvm.org/D154598	2023-07-10 12:54:38 +00:00
Amara Emerson	3a80bdb316	[GlobalISel] Remove an erroneous oneuse check in the G_ADD reassociation combine. This check was unnecessary/incorrect, it was already being done by the target hook default implementation, and the one in the matcher was checking for a completely different thing. This change: 1) Removes the check and updates affected tests which now do some more reassociations. 2) Modifies the AMDGPU hooks which were stubbed with "return true" to also do the oneuse check. Not sure why I didn't do this the first time.	2023-07-10 01:03:12 -07:00
Alex Bradbury	29f630a1dd	[RISCV][MC] MC layer support for the experimental zacas extension This implements the v1.0-rc1 draft extension. amocas.d on RV32 and amocas.q have the restriction that rd and rs2 must be even registers. I've opted to implement this restriction in RISCVAsmParser::validateInstruction even though for codegen we'll need a new register class and can then remove this validation. This also sidesteps, for now, the issue of amocas.d being different on rv32 vs rv64. See <https://github.com/riscv-non-isa/riscv-c-api-doc/issues/37> for the issue of needing an agreed asm register constraint for register pairs. Differential Revision: https://reviews.llvm.org/D149248	2023-07-10 08:26:31 +01:00
Wang, Xin10	284a059b33	Revert "[X86]Remove TEST in AND32ri+TEST16rr in peephole-opt" This reverts commit 2c64226d84174dd1d9f93e1884c1b0bd432f89b5. revert first due to buildbot fail https://lab.llvm.org/buildbot/#/builders/85/builds/17571	2023-07-10 03:20:11 -04:00
XinWang10	2c64226d84	[X86]Remove TEST in AND32ri+TEST16rr in peephole-opt Previously we removed the TEST in a pattern like: %reg = and32ri %in_reg, 5 ... // EFLAGS not changed. %src_reg = subreg_to_reg 0, %reg, %subreg.sub_index test64rr %src_reg, %src_reg, implicit-def $eflags We can remove test64rr since it has the same functionality as the and; subreg_to_reg blocked the optimization in earlier code, so we handle this case specially. The following case can be optimized for the same reason: %reg = and32ri %in_reg, 5 ... // EFLAGS not changed. %src_reg = copy %reg.sub_16bit:gr32 test16rr %src_reg, %src_reg, implicit-def $eflags The COPY from gr32 to gr16 blocked the optimization in earlier code too, so handle it specially, as we did for test64rr. Reviewed By: skan Differential Revision: https://reviews.llvm.org/D154193	2023-07-09 23:21:32 -04:00
Simon Pilgrim	7428739ea8	[X86] matchAddressRecursively - peek through ZEXT nodes to match foldMaskAndShiftToExtract Handle (zero_extend (and (srl X, C1), C2)) patterns to allow foldMaskAndShiftToExtract to match h-register extractions from smaller types Ideally matchAddressRecursively needs to be able to recurse through ZEXT/SEXT nodes generally but for now we should just handle specific cases when they occur Addresses regressions in D146121	2023-07-09 15:41:38 +01:00
Simon Pilgrim	848f6abfdb	[X86] Add tests showing failure by matchAddressRecursively to peek through ZEXT nodes to match foldMaskAndShiftToExtract Test coverage for a similar regression in D146121	2023-07-09 15:41:38 +01:00
David Green	758c4640c9	[CGP] Enable CodeGenPrepare's phi type conversion. This is a recommit of 67121d7, enabling the CodeGenPrepare OptimizePhiTypes option that can help with the type of phi instructions into ISel.	2023-07-09 10:32:11 +01:00
LiaoChunyu	1575063db2	[RISCV] Match shl_vl (ext_vl v, splat 1) to vwadd_vl Similar to D153112, match shl (v, splat 1) to vwadd. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D154726	2023-07-08 08:03:15 +08:00
Matt Arsenault	310f839612	DAG: Lower is.fpclass fcInf to fcmp of fabs InstCombine should have taken care of this, but I think this is more useful in the future when the expansion tries to handle multiple cases at a time with fcmp. x87 looks worse to me but the only thing I know about it is that I aggressively do not care about it. https://reviews.llvm.org/D143198	2023-07-07 17:00:10 -04:00
Matt Arsenault	64d325454b	AMDGPU: Delete custom combine on class intrinsic This is no longer necessary as class-with-constant will always be transformed to the generic class intrinsic. https://reviews.llvm.org/D153901	2023-07-07 15:28:21 -04:00
Nemanja Ivanovic	b0e249d5e2	Reland "[PowerPC] Remove extend between shift and and" The commit originally caused a bootstrap failure on the big endian PPC bot as the combine was interfering with the legalizer when applied on illegal types. This update restricts the combine to the only types for which it is actually needed. Tested on PPC BE bootstrap locally.	2023-07-07 14:45:05 -04:00
Christudasan Devadasan	7a98f084c4	[AMDGPU][SILowerSGPRSpills] Spill SGPRs to virtual VGPRs Currently, the custom SGPR spill lowering pass spills SGPRs into physical VGPR lanes and the remaining VGPRs are used by regalloc for vector regclass allocation. This imposes many restrictions, and we ended up with unsuccessful SGPR spilling when there were not enough VGPRs, forcing us to spill the leftover into memory during PEI. The custom spill handling during PEI has many edge cases and breaks the compiler from time to time. This patch implements spilling SGPRs into virtual VGPR lanes. Since we now split the register allocation for SGPRs and VGPRs, the virtual registers introduced for the spill lanes would get allocated automatically in the subsequent regalloc invocation for VGPRs. Spill to virtual registers will always be successful, even in high-pressure situations, and hence it avoids most of the edge cases during PEI. We are now left with only the custom SGPR spills during PEI for special registers like the frame pointer, which is an unproblematic case. Differential Revision: https://reviews.llvm.org/D124196	2023-07-07 23:14:32 +05:30
Christudasan Devadasan	b4a62b1fa5	[AMDGPU] Enable whole wave register copy So far, we haven't exposed the allocation of whole-wave registers to regalloc. We hand-picked them for various whole wave mode operations. With a future patch, we want the allocator to efficiently allocate them rather than using the custom pre-allocation pass. Any liverange split of virtual registers involved in whole-wave operations requires the resulting COPY introduced with the split to be performed for all lanes. It isn't implemented in the compiler yet. This patch would identify all such copies and manipulate the exec mask around them to enable all lanes without affecting the value of exec mask elsewhere. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D143762	2023-07-07 22:58:55 +05:30
Christudasan Devadasan	b78b36e1a2	[AMDGPU] Implement whole wave register spill To reduce the register pressure during allocation, when the allocator spills a virtual register that corresponds to a whole wave mode operation, the spill loads and restores should be activated for all lanes by temporarily flipping all bits in exec register to one just before the spills. It is not implemented in the compiler as of today and this patch enables the necessary support. This is a pre-patch before the SGPR spill to virtual VGPR lanes that would eventually cause the whole wave register spills during allocation. Reviewed By: arsenm, cdevadas Differential Revision: https://reviews.llvm.org/D143759	2023-07-07 22:51:45 +05:30
Yashwant Singh	b7836d8562	[CodeGen]Allow targets to use target specific COPY instructions for live range splitting Replacing D143754. Right now the LiveRangeSplitting during register allocation uses TargetOpcode::COPY instruction for splitting. For AMDGPU target that creates a problem as we have both vector and scalar copies. Vector copies perform a copy over a vector register but only on the lanes(threads) that are active. This is mostly sufficient; however, we do run into cases when we have to copy the entire vector register and not just active lane data. One major place where we need that is live range splitting. Allowing targets to use their own copy instructions(if defined) will provide a lot of flexibility and ease to lower these pseudo instructions to correct MIR. - Introduce getTargetCopyOpcode() virtual function and use it to generate the copy in live range splitting. - Replace necessary MI.isCopy() checks with TII.isCopyInstr() in register allocator pipeline. Reviewed By: arsenm, cdevadas, kparzysz Differential Revision: https://reviews.llvm.org/D150388	2023-07-07 22:29:50 +05:30
Qiu Chaofan	a2b5117df7	[PowerPC] Update InputOps of Power10 SchedModel The count of input operands affects pipeline forwarding in the scheduling model. The previous Power10 model definition arranged some instructions into incorrect groups by counting the wrong number of input operands. This patch updates the model, setting the input operand count correctly by excluding irrelevant immediate operands and counting memory operands of load instructions correctly. Reviewed By: shchenz Differential Revision: https://reviews.llvm.org/D153842	2023-07-07 22:46:22 +08:00
Luke Lau	02bb33c3ce	[RISCV] Check for alignment when lowering interleaved/deinterleaved loads/stores As noted by @reames, we should be checking that the memory access is aligned to the element size (or the unaligned vector memory access feature is enabled) before lowering vlseg/vsseg intrinsics via the interleaved access pass. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D154536	2023-07-07 15:34:24 +01:00
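
Note on 18e13739b8: the four rewrites listed in that commit message rely on simple order identities for min/max. The snippet below is only an illustrative sketch, not code from the patch; the function name `check_min_max_identities` is hypothetical, and it brute-forces the identities over small integers rather than modelling LLVM IR or the constant-operand restriction.

```python
from itertools import product


def check_min_max_identities(values):
    """Brute-force check of the four min/max comparison identities.

    Returns the number of (x, a, b) triples that were checked.
    """
    checked = 0
    for x, a, b in product(list(values), repeat=3):
        # x < min(a, b)  <=>  x < a && x < b
        assert (x < min(a, b)) == (x < a and x < b)
        # x > min(a, b)  <=>  x > a || x > b
        assert (x > min(a, b)) == (x > a or x > b)
        # x < max(a, b)  <=>  x < a || x < b
        assert (x < max(a, b)) == (x < a or x < b)
        # x > max(a, b)  <=>  x > a && x > b
        assert (x > max(a, b)) == (x > a and x > b)
        checked += 1
    return checked


if __name__ == "__main__":
    print(check_min_max_identities(range(-3, 4)))
```

The check says nothing about signedness or the `sext`/`zext` variants mentioned in the commit; those would need the comparison predicate to be modelled explicitly.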


52796 Commits