llvm-project

Author	SHA1	Message	Date
Tomas Matheson	fc712eb7aa	[AArch64] Fix Copy Elemination for negative values Redundant Copy Elimination was eliminating a MOVi32imm -1 when it determined that the value of the destination register is already -1. However, it didn't take into account that the MOVi32imm zeroes the upper 32 bits (which are FFFFFFFF) and therefore cannot be eliminated. Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D93100	2020-12-18 13:30:46 +00:00
Paul Walker	c0bc169cb1	[NFC][SVE] Clean up bfloat isel patterns that emit non-bfloat instructions. During isel there's no need to protect illegal types. Patch also adds a missing unit test for tbl2 intrinsic using bfloat types. Differential Revision: https://reviews.llvm.org/D93404	2020-12-18 13:20:41 +00:00
Kerry McLaughlin	52e4084d9c	[SVE][CodeGen] Vector + immediate addressing mode for masked gather/scatter This patch extends LowerMGATHER/MSCATTER to make use of the vector + reg/immediate addressing modes for scalable masked gathers & scatters. selectGatherScatterAddrMode checks if the base pointer is null, in which case we can swap the base pointer and the index, e.g. getelementptr nullptr, <vscale x N x T> (splat(%offset)) + %indices) -> getelementptr %offset, <vscale x N x T> %indices Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D93132	2020-12-18 11:56:36 +00:00
Simon Pilgrim	992fad03e2	[X86][AVX] Replace extract_subvector(broadcast(), 0) folds with generic SimplifyDemandedVectorEltsForTargetNode handling. Simplifies a few more cases, notably shuffle demanded elts cases.	2020-12-18 11:51:10 +00:00
Bjorn Pettersson	a89d751fb4	Add intrinsics for saturating float to int casts This patch adds support for the fptoui.sat and fptosi.sat intrinsics, which provide basically the same functionality as the existing fptoui and fptosi instructions, but will saturate (or return 0 for NaN) on values unrepresentable in the target type, instead of returning poison. Related mailing list discussion can be found at: https://groups.google.com/d/msg/llvm-dev/cgDFaBmCnDQ/CZAIMj4IBAAJ The intrinsics have overloaded source and result type and support vector operands: i32 @llvm.fptoui.sat.i32.f32(float %f) i100 @llvm.fptoui.sat.i100.f64(double %f) <4 x i32> @llvm.fptoui.sat.v4i32.v4f16(half %f) // etc On the SelectionDAG layer two new ISD opcodes are added, FP_TO_UINT_SAT and FP_TO_SINT_SAT. These opcodes have two operands and one result. The second operand is an integer constant specifying the scalar saturation width. The idea here is that initially the second operand and the scalar width of the result type are the same, but they may change during type legalization. For example: i19 @llvm.fptsi.sat.i19.f32(float %f) // builds i19 fp_to_sint_sat f, 19 // type legalizes (through integer result promotion) i32 fp_to_sint_sat f, 19 I went for this approach, because saturated conversion does not compose well. There is no good way of "adjusting" a saturating conversion to i32 into one to i19 short of saturating twice. Specifying the saturation width separately allows directly saturating to the correct width. There are two baseline expansions for the fp_to_xint_sat opcodes. If the integer bounds can be exactly represented in the float type and fminnum/fmaxnum are legal, we can expand to something like: f = fmaxnum f, FP(MIN) f = fminnum f, FP(MAX) i = fptoxi f i = select f uo f, 0, i # unnecessary if unsigned as 0 = MIN If the bounds cannot be exactly represented, we expand to something like this instead: i = fptoxi f i = select f ult FP(MIN), MIN, i i = select f ogt FP(MAX), MAX, i i = select f uo f, 0, i # unnecessary if unsigned as 0 = MIN It should be noted that this expansion assumes a non-trapping fptoxi. Initial tests are for AArch64, x86_64 and ARM. This exercises all of the scalar and vector legalization. ARM is included to test float softening. Original patch by @nikic and @ebevhan (based on D54696). Differential Revision: https://reviews.llvm.org/D54749	2020-12-18 11:09:41 +01:00
QingShan Zhang	477b6505fa	[PowerPC] Select the D-Form load if we know its offset meets the requirement The LD/STD likewise instruction are selected only when the alignment in the load/store >= 4 to deal with the case that the offset might not be known(i.e. relocations). That means we have to select the X-Form load for %0 = load i64, i64* %arrayidx, align 2 In fact, we can still select the D-Form load if the offset is known. So, we only query the load/store alignment when we don't know if the offset is a multiple of 4. Reviewed By: jji, Nemanjai Differential Revision: https://reviews.llvm.org/D93099	2020-12-18 07:27:26 +00:00
Rong Xu	3733463dbb	[IR][PGO] Add hot func attribute and use hot/cold attribute in func section Clang FE currently has hot/cold function attribute. But we only have cold function attribute in LLVM IR. This patch adds support of hot function attribute to LLVM IR. This attribute will be used in setting function section prefix/suffix. Currently .hot and .unlikely suffix only are added in PGO (Sample PGO) compilation (through isFunctionHotInCallGraph and isFunctionColdInCallGraph). This patch changes the behavior. The new behavior is: (1) If the user annotates a function as hot or isFunctionHotInCallGraph is true, this function will be marked as hot. Otherwise, (2) If the user annotates a function as cold or isFunctionColdInCallGraph is true, this function will be marked as cold. The changes are: (1) user annotated function attribute will used in setting function section prefix/suffix. (2) hot attribute overwrites profile count based hotness. (3) profile count based hotness overwrite user annotated cold attribute. The intention for these changes is to provide the user a way to mark certain function as hot in cases where training input is hard to cover all the hot functions. Differential Revision: https://reviews.llvm.org/D92493	2020-12-17 18:41:12 -08:00
Monk Chiang	ee2cb90e3b	[RISCV] Define vsadd/vsaddu/vssub/vssubu intrinsics. We work with @rogfer01 from BSC to come out this patch. Authored-by: Roger Ferrer Ibanez <rofirrim@gmail.com> Co-Authored-by: ShihPo Hung <shihpo.hung@sifive.com> Co-Authored-by: Monk Chiang <monk.chiang@sifive.com> Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D93366	2020-12-18 10:24:24 +08:00
Layton Kifer	385e9a2a04	[DAGCombiner] Improve shift by select of constant Clean up a TODO, to support folding a shift of a constant by a select of constants, on targets with different shift operand sizes. Reviewed By: RKSimon, lebedev.ri Differential Revision: https://reviews.llvm.org/D90349	2020-12-18 02:21:42 +00:00
Zakk Chen	4b07c515ef	[RISCV] Define vlse/vsse intrinsics. Define vlse/vsse intrinsics and lower to V instructions. We work with @rogfer01 from BSC to come out this patch. Authored-by: Roger Ferrer Ibanez <rofirrim@gmail.com> Co-Authored-by: Zakk Chen <zakk.chen@sifive.com> Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D93445	2020-12-17 17:00:01 -08:00
Bangtian Liu	511cfe9441	Revert "Ensure SplitEdge to return the new block between the two given blocks" This reverts commit d20e0c3444ad9ada550d9d6d1d56fd72948ae444.	2020-12-17 21:00:37 +00:00
Baptiste Saleil	c2892978e9	[PowerPC] Rename the vector pair intrinsics and builtins to replace the _mma_ prefix by _vsx_ On PPC, the vector pair instructions are independent from MMA. This patch renames the vector pair LLVM intrinsics and Clang builtins to replace the _mma_ prefix by _vsx_ in their names. We also move the vector pair type/intrinsic/builtin tests to their own files. Differential Revision: https://reviews.llvm.org/D91974	2020-12-17 13:19:27 -05:00
Bangtian Liu	d20e0c3444	Ensure SplitEdge to return the new block between the two given blocks This PR implements the function splitBasicBlockBefore to address an issue that occurred during SplitEdge(BB, Succ, ...), inside splitBlockBefore. The issue occurs in SplitEdge when the Succ has a single predecessor and the edge between the BB and Succ is not critical. This produces the result ‘BB->Succ->New’. The new function splitBasicBlockBefore was added to splitBlockBefore to handle the issue and now produces the correct result ‘BB->New->Succ’. Below is an example of splitting the block bb1 at its first instruction. /// Original IR bb0: br bb1 bb1: %0 = mul i32 1, 2 br bb2 bb2: /// IR after splitEdge(bb0, bb1) using splitBasicBlock bb0: br bb1 bb1: br bb1.split bb1.split: %0 = mul i32 1, 2 br bb2 bb2: /// IR after splitEdge(bb0, bb1) using splitBasicBlockBefore bb0: br bb1.split bb1.split br bb1 bb1: %0 = mul i32 1, 2 br bb2 bb2: Differential Revision: https://reviews.llvm.org/D92200	2020-12-17 16:00:15 +00:00
Yvan Roux	923ca0b411	[ARM][MachineOutliner] Fix costs model. Fix candidates calls costs models allocation and prepare stack fixups handling. Differential Revision: https://reviews.llvm.org/D92933	2020-12-17 16:08:23 +01:00
Kerry McLaughlin	7c504b6dd0	[AArch64] Renamed sve-masked-scatter-legalise.ll. NFC.	2020-12-17 11:40:09 +00:00
Kerry McLaughlin	6d2a78996b	[SVE][CodeGen] Add bfloat16 support to scalable masked gather Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D93307	2020-12-17 11:08:15 +00:00
Simon Pilgrim	cdb692ee0c	[X86] Add X86ISD::SUBV_BROADCAST_LOAD and begin removing X86ISD::SUBV_BROADCAST (PR38969) Subvector broadcasts are only load instructions, yet X86ISD::SUBV_BROADCAST treats them more generally, requiring a lot of fallback tablegen patterns. This initial patch replaces constant vector lowering inside lowerBuildVectorAsBroadcast with direct X86ISD::SUBV_BROADCAST_LOAD loads which helps us merge a number of equivalent loads/broadcasts. As well as general plumbing/analysis additions for SUBV_BROADCAST_LOAD, I needed to wrap SelectionDAG::makeEquivalentMemoryOrdering so it can handle result chains from non generic LoadSDNode nodes. Later patches will continue to replace X86ISD::SUBV_BROADCAST usage. Differential Revision: https://reviews.llvm.org/D92645	2020-12-17 10:25:25 +00:00
QingShan Zhang	ebdd20f430	Expand the fp_to_int/int_to_fp/fp_round/fp_extend as libcall for fp128 X86 and AArch64 expand it as libcall inside the target. And PowerPC also want to expand them as libcall for P8. So, propose an implement in the legalizer to common the logic and remove the code for X86/AArch64 to avoid the duplicate code. Reviewed By: Craig Topper Differential Revision: https://reviews.llvm.org/D91331	2020-12-17 07:59:30 +00:00
Xiang1 Zhang	39584ae5b5	[Debugify] Support checking Machine IR debug info Add mir-check-debug pass to check MIR-level debug info. For IR-level, currently, LLVM have debugify + check-debugify to generate and check debug IR. Much like the IR-level pass debugify, mir-debugify inserts sequentially increasing line locations to each MachineInstr in a Module, But there is no equivalent MIR-level check-debugify pass, So now we support it at "mir-check-debug". Reviewed By: djtodoro Differential Revision: https://reviews.llvm.org/D91595	2020-12-16 22:17:25 -08:00
Xiang1 Zhang	1e42ad9d62	Revert "[Debugify] Support checking Machine IR debug info" This reverts commit 50aaa8c274910d78d7bf6c929a34fe58b1f45579.	2020-12-16 20:12:33 -08:00
Hsiangkai Wang	a5e4a513b0	[RISCV] Define vector widening mul intrinsics. Define vector widening mul intrinsics and lower them to V instructions. We work with @rogfer01 from BSC to come out this patch. Authored-by: Roger Ferrer Ibanez <rofirrim@gmail.com> Co-Authored-by: Hsiangkai Wang <kai.wang@sifive.com> Differential Revision: https://reviews.llvm.org/D93381	2020-12-17 11:50:33 +08:00
Hsiangkai Wang	dd5281e7cc	[RISCV] Define vector mul/div/rem intrinsics. Define vector mul/div/rem intrinsics and lower them to V instructions. We work with @rogfer01 from BSC to come out this patch. Authored-by: Roger Ferrer Ibanez <rofirrim@gmail.com> Co-Authored-by: Hsiangkai Wang <kai.wang@sifive.com> Differential Revision: https://reviews.llvm.org/D93380	2020-12-17 11:50:17 +08:00
Hsiangkai Wang	f03609b5c7	[RISCV] V does not imply F. If users want to use vector floating point instructions, they need to specify 'F' extension additionally. Differential Revision: https://reviews.llvm.org/D93282	2020-12-17 10:57:36 +08:00
Matt Arsenault	f333736757	AMDGPU: Remove SGPRSpillVGPRDefinedSet hack These VGPRs should be reserved and therefore do not need "correct" liveness. They should not have undef uses, which can still cause issues.	2020-12-16 21:33:35 -05:00
Zakk Chen	c1d6d461aa	[RISCV] Define vle/vse intrinsics. Define vle/vse intrinsics and lower to V instructions. We work with @rogfer01 from BSC to come out this patch. Authored-by: Roger Ferrer Ibanez <rofirrim@gmail.com> Co-Authored-by: Zakk Chen <zakk.chen@sifive.com> Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D93359	2020-12-16 18:08:15 -08:00
Xiang1 Zhang	50aaa8c274	[Debugify] Support checking Machine IR debug info Add mir-check-debug pass to check MIR-level debug info. For IR-level, currently, LLVM have debugify + check-debugify to generate and check debug IR. Much like the IR-level pass debugify, mir-debugify inserts sequentially increasing line locations to each MachineInstr in a Module, But there is no equivalent MIR-level check-debugify pass, So now we support it at "mir-check-debug". Reviewed By: djtodoro Differential Revision: https://reviews.llvm.org/D91595	2020-12-16 18:04:05 -08:00
Guozhi Wei	687e80be7f	[MBP] Add whole chain to BlockFilterSet instead of individual BB Currently we add individual BB to BlockFilterSet if its frequency satisfies LoopFreq / Freq <= LoopToColdBlockRatio LoopFreq is edge frequency from outside to loop header. LoopToColdBlockRatio is a command line parameter. It doesn't make sense since we always layout whole chain, not individual BBs. It may also cause a tricky problem. Sometimes it is possible that the LoopFreq of an inner loop is smaller than LoopFreq of outer loop. So a BB can be in BlockFilterSet of inner loop, but not in BlockFilterSet of outer loop, like .cold in the test case. So it is added to the chain of inner loop. When work on the outer loop, .cold is not added to BlockFilterSet, so the edge to successor .problem is not counted in UnscheduledPredecessors of .problem chain. But other blocks in the inner loop are added BlockFilterSet, so the whole inner loop chain can be layout, and markChainSuccessors is called to decrease UnscheduledPredecessors of following chains. markChainSuccessors calls markBlockSuccessors for every BB, even it is not in BlockFilterSet, like .cold, so .problem chain's UnscheduledPredecessors is decreased, but this edge was not counted on in fillWorkLists, so .problem chain's UnscheduledPredecessors becomes 0 when it still has an unscheduled predecessor .pred! And it causes problems in following various successor BB selection algorithms. Differential Revision: https://reviews.llvm.org/D89088	2020-12-16 15:45:47 -08:00
Harald van Dijk	09d0e7a7c1	[X86] Avoid %fs:(%eax) references in x32 mode The ABI explains that %fs:(%eax) zero-extends %eax to 64 bits, and adds that the TLS base address, but that the TLS base address need not be at the start of the TLS block, TLS references may use negative offsets. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D93158	2020-12-16 22:39:57 +00:00
Reid Kleckner	15ca54525d	Fix XCore test on Windows, the register order is reversed, for reasons unknown	2020-12-16 13:33:24 -08:00
Mircea Trofin	cd551f8564	[NFC] Remove unused prefixes in llvm/test/CodeGen/X86 Differential Revision: https://reviews.llvm.org/D92965	2020-12-16 12:41:49 -08:00
Esme-Yi	2ea7210e39	Revert "[PowerPC] Extend folding RLWINM + RLWINM to post-RA." This reverts commit 1c0941e1524f499e3fbde48fc3bdd0e70fc8f2e4.	2020-12-16 17:12:24 +00:00
Simon Pilgrim	e5039aad45	[X86] Regenerate bit extraction tests, cleaning up check-prefixes. As noticed on D92965, we needed to simplify the prefixes to ensure all RUNs were properly covered. We should never have a target with BMI2 without BMI1, so use that as the 'BMI level' and then check with/without TBM (all TBM targets have at least BMI1).	2020-12-16 14:48:21 +00:00
diggerlin	a1e1dcabe4	[XCOFF][AIX] Emit EH information in traceback table SUMMARY: In order for the runtime on AIX to find the compact unwind section(EHInfo table), we would need to set the following on the traceback table: The 6th byte's longtbtable field to true to signal there is an Extended TB Table Flag. The Extended TB Table Flag to be 0x08 to signal there is an exception handling info presents. Emit the offset between ehinfo TC entry and TOC base after all other optional portions of traceback table. The patch is authored by Jason Liu. Reviewers: David Tenty, Digger Lin Differential Revision: https://reviews.llvm.org/D92766	2020-12-16 09:34:59 -05:00
Denis Antrushin	c5771a2f2d	[Statepoints] Extract invoke tests into separate file. NFC. Extract VReg lowering tests with invokes into separate file for easier maintenance/modification. Check MIR after register allocation - at this point all transformations we're interested in has been applied and verifying of MIR is simpler than that of assembly.	2020-12-16 20:53:28 +07:00
Bangtian Liu	c10757200d	Revert "Ensure SplitEdge to return the new block between the two given blocks" This reverts commit cf638d793c489632bbcf0ee0fbf9d0f8c76e1f48.	2020-12-16 11:52:30 +00:00
Simon Pilgrim	e55f7de946	[X86][SSE] combineReductionToHorizontal - don't rely on widenSubVector to handle illegal vector types. Thanks to @asbirlea for reporting the bug.	2020-12-16 11:24:40 +00:00
Stanislav Mekhanoshin	eb66bf0802	[AMDGPU] Print SCRATCH_EN field after the kernel Differential Revision: https://reviews.llvm.org/D93353	2020-12-15 22:44:30 -08:00
Zakk Chen	15ce0ab7ac	[RISCV] Refine vector load/store tablegen pattern, NFC. Refine tablegen pattern for vector load/store, and follow D93012 to separate masked and unmasked definitions for pseudo load/store instructions. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D93284	2020-12-15 18:55:55 -08:00
Krzysztof Parzyszek	0f903015c7	[Hexagon] Rename test case, NFC	2020-12-15 19:05:31 -06:00
Krzysztof Parzyszek	16385643bb	[Hexagon] Emit enough stores when aligning vector addresses	2020-12-15 18:59:53 -06:00
Bangtian Liu	cf638d793c	Ensure SplitEdge to return the new block between the two given blocks This PR implements the function splitBasicBlockBefore to address an issue that occurred during SplitEdge(BB, Succ, ...), inside splitBlockBefore. The issue occurs in SplitEdge when the Succ has a single predecessor and the edge between the BB and Succ is not critical. This produces the result ‘BB->Succ->New’. The new function splitBasicBlockBefore was added to splitBlockBefore to handle the issue and now produces the correct result ‘BB->New->Succ’. Below is an example of splitting the block bb1 at its first instruction. /// Original IR bb0: br bb1 bb1: %0 = mul i32 1, 2 br bb2 bb2: /// IR after splitEdge(bb0, bb1) using splitBasicBlock bb0: br bb1 bb1: br bb1.split bb1.split: %0 = mul i32 1, 2 br bb2 bb2: /// IR after splitEdge(bb0, bb1) using splitBasicBlockBefore bb0: br bb1.split bb1.split br bb1 bb1: %0 = mul i32 1, 2 br bb2 bb2: Differential Revision: https://reviews.llvm.org/D92200	2020-12-15 23:32:29 +00:00
Matt Arsenault	60eba8161b	RegisterCoalescer: Remove phi-only subranges when erasing identity copies Undef subranges are not present in the live range values, except when they cross block boundaries. In this situation, a identity copy is inside a loop, and one of the lanes is undefined. It only appears alive inside the loop due to the copy. Once the copy was erased, it would leave behind a segment inside the loop body with no corresponding def anywhere in the program. When RenameIndependentSubregs processed this dummy interval, it would introduce a "Multiple connected components in live interval" verifier error when IMPLICIT_DEFs were added to the other two blocks. I believe there is a missing verifier check for this type of dummy interval. I have found additional cases from the same fundamental problem in other areas I haven't managed to fix yet (e.g. the commented out prune_subrange_phi_value_* cases).	2020-12-15 17:36:32 -05:00
Hsiangkai Wang	c1dac6bac5	[RISCV] Define vfadd/vfsub/vfrsub intrinsics. Define vfadd/vfsub/vfrsub intrinsics and lower to V instructions. We work with @rogfer01 from BSC to come out this patch. Authored-by: Roger Ferrer Ibanez <rofirrim@gmail.com> Co-Authored-by: Hsiangkai Wang <kai.wang@sifive.com> Differential Revision: https://reviews.llvm.org/D93291	2020-12-16 06:31:47 +08:00
Hsiangkai Wang	903f295009	[RISCV] Define vmin/vminu/vmax/vmaxu intrinsics. Authored-by: Roger Ferrer Ibanez <rofirrim@gmail.com> Co-Authored-by: Hsiangkai Wang <kai.wang@sifive.com> Differential Revision: https://reviews.llvm.org/D93218	2020-12-16 06:31:47 +08:00
Hsiangkai Wang	fd27164563	[RISCV] Define vnsrl/vnsra intrinsics. Authored-by: Roger Ferrer Ibanez <rofirrim@gmail.com> Co-Authored-by: Hsiangkai Wang <kai.wang@sifive.com> Differential Revision: https://reviews.llvm.org/D93207	2020-12-16 06:31:47 +08:00
Hsiangkai Wang	95795e7a65	[RISCV] Define vsll/vsrl/vsra intrinsics. Authored-by: Roger Ferrer Ibanez <rofirrim@gmail.com> Co-Authored-by: Hsiangkai Wang <kai.wang@sifive.com> Differential Revision: https://reviews.llvm.org/D93193	2020-12-16 06:31:47 +08:00
Hsiangkai Wang	19db6a652b	[RISCV] Define vadc/vmadc/vsbc/vmsbc intrinsics. Authored-by: Roger Ferrer Ibanez <rofirrim@gmail.com> Co-Authored-by: Hsiangkai Wang <kai.wang@sifive.com> Differential Revision: https://reviews.llvm.org/D93175	2020-12-16 06:31:47 +08:00
Krzysztof Parzyszek	71601d2ac9	[Hexagon] Fix bitcasting v1i8 -> i8	2020-12-15 16:01:24 -06:00
Baptiste Saleil	57d83c3a90	[PowerPC] Enable paired vector type and intrinsics when MMA is disabled This patch enables the Clang type __vector_pair and its associated LLVM intrinsics even when MMA is disabled. With this patch, the type is now controlled by the PPC paired-vector-memops option. The builtins and intrinsics will be renamed to drop the mma prefix in another patch. Differential Revision: https://reviews.llvm.org/D91819	2020-12-15 15:14:11 -06:00
Mircea Trofin	1183e55580	[NFC] update extract-lowbits.ll and scalar-pf-to-i64.ll Auto-updated with update_llc_test_checks	2020-12-15 10:04:45 -08:00

1 2 3 4 5 ...

36897 Commits