llvm-project

Author	SHA1	Message	Date
Nemanja Ivanovic	0464586ac5	[PowerPC] Combine 64-bit bswap(load) without LDBRX When targeting CPUs that don't have LDBRX, we end up producing code that is very inefficient and large for this common idiom. This patch just optimizes it to two 32-bit LWBRX instructions along with a merge. This fixes https://bugs.llvm.org/show_bug.cgi?id=49610 Differential revision: https://reviews.llvm.org/D104836	2021-06-24 15:11:47 -05:00
Jonas Paulsson	1eda5453f2	[BuildLibCalls/SimplifyLibCalls] Fix attributes on created CallInst instructions. - When emitting libcalls, do not only pass the calling convention from the function prototype but also the attributes. - Do not pass attributes from e.g. libc memcpy to llvm.memcpy. Review: Reid Kleckner, Eli Friedman, Arthur Eubanks Differential Revision: https://reviews.llvm.org/D103992	2021-06-24 14:47:24 -05:00
Roman Lebedev	4867641f30	[NFC][Codegen] Autogenerate Thumb2/setjmp_longjmp.ll test	2021-06-24 21:35:05 +03:00
Aakanksha Patil	3453f3dd46	[AMDGPU] Add gfx1035 target Differential Revision: https://reviews.llvm.org/D104804	2021-06-24 14:32:41 -04:00
Pablo Barrio	571c8c5263	[AArch64][v8.3A] Avoid inserting implicit landing pads (PACI*SP) PACI*SP have the advantage that they are in HINT space, meaning they can be run successfully in hardware without PAuth support - they will just behave as a NOP. However, PACI*SP are also implicit landing pads (think of an extra BTI jc). Therefore, they allow indirect jumps of all kinds into them, potentially inserting new gadgets. This patch replaces PACI*SP by PACI* LR, SP when compiling explicitly for hardware with full PAuth support. PACI* is not in the HINT space, therefore it will fault when run in hardware without PAuth support, but it is also not a landing pad, making programs safer in newer HW. Differential Revision: https://reviews.llvm.org/D101920	2021-06-24 18:24:32 +01:00
Craig Topper	03f9e04bc3	[TargetLowering][ARM] Don't alter opaque constants in TargetLowering::ShrinkDemandedConstant. We don't constant fold based on demanded bits elsewhere in SimplifyDemandedBits, so I don't think we should shrink them either. The affected ARM test changes because a constant became non-opaque and eventually enabled some constant folding. This no longer happens. I checked and InstCombine is able to simplify this test. I'm not sure exactly what it was trying to test. Reviewed By: lebedev.ri, dmgreen Differential Revision: https://reviews.llvm.org/D104832	2021-06-24 10:09:36 -07:00
Sjoerd Meijer	c74aea4663	[AArch64] Precommit extending load tests for D104782. NFC.	2021-06-24 15:59:53 +01:00
David Green	1113e06821	[ARM] Extend narrow values to allow using truncating scatters As a minor adjustment to the existing lowering of offset scatters, this extends any smaller-than-legal vectors into full vectors using a zext, so that the truncating scatters can be used. Due to the way MVE legalizes the vectors this should be cheap in most situations, and will prevent the vector from being scalarized. Differential Revision: https://reviews.llvm.org/D103704	2021-06-24 13:09:11 +01:00
Florian Hahn	a54c6fc083	[X86] Exclude invalid element types for bitcast/broadcast folding. It looks like the fold introduced in 63f3383ece25efa can cause crashes if the type of the bitcasted value is not a valid vector element type, like x86_mmx. To resolve the crash, reject invalid vector element types. The way it is done in the patch is a bit clunky. Perhaps there's a better way to check? Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D104792	2021-06-24 12:39:01 +01:00
Simon Pilgrim	c4d3eedc7f	[X86] Fold nested select_cc to select (cmp*ge/le Cond0, Cond1), LHS, Y) select (cmpeq Cond0, Cond1), LHS, (select (cmpugt Cond0, Cond1), LHS, Y) --> (select (cmpuge Cond0, Cond1), LHS, Y) etc, We already perform this fold in DAGCombiner for MVT::i1 comparison results, but these can still appear after legalization (in x86 case with MVT::i8 results), where we need to be more careful about generating new comparison codes. Pulled out of D101074 to help address the remaining regressions. Differential Revision: https://reviews.llvm.org/D104707	2021-06-24 11:27:57 +01:00
Roman Lebedev	9c4c2f2472	[SimplifyCFG] Tail-merging all blocks with `ret` terminator Based on top of D104598, which is an NFCI-ish refactoring. Here, the restriction that only empty blocks can be merged is lifted. Reviewed By: rnk Differential Revision: https://reviews.llvm.org/D104597	2021-06-24 13:15:39 +03:00
Roman Lebedev	cba4b104a9	[NFC][AArch64] Un-autogenerate swifterror.ll tests It appears the change needed in D104597 is minimal and obvious, so let's not make them so verbose.	2021-06-24 13:11:26 +03:00
Fraser Cormack	a4729f7f88	[RISCV] Lower RVV vector SELECTs to VSELECTs This patch optimizes the code generation of vector-type SELECTs (LLVM select instructions with scalar conditions) by custom-lowering to VSELECTs (LLVM select instructions with vector conditions) by splatting the condition to a vector. This avoids the default expansion path which would either introduce control flow or fully scalarize. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D104772	2021-06-24 10:12:51 +01:00
Carl Ritson	98f48723f2	[AMDGPU] Add 224-bit vector types and link 192-bit types to MVTs Add SReg_224, VReg_224, AReg_224, etc. Link 224-bit types with v7i32/v7f32. Link existing 192-bit types to newly added v3i64/v3f64/v6i32/v6f32. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D104622	2021-06-24 12:41:22 +09:00
Kai Luo	767e200b43	[PowerPC] Add test to show passes in O3 pipeline. NFC.	2021-06-24 03:20:35 +00:00
Jon Chesterfield	660cae84c3	Revert "[AMDGPU] [IndirectCalls] Don't propagate attributes to address taken functions and their callees" This reverts commit 6a3beb1f68d6791a4cd0190f68b48510f754a00a. Test case that triggers an infinite loop before the revert is at the review for D103138.	2021-06-24 02:33:50 +01:00
Craig Topper	91319534ba	[CGP][RISCV] Teach CodeGenPrepare::optimizeSwitchInst to honor isSExtCheaperThanZExt. This optimization pre-promotes the input and constants for a switch instruction to a legal type so that all the generated compares share the same extend. Since RISCV prefers sext for i32 to i64 extends, we should honor that to use sext.w instead of a pair of shifts. Reviewed By: jrtc27 Differential Revision: https://reviews.llvm.org/D104612	2021-06-23 15:38:11 -07:00
Xun Li	f09ec01f1f	[SjLj] Insert UnregisterFn before musttail call When inserting UnregisterFn, if there is a musttail call, we must insert before the call so that we don't break the musttail call contract. Reviewed By: wenlei Differential Revision: https://reviews.llvm.org/D104807	2021-06-23 15:33:55 -07:00
Xun Li	f8c84da23b	Revert "[SjLj] Insert UnregisterFn before musttail call" This reverts commit f36703ada3dc18388ef5cdcbb8f39f74c27ad8e9. Test failure: https://lab.llvm.org/buildbot#builders/104/builds/3450	2021-06-23 15:31:35 -07:00
Xun Li	f36703ada3	[SjLj] Insert UnregisterFn before musttail call When inserting UnregisterFn, if there is a musttail call, we must insert before the call so that we don't break the musttail call contract. Differential Revision: https://reviews.llvm.org/D104807	2021-06-23 14:29:46 -07:00
Nikita Popov	70b1a8c095	[PatternMatch] Make m_VScale compatible with opaque pointers Use GEP source type instead of pointer element type.	2021-06-23 23:02:13 +02:00
Roman Lebedev	e6a353061f	[NFC][AArch64] Autogenerate assembly checklines in arm64-instruction-mix-remarks.ll	2021-06-24 00:01:13 +03:00
Stanislav Mekhanoshin	d274d64ef4	[AMDGPU] Check for pointer operand while refining LDS align Also skips the propagation if alignment is 1. Differential Revision: https://reviews.llvm.org/D104796	2021-06-23 12:27:55 -07:00
Jinsong Ji	c125af82a5	[DAGCombine] Check reassoc flags in aggressive fsub fusion This is from the discussion in https://reviews.llvm.org/D104247#inline-993387 The contract and reassoc flags shouldn't imply each other. All the aggressive fsub fusions reassociate operations, so we should guard them with a reassoc flag check. Reviewed By: mcberg2017 Differential Revision: https://reviews.llvm.org/D104723	2021-06-23 13:59:40 +00:00
Roman Lebedev	eb7ce97870	[NFC][ARM] Fix update_llc_test_checks for thumbv7-apple-darwin, autogenerate thumb2-ifcvt1.ll	2021-06-23 16:31:19 +03:00
Roman Lebedev	b77972ac4f	[NFC][AArch64] Autogenerate a few more tests	2021-06-23 16:31:19 +03:00
Roman Lebedev	3c94869632	[NFC][ARM] Fix update_llc_test_checks for aarch64-apple-ios/thumbv7s-apple-darwin, autogenerate a few tests	2021-06-23 16:31:19 +03:00
Roman Lebedev	15be15073e	[NFC][ARM] Fix update_llc_test_checks for thumbv7-apple-ios, autogenerate switch-minsize.ll	2021-06-23 16:31:19 +03:00
Roman Lebedev	4de0c40031	[NFC][ARM] Fix update_llc_test_checks for armv7-apple-ios, autogenerate ifcvt5.ll/ifcvt6.ll	2021-06-23 16:31:19 +03:00
Rosie Sumpter	12cb8ca668	[AArch64] Add CodeGen tests for vector reduction intrinsics. NFC Tests are added for vector reduce OR, AND and XOR. Differential Revision: https://reviews.llvm.org/D104771	2021-06-23 13:46:16 +01:00
Roman Lebedev	ff4b1d379f	[NFCI-ish][SimplifyCFGPass] Rework and generalize `ret` block tail-merging This changes the approach taken to tail-merge the blocks to always create a new block instead of trying to reuse some block, and generalizes it to support dealing with more than just the `ret` in the future. This effectively lifts the CallBr restriction, although this isn't really intentional. That is the only non-NFC change here; I'm not sure if it's reasonable/feasible to temporarily retain it. Other restrictions of the transform remain. Reviewed By: rnk Differential Revision: https://reviews.llvm.org/D104598	2021-06-23 14:33:18 +03:00
Joe Ellis	3c4dbf6ea9	[Verifier] Fail on overrunning and invalid indices for {insert,extract} vector intrinsics With regards to overrunning, the langref (llvm/docs/LangRef.rst) specifies: (llvm.experimental.vector.insert) Elements ``idx`` through (``idx`` + num_elements(``subvec``) - 1) must be valid ``vec`` indices. If this condition cannot be determined statically but is false at runtime, then the result vector is undefined. (llvm.experimental.vector.extract) Elements ``idx`` through (``idx`` + num_elements(result_type) - 1) must be valid vector indices. If this condition cannot be determined statically but is false at runtime, then the result vector is undefined. For the non-mixed cases (e.g. inserting/extracting a scalable into/from another scalable, or inserting/extracting a fixed into/from another fixed), it is possible to statically check whether or not the above conditions are met. This was previously missing from the verifier, and if the conditions were found to be false, the result of the insertion/extraction would be replaced with an undef. With regards to invalid indices, the langref (llvm/docs/LangRef.rst) specifies: (llvm.experimental.vector.insert) ``idx`` represents the starting element number at which ``subvec`` will be inserted. ``idx`` must be a constant multiple of ``subvec``'s known minimum vector length. (llvm.experimental.vector.extract) The ``idx`` specifies the starting element number within ``vec`` from which a subvector is extracted. ``idx`` must be a constant multiple of the known-minimum vector length of the result type. Similarly, these conditions were not previously enforced in the verifier. In some circumstances, invalid indices were permitted silently, and in other circumstances, an undef was spawned where a verifier error would have been preferred. This commit adds verifier checks to enforce the constraints above. Differential Revision: https://reviews.llvm.org/D104468	2021-06-23 10:33:22 +00:00
Stanislav Mekhanoshin	2b43209ee3	[AMDGPU] Propagate LDS align into instructions Differential Revision: https://reviews.llvm.org/D104316	2021-06-23 00:57:16 -07:00
Martin Storsjö	1cb7849a55	Revert "[AArch64LoadStoreOptimizer] Recommit: Generate more STPs by renaming registers earlier" This reverts commit ea011ec5ed53599305de62ca5fcfd31f4b3448c3. This still causes some miscompiles, I'll follow up in the phabricator review with a sample of that issue (which is part of the sample of the previous issue).	2021-06-23 09:54:16 +03:00
Jim Lin	0365af1a87	[M68k] Add testcases for shift and rotate instructions Add codegen testcases for lsl, lsr, asr, rol and ror instructions. Reviewed By: myhsu Differential Revision: https://reviews.llvm.org/D104685	2021-06-23 13:26:58 +08:00
Jim Lin	5cb5225cf5	[M68k] Refactor codegen patterns for logic operations and add tests for it Refactor pat for and, or and xor operation and add missing tests for it Reviewed By: myhsu Differential Revision: https://reviews.llvm.org/D104626	2021-06-23 13:25:24 +08:00
Jon Roelofs	493d6928fe	[Remarks] Make memsize remarks report as an analysis, not a missed opportunity. Differential revision: https://reviews.llvm.org/D104078	2021-06-22 18:22:47 -07:00
Matt Arsenault	39f8a792f0	AMDGPU: Try to eliminate clearing of high bits of 16-bit instructions These used to consistently be zeroed pre-gfx9, but gfx9 made the situation complicated since now some still do and some don't. This also manages to pick up a few cases that the pattern fails to optimize away. We handle some cases with instruction patterns, but some get through. In particular this improves the integer cases.	2021-06-22 13:42:49 -04:00
Matt Arsenault	2e120920ac	AMDGPU: Add baseline test for instructions zeroing high bits	2021-06-22 13:27:39 -04:00
Matt Arsenault	9ad8a1f6fb	AMDGPU: Fix high 16-bit optimization on gfx9 We can do this optimization in the majority of cases, but we currently don't have a way to do it. We do not track/model which instructions have which behavior, the control bit to change the high bit behavior, or making use of preserved bits at all. This is a bit fuzzy since we don't know precisely how the source instruction will be lowered, but that only really matters in one case (for fma_mixlo). We do need to fixup some of these cases after selection, but the pattern helps eliminate many of these zexts.	2021-06-22 13:16:45 -04:00
zhijian	bd240b3d77	[AIX][XCOFF] generate eh_info when vector registers are saved according to the traceback table. Summary: generate eh_info when vector registers are saved according to the traceback table. struct eh_info_t { unsigned version; /* EH info version 0 */ #if defined(__64BIT__) char _pad[4]; /* padding */ #endif unsigned long lsda; /* Pointer to Language Specific Data Area */ unsigned long personality; /* Pointer to the personality routine */ }; The values of lsda and personality are zero when the number of vector registers saved is larger than zero and the function has no personality routine. Reviewers: Jason Liu Differential Revision: https://reviews.llvm.org/D103651	2021-06-22 13:01:31 -04:00
Stanislav Mekhanoshin	d797a7f8da	[AMDGPU] Use performOptimizedStructLayout for LDS sort This gives better packing. Differential Revision: https://reviews.llvm.org/D104331	2021-06-22 09:58:10 -07:00
Fangrui Song	f53d791520	Improve the diagnostic of DiagnosticInfoResourceLimit (and warn-stack-size in particular) Before: `warning: stack size limit exceeded (888) in main` After: `warning: stack frame size (888) exceeds limit (100) in function 'main'` (the -Wframe-larger-than limit will be mentioned) Reviewed By: nickdesaulniers Differential Revision: https://reviews.llvm.org/D104667	2021-06-22 09:55:20 -07:00
Meera Nakrani	ea011ec5ed	[AArch64LoadStoreOptimizer] Recommit: Generate more STPs by renaming registers earlier This is a recommit that fixes unwanted STP generation by checking that the base register has not been modified or used elsewhere. Our initial motivating case was memcpy's with alignments > 16. The loads/stores, to which small memcpy's expand, are kept together in several places so that we get a sequence like this for a 64 bit copy: LD w0 LD w1 ST w0 ST w1 The load/store optimiser can generate a LDP/STP w0, w1 from this because the registers read/written are consecutive. In our case however, the sequence is optimised during ISel, resulting in: LD w0 ST w0 LD w0 ST w0 This instruction reordering allows reuse of registers. Since the registers are no longer consecutive (i.e. they are the same), it inhibits LDP/STP creation. The approach here is to perform renaming: LD w0 ST w0 LD w1 ST w1 to enable the folding of the stores into a STP. We do not yet generate the LDP due to a limitation in the renaming implementation, but plan to look at that in a follow-up so that we fully support this case. While this was initially motivated by certain memcpy's, this is a general approach and thus is beneficial for other cases too, as can be seen in some test changes. Differential Revision: https://reviews.llvm.org/D103597	2021-06-22 15:29:13 +00:00
Nick Desaulniers	8ace121305	[IR] convert warn-stack-size from module flag to fn attr Otherwise, this causes issues when building with LTO for object files that use different values. Link: https://github.com/ClangBuiltLinux/linux/issues/1395 Reviewed By: dblaikie, MaskRay Differential Revision: https://reviews.llvm.org/D104342	2021-06-21 15:09:25 -07:00
Eli Friedman	bf0d0671a1	[ARM] Make sure we don't transform unaligned store to stm on Thumb1. This isn't likely to come up in practice; the combination of compiler flags required to hit this issue should be rare. Found by inspection.	2021-06-21 14:32:42 -07:00
Jinsong Ji	3996311ee1	[DAGCombine] reassoc flag shouldn't enable contract According to IR LangRef, the FMF flag: contract Allow floating-point contraction (e.g. fusing a multiply followed by an addition into a fused multiply-and-add). reassoc Allow reassociation transformations for floating-point instructions. This may dramatically change results in floating-point. My understanding is that these two flags shouldn't imply each other, as we might have an SDNode that can be reassociated with others, but is not contractible. E.g.: we may want the following fmul/fadd/fsub to freely reassoc, but don't want an fma being generated here. %F = fmul reassoc double %A, %B ; <double> [#uses=1] %G = fmul reassoc double %C, %D ; <double> [#uses=1] %H = fadd reassoc double %F, %G ; <double> [#uses=1] %I = fsub reassoc double %H, %E ; <double> [#uses=1] Before https://reviews.llvm.org/D45710, the `reassoc` flag actually did not imply isContractable either. The current implementation also only checks the flag on the fadd node, ignoring the fmul node; this patch updates that as well. Reviewed By: spatel, qiucf Differential Revision: https://reviews.llvm.org/D104247	2021-06-21 21:15:43 +00:00
Craig Topper	9080659ac7	[RISCV] Add isel patterns to match vmacc/vmadd/vnmsub/vnmsac from add/sub and mul. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D104163	2021-06-21 11:27:44 -07:00
Matt Arsenault	4819cd162e	AMDGPU: Add missing tests for v_fma_mixlo	2021-06-21 10:58:53 -04:00
Sam Tebbs	bbe16b7af2	[ARM] Transform a fixed-point to floating-point conversion into a VCVT_fix Conversion from a fixed-point number to a floating-point number is done by multiplying the fixed-point number by 2^(-n) where n is the number of fractional bits. Currently this is lowered to a vcvt (integer to floating-point) then a vmul, but it can instead be lowered directly to a vcvt (fixed-point to floating-point). This patch enables such transformations as long as the multiplication factor is a power of 2. Differential Revision: https://reviews.llvm.org/D103903	2021-06-21 14:14:09 +01:00
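
The arithmetic behind the last entry above (bbe16b7af2) can be sketched in a few lines. This is only an illustration of the identity the commit message states, not the ARM lowering itself: the function names are hypothetical, and Python floats are doubles, whereas the patch deals with the vector types VCVT supports.

```python
import math


def fixed_to_float_two_step(raw: int, frac_bits: int) -> float:
    """Hypothetical helper: convert integer to float, then multiply by 2^(-n).

    This mirrors the "vcvt (integer to floating-point) then vmul" shape
    described in the commit message.
    """
    return float(raw) * (2.0 ** -frac_bits)


def fixed_to_float_direct(raw: int, frac_bits: int) -> float:
    """Hypothetical helper: scale by a power of two in a single step.

    math.ldexp(x, e) computes x * 2**e, standing in here for a direct
    fixed-point to floating-point conversion.
    """
    return math.ldexp(float(raw), -frac_bits)


# 0x180 with 8 fractional bits encodes 384 / 256.
value = fixed_to_float_two_step(0x180, 8)
same = fixed_to_float_direct(0x180, 8)
```

Because the scale factor is a power of two, the multiply only adjusts the exponent, which is why the two forms agree for these inputs and why the commit restricts the transform to power-of-2 factors.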

39354 Commits