llvm-project

Author	SHA1	Message	Date
Amaury Séchet	7b54626bac	[NFC] Fix indentation in addcarry.ll	2023-07-05 16:45:14 +00:00
Luke Lau	60be17a685	[RISCV] Add VFCVT pseudos with no mask When emitting a vfcvt with a rounding mode, we end up generating an unnecessary vmset because the only rounding mode pseudos have a mask operand. This patch adds a pseudo without a mask, and marks the masked variant with the MaskedPseudo class so the doPeepholeMergeVMV optimisation knows to remove the redundant vmset. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D154266	2023-07-05 17:28:43 +01:00
Simon Pilgrim	38721f29f8	[X86] ComputeNumSignBitsForTargetNode - attempt to recognise PACKSSDW(PACKSSDW(X,Y),PACKSSDW(Z,W)) patterns These are often used when we're packing vXi64 comparison results, but we don't have PACKSSQD so have to bitcast, which doesn't work well with num sign bits value tracking.	2023-07-05 16:43:43 +01:00
Simon Pilgrim	a32d14fd4c	[X86] Fold BITOP(PACKSS(X,Z),PACKSS(Y,W)) --> PACKSS(BITOP(X,Y),BITOP(Z,W)) Fold allsignbits pack patterns to make better use of cheap (and commutable) logic ops	2023-07-05 16:43:43 +01:00
David Green	ae8f929b93	[AArch64] Use known zero bits when creating BIC If we know bits are already 0, we will not need to clear them again with a BIC. So we can use KnownBits to shrink the size of the constant when creating a BIC from an AND, potentially undoing the known-bits folds that happen during compilation. BIC only has a single register operand for input and output, so has less scheduling freedom than an AND, but usually saves the materialization of a constant. Differential Revision: https://reviews.llvm.org/D154217	2023-07-05 15:42:33 +01:00
David Green	86bd9a420f	[AArch64] Additional tests for creating BIC from known bits. NFC	2023-07-05 15:42:33 +01:00
Alex Bradbury	2ae71f541e	[RISCV][test] Add commented out f128 test for llvm.frexp.ll This represents the crash reported in <https://github.com/llvm/llvm-project/issues/63661>	2023-07-05 11:18:51 +01:00
Alex Bradbury	7de4c6f8d9	[RISCV][test] Add test coverage for llvm.frexp.. intrinsics Reapply - the issue was that the `< %s` was missing in the RUN lines, which didn't impact update_llc_test_checks but of course caused issues for lit. The test file is copied from X86 (which is also mostly shared with Arm, PowerPC) rather than integrated into float-intrinsics.ll and double-intrinsics.ll. There's currently a compiler crash for the soft float cases (expect this is the issue in <https://github.com/llvm/llvm-project/issues/63661>) which will be addressed with a follow-on patch posted for review.	2023-07-05 10:40:39 +01:00
Freddy Ye	7717c0071d	[X86] Remove CPU_SPECIFIC* MACROs and add getCPUDispatchMangling This refactor patch means to remove CPU_SPECIFIC* MACROs in X86TargetParser.def and move that information into ProcInfo of X86TargetParser.cpp, since these two files both maintain a table with redundant info such as cpuname and its features supported. CPU_SPECIFIC* MACROs define some different information. This patch dealt with them in these ways when moving: 1. mangling This is now moved to Mangling in ProcInfo and directly initialized at array of Processors. CPUs that don't support cpu_dispatch/specific are assigned '\0' as mangling. 2. CPU alias The alias cpu will also be initialized in array of Processors, its attributes will be same as its alias target cpu. Same feature list, same mangling. 3. TUNE_NAME Before my change, some cpu names that support cpu_dispatch/specific are not supported in X86.td, which means optimizer/backend doesn't recognize them. So they use a different TUNE_NAME to generate in IR. In this patch, I added these missing cpu support at X86.td by utilizing existing Features and XXXTunings, so that each cpu name can directly use its own name as TUNE_NAME to be supported by optimizer/backend. 4. Feature list The feature list of one CPU maintained in X86TargetParser.def is not same as the one in X86TargetParser.cpp. It only maintains part of features of one CPU (features defined by X86_FEATURE_COMPAT), while X86TargetParser.cpp maintains a complete one. This patch abandons the feature list maintained by CPU_SPECIFIC* MACROs because assigning a CPU with a complete one doesn't affect the functionality of cpu_dispatch/specific. Except these four info, since some of CPUs supported by cpu_dispatch/specific don't support clang options like -march, -mtune before, this patch also kept this behavior still by adding another member OnlyForCPUDispatchSpecific in ProcInfo. Reviewed By: pengfei, RKSimon Differential Revision: https://reviews.llvm.org/D151696	2023-07-05 17:32:00 +08:00
Alex Bradbury	80c5698ec3	Revert "[RISCV][test] Add test coverage for llvm.frexp.. intrinsics" Reverting due to weird failure. This reverts commit 4b8162fe9ccb68b5b42f683df8df42ed43bfd5e7.	2023-07-05 10:29:06 +01:00
Alex Bradbury	4b8162fe9c	[RISCV][test] Add test coverage for llvm.frexp.. intrinsics The test file is copied from X86 (which is also mostly shared with Arm, PowerPC) rather than integrated into float-intrinsics.ll and double-intrinsics.ll. There's currently a compiler crash for the soft float cases (expect this is the issue in <https://github.com/llvm/llvm-project/issues/63661>) which will be addressed with a follow-on patch posted for review.	2023-07-05 10:24:30 +01:00
esmeyi	2d74cf1f24	[XCOFF] Force recording a relocation for weak symbol label. Summary: Currently, if there are multiple definitions of the same symbol declared with weak linkage, the linker may choose the wrong one when they are compiled with integrated-as. This patch fixes the issue. If the target symbol is a weak label we must not attempt to resolve the fixup directly. Emit a relocation and leave resolution of the final target address to the linker. Reviewed By: shchenz Differential Revision: https://reviews.llvm.org/D153839	2023-07-05 01:58:18 -04:00
Lei Huang	c7c3d71414	[PowerPC] add testcase for vector add and shift	2023-07-04 10:45:19 -04:00
Stephen Thomas	2dfb4b56fe	[AMDGPU] Fix incorrect hazard mitigation GCNHazardRecognizer::fixVcmpxExecWARHazard() mitigates a specific hazard by inserting a wait on sa_sdst==0 if such a wait isn't already present. Unfortunately, the check for an existing wait incorrectly checks for one that doesn't actually care about sa_sdst itself, but requires that no other counters are waited for. Once the check is performed correctly, a lit test needs to be updated, since it is currently testing for the incorrect behaviour. Differential Revision: https://reviews.llvm.org/D154438	2023-07-04 14:42:51 +01:00
Jay Foad	f2c164c815	[AMDGPU] Do not wait for vscnt on function entry and return SIInsertWaitcnts inserts waitcnt instructions to resolve data dependencies. The GFX10+ vscnt (VMEM store count) counter is never used in this way. It is only used to resolve memory dependencies, and that is handled by SIMemoryLegalizer. Hence there is no need to conservatively wait for vscnt to be 0 on function entry and before returns. Differential Revision: https://reviews.llvm.org/D153537	2023-07-04 12:22:38 +01:00
Ties Stuij	d145abcfb3	[ARM] fix typo in large-stack.ll introduced when fixing another typo	2023-07-04 11:23:24 +01:00
Ties Stuij	61bcaae7ab	[ARM] fix typo in large-stack.ll test In llvm/test/CodeGen/ARM/large-stack.ll, the C in FileCheck wasn't uppercased. This wasn't spotted in development as MacOS's HFS+ fs is apparently often configured case-insensitive.	2023-07-04 11:18:25 +01:00
Ties Stuij	1f082d2da0	[ARM] make execute only long call test checks more robust Reviewed By: olista01 Differential Revision: https://reviews.llvm.org/D154355	2023-07-04 10:51:48 +01:00
Harvin Iriawan	c35d2071d8	[AArch64] NFC : Change the way SVE pseudos are appended * SVE pseudos don't pick up the right latency information during MI scheduling as the regex do not match with instruction name. * Move UNDEF, PSEUDO, and ZERO to the end of actual SVE instruction * Some CPUs *td files will be fixed in the next commit Differential Revision: https://reviews.llvm.org/D154232	2023-07-04 10:41:56 +01:00
Ties Stuij	112d769e5e	[ARM] generate correct code for armv6-m XO big stack operations The ARM backend codebase is dotted with places where armv6-m will generate constant pools. Now that we can generate execute-only code for armv6-m, we need to make sure we use the movs/lsls/adds/lsls/adds/lsls/adds pattern instead of these. Big stacks is one of the obvious places. In this patch we take care of two sites: 1. take care of big stacks in prologue/epilogue 2. take care of save/tSTRspi nodes, which implicitly fixes emitThumbRegPlusImmInReg which is used in several frame lowering fns Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D154233	2023-07-04 10:40:06 +01:00
Igor Kirillov	e13582e9e3	[CodeGen] Precommit tests for D153355 Differential Revision: https://reviews.llvm.org/D153856	2023-07-04 09:29:38 +00:00
Ben Shi	3e6b80b1bd	[CSKY] Optimize conditional select with CLRT/CLRF Reviewed By: zixuan-wu Differential Revision: https://reviews.llvm.org/D154409	2023-07-04 15:22:18 +08:00
Ben Shi	ef53ec969b	[CSKY][test][NFC] Add more tests of conditional select Reviewed By: zixuan-wu Differential Revision: https://reviews.llvm.org/D154408	2023-07-04 15:22:18 +08:00
Paul Walker	c9eec3b085	[SVE] Extend incp/decp testing to cover 32-bit use cases.	2023-07-03 15:36:56 +01:00
David Spickett	ab3bb86d44	Revert "[ARM] Adjust strd/ldrd codegen alignment requirements" This reverts commit 92a9c30c61da7f973d55cd84fade424159b9cac9. This has caused a test failure in the 2nd stage of Linaro's Arm 32 bit buildbots. LLVM::simplified-template-names.s 7: error: Simplified template DW_AT_name could not be reconstituted: 8: original: f3<unsigned char, (unsigned char)'\x00'> 9: reconstituted: f3<unsigned char, (unsigned char)'\x7f'> I suspect a load/store is slightly off.	2023-07-03 14:05:49 +00:00
Aleksandr Popov	22f2173837	[AArch64] Add PredictableSelectIsExpensive feature to all the cpus that have FeatureEnableSelectOptimize The select optimize pass was enabled for AArch64 in https://reviews.llvm.org/D138990. We were doing some benchmarking on the Neoverse V1 and were experimenting with select optimize heuristics. We found that some additional profitable transformations to predictable branches (with prediction rate > 75% according to Agner Fog's rule of thumb) can be done by the base heuristic from the SelectOptimize pass or by optimizeSelectInst from the CodeGenPrepare pass. But they are blocked on the Neoverse V1, since the PredictableSelectIsExpensive feature is not set for that subtarget. Note that to achieve these results we also changed the predictable branch threshold from 99% to 75%. It looks like it makes sense to add this feature to all targets where the select optimize pass was enabled in https://reviews.llvm.org/D138990. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D143162	2023-07-02 09:23:43 +02:00
Wang, Xin10	f64e11369f	[X86]Precommit test cases for D154193 Add mir test cases for D154193, which tend to remove test16rr in possible and32ri+test16rr, similar to what we did for and32*+test64rr. Reviewed By: pengfei Differential Revision: https://reviews.llvm.org/D154322	2023-07-03 04:43:50 -04:00
Ben Shi	d53063c3e2	[CSKY] Optimize conditional branch with BLZ32/BLSZ32/BHZ32/BHSZ32 Add more `Pat`s to generate BLZ32/BLSZ32/BHZ32/BHSZ32. Reviewed By: zixuan-wu Differential Revision: https://reviews.llvm.org/D153607	2023-07-03 15:03:43 +08:00
Ben Shi	86829d15f4	[CSKY] Optimize IR pattern icmp-select with DECT32/DECF32 Reviewed By: zixuan-wu Differential Revision: https://reviews.llvm.org/D153518	2023-07-03 15:03:43 +08:00
Maurice Heumann	92a9c30c61	[ARM] Adjust strd/ldrd codegen alignment requirements In change https://reviews.llvm.org/D152790, it was discovered that the alignment requirement calculation for LDRD/STRD codegen was suboptimal and the calculation for volatile loads and stores was adjusted. This change here adopts the calculation for the remaining non-volatile occurrences. Differential Revision: https://reviews.llvm.org/D153800	2023-07-02 14:25:25 -07:00
David Green	f55d96b9a2	[DAG][AArch64] Handle vector types when expanding sdiv/udiv into mulh The aarch64 backend will benefit from expanding 64vector sdiv/udiv into mulh using shift(mul(ext, ext)), as the larger type size is legal and the mul(ext, ext) can efficiently use smull/umull instructions. This extends the existing code in GetMULHS to handle vector types for it. Differential Revision: https://reviews.llvm.org/D154049	2023-07-02 15:02:52 +01:00
Simon Pilgrim	8269fd2db5	[GlobalIsel][X86] Add initial scalar G_MUL/G_SMULH/G_UMULH instruction selection handling Reuse the existing div/rem selection code to also handle mul/imul to support G_MUL/G_SMULH/G_UMULH, as they have a similar pattern using rDX/rAX for mulh/mul results, plus the AH/AL support for i8 multiplies.	2023-07-02 12:56:41 +01:00
David Green	878e498f05	[AArch64] Expand typesizes of tests for constant srem/urem. NFC See D154049.	2023-07-02 12:44:15 +01:00
Igor Kudrin	6e54fccede	[AArch64] Emit fewer CFI instructions for synchronous unwind tables The instruction-precise, or asynchronous, unwind tables usually take up much more space than the synchronous ones. If a user is concerned about the load size of the program and does not need the features provided with the asynchronous tables, the compiler should be able to generate the more compact variant. This patch changes the generation of CFI instructions for these cases so that they all come in one chunk in the prolog; it emits only one `.cfi_def_cfa*` instruction followed by `.cfi_offset` ones after all stack adjustments and register spills, and avoids generating CFI instructions in the epilog(s) as well as any other exceeding CFI instructions like `.cfi_remember_state` and `.cfi_restore_state`. Effectively, it reverses the effects of D111411 and D114545 on functions with the `uwtable(sync)` attribute. As a side effect, it also restores the behavior on functions that have neither `uwtable` nor `nounwind` attributes. Differential Revision: https://reviews.llvm.org/D153098	2023-07-01 16:31:09 -07:00
Evandro Menezes	6a5da11b87	[AArch64] Add scheduling model for Neoverse N1 Add the scheduling model for Neoverse N1. Differential revision: https://reviews.llvm.org/D152417	2023-07-01 12:35:22 -05:00
Luke Lau	e8e0f32958	[RISCV] Fix vfwcvt/vfncvt pseudos w/ rounding mode lowering Some signed opcodes were being lowered to their unsigned counterparts and vice-versa. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D154234	2023-06-30 21:43:19 +01:00
Jeffrey Sandoval	6555c47448	[OpenMP][NVPTX] Handle additional invalid PTX characters For OpenMP offload, Clang emits global symbols containing the string '<captured>', which contains characters that are invalid in PTX. Extend the existing pass that replaces '.' and '@' characters with '_$_' to also replace '<' and '>' characters. Reviewed By: cchen Differential Revision: https://reviews.llvm.org/D154241	2023-06-30 14:58:37 -05:00
Nikita Popov	bb3763e497	Revert "[SimplifyCFG] Allow dropping block that only contains ephemeral values" This reverts commit 20f0c68fd83a0147a8ec1722bd2e848180610288. https://reviews.llvm.org/D153966#4464594 reports an optimization regression in Rust. Additionally this change has caused an unexpected 0.3% compile-time regression.	2023-06-30 21:24:05 +02:00
Matt Arsenault	8f9eee3602	AMDGPU: Fix opaque pointer conversion error in test The * was in the wrong place so this was missed by the script.	2023-06-30 15:04:03 -04:00
Thomas Lively	4f065fcb57	[WebAssembly] Fix incorrect assertion in SIMD reduction codegen The codegen routine introduced in 18077e9fd688 did not account for vectors with more than 16 lanes. Remove the incorrect assertion and bail out of the optimization when encountering this case. Add test cases that previously triggered the assertion. Unfortunately, these test cases now have terrible codegen, but that is at least better than crashing. Fixes #63500. Differential Revision: https://reviews.llvm.org/D154124	2023-06-30 11:30:18 -07:00
Simon Pilgrim	5e2f0947c5	[X86] Add common SSE2/SSSE3 check prefix to vector truncation tests Reduced duplicate checks where we can	2023-06-30 18:49:26 +01:00
Matt Arsenault	53acadafdd	Verifier: Verify absolute_symbol metadata This is the same as !range except for one edge case.	2023-06-30 12:31:32 -04:00
Fangrui Song	afd20587f9	MachineFunction: -fsanitize={function,kcfi}: ensure 4-byte alignment Fix https://github.com/llvm/llvm-project/issues/63579 ``` % cat a.c void foo() {} % clang --target=arm-none-eabi -mthumb -mno-unaligned-access -fsanitize=kcfi a.c -S -o - | grep p2align .p2align 1 % clang --target=armv6m-none-eabi -fsanitize=function a.c -S -o - | grep p2align .p2align 1 ``` Ensure that -fsanitize={function,kcfi} instrumented functions are aligned by at least 4, so that loading the type hash before the function label will not cause a misaligned access. This is especially important for -mno-unaligned-access configurations that don't set `setMinFunctionAlignment` to 4 or greater. With this patch, the generated assembly for the examples above will contain `.p2align 2` before the type hash. If `__attribute__((aligned(N)))` or `-falign-functions=N` is specified, the larger alignment will be used. Reviewed By: simon_tatham, samitolvanen Differential Revision: https://reviews.llvm.org/D154125	2023-06-30 09:13:19 -07:00
Alex Bradbury	5ba40c7be3	[RISCV] Custom lower FP_TO_FP16 and FP16_TO_FP to correct ABI of libcall As introduced in D99148, RISC-V uses the softPromoteHalf legalisation for fp16 values without zfh, with logic ensuring that f16 values are passed in lower bits of FPRs (see D98670) when F or D support is present. This legalisation produces ISD::FP_TO_FP16 and ISD::FP16_TO_FP nodes which (as described in ISDOpcodes.h) provide a "semi-softened interface for dealing with f16 (as an i16)". i.e. the return type of the FP_TO_FP16 is an integer rather than a float (and the arg of FP16_TO_FP is an integer). The remainder of the description focuses primarily on FP_TO_FP16 for ease of explanation. FP_TO_FP16 is lowered to a libcall to `__truncsfhf2 (float)` or `__truncdfhf2 (double)`. As of D92241, `_Float16` is used as the return type of these libcalls if the host compiler accepts `_Float16` in a test input (i.e. dst_t is set to `_Float16`). `_Float16` is enabled for the RISC-V target as of D105001 and so the return value should be passed in an FPR on hard float ABIs. This patch fixes the ABI issue in what appears to be a minimally invasive way - leaving the softPromoteHalf logic undisturbed, and lowering FP_TO_FP16 to an f32-returning libcall, converting its result to an XLen integer value. As can be seen in the test changes, the custom lowering for FP16_TO_FP means the libcall is no longer tail-callable. Although this patch fixes the issue, there are two open items: * Redundant fmv.x.w and fmv.w.x pairs are now sometimes produced during lowering (not a correctness issue). * No coverage for STRICT variants of FP16 conversion opcodes. Differential Revision: https://reviews.llvm.org/D151284	2023-06-30 16:41:49 +01:00
Alex Bradbury	ee5aaa8e6c	[RISCV][test] Add additional RUN lines to half-convert.ll in preparation for D151824 There wasn't previous coverage for rv32id-ilp32, rv64id-lp64, rv32id-ilp32d, or rv64id-lp64d. This is needed as D151284 fixes a bug related to the ABI used for libcalls for fp<->fp16 conversion when hard FP support is present.	2023-06-30 16:41:49 +01:00
Ben Shi	8099d6c20b	[CSKY] Optimize IR pattern icmp-select with INCT32/INCF32 Reviewed By: zixuan-wu Differential Revision: https://reviews.llvm.org/D153436	2023-06-30 22:55:25 +08:00
Ben Shi	6d254a25cb	[CSKY][test][NFC] Add tests of IR pattern icmp-select These tests will be optimized with INCT32/INCF32/DECT32/DECF32 in the future. Reviewed By: zixuan-wu Differential Revision: https://reviews.llvm.org/D153434	2023-06-30 22:55:24 +08:00
Simon Pilgrim	4742715eb7	[DAG] Fold (ext (_extend_vector_inreg x)) -> (*_extend_vector_inreg x)	2023-06-30 14:42:49 +01:00
Nikita Popov	20f0c68fd8	[SimplifyCFG] Allow dropping block that only contains ephemeral values Perform the TryToSimplifyUncondBranchFromEmptyBlock() transform if the block is empty except for ephemeral values. The ephemeral values will be dropped in that case. This makes sure that assumes don't block this transforms, as reported in https://discourse.llvm.org/t/llvm-assume-blocks-optimization/71609. Differential Revision: https://reviews.llvm.org/D153966	2023-06-30 15:24:01 +02:00
Phoebe Wang	8d0fecd34a	[X86][FP16] Pre-commit test to show a mis-combination	2023-06-30 21:08:15 +08:00

52796 Commits