llvm-project

Author	SHA1	Message	Date
Yeting Kuo	ab89cfd0f1	[RISCV] Use vwsll.vi/vx + vwaddu.wv to lower vector.interleave when Zvbb enabled. (#67521 ) The replacement could avoid an assignment to GPR when the type is vector of i8/i16 and vwmaccu.wv which may have higher cost than vwsll.vi/vx.	2023-09-28 09:10:03 +08:00
Luke Lau	bd675f5899	[RISCV] Reduce LMUL when index is known when lowering insert_vector_elt (#66087 ) Continuing on from #65997, if the index of insert_vector_elt is a constant then we can work out what the minimum number of registers will be needed for the slideup and choose a smaller type to operate on. This reduces the LMUL for not just the slideup but also for the scalar insert.	2023-09-27 20:26:36 +01:00
Craig Topper	8e6db7e2b2	[RISCV] Fix a crash from trying to truncate an FP type in lowerBuildV… (#67488 ) …ectorOfConstants. ComputeNumSignBits can return an answer for FP constants based on bitcasting them to int. Check for an integer type so we don't create an illegal truncate. We could support this case with bitcasts, but I leave that to a separate patch.	2023-09-27 12:21:10 -07:00
Luke Lau	5ffbdd9ed5	[RISCV] Handle .vx pseudos in hasAllNBitUsers (#67419 ) Vector pseudos with scalar operands only use the lower SEW bits (or less in the case of shifts and clips). This patch accounts for this in hasAllNBitUsers for both SDNodes in RISCVISelDAGToDAG. We also need to handle this in RISCVOptWInstrs otherwise we introduce slliw instructions that are less compressible than their original slli counterpart. This is a reland of aff6ffc8760b99cc3d66dd6e251a4f90040c0ab9 with the refactoring omitted.	2023-09-27 19:53:50 +01:00
Philip Reames	487dd5f1e3	Revert "[RISCV] Handle .vx pseudos in hasAllNBitUsers (#67419 )" This reverts commit aff6ffc8760b99cc3d66dd6e251a4f90040c0ab9. Version landed differs from version reviewed in (stylistic) manner worthy of separate review.	2023-09-27 11:24:49 -07:00
Fangrui Song	e705b37a77	[CodeLayout] Add unittest for cache-directed sort The function reordering algorithm added by https://reviews.llvm.org/D152834 and used by BOLT (https://reviews.llvm.org/D153039) is untested. Add some tests at the appropriate layer. Depends on D159526 Differential Revision: https://reviews.llvm.org/D159527	2023-09-27 10:52:12 -07:00
Luke Lau	aff6ffc876	[RISCV] Handle .vx pseudos in hasAllNBitUsers (#67419 ) Vector pseudos with scalar operands only use the lower SEW bits (or less in the case of shifts and clips). This patch accounts for this in hasAllNBitUsers for both SDNodes in RISCVISelDAGToDAG. We also need to handle this in RISCVOptWInstrs otherwise we introduce slliw instructions that are less compressible than their original slli counterpart.	2023-09-27 18:12:29 +01:00
Nick Desaulniers	97187e1278	[AArch64] update "rm" inline asm test (#67472 ) Because `x0` is not listed in the clobber list, regalloc could (one day when #20571 is fixed) allocate `$0` to `x0`: ldr x0, x0 This will produce an error when validating the instruction. The intent of this test FWICT is to check that the parameter in w0 is stored to a stack slot using w0, since this target triple is the exotic arm64_32 (ILP32). Update the test to simply use "m" constraint. The clobber list is underconstrained otherwise.	2023-09-27 08:30:36 -07:00
Momchil Velikov	eff4ef25b3	Revert "[AArch64] Enable "sink-and-fold" in MachineSink by default (#67432 )" This reverts commit ace20e24287bf531bb1185e213642c3b49eb293c. This might be causing a buildbot failure at https://green.lab.llvm.org/green/job/clang-stage1-RA/35786/	2023-09-27 14:24:59 +01:00
Nikita Popov	47b7f33b13	[IR] Allow llvm.ptrmask of vectors (#67434 ) llvm.ptrmask is currently limited to pointers only, and does not accept vectors of pointers. This is an unnecessary limitation, especially as the underlying instructions (getelementptr etc) do support vectors of pointers. We should relax this sooner rather than later, to avoid introducing code that assumes non-vectors (#67166).	2023-09-27 15:01:43 +02:00
Simon Pilgrim	57b0194b69	[X86] IsNOT - fold PCMPGT(C, X) -> PCMPGT(X,C-1) To invert the result, we can profitably commute a PCMPGT node if the LHS was a constant (C > min_signed_value): https://alive2.llvm.org/ce/z/LxcPqm Allows the constant to fold, and helps reduce register pressure Fixes #67347	2023-09-27 12:33:55 +01:00
Ivan Kosarev	be8b559956	[AMDGPU] Test codegen'ing True16 additions. The GlobalISel part is to be addressed later. Differential Revision: https://reviews.llvm.org/D156106	2023-09-27 11:10:48 +01:00
Ivan Kosarev	3ff7d51eb8	[AMDGPU][True16] Pre-commit addition tests. Differential Revision: https://reviews.llvm.org/D156529	2023-09-27 10:27:33 +01:00
Momchil Velikov	ace20e2428	[AArch64] Enable "sink-and-fold" in MachineSink by default (#67432 )	2023-09-27 10:05:32 +01:00
Sam McCall	0afbcb20fd	Revert "[NVPTX] Add support for maxclusterrank in launch_bounds (#66496 )" This reverts commit dfab31b41b4988b6dc8129840eba68f0c36c0f13. SemaDeclAttr.cpp cannot depend on Basic's private headers (lib/Basic/Targets/NVPTX.h)	2023-09-27 10:59:04 +02:00
Jianjian GUAN	5278cc364b	[RISCV] Support select/merge like ops for fp16 vectors when only have Zvfhmin This patch supports VP_MERGE, VP_SELECT, SELECT, SELECT_CC for fp16 vectors when only have Zvfhmin. Reviewed By: michaelmaitland Differential Revision: https://reviews.llvm.org/D159053	2023-09-27 14:53:14 +08:00
Jakub Chlanda	dfab31b41b	[NVPTX] Add support for maxclusterrank in launch_bounds (#66496 ) Since SM_90 CUDA supports specifying additional argument to the launch_bounds attribute: maxBlocksPerCluster, to express the maximum number of CTAs that can be part of the cluster. See: https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#cluster-dimension-directives-maxclusterrank and https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#launch-bounds for details.	2023-09-27 08:51:26 +02:00
Jim Lin	d6b1b998a0	[RISCV] Support fmaximum/fminimum for fp16 vector when only Zvfhmin enabled (#67393 ) This patch promotes fmaximum/fminimum for fp16 vector to float operation.	2023-09-27 13:02:35 +08:00
Jianjian Guan	435da4ef55	[RISCV] Promote SETCC and VP_SETCC of f16 vectors when only have zvfhmin (#66866 ) This patch implements the promotion of fp16 vectors SETCC and VP_SETCC when we only have zvfhmin but no zvfh.	2023-09-27 11:00:19 +08:00
Nick Desaulniers	35a364fa5c	[TargetLowering] fix index OOB (#67494 ) I accidentally introduced this in commit 330fa7d2a4e0 ("[TargetLowering] Deduplicate choosing InlineAsm constraint between ISels (#67057)") Fix forward.	2023-09-26 15:50:26 -07:00
Nick Desaulniers	2e2c61ebd7	[x86] precommit test conversion via update_llc_test_checks.py (#67463 ) I'm looking to update this test; pre-processing it with update_llc_test_checks.py makes it clearer what I'm changing in #20571.	2023-09-26 15:22:11 -07:00
Craig Topper	b26157edf0	[RISCV] Correct Zhinx load/store patterns to use AddrRegImm.	2023-09-26 11:54:59 -07:00
Craig Topper	9eebfa80f5	[RISCV] Autogenerate tests to add missing CHECK lines. NFC	2023-09-26 11:35:53 -07:00
Douglas Yung	6716d3dd77	Move test split-deadloop.mir that was added in e3d714f to AArch64 directory instead of ARM.	2023-09-26 09:51:47 -07:00
Jay Foad	e3d714f2cc	[AMDGPU] Add gfx1150 test coverage in trans-forwarding-hazards.mir This demonstrates that gfx1150 does not have FeatureVALUTransUseHazard.	2023-09-26 17:24:43 +01:00
David Green	b10721e941	[AArch64] A few extra rshrn intrinsic tests. NFC	2023-09-26 17:13:27 +01:00
Zhaoxuan Jiang	baf3903218	[AArch64] Bail out of HomogeneousPrologEpilog for functions with swif… (#67417 ) …tasync argument swiftasync introduces a number of frame adjustments which is incompatible with current implementation of HomogeneousPrologEpilog pass.	2023-09-26 08:42:01 -07:00
weiguozhi	31f81e96a4	[RA] Don't split a register generated from another split (#67351 ) Splitting a register generated from another split usually doesn't bring much benefit. It may also cause a dead loop, as pr67188 shows, if the heuristic cost always satisfies the split condition. So prevent such splitting. This fixes pr67188.	2023-09-26 08:38:18 -07:00
Philip Reames	e39add89cd	[RISCV] Transform build_vector((binop X_i, C_i)..) to binop (build_vector, build_vector) (#67358 ) If we have a build_vector of identical binops, we'd prefer to have a single vector binop in most cases. We do need to make sure that the two build_vectors aren't more difficult to materialize than the original build_vector. To start with, let's restrict ourselves to the case where one build_vector is a fully constant vector. Note that we don't need to worry about speculation safety here. We are not speculating any of the lanes, and thus none of the typical - e.g. div-by-zero - concerns apply. I'll highlight that the constant build_vector heuristic is just one we could choose here. We just need some way to be reasonably sure the cost of the two build_vectors isn't going to completely outweigh the savings from the binop formation. I'm open to alternate heuristics here - both more restrictive and more permissive. As noted in comments, we can extend this in a number of ways. I decided to start small as a) that helps keep things understandable in review and b) it covers my actual motivating case.	2023-09-26 07:53:35 -07:00
David Green	03647e2e4b	[AArch64] Handle scalable vectors in combineFMulOrFDivWithIntPow2. The transform will still not trigger as takeInexpensiveLog2 will bail out for any scalable vector, but this guards against a scalable typesize error.	2023-09-26 15:34:34 +01:00
Nathan Gauër	c01b5bbba3	[SPIRV] Add OpAccessChain instruction support (#66253 ) This commit adds 2 new instructions in the selector: - OpAccessChain - OpInBoundsAccessChain. The choice between the two relies on the `inbounds` marker. Those instructions are not used for OpenCL, to maintain the same behavior as previously. They are only added when building for logical SPIR-V, as it doesn't support the pointer equivalent. Because logical SPIR-V doesn't support pointer casts either, the assign_ptr_type intrinsic needs to be generated so OpAccessChain gets lowered with the correct pointer type, instead of i8*. Fixes #66107 --------- Signed-off-by: Nathan Gauër <brioche@google.com>	2023-09-26 16:33:17 +02:00
Ivan Kosarev	64482d5766	[AMDGPU] Fix passing CodeGen/AMDGPU/frem.ll on gfx1150. (#67425 ) We would currently crash on it trying to use t16 instructions instead of fake16 ones.	2023-09-26 15:13:23 +01:00
Ivan Kosarev	287f6cdd17	[AMDGPU] Remove the support for non-True16 copies between different register sizes. Differential Revision: https://reviews.llvm.org/D156985	2023-09-26 14:46:34 +01:00
Jingu Kang	ff68e43c81	[MachineLICM] Handle Subloops It is a re-commit from reverted commit 3454cf67bd0a650097dc6ca99874a34e1d59b500. Following discussion on https://reviews.llvm.org/D154205, make MachineLICM pass handle subloops with only visiting outermost loop's blocks once. Differential Revision: https://reviews.llvm.org/D154205	2023-09-26 14:25:11 +01:00
Momchil Velikov	fe763d8ad4	[AArch64] Limit immediate offsets when folding instructions into addressing modes (#67345 ) Don't increase/decrease immediate offsets in folded instructions beyond the limits of `LDP`.	2023-09-26 14:21:32 +01:00
Muhammad Omair Javaid	431969ede1	Revert "[SimplifyCFG] Transform for redirecting phis between unmergeable BB and SuccBB (#67275 )" This reverts commit fc86d031fec5e47c6811efd3a871742ad244afdd. This change breaks LLVM buildbot clang-aarch64-sve-vls-2stage https://lab.llvm.org/buildbot/#/builders/176/builds/5474 I am going to revert this patch as the bot has been failing for more than a day without a fix.	2023-09-26 15:47:16 +05:00
esmeyi	d7195c57d8	Reland https://reviews.llvm.org/D159073 . The patch failed in test-suite due to a liveness error after rebasing on https://reviews.llvm.org/D133103, and now it's fixed. ``` [PowerPC][Peephole] Combine rldicl/rldicr and andi/andis after isel. Summary: rldicl/rldicr can be eliminated if it's used to clear the high-order or low-order n bits and all bits cleared will be ANDed with 0 by andi/andis. Or they can be folded to `andi 0` if all bits to AND are already zero in the input. Reviewed By: qiucf, shchenz Differential Revision: https://reviews.llvm.org/D159073 ```	2023-09-26 06:24:47 -04:00
David Green	cab01a8b49	[AArch64] Additional testing for i128 and non-temporal loads/stores undef BE. NFC	2023-09-26 11:01:48 +01:00
Jay Foad	d85d143ad9	[AMDGPU] New image intrinsic optimizer pass (#67151 ) Implement a new pass to combine multiple image_load_2dmsaa and 2darraymsaa intrinsic calls into a single image_msaa_load if: - they refer to the same vaddr except for sample_id, - they use a constant sample_id and they fall into the same group, - they have the same dmask and the number of instructions and the number of vaddr/vdata dword transfers is reduced by the combine This should be valid on all GFX11 but a hardware bug renders it unworkable on GFX11.0.* so it is only enabled for GFX11.5. Based on a patch by Rodrigo Dominguez!	2023-09-26 09:33:49 +01:00
Kai Luo	5fabc8ba22	[PowerPC] Add test to show wrong target flags printed at MO_TLSGDM_FLAG operand. NFC.	2023-09-26 05:13:26 +00:00
Wang Pengcheng	08165c444e	[RISCV] Add searchable table for tune information (#66193 ) There is a lot of information that can be used for tuning, like alignments, cache line size, etc. But we can't make all of them `SubtargetFeature` because some of them do not have enumerable values, for example, `PrefetchDistance` used by `LoopDataPrefetch`. In this patch, a searchable table `RISCVTuneInfoTable` is added, in which each entry contains the CPU name and all tune information defined in `RISCVTuneInfo`. Each field of `RISCVTuneInfo` should have a default value and processor definitions can override the default value via `let` statements. We don't need to define a `RISCVTuneInfo` for each processor and it will use the default value (which is for `generic`) if no `RISCVTuneInfo` is defined. For processors in the same series, a subclass can inherit from `RISCVTuneInfo` and override the fields. And we can also override the fields in processor definitions if there are some differences in the same processor series. When initializing `RISCVSubtarget`, we will use `TuneCPU` as the key to search the tune info table. So, the behavior here is if we don't specify the tune CPU, we will use the specified `CPU`, which is expected I think. This patch almost undoes 61ab106, in which I added tune features of preferred function/loop alignments. More tune information can be added in the future.	2023-09-26 12:26:35 +08:00
WANG Rui	6417ce4336	[LoongArch] Improve codegen for i8/i16 'atomicrmw xchg a, {0,-1}' Similar to D156801 for RISCV. Link: https://github.com/rust-lang/rust/pull/114034 Link: https://github.com/llvm/llvm-project/issues/64090 Reviewed By: SixWeining, xen0n Differential Revision: https://reviews.llvm.org/D159252	2023-09-26 11:46:07 +08:00
WANG Rui	555e2397aa	[LoongArch] Add test cases for atomicrmw xchg {0,-1} {i8,i16} Add test cases for atomicrmw xchg {0,-1} {i8,i16}. Reviewed By: SixWeining Differential Revision: https://reviews.llvm.org/D159251	2023-09-26 11:46:06 +08:00
esmeyi	77147a95b8	Revert "[PowerPC][Peephole] Combine rldicl/rldicr and andi/andis after isel." This reverts commit 2de74e1bd4d540063d7495fa6254781abd41e179. A test-suite failure occurs due to this commit, will fix soon.	2023-09-25 23:31:34 -04:00
esmeyi	2de74e1bd4	[PowerPC][Peephole] Combine rldicl/rldicr and andi/andis after isel. Summary: rldicl/rldicr can be eliminated if it's used to clear the high-order or low-order n bits and all bits cleared will be ANDed with 0 by andi/andis. Or they can be folded to `andi 0` if all bits to AND are already zero in the input. Reviewed By: qiucf, shchenz Differential Revision: https://reviews.llvm.org/D159073	2023-09-25 23:11:34 -04:00
Jim Lin	5e1f5f4720	[RISCV] Fix the float value to test constantpool lowering under differe… (#67297 ) After https://reviews.llvm.org/D142953, the float value 1.0 can be optimized as lui+fmv.w.x. But this test aims to test the constantpool lowering under different code models. Change the float value to one that cannot be optimized to lui+fmv.w.x.	2023-09-26 09:06:49 +08:00
Min-Yih Hsu	de17384c05	[RISCV][GISel] Add RegBank selection for G_SMULH (#67381 ) Along with its missing tests in instruction selection and legalizer.	2023-09-25 17:59:23 -07:00
Min-Yih Hsu	0d7c340c2c	[RISCV][GISel] Add instruction selection for G_SEXT, G_ZEXT, and G_SEXT_INREG (#67359 ) G_SEXT and G_ZEXT are supported via patterns imported from SDISel; G_SEXT_INREG is selected using hand-written code as there is no (functional) rule at this moment to import G_SEXT_INREG from ISD::SEXT_INREG. Credit helps from @topperc on G_SEXT and G_ZEXT.	2023-09-25 15:08:21 -07:00
Artem Belevich	671e2ba45b	[NVPTX] Improve lowering of v2i16 logical ops. (#67365 ) Bitwise logical ops can always be done as b32, regardless of availability of other v2i16 ops, that would need a new GPU. Includes the missing lowering for 2-argument register operation variants and additional tests for `and`.	2023-09-25 14:29:48 -07:00
Craig Topper	62f5636838	[RISCV] Don't set KILL flag on X0 in RISCVInstrInfo::movImm. Extracted from #67159.	2023-09-25 13:40:08 -07:00
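
The X86 entry 57b0194b69 above relies on the integer identity NOT(C > X) == (X > C-1), which holds only when C is greater than the minimum signed value. The following is a minimal sketch that checks the identity exhaustively for 8-bit signed lanes. It is plain Python, not LLVM code, and the helper names (`pcmpgt`, `invert`, `wrap8`, `fold_holds`) are hypothetical and chosen only for illustration.

```python
INT8_MIN, INT8_MAX = -128, 127

def wrap8(value: int) -> int:
    """Wrap an integer into the signed 8-bit range, like lane arithmetic."""
    return ((value + 128) % 256) - 128

def pcmpgt(a: int, b: int) -> int:
    """Model one lane of a signed compare-greater: all ones if a > b."""
    return 0xFF if a > b else 0x00

def invert(mask: int) -> int:
    """Bitwise NOT of an 8-bit lane mask."""
    return mask ^ 0xFF

def fold_holds(c: int) -> bool:
    """True if NOT(PCMPGT(c, x)) == PCMPGT(x, c - 1) for every 8-bit x."""
    c_minus_1 = wrap8(c - 1)  # wraps to INT8_MAX when c == INT8_MIN
    return all(
        invert(pcmpgt(c, x)) == pcmpgt(x, c_minus_1)
        for x in range(INT8_MIN, INT8_MAX + 1)
    )

# The identity holds for every constant above the minimum signed value...
holds_above_min = all(fold_holds(c) for c in range(INT8_MIN + 1, INT8_MAX + 1))
# ...and fails at the minimum, where c - 1 wraps around.
holds_at_min = fold_holds(INT8_MIN)
```

The failing case at `INT8_MIN` is why the commit message states the precondition C > min_signed_value: there, C-1 wraps to the maximum value and the commuted compare is never true.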


52796 Commits