llvm-project

Author	SHA1	Message	Date
Amaury Séchet	015323ff9b	[NFC] Autogenerate CodeGen/SPARC/LeonInsertNOPLoadPassUT.ll	2023-06-15 13:24:39 +00:00
Paul Walker	31c485c990	[AArch64CompressJumpTables] Prevent over-compression caused by invalid alignment. AArch64CompressJumpTables assumes it can calculate exact block offsets. This assumption is bogus because getInstSizeInBytes() only returns an upper bound rather than an exact size. The assumption is also invalid when a block alignment is bigger than the function's alignment. To mitigate both scenarios this patch changes the algorithm to compute the maximum upper bound for all block offsets. This is pessimistic but safe because all offsets are treated as unsigned. Differential Revision: https://reviews.llvm.org/D150009	2023-06-15 12:38:20 +00:00
Vladislav Dzhidzhoev	a7e7d34dc1	Revert "[DebugMetadata][DwarfDebug] Fix DWARF emisson of function-local imported entities (3/7)" This reverts commit d04452d54829cd7af5b43d670325ffa755ab0030 since test llvm-project/llvm/test/Bitcode/DIImportedEntity_backward.ll is broken.	2023-06-15 14:35:54 +02:00
Vladislav Dzhidzhoev	d04452d548	[DebugMetadata][DwarfDebug] Fix DWARF emisson of function-local imported entities (3/7) RFC https://discourse.llvm.org/t/rfc-dwarfdebug-fix-and-improve-handling-imported-entities-types-and-static-local-in-subprogram-and-lexical-block-scopes/68544 Fixed PR51501 (tests from D112337). 1. Reuse of DISubprogram's 'retainedNodes' to track other function-local entities together with local variables and labels (this patch cares about function-local import while D144006 and D144008 use the same approach for local types and static variables). So, effectively this patch moves ownership of tracking local import from DICompileUnit's 'imports' field to DISubprogram's 'retainedNodes' and adjusts DWARF emitter for the new layout. The old layout is considered unsupported (DwarfDebug would assert on such debug metadata). DICompileUnit's 'imports' field is supposed to track global imported declarations as it does before. This addresses various FIXMEs and simplifies the next part of the patch. 2. Postpone emission of function-local imported entities from `DwarfDebug::endFunctionImpl()` to `DwarfDebug::endModule()`. While in `DwarfDebug::endFunctionImpl()` we do not have all the information about a parent subprogram or a referring subprogram (whether a subprogram inlined or not), so we can't guarantee we emit an imported entity correctly and place it in a proper subprogram tree. So now, we just gather needed details about the import itself and its parent entity (either a Subprogram or a LexicalBlock) during processing in `DwarfDebug::endFunctionImpl()`, but all the real work is done in `DwarfDebug::endModule()` when we have all the required information to make proper emission. Authored-by: Kristina Bessonova <kbessonova@accesssoftek.com> Differential Revision: https://reviews.llvm.org/D144004	2023-06-15 14:29:03 +02:00
Nikita Popov	03de1cb715	[InstCombine][CGP] Move swapMayExposeCSEOpportunities() fold InstCombine tries to swap compare operands to match sub instructions in order to expose "CSE opportunities". However, it doesn't really make sense to perform this transform in the middle-end, as we cannot actually CSE the instructions there. The backend already performs this fold in `18f5446a45/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp (L4236)` on the SDAG level, however this only works within a single basic block. To handle cross-BB cases, we do need to handle this in the IR layer. This patch moves the fold from InstCombine to CGP in the backend, while keeping the same (somewhat dubious) heuristic. Differential Revision: https://reviews.llvm.org/D152541	2023-06-15 14:17:58 +02:00
Matt Arsenault	28f3edd2be	AMDGPU: Add llvm.amdgcn.exp2 intrinsic Provide direct access to v_exp_f32 and v_exp_f16, so we can start correctly lowering the generic exp intrinsics. Unfortunately have to break from the usual naming convention of matching the instruction name and stripping the v_ prefix. exp is already taken by the export intrinsic. On the clang builtin side, we have a choice of maintaining the convention to the instruction name, or following the intrinsic name.	2023-06-15 07:00:07 -04:00
Ivan Kosarev	9aa026e9ff	[AMDGPU][GFX11] Add test coverage for 16-bit conversions, part 9. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D152902	2023-06-15 11:02:08 +01:00
Ivan Kosarev	9792c804f6	[AMDGPU][GFX11] Add test coverage for 16-bit conversions, part 8. Reviewed By: Joe_Nash Differential Revision: https://reviews.llvm.org/D152809	2023-06-15 10:47:04 +01:00
Ivan Kosarev	7680951ac8	[AMDGPU][GFX11] Add test coverage for 16-bit conversions, part 7. Reviewed By: Joe_Nash Differential Revision: https://reviews.llvm.org/D152808	2023-06-15 10:40:58 +01:00
Ivan Kosarev	c2887096f3	[AMDGPU][GFX11] Add test coverage for 16-bit conversions, part 6. Reviewed By: Joe_Nash Differential Revision: https://reviews.llvm.org/D152807	2023-06-15 10:39:31 +01:00
Nikita Popov	3210cc9a88	[X86] Add test for icmp/sub operand order across blocks (NFC)	2023-06-15 11:34:20 +02:00
Ivan Kosarev	79c8301478	[AMDGPU][GFX11] Add test coverage for 16-bit conversions, part 5. Reviewed By: Joe_Nash Differential Revision: https://reviews.llvm.org/D152805	2023-06-15 10:28:16 +01:00
Ivan Kosarev	980d2b337e	[AMDGPU][GFX11] Add test coverage for 16-bit conversions, part 4. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D152717	2023-06-15 10:26:41 +01:00
Ivan Kosarev	e9d77cd9b2	[AMDGPU][GFX11] Add test coverage for 16-bit conversions, part 3. Reviewed By: Joe_Nash Differential Revision: https://reviews.llvm.org/D152716	2023-06-15 09:55:25 +01:00
Amara Emerson	f79b0333fc	[DAGCombiner] Fix crash when trying to replace an indexed store with a narrow store. rdar://108818859 Differential Revision: https://reviews.llvm.org/D152978	2023-06-15 01:54:38 -07:00
eopXD	56c25575ce	[1/3][RISCV] Define machine instruction to write an immediate into vxrm This patch-set models the rounding mode for the fixed-point intrinsics of the RVV C intrinsics. The specification PR: [riscv-non-isa/rvv-intrinsic-doc#222](https://github.com/riscv-non-isa/rvv-intrinsic-doc/pull/222) The 3 patches are a proof-of-concept with a bottom-up approach, going from machine instruction to LLVM intrinsics, then to the C intrinsics. The 3 patches apply the rounding mode control to the `vaadd` instruction. Subsequent patches will extend the change to all other fixed-point computations. --- This is the 1st commit of the patch-set. This patch gives a name to the machine instruction that writes an immediate into the CSR `vxrm`. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D151395	2023-06-15 01:37:43 -07:00
Simon Tatham	10e4228114	[ARM,AArch64] Add a full set of -mtp= options. AArch64 has five system registers intended to be useful as thread pointers: one for each exception level which is RW at that level and inaccessible to lower ones, and the special TPIDRRO_EL0 which is readable but not writable at EL0. AArch32 has three, corresponding to the AArch64 ones that aren't specific to EL2 or EL3. Currently clang supports only a subset of these registers, and not even a consistent subset between AArch64 and AArch32: - For AArch64, clang permits you to choose between the four TPIDR_ELn thread registers, but not the fifth one, TPIDRRO_EL0. - In AArch32, on the other hand, the //only// thread register you can choose (apart from 'none, use a function call') is TPIDRURO, which corresponds to (the bottom 32 bits of) AArch64's TPIDRRO_EL0. So there is no thread register that you can currently use in both targets! For custom and bare-metal purposes, users might very reasonably want to use any of these thread registers. There's no reason they shouldn't all be supported as options, even if the default choices follow existing practice on typical operating systems. This commit extends the range of values acceptable to the `-mtp=` clang option, so that you can specify any of these registers by (the lower-case version of) their official names in the ArmARM: - For AArch64: tpidr_el0, tpidrro_el0, tpidr_el1, tpidr_el2, tpidr_el3 - For AArch32: tpidrurw, tpidruro, tpidrprw All existing values of the option are still supported and behave the same as before. Defaults are also unchanged. No command line that worked already should change behaviour as a result of this. The new values for the `-mtp=` option have been agreed with Arm's gcc developers (although I don't know whether they plan to implement them in the near future). Reviewed By: nickdesaulniers Differential Revision: https://reviews.llvm.org/D152433	2023-06-15 09:27:41 +01:00
David Green	98153b088e	[AArch64] Fix check lines for arm64-neon-across.ll. NFC Commit de0707a2b98162ab52fa2dd9277a9bbb4f7256c7 updated the check lines, but due to conflicting assembly not all functions kept their checks. This now distinguishes between selection-dag and global isel.	2023-06-15 09:25:28 +01:00
David Green	1643197e19	[AArch64][SVE] Enable shouldFoldSelectWithIdentityConstant for SVE. Instcombine will canonicalize `select(c, binop(a, b), a)` to `binop(select(c, b, identityvalue), a)`. The original select form makes a more natural form for vector predicated operations for vector architectures like SVE where predication is well supported. This patch enables shouldFoldSelectWithIdentityConstant for SVE so that more predicated instructions can be generated, helping simplify the handling with identity constants. Predicated FMA patterns have also been adjusted here as they need to look at FMF's. Other operations like add/sub, mul, and/or/xor and mla/mls have been recently updated. There is one test (scalable_int_min_max) that increases in size. There are multiple selects that could be combined into a single select but do not currently fold. Differential Revision: https://reviews.llvm.org/D149967	2023-06-15 09:17:50 +01:00
esmeyi	028a261350	[XCOFF] FixupOffsetInCsect should be 0 for R_REF relocation. Summary: The FixupOffsetInCsect should be 0 for R_REF relocation since it specifies a nonrelocating reference. Otherwise the linker would try to relocate the symbol through its address, and an error like the following would occur. ``` ld: 0711-547 SEVERE ERROR: Object /tmp/1-2a7ea1.o cannot be processed. RLD address 0x65 for section 2 (.data) is not contained in the section. ``` Reviewed By: shchenz Differential Revision: https://reviews.llvm.org/D152777	2023-06-15 01:28:45 -04:00
Pravin Jagtap	03d92501f3	[AMDGPU] Enable Atomic Optimizer and Default to Iterative Scan Strategy. D147408 implemented a new Iterative approach for scan computations and added a new flag, `amdgpu-atomic-optimizer-strategy`, which defaults to DPP. The changeset https://github.com/GPUOpen-Drivers/llpc/pull/2506 adapts to the new changes in LLPC. This patch enables the atomic optimizer pass and selects the Iterative approach for scan computations by default for compute pipeline. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D152649	2023-06-15 01:18:38 -04:00
Carl Ritson	0fd31b2880	[AMDGPU] Place returns on stack if they would violate VGPR limit Check no VGPRs above configured maximum would be used by a return when deciding if it can be lowered. Reviewed By: sebastian-ne Differential Revision: https://reviews.llvm.org/D152912	2023-06-15 14:05:32 +09:00
Carl Ritson	d0c0838705	[AMDGPU] Remove return VGPRs from callee save list There is no need to generate spill/restore for registers used in return value. This matters for amdgpu_gfx calling convention where CSR and Ret definitions overlap. Reviewed By: sebastian-ne Differential Revision: https://reviews.llvm.org/D152892	2023-06-15 14:05:32 +09:00
Amaury Séchet	e879fded2a	[NFC] Autogenerate several Mips test.	2023-06-14 22:27:15 +00:00
Amaury Séchet	0a76f7d9d8	[NFC] Autogenerate numerous SystemZ tests	2023-06-14 21:47:31 +00:00
Amaury Séchet	7a50e78621	[NFC] Autogenerate various Thumb2 tests.	2023-06-14 21:18:39 +00:00
Amaury Séchet	c67a326dc5	[NFC] Autogenerate several AArch64 tests.	2023-06-14 18:03:46 +00:00
Amaury Séchet	de0707a2b9	[NFC] Autogenerate several AArch64 tests.	2023-06-14 17:46:38 +00:00
Neumann Hon	8a7a2da18f	[SystemZ][z/OS] Correct value of length/4 of params field in PPA1. The Length/4 of Params field in the PPA1 ought to be the length of the parameters for the current function. Currently we are storing the length of the parameter area in the current function's stack frame, which represents the length of the params of the longest callee in the current function. Differential Revision: https://reviews.llvm.org/D152920 Reviewed By: uweigand	2023-06-14 13:37:46 -04:00
Neumann Hon	049324ac5e	Revert "[SystemZ][z/OS] Correct value of length/4 of params field in PPA1." This reverts commit e0f7b0e0f704dc3759925602e474b9e669270fcb.	2023-06-14 13:34:16 -04:00
Igor Kirillov	2cbc265cc9	[CodeGen] Add support for reductions in ComplexDeinterleaving pass This commit enhances the ComplexDeinterleaving pass to handle unordered reductions in simple one-block vectorized loops, supporting both SVE and Neon architectures. Differential Revision: https://reviews.llvm.org/D152022	2023-06-14 17:27:26 +00:00
Neumann Hon	e0f7b0e0f7	[SystemZ][z/OS] Correct value of length/4 of params field in PPA1. The Length/4 of Params field in the PPA1 ought to be the length of the parameters for the current function. Currently we are storing the length of the parameter area in the current function's stack frame, which represents the length of the params of the longest callee in the current function. Differential revision: https://reviews.llvm.org/D119049 Reviewed By: uweigand	2023-06-14 13:20:45 -04:00
Amaury Séchet	a03bcc2f9e	[NFC] Autogenerate CodeGen/AArch64/sve-vl-arith.ll	2023-06-14 17:09:55 +00:00
Artem Belevich	eb4f0d9f85	Revert "[NVPTX] Allow using v4i32 for memcpy lowering." The patch may trigger a hang: https://github.com/llvm/llvm-project/issues/63294 This reverts commit c16b7e54ac5b4da05c1d19e350ee8e75bf5f8980.	2023-06-14 10:03:30 -07:00
Amaury Séchet	0dab862650	[NFC] Autogenerate a couple of AArch64 tests.	2023-06-14 17:00:26 +00:00
Amaury Séchet	552ee85eb8	[NFC] Regenerate CodeGen/AArch64/sve-streaming-mode-fixed-length-*.ll	2023-06-14 16:38:18 +00:00
Amaury Séchet	2c83809fa8	[NFC] Automatically generate arm64-dagcombiner-dead-indexed-load.ll	2023-06-14 16:28:04 +00:00
Amaury Séchet	61f9cb002d	[NFC] Regenerate several VE codegen tests.	2023-06-14 16:20:37 +00:00
Amaury Séchet	b3bdfd3e4a	[NFC] Regen CodeGen/AArch64/bitfield-insert.ll	2023-06-14 16:08:54 +00:00
Igor Kirillov	211f27f37c	[CodeGen] Add pre-commit tests for D152022 and D152558 Differential Revision: https://reviews.llvm.org/D152025	2023-06-14 15:53:47 +00:00
Craig Topper	6bf79fb094	[SelectionDAG][RISCV] Add very basic PromoteIntegerResult/Op support for VP_SIGN/ZERO_EXTEND. We don't have VP_ANY_EXTEND or VP_SIGN_EXTEND_INREG yet so I've deviated a little from the non-VP lowering. My goal was to fix the crashes that occur on these test cases without this patch. Reviewed By: fakepaper56 Differential Revision: https://reviews.llvm.org/D152854	2023-06-14 08:52:56 -07:00
zhongyunde	43b2df03e8	[LegalizeTypes][AArch64] Use scalar_to_vector to eliminate bitcast ``` Legalize t3: v2i16 = bitcast i32 with (v2i16 extract_subvector (v4i16 bitcast (v2i32 scalar_to_vector (i32 in))), 0) ``` Fix https://github.com/llvm/llvm-project/issues/61638 NOTE: Don't touch getPreferredVectorAction like X86 as this will touch too many test cases. Reviewed By: dmgreen, paulwalker-arm, efriedma Differential Revision: https://reviews.llvm.org/D147678	2023-06-14 23:33:02 +08:00
zhongyunde	e108aee956	[test] Update the checking base for LE and BE precommit tests for D147678 as we need tests cover BE too. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D152815	2023-06-14 23:33:01 +08:00
Simon Pilgrim	78a0b2be83	[GlobalIsel][X86] Regenerate legalize-add.mir with common CHECK prefix	2023-06-14 15:01:18 +01:00
Amaury Séchet	e559f270d9	[NFC] Add tests cases for isTruncateOf for D151916	2023-06-14 13:03:44 +00:00
Matt Arsenault	0696240384	LowerMemIntrinsics: Check address space aliasing for memmove expansion For cases where we cannot insert an addrspacecast, we can still expand like a memcpy if we know the address spaces cannot alias. Normally non-aliasing memmoves are optimized to memcpy, but we cannot rely on that for lowering. If a target has aliasing address spaces that cannot be cast between, we still have to give up lowering this.	2023-06-14 07:56:58 -04:00
Simon Pilgrim	f6ff2cc7e0	[X86] X86FixupVectorConstantsPass - attempt to replace full width integer vector constant loads with broadcasts on AVX2+ targets (REAPPLIED) lowerBuildVectorAsBroadcast will not broadcast splat constants in all cases, resulting in a lot of situations where a full width vector load that has failed to fold but is loading splat constant values could use a broadcast load instruction just as cheaply, and save constant pool space. This is an updated commit of ab4b924832ce26c21b88d7f82fcf4992ea8906bb after being reverted at 78de45fd4a902066617fcc9bb88efee11f743bc6	2023-06-14 12:48:33 +01:00
Jay Foad	6c03f402f7	[AMDGPU] Use a common check prefix in regbankselect-amdgcn.s.buffer.load.ll	2023-06-14 12:06:11 +01:00
Ivan Kosarev	150c73a072	[AMDGPU][GFX11] Add test coverage for 16-bit conversions, part 2. Reviewed By: Joe_Nash Differential Revision: https://reviews.llvm.org/D152715	2023-06-14 11:49:12 +01:00
Carl Ritson	936c16a3a9	[AMDGPU] Pre-commit test for D152892 (NFC)	2023-06-14 17:14:05 +09:00

52796 Commits