52796 Commits

Author SHA1 Message Date
Amaury Séchet
015323ff9b [NFC] Autogenerate CodeGen/SPARC/LeonInsertNOPLoadPassUT.ll 2023-06-15 13:24:39 +00:00
Paul Walker
31c485c990 [AArch64CompressJumpTables] Prevent over-compression caused by invalid alignment.
AArch64CompressJumpTables assumes it can calculate exact block
offsets. This assumption is bogus because getInstSizeInBytes()
only returns an upper bound rather than an exact size. The
assumption is also invalid when a block alignment is bigger than
the function's alignment.

To mitigate both scenarios this patch changes the algorithm to
compute the maximum upper bound for all block offsets. This is
pessimistic but safe because all offsets are treated as unsigned.
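
A minimal sketch of the pessimistic offset computation described above. It is not the code from the patch: the function name `max_block_offsets`, the `(size_upper_bound, alignment)` input format, and the assumption that padding in front of an aligned block is at most `alignment - 4` bytes (given 4-byte instructions) are all hypothetical choices made for illustration.

```python
def max_block_offsets(blocks, inst_size=4):
    """Return an upper bound on the start offset of every block.

    blocks: list of (size_upper_bound, alignment) pairs in layout order.
    inst_size: assumed instruction size/alignment in bytes (hypothetical).
    """
    offsets = []
    offset = 0  # upper bound on where the previous block ends
    for size_upper_bound, alignment in blocks:
        # The exact position is unknown, so assume worst-case padding
        # whenever a block asks for more than instruction alignment.
        if alignment > inst_size:
            offset += alignment - inst_size
        offsets.append(offset)
        offset += size_upper_bound
    return offsets
```

Because every value is an upper bound, a jump-table entry sized from these offsets can only be wider than necessary, never too narrow.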

Differential Revision: https://reviews.llvm.org/D150009
2023-06-15 12:38:20 +00:00
Vladislav Dzhidzhoev
a7e7d34dc1 Revert "[DebugMetadata][DwarfDebug] Fix DWARF emisson of function-local imported entities (3/7)"
This reverts commit d04452d54829cd7af5b43d670325ffa755ab0030 since
test llvm-project/llvm/test/Bitcode/DIImportedEntity_backward.ll is broken.
2023-06-15 14:35:54 +02:00
Vladislav Dzhidzhoev
d04452d548 [DebugMetadata][DwarfDebug] Fix DWARF emisson of function-local imported entities (3/7)
RFC https://discourse.llvm.org/t/rfc-dwarfdebug-fix-and-improve-handling-imported-entities-types-and-static-local-in-subprogram-and-lexical-block-scopes/68544

Fixed PR51501 (tests from D112337).

1. Reuse of DISubprogram's 'retainedNodes' to track other function-local
   entities together with local variables and labels (this patch cares about
   function-local import while D144006 and D144008 use the same approach for
   local types and static variables). So, effectively this patch moves ownership
   of tracking local import from DICompileUnit's 'imports' field to DISubprogram's
   'retainedNodes' and adjusts DWARF emitter for the new layout. The old layout
   is considered unsupported (DwarfDebug would assert on such debug metadata).

   DICompileUnit's 'imports' field is supposed to track global imported
   declarations as it did before.

   This addresses various FIXMEs and simplifies the next part of the patch.

2. Postpone emission of function-local imported entities from
   `DwarfDebug::endFunctionImpl()` to `DwarfDebug::endModule()`.
   In `DwarfDebug::endFunctionImpl()` we do not yet have all the
   information about a parent subprogram or a referring subprogram
   (whether a subprogram is inlined or not), so we can't guarantee we emit
   an imported entity correctly and place it in a proper subprogram tree.
   So now, we just gather needed details about the import itself and its
   parent entity (either a Subprogram or a LexicalBlock) during
   processing in `DwarfDebug::endFunctionImpl()`, but all the real work is
   done in `DwarfDebug::endModule()` when we have all the required
   information to make proper emission.

Authored-by: Kristina Bessonova <kbessonova@accesssoftek.com>

Differential Revision: https://reviews.llvm.org/D144004
2023-06-15 14:29:03 +02:00
Nikita Popov
03de1cb715 [InstCombine][CGP] Move swapMayExposeCSEOpportunities() fold
InstCombine tries to swap compare operands to match sub instructions
in order to expose "CSE opportunities". However, it doesn't really
make sense to perform this transform in the middle-end, as we cannot
actually CSE the instructions there.

The backend already performs this fold in
18f5446a45/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp (L4236)
on the SDAG level, however this only works within a single basic block.

To handle cross-BB cases, we do need to handle this in the IR layer.
This patch moves the fold from InstCombine to CGP in the backend,
while keeping the same (somewhat dubious) heuristic.
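
One way such a heuristic could look, as a sketch only. The commit message does not spell out the rule, so the scoring below is an assumption: `swap_may_expose_cse` and its `subs` argument (operand pairs of existing `sub` instructions) are hypothetical names, not LLVM API.

```python
def swap_may_expose_cse(op0, op1, subs):
    """Decide whether `icmp op0, op1` should become `icmp op1, op0`.

    subs: iterable of (lhs, rhs) operand pairs of existing sub instructions.
    """
    score = 0
    for lhs, rhs in subs:
        if (lhs, rhs) == (op1, op0):
            score += 1  # a swapped compare would line up with this sub
        elif (lhs, rhs) == (op0, op1):
            score -= 1  # the compare already lines up with this sub
    return score > 0
```

For example, with only `sub b, a` present, a compare of `a, b` would be swapped; with only `sub a, b` present it would be left alone.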

Differential Revision: https://reviews.llvm.org/D152541
2023-06-15 14:17:58 +02:00
Matt Arsenault
28f3edd2be AMDGPU: Add llvm.amdgcn.exp2 intrinsic
Provide direct access to v_exp_f32 and v_exp_f16, so we can start
correctly lowering the generic exp intrinsics.

Unfortunately, this has to break from the usual naming convention of
matching the instruction name and stripping the v_ prefix. exp is
already taken by the export intrinsic. On the clang builtin side, we
have a choice of maintaining the convention to the instruction name,
or following the intrinsic name.
2023-06-15 07:00:07 -04:00
Ivan Kosarev
9aa026e9ff [AMDGPU][GFX11] Add test coverage for 16-bit conversions, part 9.
Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D152902
2023-06-15 11:02:08 +01:00
Ivan Kosarev
9792c804f6 [AMDGPU][GFX11] Add test coverage for 16-bit conversions, part 8.
Reviewed By: Joe_Nash

Differential Revision: https://reviews.llvm.org/D152809
2023-06-15 10:47:04 +01:00
Ivan Kosarev
7680951ac8 [AMDGPU][GFX11] Add test coverage for 16-bit conversions, part 7.
Reviewed By: Joe_Nash

Differential Revision: https://reviews.llvm.org/D152808
2023-06-15 10:40:58 +01:00
Ivan Kosarev
c2887096f3 [AMDGPU][GFX11] Add test coverage for 16-bit conversions, part 6.
Reviewed By: Joe_Nash

Differential Revision: https://reviews.llvm.org/D152807
2023-06-15 10:39:31 +01:00
Nikita Popov
3210cc9a88 [X86] Add test for icmp/sub operand order across blocks (NFC) 2023-06-15 11:34:20 +02:00
Ivan Kosarev
79c8301478 [AMDGPU][GFX11] Add test coverage for 16-bit conversions, part 5.
Reviewed By: Joe_Nash

Differential Revision: https://reviews.llvm.org/D152805
2023-06-15 10:28:16 +01:00
Ivan Kosarev
980d2b337e [AMDGPU][GFX11] Add test coverage for 16-bit conversions, part 4.
Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D152717
2023-06-15 10:26:41 +01:00
Ivan Kosarev
e9d77cd9b2 [AMDGPU][GFX11] Add test coverage for 16-bit conversions, part 3.
Reviewed By: Joe_Nash

Differential Revision: https://reviews.llvm.org/D152716
2023-06-15 09:55:25 +01:00
Amara Emerson
f79b0333fc [DAGCombiner] Fix crash when trying to replace an indexed store with a narrow store.
rdar://108818859

Differential Revision: https://reviews.llvm.org/D152978
2023-06-15 01:54:38 -07:00
eopXD
56c25575ce [1/3][RISCV] Define machine instruction to write an immediate into vxrm
This patch-set aims to model the rounding mode for the fixed-point
intrinsics of the RVV C intrinsics.

The specification PR: [riscv-non-isa/rvv-intrinsic-doc#222](https://github.com/riscv-non-isa/rvv-intrinsic-doc/pull/222)

The 3 patches are a proof-of-concept with a bottom-up approach,
going from the machine instruction to LLVM intrinsics, then to the C
intrinsics. The 3 patches apply the rounding mode control to the
`vaadd` instruction. Subsequent patches will extend the change to all
other fixed-point computations.

---

This is the 1st commit of the patch-set.  This patch gives a name to
the machine instruction that writes an immediate into the CSR `vxrm`.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D151395
2023-06-15 01:37:43 -07:00
Simon Tatham
10e4228114 [ARM,AArch64] Add a full set of -mtp= options.
AArch64 has five system registers intended to be useful as thread
pointers: one for each exception level which is RW at that level and
inaccessible to lower ones, and the special TPIDRRO_EL0 which is
readable but not writable at EL0. AArch32 has three, corresponding to
the AArch64 ones that aren't specific to EL2 or EL3.

Currently clang supports only a subset of these registers, and not
even a consistent subset between AArch64 and AArch32:

 - For AArch64, clang permits you to choose between the four TPIDR_ELn
   thread registers, but not the fifth one, TPIDRRO_EL0.

 - In AArch32, on the other hand, the //only// thread register you can
   choose (apart from 'none, use a function call') is TPIDRURO, which
   corresponds to (the bottom 32 bits of) AArch64's TPIDRRO_EL0.

So there is no thread register that you can currently use in both
targets!

For custom and bare-metal purposes, users might very reasonably want
to use any of these thread registers. There's no reason they shouldn't
all be supported as options, even if the default choices follow
existing practice on typical operating systems.

This commit extends the range of values acceptable to the `-mtp=`
clang option, so that you can specify any of these registers by (the
lower-case version of) their official names in the ArmARM:

 - For AArch64: tpidr_el0, tpidrro_el0, tpidr_el1, tpidr_el2, tpidr_el3
 - For AArch32: tpidrurw, tpidruro, tpidrprw
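
The new values listed above can be summarised as a small lookup. This is an illustrative sketch, not part of clang: `NEW_MTP_VALUES` and `mtp_flag` are hypothetical names, and the pre-existing `-mtp=` values (which remain valid) are deliberately left out.

```python
# New -mtp= values named in this commit message, keyed by architecture.
NEW_MTP_VALUES = {
    "aarch64": ["tpidr_el0", "tpidrro_el0", "tpidr_el1", "tpidr_el2", "tpidr_el3"],
    "aarch32": ["tpidrurw", "tpidruro", "tpidrprw"],
}


def mtp_flag(arch, register):
    """Build a -mtp= flag from an official register name (any case)."""
    value = register.lower()  # the option uses the lower-case register name
    if value not in NEW_MTP_VALUES[arch]:
        raise ValueError(f"{register} is not a new -mtp= value for {arch}")
    return f"-mtp={value}"
```

Under these assumptions, `mtp_flag("aarch64", "TPIDRRO_EL0")` yields `-mtp=tpidrro_el0`.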

All existing values of the option are still supported and behave the
same as before. Defaults are also unchanged. No command line that
worked already should change behaviour as a result of this.

The new values for the `-mtp=` option have been agreed with Arm's gcc
developers (although I don't know whether they plan to implement them
in the near future).

Reviewed By: nickdesaulniers

Differential Revision: https://reviews.llvm.org/D152433
2023-06-15 09:27:41 +01:00
David Green
98153b088e [AArch64] Fix check lines for arm64-neon-across.ll. NFC
Commit de0707a2b98162ab52fa2dd9277a9bbb4f7256c7 updated the check lines, but
due to conflicting assembly not all functions kept their checks. This now
distinguishes between selection-dag and global isel.
2023-06-15 09:25:28 +01:00
David Green
1643197e19 [AArch64][SVE] Enable shouldFoldSelectWithIdentityConstant for SVE.
Instcombine will canonicalize `select(c, binop(a, b), a)` to
`binop(select(c, b, identityvalue), a)`. The original select form
makes a more natural form for vector predicated operations for
vector architectures like SVE where predication is well supported.
This patch enables shouldFoldSelectWithIdentityConstant for SVE so
that more predicated instructions can be generated, helping simplify
the handling with identity constants.
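
The equivalence of the two forms can be checked with a scalar model. This is only a sketch: real SVE code operates on predicated vectors, and the names `select`, `original_form`, `canonical_form` and the `IDENTITY` table are hypothetical helpers covering a few commutative integer operations.

```python
import operator

# Identity value for each binary operation: binop(identity, a) == a.
IDENTITY = {
    operator.add: 0,
    operator.mul: 1,
    operator.or_: 0,
    operator.xor: 0,
    operator.and_: -1,  # all-ones
}


def select(c, x, y):
    """Scalar model of select: x when c is true, otherwise y."""
    return x if c else y


def original_form(c, binop, a, b):
    # select(c, binop(a, b), a)
    return select(c, binop(a, b), a)


def canonical_form(c, binop, a, b):
    # binop(select(c, b, identityvalue), a)
    return binop(select(c, b, IDENTITY[binop]), a)
```

When `c` is false the canonical form computes `binop(identity, a)`, which is `a`, so both forms agree for every input.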

Predicated FMA patterns have also been adjusted here as they need to
look at FMF's. Other operations like add/sub, mul, and/or/xor and
mla/mls have been recently updated.

There is one test (scalable_int_min_max) that increases in size. There
are multiple selects that could be combined into a single select but
do not currently fold.

Differential Revision: https://reviews.llvm.org/D149967
2023-06-15 09:17:50 +01:00
esmeyi
028a261350 [XCOFF] FixupOffsetInCsect should be 0 for R_REF relocation.
Summary: The FixupOffsetInCsect should be 0 for R_REF relocation since it specifies a nonrelocating reference. Otherwise the linker would try to relocate the symbol through its address, and an error like the following would occur.
```
ld: 0711-547 SEVERE ERROR: Object /tmp/1-2a7ea1.o cannot be processed.
	RLD address 0x65 for section 2 (.data) is
	not contained in the section.
```

Reviewed By: shchenz

Differential Revision: https://reviews.llvm.org/D152777
2023-06-15 01:28:45 -04:00
Pravin Jagtap
03d92501f3 [AMDGPU] Enable Atomic Optimizer and Default to Iterative Scan Strategy.
D147408 implemented a new Iterative approach for scan computations
and added a new flag, `amdgpu-atomic-optimizer-strategy`, which
defaults to DPP.

The changeset https://github.com/GPUOpen-Drivers/llpc/pull/2506
adapts to the new changes in LLPC.

This patch enables the atomic optimizer pass and selects the Iterative
approach for scan computations by default for compute pipelines.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D152649
2023-06-15 01:18:38 -04:00
Carl Ritson
0fd31b2880 [AMDGPU] Place returns on stack if they would violate VGPR limit
Check no VGPRs above configured maximum would be used by a return
when deciding if it can be lowered.

Reviewed By: sebastian-ne

Differential Revision: https://reviews.llvm.org/D152912
2023-06-15 14:05:32 +09:00
Carl Ritson
d0c0838705 [AMDGPU] Remove return VGPRs from callee save list
There is no need to generate spill/restore for registers used in
return value.  This matters for amdgpu_gfx calling convention
where CSR and Ret definitions overlap.

Reviewed By: sebastian-ne

Differential Revision: https://reviews.llvm.org/D152892
2023-06-15 14:05:32 +09:00
Amaury Séchet
e879fded2a [NFC] Autogenerate several Mips test. 2023-06-14 22:27:15 +00:00
Amaury Séchet
0a76f7d9d8 [NFC] Autogenerate numerous SystemZ tests 2023-06-14 21:47:31 +00:00
Amaury Séchet
7a50e78621 [NFC] Autogenerate various Thumb2 tests. 2023-06-14 21:18:39 +00:00
Amaury Séchet
c67a326dc5 [NFC] Autogenerate several AArch64 tests. 2023-06-14 18:03:46 +00:00
Amaury Séchet
de0707a2b9 [NFC] Autogenerate several AArch64 tests. 2023-06-14 17:46:38 +00:00
Neumann Hon
8a7a2da18f [SystemZ][z/OS] Correct value of length/4 of params field in PPA1.
The Length/4 of Params field in the PPA1 ought to be the length of the parameters for the current function. Currently we are storing the length of the parameter area in the current function's stack frame, which represents the length of the params of the longest callee in the current function.

Differential Revision: https://reviews.llvm.org/D152920

Reviewed By: uweigand
2023-06-14 13:37:46 -04:00
Neumann Hon
049324ac5e Revert "[SystemZ][z/OS] Correct value of length/4 of params field in PPA1."
This reverts commit e0f7b0e0f704dc3759925602e474b9e669270fcb.
2023-06-14 13:34:16 -04:00
Igor Kirillov
2cbc265cc9 [CodeGen] Add support for reductions in ComplexDeinterleaving pass
This commit enhances the ComplexDeinterleaving pass to handle unordered
reductions in simple one-block vectorized loops, supporting both
SVE and Neon architectures.

Differential Revision: https://reviews.llvm.org/D152022
2023-06-14 17:27:26 +00:00
Neumann Hon
e0f7b0e0f7 [SystemZ][z/OS] Correct value of length/4 of params field in PPA1.
The Length/4 of Params field in the PPA1 ought to be the length of the parameters for the current function. Currently we are storing the length of the parameter area in the current function's stack frame, which represents the length of the params of the longest callee in the current function.

Differential revision: https://reviews.llvm.org/D119049

Reviewed By: uweigand
2023-06-14 13:20:45 -04:00
Amaury Séchet
a03bcc2f9e [NFC] Autogenerate CodeGen/AArch64/sve-vl-arith.ll 2023-06-14 17:09:55 +00:00
Artem Belevich
eb4f0d9f85 Revert "[NVPTX] Allow using v4i32 for memcpy lowering."
The patch may trigger a hang:
https://github.com/llvm/llvm-project/issues/63294

This reverts commit c16b7e54ac5b4da05c1d19e350ee8e75bf5f8980.
2023-06-14 10:03:30 -07:00
Amaury Séchet
0dab862650 [NFC] Autogenerate a couple of AArch64 tests. 2023-06-14 17:00:26 +00:00
Amaury Séchet
552ee85eb8 [NFC] Regenerate CodeGen/AArch64/sve-streaming-mode-fixed-length-*.ll 2023-06-14 16:38:18 +00:00
Amaury Séchet
2c83809fa8 [NFC] Automatically generate arm64-dagcombiner-dead-indexed-load.ll 2023-06-14 16:28:04 +00:00
Amaury Séchet
61f9cb002d [NFC] Regenerate several VE codegen tests. 2023-06-14 16:20:37 +00:00
Amaury Séchet
b3bdfd3e4a [NFC] Regen CodeGen/AArch64/bitfield-insert.ll 2023-06-14 16:08:54 +00:00
Igor Kirillov
211f27f37c [CodeGen] Add pre-commit tests for D152022 and D152558
Differential Revision: https://reviews.llvm.org/D152025
2023-06-14 15:53:47 +00:00
Craig Topper
6bf79fb094 [SelectionDAG][RISCV] Add very basic PromoteIntegerResult/Op support for VP_SIGN/ZERO_EXTEND.
We don't have VP_ANY_EXTEND or VP_SIGN_EXTEND_INREG yet so I've
deviated a little from the non-VP lowering.

My goal was to fix the crashes that occur on these test cases without this patch.

Reviewed By: fakepaper56

Differential Revision: https://reviews.llvm.org/D152854
2023-06-14 08:52:56 -07:00
zhongyunde
43b2df03e8 [LegalizeTypes][AArch64] Use scalar_to_vector to eliminate bitcast
```
Legalize t3: v2i16 = bitcast i32
with   (v2i16 extract_subvector (v4i16 bitcast (v2i32 scalar_to_vector (i32 in))), 0)
```
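
The replacement above can be modelled on raw bytes to see why it preserves the value. This sketch assumes a little-endian layout only; the function names and the `undef_lane` filler standing in for the undefined upper lane are hypothetical.

```python
import struct


def bitcast_i32_to_v2i16_direct(x):
    """v2i16 = bitcast i32: reinterpret 4 bytes as two 16-bit lanes."""
    return list(struct.unpack("<2H", struct.pack("<I", x)))


def bitcast_i32_to_v2i16_legalized(x, undef_lane=0xDEADBEEF):
    # v2i32 scalar_to_vector: lane 0 holds x, lane 1 is undefined
    # (modelled here by an arbitrary filler value).
    v2i32 = struct.pack("<2I", x, undef_lane)
    # v4i16 bitcast: reinterpret the same 8 bytes as four 16-bit lanes.
    v4i16 = list(struct.unpack("<4H", v2i32))
    # v2i16 extract_subvector at index 0: keep the two lowest lanes.
    return v4i16[0:2]
```

Both functions return the same lanes for any 32-bit input in this model, regardless of what the undefined lane contains.
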
Fix https://github.com/llvm/llvm-project/issues/61638

NOTE: Don't touch getPreferredVectorAction like X86 as this will touch
too many test cases.

Reviewed By: dmgreen, paulwalker-arm, efriedma
Differential Revision: https://reviews.llvm.org/D147678
2023-06-14 23:33:02 +08:00
zhongyunde
e108aee956 [test] Update the checking base for LE and BE
Precommit tests for D147678, as we need the tests to cover BE too.

Reviewed By: dmgreen
Differential Revision: https://reviews.llvm.org/D152815
2023-06-14 23:33:01 +08:00
Simon Pilgrim
78a0b2be83 [GlobalIsel][X86] Regenerate legalize-add.mir with common CHECK prefix 2023-06-14 15:01:18 +01:00
Amaury Séchet
e559f270d9 [NFC] Add tests cases for isTruncateOf for D151916 2023-06-14 13:03:44 +00:00
Matt Arsenault
0696240384 LowerMemIntrinsics: Check address space aliasing for memmove expansion
For cases where we cannot insert an addrspacecast, we can still expand
like a memcpy if we know the address spaces cannot alias. Normally
non-aliasing memmoves are optimized to memcpy, but we cannot rely on
that for lowering. If a target has aliasing address spaces that cannot
be cast between, we still have to give up lowering this.
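
The decision described above can be summarised as a small function. This is a sketch under stated assumptions, not the LowerMemIntrinsics code: the function name, its boolean parameters, and the returned strings are hypothetical, and it assumes the memmove-style expansion needs both pointers in one address space to choose a copy direction.

```python
def memmove_expansion(same_space, can_addrspacecast, may_alias):
    """Pick an expansion strategy for a memmove, or None to give up."""
    if same_space or can_addrspacecast:
        # Pointers can be brought into one address space, so the usual
        # direction-choosing memmove expansion is possible.
        return "memmove"
    if not may_alias:
        # Source and destination can never overlap: a plain
        # memcpy-style loop is safe.
        return "memcpy"
    # Aliasing address spaces that cannot be cast between.
    return None
```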
2023-06-14 07:56:58 -04:00
Simon Pilgrim
f6ff2cc7e0 [X86] X86FixupVectorConstantsPass - attempt to replace full width integer vector constant loads with broadcasts on AVX2+ targets (REAPPLIED)
lowerBuildVectorAsBroadcast will not broadcast splat constants in all cases, resulting in a lot of situations where a full width vector load that has failed to fold but is loading splat constant values could use a broadcast load instruction just as cheaply, and save constant pool space.
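
To illustrate the idea of spotting a splat inside a full-width constant, here is a sketch that works on raw bytes. It is not the X86FixupVectorConstantsPass logic: `smallest_splat_width` is a hypothetical helper, and which widths are actually usable depends on the broadcast instructions available on the target.

```python
def smallest_splat_width(constant, min_width=1):
    """Return the smallest power-of-two element width (in bytes) whose
    repetition reproduces `constant`, or None if there is no such width
    below the full size."""
    n = len(constant)
    width = min_width
    while width < n:
        if n % width == 0 and constant == constant[:width] * (n // width):
            return width
        width *= 2
    return None
```

For a 32-byte constant that repeats a 4-byte element, only 4 bytes would need to live in the constant pool if a broadcast load is used instead.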

This is an updated commit of ab4b924832ce26c21b88d7f82fcf4992ea8906bb after being reverted at 78de45fd4a902066617fcc9bb88efee11f743bc6
2023-06-14 12:48:33 +01:00
Jay Foad
6c03f402f7 [AMDGPU] Use a common check prefix in regbankselect-amdgcn.s.buffer.load.ll 2023-06-14 12:06:11 +01:00
Ivan Kosarev
150c73a072 [AMDGPU][GFX11] Add test coverage for 16-bit conversions, part 2.
Reviewed By: Joe_Nash

Differential Revision: https://reviews.llvm.org/D152715
2023-06-14 11:49:12 +01:00
Carl Ritson
936c16a3a9 [AMDGPU] Pre-commit test for D152892 (NFC) 2023-06-14 17:14:05 +09:00