52796 Commits

Author SHA1 Message Date
WANG Rui
42dccf9c16 [LoongArch] Implement isLegalAddImmediate
This brings a trivial improvement in the and-add-lsr.ll test case.

Signed-off-by: WANG Rui <wangrui@loongson.cn>

Reviewed By: SixWeining, xen0n

Differential Revision: https://reviews.llvm.org/D154762
2023-07-24 17:17:24 +08:00
WANG Rui
c100f35f02 [LoongArch] Add tests for (and (add x, c1), (lshr y, c2))
Add tests for (and (add x, c1), (lshr y, c2)).

Signed-off-by: WANG Rui <wangrui@loongson.cn>

Reviewed By: SixWeining, xen0n

Differential Revision: https://reviews.llvm.org/D154809
2023-07-24 17:12:10 +08:00
WANG Rui
595d5f36f4 [DAGCombine] Canonicalize operands for visitANDLike
During the construction of SelectionDAG, there are no explicit canonicalization rules to adjust the order of operands for AND nodes. This may prevent the optimization in DAGCombiner::visitANDLike from being triggered. This patch canonicalizes the operands before matching, which can be observed to improve optimization on the RISC-V target architecture.

Canonicalize:
```
and(x, add) -> and(add, x)
```
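
The swap above can be sketched on a toy expression tree. This is a Python illustration, not the SelectionDAG API; the tuple encoding and the helper name `canonicalize_and` are hypothetical:

```python
# Toy nodes are tuples: (opcode, operand0, operand1); leaves are strings.
def canonicalize_and(node):
    """Swap AND operands so that an ADD operand, if any, comes first."""
    if not isinstance(node, tuple) or node[0] != "and":
        return node
    _, lhs, rhs = node
    lhs_is_add = isinstance(lhs, tuple) and lhs[0] == "add"
    rhs_is_add = isinstance(rhs, tuple) and rhs[0] == "add"
    # AND is commutative, so swapping the operands preserves the value.
    if rhs_is_add and not lhs_is_add:
        return ("and", rhs, lhs)
    return node

add = ("add", "x", "c1")
lshr = ("lshr", "y", "c2")
swapped = canonicalize_and(("and", lshr, add))
```

After the swap, a matcher that only looks for the ADD in the first operand sees both input orders.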

Signed-off-by: WANG Rui <wangrui@loongson.cn>

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D154760
2023-07-24 16:52:04 +08:00
WANG Rui
cea980f380 [RISCV] Add tests for (and (add x, c1), (lshr y, c2))
Add tests for (and (add x, c1), (lshr y, c2)).

Signed-off-by: WANG Rui <wangrui@loongson.cn>

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D154808
2023-07-24 16:52:02 +08:00
Sander de Smalen
6865fbd3da [AArch64][SME] Use fmov instead of NEON movi for FP value.
NEON `movi` is not valid in Streaming SVE mode, so use an `fmov`
instruction instead for zero-initializing a FP value.

Reviewed By: hassnaa-arm

Differential Revision: https://reviews.llvm.org/D155432
2023-07-24 08:48:19 +00:00
Antonio Frighetto
2dea969d83 [clang][CodeGen] Introduce -frecord-command-line for MachO
Allow clang driver command-line recording when
targeting MachO object files as well.

Reviewed-by: sgraenitz

Differential Revision: https://reviews.llvm.org/D155716
2023-07-24 09:24:59 +02:00
Craig Topper
74d16b212b [RISCV] Add Zicond RUN lines to xaluo.ll. NFC
A couple of these tests show a need for computeKnownBits support
for Zicond.
2023-07-23 23:03:18 -07:00
Jim Lin
37b474a20e [RISCV] Remove unused check prefixes for tests. NFC
Also remove the warning lines stating that these prefixes are unused.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D156048
2023-07-24 13:42:49 +08:00
eopXD
78d91df452 [RISCV] Support register allocation for GHC when f/d is not specified in the architecture
This patch supports register allocation for floating-point types when
`zfinx` and `zdinx` are specified in the architecture for the GHC
calling convention.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D155910
2023-07-23 22:40:10 -07:00
esmeyi
776195865d [XCOFF] Write source language ID and CPU version ID into C_FILE symbol.
Summary: The source language ID and CPU version ID are required by debuggers on AIX. AIX's system assembler determines the source language ID based on the source file's name suffix, and the behavior in this patch is consistent with it.

Reviewed By: shchenz

Differential Revision: https://reviews.llvm.org/D155684
2023-07-24 00:35:24 -04:00
Pravin Jagtap
c48ed93cf8 [AMDGPU] Add llvm.amdgcn.wave.reduce.umin/umax Intrinsic.
When the input to the intrinsic is a uniform value, the reduced value
is the same as the input, whereas if the input value is divergent we
need to iterate over all active lanes of the wavefront to perform
the reduction.

The control flow for a `loop` has been set up, which
iterates over `only` active lanes to perform reduction.

Introduced WAVE_REDUCE_UMIN_PSEUDO_U32 and
WAVE_REDUCE_UMAX_PSEUDO_U32 Pseudos which
are lowered Post-ISel (in `EmitInstrWithCustomInserter`).
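
The loop over active lanes can be sketched as follows. This is a Python model under stated assumptions, not the generated machine code; the function name and the list-plus-bitmask representation of a wave are hypothetical:

```python
def wave_reduce_umin(lane_values, exec_mask):
    """Unsigned-min reduction over the lanes whose bit is set in exec_mask."""
    result = 0xFFFFFFFF  # identity for a 32-bit unsigned min
    remaining = exec_mask
    while remaining:
        # Index of the lowest lane that is still active.
        lane = (remaining & -remaining).bit_length() - 1
        result = min(result, lane_values[lane])
        remaining &= remaining - 1  # clear that lane and continue
    return result

divergent = wave_reduce_umin([9, 4, 7, 1], 0b0111)  # lane 3 is inactive
uniform = wave_reduce_umin([5, 5, 5, 5], 0b1111)    # result equals the input
```

In the uniform case the loop is unnecessary, since every lane holds the same value.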

Reviewed By: arsenm, #amdgpu

Differential Revision: https://reviews.llvm.org/D154858
2023-07-24 00:06:00 -04:00
Zhongyunde
0aaeb88532 [AArch64][GlobalISel] Legalize <2 x s8> and <4 x s8> for G_BUILD_VECTOR
Following commit ccffc27, the remaining types <2 x s8> and <4 x s8> should
also be promoted to <2 x s32> and <4 x s16>.

Fixes https://github.com/llvm/llvm-project/issues/58274

Reviewed By: aemerson, tschuett, paquette, dmgreen
Differential Revision: https://reviews.llvm.org/D153394
2023-07-24 11:25:26 +08:00
Jun Sha (Joshua)
f375ee36c4 [RISCV] Add codegen for Zfbfmin instructions
The implementation in https://reviews.llvm.org/D151313 covers the case without Zfbfmin. This patch adds codegen support for the 6 instructions provided by the Zfbfmin extension.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D153234
2023-07-24 10:37:58 +08:00
David Green
495bdfc7bb [AArch64] Lower fcvtl2 (fpext) via tablegen patterns.
This patch does two things. First it removes the tryHighFPExt DAG2DAG method
used to select fcvtl2 instructions, using tablegen patterns through
SelectExtractHigh instead. This essentially undoes D71515, in a way that should
hopefully avoid any regressions. The second is that a GI equivalent of
SelectExtractHigh is added in selectExtractHigh, from G_UNMERGE_VALUES. The
end result is that GlobalISel (and some constrained fpext) can now make use of
the fcvtl2 instructions, saving an extra dup/ext.

Differential Revision: https://reviews.llvm.org/D155871
2023-07-23 19:17:11 +01:00
David Green
6edc9a7662 [AArch64][GISel] Additional FPExt vector lowering
Similar to D155311, this adds lowering for more vector cases for FPExt

Differential Revision: https://reviews.llvm.org/D155601
2023-07-23 16:58:13 +01:00
Phoebe Wang
88b6d291bb [X86][FP16] Split v32f16 shuffle when feature BWI is off
Found this problem when investigating #63017

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D156050
2023-07-23 20:56:09 +08:00
Simon Pilgrim
da0f24873d [X86] LowerFunnelShift - manually expand funnel shifts by splat constant patterns.
Followup to af32e51a43fb4343f - where the undef funnel shift amounts (during widening from v2i32 -> v4i32) were being constant folded to 0 when the shift amounts are created during expansion, losing the splat'd shift amounts.
2023-07-23 10:57:11 +01:00
Simon Pilgrim
92bf83cf60 [X86] Add basic test coverage for funnels shifts of sub-128-bit vector types
2023-07-23 10:57:11 +01:00
Kishan Parmar
41af6ece6c [PowerPC/SPE] powerpcspe load and store instruction has
an 8-bit offset instead of the 16-bit offset of other load/store instructions.
So if the stack grows beyond what an 8-bit offset can reach, create one
emergency slot for spilling.
2023-07-23 13:24:35 +05:30
Amaury Séchet
88452508f3 [DAG] Improve carry reconstruction in combineCarryDiamond.
The gain is usually sufficient to go the extra mile and reconstruct a carry in some cases.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D154533
2023-07-22 22:49:48 +00:00
Simon Pilgrim
af32e51a43 [X86] LowerRotate - manually expand rotate by splat constant patterns.
Fixes issue identified on #63980 where the undef rotate amounts (during widening from v2i32 -> v4i32) were being constant folded to 0 when the shift amounts are created during expansion, losing the splat'd shift amounts.
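
For reference, a rotate by a splat constant expands into two shifts and an OR per lane. The sketch below is plain Python with hypothetical helper names, not the X86 lowering code; it only shows why every lane must keep the same (splat) amount:

```python
def rotl32(x, amt):
    """Rotate a 32-bit value left, expanded into two shifts and an OR."""
    x &= 0xFFFFFFFF
    amt %= 32
    if amt == 0:
        return x
    return ((x << amt) | (x >> (32 - amt))) & 0xFFFFFFFF

def rotl_splat(lanes, amt):
    """Apply the same (splat) rotate amount to every lane of a vector."""
    return [rotl32(lane, amt) for lane in lanes]

rotated = rotl_splat([0x80000001, 0x00000001], 4)
```

If the amount for a lane were folded to 0 instead, that lane would no longer match the splat and the two shift amounts would stop being complementary constants.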
2023-07-22 17:54:57 +01:00
Phoebe Wang
04527f1d32 [X86][BF16] Customize INSERT_VECTOR_ELT for bf16 when feature BF16 is on
Fixes root cause of #63017.
The reason is similar to BUILD_VECTOR. We have legal vector type but
still soft promote for scalar type. So we need to customize these scalar
to vector nodes.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D155961
2023-07-22 20:26:34 +08:00
Phoebe Wang
f11526b091 [X86][BF16] Do not scalarize masked load for BF16 when we have AVX512BF16
Fixes #63017

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D155952
2023-07-22 18:16:49 +08:00
Jon Roelofs
62a1fbe9f7 Enable compact unwind in all darwin simulators
... since they've always supported it.

rdar://104359594

Differential revision: https://reviews.llvm.org/D155988
2023-07-21 16:13:47 -07:00
Philip Reames
785939c15e Revert "[RISCV] Add test which shows alignment of constant pools and the functions which followed"
This reverts commit cbf2a6ce197e8176c01316fe25400aae0b7390c4. This was a precommitted test for a change which is being abandoned.
2023-07-21 16:03:15 -07:00
Nathan Chancellor
17f4f262fc Revert "Reapply [IR] Mark and constant expressions as undesirable"
This reverts commit 086ee99564afbb11449c08ea2e094f7f49fadde5.

This patch causes an infinite loop when building arch/mips/mm/c-r4k.c in
the Linux kernel. See the comment in Phabricator for a reduced
reproducer: https://reviews.llvm.org/rG086ee99564afbb11449c08ea2e094f7f49fadde5
2023-07-21 15:57:03 -07:00
Matt Arsenault
8406c3568a AMDGPU: Implement new 2ulp fdiv lowering
Extends the new frexp scaled reciprocal to the general case. The
reciprocal case is just the same thing when frexp of 1 is constant
folded. Could probably clean up the code to rely on that constant
folding.

Improves results for the IEEE path for the default OpenCL division. We
used to only emit the fdiv.fast intrinsic with a 2.5 ulp accuracy
threshold with DAZ, which uses explicit range checks. This gives us a
better fast option with the default IEEE behavior.
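
The idea of scaling with frexp before taking a reciprocal can be sketched in Python. This is only a numeric illustration in double precision, not the f32 lowering itself; `fdiv_frexp` and `rcp_frexp` are hypothetical names, and the plain `1.0 / mant_b` stands in for the hardware rcp instruction:

```python
import math

def fdiv_frexp(a, b):
    """Divide by scaling both operands into [0.5, 1) before the reciprocal."""
    mant_a, exp_a = math.frexp(a)
    mant_b, exp_b = math.frexp(b)
    rcp = 1.0 / mant_b  # input is never denormal after scaling
    return math.ldexp(mant_a * rcp, exp_a - exp_b)

def rcp_frexp(b):
    """Reciprocal case: frexp(1.0) is the constant pair (0.5, 1)."""
    return fdiv_frexp(1.0, b)

tiny = 1e-310  # a denormal double used as the divisor
quotient = fdiv_frexp(1e-300, tiny)
```

Because the mantissas are always in a fixed range, the reciprocal never sees a denormal input and no explicit range check is needed.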
2023-07-21 18:55:42 -04:00
Matt Arsenault
6699c37028 AMDGPU: Refactor AMDGPUCodeGenPrepare fdiv handling
NFC-ish. Does trigger some reordering of the fdiv scalarization. Also
skips scalarizing in more cases where nothing was going to happen. We
can still scalarize in some no-op edge cases.

https://reviews.llvm.org/D155740
2023-07-21 18:55:42 -04:00
Philip Reames
cbf2a6ce19 [RISCV] Add test which shows alignment of constant pools and the functions which followed
2023-07-21 15:02:43 -07:00
Joseph Huber
f4381d4644 [NVPTX] Add initial support for '.alias' in PTX
This patch adds initial support for using aliases when targeting PTX. We
perform a pretty strict conversion from the globals referenced to the
expected output, as described in the PTX documentation at
https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#kernel-and-function-directives-alias

These cannot currently be used due to a bug in the `nvlink`
implementation that causes aliases to pruned functions to crash the
linker.

Reviewed By: tra

Differential Revision: https://reviews.llvm.org/D155211
2023-07-21 16:43:46 -05:00
Matt Arsenault
8287f3af9d AMDGPU: Overhaul and improve rcp and rsq f32 formation
The highlight change is a new denormal safe 1ulp lowering which uses
rcp after using frexp to perform input scaling. This saves 2
instructions compared to other implementations which performed an
explicit denormal range change. This improves the OpenCL default, and
requires a flag for HIP. I don't believe there's any flag wired up for
OpenMP to emit the necessary fpmath metadata.

This provides several improvements and changes that were hard to
separate without regressing one case or another. Disturbingly the
OpenCL conformance test seems to have the reciprocal test commented
out. I locally hacked it back in to test this.

Starts introducing f32 rsq intrinsics in AMDGPUCodeGenPrepare. Like
the rcp case, we could do this in codegen if !fpmath were preserved
(although we would lose some computeKnownFPClass tricks). Start
requiring contract flags to form rsq. The rsq fusion actually improves
the result from ~2ulp to ~1ulp. We have some older fusion in codegen
which only keys off unsafe math which should be refined.

Expand rsq patterns by checking for denormal inputs and pre/post
multiplying like the current library code does. We also take advantage
of computeKnownFPClass to avoid the scaling when we can statically
prove the input cannot be a denormal. We could do the same for the rcp
case, but unlike rsq a large input can underflow to denormal. We need
additional upper bound exponent checks on the input in order to do the
same for rcp.
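
A minimal sketch of the pre/post multiplication for denormal inputs, in Python. The scale factors `2**24` and `2**12` and the name `rsq_scaled` are illustrative assumptions, and `1.0 / math.sqrt(...)` stands in for the hardware rsq instruction:

```python
import math

F32_MIN_NORMAL = 2.0 ** -126  # smallest normal f32 magnitude

def rsq_scaled(x):
    """1/sqrt(x) with pre/post multiplication for denormal-range inputs."""
    needs_scale = x < F32_MIN_NORMAL
    pre = 2.0 ** 24 if needs_scale else 1.0   # lift x into the normal range
    post = 2.0 ** 12 if needs_scale else 1.0  # sqrt(2**24), undoes the lift
    rsq = 1.0 / math.sqrt(x * pre)
    return rsq * post

from_denormal = rsq_scaled(2.0 ** -130)
from_normal = rsq_scaled(4.0)
```

When the input is statically known not to be a denormal, both multiplications are by 1.0 and can be dropped, which is the case the message describes for computeKnownFPClass.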

This rsq handling also now starts handling the negated case. We
introduce rsq with an fneg. In the case the fneg doesn't fold into its
user, it's a neutral change but provides improvement if it is foldable
as a source modifier.

Also starts respecting the arcp attribute properly, and more strictly
interprets afn. We were previously interpreting afn as implying you
could do the reciprocal expansion of an fdiv. The codegen handling of
these also needs to be revisited.

This also effectively introduces the optimization
combineRepeatedFPDivisors enables, just done in the IR instead (and
only for f32).

This is almost across the board better. The one minor regression is
for gfx6/buggy frexp case where for multiple reciprocals, we could
previously reuse rematerialized constants per instance (it's neutral
for a single rcp).

The fdiv.fast and sqrt handling need to be revisited next.

https://reviews.llvm.org/D155593
2023-07-21 16:35:53 -04:00
Daniel Hoekwater
0315fca912 [AArch64] Move branch relaxation after bbsection assignment
Because branch relaxation needs to factor in if branches target
a block in the same section or a different one, it needs to run
after the Basic Block Sections / Machine Function Splitting passes.

Because Jump table compression relies on block offsets remaining
fixed after the table is compressed, we must also move the JT
compression pass.

The only tests affected are ones enforcing just the ordering and
a few that have basic block ids changed because RenumberBlocks
hasn't run yet.

Differential Revision: https://reviews.llvm.org/D153829
2023-07-21 20:24:52 +00:00
Matt Arsenault
37512d7629 AMDGPU: Add baseline test for fdiv combine
2023-07-21 16:04:12 -04:00
Simon Pilgrim
65c9153cf0 [X86] combineBitcastvxi1 - don't prematurely create PACKSS nodes.
Similar to Issue #63710 - by truncating the v8i16 result with a PACKSS node before type legalization, we fail to make use of various folds that rely on TRUNCATE nodes.

This required tweaks to LowerTruncateVecPackWithSignBits to recognise when the truncation source has been widened and to more closely match combineVectorSignBitsTruncation wrt truncating with PACKSS/PACKUS on AVX512 targets.

One of the last stages before we can finally get rid of combineVectorSignBitsTruncation.
2023-07-21 19:10:18 +01:00
Fangrui Song
9996e71f2d [Support] Implement LLVM_ENABLE_REVERSE_ITERATION for StringMap
ProgrammersManual.html says

> StringMap iteration order, however, is not guaranteed to be deterministic, so any uses which require that should instead use a std::map.

This patch makes -DLLVM_REVERSE_ITERATION=on (currently
-DLLVM_ENABLE_REVERSE_ITERATION=on works as well) shuffle StringMap
iteration order (actually flipping the hash so that elements not in the
same bucket are reversed) to catch violations, similar to D35043 for
DenseMap. This should help change the hash function (e.g., D142862,
D155781).
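
The effect of flipping the hash can be sketched with a simplified bucketed table. This Python model is an assumption-laden illustration, not the StringMap implementation (which probes an open-addressed table); FNV-1a stands in for the real hash function and the helper names are hypothetical:

```python
def fnv1a(key):
    """Deterministic 32-bit FNV-1a, standing in for the real hash function."""
    h = 0x811C9DC5
    for byte in key.encode():
        h = ((h ^ byte) * 0x01000193) & 0xFFFFFFFF
    return h

def bucket_of(key, num_buckets=8, reverse_iteration=False):
    """Bucket index; reverse iteration flips every bit of the hash first."""
    h = fnv1a(key)
    if reverse_iteration:
        h = ~h & 0xFFFFFFFF
    return h & (num_buckets - 1)

def iteration_order(keys, **opts):
    """Keys in bucket order; keys sharing a bucket keep insertion order."""
    return sorted(keys, key=lambda k: bucket_of(k, **opts))

keys = ["alpha", "beta", "gamma", "delta"]
forward = iteration_order(keys)
flipped = iteration_order(keys, reverse_iteration=True)
```

A test whose output depends on `forward` being the order would then fail under the flipped configuration, which is how such violations get caught.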

With a lot of fixes, there are still some violations. This patch
implements the "reverse_iteration" lit feature to skip such tests.
Eventually we should remove this feature.

`ninja check-{llvm,clang,clang-tools}` are clean with
`#define LLVM_ENABLE_REVERSE_ITERATION 1`.

Reviewed By: jhenderson

Differential Revision: https://reviews.llvm.org/D155789
2023-07-21 08:46:51 -07:00
Fangrui Song
ffa829c4c5 [RISCV] Allow delayed decision for ADD/SUB relocations
For a label difference `A-B` in assembly, if A and B are separated by a
linker-relaxable instruction, we should emit a pair of ADD/SUB
relocations (e.g. R_RISCV_ADD32/R_RISCV_SUB32,
R_RISCV_ADD64/R_RISCV_SUB64).

However, the decision is made upfront at parsing time with inadequate
heuristics (`requiresFixups`). As a result, the LLVM integrated assembler
incorrectly suppresses R_RISCV_ADD32/R_RISCV_SUB32 for the following
code:
```
// Simplified from a workaround https://android-review.googlesource.com/c/platform/art/+/2619609
// Both end and begin are not defined yet. We decide ADD/SUB relocations upfront and don't know they will be needed.
.4byte end-begin

begin:
  call foo
end:
```
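
Why folding `end-begin` too early is wrong can be sketched with a toy layout model in Python. The helper `label_offsets` is hypothetical, and the sizes assume the `call` is emitted as an 8-byte auipc+jalr pair that the linker relaxes to a 4-byte jal:

```python
def label_offsets(instr_sizes, labels):
    """Map each label to its byte offset given per-instruction sizes."""
    offsets, pos = {}, 0
    for index, size in enumerate(instr_sizes):
        for name in labels.get(index, []):
            offsets[name] = pos
        pos += size
    for name in labels.get(len(instr_sizes), []):
        offsets[name] = pos
    return offsets

# `begin` precedes instruction 0 (the call); `end` follows it.
labels = {0: ["begin"], 1: ["end"]}
before = label_offsets([8], labels)  # layout the assembler sees
after = label_offsets([4], labels)   # layout after linker relaxation

folded_early = before["end"] - before["begin"]  # constant baked in by the assembler
resolved_late = after["end"] - after["begin"]   # value an ADD/SUB pair yields at link time
```

The two values differ, so the difference has to be left to the linker via a relocation pair whenever a relaxable instruction sits between the labels.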

To fix the bug, make two primary changes:

* Delete `requiresFixups` and the overridden emitValueImpl (from D103539).
  This deletion requires accurate evaluateAsAbsolute (D153097).
* In MCAssembler::evaluateFixup, call handleAddSubRelocations to emit
  ADD/SUB relocations.

However, there is a remaining issue in
MCExpr.cpp:AttemptToFoldSymbolOffsetDifference. With MCAsmLayout, we may
incorrectly fold A-B even when A and B are separated by a
linker-relaxable instruction. This deficiency is acknowledged (see
D153097), but was previously bypassed by eagerly emitting ADD/SUB using
`requiresFixups`. To address this, we partially reintroduce `canFold` (from
D61584, removed by D103539).

Some expressions (e.g. .size and .fill) need to take the `MCAsmLayout`
code path in AttemptToFoldSymbolOffsetDifference, avoiding relocations
(weird, but matching GNU assembler and needed to match user
expectation). Switch to evaluateKnownAbsolute to leverage the `InSet`
condition.

As a bonus, this change allows for the removal of some relocations for
the FDE `address_range` field in the .eh_frame section.

riscv64-64b-pcrel.s contains the main test.
Add a linker relaxable instruction to dwarf-riscv-relocs.ll to test what
it intends to test.
Merge fixups-relax-diff.ll into fixups-diff.ll.

Reviewed By: kito-cheng

Differential Revision: https://reviews.llvm.org/D155357
2023-07-21 08:37:58 -07:00
Phoebe Wang
fbae3d1d3c Revert "[X86][BF16] Do not scalarize masked load for BF16 when we have BWI"
This reverts commit ca1c05208ed35ba72869c65ad773b2cca4bbd360.

It caused a Buildbot failure: https://lab.llvm.org/buildbot#builders/220/builds/24870
2023-07-21 23:29:11 +08:00
Phoebe Wang
ca1c05208e [X86][BF16] Do not scalarize masked load for BF16 when we have BWI
Fixes #63017

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D155952
2023-07-21 23:18:54 +08:00
Jay Foad
6c8f4472b4 [ARM] Extend regression test for D154281
Add a test case with a larger call frame which does not satisfy
ARMFrameLowering::hasReservedCallFrame.
2023-07-21 15:48:45 +01:00
Simon Pilgrim
ae60706da0 [DAG] SimplifyDemandedBits - call ComputeKnownBits for constant non-uniform ISD::SRL shift amounts
We only attempted to determine KnownBits for uniform constant shift amounts, but ComputeKnownBits is able to handle some non-uniform cases as well that we can use as a fallback.
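
As a rough illustration of what can still be known for a non-uniform shift, the sketch below computes a bound that holds across all lanes. It is a simplified Python model with a hypothetical function name, not the ComputeKnownBits logic itself:

```python
def known_leading_zeros_srl(bit_width, shift_amounts):
    """Leading zero bits guaranteed in every lane after a logical right shift."""
    # Each lane gains at least its own shift amount in leading zeros, so the
    # bound that holds for all lanes is the smallest amount.
    return min(min(shift_amounts), bit_width)

uniform = known_leading_zeros_srl(32, [5, 5, 5, 5])
non_uniform = known_leading_zeros_srl(32, [3, 8, 5, 12])
```

The non-uniform case gives a weaker but still useful bound, which is why it works as a fallback.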
2023-07-21 14:52:57 +01:00
Simon Pilgrim
be62041e7e [X86] matchBinaryShuffle - match PACKUS for v2i64 -> v4i32 shuffle truncation patterns.
Handle PACKUSDW on +SSE41 targets, or fall back to PACKUSWB on any +SSE2 target
2023-07-21 13:32:04 +01:00
Simon Pilgrim
c0a1f4624b [X86] Add packus.ll test coverage
Similar to the existing packss.ll tests
2023-07-21 13:32:04 +01:00
Simon Pilgrim
7196eb2541 [X86] packss.ll - add SSE4.2 test coverage
2023-07-21 13:32:03 +01:00
Jay Foad
e45a0c2994 [AMDGPU][RFC] Update isLegalAddressingMode for GFX9 SMEM signed offsets
Differential Revision: https://reviews.llvm.org/D155587
2023-07-21 10:56:43 +01:00
Jay Foad
787bef0bee [AMDGPU] Add tests for SMEM addressing modes in CodeGenPrepare
Differential Revision: https://reviews.llvm.org/D155854
2023-07-21 10:56:43 +01:00
Luke Lau
33a83c5486 [RISCV] Add SDNode patterns for vrol.[vv,vx] and vror.[vv,vx,vi]
These correspond to ROTL/ROTR nodes

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D155439
2023-07-21 10:22:46 +01:00
Luke Lau
24628a14c4 [RISCV] Add patterns for vnsr[a,l].wx where shift amount has different type than vector element
We're currently only matching scalar shift amounts where the type is the same
as the vector element type. But because only the bottom log2(2*SEW) bits are
used, only 7 bits will be used at most so we can use any scalar type >= i8.
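
The bit-count arithmetic can be checked with a short Python sketch; the helper names are hypothetical and SEW values are assumed to be the usual powers of two from 8 to 64:

```python
def shift_amount_bits(sew):
    """Number of low bits of the scalar that a narrowing shift reads."""
    return (2 * sew).bit_length() - 1  # log2(2*SEW) for power-of-two SEW

def effective_shift(amount, sew):
    """Mask the scalar down to the bits that are actually used."""
    return amount & (2 * sew - 1)

bits_by_sew = {sew: shift_amount_bits(sew) for sew in (8, 16, 32, 64)}
masked = effective_shift(0x1FF, 64)
```

The largest case, SEW=64, reads 7 bits, so an i8 scalar already holds every bit that matters.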

This patch adds patterns for the case above, as well as for when the shift
amount type is the same as the widened element type and doesn't need to be extended.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D155698
2023-07-21 10:13:28 +01:00
Luke Lau
418e678ba3 [RISCV] Add tests for vnsr[l,a].wx patterns that could be matched
These patterns of ([l,a]shr v, ([s,z]ext splat)) only pick up the cases where
the scalar has the same type as the vector element. However since only the low
log2(SEW) bits of the scalar are read, we could use any scalar type that has
been extended.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D155697
2023-07-21 10:13:26 +01:00
Nikita Popov
f060f095aa [X86] Expand constant expressions in test (NFC)
2023-07-21 10:40:47 +02:00
Nikita Popov
086ee99564 Reapply [IR] Mark and constant expressions as undesirable
Reapply after fixing an issue in canonicalizeLogicFirst() exposed
by this change (218f97578b26f7a89f7f8ed0748c31ef0181f80a).

-----

In preparation for removing support for and expressions, mark them
as undesirable. As such, we will no longer implicitly create such
expressions, but they still exist.
2023-07-21 10:10:50 +02:00