52796 Commits

Author SHA1 Message Date
David Majnemer
9eff001d3d [TargetLowering] Correctly yield NaN from FP_TO_BF16
We didn't set the exponent field, resulting in tiny numbers instead of
NaNs.
2024-02-21 22:17:02 +00:00
David Majnemer
ddc0f1d8fe [TargetLowering] Actually add the adjustment to the significand
The logic was supposed to choose between {0, 1, -1} as an
adjustment to the FP bit pattern. However, the adjustment itself was
used as the bit pattern instead, which resulted in garbage results.
2024-02-21 19:34:11 +00:00
David Majnemer
cc13f3ba45
Correctly round FP -> BF16 when SDAG expands such nodes (#82399)
We did something pretty naive:
- rounded FP64 -> BF16 by first rounding to FP32
- skipped FP32 -> BF16 rounding entirely
- took the top 16 bits of an FP32, which turns some NaNs into
  infinities

Let's do this in a more principled way by rounding types with more
precision than FP32 to FP32 using round-inexact-to-odd, which avoids
double-rounding errors.
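
A scalar C++ sketch of the scheme described above, for illustration only: FP64 is first rounded to FP32 with round-inexact-to-odd (truncate, then force the last bit to 1 if anything was discarded), and FP32 is then rounded to BF16 with round-to-nearest-even. The usual argument for round-to-odd is that the intermediate format needs at least two more significand bits than the final one, which FP32 (24 bits) has over BF16 (8 bits). The function names and the NaN policy (keep sign and exponent, force the quiet bit) are assumptions of this sketch, not the SelectionDAG expansion itself; it also assumes an IEEE 754 platform in the default rounding mode.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Bit-cast helpers; memcpy is the portable way to reinterpret FP bits.
static uint32_t floatBits(float f) { uint32_t u; std::memcpy(&u, &f, sizeof u); return u; }
static float floatFromBits(uint32_t u) { float f; std::memcpy(&f, &u, sizeof f); return f; }

// FP32 -> BF16 with round-to-nearest-even. Truncating a NaN can drop every
// set mantissa bit and yield an infinity, so NaNs are handled first by
// keeping sign and exponent and forcing the quiet bit.
uint16_t floatToBF16(float f) {
  uint32_t u = floatBits(f);
  if (std::isnan(f))
    return static_cast<uint16_t>((u >> 16) | 0x0040);
  uint32_t lsb = (u >> 16) & 1;  // last bit kept in the BF16 result
  u += 0x7FFF + lsb;             // carries into bit 16 iff we must round up
  return static_cast<uint16_t>(u >> 16);
}

// FP64 -> FP32 with round-inexact-to-odd: truncate toward zero, then set the
// last bit if the conversion was inexact, so the next rounding still sees it.
float doubleToFloatRoundToOdd(double d) {
  float f = static_cast<float>(d);          // default rounding (nearest-even)
  if (std::isnan(d) || static_cast<double>(f) == d)
    return f;                               // NaN or exact: nothing to fix
  if (std::fabs(static_cast<double>(f)) > std::fabs(d))
    f = std::nextafter(f, 0.0f);            // step back to the truncated value
  return floatFromBits(floatBits(f) | 1u);  // jam the sticky bit
}

uint16_t doubleToBF16(double d) { return floatToBF16(doubleToFloatRoundToOdd(d)); }
```

For d = 1 + 2^-8 + 2^-30, rounding to FP32 first lands exactly on the BF16 halfway point and ties-to-even then rounds down to 0x3F80, while the round-to-odd path correctly yields 0x3F81. Likewise, the NaN with bit pattern 0x7F800001 truncates to 0x7F80 (+infinity in BF16), whereas `floatToBF16` returns 0x7FC0.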
2024-02-21 12:37:02 -05:00
Luke Lau
2cd59bdc89 [RISCV] Add test case for miscompile in gather -> strided load combine. NFC
This shows the issue in #82430, but triggers it via the widening SEW combine
rather than a GEP that RISCVGatherScatterLowering doesn't detect.
2024-02-22 00:30:38 +08:00
Jonas Paulsson
9c0e45d7f0
[SystemZ] Use VT (not ArgVT) for SlotVT in LowerCall(). (#82475)
When an integer argument is promoted and *not* split (like i72 -> i128 on
a new machine with vector support), the SlotVT should be i128, which is
stored in VT - not ArgVT.

Fixes #81417
2024-02-21 16:26:16 +01:00
Dinar Temirbulatov
5a023f564f
[AArch64][SVE2] Enable dynamic shuffle for fixed length types. (#72490)
When the SVE register size is unknown, or the minimum size differs from
the maximum size, we can determine the actual SVE register size at
runtime and adjust the shuffle mask accordingly.
2024-02-21 14:59:47 +00:00
Momchil Velikov
1a7166833d
[AArch64] Fix stack probing clobbering flags (#81879)
Certain stack probing sequences may clobber flags, so we can't use a
block as a prologue if the flags register is live-in on entry to that
block.
2024-02-21 13:58:04 +00:00
Simon Pilgrim
b8c9b06134 [X86] LowerCTPOP - add i3 and i4 LUT 'shift+mask' expansions
Use the 3 or 4 active bits as a shift amount into an i32/i64 constant representing the number of set bits.

In the future, it might be worthwhile to move this into a generic location in case other targets want to make use of them.

Another expansion pulled from #79823
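
A scalar C++ sketch of the 'shift+mask' LUT idea, for illustration only: each input value selects a small bit-field of a constant that holds its popcount. The entry widths (2 bits per entry for i3, 4 bits per entry for i4) and the constants below were derived for this sketch and are not copied from the patch; the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Popcounts of 0..7 packed as 2-bit entries: entry k lives at bits [2k+1:2k].
constexpr uint32_t kPop3LUT = 0xE994;  // 0b11'10'10'01'10'01'01'00
// Popcounts of 0..15 packed as 4-bit entries: entry k lives at bits [4k+3:4k].
constexpr uint64_t kPop4LUT = 0x4332322132212110ULL;

unsigned ctpop3(unsigned x) {
  assert(x < 8 && "only the low 3 bits may be set");
  return (kPop3LUT >> (x * 2)) & 0x3;  // shift the entry down, mask it out
}

unsigned ctpop4(unsigned x) {
  assert(x < 16 && "only the low 4 bits may be set");
  return static_cast<unsigned>((kPop4LUT >> (x * 4)) & 0x7);
}
```

The i3 table fits in 16 bits and the i4 table needs all 64, which is consistent with the i32/i64 constants mentioned above.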
2024-02-21 13:53:47 +00:00
Simon Pilgrim
98a07f72ee [X86] LowerCTPOP - "ctpop(i2 x) --> sub(x, (x >> 1))"
If we only have 2 active bits, then we can avoid the i8 CTPOP multiply expansion entirely.

Another expansion pulled from #79823
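
A quick C++ check of the identity (the helper name is hypothetical): bit 1 contributes 2 to the value of x but only 1 to its population count, so subtracting x >> 1 removes the excess.

```cpp
#include <cassert>
#include <cstdint>

// ctpop for a value with only its low 2 bits possibly set.
uint8_t ctpop2(uint8_t x) {
  assert(x < 4 && "only the low 2 bits may be set");
  return static_cast<uint8_t>(x - (x >> 1));  // 0->0, 1->1, 2->1, 3->2
}
```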
2024-02-21 13:53:47 +00:00
chuongg3
0fb3d4296f
[AArch64][GlobalISel] Refactor BITCAST Legalization (#80505)
Ensure BITCAST is only legal for types with the same number of bits.
Enable BITCAST to work with non-legal vector types as well.
2024-02-21 13:24:45 +00:00
hev
dd3e0a4643
[LoongArch] Assume no-op addrspacecasts by default (#82332)
This PR indicates that `addrspacecasts` are always no-ops on LoongArch.

Fixes #82330
2024-02-21 21:15:17 +08:00
Chia
c50ca3daa4
[RISCV][ISel] Combine vector fadd/fsub/fmul with fp extend. (#81248)
Extend D133739 and #76785 to support vector widening floating-point
add/sub/mul instructions.

Specifically, this patch handles the following case:

### Source code
```
define void @vfwmul_v2f32_multiple_users(ptr %x, ptr %y, ptr %z, <2 x float> %a, <2 x float> %b, <2 x float> %b2) {
  %c = fpext <2 x float> %a to <2 x double>
  %d = fpext <2 x float> %b to <2 x double>
  %d2 = fpext <2 x float> %b2 to <2 x double>
  %e = fmul <2 x double> %c, %d
  %f = fadd <2 x double> %c, %d2
  %g = fsub <2 x double> %d, %d2
  store <2 x double> %e, ptr %x
  store <2 x double> %f, ptr %y
  store <2 x double> %g, ptr %z
  ret void
}
```

### Before this patch
[Compiler Explorer](https://godbolt.org/z/aaEMs5s9h)
```
vfwmul_v2f32_multiple_users:
        vsetivli        zero, 2, e32, mf2, ta, ma
        vfwcvt.f.f.v    v11, v8
        vfwcvt.f.f.v    v8, v9
        vfwcvt.f.f.v    v9, v10
        vsetvli zero, zero, e64, m1, ta, ma
        vfmul.vv        v10, v11, v8
        vfadd.vv        v11, v11, v9
        vfsub.vv        v8, v8, v9
        vse64.v v10, (a0)
        vse64.v v11, (a1)
        vse64.v v8, (a2)
        ret
```

### After this patch
```
vfwmul_v2f32_multiple_users:
        vsetivli zero, 2, e32, mf2, ta, ma
        vfwmul.vv v11, v8, v9
        vfwadd.vv v12, v8, v10
        vfwsub.vv v8, v9, v10
        vse64.v v11, (a0)
        vse64.v v12, (a1)
        vse64.v v8, (a2)
        ret
```
2024-02-21 22:06:40 +09:00
Simon Pilgrim
3cb4f62de0 [X86] Regenerate vector tests to add missing avx512 constant broadcast comments 2024-02-21 10:46:12 +00:00
Simon Pilgrim
bdeb3d47d1 [X86] Regenerate saddsat/ssubsat vector tests
Adds missing avx512 constant broadcast comments
2024-02-21 10:46:12 +00:00
Nick Anderson
5db49f7266
[GlobalISel] replace right identity X * -1.0 with fneg(x) (#80526)
Follow-up patch to #78673.

@Pierre-vh @jayfoad @arsenm Could you review when you have a chance.
2024-02-21 09:41:59 +00:00
Tuan Chuong Goh
1ff1e82383 [AArch64][GlobalISel] Pre-Commit Tests for Refactor BITCAST 2024-02-21 09:17:05 +00:00
Yingwei Zheng
02fad0565f
[RISCV][SDAG] Fold select c, ~x, x into xor -c, x (#82462)
This patch lowers select of constants if `TrueV == ~FalseV`.
Addresses the comment in
https://github.com/llvm/llvm-project/pull/82456#discussion_r1496881603.
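
The fold relies on -c being either 0 or all-ones, assuming (as this sketch does) that c is a 0/1 boolean zero-extended to XLEN. A minimal C++ model with hypothetical names:

```cpp
#include <cassert>
#include <cstdint>

// select c, ~x, x
uint64_t selectNotOrSame(bool c, uint64_t x) { return c ? ~x : x; }

// xor -c, x: negating a zero-extended 0/1 boolean gives 0 or all-ones,
// so the xor either leaves x unchanged or flips every bit.
uint64_t xorNegC(bool c, uint64_t x) {
  uint64_t mask = 0 - static_cast<uint64_t>(c);
  return mask ^ x;
}
```

The branch-free form needs only a negate and an xor, with no select or branch on c.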
2024-02-21 16:27:43 +08:00
Owen Anderson
44b717df4d
[GlobalISel] Clamp out-of-range G_EXTRACT_VECTOR_ELT constant indices when converting them into loads. (#82460)
This avoids turning a poison value into a segfault, and fixes
https://github.com/llvm/llvm-project/issues/78383
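
A scalar C++ model of why clamping is safe, with hypothetical names: an out-of-range extract is poison, so returning any in-bounds element is a valid result, whereas using the raw index as an address offset could read outside the vector's memory and fault. This sketch clamps to the last element; which in-range index the patch actually picks is not stated here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Clamp a constant element index into [0, numElts - 1].
std::size_t clampEltIndex(std::size_t idx, std::size_t numElts) {
  assert(numElts > 0 && "vector must have at least one element");
  return idx < numElts ? idx : numElts - 1;
}

// Model of extract_vector_elt(load vec, idx) after it is rewritten into a
// scalar load from vec + idx * sizeof(element).
int32_t extractViaLoad(const int32_t *vec, std::size_t numElts, std::size_t idx) {
  return vec[clampEltIndex(idx, numElts)];
}
```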
2024-02-21 00:42:22 -05:00
Sameer Sahasrabuddhe
a2afcd5721 Revert "Implement convergence control in MIR using SelectionDAG (#71785)"
This reverts commit 79889734b940356ab3381423c93ae06f22e772c9.

Encountered multiple buildbot failures.
2024-02-21 11:07:02 +05:30
Wang Pengcheng
b8ed69ecc0 [RISCV] Support llvm.readsteadycounter intrinsic
This intrinsic was introduced by #81331 and is similar to
`llvm.readcyclecounter`.

For the RISCV implementation, we rename the `ReadCycleWide` pseudo to
`ReadCounterWide` and make it accept two operands (the low and high
parts of the counter). For legalization and lowering, we reuse the
`ISD::READCYCLECOUNTER` code (extended to handle both intrinsics), and
we use the `time` CSR for `ISD::READSTEADYCOUNTER`.
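
On RV32 the 64-bit counter must be read as two 32-bit CSR halves. Below is a minimal C++ sketch of the read-high / read-low / re-read-high retry loop, with the CSR reads abstracted as callbacks (a hypothetical interface). That the `ReadCounterWide` expansion uses exactly this loop is an assumption, based on the sequence the RISC-V unprivileged spec recommends for reading 64-bit counters on RV32.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

// If the low half wraps between the two reads of the high half, the high
// halves differ and we retry, so the returned pair is always consistent.
uint64_t readCounterWide(const std::function<uint32_t()> &readHi,
                         const std::function<uint32_t()> &readLo) {
  for (;;) {
    uint32_t hi = readHi();
    uint32_t lo = readLo();
    uint32_t hi2 = readHi();
    if (hi == hi2)
      return (static_cast<uint64_t>(hi) << 32) | lo;
  }
}
```

With a simulated counter at 0x1FFFFFFFF that ticks on every low read, the first attempt sees the high half change from 1 to 2 and retries; the second attempt returns 0x200000000.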

Tests using Clang builtins were run on real hardware and work
as expected.

Reviewers: asb, MaskRay, dtcxzyw, preames, topperc, jhuber6

Reviewed By: jhuber6, asb, MaskRay, dtcxzyw

Pull Request: https://github.com/llvm/llvm-project/pull/82322
2024-02-21 13:12:14 +08:00
Sameer Sahasrabuddhe
79889734b9
Implement convergence control in MIR using SelectionDAG (#71785)
LLVM function calls carry convergence control tokens as operand bundles, where
the tokens themselves are produced by convergence control intrinsics. This patch
implements convergence control tokens in MIR as follows:

1. Introduce target-independent ISD opcodes and MIR opcodes for convergence
   control intrinsics.
2. Model token values as untyped virtual registers in MIR.

The change also introduces an additional ISD opcode CONVERGENCECTRL_GLUE and a
corresponding machine opcode with the same spelling. This glues the convergence
control token to SDNodes that represent calls to intrinsics. The glued token is
later translated to an implicit argument in the MIR.

The lowering of calls to user-defined functions is target-specific. On AMDGPU,
the convergence control operand bundle at a non-intrinsic call is translated to
an explicit argument to the SI_CALL_ISEL instruction. Post-selection adjustment
converts this explicit argument to an implicit argument on the SI_CALL
instruction.
2024-02-21 10:06:37 +05:30
Michal Paszkowski
03203b79c6
[SPIR-V] Fix vloadn OpenCL builtin lowering (#81148)
This pull request fixes an issue with missing vector element count
immediate in OpExtInst calls and adds a case for generating bitcasts
before GEPs for kernel arguments of non-matching pointer type. The new
lit tests are based on the basic/vload_local and basic/vload_global OpenCL CTS
tests. After this change, the tests pass SPIR-V validation.
2024-02-20 20:04:04 -08:00
Owen Anderson
c02b0d008c
[GlobalISel] Make sure to check for load barriers when merging G_EXTRACT_VECTOR_ELT into G_LOAD. (#82306)
Fixes https://github.com/llvm/llvm-project/issues/78477
2024-02-20 22:42:14 -05:00
Sumanth Gundapaneni
1219214a3b
[Hexagon] Update InstrInfo to include LD/ST offsets of vector instructions (#82386)
The hook HexagonInstrInfo::isValidOffset() is updated to evaluate
offsets of LD/ST vector instructions that it previously missed.
2024-02-20 15:29:05 -06:00
Valery Pykhtin
807ed697be
[AMDGPU] Use autogenerated test checks for sdwa-preserve.mir test. NFC. (#82380) 2024-02-20 20:05:44 +01:00
Yuta Saito
ba3c1f9ce3
[WebAssembly] Add segment RETAIN flag to support private retained data (#81539)
In WebAssembly, we have `WASM_SYMBOL_NO_STRIP` symbol flag to mark the
referenced content as retained. However, the flag is not enough to
express retained data that is not referenced by any symbol. This patch
adds a new segment flag `WASM_SEG_FLAG_RETAIN` to support "private"
linkage data that is retained by llvm.used.

This kind of data that is not referenced but must be retained is usually
used with encapsulation symbols (__start/__stop). Swift runtime uses
this technique and depends on the fact "all metadata sections in live
objects are retained", which was not guaranteed with `--gc-sections`
before this patch.

This is a revised version of https://reviews.llvm.org/D126950 (which was
reverted), based on @MaskRay's comments.
2024-02-21 03:35:36 +09:00
Caroline Concatto
48af281f7a Revert "[AArch64] Restore Z-registers before P-registers (#79623)"
This reverts commit 3f0404aae7ed2f7138526e1bcd100a60dfe08227.

std::reverse is breaking some builds
2024-02-20 18:13:33 +00:00
Caroline Concatto
7af70643ca Revert "[AArch64] Remove unused ReverseCSRRestoreSeq option. (#82326)"
Patch 3f0404aae7ed2 is breaking some debug builds, so we cannot use the reverse here.

This reverts commit 493f10106f7f1799eb67be95058b251e6a3bf0af.
2024-02-20 18:13:33 +00:00
Simon Pilgrim
066773c411 [X86] computeKnownBitsForTargetNode - add generic handling of PSHUFB
When PSHUFB is used as a LUT (for CTPOP, BITREVERSE etc.), it's the source operand that is constant and the index operand that is variable. As long as the indices don't set the MSB (which zeros the output element), the common known bits from the source operand can be used directly, even though the shuffle mask isn't constant.

Further helps to improve CTPOP reduction codegen
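
A scalar C++ sketch of the known-bits reasoning, assuming a constant 16-byte LUT and index bytes whose MSB is known to be clear. The struct and function names are hypothetical and do not mirror LLVM's `KnownBits` API. If an index byte could have its MSB set, that lane could be zero instead, so no bit could be known to be one.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Bits known to be 0 / known to be 1 in one result byte.
struct ByteKnownBits {
  uint8_t zero;
  uint8_t one;
};

// Every result byte of pshufb(lut, idx) with idx bytes in [0, 15] is some
// lut[i], so a bit is known only if it has the same value in all 16 entries.
ByteKnownBits knownBitsOfLUTShuffle(const std::array<uint8_t, 16> &lut) {
  ByteKnownBits kb{0xFF, 0xFF};
  for (uint8_t b : lut) {
    kb.one &= b;                          // set in every entry
    kb.zero &= static_cast<uint8_t>(~b);  // clear in every entry
  }
  return kb;
}
```

For the nibble-popcount LUT {0, 1, 1, 2, ..., 4}, every entry is at most 4, so the top five bits of each result byte are known zero (mask 0xF8) even though the indices are unknown.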
2024-02-20 17:14:49 +00:00
Simon Pilgrim
2f1e33df32 [X86] Fold add(psadbw(X,0),psadbw(Y,0)) -> psadbw(add(X,Y),0)
If the vXi8 add(X,Y) is guaranteed not to overflow, then we can push the addition through the psadbw nodes (being used for reduction) and only need a single psadbw node.

Noticed while working on CTPOP reduction codegen
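
A scalar C++ model of one 64-bit PSADBW lane against zero, with hypothetical helper names. It shows why the fold needs the no-overflow condition: psadbw(X, 0) is just the byte sum, so the two sides agree exactly when no byte of add(X, Y) wraps.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using Bytes8 = std::array<uint8_t, 8>;

// psadbw(X, 0) for one 64-bit lane: sum of |x[i] - 0|, i.e. the byte sum.
uint16_t psadbwZero(const Bytes8 &x) {
  uint16_t sum = 0;
  for (uint8_t b : x)
    sum = static_cast<uint16_t>(sum + b);
  return sum;
}

// vXi8 add; returns false if any byte lane wrapped around.
bool addBytesNoWrap(const Bytes8 &x, const Bytes8 &y, Bytes8 &out) {
  bool noWrap = true;
  for (std::size_t i = 0; i < x.size(); ++i) {
    unsigned s = unsigned{x[i]} + unsigned{y[i]};
    noWrap = noWrap && s <= 0xFF;
    out[i] = static_cast<uint8_t>(s);
  }
  return noWrap;
}
```

With every byte of X at 200 and every byte of Y at 100, each lane wraps to 44, and the single psadbw would report 352 instead of 2400.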
2024-02-20 15:58:29 +00:00
Sander de Smalen
493f10106f
[AArch64] Remove unused ReverseCSRRestoreSeq option. (#82326)
This patch removes the `-reverse-csr-restore-seq` option from
AArch64FrameLowering, since this is no longer used.
2024-02-20 15:08:06 +00:00
stephenpeckham
26db845536
[XCOFF] Support the subtype flag in DWARF section headers (#81667)
The section headers for XCOFF files have a subtype flag for DWARF
sections. This PR updates obj2yaml, yaml2obj, and llvm-readobj so that
they recognize the subtype.
2024-02-20 08:42:12 -06:00
Shilei Tian
2ad43fa467
[AMDGPU] Fix operand types for V_DOT2_F32_BF16 (#82044) 2024-02-20 08:25:01 -05:00
Krasimir Georgiev
49a8fc0da4 Revert "[Hexagon] Optimize post-increment load and stores in loops. (#82011)"
This reverts commit 0e6a48c3e8cc53f9eb5945ec04f8e03f6d2bae37.

Temporary revert as it causes bad codegen: https://github.com/llvm/llvm-project/pull/82011#issuecomment-1951426107
2024-02-20 12:15:23 +00:00
David Green
1b12974ccb
[AArch64][AMDGPU][GlobalISel] Remove vector handling from unmerge_dead_to_trunc (#82224)
This combine transforms an unmerge where only the first element is used
into a truncate. That works well for scalars, but for vectors it needs to
insert a bitcast to integers, perform the truncate, then bitcast back to
vectors. This generates more awkward code than using an Unmerge.
2024-02-20 10:54:44 +00:00
Thorsten Schütt
63a4b4f610
[GlobalIsel] Combine logic of floating point compares (#81886)
It is purely based on symmetry. Registers can be scalars, vectors, and
non-constants.

X < 5.0 || X > 5.0
  ->
X != 5.0 (ordered, FCMP_ONE: false if X is NaN)

X < Y && X > Y
  ->
  FCMP_FALSE

X < Y || !(X < Y)
  ->
  FCMP_TRUE

see InstCombinerImpl::foldLogicOfFCmps
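
A C++ sketch of why these folds are "purely based on symmetry": with the 4-bit predicate encoding LLVM uses for FCMP_* (1 = equal, 2 = greater, 4 = less, 8 = unordered), two compares of the same operands combine by AND-ing or OR-ing their codes. The enum and functions below are a hypothetical stand-in, not LLVM's API.

```cpp
#include <cassert>
#include <cmath>

// Same bit layout as LLVM's FCMP_* predicates.
enum FPred : unsigned {
  P_FALSE = 0, P_OEQ = 1, P_OGT = 2, P_OGE = 3, P_OLT = 4, P_OLE = 5,
  P_ONE = 6, P_ORD = 7, P_UNO = 8, P_UEQ = 9, P_UGT = 10, P_UGE = 11,
  P_ULT = 12, P_ULE = 13, P_UNE = 14, P_TRUE = 15
};

// The single outcome bit that holds for (a, b).
unsigned outcomeBit(double a, double b) {
  if (std::isnan(a) || std::isnan(b)) return 8;
  if (a == b) return 1;
  return a > b ? 2 : 4;
}

// A predicate is true iff it contains the outcome bit.
bool evalPred(unsigned pred, double a, double b) { return (pred & outcomeBit(a, b)) != 0; }

// Same operands on both sides, so && / || fold onto the predicate codes.
unsigned foldAnd(unsigned p, unsigned q) { return p & q; }
unsigned foldOr(unsigned p, unsigned q) { return p | q; }
```

For example, OLT | OGT = 4 | 2 = 6 = ONE, OLT & OGT = 0 = FALSE, and OLT | UGE = 4 | 11 = 15 = TRUE, since UGE is exactly the negation of OLT.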
2024-02-20 09:56:33 +01:00
Yeting Kuo
61ae7e4982
[RISCV] Select pattern (shl (sext_vl/zext_vl), 1) to VWADD/VWADDU. (#82225)
We already had similar selection patterns for (shl (ext)) and
(shl_vl (ext_vl)).
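
A scalar C++ check of the identity behind this pattern, with hypothetical function names: doubling an extended value equals adding two extended copies, and the result always fits in 2*SEW, so (shl (sext x), 1) can become vwadd.vv x, x and (shl (zext x), 1) can become vwaddu.vv x, x. The signed shift is done on the unsigned representation to avoid shifting a negative value; converting back assumes two's complement.

```cpp
#include <cassert>
#include <cstdint>

// (shl (sext x), 1) at 2*SEW.
int32_t shlOfSext(int16_t x) {
  return static_cast<int32_t>(static_cast<uint32_t>(int32_t{x}) << 1);
}

// vwadd.vv x, x: both operands sign-extended, sum computed at 2*SEW.
int32_t wideningAdd(int16_t a, int16_t b) { return int32_t{a} + int32_t{b}; }

// Unsigned counterparts: (shl (zext x), 1) and vwaddu.vv x, x.
uint32_t shlOfZext(uint16_t x) { return uint32_t{x} << 1; }
uint32_t wideningAddU(uint16_t a, uint16_t b) { return uint32_t{a} + uint32_t{b}; }
```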
2024-02-20 09:23:31 +08:00
Michael Maitland
44a46a0b68
[RISCV][GISEL] Add IRTranslation for insertelement with scalable vector type (#80377)
This patch is stacked on #80372, #80307, and #80306.
2024-02-19 15:30:48 -05:00
Vyacheslav Levytskyy
66ebda46fc
Add support for the SPIR-V extension SPV_KHR_uniform_group_instructions (#82064)
This PR adds support for the SPIR-V extension
SPV_KHR_uniform_group_instructions, which adds new instructions to SPIR-V
to support additional group operations within uniform control flow.
2024-02-19 21:30:31 +01:00
Craig Topper
f8cbb67b10
[DAGCombiner] Preserve nneg flag from inner zext when we combine (z/s/aext (zext X)) (#82199) 2024-02-19 12:21:17 -08:00
Vyacheslav Levytskyy
8e8f9c0bc0
fix generation of unnecessary OpExecutionMode records (#81839)
The SPIR-V backend generates unnecessary OpExecutionMode records for
ids that are not the Entry Point operands of an OpEntryPoint
(ref: https://github.com/llvm/llvm-project/issues/81753). This PR
fixes the issue.
2024-02-19 20:51:32 +01:00
Craig Topper
2426055a64 [RISCV] Add more zext nneg tests. NFC
This adds additional tests for #82199.

These tests need us to propagate the nneg flag when we zero/sign
extend an existing zext nneg node. For these tests on RV64, call
lowering will need to sign extend or zero extend the existing zext
nneg to i64. getNode will fold this into a single zext. We should
propagate the nneg flag from the original zext nneg. This will allow
us to remove the zext nneg based on known sign bits during DAG combine.
2024-02-19 11:09:43 -08:00
Craig Topper
f668a08e00
[DAGCombiner][RISCV] Optimize (zext nneg (truncate X)) if X has known sign bits. (#82227)
This treats the zext nneg as sext if X is known to have sufficient sign
bits to allow the zext, the truncate, or both to be removed. This code is
taken from the same optimization for sext.
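
A scalar C++ sketch of the "both removed" case for an i64 -> i16 -> i64 round trip, with hypothetical names. Because nneg promises the i16 value is non-negative, zext equals sext, and sext(trunc X) is the identity whenever X has more than 64 - 16 = 48 sign bits (the quantity SelectionDAG's ComputeNumSignBits reports). The other cases from the patch, where only one of the two nodes goes away, are not modeled here.

```cpp
#include <cassert>
#include <cstdint>

// Number of leading bits equal to the sign bit, including the sign bit.
unsigned numSignBits(int64_t x) {
  uint64_t u = static_cast<uint64_t>(x);
  uint64_t sign = u >> 63;
  unsigned n = 1;
  for (int bit = 62; bit >= 0 && ((u >> bit) & 1) == sign; --bit)
    ++n;
  return n;
}

// zext nneg (trunc X to i16) back to i64.
int64_t zextNnegTrunc16(int64_t x) {
  uint16_t t = static_cast<uint16_t>(x);
  assert((t & 0x8000) == 0 && "nneg flag violated");
  return static_cast<int64_t>(uint64_t{t});
}

// With enough sign bits, the trunc + zext nneg pair folds to X itself.
bool canFoldToX(int64_t x) { return numSignBits(x) > 48; }
```

For X = 1000 (54 sign bits) the pair folds to X; for X = 70000 (47 sign bits) the truncation changes the value to 4464, so it must stay.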
2024-02-19 10:45:11 -08:00
Craig Topper
b1849a2c6b [RISCV] Add test cases for missed opportunities to treat a zext nneg as sext. NFC
These tests have a dominating icmp that requires an i16 value to be
sign extended to do the compare. Because of this, the i16 will be
exported from the first basic block sign extended to XLen. We can
use this fact to remove the zext nneg in the second block.
2024-02-19 10:27:32 -08:00
Simon Pilgrim
5bd374df3e [X86] psadbw.ll - add AVX2 target test coverage 2024-02-19 17:04:07 +00:00
Simon Pilgrim
8d8bb35ac3 [X86] Add some basic test coverage for #81765
Test cases demonstrating poor value tracking of PSADBW results
2024-02-19 15:20:09 +00:00
CarolineConcatto
3f0404aae7
[AArch64] Restore Z-registers before P-registers (#79623)
This is needed by PR #77665 [1], which uses a P-register while restoring
Z-registers.

The reversed order for SVE register restores in the epilogue was added for
performance, but further work has since improved SVE frame restores, and
the scheduler may also change the order of the restores anyway, undoing
the reversal.

[1] https://github.com/llvm/llvm-project/pull/77665
2024-02-19 13:39:24 +00:00
Vyacheslav Levytskyy
925768eeab
Add support for atomic instruction on floating-point numbers (#81683)
This PR adds support for atomic instructions on floating-point numbers:

* SPV_EXT_shader_atomic_float_add
* SPV_EXT_shader_atomic_float_min_max
* SPV_EXT_shader_atomic_float16_add

and fixes asm printer output for the half floating-point type.
2024-02-19 12:12:09 +01:00
Momchil Velikov
658e4763a2
[AArch64] Fix wrong condition in canUseAsPrologue (#81878)
Inline stack probing code may need a scratch register, hence basic
blocks where such register is not available cannot be used as prologues.

Checking for an available scratch register was incorrectly skipped when
the function uses stack probing.
2024-02-19 10:40:21 +00:00
Tim Northover
0215d2c58b arm64_32: extend @llvm.stackguard call to in-DAG 64-bits before handing off
Pointers are 64-bits in the DAG, so we need to extend the result of loading the
cookie when building the DAG.
2024-02-19 10:32:29 +00:00