52796 Commits

Author SHA1 Message Date
Alex Bradbury
1cffd26483 [TargetLowering][RISCV] Improve codegen for saturating bf16 to int conversion
Extending to f32 first (as is done for f16) results in better generated
code for RISC-V (and affects no other in-tree tests). Additionally,
performing the FP_EXTEND first seems equally justified for bf16 as for
f16.
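
As a rough illustration of why extending first is a natural fit: a bf16 value is the upper 16 bits of an f32, so the extension is exact and the saturating conversion can then be carried out on the f32. The sketch below models that in Python. The function names are hypothetical, the saturation rules are those of `llvm.fptosi.sat` (NaN becomes 0, out-of-range values clamp), and this is not the lowering code from the patch.

```python
import struct


def bf16_bits_to_f32(bits: int) -> float:
    """Extend a bf16 bit pattern to f32 by placing it in the upper 16 bits."""
    return struct.unpack("<f", struct.pack("<I", (bits & 0xFFFF) << 16))[0]


def fptosi_sat_i32(x: float) -> int:
    """Saturating float -> signed i32: NaN gives 0, out-of-range values clamp."""
    if x != x:  # NaN compares unequal to itself
        return 0
    if x >= 2.0 ** 31:
        return 2 ** 31 - 1
    if x <= -(2.0 ** 31):
        return -(2 ** 31)
    return int(x)  # int() truncates toward zero


def bf16_to_i32_sat(bits: int) -> int:
    """Model of 'FP_EXTEND to f32, then saturating convert' (hypothetical helper)."""
    return fptosi_sat_i32(bf16_bits_to_f32(bits))
```

For example, `bf16_to_i32_sat(0x3FC0)` converts the bf16 encoding of 1.5, and `bf16_to_i32_sat(0x7F80)` converts positive infinity and clamps.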

Differential Revision: https://reviews.llvm.org/D156944
2023-08-07 11:21:25 +01:00
Alex Bradbury
7a1b2adc45 [RISCV] Implement straight-forward bf16<->int conversion cases
This ports over the test cases half-convert.ll and implements patterns
or RISCVISelLowering.cpp changes for all of the most straight-forward
cases (those that don't require changes outside of lib/Target/RISCV).
The remaining cases and noted poor codegen for saturating conversions
will be handled in follow-up patches.

Differential Revision: https://reviews.llvm.org/D156943
2023-08-07 11:12:51 +01:00
Jingu Kang
f580901d5d [MachineCSE] Add an option to override the profitability heuristics
Differential Revision: https://reviews.llvm.org/D157002
2023-08-07 10:06:02 +01:00
Simon Pilgrim
9d3b19e8e9 [X86] ReplaceNodeResults - relax the value type constraints for TRUNCATE widening
With SSSE3, widen the truncation for anything other than vXi64 -> vXi8 smaller than v8i64 (where PSHUFB would be better).
2023-08-07 09:41:38 +01:00
Yashwant Singh
3dc413e25d [AMDGPU] Skip debug instruction uses while optimizing live range of a reg in SIOptimizeVGPRLiveRange
This will prevent the `assert(!O.readsReg())` from firing in
SIOptimizeVGPRLiveRange::optimizeLiveRange

Fix for #64163

Reviewed By: arsenm, #amdgpu

Differential Revision: https://reviews.llvm.org/D156893
2023-08-07 11:35:39 +05:30
David Green
ffc5ed976a [AArch64][GISel] Expand handling for G_FABS to more vector types.
This now reuses the existing lowering for G_FMIN/MAX for G_FABS too, which can
handle more types successfully. We can hopefully reuse the same pattern action
definition for other fp operations too.
2023-08-06 14:58:25 +01:00
David Green
0e757122a1 [AArch64][GISel] Expand lowering for fminimum and fmaximum
This replicates the G_FMINNUM and G_FMAXNUM lowering to G_FMINIMUM and
G_FMAXIMUM, reusing the same action definition for lowering.
2023-08-06 14:36:52 +01:00
David Green
6df2c2b4a2 [AArch64] Add a more extensive fabs test. NFC
Now covers gisel as well as selection dag, and more types are tested. The
existing tests for combines to fabs are moved to fabs-combine.ll.
2023-08-06 14:02:57 +01:00
Simon Pilgrim
ce2ec06516 [X86] Only fold broadcast with extract_vector_elt/scalar_to_vector if the scalar type matches the vector element type
Avoid handling implicit extension/truncation with scalar<->vector transfers

Fixes #64439
2023-08-05 16:01:22 +01:00
Simon Pilgrim
ef4330f4f3 [X86] truncateVectorWithPACK - handle vector truncations to sub-64-bit vector widths
Extend the existing 128-bit -> 64-bit truncation handling by widening/narrowing the src/dst vectors and use the lower half operand/result for PACKSS/PACKUS instructions.
2023-08-05 16:01:22 +01:00
Stanislav Mekhanoshin
0c7e8c06bc [AMDGPU] Change syncscopes.mir not to use undefined cpol bits. NFC. 2023-08-04 11:19:12 -07:00
Simon Pilgrim
e22908692c [X86] ReplaceNodeResults - widen sub-128-bit vector truncations if it would allow them to use PACKSS/PACKUS
We currently just scalarize sub-128-bit vector truncations, but if the input vector has sufficient signbits/zerobits then we should try to use PACKSS/PACKUS with a widened vector with don't care upper elements. Shuffle lowering will struggle to detect this if we wait until the scalarization has been revectorized as a shuffle.
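
To illustrate the signbits/zerobits condition, here is a minimal per-lane model in Python, assuming i16 lanes packed to i8. The helper names are hypothetical. It only shows that a saturating pack equals a plain truncation when the lane already fits the narrow type; it says nothing about how the lowering detects this.

```python
def packss_i16_to_i8(x: int) -> int:
    """Signed saturation of one signed 16-bit lane to the i8 range (PACKSSWB-style)."""
    return max(-128, min(127, x))


def packus_i16_to_u8(x: int) -> int:
    """Unsigned saturation of one signed 16-bit lane to [0, 255] (PACKUSWB-style)."""
    return max(0, min(255, x))


def trunc_to_i8(x: int) -> int:
    """Plain truncation: keep the low 8 bits and reinterpret them as signed."""
    low = x & 0xFF
    return low - 256 if low >= 128 else low


def trunc_to_u8(x: int) -> int:
    """Plain truncation: keep the low 8 bits as an unsigned value."""
    return x & 0xFF
```

A lane such as 300 shows why the condition matters: saturation and truncation give different results once the upper bits carry information.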

Another step towards issue #63710
2023-08-04 17:36:19 +01:00
Craig Topper
814250191d [RISCV] Add vector legalization for fmaximum/fminimum.
Reviewed By: fakepaper56

Differential Revision: https://reviews.llvm.org/D156937
2023-08-04 08:07:14 -07:00
Ben Shi
a133fb289a [CSKY][NFC] Fix broken tests in eac78fdf68f58e113b2cf18a14baccb8f5ebcf50 2023-08-04 22:01:44 +08:00
David Green
bbe945b8a1 [AArch64][GISel] Expand G_DUP and G_DUPLANE to v8s8 and v4s16
This fills in the gaps with v8s8 and v4s16 vectors for G_DUP and G_DUPLANE,
using the existing code that is generalized to more types.
2023-08-04 12:43:53 +01:00
Vladislav Dzhidzhoev
19d7ab14ec [GlobalISel] Handle sequences of trunc(sext/zext/anyext...) in artifact combiner
trunc(sext/zext/anyext... x) -> x pattern is handled in artifact combiner to avoid
extra copy instructions in https://reviews.llvm.org/D156831.
2023-08-04 13:29:49 +02:00
Jay Foad
34ffc30a90 [AMDGPU] Fix typo in comment in test 2023-08-04 11:04:54 +01:00
Ben Shi
eac78fdf68 [CSKY][test][NFC] Add tests of conditional branch and value select
These tests will be optimized with BTSTI16/BTSTI32
in the future.

Reviewed By: zixuan-wu

Differential Revision: https://reviews.llvm.org/D154767
2023-08-04 16:10:57 +08:00
Ben Shi
528831dd1a [CSKY] Optimize ANDI/ORI to BSETI/BCLRI for specific immediates
Reviewed By: zixuan-wu

Differential Revision: https://reviews.llvm.org/D153614
2023-08-04 16:10:36 +08:00
David Green
f3b9b94a8b [AArch64][GISel] Expand arm64-dup and arm64-rev tests for global isel. NFC 2023-08-04 09:06:47 +01:00
Craig Topper
40f3708205 [RISCV] Add a test case that would have failed before D156974. NFC
Tweak the immediate on two vror.vi test cases to use a uimm6 immediate
that would have failed before D156974 when we were looking for a simm6
immediate.
2023-08-03 11:23:55 -07:00
Craig Topper
a8c502a589 [RISCV] Add bf16 to isFPImmLegal.
Part of this test file was stolen from D156895. We should merge them
when committing.

Reviewed By: asb

Differential Revision: https://reviews.llvm.org/D156926
2023-08-03 08:27:38 -07:00
pvanhout
62ea799e6c [AMDGPU] Break Large PHIs: Take whole PHI chains into account
Previous heuristics had a big flaw: they only looked at a single PHI at a time, and didn't take into account the whole "chain".
The concept of "chain" is important because if we only break a chain partially, we risk forcing regalloc to reserve twice as many registers for that vector.
We also risk adding a lot of copies that shouldn't be there and can inhibit backend optimizations.

The solution I found is to consider the whole "PHI chain" when looking at a PHI.
That is, we recursively look at the PHI's incoming value & users for other PHIs, then make a decision about the chain as a whole.

The current threshold requires that at least `ceil(chain size * (2/3))` PHIs have at least one interesting incoming value.
In simple terms, two-thirds (rounded up) of the PHIs should be breakable.

This seems to work well. A lower threshold such as 50% is too aggressive because chains can often have 7 or 9 PHIs, and breaking 3+ or 4+ PHIs in those cases often causes performance issues.
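
The two-thirds rule can be sketched with integer arithmetic as follows. This is a minimal Python model with hypothetical function names, not the code in the pass.

```python
def breakable_threshold(chain_size: int) -> int:
    """ceil(chain_size * 2 / 3), computed without floating point."""
    return (2 * chain_size + 2) // 3


def should_break_chain(chain_size: int, interesting_phis: int) -> bool:
    """True if enough PHIs in the chain have an interesting incoming value."""
    return interesting_phis >= breakable_threshold(chain_size)
```

Under this model a chain of 7 PHIs needs 5 breakable PHIs and a chain of 9 needs 6.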

Fixes SWDEV-409648, SWDEV-398393, SWDEV-413487

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D156414
2023-08-03 16:41:11 +02:00
Tamir Duberstein
055893beac [BPF] Don't crash on missing line info
When compiling Rust code we may end up with calls to functions provided
by other code units. Presently this code crashes on a null pointer
dereference - this patch avoids that crash and adds a test.

Reviewed By: ast

Differential Revision: https://reviews.llvm.org/D156446
2023-08-03 09:18:12 -04:00
Piyou Chen
05041b78a7 [RISCV] emit .option directive for functions with target features which differ from module default
When a function has different attributes from the module, emit the .option <attribute> before the function body. This allows non-integrated assemblers to properly assemble the functions (which may contain instructions dependent on the extra target features).

Reviewed By: craig.topper, reames

Differential Revision: https://reviews.llvm.org/D155155
2023-08-03 04:22:39 -07:00
Oliver Stannard
f2e7285b03 [AArch64][PtrAuth] Fix unwind state for tail calls
When generating unwind tables for code which uses return-address
signing, we need to toggle the RA_SIGN_STATE DWARF register around any
tail-calls, because these require the return address to be authenticated
before the call, and could throw an exception. This is done using the
.cfi_negate_ra_state directive before the call, and .cfi_restore_state
at the start of the next basic block.

However, since D153098, the .cfi_restore_state isn't being inserted,
because the CFIFixup pass isn't being run. This re-enables that pass
when return-address signing is enabled.

Reviewed By: ikudrin, MaskRay

Differential Revision: https://reviews.llvm.org/D156428
2023-08-03 11:45:51 +01:00
Jay Foad
0da19a2be5 [PEI][WebAssembly] Switch to backwards frame index elimination
Backwards frame index elimination uses backwards register scavenging,
which is preferred because it does not rely on accurate kill flags.

Differential Revision: https://reviews.llvm.org/D156691
2023-08-03 10:21:43 +01:00
Simon Pilgrim
7f9b94c044 [X86] LowerBuildVectorv16i8 - attempt to merge lowest 2 x i16 insertions into an i32 MOVD scalar_to_vector
Similar to D156350, if we were going to create 2 x i16 insertions (MOVD+PINSRW), try to merge them into a single MOVD to reduce the amount of GPR<->VEC traffic
2023-08-03 10:20:20 +01:00
Jim Lin
a2938ba707 [RISCV] Add tests with the m extension enabled to extractelt-int-rv64.ll. NFC.
This has already been done in extractelt-int-rv32.ll.
2023-08-03 15:34:44 +08:00
Yeting Kuo
f68c6879ad [RISCV] Use max pushed register to get pushed register number.
Previously we used the number of registers that need saving and are pushable as the
number of pushed registers. We also use the pushed register number to calculate
the stack size. That is not correct because Zcmp pushes registers from $ra to the
max register that needs saving, and there is no guarantee that the registers that
need saving form a contiguous sequence starting at $ra.

Here is an example. PushPopRegs should be 6 (ra, s0 - s4) instead of 1.
```
; llc -mtriple=riscv32 -mattr=+zcmp
define void @foo() {
entry:
; Old:    .cfi_def_cfa_offset 16
; New:    .cfi_def_cfa_offset 32
  tail call void asm sideeffect "li s4, 0", "~{s4}"()
  ret void
}
```
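
The counting rule can be sketched in Python as below. The names are hypothetical. The model assumes RV32, no other stack usage, and a stack adjustment rounded up to 16 bytes, and it ignores the register lists that Zcmp cannot encode.

```python
# Zcmp pushes a contiguous run that always starts at ra: ra, s0, s1, ..., s11.
PUSH_ORDER = ["ra"] + [f"s{i}" for i in range(12)]


def pushed_reg_count(saved: set) -> int:
    """Registers pushed: everything from ra up to the highest saved register."""
    if not saved:
        return 0
    return max(PUSH_ORDER.index(reg) for reg in saved) + 1


def stack_adj_base(count: int, xlen: int = 32) -> int:
    """Bytes needed for the pushed registers, rounded up to a multiple of 16."""
    bytes_per_reg = xlen // 8
    return (count * bytes_per_reg + 15) // 16 * 16
```

With only s4 clobbered, counting the saved registers gives 1 and a 16-byte adjustment, while counting up to the max register gives 6 and a 32-byte adjustment, which lines up with the old and new `.cfi_def_cfa_offset` values in the example.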

Reviewed By: Jim, kito-cheng

Differential Revision: https://reviews.llvm.org/D156407
2023-08-03 14:49:15 +08:00
Alex Bradbury
8a71f44e00 [RISCV] Expand test coverage of bf16 operations with Zfbfmin and fix gaps
This doesn't bring us to parity with the test/CodeGen/RISCV/half-* test
cases, it simply picks off an initial set that can be supported
especially easily. In order to make the review more manageable, I'll
follow up with other cases.

There is zero innovation in the test cases - they simply take the
existing half/float cases and replace f16->bf16 and half->bfloat.

Differential Revision: https://reviews.llvm.org/D156895
2023-08-03 07:06:57 +01:00
Bing1 Yu
6ee497aa0b [X86][Regcall] Add an option to respect regcall ABI v.4 in win64&win32
Reviewed By: pengfei

Differential Revision: https://reviews.llvm.org/D155863
2023-08-03 13:58:33 +08:00
Jim Lin
40cc106fa0 [RISCV] Scalarize binop followed by extractelement to custom lowered instruction
isOperationLegalOrCustomOrPromote returns true only if VT is other or legal
and the operation action is Legal, Custom or Promote.
Permit a vector binary operation to be converted to a scalar binary operation that is custom lowered with an illegal type.
One such case: i32 isn't a legal type on RV64 and its ALU operations are set to custom lowering,
so vadd for element type i32 can be converted to addw.

Reviewed By: jacquesguan, craig.topper

Differential Revision: https://reviews.llvm.org/D156692
2023-08-03 13:02:49 +08:00
Craig Topper
c1c5da8f1f [RISCV] Merge fp-imm.ll and zfh-imm.ll into float/double/half-imm.ll. NFC
fp-imm.ll and zfh-imm.ll test 0.0 and -0.0 while float/double/half-imm.ll
tested other non-zero constants. It seems like they should all be
tested together.

There are slight coverage changes due to different command lines,
but I'm not sure it's meaningful. For example, we now don't test
double 0.0 and -0.0 with only the F extension.

Reviewed By: asb

Differential Revision: https://reviews.llvm.org/D156929
2023-08-02 20:16:50 -07:00
Yeting Kuo
cd79599304 [RISCV] Teach lowerScalarInsert to handle a scalar value that is the first element of a fixed vector.
D155929 taught lowerScalarInsert to handle a start value of (extractelement scalable_vector, 0)
and specifically converts fixed extracted vectors to scalable vectors when
lowering vector reduction. That is not enough because there is another way to
create (extractelement fixed_vector, 0) as a start value of lowerScalarInsert,
as in #64327.

#64327: https://github.com/llvm/llvm-project/issues/64327.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D156863
2023-08-03 10:53:14 +08:00
Phoebe Wang
4d6f4c9c93 [X86] Special handling for v1i1 during ExtractBitFromMaskVector
Fixes #64322

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D156855
2023-08-03 09:50:31 +08:00
Luke Lau
0834355227 [RISCV] Add VP patterns for vwsll.[vv,vx,vi]
This patch adds patterns for the existing riscv_shl_vl VL node.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D156915
2023-08-03 00:43:13 +01:00
Matt Arsenault
54bda79335 AMDGPU: Simplify and improve sincos matching
The first trivial example I tried failed to merge due to the user scan
logic. Remove the complicated scan of users handling with distance
thresholds, with a same block restriction. The actual expansion of
sincos is basically the same size as sin or cos individually. Copy the
technique the generic optimization uses, which is to just use the
input instruction as the insert point or just insert at the start of
the entry block.

https://reviews.llvm.org/D156706
2023-08-02 17:48:35 -04:00
Philip Reames
660b740e4b [DAG] Support store merging of vector constant stores
Ran across this when making a change to RISCV memset lowering. Seems very odd that manually merging a store into a vector prevents it from being further merged.

Differential Revision: https://reviews.llvm.org/D156349
2023-08-02 14:41:46 -07:00
Alex Bradbury
667602793b [RISCV] Implement support for bf16 select when zfbfmin is enabled
These test cases previously caused an error. RISCVInstrInfo::copyPhysReg also needed a tweak in order to account for copying bf16 values in FPR16 registers.

Differential Revision: https://reviews.llvm.org/D156883
2023-08-02 20:04:30 +01:00
4vtomat
346c1f2641 [RISCV] Support vector crypto extension LLVM IR
Depends on D141672

Differential Revision: https://reviews.llvm.org/D138809
2023-08-02 10:25:36 -07:00
Philip Reames
fe4c99d1d6 [RISCV] Add test case showing CSE regression from issue 64282 2023-08-02 09:12:46 -07:00
Matt Arsenault
b953155b49 AMDGPU: Fix counting debug instructions in execz skip threshold 2023-08-02 08:09:41 -04:00
Mirko Brkusanin
acdc503d6c [AMDGPU][GlobalISel] Update applyMappingImpl for G_ABS and type v2s16
For G_ABS with type v2s16 and sgpr inputs break down into two s32 G_ABS
instructions.

Patch by: Acim Maravic

Differential Revision: https://reviews.llvm.org/D155867
2023-08-02 12:27:06 +02:00
Mirko Brkusanin
fadf3e7f2b [AMDGPU][GlobalISel] Update legalizer for G_ABS, G_SMIN, G_SMAX, G_UMIN, G_UMAX
There is no need to increase the size of odd sized vectors if they are
going to be scalarized by a different rule.

Patch by: Acim Maravic

Differential Revision: https://reviews.llvm.org/D155865
2023-08-02 12:18:18 +02:00
Alex Bradbury
8acb8a143f [RISCV] Make Zcf and Zcd imply the F and D extensions respectively
This was an omission in the spec that has now been addressed
https://github.com/riscv/riscv-code-size-reduction/pull/224.

Differential Revision: https://reviews.llvm.org/D156314
2023-08-02 10:40:38 +01:00
Alex Bradbury
be0dac268d [RISCV] Improve codegen for i8/i16 'atomicrmw xchg a, {0,-1}'
As noted in <https://github.com/llvm/llvm-project/issues/64090>, it's
more efficient to lower a partword `atomicrmw xchg a, 0` to an amoand
with an appropriate mask. There are a range of possible ways to go about
this - e.g. writing a combine based on the
`llvm.riscv.masked.atomicrmw.xchg` intrinsic, or introducing a new
interface to AtomicExpandPass to allow target-specific atomics
conversions, or trying to lift the conversion into AtomicExpandPass
itself based on querying some target hook. Ultimately I've gone with
what appears to be the simplest approach - just covering this case in
emitMaskedAtomicRMWIntrinsic. I perhaps should have given that hook a
different name way back when it was introduced.

This also handles the `atomicrmw xchg a, -1` case suggested by Craig
during review.
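
The equivalence behind this can be checked with a small Python model of an i8 field inside a 32-bit word. The function names are hypothetical and the byte numbering is little-endian. Treating the -1 case as an OR with the mask is an assumption about the natural counterpart to the amoand, not a statement of what the patch emits.

```python
WORD_MASK = 0xFFFFFFFF


def masked_xchg(word: int, byte_index: int, new_byte: int):
    """Reference: swap one byte of a 32-bit word, returning (old_byte, new_word)."""
    shift = 8 * byte_index
    mask = 0xFF << shift
    old = (word & mask) >> shift
    new_word = (word & ~mask & WORD_MASK) | ((new_byte & 0xFF) << shift)
    return old, new_word


def xchg_zero_via_and(word: int, byte_index: int):
    """xchg with 0 as a single AND that clears only the target byte."""
    shift = 8 * byte_index
    mask = 0xFF << shift
    return (word & mask) >> shift, word & (~mask & WORD_MASK)


def xchg_ones_via_or(word: int, byte_index: int):
    """xchg with -1 as a single OR that sets only the target byte (assumed form)."""
    shift = 8 * byte_index
    mask = 0xFF << shift
    return (word & mask) >> shift, word | mask
```

In both cases the old byte is still recoverable from the word the atomic operation returns, so no compare-and-retry loop is needed in this model.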

Fixes https://github.com/llvm/llvm-project/issues/64090

Differential Revision: https://reviews.llvm.org/D156801
2023-08-02 09:48:50 +01:00
Jay Foad
c2093b8504 [AMDGPU] Add target features for GDS and GWS
GFX9 subtargets from GFX90A onwards lack GDS but still have GWS.

Differential Revision: https://reviews.llvm.org/D156713
2023-08-02 09:02:07 +01:00
Jay Foad
8f973d5c45 [DebugInfo] Fix crash when printing malformed DBG machine instructions
MachineVerifier does not check that DBG_VALUE, DBG_VALUE_LIST and
DBG_INSTR_REF have the expected number of operands, so printing them
(e.g. with -print-after-all) should not crash.

Differential Revision: https://reviews.llvm.org/D156226
2023-08-02 08:28:20 +01:00
Jim Lin
d6a48a348a [RISCV] Fix the CFI offset for callee-saved registers stored by Zcmp push.
Issue mentioned: https://github.com/riscv/riscv-code-size-reduction/issues/182

The order of callee-saved registers stored by Zcmp push in memory is reversed.

Pseudo code for cm.push in https://github.com/riscv/riscv-code-size-reduction/releases/download/v1.0.4-1/Zc.1.0.4-1.pdf

```
if (XLEN==32) bytes=4; else bytes=8;

addr=sp-bytes;
for(i in 27,26,25,24,23,22,21,20,19,18,9,8,1)  {
  //if register i is in xreg_list
  if (xreg_list[i]) {
    switch(bytes) {
      4:  asm("sw x[i], 0(addr)");
      8:  asm("sd x[i], 0(addr)");
    }
    addr-=bytes;
  }
}
```

The placement order for push is s11, s10, ..., ra.

CFI offsets should be calculated in reversed order for correct stack unwinding.
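
The placement implied by the pseudo code can be modeled in Python as follows. The function name is hypothetical, offsets are relative to sp on entry (assumed here to be the CFA), and the model ignores the stack adjustment itself.

```python
# x-register numbers in the order the cm.push pseudo code visits them.
PUSH_VISIT_ORDER = [27, 26, 25, 24, 23, 22, 21, 20, 19, 18, 9, 8, 1]


def push_offsets(xreg_list: set, xlen: int = 32) -> dict:
    """Map each pushed x-register number to its offset from sp on entry."""
    nbytes = 4 if xlen == 32 else 8
    offset = -nbytes  # the first store goes just below the incoming sp
    offsets = {}
    for reg in PUSH_VISIT_ORDER:
        if reg in xreg_list:
            offsets[reg] = offset
            offset -= nbytes
    return offsets
```

For the list {ra, s0, s1} (x1, x8, x9) on RV32 this puts s1 at -4, s0 at -8 and ra at -12, so ra ends up at the lowest address.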

Reviewed By: fakepaper56, kito-cheng

Differential Revision: https://reviews.llvm.org/D156437
2023-08-02 13:03:21 +08:00