52796 Commits

Author SHA1 Message Date
Durgadoss R
340cc1702e
[LLVM][NVPTX]: Add intrinsic for setmaxnreg (#77289)
This patch adds an intrinsic for setmaxnreg PTX instruction.
* PTX Doc link for this instruction:
https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#miscellaneous-instructions-setmaxnreg

* The i32 argument, an immediate value, specifies the actual
  absolute register count for the instruction.
* The `setmaxnreg` instruction is available in SM90a.
  So, this patch adds 'hasSM90a' predicate to use in
  the NVPTX backend.
* lit tests are added to verify the lowering of the intrinsic.
* Verifier logic (and tests) are added to test the register
  count range and divisibility-by-8 requirements.

Signed-off-by: Durgadoss R <durgadossr@nvidia.com>
2024-01-09 12:04:13 -08:00
Fangrui Song
6c207ee5d2
[RISCV] Force relocations if initial MCSubtargetInfo contains FeatureRelax (#77436)
Regarding
```
.option norelax
j label
.option relax
// relaxable instructions
// For assembly input, RISCVAsmParser::ParseInstruction will set ForceRelocs (https://reviews.llvm.org/D46423).
// For direct object emission, ForceRelocs is not set after https://github.com/llvm/llvm-project/pull/73721
label:
```

The J instruction needs a relocation to ensure the target is correct
after linker relaxation. This is related a limitation in the assembler:
RISCVAsmBackend::shouldForceRelocation decides upfront whether a
relocation is needed, instead of checking more information (whether
there are relaxable fragments in between).

Despite the limitation, `j label` produces a relocation in direct object
emission mode, but was broken by #73721 due to the shouldForceRelocation
limitation.

Add a workaround to RISCVTargetELFStreamer to emulate the previous
behavior.

Link: https://github.com/ClangBuiltLinux/linux/issues/1965
2024-01-09 11:24:21 -08:00
Shilei Tian
b629b8662c
[AMDGPU][MC] Use normal ELF syntax for section switching (#77267)
For some reasons `SunStyleELFSectionSwitchSyntax` is set to `true` for
AMDGPU, but according to
https://github.com/llvm/llvm-project/issues/64862#issuecomment-1880419239
that syntax is only limited to Sun system.

Fix #64862.
2024-01-09 14:13:42 -05:00
Simon Pilgrim
3210ce2763 [X86] Fold (iX bitreverse(bitcast(vXi1 X))) -> (iX bitcast(shuffle(X)))
X86 doesn't have a BITREVERSE instruction, so if we're working with a casted boolean vector, we're better off shuffling the vector instead if we have PSHUFB (SSSE3 or later)

Fixes #77459
2024-01-09 19:06:32 +00:00
Simon Pilgrim
417df8ee4a [X86] Add test coverage for #77459 2024-01-09 18:55:05 +00:00
Fangrui Song
f972e4d343 [MC,ELF] .section: unconditionally print section flag 'G' after 'o'
* Placing 'G' before 'M' (SHF_MERGE) can be misleading as the sh_entsize
  argument goes before the section group name, if a reader doesn't know
  that the order of extra arguments is not affected by the order of flags.
* 'a', 'w', and 'x' indicate basic permission-related flags. Separating
  them with 'G' is kinda ugly.

Simplify code and move 'G' after 'o'. The new output is more similar to
GCC.
2024-01-09 10:48:23 -08:00
Fangrui Song
7620f03ef7
[MC] Parse SHF_LINK_ORDER argument before section group name (#77407)
When both SHF_LINK_ORDER | SHF_GROUP flags are set, GNU assembler from
2.35 onwards (https://sourceware.org/PR25381
https://sourceware.org/binutils/docs/as/Section.html) parses the
SHF_LINK_ORDER argument before section group name, different from us.

This is unfortunate, but does not matter because the `.section` flag `o`
is a niche feature only used by compiler instrumentations, not adopted
by hand-written assembly, and using both flags is extremely rare. Let's
just match GNU assembler. There is another benefit: we now support
zero-flag section group with the SHF_LINK_ORDER flag, while previously
there isn't a syntax.

While here, print 'G' after 'o' to be clear that the 'G' argument is
parsed after the 'o' argument. To make the diff smaller, we don't print
'G' after 'w' in the absence of 'o' for now.
2024-01-09 10:42:34 -08:00
Matt Arsenault
888a20c466
AMDGPU: Drop amdgpu-no-lds-kernel-id attribute in LDS lowering (#71481)
This is in preparation for moving the run of AMDGPUAttributor earlier.
Currently it infers the lack of the corresponding intrinsic calls,
so if we introduce new ones we need to remove the attribute from any
possible transitive callers. This is more conservative than necessary,
we could try to identify specific subgraphs where LDS globals are not
used.

Other options include teaching the attributor to avoid adding it in
cases
where the lowering may choose the table, but this seems more complex.
Alternatively could add a second run which doesn't seem worth it.

Depends #71349
2024-01-10 00:12:40 +07:00
HaohaiWen
a2dba0c977
[SEH][CodeGen] Add test to track CFG optimization bug for SEH (#77441)
LiveDebugValues requires CFG only has one entry. BranchFolding and
MachineBlockPlacement may remove all predecessors of landing pad which
leaves it to be another entry.
2024-01-09 22:30:13 +08:00
wanglei
98c6aa7229
[LoongArch] Implement LoongArchRegisterInfo::canRealignStack() (#76913)
This patch fixes the crash issue in the test:
CodeGen/LoongArch/can-not-realign-stack.ll

Register allocator may spill virtual registers to the stack, which    
introduces stack alignment requirements (when the size of spilled     
    registers exceeds the default alignment size of the stack). If a  
function does not have stack alignment requirements before register   
allocation, registers used for stack alignment will not be preserved. 

Therefore, we should implement `canRealignStack()` to inform the      
register allocator whether it is allowed to perform stack realignment 
operations.
2024-01-09 20:35:49 +08:00
wanglei
f499472de3 [LoongArch] Pre-commit test for #76913. NFC
This test will crash with expensive check.

Crash message:
```
*** Bad machine code: Using an undefined physical register ***
- function:    main
- basic block: %bb.0 entry (0x20fee70)
- instruction: $r3 = frame-destroy ADDI_D $r22, -288
- operand 1:   $r22
```
2024-01-09 20:32:20 +08:00
Saiyedul Islam
4f7c402d9f
[AMDGPU][NFC] Update left over tests for COV5 (#76984)
Update AMDGPU CodeGen lit tests to check for COV5 ABI.
2024-01-09 17:31:42 +05:30
Matt Arsenault
9be29ad48c AMDGPU: Regenerate test checks
Fix test failures after auto-merge of f9fec402896a90f3b09cea359c330f65a0908649
2024-01-09 17:56:27 +07:00
Matt Arsenault
daecc303bb
AMDGPU: Replace sqrt OpenCL libcalls with llvm.sqrt (#74197)
The library implementation is just a wrapper around a call to the
intrinsic, but loses metadata. Swap out the call site to the intrinsic
so that the lowering can see the !fpmath metadata and fast math flags.

Since d56e0d07cc5ee8e334fd1ad403eef0b1a771384f, clang started placing
!fpmath on OpenCL library sqrt calls. Also don't bother emitting
native_sqrt anymore, it's just another wrapper around llvm.sqrt.
2024-01-09 15:13:58 +07:00
Nick Anderson
f1ec0d12bb
Port CodeGenPrepare to new pass manager (and BasicBlockSectionsProfil… (#77182)
Port CodeGenPrepare to new pass manager and dependency
BasicBlockSectionsProfileReader
Fixes: #75380

Co-authored-by: Krishna-13-cyber <84722531+Krishna-13-cyber@users.noreply.github.com>
2024-01-09 13:32:59 +07:00
Shengchen Kan
38ce770ef1 [X86][test] Add test to check ah is not allocatable for register class gr8_norex2
This test should be added after #73529
2024-01-09 14:28:38 +08:00
Chia
0c24c175f2
[RISCV][ISel] Use vaaddu with rounding mode rdn for ISD::AVGFLOORU. (#76550)
This patch aims to use `vaaddu` with rounding mode rdn (i.e `vxrm[1:0] =
0b10`) for `ISD::AVGFLOORU`.

### Source code 
```
define <8 x i8> @vaaddu_auto(ptr %x, ptr %y, ptr %z) {
  %xv = load <8 x i8>, ptr %x, align 2
  %yv = load <8 x i8>, ptr %y, align 2
  %xzv = zext <8 x i8> %xv to <8 x i16>
  %yzv = zext <8 x i8> %yv to <8 x i16>
  %add = add nuw nsw <8 x i16> %xzv, %yzv
  %div = lshr <8 x i16> %add, <i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1>
  %ret = trunc <8 x i16> %div to <8 x i8>
  ret <8 x i8> %ret 
}
```

### Before this patch 
```
vaaddu_auto: 
        vsetivli        zero, 8, e8, mf2, ta, ma
        vle8.v  v8, (a0)
        vle8.v  v9, (a1)
        vwaddu.vv       v10, v8, v9
        vnsrl.wi        v8, v10, 1
        ret
```
### After this patch 
```
vaaddu_auto: 
	vsetivli	zero, 8, e8, mf2, ta, ma
	vle8.v	v8, (a0)
	vle8.v	v9, (a1)
	csrwi	vxrm, 2
	vaaddu.vv	v8, v8, v9
	ret
```

### Note on signed averaging addition

Based on the rvv spec, there is also a variant for signed averaging
addition called `vaadd`.
But AFAIU, no matter in which rounding mode, we cannot achieve the
semantic of signed averaging addition through `vaadd`.
Thus this patch only introduces `vaaddu`.
2024-01-09 15:17:38 +09:00
Craig Topper
a8e9dceb49
[RISCV] Use getELen() instead of hardcoded 64 in lowerBUILD_VECTOR. (#77355)
This is needed to properly support Zve32x.
2024-01-08 19:36:15 -08:00
James Y Knight
b856e77b2d
Set MaxAtomicSizeInBitsSupported for remaining targets. (#75703)
Targets affected:

- NVPTX and BPF: set to 64 bits.
- ARC, Lanai, and MSP430: set to 0 (they don't implement atomics).

Those which didn't yet add AtomicExpandPass to their pass pipeline now
do so.

This will result in larger atomic operations getting expanded to
`__atomic_*` libcalls via AtomicExpandPass. On all these targets, this
now matches what Clang already does in the frontend.

The only targets which do not configure AtomicExpandPass now are:
- DirectX and SPIRV: they aren't normal backends.
- AVR: a single-cpu architecture with no privileged/user divide, which
could implement all atomics by disabling/enabling interrupts, regardless
of size/alignment. Will be addressed by future work.
2024-01-08 22:34:28 -05:00
Jim Lin
96c4f1034c
[RISCV] Add support predicating for ANDN/ORN/XNOR with short-forward-branch-opt. (#77077)
ANDN/ORN/XNOR are like other ALU instructions. It should be able to be
predicated by the cpu that supports short-forward-branch.
2024-01-09 11:12:44 +08:00
Usman Nadeem
ac8b4f8749
[AArch64][SVE2] Add pattern for BCAX (#77159)
Bitwise clear and exclusive or
Add pattern for:
    xor x, (and y, not(z)) -> bcax x, y, z
2024-01-08 15:51:33 -08:00
Craig Topper
faa326de97
[RISCV] Add branch+c.mv macrofusion for sifive-p450. (#76169)
sifive-p450 supports a very restricted version of the short forward
branch optimization from the sifive-7-series.

For sifive-p450, a branch over a single c.mv can be macrofused as a
conditional move operation. Due to encoding restrictions on c.mv, we
can't conditionally move from X0. That would require c.li instead.
2024-01-08 15:23:26 -08:00
Jay Foad
daa4728dee
[AMDGPU] Add CodeGen support for GFX12 s_mul_u64 (#75825) 2024-01-08 19:13:38 +00:00
Min-Yih Hsu
478ec63312
[RISCV] Mark VFIRST and VCPOP as SignExtendingOpW (#77022)
Since their values are small enough ([-1, 65535] & [0, 65535],
respectively) to fit into signed 32 bits, any sext (or downcasting +
sext) will be redundnat. Hence marking them as SignExtendingOpW.
2024-01-08 10:59:06 -08:00
Min-Yih Hsu
4c66180e46 [RISCV] Use COPY to create artificial 64-bit uses in RISCVOptWInstrs's tests
In reflection of 4dd5d967975fa8d52b8c60596d892d9dd5615809, we can now
use COPY to physical registers to create artificial 64-bit uses to
prevent RISCVOptWInstrs from optimizing away sext in absent of the
IsSignExtendingOpW flag.

NFCI.
2024-01-08 10:03:32 -08:00
Simon Pilgrim
d460c1de3b
[DAG] SimplifyDemandedBits - don't fold sext(x) -> aext(x) if we lose an 0/-1 allsignbits mask (#77296)
For targets that use 0/-1 boolean results, we want to keep this pattern through extensions/truncations as much as possible - so avoid simplifying to any_extend even if we don't demand the upper bits.

Noticed in triage for https://reviews.llvm.org/D152928
2024-01-08 18:01:41 +00:00
Simon Pilgrim
52ebf61bac [X86] ftrunc.ll - replace X32 checks with X86. NFC.
We try to use X32 for gnux32 triples only.

Add common AVX check prefix for 32/64 bit test coverage
2024-01-08 17:25:44 +00:00
Simon Pilgrim
fbfc9cb7ea [X86] vector-shuffle-mmx.ll - replace X32 checks with X86. NFC.
We try to use X32 for gnux32 triples only.

Add nounwind to remove cfi noise as well.
2024-01-08 17:25:44 +00:00
Simon Pilgrim
9632f98716 [X86] legalize-shl-vec.ll - replace X32 checks with X86. NFC.
We try to use X32 for gnux32 triples only.

Add nounwind to remove cfi noise as well.
2024-01-08 17:25:44 +00:00
Simon Pilgrim
635f6d3845 [X86] inline-sse.ll - replace X32 checks with X86. NFC.
We try to use X32 for gnux32 triples only.
2024-01-08 17:25:44 +00:00
Simon Pilgrim
8bd16789ff [X86] lea-2.ll - replace X32 checks with X86. NFC.
We try to use X32 for gnux32 triples only (although in this case the gnux32 tests share the X64 checks)
2024-01-08 17:25:43 +00:00
Simon Pilgrim
61dcfaa745 [X86] i64-mem-copy.ll - replace X32 checks with X86. NFC.
We try to use X32 for gnux32 triples only.

Add nounwind to remove cfi noise as well.
2024-01-08 17:25:43 +00:00
Simon Pilgrim
f3f6677311 [X86] combine-bextr.ll - replace X32 checks with X86. NFC.
We try to use X32 for gnux32 triples only.

Add nounwind to remove cfi noise as well.
2024-01-08 17:25:43 +00:00
Amara Emerson
ff47989ec2 [AArch64][GlobalISel] Allow anyexting loads from 32b -> 64b to be legal.
We can already support selection of these through imported patterns, we were
just missing the legalizer rule to allow these to be formed.

Nano size benefit overall.
2024-01-08 08:37:47 -08:00
Mirko Brkušanin
7ca4473dd9
[AMDGPU] Add new cache flushing instructions for GFX12 (#76944)
Co-authored-by: Diana Picus <Diana-Magda.Picus@amd.com>
2024-01-08 14:06:58 +00:00
Simon Pilgrim
0e4a38018a [X86] avx2-nontemporal.ll - replace X32 checks with X86. NFC.
We try to use X32 for gnux32 triples only.
2024-01-08 12:37:34 +00:00
Simon Pilgrim
f1e3a8f1eb [X86] avx2-gather.ll - replace X32 checks with X86. NFC.
We try to use X32 for gnux32 triples only.
2024-01-08 12:37:06 +00:00
Simon Pilgrim
e3f8e44b00 [X86] vector-lzcnt-256.ll / vector-tzcnt-256.ll - replace X32 checks with X86. NFC.
We try to use X32 for gnux32 triples only.
2024-01-08 12:35:48 +00:00
Simon Pilgrim
eb523a4d27 [X86] vec_extract - replace X32 checks with X86. NFC.
We try to use X32 for gnux32 triples only.
2024-01-08 12:33:47 +00:00
Matt Arsenault
bdbaf6e61b
AMDGPU: Make v8bf16/v16bf16 legal types (#76678)
Depends #76217
2024-01-08 18:59:01 +07:00
Nathan Gauër
a9ffc92fc4
[SPIR-V] Add pre-headers to loops. (#75844)
This is the first of the 7 steps outlined in #75801. This PR explicitely
calls the SimplifyLoops pass. Directly following this pass should follow
the 6 others required to structurize the IR.

Running this pass could generate empty basic-blocks, which are implicit
fallthrough to the successor BB.
There was a specific condition in the SPIR-V ISel which handled implicit
fallthrough, but it couldn't work on empty basic-blocks. This commits
removes the old logic, and adds this new logic, which checks all
basic-blocks for implicit fallthroughs, including empty ones.

---------

Signed-off-by: Nathan Gauër <brioche@google.com>
2024-01-08 11:41:45 +01:00
Paschalis Mpeis
68a1583a89
[TLI] replace-with-veclib works with FRem Instruction. (#76166)
Updated SLEEF and ArmPL tests with Fixed-Width and Scalable cases for
frem. Those are mapped to fmod/fmodf.
2024-01-08 08:53:15 +00:00
Shengchen Kan
1c674666fa
[X86] Support EVEX compression for EGPR (#77202)
Compress promoted instruction (EVEX) to pre-promotion instruction
(legacy/VEX) when R16-R31 is not used.

Alternative of #77065
2024-01-08 16:50:23 +08:00
Amara Emerson
ca20c99bb1
[GlobalISel][IRTranslator] Port switch binary tree search optimization. (#77279)
This re-uses some code extracted earlier from SelectionDAG into
SwitchLoweringUtils

Much of the code is a straight port from SDAG's splitWorkItem(), with
minor changes needed for GISel.
2024-01-08 15:53:09 +08:00
Amara Emerson
9de81ce87d NFC: Another pre-commit test change. 2024-01-07 23:22:14 -08:00
Amara Emerson
fe1364f1e7 Update pre-committed test. Accidentally committed the wrong version, this one
properly demonstrates the upcoming change.
2024-01-07 22:53:48 -08:00
Amara Emerson
624b48789f [AArch64][NFC] Pre-commit IR translator switch lowering test. 2024-01-07 22:14:50 -08:00
Kai Luo
225e2704af [PowerPC] Precommit test for lowering llvm.trap on ppc64le. NFC. 2024-01-08 10:20:01 +08:00
Chen Zheng
d6aef863d8
[PowerPC] make LR/LR8 CTR/CTR8 aliased (#76926)
fixes https://github.com/llvm/llvm-project/issues/47156 
fixes https://github.com/llvm/llvm-project/issues/47155
2024-01-08 09:37:40 +08:00
Fangrui Song
360996ac5a
[RISCV] Merge machine operand flag MO_PLT into MO_CALL (#77253)
Since #72467, `@plt` in assembly output "call foo@plt" is omitted. We
can trivially merge MO_PLT and MO_CALL without any functional change to
assembly/relocatable file output.

Earlier architectures use different call relocation types whether a PLT
is potentially needed: R_386_PLT32/R_386_PC32, R_68K_PLT32/R_68K_PC32,
R_SPARC_WDISP30/R_SPARC_WPLT320. However, as the PLT property is
per-symbol instead of per-call-site and linkers can optimize out a PLT,
the distinction has been confusing.

Arm made good names R_ARM_CALL/R_AARCH64_CALL. Let's use MO_CALL instead
of MO_PLT.

As follow-ups, we can merge fixup_riscv_call/fixup_riscv_call_plt and
VK_RISCV_CALL/VK_RISCV_CALL_PLT.
2024-01-07 12:43:39 -08:00