52796 Commits

Author SHA1 Message Date
Kai Luo
56414220df
[PowerPC] Use 'sync; ld; cmp; bc; isync' for atomic load seq-cst on 32-bit platform (#75905)
`cmp; bc; isync` is theoretically more performant than `lwsync`.

The 64-bit platform already uses this sequence; this patch implements it
for the 32-bit platform.
2023-12-20 10:01:02 +08:00
Jeffrey Byrnes
f1156fb622
[AMDGPU][IGLP]: Add SchedGroupMask::TRANS (#75416)
Makes constructing SchedGroups of this type easier, and provides the ability
to create them with __builtin_amdgcn_sched_group_barrier.
2023-12-19 16:54:18 -08:00
Craig Topper
05abe8a7e8
[RISCV] Remove Zfbfmin dependency from Zvfbfmin. (#75851)
Zvfbfmin does not have any scalar operands making this an unnecessary
dependency. The spec was just updated to remove this. See
86d7a74f4b

This fixes a correctness issue where Xsfvfwmaccqqq was incorrectly
depending on Zfbfmin. The SiFive CPUs that support Xsfvfwmaccqqq do not
implement Zfbfmin, but do implement Zvfbfmin based on a previous
understanding that it only requires Zve32f. I've added tests for this
feature to raise the bar for adding dependencies to it in the future.
2023-12-19 15:07:38 -08:00
Yusra Syeda
0768253c20
[SystemZ][z/OS] Add exception handling for XPLINK (#74638)
Adds emitting the exception table and the EH registers for XPLINK.

---------

Co-authored-by: Yusra Syeda <yusra.syeda@ibm.com>
2023-12-19 13:58:33 -05:00
Michael Maitland
571d151dec
[RISCV][MISched] Set EnableIntervals to true for SiFive7 (#75681)
The SiFive7 scheduler model has been using AcquireAtCycles and
ReleaseAtCycles for some time. Without EnableIntervals, the scheduler
was not making decisions based on this information. This patch sets
EnableIntervals to true, and the test case demonstrates that the VADD
instructions can be issued one cycle earlier since the VCQ is not
reserved. This leads to better saturation of the SiFive7VA.
2023-12-19 11:03:03 -05:00
Jonas Paulsson
e32e147d6c
[DAGCombiner] Don't drop alignment info of original load. (#75626)
Pass the original MMO instead of different individual values.

Previously getAlign() was used where getOriginalAlign() would have been
more appropriate; passing the original MMO has the same effect.
2023-12-19 16:30:47 +01:00
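To illustrate what e32e147d6c preserves, here is a minimal C++ sketch with no LLVM dependency. `MemOp`, `rebuildLossy` and `rebuildPreserving` are hypothetical stand-ins loosely modeled on a memory operand that stores a base alignment plus an offset; `minAlign` reimplements the bit trick used by `llvm::MinAlign`. The sketch shows how seeding a rebuilt operand with the offset-reduced `align()` value, instead of the original base alignment, loses information once the offset changes again.

```cpp
#include <cassert>
#include <cstdint>

// Largest power of two dividing both a and b: the alignment still guaranteed
// at (base + b) when base is a-aligned (same bit trick as llvm::MinAlign).
constexpr uint64_t minAlign(uint64_t a, uint64_t b) {
  return (a | b) & (~(a | b) + 1);
}

// Hypothetical memory operand: base alignment plus a byte offset.
struct MemOp {
  uint64_t baseAlign;  // roughly what getOriginalAlign() reports
  uint64_t offset;     // byte offset of this access from the base
  // Alignment of this particular access (roughly what getAlign() reports).
  uint64_t align() const { return minAlign(baseAlign, offset); }
};

// Rebuild at offset + extra, seeding the base alignment with the
// already-reduced align() value: the original alignment is dropped.
MemOp rebuildLossy(const MemOp &m, uint64_t extra) {
  return {m.align(), m.offset + extra};
}

// Rebuild at offset + extra, keeping the original base alignment.
MemOp rebuildPreserving(const MemOp &m, uint64_t extra) {
  return {m.baseAlign, m.offset + extra};
}
```

For a 16-byte-aligned base and offset 4, `align()` is 4. Moving on to offset 8 then yields alignment 4 from the lossy rebuild but 8 from the preserving one.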
Rin
0894c2ee5f
[DAGCombiner] Avoid the pre-truncate of BUILD_VECTOR sources. (#75792)
Avoid the pre-truncate of BUILD_VECTOR sources when there is more than
one use. This can avoid unnecessary movs later in the
instruction selection pipeline.
2023-12-19 15:25:38 +00:00
Antonio Frighetto
9aeb3336fd [AArch64] Ensure SplatBitSize conforms with the original lane width
A miscompilation issue has been addressed with improved checking.

Fixes: https://github.com/llvm/llvm-project/issues/75822.
2023-12-19 16:03:56 +01:00
Kerry McLaughlin
e9af57dfea
[Clang][SME2] Add builtins for moving multi-vectors to/from ZA (#71191)
Adds the following SME2 builtins:
 - svread_hor/ver
 - svwrite_hor/ver
 - svread_za64
 - svwrite_za64

See https://github.com/ARM-software/acle/pull/217
2023-12-19 13:51:10 +00:00
Matt Arsenault
1196975286 AMDGPU: Add gfx11 run line to bf16 test 2023-12-19 17:12:52 +07:00
Mariusz Sikora
a018c8cdbb
GFX12: Add LoopDataPrefetchPass (#75625)
It is currently disabled by default. It will need experiments on real
hardware to tune it and to decide on its profitability.

---------

Co-authored-by: Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>
2023-12-19 08:32:16 +01:00
Eric Biggers
09058654f6
[RISCV] Remove experimental from Vector Crypto extensions (#74213)
The RISC-V vector crypto extensions have been ratified. This patch
updates the Clang and LLVM support for these extensions to be
non-experimental, while leaving the C intrinsics as experimental since
the C intrinsics are not yet standardized.

Co-authored-by: Brandon Wu <brandon.wu@sifive.com>
2023-12-18 22:04:22 -08:00
James Y Knight
137f785fa6
[AMDGPU] Set MaxAtomicSizeInBitsSupported. (#75185)
This will result in larger atomic operations getting expanded to
`__atomic_*` libcalls via AtomicExpandPass, which matches what Clang
already does in the frontend.

AMDGPU previously disabled the use of all libcalls; I've changed it
to instead disable all of them _except_ the atomic ones. Those are
already emitted by the Clang frontend, and enabling them in the
backend allows the same behavior there.
2023-12-18 16:51:06 -05:00
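A minimal C++ sketch of the decision described in 137f785fa6. `needsAtomicLibcall` and `atomicLoadLibcall` are hypothetical helpers, not LLVM APIs. The 64-bit limit is an assumption for illustration, and the sized-versus-generic naming rule only roughly follows how libatomic's `__atomic_load_N` and `__atomic_load` entry points are chosen.

```cpp
#include <cassert>
#include <string>

// Assumed limit for illustration; the real value is whatever the target
// configures as its maximum supported atomic size.
constexpr unsigned kMaxAtomicBits = 64;

// True if an atomic of this width must be expanded to an __atomic_* libcall.
bool needsAtomicLibcall(unsigned sizeInBits, unsigned maxBits = kMaxAtomicBits) {
  return sizeInBits > maxBits;
}

// Name of the libatomic load routine for an access of `bytes` bytes:
// the sized variant (__atomic_load_N) when N is 1, 2, 4, 8 or 16 and the
// access is naturally aligned, otherwise the generic __atomic_load.
std::string atomicLoadLibcall(unsigned bytes, unsigned alignBytes) {
  const bool sizedSize =
      bytes == 1 || bytes == 2 || bytes == 4 || bytes == 8 || bytes == 16;
  const bool sized = sizedSize && alignBytes >= bytes;
  return sized ? "__atomic_load_" + std::to_string(bytes)
               : std::string("__atomic_load");
}
```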
Justin Bogner
4f54d71501
[HLSL][DirectX] Move handling of resource element types into the frontend
Rather than shepherding a type name all the way to the backend as a
string and attempting to parse it, get the element type out of the AST
and store that in the resource annotation metadata directly.

Pull Request: https://github.com/llvm/llvm-project/pull/75674
2023-12-18 11:43:52 -07:00
Simon Pilgrim
7b1e4239b3
[DAG] Fold (vt trunc (extload (vt x))) -> (vt load x) (#75229)
We were only folding cases which remained extloads, but DAG.getExtLoad can also handle the cases which don't need to extend at all (we just can't do truncloads).

reduceLoadWidth can handle this for scalar loads, but not for vectors.

Noticed while triaging D152928
2023-12-18 16:21:11 +00:00
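The three cases in 7b1e4239b3 can be summarized with a small C++ classifier over bit widths. This is a hypothetical model, not the DAGCombiner code. `classify` and `TruncExtLoadFold` are illustrative names, and for vectors the widths are per element with matching element counts assumed.

```cpp
#include <cassert>

// Possible outcomes for (trunc (extload x)) in this sketch.
enum class TruncExtLoadFold { None, NarrowerExtLoad, PlainLoad };

// All widths are in bits (per element for vectors; element counts are
// assumed to match).
//   memBits:   width read from memory by the extload
//   extBits:   width of the extload's result
//   truncBits: width the result is truncated to
TruncExtLoadFold classify(unsigned memBits, unsigned extBits, unsigned truncBits) {
  if (truncBits >= extBits)
    return TruncExtLoadFold::None;             // not a truncation
  if (truncBits > memBits)
    return TruncExtLoadFold::NarrowerExtLoad;  // still an extload, to a narrower type
  if (truncBits == memBits)
    return TruncExtLoadFold::PlainLoad;        // (vt trunc (extload (vt x))) -> (vt load x)
  return TruncExtLoadFold::None;               // would need a truncating load
}
```

The `PlainLoad` case is the one this commit adds; the `None` result for `truncBits < memBits` reflects that there are no truncating loads.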
Nathan Sidwell
d0285a31c8
aarch64: fix testcase (#75723)
Add missing < %s to RUN line.
2023-12-18 11:02:44 -05:00
Momchil Velikov
fd527def7e
[Clang][SVE2.1] Add floating-point variants of svrevd_XX (#75117) 2023-12-18 15:52:28 +00:00
Ulrich Weigand
82a1bffd34
[SelectionDAG] Do not crash on large integers in CheckInteger (#75787)
The CheckInteger routine called from TableGen-generated selection logic
uses getSExtValue, which aborts if the underlying APInt does not
fit into an int64_t.

This case is now triggered by the SystemZ back-end since i128 is a legal
type on certain machines. While we do not have any regular instructions
that take 128-bit immediates (like most other platforms), there are
patterns in the .td files that recognize an i128 "xor ..., -1" as a
"not".

These patterns cause code to be generated that calls the CheckInteger
routine on some i128-valued integer, which may trigger the assert.

Fix by using trySExtValue instead.

Fixes https://github.com/llvm/llvm-project/issues/75710
2023-12-18 14:03:57 +01:00
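A sketch of why switching to `trySExtValue` removes the crash, using the GCC/Clang `__int128` extension as a stand-in for a 128-bit `APInt`. `trySExt` mimics the `std::optional<int64_t>` result of `APInt::trySExtValue`. `checkInteger` is a hypothetical simplification of the TableGen-driven check, not its actual code.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Stand-in for APInt::trySExtValue on a 128-bit value (uses the GCC/Clang
// __int128 extension): yields the value only if it fits in int64_t.
std::optional<int64_t> trySExt(__int128 v) {
  if (v < INT64_MIN || v > INT64_MAX)
    return std::nullopt;
  return static_cast<int64_t>(v);
}

// Hypothetical, simplified CheckInteger: does the constant equal the value
// the pattern expects? Out-of-range constants simply fail to match instead
// of tripping an assertion.
bool checkInteger(__int128 constant, int64_t expected) {
  std::optional<int64_t> v = trySExt(constant);
  return v.has_value() && *v == expected;
}
```

An all-ones i128 (the `xor ..., -1` case) still sign-extends to -1 and matches; a constant such as 2^64 now just fails to match.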
Serge Pavlov
2f81788067
[ARM][FPEnv] Lowering of fpmode intrinsics (#74054)
LLVM intrinsics `get_fpmode`, `set_fpmode` and `reset_fpmode` operate on
control modes, the bits of the FP environment that affect FP operations. On
ARM these bits are in FPSCR together with the status bits. The
implementation of these intrinsics produces code close to that of
functions `fegetmode` and `fesetmode` from GLIBC.

Pull request: https://github.com/llvm/llvm-project/pull/74054
2023-12-18 18:57:36 +07:00
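Because control and status bits share FPSCR, the three intrinsics reduce to masking and read-modify-write. The C++ sketch below is hypothetical. Its mask covers only a few well-known control fields (RMode, FZ, DN), which is an assumption and not the exact mask used by the ARM lowering, and the all-zero reset value is likewise assumed.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative control-bit mask (assumption, not LLVM's exact mask):
// RMode = bits [23:22], FZ = bit 24, DN = bit 25.
constexpr uint32_t kControlMask = (0x3u << 22) | (1u << 24) | (1u << 25);

// get_fpmode: extract only the control bits.
uint32_t getFPMode(uint32_t fpscr) { return fpscr & kControlMask; }

// set_fpmode: replace the control bits, preserving status flags.
uint32_t setFPMode(uint32_t fpscr, uint32_t mode) {
  return (fpscr & ~kControlMask) | (mode & kControlMask);
}

// reset_fpmode: restore control bits to a default, assumed here to be all
// zeros (round to nearest, no flush-to-zero, no default NaN).
uint32_t resetFPMode(uint32_t fpscr) { return fpscr & ~kControlMask; }
```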
Ulrich Weigand
a00c4220be [SystemZ] Fix complex address matching when i128 is legal
Complex address matching currently handles truncations, under
the assumption that those are no-ops. This is no longer true
when i128 is legal. Change the code to only handle actual
no-op truncations.

Fixes https://github.com/llvm/llvm-project/issues/75708
Fixes https://github.com/llvm/llvm-project/issues/75714
2023-12-18 12:47:45 +01:00
Yeting Kuo
b83b28779e
[RISCV] Make Zhinx and Zvfh imply Zhinxmin and Zvfhmin respectively (#75735)
Zhinxmin is a subset of Zhinx and Zvfhmin is also a subset of Zvfh.
2023-12-18 11:46:22 +08:00
Arthur Eubanks
68c976bf64 [X86] Fix referencing local tagged globals
We should treat the medium code model like the small code model.
Classifying non-local references already properly handled this.
2023-12-17 13:49:50 -08:00
melonedo
3eaed9e6f5
[RISCV] Implement intrinsics for XCVbitmanip extension in CV32E40P (#74993)
Implement XCVbitmanip intrinsics for CV32E40P according to the
specification.

This commit is part of a patch-set to upstream the vendor specific
extensions of CV32E40P that need LLVM intrinsics to implement Clang
builtins.

Contributors: @CharKeaney, @ChunyuLiao, @jeremybennett, @lewis-revill,
@NandniJamnadas, @PaoloS02, @simonpcook, @xingmingjie.

Spec:
05481cf0ef/specifications/corev-builtin-spec.md (listing-of-pulp-bit-manipulation-builtins-xcvbitmanip).

Previously reviewed on Phabricator: https://reviews.llvm.org/D157510.
Parallel GCC patch:
https://gcc.gnu.org/pipermail/gcc-patches/2023-November/635795.html.

Co-authored-by: melonedo <funanzeng@gmail.com>
2023-12-17 19:29:40 +08:00
Carl Ritson
5139299618
[AMDGPU] Track physical VGPRs used for SGPR spills (#75573)
Physical VGPRs used for SGPR spills need to be tracked independently of
WWM reserved registers. The WWM reserved set contains extra registers
allocated during the WWM pre-allocation pass.

Without separate tracking, SGPR spills allocated after WWM pre-allocation
can overlap with WWM register usage, e.g. if the frame pointer is spilled
during prologue/epilogue insertion.
2023-12-17 16:44:16 +09:00
Craig Topper
c26510a2bf [RISCV] Fix intrinsic names in sf_vfwmacc_4x4x4.ll. NFC
The type strings in the intrinsic name were using f16 instead of
bf16 for float types. Nothing really checks these strings so everything
still worked.
2023-12-16 14:54:50 -08:00
Stefan Pintilie
c398fa009a Revert "Reapply "RegisterCoalescer: Add implicit-def of super register when coalescing SUBREG_TO_REG""
This reverts commit f4b5be1ecdc85ca4257b739afb8d57e23c7a8030.

The above change was breaking the clang-ppc64le-linux-test-suite bot.
2023-12-16 07:30:53 -06:00
Yeting Kuo
5545b25452
[RISCV] Make Zfh imply Zfhmin. (#75576)
According to the spec, the Zfhmin extension is a subset of the Zfh
extension.
2023-12-16 11:22:07 +08:00
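The implied-extension rules from this commit and from b83b28779e amount to computing a closure over an implication table. The C++ sketch below uses a hypothetical table holding only those three rules (real RISC-V extension handling has many more entries) and uses lowercase names as they would appear in a `-march` string.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical implication table holding only the rules stated in the log.
const std::map<std::string, std::vector<std::string>> kImplies = {
    {"zfh", {"zfhmin"}},
    {"zhinx", {"zhinxmin"}},
    {"zvfh", {"zvfhmin"}},
};

// Returns the enabled extensions plus everything they (transitively) imply.
std::set<std::string> expandImplied(const std::vector<std::string> &enabled) {
  std::set<std::string> result;
  std::vector<std::string> worklist(enabled);
  while (!worklist.empty()) {
    std::string ext = worklist.back();
    worklist.pop_back();
    if (!result.insert(ext).second)
      continue;  // already expanded
    auto it = kImplies.find(ext);
    if (it != kImplies.end())
      for (const std::string &dep : it->second)
        worklist.push_back(dep);
  }
  return result;
}
```

A worklist keeps the expansion transitive, so chained implications would be followed without extra code.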
Arthur Eubanks
b3e353d263
[X86] Don't use rip-relative lea to get a function address in medium static mode (#75656)
This essentially reverts https://reviews.llvm.org/D140593. Somewhere
along the line we properly fixed the medium code model to assume
functions are small, so now we get a 32-bit movl as desired.
2023-12-15 15:15:18 -08:00
Paul Kirth
9a578a9f60
Revert "[StackColoring] Delete dead stack slots (#75351)" (#75655)
This reverts commit 08b306dc8e7c0b2498f4f194a3c51686d56dbd20.

It causes the following assertion failure:
llvm/include/llvm/CodeGen/MachineFrameInfo.h:530: int64_t
llvm::MachineFrameInfo::getObjectOffset(int) const: Assertion
`!isDeadObjectIndex(ObjectIdx) && "Getting frame offset for a dead
object?"' failed.
2023-12-15 13:32:39 -08:00
Arthur Eubanks
809ee6cfcf [X86][test] Update tagged-globals*.ll tests
Use update_llc_test_checks.py.

Split out jump table tests into a separate file since we don't want to check the exact instruction sequence for them.
2023-12-15 12:54:55 -08:00
Ulrich Weigand
59f7f35a90 [SystemZ] ABI support for single-element vector types
Support passing and returning values of single-element vector
types (i.e. <1 x i128> and <1 x fp128>).

Now that i128 is a legal type, supporting these types can be
done simply by providing a getRegisterTypeForCallingConv
implementation that handles them.

Fixes https://github.com/llvm/llvm-project/issues/61291
2023-12-15 19:31:00 +01:00
Philip Reames
e8a15eca92
[RISCV] Prefer whole register loads and stores when VL=VLMAX (#75531)
If we're lowering a fixed length vector load or store which happens to
be exactly VLEN in size (when VLEN is exactly known), we can use a whole
register load or store instead of the unit strided variants. This
doesn't require a vsetvli in some cases, allows more flexibility for
vsetvli insertion in others, and doesn't have a runtime dependency on the
value of VL.
2023-12-15 09:26:57 -08:00
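A hypothetical C++ sketch of the selection rule described in e8a15eca92. It uses a whole register load (`vl1re<SEW>.v`) only when VLEN is known exactly and the fixed-length vector is exactly VLEN bits, and otherwise falls back to the unit-strided form (`vle<SEW>.v`). The helper names are illustrative, and the sketch ignores LMUL > 1 and stores.

```cpp
#include <cassert>
#include <optional>
#include <string>

// True when a fixed-length vector of vectorBits fills exactly one vector
// register, which requires VLEN to be known exactly.
bool canUseWholeRegister(unsigned vectorBits, std::optional<unsigned> exactVLen) {
  return exactVLen.has_value() && vectorBits == *exactVLen;
}

// Mnemonic chosen for a load with element width eltBits (SEW).
std::string loadMnemonic(unsigned vectorBits, unsigned eltBits,
                         std::optional<unsigned> exactVLen) {
  const std::string sew = std::to_string(eltBits);
  return canUseWholeRegister(vectorBits, exactVLen) ? "vl1re" + sew + ".v"
                                                    : "vle" + sew + ".v";
}
```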
Craig Topper
93b14c3df1
[RISCV] Add some vsetvli insertion test cases with vmv.s.x+reduction. NFC (#75544)
These test cases were intended to get a single vsetvli by using the
vmv.s.x intrinsic with the same LMUL as the reduction. This works for
FP, but does not work for integer.

I believe #71501 will break this for FP too. Hopefully the vsetvli pass
can be taught to fix this.
2023-12-15 08:50:54 -08:00
Mariusz Sikora
414d27419f
[AMDGPU] GFX12: select @llvm.prefetch intrinsic (#74576)
Co-authored-by: Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>
2023-12-15 17:15:55 +01:00
Jessica Del
32f9983c06
[AMDGPU] - Add address space for strided buffers (#74471)
This is an experimental address space for strided buffers. These buffers
can have structs as elements and a stride > 1. These pointers allow
indexed access in units of the stride, i.e., they point at
`buffer[index * stride]`. Thus, we can use the `idxen` modifier for
buffer loads.

We assign address space 9 to 192-bit buffer pointers which contain a
128-bit descriptor, a 32-bit offset and a 32-bit index. Essentially,
they are fat buffer pointers with an additional 32-bit index.
2023-12-15 15:49:25 +01:00
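The 192-bit pointer layout from 32f9983c06 can be pictured as a plain C++ struct. This is a hypothetical host-side view, not the in-register representation. The address formula `index * stride + offset` and the idea that advancing by one element bumps the index are assumptions made for illustration. In hardware the stride comes from the descriptor, so here it is passed as a parameter.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical C++ view of the 192-bit strided buffer pointer:
// 128-bit descriptor + 32-bit offset + 32-bit index.
struct StridedBufferPtr {
  uint32_t descriptor[4];  // 128-bit buffer resource descriptor (opaque here)
  uint32_t offset;         // 32-bit byte offset
  uint32_t index;          // 32-bit element index
};
static_assert(sizeof(StridedBufferPtr) * 8 == 192, "expected a 192-bit pointer");

// Byte position of the addressed element, assuming index * stride + offset.
uint64_t bytePosition(const StridedBufferPtr &p, uint32_t stride) {
  return static_cast<uint64_t>(p.index) * stride + p.offset;
}

// Assumed: advancing by one element bumps the index, not the byte offset.
StridedBufferPtr nextElement(StridedBufferPtr p) {
  ++p.index;
  return p;
}
```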
Mirko Brkušanin
07a6d73664
[AMDGPU] CodeGen for GFX12 VFLAT, VSCRATCH and VGLOBAL instructions (#75493) 2023-12-15 15:01:40 +01:00
Mirko Brkušanin
5879162f7f
[AMDGPU] CodeGen for GFX12 VBUFFER instructions (#75492) 2023-12-15 13:45:03 +01:00
Ulrich Weigand
a65ccc1b9f
[SystemZ] Support i128 as legal type in VRs (#74625)
On processors supporting vector registers and SIMD instructions, enable
i128 as legal type in VRs. This allows many operations to be implemented
via native instructions directly in VRs (including add, subtract,
logical operations and shifts). For a few other operations (e.g.
multiply and divide, as well as atomic operations), we need to move the
i128 value back to a GPR pair to use the corresponding instruction
there. Overall, this is still beneficial.

The patch includes the following LLVM changes:
- Enable i128 as legal type
- Set up legal operations (in SystemZInstrVector.td)
- Custom expansion for i128 add/subtract with carry
- Custom expansion for i128 comparisons and selects
- Support for moving i128 to/from GPR pairs when required
- Handle 128-bit integer constant values everywhere
- Use i128 as intrinsic operand type where appropriate
- Updated and new test cases

In addition, clang builtins are updated to reflect the intrinsic operand
type changes (which also improves compatibility with GCC).
2023-12-15 12:55:15 +01:00
Mirko Brkušanin
26b14aedb7
[AMDGPU] CodeGen for GFX12 VIMAGE and VSAMPLE instructions (#75488) 2023-12-15 12:40:23 +01:00
Pierre van Houtryve
ef067f5204
[AMDGPU][SIInsertWaitcnts] Do not add s_waitcnt when the counters are known to be 0 already (#72830)
Co-authored-by: Juan Manuel MARTINEZ CAAMAÑO <juamarti@amd.com>
2023-12-15 12:33:32 +01:00
Mirko Brkušanin
a278ac577e
[AMDGPU] CodeGen for SMEM instructions (#75579) 2023-12-15 12:10:33 +01:00
chuongg3
70579c95bd
[AArch64][GlobalISel] Look into array's element (#74109)
In AArch64RegisterBankInfo, IsFPOrFPType() does not work correctly
with ArrayTypes and StructTypes, as it does not look at their
elements.

This caused some registers to be selected as gpr instead of fpr.
2023-12-15 10:46:57 +00:00
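To make the failure mode in 70579c95bd concrete, here is a C++ sketch on a toy type model. The `Ty` struct and both functions are hypothetical and are not LLVM's `Type` API. A check that only inspects the top-level type classifies `[2 x float]` as non-FP, while recursing into the array element classifies it as FP. Struct types, also mentioned in the commit, are left out of this sketch.

```cpp
#include <cassert>
#include <vector>

// Toy type model (hypothetical; not LLVM's Type API).
struct Ty {
  enum Kind { Int, Float, Vector, Array } kind;
  std::vector<Ty> elems;  // single element type for Vector and Array
};

// Top-level-only check: looks at the type itself and vector elements.
bool isFPOrFPVectorOnly(const Ty &t) {
  return t.kind == Ty::Float ||
         (t.kind == Ty::Vector && t.elems[0].kind == Ty::Float);
}

// Array-aware check: recurse into array (and vector) element types.
bool isFPLike(const Ty &t) {
  switch (t.kind) {
  case Ty::Float:
    return true;
  case Ty::Vector:
  case Ty::Array:
    return isFPLike(t.elems[0]);
  default:
    return false;
  }
}
```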
Mariusz Sikora
229273f538
[AMDGPU] Update permlane test for GFX12 (#75572) 2023-12-15 11:18:23 +01:00
mohammed-nurulhoque
08b306dc8e
[StackColoring] Delete dead stack slots (#75351)
Deletes slots that have lifetime markers and whose lifetime ranges are empty.
2023-12-15 09:58:19 +00:00
Mirko Brkušanin
569ef8ddd9
[AMDGPU] Add pseudo scalar trans instructions for GFX12 (#75204) 2023-12-15 10:41:05 +01:00
Carl Ritson
0ed0b7458a [AMDGPU] Pre-commit test for #75573. NFC
Shows spill allocation overlapping with WWM register use.
2023-12-15 18:29:08 +09:00
Mariusz Sikora
966416b9e8
[AMDGPU][GFX12] Add new v_permlane16 variants (#75475) 2023-12-15 10:14:38 +01:00
Pierre van Houtryve
f1ea77f7be
[AMDGPU][SIInsertWaitcnts] Set initial state for VS_CNT in non-kernel functions (#75436)
Split from #72830
2023-12-15 08:31:14 +01:00
Wang Yaduo
c532ba4edd [RISCV] Support printing immediate of RISCV MCInst in hexadecimal format (#74053)
Enable llvm-objdump to print the immediates of RISC-V instructions
in hexadecimal format with the --print-imm-hex flag.
2023-12-14 22:42:11 -08:00
Vitaly Buka
fc3adf74d3
Revert "[RISCV] Support printing immediate of RISCV MCInst in hexadecimal format" (#75561)
Reverts llvm/llvm-project#74053

Breaks https://lab.llvm.org/buildbot/#/builders/5/builds/39291

Co-authored-by: Wang Yaduo <wangyaduo@linux.alibaba.com>

Issue #75563
2023-12-14 22:05:47 -08:00