52796 Commits

Author SHA1 Message Date
wanglei
a60a5421b6 Reland "[LoongArch] Support CTLZ with lsx/lasx"
This patch simultaneously adds tests for `CTPOP`.

This relands 07cec73dcd095035257eec1f213d273b10988130 with fixed tests.
2023-12-02 17:22:40 +08:00
wanglei
63e6bba0c3 Revert "[LoongArch] Support CTLZ with lsx/lasx"
This reverts commit 07cec73dcd095035257eec1f213d273b10988130.
2023-12-02 17:17:48 +08:00
wanglei
07cec73dcd [LoongArch] Support CTLZ with lsx/lasx
This patch simultaneously adds tests for `CTPOP`.
2023-12-02 17:13:36 +08:00
wanglei
66a3e4fafb [LoongArch] Override TargetLowering::isShuffleMaskLegal
By default, `isShuffleMaskLegal` always returns true, which can result
 in the expansion of `BUILD_VECTOR` into a `VECTOR_SHUFFLE` node in
 certain situations. Subsequently, the `VECTOR_SHUFFLE` node is expanded
 again into a `BUILD_VECTOR`, leading to an infinite loop.
 To address this, we always return false, allowing the expansion of
 `BUILD_VECTOR` through the stack.
2023-12-02 14:25:17 +08:00
Arthur Eubanks
d8a04398f9 Reland [X86] With large code model, put functions into .ltext with large section flag (#73037)
So that when mixing small and large text, large text stays out of the
way of the rest of the binary.

This is useful for mixing precompiled small code model object files and
built-from-source large code model binaries so that the text
sections don't get merged.

The reland fixes an issue where a function in the large code model would reference small data without GOTOFF.

This was incorrectly reverted in 76f78ecc789d58baa3a88b2fe2a57428f07e5362.
2023-12-01 14:23:44 -08:00
Craig Topper
7e7aaa53a1
[RISCV][GISel] Support G_ABS with Zbb. (#72939)
We can use neg+max or negw+max.
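
As a rough illustration of the identity this relies on, abs(x) can be computed as the signed maximum of x and its wrapping negation. The sketch below models that arithmetic only; the helper names (`to_signed`, `neg`, `negw`, `abs64`, `abs32`) are hypothetical and not taken from the patch.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def neg(x):
    """Model of a 64-bit wrapping negate."""
    return to_signed(-x, 64)


def negw(x):
    """Model of a 32-bit wrapping negate whose result is sign-extended."""
    return to_signed(-x, 32)


def abs64(x):
    # Signed max of x and -x; the most negative value wraps to itself.
    return max(x, neg(x))


def abs32(x):
    # Assumes x is a 32-bit value held sign-extended in a wider register.
    return max(x, negw(x))
```

Note that the most negative input maps to itself in both widths, which matches ordinary wrapping abs semantics.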
2023-12-01 11:13:45 -08:00
David Green
aa7e873f2f [AArch64] Regenerate fmin/fmax/memcpy legalization tests. NFC 2023-12-01 19:04:29 +00:00
Philip Reames
e817966718
[RISCV] Collapse fast unaligned access into a single feature [nfc-ish] (#73971)
When we'd originally added unaligned-scalar-mem and
unaligned-vector-mem, they were separated into two parts under the
theory that some processor might implement one, but not the other. At
the moment, we don't have evidence of such a processor. The C/C++ level
interface, and the clang driver command lines have settled on a single
unaligned flag which indicates both scalar and vector support unaligned.
Given that, let's remove the test matrix complexity for a set of
configurations which don't appear useful.

Given these are internal feature names, I don't think we need to provide
any forward compatibility. Anyone disagree?

Note: The immediate trigger for this patch was finding another case
where the unaligned-vector-mem wasn't being properly serialized to IR
from clang which resulted in problems reproducing assembly from clang's
-emit-llvm feature. Instead of fixing this, I decided getting rid of the
complexity was the better approach.
2023-12-01 11:00:59 -08:00
Craig Topper
f866fde598
[RISCV][GISel] Lower G_FCONSTANT to constant pool load without F or D. (#73034)
I used an IR test because it was easier than constructing different MIR
test for each type of addressing.
2023-12-01 10:24:26 -08:00
Mircea Trofin
7832a8582a [mlgo] Fix test post PR #73899
Opcode value change.
2023-12-01 09:05:22 -08:00
Dmitri Gribenko
76f78ecc78 Revert "Reland [X86] With large code model, put functions into .ltext with large section flag (#73037)"
This reverts commit 4bf8a688956a759b7b6b8d94f42d25c13c7af130.

This commit seems to be breaking the semantics of the
ObjectFile::isSectionText method, which breaks numba/llvmlite bindings.
2023-12-01 17:18:14 +01:00
Jon Roelofs
39d15a7d3b
[AArch64][SME] Remove implicit-def's on smstart (#69012)
When we lower calls, the sequence of argument copy-to-reg nodes are
glued to the smstart. In the InstrEmitter, these glued copies are turned
into implicit defs, since the actual call instruction uses those
physregs, resulting in the register allocator adding unnecessary copies
of regs that are preserved anyway.
2023-12-01 07:34:22 -08:00
Matthew Devereau
e59a0cd7d8
[AArch64][SME2] Add SME2 builtins for zero { zt0 } (#72274)
See https://github.com/ARM-software/acle/pull/217

Patch by: Kerry McLaughlin kerry.mclaughlin@arm.com
2023-12-01 14:30:39 +00:00
Matt Devereau
5fe7ae848c [AArch64][SME2] Add ldr_zt, str_zt builtins and intrinsics (#72849)
Adds the builtins:
void svldr_zt(uint64_t zt, const void *rn)
void svstr_zt(uint64_t zt, void *rn)

And the intrinsics:
call void @llvm.aarch64.sme.ldr.zt(i32, ptr)
tail call void @llvm.aarch64.sme.str.zt(i32, ptr)

Patch by: Kerry McLaughlin kerry.mclaughlin@arm.com
2023-12-01 09:34:38 +00:00
Ramkumar Ramachandra
4d1dc7770a
AMDGPU/load-global-i32: regenerate test using UTC (NFC) (#73962)
Fix the RUN lines so that UTC runs cleanly, and regenerate the test
load-global-i32.ll using utils/update_llc_test_checks.py.
2023-12-01 09:22:13 +00:00
Ramkumar Ramachandra
d48d1edcf3
PowerPC/aix-cc-abi: regenerate test using UTC (NFC) (#73963)
Split out the parts of aix-cc-abi.ll that need to be regenerated by
utils/update_mir_test_checks.py into aix-cc-abi-mir.ll, and regenerate
it using the script. Regenerate aix-cc-abi.ll using
utils/update_llc_test_checks.py.
2023-12-01 08:22:18 +00:00
Valery Pykhtin
604c29e934
[AMDGPU] NFC. Add test for debug info on CFG annotation instructions. (#73959) 2023-12-01 09:10:29 +01:00
Piyou Chen
5ecb37b45a
[RISCV] Make InitUndef handle undef operand (#65755)
Address https://github.com/llvm/llvm-project/issues/65704.

If an operand is marked as undef, InitUndef misses it. This patch makes the InitUndef pass handle the undef operand case.
2023-12-01 01:04:42 -06:00
Uday Bondhugula
173fcf7da5
[NVPTX] Lower 16xi8 and 8xi8 stores efficiently (#73646)
Lower 16xi8 vector stores in NVPTX ISel efficiently using
st.v4.b32 instead of multiple st.v4.u8 along the lines of vector loads
and 8xf16. Similarly, 8xi8 using st.v2.u32.
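
A minimal sketch of the packing idea, assuming lane 0 lands in the low byte of each 32-bit word (little-endian lane order is an assumption here, and `pack_bytes_to_words` is a hypothetical helper, not code from the patch). Sixteen 8-bit lanes become four 32-bit words, and eight lanes become two:

```python
def pack_bytes_to_words(values):
    """Pack groups of four 8-bit lanes into 32-bit words, lane 0 in the low byte."""
    if len(values) % 4 != 0:
        raise ValueError("lane count must be a multiple of 4")
    words = []
    for i in range(0, len(values), 4):
        word = 0
        for lane, byte in enumerate(values[i:i + 4]):
            word |= (byte & 0xFF) << (8 * lane)
        words.append(word)
    return words
```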
2023-12-01 11:00:01 +05:30
leecheechen
dbbc7c31c8
[LoongArch] Add some binary IR instructions testcases for LASX (#74031)
The IR instructions include:
- Binary Operations: add fadd sub fsub mul fmul udiv sdiv fdiv
- Bitwise Binary Operations: shl lshr ashr
2023-12-01 13:14:11 +08:00
Craig Topper
755c28a940
[GISel][Mips] Infer alignment when creating memory operand for G_VASTART. (#74004) 2023-11-30 19:55:23 -08:00
Piyou Chen
d0a39e617b
[RISCV] default enable splitting regalloc between RVV and other (#72950)
This patch makes riscv-split-regalloc true by default.

It will not affect the codegen result if there is no vector register
allocation. If there is vector register allocation, it may affect the
segment/weight of the non-RVV registers' LiveIntervals, which makes the
allocation happen in a different order.
2023-11-30 21:12:46 -06:00
wanglei
ca66df3b02 [LoongArch] Add more and/or/xor patterns for vector types 2023-12-01 10:28:41 +08:00
wanglei
add224c0a0 [LoongArch] Custom lowering ISD::BUILD_VECTOR 2023-12-01 09:13:39 +08:00
wanglei
f2cbd1fdf7 [LoongArch] Add codegen support for insertelement 2023-12-01 09:13:39 +08:00
Paul Kirth
cfe1ece833
[clang][llvm][fatlto] Avoid cloning modules in FatLTO (#72180)
https://github.com/llvm/llvm-project/issues/70703 pointed out that
cloning LLVM modules could lead to miscompiles when using FatLTO.

This is due to an existing issue when cloning modules with labels (see
#55991 and #47769). Since this can lead to miscompilation, we can avoid
cloning the LLVM modules, which was desirable anyway.

This patch modifies the EmbedBitcodePass to no longer clone the module
or run an input pipeline over it. Further, it makes FatLTO always perform
UnifiedLTO, so we can still defer the Thin/Full LTO decision to
link-time. Lastly, it removes dead/obsolete code related to now defunct
options that do not work with the EmbedBitcodePass implementation any
longer.
2023-11-30 17:09:34 -08:00
Arthur Eubanks
4bf8a68895 Reland [X86] With large code model, put functions into .ltext with large section flag (#73037)
So that when mixing small and large text, large text stays out of the
way of the rest of the binary.

This is useful for mixing precompiled small code model object files and
built-from-source large code model binaries so that the text
sections don't get merged.

The reland fixes an issue where a function in the large code model would reference small data without GOTOFF.
2023-11-30 15:17:17 -08:00
Jeffrey Byrnes
1b02f594b3
[AMDGPU] Rework dot4 signedness checks (#68757)
Using the known/unknown value of the sign bit, reason about the signedness version of the dot4 instruction.
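
One fact that makes this kind of reasoning possible: when the sign bit of every 8-bit lane is known to be zero, the signed and unsigned readings of that operand give the same dot product. The sketch below only demonstrates that property; it does not reproduce the patch's actual selection rules, and all names in it are hypothetical.

```python
def as_signed8(byte):
    """Reinterpret an 8-bit value as two's-complement."""
    return byte - 256 if byte & 0x80 else byte


def sign_bits_known_zero(lanes):
    """True when no lane has its top bit set."""
    return all((lane & 0x80) == 0 for lane in lanes)


def dot4(a, b, a_signed, b_signed):
    """Dot product of four 8-bit lanes with per-operand signedness."""
    total = 0
    for x, y in zip(a, b):
        x = as_signed8(x) if a_signed else x
        y = as_signed8(y) if b_signed else y
        total += x * y
    return total
```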
2023-11-30 13:38:05 -08:00
Eduard Zingerman
2484469803 Revert "[BPF] Attribute preserve_static_offset for structs"
This reverts commit cb13e9286b6d4e384b5d4203e853d44e2eff0f0f.
Buildbot reports MSAN failures in tests added in this commit:
https://lab.llvm.org/buildbot/#/builders/5/builds/38806

Failing tests:
  LLVM :: CodeGen/BPF/preserve-static-offset/load-arr-pai.ll
  LLVM :: CodeGen/BPF/preserve-static-offset/load-ptr-pai.ll
  LLVM :: CodeGen/BPF/preserve-static-offset/load-struct-pai.ll
  LLVM :: CodeGen/BPF/preserve-static-offset/load-union-pai.ll
  LLVM :: CodeGen/BPF/preserve-static-offset/store-pai.ll
2023-11-30 22:29:45 +02:00
Natalie Chouinard
f8a21dff70
[SPIR-V] Mark currently failing tests as XFAIL (#73858)
These tests are currently failing and their fix is being tracked in
Issue #60133. Marking them as XFAIL for now will get the test suite to a
passing state so we can work on adding a GitHub action to automatically
run these tests on a PR bot to help keep the tree green.

Also removed the no-longer supported -opaque-pointers=0 flag from the
couple tests where it was remaining.
2023-11-30 15:17:32 -05:00
Douglas Yung
c12de14876 Revert "[X86] Canonicalize fp zero vectors from bitcasted integer zero vectors"
This reverts commit 169db80e41936811c6744f2c513a1ed00d97f10e.

This change is causing many test failures on Windows bots:
- https://lab.llvm.org/buildbot/#/builders/235/builds/3616
- https://lab.llvm.org/buildbot/#/builders/233/builds/4883
- https://lab.llvm.org/buildbot/#/builders/216/builds/31174
2023-11-30 11:59:50 -08:00
Simon Pilgrim
169db80e41 [X86] Canonicalize fp zero vectors from bitcasted integer zero vectors
Generic code is supposed to handle this but can be blocked by hasOneUse checks.

Noticed while investigating #26392
2023-11-30 18:33:52 +00:00
Simon Pilgrim
539e60c34a [X86] X86FixupVectorConstantsPass - consistently use non-DQI 128/256-bit subvector broadcasts
Without the predicate there's no benefit to using the DQI variants instead of the default AVX512F instructions
2023-11-30 18:33:52 +00:00
Eduard Zingerman
cb13e9286b [BPF] Attribute preserve_static_offset for structs
This commit adds a new BPF specific structure attribute
`__attribute__((preserve_static_offset))` and a pass to deal with it.

This attribute may be attached to a struct or union declaration, where
it notifies the compiler that this structure is a "context" structure.
The following limitations apply to context structures:
- runtime environment might patch access to the fields of this type by
  updating the field offset;

  BPF verifier limits access patterns allowed for certain data
  types. E.g. `struct __sk_buff` and `struct bpf_sock_ops`. For these
  types only `LD/ST <reg> <static-offset>` memory loads and stores are
  allowed.

  This is so because offsets of the fields of these structures do not
  match real offsets in the running kernel. During BPF program
  load/verification loads and stores to the fields of these types are
  rewritten so that offsets match real offsets. For this rewrite to
  happen static offsets have to be encoded in the instructions.

  See `kernel/bpf/verifier.c:convert_ctx_access` function in the Linux
  kernel source tree for details.

- runtime environment might disallow access to the field of the type
  through modified pointers.

  During BPF program verification a tag `PTR_TO_CTX` is tracked for
  register values. If a register with such a tag is modified, BPF
  programs are not allowed to read or write memory using that register.
  See the `kernel/bpf/verifier.c:check_mem_access` function in the Linux
  kernel source tree for details.

Access to the structure fields is translated to IR as a sequence:
- `(load (getelementptr %ptr %offset))` or
- `(store (getelementptr %ptr %offset))`

During instruction selection phase such sequences are translated as a
single load instruction with embedded offset, e.g. `LDW %ptr, %offset`,
which matches access pattern necessary for the restricted
set of types described above (when `%offset` is static).

Multiple optimizer passes might separate these instructions, this
includes:
- SimplifyCFGPass (sinking)
- InstCombine (sinking)
- GVN (hoisting)

The `preserve_static_offset` attribute marks structures for which the
following transformations happen:
- at the early IR processing stage:
  - `(load (getelementptr ...))` replaced by call to intrinsic
    `llvm.bpf.getelementptr.and.load`;
  - `(store (getelementptr ...))` replaced by call to intrinsic
    `llvm.bpf.getelementptr.and.store`;
- at the late IR processing stage this modification is undone.

Such handling prevents various optimizer passes from generating
sequences of instructions that would be rejected by BPF verifier.

The __attribute__((preserve_static_offset)) has a priority over
__attribute__((preserve_access_index)). When preserve_access_index
attribute is present preserve access index transformations are not
applied.

This addresses the issue reported by the following thread:

https://lore.kernel.org/bpf/CAA-VZPmxh8o8EBcJ=m-DH4ytcxDFmo0JKsm1p1gf40kS0CE3NQ@mail.gmail.com/T/#m4b9ce2ce73b34f34172328f975235fc6f19841b6

Differential Revision: https://reviews.llvm.org/D133361
2023-11-30 19:45:03 +02:00
Momchil Velikov
cc944f502f
[AArch64] Stack probing for function prologues (#66524)
This adds code to AArch64 function prologues to protect against stack
clash attacks by probing (writing to) the stack at regular enough
intervals to ensure that the guard page cannot be skipped over.

The patch depends on and maintains the following invariants:

- Upon function entry the caller guarantees that it has probed the stack
  (e.g. performed a store) at some address [sp, #N], where `0 <= N <=
  1024`. This invariant comes from a requirement for compatibility with
  GCC.
- Any address range in the allocated stack, no smaller than
  stack-probe-size bytes, contains at least one probe.
- At any time the stack pointer is above or in the guard page.
- Probes are performed in decreasing address order.

The stack-probe-size is a function attribute that can be set by a
platform to correspond to the guard page size.

By default, the stack probe size is 4KiB, which is a safe default as
this is the smallest possible page size for AArch64. Linux uses a 64KiB
guard for AArch64, so this can be overridden by the stack-probe-size
function attribute.

For small frames without a frame pointer (<= 240 bytes), no probes are
needed.

For larger frame sizes, LLVM always stores x29 to the stack. This serves
as an implicit stack probe. Thus, while allocating stack objects the
compiler assumes that the stack has been probed at [sp].

There are multiple probing sequences that can be emitted, depending on
the size of the stack allocation:

- A straight-line sequence of subtracts and stores, used when the
  allocation size is smaller than 5 guard pages.
- A loop allocating and probing one page size per iteration, plus at most
  a single probe to deal with the remainder, used when the allocation
  size is larger but still known at compile time.
- A loop which moves the SP down to the target value held in a register
  (or a loop, moving a scratch register to the target value held in SP),
  used when the allocation size is not known at compile-time, such as
  when allocating space for SVE values, or when over-aligning the stack.
  This is emitted in AArch64InstrInfo because it will also be used for
  dynamic allocas in a future patch.
- A single probe where the amount of stack adjustment is unknown, but is
  known to be less than or equal to a page size.
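
The choice between the first two probing sequences described above (allocation size known at compile time) can be sketched roughly as follows. This is a simplified model under stated assumptions, not the code in the patch: `plan_stack_probes` and its parameters are hypothetical names, the 5-page threshold and 4KiB default come from the text, and the sketch always probes a non-zero remainder, which the real prologue may not do.

```python
def plan_stack_probes(alloc_size, probe_size=4096, inline_limit_pages=5):
    """Pick a probing strategy for an allocation size known at compile time."""
    if alloc_size < 0 or probe_size <= 0:
        raise ValueError("sizes must be non-negative and probe_size positive")
    full_pages, remainder = divmod(alloc_size, probe_size)
    if alloc_size < inline_limit_pages * probe_size:
        # Straight-line sequence: move SP down one page at a time, then store.
        steps = []
        for _ in range(full_pages):
            steps += [("sub", probe_size), ("probe",)]
        if remainder:
            # Simplification: always probe after allocating the remainder.
            steps += [("sub", remainder), ("probe",)]
        return {"kind": "inline", "steps": steps}
    # Larger allocation: one page per loop iteration, remainder handled after.
    return {"kind": "loop", "iterations": full_pages, "remainder": remainder}
```

Because each step moves SP down by at most `probe_size` bytes before storing, the probes in this model occur in decreasing address order and no gap between consecutive probes exceeds the probe size.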

---------

Co-authored-by: Oliver Stannard <oliver.stannard@linaro.org>
2023-11-30 17:41:51 +00:00
Michael Maitland
12326f5ff0
[RISCV][GISEL] lowerFormalArguments for variadic arguments (#73064)
This is based on the varargs code in RISCVTargetLowering::LowerFormalArguments.
2023-11-30 12:15:35 -05:00
David Green
4d80122598
[AArch64] Teach areMemAccessesTriviallyDisjoint about scalable widths. (#73655)
The base change here is to change getMemOperandWithOffsetWidth to return
a TypeSize Width, which in turn allows areMemAccessesTriviallyDisjoint
to reason about trivially disjoint widths.
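
A rough model of the disjointness test for two accesses off the same base, assuming offsets and widths are either both fixed or both multiples of the same runtime vector-length factor. The `Width` type and `trivially_disjoint` function are hypothetical stand-ins, not the LLVM implementation:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Width:
    known_min: int          # size in bytes, or minimum size when scalable
    scalable: bool = False  # True when the real size is known_min * vscale


def trivially_disjoint(offset_a, width_a, offset_b, width_b, offsets_scalable=False):
    """Return True when two same-base accesses provably do not overlap."""
    if offset_a <= offset_b:
        low_off, low_width, high_off = offset_a, width_a, offset_b
    else:
        low_off, low_width, high_off = offset_b, width_b, offset_a
    # Mixed fixed/scalable quantities cannot be compared; stay conservative.
    if low_width.scalable != offsets_scalable:
        return False
    # With a common scale factor, comparing the unscaled values is enough.
    return low_off + low_width.known_min <= high_off
```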
2023-11-30 16:54:28 +00:00
Michael Maitland
a6f7278595
[RISCV][GISEL] legalize, regbankselect, and instruction-select G_PTRMASK (#73062)
This is done in instruction-select instead of in legalization for the
sake of alias analysis.
2023-11-30 11:54:01 -05:00
Michael Maitland
dbb9043dea
[RISCV][GISEL] legalize, regbankselect, and instruction-select for G_… (#73061)
…[UN]MERGE_VALUES

When MERGE or UNMERGE operates on s64 on a non-64-bit subtarget, the
subtarget must have the D extension and use FPR for the operation to be
legal.

All other instances of MERGE and UNMERGE that can be made legal should
be narrowed, widened, or replaced by the combiner.
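
For orientation, a merge concatenates narrower values into a wider one and an unmerge splits it back. The sketch below models the s32/s64 case on plain integers, assuming the first part is the least significant; the helper names are hypothetical.

```python
def merge_values(lo, hi, part_bits=32):
    """Concatenate two part_bits-wide values, lo in the low half."""
    mask = (1 << part_bits) - 1
    return ((hi & mask) << part_bits) | (lo & mask)


def unmerge_values(value, part_bits=32):
    """Split a 2*part_bits-wide value into (lo, hi) halves."""
    mask = (1 << part_bits) - 1
    return value & mask, (value >> part_bits) & mask
```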
2023-11-30 11:53:25 -05:00
Michael Maitland
6976dac09d
[RISCV][GISEL] regbankselect and instruction-select for G_IMPLICIT_DEF (#73060)
This is similar to the selection of G_IMPLICIT_DEF in AArch64.
Regbankselect may need to be improved in a future patch.
2023-11-30 11:38:02 -05:00
Philip Reames
ff5e536b5e
[RISCV] Add combines to form binop from tail insert idioms (#72675)
This patch contains two related combines:
1) If we have a scalar vector insert into the result of a concat_vector,
   sink the insert into the operand of the concat.
2) If we have an insert of a scalar binop into a vector binop of the
   same opcode and the RHS of both are constant, perform the insert
   and then the binop.

The common theme to both is pushing inserts closer to the sources of the
computation graph. The goal is to enable forming vector bin ops from
inserts of scalar binops at the end of another vector.
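
The second combine can be checked on plain lists. The sketch below models vectors as Python lists and shows that inserting a scalar binop result into a vector binop result equals doing both inserts first and a single vector binop afterwards. All function names are hypothetical and this is not the DAG combine code itself.

```python
import operator


def insert(vec, scalar, idx):
    """Return a copy of vec with element idx replaced by scalar."""
    out = list(vec)
    out[idx] = scalar
    return out


def vec_binop(op, lhs, rhs):
    """Apply op lane by lane."""
    return [op(a, b) for a, b in zip(lhs, rhs)]


def before_combine(op, vec, vec_const, scalar, scalar_const, idx):
    # insert(binop(vec, C1), binop(scalar, c2), idx)
    return insert(vec_binop(op, vec, vec_const), op(scalar, scalar_const), idx)


def after_combine(op, vec, vec_const, scalar, scalar_const, idx):
    # binop(insert(vec, scalar, idx), insert(C1, c2, idx))
    return vec_binop(op, insert(vec, scalar, idx), insert(vec_const, scalar_const, idx))
```

Since the right-hand sides are constants, the inserted constant vector can be folded ahead of time, leaving one vector operation in place of a vector operation plus a scalar one.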

For RISCV specifically, the concat_vector transform will push inserts to
smaller vectors. This will have the effect of reducing lmul for the
vslides, and usually doesn't require an additional vsetvli since
the source vectors are already working in the narrower VL. I tried
that one as a target independent combine first, and it doesn't appear
profitable on all targets.

This is only one approach to the problem. Another idea would be to
aggressively form build_vectors and subvector inserts from the
individual scalar inserts, and then have a transform which sunk a
subvector_insert down through the concat. The advantage of the alternate
approach is that we expose parallelism in the insert sequence, even if
the source vector isn't a concat_vector. If reviewers are okay with it,
I'd like to start with this approach, and then explore that direction in
a follow up patch.
2023-11-30 07:32:42 -08:00
Piotr Sobczak
73d9f5fda6
[AMDGPU] Add test for GCNRegPressure tracker bug (#73786)
Add a test to document an existing problem in GCNRegPressure tracker.

The upward tracker does not count the registers used (16 of them) in
movrel instruction (for example V_INDIRECT_REG_WRITE_MOVREL_B32_V16).

The downward tracker counts the registers but reports a mismatch:
%0:L0000000000000C00 isn't found in LIS reported set
2023-11-30 16:26:44 +01:00
Shengchen Kan
a4e1aa256b
[X86][tablgen] Auto-gen broadcast tables (#73654)
1. Add TB_BCAST_SH for FP16
2. Auto-gen 4 broadcast tables BroadcastTable[1-4]

issue: https://github.com/llvm/llvm-project/issues/66360
2023-11-30 22:24:31 +08:00
David Spickett
99d485917a
[llvm][AArch64] Preserve regmask when expanding the BLR_BTI pseudo instruction (#73927)
Fixes #73787

Not doing so led to us making use of a register after the call, which
has been clobbered by the call.

Added an MIR test that runs only the pseudo expansion pass.
2023-11-30 14:23:26 +00:00
leecheechen
29a0f3ec2b
[LoongArch] Add some binary IR instructions testcases for LSX (#73929)
The IR instructions include:
- Binary Operations: add fadd sub fsub mul fmul udiv sdiv fdiv
- Bitwise Binary Operations: shl lshr ashr
2023-11-30 21:41:18 +08:00
Matt Arsenault
c44dca15a4
MachineVerifier: Reject extra non-register operands on instructions (#73758)
We were allowing extra immediate arguments, and only bothering to check
if registers were implicit or not.

Also consolidate extra operand checks in verifier, to make this
testable. We had 3 different places checking if you were trying to build
an instruction with more operands than allowed by the definition. We had
an assertion in addOperand, a direct check in the MIRParser to avoid the
assertion, and the machine verifier checks. Remove the assert and parser
check so the verifier can provide a consistent verification experience,
which will also handle instructions modified in place.
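
A toy model of the stricter check, assuming an instruction definition declares a fixed number of operands and that anything beyond it must be an implicit register operand. The `Operand` type and `verify_extra_operands` function are hypothetical, and variadic instructions are not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Operand:
    kind: str               # "reg" or "imm" in this toy model
    implicit: bool = False  # only meaningful for register operands


def verify_extra_operands(num_defined, operands):
    """Return error strings for operands beyond the defined count."""
    errors = []
    for index, op in enumerate(operands[num_defined:], start=num_defined):
        if op.kind != "reg":
            errors.append(f"operand {index}: extra non-register operand")
        elif not op.implicit:
            errors.append(f"operand {index}: extra explicit register operand")
    return errors
```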
2023-11-30 22:33:42 +09:00
David Green
269e3049ea [AArch64] Remove invalid check lines from sme-aarch64-svcount.ll. NFC 2023-11-30 12:15:54 +00:00
Paul Walker
4db451a87d
[LLVM][SVE] Honour calling convention when using SVE for fixed length vectors. (#70847)
NOTE: I'm not sure how many of the corner cases are part of the
documented ABI but that shouldn't matter because my goal is for
`-msve-vector-bits` to have no effect on the way arguments and returns
are processed.
2023-11-30 12:09:58 +00:00
Simon Pilgrim
1d20b009a0 [X86] Enable v8f16/v16f16/v32f16 FCOPYSIGN custom lowering on SSE2/AVX/AVX512 targets 2023-11-30 11:48:33 +00:00
Simon Pilgrim
e653e0303d [X86] Add fcopysign vector test coverage 2023-11-30 11:33:39 +00:00