52796 Commits

Author SHA1 Message Date
Farzon Lotfi
36b86438d7
[DXIL] Implement pow lowering (#86733)
closes #86179
- `DXILIntrinsicExpansion.cpp` - add the pow expansion to
exp2(y*log2(x))
2024-03-28 12:32:28 -04:00
Craig Topper
f90813543b
[MCP] Use MachineInstr::all_defs instead of MachineInstr::defs in hasOverlappingMultipleDef. (#86889)
defs does not return the defs for inline assembly. We need to use
all_defs to find them.

Fixes #86880.
2024-03-28 08:37:19 -07:00
Amy Kwan
a3efc53f16
[AIX][TLS] Produce a faster local-exec access sequence for the "aix-small-tls" global variable attribute (#83053)
Similar to 3f46e5453d9310b15d974e876f6132e3cf50c4b1, this patch allows
the backend to produce a faster access sequence for the local-exec TLS
model, where loading from the TOC can be avoided, for local-exec TLS
variables that are annotated with the "aix-small-tls" attribute.

The expectation is for local-exec TLS variables to be set with this
attribute through PGO. Furthermore, the optimized access sequence is
only generated for local-exec TLS variables annotated with
"aix-small-tls", only if they are less than ~32KB in size.
2024-03-28 09:18:45 -04:00
Freddy Ye
36b4b9d988
[X86] Support immediate folding for CCMP/CTEST (#86616)
E.g.
%0:gr32 = MOV32ri 81
CTEST32rr %0, %1, 2, 10, implicit-def $eflags, implicit $eflags
=>
CTEST32ri %1, 81, 2, 10, implicit-def $eflags, implicit $eflags
2024-03-28 18:54:32 +08:00
bvlgah
e640d9e725
[RISCV][GlobalISel] Fix legalizing ‘llvm.va_copy’ intrinsic (#86863)
Hi, I spotted a problem when running benchmarking programs on a RISCV64
device.

## Issue

Segmentation faults only occurred while running the programs compiled
with `GlobalISel` enabled.

Here is a small but complete example (it is adopted from [Google's
benchmark
framework](95a9f0d0b4/MicroBenchmarks/libs/benchmark/src/colorprint.cc (L85-L119))
to reproduce the issue,

```cpp
#include <cstdarg>
#include <cstdio>
#include <iostream>
#include <memory>
#include <string>

std::string FormatString(const char* msg, va_list args) {
  // we might need a second shot at this, so pre-emptivly make a copy
  va_list args_cp;
  va_copy(args_cp, args);

  std::size_t size = 256;
  char local_buff[256];
  auto ret = vsnprintf(local_buff, size, msg, args_cp);

  va_end(args_cp);

  // currently there is no error handling for failure, so this is hack.
  // BM_CHECK(ret >= 0);

  if (ret == 0)  // handle empty expansion
    return {};
  else if (static_cast<size_t>(ret) < size)
    return local_buff;
  else {
    // we did not provide a long enough buffer on our first attempt.
    size = static_cast<size_t>(ret) + 1;  // + 1 for the null byte
    std::unique_ptr<char[]> buff(new char[size]);
    ret = vsnprintf(buff.get(), size, msg, args);
    // BM_CHECK(ret > 0 && (static_cast<size_t>(ret)) < size);
    return buff.get();
  }
}

std::string FormatString(const char* msg, ...) {
  va_list args;
  va_start(args, msg);
  auto tmp = FormatString(msg, args);
  va_end(args);
  return tmp;
}

int main() {
  std::string Str =
      FormatString("%-*s %13s %15s %12s", static_cast<int>(20),
                   "Benchmark", "Time", "CPU", "Iterations");
  std::cout << Str << std::endl;
}
```

Use `clang++ -fglobal-isel -o main main.cpp` to compile it.

## Cause

I have examined MIR, it shows that these segmentation faults resulted
from a small mistake about legalizing the intrinsic function
`llvm.va_copy`.


36e74cfdbd/llvm/lib/Target/RISCV/GISel/RISCVLegalizerInfo.cpp (L451-L453)

`DstLst` and `Tmp` are placed in the wrong order.

## Changes

I have tweaked the test case `CodeGen/RISCV/GlobalISel/vararg.ll` so
that `s0` is used as the frame pointer (not in all checks) which points
to the starting address of the save area. I believe that it helps reason
about how `llvm.va_copy` is handled.
2024-03-28 13:09:18 +03:00
Luke Lau
eff4593a64 [RISCV] Add test case for missed vwaddu.vv due to add->or combine. NFC
We should be able to recover this with combineBinOp_VLToVWBinOp_VL if we
check that the or has the disjoint flag set.
2024-03-28 16:58:52 +08:00
Vyacheslav Levytskyy
b7ac8fddb5
[SPIR-V] Improve type inference: deduce types of composite data structures (#86782)
This PR improves type inference in general and deduces types of
composite data structures in particular. Also added a way to insert a
bitcast to make a fun call valid in case of arguments types mismatch due
to opaque pointers type inference.

The attached test `pointers/nested-struct-opaque-pointers.ll`
demonstrates new capabilities: the SPIRV code emitted for this test is
now (1) valid in a sense of data field types and (2) accepted by
`spirv-val`.

More strict LIT checks, support of more composite data structures and
improvement of fun calls from the perspective of type correctness are
main todo's at the moment.
2024-03-28 08:08:06 +01:00
Heejin Ahn
6b7ecc7979 Revert "[WebAssembly] Remove threwValue comparison after __wasm_setjmp_test (#86633)"
This reverts commit 52431fdb1ab8d29be078edd55250e06381e4b6b0.

The PR assumed `__threwValue` couldn't be 0, but it could be when the
thrown thing is not a longjmp but an exception, so that `if` check was
actually necessary.
2024-03-28 04:41:29 +00:00
Eli Friedman
036e7ee9d1 [NFC][AArch64] Regenerate regression tests. 2024-03-27 17:08:02 -07:00
Philip Reames
8881281902 [RISCV] Add test coverage for (add (shl Z, c1), Y, (shl Z, c2)) variants
Basically, testing for interaction of shNadd matching with one step
of reassociation in the add.
2024-03-27 14:47:45 -07:00
Shilei Tian
0a43ca731b
[AMDGPU] Fix missing IsExact flag when expanding vector binary operator (#86712) 2024-03-27 17:40:58 -04:00
Florian Hahn
b9cd48f96a
Revert "[TBAA] Add verifier for tbaa.struct metadata (#86709)"
This reverts commit df75183d70e029352a49c93f275db703c81a65c1.

Revert for now as this appears to cause failures on some buildbots,
e.g.:
https://lab.llvm.org/buildbot/#/builders/93/builds/19428/steps/10/logs/stdio
2024-03-27 21:22:15 +00:00
Craig Topper
acab142751 [LegalizeDAG] Freeze index when converting insert_elt/insert_subvector to load/store on stack.
We try clamp the index to be within the bounds of the stack object
we create, but if we don't freeze it, poison can propagate into the
clamp code. This can cause the access to leave the bounds of the
stack object.

We have other instances of this issue in type legalization and extract_elt/subvector,
but posting this patch first for direction check.

Fixes #86717
2024-03-27 13:01:23 -07:00
Craig Topper
0d7ea50d20 [AArch64] Pre-commit test for #86717. NFC 2024-03-27 13:01:23 -07:00
David Green
36e74cfdbd
[AArch64] Clear kill flags when removing FMOVDr. (#86308)
The uses of OldDef/NewDef may not be killed in the same place they
previously were after they are replaced, and so need to be cleared.
2024-03-27 18:36:02 +00:00
Heejin Ahn
52431fdb1a
[WebAssembly] Remove threwValue comparison after __wasm_setjmp_test (#86633)
Currently the code thinks a `longjmp` occurred if both `__THREW__` and
`__threwValue` are nonzero. But `__threwValue` can be 0, and the
`longjmp` library function should change it to 1 in case it is 0:
https://en.cppreference.com/w/c/program/longjmp

Emscripten libraries were not consistent about that, but after
https://github.com/emscripten-core/emscripten/pull/21493 and
https://github.com/emscripten-core/emscripten/pull/21502, we correctly
pass 1 in case the input is 0. So there will be no case `__threwValue`
is 0. And regardless of what `longjmp` library function does, treating
`longjmp`'s 0 input to its second argument as "not longjmping" doesn't
seem right.

I'm not sure where that `__threwValue` checking came from, but probably
I was porting then fastcomp's implementation and moved this part just
verbatim:
9bdc7bb4fc/lib/Target/JSBackend/CallHandlers.h (L274-L278)

Just for the context, how this was discovered:
https://github.com/emscripten-core/emscripten/pull/21502#pullrequestreview-1942160300
2024-03-27 11:11:16 -07:00
Simon Pilgrim
5d3ef06509 [X86] combine-pavg.ll - add demandedelts test coverage for #86284 2024-03-27 17:15:48 +00:00
Simon Pilgrim
dcd0f2b610 [X86] combineExtractFromVectorLoad support extraction from vector of different types to the extraction type/index
combineExtractFromVectorLoad no longer uses the vector we're extracting from to determine the pointer offset calculation, allowing us to extract from types that have been bitcast to work with specific target shuffles.

Fixes #85419
2024-03-27 17:01:41 +00:00
Simon Pilgrim
f92fa7e2cf [X86] Add -verify-machineinstrs to huge stack tests
Help identify EXPENSIVE_CHECKS regressions identified in #84114
2024-03-27 16:26:10 +00:00
Simon Pilgrim
78f0871bee Revert rG58de1e2c5eee548a9b365e3b1554d87317072ad9 "Fix stack layout for frames larger than 2gb (#84114)"
This is failing on some EXPENSIVE_CHECKS buildbots
2024-03-27 16:16:15 +00:00
David Green
313bf28f98
[ARM][MVE] Remove kill flags when reusing VPR register. (#86300)
The vpr register may no longer be killed where it was, so we should be
removing the kill flags.
2024-03-27 16:04:48 +00:00
Wesley Wiser
58de1e2c5e
Fix stack layout for frames larger than 2gb (#84114)
For very large stack frames, the offset from the stack pointer to a local can be more than 2^31 which overflows various `int` offsets in the frame lowering code.

This patch updates the frame lowering code to calculate the offsets as 64-bit values and resolves the overflows, resulting in the correct codegen for very large frames.

Fixes #48911
2024-03-27 15:05:58 +00:00
Brandon Wu
91896607ff
[RISCV] RISCV vector calling convention (1/2) (#77560)
[RISCV] RISCV vector calling convention (1/2)

    This is the vector calling convention based on
    https://github.com/riscv-non-isa/riscv-elf-psabi-doc,
    the idea is to split between "scalar" callee-saved registers
    and "vector" callee-saved registers. "scalar" ones remain the
    original strategy, however, "vector" ones are handled together
    with RVV objects.

    The stack layout would be:

      |--------------------------| <-- FP
      | callee-allocated save    |
      | area for register varargs|
      |--------------------------|
      | callee-saved registers   | <-- scalar callee-saved
      |        (scalar)          |
      |--------------------------|
      | RVV alignment padding    |
      |--------------------------|
      | callee-saved registers   | <-- vector callee-saved
      |        (vector)          |
      |--------------------------|
      | RVV objects              |
      |--------------------------|
      | padding before RVV       |
      |--------------------------|
      | scalar local variables   |
      |--------------------------| <-- BP
      | variable size objects    |
      |--------------------------| <-- SP

    Note: This patch doesn't contain "tuple" type, e.g. vint32m1x2.
          It will be handled in https://github.com/riscv-non-isa/riscv-elf-psabi-doc (2/2).

    Differential Revision: https://reviews.llvm.org/D154576
2024-03-27 23:03:13 +08:00
Simon Pilgrim
6d3ec56d3c [X86] combineExtractWithShuffle - use combineExtractFromVectorLoad to extract scalar load from shuffled vector load
Improves #85419
2024-03-27 14:54:25 +00:00
Kevin P. Neal
f5296df97c
[FPEnv][AMDGPU] Correct AMDGPUSimplifyLibCalls handling of strictfp attribute. (#86705)
The AMDGPUSimplifyLibCalls pass was lowering function calls with the
strictfp attribute to sequences that included function calls incorrectly
lacking the attribute. This patch corrects that.

The pass now also emits the correct constrained fp call instead of
normal FP instructions when in a function with the strictfp attribute.
Replacing non-constrained calls with constrained calls when required
is still on the IRBuilder's TODO list.
2024-03-27 10:20:00 -04:00
Justin Cady
26464f2662
[FreeBSD] Mark __stack_chk_guard dso_local except for PPC64 (#86665)
Adjust logic of 1cb9f37a17ab to match freebsd/freebsd-src@9a4d48a645.

D113443 is the original attempt to bring this FreeBSD patch to
llvm-project,
but it never landed. This change is required to build FreeBSD kernel
modules
with -fstack-protector using a standard LLVM toolchain. The FreeBSD
kernel
loader does not handle R_X86_64_REX_GOTPCRELX relocations.

Fixes #50932.
2024-03-27 09:03:46 -04:00
Simon Pilgrim
e82765bf07 [X86] masked_store.ll - add nounwind to remove cfi noise 2024-03-27 12:22:31 +00:00
Matt Arsenault
ef316da4a2 AMDGPU: Fix dead check prefixes in test 2024-03-27 14:42:47 +03:00
Luke Lau
f15b7deeaa [RISCV] Add test case to show missing vmerge fold on tied pseudos. NFC
Note we can't use vwaddu.wv because it will get combined away with #78403
2024-03-27 17:42:45 +08:00
Julian Nagele
df75183d70
[TBAA] Add verifier for tbaa.struct metadata (#86709)
Adds logic to the IR verifier that checks whether !tbaa.struct nodes are
well-formed. That is, it checks that the operands of !tbaa.struct nodes
are in groups of three, that each group of three operands consists of
two integers and a valid tbaa node, and that the regions described by
the offset and size operands are non-overlapping.

PR: https://github.com/llvm/llvm-project/pull/86709
2024-03-27 10:30:27 +01:00
Luke Lau
6d13263d4a
[RISCV] Add tests for combineBinOpOfZExts. NFC (#86689)
Unlike add, sub and mul, we don't have widening instructions for div,
rem and logical ops, so we don't have any test coverage if we were to
extend combineBinOpOfZExts to handle them.

Adding tests coincidentally revealed that logical ops are already
narrowed as a generic DAG combine via
DAGCombiner::hoistLogicOpWithSameOpcodeHands. So we don't actually need
to run combineBinOpOfZExts on them.
2024-03-27 15:23:34 +08:00
Yeting Kuo
22bfc58cd0
[RISCV] Teach RISCVMakeCompressible handle byte/half load/store for Zcb. (#83375)
For targets with Zcb, this patch makes llvm generate more compress
c.lb/lbu/lh/lhu/sb/sh instructions.
2024-03-27 13:40:38 +08:00
Craig Topper
8a9c170170
[RISCV] Align stack size down to a multiple of 16 before using cm.push/pop. (#86073)
This an alternative to #84935 to fix the miscompile, but not be optimal.

The immediate for cm.push/pop must be a multiple of 16. For RVE, it
might not be. It's not easy to increase the stack size without messing
up cfa directives and maybe other things.

This patch rounds the stack size down to a multiple of 16 before
clamping it to 48. This causes an extra addi to be emitted to handle the
remainder.

Once this commited, I can commit #84989 to add verification for these
instructions being generated with valid offsets.
2024-03-26 21:37:19 -07:00
Craig Topper
4d03a9ecc6
[RISCV] Preserve MMO when expanding PseudoRV32ZdinxSD/PseudoRV32ZdinxLD. (#85877)
This allows the asm printer to print the stack spill/reload messages.
2024-03-26 20:42:14 -07:00
Michael Maitland
54a9f0e441
[RISCV][GISEL] Legalize, regbankselect, and instruction-select G_VSCALE (#85967)
G_VSCALE should be lowered using VLENB. If the type is not sXLen it
should be lowered using a G_VSCALE on the narrow type and a G_MUL.
regbank select and instruction select are straightforward so we really
only need to add tests to show it works.
2024-03-26 20:17:22 -04:00
Björn Pettersson
3e6e54eb79
[X86] Fix miscompile in combineShiftRightArithmetic (#86597)
When folding (ashr (shl, x, c1), c2) we need to treat c1 and c2
as unsigned to find out if the combined shift should be a left
or right shift.
Also do an early out during pre-legalization in case c1 and c2
has differet types, as that otherwise complicated the comparison
of c1 and c2 a bit.
2024-03-26 20:53:34 +01:00
Bjorn Pettersson
982ebeb212 [X86] Pre-commit test case for bug in combineShiftRightArithmetic
It has been noticed that combineShiftRightArithmetic isn't dealing
properly with large shift amounts, as demonstrated by the test
case added in this commit.

I think the problem partly is related to X86 using i8 as shift amount
type during ISel. So shift amount larger then 127 may be treated
as negative shift amounts if not being careful.
2024-03-26 20:49:15 +01:00
Thorsten Schütt
da6cc4a24f
[CodeGen] Add nneg and disjoint flags (#86650)
MachineInstr learned the new flags.
2024-03-26 18:44:34 +01:00
Farzon Lotfi
5cf1e2e2ec
[DXIL] Implement log intrinsic Lowering (#86569)
Completes #86192
`DXIL.td` - add log2 to dxilop lowering
`DXILIntrinsicExpansion.cpp` - add log and log10 to log2 expansions
2024-03-26 12:46:11 -04:00
Simon Pilgrim
c8b85add2e [X86] extractelement-load.ll - add test case for #85419 2024-03-26 16:14:11 +00:00
Simon Pilgrim
3140d138e4 [X86] extractelement-load.ll - use X86 instead of X32 check prefix. NFC
X32 should be used for gnux32 triples
2024-03-26 16:14:11 +00:00
Luke Lau
87519a2830
[RISCV] Combine (mul (zext, zext)) -> (zext (mul (zext, zext))) (#86465)
Building on #86248, we can also narrow the width of a mul of zexts.

This is specifically legal because on RVV we always extend to the next
power of 2 width, and multiplying two N bit integers produces a maximum
value of 2\*N bits.
So as long as we keep an inner zext of 2\*N, we will have enough space
for the multiply and won't overflow.

Alive2 proof: https://alive2.llvm.org/ce/z/XteYyb
2024-03-26 23:28:04 +08:00
Simon Pilgrim
d18bee2313 [X86] combineConcatVectorOps - concatenate FADD/FSUB/FMUL ops if we don't increase the number of INSERT_SUBVECTOR nodes.
FADD/FSUB/FMUL are usually less port-bound than INSERT_SUBVECTOR, so only concatenate if it reduces the instruction count and doesn't introduce extra INSERT_SUBVECTOR nodes.
2024-03-26 15:03:41 +00:00
Simon Pilgrim
e933c05cd2 [X86] Add fadd/fsub/fmul tests showing failure to concat operands together and perform as a wider vector
We don't want to concat fadd/fsub/fmul if both operands would need concatenating (as the fp op is usually cheaper than the concat), but if at least one operand is free to concat (i.e. constant or extracted from a wider vector), then we should try to concat the fp op.
2024-03-26 15:03:41 +00:00
Il-Capitano
308ed0233a
[Intrinsics] Make patchpoint.i64 generic on its return type (#85911)
Currently patchpoints can only have two result types, `void` and `i64`.
This limits the result to general purpose registers.
This patch makes `patchpoint.i64` an overloadable intrinsic, allowing
result values that can fit in a single register (e.g. integers,
pointers, floats).
2024-03-26 19:08:52 +05:30
Sander de Smalen
f914e8e77c
[AArch64][SME] Add coalescer barrier for args/results in locally streaming functions. (#85388)
Similar to how we protected FP/fixed-vector arguments and results from
calls, we should do the same for arguments/results from locally-streaming
functions such that those are not spilled/filled as ZPR registers.

This may cause a small regression (additional spills/fills), which is
addressed by #85386.
2024-03-26 11:40:31 +00:00
Simon Pilgrim
5b544b511c [Mips] ctpop.mir - regenerate checks to improve codegen diff in #86505 2024-03-26 10:43:29 +00:00
Simon Pilgrim
5fc619b5ee [DAG] Update ISD::AVG folds to use hasOperation to allow Custom matching prior to legalization
Fixes issue where AVX1 targets weren't matching 256-bit AVGCEILU cases.
2024-03-26 10:41:07 +00:00
Michal Paszkowski
d06ba37683
[SPIR-V] Support extension toggling and enabling all (#85503) 2024-03-26 03:04:49 -07:00
Simon Pilgrim
c7198e0af3
[DAG] Fold insert_subvector(N0, extract_subvector(N0, N2), N2) --> N0 (#86487)
Handle the case where we've ended up inserting back into the source vector we extracted the subvector from.
2024-03-26 10:03:42 +00:00