18577 Commits

Author SHA1 Message Date
Stephen Tozer
da0faa0594 [DebugInfo] Produce variadic DBG_INSTR_REFs from ISel
This patch modifies SelectionDAG and FastISel to produce DBG_INSTR_REFs with
variadic expressions, and produce DBG_INSTR_REFs for debug values with variadic
location expressions. The former essentially means just prepending
DW_OP_LLVM_arg, 0 to the existing expression. The latter is achieved in
MachineFunction::finalizeDebugInstrRefs and InstrEmitter::EmitDbgInstrRef.

Reviewed By: jmorse, Orlando

Differential Revision: https://reviews.llvm.org/D133929
2023-01-09 08:58:33 +00:00
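A minimal sketch of the "prepend DW_OP_LLVM_arg, 0" step from da0faa0594, assuming an expression is modelled as a flat list of textual tokens. The `Expr` alias and `makeVariadic` helper are hypothetical names for illustration only; they are not LLVM's DIExpression API.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical stand-in for a debug expression: a flat list of op tokens.
using Expr = std::vector<std::string>;

// Turn a non-variadic expression into a variadic one by prepending
// "DW_OP_LLVM_arg, 0", so the single location operand is referenced
// explicitly as argument 0. Expressions that already mention
// DW_OP_LLVM_arg are returned unchanged.
Expr makeVariadic(const Expr &expr) {
  if (std::find(expr.begin(), expr.end(), "DW_OP_LLVM_arg") != expr.end())
    return expr;
  Expr out = {"DW_OP_LLVM_arg", "0"};
  out.insert(out.end(), expr.begin(), expr.end());
  return out;
}
```

Under this model, an empty expression becomes just `DW_OP_LLVM_arg, 0`, and any existing ops follow it in their original order.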
Serguei Katkov
fd64bd94ed [Inline Spiller] Extend the snippet by statepoint uses
A snippet is a tiny live interval which has a copy-like or fill-like def
and a copy-like or spill-like use at the end (either of them might be absent).

A snippet has only one use/def inside the interval, and the interval is located
in one basic block.

When the inline spiller spills a reg around its uses, it also forces the
spilling of connected snippets: those that were obtained by splitting the
same original reg and whose def is a full copy of our reg or whose
last use is a full copy to our reg.

The definition of a snippet is extended to allow more than one use/def.
However, all the other uses must be statepoint instructions, which will
fold the fill into their operand. That way we do not introduce new fills/spills.

Reviewed By: qcolombet, dantrushin
Differential Revision: https://reviews.llvm.org/D138093
2023-01-09 13:30:57 +07:00
Simon Pilgrim
ddab12d118 [X86] Add shuffle test coverage for Issue #59860 2023-01-08 19:06:06 +00:00
Stephen Tozer
c383f4d655 [DebugInfo] Allow non-stack_value variadic expressions and use in DBG_INSTR_REF
Prior to this patch, variadic DIExpressions (i.e. ones that contain
DW_OP_LLVM_arg) could only be created by salvaging debug values to create
stack value expressions, resulting in a DBG_VALUE_LIST being created. As of
the previous patch in this patch stack, DBG_INSTR_REF's syntax has been
changed to match DBG_VALUE_LIST in preparation for supporting variadic
expressions. This patch adds some minor changes needed to allow variadic
expressions that aren't stack values to exist, and allows variadic expressions
that are trivially reducible to non-variadic expressions to be handled
similarly to non-variadic expressions.

Reviewed by: jmorse

Differential Revision: https://reviews.llvm.org/D133926
2023-01-06 19:31:10 +00:00
James Y Knight
1ae36b1387 Remove special cases for invoke of non-throwing inline-asm.
Non-throwing inline asm infers the nounwind attribute in
instcombine. Thus, it can be handled in the same manner as
non-throwing target functions are generally. Further special casing is
unnecessary complexity.
2023-01-06 13:53:10 -05:00
Stephen Tozer
e10e936315 [DebugInfo][NFC] Add new MachineOperand type and change DBG_INSTR_REF syntax
This patch makes two notable changes to the MIR debug info representation,
which result in different MIR output but identical final DWARF output (NFC
w.r.t. the full compilation). The two changes are:

  * The introduction of a new MachineOperand type, MO_DbgInstrRef, which
    consists of two unsigned numbers that are used to index an instruction
    and an output operand within that instruction, having a meaning
    identical to the first two operands of the current DBG_INSTR_REF
    instruction. This operand is only used in DBG_INSTR_REF (see below).
  * A change in syntax for the DBG_INSTR_REF instruction, shuffling the
    operands to make it resemble DBG_VALUE_LIST instead of DBG_VALUE,
    and replacing the first two operands with a single MO_DbgInstrRef-type
    operand.

This patch is the first of a set that will allow DBG_INSTR_REF
instructions to refer to multiple machine locations in the same manner
as DBG_VALUE_LIST.

Reviewed By: jmorse

Differential Revision: https://reviews.llvm.org/D129372
2023-01-06 18:03:48 +00:00
Sanjay Patel
bf82070ea4 [SDAG] try to avoid multiply for X*Y==0
Forking this off from D140850 -
https://alive2.llvm.org/ce/z/TgBeK_
https://alive2.llvm.org/ce/z/STVD7d

We could almost justify doing this in IR, but consideration for
"minsize" requires that we only try it in codegen -- the
transform is not reversible.

In all other cases, avoiding multiply should be a win because a
mul is more expensive than simple/parallelizable compares. AArch64
even has a trick to keep instruction count even for some types.

Differential Revision: https://reviews.llvm.org/D141086
2023-01-06 09:06:11 -05:00
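A minimal sketch of the equivalence behind bf82070ea4, assuming the multiply is known not to wrap. That precondition is an assumption made here for illustration; the commit message does not spell it out. The function names are hypothetical.

```cpp
#include <cstdint>

// Reference form: multiply, then compare against zero. The product is
// computed in 64 bits so that it cannot wrap.
bool mulIsZero(uint32_t x, uint32_t y) {
  return static_cast<uint64_t>(x) * y == 0;
}

// Multiply-free form: two independent compares joined by an OR.
bool eitherIsZero(uint32_t x, uint32_t y) {
  return x == 0 || y == 0;
}

// Why the no-wrap assumption matters: a 32-bit multiply that wraps
// modulo 2^32 can yield zero from two non-zero inputs.
bool wrappingMulIsZero(uint32_t x, uint32_t y) {
  return x * y == 0;
}
```

With x = y = 1 << 16, `wrappingMulIsZero` is true while `eitherIsZero` is false, so the rewrite is only sound when the product cannot wrap.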
Sanjay Patel
f58eedeeee [x86] add tests for x*y == 0; NFC 2023-01-06 08:37:04 -05:00
Noah Goldstein
960bf8a454 [X86] Add tests for atomic bittest with register/memory operands
Reviewed By: pengfei

Differential Revision: https://reviews.llvm.org/D140938
2023-01-06 17:55:38 +08:00
Nikita Popov
e3c2faa64a Revert "[X86] Revert -fno-plt __tls_get_addr workaround for old GNU ld"
This reverts commit 2679e8bba3e166e3174971d040b9457ec7b7d768.

This change is a significant backwards-compatibility break, which
does in fact break the entire Rust ecosystem, which uses an
-fno-plt -mrelax-relocations=0 default.

Please go through pre-commit review for this change in order to
gain broader consensus.
2023-01-06 09:43:47 +01:00
Noah Goldstein
a698790c51 [X86] Add additional tests to no-shift.ll
Reviewed By: pengfei

Differential Revision: https://reviews.llvm.org/D141076
2023-01-06 14:44:45 +08:00
Craig Topper
11e92bd61f [SelectionDAG] Improve codegen for udiv by constant if any divisors are 1.
If the divisor is 1, the magic algorithm does not return a correct
result and we end up using a select to pick the numerator for those
elements at the end.

Therefore we can use undef for that element of the earlier operations
when the divisor is 1. We sometimes get this through SimplifyDemandedVectorElts,
but not always. Definitely seems like we don't if the NPQ fixup is used.

Unfortunately, DAGCombiner is unable to fold srl X, <0, undef> to X so
I had to add flags to avoid emitting the srl unless one of the shift
amounts is non-zero.

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D141022
2023-01-05 08:41:44 -08:00
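A minimal sketch of the situation 11e92bd61f describes, assuming 32-bit lanes and the well-known multiply-high constants for dividing by 3 and 5. A divisor of 1 would need a multiplier of 2^32, which does not fit in 32 bits, so those lanes are selected straight from the numerator and their multiplier is a don't-care (modelled here as 0). `LaneMagic` and `udivByConstLanes` are hypothetical names, not LLVM code.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// High 32 bits of a 32x32 -> 64-bit unsigned multiply.
uint32_t mulhu(uint32_t a, uint32_t b) {
  return static_cast<uint32_t>((static_cast<uint64_t>(a) * b) >> 32);
}

// Per-lane constants for the multiply-and-shift division.
struct LaneMagic {
  uint32_t multiplier; // don't-care when isOne is set
  unsigned shift;      // post-multiply shift; don't-care when isOne is set
  bool isOne;          // divisor is 1: select the numerator instead
};

// Divide the four lanes of x by the constant divisors <1, 3, 5, 1>.
std::array<uint32_t, 4> udivByConstLanes(const std::array<uint32_t, 4> &x) {
  static const LaneMagic magic[4] = {
      {0u, 0u, true},           // x / 1: magic result is never used
      {0xAAAAAAABu, 1u, false}, // x / 3 == mulhu(x, 0xAAAAAAAB) >> 1
      {0xCCCCCCCDu, 2u, false}, // x / 5 == mulhu(x, 0xCCCCCCCD) >> 2
      {0u, 0u, true},           // x / 1: magic result is never used
  };
  std::array<uint32_t, 4> q{};
  for (std::size_t i = 0; i < q.size(); ++i) {
    uint32_t viaMagic = mulhu(x[i], magic[i].multiplier) >> magic[i].shift;
    // The final select discards viaMagic for divisor-1 lanes, which is why
    // the earlier operations may use any value there.
    q[i] = magic[i].isOne ? x[i] : viaMagic;
  }
  return q;
}
```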
Freddy Ye
27b8f54f51 [X86] Support -march=emeraldrapids
Reviewed By: pengfei, skan

Differential Revision: https://reviews.llvm.org/D140950
2023-01-05 20:27:32 +08:00
Nikita Popov
60442f0d44 [CodeGen] Convert some tests to opaque pointers (NFC)
These are mostly MIR tests, which I did not handle during previous
conversions.
2023-01-05 13:21:20 +01:00
Roman Lebedev
068033a2f7
[NFC][X86] Make vec_anyext.ll test non-useless 2023-01-05 01:12:30 +03:00
Philip Reames
9768a71a5e [X86] Regen a couple tests so they are autogen clean [nfc]
These appear to have had 32 bit check lines manually deleted - presumably since the checks are verbose.  Please don't do this!  Split the test file if you want, but manually deleting test lines makes the diffs for later autogen changes really confusing.
2023-01-04 12:17:32 -08:00
Philip Reames
56a40cd4ab [X86] Autogen tests for ease of update in upcoming change [nfc] 2023-01-04 12:00:20 -08:00
Roman Lebedev
846d06c707
[DAG] tryToFoldExtendOfConstant(): sext undef is not undef
https://alive2.llvm.org/ce/z/cLGpWV, but https://alive2.llvm.org/ce/z/TGNH4P
2023-01-04 22:42:43 +03:00
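A minimal sketch of why 846d06c707 holds, assuming an i8 -> i16 sign extension: an undef i16 could be any of the 65536 bit patterns, but sign-extending any i8 value only ever produces the 256 patterns whose top nine bits agree. The helper names are hypothetical.

```cpp
#include <cstdint>
#include <set>

// Collect every 16-bit pattern that can result from sext(i8 -> i16).
std::set<uint16_t> sextImage() {
  std::set<uint16_t> out;
  for (int v = -128; v <= 127; ++v) {
    int16_t wide = static_cast<int16_t>(static_cast<int8_t>(v));
    out.insert(static_cast<uint16_t>(wide));
  }
  return out;
}

// A 16-bit pattern is a possible sext result only if bits 7..15 are all
// equal, i.e. they are all copies of the narrow sign bit.
bool reachableBySext(uint16_t bits) {
  unsigned top = bits >> 7; // the nine most significant bits
  return top == 0u || top == 0x1FFu;
}
```

A pattern such as 0x0100 is a legal value for an undef i16 but can never come out of the sign extension, so replacing `sext undef` with `undef` would allow values the original expression cannot take.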
Philip Reames
5226077b21 [X86] Autogen tests for ease of update in upcoming change [nfc] 2023-01-04 11:30:49 -08:00
Roman Lebedev
e4b260efb2
[Codegen][X86] LowerBUILD_VECTOR(): improve lowering w/ multiple FREEZE-UNDEF ops
While we have great handling for UNDEF operands,
FREEZE-UNDEF operands are effectively normal operands.

We are better off "interleaving" such BUILD_VECTORS into a blend
between a splat of FREEZE-UNDEF, and "thawed" source BUILD_VECTOR,
both of which are more natural for us to handle.

Refs. f738ab9075 (r95017306)
2023-01-04 21:16:11 +03:00
Roman Lebedev
91f1c59fcd
[NFC][X86] Add a few more tests for freezing BUILD_VECTOR 2023-01-04 21:16:11 +03:00
Amaury Séchet
ac17b6b963 [NFC] Autogenerate CodeGen/X86/sdiv-pow2.ll 2023-01-04 16:43:47 +00:00
Roman Lebedev
4fc417ec37
[DAGCombiner] convertBuildVecZextToBuildVecWithZeros(): rework split factor calculation
The original computation was both making assumptions that do not hold
in practice, and being overly pessimistic. We should just check
every possible split factor, and pick the best one.

Fixes https://github.com/llvm/llvm-project/issues/59781
2023-01-02 18:34:35 +03:00
Roman Lebedev
1337821f11
[DAGCombiner][X86] Fold a CONCAT_VECTORS of SHUFFLE_VECTOR and its operand into wider SHUFFLE_VECTOR
This was showing as a source of *many* regressions
with more aggressive ZERO_EXTEND_VECTOR_INREG recognition.
2023-01-01 23:18:42 +03:00
Roman Lebedev
a190b40861
[NFC][X86] Add tests for concatenation of shuffle's operand to the shuffle 2023-01-01 23:12:21 +03:00
Fangrui Song
2679e8bba3 [X86] Revert -fno-plt __tls_get_addr workaround for old GNU ld
ENABLE_X86_RELAX_RELOCATIONS has defaulted to on since 2020.
This workaround has not been exercised for a long time.
2022-12-31 22:39:20 -08:00
Roman Lebedev
16facf1ca6
[DAGCombiner][TLI] Do not fuse bitcast to <1 x ?> into a load/store of a vector
Single-element vectors are legalized by splitting,
so the memory operations would also get scalarized.
While we do have some support to reconstruct scalarized loads,
we clearly don't catch everything.

The comment for the affected AArch64 store suggests that
having two stores was the desired outcome in the first place.

This was showing as a source of *many* regressions
with more aggressive ZERO_EXTEND_VECTOR_INREG recognition.
2022-12-31 03:49:43 +03:00
Roman Lebedev
2480164247
[NFC][Codegen][x86] Add tests for load/store of single-element vectors 2022-12-31 03:23:24 +03:00
Roman Lebedev
e4d25a9c23
[DAG] BUILD_VECTOR: absorb ZERO_EXTEND of a single first operand if all other ops are zeros
This kind of pattern seems to come up as regressions
with better ZERO_EXTEND_VECTOR_INREG recognition.

For initial implementation, this is quite restricted
to the minimal viable transform, otherwise there are
too many regressions to be dealt with.
2022-12-31 00:58:11 +03:00
Roman Lebedev
a35b216290
[NFC][X86] Add exhaustive-ish coverage for broadcast of implicitly aext/zext element
Some of these even crash instruction selection for AVX512.
This is one of the patterns that comes up as regressions
with more aggressive ZERO_EXTEND_VECTOR_INREG recognition.

https://godbolt.org/z/x88aqfrT5
2022-12-30 22:40:20 +03:00
Roman Lebedev
c823517ef5
[NFC][Codegen][X86] zero_extend_vector_inreg.ll: add SSE4.2 runline 2022-12-30 01:44:15 +03:00
Roman Lebedev
248567a327
[DAGCombiner] Try to partition ISD::EXTRACT_VECTOR_ELT to accommodate its ISD::BUILD_VECTOR users
This mainly cleans up a few patterns that are legalized by scalarization
from a wide-element vector, but then are further split apart to build
a more narrow-sized-element vector. In particular this happens in some
cases for illegal ISD::ZERO_EXTEND_VECTOR_INREG.

Given an ISD::EXTRACT_VECTOR_ELT, which is a glorified bit sequence extract,
recursively analyse all of its users, and try to model them as
bit sequence extractions. If all of them agree on the new, narrower element
type, and all of them can be modelled as ISD::EXTRACT_VECTOR_ELT's of that
new element type, do that, but only if unmodelled users are ISD::BUILD_VECTOR.
2022-12-30 01:15:53 +03:00
Craig Topper
8abd70081f [TargetLowering] Teach BuildUDIV to take advantage of leading zeros in the dividend.
If the dividend has leading zeros, we can use them to reduce the
size of the multiplier and avoid the fixup cases.

This patch is for scalars only, but we might be able to do this
for vectors in a follow up.

Differential Revision: https://reviews.llvm.org/D140750
2022-12-29 13:58:46 -08:00
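A minimal sketch of the idea in 8abd70081f, using division by 7 as the example. For a full 32-bit dividend the exact multiplier needs 33 bits, so the classic sequence adds a fixup step; if the dividend is known to have 16 leading zeros, a smaller multiplier is exact and no fixup is needed. The constants are standard multiply-and-shift values worked out for this illustration, and the function names are hypothetical; this is not the code LLVM emits.

```cpp
#include <cassert>
#include <cstdint>

// x / 7 for any 32-bit x. The true multiplier is 2^32 + 0x24924925, which
// does not fit in 32 bits, so an add-and-shift fixup recovers the top bit.
uint32_t udiv7Full(uint32_t x) {
  uint32_t t = static_cast<uint32_t>(
      (static_cast<uint64_t>(x) * 0x24924925u) >> 32);
  return (((x - t) >> 1) + t) >> 2;
}

// x / 7 when x is known to be below 2^16 (16 leading zero bits).
// 74899 == ceil(2^19 / 7) fits comfortably, so a single multiply and
// shift is exact over this reduced range.
uint32_t udiv7Narrow(uint32_t x) {
  assert(x < (1u << 16) && "dividend must have 16 leading zero bits");
  return static_cast<uint32_t>((static_cast<uint64_t>(x) * 74899u) >> 19);
}
```

Both functions agree with `x / 7` on their respective input ranges; the narrow form simply trades generality for a shorter instruction sequence.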
Roman Lebedev
778a7df50e
[NFC][Codegen][X86] Add exhaustive-ish test coverage for ZERO_EXTEND_VECTOR_INREG
It should be possible to deduplicate AVX2 and AVX512F checklines,
but I'm not sure which combination of check prefixes would do that.

https://godbolt.org/z/sndT9n1nz
2022-12-29 03:18:01 +03:00
Thomas Köppe
82be8a1d2b [X86] Emit RIP-relative access to local function in PIC medium code model
Currently, the medium code model for x86_64 emits position-dependent relocations (R_X86_64_64) for local functions, regardless of PIC or no-PIC mode. (This means generically that code compiled with the medium model cannot be linked into a position-independent executable.)

Example:

```
static int g(int n) {
  return 2 * n + 3;
}

void f(int(**p)(int)) {
  *p = g;
}
```

This results in:

```
Disassembly of section .text:

0000000000000000 <f>:
       0: 48 b8 00 00 00 00 00 00 00 00	movabs	rax, 0x0
       a: 48 89 07                     	mov	qword ptr [rdi], rax
       d: c3                           	ret
```

```
Relocation section '.rela.text' at offset 0xf0 contains 1 entries:
    Offset             Info             Type               Symbol's Value  Symbol's Name + Addend
0000000000000002  0000000200000001 R_X86_64_64            0000000000000000 .text + 10
```

This patch changes the behaviour to unconditionally emit a RIP-relative access, both in PIC and non-PIC mode. This fixes PIC mode, and is perhaps an improvement in non-PIC mode, too, since it results in a shorter instruction. A 32-bit relocation should suffice since the medium code model demands that all code fit within 2GiB.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D140593
2022-12-28 11:14:39 -08:00
Roman Lebedev
c4f815d705
[DAGCombine] combineShuffleToZeroExtendVectorInReg(): widen shuffle elements before trying to match
We might have sunk a bitcast into shuffle, and now it might be operating
on more fine-grained elements than what we'd match, so we must not be
dependent on whatever the granularity the shuffle happened to be in,
but transform it into the one canonical for us - with widest elements.
2022-12-27 00:47:45 +03:00
Roman Lebedev
cc051b0730
[NFC][X86] Add some tests that can be matched as ZERO_EXTEND_VECTOR_INREG 2022-12-27 00:41:59 +03:00
Roman Lebedev
e26e7ed69a
[DAG] combineShuffleToZeroExtendVectorInReg(): try to match w/ commuted operands
We don't have any reason to expect that the operand we will match
is on any particular hand of the shuffle, so we should try both.
2022-12-26 22:54:03 +03:00
Roman Lebedev
ec99bf2480
[NFC][Codegen][X86] Autogenerate check lines in shift-i256.ll 2022-12-24 19:26:42 +03:00
Roman Lebedev
110c5442b8
[NFC][Codegen] Add tests with oversized shifts by non-byte-multiple 2022-12-24 19:26:41 +03:00
Roman Lebedev
a9fbf25a14
[NFC][Codegen] Rename tests for oversized shifts by byte multiple 2022-12-24 19:26:41 +03:00
Roman Lebedev
387c1573f8
[NFC][Codegen] Tests with wide scalar shifts, for new potential legalization strategy 2022-12-24 00:47:25 +03:00
Roman Lebedev
aad725928d
[NFC][Codegen][X86] Add codegen test coverage for the variably-indexed load of alloca w/zero upper half 2022-12-23 20:16:41 +03:00
Roman Lebedev
03e848293e
[DAGCombiner] visitFREEZE(): fix cycle breaking
Depending on the particular DAG, we might either create a `freeze`,
or not. And only in the former case, the cycle would be formed.
It would be nicer to have `ReplaceAllUsesOfValueWithIf()`,
like we have in IR, but we don't have that.

Fixes https://github.com/llvm/llvm-project/issues/59677
2022-12-23 18:16:22 +03:00
Roman Lebedev
d8f541efe7
[DAGCombiner] visitFREEZE(): fix handling of no maybe-poison ops
The original code was confusing. It was stripping poison-generating flags,
but the comments were saying that doing so was a TODO.

If the poison-generating flags are present, then even if all operands
are guaranteed not to be undef or poison, the whole operation may still
produce undef or poison. We can still deal with that case,
and we already do deal with it in fact, by also dropping those flags.

Refs. https://github.com/llvm/llvm-project/issues/59676
2022-12-23 17:26:05 +03:00
Roman Lebedev
d7a63a0421
[DAGCombiner] visitFREEZE(): restore previous behaviour on no maybe-poison operands
Lack of such operands implies that the op might be poison-producing due to
its flags. We seem to drop them already, but the comments are confusing.

Fixes https://github.com/llvm/llvm-project/issues/59676
2022-12-23 17:26:05 +03:00
Roman Lebedev
e7f21d750c
[NFC][Codegen][X86] Tests w/ final optimized IR of SROA-with-variably-indexed-loads (D140493)
32-byte ones are for consistency only, we really only care about
up to 16-byte on 64-bit and maybe up to 8-byte on 32-bit.

In the 16-byte ones, we still have some redundant vec<->scalar traffic.

https://reviews.llvm.org/D140493
2022-12-23 04:41:32 +03:00
Roman Lebedev
6fea27662d
[DAGCombiner] visitFREEZE(): be less greedy with replacing other uses of undef 2022-12-23 02:26:36 +03:00
Roman Lebedev
f738ab9075
[DAGCombiner] visitFREEZE(): allow multiple maybe-poison operands for BUILD_VECTOR 2022-12-23 02:26:36 +03:00
Roman Lebedev
1234754bbc
[DAGCombine] BUILD_VECTOR can not create undef or poison 2022-12-23 02:26:36 +03:00