32115 Commits

Author SHA1 Message Date
Jeremy Morse
b8f68ad9cd [DebugInfo][InstrRef] Avoid crash when values optimised out late in sdag
It appears that we can emit all the instructions for a function, including
debug instructions, and then optimise some of the values out late.
Specifically, in the attached test case, an argument gets optimised out
after DBG_VALUE / DBG_INSTR_REFs are created. This confuses
MachineFunction::finalizeDebugInstrRefs, which expects to be able to find a
defining instruction, and crashes instead.

Fix this by identifying when there's no defining instruction, and
translating that instead into a DBG_VALUE $noreg.

Differential Revision: https://reviews.llvm.org/D114476
2021-11-24 10:34:48 +00:00
Jun Ma
17eb6b61de Revert "[Taildup] Don't tail-duplicate loop header with multiple successors as its latches"
This reverts commit 1f9fa549841a2ec55aa5a131bfaf83f0383c4713.
2021-11-24 10:26:37 +08:00
Matt Arsenault
273a0c8bc9 PrologEpilogInserter: Use explicit control for scavenge slot placement
AMDGPU is unusual in that the both stack is indexed in the same
direction as stack growth (up). We therefore always need the emergency
stack slots placed as low as possible to ensure they are in range of
load/store instruction immediate offsets. The existing logic is mostly
OK, but failed if we required stack realignment.

I don't understand what the existing control isFPCloseToIncomingSP is
supposed to mean, but can only be used to stop placing the scavenge
slots earlier. Make this explicit so that targets can opt-in rather
than opt-out only.
2021-11-23 18:01:12 -05:00
Rong Xu
bf1138491a [SampleFDO] Recompute BFI if the sample loader changes BPI
The MIR sample loader changes the branch probability but not BFI.
Here we force a recompute of BFI if the branch probabilities are
changed.

Also register the MIR FSAFDO passes properly.

Differential Revision: https://reviews.llvm.org/D114400
2021-11-23 13:24:31 -08:00
Quinn Pham
1345bc5e16 [NFC][llvm] Inclusive language: remove instance of master in LiveRangeUtils.h
[NFC] As part of using inclusive language within the llvm project, this patch
replaces master with primary in `LiveRangeUtils.h`.

Reviewed By: MatzeB

Differential Revision: https://reviews.llvm.org/D114191
2021-11-23 13:07:42 -06:00
Kazu Hirata
d45cb1d7ea [llvm] Use range-based for loops (NFC) 2021-11-23 08:54:48 -08:00
Simon Moll
1e65b93f3a [VP] Canonicalize macros of VPIntrinsics.def
Usage and naming of macros in VPIntrinsics.def has been inconsistent. Rename all property macros to VP_PROPERTY_<name>.  Use BEGIN/END scope macros to attach properties to vp intrinsics and SDNodes (instead of specifying either directly with the property macro).
A follow-up patch has documentation on how the macros are (intended) to be used.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D114144
2021-11-23 16:51:11 +01:00
David Green
32b6c17b29 [SDAG] Use UnknownSize for masked load/store MMO size
A masked load or store will load a potentially unknown number of bytes
from a memory location - that is not generally known at compile time.
They do not necessarily load/store the entire vector width, and treating
them as such can lead to incorrect aliasing information (for example, if
the underlying object is smaller than the size of the vector).

This makes sure that the MMO is given an unknown size to represent this.
which is less accurate that "may load/store from up to 16 bytes", but
less incorrect that "will load/store from 16 bytes".

Differential Revision: https://reviews.llvm.org/D113888
2021-11-23 09:47:56 +00:00
Kazu Hirata
d5b73a70a0 [llvm] Use range-based for loops (NFC) 2021-11-22 20:33:28 -08:00
Nico Weber
2fb3c05b34 [asm] Merge EmitMSInlineAsmStr() and EmitGCCInlineAsmStr()
This basically reverts 1778831a3d1, which split them.
Since they were split 9 years ago, EmitGCCInlineAsmStr() grew a bunch of
features that usually weren't added to EmitMSInlineAsmStr(), and
that was usually a mistake.  D71677, D113932, D114167 are all examples
of where things were backported to EmitMSInlineAsmStr().

The names were also not great. EmitMSInlineAsmStr() used to be called for `asm
inteldialect`, which clang produces for Microsoft-style __asm { ... } blocks as
well for GCC-style __asm__ / asm statements with -masm=intel. On the other hand,
EmitGCCInlineAsmStr() used to be called for `asm`, whic clang produces for
GCC-style __asm__ / asm statements with -masm=att (the default).

It's also less code (23 insertions, 188 deletions).

No behavior change.

Differential Revision: https://reviews.llvm.org/D114330
2021-11-22 11:49:57 -05:00
Nico Weber
7c2d51474a [asm] Allow labels as operands in intel asm syntax
This makes a line in llvm/test/CodeGen/X86/asm-block-labels.ll pass
with `asm inteldialect` too.

I don't know if this is something one can hit in practice with inline
asm. The test is from 2007 (4646aa3e337aa) but in 2009 blockaddr was
introduced and e.g. `__asm__ __volatile__("brl %0" :: "X"(&&foo) : "memory");`
compiles to

    call void asm sideeffect "brl $0", "X,..."(i8* blockaddress(@func, %1))

nowadays (thanks to jrtc27 for that example!).

(6c4d255bf3d64 switched clang to blockaddress on an opt-in basis,
e4801f7844bb added docs for it, 31b132c0b781 added IR support.)

I half-heartedly tried to build clang 2.8 locally, but it didn't
just build. And 2.8 didn't have a prebuilt clang binary yet.

The motivation is to make EmitGCCInlineAsmStr() and EmitMSInlineAsmStr()
more alike, and maybe we should delete this code form EmitGCCInlineAsmStr()
instead. But since it's just 3 lines and it's reachable from LLVM IR,
let's do the safer thing for now.

Differential Revision: https://reviews.llvm.org/D114329
2021-11-22 11:49:29 -05:00
Kazu Hirata
c133fb321f [CodeGen] Use llvm::is_contained (NFC) 2021-11-21 10:36:20 -08:00
Kazu Hirata
fc981cedea [llvm] Use range-based for loops (NFC) 2021-11-21 10:36:18 -08:00
Kazu Hirata
f6bce30cf9 [llvm] Use range-based for loops (NFC) 2021-11-20 18:42:10 -08:00
Nico Weber
8b76d33c59 [asm] Allow block address operands in asm inteldialect
This makes the following program build with -masm=intel:

    int foo(int count) {
      asm goto ("dec %0; jb %l[stop]" : "+r" (count) : : : stop);
      return count;
    stop:
      return 0;
    }

It's also is another step towards merging EmitGCCInlineAsmStr() and
EmitMSInlineAsmStr().

Differential Revision: https://reviews.llvm.org/D114167
2021-11-19 09:27:30 -05:00
Nico Weber
4f9a5c2a14 [asm] Remove explicit branch for modifier 'l'
No intended behavior change.

EmitGCCInlineAsmStr() used to explicitly check for modifier 'l'
after handling block address and machine basic block operands.
This prevented passing a MachineOperand with 'l' modifier to
PrintAsmMemoryOperand(). Conceptually that seems kind of nice,
but in practice the overrides of PrintAsmMemoryOperand() in all (*)
AsmPrinter subclasses already reject modifiers they don't know about,
and none of them don't know about 'l'. So removing this doesn't have
a behavior difference, is less code, and it makes EmitGCCInlineAsmStr()
and EmitMSInlineAsmStr() more similar, to prepare for merging them later.

(Why not _add_ the branch to EmitMSInlineAsmStr() instead? Because that
always works with X86AsmPrinter I think, and
X86AsmPrinter::PrintAsmMemoryOperand() very decisively rejects the 'l'
modifier, so it's hard to motivate adding that branch.)

*: The one exception was AVRAsmPrinter, which had an llvm_unreachable instead
of returning true. So this commit changes that, so that the AVR target keeps
emitting an error instead of crashing when passing a mem operand with a :l
modifier to it. All the other targets already don't crash on this.

Differential Revision: https://reviews.llvm.org/D114216
2021-11-19 09:19:53 -05:00
Simon Pilgrim
812e64ef0c [DAG] MatchRotate - support rotate-by-constant of illegal types
Patch to fix some of the regressions in D77804.

By folding to rotate/funnel-shift by constant amounts for illegal types, we prevent SimplifyDemandedBits from destroying the patterns prematurely, allowing us to use the rotate/funnel-shift legalization that was added in D112443.

Differential Revision: https://reviews.llvm.org/D113192
2021-11-19 11:12:04 +00:00
Kazu Hirata
7ca14f6044 [llvm] Use range-based for loops (NFC) 2021-11-18 09:09:52 -08:00
Eric Tang
9fe6b9e802 [TargetLowering][RISCV] Fixed a scalable vector issue when lowering [s|u]mul.overflow intrinsics
Fixed the vector type issue that where we used getVectorNumElements()
    should be replaced by getVectorElementCount() when lowering these
    intrinsics.

    This is similar to D94149

Signed-off-by: Eric Tang <tangxingxin1008@gmail.com>

Reviewed By: craig.topper, frasercrmck

Differential Revision: https://reviews.llvm.org/D109809
2021-11-18 10:16:08 +00:00
Craig Topper
d78fdf111d [LegalizeTypes] Further limit expansion of CTTZ during type promotion.
Don't expand CTTZ if CTPOP or CTLZ is supported on the promoted type.
We have special handling for CTTZ expansion to use those ops with a
small conversion. The setup for that doesn't generate extra code or
large constants so we don't gain anything from expanding early and we
make CTTZ_ZERO_UNDEF codegen worse.

Follow up from post commit feedback on D112268. We don't seem to have
any in tree tests that care about this.
2021-11-17 15:27:29 -08:00
Nico Weber
bf834b2629 [x86/asm] Let EmitMSInlineAsmStr() handle variants too
This is preparation for D113707, where I want to make `-masm=intel`
emit `asm inteldialect` instructions.

`{movq %rbx, %rax|mov rax, rbx}` is supposed to evaluate to the bit
between { and | for att and to the bit between | and } for intel.
Since intel will become `asm inteldialect`, which alls EmitMSInlineAsmStr(),
EmitMSInlineAsmStr() has to support variants as well.

(clang translates `{...|...}` to `$(...$|...$)`. I'm not sure why
it doesn't just send along only the first `...` or the second `...`
to LLVM, but given the notes in PR23933 let's not do a big
reorganization in this codepath.)

Differential Revision: https://reviews.llvm.org/D113932
2021-11-17 13:31:59 -05:00
Craig Topper
0274be28d7 [RISCV] Lower vector CTLZ_ZERO_UNDEF/CTTZ_ZERO_UNDEF by converting to FP and extracting the exponent.
If we have a large enough floating point type that can exactly
represent the integer value, we can convert the value to FP and
use the exponent to calculate the leading/trailing zeros.

The exponent will contain log2 of the value plus the exponent bias.
We can then remove the bias and convert from log2 to leading/trailing
zeros.

This doesn't work for zero since the exponent of zero is zero so we
can only do this for CTLZ_ZERO_UNDEF/CTTZ_ZERO_UNDEF. If we need
a value for zero we can use a vmseq and a vmerge to handle it.

We need to be careful to make sure the floating point type is legal.
If it isn't we'll continue using the integer expansion. We could split the vector
and concatenate the results but that needs some additional work and evaluation.

Differential Revision: https://reviews.llvm.org/D111904
2021-11-17 10:29:41 -08:00
Nico Weber
103cc914d6 [x86/asm] Make variants work when converting at&t inline asm input to intel asm output
`asm` always has AT&T-style input (`asm inteldialect` has Intel-style asm
input), so EmitGCCInlineAsmStr() always has to pick the same variant since it
cares about the input asm string, not the output asm string.

For PowerPC, that default variant is 1. For other targets, it's 0.

Without this, the included test case errors out with

    error: unknown use of instruction mnemonic without a size suffix
             mov rax, rbx

since it picks the intel branch and then tries to interpret it as AT&T
when selecting intel-style output with `-x86-asm-syntax=intel`.

Differential Revision: https://reviews.llvm.org/D113894
2021-11-17 13:23:18 -05:00
DianQK
1e9fa0b12a Fix the side effect of outlined function when the register is implicit use and implicit-def in the same instruction.
This is the diff associated with {D95267}, and we need to mark $x0 as live whether or not $x0 is dead.

The compiler also needs to mark register $x0 as live in for the following case.

```
$x1 = ADDXri $sp, 16, 0
BL @spam, csr_darwin_aarch64_aapcs, implicit-def dead $lr, implicit $sp, implicit $x0, implicit killed $x1, implicit-def $sp, implicit-def $x0
```

This change fixes an issue where the wrong registers were used when -machine-outliner-reruns>0.
As an example:

```
lang=c
typedef struct {
    double v1;
    double v2;
} D16;

typedef struct {
    D16 v1;
    D16 v2;
} D32;

typedef long long LL8;
typedef struct {
    long long v1;
    long long v2;
} LL16;
typedef struct {
    LL16 v1;
    LL16 v2;
} LL32;

typedef struct {
    LL32 v1;
    LL32 v2;
} LL64;

LL8 needx0(LL8 v0, LL8 v1);

void bar(LL64 v1, LL32 v2, LL16 v3, LL32 v4, LL8 v5, D16 v6, D16 v7, D16 v8);

LL8 foo(LL8 v0, LL64 v1, LL32 v2, LL16 v3, LL32 v4, LL8 v5, D16 v6, D16 v7, D16 v8)
{
  LL8 result = needx0(v0, 0);
  bar(v1, v2, v3, v4, v5, v6, v7, v8);
  return result + 1;
}
```

As you can see from the `foo` function, we should not modify the value of `x0` until we call `needx0`.
This code is compiled to give the following instruction MIR code.

```
$sp = frame-setup SUBXri $sp, 256, 0
frame-setup STPDi killed $d13, killed $d12, $sp, 16
frame-setup STPDi killed $d11, killed $d10, $sp, 18
frame-setup STPDi killed $d9, killed $d8, $sp, 20

frame-setup STPXi killed $x26, killed $x25, $sp, 22
frame-setup STPXi killed $x24, killed $x23, $sp, 24
frame-setup STPXi killed $x22, killed $x21, $sp, 26
frame-setup STPXi killed $x20, killed $x19, $sp, 28
...
$x1 = MOVZXi 0, 0
BL @needx0, csr_darwin_aarch64_aapcs, implicit-def dead $lr, implicit $sp, implicit $x0, implicit $x1, implicit-def $sp, implicit-def $x0
...
```

Since there are some other instruction sequences that duplicate `foo`, after the first execution of Machine Outliner you will get:
```
$sp = frame-setup SUBXri $sp, 256, 0
frame-setup STPDi killed $d13, killed $d12, $sp, 16
frame-setup STPDi killed $d11, killed $d10, $sp, 18
frame-setup STPDi killed $d9, killed $d8, $sp, 20

$x7 = ORRXrs $xzr, $lr, 0
BL @OUTLINED_FUNCTION_0, implicit-def $lr, implicit $sp, implicit-def $lr, implicit $sp, implicit $xzr, implicit $x7, implicit $x19, implicit $x20, implicit $x21, implicit $x22, implicit $x23, implicit $x24, implicit $x25, implicit $x26
$lr = ORRXrs $xzr, $x7, 0
...
BL @OUTLINED_FUNCTION_1, implicit-def $lr, implicit $sp, implicit-def $lr, implicit-def $sp, implicit-def $x0, implicit-def $x1, implicit $sp
...
```

For the first time we outlined the following sequence:
```
frame-setup STPXi killed $x26, killed $x25, $sp, 22
frame-setup STPXi killed $x24, killed $x23, $sp, 24
frame-setup STPXi killed $x22, killed $x21, $sp, 26
frame-setup STPXi killed $x20, killed $x19, $sp, 28
```
and
```
$x1 = MOVZXi 0, 0
BL @needx0, csr_darwin_aarch64_aapcs, implicit-def dead $lr, implicit $sp, implicit $x0, implicit $x1, implicit-def $sp, implicit-def $x0
```

When we execute the outline again, we will get:
```
$x0 = ORRXrs $xzr, $lr, 0 <---- here
BL @OUTLINED_FUNCTION_2_0, implicit-def $lr, implicit $sp, implicit-def $sp, implicit-def $lr, implicit $sp, implicit $xzr, implicit $d8, implicit $d9, implicit $d10, implicit $d11, implicit $d12, implicit $d13, implicit $x0
$lr = ORRXrs $xzr, $x0, 0

$x7 = ORRXrs $xzr, $lr, 0
BL @OUTLINED_FUNCTION_0, implicit-def $lr, implicit $sp, implicit-def $lr, implicit $sp, implicit $xzr, implicit $x7, implicit $x19, implicit $x20, implicit $x21, implicit $x22, implicit $x23, implicit $x24, implicit $x25, implicit $x26
$lr = ORRXrs $xzr, $x7, 0
...
BL @OUTLINED_FUNCTION_1, implicit-def $lr, implicit $sp, implicit-def $lr, implicit-def $sp, implicit-def $x0, implicit-def $x1, implicit $sp
```

When calling `OUTLINED_FUNCTION_2_0`, we used `x0` to save the `lr` register.
The reason for the above error appears to be that:
```
BL @OUTLINED_FUNCTION_1, implicit-def $lr, implicit $sp, implicit-def $lr, implicit-def $sp, implicit-def $x0, implicit-def $x1, implicit $sp
```
should be:
```
BL @OUTLINED_FUNCTION_1, implicit-def $lr, implicit $sp, implicit-def $lr, implicit-def $sp, implicit-def $x0, implicit-def $x1, implicit $sp, implicit $x0
```

When processing the same instruction with both `implicit-def $x0` and `implicit $x0` we should keep `implicit $x0`.
A reproducible demo is available at: [https://github.com/DianQK/reproduce_outlined_function_use_live_x0](https://github.com/DianQK/reproduce_outlined_function_use_live_x0).

Reviewed By: jinlin

Differential Revision: https://reviews.llvm.org/D112911
2021-11-17 09:44:10 -08:00
Mirko Brkusanin
db6bc2ab51 [AMDGPU][GlobalISel] Fold G_FNEG above when users cannot fold mods
If possible fold fneg into instruction above if users cannot fold mods and we
know it will decrease instruction count.
Follows same logic as SDAG combiner in choosing opportunities to combine.

Differential Revision: https://reviews.llvm.org/D112827
2021-11-17 14:25:13 +01:00
David Sherwood
8d77555b12 [Analysis] Ensure getTypeLegalizationCost returns a simple VT for TypeScalarizeScalableVector
When getTypeConversion returns TypeScalarizeScalableVector we were
sometimes returning a non-simple type from getTypeLegalizationCost.
However, many callers depend upon this being a simple type and will
crash if not. This patch changes getTypeLegalizationCost to ensure
that we always a return sensible simple VT. If the vector type
contains unusual integer types, e.g. <vscale x 2 x i3>, then we just
set the type to MVT::i64 as a reasonable default.

A test has been added here that demonstrates the vectoriser can
correctly calculate the cost of vectorising a "zext i3 to i64"
instruction with a VF=vscale x 1:

  Transforms/LoopVectorize/AArch64/sve-inductions-unusual-types.ll

Differential Revision: https://reviews.llvm.org/D113777
2021-11-17 13:11:58 +00:00
Simon Pilgrim
5fedbd5b18 [DAG] SimplifyDemandedVectorElts - zero_extend_vector_inreg(and(x,c)) -> and(x,c')
If we've only demanded the 0'th element, and it comes from a (one-use) AND, try to convert the zero_extend_vector_inreg into a mask and constant fold it with the AND.
2021-11-17 12:41:48 +00:00
Jay Foad
3264e95938 [CodeGen] Update LiveIntervals in TargetInstrInfo::convertToThreeAddress
Delegate updating of LiveIntervals to each target's
convertToThreeAddress implementation, instead of repairing LiveIntervals
after the fact in TwoAddressInstruction::convertInstTo3Addr.

Differential Revision: https://reviews.llvm.org/D113493
2021-11-17 10:16:47 +00:00
Eric Tang
f7eb061a5f [SelectionDAG] Make WidenVecRes_SELECT work for scalable vectors
This change make WidenVecRes_SELECT work for scalable vectors.

    This patch is split from [D110319](https://reviews.llvm.org/D110319)

Signed-off-by: Eric Tang <tangxingxin1008@gmail.com>

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D110388
2021-11-17 08:55:11 +00:00
Aaron Puchert
b20da5117f Don't add irrelevant items to queue in DwarfCompileUnit::createScopeChildrenDIE (NFC)
Instead of popping them and then immediately throwing them away, we can
just filter out globals and items in different scopes before adding them
to WorkList. Shouldn't change anything but keep the queue smaller.

Reviewed By: aprantl

Differential Revision: https://reviews.llvm.org/D113864
2021-11-17 00:01:20 +01:00
Aaron Puchert
86b3100cde [DebugInfo] Use DbgEntityKind in DbgEntity interface (NFC)
It was being used occasionally already, and using it on the constructor
and getDbgEntityID has obvious type safety benefits.

Also use llvm_unreachable in the switch as usual, but since only these
two values are used in constructor calls I think it's still NFC.

Reviewed By: probinson

Differential Revision: https://reviews.llvm.org/D113862
2021-11-17 00:01:20 +01:00
Mircea Trofin
c6b9b702a0 [NFC][Regalloc] Factor out eviction decision from eviction attempt
This splits tryEvict into a const tryFindEvictionCandidate, which
attempts to find a candidate, and the actual eviction (should the former
be successful)

The newly introduced tryFindEvictionCandidate will move subsequently
into the RegAllocEvictionAdvisor.

RFC: https://lists.llvm.org/pipermail/llvm-dev/2021-November/153639.html

Differential Revision: https://reviews.llvm.org/D113941
2021-11-16 10:50:23 -08:00
Kazu Hirata
ee0133dc6d [llvm] Use range-for loops (NFC) 2021-11-16 09:01:56 -08:00
Frederik Gossen
3f3d4e8a15 Fix unused variable warning in LoadStoreOpt.cpp with (void) 2021-11-16 12:03:59 +01:00
Frederik Gossen
2bceb7c8da Revert "Fix unused variable in llvm/lib/CodeGen/GlobalISel/LoadStoreOpt.cpp"
This reverts commit 40a609aebe4ab51174a164852b6399f322bf6d9a.
2021-11-16 12:00:17 +01:00
Frederik Gossen
ecfe7a3404 Revert "Fix unused variable warning."
This reverts commit a062e2a8ca27b615cf3d02ed5c551ca85efc0325.
2021-11-16 11:59:34 +01:00
Frederik Gossen
9a6817b7ed Revert "Fix another unused variable error."
This reverts commit 5b84ae7c48083bd0f40199837990cf915a2053b8.
2021-11-16 11:58:02 +01:00
Adrian Kuegel
5b84ae7c48 Fix another unused variable error. 2021-11-16 11:32:44 +01:00
Adrian Kuegel
a062e2a8ca Fix unused variable warning. 2021-11-16 11:17:33 +01:00
Frederik Gossen
40a609aebe Fix unused variable in llvm/lib/CodeGen/GlobalISel/LoadStoreOpt.cpp 2021-11-16 11:05:18 +01:00
Amara Emerson
dcd8728d83 Remove unnecessary <any> include. 2021-11-16 00:50:30 -08:00
Kazu Hirata
7f00806a6a [llvm] Use make_early_inc_range (NFC) 2021-11-15 21:28:46 -08:00
Amara Emerson
dc84770d55 [GlobalISel] Add a store-merging optimization pass and enable for AArch64.
This is a first attempt at a constant value consecutive store merging pass,
a counterpart to the DAGCombiner's store merging optimization.

The high level goals of this pass:

* Have a simple and efficient algorithm. As close to linear time as we can get.
  Thus, prioritizing scalability of the algorithm over merging every corner case
  we can find. The DAGCombiner's store merging code has been the source of
  compile time and complexity issues in the past and I wanted to avoid that.
* Don't introduce any new data structures for ordering memory operations. In MIR,
  we don't have the concept of chains like we do in the DAG, and the instruction
  order is stricter than enforcing ordering with graph edges. Although I
  considered adding something similar, I couldn't justify the overhead.

The pass is current split into 3 main parts. The main store merging code focuses
on identifying candidate stores and managing the candidate group that's under
consideration for merging. Analyzing addressing of stores is a potentially
complex part and for now there's just a basic implementation to identify easy
cases. Finally, the other main bit of complexity is the alias analysis, which
tries to follow the same logic as the DAG's AA.

Currently this implementation only supports merging of constant stores. Stores
of arbitrary variables are technically possible with a very small change, but
the DAG chooses not to do this. Doing so here makes most code worse since
there's extra overhead in merging values into wider registers.

On AArch64 -Os, this optimization results in very minor savings on CTMark.

Differential Revision: https://reviews.llvm.org/D109131
2021-11-15 21:10:39 -08:00
Fabian Wolff
b484fa8289 [X86] Fix crash with inline asm using wrong register name
Fixes PR#48678. `X86TargetLowering::getRegForInlineAsmConstraint()` can adjust the register class to match the type, e.g. change `VR128X` to `VR256X` if the type needs 256 bits. However, the function currently returns the unadjusted register and the adjusted register class, e.g. `xmm15` and `VR256X`, which then causes an assertion failure later because the register class does not contain that register. This patch fixes this behavior.

Reviewed By: pengfei

Differential Revision: https://reviews.llvm.org/D113834
2021-11-16 10:38:12 +08:00
Craig Topper
233def40f7 [DAGCombiner] Prevent unfoldMaskedMerge from creating an AND with two inverted inputs.
It's possible that the mask is already a NOT. At least if InstCombine
hasn't canonicalized the input. In that case we will form an ANDN with
X instead of with Y. So we don't need to worry about Y being a constant.

We might need to check that X isn't a constant instead, but we don't
have a test case for that yet.

This fixes a size regression found when trying to enable this combine
for RISCV in D113937.

Differential Revision: https://reviews.llvm.org/D113948
2021-11-15 17:15:51 -08:00
Mircea Trofin
19e6b730ce [NFC][Regalloc] Factor types that would be used by the eviction advisor
This is in prepartion of pulling the eviction decision-making into an
analysis pass, which would then allow swapping that decision making
process.

RFC: https://lists.llvm.org/pipermail/llvm-dev/2021-November/153639.html

Differential Revision: https://reviews.llvm.org/D113929
2021-11-15 13:15:14 -08:00
Nico Weber
b4e50e5228 [asm] Make EmitMSInlineAsmStr and EmitGCCInlineAsmStr more alike
https://reviews.llvm.org/D71677 copied a bunch of code from
EmitGCCInlineAsmStr() to EmitMSInlineAsmStr() but made a few small
(likely unintentional) changes. This makes these pieces look the same.

No behavior change.

(Why are these functions two copies? No great reason as far as I can tell.
https://reviews.llvm.org/rG1778831a3d1d24ab6545635f63da4d9c5f8f0ac7 did the
split; we might want to undo them at some point. But PR23933 suggests
that a bigger change is planned for this file in the future, so keeping
this incremental for now.)

Differential Revision: https://reviews.llvm.org/D113924
2021-11-15 15:43:01 -05:00
Nico Weber
0be836b7dd [asm] Convert AsmPrinter::PrintSpecial() to StringRef
No behavior change.

Differential Revision: https://reviews.llvm.org/D113911
2021-11-15 15:38:27 -05:00
Nico Weber
833393e021 [asm] Correctly handle special names in variants
There's really no reason why anyone should use these special names in a variant.
I noticed this while reading the code: all other writes to OS are guarded by
this conditional, and the behavior with the check seems more correct, so
let's add the check.

Differential Revision: https://reviews.llvm.org/D113909
2021-11-15 15:37:09 -05:00
Simon Pilgrim
7bac1985f4 [DAG] SimplifyVBinOp - add SDLoc() argument
Pass in SDLoc instead of (repeated) local creations in SimplifyVBinOp and scalarizeBinOpOfSplats
2021-11-15 10:43:56 +00:00