Similar to 806761a7629df268c8aed49657aeccffa6bca449.
For IR files without a target triple, -mtriple= specifies the full
target triple while -march= merely sets the architecture part of the
default target triple, leaving a target triple which may not make sense,
e.g. amdgpu-apple-darwin.
Therefore, -march= is error-prone and not recommended for tests without
a target triple. The issue has been benign as we recognize
$unknown-apple-darwin as ELF instead of rejecting it outrightly.
This patch changes AMDGPU tests to not rely on the default
OS/environment components. Tests that need fixes are not changed:
```
LLVM :: CodeGen/AMDGPU/fabs.f64.ll
LLVM :: CodeGen/AMDGPU/fabs.ll
LLVM :: CodeGen/AMDGPU/floor.ll
LLVM :: CodeGen/AMDGPU/fneg-fabs.f64.ll
LLVM :: CodeGen/AMDGPU/fneg-fabs.ll
LLVM :: CodeGen/AMDGPU/r600-infinite-loop-bug-while-reorganizing-vector.ll
LLVM :: CodeGen/AMDGPU/schedule-if-2.ll
```
This was stored in LiveIntervals, but not actually used for anything
related to LiveIntervals. It was only used in one check for if a load
instruction is rematerializable. I also don't think this was entirely
correct, since it was implicitly assuming constant loads are also
dereferenceable.
Remove this and rely only on the invariant+dereferenceable flags in
the memory operand. Set the flag based on the AA query upfront. This
should have the same net benefit, but has the possible disadvantage of
making this AA query nonlazy.
Preserve the behavior of assuming pointsToConstantMemory implying
dereferenceable for now, but maybe this should be changed.
Artifact combiner is not able to access individual elements after using
LCMTy style merge/unmerge, extract and insert to change vector number of
elements (pad with undef or split to sub-vector instructions).
Use unmerge to individual elements instead and then merge elements into
requested types.
Change argument lowering for vectors and moreElementsVector to use
buildPadVectorWithUndefElements and buildDeleteTrailingVectorElements.
FewerElementsVector had a few helpers that had different behavior,
introduce new helper for most of the opcodes.
FewerElementsVector helper is more flexible since it can create leftover
instruction smaller then requested type (useful in case target wants to
avoid pad with undef and use fewer registers). If target does not want
leftover of different type it should call more elements first.
Some helpers were performing more elements first to have split without
leftover. Opcodes that used this helper use clampMaxNumElementsStrict
(does more elements first) in LegalizerInfo to avoid test changes.
Fixes failures caused by failing to combine artifacts created during
more/fewer elements vector.
Differential Revision: https://reviews.llvm.org/D114198
In collectTiedOperands, when handling an undef use that is tied to a
def, constrain the dst reg with the actual register class of the src
reg, instead of with the register class from the instructions's
MCInstrDesc. This makes a difference in some AMDGPU test cases like
this, before:
%16:sgpr_96 = INSERT_SUBREG undef %15:sgpr_96_with_sub0_sub1(tied-def 0), killed %11:sreg_64_xexec, %subreg.sub0_sub1
After, without this patch:
undef %16.sub0_sub1:sgpr_96 = COPY killed %11:sreg_64_xexec
This fails machine verification if you force it to run after
TwoAddressInstruction (currently it is disabled) with:
*** Bad machine code: Invalid register class for subregister index ***
- function: s_load_constant_v3i32_align4
- basic block: %bb.0 (0xa011a88)
- instruction: undef %16.sub0_sub1:sgpr_96 = COPY killed %11:sreg_64_xexec
- operand 0: undef %16.sub0_sub1:sgpr_96
Register class SGPR_96 does not fully support subreg index 4
After, with this patch:
undef %16.sub0_sub1:sgpr_96_with_sub0_sub1 = COPY killed %11:sreg_64_xexec
See also svn r159120 which introduced the code to handle tied undef
uses.
Differential Revision: https://reviews.llvm.org/D110944