Refactor BasicBlockSections to use the target-specific noop insertion
hook from TargetInstrInfo instead of building it ourselves. Using the
TII hook is both cleaner and makes it easier to extend BBSections to
non-X86 targets.
Differential Revision: https://reviews.llvm.org/D158303
There are no patterns for ADCX/ADOX, so they are never selected during
ISel. This patch therefore removes the cases that handle them from some MIR optimizations.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D157717
Currently `isTriviallyReMaterializable` calls
`isReallyTriviallyReMaterializable` and
`isReallyTriviallyReMaterializableGeneric`. The two interfaces
are confusing, but there are also some real issues with this.
The documentation of this function (see below) suggests that
`isReallyTriviallyReMaterializable` allows the target to override the
default behaviour.
/// For instructions with opcodes for which the M_REMATERIALIZABLE flag is
/// set, this hook lets the target specify whether the instruction is actually
/// trivially rematerializable, taking into consideration its operands.
However, it implements something different. The default behaviour
is the analysis done in `isReallyTriviallyReMaterializableGeneric`,
which is testing if it is safe to rematerialize the MachineInstr.
The result of `isReallyTriviallyReMaterializable` is only considered if
`isReallyTriviallyReMaterializableGeneric` returns `false`. That means
there is no way to override the default behaviour if
`isReallyTriviallyReMaterializableGeneric` returns true (i.e. it is safe to
rematerialize, but we'd rather not).
By making this a single interface, we can override the interface to do either.
Reviewed By: craig.topper, nemanjai
Differential Revision: https://reviews.llvm.org/D156520
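To make the difference concrete, here is a minimal sketch with hypothetical stand-in types (`Instr`, `OldTII`, `NewTII` and the two target subclasses are illustrative, not LLVM classes). It ignores the M_REMATERIALIZABLE flag check and models only how each scheme combines the generic analysis with a target override.

```cpp
#include <cassert>

// Hypothetical stand-in for a MachineInstr: what the generic safety analysis
// would conclude, plus a target-specific reason to avoid rematerialization.
struct Instr {
  bool SafeToRemat;
  bool TargetPrefersNo;
};

// Old scheme: two hooks combined with ||. The target hook only matters when
// the generic analysis says "no", so it can never veto a "yes".
struct OldTII {
  virtual ~OldTII() = default;
  bool isReallyTriviallyReMaterializableGeneric(const Instr &MI) const {
    return MI.SafeToRemat;
  }
  virtual bool isReallyTriviallyReMaterializable(const Instr &) const {
    return false;
  }
  bool isTriviallyReMaterializable(const Instr &MI) const {
    return isReallyTriviallyReMaterializableGeneric(MI) ||
           isReallyTriviallyReMaterializable(MI);
  }
};

struct OldTarget : OldTII {
  // Tries to veto, but the result is ignored whenever the generic check passes.
  bool isReallyTriviallyReMaterializable(const Instr &MI) const override {
    return MI.SafeToRemat && !MI.TargetPrefersNo;
  }
};

// New scheme: one virtual hook whose default is the generic analysis.
struct NewTII {
  virtual ~NewTII() = default;
  virtual bool isReallyTriviallyReMaterializable(const Instr &MI) const {
    return MI.SafeToRemat;
  }
  bool isTriviallyReMaterializable(const Instr &MI) const {
    return isReallyTriviallyReMaterializable(MI);
  }
};

struct NewTarget : NewTII {
  bool isReallyTriviallyReMaterializable(const Instr &MI) const override {
    if (MI.TargetPrefersNo)
      return false; // the veto now takes effect
    return NewTII::isReallyTriviallyReMaterializable(MI); // generic fallback
  }
};
```

With the single hook, the target's override has the final say and can still defer to the generic analysis by calling the base implementation.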
This is a workaround for a coalescer bug where coalescing
SUBREG_TO_REG ends up losing the liveness of the high bits of the
source register. The result is an incorrect undef subregister def
instead of preserving the high values. Work around the observed
failure after the resulting mov is eliminated during allocation until
a proper fix is ready. I believe the proper fix is to make
SUBREG_TO_REG use a tied operand.
The test should catch a regression originally observed after
b7836d856206ec39509d42529f958c920368166b and should not show a
difference after a496c8be6e638ae58bb45f13113dbe3a4b7b23fd is reverted.
https://reviews.llvm.org/D156164
Previously we removed a pattern like:
%reg = and32ri %in_reg, 5
... // EFLAGS not changed.
%src_reg = subreg_to_reg 0, %reg, %subreg.sub_index
test64rr %src_reg, %src_reg, implicit-def $eflags
We can remove test64rr since it has the same functionality as the and, but the subreg_to_reg prevented the optimization in the previous code, so we handle this case specially.
This case can also be optimized for the same reason:
%reg = and32ri %in_reg, 5
... // EFLAGS not changed.
%src_reg = copy %reg.sub_16bit:gr32
test16rr %src_reg, %src_reg, implicit-def $eflags
The COPY from gr32 to gr16 also prevented the optimization in the previous code, so we handle it specially, as we do for test64rr.
Reviewed By: skan
Differential Revision: https://reviews.llvm.org/D154193
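A toy model of why the TEST is redundant in the `and32ri %in_reg, 5` examples: it computes the flags produced by the AND and by each TEST and compares them on sample inputs. This is our own illustration, not the patch's legality check, and all names are hypothetical. Only ZF and SF are modelled, since AND and TEST both clear CF and OF, and PF depends only on the low byte, which is identical in all three results.

```cpp
#include <cassert>
#include <cstdint>

struct ZS {
  bool ZF;
  bool SF;
  bool operator==(const ZS &O) const { return ZF == O.ZF && SF == O.SF; }
};

// %reg = and32ri %in_reg, Mask
ZS flagsOfAnd32ri(uint32_t In, uint32_t Mask) {
  uint32_t R = In & Mask;
  return {R == 0, (R >> 31) != 0};
}

// %src_reg = subreg_to_reg 0, %reg, ... ; test64rr %src_reg, %src_reg
ZS flagsOfTest64rr(uint32_t Reg) {
  uint64_t Src = Reg; // upper 32 bits are zero
  return {Src == 0, (Src >> 63) != 0};
}

// %src_reg = copy %reg.sub_16bit ; test16rr %src_reg, %src_reg
ZS flagsOfTest16rr(uint32_t Reg) {
  uint16_t Src = static_cast<uint16_t>(Reg);
  return {Src == 0, (Src >> 15) != 0};
}

// True if, on every sampled input, the TEST recomputes exactly the flags the
// AND already produced (i.e. the TEST is redundant on these samples).
bool testRedundantOnSamples(uint32_t Mask, bool Use16Bit) {
  const uint32_t Samples[] = {0u, 1u, 4u, 5u, 0xFFu, 0x8000u, 0x10000u,
                              0x7FFFFFFFu, 0x80000000u, 0xFFFFFFFFu};
  for (uint32_t In : Samples) {
    uint32_t Reg = In & Mask;
    ZS And = flagsOfAnd32ri(In, Mask);
    ZS Test = Use16Bit ? flagsOfTest16rr(Reg) : flagsOfTest64rr(Reg);
    if (!(And == Test))
      return false;
  }
  return true;
}
```

With mask 5 the flags agree for both patterns. With other masks they need not: `0x10000` changes ZF in the 16-bit COPY case and `0x80000000` changes SF in the 64-bit case, so a real transform has to take the mask, or the flags its users read, into account.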
Sometimes a developer would like more control over cmov vs. branch. LLVM IR has `!unpredictable` metadata, but the X86 backend currently ignores it. Propagate this metadata and avoid cmov->branch conversion in X86CmovConversion for cmovs that carry it.
Example:
```
int MaxIndex(int n, int *a) {
  int t = 0;
  for (int i = 1; i < n; i++) {
    // cmov is converted to branch by X86CmovConversion
    if (a[i] > a[t]) t = i;
  }
  return t;
}
int MaxIndex2(int n, int *a) {
  int t = 0;
  for (int i = 1; i < n; i++) {
    // cmov is preserved
    if (__builtin_unpredictable(a[i] > a[t])) t = i;
  }
  return t;
}
```
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D118118
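A sketch of the extra bail-out in the cmov-to-branch decision, assuming a hypothetical `CmovInstr` type whose `Unpredictable` field stands in for the flag propagated from the IR metadata. The real X86CmovConversion profitability heuristics are not modelled.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a CMOV machine instruction.
struct CmovInstr {
  bool Unpredictable; // carried over from `!unpredictable` metadata
};

// A group of CMOVs sharing one condition stays a conversion candidate only
// if none of them is marked unpredictable; the usual cost checks are elided.
bool mayConvertCmovGroupToBranch(const std::vector<CmovInstr> &Group) {
  for (const CmovInstr &MI : Group)
    if (MI.Unpredictable)
      return false; // keep the cmov: a branch would likely mispredict
  return !Group.empty();
}
```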
This caused compiler assertions, see comment on
https://reviews.llvm.org/D150107.
This also reverts the dependent follow-up change:
> [X86] Remove patterns for ADD/AND/OR/SUB/XOR/CMP with immediate 8 and optimize during MC lowering, NFCI
>
> This is follow-up of D150107.
>
> In addition, the function `X86::optimizeToFixedRegisterOrShortImmediateForm` can be
> shared with project bolt and eliminates the code in X86InstrRelaxTables.cpp.
>
> Differential Revision: https://reviews.llvm.org/D150949
This reverts commit 2ef8ae134828876ab3ebda4a81bb2df7b095d030 and
5586bc539acb26cb94e461438de01a5080513401.
This is a follow-up of D150107.
In addition, the function `X86::optimizeToFixedRegisterOrShortImmediateForm` can be
shared with BOLT, which lets us eliminate the code in X86InstrRelaxTables.cpp.
Differential Revision: https://reviews.llvm.org/D150949
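A hedged sketch of what choosing a "fixed register or short immediate form" means for a single ADD32 case. The opcode enum and `optimizeShortImmOrFixedReg` are hypothetical and only illustrate the encoding-size trade-off; the real helper covers many more opcodes.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical opcode names modelled on the X86 ADD32 variants.
enum class Opc { ADD32ri, ADD32ri8, ADD32i32 };

struct MiniInst {
  Opc Op;
  bool DstIsEAX;
  int64_t Imm;
};

// Pick the shortest encoding: the sign-extended imm8 form (83 /0 ib, 3 bytes
// with a register operand) beats the EAX-only form (05 id, 5 bytes), which
// beats the generic form (81 /0 id, 6 bytes).
MiniInst optimizeShortImmOrFixedReg(MiniInst I) {
  if (I.Op != Opc::ADD32ri)
    return I;
  if (I.Imm >= INT8_MIN && I.Imm <= INT8_MAX)
    I.Op = Opc::ADD32ri8;
  else if (I.DstIsEAX)
    I.Op = Opc::ADD32i32;
  return I;
}
```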
It was first suggested by @craig.topper in D150068. I think there are at least three benefits:
1. It reduces the number of patterns during ISel and, as a result, the size of X86GenDAGISel.inc.
2. The patterns for shift/rotate with immediate 1 look quite similar to shift/rotate with immediate 8, so this can be seen as eliminating "duplicate" code.
3. It delays the optimization from imm8 to imm1, so that earlier optimization passes do not need to handle the imm1 versions.
It also improves FastISel code and makes X86DomainReassignment work for shifts by 1, but it regresses GlobalISel, though no one should care about that.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D150107
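A minimal sketch of the delayed imm8-to-imm1 choice, with hypothetical opcode names modelled on the X86 shift forms: instruction selection only produces the imm8 form, and a late lowering step switches to the shorter shift-by-1 encoding (`D1 /4` instead of `C1 /4 ib` for a 32-bit SHL).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical opcodes: the imm8 form and the implicit shift-by-1 form.
enum class ShOpc { SHL32ri, SHL32r1 };

struct ShiftInst {
  ShOpc Op;
  uint8_t Imm; // ignored for SHL32r1
};

// Earlier passes only ever see SHL32ri; the imm1 form appears only here.
ShiftInst lowerShiftByOne(ShiftInst I) {
  if (I.Op == ShOpc::SHL32ri && I.Imm == 1)
    return {ShOpc::SHL32r1, 0};
  return I;
}
```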
Add support for splitting critical edges coming from an indirect jump
using a jump table ("switch jumps").
This introduces the `TargetInstrInfo::getJumpTableIndex` callback to
allow targets to return an index into `MachineJumpTableInfo` for a
given indirect jump. It also updates
`MachineBasicBlock::SplitCriticalEdge` to allow splitting of critical
edges by rewriting jump table entries.
This is largely based on work done by Zhixuan Huan in D132202.
Differential Revision: https://reviews.llvm.org/D140975
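A toy sketch of splitting such an edge by rewriting the jump table, assuming a hypothetical `ToyFunction` CFG made of integer block ids (not the `MachineBasicBlock`/`MachineJumpTableInfo` API). Because one CFG edge from the switch block may correspond to several jump-table entries, every entry that names the target must be redirected.

```cpp
#include <cassert>
#include <vector>

// Blocks are integer ids; the switch block reaches its successors only
// through JumpTable.
struct ToyFunction {
  std::vector<std::vector<int>> Succs; // successor list of each block
  std::vector<int> JumpTable;          // targets of the indirect jump
};

// Split the (assumed critical) edge Switch -> Target: add a new block that
// jumps to Target and redirect the switch to it. Returns the new block id.
int splitJumpTableEdge(ToyFunction &F, int Switch, int Target) {
  int NewBB = static_cast<int>(F.Succs.size());
  F.Succs.push_back({Target}); // NewBB: unconditional jump to Target
  for (int &Entry : F.JumpTable)
    if (Entry == Target)
      Entry = NewBB; // rewrite every matching jump-table entry
  for (int &S : F.Succs[Switch])
    if (S == Target)
      S = NewBB; // keep the CFG in sync with the table
  return NewBB;
}
```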
KCFI machine function passes transform indirect calls with a
cfi-type attribute into architecture-specific type checks bundled
together with the calls. Instead of having a separate pass for each
architecture, add a generic machine function pass for KCFI and
move the architecture-specific code that emits the actual check to
TargetLowering. This avoids unnecessary duplication and makes it
easier to add KCFI support to other architectures.
Reviewed By: nickdesaulniers
Differential Revision: https://reviews.llvm.org/D149915
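The split of responsibilities can be pictured with the following minimal sketch: the generic pass decides where a check is needed, and the target only decides how to emit it. `ToyTargetLowering`, `emitKCFICheck` and `runGenericKCFIPass` are hypothetical names, not the hook added by the patch, and the emitted strings are placeholders rather than real check sequences.

```cpp
#include <cassert>
#include <string>
#include <vector>

struct CallSite {
  bool HasCFIType; // indirect call carrying a cfi-type attribute
  unsigned TypeId;
};

struct ToyTargetLowering {
  virtual ~ToyTargetLowering() = default;
  // Target hook: produce the architecture-specific check for one call.
  virtual std::string emitKCFICheck(unsigned TypeId) const = 0;
};

struct ToyX86Lowering : ToyTargetLowering {
  std::string emitKCFICheck(unsigned TypeId) const override {
    return "x86-kcfi-check " + std::to_string(TypeId); // placeholder text
  }
};

// The single architecture-independent pass.
std::vector<std::string> runGenericKCFIPass(const std::vector<CallSite> &Calls,
                                            const ToyTargetLowering &TLI) {
  std::vector<std::string> Checks;
  for (const CallSite &C : Calls)
    if (C.HasCFIType)
      Checks.push_back(TLI.emitKCFICheck(C.TypeId));
  return Checks;
}
```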
The previous patch (D148980) didn't set InstrIdxForVirtReg correctly
in genAlternativeDpCodeSequence(). This caused a VNNI lit test failure when
LLVM_ENABLE_EXPENSIVE_CHECKS is on.
/data/llvm-project/llvm/lib/Target/X86/X86InstrInfo.cpp:9793:3: error: variable 'MaddOpc' is used uninitialized whenever switch default is taken [-Werror,-Wsometimes-uninitialized]
default:
^~~~~~~
/data/llvm-project/llvm/lib/Target/X86/X86InstrInfo.cpp:9854:25: note: uninitialized use occurs here
Madd->setDesc(TII.get(MaddOpc));
^~~~~~~
/data/llvm-project/llvm/lib/Target/X86/X86InstrInfo.cpp:9791:19: note: initialize the variable 'MaddOpc' to silence this warning
unsigned MaddOpc;
^
= 0
/data/llvm-project/llvm/lib/Target/X86/X86InstrInfo.cpp:9793:3: error: variable 'AddOpc' is used uninitialized whenever switch default is taken [-Werror,-Wsometimes-uninitialized]
default:
^~~~~~~
/data/llvm-project/llvm/lib/Target/X86/X86InstrInfo.cpp:9862:46: note: uninitialized use occurs here
BuildMI(*MF, MIMetadata(Root), TII.get(AddOpc), DstReg)
^~~~~~
/data/llvm-project/llvm/lib/Target/X86/X86InstrInfo.cpp:9790:18: note: initialize the variable 'AddOpc' to silence this warning
unsigned AddOpc;
^
= 0
2 errors generated.
"vpmaddwd + vpaddd" can be combined into vpdpwssd, which reduces
latency. However, when vpdpwssd is on a critical path, the combination
reduces ILP. This happens when vpdpwssd is in a loop: the vpmaddwd
instructions from multiple iterations can execute in parallel, while
each vpdpwssd depends on the result of the previous iteration. If vpaddd
is on a critical path while vpmaddwd is not, it is profitable to split
vpdpwssd into "vpmaddwd + vpaddd".
This patch uses the machine combiner framework to decide whether to
combine "vpmaddwd + vpaddd". The typical example code is as
below.
```
#include <immintrin.h>

__m256i foo(int cnt, __m256i c, __m256i b, __m256i *p) {
  for (int i = 0; i < cnt; ++i) {
    __m256i a = p[i];
    __m256i m = _mm256_madd_epi16(b, a);
    c = _mm256_add_epi32(m, c);
  }
  return c;
}
```
Differential Revision: https://reviews.llvm.org/D148980
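A rough model of the depth comparison behind this decision, in the spirit of the machine combiner's critical-path check rather than its actual code. The latencies are hypothetical inputs (real values come from the scheduling model); `DepthAcc` is when the accumulator `c` is ready and `DepthMulOps` is when `a` and `b` are ready.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical latencies in cycles.
struct Latencies {
  unsigned Madd; // vpmaddwd
  unsigned Add;  // vpaddd
  unsigned Dp;   // vpdpwssd
};

// Cycle at which the new accumulator value is ready.
unsigned fusedDepth(unsigned DepthAcc, unsigned DepthMulOps, const Latencies &L) {
  return std::max(DepthAcc, DepthMulOps) + L.Dp;
}

unsigned splitDepth(unsigned DepthAcc, unsigned DepthMulOps, const Latencies &L) {
  // vpmaddwd does not read the accumulator, so it can start early.
  return std::max(DepthAcc, DepthMulOps + L.Madd) + L.Add;
}

// Split vpdpwssd into vpmaddwd + vpaddd only if that shortens the path.
bool preferSplit(unsigned DepthAcc, unsigned DepthMulOps, const Latencies &L) {
  return splitDepth(DepthAcc, DepthMulOps, L) <
         fusedDepth(DepthAcc, DepthMulOps, L);
}
```

With a loop-carried accumulator (large `DepthAcc`), only the cheap vpaddd stays on the critical path after splitting, so the split wins; when nothing is waiting on the accumulator, the fused form is shorter.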
I think it's good practice to avoid having default ctors unless they're really
valid/useful. For OutlinedFunction the default ctor was used to represent a
bail-out value for getOutliningCandidateInfo(), so I changed the API to return
a `std::optional<OutlinedFunction>` instead, which seems a tad cleaner.
Differential Revision: https://reviews.llvm.org/D146375
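A minimal sketch of the pattern, using a simplified, hypothetical `OutlinedFunction` without a default constructor; the "fewer than two candidates" bail-out rule is an illustrative assumption, not the outliner's real cost model.

```cpp
#include <cassert>
#include <optional>
#include <utility>
#include <vector>

struct OutlinedFunction {
  std::vector<unsigned> CandidateStarts;
  unsigned FrameOverhead;
  // No default constructor: an "empty" object can't serve as a sentinel.
  OutlinedFunction(std::vector<unsigned> Starts, unsigned Overhead)
      : CandidateStarts(std::move(Starts)), FrameOverhead(Overhead) {}
};

// Bail out with std::nullopt instead of a default-constructed object.
std::optional<OutlinedFunction>
getOutliningCandidateInfo(const std::vector<unsigned> &Starts) {
  if (Starts.size() < 2)
    return std::nullopt; // not worth outlining
  return OutlinedFunction(Starts, /*Overhead=*/1);
}
```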
The motivation behind this patch is to unify some of the outliner logic across architectures. This looks nicer in general and makes fixing [issues like this](https://reviews.llvm.org/D124707#3483805) easier.
There are some notable changes here:
1. `isMetaInstruction()` is used directly instead of checking for specific meta-instructions like `IMPLICIT_DEF` or `KILL`. This was already done in the RISC-V implementation, but other architectures still did hardcoded checks.
- As an exception to this, CFI instructions are explicitly delegated to the target because RISC-V has different handling for those.
2. `isTargetIndex()` checks are replaced with an assert; none of the architectures supported actually use `MO_TargetIndex` at this point in time.
3. `isCFIIndex()` and `isFI()` checks are also replaced with asserts, since these operands should not exist in [any context](https://reviews.llvm.org/D122635#3447214) at this stage in the pipeline.
Reviewed by: paquette
Differential Revision: https://reviews.llvm.org/D125072
```
MCRegister getX86SubSuperRegister*(MCRegister Reg, unsigned Size,
bool High = false);
```
A strange behavior of the functions `getX86SubSuperRegister*` was
introduced by llvm-svn:145579: the returned register may not
match the parameters when an 8-bit high register is required.
Then llvm-svn:175762 refined the code and dropped the comments, so the
code no longer tells us what happens there :-(
These two functions are only called with `Size=8` and `High=true` in two places.
One is in `X86FixupBWInsts.cpp` for liveness of registers and the other is in
`X86AsmPrinter.cpp` for inline asm.
For the first one, we provide an alternative in this patch.
For the second one, the strange behaviour caused a bug: no error was reported for a mismatched modifier.
```
void f() {
char x;
asm volatile ("mov %%ah, %h0" :"=r"(x)::"%eax", "%ebx", "%ecx", "%edx", "edi", "esi");
}
```
```
$ gcc -S test.c
error: extended registers have no high halves
```
```
$ clang -S test.c
no error
```
so we fix the bug in this patch.
`getX86SubSuperRegister` is just a wrapper of `getX86SubSuperRegisterOrZero` with an `assert`.
I believe we should remove the latter.
Reviewed By: pengfei
Differential Revision: https://reviews.llvm.org/D142834
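A minimal sketch of the intended behavior for `Size=8, High=true`, using register names as strings instead of `MCRegister` for readability; `getHigh8` is a hypothetical helper. Only the A/B/C/D families have an 8-bit high half, so the lookup fails for anything else and the caller can report an error. In the inline asm example above, the clobbers presumably force `x` into such a register (GCC's message mentions extended registers).

```cpp
#include <cassert>
#include <optional>
#include <string>

std::optional<std::string> getHigh8(const std::string &Reg) {
  // Each row is one register family; the last entry is its high 8-bit half.
  static const char *Families[][5] = {
      {"rax", "eax", "ax", "al", "ah"}, {"rbx", "ebx", "bx", "bl", "bh"},
      {"rcx", "ecx", "cx", "cl", "ch"}, {"rdx", "edx", "dx", "dl", "dh"}};
  for (const auto &F : Families)
    for (const char *Name : F)
      if (Reg == Name)
        return std::string(F[4]);
  return std::nullopt; // e.g. esi, edi, r8b: no high half, report an error
}
```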
The register allocator may introduce reloads in the middle of reading
and writing the EFLAGS register, due to the RDFLAGS & WRFLAGS pseudos
being expanded before RA. This may cause an issue where the stack
pointer was adjusted but the stack offset for the reload wasn't
accounted for (see [1]).
To avoid this, expand these pseudos after register allocation.
[1] https://github.com/llvm/llvm-project/issues/59102
Reviewed By: craig.topper, nickdesaulniers, pengfei
Differential Revision: https://reviews.llvm.org/D140045
At the call site of findFirstSet, ZMask | (1 << DstIdx) always has
exactly 3 bits set, and they are all among the 4 least significant
bits, so (ZMask | (1 << DstIdx)) ^ 15 has exactly one bit set. Since
the argument to findFirstSet is nonzero, we can safely switch to
llvm::countr_zero.
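The argument can be checked with a small sketch. `remainingLane` is a hypothetical helper, and `countrZeroNonZero` is a portable loop standing in for `llvm::countr_zero` (C++20's `std::countr_zero` behaves the same for nonzero inputs), so the example compiles without C++20.

```cpp
#include <cassert>

// Counts the low-order zero bits of a nonzero value.
unsigned countrZeroNonZero(unsigned V) {
  assert(V != 0 && "countr_zero(0) is the bit width; not handled here");
  unsigned N = 0;
  while ((V & 1u) == 0) {
    V >>= 1;
    ++N;
  }
  return N;
}

// Precondition: ZMask | (1 << DstIdx) has exactly 3 of the low 4 bits set,
// so XOR with 15 leaves exactly one bit: the index of the remaining lane.
unsigned remainingLane(unsigned ZMask, unsigned DstIdx) {
  unsigned Remaining = (ZMask | (1u << DstIdx)) ^ 15u;
  assert(Remaining != 0 && (Remaining & (Remaining - 1)) == 0);
  return countrZeroNonZero(Remaining);
}
```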
Use deduction guides instead of helper functions.
The only non-automatic changes have been:
1. ArrayRef(some_uint8_pointer, 0) needs to be changed into ArrayRef(some_uint8_pointer, (size_t)0) to avoid an ambiguous call with ArrayRef((uint8_t*), (uint8_t*))
2. CVSymbol sym(makeArrayRef(symStorage)); needed to be rewritten as CVSymbol sym{ArrayRef(symStorage)}; otherwise the compiler is confused and thinks we have a (bad) function prototype. There were a few similar situations across the codebase.
3. ADL doesn't seem to work the same for deduction-guides and functions, so at some point the llvm namespace must be explicitly stated.
4. The "reference mode" of makeArrayRef(ArrayRef<T> &) that acts as no-op is not supported (a constructor cannot achieve that).
Per reviewers' comment, some useless makeArrayRef have been removed in the process.
This is a follow-up to https://reviews.llvm.org/D140896 that introduced
the deduction guides.
Differential Revision: https://reviews.llvm.org/D140955
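A minimal illustration of the idea with a hypothetical `MiniArrayRef` (not LLVM's `ArrayRef`, which has many more constructors): with deduction guides, `MiniArrayRef(x)` deduces the element type the way a `makeArrayRef(x)` helper would. It requires C++17, and the explicit `std::size_t{2}` echoes the `(size_t)` cast from point 1.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>
#include <vector>

template <typename T> class MiniArrayRef {
  const T *Data = nullptr;
  std::size_t Length = 0;

public:
  MiniArrayRef(const T *D, std::size_t N) : Data(D), Length(N) {}
  template <std::size_t N>
  MiniArrayRef(const T (&Arr)[N]) : Data(Arr), Length(N) {}
  MiniArrayRef(const std::vector<T> &V) : Data(V.data()), Length(V.size()) {}
  std::size_t size() const { return Length; }
  const T &operator[](std::size_t I) const { return Data[I]; }
};

// Deduction guides replacing makeArrayRef-style helper overloads.
template <typename T> MiniArrayRef(const T *, std::size_t) -> MiniArrayRef<T>;
template <typename T, std::size_t N>
MiniArrayRef(const T (&)[N]) -> MiniArrayRef<T>;
template <typename T> MiniArrayRef(const std::vector<T> &) -> MiniArrayRef<T>;
```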
With D134950, targets get notified when a virtual register is created and/or
cloned. Targets can act on this through the delegate callback. AMDGPU propagates
the virtual register flags maintained in the target file itself. They are useful
to identify a certain type of machine operands while inserting spill stores and
reloads. Since RegAllocFast spills the physical register itself, there is no way
its virtual register can be mapped back to retrieve the flags. It can be solved
by passing the virtual register as an additional argument. This argument has no
use when the spill interfaces are called from the greedy allocator or the
PrologEpilogInserter, which can pass a null register in such cases.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D138656
Machine combiner supports generic reassociation only of associative and
commutative instructions, for example (A + X) + Y => (X + Y) + A. However, we
can extend this generic support to handle patterns like
(X + A) - Y => (X - Y) + A, where `-` is the inverse of `+`.
This patch adds interface functions to process reassociation patterns of
associative/commutative instructions and their inverse variants with minimal
changes in backends.
Differential Revision: https://reviews.llvm.org/D136754
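A toy version of the rewrite on an expression tree, using hypothetical `Node`/`reassociateAddSub` helpers rather than the new TargetInstrInfo interface functions. It assumes wrap-around 32-bit integer arithmetic (where the identity holds exactly) and one cycle per operation, and shows why the rewrite pays off when `A` is the late operand.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <memory>
#include <utility>

// Op is '+', '-', or 'L' (leaf). Leaves carry a value and the cycle at which
// it becomes available.
struct Node {
  char Op;
  uint32_t Value;
  unsigned Ready;
  std::shared_ptr<Node> Lhs, Rhs;
};
using NodeP = std::shared_ptr<Node>;

NodeP leaf(uint32_t V, unsigned Ready) {
  return std::make_shared<Node>(Node{'L', V, Ready, nullptr, nullptr});
}
NodeP bin(char Op, NodeP A, NodeP B) {
  return std::make_shared<Node>(Node{Op, 0, 0, std::move(A), std::move(B)});
}

// Unsigned arithmetic wraps, so the rewrite preserves the value.
uint32_t eval(const NodeP &N) {
  if (N->Op == 'L')
    return N->Value;
  uint32_t L = eval(N->Lhs), R = eval(N->Rhs);
  return N->Op == '+' ? L + R : L - R;
}

// Cycle at which the node's result is ready.
unsigned depth(const NodeP &N) {
  if (N->Op == 'L')
    return N->Ready;
  return std::max(depth(N->Lhs), depth(N->Rhs)) + 1;
}

// (X + A) - Y  =>  (X - Y) + A, moving the operand A to the last step.
NodeP reassociateAddSub(const NodeP &N) {
  if (N->Op == '-' && N->Lhs->Op == '+') {
    const NodeP &X = N->Lhs->Lhs, &A = N->Lhs->Rhs, &Y = N->Rhs;
    return bin('+', bin('-', X, Y), A);
  }
  return N;
}
```

With `A` ready at cycle 5 and `X`, `Y` at cycle 0, the result is ready at cycle 6 instead of 7, while the value is unchanged.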
This patch mechanically replaces None with std::nullopt where the
compiler would warn if None were deprecated. The intent is to reduce
the amount of manual work required in migrating from Optional to
std::optional.
This is part of an effort to migrate from llvm::Optional to
std::optional:
https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
Before this patch, the code enumerated `getCondFromBranch`, `getCondFromSETCC` and `getCondFromCMov` to get the condition code of a `MachineInstr`, and assigned the result to variable `OldCC` when `MI || IsSwapped || ImmDelta != 0` was satisfied.
After this patch, the `if-else` structure is eliminated by using `getCondFromMI`. Since `OldCC` is only used when `MI || IsSwapped || ImmDelta != 0` is true, it is now initialized directly with `getCondFromMI`, outside the scope of the `if`.
Reviewed By: pengfei
Differential Revision: https://reviews.llvm.org/D138349
Following on from D129634, this patch fixes more X86 CodeGen test
failures with D129213 applied, which adds verification of LiveIntervals
after the TwoAddressInstruction pass runs. These failures only showed up
with LLVM_ENABLE_EXPENSIVE_CHECKS=ON which adds the equivalent of an
implicit -verify-machineinstrs on all tests.
Differential Revision: https://reviews.llvm.org/D136596
Clang may optimize conditional tailcall blocks with the following layout:
cmp <condition>
je tailcall_target
ret
When retpoline is in place, indirect calls are converted into direct calls to a retpoline thunk. When these indirect calls are tail calls, they may be subject to the optimization described above (there is no indirect JCC, but since the jump is now direct it can be made conditional). This layout is non-ideal for the Linux kernel because the branches into thunks may be patched back into indirect branches at runtime depending on the underlying CPU features, which would not be feasible if the binary is emitted with the optimized layout above.
Thus, prevent Clang from emitting this layout if the CodeModel is Kernel.
Feature request from the respective kernel mailing list: https://lore.kernel.org/llvm/Yv3uI%2FMoJVctmBCh@worktop.programming.kicks-ass.net/
Reviewed By: nickdesaulniers, pengfei
Differential Revision: https://reviews.llvm.org/D134915