1529 Commits

Author SHA1 Message Date
Craig Topper
4b28980772 [X86] Simplify the interface to getCondNoFromDesc.
Instead of taking a SkipDefs parameter, rename to getCondSrcNoFromDesc
and have it return the source operand number. Make getCondFromMI
responsible for adding the number of Defs for MI instructions.

While there remove some unneeded casts to unsigned and check for
negative numbers instead of explicitly -1. Less than 0 is easier
for a compiler to codegen.

Differential Revision: https://reviews.llvm.org/D122113
2022-03-20 22:41:39 -07:00
Shengchen Kan
01136c0530 [X86][NFC] Run clang-format on cb26730aaa8b, fix typo and remove redundant else 2022-03-21 12:08:10 +08:00
Shengchen Kan
cb26730aaa [X86][NFC] Unify implementations of getting condition code 2022-03-21 11:31:16 +08:00
Shengchen Kan
51e6059c12 [X86] Simplify function isDataInvariant by using X86MnemonicTables
This is not a NFC change b/c we add more instructions like
IMUL16/32/64r, MOV16ao16 and MOV16rr_REV etc to the list.
But I think it's reasonable.

Reviewed By: Amir

Differential Revision: https://reviews.llvm.org/D122063
2022-03-20 18:38:58 +08:00
Shengchen Kan
076a9dc99a [X86][NFC] Rename hasCMOV() to canUseCMOV(), hasLAHFSAHF() to canUseLAHFSAHF()
To make them less like other feature functions.
This is a follow-up patch for D121978.
2022-03-20 12:00:25 +08:00
Shengchen Kan
920c2e5763 [X86][NFC] Rename target feature hasCMov->hasCMOV
This is a follow-up patch for D121975.
2022-03-18 14:05:52 +08:00
Shengchen Kan
37b378386e [NFC][CodeGen] Rename some functions in MachineInstr.h and remove duplicated comments 2022-03-16 20:25:42 +08:00
Jessica Paquette
6d58f4ab07 [MachineOutliner] NFC: Hide LRU-related stuff behind helper functions
It's not particularly user-friendly to have to call `initLRU` everywhere. Also,
it wasn't particularly great that the LRU for registers used in a sequence was
also initialized by `initLRU`.

This patch hides this stuff behind some helper functions:

* `isAvailableAcrossAndOutOfSeq`
* `isAnyUnavailableAcrossOrOutOfSeq`
* `isAvailableInsideSeq`

This allows the user to avoid calling `initLRU` explicitly. Also, it allows
us to separate initializing the used-in-sequence LRU from the main LRU.

Since both ARM and AArch64 check LR liveness in `insertOutlinedCall`, this
refactor requires that we de-const the Candidate there.

Some other quality-of-code improvements:

* LRUs in outliner::Candidate now have more descriptive names
* Use `Register` instead of `unsigned` in some places
* Improve readability in some places by using ranges rather than `std::for_each`

This is a preparatory commit for a larger compile time related change for the
AArch64 outliner.
2022-02-16 11:39:07 -08:00
Simon Pilgrim
e95fc20f04 [X86] getFMA3OpcodeToCommuteOperands - use unreachable to detect fma3 format mismatch
Matches what we do in getThreeSrcCommuteCase.

Fixes static analyzer out of bounds array access warning.
2022-02-10 17:14:39 +00:00
Matthias Braun
ad25f8a556 X86InstrInfo: Support immediates that are +1/-1 different in optimizeCompareInstr
This is a re-commit of e2c7ee0743592e39274e28dbe0d0c213ba342317 which
was reverted in a2a58d91e82db38fbdf88cc317dcb3753d79d492 and
ea81cea8163a1a0e54df42103ee1c657bbf03791. This includes a fix to
consistently check for EFLAGS being live-out. See phabricator
review.

Original Summary:

This extends `optimizeCompareInstr` to re-use previous comparison
results if the previous comparison was with an immediate that was 1
bigger or smaller. Example:

    CMP x, 13
    ...
    CMP x, 12   ; can be removed if we change the SETg
    SETg ...    ; x > 12  changed to `SETge` (x >= 13) removing CMP

Motivation: This often happens because SelectionDAG canonicalization
tends to add/subtract 1 often when optimizing for fallthrough blocks.
Example for `x > C` the fallthrough optimization switches true/false
blocks with `!(x > C)` --> `x <= C` and canonicalization turns this into
`x < C + 1`.

Differential Revision: https://reviews.llvm.org/D110867
2022-01-11 09:07:29 -08:00
Simon Pilgrim
0a08813cad [X86][MMX] Remove superfluous 'i' from MMX binop opnames. NFCI.
This is a very old copy+paste typo - none of these binops have an immediate operand.

Noticed while trying to merge MMX instructions into some existing SSE instruction scheduler instregex patterns.
2021-12-12 17:59:16 +00:00
Bogdan Graur
ea81cea816 Revert "X86InstrInfo: Support immediates that are +1/-1 different in optimizeCompareInstr"
This reverts commit 847a6807332b13f43704327c2d30103ec0347c77.

The reverted revision was causing miscompiles that manifest on AMD
machines.

Differential Revision: https://reviews.llvm.org/D115528
2021-12-10 23:01:24 +01:00
Kazu Hirata
259cd6f893 [llvm] Use range-based for loops (NFC) 2021-11-25 22:17:10 -08:00
Matt Morehouse
671f0930fe [X86] Selective relocation relaxation for +tagged-globals
For tagged-globals, we only need to disable relaxation for globals that
we actually tag.  With this patch function pointer relocations, which
we do not instrument, can be relaxed.

This patch also makes tagged-globals work properly with LTO, as
-Wa,-mrelax-relocations=no doesn't work with LTO.

Reviewed By: pcc

Differential Revision: https://reviews.llvm.org/D113220
2021-11-19 07:18:27 -08:00
Jay Foad
3264e95938 [CodeGen] Update LiveIntervals in TargetInstrInfo::convertToThreeAddress
Delegate updating of LiveIntervals to each target's
convertToThreeAddress implementation, instead of repairing LiveIntervals
after the fact in TwoAddressInstruction::convertInstTo3Addr.

Differential Revision: https://reviews.llvm.org/D113493
2021-11-17 10:16:47 +00:00
Serge Pavlov
3057e850b8 [X86] Preserve FPSW when popping x87 stack
When compiler converts x87 operations to stack model, it may insert
instructions that pop top stack element. To do it the compiler inserts
instruction FSTP right after the instruction that calculates value on
the stack. It can break the code that uses FPSW set by the last
instruction. For example, an instruction FXAM is usually followed by
FNSTSW, but FSTP is inserted after FXAM. As FSTP leaves condition code
in FPSW undefined, the compiler produces incorrect code.

With this change FSTP in inserted after the FPSW consumer if the last
instruction sets FPSW.

Differential Revision: https://reviews.llvm.org/D113335
2021-11-12 12:00:09 +07:00
Alexander Shaposhnikov
b705e13341 [CodeGen][Outliner] Clean up dead code
Clean up dead code in X86InstrInfo.cpp and AArch64InstrInfo.cpp

Test plan: make check-all

Differential revision: https://reviews.llvm.org/D111151
2021-11-09 21:39:38 +00:00
Kazu Hirata
41ef3187e0 [ARM, X86] Use MachineBasicBlock::{predecessors,successors} (NFC) 2021-11-07 09:53:16 -08:00
Simon Pilgrim
d391e4fe84 [X86] Update RET/LRET instruction to use the same naming convention as IRET (PR36876). NFC
Be more consistent in the naming convention for the various RET instructions to specify in terms of bitwidth.

Helps prevent future scheduler model mismatches like those that were only addressed in D44687.

Differential Revision: https://reviews.llvm.org/D113302
2021-11-07 15:06:54 +00:00
Kazu Hirata
14d656b3d8 [Target] Use llvm::reverse (NFC) 2021-11-06 13:08:21 -07:00
Matthias Braun
847a680733 X86InstrInfo: Support immediates that are +1/-1 different in optimizeCompareInstr
This is a re-commit of e2c7ee0743592e39274e28dbe0d0c213ba342317 which
was reverted in a2a58d91e82db38fbdf88cc317dcb3753d79d492. This includes
a fix to consistently check for EFLAGS being live-out. See phabricator
review.

Original Summary:

This extends `optimizeCompareInstr` to re-use previous comparison
results if the previous comparison was with an immediate that was 1
bigger or smaller. Example:

    CMP x, 13
    ...
    CMP x, 12   ; can be removed if we change the SETg
    SETg ...    ; x > 12  changed to `SETge` (x >= 13) removing CMP

Motivation: This often happens because SelectionDAG canonicalization
tends to add/subtract 1 often when optimizing for fallthrough blocks.
Example for `x > C` the fallthrough optimization switches true/false
blocks with `!(x > C)` --> `x <= C` and canonicalization turns this into
`x < C + 1`.

Differential Revision: https://reviews.llvm.org/D110867
2021-11-03 14:12:23 -07:00
Hans Wennborg
a2a58d91e8 Revert "X86InstrInfo: Support immediates that are +1/-1 different in optimizeCompareInstr"
This casued miscompiles of switches, see comments on the code review.

> This extends `optimizeCompareInstr` to re-use previous comparison
> results if the previous comparison was with an immediate that was 1
> bigger or smaller. Example:
>
>     CMP x, 13
>     ...
>     CMP x, 12   ; can be removed if we change the SETg
>     SETg ...    ; x > 12  changed to `SETge` (x >= 13) removing CMP
>
> Motivation: This often happens because SelectionDAG canonicalization
> tends to add/subtract 1 often when optimizing for fallthrough blocks.
> Example for `x > C` the fallthrough optimization switches true/false
> blocks with `!(x > C)` --> `x <= C` and canonicalization turns this into
> `x < C + 1`.
>
> Differential Revision: https://reviews.llvm.org/D110867

This reverts commit e2c7ee0743592e39274e28dbe0d0c213ba342317.
2021-11-03 17:01:36 +01:00
Matthias Braun
e2c7ee0743 X86InstrInfo: Support immediates that are +1/-1 different in optimizeCompareInstr
This extends `optimizeCompareInstr` to re-use previous comparison
results if the previous comparison was with an immediate that was 1
bigger or smaller. Example:

    CMP x, 13
    ...
    CMP x, 12   ; can be removed if we change the SETg
    SETg ...    ; x > 12  changed to `SETge` (x >= 13) removing CMP

Motivation: This often happens because SelectionDAG canonicalization
tends to add/subtract 1 often when optimizing for fallthrough blocks.
Example for `x > C` the fallthrough optimization switches true/false
blocks with `!(x > C)` --> `x <= C` and canonicalization turns this into
`x < C + 1`.

Differential Revision: https://reviews.llvm.org/D110867
2021-10-28 10:33:56 -07:00
Matthias Braun
97a1570d8c X86InstrInfo: Optimize more combinations of SUB+CMP
`X86InstrInfo::optimizeCompareInstr` would only optimize a `SUB`
followed by a `CMP` in `isRedundantFlagInstr`. This extends the code to
also look for other combinations like `CMP`+`CMP`, `TEST`+`TEST`, `SUB
x,0`+`TEST`.

- Change `isRedundantFlagInstr` to run `analyzeCompareInstr` on the
  candidate instruction and compare the results. This normalizes things
  and gives consistent results for various comparisons (`CMP x, y`,
  `SUB x, y`) and immediate cases (`TEST x, x`, `SUB x, 0`,
  `CMP x, 0`...).
- Turn `isRedundantFlagInstr` into a member function so it can call
  `analyzeCompare`.  - We now also check `isRedundantFlagInstr` even if
  `IsCmpZero` is true, since we now have cases like `TEST`+`TEST`.

Differential Revision: https://reviews.llvm.org/D110865
2021-10-28 10:33:56 -07:00
Kazu Hirata
593451bd3c [X86] Remove getSETOpc (NFC)
This function seems to be unused for at least one year.
2021-10-27 09:22:31 -07:00
Matthias Braun
4b75d674f8 X86InstrInfo: Look across basic blocks in optimizeCompareInstr
This extends `optimizeCompareInstr` to continue the backwards search
when it reached the beginning of a basic block. If there is a single
predecessor block then we can just continue the search in that block and
mark the EFLAGS register as live-in.

Differential Revision: https://reviews.llvm.org/D110862
2021-10-24 16:22:45 -07:00
Matthias Braun
683994c863 X86InstrInfo: Refactor and cleanup optimizeCompareInstr
This changes the first part of `optimizeCompareInstr` being split into a
loop with a forward scan for cases that re-use zero flags from a
producer in case of compare with zero and a backward scan for finding an
instruction equivalent to a compare.

The code now uses a single backward scan searching for the next
instructions that reads or writes EFLAGS.

Also:
- Add comments giving examples for the 3 cases handled.
- Check `MI` which contains the result of the zero-compare cases,
  instead of re-checking `IsCmpZero`.
- Tweak coding style in some loops.
- Add new MIR based tests that test the optimization in isolation.

This also removes a check for flag readers in situations like this:
```
= SUB32rr %0, %1, implicit-def $eflags
...  we no longer stop when there are $eflag users here
CMP32rr %0, %1   ; will be removed
...
```

Differential Revision: https://reviews.llvm.org/D110857
2021-10-24 16:22:45 -07:00
Fangrui Song
c2d4fe51bb [X86] Remove little support we had for MPX
GCC 9.1 removed Intel MPX support. Linux kernel removed MPX in 2019.
glibc 2.35 will remove MPX.

Our support is limited: we support assembling of bndmov but not bnd.
Just remove it.

Reviewed By: pengfei, skan

Differential Revision: https://reviews.llvm.org/D111517
2021-10-12 16:18:51 -07:00
Jay Foad
5b8befdd02 [X86] Special-case ADD of two identical registers in convertToThreeAddress
X86InstrInfo::convertToThreeAddress would convert this:

  %1:gr32 = ADD32rr killed %0:gr32(tied-def 0), %0:gr32, implicit-def dead $eflags

to this:

  undef %2.sub_32bit:gr64 = COPY killed %0:gr32
  undef %3.sub_32bit:gr64_nosp = COPY %0:gr32
  %1:gr32 = LEA64_32r killed %2:gr64, 1, killed %3:gr64_nosp, 0, $noreg

Note that in the ADD32rr, %0 was used twice and the first use had a kill
flag, which is what MachineInstr::addRegisterKilled does.

In the converted code, each use of %0 is copied to a new reg, and the
first COPY inherits the kill flag from the ADD32rr. This causes
machine verification to fail (if you force it to run after
TwoAddressInstructionPass) because the second COPY uses %0 after it is
killed. Note that machine verification is currently disabled after
TwoAddressInstructionPass but this is a step towards being able to
enable it.

Fix this by not inserting more than one COPY from the same source
register.

Differential Revision: https://reviews.llvm.org/D110829
2021-10-07 19:50:27 +01:00
Jay Foad
6cef28ed2d [TII] Remove the MFI argument to convertToThreeAddress. NFC.
This simplifies the API and addresses a FIXME in
TwoAddressInstructionPass::convertInstTo3Addr.

Differential Revision: https://reviews.llvm.org/D110229
2021-09-23 08:58:46 +01:00
Wang, Pengfei
ebec077e07 [X86][FP16] Change the order of the operands in complex FMA intrinsics to allow swap between the mul operands.
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D109658
2021-09-23 11:02:48 +08:00
Nikita Popov
0529e2e018 [InstrInfo] Use 64-bit immediates for analyzeCompare() (NFCI)
The backend generally uses 64-bit immediates (e.g. what
MachineOperand::getImm() returns), so use that for analyzeCompare()
and optimizeCompareInst() as well. This avoids truncation for
targets that support immediates larger 32-bit. In particular, we
can avoid the bugprone value normalization hack in the AArch64
target.

This is a followup to D108076.

Differential Revision: https://reviews.llvm.org/D108875
2021-08-30 19:46:04 +02:00
Wang, Pengfei
c728bd5bba [X86] AVX512FP16 instructions enabling 5/6
Enable FP16 FMA instructions.

Ref.: https://software.intel.com/content/www/us/en/develop/download/intel-avx512-fp16-architecture-specification.html

Reviewed By: LuoYuanke

Differential Revision: https://reviews.llvm.org/D105268
2021-08-24 09:07:19 +08:00
Wang, Pengfei
b088536ce9 [X86] AVX512FP16 instructions enabling 4/6
Enable FP16 unary operator instructions.

Ref.: https://software.intel.com/content/www/us/en/develop/download/intel-avx512-fp16-architecture-specification.html

Reviewed By: LuoYuanke

Differential Revision: https://reviews.llvm.org/D105267
2021-08-22 08:59:35 +08:00
Wang, Pengfei
2379949aad [X86] AVX512FP16 instructions enabling 3/6
Enable FP16 conversion instructions.

Ref.: https://software.intel.com/content/www/us/en/develop/download/intel-avx512-fp16-architecture-specification.html

Reviewed By: LuoYuanke

Differential Revision: https://reviews.llvm.org/D105265
2021-08-18 09:03:41 +08:00
Craig Topper
786b8fcc9b [X86] Add vcmpsh/vcmpph to X86InstrInfo::commuteInstructionImpl.
They were already added to findCommuteOpIndices, but they also
need to be in X86InstrInfo::commuteInstructionImpl in order
to adjust the immediate control.
2021-08-15 11:36:13 -07:00
Wang, Pengfei
f1de9d6dae [X86] AVX512FP16 instructions enabling 2/6
Enable FP16 binary operator instructions.

Ref.: https://software.intel.com/content/www/us/en/develop/download/intel-avx512-fp16-architecture-specification.html

Reviewed By: LuoYuanke

Differential Revision: https://reviews.llvm.org/D105264
2021-08-15 08:56:33 +08:00
Wang, Pengfei
6f7f5b54c8 [X86] AVX512FP16 instructions enabling 1/6
1. Enable FP16 type support and basic declarations used by following patches.
2. Enable new instructions VMOVW and VMOVSH.

Ref.: https://software.intel.com/content/www/us/en/develop/download/intel-avx512-fp16-architecture-specification.html

Reviewed By: LuoYuanke

Differential Revision: https://reviews.llvm.org/D105263
2021-08-10 12:46:01 +08:00
Jeremy Morse
f46321207f [InstrRef][X86] Drop debug instruction numbers from x87 instructions
Avoid a crash when using instruction referencing if x87 floating point
instructions are used. These instructions are significantly mutated when
they're rewritten from referring to registers, to referring to
floating-point-stack positions. As a result, their operands are re-ordered,
and (InstrRef) LiveDebugValues asserts when it sees a DBG_INSTR_REF
referring to a non-reg non-def register operand.

To fix this, drop the instruction numbers, and thus variable locations.
This patch adds a helper utility do do that.

Dropping the variable locations is sub-optimal, but applying DBG_VALUEs to
the $fp0 and similar registers is dropped on emission too. It seems we've
never done well at describing variables that live in x87 registers, at all.

Differential Revision: https://reviews.llvm.org/D105657
2021-07-19 15:08:27 +01:00
Guozhi Wei
5609c8b607 [X86FixupLEAs] Try again to transform the sequence LEA/SUB to SUB/SUB
This patch transforms the sequence
    lea (reg1, reg2), reg3
    sub reg3, reg4
to two sub instructions
    sub reg1, reg4
    sub reg2, reg4

Similar optimization can also be applied to LEA/ADD sequence.

The modifications to TwoAddressInstructionPass is to ensure the operands of ADD
instruction has expected order (the dest register of LEA should be src register
of ADD).

Differential Revision: https://reviews.llvm.org/D104684
2021-07-16 10:16:03 -07:00
Jeremy Morse
30cce54dad [X86] Return src/dest register from stack spill/restore recogniser
LLVM provides target hooks to recognise stack spill and restore
instructions, such as isLoadFromStackSlot, and it also provides post frame
elimination versions such as isLoadFromStackSlotPostFE. These are supposed
to return the store-source and load-destination registers; unfortunately on
X86, the PostFE recognisers just return "1", apparently to signify "yes
it's a spill/load". This patch alters the hooks to correctly return the
store-source and load-destination registers:

This is really useful for debug-info as we it helps follow variable values
as they move on/off the stack. There should be no codegen changes: the only
other users of these PostFE target hooks are MachineInstr::getRestoreSize
and MachineInstr::getSpillSize, which don't attempt to interpret the
returned register location.

While we're here, delete the (InstrRef) LiveDebugValues heuristic that
tries to find the spill source register by looking for a killed reg -- we
should be able to rely on the target hooks for that. This involves
temporarily turning off a n InstrRef LivedDebugValues test on aarch64
(patch to re-enable it is in D104521).

Differential Revision: https://reviews.llvm.org/D105428
2021-07-09 18:12:30 +01:00
Jeremy Morse
63cc251eb9 [DebugInfo][InstrRef][4/4] Support DBG_INSTR_REF through all backend passes
This is a cleanup patch -- we're now able to support all flavours of
variable location in instruction referencing mode. This patch updates
various tests for debug instructions to be broader: numerous code paths
try to ignore debug isntructions, and they now have to ignore the
additional DBG_PHI and DBG_INSTR_REFs that we can generate.

A small amount of rework happens for LiveDebugVariables: as we don't need
to track live intervals through regalloc any more, we can get away with
unlinking debug instructions before regalloc, then re-inserting them after.
Note that this isn't (yet) true of DBG_VALUE_LISTs, they still have to go
through live interval tracking.

In SelectionDAG, add a helper lambda that emits half-formed DBG_INSTR_REFs
for arguments in instr-ref mode, DBG_VALUE otherwise. This is one of the
final locations where DBG_VALUEs are emitted for vreg arguments.

X86InstrInfo now un-sets the debug instr number on SUB instructions that
get mutated into CMP instructions. As the instruction no longer computes a
subtraction, we can't use it for variable locations.

Differential Revision: https://reviews.llvm.org/D88898
2021-07-08 16:42:24 +01:00
Luo, Yuanke
5be314f79b [X86] Check immediate before get it.
For CMP imm instruction, when the operand 1 is symbol address we should
check if it is immediate first. Here is the example code.
`CMP64mi32 $noreg, 8, killed renamable $rcx, @d, $noreg, @a, implicit-def
$eflags`
Many thanks to Craig, Topper for the test case to reproduce this issue.

Differential Revision: https://reviews.llvm.org/D104037
2021-06-13 15:40:52 +08:00
Luo, Yuanke
1e72b9d52f Revert "[X86] Check immediate before get it."
This reverts commit 9eb2f723c24523194b833779d20b027bf89a4f55.
2021-06-13 13:55:38 +08:00
Luo, Yuanke
9eb2f723c2 [X86] Check immediate before get it.
For CMP imm instruction, when the operand 1 is symbol address we should
check if it is immediate first. Here is the example code.
`CMP64mi32 $noreg, 8, killed renamable $rcx, @d, $noreg, @a, implicit-def
$eflags`
Many thanks to Craig, Topper for the test case to reproduce this issue.

Differential Revision: https://reviews.llvm.org/D104037
2021-06-13 09:08:40 +08:00
Florian Hahn
5cd66420cc
Revert "[X86FixupLEAs] Transform the sequence LEA/SUB to SUB/SUB"
This reverts commit 1b748faf2bae246e2fc77d88420df13c2e60f4df because it
breaks building the llvm-test-suite with -verify-machineinstrs on X86:
http://green.lab.llvm.org/green/job/test-suite-verify-machineinstrs-x86_64-O3/9585/

Running llc -verify-machineinstr on X86 crashes on the IR below:

    target datalayout = "e-m:o-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"

    %struct.widget = type { i32, i32, i32, i32, i32*, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, [16 x [16 x i16]], [6 x [32 x i32]], [16 x [16 x i32]], [4 x [12 x [4 x [4 x i32]]]], [16 x i32], i8**, i32*, i32***, i32**, i32, i32, i32, i32, %struct.baz*, %struct.wobble.1*, i32, i32, i32, i32, i32, i32, %struct.quux.2*, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, [3 x i32], i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32***, i32***, i32****, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, [3 x [2 x i32]], [3 x [2 x i32]], i32, i32, i64, i64, %struct.zot.3, %struct.zot.3, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32 }
    %struct.baz = type { i32, i32, i32, i32, i32, i32, i32, i32, i32, %struct.snork*, %struct.wombat.0*, %struct.wobble*, i32, i32*, i32*, i32*, i32, i32*, i32*, i32*, i32 (%struct.widget*, %struct.eggs*)*, i32, i32, i32, i32 }
    %struct.snork = type { %struct.spam*, %struct.zot, i32 (%struct.wombat*, %struct.widget*, %struct.snork*)* }
    %struct.spam = type { i32, i32, i32, i32, i8*, i32 }
    %struct.zot = type { i32, i32, i32, i32, i32, i8*, i32* }
    %struct.wombat = type { i32, i32, i32, i32, i32, i32, i32, i32, void (i32, i32, i32*, i32*)*, void (%struct.wombat*, %struct.widget*, %struct.zot*)* }
    %struct.wombat.0 = type { [4 x [11 x %struct.quux]], [2 x [9 x %struct.quux]], [2 x [10 x %struct.quux]], [2 x [6 x %struct.quux]], [4 x %struct.quux], [4 x %struct.quux], [3 x %struct.quux] }
    %struct.quux = type { i16, i8 }
    %struct.wobble = type { [2 x %struct.quux], [4 x %struct.quux], [3 x [4 x %struct.quux]], [10 x [4 x %struct.quux]], [10 x [15 x %struct.quux]], [10 x [15 x %struct.quux]], [10 x [5 x %struct.quux]], [10 x [5 x %struct.quux]], [10 x [15 x %struct.quux]], [10 x [15 x %struct.quux]] }
    %struct.eggs = type { [1000 x i8], [1000 x i8], [1000 x i8], i32, i32, i32, i32, i32, i32, i32, i32 }
    %struct.wobble.1 = type { i32, [2 x i32], i32, i32, %struct.wobble.1*, %struct.wobble.1*, i32, [2 x [4 x [4 x [2 x i32]]]], i32, i64, i64, i32, i32, [4 x i8], [4 x i8], i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32 }
    %struct.quux.2 = type { i32, i32, i32, i32, i32, %struct.quux.2* }
    %struct.zot.3 = type { i64, i16, i16, i16 }

    define void @blam(%struct.widget* %arg, i32 %arg1) local_unnamed_addr {
    bb:
      %tmp = load i32, i32* undef, align 4
      %tmp2 = sdiv i32 %tmp, 6
      %tmp3 = sdiv i32 undef, 6
      %tmp4 = load i32, i32* undef, align 4
      %tmp5 = icmp eq i32 %tmp4, 4
      %tmp6 = select i1 %tmp5, i32 %tmp3, i32 %tmp2
      %tmp7 = getelementptr inbounds [4 x [4 x i32]], [4 x [4 x i32]]* undef, i64 0, i64 0, i64 0
      %tmp8 = zext i16 undef to i32
      %tmp9 = zext i16 undef to i32
      %tmp10 = load i16, i16* undef, align 2
      %tmp11 = zext i16 %tmp10 to i32
      %tmp12 = zext i16 undef to i32
      %tmp13 = zext i16 undef to i32
      %tmp14 = zext i16 undef to i32
      %tmp15 = load i16, i16* undef, align 2
      %tmp16 = zext i16 %tmp15 to i32
      %tmp17 = zext i16 undef to i32
      %tmp18 = sub nsw i32 %tmp8, %tmp9
      %tmp19 = shl nsw i32 undef, 1
      %tmp20 = add nsw i32 %tmp19, %tmp18
      %tmp21 = sub nsw i32 %tmp11, %tmp12
      %tmp22 = shl nsw i32 undef, 1
      %tmp23 = add nsw i32 %tmp22, %tmp21
      %tmp24 = sub nsw i32 %tmp13, %tmp14
      %tmp25 = shl nsw i32 undef, 1
      %tmp26 = add nsw i32 %tmp25, %tmp24
      %tmp27 = sub nsw i32 %tmp16, %tmp17
      %tmp28 = shl nsw i32 undef, 1
      %tmp29 = add nsw i32 %tmp28, %tmp27
      %tmp30 = sub nsw i32 %tmp20, %tmp29
      %tmp31 = sub nsw i32 %tmp23, %tmp26
      %tmp32 = shl nsw i32 %tmp30, 1
      %tmp33 = add nsw i32 %tmp32, %tmp31
      store i32 %tmp33, i32* undef, align 4
      %tmp34 = mul nsw i32 %tmp31, -2
      %tmp35 = add nsw i32 %tmp34, %tmp30
      store i32 %tmp35, i32* undef, align 4
      %tmp36 = select i1 %tmp5, i32 undef, i32 undef
      br label %bb37

    bb37:                                             ; preds = %bb
      %tmp38 = load i32, i32* undef, align 4
      %tmp39 = ashr i32 %tmp38, %tmp6
      %tmp40 = load i32, i32* undef, align 4
      %tmp41 = sdiv i32 %tmp39, %tmp40
      store i32 %tmp41, i32* undef, align 4
      ret void
    }
2021-06-12 11:41:38 +01:00
Guozhi Wei
1b748faf2b [X86FixupLEAs] Transform the sequence LEA/SUB to SUB/SUB
This patch transforms the sequence

    lea (reg1, reg2), reg3
    sub reg3, reg4

to two sub instructions

    sub reg1, reg4
    sub reg2, reg4

Similar optimization can also be applied to LEA/ADD sequence.
The modifications to TwoAddressInstructionPass is to ensure the operands of ADD
instruction has expected order (the dest register of LEA should be src register of ADD).

Differential Revision: https://reviews.llvm.org/D101970
2021-06-01 10:31:30 -07:00
Simon Pilgrim
707fc2e2f2 Revert rG528bc10e95d5f9d6a338f9bab5e91d7265d1cf05 : "[X86FixupLEAs] Transform the sequence LEA/SUB to SUB/SUB"
Reports on D101970 indicate this is causing failures on multi-stage compiles.
2021-05-19 15:01:20 +01:00
Guozhi Wei
528bc10e95 [X86FixupLEAs] Transform the sequence LEA/SUB to SUB/SUB
This patch transforms the sequence

    lea (reg1, reg2), reg3
    sub reg3, reg4

to two sub instructions

    sub reg1, reg4
    sub reg2, reg4

Similar optimization can also be applied to LEA/ADD sequence.
The modifications to TwoAddressInstructionPass is to ensure the operands of ADD
instruction has expected order (the dest register of LEA should be src register
of ADD).

Differential Revision: https://reviews.llvm.org/D101970
2021-05-18 18:02:36 -07:00
Simon Pilgrim
4956655640 [X86] X86InstrInfo.cpp - try to pass DebugLoc by const-ref to avoid costly TrackingMDNodeRef copies. NFCI. 2021-05-13 13:31:53 +01:00