18483 Commits

Author SHA1 Message Date
Phoebe Wang
fc16ca7120 [X86] Pre-commit test for pr59305 2022-12-08 00:35:42 +08:00
Simon Pilgrim
519adee201 [X86] combine-and.ll - add some 256/512-bit test coverage for D138521 2022-12-07 14:21:24 +00:00
Simon Pilgrim
abe6dbeedd [X86] combine-and.ll - add AVX1 test coverage 2022-12-07 14:21:24 +00:00
Anton Sidorenko
f8ed709345 [MachineCombiner] Extend reassociation logic to handle inverse instructions
Machine combiner supports generic reassociation only of associative and
commutative instructions, for example (A + X) + Y => (X + Y) + A. However, we
can extend this generic support to handle patterns like
(X + A) - Y => (X - Y) + A), where `-` is the inverse of `+`.
This patch adds interface functions to process reassociation patterns of
associative/commutative instructions and their inverse variants with minimal
changes in backends.

Differential Revision: https://reviews.llvm.org/D136754
2022-12-07 13:50:28 +03:00
Rahman Lavaee
6015a045d7 [Propeller] Use Fixed MBB ID instead of volatile MachineBasicBlock::Number.
Let Propeller use specialized IDs for basic blocks, instead of MBB number.

This allows optimizations not just prior to asm-printer, but throughout the entire codegen.
This patch only implements the functionality under the new `LLVM_BB_ADDR_MAP` version, but the old version is still being used. A later patch will change the used version.

####Background
Today Propeller uses machine basic block (MBB) numbers, which already exist, to map native assembly to machine IR.  This is done as follows.
    - Basic block addresses are captured and dumped into the `LLVM_BB_ADDR_MAP` section just before the AsmPrinter pass which writes out object files. This ensures that we have a mapping that is close to assembly.
    - Profiling mapping works by taking a virtual address of an instruction and looking up the `LLVM_BB_ADDR_MAP` section to find the MBB number it corresponds to.
    - While this works well today, we need to do better when we scale Propeller to target other Machine IR optimizations like spill code optimization.  Register allocation happens earlier in the Machine IR pipeline and we need an annotation mechanism that is valid at that point.
    - The current scheme will not work in this scenario because the MBB number of a particular basic block is not fixed and changes over the course of codegen (via renumbering, adding, and removing the basic blocks).
    - In other words, the volatile MBB numbers do not provide a one-to-one correspondence throughout the lifetime of Machine IR.  Profile annotation using MBB numbers is restricted to a fixed point; only valid at the exact point where it was dumped.
    - Further, the object file can only be dumped before AsmPrinter and cannot be dumped at an arbitrary point in the Machine IR pass pipeline.  Hence, MBB numbers are not suitable and we need something else.
####Solution
We propose using fixed unique incremental MBB IDs for basic blocks instead of volatile MBB numbers. These IDs are assigned upon the creation of machine basic blocks. We modify `MachineFunction::CreateMachineBasicBlock` to assign the fixed ID to every newly created basic block.  It assigns `MachineFunction::NextMBBID` to the MBB ID and then increments it, which ensures having unique IDs.

 To ensure correct profile attribution, multiple equivalent compilations must generate the same Propeller IDs. This is guaranteed as long as the MachineFunction passes run in the same order. Since the `NextBBID` variable is scoped to `MachineFunction`, interleaving of codegen for different functions won't cause any inconsistencies.

The new encoding is generated under the new version number 2 and we keep backward-compatibility with older versions.

####Impact on Size of the `LLVM_BB_ADDR_MAP` Section
Emitting the Propeller ID results in a 23% increase in the size of the `LLVM_BB_ADDR_MAP` section for the clang binary.

Reviewed By: tmsriram

Differential Revision: https://reviews.llvm.org/D100808
2022-12-06 22:50:09 -08:00
Sanjay Patel
adc7c589c3 [SDAG] try to convert bit set/clear to signbit test when trunc is free
(X & Pow2MaskC) == 0 --> (trunc X) >= 0
(X & Pow2MaskC) != 0 --> (trunc X) <  0

This was noted as a regression in the post-commit feedback for D112634
(where we canonicalized IR differently).

For x86, this saves a few instruction bytes. AArch64 seems neutral.

Differential Revision: https://reviews.llvm.org/D139363
2022-12-06 11:34:48 -05:00
Sanjay Patel
772c2f461b [AArch64][RISCV][x86] add tests for masked val equality with 0; NFC 2022-12-06 11:34:48 -05:00
Jonas Paulsson
5ecd363295 Reapply "[CodeGen] Add new pass for late cleanup of redundant definitions."
This reverts commit 122efef8ee9be57055d204d52c38700fe933c033.

- Patch fixed to not reuse definitions from predecessors in EH landing pads.
- Late review suggestions (by MaskRay) have been addressed.
- M68k/pipeline.ll test updated.
- Init captures added in processBlock() to avoid capturing structured bindings.
- RISCV has this disabled for now.

Original commit message:

A new pass MachineLateInstrsCleanup is added to be run after PEI.

This is a simple pass that removes redundant and identical instructions
whenever found by scanning the MF once while keeping track of register
definitions in a map. These instructions are typically immediate loads
resulting from rematerialization, and address loads emitted by target in
eliminateFrameInde().

This is enabled by default, but a target could easily disable it by means of
'disablePass(&MachineLateInstrsCleanupID);'.

This late cleanup is naturally not "optimal" in removing instructions as it
is done by looking at phys-regs, but still quite effective. It would be
desirable to improve other parts of CodeGen and avoid these redundant
instructions in the first place, but there are no ideas for this yet.

Differential Revision: https://reviews.llvm.org/D123394

Reviewed By: RKSimon, foad, craig.topper, arsenm, asb
2022-12-05 12:53:50 -06:00
Dmitry Vyukov
dbe8c2c316 Use-after-return sanitizer binary metadata
Currently per-function metadata consists of:
(start-pc, size, features)

This adds a new UAR feature and if it's set an additional element:
(start-pc, size, features, stack-args-size)

Reviewed By: melver

Differential Revision: https://reviews.llvm.org/D136078
2022-12-05 14:40:31 +01:00
Jonas Paulsson
122efef8ee Revert "Reapply "[CodeGen] Add new pass for late cleanup of redundant definitions.""
This reverts commit 17db0de330f943833296ae72e26fa988bba39cb3.

Some more bots got broken - need to investigate.
2022-12-05 00:52:00 +01:00
Jonas Paulsson
17db0de330 Reapply "[CodeGen] Add new pass for late cleanup of redundant definitions."
Init captures added in processBlock() to avoid capturing structured bindings,
which caused the build problems (with clang).

RISCV has this disabled for now until problems relating to post RA pseudo
expansions are resolved.
2022-12-03 14:15:15 -06:00
Matt Arsenault
a74c5707be Fix some test files with executable permissions 2022-12-02 17:12:03 -05:00
Fangrui Song
ca23b7ca47 [AsmPrinter] .addrsig_sym: remove isTransitiveUsedByMetadataOnly
With D135642 ignoring unregistered symbols, isTransitiveUsedByMetadataOnly added
by D101512 is no longer needed (the operation is potentially slow). There is a
`.addrsig_sym` directive for an only-used-by-metadata symbol but it does not
emit an entry.

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D138362
2022-12-02 19:05:43 +00:00
Florian Hahn
63150f4639
Revert "Enhance stack protector for calling no return function"
This reverts commit 416e8c6ad529c57f21f46c6f52ded96d3ed239fb.

This commit causes a test failure with expensive checks due to a DT
verification failure. Revert to bring bot back to green:

https://green.lab.llvm.org/green/job/clang-stage1-cmake-RA-expensive/24249/testReport/junit/LLVM/CodeGen_X86/stack_protector_no_return_ll/

+ /Users/buildslave/jenkins/workspace/clang-stage1-cmake-RA-expensive/clang-build/bin/llc /Users/buildslave/jenkins/workspace/clang-stage1-cmake-RA-expensive/llvm-project/llvm/test/CodeGen/X86/stack-protector-no-return.ll -mtriple=x86_64-unknown-linux-gnu -o -
+ /Users/buildslave/jenkins/workspace/clang-stage1-cmake-RA-expensive/clang-build/bin/FileCheck /Users/buildslave/jenkins/workspace/clang-stage1-cmake-RA-expensive/llvm-project/llvm/test/CodeGen/X86/stack-protector-no-return.ll
DominatorTree is different than a freshly computed one!
	Current:
=============================--------------------------------
Inorder Dominator Tree: DFSNumbers invalid: 0 slow queries.
  [1] %entry {4294967295,4294967295} [0]
    [2] %unreachable {4294967295,4294967295} [1]
    [2] %lpad {4294967295,4294967295} [1]
      [3] %invoke.cont {4294967295,4294967295} [2]
        [4] %invoke.cont2 {4294967295,4294967295} [3]
        [4] %SP_return3 {4294967295,4294967295} [3]
        [4] %CallStackCheckFailBlk2 {4294967295,4294967295} [3]
      [3] %lpad1 {4294967295,4294967295} [2]
        [4] %eh.resume {4294967295,4294967295} [3]
          [5] %SP_return6 {4294967295,4294967295} [4]
          [5] %CallStackCheckFailBlk5 {4294967295,4294967295} [4]
        [4] %terminate.lpad {4294967295,4294967295} [3]
          [5] %SP_return9 {4294967295,4294967295} [4]
          [5] %CallStackCheckFailBlk8 {4294967295,4294967295} [4]
    [2] %SP_return {4294967295,4294967295} [1]
    [2] %CallStackCheckFailBlk {4294967295,4294967295} [1]
Roots: %entry
2022-12-02 12:58:46 +00:00
tentzen
db6a979ae8 Revert "[Windows SEH]: HARDWARE EXCEPTION HANDLING (MSVC -EHa) - Part 2"
This reverts commit 1a949c871ab4a6b6d792849d3e8c0fa6958d27f5.
2022-12-02 02:44:18 -08:00
tentzen
1a949c871a [Windows SEH]: HARDWARE EXCEPTION HANDLING (MSVC -EHa) - Part 2
This patch is the Part-2 (BE LLVM) implementation of HW Exception handling.
Part-1 (FE Clang) was committed in 797ad701522988e212495285dade8efac41a24d4.

This new feature adds the support of Hardware Exception for Microsoft Windows
SEH (Structured Exception Handling).

Compiler options:
  For clang-cl.exe, the option is -EHa, the same as MSVC.
  For clang.exe, the extra option is -fasync-exceptions,
  plus -triple x86_64-windows -fexceptions and -fcxx-exceptions as usual.

NOTE:: Without the -EHa or -fasync-exceptions, this patch is a NO-DIFF change.

The rules for C code:
For C-code, one way (MSVC approach) to achieve SEH -EHa semantic is to follow three rules:
  First, no exception can move in or out of _try region., i.e., no "potential faulty
    instruction can be moved across _try boundary.
  Second, the order of exceptions for instructions 'directly' under a _try must be preserved
    (not applied to those in callees).
  Finally, global states (local/global/heap variables) that can be read outside of _try region
    must be updated in memory (not just in register) before the subsequent exception occurs.

The impact to C++ code:
  Although SEH is a feature for C code, -EHa does have a profound effect on C++
  side. When a C++ function (in the same compilation unit with option -EHa ) is
  called by a SEH C function, a hardware exception occurs in C++ code can also
  be handled properly by an upstream SEH _try-handler or a C++ catch(...).
  As such, when that happens in the middle of an object's life scope, the dtor
  must be invoked the same way as C++ Synchronous Exception during unwinding process.

Design:
A natural way to achieve the rules above in LLVM today is to allow an EH edge
added on memory/computation instruction (previous iload/istore idea) so that
exception path is modeled in Flow graph preciously. However, tracking every
single memory instruction and potential faulty instruction can create many
Invokes, complicate flow graph and possibly result in negative performance
impact for downstream optimization and code generation. Making all
optimizations be aware of the new semantic is also substantial.

This design does not intend to model exception path at instruction level.
Instead, the proposed design tracks and reports EH state at BLOCK-level to
reduce the complexity of flow graph and minimize the performance-impact on CPP
code under -EHa option.
One key element of this design is the ability to compute State number at
block-level. Our algorithm is based on the following rationales:

A _try scope is always a SEME (Single Entry Multiple Exits) region as jumping
into a _try is not allowed. The single entry must start with a seh_try_begin()
invoke with a correct State number that is the initial state of the SEME.
Through control-flow, state number is propagated into all blocks. Side exits
marked by seh_try_end() will unwind to parent state based on existing SEHUnwindMap[].
Note side exits can ONLY jump into parent scopes (lower state number).
Thus, when a block succeeds various states from its predecessors, the lowest
State triumphs others.  If some exits flow to unreachable, propagation on those
paths terminate, not affecting remaining blocks.
For CPP code, object lifetime region is usually a SEME as SEH _try.
However there is one rare exception: jumping into a lifetime that has Dtor but
has no Ctor is warned, but allowed:

Warning: jump bypasses variable with a non-trivial destructor

In that case, the region is actually a MEME (multiple entry multiple exits).
Our solution is to inject a eha_scope_begin() invoke in the side entry block to
ensure a correct State.
Implementation:
Part-1: Clang implementation (already in):
Please see commit 797ad701522988e212495285dade8efac41a24d4).

Part-2 : LLVM implementation described below.

For both C++ & C-code, the state of each block is computed at the same place in
BE (WinEHPreparing pass) where all other EH tables/maps are calculated.
In addition to _scope_begin & _scope_end, the computation of block state also
rely on the existing State tracking code (UnwindMap and InvokeStateMap).

For both C++ & C-code, the state of each block with potential trap instruction
is marked and reported in DAG Instruction Selection pass, the same place where
the state for -EHsc (synchronous exceptions) is done.
If the first instruction in a reported block scope can trap, a Nop is injected
before this instruction. This nop is needed to accommodate LLVM Windows EH
implementation, in which the address in IPToState table is offset by +1.
(note the purpose of that is to ensure the return address of a call is in the
same scope as the call address.

The handler for catch(...) for -EHa must handle HW exception. So it is
'adjective' flag is reset (it cannot be IsStdDotDot (0x40) that only catches
C++ exceptions).
Suppress push/popTerminate() scope (from noexcept/noTHrow) so that HW
exceptions can be passed through.

Original llvm-dev [RFC] discussions can be found in these two threads below:
https://lists.llvm.org/pipermail/llvm-dev/2020-March/140541.html
https://lists.llvm.org/pipermail/llvm-dev/2020-April/141338.html

Differential Revision: https://reviews.llvm.org/D102817/new/
2022-12-01 23:44:25 -08:00
Jonas Paulsson
8ef4632681 Revert "[CodeGen] Add new pass for late cleanup of redundant definitions."
Temporarily revert and fix buildbot failure.

This reverts commit 6d12599fd4134c1da63198c74a25490d28c733f6.
2022-12-01 13:29:24 -05:00
Jonas Paulsson
6d12599fd4 [CodeGen] Add new pass for late cleanup of redundant definitions.
A new pass MachineLateInstrsCleanup is added to be run after PEI.

This is a simple pass that removes redundant and identical instructions
whenever found by scanning the MF once while keeping track of register
definitions in a map. These instructions are typically immediate loads
resulting from rematerialization, and address loads emitted by target in
eliminateFrameInde().

This is enabled by default, but a target could easily disable it by means of
'disablePass(&MachineLateInstrsCleanupID);'.

This late cleanup is naturally not "optimal" in removing instructions as it
is done by looking at phys-regs, but still quite effective. It would be
desirable to improve other parts of CodeGen and avoid these redundant
instructions in the first place, but there are no ideas for this yet.

Differential Revision: https://reviews.llvm.org/D123394

Reviewed By: RKSimon, foad, craig.topper, arsenm, asb
2022-12-01 13:21:35 -05:00
Roman Lebedev
7850ab2112
[NFC] Port an assortment of tests that invoke SROA to new pass manager 2022-12-01 21:17:18 +03:00
Phoebe Wang
54ebf1c4a1 [X86][FP16] Do not combine fminnum/fmaxnum for FP16 emulation
Under the emulation situation, we lack native fmin/fmax instruction support.

Fixes #59258

Reviewed By: skan, spatel

Differential Revision: https://reviews.llvm.org/D139078
2022-12-01 23:24:40 +08:00
Freddy Ye
89f36dd8f3 [X86] Add ExpandLargeFpConvert Pass and enable for X86
As stated in
https://discourse.llvm.org/t/rfc-llc-add-expandlargeintfpconvert-pass-for-fp-int-conversion-of-large-bitint/65528,
this implementation is very similar to ExpandLargeDivRem, which expands
‘fptoui .. to’, ‘fptosi .. to’, ‘uitofp .. to’, ‘sitofp .. to’ instructions
with a bitwidth above a threshold into auto-generated functions. This is
useful for targets like x86_64 that cannot lower fp convertions with more
than 128 bits. The expanded nodes are referring from the IR generated by
`compiler-rt/lib/builtins/floattidf.c`, `compiler-rt/lib/builtins/fixdfti.c`,
and etc.

Corner cases:
1. For fp16: as there is no related builtins added in compliler-rt. So I
mainly utilized the fp32 <-> fp16 lib calls to implement.
2. For fp80: as this pass is soft fp emulation and no fp80 instructions can
help in this problem. I recommend users to deprecate this usage. For now, the
implementation uses fp128 as the temporary conversion type and inserts
fptrunc/ext at top/end of the function.
3. For bf16: as clang FE currently doesn't support bf16 algorithm operations
(convert to int, float, +, -, *, ...), this patch doesn't consider bf16 for
now.
4. For unsigned FPToI: since both default hardware behaviors and libgcc are
ignoring "returns 0 for negative input" spec. This pass follows this old way
to ignore unsigned FPToI. See this example:
https://gcc.godbolt.org/z/bnv3jqW1M

The end-to-end tests are uploaded at https://reviews.llvm.org/D138261

Reviewed By: LuoYuanke, mgehre-amd

Differential Revision: https://reviews.llvm.org/D137241
2022-12-01 13:47:43 +08:00
Xiang1 Zhang
416e8c6ad5 Enhance stack protector for calling no return function
Reviewed By: LuoYuanke, WangPengfei, lebedev.ri

Differential Revision: https://reviews.llvm.org/D138774
2022-12-01 13:20:36 +08:00
Xiang1 Zhang
94c5df8a76 [AMX] Support AMX-FP16 new intrinsic interface
We support AMX-FP16 isa in https://reviews.llvm.org/D135941 now.
The old  intrinsic interface need to manually write tile registers.
So we support its new intrinsic interface to let it be able to do register allocation.

Reviewed By: LuoYuanke

Differential Revision: https://reviews.llvm.org/D138987
2022-12-01 09:47:53 +08:00
Marco Elver
b95646fe70 Revert "Use-after-return sanitizer binary metadata"
This reverts commit d3c851d3fc8b69dda70bf5f999c5b39dc314dd73.

Some bots broke:

- https://luci-milo.appspot.com/ui/p/fuchsia/builders/toolchain.ci/clang-linux-x64/b8796062278266465473/overview
- https://lab.llvm.org/buildbot/#/builders/124/builds/5759/steps/7/logs/stdio
2022-11-30 23:35:50 +01:00
Dmitry Vyukov
d3c851d3fc Use-after-return sanitizer binary metadata
Currently per-function metadata consists of:
(start-pc, size, features)

This adds a new UAR feature and if it's set an additional element:
(start-pc, size, features, stack-args-size)

Reviewed By: melver

Differential Revision: https://reviews.llvm.org/D136078
2022-11-30 14:50:22 +01:00
Sylvain Audi
3f3438a596 [CodeGen][X86] Crash fixes for "patchable-function" pass
This patch fixes crashes related with how PatchableFunction selects the instruction to make patchable:
- Ensure PatchableFunction skips all instructions that don't generate actual machine instructions.
- Handle the case where the first MachineBasicBlock is empty
- Removed support for 16 bit x86 architectures.

Note: another issue remains related with PatchableFunction, in the lowering part.
See https://github.com/llvm/llvm-project/issues/59039

Differential Revision: https://reviews.llvm.org/D137642
2022-11-30 07:29:54 -05:00
Tim Northover
b32280baf9 X86: relax EFLAGS liveness check when generating stack probes.
The probes are all inserted at the iterator passed into the functions, so
that's where any EFLAGS clobbering will happen and where we need it to be dead.

Fixes: https://github.com/llvm/llvm-project/issues/59121
2022-11-30 11:44:39 +00:00
Evgenii Kudriashov
5b1cb15952 [X86] combine-and.ll - add test coverage for scalar broadcast
Reviewed By: RKSimon, pengfei

Differential Revision: https://reviews.llvm.org/D138734
2022-11-30 14:01:08 +08:00
Simon Pilgrim
30eff7f29f [DAG] Attempt to replace a mul node with an existing umul_lohi/smul_lohi node (PR59217)
As discussed on Issue #59217, under certain circumstances the DAG can generate duplicate MUL and MUL_LOHI nodes, often during MULO legalization.

This patch attempts to replace MUL nodes with additional uses of the LO result from the MUL_LOHI node

Differential Revision: https://reviews.llvm.org/D138790
2022-11-29 12:51:30 +00:00
Simon Pilgrim
23b261377e [X86] Add test coverage for Issue #59217 2022-11-27 16:37:17 +00:00
Simon Pilgrim
c757780c62 [X86] lowerShuffleAsDecomposedShuffleMerge - try to match unpck(permute(x),permute(y)) for v4i32/v2i64 shuffles
We're using lowerShuffleAsPermuteAndUnpack, which can probably be improved to handle 256/512-bit types pretty easily.

First step towards trying to address the poor vector-shuffle-sse4a.ll pre-SSSE3 codegen mentioned on D127115
2022-11-25 16:24:56 +00:00
Simon Pilgrim
e95c119c3f [X86] oddshuffles.ll - update check-prefixes
Share AVX common prefix with XOP as well as AVX1/AVX2
2022-11-25 16:13:20 +00:00
Simon Pilgrim
6fd0ae39be [X86] combineScalarAndWithMaskSetcc - handle (concat_vectors (and (vYi1 setcc, vYi1 x), undef)) patterns
If one of the AND operands is a setcc then we're implicitly zeroing the upper mask bits

Similar pattern to regressions identified in D127115 (masked comparisons)
2022-11-25 11:16:24 +00:00
Simon Pilgrim
b883e9f392 [X86] Add test case for (any_extend (bitcast (concat_vectors (and (vYi1 setcc, vYi1 x), undef)))) pattern
Similar pattern to a regression identified in D127115
2022-11-25 11:16:24 +00:00
Simon Pilgrim
dbe2f44316 [X86] combineScalarAndWithMaskSetcc - optionally peek through (oneuse) any_extend node
Extend pass to handle: (and (any_extend (bitcast (vXi1 (concat_vectors (vYi1 setcc), undef,)))), C)

Fixes several regressions identified in D127115
2022-11-24 16:26:35 +00:00
Simon Pilgrim
6f11c395f5 [X86] combine-and.ll - add AVX2/AVX512 test coverage 2022-11-24 10:38:11 +00:00
Phoebe Wang
7218103bca [X86] Use lock add/sub/or/and/xor for cases that we only care about the EFLAGS (negated cases)
This fixes #58685

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D138428
2022-11-23 09:39:04 +08:00
chenglin.bi
cdb7b804f6 [DAGCombiner] fold or (xor x, y),? patterns
or (xor x, y), x --> or x, y
or (xor x, y), y --> or x, y
or (xor x, y), (and x, y) --> or x, y
or (xor x, y), (or x, y) --> or x, y

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D138401
2022-11-23 09:28:10 +08:00
Davide Italiano
0c011335c9 [X86] Don't lower f16->f80 fpext to libcall on darwin.
We don't provide __extendhfxf2, and only have the soft-float
__extendhfsf2 in compiler-rt.  This only changed recently with
655ba9c8a1d2, so this patch reverts back to the previous behavior.

However, the f80->f16 fptrunc is not easily implementable without
the compiler-rt __truncxfhf2, but that has always been true, and
isn't an immediate regression.

Patch by Ahmed Bougacha.

rdar://102194995
2022-11-22 12:32:22 -08:00
Phoebe Wang
b39b76f2ef [X86] Allow no X87 on 32-bit
This patch is an alternative of D100091. It solved the problems in `f80` type lowering.

Reviewed By: LuoYuanke

Differential Revision: https://reviews.llvm.org/D137946
2022-11-22 10:47:47 +08:00
Phoebe Wang
8cce9d7fb3 [X86] Pre-commit tests for pr58685 (negated cases) 2022-11-21 23:35:38 +08:00
chenglin.bi
ac1b999e85 [DAGCombiner] fold or (and x, y), x --> x
Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D138398
2022-11-21 22:11:12 +08:00
Benjamin Kramer
8db98599f1 Mark test using -debug-only as requiring assertions 2022-11-21 11:46:02 +01:00
Anton Sidorenko
fb47bb37e4 [MachineTraceMetrics] Pick the trace successor for an entry block
We generate erroneous trace for a basic block if it does not have at least one
predecessor when MinInstr strategy is used. Currently only this strategy is
implemented, so we always have a wrong trace for any entry block. This results in
wrong instructions heights calculation and also leads to wrong critical path.

The described behavior is demonstrated on a simple test. It shows that early
if-conv pass makes wrong decisions due to incorrectly calculated critical path
lenght.

Differential Revision: https://reviews.llvm.org/D138272
2022-11-21 12:56:40 +03:00
Anton Sidorenko
0c22cdfdd1 [MachineTraceMetrics] Precommit test for D138272 2022-11-21 12:55:40 +03:00
Phoebe Wang
510e5fba16 [X86] Use lock or/and/xor for cases that we only care about the EFLAGS
This is a follow up of D137711 to fix the reset of #58685.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D138294
2022-11-20 10:42:48 +08:00
Simon Pilgrim
88be0a2197 [X86] Ensure we're testing the misched-matrix.ll tests with the generic cpu
Noticed when experimenting with using tuning parameters to control ILP mode
2022-11-19 14:38:53 +00:00
Simon Pilgrim
3ad13ca278 [X86] Regenerate memcpy-2.ll test checks 2022-11-19 14:30:00 +00:00
Phoebe Wang
d558255650 [X86] Use lock add/sub for cases that we only care about the EFLAGS
This fixes #36373, #36905 and partial of #58685.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D137711
2022-11-18 21:43:47 +08:00
Phoebe Wang
055097b12c [X86] Pre-commit tests for pr58685 2022-11-18 20:50:24 +08:00