18046 Commits

Author SHA1 Message Date
Simon Pilgrim
72a049d778 [X86][AVX2] LowerINSERT_VECTOR_ELT - support v4i64 insertion as BLENDI(X, SCALAR_TO_VECTOR(Y)) 2022-06-09 21:18:10 +01:00
Simon Pilgrim
7dbfcfa735 [DAG] combineInsertEltToShuffle - if EXTRACT_VECTOR_ELT fails to match an existing shuffle op, try to replace an undef op if there is one.
This should fix a number of shuffle regressions in D127115 where the re-ordered combines mean we fail to fold a EXTRACT_VECTOR_ELT/INSERT_VECTOR_ELT sequence into a BUILD_VECTOR if we extract from more than one vector source.
2022-06-09 14:56:14 +01:00
Denis Antrushin
e99e821ce8 [FixupStatepoints] Precommit test for D127308. NFC 2022-06-09 17:04:48 +07:00
Simon Pilgrim
d6bb577ffb [X86] Regenerate slow-pmulld.ll with common SSE check prefixes
Add back some unused check prefixes to simplify the D127115 regeneration
2022-06-08 18:23:25 +01:00
Simon Pilgrim
a083f3caa1 [DAG] combineShuffleOfSplatVal - fold shuffle(splat,undef) -> splat, iff the splat contains no UNDEF elements
As noticed on D127115 - we were missing this fold, instead just having the shuffle(shuffle(x,undef,splatmask),undef) fold. We should be able to merge these into one using SelectionDAG::isSplatValue, but we'll need to match the shuffle's undef handling first.

This also exposed an issue in SelectionDAG::isSplatValue which was incorrectly propagating the undef mask across a bitcast (it was trying to just bail with a APInt::isSubsetOf if it found any undefs but that was actually the wrong way around so didn't fire for partial undef cases).
2022-06-07 16:42:24 +01:00
Matt Arsenault
56303223ac llvm-reduce: Don't assert on functions which don't track liveness
Use the query that doesn't assert if TracksLiveness isn't set, which
needs to always be available. We also need to start printing liveins
regardless of TracksLiveness.
2022-06-07 10:00:25 -04:00
Simon Pilgrim
f5507978a3 [X86] getFauxShuffleMask - add VSELECT/BLENDV handling
First step towards enabling shuffle combining starting from VSELECT/BLENDV nodes - this should eventually help improve the codegen reported at Issue #54819
2022-06-07 14:46:25 +01:00
Simon Pilgrim
61984f9199 [X86] x86-interleaved-access.ll - use nounwind to remove cfi noise from tests 2022-06-07 14:46:25 +01:00
Nikita Popov
5a64bc207e [DAGCombiner] Remove overzealous assertion when folding assert+trunc+assert (PR55846)
These assert that there are no "useless" assertzext/assertsext nodes
(that assert a wider width than a following trunc), but I don't think
there is anything preventing such nodes from reaching this code.
I don't think the assertion is relevant for correctness of this
transform either -- if such an assert is present, then the other
one will always be to a smaller width, and we'll pick that one.
The assertion dates back to D37017.

Fixes https://github.com/llvm/llvm-project/issues/55846.

Differential Revision: https://reviews.llvm.org/D126952
2022-06-07 09:50:26 +02:00
Eric Christopher
93cb6b9c83 Revert "[X86] combineConcatVectorOps - add support for concatenation VSELECT/BLENDV nodes"
See the original commit for a testcase.

This reverts commit ea8fb3b6019642a3a032fd65588eb8460439d2f9.
2022-06-03 12:31:11 -07:00
Nikita Popov
ad742cf85d [DAGCombine] Handle promotion of shift with both operands the same
When promoting a shift, make sure we only fetch the second operand
after promoting the first. Load promotion may replace users of the
old load, and we don't want to be left with a dangling reference to
the old load instruction.

The crashing test case is from https://reviews.llvm.org/D126689#3553212.

Differential Revision: https://reviews.llvm.org/D126886
2022-06-03 10:00:44 +02:00
Nikita Popov
41d5033eb1 [IR] Enable opaque pointers by default
This enabled opaque pointers by default in LLVM. The effect of this
is twofold:

* If IR that contains *neither* explicit ptr nor %T* types is passed
  to tools, we will now use opaque pointer mode, unless
  -opaque-pointers=0 has been explicitly passed.
* Users of LLVM as a library will now default to opaque pointers.
  It is possible to opt-out by calling setOpaquePointers(false) on
  LLVMContext.

A cmake option to toggle this default will not be provided. Frontends
or other tools that want to (temporarily) keep using typed pointers
should disable opaque pointers via LLVMContext.

Differential Revision: https://reviews.llvm.org/D126689
2022-06-02 09:40:56 +02:00
Hendrik Greving
a92ed167f2 [ValueTypes] Define MVTs for v128i2/v64i4 as well as i2 and i4.
Adds MVT::v128i2, MVT::v64i4, and implied MVT::i2, MVT::i4.

Keeps MVT::i2, MVT::i4 lowering actions as expand, which should be
removed once targets set this explicitly.

Adjusts 11 lit tests to reflect slightly different behavior during
DAG combine.

Differential Revision: https://reviews.llvm.org/D125247
2022-06-02 00:49:11 +00:00
Hendrik Greving
e9d05cc7d8 Revert "[ValueTypes] Define MVTs for v128i2/v64i4 as well as i2 and i4."
This reverts commit 430ac5c3029c52e391e584c6d4447e6e361fae99.

Due to failures in Clang tests.

Differential Revision: https://reviews.llvm.org/D125247
2022-06-01 13:27:49 -07:00
Hendrik Greving
430ac5c302 [ValueTypes] Define MVTs for v128i2/v64i4 as well as i2 and i4.
Adds MVT::v128i2, MVT::v64i4, and implied MVT::i2, MVT::i4.

Keeps MVT::i2, MVT::i4 lowering actions as `expand`, which should be
removed once targets set this explicitly.

Adjusts 11 lit tests to reflect slightly different behavior during
DAG combine.

Differential Revision: https://reviews.llvm.org/D125247
2022-06-01 12:48:01 -07:00
Denis Antrushin
7047d79fde [TwoAddressInstructionPass] Relax assert in statepoint processing.
D124631 added special processing for STATEPOINT instructions.
It appears that assertion added there is too strong. We can get two
tied operands with the same register tied to different defs. If we
hit such case, do not process it in statepoint-specific code and
delegate it to common case.
2022-06-01 21:34:52 +07:00
Sanjay Patel
3a503a4a9c [x86] fix miscompile from wrongly identified fneg
We may need to peek through a bitcast when identifying an fneg idiom
via its pool constant, but we can't allow a different-sized constant
in that match.

This is noted in issue #55758 with an example that needs fast-math,
but as the test here shows, this has potential to miscompile more
generally (no fast-math required).

Differential Revision: https://reviews.llvm.org/D126775
2022-06-01 09:56:33 -04:00
Sanjay Patel
3c3f2f99c4 [x86] add test for mismatched fneg; NFC
issue #55758
2022-06-01 09:45:33 -04:00
Simon Pilgrim
ea8fb3b601 [X86] combineConcatVectorOps - add support for concatenation VSELECT/BLENDV nodes
If the LHS/RHS selection operands can be cheaply concatenated back together then replace 2 x 128-bit selection nodes with 1 x 256-bit node

Addresses the regression introduced in the bug fix from rGd5af6a38082b39ae520a328e44dc29ebcb036bb2
2022-06-01 10:46:06 +01:00
Fangrui Song
726e2c5be5 [X86][test] Remove unneeded -mtriple from llc RUN lines 2022-05-31 22:35:07 -07:00
Phoebe Wang
a2ea5b496b [X86] Add support for -mharden-sls=[none|all|return|indirect-jmp]
The patch addresses the feature request from https://github.com/ClangBuiltLinux/linux/issues/1633. The implementation borrows a lot from aarch64.

Reviewed By: nickdesaulniers, MaskRay

Differential Revision: https://reviews.llvm.org/D126137
2022-06-01 09:45:04 +08:00
Simon Pilgrim
a7317a5728 [X86] Add test case for PR55648 2022-05-31 17:28:56 +01:00
Simon Pilgrim
d5af6a3808 [X86] LowerMINMAX - split v4i64 types on AVX1 targets (Issue #55648)
Originally we tried to use default expansion for v4i64 types to make it easier to concatenate the results back together, but this can cause infinite loop issues with existing VSELECT splitting code in narrowExtractedVectorSelect if we have other uses of the VSELECT results (e.g. reduction patterns).

To fix the infinite loop, this patch always splits MIN/MAX v4i64 nodes during lowering and I've added a TODO for combineConcatVectorOps to investigate when we can cheaply concatenate VSELECT/BLENDV nodes together.

Fixes #55648 - regression test case will be added in a follow up.
2022-05-31 17:28:56 +01:00
Bjorn Pettersson
86caa03718 Revert "Round up zero-sized symbols to 1 byte in .debug_aranges."
This reverts commit 256a52d9aac8a9e98fbfd6a3d91090bf127cef7d (and
also the follow-up commit 38eb4fe74b3843ab0d7fc1e that moved a test
case to a different directory).

As discussed in https://reviews.llvm.org/D126257 there is a suspicion
that something was wrong with this commit as text section range was
shortened to 1 byte rather than rounded up as shown in the
llvm/test/DebugInfo/X86/dwarf-aranges.ll test case.
2022-05-31 11:03:44 +02:00
Nuno Lopes
80b3dcc045 [Support] Make report_fatal_error respect its GenCrashDiag argument so it doesn't generate a backtrace
There are a few places where we use report_fatal_error when the input is broken.
Currently, this function always crashes LLVM with an abort signal, which
then triggers the backtrace printing code.
I think this is excessive, as wrong input shouldn't give a link to
LLVM's github issue URL and tell users to file a bug report.
We shouldn't print a stack trace either.

This patch changes report_fatal_error so it uses exit() rather than
abort() when its argument GenCrashDiag=false.

Reviewed by: nikic, MaskRay, RKSimon

Differential Revision: https://reviews.llvm.org/D126550
2022-05-30 19:19:23 +01:00
Denis Antrushin
85322e82be [TwoAddressInstructionPass] Special processing of STATEPOINT instruction.
STATEPOINT is a special pseudo instruction which represent Moving GC semantic to LLVM.
Every tied def/use VReg pair in STATEPOINT represent same physical register which can
'magically' change during call wrapped by statepoint.
(By construction, tied use operand  is not live across  STATEPOINT).

This means that when converting into two-address form, there is not need to insert COPY
instruction before stateppoint, what TwoAddressInstruction pass does for 'regular'
instructions.

Reviewed By: MatzeB

Differential Revision: https://reviews.llvm.org/D124631
2022-05-30 19:07:30 +03:00
Edd Barrett
d245974e1a Test stackmap support for floating point types.
It appears that float support is complete, or at least, the stackmap records
emitted are not inconceivable (I must admit that I don't know about many of the
architectures under test here).

One curiosity, the SystemZ tests highlight an undocumented (or maybe incorrect)
quirk of the stackmap format: in the case of a Register record, the Offset or
SmallConstant field can encode a sub-register index! I've only ever seen this
field zero for Register entries up until now.
2022-05-30 10:49:32 +01:00
Luo, Yuanke
aaaf9cede7 [X86][AMX] Replace LDTILECFG with PLDTILECFGV on auto-config.
There is intrinsic `@llvm.x86.ldtilecfg` which is lowered to LDTILECFG.
This intrinsic is open for user to configure tile registers by
themselves. There is a chance that `@llvm.x86.ldtilecfg` would be mixed
with the new AMX intrinsics which depend on compiler to configure tile
registers. Separate pusedo instruction PLDTILECFGV would avoid
unexpected behavious when `@llvm.x86.ldtilecfg` is mixed with new AMX
intrinsics. Though user should not mix the two programming model,
compiler should avoid crash or UB when they are mixed.

Differential Revision: https://reviews.llvm.org/D126519
2022-05-27 16:38:35 +08:00
Rahman Lavaee
08cc058518 Reland "[Propeller] Promote functions with propeller profiles to .text.hot."
This relands commit 4d8d2580c53e130c3c3dd3877384301e3c495554.

The major change here is using 'addUsedIfAvailable<BasicBlockSectionsProfileReader>()` to make sure we don't change the pipeline tests.

Differential Revision: https://reviews.llvm.org/D126518
2022-05-26 19:53:14 -07:00
Luo, Yuanke
535604e177 [X86][AMX] Update test case with automation tool. 2022-05-27 10:35:05 +08:00
Rahman Lavaee
3aa249329f Revert "[Propeller] Promote functions with propeller profiles to .text.hot."
This reverts commit 4d8d2580c53e130c3c3dd3877384301e3c495554.
2022-05-26 18:45:40 -07:00
Rahman Lavaee
4d8d2580c5 [Propeller] Promote functions with propeller profiles to .text.hot.
Today, text section prefixes (none, .unlikely, .hot, and .unkown) are determined based on PGO profile. However, Propeller may deem a function hot when PGO doesn't. Besides, when `-Wl,-keep-text-section-prefix=true` Propeller cannot enforce a global section ordering as the linker can only reorder sections within each output section (.text, .text.hot, .text.unlikely).

This patch promotes all functions with Propeller profiles (functions listed in the basic-block-sections profile) to .text.hot. The feature is hidden behind the flag `--bbsections-guided-section-prefix` which defaults to `true`.

The new implementation refactors the parsing of basic block sections profile into a new `BasicBlockSectionsProfileReader` analysis pass. This allows us to use the information earlier in `CodeGenPrepare` in order to set the functions text prefix. `BasicBlockSectionsProfileReader` will be used both by `BasicBlockSections` pass and `CodeGenPrepare`.

Differential Revision: https://reviews.llvm.org/D122930
2022-05-26 16:23:21 -07:00
Chen Zheng
d79275238f [MachineSink] replace MachineLoop with MachineCycle
reapply 62a9b36fcf728b104ea87e6eb84c0be69b779df7 and fix module build
failue:
1: remove MachineCycleInfoWrapperPass in MachinePassRegistry.def
   MachineCycleInfoWrapperPass is a anylysis pass, should not be there.
2: move the definition for MachineCycleInfoPrinterPass to cpp file.

Otherwise, there are module conflicit for MachineCycleInfoWrapperPass
in MachinePassRegistry.def and MachineCycleAnalysis.h after
62a9b36fcf728b104ea87e6eb84c0be69b779df7.

MachineCycle can handle irreducible loop. Natural loop
analysis (MachineLoop) can not return correct loop depth if
the loop is irreducible loop. And MachineSink is sensitive
to the loop depth, see MachineSinking::isProfitableToSinkTo().

This patch tries to use MachineCycle so that we can handle
irreducible loop better.

Reviewed By: sameerds, MatzeB

Differential Revision: https://reviews.llvm.org/D123995
2022-05-26 06:45:23 -04:00
Simon Pilgrim
b45e046858 [X86] Add non-uniform vector tests for 'one bit diff' comparison fold 2022-05-26 11:13:20 +01:00
David Spickett
38eb4fe74b [llvm][DWARF] Move test using X86 triple into X86 tests
Fixes failure seen when building without X86 backend:
https://lab.llvm.org/buildbot/#/builders/171/builds/15124
2022-05-26 09:27:23 +00:00
Takafumi Arakaki
18e6b8234a Allow pointer types for atomicrmw xchg
This adds support for pointer types for `atomic xchg` and let us write
instructions such as `atomicrmw xchg i64** %0, i64* %1 seq_cst`. This
is similar to the patch for allowing atomicrmw xchg on floating point
types: https://reviews.llvm.org/D52416.

Differential Revision: https://reviews.llvm.org/D124728
2022-05-25 16:20:26 +00:00
Craig Topper
06fee478d2 [X86] Add isSimple check to the load combine in combineExtractVectorElt.
I think we need to be sure the load isn't volatile before we
duplicate and shrink it.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D126353
2022-05-25 09:11:11 -07:00
Sanjay Patel
d5ebba2aa6 [x86] add test with volatile load; NFC
Test for D126353
2022-05-25 08:19:29 -04:00
Chen Zheng
80c4910f3d Revert "[MachineSink] replace MachineLoop with MachineCycle"
This reverts commit 62a9b36fcf728b104ea87e6eb84c0be69b779df7.
Cause build failure on lldb incremental buildbot:
https://green.lab.llvm.org/green/view/LLDB/job/lldb-cmake/43994/changes
2022-05-24 22:43:37 -04:00
Sotiris Apostolakis
67be40df6e Recommit "[SelectOpti][5/5] Optimize select-to-branch transformation"
Use container::size_type directly to avoid type mismatch causing build failures in Windows.

Original commit message:
This patch optimizes the transformation of selects to a branch when the heuristics deemed it profitable.
It aggressively sinks eligible instructions to the newly created true/false blocks to prevent their
execution on the common path and interleaves dependence slices to maximize ILP.

Depends on D120232

Reviewed By: davidxl

Differential Revision: https://reviews.llvm.org/D120233
2022-05-24 14:08:09 -04:00
Serge Pavlov
6fc0bc5b0f Fix behavior of is_fp_class on empty class set
The second argument to is_fp_class specifies the set of floating-point
class to test against. It can be zero, in this case the intrinsic is
expected to return zero value.

Differential Revision: https://reviews.llvm.org/D112025
2022-05-24 21:50:18 +07:00
Simon Pilgrim
11455e4758 [DAG] Unroll vectorized FPOW instructions before widening that will scalarize to libcalls anyway
Followup to D125988 - FPOW is similar to FREM and will most likely scalarize to libcalls, so unroll before widening to prevent use making additional libcalls with UNDEF args.
2022-05-24 15:44:53 +01:00
Nabeel Omer
8b5d9cbbfe [x86][DAG] Unroll vectorized FREMs that will become libcalls
Currently, two element vectors produced as the result of a binary op are
widened to four element vectors on x86 by
DAGTypeLegalizer::WidenVecRes_BinaryCanTrap. If the op still isn't legal
after widening it is unrolled into 4 scalar ops in SelectionDAG before
being converted into a libcall. This way we end up with 4 libcalls (two of
them on known undef elements) instead of the original two libcalls.

This patch modifies DAGTypeLegalizer::WidenVectorResult to ensure that if
it is known that a binary op will be tunred into a libcall, it is unrolled
instead of being widened. This prevents the creation of the extra scalar
instructions on known undef elements and (eventually) libacalls with known
undef parameters which would otherwise be created when the op gets expanded
post widening.

Differential Revision: https://reviews.llvm.org/D125988
2022-05-24 13:34:51 +01:00
Simon Pilgrim
96323c9f4c [X86] scalar_widen_div.ll - remove non-generated CHECKs
These were left over from when we converted to the update_llc_test_checks script and were just matching some asm/cfi directives - we have CHECK-LABEL to do this properly now.
2022-05-24 11:55:51 +01:00
Simon Pilgrim
64186e9b35 [X86] Add test showing failure to expand <2 x float> fpow without widening to <4 x float>
Similar to D125988 (and I have a pending follow up patch to handle fpow).
2022-05-24 10:45:20 +01:00
Luo, Yuanke
496156ac57 [X86][AMX] Multiple configure for AMX register.
The previous solution depends on variable name to record the shape
information. However it is not reliable, because in release build
compiler would not set the variable name. It can be accomplished with an
additional option `fno-discard-value-names`, but it is not acceptable
for users.
This patch is to preconfigure the tile register with machine
instruction. It follow the same way what sigle configure does. In the
future we can fall back to multiple configure when single configure
fails due to the shape dependency issue.
The algorithm to configure the tile register is simple in the patch. We
may improve it in the future. It configure tile register based on basic
block. Compiler would spill the tile register if it live out the basic
block. After the configure there should be no spill across tile
confgiure in the register alloction. Just like fast register allocation
the algorithm walk the instruction in reverse order. When the shape
dependency doesn't meet, it insert ldtilecfg after the last instruction
that define the shape.
In post configuration compiler also walk the basic block to collect the
physical tile register number and generate instruction to fill the stack
slot for the correponding shape information.
TODO: There is some following work in D125602. The risk is modifying the
fast RA may cause regression as fast RA is usded for different targets.
We may create an independent RA for tile register.

Differential Revision: https://reviews.llvm.org/D125075
2022-05-24 13:18:42 +08:00
Chen Zheng
62a9b36fcf [MachineSink] replace MachineLoop with MachineCycle
MachineCycle can handle irreducible loop. Natural loop
analysis (MachineLoop) can not return correct loop depth if
the loop is irreducible loop. And MachineSink is sensitive
to the loop depth, see MachineSinking::isProfitableToSinkTo().

This patch tries to use MachineCycle so that we can handle
irreducible loop better.

Reviewed By: sameerds, MatzeB

Differential Revision: https://reviews.llvm.org/D123995
2022-05-24 01:16:19 -04:00
Sotiris Apostolakis
1786e70bd8 Revert "[SelectOpti][5/5] Optimize select-to-branch transformation"
This reverts commit a111fb960108df910a864500f3b98d75d37f083c.
2022-05-24 00:02:00 -04:00
Sotiris Apostolakis
a111fb9601 [SelectOpti][5/5] Optimize select-to-branch transformation
This patch optimizes the transformation of selects to a branch when the heuristics deemed it profitable.
It aggressively sinks eligible instructions to the newly created true/false blocks to prevent their
execution on the common path and interleaves dependence slices to maximize ILP.

Depends on D120232

Reviewed By: davidxl

Differential Revision: https://reviews.llvm.org/D120233
2022-05-23 23:31:27 -04:00
Sotiris Apostolakis
d7ebb74611 [SelectOpti][4/5] Loop Heuristics
This patch adds the loop-level heuristics for determining whether branches are more profitable than conditional moves.
These heuristics apply to only inner-most loops.

Depends on D120231

Reviewed By: davidxl

Differential Revision: https://reviews.llvm.org/D120232
2022-05-23 22:05:41 -04:00