23116 Commits

Author SHA1 Message Date
Benjamin Kramer
91487b2481 [X86][Disassembler][NFCI] Read bytes with support::endian::read 2023-01-08 18:19:49 +01:00
Benjamin Kramer
b6942a2880 [NFC] Hide implementation details in anonymous namespaces 2023-01-08 17:37:02 +01:00
Alexey Bataev
9b5f62685a [SLP]Fix cost of the broadcast buildvector/gather.
Need to include the cost of the initial insertelement in the cost of the
broadcasts. Also, need to adjust the cost of the gather/buildvector if
the element is inserted into poison/undef vector.

Differential Revision: https://reviews.llvm.org/D140498
2023-01-06 09:25:05 -08:00
Nikita Popov
e3c2faa64a Revert "[X86] Revert -fno-plt __tls_get_addr workaround for old GNU ld"
This reverts commit 2679e8bba3e166e3174971d040b9457ec7b7d768.

This change is a significant backwards-compatibility break, which
does in fact break the entire Rust ecosystem, which uses an
-fno-plt -mrelax-relocations=0 default.

Please go through pre-commit review for this change in order to
gain broader consensus.
2023-01-06 09:43:47 +01:00
serge-sans-paille
38818b60c5 Move from llvm::makeArrayRef to ArrayRef deduction guides - llvm/ part
Use deduction guides instead of helper functions.

The only non-automatic changes have been:

1. ArrayRef(some_uint8_pointer, 0) needs to be changed into ArrayRef(some_uint8_pointer, (size_t)0) to avoid an ambiguous call with ArrayRef((uint8_t*), (uint8_t*))
2. CVSymbol sym(makeArrayRef(symStorage)); needed to be rewritten as CVSymbol sym{ArrayRef(symStorage)}; otherwise the compiler is confused and thinks we have a (bad) function prototype. There were a few similar situations across the codebase.
3. ADL doesn't seem to work the same for deduction-guides and functions, so at some point the llvm namespace must be explicitly stated.
4. The "reference mode" of makeArrayRef(ArrayRef<T> &) that acts as no-op is not supported (a constructor cannot achieve that).

Per reviewers' comments, some useless makeArrayRef calls have been removed in the process.

This is a follow-up to https://reviews.llvm.org/D140896 that introduced
the deduction guides.

Differential Revision: https://reviews.llvm.org/D140955
2023-01-05 14:11:08 +01:00
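
Items 1 and 2 of the ArrayRef commit above can be illustrated with a minimal sketch. It uses a toy `View` class as a stand-in for `llvm::ArrayRef`; the names `View`, `Symbol`, `Storage`, `symbolSize`, and `emptySize` are hypothetical and are not LLVM code. The sketch relies on the implicitly generated deduction guides rather than the explicit guides LLVM added, so it only approximates the real situation.

```cpp
#include <cstddef>

// Toy stand-in for llvm::ArrayRef (hypothetical, for illustration only).
template <typename T> class View {
public:
  constexpr View(const T *Data, std::size_t Length)
      : Data(Data), Length(Length) {}
  constexpr View(const T *Begin, const T *End)
      : Data(Begin), Length(static_cast<std::size_t>(End - Begin)) {}
  template <std::size_t N>
  constexpr View(const T (&Arr)[N]) : Data(Arr), Length(N) {}

  constexpr std::size_t size() const { return Length; }

private:
  const T *Data;
  std::size_t Length;
};

// Hypothetical consumer, playing the role of CVSymbol in the commit message.
struct Symbol {
  View<unsigned char> Bytes;
  constexpr explicit Symbol(View<unsigned char> B) : Bytes(B) {}
};

constexpr unsigned char Storage[4] = {1, 2, 3, 4};

constexpr std::size_t symbolSize() {
  // Item 2: "Symbol Sym(View(Storage));" can be parsed as a function
  // declaration, so braces are used to force an object definition.
  Symbol Sym{View(Storage)};
  return Sym.Bytes.size();
}

constexpr std::size_t emptySize() {
  const unsigned char *Ptr = Storage;
  // Item 1: "View(Ptr, 0)" is ambiguous because the literal 0 converts both
  // to std::size_t and to a null pointer. The cast selects (pointer, length).
  return View(Ptr, (std::size_t)0).size();
}
```

Here `symbolSize()` yields the array length and `emptySize()` yields zero; both are usable in constant expressions.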
Freddy Ye
27b8f54f51 [X86] Support -march=emeraldrapids
Reviewed By: pengfei, skan

Differential Revision: https://reviews.llvm.org/D140950
2023-01-05 20:27:32 +08:00
Roman Lebedev
dbce1110f1 [NFC][DAG] Move getOpcode_EXTEND*() helpers from X86 into SelectionDAG
To be used in an upcoming patch.
2023-01-05 01:12:30 +03:00
Roman Lebedev
e4b260efb2 [Codegen][X86] LowerBUILD_VECTOR(): improve lowering w/ multiple FREEZE-UNDEF ops
While we have great handling for UNDEF operands,
FREEZE-UNDEF operands are effectively normal operands.

We are better off "interleaving" such BUILD_VECTORS into a blend
between a splat of FREEZE-UNDEF, and "thawed" source BUILD_VECTOR,
both of which are more natural for us to handle.

Refs. f738ab9075 (r95017306)
2023-01-04 21:16:11 +03:00
Jay Foad
6f7ff9b933 [MC] Consistently use MCInstrDesc::getImplicitUses and getImplicitDefs. NFC. 2023-01-04 13:16:12 +00:00
Fangrui Song
2679e8bba3 [X86] Revert -fno-plt __tls_get_addr workaround for old GNU ld
ENABLE_X86_RELAX_RELOCATIONS has defaulted to on since 2020.
This workaround has not been exercised for a long time.
2022-12-31 22:39:20 -08:00
Thomas Köppe
82be8a1d2b [X86] Emit RIP-relative access to local function in PIC medium code model
Currently, the medium code model for x86_64 emits position-dependent relocations (R_X86_64_64) for local functions, regardless of PIC or no-PIC mode. (This means generically that code compiled with the medium model cannot be linked into a position-independent executable.)

Example:

```
static int g(int n) {
  return 2 * n + 3;
}

void f(int(**p)(int)) {
  *p = g;
}
```

This results in:

```
Disassembly of section .text:

0000000000000000 <f>:
       0: 48 b8 00 00 00 00 00 00 00 00	movabs	rax, 0x0
       a: 48 89 07                     	mov	qword ptr [rdi], rax
       d: c3                           	ret
```

```
Relocation section '.rela.text' at offset 0xf0 contains 1 entries:
    Offset             Info             Type               Symbol's Value  Symbol's Name + Addend
0000000000000002  0000000200000001 R_X86_64_64            0000000000000000 .text + 10
```

This patch changes the behaviour to unconditionally emit a RIP-relative access, both in PIC and non-PIC mode. This fixes PIC mode, and is perhaps an improvement in non-PIC mode, too, since it results in a shorter instruction. A 32-bit relocation should suffice since the medium memory model demands that all code fit within 2GiB.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D140593
2022-12-28 11:14:39 -08:00
Fangrui Song
69243cdb92 Remove incorrectly implemented -mibt-seal
The option from D116070 does not work as intended and will not be needed when
hidden visibility is used. A function needs ENDBR if it may be reached
indirectly. If we make ThinLTO combine the address-taken property (close to
`!GV.use_empty() && !GV.hasAtLeastLocalUnnamedAddr()`), then the condition can
be expressed with:

`AddressTaken || (!F.hasLocalLinkage() && (VisibleToRegularObj || !F.hasHiddenVisibility()))`

The current `F.hasAddressTaken()` condition does not take into account
address-significance in another bitcode file or ELF relocatable file.

For the Linux kernel, it uses relocatable linking. lld/ELF uses a
conservative approach by setting all `VisibleToRegularObj` to true.
Using the non-relocatable semantics may under-estimate
`VisibleToRegularObj`. As @pcc mentioned on
https://github.com/ClangBuiltLinux/linux/issues/1737#issuecomment-1343414686
, we probably need a symbol list to supply additional
`VisibleToRegularObj` symbols (not part of the relocatable LTO link).

Reviewed By: samitolvanen

Differential Revision: https://reviews.llvm.org/D140363
2022-12-22 12:32:59 -08:00
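
The ENDBR condition quoted in the commit above can be written out as a small predicate. This is only a sketch of the boolean logic: `FunctionFacts` and `needsEndbr` are hypothetical names, and the four flags stand in for the properties the message mentions (`AddressTaken`, `F.hasLocalLinkage()`, `VisibleToRegularObj`, `F.hasHiddenVisibility()`), not for any real LLVM data structure.

```cpp
// Hypothetical summary of the per-function properties named in the message.
struct FunctionFacts {
  bool AddressTaken;        // combined address-taken property
  bool HasLocalLinkage;     // corresponds to F.hasLocalLinkage()
  bool VisibleToRegularObj; // symbol may be referenced from a regular object
  bool HasHiddenVisibility; // corresponds to F.hasHiddenVisibility()
};

// A function needs ENDBR if it may be reached indirectly:
// AddressTaken || (!Local && (VisibleToRegularObj || !Hidden)).
constexpr bool needsEndbr(const FunctionFacts &F) {
  return F.AddressTaken ||
         (!F.HasLocalLinkage &&
          (F.VisibleToRegularObj || !F.HasHiddenVisibility));
}
```

Under this formulation a local function whose address is never taken needs no ENDBR, and neither does a hidden function that is invisible to regular objects, while any function with default visibility and non-local linkage does.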
Evgenii Kudriashov
15dd5ed96c [X86] Support ANDNP combine through vector_shuffle
Combine
```
   and (vector_shuffle<Z,...,Z>
            (insert_vector_elt undef, (xor X, -1), Z), undef), Y
   ->
   andnp (vector_shuffle<Z,...,Z>
              (insert_vector_elt undef, X, Z), undef), Y
```

Reviewed By: RKSimon, pengfei

Differential Revision: https://reviews.llvm.org/D138521
2022-12-22 16:55:14 +08:00
Matt Arsenault
69e75ae695 CodeGen: Don't lazily construct MachineFunctionInfo
This fixes what I consider to be an API flaw I've tripped over
multiple times. The point at which this is constructed isn't well defined, so
depending on where this is first called, you can conclude different
information based on the MachineFunction. For example, the AMDGPU
implementation inspected the MachineFrameInfo on construction for the
stack objects and if the frame has calls. This kind of worked in
SelectionDAG which visited all allocas up front, but broke in
GlobalISel which hasn't visited any of the IR when arguments are
lowered.

I've run into similar problems before with the MIR parser and trying
to make use of other MachineFunction fields, so I think it's best to
just categorically disallow dependency on the MachineFunction state in
the constructor and to always construct this at the same time as the
MachineFunction itself.

A missing feature I still could use is a way to access a custom
analysis pass on the IR here.
2022-12-21 10:49:32 -05:00
Craig Topper
eeb8de9363 [X86] Replace getOperand calls with an existing variable. NFC 2022-12-20 19:27:11 -08:00
Roman Lebedev
1cbcd8ad20 [X86] avx512fp16: add missing instruction selection patterns for "i16" VMOVSH
For all other patterns, we consistently have both I and F variants,
let's not diverge.

Fixes https://github.com/llvm/llvm-project/issues/59628
2022-12-21 05:17:02 +03:00
Nick Desaulniers
be8fd64091 [llvm][X86ISelDAGToDAG] support -{start|stop}-{before|after}=x86-isel
Follow a similar pattern as AMDGPUDAGToDAGISel's constructor so that we
can use INITIALIZE_PASS to register a pass. This allows for
finer-grained testability of SelectionDAGISel.

Link: https://github.com/llvm/llvm-project/issues/59538

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D140323
2022-12-20 14:16:45 -08:00
Nick Desaulniers
ad99774a5f [llvm][PassSupport] don't require passes to be default constructible
Quite a few passes are not default constructible. In order to properly
support -{start|stop}-{before|after}= for these passes, we would like to
continue to use INITIALIZE_PASS, but not necessarily provide a default
constructor.

Delete the default constructors of classes derived from
SelectionDAGISel.

Link: https://github.com/llvm/llvm-project/issues/59538

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D140349
2022-12-20 14:07:29 -08:00
Simon Pilgrim
e16b4f5b16 [X86] Fix SLM uops/resources counts for CMPXCHG instructions
LOCK + CMPXCHG8/CMPXCHG16 variants still need overriding as they are not completely correct - already much better though

Based off llvm-exegesis captures, confirmed with Agner + uops.info
2022-12-20 13:07:03 +00:00
Archibald Elliott
f09cf34d00 [Support] Move TargetParsers to new component
This is a fairly large changeset, but it can be broken into a few
pieces:
- `llvm/Support/*TargetParser*` are all moved from the LLVM Support
  component into a new LLVM Component called "TargetParser". This
  potentially enables using tablegen to maintain this information, as
  is shown in https://reviews.llvm.org/D137517. This cannot currently
  be done, as llvm-tblgen relies on LLVM's Support component.
- This also moves two files from Support which use and depend on
  information in the TargetParser:
  - `llvm/Support/Host.{h,cpp}` which contains functions for inspecting
    the current Host machine for info about it, primarily to support
    getting the host triple, but also for `-mcpu=native` support in e.g.
    Clang. This is fairly tightly intertwined with the information in
    `X86TargetParser.h`, so keeping them in the same component makes
    sense.
  - `llvm/ADT/Triple.h` and `llvm/Support/Triple.cpp`, which contains
    the target triple parser and representation. This is very intertwined
    with the Arm target parser, because the arm architecture version
    appears in canonical triples on arm platforms.
- I moved the relevant unittests to their own directory.

And so, we end up with a single component that has all the information
about the following, which to me seems like a unified component:
- Triples that LLVM Knows about
- Architecture names and CPUs that LLVM knows about
- CPU detection logic for LLVM

Given this, I have also moved `RISCVISAInfo.h` into this component, as
it seems to me to be part of that same set of functionality.

If you get link errors in your components after this patch, you likely
need to add TargetParser into LLVM_LINK_COMPONENTS in CMake.

Differential Revision: https://reviews.llvm.org/D137838
2022-12-20 11:05:50 +00:00
Simon Pilgrim
e5abaf8dec [X86] Fix SLM uops counts for WriteBitTestSetRegRMW instructions
The set/reset/complement RMW variants use +1uop compared to the BT read-only instructions

Based off llvm-exegesis captures, confirmed with Agner + uops.info
2022-12-19 18:21:31 +00:00
Simon Pilgrim
c39c2cc954 [X86] Fix SLM uops counts for AES instructions
Based off llvm-exegesis captures, confirmed with uops.info
2022-12-19 11:03:41 +00:00
Simon Pilgrim
e7bd805805 [X86] Add default LoadUOps argument to Intel models WriteResPair macro
This will make it easier to override the folded uop count on a class-by-class basis
2022-12-19 10:44:48 +00:00
Qiu Chaofan
a40ef656d8 [Intrinsic] Rename flt.rounds intrinsic to get.rounding
Address the inconsistency between FLT_ROUNDS_ and SET_ROUNDING SDAG
node. Rename FLT_ROUNDS_ to GET_ROUNDING and add llvm.get.rounding
intrinsic to replace flt.rounds.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D139507
2022-12-19 15:22:39 +08:00
Sergei Barannikov
4d48ccfc88 [MC] Use MCRegister instead of unsigned in MCTargetAsmParser
Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D140273
2022-12-18 12:12:05 -08:00
Simon Pilgrim
bbf84fcf18 [X86] SandyBridge - fix ADC RMW uop count
These should consistently use the fused domain count, not the unfused domain

Confirmed with Agner + uops.info
2022-12-17 21:52:44 +00:00
Simon Pilgrim
ed37234f9b [X86] Fix BMI uop/throughputs on znver1/znver2
Most BMI ops are 2uop and 0.5 throughput - interestingly TZCNTrm doesn't take an extra uop but the other instructions do

Confirmed by AMD SoG + Agner
2022-12-17 20:38:40 +00:00
Simon Pilgrim
2bc2bcb246 [X86] All the WriteBLS instructions take 2uops, not 1uop
Confirmed by AMD SoG + Agner + uops.info
2022-12-17 15:40:41 +00:00
Simon Pilgrim
2ee17d691f [llvm-exegesis][X86] Use the same AGU counter estimate mapping for znver1 as znver2, and count RMW ops as well
znver2 can use the ld/st dispatch counters to make a reasonable estimate for the AGU usage (although it misses complex LEA ops which I don't think we can fix), although it wasn't accounting for RMW ld-st uops which are counted separately - the same approach can be used for znver1 (ymm double-pumping ld/st agu is correctly measured as 2uops)

This change is mainly academic, but was noticed as the znver1/2 models incorrectly assume scalar RMW ops take 2uops
2022-12-17 14:06:31 +00:00
Ganesh Gopalasubramanian
1f057e365f [X86] AMD Zen 4 Initial enablement
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D139073
2022-12-17 16:15:22 +05:30
Christudasan Devadasan
b5efec4b27 [CodeGen] Additional Register argument to storeRegToStackSlot/loadRegFromStackSlot
With D134950, targets get notified when a virtual register is created and/or
cloned. Targets can do the needful with the delegate callback. AMDGPU propagates
the virtual register flags maintained in the target file itself. They are useful
to identify a certain type of machine operands while inserting spill stores and
reloads. Since RegAllocFast spills the physical register itself, there is no way
its virtual register can be mapped back to retrieve the flags. It can be solved
by passing the virtual register as an additional argument. This argument has no
use when the spill interfaces are called during the greedy allocator or even the
PrologEpilogInserter and can pass a null register in such cases.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D138656
2022-12-17 11:55:34 +05:30
Fangrui Song
21c4dc7997 std::optional::value => operator*/operator->
value() has undesired exception checking semantics and calls
__throw_bad_optional_access in libc++. Moreover, the API is unavailable without
_LIBCPP_NO_EXCEPTIONS on older Mach-O platforms (see
_LIBCPP_AVAILABILITY_BAD_OPTIONAL_ACCESS).

This fixes clang.
2022-12-17 00:42:05 +00:00
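
The difference the commit above relies on is standard `std::optional` behaviour: `value()` checks for emptiness and throws `std::bad_optional_access`, while `operator*` and `operator->` perform no check. A minimal sketch of the replacement pattern follows; the helper name `loadOr` is hypothetical and does not appear in the commit.

```cpp
#include <optional>

// Hypothetical helper: read an optional without going through value().
constexpr int loadOr(const std::optional<int> &Opt, int Fallback) {
  // The explicit test replaces the check value() would perform internally.
  // operator* itself is unchecked and never throws, so it must only be
  // used once the optional is known to hold a value.
  return Opt ? *Opt : Fallback;
}
```

Because no exception path is involved, the helper also works in translation units built with exceptions disabled.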
Craig Topper
c09edce1b3 [SelectionDAG] Give all the target specific subclasses of SelectionDAGISel their own pass ID.
Previously we had a shared ID in SelectionDAGISel. AMDGPU has an
initializePass function for its subclass of SelectionDAGISel. No
other target does.

This causes all target specific SelectionDAGISel passes to be known
as "amdgpu-isel".

I'm not sure what would happen if another target tried to implement
an initializePass function too since the ID is already claimed.

This patch gives all targets their own ID and passes it down to
SelectionDAGISel constructor to MachineFunctionPass's constructor.

Unfortunately, I think this causes most targets to lose
print-before/after-all support for their SelectionDAGISel pass.
And they probably no longer support start/stop-before/after. We
can add initializePass functions to fix this as a follow up. NOTE:
This was probably also broken if the AMDGPU target isn't compiled in.

Step 1 to fixing PR59538.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D140161
2022-12-15 15:48:55 -08:00
Simon Pilgrim
37c3b83bd8 [X86] combineBitcastvxi1 - handle boolmask sign-extension through vselect
See if we can freely sign-extend both sources of a vselect operand, also handle allones constant build vectors (easily rematerializable and used in the test case).

Fixes #59526
2022-12-15 16:40:44 +00:00
Matt Arsenault
c16a58b36c Attributes: Add function getter to parse integer string attributes
The most common case for string attributes parses them as integers. We
don't have a convenient way to do this, and as a result we have
inconsistent missing attribute and invalid attribute handling
scattered around. We also have inconsistent radix usage to
getAsInteger; some places use the default 0 and others use base 10.

Update a few of the uses, but there are quite a lot of these.
2022-12-14 13:12:35 -05:00
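
The idea in the commit above, one getter with a single policy for missing values, invalid values, and radix, can be sketched independently of LLVM. The function name `parseIntegerAttr`, its signature, and the choice of base 10 are assumptions made for this illustration; they are not the API the commit adds. Overflow is deliberately not handled here.

```cpp
#include <cstdint>
#include <optional>
#include <string_view>

// Hypothetical helper: parse a string attribute as a base-10 integer.
// A missing, empty, or malformed value yields Default. Overflow of very
// long digit strings is not detected in this sketch.
constexpr std::uint64_t parseIntegerAttr(std::optional<std::string_view> Value,
                                         std::uint64_t Default) {
  if (!Value || Value->empty())
    return Default;
  std::uint64_t Result = 0;
  for (char C : *Value) {
    if (C < '0' || C > '9')
      return Default; // Fixed radix: prefixes such as "0x" are rejected.
    Result = Result * 10 + static_cast<std::uint64_t>(C - '0');
  }
  return Result;
}
```

Centralizing the policy in one place means every caller treats a string such as `"0x10"` the same way, instead of some call sites auto-detecting the radix and others assuming decimal.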
Simon Pilgrim
463910ab2a [X86] Don't fold scalar_to_vector(i64 C) -> vzext_movl(scalar_to_vector(i32 C))
Fixes constant-folding infinite loop reported by @uabelho on rG5ca77541446d
2022-12-14 12:11:06 +00:00
Simon Pilgrim
4f41ea2016 [X86] lowerShuffleAsVTRUNC - bit shift the offset elements into place instead of shuffle
This helps avoid issues on non-BWI targets which can end up splitting the shuffles to 2 x 256-bit bitshifts of a smaller scalar width
2022-12-14 11:41:14 +00:00
Simon Pilgrim
b3eaf40166 [X86] lowerShuffleAsVTRUNC - improve detection of cheap/free vector concatenation
Handle the case where the lo/hi subvectors are a split load.
2022-12-14 10:49:44 +00:00
Josh Stone
9b8fcd04ef [X86] Fix cmp order in probing BuildStackAlignAND
Due to reversed arguments, the loop start was almost always skipping the
whole loop, since FinalStackProbed is probably less than StackPtr for
large alignments. The intent was to skip the loop if the first sub on
StackPtr made it less than FinalStackProbed already, so flip it.

Reviewed By: serge-sans-paille

Differential Revision: https://reviews.llvm.org/D139756
2022-12-13 12:10:39 -08:00
Roman Lebedev
64d46e141c [NFC][Costmodel][X86] Replication shuffle: AVX512F can promote i1 to i32.
As the added codegen test coverage shows,
there isn't that much difference between AVX512DQI and
baseline AVX512F codegen, DQI added `vpmovm2d`/`vpmovd2m`,
but with just the Foundation we can use `vpternlogd`/`vptestmd`
to do the same.
2022-12-13 21:21:07 +03:00
Roman Lebedev
ff5fcda430 [x86][Costmodel] AVX512VL: add missing costs for v8 i1<->i32 casts
This would come up as a regression in the follow-up Replication-of-i1 patch.

https://godbolt.org/z/fxr9Mzssr
2022-12-13 21:21:07 +03:00
Phoebe Wang
57f71dccd3 [NFC] Fix duplicated Src 2022-12-13 22:44:28 +08:00
Simon Pilgrim
4177e6cd4f [X86] lowerShuffleAsVTRUNC - support offseted truncations
Extend the <0,Scale,2*Scale,..> pattern to allow for a fixed offset <Offset,Offset+Scale,Offset+2*Scale,..> pattern, which will lower to a single additional bitshift/pshufd.

At the moment I've limited this to cases where the LHS/RHS operands are concatenated for free, but this is only to avoid a couple of regressions that should be easily addressable in followups.
2022-12-13 14:00:35 +00:00
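
The mask shape described in the commit above, `<Offset,Offset+Scale,Offset+2*Scale,..>`, can be checked with a few lines of code. This sketch is not the LLVM implementation: `isOffsetStrideMask` is a hypothetical name, and undef mask elements are ignored for simplicity.

```cpp
#include <initializer_list>

// Hypothetical check: does Mask equal
// <Offset, Offset+Scale, Offset+2*Scale, ...>?
// Undef (negative) mask elements are not treated specially in this sketch.
constexpr bool isOffsetStrideMask(std::initializer_list<int> Mask, int Offset,
                                  int Scale) {
  int Expected = Offset;
  for (int M : Mask) {
    if (M != Expected)
      return false;
    Expected += Scale;
  }
  return true;
}
```

With `Offset` equal to 0 this reduces to the original `<0,Scale,2*Scale,..>` truncation pattern; for example, `{1, 3, 5, 7}` matches with `Offset` 1 and `Scale` 2.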
Simon Pilgrim
f6a96bee51 [X86] X86TTIImpl::getIntImmCost - use APInt::isInt/isSignedInt directly
Avoid some getSExtValue()/getZExtValue() calls

Hopefully we can remove some of the getBitWidth() constraints as well, as many are just there as a proxy for legal types (albeit assuming x86_64).
2022-12-12 15:32:49 +00:00
Simon Pilgrim
00a2d6e23d [llvm-exegesis][X86] Add memory pipe counters to SLM model
There might not be any exposed alu pipe counters for us to measure - but the sum of load/store uop counters seems to give a really good approximation to memory controller usage - even for more complex instructions like cmpxchg
2022-12-12 12:09:11 +00:00
Simon Pilgrim
ba237cb268 Revert rG6a0bbb84cef28ed642a730e55c52447b8c870647 "[X86] RDRAND is a Goldmont feature, not Silvermont"
RDRAND is a Silvermont feature - confirmed with CPUID
2022-12-11 18:19:31 +00:00
Roman Lebedev
680b33b66e [X86] AMD Zen 3 sched model: FMA ops have inverse throughput of 0.5
Now that exegesis produces meaningful snippets to measure throughput
for instructions with tied operands:
2ffe225d11
the measurements clearly show these instructions to have
more optimistic throughput.

There's still some noise in the reports, especially around instructions
with memory operands. I'm not sure if we measure those correctly.

Fixes https://github.com/llvm/llvm-project/issues/59325
2022-12-11 21:12:55 +03:00
Simon Pilgrim
6a0bbb84ce [X86] RDRAND is a Goldmont feature, not Silvermont 2022-12-11 12:28:22 +00:00
Simon Pilgrim
b3c7e43d04 [X86] Fix missing HasPRFCHW predicate
This was declared in FeaturePRFCHW but never defined.

Noticed while preparing to add Unsupported features handling to X86 scheduler models.
2022-12-11 11:06:10 +00:00
Simon Pilgrim
d75980f807 [X86] Fix missing HasX86_64 predicate
This was declared in FeatureX86_64 but never defined (we use the *64BitMode predicates for instruction defs - but now we need it for scheduler model defs).

Noticed while preparing to add Unsupported features handling to X86 scheduler models.
2022-12-11 10:27:03 +00:00