Need to include the cost of the initial insertelement to the cost of the
broadcasts. Also, need to adjust the cost of the gather/buildvector if
the element is inserted into poison/undef vector.
Differential Revision: https://reviews.llvm.org/D140498
This reverts commit 2679e8bba3e166e3174971d040b9457ec7b7d768.
This change is a significant backwards-compatibility break, which
does in fact break the entire Rust ecosystem, which uses an
-fno-plt -mrelax-relocations=0 default.
Please go through pre-commit review for this change in order to
gain broader consensus.
Use deduction guides instead of helper functions.
The only non-automatic changes have been:
1. ArrayRef(some_uint8_pointer, 0) needs to be changed into ArrayRef(some_uint8_pointer, (size_t)0) to avoid an ambiguous call with ArrayRef((uint8_t*), (uint8_t*))
2. CVSymbol sym(makeArrayRef(symStorage)); needed to be rewritten as CVSymbol sym{ArrayRef(symStorage)}; otherwise the compiler is confused and thinks we have a (bad) function prototype. There was a few similar situation across the codebase.
3. ADL doesn't seem to work the same for deduction-guides and functions, so at some point the llvm namespace must be explicitly stated.
4. The "reference mode" of makeArrayRef(ArrayRef<T> &) that acts as no-op is not supported (a constructor cannot achieve that).
Per reviewers' comment, some useless makeArrayRef have been removed in the process.
This is a follow-up to https://reviews.llvm.org/D140896 that introduced
the deduction guides.
Differential Revision: https://reviews.llvm.org/D140955
While we have great handling for UNDEF operands,
FREEZE-UNDEF operands are effectively normal operands.
We are better off "interleaving" such BUILD_VECTORS into a blend
between a splat of FREEZE-UNDEF, and "thawed" source BUILD_VECTOR,
both of which are more natural for us to handle.
Refs. f738ab9075 (r95017306)
Currently, the medium code model for x86_64 emits position-dependent relocations (R_X86_64_64) for local functions, regardless of PIC or no-PIC mode. (This means generically that code compiled with the medium model cannot be linked into a position-independent executable.)
Example:
```
static int g(int n) {
return 2 * n + 3;
}
void f(int(**p)(int)) {
*p = g;
}
```
This results in:
```
Disassembly of section .text:
0000000000000000 <f>:
0: 48 b8 00 00 00 00 00 00 00 00 movabs rax, 0x0
a: 48 89 07 mov qword ptr [rdi], rax
d: c3 ret
```
```
Relocation section '.rela.text' at offset 0xf0 contains 1 entries:
Offset Info Type Symbol's Value Symbol's Name + Addend
0000000000000002 0000000200000001 R_X86_64_64 0000000000000000 .text + 10
```
This patch changes the behaviour to unconditionally emit a RIP-relative access, both in PIC and non-PIC mode. This fixes PIC mode, and is perhaps an improvement in non-PIC mode, too, since it results in a shorter instruction. A 32-bit relocation should suffice since the medium memory model demands that all code fit within 2GiB.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D140593
The option from D116070 does not work as intended and will not be needed when
hidden visibility is used. A function needs ENDBR if it may be reached
indirectly. If we make ThinLTO combine the address-taken property (close to
`!GV.use_empty() && !GV.hasAtLeastLocalUnnamedAddr()`), then the condition can
be expressed with:
`AddressTaken || (!F.hasLocalLinkage() && (VisibleToRegularObj || !F.hasHiddenVisibility()))`
The current `F.hasAddressTaken()` condition does not take into acount of
address-significance in another bitcode file or ELF relocatable file.
For the Linux kernel, it uses relocatable linking. lld/ELF uses a
conservative approach by setting all `VisibleToRegularObj` to true.
Using the non-relocatable semantics may under-estimate
`VisibleToRegularObj`. As @pcc mentioned on
https://github.com/ClangBuiltLinux/linux/issues/1737#issuecomment-1343414686
, we probably need a symbol list to supply additional
`VisibleToRegularObj` symbols (not part of the relocatable LTO link).
Reviewed By: samitolvanen
Differential Revision: https://reviews.llvm.org/D140363
This fixes what I consider to be an API flaw I've tripped over
multiple times. The point this is constructed isn't well defined, so
depending on where this is first called, you can conclude different
information based on the MachineFunction. For example, the AMDGPU
implementation inspected the MachineFrameInfo on construction for the
stack objects and if the frame has calls. This kind of worked in
SelectionDAG which visited all allocas up front, but broke in
GlobalISel which hasn't visited any of the IR when arguments are
lowered.
I've run into similar problems before with the MIR parser and trying
to make use of other MachineFunction fields, so I think it's best to
just categorically disallow dependency on the MachineFunction state in
the constructor and to always construct this at the same time as the
MachineFunction itself.
A missing feature I still could use is a way to access an custom
analysis pass on the IR here.
Quite a few passes are not default constructible. In order to properly
support -{start|stop}-{before|after}= for these passes, we would like to
continue to use INITIALIZE_PASS, but not necessarily provide a default
constructor.
Delete the default constructors of classes derived from
SelectionDAGISel.
Link: https://github.com/llvm/llvm-project/issues/59538
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D140349
LOCK + CMPXCHG8/CMPXCHG16 variants still need overriding as they are not completely correct - already much better though
Based off llvm-exegesis captures, confirmed with Agner + uops.info
This is a fairly large changeset, but it can be broken into a few
pieces:
- `llvm/Support/*TargetParser*` are all moved from the LLVM Support
component into a new LLVM Component called "TargetParser". This
potentially enables using tablegen to maintain this information, as
is shown in https://reviews.llvm.org/D137517. This cannot currently
be done, as llvm-tblgen relies on LLVM's Support component.
- This also moves two files from Support which use and depend on
information in the TargetParser:
- `llvm/Support/Host.{h,cpp}` which contains functions for inspecting
the current Host machine for info about it, primarily to support
getting the host triple, but also for `-mcpu=native` support in e.g.
Clang. This is fairly tightly intertwined with the information in
`X86TargetParser.h`, so keeping them in the same component makes
sense.
- `llvm/ADT/Triple.h` and `llvm/Support/Triple.cpp`, which contains
the target triple parser and representation. This is very intertwined
with the Arm target parser, because the arm architecture version
appears in canonical triples on arm platforms.
- I moved the relevant unittests to their own directory.
And so, we end up with a single component that has all the information
about the following, which to me seems like a unified component:
- Triples that LLVM Knows about
- Architecture names and CPUs that LLVM knows about
- CPU detection logic for LLVM
Given this, I have also moved `RISCVISAInfo.h` into this component, as
it seems to me to be part of that same set of functionality.
If you get link errors in your components after this patch, you likely
need to add TargetParser into LLVM_LINK_COMPONENTS in CMake.
Differential Revision: https://reviews.llvm.org/D137838
The set/reset/complement RMW variants use +1uop compared to the BT read-only instructions
Based off llvm-exegesis captures, confirmed with Agner + uops.info
Address the inconsistency between FLT_ROUNDS_ and SET_ROUNDING SDAG
node. Rename FLT_ROUNDS_ to GET_ROUNDING and add llvm.get.rounding
intrinsic to replace flt.rounds.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D139507
znver2 can use the ld/st dispatch counters to make a reasonable estimate for the AGU usage (although it misses complex LEA ops which I don't think we can fix), although it wasn't accounting for RMW ld-st uops which are counted separately - the same approach can be used for znver1 (ymm double-pumping ld/st agu is correctly measured as 2uops)
This change is mainly academic, but was noticed as the znver1/2 models incorrectly assume scalar RMW ops take 2uops
With D134950, targets get notified when a virtual register is created and/or
cloned. Targets can do the needful with the delegate callback. AMDGPU propagates
the virtual register flags maintained in the target file itself. They are useful
to identify a certain type of machine operands while inserting spill stores and
reloads. Since RegAllocFast spills the physical register itself, there is no way
its virtual register can be mapped back to retrieve the flags. It can be solved
by passing the virtual register as an additional argument. This argument has no
use when the spill interfaces are called during the greedy allocator or even the
PrologEpilogInserter and can pass a null register in such cases.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D138656
value() has undesired exception checking semantics and calls
__throw_bad_optional_access in libc++. Moreover, the API is unavailable without
_LIBCPP_NO_EXCEPTIONS on older Mach-O platforms (see
_LIBCPP_AVAILABILITY_BAD_OPTIONAL_ACCESS).
This fixes clang.
Previously we had a shared ID in SelectionDAGISel. AMDGPU has an
initializePass function for its subclass of SelectionDAGISel. No
other target does.
This causes all target specific SelectionDAGISel passes to be known
as "amdgpu-isel".
I'm not sure what would happen if another target tried to implement
an initializePass function too since the ID is already claimed.
This patch gives all targets their own ID and passes it down to
SelectionDAGISel constructor to MachineFunctionPass's constructor.
Unfortunately, I think this causes most targets to lose
print-before/after-all support for their SelectionDAGISel pass.
And they probably no longer support start/stop-before/after. We
can add initializePass functions to fix this as a follow up. NOTE:
This was probably also broken if the AMDGPU target isn't compiled in.
Step 1 to fixing PR59538.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D140161
See if we can freely sign-extend both sources of a vselect operand, also handle allones constant build vectors (easily rematerializable and uses in the test case).
Fixes#59526
The most common case for string attributes parses them as integers. We
don't have a convenient way to do this, and as a result we have
inconsistent missing attribute and invalid attribute handling
scattered around. We also have inconsistent radix usage to
getAsInteger; some places use the default 0 and others use base 10.
Update a few of the uses, but there are quite a lot of these.
Due to reversed arguments, the loop start was almost always skipping the
whole loop, since FinalStackProbed is probably less than StackPtr for
large alignments. The intent was to skip the loop if the first sub on
StackPtr made it less than FinalStackProbed already, so flip it.
Reviewed By: serge-sans-paille
Differential Revision: https://reviews.llvm.org/D139756
As the added codegen test coverage shows,
there isn't that much difference between AVX512DQI and
baseline AVX512F codegen, DQI added `vpmovm2d`/`vpmovd2m`,
but with just the Foundation we can use `vpternlogd`/`vptestmd`
to do the same.
Extend the <0,Scale,2*Scale,..> pattern to allow for a fixed offset <Offset,Offset+Scale,Offset+2*Scale,..> pattern, which will lower to a single additional bitshift/pshufd.
At the moment I've limited this to cases where the LHS/RHS operands are concatenated for free, but this is only to avoid a couple of regressions that should be easily addressable in followups.
Avoid some getSExtValue()/getZExtValue() calls
Hopefully we can remove some of the getBitWidth() constraints as well, as many are just there as a proxy for legal types (albeit assuming x86_64).
There might not be any exposed alu pipe counters for us to measure - but the sum of load/store uop counters seems to give a really good approximation to memory controller usage - even for more complex instructions like cmpxchg
Now that exegesis produces meaningful snippets to measure throughtput
for instructions with tied operands:
2ffe225d11
the measurements clearly show these instructions to have
more optimistic throughtput.
There's still some noise in the reports, especially around instructions
with memory operands. I'm not sure if we measure those correctly.
Fixes https://github.com/llvm/llvm-project/issues/59325
This was declared in FeatureX86_64 but never defined (we use the *64BitMode predicates for instruction defs - but now we need it for scheduler model defs).
Noticed while preparing to add Unsupported features handling to X86 scheduler models.