This partially reverts commit 07ff786e39e2190449998d3af1000454dee501be.
The hunk being reverted in this patch seems to break:
tools/llvm-gsymutil/ARM_AArch64/macho-merged-funcs-dwarf.yaml
under LLVM_ENABLE_EXPENSIVE_CHECKS.
This introduces `@llvm.dx.resource.load.rawbuffer` and generalizes the
buffer load docs under DirectX/DXILResources.
This resolves the "load" parts of #106188
Currently, the more features a version has, the higher its priority is.
We are changing ACLE https://github.com/ARM-software/acle/pull/370 as
follows:
"Among any two versions, the higher priority version is determined by
identifying the highest priority feature that is specified in exactly
one of the versions, and selecting that version."
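A minimal sketch of the quoted rule, assuming a hypothetical encoding in which each feature occupies one bit and a higher bit index means a higher feature priority (the real ACLE priorities and the compiler's data structures are not shown here):

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical encoding: bit i is set when the version specifies the feature
// whose priority is i. A higher bit index means a higher priority.
using FeatureMask = std::uint64_t;

// Returns true if version A has higher priority than version B under the
// quoted rule. The two versions are assumed to differ in at least one feature.
bool hasHigherPriority(FeatureMask A, FeatureMask B) {
  // Features specified in exactly one of the two versions.
  FeatureMask OnlyInOne = A ^ B;
  assert(OnlyInOne != 0 && "versions must differ in at least one feature");

  // Clear the lowest set bit until only the highest one remains.
  FeatureMask Highest = OnlyInOne;
  while (Highest & (Highest - 1))
    Highest &= Highest - 1;

  // The version that specifies that feature wins.
  return (A & Highest) != 0;
}
```

With this encoding, a version specifying only the priority-2 feature (`0x4`) beats a version specifying the priority-0 and priority-1 features (`0x3`), whereas a rule based on the number of features would have picked the latter.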
When importing nested patterns, we create an InsnMatcher for each pattern
and miss them if we consider only the top-level InsnMatcher. Iterate
over PhysRegOperands instead.
Change the type of PhysRegOperands from DenseMap to SmallMapVector to
have stable generation. Also drop PhysRegInputs member from InsnMatcher
as there are no users of it.
All the sources of `llvm-min-tblgen` are also used for `llvm-tblgen`,
with identical compilation flags. Reuse the object files of
`llvm-min-tblgen` for `llvm-tblgen` by applying the usual source
structure of an executable: one file per executable, named after
the executable and containing the (in this case trivial) main function,
which just calls the tblgen_main in TableGen.cpp. This should also clear
up any confusion (including mine) of where each executable's main
function is.
While this slightly reduces build time, the main motivation is ccache.
Using the hard_link option, building the object files for `llvm-tblgen`
will result in a hard link to the same object file already used for
`llvm-min-tblgen`. To signal the build system that the file is new,
ccache will update the file's time stamp. Unfortunately, time stamps are
shared between all hard-linked files, so this will indirectly also update
the time stamps for the object files used for `llvm-min-tblgen`. At the
next run, Ninja will notice that this time stamp differs from the expected
stamp recorded in `.ninja_log` and rebuild those object files for
`llvm-min-tblgen`, which again will also update the stamp for
`llvm-tblgen`, and so on. This is
especially annoying for tablegen because it means Ninja will re-run all
tablegenning in every build.
I am using the hard_link option because it reduces the cost of having
multiple build-trees of the LLVM sources and reduces the wear to the SSD
they are stored on.
This reverts commit f6cb56902c6dcafede21eb6662910b6ff661fc0f.
Buildbot failures such as https://lab.llvm.org/buildbot/#/builders/89/builds/13541:
```
/usr/bin/ld: utils/TableGen/Basic/CMakeFiles/obj.LLVMTableGenBasic.dir/ARMTargetDefEmitter.cpp.o: undefined reference to symbol '_ZN4llvm23EnableABIBreakingChecksE'
/usr/bin/ld: /home/tcwg-buildbot/worker/flang-aarch64-libcxx/build/./lib/libLLVMSupport.so.20.0git: error adding symbols: DSO missing from command line
```
Going to investigate.
All the sources of `llvm-min-tblgen` are also used for `llvm-tblgen`,
with identical compilation flags. Reuse the object files of
`llvm-min-tblgen` for `llvm-tblgen` by applying the usual source
structure of an executable: one file per executable, named after
the executable and containing the (in this case trivial) main function,
which just calls the tblgen_main in TableGen.cpp. This should also clear
up any confusion (including mine) of where each executable's main
function is.
While this slightly reduces build time, the main motivation is ccache.
Using the hard_link option, building the object files for `llvm-tblgen`
will result in a hard link to the same object file already used for
`llvm-min-tblgen`. To signal the build system that the file is new,
ccache will update the file's time stamp. Unfortunately, time stamps are
shared between all hard-linked files, so this will indirectly also update
the time stamps for the object files used for `llvm-min-tblgen`. At the
next run, Ninja will notice that this time stamp differs from the expected
stamp recorded in `.ninja_log` and rebuild those object files for
`llvm-min-tblgen`, which again will also update the stamp for
`llvm-tblgen`, and so on. This is
especially annoying for tablegen because it means Ninja will re-run all
tablegenning in every build.
I am using the hard_link option because it reduces the cost of having
multiple build-trees of the LLVM sources and reduces the wear to the SSD
they are stored on.
Types used in the destination DAG of a pattern should not matter for
GlobalISel. All necessary checks are emitted in the form of matchers
when traversing the source DAG.
In particular, the check prevented importing patterns containing iPTR in
the middle of the destination DAG.
This reduces the number of skipped patterns on Mips and RISCV:
```
Mips 1270 -> 1212 (-58)
RISCV 42165 -> 42088 (-77)
```
Most of these patterns are for atomic operations.
Split importExplicitUseRenderer into several smaller functions and
add a bunch of TODOs and FIXMEs.
This is an NFCI change to simplify review of future functional changes.
Pull Request: https://github.com/llvm/llvm-project/pull/121071
Sub-instruction can have a def with the same name as a def in a
top-level instruction.
Previously this could result in both defs copied to the instruction
being built.
The existing test case is not representative. Even though TableGen
doesn't complain, the code generated from it is invalid and fails
verification with the message "Use not jointly dominated by defs.".
There is no way to magically transform `frameindex` to `tframeindex`
as it happens for some other leaf nodes. `frameindex` can only be
selected by custom C++ code or by using an `SDNodeXForm`.
This patch makes the test representative and fixes the handling of
`G_FRAME_INDEX`, which shouldn't have set the operand's name.
It also fixes the type of the result of `G_FRAME_INDEX` in order to get
the correct type check (`GIM_CheckPointerToAny` instead of
`GIM_CheckType` with a scalar LLT argument).
Add const to `import*Renderer` member functions and recursively to
functions called from them.
I didn't do that for `import*Matcher` functions because they mutate
class variables.
The number of skipped patterns reduces for ARM from 4278 to 4257.
This is the only in-tree target that makes use of OptionalDefOperand.
Pull Request: https://github.com/llvm/llvm-project/pull/120470
The last uses were removed in #120332 and #120426.
When emitting renderers, we shouldn't look at the source DAG at all. The
required information is provided by the destination DAG and by the
instructions referenced in that DAG. Sometimes, we do want to know if a
result was referenced in the source DAG; this can be checked by calling
`RuleMatcher::hasOperand`. Any other use of the source DAG when emitting
renderers is likely an error.
Pull Request: https://github.com/llvm/llvm-project/pull/120445
A dead implicit def wasn't marked as dead if it is also an implicit use.
The new approach should also be more straightforward and simplifies
future changes for supporting optional defs and physical register defs.
Pull Request: https://github.com/llvm/llvm-project/pull/120426
Previously, if the destination DAG has an untyped leaf, we would import
the pattern only if that leaf is defined by the *top-level* source DAG.
This is an unnecessary restriction.
Here is an example of such pattern:
```
def : Pat<(add (mul v8i16:$vA, v8i16:$vB), v8i16:$vC),
(VMLADDUHM $vA, $vB, $vC)>;
```
Previously, it failed to import because `add` defines neither
`$vA` nor `$vB`.
This change reduces the number of skipped patterns as follows:
```
AArch64: 8695 -> 8548 (-147)
AMDGPU: 11333 -> 11240 (-93)
ARM: 4297 -> 4278 (-1)
PowerPC: 3955 -> 3010 (-945)
```
Other GISel-enabled targets are unaffected.
Add some comments that hopefully clarify a few things.
This was supposed to be NFC, but there is a difference in the inferred
register class for EXTRACT_SUBREG.
Pull Request: https://github.com/llvm/llvm-project/pull/120135
Some clients do not want to emit a terminator after each sub-sequence
(they have other means of determining the length of sub-sequences).
This moves the `Term` argument from the `emit` method to the constructor
and makes it optional. It couldn't be made optional while still on the
`emit` method because, if the terminator isn't specified, that has to be
taken into account in the `layout` method as well.
The fact that `layout` method was called is now recorded in a dedicated
member variable, `IsLaidOut`. `Entries != 0` can no longer be used to
reliably check if `layout` method was called because it may be zero for
a different reason: the terminator wasn't specified and all added
sequences (if any) were empty.
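A simplified sketch of why a dedicated flag is needed. `SimpleSeqTable` is a hypothetical stand-in written for illustration: it only concatenates sequences and does not share suffixes as the real table does.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified sequence table. The terminator is fixed at
// construction time and may be absent.
class SimpleSeqTable {
  std::vector<std::string> Seqs;
  std::vector<std::size_t> Offsets;
  bool HasTerm = false;
  char Term = '\0';
  std::size_t Entries = 0; // total number of emitted entries
  bool IsLaidOut = false;  // set by layout(); Entries alone cannot tell

public:
  SimpleSeqTable() = default;
  explicit SimpleSeqTable(char Term) : HasTerm(true), Term(Term) {}

  // Registers a sequence and returns its index.
  std::size_t add(const std::string &S) {
    assert(!IsLaidOut && "cannot add sequences after layout");
    Seqs.push_back(S);
    return Seqs.size() - 1;
  }

  // Assigns an offset to every sequence. The terminator, if any, takes one
  // extra entry per sequence, so layout has to know about it.
  void layout() {
    assert(!IsLaidOut && "layout called twice");
    for (const std::string &S : Seqs) {
      Offsets.push_back(Entries);
      Entries += S.size() + (HasTerm ? 1 : 0);
    }
    IsLaidOut = true;
  }

  bool isLaidOut() const { return IsLaidOut; }

  std::size_t size() const {
    assert(IsLaidOut && "call layout() first");
    return Entries;
  }

  std::size_t offsetOf(std::size_t Idx) const {
    assert(IsLaidOut && "call layout() first");
    return Offsets[Idx];
  }
};
```

A table built without a terminator that holds only an empty sequence has `size() == 0` after `layout()`, which is exactly the case where checking `Entries != 0` would wrongly report that `layout` was never called.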
This reduces the size of `*LaneMaskLists` and `*SubRegIdxLists` a bit
and resolves the removed TODO.
Apologies for the large change, I looked for ways to break this up and
all of the ones I saw added real complexity. This change focuses on the
option's prefixed names and the array of prefixes. These are present in
every option and the dominant source of dynamic relocations for PIE or
PIC users of LLVM and Clang tooling. In some cases there are 100s or 1000s of
them, e.g. for the Clang driver, which has a huge number of options.
This PR addresses this by building a string table and a prefixes table
that can be referenced with indices rather than pointers that require
dynamic relocations. This removes almost 7k dynamic relocations from the
`clang` binary, roughly 8% of the remaining dynamic relocations outside
of vtables. For busy-boxing use cases where many different option tables
are linked into the same binary, the savings add up a bit more.
The string table is a straightforward mechanism, but the prefixes
required some subtlety. They are encoded in a Pascal-string fashion with
a size followed by a sequence of offsets. This works relatively well for
the small realistic prefixes arrays in use.
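A rough sketch of the encoding described above. All names (`StrTable`, `PrefixesTable`, `OptionInfo`) and the table contents are made up for illustration and are not the option library's actual API:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// All strings live in one NUL-separated blob, so an option can refer to a
// string by offset instead of by pointer (no dynamic relocation needed).
static const char StrTable[] =
    "\0"      // offset 0: ""
    "-\0"     // offset 1: "-"
    "--\0"    // offset 3: "--"
    "help\0"; // offset 6: "help"

// Pascal-string style: each entry is a count followed by that many offsets
// into StrTable.
static const unsigned PrefixesTable[] = {
    0,       // index 0: no prefixes
    2, 1, 3, // index 1: two prefixes, "-" and "--"
    1, 3,    // index 4: one prefix, "--"
};

// Hypothetical per-option record holding only indices.
struct OptionInfo {
  unsigned PrefixesOffset; // index into PrefixesTable
  unsigned NameOffset;     // offset into StrTable
};

std::string nameOf(const OptionInfo &O) {
  assert(O.NameOffset < sizeof(StrTable) && "name offset out of range");
  return &StrTable[O.NameOffset];
}

std::vector<std::string> prefixesOf(const OptionInfo &O) {
  const std::size_t NumEntries =
      sizeof(PrefixesTable) / sizeof(PrefixesTable[0]);
  assert(O.PrefixesOffset < NumEntries && "prefixes offset out of range");
  unsigned Count = PrefixesTable[O.PrefixesOffset];
  std::vector<std::string> Result;
  for (unsigned I = 1; I <= Count; ++I)
    Result.push_back(&StrTable[PrefixesTable[O.PrefixesOffset + I]]);
  return Result;
}
```

For example, an `OptionInfo` of `{1, 6}` describes an option named `help` that accepts the prefixes `-` and `--`.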
Lots of code has to change in order to land this though: all the
option library code has to be updated to use the string table and
prefixes table, and all the users of the options library have to be
updated to correctly instantiate the objects.
Some follow-up patches are in the works to provide an abstraction for this
style of code, and to start using the same technique for some of the
other strings here now that the infrastructure is in place.
These properties are only valid on ComplexPatterns. Having them as flags
is more convenient because one can now use "let = ... in" syntax to set
these flags on several patterns at a time. This is also less error-prone
as it makes it impossible to specify these properties on records derived
from SDPatternOperator.
Pull Request: https://github.com/llvm/llvm-project/pull/119599
This avoids the need to dynamically relocate each pointer in the table.
To make this work, this PR also moves the binary search of intrinsic
names to an internal function with an adjusted signature, and switches
the unittesting to test against actual intrinsics.
Previously the check comments indicated that [pi][0-9]+ would match as a
type suffix, however the check itself was looking for [pi][0-9]* and
hence an 'i' suffix in isolation was being considered as a type suffix
despite it not having a bitwidth.
This change makes the check consistent with the comment and looks for
[pi][0-9]+
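The difference between the two patterns can be illustrated with a small hand-written check. `isTypeSuffix` is a hypothetical helper for illustration and not the function changed by this patch:

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Returns true if Suffix matches [pi][0-9]+, i.e. a 'p' or 'i' followed by
// at least one digit. A lone "i" or "p" has no bitwidth and is rejected.
bool isTypeSuffix(const std::string &Suffix) {
  if (Suffix.size() < 2)
    return false;
  if (Suffix[0] != 'p' && Suffix[0] != 'i')
    return false;
  for (std::size_t I = 1; I < Suffix.size(); ++I)
    if (!std::isdigit(static_cast<unsigned char>(Suffix[I])))
      return false;
  return true;
}
```

Under this check `i32` and `p0` count as type suffixes, while a bare `i` does not, which is the case the old `[pi][0-9]*` check accepted.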
fixes #112974
partially fixes #70103
An earlier version of this change was reverted so some issues could be fixed.
### Changes
- Added new tablegen based way of lowering dx intrinsics to DXIL ops.
- Added int_dx_group_memory_barrier_with_group_sync intrinsic in
IntrinsicsDirectX.td
- Added expansion for int_dx_group_memory_barrier_with_group_sync in
DXILIntrinsicExpansion.cpp
- Added DXIL backend test case
### Related PRs
* [[clang][HLSL] Add GroupMemoryBarrierWithGroupSync intrinsic
#111883](https://github.com/llvm/llvm-project/pull/111883)
* [[SPIRV] Add GroupMemoryBarrierWithGroupSync intrinsic
#111888](https://github.com/llvm/llvm-project/pull/111888)
https://github.com/llvm/llvm-project/commit/b71704436e61
rewrote the register operand handling, but the Table only contains
physical regs, so we will SEGV when there are non-physical regs.
---------
Co-authored-by: Sergei Barannikov <barannikov88@gmail.com>
Implement isSubclass with direct lookup into some tables instead of
nested switches.
Part of the motivation for this is improving compile time when clang-18
is used as a host compiler, since it seems to have trouble with very
large switch statements.
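A minimal sketch of the table-lookup idea, assuming a made-up set of three match classes. The enumerator names and the table layout are hypothetical and the generated code may be organized differently:

```cpp
#include <cassert>

// Hypothetical match classes; the real list is generated by TableGen.
enum MatchClassKind { MCK_RegA, MCK_RegB, MCK_RegC, NumMatchClassKinds };

// IsSubclassTable[A][B] is true when class A is a subclass of class B.
// Every class is treated as a subclass of itself in this sketch.
static const bool IsSubclassTable[NumMatchClassKinds][NumMatchClassKinds] = {
    /* MCK_RegA */ {true, true, false},
    /* MCK_RegB */ {false, true, false},
    /* MCK_RegC */ {false, false, true},
};

// One indexed load replaces a switch on A containing a switch on B.
bool isSubclass(MatchClassKind A, MatchClassKind B) {
  assert(A < NumMatchClassKinds && B < NumMatchClassKinds &&
         "invalid match class");
  return IsSubclassTable[A][B];
}
```

The host compiler then only has to compile a constant array and a single lookup rather than optimize a very large nested switch.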
Implement the register operand handling in validateOperandClass with a
table lookup instead of a potentially huge switch.
Part of the motivation for this is improving compile time when clang-18
is used as a host compiler, since it seems to have trouble with very
large switch statements.
The issue with slow compile-time was caused by an assert in
AArch64RegisterInfo.cpp. The assert invokes 'checkAllSuperRegsMarked'
after adding all the reserved registers. This call gets very expensive
after adding the _HI registers due to the way the function searches
in the 'Exception' list, which is expected to be a small list but isn't
(the patch added 190 _HI regs).
It was possible to rewrite the code in such a way that the _HI registers
are marked as reserved after the check. This makes the problem go away
entirely and restores compile-time to what it was before (tested for
`check-runtimes`, which previously showed a ~5x slowdown).
This reverts commits:
1434d2ab215e3ea9c5f34689d056edd3d4423a78
2704647fb7986673b89cef1def729e3b022e2607
This generated comments like:
// 'BoolReg' class
case MCK_BoolReg: {
which seem redundant because the name is always repeated on the next
line as part of the MCK_ enumerator.
This reverts commit b36fcf4f493ad9d30455e178076d91be99f3a7d8.
This reverts commit c11b6b1b8af7454b35eef342162dc2cddf54b4de.
This reverts commit 775148f2367600f90d28684549865ee9ea2f11be.
Reason: multiple bot build breakages, e.g. https://lab.llvm.org/buildbot/#/builders/3/builds/8076