Allow the function and argument attribute fields of
IntrinsicsToAttributesMap to have adaptive sizes based on the input,
rather than the hardcoded 8 bits/8 bits.
This will ease the pressure when adding new intrinsics in private
downstreams.
The function attribute bit size will become 7 (127/128) vs. 8 (255/256).
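As a rough illustration of adaptive sizing, the sketch below derives the bit width of each field from the largest ID in use and packs both IDs into one entry. The names `bitsNeeded`, `PackedAttrLayout`, and `makeLayout` are hypothetical, and the 16-bit entry width is an assumption; the actual emitter may lay things out differently.

```cpp
#include <cassert>
#include <cstdint>

// Smallest number of bits that can represent every value in [0, MaxValue].
constexpr unsigned bitsNeeded(uint32_t MaxValue) {
  unsigned Bits = 1;
  while ((MaxValue >> Bits) != 0)
    ++Bits;
  return Bits;
}

// Hypothetical layout: function-attribute ID in the low bits,
// argument-attribute ID in the bits above it.
struct PackedAttrLayout {
  unsigned FuncBits;
  unsigned ArgBits;

  constexpr uint16_t pack(uint32_t FuncID, uint32_t ArgID) const {
    return static_cast<uint16_t>(FuncID | (ArgID << FuncBits));
  }
  constexpr uint32_t funcID(uint16_t Entry) const {
    return Entry & ((1u << FuncBits) - 1);
  }
  constexpr uint32_t argID(uint16_t Entry) const {
    return (Entry >> FuncBits) & ((1u << ArgBits) - 1);
  }
};

// Derive the split from the largest IDs actually used, instead of
// hardcoding 8 bits for each field.
constexpr PackedAttrLayout makeLayout(uint32_t MaxFuncID, uint32_t MaxArgID) {
  assert(bitsNeeded(MaxFuncID) + bitsNeeded(MaxArgID) <= 16 &&
         "IDs do not fit in the assumed 16-bit entry");
  return {bitsNeeded(MaxFuncID), bitsNeeded(MaxArgID)};
}
```

With at most 128 unique function attribute sets (IDs 0 to 127), this gives a 7-bit function field and leaves 9 bits for argument attribute IDs.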
This adds value types for representing capability types, enabling their use in instruction selection and other parts of the backend.
These types are distinguished from each other only by size. This is sufficient, at least today, because no existing CHERI configuration supports multiple capability sizes simultaneously. Hybrid configurations supporting intermixed integral pointers and capabilities do exist, and are one of the reasons why these value types are needed beyond existing integral types.
Co-authored-by: David Chisnall <theraven@theravensnest.org>
Co-authored-by: Jessica Clarke <jrtc27@jrtc27.com>
As noted in #153256, TableGen is generating reserved names for
RuntimeLibcalls, which resulted in a build failure for Arm64EC since
`vcruntime.h` defines `__security_check_cookie` as a macro.
To avoid using reserved names, all impl names will now be prefixed with
`Impl_`.
`NumLibcallImpls` was lifted out as a `constexpr size_t` instead of
being an enum field.
While I was churning the dependent code, I also removed the TODO to move
the impl enum into its own namespace and use an `enum class`: I
experimented with using an `enum class` and adding a namespace, but we
decided it was too verbose so it was dropped.
We were sizing the table appropriately for the number of LibcallImpls,
but many of those have identical names which were pushing up the
collision count unnecessarily. This ends up decreasing the table size
slightly, and makes it a bit faster.
BM_LookupRuntimeLibcallByNameRandomCalls improves by ~25% and
BM_LookupRuntimeLibcallByNameSampleData by ~5%.
As a secondary change, align the table size up to the next
power of 2. This makes the table larger than before, but improves
the sample data benchmark by an additional 5%.
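A minimal sketch of the sizing idea, assuming the table is sized from the number of distinct names and then rounded up to a power of 2 so the bucket index can be computed with a mask. The helper names `powerOf2Ceil`, `hashTableSize`, and `bucketFor` are hypothetical, and any load factor the real emitter applies is left out.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Round up to a power of two (returns N itself if it already is one).
constexpr std::size_t powerOf2Ceil(std::size_t N) {
  std::size_t P = 1;
  while (P < N)
    P <<= 1;
  return P;
}

// Size the table from the number of distinct names, not from the number of
// implementations: several implementations can share one name.
std::size_t hashTableSize(const std::vector<std::string> &ImplNames) {
  std::set<std::string> Unique(ImplNames.begin(), ImplNames.end());
  return powerOf2Ceil(Unique.size());
}

// With a power-of-2 table size, reducing a hash to a bucket is a mask
// rather than a modulo operation.
constexpr std::size_t bucketFor(std::size_t Hash, std::size_t TableSize) {
  return Hash & (TableSize - 1);
}
```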
Also starts pruning out these calls if the exception model is
forced to none.
I worked backwards from the logic in addPassesToHandleExceptions
and the pass content. There appears to be some tolerance
for mixing and matching exception modes inside of a single module.
As far as I can tell _Unwind_CallPersonality is only relevant for
wasm, so just add it there.
As usual, the arm64ec case makes things difficult and is
missing test coverage. The set of calls in list form is necessary
to use foreach for the duplication, but in every other context a
dag is more convenient. You cannot use foreach over a dag, and I
haven't found a way to flatten a dag into a list.
This removes the last manual setLibcallImpl call in generic code.
…210)"
This reverts commit 9a14b1d254a43dc0d4445c3ffa3d393bca007ba3.
Revert "RuntimeLibcalls: Return StringRef for libcall names (#153209)"
This reverts commit cb1228fbd535b8f9fe78505a15292b0ba23b17de.
Revert "TableGen: Emit statically generated hash table for runtime
libcalls (#150192)"
This reverts commit 769a9058c8d04fc920994f6a5bbb03c8a4fbcd05.
Reverted three changes because of a CMake error while building llvm-nm
as reported in the following PR:
https://github.com/llvm/llvm-project/pull/150192#issuecomment-3192223073
Avoids strlen when constructing the returned StringRef. We were emitting
these in the libcall name lookup anyway, so split out the offsets for
general use.
Currently emitted as a separate table, not sure if it would be better
to change the string offset table to store pairs of offset and width
instead.
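To illustrate why a separate size table avoids strlen, here is a small sketch in which names are stored back to back and looked up by offset and length. The table names and contents are made up for the example and are not the generated ones.

```cpp
#include <cstddef>
#include <cstdint>
#include <string_view>

// All names stored back to back, separated by NUL bytes.
constexpr char NameStorage[] = "memcpy\0sqrtf\0__stack_chk_fail";

// Parallel tables: where each name starts and how long it is.
constexpr uint16_t NameOffsets[] = {0, 7, 13};
constexpr uint8_t NameSizes[] = {6, 5, 16};

// The length comes from the table, so no strlen is needed to build the view.
constexpr std::string_view getName(std::size_t Index) {
  return std::string_view(NameStorage + NameOffsets[Index], NameSizes[Index]);
}
```

Storing `{offset, size}` pairs in a single table would keep both values adjacent in memory, at the cost of changing the existing offset table layout.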
a96121089b9c94e08c6632f91f2dffc73c0ffa28 reverted a change
to use a binary search on the string name table because it
was too slow. This replaces it with a static string hash
table based on the known set of libcall names. Microbenchmarking
shows this is similarly fast to using DenseMap. It's possibly
slightly slower than using StringSet, though these aren't an
exact comparison. This also saves on the one time use construction
of the map, so it could be better in practice.
This search isn't a simple set check, since it finds the
range of possible matches with the same name. There's also
an additional check for whether the current target supports
the name. The runtime constructed set doesn't require this,
since it only adds the symbols live for the target.
Followed algorithm from this post
http://0x80.pl/notesen/2023-04-30-lookup-in-strings.html
I'm also thinking the 2 special case global symbols should
just be added to RuntimeLibcalls. There are also other global
references emitted in the backend that aren't tracked; we probably
should just use this as a centralized database for all compiler
selected symbols.
This updates everywhere we emit/check an SME routine to use
RuntimeLibcalls to get the function name and calling convention.
Note: RuntimeLibcallEmitter had some issues with emitting non-unique
variable names for sets of libcalls, so I tweaked the output to avoid
the need for variables.
This change implements several small improvements to
`Intrinsic::getAttributes`:
1. Use `SequenceToOffsetTable` to emit `ArgAttrIdTable`. This enables
reuse of entries when they share a common prefix. This reduces the size
of this table from 546 to 484 entries, which is 248 bytes.
2. Fix `AttributeComparator` to purely compare argument attributes and
not look at function attributes. This avoids unnecessary duplicates in
the uniqueing process and eliminates 2 entries from
`ArgAttributesInfoTable`, saving 8 bytes.
3. Improve `Intrinsic::getAttributes` code to not initialize all entries
of `AS` always. Currently, we initialize all entries of the array `AS`
even if we may not use all of them. In addition to the runtime cost, for
Clang release builds, since the initialization loop is unrolled, it
consumes ~330 bytes of code to initialize the `AS` array. Address this
by declaring the storage for AS using just a char array with appropriate
`alignas` (similar to how `SmallVectorStorage` defines its inline
elements).
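The storage trick in item 3 can be sketched as follows. `AttrPair` is a stand-in for the real element type, and `MaxAttrs` and `collectAttrs` are hypothetical names; the point is only that raw aligned bytes are declared and just the used entries are constructed.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Stand-in for the real element type of the `AS` array.
struct AttrPair {
  unsigned Index;
  unsigned AttrID;
};

constexpr std::size_t MaxAttrs = 8; // assumed upper bound

// Builds entries only for arguments with a non-zero attribute ID and returns
// how many were built; Sum receives the sum of their IDs.
unsigned collectAttrs(const unsigned *AttrIDs, std::size_t NumArgs,
                      unsigned &Sum) {
  assert(NumArgs <= MaxAttrs);
  // Raw, suitably aligned bytes: no element is initialized up front.
  alignas(AttrPair) unsigned char Storage[sizeof(AttrPair) * MaxAttrs];
  AttrPair *AS = reinterpret_cast<AttrPair *>(Storage);

  unsigned NumAttrs = 0;
  for (std::size_t I = 0; I < NumArgs; ++I)
    if (AttrIDs[I] != 0)
      new (&AS[NumAttrs++]) AttrPair{static_cast<unsigned>(I), AttrIDs[I]};

  Sum = 0;
  for (unsigned I = 0; I < NumAttrs; ++I)
    Sum += AS[I].AttrID;
  // AttrPair is trivially destructible, so no destructor calls are needed.
  return NumAttrs;
}
```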
This is a follow-on to https://github.com/llvm/llvm-project/pull/152219 to
reduce both code and frame size of `Intrinsic::getAttributes` further.
Currently, this function consists of several switch cases (one per
unique set of argument attributes) that populate the local `AS` array
with all non-empty argument attributes for that intrinsic by calling
`getIntrinsicArgAttributeSet`. This change makes this code table driven
and implements `Intrinsic::getAttributes` without any switch cases,
which reduces the code size of this function on all platforms and in
addition reduces the frame size by a factor of 10 on Windows.
This is achieved by:
1. Emitting table `ArgAttrIdTable` containing a concatenated list of
`<ArgNo, AttrID>` entries across all unique arguments.
2. Emitting table `ArgAttributesInfoTable` (indexed by unique
arguments-ID) to store the starting index and number of non-empty arg
attributes.
3. Reserving unique function-ID 255 to indicate that the intrinsic has
no function attributes (to replace `HasFnAttr` setup in each switch
case).
4. Using a simple table lookup and for loop to build the list of
argument and function attributes for a given intrinsic.
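The four steps above can be sketched roughly as below. The table contents are made up, and `AttributeSketch` and `getAttributesSketch` are hypothetical stand-ins for the real `AttributeList` construction; only the table names and the reserved ID 255 come from the description.

```cpp
#include <cstdint>
#include <utility>
#include <vector>

// One <ArgNo, AttrID> pair; the pairs for all unique argument sets are
// concatenated into a single table.
struct ArgAttrEntry {
  uint8_t ArgNo;
  uint8_t AttrID;
};

// Per unique arguments-ID: where its pairs start and how many there are.
struct ArgAttrInfo {
  uint16_t StartIndex;
  uint16_t NumAttrs;
};

// Made-up table contents, for illustration only.
constexpr ArgAttrEntry ArgAttrIdTable[] = {
    {0, 4}, {2, 7}, // arguments-ID 1
    {1, 9},         // arguments-ID 2
};
constexpr ArgAttrInfo ArgAttributesInfoTable[] = {
    {0, 0}, // arguments-ID 0: no argument attributes
    {0, 2}, // arguments-ID 1
    {2, 1}, // arguments-ID 2
};

// Reserved function-ID meaning "no function attributes".
constexpr uint8_t NoFunctionAttrsID = 255;

struct AttributeSketch {
  std::vector<std::pair<unsigned, unsigned>> ArgAttrs; // {ArgNo, AttrID}
  bool HasFnAttrs;
};

// Table lookup plus a for loop, with no per-intrinsic switch cases.
AttributeSketch getAttributesSketch(unsigned ArgAttrsID, uint8_t FnAttrsID) {
  AttributeSketch Result;
  const ArgAttrInfo &Info = ArgAttributesInfoTable[ArgAttrsID];
  for (unsigned I = 0; I < Info.NumAttrs; ++I) {
    const ArgAttrEntry &E = ArgAttrIdTable[Info.StartIndex + I];
    Result.ArgAttrs.emplace_back(E.ArgNo, E.AttrID);
  }
  Result.HasFnAttrs = FnAttrsID != NoFunctionAttrsID;
  return Result;
}
```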
Experimental data shows that with release builds and assertions
disabled, this change reduces the code size for GCC and Clang builds on
Linux by ~9KB for a modest (80/152 byte) increase in frame size. For
Windows, it reduces the code size by 20KB and frame size from 4736 bytes
to 461 bytes, which is a roughly 10x reduction. Actual data is as follows:
```
Current trunk:
Compiler                     gcc-13.3.0  clang-18.1.3  MSVC 19.43.34810.0
code size                    0x35a9      0x370c        0x5581
frame size                   0x120       0x118         0x1280
table driven Intrinsic::getAttributes:
code size                    0xcfb       0xcd0         0x1cf
frame size                   0x1b8       0x188         0x1A0
Total savings (code + data)  9212 bytes  9790 bytes    20119 bytes
```
Total savings above accounts for the additional data size for the 2 new
tables, which in this experiment was: `ArgAttributesInfoTable` = 314
bytes and `ArgAttrIdTable` = 888 bytes. Coupled with the earlier
https://github.com/llvm/llvm-project/pull/152219, this achieves a 46x
reduction in frame size for this function in Windows release builds.
This change fixes a stack size regression that got introduced in
0de0354aa8.
That change did 2 independent things:
1. Uniquify argument and function attributes separately so that we
generate a smaller number of unique sets as opposed to uniquifying them
together. This is beneficial for code size.
2. Eliminate the fixed size array `AS` and `NumAttrs` variable and
instead build the returned AttributeList in each case using an
initializer list.
The second part seems to have caused a regression in the stack size
usage of this function for Windows. This change essentially undoes part
2 and reinstates the use of the fixed size array `AS` which fixes this
stack size regression. The actual measured stack frame size for this
function before/after this change is as follows:
```
Current trunk data for release build (x86_64 builds for Linux, x86 build for Windows):
Compiler                     gcc-13.3.0  clang-18.1.3  MSVC 19.43.34810.0
DLLVM_ENABLE_ASSERTIONS=OFF  0x120       0x110         0x54B0
DLLVM_ENABLE_ASSERTIONS=ON   0x2880      0x110         0x5250
After applying the fix:
Compiler                     gcc-13.3.0  clang-18.1.3  MSVC 19.43.34810.0
DLLVM_ENABLE_ASSERTIONS=OFF  0x120       0x118         0x1240
DLLVM_ENABLE_ASSERTIONS=ON   0x120       0x118         0x1240
```
Note that for Windows builds with assertions disabled, the stack frame
size for this function reduces from 21680 to 4672 which is a 4.6x
reduction. Stack frame size for GCC build with assertions also improved
and clang builds are unimpacted. The speculation is that clang and gcc
are able to reuse the stack space across these switch cases better with
existing code, but MSVC is not, and re-introducing the `AS` variable
forces all cases to use the same local variable, addressing the stack
space regression.
Hack in the default setting so it's consistently generated like
the other cases. Maintain a list of targets where this applies.
The alternative would require new infrastructure to sort the system
library initialization in some way.
I wanted the unhandled target case to be treated as a fatal
error, but it turns out there's a hack in IRSymtab using
RuntimeLibcalls, which will fail out in many tests that
do not have a triple set. Many of the failures are simply
running llvm-as with no triple, which probably should not
depend on knowing an accurate set of calls.
Also replace the current static DenseMap of preserved symbol
names in the Symtab hack with this. That map was broken because it
kept state across compiles, so this at least fixes that. However, this
is still broken: llvm-as shouldn't really depend on the triple.
Additionally, add sentinel values <Enum>::First_ and <Enum>::Last_ to
each one of those enums.
This will allow using `enum_seq_inclusive` to generate the list of
enum-typed values of any generated scoped (non-bitmask) enum.
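A small sketch of what the sentinels enable. `Color` and `allValues` are hypothetical, and the hand-written loop is only a stand-in for `enum_seq_inclusive` so that the example does not depend on LLVM headers.

```cpp
#include <type_traits>
#include <vector>

// A scoped enum with the two sentinel values added at the end.
enum class Color { Red, Green, Blue, First_ = Red, Last_ = Blue };

// Enumerate every value in [First_, Last_]. This assumes the enumerators are
// contiguous and that Last_ is not the maximum value of the underlying type.
template <typename EnumT> std::vector<EnumT> allValues() {
  using U = std::underlying_type_t<EnumT>;
  std::vector<EnumT> Values;
  for (U V = static_cast<U>(EnumT::First_); V <= static_cast<U>(EnumT::Last_);
       ++V)
    Values.push_back(static_cast<EnumT>(V));
  return Values;
}
```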
Make `AppendZero` a class member instead of an argument to
`GetOrAddStringOffset` to reflect the intended usage that for a given
`StringToOffsetTable`, all strings must use the same value of
`AppendZero`.
Modify `EmitStringTableDef` to drop the `Indent` argument as it's always
set to `""`, and to fail if it's called for a table with
non-null-terminated strings.
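The effect of making `AppendZero` a per-table property can be sketched with a simplified stand-in class. `OffsetTableSketch` and its members are hypothetical and much smaller than the real `StringToOffsetTable`.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Simplified stand-in: the zero-termination policy is fixed when the table is
// created, so every string in the table follows the same rule.
class OffsetTableSketch {
  std::map<std::string, std::size_t> Offsets;
  std::string Storage;
  const bool AppendZero;

public:
  explicit OffsetTableSketch(bool AppendZero = true) : AppendZero(AppendZero) {}

  // Returns the offset of Str, adding it to the storage on first use.
  std::size_t getOrAddStringOffset(const std::string &Str) {
    auto [It, Inserted] = Offsets.try_emplace(Str, Storage.size());
    if (Inserted) {
      Storage += Str;
      if (AppendZero)
        Storage += '\0';
    }
    return It->second;
  }

  std::size_t size() const { return Storage.size(); }
  bool isNullTerminated() const { return AppendZero; }
};
```

An emitter that requires null-terminated strings can then check `isNullTerminated()` once per table instead of trusting every call site.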
Allow associating a non-default CallingConv with a set of library
functions, and applying a default for a SystemLibrary.
I also wanted to be able to apply a default calling conv
to a RuntimeLibcallImpl, but that turned out to be annoying
so leave it for later.
Add a way to define a SystemLibrary for a complete set of
libcalls, subdivided by a predicate based on the triple.
Libraries can be defined using dag set operations, and the
prior default set can be subtracted from and added to (though
I think eventually all targets should move to explicit opt-ins;
we're still doing things like reporting ppcf128 libcalls as
available by default on all targets).
Start migrating some of the easier targets to only use the new
system. Targets that don't define a SystemLibrary are still
manually mutating a table set to the old defaults.
Work towards separating the ABI existence of libcalls vs. the
lowering selection. Set libcall selection through enums, rather
than through raw string names.
Replace RuntimeLibcalls.def with a tablegenerated version. This
is in preparation for splitting RuntimeLibcalls into two components.
For now match the existing functionality.
1. The PR proceeds with a backend target hook to allow front-ends to
determine what target features are available in a compilation based on
the CPU name.
2. Fix a backend target feature bug that enabled HTM for
Power8/9/10/11, even though HTM is only supported on Power8/9 according
to the ISA.
3. All target features that are hardcoded in PPC.cpp can be retrieved
from the backend target feature. I have double-checked that the
hardcoded logic for inferring target features from the CPU in the
frontend (PPC.cpp) is the same as in PPC.td.
The reland patch addressed the comment
https://github.com/llvm/llvm-project/pull/137670#discussion_r2143541120
In "get<lang>DirectiveName(Kind, Version)", return the spelling that
corresponds to Version, and in "get<lang>DirectiveKindAndVersions(Name)"
return the pair {Kind, VersionRange}, where VersionRange contains the
minimum and the maximum versions that allow "Name" as a spelling. This
applies to clauses as well. In general it applies to classes that have
spellings (defined via TableGen class "Spelling").
Given a Kind and a Version, getting the corresponding spelling requires
a runtime search (which can fail in a general case). To avoid generating
the search function inline, a small additional component of
llvm/Frontend was added: LLVMFrontendDirective. The corresponding header
file also defines C++ classes "Spelling" and "VersionRange", which are
used in TableGen/DirectiveEmitter as well.
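The two lookups described above can be sketched as follows. The field layout of `VersionRange` and `Spelling`, the helper names `findSpelling` and `findVersions`, and the example spellings in the usage note are all assumptions for illustration, not the contents of the actual header.

```cpp
#include <cstddef>
#include <limits>
#include <optional>
#include <string_view>

// Inclusive range of versions; the defaults mean "all versions".
struct VersionRange {
  int Min = 0;
  int Max = std::numeric_limits<int>::max();
};

// A spelling together with the versions in which it is allowed.
struct Spelling {
  std::string_view Name;
  VersionRange Versions;
};

// Kind + Version -> spelling: a runtime search that can fail.
template <std::size_t N>
std::optional<std::string_view> findSpelling(const Spelling (&Spellings)[N],
                                             int Version) {
  for (const Spelling &S : Spellings)
    if (S.Versions.Min <= Version && Version <= S.Versions.Max)
      return S.Name;
  return std::nullopt;
}

// Name -> range of versions that allow that name as a spelling.
template <std::size_t N>
std::optional<VersionRange> findVersions(const Spelling (&Spellings)[N],
                                         std::string_view Name) {
  for (const Spelling &S : Spellings)
    if (S.Name == Name)
      return S.Versions;
  return std::nullopt;
}
```

For a made-up entity spelled `old_name` up to version 51 and `new_name` from version 52 on, `findSpelling` returns `old_name` for version 45 and `new_name` for version 60.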
For background information see
https://discourse.llvm.org/t/rfc-alternative-spellings-of-openmp-directives/85507
Use the spellings in the generated clause parser. The functions
`get<lang>ClauseKind` and `get<lang>ClauseName` are not yet updated.
The definitions of both clauses and directives now take a list of
"Spelling"s instead of a single string. For example
```
def ACCC_Copyin : Clause<[Spelling<"copyin">,
Spelling<"present_or_copyin">,
Spelling<"pcopyin">]> { ... }
```
A "Spelling" is a versioned string, defaulting to "all versions".
For background information see
https://discourse.llvm.org/t/rfc-alternative-spellings-of-openmp-directives/85507
The code in DirectiveEmitter that generates clause parsers sorted clause
names to ensure that longer names were tried before shorter ones, in
cases where a shorter name may be a prefix of a longer one. This matters
in the strict Fortran source format, since whitespace is ignored there.
This sorting did not take into account clause aliases, which are just
alternative names. These extra names were not protected in the same way,
and were just appended immediately after the primary name.
This patch generates a list of pairs Record+Name, where a given record
can appear multiple times with different names. Sort that list and use
it to generate parsers for each record. What used to be
```
("fred" || "f") >> construct<SomeClause>{} ||
"foo" >> construct<OtherClause>{}
```
is now
```
"fred" >> construct<SomeClause>{} ||
"foo" >> construct<OtherClause>{} ||
"f" >> construct<SomeClause>{}
```
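The reordering can be sketched as below, assuming that sorting by descending name length is enough to place every name ahead of its own prefixes. `RecordName` and `orderForParsing` are hypothetical, and the real emitter may use a different ordering that gives the same guarantee.

```cpp
#include <algorithm>
#include <string>
#include <utility>
#include <vector>

// {record, spelling}: one record may appear several times, once per name.
using RecordName = std::pair<std::string, std::string>;

// Order the pairs so that longer names are tried first. A name is always
// longer than any of its proper prefixes, so "f" can never shadow "fred".
std::vector<RecordName> orderForParsing(std::vector<RecordName> Names) {
  std::stable_sort(Names.begin(), Names.end(),
                   [](const RecordName &A, const RecordName &B) {
                     return A.second.size() > B.second.size();
                   });
  return Names;
}
```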
There were 3 different functions in DirectiveEmitter.cpp doing
essentially the same thing: taking a name separated with _ or whitespace,
and converting it to the upper-camel case. Extract that into a single
function that can handle different sets of separators.
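A minimal sketch of such a helper, assuming that only the first letter of each word is changed and the rest is left as-is. The name `toUpperCamel` and its default separator set are hypothetical.

```cpp
#include <cctype>
#include <string>
#include <string_view>

// Drop every separator character and capitalize the letter that follows it
// (and the very first letter). Other characters are copied unchanged.
std::string toUpperCamel(std::string_view Name,
                         std::string_view Separators = "_ ") {
  std::string Result;
  bool StartOfWord = true;
  for (char C : Name) {
    if (Separators.find(C) != std::string_view::npos) {
      StartOfWord = true;
      continue;
    }
    Result += StartOfWord
                  ? static_cast<char>(std::toupper(static_cast<unsigned char>(C)))
                  : C;
    StartOfWord = false;
  }
  return Result;
}
```

Passing a different separator set, for example `"_"` only, covers callers that want spaces preserved.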
The class "ClauseVal" actually represents a definition of an enumeration
value, and in itself it is not bound to any clause. Rename it to EnumVal
and add a comment clarifying how it's translated into an actual enum
definition in the generated source code.
There is no change in functionality.
These are identified by misc-include-cleaner. I've filtered out those
that break builds. Also, I'm staying away from llvm-config.h,
config.h, and Compiler.h, which likely cause platform- or
compiler-specific build failures.
Remove extraneous qualifications from names when
- the name is explicitly enclosed by corresponding namespaces, and
- the name is in a body of a function defined in corresponding
namespaces.
Otherwise add missing qualifications.
This applies to individual sections of TableGen output, and makes name
lookup independent of the context in which these sections are included.
Rename `ListInit::getValues()` to `getElements()` to better match with
other `ListInit` members like `getElement`. Keep `getValues()` for
existing downstream code but mark it deprecated.
The official languages that OpenMP recognizes are C/C++ and Fortran.
Some OpenMP directives are language-specific, some are C/C++-only, some
are Fortran-only.
Add a property to the TableGen definition of Directive that will be the
list of languages that allow the directive.
The TableGen backend will then generate a bitmask-like enumeration
SourceLanguages, and a function
SourceLanguages getDirectiveLanguages(Directive D);
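A rough sketch of what such generated code could look like. The enumerator names, the `allows` helper, and the three placeholder directives are assumptions for illustration; only `SourceLanguages` and `getDirectiveLanguages` come from the description above.

```cpp
#include <cstdint>

// Bitmask-like enumeration: one bit per language family.
enum class SourceLanguages : uint32_t {
  C = 1u << 0, // covers both C and C++
  Fortran = 1u << 1,
};

constexpr SourceLanguages operator|(SourceLanguages A, SourceLanguages B) {
  return static_cast<SourceLanguages>(static_cast<uint32_t>(A) |
                                      static_cast<uint32_t>(B));
}

// True if the set of languages contains L.
constexpr bool allows(SourceLanguages Set, SourceLanguages L) {
  return (static_cast<uint32_t>(Set) & static_cast<uint32_t>(L)) != 0;
}

// Placeholder directives, not real OpenMP directive names.
enum class Directive { BothLanguages, CLikeOnly, FortranOnly };

constexpr SourceLanguages getDirectiveLanguages(Directive D) {
  switch (D) {
  case Directive::CLikeOnly:
    return SourceLanguages::C;
  case Directive::FortranOnly:
    return SourceLanguages::Fortran;
  case Directive::BothLanguages:
    break;
  }
  return SourceLanguages::C | SourceLanguages::Fortran;
}
```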