Make IntrinsicsToAttributesMap's func. and arg. fields be able to have
adaptive sizes based on input other than hardcoded 8bits/8bits.
This will ease the pressure for adding new intrinsics in private
downstreams.
func. attr bitsize will become 7(127/128) vs 8(255/256)
This change implements several small improvements to
`Intrinsic::getAttributes`:
1. Use `SequenceToOffsetTable` to emit `ArgAttrIdTable`. This enables
reuse of entries when they share a common prefix. This reduces the size
of this table from 546 to 484 entries, which is 248 bytes.
2. Fix `AttributeComparator` to purely compare argument attributes and
not look at function attributes. This avoids unnecessary duplicates in
the uniqueing process and eliminates 2 entries from
`ArgAttributesInfoTable`, saving 8 bytes.
3. Improve `Intrinsic::getAttributes` code to not initialize all entries
of `AS` always. Currently, we initialize all entries of the array `AS`
even if we may not use all of them. In addition to the runtime cost, for
Clang release builds, since the initialization loop is unrolled, it
consumes ~330 bytes of code to initialize the `AS` array. Address this
by declaring the storage for AS using just a char array with appropriate
`alignas` (similar to how `SmallVectorStorage` defines its inline
elements).
This a follow on to https://github.com/llvm/llvm-project/pull/152219 to
reduce both code and frame size of `Intrinsic::getAttributes` further.
Currently, this function consists of several switch cases (one per
unique argument attributes) that populates the local `AS` array with all
non-empty argument attributes for that intrinsic by calling
`getIntrinsicArgAttributeSet`. This change makes this code table driven
and implements `Intrinsic::getAttributes` without any switch cases,
which reduces the code size of this function on all platforms and in
addition reduces the frame size by a factor of 10 on Windows.
This is achieved by:
1. Emitting table `ArgAttrIdTable` containing a concatenated list of
`<ArgNo, AttrID>` entries across all unique arguments.
2. Emitting table `ArgAttributesInfoTable` (indexed by unique
arguments-ID) to store the starting index and number of non-empty arg
attributes.
3. Reserving unique function-ID 255 to indicate that the intrinsic has
no function attributes (to replace `HasFnAttr` setup in each switch
case).
4. Using a simple table lookup and for loop to build the list of
argument and function attributes for a given intrinsic.
Experimental data shows that with release builds and assertions
disabled, this change reduces the code size for GCC and Clang builds on
Linux by ~9KB for a modest (80/152 byte) increase in frame size. For
Windows, it reduces the code size by 20KB and frame size from 4736 bytes
to 461 bytes which is 10x reduction. Actual data is as follows:
```
Current trunk:
Compiler gcc-13.3.0 clang-18.1.3 MSVC 19.43.34810.0
code size 0x35a9 0x370c 0x5581
frame size 0x120 0x118 0x1280
table driven Intrinsic::getAttributes:
code size 0xcfb 0xcd0 0x1cf
frame size 0x1b8 0x188 0x1A0
Total savings (code + data) 9212 bytes 9790 bytes 20119 bytes
```
Total savings above accounts for the additional data size for the 2 new
tables, which in this experiment was: `ArgAttributesInfoTable` = 314
bytes and `ArgAttrIdTable` = 888 bytes. Coupled with the earlier
https://github.com/llvm/llvm-project/pull/152219, this achieves a 46x
reduction in frame size for this function in Windows release builds.
This change fixes a stack size regression that got introduced in
0de0354aa8.
That change did 2 independent things:
1. Uniquify argument and function attributes separately so that we
generate a smaller number of unique sets as opposed to uniquifying them
together. This is beneficial for code size.
2. Eliminate the fixed size array `AS` and `NumAttrs` variable and
instead build the returned AttribteList in each case using an
initializer list.
The second part seems to have caused a regression in the stack size
usage of this function for Windows. This change essentially undoes part
2 and reinstates the use of the fixed size array `AS` which fixes this
stack size regression. The actual measured stack frame size for this
function before/after this change is as follows:
```
Current trunk data for release build (x86_64 builds for Linux, x86 build for Windows):
Compiler gcc-13.3.0 clang-18.1.3 MSVC 19.43.34810.0
DLLVM_ENABLE_ASSERTIONS=OFF 0x120 0x110 0x54B0
DLLVM_ENABLE_ASSERTIONS=ON 0x2880 0x110 0x5250
After applying the fix:
Compiler gcc-13.3.0 clang-18.1.3 MSVC 19.43.34810.0
DLLVM_ENABLE_ASSERTIONS=OFF 0x120 0x118 0x1240h
DLLVM_ENABLE_ASSERTIONS=ON 0x120 0x118 0x1240h
```
Note that for Windows builds with assertions disabled, the stack frame
size for this function reduces from 21680 to 4672 which is a 4.6x
reduction. Stack frame size for GCC build with assertions also improved
and clang builds are unimpacted. The speculation is that clang and gcc
is able to reuse the stack space across these switch cases better with
existing code, but MSVC is not, and re-introducing the `AS` variable
forces all cases to use the same local variable, addressing the stack
space regression.
Make `AppendZero` a class member instead of an argument to
`GetOrAddStringOffset` to reflect the intended usage that for a given
`StringToOffsetTable`, all strings must use the same value of
`AppendZero`.
Modify `EmitStringTableDef` to drop the `Indent` argument as its always
set to `""`, and to fail if it's called for a table with
non-null-terminated strings.
Rename `ListInit::getValues()` to `getElements()` to better match with
other `ListInit` members like `getElement`. Keep `getValues()` for
existing downstream code but mark it deprecated.
Add support for specifying range attributes in Intrinsics.td. Use this
to specify the ucmp/scmp range [-1,2).
This case is trickier than existing intrinsic attributes, because we
need to create the attribute with the correct bitwidth. As such, the
attribute construction now needs to be aware of the function type.
We also need to be careful to no longer assign attributes on intrinsics
with invalid signatures, as we'd make invalid assumptions about the
number of arguments etc otherwise.
Fixes https://github.com/llvm/llvm-project/issues/130179.
Most places that call Intrinsic::getAttributes() are only interested in
the function attributes, so add a separate function for that.
The motivation for this is that I'd like to add the ability to specify
range attributes on intrinsics, which requires knowing the function
type. This avoids needing to know the type for most attribute queries.
This PR removes the old `nocapture` attribute, replacing it with the new
`captures` attribute introduced in #116990. This change is
intended to be essentially NFC, replacing existing uses of `nocapture`
with `captures(none)` without adding any new analysis capabilities.
Making use of non-`none` values is left for a followup.
Some notes:
* `nocapture` will be upgraded to `captures(none)` by the bitcode
reader.
* `nocapture` will also be upgraded by the textual IR reader. This is to
make it easier to use old IR files and somewhat reduce the test churn in
this PR.
* Helper APIs like `doesNotCapture()` will check for `captures(none)`.
* MLIR import will convert `captures(none)` into an `llvm.nocapture`
attribute. The representation in the LLVM IR dialect should be updated
separately.
Historically, the main example of *very* large string tables used the
`EmitCharArray` to work around MSVC limitations with string literals,
but that was switched (without removing the API) in order to consolidate
on a nicer emission primitive.
While this large string table in `IntrinsicsImpl.inc` seems to compile
correctly on MSVC without the work around in `EmitCharArray` (and that
this PR adds back to the nicer emission path), other users have
repeatedly hit this MSVC limitation as you can see in the discussion on
PR https://github.com/llvm/llvm-project/pull/120534. This PR teaches the
string offset table emission to look at
the size of the table and switch to the char array emission strategy
when the table becomes too large.
This work around does have the downside of making compile times worse
for large string tables, but that appears unavoidable until we can
identify known good MSVC versions and switch to requiring them for all
LLVM users. It also reduces searchability of the generated string table
-- I looked at emitting a comment with each string but it is tricky
because the escaping rules for an inline comment are different from
those of of a string literal, and there's no real way to turn the string
literal into a comment.
While improving the output in this way, also clean up the output to not
emit an extraneous empty string at the end of the string table, and
update the `StringTable` class to not look for that. It isn't actually
used by anything and is wasteful.
This PR also switches the `IntrinsicsImpl.inc` string tables over to the
new `StringTable` runtime abstraction. I didn't want to do this until
landing the MSVC workaround in case it caused even this example to start
hitting the MSVC bug, but I wanted to switch here so that I could
simplify the API for emitting the string table with the workaround
present. With the two different emission strategies, its important to
use a very exact syntax and that seems better encapsulated in the API.
Last but not least, the `SDNodeInfoEmitter` is updated, including its
tests to match the new output.
This PR should unblock landing
https://github.com/llvm/llvm-project/pull/120534 and letting us switch
all of
Clang's builtins to use string tables. That PR has all the details
motivating the overall effort.
Follow-up patches will try to consolidate the remaining users onto the
single interface, but those at least were easy to separate into
follow-ups and keep this PR somewhat smaller.
All the sources of `llvm-min-tblgen` are also used for `llvm-tblgen`,
with identical compilation flags. Reuse the object files of
`llvm-min-tblgen` for `llvm-tblgen` by applying the usual source
structure of an executable: One file per executable which named after
the executable name containing the (in this case trivial) main function,
which just calls the tblgen_main in TableGen.cpp. This should also clear
up any confusion (including mine) of where each executable's main
function is.
While this slightly reduces build time, the main motivation is ccache.
Using the hard_link
option, building the object files for `llvm-tblgen` will result in a
hard link to the same object file already used for `llvm-min-tblgen`. To
signal the build system that the file is new, ccache will update the
file's time stamp. Unfortunately, time stamps are shared between all
hard-linked files s.t. this will indirectly also update the time stamps
for the object files used for `llvm-tblgen`. At the next run, Ninja will
recognize this time stamp discrepancy to the expected stamp recorded in
`.ninja_log` and rebuild those object files for `llvm-min-tblgen`, which
again will also update the stamp for the `llvm-tblgen`... . This is
especially annoying for tablegen because it means Ninja will re-run all
tablegenning in every build.
I am using the hard_link option because it reduces the cost of having
multiple build-trees of the LLVM sources and reduces the wear to the SSD
they are stored on.
This reverts commit f6cb56902c6dcafede21eb6662910b6ff661fc0f.
Buildbot failures such as https://lab.llvm.org/buildbot/#/builders/89/builds/13541:
```
/usr/bin/ld: utils/TableGen/Basic/CMakeFiles/obj.LLVMTableGenBasic.dir/ARMTargetDefEmitter.cpp.o: undefined reference to symbol '_ZN4llvm23EnableABIBreakingChecksE'
/usr/bin/ld: /home/tcwg-buildbot/worker/flang-aarch64-libcxx/build/./lib/libLLVMSupport.so.20.0git: error adding symbols: DSO missing from command line
```
Going to investigate.
All the sources of `llvm-min-tblgen` are also used for `llvm-tblgen`,
with identical compilation flags. Reuse the object files of
`llvm-min-tblgen` for `llvm-tblgen` by applying the usual source
structure of an executable: One file per executable which named after
the executable name containing the (in this case trivial) main function,
which just calls the tblgen_main in TableGen.cpp. This should also clear
up any confusion (including mine) of where each executable's main
function is.
While this slightly reduces build time, the main motivation is ccache.
Using the hard_link
option, building the object files for `llvm-tblgen` will result in a
hard link to the same object file already used for `llvm-min-tblgen`. To
signal the build system that the file is new, ccache will update the
file's time stamp. Unfortunately, time stamps are shared between all
hard-linked files s.t. this will indirectly also update the time stamps
for the object files used for `llvm-tblgen`. At the next run, Ninja will
recognize this time stamp discrepancy to the expected stamp recorded in
`.ninja_log` and rebuild those object files for `llvm-min-tblgen`, which
again will also update the stamp for the `llvm-tblgen`... . This is
especially annoying for tablegen because it means Ninja will re-run all
tablegenning in every build.
I am using the hard_link option because it reduces the cost of having
multiple build-trees of the LLVM sources and reduces the wear to the SSD
they are stored on.