https://github.com/llvm/llvm-project/pull/114990 allowed more aggressive
tail duplication for computed-gotos in both pre- and post-regalloc tail
duplication.
In some cases, performing tail-duplication too early can lead to worse
results, especially if we duplicate blocks with a number of phi nodes.
This is causing a ~3% performance regression in some workloads using
Python 3.12.
This patch updates TailDup to delay aggressive tail-duplication for
computed gotos to after register allocation.
This means we can keep the non-duplicated version for a bit longer
throughout the backend, which should reduce compile-time as well as
allowing a number of optimizations and simplifications to trigger before
drastically expanding the CFG.
For the case in https://github.com/llvm/llvm-project/issues/106846, I
get the same performance with and without this patch on Skylake.
PR: https://github.com/llvm/llvm-project/pull/150911
This patch adds support for lowering several float classification ops
from the Math dialect to the SPIR-V dialect.
### Highlights:
- Introduced a new `spirv.IsFinite` operation corresponding to the
SPIR-V `OpIsFinite` instruction.
- Lowered `math.isfinite`, `math.isinf`, and `math.isnan` to SPIR-V
using `CheckedElementwiseOpPattern`.
- Added corresponding tests for op definition and conversion lowering.
This addresses the discussion in:
https://github.com/llvm/llvm-project/issues/150778
---
Let me know if any additional adjustments are needed!
---------
Co-authored-by: Jakub Kuderski <kubakuderski@gmail.com>
In #112303 they were enabled for Windows dylib builds, but the condition
excluded Cygwin in that case. Their dynamic libraries work the same way,
so adjust the condition accordingly.
This fixes the plugin tests on Cygwin.
When building with -fdebug-compilation-dir/-fcoverige-compilation-dir,
infer the compilation directory in clang driver, rather than try to
fallback to VFS current working directory lookup during CodeGen. This
allows compilation directory to be used as it is passed via cc1 flag and
the value can be empty to remove dependency on CWD if needed.
Reviewers: adrian-prantl, dwblaikie
Reviewed By: adrian-prantl, dwblaikie
Pull Request: https://github.com/llvm/llvm-project/pull/150112
Implements the `inferResultRanges` method from the
`InferIntRangeInterface` interface for `vector.step`. The implementation
is similar to that of arith.constant, since the exact result values are
statically known.
Signed-off-by: Max Dawkins <max.dawkins@gmail.com>
Previously, when dwim-print finds a frame variables, it returns immediately after
calling `Dump`, even if `Dump` returns an error. This is most likely to happen when
evaluating an object description, ie `po`.
This changes dwim-print to continue on to expression evaluation when `Dump`ing a
variable returns an error . This is to allow for diagnostics that match `expression`.
%T has been deprecated for about seven years since it is not unique to
each test and can thus lead to races. This patch removes uses of %T from
clang-tools-extra with the eventual goal of removing support for %T from
lit.
Commit cdc7864 has an error which would wrongly fold widening
multiplications into an even/odd widening operation.
This PR fixes it and adds tests to check scenarios which should not be
folded into an even/odd widening operation are actually not.
This reduces the amount of boilerplate required when adding a new
field to MIMetadata and reduces the chance of bugs like the
one I fixed in TargetInstrInfo::reassociateOps.
Reviewers: arsenm, nikic
Reviewed By: nikic
Pull Request: https://github.com/llvm/llvm-project/pull/133535
Currently terminatorIsComputedGoto will return for blocks with a
indirect branch terminator and no successor. If there are no successor,
the terminator is likely not a computed goto, return false in that case.
Note that this is currently NFC, as the only use checks it only if there
are successors, but it will be needed in
https://github.com/llvm/llvm-project/pull/150911.
PR: https://github.com/llvm/llvm-project/pull/151342
Implements the `inferResultRanges` method from the
`InferIntRangeInterface` interface for `vector.transpose`. The result
ranges simply match the source ranges.
Signed-off-by: Max Dawkins <max.dawkins@gmail.com>
These changes allow LLVM and Clang to be built with Clang targeting
Arm64EC using the MSVC linker.
Built with these options:
```
-DLLVM_ENABLE_PROJECTS="clang"
-DLLVM_HOST_TRIPLE=arm64ec-pc-windows-msvc
-DCMAKE_C_COMPILER=clang-cl.exe
-DCMAKE_C_COMPILER_TARGET=arm64ec-pc-windows-msvc
-DCMAKE_CXX_COMPILER=clang-cl.exe
-DCMAKE_CXX_COMPILER_TARGET=arm64ec-pc-windows-msvc
-DCMAKE_LINKER_TYPE=MSVC
```
This patch implements the `llvm.loop.estimated_trip_count` metadata
discussed in [[RFC] Fix Loop Transformations to Preserve Block
Frequencies](https://discourse.llvm.org/t/rfc-fix-loop-transformations-to-preserve-block-frequencies/85785).
As [suggested in the RFC
comments](https://discourse.llvm.org/t/rfc-fix-loop-transformations-to-preserve-block-frequencies/85785/4),
it adds the new metadata to all loops at the time of profile ingestion
and estimates each trip count from the loop's `branch_weights` metadata.
As [suggested in the PR #128785
review](https://github.com/llvm/llvm-project/pull/128785#discussion_r2151091036),
it does so via a new `PGOEstimateTripCountsPass` pass, which creates the
new metadata for each loop but omits the value if it cannot estimate a
trip count due to the loop's form.
An important observation not previously discussed is that
`PGOEstimateTripCountsPass` *often* cannot estimate a loop's trip count,
but later passes can sometimes transform the loop in a way that makes it
possible. Currently, such passes do not necessarily update the metadata,
but eventually that should be fixed. Until then, if the new metadata has
no value, `llvm::getLoopEstimatedTripCount` disregards it and tries
again to estimate the trip count from the loop's current
`branch_weights` metadata.
Similar to #150639 this fixes the AggressiveInstCombine fold for convert
tables to cttz instructions if the gep types are not array types. i.e
`gep i16 @glob, i64 %idx` instead of `gep [64 x i16] @glob, i64 0, i64 %idx`.
I'm not sure how libc should be configuring this in bazel, but for now
it seems like `LIBC_THREAD_MODE_PLATFORM` restores the default behavior.
Defining this param is required:
```
ERROR: /var/lib/buildkite-agent/.cache/bazel/_bazel_buildkite-agent/c9d34ded3a9d94cc250207948aceadfc/external/llvm-project/libc/BUILD.bazel:5788:14: Compiling libc/src/stdio/generic/vprintf.cpp failed: (Exit 1): clang failed: error executing CppCompile command (from target @@llvm-project//libc:vprintf) /usr/lib/llvm-18/bin/clang -U_FORTIFY_SOURCE -fstack-protector -Wall -Wthread-safety -Wself-assign -Wunused-but-set-parameter -Wno-free-nonheap-object -fcolor-diagnostics -fno-omit-frame-pointer ... (remaining 31 arguments skipped)
Use --sandbox_debug to see verbose messages from the sandbox and retain the sandbox build root for debugging
In file included from external/llvm-project/libc/src/stdio/generic/vprintf.cpp:11:
In file included from external/llvm-project/libc/src/__support/File/file.h:19:
external/llvm-project/libc/src/__support/threads/mutex.h:25:2: error: LIBC_THREAD_MODE is undefined
25 | #error LIBC_THREAD_MODE is undefined
| ^
```
As an alternative to this PR, we could make the libc header default to
`LIBC_THREAD_MODE_PLATFORM` if not provided, instead of an `#error`
We extract the binding logic out of the DXILResource analysis passes into the
FrontendHLSL library. This will allow us to use this logic for resource and
root signature bindings in both the DirectX backend and the HLSL frontend.
When adding a predicated field to non-attribute properties /
implemneting PropConstraint, a call to genPropertyVerifiers() wasn't
added to the generation sequence for [Op]GenericAdaptor::verify. This
commit fixes the issue.
This PR adds the implementation of the `ComputeVolatileBitfields`
function for the AAPCS ABI, following the rules described in [AAPCS64
§8.1.8.5 Volatile
Bit-fields](f52e1ad3f8/aapcs64/aapcs64.rst (8185volatile-bit-fields----preserving-number-and-width-of-container-accesses)).
When accessing a volatile bit-field either reading or writing the
compiler must perform a load or store using the access size that matches
the width of the declared type (i.e., the type of the container), rather
than the packed bit-field size.
For example, if a field is declared as `int`, it must read or write 32
bits, even if the bit-field is only 3 bits wide.
The `ComputeVolatileBitfields` function calculates the correct values
and offsets necessary for proper lowering of volatile bitfields.
Support for emitting calls to `get_bitfield` and `set_bitfield` with the
correct access size for volatile bitfields will be implemented in a
future PR.
Noticed this when checking the invariant that all phis in the header
block must be header phis. I think there's a missing set of parentheses
here, since otherwise it only cast<VPInstruction> when RecipeI isn't a
VPInstruction.
This patch adds or completes support for a couple of top level
declaration types that don't emit any code: Most notably these include
Concepts, static_assert and type aliases.
This patch removes deprecated methods in CustomizableOptional.
These methods come from llvm::Optional, which later got renamed to
CustomizableOptional. Since CustomizableOptional is used only in
OptionalDirectoryEntryRef and OptionalFileEntryRef, we do not need to
have full std::optional-like support in CustomizableOptional.
This switches to `makeIntrusiveRefCnt<FileSystem>` where creating a new
object, and to passing/returning by `IntrusiveRefCntPtr<FileSystem>`
instead of `FileSystem*` or `FileSystem&`, when dealing with existing
objects.
Part of cleanup #151026.