This patch makes it so that generate_report will add information about
failed build actions to the summary report. This makes it significantly
easier to find compilation failures, especially given we run ninja with
-k 0.
This patch only does the integration into generate_report (along with
testing). Actual utilization in the script is split into a separate
patch to try and keep things clean.
Reviewers: dschuff, cmtice, DavidSpickett, Keenuts, lnihlen, gburgessiv
Reviewed By: cmtice, DavidSpickett
Pull Request: https://github.com/llvm/llvm-project/pull/152621
No behavioral change, but eliminates potential UB in strict-alignment
systems.
The previous commit (llvm#94171) bulk-updated alignment usage to C++23
spec, but missed this occurrence.
Fold `broadcast(shape_cast(x))` into `broadcast(x)` if the type of x is
compatible with broadcast's result type and the shape_cast only adds or removes ones in the leading dimensions.
---------
Co-authored-by: Andrzej Warzyński <andrzej.warzynski@gmail.com>
Co-authored-by: James Newling <james.newling@gmail.com>
The CMake flag has been on by default for a month without any issues.
This makes the feature support in LLVM unconditional (but does not
enable the feature by default).
This patch updates the pmulhw/pmulhuw builtins to support constant
expression handling - extending the VectorExprEvaluator::VisitCallExpr
handling code that handles elementwise integer binop builtins.
Hopefully this can be used as reference patch to show how to add future
target specific constexpr handling with minimal code impact.
I've also enabled pmullw constexpr handling (which are tagged on
#152490) as they all use very similar tests.
I've also had to tweak the MMX -> SSE2 wrapper as undefs are not
permitted in constexpr shuffle masks
Fixes#152524
5fc3e76ec4f323c22cddf7b9458137510507847a made the pipelines fail on
errors and also removed the TODO comments, but did not remove the
explanatory comments on why things were set up that way. Given things no
longer succeed on error, these comments are outdated and should be
removed.
This patch adds in support for taking the CLI output from ninja and
parsing it for failures. This is intended to be used in the cases where
all tests pass (or none have run), but the build fails to easily surface
where exactly the build failed.
The actual integration will happen in a future patch.
Reviewers: gburgessiv, dschuff, lnihlen, DavidSpickett, Keenuts, cmtice
Reviewed By: DavidSpickett, cmtice
Pull Request: https://github.com/llvm/llvm-project/pull/152620
When InstTy is a type like IntrinsicInst which can have a variable
number of arguments, we can encounter a case where Operation will have
fewer than two arguments and error at the getOperand() calls.
Fixes: https://github.com/llvm/llvm-project/issues/152725.
When the target enables +sme2p2, the svcompact intrinsic is now
available in streaming SVE mode, through updating the guards in
arm_sve.td. Included Sema test acle_sve_compact.cpp.
This PR adds hardware-measured latencies for all instructions defined in
Section 12 of the RVV specification: "Vector Fixed-Point Arithmetic
Instructions" to the SpacemiT-X60 scheduling model.
This patch takes care of the highly mechanical part of proofreading
SourceLevelDebugging.rst, namely:
- hyphenating "32 bit value" and similar and
- hypenating "Objective C"
Intrinsic::getDeclaration has been deprecated for more than 9 months
since:
commit b9f08676abcfbb226c67b5ac2a7bc5b33254b915
Author: Rahul Joshi <rjoshi@nvidia.com>
Date: Mon Oct 14 19:21:28 2024 -0700
This patch removes it. I'm not aware of any downstream use AFAIK.
Still missing the "extend to 256-bit" casts - _mm256_castpd128_pd256 / _mm256_castps128_ps256 / _mm256_castsi128_si256 - due to constexpr not liking undefined/poison etc.
Followup work of #140498 to continue the work on clangd/clangd#529
Introduce the use of the Clang doxygen parser to parse the documentation
of hovered code.
- ASTContext independent doxygen parsing
- Parsing doxygen commands to markdown for hover information
Note: after this PR I have planned another patch to rearrange the
information shown in the hover info.
This PR is just for the basic introduction of doxygen parsing for hover
information.
---------
Co-authored-by: Maksim Ivanov <emaxx@google.com>
These commits continue the work done in
https://github.com/llvm/llvm-project/pull/144344, of adding alignment
attributes to operations in the vector and memref. These commits focus
on adding the alignment attribute to the `maskedload` and `maskedstore`
operations. The `VectorLoadConversion` pattern in VectorToLLVM is a
template for `load`, `store`, `maskedload` and `maskedstore` operations.
Having the alignment attribute in all these operations would allow for
an easy way to propagate the alignment attribute from the vector dialect
to the LLVM dialect.
This patchset also includes changes to the conversion from VectorToLLVM
to propagate the alignment attribute for the
vector.{,masked}{load,store} operations.
In visitShuffleVectorInst there's an if block that's meant to turn
shufflevector followed by bitcast into extractelement where possible.
It assumes that there will never be bitcasts performed on vectors of ptr
as such operations are almost always illegal, and ptrtoint instructions
should be used instead.
There is however an edge case where a bitcast instruction can be
performed on a vector of type `<1 x ptr>` to turn it into type `ptr`
In this edge case, the code initializes the variable `VecBitWidth` to 0.
Then, when iterating over users that are bitcasts, an attempt is made to
create a vector of size 0, which triggers and assert.
This commit changes initialization of `VecBitWidth` to use datalayout to
find the the size of the vector instead of getPrimitiveSizeInBits method
which results in 0 for ptr and vectors of ptr.
This a follow on to https://github.com/llvm/llvm-project/pull/152219 to
reduce both code and frame size of `Intrinsic::getAttributes` further.
Currently, this function consists of several switch cases (one per
unique argument attributes) that populates the local `AS` array with all
non-empty argument attributes for that intrinsic by calling
`getIntrinsicArgAttributeSet`. This change makes this code table driven
and implements `Intrinsic::getAttributes` without any switch cases,
which reduces the code size of this function on all platforms and in
addition reduces the frame size by a factor of 10 on Windows.
This is achieved by:
1. Emitting table `ArgAttrIdTable` containing a concatenated list of
`<ArgNo, AttrID>` entries across all unique arguments.
2. Emitting table `ArgAttributesInfoTable` (indexed by unique
arguments-ID) to store the starting index and number of non-empty arg
attributes.
3. Reserving unique function-ID 255 to indicate that the intrinsic has
no function attributes (to replace `HasFnAttr` setup in each switch
case).
4. Using a simple table lookup and for loop to build the list of
argument and function attributes for a given intrinsic.
Experimental data shows that with release builds and assertions
disabled, this change reduces the code size for GCC and Clang builds on
Linux by ~9KB for a modest (80/152 byte) increase in frame size. For
Windows, it reduces the code size by 20KB and frame size from 4736 bytes
to 461 bytes which is 10x reduction. Actual data is as follows:
```
Current trunk:
Compiler gcc-13.3.0 clang-18.1.3 MSVC 19.43.34810.0
code size 0x35a9 0x370c 0x5581
frame size 0x120 0x118 0x1280
table driven Intrinsic::getAttributes:
code size 0xcfb 0xcd0 0x1cf
frame size 0x1b8 0x188 0x1A0
Total savings (code + data) 9212 bytes 9790 bytes 20119 bytes
```
Total savings above accounts for the additional data size for the 2 new
tables, which in this experiment was: `ArgAttributesInfoTable` = 314
bytes and `ArgAttrIdTable` = 888 bytes. Coupled with the earlier
https://github.com/llvm/llvm-project/pull/152219, this achieves a 46x
reduction in frame size for this function in Windows release builds.
HIP runtime support for compressed bundle format v3 is in place,
therefore switch the default compressed bundle format to v3 in compiler.
This allows both compressed and decompressed fat binary size to exceed
4GB by default.
Environment variable COMPRESSED_BUNDLE_FORMAT_VERSION=2 can be used for
backward compatibility for older HIP runtimes not supporting v3.
Fixes: SWDEV-548879
This extracts the code for modelling a fp16 operation as
`fptrunc(fpop(fpext,fpext))` into a new function named
getFP16BF16PromoteCost so that it can be reused by the
arithmetic instructions. The function takes a lambda to
calculate the cost of the operation with the promoted type.
The `RPTarget`'s way of determining whether VGPRs are beneficial to save
and whether the target has been reached w.r.t. VGPR usage currently
assumes, if `CombinedVGPRSavings` is true, that free slots in one VGPR
RC can always be used for the other. Implicitly, this makes the
rematerialization stage (only current user of `RPTarget`) follow a
different occupancy calculation than the "regular one" that the
scheduler uses, one that assumes that ArchVGPR/AGPR usage can be
balanced perfectly and at no cost, which is untrue in general. This
ultimately yields suboptimal rematerialization decisions that require
cross-VGPR-RC copies unnecessarily.
This fixes that, making the `RPTarget`'s internal model of occupancy
consistent with the regular one. The `CombinedVGPRSavings` flag is
removed, and a form of cross-VGPR-RC saving implemented only for unified
RFs, which is where it makes the most sense. Only when the amount of
free VGPRs in a given VGPR RC (ArchVPGR or AGPR) is lower than the
excess VGPR usage in the other VGPR RC does the `RPTarget` consider that
a pressure reduction in the former will be beneficial to the latter.
This fixes the edge case we had with variables pointing to dynamic
blocks, which forced us to convert basically *all* dynamic blocks to
DeadBlock when deallocating them.
We now don't run dynamic blocks through InterpState::deallocate() but
instead add them to a DeadAllocations list when they are deallocated but
still have pointers.
As a consequence, not all blocks with Block::IsDead = true are
DeadBlocks.
Materialize the vector trip count computation using VPInstruction
instead of directly creating IR. This is one of the last few steps
needed to model the full vector skeleton in VPlan. It also simplifies
vector-trip count computations for scalable vectors, as we can re-use
the UF x VF computation.
PR: https://github.com/llvm/llvm-project/pull/151925
## Problem
Given 3 pipelines, A, B, and a superset pipeline AB that runs both the A
& B pipelines, it is not easy to manage their options - one needs to
manually recreate all options from A and B into AB, and maintain them.
This is tedious.
## Proposed solution
Ideally, AB options class inherits from both A and B options, making the
maintenance effortless. Today though, this causes problems as their base
classes `PassPipelineOptions<A>` and `PassPipelineOptions<B>` both
inherit from `mlir::detail::PassOptions`, leading to the so called
"diamond inheritance problem", i.e. multiple definitions of the same
symbol, in this case parseFromString that is defined in
mlir::detail::PassOptions.
Visually, the inheritance looks like this:
```
mlir::detail::PassOptions
↑ ↑
| |
PassPipelineOptions<A> PassPipelineOptions<B>
↑ ↑
| |
AOptions BOptions
↑ ↑
+---------+--------+
|
ABOptions
```
A proposed fix is to use the common solution to the diamond inheritance
problem - virtual inheritance.
The existing functions `getIndexExpressionsFromGEP` and
`tryDelinearizeFixedSizeImpl` provide functionality to delinearize
memory accesses for fixed size array. They use the GEP source element
type in their optimization heuristics. However, driving optimization
heuristics based on GEP type information is not allowed.
This patch introduces new functions `findFixedSizeArrayDimensions` and
`delinearizeFixedSizeArray` to delinearize a fixed size array without
using the type information in GEP. The new function
`findFixedSizeArrayDimensions` infers the size of each dimension of the
array based on the value to be added to the address as induction
variables are incremented. `delinearizeFixedSizeArray` attempts to
restore the subscripts of each dimension based on the estimated array
size.
This is an initial implementation that may not cover all cases, but is
intended to replace the existing function in the future.
Related:
- https://discourse.llvm.org/t/enabling-loop-interchange/82589/4
-
https://github.com/llvm/llvm-project/pull/124911#issuecomment-2962499501