From the discussion at the round-table at the RISC-V Summit it was clear
people see cases where global merging would help. So the direction of
enabling it by default and iteratively working to enable it in more
cases or to improve the heuristics seems sensible. This patch tries to
make a minimal step in that direction.
This is a follow-up for the conversation here
https://github.com/llvm/llvm-project/pull/115722/.
This test is designed for Windows target/PDB format, so it shouldn't be
built and run for DWARF/etc.
Summary:
Address spaces are used in several embedded and GPU targets to describe
accesses to different types of memory. Currently we use the address
space enumerations to control which address spaces are considered
supersets of eachother, however this is also a target level property as
described by the C standard's passing mentions. This patch allows the
address space checks to use the target information to decide if a
pointer conversion is legal. For AMDGPU and NVPTX, all supported address
spaces can be converted to the default address space.
More semantic checks can be added on top of this, for now I'm mainly
looking to get more standard semantics working for C/C++. Right now the
address space conversions must all be done explicitly in C/C++ unlike
the offloading languages which define their own custom address spaces
that just map to the same target specific ones anyway. The main question
is if this behavior is a function of the target or the language.
The implementation is mostly based on the one existing for the exact
flag.
disjoint means that for each bit, that bit is zero in at least one of
the inputs. This allows the Or to be treated as an Add since no carry
can occur from any bit. If the disjoint keyword is present, the result
value of the or is a [poison
value](https://llvm.org/docs/LangRef.html#poisonvalues) if both inputs
have a one in the same bit position. For vectors, only the element
containing the bit is poison.
Emit .ptr, .address-space, and .align attributes for kernel
args in CUDA (previously handled only for OpenCL).
This allows for more vectorization opportunities if the PTX consumer
is able to know about the pointer alignments.
If no alignment is explicitly specified, .align 1 will be emitted
to match the LLVM IR semantics in this case.
PTX ISA doc -
https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#kernel-parameter-attribute-ptr
This is a rework of the original patch proposed in #79646
---------
Co-authored-by: Vandana <vandanak@nvidia.com>
Introduces the SYCL based toolchain and initial toolchain construction
when using the '-fsycl' option. This option will enable SYCL based
offloading, creating a SPIR-V based IR file packaged into the compiled
host object.
This includes early support for creating the host/device object using
the new offloading model. The device object is created using the
spir64-unknown-unknown target triple.
New/Updated Options:
-fsycl Enables SYCL offloading for host and device
-fsycl-device-only
Enables device only compilation for SYCL
-fsycl-host-only
Enables host only compilation for SYCL
RFC Reference:
https://discourse.llvm.org/t/rfc-sycl-driver-enhancements/74092
I ran into this while look at a different bug (patch coming soon). This
function has only two callers. The first is SBTypeStaticField::GetName
(which doesn't care about templates), and the other is
CompilerDecl::GetCompilerContext (in the TypeQuery constructor), which
does want template arguments.
This function was (normally) returning the name without template args.
Since this code is only used when looking up a type in another shared
library, the odds of running into this bug are relatively low, but I add
a test to demonstrate the scenario and the fix for it nonetheless.
Amazingly (and scarily), this test actually passes without this change
in the default configuration -- and only fails with
-gsimple-template-names. The reason for that is that in the
non-simplified case we create a regular CXXRecordDecl whose name is
"bar<int>" (instead of a template record "foo" with an argument of
"int"). When evaluating the expression, we are somehow able to replace
this with a proper template specialization decl.
/llvm-project/clang/lib/StaticAnalyzer/Core/ExprEngineCXX.cpp:48:24:
error: variable 'ThisRD' set but not used [-Werror,-Wunused-but-set-variable]
const CXXRecordDecl *ThisRD = nullptr;
^
1 error generated.
When lowering from a partial reduction to a pair of wide adds,
previously the corresponding intrinsics were returned as nodes. Now
there are AArch64ISD nodes that are returned.
When hoisting the allocas with a constant integer size, the constant
integer was moved to where the alloca is hoisted to unconditionally.
By CodeGen there have been various iterations of mlir canonicalization
and dead code elimination. This can cause lots of unrelated bits of code
to share the same constant values. If for some reason the alloca
couldn't be hoisted all of the way to the entry block of the function,
moving the constant might result in it no-longer dominating some of the
remaining uses.
In theory, there should be dominance analysis to ensure the location of
the constant does dominate all uses of it. But those constants are
effectively free anyway (they aren't even separate instructions in LLVM
IR), so it is less expensive just to leave the old one where it was and
insert a new one we know for sure is immediately before the alloca.
returned by-value from opaque function calls.
If a struct is returned by-value from an opaque call, the "value" of the
whole struct is represented by a Conjured symbol.
Later fields may slice off smaller subregions by creating Derived
symbols of that Conjured symbol, but those are handled well, and
"isTainted" returns true as expected.
However, passing the whole struct to "isTainted" would be false, because
LazyCompoundVals and CompoundVals are not handled.
This patch addresses this.
Fixes#114270
Split from #114835
This is based on other targets like PPC/AArch64 and some experiments.
This PR will only enable bidirectional scheduling and tracking register
pressure.
Disclaimer: I haven't tested it on many cores, maybe we should make
some options being features. I believe downstreams must have tried
this before, so feedbacks are welcome.
## Problem Statement
Previously, the examples in the AST matcher reference, which gets
generated by the Doxygen comments in `ASTMatchers.h`, were untested and
best effort.
Some of the matchers had no or wrong examples of how to use the matcher.
## Solution
This patch introduces a simple DSL around Doxygen commands to enable
testing the AST matcher documentation in a way that should be relatively
easy to use.
In `ASTMatchers.h`, most matchers are documented with a Doxygen comment.
Most of these also have a code example that aims to show what the
matcher will match, given a matcher somewhere in the documentation text.
The way that the documentation is tested, is by using Doxygen's alias
feature to declare custom aliases. These aliases forward to
`<tt>text</tt>` (which is what Doxygen's `\c` does, but for multiple
words). Using the Doxygen aliases is the obvious choice, because there
are (now) four consumers:
- people reading the header/using signature help
- the Doxygen generated documentation
- the generated HTML AST matcher reference
- (new) the generated matcher tests
This patch rewrites/extends the documentation such that all matchers
have a documented example.
The new `generate_ast_matcher_doc_tests.py` script will warn on any
undocumented matchers (but not on matchers without a Doxygen comment)
and provides diagnostics and statistics about the matchers.
The current statistics emitted by the parser are:
```text
Statistics:
doxygen_blocks : 519
missing_tests : 10
skipped_objc : 42
code_snippets : 503
matches : 820
matchers : 580
tested_matchers : 574
none_type_matchers : 6
```
The tests are generated during building, and the script will only print
something if it found an issue with the specified tests (e.g., missing
tests).
## Description
DSL for generating the tests from documentation.
TLDR:
```
\header{a.h}
\endheader <- zero or more header
\code
int a = 42;
\endcode
\compile_args{-std=c++,c23-or-later} <- optional, the std flag supports std ranges and
whole languages
\matcher{expr()} <- one or more matchers in succession
\match{42} <- one or more matches in succession
\matcher{varDecl()} <- new matcher resets the context, the above
\match will not count for this new
matcher(-group)
\match{int a = 42} <- only applies to the previous matcher (not to the
previous case)
```
The above block can be repeated inside a Doxygen command for multiple
code examples for a single matcher.
The test generation script will only look for these annotations and
ignore anything else like `\c` or the sentences where these annotations
are embedded into: `The matcher \matcher{expr()} matches the number
\match{42}.`.
### Language Grammar
[] denotes an optional, and <> denotes user-input
```
compile_args j:= \compile_args{[<compile_arg>;]<compile_arg>}
matcher_tag_key ::= type
match_tag_key ::= type || std || count || sub
matcher_tags ::= [matcher_tag_key=<value>;]matcher_tag_key=<value>
match_tags ::= [match_tag_key=<value>;]match_tag_key=<value>
matcher ::= \matcher{[matcher_tags$]<matcher>}
matchers ::= [matcher] matcher
match ::= \match{[match_tags$]<match>}
matches ::= [match] match
case ::= matchers matches
cases ::= [case] case
header-block ::= \header{<name>} <code> \endheader
code-block ::= \code <code> \endcode
testcase ::= code-block [compile_args] cases
```
### Language Standard Versions
The 'std' tag and '\compile_args' support specifying a specific language
version, a whole language and all of its versions, and thresholds
(implies ranges). Multiple arguments are passed with a ',' separator.
For a language and version to execute a tested matcher, it has to match
the specified '\compile_args' for the code, and the 'std' tag for the
matcher. Predicates for the 'std' compiler flag are used with
disjunction between languages (e.g. 'c || c++') and conjunction for all
predicates specific to each language (e.g. 'c++11-or-later &&
c++23-or-earlier').
Examples:
- `c` all available versions of C
- `c++11` only C++11
- `c++11-or-later` C++11 or later
- `c++11-or-earlier` C++11 or earlier
- `c++11-or-later,c++23-or-earlier,c` all of C and C++ between 11 and
23 (inclusive)
- `c++11-23,c` same as above
### Tags
#### `type`:
**Match types** are used to select where the string that is used to
check if a node matches comes from.
Available: `code`, `name`, `typestr`, `typeofstr`. The default is
`code`.
- `code`: Forwards to `tooling::fixit::getText(...)` and should be the
preferred way to show what matches.
- `name`: Casts the match to a `NamedDecl` and returns the result of
`getNameAsString`. Useful when the matched AST node is not easy to spell
out (`code` type), e.g., namespaces or classes with many members.
- `typestr`: Returns the result of `QualType::getAsString` for the type
derived from `Type` (otherwise, if it is derived from `Decl`, recurses
with `Node->getTypeForDecl()`)
**Matcher types** are used to mark matchers as sub-matcher with 'sub' or
as deactivated using 'none'. Testing sub-matcher is not implemented.
#### `count`:
Specifying a 'count=n' on a match will result in a test that requires
that the specified match will be matched n times. Default is 1.
#### `std`:
A match allows specifying if it matches only in specific language
versions. This may be needed when the AST differs between language
versions.
#### `sub`:
The `sub` tag on a `\match` will indicate that the match is for a node
of a bound sub-matcher.
E.g., `\matcher{expr(expr().bind("inner"))}` has a sub-matcher that
binds to `inner`, which is the value for the `sub` tag of the expected
match for the sub-matcher `\match{sub=inner$...}`. Currently,
sub-matchers are not tested in any way.
### What if ...?
#### ... I want to add a matcher?
Add a Doxygen comment to the matcher with a code example, corresponding
matchers and matches, that shows what the matcher is supposed to do.
Specify the compile arguments/supported languages if required, and run
`ninja check-clang-unit` to test the documentation.
#### ... the example I wrote is wrong?
The test-failure output of the generated test file will provide
information about
- where the generated test file is located
- which line in `ASTMatcher.h` the example is from
- which matches were: found, not-(yet)-found, expected
- in case of an unexpected match: what the node looks like using the
different `type`s
- the language version and if the test ran with a windows `-target` flag
(also in failure summary)
#### ... I don't adhere to the required order of the syntax?
The script will diagnose any found issues, such as `matcher is missing
an example` with a `file:line:` prefix,
which should provide enough information about the issue.
#### ... the script diagnoses a false-positive issue with a Doxygen
comment?
It hopefully shouldn't, but if you, e.g., added some non-matcher code
and documented it with Doxygen, then the script will consider that as a
matcher documentation. As a result, the script will print that it
detected a mismatch between the actual and the expected number of
failures. If the diagnostic truly is a false-positive, change the
`expected_failure_statistics` at the top of the
`generate_ast_matcher_doc_tests.py` file.
Fixes#57607Fixes#63748
The hasNonAffineUsersOnPath utility used during fusion was terribly
inefficient in its approach. Rewrite it efficiently to simply work based
on use lists (sparse) instead of having to traverse all nodes of an MDG
repeatedly and all operands of all ops of each node in the relevant
range.
On large models (with 10s of thousands of loop nests), this reduces
fusion pass time by nearly 2x (cutting down several tens of seconds).
The scalar __mfp8 type has the wrong name and mangle name in
AArch64SVEACLETypes.def
According to the ACLE[1] the name should be __mfp8
This patch fixes this problem by replacing
the Name __MFloat8_t by __mfp8
and
the Mangle Name __MFloat8_t by u6__mfp8
And we revert the incorrect typedef in NeonEmitter.
[1]https://github.com/ARM-software/acle
The newly added
test/API/functionalities/scripted_process_empty_memory_region/dummy_scripted_process.py
imports
examples/python/templates/scripted_process.py
which only has register definitions for x86_64 and arm64.
Only run this test on those two architectures for now.
Add patterns to fold MOV (scalar, predicated) to MOV (imm, pred,
merging) or MOV (imm, pred, zeroing) as appropriate.
This affects the `@llvm.aarch64.sve.dup` intrinsics, which currently
generate MOV (scalar, predicated) instructions even when the
immediate forms are possible. For example:
```
svuint8_t mov_z_b(svbool_t p) {
return svdup_u8_z(p, 1);
}
```
Currently generates:
```
mov_z_b(__SVBool_t):
mov z0.b, #0
mov w8, #1
mov z0.b, p0/m, w8
ret
```
Instead of:
```
mov_z_b(__SVBool_t):
mov z0.b, p0/z, #1
ret
```
This could deduce some complex syms derived from simple ones whose
values could be constrainted to be concrete during execution, thus
reducing some overconstrainted states.
This commit also fix `unix.StdCLibraryFunctions` crash due to these
overconstrainted states being added to the graph, which is marked as
sink node (PosteriorlyOverconstrained). The 'assume' API is used in
non-dual style so the checker should protectively test whether these
newly added nodes are actually impossible.
1. The crash: https://godbolt.org/z/8KKWeKb86
2. The solver needs to solve equivalent: https://godbolt.org/z/ed8WqsbTh
A scripted process implementation might return an SBMemoryRegionInfo
object in its implementation of `get_memory_region_containing_address`
which will have an address 0 and size 0, without realizing the problems
this can cause. Several algorithms in lldb will try to iterate over the
MemoryRegions of the process, starting at address 0 and expecting to
iterate up to the highest vm address, stepping by the size of each
region, so a 0-length region will result in an infinite loop. Add a
check to Process::GetMemoryRegionInfo that rejects a MemoryRegion which
does not contain the requested address; a 0-length memory region will
therefor always be rejected.
rdar://139678032
Consider the case when the compiler generates a static member. Any
consumer of the debug info generated for that case, would benefit if
that member has the DW_AT_artificial flag.
Fix buildbot failure on: llvm-clang-aarch64-darwin
- Add specific configuration: x86_64-linux
Prior to this, the "old != new" check would always evaluate to true because it was comparing a pre-mangling new command to a post-mangling old command.
This patch adds a Github action that runs whenever changes to the libc++
Docker images are pushed to `main`. The action will rebuild the Docker
images and push them to LLVM's container registry so that we can then
point to those images from our CI nodes.
Building `MLIRCIR` will report an error `CIROpsDialect.h.inc` not found.
This is because `MLIRCIR` hasn't declared its dependence on the tablegen
target `MLIRCIROpsIncGen`. This patch fixes the issue.