As like 48d868493fa74025e7768afacdbbbd3ea9c82468, `WORKSPACE` is another
bazel-specific file that is unused. LLVM's bazel configuration is
entirely in `utils/bazel`.
The code that verifies that the type in a TYPE IS or CLASS IS clause is
a match or an extension of the type of the SELECT TYPE selector needs
rework to avoid emitting a bogus error for a test.
Fixes https://github.com/llvm/llvm-project/issues/83612.
When the actual MOLD= argument of a reference to the intrinsic function
NULL is itself just NULL() (possibly nested), treat the MOLD= as if it
had not been present.
Fixes https://github.com/llvm/llvm-project/issues/83572.
The check for declarations of polymorphic entities was emitting a bogus
error for one (or more) layers of pointers to procedures returning
pointers to polymorphic types.
Fixes https://github.com/llvm/llvm-project/issues/83292.
Currently, calls to Host::SystemLog print to stderr on all host
platforms except Darwin. This severely limits its value on the command
line, where we don't want to overload the user with log messages. Switch
to using the syslog function on POSIX systems to send messages to the
system logger instead of stdout.
On Darwin systems this sends the log message to os_log, which matches
what we do today. Nevertheless I kept the current implementation that
uses os_log directly as it gives us more freedom.
I'm not sure if there's an equivalent on Windows, so I kept the existing
behavior of logging to stderr.
Before this patch, if a module fails to build because of a missing
config_macro, the user will never see the config macro warning. This
patch diagnoses this before building, and each subsequent time a module
is imported.
rdar://123921931
It's the only thing depending on third-party/benchmark.
The recent third-party/benchmark uprev made it not build in the
GN build, so remove ScudoBenchmarks until someone feels motivated
to update the third-party/benchmark BUILD.gn file.
This PR implements the frontend for #70076
This PR is part 1 of 2.
Part 2 requires an intrinsic to instructions lowering.
- `Builtins.td` - add an `any` builtin
- `CGBuiltin.cpp` add the builtin to intrinsic lowering
- `hlsl_basic_types.h` -add the `bool` vectors since that is an input
for any
- `hlsl_intrinsics.h` - add the `any` api
- `SemaChecking.cpp` - addy `any` builtin checking
- `IntrinsicsDirectX.td` - add the llvm intrinsic
This change implements: #70072
- `hlsl_intrinsics.h` - add the `exp` api
- `DXIL.td` - add the llvm intrinsic to DXIL opcode lowering mapping.
- This change reuses llvm's existing intrinsic
`__builtin_elementwise_exp` \ `int_exp` & `__builtin_elementwise_exp2` \
`int_exp2`
- This PR is part 1 of 2.
- Part 2 requires an intrinsic to instructions lowering.
Part2 will expand `int_exp` to
```
A = Builder.CreateFMul(log2eConst, val);
int_exp2(A)
```
just like we do in
[TranslateExp](https://github.com/microsoft/DirectXShaderCompiler/blob/main/lib/HLSL/HLOperationLower.cpp#L2220C1-L2236C2)
This change implements #83736
The dot product lowering needs a tertiary multipy add operation. DXIL
has three mad opcodes for `fmad`(46), `imad`(48), and `umad`(49). Dot
product in DXIL only uses `imad`\ `umad`, but for completeness and
because the hlsl `mad` intrinsic requires it `fmad` was also included.
Two new intrinsics were needed to be created to complete this change.
the `fmad` case already supported by llvm via `fmuladd` intrinsic.
- `hlsl_intrinsics.h` - exposed mad api call.
- `Builtins.td` - exposed a `mad` builtin.
- `Sema.h` - make `tertiary` calls check for float types optional.
- `CGBuiltin.cpp` - pick the intrinsic for singed\unsigned & float also
reuse `int_fmuladd`.
- `SemaChecking.cpp` - type checks for `__builtin_hlsl_mad`.
- `IntrinsicsDirectX.td` create the two new intrinsics for
`imad`\`umad`/
- `DXIL.td` - create the llvm intrinsic to `DXIL` opcode mapping.
---------
Co-authored-by: Farzon Lotfi <farzon@farzon.com>
This improves overall analysis for minbitwidth in SLP. It allows to
analyze the trees with store/insertelement root nodes. Also, instead of
using single minbitwidth, detected from the very first analysis stage,
it tries to detect the best one for each trunc/ext subtree in the graph
and use it for the subtree.
Results in better code and less vector register pressure.
Metric: size..text
Program size..text
results results0 diff
test-suite :: SingleSource/Benchmarks/Adobe-C++/simple_types_loop_invariant.test 92549.00 92609.00 0.1%
test-suite :: External/SPEC/CINT2017speed/625.x264_s/625.x264_s.test 663381.00 663493.00 0.0%
test-suite :: External/SPEC/CINT2017rate/525.x264_r/525.x264_r.test 663381.00 663493.00 0.0%
test-suite :: MultiSource/Benchmarks/Bullet/bullet.test 307182.00 307214.00 0.0%
test-suite :: External/SPEC/CFP2017speed/638.imagick_s/638.imagick_s.test 1394420.00 1394484.00 0.0%
test-suite :: External/SPEC/CFP2017rate/538.imagick_r/538.imagick_r.test 1394420.00 1394484.00 0.0%
test-suite :: External/SPEC/CFP2017rate/510.parest_r/510.parest_r.test 2040257.00 2040273.00 0.0%
test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 12396098.00 12395858.00 -0.0%
test-suite :: External/SPEC/CINT2006/445.gobmk/445.gobmk.test 909944.00 909768.00 -0.0%
SingleSource/Benchmarks/Adobe-C++/simple_types_loop_invariant - 4 scalar
instructions remain scalar (good).
Spec2017/x264 - the whole function idct4x4dc is vectorized using <16
x i16> instead of <16 x i32>, also zext/trunc are removed. In other
places last vector zext/sext removed and replaced by
extractelement + scalar zext/sext pair.
MultiSource/Benchmarks/Bullet/bullet - reduce or <4 x i32> replaced by
reduce or <4 x i8>
Spec2017/imagick - Removed extra zext from 2 packs of the operations.
Spec2017/parest - Removed extra zext, replaced by extractelement+scalar
zext
Spec2017/blender - the whole bunch of vector zext/sext replaced by
extractelement+scalar zext/sext, some extra code vectorized in smaller
types.
Spec2006/gobmk - fixed cost estimation, some small code remains scalar.
Reviewers: RKSimon
Reviewed By: RKSimon
Pull Request: https://github.com/llvm/llvm-project/pull/78976
It's becoming potentially unsafe to insert a PHI instruction using a plain
Instruction pointer. Switch all the remaining sites that create and insert
PHIs to use iterators instead. For example, the code in
ComplexDeinterleavingPass.cpp is definitely at-risk of mixing PHIs and
debug-info.
Debug info refering to a TLS variable via DW_OP_GNU_push_tls_address
needs to use a R_390_TLS_LDO64 relocation instead of R_390_64.
Fixed by adding a SystemZELFTargetObjectFile override class and proving
a getDebugThreadLocalSymbol implementation.
When ldp-aligned-only/stp-aligned-only is specified, modified to cancel
ldp/stp transformation if MachineMemOperand is not present or the access
size is unknown.
In the previous implementation, the test passed when there was no
MachineMemOperand. Also, if the size was unknown, an incorrect value was
used or an assertion failed. (But actually, if there is no
MachineMemOperand, it will be excluded from the target by
isCandidateToMergeOrPair() before reaching the part.)
A statistic NumFailedAlignmentCheck is added. NumPairCreated is modified
so that it only counts if it is not cancelled.
#56628 changed the behavior of `-Wmissing-field-initializers`, which
introduces many new warnings in C++ code that uses partial designated
initializers. If such code is being built with `-Wextra -Werror`, this
change will break the build.
This PR adds a new flag that allows to disable these new warnings and
keep the old ones, as was suggested by @AaronBallman in the original
issue:
https://github.com/llvm/llvm-project/issues/56628#issuecomment-1761510850
Fixes #68933
This patch seeks to create a process that happens on module finalization
for OpenMP, in which a list of operations that had declare target
directives applied to them and were not generated at the time of
processing the original declare target directive are re-checked to apply
the appropriate declare target semantics.
This works by maintaining a vector of declare target related data inside
of the FIR converter, in this case the symbol and the two relevant
unsigned integers representing the enumerators. This vector is added to
via a new function called from Bridge.cpp, insertDeferredDeclareTargets,
which happens prior to the processing of the directive (similarly to
getDeclareTargetFunctionDevice currently for requires), it effectively
checks if the Operation the declare target directive is applied to
currently exists, if it doesn't it appends to the vector. This is a
seperate function to the processing of the declare target via the
overloaded genOMP as we unfortunately do not have access to the list
without passing it through every call, as the AbstractConverter we pass
will not allow access to it (I've seen no other cases of casting it to a
FirConverter, so I opted to not do that).
The list is then processed at the end of the module in the
finalizeOpenMPLowering function in Bridge by calling a new function
markDelayedDeclareTargetFunctions which marks the latently generated
operations. In certain cases, some still will not be generated, e.g. if
an interface is defined, marked as declare target, but has no definition
or usage in the module then it will not be emitted to the module, so due
to these cases we must silently ignore when an operation has not been
found via it's symbol.
The main use-case for this (although, I imagine there is others) is for
processing interfaces that have been declared in a module with a declare
target directive but do not have their implementation defined in the
same module. For example, inside of a seperate C++ module that will be
linked in. In cases where the interface is called inside of a target
region it'll be marked as used on device appropriately (although,
realistically a user should explicitly mark it to match the
corresponding definition), however, in cases where it's used in a
non-clear manner through something like a function pointer passed to an
external call we require this explicit marking, which this patch adds
support for (currently will cause the compiler to crash).
This patch also adds documentation on the declare target process and
mechanisms within the compiler currently.
Fixes:
libc/src/string/memory_utils/utils.h:345:13: warning: invalid case style
for member 'offset_' [readability-identifier-naming]
Having a trailing underscore for members is a google3 style, not LLVM style.
Removing the underscore is insufficient, as we would then have 2 members with
the same identifier which is not allowed (it is a compile time error). Remove
the getter, and just access the renamed member that's now made public.
Found via:
$ ninja -k2000 libc-lint 2>&1 | grep readability-identifier-naming
Auto fixed via:
$ clang-tidy -p build/compile_commands.json \
-checks="-*,readability-identifier-naming" \
<filename> --fix
This doesn't fix all instances, just the obvious simple cases where it makes
sense to change the identifier names. Subsequent PRs will fix up the
stragglers.
It's common to delete some instructions after using SCEVExpander,
while it is still live (but will not be used afterwards). In that
case, the AssertingVH may trigger. Replace it with a PoisoningVH
so that we only detect the case where the SCEVExpander actually is
used in a problematic fashion after the instruction removal.
The alternative would be to add clear() calls to more code paths.
Fixes https://github.com/llvm/llvm-project/issues/83404.
As part of the RemoveDIs project we need LLVM to insert instructions using
iterators wherever possible, so that the iterators can carry a bit of
debug-info. This commit implements some of that by updating the contents of
llvm/lib/Transforms/Utils to always use iterator-versions of instruction
constructors.
There are two general flavours of update:
* Almost all call-sites just call getIterator on an instruction
* Several make use of an existing iterator (scenarios where the code is
actually significant for debug-info)
The underlying logic is that any call to getFirstInsertionPt or similar
APIs that identify the start of a block need to have that iterator passed
directly to the insertion function, without being converted to a bare
Instruction pointer along the way.
Noteworthy changes:
* FindInsertedValue now takes an optional iterator rather than an
instruction pointer, as we need to always insert with iterators,
* I've added a few iterator-taking versions of some value-tracking and
DomTree methods -- they just unwrap the iterator. These are purely
convenience methods to avoid extra syntax in some passes.
* A few calls to getNextNode become std::next instead (to keep in the
theme of using iterators for positions),
* SeparateConstOffsetFromGEP has it's insertion-position field changed.
Noteworthy because it's not a purely localised spelling change.
All this should be NFC.
Arithmetic constants for vector types can be constructed from objects
implementing Python buffer protocol such as `array.array`. Note that
until Python 3.12, there is no typing support for buffer protocol
implementers, so the annotations use array explicitly.
This allows users who include `llvm-project` as a subrepository to
access the `LLVM_VERSION_MAJOR` variable on par with if they were
relying on an installed LLVM and `FindLLVM.cmake`. They just need to do
something like:
```
get_directory_property(LLVM_VERSION_MAJOR DIRECTORY "third_party/llvm-project/llvm" LLVM_VERSION_MAJOR)
```
Context: https://github.com/openxla/iree/pull/16606 -- like other
projects with similar needs that I found by some googling, our
work-around had been to rely on the CMake cached variable
`CLANG_EXECUTABLE_VERSION`. Being cached, it over time inevitably ended
up having a wrong value.
This is a follow up of #83004, which made the same change for
`MLIR_CUDA_CONVERSIONS_ENABLED`. As the previous PR, this PR commit
exposes mentioned CMake variable through `mlir-config.h` and uses the
macro that is introduced with the same name. This replaces the macro
`MLIR_ROCM_CONVERSIONS_ENABLED`, which the CMake files previously
defined manually.
… AAPCS frame chain fix (#82801)"
This reverts commit 00e4a4197137410129d4725ffb82bae9ce44bdde. This patch
was found to cause miscompilations and compilation failures.
762f762504/llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp (L394-L407)
Comment from @topperc:
> This transforms assumes the mask is a non-zero splat. We only know its
a splat and not provably all 0s. The mask is a constexpr that includes
the address of the global variable. We can't resolve the constant
expression to an exact value.
Fixes#83947.