The library implementation is just a wrapper around a call to the
intrinsic, but loses metadata. Swap out the call site to the intrinsic
so that the lowering can see the !fpmath metadata and fast math flags.
Since d56e0d07cc5ee8e334fd1ad403eef0b1a771384f, clang started placing
!fpmath on OpenCL library sqrt calls. Also don't bother emitting
native_sqrt anymore, it's just another wrapper around llvm.sqrt.
This was caught by coverity, reported as: `dead_error_condition`.
Since the conditional revolves around `CF`, it is guaranteed to be null
in the else clause, hence making the second part of the statement
redundant.
This was requiring all fast math flags, which is practically
useless. This wouldn't fire using all the standard OpenCL fast math
flags. This only needs afn nnan and ninf.
https://reviews.llvm.org/D158904
Apparently the spec has overloads for fmin/fmax and ldexp with one of
the operands as scalar. We need to broadcast the scalars to the vector
type.
https://reviews.llvm.org/D158077
These just get replaced with an intrinsic now. This was also
introducing host dependence on the result since it relied on the
compiler choice to contract or not.
OpenCL loses fast math information by going through libcall wrappers
around intrinsics.
Do this to preserve call site flags which are lost when inlining. It's
not safe in general to propagate flags during inline, so avoid dealing
with this by just special casing some of the useful calls.
The comment was out of date, the device libs build does provide all
the pointer overloads. An extremely pedantic interpretation of the
spec would suggest only the flat version exists, but the overloads do
exist in the implementation.
https://reviews.llvm.org/D156720
Match how the generic implementation handles this. We now will leave
behind the dead other user for later passes to deal with.
https://reviews.llvm.org/D156707
After the library is linked and trivially inlined, the generic fma and
fmuladd intrinsics already handle these cases, and with precise flag
handling. This was requiring all fast math flags when we really just
need nsz for the fma(a, b, 0) case.
https://reviews.llvm.org/D156677
This was trying to constant fold these calls, and also turn some of
them into a regular fmul/fdiv. There's no point to doing that, the
underlying library implementation should be using those in the first
place. Even when the library does use the rcp intrinsics, the backend
handles constant folding of those. This was also only performing the
folds under overly strict fast-evertyhing-is-required conditions.
The one possible plus this gained over linking in the library is if
you were using all fast math flags, it would propagate them to the new
instructions. We could address this in the library by adding more fast
math flags to the native implementations.
The constant fold case also had no test coverage.
https://reviews.llvm.org/D156676
It's not really correct to implement this as exp(x) - 1, it was maybe
OK for the restricted float-as-double case handled here. There's not a
strong reason to special case it with the host function, as the
implementation naturally constant folds anyway. InstSimplify can fully
handle everything in it, so just running the inliner alone with a
constant argument produces the fully constant folded result. This also
had no tests.
https://reviews.llvm.org/D156892
The first trivial example I tried failed to merge due to the user scan
logic. Remove the complicated scan of users handling with distance
thresholds, with a same block restriction. The actual expansion of
sincos is basically the same size as sin or cos individually. Copy the
technique the generic optimization uses, which is to just use the
input instruction as the insert point or just insert at the start of
the entry block.
https://reviews.llvm.org/D156706
It's not a libcall so doesn't really belong here to begin
with. Relying on checking the target name and explicit features isn't
particularly sound either. The library doesn't use the intrinsic
anymore, so it doesn't matter anyway.
This was trying to find a loaded value for some reason when looking
for sincos arguments. This is untested and shouldn't be necessary.
https://reviews.llvm.org/D156746
Early exit on intrinsics and don't duplicate indirect call
checks. Also let the IRBuilder constructor figure out the insert point
rather than doing it manually. Also avoid debug print about trying to
simplify calls in more unhandled scenarios.
This is guaranteed to be evaluated so we can avoid repeated calls.
Helps the static analyzer as it couldn't recognise that each getVecSize() would return the same value.
Based on the output of include-what-you-use.
This is a big chunk of changes. It is very likely to break downstream code
unless they took a lot of care in avoiding hidden ehader dependencies, something
the LLVM codebase doesn't do that well :-/
I've tried to summarize the biggest change below:
- llvm/include/llvm-c/Core.h: no longer includes llvm-c/ErrorHandling.h
- llvm/IR/DIBuilder.h no longer includes llvm/IR/DebugInfo.h
- llvm/IR/IRBuilder.h no longer includes llvm/IR/IntrinsicInst.h
- llvm/IR/LLVMRemarkStreamer.h no longer includes llvm/Support/ToolOutputFile.h
- llvm/IR/LegacyPassManager.h no longer include llvm/Pass.h
- llvm/IR/Type.h no longer includes llvm/ADT/SmallPtrSet.h
- llvm/IR/PassManager.h no longer includes llvm/Pass.h nor llvm/Support/Debug.h
And the usual count of preprocessed lines:
$ clang++ -E -Iinclude -I../llvm/include ../llvm/lib/IR/*.cpp -std=c++14 -fno-rtti -fno-exceptions | wc -l
before: 6400831
after: 6189948
200k lines less to process is no that bad ;-)
Discourse thread on the topic: https://llvm.discourse.group/t/include-what-you-use-include-cleanup
Differential Revision: https://reviews.llvm.org/D118652