ArrayRef has a constructor that accepts std::nullopt. This
constructor dates back to the days when we still had llvm::Optional.
Since the use of std::nullopt outside the context of std::optional is
kind of abuse and not intuitive to new comers, I would like to move
away from the constructor and eventually remove it.
This patch migrates away from TypeRagne(std::nullopt) and
ValueRange(std::nullopt).
The C API does not provide a way to replace the subroutine type after
creating a subprogram. This functionality is useful for creating a
subroutine type composed of types which have the subprogram as scope
Motivation: When using libc++, `std::bitset<64>::count()` doesn't
optimize to a single popcount instruction on AArch64, because we fail to
inline the library code completely. Inlining fails, because the internal
bit_iterator struct is passed as a [2 x i64] %arg value on AArch64. The
value is built using insertvalue instructions and only one of the array
entries is constant. If we know that this entry is constant, we can
prove that half the function becomes dead. However, InlineCost only
considers operands for simplification if they are Constants, which %arg
is not. Without this simplification the function is too expensive to
inline.
Therefore, we had to teach InlineCost to support non-Constant simplified values
(PR #145083). Now, we enable this for extractvalue, because we want to simplify
the extractvalue with the insertvalues from the caller function. This is enough to
get bitset::count fully optimized.
There are similar opportunities we can explore for BinOps in the future
(e.g. cmp eq %arg1, %arg2 when the caller passes the same value into
both arguments), but we need to be careful here, because InstSimplify
isn't completely safe to use with operands owned by different functions.
flang incorrectly issues a semantic erorr when a `cycle` statement is
used inside a `target teams distribute [simd]` associated loop. This is
not prevented by the spec, therefore this PR allows such construct.
These FIXMEs were added to keep the dbg_record implementation identical to the
dbg intrinsic versions, which have since been removed. I don't think there's any
reason for the old behaviour; my understanding is it was a minor bug no one got
round to fixing.
I've upgraded the test to be written with dbg_records while I'm here.
This patch reassociates `add(add(vecreduce(a), b), add(vecreduce(c),
d))` into `add(vecreduce(add(a, c)), add(b, d))`, to combine the
reductions into a single node. This comes up after unrolling vectorized
loops.
There is another small change to move reassociateReduction inside fadd
outside of a AllowNewConst block, as new constants will not be created
and it should be OK to perform the combine later after legalization.
This patch fixes a bug discovered in the
`affine::makeComposedFoldedAffineApply` function when `composeAffineMin
== true`. The bug happened because the simplification assumed the
symbols appearing in the `affine.apply` op corresponded to symbols in
the `affine.min` op, and that's not always the case. For example:
```mlir
#map = affine_map<()[s0, s1] -> (s1)>
#map1 = affine_map<()[s0, s1] -> (s0 ceildiv s1)>
module {
func.func @min_max_full_simplify() -> index {
%0 = test.value_with_bounds {max = 64 : index, min = 32 : index}
%1 = test.value_with_bounds {max = 64 : index, min = 32 : index}
%2 = affine.min #map()[%0, %1]
%3 = affine.apply #map1()[%2, %0]
return %3 : index
}
}
```
This patch also introduces the test `make_composed_folded_affine_apply`
transform operation to test this simplification. It also adds tests
ensuring we get correct behavior.
---------
Co-authored-by: Nicolas Vasilache <nico.vasilache@amd.com>
This PR introduce a new function `peekNextPPToken`. It's an extension of
`isNextPPTokenLParen` and can makes look ahead one token in preprocessor
without side-effects.
It's also the 1st part of
https://github.com/llvm/llvm-project/pull/107168 and it was used to look
ahead next token then determine whether current lexing pp directive is
one of pp-import or pp-module directive.
At the start of phase 4 an import or module token is treated as starting
a directive and are converted to their respective keywords iff:
- After skipping horizontal whitespace are
- at the start of a logical line, or
- preceded by an export at the start of the logical line.
- Are followed by an identifier pp token (before macro expansion), or
- <, ", or : (but not ::) pp tokens for import, or
- ; for module
Otherwise the token is treated as an identifier.
---------
Signed-off-by: yronglin <yronglin777@gmail.com>
This lets get rid of platform-specific code in ProcessGDBRemote and use
the
same code path (module differences in socket types) everywhere. It also
unlocks
further cleanups in the debugserver launching code.
The main effect of this change is that lldb on windows will now use the
`--fd` lldb-server argument for "local remote" debug sessions instead of
having lldb-server connect back to lldb. This is the same method used by
lldb on non-windows platforms (for many years) and "lldb-server
platform" on windows for truly remote debug sessions (for ~one year).
Depends on #145015.
If StreamingSVE is available, we may be able to lower the intrinsic
to the GET_ACTIVE_LANE_MASK node instead of expanding it.
Also adds the node to addTypeForFixedLengthSVE to ensure we lower
to the SVE instruction when useSVEForFixedLengthVectors is true.
This patch fixes an issue where the __trampoline_setup symbol is missing
with some programs compiled with flang. This symbol is present only in
compiler-rt and not in libgcc. This patch adds compiler-rt to the link
line after libgcc if libgcc is being used, so that only this symbol will
be picked from compiler-rt.
Fixes#141147
Move the OR(PSHUFB(),PSHUFB()) fold to reuse an existing
createShuffleMaskFromVSELECT result and ensure it is performed before
the combineX86ShufflesRecursively combine to prevent some hasOneUse
failures noticed in #133947 (combineX86ShufflesRecursively still
unnecessarily widens vectors in several locations).
Originally added for reproducers, it is now only used for test code.
While we could make it a test helper, I think that after #145015 it is
simple enough to not be needed.
Also squeeze in a change to make ConnectionFileDescriptor accept a
unique_ptr<Socket>.
For readonly/readnone calls returning void we can't CSE the return
value. However, making these participate in CSE is still useful,
because it allows DCE of calls that are not willreturn/nounwind
(something no other part of LLVM is capable of removing).
The more interesting use-case is CSE for writeonly calls (not
yet supported), but I figured this change makes sense independently.
There is no impact on compile-time.
We should be able to allow `simplifySwitchOfPowersOfTwo` transform
to take place, as, on recent X86 targets, the weighted latency-size
appears to be 2. This favours computing trailing zeroes and indexing
into a smaller value table, over generating a jump table with an
indirect branch, which overall should be more efficient.
When a block is getting inlined, the destination block does not have to
be legalized. That's because the signature of the destination block does
not change by inlining.
This commit makes the implementation consistent with this comment:
```
// If the pattern moved or created any blocks, make sure the types of block
// arguments get legalized.
```
When -stack-clash-protection option is specified and APX push2pop2 is
enabled, there will be two calls to emitSPUpdate function which emits
two STACKALLOC_W_PROBING pseudo instructions. The pseudo instruction for
push2 padding is silently ignored which leads to the stack misaligned to
16 bytes and GP exception in runtime. Fixed by directly emitting "push
%rax" instruction for push2 padding, instead of calling emitSPUpdate.
There was a similar issue on https://reviews.llvm.org/D150033.
This patch moves the TMA S2G intrinsics into their own set of loops.
This is in preparation for adding im2colw/w128 modes support to
the G2S intrinsics (but the S2G ones do not support those modes).
Signed-off-by: Durgadoss R <durgadossr@nvidia.com>