Seemingly I forgot to implement the appertainment checks when doing the
original device_type implementation, so we fell through to the 'not
implemented' section of the diagnostics.
This patch corrects the appertainment, so that we disallow it correctly.
This argument allows setting a specific sysroot path, which will be used for
building LLDB API test programs.
It might come in handy for setting up cross-platform remote runs of API
tests on a Windows host.
It can be useful for cross-compiling LLDB API tests. The argument can be
set through the `LLDB_TEST_USER_ARGS` CMake variable:
```
cmake ...
-DLLDB_TEST_USER_ARGS="...;--sysroot;C:\path\to\sysroot;..."
...
```
In some cases where we have an `hlfir.no_reassoc` operation, the
bufferization pass could not erase the `hlfir.destroy` op during the
`hlfir.associate` op conversion, as shown in the example below.
```
func.func @double_free(%arg0: !fir.boxchar<1>) {
%c5 = arith.constant 5 : index
%true = arith.constant true
%0 = hlfir.as_expr %arg0 move %true : (!fir.boxchar<1>, i1) -> !hlfir.expr<!fir.char<1,?>>
%1 = hlfir.no_reassoc %0 : !hlfir.expr<!fir.char<1,?>>
%2:3 = hlfir.associate %1 typeparams %c5 {adapt.valuebyref} : (!hlfir.expr<!fir.char<1,?>>, index) -> (!fir.boxchar<1>, !fir.ref<!fir.char<1,?>>, i1)
fir.call @noop(%2#0) : (!fir.boxchar<1>) -> ()
hlfir.end_associate %2#1, %2#2 : !fir.ref<!fir.char<1,?>>, i1
hlfir.destroy %0 : !hlfir.expr<!fir.char<1,?>>
return
}
func.func private @noop(!fir.boxchar<1>)
```
The bufferization pass only looks at the uses of the `hlfir.associate`
source `%1`, which is the result of an `hlfir.no_reassoc` operation, so it
misses the `hlfir.destroy` of `%0`. To avoid generating a double free,
also look through the indirection introduced by `hlfir.no_reassoc`.
1. Use dashes (-) instead of colons (:) as the time separator in the session
log file name, since Windows doesn't support saving files with names
containing colons.
2. Temporary file creation code is changed in the test:
On Windows, the temporary file should be closed before 'session save'
writes the session log to it. NamedTemporaryFile() can preserve the file
after closing it with the delete_on_close=False option.
However, this option is only available since Python 3.12. Thus
mkstemp() is used for temporary file creation as the more compatible
option.
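For illustration, the two changes can be sketched in Python roughly as follows. This is a minimal sketch and not the code from the patch: the helper names `session_log_name` and `make_closed_temp_file`, and the exact file name format, are assumptions.

```python
import datetime
import os
import tempfile


def session_log_name(now: datetime.datetime) -> str:
    # Hypothetical name format. Dashes separate the time fields because
    # Windows does not allow ':' in file names.
    return now.strftime("lldb-session-%Y-%m-%dT%H-%M-%S.log")


def make_closed_temp_file() -> str:
    # mkstemp() returns an open OS-level descriptor plus the file path.
    # Closing the descriptor keeps the file on disk, so another writer can
    # open it on Windows. The caller is responsible for removing the file.
    fd, path = tempfile.mkstemp()
    os.close(fd)
    return path
```

Unlike `NamedTemporaryFile(delete_on_close=False)`, this does not require Python 3.12, at the cost of manual cleanup with `os.remove(path)`.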
The exec.f90 test sets an environment variable for a specific command
directly rather than using env, which doesn't work on shells that don't
support this syntax, most notably the LIT integrated shell. This patch
simply adds env so that this works on the integrated shell.
Often on AVX1 we're better off consistently using 128-bit instructions, so recognise when the operands are loads that can be freely/cheaply split - ideally this functionality needs to be moved to isFreeToSplitVector but we're using it in a few places where we don't want to split loads yet.
Based off a regression reported after #92794
Now that the VPlan for the main vector loop gets cloned in the epilogue
vectorization code path, optimizeForVFAndUF can be applied
unconditionally.
Improve hasNonDefaultLowerBounds to look through fir.convert of boxes. This
helps HLFIR helpers generate less code when it can easily be deduced that
the fir.box lower bounds were set to one.
This will help SELECT RANK lowering avoid generating hlfir.declare with
lower bounds inside the RANK CASE (the current situation is not incorrect:
the lower bounds would be SSA values that end up being one; I just want
simpler IR).
Renamed to mayHaveNonDefaultLowerBounds since it may still answer yes when
the lower bounds are ones.
This patch adds processing of min/max intrinsics in LoopPeel in a way
similar to what was done for conditional statements: for
min/max(IterVal, BoundVal) we peel iterations where IterVal < BoundVal
for monotonically increasing IterVal; for monotonically decreasing
IterVal we peel iterations where IterVal > BoundVal (strict comparison
predicates are used to minimize the number of peeled iterations).
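As a rough illustration of the increasing case, here is a plain Python model rather than LLVM IR. The helpers `peel_count_increasing` and `check_after_peel` are hypothetical and are not the LoopPeel code; the model assumes `IterVal = start + i * step` with `step > 0`. Once the iterations with `IterVal < BoundVal` are peeled, `min(IterVal, BoundVal)` is always `BoundVal` and `max(IterVal, BoundVal)` is always `IterVal` in the remaining loop.

```python
def peel_count_increasing(start: int, step: int, bound: int, trip_count: int) -> int:
    # Count the leading iterations where IterVal < BoundVal (strict compare),
    # assuming IterVal = start + i * step with step > 0.
    assert step > 0
    count = 0
    while count < trip_count and start + count * step < bound:
        count += 1
    return count


def check_after_peel(start: int, step: int, bound: int, trip_count: int) -> int:
    # Verify that min/max are invariant in form for the remaining iterations.
    peeled = peel_count_increasing(start, step, bound, trip_count)
    for i in range(peeled, trip_count):
        iter_val = start + i * step
        assert min(iter_val, bound) == bound
        assert max(iter_val, bound) == iter_val
    return peeled
```

The iteration where `IterVal == BoundVal` is not peeled, since min and max already give the same result there; this is why the strict predicate yields the smaller peel count.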
Updated the documentation in `checkers.rst` to include an example of how
the `trylock` function is handled.
Added a new test for a scenario where `pthread_mutex_trylock` is used,
demonstrating the current limitation.
Prefer using the versioned `llvm-spirv-<LLVM_VERSION_MAJOR>` tool (e.g.
`llvm-spirv-18`) over plain `llvm-spirv`. If the versioned tool is not
found in PATH, fall back to the plain `llvm-spirv`.
An issue with using plain `llvm-spirv` is that the one found in PATH might
be compiled against an older LLVM version, which could lead to crashes or
obscure bugs. For example, the `llvm-spirv` distributed by Ubuntu links
against a different LLVM version depending on the Ubuntu release (LLVM-10
in 20.04LTS, LLVM-13 in 22.04LTS).
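The lookup order can be sketched in Python as below. This only models the order described above and is not the driver code; the function name `find_llvm_spirv` and the injectable `which` parameter are assumptions made for illustration.

```python
import shutil
from typing import Callable, Optional


def find_llvm_spirv(
    llvm_version_major: int,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Optional[str]:
    # Try the versioned name first, e.g. "llvm-spirv-18".
    versioned = which(f"llvm-spirv-{llvm_version_major}")
    if versioned is not None:
        return versioned
    # Fall back to the unversioned tool; None if neither is found in PATH.
    return which("llvm-spirv")
```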
The pass constructor can be generated automatically by tablegen.
This pass does not need adapting to work with non-function top level
operations because it operates specifically on call operations inside of
an OpenMP declare target function.
This reverts commit e1cc9e4eaddcc295b4e775512e33b947b1514c17.
This causes some non-trivial text size increases in unoptimized
builds for Bullet. Revert while I investigate.
Because symbols cannot refer to operations outside of their symbol
tables, it was impossible to refer to operations outside of the dialect
currently being defined. This PR modifies the lookup logic to happen
relative to the symbol table containing the dialect-defining operations.
This is a bit of a hack but should unblock the situation here.
I'd like to nominate myself to join the LLVM Security group as a
representative of ST. I work in ST's compiler team contributing to
upstream (LLVM and GNU) and several downstream toolchains. We believe
that it is important for us to be part of this group to address or
report any potential security issues the LLVM project or our toolchains
may encounter.
This fold is subtly incorrect, because DL-unaware constant folding does
not know the correct index type to use, and just performs the addition
in the type that happens to already be there. This is incorrect, since
sext(X)+sext(Y) is generally not the same as sext(X+Y). See the
`@constexpr_gep_of_gep_with_narrow_type()` test for a miscompile with the
current implementation.
One could try to restrict the fold to cases where no overflow occurs,
but I'm not bothering with that here, because the DL-aware constant
folding will take care of this anyway. I've only kept the
straightforward zero-index case, where we just concatenate two GEPs.
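To see why the two expressions differ, here is a small Python model of the arithmetic. The 8-bit width and the values are assumptions chosen for illustration and are not taken from the test; `sext` and `add_wrapping` are hypothetical helpers.

```python
def sext(value: int, from_bits: int) -> int:
    # Interpret the low from_bits bits of value as a two's-complement
    # signed integer, i.e. sign-extend to an unbounded Python int.
    value &= (1 << from_bits) - 1
    sign_bit = 1 << (from_bits - 1)
    return (value ^ sign_bit) - sign_bit


def add_wrapping(x: int, y: int, bits: int) -> int:
    # Addition performed in the narrow type, wrapping on overflow.
    return (x + y) & ((1 << bits) - 1)


x, y = 100, 100  # two i8 indices

# Extending first and then adding keeps the mathematically expected offset.
wide_sum = sext(x, 8) + sext(y, 8)            # 200

# Adding in i8 first overflows, so the extended result is different.
narrow_sum = sext(add_wrapping(x, y, 8), 8)   # -56
```

The two results agree only when the narrow addition does not overflow, which is the restriction mentioned above.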
Currently, the tablegen files that generate the instruction definitions
in lib/Target/AMDGPU/AMDGPUGenInstrInfo.inc often only include implicit
operands for the architecture-independent pseudo instructions, but not
for the corresponding real instructions. The missing implicit operands
(most prominently: the EXEC mask) do not affect code generation, since
that operates on pseudo instructions, but they are problematic when
working with real instructions, e.g., as a decoding result from the MC
layer.
This patch copies the implicit Defs and Uses from pseudo instructions to
the corresponding real instructions, so that implicit operands are also
defined for real instructions.
Addresses issue #89830.
Assumed-rank fir.box/class may describe an assumed-size array. This case
needs special handling in SELECT RANK. It is not possible to generate
FIR code to detect that a fir.box is an assumed-size (the way to detect
that is to check that the upper dimension extent is -1 in the descriptor).
Instead of emitting a runtime call directly in lowering, add an
operation that can later be lowered to a runtime call or inline code
when the descriptor layout is known.
GEPNoWrapFlags.h calls `assert`, creating an undeclared identifier error
when running an Apple-stage2 build with LLVM_ENABLE_MODULES enabled.
Resolves: rdar://129031201
If a weak function is missing, still return its address (zero) rather
than failing interpretation. Otherwise we have a mismatch between
Interpret() and CanInterpret() resulting in failures that would not
occur with JIT execution.
Alternatively, we could try to look for weak symbols in CanInterpret()
and generally reject them there.
This is the root cause for the issue exposed by
https://github.com/llvm/llvm-project/pull/92885. Previously, the case
affected by that always fell back to JIT because an icmp constant
expression was used, which is not supported by the interpreter. Now a
normal icmp instruction is used, which is supported. However, we fail to
interpret due to incorrect handling of weak function addresses.
`MachORebaseEntry::moveNext()` and `MachOBindEntry::moveNext()` assume
that the rebase/bind table ends with `{REBASE|BIND}_OPCODE_DONE` or an
actual rebase/bind. However, a valid rebase/bind table might also end
with other effectively no-op opcodes, which caused the parser to move
past the end and go into the next table, resulting in corrupted entries
or infinite loops.
CDSplit splits functions up to three ways: main fragment with no suffix,
and fragments with .cold and .warm suffixes.
Add .warm suffix to the regex used to recognize split fragments.
Test Plan: updated register-fragments-bolt-symbols.s
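As a sketch of the matching only: the pattern below is a hypothetical stand-in written in Python, and the actual regex in BOLT may differ. `parent_name` is likewise an illustrative helper.

```python
import re
from typing import Optional

# Hypothetical pattern: a parent function name followed by a .cold or
# .warm fragment suffix.
FRAGMENT_RE = re.compile(r"(.*)\.(cold|warm)")


def parent_name(symbol: str) -> Optional[str]:
    # Return the parent function name for a split fragment symbol, or None
    # if the symbol does not carry a recognized fragment suffix.
    match = FRAGMENT_RE.fullmatch(symbol)
    return match.group(1) if match else None
```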