This was introduced ~5yrs ago (by me), and has never really gotten
any adoption. By now, it's significantly out of sync with new/changed
poison propoagation rules. The idea is still reasonable, but the
imagined use case is largely covered by alive2 these days anyways.
https://github.com/llvm/llvm-project/pull/65972 introduced
-ubsan-unique-traps and -bounds-checking-unique-traps, which attach the
function size to the ubsantrap intrinsic.
https://github.com/llvm/llvm-project/pull/117651 changed
ubsan-unique-traps to use nomerge instead of the function size, but did
not update -bounds-checking-unique-traps. This patch adds nomerge to
bounds-checking-unique-traps.
This is a NFC.
Duplicate gfx12_asm_vop3_alias.s file to true16/fake16 version and
update `real-true16` flag on it.
This is preparing the upcoming changes for true16
Instead of requiring `uselocale()` as part of the base locale API,
define __locale_guard in the few places that need it directly, without
making __locale_guard part of the base API.
In practice, most mainstream platforms never used __locale_guard, so
they also didn't need to define uselocale(), and after this patch they
actually don't define it anymore.
-fno-sanitize-merge (introduced in
https://github.com/llvm/llvm-project/pull/120511) duplicates the
functionality of -ubsan-unique-traps but also allows individual checks
to be specified e.g.,
* "-fno-sanitize-merge" without arguments is equivalent to
-ubsan-unique-traps
* "-fno-sanitize-merge=bool,enum" will apply it only to those two checks
Additionally, the naming is more consistent with the rest of the
-fsanitize- family.
This patch therefore removes -ubsan-unique-traps. This breaks backwards
compatibility; we hope that this is acceptable since '-mllvm
-ubsan-unique-traps' was an experimental flag.
This patch also adds negative test examples to bounds-checking.c, and
strengthens the NOOPTARRAY assertion to prevent spurious matches.
"-bounds-checking-unique-traps" is unaffected by this patch.
Allocatable members of privatized derived types must be allocated,
with the same bounds as the original object, whenever that member
is also allocated in it, but Flang was not performing such
initialization.
The `Initialize` runtime function can't perform this task unless
its signature is changed to receive an additional parameter, the
original object, that is needed to find out which allocatable
members, with their bounds, must also be allocated in the clone.
As `Initialize` is used not only for privatization, sometimes this
other object won't even exist, so this new parameter would need
to be optional.
Because of this, it seemed better to add a new runtime function:
`InitializeClone`.
To avoid unnecessary calls, lowering inserts a call to it only for
privatized items that are derived types with allocatable members.
Fixes https://github.com/llvm/llvm-project/issues/114888
Fixes https://github.com/llvm/llvm-project/issues/114889
This is a very simple sema implementation, and just required AST node
plus the existing diagnostics. This patch adds tests and adds the AST
node required, plus enables it for 'init' and 'shutdown' (only!)
These two constructs are very simple and similar, and only support 3
different clauses, two of which are already implemented. This patch
adds AST nodes for both constructs, and leaves the device_num clause
unimplemented, but enables the other two.
This patch introduces IndexedCallstackIdConveter as a convenience
wrapper around FrameIdConverter and CallStackIdConverter just for
tests.
With the new wrapper, we get to replace idioms like:
FrameIdConverter<decltype(MemProfData.Frames)> FrameIdConv(
MemProfData.Frames);
CallStackIdConverter<decltype(MemProfData.CallStacks)> CSIdConv(
MemProfData.CallStacks, FrameIdConv);
with:
IndexedCallstackIdConveter CSIdConv(MemProfData);
Unfortunately, this exact pattern occurs in tests only; the
combinations of the frame ID converter and call stack ID converter are
diverse in production code.
- update `VectorUtils:isVectorIntrinsicWithScalarOpAtArg` to use TTI for
all uses, to allow specifiction of target specific intrinsics
- add TTI to the `isVectorIntrinsicWithStructReturnOverloadAtField` api
- update TTI api to provide `isTargetIntrinsicWith...` functions and
consistently name them
- move `isTriviallyScalarizable` to VectorUtils
- update all uses of the api and provide the TTI parameter
Resolves#117030
adoptRef in WebKit constructs Ref/RefPtr so treat it as such in
isCtorOfRefCounted. Also removed the support for makeRef and makeRefPtr
as they don't exist any more.
Prior to this patch, we required that all users had the same VL in order
to optimize. But as the FIXME said, we can use the largest VL to
optimize, as long as we can determine what the largest is. This patch
implements the FIXME.
Note that PointerUnion::{is,get} have been soft deprecated in
PointerUnion.h:
// FIXME: Replace the uses of is(), get() and dyn_cast() with
// isa<T>, cast<T> and the llvm::dyn_cast<T>
I'm not touching PointerUnion::dyn_cast for now because it's a bit
complicated; we could blindly migrate it to dyn_cast_if_present, but
we should probably use dyn_cast when the operand is known to be
non-null.
**Note:** The register reading and writing depends on new register
flavor support in thread_get_state/thread_set_state in the kernel, which
will be first available in macOS 15.4.
The Apple M4 line of cores includes the Scalable Matrix Extension (SME)
feature. The M4s do not implement Scalable Vector Extension (SVE),
although the processor is in Streaming SVE Mode when the SME is being
used. The most obvious side effects of being in SSVE Mode are that (on
the M4 cores) NEON instructions cannot be used, and watchpoints may get
false positives, the address comparisons are done at a lowered
granularity.
When SSVE mode is enabled, the kernel will provide the Streaming Vector
Length register, which is a maximum of 64 bytes with the M4. Also
provided are SVCR (with bits indicating if SSVE mode and SME mode are
enabled), TPIDR2, SVL. Then the SVE registers Z0..31 (SVL bytes long),
P0..15 (SVL/8 bytes), the ZA matrix register (SVL*SVL bytes), and the M4
supports SME2, so the ZT0 register (64 bytes).
When SSVE/SME are disabled, none of these registers are provided by the
kernel - reads and writes of them will fail.
Unlike Linux, lldb cannot modify the SVL through a thread_set_state
call, or change the processor state's SSVE/SME status. There is also no
way for a process to request a lowered SVL size today, so the work that
David did to handle VL/SVL changing while stepping through a process is
not an issue on Darwin today. But debugserver should be providing
everything necessary so we can reuse all of David's work on resizing the
register contexts in lldb if it happens in the future. debugbserver
sends svl, svcr, and tpidr2 in the expedited registers when a thread
stops, if SSVE|SME mode are enabled (if the kernel allows it to read the
ARM_SME_STATE register set).
While the maximum SVL is 64 bytes on M4, the AArch64 maximum possible
SVL is 256; this would give us a 64k ZA register. If debugserver sized
all of its register contexts assuming the largest possible SVL, we could
easily use 2MB more memory for the register contexts of all threads in a
process -- and on iOS et al, processes must run within a small memory
allotment and this would push us over that.
Much of the work in debugserver was changing the arm64 register context
from being a static compile-time array of register sets, to being
initialized at runtime if debugserver is running on a machine with SME.
The ZA is only created to the machine's actual maximum SVL. The size of
the 32 SVE Z registers is less significant so I am statically allocating
those to the architecturally largest possible SVL value today.
Also, debugserver includes information about registers that share the
same part of the register file. e.g. S0 and D0 are the lower parts of
the NEON 128-bit V0 register. And when running on an SME machine, v0 is
the lower 128 bits of the SVE Z0 register. So the register maps used
when defining the VFP registers must differ depending on the
capabilities of the cpu at runtime.
I also changed register reading in debugserver, where formerly when
debugserver was asked to read a register, and the thread_get_state read
of that register failed, it would return all zero's. This is necessary
when constructing a `g` packet that gets all registers - because there
is no separation between register bytes, the offsets are fixed. But when
we are asking for a single register (e.g. Z0) when not in SSVE/SME mode,
this should return an error.
This does mean that when you're running on an SME capabable machine, but
not in SME mode, and do `register read -a`, lldb will report that 48 SVE
registers were unavailable and 5 SME registers were unavailable. But
that's only when `-a` is used.
The register reading and writing depends on new register flavor support
in thread_get_state/thread_set_state in the kernel, which is not yet in
a release. The test case I wrote is skipped on current OSes. I pilfered
the SME register setup from some of David's existing SME test files;
there were a few Linux specific details in those tests that they weren't
easy to reuse on Darwin.
rdar://121608074
Add tests for horizontal add patterns with missing/undemanded elements - which typically prevents folding to the (add (shuffle a, b),(shuffle a, b)) optimal pattern
This can be used with /llvmlibthin to create thin archives without an
index, which is a prerequisite for porting
https://reviews.llvm.org/D117284 to lld-link.
Creating files like this is already possible with `llvm-ar rcS`, so this
doesn't add additional problems.
This reverts commit e0526b0780f56eede09b05a859a93626ecdc6e4d.
The `v_minmax/maxmin_f16`(GFX11) needs to be updated to t16 with
`v_minmax/maxmin_num_f16`(GFX12) together since they share the same
codegen pattern. Revert the old patch and resubmit
TestFirmwareCorefiles.py has a helper utility,
create-empty-corefile.cpp, which creates corefiles with different
metadata to specify the binary that should be loaded. It normally uses
an actual binary's UUID for the metadata, and it uses the binary's
cputype/cpusubtype for the corefile's mach header.
There is one test where it creates a corefile with metadata for a UUID
that cannot be found -- it is given no binary -- and in that case, the
cputype/cpusubtype it sets in the core file mach header was
uninitialized data. Through luck, on Darwin systems, the uninitialized
data typically matched a CPU_TYPE from machine.h and the test would
work. But when the value doens't match one of thoes defines, lldb would
reject the corefile entirely, and the test would fail. This has been an
infrequent failure on the CI bots for a while and I couldn't ever repo
it. There's a recent configuration where it was happening every time and
I was able to track it down.
rdar://141727563
SDNode::use_iterator now returns an SDUse& when dereferenced.
SDNode::user_iterator returns SDNode*. SDNode::use_begin/use_end/uses
work on use_iterator. SDNode::user_begin/user_end/users work on
user_iterator.
We can now write range based for loops using SDUse& and SDNode::uses().
I've converted many of these in this patch. I didn't update loops that
have additional variables updated in their for statement.
Some loops use SDNode::use_iterator::getOperandNo() which also prevents
using range based for loops. I plan to move this into SDUse in a follow
up patch.
That change forgot to set `lazy` to false before calling `addFile()` in
`forceLazy()` which caused `addFile()` to parse the file we want to
force a load for to be added as a lazy object again instead of adding
the file to `ctx.objFileInstances`.
This is caught by a pretty simple test (included).
If foo.obj is eagerly loaded (due to a prior undef referencing one if
its symbols) and has more than one symbol, we used to assert:
SymbolTable::addLazyObject() for the first symbol would set `lazy` to
false and load all symbols from the file, but the outer
ObjFile::parseLazy() loop would continue to run and call addLazyObject()
for the second symbol, which would assert.
Instead, just stop adding lazy symbols if the file got loaded for real
while adding a symbol.
(The ELF port has a similar early exit in `ObjFile<ELFT>::parseLazy()`.)
This PR enhances the test coverage for std::vector::assign by adding new
tests for several important test cases that were previously missing, as
shown in the following table:
| test cases | forward_iterator | input_iterator |
|-----------------------------------|------------------|----------------|
| new_size > capacity() | Yes | Yes |
| size() < new_size <= capacity() | No | No |
| new_size <= size() | No | No |
Similarly, no tests have previously covered `assign(InputIterator, InputIterator)`
and `assign(size_type, const value_type&)` for `vector<bool>`.
With this patch applied, all missing tests are covered.