There are two related issues here. On the declaration/definition side,
we need to make sure the markings are conservative. Then on the caller
side, we need to make sure we don't access parameters that don't exist.
Fixes #187535.
Template patchELFPHDRTable, rewriteNoteSections, markGnuRelroSections,
and discoverStorage to support both ELF32LE and ELF64LE binaries.
Previously these functions were hardcoded for ELF64LE, causing crashes
when processing 32-bit ELF binaries.
The RewriteInstance constructor now accepts ELF32LE objects in addition
to ELF64LE. The ELF_FUNCTION macro is reused (and moved earlier in the
header) to dispatch to the correct template instantiation.
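As a rough illustration of the dispatch pattern (not the BOLT code itself), the sketch below templates a helper on an ELF type bundle and picks the instantiation at run time. `Elf32`, `Elf64`, `ElfClass`, `phdrTableSize`, and `dispatchPhdrTableSize` are hypothetical names; only the program header sizes (32 and 56 bytes) come from the ELF format.

```cpp
#include <cstdint>

// Hypothetical stand-ins for the ELF32LE / ELF64LE type bundles.
struct Elf32 {
  using Addr = std::uint32_t;
  static constexpr unsigned PhdrSize = 32; // sizeof(Elf32_Phdr)
};
struct Elf64 {
  using Addr = std::uint64_t;
  static constexpr unsigned PhdrSize = 56; // sizeof(Elf64_Phdr)
};

// Templated worker: one body, instantiated once per ELF flavour.
template <typename ELFT>
std::uint64_t phdrTableSize(unsigned NumPhdrs) {
  return static_cast<std::uint64_t>(NumPhdrs) * ELFT::PhdrSize;
}

enum class ElfClass { Elf32, Elf64 };

// Run-time dispatch to the matching instantiation, playing the role that
// the ELF_FUNCTION macro plays in the description above.
inline std::uint64_t dispatchPhdrTableSize(ElfClass Class, unsigned NumPhdrs) {
  return Class == ElfClass::Elf32 ? phdrTableSize<Elf32>(NumPhdrs)
                                  : phdrTableSize<Elf64>(NumPhdrs);
}
```

Hardcoding the 64-bit instantiation would compute the wrong table size for a 32-bit binary, which is the class of bug the templating avoids.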
These changes are preparation for adding support for the Hexagon
architecture to BOLT.
Summary:
Allocation kinds were added after these were introduced. We only needed
the TLI to identify these in the attributor so we can now just use
attributes. Update the usage in OpenMP and drop the TLI interface.
Fixes: https://github.com/llvm/llvm-project/issues/190072
Avoid expensive hash map of block to value by using a vector. To avoid
allocating and clearing the entire vector per query, cache the
allocation and use an epoch to identify stale values from previous
queries.
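A minimal sketch of the epoch technique, assuming blocks are identified by dense integer numbers. `EpochCache` and its members are hypothetical names and do not reflect the actual patch.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical per-block value cache indexed by block number.
class EpochCache {
  struct Slot {
    std::uint32_t Epoch = 0; // query that last wrote this slot
    int Value = 0;
  };
  std::vector<Slot> Slots;    // allocation is reused across queries
  std::uint32_t CurEpoch = 0; // 0 is reserved to mean "never written"

public:
  // Start a new query. Bumping the epoch makes every old entry stale
  // without touching the vector. Must be called before set()/get().
  void beginQuery(std::size_t NumBlocks) {
    if (Slots.size() < NumBlocks)
      Slots.resize(NumBlocks);
    if (++CurEpoch == 0) { // counter wrapped: do one real clear
      for (Slot &S : Slots)
        S.Epoch = 0;
      CurEpoch = 1;
    }
  }

  void set(std::size_t Block, int V) { Slots[Block] = Slot{CurEpoch, V}; }

  // Returns the value only if it was written during the current query.
  std::optional<int> get(std::size_t Block) const {
    const Slot &S = Slots[Block];
    if (S.Epoch != CurEpoch)
      return std::nullopt;
    return S.Value;
  }
};
```

Starting a query costs O(1) in the common case, while a lookup is a plain indexed load plus one comparison.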
The brkpt instruction is intended for the in-silicon debugger (ISDB).
When ISDB is not enabled, brkpt is treated as a NOP, so
__builtin_debugtrap() would silently do nothing in user-mode Linux
processes.
Use trap0(#0xDB) instead.
Matrix loads and stores are accesses of their element types. Emit TBAA
nodes using their element type to allow more precise TBAA alias
analysis.
PR: https://github.com/llvm/llvm-project/pull/190029
Unify include and library paths by reusing common code to compute path
prefixes. First, determine the effective sysroot by choosing a
user-provided sysroot, "../target/<triple>", or "../target/hexagon",
in the order of precedence. Based on the sysroot, derive the standard
include path, C++ include path, and base library path.
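The precedence order can be sketched as follows. This is an illustration only: `pickSysroot` and its parameters are hypothetical, it is an assumption that the triple-specific directory is chosen only when it exists, and the sketch uses "/" rather than host-correct separators.

```cpp
#include <functional>
#include <string>

// Hypothetical helper: choose the effective sysroot in order of precedence.
// Exists is injected so the sketch does not depend on the real filesystem.
inline std::string
pickSysroot(const std::string &UserSysroot, const std::string &InstallDir,
            const std::string &Triple,
            const std::function<bool(const std::string &)> &Exists) {
  // 1. A user-provided sysroot always wins.
  if (!UserSysroot.empty())
    return UserSysroot;
  // 2. Otherwise prefer "../target/<triple>" (assumed: only if present).
  std::string TripleDir = InstallDir + "/../target/" + Triple;
  if (Exists(TripleDir))
    return TripleDir;
  // 3. Finally fall back to "../target/hexagon".
  return InstallDir + "/../target/hexagon";
}
```

Include and library paths would then be derived from the returned directory, so both follow the same choice.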
Fix the default -L library paths so they are taken from the external
sysroot when one is specified. Previously, these paths were always
relative to the install directory and the sysroot was ignored.
Remove certain locations from consideration, as they are never used
for the corresponding purpose in existing sysroots:
- fallback to install path, typically "../target/bin", as the base path
when other sysroot cannot be found;
- similarly, fallback to "../target/" for startup files;
- "../target/bin" for program paths as there are no program files in
current sysroots.
Other minor changes:
- use Windows-correct path delimiting;
- enable hexagon-toolchain-linux.c test for Windows hosts.
This adds a new method, Verifier::visitDIType, and then changes the
methods for subclasses of DIType to call it. The new method just
dispatches to DIScope and adds a file/line check inspired by
Verifier::visitDISubprogram.
Fix a bug where `distributeIRToProfileLocationMap` fails to find
location mappings from IR to profile for renamed functions because
`FuncMappings` is indexed by the IR function name while
`distributeIRToProfileLocationMap` looks up by the profile function
name. Fixed by making `FuncMappings` use the profile function name as
its key.
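The mismatch can be reproduced with a toy map. `LocMap`, `lookup`, and the function names used in the usage note are hypothetical; only the idea of keying by IR name versus profile name comes from the description above.

```cpp
#include <map>
#include <string>

// Hypothetical location map: IR location id -> profile location id.
using LocMap = std::map<int, int>;

// Returns nullptr when FuncMappings has no entry for Name.
inline const LocMap *lookup(const std::map<std::string, LocMap> &FuncMappings,
                            const std::string &Name) {
  auto It = FuncMappings.find(Name);
  return It == FuncMappings.end() ? nullptr : &It->second;
}
```

If a function is called `foo` in the profile but `foo.renamed` in the IR, a map keyed by the IR name returns nothing for `lookup(Map, "foo")`, while a map keyed by the profile name finds the entry.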
I had auto-merge enabled in #189444 and since the formatter is
non-blocking it got merged despite the issue. Given I'm already here, I
just formatted the whole file.
`BindingDecl` nodes, i.e. the individual names in a structured binding,
were not handled in `IdentifierNamingCheck::findStyleKind()`, causing
them to fall through to the Default style or be silently ignored.
This led to incorrect renames, e.g. applying member variable conventions
to local bindings.
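For reference, a small example of the construct in question. The function and binding names are made up for illustration.

```cpp
#include <utility>

inline int sumPair(const std::pair<int, int> &P) {
  // `first_value` and `second_value` are each represented by a BindingDecl
  // in the Clang AST. They are local names, not member variables, even
  // though they refer to the members of P.
  const auto &[first_value, second_value] = P;
  return first_value + second_value;
}
```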
---------
Signed-off-by: Björn Svensson <bjorn.a.svensson@est.tech>
Match the naming convention of the other m_Specific* matchers, and free
up the m_Opc() matcher for future use in #84940 to allow us to capture
the opcode of an unknown binop.
Moving to m_SpecificOpc does mess up the formatting in a few places;
I've tried to refactor to use the m_Value(SDValue, ....) matcher where I
can to recover some whitespace.
This updates the CIR constant emitter to use the correct destination
type when emitting a constant initializer for a structure that might be
initialized with non-prototyped function pointers. We were previously
using the type from whatever function declaration we had, but this may
not be the correct type.
This change also updates the `replaceUsesOfNonProtoTypeWithRealFunction`
to ignore global initializer uses, which do not need to be updated after
this change.
When JITing SPIR-V through the LevelZero API, the API expects the length
of the input, since the data is passed as a `void *`. The length cannot
be recovered inside the `mgpuModuleLoadJIT` implementation with something
like `strlen(reinterpret_cast<char *>(data))`, because the SPIR-V binary
contains null bytes (i.e., the data is binary SPIR-V, not null-terminated
text).
As a result, we need to pass the `assmeblySize` via
`mgpuModuleLoadJIT(void* data, int optLevel, size_t assmeblySize)`.
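A small sketch of why `strlen` is unusable here. `Blob` and `lengthViaStrlen` are hypothetical; the first four bytes are the SPIR-V magic number 0x07230203 in little-endian order, and the rest is arbitrary filler containing zero bytes.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for a SPIR-V module: binary data with embedded
// zero bytes.
static const unsigned char Blob[] = {0x03, 0x02, 0x23, 0x07, 0x00,
                                     0x00, 0x01, 0x00, 0x0b, 0x00};

// What a loader would compute if it treated the blob as C text: strlen
// stops at the first zero byte, not at the end of the data.
inline std::size_t lengthViaStrlen(const void *Data) {
  return std::strlen(static_cast<const char *>(Data));
}
```

Here `lengthViaStrlen(Blob)` reports 4 while `sizeof(Blob)` is 10, so the true size has to be passed alongside the pointer.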
When a CIR op specifies a non-empty `llvmOp` field, the lowering
emitter now generates the `matchAndRewrite` body that converts the
result type and forwards all operands to the corresponding LLVM op.
This removes 27 boilerplate lowering patterns from LowerToLLVM.cpp.
Ops needing custom logic (FMaxNumOp/FMinNumOp for FastmathFlags::nsz)
override `llvmOp = ""` to retain hand-written implementations.
Also fixes llvmOp names (TruncOp -> FTruncOp, FloorOp -> FFloorOp)
and adds a diagnostic rejecting conflicting llvmOp + custom constructor.
Fix two bugs in CIR's handling of `[[no_unique_address]]` fields:
- Record layout: Use the base subobject type (without tail padding)
instead of the complete object type for [[no_unique_address]] fields,
allowing subsequent fields to overlap with tail padding.
- Field access: Insert bitcasts from the base subobject pointer to the
complete object pointer after cir.get_member for potentially-overlapping
fields, so downstream code sees the expected type.
- Zero-sized fields: Handle truly empty [[no_unique_address]] fields by
computing their address via byte offsets rather than cir.get_member,
since they have no entry in the record layout.
A known gap (CIR copies 8 bytes where OG copies 5 via
`ConstructorMemcpyizer`) is noted for follow-up.
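The layout effect can be observed with plain C++. This is a sketch with hypothetical type names; exact sizes are ABI-dependent. Under the Itanium C++ ABI, `Inner` typically has 5 bytes of data in an 8-byte object, and some ABIs ignore the attribute entirely.

```cpp
// The user-provided constructor makes Inner non-POD for layout purposes
// under the Itanium C++ ABI, so its tail padding may be reused.
struct Inner {
  Inner() {}
  int I = 0;
  char C = 0;
};

// Ordinary member: Tail must be placed after the complete Inner object.
struct Plain {
  Inner In;
  char Tail;
};

// Potentially-overlapping member: Tail may live in Inner's tail padding.
struct Overlapping {
  [[no_unique_address]] Inner In;
  char Tail;
};

// Truly empty field: it may occupy no storage of its own.
struct Empty {};
struct WithEmpty {
  [[no_unique_address]] Empty E;
  int X;
};
```

Comparing `sizeof(Plain)` with `sizeof(Overlapping)` shows whether the target ABI lets `Tail` overlap the padding.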
LLDB automatically discovers, but doesn't automatically load, scripts in
the dSYM bundle. This is to prevent running untrusted code. Users can
choose to import the script manually or toggle a global setting to
override this policy. This isn't a great user experience: the former
quickly becomes tedious and the latter leads to decreased security.
This PR offers a middle ground that allows LLDB to automatically load
scripts from trusted dSYM bundles. Trusted here means that the bundle
was signed with a certificate trusted by the system. This can be a
locally created certificate (but not an ad-hoc certificate) or a
certificate from a trusted vendor.
Summary:
When the disk runs out of space during output file writing, BOLT would
crash with SIGSEGV/SIGABRT because raw_fd_ostream silently records write
errors and only reports them via abort() in its destructor. This made it
difficult to distinguish real BOLT bugs from infrastructure issues in
production monitoring.
Add an explicit error check on the output stream before calling
Out->keep(), so BOLT exits cleanly with exit code 1 and a clear error
message instead.
Test: manually verified with a full filesystem that BOLT now prints
"BOLT-ERROR: failed to write output file: No space left on device" and
exits with code 1.
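The same check-before-commit pattern can be sketched with standard streams. This is an analogy rather than the BOLT change: `LimitedBuf` and `writeOutput` are hypothetical, and `LimitedBuf` merely simulates a device that runs out of space.

```cpp
#include <cstddef>
#include <ostream>
#include <streambuf>
#include <string>

// Hypothetical sink that accepts only Capacity bytes, standing in for a
// full disk.
class LimitedBuf : public std::streambuf {
  std::size_t Remaining;

protected:
  int_type overflow(int_type Ch) override {
    if (Remaining == 0)
      return traits_type::eof(); // report the write failure
    --Remaining;
    return traits_type::not_eof(Ch);
  }

public:
  explicit LimitedBuf(std::size_t Capacity) : Remaining(Capacity) {}
};

// Write Data, then check the stream state explicitly before treating the
// output as complete. Returns the process exit code to use.
inline int writeOutput(std::ostream &Out, const std::string &Data,
                       std::string &Err) {
  Out << Data;
  Out.flush();
  if (!Out) {
    Err = "failed to write output file";
    return 1;
  }
  return 0;
}
```

The error is reported at a well-defined point, with a message and a normal exit code, rather than later from a destructor.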
We want the LV cost-model to make the best possible decision of VF and
whether or not to use partial reductions. At the moment, when the LV can
use partial reductions for a given VF range, it assumes those are always
preferred. After transforming the plan to use partial reductions, it
then chooses the most profitable VF. A different VF might have been more
profitable had the plan not been transformed to use partial reductions.
This PR changes that, to first decide whether partial reductions are
more profitable for a given chain. If not, then it won't do the
transform.
This causes some regressions for AArch64 which are addressed in a
follow-up PR to keep this one simple.
The ConstantRange intersection check can now handle cases where the
condition of this branch is satisfied. The check is performed before
entering this function, so this part is no longer necessary.
This PR extracts the write to the in-memory module cache from within
`ASTWriter` into `CompilerInstance`. This brings it closer to other
module cache manipulations, making the ordering much clearer and more
explicit.
Closes #189666.
Fix incorrect printing and parsing of `cir.global` when the
`global_visibility` attribute is present. The incorrect assembly format
```
(`` $global_visibility^)?
```
resulted in the keyword sticking to the previous word and producing
incorrect CIR like this:
```
cir.globalhidden external dso_local @hidden_var = #cir.int<10> : !s32i {alignment = 4 : i64} loc(#loc22)
cir.global "private"hidden internal dso_local @hidden_static_var = #cir.int<10> : !s32i {alignment = 4 : i64} loc(#loc24)
```
Using the custom parser/printer already used by the `cir.func` parser
fixes this issue and makes the printed/parsed attribute consistent
between functions and global values.
Also added tests for both global values and functions.
If a try block has a catch-all handler and one or more type-specific
catch handlers, we were failing to generate the null type specifier when
lowering from CIR to LLVM IR. This change fixes that problem.
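A source-level example of the affected shape, with a hypothetical function name. In LLVM IR, a catch-all handler is conventionally represented by a null type-info catch clause on the landing pad, which is the "null type specifier" referred to above.

```cpp
#include <stdexcept>
#include <string>

// One type-specific handler followed by a catch-all handler.
inline std::string classify(int Kind) {
  try {
    if (Kind == 0)
      throw std::runtime_error("typed");
    if (Kind == 1)
      throw 42; // not a runtime_error, so only catch (...) matches
    return "none";
  } catch (const std::runtime_error &) {
    return "runtime_error";
  } catch (...) {
    return "catch-all";
  }
}
```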
Assisted-by: Cursor / claude-4.6-opus-high
This implements handling for cleanup of temporary variables with
automatic storage duration. This is a simplified implementation that
doesn't yet handle the possibility of exceptions being thrown within
this cleanup scope or the cleanup scope being inside a conditional
operation. Support for those cases will be added later.
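For context, this is the kind of source-level behaviour such cleanups implement: a temporary is destroyed at the end of the full-expression that created it. The names below are hypothetical and the example assumes the simple case, with no exceptions and no conditional operator.

```cpp
#include <string>
#include <vector>

// Records construction and destruction order.
inline std::vector<std::string> &eventLog() {
  static std::vector<std::string> Log;
  return Log;
}

struct Temp {
  Temp() { eventLog().push_back("ctor"); }
  ~Temp() { eventLog().push_back("dtor"); }
  int value() const { return 1; }
};

inline int useTemporary() {
  int V = Temp().value(); // the temporary dies at the end of this statement
  eventLog().push_back("after");
  return V;
}
```

After one call to `useTemporary()` on an empty log, the recorded order is "ctor", "dtor", "after".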
Partial reductions were previously disabled by default, but the generic
cost model implemented in BasicTTIImpl (#189905) accidentally enabled
them when vectorising loops for targets that may not support them yet.
For a given element, I believe A is only 0 when the divisor is INT_MIN.
The only way for NeedToApplyOffset to be false after processing all
elements is for all divisors to be INT_MIN. If all divisors are
INT_MIN, then all divisors are a power of 2 and we wouldn't do the
transform.
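The last step of that argument, that INT_MIN counts as a power of 2, can be checked with a small helper. `isPowerOfTwoMagnitude` is a hypothetical name and the sketch assumes 32-bit divisors; it does not model the computation of A.

```cpp
#include <cstdint>

// True if |D| is a power of two. The magnitude is computed in 64 bits so
// that negating INT32_MIN does not overflow.
constexpr bool isPowerOfTwoMagnitude(std::int32_t D) {
  const std::int64_t Wide = D;
  const std::uint64_t Mag =
      static_cast<std::uint64_t>(Wide < 0 ? -Wide : Wide);
  return Mag != 0 && (Mag & (Mag - 1)) == 0;
}

static_assert(isPowerOfTwoMagnitude(INT32_MIN), "|INT_MIN| is 2^31");
static_assert(!isPowerOfTwoMagnitude(6), "6 is not a power of two");
```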
Factor this similarly to the ARM case for future
expansion. The difference is that -march is treated as
an alias for -mcpu instead of something separately
useful.
I don't understand this mutation of the triple into
spirv64. The only test where this appears to matter
does not use -mcpu. Previously this would only match
for -mcpu, but this would change the behavior to prefer
-march before falling back to -mcpu.