Prefers the page size to come from the AUX vector, since `getpagesize`
was removed in POSIX.1-2001. Also adds a couple of asserts to ensure the
page size is a valid value.
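A minimal sketch of the idea, assuming a Linux target where
`getauxval(AT_PAGESZ)` is available. The helper names `queryPageSize`
and `isPowerOfTwo`, the `sysconf` fallback, and the choice of "non-zero
power of two" as the validity check are assumptions made for
illustration, not the code from this patch.

```cpp
#include <cassert>
#include <cstddef>
#include <sys/auxv.h> // getauxval, AT_PAGESZ (Linux-specific)
#include <unistd.h>   // sysconf, _SC_PAGESIZE

// Hypothetical helper: true for 1, 2, 4, 8, ...
inline bool isPowerOfTwo(unsigned long V) {
  return V != 0 && (V & (V - 1)) == 0;
}

// Hypothetical helper: read the page size from the auxiliary vector and
// check that it looks sane before handing it to callers.
inline std::size_t queryPageSize() {
  // Prefer the value the kernel placed in the auxiliary vector.
  unsigned long Size = getauxval(AT_PAGESZ);
  if (Size == 0) {
    // Assumed fallback for the case where the entry is missing.
    long Fallback = sysconf(_SC_PAGESIZE);
    assert(Fallback > 0 && "page size must be positive");
    Size = static_cast<unsigned long>(Fallback);
  }
  assert(isPowerOfTwo(Size) && "page size must be a power of two");
  return static_cast<std::size_t>(Size);
}
```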
This patch attempts to reland
https://github.com/llvm/llvm-project/pull/120780 while addressing the
issues that caused the patch to be reverted.
Namely:
1. The patch had included code from the llvm/Passes directory in the
llvm/CodeGen directory.
2. The patch increased the backend compile time by 2% due to adding a
very expensive include in MachineFunctionPass.h.
The patch has been re-structured so that there is no dependency between
the llvm/Passes and llvm/CodeGen directories, by moving the base class,
`class DroppedVariableStats`, to the llvm/IR directory.
The expensive header included by MachineFunctionPass.h now contains
forward declarations instead of other header includes, which were
pulling a large amount of code into MachineFunctionPass.h. This should
resolve the compile-time increase.
This patch fixes a bug in the creation of shuffle masks when vectorizing
vectors in case of a diamond reuse with shuffle. The mask needs to
enumerate all elements of a vector, not treat the original vector value
as a single element. That is: if vectorizing two <2 x float> vectors
into a <4 x float> the mask needs to have 4 indices, not just 2.
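The element-level mask described above can be sketched as follows. The
helper `expandSubvectorMask` is hypothetical and is not the vectorizer's
code; it only shows how a mask that orders whole sub-vectors expands
into one that enumerates every element.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: VecMask orders whole sub-vectors; the result
// orders the individual elements of the widened vector.
inline std::vector<int> expandSubvectorMask(const std::vector<int> &VecMask,
                                            int ElemsPerVec) {
  assert(ElemsPerVec > 0 && "sub-vectors must have at least one element");
  std::vector<int> ElemMask;
  ElemMask.reserve(VecMask.size() * static_cast<std::size_t>(ElemsPerVec));
  for (int VecIdx : VecMask)
    for (int I = 0; I < ElemsPerVec; ++I)
      ElemMask.push_back(VecIdx * ElemsPerVec + I);
  return ElemMask;
}
```

For two <2 x float> inputs, `expandSubvectorMask({0, 1}, 2)` yields the
four indices 0, 1, 2, 3 rather than a two-entry mask.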
After #124287 updated several functions to return iterators rather than
Instruction *, it was no longer straightforward to pass their result to
DIBuilder. This commit updates DIBuilder methods to accept an
InsertPosition instead, so that they can be called with an iterator
(preferred), with an Instruction * (which triggers a deprecation
warning), or with a BasicBlock *. This commit also updates the existing
calls to the DIBuilder methods to pass in iterators.
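As a rough illustration of the "one position type, several ways to
construct it" idea, here is a self-contained sketch that uses a
`std::list<int>` in place of a basic block. `InsertPos` and
`insertBefore` are hypothetical names; the real InsertPosition class and
its deprecated Instruction * constructor are not reproduced here.

```cpp
#include <cassert>
#include <iterator>
#include <list>

using Block = std::list<int>; // stand-in for a block of instructions

// Hypothetical position wrapper with implicit conversions.
class InsertPos {
  Block::iterator It;

public:
  InsertPos(Block::iterator I) : It(I) {} // preferred: an iterator
  InsertPos(Block &B) : It(B.end()) {}    // a block: insert at its end
  Block::iterator get() const { return It; }
};

// A single entry point serves every caller, whatever they hold.
inline Block::iterator insertBefore(Block &B, InsertPos Pos, int Value) {
  return B.insert(Pos.get(), Value);
}
```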
We should be able to use `spirv64` as a device variant match and it
should be considered a GPU.
Also add the triple to an RTTI check.
Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>
Background:
Telemetry code isn't always built (this is controlled by the
LLVM_BUILD_TELEMETRY cmake flag).
This means users of the library may not have the library. So we're
defining `-DLLVM_BUILD_TELEMETRY` so that it can be used in an ifdef.
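A client-side guard could look like the sketch below. It assumes the
build passes `-DLLVM_BUILD_TELEMETRY` to the compiler only when
telemetry is built; `describeTelemetry` is a hypothetical function used
purely to show the guard.

```cpp
#include <cassert>
#include <string>

// Hypothetical client code: anything that needs the telemetry library
// goes inside the guarded branch so it compiles without the library.
inline std::string describeTelemetry() {
#ifdef LLVM_BUILD_TELEMETRY
  return "telemetry available";
#else
  return "telemetry not built";
#endif
}
```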
Currently, for IRBuilders that require an explicitly constructed
Folder, we also force Inserter to be constructed and then copied.
Provide a variant where the Inserter uses in-place default
construction, to support cases where it is self-referential.
`RegisterClassInfo` was supposed to be kept alive between pass runs, but
it was not, which led to recomputations that increased compile time.
Now the Impl class is a member of the legacy and new passes so that it
is not reconstructed on every pass run.
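The effect of keeping the Impl as a member can be sketched as follows.
All names here (`CachedInfo`, `PassImpl`, `PassWrapper`,
`runWithTemporaryImpl`) are hypothetical, and the cache is heavily
simplified: it is computed once and never invalidated.

```cpp
#include <cassert>

// Hypothetical stand-in for state that is expensive to compute.
struct CachedInfo {
  int Recomputations = 0;
  bool Valid = false;
  void ensureComputed() {
    if (Valid)
      return;
    ++Recomputations; // the expensive work would happen here
    Valid = true;
  }
};

// The implementation class owns the cached state.
class PassImpl {
  CachedInfo Info;

public:
  void run() { Info.ensureComputed(); }
  int recomputations() const { return Info.Recomputations; }
};

// Before: a fresh Impl per run, so the cache is lost every time.
inline int runWithTemporaryImpl(int Runs) {
  int Total = 0;
  for (int I = 0; I < Runs; ++I) {
    PassImpl Impl;
    Impl.run();
    Total += Impl.recomputations();
  }
  return Total;
}

// After: the pass keeps the Impl as a member, so the cache survives.
class PassWrapper {
  PassImpl Impl;

public:
  void run() { Impl.run(); }
  int recomputations() const { return Impl.recomputations(); }
};
```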
---------
Co-authored-by: Christudasan Devadasan <christudasan.devadasan@amd.com>
They have the same semantics. NonUniqueID is friendlier for the isUnique
implementation in MCSectionELF.
History: 97837b7 added support for unique IDs in sections and added
GenericSectionID. Later, 1dc16c7 added NonUniqueID.
This has been deprecated since a479be0f39a3301e9ca634d37cf6454b6d3865c6
from September 2023, before LLVM 18. Surely now enough release cycles
have happened that it can be removed upstream.
- **Precommit tests for synchronous uwtable CFI fixup**
- **[CFIFixup] Fixup CFI for split functions with synchronous uwtables**
Commit 6e54fccede disables CFI fixup for functions with synchronous
tables, breaking CFI for split functions. Instead, we can disable
*block-level* CFI fixup for functions with synchronous tables.
Unwind tables can be:
- N/A (not present)
- Asynchronous
- Synchronous
Functions without unwind tables don't need CFI fixup (since they don't
care about CFI).
Functions with asynchronous unwind tables must be accurate for each
basic block, so full CFI fixup is necessary.
Functions with synchronous unwind tables only need to be accurate for
each function (specifically, the portion of a function in a given
section). Disabling CFI fixup entirely for functions with synchronous
uwtables may break CFI for a function split between two sections. The
portion in the first section may have valid CFI, while the portion in
the second section is missing a call frame.
Ex:
```
(.text.hot)
Foo (BB1):
<Call frame information>
...
BB2:
...
(.text.split)
BB3:
...
BB4:
<epilogue>
```
Even if `Foo` has a synchronous unwind table, we still need to insert
call frame information into `BB3` so that unwinding the call stack from
`BB3` or `BB4` works properly.
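The three cases above can be summarised as a small decision function.
This is only a sketch of the policy as described here; the enum and
function names are hypothetical and do not match the pass's code.

```cpp
#include <cassert>

enum class UnwindTableKind { None, Sync, Async };

// Hypothetical granularity at which CFI has to be made accurate.
enum class CFIFixupMode { Disabled, PerSection, PerBlock };

inline CFIFixupMode chooseFixupMode(UnwindTableKind Kind) {
  switch (Kind) {
  case UnwindTableKind::None:
    return CFIFixupMode::Disabled; // nobody reads the CFI
  case UnwindTableKind::Async:
    return CFIFixupMode::PerBlock; // accurate at every basic block
  case UnwindTableKind::Sync:
    // Block-level fixup is skipped, but each section's portion of the
    // function still needs its call frame information.
    return CFIFixupMode::PerSection;
  }
  return CFIFixupMode::Disabled; // unreachable for valid enum values
}
```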
The find-dynamic-unwind-info callback registration APIs in libunwind
limit the number of callbacks that can be registered. If we use multiple
UnwindInfoManager instances, each with its own callback function
(as was the case prior to this patch) we can quickly exceed this limit
(see https://github.com/llvm/llvm-project/issues/126611).
This patch updates the UnwindInfoManager class to use a singleton
pattern, with the single instance shared between all LLVM JITs in the
process.
This change does _not_ apply to compact unwind info registered through
the ORC runtime (which currently installs its own callbacks).
As a bonus, this change eliminates the need to load an IR "bouncer"
module to supply the unique callback for each instance, so support for
compact-unwind can be extended to the llvm-jitlink tool (which does not
support adding IR).
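For readers unfamiliar with the pattern, a process-wide singleton can be
sketched with a function-local static. `SharedUnwindRegistry` and its
members are hypothetical; how UnwindInfoManager actually creates and
exposes its instance may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical registry shared by every JIT instance in the process, so
// that only one callback ever needs to be registered.
class SharedUnwindRegistry {
  std::mutex M;
  std::vector<const void *> Sections;

  SharedUnwindRegistry() = default; // only instance() may construct

public:
  SharedUnwindRegistry(const SharedUnwindRegistry &) = delete;
  SharedUnwindRegistry &operator=(const SharedUnwindRegistry &) = delete;

  static SharedUnwindRegistry &instance() {
    // Initialization of a function-local static is thread-safe in C++11
    // and later.
    static SharedUnwindRegistry Inst;
    return Inst;
  }

  void addSection(const void *Addr) {
    std::lock_guard<std::mutex> Lock(M);
    Sections.push_back(Addr);
  }

  std::size_t size() {
    std::lock_guard<std::mutex> Lock(M);
    return Sections.size();
  }
};
```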
Parameter PossiblyLoopIndependent has lost its intended purpose. This
flag is set to true at every call to depends(), so it no longer carries
any information; remove it from the function signature entirely. This is
an NFC patch.
This adds the `llvm.sincospi` intrinsic, legalization, and lowering
(mostly reusing the lowering for sincos and frexp).
The `llvm.sincospi` intrinsic takes a floating-point value and returns
both the sine and cosine of the value multiplied by pi. It computes the
result more accurately than the naive approach of doing the
multiplication ahead of time, especially for large input values.
```
declare { float, float } @llvm.sincospi.f32(float %Val)
declare { double, double } @llvm.sincospi.f64(double %Val)
declare { x86_fp80, x86_fp80 } @llvm.sincospi.f80(x86_fp80 %Val)
declare { fp128, fp128 } @llvm.sincospi.f128(fp128 %Val)
declare { ppc_fp128, ppc_fp128 } @llvm.sincospi.ppcf128(ppc_fp128 %Val)
declare { <4 x float>, <4 x float> } @llvm.sincospi.v4f32(<4 x float> %Val)
```
Currently, the default lowering of this intrinsic relies on the
`sincospi[f|l]` functions being available in the target's runtime (e.g.
libc).
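To see why multiplying by pi ahead of time loses accuracy, compare the
two hypothetical helpers below. This is not how the intrinsic is
lowered; `sinCosPiReduced` is only a sketch that exploits the fact that
sin(pi * x) and cos(pi * x) have period 2 in x and that `std::fmod` is
exact.

```cpp
#include <cassert>
#include <cmath>
#include <utility>

constexpr double Pi = 3.14159265358979323846;

// Naive approach: the product X * Pi is rounded before sin/cos see it,
// and that rounding error grows with the magnitude of X.
inline std::pair<double, double> sinCosPiNaive(double X) {
  double Arg = X * Pi;
  return {std::sin(Arg), std::cos(Arg)};
}

// Sketch of a more careful approach: reduce X modulo 2 first (exact),
// then multiply the small remainder by pi.
inline std::pair<double, double> sinCosPiReduced(double X) {
  double R = std::fmod(X, 2.0);
  return {std::sin(R * Pi), std::cos(R * Pi)};
}
```

For an even integer input such as 1e15, the reduced version sees a
remainder of exactly zero, whereas the naive version evaluates sin and
cos at an already-rounded argument.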
[NVPTX] Add Prefetch intrinsics
This PR adds prefetch intrinsics with the relevant eviction priorities.
* Lit tests are added as part of prefetch.ll
* The generated PTX is verified with a 12.3 ptxas executable.
* Added docs for these intrinsics in NVPTXUsage.rst.
For more information, refer to the PTX ISA:
https://docs.nvidia.com/cuda/parallel-thread-execution/#data-movement-and-conversion-instructions-prefetch-prefetchu
---------
Co-authored-by: abmajumder <abmajumder@nvidia.com>
Update llvm.call/llvm.invoke pretty printer/parser and the llvm ir import/export
to deal with the argument and result attributes.
This patch is made on top of PR 123176 that modified the
CallOpInterface and added the argument and result attributes to
llvm.call and llvm.invoke without doing anything with them.
RFC: https://discourse.llvm.org/t/mlir-rfc-adding-argument-and-result-attributes-to-llvm-call/84107
It was discovered that BOLT had several distinct issues where debug
information was missing for various tags in debug names (119493 & 119023
as examples), but verifying the DWARF with llvm-dwarfdump prior to those
fixes only reported a single 'missing name' category.
```
{"error-categories":{"Name Index DIE entry missing name":{"count":36355210}},"error-count":36355210}
```
To make DWARF verification easier to use for tracking debug health, the
JSON output is extended to allow detailed counts per sub-category where
that makes sense.
For now, this is only implemented on the missing tags, but can be
extended to more.
```
{"error-categories":{"Name Index DIE entry missing name":{"count":10,"details":{"DW_TAG_inlined_subroutine":1,"DW_TAG_label":1,"DW_TAG_namespace":2,"DW_TAG_subprogram":2,"DW_TAG_variable":4}}},"error-count":10}
```
This diff also modifies the tests created in pull request 124936 (not
yet landed) to ensure the JSON switches. Ideally this lands after that
but it did not correctly create a stack of pull requests.
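The shape of the new output can be reproduced with a small aggregator.
This is a sketch under stated assumptions: `ErrorAggregator` and
`CategoryCounts` are hypothetical names, keys are emitted in sorted
order because `std::map` is used, and strings are written without JSON
escaping. It is not llvm-dwarfdump's implementation.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <sstream>
#include <string>

struct CategoryCounts {
  std::uint64_t Count = 0;
  std::map<std::string, std::uint64_t> Details; // per sub-category
};

class ErrorAggregator {
  std::map<std::string, CategoryCounts> Categories;
  std::uint64_t Total = 0;

public:
  // Record one error; Detail is optional (e.g. a DW_TAG_* name).
  void report(const std::string &Category, const std::string &Detail = "") {
    CategoryCounts &C = Categories[Category];
    ++C.Count;
    ++Total;
    if (!Detail.empty())
      ++C.Details[Detail];
  }

  // Emit the summary; "details" appears only when sub-counts exist.
  std::string toJSON() const {
    std::ostringstream OS;
    OS << "{\"error-categories\":{";
    bool FirstCat = true;
    for (const auto &Cat : Categories) {
      if (!FirstCat)
        OS << ',';
      FirstCat = false;
      OS << '"' << Cat.first << "\":{\"count\":" << Cat.second.Count;
      if (!Cat.second.Details.empty()) {
        OS << ",\"details\":{";
        bool FirstDetail = true;
        for (const auto &D : Cat.second.Details) {
          if (!FirstDetail)
            OS << ',';
          FirstDetail = false;
          OS << '"' << D.first << "\":" << D.second;
        }
        OS << '}';
      }
      OS << '}';
    }
    OS << "},\"error-count\":" << Total << '}';
    return OS.str();
  }
};
```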
MCStreamer should not declare arch-specific functions. Such functions
should go to MCTargetStreamer.
Move MCMachOStreamer::emitThumbFunc to ARMTargetMachOStreamer, which is
a new subclass of ARMTargetStreamer. (The new class is just placed in
ARMMachObjectWriter.cpp. The conventional split like
ARMELFStreamer.cpp/ARMELFObjectWriter.cpp is overkill.)
`emitCFILabel`, called by ARMWinCOFFStreamer.cpp, has to be made public.
Pull Request: https://github.com/llvm/llvm-project/pull/126199
As Intel is working to add support for SPIR-V OpenMP device offloading
in upstream clang/liboffload, we need to modify the OpenMP frontend to
allow SPIR-V as well as generate valid IR for SPIR-V. For example, we
need the frontend to generate code to define and interact with device
globals used in the DeviceRTL.
This is the beginning of what I expect will be (many) other changes, but
let's get started with something simple.
---------
Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>
C++23 removed `<ciso646>` from the standard library. The header is used
in two places: Once in order to pull in standard library macros. Since
this file also includes `<optional>`, that use of `<ciso646>` is
technically redundant, but should probably be left in in case a future
change ever removes the include of `<optional>`. A second use of
`<ciso646>` appears to have been introduced in
da650094b187ee3c8017d74f63c885663faca1d8, but seems unnecessary (the
file doesn't seem to use anything from that header, and it seems to
build just fine on MSVC here without it). The new `<version>` header
should be supported by all supported implementations. This change
replaces uses of `<ciso646>` with the `<version>` header, or removes
them entirely where unnecessary.
The TransactionAcceptOrRevert pass is the final pass in the Sandbox
Vectorizer's default pass pipeline. Its job is to check the cost
before/after vectorization and accept or revert the IR to its original
state.
Since we are now starting the transaction in BottomUpVec, tests that run
a custom pipeline need to accept the transaction. This is done with the
help of the TransactionAlwaysAccept pass (tr-accept).
This PR adds:
- `RootSignatureFlags` extraction from DXContainer using `obj2yaml`
This PR is part of: #121493
---------
Co-authored-by: joaosaffran <joao.saffran@microsoft.com>
- Remove a redundant LegalityQuery constructor by using a default value
for `MMODescrs` and remove const for ArrayRef arguments.
- Use a delegating constructor for `MemDesc` constructor that takes
`MachineMemOperand`.
This patch adds intrinsics for tcgen05 wait,
fence and commit PTX instructions.
lit tests are added and verified with a
ptxas-12.8 executable.
Docs are updated in the NVPTXUsage.rst file.
Signed-off-by: Durgadoss R <durgadossr@nvidia.com>
When a def in a block A reaches another block B that is in A's iterated
dominance frontier, a phi node is added to B for the def register.
A clobbering def can be created at a call instruction, for a register
clobbered by a call.
However, phi nodes are not created for a register when one of the
reaching defs of the register is a clobbering def.
This patch adds phi nodes for registers that have a clobbering reaching
def. These additional phis help in checking reaching defs for an
instruction in RDF based copy propagation and addressing mode
optimizations.
This introduces options `-floop-interchange` and `-fno-loop-interchange`
to enable/disable the loop-interchange pass. This is part of the work
that tries to get that pass enabled by default (#124911), where it was
remarked that a user facing option to control this would be convenient
to have. The option name is the same as GCC's.
This adds the `llvm.modf` intrinsic, legalization, and lowering (mostly
reusing the lowering for sincos and frexp).
The `llvm.modf` intrinsic takes a floating-point value and returns both
the integral and fractional parts (as a struct).
```
declare { float, float } @llvm.modf.f32(float %Val)
declare { double, double } @llvm.modf.f64(double %Val)
declare { x86_fp80, x86_fp80 } @llvm.modf.f80(x86_fp80 %Val)
declare { fp128, fp128 } @llvm.modf.f128(fp128 %Val)
declare { ppc_fp128, ppc_fp128 } @llvm.modf.ppcf128(ppc_fp128 %Val)
declare { <4 x float>, <4 x float> } @llvm.modf.v4f32(<4 x float> %Val)
```
This corresponds to the libm `modf` function but returns multiple values
in a struct (rather than take output pointers), which makes it easier to
vectorize.
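The difference between the two calling conventions can be shown in plain
C++. `ModfParts` and `modfAsStruct` are hypothetical names, and the
field order is chosen for readability only; it is not meant to describe
the order of the intrinsic's results.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical struct-returning wrapper around the libm function.
struct ModfParts {
  double Integral;
  double Fractional;
};

inline ModfParts modfAsStruct(double Val) {
  double Integral = 0.0;
  // std::modf returns the fractional part and writes the integral part
  // through the output pointer.
  double Fractional = std::modf(Val, &Integral);
  return {Integral, Fractional};
}
```

Because both parts come back by value, there is no memory write for a
vectorizer to reason about.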
The system libunwind on older Darwins does not support JIT registration of
compact-unwind. Since the CompactUnwindManager utility discards redundant
eh-frame FDEs by default we need to remove the compact-unwind section first
when targeting older libunwinds in order to preserve eh-frames.
While LLJIT was already doing this as of eae6d6d18bd, MachOPlatform was not.
This was causing buildbot failures in the ORC runtime (e.g. in
https://green.lab.llvm.org/job/llvm.org/job/clang-stage1-RA/3479/).
This patch updates both LLJIT and MachOPlatform to check a bootstrap value,
"darwin-use-ehframes-only", to determine whether to forcibly preserve
eh-frame sections. If this value is present and set to true then compact-unwind
sections will be discarded, causing eh-frames to be preserved. If the value is
absent or set to false then compact-unwind will be used and redundant FDEs in
eh-frames discarded (FDEs that are needed by the compact-unwind section are
always preserved).
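The decision described above reduces to a single lookup. In the sketch
below the bootstrap values are modelled as a `std::map<std::string,
bool>`, which is a simplifying assumption about how they are stored, and
`shouldForceEHFrames` is a hypothetical helper name.

```cpp
#include <cassert>
#include <map>
#include <string>

// Simplified, assumed model of the bootstrap values.
using BootstrapValues = std::map<std::string, bool>;

// True only when the key is present and set to true. In that case the
// compact-unwind sections are discarded so that eh-frames are kept.
inline bool shouldForceEHFrames(const BootstrapValues &Values) {
  auto It = Values.find("darwin-use-ehframes-only");
  return It != Values.end() && It->second;
}
```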
rdar://143895614
The getBootstrapMap, getBootstrapMapValue, getBootstrapSymbolsMap, and
getBootstrapSymbols methods forward to their respective counterparts in
ExecutorProcessControl, similar to the callWrapper functions.
These methods will be used to simplify an upcoming patch that accesses
the bootstrap values.
The RISCV Zicfilp/Zicfiss extensions use the `.note.gnu.property` section
to store flags indicating the adoption of features based on these
extensions. This patch enables the llvm-readobj/llvm-readelf tools to
dump these flags with the `--note` flag.