This pull request modifies the behavior of the
`@llvm.experimental.stackmap` intrinsic to require that its two first
operands (`id` and `numShadowBytes`) be **immediate values**. This
change ensures that variables cannot be passed as two first arguments to
this intrinsic.
Related Issue: https://github.com/llvm/llvm-project/issues/115733
### Testing
- Added new test cases to ensure errors are emitted for non-immediate
operands.
- Ran the full LLVM test suite to verify no regressions were introduced.
The patch fix https://github.com/llvm/llvm-project/issues/116061
The root cause of the assertion is that the FMA mutation pass does not
update the subranges of the live interval for the defined register of
the modified instruction .
it recalculate the live interval of the defined register of xvmaddmdp in
the VSX FMA mutation pass.
When extracting a smaller integer from a scalar_to_vector source, we were limited to only folding/truncating the lowest bits of the scalar source.
This patch extends the fold to handle extraction of any other element, by right shifting the source before truncation.
Fixes a regression from #117884
When GlobalMerge creates a MergedGlobal of statics all initialized to
zero, emitGlobalConstantImpl sees a ConstantAggregateZero. This results
in just emitting zeros followed by labels for the aliases. We need to
handle it more like how emitGlobalConstantStruct does by emitting each
global inside the aggregate.
---------
Co-authored-by: Hubert Tong <hubert.reinterpretcast@gmail.com>
Nodes created with `getMemIntrinsicNode` have memory operands. In order
for operands to be propagated to machine instructions, the nodes should
have `SDNPMemOperand` property.
Similar to 3c8c385a.
Commit 93589057830b2c3c35500ee8cac25c717a1e98f9 was reverted because it
caused a failure with test `lld :: ELF/ppc64-local-exec-tls.s`. This
relands the commit with a fix for the test.
This change is part of this proposal:
https://discourse.llvm.org/t/rfc-all-the-math-intrinsics/78294
- `Builtins.td` - Add f16 support for libm atan2 builtin
- `CGBuiltin.cpp` - Emit constraint atan2 intrinsic for clang builtin
- `clang/test/CodeGenCXX/builtin-calling-conv.cpp` - Use erff instead of
atan2 for clang builtin to lib call calling convention check, now that
atan2 maps to an intrinsic.
- add atan2 cases to llvm.experimental.constrained tests for more
backends: ARM, PowerPC, RISCV, SystemZ.
- LangRef.rst: add llvm.experimental.constrained.atan2, revise
llvm.atan2 description.
Last part of Implement the atan2 HLSL Function. Fixes#70096.
When the chain is not the entry node there is a risk the stores are
within a (CALLSEQ_START, CALLSEQ_END), which when the node is expanded
will lead to nested call sequences.
It should be possible to check for this and allow more cases, but for
now, let's limit this to cases where it's definitely safe.
Fixes#115323
If an instruction doesn't support memory operands, but one is provided,
an error should be raised. And conversely, if an instruction requires a
memory operand, but none is given, an error should be raised.
This patch fixes the combines for vector_shuffles when either or both of
its left and right hand side inputs are scalar_to_vector nodes.
Previously, when both left and right side inputs are scalar_to_vector
nodes, the current combine could not handle this situation, as the shuffle
mask was updated incorrectly. To temporarily solve this solution, this combine
was simply disabled and not performed.
Now, not only does this patch aim to resolve the previous issue of the
incorrect shuffle mask adjustments respectively, but it also updates any test
cases that are affected by this change.
Patch migrated from https://reviews.llvm.org/D130487.
This reverts commit c31014322c0b5ae596da129cbb844fb2198b4ef4.
Based on the discussions in #112772, this pass is not needed after the
introduction of `llvm.threadlocal.address` intrinsic.
Fixes https://github.com/llvm/llvm-project/issues/112771.
This merges the logic for expanding both FFREXP and FSINCOS into one
method `DAG.expandMultipleResultFPLibCall()`. This reduces duplication
and also allows FFREXP to benefit from the stack slot elimination
implemented for FSINCOS. This method will also be used in future to
implement more multiple-result intrinsics (such as modf and sincospi).
The dpbusd_const.ll test change is due to us losing the expanded add reduction pattern as one of the elements is known to be zero (removing one of the adds from the reduction pyramid). I don't think its of concern.
Noticed while working on #107423
If we're masking the LSB of a SRL node result and that is shifting down an extended sign bit, see if we can change the SRL to shift down the MSB directly.
These patterns can occur during legalisation when we've sign extended to a wider type but the SRL is still shifting from the subreg.
Alternative to #114967
Fixes the remaining regression in #112588
This patch utilizes getReservedRegs() to find asm clobberable registers.
And to make the result of getReservedRegs() accurate, this patch
implements the todo, which is to make r2 allocatable on AIX for some
leaf functions.
Utilize common API in PPCTargetParser
(https://github.com/llvm/llvm-project/pull/97541) to set default CPU
with same interfaces for LLC.
This will update AIX default CPU to pwr7 and LoP powerppc64 default CPU
to ppc64.
Fixes: https://github.com/llvm/llvm-project/issues/71030
Bug only happens in 64bit involving spills. Since we don't know when the
spill will happen, all instructions in the chain used to deduce sign
extension for eliminating 'extsw' will need to be promoted to 64-bit
pseudo instructions.
The following instruction will promoted in PPCMIPeepholes: EXTSH, LHA,
ISEL to EXTSH8, LHA8, ISEL8
Some tests contain errors in constrained intrinsic usage, such as missed
or extra type parameters, wrong type parameters order and some other.
---------
Co-authored-by: Andy Kaylor <andy_kaylor@yahoo.com>
Symbols that get mapped into the read-only section are loaded as part of
the text segment and will always need a TOC entry to be addressable. Add
an option to aggressively merge these read only globals to reduce TOC
usage.
Some targets (e.g. PPC and Hexagon) already did this. I think it's best
to do this consistently so that frontend authors don't run into
inconsistent results when they emit `naked` functions. For example, in
Zig, we had to change our emit code to also set `frame-pointer=none` to
get reliable results across targets.
Note: I don't have commit access.
Add support for using a thread-local variable with a specified offset
for holding the stack guard canary value. This supports both 32- and 64-
bit PowerPC targets.
This mirrors changes from #108942 but targeting PowerPC instead of
RISCV. Because both of these PRs modify the same driver functions, this
series is stack on top of the RISC-V one.
---------
Signed-off-by: Keith Packard <keithp@keithp.com>
This PR registers the writeout and reset functions for `gcov` for all
modules in the PGO runtime, instead of registering them
using global constructors in each module. The change is made for AIX
only, but the same mechanism works on Linux on Power.
When registering such functions using global constructors in each module
without `-ffunction-sections`, the AIX linker cannot garbage collect
unused undefined symbols, because such symbols are grouped in the same
section as the `__sinit` symbol. Keeping such undefined symbols causes
link errors (see test case
https://github.com/llvm/llvm-project/pull/108570/files#diff-500a7e1ba871e1b6b61b523700d5e30987900002add306e1b5e4972cf6d5a4f1R1
for this scenario). This PR implements the initialization in the
runtime, hence avoiding introducing `__sinit` into each module.
The implementation adds a new global variable `__llvm_covinit_functions`
to each module. This new global variable contains the function pointers
to the `Writeout` and `Reset` functions. `__llvm_covinit_functions`'s
section is the named section `__llvm_covinit`. The linker will aggregate
all the `__llvm_covinit` sections from each module
to form one single named section in the final binary. The pair of
functions
```
const __llvm_gcov_init_func_struct *__llvm_profile_begin_covinit();
const __llvm_gcov_init_func_struct *__llvm_profile_end_covinit();
```
are implemented to return the start and end address of this named
section in the final binary, and they are used in function
```
__llvm_profile_gcov_initialize()
```
(which is a constructor function in the runtime) so the runtime knows
the addresses of all the `Writeout` and `Reset` functions from all the
modules.
One noticeable implementation detail relevant to AIX is that to preserve
the `__llvm_covinit` from the linker's garbage collection, a `.ref`
pseudo instruction is inserted into them, referring to the section that
contains the `__llvm_gcov_ctr` variables, which are used in the
instrumented code. The `__llvm_gcov_ctr` variables did not belong to
named sections before, but this PR added them to the
`__llvm_gcov_ctr_section` named section, so we can add a `.ref` pseudo
instruction that refers to them in the `__llvm_covinit` section.
The patch adds support for lround when the output type of the rounding
is i32.
The support for a rounding result of type i64 existed before this patch.