By looking at whether a global is large instead of looking at the code
model.
This also fixes references to large data in the small code model.
We now always fold any 32-bit offset into the addressing mode with the
large code model since it uses 64-bit relocations.
Depositing value into the lowest byte/word is a common code pattern.
This patch improves the code generation for it to avoid redundant AND
and OR operations.
IR intrinsics were already defined, but no codegen support had been
added.
I extracted this code from our downstream. Some of it may have come from
https://repo.hca.bsc.es/gitlab/rferrer/llvm-epi/ originally.
- Adds a new +pc option to -mbranch-protection that will enable
the use of PC as a diversifier in PAC branch protection code.
- When +pauth-lr is enabled (-march=armv9.5a+pauth-lr) in combination
with -mbranch-protection=pac-ret+pc, the new 9.5-a instructions
(pacibsppc, retaasppc, etc) are used.
Documentation for the relevant instructions can be found here:
https://developer.arm.com/documentation/ddi0602/2023-09/Base-Instructions/
Co-authored-by: Lucas Prates <lucas.prates@arm.com>
This presents misleading and confusing output. If you have a function
defined at the beginning of an XCOFF object file, and you have a
function call to an external function, the function call disassembles as
a branch to the local function. That is,
`void f() { f(); g();}`
disassembles as
>00000000 <.f>:
0: 7c 08 02 a6 mflr 0
4: 94 21 ff c0 stwu 1, -64(1)
8: 90 01 00 48 stw 0, 72(1)
c: 4b ff ff f5 bl 0x0 <.f>
10: 4b ff ff f1 bl 0x0 <.f>
With this PR, the second call will display:
`10: 4b ff ff f1 bl 0x0 <.g> `
Using -r can help, but you still get the confusing output:
>10: 4b ff ff f1 bl 0x0 <.f>
00000010: R_RBR .g
[TLI] Pass replace-with-veclib works with Scalable Vectors.
The pass is heavily refactored.
It uses the Masked variant of a TLI method when the Intrinsic operates on Scalable Vectors.
Improve tests for ArmPL and SLEEF Intrinsics:
- Auto-generate test `armpl-intrinsics.ll`, and use active lane mask to have shorter `shufflevector` check lines.
- Update scripts now add `@llvm.compiler.used` instead of using the regex: `@[[LLVM_COMPILER_USED:[a-zA-Z0-9_$"\\.-]+]]`
- Add simplifycfg pass and noalias to ensure tail folding. `noalias` attribute was added only to the `%in.ptr` parameter of the ArmPL Intrinsics.
This adds post-legalizing lowering of G_UNMERGE_VALUES which take a vector and
produce scalar values for each lane. They are converted to a G_EXTRACT_VECTOR_ELT
for each lane, allowing all the existing tablegen patterns to apply to them.
A couple of tablegen patterns need to be altered to make sure the type of the
constant operand is known, so that the patterns are recognized under global
isel.
Closes#75662
emitPopInst checks a single function exit MBB. If other paths also exit
the function and any of there terminators uses LR implicitly, it is not
save to clear the Restored bit.
Check all terminators for the function before clearing Restored.
This fixes a mis-compile in outlined-fn-may-clobber-lr-in-caller.ll
where the machine-outliner previously introduced BLs that clobbered LR
which in turn is used by the tail call return.
Alternative to #73553
Fix bitcast test, which was splitting apart phis intended to force
bitcasts that survive all the way to selection.
Disable the amdgpu-codegenprepare phi splitting, which defeats the technique
of using a phi to ensure a bitcast reaches all the way to selection. Also
add a variety of bfloat tests. These probably need revisiting to avoid the
cast folding into argument loads. Also round out set of bfloat bitcast and
ABI tests.
Add codegen tests for more bf16 operations The promotion of these works
contrary to the comment.
Zvfbfmin does not have any scalar operands making this an unnecessary
dependency. The spec was just updated to remove this. See
86d7a74f4b
This fixes a correctness issue where Xsfvfwmaccqqq was incorrectly
depending on Zfbfmin. The SiFive CPUs that support Xsfvfwmaccqqq do not
implement Zfbfmin, but do implement Zvfbfmin based on a previous
understanding that it only requires Zve32f. I've added tests for this
feature to raise the bar for adding dependencies to it in the future.
The SiFive7 scheduler model has been using AcquireAtCycles and
ReleaseAtCycles for some time. Without EnableIntervals, the scheduler
was not making decisions based on this information. This patch sets
EnableIntervals to true, and the test case demonstrates that the VADD
instructions can be issued one cycle earlier since the VCQ is not
reserved. This leads to better saturation of the SiFive7VA.
Pass the original MMO instead of different individual values.
getAlign() was used before where actually getOriginalAlign() would have been
better, and this patch has the same effect.
Avoid the pre-truncate of BUILD_VECTOR sources when there is more than
one use. This can avoid using unnecessary movs later down the
instruction selection pipeline.
It is currently disabled by default. It will need experiments on a real
HW to tune and decide on the profitability.
---------
Co-authored-by: Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>
The RISC-V vector crypto extensions have been ratified. This patch
updates the Clang and LLVM support for these extensions to be
non-experimental, while leaving the C intrinsics as experimental since
the C intrinsics are not yet standardized.
Co-authored-by: Brandon Wu <brandon.wu@sifive.com>
This will result in larger atomic operations getting expanded to
`__atomic_*` libcalls via AtomicExpandPass, which matches what Clang
already does in the frontend.
While AMDGPU currently disables the use of all libcalls, I've changed it
to instead disable all of them _except_ the atomic ones. Those are
already be emitted by the Clang frontend, and enabling them in the
backend allows the same behavior there.
Rather than shepherding a type name all the way to the backend as a
string and attempting to parse it, get the element type out of the AST
and store that in the resource annotation metadata directly.
Pull Request: https://github.com/llvm/llvm-project/pull/75674
We were only folding cases which remained extloads, but DAG.getExtLoad can also handle the cases which don't need to extend at all (we just can't do truncloads).
reduceLoadWidth can handle this for scalar loads, but not for vectors.
Noticed while triaging D152928