This adds support for using the L and H argument modifiers for twinword
operands in inline asm code, such as in:
```
%1 = tail call i64 asm sideeffect "rd %pc, ${0:L} ; srlx ${0:L}, 32, ${0:H}", "={o4}"()
```
This is needed by the Linux kernel.
When in 32-bit mode, the backend doesn't currently implement 64-bit
atomics, even though the hardware is capable if you have specified a V9
CPU. Thus, limit the width to 32-bit, for now, leaving behind a TODO.
This fixes a regression triggered by PR #73176.
This adds support for marking arbitrary general purpose registers -
except for those with special purpose (G0, I6-I7, O6-O7) - as reserved,
as needed by some software like the Linux kernel.
On 64-bit target, prefer using RDPC over CALL to get the value of %pc.
This is faster on modern processors (Niagara T1 and newer) and avoids
polluting the processor's predictor state.
The old behavior of using a fake CALL is still done when tuning for
classic UltraSPARC processors, since RDPC is much slower there.
A quick pgbench test on a SPARC T4 shows about 2% speedup on SELECT
loads, and about 7% speedup on INSERT/UPDATE loads.
Reviewed By: @s-barannikov
Github PR: https://github.com/llvm/llvm-project/pull/78280
* Placing 'G' before 'M' (SHF_MERGE) can be misleading as the sh_entsize
argument goes before the section group name, if a reader doesn't know
that the order of extra arguments is not affected by the order of flags.
* 'a', 'w', and 'x' indicate basic permission-related flags. Separating
them with 'G' is kinda ugly.
Simplify code and move 'G' after 'o'. The new output is more similar to
GCC.
There are no 64-bit variants of these ALU / SETHI instructions in V9.
Remove these instruction definitions and add patterns to match DAG nodes
to the generic instructions defined in SparcInstrInfo.td.
This is not strictly NFC because of the changes in
`2011-01-11-FrameAddr.ll` test. The reason is that Sparc delay slot
filler pass handled ADDrr but not ADDXrr, which are now the same
instruction.
PR #66334 tried to renumber slot indexes before register allocation, but
the numbering was still affected by list entries for instructions which
had been erased. Fix this to make the register allocator's live range
length heuristics even less dependent on the history of how instructions
have been added to and removed from SlotIndexes's maps.
LEA_ADDri and LEAX_ADDri are printed / encoded the same way as ADDri. I
had to change the type of simm13Op so that it can be used in both 32-
and 64-bit modes. This required the changes in operands of some
InstAliases.
RegAllocGreedy uses SlotIndexes::getApproxInstrDistance to approximate
the length of a live range for its heuristics. Renumbering all slot
indexes with the default instruction distance ensures that this estimate
will be as accurate as possible, and will not depend on the history of
how instructions have been added to and removed from SlotIndexes's maps.
This also means that enabling -early-live-intervals, which runs the
SlotIndexes analysis earlier, will not cause large amounts of churn due
to different register allocator decisions.
LLVM fails to build on 32-bit Solaris/SPARC: several programs fail to
link due to undefined references to `__multi3`. This reference is from
`lib/libLLVMScalarOpts.a(LoopStrengthReduce.cpp.o)`. However, This
function exists neither in the 32-bit `libgcc.a` nor in
`libclang_rt.builtins-sparc.a`. It's only defined in their 64-bit
counterparts.
The same issue affects several 32-bit targets, e.g. 32-bit PowerPC as
described in Issue #54460. The fix is the same: inhibit the libcall for
32-bit compilations. This patch does just that, regenerating the
affected testcases. It allows the build to complete.
Tested on `sparc-sun-solaris2.11`.
The issue is uncovered by #47698: for IR files without a target triple,
-mtriple= specifies the full target triple while -march= merely sets the
architecture part of the default target triple, leaving a target triple which
may not make sense, e.g. riscv64-apple-darwin.
Therefore, -march= is error-prone and not recommended for tests without a target
triple. The issue has been benign as we recognize $unknown-apple-darwin as ELF instead
of rejecting it outrightly.
We currently don't extract vector elements from multi-use build vectors unless TLI.aggressivelyPreferBuildVectorSources accepts them, which seems a little extreme for constant build vectors (especially as under some cases ComputeKnownBits will indirectly extract the data for us).
This is causing a few regressions in some upcoming SimplifyDemandedBits work I'm looking at, all of which just need to know that the element is zero, so I've tweaked the fold to accept zero elements as well, which will typically fold very easily.
Differential Revision: https://reviews.llvm.org/D155582
This is a follow-up to b71edfaa4ec3c998aadb35255ce2f60bba2940b0
since I forgot the lit.local.cfg files in that one.
Reformatting is done with `black`.
If you end up having problems merging this commit because you
have made changes to a python file, the best way to handle that
is to run git checkout --ours <yourfile> and then reformat it
with black.
If you run into any problems, post to discourse about it and
we will try to help.
RFC Thread below:
https://discourse.llvm.org/t/rfc-document-and-standardize-python-code-style
Reviewed By: barannikov88, kwk
Differential Revision: https://reviews.llvm.org/D150762
On 64-bit target, when doing i64 BR_CC where one of the comparison operands is a
constant zero, try to fold the compare and BPcc into a BPr instruction.
For all integers, EQ and NE comparison are available, additionally for signed
integers, GT, GE, LT, and LE is also available.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D142461
On 64-bit target, when doing i64 BR_CC where one of the comparison operands is a
constant zero, try to fold the compare and BPcc into a BPr instruction.
For all integers, EQ and NE comparison are available, additionally for signed
integers, GT, GE, LT, and LE is also available.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D142461
Integrate the BranchRelaxation pass to help with relaxing out-of-range
conditional branches.
This is mostly of concern for SPARCv9, which uses conditional branches with
much smaller range than its v8 counterparts.
(Some large autogenerated code, such as the ones generated by TableGen, already
hits this limitation when building in Release)
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D142458
These are essentially add/sub 1 with a clamping value.
AMDGPU has instructions for these. CUDA/HIP expose these as
atomicInc/atomicDec. Currently we use target intrinsics for these,
but those do no carry the ordering and syncscope. Add these to
atomicrmw so we can carry these and benefit from the regular
legalization processes.
In LowerSELECT_CC, SELECT_REG between two f128s should only be emitted if we
have hardware quadfloat enabled.
This should fix issue #59646
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D140515
Materialize zeros by copying from %g0, which is now marked as constant.
This makes it possible for some common operations (like integer negation) to be
performed in fewer instructions.
This continues @arichardson's patch at D132561.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D138887
On 64-bit target, when doing i64 SELECT_CC where one of the comparison operands
is a constant zero, try to fold the compare and MOVcc into a MOVr instruction.
For all integers, EQ and NE comparison are available, additionally for signed
integers, GT, GE, LT, and LE is also available.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D138922
Materialize zeros by copying from %g0, which is now marked as constant.
This makes it possible for some common operations (like integer negation) to be
performed in fewer instructions.
This continues @arichardson's patch at D132561.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D138887
Don't emit deprecated v8-style FP compares & branches when targeting v9
processors.
For now, always use %fcc0, because currently the allocator requires allocatable
registers to also be spillable, which isn't the case with v9 FCC registers.
The work to enable allocation over the entire FCC register file will be done in
a future patch.
Fixes bug #17834
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D135515
Do not emit deprecated v8-style branches when targeting a v9 processor.
As a side effect, this also fixes the emission of useless ba's when doing
conditional branches on 64-bit integer values.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D130006
Implement CanLowerReturn and associated CallingConv changes for SPARC/SPARC64.
In particular, for SPARC64 there's new `RetCC_Sparc64_*` functions that handles the return case of the calling convention.
It uses the same analysis as `CC_Sparc64_*` family of funtions, but fails if the return value doesn't fit into the return registers.
This makes calls to functions with big return values converted to an sret function as expected, instead of crashing LLVM.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D132465
The code incorrectly checked for CTLZ_ZERO_UNDEF instead of
CTTZ_ZERO_UNDEF.
While I was there I flipped the condition into an early out.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D136010
As discussed in D85414 <https://reviews.llvm.org/D85414>, two tests
currently `FAIL` on Sparc since that backend uses the Sun assembler syntax
for the `.section` directive, controlled by
`SunStyleELFSectionSwitchSyntax`.
Instead of adapting the affected tests, this patch changes that default.
The internal assembler still accepts both forms as input, only the output
syntax is affected.
Current support for the Sun syntax is cursory at best: the built-in
assembler cannot even assemble some of the directives emitted by GCC, and
the set supported by the Solaris assembler is even larger: SPARC Assembly
Language Reference Manual, 3.4 Pseudo-Op Attributes
<https://docs.oracle.com/cd/E37838_01/html/E61063/gmabi.html#scrolltoc>.
A few Sparc test cases need to be adjusted. At the same time, the patch
fixes the failures from D85414 <https://reviews.llvm.org/D85414>.
Tested on `sparcv9-sun-solaris2.11`.
Differential Revision: https://reviews.llvm.org/D85415
This patch emits table lookup in expandCTTZ.
Context -
https://reviews.llvm.org/D113291 transforms set of IR instructions to
cttz intrinsic but there are some targets which does not support CTTZ or
CTLZ. Hence, I generate a table lookup in TargetLowering::expandCTTZ().
Differential Revision: https://reviews.llvm.org/D128911
This is almost the same as the abandoned D48529, but it
allows splat vector constants too.
This replaces the x86-specific code that was added with
the alternate patch D48557 with the original generic
combine.
This transform is a less restricted form of an existing
InstCombine and the proposed SDAG equivalent for that
in D128080:
https://alive2.llvm.org/ce/z/OUm6N_
Differential Revision: https://reviews.llvm.org/D128123
On SPARC, leaf function optimization omits the register window sliding (and the associated register name changes). This might result in miscompilation of procedures containing inline assembly, as some of the register constraints used may interfere with the register usage of optimized functions, so we disable leaf procedure optimization on those procedures to prevent it from happening.
This is a continuation of patch D102342 by @LemonBoy, the original comment is reproduced below:
> Leaf functions allow the compiler to omit the setup and teardown of a frame pointer, therefore avoiding the exchange of the in/out register. According to the SPARC architecture manual every reference to %i0-%i5 should be replaced with %o0-o5, if the target register is already in use a further remapping step to %g1-%g7 is required to free the output register.
>
> Add a simple check to make sure not to stomp on any output register that's already in use.
Reviewed By: dcederman
Differential Revision: https://reviews.llvm.org/D128263
There are a few places where we use report_fatal_error when the input is broken.
Currently, this function always crashes LLVM with an abort signal, which
then triggers the backtrace printing code.
I think this is excessive, as wrong input shouldn't give a link to
LLVM's github issue URL and tell users to file a bug report.
We shouldn't print a stack trace either.
This patch changes report_fatal_error so it uses exit() rather than
abort() when its argument GenCrashDiag=false.
Reviewed by: nikic, MaskRay, RKSimon
Differential Revision: https://reviews.llvm.org/D126550
Make sure that we really don't emit quad-precision unless the "hard-quad-float"
feature is available. Add missing replacement instruction patterns that are
needed to emit alternative code for conditional moves of quad-precision floats.
Test from koakuma.
Reviewed By: koakuma
Differential Revision: https://reviews.llvm.org/D119104
This patch adds tail call support to the 32-bit Sparc backend.
Two new instructions are defined, TAIL_CALL and TAIL_CALLri. They are
encoded the same as CALL and BINDri, but are marked with isReturn so
that the epilogue gets emitted. In contrast to CALL, TAIL_CALL is not
marked with isCall. This makes it possible to use the leaf function
optimization when the only call a function makes is a tail call.
TAIL_CALL modifies the return address in %o7, so for leaf functions
the value in %o7 needs to be restored after the call. For normal
functions which uses the restore instruction this is not necessary.
Reviewed By: koakuma
Differential Revision: https://reviews.llvm.org/D51206
On SPARC, S/UMULO operation on 64-bit integers works by extending the value to 128-bit, then doing a multiplication and checking the upper half of the result.
This makes UMULO works correctly by putting a zero in the upper half rather than doing a sign extension.
Reviewed By: LemonBoy
Differential Revision: https://reviews.llvm.org/D110555
These compiler-rt-only symbols aren't available in libgcc. Similar to
D108842, D108844, and D108926.
Fixes: pr/52043
Reviewed By: craig.topper, rengolin
Differential Revision: https://reviews.llvm.org/D112750
The tests only specify -march, so when the tests are run on AIX the target OS defaults to AIX, which causes the tests to misbehave.
This patch constrains the tests by specifying -mtriple instead of -march.
Reviewed By: daltenty, jsji, MaskRay
Differential Revision: https://reviews.llvm.org/D110186
This will currently accept the old number of bytes syntax, and convert
it to a scalar. This should be removed in the near future (I think I
converted all of the tests already, but likely missed a few).
Not sure what the exact syntax and policy should be. We can continue
printing the number of bytes for non-generic instructions to avoid
test churn and only allow non-scalar types for generic instructions.
This will currently print the LLT in parentheses, but accept parsing
the existing integers and implicitly converting to scalar. The
parentheses are a bit ugly, but the parser logic seems unable to deal
without either parentheses or some keyword to indicate the start of a
type.
Lower truncations and expansions between fp128 and half values into libcalls.
Expand truncating stores into two separate truncation and a store operations.
Reviewed By: jrtc27
Differential Revision: https://reviews.llvm.org/D104185