Materialize zeros by copying from %g0, which is now marked as constant.
This makes it possible for some common operations (like integer negation) to be
performed in fewer instructions.
This continues @arichardson's patch at D132561.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D138887
On 64-bit target, when doing i64 SELECT_CC where one of the comparison operands
is a constant zero, try to fold the compare and MOVcc into a MOVr instruction.
For all integers, EQ and NE comparison are available, additionally for signed
integers, GT, GE, LT, and LE is also available.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D138922
Materialize zeros by copying from %g0, which is now marked as constant.
This makes it possible for some common operations (like integer negation) to be
performed in fewer instructions.
This continues @arichardson's patch at D132561.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D138887
Don't emit deprecated v8-style FP compares & branches when targeting v9
processors.
For now, always use %fcc0, because currently the allocator requires allocatable
registers to also be spillable, which isn't the case with v9 FCC registers.
The work to enable allocation over the entire FCC register file will be done in
a future patch.
Fixes bug #17834
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D135515
On SPARC, S/UMULO operation on 64-bit integers works by extending the value to 128-bit, then doing a multiplication and checking the upper half of the result.
This makes UMULO works correctly by putting a zero in the upper half rather than doing a sign extension.
Reviewed By: LemonBoy
Differential Revision: https://reviews.llvm.org/D110555
Enable multiple COPY hints to eliminate more COPYs during register allocation.
Note that this is something all targets should do, see
https://reviews.llvm.org/D38128.
Review: James Y Knight
llvm-svn: 326028
TableGen has a fairly dubious heuristic to decide whether an alias should be
printed: does the alias have lest operands than the real instruction. This is
bad enough (particularly with no way to override it), but it should at least be
calculated consistently for both strings.
This patch implements that logic: first get the *correct* string for the
variant, in the same way as the Matcher, without guessing; then count the
number of whitespace chars.
There are basically 4 changes this brings about after the previous
commits; all of these appear to be good, so I have changed the tests:
+ ARM64: we print "neg X, Y" instead of "sub X, xzr, Y".
+ ARM64: we skip implicit "uxtx" and "uxtw" modifiers.
+ Sparc: we print "mov A, B" instead of "or %g0, A, B".
+ Sparc: we print "fcmpX A, B" instead of "fcmpX %fcc0, A, B"
llvm-svn: 208969
This is different from the argument passing convention which puts the
first float argument in %f1.
With this patch, all returned floats are treated as if the 'inreg' flag
were set. This means multiple float return values get packed in %f0,
%f1, %f2, ...
Note that when returning a struct in registers, clang will set the
'inreg' flag on the return value, so that behavior is unchanged. This
also happens when returning a float _Complex.
llvm-svn: 199028
Also avoid locals evicting locals just because they want a cheaper register.
Problem: MI Sched knows exactly how many registers we have and assumes
they can be colored. In cases where we have large blocks, usually from
unrolled loops, greedy coloring fails. This is a source of
"regressions" from the MI Scheduler on x86. I noticed this issue on
x86 where we have long chains of two-address defs in the same live
range. It's easy to see this in matrix multiplication benchmarks like
IRSmk and even the unit test misched-matmul.ll.
A fundamental difference between the LLVM register allocator and
conventional graph coloring is that in our model a live range can't
discover its neighbors, it can only verify its neighbors. That's why
we initially went for greedy coloring and added eviction to deal with
the hard cases. However, for singly defined and two-address live
ranges, we can optimally color without visiting neighbors simply by
processing the live ranges in instruction order.
Other beneficial side effects:
It is much easier to understand and debug regalloc for large blocks
when the live ranges are allocated in order. Yes, global allocation is
still very confusing, but it's nice to be able to comprehend what
happened locally.
Heuristics could be added to bias register assignment based on
instruction locality (think late register pairing, banks...).
Intuituvely this will make some test cases that are on the threshold
of register pressure more stable.
llvm-svn: 187139
This requires v9 cmov instructions using the %xcc flags instead of the
%icc flags.
Still missing:
- Select floats on %xcc flags.
- Select i64 on %fcc flags.
llvm-svn: 178737
The same compare instruction is used for 32-bit and 64-bit compares. It
sets two different sets of flags: icc and xcc.
This patch adds a conditional branch instruction using the xcc flags for
64-bit compares.
llvm-svn: 178621