Add a test case where some ops of a reassociate-able expression are in
an earlier block.
This can appear in practice, e.g. when computing the final reduction
value after vectorization.
This is another step in aligning addTypeForStreamingSVE with addTypeForFixedLengthSVE,
which also improves code quality for extending loads and truncating stores.
Reviewed By: hassnaa-arm
Differential Revision: https://reviews.llvm.org/D141266
This is the ARM equivalent of D141119, where we fold `and x, (csel 0, 1, cc)`
to `csel ZR, x, cc` if we know that x is 0/1 and for `or x, (csel 0, 1, cc)`
emit `csinc x, ZR, cc`. The or pattern gets recognized from a cmov under Arm.
Differential Revision: https://reviews.llvm.org/D141137
If we have `and x, (csel 0, 1, cc)` and we know that x is 0/1, then we
can emit a `csel ZR, x, cc`. Similarly for `or x, (csel 0, 1, cc)` we
can emit `csinc x, ZR, cc`. This can help where we can not otherwise
general ccmp instructions.
Differential Revision: https://reviews.llvm.org/D141119
Unlike an anonymous block, it will not be removed even though we've resolved
all valid paths to get here. So removing a PHI can leave vregs with no
definition, violating SSA. Instead, this converts it to an IMPLICIT_DEF.
This patch modifies SelectionDAG and FastISel to produce DBG_INSTR_REFs with
variadic expressions, and produce DBG_INSTR_REFs for debug values with variadic
location expressions. The former essentially means just prepending
DW_OP_LLVM_arg, 0 to the existing expression. The latter is achieved in
MachineFunction::finalizeDebugInstrRefs and InstrEmitter::EmitDbgInstrRef.
Reviewed By: jmorse, Orlando
Differential Revision: https://reviews.llvm.org/D133929
Snippet is a tiny live interval which has copy or fill like def
and copy or spill like use at the end (any of them might abcent).
Snippet has only one use/def inside interval and interval is located
in one basic block.
When inline spiller spills some reg around uses it also forces the
spilling of connected snippets those which got by splitting the
same original reg and its def is a full copy of our reg or its
last use is a full copy to our reg.
The definition of snippet is extended to allow not only one use/def
but more. However all other uses are statepoint instructions which will
fold fill into its operand. That way we do not introduce new fills/spills.
Reviewed By: qcolombet, dantrushin
Differential Revision: https://reviews.llvm.org/D138093
This pseudo-instruction stores two small (8-bit) registers into one wide
(16-bit) register. But apparently the order matters a lot to the
register allocator.
This patch changes the order of inserting the registers to optimize for
the best register allocation in the tests of shift32.ll. It might be
detrimental in other cases, but keeping the registers in the same
physical register seems like it would be a common case.
Differential Revision: https://reviews.llvm.org/D140573
This optimization turns shifts of almost a multiple of 8 into a shift
into the opposite direction. Unfortunately it doesn't compose well with
the other optimizations (I've tried) so it's separate from them.
Differential Revision: https://reviews.llvm.org/D140572
This uses a complicated shift sequence that avr-gcc also uses, but
extended to work over any number of bytes and in both directions
(logical shift left and logical shift right). Unfortunately it can't be
used for an arithmetic shift right: I've tried to come up with a
sequence but couldn't.
Differential Revision: https://reviews.llvm.org/D140571
This patch optimizes 32-bit constant shifts by renaming registers. This
is very effective as the compiler would otherwise need to do a lot of
single bit shift instructions. Instead, the registers are renamed at the
SSA level which means the register allocator will insert the necessary
mov instructions.
Unfortunately, the register allocator will insert some unnecessary movs
with the current code. This will be fixed in a later patch.
Differential Revision: https://reviews.llvm.org/D140570
32-bit shift instructions were previously expanded using the default
SelectionDAG expander, which meant it used 16-bit constant shifts and
ORed them together. This works, but is far from optimal.
I've optimized 32-bit shifts on AVR using a custom inserter. This is
done using three new pseudo-instructions that take the upper and lower
bits of the value in two separate 16-bit registers and outputs two
16-bit registers.
This is the first commit in a series. When completed, shift instructions
will take around 31% less instructions on average for constant 32-bit
shifts, and is in all cases equal or better than the old behavior. It
also tends to match or outperform avr-gcc: the only cases where avr-gcc
does better is when it uses a loop to shift, or when the LLVM register
allocator inserts some unnecessary movs. But it even outperforms avr-gcc
in some cases where avr-gcc does not use a loop.
As a side effect, non-constant 32-bit shifts also become more efficient.
For some real-world differences: the build of compiler-rt I use in
TinyGo becomes 2.7% smaller and the build of picolibc I use becomes 0.9%
smaller. I think picolibc is a better representation of real-world code,
but even a ~1% reduction in code size is really significant.
The current patch just lays the groundwork. The result is actually a
regression in code size. Later patches will use this as a basis to
optimize these shift instructions.
Differential Revision: https://reviews.llvm.org/D140569
Integer legalization already supported splitting the output integer of
llround and llrint, but did not support this for lround and lrint yet.
This is not a problem for 32-bit architectures, but for 8/16-bit
architectures like AVR it results in a crash like this:
ExpandIntegerResult #0: t7: i32 = lround t6
LLVM ERROR: Do not know how to expand the result of this operator!
This patch simply add lrint/lround to the list of ISD opcodes to expand.
Fixes https://github.com/llvm/llvm-project/issues/59573.
Differential Revision: https://reviews.llvm.org/D140822
These two symbols are declared in object files to indicate whether .data
needs to be copied from flash or .bss needs to be cleared. They are
supported on avr-gcc and reduce firmware size a bit, which is especially
important on very small chips.
I checked the behavior of avr-gcc and matched it as well as possible.
From my investigation, it seems to work as follows:
__do_copy_data is set when the compiler finds a data symbol:
* without a section name
* with a section name starting with ".data" or ".gnu.linkonce.d"
* with a section name starting with ".rodata" or ".gnu.linkonce.r" and
flash and RAM are in the same address space
__do_clear_bss is set when the compiler finds a data symbol:
* without a section name
* with a section name that starts with .bss
Simply checking whether the calculated section name starts with ".data",
".rodata" or ".bss" should result in the same behavior.
Fixes: https://github.com/llvm/llvm-project/issues/58857
Differential Revision: https://reviews.llvm.org/D140830
Ensure the negation required when lowering negative power-of-two
divides uses the scalable vector container type with the fixed
length result extracted from it.
Fixes: #59647
Differential Revision: https://reviews.llvm.org/D140563
After frontend changes in the following commit:
"BPF: preserve btf_decl_tag for parameters of extern functions"
same mechanics could be used to get the list of function parameters
and associated btf_decl_tag entries for both extern and non-extern
functions.
This commit extracts this mechanics as a separate auxiliary function
BTFDebug::processDISubprogram(). The function is called for both
extern and non-extern functions in order to generated corresponding
BTF_DECL_TAG records.
Differential Revision: https://reviews.llvm.org/D140971
SPIRVModuleAnalysis collects module and external function registers
(usually result of OpFunction) for use when emitting OpFunctionCall.
This patch makes the mapping between the functions and registers using
pointers (instead of name strings) to ensure anonymous functions and
calls can be resolved properly.
Differential Revision: https://reviews.llvm.org/D140548
Invert the sense of the attribute and let the attributor figure this
out like everything else. If needed we can have the not-OpenCL
languages set amdgpu-no-default-queue and amdgpu-no-completion-action
up front so they never have to pay the cost.
There are also so many of these now, the offset use API should
probably consider all of them at once. Maybe they should merge into
one attribute with used fields. Having separate functions for each
field in AMDGPUBaseInfo is also not the greatest API (might as well
fix this when the patch to get the object version from the module
lands).
This was looking for a specific constant cast of the function, when
the type doesn't matter. Doesn't bother trying to handle typed
pointers, it will just assert.
Things probably don't work completely correctly if the block kernel
address is captured somewhere else, but that wouldn't work before
either. The uses should really be loads out of the handle, and the
handle initializer should contain the kernel address.
Prior to this patch, variadic DIExpressions (i.e. ones that contain
DW_OP_LLVM_arg) could only be created by salvaging debug values to create
stack value expressions, resulting in a DBG_VALUE_LIST being created. As of
the previous patch in this patch stack, DBG_INSTR_REF's syntax has been
changed to match DBG_VALUE_LIST in preparation for supporting variadic
expressions. This patch adds some minor changes needed to allow variadic
expressions that aren't stack values to exist, and allows variadic expressions
that are trivially reduceable to non-variadic expressions to be handled
similarly to non-variadic expressions.
Reviewed by: jmorse
Differential Revision: https://reviews.llvm.org/D133926
Non-throwing inline asm infers the nounwind attribute in
instcombine. Thus, it can be handled in the same manner as
non-throwing target functions are generally. Further special casing is
unnecessary complexity.
This patch makes two notable changes to the MIR debug info representation,
which result in different MIR output but identical final DWARF output (NFC
w.r.t. the full compilation). The two changes are:
* The introduction of a new MachineOperand type, MO_DbgInstrRef, which
consists of two unsigned numbers that are used to index an instruction
and an output operand within that instruction, having a meaning
identical to first two operands of the current DBG_INSTR_REF
instruction. This operand is only used in DBG_INSTR_REF (see below).
* A change in syntax for the DBG_INSTR_REF instruction, shuffling the
operands to make it resemble DBG_VALUE_LIST instead of DBG_VALUE,
and replacing the first two operands with a single MO_DbgInstrRef-type
operand.
This patch is the first of a set that will allow DBG_INSTR_REF
instructions to refer to multiple machine locations in the same manner
as DBG_VALUE_LIST.
Reviewed By: jmorse
Differential Revision: https://reviews.llvm.org/D129372
This includes a fix for the tramp3d failure from the llvm-testsuite
that caused the last revert. Hopefully the others failures were the
same issue.
Original commit message:
For RISC-V, load/store(exclude vector load/store) instructions only has a 12 bit immediate operand. If the offset is out-of-range, it must make use of a temp register to make up this offset. If between these offsets, they have a small(IsInt<12>) relative offset, LocalStackSlotAllocation pass can find a value as frame base register's value, and replace the origin offset with this register's value plus the relative offset.
Co-authored-by: luxufan <luxufan@iscas.ac.cn>
Co-authored-by: Craig Topper <craig.topper@sifive.com>
Differential Revision: https://reviews.llvm.org/D98101
Add patterns with seteq/setne conditions.
We don't have instructions for seteq/setne except for comparing
with zero and need to emit an ADDI or XOR before a seqz/snez to
compare other values.
The select ISD node takes a 0/1 value for the condition, but the
VT_MASKC(N) instructions check all XLen bits for zero or non-zero.
We can use this to avoid the seqz/snez in many cases.
This is pretty ridiculous number of patterns. I wonder if we could
use some ComplexPatterns to merge them, but I'd like to do that as
a follow up and focus on correctness of the result in this patch.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D140421
This is based on @frasercrmck's D107290. At least some of the clang
portion of D107290 has already been committed.
This uses vscale_range for min/max vector width unless the command
line overrides are used.
As a follow up, I plan to add a max or exact VLEN option to clang
to control the vscale_range. This will eliminate many of the reasons
for users to use the overrides through the -mllvm interface.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D139873
Now that D139525 fixes the Hexagon infinite loop, the stopgap can be
removed to provide more information about known bits in SPLAT_VECTOR
whose operands are smaller than the bit width (which is most of the
time)
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D141075
By default expand all operations, then change to Custom/Legal if needed.
Reviewed By: sdesmalen
Differential Revision: https://reviews.llvm.org/D141068
Forking this off from D140850 -
https://alive2.llvm.org/ce/z/TgBeK_https://alive2.llvm.org/ce/z/STVD7d
We could almost justify doing this in IR, but consideration for
"minsize" requires that we only try it in codegen -- the
transform is not reversible.
In all other cases, avoiding multiply should be a win because a
mul is more expensive than simple/parallelizable compares. AArch
even has a trick to keep instruction count even for some types.
Differential Revision: https://reviews.llvm.org/D141086