70514 Commits

Author SHA1 Message Date
Philip Reames
3df27e9741 [RISCV] Minor style cleanup in advance of D141311 [nfc] 2023-01-09 11:31:54 -08:00
Heejin Ahn
78f01f69b3 [WebAssembly] Ensure 'end_function' in functions
Local info is supposed to be emitted in the start of every function.
When there are locals, `.local` section should be present, and we emit
local info according to the section.

If there is no locals, empty local info should be emitted. This empty
local info is emitted whenever a first instruction is emitted within a
function without encountering a `.local` section. If there is no
instruction, `end_function` pseudo instruction should be present and the
empty local info will be emitted when parsing the pseudo instruction.

The following assembly is malformed because the function `test` doesn't
have an `end_function` at the end, and the parser doesn't end up
emitting the empty local info needed. But currently we don't error out
and silently produce an invalid binary.
```
.functype test () -> ()
test:
```

This patch adds one extra state to the Wasm assembly parser,
`FunctionLabel` to detect whether a function label is parsed but not
ended properly when the next function starts or the file ends.

It is somewhat tricky to distinguish `FunctionLabel` and
`FunctionStart`, because it is not always possible to ensure the state
goes from `FunctionLabel` -> `FunctionStart`. `.functype` directive does
not seem to be mandated before a function label, in which case we don't
know if the label is a function at the time of parsing. But when we do
know the label is function, we would like to ensure it ends with an
`end_function` properly. Also we would like to error out when it does
not.

For example,
```
.functype test() -> ()
test:
```
We should error out for this because we know `test` is a function and it
doesn't end with an `end_function`. This PR fixes this.

```
test:
```
We don't error out for this because there is no info that `test` is a
function, so we don't know whether there should be an `end_function` or
not.

```
test:
.functype test() -> ()
```
We error out for this currently already, because we currently switch to
`FunctionStart` state when we first see `.functype` directive after its
label definition.

Fixes https://github.com/llvm/llvm-project/issues/57427.

Reviewed By: sbc100

Differential Revision: https://reviews.llvm.org/D141103
2023-01-09 11:09:35 -08:00
Matt Arsenault
925aa514ed AMDGPU: Use DataExtractor for printf string extraction
Attempt 2 to fix big endian bot failures.
2023-01-09 14:03:42 -05:00
Ivan Kosarev
2d945ef864 [AMDGPU][NFC] Rename GFX10A16 operands.
They do not seem to be GFX10-specific anymore. Also renames the
corresponding feature.

Reviewed By: dp

Differential Revision: https://reviews.llvm.org/D141069
2023-01-09 17:18:46 +00:00
Jay Foad
6a6f62a719 [AMDGPU] S_MULK_I32 does not define SCC. NFCI.
Differential Revision: https://reviews.llvm.org/D141281
2023-01-09 15:44:56 +00:00
Sander de Smalen
8aff167b34 [AArch64][SME] Improve streaming-compatible codegen for extending loads/truncating stores.
This is another step in aligning addTypeForStreamingSVE with addTypeForFixedLengthSVE,
which also improves code quality for extending loads and truncating stores.

Reviewed By: hassnaa-arm

Differential Revision: https://reviews.llvm.org/D141266
2023-01-09 15:08:04 +00:00
Alexey Baturo
35b8bb0ab3 [RISC-V][HWASAN] Don't explicitly load GOT entry to call hwasan mismatch routine
Reviewed by: luismarques

Differential Revision: https://reviews.llvm.org/D132994
2023-01-09 16:46:28 +03:00
David Green
90f24bef47 [ARM] Fold And/Or into CSel if possible
This is the ARM equivalent of D141119, where we fold `and x, (csel 0, 1, cc)`
to `csel ZR, x, cc` if we know that x is 0/1 and for `or x, (csel 0, 1, cc)`
emit `csinc x, ZR, cc`. The or pattern gets recognized from a cmov under Arm.

Differential Revision: https://reviews.llvm.org/D141137
2023-01-09 13:28:57 +00:00
David Green
07d6af6a71 [AArch64] Fold And/Or into CSel if possible
If we have `and x, (csel 0, 1, cc)` and we know that x is 0/1, then we
can emit a `csel ZR, x, cc`. Similarly for `or x, (csel 0, 1, cc)` we
can emit `csinc x, ZR, cc`. This can help where we can not otherwise
general ccmp instructions.

Differential Revision: https://reviews.llvm.org/D141119
2023-01-09 11:52:37 +00:00
Sander de Smalen
17a1936122 [AArch64] NFC: Align addTypeForStreamingSVE and addTypeForFixedLengthSVE
This patch is NFC and just moves things around so their implementation is very similar.
2023-01-09 09:47:33 +00:00
zhongyunde
9e83333445 [AArch64][SelectionDAG] Eliminates redundant zero-extension for 32-bit popcount
Fix https://github.com/llvm/llvm-project/issues/59597.
mov w8, w0 + fmov d0, x8 ==> fmov s0, w0

Reviewed By: dmgreen, efriedma

Differential Revision: https://reviews.llvm.org/D140649
2023-01-09 16:08:16 +08:00
liqinweng
1f8746cc80 [RISCV][CostModel] Add half type support for the cost model of sqrt/fabs
1. Refactor for costs of sqrt/fabs
2. Add half type support for the cost model of sqrt/fabs

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D132908
2023-01-09 12:57:03 +08:00
liqinweng
f3408739da [RISCV][CostModel] Add cost model for integer abs
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D132999
2023-01-09 11:38:24 +08:00
Fangrui Song
8f7e674771 [AVR] Support .reloc directive
Reviewed By: benshi001

Differential Revision: https://reviews.llvm.org/D141176
2023-01-08 23:27:21 +00:00
Ayke van Laethem
9592920890
[AVR] Optimize 32-bit shifts: optimize REG_SEQUENCE
This pseudo-instruction stores two small (8-bit) registers into one wide
(16-bit) register. But apparently the order matters a lot to the
register allocator.
This patch changes the order of inserting the registers to optimize for
the best register allocation in the tests of shift32.ll. It might be
detrimental in other cases, but keeping the registers in the same
physical register seems like it would be a common case.

Differential Revision: https://reviews.llvm.org/D140573
2023-01-08 20:05:31 +01:00
Ayke van Laethem
fad5e0cf50
[AVR] Optimize 32-bit shifts: reverse shift + move
This optimization turns shifts of almost a multiple of 8 into a shift
into the opposite direction. Unfortunately it doesn't compose well with
the other optimizations (I've tried) so it's separate from them.

Differential Revision: https://reviews.llvm.org/D140572
2023-01-08 20:05:31 +01:00
Ayke van Laethem
81f5f22f27
[AVR] Optimize 32-bit shifts: shift by 4 bits
This uses a complicated shift sequence that avr-gcc also uses, but
extended to work over any number of bytes and in both directions
(logical shift left and logical shift right). Unfortunately it can't be
used for an arithmetic shift right: I've tried to come up with a
sequence but couldn't.

Differential Revision: https://reviews.llvm.org/D140571
2023-01-08 20:05:31 +01:00
Ayke van Laethem
8f8afabd32
[AVR] Optimize 32-bit shift: move bytes around
This patch optimizes 32-bit constant shifts by renaming registers. This
is very effective as the compiler would otherwise need to do a lot of
single bit shift instructions. Instead, the registers are renamed at the
SSA level which means the register allocator will insert the necessary
mov instructions.

Unfortunately, the register allocator will insert some unnecessary movs
with the current code. This will be fixed in a later patch.

Differential Revision: https://reviews.llvm.org/D140570
2023-01-08 20:05:31 +01:00
Ayke van Laethem
840d10a1d2
[AVR] Custom lower 32-bit shift instructions
32-bit shift instructions were previously expanded using the default
SelectionDAG expander, which meant it used 16-bit constant shifts and
ORed them together. This works, but is far from optimal.

I've optimized 32-bit shifts on AVR using a custom inserter. This is
done using three new pseudo-instructions that take the upper and lower
bits of the value in two separate 16-bit registers and outputs two
16-bit registers.

This is the first commit in a series. When completed, shift instructions
will take around 31% less instructions on average for constant 32-bit
shifts, and is in all cases equal or better than the old behavior. It
also tends to match or outperform avr-gcc: the only cases where avr-gcc
does better is when it uses a loop to shift, or when the LLVM register
allocator inserts some unnecessary movs. But it even outperforms avr-gcc
in some cases where avr-gcc does not use a loop.

As a side effect, non-constant 32-bit shifts also become more efficient.

For some real-world differences: the build of compiler-rt I use in
TinyGo becomes 2.7% smaller and the build of picolibc I use becomes 0.9%
smaller. I think picolibc is a better representation of real-world code,
but even a ~1% reduction in code size is really significant.

The current patch just lays the groundwork. The result is actually a
regression in code size. Later patches will use this as a basis to
optimize these shift instructions.

Differential Revision: https://reviews.llvm.org/D140569
2023-01-08 20:05:31 +01:00
Ayke van Laethem
167338de96
[AVR] correctly declare __do_copy_data and __do_clear_bss
These two symbols are declared in object files to indicate whether .data
needs to be copied from flash or .bss needs to be cleared. They are
supported on avr-gcc and reduce firmware size a bit, which is especially
important on very small chips.

I checked the behavior of avr-gcc and matched it as well as possible.
From my investigation, it seems to work as follows:

__do_copy_data is set when the compiler finds a data symbol:
  * without a section name
  * with a section name starting with ".data" or ".gnu.linkonce.d"
  * with a section name starting with ".rodata" or ".gnu.linkonce.r" and
    flash and RAM are in the same address space

__do_clear_bss is set when the compiler finds a data symbol:
  * without a section name
  * with a section name that starts with .bss

Simply checking whether the calculated section name starts with ".data",
".rodata" or ".bss" should result in the same behavior.

Fixes: https://github.com/llvm/llvm-project/issues/58857

Differential Revision: https://reviews.llvm.org/D140830
2023-01-08 18:56:06 +01:00
Benjamin Kramer
91487b2481 [X86][Disassembler][NFCI] Read bytes with support::endian::read 2023-01-08 18:19:49 +01:00
Benjamin Kramer
b6942a2880 [NFC] Hide implementation details in anonymous namespaces 2023-01-08 17:37:02 +01:00
Paul Walker
c9602e02fc [SVE] Fix incorrect VT usage when lowering fixed length vector divides.
Ensure the negation required when lowering negative power-of-two
divides uses the scalable vector container type with the fixed
length result extracted from it.

Fixes: #59647

Differential Revision: https://reviews.llvm.org/D140563
2023-01-08 12:22:05 +00:00
Matt Arsenault
270e96f435 Revert "AMDGPU: Invert handling of enqueued block detection"
This reverts commit 47288cc977fa31c44cc92b4e65044a5b75c2597e.

The runtime is having trouble with this at -O0 when the inputs are
always enabled.
2023-01-07 21:48:07 -05:00
Eduard Zingerman
f60aefdc7f [BPF] generate btf_decl_tag records for params of extern functions
After frontend changes in the following commit:
"BPF: preserve btf_decl_tag for parameters of extern functions"
same mechanics could be used to get the list of function parameters
and associated btf_decl_tag entries for both extern and non-extern
functions.

This commit extracts this mechanics as a separate auxiliary function
BTFDebug::processDISubprogram(). The function is called for both
extern and non-extern functions in order to generated corresponding
BTF_DECL_TAG records.

Differential Revision: https://reviews.llvm.org/D140971
2023-01-07 09:32:18 -08:00
Ben Shi
6dc85bd3fd [AVR] Fix incorrect decoding of RJMP and RCALL
This patch fixes the inaccurate decoding of the offset operand of
the RCALL & RJMP instructions.

Reviewed By: aykevl, MaskRay

Differential Revision: https://reviews.llvm.org/D140815
2023-01-07 23:26:40 +08:00
Michal Paszkowski
99203241df [SPIR-V] Map IR function pointers to registers in ModuleAnalysis
SPIRVModuleAnalysis collects module and external function registers
(usually result of OpFunction) for use when emitting OpFunctionCall.
This patch makes the mapping between the functions and registers using
pointers (instead of name strings) to ensure anonymous functions and
calls can be resolved properly.

Differential Revision: https://reviews.llvm.org/D140548
2023-01-07 15:38:01 +01:00
wanglei
b1800240be [LoongArch] Reorder code and inline variable in lowerGlobalTLSAddress for clarity. NFC 2023-01-07 15:26:39 +08:00
Ben Shi
7a45b13cd7 [AVR] Fix some ambiguous cases in AsmParser
Some specific operands in specific instructions should be treated
as variables/symbols/labels, other than registers.

This patch fixes those ambiguous cases, such as "lds r25, r24",
which means loading the value inside symbol 'r24' into register 'r25'.

Fixes https://github.com/llvm/llvm-project/issues/58853

Reviewed by: aykevl

Differential Revision: https://reviews.llvm.org/D140777
2023-01-07 10:42:07 +08:00
Matt Arsenault
68b6cabd9e AMDGPU: Use getTypeAllocSize 2023-01-06 21:33:19 -05:00
Matt Arsenault
47554a0c73 AMDGPU: Use more accurate IR type for block handle
The device library uses this as a struct with a pointer sized integer
and 2 ints.
2023-01-06 21:23:28 -05:00
Matt Arsenault
47288cc977 AMDGPU: Invert handling of enqueued block detection
Invert the sense of the attribute and let the attributor figure this
out like everything else. If needed we can have the not-OpenCL
languages set amdgpu-no-default-queue and amdgpu-no-completion-action
up front so they never have to pay the cost.

There are also so many of these now, the offset use API should
probably consider all of them at once. Maybe they should merge into
one attribute with used fields. Having separate functions for each
field in AMDGPUBaseInfo is also not the greatest API (might as well
fix this when the patch to get the object version from the module
lands).
2023-01-06 21:16:08 -05:00
Matt Arsenault
0416883dc1 AMDGPU: Fix enqueue block lowering for opaque pointers
This was looking for a specific constant cast of the function, when
the type doesn't matter. Doesn't bother trying to handle typed
pointers, it will just assert.

Things probably don't work completely correctly if the block kernel
address is captured somewhere else, but that wouldn't work before
either. The uses should really be loads out of the handle, and the
handle initializer should contain the kernel address.
2023-01-06 21:15:39 -05:00
Matt Arsenault
0995a31e5d AMDGPU: Try to fix 32-bit build bot 2023-01-06 17:34:25 -05:00
Matt Arsenault
40078a6b71 AMDGPU: Use BinaryByteStream in printf expansion
Attempt to fix test failures on big endian bots. This pass definitely
needs more test coverage.
2023-01-06 17:22:13 -05:00
Joe Nash
1b12d7d15b [AMDGPU] Combine redundant Asm64 and AsmVOP3DPPBase. NFC
Reduce duplication in the codebase by combining these fields in
VOPProfile.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D141088
2023-01-06 14:09:42 -05:00
Stephen Tozer
e10e936315 [DebugInfo][NFC] Add new MachineOperand type and change DBG_INSTR_REF syntax
This patch makes two notable changes to the MIR debug info representation,
which result in different MIR output but identical final DWARF output (NFC
w.r.t. the full compilation). The two changes are:

  * The introduction of a new MachineOperand type, MO_DbgInstrRef, which
    consists of two unsigned numbers that are used to index an instruction
    and an output operand within that instruction, having a meaning
    identical to first two operands of the current DBG_INSTR_REF
    instruction. This operand is only used in DBG_INSTR_REF (see below).
  * A change in syntax for the DBG_INSTR_REF instruction, shuffling the
    operands to make it resemble DBG_VALUE_LIST instead of DBG_VALUE,
    and replacing the first two operands with a single MO_DbgInstrRef-type
    operand.

This patch is the first of a set that will allow DBG_INSTR_REF
instructions to refer to multiple machine locations in the same manner
as DBG_VALUE_LIST.

Reviewed By: jmorse

Differential Revision: https://reviews.llvm.org/D129372
2023-01-06 18:03:48 +00:00
Kai Nacke
70a5d8e4c4 [PPC] Add support for tune-cpu attribute
clang (like gcc) has the -mtune= command line option. This option
adds the "tune-cpu" attribute to a function. The intended functionality
is that the scheduling model of that cpu is used. E.g. -mtune=pwr9 -march=pwr8
generates only instructions supported on pwr8 but uses the scheduling model
of pwr9 for it.
This PR adds the infrastructure to support this in LLVM.
clang support was added in https://reviews.llvm.org/D130526.

Reviewed By: amyk, qiucf

Differential Revision: https://reviews.llvm.org/D138317
2023-01-06 18:01:48 +00:00
LiDongjin
4554663bc0 Recommit "[RISCV] Enable the LocalStackSlotAllocation pass support"
This includes a fix for the tramp3d failure from the llvm-testsuite
that caused the last revert. Hopefully the others failures were the
same issue.

Original commit message:
For RISC-V, load/store(exclude vector load/store) instructions only has a 12 bit immediate operand. If the offset is out-of-range, it must make use of a temp register to make up this offset. If between these offsets, they have a small(IsInt<12>) relative offset, LocalStackSlotAllocation pass can find a value as frame base register's value, and replace the origin offset with this register's value plus the relative offset.

Co-authored-by: luxufan <luxufan@iscas.ac.cn>
Co-authored-by: Craig Topper <craig.topper@sifive.com>

Differential Revision: https://reviews.llvm.org/D98101
2023-01-06 09:54:19 -08:00
Alexey Bataev
9b5f62685a [SLP]Fix cost of the broadcast buildvector/gather.
Need to include the cost of the initial insertelement to the cost of the
broadcasts. Also, need to adjust the cost of the gather/buildvector if
the element is inserted into poison/undef vector.

Differential Revision: https://reviews.llvm.org/D140498
2023-01-06 09:25:05 -08:00
Craig Topper
9f087ba05b [RISCV] Improve 4x and 8x (s/u)int_to_fp.
Previously we emitted a 4x or 8x vzext followed by a vfcvt.
We can instead use a 2x or 4x vzext followed by a vfwcvt.
2023-01-06 08:39:14 -08:00
Craig Topper
1aa9862df3 [RISCV] Add more XVentanaCondOps patterns.
Add patterns with seteq/setne conditions.

We don't have instructions for seteq/setne except for comparing
with zero and need to emit an ADDI or XOR before a seqz/snez to
compare other values.

The select ISD node takes a 0/1 value for the condition, but the
VT_MASKC(N) instructions check all XLen bits for zero or non-zero.
We can use this to avoid the seqz/snez in many cases.

This is pretty ridiculous number of patterns. I wonder if we could
use some ComplexPatterns to merge them, but I'd like to do that as
a follow up and focus on correctness of the result in this patch.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D140421
2023-01-06 08:29:23 -08:00
Craig Topper
e5a71a41d8 [RISCV] Add support for the vscale_range attribute.
This is based on @frasercrmck's D107290. At least some of the clang
portion of D107290 has already been committed.

This uses vscale_range for min/max vector width unless the command
line overrides are used.

As a follow up, I plan to add a max or exact VLEN option to clang
to control the vscale_range. This will eliminate many of the reasons
for users to use the overrides through the -mllvm interface.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D139873
2023-01-06 08:20:37 -08:00
Hassnaa Hamdi
9eb698946d [AArch64][SME]: Make 'Expand' the default action for all Ops.
By default expand all operations, then change to Custom/Legal if needed.

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D141068
2023-01-06 15:32:07 +00:00
Guillaume Chatelet
87b6b347fc Revert D141134 "[NFC] Only expose getXXXSize functions in TypeSize"
The patch should be discussed further.

This reverts commit dd56e1c92b0e6e6be249f2d2dd40894e0417223f.
2023-01-06 15:27:50 +00:00
Guillaume Chatelet
dd56e1c92b [NFC] Only expose getXXXSize functions in TypeSize
Currently 'TypeSize' exposes two functions that serve the same purpose:
 - getFixedSize / getFixedValue
 - getKnownMinSize / getKnownMinValue

source : bf82070ea4/llvm/include/llvm/Support/TypeSize.h (L337-L338)

This patch offers to remove one of the two and stick to a single function in the code base.

Differential Revision: https://reviews.llvm.org/D141134
2023-01-06 15:24:52 +00:00
Nikita Popov
a6a526ec54 [IR] Add AllocaInst::getAllocationSize() (NFC)
When fetching allocation sizes, we almost always want to have the
size in bytes, but we were only providing an InBits API. Also add
the corresponding byte-based conjugate to save some *8 and /8
juggling everywhere.
2023-01-06 15:36:16 +01:00
Jay Foad
b7ef63af56 [AMDGPU] Add a feature for VALUTransUseHazard
NFCI. This just allows us to experiment with enabling/disabling the
workaround on different subtargets.

Differential Revision: https://reviews.llvm.org/D141121
2023-01-06 13:50:17 +00:00
Luke Lau
fb6602616c [WebAssembly] Explicitly add {z,s}ext so extends are selected
During DAG legalization, {u,s}itofp instructions on v2i8, v2i16, v4i8
and v4i16 types ended up being legalized into scalar instructions, when
they could just be extended to v2i32/v4i32 instead.

Fixes https://github.com/llvm/llvm-project/issues/57182

Differential Revision: https://reviews.llvm.org/D140916
2023-01-06 12:28:29 +00:00
Ties Stuij
0b066e02a6 [AArch64] add GlobalIsel support for scalar CNT instruction
When feature CSSC is available we should use instruction CNT for s32, s64 and
s128 types in GlobalIsel's G_CTPOP.

spec:
https://developer.arm.com/documentation/ddi0602/2022-09/Base-Instructions/CNT--Count-bits-

Reviewed By: aemerson

Differential Revision: https://reviews.llvm.org/D139417
2023-01-06 11:08:34 +00:00