165545 Commits

Author SHA1 Message Date
spupyrev
61eb12e1f4 [BOLT] introducing profi params
We want to use profile inference (**profi**) in BOLT for stale profile matching.
To this end, I am making a few changes modifying the interface of the algorithm.
This is the first change for existing usages of profi (e.g., CSSPGO):
- introducing an object holding the algorithmic parameters;
- some renaming of existing options;
- dropped unused option, SampleProfileInferEntryCount, as we don't plan to change its default value;
- no changes in the output / tests.

Reviewed By: hoy

Differential Revision: https://reviews.llvm.org/D134756
2023-01-09 12:03:28 -08:00
Philip Reames
3df27e9741 [RISCV] Minor style cleanup in advance of D141311 [nfc] 2023-01-09 11:31:54 -08:00
Heejin Ahn
78f01f69b3 [WebAssembly] Ensure 'end_function' in functions
Local info is supposed to be emitted at the start of every function.
When there are locals, `.local` section should be present, and we emit
local info according to the section.

If there are no locals, empty local info should be emitted. This empty
local info is emitted whenever the first instruction is emitted within a
function without encountering a `.local` section. If there is no
instruction, `end_function` pseudo instruction should be present and the
empty local info will be emitted when parsing the pseudo instruction.

The following assembly is malformed because the function `test` doesn't
have an `end_function` at the end, and the parser doesn't end up
emitting the empty local info needed. But currently we don't error out;
we silently produce an invalid binary.
```
.functype test () -> ()
test:
```

This patch adds one extra state to the Wasm assembly parser,
`FunctionLabel` to detect whether a function label is parsed but not
ended properly when the next function starts or the file ends.

It is somewhat tricky to distinguish `FunctionLabel` and
`FunctionStart`, because it is not always possible to ensure the state
goes from `FunctionLabel` -> `FunctionStart`. The `.functype` directive does
not seem to be mandatory before a function label, in which case we don't
know whether the label is a function at the time of parsing. But when we do
know the label is a function, we would like to ensure it ends with an
`end_function` properly, and we would like to error out when it does
not.

For example,
```
.functype test() -> ()
test:
```
We should error out for this because we know `test` is a function and it
doesn't end with an `end_function`. This PR fixes this.

```
test:
```
We don't error out for this because there is no info that `test` is a
function, so we don't know whether there should be an `end_function` or
not.

```
test:
.functype test() -> ()
```
We already error out for this, because we switch to the `FunctionStart`
state when we first see the `.functype` directive after its label
definition.
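A toy C++ model of the three cases above may help. It is a hypothetical sketch, not LLVM's WebAssemblyAsmParser: only the `FunctionLabel` and `FunctionStart` names come from this message, while `checkFunctionEnds`, the `Other` state and the line-based tokenizing are assumptions made for illustration.

```cpp
#include <cassert>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical, heavily simplified states; the real parser tracks more.
enum class State { Other, FunctionLabel, FunctionStart };

// Returns "" if the input is accepted, or an error message otherwise.
std::string checkFunctionEnds(const std::vector<std::string> &Lines) {
  std::set<std::string> Declared; // names seen in a .functype directive
  State S = State::Other;
  std::string Current;   // function we know we are inside
  std::string LastLabel; // most recent label not yet known to be a function

  auto Unterminated = [&] {
    return S == State::FunctionLabel || S == State::FunctionStart;
  };

  for (const std::string &Line : Lines) {
    std::istringstream In(Line);
    std::string Tok;
    if (!(In >> Tok))
      continue; // blank line
    if (Tok.back() == ':') { // label definition
      if (Unterminated())
        return "function '" + Current + "' lacks end_function";
      std::string Name = Tok.substr(0, Tok.size() - 1);
      if (Declared.count(Name)) {
        S = State::FunctionLabel; // .functype came first: known function
        Current = Name;
      } else {
        S = State::Other; // cannot tell yet whether this is a function
        LastLabel = Name;
      }
    } else if (Tok == ".functype") {
      std::string Name;
      In >> Name;
      Name = Name.substr(0, Name.find('(')); // accept "test()" and "test ()"
      Declared.insert(Name);
      if (S == State::Other && Name == LastLabel) {
        S = State::FunctionStart; // .functype after its label
        Current = Name;
      }
    } else if (Tok == "end_function") {
      S = State::Other;
      LastLabel.clear();
    } else if (S == State::FunctionLabel) {
      S = State::FunctionStart; // first instruction of a known function
    }
  }
  if (Unterminated())
    return "function '" + Current + "' lacks end_function";
  return "";
}
```

Under this model, `{".functype test () -> ()", "test:"}` and `{"test:", ".functype test() -> ()"}` are rejected, while a lone `{"test:"}` is accepted because nothing says `test` is a function.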

Fixes https://github.com/llvm/llvm-project/issues/57427.

Reviewed By: sbc100

Differential Revision: https://reviews.llvm.org/D141103
2023-01-09 11:09:35 -08:00
Matt Arsenault
925aa514ed AMDGPU: Use DataExtractor for printf string extraction
Attempt 2 to fix big endian bot failures.
2023-01-09 14:03:42 -05:00
Alexey Bataev
755282ec1e [SLP][NFC]Move getExtractIndex function for future changes, NFC. 2023-01-09 09:53:01 -08:00
Craig Topper
1153313c33 [LocalStackSlotAllocation] Minor simplifications. NFC
Instead of maintaining a separate valid flag for BaseReg, use
BaseReg.isValid(). I think this is left over from an older
implementation that maintained a vector of base registers.

The other change is to not do a speculative assignment to BaseOffset
that needs to be reverted. Only commit it after we do the check.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D141153
2023-01-09 09:45:22 -08:00
Sanjay Patel
0eedc9e567 [InstCombine] bitrev (zext i1 X) --> select X, SMinC, 0
https://alive2.llvm.org/ce/z/ZXCtgi

This breaks the infinite combine loop for issue #59897,
but we may still need more changes to avoid those loops.
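Why the fold holds: `zext i1 X` is either 0 or 1, and reversing the bits of 1 moves that single bit to the top position, which is exactly the signed-minimum constant. A minimal C++ check, assuming a 32-bit destination type (the helper names are illustrative):

```cpp
#include <cassert>
#include <cstdint>

// Reverse the bit order of a 32-bit value, as llvm.bitreverse.i32 does.
uint32_t bitreverse32(uint32_t V) {
  uint32_t R = 0;
  for (int I = 0; I < 32; ++I) {
    R = (R << 1) | (V & 1); // move the current low bit of V into R
    V >>= 1;
  }
  return R;
}

// Original pattern: bitreverse(zext i1 X to i32).
uint32_t bitrevOfZext(bool X) { return bitreverse32(X ? 1u : 0u); }

// Folded pattern: select X, SMinC, 0 (SMinC is the i32 signed-min bit pattern).
uint32_t selectSMin(bool X) { return X ? UINT32_C(0x80000000) : 0u; }
```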
2023-01-09 12:27:37 -05:00
Ivan Kosarev
2d945ef864 [AMDGPU][NFC] Rename GFX10A16 operands.
They do not seem to be GFX10-specific anymore. Also renames the
corresponding feature.

Reviewed By: dp

Differential Revision: https://reviews.llvm.org/D141069
2023-01-09 17:18:46 +00:00
Jay Foad
6a6f62a719 [AMDGPU] S_MULK_I32 does not define SCC. NFCI.
Differential Revision: https://reviews.llvm.org/D141281
2023-01-09 15:44:56 +00:00
Nikita Popov
fd07583ca4 [ConstantRange] Fix single bit abs range (PR59887)
For a full range input, we would produce an empty range instead
of a full range. The change to the SMin.isNonNegative() branch is
an optimality fix, because we should account for the potentially
discarded SMin value in the IntMinIsPoison case.

Change TestUnaryOpExhaustive to test both 4 and 1 bits, to both
cover this specific case in unit tests, and make sure all other
unary operations deal with 1-bit inputs correctly.
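In a 1-bit type the only values are 0 and -1, and -1 is also the signed minimum, so abs(-1) wraps back to -1. A brute-force enumeration in the spirit of TestUnaryOpExhaustive (an independent sketch, not that unit test and not the ConstantRange API) shows that abs of the full 1-bit range is the full range rather than an empty one:

```cpp
#include <cassert>
#include <cstdint>
#include <set>

// Brute-force the set of results of abs() over every Bits-wide input,
// using two's-complement wraparound (abs(INT_MIN) == INT_MIN, i.e. the
// IntMinIsPoison == false flavour).
std::set<uint32_t> exhaustiveAbs(unsigned Bits) {
  assert(Bits >= 1 && Bits <= 16 && "keep the enumeration small");
  const uint32_t Mask = (1u << Bits) - 1;
  const uint32_t SignBit = 1u << (Bits - 1);
  std::set<uint32_t> Results; // results as unsigned bit patterns
  for (uint32_t V = 0; V <= Mask; ++V) {
    // Negate (modulo 2^Bits) exactly when the sign bit is set.
    uint32_t Abs = (V & SignBit) ? ((~V + 1) & Mask) : V;
    Results.insert(Abs);
  }
  return Results;
}
```

For 1 bit both patterns {0, 1} appear, i.e. the full range; for 4 bits the results are 0..7 plus the pattern 8 (that is, -8).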

Fixes https://github.com/llvm/llvm-project/issues/59887.
2023-01-09 16:34:09 +01:00
Sanjay Patel
2dcbd740ee [InstCombine] reduce smul.ov with i1 types to 'and'
https://alive2.llvm.org/ce/z/5tLkW6
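The fold follows from a small truth table: read as signed values, an i1 is either 0 or -1, and the only product outside [-1, 0] is (-1) * (-1) = 1. This sketch checks only the overflow bit against `and`; it does not model the rest of the intrinsic's result or the separate miscompile tracked in issue #59876.

```cpp
#include <cassert>

// i1 values read as signed integers: the bit 1 means -1, the bit 0 means 0.
int asSignedI1(bool Bit) { return Bit ? -1 : 0; }

// Overflow flag of a signed i1 multiply: the exact product must fit in [-1, 0].
bool smulOverflowI1(bool A, bool B) {
  int Product = asSignedI1(A) * asSignedI1(B);
  return Product < -1 || Product > 0;
}
```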

There's still a miscompile bug as shown in issue #59876 / D141214.
2023-01-09 10:27:15 -05:00
Sander de Smalen
8aff167b34 [AArch64][SME] Improve streaming-compatible codegen for extending loads/truncating stores.
This is another step in aligning addTypeForStreamingSVE with addTypeForFixedLengthSVE,
which also improves code quality for extending loads and truncating stores.

Reviewed By: hassnaa-arm

Differential Revision: https://reviews.llvm.org/D141266
2023-01-09 15:08:04 +00:00
Nikita Popov
59f91ddf90 [InstCombine] Preserve alignment in atomicrmw -> store fold
Preserve the alignment of the original atomicrmw, rather than using
the ABI alignment.

The same problem exists for loads, but that code is being removed
in D141277 anyway.
2023-01-09 15:37:24 +01:00
Alexey Baturo
35b8bb0ab3 [RISC-V][HWASAN] Don't explicitly load GOT entry to call hwasan mismatch routine
Reviewed by: luismarques

Differential Revision: https://reviews.llvm.org/D132994
2023-01-09 16:46:28 +03:00
David Green
90f24bef47 [ARM] Fold And/Or into CSel if possible
This is the ARM equivalent of D141119, where we fold `and x, (csel 0, 1, cc)`
to `csel ZR, x, cc` if we know that x is 0/1 and for `or x, (csel 0, 1, cc)`
emit `csinc x, ZR, cc`. The or pattern gets recognized from a cmov under Arm.

Differential Revision: https://reviews.llvm.org/D141137
2023-01-09 13:28:57 +00:00
Jamie Hill-Daniel
6b9317f52a [InstCombine] Fold zero check followed by decrement to usub.sat
Fold (a == 0) ? 0 : a - 1 into usub.sat(a, 1).
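An exhaustive check of this identity at 8 bits (the width is an arbitrary choice for the sketch; `usubSat` is a hand-written model of the intrinsic's semantics, not an LLVM API):

```cpp
#include <cassert>
#include <cstdint>

// The pattern before the fold: (a == 0) ? 0 : a - 1.
uint8_t zeroCheckThenDec(uint8_t A) {
  return A == 0 ? 0 : static_cast<uint8_t>(A - 1);
}

// Unsigned saturating subtraction, the semantics of llvm.usub.sat.
uint8_t usubSat(uint8_t A, uint8_t B) {
  return A > B ? static_cast<uint8_t>(A - B) : 0;
}
```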

Differential Revision: https://reviews.llvm.org/D140798
2023-01-09 14:22:25 +01:00
Nikita Popov
9afb6360fc [Attributes] Avoid duplicate hasAttribute() query (NFC)
removeAttribute() already performs a hasAttribute() check, so no
need to also do it in the caller. Instead check whether the
attribute set was changed.

This makes the implementations in line with removeAttributesAtIndex().
2023-01-09 12:59:16 +01:00
David Green
07d6af6a71 [AArch64] Fold And/Or into CSel if possible
If we have `and x, (csel 0, 1, cc)` and we know that x is 0/1, then we
can emit a `csel ZR, x, cc`. Similarly for `or x, (csel 0, 1, cc)` we
can emit `csinc x, ZR, cc`. This can help where we cannot otherwise
generate ccmp instructions.
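The two rewrites can be checked with a small model of the A64 instructions, assuming the usual semantics `csel Rn, Rm, cc` = cc ? Rn : Rm and `csinc Rn, Rm, cc` = cc ? Rn : Rm + 1 (destination register omitted, as in the message). The helpers are illustrative only:

```cpp
#include <cassert>
#include <cstdint>

// A64 conditional select: Cond ? N : M.
uint64_t csel(uint64_t N, uint64_t M, bool Cond) { return Cond ? N : M; }

// A64 conditional select increment: Cond ? N : M + 1.
uint64_t csinc(uint64_t N, uint64_t M, bool Cond) { return Cond ? N : M + 1; }

constexpr uint64_t ZR = 0; // the zero register always reads as 0
```

The `and` case relies on x being 0 or 1: for x = 2 with cc false, `and x, 1` is 0 while `csel ZR, x, cc` yields 2, which is why the fold needs that known-bits fact.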

Differential Revision: https://reviews.llvm.org/D141119
2023-01-09 11:52:37 +00:00
Noah Goldstein
6d839621da [InstCombine] Canonicalize (A & B_Pow2) eq/ne B_Pow2 patterns
1. A & B_Pow2 != B_Pow2 -> A & B_Pow2 == 0
   https://alive2.llvm.org/ce/z/KVUej4

2. A & B_Pow2 == B_Pow2 -> A & B_Pow2 != 0
   https://alive2.llvm.org/ce/z/PVv9FR

This allows the patterns to more easily be analyzed elsewhere.
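Both rewrites rest on one observation: when B_Pow2 has a single bit set, A & B_Pow2 can only be 0 or B_Pow2, so "not equal to B_Pow2" and "equal to 0" coincide. An exhaustive 8-bit sketch of that argument (helper names are illustrative):

```cpp
#include <cassert>
#include <cstdint>

bool isPow2(uint8_t B) { return B != 0 && (B & (B - 1)) == 0; }

// Exhaustively check both canonicalizations for 8-bit values.
bool checkPow2Canonicalization() {
  for (unsigned A = 0; A < 256; ++A)
    for (unsigned B = 0; B < 256; ++B) {
      if (!isPow2(static_cast<uint8_t>(B)))
        continue;
      unsigned Masked = A & B;
      // 1. (A & B) != B  <=>  (A & B) == 0
      if ((Masked != B) != (Masked == 0))
        return false;
      // 2. (A & B) == B  <=>  (A & B) != 0
      if ((Masked == B) != (Masked != 0))
        return false;
    }
  return true;
}
```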

Differential Revision: https://reviews.llvm.org/D141090
2023-01-09 12:48:28 +01:00
Ben Mudd
1f11d1bd12 [DebugInfo] Fix jump threading failing to update cloned dbg.values
This patch fixes dbg.values duplicated by the JumpThreading pass, which
pointed to the variable in the original block instead of their local
(cloned) value.
JumpThreadingPass::cloneInstructions is changed so that it handles metadata
as well as normal cloned values.
Reviewed By: jmorse, StephenTozer

Differential Revision: https://reviews.llvm.org/D140006
2023-01-09 11:42:33 +00:00
Tim Northover
5b24d42106 TailDuplication: do not remove trivial PHIs from addr-taken blocks.
Unlike an anonymous block, it will not be removed even though we've resolved
all valid paths to get here. So removing a PHI can leave vregs with no
definition, violating SSA. Instead, this converts it to an IMPLICIT_DEF.
2023-01-09 11:12:33 +00:00
Nikita Popov
b24909c4c7 [Attributes] Avoid repeated attribute set lookup (NFC)
Perform the hasAttribute() check on the AttributeSet we need to
fetch anyway, rather than going through hasAttributeAtIndex().
2023-01-09 11:58:06 +01:00
Noah Goldstein
e6375ca6dc [InstCombine] Fix potentially buggy code in ((%x & C) == 0) --> %x u< (-C) transform
While demanded bits constant shrinking appears to prevent this in
practice right now, it is in principle possible for C2 to have
set bits that are known not-needed (zeroable). See: D140858

`+` can overflow here, whereas `|` gives the right result.

Differential Revision: https://reviews.llvm.org/D141089
2023-01-09 11:44:11 +01:00
Dmitri Gribenko
05d722a11d [llvm] Fix an "unused variable" warning when assertions are disabled 2023-01-09 11:33:05 +01:00
Markus Böck
90b5afeb65 [TableGen][SourceMgr] Fix obvious mistake in D141220
D141220 made it try to open the IncludedFile instead of the Filename, which was not intended.
2023-01-09 11:16:56 +01:00
Sander de Smalen
17a1936122 [AArch64] NFC: Align addTypeForStreamingSVE and addTypeForFixedLengthSVE
This patch is NFC and just moves things around so their implementation is very similar.
2023-01-09 09:47:33 +00:00
Thomas Symalla
6c1cf201be [NFC] Missing whitespace in SSAUpdaterBulk debug output.
Adds a whitespace in a debug message before printing out a
value in the SSAUpdaterBulk.
Without this, debugging can end up a bit cumbersome.

Differential Revision: https://reviews.llvm.org/D141262
2023-01-09 10:15:25 +01:00
Max Kazantsev
957952dbf2 [JumpThreading] Preserve profile metadata during select unfolding
Jump threading can replace a select and an unconditional branch with a
conditional branch, but doing so loses profile information.

This destructive transform can eventually lead to a performance
degradation due to folding of branches in
shouldFoldCondBranchesToCommonDestination as branch probabilities
are no longer known.

Patch by Roman Paukner!

Differential Revision: https://reviews.llvm.org/D138132
Reviewed By: mkazantsev
2023-01-09 16:14:58 +07:00
Stephen Tozer
da0faa0594 [DebugInfo] Produce variadic DBG_INSTR_REFs from ISel
This patch modifies SelectionDAG and FastISel to produce DBG_INSTR_REFs with
variadic expressions, and produce DBG_INSTR_REFs for debug values with variadic
location expressions. The former essentially means just prepending
DW_OP_LLVM_arg, 0 to the existing expression. The latter is achieved in
MachineFunction::finalizeDebugInstrRefs and InstrEmitter::EmitDbgInstrRef.

Reviewed By: jmorse, Orlando

Differential Revision: https://reviews.llvm.org/D133929
2023-01-09 08:58:33 +00:00
Guillaume Chatelet
59b029238a [reland][NFC] Vastly simplifies TypeSize
Simplifies the implementation of `TypeSize` while retaining its interface.
There is no need for abstract concepts like `LinearPolyBase`, `UnivariateLinearPolyBase` or `LinearPolySize`.

Differential Revision: https://reviews.llvm.org/D140263
2023-01-09 08:43:37 +00:00
zhongyunde
9e83333445 [AArch64][SelectionDAG] Eliminates redundant zero-extension for 32-bit popcount
Fix https://github.com/llvm/llvm-project/issues/59597.
mov w8, w0 + fmov d0, x8 ==> fmov s0, w0

Reviewed By: dmgreen, efriedma

Differential Revision: https://reviews.llvm.org/D140649
2023-01-09 16:08:16 +08:00
Markus Böck
15692e7487 [TableGen][SourceMgr] Correctly append filename to include directories
The current implementation unconditionally appends the system path separator with the filename to the include directory. This is not correct in edge cases however, such as when specifying `/` as include directory (on Unix systems) or just `\` on Windows.
This patch fixes that by using `sys::path::append`, which already has the required logic to correctly implement this.
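The difference can be sketched with two hypothetical helpers that only know about `/` as a separator; the patch itself uses `sys::path::append`, not anything like these functions.

```cpp
#include <cassert>
#include <string>

// Always inserting a separator: "/" + "/" + "a.td" yields "//a.td".
std::string naiveJoin(const std::string &Dir, const std::string &File) {
  return Dir + "/" + File;
}

// Only insert a separator when Dir does not already end with one.
std::string joinPath(const std::string &Dir, const std::string &File) {
  if (Dir.empty())
    return File;
  if (Dir.back() == '/')
    return Dir + File;
  return Dir + "/" + File;
}
```

The Windows case described below is the same mistake with `\`: a root include directory `\` turns into a `\\<dir>` network path.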

While this is technically only a change in the `SourceMgr` class, I think the main user of that class, and the include mechanism, is TableGen.
No test is attached because no behavioral difference is observable without trying to access the root directory of the user's filesystem.

The motivation for this change is a rather funny story, as this actually fixes a performance problem when running `check-mlir` on Windows.
Some tests for `mlir-pdll-lsp-server` led to adding `\` as an include directory in TableGen (which is a valid absolute path on Windows!). Due to the unconditional append, the created filepath would then be of the form `\\<dir>\...`, which is also a valid path on Windows, but is a network path. On my machine it would then attempt to access the network, find a machine named `<dir>`, and look for the file there. This call would take several seconds, leading to some tests in `mlir-pdll-lsp-server` taking 2 minutes on my machine.

Running `check-mlir` after this patch reduces the runtime on my machine from 161 seconds to 6 seconds.

Differential Revision: https://reviews.llvm.org/D141220
2023-01-09 08:47:09 +01:00
Max Kazantsev
ba7af0bf69 [NFC] Add missing 'static' notion in createReplacement 2023-01-09 14:13:05 +07:00
Serguei Katkov
fd64bd94ed [Inline Spiller] Extend the snippet by statepoint uses
A snippet is a tiny live interval which has a copy- or fill-like def
and a copy- or spill-like use at the end (either of them may be absent).

A snippet has only one use/def inside the interval, and the interval is
located in one basic block.

When the inline spiller spills some reg around uses, it also forces the
spilling of connected snippets: those obtained by splitting the
same original reg whose def is a full copy of our reg or whose
last use is a full copy to our reg.

The definition of a snippet is extended to allow not only one use/def
but more. However, all the other uses are statepoint instructions, which
fold the fill into their operand. That way we do not introduce new fills/spills.

Reviewed By: qcolombet, dantrushin
Differential Revision: https://reviews.llvm.org/D138093
2023-01-09 13:30:57 +07:00
liqinweng
1f8746cc80 [RISCV][CostModel] Add half type support for the cost model of sqrt/fabs
1. Refactor for costs of sqrt/fabs
2. Add half type support for the cost model of sqrt/fabs

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D132908
2023-01-09 12:57:03 +08:00
liqinweng
f3408739da [RISCV][CostModel] Add cost model for integer abs
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D132999
2023-01-09 11:38:24 +08:00
chenglin.bi
33794cffcf [InstCombine] Fold logic-and/logic-or by distributive laws part2
Follow-up to https://reviews.llvm.org/D139408; supports `and/or+select` patterns:
X && Z || Y && Z --> (X || Y) && Z
https://alive2.llvm.org/ce/z/EMCkBG
https://alive2.llvm.org/ce/z/Q-YRvr
https://alive2.llvm.org/ce/z/SFkVQc
https://alive2.llvm.org/ce/z/S9MCuJ
https://alive2.llvm.org/ce/z/KZ7zzz

(X || Z) && (Y || Z) --> (X && Y) || Z
https://alive2.llvm.org/ce/z/Ggpa8-
https://alive2.llvm.org/ce/z/nhQRLY
https://alive2.llvm.org/ce/z/zpmEnq
https://alive2.llvm.org/ce/z/7omsrf
https://alive2.llvm.org/ce/z/CWBzBp
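As a quick sanity check, both rewrites hold as plain boolean identities. This sketch only covers that truth table; the `select`-based forms also have to respect poison semantics, which is what the alive2 proofs above are for.

```cpp
#include <cassert>

// Check both distributive rewrites over all eight boolean inputs.
bool checkDistributiveLaws() {
  for (int Bits = 0; Bits < 8; ++Bits) {
    bool X = Bits & 1, Y = Bits & 2, Z = Bits & 4;
    // X && Z || Y && Z  -->  (X || Y) && Z
    if (((X && Z) || (Y && Z)) != ((X || Y) && Z))
      return false;
    // (X || Z) && (Y || Z)  -->  (X && Y) || Z
    if (((X || Z) && (Y || Z)) != ((X && Y) || Z))
      return false;
  }
  return true;
}
```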

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D139630
2023-01-09 10:21:17 +08:00
Kazu Hirata
e0e48187e6 [CodeGen] Fix a warning
This patch fixes:

  llvm/lib/CodeGen/AssignmentTrackingAnalysis.cpp:1220:13: error:
  unused function 'locStr' [-Werror,-Wunused-function]
2023-01-08 16:31:45 -08:00
Fangrui Song
8f7e674771 [AVR] Support .reloc directive
Reviewed By: benshi001

Differential Revision: https://reviews.llvm.org/D141176
2023-01-08 23:27:21 +00:00
Shilei Tian
acd22b2751 [AAUnderlyingObjects] Introduce an AA for getting underlying objects of a pointer
This patch introduces a new AA `AAUnderlyingObjects`. It is essentially a wrapper
AA around the function `AA::getAssumedUnderlyingObjects`, but it can recursively
query through an underlying object that is an indirect access, such as a phi node
or a select instruction.
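The recursive lookup through phis and selects can be pictured with a toy graph walk. Everything here (`Node`, `collectUnderlying`) is a hypothetical model, not the Attributor API: a pointer is either a concrete object or an indirection over other pointers, and a visited set stops cycles through phis.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical, simplified IR node: a concrete object, or a phi / select
// whose Operands are the incoming values (phi) or the two arms (select).
struct Node {
  enum Kind { Object, Phi, Select } K;
  std::string Name;
  std::vector<const Node *> Operands;
};

// Collect the concrete objects a pointer may refer to, looking through
// phis and selects recursively.
void collectUnderlying(const Node *N, std::set<const Node *> &Visited,
                       std::set<std::string> &Objects) {
  if (!Visited.insert(N).second)
    return; // already seen; phis can form cycles
  if (N->K == Node::Object) {
    Objects.insert(N->Name);
    return;
  }
  for (const Node *Op : N->Operands)
    collectUnderlying(Op, Visited, Objects);
}
```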

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D141164
2023-01-08 16:45:50 -05:00
Ayke van Laethem
9592920890 [AVR] Optimize 32-bit shifts: optimize REG_SEQUENCE
This pseudo-instruction stores two small (8-bit) registers into one wide
(16-bit) register. But apparently the order matters a lot to the
register allocator.
This patch changes the order of inserting the registers to optimize for
the best register allocation in the tests of shift32.ll. It might be
detrimental in other cases, but keeping the registers in the same
physical register seems like it would be a common case.

Differential Revision: https://reviews.llvm.org/D140573
2023-01-08 20:05:31 +01:00
Ayke van Laethem
fad5e0cf50 [AVR] Optimize 32-bit shifts: reverse shift + move
This optimization turns shifts of almost a multiple of 8 into a shift
into the opposite direction. Unfortunately it doesn't compose well with
the other optimizations (I've tried) so it's separate from them.

Differential Revision: https://reviews.llvm.org/D140572
2023-01-08 20:05:31 +01:00
Ayke van Laethem
81f5f22f27 [AVR] Optimize 32-bit shifts: shift by 4 bits
This uses a complicated shift sequence that avr-gcc also uses, but
extended to work over any number of bytes and in both directions
(logical shift left and logical shift right). Unfortunately it can't be
used for an arithmetic shift right: I've tried to come up with a
sequence but couldn't.
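For readers unfamiliar with the trick: AVR's `SWAP` exchanges the two nibbles of a byte, so a 4-bit shift across a register pair can be done with swaps, masks and XORs instead of four single-bit shifts. The C++ below models one well-known sequence for a 16-bit logical left shift by 4 (swap, swap, andi, eor, andi, eor); it is a reconstruction of the general idea, not necessarily the exact sequence the patch emits for 32-bit values.

```cpp
#include <cassert>
#include <cstdint>

// Models AVR's SWAP instruction: exchange the high and low nibbles of a byte.
uint8_t swapNibbles(uint8_t B) {
  return static_cast<uint8_t>((B << 4) | (B >> 4));
}

// 16-bit logical shift left by 4 using only byte-wide swap/andi/eor steps.
uint16_t shl4ViaSwap(uint16_t V) {
  uint8_t Hi = static_cast<uint8_t>(V >> 8), Lo = static_cast<uint8_t>(V);
  Hi = swapNibbles(Hi); // swap  hi
  Lo = swapNibbles(Lo); // swap  lo
  Hi &= 0xF0;           // andi  hi, 0xf0
  Hi ^= Lo;             // eor   hi, lo
  Lo &= 0xF0;           // andi  lo, 0xf0
  Hi ^= Lo;             // eor   hi, lo  (cancels the low byte's high nibble)
  return static_cast<uint16_t>((Hi << 8) | Lo);
}
```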

Differential Revision: https://reviews.llvm.org/D140571
2023-01-08 20:05:31 +01:00
Ayke van Laethem
8f8afabd32 [AVR] Optimize 32-bit shift: move bytes around
This patch optimizes 32-bit constant shifts by renaming registers. This
is very effective as the compiler would otherwise need to do a lot of
single bit shift instructions. Instead, the registers are renamed at the
SSA level which means the register allocator will insert the necessary
mov instructions.

Unfortunately, the register allocator will insert some unnecessary movs
with the current code. This will be fixed in a later patch.
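The renaming idea can be modelled by treating a 32-bit value as four byte-sized registers: a shift by a multiple of 8 only changes which byte goes where, with zeros in the vacated positions. The helper names below are hypothetical; in the patch the renaming happens at the SSA level and the register allocator materializes any `mov`s that are needed.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// A 32-bit value held in four byte-sized "registers", least significant first.
using Bytes = std::array<uint8_t, 4>;

Bytes splitBytes(uint32_t V) {
  return {static_cast<uint8_t>(V), static_cast<uint8_t>(V >> 8),
          static_cast<uint8_t>(V >> 16), static_cast<uint8_t>(V >> 24)};
}

uint32_t joinBytes(const Bytes &B) {
  return uint32_t(B[0]) | uint32_t(B[1]) << 8 | uint32_t(B[2]) << 16 |
         uint32_t(B[3]) << 24;
}

// Logical shift left by 8 * N bits: byte I of the result is byte I - N of the
// input, and the vacated low bytes are zero. No bit-level shifting happens.
Bytes shlByBytes(const Bytes &In, unsigned N) {
  assert(N <= 4 && "shift amount must be at most 32 bits");
  Bytes Out{}; // value-initialized: all zero
  for (unsigned I = N; I < 4; ++I)
    Out[I] = In[I - N];
  return Out;
}
```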

Differential Revision: https://reviews.llvm.org/D140570
2023-01-08 20:05:31 +01:00
Ayke van Laethem
840d10a1d2 [AVR] Custom lower 32-bit shift instructions
32-bit shift instructions were previously expanded using the default
SelectionDAG expander, which meant it used 16-bit constant shifts and
ORed them together. This works, but is far from optimal.

I've optimized 32-bit shifts on AVR using a custom inserter. This is
done using three new pseudo-instructions that take the upper and lower
bits of the value in two separate 16-bit registers and output two
16-bit registers.
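As a rough model of the starting point, a 32-bit value held in two 16-bit registers can be shifted left by a constant below 16 using 16-bit shifts and an OR, which is the kind of sequence the default expansion produced. `shl32Via16` is a hypothetical helper; it models the arithmetic only, not the SelectionDAG expander or the new pseudo-instructions.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// A 32-bit value split across two 16-bit registers: {Hi, Lo}.
using Pair16 = std::pair<uint16_t, uint16_t>;

// Logical shift left by a constant Amt (1..15) using only 16-bit shifts and OR.
Pair16 shl32Via16(Pair16 V, unsigned Amt) {
  assert(Amt >= 1 && Amt <= 15 && "larger amounts move whole halves instead");
  uint16_t Hi = V.first, Lo = V.second;
  // Bits shifted out of Lo are carried into the bottom of Hi.
  uint16_t NewHi = static_cast<uint16_t>((Hi << Amt) | (Lo >> (16 - Amt)));
  uint16_t NewLo = static_cast<uint16_t>(Lo << Amt);
  return {NewHi, NewLo};
}
```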

This is the first commit in a series. When completed, constant 32-bit
shifts will take around 31% fewer instructions on average, and are in all
cases equal to or better than the old behavior. They also tend to match or
outperform avr-gcc: the only cases where avr-gcc does better are when it
uses a loop to shift, or when the LLVM register allocator inserts some
unnecessary movs. LLVM even outperforms avr-gcc in some cases where avr-gcc
does not use a loop.

As a side effect, non-constant 32-bit shifts also become more efficient.

For some real-world differences: the build of compiler-rt I use in
TinyGo becomes 2.7% smaller and the build of picolibc I use becomes 0.9%
smaller. I think picolibc is a better representation of real-world code,
but even a ~1% reduction in code size is really significant.

The current patch just lays the groundwork. The result is actually a
regression in code size. Later patches will use this as a basis to
optimize these shift instructions.

Differential Revision: https://reviews.llvm.org/D140569
2023-01-08 20:05:31 +01:00
Ayke van Laethem
0408b131eb [SelectionDAG][AVR] Add support for lrint and lround intrinsics
Integer legalization already supported splitting the output integer of
llround and llrint, but did not support this for lround and lrint yet.
This is not a problem for 32-bit architectures, but for 8/16-bit
architectures like AVR it results in a crash like this:

    ExpandIntegerResult #0: t7: i32 = lround t6

    LLVM ERROR: Do not know how to expand the result of this operator!

This patch simply adds lrint/lround to the list of ISD opcodes to expand.

Fixes https://github.com/llvm/llvm-project/issues/59573.

Differential Revision: https://reviews.llvm.org/D140822
2023-01-08 18:56:07 +01:00
Ayke van Laethem
167338de96 [AVR] correctly declare __do_copy_data and __do_clear_bss
These two symbols are declared in object files to indicate whether .data
needs to be copied from flash or .bss needs to be cleared. They are
supported on avr-gcc and reduce firmware size a bit, which is especially
important on very small chips.

I checked the behavior of avr-gcc and matched it as well as possible.
From my investigation, it seems to work as follows:

__do_copy_data is set when the compiler finds a data symbol:
  * without a section name
  * with a section name starting with ".data" or ".gnu.linkonce.d"
  * with a section name starting with ".rodata" or ".gnu.linkonce.r" and
    flash and RAM are in the same address space

__do_clear_bss is set when the compiler finds a data symbol:
  * without a section name
  * with a section name that starts with .bss

Simply checking whether the calculated section name starts with ".data",
".rodata" or ".bss" should result in the same behavior.

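The avr-gcc rules listed above translate fairly directly into code. The sketch below is a hypothetical helper (`noteDataSymbol` and `AvrDataFlags` are not LLVM names) that follows the listed rules rather than the simpler prefix check the patch settles on; an empty `Section` stands for a symbol without a section name.

```cpp
#include <cassert>
#include <string>

struct AvrDataFlags {
  bool DoCopyData = false; // emit a reference to __do_copy_data
  bool DoClearBss = false; // emit a reference to __do_clear_bss
};

static bool startsWith(const std::string &S, const std::string &Prefix) {
  return S.compare(0, Prefix.size(), Prefix) == 0;
}

// Update the flags for one data symbol, following the rules listed above.
void noteDataSymbol(const std::string &Section,
                    bool FlashAndRamShareAddressSpace, AvrDataFlags &Flags) {
  if (Section.empty()) { // no section name: both rules apply
    Flags.DoCopyData = Flags.DoClearBss = true;
    return;
  }
  if (startsWith(Section, ".data") || startsWith(Section, ".gnu.linkonce.d"))
    Flags.DoCopyData = true;
  if (FlashAndRamShareAddressSpace &&
      (startsWith(Section, ".rodata") || startsWith(Section, ".gnu.linkonce.r")))
    Flags.DoCopyData = true;
  if (startsWith(Section, ".bss"))
    Flags.DoClearBss = true;
}
```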
Fixes: https://github.com/llvm/llvm-project/issues/58857

Differential Revision: https://reviews.llvm.org/D140830
2023-01-08 18:56:06 +01:00
Benjamin Kramer
91487b2481 [X86][Disassembler][NFCI] Read bytes with support::endian::read 2023-01-08 18:19:49 +01:00
Sanjay Patel
21d3871b7c [InstCombine] fold not-shift of signbit to icmp+zext, part 2
Follow-up to:
6c39a3aae1dc

That converted a pattern with ashr directly to icmp+zext, and
this updates the pattern that we used to convert to.

This canonicalizes to icmp for better analysis in the minimum case
and shortens patterns where the source type is not the same as dest type:
https://alive2.llvm.org/ce/z/tpXJ64
https://alive2.llvm.org/ce/z/dQ405O

This requires an adjustment to an icmp transform to avoid infinite looping.
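The message does not spell out the exact shift pattern, so the following is an assumption: one plausible instance of a "not-shift of signbit" is a logical right shift of `~X` by BitWidth-1, which isolates the complemented sign bit and therefore equals `zext (icmp sgt X, -1)`. A 32-bit sketch of that equivalence (helper names are illustrative):

```cpp
#include <cassert>
#include <cstdint>

// Shift form (assumed shape): lshr (not X), 31 keeps only the inverted sign bit.
uint32_t notShiftSignbit(int32_t X) { return ~static_cast<uint32_t>(X) >> 31; }

// Compare form: zext (icmp sgt X, -1), i.e. 1 exactly when X is non-negative.
uint32_t zextIcmpSgt(int32_t X) { return X > -1 ? 1u : 0u; }
```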
2023-01-08 12:04:09 -05:00
Benjamin Kramer
b6942a2880 [NFC] Hide implementation details in anonymous namespaces 2023-01-08 17:37:02 +01:00