We want to use profile inference (**profi**) in BOLT for stale profile matching.
To this end, I am making a few changes modifying the interface of the algorithm.
This is the first change for existing usages of profi (e.g., CSSPGO):
- introducing an object holding the algorithmic parameters;
- some renaming of existing options;
- dropped unused option, SampleProfileInferEntryCount, as we don't plan to change its default value;
- no changes in the output / tests.
Reviewed By: hoy
Differential Revision: https://reviews.llvm.org/D134756
Local info is supposed to be emitted in the start of every function.
When there are locals, `.local` section should be present, and we emit
local info according to the section.
If there is no locals, empty local info should be emitted. This empty
local info is emitted whenever a first instruction is emitted within a
function without encountering a `.local` section. If there is no
instruction, `end_function` pseudo instruction should be present and the
empty local info will be emitted when parsing the pseudo instruction.
The following assembly is malformed because the function `test` doesn't
have an `end_function` at the end, and the parser doesn't end up
emitting the empty local info needed. But currently we don't error out
and silently produce an invalid binary.
```
.functype test () -> ()
test:
```
This patch adds one extra state to the Wasm assembly parser,
`FunctionLabel` to detect whether a function label is parsed but not
ended properly when the next function starts or the file ends.
It is somewhat tricky to distinguish `FunctionLabel` and
`FunctionStart`, because it is not always possible to ensure the state
goes from `FunctionLabel` -> `FunctionStart`. `.functype` directive does
not seem to be mandated before a function label, in which case we don't
know if the label is a function at the time of parsing. But when we do
know the label is function, we would like to ensure it ends with an
`end_function` properly. Also we would like to error out when it does
not.
For example,
```
.functype test() -> ()
test:
```
We should error out for this because we know `test` is a function and it
doesn't end with an `end_function`. This PR fixes this.
```
test:
```
We don't error out for this because there is no info that `test` is a
function, so we don't know whether there should be an `end_function` or
not.
```
test:
.functype test() -> ()
```
We error out for this currently already, because we currently switch to
`FunctionStart` state when we first see `.functype` directive after its
label definition.
Fixes https://github.com/llvm/llvm-project/issues/57427.
Reviewed By: sbc100
Differential Revision: https://reviews.llvm.org/D141103
Instead of maintaining a separate valid flag for BaseReg, Use
BaseReg.isValid(). I think this is left over from an older
implementation that maintained a vector of base registers.
The other change is not do a speculative assignment to BaseOffset
that needs to be reverted. Only commit it after we do the check.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D141153
They do not seem to be GFX10-specific anymore. Also renames the
corresponding feature.
Reviewed By: dp
Differential Revision: https://reviews.llvm.org/D141069
There have been multiple cases where range calculations were wrong
in the 1 bit case. Make sure we catch these by not specifying the
bit width explicitly, and letting the test framework pick it (which
will now always test 1 and 4 bits both).
For a full range input, we would produce an empty range instead
of a full range. The change to the SMin.isNonNegative() branch is
an optimality fix, because we should account for the potentially
discarded SMin value in the IntMinIsPoison case.
Change TestUnaryOpExhaustive to test both 4 and 1 bits, to both
cover this specific case in unit tests, and make sure all other
unary operations deal with 1-bit inputs correctly.
Fixes https://github.com/llvm/llvm-project/issues/59887.
Add a test case where some ops of a reassociate-able expression are in
an earlier block.
This can appear in practice, e.g. when computing the final reduction
value after vectorization.
This is another step in aligning addTypeForStreamingSVE with addTypeForFixedLengthSVE,
which also improves code quality for extending loads and truncating stores.
Reviewed By: hassnaa-arm
Differential Revision: https://reviews.llvm.org/D141266
Preserve the alignment of the original atomicrmw, rather than using
the ABI alignment.
The same problem exists for loads, but that code is being removed
in D141277 anyway.
This is the ARM equivalent of D141119, where we fold `and x, (csel 0, 1, cc)`
to `csel ZR, x, cc` if we know that x is 0/1 and for `or x, (csel 0, 1, cc)`
emit `csinc x, ZR, cc`. The or pattern gets recognized from a cmov under Arm.
Differential Revision: https://reviews.llvm.org/D141137
removeAttribute() already performs a hasAttribute() check, so no
need to also do it in the caller. Instead check whether the
attribute set was changed.
This makes the implementations in line with removeAttributesAtIndex().
If we have `and x, (csel 0, 1, cc)` and we know that x is 0/1, then we
can emit a `csel ZR, x, cc`. Similarly for `or x, (csel 0, 1, cc)` we
can emit `csinc x, ZR, cc`. This can help where we can not otherwise
general ccmp instructions.
Differential Revision: https://reviews.llvm.org/D141119
This is a patch to fix duplicated dbg.values in the JumpThreading pass not
pointing towards their local value, and instead towards the variable in the
original block.
JumpThreadingPass::cloneInstructions is the changed function to target metadata
as well as normal cloned values.
Reviewed By: jmorse, StephenTozer
Differential Revision: https://reviews.llvm.org/D140006
Unlike an anonymous block, it will not be removed even though we've resolved
all valid paths to get here. So removing a PHI can leave vregs with no
definition, violating SSA. Instead, this converts it to an IMPLICIT_DEF.
While demanded bits constant shrinking appears to prevent this in
practice right now, it is principally possible for C2 to have
set bits that are known not-needed (zeroable). See: D140858
`+` will overflow here, `|` will get the right logic.
Differential Revision: https://reviews.llvm.org/D141089
Adds a whitespace in a debug message before printing out a
value in the SSAUpdaterBulk.
Without this, debugging can end up a bit cumbersome.
Differential Revision: https://reviews.llvm.org/D141262
Jump threading can replace select and unconditional branch with
conditional branch, but when doing so loses profile information.
This destructive transform can eventually lead to a performance
degradation due to folding of branches in
shouldFoldCondBranchesToCommonDestination as branch probabilities
are no longer known.
Patch by Roman Paukner!
Differential Revision: https://reviews.llvm.org/D138132
Reviewed By: mkazantsev
This patch modifies SelectionDAG and FastISel to produce DBG_INSTR_REFs with
variadic expressions, and produce DBG_INSTR_REFs for debug values with variadic
location expressions. The former essentially means just prepending
DW_OP_LLVM_arg, 0 to the existing expression. The latter is achieved in
MachineFunction::finalizeDebugInstrRefs and InstrEmitter::EmitDbgInstrRef.
Reviewed By: jmorse, Orlando
Differential Revision: https://reviews.llvm.org/D133929
Simplifies the implementation of `TypeSize` while retaining its interface.
There is no need for abstract concepts like `LinearPolyBase`, `UnivariateLinearPolyBase` or `LinearPolySize`.
Differential Revision: https://reviews.llvm.org/D140263
The current implementation unconditionally appends the system path separator with the filename to the include directory. This is not correct in edge cases however, such as when specifying `/` as include directory (on Unix systems) or just `\` on Windows.
This patch fixes that by using `sys::path::append`, which already has the required logic to correctly implement this.
While this is technically only a change in the `SourceMgr` class, I think the main user of that class, and the include mechanism, is TableGen.
No test attached because no behavioral difference is observable without trying to access the root directory of the users filesystem.
The motivation for this change is a rather funny story, as this actually fixes a performance problem when running `check-mlir` on Windows.
Some tests for `mlir-pdll-lsp-server` lead to adding `\` as include directory in TableGen (which is a valid absolute path on Windows!). Due to the unconditional append, the created filepath would then be of the form `\\<dir>\...` which is also a valid path on Windows, but is a network path. On my machine it'd then attempt to access the network and find a machine with the name `<dir>` and the file there. This call would take several seconds, leading to some tests in `mlir-pdll-lsp-server` taking 2 minutes on my machine.
Running `check-mlir` after this patch reduces the runtime on my machine from 161 seconds to 6 seconds.
Differential Revision: https://reviews.llvm.org/D141220
Snippet is a tiny live interval which has copy or fill like def
and copy or spill like use at the end (any of them might abcent).
Snippet has only one use/def inside interval and interval is located
in one basic block.
When inline spiller spills some reg around uses it also forces the
spilling of connected snippets those which got by splitting the
same original reg and its def is a full copy of our reg or its
last use is a full copy to our reg.
The definition of snippet is extended to allow not only one use/def
but more. However all other uses are statepoint instructions which will
fold fill into its operand. That way we do not introduce new fills/spills.
Reviewed By: qcolombet, dantrushin
Differential Revision: https://reviews.llvm.org/D138093
1. Refactor for costs of sqrt/fabs
2. Add half type support for the cost model of sqrt/fabs
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D132908