The identifiers major and minor are often already taken on POSIX systems
because <sys/types.h> defines them as part of the makedev family of
functions.
This causes compilation failures on FreeBSD and on Linux systems with
glibc < 2.28.
This change renames the identifiers to major_/minor_.
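To see why a trailing underscore helps, here is a small sketch. It assumes, as with older glibc, that `major` and `minor` are function-like macros. Such macros only expand when the name is followed by `(`, so a member initializer written as `major(Major)` would be rewritten by the preprocessor, while `major_(Major)` is left alone. The macro bodies and the `Version` struct below are illustrative stand-ins, not the real header definitions or the patched code.

```cpp
// Stand-ins for the function-like macros some <sys/types.h> headers expose
// (simplified for illustration; the real definitions differ).
#define major(dev) (((dev) >> 8) & 0xfff)
#define minor(dev) ((dev) & 0xff)

// Hypothetical struct: with members named `major`/`minor`, the initializer
// list `major(Major), minor(Minor)` would be macro-expanded and fail to
// compile. The trailing underscore sidesteps the expansion.
struct Version {
  unsigned major_;
  unsigned minor_;
  Version(unsigned Major, unsigned Minor) : major_(Major), minor_(Minor) {}
};
```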
Differential Revision: https://reviews.llvm.org/D156683
Fixes a regression introduced by D75362 for irreducible control
flow. In that case, we may only later visit the predecessor that
makes the current block live, and so incorrectly conclude that
the block is dead.
Instead, switch to the same DeadEdges-based implementation that
we also use during the main InstCombine iteration.
This temporarily regresses some cases that need dead phi operands
to be replaced with poison, which is currently only done during
the main run and not during worklist population. This will be addressed
in a followup, to keep it separate from the correctness fix here.
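A minimal sketch of why an edge-based, worklist-driven liveness computation is robust to irreducible control flow: a block becomes live as soon as any live edge reaches it, regardless of the order in which blocks are visited. The `Block` type and `computeLiveBlocks` are hypothetical toy constructs, not InstCombine's actual data structures; a branch on a known constant is modeled by `onlyTakenSucc`.

```cpp
#include <set>
#include <vector>

// Toy CFG node (hypothetical). If the terminator branches on a known constant,
// onlyTakenSucc names the single successor that can be reached; the other
// outgoing edges are dead.
struct Block {
  std::vector<int> succs;
  int onlyTakenSucc = -1;
};

// Forward worklist over edges starting at the entry block (index 0).
// A block is live iff some non-dead edge from a live block reaches it,
// so visiting order does not matter, even for irreducible cycles.
std::set<int> computeLiveBlocks(const std::vector<Block> &cfg) {
  std::set<int> live{0};
  std::vector<int> worklist{0};
  while (!worklist.empty()) {
    int b = worklist.back();
    worklist.pop_back();
    for (int s : cfg[b].succs) {
      bool deadEdge = cfg[b].onlyTakenSucc >= 0 && s != cfg[b].onlyTakenSucc;
      if (!deadEdge && live.insert(s).second)
        worklist.push_back(s);
    }
  }
  return live;
}
```

In the usage below, the edge 0 -> 1 is dead, but block 1 is still live because it is reached through the irreducible cycle via block 2.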
Fixes https://github.com/llvm/llvm-project/issues/64259.
The target should not have to construct MachineIRBuilders during
RegBankSelect (we should perhaps hide the constructors for it). The
pass should own the builder setup with the desired CSE configuration
(although currently the pass does not use the CSE builder, which is
what I want to fix).
https://reviews.llvm.org/D156479
ParseStatus is slightly more convenient to use due to its implicit
conversion from bool, which allows writing something like:
```
return Error(L, "msg");
```
whereas with OperandMatchResultTy it had to be:
```
Error(L, "msg");
return MatchOperand_ParseFail;
```
It also has a more appropriate name, since parse* methods are not only
used for parsing operands.
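The bool conversion described above can be sketched as a small status class whose converting constructor maps `true` (an error was reported) to failure and `false` to success. This is a simplified stand-in written for illustration; the `Kind` enum, the free `Error` helper, and `parseThing` are assumptions, not LLVM's exact declarations.

```cpp
#include <iostream>
#include <string>

// Simplified stand-in for a ParseStatus-like type.
class ParseStatus {
public:
  enum class Kind { Success, Failure, NoMatch };
  constexpr ParseStatus(Kind S) : K(S) {}
  // Implicit conversion: error helpers return true when they report an error.
  constexpr ParseStatus(bool IsError)
      : K(IsError ? Kind::Failure : Kind::Success) {}
  constexpr bool isSuccess() const { return K == Kind::Success; }
  constexpr bool isFailure() const { return K == Kind::Failure; }
  constexpr bool isNoMatch() const { return K == Kind::NoMatch; }

private:
  Kind K;
};

// Hypothetical error helper: prints a diagnostic and returns true.
bool Error(int Loc, const std::string &Msg) {
  std::cerr << Loc << ": error: " << Msg << "\n";
  return true;
}

// Hypothetical parse method showing the one-line error return.
ParseStatus parseThing(int L, bool Ok) {
  if (!Ok)
    return Error(L, "msg"); // true -> Kind::Failure
  return ParseStatus::Kind::Success;
}
```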
Reviewed By: uweigand
Differential Revision: https://reviews.llvm.org/D154316
`DenseI64ArrayAttr` provides a better API than `I64ArrayAttr`. E.g., accessors returning `ArrayRef<int64_t>` (instead of `ArrayAttr`) are generated.
Differential Revision: https://reviews.llvm.org/D156684
EmptyTensorElimination is a pre-bufferization transformation that replaces "tensor.empty" ops with "tensor.extract_slice" ops. This revision adds support for cases where the input IR contains "tensor.cast" ops.
Differential Revision: https://reviews.llvm.org/D156167
* Move `foldDynamicIndexList` to `DialectUtils` and simplify the function.
* Move `OpWithOffsetSizesAndStridesConstantArgumentFolder` to `ViewLikeInterface` and add documentation.
Differential Revision: https://reviews.llvm.org/D156581
Early exit on intrinsics and don't duplicate indirect call
checks. Also let the IRBuilder constructor figure out the insert point
rather than doing it manually. Also avoid emitting the debug message about
trying to simplify calls in more of the unhandled scenarios.
Its contents are transferred into DeferredDecls in Release(), so it
should be empty in moveLazyEmissionStates(). This matches the code
downstream in Cling.
Differential Revision: https://reviews.llvm.org/D156660
This patch adds support for using common blocks
in private, firstprivate, and lastprivate clauses.
Reviewed By: kiranchandramohan
Differential Revision: https://reviews.llvm.org/D156120
This adds support for common blocks in the OpenMP private clause by
privatizing each common block member as a host-associated symbol, and
adds a test case.
Reviewed By: kiranchandramohan
Differential Revision: https://reviews.llvm.org/D127215
This patch makes the following non-functional changes:
- Extract OpenMP clause processing into a new internal `ClauseProcessor`
class. Processing of atomic and reduction-related clauses is kept unchanged,
since atomic clauses are stored in `OmpAtomicClauseList` rather than
`OmpClauseList` and there are many TODO comments related to the current
implementation of reduction lowering. This has been left unchanged to avoid
merge conflicts and work duplication.
- Reorganize functions into sections in the file to improve readability.
- Explicitly use the mlir:: namespace everywhere, rather than just in most places.
- Spell out uses of `auto` where the type isn't explicitly stated as part
of the initialization expression.
- Normalize a few function names to match the rest, and rename variables from
'snake_case' to 'camelCase'.
The main purpose is to reduce code duplication and simplify the implementation
of upcoming work to support loop-attached target constructs and teams/
distribute lowering to MLIR.
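The idea of centralizing clause handling can be sketched as a class that wraps a clause list and exposes one `process*` method per clause kind, so each construct's lowering asks for the clauses it supports instead of repeating the same search loop. Everything below (`ClauseKind`, `Clause`, the method names) is a hypothetical, heavily simplified model, not Flang's actual `ClauseProcessor` interface.

```cpp
#include <vector>

// Hypothetical, simplified clause model for illustration only.
enum class ClauseKind { If, NumThreads, Private };
struct Clause {
  ClauseKind kind;
  int value;
};

// Wraps a clause list; each process* method returns true and fills `result`
// if the corresponding clause is present.
class ClauseProcessor {
public:
  explicit ClauseProcessor(const std::vector<Clause> &clauses)
      : clauses(clauses) {}

  bool processNumThreads(int &result) const {
    return findUnique(ClauseKind::NumThreads, result);
  }
  bool processIf(int &result) const { return findUnique(ClauseKind::If, result); }

private:
  // Shared lookup that every process* method reuses.
  bool findUnique(ClauseKind kind, int &result) const {
    for (const Clause &c : clauses)
      if (c.kind == kind) {
        result = c.value;
        return true;
      }
    return false;
  }

  const std::vector<Clause> &clauses; // must outlive the processor
};
```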
Differential Revision: https://reviews.llvm.org/D155981
This patch creates the .td files for the Python bindings of the
transform ops of the MemRef dialect and integrates them into the build
systems (CMake and Bazel).
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D156536
Add a const qualifier to getMaxSafeDepDistBytes(), since it is a member of
MemoryDepChecker and LoopAccessInfo only returns that object as a
const reference:
const MemoryDepChecker &getDepChecker() const { return *DepChecker; }
Calling the function as follows:
LAI->getDepChecker().getMaxSafeDepDistBytes()
results in the following error:
passing ‘const llvm::MemoryDepChecker’ as ‘this’ argument discards qualifiers
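A self-contained sketch of the situation, using simplified stand-ins for the two classes (their real definitions in LLVM are much larger, and the `uint64_t` return type and constructors here are assumptions). Because `getDepChecker()` returns a `const MemoryDepChecker &`, only `const` member functions can be called on the result.

```cpp
#include <cstdint>
#include <memory>

// Simplified stand-in, not LLVM's real MemoryDepChecker.
class MemoryDepChecker {
public:
  explicit MemoryDepChecker(uint64_t Bytes) : MaxSafeDepDistBytes(Bytes) {}
  // Marked const so it can be called through a const reference; without
  // `const`, the call below fails with "discards qualifiers".
  uint64_t getMaxSafeDepDistBytes() const { return MaxSafeDepDistBytes; }

private:
  uint64_t MaxSafeDepDistBytes;
};

// Simplified stand-in, not LLVM's real LoopAccessInfo.
class LoopAccessInfo {
public:
  explicit LoopAccessInfo(uint64_t Bytes)
      : DepChecker(std::make_unique<MemoryDepChecker>(Bytes)) {}
  const MemoryDepChecker &getDepChecker() const { return *DepChecker; }

private:
  std::unique_ptr<MemoryDepChecker> DepChecker;
};
```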
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D156304
If the pre-truncated value was the same width as the extension, and the assertzext guarantees that the extended bits are already zero, then skip the zext/trunc 'zero_extend_inreg' pattern.
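The reason the pattern can be skipped is that `zero_extend_inreg` amounts to masking off the high bits, which is a no-op when those bits are already known to be zero. The sketch below models this on plain 64-bit integers; the function names are hypothetical, and checking the bits at runtime stands in for the compile-time guarantee that an AssertZext node provides.

```cpp
#include <cstdint>

// Models zero_extend_inreg: keep the low `bits` bits and clear the rest.
// Assumes 0 < bits < 64.
uint64_t zeroExtendInReg(uint64_t x, unsigned bits) {
  return x & ((uint64_t{1} << bits) - 1);
}

// Stand-in for the AssertZext guarantee: bits [bits, 64) of x are zero.
bool upperBitsKnownZero(uint64_t x, unsigned bits) { return (x >> bits) == 0; }
```

When `upperBitsKnownZero(x, bits)` holds, `zeroExtendInReg(x, bits) == x`, so emitting the mask would be redundant.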
Addresses several regressions noticed in D155472
This modifies the switch-statement generation in SelectionDAGBuilder,
specifically the part that generates case clusters of type CC_JumpTable.
A table-based branch of any kind is at risk of being a JOP gadget if
it doesn't range-check the offset into the table. For some types of
table branch, such as Arm TBB/TBH, the impact of this is limited
because the value loaded from the table is a relative offset of
limited size; for others, such as a MOV PC,Rn computed branch into a
table of further branch instructions, the gadget is fully general.
When compiling for branch-target enforcement via Arm's BTI system,
many of these table branch idioms use branch instructions of types
that do not require a BTI instruction at the branch destination. This
avoids the need to put a BTI at the start of each case handler,
reducing the number of available gadgets //with// BTIs (i.e. ones
which could be used by a JOP attack in spite of the BTI system). But
without a range check, the use of a non-BTI-requiring branch also
opens up a larger range of followup gadgets for an attacker's use.
A defence against this is to avoid optimising away the range check on
the table offset, even if the compiler believes that no out-of-range
value should be able to reach the table branch. (Rationale: that may
be true for values generated legitimately by the program, but not
those generated maliciously by attackers who have already corrupted
the control flow.)
The effect of keeping the range check and branching to an unreachable
block is that no actual code is generated at that block, so the branch
will typically target the end of the function. That may still cause some
kind of unpredictable code execution (such as executing data as code,
or falling through to the next function in the code section), but even
if so, there will only be //one// possible invalid branch target,
rather than giving an attacker the choice of many possibilities.
This defence is enabled only when branch target enforcement is in use.
Without branch target enforcement, the range check is easily bypassed
anyway, by branching into a location just after it. But with
enforcement, the attacker will have to enter the jump table dispatcher
at the initial BTI and then go through the range check. (Or, if they
don't, it's because they //already// have a general BTI-bypassing
gadget.)
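As a source-level analogue only (the patch itself changes how SelectionDAGBuilder lowers `CC_JumpTable` clusters, not C++ source), the sketch below shows a table dispatch that keeps its bounds check even though callers supposedly never pass a bad index. An out-of-range index can then reach just one fixed destination rather than whatever a corrupted offset selects. The handler names are hypothetical, and `outOfRange` stands in for the unreachable block, which in real codegen has no code of its own.

```cpp
#include <cstddef>
#include <iterator>

using Handler = int (*)(int);

// Hypothetical case handlers.
int caseA(int x) { return x + 1; }
int caseB(int x) { return x * 2; }
int caseC(int x) { return x - 3; }

// Single, fixed destination for out-of-range indices.
int outOfRange(int) { return -1; }

int dispatch(std::size_t index, int arg) {
  static constexpr Handler table[] = {caseA, caseB, caseC};
  // Keep the range check even if the optimizer "knows" it can't fail.
  if (index >= std::size(table))
    return outOfRange(arg);
  return table[index](arg);
}
```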
Reviewed By: MaskRay, chill
Differential Revision: https://reviews.llvm.org/D155485
InstCombine is a worklist-driven algorithm, which works roughly
as follows:
* All instructions are initially pushed to the worklist.
The initial order is reverse post-order (RPO).
* All newly inserted instructions get added to the worklist.
* When an instruction is folded, its users get added back to the
worklist.
* When the use-count of an instruction decreases, it gets added
back to the worklist.
* And a few other heuristics on when we should revisit
instructions.
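The worklist scheme above can be illustrated with a toy IR in which the only fold is `add(const, const) -> const`. This is a hypothetical model, not InstCombine's code: index order stands in for RPO, and only the "revisit users of a folded instruction" rule is implemented.

```cpp
#include <deque>
#include <vector>

// Hypothetical toy IR: a node is a constant or an add of two other nodes.
struct Node {
  bool isConst;
  long value; // meaningful when isConst
  int lhs;    // operand indices, meaningful when !isConst
  int rhs;
};

// Folds add(const, const) into a constant, revisiting users after each fold.
// Returns the number of folds performed.
int runWorklist(std::vector<Node> &nodes) {
  const int n = static_cast<int>(nodes.size());
  std::vector<std::vector<int>> users(n);
  for (int i = 0; i < n; ++i)
    if (!nodes[i].isConst) {
      users[nodes[i].lhs].push_back(i);
      users[nodes[i].rhs].push_back(i);
    }

  std::deque<int> worklist;
  for (int i = 0; i < n; ++i)
    worklist.push_back(i); // initial population (index order stands in for RPO)

  int folds = 0;
  while (!worklist.empty()) {
    int i = worklist.front();
    worklist.pop_front();
    Node &node = nodes[i];
    if (node.isConst || !nodes[node.lhs].isConst || !nodes[node.rhs].isConst)
      continue;
    long sum = nodes[node.lhs].value + nodes[node.rhs].value;
    node = Node{true, sum, -1, -1};
    ++folds;
    for (int user : users[i])
      worklist.push_back(user); // users of a folded node may now fold too
  }
  return folds;
}
```

In the usage below, node 0 is visited before its operand node 1 has been folded; it still folds within the same run because folding node 1 pushes node 0 back onto the worklist.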
On top of the worklist algorithm, InstCombine layers an additional
fix-point iteration: If any fold was performed in the previous
iteration, then InstCombine will re-populate the worklist from
scratch and fold the entire function again. This continues until
a fix-point is reached.
In the vast majority of cases, InstCombine will reach a fix-point
within a single iteration. However, a second iteration is performed
to verify that this is indeed the fixpoint. We can see this in the
statistics for llvm-test-suite:
"instcombine.NumOneIteration": 411380,
"instcombine.NumTwoIterations": 117921,
"instcombine.NumThreeIterations": 236,
"instcombine.NumFourOrMoreIterations": 2,
The way to read these numbers is that in 411380 cases, InstCombine
performs no folds. In 117921 cases it performs a fold and reaches
the fix-point within one iteration (the second iteration verifies
the fixpoint). In the remaining 238 cases, more than one iteration
is needed to reach the fixpoint.
In other words, only in 0.04% of cases are additional iterations
needed to reach a fixpoint. Conversely, in 22.3% of cases InstCombine
performs a completely useless extra iteration to verify the fix point.
This patch removes the fixpoint iteration from InstCombine, and always
performs only a single iteration. This results in a major compile-time
improvement of around 4% at negligible codegen impact.
This explicitly does accept that we will not reach a fixpoint in all
cases. However, this is mitigated by two factors: First, the data
suggests that this happens very rarely in practice. Second,
InstCombine runs many times during the optimization pipeline
(8 times even without LTO), so there are many chances to recover
such cases.
In order to prevent accidental optimization regressions in the
future, this implements a verify-fixpoint option, which is enabled
by default when instcombine is specified in -passes and disabled
when InstCombinePass() is constructed from C++. This means that
test cases need to explicitly use the no-verify-fixpoint option
if they fail to reach a fixed point (for a well-understood reason that
we cannot or do not want to avoid).
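The trade-off between a single iteration and an optional verification run can be sketched with a generic driver. This is only an illustration of the idea behind a verify-fixpoint mode, under the assumption that `iterate` performs one full pass and returns whether anything changed; the function name and the exception-based failure are hypothetical, not how InstCombine reports the problem.

```cpp
#include <functional>
#include <stdexcept>

// Runs one iteration. If verifyFixpoint is set, runs a second iteration and
// treats any further change as a failure to reach a fixpoint.
bool runSingleIteration(const std::function<bool()> &iterate,
                        bool verifyFixpoint) {
  bool changed = iterate();
  if (verifyFixpoint && iterate())
    throw std::runtime_error("did not reach a fixpoint in one iteration");
  return changed;
}
```

With verification off, the second pass is skipped entirely, which is where the compile-time saving comes from; tests keep it on to catch folds that would otherwise need another iteration.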
Differential Revision: https://reviews.llvm.org/D154579
Add the Python mix-in for MapNestedForallToThreads. Fix typing
annotations in MapForallToBlocks and drop the attribute wrapping
rendered unnecessary by attribute builders.
Reviewed By: ingomueller-net
Differential Revision: https://reviews.llvm.org/D156528
LLVM is not set up in a thread-safe way, which seems to lead to
race conditions when writing output to llvm::nulls() in opt builds. Try a
thread-local alternative.
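A thread-local discard stream can be sketched with the standard library alone: each thread lazily constructs its own buffer and stream, so no stream state is shared between threads. This uses `std::ostream` rather than `llvm::raw_ostream` and is an assumption about the general approach, not the code from the patch.

```cpp
#include <ostream>
#include <streambuf>

// A stream buffer that accepts and discards every character.
class NullBuffer : public std::streambuf {
protected:
  int_type overflow(int_type ch) override { return traits_type::not_eof(ch); }
};

// Returns a discard stream owned by the calling thread.
std::ostream &threadLocalNulls() {
  thread_local NullBuffer buffer;
  thread_local std::ostream stream(&buffer);
  return stream;
}
```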
Reviewed By: pzread
Differential Revision: https://reviews.llvm.org/D156421
The option is used to force the use of resource intervals
in the machine scheduler, effectively ignoring the value of
`EnableIntervals` in the instance of the `SchedMachineModel`.
Reviewed By: anemet
Differential Revision: https://reviews.llvm.org/D156540
This patch additionally prints the template argument list when the described function is a template specialization.
This can be useful when debugging complex template functions in the constexpr evaluator.
Reviewed By: cjdb, dblaikie
Differential Revision: https://reviews.llvm.org/D154366
Reapply after D156401, which stops PatternMatch from recognizing
binop constant expressions. This should avoid the infinite loops
and assertion failures that this patch previously exposed.
-----
In preparation for removing support for and/or expressions, mark
them as undesirable. As such, we will no longer implicitly create
such expressions, but they still exist.
This patch allows constant folding of PHIs when estimating the user
bonus. Phi nodes are a special case since some of their inputs may
remain unresolved until all the specialization arguments have been
processed by the InstCostVisitor. Therefore, we keep a list of dead
basic blocks and then lazily visit the Phi nodes once the user bonus
has been computed for all the specialization arguments.
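The lazy Phi handling can be sketched as a helper that folds a Phi to a constant only when every incoming value from a live predecessor is a known constant and they all agree; if any live input is still unresolved, it gives up so the Phi can be revisited later. The `Incoming` struct, `foldPhi`, and the map of known constants are hypothetical simplifications, not the InstCostVisitor API.

```cpp
#include <optional>
#include <set>
#include <unordered_map>
#include <vector>

// Hypothetical toy model: one Phi input = (predecessor block id, value id).
struct Incoming {
  int block;
  int value;
};

std::optional<long>
foldPhi(const std::vector<Incoming> &incoming, const std::set<int> &deadBlocks,
        const std::unordered_map<int, long> &knownConstants) {
  std::optional<long> result;
  for (const Incoming &in : incoming) {
    if (deadBlocks.count(in.block))
      continue; // inputs from dead predecessors are ignored
    auto it = knownConstants.find(in.value);
    if (it == knownConstants.end())
      return std::nullopt; // still unresolved: revisit later
    if (result && *result != it->second)
      return std::nullopt; // live inputs disagree
    result = it->second;
  }
  return result;
}
```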
Differential Revision: https://reviews.llvm.org/D154852
At the moment, the header tests below fail with a multilib error in the LLVM Embedded Toolchain for Arm because no corresponding AArch64 big-endian library variant exists. Pointing --sysroot at the tests' own input directory, clang/test/Headers/Inputs (which contains no dependent libraries), prevents these header tests from searching the standard library directories.
1. clang/test/Headers/arm-neon-header.c
2. clang/test/Headers/arm-fp16-header.c
Reviewed By: michaelplatings
Differential Revision: https://reviews.llvm.org/D156427
Introduce the convergent equivalents of the existing G_INTRINSIC opcodes:
- G_INTRINSIC_CONVERGENT
- G_INTRINSIC_CONVERGENT_W_SIDE_EFFECTS
Of the targets that currently have some support for GlobalISel, the patch
assumes that convergent intrinsics are only relevant to SPIRV and AMDGPU.
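With the new opcodes, choosing a generic opcode for an intrinsic call becomes a function of two properties: whether it has side effects and whether it is convergent. The sketch below uses a local enum and a hypothetical `selectIntrinsicOpcode` helper to show the four-way mapping; in LLVM the opcodes are `TargetOpcode` values and the selection logic may be structured differently.

```cpp
// Local enum for illustration; not LLVM's TargetOpcode definitions.
enum class IntrinsicOpcode {
  G_INTRINSIC,
  G_INTRINSIC_W_SIDE_EFFECTS,
  G_INTRINSIC_CONVERGENT,
  G_INTRINSIC_CONVERGENT_W_SIDE_EFFECTS,
};

// Hypothetical helper: picks the generic opcode from the two properties.
constexpr IntrinsicOpcode selectIntrinsicOpcode(bool hasSideEffects,
                                                bool isConvergent) {
  if (isConvergent)
    return hasSideEffects
               ? IntrinsicOpcode::G_INTRINSIC_CONVERGENT_W_SIDE_EFFECTS
               : IntrinsicOpcode::G_INTRINSIC_CONVERGENT;
  return hasSideEffects ? IntrinsicOpcode::G_INTRINSIC_W_SIDE_EFFECTS
                        : IntrinsicOpcode::G_INTRINSIC;
}
```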
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D154766