This removes -O1 from the SVE ACLE intrinsics tests and replaces it with
-O0 and "opt -mem2reg -instcombine -tailcallelim". Instrcombine and
TailCallElim are only added to keep the differences smaller and can be
removed in a followup patches. The only remaining differences in the
tests are tbaa nodes not being emitted under -O0, and the removable of
some tailcall flags.
Add zihintntl compressed instructions and some files related to zihintntl.
This patch is base on {D121670}.
Reviewed By: kito-cheng
Differential Revision: https://reviews.llvm.org/D121779
This drops the artificial requirement of producers having a single
result value to be able to fuse with consumers.
The current default also only fuses producer with consumer when the
producer has a single use. This is a simplifying assumption. There are
legitimate use cases where a producer can be fused with consumer and
the fused o pcould be used to replace the uses of the producer as
well. This needs to be done with care to avoid use-def violations. To
allow for downstream users to explore more fusion opportunities, the
core transformation method is exposed as a utility function.
This patch also modifies the control function to take just the fused
operand as the argument. This is enough information for the callers to
get the producer and the consumer operations being considered to
fuse. It also provides information of which producer result is used.
Differential Revision: https://reviews.llvm.org/D132301
Summary: Currently, an error was reported when a thread local symbol has an invalid name. D100956 create a new symbol to prefix the TLS symbol name with a dot. When the symbol name is renamed, the error occurs. This patch uses the original symbol name (name in the symbol table) as the input for the symbol for TOC entry.
Reviewed By: shchenz, lkail
Differential Revision: https://reviews.llvm.org/D132348
In branch relaxation pass, `j`'s with offset over 1MiB will be relaxed
to `jump` pseudo-instructions.
This patch allocates a stack slot for functions with a size greater than
1MiB. If the register scavenger cannot find a scratch register for
`jump`, spill a register to the slot before the jump and restore it
after the jump.
.mbb:
foo
j .dest_bb
bar
bar
bar
.dest_bb:
baz
The above code will be relaxed to the following code.
.mbb:
foo
sd s11, 0(sp)
jump .restore_bb, s11
bar
bar
bar
j .dest_bb
.restore_bb:
ld s11, 0(sp)
.dest_bb:
baz
Depends on D129999.
Reviewed By: StephenFan
Differential Revision: https://reviews.llvm.org/D130560
If a MachineInstr's operand should be Reg in compiler's output but is
currently FrameIndex, `isCompressibleInst()` will terminate at
`MachineOperandType::getReg()`.
This patch adds `.isReg()` checks to make `isCompressibleInst()` return
false for these MachineInstr, allowing `getInstSizeInBytes()` to return
a value and `EstimateFunctionSizeInBytes()` to work as intended.
See https://reviews.llvm.org/D129999#3694222 for details.
Reviewed By: luismarques
Differential Revision: https://reviews.llvm.org/D129999
The array size specification of the an alloca can be any integer,
so zext or trunc it to intptr before attempting to multiply it
with an intptr constant.
Reviewed By: eugenis
Differential Revision: https://reviews.llvm.org/D131846
We were passing the type of `Val` to `getShadowOriginPtr`, rather
than the type of `Val`'s shadow resulting in broken IR. The fix
is simple.
Reviewed By: eugenis
Differential Revision: https://reviews.llvm.org/D131845
If `LLVM_BUILD_MAIN_SRC_DIR` is not defined, just assume we are in
regular monorepo layout. Non-standard (and not really supported) layouts
can still be configured manually.
Reviewed By: beanz
Differential Revision: https://reviews.llvm.org/D132314
extractShiftForRotate may fail to return canonicalized shifts due to constant folding or other simplification that can occur in getNode()
Fixes Issue #57283
This change came in a few hours ago and introduced a warning. The fix
is trivial, so I'm providing it. The original change was reviewed here:
https://reviews.llvm.org/D132331
Move the large lambda out of BinaryFunction::disassemble, reducing its size from
255 to 233 LoC.
Reviewed By: rafauler
Differential Revision: https://reviews.llvm.org/D132104
Move the large lambda out of BinaryFunction::disassemble, reducing its size from
295 to 255 LoC.
Differential Revision: https://reviews.llvm.org/D132101
Move the large lambda out of BinaryFunction::disassemble, reducing its size from
338 to 295 LoC.
Reviewed By: rafauler
Differential Revision: https://reviews.llvm.org/D132100
Move the large lambda out of BinaryFunction::disassemble, reducing its size from
377 to 338 LoC.
Reviewed By: rafauler
Differential Revision: https://reviews.llvm.org/D132099
We're seeing the following warnings with --rtlib=compiler-rt:
lld-link: warning: ignoring unknown argument '--as-needed'
lld-link: warning: ignoring unknown argument '-lunwind'
lld-link: warning: ignoring unknown argument '--no-as-needed'
MSVC doesn't use the unwind library, so just omit it.
Differential Revision: https://reviews.llvm.org/D132440
Some tests still pass even though we don't claim big-endian support. Using
UNSUPPORTED is a better indicator than XFAIL that we don't guarantee that
the tests work.
Dialects can opt-in to providing custom encodings by implementing the
`BytecodeDialectInterface`. This interface provides hooks, namely
`readAttribute`/`readType` and `writeAttribute`/`writeType`, that will be used
by the bytecode reader and writer. These hooks are provided a reader and writer
implementation that can be used to encode various constructs in the underlying
bytecode format. A unique feature of this interface is that dialects may choose
to only encode a subset of their attributes and types in a custom bytecode
format, which can simplify adding new or experimental components that aren't
fully baked.
Differential Revision: https://reviews.llvm.org/D132498
This moves some parsing functionality from BytecodeReader to
AttrTypeReader, and removes some duplication between the attribute/type
code paths.
Differential Revision: https://reviews.llvm.org/D132497
This extracts the string section writer and reader into dedicated
classes, which better separates the logic and will also simplify future
patches that want to interact with the string section.
Differential Revision: https://reviews.llvm.org/D132496
The existing code resulted in the max size and access counts being equal
to the min. Compute the max instead (max lifetime was already correct).
Differential Revision: https://reviews.llvm.org/D132515
The constructors of non-POD array elements are evaluated under
certain conditions. This patch makes sure that in such cases
we also evaluate the destructors.
Differential Revision: https://reviews.llvm.org/D130737
Keep the original data type of integer do-variables
for structured loops. When do-variable's data type
is an integer type shorter than IndexType, processing
the do-variable separately from the DoLoop's iteration index
allows getting rid of type casts, which can make backend
optimizations easier.
For example,
```
do i = 2, n-1
do j = 2, n-1
... = a(j-1, i)
end do
end do
```
If value of 'j' is computed by casting the DoLoop's iteration
index to 'i32', then Flang will produce the following LLVM IR:
```
%1 = trunc i64 %iter_index to i32
%2 = sub i32 %1, 1
%3 = sext i32 %2 to i64
```
LLVM's InstCombine may try to get rid of the sign extension,
and may transform this into:
```
%1 = shl i64 %iter_index, 32
%2 = add i64 %1, -4294967296
%3 = ashr exact i64 %2, 32
```
The extra computations for the element address applied on top
of this awkward pattern confuse LLVM vectorizer so that
it does not recognize the unit-strided access of 'a'.
Measured performance improvements on `SPEC CPU2000@IceLake`:
```
168.wupwise: 11.96%
171.swim: 11.22%
172.mrgid: 56.38%
178.galgel: 7.29%
301.apsi: 8.32%
```
Differential Revision: https://reviews.llvm.org/D132176
(ctpop x) == 1 --> (x != 0) && ((x & x-1) == 0)
Adjust the legality check to avoid the poor codegen on AArch64.
We probably only want to use popcount on this pattern when it
is a single instruction.
fixes#57225
Differential Revision: https://reviews.llvm.org/D132237
The existing tests were added with 2880d7b9e4c9a0, but
discussion in D132412 suggests that we should start
with a simpler pattern (the more complicated pattern
may not be a real problem).
We build libcxx using LLVM_ENABLE_RUNTIMES during Stage2, which requires
cxx-headers to be part of LLVM_RUNTIME_DISTRIBUTION_COMPONENTS instead
of LLVM_DISTRIBUTION_COMPONENTS.
rdar://99028431
Differential Revision: https://reviews.llvm.org/D132488
"this" parameter of lambda if undef, notnull and differentiable.
So we need to pass something consistent.
Any alloca will work. It will be eliminated as unused later by optimizer.
Otherwise we generate code which Msan is expected to catch.
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D132275
For mingw target, if a symbol is not marked DSO local, a `.refptr` is
generated for it. This makes CFG check calls use an extra pointer
dereference, which adds extra overhead compared to the MSVC version,
so mark the CFG guard check funciton pointer DSO local to stop it.
This should have no effect on MSVC target.
Also adapt the existing cfguard tests to run for mingw targets, so that
this change is checked.
Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D132331
To support using Control Flow Guard with mingw-w64, Clang needs to
accept `__declspec(guard(nocf))` also for the GNU target. Since mingw
has `#define __declspec(a) __attribute__((a))` as built-in, the simplest
solution is to accept `__attribute__((guard(nocf)))` to be compatible with
MSVC and Clang's msvc target.
As a side effect, this also adds `[[clang::guard(nocf)]]` for C++.
Reviewed By: aaron.ballman
Differential Revision: https://reviews.llvm.org/D132302