Convert "denormal-fp-math" and "denormal-fp-math-f32" into a first-class
denormal_fpenv attribute. Previously the query for the effective
denormal mode involved two string attribute queries with parsing. I'm
introducing more uses of this, so it makes sense to convert this
to a more efficient encoding. The old representation was also awkward
since it was split across two separate attributes. The new encoding
just stores the default and float modes as bitfields, largely avoiding
the need to consider if the other mode is set.
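To illustrate the idea of storing both modes as bitfields, here is a minimal sketch. It assumes 2 bits per component and an (output, input) ordering for each mode; the names, numeric values, and bit layout are hypothetical and are not taken from the patch itself.

```python
from enum import IntEnum


class DenormalKind(IntEnum):
    # The four kinds mirror LLVM's denormal modes; the numeric values are assumed.
    IEEE = 0
    PRESERVE_SIGN = 1
    POSITIVE_ZERO = 2
    DYNAMIC = 3


BITS_PER_FIELD = 2  # assumption: each component fits in 2 bits
FIELD_MASK = (1 << BITS_PER_FIELD) - 1


def pack(default_out, default_in, f32_out, f32_in):
    """Pack the default and f32 (output, input) modes into one small integer."""
    word = 0
    for i, kind in enumerate((default_out, default_in, f32_out, f32_in)):
        word |= (int(kind) & FIELD_MASK) << (i * BITS_PER_FIELD)
    return word


def unpack(word):
    """Recover the four components from the packed integer."""
    return tuple(
        DenormalKind((word >> (i * BITS_PER_FIELD)) & FIELD_MASK) for i in range(4)
    )


def f32_mode(word):
    """Read the f32 mode directly, with no string parsing and no second lookup."""
    return unpack(word)[2:]
```

With this kind of layout, querying the effective f32 mode is a shift and a mask rather than two string attribute lookups.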
The syntax in the common cases looks like this:
`denormal_fpenv(preservesign,preservesign)`
`denormal_fpenv(float: preservesign,preservesign)`
`denormal_fpenv(dynamic,dynamic float: preservesign,preservesign)`
I wasn't sure about reusing the float type name instead of adding a
new keyword. It's parsed as a type but only accepts float. I'm also
debating switching the name to subnormal to match the current
preferred IEEE terminology (also used by nofpclass and other
contexts).
This changes behavior when the command-line debug flags are used to
set the denormal mode. Previously, each flag ignored functions that had
an explicit attribute set, tracked separately for the default and f32
versions. Now that these are one attribute, the flag logic can't
distinguish which of the two components was explicitly set on the
function. Only one test appeared to rely on this behavior, so I just
avoided using the flags in it.
This also does not perform all the code cleanups this enables.
In particular the attributor handling could be cleaned up.
I also guessed at how to support this in MLIR. I followed
MemoryEffects as a reference; it appears bitfields are expanded
into arguments to attributes, so the representation there is
a bit uglier with the two 2-element fields flattened into 4 arguments.
The default `f16` lowering has some issues that result in incorrect
float behavior, so over time most targets have switched to use
`softPromoteHalfType`. Swap to soft promotion by default and add
overrides for SystemZ and AMDGPU, which are the two remaining backends
that still depend on this behavior.
All basic `f16` op tests now pass on all remaining experimental arches.
Fixes: https://github.com/llvm/llvm-project/issues/97981
Fixes: https://github.com/llvm/llvm-project/issues/97975
On PowerPC targets, `half` uses the default legalization of promoting to
a `f32`. However, this has some fundamental issues related to inability
to round trip. Resolve this by switching to the soft legalization, which
passes `f16` as an `i16`.
The PowerPC ABI Specification does not define a `_Float16` type, so the
calling convention changes are acceptable.
Fixes the PowerPC part of
https://github.com/llvm/llvm-project/issues/97975
Fixes the PowerPC part of
https://github.com/llvm/llvm-project/issues/97981
The default `half` legalization, which Wasm currently uses, does not
respect IEEE conventions: for example, casting to bits may invoke a lossy
libcall, meaning soft float operations cannot be correctly implemented.
Change to the soft promotion legalization which passes `f16` as an `i16`
and treats each `half` operation as an individual
f16->f32->libcall->f32->f16 sequence.
Of note in the test updates are that `from_bits` and `to_bits` are now
libcall-free, and that chained operations now round back to `f16` after
each step.
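The effect of rounding back to `f16` after each step can be sketched outside the compiler. This is only a model: it uses Python's `struct` module (format `e` is IEEE binary16) and Python doubles as the wider type instead of `f32`, and the helper names are made up for illustration.

```python
import struct


def round_to_f16(x):
    """Round a Python float to the nearest binary16 value (ties to even)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_bits(x):
    """Reinterpret a binary16 value as a 16-bit integer; pure bit movement."""
    return struct.unpack("<H", struct.pack("<e", x))[0]


def from_bits(bits):
    """Reinterpret a 16-bit integer as a binary16 value; pure bit movement."""
    return struct.unpack("<e", struct.pack("<H", bits))[0]


def add_f16(a, b):
    # Model of one soft-promoted operation: widen, add, narrow again.
    return round_to_f16(a + b)


a, b, c = 2048.0, 1.0, 1.0
# Each step rounds back to f16: 2049 is a tie and rounds to 2048, twice.
stepwise = add_f16(add_f16(a, b), c)
# Keeping the intermediate wide and rounding once gives a different answer.
rounded_once = round_to_f16(a + b + c)
```

Here `stepwise` is 2048.0 while `rounded_once` is 2050.0, which is the kind of difference the updated chained-operation tests reflect.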
Fixes the wasm portion of
https://github.com/llvm/llvm-project/issues/97981
Fixes the wasm portion of
https://github.com/llvm/llvm-project/issues/97975
Fixes: https://github.com/llvm/llvm-project/issues/96437
Fixes: https://github.com/llvm/llvm-project/issues/96438
fixes https://github.com/llvm/llvm-project/issues/98389
As the issue describes, promoting `llvm.fma.f16` to `llvm.fma.f32` does
not work, because there is not enough precision to handle the repeated
rounding. `f64` does have sufficient space. So this PR explicitly
promotes the 16-bit fma to a 64-bit fma.
I could not find examples of a libcall being used for fma, but that's
something that could be looked into separately to work around code size
issues.
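A small model of the double-rounding problem, as a sketch rather than what the backend emits. It uses Python's `struct` module to round to binary32 and binary16; the helper names and the specific inputs are chosen here for illustration. For these inputs the double-precision product and sum are exact, so the only rounding steps are the explicit ones.

```python
import struct


def to_f16(x):
    """Round to the nearest binary16 value (ties to even)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_f32(x):
    """Round to the nearest binary32 value (ties to even)."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def fma_via_f32(a, b, c):
    # Models promoting to an f32 fma: one rounding to f32, then one to f16.
    return to_f16(to_f32(a * b + c))


def fma_via_f64(a, b, c):
    # Models promoting to an f64 fma: the f64 result is rounded once to f16.
    return to_f16(a * b + c)


a = 3.0
b = 683.0 / 2048.0  # exactly representable in f16
c = 2.0 ** -24      # smallest positive f16 subnormal

# a*b is exactly 1 + 2**-11, the midpoint between two adjacent f16 values.
# Adding c pushes the exact result just above the midpoint, but f32 cannot
# hold the extra bit, so the f32 path lands back on the tie and rounds down.
wrong = fma_via_f32(a, b, c)
right = fma_via_f64(a, b, c)
```

The `f32` path yields 1.0, while the `f64` path yields 1.0009765625, which is the correctly rounded result.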
It looks like these were copied from the fp16 tests without updating the
intrinsic types. Also remove some old definitions that are no longer required.
If a landing pad is at the very start of a split section, it has to be
padded by a nop instruction. Otherwise its offset is marked as zero in
the LSDA, which means no landing pad (leading it to be skipped).
LLVM already handles this. If a landing pad is the first machine block
in a section, a nop is inserted to ensure a non-zero offset. However, if
the landing pad is preceded by an empty block, the nop would be
omitted.
To fix this, this patch adds a field to machine blocks indicating
whether this block contains the first instruction in its section. This
variable is then used to determine whether to emit the padding.
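The difference between the two checks can be modelled with a toy section layout. This is a simplified sketch: the `Block` type and both helper functions are hypothetical and do not correspond to actual LLVM classes or fields.

```python
from dataclasses import dataclass


@dataclass
class Block:
    name: str
    num_insts: int
    is_landing_pad: bool = False


def needs_nop_old(section, idx):
    # Old check: pad only if the landing pad is the first block in the section.
    return section[idx].is_landing_pad and idx == 0


def needs_nop_new(section, idx):
    # New check: pad if the landing pad holds the first instruction in the
    # section, which is also true when every earlier block is empty.
    holds_first_inst = all(b.num_insts == 0 for b in section[:idx])
    return section[idx].is_landing_pad and holds_first_inst


# An empty block precedes the landing pad, so the pad still sits at offset 0.
section = [Block("empty", 0), Block("lpad", 3, is_landing_pad=True)]
```

For this layout the old check skips the padding even though the landing pad ends up at offset zero, while the new check requests it.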
Co-authored-by: Jinjie Huang <huangjinjie@bytedance.com>
This intrinsic emits a BFD_RELOC_NONE relocation at the point of call,
which allows optimizations and languages to explicitly pull in symbols
from static libraries without there being any code or data that has an
effectual relocation against such a symbol.
See issue #146159 for context.
MachineFunctionSplitter was missing a skipFunction() check, causing it
to incorrectly split functions that should be skipped (e.g., functions
with optnone attribute).
This patch adds an early skipFunction() check in runOnMachineFunction()
to ensure these functions are never split, regardless of profile data
availability or other splitting conditions.
Based on top of #157211.
`FNEG` and `FABS` must preserve signalling NaNs, meaning they should not
convert to f32 to perform the operation. Instead legalize to `XOR` and
`AND`.
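The bitwise forms are easy to check on raw binary16 bit patterns. A minimal sketch follows; the function names are made up for illustration.

```python
SIGN_MASK = 0x8000   # sign bit of an IEEE binary16 value
EXP_MASK = 0x7C00    # exponent field
FRAC_MASK = 0x03FF   # fraction field
QUIET_BIT = 0x0200   # top fraction bit: set for quiet NaNs, clear for signalling


def fneg_bits(bits):
    """Negate by flipping the sign bit (the XOR form)."""
    return (bits ^ SIGN_MASK) & 0xFFFF


def fabs_bits(bits):
    """Take the absolute value by clearing the sign bit (the AND form)."""
    return bits & 0x7FFF


def is_signalling_nan(bits):
    is_nan = (bits & EXP_MASK) == EXP_MASK and (bits & FRAC_MASK) != 0
    return is_nan and (bits & QUIET_BIT) == 0


snan = 0x7D00  # exponent all ones, nonzero fraction, quiet bit clear
negated = fneg_bits(snan)
```

Because only the sign bit is touched, the NaN stays signalling and its payload is unchanged, which a round trip through `f32` arithmetic would not guarantee.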
Fixes almost all of #104915
The refactoring of ComputePTXValueVTs in #154476 caused the compiler to
no longer crash when lowering i256 and i96. This has caused a few tests
to unexpectedly pass.
Update these tests and tweak how we emit parameter declarations to
correctly lower these types.
`half` currently uses the default legalization of promoting to a `f32`;
however, this implementation implements math in a way that results in
incorrect rounding. Switch to the soft promote implementation, which
does not have this problem.
The SPARC ABI does not specify a `_Float16` type, so there is no concern
with keeping interface compatibility.
Fixes the SPARC part of
https://github.com/llvm/llvm-project/issues/97975
Fixes the SPARC part of
https://github.com/llvm/llvm-project/issues/97981
`f16` is passed and returned in vector registers on both x86 and AArch64,
the same calling convention as `f32`, so it is a straightforward type to
support. The calling convention support already exists, added as part of
a6065f0fa55a ("Arm64EC entry/exit thunks, consolidated. (#79067)").
Thus, add mangling and remove the error in order to make `half` work.
MSVC does not yet support `_Float16`, so for now this will remain an
LLVM-only extension.
Fixes the `f16` portion of
https://github.com/llvm/llvm-project/issues/94434
There are a number of platforms affected by [1]. It is easy enough to
check in a cross-platform way that bitcasts aren't using f16<->f32
libcalls; thus, add a generic test covering most supported
architectures, with an XFAIL for targets that are currently broken. As
they get fixed, this test will fail and can be updated.
[1]: https://github.com/llvm/llvm-project/issues/97981
* `tools/llvm-objcopy/MachO/update-section-object.test` was failing on
Windows since the input file (`macho_sections.s`) might be checked out
with the wrong line ending, resulting in difference in the size of
sections being checked.
* Removed the check for Windows in `AArch64Arm64ECCallLowering`: when
`llc` is run without an explicit target, the module's target triple is
unknown so this assert fires.
* Expect `llvm/test/CodeGen/Generic/allow-check.ll` to fail for Arm64EC:
Global ISel is not supported.
`f128` intrinsic functions from libm sometimes lower to `long double`
library calls when they instead need to be `f128` versions. Add a
generic test demonstrating current behavior.
MSVC always emits minimal CodeView metadata with compiler information,
even when debug info is otherwise disabled. Other tools may rely on this
metadata being present. For example, linkers use it to determine whether
hotpatching is enabled for the object file.
Over in 6a45fce, this flag (experimental-debuginfo-iterators) was
switched to do nothing, to flush out anything that depended on the
debug-intrinsics way of doing things. It's been a month and nothing's
super-broken, so we'll start to rip things out.
This commit deletes MergeFunc's debuginfo-iterators test: in d2942a86d7
it's documented that that test is specifically because of differences
between intrinsic/non-intrinsic data structures, and we're deleting the
possibility of that difference.
During the transition from debug intrinsics to debug records, we used
several different command line options to customise handling: the
printing of debug records to bitcode and textual could be independent of
how the debug-info was represented inside a module, whether the
autoupgrader ran could be customised. This was all valuable during
development, but now that totally removing debug intrinsics is coming
up, this patch removes those options in favour of a single flag
(experimental-debuginfo-iterators), which enables autoupgrade, in-memory
debug records, and debug record printing to bitcode and textual IR.
We need to do this ahead of removing the
experimental-debuginfo-iterators flag, to reduce the amount of
test-juggling that happens at that time.
There are quite a number of weird test behaviours related to this --
some of which I simply delete in this commit. Take
print-non-instruction-debug-info.ll: the test suite now checks for
debug records in all tests, and we don't want to check we can print as
intrinsics. Or the update_test_checks tests -- these are duplicated with
write-experimental-debuginfo=false to ensure file writing for intrinsics
is correct, but that's something we're imminently going to delete.
A short survey of curious test changes:
* free-intrinsics.ll: we don't need to test that debug-info is a zero
cost intrinsic, because we won't be using intrinsics in the future.
* undef-dbg-val.ll: apparently we pinned this to non-RemoveDIs in-memory
mode while we sorted something out; it works now either way.
* salvage-cast-debug-info.ll: was testing intrinsics-in-memory get
salvaged, isn't necessary now
* localize-constexpr-debuginfo.ll: was producing "dead metadata"
intrinsics for optimised-out variable values, dbg-records takes the
(correct) representation of poison/undef as an operand. Looks like we
didn't update this in the past to avoid spurious test differences.
* Transforms/Scalarizer/dbginfo.ll: this test was explicitly testing
that debug-info affected codegen, and we deferred updating the tests
until now. This is just one of those silent gnochange issues that get
fixed by RemoveDIs.
Finally: I've added a bitcode test, dbg-intrinsics-autoupgrade.ll.bc,
that checks we can autoupgrade debug intrinsics that are in bitcode into
the new debug records.
After a block scalar string is output, the indentation of the following
output is wrong. This patch fixes the padding after a block scalar string
so that the emitted YAML is correctly formatted.
The newly added unit test fails on main:
```diff
@@ -3,4 +3,4 @@
Just a block
scalar doc
-scalar: a
+ scalar: a
...\n
```
These date back to when the non-intrinsic format of variable locations
was still being tested and was behind a compile-time flag, so not all
builds / bots would correctly run them. The solution at the time, to get
at least some test coverage, was to have tests opt-in to non-intrinsic
debug-info if it was built into LLVM.
Nowadays, the non-intrinsic format is the default and has been on for more
than a year, so there's no need for this flag to exist.
(I've downgraded the flag from "try" to explicitly requesting
non-intrinsic format in some places, so that we can deal with tests that
are explicitly about non-intrinsic format in their own commit).
Adds a test case for X86 to check that the output of
@llvm.expect.with.probability's generic lowering is reasonable. This
replaces a generic test which only asserts that llc does not crash.
After 9fe78db4, the pass inserts `store volatile i32 -1, ptr %call_site`
before every invoke instruction except the one in the entry block, which
has the effect of bypassing landing pads on exceptions.
When configuring the call site for a potentially throwing instruction,
check that it is not an `InvokeInst`; those are handled by earlier code.
Handle @llvm.expect.with.probability in SelectionDAGBuilder, FastISel,
and IntrinsicLowering in the same way @llvm.expect is handled, where
the value is passed through as-is. This can be reached if the intrinsic
is used without optimizations, where it would otherwise be properly
transformed out.
Fixes #115411 for SelectionDAG. A similar patch is likely needed for
GlobalISel.
This update follows up on change #112671 and is mostly a NFC, with the following exceptions:
- Introduced `-global-merging-skip-no-params` to bypass merging when no parameters are required.
- Parameter count is now calculated based on the unique hash count.
- Added `-global-merging-inst-overhead` to adjust the instruction overhead, reflecting the machine instruction size.
- Costs and benefits are now computed using the double data type. Since the finalization process occurs offline, this should not significantly impact build time.
- Moved a sorting operation outside of the loop.
This is a patch for
https://discourse.llvm.org/t/rfc-global-function-merging/82608.
This PR allows mixing `-basic-block-sections` with
`-enable-machine-function-splitter`. The strategy is to let
`-basic-block-sections` take precedence over functions with profiles.
SPARC ABI doesn't use stack realignment, so let LLVM know about it in
`SparcFrameLowering`. This has the side effect of making all overaligned
allocations go through `LowerDYNAMIC_STACKALLOC`, so implement the
missing logic there too for overaligned allocations.
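The core arithmetic for an overaligned dynamic allocation can be sketched as follows. This is a simplified model that assumes a downward-growing stack and ignores SPARC-specific details such as the stack bias and the reserved save area; the function name is hypothetical and is not the code in `LowerDYNAMIC_STACKALLOC`.

```python
def overaligned_alloca(sp, size, align):
    """Return the new stack pointer for an allocation of `size` bytes.

    Assumes the stack grows downward and `align` is a power of two.
    """
    if align <= 0 or align & (align - 1) != 0:
        raise ValueError("alignment must be a power of two")
    # Reserve the space, then round down so the result is a multiple of align.
    return (sp - size) & ~(align - 1)


new_sp = overaligned_alloca(0x1008, 24, 64)
```

Rounding down after the subtraction may reserve more than `size` bytes, which is the cost of guaranteeing the alignment without realigning the whole frame.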
This makes the SPARC backend not crash on overaligned `alloca`s and fixes
https://github.com/llvm/llvm-project/issues/89569.
Fix a bug where the `mir-strip-debug` pass does not remove debug locations
from bundled instructions.
The problem arises when testing that debug info does not affect the
output of optimization passes (`llvm-lit` with
`-Dllc="llc -debugify-and-strip-all-safe"`) and a pass operates on MIR
with bundled instructions and memory operands.
Suppose a MIR test has checks like these:
```
CHECK-NEXT: BUNDLE {
CHECK-NEXT: $r3 = LD $r1, $r2 :: (load (s64) from %ir.a, !tbaa !2)
CHECK-NEXT: }
```
Because `mir-strip-debug` does not process bundled instructions,
running `llc -debugify-and-strip-all-safe` on the test produces the
following output:
```
BUNDLE {
$r3 = LD $r1, $r2, debug-location !DILocation(line: 3, column: 1, scope: <0x608cb2b99b10>) :: (load (s64) from %ir.a, !tbaa !2)
}
```
The test then fails, even though it should not.
The root cause appears to be that `mir-strip-debug` should remove
debug locations from bundled instructions but does not.
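The gap can be shown with a toy instruction model. This is only a sketch: the `Inst` type and both strip functions are hypothetical and are not the MIR data structures or the pass implementation.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Inst:
    opcode: str
    debug_loc: Optional[str] = None
    bundled: List["Inst"] = field(default_factory=list)


def strip_top_level_only(insts):
    # Models the buggy behaviour: instructions inside a bundle are never visited.
    for inst in insts:
        inst.debug_loc = None


def strip_including_bundles(insts):
    # Models the fixed behaviour: also walk the instructions inside each bundle.
    for inst in insts:
        inst.debug_loc = None
        strip_including_bundles(inst.bundled)


def make_block():
    load = Inst("LD", debug_loc="line 3, column 1")
    return [Inst("BUNDLE", bundled=[load])]


buggy = make_block()
strip_top_level_only(buggy)
fixed = make_block()
strip_including_bundles(fixed)
```

In the buggy model the bundled `LD` keeps its debug location, matching the unexpected `debug-location` operand in the output above.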
Previously this would fail if the default target enabled the loop
terminator folding pass (currently just RISC-V), as it runs after loop
strength reduction.
I came across this subtlety when setting up lit for z/OS and running it on
a Linux on Power machine. Linux on Power is little endian. This was
resulting in all of these tests being run even though the target triple
was z/OS, which is big endian. lit should really check whether the
target, not the host, is little endian. The previous approach didn't
handle cross-compilation while running lit.
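A rough sketch of the distinction, in the style of a lit configuration helper. The function name and the architecture table are assumptions made for illustration; the table is deliberately incomplete and this is not the code in the patch.

```python
import sys

# Illustrative, incomplete set of big-endian architecture names as they
# appear in the first component of a target triple.
BIG_ENDIAN_ARCHES = {
    "s390x",
    "powerpc",
    "powerpc64",
    "sparc",
    "sparcv9",
    "mips",
    "mips64",
    "aarch64_be",
}


def target_is_little_endian(triple):
    """Decide endianness from the target triple, not from the host."""
    arch = triple.split("-", 1)[0]
    return arch not in BIG_ENDIAN_ARCHES


# The host-based check gives the same answer for every target, which is
# wrong when cross-compiling.
host_is_little_endian = sys.byteorder == "little"
```

With a check like this, a z/OS triple is reported as big endian even when lit itself runs on a little-endian Linux on Power host.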
extractelement-shuffle.ll: Test for bugfix in DAGCombiner, moved to
Generic.
2010-07-06-DbgCrash.ll and 2006-10-02-BoolRetCrash.ll: Bugfixes in X86,
run tests with X86 backend.
This was first introduced way back in 2010 by
6c74a872a8d34d41b751efb68e335cbe91b5a5cc, and has little evidence
of use. Only one test attempts to make use of this, but it's
also redundant since it's also using strip to drop debug info anyway
(and that also makes the test buggy, since it's intended to test
with and without debug info).
The other tests using it were only added to test the option after
discovering it was untested and moved, in later commits.