741 Commits

Author SHA1 Message Date
Aiden Grossman
b1778c7d7b
[AsmPrinter] Remove mbb-profile-dump flag (#76595)
Now that the work embedding PGO information in SHT_LLVM_BB_ADDR_MAP ELF
sections has landed, there is no longer a need to keep around the
mbb-profile-dump flag.
2024-01-23 16:48:10 -08:00
Min-Yih Hsu
03be448cce
[RISCV][AMDGPU] Mark test/CodeGen/Generic/live-debug-label.ll XFAIL for RISCV and AMDGPU (#77631)
Both RISC-V and AMDGPU(GCN) deploy two VirtRegRewriter in their codegen
pipeline. This test prematurely stops at the first one, which doesn't
cleanup the virtual register map and cause an assertion failure. Ideally
we can solve this by teaching `-stop-after` how to stop at the last
instance of a Pass, but we're just marking XFAIL for these two targets
for now.
2024-01-10 16:47:34 -08:00
Nick Anderson
f1ec0d12bb
Port CodeGenPrepare to new pass manager (and BasicBlockSectionsProfil… (#77182)
Port CodeGenPrepare to new pass manager and dependency
BasicBlockSectionsProfileReader
Fixes: #75380

Co-authored-by: Krishna-13-cyber <84722531+Krishna-13-cyber@users.noreply.github.com>
2024-01-09 13:32:59 +07:00
Daniel Hoekwater
def42537ee [NFC][CodeGen][AArch64] Add tests for unconditional branch duplication
c9f3288 introduced unconditional branch deduplication for basic block
sections and machine function splitting, but it didn't add tests for
AArch64 since prior behavior crashed the test.

This change adds tests for AArch64 and has no functional change.
2024-01-05 23:39:01 +00:00
Orlando Cazalet-Hyams
10b03e6662
[RemoveDIs] Handle DPValues in FastISel (#76952)
The change is fairly mechanical:
1. Factor code from `FastISel::selectIntrinsicCall`, which converts
debug intrinsics into debug instructions, into functions (NFC).
2. Call those functions for DPValues attached to instructions too.

The test updates look the same as other RemoveDIs changes: re-run the
tests with `--try-experimental-debuginfo-iterators`, which checks the
output is identical using the new debug info format (if it has been
enabled in the cmake configuration).

Depends on #76941 (otherwise some modified tests spuriously fail).
2024-01-05 15:11:47 +00:00
Simon Pilgrim
7648371c25 Revert 4d7c5ad58467502fcbc433591edff40d8a4d697d "[NewPM] Update CodeGenPreparePass reference in CodeGenPassBuilder (#77054)"
Revert e0c554ad87d18dcbfcb9b6485d0da800ae1338d1 "Port CodeGenPrepare to new pass manager (and BasicBlockSectionsProfil… (#75380)"

Revert #75380 and #77054 as they were breaking EXPENSIVE_CHECKS buildbots: https://lab.llvm.org/buildbot/#/builders/104
2024-01-05 12:28:10 +00:00
Nick Anderson
e0c554ad87
Port CodeGenPrepare to new pass manager (and BasicBlockSectionsProfil… (#75380)
Port CodeGenPrepare to new pass manager and dependency
BasicBlockSectionsProfileReader
Fixes: #64560

Co-authored-by: Krishna-13-cyber <84722531+Krishna-13-cyber@users.noreply.github.com>
2024-01-05 13:47:56 +07:00
paperchalice
9bd32d78a9
[CodeGen] Update DwarfEHPreparePass references in CodeGenPassBuilder.h (#74068)
Forgot to update the counterpart in `CodeGenPassBuilder.h`. Also Rename `dwarfehprepare` -> `dwarf-eh-prepare`.
2023-12-11 09:26:01 +08:00
Min-Yih Hsu
0e24179797
[SelectionDAG] Add support to filter SelectionDAG dumps during ISel by function names (#72696)
`-debug-only=isel-dump` is the new debug type for printing SelectionDAG
after each ISel phase. This can be furthered filter by
`-filter-print-funcs=<function names>`.
Note that the existing `-debug-only=isel` will take precedence over the
new behavior and print SelectionDAG dumps of every single function
regardless of `-filter-print-funcs`'s values.
2023-11-20 14:00:47 -08:00
David Sherwood
bdc0afc871
[CodeGen][AArch64] Set min jump table entries to 13 for AArch64 targets (#71166)
There are some workloads that are negatively impacted by using jump
tables when the number of entries is small. The SPEC2017 perlbench
benchmark is one example of this, where increasing the threshold to
around 13 gives a ~1.5% improvement on neoverse-v1. I chose the minimum
threshold based on empirical evidence rather than science, and just
manually increased the threshold until I got the best performance
without impacting other workloads. For neoverse-v1 I saw around ~0.2%
improvement in the SPEC2017 integer geomean, and no overall change for
neoverse-n1. If we find issues with this threshold later on we can
always revisit this.

The most significant SPEC2017 score changes on neoverse-v1 were:

500.perlbench_r: +1.6%
520.omnetpp_r: +0.6%

and the rest saw changes < 0.5%.

I updated CodeGen/AArch64/min-jump-table.ll to reflect the new
threshold. For most of the affected tests I manually set the min number
of entries back to 4 on the RUN line because the tests seem to rely upon
this behaviour.
2023-11-14 13:00:28 +00:00
Nikita Popov
17764d2c87
[IR] Remove FP cast constant expressions (#71408)
Remove support for the fptrunc, fpext, fptoui, fptosi, uitofp and sitofp
constant expressions. All places creating them have been removed
beforehand, so this just removes the APIs and uses of these constant
expressions in tests.

With this, the only remaining FP operation that still has constant
expression support is fcmp.

This is part of
https://discourse.llvm.org/t/rfc-remove-most-constant-expressions/63179.
2023-11-07 09:34:16 +01:00
Luke Lau
2e85123bfe
[VP] Check if VP ops with functional intrinsics are speculatable (#69504)
Noticed whilst working on #69494. VP intrinsics whose functional
equivalent is
an intrinsic were being marked as their lanes being non-speculatable,
even if
the underlying intrinsic was speculatable.

This meant that

```llvm
  %1 = call <4 x i32> @llvm.vp.umax(<4 x i32> %x, <4 x i32> %y, <4 x i1> %mask, i32 %evl)
```

would be expanded out to

```llvm
  %.splatinsert = insertelement <4 x i32> poison, i32 %evl, i64 0
  %.splat = shufflevector <4 x i32> %.splatinsert, <4 x i32> poison, <4 x i32> zeroinitializer
  %1 = icmp ult <4 x i32> <i32 0, i32 1, i32 2, i32 3>, %.splat
  %2 = and <4 x i1> %1, %mask
  %3 = call <4 x i32> @llvm.umax.v4i32(<4 x i32> %x, <4 x i32> %y)
```

instead of

```llvm
  %1 = call <4 x i32> @llvm.umax.v4i32(<4 x i32> %x, <4 x i32> %y)
```

The cause of this was isSafeToSpeculativelyExecuteWithOpcode checking
the
function attributes for the VP instruction itself, not the functional
intrinsic. Since isSafeToSpeculativelyExecuteWithOpcode expects an
already
materialized instruction, we can't use it directly for the intrinsic
case. So
this fixes it by manually checking the function attributes on the
intrinsic.
2023-10-26 13:46:32 +01:00
Mircea Trofin
f179486204
[AsmPrint] Correctly factor function entry count when dumping MBB frequencies (#67826)
The goal in #66818 was to capture function entry counts, but those are not the same as the frequency of the entry (machine) basic block. This fixes that, and adds explicit profiles to the test.

We also increase the precision of `MachineBlockFrequencyInfo::getBlockFreqRelativeToEntryBlock` to double. Existing code uses it as float so should be unaffected.
2023-09-29 18:06:53 -07:00
Aiden Grossman
3dc2f2618b
[MLGO] Move MBB Profile Dump test to Generic (#66856)
This patch moves the MBB Profile Dump to ./llvm/test/CodeGen/Generic
from ./llvm/test/CodeGen/MlRegAlloc as the profile dump doesn't have
anything to do with the ML guided register allocation heuristic.
2023-09-20 11:50:33 -07:00
David Spickett
69f1cd58aa [llvm][AArch64] Disable BigByval with expensive checks
AArch64 incorrectly nests ADJCALLSTACKDOWN/ADJCALLSTACKUP which fails
to verify with expensive checks enabled.

See https://github.com/llvm/llvm-project/issues/62137 and
https://github.com/llvm/llvm-project/issues/62138.
2023-08-31 10:15:45 +00:00
Daniel Hoekwater
0982d96186 [CodeGen][AArch64] Don't split inline asm goto blocks or their targets
Machine function splitting + branch relaxation currently don't properly
handle inline asm goto blocks that conditional branch to cold goto
labels. While such inline asm is technically invalid, machine
function splitting is the only thing that exposes it as such.

Since machine function splitting doesn't help too much in these
circumstances anyway, disable it for asm goto blocks and their targets.

Differential Revision: https://reviews.llvm.org/D158647
2023-08-29 20:24:38 +00:00
Daniel Hoekwater
ef1c25eb50 [CodeGen][AArch64] Don't split jump table basic blocks
Jump tables on AArch64 are label-relative rather than table-relative, so
having jump table destinations that are in different sections causes
problems with relocation. Jump table lookups have a max range of 1MB, so
all destinations must be in the same section as the lookup code. Both of
these restrictions can be mitigated with some careful and complex logic,
but doing so doesn't gain a huge performance benefit.

Efficiently ensuring jump tables are correct and can be compressed on
AArch64 is a TODO item. In the meantime, don't split blocks that can
cause problems.

Differential Revision: https://reviews.llvm.org/D157124
2023-08-28 21:47:57 +00:00
Snehasish Kumar
3dbabeadd6 [CodeGen] Remove unused option in MachineFunctionSplitter.
The option was added in github.com/llvm/llvm-project/commit/90ab85a but it doesn't seem to be used. The triple check has been removed so this shouldn't be required going forward.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D158885
2023-08-25 21:24:28 +00:00
Daniel Hoekwater
8c249c44d4 [CodeGen][AArch64] Don't split functions with a red zone on AArch64
Because unconditional branch relaxation on AArch64 grows the stack to
spill a register, splitting a function would cause the red zone to be
overwritten. Explicitly disable MFS for such functions.

Differential Revision: https://reviews.llvm.org/D157127
2023-08-24 21:57:35 +00:00
Daniel Hoekwater
c9f328844d Reland "[CodeGen] Fix unconditional branch duplication issue in bbsections"
Reverted in 4c8d056f50342d5401f5930ed60e5e48b211c3fb because it broke
buildbot `llvm-clang-x86_64-expensive-checks-debian` due to the AArch64
test generating invalid code. The issue still exists, but it's fixed in
D156767, so the AArch64 test should be added there.

Differential Revision: https://reviews.llvm.org/D158674
2023-08-24 21:27:55 +00:00
Daniel Hoekwater
4c8d056f50 Revert "[CodeGen] Fix unconditional branch duplication issue in bbsections"
This reverts commit 994eb5adc40cd001d82d0f95d18d1827b57e496c.
Breaks buildbot `llvm-clang-x86_64-expensive-checks-debian`
https://lab.llvm.org/buildbot/#/builders/16/builds/53620
2023-08-24 16:59:17 +00:00
Daniel Hoekwater
994eb5adc4 [CodeGen] Fix unconditional branch duplication issue in bbsections
If an end section basic block ends in an unconditional branch to its
fallthrough, BasicBlockSections will duplicate the unconditional branch.
This doesn't break x86, but it is a (slight) size optimization and more
importantly prevents AArch64 builds from breaking.

Ex:
```
bb1 (bbsections Hot):
  jmp bb2

bb2 (bbsections Cold):
  /* do work... */
```

After running sortBasicBlocksAndUpdateBranches():
```
bb1 (bbsections Hot):
  jmp bb2
  jmp bb2

bb2 (bbsections Cold):
  /* do work... */
```

Differential Revision: https://reviews.llvm.org/D158674
2023-08-24 16:22:55 +00:00
Daniel Hoekwater
90ab85a1b2 Reland "[CodeGen][AArch64] Make MFS testable on AArch64"
Reverted by 3d22dac6c3b97d7bb92f243886dfb0d32a5c42e9 because it depended
on b9d079d6188b50730e0a67267b7fee36008435ce, which broke some tests.
2023-08-22 20:21:33 +00:00
Fangrui Song
77596e6b16 Revert D157750 "[Driver][CodeGen] Properly handle -fsplit-machine-functions for fatbinary compilation."
This reverts commit 317a0fe5bd7113c0ac9d30b2de58ca409e5ff754.
This reverts commit 30c4b97aec60895a6905816670f493cdd1d7c546.

See post-commit discussions on https://reviews.llvm.org/D157750 that
we should use a different mechanism to handle the error with --cuda-gpu-arch=

The IR/DiagnosticInfo.cpp, warn_drv_for_elf_only, codegne tests in
clang/test/Driver, and the following driver behavior (downgrading error
to warning) changes are undesired.
```
% clang --target=riscv64 -fsplit-machine-functions -c a.c
warning: -fsplit-machine-functions is not valid for riscv64 [-Wbackend-plugin]
```
2023-08-21 13:54:15 -07:00
Nico Weber
3d22dac6c3 Revert "[clang][test] Refine clang machine-function-split tests."
This reverts commit b9d079d6188b50730e0a67267b7fee36008435ce.
Breaks tests on Windows, see https://reviews.llvm.org/D157565#4600939
2023-08-20 10:38:29 -04:00
Han Shen
b9d079d618 [clang][test] Refine clang machine-function-split tests.
This CL includes two changes:
1. moved clang backend-warnings test cases from Driver/ to CodeGen/.
2. removed multiple `cd "$(dirname "%t")"` and replaced with `-o %t`.

Reviewed By: maskray (Fangrui Song)
Differential Revision: https://reviews.llvm.org/D157565
2023-08-18 18:05:47 -07:00
Jonas Hahnfeld
eeac4321c5 Disable two tests without {arm,aarch64}-registered-target 2023-08-17 10:04:38 +02:00
Han Shen
317a0fe5bd [Driver][CodeGen] Properly handle -fsplit-machine-functions for fatbinary compilation.
When building a fatbinary, the driver invokes the compiler multiple
times with different "--target". (For example, with "-x cuda
--cuda-gpu-arch=sm_70" flags, clang will be invoded twice, once with
--target=x86_64_...., once with --target=sm_70) If we use
-fsplit-machine-functions or -fno-split-machine-functions for such
invocation, the driver reports an error.

This CL changes the behavior so:

  - "-fsplit-machine-functions" is now passed to all targets, for non-X86
    targets, the flag is a NOOP and causes a warning.

  - "-fno-split-machine-functions" now negates -fsplit-machine-functions (if
    -fno-split-machine-functions appears after any -fsplit-machine-functions)
    for any target triple, previously, it causes an error.

  - "-fsplit-machine-functions -Xarch_device -fno-split-machine-functions"
    enables MFS on host but disables MFS for GPUS without warnings/errors.

  - "-Xarch_host -fsplit-machine-functions" enables MFS on host but disables
    MFS for GPUS without warnings/errors.

Reviewed by: xur, dhoekwater

Differential Revision: https://reviews.llvm.org/D157750
2023-08-16 23:41:34 -07:00
Daniel Hoekwater
2c43d591c6 [CodeGen] Move function splitting tests from X86 to Generic (NFC)
Machine function splitting will become available for AArch64; since MFS
is no longer X86-only, the tests for generic behavior should live
somewhere other than tests/CodeGen/X86.

MFS implementation doesn't vary much across platforms, and most tests
should be identical between X86 and AArch64 besides instruction
selection, so the tests can live together in tests/CodeGen/Generic.

Differential Revision: https://reviews.llvm.org/D157563
2023-08-16 18:11:23 +00:00
Daniel Hoekwater
e8540723b3 Revert "[CodeGen] Move function splitting tests from X86 to Generic (NFC)"
This reverts commit 1670e0ea076b12b3fcea2ea63f01bc09e0e7f3b2.
Causes https://lab.llvm.org/buildbot/#/builders/188/builds/33943
2023-08-16 01:46:35 +00:00
Daniel Hoekwater
1670e0ea07 [CodeGen] Move function splitting tests from X86 to Generic (NFC)
Machine function splitting will become available for AArch64; since MFS
is no longer X86-only, the tests for generic behavior should live
somewhere other than tests/CodeGen/X86.

MFS implementation doesn't vary much across platforms, and most tests
should be identical between X86 and AArch64 besides instruction
selection, so the tests can live together in tests/CodeGen/Generic.

Differential Revision: https://reviews.llvm.org/D157563
2023-08-16 01:25:54 +00:00
Vladislav Dzhidzhoev
06a0ae6524 Reland "[DebugMetadata][DwarfDebug] Fix DWARF emisson of function-local imported entities (3/7)"
Got rid of non-determinism in MetadataLoader.cpp.

Authored-by: Kristina Bessonova <kbessonova@accesssoftek.com>

Differential Revision: https://reviews.llvm.org/D144004
2023-06-16 00:49:59 +02:00
Vladislav Dzhidzhoev
b8ea03a4be Revert "Reland "[DebugMetadata][DwarfDebug] Fix DWARF emisson of function-local imported entities (3/7)""
This reverts commit fcc3981626821addc6c77b98006d02030b8ceb7f,
since Bitcode-upgrading code doesn't seem to be deterministic.
2023-06-15 19:36:36 +02:00
Vladislav Dzhidzhoev
fcc3981626 Reland "[DebugMetadata][DwarfDebug] Fix DWARF emisson of function-local imported entities (3/7)"
Run split-dwarf-local-impor3.ll only on x86_64-linux.
2023-06-15 18:15:16 +02:00
Vladislav Dzhidzhoev
fbdeb8cbc1 Revert "[DebugMetadata][DwarfDebug] Fix DWARF emisson of function-local imported entities (3/7)"
This reverts commit d80fdc6fc1a6e717af1bcd7a7313e65de433ba85.
split-dwarf-local-impor3.ll fails because of an issue with
Dwo sections emission on Windows platform.
2023-06-15 18:04:32 +02:00
Vladislav Dzhidzhoev
d80fdc6fc1 [DebugMetadata][DwarfDebug] Fix DWARF emisson of function-local imported entities (3/7)
RFC https://discourse.llvm.org/t/rfc-dwarfdebug-fix-and-improve-handling-imported-entities-types-and-static-local-in-subprogram-and-lexical-block-scopes/68544

Fixed PR51501 (tests from D112337).

1. Reuse of DISubprogram's 'retainedNodes' to track other function-local
   entities together with local variables and labels (this patch cares about
   function-local import while D144006 and D144008 use the same approach for
   local types and static variables). So, effectively this patch moves ownership
   of tracking local import from DICompileUnit's 'imports' field to DISubprogram's
   'retainedNodes' and adjusts DWARF emitter for the new layout. The old layout
   is considered unsupported (DwarfDebug would assert on such debug metadata).

   DICompileUnit's 'imports' field is supposed to track global imported
   declarations as it does before.

   This addresses various FIXMEs and simplifies the next part of the patch.

2. Postpone emission of function-local imported entities from
   `DwarfDebug::endFunctionImpl()` to `DwarfDebug::endModule()`.
   While in `DwarfDebug::endFunctionImpl()` we do not have all the
   information about a parent subprogram or a referring subprogram
   (whether a subprogram inlined or not), so we can't guarantee we emit
   an imported entity correctly and place it in a proper subprogram tree.
   So now, we just gather needed details about the import itself and its
   parent entity (either a Subprogram or a LexicalBlock) during
   processing in `DwarfDebug::endFunctionImpl()`, but all the real work is
   done in `DwarfDebug::endModule()` when we have all the required
   information to make proper emission.

Authored-by: Kristina Bessonova <kbessonova@accesssoftek.com>

Differential Revision: https://reviews.llvm.org/D144004
2023-06-15 17:17:53 +02:00
Vladislav Dzhidzhoev
77f8f40cd4 Revert "[DebugMetadata][DwarfDebug] Fix DWARF emisson of function-local imported entities (3/7)"
This reverts commit ed578f02cf44a52adde16647150e7421f3ef70f3.

Tests llvm/test/DebugInfo/Generic/split-dwarf-local-import*.ll fail
when x86_64 target is not registered.
2023-06-15 16:53:36 +02:00
Vladislav Dzhidzhoev
ed578f02cf [DebugMetadata][DwarfDebug] Fix DWARF emisson of function-local imported entities (3/7)
RFC https://discourse.llvm.org/t/rfc-dwarfdebug-fix-and-improve-handling-imported-entities-types-and-static-local-in-subprogram-and-lexical-block-scopes/68544

Fixed PR51501 (tests from D112337).

1. Reuse of DISubprogram's 'retainedNodes' to track other function-local
   entities together with local variables and labels (this patch cares about
   function-local import while D144006 and D144008 use the same approach for
   local types and static variables). So, effectively this patch moves ownership
   of tracking local import from DICompileUnit's 'imports' field to DISubprogram's
   'retainedNodes' and adjusts DWARF emitter for the new layout. The old layout
   is considered unsupported (DwarfDebug would assert on such debug metadata).

   DICompileUnit's 'imports' field is supposed to track global imported
   declarations as it does before.

   This addresses various FIXMEs and simplifies the next part of the patch.

2. Postpone emission of function-local imported entities from
   `DwarfDebug::endFunctionImpl()` to `DwarfDebug::endModule()`.
   While in `DwarfDebug::endFunctionImpl()` we do not have all the
   information about a parent subprogram or a referring subprogram
   (whether a subprogram inlined or not), so we can't guarantee we emit
   an imported entity correctly and place it in a proper subprogram tree.
   So now, we just gather needed details about the import itself and its
   parent entity (either a Subprogram or a LexicalBlock) during
   processing in `DwarfDebug::endFunctionImpl()`, but all the real work is
   done in `DwarfDebug::endModule()` when we have all the required
   information to make proper emission.

Authored-by: Kristina Bessonova <kbessonova@accesssoftek.com>

Differential Revision: https://reviews.llvm.org/D144004
2023-06-15 16:15:39 +02:00
Vladislav Dzhidzhoev
a7e7d34dc1 Revert "[DebugMetadata][DwarfDebug] Fix DWARF emisson of function-local imported entities (3/7)"
This reverts commit d04452d54829cd7af5b43d670325ffa755ab0030 since
test llvm-project/llvm/test/Bitcode/DIImportedEntity_backward.ll is broken.
2023-06-15 14:35:54 +02:00
Vladislav Dzhidzhoev
d04452d548 [DebugMetadata][DwarfDebug] Fix DWARF emisson of function-local imported entities (3/7)
RFC https://discourse.llvm.org/t/rfc-dwarfdebug-fix-and-improve-handling-imported-entities-types-and-static-local-in-subprogram-and-lexical-block-scopes/68544

Fixed PR51501 (tests from D112337).

1. Reuse of DISubprogram's 'retainedNodes' to track other function-local
   entities together with local variables and labels (this patch cares about
   function-local import while D144006 and D144008 use the same approach for
   local types and static variables). So, effectively this patch moves ownership
   of tracking local import from DICompileUnit's 'imports' field to DISubprogram's
   'retainedNodes' and adjusts DWARF emitter for the new layout. The old layout
   is considered unsupported (DwarfDebug would assert on such debug metadata).

   DICompileUnit's 'imports' field is supposed to track global imported
   declarations as it does before.

   This addresses various FIXMEs and simplifies the next part of the patch.

2. Postpone emission of function-local imported entities from
   `DwarfDebug::endFunctionImpl()` to `DwarfDebug::endModule()`.
   While in `DwarfDebug::endFunctionImpl()` we do not have all the
   information about a parent subprogram or a referring subprogram
   (whether a subprogram inlined or not), so we can't guarantee we emit
   an imported entity correctly and place it in a proper subprogram tree.
   So now, we just gather needed details about the import itself and its
   parent entity (either a Subprogram or a LexicalBlock) during
   processing in `DwarfDebug::endFunctionImpl()`, but all the real work is
   done in `DwarfDebug::endModule()` when we have all the required
   information to make proper emission.

Authored-by: Kristina Bessonova <kbessonova@accesssoftek.com>

Differential Revision: https://reviews.llvm.org/D144004
2023-06-15 14:29:03 +02:00
Matt Arsenault
bc37be1855 LangRef: Add "dynamic" option to "denormal-fp-math"
This is stricter than the default "ieee", and should probably be the
default. This patch leaves the default alone. I can change this in a
future patch.

There are non-reversible transforms I would like to perform which are
legal under IEEE denormal handling, but illegal with flushing zero
behavior. Namely, conversions between llvm.is.fpclass and fcmp with
zeroes.

Under "ieee" handling, it is legal to translate between
llvm.is.fpclass(x, fcZero) and fcmp x, 0.

Under "preserve-sign" handling, it is legal to translate between
llvm.is.fpclass(x, fcSubnormal|fcZero) and fcmp x, 0.

I would like to compile and distribute some math library functions in
a mode where it's callable from code with and without denormals
enabled, which requires not changing the compares with denormals or
zeroes.

If an IEEE function transforms an llvm.is.fpclass call into an fcmp 0,
it is no longer possible to call the function from code with denormals
enabled, or write an optimization to move the function into a denormal
flushing mode. For the original function, if x was a denormal, the
class would evaluate to false. If the function compiled with denormal
handling was converted to or called from a preserve-sign function, the
fcmp now evaluates to true.

This could also be of use for strictfp handling, where code may be
changing the denormal mode.

Alternative name could be "unknown".

Replaces the old AMDGPU custom inlining logic with more conservative
logic which tries to permit inlining for callees with dynamic handling
and avoids inlining other mismatched modes.
2023-04-29 08:44:59 -04:00
ManuelJBrito
8b56da5e9f [IR] Change shufflevector undef mask to poison
With this patch an undefined mask in a shufflevector will be printed as poison.
This change is done to support the new shufflevector semantics
for undefined mask elements.

Differential Revision: https://reviews.llvm.org/D149210
2023-04-27 14:41:10 +01:00
Simon Pilgrim
aa754f7e0f [IR] llvm::createMinMaxOp - create integer min/max intrinsics instead of icmp/sel
Based off D148215, when expanding a min/max reduction we should be creating min/max intrinsics directly instead of relying on instcombine to fold them back together.

This patch handles integer min/max cases. Hopefully we can add floating point support soon (at least for fastmath/nnan cases) - but we're missing some of the plumbing to pass the correct FMF to the intrinsic at the moment.

Differential Revision: https://reviews.llvm.org/D148221
2023-04-13 16:40:43 +01:00
Jay Foad
d4af896572 [PowerPC] Fix UNSUPPORTED syntax in addr-label.ll 2023-03-31 16:47:35 +01:00
Stefan Pintilie
471f1664f3 [NFC][PowerPC] Marked the addr-label.ll test unsupported on PowerPC.
The addr-label.ll test uses the following setup:

define ptr @test1() nounwind {
entry:
	ret ptr blockaddress(@test1b, %test_label)
}

define i32 @test1b() nounwind {
entry:
	ret i32 -1
test_label:
	br label %ret
ret:
	ret i32 -1
}

However, according to the LLVM Reference guide for blockaddress()
"This value only has defined behavior when used as an operand to the
‘indirectbr’ or for comparisons against null." For this test the value
is just returned as a pointer from test1().

On PowerPC this test has unreliable results as the order in which
passes are run can make this test pass or fail. If the %test_label
in test1b() is removed before a number of passes are completed on
test1() then this test will fail on PowerPC.

I have marked this test as UNSUPPORTED on PowerPC.
2023-03-31 10:41:23 -04:00
Momchil Velikov
99e57f06c4 [CodeGenPrepare] Increase the limit on the number of instructions to scan
... when finding all memory uses for an address and make it a
parameter.

Now that we have avoided potentially exponential run time of
`FindAllMemoryUses` in D143893. it'd be beneficial to increase the
limit up from 20.

Reviewed By: mkazantsev

Differential Revision: https://reviews.llvm.org/D143894

Change-Id: I3abdf40332ef65e9b2f819ac32ac60e4200ec51d
2023-03-30 14:38:22 +01:00
Momchil Velikov
2453da0a4e [CodeGenPrepare] Fix counting uses when folding addresses into memory instructions
The counter of the number of instructions seen in `FindAllMemoryUses`
is reset after returning from a recursive invocation of
`FindAllMemoryUses` to the value it had before the call. In effect,
depending on the shape of the uses graph, the function may scan up to
`2^N-1` instructions where `N` is the scan limit
(`MaxMemoryUsesToScan`).  This does not look intuitive or intended.

This patch changes the counting to just count the scanned
instructions, independent of the shape of the references.

Reviewed By: mkazantsev

Differential Revision: https://reviews.llvm.org/D143893

Change-Id: I99f5de55e84843cf2fbea287d6ae4312fa196240
2023-03-30 14:18:14 +01:00
Momchil Velikov
5c9a26238a [CodeGenPrepare][NFC] Pre-commit test for memory use count fix
Reviewed By: mkazantsev

Differential Revision: https://reviews.llvm.org/D145705
2023-03-23 17:55:05 +00:00
Momchil Velikov
6a2a5f08de [CodeGenPrepare] Don't give up if unable to sink first arg to a cold call
Reviewed By: mkazantsev

Differential Revision: https://reviews.llvm.org/D143892
2023-03-23 17:31:09 +00:00
Tim Northover
e4b352a0b9 Revert "DwarfEHPrepare: insert extra unwind paths for stack protector to instrument"
It's caused more failures than are trivially fixable.

This reverts commit 203b6f31bb71ce63488eb96b303e000e91aee376.
2023-03-16 11:55:53 +00:00