527508 Commits

Author SHA1 Message Date
Piotr Fusik
0078e8f450
[RISCV][NFC] Fix a warning (#127090) 2025-02-13 21:42:36 +01:00
Aiden Grossman
161f64a4c1
[Github][CI] Hashpin actions dependencies (#127011)
This patch pins several action dependencies in the premerge
workflow and the Windows/Linux container build workflows to help improve
security in the unlikely event that someone tries to pull off a supply
chain attack by modifying release assets for these actions.
2025-02-13 12:16:42 -08:00
Ellis Hoag
83632c039d
[lld][BP] Order .Tgm symbols for startup (#126328)
The Global Function Merger
(https://discourse.llvm.org/t/rfc-global-function-merging/82608) pass
optimistically creates merged instances of functions and suffixes their
names with `.Tgm`. Then in the linker, ICF will (hopefully) fold these
`.Tgm` functions. For example, a function `foo` might become a thunk
`foo` that calls a merged function `foo.Tgm`.

Since IRPGO runs before the global merger, we will only have a profile
for `foo`. We want to correlate this profile to both `foo` and `foo.Tgm`
so they can both be ordered to improve startup time.

I built a large binary and found that it increased the number of
functions ordered for startup, as expected.
```
Functions for startup: 12049 -> 12697
Functions for compression: 34733 -> 34707
```

We don't see a larger improvement because in some cases the code was
already working by accident:
`getRootSymbol("foo.llvm.5555.Tgm")` already returns `foo`.
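The name correlation described above can be sketched as follows. This is a minimal illustration, not lld's actual `getRootSymbol`: the helper `root_symbol` and its suffix patterns are hypothetical and inferred only from the names quoted in this message (`foo.Tgm`, `foo.llvm.5555.Tgm`).

```python
import re

# Hypothetical suffix patterns, inferred only from the example names above.
TGM_SUFFIX = ".Tgm"
LLVM_HASH_RE = re.compile(r"\.llvm\.\d+$")


def root_symbol(name: str) -> str:
    """Map a possibly merged symbol name back to the name the profile uses."""
    # Drop the suffix appended by the global function merger, if present.
    if name.endswith(TGM_SUFFIX):
        name = name[: -len(TGM_SUFFIX)]
    # Drop a trailing ".llvm.<digits>" suffix, if present.
    return LLVM_HASH_RE.sub("", name)
```

With a mapping like this, a profile entry for `foo` can be attributed to both the thunk `foo` and the merged body `foo.Tgm`.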
2025-02-13 12:10:58 -08:00
LLVM GN Syncbot
b5aa1c4783 [gn build] Port 63c1be724924 2025-02-13 19:42:26 +00:00
Florian Hahn
65640c1d4c
[AssumeBundles] Dereferenceable used in bundle only applies at assume. (#126117)
Update LangRef and code using `Dereferenceable` in assume bundles to
only use the information if it is safe at the point of use.

`Dereferenceable` in an assume bundle is only guaranteed at the point of
the assumption, but may not be guaranteed at later points, because the
pointer may have been freed.

Update code using `Dereferenceable` to only use it if the pointer cannot
be freed. This can further be refined to check if the pointer could be
freed between assume and use.

This follows up on https://github.com/llvm/llvm-project/pull/123196.

With that change, it should be safe to expose dereferenceable
assumptions more widely as in
https://github.com/llvm/llvm-project/pull/121789

PR: https://github.com/llvm/llvm-project/pull/126117
2025-02-13 20:41:23 +01:00
LU-JOHN
5decab178f
AMDGPU: Reduce shl64 to shl32 if shift range is [63-32] (#125574)
Reduce:

   DST = shl i64 X, Y

where Y is in the range [32, 63] to:

   DST = [0, shl i32 X, (Y & 31)]


Alive2 analysis:

https://alive2.llvm.org/ce/z/w_u5je
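As a quick sanity check alongside the Alive2 proof, the arithmetic can be modelled with plain integers. This is only a sketch: the function names are made up for illustration, and it relies on the fact that for a shift amount in [32, 63] the low 32 bits of the result are zero and the high 32 bits are the low half of X shifted by Y - 32, which equals Y & 31 on that range.

```python
MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1


def shl64(x: int, y: int) -> int:
    """Reference 64-bit left shift."""
    return (x << y) & MASK64


def shl64_via_shl32(x: int, y: int) -> int:
    """Same result built from a single 32-bit shift, valid for 32 <= y <= 63."""
    assert 32 <= y <= 63
    lo_x = x & MASK32                  # only the low half of X can survive
    hi = (lo_x << (y & 31)) & MASK32   # 32-bit shift by Y - 32
    return hi << 32                    # the low 32 bits of the result are zero
```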

---------

Signed-off-by: John Lu <John.Lu@amd.com>
2025-02-13 13:40:25 -06:00
Petr Hosek
2bdeeaa185
[libc] Use __builtin_elementwise_fma instead of __builtin_fma (#126288)
__builtin_elementwise_fma doesn't consider errno and is thus more
suitable for the libc fma implementation.
2025-02-13 11:40:04 -08:00
Paul Kirth
63c1be7249
[llvm][fatlto] Add FatLTOCleanup pass (#125911)
When using FatLTO, it is common to want to enable certain types of whole
program optimizations (WPD) or security transforms (CFI), so that they
can be made available when performing LTO. However, these transforms
should not be used when compiling the non-LTO object code. Since the
frontend must emit different IR, we cannot simply clone the module and
optimize the LTO section and non-LTO section differently to work around
this. Instead, we need to remove any problematic instruction sequences.

This patch adds a new pass whose responsibility is to clean up the IR
in the FatLTO pipeline after creating the bitcode section, which is
after running the pre-link pipeline but before running module
optimization. This allows us to safely drop any conflicting instructions
or IR constructs that are inappropriate for non-LTO compilation.
2025-02-13 11:39:02 -08:00
Alex MacLean
ecdfa36eca
Reland "[NVPTX] Cleanup/Refactoring in NVPTX AsmPrinter and RegisterInfo (NFC)" (#127089) 2025-02-13 11:35:31 -08:00
Jason Molenda
b666ac3b63
[lldb] Change lldb's breakpoint handling behavior, reland (#126988)
lldb today has two rules: When a thread stops at a BreakpointSite, we
set the thread's StopReason to be "breakpoint hit" (regardless if we've
actually hit the breakpoint, or if we've merely stopped *at* the
breakpoint instruction/point and haven't tripped it yet). And second,
when resuming a process, any thread sitting at a BreakpointSite is
silently stepped over the BreakpointSite -- because we've already
flagged the breakpoint hit when we stopped there originally.

In this patch, I change lldb to only set a thread's stop reason to
breakpoint-hit when we've actually executed the instruction/triggered
the breakpoint. When we resume, we only silently step past a
BreakpointSite that we've registered as hit. We preserve this state
across inferior function calls that the user may do while stopped, etc.

Also, when a user adds a new breakpoint at $pc while stopped, or changes
$pc to be the address of a BreakpointSite, we will silently step past
that breakpoint when the process resumes. This is purely a UX call; I
don't think anyone wants to set a breakpoint at $pc and
then hit it immediately on resuming.

One non-intuitive but necessary UX consequence of this change: if you're
stopped at a BreakpointSite that has not yet executed and you `stepi`, you
will hit the breakpoint and the pc will not yet advance. This thread has
not completed its stepi, and the ThreadPlanStepInstruction is still on
the stack. If you then `continue` the thread, lldb will now stop and
say, "instruction step completed", one instruction past the
BreakpointSite. You can continue a second time to resume execution.

The bugs driving this change are all from lldb dropping the real stop
reason for a thread and setting it to breakpoint-hit when that was not
the case. Jim hit one where we have an aarch64 watchpoint that triggers
one instruction before a BreakpointSite. On this arch we are notified of
the watchpoint hit after the instruction has been unrolled -- we disable
the watchpoint, instruction step, re-enable the watchpoint and collect
the new value. But now we're on a BreakpointSite so the watchpoint-hit
stop reason is lost.

Another was reported by ZequanWu in
https://discourse.llvm.org/t/lldb-unable-to-break-at-start/78282 where we
attach to/launch a process with the pc at a BreakpointSite and
misbehave. Caroline Tice mentioned it is also a problem they've had with
putting a breakpoint on _dl_debug_state.

The change to each Process plugin that does execution control is that:

1. If we've stopped at a BreakpointSite that has not been executed yet,
we will call Thread::SetThreadStoppedAtUnexecutedBP(pc) to record that.
When the thread resumes, if the pc is still at the same site, we will
continue, hit the breakpoint, and stop again.

2. When we've actually hit a breakpoint (enabled for this thread or
not), the Process plugin should call
Thread::SetThreadHitBreakpointSite(). When we go to resume the thread,
we will push a step-over-breakpoint ThreadPlan before resuming.
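The two rules above can be pictured with a toy state model. This is only a sketch: `ThreadModel`, its method names, and the returned action strings are hypothetical stand-ins that mirror `Thread::SetThreadStoppedAtUnexecutedBP(pc)` and `Thread::SetThreadHitBreakpointSite()` as described here, not lldb's real classes.

```python
class ThreadModel:
    """Toy model of the per-thread breakpoint state described above."""

    def __init__(self):
        self.unexecuted_bp_pc = None  # pc of a site we stopped at but did not execute
        self.hit_bp_site = False      # True once the breakpoint actually triggered

    def set_stopped_at_unexecuted_bp(self, pc):
        # Rule 1: stopped at a BreakpointSite whose instruction has not run yet.
        self.unexecuted_bp_pc = pc
        self.hit_bp_site = False

    def set_hit_breakpoint_site(self):
        # Rule 2: the breakpoint instruction really executed / triggered.
        self.hit_bp_site = True
        self.unexecuted_bp_pc = None

    def resume_action(self, pc):
        """Decide what happens when the thread is resumed at `pc`."""
        if self.hit_bp_site:
            # Already reported as hit: silently step past the site first.
            return "step-over-breakpoint"
        if self.unexecuted_bp_pc == pc:
            # Still sitting on the unexecuted site: run, hit it, stop again.
            return "continue-and-hit-breakpoint"
        return "continue"
```

The point of keeping two separate pieces of state is that "stopped at a site" and "hit the breakpoint" are no longer treated as the same event.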

The biggest set of changes is to StopInfoMachException where we
translate a Mach Exception into a stop reason. The Mach exception codes
differ in a few places depending on the target (unambiguously), and I
didn't want to duplicate the new code for each target so I've tested
what mach exceptions we get for each action on each target, and
reorganized StopInfoMachException::CreateStopReasonWithMachException to
document these possible values, and handle them without specializing
based on the target arch.

I first landed this patch in July 2024 via
https://github.com/llvm/llvm-project/pull/96260

but the CI bots and wider testing found a number of failing test cases
that needed to be updated, so I reverted it. I've fixed all of those issues
in separate PRs, and this change should run cleanly on all the CI bots
now.

rdar://123942164
2025-02-13 11:30:10 -08:00
Philip Reames
72f4e656b8 [RISCV] Revise interface of isLegalBitRotate [nfc]
Remove a dead parameter (DAG), and replace the ShuffleVectorSDNode param
with the two things we need from the shuffle (mask and VT).  There's
further room to improve this code, but this gets me what I need for an
upcoming patch.
2025-02-13 11:17:54 -08:00
alx32
4ac79a8c98
[lld-macho] Use Symbols as branch target for safe_thunks ICF (#126835)
## Problem

The `safe_thunks` ICF optimization in `lld-macho` was creating thunks
that pointed to `InputSection`s instead of `Symbol`s. In general,
branch relocations can point to either symbols or input sections, but in
this case they must point to symbols because the subsequent branch
extension algorithm expects branches to always point to `Symbol`s.

## Solution
This patch changes the ICF implementation so that safe thunks point to
`Symbol`s rather than `InputSection`s.

## Testing
The existing `arm64-thunks.s` test is modified to include
`--icf=safe_thunks` to explicitly verify the interaction between ICF and
branch range extension thunks. Two functions were added that will be
merged together via a thunk. Before this patch, this test would trigger
an assertion; now this scenario is handled correctly.
2025-02-13 11:07:12 -08:00
John Harrison
c2e96778e0
[lldb-dap] Ensure we do not print the close sentinel when closing stdout. (#126833)
If you have an lldb-dap log file you'll almost always see a final
message like:

```
<-- 
Content-Length: 94

{
  "body": {
    "category": "stdout",
    "output": "\u0000\u0000"
  },
  "event": "output",
  "seq": 0,
  "type": "event"
}
<-- 
Content-Length: 94

{
  "body": {
    "category": "stderr",
    "output": "\u0000\u0000"
  },
  "event": "output",
  "seq": 0,
  "type": "event"
}
```

The OutputRedirect is always writing the `"\0"` byte as a final stdout
message during shutdown. Instead, I adjusted this to detect the sentinel
value and break out of the read loop as if we detected EOF.
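The adjusted read loop can be sketched roughly as below. This is an illustrative model only, not the lldb-dap `OutputRedirect` code: `forward_output`, `read_chunk`, `emit`, and the exact sentinel bytes are assumptions made for the example.

```python
SENTINEL = b"\0"  # assumed marker written to the pipe during shutdown


def forward_output(read_chunk, emit):
    """Forward chunks to `emit` until EOF or the shutdown sentinel is seen.

    `read_chunk` is a hypothetical callable returning the next bytes object
    (empty bytes on EOF); `emit` is a hypothetical output callback.
    """
    while True:
        chunk = read_chunk()
        # Treat the sentinel exactly like EOF so it is never forwarded.
        if not chunk or chunk == SENTINEL:
            break
        emit(chunk)
```

Handling the sentinel in the reader keeps the shutdown marker out of the emitted `output` events.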

---------

Co-authored-by: Pavel Labath <pavel@labath.sk>
2025-02-13 10:35:50 -08:00
Petr Hosek
ea77dd8715
[libc] Include locale support in baremetal configuration (#127103)
Having locale is a requirement for C++ streams.
2025-02-13 10:26:33 -08:00
Jacek Caban
8252e0ef82
[LLD][COFF] Emit ARM64X relocations for CHPE ExtraRFETable entries (#126713)
In the native view, ExtraRFETable references the x86 exception table.
The EC view references the ARM exception table, as it did before this
change.
2025-02-13 19:22:57 +01:00
Arthur Eubanks
7d9a12cec2 [gn build] Manually port 89d636ba 2025-02-13 18:18:30 +00:00
Michael Buch
0feb00f17c [lldb][test] TestCPPEnumPromotion: make sure enums are preserved in dSYM
On macOS CI this was failing with:
```
FAIL: test_dsym (TestCPPEnumPromotion.TestCPPEnumPromotion)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/ec2-user/jenkins/workspace/llvm.org/as-lldb-cmake/llvm-project/lldb/packages/Python/lldbsuite/test/lldbtest.py", line 1784, in test_method
    return attrvalue(self)
  File "/Users/ec2-user/jenkins/workspace/llvm.org/as-lldb-cmake/llvm-project/lldb/test/API/lang/cpp/enum_promotion/TestCPPEnumPromotion.py", line 28, in test
    self.expect_expr("+EnumUChar::UChar", result_type=UChar_promoted.type.name)
  File "/Users/ec2-user/jenkins/workspace/llvm.org/as-lldb-cmake/llvm-project/lldb/packages/Python/lldbsuite/test/lldbtest.py", line 2540, in expect_expr
    value_check.check_value(self, eval_result, str(eval_result))
  File "/Users/ec2-user/jenkins/workspace/llvm.org/as-lldb-cmake/llvm-project/lldb/packages/Python/lldbsuite/test/lldbtest.py", line 299, in check_value
    test_base.assertSuccess(val.GetError())
  File "/Users/ec2-user/jenkins/workspace/llvm.org/as-lldb-cmake/llvm-project/lldb/packages/Python/lldbsuite/test/lldbtest.py", line 2575, in assertSuccess
    self.fail(self._formatMessage(msg, "'{}' is not success".format(error)))
AssertionError: 'error: <user expression 0>:1:2: use of undeclared identifier 'EnumUChar'
    1 | +EnumUChar::UChar
      |  ^
' is not success
```
But only for the `dSYM` variant of the test.

Looking at the dSYM, none of the enums are actually preserved in the debug-info. We have to actually use the enum types in source to get dsymutil to preserve them. This patch does just that.
2025-02-13 18:13:09 +00:00
Jacek Caban
c52fbabc93
[LLD][COFF] Set __buildid symbol in both symbol tables on ARM64X (#126777) 2025-02-13 19:10:46 +01:00
Philip Reames
059722da5e Revert "[RISCV] Default to MicroOpBufferSize = 1 for scheduling purposes (#126608)" and follow up commit.
This reverts commit 9cc8442a2b438962883bbbfd8ff62ad4b1a2b95d.
This reverts commit 859c871184bdfdebb47b5c7ec5e59348e0534e0b.

A performance regression was reported on the original review.  There appears
to have been an unexpected interaction here.  Reverting during investigation.
2025-02-13 09:57:33 -08:00
klensy
4ee173a168
add me to mailmap (#126226)
This should let buildbot find the proper email address.


f1a84bbe55/master/buildbot/changes/gitpoller.py (L418)

At least buildbot parses user names and mails with respect to mailmap.

Co-authored-by: klensy <nightouser@gmail.com>
2025-02-13 17:49:48 +00:00
Alexey Bataev
d18b1ebef5 [SLP]Check if vector user exist before accessing it
Check that the vector user exists before accessing it to avoid a
compiler crash.
Fixes #126581
2025-02-13 09:44:34 -08:00
Sylvestre Ledru
c81139f417
libc/cmake: don't fail if LLVM_VERSION_SUFFIX isn't defined (#126359)
Closes: #126358

cc @samvangysegem

---------

Co-authored-by: Joseph Huber <huberjn@outlook.com>
2025-02-13 18:42:28 +01:00
Joel E. Denny
eb8ffd617a
[flang] AliasAnalysis: Handle fir.load on fir.alloca (#117785)
For example, determine that the address in p below cannot alias the
address of v:

```
subroutine test()
  real, pointer :: p
  real, target :: t
  real :: v
  p => t
  v = p
end subroutine test
```
2025-02-13 12:40:03 -05:00
Slava Zakharin
660cdace55
[flang] Fixed write past allocated descriptor in PointerAssociateRemapping. (#127000)
The pointer descriptor might be smaller than the target descriptor,
so `operator=` would write beyond the pointer descriptor.
2025-02-13 09:39:36 -08:00
Martin Erhart
9a63a2c4ba
[mlir][index] Add CAPI (#127039) 2025-02-13 17:37:49 +00:00
Stanislav Mekhanoshin
07405ca036
[AMDGPU] clang-format SIProgramInfo.h. NFC. (#127033) 2025-02-13 09:35:29 -08:00
Simon Pilgrim
4a97ce5f75 [X86] X86FixupVectorConstantsPass - pull out getPrimitiveSizeInBits call. NFC. 2025-02-13 17:25:08 +00:00
Kazu Hirata
4bda95304f
[llvm-profgen] Avoid repeated hash lookups (NFC) (#127028) 2025-02-13 09:12:33 -08:00
Kazu Hirata
9a59145d8e
[memprof] Avoid repeated map lookups (NFC) (#127027) 2025-02-13 09:12:04 -08:00
Kazu Hirata
fec04f286e
[FileCheck] Avoid repeated hash lookups (NFC) (#127026) 2025-02-13 09:11:43 -08:00
Kazu Hirata
e7bf6a4e04
[CodeGen] Avoid repeated map lookups (NFC) (#127025) 2025-02-13 09:11:17 -08:00
Kazu Hirata
44b61e056d
[Analysis] Avoid repeated hash lookups (NFC) (#127024) 2025-02-13 09:10:57 -08:00
Kazu Hirata
d096f45322
[clang-scan-deps] Avoid repeated map lookups (NFC) (#127023) 2025-02-13 09:10:38 -08:00
Ilia Kuklin
f30c891464
[lldb] Analyze enum promotion type during parsing (#115005)
The information about an enum's best promotion type is discarded after
compilation and is not present in debug info. This patch repeats the
same analysis of each enum value as in the front-end to determine the
best promotion type during DWARF info parsing.

Fixes #86989
2025-02-13 22:08:31 +05:00
Craig Topper
e750c7e636
[RISCV] Set Feature32Bit/Feature64Bit based on triple for -mcpu=help. (#127031)
llvm-mc keeps going after printing help text and creates an assembler.
If we don't set one of the XLen sized feature bits we trip a fatal error
in RISCVFeatures::validate.

llvm-mc should probably be fixed, but I don't know if it's the only tool
with this issue.
2025-02-13 09:07:23 -08:00
Ellis Hoag
79fff6aa32
[lld][BP] Avoid ordering ICF'ed sections (#126327)
ICF runs before BPSectionOrderer. When a section is ICF'ed, it seems
that the original sections are marked as not live, but are still kept
around. Prior to this patch, those ICF'ed sections would be passed to BP
and ordered before being skipped when writing the output. Now, these
sections are no longer passed to BP, saving runtime and possibly
improving BP's output.

In a large binary, I found that the number of sections ordered using BP
decreased, while the number of duplicate sections drastically decreased
as expected.
```
Functions for startup: 50755 -> 50520
Functions for compression: 165734 -> 105328
Duplicate functions: 1827231 -> 55230
```
2025-02-13 08:57:44 -08:00
Abhilash Majumder
55f3df875d
[NVPTX] Fix and refine prefetch.* intrinsics (#126899)
This is a follow-up PR to #125887 that fixes the intrinsic failures.

---------

Co-authored-by: abmajumder <abmajumder@nvidia.com>
2025-02-13 17:54:01 +01:00
Piotr Zegar
a663e78a6e
[clang-tidy] Add recursion protection in ExceptionSpecAnalyzer (#66810)
Normally, endless recursion should not happen in ExceptionSpecAnalyzer,
but if the AST is malformed (for example, due to a missing include), it
could cause a crash.

I ran into this issue when, due to a missing include, constructor
arguments were parsed as FieldDecl.
Since checking for recursion costs nothing, the check now does so just
in case.
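The kind of guard described above can be sketched on a toy graph. This is not the `ExceptionSpecAnalyzer` code: `can_throw`, the `members` dictionary, and the `throwing` set are hypothetical and only illustrate tracking the nodes currently being analyzed so a cycle terminates instead of recursing forever.

```python
def can_throw(node, members, throwing, _active=None):
    """Return True if `node` or anything reachable from it is known to throw.

    `members` maps a node name to the names it contains; `throwing` is the
    set of names known to throw. `_active` holds nodes on the current
    recursion path and acts as the recursion guard.
    """
    if _active is None:
        _active = set()
    if node in _active:
        # Already being analyzed further up the stack: break the cycle.
        return False
    if node in throwing:
        return True
    _active.add(node)
    try:
        return any(
            can_throw(m, members, throwing, _active)
            for m in members.get(node, ())
        )
    finally:
        _active.discard(node)
```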

Fixes #111436
2025-02-13 17:51:28 +01:00
Georgiy Samoylov
1138a4964a
[lldb] Fix build problem in llgs tests for RISC-V (#127091)
While testing LLDB on a RISC-V target, tests from the llgs category
failed to build with the error: `Error when building test subject.`

```
llvm-project/lldb/test/API/tools/lldb-server/main.cpp:151:40: error: missing ')' after '__builtin_debugtrap'
  151 | #elif __has_builtin(__builtin_debugtrap())
      |                     ~~~~~~~~~~~~~~~~~~~^
llvm-project/lldb/test/API/tools/lldb-server/main.cpp:151:20: note: to match this '('
  151 | #elif __has_builtin(__builtin_debugtrap())
      |                    ^
```

This patch fixes this error.
2025-02-13 16:48:03 +00:00
Vyacheslav Levytskyy
2f8de7b466
[SPIR-V] Type inference must realize that a <1 x Type> vector type is not a legal vector type in LLT (#124560)
In this PR we account for possible <1 x LLVM Type> input to ensure that
we produce legal vector types during type inference.

We modify an LLVM type to conform with future transformations in
IRTranslator, if it's a <1 x Type> vector type, replacing it by the
element type, because <1 x Type> vector type is not a legal vector type
in LLT and IRTranslator will represent it as the scalar eventually.
2025-02-13 17:46:42 +01:00
Jay Foad
ba45592377
[AMDGPU] Try to fix -mattr=dumpcode on big-endian hosts (#127073)
Blind fix for #116982 failing on big-endian buildbots.
2025-02-13 16:44:22 +00:00
Kazu Hirata
88015d12ca [mlir] Fix a warning
This patch fixes:

  mlir/lib/Conversion/ComplexCommon/DivisionConverter.cpp:61:2: error:
  extra ';' outside of a function is incompatible with C++98
  [-Werror,-Wc++98-compat-extra-semi]
2025-02-13 08:36:07 -08:00
Craig Topper
8da8ff8768
[flang][RISCV] Add target-abi ModuleFlag. (#126188)
This is needed to generate proper ABI flags in the ELF header for LTO
builds. If these flags aren't set correctly, we can't link with objects
that were built with the correct flags.

For non-LTO builds the mcpu/mattr in the TargetMachine will cause the
backend to infer an ABI. For LTO builds the mcpu/mattr aren't set.

I've only added lp64, lp64f, and lp64d ABIs. ilp32* requires riscv32
which is not yet supported in flang. lp64e requires a different
DataLayout string and would need additional plumbing.

Fixes #115679
2025-02-13 08:08:09 -08:00
Mikhail Goncharov
21811818d6 [bazel] port aecb764cc2e026ecb5c418dd56f2722c6f263e8b 2025-02-13 17:05:33 +01:00
David Green
b2165f214e
[CostModel] Account for power-2 urem in funnel shift costs (#127037)
As can be seen in https://godbolt.org/z/qvMqY79cK, a urem by a power-2
constant will be code-generated as an And of a mask. The cost model for
funnel shifts tries to account for that by passing OP_PowerOf2 as the
operand info for the second operand. As far as I can tell returning a
lower cost for urem with a OP_PowerOf2 is only implemented on X86
though.

This patch short-cuts that by calling getArithmeticInstrCost(And, ..)
directly when we know the type size will be a power of 2. This is an
alternative to the patch in #126912, which is a more general solution for
power-2 udiv/urem costs; this patch more narrowly fixes only funnel shifts.
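The reason an And is the right thing to cost can be shown with a small integer model. This is a sketch with made-up helper names: it shows that an unsigned remainder by a power of two equals a mask, and that a 32-bit funnel shift left only needs that masked shift amount.

```python
MASK32 = (1 << 32) - 1


def urem_pow2(x: int, n: int) -> int:
    """Unsigned remainder by a power of two, computed as a mask."""
    assert n > 0 and n & (n - 1) == 0, "n must be a power of two"
    return x & (n - 1)


def fshl32(a: int, b: int, s: int) -> int:
    """Model of a 32-bit funnel shift left: high word of (a:b) << (s mod 32)."""
    s = urem_pow2(s, 32)  # the shift amount is reduced modulo the bit width
    if s == 0:
        return a & MASK32
    return ((a << s) | ((b & MASK32) >> (32 - s))) & MASK32
```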
2025-02-13 16:05:00 +00:00
Hyunsung Lee
de09986596
[mlir][math] powf(a, b) drop support when a < 0 (#126338)
Related: #124402

- change the inefficient implementation of `powf(a, b)` that handled the
`a < 0` case
  - thus drop support for the `a < 0` case

However, some special cases are still in use, such as:
  - `a < 0` and `b = 0`, `b = 0.5`, `b = 1`, or `b = 2`
  - convert those special cases into simpler ops.
2025-02-13 08:01:47 -08:00
Vitaly Buka
a1345eb240
Revert "[libclang] Always Dup in createRef(StringRef)" (#127076)
Reverts llvm/llvm-project#125020


https://lab.llvm.org/buildbot/#/builders/24/builds/5252/steps/12/logs/stdio

```
==c-index-test==2512295==ERROR: AddressSanitizer: heap-use-after-free on address 0xe19338c27992 at pc 0xc66be4784830 bp 0xe0e33660df00 sp 0xe0e33660d6e8
READ of size 23 at 0xe19338c27992 thread T1
    #0 0xc66be478482c in printf_common(void*, char const*, std::__va_list) /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/compiler-rt/lib/asan/../sanitizer_common/sanitizer_common_interceptors_format.inc:563:9
    #1 0xc66be478643c in vprintf /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/compiler-rt/lib/asan/../sanitizer_common/sanitizer_common_interceptors.inc:1699:1
    #2 0xc66be478643c in printf /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/compiler-rt/lib/asan/../sanitizer_common/sanitizer_common_interceptors.inc:1757:1
    #3 0xc66be4839384 in FilteredPrintingVisitor /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/clang/tools/c-index-test/c-index-test.c:1359:5
    #4 0xe4e3454f12e8 in clang::cxcursor::CursorVisitor::Visit(CXCursor, bool) /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/clang/tools/libclang/CIndex.cpp:227:11
    #5 0xe4e3454f48a8 in bool clang::cxcursor::CursorVisitor::visitPreprocessedEntities<clang::PreprocessingRecord::iterator>(clang::PreprocessingRecord::iterator, clang::PreprocessingRecord::iterator, clang::PreprocessingRecord&, clang::FileID) CIndex.cpp
    
0xe19338c27992 is located 82 bytes inside of 105-byte region [0xe19338c27940,0xe19338c279a9)
freed by thread T1 here:
    #0 0xc66be480040c in free /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/compiler-rt/lib/asan/asan_malloc_linux.cpp:51:3
    #1 0xc66be4839728 in GetCursorSource c-index-test.c
    #2 0xc66be4839368 in FilteredPrintingVisitor /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/clang/tools/c-index-test/c-index-test.c:1360:12
    #3 0xe4e3454f12e8 in clang::cxcursor::CursorVisitor::Visit(CXCursor, bool) /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/clang/tools/libclang/CIndex.cpp:227:11
    #4 0xe4e3454f48a8 in bool clang::cxcursor::CursorVisitor::visitPreprocessedEntities<clang::PreprocessingRecord::iterator>(clang::PreprocessingRecord::iterator, clang::PreprocessingRecord::iterator, clang::PreprocessingRecord&, clang::FileID) CIndex.cpp


previously allocated by thread T1 here:
    #0 0xc66be4800680 in malloc /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/compiler-rt/lib/asan/asan_malloc_linux.cpp:67:3
    #1 0xe4e3456379b0 in safe_malloc /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/llvm/include/llvm/Support/MemAlloc.h:26:18
    #2 0xe4e3456379b0 in createDup /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/clang/tools/libclang/CXString.cpp:95:40
    #3 0xe4e3456379b0 in clang::cxstring::createRef(llvm::StringRef) /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/clang/tools/libclang/CXString.cpp:90:10
```
2025-02-13 07:42:40 -08:00
Alexey Bataev
2ad816648f
[SLP]Improved reduction cost/codegen
The SLP vectorizer is able to combine several reductions from the list of
(potentially) reduced values with different opcodes/value kinds.
Currently, these reductions are handled independently of each other.
Instead, the compiler can combine them into wide vector operations and
then perform only a single reduction.
E.g., if the SLP vectorizer currently emits something like:
```
%r1 = reduce.add(<4 x i32> %v1)
%r2 = reduce.add(<4 x i32> %v2)
%r = add i32 %r1, %r2
```

it can be emitted as:
```
%v = add <4 x i32> %v1, %v2
%r = reduce.add(<4 x i32> %v)
```

This can improve performance in some cases.
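The equivalence of the two forms can be checked with a small integer model. This is only a sketch with hypothetical helper names: it models `<4 x i32>` values as Python lists and wraps all additions to 32 bits, which is what makes reordering the additions safe.

```python
MASK32 = 0xFFFFFFFF


def reduce_add(v):
    """Model of reduce.add on a vector of i32: wrapping sum of all lanes."""
    return sum(v) & MASK32


def vec_add(a, b):
    """Model of a lane-wise i32 vector add with wraparound."""
    return [(x + y) & MASK32 for x, y in zip(a, b)]


def two_reductions(v1, v2):
    # Form 1: reduce each vector, then add the scalars.
    return (reduce_add(v1) + reduce_add(v2)) & MASK32


def one_reduction(v1, v2):
    # Form 2: add the vectors first, then perform a single reduction.
    return reduce_add(vec_add(v1, v2))
```

Because i32 addition is associative and commutative even with wraparound, both functions return the same value for any pair of equally sized vectors.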

AVX512, -O3+LTO
Metric: size..text

Program                                                                                           size..text
                                                                                                  results     results0    diff
                      test-suite :: SingleSource/Benchmarks/Shootout-C++/Shootout-C++-matrix.test     4553.00     4615.00  1.4%
                                 test-suite :: SingleSource/Benchmarks/Adobe-C++/loop_unroll.test   412708.00   416820.00  1.0%
        test-suite :: SingleSource/UnitTests/Vector/AVX512BWVL/Vector-AVX512BWVL-mask_set_bw.test    12901.00    12981.00  0.6%
                        test-suite :: MultiSource/Benchmarks/FreeBench/fourinarow/fourinarow.test    22717.00    22813.00  0.4%
                             test-suite :: MultiSource/Benchmarks/mediabench/gsm/toast/toast.test    39722.00    39850.00  0.3%
                      test-suite :: MultiSource/Benchmarks/MiBench/telecomm-gsm/telecomm-gsm.test    39725.00    39853.00  0.3%
test-suite :: SingleSource/Regression/C/gcc-c-torture/execute/GCC-C-execute-builtin-bitops-1.test    15918.00    15967.00  0.3%
                                       test-suite :: External/SPEC/CFP2006/433.milc/433.milc.test   155491.00   155587.00  0.1%
                                     test-suite :: MicroBenchmarks/ImageProcessing/Blur/blur.test   227894.00   227942.00  0.0%
                                    test-suite :: MultiSource/Benchmarks/7zip/7zip-benchmark.test  1062188.00  1062364.00  0.0%
                                test-suite :: External/SPEC/CINT2006/464.h264ref/464.h264ref.test   793672.00   793720.00  0.0%
                              test-suite :: External/SPEC/CINT2017rate/525.x264_r/525.x264_r.test   657371.00   657403.00  0.0%
                             test-suite :: External/SPEC/CINT2017speed/625.x264_s/625.x264_s.test   657371.00   657403.00  0.0%
                   test-suite :: External/SPEC/CINT2017speed/600.perlbench_s/600.perlbench_s.test  2074917.00  2074933.00  0.0%
                    test-suite :: External/SPEC/CINT2017rate/500.perlbench_r/500.perlbench_r.test  2074917.00  2074933.00  0.0%
                                     test-suite :: MultiSource/Applications/JM/lencod/lencod.test   855219.00   855203.00 -0.0%

Benchmarks/Shootout-C++ - same transformed reduction
Adobe-C++/loop_unroll - same transformed reductions, new vector code
AVX512BWVL/Vector-AVX512BWVL-mask_set_bw - same transformed reductions
FreeBench/fourinarow - same transformed reductions
MiBench/telecomm-gsm - same transformed reductions
execute/GCC-C-execute-builtin-bitops-1 - same transformed reductions
CFP2006/433.milc - better vector code, several x i64 reductions + trunc
to i32 get truncated to x i32 reductions
ImageProcessing/Blur - same transformed reductions
Benchmarks/7zip - same transformed reductions, extra 4 x vectorization
CINT2006/464.h264ref - same transformed reductions
CINT2017rate/525.x264_r
CINT2017speed/625.x264_s - same transformed reductions
CINT2017speed/600.perlbench_s
CINT2017rate/500.perlbench_r - transformed same reduction
JM/lencod - extra 4 x vectorization

RISC-V, SiFive-p670, -O3+LTO

Metric: size..text

Program                                                                                           size..text
                                                                                                  results    results0   diff
test-suite :: SingleSource/Regression/C/gcc-c-torture/execute/GCC-C-execute-builtin-bitops-1.test    8990.00    9514.00   5.8%
                                test-suite :: External/SPEC/CINT2006/464.h264ref/464.h264ref.test  588504.00  588488.00  -0.0%
                    test-suite :: MultiSource/Benchmarks/MiBench/consumer-lame/consumer-lame.test  147464.00  147440.00  -0.0%
              test-suite :: MultiSource/Benchmarks/MiBench/automotive-susan/automotive-susan.test   21496.00   21492.00  -0.0%
                                     test-suite :: MicroBenchmarks/ImageProcessing/Blur/blur.test  165420.00  165372.00  -0.0%
                                    test-suite :: MultiSource/Benchmarks/7zip/7zip-benchmark.test  843928.00  843648.00  -0.0%
                                    test-suite :: External/SPEC/CINT2006/458.sjeng/458.sjeng.test  100712.00  100672.00  -0.0%
                      test-suite :: MultiSource/Benchmarks/MiBench/telecomm-gsm/telecomm-gsm.test   24384.00   24336.00  -0.2%
                             test-suite :: MultiSource/Benchmarks/mediabench/gsm/toast/toast.test   24380.00   24332.00  -0.2%
             test-suite :: SingleSource/UnitTests/Vectorizer/VPlanNativePath/outer-loop-vect.test   10348.00   10316.00  -0.3%
                                 test-suite :: SingleSource/Benchmarks/Adobe-C++/loop_unroll.test  221304.00  220480.00  -0.4%
                      test-suite :: SingleSource/Benchmarks/Shootout-C++/Shootout-C++-matrix.test    3750.00    3736.00  -0.4%
                            test-suite :: SingleSource/Regression/C/Regression-C-DuffsDevice.test     678.00     370.00 -45.4%

execute/GCC-C-execute-builtin-bitops-1 - extra 4 x reductions, same
transformed reductions
CINT2006/464.h264ref - extra 4 x reductions, same transformed reductions
MiBench/consumer-lame - 2 4 x i1 merged to 8 x i1 reductions (bitcast + ctpop)
MiBench/automotive-susan - same transformed reductions
ImageProcessing/Blur - same transformed reductions
Benchmarks/7zip - same transformed reductions
CINT2006/458.sjeng - 2 4 x i1 merged to 8 x i1 reductions (bitcast + ctpop)
MiBench/telecomm-gsm - same transformed reductions
Benchmarks/mediabench - same transformed reductions
Vectorizer/VPlanNativePath - same transformed reductions
Adobe-C++/loop_unroll - extra 4 x reductions, same transformed reductions
Benchmarks/Shootout-C++ - extra 4 x reductions, same transformed reductions
Regression/C/Regression-C-DuffsDevice - same transformed reductions

Reviewers: hiraditya, topperc, preames

Pull Request: https://github.com/llvm/llvm-project/pull/118293
2025-02-13 10:36:28 -05:00
Robert Imschweiler
41e49fadd4
[AMDGPU] Fix llvm.amdgcn.workitem.id-unsupported-calling-convention.ll (#127041)
Follow-up fix for #126058. (@arsenm)
2025-02-13 22:23:47 +07:00
Robert Imschweiler
0da8d0f9b7
[AMDGPU] Change handling of unsupported non-compute shaders with HSA (#126798)
Previous handling in `SITargetLowering::LowerFormalArguments` only
reported a diagnostic message and continued execution by returning a
non-usable `SDValue`. This results in llvm crashing later with an
unrelated error. This commit changes the detection of an unsupported
non-compute shader to be a fatal error right away.

As an example situation, take the usage of an `amdgpu_ps` function and
the `amdgcn-unknown-amdhsa` target triple.
```
define amdgpu_ps void @foo(ptr %p, i32 %i) {
        store i32 %i, ptr %p
        ret void
}
```
Compiling this code (with `llc -mtriple=amdgcn-unknown-amdhsa
-mcpu=gfx942`, for example) fails with:
```
error: <unknown>:0:0: in function foo void (ptr, i32): unsupported non-compute shaders with HSA

llc:
[...]/git/trunk21.0/llvm-project/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp:11790:
void llvm::SelectionDAGISel::LowerArguments(const llvm::Function&):
Assertion `InVals.size() == Ins.size() && "LowerFormalArguments didn't emit the correct number of values!"' failed.
[...]
```
2025-02-13 22:23:08 +07:00