llvm-project

Author	SHA1	Message	Date
Piotr Fusik	0078e8f450	[RISCV][NFC] Fix a warning (#127090 )	2025-02-13 21:42:36 +01:00
Aiden Grossman	161f64a4c1	[Github][CI] Hashpin actions dependencies (#127011 ) This patch pins several actions dependencies in the premerge workflow and the Windows/Linux container build workflows to help improve security in the unlikely event that someone tries to pull off a supply chain attack by modifying release assets for these actions.	2025-02-13 12:16:42 -08:00
Ellis Hoag	83632c039d	[lld][BP] Order .Tgm symbols for startup (#126328 ) The Global Function Merger (https://discourse.llvm.org/t/rfc-global-function-merging/82608) pass optimistically creates merged instances of functions and suffixes their names with `.Tgm`. Then in the linker, ICF will (hopefully) fold these `.Tgm` functions. For example, a function `foo` might become a thunk `foo` that calls a merged function `foo.Tgm`. Since IRPGO runs before the global merger, we will only have a profile for `foo`. We want to correlate this profile to both `foo` and `foo.Tgm` so they can both be ordered to improve startup time. I built a large binary and found that it increased the number of functions ordered for startup, as expected. ``` Functions for startup: 12049 -> 12697 Functions for compression: 34733 -> 34707 ``` The reason why we don't see a larger improvement is because there are some cases where the code was accidentally working: `getRootSymbol("foo.llvm.5555.Tgm")` already returns `foo`.	2025-02-13 12:10:58 -08:00
LLVM GN Syncbot	b5aa1c4783	[gn build] Port 63c1be724924	2025-02-13 19:42:26 +00:00
Florian Hahn	65640c1d4c	[AssumeBundles] Dereferenceable used in bundle only applies at assume. (#126117 ) Update LangRef and code using `Dereferenceable` in assume bundles to only use the information if it is safe at the point of use. `Dereferenceable` in an assume bundle is only guaranteed at the point of the assumption, but may not be guaranteed at later points, because the pointer may have been freed. Update code using `Dereferenceable` to only use it if the pointer cannot be freed. This can further be refined to check if the pointer could be freed between assume and use. This follows up on https://github.com/llvm/llvm-project/pull/123196. With that change, it should be safe to expose dereferenceable assumptions more widely as in https://github.com/llvm/llvm-project/pull/121789 PR: https://github.com/llvm/llvm-project/pull/126117	2025-02-13 20:41:23 +01:00
LU-JOHN	5decab178f	AMDGPU: Reduce shl64 to shl32 if shift range is [63-32] (#125574 ) Reduce: DST = shl i64 X, Y where Y is in the range [63-32] to: DST = [0, shl i32 X, (Y & 31)] Alive2 analysis: https://alive2.llvm.org/ce/z/w_u5je --------- Signed-off-by: John Lu <John.Lu@amd.com>	2025-02-13 13:40:25 -06:00
Petr Hosek	2bdeeaa185	[libc] Use __builtin_elementwise_fma instead of __builtin_fma (#126288 ) __builtin_elementwise_fma doesn't consider errno and is thus more suitable for libc fma implementation.	2025-02-13 11:40:04 -08:00
Paul Kirth	63c1be7249	[llvm][fatlto] Add FatLTOCleanup pass (#125911 ) When using FatLTO, it is common to want to enable certain types of whole program optimizations (WPD) or security transforms (CFI), so that they can be made available when performing LTO. However, these transforms should not be used when compiling the non-LTO object code. Since the frontend must emit different IR, we cannot simply clone the module and optimize the LTO section and non-LTO section differently to work around this. Instead, we need to remove any problematic instruction sequences. This patch adds a new pass whose responsibility is to clean up the IR in the FatLTO pipeline after creating the bitcode section, which is after running the pre-link pipeline but before running module optimization. This allows us to safely drop any conflicting instructions or IR constructs that are inappropriate for non-LTO compilation.	2025-02-13 11:39:02 -08:00
Alex MacLean	ecdfa36eca	Reland "[NVPTX] Cleanup/Refactoring in NVPTX AsmPrinter and RegisterInfo (NFC)" (#127089 )	2025-02-13 11:35:31 -08:00
Jason Molenda	b666ac3b63	[lldb] Change lldb's breakpoint handling behavior, reland (#126988 ) lldb today has two rules: When a thread stops at a BreakpointSite, we set the thread's StopReason to be "breakpoint hit" (regardless of whether we've actually hit the breakpoint, or we've merely stopped at the breakpoint instruction/point and haven't tripped it yet). And second, when resuming a process, any thread sitting at a BreakpointSite is silently stepped over the BreakpointSite -- because we've already flagged the breakpoint hit when we stopped there originally. In this patch, I change lldb to only set a thread's stop reason to breakpoint-hit when we've actually executed the instruction/triggered the breakpoint. When we resume, we only silently step past a BreakpointSite that we've registered as hit. We preserve this state across inferior function calls that the user may do while stopped, etc. Also, when a user adds a new breakpoint at $pc while stopped, or changes $pc to be the address of a BreakpointSite, we will silently step past that breakpoint when the process resumes. This is purely a UX call, I don't think there's any person who wants to set a breakpoint at $pc and then hit it immediately on resuming. One non-intuitive UX from this change, but it is necessary: If you're stopped at a BreakpointSite that has not yet executed and you `stepi`, you will hit the breakpoint and the pc will not yet advance. This thread has not completed its stepi, and the ThreadPlanStepInstruction is still on the stack. If you then `continue` the thread, lldb will now stop and say, "instruction step completed", one instruction past the BreakpointSite. You can continue a second time to resume execution. The bugs driving this change are all from lldb dropping the real stop reason for a thread and setting it to breakpoint-hit when that was not the case. Jim hit one where we have an aarch64 watchpoint that triggers one instruction before a BreakpointSite.
On this arch we are notified of the watchpoint hit after the instruction has been unrolled -- we disable the watchpoint, instruction step, re-enable the watchpoint and collect the new value. But now we're on a BreakpointSite so the watchpoint-hit stop reason is lost. Another was reported by ZequanWu in https://discourse.llvm.org/t/lldb-unable-to-break-at-start/78282 we attach to/launch a process with the pc at a BreakpointSite and misbehave. Caroline Tice mentioned it is also a problem they've had with putting a breakpoint on _dl_debug_state. The change to each Process plugin that does execution control is that 1. If we've stopped at a BreakpointSite that has not been executed yet, we will call Thread::SetThreadStoppedAtUnexecutedBP(pc) to record that. When the thread resumes, if the pc is still at the same site, we will continue, hit the breakpoint, and stop again. 2. When we've actually hit a breakpoint (enabled for this thread or not), the Process plugin should call Thread::SetThreadHitBreakpointSite(). When we go to resume the thread, we will push a step-over-breakpoint ThreadPlan before resuming. The biggest set of changes is to StopInfoMachException where we translate a Mach Exception into a stop reason. The Mach exception codes differ in a few places depending on the target (unambiguously), and I didn't want to duplicate the new code for each target so I've tested what mach exceptions we get for each action on each target, and reorganized StopInfoMachException::CreateStopReasonWithMachException to document these possible values, and handle them without specializing based on the target arch. I first landed this patch in July 2024 via https://github.com/llvm/llvm-project/pull/96260 but the CI bots and wider testing found a number of test case failures that needed to be updated, I reverted it. I've fixed all of those issues in separate PRs and this change should run cleanly on all the CI bots now. rdar://123942164	2025-02-13 11:30:10 -08:00
Philip Reames	72f4e656b8	[RISCV] Revise interface of isLegalBitRotate [nfc] Remove a dead parameter (DAG), and replace the ShuffleVectorSDNode param with the two things we need from the shuffle (mask and VT). There's further room to improve this code, but this gets me what I need for an upcoming patch.	2025-02-13 11:17:54 -08:00
alx32	4ac79a8c98	[lld-macho] Use Symbols as branch target for safe_thunks ICF (#126835 ) ## Problem The `safe_thunks` ICF optimization in `lld-macho` was creating thunks that pointed to `InputSection`s instead of `Symbol`s. While, generally, branch relocations can point to symbols or input sections, in this case we need them to point to symbols, because the branch extension algorithm that runs afterwards expects branches to always point to `Symbol`s. ## Solution This patch changes the ICF implementation so that safe thunks point to `Symbol`s rather than `InputSection`s. ## Testing The existing `arm64-thunks.s` test is modified to include `--icf=safe_thunks` to explicitly verify the interaction between ICF and branch range extension thunks. Two functions were added that will be merged together via a thunk. Before this patch, this test would trigger an assertion failure; now this scenario is handled correctly.	2025-02-13 11:07:12 -08:00
John Harrison	c2e96778e0	[lldb-dap] Ensure we do not print the close sentinel when closing stdout. (#126833 ) If you have an lldb-dap log file you'll almost always see a final message like: ``` <-- Content-Length: 94 { "body": { "category": "stdout", "output": "\u0000\u0000" }, "event": "output", "seq": 0, "type": "event" } <-- Content-Length: 94 { "body": { "category": "stderr", "output": "\u0000\u0000" }, "event": "output", "seq": 0, "type": "event" } ``` The OutputRedirect is always writing the `"\0"` byte as a final stdout message during shutdown. Instead, I adjusted this to detect the sentinel value and break out of the read loop as if we detected EOF. --------- Co-authored-by: Pavel Labath <pavel@labath.sk>	2025-02-13 10:35:50 -08:00
Petr Hosek	ea77dd8715	[libc] Include locale support in baremetal configuration (#127103 ) Having locale is a requirement for C++ streams.	2025-02-13 10:26:33 -08:00
Jacek Caban	8252e0ef82	[LLD][COFF] Emit ARM64X relocations for CHPE ExtraRFETable entries (#126713 ) In the native view, ExtraRFETable references the x86 exception table. The EC view references the ARM exception table, as it did before this change.	2025-02-13 19:22:57 +01:00
Arthur Eubanks	7d9a12cec2	[gn build] Manually port 89d636ba	2025-02-13 18:18:30 +00:00
Michael Buch	0feb00f17c	[lldb][test] TestCPPEnumPromotion: make sure enums are preserved in dSYM On macOS CI this was failing with: ``` FAIL: test_dsym (TestCPPEnumPromotion.TestCPPEnumPromotion) ---------------------------------------------------------------------- Traceback (most recent call last): File "/Users/ec2-user/jenkins/workspace/llvm.org/as-lldb-cmake/llvm-project/lldb/packages/Python/lldbsuite/test/lldbtest.py", line 1784, in test_method return attrvalue(self) File "/Users/ec2-user/jenkins/workspace/llvm.org/as-lldb-cmake/llvm-project/lldb/test/API/lang/cpp/enum_promotion/TestCPPEnumPromotion.py", line 28, in test self.expect_expr("+EnumUChar::UChar", result_type=UChar_promoted.type.name) File "/Users/ec2-user/jenkins/workspace/llvm.org/as-lldb-cmake/llvm-project/lldb/packages/Python/lldbsuite/test/lldbtest.py", line 2540, in expect_expr value_check.check_value(self, eval_result, str(eval_result)) File "/Users/ec2-user/jenkins/workspace/llvm.org/as-lldb-cmake/llvm-project/lldb/packages/Python/lldbsuite/test/lldbtest.py", line 299, in check_value test_base.assertSuccess(val.GetError()) File "/Users/ec2-user/jenkins/workspace/llvm.org/as-lldb-cmake/llvm-project/lldb/packages/Python/lldbsuite/test/lldbtest.py", line 2575, in assertSuccess self.fail(self._formatMessage(msg, "'{}' is not success".format(error))) AssertionError: 'error: <user expression 0>:1:2: use of undeclared identifier 'EnumUChar' 1 \| +EnumUChar::UChar \| ^ ' is not success ``` But only for the `dSYM` variant of the test. Looking at the dSYM, none of the enums are actually preserved in the debug-info. We have to actually use the enum types in source to get dsymutil to preserve them. This patch does just that.	2025-02-13 18:13:09 +00:00
Jacek Caban	c52fbabc93	[LLD][COFF] Set __buildid symbol in both symbol tables on ARM64X (#126777 )	2025-02-13 19:10:46 +01:00
Philip Reames	059722da5e	Revert "[RISCV] Default to MicroOpBufferSize = 1 for scheduling purposes (#126608 )" and follow up commit. This reverts commit 9cc8442a2b438962883bbbfd8ff62ad4b1a2b95d. This reverts commit 859c871184bdfdebb47b5c7ec5e59348e0534e0b. A performance regression was reported on the original review. There appears to have been an unexpected interaction here. Reverting during investigation.	2025-02-13 09:57:33 -08:00
klensy	4ee173a168	add me to mailmap (#126226 ) Should add ability for buildbot to find proper mail. `f1a84bbe55/master/buildbot/changes/gitpoller.py (L418)` At least buildbot parses user names and mails with respect to mailmap. Co-authored-by: klensy <nightouser@gmail.com>	2025-02-13 17:49:48 +00:00
Alexey Bataev	d18b1ebef5	[SLP]Check if vector user exist before accessing it Need to check if the vector user exists before accessing it to avoid a compiler crash. Fixes #126581	2025-02-13 09:44:34 -08:00
Sylvestre Ledru	c81139f417	libc/cmake: don't fail if LLVM_VERSION_SUFFIX isn't defined (#126359 ) Closes: #126358 cc @samvangysegem --------- Co-authored-by: Joseph Huber <huberjn@outlook.com>	2025-02-13 18:42:28 +01:00
Joel E. Denny	eb8ffd617a	[flang] AliasAnalysis: Handle fir.load on fir.alloca (#117785 ) For example, determine that the address in p below cannot alias the address of v: ``` subroutine test() real, pointer :: p real, target :: t real :: v p => t v = p end subroutine test ```	2025-02-13 12:40:03 -05:00
Slava Zakharin	660cdace55	[flang] Fixed write past allocated descriptor in PointerAssociateRemapping. (#127000 ) The pointer descriptor might be smaller than the target descriptor, so `operator=` would write beyond the pointer descriptor.	2025-02-13 09:39:36 -08:00
Martin Erhart	9a63a2c4ba	[mlir][index] Add CAPI (#127039 )	2025-02-13 17:37:49 +00:00
Stanislav Mekhanoshin	07405ca036	[AMDGPU] clang-format SIProgramInfo.h. NFC. (#127033 )	2025-02-13 09:35:29 -08:00
Simon Pilgrim	4a97ce5f75	[X86] X86FixupVectorConstantsPass - pull out getPrimitiveSizeInBits call. NFC.	2025-02-13 17:25:08 +00:00
Kazu Hirata	4bda95304f	[llvm-profgen] Avoid repeated hash lookups (NFC) (#127028 )	2025-02-13 09:12:33 -08:00
Kazu Hirata	9a59145d8e	[memprof] Avoid repeated map lookups (NFC) (#127027 )	2025-02-13 09:12:04 -08:00
Kazu Hirata	fec04f286e	[FileCheck] Avoid repeated hash lookups (NFC) (#127026 )	2025-02-13 09:11:43 -08:00
Kazu Hirata	e7bf6a4e04	[CodeGen] Avoid repeated map lookups (NFC) (#127025 )	2025-02-13 09:11:17 -08:00
Kazu Hirata	44b61e056d	[Analysis] Avoid repeated hash lookups (NFC) (#127024 )	2025-02-13 09:10:57 -08:00
Kazu Hirata	d096f45322	[clang-scan-deps] Avoid repeated map lookups (NFC) (#127023 )	2025-02-13 09:10:38 -08:00
Ilia Kuklin	f30c891464	[lldb] Analyze enum promotion type during parsing (#115005 ) The information about an enum's best promotion type is discarded after compilation and is not present in debug info. This patch repeats the same analysis of each enum value as in the front-end to determine the best promotion type during DWARF info parsing. Fixes #86989	2025-02-13 22:08:31 +05:00
Craig Topper	e750c7e636	[RISCV] Set Feature32Bit/Feature64Bit based on triple for -mcpu=help. (#127031 ) llvm-mc keeps going after printing help text and creates an assembler. If we don't set one of the XLen sized feature bits we trip a fatal error in RISCVFeatures::validate. llvm-mc should probably be fixed, but I don't know if it's the only tool with this issue.	2025-02-13 09:07:23 -08:00
Ellis Hoag	79fff6aa32	[lld][BP] Avoid ordering ICF'ed sections (#126327 ) ICF runs before BPSectionOrderer. When a section is ICF'ed, it seems that the original sections are marked as not live, but are still kept around. Prior to this patch, those ICF'ed sections would be passed to BP and ordered before being skipped when writing the output. Now, these sections are no longer passed to BP, saving runtime and possibly improving BP's output. In a large binary, I found that the number of sections ordered using BP decreased, while the number of duplicate sections drastically decreased as expected. ``` Functions for startup: 50755 -> 50520 Functions for compression: 165734 -> 105328 Duplicate functions: 1827231 -> 55230 ```	2025-02-13 08:57:44 -08:00
Abhilash Majumder	55f3df875d	[NVPTX] Fix and refine prefetch.* intrinsics (#126899 ) This is a follow-up PR to #125887 that fixes the intrinsic failures. --------- Co-authored-by: abmajumder <abmajumder@nvidia.com>	2025-02-13 17:54:01 +01:00
Piotr Zegar	a663e78a6e	[clang-tidy] Add recursion protection in ExceptionSpecAnalyzer (#66810 ) Normally endless recursion should not happen in ExceptionSpecAnalyzer, but if the AST is malformed (e.g. because of a missing include), it could cause a crash. I ran into this issue when, due to a missing include, constructor arguments were parsed as FieldDecl. Since checking for recursion costs nothing, the check now does so just in case. Fixes #111436	2025-02-13 17:51:28 +01:00
Georgiy Samoylov	1138a4964a	[lldb] Fix build problem in llgs tests for RISC-V (#127091 ) During testing of LLDB on RISC-V target, tests from the llgs category were built with an error: `Error when building test subject.` ``` llvm-project/lldb/test/API/tools/lldb-server/main.cpp:151:40: error: missing ')' after '__builtin_debugtrap' 151 \| #elif __has_builtin(__builtin_debugtrap()) \| ~~~~~~~~~~~~~~~~~~~^ llvm-project/lldb/test/API/tools/lldb-server/main.cpp:151:20: note: to match this '(' 151 \| #elif __has_builtin(__builtin_debugtrap()) \| ^ ``` This patch fixes this error.	2025-02-13 16:48:03 +00:00
Vyacheslav Levytskyy	2f8de7b466	[SPIR-V] Type inference must realize that a <1 x Type> vector type is not a legal vector type in LLT (#124560 ) In this PR we account for possible <1 x LLVM Type> input to ensure that we produce legal vector types during type inference. We modify an LLVM type to conform with future transformations in IRTranslator, if it's a <1 x Type> vector type, replacing it by the element type, because <1 x Type> vector type is not a legal vector type in LLT and IRTranslator will represent it as the scalar eventually.	2025-02-13 17:46:42 +01:00
Jay Foad	ba45592377	[AMDGPU] Try to fix -mattr=dumpcode on big-endian hosts (#127073 ) Blind fix for #116982 failing on big-endian buildbots.	2025-02-13 16:44:22 +00:00
Kazu Hirata	88015d12ca	[mlir] Fix a warning This patch fixes: mlir/lib/Conversion/ComplexCommon/DivisionConverter.cpp:61:2: error: extra ';' outside of a function is incompatible with C++98 [-Werror,-Wc++98-compat-extra-semi]	2025-02-13 08:36:07 -08:00
Craig Topper	8da8ff8768	[flang][RISCV] Add target-abi ModuleFlag. (#126188 ) This is needed to generate proper ABI flags in the ELF header for LTO builds. If these flags aren't set correctly, we can't link with objects that were built with the correct flags. For non-LTO builds the mcpu/mattr in the TargetMachine will cause the backend to infer an ABI. For LTO builds the mcpu/mattr aren't set. I've only added lp64, lp64f, and lp64d ABIs. ilp32* requires riscv32 which is not yet supported in flang. lp64e requires a different DataLayout string and would need additional plumbing. Fixes #115679	2025-02-13 08:08:09 -08:00
Mikhail Goncharov	21811818d6	[bazel] port aecb764cc2e026ecb5c418dd56f2722c6f263e8b	2025-02-13 17:05:33 +01:00
David Green	b2165f214e	[CostModel] Account for power-2 urem in funnel shift costs (#127037 ) As can be seen in https://godbolt.org/z/qvMqY79cK, a urem by a power-2 constant will be code-generated as an And of a mask. The cost model for funnel shifts tries to account for that by passing OP_PowerOf2 as the operand info for the second operand. As far as I can tell returning a lower cost for urem with a OP_PowerOf2 is only implemented on X86 though. This patch short-cuts that by calling getArithmeticInstrCost(And, ..) directly when we know the typesize will be a power-of-2. This is an alternative to the patch in #126912 which is a more general solution for power-2 udiv/urem costs, this more narrowly just fixes funnel shifts.	2025-02-13 16:05:00 +00:00
Hyunsung Lee	de09986596	[mlir][math] `powf(a, b)` drop support when a < 0 (#126338 ) Related: #124402 - change inefficient implementation of `powf(a, b)` to handle `a < 0` case - thus drop `a < 0` case support However, some special cases are being used such as: - `a < 0` and `b = 0, b = 0.5, b = 1 or b = 2` - convert those special cases into simpler ops.	2025-02-13 08:01:47 -08:00
Vitaly Buka	a1345eb240	Revert "[libclang] Always Dup in createRef(StringRef)" (#127076 ) Reverts llvm/llvm-project#125020 https://lab.llvm.org/buildbot/#/builders/24/builds/5252/steps/12/logs/stdio ``` ==c-index-test==2512295==ERROR: AddressSanitizer: heap-use-after-free on address 0xe19338c27992 at pc 0xc66be4784830 bp 0xe0e33660df00 sp 0xe0e33660d6e8 READ of size 23 at 0xe19338c27992 thread T1 #0 0xc66be478482c in printf_common(void, char const, std::__va_list) /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/compiler-rt/lib/asan/../sanitizer_common/sanitizer_common_interceptors_format.inc:563:9 #1 0xc66be478643c in vprintf /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/compiler-rt/lib/asan/../sanitizer_common/sanitizer_common_interceptors.inc:1699:1 #2 0xc66be478643c in printf /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/compiler-rt/lib/asan/../sanitizer_common/sanitizer_common_interceptors.inc:1757:1 #3 0xc66be4839384 in FilteredPrintingVisitor /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/clang/tools/c-index-test/c-index-test.c:1359:5 #4 0xe4e3454f12e8 in clang::cxcursor::CursorVisitor::Visit(CXCursor, bool) /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/clang/tools/libclang/CIndex.cpp:227:11 #5 0xe4e3454f48a8 in bool clang::cxcursor::CursorVisitor::visitPreprocessedEntities<clang::PreprocessingRecord::iterator>(clang::PreprocessingRecord::iterator, clang::PreprocessingRecord::iterator, clang::PreprocessingRecord&, clang::FileID) CIndex.cpp 0xe19338c27992 is located 82 bytes inside of 105-byte region [0xe19338c27940,0xe19338c279a9) freed by thread T1 here: #0 0xc66be480040c in free /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/compiler-rt/lib/asan/asan_malloc_linux.cpp:51:3 #1 0xc66be4839728 in GetCursorSource c-index-test.c #2 0xc66be4839368 in FilteredPrintingVisitor 
/home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/clang/tools/c-index-test/c-index-test.c:1360:12 #3 0xe4e3454f12e8 in clang::cxcursor::CursorVisitor::Visit(CXCursor, bool) /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/clang/tools/libclang/CIndex.cpp:227:11 #4 0xe4e3454f48a8 in bool clang::cxcursor::CursorVisitor::visitPreprocessedEntities<clang::PreprocessingRecord::iterator>(clang::PreprocessingRecord::iterator, clang::PreprocessingRecord::iterator, clang::PreprocessingRecord&, clang::FileID) CIndex.cpp previously allocated by thread T1 here: #0 0xc66be4800680 in malloc /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/compiler-rt/lib/asan/asan_malloc_linux.cpp:67:3 #1 0xe4e3456379b0 in safe_malloc /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/llvm/include/llvm/Support/MemAlloc.h:26:18 #2 0xe4e3456379b0 in createDup /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/clang/tools/libclang/CXString.cpp:95:40 #3 0xe4e3456379b0 in clang::cxstring::createRef(llvm::StringRef) /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/clang/tools/libclang/CXString.cpp:90:10 ```	2025-02-13 07:42:40 -08:00
Alexey Bataev	2ad816648f	[SLP]Improved reduction cost/codegen The SLP vectorizer is able to combine several reductions from the list of (potentially) reduced values with different opcodes/value kinds. Currently, these reductions are handled independently of each other. Instead, the compiler can combine them into wide vector operations and then perform only a single reduction. E.g., if the SLP vectorizer currently emits something like: ``` %r1 = reduce.add(<4 x i32> %v1) %r2 = reduce.add(<4 x i32> %v2) %r = add i32 %r1, %r2 ``` it can be emitted as: ``` %v = add <4 x i32> %v1, %v2 %r = reduce.add(<4 x i32> %v) ``` This can improve performance in some cases. AVX512, -O3+LTO Metric: size..text Program size..text results results0 diff test-suite :: SingleSource/Benchmarks/Shootout-C++/Shootout-C++-matrix.test 4553.00 4615.00 1.4% test-suite :: SingleSource/Benchmarks/Adobe-C++/loop_unroll.test 412708.00 416820.00 1.0% test-suite :: SingleSource/UnitTests/Vector/AVX512BWVL/Vector-AVX512BWVL-mask_set_bw.test 12901.00 12981.00 0.6% test-suite :: MultiSource/Benchmarks/FreeBench/fourinarow/fourinarow.test 22717.00 22813.00 0.4% test-suite :: MultiSource/Benchmarks/mediabench/gsm/toast/toast.test 39722.00 39850.00 0.3% test-suite :: MultiSource/Benchmarks/MiBench/telecomm-gsm/telecomm-gsm.test 39725.00 39853.00 0.3% test-suite :: SingleSource/Regression/C/gcc-c-torture/execute/GCC-C-execute-builtin-bitops-1.test 15918.00 15967.00 0.3% test-suite :: External/SPEC/CFP2006/433.milc/433.milc.test 155491.00 155587.00 0.1% test-suite :: MicroBenchmarks/ImageProcessing/Blur/blur.test 227894.00 227942.00 0.0% test-suite :: MultiSource/Benchmarks/7zip/7zip-benchmark.test 1062188.00 1062364.00 0.0% test-suite :: External/SPEC/CINT2006/464.h264ref/464.h264ref.test 793672.00 793720.00 0.0% test-suite :: External/SPEC/CINT2017rate/525.x264_r/525.x264_r.test 657371.00 657403.00 0.0% test-suite :: External/SPEC/CINT2017speed/625.x264_s/625.x264_s.test 657371.00 657403.00 0.0% 
test-suite :: External/SPEC/CINT2017speed/600.perlbench_s/600.perlbench_s.test 2074917.00 2074933.00 0.0% test-suite :: External/SPEC/CINT2017rate/500.perlbench_r/500.perlbench_r.test 2074917.00 2074933.00 0.0% test-suite :: MultiSource/Applications/JM/lencod/lencod.test 855219.00 855203.00 -0.0% Benchmarks/Shootout-C++ - same transformed reduction Adobe-C++/loop_unroll - same transformed reductions, new vector code AVX512BWVL/Vector-AVX512BWVL-mask_set_bw - same transformed reductions FreeBench/fourinarow - same transformed reductions MiBench/telecomm-gsm - same transformed reductions execute/GCC-C-execute-builtin-bitops-1 - same transformed reductions CFP2006/433.milc - better vector code, several x i64 reductions + trunc to i32 gets trunced to x i32 reductions ImageProcessing/Blur - same transformed reductions Benchmarks/7zip - same transformed reductions, extra 4 x vectorization CINT2006/464.h264ref - same transformed reductions CINT2017rate/525.x264_r CINT2017speed/625.x264_s - same transformed reductions CINT2017speed/600.perlbench_s CINT2017rate/500.perlbench_r - transformed same reduction JM/lencod - extra 4 x vectorization RISC-V, SiFive-p670, -O3+LTO Metric: size..text Program size..text results results0 diff test-suite :: SingleSource/Regression/C/gcc-c-torture/execute/GCC-C-execute-builtin-bitops-1.test 8990.00 9514.00 5.8% test-suite :: External/SPEC/CINT2006/464.h264ref/464.h264ref.test 588504.00 588488.00 -0.0% test-suite :: MultiSource/Benchmarks/MiBench/consumer-lame/consumer-lame.test 147464.00 147440.00 -0.0% test-suite :: MultiSource/Benchmarks/MiBench/automotive-susan/automotive-susan.test 21496.00 21492.00 -0.0% test-suite :: MicroBenchmarks/ImageProcessing/Blur/blur.test 165420.00 165372.00 -0.0% test-suite :: MultiSource/Benchmarks/7zip/7zip-benchmark.test 843928.00 843648.00 -0.0% test-suite :: External/SPEC/CINT2006/458.sjeng/458.sjeng.test 100712.00 100672.00 -0.0% test-suite :: 
MultiSource/Benchmarks/MiBench/telecomm-gsm/telecomm-gsm.test 24384.00 24336.00 -0.2% test-suite :: MultiSource/Benchmarks/mediabench/gsm/toast/toast.test 24380.00 24332.00 -0.2% test-suite :: SingleSource/UnitTests/Vectorizer/VPlanNativePath/outer-loop-vect.test 10348.00 10316.00 -0.3% test-suite :: SingleSource/Benchmarks/Adobe-C++/loop_unroll.test 221304.00 220480.00 -0.4% test-suite :: SingleSource/Benchmarks/Shootout-C++/Shootout-C++-matrix.test 3750.00 3736.00 -0.4% test-suite :: SingleSource/Regression/C/Regression-C-DuffsDevice.test 678.00 370.00 -45.4% execute/GCC-C-execute-builtin-bitops-1 - extra 4 x reductions, same transformed reductions CINT2006/464.h264ref - extra 4 x reductions, same transformed reductions MiBench/consumer-lame - 2 4 x i1 merged to 8 x i1 reductions (bitcast + ctpop) MiBench/automotive-susan - same transformed reductions ImageProcessing/Blur - same transformed reductions Benchmarks/7zip - same transformed reductions CINT2006/458.sjeng - 2 4 x i1 merged to 8 x i1 reductions (bitcast + ctpop) MiBench/telecomm-gsm - same transformed reductions Benchmarks/mediabench - same transformed reductions Vectorizer/VPlanNativePath - same transformed reductions Adobe-C++/loop_unroll - extra 4 x reductions, same transformed reductions Benchmarks/Shootout-C++ - extra 4 x reductions, same transformed reductions Regression/C/Regression-C-DuffsDevice - same transformed reductions Reviewers: hiraditya, topperc, preames Pull Request: https://github.com/llvm/llvm-project/pull/118293	2025-02-13 10:36:28 -05:00
Robert Imschweiler	41e49fadd4	[AMDGPU] Fix llvm.amdgcn.workitem.id-unsupported-calling-convention.ll (#127041 ) Follow-up fix for #126058. (@arsenm)	2025-02-13 22:23:47 +07:00
Robert Imschweiler	0da8d0f9b7	[AMDGPU] Change handling of unsupported non-compute shaders with HSA (#126798 ) Previous handling in `SITargetLowering::LowerFormalArguments` only reported a diagnostic message and continued execution by returning a non-usable `SDValue`. This results in llvm crashing later with an unrelated error. This commit changes the detection of an unsupported non-compute shader to be a fatal error right away. As an example situation, take the usage of an `amdgpu_ps` function and the `amdgcn-unknown-amdhsa` target triple. ``` define amdgpu_ps void @foo(ptr %p, i32 %i) { store i32 %i, ptr %p ret void } ``` Compiling this code (with `llc -mtriple=amdgcn-unknown-amdhsa -mcpu=gfx942`, for example) fails with: ``` error: <unknown>:0:0: in function foo void (ptr, i32): unsupported non-compute shaders with HSA llc: [...]/git/trunk21.0/llvm-project/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp:11790: void llvm::SelectionDAGISel::LowerArguments(const llvm::Function&): Assertion `InVals.size() == Ins.size() && "LowerFormalArguments didn't emit the correct number of values!"' failed. [...] ```	2025-02-13 22:23:08 +07:00
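
The shift narrowing described in commit 5decab178f above can be checked with ordinary integer arithmetic. The following is a minimal sketch in Python, not the AMDGPU implementation: the function names are hypothetical, and it assumes that the first element of the `[0, shl i32 ...]` pair is the low 32-bit half of the result and that the 32-bit shift amount is the low five bits of Y.

```python
MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1


def shl64(x, y):
    """Reference 64-bit left shift with wraparound, for 0 <= y < 64."""
    return (x << y) & MASK64


def shl64_via_shl32(x, y):
    """Narrowed form; only valid when the shift amount is in [32, 63]."""
    if not 32 <= y <= 63:
        raise ValueError("shift amount must be in [32, 63]")
    lo = 0  # every low bit is shifted out, so the low half is zero
    # Assumption: the 32-bit shift uses the low five bits of y (y - 32 here).
    hi = ((x & MASK32) << (y & 31)) & MASK32
    return (hi << 32) | lo


def agree_on(samples):
    """Compare both forms for every shift amount in the valid range."""
    return all(
        shl64(x, y) == shl64_via_shl32(x, y)
        for x in samples
        for y in range(32, 64)
    )


if __name__ == "__main__":
    print(agree_on([0, 1, 0xDEADBEEF, 0x123456789ABCDEF0, MASK64]))
```

Only the low 32 bits of X can reach the result once the shift amount is at least 32, which is why a 32-bit shift is enough. The Alive2 link in the commit message is the authoritative proof; this sketch only spot-checks a few values.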


527508 Commits