llvm-project

Author	SHA1	Message	Date
Dmitry Borisenkov	a38d5e0632	[SelectionDAG] Use LAST_INTEGER_VALUETYPE instead of i64 (#98299) When looking for the largest legal integer type for a target, `TargetLowering::findOptimalMemOpLowering` assumes that `MVT::i64` is the largest possible integer type. The patch removes this assumption and uses `MVT::LAST_INTEGER_VALUETYPE` instead.	2024-07-10 21:38:50 +04:00
AtariDreams	4f8b2fff6d	[DAG] Use break instead of continue to leave do while (false) loop (NFC) (#97966 )	2024-07-10 20:51:06 +04:00
paperchalice	abde52aa66	[CodeGen][NewPM] Port `LiveIntervals` to new pass manager (#98118 ) - Add `LiveIntervalsAnalysis`. - Add `LiveIntervalsPrinterPass`. - Use `LiveIntervalsWrapperPass` in legacy pass manager. - Use `std::unique_ptr` instead of raw pointer for `LICalc`, so destructor and default move constructor can handle it correctly. This would be the last analysis required by `PHIElimination`.	2024-07-10 19:34:48 +08:00
Daniel Kiss	1782810b84	[Clang][ARM][AArch64] Alway emit protection attributes for functions. (#82819) So far, branch protection, sign return address, and guarded control stack attributes have only been emitted as module flags to indicate that the functions need to be generated with those features. The problem is that in an LTO build the module flags are merged with the `min` rule, which means that if one of the modules is not built with sign return address, the features will be turned off for all functions, because the functions take the branch-protection and sign-return-address features from the module flags. The sign-return-address is a function-level option, so functions from files that are compiled with -mbranch-protection=pac-ret are expected to be protected. The inliner might inline functions with different sets of flags, as it doesn't consider the module flags. This patch adds the attributes to all functions and drops the checking of the module flags for code generation. The module flag is still used for generating the ELF markers. The patch also drops the "true"/"false" values from the branch-protection-enforcement, branch-protection-pauth-lr, and guarded-control-stack attributes, as presence of the attribute means it is on and absence means it is off, with no other option. Relanded with test fixes.	2024-07-10 11:32:41 +02:00
paperchalice	145a692947	[CodeGen] Format `PHIElimination.cpp` NFC (#98289) clang-format will format the entire class when `class PHIElimination : public MachineFunctionPass {` is changed. Format it first to reduce unnecessary changes when porting it to the new pass manager.	2024-07-10 17:13:02 +08:00
Daniel Kiss	4b2daeccc7	Revert "[Clang][ARM][AArch64] Alway emit protection attributes for functions." (#98284 ) Reverts llvm/llvm-project#82819	2024-07-10 10:22:38 +02:00
Daniel Kiss	e15d67cfc2	[Clang][ARM][AArch64] Alway emit protection attributes for functions. (#82819) So far, branch protection, sign return address, and guarded control stack attributes have only been emitted as module flags to indicate that the functions need to be generated with those features. The problem is that in an LTO build the module flags are merged with the `min` rule, which means that if one of the modules is not built with sign return address, the features will be turned off for all functions, because the functions take the branch-protection and sign-return-address features from the module flags. The sign-return-address is a function-level option, so functions from files that are compiled with -mbranch-protection=pac-ret are expected to be protected. The inliner might inline functions with different sets of flags, as it doesn't consider the module flags. This patch adds the attributes to all functions and drops the checking of the module flags for code generation. The module flag is still used for generating the ELF markers. The patch also drops the "true"/"false" values from the branch-protection-enforcement, branch-protection-pauth-lr, and guarded-control-stack attributes, as presence of the attribute means it is on and absence means it is off, with no other option.	2024-07-10 10:06:14 +02:00
Kazu Hirata	ef9aba2a2f	[CodeGen] Use range-based for loops (NFC) (#98104 )	2024-07-10 16:10:48 +09:00
Alex Bradbury	4d052a7618	[Intrinsics][PreISelIntrinsicLowering] llvm.memset.inline length no longer needs to be constant (#95397) As requested in https://discourse.llvm.org/t/rfc-introducing-an-llvm-memset-pattern-inline-intrinsic/79496 this patch removes the requirement that the length of llvm.memset.inline is a constant, and adjusts PreISelIntrinsicLowering so that it supports expanding the intrinsic when it has a non-constant length.	2024-07-10 07:58:52 +01:00
AdityaK	3e4adef946	[NFC] Add reference to the clustering algortihm for switch statements (#98239 ) Menezes, Evandro, Sebastian Pop, and Aditya Kumar. "Clustering case statements for indirect branch predictors." arXiv preprint arXiv:1910.02351 (2019). https://arxiv.org/pdf/1910.02351v2	2024-07-09 21:26:32 -07:00
David Tellenbach	8f159096e0	[AsmPrinter] Don't check for inlineasm dialect on non-X86 platforms (#98097 ) AArch64 uses MCAsmInfo::AssemblerDialect to control the style of emitted Neon assembly. E.g. Apple platforms use AsmWriterVariantTy::Apple by default which collides with InlineAsm::AD_Intel (both value 1). Checking for inlineasm dialects on non-X86 platforms can thus lead to problems.	2024-07-09 12:44:52 -07:00
Min-Yih Hsu	7e2f96194f	[MachineSink] Fix missing sinks along critical edges (#97618) 4e0bd3f improved early MachineLICM's capabilities to hoist COPY from physical registers out of a loop. However, it accidentally broke one of MachineSink's preconditions on sinking cheap instructions (in this case, COPY), which considered those instructions profitable to sink only when there are at least two of them in the same def-use chain in the same basic block. So if early MachineLICM hoisted one of them out, MachineSink no longer sinks the rest of the cheap instructions. This results in redundant load immediate instructions in the motivating example we've seen on RISC-V. This patch fixes this by teaching MachineSink that if there is more than one demand to sink a register into the same block from different critical edges, it should be considered profitable as it increases the CSE opportunities. This change also improves two AArch64 cases.	2024-07-09 10:48:22 -07:00
Luke Lau	baf22a527c	[SelectionDAG] Handle vscale range wrapping in isKnownNeverZero As pointed out by @preames, ConstantRange can wrap, so it's possible for zero to be in a range without zero being the minimum. Fix this by checking `contains` instead.	2024-07-09 23:05:22 +08:00
paperchalice	4010f894a1	[CodeGen][NewPM] Port `SlotIndexes` to new pass manager (#97941 ) - Add `SlotIndexesAnalysis`. - Add `SlotIndexesPrinterPass`. - Use `SlotIndexesWrapperPass` in legacy pass.	2024-07-09 12:09:11 +08:00
paperchalice	ac0b2814c3	[CodeGen][NewPM] Port `LiveVariables` to new pass manager (#97880 ) - Port `LiveVariables` to new pass manager. - Convert to `LiveVariablesWrapperPass` in legacy pass manager.	2024-07-09 10:50:43 +08:00
paperchalice	79d0de2ac3	[CodeGen][NewPM] Port `machine-loops` to new pass manager (#97793 ) - Add `MachineLoopAnalysis`. - Add `MachineLoopPrinterPass`. - Convert to `MachineLoopInfoWrapperPass` in legacy pass manager.	2024-07-09 09:11:18 +08:00
Kazu Hirata	d1f0ba6155	[AsmPrinter] Use range-based for loops (NFC) (#97977 )	2024-07-09 05:55:29 +09:00
Manish Kausik H	69192e0193	[LegalizeDAG] Optimize CodeGen for `ISD::CTLZ_ZERO_UNDEF` (#83039 ) Previously we had the same instructions being generated for `ISD::CTLZ` and `ISD::CTLZ_ZERO_UNDEF` which did not take advantage of the fact that zero is an invalid input for `ISD::CTLZ_ZERO_UNDEF`. This commit separates codegen for the two cases to allow for the optimization for the latter case. The details of the optimization are outlined in #82075 Fixes #82075 Co-authored-by: Manish Kausik H <hmamishkausik@gmail.com>	2024-07-08 14:01:32 +01:00
Momchil Velikov	a497e987e5	Reapply "[AArch64] Lower extending sitofp using tbl (#92528 )" This re-commits d1a4f0c9fb559eb4c2fb56112e56343bcd333edc after an issue was fixed in f92bfca9fc217cad9026598ef6755e711c0be070 ("[AArch64] All bits of an exact right shift are demanded (#97448)").	2024-07-08 11:55:29 +01:00
esmeyi	c119da23af	[PowerPC] Function descriptor symbol may be omitted for external symbol. #97526 If a function's address is taken, which means it may be called via a function pointer, we need the function descriptor for it. Otherwise, the function descriptor can be omitted for external symbols.	2024-07-08 03:47:33 -04:00
Fangrui Song	2718654c54	[MC] Support .cfi_label GNU assembler 2.26 introduced the .cfi_label directive. It does not expand to any CFI instructions, but defines a label in .eh_frame/.debug_frame, which can be used by runtime patching code to locate the FDE. .cfi_label is not allowed for CIE's initial instructions, and can therefore be used to force the next instruction to be placed in a FDE instead of a CIE. In glibc since 2018, sysdeps/riscv/start.S utilizes .cfi_label to force DW_CFA_undefined to be placed in a FDE. arc/csky/loongarch ports have copied this use. ``` .cfi_startproc // DW_CFA_undefined is allowed for CIE's initial instructions. // Without .cfi_label, gas would place DW_CFA_undefined in a CIE. .cfi_label .Ldummy .cfi_undefined ra .cfi_endproc ``` No CFI instruction is associated with .cfi_label, so the `case MCCFIInstruction::OpLabel:` code in BOLT is unreachable and exists only to make -Wswitch happy. Close #97222 Pull Request: https://github.com/llvm/llvm-project/pull/97922	2024-07-07 12:41:13 -07:00
Kazu Hirata	75bc20ff89	[llvm] Remove redundant calls to std::unique_ptr<T>::get (NFC) (#97914 )	2024-07-07 08:23:41 +09:00
Youngsuk Kim	34855405b0	[llvm] Avoid 'raw_string_ostream::str' (NFC) Since `raw_string_ostream` doesn't own the string buffer, it is desirable (in terms of memory safety) for users to directly reference the string buffer rather than use `raw_string_ostream::str()`. Work towards TODO item to remove `raw_string_ostream::str()`.	2024-07-05 17:22:03 -05:00
Bjorn Pettersson	c2fbc701aa	[SelectionDAG] Let ComputeKnownSignBits handle (shl (ext X), C) (#97695) Add simple support for looking through ZEXT/ANYEXT/SEXT when doing ComputeKnownSignBits for SHL. This is valid for the case when all extended bits are shifted out, because then the number of sign bits can be found by analysing the EXT operand. A future improvement could be to pass along the "shifted left by" information in the recursive calls to ComputeKnownSignBits, allowing us to handle this more generically.	2024-07-05 22:37:26 +02:00
Luke Lau	e4b28420f6	[SelectionDAG] Handle VSCALE in isKnownNeverZero (#97789 ) VSCALE is by definition greater than zero, but this checks it via getVScaleRange anyway. The motivation for this is to be able to check if the EVL for a VP strided load is non-zero in #97394. I added the tests to the RISC-V backend since the existing X86 known-never-zero.ll test crashed when trying to lower vscale for the +sse2 RUN line.	2024-07-05 16:11:06 +08:00
Shengchen Kan	a48305e0f9	[X86][CodeGen] Convert masked.load/store to CLOAD/CSTORE node only when vector size = 1 This fixes the crash when building llvm-test-suite with avx512f + cf.	2024-07-05 15:50:21 +08:00
Shengchen Kan	c60b9307d0	Revert "[X86][CodeGen] Convert masked.load/store to CLOAD/CSTORE node only when vector size = 1" This reverts commit 74984dee51307779a3eab10a8cd6102be37e1081. It caused AArch64 test sve-nontemporal-masked-ldst.ll to fail.	2024-07-05 15:14:30 +08:00
Shengchen Kan	74984dee51	[X86][CodeGen] Convert masked.load/store to CLOAD/CSTORE node only when vector size = 1 This fixes the crash when building llvm-test-suite with avx512f + cf.	2024-07-05 14:35:42 +08:00
Craig Topper	33112cbf59	[DAGCombiner] Remove unnecessary assert from getShiftAmountTy wrapper. NFC The same assert appears in the TargetLowering function. Refine comment to describe as a convenience wrapper and leave it to TargetLowering documentation to explain.	2024-07-04 19:05:54 -07:00
Craig Topper	8419da8bd4	[SelectionDAG] Remove LegalTypes argument from getShiftAmountConstant. (#97653) #97645 proposed to remove LegalTypes from getShiftAmountTy. This patch removes it from getShiftAmountConstant, which is one of the callers of getShiftAmountTy.	2024-07-04 18:33:25 -07:00
Craig Topper	3141c11fe8	[SelectionDAG] Remove LegalTypes argument from getShiftAmountTy. NFC (#97757 ) This argument is no longer used inside the function. Remove it from the interface.	2024-07-04 15:24:54 -07:00
Simon Pilgrim	687531fbed	[DAG] PromoteIntRes_EXTRACT_SUBVECTOR - pull out repeated getOperand/getVectorElementType calls. NFC.	2024-07-04 17:12:43 +01:00
Craig Topper	34fe032fdb	[DAGCombiner] Use getShiftAmountConstant where possible. (#97683 ) In #97645, I proposed removing the LegalTypes operand to TargetLowering::getShiftAmountTy. This means we don't need to use the DAGCombiner wrapper for getShiftAmountTy that manages this flag. Now we can use getShiftAmountConstant and let it call TargetLowering::getShiftAmountTy.	2024-07-04 08:44:50 -07:00
Craig Topper	f4d058fdb1	[SelectionDAG] Ignore LegalTypes parameter in TargetLoweringBase::getShiftAmountTy. (#97645) When this flag was false, `getShiftAmountTy` would return `PointerTy` instead of the target's preferred shift amount type for scalar shifts. This used to be needed when the target's preferred type wasn't large enough to support the shift amount needed for an illegal type. For example, any scalar type larger than i256 on X86, since X86's preferred shift amount type is i8. For a while now, we've had code that uses `MVT::i32` if `LegalTypes` is true, but the target's preferred type is too small. This fixed a repeated cause of crashes where the `LegalTypes` flag wasn't set to false when illegal types could be present. This has made it unnecessary to set the `LegalTypes` flag correctly, and as a result more and more places don't. So I think it's time for this flag to go away. This first patch just disconnects the flag. The interface and all callers will be cleaned up in follow-up patches. The X86 test change is because we now have the same shift type for both shifts in a (srl (sub C, (shl X, 32)), 32) sequence. This makes the shift amounts appear equal in value and type, which is needed to enable a combine.	2024-07-04 08:42:53 -07:00
Nicholas Guy	6222c8f030	[IR][LangRef] Add partial reduction add intrinsic (#94499) Adds the llvm.experimental.partial.reduce.add.* overloaded intrinsic, which represents add reductions that result in a narrower vector.	2024-07-04 13:32:42 +01:00
Haohai Wen	73f5f83b19	[BasicBlockSections] Using MBBSectionID as DenseMap key (#97295) getSectionIDNum may return the same value for two different MBBSectionIDs, e.g. a Cold type MBBSectionID with number 0 and a Default type MBBSectionID with number 2 both get the value 2 from getSectionIDNum. This may lead to entries in MBBSectionRanges being overwritten. Using MBBSectionID itself as the DenseMap key is a better choice.	2024-07-04 09:52:38 +08:00
Craig Topper	a3c5c83273	[DAGCombiner] Remove unneeded getValueType() calls in visitMULHS/MULHU. NFC We have an existing VT variable that should match N0.getValueType.	2024-07-03 13:35:04 -07:00
Yingwei Zheng	d5c9ffd545	[SDAG] Intersect poison-generating flags after CSE (#97434 ) This patch fixes a miscompilation when `N` gets CSEed to `Existing`: ``` Existing: t5: i32 = sub nuw Constant:i32<0>, t3 N: t30: i32 = sub Constant:i32<0>, t3 ``` Fixes https://github.com/llvm/llvm-project/issues/96366.	2024-07-03 20:32:46 +08:00
David Green	3b73cb3bf1	[AArch64][GlobalISel] Create copy rather than single-element concat The verifier does not accept single-element G_CONCAT_VECTORS, so if there is a single Op generate a COPY instead.	2024-07-03 10:22:15 +01:00
Alexis Engelke	bb260eb87d	[CodeGen] Only deduplicate PHIs on critical edges (#97064 ) PHIElim deduplicates identical PHI nodes to reduce the number of copies inserted. There are two cases: 1. Identical PHI nodes are in different blocks. That's the reason for this optimization; this can't be avoided at SSA-level. A necessary prerequisite for this is that the predecessors of all basic blocks (where such a PHI node could occur) are the same. This implies that all (>= 2) predecessors must have multiple successors, i.e. all edges into the block are critical edges. 2. Identical PHI nodes are in the same block. CSE can remove these. There are a few cases, however, where they still occur regardless: - expand-large-div-rem creates PHI nodes with large integers, which get lowered into one PHI per MVT. Later, some identical values (zeroes) get folded, resulting in identical PHI nodes. - peephole-opt occasionally inserts PHIs for the same value. - Some pseudo instruction emitters create redundant PHI nodes (e.g., AVR's insertShift), merging the same values more than once. In any case, this happens rarely and MachineCSE handles most cases anyway, so that PHIElim only gets to see very few of such cases (see changed test files). Currently, all PHI nodes are inserted into a DenseMap that checks equality not by pointer but by operands. This hash map is pretty expensive (hashing itself and the hash map), but only really useful in the first case. Avoid this expensive hashing most of the time by restricting it to basic blocks with only critical input edges. This improves performance for code with many PHI nodes, especially at -O0. (Note that Clang often doesn't generate PHI nodes and -O0 includes no mem2reg. Other compilers always generate PHI nodes.)	2024-07-03 11:19:05 +02:00
Thorsten Schütt	c5b67dde98	[GlobalIsel][NFC] Modernize UBFX combine (#97513 ) Credits: https://reviews.llvm.org/D99283	2024-07-03 09:19:40 +02:00
Kazu Hirata	3641efcf8c	[CodeGen] Use range-based for loops (NFC) (#97500 )	2024-07-02 19:24:53 -07:00
Ryotaro KASUGA	0a369b06e3	Reapply "[MachinePipeliner] Fix constraints aren't considered in cert… (#97259 ) …ain cases" (#97246) This reverts commit e6a961dbef773b16bda2cebc4bf9f3d1e0da42fc. There is no difference from the original change. I re-ran the failed test and it passed. So the failure wasn't caused by this change. test result: https://lab.llvm.org/buildbot/#/builders/176/builds/585	2024-07-03 09:15:41 +09:00
Kazu Hirata	58fd3bea6d	[CodeGen] Use range-based for loops (NFC) (#97467 )	2024-07-02 16:36:13 -07:00
Igor Kudrin	23db37c51c	[CodeGen] Do not emit TRAP for `unreachable` after `@llvm.trap` (#94570 ) With `--trap-unreachable`, `clang` can emit double `TRAP` instructions for code that contains a call to `__builtin_trap()`: ``` > cat test.c void test() { __builtin_trap(); } > clang test.c --target=x86_64 -mllvm --trap-unreachable -O1 -S -o - ... test: ... ud2 ud2 ... ``` `SimplifyCFGPass` inserts `unreachable` after a call to a `noreturn` function, and later this instruction causes `TRAP/G_TRAP` to be emitted in `SelectionDAGBuilder::visitUnreachable()` or `IRTranslator::translateUnreachable()` if `TargetOptions.TrapUnreachable` is set. The patch checks the instruction before `unreachable` and avoids inserting an additional trap.	2024-07-02 15:36:02 -07:00
Youngsuk Kim	a95c85fba5	[llvm][CodeGen] Avoid 'raw_string_ostream::str' (NFC) (#97318 ) Since `raw_string_ostream` doesn't own the string buffer, it is desirable (in terms of memory safety) for users to directly reference the string buffer rather than use `raw_string_ostream::str()`. Work towards TODO comment to remove `raw_string_ostream::str()`.	2024-07-01 21:52:37 -04:00
Kazu Hirata	bf6f2c1c43	[CodeGen] Use range-based for loops (NFC) (#97187 )	2024-07-01 16:11:09 -07:00
Shilei Tian	9a4f57ec1e	[SelectionDAG] Use `EVT::getIntegerVT` in `getBitcastedAnyExtOrTrunc` (#96658) `SelectionDAG::getBitcastedAnyExtOrTrunc` assumes that there is always a valid integer type corresponding to another type, which is not always true when it comes to vector types. For example, `<3 x i8>` doesn't have a corresponding integer type. Fix SWDEV-464698.	2024-07-01 15:10:57 -04:00
Simon Pilgrim	163d00c666	[DAG] Pull out repeated SDLoc in SELECT/SETCC folds. NFC.	2024-07-01 18:03:46 +01:00
Alexis Engelke	80ffec7884	[AsmPrinter] Remove timers (#97046) Timers are an out-of-line function call and a global variable access, here twice per emitted instruction. At this granularity, not only do the timing results become skewed, but the timers also add a performance overhead when profiling is disabled. Outside of the innermost loop, timers also add a measurable overhead. As this is quite expensive for a mostly unused profiling facility, remove the timers. Fixes #39650.	2024-07-01 16:20:54 +02:00
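
The flag intersection named in d5c9ffd545 ([SDAG] Intersect poison-generating flags after CSE) can be sketched roughly as follows. This is only an illustration under stated assumptions: `Node`, the flag constants, and `intersectFlagsAfterCSE` are hypothetical names invented for this sketch and are not LLVM's actual SelectionDAG API. The idea shown is that when a new node `N` is replaced by an identical `Existing` node, the surviving node should keep only the poison-generating flags that both nodes carry, so `sub nuw` merged with a plain `sub` ends up as a plain `sub`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for poison-generating flags (illustrative only).
constexpr std::uint8_t NoFlags = 0;
constexpr std::uint8_t NoUnsignedWrap = 1u << 0; // nuw
constexpr std::uint8_t NoSignedWrap = 1u << 1;   // nsw

// Hypothetical node type: only the flag bits matter for this sketch.
struct Node {
  std::uint8_t Flags;
};

// When N is CSE'd to Existing, keep only the flags present on both nodes.
// A flag that only one of the two nodes had can no longer be assumed.
inline void intersectFlagsAfterCSE(Node &Existing, const Node &N) {
  Existing.Flags &= N.Flags;
}
```

With `Existing` carrying `NoUnsignedWrap` (like `t5: i32 = sub nuw Constant:i32<0>, t3`) and `N` carrying no flags (like `t30: i32 = sub Constant:i32<0>, t3`), calling `intersectFlagsAfterCSE(Existing, N)` leaves `Existing` with no flags set.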

36058 Commits