llvm-project

Author	SHA1	Message	Date
Craig Topper	4477500533	[RISCV] ISel (and (shift X, C1), C2)) to shift pair in more cases Previously, these isel optimizations were disabled if the AND could be selected as an ANDI instruction. This patch disables the optimizations only if the immediate is valid for C.ANDI. If we can't use C.ANDI, we might be able to compress the shift instructions instead. I'm not checking the C extension since we have relatively poor test coverage of the C extension. Without C extension the code size should be equal. My only concern would be if the shift+andi had better latency/throughput on a particular CPU. I did have to add a peephole to match SRLIW if the input is zexti32 to prevent a regression in rv64zbp.ll. Reviewed By: luismarques Differential Revision: https://reviews.llvm.org/D122701	2022-03-30 11:46:42 -07:00
Craig Topper	7417eb29ce	[RISCV] Use getSplatBuildVector instead of getSplatVector for fixed vectors. The splat_vector will be legalized to build_vector eventually anyway. This patch makes it take fewer steps. Unfortunately, this results in some codegen changes. It looks like it comes down to how the nodes were ordered in the topological sort for isel. Because the build_vector is created earlier we end up with a different ordering of nodes. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D122185	2022-03-30 11:36:34 -07:00
Fangrui Song	e78cea0a91	[X86][test] Precommit D122541 tests for prologue/epilogue CFI Currently there is no CFI_INSTRUCTION MIR test with .ll input. This patch adds some -stop-after=prologepilog tests.	2022-03-30 09:02:23 -07:00
Sanjay Patel	436b875e49	[SDAG] avoid libcalls to fmin/fmax for soft-float targets This is an extension of D70965 to avoid creating a mathlib call where it did not exist in the original source. Also see D70852 for discussion about an alternative proposal that was abandoned. In the motivating bug report: https://github.com/llvm/llvm-project/issues/54554 ...we also have a more general issue about handling "no-builtin" options. Differential Revision: https://reviews.llvm.org/D122610	2022-03-30 11:22:03 -04:00
Sanjay Patel	e18cc5277f	[SDAG] try to canonicalize logical shift after bswap When shifting by a byte-multiple: bswap (shl X, C) --> lshr (bswap X), C bswap (lshr X, C) --> shl (bswap X), C This is the backend version of D122010 and an alternative suggested in D120648. There's an extra check to make sure the shift amount is valid that was not in the rough draft. I'm not sure if there is a larger motivating case for RISCV (bug report?), but the ARM diffs show a benefit from having a late version of the transform (because we do not combine the loads in IR). Differential Revision: https://reviews.llvm.org/D122655	2022-03-30 09:29:32 -04:00
Sanjay Patel	849d577e56	[x86] add tests for fcmp with 0.0 operand; NFC	2022-03-30 08:37:15 -04:00
Sanjay Patel	5b4bbaa8d8	[SystemZ] generate full checks for tests; NFC These may change if we transform the fcmp (setcc) to avoid a constant operand.	2022-03-30 08:37:15 -04:00
Simon Pilgrim	14a89d00c7	[X86] Extend xor-lea test coverage Add XOR(ADD/SUB(X,Y),MIN_SIGNED_VALUE) tests and adjust some XOR(SHL(X,C),MIN_SIGNED_VALUE) shifts to better match LEA scales	2022-03-30 13:34:32 +01:00
Fraser Cormack	43a91a8474	[SelectionDAG] Don't create illegally-typed nodes while constant folding This patch fixes a (seemingly very rare) crash during vector constant folding introduced in D113300. Normally, during legalization, if we create an illegally-typed node during a failed attempt at constant folding it's cleaned up before being visited, due to it having no uses. If, however, an illegally-typed node is created during one round of legalization and isn't cleaned up, it's possible for a second round of legalization to create new illegally-typed nodes which add extra uses to the old illegal nodes. This means that we can end up visiting the old nodes before they're known to be dead, at which point we crash. I'm not happy about this fix. Creating illegal types at all seems like a bad idea, but we all-too-often rely on illegal constants being successfully folded and being fixed up afterwards. However, we can't rely on constant folding actually happening, and we don't have a foolproof way of peering into the future. Perhaps the correct fix is to revisit the node-iteration order during legalization, ensuring we visit all uses of nodes before the nodes themselves. Or alternatively we could try and clean up dead nodes immediately after failing constant folding. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D122382	2022-03-30 13:17:55 +01:00
Simon Pilgrim	e000dbc39f	[X86] Add test coverage based off Issue #51609	2022-03-30 12:57:22 +01:00
Luo, Yuanke	7471d8b13c	[X86][AMX] Pre-checkin the test case for AMX undef and zero	2022-03-30 17:53:01 +08:00
Luo, Yuanke	1141c8b6fc	[X86][AMX] Fix bug for amx cast transform After combining AMX cast operations, some AMX cast intrinsics may become dead code. This patch deletes such dead code to avoid a crash.	2022-03-30 17:22:30 +08:00
Simon Pilgrim	bce954321a	[X86] Add PR47857 test case	2022-03-30 09:51:36 +01:00
Liqin Weng	4cb85da811	[RISCV] Add CMIX isel pattern for (xor (and (xor rs1, rs3), rs2), rs3) Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D122702	2022-03-30 16:51:09 +08:00
Simon Pilgrim	6697e3354f	[X86] combineADC - fold ADC(C1,C2,Carry) -> ADC(0,C1+C2,Carry) If we're not relying on the flag result, we can fold the constants together into the RHS immediate operand and set the LHS operand to zero, simplifying for further folds. We could do something similar if the flag result is in use and the constant fold doesn't affect it, but I don't have any real test cases for this yet. As suggested by @davezarzycki on Issue #35256 Differential Revision: https://reviews.llvm.org/D122482	2022-03-30 09:11:55 +01:00
Nikita Popov	8a72391f60	[IR] Require intrinsic struct return type to be anonymous This is an alternative to D122376. Rather than working around the problem, this patch requires that struct return types in intrinsics are anonymous/literal and adds auto-upgrade code to convert existing uses of intrinsics with named struct types. This ensures that the mapping between intrinsic name and intrinsic function type is actually bijective, as it is supposed to be. This also fixes https://github.com/llvm/llvm-project/issues/37891. Differential Revision: https://reviews.llvm.org/D122471	2022-03-30 09:51:24 +02:00
Liqin Weng	7f81765898	[RISCV][NFC] Add immediate tests for the icmp instruction Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D122651	2022-03-30 02:51:26 +00:00
Chenbing Zheng	780eb9f586	[DAGCombine] add tests for bitreverse-shift optimization This patch adds tests that show optimization opportunities for bitreverse-shift. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D121507	2022-03-30 09:50:28 +08:00
Zakk Chen	b578330754	[RISCV] Use maskedoff to decide mask policy for masked compare and vmsbf/vmsif/vmsof. Masked compare and vmsbf/vmsif/vmsof are always tail agnostic, so we can check the maskedoff value to decide the mask policy rather than have an additional policy operand. Reviewed By: craig.topper, arcbbb Differential Revision: https://reviews.llvm.org/D122456	2022-03-29 18:05:33 -07:00
Zakk Chen	10b2760da0	Revert "[RISCV] Add policy operand for masked compare and vmsbf/vmsif/vmsof IR" This reverts commit 10fd2822b77e12215b4ea82fc6d0a052961eb9d9. I have a better implementation for those operations without the additional policy operand. masked compare and vmsbf/vmsif/vmsof are always tail agnostic so we could assume undef maskedoff is mask agnostic. Differential Revision: https://reviews.llvm.org/D122455	2022-03-29 18:05:33 -07:00
Yonghong Song	5898979387	BPF: support inlining __builtin_memcmp intrinsic call Delyan Kratunov reported an issue where __builtin_memcmp is not inlined into simple load/compare instructions. This is a known issue. In the current state, __builtin_memcmp will be converted to memcmp call which won't work for bpf programs. This patch added support for expanding __builtin_memcmp with actual loads and compares up to currently maximum 128 total loads. The implementation is identical to PowerPC. Differential Revision: https://reviews.llvm.org/D122676	2022-03-29 15:03:26 -07:00
Eli Friedman	a8ebd85e46	[MC] Make MCAsmInfo::isAcceptableChar reflect MCAsmInfo::doesAllowAtInName On targets which don't allow "@" in unquoted identifiers, make sure we don't emit them; otherwise, we can't parse our own output. Differential Revision: https://reviews.llvm.org/D122516	2022-03-29 14:01:32 -07:00
Sanjay Patel	7f5c2f6a76	[x86] consolidate tests and auto-gen complete check lines; NFC The same test was duplicated in 2 files.	2022-03-29 14:55:04 -04:00
Stanislav Mekhanoshin	f311f934e1	[AMDGPU] gfx940 VALU hazard recognizer Differential Revision: https://reviews.llvm.org/D122339	2022-03-29 10:57:54 -07:00
Simon Pilgrim	d32d65b903	[X86] Regenerate x86-interleaved-access.ll with AVX1OR2 common check-prefix to reduce duplication	2022-03-29 12:24:46 +01:00
Jay Foad	2b754384ad	[AMDGPU] Generate checks in atomic_optimizations_*.ll This had already been done for some of these files but not all.	2022-03-29 11:05:23 +01:00
David Green	60f57b3658	[AArch64] Ensure fixed point fptoi_sat has correct saturation width D113200 introduced an error where it was converting FP_TO_SI_SAT with multiply to a fixed point floating point convert. The saturation bitwidth needs to be equal to the floating point width, or else the routine would truncate the result as opposed to saturating it. Fixes #54601	2022-03-29 10:12:44 +01:00
Chenbing Zheng	6a01b676cf	[DAGCombine] add tests for bswap-shift optimization Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D121504	2022-03-29 16:34:52 +08:00
Zi Xuan Wu	0365c54ca3	[CSKY] Add CSKYTargetObjectFile to support exception handling Initialize TargetLoweringObjectFileELF and EH header.	2022-03-29 16:05:30 +08:00
Zi Xuan Wu	27c18558e6	[CSKY] Add missing codegen pattern for 16-bit instruction The generic CPU model has only the low 16 registers and few 32-bit instructions. CK801 is the CPU family with the fewest basic features, like the generic model. Add a test run and checks for the generic CPU model to the original test case to cover basic LLVM IR functionality.	2022-03-29 16:05:30 +08:00
Lian Wang	2c503dcb4f	[RISCV][NFC] Remove redundant check and rename functions in some IR tests Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D122204	2022-03-29 07:29:58 +00:00
Liqin Weng	d660c0d793	[RISCV] Optimize LI+SLT to SLTI+XORI for immediates in specific range This transform will reduce one GPR. Reviewed By: craig.topper, benshi001 Differential Revision: https://reviews.llvm.org/D122051	2022-03-29 14:46:49 +08:00
zhongyunde	2b3becb41d	[AArch64][GlobalISel] Add new MOVI pattern for fp constants GlobalISel is used at -O0, so add a MOVI pattern for it, similar to what gcc does (https://godbolt.org/z/8j6fzG3h6). Fixes https://github.com/llvm/llvm-project/issues/53651 Reviewed By: dmgreen, paquette Differential Revision: https://reviews.llvm.org/D122559	2022-03-29 10:57:22 +08:00
Craig Topper	01203918d1	[RISCV] Add computeKnownBits support for RISCVISD::GORC. Reviewed By: luismarques Differential Revision: https://reviews.llvm.org/D121575	2022-03-28 16:56:33 -07:00
Craig Topper	e68257fcee	[RISCV][SelectionDAG] Enable TargetLowering::hasBitTest for masks that fit in ANDI. Modified DAGCombiner to pass the shift the bittest input and the shift amount to hasBitTest. This matches the other call to hasBitTest in TargetLowering.h This is an alternative to D122454. Reviewed By: luismarques Differential Revision: https://reviews.llvm.org/D122458	2022-03-28 12:46:36 -07:00
Sanjay Patel	382de90896	[RISCV] add tests for minnum/maxnum; NFC Issue #54554	2022-03-28 15:40:23 -04:00
Craig Topper	cfe533da05	[RISCV] Add lowering for vp.fptosi and vp.sitofp. This is an alternative version of D120641. Starting from the code here https://repo.hca.bsc.es/gitlab/rferrer/llvm-epi/-/raw/EPI/llvm/lib/Target/RISCV/RISCVISelLowering.cpp but with some modifications to how the interim types are calculated, and adding support for f16. Still need to add fptosi for mask vectors. Lots of masked isel patterns added so we can pass the mask through the type changes. Reviewed By: frasercrmck, arcbbb Differential Revision: https://reviews.llvm.org/D122512	2022-03-28 11:06:41 -07:00
Simon Pilgrim	8a1956dfa5	[X86] lowerV64I8Shuffle - attempt to match with lowerShuffleAsLanePermuteAndPermute Fixes #54562	2022-03-28 17:21:27 +01:00
Thomas Symalla	3bd15c03c6	[AMDGPU] Fix adding modifiers when creating v_cmpx instructions. Revision https://reviews.llvm.org/D122332 added a pattern transformation where v_cmpx instructions are introduced. However, the modifiers are not correctly inherited from the original operands. The patch adds the source modifiers, if they exist, or sets them to 0. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D122489	2022-03-28 17:52:53 +02:00
Jyotsna Verma	65a2f6ad9c	[Hexagon] Create an intrinsic to profile using a custom handler The intrinsic is lowered into a hexagon pseudo instruction which after register allocation is expanded into A2_tfrsi and J2_call.	2022-03-28 10:31:41 -05:00
Daniil Kovalev	a8c277041a	[NVPTX] Fix poorly designed assertion introduced in D120129 NVPTXTargetLowering::getFunctionParamOptimizedAlign, which was introduced in D120129, contained a poorly designed assertion checking that a function with internal or private linkage is not a kernel. It relied on invariants that were not actually guaranteed, and that resulted in a compiler crash with some CUDA versions (see discussion with @jdoerfert in D120129). This patch changes that assertion and makes it use isKernelFunction, which is designed exactly for such checks. This patch also includes a test with IR that previously caused the compiler crash. Differential Revision: https://reviews.llvm.org/D122562	2022-03-28 17:34:58 +03:00
Simon Pilgrim	614363ecc0	[X86] Add shuffle tests from Issue #54562	2022-03-28 13:54:17 +01:00
Carl Ritson	1f52d02ceb	[AMDGPU] Split waterfall loop exec manipulation Split waterfall loops into multiple blocks so that exec mask manipulation (s_and_saveexec) does not occur in the middle of a block. VGPR live range optimizer is updated to handle waterfall loops spanning multiple blocks. Reviewed By: ruiling Differential Revision: https://reviews.llvm.org/D122200	2022-03-28 17:44:54 +09:00
zhongyunde	c3fe025bd4	[AArch64][SelectionDAG] Refactor to support more scalable vector extending loads Per the discussion in D120953, we should first exclude all scalable vector extending loads and then selectively enable those which we directly support. This patch refactors the code accordingly (truncating stores are not touched); more scalable vector types will now try to reduce the number of masked loads in favour of more unpklo/hi instructions. Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D122281	2022-03-27 21:18:01 +08:00
David Green	693d3b7e76	[AArch64] Lower 3 and 4 sources buildvectors to TBL The default expansion for buildvectors is to extract each element and insert them into a new vector. That involves a lot of copying to/from the GPR registers. TBL3 and TBL4 can be relatively slow instructions with the mask needing to be loaded from a constant pool, but they should always be better than all the moves to/from GPRs. Differential Revision: https://reviews.llvm.org/D121137	2022-03-26 21:10:43 +00:00
zhongyunde	758be63ac6	[test][AArch64] Add a test case for D121180 NFC Currently, the last active true vector combine is performed only where we're extracting from a flag-setting operation. In fact, extracting the last active element outputs LASTB + WHILELS, and WHILELS is itself a flag-setting operation, so precommit this case to test the potential further optimization. Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D122453	2022-03-26 19:12:16 +08:00
Ben Shi	bce2e208e0	[AVR] Optimize int16 arithmetic right shift for shift amount 7/14/15 Reviewed By: aykevl Differential Revision: https://reviews.llvm.org/D115618	2022-03-26 06:53:27 +00:00
Ben Shi	49b0b5f0fa	[AVR][NFC] Fix incorrect register states in expanding pseudo instructions Reviewed By: aykevl Differential Revision: https://reviews.llvm.org/D118354	2022-03-25 16:02:15 +00:00
Johannes Doerfert	a81fff8afd	Reapply "[Intrinsics] Add `nocallback` to the default intrinsic attributes" This reverts commit c5f789050daab25aad6770790987e2b7c0395936 and reapplies 7aea3ea8c3b33c9bb338d5d6c0e4832be1d09ac3 with additional test changes.	2022-03-25 09:36:50 -05:00
Simon Pilgrim	f84b5c11dd	[X86] Add test showing failure to fold multiple constant args in ADC As noticed on Issue #35256	2022-03-25 13:42:08 +00:00
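
Several entries above state algebraic rewrites: the bswap/shift canonicalization (e18cc5277f), the CMIX pattern (4cb85da811), and the ADC constant fold (6697e3354f). The following is a minimal Python sketch that sanity-checks those identities on random 32-bit values. It is not code from any of the patches. The helper names (bswap32, shl32, lshr32, cmix, adc32, check_identities) are made up for illustration, and the cmix definition assumes the draft bitmanip semantics of a bitwise select, (rs1 & rs2) | (rs3 & ~rs2).

```python
import random

MASK32 = 0xFFFFFFFF  # model 32-bit registers


def bswap32(x):
    """Reverse the byte order of a 32-bit value."""
    return int.from_bytes(x.to_bytes(4, "little"), "big")


def shl32(x, c):
    """Logical shift left, truncated to 32 bits."""
    return (x << c) & MASK32


def lshr32(x, c):
    """Logical shift right of a 32-bit value."""
    return x >> c


def cmix(rs2, rs1, rs3):
    """Assumed CMIX semantics: take rs1 bits where rs2 is 1, else rs3 bits."""
    return (rs1 & rs2) | (rs3 & ~rs2 & MASK32)


def adc32(a, b, carry):
    """Value result of add-with-carry, ignoring the flag output."""
    return (a + b + carry) & MASK32


def check_identities(trials=1000, seed=0):
    rng = random.Random(seed)
    for _ in range(trials):
        x, y, z = (rng.getrandbits(32) for _ in range(3))
        # bswap (shl X, C) == lshr (bswap X), C, and the mirrored form,
        # for shift amounts that are a multiple of 8 and less than 32.
        for c in (0, 8, 16, 24):
            assert bswap32(shl32(x, c)) == lshr32(bswap32(x), c)
            assert bswap32(lshr32(x, c)) == shl32(bswap32(x), c)
        # (xor (and (xor rs1, rs3), rs2), rs3) == cmix(rs2, rs1, rs3)
        assert (((x ^ z) & y) ^ z) == cmix(y, x, z)
        # ADC(C1, C2, Carry) == ADC(0, C1+C2, Carry) when the flag is unused.
        for carry in (0, 1):
            assert adc32(x, y, carry) == adc32(0, (x + y) & MASK32, carry)
    return True


if __name__ == "__main__":
    print(check_identities())
```

The byte-multiple restriction on the shift amount matters: with C = 4, bswap32(shl32(1, 4)) and lshr32(bswap32(1), 4) differ, which is consistent with the commit message limiting the transform to byte-multiple shifts.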


42810 Commits