llvm-project

Author	SHA1	Message	Date
Sanjay Patel	10ea01d80d	[VectorCombine] make cost calc consistent for binops and cmps Code duplication (subsequently removed by refactoring) allowed a logic discrepancy to creep in here. We were being conservative about creating a vector binop -- but not a vector cmp -- in the case where a vector op has the same estimated cost as the scalar op. We want to be more aggressive here because that can allow other combines based on reduced instruction count/uses. We can reverse the transform in DAGCombiner (potentially with a more accurate cost model) if this causes regressions. AFAIK, this does not conflict with InstCombine. We have a scalarize transform there, but it relies on finding a constant operand or a matching insertelement, so that means it eliminates an extractelement from the sequence (so we won't have 2 extracts by the time we get here if InstCombine succeeds). Differential Revision: https://reviews.llvm.org/D75062	2020-02-25 08:41:59 -05:00
Sanjay Patel	e9c79a7aef	[VectorCombine] refactor to reduce duplicated code; NFC This should be the last step in the current cleanup. Follow-ups should resolve the TODO about cost calc and enable the more general case where we extract different elements.	2020-02-21 15:56:00 -05:00
Sanjay Patel	34e3485560	[VectorCombine] refactor cost calcs to reduce duplication; NFC More cleanup is possible now, but we probably need to resolve the TODO about the existing difference between compares and binops.	2020-02-21 15:12:00 -05:00
Sanjay Patel	fc4455891c	[VectorCombine] refactor matching code to reduce duplication; NFC cmp/binop were already diverging even though they are largely the same logic.	2020-02-21 12:06:51 -05:00
Sanjay Patel	62dd44d76d	[VectorCombine] fix cost calc for extract-cmp getOperationCost() is not the cost we wanted; that's not the throughput value that the rest of the calculation uses. We may want to switch everything in this code to use the getInstructionThroughput() wrapper to avoid these kinds of problems, but I'll look at that as a follow-up because that can create other logical diffs via using optional parameters (we'd need to speculatively create the vector instruction to make a fair(er) comparison).	2020-02-16 10:40:28 -05:00
Kadir Cetinkaya	1674f772b4	[VecotrCombine] Fix unused variable for assertion disabled builds	2020-02-14 09:30:29 +01:00
Sanjay Patel	19b62b79db	[VectorCombine] try to form vector binop to eliminate an extract element binop (extelt X, C), (extelt Y, C) --> extelt (binop X, Y), C This is a transform that has been considered for canonicalization (instcombine) in the past because it reduces instruction count. But as shown in the x86 tests, it's impossible to know if it's profitable without a cost model. There are many potential target constraints to consider. We have implemented similar transforms in the backend (DAGCombiner and target-specific), but I don't think we have this exact fold there either (and if we did it in SDAG, it wouldn't work across blocks). Note: this patch was intended to handle the more general case where the extract indexes do not match, but it got too big, so I scaled it back to this pattern for now. Differential Revision: https://reviews.llvm.org/D74495	2020-02-13 17:23:27 -05:00
Sanjay Patel	a2a0f9a43a	[VectorCombine] remove unused debug counter; NFC The variable was added to the initial commit via copy/paste of existing code, but it wasn't actually used in the code. We can add it back with the proper usage if/when that is needed.	2020-02-11 08:24:07 -05:00
Sanjay Patel	a17f03bd93	[VectorCombine] new IR transform pass for partial vector ops We have several bug reports that could be characterized as "reducing scalarization", and this topic was also raised on llvm-dev recently: http://lists.llvm.org/pipermail/llvm-dev/2020-January/138157.html ...so I'm proposing that we deal with these patterns in a new, lightweight IR vector pass that runs before/after other vectorization passes. There are 4 alternate options that I can think of to deal with this kind of problem (and we've seen various attempts at all of these), but they all have flaws: InstCombine - can't happen without TTI, but we don't want target-specific folds there. SDAG - too late to assist other vectorization passes; TLI is not equipped for these kind of cost queries; limited to a single basic block. CGP - too late to assist other vectorization passes; would need to re-implement basic cleanups like CSE/instcombine. SLP - doesn't fit with existing transforms; limited to a single basic block. This initial patch/transform is based on existing code in AggressiveInstCombine: we walk backwards through the function looking for a pattern match. But we diverge from that cost-independent IR canonicalization pass by using TTI to decide if the vector alternative is profitable. We probably have at least 10 similar bug reports/patterns (binops, constants, inserts, cheap shuffles, etc) that would fit in this pass as follow-up enhancements. It's possible that we could iterate on a worklist to fix-point like InstCombine does, but it's safer to start with a most basic case and evolve from there, so I didn't try to do anything fancy with this initial implementation. Differential Revision: https://reviews.llvm.org/D73480	2020-02-09 10:04:41 -05:00

9 Commits