This discussion came up during the vectorizer improvements call.
We currently create VPlans for VF=1, fixed VFs, scalable VFs and VF=vscale x 1 on SVE.
Whether or not these VPlans are tail folded is determined by TTI->preferPredicateOverEpilogue or the -prefer-predicate-over-epilogue flag.
Instead of using a hook, we could create an additional tail-folded VPlan and let the cost model decide whether to select it based on profitability.
We could probably also consider all the different tail folding styles, but to keep the number of VPlans reasonable, we could begin by leaving the choice of style to TTI.
So on RISC-V, for example, we would consider VF=1, VF=fixed, VF=scalable and VF=scalable + tail folding, since the proposed default EVL tail folding style isn't compatible with fixed VFs.
On AArch64, we would probably have more VPlans, since IIUC its tail folding style supports both fixed and scalable VFs.
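A minimal sketch of the candidate enumeration described above, assuming TTI keeps choosing a single tail folding style per target and only reports which VF kinds that style supports. All names here (VFKind, Candidate, TargetInfo, styleSupports, enumerateCandidates) are hypothetical and do not correspond to LoopVectorize or VPlan classes; the VF=vscale x 1 plan is not modelled separately.

```cpp
#include <cassert>
#include <vector>

// Hypothetical types for illustration only; not LLVM's planner classes.
enum class VFKind { Scalar, Fixed, Scalable };

struct Candidate {
  VFKind Kind;
  bool TailFolded;
};

// Assumption: the target's (TTI-chosen) tail folding style reports which
// VF kinds it supports, e.g. EVL on RISC-V supports scalable VFs only.
struct TargetInfo {
  bool TailFoldFixed;
  bool TailFoldScalable;
};

bool styleSupports(const TargetInfo &TI, VFKind Kind) {
  assert(Kind != VFKind::Scalar && "VF=1 is never tail folded in this sketch");
  return Kind == VFKind::Fixed ? TI.TailFoldFixed : TI.TailFoldScalable;
}

// Always build the non-tail-folded plans, then add a tail-folded twin for
// each vector VF kind the style supports. The cost model, rather than a
// TTI hook, would then choose among all of them.
std::vector<Candidate> enumerateCandidates(const TargetInfo &TI) {
  std::vector<Candidate> Plans = {{VFKind::Scalar, false},
                                  {VFKind::Fixed, false},
                                  {VFKind::Scalable, false}};
  for (VFKind Kind : {VFKind::Fixed, VFKind::Scalable})
    if (styleSupports(TI, Kind))
      Plans.push_back({Kind, true});
  return Plans;
}
```

With TargetInfo{false, true} (the RISC-V/EVL case) this yields the four plans listed above; with TargetInfo{true, true} (the assumed AArch64 case) it yields five.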
One significant benefit is that we would be able to fall back to non-tail-folded loops in scenarios that aren't fully supported with tail folding, e.g. interleaved groups on RISC-V. However, we would also need to account in the cost for the fact that a non-tail-folded loop may not always run its vectorized body, due to the minimum trip count check.
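To illustrate why the minimum trip count matters when comparing the two kinds of plan, here is a rough cost sketch. It assumes a known trip count TC and flat per-iteration costs; PlanCost, expectedCost and preferTailFolding are hypothetical names, and real costs would come from TTI rather than plain integers. A non-tail-folded plan pays for a scalar epilogue and, below its minimum trip count, runs only the scalar loop; a tail-folded plan runs ceil(TC / VF) (usually more expensive) predicated iterations.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical per-plan cost inputs; not LLVM's InstructionCost model.
struct PlanCost {
  uint64_t VF;           // elements processed per vector iteration
  uint64_t BodyCost;     // cost of one vector-body iteration
  bool TailFolded;       // true if the body is predicated (no epilogue)
  uint64_t MinTripCount; // non-tail-folded body is skipped below this
};

// Expected total cost for trip count TC; ScalarIterCost is the cost of one
// scalar (epilogue or fallback) iteration.
uint64_t expectedCost(const PlanCost &P, uint64_t TC, uint64_t ScalarIterCost) {
  assert(P.VF > 0 && "VF must be non-zero");
  if (P.TailFolded)
    // The predicated body also covers the remainder: ceil(TC / VF) iterations.
    return ((TC + P.VF - 1) / P.VF) * P.BodyCost;
  if (TC < P.MinTripCount)
    // Minimum trip count check fails: only the scalar loop executes.
    return TC * ScalarIterCost;
  // Full vector iterations plus a scalar epilogue for the remainder.
  return (TC / P.VF) * P.BodyCost + (TC % P.VF) * ScalarIterCost;
}

// True if the tail-folded plan is strictly cheaper for this trip count.
bool preferTailFolding(const PlanCost &Folded, const PlanCost &Unfolded,
                       uint64_t TC, uint64_t ScalarIterCost) {
  return expectedCost(Folded, TC, ScalarIterCost) <
         expectedCost(Unfolded, TC, ScalarIterCost);
}
```

For example, with VF=4, a scalar iteration cost of 3, an unfolded body cost of 5 (minimum trip count 4) and a folded body cost of 6, a trip count of 3 costs 9 without tail folding (scalar only) versus 6 with it, while a trip count of 100 costs 125 versus 150. The cost model could pick differently depending on the expected trip count.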