Nice work! Could you please provide more insight into why predicting patch-wise neural fields is more scalable than directly predicting pixels?