### Describe the improvement or the new tutorial

It was initially challenging for me to grasp why `requires_grad` is set *after* `weights` is created, but *in the same line* as `bias`, under https://docs.pytorch.org/tutorials/beginner/nn_tutorial.html#neural-net-from-scratch-without-torch-nn

At first glance, the code looks inconsistent:

1. `weights` initialization is split across two lines.
2. `bias` initialization is done in one line.

**The Logic Gap**

The tutorial currently explains *that* we do this, but not exactly *why* the distinction exists between these two specific variables.

* **The Bias** is created using a factory function (`torch.zeros`) with no subsequent mathematical operations. It is born as a "Leaf Node" (a source parameter).
* **The Weights** involve a mathematical operation (`/ math.sqrt(...)`). If we set `requires_grad=True` inside `torch.randn()`, PyTorch records the division as a computational step. The resulting `weights` variable becomes a **non-leaf node** (a calculated outcome), which cannot be updated as a parameter during training.

**Proposed Improvement**

I propose modifying the comment block to explicitly mention that `requires_grad` must be deferred until *after* the initialization math is complete, so that the tensor is preserved as a trainable parameter (a Leaf Node). A minimal sketch contrasting the two orderings is included at the end of this issue.

### Existing tutorials on this topic

* https://docs.pytorch.org/tutorials/beginner/nn_tutorial.html
* https://docs.pytorch.org/tutorials/beginner/nn_tutorial.html#neural-net-from-scratch-without-torch-nn

### Additional context

<img width="1375" height="560" alt="Image" src="https://github.com/user-attachments/assets/07bb6dfb-2b18-4665-97e8-85682ba6c2bd" />
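For reference, here is a minimal sketch contrasting the two orderings. It reuses the tutorial's MNIST shapes (784 × 10); the `w_bad` variable is hypothetical and added here only to show what happens when `requires_grad=True` is passed to the factory call before the scaling division:

```python
import math
import torch

# Tutorial's approach: build the tensor first, then flag it in place.
weights = torch.randn(784, 10) / math.sqrt(784)
weights.requires_grad_()           # in-place flag; weights remains a leaf node
print(weights.is_leaf)             # True -> backward() accumulates gradients in weights.grad

# Hypothetical alternative: flag inside the factory call, then divide.
w_bad = torch.randn(784, 10, requires_grad=True) / math.sqrt(784)
print(w_bad.is_leaf)               # False -> autograd records the division,
                                   # so w_bad is a computed result, not a source parameter,
                                   # and backward() does not populate w_bad.grad

# bias needs no follow-up math, so the flag can be passed directly.
bias = torch.zeros(10, requires_grad=True)
print(bias.is_leaf)                # True
```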