Problem
In the current implementation, this tensor is created on every forward pass:

```python
pos_emb = self.position_embedding_table(torch.arange(T, device=device))
```
This means we allocate a new tensor on every call, even though the values are constant and do not change across iterations. That repeated allocation is unnecessary and slightly wasteful.
Proposed Solution: Register as a Buffer
We can pre-create the tensor once in `__init__` and register it as a buffer on the module:

```python
self.token_embedding_table = nn.Embedding(vocab_size, n_embd)
self.position_embedding_table = nn.Embedding(block_size, n_embd)
self.lm_head = nn.Linear(n_embd, vocab_size)
# Pre-create position indices once
self.register_buffer("position_ids", torch.arange(block_size))
```
Then in `forward()`:

```python
pos = self.position_ids[:T]
pos_emb = self.position_embedding_table(pos)
```
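For context, here is a minimal self-contained sketch of how the two pieces fit together. The class name `TinyLM`, the concrete sizes, and the shape of `forward()` are assumptions for illustration; only the buffer registration and the `position_ids[:T]` slice are the actual proposal.

```python
import torch
import torch.nn as nn

# Hypothetical sizes, chosen only for this sketch
vocab_size, n_embd, block_size = 65, 32, 8

class TinyLM(nn.Module):  # name assumed for illustration
    def __init__(self):
        super().__init__()
        self.token_embedding_table = nn.Embedding(vocab_size, n_embd)
        self.position_embedding_table = nn.Embedding(block_size, n_embd)
        self.lm_head = nn.Linear(n_embd, vocab_size)
        # Created once; moves with the module on .to(device) and
        # is saved in state_dict, but is not a trainable parameter
        self.register_buffer("position_ids", torch.arange(block_size))

    def forward(self, idx):
        B, T = idx.shape
        tok_emb = self.token_embedding_table(idx)                        # (B, T, n_embd)
        pos_emb = self.position_embedding_table(self.position_ids[:T])   # (T, n_embd)
        x = tok_emb + pos_emb                                            # broadcasts over batch
        return self.lm_head(x)                                           # (B, T, vocab_size)

model = TinyLM()
logits = model(torch.zeros(2, 8, dtype=torch.long))
```

Note that the slice `self.position_ids[:T]` is a view, so no new tensor is allocated in the forward pass.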
Why This Helps
- Avoids repeated tensor allocation on every forward pass
- Keeps the tensor on the correct device (CPU/GPU) automatically, since buffers move with the module on `.to(device)`
- Makes it part of the model state (saved in `state_dict`, but not trainable)
- Slightly more efficient and a cleaner design for static tensors
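The `state_dict` / trainability behavior above can be checked directly with a toy module (names here are made up for the demo):

```python
import torch
import torch.nn as nn

class Demo(nn.Module):  # throwaway module for the demo
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(8, 4)                  # trainable parameter
        self.register_buffer("ids", torch.arange(8))   # constant buffer

m = Demo()
print("ids" in m.state_dict())             # True  -> saved with the model
print("ids" in dict(m.named_parameters())) # False -> not seen by the optimizer
```

If saving the indices to checkpoints is undesirable, recent PyTorch versions also accept `register_buffer(..., persistent=False)`, which keeps the device-movement behavior but excludes the buffer from `state_dict`.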
Question
Is this the recommended approach? I'm still learning, so I'd appreciate any feedback :)
If this makes sense, I can open a small PR for it.