The generate() function currently reads only the logits at the last position of the sequence, then shifts the entire input window forward by one position to generate the next token, again reading only the last position. https://github.com/karpathy/ng-video-lecture/blob/master/gpt.py#L189
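For context, the loop in question looks roughly like this (paraphrased from gpt.py; I'm assuming a model whose forward pass returns `(logits, loss)` with logits of shape `(B, T, vocab_size)`):

```python
import torch

def generate(model, idx, max_new_tokens, block_size):
    # idx: (B, T) tensor of token indices making up the current context
    for _ in range(max_new_tokens):
        # crop the context to at most the last block_size tokens
        idx_cond = idx[:, -block_size:]
        logits, _ = model(idx_cond)
        # keep only the logits at the final time step -- this is the
        # "only takes the last position" behavior being asked about
        logits = logits[:, -1, :]
        probs = torch.softmax(logits, dim=-1)
        # sample one next token and append it to the running sequence
        idx_next = torch.multinomial(probs, num_samples=1)
        idx = torch.cat((idx, idx_next), dim=1)
    return idx
```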
I was curious and looked at the entire output contents. For the input, I fed in the output of a previous run of the current generate() function, so the input token sequence would be entirely "based on the behavior of the model itself", so to speak. Then I decoded the full list of T tokens from the output. To my surprise, the output is largely gibberish and quite different from the input (though I could still spot a few matches).
I can't figure out why the current method of reading only the last output position produces seemingly fluent sequences, while the output read from the middle of the block doesn't make sense. In the current scheme, the input grows from torch.zeros((1, 1)) up to block_size, so during that period it should be no different from what an output position in the middle of the block sees: that position has masked out all input after it, so it effectively becomes the end of the window too.
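The experiment I ran can be sketched as the following one-pass readout (hypothetical helper name; same `(logits, loss)` model interface assumed as above). Note that the logits at position t are the model's prediction for token t+1 given the real tokens 0..t, so every position here is conditioned on the fed-in sequence, not on the model's own earlier picks:

```python
import torch

def decode_all_positions(model, idx):
    # idx: (1, T) token sequence fed as input
    logits, _ = model(idx)            # (1, T, vocab_size)
    # greedily pick a token at *every* position, not just the last one;
    # entry t is the model's one-step guess for what follows idx[:, :t+1]
    return torch.argmax(logits, dim=-1)  # (1, T)
```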