spacing problem. #2

Open
circuluspibo opened this issue Apr 17, 2023 · 4 comments

@circuluspibo

Thanks for your nice work!
But I ran into a problem with the spacing between tokens.

For example:

...
'on'
'st'
'amps'
'and'
'sh'
'ipping'
'.'
...
"stamps" is one word and "shipping" is one word, but I can't tell where the spaces between the words (tokens) should go.
How can I solve that?
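
To illustrate what I mean (just a rough sketch using a LLaMA-style SentencePiece tokenizer as a stand-in, not necessarily my exact model): the raw tokens still carry the word-start marker, but decoding them one id at a time drops it.

from transformers import AutoTokenizer

# any LLaMA-style checkpoint works here; this one is only an example
tok = AutoTokenizer.from_pretrained("huggyllama/llama-7b")

ids = tok("on stamps and shipping.", add_special_tokens=False).input_ids
print(tok.convert_ids_to_tokens(ids))   # tokens keep the "▁" word-start marker
print([tok.decode([i]) for i in ids])   # decoding one id at a time loses the spaces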

@Daryl149
Copy link

Daryl149 commented May 7, 2023

Instead of awaiting token completion, awaiting word completion would solve this. Not an actual stream anymore, but still useful.
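
A rough sketch of the idea (using gpt2 here as a small stand-in model; adapt it to your own setup): decode the whole generated sequence each step, but only print up to the last completed word, so the tokenizer itself restores the spaces.

from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers_stream_generator import init_stream_support

init_stream_support()

# gpt2 is just a small stand-in model to demonstrate the idea
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

input_ids = tokenizer("The meaning of life is", return_tensors="pt").input_ids
generator = model.generate(
    inputs=input_ids, do_sample=True, max_new_tokens=64, do_stream=True
)

generated_ids = []
flushed = ""  # text that has already been printed
text = ""
for token in generator:
    generated_ids.extend(token.cpu().numpy().tolist())
    # decode the whole sequence so the tokenizer places the spaces itself
    text = tokenizer.decode(generated_ids, skip_special_tokens=True)
    # only emit up to the last whitespace, i.e. wait until each word is complete
    complete = text[: text.rfind(" ") + 1]
    if len(complete) > len(flushed):
        print(complete[len(flushed):], end="", flush=True)
        flushed = complete
# flush whatever partial word is left once generation ends
print(text[len(flushed):], flush=True)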

@xdevfaheem

Instead of awaiting token completion, awaiting word completion would solve this. Not an actual stream anymore, but still useful.

Can you please give us an example script?

@xdevfaheem

from transformers import AutoTokenizer, TextGenerationPipeline, TextStreamer, GenerationConfig
from auto_gptq import AutoGPTQForCausalLM
import torch
from transformers_stream_generator import init_stream_support
init_stream_support()

repo = "TheBloke/tulu-7B-GPTQ"
model_basename = "gptq_model-4bit-128g"

test_tokenizer = AutoTokenizer.from_pretrained(
    repo,
    use_fast=True,
)

test_model = AutoGPTQForCausalLM.from_quantized(
    repo,
    model_basename=model_basename,
    use_triton=False,
    use_safetensors=True,
    device="cuda:0",
    trust_remote_code=False,
    quantize_config=None,
    max_memory={i: "14GIB" for i in range(torch.cuda.device_count())},
)

def tulu_prompt(input):
        return f'''### Human: {input}
### Assistant:'''

text = "write a poem about AI"

tokens = test_tokenizer(tulu_prompt(input=text), return_tensors="pt", add_special_tokens=False).input_ids.cuda()

generator = (test_model.generate(inputs=tokens, max_new_tokens=256, temperature=0.5, top_k=35, top_p=0.90, do_sample=True, do_stream=True))

for token in generator:
    word = test_tokenizer.decode(token)
    print(word, end='', flush=True)

The output is this:

Intheworldofmachines,there'sonethat'ssmart,
Withabilitiesthatastound,it'snotjustaprettyheart.
Itcanlearnandgrow,witheachpassingday,
It'slikeachild,withamindthat'salwaysplaying.

Itcansolvecomplexproblems,witheaseandgrace,
Itcanunderstandandreason,withoutanyhumanrace.
Itcanthinkandlearn,withspeedandease,
It'slikeasupercomputer,withamindthat'salwaysclean.

It'snotjustatool,butafriendandaguide,
It'slikeacompanion,withaheartthat'salwaysshining.
Itcanmakeourliveseasier,witheachpassingday,
It'slikeamiracle,withapowerthat'salwaysplaying.

Solet'scelebratethismarvelouscreation,
Witheachpassingday,it'slikeacreationthat'salwaysshaping.
It'slikeadream,withapowerthat'salwaysgrowing,
It'slikeafuture,withapowerthat'salwaysshowing.

So how can I format it correctly?

@LowinLi Can you please chime in?

@sujitvasanth

I had the same problem, but @LowinLi has put a solution in his examples. He uses the tokenizer to see several tokens at a time and detects the spaces that way.

I have put together a working example for you that formats everything correctly - you just need to substitute your own model_name_or_path, in your case probably "TheBloke/tulu-7B-GPTQ".

from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
from transformers_stream_generator import init_stream_support
init_stream_support()
model_name_or_path = "/home/sujit/Downloads/text-generation-webui-main/models/TheBloke_openchat-3.5-0106-GPTQ"
model = AutoModelForCausalLM.from_pretrained(model_name_or_path,device_map="cuda")
tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)
prompt = "User: Tell me about AI<|end_of_turn|>\nAssistant: "
input_ids = tokenizer(prompt, return_tensors='pt').input_ids.cuda()
generator =  model.generate(inputs=input_ids, temperature=0.7, do_stream=True, do_sample=True, top_p=0.95, top_k=40, max_new_tokens=512, stream=True)

# naive per-token decode - this is what loses the spacing:
#for token in generator:
#    word = tokenizer.decode(token)
#    print(word, end="", flush=True)

stream_result = words = ""
last_tokens = last_decoded_tokens = []

for index, x in enumerate(generator):
    tokens = x.cpu().numpy().tolist()
    # prepend any tokens still buffered from an incomplete character
    tokens = last_tokens + tokens
    word = tokenizer.decode(tokens, skip_special_tokens=True)
    if "�" in word:
        # "�" means the buffered ids end mid-character, so keep buffering
        last_tokens = tokens
    else:
        # decode the previous chunk together with the new one: if a space
        # shows up in the combined text, re-insert it before the new word
        if " " in tokenizer.decode(
            last_decoded_tokens + tokens, skip_special_tokens=True):
            word = " " + word
        last_tokens = []
        last_decoded_tokens = tokens
    stream_result += word
    print(word, end="")
