GPTZero PPL - ValueError: cannot convert float NaN to integer #11

Open
MelaniaNitu opened this issue Jul 9, 2023 · 2 comments

MelaniaNitu commented Jul 9, 2023

@BurhanUlTayyab Thanks for sharing the implementation. When running the GPTZero code, I get the following error:

/content/DetectGPT/model.py in getPPL_1(self, sentence)
    374             if end_loc == seq_len:
    375                 break
--> 376         ppl = int(torch.exp(torch.stack(nlls).sum() / end_loc))
    377         return ppl
    378 

**ValueError: cannot convert float NaN to integer**
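
For context, the message comes from Python itself: `int()` refuses a NaN float, so the value computed on line 376 is already NaN by the time it is converted. Below is a minimal sketch of a guard, assuming the value can first be turned into a plain float. `safe_int_ppl` is a hypothetical helper, not part of model.py, and it does not explain why the perplexity became NaN in the first place.

```python
import math


def safe_int_ppl(value):
    """Return int(value), or None when value is NaN or infinite.

    Hypothetical helper for illustration; not part of model.py.
    """
    value = float(value)  # also accepts a 0-dim tensor or NumPy scalar
    if not math.isfinite(value):
        return None  # caller decides whether to skip or report this text
    return int(value)
```

With something like this, the caller can test for `None` and skip the text instead of crashing.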

The code I use to test GPTZero is:

  import pandas as pd
  from model import GPT2PPLV2
  import torch

  model = GPT2PPLV2()

  res_texts = []
  max_tokens = 512

  # mylist holds the input texts; keep only those with at least 100 words
  filtered_list = [text for text in mylist if len(text.split()) >= 100]

  for text in filtered_list:
      input_text = text[:max_tokens]  # note: this slices characters, not tokens
      result = model(input_text, 300, "v1")
      res_texts.append(result)

I have pre-processed the input text to handle NaN values and empty lines as shown below, but I still get this error when running the GPTZero model.

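import re
import numpy as np
df['text'] = df['text'].fillna('')
df['text'] = df['text'].apply(lambda x: re.sub(r'\n\s*\n', '\n', x.strip()) if isinstance(x, str) else np.nan)
# the next line turns every non-string (including NaN) into '', so no NaN remains afterwards
df['text'] = df['text'].apply(lambda x: x.strip().replace('\n\n', '\n') if isinstance(x, str) else '')
# dropna therefore removes nothing here; empty strings stay in the column
new_df = df.dropna(subset=['text'])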
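
A plain-Python version of the same cleaning step may be easier to check. This is only a sketch: `clean_texts` is a hypothetical helper, and it assumes the goal is to drop non-string values (such as NaN) and blank entries while collapsing runs of blank lines.

```python
import re


def clean_texts(values):
    """Return cleaned strings, dropping non-strings and blank entries."""
    cleaned = []
    for value in values:
        if not isinstance(value, str):
            continue  # skips NaN (a float) and None
        # collapse blank lines (including whitespace-only ones) into one newline
        text = re.sub(r"\n\s*\n", "\n", value.strip())
        if text:  # drop entries that are empty after stripping
            cleaned.append(text)
    return cleaned
```

It could then be used as `mylist = clean_texts(df['text'].tolist())` before the word-count filter.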

Can you please change the model.py code to handle NaN or provide a workaround to "skip" any line containing NaN when running the model?
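
Until model.py is changed, one possible workaround on the calling side is to catch the `ValueError` and move on. This is a rough sketch; `score_all` and `score_fn` are hypothetical names, and it assumes the model raises `ValueError` only for texts like the one above.

```python
def score_all(score_fn, texts):
    """Apply score_fn to each text, skipping those that raise ValueError.

    Returns (results, skipped), where skipped holds (index, message) pairs.
    """
    results = []
    skipped = []
    for index, text in enumerate(texts):
        try:
            results.append(score_fn(text))
        except ValueError as err:
            # e.g. "cannot convert float NaN to integer"
            skipped.append((index, str(err)))
    return results, skipped
```

For the loop above this would be `res_texts, skipped = score_all(lambda t: model(t, 300, "v1"), filtered_list)`, and `skipped` shows which inputs triggered the error.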

Thanks in advance.

MelaniaNitu changed the title from "GPTZero PPL error" to "GPTZero PPL - ValueError: cannot convert float NaN to integer" on Jul 9, 2023

@nick-tonjum

Any solution yet? I'm also facing this.

@Kanishk-Kumar

Hi @BurhanUlTayyab, any ideas?
