Conversation

@xiaoxin01
Contributor

Large files can cause excessive memory use, so this adds a char/page limit function.

@xiaoxin01
Contributor Author

@jonathaningram Could you please help review this PR?

Because some PDF or DOCX files have many pages and take a long time to parse, I think a limit is needed.

@jonathaningram
Member

@xiaoxin01 thank you for the PR. We'll take a look over it and get back to you as soon as possible. This PR changes the API surface of the package a fair bit, so we'll want to look into those changes and discuss them here.

t.Fatalf("got error = %v, want nil", err)
}

resp, _, err := ConvertPDF(f)
Member


@xiaoxin01 I think the best approach for supporting this kind of configuration is to introduce new Converters. Something like:

package docconv

import "io"

type PDFConverter struct {
	pageRange []int
	maxWords  int
}

func NewPDFConverter(pageRange []int, maxWords int) *PDFConverter {
	return &PDFConverter{
		pageRange: pageRange,
		maxWords:  maxWords,
	}
}

func (c *PDFConverter) Convert(r io.Reader) (string, map[string]string, error) {
	// ...
}

We want to avoid any more global package state, so I think a solution like this is probably going to be better. What do you think?
