Newlines in Tfidf vectorizer corpus cause runtime exceptions when loading a trained vectorizer #263

## Description
When using the [PECOS Tfidf vectorizer](https://github.com/amzn/pecos/blob/mainline/pecos/utils/featurization/text/vectorizers.py), training it on a corpus that contains newlines produces a model that cannot be loaded after it is saved. The vocab file is parsed using newlines to [delimit (index, vocab) pairs](https://github.com/amzn/pecos/blob/mainline/pecos/core/utils/tfidf.hpp#L363-L386), so if a vocab entry itself contains a newline, that entry spans multiple lines and loading crashes.
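To illustrate the failure mode, here is a minimal Python sketch of a newline-delimited vocab file. It assumes a simplified layout of one `index<TAB>token` pair per line; `save_vocab` and `load_vocab` are hypothetical helpers, and the actual format and parser in `tfidf.hpp` may differ in detail. The point is only that a parser which treats every newline as an entry boundary cannot recover a token such as `test\ncorpus`.

```python
# Hypothetical, simplified model of a newline-delimited vocab file.
# ASSUMPTION: one "index<TAB>token" pair per line; the real format and
# parser in pecos/core/utils/tfidf.hpp may differ in detail.


def save_vocab(vocab):
    """Serialize a {token: index} dict as one 'index<TAB>token' line per entry."""
    entries = sorted(vocab.items(), key=lambda kv: kv[1])
    return "".join(f"{idx}\t{tok}\n" for tok, idx in entries)


def load_vocab(text):
    """Parse the text back, treating every newline as an entry boundary."""
    vocab = {}
    for line in text.rstrip("\n").split("\n"):
        idx, sep, tok = line.partition("\t")
        # A line without a numeric index and a tab cannot be a valid entry.
        if not sep or not idx.isdigit():
            raise RuntimeError("Corrupted vocab file.")
        vocab[tok] = int(idx)
    return vocab


# Tokens without newlines round-trip correctly.
print(load_vocab(save_vocab({"test": 0, "corpus": 1})))  # {'test': 0, 'corpus': 1}

# A token containing "\n" is written across two lines and cannot be read back.
try:
    load_vocab(save_vocab({"test\ncorpus": 0}))
except RuntimeError as err:
    print("load failed:", err)  # load failed: Corrupted vocab file.
```

Saving `{"test\ncorpus": 0}` yields the lines `0<TAB>test` and `corpus`; the second line has no index, so the parser reports a corrupted file, analogous to the `Corrupted vocab file.` error in this report.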

## How to Reproduce?
Using the latest version of libpecos.

### Steps to reproduce

```python
Python 3.10.8 | packaged by conda-forge | (main, Nov 22 2022, 08:23:14) [GCC 10.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from pecos.utils.featurization.text.vectorizers import Tfidf
>>> vectorizer = Tfidf()
>>> trained = vectorizer.train(["test\ncorpus"], config={'ngram_range':(1,1)})
>>> trained.save('test')
>>> Tfidf.load('test')
terminate called after throwing an instance of 'std::runtime_error'
  what():  Corrupted vocab file.
Aborted
```

## What have you tried to solve it?

1. This can be worked around by cleaning the input, but it may be preferable to handle this case internally so that corpora in which newlines matter do not break the vectorizer.
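Both options can be sketched in Python. `clean_corpus` is a hypothetical helper for the workaround: it collapses line breaks into a single space before training, and its output could be passed to `Tfidf.train` in place of the raw corpus used in the reproduction. `escape_token` and `unescape_token` are also hypothetical and only illustrate one possible internal fix (escape tokens when the vocab file is written, unescape them when it is read). A real fix would live in the C++ save/load code, and the PECOS maintainers may prefer a different approach.

```python
import re


def clean_corpus(corpus):
    """Workaround (hypothetical helper): replace each run of line breaks with a space."""
    return [re.sub(r"[\r\n]+", " ", doc) for doc in corpus]


# Hypothetical internal fix: escape delimiter characters so that every
# vocab entry occupies exactly one line in the saved file.
_ESCAPES = {"\\": "\\\\", "\n": "\\n", "\r": "\\r", "\t": "\\t"}
# Reverse map from the character after the backslash: "n" -> "\n", "\\" -> "\\", ...
_UNESCAPES = {esc[1]: raw for raw, esc in _ESCAPES.items()}


def escape_token(token):
    """Encode a token so it contains no newline, carriage return, or tab."""
    return "".join(_ESCAPES.get(ch, ch) for ch in token)


def unescape_token(text):
    """Invert escape_token; raise ValueError on an unknown escape sequence."""
    out = []
    chars = iter(text)
    for ch in chars:
        if ch == "\\":
            nxt = next(chars, None)
            if nxt not in _UNESCAPES:
                raise ValueError(f"invalid escape sequence in {text!r}")
            out.append(_UNESCAPES[nxt])
        else:
            out.append(ch)
    return "".join(out)


print(clean_corpus(["test\ncorpus"]))  # ['test corpus']
escaped = escape_token("test\ncorpus")
print(escaped)  # test\ncorpus  (a literal backslash and "n", no line break)
print(unescape_token(escaped) == "test\ncorpus")  # True
```

Cleaning is the simpler option but changes the features, since `test\ncorpus` becomes `test corpus`. Escaping keeps the original tokens intact at the cost of a small change to the vocab file format, and the backslash itself must be escaped so that tokens which already contain `\n` as literal text still round-trip.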

## Error message or code output

```
terminate called after throwing an instance of 'std::runtime_error'
  what():  Corrupted vocab file.
Aborted
```

## Environment
- Operating system: Linux
- Python version: 3.10
- PECOS version: 1.2

