Thank you for reaching out and helping us improve Vaex!
Before you submit a new Issue, please read through the documentation. Also, make sure you search through the Open and Closed Issues - your problem may already be discussed or addressed.
Description
Please provide a clear and concise description of the problem. This should contain all the steps needed to reproduce the problem. A minimal code example that exposes the problem is much appreciated.
Software information
- Vaex version (import vaex; vaex.__version__): 4.16.1
- Vaex was installed via: pip
- OS: Linux (colab)
Additional information
If you run this on a memory-limited machine such as the free tier of Google Colab, exporting to hdf5 crashes with an OOM error, even though exporting to arrow works fine. The string column has to be converted to large_string because of a pyarrow issue: https://issues.apache.org/jira/browse/ARROW-17828
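import vaex
import pyarrow as pa

df = vaex.example()
# Virtual constant column: every row holds the same 50,000-character string
df["text"] = vaex.vconstant("OHYEA" * 10000, len(df))

@vaex.register_function()
def to_large(arr):
    # Cast string to large_string (64-bit offsets), needed because of ARROW-17828
    return arr.cast(pa.large_string())

df["text"] = df["text"].to_large()

# Crashes with OOM on Colab free
df.export("file.hdf5")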
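
For a sense of scale, here is a rough back-of-the-envelope estimate of how large the text column would be if it were fully materialized during export. This is only a sketch, not a diagnosis of where the hdf5 exporter allocates memory. ASSUMED_ROWS is my assumption for the row count of vaex.example() (substitute len(df) from your own session), and the names below are just illustrative.

```python
# Rough size estimate for the "text" column in the reproduction above.
BYTES_PER_VALUE = len("OHYEA" * 10000)  # 5 ASCII chars * 10000 = 50,000 bytes per row
ASSUMED_ROWS = 330_000                  # assumption: row count of vaex.example(); use len(df) to check
INT32_MAX = 2**31 - 1                   # largest offset in an Arrow string array (32-bit offsets)

# Total string data if every row is written out in full
total_bytes = BYTES_PER_VALUE * ASSUMED_ROWS

# How many such rows fit before 32-bit offsets overflow
rows_fitting_int32 = INT32_MAX // BYTES_PER_VALUE

print(f"bytes per value: {BYTES_PER_VALUE:,}")
print(f"materialized column: {total_bytes / 1e9:.1f} GB")
print(f"rows that fit under 32-bit offsets: {rows_fitting_int32:,}")
```

Under that row-count assumption the column comes to about 16.5 GB of string data, and a single array with 32-bit offsets can hold only about 43 thousand such rows, which is consistent with needing large_string here.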