pyarrow

Here are 127 public repositories matching this topic...

vaexio / vaex

Out-of-Core hybrid Apache Arrow/NumPy DataFrame for Python, ML, visualization and exploration of big tabular data at a billion rows per second 🚀

visualization python data-science machine-learning bigdata tabular-data hdf5 machinelearning dataframe memory-mapped-file pyarrow

Updated Apr 1, 2026
Python

ibis-project / ibis

the portable Python dataframe library

mysql python bigquery sql database clickhouse sqlite impala postgresql snowflake pandas pyspark mssql trino pyarrow datafusion duckdb polars

Updated Jun 19, 2026
Python

Petastorm library enables single machine or distributed training and evaluation of deep learning models from datasets in Apache Parquet format. It supports ML frameworks such as TensorFlow, PyTorch, and PySpark and can be used from pure Python code.

machine-learning deep-learning tensorflow pytorch pyspark parquet parquet-files sysml pyarrow

Updated Jan 2, 2026
Python

narwhals-dev / narwhals

Lightweight and extensible compatibility layer between dataframe libraries!

pandas pyspark dask ibis pyarrow cudf duckdb polars

Updated Jun 19, 2026
Python

gizmodata / gizmosql

🚀 GizmoSQL — High-Performance Database Server

tls sqlalchemy sql database jdbc sqlite databases sqlite3 jwt-authentication ibis apache-arrow pyarrow duckdb apache-arrow-flight-sql adbc apache-arrow-flight gizmosql gizmodata

Updated Jun 18, 2026
C++

wheretrue / biobear

Work with bioinformatic files using Arrow, Polars, and/or DuckDB

python bioinformatics biology arrow biopython samtools pyarrow rust-bio duckdb polars

Updated Mar 10, 2025
Rust

dacort / faker-cli

Command-line interface to quickly generate fake CSV and JSON data

aws json csv parquet faker-provider pyarrow deltalake

Updated Jul 11, 2024
Python

vertti / daffy

Lightweight DataFrame validation decorators for Pandas, Polars, Modin, and PyArrow. No custom types required.

python validation data-validation pandas decorator dataframe python-decorator data-quality runtime-validation pyarrow modin pydantic dataframe-schema polars narwhals dataframe-validation

Updated Jun 19, 2026
Python

zen-xu / pyarrow-stubs

Type annotations for pyarrow

typing pyarrow

Updated Jun 15, 2026
Python

RandomFractals / chicago-crimes

Exploring Chicago crimes dataset with Jupyter notebooks, DuckDB, Malloy and new Panel/PyScript data and dashboard tools.

julia parquet jupyter-notebooks chicago pyarrow crimes duckdb polars large-csv malloy malloydata

Updated Jan 29, 2023
Jupyter Notebook

kraina-ai / overturemaestro

An open-source tool for reading OvertureMaps data with multiprocessing and additional Quality-of-Life features

python open-source openstreetmap geo geospatial pyarrow overturemaps overture-maps

Updated Jun 1, 2026
Python

Genentech / pysummaries

Generate beautiful summary tables from pandas, polars or pyarrow dataframes

python pandas-dataframe pandas tables clinical-research summary-statistics tableone table1 pyarrow polars polars-dataframe real-world-data-analysis

Updated May 13, 2026
Python

ashvardanian / StringTape

Apache Arrow-compatible space-efficient "tape" class in pure Rust to be used with StringZilla for GPU, NUMA, and disk transfers of variable length strings

arrow tape allocator string-manipulation apache-arrow pyarrow

Updated Nov 21, 2025
Rust

thread53 / pqviewer

View Apache Parquet Files In Your Terminal

python cli terminal textual parquet pyarrow

Updated Mar 31, 2025
Python

icaropires / pdf2dataset

Converts a whole subdirectory with a big (or small) volume of PDF documents to a dataset (pandas DataFrame) with error tracking and choice of features