🔥 Blazingly fast DataFrames for Ruby, powered by Polars
Add this line to your application’s Gemfile:
gem "polars-df"

This library follows the Polars Python API.
Polars.read_csv("iris.csv")
  .lazy
  .filter(Polars.col("sepal_length") > 5)
  .groupby("species")
  .agg(Polars.all.sum)
  .collect

You can follow Polars tutorials and convert the code to Ruby in many cases. Feel free to open an issue if you run into problems.
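To make the steps of the lazy query above easier to follow, here is a plain-Ruby sketch of what it computes. It does not use Polars at all: `filter_and_sum` is a hypothetical helper, and the sample rows are made up to stand in for `iris.csv`. Treat it as a reading aid under those assumptions, not as a description of how Polars executes the query.

```ruby
# Plain-Ruby mirror of the lazy query (hypothetical helper, not part of polars-df)
def filter_and_sum(rows)
  # .filter(Polars.col("sepal_length") > 5)
  kept = rows.select { |row| row[:sepal_length] > 5 }

  # .groupby("species").agg(Polars.all.sum): sum every non-key column per group
  kept.group_by { |row| row[:species] }.map do |species, group|
    totals = (group.first.keys - [:species]).to_h do |column|
      [column, group.sum { |row| row[column] }]
    end
    {species: species}.merge(totals)
  end
end

# Made-up rows standing in for iris.csv (values are illustrative only)
sample = [
  {species: "setosa",     sepal_length: 5.5, sepal_width: 3.5},
  {species: "setosa",     sepal_length: 4.5, sepal_width: 3.0},
  {species: "versicolor", sepal_length: 7.0, sepal_width: 3.0},
  {species: "versicolor", sepal_length: 6.5, sepal_width: 2.5},
  {species: "virginica",  sepal_length: 6.0, sepal_width: 3.0}
]

filter_and_sum(sample).each { |row| p row }
```

The sketch keeps groups in first-seen order because Ruby's `group_by` does; Polars may return the groups in a different order.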
From a CSV
Polars.read_csv("file.csv")
# or lazily with
Polars.scan_csv("file.csv")

From Parquet
Polars.read_parquet("file.parquet")
# or lazily with
Polars.scan_parquet("file.parquet")

From Active Record
Polars.read_database(User.all)
# or
Polars.read_database("SELECT * FROM users")

From JSON
Polars.read_json("file.json")
# or
Polars.read_ndjson("file.ndjson")
# or lazily with
Polars.scan_ndjson("file.ndjson")

From Feather / Arrow IPC
Polars.read_ipc("file.arrow")
# or lazily with
Polars.scan_ipc("file.arrow")

From Avro
Polars.read_avro("file.avro")

From a hash
Polars::DataFrame.new({
  a: [1, 2, 3],
  b: ["one", "two", "three"]
})

From an array of hashes
Polars::DataFrame.new([
  {a: 1, b: "one"},
  {a: 2, b: "two"},
  {a: 3, b: "three"}
])

From an array of series
Polars::DataFrame.new([
  Polars::Series.new("a", [1, 2, 3]),
  Polars::Series.new("b", ["one", "two", "three"])
])

Get number of rows
df.height

Get column names
df.columns

Check if a column exists
df.include?(name)

Select a column
df["a"]

Select multiple columns
df[["a", "b"]]

Select first rows
df.head

Select last rows
df.tail

Filter on a condition
df[Polars.col("a") == 2]
df[Polars.col("a") != 2]
df[Polars.col("a") > 2]
df[Polars.col("a") >= 2]
df[Polars.col("a") < 2]
df[Polars.col("a") <= 2]

And, or, and exclusive or
df[(Polars.col("a") > 1) & (Polars.col("b") == "two")] # and
df[(Polars.col("a") > 1) | (Polars.col("b") == "two")] # or
df[(Polars.col("a") > 1) ^ (Polars.col("b") == "two")] # xor

Basic operations
df["a"] + 5
df["a"] - 5
df["a"] * 5
df["a"] / 5
df["a"] % 5
df["a"] ** 2
df["a"].sqrt
df["a"].abs

Rounding
df["a"].round(2)
df["a"].ceil
df["a"].floor

Logarithm
df["a"].log # natural log
df["a"].log(10)

Exponentiation
df["a"].exp

Trigonometric functions
df["a"].sin
df["a"].cos
df["a"].tan
df["a"].asin
df["a"].acos
df["a"].atan

Hyperbolic functions
df["a"].sinh
df["a"].cosh
df["a"].tanh
df["a"].asinh
df["a"].acosh
df["a"].atanh

Summary statistics
df["a"].sum
df["a"].mean
df["a"].median
df["a"].quantile(0.90)
df["a"].min
df["a"].max
df["a"].std
df["a"].var

Group
df.groupby("a").count

Works with all summary statistics
df.groupby("a").max

Multiple groups
df.groupby(["a", "b"]).count

Add rows
df.vstack(other_df)

Add columns
df.hstack(other_df)

Inner join
df.join(other_df, on: "a")

Left join
df.join(other_df, on: "a", how: "left")

One-hot encoding
df.to_dummies

Array of hashes
df.rows(named: true)

Hash of series
df.to_h

CSV
df.to_csv
# or
df.write_csv("file.csv")

Parquet
df.write_parquet("file.parquet")

Numo array
df.to_numo

You can specify column types when creating a data frame
Polars::DataFrame.new(data, schema: {"a" => Polars::Int32, "b" => Polars::Float32})

Supported types are:
- boolean - Boolean
- float - Float64, Float32
- integer - Int64, Int32, Int16, Int8
- unsigned integer - UInt64, UInt32, UInt16, UInt8
- string - Utf8, Binary, Categorical
- temporal - Date, Datetime, Time, Duration
- other - Object, List, Struct, Array [unreleased]
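When choosing between the integer types above, it can help to see the value range each width covers. The sketch below is plain Ruby arithmetic and does not call Polars; the helper names (`signed_range`, `unsigned_range`, `narrowest_signed_type`) are hypothetical, and the type names appear only as strings taken from the list. It assumes the usual two's-complement ranges for fixed-width integers.

```ruby
# Inclusive range of a two's-complement signed integer with the given bit width
def signed_range(bits)
  (-(2**(bits - 1)))..(2**(bits - 1) - 1)
end

# Inclusive range of an unsigned integer with the given bit width
def unsigned_range(bits)
  0..(2**bits - 1)
end

# Name of the narrowest signed type that holds every value, or nil if none does
# (hypothetical helper; the names mirror the signed integer types listed above)
def narrowest_signed_type(values)
  widths = {"Int8" => 8, "Int16" => 16, "Int32" => 32, "Int64" => 64}
  match = widths.find do |_name, bits|
    values.all? { |value| signed_range(bits).cover?(value) }
  end
  match&.first
end

p signed_range(8)                        # -128..127
p unsigned_range(8)                      # 0..255
p narrowest_signed_type([1, 2, 300])     # "Int16"
```

A narrower type uses less memory per value, but arithmetic that exceeds its range will not fit, so pick a width with headroom for the results you expect.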
Get column types
df.schema

For a specific column
df["a"].dtype

Cast a column
df["a"].cast(Polars::Int32)

Add Vega to your application’s Gemfile:
gem "vega"

And use:
df.plot("a", "b")

Specify the chart type (line, pie, column, bar, area, or scatter)
df.plot("a", "b", type: "pie")

Group data
df.groupby("c").plot("a", "b")

Stacked columns or bars
df.groupby("c").plot("a", "b", stacked: true)

View the changelog
Everyone is encouraged to help improve this project. Here are a few ways you can help:
- Report bugs
- Fix bugs and submit pull requests
- Write, clarify, or fix documentation
- Suggest or add new features
To get started with development:
git clone https://github.com/ankane/polars-ruby.git
cd polars-ruby
bundle install
bundle exec rake compile
bundle exec rake test