Dataset API proposal
As part of the Incanter and core.matrix integration process, the idea is to evolve the dataset type that already exists in core.matrix and use it in Incanter.
To do that, Incanter's dataset functions should be implemented in core.matrix.
A dataset is a matrix, i.e. it implements all matrix protocols.
Dataset columns support heterogeneous data types and are uniquely identified by a name of arbitrary type. An attempt to create a column with a duplicate name should raise an exception.
By default, column names are incrementing Long values starting from 0, i.e. 0, 1, 2, etc.
A dataset is not seqable. To get a seq of rows, use clojure.core.matrix/rows.
(dataset column-names cols)
(dataset m)
Creates a dataset from one of:
column names and a seq of columns;
a map of column names to their lists of values;
a matrix, whose columns become the dataset columns, named with incrementing Long values starting from 0 (0, 1, 2, etc.).
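The three construction forms can be sketched in plain Clojure. This is only an illustrative model, not the core.matrix implementation: it assumes a dataset is represented as a map with hypothetical `:column-names` and `:columns` keys, and that a matrix is given as a vector of row vectors.

```clojure
;; Hypothetical model of the proposed constructor (not core.matrix code).
;; A dataset is modelled as {:column-names [...] :columns [[...] [...]]}.
(defn dataset
  "Builds a dataset from column names plus a seq of columns, from a map of
   column name -> values, or from a matrix given as a vector of row vectors."
  ([column-names cols]
   ;; Column names must be unique, as the proposal requires.
   (when (and (seq column-names) (not (apply distinct? column-names)))
     (throw (ex-info "Duplicate column name" {:column-names column-names})))
   {:column-names (vec column-names)
    :columns      (mapv vec cols)})
  ([m]
   (if (map? m)
     ;; Map case: keys become column names, values become columns.
     (dataset (keys m) (vals m))
     ;; Matrix case: transpose rows into columns and name them 0, 1, 2, ...
     (let [cols (apply mapv vector m)]
       (dataset (range (count cols)) cols)))))

(dataset [[1 2] [3 4]])
;; => {:column-names [0 1], :columns [[1 3] [2 4]]}
```

The sketch stores data column by column because the proposal describes columns, not rows, as the unit that carries a name and a data type.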
(column-names ds)
Returns a persistent vector containing column names in the same order as they are placed in the dataset.
(column-name ds idx)
Returns the column name at the given index.
(select-columns ds cols)
Produces a new dataset with the columns in the specified order.
cols is a collection of column names to be selected.
(except-columns ds cols)
Returns a new dataset with all columns except those specified.
cols is a collection of column names to be excluded.
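A minimal sketch of how selection and exclusion relate, again on a hypothetical map-based model (`:column-names` and `:columns` are assumed keys, not part of the proposal). Throwing on an unknown column name is also an assumption; the proposal does not say what should happen in that case.

```clojure
;; Hypothetical model: a dataset is {:column-names [...] :columns [[...] ...]}.
(defn select-columns
  "Returns a new dataset holding only `cols`, in the order given."
  [ds cols]
  (let [index (zipmap (:column-names ds) (:columns ds))]
    ;; Assumption: selecting a column that does not exist is an error.
    (doseq [c cols]
      (when-not (contains? index c)
        (throw (ex-info "Unknown column" {:column c}))))
    {:column-names (vec cols)
     :columns      (mapv index cols)}))

(defn except-columns
  "Returns a new dataset with every column except those named in `cols`."
  [ds cols]
  (let [excluded (set cols)]
    ;; contains? is used so that nil or false column names are handled too.
    (select-columns ds (remove #(contains? excluded %) (:column-names ds)))))

(def example {:column-names [:a :b :c] :columns [[1 2] [3 4] [5 6]]})

(select-columns example [:c :a])
;; => {:column-names [:c :a], :columns [[5 6] [1 2]]}
(except-columns example [:b])
;; => {:column-names [:a :c], :columns [[1 2] [5 6]]}
```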
(merge-columns & args)
Returns a dataset created by combining columns of the given datasets.
In case of columns with duplicate names, an exception is raised.
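The duplicate-name rule for merging can be illustrated with a small sketch. It assumes the same hypothetical map-based model (`:column-names`, `:columns`) and uses `ex-info` as the exception type, which the proposal does not specify.

```clojure
;; Hypothetical model: a dataset is {:column-names [...] :columns [[...] ...]}.
(defn merge-columns
  "Combines the columns of all given datasets, left to right.
   Throws if any column name occurs more than once."
  [& datasets]
  (let [names (vec (mapcat :column-names datasets))]
    (when (and (seq names) (not (apply distinct? names)))
      (throw (ex-info "Duplicate column name" {:column-names names})))
    {:column-names names
     :columns      (vec (mapcat :columns datasets))}))

(merge-columns {:column-names [:a] :columns [[1 2]]}
               {:column-names [:b] :columns [[3 4]]})
;; => {:column-names [:a :b], :columns [[1 2] [3 4]]}
```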
(add-column ds col)
Adds a column to the dataset.
If a column with the same name already exists in the dataset, an exception is raised.
(rename-columns ds col-map)
Renames columns based on a map from old column names to new column names.
If a column with the same name already exists in the dataset, an exception is raised.
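Renaming can be sketched as a lookup over the existing names followed by the same uniqueness check. The model (`:column-names`, `:columns`) and the choice of `ex-info` are assumptions made for illustration only.

```clojure
;; Hypothetical model: a dataset is {:column-names [...] :columns [[...] ...]}.
(defn rename-columns
  "Renames columns according to `col-map` (old name -> new name).
   Names absent from `col-map` are kept. Throws if the result has duplicates."
  [ds col-map]
  (let [new-names (mapv #(get col-map % %) (:column-names ds))]
    (when (and (seq new-names) (not (apply distinct? new-names)))
      (throw (ex-info "Duplicate column name" {:column-names new-names})))
    (assoc ds :column-names new-names)))

(rename-columns {:column-names [:a :b] :columns [[1 2] [3 4]]} {:a :x})
;; => {:column-names [:x :b], :columns [[1 2] [3 4]]}
```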
(replace-column ds col-name vs)
Replaces a column in the dataset with new values.
(update-column ds col-name f & args)
Applies the function f, with any additional args, to the specified column of the dataset and replaces the column with the resulting values.
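One possible reading of these two functions is sketched below. The proposal does not say whether `f` receives the whole column or each element; this sketch assumes `f` is called with the whole column followed by `args`, in the style of `clojure.core/update`. The map-based model and the `column-index` helper are hypothetical.

```clojure
;; Hypothetical model: a dataset is {:column-names [...] :columns [[...] ...]}.
(defn- column-index
  "Returns the position of `col-name` in the dataset, or nil if absent."
  [ds col-name]
  (first (keep-indexed (fn [i n] (when (= n col-name) i))
                       (:column-names ds))))

(defn replace-column
  "Replaces the values of column `col-name` with `vs`."
  [ds col-name vs]
  (if-let [idx (column-index ds col-name)]
    (assoc-in ds [:columns idx] (vec vs))
    (throw (ex-info "Unknown column" {:column col-name}))))

(defn update-column
  "Assumption: calls (f column & args) and stores the result as the new column."
  [ds col-name f & args]
  (if-let [idx (column-index ds col-name)]
    (replace-column ds col-name (apply f (get-in ds [:columns idx]) args))
    (throw (ex-info "Unknown column" {:column col-name}))))

(update-column {:column-names [:a :b] :columns [[1 2] [3 4]]}
               :a
               (fn [col n] (mapv #(* n %) col))
               10)
;; => {:column-names [:a :b], :columns [[10 20] [3 4]]}
```

Under the element-wise reading, the call site would instead pass a function of a single value, and the body would map it over the column.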
(get-row ds idx)
Returns the row at the given index.
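Because columns carry the names and types, a row has to be assembled by taking one value from each column. The sketch below shows this on a hypothetical column-oriented model (`:column-names`, `:columns`); the `rows` helper here is a stand-in for illustration and is not `clojure.core.matrix/rows`.

```clojure
;; Hypothetical model: a dataset is {:column-names [...] :columns [[...] ...]}.
(defn get-row
  "Returns the row at `idx` as a vector, one value per column."
  [ds idx]
  (mapv #(nth % idx) (:columns ds)))

(defn rows
  "Returns a lazy seq of all rows (illustrative stand-in only)."
  [ds]
  (let [n (count (first (:columns ds)))]
    (map #(get-row ds %) (range n))))

(def example {:column-names [:id :label] :columns [[1 2] ["x" "y"]]})

(get-row example 1)
;; => [2 "y"]
```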
(conj-rows & args)
Returns a dataset created by combining the rows of the given datasets and/or collections.
(to-matrix ds)
Creates a matrix from the dataset.
(to-map ds)
Returns a map of column names to their lists of values.
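Both conversions can be sketched on the same hypothetical column-oriented model. The matrix is represented here as a vector of row vectors, which is an assumption; a real implementation would return whatever matrix type core.matrix is configured to use.

```clojure
;; Hypothetical model: a dataset is {:column-names [...] :columns [[...] ...]}.
(defn to-map
  "Returns a map of column name -> vector of that column's values."
  [ds]
  (zipmap (:column-names ds) (:columns ds)))

(defn to-matrix
  "Returns the data as a vector of row vectors (assumed matrix representation)."
  [ds]
  (let [cols (:columns ds)]
    (if (empty? cols)
      []
      ;; Transpose columns into rows.
      (apply mapv vector cols))))

(def example {:column-names [:a :b] :columns [[1 2] [3 4]]})

(to-map example)
;; => {:a [1 2], :b [3 4]}
(to-matrix example)
;; => [[1 3] [2 4]]
```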
(get-element ds c r)
Returns the element at the given column and row.
(group-by ds cols)
Returns a map of datasets, where keys are grouping columns.
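The shape of the grouping keys is not spelled out above. The sketch below assumes each key is a map from grouping column name to the value shared by the group; that is only one possible interpretation. The function is named `group-by-columns` (a hypothetical name) to avoid shadowing `clojure.core/group-by`, and the map-based model is again an assumption.

```clojure
;; Hypothetical model: a dataset is {:column-names [...] :columns [[...] ...]}.
(defn group-by-columns
  "Splits `ds` into one dataset per distinct combination of values in `cols`.
   Assumption: each key is a map of grouping column name -> value."
  [ds cols]
  (let [names  (:column-names ds)
        rows   (if (empty? (:columns ds))
                 []
                 (apply map vector (:columns ds)))
        key-of (fn [row] (select-keys (zipmap names row) cols))]
    (into {}
          (for [[k group-rows] (group-by key-of rows)]
            ;; Transpose each group's rows back into columns.
            [k {:column-names names
                :columns      (apply mapv vector group-rows)}]))))

(def example {:column-names [:k :v] :columns [[:x :y :x] [1 2 3]]})

(get (group-by-columns example [:k]) {:k :x})
;; => {:column-names [:k :v], :columns [[:x :x] [1 3]]}
```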
(join ds & args)
Returns a dataset created by right-joining two or more datasets.