KhiopsML · ElouenGinat · Jun 23, 2026 · Jun 23, 2026 · Jun 26, 2026 · Jun 26, 2026
diff --git a/README.md b/README.md
@@ -10,29 +10,12 @@
 
 Khisto is a Python library for creating histograms using the **Khiops optimal binning algorithm**. Unlike standard histograms that use fixed-width bins or simple heuristics, Khisto automatically determines the optimal number of bins and their variable widths to best represent the underlying data distribution.
 
-## Features
-
-- **Optimal Binning**: Uses the MODL (Minimum Description Length) principle to find the best discretization.
-- **Variable-Width Bins**: Captures dense regions with fine bins and sparse regions with wider bins.
-- **NumPy Compatible**: Drop-in replacement for `numpy.histogram`.
-- **Matplotlib Integration**: `khisto.matplotlib.hist` works like `plt.hist`.
-- **Core Histogram API**: Inspect every available granularity with `khisto.core.compute_histograms` and `HistogramResult`.
-- **Minimal Dependencies**: Only requires NumPy (matplotlib optional for plotting).
+Documentation is available at **[khiopsml.github.io/khisto-python](https://khiopsml.github.io/khisto-python/)**.
 
 | Standard Gaussian | Heavy-tailed Pareto |
 | --- | --- |
 | ![Adaptive Gaussian histogram](docs/images/gaussian-quick-start.png) | ![Adaptive Pareto histogram](docs/images/pareto-quick-start.png) |
 
-## Reproducing The Example Distributions
-
-The complete runnable script is available in `scripts/generate_distribution_examples.py`.
-
-Run it from the repository root to regenerate both example distributions and the figure files used in this README:
-
-```bash
-python scripts/generate_distribution_examples.py
-```
-
 ## Installation
 
 ```bash
@@ -47,85 +30,28 @@ pip install "khisto[matplotlib]"
 
 ## Quick Start
 
-### NumPy-like API
-
-```python
-import numpy as np
-from khisto import histogram
-
-# Generate 10,000 samples from a standard Gaussian distribution.
-data = np.random.normal(0, 1, 10000)
-
-# Compute optimal histogram (drop-in replacement for np.histogram)
-hist, bin_edges = histogram(data)
-
-# With density normalization
-density, bin_edges = histogram(data, density=True)
-
-# Limit maximum number of bins
-hist, bin_edges = histogram(data, max_bins=10)
-
-# Specify range
-hist, bin_edges = histogram(data, range=(-2, 2))
-```
-
-Using 10,000 samples keeps the adaptive refinement visible while remaining fast to compute.
-
-Heavy-tailed example:
-
 ```python
 import numpy as np
 import matplotlib.pyplot as plt
 from khisto.matplotlib import hist
 
-# Generate 10,000 samples from a Pareto distribution, shifted to start at 1 for better log-log visualization
-shape = 3
-long_tail_data = np.random.pareto(shape, size=10000) + 1
+# Generate 10,000 samples from a Pareto distribution
+long_tail_data = np.random.pareto(3, size=10000)
 
 # Plot an adaptive histogram on logarithmic axes.
 n, bins, patches = hist(long_tail_data, density=True)
-plt.xscale("log")
+plt.xscale("symlog")
 plt.yscale("log")
 plt.show()
-```
-
-### Matplotlib Integration
 
-```python
-import numpy as np
-import matplotlib.pyplot as plt
-from khisto.matplotlib import hist
-
-# Generate 10,000 samples from a standard Gaussian distribution.
-data = np.random.normal(0, 1, 10000)
-
-# Density is usually the most interpretable view with variable-width bins.
-n, bins, patches = hist(data, density=True)
-plt.xlabel('Value')
-plt.ylabel('Density')
-plt.show()
+# Generate 10,000 samples from a Normal distribution
+normal_data = np.random.normal(size=10000)
 
-# Cumulative density follows matplotlib semantics.
-n, bins, patches = hist(data, density=True, cumulative=True)
-plt.ylabel('Cumulative probability')
+# Plot an adaptive histogram
+n, bins, patches = hist(normal_data, density=True)
 plt.show()
 ```
 
-## How It Works
-
-Khisto uses the Khiops optimal binning algorithm based on the MODL (Minimum Optimal Description Length) principle. Instead of using fixed-width bins like traditional histograms, it:
-
-1. Analyzes the data distribution
-2. Finds bin boundaries that minimize information loss
-3. Creates variable-width bins that adapt to data density
-
-This results in histograms that better represent the underlying distribution, with finer bins in dense regions and wider bins in sparse regions.
-
-The method implemented in Khiops is comprehensively detailed in [2] and further extended in [1].
-
-- [1] M. Boullé. Floating-point histograms for exploratory analysis of large scale real-world data sets. Intelligent Data Analysis, 28(5):1347-1394, 2024
-- [2] V. Zelaya Mendizábal, M. Boullé, F. Rossi. Fast and fully-automated histograms for large-scale data sets. Computational Statistics & Data Analysis, 180:0-0, 2023
-
 ## Development
 
 ```bash
@@ -140,16 +66,6 @@ uv sync --group dev --extra all
 uv run pytest
 ```
 
-## Documentation
-
-Full documentation is hosted at **[khiops.github.io/khisto-python](https://khiops.github.io/khisto-python/)**.
-
-- [API Reference](https://khiops.github.io/khisto-python/array/histogram/index.html) — NumPy-like histogram API
-- [Matplotlib Integration](https://khiops.github.io/khisto-python/matplotlib/index.html) — `hist` plotting function
-- [Core API](https://khiops.github.io/khisto-python/core/index.html) — full access to histogram granularity levels
-- [API Comparison](https://khiops.github.io/khisto-python/api_comparison.html) — side-by-side with NumPy and Matplotlib
-- [Demo Notebook](https://khiops.github.io/khisto-python/demo.html) — interactive walkthrough
-
 ## License
 
 [BSD 3-Clause Clear License](LICENSE)
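
Both remaining README examples plot with `density=True`, which matters because Khisto's bins have variable widths. The sketch below shows the arithmetic, assuming Khisto follows the `numpy.histogram` definition of density (count divided by total count times bin width). The helper name `density_from_counts` and the sample counts and edges are hypothetical and are not part of the Khisto API.

```python
from typing import Sequence


def density_from_counts(
    counts: Sequence[float], bin_edges: Sequence[float]
) -> list[float]:
    """Hypothetical helper: convert per-bin counts to densities.

    Uses the numpy.histogram convention: count / (total * bin_width).
    """
    if len(bin_edges) != len(counts) + 1:
        raise ValueError("Expected exactly one more edge than counts")
    total = sum(counts)
    if total == 0:
        raise ValueError("Cannot normalize an empty histogram")
    return [
        count / (total * (right - left))
        for count, left, right in zip(counts, bin_edges[:-1], bin_edges[1:])
    ]


# Illustrative data: two narrow bins and one wide bin in the tail.
counts = [2, 6, 2]
edges = [0.0, 1.0, 2.0, 6.0]
density = density_from_counts(counts, edges)

# The bar areas (density * width) sum to 1, whatever the bin widths are.
area = sum(d * (right - left) for d, left, right in zip(density, edges[:-1], edges[1:]))
```

The first and last bins hold the same count, but the wide bin gets a quarter of the density, which is why raw counts can mislead when bin widths differ.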
diff --git a/docs/demo.ipynb b/docs/demo.ipynb
diff --git a/docs/index.rst b/docs/index.rst
@@ -43,7 +43,7 @@ Get started
    data = np.random.normal(0, 1, 10_000)
    hist, bin_edges = histogram(data)          # optimal bins, no guessing
 
-.. grid:: 1 1 3 3
+.. grid:: 1 1 2 2
    :gutter: 3
    :class-container: sd-mt-3
 
@@ -68,16 +68,6 @@ Get started
       ``compute_histograms`` exposes every granularity level so you can
       pick the resolution that suits your analysis.
 
-.. grid:: 1 1 2 2
-   :gutter: 3
-   :class-container: sd-mt-1
-
-   .. grid-item-card:: :octicon:`git-compare;1.5em` API comparison
-      :link: api_comparison
-      :link-type: doc
-
-      Side-by-side parameter tables for NumPy, Matplotlib, and Khisto.
-
    .. grid-item-card:: :octicon:`play;1.5em` Interactive demo
       :link: demo
       :link-type: doc
@@ -90,8 +80,8 @@ Get started
    :hidden:
 
    Histograms <array/histogram/index>
-   Core <core/index>
    Matplotlib <matplotlib/index>
+   Core <core/index>
 
 .. toctree::
    :maxdepth: 2

diff --git a/sandbox/khisto_demo.ipynb b/sandbox/khisto_demo.ipynb
@@ -130,7 +130,7 @@
       "\u001b[31m---------------------------------------------------------------------------\u001b[39m",
       "\u001b[31mValueError\u001b[39m                                Traceback (most recent call last)",
       "\u001b[36mCell\u001b[39m\u001b[36m \u001b[39m\u001b[32mIn[4]\u001b[39m\u001b[32m, line 2\u001b[39m\n\u001b[32m      1\u001b[39m \u001b[38;5;28;01mfrom\u001b[39;00m khisto \u001b[38;5;28;01mimport\u001b[39;00m matplotlib\n\u001b[32m----> \u001b[39m\u001b[32m2\u001b[39m matplotlib.hist([data, [\u001b[32m1\u001b[39m, \u001b[32m2\u001b[39m, \u001b[32m3\u001b[39m], [\u001b[32m2\u001b[39m,\u001b[32m2\u001b[39m,\u001b[32m2\u001b[39m,\u001b[32m2\u001b[39m]], max_bins=\u001b[32m20\u001b[39m, alpha=\u001b[32m0.5\u001b[39m)\n",
-      "\u001b[36mFile \u001b[39m\u001b[32m~/python/khisto-python/src/khisto/matplotlib/hist.py:122\u001b[39m, in \u001b[36mhist\u001b[39m\u001b[34m(x, range, max_bins, density, cumulative, histtype, orientation, log, color, label, ax, edgecolor, linewidth, alpha, **kwargs)\u001b[39m\n\u001b[32m    119\u001b[39m     ax = plt.gca()\n\u001b[32m    121\u001b[39m \u001b[38;5;66;03m# Compute histogram using khisto\u001b[39;00m\n\u001b[32m--> \u001b[39m\u001b[32m122\u001b[39m hist_values, bin_edges = \u001b[30;43mkhisto_histogram\u001b[39;49m\u001b[30;43m(\u001b[39;49m\n\u001b[32m    123\u001b[39m \u001b[30;43m    \u001b[39;49m\u001b[30;43mx\u001b[39;49m\u001b[30;43m,\u001b[39;49m\u001b[30;43m \u001b[39;49m\u001b[30;43mrange\u001b[39;49m\u001b[30;43m=\u001b[39;49m\u001b[30;43mrange\u001b[39;49m\u001b[30;43m,\u001b[39;49m\u001b[30;43m \u001b[39;49m\u001b[30;43mmax_bins\u001b[39;49m\u001b[30;43m=\u001b[39;49m\u001b[30;43mmax_bins\u001b[39;49m\u001b[30;43m,\u001b[39;49m\u001b[30;43m \u001b[39;49m\u001b[30;43mdensity\u001b[39;49m\u001b[30;43m=\u001b[39;49m\u001b[30;43mdensity\u001b[39;49m\n\u001b[32m    124\u001b[39m \u001b[30;43m\u001b[39;49m\u001b[30;43m)\u001b[39;49m\n\u001b[32m    125\u001b[39m \u001b[38;5;28;01mif\u001b[39;00m cumulative_mode != \u001b[32m0\u001b[39m:\n\u001b[32m    126\u001b[39m     hist_values = _apply_cumulative(\n\u001b[32m    127\u001b[39m         hist_values,\n\u001b[32m    128\u001b[39m         bin_edges,\n\u001b[32m    129\u001b[39m         density=density,\n\u001b[32m    130\u001b[39m         reverse=cumulative_mode < \u001b[32m0\u001b[39m,\n\u001b[32m    131\u001b[39m     )\n",
+      "\u001b[36mFile \u001b[39m\u001b[32m~/python/khisto-python/src/khisto/matplotlib/hist.py:122\u001b[39m, in \u001b[36mhist\u001b[39m\u001b[34m(x, range, max_bins, density, cumulative, histtype, orientation, log, color, label, ax, edgecolor, linewidth, alpha, **kwargs)\u001b[39m\n\u001b[32m    119\u001b[39m     ax = plt.gca()\n\u001b[32m    121\u001b[39m \u001b[38;5;66;03m# Compute histogram using khisto\u001b[39;00m\n\u001b[32m--> \u001b[39m\u001b[32m122\u001b[39m hist_values, bin_edges = \u001b[30;43mkhisto_histogram\u001b[39;49m\u001b[30;43m(\u001b[39;49m\u001b[30;43mx\u001b[39;49m\u001b[30;43m,\u001b[39;49m\u001b[30;43m \u001b[39;49m\u001b[30;43mrange\u001b[39;49m\u001b[30;43m=\u001b[39;49m\u001b[30;43mrange\u001b[39;49m\u001b[30;43m,\u001b[39;49m\u001b[30;43m \u001b[39;49m\u001b[30;43mmax_bins\u001b[39;49m\u001b[30;43m=\u001b[39;49m\u001b[30;43mmax_bins\u001b[39;49m\u001b[30;43m,\u001b[39;49m\u001b[30;43m \u001b[39;49m\u001b[30;43mdensity\u001b[39;49m\u001b[30;43m=\u001b[39;49m\u001b[30;43mdensity\u001b[39;49m\u001b[30;43m)\u001b[39;49m\n\u001b[32m    123\u001b[39m \u001b[38;5;28;01mif\u001b[39;00m cumulative_mode != \u001b[32m0\u001b[39m:\n\u001b[32m    124\u001b[39m     hist_values = _apply_cumulative(\n\u001b[32m    125\u001b[39m         hist_values,\n\u001b[32m    126\u001b[39m         bin_edges,\n\u001b[32m    127\u001b[39m         density=density,\n\u001b[32m    128\u001b[39m         reverse=cumulative_mode < \u001b[32m0\u001b[39m,\n\u001b[32m    129\u001b[39m     )\n",
       "\u001b[36mFile \u001b[39m\u001b[32m~/python/khisto-python/src/khisto/array/histogram/api.py:107\u001b[39m, in \u001b[36mhistogram\u001b[39m\u001b[34m(a, range, max_bins, density)\u001b[39m\n\u001b[32m     53\u001b[39m \u001b[38;5;28;01mdef\u001b[39;00m\u001b[38;5;250m \u001b[39m\u001b[34mhistogram\u001b[39m(\n\u001b[32m     54\u001b[39m     a: ArrayLike,\n\u001b[32m     55\u001b[39m     \u001b[38;5;28mrange\u001b[39m: Optional[\u001b[38;5;28mtuple\u001b[39m[\u001b[38;5;28mfloat\u001b[39m, \u001b[38;5;28mfloat\u001b[39m]] = \u001b[38;5;28;01mNone\u001b[39;00m,\n\u001b[32m     56\u001b[39m     max_bins: Optional[\u001b[38;5;28mint\u001b[39m] = \u001b[38;5;28;01mNone\u001b[39;00m,\n\u001b[32m     57\u001b[39m     density: \u001b[38;5;28mbool\u001b[39m = \u001b[38;5;28;01mFalse\u001b[39;00m,\n\u001b[32m     58\u001b[39m ) -> \u001b[38;5;28mtuple\u001b[39m[NDArray[np.float64], NDArray[np.float64]]:\n\u001b[32m     59\u001b[39m \u001b[38;5;250m    \u001b[39m\u001b[33;03m\"\"\"Compute an optimal histogram using the Khiops binning algorithm.\u001b[39;00m\n\u001b[32m     60\u001b[39m \n\u001b[32m     61\u001b[39m \u001b[33;03m    Parameters\u001b[39;00m\n\u001b[32m   (...)\u001b[39m\u001b[32m    105\u001b[39m \u001b[33;03m       Analysis, 180:0-0, 2023.\u001b[39;00m\n\u001b[32m    106\u001b[39m \u001b[33;03m    \"\"\"\u001b[39;00m\n\u001b[32m--> \u001b[39m\u001b[32m107\u001b[39m     arr = \u001b[30;43mnp\u001b[39;49m\u001b[30;43m.\u001b[39;49m\u001b[30;43masarray\u001b[39;49m\u001b[30;43m(\u001b[39;49m\u001b[30;43ma\u001b[39;49m\u001b[30;43m,\u001b[39;49m\u001b[30;43m \u001b[39;49m\u001b[30;43mdtype\u001b[39;49m\u001b[30;43m=\u001b[39;49m\u001b[30;43mnp\u001b[39;49m\u001b[30;43m.\u001b[39;49m\u001b[30;43mfloat64\u001b[39;49m\u001b[30;43m)\u001b[39;49m\n\u001b[32m    109\u001b[39m     \u001b[38;5;28;01mif\u001b[39;00m arr.ndim != \u001b[32m1\u001b[39m:\n\u001b[32m    110\u001b[39m         \u001b[38;5;28;01mraise\u001b[39;00m 
\u001b[38;5;167;01mValueError\u001b[39;00m(\n\u001b[32m    111\u001b[39m             \u001b[33mf\u001b[39m\u001b[33m\"\u001b[39m\u001b[33mExpected 1-D array, got \u001b[39m\u001b[38;5;132;01m{\u001b[39;00marr.ndim\u001b[38;5;132;01m}\u001b[39;00m\u001b[33m-D array instead. \u001b[39m\u001b[33m\"\u001b[39m\n\u001b[32m    112\u001b[39m             \u001b[33m\"\u001b[39m\u001b[33mReshape your data or flatten it before calling histogram.\u001b[39m\u001b[33m\"\u001b[39m\n\u001b[32m    113\u001b[39m         )\n",
       "\u001b[31mValueError\u001b[39m: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (3,) + inhomogeneous part."
      ]
@@ -153,7 +153,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 5,
+   "execution_count": null,
    "id": "23c19584",
    "metadata": {},
    "outputs": [
@@ -174,7 +174,7 @@
        " <a list of 3 BarContainer objects>)"
       ]
      },
-     "execution_count": 5,
+     "execution_count": 6,
      "metadata": {},
      "output_type": "execute_result"
     },
@@ -195,7 +195,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 6,
+   "execution_count": null,
    "id": "2749c664",
    "metadata": {},
    "outputs": [
@@ -219,7 +219,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 7,
+   "execution_count": null,
    "id": "1cca2392",
    "metadata": {},
    "outputs": [
@@ -241,7 +241,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 8,
+   "execution_count": null,
    "id": "2ad6d7e5",
    "metadata": {},
    "outputs": [
@@ -273,7 +273,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 9,
+   "execution_count": 5,
    "id": "b6c4ea8c",
    "metadata": {},
    "outputs": [
@@ -297,6 +297,7 @@
    ],
    "source": [
     "from khisto.matplotlib import hist\n",
+    "from khisto.matplotlib.hist import _hist\n",
     "\n",
     "# Basic histogram plot\n",
     "fig, ax = plt.subplots(figsize=(8, 5))\n",
@@ -311,7 +312,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 12,
+   "execution_count": 6,
    "id": "09479225",
    "metadata": {},
    "outputs": [
@@ -339,7 +340,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 16,
+   "execution_count": 8,
    "id": "6c89bf07",
    "metadata": {},
    "outputs": [
@@ -366,7 +367,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 17,
+   "execution_count": 12,
    "id": "25d8d0e5",
    "metadata": {},
    "outputs": [
@@ -403,7 +404,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 18,
+   "execution_count": 15,
    "id": "d985437b",
    "metadata": {},
    "outputs": [
@@ -487,7 +488,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 19,
+   "execution_count": 18,
    "id": "51179a02",
    "metadata": {},
    "outputs": [
@@ -521,7 +522,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 20,
+   "execution_count": 21,
    "id": "e9bbabc9",
    "metadata": {},
    "outputs": [
@@ -558,7 +559,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 21,
+   "execution_count": 22,
    "id": "1190f8aa",
    "metadata": {},
    "outputs": [
@@ -594,7 +595,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 22,
+   "execution_count": 23,
    "id": "bf2ba150",
    "metadata": {},
    "outputs": [

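The `src/khisto/core/backend.py` hunk in this diff makes `compute_histograms` check for emptiness twice: once on the raw input and once after NaN values are dropped, each with its own error message. The following is a minimal NumPy-free sketch of that control flow. The helper name `validate_and_filter` is hypothetical, and the real function works on `np.float64` arrays rather than lists.

```python
import math
from typing import Iterable


def validate_and_filter(values: Iterable[float]) -> list[float]:
    """Hypothetical helper mirroring the two-stage emptiness check."""
    data = [float(v) for v in values]

    # Stage 1: the caller passed no values at all.
    if len(data) == 0:
        raise ValueError("Input array is empty")

    # Drop missing values (NaN), as the updated docstring describes.
    kept = [v for v in data if not math.isnan(v)]

    # Stage 2: there were values, but every one of them was NaN.
    if len(kept) == 0:
        raise ValueError("Input array is empty after filtering missing values")

    return kept


filtered = validate_and_filter([1.0, float("nan"), 2.5])
```

Separating the two cases lets a caller tell "no data was passed" apart from "all data was missing" from the exception message alone.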
diff --git a/src/khisto/core/backend.py b/src/khisto/core/backend.py
@@ -235,13 +235,14 @@ def _process_histogram_file(file_path: Path) -> list[HistogramResult]:
     ]
 
 
-def compute_histograms(x: np.ndarray) -> list[HistogramResult]:
+def compute_histograms(x: NDArray[np.float64]) -> list[HistogramResult]:
     """Compute optimal histogram of an array using khisto CLI binary input.
 
     Parameters
     ----------
-    x : np.ndarray
-        Array of numeric values.
+    x : NDArray[np.float64]
+        Array of numeric values. Only 1-dimensional arrays are supported.
+        Missing values (NaN) are filtered out.
 
     Returns
     -------
@@ -257,10 +258,14 @@ def compute_histograms(x: np.ndarray) -> list[HistogramResult]:
         If input array is empty after filtering.
     """
     x = np.asarray(x, dtype=np.float64)
+
+    if len(x) == 0:
+        raise ValueError("Input array is empty")
+
     x = x[~np.isnan(x)]
 
     if len(x) == 0:
-        raise ValueError("Input array is empty after filtering")
+        raise ValueError("Input array is empty after filtering missing values")
 
     # Use delete=False so the files are closed before the subprocess reads them.
     # On Windows, files keep an exclusive lock while open, whence,