[WIP] Dataset documentation (microsoft#992)
* fix: Pass str not ParamSpec

* fix: Use the argument string to connect

* feat: Add add_parameter_ to compose better

Note to self and future reader:
    an underscore appended to a function name denotes
    a function whose side-effects need to be committed!

* oopified all the things, tad tired

* fix: man I must have been tired

* fix: Use list instead of unpacking

* add Todo, finish implementation

* add todo about metadata

* fix: Return 0 instead of None if the dataset is empty

* feat: Add subscriber

* add example

* docs: add example using zmq

* fix: Typo in sql

* remove junk

* docs: polish API, add docstrings

* polish up

* docs: add examples

* remove old tests

* add requirements

* remove leftovers

* remove spec

* move examples out

* clean notebooks

* fix: Move experiment id concerns to the PUBLIC api.

That is: a dataset needs an exp_id.
If it's not passed to the new_dataset then we use the last exp_id.

* fix: Create better dirs for merge with qcodes.

* fix: integrate with qcodes

* fix: make sqlite operations public

* fix: many details

* feat: Add exp container

* fix: Configure container in config

* docs: update examples

* fix: Remove leftover todo

* style: PEP8'ing for great justice

* Shuffle around notebooks

* [WIP] pave the way for auto-plotting; "inject dependencies"

* [WIP] Introduce half-finished plot function

* Feature/dataset (microsoft#806)

* Add notebook with some benchmarking experiments

* Fix add_results and insert_many

* Update benchmarking

* MILESTONE: first working version of plot_by_id for 1D and 2D

* sort axes for 1D plotting

* add examples of working quick plotting

* feat: add measurements.py file, add Runner

* add Measurement object

* fix: don't use the type keyword

* Add interface for using QCoDeS parameters

* [MILESTONE] first notebook with the context manager

* add debug logging, some of which should be removed again

* Remove some debugging, make _rows_from_datapoints 130 times faster

* add adaptive sweep to context manager notebook

* typos and log

* typo

* refactor code and reuse add layouts

* Make dataset pass with mypy

* remove absolute import from __init__ as it seems to break autodoc

* no longer build for python 3.5

* Add some debugging to exp container

* Add tests for dataset

very basic but working setup and teardown

* Validate table name to prevent sql injection

* Actually check

* improve tests

* fix testing if unicode

* better test

* More tests

* remove db

* add_result takes a dict from str to single values not list

* Improve tests

* Correct function sig

* remove db which should never have been committed

* remove prints

* improve tests

* more tests

* remove print statement

* Remove unused import

* Make sure to close all connections in teardown

This fixes test failures caused by a failure to remove the tmpdir
on Windows

* Correct dac validators (microsoft#906)

use the correct validator range (cf. d5a driver) and adjust naming

* add publisher from logging branch

* Update notebook

* typos

* Add simple example of json exporter for the webui

* add notebook with example of exporting with notebook

* add optional kwargs at creation time to subscriber

* style: change camelCase to snake_case

* [WIP] tests for the measurement context manager

* add unregister_parameter + test

* add some testing for paramspec

* change paramspecs paramtypes in test from "number" to "real"

* remove old param_spec function

* 100% coverage of param_spec

* add register_custom_parameter + test

* fix: make refactor work

* add a number_of_results property to DataSet

* remove debug print statement

* lowercase paramtype

* test datasaver and support/implement array unravelling

* validate paramspec names

* remove debugging print statement

* update test with valid ParamSpec names

* add tests for exit/enter-actions and change OrderedDict to list

* fully cover add enter/exit actions

* cover write_period property in test

* avoid infinite recursion in write_period

* fully cover unregister_parameter in test

* add a station to the datasaver scalar test

* mypy

* codacy: removed unused imports

* make database errors hard errors in the ctx mgr

* clean up non-working SQL error test

* add test for add_parameter_values

* add test for modify_result and insert an exception

* sort imports and test CompletedError in modify_results

* add exception checks to test_add_data_1d

* add SQLiteSettings object to hold settings read at import time

* add little test for sqlitesettings

* correct docstring and add dependency checks

* make insert_many_values consider input length + test

* add SQLiteSettings to __init__.py

* copy DEFAULT_LIMITS to avoid modifying it

* playing with Travis

* revert "playing with Travis"

* update test_adding_too_many_results

* add VERSION to qc.SQLiteSettings

* try to use VERSION to make Travis happy

* add a failing test to read Travis' sqlite version

* cheat with version to check if MAX_COMPOUND_SELECT is to blame

* fix typo

* add test + remedy for writing a divisor of MAX_VARIABLE_NUMBER

* increase a deadline to avoid flakiness on Travis

* increase another deadline to avoid flakiness on Travis

* replace "id" by "run_id" and "exp_id"

* remove unused imports and variables

* turn deadlines off for two otherwise flaky tests

* remove more unused stuff and a redundant test

* remove old double definition

* add a few tests for sqlite_base

* modify a test to do a double-catch

* add functions to get and set user version to be used for db upgrade

* add simple test to do a silly upgrade of the database

* update error catching in test_atomic_raises

* remove unused variable

* Add a docstring to Subscriber

* squash! Add a docstring to Subscriber

* squash! squash! Add a docstring to Subscriber

* Add a pedagogical notebook on the Subscriber

* Change Subscriber log debug and offset min_count

* Update example notebook to use redefined min_count

* Add a simple test for dataset subscription

* Change snapshot to match existing structure more closely

* Build dataset notebooks too

* correct makefile

* add dataset notebooks to index

* Merge types 'real' and 'integer' into 'numeric'

* Update notebooks to latest API changes

The changes being: 'real' and 'integer' -> 'numeric', and no 'id'
attribute on anything anymore.

* Add titles to notebooks

* Remove types from context manager API

* Add support for ArrayParameters in the DataSaver

* [WIP] Add tests for ArrayParameter support

* make examples executable

* WIP dataset importer for old data

* Add dataset as fixtures for tests

* use the simple importer

* update notebook

* add tests of loading old dataset

* add property for dataset to DataSaver

* Add support for storing metadata to importer

* update notebook

* Add smoketest of json in dataset

* Update add_results to make test_datasaver_array_parameters pass

* Disable deadline for test_datasaver_array_parameters

* PEP8 one line

* Expand docstring for Measurement

* Add format_string as attribute to Experiment

* Make Measurement name settable and use that for result_table

* refactor loading

* add more examples to dataset context manager

* Speed up plotting functions for later use in data exporting

* more efficient sorting on 2d array

* update notebook

* refactor code to make numpy array export reuseable

* Add data exporter to numpy array

* update notebook with use of exporter

* Add example notebook with real instruments

* Tidy up Context Manager notebook

* Tidy up Load old data notebook

Sphinx is unhappy about png images. When using %matplotlib notebook,
that problem is circumvented.

* Move Real Instruments example notebook

* Change Makefile to execute DataSet example notebooks

* Fix typo

* docs: Include the Real_instruments subfolder

* Fix typo in index.rst

* Update Makefile to create needed directories

* add notebook with dond

* Remove example-loading notebook

* Remove subscriber-example notebook

* Add jupyter_client to docs_requirements

* add plantuml diagram

* Make Makefile generate scripts instead of executing

* Revert "Make Makefile generate scripts instead of executing"

This reverts commit 46ede2b.

* Add jupyter to docs_requirements

* [temp] Print available jupyter kernels on travis

* Stop printing jupyter-kernelspec list on travis

* Add scipy to docs_requirements

* Give Travis 20 times longer to execute each notebook

* Increase hypothesis deadline for combined loop test

* add notebook with dond

* add plantuml diagram

* Move dataset spec to documentation

* add diagram to docs

* William's document as converted by pandoc

* refactor to improve rst

* add more diagrams

* Move figures to subfolder

* add new section to fill out

* fix rst syntax in spec

* typos

* update dataset diagram

* fix typos in docs

* start writing text

* Add subscriptions to Measurement object (+test&notebook)

* Update logging for subscribers in data_set

* Replace spaces with underscores in notebook name

* Tweak some test parameters in Measurement test

* Fix mypy issues in measurements.py

* Tweak test parameters again

* Add some more explanation about the dataset

* Fixing a few typos in dataset_design.rst
jenshnielsen authored and WilliamHPNielsen committed Mar 20, 2018
1 parent 510bfa9 commit 5f60c9a
Showing 28 changed files with 2,523 additions and 81 deletions.
2 changes: 2 additions & 0 deletions docs/conf.py
@@ -392,3 +392,5 @@
# we are using non-local images for badges. These will change so we don't
# want to store them locally.
suppress_warnings = ['image.nonlocal_uri']

numfig=True
75 changes: 75 additions & 0 deletions docs/dataset/dataset_design.rst
@@ -0,0 +1,75 @@
.. highlight:: python

==============
Dataset Design
==============

.. _sec:design_introduction:

Introduction
============

.. _datasetdiagram:
.. figure:: figures/datasetdiagram.svg
   :align: center
   :width: 100%

   Basic workflow

This document explains the design and workings of the QCoDeS DataSet.
:numref:`datasetdiagram` sketches the basic design of the dataset.
The implementation is organised in three layers, shown vertically in
:numref:`datasetdiagram`. Each layer implements functionality for
reading from and writing to the dataset. The layers are organised
hierarchically, with the topmost one implementing a high-level interface
and the lowest layer implementing the communication with the database.
This layering reconciles two competing requirements. On one hand the
dataset should be easy to use, offering simple functionality for
performing standard measurements with a minimum of typing. On the other
hand the dataset should enable users to perform any measurement they may
find useful, and it should not force the user into a specific measurement
pattern that may be suboptimal for more advanced use cases. Specifically,
it should be possible to formulate any experiment as Python code using
standard language constructs (for and while loops, among others) with
minimal effort.

The legacy QCoDeS dataset (``qcodes.data``) and loop (``qcodes.Loop``) are
primarily oriented towards ease of use for the standard use case, but they
make it challenging to formulate more complicated experiments without
significant work recasting the experiment in a counterintuitive way.


The QCoDeS dataset currently implements two
interfaces directly targeting end users. It is not expected that the user
of QCoDeS will need to interface directly with the lowest layer communicating
with the database.

The ``dataset`` layer defined in the :ref:`dataset-spec` provides the most
flexible user-facing interface, but it requires users to manually register
``ParamSpec`` objects. (Insert reference to notebook.) The dataset
implements two functions for inserting one or more rows of data into the
dataset, writing them to disk immediately. It is, however, the user's
responsibility to ensure good performance by writing to disk at suitable
intervals.
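
As a rough illustration, direct use of this layer might look like the
following sketch, assuming a database with at least one experiment has
already been initialised. The module and function names (``new_data_set``,
``add_result``, ``mark_complete``) follow the API described in this
changeset, but the exact signatures are assumptions and may differ::

    from qcodes.dataset.data_set import new_data_set
    from qcodes.dataset.param_spec import ParamSpec

    # Describe the columns of this run; 'numeric' is the merged
    # 'real'/'integer' type introduced in this changeset.
    specs = [ParamSpec('voltage', 'numeric'),
             ParamSpec('current', 'numeric')]

    ds = new_data_set('iv_curve', specs=specs)

    for v in (0.0, 0.1, 0.2):
        current = 2.0 * v  # stand-in for a real measurement
        # add_result maps parameter names to single values; the row is
        # written to the database immediately
        ds.add_result({'voltage': v, 'current': current})

    ds.mark_complete()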

The measurement context manager layer provides additional support for
flushing data to disk at selected intervals for better performance without
manual intervention. It also provides easy registration of ``ParamSpec``
objects on the basis of QCoDeS parameters or custom parameters; a sketch of
its use is given after the list below.

Importantly, however, it does not:

* Automatically infer the relationship between dependent and independent
  parameters. The user must supply this metadata for correct plotting.
* Automatically register parameters.
* Enforce any structure on the measured data (1D, on a grid, etc.). This
  may make plotting more difficult, as any structure will have to be
  reconstructed at plotting time.
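
To make the division of labour concrete, here is a minimal sketch of a run
through the context manager. It uses two stand-in ``ManualParameter``
objects instead of real instrument parameters and assumes a database with
an experiment has already been created; the calls follow the
``Measurement`` API added in this changeset but should be read as
illustrative rather than definitive::

    from qcodes.instrument.parameter import ManualParameter
    from qcodes.dataset.measurements import Measurement

    set_v = ManualParameter('voltage', unit='V')
    meas_i = ManualParameter('current', unit='A')

    meas = Measurement()
    # Registration is explicit; the setpoints argument supplies the
    # dependent/independent relationship that is not inferred for us.
    meas.register_parameter(set_v)
    meas.register_parameter(meas_i, setpoints=(set_v,))
    meas.write_period = 0.5  # flush buffered results every 0.5 seconds

    # Any control flow is allowed inside the context manager.
    with meas.run() as datasaver:
        for v in (0.0, 0.1, 0.2):
            set_v.set(v)
            meas_i.set(2.0 * v)  # stand-in for a real measurement
            datasaver.add_result((set_v, v), (meas_i, meas_i.get()))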


It is envisioned that a future layer will be added on top of the existing
layers to automatically register parameters and save data, at the cost of
the ability to write the measurement routine as pure Python functions.

We note that the dataset currently supports storing data exclusively in an
SQLite database. This is not an intrinsic limitation of the dataset and
measurement layers; support for writing to a different backend may be added
at a future stage.

Binary file added docs/dataset/figures/bad_trees.pdf
173 changes: 173 additions & 0 deletions docs/dataset/figures/bad_trees.svg
Binary file added docs/dataset/figures/bad_trees_remedied.pdf
262 changes: 262 additions & 0 deletions docs/dataset/figures/bad_trees_remedied.svg
39 changes: 39 additions & 0 deletions docs/dataset/figures/datasetdiagram.puml
@@ -0,0 +1,39 @@
@startuml

package "measurements.py" {
[DataSaver]
[Runner]
[Measurement]
}
package "data_set.py" {
[DataSet]
}
package "sqlite_base.py" {
[sqlite functions]
}

package "experiment_container.py" {
[Experiment]
}

package "param_spec.py" {
[ParamSpec]
}

database "SQLite" {
[experiment.db]
}

[Measurement] -> [Runner] : Calling 'run' creates:
[Measurement] --> [ParamSpec] : Registers instances of:
[Runner] --> [DataSet] : '~__enter~__' creates:\n'~__exit~__' flushes:
[Runner] -> [DataSaver] : '~__enter~__' returns:
[Runner] --> [Experiment] : Creates DataSet with ref to:
[DataSaver] --> [DataSet] : Stores data via:
[DataSet] -> [ParamSpec] : Holds instances of:
[DataSet] --> [sqlite functions] : Inserts data into DB
[Experiment] --> [sqlite functions] : Creates experiments in DB
[sqlite functions] --> [experiment.db] : SQL calls


@enduml
52 changes: 52 additions & 0 deletions docs/dataset/figures/datasetdiagram.svg
Binary file added docs/dataset/figures/dependencies_01.pdf