[WIP] Dataset documentation (microsoft#992)
* fix: Pass str not ParamSpec

* fix: Use the argument string to connect

* feat: Add add_parameter_ to compose better

Note to self and future reader:
    an underscore appended to a function name denotes
    a function whose side-effects need to be committed!

* oopified all the things, tad tired

* fix: man I must have been tired

* fix: Use list instead of unpacking

* add Todo, finish implementation

* add todo about metadata

* fix: Return 0 instead of None if the dataset is empty

* feat: Add subscriber

* add example

* docs: add example using zmq

* fix: Typo in sql

* remove junk

* docs: polish API, add docstrings

* polish up

* docs: add examples

* remove old tests

* add requirements

* remove leftovers

* remove spec

* move examples out

* clean notebooks

* fix: Move experiment id concerns to the PUBLIC api.

That is: a dataset needs an exp_id.
If it's not passed to the new_dataset then we use the last exp_id.

* fix: Create better dirs for merge with qcodes.

* fix: integrate with qcodes

* fix: make sqlite operations public

* fix: many details

* feat: Add exp container

* fix: Configure container in config

* docs: update examples

* fix: Remove leftover todo

* style: PEP8'ing for great justice

* Shuffle around notebooks

* [WIP] pave the way for auto-plotting; "inject dependencies"

* [WIP] Introduce half-finished plot function

* Feature/dataset (microsoft#806)

* Add notebook with some benchmarking experiments

* Fix add_results and insert_many

* Update benchmarking

* MILESTONE: first working version of plot_by_id for 1D and 2D

* sort axes for 1D plotting

* add examples of working quick plotting

* feat: add measurements.py file, add Runner

* add Measurement object

* fix: don't use the type keyword

* Add interface for using QCoDeS parameters

* [MILESTONE] first notebook with the context manager

* add debug logging, some of which should be removed again

* Remove some debugging, make _rows_from_datapoints 130 times faster

* add adaptive sweep to context manager notebook

* typos and log

* typo

* refactor code and reuse add layouts

* Make dataset pass with mypy

* remove absolute import from __init__ as it seems to break autodoc

* no longer build for python 3.5

* Add some debugging to exp container

* Add tests for dataset

very basic but working setup and teardown

* Validate table name to prevent sql injection

* Actually check

* improve tests

* fix testing if unicode

* better test

* More tests

* remove db

* add_result takes a dict from str to single values not list

* Improve tests

* Correct function sig

* remove db which should never have been committed

* remove prints

* improve tests

* more tests

* remove print statement

* Remove unused import

* Make sure to close all connections in teardown

This fixes test failures caused by a failure to remove the tmpdir
on Windows

* Correct dac validators (microsoft#906)

use the correct validator range (cf. d5a driver) and adjust naming

* add publisher from logging branch

* Update notebook

* typos

* Add simple example of json exporter for the webui

* add notebook with example of exporting with notebook

* add optional kwargs at creation time to subscriber

* style: change camelCase to snake_case

* [WIP] tests for the measurement context manager

* add unregister_parameter + test

* add some testing for paramspec

* change paramspecs paramtypes in test from "number" to "real"

* remove old param_spec function

* 100% coverage of param_spec

* add register_custom_parameter + test

* fix: make refactor work

* add a number_of_results property to DataSet

* remove debug print statement

* lowercase paramtype

* test datasaver and support/implement array unravelling

* validate paramspec names

* remove debugging print statement

* update test with valid ParamSpec names

* add tests for exit/enter-actions and change OrderedDict to list

* fully cover add enter/exit actions

* cover write_period property in test

* avoid infinite recursion in write_period

* fully cover unregister_parameter in test

* add a station to the datasaver scalar test

* mypy

* codacy: removed unused imports

* make database errors hard errors in the ctx mgr

* clean up non-working SQL error test

* add test for add_parameter_values

* add test for modify_result and insert an exception

* sort imports and test CompletedError in modify_results

* add exception checks to test_add_data_1d

* add SQLiteSettings object to hold settings read at import time

* add little test for sqlitesettings

* correct docstring and add dependency checks

* make insert_many_values consider input length + test

* add SQLiteSettings to __init__.py

* copy DEFAULT_LIMITS to avoid modifying it

* playing with Travis

* revert "playing with Travis"

* update test_adding_too_many_results

* add VERSION to qc.SQLiteSettings

* try to use VERSION to make Travis happy

* add a failing test to read Travis' sqlite version

* cheat with version to check if MAX_COMPOUND_SELECT is to blame

* fix typo

* add test + remedy for writing a divisor of MAX_VARIABLE_NUMBER

* increase a deadline to avoid flakiness on Travis

* increase another deadline to avoid flakiness on Travis

* replace "id" by "run_id" and "exp_id"

* remove unused imports and variables

* turn deadlines off for two otherwise flaky tests

* remove more unused stuff and a redundant test

* remove old double definition

* add a few tests for sqlite_base

* modify a test to do a double-catch

* add functions to get and set user version to be used for db upgrade

* add simple test to do a silly upgrade of the database

* update error catching in test_atomic_raises

* remove unused variable

* Add a docstring to Subscriber

* squash! Add a docstring to Subscriber

* squash! squash! Add a docstring to Subscriber

* Add a pedagogical notebook on the Subscriber

* Change Subscriber log debug and offset min_count

* Update example notebook to use redefined min_count

* Add a simple test for dataset subscription

* Change snapshot to match existing structure more closely

* Build dataset notebooks too

* correct makefile

* add dataset notebooks to index

* Merge types 'real' and 'integer' into 'numeric'

* Update notebooks to latest API changes

The changes being: 'real' and 'integer' -> 'numeric', and no 'id'
attribute on anything anymore.

* Add titles to notebooks

* Remove types from context manager API

* Add support for ArrayParameters in the DataSaver

* [WIP] Add tests for ArrayParameter support

* make examples executable

* WIP dataset importer for old data

* Add dataset as fixtures for tests

* use the simple importer

* update notebook

* add tests of loading old dataset

* add property for dataset to DataSaver

* Add support for storing metadata to importer

* update notebook

* Add smoketest of json in dataset

* Update add_results to make test_datasaver_array_parameters pass

* Disable deadline for test_datasaver_array_parameters

* PEP8 one line

* Expand docstring for Measurement

* Add format_string as attribute to Experiment

* Make Measurement name settable and use that for result_table

* refactor loading

* add more examples to dataset context manager

* Speed up plotting functions for later use in data exporting

* more efficient sorting on 2d array

* update notebook

* refactor code to make numpy array export reuseable

* Add data exporter to numpy array

* update notebook with use of exporter

* Add example notebook with real instruments

* Tidy up Context Manager notebook

* Tidy up Load old data notebook

Sphinx is unhappy about png images. When using %matplotlib notebook,
that problem is circumvented.

* Move Real Instruments example notebook

* Change Makefile to execute DataSet example notebooks

* Fix typo

* docs: Include the Real_instruments subfolder

* Fix typo in index.rst

* Update Makefile to create needed directories

* add notebook with dond

* Remove example-loading notebook

* Remove subscriber-example notebook

* Add jupyter_client to docs_requirements

* add plantuml diagram

* Make Makefile generate scripts instead of executing

* Revert "Make Makefile generate scripts instead of executing"

This reverts commit 46ede2b.

* Add jupyter to docs_requirements

* [temp] Print available jupyter kernels on travis

* Stop printing jupyter-kernelspec list on travis

* Add scipy to docs_requirements

* Give Travis 20 times longer to execute each notebook

* Increase hypothesis deadline for combined loop test

* add notebook with dond

* add plantuml diagram

* Move dataset spec to documentation

* add diagram to docs

* William's document as converted by pandoc

* refactor to improve rst

* add more diagrams

* Move figures to subfolder

* add new section to fill out

* fix rst syntax in spec

* typos

* update dataset diagram

* fix typos in docs

* start writing text

* Add subscriptions to Measurement object (+test&notebook)

* Update logging for subscribers in data_set

* Replace spaces with underscores in notebook name

* Tweak some test parameters in Measurement test

* Fix mypy issues in measurements.py

* Tweak test parameters again

* Add some more explanation about the dataset

* Fixing a few typos in dataset_design.rst
jenshnielsen authored and WilliamHPNielsen committed Mar 20, 2018
1 parent 510bfa9 commit 5f60c9a
Showing 28 changed files with 2,523 additions and 81 deletions.
2 changes: 2 additions & 0 deletions docs/conf.py
@@ -392,3 +392,5 @@
# we are using non-local images for badges. These will change so we don't
# want to store them locally.
suppress_warnings = ['image.nonlocal_uri']

numfig=True
75 changes: 75 additions & 0 deletions docs/dataset/dataset_design.rst
@@ -0,0 +1,75 @@
.. highlight:: python

==============
Dataset Design
==============

.. _sec:design_introduction:

Introduction
============

.. _datasetdiagram:
.. figure:: figures/datasetdiagram.svg
   :align: center
   :width: 100%

   Basic workflow

This document explains the design and workings of the QCoDeS DataSet.
:numref:`datasetdiagram` sketches the basic design of the dataset.
The implementation is organised in three layers, shown vertically in
:numref:`datasetdiagram`. Each layer implements functionality for
reading from and writing to the dataset. The layers are organised
hierarchically, with the topmost one implementing a high-level interface
and the lowest layer implementing the communication with the database.
This layering reconciles two competing requirements. On one hand the
dataset should be easy to use, offering simple functionality for
performing standard measurements with a minimum of typing. On the other
hand the dataset should enable users to perform any measurement they may
find useful, and it should not force the user into a specific measurement
pattern that may be suboptimal for more advanced use cases. Specifically,
it should be possible to formulate any experiment as Python code using
standard language constructs (for and while loops, among others) with
minimal effort.

The legacy QCoDeS dataset (``qcodes.data``) and loop (``qcodes.Loop``) are
primarily oriented towards ease of use for the standard use case, but they
make it challenging to formulate more complicated experiments without
significant work recasting the experiment in a counterintuitive way.


The QCoDeS dataset currently implements two
interfaces directly targeting end users. It is not expected that the user
of QCoDeS will need to interface directly with the lowest layer communicating
with the database.

The ``dataset`` layer defined in the :ref:`dataset-spec` provides the most
flexible user-facing interface, but it requires users to manually register
``ParamSpec`` objects. (Insert reference to notebook.) The dataset
implements two functions for inserting one or more rows of data into the
dataset, writing them to disk immediately. It is, however, the user's
responsibility to ensure good performance by writing to disk at suitable
intervals.
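
As a rough illustration, direct use of this layer might look like the
following sketch, assuming a database with at least one experiment has
already been initialised. The module and function names (``new_data_set``,
``add_result``, ``mark_complete``) follow the API described in this
changeset, but the exact signatures are assumptions and may differ::

    from qcodes.dataset.data_set import new_data_set
    from qcodes.dataset.param_spec import ParamSpec

    # Describe the columns of this run; 'numeric' is the merged
    # 'real'/'integer' type introduced in this changeset.
    specs = [ParamSpec('voltage', 'numeric'),
             ParamSpec('current', 'numeric')]

    ds = new_data_set('iv_curve', specs=specs)

    for v in (0.0, 0.1, 0.2):
        current = 2.0 * v  # stand-in for a real measurement
        # add_result maps parameter names to single values; the row is
        # written to the database immediately
        ds.add_result({'voltage': v, 'current': current})

    ds.mark_complete()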

The measurement context manager layer provides additional support for
flushing data to disk at selected intervals for better performance without
manual intervention. It also provides easy registration of ``ParamSpec``
objects on the basis of QCoDeS parameters or custom parameters; a sketch of
its use is given after the list below.

Importantly, however, it does not:

* Automatically infer the relationship between dependent and independent
  parameters. The user must supply this metadata for correct plotting.
* Automatically register parameters.
* Enforce any structure on the measured data (1D, on a grid, etc.). This
  may make plotting more difficult, as any structure will have to be
  reconstructed at plotting time.
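
To make the division of labour concrete, here is a minimal sketch of a run
through the context manager. It uses two stand-in ``ManualParameter``
objects instead of real instrument parameters and assumes a database with
an experiment has already been created; the calls follow the
``Measurement`` API added in this changeset but should be read as
illustrative rather than definitive::

    from qcodes.instrument.parameter import ManualParameter
    from qcodes.dataset.measurements import Measurement

    set_v = ManualParameter('voltage', unit='V')
    meas_i = ManualParameter('current', unit='A')

    meas = Measurement()
    # Registration is explicit; the setpoints argument supplies the
    # dependent/independent relationship that is not inferred for us.
    meas.register_parameter(set_v)
    meas.register_parameter(meas_i, setpoints=(set_v,))
    meas.write_period = 0.5  # flush buffered results every 0.5 seconds

    # Any control flow is allowed inside the context manager.
    with meas.run() as datasaver:
        for v in (0.0, 0.1, 0.2):
            set_v.set(v)
            meas_i.set(2.0 * v)  # stand-in for a real measurement
            datasaver.add_result((set_v, v), (meas_i, meas_i.get()))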


It is envisioned that a future layer will be added on top of the existing
layers to automatically register parameters and save data, at the cost of
the ability to write the measurement routine as pure Python functions.

We note that the dataset currently supports storing data exclusively in an
SQLite database. This is not an intrinsic limitation of the dataset and
measurement layers; support for writing to a different backend may be added
at a future stage.

Binary file added docs/dataset/figures/bad_trees.pdf
173 changes: 173 additions & 0 deletions docs/dataset/figures/bad_trees.svg
Binary file added docs/dataset/figures/bad_trees_remedied.pdf
262 changes: 262 additions & 0 deletions docs/dataset/figures/bad_trees_remedied.svg
39 changes: 39 additions & 0 deletions docs/dataset/figures/datasetdiagram.puml
@@ -0,0 +1,39 @@
@startuml

package "measurements.py" {
[DataSaver]
[Runner]
[Measurement]
}
package "data_set.py" {
[DataSet]
}
package "sqlite_base.py" {
[sqlite functions]
}

package "experiment_container.py" {
[Experiment]
}

package "param_spec.py" {
[ParamSpec]
}

database "SQLite" {
[experiment.db]
}

[Measurement] -> [Runner] : Calling 'run' creates:
[Measurement] --> [ParamSpec] : Registers instances of:
[Runner] --> [DataSet] : '~__enter~__' creates:\n'~__exit~__' flushes:
[Runner] -> [DataSaver] : '~__enter~__' returns:
[Runner] --> [Experiment] : Creates DataSet with ref to:
[DataSaver] --> [DataSet] : Stores data via:
[DataSet] -> [ParamSpec] : Holds instances of:
[DataSet] --> [sqlite functions] : Inserts data into DB
[Experiment] --> [sqlite functions] : Creates experiments in DB
[sqlite functions] --> [experiment.db] : SQL calls


@enduml
52 changes: 52 additions & 0 deletions docs/dataset/figures/datasetdiagram.svg
Binary file added docs/dataset/figures/dependencies_01.pdf