What is the advantage of using yacs instead of traditional yaml files and yaml parsing? #56

Open
minimatest opened this issue Sep 3, 2022 · 1 comment


@minimatest

I know, it's a stupid and naive question. But I'm a beginner and I'm struggling to find a major use case for it in my workflow, so it would help a lot if the advantages could be spelled out!

Also, does YACS support assigning Python objects and variables to configuration parameters (unlike YAML)? For example, if one wanted to specify a callback function as a parameter, is that possible with YACS? (Of course, you can't specify this in YAML, and the workaround is a hacky getattr call.)

@jveitchmichaelis

jveitchmichaelis commented Jan 3, 2023

Here are my thoughts on this, as a user (tl;dr: YACS is simple and lightweight, but there are a few options out there). First, what makes a "good" configuration system?

  • Generally you want to have a default global configuration for your application/experiment
  • Experiments should be repeatable, and it's nice to store the initial conditions of the program in one place
  • We'd like to modify our default config with user-specified parameters
  • We want users to have some conveniences, like only having to provide the parameters they change, and it'd be nice to have hierarchical configurations (e.g. inheritance). This means users don't need to specify every single parameter every time. Suppose you update your package and add 10 new options: you don't want that to break existing code, and hard-updating a ton of child configs is also a pain.
  • You want to provide different ways of supplying and overriding options e.g. loading from a file on disk, command line args, function arguments, from a dictionary/other object.
  • The configuration should be validated. The simplest form is type and bounds checking, but this could also include things like checking that a date is in a valid range (e.g. in the future). We also want to ensure that users can't accidentally mistype options that they think are being used but aren't.
  • "dot mapping" from keys to attributes is pleasing to the eye
  • We should have easy serialisation
  • Support for callbacks and other things is nice, though I think a lookup table from a list of choices is also a valid approach.
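
To make the validation bullet concrete, here is a minimal stdlib-only sketch of type checking, bounds checking, a "must be in the future" date rule and typo detection. The SCHEMA table, its keys (lr, epochs, start_date) and the validate() helper are hypothetical illustrations, not part of YACS; a library like schema would replace the hand-rolled table.

```python
from datetime import date

# Hypothetical schema: key -> (expected type, check, message shown when the check fails)
SCHEMA = {
    "lr": (float, lambda v: 0.0 < v < 1.0, "must be in (0, 1)"),
    "epochs": (int, lambda v: v > 0, "must be positive"),
    "start_date": (date, lambda v: v > date.today(), "must be in the future"),
}


def validate(cfg):
    """Return a list of problems found in cfg; an empty list means it is valid."""
    errors = []
    # Typo check: a key the program never reads is almost certainly a mistake.
    for key in cfg:
        if key not in SCHEMA:
            errors.append(f"unknown key {key!r}")
    # Type and bounds checks for every expected key.
    for key, (expected_type, check, message) in SCHEMA.items():
        if key not in cfg:
            errors.append(f"missing key {key!r}")
        elif not isinstance(cfg[key], expected_type):
            errors.append(f"{key!r}: expected {expected_type.__name__}, "
                          f"got {type(cfg[key]).__name__}")
        elif not check(cfg[key]):
            errors.append(f"{key!r}: {message}")
    return errors


print(validate({"lr": 0.01, "epochs": 10, "start_date": date(2100, 1, 1)}))
print(validate({"lr": 2.0, "epoch": 10, "start_date": date(2100, 1, 1)}))
```

The second call reports the typo ("epoch"), the missing "epochs" and the out-of-range learning rate in one pass, which is friendlier than failing on the first problem.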

YACS is quite simple (under 500 lines of code), but it handles a lot of tedious edge cases that you'd otherwise have to implement yourself. It's also well tested, since a lot of people have used it through Detectron. I'm curious to hear what other people are using, because there don't seem to be many configuration management systems for Python. There is Hydra, which is a lot heavier but can also handle more complex things like running jobs. Hydra is built on OmegaConf. There's also anyconfig and dynaconf. All of these do mostly the same things and were built to solve the same problems. There are also libraries like schema, which you could use to validate a structure and apply defaults.

If you just need to load a few options from a flat file, then there's no problem using the INI format or plain YAML. But at some point you can outgrow those. For example, you'd need to implement some kind of dictionary wrapper if you want to access keys as attributes instead of strings. I think this is much cleaner to look at, but that's personal preference. I think configparser allows you to set defaults, so you do get some basic inheritance there.
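The "dictionary wrapper" mentioned above can be sketched in a few lines. This is a minimal illustration of the idea, assuming plain nested dicts as input; the AttrDict name is hypothetical. YACS's CfgNode is likewise a dict subclass with attribute access, but this is not its code.

```python
class AttrDict(dict):
    """A dict whose keys can also be read and written as attributes (cfg.model.depth)."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # Wrap nested plain dicts so dot access works at every level.
        for key, value in list(self.items()):
            if isinstance(value, dict) and not isinstance(value, AttrDict):
                self[key] = AttrDict(value)

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails, so dict methods still work.
        try:
            return self[name]
        except KeyError:
            # AttributeError (not KeyError) keeps hasattr() and getattr(obj, name, default) working.
            raise AttributeError(name) from None

    def __setattr__(self, name, value):
        # Store attributes as dict items; wrap plain dicts for consistency.
        if isinstance(value, dict) and not isinstance(value, AttrDict):
            value = AttrDict(value)
        self[name] = value


cfg = AttrDict({"model": {"name": "resnet50", "depth": 50}, "lr": 0.1})
cfg.lr = 0.01
print(cfg.model.depth, cfg["lr"])
```

One caveat of this design: a key that collides with a dict method (such as "items" or "keys") can only be read with square brackets.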

Let's take the example of inheriting a config and updating a single value. You need to write the logic to sanely update a nested dictionary. Python's dict union operator | is not sufficient: if you have {a: 1, b: {c: 2, d: 3}} and you unite it with {b: {c: 4}}, Python replaces all of b, so b.d no longer exists. You could write a function to do this, but YACS provides one, and it also handles multiple levels of inheritance through different files. Other libraries also do some fancy interpolation, e.g. referencing other variables in the config and more "dynamic" parameters.
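The clobbering problem and the recursive fix look like this in plain Python (3.9+ for the | operator). deep_merge is a hypothetical helper written for illustration; in YACS the equivalent work happens inside methods such as merge_from_file and merge_from_other_cfg. Folding a list of layers with reduce mimics several levels of inheritance.

```python
import copy
from functools import reduce


def deep_merge(base, override):
    """Return a new dict with override merged into base, recursing into nested dicts."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are dicts: merge key by key instead of replacing wholesale.
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


defaults = {"a": 1, "b": {"c": 2, "d": 3}}
override = {"b": {"c": 4}}

print(defaults | override)             # {'a': 1, 'b': {'c': 4}}  (b.d is gone)
print(deep_merge(defaults, override))  # {'a': 1, 'b': {'c': 4, 'd': 3}}

# Several levels of "inheritance": defaults <- base experiment <- child experiment.
layers = [defaults, {"b": {"d": 5}}, {"a": 10}]
print(reduce(deep_merge, layers))      # {'a': 10, 'b': {'c': 2, 'd': 5}}
```

Because deep_merge copies instead of mutating, the shared defaults dict stays untouched no matter how many experiments are derived from it.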

YACS provides a bunch of convenience functions to allow merging from a dictionary, from a YAML file on disk and others. The README has some information on the philosophy here, which you may or may not agree with (e.g. command-line arguments are handled in a somewhat non-standard way if you're used to argparse).
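The "non-standard" command-line handling refers to YACS's merge_from_list, which takes a flat list of alternating dotted keys and values instead of declaring one argparse flag per option. The helper below is a hypothetical stdlib-only sketch of that convention, not YACS's implementation; in particular it skips the type check that YACS applies to the parsed value.

```python
import ast


def merge_from_list(cfg, opts):
    """Apply overrides given as a flat list: ["KEY.PATH", "value", "OTHER.KEY", "value", ...].

    Hypothetical stdlib-only helper modelled on the YACS convention; values arrive as
    strings (e.g. from sys.argv) and are parsed as Python literals when possible.
    """
    if len(opts) % 2 != 0:
        raise ValueError("overrides must come in KEY VALUE pairs")
    for full_key, raw in zip(opts[0::2], opts[1::2]):
        *parents, leaf = full_key.split(".")
        node = cfg
        for part in parents:
            node = node[part]  # raises KeyError for an unknown section
        if leaf not in node:
            raise KeyError(f"unknown config key: {full_key}")
        try:
            value = ast.literal_eval(raw)  # "0.01" -> 0.01, "[10, 20]" -> [10, 20]
        except (ValueError, SyntaxError):
            value = raw  # not a literal, e.g. "resnet50": keep it as a string
        node[leaf] = value
    return cfg


cfg = {"model": {"name": "resnet18"}, "solver": {"lr": 0.1, "steps": [30, 60]}}
# e.g. `python train.py solver.lr 0.01 model.name resnet50` with opts = sys.argv[1:]
merge_from_list(cfg, ["solver.lr", "0.01", "model.name", "resnet50", "solver.steps", "[10, 20]"])
print(cfg)
```

The upside of this style is that every config key is overridable from the command line without writing a single add_argument call; the downside is that --help can't list the options.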

There isn't much in the way of validation; you could look at a library like schema for that. However, YACS should complain if you provide a key which isn't in the default config. This is nice because it catches typos. It also lets you flag configuration options as deprecated or renamed, control whether new keys are allowed when merging, and so on. Finally, it does some type checking and coercion, e.g. it complains if you try to update an int with a string.
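The two checks described above (unknown keys and type mismatches) can be sketched as a strict variant of a recursive merge. strict_merge is a hypothetical stdlib-only helper; YACS's real rules differ in the details (for instance it permits a few coercions), so treat this as an illustration of the idea only.

```python
import copy


def strict_merge(defaults, override, prefix=""):
    """Return defaults updated with override, rejecting unknown keys and type changes.

    A stdlib-only sketch of the checks described above, not YACS's own code.
    """
    merged = copy.deepcopy(defaults)
    for key, value in override.items():
        full_key = prefix + key
        if key not in defaults:
            # Probably a typo (e.g. "lrr" for "lr"): fail loudly instead of silently ignoring it.
            raise KeyError(f"non-existent config key: {full_key}")
        old = defaults[key]
        if isinstance(old, dict):
            if not isinstance(value, dict):
                raise TypeError(f"{full_key}: expected a mapping, got {type(value).__name__}")
            merged[key] = strict_merge(old, value, full_key + ".")
        elif old is not None and type(value) is not type(old):
            # A default of None means "any type"; otherwise the type must match exactly.
            raise TypeError(f"{full_key}: expected {type(old).__name__}, got {type(value).__name__}")
        else:
            merged[key] = copy.deepcopy(value)
    return merged


defaults = {"solver": {"lr": 0.1, "max_iter": 1000}, "name": "baseline"}
print(strict_merge(defaults, {"solver": {"lr": 0.01}}))

for bad in ({"solver": {"lrr": 0.01}}, {"solver": {"max_iter": "2000"}}):
    try:
        strict_merge(defaults, bad)
    except (KeyError, TypeError) as err:
        print(f"rejected {bad}: {err}")
```

Exact type matching is deliberately strict: it also rejects 1 where a float default is expected, which is noisy but makes silent mistakes impossible.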

YACS also allows you to enforce immutability, so that once you've started your experiment/app and handled user input, everything is frozen. This is very useful for repeatable experiments because you can log/store the config and trust that the parameters weren't modified later on.
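In YACS this is done with cfg.freeze() (and cfg.defrost() to undo it). Without YACS, a rough stdlib approximation is to convert the nested config into read-only views; the freeze() function below is a hypothetical sketch that only handles dicts and lists, so other mutable values (sets, custom objects) would stay mutable.

```python
from types import MappingProxyType


def freeze(value):
    """Return a read-only copy of a nested config: dicts become mappingproxy, lists tuples."""
    if isinstance(value, dict):
        # The proxy wraps a fresh dict that nothing else holds a reference to.
        return MappingProxyType({k: freeze(v) for k, v in value.items()})
    if isinstance(value, list):
        return tuple(freeze(v) for v in value)
    return value


cfg = {"solver": {"lr": 0.1, "steps": [30, 60]}, "seed": 42}
frozen = freeze(cfg)

try:
    frozen["solver"]["lr"] = 1.0
except TypeError as err:
    print("refused:", err)

cfg["solver"]["lr"] = 99.0  # later edits to the original dict don't leak into the frozen copy
print(frozen["solver"]["lr"])
```

Freezing right after all merges are done gives a single point after which the logged config is guaranteed to match what the program actually used.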

You also get serialisation, but that's a very thin wrapper around a YAML dump, and you could easily customise it.
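To show how little there is to it, here is a save/load round trip. YAML needs the third-party PyYAML package, so this stdlib-only sketch uses json as a stand-in (with PyYAML, yaml.safe_dump and yaml.safe_load play the same roles, and a YACS CfgNode has a dump() method that returns YAML text). save_config and load_config are hypothetical helper names.

```python
import json
import tempfile
from pathlib import Path


def save_config(cfg, path):
    # sort_keys + indent give a stable, diff-friendly file to log next to results.
    Path(path).write_text(json.dumps(cfg, indent=2, sort_keys=True))


def load_config(path):
    return json.loads(Path(path).read_text())


cfg = {"solver": {"lr": 0.01, "steps": [30, 60]}, "seed": 42}
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "config.json"
    save_config(cfg, path)
    restored = load_config(path)

print(restored == cfg)  # True: the saved file is enough to rebuild the config
```

One thing a custom wrapper may need to handle: tuples come back from JSON as lists, so a round trip is only exact for configs built from dicts, lists and scalars.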

Functions-as-values is an interesting case. I'm not sure what the cleanest way to handle that is. This is a nice blog post that discusses using a registry decorator: https://julienbeaulieu.github.io/2020/03/16/building-a-flexible-configuration-system-for-deep-learning-models/
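The registry idea keeps the config plain YAML: functions register themselves under a string name, and the config stores only that name, which avoids the hacky getattr lookup from the question. This is a minimal stdlib sketch of the pattern, not the blog post's code; CALLBACKS, register and both callbacks are hypothetical names.

```python
from typing import Callable, Dict

# Hypothetical global registry mapping config-friendly names to callables.
CALLBACKS: Dict[str, Callable] = {}


def register(name):
    """Decorator that stores a function in CALLBACKS under `name`."""
    def decorator(fn):
        if name in CALLBACKS:
            raise KeyError(f"callback {name!r} is already registered")
        CALLBACKS[name] = fn
        return fn  # return the function unchanged so it can still be called directly
    return decorator


@register("early_stopping")
def early_stopping(metrics):
    # Return True to ask the training loop to stop.
    return metrics["val_loss"] > metrics["best_val_loss"]


@register("log_metrics")
def log_metrics(metrics):
    print(metrics)
    return False


# The config only holds plain strings, so it stays valid YAML.
cfg = {"callbacks": ["log_metrics", "early_stopping"]}
callbacks = [CALLBACKS[name] for name in cfg["callbacks"]]
```

A misspelled name in the config fails immediately with a KeyError at lookup time, and the registry doubles as the "lookup table from a list of choices" mentioned earlier.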
