Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

jooc: why are some values in stream meta-data lists of dicts and some just dicts? #77

Closed
dmedine opened this issue Sep 3, 2021 · 2 comments

Comments

@dmedine
Copy link

dmedine commented Sep 3, 2021

I find this rather confusing. For example, if I want to look at channel labels for a stream I need something like:

for i, ch in enumerate(eeg_stream['info']['desc'][0]['channels'][0]['channel']):
    print(ch['label'][0])

In what scenario would there ever be more than one label, desc, or (for that matter) list of channel dictionaries in info? It makes sense that 'channels' contains a list of 'channel' dicts, but that its value itself be a list of one list of 'channel' dicts.

Whenever I work with this I find myself constantly having to remind which values are lists and which are simply values. Especially confusing to me is the fact that nominal_srate is a list (again, there will only ever be one so why?) but effective_srate is not a list.

I am sure there is a perfectly good explanation for this but I can't work out what it could be.

@cboulay
Copy link
Contributor

cboulay commented Sep 3, 2021

I think it's just due to the _xml2dict implementation that treats every level like a list:

pyxdf/pyxdf/pyxdf.py

Lines 490 to 496 in 5384490

def _xml2dict(t):
"""Convert an attribute-less etree.Element into a dict."""
dd = defaultdict(list)
for dc in map(_xml2dict, list(t)):
for k, v in dc.items():
dd[k].append(v)
return {t.tag: dd or t.text}

The same thing happens whenever you use scipy.io.loadmat to load a matlab .mat file. THAT function, however, has an option that you can set squeeze_me=True that eliminates this extra container.

#74 is a PR that I haven't had time to get back to but changes a lot of the pyxdf code. One of the things I do in that PR is flatten the stream header, exactly as you'd expect.

pyxdf/pyxdf/pyxdf.py

Lines 759 to 762 in c037ae5

flatted_stream_header = {
k: v[0] if isinstance(v, list) and (len(v) == 1) else v
for k, v in chunk["info"].items()
}

If you have time to test the PR then maybe we can get it merged in sooner.

@tstenner
Copy link
Contributor

tstenner commented Sep 3, 2021

Let's hope I can bring this from "confusing" to "straightforward but entirely unsatisfying".

Your eeg_stream is a dict with the ['info'] field containing the XML data. In the absence of a schema, every field (other than the root element) in the XML can be specified as many times as there's space. So, something like this would be perfectly fine:

<info>
  <desc>
    <channels type="EEG">
      <channel>FPz</channel>
      <channel>FP1</channel>
      <channel>FP2</channel>
    </channels>
    <channels type="EOG">
      <channel>VEOG</channel>
      <channel>HEOG</channel>
    </channels>
  </desc>
</info>

The answer from the XML enthusiast community is "Use an XML schema" (like this), that says "There has to be exactly one <desc> element, which may contain zero or one <channel> elements". The XML reader can then infer that the info should look like this:

class Desc {
  List<Channel> channels?;
  String? metadata1;
  String? metadata2;
}
class Info {
  Desc desc;
  DateTime creationTime;
}

From what I've seen, the Matlab XML reader is already at its limits with the simple task of converting an XML string to a dictionary, and even if we were to drop it, adding new elements to the schema that were previously not there will break code that assumed its in a dictionary (something like eeg_stream.info.desc["foobar"][0] would then be eeg_stream.info.desc.foobar).

So, instead of a hugely complex mess where just getting a field from the info differs between all loaders, its simply "everythings a list of dicts, get used to sprinkling [0]). Replacing all single-item lists with the only item is tempting, but will break lots of generic code as soon as a list has only one item, e.g. [chan for chan in channels] will break if there's really only one channel.

@cbrnr cbrnr closed this as completed Jul 18, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants