Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Need more discussion and explanation of deep-lookup operator #854

Closed
michaelhkay opened this issue Nov 22, 2023 · 13 comments
Closed

Need more discussion and explanation of deep-lookup operator #854

michaelhkay opened this issue Nov 22, 2023 · 13 comments
Labels
Discussion A discussion on a general topic. XDM An issue related to the XPath Data Model

Comments

@michaelhkay
Copy link
Contributor

michaelhkay commented Nov 22, 2023

During discussion of the ?? operator it was pointed out that we need more examples and explanation, especially of how to handle cases where the "flattening" behaviour of the operator is inconvenient. This applies equally to paths using the existing ? operator - $x?y?z and $x??z both have this problem. For example, there is no way of filtering the result of$x?y?z or $x??z to select only members of size 3.

@michaelhkay
Copy link
Contributor Author

I would like to explore whether we can follow the example of JSONPath by making the lookup operators return a more complex object that is then "flattened" (in some sense) when the context demands it. This is analagous to the way the result of a path expression on a node tree is atomised when the context demands atomic values rather than nodes.

Let's follow JSONPath by saying that the lookup operators return a sequence of "member locators", one for each member of an array or map that is selected. Except where otherwise specified, the effect of supplying a sequence of member locators to any function or operator is to flatten it to a sequence of items (thus preserving compatibility). But some operations are available that treat it differently: for example

  • An operation that delivers the sequence of member locators as a sequence of arrays/parcels/value-records, one for each value locator.
  • An operation that delivers the location information of where the value was found - specifically, the containing map or array, and the key/index of the value within the containing map or array.

This then provides the necessary infrastructure to support PR #832 - so long as the downwards selection is done using lookup operators, (a) we can update the located members as a unit (rather than operating on individual items), and (b) we can propagate updates upwards to return a new version of the containing tree.

Finding syntax, as always, is difficult. Perhaps we could write members{$x?y?z} or members{$x??z} to prevent the flattening.

array:members() would then likewise deliver a sequence of "member locators". Functions would available on member locators to obtain the value of the member (a sequence), its key/index, its containing map/array, etc.

@ChristianGruen
Copy link
Contributor

Finding syntax, as always, is difficult. Perhaps we could write members{$x?y?z} or members{$x??z} to prevent the flattening.

Before discussing syntactical questions, it would be helpful to head for some common terminology. Right now, it seems hard (at least to me) to follow what we believe is, or could be, a member, an entry, a value, etc.

Here is what I think is the status quo:

  • A member is a value stored inside an array, without any information about where it appears in the array.
  • An entry is a key-value pair stored in a map.
  • A value is either the very basic unit of the data model, or the value of a map entry.

With the proposal above, would you suggest sticking with this terminology?
Or would you rather suggest renaming map entries to members?
If not, what unit would be referenced by a member locator of a map entry?

In #826 (comment), I suggested defining a function for both arrays and maps with the same name, which returns a singleton map for each unit. I proposed array:entries and map:entries

[ (), 1] =>  array:entries()               (: map { 1: () }, map { 2: 1 } :)
map { 'a': (), 'b': 1 } =>  map:entries()  (: map { 'a': () }, map { 'b': 1 } :)

…but following this thread I assume the functions (provided we’d want them) would rather be called array:members and maps:members?

@ChristianGruen ChristianGruen added XDM An issue related to the XPath Data Model Discussion A discussion on a general topic. labels Nov 23, 2023
@michaelhkay
Copy link
Contributor Author

Outline, sketchy thoughts:

  1. Introduce objects and classes - see From Records to Objects #720
  2. Introduce a system-defined class Entry representing a single entry in a map or array, together with its location within that map or array. Methods include ?key() to give the key (map key or array index), ?value() to give the corresponding value, ?owner() to give the containing map or array.
  3. Introduce an implicit conversion called unpackaging, along the same lines as atomization. If an Entry E is supplied in a context where the required type is something other than Entry, the Entry E is implicitly replaced by E?value().
  4. The lookup operator, together with various other constructs such as array:members(), map:entries(), for member $m, etc are changed to return a sequence of entries, relying on implicit unpackaging to achieve compatibility.
  5. We define some kind of magic syntax to force the result of a lookup to be a set of entries without unpackaging, for example entry{a?b?c}. This means two things: (a) the values are not flattened, so we can treat the value as a whole. For example count(entry{a?b?c}[empty(?value())]) returns the number of entries whose value is an empty sequence. And (b) the result gives us access to the location of the entries in the JTree that we were searching, making it usable for deep update operations, e.g. deep-delete($map, entry{??note}).
  6. Filtering remains a challenge. Currently $map?*[. gt 3] filters the result of ?* after flattening. How do we select all the entries of size 4 or more? We could do entry{$map?*}[count(?value()) ge 4]. But that gets messy if it's in the middle of a path.
  7. I'm very reluctant to introduce more cryptic symbols, but perhaps we need $map^key to select entries without flattening?
  8. Or perhaps we make it a property of the map that lookups are unflattening: hilly($map)?key creates a hilly map with the same contents as $map, and when lookup is applied to a hilly map, it doesn't do flattening.

@ChristianGruen
Copy link
Contributor

The lookup operator, together with various other constructs such as array:members(), map:entries(), for member $m, etc are changed to return a sequence of entries, relying on implicit unpackaging to achieve compatibility.

If lookups return Entry items, can empty([()]?*) still be true? In other words, are entries to be unpacked whenever you perform any operation on them, or will there still be exceptions?

In any case, I think it should be possible to be able to write simple constructs like for member $m in $array return count($m) without further syntax.

And I wonder whether there are additional use cases apart from updates to justify this fairly fundamental enhancement? The title of this issue is about the deep-lookup operator. What would be the connection?

Sorry for all the questions, which aren't meant to be heretic. Probably I can't see yet how a majority of users could benefit from the changes.

@michaelhkay
Copy link
Contributor Author

The title of this issue is about the deep-lookup operator.

Dimitre raised the concern about the fact that the deep-lookup operator (??c) does implicit flattening and I promised to look into it. The first thing you notice is that shallow lookup (?a?b?c) has exactly the same problem. I think there are two sides to the problem: one is that you lose information about the boundaries between different values that were selected: for example, you can't ask how many entries selected by ?a?b?c were empty, because they have vanished. The other is that you lose location information, such as you would get from an equivalent expression in JSONPath, and which is needed for doing deep update.

With traditional node selection, a path like /a/b/@c gives you nodes, and if you're only interested in the values, then atomization happens implicitly. I'm searching for a way of achieving a similar convenience with maps and arrays.

@michaelhkay
Copy link
Contributor Author

Perhaps we can make it work like this:

  1. We introduce a new kind of item called a JNode. We can talk later about where this fits into the type system; it could be a new primitive, or it could be introduced as part of some general move towards an extensible and object-oriented type system.
  2. A map or array can be converted to a JNode with a function call jnode($map-or-array). The resulting jnode is the root of a tree. It has a value() property which is the original map or array.
  3. The lookup operators ? and ?? when applied to a JNode return a sequence of JNodes. Each of these JNodes represents an entry in a map or array with the requested key or index. These JNodes have additional properties: key() gives the associated map key (or array index), owner() gives the containing JNode, root() gives the root JNode.
  4. So to count how many ??phone entries in a tree are empty, you can do jnode($map)??phone[empty(?value())]. That's because ??phone when applied to a JNode doesn't return the phone number, it returns a JNode containing the phone number as its value().
  5. When the lookup operator is applied to a JNode N, the search effectively takes place within the JNode's value(), but the selected JNodes in the result identify the JNode via which they were found, giving a full ancestor trace back to the root.
  6. Because JNodes found via a sequence of lookup operations carry this link back up the search tree, deep update becomes possible: for example deep delete ?a?b??note from $map can wrap $map in a JNode, select the descendant JNodes to be deleted using the expression ?a?b??note applied to that JNode as its context item, and return a new map that differs from $map in that the relevant entries are absent.

@ChristianGruen
Copy link
Contributor

ChristianGruen commented Nov 25, 2023

I definitely like the last approach. It reminds me of a project we did some time ago to be able to traverse in maps & arrays with the help of function items (only now I notice it’s very similar). Here’s some basic runnable code that presents the basic idea:

(: creates a function item encapsulating a value and its optional parent :)
declare function Q{Path}new($value, $parent := ()) {
  function($op) {
    if(string($op) = '_value') then (
      (: return encapsulated value :)
      $value
    ) else if(string($op) = '_parent') then (
      (: create function item pointing to the parent :)
      Q{Path}new($parent, $parent ! .('_parent'))
    ) else (
      (: create function item pointing to a child :)
      Q{Path}new($value?($op), $value)
    )
  }
};

let $value := [ map {}, map { 'xyz': 'text' } ]
let $path := Q{Path}new($value)
return $path(2)('xyz')('_parent')('_value')

If we allowed lookups on function items (see #51), the last line could be written as follows:

return $path?2?xyz?_parent?_value

@michaelhkay
Copy link
Contributor Author

Indeed, that's very similar in concept. There's a problem in both solutions that they require reserved names for special properties (such as _parent and _value). Could adopt a pragmatic approach to that, for example if you want to refer to a standard user property that clashes with one of the magic names, you have to write it in quotes.

Can one get by without introducing a new data type, and just do everything by layering on top of the types we already have? I think that's probably not going to be too usable: at some level you want to be able to say "is this item a JNode?". But it might be that the most economical way of achieving that is with annotations, so a JNode is just a map annotated with %class="JNode".

@ChristianGruen
Copy link
Contributor

There's a problem in both solutions that they require reserved names for special properties (such as _parent and _value).

Yes. As long as the lookup operator only accepts atomic values, we could pass on sequences with 0 or more than 2 items or function items. Next, the proposed approach does not support wildcard or descendant traversals.

One solution could be to define the actual operation in the supplied function item:

declare function Q{Locator}new($context, $ancestors := []) {
  function($arg) {
    let $path := array:join(($ancestors, [ $context ]))
    return if($arg instance of function(*)) then (
      (: compute result :) $arg($path)
    ) else (
      (: create child locator :) Q{Locator}new($context?$arg, $path)
    )
  }
};
declare function Q{Locator}parent($path) {
  let $ancestors := $path => array:trunk()
  return Q{Locator}new(array:foot($ancestors), array:trunk($ancestors))
};
declare function Q{Locator}value($path) {
  $path => array:foot()
};

let $value := [ map {}, map { 'xyz': 'text' } ]
let $locator := Q{Locator}new($value)
return $locator(2)('xyz')(Q{Locator}parent#1)(Q{Locator}value#1)

Can one get by without introducing a new data type, and just do everything by layering on top of the types we already have?

One proof-of-concept solution for fn:deep-delete($map-or-array, fn { ?2?xyz }) could be to…

  1. wrap $map-or-array into a function of the type above (which gives us LOCATOR),
  2. invoke the selector argument with LOCATOR (which gives us a RESULT), and
  3. retrieve the path by calling RESULT(identity#1) (which gives us a PATH), and
  4. build the updated map/array utilizing PATH.

Following these steps, it should be fairly easy to write such a fn:deep-delete function in XQuery. But…

I think that's probably not going to be too usable: at some level you want to be able to say "is this item a JNode?"

…I definitely agree. Error messages for the selector function would be close to undecipherable. A %jnode or %locator annotation would be a start. The next step would be object-like structures (#720).

Said this, I still have some sympathy for an even simpler solution (although it’s far from visionary):

fn:deep-delete($map-or-array, [ 2, 'xyz' ])

@michaelhkay
Copy link
Contributor Author

fn:deep-delete($map-or-array, [ 2, 'xyz' ])

I think the problem with the tumbler approach is that practical use cases will invitably involve predicates.

@dnovatchev
Copy link
Contributor

dnovatchev commented Nov 26, 2023

I am sorry I missed this interesting discussion.

What is wrong with returning a sequence of arrays, each containing a single member?

I like the idea of JNodes, but don't like the name JNode. Maybe use a name like "structure" or "composite", or "packaged", or "wrapped"? In .NET the term used is "boxed" and "unboxed".

function($x as wrapped(some-type) ) means that the function has an parameter $x that is a singleton-array that contains a single member whose type is some-type.

In case when a function accepts an argument of type some-type but is passed a wrapped(some-type) value, then we could introduce a rule that in any such case the wrapped(some-type) value will be coerced to a value of the expected type some-type (unwrapped).

@ndw
Copy link
Contributor

ndw commented Jun 4, 2024

Addressed by PR #832

@ndw ndw added the PR Pending A PR has been raised to resolve this issue label Jun 4, 2024
@ChristianGruen ChristianGruen removed the PR Pending A PR has been raised to resolve this issue label Oct 8, 2024
@michaelhkay
Copy link
Contributor Author

Although PR #832 was abandoned, the work was progressed and the current design using modifiers after the ? or ?? operator addresses the issues raised here. I'm therefore closing the issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Discussion A discussion on a general topic. XDM An issue related to the XPath Data Model
Projects
None yet
Development

No branches or pull requests

4 participants