Arrays: Representation of single members of an array #826

Open
ChristianGruen opened this issue Nov 9, 2023 · 13 comments
Labels
Discussion A discussion on a general topic. Editorial Minor typos, wording clarifications, example fixes, etc. PRG-hard Categorized as "hard" at the Prague f2f, 2024 PRG-required Categorized as "required for 4.0" at the Prague f2f, 2024 XQFO An issue related to Functions and Operators

Comments

@ChristianGruen
Contributor

ChristianGruen commented Nov 9, 2023

When introducing the new array features to some users, the for member syntax was welcomed by everyone.

However, there was some confusion (again, see my past feedback to the mailing list) about what the QT4 group considers to be “members of an array”, and about value records.

The “value record” representation of arrays in particular led to questions that I didn’t have a good answer for. People didn’t understand why an array member was returned as a map, and why that map is (again) called “array member” or “value record”, a term no one associated with arrays (at least for now, which is not too surprising, as it has only just been introduced).

Next, due to atomization (as mentioned before), array:split allows us to omit the explicit ?value lookups that are required for array:members:

sum(array:members($array)?value)
sum(array:split($array))

I suppose I have been biased in my presentation, but I’ve failed to give good arguments to justify the current solution in the spec. The questions that I think need to be answered are:

  • How will people benefit from the (usually intermediate) map representation for array members?
  • What exactly do we win with array:members and array:of-members instead of using the existing array:join function, combined with the new array:split function?

Out of interest, I have rewritten the formal equivalencies for the array functions with array:split/array:join:

  • array:append

array:of-members((array:members($array), map{'value':$member}))
array:join((array:split($array), array { $member }))

  • array:build

array:of-members($input ! map { 'value': $action(.) })
array:join($input ! array { $action(.) })

  • array:filter

array:of-members(array:members($array) => filter(function($m) { $predicate($m?value) }))
array:join(array:split($array) => filter(function($m) { $predicate($m?*) }))

  • array:for-each

array:of-members(array:members($array) ! map { 'value': $action(?value) })
array:join(array:split($array) ! array { $action(?*) })

  • array:for-each-pair

array:of-members(
  for-each-pair(array:members($array1),
    array:members($array2),
    function($m, $n) {map{'value': $action($m?value, $n?value)}}))
array:join(
  for-each-pair(array:split($array1), array:split($array2),
    function($m, $n) { array { $action($m?*, $n?*) } }))

  • array:insert-before

array:of-members(array:members($array) => insert-before($position, map{'value':$member}))
array:join(array:split($array) => insert-before($position, array { $member }))

  • array:remove

array:of-members(array:members($array) => remove($positions))
array:join(array:split($array) => remove($positions))

  • array:reverse

array:of-members(array:members($array) => reverse())
array:join(array:split($array) => reverse())

  • array:slice

array:of-members(array:members($array) => slice($start, $end, $step))
array:join(array:split($array) => slice($start, $end, $step))

  • array:sort

array:of-members(array:members($array) => sort($collation, function($x) { $key($x?value) }))
array:join(array:split($array) => sort($collation, function($x) { $key($x?*) }))

  • array:subarray

array:of-members(array:members($array) => subsequence($start, $length))
array:join(array:split($array) => subsequence($start, $length))

  • array { $sequence }

array:of-members($sequence ! map { 'value': . })
array:join($sequence ! array { . })

  • [E1, E2, E3, ..., En]

array:of-members((map { 'value': E1 }, map { 'value': E2 }, map { 'value': E3 }, ..., map { 'value': En }))
array:join((array { E1 }, array { E2 }, array { E3 }, ..., array { En }))

  • $array?*

array:members($array) ! ?value
array:split($array) ! ?*

  • $array?$N / $array($N)

array:members($array)[$N]?value
array:split($array)[$N]?* (or array:get($array, $N))


As a side note, I noticed that the equivalence given for array:join must be buggy:

(: current equivalence presented in the spec :)
array:of-members($arrays ! array:members(.))

(: returns [ 1, 2, 3 ] :)
let $arrays := ([ 1 ], [ 2, 3 ])
return array:of-members($arrays ! array:members(.))

In conclusion, if I could choose, I would lean towards dropping array:members and array:of-members and renaming array:split to array:members.
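
For readers who want to compare the two representations without an XQuery 4.0 processor, the following is a minimal Python sketch. It is a model under stated assumptions, not the spec's definition: an array is a Python list, each member is a tuple standing for an arbitrary XDM sequence, and the Python names array_members, array_of_members, array_split and array_join are illustrative stand-ins for the functions discussed above.

```python
from itertools import chain

# Assumption: an XDM array is modelled as a Python list whose members are
# tuples; each tuple stands for an arbitrary XDM sequence (possibly empty).

def array_members(array):
    """Value-record view: one {'value': member} dict per member."""
    return [{"value": member} for member in array]

def array_of_members(records):
    """Rebuild an array from a sequence of value records."""
    return [record["value"] for record in records]

def array_split(array):
    """Singleton-array view: one single-member array per member."""
    return [[member] for member in array]

def array_join(arrays):
    """Concatenate the members of a sequence of arrays into one array."""
    return list(chain.from_iterable(arrays))

# An array with three members: (1), (2, 3) and the empty sequence.
array = [(1,), (2, 3), ()]

# array:reverse expressed with each pair of functions.
via_records = array_of_members(list(reversed(array_members(array))))
via_arrays = array_join(list(reversed(array_split(array))))
```

In this model both routes give the same reversed array; the only difference is the wrapper (a dict or a one-member list) used for the intermediate parts.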

@ChristianGruen ChristianGruen added XQFO An issue related to Functions and Operators Editorial Minor typos, wording clarifications, example fixes, etc. Discussion A discussion on a general topic. labels Nov 9, 2023
@michaelhkay
Contributor

michaelhkay commented Nov 9, 2023

The difference between array:split and array:members is essentially a choice on how to represent an array member: in one case we do it with a "value record" and in the other we do it with a singleton array.

In recent work I have experimented with both, and I have to say I'm not happy with either. Neither really works well when you attempt a transformation based on a recursive tree walk using pattern matching.

I'd like to consider going back to my original idea of splitting an array into "parcels" (or building an array from parcels), where a parcel is a zero-arity function carrying the annotation %parcel; calling the function delivers the contents of the array member. This is about as close as we can get to an encapsulated representation of the concept without actually extending the data model.

I've just re-read your email summarising feedback from BaseX users. It's a very useful contribution, but I think it's very much an XQuery users' perspective. It doesn't feel to me that these users are struggling with the challenge of doing complex structural transformations of JSON documents.
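
The parcel idea can be sketched in Python as follows. This is only an illustration under assumptions: the names parcel, split_into_parcels and build_from_parcels are hypothetical, the is_parcel attribute merely stands in for the %parcel annotation, and an array is modelled as a list of tuples.

```python
def parcel(member):
    """Wrap one array member in a zero-arity function (hypothetical model)."""
    def unwrap():
        # Calling the parcel delivers the contents of the array member.
        return member
    unwrap.is_parcel = True  # stands in for the %parcel annotation
    return unwrap

def split_into_parcels(array):
    """Split an array (list of tuples) into one parcel per member."""
    return [parcel(member) for member in array]

def build_from_parcels(parcels):
    """Build an array by calling each parcel to obtain its member."""
    return [p() for p in parcels]

array = [(1,), (2, 3), ()]
parcels = split_into_parcels(array)
first_member = parcels[0]()  # invoking a parcel yields the member
rebuilt = build_from_parcels(parcels)
```

A pattern-matching rule could then recognise a part of an array by testing for the marker rather than by inspecting the shape of a map or array.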

@michaelhkay
Contributor

But I do agree that at the XQuery and XPath level, "for member $x in $array" and "for key $k value $v in $map" are nicer; and I'm inclined to (revert to) proposing something similar for XSLT:

<xsl:for item="$x" in="$sequence">...</xsl:for>
<xsl:for member="$m" in="$array">...</xsl:for>
<xsl:for key="$k" value="$v" in="$map">...</xsl:for>

In each case allowing the "loop body" part of the expression to be either a sequence constructor or a select attribute.

For join operations there's definitely a benefit in being able to bind range variables rather than the context item.

@ChristianGruen
Contributor Author

We use something like parcels for our current Java bindings: Java objects, in particular those that have no obvious XDM type, are wrapped into function items, and can explicitly be converted to XDM types by invoking them. It’s pretty convenient.

%member feels like an appropriate name (but I guess your vision is more generic and not necessarily limited to arrays).

I completely agree that this discussion is driven by XQuery, and I haven't considered generic map/array updates at all. In our world, complex updates on JSON are usually done with XQUF (sometimes verbose, and custom to our JSON XML representation, but definitely powerful and versatile):

'{ "one": 1, "due": 2, "three": 3 }'
! json:parse(.)
! (json update {
  delete node ./three,
  rename node ./one as 'uno'
})
! json:serialize(.)

@ChristianGruen
Contributor Author

As a side note, I noticed that the equivalence given for array:join must be buggy:

My side note can be ignored; the equivalent expression looks alright.

@ChristianGruen
Contributor Author

I'd like to consider going back to my original idea of splitting an array into "parcels" (or building an array from parcels), where a parcel is a zero-arity function carrying the annotation %parcel; calling the function delivers the contents of the array member. This is about as close as we can get to an encapsulated representation of the concept without actually extending the data model.

I believe we absolutely need to find other names for array:members and array:of-members:

  • In the XQFO spec, $member is defined multiple times as parameter, and its type is always item()*.
  • The rule of array:get (which also returns item()*) states that it “returns the member at a specified position in the array”.

If the return type will be %parcel functions, possible names could be array:parcels and array:of-parcels.

In any case, we may need to find and document more uses for these two functions, and cases where array:join wouldn’t work, or at least be more verbose. It’s a commonplace, but with any new concept, there’s some risk that we’ll overwhelm users.

@michaelhkay
Contributor

michaelhkay commented Nov 13, 2023

We need a mechanism to split an array into its parts (members) and to reassemble those parts in a different way. The question is, what is the best way of representing the parts? array:split and array:join represent the parts as a sequence of singleton arrays, and that is certainly one way of doing it; array:members and array:of-members represent the parts as "value records" and that is another way of doing it.

When we're doing a rule-based tree-walking transformation in the XSLT style, we want to write rules that process the parts of the array and transform them. That means we need to match them, which means we need to distinguish them from other kinds of value. The challenge is therefore to find a representation that makes these "parts of an array" easily recognisable as such. Splitting into "value records" serves that purpose rather better than splitting into sub-arrays, though it is by no means perfect.

When we work with XML, intermediate data values can be made very easily recognizable by choosing distinctive element names. Working with maps and arrays is much more difficult because there are no element names to match. Perhaps annotations can fill the gap.

@michaelhkay
Contributor

There was discussion today about deep lookup and deep update, and both of these would benefit from being able to talk about the "leaf values" in a map or array as something that's more than just a sequence of items. Rather in the same way that a text node is more than just a string.

Related: when we talk about key-value pairs in a map, I often find it awkward that the word "value" is used both to mean "any XDM value; a sequence", and to mean one part of a map entry. Things would get much easier if we could improve the terminology:

  • There are two kinds of functions: tabulated functions and procedural functions.
  • There are two kinds of tabulated functions: maps and arrays.
  • A tabulated function consists of a set of entries, called key-member pairs. The key is an atomic item, which in the case of an array is always an integer; the member is an arbitrary value.
  • The term "atomic item" (or just atom?) replaces "atomic value".

It would be nice to think of a deep-lookup returning a set of members, in the same way as a path expression selects a set of nodes, which is then implicitly flattened/atomized if the context requires a flat sequence. This still leaves all the options open for how "members" are represented.

@ChristianGruen
Contributor Author

ChristianGruen commented Nov 21, 2023

Related: when we talk about key-value pairs in a map, I often find it awkward that the word "value" is used both […]

Maybe values of map entries could be called members, and…

This still leaves all the options open for how "members" are represented.

…instead of array:members, we could have array:entries, which returns singleton maps with the array index as key and the member as value, and possibly array:merge (instead of array:of-members) to create an array from those entries. And we could have another thought on map:pair, map:pairs and map:of-pairs: I feel they're pretty redundant and could be removed.

@ChristianGruen
Contributor Author

The proposed functions could also be used to convert arrays to maps, and vice versa:

$array
=> array:entries() => map:merge()
=> map:entries() => array:merge()

array:merge could be defined to allow for the creation of sparsely populated arrays:

(: Result: [ (), (), 'III', (), 'V' ] :)
array:merge((map { 3: 'III' }, map { 5: 'V' }))
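
A minimal Python sketch of how the proposed array:entries and array:merge could behave, including the sparse case above. The semantics are assumptions made for illustration, since neither function is specified: indexes are 1-based, missing positions are filled with the empty sequence (modelled as an empty tuple), and a later entry for the same index replaces an earlier one.

```python
def array_entries(array):
    """One singleton dict {index: member} per member, with 1-based indexes."""
    return [{index: member} for index, member in enumerate(array, start=1)]

def array_merge(entries):
    """Build an array from index/member entries, padding gaps with ()."""
    merged = {}
    for entry in entries:
        merged.update(entry)  # assumption: the last entry for an index wins
    size = max(merged, default=0)
    return [merged.get(index, ()) for index in range(1, size + 1)]

# Sparse construction, mirroring array:merge((map { 3: 'III' }, map { 5: 'V' })).
sparse = array_merge([{3: ("III",)}, {5: ("V",)}])

# Round trip through the entry representation.
array = [("a",), ("b", "c"), ()]
round_trip = array_merge(array_entries(array))
```

One consequence of padding is that a trailing empty member cannot be expressed by omission: an array ending in () survives the round trip only because array_entries emits an explicit entry for it.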

@ndw ndw added PRG-hard Categorized as "hard" at the Prague f2f, 2024 PRG-required Categorized as "required for 4.0" at the Prague f2f, 2024 labels Jun 4, 2024
@michaelhkay
Contributor

As a result of #1331, we now have formal equivalents for all array and map functions (with the exception of map:find) that depend only on other map/array functions, or on a small set of primitive constructors and accessors defined in XDM. Moreover, these equivalents have now been tested, at least to the extent that they run all the array/map function examples successfully.

For the array functions, I have typically used array:members() and array:of-members() to convert arrays to sequences and back. This makes it particularly easy to define functions such as array:remove and array:reverse in terms of their sequence equivalents. It could have been done using array:split and array:join, but this would have given weaker type checking.

As an internal mechanism for defining the array/map functionality, these functions have proved very useful. Whether they are equally useful for "real users" is an open question. But I see no reason to make them private.

@ChristianGruen
Contributor Author

I believe we should continue this discussion and look at some questions more broadly. I’ll probably open a new issue for it.

This makes it particularly easy to define functions such as array:remove and array:reverse in terms of their sequence equivalents. It could have been done using array:split and array:join, but this would have given weaker type checking.

Just to understand: these are equivalent ways of writing array:reverse:

$array => array:split() => reverse() => array:join()
$array => array:members() => reverse() => array:of-members()

Why would array:members and array:of-members give us better typing? Doesn’t it simply represent another way to wrap the processed data?

I think the most confusing thing about array:members remains that it returns maps, and that array:of-members expects maps as input. I wonder whether we shouldn’t aim for more primitive data structures whenever we present equivalent code.

@michaelhkay
Contributor

Why would array:members and array:of-members give us better typing? Doesn’t it simply represent another way to wrap the processed data?

The key problem is when you get confused about whether the members of the array have been "parcelled" or not. If you pass a "parcelled" value to a function that's not expecting it, it's nice to get a type error. Ideally for this purpose a "parcel" would be a completely separate data type. Short of that, I think the "value record" representation is more likely to trigger a type error than the "array" representation. The other option, which probably scores better than either of the above, is to parcel the members as zero-arity function items.
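
One possible reading of this argument, sketched in Python. This is an illustrative model only: the checks below are assumptions about how a typed array:of-members and array:join might react, and arrays are modelled as lists of tuples. The scenario is a sequence of arrays that was never parcelled and is passed to the reassembly function by mistake.

```python
def array_of_members(records):
    """Accept only value records, i.e. dicts with the single key 'value'."""
    result = []
    for record in records:
        if not (isinstance(record, dict) and set(record) == {"value"}):
            raise TypeError(f"expected a value record, got {type(record).__name__}")
        result.append(record["value"])
    return result

def array_join(arrays):
    """Accept any sequence of arrays and concatenate their members."""
    result = []
    for array in arrays:
        if not isinstance(array, list):
            raise TypeError(f"expected an array, got {type(array).__name__}")
        result.extend(array)
    return result

# Two arrays that were meant to become the two members of a new array,
# but were never wrapped as parts.
unparcelled = [[(1,)], [(2,), (3,)]]

# The array representation accepts them and silently merges their members.
joined = array_join(unparcelled)

# The value-record representation rejects them with a type error.
try:
    array_of_members(unparcelled)
    rejected = False
except TypeError:
    rejected = True
```

In this sketch the mistake goes unnoticed with the array representation because an un-parcelled array is indistinguishable from a parcelled part, whereas a value record is a shape that ordinary data rarely has.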

@ChristianGruen
Contributor Author

The key problem is when you get confused about whether the members of the array have been "parcelled" or not. If you pass a "parcelled" value to a function that's not expecting it, it's nice to get a type error.

Thanks. Yes, I agree we might really need a custom type to improve typing. Otherwise, code like…

{ "v": 1 } => reverse() => array:of-members()

…might raise errors like “Cannot convert map(*) to record(value)”, which doesn’t really remind one of array operations. On the other hand, typing may not be too important at all if we primarily want to use the functions to present equivalencies in the spec.

Maybe you have seen #1338, in which I have proposed to use array:pairs/array:of-pairs or array:entries/array:merge to get rid of the terminological “members” ambiguity.


3 participants