Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Define map for Sets #51703

Open
schlichtanders opened this issue Oct 14, 2023 · 4 comments
Open

Define map for Sets #51703

schlichtanders opened this issue Oct 14, 2023 · 4 comments
Labels
collections Data structures holding multiple items, e.g. sets feature Indicates new feature / enhancement requests

Comments

@schlichtanders
Copy link

schlichtanders commented Oct 14, 2023

julia> map(identity, Set([1,2,3]))
ERROR: map is not defined on sets
Stacktrace:
 [1] error(s::String)
   @ Base ./error.jl:35
 [2] map(f::Function, #unused#::Set{Int64})
   @ Base ./abstractarray.jl:3294
 [3] top-level scope
   @ REPL[39]:1

I know that there is an ongoing discussion about what map should do on Dicts and that there is also a similar closed issue which combined Set/Dict, but really, reading through it does not seem to give any argument why map should not be defined for Sets.

I guess the discussion for Set is what the semantics of map is:

  1. whether it should always return a fixed type (like Python's map)
  2. or whether map follows the general functor semantics (found e.g. in haskell or Scala), which means that map returns the same parent type which the input has.

Let's go for functor semantics (2.) - some thoughts

While for Dict semantics 2 are not really clear in julia, because a dict may be understood as a vector of pairs or some fancy indexed values, for Set the semantics 2 is indeed crystal clear.

map(x -> x+1, Set([1,2,3])) would return Set([2,3,4]) when following semantic 2.

Also within Julia's standard library the SparseVector notably follow semantics 2:

map(x -> x+1, sparse([1,2,0])) returns a SparseVector

I would also like to argue that, in general, for julia having multiple dispatch it makes far more sense to follow the functor semantics (2.).

  • Julia supports defining custom types and also encourages people to define custom types (also custom arrays, sets etc.). It would be against this spirit if every type has to define their own map function in order to stay within their own types.
  • while python cannot do this obviously, multiple dispatch makes it really easy to define functor semantics.
  • I guess this can lead also to better type-stability overall (not entirely sure about this one)

Conclusion

Although the discussion for Dict is not resolved, it makes sense to define map for Set in julia.

@tecosaur tecosaur added the collections Data structures holding multiple items, e.g. sets label Oct 15, 2023
@jariji
Copy link
Contributor

jariji commented Oct 16, 2023

Related: #32081, #33701, #38344

@nsajko nsajko added the feature Indicates new feature / enhancement requests label Jan 16, 2025
@LilithHafner LilithHafner added the triage This should be discussed on a triage call label Jan 16, 2025
@MasonProtter
Copy link
Contributor

MasonProtter commented Jan 16, 2025

Triage says the structural map for Set is too much of a footgun, because it can cause lots of silent issues where you get unexpected iteration orders.

We used to do this long ago, but then @vtjnash had an example of iterating over a set of files asking how big they were and accidentally got wonky results not realizing he was getting and unordered thing out.

The feeling was that if we do make this not a method error, then we'd just make it hit the generic fallback

map(f, A) = collect(Generator(f, A))

There was a lengthy, mostly inconclusive discussion on "what does map mean anyways?"

@jariji
Copy link
Contributor

jariji commented Jan 16, 2025

I see a couple different kinds of map functions. One class has a fixed output type: it takes any collection (iterator, set, whatever) and returns Array in the case of Base.map (or takes any collection and returns Generator in the case of Iterators.map, or takes any collection and returns Transducer in the case of Transducers.Map).

Another kind, sometimes called fmap ("functor map"; functor in the Haskell sense meaning "mappable type") takes any container and returns that same type : Tuple to Tuple, iterator to iterator, Maybe{X} to Maybe{Y}, NamedTuple to NamedTuple.

Base.map does a bit of both of these: sometimes it returns Array for non-Array argument types and sometimes it returns the argument type. In practice this works okayish (map(identity, (1 for _ in 1:3))::Vector is usually fine and map(identity, (1,2)) == (1,2) is too), though a cleaner separation between the two kinds of maps might be better. That way every invocation would have a clear contract so implementers and callers would know exactly what's supposed to happen.

A special consideration with map(f, ::Set) is that it can reduce the number of elements if f isn't one to one, which an InterfaceSpec for fmap might disallow. In this sense, fmap(f, ::Set) could not be implemented, and Base.map(f, ::Set), if Base.map is the map function that always returns Arrays.

If this two-way decomposition is adopted, then Base.map(f, ::Tuple) should return Array, not Tuple , and fmap (suitably renamed for the Julia audience) would be defined for various collection types to return the argument type.

A third interpretation of map is a Base.map that doesn't always return Array but always returns an indexable collection preserving the indices and size of its argument when they exist, in which case Base.map(f, ::Tuple)::Tuple would be okay, as would Base.map(f, ::Generator)::Vector and Base.map(f, ::Set)::Vector. Base.map(f, ::Set)::Set would not work because the result is not indexable and doesn't preserve size. This seems like a good idea to me.

tl;dr: map can return Array or Tuple (or similar). Base.map(f, ::Set) should return Vector. It would be nice to also
define another fmap that preserves container type.

@MasonProtter
Copy link
Contributor

I should also mention that Triage almost universally agreed that it was probably a bad idea that we have a special method for map(f, ::String) (see #57071)

@MasonProtter MasonProtter removed the triage This should be discussed on a triage call label Jan 16, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
collections Data structures holding multiple items, e.g. sets feature Indicates new feature / enhancement requests
Projects
None yet
Development

No branches or pull requests

6 participants