[ITensors] [ENHANCEMENT] `Index` tags redesign

# `Index` tags redesign proposal

This is a proposal for a redesign of the `Index` object, specifically related to tags.

There are a few things that bother me about the design of `Index` tags:
1. There is a finite maximum number of tags, and that number is fixed in a way that is not easily changed by users.
2. Tags have a maximum length, which is also fixed and not easily changed by users. This has caused awkward situations in ITensorNetworks.jl when we want to automatically give indices tags based on vertex and edge labels, which can become long.
3. Some tags have extra meaning, like site types used to make operators, tags which store site or link numbers, tags that store which unit cell we are in (like in [ITensorInfiniteMPS.jl](https://github.com/ITensor/ITensorInfiniteMPS.jl)), etc., but it isn't easy to distinguish those from other tags that are only meant for printing.

My proposal for fixing those issues is to redesign the `Index` object to store the tags in a `Dict{Symbol,String}`:
```julia
struct Index{Space}
  space::Space
  dir::Arrow
  id::UInt64
  plev::Int
  tags::Dict{Symbol,String}
end
```

Here is a demonstration of how it might get constructed, printed, etc.:
<details><summary>Definitions</summary><p>

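```julia
using IterTools: flagfirst

@enum Arrow In = -1 Out = 1 Neither = 0

struct Index{Space}
  space::Space
  dir::Arrow
  id::UInt64
  plev::Int
  tags::Dict{Symbol,String}
end

# Enable syntax `i.p` for a field `p`, or otherwise for the tag `p` stored in `tags`.
function Base.getproperty(i::Index, p::Symbol)
  if p in fieldnames(typeof(i))
    return getfield(i, p)
  end
  return getfield(i, :tags)[p]
end

space(i::Index) = getfield(i, :space)
dir(i::Index) = getfield(i, :dir)
plev(i::Index) = getfield(i, :plev)
id(i::Index) = getfield(i, :id)
tags(i::Index) = getfield(i, :tags)

# Apply `f` to each value of `d`, keeping the keys.
map_dict(f, d::AbstractDict) = Dict(k => f(v) for (k, v) in pairs(d))

# Keyword arguments other than `dir`, `id`, and `plev` become tags, stored as strings.
function Index(space; dir=Neither, id=rand(UInt64), plev=0, kwargs...)
  # Convert explicitly so an empty set of tags still has type `Dict{Symbol,String}`.
  tagdict = Dict{Symbol,String}(map_dict(string, kwargs))
  return Index(space, dir, UInt64(id), Int(plev), tagdict)
end

# List the fields (except `tags` itself) followed by the tag names.
function Base.propertynames(i::Index)
  ps = collect(fieldnames(typeof(i)))
  filter!(p -> p ≠ :tags, ps)
  append!(ps, keys(getfield(i, :tags)))
  return ps
end

# Two indices are equal if their IDs, prime levels, and tags all match.
function Base.:(==)(i1::Index, i2::Index)
  return (id(i1) == id(i2)) && (plev(i1) == plev(i2)) && (tags(i1) == tags(i2))
end

# Keep `hash` consistent with `==` so indices can be used as `Dict` keys or in `Set`s.
Base.hash(i::Index, h::UInt) = hash((id(i), plev(i), tags(i)), h)

# Print all fields and tags as `name=value` pairs separated by `|`.
function Base.show(io::IO, i::Index)
  print(io, "Index(")
  for (isfirst, p) in flagfirst(propertynames(i))
    if !isfirst
      print(io, "|")
    end
    print(io, p, "=", getproperty(i, p))
  end
  print(io, ")")
  return nothing
end
```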

</p></details>

```julia
julia> i = Index(2; plev=1, n=1, type="S=1/2")
Index(space=2|dir=Neither|id=18242763518944074104|plev=1|n=1|type=S=1/2)
```
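As a rough sketch of how tags could be read and updated under this design, the block below re-declares a stripped-down `Index` so it runs on its own. The helpers `withtags`, `withouttags`, and `tag` are hypothetical names chosen for illustration, not existing ITensors functions. Since the tags live in a `Dict`, updating them is just `merge`/`delete!` on a copy, with no limit on the number or length of tags.

```julia
@enum Arrow In = -1 Out = 1 Neither = 0

# Stripped-down copy of the proposed `Index` (no custom printing or equality).
struct Index{Space}
  space::Space
  dir::Arrow
  id::UInt64
  plev::Int
  tags::Dict{Symbol,String}
end

# Keyword arguments other than `dir`, `id`, and `plev` become tags stored as strings.
function Index(space; dir=Neither, id=rand(UInt64), plev=0, kwargs...)
  tagdict = Dict{Symbol,String}(k => string(v) for (k, v) in pairs(kwargs))
  return Index(space, dir, id, plev, tagdict)
end

# `i.p` returns the field `p` if there is one, otherwise the tag stored under `p`.
function Base.getproperty(i::Index, p::Symbol)
  p in fieldnames(typeof(i)) && return getfield(i, p)
  return getfield(i, :tags)[p]
end

# Hypothetical helper: a copy of `i` with the given tags added or overwritten.
function withtags(i::Index; kwargs...)
  extra = Dict{Symbol,String}(k => string(v) for (k, v) in pairs(kwargs))
  return Index(i.space, i.dir, i.id, i.plev, merge(getfield(i, :tags), extra))
end

# Hypothetical helper: a copy of `i` without the named tags.
function withouttags(i::Index, names::Symbol...)
  newtags = copy(getfield(i, :tags))
  for name in names
    delete!(newtags, name)
  end
  return Index(i.space, i.dir, i.id, i.plev, newtags)
end

# Hypothetical helper: look up a tag, returning `default` if it is missing.
tag(i::Index, name::Symbol, default=nothing) = get(getfield(i, :tags), name, default)

i = Index(2; plev=1, n=1, type="S=1/2")
i.n            # "1" (tag values are stored as strings)
tag(i, :cell)  # nothing
j = withtags(i; n=2, cell=3)
k = withouttags(j, :cell)
```

Because these helpers copy the `Dict`, the original index is left unchanged, so an `Index` can still be treated as a value even though `Dict` itself is mutable.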

This design would be helpful for infinite tensor networks, such as the design used in [ITensorInfiniteMPS.jl](https://github.com/ITensor/ITensorInfiniteMPS.jl), where a "cell" tag marks which unit cell a site of the infinite MPS is in. That tag helps implement lazy infinite indexing, where only the tensors in the first unit cell are stored. Generalizing that design to higher dimensions, where you want to store a cell value for each dimension, puts a big strain on the current tag design, while it would fit naturally into this proposal.
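To make the higher-dimensional case concrete, here is a minimal sketch assuming a hypothetical convention in which the unit cell along dimension `d` is stored under the tag `cell<d>` (`cell1`, `cell2`, ...). The names `cellkey`, `cell`, and `translatecell` are illustrative only. As a simplification, translating keeps the ID and only changes the cell tag, so with an `==` that compares tags (as in the definitions above) the translated index is distinct from the original.

```julia
@enum Arrow In = -1 Out = 1 Neither = 0

struct Index{Space}
  space::Space
  dir::Arrow
  id::UInt64
  plev::Int
  tags::Dict{Symbol,String}
end

function Index(space; dir=Neither, id=rand(UInt64), plev=0, kwargs...)
  tagdict = Dict{Symbol,String}(k => string(v) for (k, v) in pairs(kwargs))
  return Index(space, dir, id, plev, tagdict)
end

tags(i::Index) = getfield(i, :tags)

# Hypothetical convention: the cell along dimension `d` lives under the key `:cell<d>`.
cellkey(d::Integer) = Symbol("cell", d)

# Read the unit cell of `i` along dimension `d` as an integer.
cell(i::Index, d::Integer) = parse(Int, tags(i)[cellkey(d)])

# Return a copy of `i` shifted by `shift` unit cells along dimension `d`;
# the ID, prime level, and all other tags are kept.
function translatecell(i::Index, d::Integer, shift::Integer)
  newtags = copy(tags(i))
  newtags[cellkey(d)] = string(cell(i, d) + shift)
  return Index(i.space, i.dir, i.id, i.plev, newtags)
end

# A site of a 2D infinite network, in unit cell (1, 1).
s = Index(2; site=1, cell1=1, cell2=1)
s2 = translatecell(s, 2, 3)
cell(s2, 1)  # 1
cell(s2, 2)  # 4
```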

There are, of course, reasonable alternative choices for storing tags that have the same benefits but different tradeoffs:
1. One could use a `NamedTuple` instead of a `Dict` to store the tags. An advantage of that design is that tags could more naturally have heterogeneous types (e.g. some tags could be strings, while others could be integers). Also, a `NamedTuple` is statically sized and therefore could be stack allocated, depending on the types of the tags. A concern I have with that design is that if an ITensor has indices with different tag names (e.g. one has `n=1` but another doesn't specify `n`), then the indices don't all have the same type, which could cause type instability (and therefore performance issues in certain cases). Relatedly, `NamedTuple`s would generally put more strain on the compiler, potentially increasing compile times.
2. The key and value types of the `tags` `Dict` could be something other than the types I proposed above. It seems reasonable for the keys to be `Symbol` and the values to be `String`, but we should test different options to see if there are performance issues. I think using fixed-size strings for the tags was a mistake in terms of usability, since it is very awkward when a tag overflows and it isn't easy to give users control over the maximum length; storing them as either `String` or `Symbol` seems reasonable. Choosing between `String` and `Symbol` is subtle: `Symbol`s are basically interned strings, so they require less storage and have faster hashing and comparison, which makes them a natural choice for the keys and possibly a good choice for the values. `String`s are easier to manipulate and iterate over, so they would be a nicer choice for users.
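The type-stability concern with the `NamedTuple` alternative can be seen directly. In this sketch, `DictIndex` and `NTIndex` are hypothetical stand-ins (dropping `dir` for brevity) for the `Dict`-based proposal and a `NamedTuple`-based alternative.

```julia
# Dict-based tags, as in the proposal above.
struct DictIndex{Space}
  space::Space
  id::UInt64
  plev::Int
  tags::Dict{Symbol,String}
end

# Hypothetical NamedTuple-based alternative: tag names and value types become type parameters.
struct NTIndex{Space,Tags<:NamedTuple}
  space::Space
  id::UInt64
  plev::Int
  tags::Tags
end

i1 = DictIndex(2, rand(UInt64), 0, Dict(:n => "1", :type => "S=1/2"))
i2 = DictIndex(2, rand(UInt64), 0, Dict(:type => "S=1/2"))

# With a NamedTuple, `n` can be stored as an integer directly.
j1 = NTIndex(2, rand(UInt64), 0, (n=1, type="S=1/2"))
j2 = NTIndex(2, rand(UInt64), 0, (type="S=1/2",))

typeof(i1) == typeof(i2)  # true: both are `DictIndex{Int}`
typeof(j1) == typeof(j2)  # false: the tag names are part of the type

# A vector of indices with different tag names gets an abstract element type
# in the NamedTuple design, but a concrete one in the Dict design.
isconcretetype(eltype([i1, i2]))  # true
isconcretetype(eltype([j1, j2]))  # false
```

That abstract element type is where the potential runtime and compile-time costs described above would come from.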

# A look ahead to redesigning around named ranges and `NamedDimArrays`

In the future, the `Index` type will likely become a named unit range over the values that the tensor index/mode runs over:
```julia
struct IndexName
  dir::Arrow
  id::UInt64
  plev::Int
  tags::Dict{Symbol,String}
end

struct Index{T,Range<:AbstractUnitRange{T}} <: AbstractNamedUnitRange{T,Range,IndexName}
  range::Range
  name::IndexName
end
```
Relatedly, the `ITensor` type will likely become a subtype of `AbstractNamedDimArray`, based on the code being developed here: https://github.com/ITensor/ITensors.jl/tree/v0.6.16/NDTensors/src/lib/NamedDimsArrays.

That's orthogonal to the tag redesign proposed in this issue; it is just a teaser for how the `Index` type will fit into the redesign of NDTensors.jl when we move away from the `Tensor`/`TensorStorage` types. It also gives some indication of how users who don't like the choices we made about the `Index`/`ITensor` types (for example, the `Index` metadata based around random ID numbers) could define their own array types with named dimensions, i.e. design their own `ITensor` type that still shares a lot of the same code, like smart addition and contraction, but makes some other interface choices.
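To illustrate the idea of a user-defined array type with named dimensions, here is a toy sketch where names are plain `Symbol`s rather than ITensor-style `Index` metadata. `NamedArray` is a hypothetical name and this is not the NamedDimsArrays API; it only shows the "smart addition" idea of aligning dimensions by name before adding.

```julia
# Hypothetical minimal array with named dimensions.
struct NamedArray{T,N}
  data::Array{T,N}
  names::NTuple{N,Symbol}
end

# "Smart" addition: permute the dimensions of `b` to match the names of `a`, then add.
function Base.:+(a::NamedArray{T,N}, b::NamedArray{T,N}) where {T,N}
  perm = map(n -> findfirst(==(n), b.names), a.names)
  if any(isnothing, perm)
    throw(ArgumentError("dimension names $(a.names) and $(b.names) do not match"))
  end
  return NamedArray(a.data + permutedims(b.data, perm), a.names)
end

a = NamedArray([1 2; 3 4], (:i, :j))
b = NamedArray([10 30; 20 40], (:j, :i))
c = a + b
c.data  # [11 22; 33 44]
```

Contraction could be built the same way, matching dimensions by name instead of by position.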

@emstoudenmire I'm curious to hear your thoughts on this.

@JoeyT1994 @ryanlevy this may be relevant for ITensorNumericalAnalysis.jl, since you might want to encode information about the function dimension/bits in the `Index`, and this would add more flexibility for doing that.
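For example, assuming hypothetical tag names `dimension` and `bit`, the indices representing the bits of each function dimension could be selected and ordered by tag value. `dimension_indices` is an illustrative helper, not part of ITensorNumericalAnalysis.jl.

```julia
@enum Arrow In = -1 Out = 1 Neither = 0

struct Index{Space}
  space::Space
  dir::Arrow
  id::UInt64
  plev::Int
  tags::Dict{Symbol,String}
end

function Index(space; dir=Neither, id=rand(UInt64), plev=0, kwargs...)
  tagdict = Dict{Symbol,String}(k => string(v) for (k, v) in pairs(kwargs))
  return Index(space, dir, id, plev, tagdict)
end

tags(i::Index) = getfield(i, :tags)

# Select the indices tagged with dimension `d`, ordered by their `bit` tag.
function dimension_indices(is, d::Integer)
  sel = filter(i -> get(tags(i), :dimension, nothing) == string(d), collect(is))
  return sort(sel; by=i -> parse(Int, tags(i)[:bit]))
end

# A two-dimensional function with three bits per dimension.
sites = [Index(2; dimension=d, bit=b) for d in 1:2 for b in 1:3]
xs = dimension_indices(sites, 1)
```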

[ITensors] [ENHANCEMENT] `Index` tags redesign #1524

`Index` tags redesign proposal

A look ahead to redesigning around named ranges and `NamedDimArrays`

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

[ITensors] [ENHANCEMENT] Index tags redesign #1524

Description

Index tags redesign proposal

A look ahead to redesigning around named ranges and NamedDimArrays

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions

[ITensors] [ENHANCEMENT] `Index` tags redesign #1524

`Index` tags redesign proposal

A look ahead to redesigning around named ranges and `NamedDimArrays`