# Performance Tips
## Optimize contraction orders

Let us use a problem instance from the UAI 2014 inference competition as an example.
```julia
julia> using TensorInference, Artifacts, Pkg

julia> Pkg.ensure_artifact_installed("uai2014", pkgdir(TensorInference, "test", "Artifacts.toml"));

julia> function get_instance_filepaths(problem_name::AbstractString, task::AbstractString)
           model_filepath = joinpath(artifact"uai2014", task, problem_name * ".uai")
           evidence_filepath = joinpath(artifact"uai2014", task, problem_name * ".uai.evid")
           solution_filepath = joinpath(artifact"uai2014", task, problem_name * ".uai." * task)
           return model_filepath, evidence_filepath, solution_filepath
       end

julia> model_filepath, evidence_filepath, solution_filepath = get_instance_filepaths("Promedus_14", "MAR")

julia> instance = read_instance(model_filepath; evidence_filepath, solution_filepath)
```

Next, we select the tensor network contraction order optimizer.
```julia
julia> optimizer = TreeSA(ntrials = 1, niters = 5, βs = 0.1:0.1:100)
```

Here, we choose the local-search-based [`TreeSA`](@ref) algorithm, which often finds the smallest time/space complexity and supports slicing.
One can type `?TreeSA` in a Julia REPL for more information about how to configure the hyper-parameters of the [`TreeSA`](@ref) method,
while the detailed algorithm is explained in [arXiv: 2108.05665](https://arxiv.org/abs/2108.05665).
Alternative tensor network contraction order optimizers include
* [`GreedyMethod`](@ref) (default; fastest to run, but typically yields the worst contraction complexity)
* [`KaHyParBipartite`](@ref)
* [`SABipartite`](@ref)

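The `βs` hyper-parameter above is an annealing schedule of inverse temperatures. The Metropolis-style acceptance rule that such a schedule drives can be sketched in plain Julia (an illustrative sketch of generic simulated annealing, not the package's implementation; `accept` and `ΔE` are made-up names):

```julia
# Sketch of a simulated-annealing acceptance rule (illustrative only).
# A candidate change that increases the complexity by ΔE is accepted with
# probability min(1, exp(-β * ΔE)); sweeping β from small values (explore)
# to large values (exploit) is what a schedule like βs = 0.1:0.1:100 does.
accept(ΔE, β) = ΔE <= 0 || rand() < exp(-β * ΔE)

accept(-1.0, 10.0)  # → true: improving moves are always accepted
# At large β, worsening moves are accepted with vanishing probability.
```
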
```julia
julia> tn = TensorNetworkModel(instance; optimizer)
```
The returned object `tn` contains a field `code` that specifies the tensor network with an optimized contraction order. To check the contraction complexity, please type
```julia
julia> contraction_complexity(tn)
```

The returned object contains the log2 values of the number of multiplications, the number of elements in the largest tensor encountered during contraction, and the number of read-write operations on tensor elements.

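Since these values are log2-scaled, they translate directly into a rough memory estimate. A minimal sketch (the space complexity `16.0` is a made-up value for illustration, assuming 8-byte `Float64` elements):

```julia
# Hypothetical value for illustration: log2 of the largest tensor size.
sc = 16.0

# The largest intermediate tensor holds 2^sc elements; at 8 bytes per
# Float64 element, its size in bytes is:
peak_bytes = 2.0^sc * 8

peak_bytes / 2^20  # → 0.5 (size in MiB)
```
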
Finally, we contract the tensor network to compute the probability.
```julia
julia> p1 = probability(tn)
```

## Slicing technique

For large scale applications, it is also possible to slice over certain degrees of freedom to reduce the space complexity, i.e.,
loop and accumulate over certain degrees of freedom so that one has a smaller tensor network inside the loop due to the removal of these degrees of freedom.
In the [`TreeSA`](@ref) optimizer, one can set `nslices` to a value larger than zero to turn on this feature.

```julia
julia> tn = TensorNetworkModel(instance; optimizer=TreeSA());

julia> contraction_complexity(tn)
(20.856518235241687, 16.0, 18.88208476145812)
```

As a comparison, we slice over 5 degrees of freedom, which can reduce the space complexity by at most 5.
In this application, the slicing achieves the largest possible space complexity reduction of 5, while the time and read-write complexities increase by less than 1,
i.e. the peak memory usage is reduced by a factor of ``32``, while the (theoretical) computing time increases by a factor of ``< 2``.
```julia
julia> tn = TensorNetworkModel(instance; optimizer=TreeSA(nslices=5));

julia> timespacereadwrite_complexity(tn)
(21.134967710592804, 11.0, 19.84529401927876)
```

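The slicing idea itself, looping over an index and contracting the smaller remaining network inside the loop, can be sketched in plain Julia (a toy two-tensor contraction for illustration; the tensors and the sliced index `j` are made up and unrelated to the model above):

```julia
# Toy contraction sum_{i,j,k} A[i,j] * B[j,k], sliced over the index j.
A = rand(2, 3); B = rand(3, 2)

# Full contraction: one pass over all indices at once.
full = sum(A[i, j] * B[j, k] for i in 1:2, j in 1:3, k in 1:2)

# Sliced contraction: loop over j and accumulate; inside the loop only
# the j-th slices of A and B appear, so the intermediate objects are
# smaller, at the cost of repeating the inner contraction for each j.
sliced = sum(sum(A[i, j] * B[j, k] for i in 1:2, k in 1:2) for j in 1:3)

sliced ≈ full  # → true (the two agree up to floating-point rounding)
```
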
## GEMM for Tropical numbers
No extra effort is required to enjoy the BLAS-level speed provided by [`TropicalGEMM`](https://github.com/TensorBFS/TropicalGEMM.jl).
The benchmark in the `TropicalGEMM` repo shows that its performance is close to the theoretical optimum.
A GPU implementation is under development in the Github repo [`CuTropicalGEMM.jl`](https://github.com/ArrogantGao/CuTropicalGEMM.jl) as a part of the [Open Source Promotion Plan summer program](https://summer-ospp.ac.cn/).

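For intuition, the (max, +) semiring matrix product that `TropicalGEMM` accelerates can be sketched in plain Julia (a naive reference implementation for illustration, not the library's kernel):

```julia
# Naive (max, +) matrix product: in the tropical semiring, scalar
# multiplication becomes +, and scalar addition becomes max.
function tropical_matmul(A::Matrix{Float64}, B::Matrix{Float64})
    m, n = size(A, 1), size(B, 2)
    C = fill(-Inf, m, n)  # -Inf is the additive identity of (max, +)
    for i in 1:m, j in 1:n, k in 1:size(A, 2)
        C[i, j] = max(C[i, j], A[i, k] + B[k, j])
    end
    return C
end

A = [0.0 1.0; 2.0 3.0]
B = [1.0 0.0; 0.0 1.0]
tropical_matmul(A, B)  # → [1.0 2.0; 3.0 4.0]
```
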
## Working with GPUs
To offload the computation to a GPU, simply run `using CUDA` before calling the inference functions, and set the keyword argument `usecuda` to `true`.
```julia
julia> using CUDA
[ Info: OMEinsum loaded the CUDA module successfully

julia> marginals(tn; usecuda = true)
```

Functions that support the `usecuda` keyword argument include
* [`probability`](@ref)
* [`log_probability`](@ref)
* [`marginals`](@ref)
* [`most_probable_config`](@ref)

## Benchmarks
Please check our [paper (link to be added)]().