# Performance Tips
## Optimize contraction orders

Let us use a problem instance from the UAI 2014 inference competition as an example.
```julia
julia> using TensorInference, Artifacts, Pkg

julia> Pkg.ensure_artifact_installed("uai2014", pkgdir(TensorInference, "test", "Artifacts.toml"));

julia> function get_instance_filepaths(problem_name::AbstractString, task::AbstractString)
           model_filepath = joinpath(artifact"uai2014", task, problem_name * ".uai")
           evidence_filepath = joinpath(artifact"uai2014", task, problem_name * ".uai.evid")
           solution_filepath = joinpath(artifact"uai2014", task, problem_name * ".uai." * task)
           return model_filepath, evidence_filepath, solution_filepath
       end

julia> model_filepath, evidence_filepath, solution_filepath = get_instance_filepaths("Promedus_14", "MAR")

julia> instance = read_instance(model_filepath; evidence_filepath, solution_filepath)
```

Next, we select the tensor network contraction order optimizer.
```julia
julia> optimizer = TreeSA(ntrials = 1, niters = 5, βs = 0.1:0.1:100)
```

Here, we choose the local-search-based [`TreeSA`](@ref) algorithm, which often finds the smallest time/space complexity and supports slicing.
One can type `?TreeSA` in a Julia REPL for more information about how to configure the hyper-parameters of the [`TreeSA`](@ref) method,
while the detailed algorithm is explained in [arXiv: 2108.05665](https://arxiv.org/abs/2108.05665).
Alternative tensor network contraction order optimizers include
* [`GreedyMethod`](@ref) (default; fastest to run, but typically yields the worst contraction complexity)
* [`KaHyParBipartite`](@ref)
* [`SABipartite`](@ref)

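The `βs` hyper-parameter above is an annealing schedule of inverse temperatures. The Metropolis-style acceptance rule that such a schedule drives can be sketched in plain Julia (an illustrative sketch of generic simulated annealing, not the package's implementation; `accept` and `ΔE` are made-up names):

```julia
# Sketch of a simulated-annealing acceptance rule (illustrative only).
# A candidate change that increases the complexity by ΔE is accepted with
# probability min(1, exp(-β * ΔE)); sweeping β from small values (explore)
# to large values (exploit) is what a schedule like βs = 0.1:0.1:100 does.
accept(ΔE, β) = ΔE <= 0 || rand() < exp(-β * ΔE)

accept(-1.0, 10.0)  # → true: improving moves are always accepted
# At large β, worsening moves are accepted with vanishing probability.
```
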
```julia
julia> tn = TensorNetworkModel(instance; optimizer)
```
The returned object `tn` contains a field `code` that specifies the tensor network with an optimized contraction order. To check the contraction complexity, please type
```julia
julia> contraction_complexity(tn)
```

The returned object contains the log2 values of the number of multiplications, the number of elements in the largest tensor encountered during contraction, and the number of read-write operations on tensor elements.

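Since these values are log2-scaled, they translate directly into a rough memory estimate. A minimal sketch (the space complexity `16.0` is a made-up value for illustration, assuming 8-byte `Float64` elements):

```julia
# Hypothetical value for illustration: log2 of the largest tensor size.
sc = 16.0

# The largest intermediate tensor holds 2^sc elements; at 8 bytes per
# Float64 element, its size in bytes is:
peak_bytes = 2.0^sc * 8

peak_bytes / 2^20  # → 0.5 (size in MiB)
```
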
Finally, we contract the tensor network to compute the probability.
```julia
julia> p1 = probability(tn)
```

## Slicing technique

For large scale applications, it is also possible to slice over certain degrees of freedom to reduce the space complexity, i.e.,
loop and accumulate over certain degrees of freedom so that one has a smaller tensor network inside the loop due to the removal of these degrees of freedom.
In the [`TreeSA`](@ref) optimizer, one can set `nslices` to a value larger than zero to turn on this feature.

```julia
julia> tn = TensorNetworkModel(instance; optimizer=TreeSA());

julia> contraction_complexity(tn)
(20.856518235241687, 16.0, 18.88208476145812)
```

As a comparison, we slice over 5 degrees of freedom, which can reduce the space complexity by at most 5.
In this application, the slicing achieves the largest possible space complexity reduction of 5, while the time and read-write complexities increase by less than 1,
i.e. the peak memory usage is reduced by a factor of ``32``, while the (theoretical) computing time increases by a factor of ``< 2``.
```julia
julia> tn = TensorNetworkModel(instance; optimizer=TreeSA(nslices=5));

julia> timespacereadwrite_complexity(tn)
(21.134967710592804, 11.0, 19.84529401927876)
```

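The slicing idea itself, looping over an index and contracting the smaller remaining network inside the loop, can be sketched in plain Julia (a toy two-tensor contraction for illustration; the tensors and the sliced index `j` are made up and unrelated to the model above):

```julia
# Toy contraction sum_{i,j,k} A[i,j] * B[j,k], sliced over the index j.
A = rand(2, 3); B = rand(3, 2)

# Full contraction: one pass over all indices at once.
full = sum(A[i, j] * B[j, k] for i in 1:2, j in 1:3, k in 1:2)

# Sliced contraction: loop over j and accumulate; inside the loop only
# the j-th slices of A and B appear, so the intermediate objects are
# smaller, at the cost of repeating the inner contraction for each j.
sliced = sum(sum(A[i, j] * B[j, k] for i in 1:2, k in 1:2) for j in 1:3)

sliced ≈ full  # → true (the two agree up to floating-point rounding)
```
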
## GEMM for Tropical numbers
No extra effort is required to enjoy the BLAS-level speed provided by [`TropicalGEMM`](https://github.com/TensorBFS/TropicalGEMM.jl).
The benchmark in the `TropicalGEMM` repo shows that its performance is close to the theoretical optimum.
A GPU implementation is under development in the Github repo [`CuTropicalGEMM.jl`](https://github.com/ArrogantGao/CuTropicalGEMM.jl) as a part of the [Open Source Promotion Plan summer program](https://summer-ospp.ac.cn/).

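For intuition, the (max, +) semiring matrix product that `TropicalGEMM` accelerates can be sketched in plain Julia (a naive reference implementation for illustration, not the library's kernel):

```julia
# Naive (max, +) matrix product: in the tropical semiring, scalar
# multiplication becomes +, and scalar addition becomes max.
function tropical_matmul(A::Matrix{Float64}, B::Matrix{Float64})
    m, n = size(A, 1), size(B, 2)
    C = fill(-Inf, m, n)  # -Inf is the additive identity of (max, +)
    for i in 1:m, j in 1:n, k in 1:size(A, 2)
        C[i, j] = max(C[i, j], A[i, k] + B[k, j])
    end
    return C
end

A = [0.0 1.0; 2.0 3.0]
B = [1.0 0.0; 0.0 1.0]
tropical_matmul(A, B)  # → [1.0 2.0; 3.0 4.0]
```
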
## Working with GPUs
To offload the computation to a GPU, simply run `using CUDA` before calling the inference functions, and set the keyword argument `usecuda` to `true`.
```julia
julia> using CUDA
[ Info: OMEinsum loaded the CUDA module successfully

julia> marginals(tn; usecuda = true)
```

Functions that support the `usecuda` keyword argument include
* [`probability`](@ref)
* [`log_probability`](@ref)
* [`marginals`](@ref)
* [`most_probable_config`](@ref)

## Benchmarks
Please check our [paper (link to be added)]().