Performance dip when using ForwardDiff compared to Turing

I've noticed a performance dip when using ForwardDiff with a model defined in TuringGLM, compared to defining the same model directly in Turing. I've set up an MWE to show this.

First I set up four models: two in TuringGLM (one with the default priors, one with custom priors) and two written directly in Turing that use the same default and custom priors as the TuringGLM models.

```
using Turing, TuringGLM, TuringBenchmarking, BenchmarkTools
using ReverseDiff: ReverseDiff
using CSV, DataFrames, LinearAlgebra

hibbs_df = CSV.read(
    download("https://raw.githubusercontent.com/avehtari/ROS-Examples/master/ElectionsEconomy/data/hibbs.dat"),
    DataFrame
);

# TuringGLM model
f = @formula(vote ~ growth)
m_glm = turing_model(f, hibbs_df)

# TuringGLM model with custom priors
priors = CustomPrior(Normal(0, 10), Normal(52, 14), nothing)
m_glm_custom = turing_model(f, hibbs_df; priors=priors)

# extract data for Turing models
y = TuringGLM.data_response(f, hibbs_df)
X = TuringGLM.data_fixed_effects(f, hibbs_df)

# model with default priors
@model function regression_default(X, y; residual=std(y))
    α ~ 50.755 + TDist(3.0)*6.071256084780443
    β ~ filldist(TDist(3.0), size(X,2))
    σ ~ Exponential(residual)

    y ~ MvNormal(α .+ X*β, σ^2*I)
end

m_turing = regression_default(X, y; residual=std(y))

# model with custom priors
@model function regression_custom(X, y; residual=std(y))
    α ~ Normal(52, 14)
    β ~ filldist(Normal(0, 10), size(X,2))
    σ ~ Exponential(residual)

    y ~ MvNormal(α .+ X*β, σ^2*I)
end

m_turing_custom = regression_custom(X, y; residual=std(y))
```

Then, using [TuringBenchmarking.jl](https://github.com/TuringLang/TuringBenchmarking.jl), I benchmarked each of the four models with both the ForwardDiff and ReverseDiff backends.

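The results of the benchmark are shown in the table below (the full output is in the collapsed section further down). With ReverseDiff the TuringGLM and Turing models perform about the same (within roughly 2-6%), but with ForwardDiff the TuringGLM models take about 30-45% longer than their Turing equivalents (put the other way, the Turing models are ~23-31% faster).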

| Model | ForwardDiff, linked (time, μs)| ReverseDiff, linked (time, μs) | ForwardDiff, not linked (time, μs)| ReverseDiff, not linked (time, μs) |
| :---: | :---: | :---: | :---: | :---: |
| TuringGLM (default prior) | 3.967 | 2.772 | 3.976 | 1.990 |
| Turing (default prior) | 3.046 | 2.676 | 3.059 | 1.931 |
| TuringGLM (custom prior) | 4.013 | 2.102 | 3.905 | 1.868 |
| Turing (custom prior) | 2.776 | 1.986 | 2.827 | 1.829 |
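
As a quick sanity check on the size of the gap, the ForwardDiff timings from the table can be turned into percentages. This is only a sketch: `forwarddiff_times`, `slowdown_pct` and `speedup_pct` are names made up for this illustration, and the values are copied by hand from the table rather than read from the benchmark results.

```
# ForwardDiff gradient times in μs, copied by hand from the table above.
# (Hypothetical container; not produced by TuringBenchmarking.)
forwarddiff_times = (
    default_linked     = (glm = 3.967, turing = 3.046),
    default_not_linked = (glm = 3.976, turing = 3.059),
    custom_linked      = (glm = 4.013, turing = 2.776),
    custom_not_linked  = (glm = 3.905, turing = 2.827),
)

# Extra time TuringGLM needs, as a percentage of the Turing time.
slowdown_pct(t) = 100 * (t.glm / t.turing - 1)

# Time Turing saves, as a percentage of the TuringGLM time.
speedup_pct(t) = 100 * (1 - t.turing / t.glm)

for (name, t) in pairs(forwarddiff_times)
    println(
        rpad(string(name), 20),
        "TuringGLM slower by ", round(slowdown_pct(t); digits = 1), "%, ",
        "Turing faster by ", round(speedup_pct(t); digits = 1), "%",
    )
end
```

The two helpers describe the same gap from different baselines, so the percentage quoted depends on which model is taken as the reference.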

<details><summary>Click here for detailed output</summary>
<p>

TuringGLM model 1 (default priors)
```
suite_glm = TuringBenchmarking.make_turing_suite(
    m_glm,
    adbackends = [TuringBenchmarking.ForwardDiffAD{40}(), TuringBenchmarking.ReverseDiffAD{true}()]
)
run(suite_glm)
```
Output:
```
2-element BenchmarkTools.BenchmarkGroup:
  tags: []
  "linked" => 3-element BenchmarkTools.BenchmarkGroup:
          tags: []
          "evaluation" => Trial(2.882 μs)
          "Turing.Essential.ReverseDiffAD{true}()" => Trial(2.772 μs)
          "Turing.Essential.ForwardDiffAD{40, true}()" => Trial(3.967 μs)
  "not_linked" => 3-element BenchmarkTools.BenchmarkGroup:
          tags: []
          "evaluation" => Trial(2.836 μs)
          "Turing.Essential.ReverseDiffAD{true}()" => Trial(1.990 μs)
          "Turing.Essential.ForwardDiffAD{40, true}()" => Trial(3.976 μs)
```

Turing model 1 (default priors)
```
suite_turing = TuringBenchmarking.make_turing_suite(
    m_turing,
    adbackends = [TuringBenchmarking.ForwardDiffAD{40}(), TuringBenchmarking.ReverseDiffAD{true}()]
)
run(suite_turing)
```
Output:
```
2-element BenchmarkTools.BenchmarkGroup:
  tags: []
  "linked" => 3-element BenchmarkTools.BenchmarkGroup:
          tags: []
          "evaluation" => Trial(1.256 μs)
          "Turing.Essential.ReverseDiffAD{true}()" => Trial(2.676 μs)
          "Turing.Essential.ForwardDiffAD{40, true}()" => Trial(3.046 μs)
  "not_linked" => 3-element BenchmarkTools.BenchmarkGroup:
          tags: []
          "evaluation" => Trial(1.207 μs)
          "Turing.Essential.ReverseDiffAD{true}()" => Trial(1.931 μs)
          "Turing.Essential.ForwardDiffAD{40, true}()" => Trial(3.059 μs)
```

TuringGLM model 2 (custom priors)
```
suite_glm_custom = TuringBenchmarking.make_turing_suite(
    m_glm_custom,
    adbackends = [TuringBenchmarking.ForwardDiffAD{40}(), TuringBenchmarking.ReverseDiffAD{true}()]
)
run(suite_glm_custom)
```
Output:
```
2-element BenchmarkTools.BenchmarkGroup:
  tags: []
  "linked" => 3-element BenchmarkTools.BenchmarkGroup:
          tags: []
          "evaluation" => Trial(2.724 μs)
          "Turing.Essential.ReverseDiffAD{true}()" => Trial(2.102 μs)
          "Turing.Essential.ForwardDiffAD{40, true}()" => Trial(4.013 μs)
  "not_linked" => 3-element BenchmarkTools.BenchmarkGroup:
          tags: []
          "evaluation" => Trial(2.737 μs)
          "Turing.Essential.ReverseDiffAD{true}()" => Trial(1.868 μs)
          "Turing.Essential.ForwardDiffAD{40, true}()" => Trial(3.905 μs)
```

Turing model 2 (custom priors)
```
suite_turing_custom = TuringBenchmarking.make_turing_suite(
    m_turing_custom,
    adbackends = [TuringBenchmarking.ForwardDiffAD{40}(), TuringBenchmarking.ReverseDiffAD{true}()]
)
run(suite_turing_custom)
```
Output:
```
2-element BenchmarkTools.BenchmarkGroup:
  tags: []
  "linked" => 3-element BenchmarkTools.BenchmarkGroup:
          tags: []
          "evaluation" => Trial(1.176 μs)
          "Turing.Essential.ReverseDiffAD{true}()" => Trial(1.986 μs)
          "Turing.Essential.ForwardDiffAD{40, true}()" => Trial(2.776 μs)
  "not_linked" => 3-element BenchmarkTools.BenchmarkGroup:
          tags: []
          "evaluation" => Trial(1.160 μs)
          "Turing.Essential.ReverseDiffAD{true}()" => Trial(1.829 μs)
          "Turing.Essential.ForwardDiffAD{40, true}()" => Trial(2.827 μs)
```
</p>
</details>
