The TDC algorithm is a classical policy evaluation algorithm that guarantees convergence in the off-policy setting. This repo provides the experiment code for the variance-reduced TDC algorithm newly proposed in the paper, Variance-Reduced Off-Policy TDC Learning: Non-Asymptotic Convergence Analysis. The algorithm is presented as follows.
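For background, here is a minimal sketch of the plain (non-variance-reduced) off-policy TDC update with linear function approximation; the variable names and step-size values are illustrative and are not taken from the paper or this repo.

```python
import numpy as np

def tdc_step(theta, w, phi, phi_next, reward, rho,
             gamma=0.95, alpha=0.01, beta=0.01):
    """One off-policy TDC update with linear function approximation.

    theta    : main parameter vector
    w        : auxiliary (gradient-correction) parameter vector
    phi      : feature vector of the current state
    phi_next : feature vector of the next state
    rho      : importance sampling ratio pi(a|s) / pi_b(a|s)
    """
    delta = reward + gamma * phi_next @ theta - phi @ theta  # TD error
    # Main iterate: TD step plus a gradient-correction term.
    theta = theta + alpha * rho * (delta * phi - gamma * (phi @ w) * phi_next)
    # Auxiliary iterate: tracks a projection of the TD error.
    w = w + beta * (rho * delta - phi @ w) * phi
    return theta, w
```

VRTDC adds variance reduction on top of updates of this kind by periodically computing batch pseudo-gradients; see the paper for the exact algorithm.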
In the folder `optimizer`, we implemented the policy evaluation algorithms used in this paper (including TD, TDC, VRTD, and VRTDC). In `garnet.py`, we implemented the Garnet problem environment, which is similar to the environments provided by OpenAI Gym but supports more customized settings and exposes more environment information, including the transition kernel and the stationary distribution under a specified behavior policy.
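As a rough illustration of what a Garnet-style environment involves, a random MDP of this kind can be generated along the following lines; the parameter names (`n_states`, `n_actions`, `branching`) and the helper for the stationary distribution are illustrative and need not match `garnet.py`.

```python
import numpy as np

def make_garnet(n_states=20, n_actions=4, branching=3, seed=0):
    """Generate a random Garnet-style MDP.

    Each (state, action) pair transitions to `branching` randomly chosen
    successor states with random probabilities; rewards are drawn i.i.d.
    """
    rng = np.random.default_rng(seed)
    P = np.zeros((n_states, n_actions, n_states))           # transition kernel
    R = rng.uniform(0.0, 1.0, size=(n_states, n_actions))   # reward table
    for s in range(n_states):
        for a in range(n_actions):
            successors = rng.choice(n_states, size=branching, replace=False)
            P[s, a, successors] = rng.dirichlet(np.ones(branching))
    return P, R

def stationary_distribution(P, policy):
    """Stationary distribution of the chain induced by a behavior policy.

    `policy` has shape (n_states, n_actions) with rows summing to one.
    """
    # State-to-state kernel under the behavior policy.
    P_pi = np.einsum('sa,san->sn', policy, P)
    eigvals, eigvecs = np.linalg.eig(P_pi.T)
    mu = np.real(eigvecs[:, np.argmax(np.real(eigvals))])
    return mu / mu.sum()
```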
The comparison among the four algorithms is implemented in `main.py` for the Garnet problem and in `frozen_lake.py` for the Frozen Lake environment. To repeat the experiments, run

- `python main.py`
- `python frozen_lake.py`
The comparison between the asymptotic errors of VRTD and VRTDC is implemented in `test.py` for the Garnet problem and in `frozen_lake_test.py` for the Frozen Lake environment. To repeat the experiments, run

- `python test.py`
- `python frozen_lake_test.py`

or run `python testmul.py` to use multiple CPU cores for different trajectories.
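The idea behind the multi-core variant is to map independent trajectories over a process pool and then average their error curves. A minimal sketch of that pattern is below; `run_single_trajectory` is a hypothetical stand-in, not an actual function from the repo.

```python
from multiprocessing import Pool

import numpy as np

def run_single_trajectory(seed):
    """Placeholder for one independent experiment run with its own seed."""
    rng = np.random.default_rng(seed)
    # ... run the policy evaluation algorithm and return its error curve ...
    return rng.standard_normal(100)  # dummy result for illustration

if __name__ == "__main__":
    n_trajectories = 16
    with Pool() as pool:  # defaults to one worker per CPU core
        results = pool.map(run_single_trajectory, range(n_trajectories))
    # Average the error curves over the independent trajectories.
    mean_curve = np.mean(results, axis=0)
```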
Our experiment results show that VRTDC outperforms TD, TDC, and the variance-reduced TD (VRTD) algorithm in the Garnet problem. The left figure shows that VRTDC requires fewer pseudo-gradient computations to achieve the same precision, and the right figure shows that VRTDC consistently attains a smaller asymptotic error than VRTD.
For the Frozen Lake environment, VRTDC also outperforms TD, TDC, and VRTD. The left figure shows that VRTDC requires fewer pseudo-gradient computations to achieve the same precision, and the right figure shows that VRTDC consistently attains a smaller asymptotic error than VRTD.