- Preprocess the provided data into a format that can be used by the TTRL algorithm (a loading/validation sketch follows the list)
[ { "question": "integrate(1/(x2 - x + 1), x)", "variants": [ { "variant": "integrate(1/(x2 - x + 1), x)", "reasoning": "The integral can be transformed using the substitution u = x - 1/2, which simplifies the denominator.", "difficulty": "easier" } ] }, { "question": "integrate(1/(x2 - x + 1), x)", "variants": [ { "variant": "integrate(1/(x2 - x + 1), x)", "reasoning": "The integral can be transformed using the substitution u = x - 1/2, which simplifies the denominator.", "difficulty": "easier" } ] } ]
- Get a test question and its variants from the above file
- Put the variants into a Parquet file so they are ready for RL (a Parquet-writing sketch follows the list) -- the TinyZero RL stage starts here --
- For each question, run GRPO using CustomTinyZero and the numerical_integration reward function (a hedged reward sketch follows the list):
  a. Pass a reward signal to the RL algorithm (computed in the reward function)
  b. Update the policy
- Once training is done, evaluate the model on the actual question (pass@1, pass@5, and pass@10; a pass@k sketch follows the list)
- Roll back to the original model
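A minimal loading/validation sketch for the preprocessing step, assuming the variant data sits in a JSON file with the schema shown above; the `variants.json` filename and `load_variants` helper are illustrative, not part of the actual pipeline:

```python
import json

def load_variants(path="variants.json"):
    """Load the variant file and do light validation of the expected schema."""
    with open(path) as f:
        data = json.load(f)
    for entry in data:
        assert "question" in entry and "variants" in entry, "unexpected top-level schema"
        for v in entry["variants"]:
            assert {"variant", "reasoning", "difficulty"} <= v.keys(), "unexpected variant schema"
    return data

if __name__ == "__main__":
    questions = load_variants()
    print(f"loaded {len(questions)} questions, "
          f"{sum(len(q['variants']) for q in questions)} variants total")
```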
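A sketch of writing one question's variants to Parquet, assuming pandas with a pyarrow backend; the column names are assumptions and may not match the exact schema CustomTinyZero reads:

```python
import pandas as pd  # pandas with pyarrow installed handles the Parquet write

def variants_to_parquet(entry, out_path="variants.parquet"):
    """Flatten one question's variants into rows of a Parquet file."""
    rows = [
        {
            "question": entry["question"],
            "variant": v["variant"],
            "difficulty": v["difficulty"],
        }
        for v in entry["variants"]
    ]
    pd.DataFrame(rows).to_parquet(out_path, index=False)
    return out_path
```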
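The actual numerical_integration reward is defined inside CustomTinyZero; the sketch below only illustrates the general idea, scoring a candidate antiderivative by numerically checking that its derivative matches the integrand with SymPy. The function name, sample-point range, and binary 0/1 reward scale are assumptions:

```python
import random
import sympy as sp

x = sp.Symbol("x")

def integration_reward(integrand_str, model_answer_str, n_points=20, tol=1e-6):
    """Return 1.0 if d/dx(model answer) matches the integrand at random points, else 0.0."""
    try:
        integrand = sp.sympify(integrand_str)
        answer = sp.sympify(model_answer_str)
    except (sp.SympifyError, SyntaxError):
        return 0.0  # unparsable model output gets no reward
    residual = sp.simplify(sp.diff(answer, x) - integrand)
    for _ in range(n_points):
        pt = random.uniform(-3, 3)
        try:
            val = complex(residual.subs(x, pt))
        except (TypeError, ValueError):
            return 0.0  # residual did not evaluate to a number at this point
        if abs(val) > tol:
            return 0.0
    return 1.0
```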
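A rough sketch of the pass@k evaluation on the original question, reusing the `integration_reward` sketch above as the correctness check. The sampler is passed in as a callable because the real generation call depends on how the trained model is served; this is the simple empirical pass@k, not the unbiased estimator:

```python
def pass_at_k(integrand_str, sample_answer, k_values=(1, 5, 10), n_samples=10):
    """Empirical pass@k: draw n_samples answers and check whether any of the first k are correct."""
    # integrand_str is assumed to be extracted from the question string beforehand
    correct = [
        integration_reward(integrand_str, sample_answer(integrand_str)) == 1.0
        for _ in range(n_samples)
    ]
    return {k: any(correct[:k]) for k in k_values if k <= n_samples}
```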
The training method we used before generates variants and trains the model on them to do integration, with the hope that the gentler learning curve improves the model's performance. TTRL instead finds variants for each question and then trains the model on them specifically for that question.
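Putting the steps together, a hedged sketch of the per-question loop: train on the question's variants, evaluate on the original question, then restore the original weights before the next question. The `train_grpo` and `evaluate` callables stand in for the actual CustomTinyZero entry points, which are not shown here, and the model is assumed to expose a PyTorch-style `state_dict`:

```python
import copy

def ttrl_loop(questions, model, train_grpo, evaluate):
    """For each question: train on its variants, evaluate, then roll back to the base model."""
    base_state = copy.deepcopy(model.state_dict())  # snapshot of the original weights
    results = {}
    for entry in questions:
        parquet_path = variants_to_parquet(entry)      # variants -> Parquet (sketch above)
        train_grpo(model, parquet_path)                # GRPO on this question's variants
        results[entry["question"]] = evaluate(model, entry["question"])  # pass@k on the real question
        model.load_state_dict(base_state)              # roll back before the next question
    return results
```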