Inference testing using random data #199
Conversation
@rom1504 could you take a look? The next step would be to determine a good approach to storing and recalling test data across tags or commits.
Can you explain why you want to generate test data multiple times?
The 50 MB is with a single sample for each model config; it is the total across all configs.
@pytest.mark.skip('storing and recalling of test data not ready')
@pytest.mark.parametrize('model_name, pretrained, precision, jit', util_test.model_config_permutations)
def test_inference_with_data(
How slow is this?
Let's try to keep tests running in less than 5 min.
Either remove redundant tests or use the (matrix) parallel feature of GH actions
(and also possibly the parallel feature of pytest).
======================================================= test session starts =======================================================
platform linux -- Python 3.9.2, pytest-7.0.1, pluggy-1.0.0 -- /REDACTED/open_clip/.venv/bin/python
cachedir: .pytest_cache
rootdir: /REDACTED/open_clip
plugins: xdist-2.5.0, forked-1.4.0
collected 62 items
tests/test_inference.py::test_inference_with_data[RN50-openai-fp32-False] PASSED
tests/test_inference.py::test_inference_with_data[RN50-openai-fp32-True] PASSED
tests/test_inference.py::test_inference_with_data[RN50-yfcc15m-fp32-False] PASSED
tests/test_inference.py::test_inference_with_data[RN50-cc12m-fp32-False] PASSED
tests/test_inference.py::test_inference_with_data[RN50-quickgelu-openai-fp32-False] PASSED
tests/test_inference.py::test_inference_with_data[RN50-quickgelu-openai-fp32-True] PASSED
tests/test_inference.py::test_inference_with_data[RN50-quickgelu-yfcc15m-fp32-False] PASSED
tests/test_inference.py::test_inference_with_data[RN50-quickgelu-cc12m-fp32-False] PASSED
tests/test_inference.py::test_inference_with_data[RN101-openai-fp32-False] PASSED
tests/test_inference.py::test_inference_with_data[RN101-openai-fp32-True] PASSED
tests/test_inference.py::test_inference_with_data[RN101-yfcc15m-fp32-False] PASSED
tests/test_inference.py::test_inference_with_data[RN101-quickgelu-openai-fp32-False] PASSED
tests/test_inference.py::test_inference_with_data[RN101-quickgelu-openai-fp32-True] PASSED
tests/test_inference.py::test_inference_with_data[RN101-quickgelu-yfcc15m-fp32-False] PASSED
tests/test_inference.py::test_inference_with_data[RN50x4-openai-fp32-False] PASSED
tests/test_inference.py::test_inference_with_data[RN50x4-openai-fp32-True] PASSED
tests/test_inference.py::test_inference_with_data[RN50x16-openai-fp32-False] PASSED
tests/test_inference.py::test_inference_with_data[RN50x16-openai-fp32-True] PASSED
tests/test_inference.py::test_inference_with_data[RN50x64-openai-fp32-False] PASSED
tests/test_inference.py::test_inference_with_data[RN50x64-openai-fp32-True] PASSED
tests/test_inference.py::test_inference_with_data[ViT-B-32-openai-fp32-False] PASSED
tests/test_inference.py::test_inference_with_data[ViT-B-32-openai-fp32-True] PASSED
tests/test_inference.py::test_inference_with_data[ViT-B-32-laion400m_e31-fp32-False] PASSED
tests/test_inference.py::test_inference_with_data[ViT-B-32-laion400m_e31-fp32-True] PASSED
tests/test_inference.py::test_inference_with_data[ViT-B-32-laion400m_e32-fp32-False] PASSED
tests/test_inference.py::test_inference_with_data[ViT-B-32-laion400m_e32-fp32-True] PASSED
tests/test_inference.py::test_inference_with_data[ViT-B-32-laion2b_e16-fp32-False] PASSED
tests/test_inference.py::test_inference_with_data[ViT-B-32-laion2b_e16-fp32-True] PASSED
tests/test_inference.py::test_inference_with_data[ViT-B-32-laion2b_s34b_b79k-fp32-False] PASSED
tests/test_inference.py::test_inference_with_data[ViT-B-32-laion2b_s34b_b79k-fp32-True] PASSED
tests/test_inference.py::test_inference_with_data[ViT-B-32-quickgelu-openai-fp32-False] PASSED
tests/test_inference.py::test_inference_with_data[ViT-B-32-quickgelu-openai-fp32-True] PASSED
tests/test_inference.py::test_inference_with_data[ViT-B-32-quickgelu-laion400m_e31-fp32-False] PASSED
tests/test_inference.py::test_inference_with_data[ViT-B-32-quickgelu-laion400m_e31-fp32-True] PASSED
tests/test_inference.py::test_inference_with_data[ViT-B-32-quickgelu-laion400m_e32-fp32-False] PASSED
tests/test_inference.py::test_inference_with_data[ViT-B-32-quickgelu-laion400m_e32-fp32-True] PASSED
tests/test_inference.py::test_inference_with_data[ViT-B-16-openai-fp32-False] PASSED
tests/test_inference.py::test_inference_with_data[ViT-B-16-openai-fp32-True] PASSED
tests/test_inference.py::test_inference_with_data[ViT-B-16-laion400m_e31-fp32-False] PASSED
tests/test_inference.py::test_inference_with_data[ViT-B-16-laion400m_e31-fp32-True] PASSED
tests/test_inference.py::test_inference_with_data[ViT-B-16-laion400m_e32-fp32-False] PASSED
tests/test_inference.py::test_inference_with_data[ViT-B-16-laion400m_e32-fp32-True] PASSED
tests/test_inference.py::test_inference_with_data[ViT-B-16-plus-240-laion400m_e31-fp32-False] PASSED
tests/test_inference.py::test_inference_with_data[ViT-B-16-plus-240-laion400m_e31-fp32-True] PASSED
tests/test_inference.py::test_inference_with_data[ViT-B-16-plus-240-laion400m_e32-fp32-False] PASSED
tests/test_inference.py::test_inference_with_data[ViT-B-16-plus-240-laion400m_e32-fp32-True] PASSED
tests/test_inference.py::test_inference_with_data[ViT-L-14-openai-fp32-False] PASSED
tests/test_inference.py::test_inference_with_data[ViT-L-14-openai-fp32-True] PASSED
tests/test_inference.py::test_inference_with_data[ViT-L-14-laion400m_e31-fp32-False] PASSED
tests/test_inference.py::test_inference_with_data[ViT-L-14-laion400m_e31-fp32-True] PASSED
tests/test_inference.py::test_inference_with_data[ViT-L-14-laion400m_e32-fp32-False] PASSED
tests/test_inference.py::test_inference_with_data[ViT-L-14-laion400m_e32-fp32-True] PASSED
tests/test_inference.py::test_inference_with_data[ViT-L-14-laion2b_s32b_b82k-fp32-False] PASSED
tests/test_inference.py::test_inference_with_data[ViT-L-14-laion2b_s32b_b82k-fp32-True] PASSED
tests/test_inference.py::test_inference_with_data[ViT-L-14-336-openai-fp32-False] PASSED
tests/test_inference.py::test_inference_with_data[ViT-L-14-336-openai-fp32-True] PASSED
tests/test_inference.py::test_inference_with_data[ViT-H-14-laion2b_s32b_b79k-fp32-False] PASSED
tests/test_inference.py::test_inference_with_data[ViT-H-14-laion2b_s32b_b79k-fp32-True] PASSED
tests/test_inference.py::test_inference_with_data[ViT-g-14-laion2b_s12b_b42k-fp32-False] PASSED
tests/test_inference.py::test_inference_with_data[ViT-g-14-laion2b_s12b_b42k-fp32-True] PASSED
tests/test_simple.py::test_inference[False] PASSED
tests/test_simple.py::test_inference[True] PASSED
======================================================== warnings summary =========================================================
tests/test_inference.py::test_inference_with_data[RN50-openai-fp32-True]
/REDACTED/open_clip/tests/test_inference.py:38: UserWarning: floor_divide is deprecated, and will be removed in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values.
To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor'). (Triggered internally at ../aten/src/ATen/native/BinaryOps.cpp:600.)
image_features = model.encode_image(prepped)
-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
============================================ 62 passed, 1 warning in 374.16s (0:06:14) ============================================
0:06:14 on i7-4790K
batch size 1, single sample
Testing the configs in list_models(), all unit tests (training, inference, hf, ...) now take about 8:30-12:00 minutes; with setup overhead, 10-14 minutes.
Seems it's now 3 min, what changed?
The output should also be stored only once. Is it really useful to test every single model config?
The outputs are different for each model, so we need to store them. Regarding the configurations, correct me if I'm wrong, but some pretrained models are treated differently within the same model architecture, e.g. openai.
Regarding input data: since the data is random, having a larger pool of samples increases robustness by reducing the probability of encountering only false positives. Overall I tried to design this to be an integration test. I'll go ahead and split input and output data, so we only need one set of inputs per input size across all models and inflection points.
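For illustration only, the split could look roughly like the sketch below; the helper name, the file names under tests/data, and the list of image sizes are assumptions, not the actual util_test.py implementation.

import os
import torch

def create_shared_inputs(data_dir='tests/data', sizes=(224, 240, 288, 336), seed=0,
                         overwrite=False):
    """Create one random input image per resolution and one text sample,
    shared by all model configs (reference outputs are still stored per model)."""
    os.makedirs(data_dir, exist_ok=True)
    torch.manual_seed(seed)  # fixed seed so regenerated inputs stay identical
    for size in sizes:
        path = os.path.join(data_dir, f'random_image_{size}.pt')
        if overwrite or not os.path.exists(path):
            torch.save(torch.rand(1, 3, size, size), path)
    text_path = os.path.join(data_dir, 'random_text.pt')
    if overwrite or not os.path.exists(text_path):
        # 77 is the usual CLIP context length; the token ids here are arbitrary
        torch.save(torch.randint(0, 49408, (1, 77)), text_path)

The point is that inputs depend only on the resolution and the seed, while reference outputs remain per model config.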
Could you remove the skip of the test so we could see if it works and how long it takes to run?
I can, but the GH action will fail due to missing test data.
(Quote from OP.) For these steps I still need feedback from you.
It's ok, it just needs to pass before merging.
I don't think this is necessary. Test data should be introduced manually for new models. I think it would make sense to make it so test data is < 5 MB (so that means much less for each model) and then simply store it in the repo.
We could also consider storing test data in releases if we truly need to make it big, but if not needed it's best to keep it small and in the repo.
Test data is already minimal per model.
I'll refactor tomorrow to generate one input image per size and one input string for all models.
Sounds good.
Changed test data creation to only create one image per size, and one string for all models.
EDIT: never mind, I misread the warning as the error. Maybe data should be created by the CI as well. I'm not certain this will alleviate the problem though.
Do you have a suggestion for an acceptable maximum precision for the output comparison?
@rom1504 tests pass now (it failed before because of either a timeout or a memory constraint; all tests up to ViT-g passed, and ViT-g itself was interrupted during download).
I'm reconsidering running tests with pretrained weights. No matter what, it will either take too long or, if done in parallel, hit the memory constraints, both due to weight downloads. Instead, running tests against randomly initialized non-openai models (as listed by list_models()), then a separate test for openai models. I'm not sure yet how to do this while retaining a blackbox testing mentality, e.g. not having internal knowledge of how openai models differ from non-openai in terms of config.
The config of models is unrelated to openai/non-openai.
The model is fully specified by the config.
I agree testing random weights sounds good. Using a fixed seed should work well.
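As a rough illustration of the seeded-random-weights idea (the wrapper name is made up; create_model_and_transforms without a pretrained tag should leave the weights randomly initialized, but treat the exact call as an assumption):

import torch
import open_clip

def build_seeded_random_model(model_name: str, seed: int = 0):
    """Instantiate a model config with randomly initialized weights,
    seeded so every run (and every CI machine) gets the same parameters."""
    torch.manual_seed(seed)
    model, _, preprocess = open_clip.create_model_and_transforms(model_name)
    model.eval()
    return model, preprocess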
Testing all models as listed by list_models().
To generate test data:
# generate test data for all models in tests/data
# will only overwrite old data with --overwrite
python tests/util_test.py --all
# for a specific model
python tests/util_test.py --model RN50
Total runtime for all tests (inference, training, hf, ...) is ~10-14 min including setup. Ready to merge?
I think it would be cool to use the GH action infra for parallel testing to run N (probably 8) tests in parallel (it uses N actual containers, so no RAM issue).
I'm thinking of something like this: a GH action matrix job that exposes its index as an environment variable;
then in the inference test, you use that number (0 to 7) to pick only 1/8 of the models if the environment variable exists, otherwise run all. How does that sound?
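Something along these lines could do the slicing; the CI_SHARD variable name and the 8-way split are illustrative placeholders for whatever the GH action matrix ends up exporting.

import os
import open_clip

def models_for_this_job(num_jobs: int = 8):
    """Pick this CI job's slice of the model list; run everything when the
    sharding variable is unset (e.g. in local runs)."""
    models = open_clip.list_models()
    job = os.environ.get('CI_SHARD')  # hypothetical variable set by the GH action matrix
    if job is None:
        return models
    return [m for i, m in enumerate(models) if i % num_jobs == int(job)]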
Another (simpler) option is to simply put the names of the models to test in the GH action matrix and use that.
Overall nice work! It's almost ready now.
Using autocast?
It's an i7-4790K and does not support AVX-512-BF16 or AVX-512-FP16. I would like to keep it without autocast, to make the test usable regardless of the instruction set supported by the CPU and to keep implicit changes to behaviour to a minimum (e.g. the output being bfloat16 instead of float32 with autocast).
Did you disable jit? It tends to make things slow.
Yes, jit is disabled for all tests.
OK, I guess let's not use autocast then. Any thoughts regarding GH action parallelism?
Looks like tests took only 3 min?! Did anything change?
I'm on parallelizing it right now. I've got it working with vit-g when running in parallel. EDIT: yes, I mean ViT-G.
It's ViT-G which was failing.
I think we could merge now really; let's maybe do that and do another PR for parallel?
This is a parallel run on my staging branch, if you want to take a look.
Ok, parallel can wait. Let me squash the commits first before merging, please.
It'll all be squashed during the GitHub merge.
Ah, sorry, too late, it's already done.
It's ok, I guess 2 commits make sense here.
Merged.
Parallel with GH action would still be appreciated.
And the next big step would be the same but with one step of training (see the simple training test for a place to start), so we make sure training keeps doing the same thing.
If you want to continue working on this, it would be pretty helpful. There is active development in this repo, and keeping things working will be important.
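A very rough sketch of what such a training-step check might look like; the forward signature, the stand-in loss, and the tolerance are assumptions for illustration, and the reference loss would come from stored test data rather than being hardcoded.

import torch
import torch.nn.functional as F

def check_training_step(model, images, texts, expected_loss, seed=0, rtol=1e-4):
    """Run one seeded optimizer step and compare the loss against a stored
    reference, so silent changes to the training path get caught."""
    torch.manual_seed(seed)
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
    model.train()
    # assumed forward signature: (image_features, text_features, logit_scale)
    image_features, text_features, logit_scale = model(images, texts)
    # stand-in contrastive-style loss, not the repo's actual CLIP loss
    logits = logit_scale * image_features @ text_features.t()
    labels = torch.arange(logits.shape[0])
    loss = F.cross_entropy(logits, labels)
    loss.backward()
    optimizer.step()
    assert torch.allclose(loss.detach(), expected_loss, rtol=rtol)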
I have started to work on integration tests for with random data.
This test runs on all pretrained models at fp32 with JIT True/False where applicable.
Related issue: #198
list_models()
To generate test data:
populates the
tests/data
folder with one torch pt per model config, to be used by the test.