Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>
Signed-off-by: karpnv <karpnv@users.noreply.github.com>
Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>
[🤖]: Hi @karpnv 👋, We wanted to let you know that a CICD pipeline for this PR just finished successfully. So it might be time to merge this PR or get some approvals.
Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>
Signed-off-by: karpnv <karpnv@users.noreply.github.com>
…. Updated logging system Signed-off-by: George Zelenfroind <gzelenfroind@nvidia.com>
Signed-off-by: Jorjeous <Jorjeous@users.noreply.github.com>
@karpnv please check again, now all LGTM
import operator
import os
import pickle
import tarfile
Check notice
Code scanning / CodeQL
Unused import Note
Copilot Autofix
AI 2 days ago
To fix an unused import in Python, you remove the import statement for any module that is not referenced anywhere in the file. This reduces clutter and avoids implying dependencies that are not actually required.
For this specific case, the best fix is to delete the line import tarfile at line 27 in tools/speech_data_explorer/data_explorer.py. No replacement import or additional code is needed, since the module is reportedly not used. This change preserves existing functionality because it only removes a symbol that is never referenced. The rest of the import block (e.g., pickle, defaultdict, expanduser, Path, etc.) remains untouched.
You only need to adjust that single line within tools/speech_data_explorer/data_explorer.py; no new methods, definitions, or other imports are required.
@@ -24,7 +24,6 @@
 import operator
 import os
 import pickle
-import tarfile
 from collections import defaultdict
 from os.path import expanduser
 from pathlib import Path
for line in lines[1:]:  # Skip header line
    parts = line.split()
    if len(parts) >= 4:
        file_type = parts[0]
Check notice
Code scanning / CodeQL
Unused local variable Note
Copilot Autofix
AI 2 days ago
In general, to fix an unused local variable you either (1) remove the variable binding if the value is not needed and the right-hand side has no side effects, or (2) rename it to an “unused” name (like _ or unused_file_type) if it is intentionally present but not used. Here, file_type = parts[0] is a simple index into an existing list, with no side effects, so we can safely delete the assignment without affecting functionality.
The best minimal fix is to remove the file_type = parts[0] line inside parse_dali_index and leave the rest of the loop unchanged. This preserves all current behavior while eliminating the unused variable and the CodeQL warning. No new methods, imports, or definitions are required; only this one line in tools/speech_data_explorer/data_explorer.py needs to be changed.
@@ -300,7 +300,6 @@
 for line in lines[1:]:  # Skip header line
     parts = line.split()
     if len(parts) >= 4:
-        file_type = parts[0]
         offset = int(parts[1])
         size = int(parts[2])
         filename = parts[3]
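For reference, the loop with the unused binding removed can be sketched as a standalone function. This is only an illustration built from the lines visible in the hunk: the function name `parse_index_lines`, the returned tuple layout, and the sample index lines are hypothetical, and the real `parse_dali_index` in `data_explorer.py` may do more with each entry.

```python
from typing import List, Tuple


def parse_index_lines(lines: List[str]) -> List[Tuple[int, int, str]]:
    """Hypothetical helper: collect (offset, size, filename) for each index entry."""
    entries = []
    for line in lines[1:]:  # Skip header line
        parts = line.split()
        if len(parts) >= 4:
            # parts[0] (the file type) is intentionally not bound to a name,
            # which is what the suggested fix amounts to.
            offset = int(parts[1])
            size = int(parts[2])
            filename = parts[3]
            entries.append((offset, size, filename))
    return entries


# Made-up sample data, for illustration only.
sample = [
    "v1.2 2",
    "wav 1536 32000 utt_0001.wav",
    "wav 34304 16000 utt_0002.wav",
    "malformed line",
]
print(parse_index_lines(sample))
```

Lines with fewer than four whitespace-separated fields are skipped, exactly as in the original loop, so dropping `file_type` does not change which entries are kept.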
Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>
testing
Signed-off-by: George Zelenfroind <gzelenfroind@nvidia.com>
Example usage for non-bucketed dataset
Example usage for bucketed dataset
S3 support
Collection: ASR
Changelog
--s3cfg Example: ~/.s3cfg[default]. Set to "" to disable S3 support. Default is "".
--tar-base-path (e.g., s3://ASR/tarred/audio_0.tar or s3://ASR/tarred/audio__OP_0..2047_CL_.tar). When specified, audio_filepath values in the manifest are treated as filenames within this tar archive.
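The `_OP_0..2047_CL_` form in the tar path reads as a sharded range. A minimal sketch of how such a pattern could be expanded into individual shard paths follows, assuming `_OP_` and `_CL_` stand in for `{` and `}` and that `{a..b}` denotes an inclusive integer range. The helper `expand_sharded_path` is hypothetical and is not taken from `data_explorer.py`, which may rely on a different mechanism.

```python
import re
from typing import List

# Matches a numeric brace range such as {0..2047}.
_RANGE = re.compile(r"\{(\d+)\.\.(\d+)\}")


def expand_sharded_path(pattern: str) -> List[str]:
    """Hypothetical helper: expand every _OP_a..b_CL_ range in a path."""
    # Assumption: _OP_ / _CL_ are shell-safe placeholders for "{" / "}".
    pattern = pattern.replace("_OP_", "{").replace("_CL_", "}")
    match = _RANGE.search(pattern)
    if match is None:
        return [pattern]  # Nothing left to expand.
    start, end = int(match.group(1)), int(match.group(2))
    head, tail = pattern[: match.start()], pattern[match.end():]
    paths: List[str] = []
    for index in range(start, end + 1):
        # Recurse so that paths with several ranges are fully expanded.
        paths.extend(expand_sharded_path(head + str(index) + tail))
    return paths


shards = expand_sharded_path("s3://ASR/tarred/audio__OP_0..2_CL_.tar")
print(shards)
```

Under these assumptions a path with two ranges, such as a bucket range followed by a shard range, expands to the full cross product, with the leftmost range varying slowest.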
Usage
python tools/speech_data_explorer/data_explorer.py s3://abc/sharded_manifests/manifest_0.json --tar-base-path s3://abc/tarred/audio_0.tar --s3cfg ~/.s3cfg[default]
python tools/speech_data_explorer/data_explorer.py s3://abc/sharded_manifests/bucket_OP_1..8_CL_/manifest__OP_0..2047_CL_.jsonl --tar-base-path s3://abc/tarred/bucket_OP_1..8_CL_/audio__OP_0..2047_CL_.tar --s3cfg ~/.s3cfg[default]
GitHub Actions CI
PR Type: