This repository was archived by the owner on May 13, 2025. It is now read-only.


Data Preprocessing

Our preprocessing pipeline is implemented in the VideoPreprocessor module and consists of three stages. We provide scripts for both Linux and Windows; the walkthrough below assumes Linux.

Stage 1:
Resample the video to the specified FPS → rescale → center crop → extract the video stream (script).
Stage 2:
Split the video into n non-overlapping segments (clips) → temporal sampling → save as a PyTorch tensor (.pt format) (script).
Stage 3:
Copy the remaining files to the preprocessed directory (script).
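
Stage 1 can be approximated with a single ffmpeg filter chain. The sketch below only builds the command; `build_stage1_cmd` and its `fps`/`size` defaults are hypothetical and not taken from the VideoPreprocessor module. It assumes "rescale → center crop" means scaling the shorter side up to the target and cropping the center to a square.

```python
from typing import List


def build_stage1_cmd(src: str, dst: str, fps: int = 30, size: int = 224) -> List[str]:
    """Hypothetical helper: resample FPS, rescale, center crop, keep only video."""
    vf = (
        f"fps={fps},"
        # Scale so that both sides are at least `size`, preserving aspect ratio.
        f"scale={size}:{size}:force_original_aspect_ratio=increase,"
        # crop=w:h without x/y defaults to a centered crop.
        f"crop={size}:{size}"
    )
    return [
        "ffmpeg", "-y",
        "-i", src,
        "-map", "0:v:0",  # keep only the first video stream (drops audio/subtitles)
        "-vf", vf,
        dst,
    ]
```

The list can be passed to `subprocess.run(cmd, check=True)`. Building an argument list rather than a shell string avoids quoting problems with file names that contain spaces.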

Common configurations

Parameter	Default	Description
`device`	`"cpu"`	Device ffmpeg uses for video preprocessing. Running on a GPU requires an ffmpeg build with GPU acceleration.
`cpu_ratio`	`0.5`	Ratio of CPU to GPU utilization when both devices are used.
`save_root`	`"preprocessed"`	Output root for preprocessed videos.
`root`	`--`	Dataset root, read by the VideoFolderDataset class.
`loader`	`"v6"`	Video loader API (check this).
`batch_size`	`48`	Dataloader batch size.
`processes`	`os.cpu_count() // 2`	Number of processes for multiprocessing.
`fn_name`	`--`	Which preprocessing stage to run.
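
As a rough illustration, the common options can be collected in a dataclass that mirrors the table's defaults. `PreprocessConfig` and `split_by_cpu_ratio` are hypothetical names. Reading `cpu_ratio` as "fraction of the videos sent to the CPU" is an assumption, not the module's documented behavior.

```python
import os
from dataclasses import dataclass, field
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


@dataclass
class PreprocessConfig:
    """Hypothetical container mirroring the "Common configurations" table."""
    root: str     # dataset root read by VideoFolderDataset (no default)
    fn_name: str  # which preprocessing stage to run (no default)
    device: str = "cpu"
    cpu_ratio: float = 0.5
    save_root: str = "preprocessed"
    loader: str = "v6"
    batch_size: int = 48
    # os.cpu_count() may return None, so fall back to 2 and keep at least 1 process.
    processes: int = field(default_factory=lambda: max(1, (os.cpu_count() or 2) // 2))


def split_by_cpu_ratio(items: Sequence[T], cpu_ratio: float) -> Tuple[List[T], List[T]]:
    """Assumed reading of cpu_ratio: the leading share of items goes to the CPU."""
    if not 0.0 <= cpu_ratio <= 1.0:
        raise ValueError("cpu_ratio must be in [0, 1]")
    n_cpu = int(len(items) * cpu_ratio)
    return list(items[:n_cpu]), list(items[n_cpu:])
```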

Stage 1 only

Parameter	Default	Description
`run_async`	`False`	Run ffmpeg in an async manner.
`wait_time`	`20`	Waiting time when running in an async manner.
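
One possible reading of these two options is sketched below. It assumes, without confirmation from the source, that `run_async` launches all ffmpeg processes at once and that `wait_time` is a per-process timeout in seconds. `run_commands` is a hypothetical helper.

```python
import subprocess
from typing import List, Optional, Sequence


def run_commands(
    cmds: Sequence[Sequence[str]], run_async: bool = False, wait_time: float = 20
) -> List[Optional[int]]:
    """Run commands sequentially, or launch them together and wait on each.

    Returns one exit code per command; None marks a process that was killed
    after exceeding wait_time (assumed to be seconds).
    """
    if not run_async:
        return [subprocess.run(list(cmd), check=False).returncode for cmd in cmds]

    # Start every process first so they run concurrently.
    procs = [subprocess.Popen(list(cmd)) for cmd in cmds]
    codes: List[Optional[int]] = []
    for proc in procs:
        try:
            codes.append(proc.wait(timeout=wait_time))
        except subprocess.TimeoutExpired:
            proc.kill()
            proc.wait()  # reap the killed process
            codes.append(None)
    return codes
```

Note that the timeout applies to each `wait` call, so processes checked later effectively get more wall-clock time.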

Stage 2 only

Parameter	Default	Description
`del_prev_result`	`False`	Delete previous stage result.
`include_labeled`	``	Also split labeled videos into segments for later use in training/validation.
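
The Stage 2 clip split and temporal sampling can be expressed as frame-index arithmetic. The sketch below assumes equal-length segments with trailing frames dropped, and evenly spaced sampling inside each segment. `clip_frame_indices` is a hypothetical name, and the actual module may sample differently.

```python
from typing import List


def clip_frame_indices(num_frames: int, num_clips: int, frames_per_clip: int) -> List[List[int]]:
    """Split [0, num_frames) into num_clips non-overlapping, equal-length segments,
    then uniformly sample frames_per_clip frame indices inside each segment.

    Trailing frames that do not fill a whole segment are dropped (an assumption).
    """
    if num_clips <= 0 or frames_per_clip <= 0:
        raise ValueError("num_clips and frames_per_clip must be positive")
    seg_len = num_frames // num_clips
    if seg_len == 0:
        raise ValueError("video has fewer frames than requested clips")
    clips = []
    for c in range(num_clips):
        start = c * seg_len
        # Evenly spaced offsets; frames repeat if seg_len < frames_per_clip.
        clips.append([start + (i * seg_len) // frames_per_clip for i in range(frames_per_clip)])
    return clips
```

For example, `clip_frame_indices(100, 4, 5)` yields `[0, 5, 10, 15, 20]` for the first clip. Each index list can then select frames from a decoded `(T, C, H, W)` tensor, and the resulting clip can be written with `torch.save(clip, path)`.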

Stage 3 only

Parameter	Default	Description
`vid_ext`	`--`	List of video file extensions to skip when transferring the remaining files.
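
Stage 3 amounts to mirroring every non-video file into the output root. The sketch below assumes the directory layout is preserved and that `vid_ext` entries may be written with or without a leading dot. `copy_remaining_files` is a hypothetical helper.

```python
import shutil
from pathlib import Path
from typing import Iterable, List, Union


def copy_remaining_files(
    root: Union[str, Path], save_root: Union[str, Path], vid_ext: Iterable[str]
) -> List[Path]:
    """Copy every file under root whose extension is not in vid_ext."""
    # Normalize "mp4" / ".MP4" to ".mp4" for case-insensitive matching.
    skip = {"." + e.lstrip(".").lower() for e in vid_ext}
    root, save_root = Path(root), Path(save_root)
    # Materialize the listing first so files copied into save_root are never revisited.
    files = [p for p in root.rglob("*") if p.is_file()]
    copied = []
    for src in files:
        if src.suffix.lower() in skip:
            continue
        dst = save_root / src.relative_to(root)
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)  # copies content and metadata
        copied.append(dst)
    return copied
```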