caojiaolong/README.md

👋 Hi, I'm Jiao-Long Cao

🎓 I'm currently a second-year Ph.D. student at the College of Computer Science, Nankai University, advised by Prof. Qibin Hou and Prof. Ming-Ming Cheng in the Tianjin Key Laboratory of Visual Computing and Intelligent Perception (VCIP).
Before that, I obtained my B.S. in Mathematics from the Shiing-Shen Chern Class at Nankai University.

🔬 My current research focuses on multimodal vision foundation models, exploring how vision and other modalities can be unified through large-scale representation learning. In the future, I plan to explore broader directions in computer vision and machine learning.


🧠 Research Interests

  • Multimodal Models
  • Visual Representation Learning
  • Large-Scale Pretraining
  • Computer Vision

🛠️ Tech Stack

  • Deep Learning: PyTorch, Transformers 🤗
  • Languages: Python, MATLAB

🌱 Current Goal

Building general and scalable multimodal vision foundation models that can serve as strong backbones for diverse downstream tasks.


📫 Contact


“Stay curious, stay critical, and keep building.”
— Inspired by the spirit of open research

Pinned

  1. huggingface/transformers

    🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.

    Python 152k 31.1k

  2. huggingface/pytorch-image-models

    The largest collection of PyTorch image encoders / backbones. Including train, eval, inference, export scripts, and pretrained weights -- ResNet, ResNeXT, EfficientNet, NFNet, Vision Transformer (V…

    Python 35.7k 5.1k

  3. VCIP-RGBD/DFormer

    [CVPR 2025] DFormerv2: Geometry Self-Attention for RGBD Semantic Segmentation & [ICLR 2024] DFormer & [NeurIPS 2025] OmniSegmentor

    Python 394 45

  4. Awesome-Mamba

    Collect papers about Mamba (a selective state space model).

    14

  5. RGBDBenchmark

    This repository contains various RGBD models and aims to provide a benchmark for evaluating their FLOPs, MACs, and parameter counts. We will continue to add more functionality in the future.

    Jupyter Notebook 3

  6. Audio2Chart

    This repository stores my final homework for an AI class. I tried to implement a neural network that converts any audio into a playable 4-key Malody chart (Malody is a rhythm action game).

    Python 2 1