My research interest includes Efficient Computer Vision and Inference/Training Scaling Laws. My google scholar: google scholar citations 170+

⏩ Research Highlight

Efficient Dataset Condensation: 🎉🎉: [Elucidated Dataset Condensation] and [G-VBSM].
Knowledge Distillation: 🎉🎉: [CCD] and [TST].
Diffusion-Related Work: 🎉🎉: [IV-mixed Sampler], [Catch-up Distillation] and [DiffuseExpand].

📝 Selected Publications

Dataset Condensation

Generalized Large-Scale Data Condensation via Various Backbone and Statistical Matching [CVPR 2024 Highlight] [Code]

Shitong Shao, Zeyuan Yin, Muxin Zhou, Xindong Zhang, Zhiqiang Shen

TL;DR: We suggest that sufficient and various “local-match-global” matching are more precise and effective than a single one and has the ability to create a distilled dataset with richer information and better generalization.

Self-supervised Dataset Distillation: A Good Compression Is All You Need [Arxiv] [Code]

Muxin Zhou, Zeyuan Yin, Shitong Shao, Zhiqiang Shen

TL;DR: We consider addressing statistical-based matching through the new lens of model informativeness in compression on the original dataset pretraining.

Elucidating the Design Space of Dataset Condensation [NeurIPS 2024] [Code]

Shitong Shao, Zikai Zhou, Huanran Chen, and Zhiqiang Shen

TL;DR: We propose a comprehensive design framework that includes specific, effective strategies. These strategies establish a benchmark for both small and large-scale dataset condensation.

Knowledge Condensation

What Role Does Data Augmentation Play in Knowledge Distillation? [ACCV 2022 oral] [Code]

Wei Li, Shitong Shao, Weiyan Liu, Ziming Qiu, Zhihao Zhu, Wei Huan (The first-author is my supervisior)

TL;DR: The worth of data augmentation has always been overlooked by researchers in knowledge distillation, and no work analyzes its role in particular detail. To fix this gap, we analyze the effect of data augmentation on knowledge distillation from a multi-sided perspective.

Bootstrap Generalization Ability from Loss Landscape Perspective [ECCV 2022 workshop] [Code]

Huanran Chen, Shitong Shao, Ziyi Wang, Zirui Shang, Jin Chen, Xiaofeng Ji and Xinxiao Wu

TL;DR: We bootstrap the generalization ability of the deep learning model from the loss landscape perspective in four aspects, including backbone, regularization, training paradigm, and learning rate.

Multi-perspective analysis on data augmentation in knowledge distillation [Neurocomputing] [Code]

Wei Li, Shitong Shao, Ziming Qiu, Aiguo Song (The first-author is my supervisior)

TL;DR: Expansion of ACCV oral.

Hybrid knowledge distillation from intermediate layers for efficient Single Image Super-Resolution [Neurocomputing]

Jiao Xie, Linrui Gong, Shitong Shao, Shaohui Lin, Linkai Luo

TL;DR: We propose a novel efficient SISR method via hybrid knowledge distillation from intermediate layers, termed HKDSR, which leverages the knowledge from frequency information into that RGB information.

Teaching What You Should Teach: A Data-Based Distillation Method [IJCAI2023 oral] [Code]

Shitong Shao, Huanran Chen, Zhen Huang, Linrui Gong, Shuai Wang, Xinxiao Wu

TL;DR: We introduce the “Teaching what you Should Teach” strategy into a knowledge distillation framework, and propose a data-based distillation method named “TST” that searches for desirable augmented samples to assist in distilling more efficiently and rationally.

Precise Knowledge Transfer via Flow Matching [Arxiv]

Shitong Shao, Zhiqiang Shen, Linrui Gong, Huanran Chen and Xu Dai

TL;DR: We propose a novel knowledge transfer framework that introduces continuous normalizing flows for progressive knowledge transformation and leverages multi-step sampling strategies to achieve precision knowledge transfer.

Rethinking Centered Kernel Alignment in Knowledge Distillation [IJCAI 2024 oral]

Zikai Zhou, Yunhang Shen, Shitong Shao, Huanran Chen, Linrui Gong, Shaohui Lin

TL;DR: This paper first provides a theoretical perspective to illustrate the effectiveness of CKA, which decouples CKA to the upper bound of Maximum Mean Discrepancy~(MMD) and a constant term.

Auto-DAS: Automated Proxy Discovery for Training-free Distillation-aware Architecture Search [ECCV]

TL;DR: This paper searches through evolutionary algorithms thereby determining the best matching function for knowledge distillation..

Diffusion Model

IV-Mixed Sampler: Leveraging Image Diffusion Models for Enhanced Video Synthesis [Arxiv] [Code]

Shitong Shao, Zikai Zhou, Lichen Bai, Haoyi Xiong, Zeke Xie

TL;DR: We propose IV-mixed Sampler, which leverages the strengths of image diffusion models (IDMs) to assist VDMs surpass their current capabilities.

Expanding dataset for 2D medical image segmentation using diffusion models [IJCAI 2023 workshop] [Code].

Shitong Shao, Xiaohan Yuan, Zhen Huang, Ziming Qiu, Shuai Wang, Kevin Zhou

TL;DR: We propose an approach called DiffuseExpand for expanding datasets for 2D medical image segmentation using DPM, which first samples a variety of masks from Gaussian noise to ensure the diversity, and then synthesizes images to ensure the alignment of images and masks.

Catch-up distillation: You only need to train once for accelerating sampling [Arxiv] [Code]

Shitong Shao, Xu Dai, Shouyi Yin, Lujun Li, Huanran Chen, Yang Hu

TL;DR: We propose the Catch-Up Distillation (CUD), which encourages the current moment output of the velocity estimation model ``catch up’’ with its previous moment output. Specifically, CUD adjusts the original Ordinary Differential Equation training objective to align the current moment output with both the ground truth label and the previous moment output, utilizing Runge-Kutta-based multi-step alignment distillation for precise ODE estimation while preventing asynchronous updates.

Your Diffusion Model is Secretly a Certifiably Robust Classifier [NeurIPS 2024] [Code]

Huanran Chen, Yinpeng Dong, Shitong Shao, Zhongkai Hao, Xiao Yang, Hang Su, Jun Zhu

TL;DR: We generalize the diffusion classifiers to classify Gaussian-corrupted data by deriving the evidence lower bounds (ELBOs) for these distributions, approximating the likelihood using the ELBO, and calculating classification probabilities via Bayes’ theorem.

Alignment of Diffusion Models: Fundamentals, Challenges, and Future [Arxiv] [Code]

Buhua Liu, Shitong Shao, Bao Li, Lichen Bai, Haoyi Xiong, James Kwok, Sumi Helal, Zeke Xie

TL;DR: A survey of preference alignment in diffusion models.

Masked Generative Transformer

Bag of Design Choices for Inference of High-Resolution Masked Generative Transformer

Shitong Shao, Zikai Zhou, Tian Ye, Lichen Bai, Zhiqiang Xu, Zeke Xie

TL;DR: Elucidate the design choices for inference of masked generative transformer.

📖 Educations

2024.09 - present, Hong Kong University of Science and Technology (Guangzhou), PhD Student.
2021.09 - 2024.06, Southeastern University, Master Student.
2017.09 - 2021.06, East China Normal University, Undergraduate Student.

💻 Internships

I have interned at Biren, PLCT, Oneflow, Shang Hai AI Lab and OPPO, including a year-long internship at Shang Hai AI Lab.

🔥 Links

Links to my friends and advisors.

Advisors: Zeke Xie, Zhiqiang Shen and Xu Dai.
Friends: Huanran Chen, Tian Ye and Zikai Zhou.