APSIPA Transactions on Signal and Information Processing, Vol. 13, Issue 5

A Lightweight Enhancement Approach for Real-Time Semantic Segmentation by Distilling Rich Knowledge from Pre-Trained Vision-Language Model

Chia-Yi Lin, National Taiwan University, Taiwan, Jun-Cheng Chen, Academia Sinica, Taiwan, Ja-Ling Wu, National Taiwan University, Taiwan
 
Suggested Citation
Chia-Yi Lin, Jun-Cheng Chen and Ja-Ling Wu (2024), "A Lightweight Enhancement Approach for Real-Time Semantic Segmentation by Distilling Rich Knowledge from Pre-Trained Vision-Language Model", APSIPA Transactions on Signal and Information Processing: Vol. 13: No. 5, e400. http://dx.doi.org/10.1561/116.20240015

Publication Date: 07 Oct 2024
© 2024 C.-Y. Lin, J.-C. Chen and J.-L. Wu
 
Subjects
Deep learning,  Object and Scene Recognition,  Segmentation and Grouping
 
Keywords
Semantic segmentation, real-time, vision-language pre-training, CLIP
 


Open Access

This is published under the terms of CC BY-NC.



Abstract

In this work, we propose a lightweight approach to enhance real-time semantic segmentation by leveraging pre-trained vision-language models. Specifically, we use the text encoder of Contrastive Language-Image Pretraining (CLIP) to generate rich semantic embeddings for the text labels. Our method then distills this textual knowledge into the segmentation model, integrating the image and text embeddings to align visual and textual information. Additionally, we implement learnable prompt embeddings for better class-specific semantic comprehension. For efficient learning, we propose a two-stage training strategy: the segmentation backbone first learns from fixed text embeddings, and the prompt embeddings are optimized afterwards, which streamlines the learning process. Extensive evaluations and ablation studies validate that our approach effectively improves the semantic segmentation model’s performance over the compared methods.
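The alignment of image and text embeddings described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the architecture, the loss, or the prompt design, so everything below is an assumption. Random vectors stand in for the frozen CLIP text embeddings and for the backbone's per-pixel features, the function names `pixel_text_logits` and `cross_entropy` are hypothetical, and the temperature of 0.07 is only a common choice in contrastive setups. Prompt learning and the two-stage schedule are not modeled.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale vectors to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def pixel_text_logits(pixel_emb, text_emb, temperature=0.07):
    """Cosine similarity between every pixel embedding and every class text embedding.

    pixel_emb: (H, W, D) per-pixel features (assumed output of a segmentation backbone).
    text_emb:  (C, D) one embedding per class label (assumed output of a text encoder).
    Returns logits of shape (H, W, C).
    """
    p = l2_normalize(pixel_emb)
    t = l2_normalize(text_emb)
    return (p @ t.T) / temperature


def cross_entropy(logits, labels):
    """Mean per-pixel cross-entropy; labels has shape (H, W) with integer class ids."""
    z = logits - logits.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    log_probs = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))
    h, w = labels.shape
    picked = log_probs[np.arange(h)[:, None], np.arange(w)[None, :], labels]
    return float(-picked.mean())


rng = np.random.default_rng(0)
num_classes, dim, height, width = 4, 16, 8, 8

# Stand-ins (assumptions): in practice these would come from a text encoder
# and a segmentation backbone, not from a random number generator.
text_emb = rng.normal(size=(num_classes, dim))
pixel_emb = rng.normal(size=(height, width, dim))
labels = rng.integers(0, num_classes, size=(height, width))

logits = pixel_text_logits(pixel_emb, text_emb)
loss = cross_entropy(logits, labels)
pred = logits.argmax(axis=-1)  # predicted class id per pixel
```

In this sketch, a pixel is assigned to the class whose text embedding it is most similar to, so training the backbone against such a loss pulls pixel features toward the embedding of their class label.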

DOI:10.1561/116.20240015

Companion

APSIPA Transactions on Signal and Information Processing Special Issue - Invited Papers from APSIPA ASC 2023