now publishers - A Lightweight Enhancement Approach for Real-Time Semantic Segmentation by Distilling Rich Knowledge from Pre-Trained Vision-Language Model

APSIPA Transactions on Signal and Information Processing > Vol 13 > Issue 5

A Lightweight Enhancement Approach for Real-Time Semantic Segmentation by Distilling Rich Knowledge from Pre-Trained Vision-Language Model

Chia-Yi Lin, National Taiwan University, Taiwan, Jun-Cheng Chen, Academia Sinica, Taiwan, Ja-Ling Wu, National Taiwan University, Taiwan

Suggested Citation

Chia-Yi Lin, Jun-Cheng Chen and Ja-Ling Wu (2024), "A Lightweight Enhancement Approach for Real-Time Semantic Segmentation by Distilling Rich Knowledge from Pre-Trained Vision-Language Model", APSIPA Transactions on Signal and Information Processing: Vol. 13: No. 5, e400. http://dx.doi.org/10.1561/116.20240015

Publication Date: 07 Oct 2024

Subjects

Deep learning, Object and Scene Recognition, Segmentation and Grouping

Keywords

Semantic segmentation, real-time, vision-language pre-training, CLIP

Journal details

Open Access

This is published under the terms of CC BY-NC.

Abstract

In this work, we propose a lightweight approach to enhance real-time semantic segmentation by leveraging pre-trained vision-language models, specifically using the text encoder of Contrastive Language-Image Pre-training (CLIP) to generate rich semantic embeddings for text labels. Our method then distills this textual knowledge into the segmentation model, integrating the image and text embeddings to align visual and textual information. Additionally, we implement learnable prompt embeddings for better class-specific semantic comprehension. We propose a two-stage training strategy for efficient learning: the segmentation backbone first learns from fixed text embeddings, and the prompt embeddings are subsequently optimized to streamline the learning process. Extensive evaluations and ablation studies validate that our approach effectively improves the semantic segmentation model's performance over the compared methods.
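
As a rough illustration of what aligning image and text embeddings can look like, the following minimal sketch scores each pixel embedding against one text embedding per class using cosine similarity and trains with cross-entropy. It is not the authors' implementation: the abstract does not specify the loss or architecture, so the function names, the temperature value of 0.07, and the random vectors standing in for frozen CLIP text embeddings and backbone features are all assumptions made for illustration.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale vectors to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def pixel_text_logits(pixel_emb, text_emb, temperature=0.07):
    """Per-pixel class logits from pixel/text cosine similarity.

    pixel_emb: (H, W, D) features from a segmentation backbone (hypothetical).
    text_emb:  (C, D) one embedding per class label (hypothetical stand-in
               for the output of a frozen text encoder).
    Returns logits of shape (H, W, C).
    """
    p = l2_normalize(pixel_emb)
    t = l2_normalize(text_emb)
    return (p @ t.T) / temperature  # temperature is an assumed hyperparameter

def cross_entropy(logits, labels):
    """Mean per-pixel cross-entropy; labels has shape (H, W) with class ids."""
    shifted = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    h, w = labels.shape
    rows = np.arange(h)[:, None]
    cols = np.arange(w)[None, :]
    return float(-log_probs[rows, cols, labels].mean())

# Toy data: random vectors replace real text embeddings and image features.
rng = np.random.default_rng(0)
num_classes, dim, height, width = 3, 8, 4, 4
text_emb = rng.normal(size=(num_classes, dim))
labels = rng.integers(0, num_classes, size=(height, width))
# Pixel features placed close to the text embedding of their true class.
pixel_emb = text_emb[labels] + 0.01 * rng.normal(size=(height, width, dim))

logits = pixel_text_logits(pixel_emb, text_emb)
pred = logits.argmax(axis=-1)
loss = cross_entropy(logits, labels)
```

Under these assumptions, a two-stage schedule like the one described in the abstract would keep `text_emb` fixed while the backbone producing `pixel_emb` is trained, and only afterwards update the prompt embeddings that feed the text encoder.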

DOI:10.1561/116.20240015

Related publications

Companion

APSIPA Transactions on Signal and Information Processing Special Issue - Invited Papers from APSIPA ASC 2023
See the other articles that are part of this special issue.

Introduction
Related Work
Methodology
Experiments
Conclusion
References
