
Word Embedding Dimension Reduction via Weakly-Supervised Feature Selection

Jintang Xue, University of Southern California, USA, jintangx@usc.edu; Yun-Cheng Wang, University of Southern California, USA; Chengwei Wei, University of Southern California, USA; C.-C. Jay Kuo, University of Southern California, USA
 
Suggested Citation
Jintang Xue, Yun-Cheng Wang, Chengwei Wei and C.-C. Jay Kuo (2024), "Word Embedding Dimension Reduction via Weakly-Supervised Feature Selection", APSIPA Transactions on Signal and Information Processing: Vol. 13: No. 1, e37. http://dx.doi.org/10.1561/116.20240046

Publication Date: 16 Dec 2024
© 2024 J. Xue, Y.-C. Wang, C. Wei and C.-C. J. Kuo
 
Subjects
Data compression,  Feature detection and selection,  Dimensionality reduction
 
Keywords
Word embedding, feature selection, dimension reduction, natural language processing
 

Open Access

This article is published under the terms of the CC BY-NC license.

In this article:
Introduction 
Related Work 
Proposed Method 
Experiments 
Conclusion and Future Work 
References 

Abstract

As a fundamental task in natural language processing, word embedding converts each word into a representation in a vector space. A challenge with word embedding is that as the vocabulary grows, the dimension of the vector space increases, which can lead to a vast model size. Storing and processing word vectors is resource-demanding, especially for mobile edge-device applications. This paper explores word embedding dimension reduction. To balance computational costs and performance, we propose an efficient and effective weakly-supervised feature selection method named WordFS. It has two variants, each utilizing novel criteria for feature selection. Experiments on various tasks (e.g., word and sentence similarity and binary and multi-class classification) indicate that the proposed WordFS model outperforms other dimension reduction methods at lower computational costs.

DOI: 10.1561/116.20240046