APSIPA Transactions on Signal and Information Processing > Vol 12 > Issue 1

Convolutional Neural Networks Inference Memory Optimization with Receptive Field-Based Input Tiling

Weihao Zhuang, Kobe University, Japan, zhuangweihao@stu.kobe-u.ac.jp; Tristan Hascoet, Kobe University, Japan; Xunquan Chen, Kobe University, Japan; Ryoichi Takashima, Kobe University, Japan; Tetsuya Takiguchi, Kobe University, Japan; Yasuo Ariki, Kobe University, Japan
 
Suggested Citation
Weihao Zhuang, Tristan Hascoet, Xunquan Chen, Ryoichi Takashima, Tetsuya Takiguchi and Yasuo Ariki (2023), "Convolutional Neural Networks Inference Memory Optimization with Receptive Field-Based Input Tiling", APSIPA Transactions on Signal and Information Processing: Vol. 12: No. 1, e3. http://dx.doi.org/10.1561/116.00000015

Publication Date: 18 Jan 2023
© 2023 W. Zhuang, T. Hascoet, X. Chen, R. Takashima, T. Takiguchi and Y. Ariki
 
Keywords
Convolutional neural network, memory optimization, receptive field
 

Open Access

This is published under the terms of CC BY-NC.

In this article:
Introduction 
The Proposed Method 
Computation vs. Memory Trade-off 
Results and Discussion 
Conclusion 
References 

Abstract

Deep learning now plays an indispensable role in many fields, including computer vision, natural language processing, and speech recognition. Convolutional Neural Networks (CNNs) have demonstrated excellent performance in computer vision tasks thanks to their powerful feature-extraction capability. However, because larger models tend to achieve higher accuracy, state-of-the-art CNN models consume ever more resources. This paper investigates a conceptual approach to reducing the memory consumption of CNN inference. Our method processes the input image as a sequence of carefully designed tiles within the lower subnetwork of the CNN, so as to minimize peak memory consumption while keeping the end-to-end computation unchanged. This method introduces a trade-off between memory consumption and computation, and is particularly suitable for high-resolution inputs. Our experimental results show that the proposed method reduces the memory consumption of MobileNetV2 by up to 5.3 times. For ResNet50, one of the most commonly used CNN models in computer vision tasks, memory consumption is reduced by up to 2.3 times.
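
The tiling idea summarized above can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a one-dimensional stack of unpadded ("valid") convolutions, and every name in it (`conv1d_valid`, `run_stack`, `input_range`, `run_tiled`) is hypothetical. It maps each output tile back to the input range it depends on (its receptive field), processes only that slice, and concatenates the results, which should match the untiled output.

```python
import numpy as np

def conv1d_valid(x, w, stride):
    """Unpadded 1D cross-correlation with the given stride."""
    k = len(w)
    n_out = (len(x) - k) // stride + 1
    return np.array([np.dot(x[i * stride:i * stride + k], w) for i in range(n_out)])

def run_stack(x, layers):
    """Apply each (weights, stride) layer in order."""
    for w, stride in layers:
        x = conv1d_valid(x, w, stride)
    return x

def output_length(n_in, layers):
    """Length of the stack's output for an input of length n_in."""
    n = n_in
    for w, stride in layers:
        n = (n - len(w)) // stride + 1
    return n

def input_range(out_start, out_stop, layers):
    """Input range [start, stop) needed to compute outputs [out_start, out_stop)."""
    start, stop = out_start, out_stop
    for w, stride in reversed(layers):
        # Output i of a layer reads inputs [i * stride, i * stride + k).
        start = start * stride
        stop = (stop - 1) * stride + len(w)
    return start, stop

def run_tiled(x, layers, tile):
    """Compute the stack's output one tile at a time from input slices."""
    n_out = output_length(len(x), layers)
    pieces = []
    for o in range(0, n_out, tile):
        lo, hi = input_range(o, min(o + tile, n_out), layers)
        # Only x[lo:hi] and its intermediate activations are live here.
        pieces.append(run_stack(x[lo:hi], layers))
    return np.concatenate(pieces)

# Illustrative data: random input and three small random kernels.
rng = np.random.default_rng(0)
x = rng.standard_normal(64)
layers = [(rng.standard_normal(3), 1),
          (rng.standard_normal(3), 2),
          (rng.standard_normal(3), 1)]

full = run_stack(x, layers)
tiled = run_tiled(x, layers, tile=8)
print(np.allclose(full, tiled))
```

Neighbouring tiles read overlapping input slices, so some intermediate values are recomputed. That recomputation is the extra computation paid for the lower peak memory, which is the trade-off the abstract refers to. Real CNNs add two-dimensional inputs and padding, which this sketch deliberately leaves out.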

DOI:10.1561/116.00000015