By Hamza Al Maharmeh, Wayne Center for Integrated Circuits and Systems (WINCAS), Wayne State University, USA, hamza.m@wayne.edu | Mohammed Ismail, Wayne Center for Integrated Circuits and Systems (WINCAS), Wayne State University, USA, ismail@wayne.edu | Mohammad Alhawari, Wayne Center for Integrated Circuits and Systems (WINCAS), Wayne State University, USA, alhawari@wayne.edu
The increasing demand for high performance and energy efficiency in Artificial Neural Network (ANN) and Deep Learning (DL) accelerators has driven the development of a wide range of application-specific integrated circuits (ASICs). In recent years, the field has begun to move away from conventional digital implementations of machine learning (ML) accelerators, and researchers have started to investigate implementations in the analog domain. This shift has two main motivations: (a) better performance and (b) lower power consumption. Analog processing has become more efficient than digital processing, especially for Deep Neural Networks (DNNs), partly because emerging analog memory technologies enable local storage and processing, known as compute-in-memory (CIM), which reduces data movement between the memory and the processor.
However, the analog approach faces many challenges, such as the lack of a capable, commercially available nonvolatile analog memory and the susceptibility of analog circuits to variation and noise. Additionally, analog cores require digital-to-analog converters (DACs) and analog-to-digital converters (ADCs), which can account for up to 64% of the total power consumption. An emerging trend is to use time-domain (TD) circuits to implement the multiply-accumulate (MAC) operation. TD cores instead require digital-to-time converters (DTCs) and time-to-digital converters (TDCs), but DTCs and TDCs can be more energy- and area-efficient than DACs and ADCs. TD accelerators combine digital and analog features, enabling energy-efficient computing that scales with complementary metal–oxide–semiconductor (CMOS) technology. The performance of TD accelerators can be substantially improved by using custom-designed analog delay cells, DTCs, and TDCs.
This monograph reviews state-of-the-art TD accelerators and discusses system considerations and hardware implementations. It also analyzes the energy and area efficiency of TD architectures, including spatially unrolled (SU) and recursive (REC) architectures, across varying input resolutions and network sizes, to help designers choose the appropriate TD approach for a particular application. Furthermore, it discusses our scalable SU-TD accelerator, synthesized in 65 nm CMOS technology, which achieves 116 TOPS/W. Its DTC uses a laddered inverter (LI) circuit that consumes 3× less power than an inverter-based DTC. Finally, we discuss the limitations of time-domain computation and directions for future work.
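The abstract states that TD cores implement the MAC operation with DTCs and TDCs but leaves the mechanism implicit. The Python sketch below is a minimal behavioral model that rests on several assumptions. Inputs are unsigned multi-bit codes and weights are binary. A single delay chain is used, in which each cell adds its DTC delay only when its weight is 1. The converters are ideal and share one time step. The names `dtc`, `delay_chain`, `tdc`, and `td_mac`, and the parameters `t_lsb`, `t_fixed`, and `sigma`, are hypothetical. The model is not the SU or REC architecture or the LI-based DTC described in this monograph.

```python
import random
from typing import Optional, Sequence


def dtc(code: int, t_lsb: float) -> float:
    """Hypothetical ideal digital-to-time converter: unsigned code -> delay in seconds."""
    if code < 0:
        raise ValueError("this sketch assumes unsigned input codes")
    return code * t_lsb


def tdc(t_measured: float, t_ref: float, t_lsb: float) -> int:
    """Hypothetical ideal time-to-digital converter: quantizes a delay difference."""
    return round((t_measured - t_ref) / t_lsb)


def delay_chain(inputs: Sequence[int], weights: Sequence[int], t_lsb: float,
                t_fixed: float, sigma: float = 0.0,
                rng: Optional[random.Random] = None) -> float:
    """Propagate one edge through a chain of delay cells, one cell per input.

    Each cell contributes its intrinsic delay t_fixed, plus the DTC delay of its
    input when its binary weight is 1, so the chain accumulates sum(w_i * x_i).
    sigma adds Gaussian per-cell jitter (seconds) to mimic variation and noise.
    """
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    if rng is None:
        rng = random.Random(0)
    total = 0.0
    for x, w in zip(inputs, weights):
        if w not in (0, 1):
            raise ValueError("this sketch assumes binary weights")
        total += t_fixed + w * dtc(x, t_lsb)
        if sigma > 0.0:
            total += rng.gauss(0.0, sigma)
    return total


def td_mac(inputs: Sequence[int], weights: Sequence[int], t_lsb: float = 10e-12,
           t_fixed: float = 50e-12, sigma: float = 0.0, seed: int = 0) -> int:
    """Encode inputs as delays, accumulate them along the chain, read out with a TDC."""
    rng = random.Random(seed)
    t_chain = delay_chain(inputs, weights, t_lsb, t_fixed, sigma, rng)
    t_ref = len(inputs) * t_fixed  # reference chain whose weights are all 0
    return tdc(t_chain, t_ref, t_lsb)


if __name__ == "__main__":
    x = [3, 5, 7, 2]  # input activations (unsigned codes)
    w = [1, 0, 1, 1]  # binary weights
    print(td_mac(x, w))  # 12, equal to the digital dot product
    print(sum(xi * wi for xi, wi in zip(x, w)))
    print(td_mac(x, w, sigma=3e-12))  # per-cell jitter may shift the count
```

Setting `sigma` above zero adds random per-cell delay, which gives a rough feel for how variation and noise in the delay cells can corrupt the TDC count.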
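To put the 116 TOPS/W figure in perspective, it can be restated as energy per operation, using 1 TOPS/W = 10^12 operations per joule. This is plain unit arithmetic. It does not specify how the accelerator counts operations, and one MAC is often counted as two operations.

```latex
E_{\text{op}} = \frac{1}{116\ \text{TOPS/W}}
             = \frac{1\ \text{J}}{116 \times 10^{12}\ \text{ops}}
             \approx 8.6 \times 10^{-15}\ \text{J}
             = 8.6\ \text{fJ per operation}
```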