APSIPA Transactions on Signal and Information Processing > Vol 9 > Issue 1

Toward human-centric deep video understanding

Industrial Technology Advances

Wenjun Zeng, Microsoft Research Asia, China, wezeng@microsoft.com
 
Suggested Citation
Wenjun Zeng (2020), "Toward human-centric deep video understanding", APSIPA Transactions on Signal and Information Processing: Vol. 9: No. 1, e1. http://dx.doi.org/10.1017/atsip.2019.26

Publication Date: 13 Jan 2020
© 2020 Wenjun Zeng
 
Keywords
Human-centric, Video understanding, Deep learning
 

Open Access

This article is published under the terms of the Creative Commons Attribution licence.

In this article:
I. INTRODUCTION 
II. HUMAN-CENTRIC: WHY and HOW? 
III. HUMAN-CENTRIC VISION TASKS 
IV. FUTURE PERSPECTIVES 

Abstract

People are at the very heart of our daily work and life. As we strive to leverage artificial intelligence to empower every person on the planet to achieve more, we need to understand people far better than we can today. Human–computer interaction plays a significant role in human–machine hybrid intelligence, and human understanding becomes a critical step in addressing the tremendous challenges of video understanding. In this paper, we share our views on why and how to use a human-centric approach to address challenging video understanding problems. We discuss human-centric vision tasks and their current status, highlighting the challenges and how our understanding of human brain functions can be leveraged to effectively address some of them. We show that semantic models, view-invariant models, and spatial-temporal visual attention mechanisms are important building blocks. We also discuss future perspectives of video understanding.

DOI: 10.1017/atsip.2019.26