By Florin Rusu, University of California Merced, USA, frusu@ucmerced.edu
Multidimensional arrays are a fundamental abstraction for representing data across scientific domains ranging from astronomy to genetics, medicine, business intelligence, and engineering. Arrays come in many shapes, from dense rasters to sparse data cubes and tensors, and have been studied extensively across many computing domains. In this survey, we provide a comprehensive guide for past, present, and future research in array data management from a database perspective. Unlike previous surveys that are limited to raster processing in the context of scientific data, we consider all types of arrays: rasters, data cubes, and tensors. We identify and analyze the most important research ideas on arrays proposed over time. We cover all data management aspects, from array algebras and query languages to storage strategies, execution techniques, and operator implementations. Moreover, we discuss which research ideas are adopted in real systems and how they are integrated into complete data processing pipelines. Finally, we compare arrays with the relational data model. The result is a thorough survey on array data management that should be consulted by anyone interested in this research topic, independent of experience level.
Multidimensional arrays are one of the fundamental computing abstractions to represent data across virtually all areas of science and engineering, and beyond. Due to their ubiquity, multidimensional arrays have been studied extensively across many areas of computer science.
This survey provides a comprehensive guide for past, present, and future research in array data management from a database perspective. Unlike previous surveys that are limited to raster processing in the context of scientific data, this survey considers all types of arrays: rasters, data cubes, and tensors. The author’s goal is to identify and analyze the most important research ideas on arrays, with two objectives: first, to summarize the most relevant work on multidimensional array data management by identifying the major research problems; and second, to organize this material to provide an accurate perspective on the state of the art and future directions in array processing.
Multidimensional Array Data Management covers all data management aspects, from array algebras and query languages to storage strategies, execution techniques, and operator implementations. Moreover, the author discusses which research ideas are adopted in real systems and how they are integrated into complete data processing pipelines. Finally, the author compares arrays with the relational data model. The result is a thorough survey on array data management that is an excellent resource for anyone interested in this topic, independent of experience level.