By Thomas Heinis, Imperial College London, UK, t.heinis@imperial.ac.uk | Anastasia Ailamaki, Ecole Polytechnique Federale de Lausanne, Switzerland, ailamaki@epfl.ch
While we are witnessing rapid growth in data across the sciences and in many applications, this growth is particularly remarkable in the medical domain, driven by higher-resolution instruments and diagnostic tools (e.g. MRI), new sources of structured data such as activity trackers, the widespread use of electronic health records, and many other factors. The sheer volume of the data is not, however, the only challenge to be faced when using medical data for research. Other crucial challenges include data heterogeneity, data quality and data privacy, among others. In this article, we review solutions addressing these challenges by discussing the current state of the art in data integration, data cleaning, data privacy, and scalable data access and processing in the context of medical data. The techniques and tools we present will give practitioners (computer scientists and medical researchers alike) a starting point for understanding the challenges and solutions and, ultimately, for analysing medical data and gaining better and quicker insights.
Data Infrastructure for Medical Research is an ideal primer for medical and computer science professionals who design or need to understand data infrastructure for medical research. It focuses primarily on the data management and computing aspects of the problem. Nevertheless, the monograph is intended to appeal to both audiences: computer scientists and other technical professionals who are designing infrastructure and systems for the analysis of medical data, as well as medical professionals who want to understand the challenges involved.
The monograph looks at the most common and challenging topics in developing infrastructure for storing, processing and analyzing medical data. It surveys state-of-the-art methods and current research efforts on the most important topics in the treatment of medical data, as well as the most relevant up-and-coming research efforts in this area.