Fodil Belghait, Alain April
The widespread adoption of high-throughput genomics, microarray, and deep sequencing technologies has made it possible to conduct more complex precision medicine research using very large amounts of heterogeneous data [1]. The availability of this data allows data scientists and clinicians to develop strategies tailored to the individual. Therapeutic and preventive treatments can be proposed with greater accuracy, targeting subgroups of patients for specific illnesses, using large amounts of genomic, clinical, lifestyle, and environmental data [2]. Next-generation sequencing (NGS) technology is key to supporting precision medicine research; however, the volume and complexity of the data pose challenges for its clinical application [3]. While Big Data analytics can uncover hidden patterns, new correlations, and other insights through the examination of large-scale data sets, it remains difficult to master [4]. In this paper, we present the requirements for future large-scale precision medicine platforms in terms of data extensibility and on-demand processing scalability. We also propose a platform architecture, together with open-source Big Data technologies, that would make it easy to enrich a flexible data schema, provide the power needed to load large amounts of data, and make this centralized database available for specific precision medicine research.
Precision medicine; Genotyping; Clinical database; Cloud computing; Big Data; Bioinformatics.