Title: Big Microbiome Data Analysis
Speaker: Prof. Xiaohua Tony Hu, College of Computing and Informatics, Drexel University
Time: May 15, 10:00 AM
Venue: Room 2-116, Biology and Pharmacy Building, Minhang Campus
Contact: Dongqing Wei (魏冬青), 34204573, dqwei@sjtu.edu.cn
Qin Xu (徐沁), 34204573, xuqin523@sjtu.edu.cn
Abstract: We know little about the microbial world. Microbiome sequencing (i.e., metagenomic and 16S rRNA sequencing) extracts DNA directly from a microbial environment without culturing any species. Recently, huge amounts of data have been generated by microbiome projects such as the Human Microbiome Project (HMP) and Metagenomics of the Human Intestinal Tract (MetaHIT). Analyzing these data will help us better understand the function and structure of the microbial communities of the human body, the earth, and other environmental ecosystems. However, the huge data volume, the complexity of microbial communities, and the intricate data properties create both opportunities and challenges for data analysis and mining. For example, it is estimated that the microbial ecosystem of the human gut contains about 1000 kinds of bacteria, with 10 billion bacteria and more than 4 million genes in more than 6000 orthologous gene families. The challenges stem from the complex properties of microbiome data: they are large-scale, complicated, diverse, correlated, compositional, hierarchical, and incomplete. Current microbiome data analysis methods seldom consider these properties and often make assumptions, such as linearity, Euclidean space, metric space, or continuous data types, that conflict with the true data properties. For example, some similarity measures are non-metric because of the prevalence of certain species, and the interactions among species and the environment are complex and of high order. It is therefore urgent to develop novel computational methods that overcome these assumptions and account for microbiome data properties in the analysis procedure. In this talk, we will discuss some computational methods for analyzing and visualizing big microbiome data.
Our studies focus on 1) novel machine learning and computational techniques for dimension reduction and visualization of microbiome data based on non-Euclidean spaces (manifold learning), which discover nonlinear intrinsic features and patterns in these data and overcome the linear assumptions; and 2) novel statistical methods for variable selection in microbiome data that integrate group information among variables.