Ph.D. Oral Defense: "Large-scale Multiple Hypothesis Testing with Complex Data Structure"

Xiaoyu Dai, Washington University in Saint Louis

Abstract: In the last decade, motivated by a variety of applications in medicine, bioinformatics, genomics, brain imaging, etc., a growing amount of statistical research has been devoted to large-scale multiple testing, where thousands or more tests are conducted simultaneously. However, due to the complexity of real data, the assumptions of many existing multiple testing procedures, such as that tests are independent or that the null distributions of p-values are continuous, may not hold. This limits their performance, leading to low detection power and an inflated false discovery rate (FDR). In this dissertation, we study how to better conduct multiple testing under complex data structures. First, we consider multiple testing with discrete test statistics. Second, we consider discrete multiple testing with prior ordering information incorporated. Third, we study multiple testing under complex dependence. We propose novel procedures for each scenario, based on the marginal critical functions (MCFs) of randomized tests, the conditional random field (CRF), or the deep neural network (DNN). The theoretical properties of our procedures are carefully studied, and their performance is evaluated through various simulations and real-data applications involving the analysis of genetic data from next-generation sequencing (NGS) experiments.

Host: Nan Lin