Jun Liu (劉軍) is a tenured professor in the Department of Statistics at Harvard University, with a joint appointment as professor in the Harvard Department of Biostatistics; a tenured professor in the Department of Statistics at Stanford University; Changjiang Chair Professor at the School of Mathematical Sciences, Peking University; and a visiting professor in the Department of Mathematics at Tsinghua University. He is a Fellow of the Institute of Mathematical Statistics and of the American Statistical Association, a Fellow of the International Society for Computational Biology, and a member of the U.S. National Academy of Sciences. He founded the Center for Statistical Science at Tsinghua University and served as its honorary director until 2024, and, as chair of the preparatory development committee, founded the Department of Statistics and Data Science at Tsinghua University. His research focuses on Bayesian statistical theory, Monte Carlo methods, statistical machine learning, state-space models and time series, bioinformatics, and computational biology. He has received the COPSS Presidents' Award, the Morningside Gold Medal in Applied Mathematics, and the Pao-Lu Hsu Award of the International Chinese Statistical Association. He has published more than 300 papers and several monographs in leading international journals (Science, Nature, Cell, JASA, JMLR, etc.), with over 90,000 citations, and has supervised more than 40 PhD students and more than 30 postdoctoral researchers.
Abstract: The mirror statistic (or knockoff statistic) is a key component of most p-value-free feature selection methods. However, it is unclear how to choose the best statistic when additional prior information or covariate information is available. In this paper, we first describe a large class of possible choices of mirror statistics and derive an optimal form of mirror statistic inspired by the two-stage formula proposed in Li and Fithian (2021). Theoretically, we demonstrate the power advantage of this optimal form by considering the Rare/Weak signal model. With prior information, evenly splitting the data into two halves is no longer the most efficient strategy. Building upon the optimal form of the mirror statistic, we investigate how the splitting ratio affects the power of a feature selection procedure and introduce the Adaptive-Data-Splitting (ADS) approach. Both simulations and real data examples show that ADS performs significantly better than the original equal-splitting scheme.
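For readers unfamiliar with the setup, the following is a minimal sketch of the basic data-splitting, mirror-statistic selection procedure that the abstract builds upon. It is illustrative only: the estimator (Lasso on each half), the mirror form sign(b1*b2)*(|b1|+|b2|), the 50/50 split, and the target FDR level q are assumptions for this sketch, not the specific optimal mirror statistic or adaptive splitting ratio discussed in the talk.

```python
# Sketch: p-value-free feature selection via equal data splitting and
# mirror statistics, with an FDR-controlling threshold.  Illustrative
# choices throughout; not the talk's exact procedure.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)

# Synthetic sparse linear model: 500 samples, 200 features, 20 true signals.
n, p, k = 500, 200, 20
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:k] = rng.choice([-1.0, 1.0], size=k) * 0.5
y = X @ beta + rng.standard_normal(n)

# Step 1: split the data into two equal halves (the baseline scheme).
idx = rng.permutation(n)
half1, half2 = idx[: n // 2], idx[n // 2:]

def fit_coefs(rows):
    """Estimate regression coefficients on one half of the data."""
    return Lasso(alpha=0.05).fit(X[rows], y[rows]).coef_

b1, b2 = fit_coefs(half1), fit_coefs(half2)

# Step 2: mirror statistics.  For null features the two estimates are
# (approximately) independent and symmetric about zero, so M_j is
# symmetric about zero; true signals tend to yield large positive M_j.
M = np.sign(b1 * b2) * (np.abs(b1) + np.abs(b2))

# Step 3: data-driven threshold targeting FDR level q, using the negative
# tail of M to estimate the number of false discoveries.
q = 0.1
tau = np.inf
for t in np.sort(np.abs(M[M != 0])):
    fdp_hat = np.sum(M <= -t) / max(np.sum(M >= t), 1)
    if fdp_hat <= q:
        tau = t
        break

selected = np.where(M >= tau)[0]
print(f"selected {len(selected)} features, "
      f"{np.sum(selected < k)} of them true signals")
```

The adaptive approach described in the abstract replaces the fixed 50/50 split above with a data- or prior-driven splitting ratio, and replaces the simple mirror form with the derived optimal statistic.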