Title: Optimal Subsampling Algorithm for Big Data Regression
Speaker: Professor Mingyao Ai (艾明要), Peking University
Time: November 21, 2019, 10:00-11:00 a.m.
Venue: Second Lecture Hall, Mathematics Building
Abstract:
To quickly approximate the maximum likelihood estimator (MLE) with massive data, this paper studies optimal subsampling under the A-optimality criterion for generalized linear models (GLMs). The consistency and asymptotic normality of the estimator from a general subsampling algorithm are established, and optimal subsampling probabilities under the A- and L-optimality criteria are derived. Furthermore, using a Frobenius-norm matrix concentration inequality, finite-sample properties of the subsample estimator based on the optimal subsampling probabilities are derived. Since the optimal subsampling probabilities depend on the full-data estimate, an adaptive two-step algorithm is developed. Asymptotic normality and optimality of the estimator from this adaptive algorithm are established. The proposed methods are illustrated and evaluated through numerical experiments on simulated and real datasets.
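The adaptive two-step idea can be illustrated with a minimal sketch for logistic regression, one member of the GLM family. The abstract gives no formulas, so everything here is an assumption: the probabilities follow the form commonly used in the optimal subsampling literature (proportional to |y_i - p_i| times ||M^{-1} x_i|| for the A-criterion, or times ||x_i|| for the L-criterion), and the names weighted_logistic_mle, two_step_subsample, r0, and r are hypothetical. It is not the speaker's implementation.

```python
import numpy as np


def weighted_logistic_mle(X, y, w, iters=50, tol=1e-10):
    """Maximize the weighted logistic log-likelihood with Newton's method."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        grad = X.T @ (w * (y - p))
        hess = (X * (w * p * (1.0 - p))[:, None]).T @ X
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


def two_step_subsample(X, y, r0, r, criterion="L", seed=None):
    """Two-step subsampling sketch (assumed form, see the lead-in).

    Step 1: draw a uniform pilot subsample of size r0 and fit a pilot estimate.
    Step 2: use the pilot to approximate the optimal probabilities, draw r
            more points with them, and fit an inverse-probability-weighted MLE
            on the combined subsample.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]

    # Step 1: uniform pilot subsample, drawn with replacement.
    idx0 = rng.choice(n, size=r0, replace=True)
    beta0 = weighted_logistic_mle(X[idx0], y[idx0], np.ones(r0))

    # Step 2: approximate optimal subsampling probabilities from the pilot.
    p = 1.0 / (1.0 + np.exp(-(X @ beta0)))
    if criterion == "A":
        # Pilot-based estimate of the information matrix M.
        p0 = p[idx0]
        M = (X[idx0] * (p0 * (1.0 - p0))[:, None]).T @ X[idx0] / r0
        norms = np.linalg.norm(np.linalg.solve(M, X.T).T, axis=1)
    else:
        norms = np.linalg.norm(X, axis=1)
    probs = np.abs(y - p) * norms
    probs = probs / probs.sum()
    idx1 = rng.choice(n, size=r, replace=True, p=probs)

    # Combine both subsamples; weights are 1 / (n * sampling probability),
    # which equals 1 for the uniform pilot draws.
    idx = np.concatenate([idx0, idx1])
    w = np.concatenate([np.ones(r0), 1.0 / (n * probs[idx1])])
    return weighted_logistic_mle(X[idx], y[idx], w)
```

The L-criterion variant avoids the extra linear solve against M, which is why it is the default in this sketch; the A-criterion variant targets the estimator's asymptotic mean squared error directly.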
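About the speaker:
Mingyao Ai is a professor and doctoral supervisor at the School of Mathematical Sciences, Peking University, where he heads the statistics teaching and research section. He also serves as Secretary-General of the Probability and Statistics Society of the Chinese Mathematical Society, and as an executive council member of the Chinese Association for Applied Statistics (中國現場統計研究會), chair of its Experimental Design Branch, and vice chair of its High-Dimensional Data Statistics Branch. He is an associate editor of the international statistics journals Statistica Sinica, Journal of Statistical Planning and Inference, Statistics and Probability Letters, and STAT, an editorial board member of the Chinese core journal Journal of Systems Science and Mathematical Sciences (《系統科學與數學》), and an editorial board member of the Science Press book series on statistics and data science (《統計與數據科學系列叢書》). His teaching and research cover design and analysis of experiments, computer experiments, big data analysis, and applied statistics. He has published more than sixty papers in leading journals in China and abroad, including Ann Statist, JASA, Biometrika, Technometrics, and Statist Sinica. He has led and completed several General Program projects and Key Program sub-projects of the National Natural Science Foundation of China, and has taken part in completing two projects under the Ministry of Science and Technology's 973 Program.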