PhD Defense: "Understanding the Semantics of Networked Text"

Gengxin Miao

June 1st (Friday), 3:00pm
Engineering Science Building (ESB), Rm 2001

With recent advances in information technology, massive amounts of textual data are being generated and stored, which calls for corresponding innovations in big data analytics to process them. These semantically rich textual data are often interconnected in complex networks. Examples include information communicated in social networks, tasks routed in expert networks, and messages sent through email networks. Analyzing the problem resolution history in an expert network helps route future problems more efficiently, and mining the associated textual datasets can lead to better solutions. However, the combination of massive scale and complex interconnection presents two challenges. First, new models are needed to consolidate the semantic information in the textual datasets with the structural information in the underlying network. Second, the models need to be optimized globally to exploit the collective intelligence within the network.

This Ph.D. dissertation defense provides an overview of our efforts to address these challenges. The talk first discusses a generative model that captures information flow in expert networks and is shown to route new tasks efficiently through the network. It then discusses a model that detects latent associations in document pairs, such as problem and solution pairs, system-error and root-cause pairs, and symptom and treatment pairs. This model captures the association between source and target documents at the document level and can retrieve the target document when given the source document, which makes it useful for recommendation tasks such as root-cause mining. Finally, the talk touches on other ongoing work, namely collaborative network modeling and mining group patterns in gene expression data.

About Gengxin Miao:

Gengxin Miao is a Ph.D. candidate in the Department of Electrical and Computer Engineering at the University of California, Santa Barbara, working under the supervision of Professor Louise Moser and Professor Xifeng Yan. Before joining UCSB, Gengxin received her B.E. and M.S. degrees in Automation from Tsinghua University. Her research interests lie in data mining, machine learning, and data extraction and integration, with an emphasis on modeling and mining large-scale heterogeneous information to better understand it. Gengxin's Ph.D. dissertation focuses on developing models that characterize the flow of information and methods that facilitate the consumption of information, and on exploring applications of these models and methods across multiple domains. Gengxin has published papers at conferences including VLDB, KDD, WWW, and PerCom, and holds three U.S. patents. She received the 2011-2012 IBM Ph.D. Fellowship and the UCSB Fellowship Award. Together with a fellow UCSB Ph.D. student, she won first prize in the 2007 IEEE Services Computing Contest for their work on a distributed e-healthcare system.

Hosted by: Professor Louise E. Moser