H. Andrew Schwartz H. Andrew Schwartz is director of the Human Language Analysis Beings (HLAB) housed in the Computer Science Department at Stony Brook University (SUNY). His interdisciplinary research focuses on human-centered natural language processing for the health and social sciences. Andrew is also a PI and co-founder of the World Well-Being Project, a multi-disciplinary consortium between the University of Pennsylvania, Stony Brook University, and Stanford University focused on developing large-scale language analyses that reveal and predict differences in health, personality, and well-being. Andrew is an active member of the fields of AI-natural language processing, psychology, and health informatics. He is the creator and one of the maintainers of the Differential Language Analysis ToolKit (DLATK), used in over 100 studies and by a variety of tech companies. He received his Ph.D. in Computer Science from the University of Central Florida in 2011 with research on acquiring common sense knowledge from the Web. He is a 2020 recipient of a DARPA Young Faculty Award. His research often attracts public interest, with articles in, e.g., The New York Times, USA Today, and The Washington Post.
PhD Students
Matthew Matero Natural language processing with a particular focus on areas involving time-series analysis, social media, mental well-being, and computational social science. Additionally, I have a broader interest in applied machine learning as a whole for tasks related to vision and computer networking.
Huy Vu The intersection of computer science and psychology: I love applying data science and natural language processing methods to analyze human thoughts, characteristics, and behaviors. Isn’t it cool to transform a person’s thought, something very abstract, into a concrete numeric vector space, manipulate and analyze these vectors, and finally map them back to the space of human thoughts to predict the person’s next behavior or mental health state? My current project involves analyzing social media posts to understand the general public’s major beliefs on specific pre-defined topics.
Nikita Soni
Linh Pham My research interests include natural language processing, data science, and their applications in discovering behavioral and psychological factors, time-series analysis, and human-centered NLP.
Sid Mangalik Sid Mangalik is a Ph.D. student in natural language processing at Stony Brook University working with Andy Schwartz. His previous studies focused on the intersection of artificial intelligence and psychology, where he worked with Ritwik Banerjee on language use in scientific writing and among abusers. His current research focuses on identifying connections between community-level language and psychological outcomes. He is the lead data scientist at Monitaur, a start-up focused on making machine learning models fairer, safer, and more transparent. In his free time, Sid writes music; he released his first album in 2020.
Salvatore Giorgi Salvatore Giorgi is a data scientist working under Dr. Brenda Curtis at the National Institute on Drug Abuse (NIDA). He is also a second-year Ph.D. student in Computer Science at the University of Pennsylvania working under Dr. H. Andrew Schwartz and Dr. Lyle Ungar. His research interests include machine learning applications to substance use and recovery, as well as relationships between individuals and their communities as expressed through language on social media.
Vasudha Varadarajan I am a second-year Ph.D. student working in the areas of natural language processing and the relationships between cognition and language, especially in the context of social media. I am particularly interested in understanding the cognitive style of users through their language usage. I am also interested in interpretability and explainability of language models, multilingual models for NLP and generalizable language modeling.
Oscar Kjell Developing ways to measure, describe, and differentiate psychological constructs using natural language processing and machine learning. He is particularly interested in measuring psychological well-being, including harmony in life and satisfaction with life. Recently, Oscar and HLAB members have developed an R package called Text for analyzing text using NLP and deep learning: https://r-text.org/
MS Students
Adithya V. Ganesan Adi is a computer science master’s student working with Prof. Schwartz on NLP problems with a human focus. His recent and ongoing projects inform the computational social science community of effective ways to use state-of-the-art NLP tools for human-focused NLP problems. His open-source contributions have helped this community conduct faster research on data at scale without being limited by inexperience in programming. During his bachelor’s, his time-series research focused on non-stationary time series in volatile systems. He also served as the first data scientist at Motorq (a connected car data platform), where he identified and solved some of the primal problems for the company. Starting in Fall 2021, he will be a Ph.D. student at Stony Brook advised by Dr. Schwartz. On school days, you can find him in his natural habitat for most of the week: Room 242 (Data Science and Analysis lab), New CS Building.
Farhan Ahmed
Anthony Xiang
Jihu Mun
Research Staff
Weixi Wang
PhD and Postdoc Alumni
MZ Zamani (PhD, 2020) Research Scientist at eBay. Computational social science and natural language processing; social media language analyses for psychological health and well-being; integrating language and extra-linguistic data; social networks and graph mining.
Youngseo Son (PhD, 2020) Natural language processing (NLP) for social media analysis. I especially focus on discourse relation parsing to extract key information for targeted tasks, such as opinions and reasons for a political stance or sentiment, and on finding correlations between discourse styles and human variables such as personality. I collaborate with psychologists and computational linguists on human-centered language modeling to obtain higher accuracy on various NLP tasks, from traditional tasks (e.g., sentiment analysis) to novel tasks such as discourse style analysis for psychological assessment and well-being measurement.
Veronica Lynn (PhD, 2019) Now at Facebook Research, Seattle. My interest is primarily in natural language processing, with some overlap into data mining and artificial intelligence. In addition to computer science, I am interested in the humanities and social sciences, particularly psychology, linguistics, and classics, and am drawn to projects that are interdisciplinary in nature. My long-term goal is to pursue a career doing NLP research, hopefully both in industry and academia.
Vivek Kulkarni (PhD, 2017) Now a Postdoc at Stanford University. My research interests are at the intersection of natural language processing and computational social science. In particular, I am focused on making NLP models human-centric, socially aware, and cognizant of linguistic variation. I am also interested in applying NLP methods to uncover social biases through the lens of natural language.
Nicolas M. Legewie (Postdoc) Nicolas M. Legewie received his MA and PhD in social science from Humboldt University of Berlin. Currently, he is a postdoctoral visiting fellow at the Sociology Department, University of Pennsylvania. His research focuses on the role of social environments, such as personal and neighborhood networks, on educational and occupational attainment, and on upward mobility. He also writes and teaches about migration, the life course, research ethics, and research methodology such as video data analysis, mixed methods, and digital social science research. In a current project with H. Andrew Schwartz (Stony Brook) and Salvatore Giorgi (UPenn), he uses quantitative text analysis of large-scale geo-coded Twitter data, in combination with county-level census data, to study the impact of heterogeneity in cultural models of education and occupation in counties on individuals’ college enrollment and completion.
MS Alumni
Sumit Agarwal
Kanishta Agarwal Research Project: Mood Forecasting: Using ARIMA to Predict Future Affect
Pooja Aravinder
Nipun Bayas Research Project: Optimism in Social Media
Austin Borger
Mallikarjuna Budida Research Project: BERT Feature Extraction on TPU
Swatilekha Chaudhury Research Project: NCDS Longitudinal Essays
Pooja Dalaya Research Project: Age and Income Weights for Sample Bias Correction
Pulkit Dongle Research Project: DLATK: DeMySQLfying
Neelaabh Gupta Research Project: Predicting Physical Activity using DLA
Keshav Gupta Research Project: Sample Bias Correction
Deepak Gupta Research Project: LexHub Development
Akash Idnani Research Project: Latent Traits Exploration Extended Multi-Domain Single Factor
Emil Joswin Research Project: Permutation Language Modeling and Data Collator for XLNet
Kiranmayi Kasarapu (MS, 2018; Now at Amazon) Research Project: Social Mobility Prediction
Adarsh Kashyap Research Project: Health Care Utilization
Parth Limbachiya Research Project: Implementation of CoxPh Model
Rowan Menezes Research Project: Brands Across Years
Sourav Mishra
Anvesh Myla Research Project: Quantitative Evaluation of Interpretability of Latent Factors
Mihir Parulekar Research Project: Privacy Analysis of BERT Embeddings
Sania Parveen Research Project: BERT Feature Extraction
Adarsh Prabhakara
Aman Raj (MS, 2017; Now at Google) Research Project: Large-scale Social Media Assessment
Aravind Reddy Research Project: Adolescent Depression - Longitudinal Representations
Damayanti Sengupta Research Project: Social Media and Mental Health
Deven Shah (MS, 2018; Now at Yahoo Research) Research Project: Human Bias in Predictive Models
Swetambari Verma (MS, 2018; Now at Facebook) Research Project: Assessing Income and Education from Social Media Language