Downloads & Demos | hlab.cs.stonybrook.edu

DLATK (Command-line interface or Python Library) -- Differential Language Analysis ToolKit: an infrastructure for open-vocabulary language analysis and computational psychology.
text (R package) -- A modular, end-to-end solution for analyzing text with embeddings (Transformers, deep learning, or latent semantic analysis), including visualization and building predictive models.
Happier Fun Tokenizer (Python Library) -- A social media tokenizer in Python; an improved version of Christopher Potts's Happy Fun Tokenizing.
Language-based Mental Health Assessment (GitHub) -- Temporal measurements of well-being as described in Mangalik et al., 2024. Contains the 2020 mental health scores generated in Robust language-based mental health assessments in time and space through social media. These scores control for 2019 findings.

How many transformer dimensions are required for your task? -- A set of dimensionality reduction models for transformer-based embeddings and a web tool that tells you how many dimensions you need for your task. From Ganesan et al., ACL 2021: Empirical Evaluation of Pre-trained Transformers for Human-Level NLP: The Role of Sample Size and Dimensionality.
User Factor Adaptation -- Code to run user factor adaptation, a method that enables a model to interpret language differently based on the background of the author. From Lynn et al., EMNLP 2017: Human centered NLP with user-factor adaptation.
Temporal Orientation (Past, Present, Future) Classifier -- Code and instructions for running a classifier that labels the temporal orientation (past, present, or future) of a given social media post. From Schwartz et al., NAACL 2015: Extracting Human Temporal Orientation in Facebook Language.
Age and Gender Predictive Lexica -- Weighted lexica for predicting the age and gender of individuals; see our EMNLP 2014 paper, Developing age and gender predictive lexica over social media.
Robust Poststratification -- A technique to mitigate selection bias in social media data. From Giorgi et al., ICWSM 2022.
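
For readers unfamiliar with weighted lexica such as the age and gender lexica listed above, the following is a minimal sketch of how a lexicon of this kind is commonly applied: each word's weight is multiplied by its relative frequency in the text, and the products are summed with an intercept. The function name, the toy weights, and the intercept are hypothetical and chosen only for illustration; they are not values from the released lexica, and the released files may use a different format.

```python
from collections import Counter


def score_with_lexicon(tokens, weights, intercept=0.0):
    """Score a tokenized text with a weighted lexicon.

    Assumed scoring rule (illustrative): intercept plus the sum of
    weight * relative frequency over every lexicon word in the text.
    """
    if not tokens:
        return intercept
    counts = Counter(tokens)
    total = len(tokens)
    return intercept + sum(
        weights[word] * count / total
        for word, count in counts.items()
        if word in weights
    )


# Hypothetical toy lexicon: NOT taken from the released lexica.
toy_weights = {"homework": -2.0, "mortgage": 3.0}
toy_intercept = 25.0

# Naive whitespace tokenization, used here only to keep the sketch short.
tokens = "my mortgage is due and my homework is late".split()
score = score_with_lexicon(tokens, toy_weights, toy_intercept)
print(round(score, 3))
```

In practice, a social media tokenizer such as the Happier Fun Tokenizer listed above would replace the whitespace split used here.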