Human Language Processing Packages:
- DLATK (Command-line interface or Python Library) -- Differential Language Analysis ToolKit: infrastructure for open-vocabulary language analysis and computational psychology.
- text (R package) -- A modular, end-to-end solution for analyzing text with embeddings (from Transformers, deep learning, or latent semantic analysis), including visualization and predictive modeling.
- Happier Fun Tokenizer (Python Library) -- A social media tokenizer in Python. An improved version of Christopher Potts's Happy Fun Tokenizing.
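The main job of a social media tokenizer is to keep units such as emoticons, @mentions, #hashtags, and URLs intact rather than splitting them on punctuation. The sketch below illustrates that idea with a single alternation regex. It is a simplified, hypothetical example and does not reproduce the Happier Fun Tokenizer's actual rules or API; the pattern names and the `tokenize` function are assumptions made for illustration.

```python
import re

# Hypothetical, simplified patterns (NOT the Happier Fun Tokenizer's real rules).
EMOTICON = r"[<>]?[:;=][\-o']?[\)\(\]\[dDpP/\\|]"  # e.g. :) ;-( =D :P
URL = r"https?://\S+"
MENTION = r"@\w+"
HASHTAG = r"#\w+"
WORD = r"[A-Za-z0-9]+(?:['\-_][A-Za-z0-9]+)*"       # keeps "can't", "state-of-the-art"
OTHER = r"\S"                                      # any other single non-space char

# Order matters: earlier alternatives win at a given position.
TOKEN_RE = re.compile("|".join([URL, EMOTICON, MENTION, HASHTAG, WORD, OTHER]))


def tokenize(text: str, lowercase: bool = True) -> list[str]:
    """Split a social media post into tokens, keeping emoticons, mentions,
    hashtags, and URLs as single tokens."""
    tokens = TOKEN_RE.findall(text)
    if lowercase:
        # Leave emoticons untouched, since case can change meaning (":D" vs ":d").
        tokens = [t if re.fullmatch(EMOTICON, t) else t.lower() for t in tokens]
    return tokens


if __name__ == "__main__":
    print(tokenize("Loving #NLP with @hlab :) http://example.com"))
```

A single compiled alternation keeps the tokenizer fast and easy to extend: adding a new token type means adding one pattern in the right priority position.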
Software and Demos Associated with Papers:
- How many transformer dimensions are required for your task? -- A set of dimensionality reduction models for transformer-based embeddings, plus a web tool that tells you how many dimensions you need for your task. From Ganesan et al., ACL 2021: Empirical Evaluation of Pre-trained Transformers for Human-Level NLP: The Role of Sample Size and Dimensionality.
- User Factor Adaptation -- Code to run user factor adaptation, a method that enables a model to interpret language differently depending on the background of the author. From Lynn et al., EMNLP 2017: Human centered NLP with user-factor adaptation.
- Temporal Orientation (Past, Present, Future) Classifier -- Code and instructions for running a classifier that labels the temporal orientation (past, present, or future) of a given social media post. From Schwartz et al., NAACL 2015: Extracting Human Temporal Orientation in Facebook Language.
- Age and Gender Predictive Lexica -- Weighted lexica for predicting the age and gender of individuals; see our EMNLP 2014 paper, Developing age and gender predictive lexica over Differential Language Analysis.
- Robust Poststratification -- A technique to mitigate selection bias in social media data. From Giorgi et al., ICWSM 2022.
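Weighted lexica such as the age and gender lexica listed above are typically applied by scoring each document as a weighted sum of its relative word frequencies plus an intercept. The sketch below assumes that convention; the function name `apply_lexicon`, the words, the weights, and the intercept value are all hypothetical and are not taken from the released lexica.

```python
from collections import Counter


def apply_lexicon(tokens: list[str], weights: dict[str, float], intercept: float = 0.0) -> float:
    """Score a tokenized document with a weighted lexicon (assumed convention):

        score = intercept + sum over words w of weights[w] * count(w) / total_tokens

    Words missing from the lexicon contribute nothing.
    """
    if not tokens:
        return intercept
    counts = Counter(tokens)
    total = len(tokens)
    return intercept + sum(weights.get(w, 0.0) * c / total for w, c in counts.items())


if __name__ == "__main__":
    # Made-up lexicon entries, for illustration only.
    hypothetical_weights = {"homework": 2.0, "mortgage": -1.0}
    doc = ["homework", "is", "due", "homework"]
    print(apply_lexicon(doc, hypothetical_weights, intercept=20.0))
```

Using relative rather than raw frequencies keeps scores comparable between short and long posts, since each word's contribution is normalized by the document length.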
GitHubs