Open-vocabulary methods of language analysis are newer within social science but are common within computational linguistics and related disciplines. These methods offer a data-driven alternative to the researcher-defined categories typically used in linguistic studies. Unlike closed-vocabulary methods, open-vocabulary methods use statistical and probabilistic techniques to identify relevant language patterns or topics. An example of an open-vocabulary method is topic modeling, which uses unsupervised clustering algorithms (e.g., latent Dirichlet allocation, or LDA) to find potentially meaningful clusters of words in large samples of natural language.
In a recent example, Schwartz et al. applied LDA to a large collection of social media messages and identified 2,000 clusters of words, or topics. For instance, one topic included the words “love”, “sister”, “friend”, “world”, “beautiful”, “precious”, and “sisters”, and a second topic included “government”, “freedom”, “rights”, “country”, “thomas”, “political”, and “democracy”. These topics are generated in a data-driven, “bottom-up” way, as opposed to the theory-driven, “top-down” methods used in closed-vocabulary approaches.
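To make the "bottom-up" derivation of topics concrete, the following is a minimal sketch of LDA fitted by collapsed Gibbs sampling, written in plain Python. It is not the implementation used by Schwartz et al.; the toy messages, the function names (`fit_lda`, `top_words`), and the hyperparameter values (`alpha`, `beta`, `iterations`) are illustrative assumptions, and a real analysis would use an established library and a far larger corpus.

```python
import random
from collections import Counter


def fit_lda(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit LDA with collapsed Gibbs sampling (illustrative sketch).

    docs is a list of token lists. Returns per-topic word counts and
    per-document topic counts after the final sampling sweep.
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})

    doc_topic = [[0] * num_topics for _ in docs]          # n_dk
    topic_word = [Counter() for _ in range(num_topics)]   # n_kw
    topic_total = [0] * num_topics                        # n_k
    assignments = []

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            z = rng.randrange(num_topics)
            z_doc.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assignments.append(z_doc)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                z = assignments[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1

                # Resample: p(z = k) is proportional to
                # (n_dk + alpha) * (n_kw + beta) / (n_k + V * beta).
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]

                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1

    return topic_word, doc_topic


def top_words(topic_word, n=5):
    """Return up to n most frequent words assigned to each topic."""
    return [
        [w for w, count in counts.most_common(n) if count > 0]
        for counts in topic_word
    ]


# Invented toy messages (not data from any study), already tokenized.
docs = [
    "love my sister and my beautiful friend".split(),
    "my friend is precious love you sister".split(),
    "government should protect freedom and rights".split(),
    "political rights and freedom in our country".split(),
]

topic_word_counts, doc_topic_counts = fit_lda(docs, num_topics=2)
for k, words in enumerate(top_words(topic_word_counts, n=4)):
    print(f"topic {k}: {words}")
```

No word categories are supplied in advance: the sampler sees only which words co-occur within messages, and the word clusters emerge from those co-occurrence statistics. On a corpus this small the resulting topics are unstable and vary with the random seed.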
Open-vocabulary methods may reveal new, unexpected patterns of gender similarities and differences. However, a challenge with language topics derived through open-vocabulary methods is how to infer their psychological meaning. Consider the two topics above: the first contains generally positive, relationship-related words, while the second appears to contain words related to political discussions. The first topic has some salient social and emotional references, but the psychological meaning of the political topic is less clear. While we may have intuitions about the characteristics of the people who use each topic, the psychological meaning of a topic is not obvious. Psychological theory can therefore provide a framework for understanding and interpreting automatically derived topics.