Using the Google N-Gram Corpus to measure cultural complexity

Published in Literary and Linguistic Computing, 2013, 28(4): 668-675. DOI: 10.1093/llc/fqt017
Empirical studies of broad-ranging aspects of culture, such as ‘cultural complexity’, are often extremely difficult. Following the model of Michel et al. (Michel, J.-B., Shen, Y. K., Aiden, A. P. et al. (2011). Quantitative analysis of culture using millions of digitized books. Science, 331(6014): 176–82), and using a set of techniques originally developed to measure the complexity of language, we propose a text-based analysis of a large corpus of topic-uncontrolled text to determine how cultural complexity varies over time within a single culture. Using the Google Books American 2Gram corpus, we are able to show that (as predicted from the cumulative nature of culture) US culture has been steadily increasing in complexity, even when (for economic reasons) the amount of actual discourse as measured by publication volume decreases. We discuss several implications of this novel analysis technique, as well as its implications for discussions of the meaning of ‘culture’.
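The abstract does not spell out the complexity measure itself; as an illustrative sketch only (not the paper's actual code), the Python below computes one plausible proxy, the Shannon entropy of each year's 2-gram frequency distribution, from a file in the published Google Books Ngram v2 format (tab-separated: ngram, year, match_count, volume_count). The function name and the single-file input are assumptions for illustration.

```python
import math
from collections import defaultdict

def yearly_bigram_entropy(path):
    """Shannon entropy (bits) of the 2-gram distribution for each year.

    Illustrative proxy for 'complexity'; not the measure used in the paper.
    Assumes a Google Books Ngram v2 file with one tab-separated record per
    line: ngram <TAB> year <TAB> match_count <TAB> volume_count.
    """
    counts = defaultdict(lambda: defaultdict(int))  # year -> ngram -> count
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            ngram, year, match_count, _volumes = line.rstrip("\n").split("\t")
            counts[int(year)][ngram] += int(match_count)

    entropies = {}
    for year, dist in counts.items():
        total = sum(dist.values())
        entropies[year] = -sum(
            c / total * math.log2(c / total) for c in dist.values()
        )
    return entropies
```

One caveat relevant to the abstract's point about publication volume: naive entropy estimates grow with sample size, so any real analysis of a trend across years would have to normalize for the amount of text per year before comparing values.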

Author(s): Juola P.

Language: English
Tags: Informatics and Computer Engineering; Artificial Intelligence; Computational Linguistics