corpus linguistics

Visuwords: WordNet gone visual

Visuwords takes Princeton’s WordNet (see previous post), the data it is based on, one step further and turns it into a visually attractive “online graphical dictionary” that lets you “Look up words to find their meanings and associations with other words and concepts. Produce diagrams reminiscent of a neural net. Learn how words associate.”

I’ve got to admit, while WordNet itself might be fun for serious linguists to play with, this is fun for everyone, and far more accessible. I love how the nodes keep popping out. Give it a try at http://www.visuwords.com. The word “help,” for example, produces a particularly rich network.


Analyze State of the Union Addresses

Search seven of former President Bush’s State of the Union Addresses using the impressive visual interface offered by the New York Times. It shows the speeches (shrunk down), marks where your word of interest occurs, and lets you see the context of every instance. You can also easily compare the frequency of one word to that of several other words across all seven speeches. This tool/corpus is limited, yet it’s accurate enough that it could be used for research.

screenshot from the New York Times website tool

Bonus: Pre-made set of graphs showing patterns of word frequency across 75 years of State of the Union Addresses.

Corpus of Contemporary American English

COCA, the Corpus of Contemporary American English (available at http://corpus.byu.edu/coca/), is a giant corpus of written and spoken English (more than 450 million words, and growing). It is freely available online and comes with easy-to-use search tools that let you analyze the corpus in a variety of ways, e.g. collocates, frequencies, and searches for words/phrases (including the use of wildcards or parts of speech). A great go-to corpus for corpus linguists, and fun for anyone willing to put in some time playing with it to learn how it works. If you’re confused, you can click a little question mark for an explanation of most features.

Partial screenshot taken from http://corpus2.byu.edu/coca/
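
If you want a feel for what a collocate search does under the hood, here is a minimal sketch in Python. It is not COCA’s implementation: the `collocates` function, the simple regex tokenizer, and the sample sentence are all assumptions made up for illustration. It just counts which words appear within a few words of a search term.

```python
import re
from collections import Counter


def collocates(text, node, window=4):
    """Count words occurring within `window` tokens of `node`.

    Hypothetical helper for illustration only; real corpus tools use
    proper tokenization, lemmatization, and association statistics.
    """
    # Crude tokenizer: lowercase runs of letters and straight apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            left = tokens[max(0, i - window):i]
            right = tokens[i + 1:i + 1 + window]
            counts.update(left)
            counts.update(right)
    return counts


# Made-up sample text, not taken from any corpus.
sample = "Strong tea and strong coffee. She likes strong tea, not weak tea."
top = collocates(sample, "tea", window=1)
print(top.most_common(3))
```

Raw counts like these favor very common words, which is one reason corpus tools usually offer an association measure on top of plain frequency.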

Chi-Square Tests at Vassarstats.net

Corpus linguists, need an easy way to do chi-square tests (etc.) online? See if the following calculators at vassarstats.net meet your needs:

  • 2×2 contingency table: Phi Coefficient of Association, Chi-Square Test of Association, and/or Fisher Exact Probability Test.
  • 3×3 contingency table: Freeman-Halton extension of the Fisher exact probability test (works if N does not exceed 9).
  • up to 5×5 contingency table: Chi-Square, Cramer’s V, and Lambda (the last two items here are out of my league, but, yeah).
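
For the curious, the 2×2 case is simple enough to sketch by hand. The Python below is a rough illustration, not what vassarstats.net runs: the function name `chi_square_2x2` and the word counts are invented, and it computes the plain Pearson chi-square with no continuity correction, so the numbers may differ from a calculator that applies one.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square, phi, and p-value for the table [[a, b], [c, d]].

    Hypothetical helper for illustration; no continuity correction.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    margins = row1 * row2 * col1 * col2
    if margins == 0:
        raise ValueError("every row and column total must be non-zero")
    diff = a * d - b * c
    chi2 = n * diff ** 2 / margins
    # Phi coefficient of association: signed, between -1 and 1.
    phi = diff / math.sqrt(margins)
    # With 1 degree of freedom, the upper-tail probability of the
    # chi-square distribution equals erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, phi, p


# Invented counts: a word occurs 30 times in 1,000 words of corpus A
# and 10 times in 1,000 words of corpus B.
chi2, phi, p = chi_square_2x2(30, 970, 10, 990)
print(f"chi-square = {chi2:.2f}, phi = {phi:.3f}, p = {p:.4f}")
```

The four cells are the word’s count and the count of all other words in each corpus, which is the usual way to set up a frequency comparison between two corpora.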