Methods and Theories (WS15/16): A17) ROTH, ZADEGAN HASSANI; Correlating word length and frequency

Corpora: English Lexicon Project

Programs: Purpose-built query interface at http://elexicon.wustl.edu/default.asp

Tasks:

The project provides a database of lexical characteristics of words and behavioural data (reaction times in a lexical decision task and naming latencies).
Among the lexical characteristics, you can find the length of words in letters and their frequency in the so-called HAL corpus (131 million words from 3,000 Usenet newsgroups collected in February 1995).
Generate two lists:
- one juxtaposing the word length and the raw frequency of the word in the HAL corpus;
- one juxtaposing the word length and the log-transformed frequency of the word in the HAL corpus.
Transform both lists into Excel tables with the length and frequency values in separate columns. Calculate a correlation coefficient for the relationship between word length and corpus frequency (function: KORREL in German-language Excel, CORREL in English-language Excel).
Can you interpret the correlation coefficients obtained? Which version of the frequency measure leads to a higher correlation coefficient, and why?
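For checking the Excel result outside of Excel, the calculation can be sketched in Python as follows. This is only a minimal illustration: the function name `pearson` and all numbers are made up for the example and are NOT taken from the English Lexicon Project. The function computes the Pearson correlation coefficient, which is what CORREL/KORREL returns.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long number lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical toy values for illustration only (not real ELP/HAL data).
lengths = [2, 3, 4, 5, 7, 9, 11]
frequencies = [1000000, 300000, 50000, 8000, 900, 60, 5]

# Natural logarithm of each raw frequency (all values must be > 0).
log_frequencies = [math.log(f) for f in frequencies]

r_raw = pearson(lengths, frequencies)
r_log = pearson(lengths, log_frequencies)
print(f"raw frequency: r = {r_raw:.3f}")
print(f"log frequency: r = {r_log:.3f}")
```

The toy values were deliberately constructed to fall off steeply with length, so whatever the script prints for them says nothing about the real data; the interpretation has to come from your own lists.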

Notes:

At http://elexicon.wustl.edu/default.asp, you will probably have to enter your e-mail address so that the results can be mailed to you.
The results file is in .csv format. To make a German-language Excel split it into columns, open the file in a text editor, replace all commas with semicolons, then replace all decimal points with commas, and save it. When you re-open the file in Excel, the values appear in separate columns.
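The search-and-replace step can also be scripted. The following is a rough sketch, not a prescribed procedure: the function names, the column names in the usage example and the sample row are assumptions made for illustration. Unlike a blanket replacement in the editor, it only changes the decimal point in cells that parse as numbers, so words containing a full stop stay untouched.

```python
import csv
import io

def decimal_comma(cell):
    """Turn '10.5' into '10,5'; leave non-numeric cells unchanged."""
    try:
        float(cell)
    except ValueError:
        return cell
    return cell.replace(".", ",")

def to_semicolon_csv(text):
    """Convert comma-separated text to semicolon-separated text
    with decimal commas, as expected by a German-language Excel."""
    reader = csv.reader(io.StringIO(text))
    out = io.StringIO()
    writer = csv.writer(out, delimiter=";", lineterminator="\n")
    for row in reader:
        writer.writerow([decimal_comma(cell) for cell in row])
    return out.getvalue()

# Hypothetical header and row, for illustration only.
sample = "Word,Length,Log_Freq_HAL\ncat,3,10.5\n"
print(to_semicolon_csv(sample))
```

To convert a real file, read its contents into a string, pass it to `to_semicolon_csv`, and write the result to a new file rather than overwriting the download.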
Check how many lines the list has, and enter the corresponding cell ranges as the two arrays over which the correlation coefficient will be calculated.
Try to visualize the correlation with one of Excel's plotting options, e.g. the chart type called "Punkt (X Y)" (the XY scatter chart in English-language Excel) with a trend line.
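A linear trend line of this kind is an ordinary least-squares fit of a straight line through the points. As a minimal sketch of what is being drawn, assuming a linear trend line is chosen, the slope and intercept can be computed as follows; the function name `linear_trend` and the four data points are invented for the example.

```python
def linear_trend(xs, ys):
    """Least-squares line y = slope * x + intercept through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical points lying exactly on a falling line (not real data).
xs = [1, 2, 3, 4]
ys = [9, 7, 5, 3]
slope, intercept = linear_trend(xs, ys)
print(f"y = {slope:.2f} * x + {intercept:.2f}")
```

A negative slope means that the plotted frequency measure falls as word length increases; how closely the points follow the line is what the correlation coefficient summarizes.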

Last modified: Wednesday, 28 October 2015, 3:23 PM