A) Corpus studies
Characteristics of a linguistic corpus:
- large collection of texts
- computer-readable
- aimed at representativeness
- principled sampling (e.g. ICE: http://ice-corpora.net/ice/design.htm)
Types of corpus:
- raw text
- part-of-speech tagged (e.g. BNC: http://ucrel.lancs.ac.uk/claws7tags.html)
- syntactically parsed (e.g. ICE: http://rzblx10.uni-regensburg.de/dbinfo/detail.php?bib_id=ub_ba&colors=&ocolors=&lett=fs&titel_id=1408)
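To illustrate what part-of-speech tagging adds to raw text, the following minimal Python sketch splits a tagged sentence into word/tag pairs. The example sentence, the word_TAG notation and the helper name parse_tagged are assumptions made for illustration. The tags follow the CLAWS7 tagset linked above (AT article, NN1 singular common noun, VVD past tense of lexical verb, II general preposition), but the file format of a given corpus may well differ.

```python
from collections import Counter

# Hypothetical example sentence in word_TAG notation (not taken from any corpus).
TAGGED = "The_AT cat_NN1 sat_VVD on_II the_AT mat_NN1"


def parse_tagged(line):
    """Split a word_TAG string into a list of (word, tag) pairs."""
    pairs = []
    for token in line.split():
        # rpartition splits at the last underscore, so a word that itself
        # contains an underscore is kept intact.
        word, _, tag = token.rpartition("_")
        pairs.append((word, tag))
    return pairs


pairs = parse_tagged(TAGGED)

# Count how often each tag occurs in the sentence.
tag_counts = Counter(tag for _, tag in pairs)

# Retrieve all words whose tag starts with NN (common nouns in CLAWS7).
nouns = [word for word, tag in pairs if tag.startswith("NN")]

print(pairs)
print(tag_counts.most_common())
print(nouns)
```

A query such as "all nouns" is not possible on raw text alone, which is why the annotation level of a corpus matters for the choice of research question.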
Corpus site of the Chair of English Linguistics: http://eng-ling.uni-bamberg.de
Major English language corpora:
- the 'Brown' quartet
  - available through the University Library's DBIS (Datenbank-Infosystem): search for ICAME Collection of English Language Corpora or click on http://rzblx10.uni-regensburg.de/dbinfo/detail.php?bib_id=ub_ba&colors=&ocolors=&lett=f&titel_id=6156
    - using Wordsmith Tools (Wordsmith Tools instructions, Step-by-step guide to Version 5)
  - Brown University corpus (1 million words, written AmE from 1961)
  - LOB (Lancaster-Oslo/Bergen) corpus (1 million words, written BrE from 1961)
  - Frown (Freiburg Brown) corpus (1 million words, written AmE from 1992)
  - FLOB (Freiburg LOB) corpus (1 million words, written BrE from 1991)
- International Corpus of English (1 million words per component: 60% spoken, 40% written; see http://ice-corpora.net/ice/design.htm)
  - components available to date:
    - GB
    - Canada
    - East Africa
    - Hong Kong
    - India
    - Ireland
    - Jamaica
    - New Zealand
    - Philippines
    - Singapore
    - Sri Lanka (only written sections)
    - USA (only written sections)
    - (Malta: in preparation in Bamberg)
    - (Puerto Rico: in preparation in Bamberg)
  - available through the University Library's DBIS (Datenbank-Infosystem):
    - all ICE components: http://rzblx10.uni-regensburg.de/dbinfo/detail.php?bib_id=ub_ba&colors=&ocolors=&lett=fs&titel_id=9323
      - using Wordsmith Tools (Wordsmith Tools instructions, Step-by-step guide to Version 5)
    - ICE-GB: http://rzblx10.uni-regensburg.de/dbinfo/detail.php?bib_id=ub_ba&colors=&ocolors=&lett=fs&titel_id=1408
      - using ICE-CUP (instructions available via the "Getting started" button on the initial screen)
- BNC (British National Corpus, 100 million words: 10% spoken, 90% written)
  - free online access at Brigham Young University: http://corpus.byu.edu/bnc/
  - access with BNCweb, installed on the departmental server: http://eng-ling.uni-bamberg.de
  - access with XAIRA, through the University Library's DBIS (Datenbank-Infosystem): http://rzblx10.uni-regensburg.de/dbinfo/detail.php?bib_id=ub_ba&colors=&ocolors=&lett=fs&titel_id=7178
Spoken corpora
- Diachronic Corpus of Present-Day Spoken English (DCPSE)
  - parsed corpus of spoken British English
  - 400,000 words each from ICE-GB (early 1990s) and the London-Lund Corpus (late 1960s-early 1980s)
  - available through the University Library's DBIS (Datenbank-Infosystem): search for DCPSE or click on http://rzblx10.uni-regensburg.de/dbinfo/detail.php?bib_id=ub_ba&colors=&ocolors=&lett=fs&tid=0&titel_id=1408
- Santa Barbara Corpus of Spoken American English (SBCSAE)
  - 250,000 words of spoken interaction from all over the United States
  - available upon request from Fabian Vetter
Other free (English) corpora:
- free online access at Brigham Young University: http://corpus.byu.edu/
  - COCA (Corpus of Contemporary American English, 450 million words: 20% spoken, 80% written, 1990-2012)
  - COHA (Corpus of Historical American English, 400 million words, 1810-2009)
  - Corpus of Canadian English (Strathy Corpus, 50 million words, 1920s-2010s)
  - TIME Magazine Corpus of American English (100 million words, 1923-2006)
  - Global Web-Based English (GloWbE, 1.9 billion words from 20 countries, 2012-13)
  - Participants in the course "Methods and Theories in Linguistics" can join the group account of the University of Bamberg with the password metheling. To do so, make sure that the [Organization] field in the form under "Personal Information" at the bottom of that page reads exactly Universität Bamberg, and click [Update]. Then enter the password metheling and click [Join Group]. You will be added to the group account for our university, which gives you increased access and advanced features.
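The size and composition figures quoted above can be turned into approximate word counts, which helps when judging how much spoken or written material a corpus actually offers. The following is a minimal Python sketch. The sizes and percentages are those given in the lists above, while the function names and the raw frequencies in the last two lines are hypothetical and serve only as illustration. Normalising to occurrences per million words is a common way of comparing corpora of different sizes and is shown here as an assumption about how one might proceed, not as a prescribed method.

```python
# Sizes and spoken/written shares as quoted in the lists above:
# (total words, % spoken, % written)
CORPORA = {
    "ICE component": (1_000_000, 60, 40),
    "BNC": (100_000_000, 10, 90),
    "COCA": (450_000_000, 20, 80),
}


def composition(total_words, spoken_pct, written_pct):
    """Return the approximate (spoken, written) word counts."""
    spoken = total_words * spoken_pct // 100
    written = total_words * written_pct // 100
    return spoken, written


def per_million(raw_count, corpus_size):
    """Normalise a raw frequency to occurrences per million words."""
    return raw_count * 1_000_000 / corpus_size


for name, (size, spoken_pct, written_pct) in CORPORA.items():
    spoken, written = composition(size, spoken_pct, written_pct)
    print(f"{name}: about {spoken:,} spoken and {written:,} written words")

# Hypothetical raw counts of one word, for illustration only.
print(per_million(150, 1_000_000))      # count in a 1-million-word corpus
print(per_million(9_000, 100_000_000))  # count in a 100-million-word corpus
```

In this made-up case the larger raw count turns out to be the lower relative frequency, which is why raw counts from corpora of different sizes should not be compared directly.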
Listings of English language corpora:
- http://nora.hd.uib.no/text.htm
- http://torvald.aksis.uib.no/corpora/sites.html
- http://linguistlist.org/sp/Texts.html
Using the Internet as a corpus:
- employing Google (Advanced search): https://www.google.com/advanced_search
- employing Google Books through the Brigham Young University interface: http://googlebooks.byu.edu/
- employing Webcorp: http://www.webcorp.org.uk/
- employing KWiC Finder: http://www.kwicfinder.com/KWiCFinder.html
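Tools such as KWiC Finder present search results as a concordance in KWIC (key word in context) format, with each hit centred between its left and right context. The following minimal Python sketch shows the idea on a short text. The function name kwic, the width parameter and the sample text are assumptions for illustration; this is not how any of the tools listed here is implemented.

```python
import re


def kwic(text, keyword, width=30):
    """Return one key-word-in-context line per match of keyword in text."""
    lines = []
    # Match the keyword as a whole word, ignoring case.
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    for match in pattern.finditer(text):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        # Right-align the left context and left-align the right context
        # so that the keyword appears in the same column on every line.
        lines.append(f"{left:>{width}} [{match.group()}] {right:<{width}}")
    return lines


# Hypothetical sample text (not taken from any corpus).
SAMPLE = (
    "A corpus is a large collection of texts. "
    "Each corpus in this list is computer-readable, "
    "and a well-designed corpus aims at representativeness."
)

for line in kwic(SAMPLE, "corpus"):
    print(line)
```

Aligning the hits in one column makes recurrent patterns to the left and right of the search word easy to spot, which is the main purpose of a concordance.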
Free concordancing software:
Last modified: Sunday, 19 April 2015, 22:53