Sivodsi
New member
- Joined
- Oct 16, 2011
- Member Type
- Student or Learner
- Native Language
- English
- Home Country
- New Zealand
- Current Location
- South Korea
Hi there,
I'm working with a list of words taken from transcripts, and I want to compare them with the frequencies in the spoken part of the BNC, specifically the list at ucrel.lancs.ac.uk/bncfreq/lists/2_2_spokenvwritten.txt.
However I'm having a hard time trying to find the exact rules that were followed for defining 'word' in this corpus. For example, how did the BNC count multiword lexical items?
From scrutinizing the list, you can find multiword items like 'brand new', 'even when' and 'by now' with their own frequency counts, and yet 'new' is listed as "NoP~", with the tilde indicating (I think) that it's part of a proper noun like 'New York'. This seems inconsistent to me.
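While the official rules are being tracked down, here is one possible workaround, sketched in Python under stated assumptions. If the BNC list counts some multiword items as single entries, the transcript counts need the same treatment before the two lists can be compared fairly. The multiword inventory below is hypothetical (only the three examples mentioned above), not the BNC's own. The plain whitespace splitting also ignores punctuation and does not separate proper nouns like 'New York' from ordinary 'new'.

```python
from collections import Counter

# HYPOTHETICAL multiword inventory: just the examples from the post,
# NOT the BNC's actual list of multiword units.
MULTIWORDS = {("brand", "new"), ("even", "when"), ("by", "now")}
MAX_LEN = max(len(mw) for mw in MULTIWORDS)


def tokenize_with_multiwords(tokens, multiwords=MULTIWORDS, max_len=MAX_LEN):
    """Greedy longest match: join any run of tokens found in `multiwords` into one item."""
    out = []
    i = 0
    while i < len(tokens):
        # Try the longest possible multiword first, down to two tokens.
        for n in range(min(max_len, len(tokens) - i), 1, -1):
            candidate = tuple(t.lower() for t in tokens[i:i + n])
            if candidate in multiwords:
                out.append(" ".join(candidate))
                i += n
                break
        else:
            # No multiword starts here, so keep the single (lowercased) token.
            out.append(tokens[i].lower())
            i += 1
    return out


def count_items(text):
    """Count words and multiword items in a transcript (whitespace-split, no punctuation handling)."""
    return Counter(tokenize_with_multiwords(text.split()))
```

With this, "I bought a brand new car" yields one count for 'brand new' and none for 'brand' or 'new', which is closer to how the list appears to treat such items.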
So if anybody can find the BNC's guidelines for the spoken corpus online, I'd be very grateful.