Can Google Concordance Language?
In language discussions, search engine results are often quoted as evidence that a form is in use, or to compare forms and see which is more common. GoogleBlogoscoped has run 27,000 words from a dictionary through Google to measure their popularity; the full results of the study can be downloaded here. The table below shows the top thirty words from the 2006 and 2003 surveys, together with the top thirty words from the British National Corpus (BNC).
The method used in the Google study does not count multiple occurrences in a single page, so a single copyright message at the foot of a page counts for the same as all the occurrences of "the" on that page. This accounts for the presence of copyright, contact, site, home, etc. However, the other entries suggest that the contents of Google's database, and by extension those of any other reputable search engine, are likely to give a fairly accurate reflection of usage for terms that are not directly related to the layout of a webpage. As a rough-and-ready check, it seems that search engines can be used as basic concordancing tools.
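To make the counting issue concrete, here is a minimal Python sketch of the difference between counting pages that contain a word and counting every occurrence of it. The two sample pages, the tokenizer, and the function names are invented for illustration; this is not the code used in the Google study.

```python
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def page_counts(pages):
    """Count each word at most once per page (the per-page method)."""
    counts = Counter()
    for page in pages:
        counts.update(set(tokenize(page)))
    return counts


def occurrence_counts(pages):
    """Count every occurrence of each word (the usual corpus method)."""
    counts = Counter()
    for page in pages:
        counts.update(tokenize(page))
    return counts


# Two made-up pages, each ending with a footer that says "copyright".
pages = [
    "The cat sat on the mat near the door. Copyright Example",
    "The dog chased the ball. Copyright Example",
]

by_page = page_counts(pages)
by_occurrence = occurrence_counts(pages)

print(by_page["the"], by_page["copyright"])              # 2 2
print(by_occurrence["the"], by_occurrence["copyright"])  # 5 2
```

Counted per page, "the" and "copyright" tie at two pages each, even though "the" occurs five times in total. That is the effect that pushes footer and navigation words up the Google lists.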
Google 2006 | Google 2003 | BNC
a | the | the
the | of | at
to | and | of
in | to | and
of | a | a
and | in | in
for | for | to
by | on | it
home | home | is
all | is | was
this | by | I
is | all | for
about | this | you
site | with | he
with | about | be
at | or | with
more | at | on
your | from | that
us | are | by
you | us | are
contact | site | not
web | information | this
are | you | but
from | contact | 's
information | an | they
it | more | his
copyright | new | from
an | search | had
privacy | that | she
that | your | which
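As a rough check on how closely the web list tracks the corpus, the Google 2006 and BNC columns can be compared directly. The sketch below is only an illustration: the word lists are copied from the table, the variable names are invented, and a simple membership test stands in for any proper rank comparison.

```python
# Top-thirty lists copied from the Google 2006 and BNC columns of the table.
google_2006 = (
    "a the to in of and for by home all this is about site with at more "
    "your us you contact web are from information it copyright an privacy that"
).split()
bnc = (
    "the at of and a in to it is was I for you he be with on that by are "
    "not this but 's they his from had she which"
).split()

bnc_words = set(bnc)

# Words in both lists, and words found only in the Google list,
# each kept in Google 2006 rank order.
shared = [word for word in google_2006 if word in bnc_words]
google_only = [word for word in google_2006 if word not in bnc_words]

print(len(shared), "shared:", " ".join(shared))
print(len(google_only), "Google only:", " ".join(google_only))
```

On these lists, 17 of the 30 Google 2006 words also appear in the BNC top thirty. Most of the remaining 13 (home, site, contact, web, copyright, privacy and so on) are page-furniture words rather than ordinary running text.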
Not counting multiple occurrences does weaken the case for its accuracy.
Are there any new lists in CSV format that have been added or created lately?