Mar 25, 2013

What corpora HAVE done for us

Sinclair's seminal work -
the bible of corpus linguistics
In this post I would like to defend linguistic corpora and their relevance to the ELT field, which Hugh Dellar has called into question.

Years ago, before I became familiar with corpus tools (corpus as in linguistic corpus = "collection of samples of real-world texts stored on computer"; plural = corpora), my colleagues and I had a fierce debate about whether to use the preposition to or for after the noun hint. We wanted to produce posters for English learning centres we had set up for a number of high schools, and each poster was meant to provide "Hints for/to speaking / listening etc".
Emails were sent back and forth about what preposition should be used and the argument inevitably turned to the British/American distinction until somebody used Google Fight to compare hints to and hints for. Google Fight provided us with pseudo-scientific evidence that hints for is slightly more common than hints to – it was back in the days when I was still blissfully unaware that Google search yields different results for different people and sometimes even for the same person on different computers!

That was before I discovered the British National Corpus hosted on the Brigham Young University website. Had I discovered it earlier I would have searched for Hints + Preposition and found that hints on something is actually more common than the other two options we were vehemently debating.
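The kind of Hints + Preposition search described above can be approximated on any plain-text sample. What follows is only a minimal sketch in Python, not the BNC interface itself: the sample sentences, the short preposition list and the function name `prepositions_after` are all made up for illustration, so the counts say nothing about real usage.

```python
import re
from collections import Counter

# Hypothetical mini-corpus for illustration only; these are NOT BNC sentences.
SAMPLE = (
    "The leaflet gives hints on saving energy. "
    "She dropped hints about the surprise. "
    "Here are some hints for beginners. "
    "Useful hints on revision are listed below. "
    "He offered hints on how to prepare."
)

# A small, non-exhaustive set of prepositions (an assumption for this sketch).
PREPOSITIONS = {"on", "for", "to", "about", "at", "as", "of", "in"}


def prepositions_after(word, text):
    """Count which prepositions immediately follow `word` in `text`."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    # Walk over adjacent token pairs and keep only "word + preposition".
    for current, following in zip(tokens, tokens[1:]):
        if current == word and following in PREPOSITIONS:
            counts[following] += 1
    return counts


counts = prepositions_after("hints", SAMPLE)
for prep, n in counts.most_common():
    print(f"hints {prep}: {n}")
```

With a real corpus the same counting logic applies, but the text would need to be far larger and properly sampled before the frequencies mean anything.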

In his recent article What have corpora ever done for us, Hugh Dellar raises doubts about the usefulness of corpora to the ELT field. While not completely dismissive of corpus research and its value, Hugh basically argues that its effect on the language teaching profession has been enslaving rather than liberating. I find Hugh's polemic surprising considering the fact that corpus linguistics is what gave impetus to the Lexical approach, of which Hugh is a staunch advocate (I used the corpus here to look up a "juicy" adjective for "advocate"!).

Objective view of language
In the past 30 years corpus research has provided irrefutable evidence about how language works, not least that language is highly patterned in that it largely consists of recurrent lexico-grammatical combinations.

Starting with the Collins COBUILD project, corpora have revolutionised lexicography and changed the face of the modern dictionary. These days most respectable dictionaries – an indispensable tool for learners and teachers alike – include examples drawn from the corpus, frequency information and often register variation (whether a word is more suitable for formal or informal contexts).

Corpora have shed light on many aspects of language which were previously described based on intuition. Instead of groping in the dark and relying on anecdotal evidence, we now have access to authentic language data. For example, in the past many grammar books presented "any" as a sort of transformation of "some" used in negatives and questions.

I have some time. – I don't have any time. – Do you have any time?

Corpus research has shown that any is more common in affirmative sentences (50% of all usage of "any") and not as frequent in questions (only 10%) as prescriptive grammarians would have you believe.

Frequency of lexical or grammatical items is useful for deciding which materials should be included in a syllabus. This is not to say that these should not be balanced by another consideration: relevance to the learner.

Corpora in the classroom: a boon or bane?
However, Hugh's main argument against corpora concerns their applicability to classroom teaching and their relevance to teachers themselves. As a teacher I find corpora invaluable. Just the other day a student asked me about the difference between classic and classical. I came up with classical music and classic mistake off the top of my head but had to consult a corpus to find further examples:

classic example / case / symptoms / mistake / movie
classical music / composer / tradition

Such puzzles with confusable words can be easily solved by using the Compare function in the BNC or COCA.
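The idea behind comparing two confusable words can be sketched in a few lines of Python. This is not how the BNC or COCA Compare function is implemented; it is a toy illustration that simply counts the word immediately to the right of each target. The sample sentences are invented around the collocations listed above, and `right_collocates` is a hypothetical helper name.

```python
import re
from collections import Counter

# Invented sample sentences built around the collocations listed above.
SAMPLE = (
    "It was a classic mistake. She loves classical music. "
    "This is a classic example of the problem. "
    "He is a classical composer. That film is a classic movie. "
    "They study the classical tradition. Another classic example followed. "
    "We listened to classical music all evening."
)


def right_collocates(word, text):
    """Count the words that appear immediately after `word` in `text`."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(nxt for cur, nxt in zip(tokens, tokens[1:]) if cur == word)


classic = right_collocates("classic", SAMPLE)
classical = right_collocates("classical", SAMPLE)

# Show the collocates side by side, most frequent first.
print("classic:  ", ", ".join(w for w, _ in classic.most_common()))
print("classical:", ", ".join(w for w, _ in classical.most_common()))

# Collocates shared by both words (none in this tiny sample).
shared = set(classic) & set(classical)
print("shared:   ", sorted(shared))
```

Looking only at the next word is a deliberate simplification; real corpus tools typically search a wider span and rank collocates statistically rather than by raw counts.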

No doubt some people are walking dictionaries and can (off the cuff) rattle off examples of usage but I would look it up in a corpus. Very often I give my students an answer about how a word is used and then consult a corpus or (corpus-based) dictionary to confirm my hunch. I am often right but sometimes I overlook certain patterns. And why not get learners to look up the answers themselves? Although data-driven learning (DDL) hasn't gained much popularity, there is some evidence that getting students to study language data (concordances) by themselves is beneficial to vocabulary learning.
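For readers who have not seen a concordance, here is a rough sketch of how keyword-in-context lines of the kind used in DDL could be produced. It is written in Python under simple assumptions: the sample text is a short illustrative string, the `concordance` function and its `window` parameter are hypothetical names, and tokenisation is a crude split on letters.

```python
import re

# Short illustrative sample; a real concordance would draw on a full corpus.
SAMPLE = (
    "This leaflet is full of handy hints about safety in the home. "
    "Occasionally he dropped hints on this matter to Gina. "
    "The book offers hints on how to improve your computer skills."
)


def concordance(keyword, text, window=3):
    """Return keyword-in-context lines: `window` words either side of each hit."""
    tokens = re.findall(r"[A-Za-z']+", text)
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            # Right-align the left context so the keyword lines up in a column.
            lines.append(f"{left:>25} [{tok}] {right}")
    return lines


lines = concordance("hints", SAMPLE)
for line in lines:
    print(line)
```

Lining the keyword up in a single column is what lets learners scan down the page and notice what tends to come before and after it.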

Finally, Hugh argues that corpora make English as a foreign language unnecessarily foreign for non-native teachers by emphasizing certain dubious features of spoken grammar (e.g. "like" for reported speech) that we don't really need to teach learners. This is particularly ironic because Innovations and, to a lesser degree, Outcomes - coursebooks co-authored by Hugh Dellar - are packed with colloquialisms. Innovations Upper-Intermediate has a whole page devoted to vague language (sort of, kind of, -ish) - an important feature of spoken grammar of English. Perhaps I wouldn't teach "like" for productive use in an EFL context. But what about the determiner "this" which has a markedly different use in spoken language, as corpus studies have revealed? In contrast to written language, we often use "this" to refer to things NOT previously mentioned in spoken narratives to make them more vivid.

I saw this weird guy on the train yesterday.
And then there was this loud pop, like something exploded.

Corpora have provided us with more accurate language descriptions and informed dictionaries, grammar reference books and pedagogical materials. With various corpora and "corpus-light" tools (see here) now widely available online, corpora are no longer the preserve of linguists but a valuable resource for teachers and learners.

For another rebuttal of Hugh Dellar's argument, see Mura Nava's post here


  1. hi leo

    nice post, thanks for linky to my post, will update my post to link yrs.
    do you have a link/ref for the 'any' statement?

    am currently reading From Corpus to Classroom by O'Keeffe et al, and it is superb, very well written and very informative

    p.s. commenting on your blogger posts is as tricky as ever! change to wordpress!

  2. Hi Mura

    Re "any" it was first mentioned by Dave Willis in The Lexical Syllabus (1990) but the frequencies I mentioned are from Mindt (1997) as cited by Krieger (2003) here:

    Sorry for not providing references. I used to cite properly in my blog posts and friends laughed at me calling it a "post-masters syndrome"

    P.S. What kind of problems is Blogger causing you?

  3. Hi Leo -
    Well, firstly I'm both flattered and also slightly bemused that a post based on a talk I must've first given about ten years ago, when corpora were being particularly aggressively marketed at conferences and we were being told all teachers should have one - at a cost, of course - or else how could we possibly teach, has sparked a response post at all!

    I just wanted to add a few bits and pieces, and clarify myself on a couple of others, if I may.

    Firstly, as I hope was clear from my original, somewhat tongue-in-cheek post on corpora, I am both keenly aware of and also most grateful for the way in which corpus linguistics has impacted on dictionaries and reference sources. In fact, I think it's had such an impact that it essentially - from a teacher and learner point of view - renders the corpora redundant. Take your query about HINTS. Look in, say, the Macmillan Advanced, and it tells you this:

    3 [C] a useful suggestion or piece of advice: TIP helpful / handy hints. This leaflet is full of handy hints about safety in the home.

    ON hints on how to improve your computer skills

    Now, to me that seems pretty comprehensive and helpful (though it also suggests that HINTS can be followed by ON or ABOUT with very very little essential difference, and to that I'd add that HINTS FOR also sounds fine to me - especially, and this is the key issue for me, if it's said by a student at anything other than super post-Proficiency!! One of my main worries with all of this is that we end up replacing a tyranny of grammatical accuracy with an even more scary tyranny of collocational accuracy!).

    Anyway, compare the dictionary entry with what you get if you DO actually go to the BNC and put in HINTS.

    Here are some examples from the first ten that come up:

    In a pre-production interview with this paper Howard Brenton and his director Danny Boyle offered helpful hints as to how we might best grasp the meaning of HID (Hess Is Dead).

    Occasionally he dropped hints on this matter to Gina.

    Hints at races for priority and battles for funding, disappear politely into the background as the protagonist's version, time and again, is taken as gospel.

    Not only are these utterly inconclusive and STILL leave a good teacher - or writer or student - having to use intuition to work out what best follows hints, but they're also chock full of all manner of weirdness. The top ten examples taken as a whole contain some truly bizarre and random language that students would inevitably get bogged down in and lost in and worry about. Why put them through this when you could just let them use a good dictionary?

    And why sweat it too seriously about whether HINTS ON or FOR or ABOUT works best when all will essentially do?

    So that's one thing. This is shaping up to be an epic response, I fear! See what you've let loose.

    Exactly the same applies to things like CLASSIC and CLASSICAL. A good dictionary will always be of way more utility in ELT than raw corpora data.

    Will leave it there for now, anyway, but good to see the debate runs on.

  4. Hi Hugh
    Thank you for your comment. I am aware of the context in which your tongue-in-cheek post was written :)

    Just like you, I am a big fan of learner's dictionaries - I wish many teachers (especially native speakers) were too - so I am convinced of their utility and often superiority over corpora which provide us with unsorted, raw data. But very often I didn't find the answers I was looking for and had to consult a corpus, especially when it comes to near-synonyms. I also don't see why post-intermediate learners cannot consult corpora themselves. My students love Just-The-Word.

    Having said that, I agree with you when you say that corpus tools will be more relevant when they start sorting by chunks and phrasal verbs and Just-The-Word does it to some extent.

    The debate indeed runs on but I think we're more in agreement here than it might seem :)


  5. Hi Leo,

    An interesting post (and discussion) and there are loads of points that I could pick up on, but it's a Sunday and I'm not going to write an essay!

    Just wanted to agree that as a lexicographer who spends quite a lot of time looking at corpora, I agree that they can be very messy. They are, after all, raw research data, which isn't always that useful to the end user (teacher or student) without a bit of mediation (or quite a bit of training and practice). Even when we're deciding what collocations to show in the dictionary, there can be a tendency to pick out the nice neat ones and ignore the messy stuff, perhaps, you could argue, to the detriment of presenting an absolutely accurate picture. But then any popular reference is going to be simplified to an extent, otherwise it'd be unusable.

    As for the 'tyranny of collocations' that Hugh mentions, I agree that it sometimes feels like a bit of a potential risk, but isn't it just a case of everything in moderation? I think it's useful for learners to understand the concept of collocations, as part of their language tool bag. Just like your dilemma over 'hint', if a student's umming and erring over which word to use in a particular context (more likely written than spoken), then why not give them the option (and skills) to be able to look it up in a dictionary to help decide. That doesn't mean though that a teacher has to come down like a ton of bricks every time a student 'does a mistake'! It's all a matter of balance and judging when it's appropriate to focus on (mis)collocations.

    Julie (Moore).

