Leoxicon: Does the chunk argument trump the plagiarism allegations?

Jul 24, 2016


Photo by Marc Nozell via Flickr

One of the hottest news stories this week has been Melania Trump’s allegedly plagiarized speech. Why allegedly? Because although the address Donald Trump’s wife gave at the 2016 Republican Convention bears marked similarities to Michelle Obama’s speech at the 2008 Democratic Convention, there is not much in it that would actually constitute stealing in the linguistic sense.

In the last 30 years, corpus research (the study of language through samples of ‘real-world’ text) has shown that language is highly formulaic, i.e. consisting of recurring strings of words, otherwise known as “chunks”. What makes them chunks is the fact that they are stored in and retrieved from memory as ‘wholes’ rather than generated on a word-by-word basis at the moment of language production.

Public speeches are a prime example of formulaicity in language in that they consist of conventionalised routines; some of these are very fixed, highly probable combinations whose content the hearer can predict. A few years ago, when I was invited to give a talk at a conference, I opened my session with a slide showing the following lines:

thank you for _______ me here today...

a topic I am particularly _______ in

have enjoyed a fruitful _______

hope our relationship continues to _______

I then asked the audience to complete them. Here’s what they came up with, and I’m sure you came up with the same:

thank you for having me here today...

a topic I am particularly interested in

have enjoyed a fruitful co-operation

hope our relationship continues to grow

Chunks in different genres

The formulaic nature of language was first brought to the fore in a seminal paper by Australian linguist Andrew Pawley and his colleague Frances Syder, who pointed out that competent language users have at their disposal hundreds of thousands of ready-made phrases (Pawley and Syder 1983). Altenberg (1998) has argued that up to 80% of English text consists of recurring sequences, while more conservative estimates suggest that 50-60% of discourse is formulaic (Erman and Warren 2000).
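Percentages like these come from measuring how much of a text is taken up by recurring word sequences. As a rough illustration only, the Python sketch below uses a toy measure (not the actual procedure followed by Altenberg or by Erman and Warren): it marks every word that belongs to a three-word sequence occurring at least twice in a text and reports the share of words covered. The function names, the sequence length and the frequency threshold are all assumptions chosen for the example.

```python
import re
from collections import Counter


def tokenize(text):
    # Crude tokenizer: lowercase, keep runs of letters and apostrophes.
    return re.findall(r"[a-z']+", text.lower())


def recurrent_coverage(text, n=3, min_freq=2):
    """Share of word tokens that fall inside an n-word sequence
    occurring at least min_freq times (n and min_freq are illustrative)."""
    tokens = tokenize(text)
    grams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    counts = Counter(grams)
    covered = [False] * len(tokens)
    for i, gram in enumerate(grams):
        if counts[gram] >= min_freq:
            # Mark every token position inside this recurring sequence.
            for j in range(i, i + n):
                covered[j] = True
    return sum(covered) / len(tokens) if tokens else 0.0


sample = "Thank you for having me here today. Thank you for having me."
print(round(recurrent_coverage(sample), 2))  # 10 of 12 words covered -> 0.83
```

On real corpora the figure depends heavily on the chosen sequence length and threshold, which is one reason why published estimates vary so widely.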

The figure, of course, depends on the genre: there are fewer chunks in creative writing and fiction, and more in news reports. Similarly, in spoken language there will be fewer chunks in storytellers’ narratives but a higher prevalence (probably approaching the more liberal estimate of 80%) in the speech of auctioneers, TV sports announcers and other ‘smooth talkers’ (Kuiper 2004, cited in Schmitt 2010). This is because language users rely on chunks to produce fluent speech under time pressure. In addition, chunks perform a number of interactional and social functions and help speakers accomplish everyday transactions. Your exchange with a shop assistant, for example, is likely to be very formulaic and predictable:

Image by Julian Lim on Flickr [CC BY 2.0]

Excuse me, do you work here?

Can I help you?

I’m just looking around.

Have you got … in [size]?

I’m looking for a …

How much is ...

Where is the fitting room?

Academic discourse also relies heavily on chunks. Analyses of academic corpora show that academic writing is made up of a substantial number of recurring word combinations:

On the other hand

At the same time

In the present study

In terms of

As shown in Figure

It was found that

(from Biber, Conrad, & Cortes, 2004)
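Studies of lexical bundles like these typically count fixed sequences of words (often four) across a corpus and keep only those that pass a frequency threshold and turn up in several different texts. The sketch below shows that idea in miniature. The function name, the thresholds and the three example sentences are invented for illustration; they are not the cut-offs or data used by Biber, Conrad and Cortes.

```python
import re
from collections import Counter, defaultdict


def lexical_bundles(texts, n=4, min_freq=2, min_texts=2):
    """Return n-word sequences that occur at least min_freq times overall
    and in at least min_texts different texts, most frequent first.
    The thresholds are illustrative, not the cut-offs used by Biber et al."""
    freq = Counter()
    found_in = defaultdict(set)  # bundle -> indices of the texts containing it
    for idx, text in enumerate(texts):
        tokens = re.findall(r"[a-z']+", text.lower())
        for i in range(len(tokens) - n + 1):
            gram = " ".join(tokens[i:i + n])
            freq[gram] += 1
            found_in[gram].add(idx)
    bundles = [g for g, c in freq.items()
               if c >= min_freq and len(found_in[g]) >= min_texts]
    # Sort by descending frequency, then alphabetically for ties.
    return sorted(bundles, key=lambda g: (-freq[g], g))


# Invented example sentences, not corpus data.
texts = [
    "On the other hand, the results in the present study suggest a link.",
    "In the present study we found that, on the other hand, costs rose.",
    "On the other hand, nothing changed.",
]
print(lexical_bundles(texts))  # ['on the other hand', 'in the present study']
```

The requirement that a bundle appear in more than one text keeps a single author’s pet phrase from being counted as a feature of the genre as a whole.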

Let me just say this

Although they are usually not composed in real time, political speeches are a shining example of a genre laden with formulaic language. Not only do they contain a high number of (grossly overused) recurrent combinations, they also employ similar rhetoric and generally follow the same format. In fact, they are so similar to one another that their content can be distilled into an algorithm, and that is exactly what a group of researchers at the University of Massachusetts recently did. They ran 4,000 political speech segments through text analysis software and came up with an algorithm that can generate convincing political speeches. To do this, they built a model based on n-grams, which estimates the probability of a word appearing given the few words (items) that precede it, an approach widely used in computational linguistics and familiar to many from the Google Ngram Viewer. Put simply, they taught a robot to write speeches that resemble the formulaic, cliché-ridden speeches of politicians.
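The researchers’ actual model is not reproduced here. What follows is a deliberately tiny bigram (n = 2) sketch of the general n-gram idea: count which words follow which, turn those counts into probabilities, and generate text by repeatedly sampling a likely next word. The function names are hypothetical, and the two training lines are taken from the Obama excerpt quoted below.

```python
import random
import re
from collections import Counter, defaultdict


def train_bigrams(texts):
    """Count, for each word, which words follow it (<s> and </s> mark
    the start and end of each text)."""
    model = defaultdict(Counter)
    for text in texts:
        tokens = ["<s>"] + re.findall(r"[a-z']+", text.lower()) + ["</s>"]
        for prev, nxt in zip(tokens, tokens[1:]):
            model[prev][nxt] += 1
    return model


def next_word_probs(model, word):
    """Relative-frequency estimate of P(next word | word)."""
    counts = model.get(word, Counter())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def generate(model, max_words=20, seed=None):
    """Sample words one at a time, each weighted by how often it
    followed the previous word in the training texts."""
    rng = random.Random(seed)
    word, output = "<s>", []
    for _ in range(max_words):
        choices = model.get(word)
        if not choices:
            break
        word = rng.choices(list(choices), weights=list(choices.values()))[0]
        if word == "</s>":
            break
        output.append(word)
    return " ".join(output)


# Training data: lines from Obama's 2008 speech.
speeches = [
    "This is our moment. This is our time.",
    "This is our chance to answer that call.",
]
model = train_bigrams(speeches)
print(next_word_probs(model, "our"))  # moment, time and chance: 1/3 each
print(generate(model, seed=1))
```

With only two sentences of training data the output simply recombines familiar strings; a real system needs thousands of segments and longer n-grams before its speeches start to sound convincing.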

Obama’s much-applauded 2008 victory speech is no exception. Here is its final part:

America, we have come so far. We have seen so much. But there is so much more to do. So tonight, let us ask ourselves – if our children should live to see the next century; […] This is our chance to answer that call. This is our moment. This is our time – to put our people back to work and open doors of opportunity for our kids; to restore prosperity and promote the cause of peace; to reclaim the American Dream and reaffirm that fundamental truth – that out of many, we are one; that while we breathe, we hope, and where we are met with cynicism, and doubt, and those who tell us that we can’t, we will respond with that timeless creed that sums up the spirit of a people...

Photo by cfishy on Flickr [CC BY 2.0]

Apart from clichés such as “This is our moment. This is our time” and “while we breathe, we hope”, it contains collocations (one kind of formulaic sequence) such as “promote the cause of peace” and “reaffirm the truth”, as well as a number of predictable strings:

America, we have come so _____. We have seen so _____. But there is so much more to ____.

Now let’s look at an excerpt from Melania’s speech that came under criticism:

From a young age, my parents impressed on me the values that you work hard for what you want in life, that your word is your bond and you do what you say and keep your promise, that you treat people with respect. They taught and showed me values and morals in their daily lives. That is a lesson that I continue to pass along to our son. And we need to pass those lessons on to the many generations to follow.

A quick corpus search will tell you that “impress on (upon)” is commonly used with PARENTS and FATHER, and that what is usually impressed on someone is THE IMPORTANCE / NEED / VALUE of something. The Longman Dictionary actually gives the following example:

Father impressed on me the value of hard work.

So if I were teaching “impress on smb”, I’d probably give this as an example of how the verb is used. I’m also sure you’ll agree there is nothing illicit about “work hard” or “keep a promise”. On the contrary, I’m certain you would correct your students if they said *worked hardly, or if they used *held instead of keep in “keep a promise”.

As for “treat people with respect”, which was supposedly copied from Michelle Obama’s “treat people with dignity and respect”, dignity and respect are two of the most likely collocates here. You can check this with Netspeak, a tool that helps you find a missing word in a sequence, by asking it how “treat people with” is most often completed.
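Gap-filling tools of this kind rank candidate words for a slot by how often each one occurs in a very large body of text. The sketch below mimics that idea on a toy scale: it scans a handful of sentences for “treat people with _” and counts what fills the gap. The function name and the four sample sentences are invented for illustration; this is neither Netspeak’s interface nor its data.

```python
import re
from collections import Counter


def fill_gap(pattern, texts):
    """Rank the single words that fill the '_' slot in pattern,
    e.g. 'treat people with _', by how often they occur in texts.
    Assumes the pattern contains exactly one '_'."""
    before, after = pattern.split("_")
    left = re.findall(r"[a-z']+", before.lower())
    right = re.findall(r"[a-z']+", after.lower())
    size = len(left) + 1 + len(right)
    fillers = Counter()
    for text in texts:
        tokens = re.findall(r"[a-z']+", text.lower())
        for i in range(len(tokens) - size + 1):
            window = tokens[i:i + size]
            if window[:len(left)] == left and window[len(left) + 1:] == right:
                fillers[window[len(left)]] += 1
    return fillers.most_common()


# Invented sample sentences standing in for a large web corpus.
texts = [
    "We should treat people with respect.",
    "Always treat people with dignity and respect.",
    "They treat people with respect and kindness.",
    "Learn to treat people with kindness.",
]
print(fill_gap("treat people with _", texts))
# [('respect', 2), ('dignity', 1), ('kindness', 1)]
```

The same function works for the blanks on the conference slide, e.g. `fill_gap("thank you for _ me here today", texts)`, given a suitable collection of texts.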

So, to return to the question in the title of this post: was Melania Trump’s speech lifted from Michelle Obama’s, or have the accusations of plagiarism been largely trumped up? If Melania’s faux pas does constitute plagiarism, then the text of her speech was no more plagiarized than an academic paper containing “Recent research has shown that” or “The results are consistent with data obtained in...”.

References

Altenberg, B. (1998). On the phraseology of spoken English: the evidence of recurrent word-combinations. In A. P. Cowie (Ed.), Phraseology: theory, analysis and application (pp. 101–122). Oxford: Oxford University Press.

Biber, D., Johansson, S., Leech, G., Conrad, S., & Finegan, E. (1999). Longman grammar of spoken and written English. Harlow: Pearson.

Erman, B., & Warren, B. (2000). The idiom principle and the open choice principle. Text, 20(1), 29–62.

Pawley, A., & Syder, F. H. (1983). Two puzzles for linguistic theory: Nativelike selection and nativelike fluency. In J. C. Richards & R. W. Schmidt (Eds.), Language and communication (pp. 191–225). London; New York: Longman. Available online at http://www.uni-mainz.de/FB/Philologie-II/fb1414/lampert/download/so2008/PawleySyder.pdf

Schmitt, N. (2010). Researching vocabulary. Basingstoke, England: Palgrave Macmillan.

17 comments:

Amy Tate, July 24, 2016 at 8:44 PM
I actually had similar thoughts when this story first broke. I thought perhaps speech language is so formulaic and trite that the similarities were the result of this. (I also thought why on earth would a Trump speechwriter plagiarize Michelle Obama of all people?!) However, the speechwriter herself has said she wrote down parts of Obama's speech in the planning process and then forgot to double check later. To me it's a perfect teaching moment, because all Melania Trump needed to add was, "As Michelle Obama once said ...."

So while much of the phrasing can be traced back to the nature of language and chunking, the credit still goes to Michelle Obama's speechwriter for doing it in this particular order first.

Thanks as always for your defense of the chunk and its importance in language teaching.

http://www.foxnews.com/politics/2016/07/20/melania-trump-speechwriter-admits-mistake-in-lifting-michelle-obama-lines.html
Anonymous, July 25, 2016 at 12:22 PM
1. You say “corpus research .....has shown that language is highly formulaic, i.e. consisting of recurring strings of words, otherwise known as “chunks”. What makes them chunks is the fact that they are stored in and retrieved from memory as ‘wholes’ rather than generated on a word-by-word basis at the moment of language production”.

a) From the point of view of corpus research, what makes recurring strings of words chunks is their form, not how they’re memorised.

b) It is NOT a fact that chunks are stored in and retrieved from memory as “wholes”. Some hypotheses have suggested that certain types of carefully-defined sequences of words are memorised in carefully-defined ways. By definition, none of these hypotheses is true.

2. The article you refer to by Pawley and Syder argues that control of a language entails knowledge of more than a generative grammar; it also requires knowledge of 'memorized sentences' and 'lexicalized sentence stems'. The terms refer to “two distinct but interrelated classes of units”, and the argument is that a store of these two unit types is among the additional ingredients required for native control. Unlike your over-simplistic treatment of “chunks”, Pawley and Syder go to considerable lengths to carefully describe “lexicalized sentence stems”, pointing out, among other things, that they often contain elements capable of various types of transformation. They admit that they haven’t even successfully defined a lexicalized sentence stem; that is, they haven’t fully distinguished it from non-lexicalized sequences. As they say, “the question is what is 'lexicalization'?. ..... An expression may be more or less a standard designation for a concept, more or less clearly analysable into morphemes, more or less fixed in form, more or less capable of being transformed without change of meaning or status as a standard usage, and the concept denoted by the expression may be familiar and culturally recognized to varying degrees. Nor is there a sharp boundary between the units termed here 'sentence stems' and other phraseological units of a lower order.” Your post here ignores all the subtleties of Pawley and Syder’s arguments and uses the term “chunks” without defining what it refers to.

3. Your suggestion that the text of M. Trump’s speech was no more plagiarized than an academic paper containing "Recent research has shown that" or "The results are consistent with data obtained in..." is ridiculous.

M. Obama: “Because we want our children and all children in this nation to know that the only limit to the height of your achievements is the reach of your dreams and your willingness to work for them."

M. Trump: “Because we want our children in this nation to know that the only limit to your achievements is the strength of your dreams and your willingness to work for them.”

Geoff Jordan
Tyson Seburn, July 26, 2016 at 7:20 PM
Nice try. I get what you're saying, but I don't know... it seems pretty weak from an academic perspective to suggest it's not plagiarised based on commonality of phrases and expected blank-fills. Yes, there's the argument that all political speeches are similar to each other, but this one is closer.

It's sentiment and meaning. It's an opportunity not taken to completely rephrase an entire 'chunk' of her speech, when other parts were rephrased. It's synonyms substituted more than it is just common collocations: a strategy very commonly employed by EAP students and one that never passes when compared with the original. I would like to see any student's paper do this and not be flagged. I suppose one could argue that even a very well-done paraphrase should be cited here. Maybe there's something to be defended about context... she wasn't reciting an academic article, I guess.

Perhaps the better strategy to employ is not to put yourself within reach of such obvious copying when under scrutiny by millions.
Scott, July 27, 2016 at 12:40 PM
Nice try, Leo, and it might work if we simply compared the two speeches for the individual collocations that they each embed. As you note, sequences like 'parents impress on' and 'treat people with respect' are commonly occurring sequences. The problem is that if these commonly occurring sequences occur TOGETHER in the same text, the chance of this happening randomly is vanishingly small. As one writer points out, assuming a working vocabulary of 20,000 words, and 'assuming that the writer is free to choose any of the 20,000 words and use these words in any order, a series of 5 words that exactly match another source would have the random chance probability all (1/20,000) x (1/20,000) x (1/20,000) x (1/20,000) x (1/20,000) or 1 chance in 3,200,000,000,000,000,000,000 (one chance in 3.26 sexillion)'. Of course, the writer is ignoring the constraints of syntax, and words do not follow one another in a random order. Nevertheless, this does suggest that the more words in a sequence that are the same, or nearly the same, the more steeply (exponentially) the probability rises that they were copied. In short, 'when multiple sentences have matching word orders…, the assumption can be that plagiarism has occurred (or at the least, that sources were not properly cited).'

https://sctcc.ims.mnscu.edu/shared/FacultyTutorials/MathematicsOfPlagiarism.pdf

Oh, and another thing. Pawley and Syder were not Australians, but New Zealanders, and not exactly colleagues but son and mother (in that order)!
lauraadelesoracco, July 27, 2016 at 7:30 PM
I'm no expert, Leo, but claiming that MT's speech was not plagiarized because she used common formulaic language misses the point. MT could have used language that appears frequently in political discourse without explicitly taking 3-5 claims expressed by MO in her speech. MT even followed the same order! Your example of academic language is weak, in my opinion, because what you're talking about are, in a way, introductory phrases, not full-on claims or concepts. I have to agree with both Scott T and Tyson S: the odds of MT's speech occurring without having heard MO's speech (and in that same order) are extremely low. I also doubt any English instructor would let this pass if aware of the original text.

Leo, July 28, 2016 at 12:18 PM
Hi Laura,

I knew that I wouldn't convince many with my argument :) hence the title of this post is a question rather than a declarative statement. You're right, she does use common formulaic chunks but, as Tyson points out, she copied a 'big chunk' (portion) of MO's speech.

Regarding academic language, in the paper I cite many chunks (or, in the researchers' own terms, 'lexical bundles') are indeed linkers, connectors and introductory phrases ("in terms of", "as a result" etc.). But if you take, for example, the Academic Phrasebank by the University of Manchester (http://www.phrasebank.manchester.ac.uk/being-critical/), the chunks suggested there go beyond mere linkers and sentence starters to fully-fledged sentences for introducing hypotheses, offering suggestions and identifying weaknesses. It does come with a warning that these should be integrated creatively, but it acknowledges that using them wouldn't constitute plagiarism, as they are generic in nature. So it's a very thin line, really...

Thank you for stopping by!

Welcome

Hello and welcome to my blog!
This originally started as an ELT resources blog focusing on lexical activities but with time I also started posting thoughts and reflections (or rather rants) on language teaching in general. I still post activities which can be found under the tab called For the classroom - feel free to use them with your students. Whether you use one of my activities or read one of my more reflective posts, please do drop a comment. Good to have you here!

LEO

P.S. All the views expressed here are solely my own.
