M2: Unit 1: Chapter 7 – Limitations and Potential of Corpus Processing Tools

The value of corpora in translation research cannot rest on their ability to uncover ‘universals’ of translation, nor is their purpose to claim objectivity, since behind the design of any experiment or research programme lie intuition, values and human judgement.

Neither Corpus Translation Studies nor the techniques it offers are the key to an objective treatment of the object of enquiry: corpora are products of human beings and thus inevitably reflect their compilers’ views, presuppositions and limitations.

Moreover, linguistic features have no single interpretation. Passives, for example, have the thematic function of moving information to different positions in the clause. They also allow the agent to be omitted, which may happen for various reasons: because the information is obvious, unknown or irrelevant, or in order (consciously or not) to be vague about it or to hide it (Stubbs and Gerbig 1993: 77). A corpus tool can count passives, but it cannot tell the researcher which of these motivations lies behind any given instance.

It is also crucial to emphasise that researchers query the corpus with their tools, obtain results and then interpret these findings. Consequently, as with any other methodology, there is a gap between the data and their interpretation, since that interpretation rests on the researcher’s own judgement.

Corpora can reveal quantifiable textual and extratextual regularities, but quantification is not an end in itself. Regularities have to be interpreted, and interpreting them as evidence of the operation of norms, for example, is by no means straightforward. Corpora make some things visible but not others: the tools can show that explicitation occurs, for instance, but they cannot explain how or why this phenomenon happens in translation.

Turning to the software, it is important to note that the WordSmith Tools concordancer cannot find what is not there: it can find definite and indefinite articles in a text, for instance, but not the places where no article is used.
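The limitation can be illustrated with a minimal Python sketch. It is not how WordSmith Tools is implemented; it merely assumes, for illustration, that a concordance search works by matching the word-forms it is given. The sample text is invented. The two article tokens are retrieved, but the zero-article noun phrase ‘Dogs’ is never reported, because there is no string to match.

```python
import re

# Invented sample text for illustration only.
text = "Dogs bark at night. The dog next door barked at a cat."

# Assumption: the search simply matches the word-forms listed here.
ARTICLES = {"the", "a", "an"}

tokens = re.findall(r"[A-Za-z]+", text)
# Keep the position and form of every token that matches an article.
hits = [(i, tok) for i, tok in enumerate(tokens) if tok.lower() in ARTICLES]

print(hits)  # [(4, 'The'), (10, 'a')]; 'Dogs' (zero article) never appears
```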

In addition, the computer is good at matching orthographic characters but not at interpreting their meaning, so problems inevitably arise in the following areas:
(1) word-forms derived from the same root, e.g. knock, knocking and knocked, which are listed separately;
(2) multi-word units such as in the mean time, which appear as four separate word-forms; and
(3) polysemous words, since the computer cannot distinguish between the verb wave and the noun wave.
The concordancer returns what it is asked to find, which may not be what the researcher is actually looking for.
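The three problem areas can be reproduced with a short Python sketch that, like a plain concordancer, counts orthographic strings only. This is an illustrative assumption rather than a description of any particular tool; the sample sentences are invented and find_sequence is a hypothetical helper.

```python
import re
from collections import Counter

# Invented sample text; the tokeniser matches letter strings only,
# with no lemmatisation, phrase detection or part-of-speech tagging.
text = ("He knocked twice, then kept knocking. In the mean time, another "
        "knock came. The children wave as a wave hits the pier.")
tokens = [t.lower() for t in re.findall(r"[A-Za-z]+", text)]
freq = Counter(tokens)

# (1) Forms sharing the root KNOCK are counted as three separate items.
knock_forms = {w: n for w, n in freq.items() if w.startswith("knock")}
print(knock_forms)  # {'knocked': 1, 'knocking': 1, 'knock': 1}


def find_sequence(tokens, phrase):
    """Hypothetical helper: start positions of an exact word sequence."""
    n = len(phrase)
    return [i for i in range(len(tokens) - n + 1) if tokens[i:i + n] == phrase]


# (2) The multi-word unit surfaces only if all four forms are searched in order.
print(find_sequence(tokens, ["in", "the", "mean", "time"]))  # [6]

# (3) Verb and noun WAVE collapse into a single count.
print(freq["wave"])  # 2
```

In each case the tool does exactly what it is asked; grouping the forms, spotting the phrase or separating the two senses of wave remains the researcher’s job.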

Frequency lists and word statistics, by their very nature, tend to focus attention on single, decontextualised lexical items. Nevertheless, they give an overall idea of the text and supply ‘a set of hints or clues to the nature of the text (…) one can get an idea of what further information would be worth acquiring (…) and so focus on investigation’ (Sinclair 1986: 188). The basic word statistics, i.e. the type-token ratio (the number of distinct word-forms divided by the total number of running words) and frequency lists, indicate the general texture of the texts under examination and are especially valuable for spotting potentially fruitful areas of investigation. In addition, a KWIC (key word in context) concordance can be used to call up all the instances of a chosen item quickly and check them against the corresponding ST or TT term:

 ‘In this way, the computer serves as an aid to, and not a substitute for, human analysis’ and ‘this is a potentially powerful tool to help analyze translation shifts’ (Munday 1998: 6-7).
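The basic statistics and the KWIC display described above can be sketched in a few lines of Python. This is a simplified illustration, not WordSmith Tools’ own algorithm: the tokeniser, the function names (tokenize, type_token_ratio, frequency_list, kwic) and the sample sentence are assumptions made for the example, and the step of checking each line against the corresponding ST or TT segment is not shown.

```python
import re
from collections import Counter


def tokenize(text):
    """Lower-cased alphabetic word-forms; a deliberately crude tokeniser."""
    return [t.lower() for t in re.findall(r"[A-Za-z]+", text)]


def type_token_ratio(tokens):
    """Distinct word-forms (types) divided by running words (tokens)."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def frequency_list(tokens, n=5):
    """The n most frequent word-forms with their counts."""
    return Counter(tokens).most_common(n)


def kwic(tokens, node, width=3):
    """Key word in context: every hit of `node` with `width` words either side."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == node:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left:>20} [{tok}] {right}")
    return lines


tokens = tokenize("The sea was calm. The boat left the harbour and the sea rose.")
print(round(type_token_ratio(tokens), 3))  # 0.692 (9 types / 13 tokens)
print(frequency_list(tokens, n=2))         # [('the', 4), ('sea', 2)]
for line in kwic(tokens, "sea", width=2):
    print(line)
```

A raw type-token ratio tends to fall as a text grows longer, so ratios are only comparable across texts or samples of similar length.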

Multiconcord has no specific function for searching the corpus for emphatic repetitions: the software retrieves every repetition, whatever its type. You therefore have to go through all the instances it returns and select those relevant to the purpose of your study.
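The problem can be illustrated with a hypothetical Python function, find_repetitions, that reports every word-form recurring within a short window of tokens. It does not reproduce Multiconcord; it simply assumes a search that, like the one described above, has no notion of emphasis. On the invented sample, the emphatic ‘never, never, never’ is returned alongside the unremarkable repetition of ‘the’, which the analyst would have to discard by hand.

```python
import re


def find_repetitions(text, window=3):
    """Hypothetical helper: (word, first_index, second_index) for every
    word-form that recurs within `window` tokens. Emphasis is not judged."""
    tokens = [t.lower() for t in re.findall(r"[A-Za-z]+", text)]
    hits = []
    for i, tok in enumerate(tokens):
        # Look ahead at most `window` tokens for the same word-form.
        for j in range(i + 1, min(i + 1 + window, len(tokens))):
            if tokens[j] == tok:
                hits.append((tok, i, j))
    return hits


sample = "Never, never, never give up. He sat in the chair by the door."
for hit in find_repetitions(sample):
    print(hit)
```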

Finally, the computer gives access to amounts of data large enough to make macroanalysis feasible, allowing the researcher to move from the microstylistic analysis of individual passages to a macrostylistic interpretation of the whole text. However, as Opas and Rommel point out, even though the computer can make ‘life easier for the literary critic’, it cannot ‘generate meaning and it will always remain a tool’ (1995: 262).