M2: Unit 1: Chapter 4 Continuation – Corpus Based Translation Studies


Chapter 4 Continuation: Corpus Based Translation Studies


Multiconcord is a multilingual concordancer developed by David Woolls and a consortium of European universities as part of the Lingua program (Woolls 1997). To run Multiconcord, the texts must first be saved in a specific format. Once converted, they are marked up with a companion program, Minimal Mark Up (Minmark), which inserts <body> start and </body> end tags, a paragraph indicator at the beginning of each paragraph, and sentence markers. The marked-up texts then have to be edited manually. Alignment mismatches most often arise because a source sentence and its translation differ greatly in length, or because the sentences appear in a different order in the translation. A lower success rate in text alignment therefore normally has an identifiable cause, and the only way to overcome such problems is to check the paragraphs.
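To make the length problem concrete, here is a minimal sketch of length-based sentence alignment inside one pair of parallel paragraphs. It is not Multiconcord's algorithm, which this chapter does not specify; it swaps in a simplified version of the general length-based idea (familiar from Gale and Church's aligner), scoring each match by the relative difference in character length and allowing only 1:1, 1:2 and 2:1 matches. The function name `align`, the `skew_penalty` value and the English/Spanish sample sentences are hypothetical.

```python
def align(src: list[str], tgt: list[str], skew_penalty: float = 0.1):
    """Align two lists of sentences taken from parallel paragraphs.

    Returns a list of (source_indices, target_indices) pairs. Only 1:1, 1:2
    and 2:1 matches are allowed, scored by relative difference in character
    length (an illustrative simplification, not Multiconcord's own scoring).
    """
    inf = float("inf")
    n, m = len(src), len(tgt)
    # cost[i][j]: best total score for aligning src[:i] with tgt[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == inf:
                continue
            for di, dj in ((1, 1), (1, 2), (2, 1)):
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                ls = sum(len(s) for s in src[i:ni])
                lt = sum(len(t) for t in tgt[j:nj])
                # Relative length difference: 0 when the lengths are equal.
                step = abs(ls - lt) / max(ls + lt, 1)
                if (di, dj) != (1, 1):
                    step += skew_penalty  # prefer 1:1 unless lengths argue otherwise
                if cost[i][j] + step < cost[ni][nj]:
                    cost[ni][nj] = cost[i][j] + step
                    back[ni][nj] = (i, j)
    if cost[n][m] == inf:
        raise ValueError("paragraphs cannot be aligned with 1:1, 1:2 and 2:1 matches")
    # Walk back from the end to recover the chosen matches.
    pairs, i, j = [], n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        pairs.append((list(range(pi, i)), list(range(pj, j))))
        i, j = pi, pj
    return pairs[::-1]


src = ["I cannot come to the party tonight because I am working late.",
       "See you soon."]
tgt = ["No puedo ir a la fiesta esta noche.",
       "Estoy trabajando hasta tarde.",
       "Hasta pronto."]
print(align(src, tgt))  # [([0], [0, 1]), ([1], [2])]
```

Here the long source sentence is matched with the first two target sentences. A strict one-to-one pairing would instead have paired "See you soon." with "Estoy trabajando hasta tarde.", which is exactly the kind of mismatch that checking the paragraphs is meant to catch.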

Multiconcord allows the user to select a pair of languages, i.e. a source and a target language, and to enter a search pattern of words or phrases in a selected language, which may or may not be the source language. Because a search can start from the target end, the method can detect patterns created in the target text, e.g. repetitions. It is possible to search for a single-word item, e.g. wave; a multiple-word item, e.g. summer holidays; a single word with a final wild card, e.g. wav*; a single word with an initial wild card, e.g. *able; a word or phrase with a medial wild card, e.g. un*ly; a phrase, e.g. in any case; and any combination of these.

Multiconcord then lists all the source items it has found (the hits) and displays the full sentence or sentences in the source language alongside the sentences that the aligning engine considers equivalent in the target language. The aligner works within parallel paragraphs and matches up sentences until it reaches the sentence in which the search routine has recorded a hit; it then records the matching location in the target-language paragraph. Searches can also be refined by specifying a context word, which must appear within a set distance of the search word (up to 6 words to the left or right). Hits can be sorted alphabetically or filtered by assigning each one to one of four ad hoc categories, and they can be viewed in sentence or paragraph mode. Multiconcord's 'test' facility can be used to further sort results and save them to a file[1].
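The search options above can be illustrated with a small sketch. It is not Multiconcord's code: the function names (`token_regex`, `matches_at`, `find_hits`), the crude tokenizer and the sample sentences are hypothetical. It assumes that `*` stands for any run of word characters, that a multi-word query must match consecutive words, and that a context word or phrase must fall entirely within the window (at most 6 words, as in Multiconcord) to the left or right of the hit. It searches one list of sentences and leaves out the aligned target-language display.

```python
import re


def token_regex(word: str) -> re.Pattern:
    """Turn one search word into a regex; '*' matches any run of word characters."""
    pieces = [re.escape(piece) for piece in word.split("*")]
    return re.compile(r"\w*".join(pieces), re.IGNORECASE)


def tokenize(sentence: str) -> list[str]:
    # Crude tokenizer (an assumption): runs of word characters and apostrophes.
    return re.findall(r"[\w']+", sentence)


def matches_at(tokens, regexes, i):
    """True if the word patterns match tokens[i], tokens[i+1], ... in order."""
    return i + len(regexes) <= len(tokens) and all(
        rx.fullmatch(tokens[i + k]) for k, rx in enumerate(regexes))


def find_hits(sentences, query, context=None, window=6):
    """Return (sentence_index, token_index) for every hit of `query`.

    If `context` is given, a hit only counts when the context word or
    phrase occurs within `window` words to the left or right of it.
    """
    if not 0 <= window <= 6:
        raise ValueError("window must be between 0 and 6 words")
    query_rx = [token_regex(w) for w in query.split()]
    context_rx = [token_regex(w) for w in context.split()] if context else []
    if not query_rx:
        raise ValueError("empty query")
    hits = []
    for s_idx, sentence in enumerate(sentences):
        tokens = tokenize(sentence)
        for start in range(len(tokens)):
            if not matches_at(tokens, query_rx, start):
                continue
            if context_rx:
                end = start + len(query_rx)
                left = tokens[max(0, start - window):start]
                right = tokens[end:end + window]
                if not any(matches_at(span, context_rx, i)
                           for span in (left, right) for i in range(len(span))):
                    continue
            hits.append((s_idx, start))
    return hits


sentences = ["I am sorry, but I cannot come.",
             "I cannot say.",
             "The waves were unlikely to reach the table."]
print(find_hits(sentences, "I cannot", context="I am"))  # [(0, 4)]
print(find_hits(sentences, "wav*"), find_hits(sentences, "*able"))
```

Because each query word is compiled separately, the same function covers every item type listed above, from *able to in any case, as well as combinations such as a wild-card word restricted by a context word.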

For instance, in the following example I searched for 'I cannot' in the context of 'I am' to look for examples of deixis and modality:


The following window is the 'progress report', which shows how many hits the search returned. There are 36 instances of 'I cannot' in the context of 'I am':


The hits can then be viewed in context at sentence level:


The hits can also be viewed at paragraph level, which is useful because one sentence in the original may, for instance, be translated as two, causing a mismatch at sentence level:


1. In the next chapter, we will look into this particular example to analyse the feel of the texts.