Wednesday 13 April 2022

Can a machine translate a novel? Nicky Harman wonders.

Rather to my surprise, I found myself at a discussion of this very question at the Literary Translation Centre at last week's London Book Fair 2022.


This is not my first brush with computer-aided translation (CAT) tools. Back in the day (2000–2010, so quite a few days back!) I used to teach a CAT tools module on the Translation and Technology (Scientific, Technical and Medical) MSc at Imperial College London.

First, let’s define some terms, because CAT tools do many different things. Translation Memory (TM) apps create a database of segments (sentences or phrases) from the work of previous human translators and offer them up when the human translator comes across identical or similar phrases in a subsequent translation. TM apps are regularly used by companies that produce instruction manuals, and by their translators. Imagine, for example, someone translating an instruction manual for a washing machine, where most of the text is the same across different models and only the specifications differ. Note the human agency.
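For readers who have never seen one, the core of a TM lookup can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how any particular CAT tool works: the two stored segments, their French translations, the function name `tm_lookup` and the 0.75 threshold are all invented for the example, and Python's standard `difflib.SequenceMatcher` stands in for the fuzzy-match scoring that commercial tools implement in their own ways.

```python
from difflib import SequenceMatcher

# Hypothetical translation memory: source segments paired with earlier
# human translations (washing-machine manual, English into French).
TRANSLATION_MEMORY = {
    "Press the START button to begin the wash cycle.":
        "Appuyez sur le bouton START pour lancer le cycle de lavage.",
    "Do not overload the drum.":
        "Ne surchargez pas le tambour.",
}


def tm_lookup(segment, memory=TRANSLATION_MEMORY, threshold=0.75):
    """Return (score, stored_source, stored_translation) for the closest
    stored segment, or None if no score reaches the assumed threshold."""
    best = None
    for source, target in memory.items():
        # ratio() returns a similarity between 0.0 (nothing shared) and 1.0 (identical).
        score = SequenceMatcher(None, segment.lower(), source.lower()).ratio()
        if best is None or score > best[0]:
            best = (score, source, target)
    if best is not None and best[0] >= threshold:
        return best
    return None  # no usable match: the translator starts from scratch
```

Given a new segment such as “Press the START button to begin the rinse cycle.”, the sketch finds the stored ‘wash cycle’ sentence, scores it as a close but not exact match, and offers its earlier translation. Deciding whether to accept or edit that suggestion remains the human translator’s job.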

Then there’s Machine Translation (MT), something we scarcely touched on back then because the results were laughable, even between European languages. But things have changed. Roy Youdale, of Bristol University, UK, who was one of the speakers at this talk, writes in a recent article, ‘Can Artificial Intelligence Help Literary Translators?’, that ‘A game-changer …. has been the incorporation of machine translation (MT) into CAT tools.’ He goes on: ‘MT basically uses a computer to search and compare the words in a source text with very large databases (billions of words) of texts already translated into the target language. In addition to the translation of individual words, the computer searches for corresponding sequences of words or ‘strings’, a process known as ‘string matching’.’ Anyone who has used DeepL or Google Translate to get the gist of an online article written in a language they can’t read will know that the results are often quite clear and well-worded.


And then there are termbases, or terminology databases, something that has always interested me. These tools allow the translator to store terms and names that recur throughout the novel (characters’ names, names for food, geographical features and local government organisations, for instance). The app then suggests these stored renderings wherever the terms crop up in the file being translated.
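A termbase lookup can be pictured with an equally minimal Python sketch. The entries are names from the excerpt later in this post, paired with the renderings used in the published human translation; the function name `termbase_suggestions` is hypothetical, and a plain substring search is a simplification (real tools also deal with overlapping terms, variant forms and so on).

```python
# Hypothetical termbase: Chinese names from the excerpt, paired with the
# English renderings agreed by the translators.
TERMBASE = {
    "白雪": "Snow Bai",
    "夏天义": "Justice Xia",
    "夏天智": "Wisdom Xia",
    "来运": "Lucky",
    "清风寺": "Great Qing Temple",
    "清风街": "Freshwind",
    "七里沟": "Seven Li Gully",
}


def termbase_suggestions(segment, termbase=TERMBASE):
    """Return (source_term, agreed_rendering) pairs for every termbase entry
    found in the segment, ordered by where each term first appears."""
    hits = []
    for term, rendering in termbase.items():
        pos = segment.find(term)  # -1 means the term is absent
        if pos != -1:
            hits.append((pos, term, rendering))
    hits.sort()  # sort by position in the segment
    return [(term, rendering) for _, term, rendering in hits]
```

Run on the ‘trees’ sentence from the excerpt, the sketch would suggest Great Qing Temple, Seven Li Gully, Wisdom Xia and Freshwind, in the order in which the names occur, leaving the translator to accept or override each one.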

So are literary translators using CAT tools successfully? Youdale says that ‘younger literary translators in particular seem more willing to consider experimenting with CAT tools’, and describes whole novels for which CAT tools produced a first draft that was then ‘post-edited’ by the human literary translator. I was needled by the implication that ‘younger translators’ were somehow more advanced in their modus operandi! Maybe I was missing out by not making use of CAT tools. Yesterday, I decided to try an experiment.

I recently translated Jia Pingwa’s panoramic novel of the Chinese countryside, The Shaanxi Opera (forthcoming, 2023), with Dylan Levi King. Our use of what could loosely be called CAT tools was limited to storing terms and names in a shared spreadsheet in the cloud. We then had to remember to go and check the spreadsheet to ensure that we were translating consistently. Could we have gone further?

In the excerpted paragraph below, I have tried two things. First, I used MT as provided by DeepL, a free online machine translation app. Then I tried importing the same paragraph into a CAT tool which combines MT, translation memory (TM) and a termbase function. I was not able to try out the TM function because this was only one paragraph, so there was no previous text from the novel for the TM to remember. And, honestly, I don’t know many novelists who repeat phrases, let alone whole segments, in their writing, so I very much doubt that this function would be useful for a novel.

The novel and the excerpt.

The Shaanxi Opera has a cast of over a hundred characters and is a complicated story of rival families, dying traditions, and frustrated love. In this short paragraph, the narrator, a young man generally regarded as the village idiot, casts a wry eye over all the shenanigans, and mopes over his adored Bai Xue (Snow), a young married opera singer. He is a bit of a loner, as well as being an orphan, and on the eve of the Spring Festival, he shuts himself away at home.

单身汉是不愿意过年的,你到哪儿去呢,去哪儿都不合适。武林和我做豆腐的时候,他问过我:年怎么个过?他的意思想要到我家去,我没有应他的话,我宁愿孤单着也不愿和他在一起,他话说不连贯,而且身上有一股臭味。所以,我关了院门,年三十的午饭早早就炒了一盘肉,煎了一盆豆腐,焖了一锅米饭就吃起来。我端了碗,想起了我爹我娘,我说:“这口饭我替你们吃吧!”扒下了第一口。我当然就接着想起了白雪,我说:“白雪,我也替你吃!”扒下了第二口。第三口我是替夏天义吃的。吃过了三口,我还能替谁吃,谁还值得我替吃呢?我是想到了哑巴,想到了土地庙里的土地公和土地婆,想到了二婶和四婶,想到了君亭和赵宏声。还有树,我家院子里的树,清风寺里的白果树,七里沟里那棵木棍长活了的树,还有夏天智家院里的痒痒树,清风街所有的树。来运呢?应该有来运。再就是染坊里的大叫驴,万宝酒楼上的那只大花猫,夏天智院里那架牡丹蓬。还有还有,怎么就把石头给忘了呢?七里沟里那么多的石头。

 Machine Translation (MT) result, translated with www.DeepL.com/Translator (free version)

…..Single people don't want to spend New Year's Eve, where do you go? When Wulin and I were making bean curd, he asked me: How do you want to spend New Year? He wanted to come to my house, but I didn't answer him. I would rather be alone than with him, he was incoherent and smelled bad. So I closed the garden door, and for New Year's Eve lunch I stir-fried a plate of meat, fried a pot of tofu and made a pot of rice and ate it early. I took the bowl and, thinking of my parents, I said, "I'll eat this rice for you!" I took the first bite. Of course I then thought of Bai Xue and said, "Bai Xue, I'll eat for you too!" I took a second bite. The third bite I took for Summer Yi. After three bites, who else could I eat for, who else was worth eating for? I thought of the mute, of the land goddess and the land lady at the land temple, of my second and fourth aunts, of Junting and Zhao Hongsheng. And the trees, the trees in my courtyard, the white fruit trees in the Qingfeng Temple, the tree in the Seven Mile Gully where the sticks grew to life, the tickling tree in the summer in Zhi's courtyard, all the trees in Qingfeng Street. Where's Laiyun? There should be some. Then there's the donkey in the dye house, the cat in the Manpo restaurant, and the peony pavilion in the courtyard of the summer family. And how could we forget the stones? There are so many rocks in the Seven Mile Ditch.

Verdict: Pretty crude. More or less comprehensible for most of the excerpt, but with some mistakes. The first sentence is a mess. Nowhere near publishable standard, and it does not do justice to the author’s style.

Here is the same excerpt as produced by a CAT tool, SmartCat.


Verdict: Slightly better, but nowhere near publishable standard and does not do justice to the author’s style. Again, the first sentence is a mess.

UPDATE: My thanks to Professor Mark Shuttleworth, Department of Translation, Interpreting and Intercultural Studies, Hong Kong Baptist University, who supplied the screenshot below of another CAT tool, Memsource, at work.

CAT tools allow translators to import and apply a glossary they have built from this or a previous translation, so that the correct names of people and places appear automatically. I created a glossary (left) for this paragraph in which I chose to use 'bean curd' instead of 'tofu' in the second segment. Memsource, like SmartCat, has produced a machine translation, which in this case has been post-edited. It now offers me my chosen term, 'bean curd', in place of 'tofu', highlighted in yellow. All I need to do is accept it.



By the same token, in segment 11 I will be offered 'Lucky' (the name of a dog) instead of 'luck', because Lucky is the name I entered in my glossary.

And, finally, here is the entirely human-translated final version of the paragraph. You can read more in The Shaanxi Opera, translated by Nicky Harman and Dylan Levi King, forthcoming, 2023. 

....For men on their own like me, New Year was no fun. It didn’t matter where you went, you didn’t fit in. When Forest Wu and I were making tofu, he asked if I had any plans. I knew he was hinting he wanted to come to my place, but I didn’t answer. I was better off alone than with him. He couldn’t talk properly, and smelled bad, too. I shut my gate behind me, and early that evening I fried a dish of pork, steamed a plate of tofu, and boiled some rice. As I carried my bowl to the table, I thought of Mom and Dad. “I’m eating this for you!” I said as I took the first bite. Of course, my next thought was of Snow Bai. “Snow Bai,” I said, “this bite is for you!” The third mouthful was for Justice Xia. But after that, I had no one else to eat for. I tried to think of someone who was worthy, maybe Tongue-Tied, or the Earth God and Goddess, or Second Aunt or Fourth Aunt, Pavilion or Big-Noise Zhao. And then, of course, there were the trees: the tree in my yard, the gingko tree at the Great Qing Temple, the tree that had grown out of the stick we stuck in the ground in Seven Li Gully, Wisdom Xia’s tickle tree, and all the other trees in Freshwind. Or Lucky, maybe? I should include Lucky. And the donkey at the dye-house, the tabby cat at the House of Treasures, and the peony bush at Wisdom Xia’s. And . . . and . . . How could I have forgotten the stones? All the boulders in Seven Li Gully! ...

I’ll conclude with my personal view of the pros and cons of using a CAT tool for a literary work in Chinese.

Pros: The glossary, or ‘termbase’, could be very useful. It would automatically insert into the draft translation all the terms and names the translators had agreed on, as they worked their way through the book and populated the termbase.

Cons: These apps come as a package. You can’t use the termbase independently of the TM and MT functions, and you have to buy the package and import the novel you are translating into the app before you start work. CAT tools also segment the text sentence by sentence, which makes it difficult for the literary translator to combine or break up sentences, or even move them around, as is normal practice. Inhibiting! This remains true even though some restructuring can be done at the post-editing stage. And, as you can see, the resulting translation is not up to the required literary standard.

Personally, I have used MT (Google Translate and DeepL) on occasion to produce a first draft. It somehow feels less tiring, and is useful for inputting proper names. But I have rarely, if ever, accepted a sentence, or even half of one, exactly as the MT produced it. I have rewritten almost every word of it. Youdale comments that whole novels have been translated with CAT tools and finished, or post-edited, by a human translator. He does not mention whether anyone has tried translating a Chinese novel this way, but I doubt it. For the moment, and for my work, I’m suspending my judgment on CAT tools.