Distinguishing translations from non-translations and identifying (in)direct translations’ source languages
Ivaska Laura
https://urn.fi/URN:NBN:fi-fe2021042825020
Abstract
The scope of this study is threefold. First, machine learning is applied to
distinguish translated from non-translated Finnish texts. Second, the study attempts to
identify the source languages of the translated Finnish texts. Finally, the source
language identification is tested with indirect translations, that is, with
translations made from translations. The three underlying research questions are: 1)
Can translated Finnish be distinguished from non-translated Finnish? 2) Can the
source languages of Finnish translations be identified? 3) If the answer to question
2 is yes, then what happens when the method is applied to indirect translations; will
the analysis identify the ultimate source language, the mediating language, or
neither?
This study is based on the hypothesis that translated language contains traces
of the source language (Toury 1995). The corpus of the study consists of non-translated
Finnish prose, Finnish prose literature translations made from English,
German, French, Modern Greek, and Swedish, as well as indirect translations from
Modern Greek into Finnish via English, German, French, and Swedish. The
analyses are based on cluster analysis and support vector machines using the
frequencies of the most frequent lemmatized words.
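To make the feature representation concrete, the following is a minimal sketch of how most-frequent-word frequency vectors might be built from lemmatized texts and compared. It is not the study's actual pipeline: the toy token lists, the function names, the vocabulary size, and the use of cosine distance with a nearest-neighbour lookup are all assumptions made for illustration. In a setup like the one described, vectors of this kind would be the input to a support vector machine or a cluster analysis.

```python
from collections import Counter
import math


def mfw_vocabulary(docs, n):
    """Return the n most frequent tokens over all documents.

    Ties are broken alphabetically so the result is deterministic.
    """
    totals = Counter()
    for tokens in docs.values():
        totals.update(tokens)
    ranked = sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:n]]


def relative_frequencies(tokens, vocabulary):
    """Map a token list to relative frequencies of the vocabulary words."""
    counts = Counter(tokens)
    total = len(tokens)
    return [counts[word] / total for word in vocabulary]


def cosine_distance(a, b):
    """Return 1 minus the cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def nearest_text(name, vectors):
    """Return the name of the text whose vector is closest to `name`."""
    others = (other for other in vectors if other != name)
    return min(others, key=lambda o: cosine_distance(vectors[name], vectors[o]))


# Hypothetical, already lemmatized toy texts (not data from the study).
docs = {
    "a1": "olla ja se olla ja hän olla".split(),
    "a2": "olla ja olla se ja olla ja".split(),
    "b1": "ei se ei hän ei ja se".split(),
    "b2": "ei ei se hän ei se se".split(),
}

vocabulary = mfw_vocabulary(docs, 4)
vectors = {name: relative_frequencies(toks, vocabulary) for name, toks in docs.items()}

for name in docs:
    print(name, "is closest to", nearest_text(name, vectors))
```

Real texts would need far larger vocabularies and a proper lemmatizer. The nearest-neighbour lookup here only stands in for the grouping step and is much simpler than a full cluster analysis.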
Results show that translated and non-translated Finnish can be distinguished
by using machine learning techniques. Support vector machine-based source
language identification, however, was only partially successful, while a cluster
analysis suggested that there is coherence within a group of texts translated from
the same source language and variation between the groups of texts with different
source languages. Clustering was further tested with indirect translations, and the
results were mixed: six of the thirteen tested indirect translations clustered with
direct translations from the ultimate source language, two with translations from
their mediating languages, and five with neither.