Combined Machine-Learning Approach to PoS-Tagging of Middle English Corpora

DC Field	Value	Language
dc.contributor.author	Karimov, Raoul	-
dc.date.accessioned	2019-01-23T11:33:36Z	-
dc.date.available	2019-01-23T11:33:36Z	-
dc.date.issued	2018	-
dc.identifier.citation	Crossroads. A Journal of English Studies 21 (2/2018), pp. 42-52	pl
dc.identifier.uri	http://hdl.handle.net/11320/7504	-
dc.description.abstract	This paper considers the problem of part-of-speech tagging in Middle English corpora (and in historical corpora in general). PoS-tagging is now considered a solved problem for Modern English, where it is mainly achieved via hidden Markov models (HMM) and matrix-based word-to-vector conversions, with every word in the dictionary being embedded into a single dimension. This approach, however, relies on recurrent syntactic structures and context-free generative grammars and is therefore not applicable to earlier stages of the English language, whose word order is irregular. We therefore believe that Middle English could be better handled by a morphographemic encoding and instance-based machine learning algorithms such as SVM, random forests, and kNN. Using a moving-average method to generate multidimensional vectors that give a reliable numeric representation of character composition and sequences, we achieved a precision and recall of 87.5% in classifying Middle English words by their part of speech with a simple combined voting-based binary classifier. This result can be deemed satisfactory and encourages further research in the area.	pl
dc.language.iso	en	pl
dc.publisher	The University of Bialystok	pl
dc.subject	Instance-Based Learning	pl
dc.subject	Corpus	pl
dc.subject	Middle English	pl
dc.subject	PoS-Tagging	pl
dc.subject	Moving Average	pl
dc.title	Combined Machine-Learning Approach to PoS-Tagging of Middle English Corpora	pl
dc.type	Article	pl
dc.identifier.doi	10.15290/cr.2018.21.2.04	-
dc.description.Email	raoul.karimov@hotmail.com	pl
dc.description.Biographicalnote	Raoul Karimov was born on October 16, 1993 in Chelyabinsk, Russia. He graduated from Chelyabinsk State University as a Bachelor of Linguistics in 2014 and as a Master of Linguistics in 2016, and is currently a PhD student at Chelyabinsk State University. He completed a Summer School of German Language and Cross-Cultural Communication at the University of Bremen, Germany, in 2013 and studied for one year (2017–2018) at the University of Bergen, Norway, under the Russian-Norwegian Study Grants Program. His research interests are corpus linguistics, applied linguistics, old Germanic languages, and machine learning.	pl
dc.description.Affiliation	Chelyabinsk State University	pl
dc.description.references	Aha, David W., Kibler, Dennis, Albert, Marc K. 1991. Instance-based learning algorithms. Machine Learning 6-1, 37-66.	pl
dc.description.references	Beesley, Kenneth R., Karttunen, Lauri. 2004. Finite-State Morphology. Journal of Computational Linguistics 30-2, 237-249.	pl
dc.description.references	Breiman, Leo. 2001. Random Forests. https://www.stat.berkeley.edu/~breiman/randomforest2001.pdf (19 April, 2018).	pl
dc.description.references	Cristianini, Nello, Shawe-Taylor, John. 2000. An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods. Cambridge: Cambridge University Press.	pl
dc.description.references	Frank, Eibe, Witten, Ian H. 2016. Data Mining: Practical Machine Learning Tools and Techniques. Burlington: Morgan Kaufmann.	pl
dc.description.references	Ilyish, Boris A. 1968. History of the English Language. Moscow: Vysshaya Shkola.	pl
dc.description.references	Jędrzejowicz, Piotr, Strychowski, Jakub A. 2005. Neural Network Based Morphological Analyser of the Natural Language. Intelligent Information Processing and Web Mining. Advances in Soft Computing 31, 199–208.	pl
dc.description.references	Jurafsky, Dan, Martin, James H. 2008. Speech and Language Processing. New Jersey: Prentice Hall.	pl
dc.description.references	Malouf, Robert. 2016. Generating morphological paradigms with a recurrent neural network. San Diego Linguistic Papers 6, 122–129.	pl
dc.description.references	Mayhew, Anthony L., Skeat, Walter. 1888. A Concise Dictionary of Middle English From A.D. 1150 to 1580. Oxford: Clarendon Press.	pl
dc.description.references	Seyed, Hamid H., Mahdi, Samanipour. 2015. Prediction of Final Concentrate Grade Using Artificial Neural Networks from Gol-E-Gohar Iron Ore Plant. American Journal of Mining and Metallurgy 3-3, 58-62.	pl
dc.description.references	Takala, Pyry. 2016. Word Embeddings for Morphologically Rich Languages. Proceedings of the European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning, 177-182.	pl
dc.description.references	Teijiro, Isokawa, Naruhiko, Nishimura, Nobuyuki, Matsui. 2012. Quaternionic Multilayer Perceptron with Local Analyticity. Information 3, 756-770.	pl
dc.description.references	Web 1 – Helsinki Corpus of English Texts. www.helsinki.fi/varieng/CoRD/corpora/HelsinkiCorpus (4 April, 2018).	pl
dc.identifier.eissn	2300-6250	-
dc.description.issue	21 (2/2018)	-
dc.description.firstpage	42	pl
dc.description.lastpage	52	pl
dc.identifier.citation2	Crossroads. A Journal of English Studies	pl
Appears in collection(s):	Crossroads. A Journal of English Studies, 2018, Issue 21
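
The abstract in this record mentions two ideas that a short example may make more concrete: a moving-average encoding of a word's characters and a voting-based combination of binary classifiers. The record does not give the window size, the vector length, the padding scheme, or the voting rule, so everything below is an illustrative assumption rather than the author's method. The function names (`moving_average_vector`, `majority_vote`, `combined_classifier`) are hypothetical, and the toy threshold rules merely stand in for trained models such as SVM, random forests, or kNN.

```python
from typing import Callable, List, Sequence


def moving_average_vector(word: str, window: int = 3, dims: int = 8) -> List[float]:
    """Encode a word as a fixed-length vector of moving averages of its
    character codes. Window size and vector length are assumed values."""
    codes = [ord(ch) for ch in word.lower()]
    if not codes:
        return [0.0] * dims
    # Words shorter than the window use a single window covering all characters.
    width = min(window, len(codes))
    averages = [
        sum(codes[i:i + width]) / width
        for i in range(len(codes) - width + 1)
    ]
    # Zero-pad short vectors and truncate long ones to exactly `dims` entries.
    return (averages + [0.0] * dims)[:dims]


def majority_vote(votes: Sequence[bool]) -> bool:
    """Return True only when strictly more than half of the votes are True."""
    return sum(votes) * 2 > len(votes)


def combined_classifier(
    classifiers: Sequence[Callable[[List[float]], bool]],
    vector: List[float],
) -> bool:
    """Ask every binary classifier for a decision and combine them by majority vote."""
    return majority_vote([clf(vector) for clf in classifiers])


# Toy stand-ins for trained binary classifiers (hypothetical decision rules).
toy_classifiers = [
    lambda v: v[0] > 100.0,  # fires when the first window average is high
    lambda v: v[1] > 100.0,  # fires when the second window average is high
    lambda v: v[2] == 0.0,   # fires when the word yields fewer than three windows
]

vector = moving_average_vector("knyght")
decision = combined_classifier(toy_classifiers, vector)
```

In a real setup each callable would be a model trained to answer one binary question (for example, "is this word a noun?"), and the vote would be taken over those models' predictions.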

Files in this item:

File	Description	Size	Format
Crossroads_21_2018_R_Karimov_Combined_Machine-Learning_Approach.pdf		441.79 kB	Adobe PDF
