REPOSITORY OF THE UNIVERSITY OF BIAŁYSTOK (UwB)

Please use this identifier to cite or link to this item: http://hdl.handle.net/11320/7504
Title: Combined Machine-Learning Approach to PoS-Tagging of Middle English Corpora
Authors: Karimov, Raoul
Keywords: Instance-Based Learning; Corpus; Middle English; PoS-Tagging; Moving Average
Issue date: 2018
Date added: 23-Jan-2019
Publisher: The University of Bialystok
Source: Crossroads. A Journal of English Studies 21 (2/2018), pp. 42-52
Abstract: This paper considers the problem of part-of-speech tagging in Middle English corpora (and in historical corpora in general). PoS-tagging is now generally considered a solved problem for Modern English, where it is mainly achieved with hidden Markov models (HMMs) and matrix-based word-to-vector conversions in which every word in the dictionary is embedded into a single dimension. This approach, however, relies on recurrent syntactic structures and context-free generative grammars and is therefore not applicable to earlier stages of the English language, whose word order is irregular. We therefore believe that Middle English could be better handled by a morphographemic encoding combined with instance-based machine-learning algorithms such as SVM, random forests, and kNN. Using a moving-average method to generate multidimensional vectors that give a reliable numeric representation of character composition and sequences, we achieved a precision and recall of 87.5% in classifying Middle English words by part of speech with a simple combined voting-based binary classifier. This result can be considered satisfactory and encourages further research in the area.
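The abstract does not spell out the encoding or the voting scheme. As a minimal sketch, assuming that characters are mapped to numbers by their position in a hypothetical alphabet (`ALPHABET`), that simple moving averages over several window sizes are padded or truncated to a fixed length and concatenated, and that several binary classifiers are combined by plain majority vote, the general idea might look like this. All names here (`char_code`, `moving_average`, `encode_word`, `majority_vote`) are illustrative and not taken from the paper.

```python
from typing import Callable, Sequence

# Hypothetical character inventory (Latin letters plus thorn, eth, yogh, ash);
# the paper's actual character mapping is not given in the abstract.
ALPHABET = "abcdefghijklmnopqrstuvwxyzþðȝæ"


def char_code(c: str) -> float:
    """Map a character to a value in (0, 1]; unknown characters map to 0.0."""
    idx = ALPHABET.find(c.lower())
    return (idx + 1) / len(ALPHABET) if idx >= 0 else 0.0


def moving_average(values: Sequence[float], window: int) -> list[float]:
    """Simple (unweighted) moving average over consecutive windows."""
    if len(values) < window:
        # Word shorter than the window: fall back to the mean of all values.
        return [sum(values) / len(values)] if values else []
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


def encode_word(word: str, windows: tuple[int, ...] = (1, 2, 3),
                length: int = 8) -> list[float]:
    """Concatenate fixed-length moving-average segments into one vector."""
    codes = [char_code(c) for c in word]
    vector: list[float] = []
    for w in windows:
        avg = moving_average(codes, w)
        # Pad with zeros or truncate so every segment has `length` entries.
        vector.extend((avg + [0.0] * length)[:length])
    return vector


def majority_vote(classifiers: Sequence[Callable[[list[float]], bool]],
                  x: list[float]) -> bool:
    """Return True if more than half of the binary classifiers say True."""
    votes = [clf(x) for clf in classifiers]
    return sum(votes) > len(votes) / 2
```

For example, `encode_word("the")` yields a 24-dimensional vector (three window sizes times eight positions). In a one-vs-rest setup, each callable passed to `majority_vote` could be any trained binary model (for instance an SVM, a random forest, or kNN) that decides whether a word belongs to one part of speech; that arrangement is an assumption, not a description of the paper's classifier.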
Affiliation: Chelyabinsk State University
Biographical note: Raoul Karimov, born October 16, 1993 in Chelyabinsk, Russia; graduated from Chelyabinsk State University with a Bachelor of Linguistics in 2014 and a Master of Linguistics in 2016; currently a PhD student at Chelyabinsk State University. He completed a Summer School of German Language and Cross-Cultural Communication at the University of Bremen, Germany, in 2013, and studied for one year (2017–2018) at the University of Bergen, Norway, under the Russian-Norwegian Study Grants Program. His research interests are corpus linguistics, applied linguistics, old Germanic languages, and machine learning.
E-mail: raoul.karimov@hotmail.com
URI: http://hdl.handle.net/11320/7504
DOI: 10.15290/cr.2018.21.2.04
e-ISSN: 2300-6250
Document type: Article
Appears in collection(s): Crossroads. A Journal of English Studies, 2018, Issue 21

Files in this item:
Crossroads_21_2018_R_Karimov_Combined_Machine-Learning_Approach.pdf (441.79 kB, Adobe PDF)


This item is protected by copyright (Copyright © All rights reserved).