REPOSITORY OF THE UNIVERSITY
OF BIAŁYSTOK
UwB

Please use this identifier to cite or link to this item: http://hdl.handle.net/11320/14262
Title: Predictive Analysis for Text Classification: Discrete Units in Company Registration Discourse
Authors: Więcławska, Edyta
Keywords: authorship factor
decision tree
legal discourse
predictive analysis
random forest
text classification
Date of issue: 2022
Date added: 30-Dec-2022
Publisher: Faculty of Law, University of Białystok; Temida 2
Source: Białostockie Studia Prawnicze, Vol. 27 No. 4, 2022, pp. 229-252
Abstract: Legal discourse shows variation most commonly in terms of contrasts between languages, textual genres, communicative settings (professional vs. lay communication), translation methods and categories of authors, the last constituting a testing ground for the text-prediction task presented in this article. The research project involves quantitative analysis of selected discrete units and their statistical processing with the R tool for the purpose of generating random forest and decision tree models. It is hypothesised that it is possible to effectively predict text authorship based on the grammatical profile of the texts. The prediction model proposed here covers two authorship categories, institutional name and professional title, and these encapsulate authorship sub-categories related to institutional and work position background. The prediction accuracy parameters for the authorship-based text classification in both cases prove to be statistically satisfactory. More specific findings show that the text classification models for some authorship sub-categories are more effective than for others. Further, some discrete units have distinctively high discriminative power for the texts. The analysis is conducted on a custom-designed corpus, composed of English texts processed in company registration proceedings. The corpus is homogeneous in terms of the function and the communicative context of the texts, which assures reliability of the findings and at the same time captures the variationist aspect of legal communication by taking the varied authorship factor into account.
Affiliation: University of Rzeszów, Poland
Biographical note: Edyta Więcławska, PhD, Assistant Professor at the Department of Specialized Languages of the University of Rzeszów, Poland
E-mail: edytawieclawska@poczta.fm
URI: http://hdl.handle.net/11320/14262
DOI: 10.15290/bsp.2022.27.04.14
ISSN: 1689-7404
e-ISSN: 2719-9452
ORCID: 0000-0003-0798-1940
Document type: Article
License URI: https://creativecommons.org/licenses/by-nc-nd/4.0/
Rights holder: © 2022 Edyta Więcławska published by Sciendo. This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License.
Appears in collection(s): Białostockie Studia Prawnicze, 2022, Vol. 27 No. 4

Files in this item:
File: BSP_27_4_E_Wieclawska_Predictive_Analysis_for_Text_Classification.pdf
Size: 819.1 kB
Format: Adobe PDF


This item is available under a Creative Commons license.