University of Białystok Repository (UwB)

Please use this identifier to cite or link to this item: http://hdl.handle.net/11320/14262
Full metadata record
DC field | Value | Language
dc.contributor.author | Więcławska, Edyta | -
dc.date.accessioned | 2022-12-30T09:08:06Z | -
dc.date.available | 2022-12-30T09:08:06Z | -
dc.date.issued | 2022 | -
dc.identifier.citation | Białostockie Studia Prawnicze, Vol. 27, no. 4, 2022, pp. 229-252 | pl
dc.identifier.issn | 1689-7404 | -
dc.identifier.uri | http://hdl.handle.net/11320/14262 | -
dc.description.abstract | Legal discourse shows variation most commonly in terms of contrasts between languages, textual genres, communicative settings (professional vs. lay communication), translation methods and categories of authors, the last constituting a testing ground for the text-prediction task presented in this article. The research project involves quantitative analysis of selected discrete units and their statistical processing with the R tool for the purpose of generating random forest and decision tree models. It is hypothesised that it is possible to effectively predict text authorship based on the grammatical profile of the texts. The prediction model proposed here covers two authorship categories, institutional name and professional title, and these encapsulate authorship sub-categories related to institutional and work position background. The prediction accuracy parameters for the authorship-based text classification in both cases prove to be statistically satisfactory. More specific findings show that the text classification models for some authorship sub-categories are more effective than for others. Further, some discrete units have distinctively high discriminative power for the texts. The analysis is conducted on a custom-designed corpus, composed of English texts processed in company registration proceedings. The corpus is homogeneous in terms of the function and the communicative context of the texts, which assures reliability of the findings and at the same time captures the variationist aspect of legal communication by taking the varied authorship factor into account. | pl
dc.language.iso | en | pl
dc.publisher | Faculty of Law, University of Białystok; Temida 2 | pl
dc.rights | Attribution-NonCommercial-NoDerivatives 4.0 | -
dc.rights.uri | https://creativecommons.org/licenses/by-nc-nd/4.0/ | -
dc.subject | authorship factor | pl
dc.subject | decision tree | pl
dc.subject | legal discourse | pl
dc.subject | predictive analysis | pl
dc.subject | random forest | pl
dc.subject | text classification | pl
dc.title | Predictive Analysis for Text Classification: Discrete Units in Company Registration Discourse | pl
dc.type | Article | pl
dc.rights.holder | © 2022 Edyta Więcławska published by Sciendo. This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License. | pl
dc.identifier.doi | 10.15290/bsp.2022.27.04.14 | -
dc.description.Email | edytawieclawska@poczta.fm | pl
dc.description.Biographicalnote | Edyta Więcławska, PhD, Assistant Professor at the Department of Specialized Languages of the University of Rzeszów, Poland | pl
dc.description.Affiliation | University of Rzeszów, Poland | pl
dc.identifier.eissn | 2719-9452 | -
dc.description.volume | 27 | pl
dc.description.number | 4 | pl
dc.description.firstpage | 229 | pl
dc.description.lastpage | 252 | pl
dc.identifier.citation2 | Białostockie Studia Prawnicze | pl
dc.identifier.orcid | 0000-0003-0798-1940 | -
Appears in collection(s): Białostockie Studia Prawnicze, 2022, Vol. 27, no. 4

Files in this item:
File | Description | Size | Format
BSP_27_4_E_Wieclawska_Predictive_Analysis_for_Text_Classification.pdf | - | 819.1 kB | Adobe PDF


This item is available under a Creative Commons license.