Issue | E3S Web Conf., Volume 533, 2024, XXVII International Scientific Conference on Advance in Civil Engineering “Construction the Formation of Living Environment” (FORM-2024) | |
---|---|---|
Article Number | 03010 | |
Number of page(s) | 11 | |
Section | Modelling and Mechanics of Building Structures | |
DOI | https://doi.org/10.1051/e3sconf/202453303010 | |
Published online | 07 June 2024 |
Bayesian Belief Networks to handle NLP problems
Moscow State University of Civil Engineering, 26, Yaroslavskoye shosse, Moscow, 129337, Russia
* Corresponding author: sakan@mgsu.ru
In corpus linguistics, part-of-speech tagging (POS tagging, PoS tagging, or POST), also called grammatical tagging, is the process of marking up a word in a text (corpus) as corresponding to a particular part of speech, based on both its definition and its context. Once performed by hand, POS tagging is now done in the context of computational linguistics, using algorithms that associate discrete terms, as well as hidden parts of speech, with a set of descriptive tags. POS-tagging algorithms fall into two distinct groups: rule-based and stochastic. Because rule-based algorithms are complicated and expensive, requiring a large number of rules to be taken into account, stochastic algorithms seem more appropriate. POS tagging is the first step towards named-entity tagging, which is important for understanding the semantics of a text. Recently, many deep learning models for POS tagging have emerged. Most of them are based on supervised learning and require considerable processing power and time to obtain weights that give correct results on new data. Is it possible to use another probabilistic model for this purpose, without training and on small data? We believe Bayesian Belief Networks could be such a model.
Key words: Natural language processing / Bayesian Belief Networks / POS tagging / n-gram models / maximum likelihood estimation
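To make the keywords "n-gram models" and "maximum likelihood estimation" concrete, the following is a minimal sketch of a stochastic tagger, not the authors' implementation. It assumes a first-order (bigram) tag chain, which can be read as a very simple Bayesian network: each tag node has the previous tag as its parent, and each word node has its tag as its parent. The toy corpus, the tag names, and the function names `estimate` and `viterbi` are hypothetical and chosen only for illustration. No smoothing is applied, so unseen words receive zero probability.

```python
from collections import Counter, defaultdict

# Toy tagged corpus (hypothetical, invented for illustration only).
CORPUS = [
    [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")],
    [("the", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
    [("a", "DET"), ("dog", "NOUN"), ("sleeps", "VERB")],
    [("dogs", "NOUN"), ("bark", "VERB")],
]

START = "<s>"  # pseudo-tag marking the beginning of a sentence


def estimate(corpus):
    """Maximum likelihood estimates of P(tag | prev_tag) and P(word | tag)."""
    trans = defaultdict(Counter)  # trans[prev_tag][tag] = count
    emit = defaultdict(Counter)   # emit[tag][word] = count
    for sentence in corpus:
        prev = START
        for word, tag in sentence:
            trans[prev][tag] += 1
            emit[tag][word] += 1
            prev = tag

    def normalise(table):
        # MLE: relative frequency of each outcome given its parent value.
        return {
            parent: {x: c / sum(counts.values()) for x, c in counts.items()}
            for parent, counts in table.items()
        }

    return normalise(trans), normalise(emit)


def viterbi(words, trans, emit):
    """Return (probability, tags) of the most probable tag sequence."""
    tags = list(emit)
    # best[tag] = (probability of the best path ending in tag, that path)
    best = {START: (1.0, [])}
    for word in words:
        new_best = {}
        for tag in tags:
            e = emit[tag].get(word, 0.0)
            candidates = [
                (p * trans.get(prev, {}).get(tag, 0.0) * e, path + [tag])
                for prev, (p, path) in best.items()
            ]
            new_best[tag] = max(candidates, key=lambda c: c[0])
        best = new_best
    return max(best.values(), key=lambda c: c[0])


if __name__ == "__main__":
    trans, emit = estimate(CORPUS)
    prob, tags = viterbi(["the", "dog", "sleeps"], trans, emit)
    print(tags, prob)
```

The conditional probability tables here are plain relative frequencies, so "training" reduces to counting over a small tagged sample. A richer belief network would add further parent nodes (for example, the next word or a suffix feature) to the same kind of table.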
© The Authors, published by EDP Sciences, 2024
This is an Open Access article distributed under the terms of the Creative Commons Attribution License 4.0, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.