Semantic knowledge networks in education

The article is devoted to modeling semantic knowledge networks. The knowledge network is the basic concept of knowledge management, a new discipline that implements the principles of sustainable development of education. The method of constructing a semantic knowledge network allows us to analyze the connections between the educational disciplines “Economic Cybernetics”, “Algorithms and Programming” and “Calculus”. The paper compares the topological characteristics of the concept graphs related to these disciplines. We develop an algorithm to implement the subject area model in the form of a semantic knowledge network. For each discipline, 125 concepts are analyzed that support mastering the discipline and establish the connections between them.

Epidemics, the destruction of the natural environment and climate change, the depletion of material and energy resources, the population explosion and lack of food, as well as the crisis of civilization as a whole, are complex interdisciplinary problems of mankind. The need to resolve them leads to the emergence of areas of science that are characterized by a convergence of methods and interdisciplinary approaches. Suprasectoral technologies (information, cognitive, nano-, bio-, social technologies) are currently being actively developed; they contribute to the emergence of new branches of science and serve as a new methodological basis for the study of nature [7][8][9]. Such interdisciplinary scientific fields lead to new directions in science such as risk management, sustainable development, new nature management, etc. The quality of students' professional training in the modern sense is determined by their willingness and ability to use the acquired professional competencies to solve not only professional tasks, but also multidisciplinary problems, and thereby to contribute to sustainable development at the level of the country, the region and the world as a whole. This implies updating the content and methods of professional training of specialists at a modern university, taking into account the requirements of interdisciplinary integration and the implementation of sustainable development ideas [10][11][12][13][14].
Interdisciplinary integration in higher education institutions has to be an important component of introducing sustainable development ideas into the training of modern specialists. The problems of sustainable development are themselves multidisciplinary. Such integration will help resolve a significant contradiction of education, namely the contradiction between the vast body of knowledge and limited human capabilities. The optimal combination of computer science and other academic disciplines within the same topic will provide conditions for a significant improvement in the level of the educational process.
The authors of [15] concluded that students have a large unused potential to understand the nature of science more deeply and to acquire the knowledge important for their future lives and work.
Recently, there has been much discussion about the transition to a knowledge-based society. Knowledge management systems are being developed, and knowledge management specialists are working in large corporations. Unfortunately, higher education is not considered in the discussions of this topic [16,17]. This is unacceptable, because knowledge is created, systematized and accumulated within universities and is then passed on to the next generation.
The learning process is the management of the accumulation and systematization of a student's knowledge. Only a few researchers focus their attention on this fact [18][19][20]. An automated learning environment built on semantic knowledge networks is capable, to a large extent, of solving a wide range of knowledge management tasks in a university. A feature of the modern stage in the development of educational systems is the need to expand the use of formal methods for presenting knowledge and organizing the learning process. These trends are based on the achievements of cybernetics, synergetics, and the theory of artificial intelligence. Many objects of cognitive science research should be described as networks. Over the past two decades, many studies have focused on network science methodology as an extensive scientific field for studying complex systems (for example, [21][22][23][24]). Complex systems contain several components that interact with each other, producing complex behaviour. One such complex system is the human brain and the cognitive processes taking place in it. These processes provide memory and language (for example, [25][26][27][28][29][30][31][32]). Network science is based on mathematical graph theory and contains powerful quantitative methods for studying systems as networks (for example, [33]).
At this stage in the development of the education system, the priority is to find ways to improve the learning process, its content and structure. A fundamental and holistic education can only be obtained as a result of a learning process at a new level of quality. In this case, the content of the various disciplines should reflect the logic and structure of the knowledge ties between disciplines. In the absence of interdisciplinary links, knowledge will be fragmentary and unsystematic. Cognitive networks are not only a tool for cognition, but can also be a basis for monitoring students' knowledge.

Analysis of previous studies
In different historical periods, many variants of semantic knowledge networks that take into account the specifics of intellectual activity have been created. In the "pre-computer era", a prototype of semantic knowledge networks was used to formalize logical reasoning. At the beginning of the twentieth century, graphs were first used in psychology to represent hierarchies of concepts and the inheritance of properties, and to model human memory and intellectual activity. In the early 1960s, the first machine implementations of semantic networks were made. In one of the first practically significant systems [34], 100 primitive types of concepts were introduced to solve the automatic translation problem, and a dictionary of 15 000 concepts was defined.
At present, semantic knowledge networks are widely used in solving many different problems, in particular in building knowledge bases, in machine translation and in natural language text processing. Because such graphs are so widely used, there is a need to refine them, that is, to increase the number of nodes and the connectivity between them.
Current studies are devoted to the use of semantic networks in the field of education. For example, in [35] the interdisciplinarity of applied mathematics is quantitatively analyzed using statistical and network methods on the PNAS corpus for 1999-2013. The article [36] discusses the potential of the Semantic Web for teacher education.
The paper [37] presents a theoretical method for integrating the use of semantic knowledge networks into the classroom. It also introduces insights from cognitive linguistics as to how the brain best learns vocabulary. The method springs from the fields of psychology and neuroscience, as well as from the inspiration of educators who are building new teaching styles. The purpose of the method is to inspire other educators to incorporate cognitive linguistic insights into their classes and to further the discourse on integrating this field into the teaching of English as a second or foreign language.
The authors of [38] formulate recipe recommendations using ingredient networks. They show how information about cooking can be used to glean insights about regional preferences and the modifiability of individual ingredients, and also how it can be used to construct two kinds of networks, one of ingredient complements and the other of ingredient substitutes. These networks encode which ingredients go well together and which can be substituted to obtain superior results, and they permit one to predict, given a pair of related recipes, which one will be more highly rated by users.
With the traditional method of constructing a semantic knowledge network, the network is formed manually, which requires significant labour. Such networks contain a small number of nodes; nevertheless, they have an important advantage: their nodes and connections are checked manually and are correct. An alternative approach is the automatic construction of a semantic network based on an external source generated by Internet users [39]. A striking example of such a source is Wiktionary [40].
Thus, these works are devoted to the application of semantic knowledge networks, including their integration into teaching. The increasing volume of educational material in the disciplines dictates the need to use cognitive modelling to solve complex problems of training and teaching.

Theoretical framework
There are various ways of representing knowledge, in particular such visual methods for describing knowledge in a subject field as semantic networks, graphs of conceptual dependencies, scripts, frames, conceptual graphs and ontologies. Let us establish the definitions of the terms that are important for this work: "semantic knowledge network", "semantic network", "network model", "cognitive map", "cognitive network", "cognitive scheme". The connection diagram of these concepts is shown in Fig. 1. Cognitive maps are a concept from cognitive psychology and were first introduced by Tolman. A cognitive map is an active, information-seeking structure.
In our work, the concepts of "semantic knowledge network" and "semantic network" are equated based on their proximity.
In cognitive science, the network is one of the most common types of information models. Typically, a network consists of two components: nodes as network elements and edges, which reflect the interaction between the elements. Using these simple components, one can describe a wide range of objects of different nature and complexity. Network models are based on the concept of a network. In such models, all relations are explicitly highlighted. These relations constitute the framework of knowledge of the subject area whose model must be created. This class of models includes semantic networks, functional networks, and frames (frame representation).
Although the terminology and structure differ, almost all semantic networks share the following features:
- different nodes of one concept refer to different meanings, unless it is marked that they relate to the same concept;
- edges of semantic networks create relationships between concept nodes (labels on the arcs indicate the type of relationship);
- relations between concepts can be linguistic cases, such as "agent", "object", "recipient" and "instrument" (others denote temporal, spatial or logical relations);
- the concepts are organized by level in accordance with their degree of generalization.
An associative approach to knowledge representation defines the meaning of an object in terms of its connections (associations) with other objects. Thus, when a person perceives an object and reasons about it, the perceived object is mapped onto a certain concept (Fig. 2). This concept is part of the person's general knowledge of the world, so it is connected by various associations with other concepts. Associations define the properties and behaviour of the perceived object. Graphs are best suited for explicitly expressing associations between different concepts. Thus, knowledge of the world is expressed in the form of a semantic network. A semantic knowledge network is a labelled graph in which nodes correspond to certain facts or general concepts, and edges denote relationships or associations between different facts or concepts (Fig. 3).
In each academic discipline (in every science), the number of concepts reflecting the knowledge of this discipline is finite. There is a certain number of words that need to be conveyed to the audience. This number is not infinite, because the time for conveying them is limited. Textbooks establish linear links between concepts. A normalized description of knowledge networks can be formulated as follows. The body of knowledge of the studied discipline is a system (S). The elementary component of S is a word that reflects a certain concept. With the help of words, all the concepts that make up the system S are recorded. Links between the concepts are established using the grammatical rules of a particular language. For each concept from S, there is a primary sentence that contains its definition. The totality of such definitions forms the invariant core of S, which ensures that knowledge within a particular academic discipline is perceived unambiguously. The invariant core of the discipline uses words from other areas of knowledge to define its concepts. All concepts from S are divided into basic and auxiliary ones. The basic concepts are the specific concepts of this particular discipline, which are the subject of its definition and study. The auxiliary concepts are borrowed from other areas of knowledge; they are not studied in this discipline, but are used to define the content of the basic concepts. The set of basic concepts of a particular discipline, together with the internal relationships between them, forms a hierarchically ordered network of knowledge, the nodes of which are the identifiers of the basic concepts.
Thus, the knowledge system can be represented in the form of a hierarchical directed graph, that is, a semantic knowledge network.
The semantic knowledge network building algorithm involves several steps:
(1) Writing down all the basic terms of the subject area and formulating their definitions (composing the thesaurus of the subject area).
(2) Selecting the terms from the list that appear in the definitions of the other terms listed in step 1.
(3) At the lowest (I) level, arranging the terms whose definitions do not use any terms from the list.
(4) At the next (II) level, arranging the terms whose definitions use the terms of level I.
(5) At level III, arranging the terms whose definitions use the terms of levels I and II, etc.
(6) At the last level, arranging the terms that are not used in the definitions of other terms.
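The levelling steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `assign_levels` and `terminal_terms` and the four-term thesaurus are hypothetical, and a term is placed one level above the highest-level term used in its definition, which is one possible reading of steps (3)-(5).

```python
def assign_levels(thesaurus):
    """Map each term to its level (1 = lowest).

    thesaurus: dict mapping a term to the set of listed terms that
    appear in its definition (the outcome of steps 1-2).
    """
    levels = {}
    remaining = set(thesaurus)
    while remaining:
        # A term can be placed once every listed term in its definition has a level.
        ready = {t for t in remaining
                 if all(d in levels for d in thesaurus[t] if d in thesaurus)}
        if not ready:
            raise ValueError("circular definitions among: %s" % sorted(remaining))
        for t in ready:
            deps = [levels[d] for d in thesaurus[t] if d in thesaurus]
            levels[t] = 1 + max(deps, default=0)
        remaining -= ready
    return levels


def terminal_terms(thesaurus):
    """Terms that are not used in the definition of any other term (step 6)."""
    used = set().union(*thesaurus.values()) if thesaurus else set()
    return sorted(t for t in thesaurus if t not in used)


# Hypothetical miniature thesaurus: term -> listed terms used in its definition.
thesaurus = {
    "set": set(),
    "function": {"set"},
    "limit": {"function"},
    "derivative": {"function", "limit"},
}
levels = assign_levels(thesaurus)
last = terminal_terms(thesaurus)
```

Circular definitions cannot be levelled this way, so the sketch reports them instead of guessing a level.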
Visualization of data in a structural network model is the first step, but the strength of the method lies in the ability to extract important knowledge about the system through a statistical analysis of the network topology. It seems that the topology bears an evolutionary and functional imprint [42]. A detailed analysis of the available metrics can be found, for example, in [43]. We consider just a few metrics often used in cognitive model research.
Let us consider the network structure in detail. A network consists of nodes and the links between them, called edges. Nodes are more or less stable entities that do not change over time.
Edges represent relationships, interactions, transactions, exchanges, friendships, proximity, or any other temporary connections that occur between nodes with a certain frequency over a certain period of time.
Edges are important to network analysis because they represent the connectivity that is used to gain insights into the complexity of the network. In a graph database, the relationships between the data are just as important as the data itself.
The giant component is an important notion in network analysis. It is an interconnected constellation that includes most of the nodes in a network.
Clusters are constellations of nodes that are more densely connected to each other than to the rest of the nodes in the network. Clusters represent different subnetworks within a network and can be used to identify the various subcategories present within it.
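As a minimal sketch of how a giant component can be located, the following Python fragment finds the connected components of an undirected network by breadth-first search and takes the largest one. The adjacency dictionary `adj` and the function name are hypothetical illustrations and are not taken from the studied graphs.

```python
from collections import deque


def connected_components(adj):
    """Return a list of node sets, one per connected component."""
    seen, components = set(), []
    for start in adj:
        if start in seen:
            continue
        component, queue = {start}, deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            for neighbour in adj[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    component.add(neighbour)
                    queue.append(neighbour)
        components.append(component)
    return components


# Hypothetical undirected network: node -> set of neighbours.
adj = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"},
    "e": {"f"}, "f": {"e"},
    "g": set(),
}
components = connected_components(adj)
giant = max(components, key=len)  # the largest connected component
```

Note that connected components are not the same as clusters: detecting clusters inside one component requires a community detection method.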
In modern network theory, the number of connections of a node is called its degree (in graph theory, the nodes and links of a network are the vertices and edges of a graph, respectively). A node's degree indicates how many connections it has to the other nodes in the network. The higher the degree of a node, the more "connected" it is, which indicates its relative influence in the network.
The concept of degree is a local characteristic of a graph. The nonlocal, integral structure of a network is described by two concepts: a path and a loop (cycle). A path is a sequence of adjacent nodes and the links between them in which no node is repeated. A loop, or cycle, is a path whose start and end nodes coincide. Connected networks without loops are trees. In a tree, the number of nodes N (the network size) and the number of links L are related as L = N - 1 [23].
Identifying the nodes with the highest degree (also called "hubs") is an important part of network analysis, as it helps identify the most crucial parts of the network. This knowledge can later be used both to improve the network's connectivity (by linking the hubs together) and to disrupt it (by removing the hubs).
Betweenness centrality is another important measure of a node's influence within the whole network. While degree simply shows the number of connections a node has, betweenness centrality shows how often the node appears on the shortest paths between pairs of nodes in the network. Thus, betweenness centrality is often a better measure of global influence, because it takes the whole network into account and not only the node's local connectivity.
A node may have a high degree but low betweenness centrality. This indicates that it is well connected within the cluster it belongs to, but not so well connected to the nodes that belong to the other clusters within the network. Such nodes may have high local influence, but not global influence over the whole network.
Conversely, other nodes may have a low degree but high betweenness centrality. Such nodes may have fewer connections, but the connections they do have link different groups and clusters together, making such nodes influential across the whole network.
In network visualization, node sizes are often scaled by degree or betweenness centrality to indicate the most influential nodes.
Network topology is an important element of network analysis. If we analyse networks on a structural basis, we discover many differences among them. Topological analysis is a tool for studying complex networks based on graph theory.
When performing network analysis and visualization, it is important to classify the topology of the network [44]. This can be done through quantitative analysis of the degree distribution among the nodes and/or through qualitative analysis using various visual graph layouts.
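The contrast between degree and betweenness centrality can be illustrated with a small pure-Python sketch. It uses Brandes' algorithm for unweighted, undirected graphs, which is a standard technique and not necessarily the routine used in this study; the network of two fully connected clusters joined through a bridge node `x` is hypothetical.

```python
from collections import deque
from itertools import combinations


def betweenness(adj):
    """Unnormalised betweenness centrality (Brandes' algorithm, unweighted, undirected)."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # Breadth-first search from s: count shortest paths and record predecessors.
        order = []
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate pair dependencies in reverse breadth-first order.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Every unordered pair was visited twice (once from each endpoint).
    return {v: c / 2 for v, c in bc.items()}


# Hypothetical network: two fully connected clusters joined only through "x".
cluster_a = ["a1", "a2", "a3", "a4"]
cluster_b = ["b1", "b2", "b3", "b4"]
edges = list(combinations(cluster_a, 2)) + list(combinations(cluster_b, 2))
edges += [("a1", "x"), ("x", "b1")]

adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

degree = {v: len(neighbours) for v, neighbours in adj.items()}
bc = betweenness(adj)
```

In this example the node `a2` has degree 3 but zero betweenness, while the bridge `x` has the lowest degree (2) and the highest betweenness (16), since it lies on every shortest path between the two clusters.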
The degree distribution can be a good indicator of the network's topology. If most of the nodes in the network have exactly the same degree, the network is close to a regular one (this may also indicate the presence of a tree-like hierarchical system within the network). If most of the nodes have a similar, average number of connections, with some nodes having more and some fewer (a bell-shaped degree distribution), we are dealing with a random network. Finally, if a small but significant number of nodes have a high degree and the degree distribution has a long, gradually declining tail (a scale-free distribution), the network is scale-free and typically shows small-world properties: a significant number of well-connected hubs are surrounded by less connected satellites, which form clusters. Those clusters are connected to one another via the hubs and the nodes that belong to several communities at once.
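A degree distribution is simple to compute once the network is stored as an adjacency structure. The sketch below, with hypothetical example networks, counts how many nodes have each degree; a ring gives a single-valued (regular) distribution, while a star gives one hub and many low-degree satellites.

```python
from collections import Counter


def degree_distribution(adj):
    """Return a Counter mapping degree -> number of nodes with that degree."""
    return Counter(len(neighbours) for neighbours in adj.values())


# Hypothetical regular network: a ring of six nodes, every node has degree 2.
ring = {i: {(i - 1) % 6, (i + 1) % 6} for i in range(6)}

# Hypothetical hub-and-satellite network: one hub linked to five leaves.
star = {"hub": {1, 2, 3, 4, 5}}
star.update({leaf: {"hub"} for leaf in range(1, 6)})

ring_distribution = degree_distribution(ring)
star_distribution = degree_distribution(star)
```

For real networks the resulting counts would normally be plotted (often on logarithmic axes) to judge whether the tail is long.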
Graph layout is a qualitative means of identifying the topology of a network. A very useful type of layout is Force Atlas, where the most connected nodes with the highest degree are pushed apart from each other, while the nodes that are connected to them but have a lower degree are grouped around those hubs. After several iterations, this sort of layout produces a very readable representation of a network, which can be used to better understand its structural properties and to identify the most influential groups, the differences between them, and structural gaps within networks.
Network motifs are the different types of constellations that emerge within network graphs. They can provide a lot of useful information about the structural nature of networks.
For example, some networks may be composed of dyads, or pairs of nodes (which indicates that the level of overall connectivity is quite low). Other networks can have a high proportion of triads, which usually indicate the presence of feedback loops that make the resulting network formations much more stable. More complex formations include groups of four nodes that can be connected as a sequence or with each other, forming interconnected clusters that can encode levels of complexity beyond simple triad feedback constellations.
It is important to take notice of the network motifs that emerge within a network because it will provide a very good indication of the level of complexity and thus the capacity of the network.
Modularity is a quantitative measure that indicates the presence of distinct communities within a network. If the network's modularity is high, it has a pronounced community structure, which, in turn, means that there is space for plurality and diversity inside. If the modularity is too high, however, it might also indicate that the network consists of many disconnected communities that are not globally connected, making it much less efficient than an interconnected one.
Modularity is computed through an iterative algorithm, which identifies the nodes that are more densely connected to each other than to the rest of the nodes in the network. It then calculates the measure of modularity for the network at large. The higher this measure is, the more distinct those communities of densely connected nodes are. If the modularity measure is 0.4 or above, the community structure in the network is quite pronounced. If it is lower, there are no big differences between the different clusters, and most of the nodes are equally densely connected to each other across the whole network.
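To make the modularity measure concrete, the sketch below evaluates the standard Newman modularity Q for a partition that is already given. It does not perform the iterative community detection itself, and the example of two triangles joined by one edge is hypothetical.

```python
def modularity(edges, community):
    """Newman modularity Q of an undirected, unweighted graph.

    edges: list of (u, v) pairs, each undirected edge listed once.
    community: dict mapping every node to its community label.
    """
    m = len(edges)
    internal = {}    # number of edges inside each community
    degree_sum = {}  # sum of node degrees in each community
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree_sum[cu] = degree_sum.get(cu, 0) + 1
        degree_sum[cv] = degree_sum.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    # Q = sum over communities of (internal edge share) - (expected share).
    return sum(internal.get(c, 0) / m - (d / (2 * m)) ** 2
               for c, d in degree_sum.items())


# Hypothetical network: two triangles joined by a single edge (3, 4).
edges = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)]
community = {1: "A", 2: "A", 3: "A", 4: "B", 5: "B", 6: "B"}
q = modularity(edges, community)  # equals 5/14, roughly 0.357
```

Even for this visibly two-community example Q stays below 0.4, which suggests that the 0.4 threshold should be read as a rule of thumb rather than a strict boundary.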
So far, we've looked at the different measures of connectivity that exist within networks and that help us identify the most influential nodes, clusters, and deduce some basic functional properties of the networks we study.
However, one of the most important aspects of network graphs is that they also let one see the gaps, the empty spaces between the islands. Those gaps are usually referred to as "structural gaps", and it has been shown that bridging them can spur innovation, create the most interesting collaborations, and give rise to new, unexpected ideas.
In other words, "structural gaps" are where creativity and potential are hidden within the network. Therefore, when visualizing a network, it is important to identify those structural gaps and to devise actions that could help bridge different nodes and clusters across those empty spaces within the graph in order to spur creativity and innovation.

Results and analysis
As an example of modeling semantic knowledge networks, we analyze the relationships between the concepts of academic disciplines. It is well known that mastering a discipline is closely connected with the assimilation and comprehension of the thesaurus of course concepts. To assimilate further concepts within a discipline, it is necessary to understand those already learned, often within previously studied disciplines. Therefore, a topical task is to study the dependencies between concepts and to model them using cognitive networks [44].
Fig. 4 shows a fragment of the construction of a semantic knowledge network. To implement the subject area model in the form of a semantic knowledge network, we propose the following algorithm:
(1) Classification of all concepts of the subject area into macro-concepts (classes of concepts), meta-concepts (generalized concepts) and micro-concepts (elementary concepts).
(2) The allocation of common properties, characteristics inherent in each level of concepts.
(3) Highlighting the hallmarks of each level of concepts.
(4) Establishing links between concepts related to the same level.
(5) The allocation of inter-level ties.
We have analysed 125 concepts that are necessary for mastering the "Economic Cybernetics" discipline and the relationships between them (a link means that one concept is needed to master another). We conducted a similar study for 125 concepts of the "Algorithms and Programming" discipline and 125 concepts of the "Calculus" discipline.
The constructed graphs (Fig. 5-7) can be used to identify the most important concepts, which have the highest vertex degree, as well as concepts that lie on the path to studying other important course concepts. The obtained graphs were visualized using the Gephi software [45].
Gephi is a free, open-source, leading visualization and exploration software for all kinds of networks that runs on Windows, Mac OS X, and Linux. It is highly interactive, and the user can easily edit the node and edge shapes and colors to reveal hidden patterns. The aim of Gephi is to assist the user in pattern discovery and hypothesis making through efficient dynamic filtering and iterative visualization routines.
Gephi makes it possible to calculate topological characteristics of the graph, such as:
- nodes and edges (what networks are made of);
- clusters (groups of nodes that are connected);
- degree (the number of connections that a node has);
- betweenness centrality (how influential a node is);
- modularity (community structure).
Gephi comes with a very fast rendering engine and sophisticated data structures for object handling, making it one of the most suitable tools for large-scale network visualization. It offers highly appealing visualizations and, on a typical computer, it can easily render networks of up to 300 000 nodes and 1 000 000 edges. Compared to other tools, it comes with a very efficient multithreading scheme, so users can perform multiple analyses simultaneously without suffering from panel "freezing" issues. In large-scale network analysis, fast layout is a bottleneck, as most sophisticated layout algorithms are CPU- and memory-greedy and require a long running time to complete. While Gephi comes with a great variety of layout algorithms, the OpenOrd [46] and Yifan-Hu [47] force-directed algorithms are the ones most often recommended for large-scale network visualization. OpenOrd, for example, can scale to over a million nodes in less than half an hour, while Yifan-Hu is an ideal option to apply after the OpenOrd layout. Notably, the Yifan-Hu layout can give views that are aesthetically comparable to the ones produced by the widely used but conservative and time-consuming Fruchterman and Reingold algorithm [48]. Other algorithms offered by Gephi are the circular, contraction, dual circle, random, MDS, Geo, Isometric, GraphViz, and Force Atlas layouts. While most of them run in an affordable time, the combination of OpenOrd and Yifan-Hu seems to give the most appealing visualizations. A decent visualization is also produced by the OpenOrd layout algorithm if the user stops the process when about 50-60% of the progress has been completed. Of course, the parameterization of any chosen layout algorithm affects both the running time and the visual result.
In Fig. 5-7 the size of the nodes-concepts of semantic knowledge networks characterizes the degree of importance and fundamentality of the corresponding terms of the academic discipline.
For the obtained graphs, their topological characteristics were calculated and analyzed. The results of the study are shown in Table 1.

Table 1. Comparison of the topological characteristics of the graphs of the relationships between the concepts of the disciplines "Economic Cybernetics" (E), "Algorithms and Programming" (P) and "Calculus" (M).
Let us analyze the values of the measures found (Table 1). The link density is the ratio of the number of edges of a graph to the maximum possible number of edges for the given number of vertices. Thus, the values 0.17 for the "Economic Cybernetics" graph and 0.2 for the "Calculus" graph mean that about 17.3% and 19.5% of the possible edges are present, respectively. The density of the concept graph of "Algorithms and Programming" is lower, 11%, which can be explained by a smaller average number of connections between concepts in this graph.
The highest maximum vertex degree, 121, is found in the concept graph of "Algorithms and Programming". The maximum vertex degree for "Economic Cybernetics" is 111. The minimum vertex degrees in the "Economic Cybernetics" and "Algorithms and Programming" graphs are 3 and 1, respectively, which are close. For "Calculus", the minimum degree is higher, 7, and the maximum degree is 113, which is less than in "Algorithms and Programming" but more than in "Economic Cybernetics".
It also confirms a greater connection between the concepts of the "Economic cybernetics" and "Algorithms and Programming" than the concepts of the "Calculus".
The mean node degree is 21.45 for the "Economic Cybernetics" graph, 13.66 for the "Algorithms and Programming" graph and 24.18 for the "Calculus" graph. This confirms the presence of more connections in the last graph.
The global clustering coefficient of a graph is the ratio of the number of closed triples of vertices (three times the number of triangles, that is, cyclically connected triples) to the total number of connected triples of vertices. The clustering coefficient is 0.4 for the "Economic Cybernetics" graph, 0.33 for the "Algorithms and Programming" graph and 0.59 for the "Calculus" graph. This means that the concepts of the "Calculus" course are more often on the path to mastering other important concepts.
As for the diameters of the graphs, the diameter is 5 for the "Economic Cybernetics" concept graph, 9 for the "Algorithms and Programming" graph and 3 for the "Calculus" graph. The same relationship is observed for the average shortest path lengths, which may indicate the existence of longer paths between the concepts of the "Algorithms and Programming" discipline.
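The reported densities and mean degrees can be cross-checked against each other. Assuming the graphs are simple and undirected with N = 125 nodes each (an assumption, since the paper does not state this explicitly), the density equals the mean degree divided by N - 1. The function name below is hypothetical; the labels E, P and M follow the caption of Table 1.

```python
def density_from_mean_degree(mean_degree, n_nodes):
    """Density of a simple undirected graph: L / (N(N-1)/2) = <k> / (N-1)."""
    return mean_degree / (n_nodes - 1)


N = 125  # number of concepts analysed per discipline
mean_degree = {"E": 21.45, "P": 13.66, "M": 24.18}
density = {name: round(density_from_mean_degree(k, N), 3)
           for name, k in mean_degree.items()}
```

Under this assumption the mean degrees reproduce the stated densities of about 17.3%, 11% and 19.5%, so the two sets of figures are mutually consistent.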
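For reference, the global clustering coefficient in its standard form (transitivity, three times the number of triangles divided by the number of connected triples) can be computed as in the following sketch. The function name and the small example network are hypothetical.

```python
from itertools import combinations


def global_clustering(adj):
    """Transitivity: closed connected triples / all connected triples."""
    closed = 0   # triples whose outer nodes are also linked (3 per triangle)
    triples = 0  # all connected triples, counted once per centre node
    for centre, neighbours in adj.items():
        for u, v in combinations(neighbours, 2):
            triples += 1
            if v in adj[u]:
                closed += 1
    return closed / triples if triples else 0.0


# Hypothetical network: a triangle a-b-c with a pendant node d attached to c.
adj = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
coefficient = global_clustering(adj)  # 3 closed triples out of 5, i.e. 0.6
```

Each triangle contributes three closed triples, one per centre node, which is why the factor of three appears in the usual formula.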
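The diameter and the average shortest path length can both be obtained from breadth-first searches started at every node. The following sketch assumes a connected, undirected, unweighted graph; the function names and the two example networks are hypothetical.

```python
from collections import deque


def bfs_distances(adj, source):
    """Shortest path lengths (in links) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def diameter_and_mean_path(adj):
    """Return (diameter, mean shortest path length) of a connected graph."""
    lengths = [d for s in adj
               for t, d in bfs_distances(adj, s).items() if t != s]
    return max(lengths), sum(lengths) / len(lengths)


# Hypothetical chain a-b-c-d: long paths, diameter 3.
chain = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
# Hypothetical star on four nodes: every pair is at most two links apart.
star = {"hub": {"x", "y", "z"}, "x": {"hub"}, "y": {"hub"}, "z": {"hub"}}

chain_diameter, chain_mean = diameter_and_mean_path(chain)
star_diameter, star_mean = diameter_and_mean_path(star)
```

The chain has the larger diameter and the larger mean path length, which mirrors the qualitative relationship between the two measures noted above.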
The modularity index is less than 0.4, which means that the structure of communities in all three networks is not sufficiently expressed.
In the field of education, there is always a contradiction between the growing amount of scientific information and the limited time allotted for its assimilation. Teaching academic disciplines in higher education requires constant work on educational information in order to move from extensive to intensive teaching methods. One of the ways to intensify the educational process can be the optimal "packaging" of educational information.
The solution to this problem is the construction of a semantic network. An important condition for the successful mastering of educational material is the ability of the teacher to highlight the key issues of the program. The nodal issues of the program are the basis for studying the whole topic. Their significance can be determined using a graph or an adjacency matrix.
For example, let a topic contain 6 questions, with the logical connections between them presented in the form of an adjacency matrix (Table 2).
      P1   P2   P3   P4   P5   P6   B
P1    0    1    1    0    0    1    3/6
P2    0    0    1    1    1    1    4/6
P3    0    0    0    1    1    0    2/6
P4    0    0    0    0    1    0    1/6
P5    0    0    0    0    0    0    0
P6    0    0    0    1    0    0    1/6
The significance of a question can be characterized by the weight coefficient determined by the formula
$B = S_i / k$,
where $S_i$ is the number of references to the i-th question when studying the others contained in this topic, and $k$ is the total number of questions in this section. The larger the coefficient, the greater the significance of the question. In the same way, it is possible to determine the importance of a discipline (section) in the study of all disciplines of the curriculum. A similar technique can be used in forming the content of academic subjects on the basis of discipline standards, in developing curricula and tests, and in selecting and organizing educational information for training.
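The weight coefficients of Table 2 can be reproduced with a short Python sketch. It assumes that the count S_i for a question is the sum of the corresponding row of the adjacency matrix, which is the reading that matches the B column of the table; the variable names are hypothetical.

```python
from fractions import Fraction

questions = ["P1", "P2", "P3", "P4", "P5", "P6"]
# Adjacency matrix of Table 2: one row per question.
matrix = [
    [0, 1, 1, 0, 0, 1],
    [0, 0, 1, 1, 1, 1],
    [0, 0, 0, 1, 1, 0],
    [0, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 0, 0],
]

k = len(questions)  # total number of questions in the topic
# B = S_i / k, with S_i taken as the row sum (assumption, see above).
weights = {q: Fraction(sum(row), k) for q, row in zip(questions, matrix)}
# Questions ordered from most to least significant (ties keep table order).
ranking = sorted(questions, key=lambda q: weights[q], reverse=True)
```

With these data the question P2 receives the largest weight, 4/6, so it would be treated as the nodal question of the topic.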

Conclusions
Algorithms for the formation of a semantic knowledge network are developed. The knowledge network is the basic concept of knowledge management. In fact, we introduce a new discipline that implements the principles of sustainable development of education. The method of constructing a semantic knowledge network of terms allows forming an adjacency matrix that reflects the correlation of terms from a terminological dictionary. This matrix makes it possible to evaluate the quality of the terminology in a particular discipline, as well as to quantify the semantic connectivity of the whole tutorial. According to the obtained results, we can conclude that the concept system in "Economic Cybernetics" is connected and complex. This means that, when studying any concept, it is necessary to revisit the meaning of those already studied. The concept system in "Algorithms and Programming" contains fewer dependencies and less connectivity in comparison with the other graphs. However, the experience of studying these disciplines indicates that "Algorithms and Programming" is also not easy to learn. Further, the problem of planning the learning process based on semantic knowledge networks will be studied. Namely, the distribution of lectures, practical and laboratory exercises will be determined so as to achieve the learning objectives successfully. In future work, we will calculate spectral characteristics of the graphs for the studied disciplines, as was done in [50,51].

Fig. 2. The relationship of the concept, subject and word denoting this subject [41].

Fig. 3. The relationship of various concepts in the human mind [41].

Fig. 5. The semantic knowledge network of the course concepts "Economic Cybernetics".

Fig. 6. The semantic knowledge network of the course concepts "Algorithms and Programming".

Fig. 7. The semantic knowledge network of the course concepts "Calculus".