Knowledge Graphs and Deep Learning
The arrival of the big data era has brought an unprecedented data dividend to the rapid development of artificial intelligence. "Fed" by big data, AI technology has made remarkable progress, most prominently in knowledge graphs and in related fields such as deep learning and machine learning. As deep learning gradually exhausts the dividend of big data, the performance ceiling of deep learning models draws ever closer. At the same time, large-scale knowledge graphs keep emerging. These treasure troves, which contain a great deal of human prior knowledge, have not yet been effectively exploited by deep learning. Combining knowledge graphs with deep learning has therefore become one of the most important ways to further improve deep learning models. The symbolism represented by knowledge graphs and the connectionism represented by deep learning are increasingly leaving the tracks they once followed independently and embarking on a new path of collaboration.
The historical background of integrating knowledge graphs and deep learning
Big data brings an unprecedented data dividend to machine learning, especially deep learning. Thanks to large amounts of labeled data, deep neural networks can learn effective hierarchical feature representations and thus achieve excellent results in areas such as image recognition. However, as the data dividend fades, deep learning increasingly shows its limitations: it relies on large amounts of data and has difficulty making effective use of prior knowledge. These limitations hinder its further development. Moreover, in many applications of deep learning, people increasingly find that model outputs often contradict human prior knowledge or expert knowledge. How can deep learning shed its dependence on large numbers of samples? How can deep learning models make effective use of the large body of existing prior knowledge? How can the gap between model outputs and prior knowledge be reconciled? These have become important questions for deep learning today.
Human society has by now accumulated a great deal of knowledge. In particular, in recent years, with advances in knowledge graph technology, a large number of machine-friendly knowledge graphs have come online. A knowledge graph is essentially a semantic network that expresses entities, concepts, and the semantic relationships among them. Compared with traditional knowledge representations (such as ontologies and classical semantic networks), knowledge graphs offer broader entity/concept coverage, richer semantic relationships, a standardized structure (usually expressed in RDF), and higher quality, making them the most important form of knowledge representation in the era of big data and artificial intelligence. Whether the knowledge contained in knowledge graphs can be used to enhance the performance of deep neural network models has become one of the important open questions for deep learning.
At this stage, applying deep learning techniques to knowledge graphs is relatively straightforward: many deep learning models can effectively accomplish end-to-end tasks such as entity recognition, relation extraction, and relation completion, and can therefore be used to construct or enrich knowledge graphs. This article mainly discusses the opposite direction: applying knowledge graphs in deep learning models. The current literature shows two main approaches. The first feeds the semantic information of the knowledge graph into the deep learning model as input: the discrete knowledge graph is represented as continuous vectors, so that its prior knowledge becomes an input to deep learning. The second applies knowledge as a constraint on the optimization objective to guide the learning of the deep model; this is usually expressed as a posterior regularization term in the objective. The former already has a substantial literature and has become a research hotspot; knowledge graph vector representations serve as important features in practical tasks such as question answering and recommendation. Work on the latter has only just begun; this article focuses on a deep learning model constrained by first-order predicate logic.
Knowledge graphs as input to deep learning
The knowledge graph is a typical representative of recent progress in the symbolist school of artificial intelligence. The entities, concepts, and relationships in a knowledge graph are all discrete, explicit symbolic representations, which are difficult to feed directly into neural networks based on continuous numerical representations. To let neural networks make effective use of the symbolic knowledge in knowledge graphs, researchers have proposed many representation learning methods for knowledge graphs. These methods aim to learn real-valued vector representations of the graph's constituent elements (nodes and edges). The resulting continuous vectors can serve as neural network inputs, enabling the model to exploit the large amount of prior knowledge in the graph. This trend has spawned a great deal of research on knowledge graph representation learning. This section first briefly reviews knowledge graph representation learning, then introduces how these vector representations are applied in practical tasks based on deep learning models, especially question answering and recommendation.
1. Knowledge graph representation learning
Knowledge graph representation learning produces vector representations of entities and relations. The key is to reasonably define a loss function ƒr(h,t) for each fact (triple <h,r,t>) in the knowledge graph, where h and t here also denote the vector representations of the triple's two entities. Normally, when the fact <h,r,t> holds, we want ƒr(h,t) to be small. Considering all the facts of the entire knowledge graph, the entity and relation vectors can then be learned by minimizing the total loss Σ<h,r,t>∈O ƒr(h,t), where O denotes the set of all facts in the knowledge graph. Different representation learning models use different criteria and methods to define the loss function. Below we introduce the basic ideas of margin-based and translation-based models [1].
Margin-based models. The representative work is the SE model [2]. Its basic idea is that when two entities belong to the same triple <h,r,t>, their vector representations should also be close to each other in a projected space. The loss function is therefore defined as the distance between the projected vectors, ƒr(h,t) = |Wr,1h − Wr,2t|, where Wr,1 and Wr,2 are the projection matrices for the head entity h and the tail entity t of the triple. However, because SE introduces two separate projection matrices, it has difficulty capturing the semantic correlation between entities and relations. Socher et al. addressed this by using a third-order tensor to replace the linear transformation layer of the traditional neural network in the scoring function. Bordes et al. proposed the semantic matching energy model, which introduces Hadamard products of multiple matrices to capture the interaction between entity vectors and relation vectors.
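As a concrete illustration of the projection idea, the SE scoring function can be sketched in a few lines of NumPy. The vectors and matrices below are random stand-ins for parameters that would be learned in practice:

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 20

# Toy, randomly initialized parameters; in a real system
# h, t, W_r1, W_r2 are all learned from the knowledge graph.
h = rng.normal(size=dim)            # head entity vector
t = rng.normal(size=dim)            # tail entity vector
W_r1 = rng.normal(size=(dim, dim))  # projection matrix for the head
W_r2 = rng.normal(size=(dim, dim))  # projection matrix for the tail

def se_score(h, t, W_r1, W_r2):
    """SE scoring function f_r(h, t) = ||W_r1 h - W_r2 t||_1.
    Lower scores mean the two projected entities lie closer together,
    i.e. the triple <h, r, t> is considered more plausible."""
    return np.linalg.norm(W_r1 @ h - W_r2 @ t, ord=1)
```

Note how the score vanishes when the two projections coincide, which is exactly the "close in the projected space" criterion described above.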
Translation-based representation learning. The representative work, the TransE model, describes the correlation between entities and relations through vector translation in the embedding space [3]. The model assumes that if <h,r,t> holds, the embedding of the tail entity t should be close to the embedding of the head entity h plus the relation vector r, i.e. h + r ≈ t. TransE therefore uses ƒr(h,t) = |h + r − t| as its scoring function: the score is low when the triple holds and high otherwise. TransE is very effective for simple 1-1 relations (where the ratio of entities connected at the two ends of the relation is 1:1), but its performance drops significantly for complex 1-N, N-1, and N-N relations. For these complex relations, Wang et al. proposed the TransH model, which projects entities onto the hyperplane associated with each relation, learning different entity representations under different relations. Lin et al. proposed the TransR model, which projects entities into relation-specific subspaces through projection matrices, likewise obtaining different entity representations under different relations.
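The translation assumption h + r ≈ t, together with the margin-based ranking loss commonly used to train such models, can be sketched as follows (the embeddings are toy values, not learned ones):

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 50

# Hypothetical toy vectors: the tail is constructed so that h + r = t
# holds exactly, making the positive triple score zero.
h = rng.normal(size=dim)
r = rng.normal(size=dim)
t = h + r
t_corrupt = rng.normal(size=dim)  # a corrupted (negative) tail entity

def transe_score(h, r, t):
    """TransE scoring function f_r(h, t) = ||h + r - t||_2.
    A plausible triple scores low; a corrupted one scores high."""
    return np.linalg.norm(h + r - t)

def margin_ranking_loss(pos, neg, margin=1.0):
    """Margin-based ranking loss typically used to train translation
    models: push positive scores at least `margin` below negative ones."""
    return max(0.0, margin + pos - neg)
```

Training then minimizes the ranking loss over observed triples and sampled corruptions, driving true triples toward h + r ≈ t.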
Besides these two typical families of knowledge graph representation learning models, there are many others. For example, Sutskever et al. use tensor factorization and Bayesian clustering to learn relational structure, and Ranzato et al. introduced a three-way restricted Boltzmann machine to learn vector representations of the knowledge graph, parameterized via a tensor.
Current mainstream knowledge graph representation learning approaches still have a variety of problems: they cannot adequately capture the semantic correlation between entities and relations; they handle complex relations poorly; the introduction of many parameters makes models overly complex; and their low computational efficiency makes it hard to scale to large knowledge graphs. To better provide prior knowledge for machine learning and deep learning, the representation learning of knowledge graphs remains a long-term research topic.
2. Applications of knowledge graph embeddings
Application 1: question answering systems. Natural language question answering is an important form of human-computer interaction. Deep learning makes generative question answering based on question-answer corpora possible, but most current deep QA models still find it difficult to exploit large amounts of knowledge to produce accurate answers. Yin et al. proposed a deep-learning QA model based on the encoder-decoder framework that incorporates knowledge graphs [4]. In deep neural networks, the semantics of a question is usually expressed as a vector, and questions with similar vectors are considered to have similar semantics; this is the typical connectionist approach. The knowledge in a knowledge graph, by contrast, is discrete, with no soft associations between pieces of knowledge; this is the typical symbolist approach. Once the knowledge graph is vectorized, a question can be matched against triples (i.e. their vector similarity computed) to find the triple in the knowledge base that best matches a given question. The matching process is shown in Figure 1. For the question Q "How tall is Yao Ming?", the words of the question are first encoded as a sequence of vectors HQ; candidate triples are then retrieved from the knowledge graph; finally, for each candidate triple, the semantic similarity between the question and its attributes is computed by the similarity formula S(Q,τ) = xQᵀMuτ, where S(Q,τ) is the similarity between question Q and candidate triple τ, xQ is the question vector (computed from HQ), uτ is the vector of the triple from the knowledge graph, and M is a parameter matrix to be learned.
Figure 1 Neural question answering model based on the knowledge graph
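The bilinear matching score S(Q,τ) = xQᵀMuτ described above can be sketched as follows; the dimensions, vectors, and candidate set are hypothetical placeholders for what the question encoder and the KG embedding model would actually produce:

```python
import numpy as np

rng = np.random.default_rng(1)
dq, dt = 64, 32  # hypothetical dimensions of question and triple vectors

x_q = rng.normal(size=dq)              # question vector pooled from H_Q
M = rng.normal(size=(dq, dt))          # learned bilinear interaction matrix
candidates = rng.normal(size=(5, dt))  # embeddings of 5 candidate triples

def similarity(x_q, M, u_tau):
    """Bilinear matching score S(Q, tau) = x_Q^T M u_tau."""
    return float(x_q @ M @ u_tau)

# Score every candidate and answer with the best-matching triple.
scores = [similarity(x_q, M, u) for u in candidates]
best = int(np.argmax(scores))
```

The answer is then read off the attribute value of the highest-scoring triple.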
Application 2: recommender systems. Personalized recommendation is one of the important intelligent services of Internet social media and e-commerce websites. As knowledge graph applications spread, many researchers have realized that the knowledge in knowledge graphs can improve the content (feature) descriptions of users and items in content-based recommendation, thereby improving recommendation quality. Meanwhile, recommendation algorithms based on deep learning increasingly outperform traditional models based on collaborative filtering [5]. However, studies that integrate knowledge graphs into a deep learning recommendation framework are still relatively rare. Zhang et al. made such an attempt, applying three types of knowledge: structural knowledge (the knowledge graph), textual knowledge, and visual knowledge (images) [6]. They obtained vector representations of the structural knowledge through network embedding, then used a stacked denoising autoencoder (SDAE) and a stacked convolutional autoencoder to extract textual and visual features, respectively. Finally, the three types of features are merged in a collaborative ensemble learning framework that integrates them to produce personalized recommendations. The authors conducted experiments on movie and book datasets and showed that the proposed algorithm, which blends deep learning and knowledge graphs, performs better.
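A minimal sketch of the feature-fusion idea, assuming simple concatenation and dot-product scoring in place of the jointly learned collaborative ensemble framework of [6] (the real fusion there is trained end to end):

```python
import numpy as np

def fuse_item_features(structural, textual, visual):
    """Concatenate the three knowledge sources into one item vector.
    In [6] these inputs would come from network embedding, an SDAE,
    and a stacked convolutional autoencoder, respectively; here they
    are just placeholder arrays."""
    return np.concatenate([structural, textual, visual])

def score(user_vec, item_vec):
    """Rank items for a user by a simple dot product (a stand-in for
    the learned ranking component)."""
    return float(user_vec @ item_vec)
```

Recommendation then amounts to scoring every item's fused vector against the user vector and returning the top results.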
Knowledge graphs as constraints for deep learning
Hu et al. proposed a model that incorporates first-order predicate logic into deep neural networks and applied it to problems such as sentiment classification and named entity recognition [7]. Logic rules are a flexible representation of high-level cognition and structured knowledge, and a typical form of knowledge representation. Incorporating the various logic rules summarized by humans into deep neural networks, so that human intent and domain knowledge can guide the models, is of great significance. Earlier studies attempted to introduce logic rules into probabilistic graphical models, the representative work being Markov logic networks [8], but little work has introduced logic rules into deep neural networks.
The overall framework proposed by Hu et al. can be summarized as a "teacher-student network", as shown in Figure 2, consisting of two parts: a teacher network q(y|x) and a student network pθ(y|x). The teacher network models the knowledge expressed by the logic rules, while the student network uses back propagation, subject to the teacher network's constraints, to learn those rules. This framework can introduce logic rules into most tasks modeled by deep neural networks, including sentiment analysis and named entity recognition. After introducing logic rules, results improved over the plain deep neural network baselines.
Figure 2 The "teacher-student network" model that introduces logic rules into deep neural networks
Its learning process mainly includes the following steps:
1. Use soft logic to relax each logic rule into a continuous truth value in [0, 1].
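Soft logic is commonly realized with the Łukasiewicz t-norm, which relaxes the Boolean connectives to values in [0, 1] while agreeing with classical logic at the endpoints 0 and 1. A minimal sketch:

```python
# Łukasiewicz soft logic: truth values are real numbers in [0, 1].

def l_and(a, b):
    """Soft conjunction: max(0, a + b - 1)."""
    return max(0.0, a + b - 1.0)

def l_or(a, b):
    """Soft disjunction: min(1, a + b)."""
    return min(1.0, a + b)

def l_not(a):
    """Soft negation: 1 - a."""
    return 1.0 - a

def l_implies(a, b):
    """Soft implication a => b: min(1, 1 - a + b).
    Fully true whenever the conclusion is at least as true as the premise."""
    return min(1.0, 1.0 - a + b)
```

A grounded rule's soft truth value, computed this way, is what feeds into the rule expectations in the teacher-network objective below it in the procedure.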
2. Based on posterior regularization, project the student network into the subspace constrained by the logic rules to obtain the teacher network, keeping the teacher and student networks as close as possible. The resulting optimization problem is:
min(q, ξ≥0) KL(q(Y|X) ‖ pθ(Y|X)) + C Σl,gl ξl,gl
s.t. λl (1 − Eq[rl,gl(X, Y)]) ≤ ξl,gl,  l = 1, …, L,  gl = 1, …, Gl
Among them, ξl,gl are slack variables, L is the number of rules, and Gl is the number of groundings of the l-th rule. The KL term (Kullback-Leibler divergence) keeps the distributions of the teacher network and the student network as close as possible, while the constraint terms encode the logic rules.
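For intuition, the projection step admits a closed-form teacher for a single rule and a single example: q(y|x) ∝ pθ(y|x) · exp(−C · λ · (1 − r(x, y))), so labels that violate the rule (low soft truth r) are down-weighted. A hedged numerical sketch of that form, with toy distributions in place of real network outputs:

```python
import numpy as np

def teacher_distribution(p_student, rule_truths, C=1.0, lam=1.0):
    """Construct the teacher q(y|x) by reweighting the student's
    distribution with an exponential rule penalty:
        q(y|x) ∝ p_θ(y|x) · exp(-C · λ · (1 - r(x, y))).
    `rule_truths[y]` is the soft truth value of the rule for label y."""
    log_q = np.log(p_student) - C * lam * (1.0 - rule_truths)
    q = np.exp(log_q - log_q.max())  # stabilized normalization
    return q / q.sum()
```

When the rule is indifferent between labels (all truth values equal), the teacher coincides with the student; when the rule favors one label, the teacher's mass shifts toward it.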
3. Train the student network so that its predictions fit the ground-truth labels while staying as close as possible to the teacher network's predictions; the optimization function is as follows:
θ(t+1) = argminθ (1/N) Σn [(1 − π) l(yn, σθ(xn)) + π l(sn(t), σθ(xn))]
Among them, t is the training round, l is the task-specific loss function (for example, cross entropy in classification problems), σθ is the student network's prediction function, sn(t) is the teacher network's prediction (a soft target) at round t, and π balances the ground-truth term against the imitation term.
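A sketch of one example's contribution to the student objective, combining the ground-truth cross entropy with imitation of the teacher's soft prediction (π, the balancing weight, follows the knowledge-distillation convention and is an assumption here):

```python
import numpy as np

def student_loss(y_true, p_student, p_teacher, pi=0.5):
    """One example's student loss: a weighted sum of
    (a) cross entropy against the ground-truth label y_true, and
    (b) cross entropy against the teacher's soft prediction s_n(t)."""
    ce_true = -np.log(p_student[y_true])              # hard-label term
    ce_teacher = -np.sum(p_teacher * np.log(p_student))  # imitation term
    return float((1.0 - pi) * ce_true + pi * ce_teacher)
```

Setting pi=0 recovers ordinary supervised training; pi=1 trains the student purely to imitate the rule-aware teacher.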
4. Repeat steps 1 to 3 until convergence.
Conclusion
As research on deep learning deepens, how to effectively exploit large amounts of prior knowledge, and thereby reduce models' dependence on large-scale labeled samples, is gradually becoming one of the mainstream research directions. Knowledge graph representation learning has laid the necessary groundwork for exploration in this direction, and the recent pioneering work on integrating knowledge into deep neural network models is likewise instructive. On the whole, however, current deep learning models' ability to use prior knowledge remains very limited, and the research community still faces great challenges in this direction. These challenges are mainly manifested in two aspects: