Domain Concepts Relationships Discovery

A method for automating the authoring of adaptive course metadata.

Technologies used: Java, XML
Inputs: E-course learning objects
Outputs: E-course domain model

Addressed Problems

The quality of an educational system strongly depends on its ability to deliver personalized content to the student. The more effective the education is meant to be, the more sophisticated the adaptation mechanisms it requires. This complexity carries over into e-course authoring, which is a major obstacle to the wider adoption and use of adaptive educational systems.

To support teachers in authoring adaptive content, it would be useful to create a part of the domain model (semi)automatically. Although the problem of intelligent course content creation has been identified, we know of no approach that sufficiently addresses it [1].


The method of automated course metadata generation is based on knowledge discovery techniques. First, the input learning objects are preprocessed and their vector representation is created. The terms with the highest relevance are then extracted; these are considered pseudoconcepts (candidates to become concepts). In the next step, relationships between concepts are generated. Finally, the composed domain model is checked and finalized by a teacher.
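
The first two steps can be illustrated with a minimal Java sketch. It is not the authors' implementation: the text does not say how term relevance is measured, so TF-IDF is assumed here, and the class and method names (`PseudoconceptExtractor`, `tfIdfVectors`, `topTerms`) are hypothetical. Learning objects are assumed to be already tokenized.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: TF-IDF is an ASSUMED relevance measure, not one confirmed by the source.
final class PseudoconceptExtractor {

    /** Builds one sparse vector (term -> TF-IDF weight) per tokenized learning object. */
    static List<Map<String, Double>> tfIdfVectors(List<List<String>> learningObjects) {
        // Document frequency: the number of learning objects in which each term occurs.
        Map<String, Integer> df = new HashMap<>();
        for (List<String> tokens : learningObjects) {
            for (String term : new HashSet<>(tokens)) {
                df.merge(term, 1, Integer::sum);
            }
        }
        int n = learningObjects.size();
        List<Map<String, Double>> vectors = new ArrayList<>();
        for (List<String> tokens : learningObjects) {
            Map<String, Double> vector = new HashMap<>();
            for (String term : tokens) {
                vector.merge(term, 1.0, Double::sum); // raw term count
            }
            for (Map.Entry<String, Double> e : vector.entrySet()) {
                double idf = Math.log((double) n / df.get(e.getKey()));
                e.setValue(e.getValue() / tokens.size() * idf); // normalized tf * idf
            }
            vectors.add(vector);
        }
        return vectors;
    }

    /** Returns the k terms with the highest weight summed over all learning objects. */
    static List<String> topTerms(List<Map<String, Double>> vectors, int k) {
        Map<String, Double> total = new HashMap<>();
        for (Map<String, Double> v : vectors) {
            v.forEach((term, w) -> total.merge(term, w, Double::sum));
        }
        List<String> terms = new ArrayList<>(total.keySet());
        // Highest weight first; ties are broken alphabetically for a stable result.
        terms.sort(Comparator.comparingDouble((String t) -> -total.get(t))
                .thenComparing(Comparator.naturalOrder()));
        return terms.subList(0, Math.min(k, terms.size()));
    }
}
```

In this sketch a term that occurs in every learning object gets weight zero, so it is unlikely to be selected as a pseudoconcept. A real pipeline would probably also apply stop-word removal and lemmatization during preprocessing.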

[Figure: method overview]

A crucial part of the method is the computation of relatedness between pseudoconcepts. Several variants were proposed for this purpose.

Each variant looks at the current state of the domain model from a different point of view. The vector approach injects the learning objects' vectors into the pseudoconcept representation, and relatedness is computed using the well-known cosine similarity metric. The spreading activation variant treats the domain model as a contextual network: after spreading activation is applied, the mutual relatedness of pseudoconcepts is computed from the activation energy values associated with them. The third variant is similar to querying the web: the result for each query (i.e. each pseudoconcept) is a list of the most relevant (related) pseudoconcepts. The idea of combining the variants derives from the assumption that PageRank-based analysis could be applied to the outputs of the first two variants.
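
The first two variants can be sketched in Java as follows. This is an illustration under stated assumptions, not the published algorithm: the sparse map representation, the `decay` factor, the fixed number of propagation steps and the names `Relatedness`, `cosine` and `spread` are all hypothetical.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; the decay factor and fixed step count are ASSUMPTIONS, not taken from the source.
final class Relatedness {

    /** Cosine similarity of two sparse vectors (dimension -> weight); 0 if either vector is empty. */
    static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0.0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            dot += e.getValue() * b.getOrDefault(e.getKey(), 0.0);
        }
        double denominator = norm(a) * norm(b);
        return denominator == 0.0 ? 0.0 : dot / denominator;
    }

    private static double norm(Map<String, Double> v) {
        double sum = 0.0;
        for (double w : v.values()) {
            sum += w * w;
        }
        return Math.sqrt(sum);
    }

    /**
     * Spreads activation from one source node over a weighted directed graph
     * (node -> neighbour -> edge weight) and returns the energy accumulated per node.
     * The energy that reaches another pseudoconcept can be read as its relatedness to the source.
     */
    static Map<String, Double> spread(Map<String, Map<String, Double>> graph,
                                      String source, double decay, int steps) {
        Map<String, Double> energy = new HashMap<>();
        Map<String, Double> frontier = new HashMap<>();
        energy.put(source, 1.0);
        frontier.put(source, 1.0);
        for (int i = 0; i < steps; i++) {
            Map<String, Double> next = new HashMap<>();
            for (Map.Entry<String, Double> node : frontier.entrySet()) {
                Map<String, Double> edges =
                        graph.getOrDefault(node.getKey(), Collections.emptyMap());
                for (Map.Entry<String, Double> edge : edges.entrySet()) {
                    // Energy passed along an edge is attenuated by the edge weight and the decay.
                    double passed = node.getValue() * edge.getValue() * decay;
                    next.merge(edge.getKey(), passed, Double::sum);
                    energy.merge(edge.getKey(), passed, Double::sum);
                }
            }
            frontier = next;
        }
        return energy;
    }
}
```

In the vector variant, `cosine` would be called on two pseudoconcept vectors built from the learning objects in which they occur. In the spreading activation variant, `spread` would be run once per pseudoconcept, and the energies reaching the other pseudoconcepts ranked.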


  1. Šimko, M., Bieliková, M. Automated Educational Course Metadata Generation Based on Semantics Discovery. In LNCS 5794, U. Cress, V. Dimitrova, and M. Specht (Eds.), Proc. of European Conf. on Technology Enhanced Learning, EC TEL 2009, Springer, pp. 99–105.
  2. Šimko, M., Bieliková, M. Automatic Concept Relationships Discovery for an Adaptive E-course. In Proc. of Educational Data Mining 2009: 2nd International Conference on Educational Data Mining. Barnes, T., Desmarais, M., Romero, C., Ventura, S. (Eds.), Cordoba, Spain, pp. 171–179.