Semantic & Syntactic Interoperability for Learning Object Metadata

This paper will appear in Metadata in Practice (D. Hillmann, ed.). Chicago: ALA Editions.

Overview

Introduction

With its recent approval as a standard (IEEE 2002) and its subsequent submission to ISO for further development (ISO 2003), the Learning Object Metadata (LOM) data model has achieved a level of stability and international recognition requisite to its implementation in large-scale e-learning infrastructures.  As is the case with the Dublin Core Metadata Element Set (DCMES), the consensus represented and codified in the LOM data model provides implementers and developers with a common foundation for achieving interoperability.  As a part of this development and implementation process, the LOM is being refined and adapted by a wide variety of consortia and projects to meet the requirements of specific communities and domains.  The CanCore Learning Object Metadata Application Profile represents one such effort at adaptation that has been undertaken in Canada --but whose relevance extends beyond Canada's borders.  This chapter provides an overview of the LOM data model, and it undertakes a comparison of CanCore to other application profiling efforts.  It discusses CanCore's contribution in terms of what has been called "semantic interoperability," and describes challenges presented by what is known as "syntactic" or "technical interoperability."

Characteristics of the LOM Data Model

Unlike the DCMES, the LOM focuses on the description of modular, reusable and specifically educational resources (or "learning objects") to facilitate their use by educators, authors, learners, and managers (IEEE, 2002).  In further contradistinction to the DCMES, the LOM undertakes this task through what has been called a "structuralist" rather than a "minimalist" approach to metadata (see Weibel, Iannella & Cathro 1997).  Instead of presenting a relatively simple data model that defines a minimal number of elements, the LOM identifies 76 data elements, covering a wide variety of characteristics attributable to learning objects, and places these elements in interrelationships that are both hierarchical and iterative. 

At the top of the hierarchy of LOM elements are nine broad "category" elements: General, Lifecycle, Meta-metadata, Technical, Educational, Rights, Relation, Annotation and Classification.  These category elements --and the elements subsumed beneath them-- can be described briefly as follows.  The General category is said to apply to the "learning object as a whole," and defines many data elements that have equivalents in the DCMES (such as title, description and coverage).  The second category, Lifecycle, uses a hierarchical "Contribute" element construction, along with the "Electronic Business Card" (vCard) data model and encoding format, to record the roles and identities of various contributors.  In the Meta-metadata category, this Contribute construction recurs in slightly modified form to attribute the creation and validation of the metadata record itself.  The Technical category indicates the format, size and other so-called "objective" (Wiley, Recker & Gibbons, 2000) characteristics of the learning object.  This category also provides a "requirements" element construction that allows for machine-readable statements about the specific technical supports needed to use the object.  The Educational category focuses on more "subjective" characteristics of the object (Wiley, Recker & Gibbons, 2000), indicating audience attributes such as age, institutional context and role (among other things).  This category also provides elements that stand in complex, orthogonal interrelationships, describing the type and level of interactivity provided by the object, as well as the degree of concision of its contents.  The Rights and Relation categories are simple by comparison, using a half-dozen or fewer elements each to indicate legal terms and conditions for the use of the learning object, as well as its possible relations to other resources.

Annotation similarly employs only four elements to "enable educators to share their assessments[,] suggestions" and other "comments on the educational use of [the] learning object."  Classification, the last category, provides nine intricately structured elements --taxon source, taxon path, taxon identifier, taxon entry and others-- that can be adapted to almost any classification or taxonomic purpose.  Among the specific purposes recommended for this element group (as suggested by the recommended "vocabulary" values) are "ideas," "prerequisites," "educational objectives," "educational levels" and "competencies."
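
The nesting and repetition described above can be pictured as ordinary nested data. The following Python sketch is illustrative only: the element names are abbreviated paraphrases of LOM element names rather than normative identifiers, each category lists only some of its elements, and the sample record (its vCard, taxonomy name and identifiers) is hypothetical.

```python
# Illustrative sketch: the nine LOM categories with some of the elements
# discussed in the text (names paraphrased, lists deliberately incomplete).
LOM_CATEGORIES = {
    "General": ["Title", "Description", "Coverage"],
    "Lifecycle": ["Contribute"],        # Contribute = Role + Entity (vCard) + Date
    "Meta-metadata": ["Contribute"],    # attribution for the metadata record itself
    "Technical": ["Format", "Size", "Requirement"],
    "Educational": ["Interactivity Type", "Interactivity Level",
                    "Semantic Density", "Intended End User Role",
                    "Context", "Typical Age Range"],
    "Rights": ["Cost", "Copyright and Other Restrictions", "Description"],
    "Relation": ["Kind", "Resource"],
    "Annotation": ["Entity", "Date", "Description"],
    "Classification": ["Purpose", "Taxon Path", "Description", "Keyword"],
}

# A hypothetical record fragment: Python lists mark elements that may repeat.
record = {
    "Lifecycle": {
        "Contribute": [
            {"Role": "author",
             "Entity": ["BEGIN:VCARD\nVERSION:3.0\nFN:Jane Doe\nEND:VCARD"],
             "Date": "2003-01-15"},
        ],
    },
    "Classification": [
        {"Purpose": "discipline",
         "Taxon Path": [
             {"Source": "Example Taxonomy",
              "Taxon": [{"Id": "100", "Entry": "Science"},
                        {"Id": "110", "Entry": "Biology"}]},
         ]},
    ],
}


def taxon_entries(rec):
    """Yield (purpose, entry) pairs by walking each level of repetition."""
    for classification in rec.get("Classification", []):
        for path in classification["Taxon Path"]:
            for taxon in path["Taxon"]:
                yield classification["Purpose"], taxon["Entry"]


print(list(taxon_entries(record)))
# [('discipline', 'Science'), ('discipline', 'Biology')]
```

Note that even reading one value requires three nested loops, one per level at which the LOM permits repetition.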

The LOM data elements are repeatable in different combinations and at different levels within their hierarchical constructions.  In the Classification category, for example, the category as a whole (together with all the elements it contains) should be repeatable at least 40 times.  For each of these repetitions, at least 15 different taxon paths can be specified; and for each taxon path, it should be possible to accommodate at least 15 taxon identifier-entry pairs.  This means that, at a minimum, systems storing and processing LOM-compliant records should be able to accommodate 40 × 15 × 15 = 9000 taxon identifier-value pairs. 
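
The 9000 figure follows because nested repetition limits multiply. The short Python check below uses the three limits from the paragraph above; the constant and function names are illustrative and are not taken from the LOM standard.

```python
import math

# Minimum repetition limits at each nested level of the Classification
# category, as given in the text (constant names are illustrative).
CLASSIFICATION_MAX = 40  # Classification category, repeated as a whole
TAXON_PATH_MAX = 15      # Taxon Path within each Classification
TAXON_MAX = 15           # Taxon (identifier + entry) within each Taxon Path


def min_taxon_capacity(*limits: int) -> int:
    """Number of leaf identifier/entry pairs implied by nested limits."""
    return math.prod(limits)


print(min_taxon_capacity(CLASSIFICATION_MAX, TAXON_PATH_MAX, TAXON_MAX))  # 9000
```

Adding one more repeatable level would multiply the total again, which helps explain why storage requirements for the full LOM grow so quickly.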

Given the complex and demanding character of the LOM, the task of adapting it to meet the specific and concrete needs of implementers and users requires interpretation, elaboration, extension, and perhaps especially, the simplification of both the technical demands and the myriad interpretive possibilities it presents. 

LOM Application Profiles

This task of interpretation and simplification has generally been understood as the responsibility of application profiling activities.  In a document written jointly by representatives of the Learning Object Metadata and Dublin Core communities, an application profile is defined as "an assemblage of metadata elements selected from one or more metadata schemas and combined in a compound schema" (Duval et al., 2002).  The numerous application profiles that interpret and adapt the LOM may be separated into four general (and not mutually exclusive) groups: 

  1. Those that combine elements from the LOM with elements from other metadata specifications and standards;
  2. Those that focus on the definition of element extensions and other customizations specifically for the LOM;
  3. Those that emphasize the reduction of the number of LOM elements and the choices they present;
  4. Those that both simplify and undertake customized extensions of the LOM.
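
Because these groups overlap, a profile is perhaps better described by a set of traits than by a single label. The Python sketch below assumes a hypothetical summary of a profile as three sets of element names (LOM elements retained, elements borrowed from other schemas, and new LOM-specific extensions); the element names are invented, and the sketch does not model any of the actual profiles discussed in this chapter.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ApplicationProfile:
    """Hypothetical summary of an application profile's element choices."""
    name: str
    lom_elements: set[str]                                   # LOM elements retained
    foreign_elements: set[str] = field(default_factory=set)  # from other schemas
    extensions: set[str] = field(default_factory=set)        # new, LOM-specific


def approaches(profile: ApplicationProfile, all_lom: set[str]) -> set[str]:
    """Return every approach (groups 1-3 above) that the profile exhibits."""
    traits = set()
    if profile.foreign_elements:
        traits.add("combination")   # group 1
    if profile.extensions:
        traits.add("extension")     # group 2
    if profile.lom_elements < all_lom:
        traits.add("reduction")     # group 3
    return traits                   # {"extension", "reduction"} is group 4


ALL_LOM = {"general.title", "general.description",
           "educational.context", "rights.cost"}
subset_only = ApplicationProfile("subset-only", {"general.title", "general.description"})
mixed = ApplicationProfile("mixed", {"general.title"}, extensions={"ext.competency"})
print(sorted(approaches(subset_only, ALL_LOM)))  # ['reduction']
print(sorted(approaches(mixed, ALL_LOM)))        # ['extension', 'reduction']
```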

An example of the first, combinatory approach is provided by the Australian Le@rning Federation's Metadata Application Profile, which combines LOM elements with those from the DCMES, the Open Digital Rights Language, and other sources (The Le@rning Federation, 2002).  The second approach is prominently illustrated by the "CLEO Extensions" to both LOM data elements and controlled vocabularies, developed jointly by Microsoft, IBM, Cisco and Thomson NETg through the Customized Learning Experience Online Lab (CLEO, 2003).  The third approach --focusing exclusively on the simplification of the LOM-- has been adopted by CanCore, and also characterizes the metadata profiling work undertaken by SingCore (ECC, 2003) and in the ADL SCORM reference framework (ADL, 2003).  The fourth approach --the simultaneous extension and reduction of the number of LOM elements-- has been adopted by the US Health Education Assets Laboratory (HEAL, 2003) and the UK's $500 million Curriculum Online project (Curriculum Online, 2003).

CanCore and Semantic Interoperability

The approach taken by CanCore involves much more than simply reducing the number of elements recommended for use in LOM implementations.  CanCore also places a great deal of emphasis on the refinement and precise definition of element and vocabulary semantics, applying established practices and best practices from the larger metadata and cataloguing communities wherever possible. 

CanCore has taken this approach with the intention of maximizing the potential for "semantic interoperability."  Interoperability generally refers to "the ability of two or more systems or components to exchange information and to use the information that has been exchanged" (IEEE, 1990).  Semantic interoperability refers specifically to the meanings that are embedded in this exchanged information, and to the effective and consistent interpretation of these meanings.  The systems or system components that carry out this interpretation, it should be emphasized, are generally human users rather than processing or transmission devices.  (For more about semantic interoperability as interpretive practice, see Friesen, 2002.)

CanCore's primary contribution to semantic interoperability takes the form of the "CanCore Learning Object Metadata Guidelines," a document in excess of 100 pages available at no charge from the CanCore Website (www.cancore.org).  This document provides for each element and element group in the LOM a great deal of fine-grained information and guidance, including:

The notes, recommendations, examples and interpretations that make up the CanCore guidelines contribute to semantic interoperability by reflecting consensus on common and best practices.  In so doing, they also help to form a basis for further consensus on these practices.  In this context, common practice refers to techniques and conventions --sometimes as simple as putting a person's last name first and first name last-- that are practical, widely understood, and can be consistently applied.  Best practices --such as the use of LOM vocabulary values in addition to locally developed value sets-- are typically demonstrably superior to other methods in optimizing interoperability.

As an example, CanCore references a number of the recommendations, definitions and practices used in Dublin Core and its communities of practice.  In one specific instance, CanCore provides and recommends Dublin Core definitions for DC.Source and qualified DC.Relation where the LOM suggests an approximate or direct equivalence.  In this way, CanCore leverages the semantic consensus already developed in the Dublin Core community to promote semantic interoperability among projects referencing the LOM, and also works toward cross-domain interoperability through mutual reference to the DC data model.
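
Such equivalences can be pictured as a crosswalk from LOM Relation.Kind values to Dublin Core terms. The Python sketch below assumes the informative LOM-to-DC mapping in which a related resource of kind "isbasedon" maps to DC.Source, and further assumes that the remaining kinds correspond to identically named qualified DC Relation refinements; the prefixes and function name are illustrative, and the CanCore guidelines themselves remain the reference for actual practice.

```python
from __future__ import annotations

# Assumed correspondence between LOM Relation.Kind values and qualified
# Dublin Core Relation refinements with the same names.
DC_REFINEMENTS = {
    "ispartof": "isPartOf", "haspart": "hasPart",
    "isversionof": "isVersionOf", "hasversion": "hasVersion",
    "isformatof": "isFormatOf", "hasformat": "hasFormat",
    "references": "references", "isreferencedby": "isReferencedBy",
    "requires": "requires", "isrequiredby": "isRequiredBy",
}


def lom_relation_to_dc(kind: str, resource: str) -> tuple[str, str]:
    """Map one LOM relation (kind, resource) to a (DC term, value) pair."""
    kind = kind.strip().lower()
    if kind == "isbasedon":
        return ("dc:source", resource)          # the DC.Source case
    refinement = DC_REFINEMENTS.get(kind)
    if refinement is None:
        return ("dc:relation", resource)        # e.g. "isbasisfor": unqualified
    return ("dcterms:" + refinement, resource)


print(lom_relation_to_dc("isbasedon", "urn:example:original"))
# ('dc:source', 'urn:example:original')
print(lom_relation_to_dc("IsPartOf", "urn:example:course"))
# ('dcterms:isPartOf', 'urn:example:course')
```

Falling back to unqualified dc:relation, rather than discarding an unmapped kind, keeps the relationship visible to DC-only consumers even when its precise type is lost.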

CanCore also attempts both to reflect and to shape current practice by providing, in its guidelines documentation, overviews of how other application profiles interpret the LOM.  Noting the convergence or divergence that this existing practice indicates, the CanCore guidelines often recommend interpretations that either mediate between divergent positions or strengthen what appear to be emerging areas of consensus.

In both surveying and helping to form this emerging consensus, CanCore has received valuable input and assistance from a wide variety of projects and organizations.  These have included, in various stages:

Guidelines for Practice

The importance of "best practice" guidelines such as those developed by CanCore is generally recognized in the larger metadata community.  Such guidelines have been developed for a variety of metadata specifications and implementations.  Examples include the broadly based "Using Dublin Core" guidelines (Hillmann, 2001), the "CIMI Guide to best practice: Dublin Core" developed for the museums community (CIMI, 2000), and the "Online Archive of California Best Practice Guidelines" developed to support the Encoded Archival Description specification.  In the first of these documents, the purpose of the Dublin Core guidelines is described as follows:

[an] important goal of this document is to promote "best practices" for describing resources using the Dublin Core element set. The Dublin Core community recognizes that consistency in creating metadata is an important key to achieving complete retrieval and intelligible display across disparate sources of descriptive records. Inconsistent metadata effectively hides desired records, resulting in uneven, unpredictable or incomplete search results (Hillmann, 2001).

A similar argument can be said to motivate each of the guidelines documents mentioned above. 

In each case --as in the CanCore guidelines-- brief definitions of the elements provided in the data model itself are augmented and refined.  These guidelines also provide examples of how the elements would be used, and often highlight (and also attempt to resolve) ambiguities that elements can present to implementers and record creators.  Especially in the case of the CIMI documentation, significant reference is also made to best practices as they have emerged in the field of cataloguing and indexing, and as they are encoded in cataloguing rules, such as the Anglo-American Cataloguing Rules (AACR2). 

The complexity of Learning Object Metadata, as well as its widespread adoption, would seem to underscore the need for similar guidelines in the e-learning community.  The apparent lack of publicly available, normative interpretation and explication of LOM elements represents a conspicuous gap across implementations and communities, one that CanCore hopes to address.

Approaches to Syntactic Interoperability

A second form of interoperability that is frequently emphasized in the literature --and that has played an important role in CanCore's development-- is known as "technical" or "syntactic" interoperability (see, for example, Miller, 2000; Hewlett-Packard, 2003).  This form of interoperability is concerned with the "technical issues" and "standards" involved in the effective "communication, transport, storage and representation" of metadata and other types of information (Miller, 2000).   And its significance for CanCore can be said to lie less in any opportunities for simplification than in the importance of maintaining the full complexity of the LOM data model.

The LOM, like the DCMES, is dependent on data bindings for the interoperable representation, communication and transport of metadata records.  "Binding" refers to the expression of the LOM data model --or others-- via a formal language or syntax for the purposes of effective data exchange and processing.  In the case of both the LOM and Dublin Core, the general standards used for creating these bindings include RDFS (Resource Description Framework Schema) and XML Schema (the W3C XML Schema Definition Language, or XSD).  In the case of the LOM, the specific way that the XML Schema language is used to format or "bind" LOM data is itself the subject of standardization.  As I write this chapter, the XML binding for the LOM is being standardized by the IEEE to become part two of what will then be a "multipart" LOM standard.
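
To make the idea of a binding concrete, the following Python sketch serializes a tiny record with the standard library's ElementTree. It assumes the namespace URI and element names of the IEEE LOM XML Schema binding (lom, general, title, keyword, and a string element carrying a language attribute); these assumptions should be checked against the version of the binding actually in use, and the helper names are hypothetical.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

LOM_NS = "http://ltsc.ieee.org/xsd/LOM"  # assumed namespace of the XML binding
ET.register_namespace("", LOM_NS)        # serialize LOM_NS as the default namespace


def q(tag: str) -> str:
    """Qualify a local element name with the LOM namespace."""
    return f"{{{LOM_NS}}}{tag}"


def add_langstring(parent: ET.Element, tag: str, text: str, lang: str) -> None:
    """Append <tag><string language="...">text</string></tag> to parent."""
    container = ET.SubElement(parent, q(tag))
    string = ET.SubElement(container, q("string"), language=lang)
    string.text = text


def build_record(title: str, keywords: list[str], lang: str = "en") -> str:
    """Build a minimal LOM record containing a title and repeated keywords."""
    lom = ET.Element(q("lom"))
    general = ET.SubElement(lom, q("general"))
    add_langstring(general, "title", title, lang)
    for kw in keywords:                      # keyword is a repeatable element
        add_langstring(general, "keyword", kw, lang)
    return ET.tostring(lom, encoding="unicode")


print(build_record("Cell Biology Primer", ["cell", "membrane"]))
```

Even this fragment shows the pattern that recurs throughout the binding: each language-tagged value sits two levels below its element, inside a repeatable container.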

Despite these and other efforts in support of technical interoperability for the LOM (e.g., see Richardson and Powell, 2003; IMS, 2003), significant challenges and misunderstandings seem to persist.  Some of these problems arise from the intricacy and iteration that are part of the hierarchical structures of the LOM data model.  The complexity of these structures --while well suited to encoding and representation in XML-- can be difficult to accommodate using common database techniques (e.g., see Shanmugasundaram et al., 1999).  Specifically, these structures, their number and their iteration can be challenging to represent faithfully using the tabular and relational structures at the core of common database technologies.  As a simple illustration, converting sample LOM records into common database file formats using automated routines will produce relational structures with 40 or more tables.  (This type of test can be undertaken using example records from the IMS Website [IMS 2001], the conversion capabilities of XML Spy and a database such as MySQL or Microsoft Access.)  Such a means of storing and accessing data can be, as Shanmugasundaram et al. put it, "unwieldy," to say the least.  Compounding this problem is the fact that attractive alternatives to relational database technologies --for example, native XML databases-- tend to be very costly, and are currently not available as mature open source products.
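
The table count grows because each repeatable, nested element needs its own table, linked to its parent by a foreign key. The hypothetical SQLite schema below covers only the Classification branch of a record, and even so it needs four tables; it also simplifies by storing each taxon entry as plain text, although in the LOM an entry is a language-tagged string that a faithful schema would normalize further. All table and column names are illustrative.

```python
import sqlite3

# One table per repeatable level, each pointing at its parent.
SCHEMA = """
CREATE TABLE record         (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE classification (id INTEGER PRIMARY KEY,
                             record_id INTEGER REFERENCES record(id),
                             purpose TEXT);
CREATE TABLE taxon_path     (id INTEGER PRIMARY KEY,
                             classification_id INTEGER REFERENCES classification(id),
                             source TEXT);
CREATE TABLE taxon          (id INTEGER PRIMARY KEY,
                             taxon_path_id INTEGER REFERENCES taxon_path(id),
                             taxon_id TEXT, entry TEXT);
"""


def demo():
    """Store one classification and reassemble it with a four-way join."""
    con = sqlite3.connect(":memory:")
    con.executescript(SCHEMA)
    con.execute("INSERT INTO record VALUES (1, 'Cell Biology Primer')")
    con.execute("INSERT INTO classification VALUES (1, 1, 'discipline')")
    con.execute("INSERT INTO taxon_path VALUES (1, 1, 'Example Taxonomy')")
    con.executemany("INSERT INTO taxon VALUES (?, 1, ?, ?)",
                    [(1, "100", "Science"), (2, "110", "Biology")])
    rows = con.execute("""
        SELECT r.title, c.purpose, p.source, t.entry
        FROM record r
        JOIN classification c ON c.record_id = r.id
        JOIN taxon_path p ON p.classification_id = c.id
        JOIN taxon t ON t.taxon_path_id = p.id
        ORDER BY t.id
    """).fetchall()
    con.close()
    return rows


for row in demo():
    print(row)
```

Extending the same pattern to every repeatable element and language-tagged string in all nine categories is what pushes automated conversions toward dozens of tables.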

This problem has placed considerable pressure on implementers to simplify the LOM --to reduce the number of elements, element iterations and other complexities in the LOM data model.  In doing so, implementers would be able to greatly reduce the number of tables that are required for a relational database to reliably store LOM data.  They might also be able to reduce the challenges of processing multiple levels of interrelated hierarchical elements and both XML and vCard encodings.  In some cases, this has led implementers to develop simplified versions of the LOM XML binding, which specify limitations on element numbers and iterations.  In other instances, it has presumably led to the development of application profiles that limit the use and iteration of LOM data elements.

However, attempts to simplify the LOM data model as it is implemented in databases and other infrastructure components create other difficulties in the area of technical or syntactic interoperability.  These difficulties arise because systems based on "simplified" LOM data models will not be able to reliably exchange and store metadata records that use particular parts --or even the whole-- of the LOM data model.  If systems cannot process and store the full LOM element set (with at least the number of elements and iterations specified by the LOM), there is a danger that they will truncate records received from other systems, which they might then store and retransmit.  Given the prominence of metadata record sharing strategies exemplified by the Open Archives Initiative's metadata harvesting protocol (e.g., IMS 2003), the danger of such truncation is hardly hypothetical. 
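
The truncation risk can be shown with a deliberately simple example. The function below stands in for a hypothetical "simplified" repository that keeps only a fixed number of keywords; the cap of three and the record contents are invented for illustration and do not correspond to any limit in the LOM.

```python
from copy import deepcopy

MAX_STORED_KEYWORDS = 3  # hypothetical limit imposed by a simplified store


def store_simplified(record: dict) -> dict:
    """Store a copy of the record, silently dropping keywords past the cap."""
    stored = deepcopy(record)
    stored["keywords"] = stored["keywords"][:MAX_STORED_KEYWORDS]
    return stored


def lost_on_round_trip(sent: dict, returned: dict) -> list:
    """List keywords present in the original but missing after retransmission."""
    return [k for k in sent["keywords"] if k not in returned["keywords"]]


incoming = {"title": "Cell Biology Primer",
            "keywords": ["cell", "membrane", "mitosis", "organelle", "biology"]}
retransmitted = store_simplified(incoming)
print(lost_on_round_trip(incoming, retransmitted))  # ['organelle', 'biology']
```

A harvester that later collects the retransmitted record has no way of knowing that anything was removed; only a comparison with the original, as done here, reveals the loss.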

As a result, in identifying a subset of recommended elements, the CanCore guidelines underscore that this subset does not represent an acceptable minimum for transmission and storage infrastructures.  To further support this point, the guidelines provide recommendations and support for all of the elements in the LOM, not just those in the CanCore subset.  The guidelines further emphasize that the CanCore subset --like any other simplification of the LOM data model-- applies only to metadata record creation and display.  In these particular contexts, it is often desirable to introduce even further simplifications and constraints than those explicitly recommended by CanCore (see Friesen et al., 2003).
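
The distinction between what is stored and what is shown can be sketched as a projection applied only at the interface. In this hypothetical Python example the stored record keeps every element it arrived with, while a creation/display subset (its element names are invented for illustration and are not CanCore's actual subset) controls only what a form or results page presents.

```python
# Hypothetical subset used only for record creation forms and display.
DISPLAY_SUBSET = ("title", "description", "keywords")


def display_view(stored: dict, subset: tuple = DISPLAY_SUBSET) -> dict:
    """Return a new dict with only the subset's elements; storage is untouched."""
    return {name: stored[name] for name in subset if name in stored}


stored = {
    "title": "Cell Biology Primer",
    "keywords": ["cell", "membrane"],
    "classification": [{"purpose": "discipline", "entries": ["Science"]}],
}
print(display_view(stored))
# {'title': 'Cell Biology Primer', 'keywords': ['cell', 'membrane']}
assert "classification" in stored  # the full record is still available for exchange
```

Keeping the subset in the presentation layer means that tightening or loosening it later requires no change to storage or exchange.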
 
Conclusion

Metadata profiling efforts can provide significant guidance and support for the difficult task of implementing and concretizing complex and abstract data models.  In the "CanCore Learning Object Metadata Guidelines," CanCore goes further than other application profiles in interpreting and explicating element and vocabulary semantics, and in both reflecting and attempting to reinforce best and common practices.  However, CanCore --like any other LOM application profile-- is incapable of shielding technical implementers from the syntactic implications of the LOM data model.  While LOM application profiles can simplify, augment and interpret the LOM to enhance semantic interoperability, providing similar support for syntactic interoperability is a different matter.  The full set of elements and hierarchical interrelationships outlined in the LOM provides, by definition, the simplest common set of conditions for achieving technical interoperability.   Speaking very broadly, both the LOM and the DCMES --despite their very different approaches-- can be said to present their respective communities not only with a common solution, but also with a common set of problems or challenges.  At the same time --whether in the areas of semantic, syntactic or other forms of interoperability (Miller, 2000)-- these metadata standards also present the opportunity for the collaborative development of solutions, and their sharing and reuse across implementations.

References

ADL. (2003). SCORM Version 1.3 Application Profile Working Draft Version 1.0. http://www.adlnet.org/index.cfm?fuseaction=DownFile&libid=496&cfid=134102&cftoken=37120141&bc=false

CIMI. (2000). Guide to Best Practice: Dublin Core. http://www.cimi.org/public_docs/meta_bestprac_v1_1_210400.pdf

CLEO. (2003). CLEO Extensions to the IEEE Learning Object Metadata. http://www.cleolab.org/CLEO_LOM_Ext_v1d1a.pdf

Curriculum Online. (2003). http://www.curriculumonline.gov.uk/Curriculum+OnLine/SupplierInfo/metadatadocs.htm?Nav=SupplierInfo

Curriculum Online. (2003). http://www.dfes.gov.uk/curriculumonline

Duval, E., Hodgins, W., Sutton, S., & Weibel, S. L. (2002). Metadata Principles and Practicalities. D-Lib Magazine, 8(4). http://www.dlib.org/dlib/april02/weibel/04weibel.html

ECC. (2003). SingCORE: Singapore's Meta-data Schema for Labeling Digital Learning Resources. http://www.ecc.org.sg/cocoon/ecc/website/singcore-17-jan-03.pdf

Friesen, N. (2002). Semantic Interoperability and Communities of Practice. In J. Mason (Ed.), Global Summit of Online Learning Networks: Papers. Adelaide: Educationau. http://www.educationau.edu.au/globalsummit/papers/nfriesen.htm

Friesen, N., Hesemeier, S., Fisher, S., Roberts, A., & Habkirk, S. (2003). CanCore: Principles and Positions on Learning Object Metadata. Contribution of the Canadian National Body to ISO/IEC JTC1 SC36 WG4 "Learning Resource Metadata." http://jtc1sc36.org/doc/36N0430.pdf

HEAL. (2003). HEAL Standards and Documentation. http://www.healcentral.org/documents.htm

Hewlett-Packard. (n.d.). Introduction to Semantic Web Technologies. http://www.hpl.hp.com/semweb/sw-technology.htm

IEEE. (2002). IEEE P1484.12.2/D1, 2002-09-13 Draft Standard for Learning Technology - Learning Object Metadata. http://ltsc.ieee.org/doc/wg12/LOM_1484_12_1_v1_Final_Draft.pdf

IEEE. (1990). IEEE Standard Computer Dictionary: A Compilation of IEEE Standard Computer Glossaries. New York, NY: Institute of Electrical and Electronics Engineers.

IMS. (2001). IMS Learning Resource Meta-data Specification. http://www.imsglobal.org/metadata/

IMS. (2003). IMS Digital Repositories Specification. http://www.imsglobal.org/digitalrepositories

ISO/IEC JTC1 SC36 WG4. (2003). Resolutions of 2003-03 SC36/WG4 Meeting. http://mdlet.jtc1sc36.org/doc/SC36_WG4_N0020.pdf

Miller, P. (2000). UK Interoperability Focus. http://www.ukoln.ac.uk/interop-focus/about/

Richardson, S., & Powell, A. (2003). Exposing information resources for e-learning: Harvesting and searching IMS metadata using the OAI Protocol for Metadata Harvesting and Z39.50. Ariadne, 34. http://www.ariadne.ac.uk/issue34/powell/

Shanmugasundaram, J., Tufte, K., He, G., Zhang, C., DeWitt, D., & Naughton, J. (1999). Relational databases for querying XML documents: Limitations and opportunities. In Proceedings of the 25th VLDB Conference, September 1999. http://www.cs.cornell.edu/people/jai/papers/RdbmsForXML.pdf

The Le@rning Federation. (2002). Metadata Application Profile. The Le@rning Federation schools online curriculum content initiative. http://www.thelearningfederation.edu.au/repo/cms2/tlf/published/3859/docs/Metadata_Application_Profile_1_2.pdf

Weibel, S., Iannella, R., & Cathro, W. (1997). The 4th Dublin Core Metadata Workshop Report. D-Lib Magazine. http://www.dlib.org/dlib/june97/metadata/06weibel.html

Wiley, D. A., Recker, M. M., & Gibbons, A. (2000). In defense of the by-hand assembly of learning objects. http://wiley.ed.usu.edu/docs/axiomatic.pdf