Rule-Enforcing Editing Environment for a Frame-Based Controlled Medical Dictionary
Gai Elhanan, M.D.
Introduction
In today's heterogeneous Clinical Information System (CIS) environment, Institutional Data Dictionaries (IDDs) are an important component of the required infrastructure. IDDs can be used by interface and upload engines to translate and bridge between the vocabularies of ancillary systems, and to map multiple vocabularies onto a single coding system for the storage and retrieval of data in centralized clinical repositories. For optimal performance, IDDs must be kept up to date with all the vocabularies involved. While an IDD is not expected to be as large as the sum of all vocabularies used in a given CIS, it will become large enough to pose significant problems for daily management. Therefore, an efficient and easy-to-use editing environment is a requirement for the long-term success of any such IDD. This paper describes the operational rules for a concept-based semantic editor and the principles of its multi-user editing environment.
Background
The concept-based semantic editor (ConSeNT) is a frame-based controlled vocabulary. ConSeNT is a directed acyclic graph (DAG) that supports multiple classification hierarchies based on the IS-A semantic type and follows Cimino's desiderata for controlled medical vocabularies¹. Each concept is represented by a frame consisting of slots of two types: literal slots contain descriptive information about a concept, while semantic slots describe the explicit relationships of a concept within the hierarchy based on various semantic types.
A similar environment exists in the Medical Entities Dictionary (MED) at NYP.
The Operational Rules of ConSeNT
Any controlled vocabulary that is structured as a semantic network is ultimately a formal knowledge representation language that serves a particular domain. As with any formalized language, a set of self-consistent operational rules must be enforced on the data structure that represents the knowledge in the domain. These rules will be described in this section.
1. Acyclicity: ConSeNT is a DAG. A concept may not be both an ancestor and a descendant of another concept, and every concept except the top concept must have at least one parent; no concept may be left hanging in mid-air.
2. Each concept must have a unique normalized name.
3. Slots may contain more than one value, but a slot may be limited by its definition to a single value.
4. Literal slot values are not inherited.
5. Literal slots may contain any type of information.
6. Semantic slot values are inherited by default (Figure 1).
7. Semantic slots contain only the unique ID (uID) values of other concepts.
8. Semantic slots consist of pairs that point to one another (Figure 1). Thus, if concept {A} has semantic slot [x] containing the uID of concept {B}, then concept {B} will have a slot [y] pointing to concept {A}; [x] and [y] are a semantic pair. Reciprocal pointing is explicit.
9. All slots, literal or semantic, have a specified domain, i.e., a specific sub-hierarchy within the MED where the slot is defined.
10. Semantic slots must have a defined range, i.e., a sub-hierarchy within the MED from which concepts may be applied to a specific slot. This range is simultaneously the domain of the reciprocal semantic slot.
11. Any sub-hierarchy in ConSeNT, and in particular the domain and range of a semantic slot, is identified by the root concept of that sub-hierarchy. Thus, every slot is defined at a unique concept in ConSeNT called its insertion point. The "defined" status of a specific slot is automatically activated for all descendants of the insertion point.
12. Refinement inheritance of semantic slots: semantic slot values may be refined (Figure 1), i.e., a more specific value (a descendant) may replace a less specific value (its ancestor), but both cannot co-exist in the same slot. If several related values (from the same hierarchy) are inherited from different parents, the semantic slot holds only the most specific one; i.e., multiple values in a semantic slot must each come from unrelated hierarchies.
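The rules above are declarative, so an editor has to check them on every change. Below is a minimal Python sketch of three of them: rule 1 (acyclicity and no orphans), rule 2 (unique normalized names) and rule 8 (reciprocal semantic pairs). The `Concept` class, the `RECIPROCAL` table, the slot names `has-ingredient`/`ingredient-of` and the normalization in `normalize` are illustrative assumptions, not ConSeNT's actual data structures.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """Hypothetical in-memory frame: uID, name, IS-A parents and semantic slots."""
    uid: int
    name: str
    parents: set[int] = field(default_factory=set)
    # semantic slot name -> uIDs of other concepts (rule 7)
    semantic: dict[str, set[int]] = field(default_factory=dict)


# Hypothetical semantic pair: each slot name maps to its reciprocal (rule 8).
RECIPROCAL = {"has-ingredient": "ingredient-of", "ingredient-of": "has-ingredient"}


def normalize(name: str) -> str:
    """Assumed normalization: case-fold and collapse whitespace."""
    return " ".join(name.lower().split())


def name_is_unique(vocab: dict[int, Concept], name: str) -> bool:
    """Rule 2: no two concepts may share a normalized name."""
    key = normalize(name)
    return all(normalize(c.name) != key for c in vocab.values())


def ancestors(vocab: dict[int, Concept], uid: int) -> set[int]:
    """All concepts reachable from uid by following IS-A links upward."""
    seen: set[int] = set()
    stack = list(vocab[uid].parents)
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(vocab[current].parents)
    return seen


def can_add_parent(vocab: dict[int, Concept], child: int, parent: int) -> bool:
    """Rule 1: refuse an IS-A link that would make child its own ancestor."""
    return parent != child and child not in ancestors(vocab, parent)


def orphans(vocab: dict[int, Concept], top: int) -> set[int]:
    """Rule 1: every concept except the top concept needs at least one parent."""
    return {uid for uid, c in vocab.items() if uid != top and not c.parents}


def add_semantic_value(vocab: dict[int, Concept], a: int, slot: str, b: int) -> None:
    """Rule 8: writing slot [x] of A also writes the reciprocal slot [y] of B."""
    vocab[a].semantic.setdefault(slot, set()).add(b)
    vocab[b].semantic.setdefault(RECIPROCAL[slot], set()).add(a)
```

Checking a candidate parent before linking it is what keeps the graph a DAG: with Entity (1) above Drug (2) above Aspirin (3), `can_add_parent(vocab, 2, 3)` is refused because concept 3 already descends from concept 2.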

Representational Models for Concepts
There are two contrasting types of models of the data structure used to represent concepts within the terminology. Once a modeling type is chosen, one is committed to maintaining the data structure accordingly, since dependent applications require consistency. Perhaps the most intuitive is the static model (Figure 2), in which each concept holds all of its slot attributes, semantic and non-semantic. This model is most suitable for production systems where response time is important and the application uses the vocabulary but does not change it. The static model can be further enhanced by pre-calculating and storing within each concept its ancestors and descendants in addition to its parents and children. While this model offers speed, it is not optimal for editorial tasks, which, unlike production use, change the vocabulary. Changes to parent-child relationships or semantic values, as well as changes to slot definitions, require extensive calculations down the descendant tree in order to be propagated. The number of concepts affected by this semantic propagation may reach n-1, where n is the number of concepts in the vocabulary, in multiple slots.

The alternative is a dynamic model (Figure 2) that utilizes the operational rules of the terminology. In the dynamic model, slot values may be either explicit, i.e., declared specifically at that concept's slot, or implicit, i.e., inherited from higher concepts. Concepts viewed using the dynamic model are partially virtual, since all non-explicit semantic values are calculated on the fly from the operational rules stated above; displaying the actual semantic content of a concept requires traversing up the ancestry tree and additional calculation. The atomic editing actions must allow the human editor to view at least three generations of concepts: the focal concept being edited, one or more of its parents, and zero or more of its children. Therefore, the dynamic model need not store any ancestry or descendant information other than parents and children. All editorial changes are applied only to explicit values of the affected concept. The most time-inefficient operation, semantic propagation, need only enforce the operational rules of acyclicity and refinement inheritance, and can be performed in the background. The dynamic model is also advantageous for undo operations, since the only affected concepts are the main concept being edited and its explicit reciprocals; there is therefore no need to undo changes for every descendant of the main concept and of its reciprocals.
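In the dynamic model only parents and explicit values are stored, and implicit values are computed when a concept is displayed. The sketch below shows one way to do that under stated assumptions: the `parents` and `explicit` dictionaries and the function names are hypothetical, and slot values are themselves concepts placed in the same IS-A graph. Inherited values (rule 6) are gathered recursively from the parents and then reduced to the most specific ones (rule 12).

```python
Parents = dict[str, set[str]]             # concept -> IS-A parents (only links stored)
Explicit = dict[str, dict[str, set[str]]]  # concept -> slot -> explicit values


def ancestors(parents: Parents, uid: str) -> set[str]:
    """Walk IS-A links upward; no ancestry is precomputed in the dynamic model."""
    seen: set[str] = set()
    stack = list(parents.get(uid, ()))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(parents.get(node, ()))
    return seen


def most_specific(parents: Parents, values: set[str]) -> set[str]:
    """Rule 12: drop any value that is an ancestor of another value in the set."""
    covered = set().union(*(ancestors(parents, v) for v in values))
    return values - covered


def semantic_values(parents: Parents, explicit: Explicit, uid: str, slot: str) -> set[str]:
    """Explicit values plus values inherited from every parent (rule 6),
    reduced to the most specific ones (rule 12), computed on the fly."""
    values = set(explicit.get(uid, {}).get(slot, set()))
    for parent in parents.get(uid, ()):
        values |= semantic_values(parents, explicit, parent, slot)
    return most_specific(parents, values)


if __name__ == "__main__":
    parents = {
        "Salicylate Drug": {"Drug"},
        "Aspirin Tablet": {"Salicylate Drug"},
        "Aspirin 81 mg Tablet": {"Aspirin Tablet"},
        "Salicylate": {"Chemical"},
        "Acetylsalicylic Acid": {"Salicylate"},
    }
    explicit = {
        "Salicylate Drug": {"has-ingredient": {"Salicylate"}},
        # refinement: a descendant value replaces the inherited ancestor value
        "Aspirin Tablet": {"has-ingredient": {"Acetylsalicylic Acid"}},
    }
    print(semantic_values(parents, explicit, "Aspirin 81 mg Tablet", "has-ingredient"))
    # prints {'Acetylsalicylic Acid'}
```

Because nothing is cached, changing the explicit value of `Salicylate Drug` is reflected in every descendant the next time it is viewed, which is what keeps editing and undo cheap; the price is the upward traversal on every read, which is why the static model suits read-only production use.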
Editing Environment
Paradoxically, the complex semantic network supported within the IDD is the very tool that enables semi-automatic classification of new concepts, provided that the relevant semantic slots are instantiated: the more complex the network, the better. Since the semantic network describes the explicit relationships of concepts, and since that information uses the uIDs of other concepts, pre-defined new concepts can be pushed down the hierarchy to match siblings with similar relationships or to refine parents, in a process described by Cimino et al.² For online creation of new concepts by non-expert users, the interaction between the IS-A location of the new concept and the semantic values of its parents and siblings delimits the admissible sub-domain within the range sub-hierarchy and simplifies the selection process for the user. All changes must then be tested against the operational rules to verify the semantic integrity of the IDD, with immediate feedback to the user.
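One way the IS-A placement of a new concept can narrow the choices offered to a non-expert user is sketched below, assuming the user may only refine what the new concept inherits (rules 10 and 12). With no inherited value the whole range sub-hierarchy is offered; otherwise only the inherited values and their descendants. The `children` map and `candidate_values` are hypothetical names, and the sketch covers only the parents' values, not the sibling comparison the text also mentions.

```python
def descendants(children: dict[str, set[str]], uid: str) -> set[str]:
    """All concepts reachable from uid by following child links downward."""
    seen: set[str] = set()
    stack = list(children.get(uid, ()))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(children.get(node, ()))
    return seen


def candidate_values(children: dict[str, set[str]], inherited: set[str],
                     range_root: str) -> set[str]:
    """Values to offer for one semantic slot of a new concept.

    Nothing inherited: the whole range sub-hierarchy, identified by its root (rule 11).
    Something inherited: only those values and their refinements (rule 12).
    """
    roots = inherited or {range_root}
    offered: set[str] = set()
    for root in roots:
        offered |= {root} | descendants(children, root)
    return offered
```

For a new concept placed under a parent whose ingredient is `Salicylate`, only `Salicylate` and its descendants are listed, rather than every chemical in the range.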

The ConSeNT web-based editor is built on its web-based browser³, with added functionality (Figure 3). All actions taken before the changes are committed are virtual, but each is checked against the operational rules, with immediate feedback screens for the user. Feedback screens serve verification more than error notification, since potential errors are prevented by automatic forced inheritance and by directing the user to select only appropriate semantic values from the relevant sub-hierarchies within specific ranges. Errors are more likely to occur during batch processing; the editor can verify each change and skip problematic ones, including concepts that are based on previously erroneous ones.
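Batch verification with skipping can be sketched as a single ordered pass: each record is checked, and a record whose parent was rejected earlier in the same batch is skipped as well. The `NewConcept` record layout, the reason strings and the use of a normalized name as the lookup key are assumptions for illustration; a real batch would also check semantic slot values against their ranges.

```python
from dataclasses import dataclass


@dataclass
class NewConcept:
    """Hypothetical batch record: a new concept name and its parents' names."""
    name: str
    parents: list[str]


def normalize(name: str) -> str:
    """Assumed normalization: case-fold and collapse whitespace."""
    return " ".join(name.lower().split())


def run_batch(existing: set[str], batch: list[NewConcept]) -> tuple[list[str], dict[str, str]]:
    """Verify records in order; skip failures and anything that builds on them."""
    known = {normalize(n) for n in existing}
    rejected_keys: set[str] = set()
    accepted: list[str] = []
    rejected: dict[str, str] = {}
    for record in batch:
        key = normalize(record.name)
        parent_keys = [normalize(p) for p in record.parents]
        if any(p in rejected_keys for p in parent_keys):
            reason = "depends on a rejected concept"
        elif not parent_keys:
            reason = "no parent (rule 1)"
        elif key in known:
            reason = "duplicate normalized name (rule 2)"
        elif any(p not in known for p in parent_keys):
            reason = "unknown parent"
        else:
            known.add(key)  # later records in the batch may use it as a parent
            accepted.append(record.name)
            continue
        rejected[record.name] = reason
        rejected_keys.add(key)  # conservative: dependents of this record are skipped too
    return accepted, rejected
```

A misspelled parent name therefore rejects one record and, transitively, every record that was meant to hang below it, while unrelated records in the same batch are still committed.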
DISCUSSION
An IDD is not equivalent to a clinical terminology server, although it can serve as one in limited domains. Nevertheless, an editing environment for an IDD should comply with as many as possible of the desiderata enumerated by Chute et al.⁴ for clinical terminology servers. Beyond these, our experience shows that an IDD editor should support the following requirements:
- Adhere to the operational rules stated above
- Editing capabilities for both individual concepts and slots, i.e., additional functions to create new slots and modify existing ones according to the operational rules, and to apply the consequences
- Multi-user functionality
- Batch process capabilities
- Validation capabilities for the complete hierarchy, sub-hierarchies and individual concepts
- Syntactic error detection for batch processing
- Logging and audit trails
- Versioning system
- Roll-back capabilities, including plucking, i.e., the ability to determine whether changes to a concept can be undone without losing later modifications
- Variable permission levels for editing of different parts of the hierarchy
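Plucking can be approximated from the audit log alone, assuming each logged change records every (concept, slot) cell it wrote, including the reciprocal slot required by rule 8: a change can be undone safely only if no later change wrote to any of the same cells. `LoggedChange` and `can_pluck` are hypothetical names, and the check is deliberately conservative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoggedChange:
    """Hypothetical audit-log entry: sequence number and the (concept, slot) cells written."""
    seq: int
    cells: frozenset[tuple[str, str]]


def can_pluck(log: list[LoggedChange], seq: int) -> bool:
    """True if undoing change `seq` cannot overwrite a later modification,
    i.e. no later change wrote to any of the same (concept, slot) cells."""
    target = next((c for c in log if c.seq == seq), None)
    if target is None:
        raise KeyError(f"no logged change with seq {seq}")
    return all(not (c.cells & target.cells) for c in log if c.seq > seq)
```

Recording the reciprocal cell matters: a later edit that touches only the other end of a semantic pair still blocks plucking the earlier change.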
Following the functionality stated above, a structured yet flexible environment was created that allows several modalities for editing a large-scale IDD. This environment minimizes the potential for errors and directs inexperienced users towards the "minimally-best" solution. By doing so, the spectrum of would-be editors was vastly expanded and, as a consequence, the usability of the IDD increased.
REFERENCES
1. Cimino JJ. Desiderata for controlled medical vocabularies in the twenty-first century. Methods Inf Med. 1998 Nov;37(4-5):394-403.
2. Cimino JJ, Johnson SB, Hripcsak G, Hill CL, Clayton PD. Managing vocabulary for a centralized clinical system. Medinfo. 1995;8 Pt 1:117-20.
3. http://informatics.cpmc.columbia.edu/homepages/wajngur. The “PreEdit MEDviewer” button.
4. Chute CG, Proc AMIA Symp. 1999;:42-6.