Visual Specification and Automatic Transformation of Web Interchanging Documents

Award number: NSF ITR 0218738
PI: Kang Zhang

Project Summary

This NSF IIS-supported project investigates a visual approach to the specification and transformation of XML-based Web documents. Based on a graph grammar and graph transformation formalism, the visual approach will allow the structural requirements of any XML dialect to be specified graphically and will automatically generate a graphical toolkit for the construction of Web documents represented in that XML dialect. It will also support automatic translation of Web documents from one XML dialect to another. With the proposed spatial extension, the grammar formalism will be able to specify adaptive layout for any Web presentation.

Background

Transformations of XML documents written in XSL must be created manually on a case-by-case basis. Furthermore, writing effective XSL code requires some programming skill and a good understanding of how XML works. The scientific community has realized the importance of XML technology and the need for automatic transformations [1]. Among the general textual approaches to the automatic translation of XML documents, Xtra [5] aims at automatic transformations between XML documents by discovering a sequence of transformation operations from the source and target DTD trees. The operations are used to generate an XSL script, which can then be applied to source XML documents to transform them into XML documents conforming to the target DTD. Approaches based on database schemas [3, 4] use rules to match similar document components for the most common cases, and allow the user to customize the rules for more complex cases. These approaches offer neither automatic structure validation nor a means for visual representation and specification.

Visual approaches such as Xing [2] have also been proposed. Xing achieves XML transformation and restructuring using rules that combine query patterns with the results returned by the queries. It uses nested boxes to represent XML data, such that each element tag is written on top of a box while all the attributes of the element are enclosed in the box. Such a representation is essentially textual, supplemented by hierarchical boxes surrounding the text. For a large document, the levels of nesting can become overwhelming, making it hard for the user to understand the overall structure.

Goals, Objectives and Targeted Activities

This project has the following objectives:

To identify the fundamental issues in visual transformation and validation of Web documents and find practical solutions to them.
To develop a visual XML document design, validation and translation framework to support the automatic generation of Web document construction and transformation tools. The generation framework will be equipped with graphical tools for specifying the document syntax and transformation rules.
To extend the existing graph transformation formalism with spatial specification capability so that not only the logical structure but also the layout of XML documents could be specified graphically.
To evaluate the usability of the visual features of the developed toolset and to enhance user-friendliness and user productivity in using the prototype toolset.

Project Impact

The proposed project will enhance the accessibility of IT in general, and of Web documents in particular, to the general public by developing user-friendly visual tools for Web document design and transformation. Web designers will find it easy to define Web document structures through graph grammar specifications and to use the toolset to automatically generate graphical languages, each within a graph editor for a domain-specific XML and, if desired, for its translation into the XML of another domain. Application users in the domain, without any knowledge of XML, could then use the graph editor to construct domain-specific XML documents in an intuitive manner.

Teaching and training in Web engineering and programming language generation will benefit from our prototype toolset. Using the toolset, we can illustrate intuitively, step by step, how a visual XML language is generated from a given set of specifications, in the same fashion as conventional language generation. We can also vividly demonstrate how an XML document can be rapidly constructed, parsed, validated, and transformed into another document style using the generated visual XML language. The toolset and its associated concepts will make excellent teaching and self-training materials for courses on programming languages, compilers, and visual interfaces.

Applications of the Prototype

a) XML-Based Biological Information Translation

A major challenge in bioinformatics is the interoperation of heterogeneous biological information drawn from multiple sources of sequence data in genomic and sequence databases. This section walks through an example that demonstrates a Trans VME for the graphical transformation of bioinformatics XML documents: the translation of an NCBI (National Center for Biotechnology Information) XML document into a BSML (Bioinformatic Sequence Markup Language) document.

NCBI was established in 1988 as a national resource for molecular biology information. NCBI creates public databases, conducts research in computational biology, develops software tools for analyzing genome data, and disseminates biomedical information. Internally, NCBI stores data in whichever form is most appropriate to the flow of the data and its semantics. These forms include normalized relational databases (e.g., for ESTs), ASN.1 (e.g., for other types of sequences), and XML (e.g., for journal articles). BSML is an open XML data standard created for more efficient communication within the life sciences community. BSML was one of the first XML (eXtensible Markup Language) dialects to be developed after the World Wide Web Consortium (W3C) proposed XML.

Generated from a set of transformation rules, the Trans VME named Biotrans facilitates the interchange of data from the NCBI XML format to the BSML format, i.e., it is an implementation of the Trans operator on data instances. As shown in Figure 1.a, the transformation rules for the Biotrans VME consist of 7 rules, of which rule 1 is shown in the figure. Figure 1.b shows the generated VME, Biotrans, with a portion of the input NCBI XML document displayed as a tree. The Biotrans VME then translates the input document into the BSML file shown in Figure 1.c.

With the framework, constructing a visual transformation environment requires only creating a set of rules. The effort for low-level programming is therefore minimized, and the framework facilitates interoperation among heterogeneous bioinformatics data instances.

b) Web Documents Adaptation

Users surf the Internet under various viewing conditions, such as varying screen sizes, style preferences, and device capabilities. The majority of Web documents, such as XML and XHTML documents, are designed for desktop display, while WML (Wireless Markup Language) documents are designed for display on mobile devices. As shown in Figure 1, a Trans operator VME named XMLtoWML was developed based on the prototype system.

Figure 1.a defines 7 transformation rules in the rule generator, which in turn generates a Trans VME for the translation from input XML documents to WML documents. Figure 1.b shows a source XML document, which has a page with two sections. In the generated VME, XMLtoWML, the input XML document is translated into a WML document as shown in Figure 1.c.
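
The kind of mapping that such rules encode can be sketched in ordinary code. The following Python sketch is only an illustration and is not the graph-grammar mechanism used by XMLtoWML: it assumes a hypothetical source vocabulary (`page`, `section` with a `title` attribute, and `para`) and maps each section to a WML `card` and each paragraph to a `p` element inside a `wml` deck.

```python
import xml.etree.ElementTree as ET

# Hypothetical source document: a page with two sections.
# The element and attribute names are assumptions made for this sketch.
SOURCE = """
<page title="News">
  <section title="Local">
    <para>Road works start Monday.</para>
  </section>
  <section title="Weather">
    <para>Sunny.</para>
    <para>Light wind.</para>
  </section>
</page>
"""


def xml_to_wml(source_xml: str) -> ET.Element:
    """Translate the hypothetical page format into a WML deck."""
    page = ET.fromstring(source_xml.strip())
    wml = ET.Element("wml")
    # Rule 1 (assumed): every <section> becomes one <card>.
    for index, section in enumerate(page.findall("section"), start=1):
        card = ET.SubElement(
            wml,
            "card",
            {"id": f"card{index}", "title": section.get("title", "")},
        )
        # Rule 2 (assumed): every <para> becomes one <p> in that card.
        for para in section.findall("para"):
            p = ET.SubElement(card, "p")
            p.text = (para.text or "").strip()
    return wml


wml = xml_to_wml(SOURCE)
print(ET.tostring(wml, encoding="unicode"))
```

In the framework described here, the two hard-coded rules would instead be drawn as graph transformation rules, and the translator would be generated from them.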

With the growing demand for mobile Internet surfing, adapting Web documents for small-screen display is becoming critical. With the help of the framework, reusing existing XML or XHTML documents by programming transformation rules saves much of the effort of creating new mobile pages or writing adaptation tools from scratch.

c) Ontology Mapping and Integration

We provide a warehouse design for heterogeneous ontologies, such as OBOs and proprietary ontologies. Proprietary ontologies can be imported into the warehouse through specially designated interface programs, while tools are available for importing OBOs into the warehouse. To allow these ontologies to interoperate, mapping tables were designed to store the correspondences between their terms.

The IOMG is implemented to generate mappings among the ontologies retrieved from the warehouse and to detect possible false mappings for domain experts to confirm. Figure 2 shows the IOMG mapping generation interface.

Figure 2 Oasis Ontology Mapping

As highlighted, Parkinson disease is successfully mapped to Paralysis agitans, which is a synonym of Parkinson disease. The automatic generation may also produce some false mappings or miss some true mappings. The resulting mappings should therefore be verified by biomedical experts before being saved into the ontology warehouse. Through a friendly user interface, domain experts can correct mappings by editing them directly in the graph.
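
A synonym-based match of this kind can be illustrated with a small Python sketch. It is not the IOMG algorithm, which is not described here. It assumes a simple exact match on normalized names and synonyms, and the term identifiers and the `Term` structure are hypothetical. Every proposal it returns is only a candidate that an expert would still need to confirm.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    """A hypothetical ontology term: identifier, preferred name, synonyms."""
    term_id: str
    name: str
    synonyms: tuple = ()


def normalize(label: str) -> str:
    # Lower-case and collapse whitespace so trivial differences do not matter.
    return " ".join(label.lower().split())


def labels(term: Term) -> set:
    # All normalized labels (name plus synonyms) under which a term is known.
    return {normalize(term.name)} | {normalize(s) for s in term.synonyms}


def propose_mappings(source_terms, target_terms):
    """Propose (source_id, target_id, shared_label) candidates for review."""
    index = {}
    for target in target_terms:
        for label in labels(target):
            index.setdefault(label, []).append(target)

    proposals = []
    for source in source_terms:
        seen = set()
        for label in sorted(labels(source)):
            for target in index.get(label, []):
                if target.term_id not in seen:
                    seen.add(target.term_id)
                    proposals.append((source.term_id, target.term_id, label))
    return proposals


# Made-up identifiers; only the two disease names come from the text above.
ONTOLOGY_A = [
    Term("A:0001", "Parkinson disease", ("Paralysis agitans",)),
    Term("A:0002", "Asthma"),
]
ONTOLOGY_B = [
    Term("B:0107", "Paralysis agitans"),
    Term("B:0233", "Migraine"),
]

proposals = propose_mappings(ONTOLOGY_A, ONTOLOGY_B)
print(proposals)
```

Exact label matching misses true mappings whose labels differ in spelling and can propose false ones for ambiguous labels, which is why expert verification remains part of the workflow.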

d) Schema Translation

Similar to the Trans operator on data instances, a Trans operator VME on data models was also developed using graph transformation rules.

The early version of XML, version 1.0, contains a built-in schema language, DTD, which is being replaced by XML Schema because of some limitations of DTD. Since DTD was first introduced, many DTD documents have become available and now have to be migrated to XML Schema documents. Based on the framework, we designed a Trans VME to help with the migration from DTD to XML Schema.

As shown in Figure 3.a, the graph transformation rules specify a Trans operator that translates a DTD document into an XML Schema document. For the input DTD document, five rules are defined, and the first one is shown in the popup window of Figure 3.a. Since the schema and the data instance are specified in a uniform representation, the transformation of the schema is specified in a similar way.

Figure 3.b shows the generated DTD to XML Schema transformation environment, called DTD2XS VME, which displays the input DTD document as a tree. The VME translates the input DTD into an XML Schema document displayed as a tree in Figure 3.c.
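
To give a feel for what a DTD-to-XML-Schema translation involves, the Python sketch below converts a small subset of DTD element declarations into XML Schema element declarations. It is an assumption-laden illustration and not the rule set of the DTD2XS VME: it handles only `#PCDATA` content and flat comma-separated sequences with the `?`, `*`, and `+` occurrence indicators, and it ignores attributes, choice groups, nested groups, and mixed content. The sample DTD is hypothetical.

```python
import re
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"
ET.register_namespace("xs", XS)

# Matches declarations such as <!ELEMENT page (title, section+)>.
ELEMENT_DECL = re.compile(r"<!ELEMENT\s+([\w.-]+)\s+\(([^()]+)\)\s*>")

# DTD occurrence indicator -> (minOccurs, maxOccurs).
OCCURS = {
    "": ("1", "1"),
    "?": ("0", "1"),
    "*": ("0", "unbounded"),
    "+": ("1", "unbounded"),
}


def dtd_to_xsd(dtd_text: str) -> ET.Element:
    """Translate a restricted subset of DTD element declarations."""
    schema = ET.Element(f"{{{XS}}}schema")
    for name, content in ELEMENT_DECL.findall(dtd_text):
        if "|" in content:
            raise NotImplementedError("choice groups are outside this sketch")
        element = ET.SubElement(schema, f"{{{XS}}}element", {"name": name})
        if content.strip() == "#PCDATA":
            # Text-only element: declare it as a string.
            element.set("type", "xs:string")
            continue
        complex_type = ET.SubElement(element, f"{{{XS}}}complexType")
        sequence = ET.SubElement(complex_type, f"{{{XS}}}sequence")
        for child in content.split(","):
            child = child.strip()
            if not child:
                continue
            suffix = child[-1] if child[-1] in "?*+" else ""
            low, high = OCCURS[suffix]
            ET.SubElement(
                sequence,
                f"{{{XS}}}element",
                {"ref": child.rstrip("?*+"), "minOccurs": low, "maxOccurs": high},
            )
    return schema


# Hypothetical input DTD used only for this example.
DTD = """
<!ELEMENT page (title, section+)>
<!ELEMENT section (title?, para*)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT para (#PCDATA)>
"""

schema = dtd_to_xsd(DTD)
print(ET.tostring(schema, encoding="unicode"))
```

Each DTD declaration becomes a global `xs:element`, and child elements are referenced with `ref`, so the tree structure of the output mirrors that of the input DTD.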

PhD Students Supported

Guanglei Song (currently Senior Software Developer, eBay Inc., CA)

Jun Kong (currently Assistant Professor, North Dakota State University, ND)

Publications Related to This Project

JOURNAL PAPERS:

J. Kong, K. Zhang, and X. Zeng, Spatial Graph Grammars for Graphical User Interfaces, ACM Transactions on Computer-Human Interaction, Vol.13, No.2, June 2006, 268-307.

G.L. Song, J. Kong, and K. Zhang, AutoGen: Easing Model Management through Two Levels of Abstraction, Journal of Visual Languages and Computing, Vol.17, No.6, 2006, Elsevier Science Inc., New York, 508-527.

K. Zhang, J. Kong, M.K. Qiu, and G.L. Song, Multimedia Layout Adaptation Through Grammatical Specifications, ACM/Springer Multimedia Systems, Vol.10, No.3, 2005, 245-260.

REFEREED CONFERENCE PAPERS:

G-L. Song, Y. Qian, Y. Liu, and K. Zhang, Oasis: a Mapping and Integration Framework for Biomedical Ontologies, Proc. 19th IEEE International Symposium on Computer-Based Medical Systems, Salt Lake City, USA, 22-23 June 2006, 611-616.

K.L. Ates, K. Zhang, and B. Prabhakaran, Visual Querying on Human Motion for the Disabled, Proc. 2006 IEEE Symposium on Visual Languages and Human-Centric Computing (VL/HCC’06), Brighton, UK, 4-8 September 2006, IEEE CS Press, 222-223.

K. Ates, J. Kukluk, L. Holder, D. Cook, and K. Zhang, Graph Grammar Induction on Structural Data for Visual Programming, Proc. 18th IEEE International Conference on Tools with Artificial Intelligence (ICTAI'06), Washington D.C., USA, 13-15 November 2006, 232-239.

K. Zhang, G.L. Song, and J. Kong, Interoperating XML-Style of Digital Artifacts for Information Reuse, Proc. 2005 IEEE International Conference on Information Reuse and Integration (IRI’05), Las Vegas, USA, 15-17 August 2005, IEEE Press, 126-131.

G.L. Song, K. Zhang, B. Thuraisingham, and J. Cao, Towards Access Control for Visual Web Model Management, Proc. 2005 IEEE International Conference on e-Technology, e-Commerce and e-Service (EEE'05), Hong Kong, China, 29 March - 1 April 2005, IEEE CS Press, 722-727.

G.L. Song, K. Zhang, B. Thuraisingham, and J. Kong, Secure Model Management Operations for the Web, In: S. Jajodia and D. Wijesekera (Eds.) Data and Application Security XIX - Proc. 19th Annual IFIP WG 11.3 Working Conference on Data and Applications Security, Storrs, USA, 7-10 August 2005, LNCS 3654, Springer-Verlag, 237-251.

G.L. Song, K. Zhang, and J. Kong, Model Management Through Graph Transformations, Proc. 2004 IEEE Symposium on Visual Languages and Human-Centric Computing (VL/HCC'04), Rome, Italy, 26-29 September 2004, IEEE CS Press, 75-82.

J. Kong and K. Zhang, Parsing Spatial Graph Grammars, Proc. 2004 IEEE Symposium on Visual Languages and Human-Centric Computing, Rome, Italy, 26-29 September 2004, IEEE CS Press, 99-101.

J. Kong and K. Zhang, On a Spatial Graph Grammar Formalism, Proc. 2004 IEEE Symposium on Visual Languages and Human-Centric Computing, Rome, Italy, 26-29 September 2004, IEEE CS Press, 102-104.

G.L. Song, K. Zhang, R.K. Wong, and J. Kong, Management of Web Data Models Based on Graph Transformation, Proc. 2004 IEEE/WIC/ACM International Conference on Web Intelligence (WI'04), Beijing, China, 20-24 September 2004, IEEE CS Press, 398-404.

G.L. Song and K. Zhang, Visual XML Schemas Based on Reserved Graph Grammars, Proc. International Conference on Information Technology (ITCC’04), Las Vegas, USA, 5-7 April 2004, IEEE CS Press, 687-691.

J. Kong, M.K. Qiu, and K. Zhang, Authoring Multimedia Documents Through Grammatical Specifications, Proc. 2003 IEEE International Conference on Multimedia & Expo (ICME'2003), Baltimore, USA, 6-9 July 2003, IEEE CS Press, 629-632.

M.K. Qiu, G.L. Song, J. Kong, and K. Zhang, Spatial Graph Grammars for Web Information Transformation, Proc. 2003 IEEE Symposium on Visual/Multimedia Languages (VL'03), Auckland, New Zealand, 28-31 October 2003, IEEE CS Press, 84-91.

J. Kong and K. Zhang, Graph-based Consistency Checking in Spatial Information Systems, Proc. 2003 IEEE Symposium on Visual Languages and Formal Methods (VLFM'03), Auckland, New Zealand, 28-31 October 2003, IEEE CS Press, 153-160.

J. Kong and K. Zhang, Toward A Graphical Approach to Multimedia Document Design, Proc. 23rd International Conference on Distributed Computing Systems Workshops - 5th International Workshop on Multimedia Network Systems and Applications, Providence, USA, 19-22 May 2003, IEEE CS Press, 666-671.

Cited References

[1] K-H. Cheung, Y. Liu, A. Kumar, M. Snyder, M. Gerstein, and P. Miller, An XML Application for Genomic Data Interoperation, Proc. 2nd IEEE Symp. on Bioinformatics and Bioengineering, 4-6 Nov. 2001, 97-103.
[2] M. Erwig, A Visual Language for XML, Proc. 2000 IEEE Symp. on Visual Languages, Seattle, USA, 10-13 Sep. 2000, IEEE CS Press, 47-54.
[3] L. Milo and S. Zohar, Using Schema Matching to Simplify Heterogeneous Data Translation, Proc. 24th Int. Conf. on Very Large Databases, New York City, USA, 24-27 August 1998, 122-133.
[4] W.M. Shui and R.K. Wong, Application of XML Schema and Active Rules System in Management and Integration of Heterogeneous Biological Data, Proc. 3rd IEEE Symp. on Bioinformatics and Bioengineering, 10-12 March 2003, 367-374.
[5] H. Su, H. Kuno, and E.A. Rundensteiner, Automating the Transformation of XML Documents, Proc. 3rd Int. Workshop on Web Information and Data Management, Atlanta, USA, 9 Nov. 2001, ACM Press, 68-75.