RAINBOW homepage
Web made by Vojtech Svatek
Project description

The project is currently supported by grant no. 201/03/1318 of the Czech Science Foundation (CSF), "Intelligent analysis of web content and structure" (2003-2005). The goal of the project is to develop a flexible architecture for knowledge-based analysis of the WWW. The Rainbow system employs web service and semantic web technologies to analyse the content and structure of legacy websites and to present the results to a user or a computer agent. The analysis of a website is multiway (see paper Sv03d): several analysis services examine the same site, and their results are integrated. A distinctive feature of the analysis services is their systematic categorisation along four dimensions: abstract type of task (classification, retrieval, extraction), type of ‘current’ object (e.g. document, hyperlink, image), type of analysed data (e.g. free text, HTML tags, link topology, image data), and problem domain (e.g. bicycle sales). This four-dimensional approach is captured by the so-called ‘task-object-datatype-domain’ (TODD) knowledge-level framework (see paper Sv04c) and by an associated collection of ontologies (see paper La03).

As of summer 2005, the Rainbow system comprises the following analysis services:
We also plan to integrate third-party tools. In addition, the infrastructure includes:
The source data repository is provided by the full-text and native XML database tool AmphorA, developed by the Amphora Research Group, TU Ostrava (a partner in the CSF project). Thanks to sophisticated XML and text indexing, it enables fast XML querying as well as text retrieval (see paper Kr05a). As the result repository, we use Sesame (by Aduna/Aidministrator, NL), drawing on the expertise of the Knowledge Representation and Reasoning Group at the Vrije Universiteit Amsterdam. The stored RDF facts are retrieved using the SeRQL query language (see the Applications section and paper Sv04b).
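To make the TODD categorisation and the SeRQL-based retrieval more concrete, the following is a minimal sketch in Python. It is not taken from the Rainbow code base. The names `ToddDescriptor` and `build_serql_query`, the namespace URI `http://example.org/rainbow#`, and the class name `Document` are all assumptions made for illustration; the actual Rainbow ontologies and queries are described in the cited papers. The sketch only builds the query text and does not contact a Sesame repository.

```python
from dataclasses import dataclass

# Hypothetical namespace: the real Rainbow ontology URIs are not given in the text.
RAINBOW_NS = "http://example.org/rainbow#"
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


@dataclass(frozen=True)
class ToddDescriptor:
    """Places one analysis service along the four TODD dimensions."""
    task: str      # abstract task type: classification, retrieval, extraction
    obj: str       # type of 'current' object, e.g. Document, Hyperlink, Image
    datatype: str  # analysed data, e.g. free text, HTML tags, link topology
    domain: str    # problem domain, e.g. bicycle sales


def build_serql_query(descriptor: ToddDescriptor, namespace: str = RAINBOW_NS) -> str:
    """Return SeRQL text selecting all resources typed as the descriptor's object.

    Assumes (hypothetically) that each object type is an RDF class in `namespace`.
    """
    return (
        "SELECT Resource\n"
        "FROM {Resource} rdf:type {rb:" + descriptor.obj + "}\n"
        "USING NAMESPACE\n"
        "    rdf = <" + RDF_NS + ">,\n"
        "    rb = <" + namespace + ">"
    )


# Example: a free-text document classifier for the bicycle sales domain.
service = ToddDescriptor(
    task="classification",
    obj="Document",
    datatype="free text",
    domain="bicycle sales",
)
query = build_serql_query(service)
print(query)
```

The resulting string could then be submitted to the result repository through whichever client interface the Sesame installation exposes; that step is left out here because it depends on the deployment.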