Future Internet Symposium 2009

Tutorial Date: 1 September 2009 (full day)

Web Service Crawling and Annotation


Part 1: Introduction to Web Service Crawling

Part 2: Web Service Annotations

Part 3: Annotating Web Services with GATE (including hands-on session)

Part 4: Annotating Web Services with MicroWSMO (including hands-on session)

The aim of this tutorial is to describe how semantic technologies can be employed to improve the large-scale discovery of Web Services. We concentrate on working with data dumps from a Web Service crawler that focuses only on service-relevant parts of the Web, on building unique service objects out of the data, and on semi-automatically annotating these services. Input to this tutorial comes from actual results of the FP7 European R&D projects Service-Finder and SOA4All. The tutorial will show in detail which kinds of data we have to deal with, i.e. WSDL services, Web APIs (a.k.a. RESTful services) and related Web documents. We will address the topic of document duplication and explain how to infer unique service objects from the data. In the next step we will show how to automatically extract information from Web documents (both structured and unstructured data) and use it for the semantic enrichment of the services. The automatic annotations, exemplified with the use of GATE, consist of both category annotations and generic service information annotations, described by means of specific service ontologies. We will introduce these ontologies and detail the semantic annotation in hands-on sessions. In the last step we will show how to use MicroWSMO to semantically annotate RESTful services.


Agenda


Time

Event

11:00

Introduction – Nathalie Steinmetz

11:15

Basics – Web Services and Crawling – Nathalie Steinmetz


The introduction will provide an overview of Web Services in general and will introduce the idea of crawling the Web for services. It will show how to build unique abstract service objects from the crawled Web Service data.

12:00

Service Ontologies – Nathalie Steinmetz


We will provide an overview of the different ontologies that are used within Service-Finder and SOA4All to store all service-related meta-data. This meta-data encompasses both functional and non-functional service information. We will work with these ontologies in the following hands-on sessions.

12:45

Crawl Data – Nathalie Steinmetz


The tutorial participants will be familiarized with the data that results from a Service Crawler. They will get access to real large data sets and will learn how to handle them. The data will be used in the later hands-on sessions when the participants learn how to provide semantic annotations for the services.

13:00

Lunch Break

14:00

Introduction to MicroWSMO – Maria Maleshkova


We will provide an introduction to the MicroWSMO ontology and the MicroWSMO editor and will explain how to use the ontology for annotation of RESTful services.

14:30

Introduction to GATE – Adam Funk


We will provide an introduction to the GATE framework and architecture and will explain how to use GATE to semantically annotate WSDL descriptions and related documents, an approach that is followed within the Service-Finder project. The introduction will be accompanied by a GATE demonstration.

15:30

Coffee break

16:00

GATE Hands-on Session – Adam Funk


The tutorial participants will first see a demonstration of GATE and will then, under the supervision of the tutorial presenters, apply the introduced methods and tools to provide semantic annotations of services.

16:30

MicroWSMO Editor Demo and Hands-on Session – Maria Maleshkova


The tutorial participants will first see a demonstration of the MicroWSMO editor for easy annotation of RESTful services and will then, under the supervision of the tutorial presenters, actively use the editor. They will learn how to recognize important information in RESTful service documents and how to annotate it accordingly.

17:30

Questions and Answers – Nathalie Steinmetz, Adam Funk and Maria Maleshkova


Final questions and answers session: the participants and the presenters will discuss open questions and wrap up the day.



Speakers

Nathalie Steinmetz is a student at the University of Innsbruck, where she has worked since 2005 with the Semantic Technology Institute (STI Innsbruck) in the area of the Semantic Web and Web Services as a junior researcher and project collaborator in national and international projects. At the beginning of 2008 she joined seekda GmbH, a start-up from STI Innsbruck, where she works as a researcher and project manager. Her main research interests are in technologies around the Semantic Web, Web Services, Service-Oriented Architectures and Search Engines, with a special focus on the discovery and location of (Semantic) Web Services.

Nathalie is involved in both the Service-Finder and the SOA4All projects. She mainly works on focused service crawling, trying to detect as many services and related documents on the Web as possible. The crawling encompasses both WSDL services and RESTful services. Unique service objects are built out of the crawled service data, and some basic metadata is collected. Nathalie was also involved in building some of the ontologies that are used within the two projects to store the service metadata.

Adam Funk received his PhD in computational linguistics from the University of Manchester in 2005, and worked at the National Centre for Text Mining in Manchester until he joined the NLP Research Group at the University of Sheffield in January 2006. At Sheffield, he has worked on several projects (in particular, SEKT, LIRICS, MUSING and Service-Finder) in the fields of semantic web development, implementation of linguistic annotation standards, information extraction, and business intelligence.

Adam is involved in the Service-Finder project, where he works on automatic service annotation. He takes WSDL files and related documents that result from a Service Crawler and analyses them to generate semantic descriptions of the services and their providers. He employs information extraction and machine learning techniques with the GATE library and architecture to categorize each service under concepts from the Service Category Ontology, to extract information such as contact details and pricing information, and to classify documents by genre and level of interest.

Maria Maleshkova is a PhD student focusing on supporting and enhancing the creation of semantic annotations over Web Services. In particular, her work focuses on providing tool support for the semantic annotation of RESTful services with MicroWSMO and on developing approaches for the automatic recommendation of service annotations.

Maria is involved in the SOA4All project, where she focuses on the development of the provisioning platform. Her main tasks within this project include developing an approach for the semi-automatic extraction of semantic descriptions of services, based on service-related documents and user feedback. In particular, she focuses on developing a methodology for recognizing service properties in text documents by extracting terms that are characteristic of a particular service description. In addition, she works on the development of a user editor for WSMO Lite and MicroWSMO, which will let the user edit and improve the resulting semantic service annotations by adding supplementary information and comments.


Acknowledgements

The organizers of the Web Service Crawling and Annotation tutorial gratefully acknowledge the contributions from the following European FP7 projects: