Course syllabus

COMPSCI 752: Web Data Management

2018 Semester 1

15 points

Recommended preparation: COMPSCI 351 or equivalent

Prerequisite: Departmental approval

Course coordinator: Professor Sebastian Link

Timetable

Tue 14:00-15:00, 303-101

Wed 10:00-12:00, 803-210   

Course prescription

The Internet and the Web have revolutionized access to information. Today, the Web consists primarily of HTML (the standard for the Web), but it also holds documents in PDF, DOC and plain-text formats, as well as images, music and videos. The public Web is composed of billions of pages on millions of servers. It is a fantastic means of sharing information and is very simple for humans to use. On the negative side, it is poorly suited to access by software applications. This motivated the introduction of new data models, namely XML and RDF, that are well suited to both humans and machines. The course aims to describe the structure of information found on the Web, and to explain how this information can be efficiently represented, described and accessed. Primary topics of the course include Web data modelling and large-scale data management in distributed and heterogeneous environments.

Course outcomes

  • Apply state-of-the-art representation formalisms for Web data, including the eXtensible Markup Language (XML) and the Resource Description Framework (RDF)
  • Model Web data with XML schema languages, including Document Type Definitions (DTDs), XML Schema and tree automata
  • Query Web data with XML and RDF query languages, including XPath, XQuery, XSLT, and SPARQL
  • Integrate Web data with ontologies, including RDF Schema (RDFS) and the Web Ontology Language (OWL)
  • Understand how to manage big data on the Web, including techniques for searching, indexing and processing, such as PageRank, MapReduce, Spark and blockchain technology

Topics

Extensible Markup Language (XML), XPath, XQuery, XUpdate, XSLT, Tree automata, RDF, RDFS, Ontologies, OWL, SPARQL, Data Integration, Web Search, Data distribution, Distributed computing, Hadoop, MapReduce, Pig, Spark, Blockchain

Textbook

Web Data Management. Serge Abiteboul, Ioana Manolescu, Philippe Rigaux, Marie-Christine Rousset, Pierre Senellart; Cambridge University Press, 2011. http://webdam.inria.fr/Jorge/

Related Reading

  • Big data analytics with Spark. Mohammed Guller; Apress, 2015.
  • Blockchain basics. Daniel Drescher; Apress, 2017.
