Course syllabus
COMPSCI 752: Web Data Management
2018 Semester 1
15 points
Recommended preparation: COMPSCI 351 or equivalent
Prerequisite: Departmental approval
Course coordinator: Professor Sebastian Link
Timetable
Tue 14:00-15:00, 303-101
Wed 10:00-12:00, 803-210
Course prescription
The Internet and the Web have revolutionized access to information. Today the Web consists primarily of HTML (the standard format for Web pages), but it also holds documents in PDF, DOC and plain-text formats, as well as images, music and videos. The public Web is composed of billions of pages on millions of servers. It is a fantastic means of sharing information and is very simple for humans to use. On the negative side, it is poorly suited to access by software applications. This motivated the introduction of new data models, namely XML and RDF, that are well suited to both humans and machines. The course aims to describe the structure of information found on the Web and to explain how this information can be efficiently represented, described and accessed. Primary topics of the course include Web data modelling and large-scale data management in distributed and heterogeneous environments.
Course outcomes
- Apply state-of-the-art representation formalisms for Web data, including the eXtensible Markup Language (XML) and the Resource Description Framework (RDF)
- Model Web data with XML schema languages, including Document Type Definitions (DTDs), XML Schema and tree automata
- Query Web data with XML and RDF query languages, including XPath, XQuery, XSLT, and SPARQL
- Integrate Web data with ontologies, including RDF Schema (RDFS) and the Web Ontology Language (OWL)
- Understand how to manage big data on the Web, including techniques for searching, indexing and processing such as PageRank, MapReduce, Spark and Blockchain technology
Topics
Extensible Markup Language (XML), XPath, XQuery, XUpdate, XSLT, Tree automata, RDF, RDFS, Ontologies, OWL, SPARQL, Data Integration, Web Search, Data distribution, Distributed computing, Hadoop, MapReduce, Pig, Spark, Blockchain
Textbook
Web Data Management. Serge Abiteboul, Ioana Manolescu, Philippe Rigaux, Marie-Christine Rousset, Pierre Senellart; Cambridge University Press, 2011. http://webdam.inria.fr/Jorge/
Related Reading
- Big data analytics with Spark. Mohammed Guller; Apress, 2015.
- Blockchain basics. Daniel Drescher; Apress, 2017.