Course syllabus

COMPSCI 752: Big Data Management

2019 Semester 1

15 points

Recommended preparation: 

  • COMPSCI 351 or equivalent

Prerequisite: 

  • Departmental approval

Course coordinator:

  • Professor Sebastian Link

Course lecturers:

  • Dr Ninh Pham
  • Professor Sebastian Link

Timetable:

  • Tue 14:00-16:00, ClockT032/105-032
  • Fri 11:00-13:00, ClockT032/105-032

Tutorial (only in selected weeks):

  • Mon 10:00-11:00, 119-G30
  • Mon 13:00-14:00, 119-G30

Course prescription:

Many companies must manage large volumes of diverse data in order to stay competitive. The deep diversity of modern-day data requires data scientists to master many technologies that rely on new principles to represent, describe, and access data. The course provides insight into the rich landscape of big data. Its main aim is to prepare students for big data modelling and large-scale data management in distributed and heterogeneous environments. On the one hand, learning the principles of big data management will prepare students for a career as data scientists, independently of continuous technology changes. In particular, modelling, querying, and integrating big data are necessary skills for getting data ready for analytical purposes. For example, MapReduce algorithms such as PageRank illustrate how to rank billions of Web pages efficiently. On the other hand, investigating current big data technologies will demonstrate what is and what is not possible today, and will also highlight opportunities for future work. For instance, Spark offers an integrated technology framework for preparing and analysing big data, while the disruptive Blockchain technology exemplifies a distributed computing system with high fault tolerance whose application potential we are only beginning to understand.

Course outcomes:

  • (LO1) Apply state-of-the-art representation formalisms for big data, including the eXtensible Markup Language (XML), the Resource Description Framework (RDF), the JavaScript Object Notation (JSON), and NoSQL (Disciplinary Knowledge and Practice, Critical Thinking, Solution Seeking)
  • (LO2) Model big data with schema languages, including Document Type Definitions (DTDs), XML Schema, JSON Schema, and RDF Schema (Disciplinary Knowledge and Practice, Critical Thinking, Solution Seeking)
  • (LO3) Access big data with query languages, including XPath, XQuery, XSLT, SPARQL, Hive, and Spark SQL (Disciplinary Knowledge and Practice, Critical Thinking, Solution Seeking)
  • (LO4) Integrate big data with ontologies, including the Web Ontology Language OWL (Disciplinary Knowledge and Practice, Critical Thinking, Solution Seeking)
  • (LO5) Understand how to manage and analyse big data, including techniques for searching, indexing and processing such as PageRank, MapReduce, Spark and Blockchain technology (Disciplinary Knowledge and Practice, Critical Thinking, Solution Seeking)
  • (LO6) Present, as a group, slides to fellow students and teachers on the group's joint understanding of the state of the art on a big data topic (Disciplinary Knowledge and Practice, Critical Thinking, Communication and Engagement, Independence and Integrity)
  • (LO7) Communicate an individual understanding of the state of the art on a big data research topic in a written report, including the fair use of this knowledge in business and society (Disciplinary Knowledge and Practice, Critical Thinking, Solution Seeking, Communication and Engagement, Independence and Integrity, Social and Environmental Responsibilities)

Topics:

  • Extensible Markup Language (XML), XPath, XQuery, XUpdate, XSLT, Tree automata
  • RDF, RDFS, Ontologies, OWL, SPARQL
  • Data integration
  • Web search
  • Data distribution and distributed computing
  • Hadoop, MapReduce, Pig
  • Spark
  • Blockchain

Textbook:

  • Web Data Management. Serge Abiteboul, Ioana Manolescu, Philippe Rigaux, Marie-Christine Rousset, Pierre Senellart; Cambridge University Press, 2011. http://webdam.inria.fr/Jorge/

Related Reading:

  • Big data analytics with Spark. Mohammed Guller; Apress, 2015.
  • Blockchain basics. Daniel Drescher; Apress, 2017.
