The following list of topics contains references to papers. A seminar topic may comprise more than one paper. (This is the case if the papers are very short or if one of them merely helps to give a better overview.)

If a topic has enough material for more than one participant, the parts are separated by a -- mark.


What is Big Data?

Literature search.
Starting point:
Undefined by data: a survey of Big Data definitions
Jonathan Stuart Ward, Adam Barker


Principal Component Analysis

A tutorial on principal component analysis
Jonathon Shlens

Principal component analysis
Jake Lever, Martin Krzywinski, Naomi Altman


Hash Functions

Hashing techniques: a survey and taxonomy
Lianhua Chi, Xingquan Zhu


Cuckoo filter: practically better than Bloom
Bin Fan, David G. Andersen, Michael Kaminsky, Michael D. Mitzenmacher


Computational models: Streaming, Sketching, MapReduce

Sketching and streaming algorithms for processing massive data
Jelani Nelson


Google's MapReduce programming model - revisited
Ralf Lämmel


Evaluating MapReduce for multi-core and multiprocessor systems
Colby Ranger, Ramanan Raghuraman, Arun Penmetsa, Gary Bradski, Christos Kozyrakis


Support Vector Machines

Support vector machine - a survey
Ashis Pradhan

Applications of support vector machines for pattern recognition: a survey
Hyeran Byun, Seong-Whan Lee


Algorithms for Big Data


Dimensionality Reduction

The Johnson-Lindenstrauss lemma is optimal for linear dimensionality reduction
Kasper Green Larsen, Jelani Nelson


Bloom Filters

Network applications of Bloom filters: a survey
Andrei Broder, Michael Mitzenmacher


Overview paper (possibly needs some literature search).

An optimal Bloom filter replacement
Anna Pagh, Rasmus Pagh, S. Srinivasa Rao


Exact pattern matching with feed-forward Bloom filters
Iulian Moraru, David G. Andersen


Probabilistic Counting

Counting large numbers of events in small registers
Robert Morris

Approximate counting: a detailed analysis
Philippe Flajolet


Frequency Moments

Optimal approximations of the frequency moments of data streams
Piotr Indyk, David Woodruff


Graph Streaming

Graph stream algorithms: a survey
Andrew McGregor


Matrix Sketching

Simple and deterministic matrix sketching
Edo Liberty



Sublinear-Time Algorithms

Sublinear-time algorithms
Artur Czumaj, Christian Sohler




Theoretical analysis of the k-means algorithm - a survey
Johannes Blömer, Christiane Lammersen, Melanie Schmidt, Christian Sohler


Local search yields a PTAS for k-means in doubling metrics
Zachary Friggstad, Mohsen Rezapour, Mohammad R. Salavatipour


Heavy hitters via cluster-preserving clustering
Kasper Green Larsen, Jelani Nelson, Huy L. Nguyen, Mikkel Thorup


On Lloyd's algorithm: new theoretical insights for clustering in practice
Cheng Tang, Claire Monteleoni


The global k-means clustering algorithm
Aristidis Likas, Nikos Vlassis, Jakob J. Verbeek


Applications of Algorithms for Big Data


The application of principal component analysis to drug discovery and biomedical data
Alessandro Giuliani


Further Fields: Physics, Astronomy, Economics, etc.

Participants may choose to do a literature search on the use of algorithms for Big Data in other fields.

Working on such a topic requires independent judgement of the quality of the material.

If several participants want to work on such a topic, each has to choose a different scientific domain.


Frameworks and Tools

MapReduce, Hadoop

Parallel data processing with MapReduce: a survey
Kyong-Ha Lee, Hyunsik Choi, Bongki Moon


Overview of Tools Used in Practice

Literature search on tools such as
HAProxy, Elasticsearch, Logstash, Prometheus, Grafana


