Topics

The following list of topics contains references to papers. A single seminar topic may comprise more than one paper. (This is the case when the papers are very short or when one of them mainly serves to give a better overview.)

If a topic provides material for more than one participant, the parts are separated by the marker --.

Background

What is Big Data?

Literature search.
Starting point:
Undefined by data: a survey of Big Data definitions
Jonathan Stuart Ward, Adam Barker

 

Principal Component Analysis

A tutorial on principal component analysis
Jonathon Shlens

Principal component analysis
Jake Lever, Martin Krzywinski, Naomi Altman

 

Hash Functions

Hashing techniques: a survey and taxonomy
Lianhua Chi, Xingquan Zhu

--

Cuckoo filter: practically better than Bloom
Bin Fan, David G. Andersen, Michael Kaminsky, Michael D. Mitzenmacher

 

Computational models: Streaming, Sketching, MapReduce

Sketching and streaming algorithms for processing massive data
Jelani Nelson

--

Google's MapReduce programming model - revisited
Ralf Lämmel

--

Evaluating MapReduce for multi-core and multiprocessor systems
Colby Ranger, Ramanan Raghuraman, Arun Penmetsa, Gary Bradski, Christos Kozyrakis

 

Support Vector Machines

Support vector machine - a survey
Ashis Pradhan

Applications of support vector machines for pattern recognition: a survey
Hyeran Byun, Seong-Whan Lee

 

Algorithms for Big Data

Volume

Dimensionality Reduction

The Johnson-Lindenstrauss lemma is optimal for linear dimensionality reduction
Kasper Green Larsen, Jelani Nelson

 

Bloom Filters

Network applications of Bloom filters: a survey
Andrei Broder, Michael Mitzenmacher

--

Overview paper (possibly needs some literature search).

An optimal Bloom filter replacement
Anna Pagh, Rasmus Pagh, S. Srinivasa Rao

--

Exact pattern matching with feed-forward Bloom filters
Iulian Moraru, David G. Andersen

 

Probabilistic Counting

Counting large numbers of events in small registers
Robert Morris

Approximate counting: a detailed analysis
Philippe Flajolet

 

Frequency Moments

Optimal approximations of the frequency moments of data streams
Piotr Indyk, David Woodruff

 

Graph Streaming

Graph Stream Algorithms: A Survey
Andrew McGregor

 

Matrix Sketching

Simple and deterministic matrix sketching
Edo Liberty

 

Velocity

Sublinear-time algorithms

Sublinear-time algorithms
Artur Czumaj, Christian Sohler

 

Variety

Clustering

Theoretical analysis of the k-means algorithm - a survey
Johannes Blömer, Christiane Lammersen, Melanie Schmidt, Christian Sohler

--

Local search yields a PTAS for k-means in doubling metrics
Zachary Friggstad, Mohsen Rezapour, Mohammad R. Salavatipour

--

Heavy hitters via cluster-preserving clustering
Kasper Green Larsen, Jelani Nelson, Huy L. Nguyen, Mikkel Thorup

--

On Lloyd's algorithm: new theoretical insights for clustering in practice
Cheng Tang, Claire Monteleoni

--

The global k-means clustering algorithm
Aristidis Likas, Nikos Vlassis, Jakob J. Verbeek

 

Applications of Algorithms for Big Data

Biology

The application of principal component analysis to drug discovery and biomedical data
Alessandro Giuliani

 

Further Fields: Physics, Astronomy, Economics, etc.

Instead of a paper-based topic, participants may choose a literature search on the use of algorithms for Big Data in other scientific domains.

Working on such a topic requires judging the quality of the material independently.

If several participants choose such a topic, each must pick a different scientific domain.

 

Frameworks and Tools

MapReduce, Hadoop

Parallel data processing with MapReduce: a survey
Kyong-Ha Lee, Hyunsik Choi, Bongki Moon

 

Overview of Tools Used in Practice

Literature search on tools such as
HAProxy, Elasticsearch, Logstash, Prometheus, Grafana

 
