Algorithms for BIG DATA | DFG Schwerpunktprogramm 1736

You are here: Home | Projects

Staff:

Prof. Dr. Peter Sanders, Karlsruher Institut für Technologie

Prof. Dr. Ralf Mikut, Karlsruher Institut für Technologie

This email address is being protected from spambots. You need JavaScript enabled to view it., Karlsruher Institut für Technologie

This email address is being protected from spambots. You need JavaScript enabled to view it., Karlsruher Institut für Technologie

Description:

The project will develop methods for processing time series of high resolution 3D images. The goal is to obtain very high performance both with respect to result quality and computational efficiency on modern hardware.

We will demonstrate the performance of our methods using large real world inputs of developing zebrafish embryos generated by light-sheet microscopy. These datasets reach more than 10 Terabytes per embryo and will be two orders of magnitude larger than previously reported tools can handle. Large realistic, simulated inputs including ground truth will be used to quantitatively assess result quality.

Staff:

Prof. Dr. Anand Srivastav, Christian-Albrechts-Universität zu Kiel
Prof. Dr. Thorsten Reusch, GEOMAR Helmholtz-Zentrum für Ozeanforschung Kiel
Prof. Dr. Philip Caspar Rosenstiel, Universitätsklinikum Schleswig Holstein

Description:

Genome assembly is a central topic in modern life science. Increasing data is a major limitation for the applicability of presently available genome assembly software. Several approaches have been proposed to model the problem of de novo genome assembly as a combinatorial optimization problem in order to transfer the field's vast knowledge, but the literature does not reveal any major advantages of one approach over the other. Our focus will be on the development of holistic approaches, which try to encode all available information in one optimization problem, perhaps with parameters controlling certain aspects. Our investigation will be guided by the following questions: What kind of data is available or could be make available by modifications of the sequencing technique? How can we incorporate all data into an optimization problem? How can such an optimization problem be solved? How do solutions to the optimization problem look in practice (benchmarking, statistical tests)?

Staff:

Prof. Dr. Martin Skutella, Technische Universität Berlin

Description:

Within the past decade, routing problems in various transportation networks have experienced a dramatic explosion of the size of data involved. This phenomenon has various causes, one of them being the demand for models and solutions that also take the dynamic (i.e., time-dependent) development of flow in transportation networks into account. Prominent examples are route guidance systems for traffic networks, evacuation planning for major districts or entire cities, and planning problems in huge logistics networks. In the efficient algorithms and mathematical programming communities, fast methods for the solution of static (i.e., not time-dependent) routing problems have been developed over the past 50-60 years.

Staff:

Prof. Dr. Ulrich Meyer, Johann Wolfgang Goethe-Universität Frankfurt am Main

Description:

Big data is often characterized by the so called 3Vs: Volume (the amount of data), Variety (different types of data), and Velocity (the speed of input/output-data). In this proposal we are particularly interested in algorithmic challenges stemming from the combination of volume and velocity in the context of resource constraints: with a focus on huge graph problems that frequently arise in various application areas we aim to consider the possibilities and limitations of dynamically processing data after input updates, i.e., building on previous solutions instead of recomputing everything from scratch. Despite parallel computation power and efficient usage of memory hierarchies, given the sheer size of the data sets to be considered, we will often be forced to give up exact computations and rather consider effective ways to trade solution quality versus computational efforts, for example using embeddings to approximate shortest paths. Actually, not only high-frequency input data rates but also the computational platform itself may contribute to an online scenario.

Staff:

Prof. Dr. Yann Disser, Technische Universität Darmstadt
Dr. Max Klimm, Technische Universität Berlin

Webpage:

www.tu-berlin.de/?id=163770

Description:

The goal of this project is to deepen the understanding of algorithms that operate on very large networks and the dynamics that arise from the competition or cooperation between such algorithms. To achieve this goal, we want to combine models and techniques from the areas of graph exploration and algorithmic game theory. To date, the literature in these areas is mostly disjoint. By closing this gap, we hope to develop new insights into the important algorithmic and economic challenges faced in large networks, most prominently in those that are part of the Internet.

Staff:

Prof. Dr. Friedhelm Meyer auf der Heide, Universität Paderborn

Web page:

www.hni.uni-paderborn.de/alg/projekte/bigdata/

Description:

We currently observe rapidly growing interest in large systems of devices, each of which permanently observes data that has - often in real time - to be aggregated to useful information. Examples for such systems are (1) Information gathered by the smartphones of world-wide distributed user scan (and in some cases is already) used for aggregating information. Besides the monitoring done by the providers, also position based information like nearby restaurants, information about traffic ahead, nearby friends and many further kinds of information can be requested. (2) Cars generate sensor data (about, for example, their position, their speed, their environment, and other cars nearby) that is used in order to realise a self-organised management for driving in intersections or for passing on freeways. (3) Nodes of a network of an Internet Service Provider observe local usage of links, and work together in order to keep the the network in a healthy state. (4) Sensors or robots are deployed to some field, with the aim of aggregating useful information from observed data. Examples for aggregation are (weighted) average, minimum, maximum of measured temperature.

Staff:

Prof. Dr. Hannah Bast, Albert-Ludwigs-Universität Freiburg Niklas Schnelle , Albert-Ludwigs-Universität Freiburg

Niklas Schnelle

Description:

Within the DFG priority programme "Algorithm Engineering" (2007 - 2013), we have developed semantic full-text search, a deep integration of full-text and ontology search. This semantic search can cope with queries of the kind "german researchers who work on algorithms", which also work when part of the required information is contained in an ontology (e.g. profession and nationality), and part only in text documents (e.g. research interests).

Staff:

Prof. Dr. Susanne Albers, Technische Universität München

Description:

In this project we will study algorithmic techniques for energy savings in hardware environments, thereby supporting the processing of big data sets on the systems level. We will focus on the technique of dynamic speed scaling as the most promising approach currently known to conserve energy in microprocessor systems. The specific goal of this project is bring algorithmic results closer to practice.

Staff:

Prof. Dr. Peter Sanders, Karlsruher Institut für Technologie

Prof. Dr. Dorothea Wagner, Karlsruher Institut für Technologie

Dr. Christian Schulz, Uni Wien

The primary goal of the project is to tackle general graph partitioning problems on large graphs and their applications. Balanced graph partitioning for total cut minimization is only the tip of the iceberg of many related problems that have been less intensively studied but overall have at least an equally wide range of applications. While we expect many of the approaches from the basic problem to help with the more general ones, many new questions and difficulties arise.

Staff:

Prof. Dr. Henning Meyerhenke, Humboldt-Universität zu Berlin

Description:

Data-intensive computing is currently becoming a new column in science for generating new insights. This trend is due to---among others---gene sequencers, 3D microscopes, and the World Wide Web. These and other groundbreaking technologies generate massive data sets, many of which can be modeled as networks. The produced data sets contain valuable information hidden inside, waiting to be extracted and further processed with suitable analysis tools.

Staff:

Dr. Oliver Koch, Technische Universität Dortmund
Prof. Dr. Petra Mutzel, U Bonn

Description:

The aim of this project is the development of new and efficient algorithms for analysing huge molecule databases with up to one billion molecules with respect to biological activity. Thereby, we concentrate on molecular similarity search and molecular clustering, which are important tasks for substructure and virtual screening, similarity, diversity, and quantitative structure-activity relationship analysis within rational drug design.

Staff:

Dr. Matthias Mnich, TU Hamburg Harburg

Description:

The main research goal of this project is the quest for a rigorous mathematical theory of input-output efficient preprocessing. This new theory will develop the computational tools to design powerful algorithms for preprocessing very large instances of hard problems, that very efficiently compress those instances to smaller ones with guaranteed size. Our motivation is the incapability of current preprocessing routines with compression guarantee (kernelizations) to handle very large instances that do not fit into memory. The theory also seeks to rigorously explain the practical successes of preprocessing very large instances by algorithms without compression guarantee (heuristics), and will lead to a concept of computational intractability to explain the limitations of heuristics.

Staff:

Prof. Dr. Katharina A. Zweig, Technische Universität Kaiserslautern

Description:

The analysis of complex networks relies mostly on three different approaches: the identification of central nodes, the assignment of nodes to one or more groups of topologically similar other nodes, so-called clusters, and the identification of subgraphs occurring more often than expected by chance. Behind all of these approaches there are at least a dozen functions and measures to instantiate them, which all follow the same logic, which is, however, not always clearly defined. All of these approaches are today based on methods that make use of global properties of the graph like the distance between all pairs of nodes (O(nm+n^2 log n)), global ranking of similar node or edge pairs (O(n^2), or the comparison with random graph models of the same structure as the observed one (O(m log m)). Thus, none of these fundamental approaches can be directly transferred to the analysis of very large complex networks as needed today.

Staff:

Prof. Dr. Johannes Christian Fischer, Technische Universität Dortmund
Prof. Dr. Peter Sanders, Karlsruher Institut für Technologie

Description:

The aim of this project is to make general-purpos e text indices fit for data sizes in the multi-terabyte region. We want to make use of all aspects of parallel computing: local shared memory and global distributed computing. The local computations will also exploit techniques for succinct and external memory data structures. These techniques were previously only considered in isolation; this project is the first in the stringology community that integrates all of them and thereby being able to index really large texts.

Staff:

Prof. Dr. Dennis Hofheinz, ETH Zürich
Prof. Dr. Eike Kiltz, Ruhr-Universität Bochum

Description:

In our modern digital society, cryptography is vital to protect the secrecy and integrity of transmitted and stored information. Settings like digital commerce, electronic banking, or simply private email communication already rely on encryption and signature schemes. However, today's cryptographic schemes do not scale well, and thus are not suited for the increasingly large sets of data they are used on. For instance, the security guarantees currently known for RSA encryption -- the most commonly used type of encryption scheme -- degrade linearly in the number of users and cipher texts. Hence, larger settings (such as cloud computing, or simply the scenario of encrypting all existing email traffic) may enable new and more efficient attacks. To maintain a reasonable level of security in larger scenarios, RSA key lengths must be chosen significantly larger, and the scheme becomes very inefficient. Besides, a switch in RSA key lengths requires an update of the whole public key infrastructure, an impossibility in truly large scenarios. Even worse, when the scenario grows beyond an initially anticipated size, we may lose all security guarantees.

Staff:

Prof. Dr. Joachim Giesen, Friedrich-Schiller-Universität Jena

Dr. Sören Laue, Friedrich-Schiller-Universität Jena

Description:

Optimization problems are ubiquitous in science, engineering and economics. Thus, it is not surprising that optimization problems come in many different flavors. In our proposal we focus on large-scale convex optimization problems. Another important class are discrete optimization problems, like for instance computing shortest paths, minimum or maximum cuts, network flows, or vertex covers in the context of graph algorithms. Many discrete problems have relaxations as linear or semidefinite programs, or can even be cast as convex optimization problems. Relaxations are often an essential part of approximation algorithms for combinatorial optimization problems. Hence, discrete and combinatorial Big Data optimization problems can greatly benefit from generic, parallel and distributed convex optimization software.

Staff:

Prof. Dr. Marc Fischlin, Technische Universität Darmstadt
Prof. Dr. Alexander May, Ruhr-Universität Bochum

Description:

Since Edward Snowden's disclosures of massive abuse of huge amounts of data by NSA and other agencies, it is evident that every individual and every company which cares about the privacy of their data has to take measures to protect the data. This is even more true for outsourced data in cloud storage and cloud computing scenarios, and when handling big data through third parties such as via Amazon's Elastic MapReduce. Standard cryptographic means such as encryption in general do not work here because by the very nature of encryption, scrambling all reasonable information, the semantics of the data are hidden and cannot be used by third parties to perform operations; the option of decrypting the data for the operations would violate the idea of protecting the data from the service provider. To reconcile the need for security with the ability to outsource computations we thus need cryptography which is compatible with the desired operations.

Staff:

Prof. Dr. Ulrik Brandes, ETH Zürich
Prof. Dr. Dorothea Wagner, Karlsruher Institut für Technologie

Description:

We intend to devise novel methods to cluster large-scale static and dynamic online social networks. Our approach is based on skeleton structures that represent and amplify variation in local cohesion, and that are defined locally to facilitate efficient computation. In addition to simplifying the clustering problem, they shall also provide a novel understanding of community dynamics, capturing more directly the agency of social actors. Finding patterns in online social relationships and interactions is one of the most prominent applications of big data today, largely driven by the relative ease of data collection and its economic and political value.

We use cookies on our website. Some of them are essential for the operation of the site, while others help us to improve this site and the user experience (tracking cookies). You can decide for yourself whether you want to allow cookies or not. Please note that if you reject them, you may not be able to use all the functionalities of the site.

Ok