Growing collection of projects and amazing collaborators.


10 years later: cloud computing is closing the performance gap: Computational sciences, such as computational cosmology, provide an important pool of first-class, large-scale problems. Although scientists have historically relied on the massive computational resources of High-Performance Computing (HPC) systems to run their software, in the last decade cloud computing has gained popularity among scientists as an alternative to HPC. Despite the extensive literature comparing HPC and cloud systems and promising results, the question of whether cloud computing can match HPC performance for a wide range of scientific applications is still open, and the answer may lead to a better design of cloud systems to promote scientific computing in the cloud.

pdf | slides pdf | slides pptx | video

Parallel de novo genome assembly: One of the most computationally intensive tasks in computational biology is de novo genome assembly, the decoding of the sequence of an unknown genome from redundant and erroneous short sequences. A common assembly paradigm identifies overlapping sequences, simplifies their layout, and creates consensus. Despite many algorithms developed in the literature, efficient assembly of large genomes is still an open problem. In this work, we present new parallel distributed memory algorithms for overlap detection and layout simplification steps for de novo genome assembly and implement them in diBELLA 2D. Our distributed memory algorithms for both overlap detection and layout simplification are based on linear-algebra operations over semirings using 2D distributed sparse matrices. In addition, other theoretical contributions can be found in the work BELLA.

pdf | slides pdf | more slides pdf | slides pptx | github | video

Efficient long-read overlap detection and alignment: In this work we introduce BELLA (the shared memory version of diBELLA 2D) and demonstrate the feasibility of the k-mer-based approach through a mathematical model based on Markov chains. Our work provides a novel algorithm for pruning k-mers, which are unlikely to be useful for overlap detection and whose presence would only add unnecessary computational cost. Our reliable k-mers detection algorithm explicitly maximizes the probability of retaining k-mers that belong to unique regions of the genome. The optimal k-mer seed is selected using our binning mechanism, which uses k-mer positions within a read pair to estimate the length of overlap and “bin” k-mers to form a consensus. Finally, we develop and implement a new method to separate true alignments from false positives as a function of alignment score based on Chernoff bounds, and show that the probability of false positives decreases exponentially as the overlap length between sequences increases.

pdf | slides pdf | github | video

Distributed protein similarity search using sparse matrices: Identification of similar protein sequences is a central step in many computational biology pipelines, such as homologous protein sequence recognition, generation of similar protein graphs for downstream analyses, functional annotation, and gene location. The performance and scalability of protein similarity search has proven to be a bottleneck in many bioinformatics pipelines due to the increase of cheap and abundant sequencing data. This work introduces a new distributed memory software: PASTIS, which is based on sparse matrix calculations for efficient identification of possibly similar proteins.

pdf | github | video

BuDDI: a declarative Bloom language for CALM programming: In this paper, we show through examples that Bloom is a suitable language beyond classical distributed computing. Based on these examples, we refine the current Bloom implementation, called Bud, and introduce a new language called BuDDI: Bud to enhance Distributed Data Independence. BuDDI hides the challenges of distributed computing from the programmer without compromising performance by using logically global tables.


A high-performance GPU-based x-drop long-read alignment algorithm: Pairwise alignment is one of the most computationally intensive cores in genomic data analysis, accounting for more than 90% of the runtime for important bioinformatics applications. This method is particularly expensive for third generation sequences, as the analysis of sequences between 1Kb and 1Mb in length is very computationally intensive. Given the quadratic overhead of exact pairwise algorithms for long alignments, the community relies primarily on approximate algorithms that search only for high-quality alignments and stop early when none is found. In this work, we present the first GPU optimization of the popular x-drop alignment algorithm that we named LOGAN.

pdf | github | video

On how to improve FPGA-based systems design productivity via SDAccel: Custom hardware accelerators are often used to improve the performance of software applications in terms of execution times and reduce energy consumption. However, implementing a hardware accelerator and integrating it into the final system is a difficult and error-prone task. Both industry and academia are continuously developing Computer Aided Design (CAD) tools to assist the designer in the development process. The latest tool released by Xilinx aims to improve the hardware design experience by using the OpenCL standard to increase overall productivity and enable code portability. This work provides an overview of the potential of SDAccel and compares the design flow with other methods using two case studies from the field of bioinformatics.


A hardware acceleration of a protein folding algorithm: Protein folding is the physical process by which a sequence of amino acids in a protein folds into its tertiary structure, which determines the functionality of the protein. There are several methods to perform this process, one of the most interesting is ab initio modeling. This method generates the 3D structure from energetic and geometric features. However, despite the potential, companies are slowed down by the high computational requirements of the algorithm. Speeding up the execution time would be crucial to increase productivity in such industries. In this work, we present an accelerated implementation of a ab initio protein folding algorithm, which is based on Monte Carlo simulation.



  • Vivek Bharadwaj (UC Berkeley)
  • Aydin Buluç (UC Berkeley, LBNL)
  • David Culler (UC Berkeley, Google)
  • Lorenzo Di Tucci (Politecnico di Milano, Huxelerate)
  • Saliya Ekanayake (Microsoft)
  • Marquita Ellis (IBM)
  • Can Firtina (ETH Zürich)
  • Rolando Garcia (UC Berkeley)
  • Elizabeth Koning (UIUC)
  • Richard Lettich (UC Berkeley)
  • Israt Nisa (AWS AI)
  • Leonid Oliker (LBNL)
  • Gabriel Raulet (LBNL)
  • Daniel Rokhsar (UC Berkeley, Joint Genome Institute)
  • Marco D. Santambrogio (Politecnico di Milano)
  • Oguz Selvitopi (LBNL)
  • Katherine Yelick (UC Berkeley, LBNL)
  • Ed Younis (LBNL)
  • Alberto Zeni (Politecnico di Milano)