Integrative Bioinformatics Unit

The Integrative Bioinformatics Unit (IBU) is part of the Swammerdam Institute for Life Sciences at the Faculty of Science of the University of Amsterdam. The unit emerged from the Microarray Department of the Faculty in 2003. Its main research efforts are:

  • Development of a microarray bioinformatics pipeline
  • Application of e-BioScience, such as the e-BioScience Laboratory
  • Design for genomics experimentation
  • Integration of heterogeneous genomics data

Microarray experiments produce vast amounts of data, so extensive bioinformatics infrastructure, methods, and expertise are needed to handle these data effectively. The bioinformatics associated with transcriptomics involves data handling (storage and exchange), data preprocessing (normalization and validation), and data analysis (hierarchical clustering, biomarker selection, etc.). Furthermore, transcriptomics data have to be combined with other biological -omics data (integration). To address these issues, the IBU has grown into a bioinformatics research group of about 7 staff members with expertise in experimental biology, bioinformatics, informatics, physics, system design, and mathematics. We apply a multidisciplinary and collaborative approach to life sciences and bioinformatics research, based on sound expertise in transcriptomics.
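As an illustration of the normalization step mentioned above, the sketch below shows quantile normalization, one widely used way to make intensity distributions comparable across arrays. The text does not say which normalization method the IBU uses, so the choice of method, the function name `quantile_normalize`, and the toy values are all assumptions made for illustration only.

```python
def quantile_normalize(arrays):
    """Quantile-normalize a list of arrays (hypothetical helper).

    arrays: list of equal-length lists, one list of probe intensities
    per microarray. Returns new lists in which every array shares the
    same distribution of values. Ties are broken by probe order.
    """
    n_probes = len(arrays[0])
    if any(len(a) != n_probes for a in arrays):
        raise ValueError("all arrays must have the same number of probes")

    # Mean intensity at each rank, taken across all arrays.
    sorted_arrays = [sorted(a) for a in arrays]
    rank_means = [
        sum(s[rank] for s in sorted_arrays) / len(arrays)
        for rank in range(n_probes)
    ]

    normalized = []
    for a in arrays:
        # Probe indices ordered from lowest to highest intensity.
        order = sorted(range(n_probes), key=lambda i: a[i])
        out = [0.0] * n_probes
        for rank, probe in enumerate(order):
            # Replace each value by the cross-array mean for its rank.
            out[probe] = rank_means[rank]
        normalized.append(out)
    return normalized


# Toy example: two arrays, three probes each (made-up values).
toy = [[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]
result = quantile_normalize(toy)
```

After normalization every array contains the same set of values, and only the assignment of values to probes differs between arrays.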

Development of a microarray bioinformatics pipeline

Now that microarray technology and analysis have come of age, it is possible to implement de facto standards for the storage and analysis of data and results. This allows us to carry out microarray analyses in a truly high-throughput fashion. Our microarray bioinformatics pipeline consists of three parts:

  1. Tools for easy storage of raw data from different microarray platforms, together with the laboratory protocols used (including on-the-fly generation of MAGE-ML files) and quality metrics
  2. A pipeline for statistical validation and downstream analysis, such as re-annotation, extensive testing of contrasts, gene set enrichment analysis, and network analysis. This pipeline is R-centered and can be used in the e-BioScience Laboratory
  3. A compendium for analysis results in which we can integrate results from in-house, as well as from external microarray experiments
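To give a flavour of the enrichment step listed under part 2, the sketch below computes a hypergeometric over-representation p-value for a gene set. This is a simpler technique than rank-based gene set enrichment analysis, and it is written in Python rather than in R, so it is not the IBU pipeline itself. The function name `enrichment_p_value` and all numbers are hypothetical.

```python
from math import comb


def enrichment_p_value(universe_size, set_size, n_selected, n_overlap):
    """Upper-tail hypergeometric p-value (hypothetical helper).

    universe_size: number of genes measured on the array
    set_size:      number of those genes that belong to the gene set
    n_selected:    number of genes called differentially expressed
    n_overlap:     number of selected genes that are in the gene set

    Returns the probability of seeing at least n_overlap gene-set
    members among n_selected genes drawn at random without replacement.
    """
    total = comb(universe_size, n_selected)
    max_overlap = min(set_size, n_selected)
    tail = sum(
        comb(set_size, k) * comb(universe_size - set_size, n_selected - k)
        for k in range(n_overlap, max_overlap + 1)
    )
    return tail / total


# Made-up numbers: 10,000 genes on the array, a gene set of 100 genes,
# 200 selected genes, 8 of which fall in the gene set.
p = enrichment_p_value(10_000, 100, 200, 8)
```

A small p-value suggests that the gene set is over-represented among the selected genes. In practice, p-values from many gene sets would also need correction for multiple testing.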

Application of e-BioScience, such as the e-BioScience Laboratory

A major obstacle in (transcript)omics research is dealing with the volume and diversity of the data generated. Enhanced-science (e-science) approaches, based on (remote) collaboration and the reuse of data and methods, and supported by a virtual laboratory (VL) environment, promise to remove this obstacle. VLs include Grid computation and data communication as well as generic and domain-specific tools and methods for information management, knowledge extraction, and data analysis. Problem-solving environments (PSEs) are the domain-specific experimental environments of VLs. Our microarray bioinformatics pipeline is an example of such a PSE. Furthermore, we have built an actual laboratory, the e-BioScience Laboratory or e-BioLab, in which we can use these PSEs for the analysis and visualization of large biological data sets. For this purpose, the e-BioLab is equipped with advanced visualization end-points, such as a large high-resolution tiled display and electronic whiteboards. The e-BioLab is particularly suited for project groups to discuss results and address biological research questions in an interactive and multidisciplinary setting.

Design for genomics experimentation

Genomics experiments are expensive in both consumables and data analysis. Therefore, most genomics experiments use experimental designs with few measuring points and limited biological replicates. At the same time, because expectations run high, life scientists seek to answer many "big" biological questions in every genomics experiment. This approach severely reduces the chance of success of any genomics experiment. Especially in genomics experiments that aim to unravel cellular mechanisms, a radical adjustment of genomics experimentation seems imperative. We will study how new experimental designs affect the outcome of genomics experiments. Key elements are: i) increasing resolution by adding more measuring points; ii) truly integrating the axes of time, space, and molecule. Although our aim is to establish proof-of-principle, we will combine this research with new developments in high-throughput genomics experimentation to make this approach scalable to "real-life" applications.

Integration of heterogeneous genomics data

Biological networks are often approached on a single molecular level, e.g. protein networks. However, cells are organized as networks that include many, if not all, molecular levels. It seems impossible to unravel any cellular network without including many molecular levels. Our aim to renew the design for genomics experimentation therefore brings with it the need to interpret these multilevel results. For this, we are setting up a key experiment in which we will analyze as many molecular levels as we are able to, while staying true to our philosophy of ample measuring points in time and space. The applicable molecular levels are: DNA sequence, DNA methylation, histone modification, mRNA presence, mRNA degradation, microRNA presence, protein presence, and protein complex formation. For a small biological event, we will try to determine the molecular components involved and their interactions. In this way, we aim to unravel a small but clear set of related cellular mechanisms.