SECTION II. HIGH-THROUGHPUT DATA ANALYSIS

Differential expression (DE) analysis is the process of identifying genes whose expression shows a statistically significant difference between pre-specified groups of samples. DE is typically not the main objective of a single-cell experiment, because it requires predefined grouping information for the cells of interest. It is nevertheless common in scRNA-Seq experiments.

Dropout refers to the phenomenon in which a gene is observed to be abundantly expressed in one cell but is undetectable in another, as a consequence of transcript loss in the reverse-transcription step. More sophisticated algorithms have been developed for scRNA-Seq data to account for frequent dropout events and for biological variability within a cell population. One such model assumes that observed expression levels in scRNA-Seq data follow a mixture of two components: a negative binomial distribution for amplified genes, as proposed previously (Anders and Huber), and a low-mean Poisson distribution for dropout genes, as is observed for transcriptionally silenced genes.
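The following minimal Python sketch makes the mixture idea concrete. It assumes a single gene in one cell population with hypothetical, already-fitted parameters: a mixing weight for the dropout component, the low Poisson mean, and the negative binomial mean and size. It computes the posterior probability that an observed count came from the dropout component. It is not the fitting procedure of any specific tool.

```python
import math

def nb_pmf(k: int, mean: float, size: float) -> float:
    """Negative binomial pmf parameterised by its mean and size (dispersion) parameter."""
    p = size / (size + mean)  # probability parameter implied by mean and size
    log_pmf = (math.lgamma(k + size) - math.lgamma(size) - math.lgamma(k + 1)
               + size * math.log(p) + k * math.log(1.0 - p))
    return math.exp(log_pmf)

def poisson_pmf(k: int, lam: float) -> float:
    """Poisson pmf, evaluated on the log scale for numerical stability."""
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))

def dropout_posterior(k, w_dropout, lam_dropout, nb_mean, nb_size):
    """Posterior probability that count k came from the low-mean Poisson (dropout) component."""
    dropout = w_dropout * poisson_pmf(k, lam_dropout)
    amplified = (1.0 - w_dropout) * nb_pmf(k, nb_mean, nb_size)
    return dropout / (dropout + amplified)

# Hypothetical parameters for one gene in one cell population
params = dict(w_dropout=0.3, lam_dropout=0.1, nb_mean=50.0, nb_size=2.0)
for count in (0, 1, 5, 50):
    print(count, round(dropout_posterior(count, **params), 4))
```

Small counts are attributed mostly to the dropout component and large counts to the amplified (negative binomial) component. In a real analysis, the four parameters would be estimated from the data, for example by expectation-maximization.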

MAST is another scRNA-Seq differential expression method. It uses a two-part generalized linear model and adjusts for the fraction of cells that express a given gene (Finak et al.). Another challenge unique to scRNA-Seq is that some genes may exhibit bimodality: their expression levels across a group of cells concentrate around two modes instead of one.
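The two-part (hurdle) idea can be sketched as two likelihood-ratio tests. The first asks whether a gene is detected; the second compares its level among the cells that detect it. The Python below is a simplified two-group version written for illustration. It assumes log-transformed expression, with 0 meaning "not detected". It omits covariates such as the cellular detection rate that MAST adjusts for, and it is not MAST's implementation (MAST itself is an R package). The function names are hypothetical.

```python
import math

def _bernoulli_loglik(k: int, n: int) -> float:
    """Bernoulli log-likelihood of k detections in n cells at the MLE p = k / n."""
    if k == 0 or k == n:
        return 0.0
    p = k / n
    return k * math.log(p) + (n - k) * math.log(1.0 - p)

def _sum_sq(values):
    """Sum of squared deviations from the mean."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)

def hurdle_test(group_a, group_b):
    """Simplified two-part (hurdle) likelihood-ratio test for one gene; returns (stat, p_value)."""
    pos_a = [x for x in group_a if x > 0]
    pos_b = [x for x in group_b if x > 0]
    n_a, n_b = len(group_a), len(group_b)

    # Discrete part: does the fraction of cells expressing the gene differ between groups?
    lr_discrete = 2.0 * (_bernoulli_loglik(len(pos_a), n_a)
                         + _bernoulli_loglik(len(pos_b), n_b)
                         - _bernoulli_loglik(len(pos_a) + len(pos_b), n_a + n_b))
    df = 1

    # Continuous part: among expressing cells, does the mean log-expression differ?
    # Gaussian model with a shared variance: LR statistic = m * log(var_null / var_alt).
    lr_continuous = 0.0
    if pos_a and pos_b:
        m = len(pos_a) + len(pos_b)
        var_null = max(_sum_sq(pos_a + pos_b) / m, 1e-12)
        var_alt = max((_sum_sq(pos_a) + _sum_sq(pos_b)) / m, 1e-12)
        lr_continuous = m * math.log(var_null / var_alt)
        df = 2

    stat = max(lr_discrete + lr_continuous, 0.0)  # guard against tiny negative round-off
    if df == 2:
        p_value = math.exp(-stat / 2.0)              # chi-square(2) survival function
    else:
        p_value = math.erfc(math.sqrt(stat / 2.0))   # chi-square(1) survival function
    return stat, p_value

print(hurdle_test([0, 0, 0, 0, 1.0], [5.0, 6.0, 5.5, 6.5, 5.0]))
```

The continuous part is tested only when both groups contain expressing cells. When it is skipped, the combined test falls back to one degree of freedom.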

A beta-Poisson distribution was proposed to provide a more accurate differential expression analysis that captures bimodality (Vu et al.). Another tool, Monocle (Trapnell et al.), also supports differential expression testing. Finally, the BASICS workflow described earlier provides a criterion to detect highly or lowly variable genes within a single-cell dataset (Vallejos et al.).
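A quick simulation shows why a beta-Poisson model can capture bimodality. In this Python sketch, the parameter values are hypothetical. Each cell first draws an expression propensity p from a Beta(alpha, beta) distribution, then draws a count from Poisson(lam * p). With alpha and beta below 1, the Beta density is U-shaped, so cells pile up near zero and near the maximum rate. This illustrates the shape of the distribution only, not the estimation procedure of Vu et al.

```python
import math
import random

def sample_poisson(lam: float, rng: random.Random) -> int:
    """Knuth's multiplication method; adequate for the moderate means used here."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k

def sample_beta_poisson(n, alpha, beta, lam, seed=0):
    """Draw n counts: p ~ Beta(alpha, beta), then count ~ Poisson(lam * p)."""
    rng = random.Random(seed)
    return [sample_poisson(lam * rng.betavariate(alpha, beta), rng) for _ in range(n)]

# U-shaped Beta(0.3, 0.3): most cells are either nearly "off" or close to fully "on"
counts = sample_beta_poisson(5000, alpha=0.3, beta=0.3, lam=50.0)
low = sum(c <= 2 for c in counts) / len(counts)
middle = sum(15 <= c <= 25 for c in counts) / len(counts)
high = sum(c >= 30 for c in counts) / len(counts)
print(f"low: {low:.2f}  middle: {middle:.2f}  high: {high:.2f}")
```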

However, it is not clear which of these methods performs best in general. Various classical unsupervised approaches have been used to identify single-cell subgroups within a population.



K-means and other distance-based clustering algorithms, such as hierarchical clustering or Ward's method, are also widely used (Yan et al.; Jaitin et al.). One such approach iteratively combines PCA with K-means to produce a hierarchical tree of the cells. Euclidean distance and the Pearson and Spearman correlation coefficients have been popular distance metrics for these methods, though they may not be optimal choices (Pollen et al.).
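The three metrics differ in what they are sensitive to, which is one reason the choice matters. The Python sketch below uses two hypothetical cells, one with the same profile as the other at twice the sequencing depth. It computes the Euclidean distance and two correlation-based distances (1 minus the Pearson or Spearman coefficient).

```python
import math

def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def average_ranks(v):
    """1-based ranks; tied values receive the average of the ranks they span."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    ranks = [0.0] * len(v)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and v[order[j + 1]] == v[order[i]]:
            j += 1
        for t in range(i, j + 1):
            ranks[order[t]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman correlation is the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

cell_a = [0.0, 1.0, 5.0, 10.0]
cell_b = [0.0, 2.0, 10.0, 20.0]  # same profile, twice the depth
print("Euclidean:", euclidean(cell_a, cell_b))
print("1 - Pearson:", 1 - pearson(cell_a, cell_b))
print("1 - Spearman:", 1 - spearman(cell_a, cell_b))
```

Euclidean distance treats the depth difference as dissimilarity, while both correlation distances are essentially zero. Conversely, correlations ignore absolute expression levels, which may or may not be desirable.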

More sophisticated machine-learning algorithms have great potential to overcome some of the issues in scRNA-Seq functional analysis. A main issue in scRNA-Seq analysis is that, in general, the relationships between cells in gene expression data cannot be expressed as linear combinations (Buettner and Theis; Bendall et al.). Moreover, classical similarity measures such as cosine or Euclidean distance become less meaningful as dimensionality increases (Beyer et al.).
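The loss of contrast in high dimensions can be demonstrated numerically. This Python sketch draws uniformly distributed points, which stand in for cells and are not real expression data. It reports the relative contrast (max - min) / min of the Euclidean distances from a random query point. As the dimension grows, the nearest and farthest points become nearly equidistant.

```python
import math
import random

def relative_contrast(dim, n_points=200, seed=0):
    """(max - min) / min of Euclidean distances from a random query to uniform random points."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        dists.append(math.sqrt(sum((q - p) ** 2 for q, p in zip(query, point))))
    return (max(dists) - min(dists)) / min(dists)

for dim in (2, 10, 100, 1000):
    print(dim, round(relative_contrast(dim), 3))
```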

Spurious associations may arise when inappropriate metrics are used to search for nearest neighbors in noisy data (Balasubramanian and Schwartz). We describe the scRNA-Seq-specific algorithms below in the following order: dimension reduction, clustering, and other clustering variants. The datasets used to test these algorithms are listed in Table 2.

Table 2. Description of the main datasets for subpopulation and module detection analysis.

Among the dimension reduction methods, the zero-inflated factor analysis (ZIFA) algorithm is a new method that accounts for dropout events. It models the probability of gene dropout as an exponential function of the gene's mean expression (Pierson and Yau). Using a latent variable model based on factor analysis, ZIFA reduces the dimension of scRNA-Seq datasets while allowing each gene's observed expression to be exactly zero. As mentioned earlier, scLVM is another method for identifying cell subpopulations; it features the removal of confounding factors such as cell-cycle effects (Buettner et al.).
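The following is a minimal sketch of the dropout link. It assumes the squared-mean form p0 = exp(-lambda * mu^2) for ZIFA's exponential decay; treat the exact parameterization as our assumption. The helper `fit_decay` is hypothetical: it estimates lambda by least squares on the log scale from per-gene mean expression and observed zero fractions. ZIFA, by contrast, fits the decay jointly with the factor model.

```python
import math

def dropout_probability(mean_log_expr: float, decay: float) -> float:
    """Assumed ZIFA-style link: p0 = exp(-decay * mu**2)."""
    return math.exp(-decay * mean_log_expr ** 2)

def fit_decay(means, zero_fractions):
    """Least-squares fit of log(p0) = -decay * mu**2 through the origin.

    Genes with an observed zero fraction of 0 are skipped because log(0) is undefined.
    """
    num = den = 0.0
    for mu, z in zip(means, zero_fractions):
        if z > 0.0:
            num += -(mu ** 2) * math.log(z)
            den += mu ** 4
    return num / den

means = [0.5, 1.0, 1.5, 2.0, 3.0]                      # hypothetical per-gene means
zeros = [dropout_probability(m, 0.5) for m in means]   # noise-free toy zero fractions
print(round(fit_decay(means, zeros), 6))
```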

scLVM first computes a cell-to-cell covariance using a set of marker genes related to hidden biological factors of interest, such as the cell cycle. SIMLR is a new clustering method designed to learn a distance metric that best fits the structure of the data; it infers a distance function as a linear combination of several distance metrics (Wang et al.). It is designed to tackle the heterogeneity observed among single-cell datasets, which arises from both technological differences across platforms and biological differences across studies.
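The Python sketch below simplifies the first scLVM step: it computes a cell-by-cell covariance (a linear kernel) from centered marker-gene expression. scLVM itself fits a latent variable model to the marker genes, and the function name and toy data here are hypothetical.

```python
def marker_gene_covariance(expr, marker_idx):
    """Cell-by-cell covariance from a subset of marker genes.

    expr: list of cells, each a list of log-expression values (one per gene).
    marker_idx: column indices of the marker genes (e.g. cell-cycle genes).
    """
    n_cells = len(expr)
    n_genes = len(marker_idx)
    # Restrict to the marker genes and center each gene across cells.
    centered = []
    for g in marker_idx:
        col = [cell[g] for cell in expr]
        m = sum(col) / n_cells
        centered.append([v - m for v in col])
    # K[i][j] = (1 / n_genes) * sum over marker genes of centered_i * centered_j
    return [[sum(col[i] * col[j] for col in centered) / n_genes
             for j in range(n_cells)] for i in range(n_cells)]

# Three hypothetical cells; genes 0 and 1 are the markers, gene 2 is ignored.
expr = [[1.0, 0.0, 5.0], [2.0, 0.0, 5.0], [3.0, 0.0, 5.0]]
K = marker_gene_covariance(expr, [0, 1])
for row in K:
    print(row)
```

Cells with similar marker-gene profiles get large positive entries. A latent-variable model such as scLVM's would then treat this structure as a confounder to be removed.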

Another single-cell clustering approach is the analysis of scRNA-seq based on transcript-compatibility counts (AscTC). Here, read counts from an scRNA-Seq dataset are transformed into probabilities using transcript-compatibility counts rather than conventional transcript abundances (Ntranos et al.). Individual cells are then clustered using affinity propagation, a message-passing clustering algorithm. A few other hierarchical clustering approaches are worth mentioning. Geneteam is a multi-level recursive clustering method that searches for bipartitions of cells sharing exclusive expression profiles for a subset of genes (Harris et al.).
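Affinity propagation chooses exemplar cells by passing two kinds of messages over a similarity matrix: responsibilities and availabilities. The Python sketch below follows the standard update rules of Frey and Dueck. It uses negative squared distance as the similarity and the median similarity as the preference. These choices, and the toy one-dimensional data, are assumptions for illustration, not AscTC's configuration.

```python
def affinity_propagation(S, damping=0.7, iterations=300):
    """Affinity propagation on similarity matrix S; the diagonal holds the preferences.

    Returns one label per point: the index of the exemplar it is assigned to.
    """
    n = len(S)
    R = [[0.0] * n for _ in range(n)]  # responsibilities r(i, k)
    A = [[0.0] * n for _ in range(n)]  # availabilities a(i, k)
    for _ in range(iterations):
        # r(i,k) = s(i,k) - max over k' != k of [a(i,k') + s(i,k')]
        for i in range(n):
            vals = [A[i][k] + S[i][k] for k in range(n)]
            best = max(range(n), key=lambda k: vals[k])
            second = max(vals[k] for k in range(n) if k != best)
            for k in range(n):
                new = S[i][k] - (second if k == best else vals[best])
                R[i][k] = damping * R[i][k] + (1 - damping) * new
        # a(k,k) = sum over i' != k of max(0, r(i',k))
        # a(i,k) = min(0, r(k,k) + sum over i' not in {i,k} of max(0, r(i',k)))
        for k in range(n):
            pos_sum = sum(max(0.0, R[i][k]) for i in range(n) if i != k)
            for i in range(n):
                if i == k:
                    new = pos_sum
                else:
                    new = min(0.0, R[k][k] + pos_sum - max(0.0, R[i][k]))
                A[i][k] = damping * A[i][k] + (1 - damping) * new
    exemplars = [k for k in range(n) if R[k][k] + A[k][k] > 0]
    if not exemplars:  # fallback: keep the single most self-supported candidate
        exemplars = [max(range(n), key=lambda k: R[k][k] + A[k][k])]
    labels = []
    for i in range(n):
        if i in exemplars:
            labels.append(i)
        else:
            labels.append(max(exemplars, key=lambda e: S[i][e]))
    return labels

points = [0.0, 0.1, 0.2, 10.0, 10.1, 10.2]
S = [[-(a - b) ** 2 for b in points] for a in points]
off_diag = sorted(S[i][j] for i in range(len(points)) for j in range(len(points)) if i != j)
preference = off_diag[len(off_diag) // 2]  # median similarity, a common default
for k in range(len(points)):
    S[k][k] = preference
labels = affinity_propagation(S)
print(labels)
```

Unlike K-means, the number of clusters is not fixed in advance. It emerges from the preference values on the diagonal.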

Backspin is another divisive hierarchical clustering algorithm; it clusters both genes and cells (Zeisel et al.). Traditional clustering methods cannot infer the inherent lineage relationships between cells. Common approaches to cell lineage inference build a graph or a tree. Single cells are represented as nodes, and edges between cells indicate their similarity. Edge lengths are computed from a similarity matrix based on a given metric.

A denoising procedure is necessary before the graph is constructed. Cells in the resulting k-nearest-neighbor graph (kNNG) can then be compared using the geodesic distance, defined as the length of the shortest path between two nodes (Bendall et al.). Clustering can then be performed on the graph using community detection algorithms (Fortunato). Quasi-cliques, for instance, are communities of nodes that are densely but not necessarily fully connected. Several algorithms have been designed specifically for scRNA-Seq to infer a pseudotemporal ordering of single cells.
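The kNNG and geodesic-distance steps can be sketched in a few lines of Python. The sketch assumes Euclidean edge weights and a graph symmetrized by keeping an edge if either endpoint selects the other. It uses Dijkstra's algorithm for shortest paths. The U-shaped toy points are hypothetical: their two ends are close in Euclidean distance but far apart along the graph.

```python
import heapq
import math

def knn_graph(points, k):
    """Symmetric k-nearest-neighbour graph; edge weights are Euclidean distances."""
    n = len(points)
    graph = {i: {} for i in range(n)}
    for i in range(n):
        dists = sorted((math.dist(points[i], points[j]), j) for j in range(n) if j != i)
        for d, j in dists[:k]:
            graph[i][j] = d
            graph[j][i] = d  # symmetrize: keep the edge if either endpoint selects it
    return graph

def geodesic_distances(graph, source):
    """Dijkstra's algorithm: shortest-path (geodesic) distance from source to every node."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, math.inf):
            continue  # stale heap entry
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

# Hypothetical cells arranged along a U-shaped trajectory
points = [(0, 0), (0, 1), (0, 2), (0, 3), (1, 3), (2, 3),
          (3, 3), (4, 3), (4, 2), (4, 1), (4, 0)]
g = knn_graph(points, k=2)
print("Euclidean:", math.dist(points[0], points[-1]))
print("Geodesic:", geodesic_distances(g, 0)[len(points) - 1])
```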

Monocle is the first scRNA-Seq bioinformatics tool to infer the temporal ordering of single cells (Trapnell et al.). It builds a minimum spanning tree (MST), which connects all nodes of a graph using edges with minimal total weight. Monocle relies on the hypothesis that the longest path through the MST corresponds to the longest series of transcriptionally similar cells. In another approach, cells are first clustered using a model-based method before an MST is constructed, which reduces the complexity of the tree space (Ji and Ji). Embeddr uses the correlation between cells to construct a kNNG, then projects the cells into a low-dimensional embedding using Laplacian eigenmaps.
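The MST and longest-path idea can be sketched as follows. The sketch assumes Euclidean distances on already-reduced coordinates; Monocle's actual pipeline differs in its dimension reduction and path handling. Prim's algorithm builds the tree, and two farthest-node searches find its longest path, which serves as the ordering of the cells. The function names are hypothetical.

```python
import math

def minimum_spanning_tree(points):
    """Prim's algorithm on the complete graph of Euclidean distances; returns an adjacency list."""
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known edge weight connecting each node to the tree
    parent = [-1] * n
    best[0] = 0.0
    adj = {i: [] for i in range(n)}
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            w = math.dist(points[u], points[parent[u]])
            adj[u].append((parent[u], w))
            adj[parent[u]].append((u, w))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return adj

def farthest(adj, start):
    """Return (farthest node, path to it) from start in a tree, by weighted distance."""
    stack = [(start, -1, 0.0, [start])]
    far_node, far_dist, far_path = start, 0.0, [start]
    while stack:
        node, prev, d, path = stack.pop()
        if d > far_dist:
            far_node, far_dist, far_path = node, d, path
        for nxt, w in adj[node]:
            if nxt != prev:
                stack.append((nxt, node, d + w, path + [nxt]))
    return far_node, far_path

def longest_path_ordering(points):
    """Order cells along the MST diameter, found by two farthest-node searches."""
    adj = minimum_spanning_tree(points)
    end_a, _ = farthest(adj, 0)
    _, path = farthest(adj, end_a)
    return path

cells = [(3.0,), (0.0,), (4.0,), (1.0,), (2.0,)]  # hypothetical 1-D coordinates, shuffled
print(longest_path_ordering(cells))
```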

The pseudotemporal ordering is then fitted using principal curves (Campbell et al.). Embeddr aims to overcome two drawbacks of Monocle: gene expression is modeled as a linear combination, and the result is highly sensitive to outliers. Since visualization is key to understanding reconstructed single-cell trajectories, better visualization algorithms are as important as methods that reconstruct single-cell microevolution.

Another approach, derived from diffusion maps, allows one to visualize clear bifurcation events among cells that may be missed by independent component analysis (ICA) or t-SNE (Haghverdi et al.). Several approaches have been designed to exploit datasets with explicit temporal information. One such method assumes that the switch between cell states is a stochastic punctual process. To infer the cellular hierarchy, it iteratively divides cells using the k-means algorithm. A gap statistic determines whether a bifurcation event should occur.
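The bifurcation decision can be illustrated with a simplified gap statistic in one dimension, where 2-means can be solved exactly by trying every split of the sorted values. In this Python sketch, a split is accepted when Gap(2) - Gap(1), estimated against uniform reference data, exceeds one standard error of the reference values. This decision rule is a simplifying assumption, not the published procedure.

```python
import math
import random

def within_ss(xs):
    """Within-cluster sum of squares for a single cluster."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs)

def best_two_means_ss(xs):
    """Exact 1-D 2-means: optimal clusters are contiguous in sorted order, so try every split."""
    s = sorted(xs)
    return min(within_ss(s[:i]) + within_ss(s[i:]) for i in range(1, len(s)))

def log_ratio(xs):
    """log W1 - log W2: drop in log within-cluster dispersion from 1 to 2 clusters."""
    return math.log(within_ss(xs)) - math.log(best_two_means_ss(xs))

def should_split(xs, n_ref=20, seed=0):
    """Compare Gap(2) - Gap(1) with the spread of uniform-reference values."""
    rng = random.Random(seed)
    lo, hi = min(xs), max(xs)
    refs = [log_ratio([rng.uniform(lo, hi) for _ in xs]) for _ in range(n_ref)]
    ref_mean = sum(refs) / n_ref
    ref_sd = math.sqrt(sum((r - ref_mean) ** 2 for r in refs) / n_ref)
    gap_diff = log_ratio(xs) - ref_mean  # equals Gap(2) - Gap(1)
    return gap_diff > ref_sd * math.sqrt(1 + 1 / n_ref)

bimodal = [i * 0.01 for i in range(100)] + [10 + i * 0.01 for i in range(100)]
print(should_split(bimodal))
```

Applied recursively to each resulting cluster, a rule like this yields the binary tree of cell states.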

This process creates a binary tree, which can then be used to model gene expression dynamics (Marco et al.). Oscope, which does not require explicit temporal information, infers oscillatory genes among single cells collected from a single tissue (Leng et al.). It hypothesizes that these cells represent distinct states of an oscillatory process.

Oscope fits a two-dimensional sinusoidal function to each pair of genes, clusters gene pairs by frequency, and reconstructs the order of the cells in a cyclic fashion. However, Oscope cannot infer bifurcation events. Other models also consider the spatial organization of cells in a tissue. Seurat divides a tissue into distinct spatial bins, linked by the expression of landmark genes measured by RNA in situ hybridization.
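For two genes oscillating with the same frequency, the pair (x, y) traces an ellipse. Assume expression is scaled to [-1, 1] and follows the paired-sine form x = sin(t), y = sin(t + psi). Eliminating t then gives x^2 + y^2 - 2xy cos(psi) = sin^2(psi). The Python sketch below recovers the phase shift psi by grid search. The function names are hypothetical, and this is not Oscope's code.

```python
import math

def paired_sine_residual(x, y, psi):
    """Sum of squared residuals of x^2 + y^2 - 2xy*cos(psi) = sin(psi)^2."""
    c, s2 = math.cos(psi), math.sin(psi) ** 2
    return sum((a * a + b * b - 2 * a * b * c - s2) ** 2 for a, b in zip(x, y))

def fit_phase_shift(x, y, steps=3142):
    """Grid search for the phase shift psi in [0, pi] that best fits the paired-sine relation."""
    grid = [math.pi * i / steps for i in range(steps + 1)]
    return min(grid, key=lambda psi: paired_sine_residual(x, y, psi))

# Hypothetical noise-free gene pair sampled over one cycle, true phase shift 0.8 rad
ts = [2 * math.pi * i / 40 for i in range(40)]
gene_x = [math.sin(t) for t in ts]
gene_y = [math.sin(t + 0.8) for t in ts]
print(round(fit_phase_shift(gene_x, gene_y), 3))
```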

Within each bin, Seurat builds a mixture model using the expression values of correlated genes. A posterior probability is computed for each cell, and the cell is assigned to a bin accordingly. Another approach models the tissue as a 3D map and assumes that spatially close cells share common scRNA-Seq profiles (Pettit et al.). This method uses a hidden Markov random field to assign each bin of the map to a given cluster. Like Seurat, it takes as input spatial gene expression measurements from whole-mount in situ hybridization (WiSH), a confocal microscopy approach that detects mRNA bound to a fluorescent probe.
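The posterior assignment step can be sketched under several assumptions: binarized landmark-gene calls for each cell, and a hypothetical table giving, for each bin, the probability that each landmark gene is on. The sketch also assumes the genes are conditionally independent given the bin and places a uniform prior over bins. This is a simplified Bayesian assignment in the spirit of the description above, not Seurat's implementation.

```python
import math

def bin_posteriors(cell_binary, bin_on_prob, prior=None):
    """Posterior probability that a cell originates from each spatial bin.

    cell_binary: 0/1 calls for each landmark gene in one cell.
    bin_on_prob: per bin, the probability (strictly between 0 and 1) that each gene is "on".
    Genes are treated as conditionally independent Bernoulli variables given the bin.
    """
    n_bins = len(bin_on_prob)
    prior = prior or [1.0 / n_bins] * n_bins
    log_post = []
    for b, probs in enumerate(bin_on_prob):
        ll = math.log(prior[b])
        for call, p in zip(cell_binary, probs):
            ll += math.log(p if call else 1.0 - p)
        log_post.append(ll)
    top = max(log_post)  # subtract the max before exponentiating for numerical stability
    weights = [math.exp(v - top) for v in log_post]
    total = sum(weights)
    return [w / total for w in weights]

# Three hypothetical bins, each marked by a different landmark gene
bins = [[0.9, 0.1, 0.1], [0.1, 0.9, 0.1], [0.1, 0.1, 0.9]]
post = bin_posteriors([0, 1, 0], bins)
print([round(p, 3) for p in post])
```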

Compared with bulk-cell analysis, single-cell genomics can explore cellular processes at a finer resolution, but it is more vulnerable to noise. Experimental protocols still need to be perfected to deal with issues such as dropouts in gene expression and amplification biases. Deriving new analytical methods to reveal the complexity of scRNA-Seq data is just as challenging.

In this review, we have listed the bioinformatics algorithms dedicated to single-cell analysis. The first few steps of the scRNA-Seq analysis workflow resemble those of bulk-cell analysis: data pre-processing, batch removal, alignment, quality check, and normalization. The subsequent analyses, such as subpopulation detection and microevolution characterization, are largely unique to single cells (Figure 1).

Single-cell assays are increasingly popular, and the number of computational methods keeps growing. These methods therefore need to become more accessible to research groups without bioinformatics expertise. Moreover, datasets in which cell classes have already been characterized should be identified as benchmark data, in order to accurately assess the performance of new bioinformatics methods.

This will further increase the analytical challenges.

SECTION I. INTRODUCTION

Previous multi-omics bioinformatics tools applied to bulk samples could be leveraged. Graph and tensor approaches that integrate heterogeneous features in bulk samples may be good starting points for multi-dimensional single-cell data (Li et al.). Efforts should also be made toward developing computational methods that use spatial information, possibly guided by imaging in combination with scRNA-Seq (Pettit et al.). Foreseeably, this poses a broad spectrum of challenges, from developing more efficient aligners to better data storage and data-sharing solutions.

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Pooling across cells to normalize single-cell RNA sequencing data with many zero counts.
Differential expression analysis for sequence count data.
HTSeq—a Python framework to work with high-throughput sequencing data.
The isomap algorithm and topological stability.
Identifying and removing the cell-cycle effect from single-cell RNA-sequencing data.
Single-cell trajectory detection uncovers progression and regulatory coordination in human B cell development.
Scalable microfluidics for single cell RNA printing and sequencing.
Near-optimal probabilistic RNA-seq quantification.
Accounting for technical noise in single-cell RNA-seq experiments.
Computational analysis of cell-to-cell heterogeneity in single-cell RNA-sequencing data reveals hidden subpopulations of cells.
A novel approach for resolving differences in single-cell gene expression patterns from zygote to blastocyst.
Laplacian eigenmaps and principal curves for high resolution pseudotemporal ordering of single-cell RNA-seq profiles.
Pan-Cancer analyses reveal long intergenic non-coding RNAs relevant to tumor diagnosis, subtyping and prognosis.
Visualizing data using t-SNE.
Integrated genome and transcriptome sequencing of the same cell.
Normalization and noise reduction for single cell RNA-Seq experiments.
Systematic evaluation of spliced alignment programs for RNA-seq data.
Characterizing transcriptional heterogeneity through pathway and gene set overdispersion analysis.
RNA-Seq gene profiling-a systematic empirical comparison.
Community detection in graphs.
Single-Cell RNA-seq reveals activation of unique gene groups as a consequence of stem cell-parenchymal cell fusion.
Integrative single-cell transcriptomics reveals molecular networks defining neuronal maturation during postnatal neurogenesis.
Design and analysis of single-cell sequencing experiments.
Diffusion maps for high-dimensional single-cell analysis of differentiation data.
Co-detection and sequencing of genes and transcripts from the same single cells facilitated by a microfluidics platform.
Assessing similarity to primary tissue and cortical layer identity in induced pluripotent stem cell-derived cortical neurons through single-cell transcriptomics.
Molecular organization of CA1 interneuron classes.
A clustering algorithm based on graph connectivity.
Single-Cell triple omics sequencing reveals genetic, epigenetic, and transcriptomic heterogeneity in hepatocellular carcinomas.
Classification of low quality cells from single-cell RNA-seq data.
Adjusting batch effects in microarray expression data using empirical Bayes methods.
Bayesian approach to single-cell differential expression analysis.
Single-Cell mRNA sequencing identifies subclonal heterogeneity in anti-cancer drug responses of lung adenocarcinoma cells.
A microfluidic platform enabling single-cell RNA-seq of multigenerational lineages.
Deconstructing transcriptional heterogeneity in pluripotent stem cells.
Single cell analysis of cancer cells using an improved RT-MLPA method has potential for cancer diagnosis and monitoring.
Oscope identifies oscillatory genes in unsynchronized single-cell RNA-seq experiments.
Data-driven phenotypic dissection of AML reveals progenitor-like cells that correlate with prognosis.
Whole exome sequencing of circulating tumor cells provides a window into metastatic prostate cancer.
Bifurcation analysis of single-cell gene expression data reveals epigenetic landscape.

Here, we address this question by focusing on the recent, post-HGP history, and the reemergence of biochemical systems biology. In , over 1, such publications appeared in PubMed.

Frontiers | Single-Cell Transcriptomics Bioinformatics and Computational Challenges | Genetics

This was a time of great excitement and profound transformation in biology, brought about by increasingly efficient methods for DNA sequencing [35–37]. At the time, the call for the sequencing of the human genome was gaining momentum [38, 39], and in , the National Research Council of the US Academy of Sciences recommended the initiation of the Human Genome Project [39]. The HGP, completed a decade later, was an enormous success that validated the new discipline of genomics. It rallied the scientific community in unprecedented ways. Its reach extended from a global collaboration of 20 sequencing centers in six countries to new horizons in large-scale biology [39].

The momentum of the HGP spurred a plethora of genome-sequencing projects for other organisms, including plants, animals, and microorganisms. In the early phases, these projects focused mainly on mapping, sequencing, and identifying genes [40]. As the various genome-sequencing projects gathered momentum, it became clear that the collected genome sequences were revealing ever more hidden complexity and opening new and deeper biological questions [40, 41].

This simplistic view rested on the deterministic concept of a gene and its role in determining biological function and an organism's phenotype, a notion that was tacitly extended to the entire genome. The elusiveness of the gene concept has become fully apparent only in the last decade [44–48]. This realization came from the analysis of sequenced genomes and from extensive studies of the transcriptome with new techniques such as cap analysis of gene expression (CAGE) and tiling arrays [49, 50]. Several facts highlight the complexity of the relationship between an organism's phenotype and its genome. As a result, in the past five years, the concept of the gene has been the subject of substantial revisions [44–48].

Although temporarily overshadowed by the excitement about generating genome sequences, the true complexity of the relationship between an organism's genome and its phenotype was recognized early. Almost simultaneously, the need for an engineering mindset in molecular biology was argued in an influential and humorous article by a prominent biologist [55]. Such complex systems have long been of interest in physics and mathematics, and the direct relevance to biology of the knowledge accumulated in these disciplines was realized [58, 59].

Furthermore, systems biology practitioners can be arbitrarily divided into two camps that are not mutually exclusive. Biological systems consist of a large number of functionally diverse components, which interact highly selectively and often nonlinearly to produce coherent behaviors [2]. These components may be individual molecules (as in signaling or metabolic networks), assemblies of interacting complexes, or sets of physical factors that guide the development of an organism (genes, mRNA, associated proteins, and protein complexes). They may also be cells in tissues or organs, and even entire organisms in ecological communities.

What is common to all these examples is the sheer number of components and their selective, nonlinear interactions, which place the behavior of these systems beyond intuitive grasp. Take, for example, the cell cycle in the yeast Schizosaccharomyces pombe. The dynamic behavior of its network of interactions can be grasped only with the help of computer simulations and dynamical systems theory [70, 71]. Another example is the cellular response of yeast to hyperosmotic shock. The role of mathematical models, particularly in generating experimentally testable hypotheses, has been discussed extensively [2, 5, 19].

Perhaps less widely appreciated is that mathematical models of biological systems are increasingly being used to represent our knowledge about these systems. For example, the iAF model of Escherichia coli's metabolic network not only predicts experimentally observed behavior of E. Similarly, the kinetic model of glycolysis in the bloodstream form of Trypanosoma brucei [74] is the state-of-the-art representation of glycolysis in this organism. There is no way to think quantitatively about these complex systems other than through models that rely on precise mathematical descriptions.

These mathematical or computational models are essentially beyond a simple intuitive grasp, and represent concise summaries of our current knowledge of respective systems. There may be significant differences in scope and scale between different models used in systems biology. Consider, for example, the model of the yeast genome-scale metabolic network [75] and the model of glycolysis in yeast [76].

It is not that one model is better than the other. Rather, the two models have different motivations, objectives, scales, and capabilities. This illustrates an important general principle of mathematical modeling that is highly relevant to systems biology. Genome-scale metabolic models typically ignore the kinetic parameters of individual reactions for two reasons: such models aim to be comprehensive, and the kinetic parameters for most reactions are unknown (but see recent theoretical advances [77]).

In contrast, kinetic models are much more detailed but less comprehensive; however, they can provide information not only about the steady state but also about the time course, given some initial conditions. Another challenge is choosing the boundaries of the model. This usually requires exquisite familiarity with the phenomena of interest and considerable experience in mathematical modeling. Complex dynamical systems form structures [59], and nature often provides modular designs [78].
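To illustrate what a kinetic model adds, the Python sketch below integrates a hypothetical one-metabolite model using forward Euler. The metabolite has a constant influx and Michaelis-Menten consumption: dS/dt = v_in - Vmax * S / (Km + S). Starting from an initial condition, the model yields a full time course that converges to the analytic steady state S* = v_in * Km / (Vmax - v_in), valid when v_in < Vmax. The parameter values are illustrative only.

```python
def simulate(v_in=1.0, vmax=2.0, km=0.5, s0=0.0, dt=0.01, t_end=50.0):
    """Forward-Euler time course of dS/dt = v_in - vmax * S / (km + S)."""
    s, t = s0, 0.0
    trajectory = [(t, s)]
    for _ in range(int(round(t_end / dt))):
        ds = v_in - vmax * s / (km + s)  # influx minus Michaelis-Menten consumption
        s += dt * ds
        t += dt
        trajectory.append((t, s))
    return trajectory

traj = simulate()
steady_state = 1.0 * 0.5 / (2.0 - 1.0)  # analytic S* = v_in * km / (vmax - v_in)
print(traj[100], traj[-1], steady_state)
```

A genome-scale steady-state model would report only the flux through such a step (here equal to v_in at steady state). The kinetic model also shows how quickly, and along what path, that state is reached.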

This modularity must be both understood and exploited correctly for optimal modeling. In genome-scale studies of microbial organisms, a convenient system boundary is the cell boundary; in most other cases, the question of the appropriate systems boundary is more opaque and must be addressed based on the prior knowledge of components and the coupling between these components.

Simple examples of this include tissue structure in a multicellular organism and the subcellular compartmentalization of metabolites. Modern systems biology is a rapidly evolving discipline. In the past, systems thinking was invoked in the context of a variety of systems and processes. Areas that have proven particularly fruitful for systems biology include studies of biochemical networks and applications to microorganisms [60, 61, 73, 89–93].

We are only beginning to appreciate the full complexity and multidimensional nature of the biochemical networks operating in all living organisms (Figure 2). Studies of metabolic networks, gene regulatory networks, and protein-protein interaction networks in microbial organisms have contributed significantly to this appreciation, and indeed to the identity of systems biology. Microorganisms are convenient models for systems studies for several reasons. Microbes have been used in numerous systems studies, including those of genetic networks [90, 94], protein-protein interactions [89, 95, 96], metabolic networks [97– ], cell cycle regulation [70, 71], and signal transduction networks [ ].

A conceptualization of biochemical networks showing genome-, transcriptome-, proteome-, and metabolome-level networks, highlighting their complexity and mutual interdependence. In biological systems, a large number of structurally and functionally diverse components (genes, proteins, metabolites) are involved in dynamic, nonlinear interactions. These interactions in turn span a range of time scales and interaction strengths.


Direct conversions of species are shown in solid lines, while some possible interactions (not necessarily one-step) are shown in dashed lines. Several types of interactions are shown.

When prior knowledge of modularity allowed the assumption of decoupling, systems studies of biochemical subnetworks or cross-networks became possible.

Examples of subnetwork studies include modeling of the cell cycle in yeast [70, 71], specific metabolic pathways [74, 76, 85], and signal transduction pathways [72, , ]. Examples of cross-network studies include:

  • transcriptome and proteome responses to perturbations in metabolic pathways [61, ];
  • the effects of a transcriptional regulator on central carbon metabolism in Bacillus subtilis [ ];
  • the coordinated analysis of the minimal bacterium Mycoplasma pneumoniae, including its mRNA [91], protein complexes [92], and metabolic network [93].

What are these studies telling us? For example, the metabolic network in E. Surprisingly, the flux through the E. Another telling example is the smallest self-replicating organism, the bacterium M. Compared to more complex bacteria E.

In spite of its minimal genome, the proteome of M. It is unlikely that M. As a result of decades of detailed biochemical work, metabolic networks are the best understood of all biochemical networks [ , ]. We have near-complete collections of components and topologies of metabolic networks in model microorganisms such as E. For the model organism E. This, however, represents only the first step towards understanding how these components function in spatial and temporal integration, and precisely what controls are exerted on them.

While the topologies of metabolic networks are well understood, we are only beginning to understand the interactions that control metabolism [ , ]. Metabolite equilibrium concentrations are accessible experimentally through quantitative metabolomic approaches [11, 12, ]. These measurements are directly comparable to measurements of mRNA and protein levels in transcriptomics and proteomics, respectively.

In contrast to all other types of biochemical networks, experimental approaches for assessing in vivo reaction rates (fluxes) are also well developed for metabolic networks [ , – ]. This is of great importance, as metabolic fluxes are key determinants of cellular physiology and cannot be predicted from mRNA, protein, or even metabolite levels [ ]. Measuring metabolic flux is thus analogous to measuring the information flow through a signaling pathway, or between genes residing on the same control circuit. New theoretical frameworks for more efficient extraction of information from experimental data continue to be proposed [ ]. Considerable progress has also been made in the analysis of metabolic fluxes under isotopically nonstationary conditions [ , ].

Because nonstationary flux analysis relies on shorter, transient experiments, it opens an array of new possibilities for flux analysis in higher organisms, broadening the scope of systems biology studies of metabolic networks [ ]. Many systems biology approaches involve mathematical and computational modeling. The development, maintenance, and dissemination of tools for systems biology are, in themselves, a significant challenge.

Examples include the development of data repositories, data standards, and software tools for simulation, analysis, and visualization of system components such as biochemical networks. Another example is the application of high-throughput molecular profiling technologies. These often require sophisticated data processing and analysis, typically involving elements of signal processing and statistical analysis.

As the resulting quantitative measurements are transferred to formal mathematical models, the endeavor becomes perhaps more systems biology and less bioinformatics. However, that is only a matter of degree, often with no clear boundary between bioinformatics and systems biology. The need for effective exchange of formal, quantitative systems biology models has driven the development of the Systems Biology Markup Language (SBML) [ ].

The SBML project aims to develop a computer-readable format for representing biological processes. SBML provides a well-defined format that different software tools can use to exchange biological models with high fidelity. A testimony to the importance of SBML is its adoption by software tools concerned with biological modeling: at the time of this writing, over software tools support SBML.

The current Systems Biology Graphical Notation (SBGN) specification consists of three complementary languages that aim to describe biological processes and relationships between biological entities [ ]. Studies of biochemical networks are a particularly successful aspect of systems biology. It is therefore not surprising that a plethora of computational tools addressing different needs in the analysis of biochemical networks have been reported, many of them freely accessible.

Without attempting to be comprehensive, we highlight some of the widely used research and training tools. The Systems Biology Workbench (SBW) is a framework that allows different systems biology components to communicate, exchange models via SBML, and reuse capabilities without understanding all the details of each component's implementation [ ].

From the user's perspective, SBW is a collection of systems biology tools. It includes programs for building, viewing, and editing biochemical networks, tools for simulation, and tools for importing and translating models. Another highly useful tool is CellDesigner, a Java-based program for constructing and editing biochemical networks [ ]. In CellDesigner, models can be simulated either with a built-in simulator or with external simulators, such as those provided by SBW [ ]. COPASI provides tools for visual analysis of simulation results and can also perform steady-state and metabolic control analyses [ ].

As biological research accelerates through the development of new technologies and instrumentation, biological databases have become an indispensable partner in such research. Building and maintaining primary databases such as GenBank [ ] or the Protein Data Bank [ ] has long been recognized as important bioinformatics work. Primary biological databases serve both as repositories of experimentally derived information and as the basis for developing secondary databases that capture higher-level knowledge.

An example of such a secondary database is the Pfam database of protein families and domains [ ]. Concomitantly with the development of biochemical systems biology, an important niche of secondary biological databases has emerged. The ecosystem of such databases and associated tools is rapidly growing. It includes:

  • the metabolic pathway databases organized around the BioCyc project [ ];
  • a database of human biological pathways [ ];
  • a database of interactions between small molecules and proteins [ ];
  • databases of protein-protein interactions [ ].

As these databases reconstruct and organize information about interactions between cellular components, they also attempt to build higher-level knowledge and theories about the biological processes they cover. Such in silico knowledge is much needed, as the integral complexity of most biological processes is beyond what is comprehensible to the human mind. In some cases, these databases allow direct export of mathematical models. In addition, the first collections of mathematical models of biological processes (databases of models) have been developed. They are concerned solely with archiving and curating models in SBML for future reuse and refinement [ ].

Among the much-needed bioinformatics tools for systems biology research are tools for visualizing network structures and for overlaying simulated and experimental data onto networks. Systems biology is rapidly gaining momentum, as evidenced by the number of publications referencing the term (Figure 1).

On the other hand, bioinformatics originally grew from the need to provide tools for handling increasingly large amounts of biological data. As a discipline, bioinformatics continues to grow in this important role. It is also increasingly merging with and contributing to systems approaches, supplying the tools necessary for perhaps the most exciting phase in the development of the biological sciences. One of the defining features of systems biology is the use of mathematical and computational models, which are essential to rigorously account for the inherent complexity of biological systems.

    This complexity arises from the diversity of components (genes, proteins, and metabolites), the high selectivity of their interactions, and the non-linear nature of these interactions. Together, these properties render the behavior of biological systems intractable to intuition alone. The computational models used in biochemical systems biology typically require iterative building and stepwise improvement based on comparison with experiments [ ]. Once sufficiently refined, such models can predict the behavior of the biochemical system under different perturbations, or under hypothetical conditions that may be of interest but are not feasible in experimental settings.
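    As an illustration of how a kinetic model can answer "what if" questions, the sketch below integrates a toy two-step pathway (S to X to P) with Michaelis-Menten rate laws and compares a baseline with an in silico 50% knock-down of the second enzyme. The pathway, every parameter value, and the function names are hypothetical, and explicit Euler steps are used only for brevity; a refined model would use fitted parameters and a dedicated ODE solver.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Params:
    """Hypothetical parameters for the toy pathway S -> X -> P."""
    v_in: float = 1.0    # constant supply flux of S
    vmax1: float = 2.0   # enzyme E1 (S -> X)
    km1: float = 0.5
    vmax2: float = 3.0   # enzyme E2 (X -> P)
    km2: float = 1.0


def mm(vmax, km, s):
    """Michaelis-Menten rate law."""
    return vmax * s / (km + s)


def simulate(p, s0=0.0, x0=0.0, dt=0.01, t_end=100.0):
    """Integrate dS/dt = v_in - v1 and dX/dt = v1 - v2 with explicit Euler steps."""
    s, x = s0, x0
    for _ in range(int(round(t_end / dt))):
        v1 = mm(p.vmax1, p.km1, s)
        v2 = mm(p.vmax2, p.km2, x)
        s += dt * (p.v_in - v1)
        x += dt * (v1 - v2)
    return s, x


baseline = Params()
# In silico perturbation: halve the maximal rate of E2 (a 50% knock-down).
knockdown = replace(baseline, vmax2=baseline.vmax2 * 0.5)

s_wt, x_wt = simulate(baseline)
s_kd, x_kd = simulate(knockdown)
print(f"baseline:   S={s_wt:.3f}, X={x_wt:.3f}")
print(f"knock-down: S={s_kd:.3f}, X={x_kd:.3f}")
```

    Because both fluxes must equal the supply flux v_in at steady state, the steady-state intermediate is X* = v_in·Km2/(Vmax2 − v_in): 0.5 for the baseline and 2.0 after the knock-down, while S settles at 0.5 in both cases. The toy model therefore predicts a fourfold accumulation of the intermediate with no change in the substrate, a prediction that could then be compared with an experiment in the next refinement cycle.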

    However, in the new era of systems biology, mathematical models are more than tools for integrating observations, making testable predictions, or running high-throughput in silico experiments. Highly refined mathematical models also serve as embodiments of our current knowledge about specific biochemical systems. The mathematical and computational models that underpin biochemical studies may involve different levels of detail and scale, depending on the objectives of the study, what is known a priori, and what additional information is accessible experimentally.

    For example, protein complexes may be studied comprehensively [ 92 ], or the focus may be on a subset of proteins responsible for a specific function, such as protein import into mitochondria [ ]. Most so-called bottom-up approaches, which start from descriptions of individual interactions, focus on part of the biological system because we lack comprehensive information about the system of interest [ ]. Nevertheless, bottom-up approaches provide highly useful frameworks for integrating diverse knowledge, for example, combining principles established over decades of biochemical work with information accessible only through the latest experiments.

    In contrast, top-down approaches are largely data-driven, with the caveat that their comprehensiveness is constrained by the limitations of experimental techniques. For example, in one of the most comprehensive metabolomic studies to date, out of an expected primary metabolites were quantified simultaneously in cells grown in minimal medium [ ].

    Many biochemical processes can be conceptualized as complex dynamic networks at the molecular level (Figure 2), and studies of biochemical networks are taking centre stage in systems biology [ 65 — 67 , , ]. Increasingly, we are interested in the crosstalk between a gene and the transcripts, proteins, and metabolites that its expression affects [ , ]. Ever more sophisticated models will be required to account for increasingly accurate and comprehensive experimental measurements. Systems approaches have already provided a deeper understanding of diverse biochemical processes, from individual metabolic pathways [ 74 , 76 ] to signaling networks [ 70 — 72 ] and genome-scale metabolic networks [ 73 , 75 ].
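    Crosstalk of this kind can be pictured as a directed network whose nodes belong to different layers (genes, transcripts, proteins, metabolites). The sketch below uses an entirely hypothetical set of nodes and edges and a breadth-first search to list everything downstream of one gene, grouped by layer. This is one simple way, assumed here for illustration, to ask which metabolites a gene's expression can reach.

```python
from collections import deque

# Hypothetical multi-layer network: the layer of each node...
LAYER = {
    "geneA": "gene", "geneB": "gene",
    "mRNA_A": "transcript", "mRNA_B": "transcript",
    "EnzA": "protein", "TF_B": "protein",
    "M1": "metabolite", "M2": "metabolite", "M3": "metabolite",
}
# ...and its directed edges (illustrative only).
EDGES = {
    "geneA": ["mRNA_A"],   # transcription
    "mRNA_A": ["EnzA"],    # translation
    "EnzA": ["M1", "M2"],  # EnzA converts M1 into M2
    "M2": ["TF_B"],        # M2 allosterically activates a transcription factor
    "TF_B": ["geneB"],     # TF_B regulates geneB
    "geneB": ["mRNA_B"],   # transcription of geneB
}


def downstream(start, edges):
    """Breadth-first search: every node reachable from `start` via directed edges."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in edges.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    seen.discard(start)
    return seen


def by_layer(nodes, layer):
    """Group node names by their layer, each group sorted alphabetically."""
    grouped = {}
    for node in sorted(nodes):
        grouped.setdefault(layer[node], []).append(node)
    return grouped


result = by_layer(downstream("geneA", EDGES), LAYER)
print(result)
```

    A realistic network would also carry edge types and signs (activation versus inhibition) and quantitative weights, which this sketch deliberately omits.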

    Therefore, we can safely predict that systems thinking will become even more pervasive in the future. Because formal mathematical and computational models are central to systems approaches, bioinformatics is becoming increasingly important for systems biology research.

    The authors acknowledge support from Metabolomics Australia V.

    Adv Bioinformatics v. Published online Feb 9. Received Jun 4; Accepted Nov 1. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


    A new approach to decoding life: Annual Review of Genomics and Human Genetics.
    The evolution of molecular biology into systems biology.
    Mathematical models in microbial systems biology. Current Opinion in Microbiology.
    Initial sequencing and analysis of the human genome.
    The sequence of the human genome.
    Quantitative monitoring of gene expression patterns with a complementary DNA microarray.
    Yeast microarrays for genome wide parallel genetic and gene expression analysis.
    Systematic functional analysis of the yeast genome.
    Combining genomics, metabolome analysis, and biochemical modelling to understand metabolic networks. Comparative and Functional Genomics.
    Or Control and Communication in the Animal and the Machine.
    Systems theory and biology—view of a theoretician. Systems Theory and Biology.