Biological Data Integration and Model Building

  • Living reference work entry
Encyclopedia of Complexity and Systems Science

Definition of the Subject

Data integration and model building have become essential activities in biological research as technological advances enable the measurement of biological data of increasing diversity and scale. High-throughput technologies provide a wealth of global data sets (e.g., genomics, transcriptomics, proteomics, metabolomics), and the challenge becomes how to integrate these data to maximize the amount of useful biological information that can be extracted. Integrating biological data is both important and challenging because of the nature of biology. Biological systems have evolved over billions of years, and in that time biological mechanisms have become very diverse, with molecular machines of intricate detail. Thus, while there are certainly general scientific principles to be distilled, such as the foundational theory of evolution, much of biology is found in the details of these evolved systems. This emphasis on the details of...

Glossary

Biochemical reaction network:

Collection of metabolic, signaling, or regulatory chemical reactions described in stoichiometric detail.
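
As a rough illustration of what "stoichiometric detail" means in practice, the sketch below encodes a toy reaction network as a stoichiometric matrix. The three reactions, the metabolite names, and the helper `net_production` are hypothetical and chosen only for illustration; they do not come from the entry.

```python
# Hypothetical toy network: three reactions over two metabolites.
# Each reaction maps a metabolite to its stoichiometric coefficient
# (negative = consumed, positive = produced).
reactions = {
    "uptake":  {"A": +1},            # (outside) -> A
    "convert": {"A": -1, "B": +1},   # A -> B
    "export":  {"B": -1},            # B -> (outside)
}

metabolites = sorted({m for coeffs in reactions.values() for m in coeffs})
rxn_names = list(reactions)

# Stoichiometric matrix S: one row per metabolite, one column per reaction.
S = [[reactions[r].get(m, 0) for r in rxn_names] for m in metabolites]


def net_production(S, fluxes):
    """Return S * v, the net rate of change of each metabolite for fluxes v."""
    return [sum(s * v for s, v in zip(row, fluxes)) for row in S]


print(S)                             # rows: A, B; columns: uptake, convert, export
print(net_production(S, [2, 2, 2]))  # balanced fluxes: no metabolite accumulates
print(net_production(S, [2, 1, 1]))  # uptake exceeds conversion: A accumulates
```

A flux vector for which every entry of `S * v` is zero leaves all metabolite levels unchanged, which is the steady-state condition commonly assumed when such networks are analyzed.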

Boolean network:

A set of $N$ discrete-valued variables, $\sigma_1, \sigma_2, \ldots, \sigma_N$, where $\sigma_n \in \{0, 1\}$. To each node a set of $k_n$ nodes, $\sigma_{n_1}, \sigma_{n_2}, \ldots, \sigma_{n_{k_n}}$, is assigned, which controls the value of $\sigma_n$ through the equation $\sigma_n(t + 1) = f_n(\sigma_{n_1}(t), \ldots, \sigma_{n_{k_n}}(t))$. In the case of Boolean networks, the functions $f_n$ can be chosen from the ensemble of all possible Boolean functions.
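
The update rule in this definition can be sketched in a few lines of Python. The three-node wiring, the particular Boolean functions, and the helper names `step` and `trajectory` are assumptions made for illustration, not a network described in the entry.

```python
# Hypothetical 3-node Boolean network; wiring and rules are illustrative only.
# inputs[n] lists the nodes that control node n; rules[n] plays the role of f_n.
inputs = {0: (1, 2), 1: (0,), 2: (0, 1)}
rules = {
    0: lambda a, b: a and not b,  # node 0 is on if node 1 is on and node 2 is off
    1: lambda a: a,               # node 1 copies node 0
    2: lambda a, b: a or b,       # node 2 is on if node 0 or node 1 is on
}


def step(state):
    """Apply every f_n to the state at time t to obtain the state at t + 1."""
    return tuple(
        int(rules[n](*(state[i] for i in inputs[n]))) for n in sorted(inputs)
    )


def trajectory(state, max_steps=20):
    """Iterate until a state repeats; return visited states and the repeated one."""
    seen = []
    while state not in seen and len(seen) < max_steps:
        seen.append(state)
        state = step(state)
    return seen, state


visited, repeated = trajectory((1, 0, 0))
print(visited)   # states visited before the first repeat
print(repeated)  # the state reached twice (here a fixed point)
```

Because the state space is finite (at most 2^N states), every trajectory must eventually revisit a state, so the loop ends in either a fixed point or a cycle.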

Constraint-based analysis:

A modeling framework based on excluding infeasible network states via environmental, physicochemical, and regulatory constraints to improve predictions of achievable cellular states and behavior.

Data space:

Multidimensional space containing all possible states of a system; this space can be reduced using defined constraints.
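
To make the idea of reducing a space of states with constraints concrete, the sketch below enumerates candidate flux states for a toy linear pathway and filters them step by step. The network, the integer flux grid, and the uptake bound of 2 are all hypothetical; published constraint-based analyses typically use continuous fluxes and linear programming rather than enumeration.

```python
from itertools import product

# Hypothetical toy pathway: uptake (-> A), convert (A -> B), export (B ->).
# Rows of S are metabolites A and B; columns are the three reactions.
S = [
    [1, -1, 0],   # metabolite A
    [0, 1, -1],   # metabolite B
]


def is_steady_state(v):
    """Mass-balance constraint: S * v must be zero for every metabolite."""
    return all(sum(s * x for s, x in zip(row, v)) == 0 for row in S)


# Unconstrained space: each of the 3 fluxes takes an integer value from 0 to 3.
levels = range(0, 4)
all_states = list(product(levels, repeat=3))

# Physicochemical constraint: keep only mass-balanced states.
balanced = [v for v in all_states if is_steady_state(v)]

# Environmental constraint (assumed): uptake flux cannot exceed 2.
capped = [v for v in balanced if v[0] <= 2]

print(len(all_states), len(balanced), len(capped))
```

Each added constraint excludes infeasible states, so the set of candidate network states shrinks from the full grid to the few states consistent with every constraint.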

Genome:

The complete DNA nucleotide sequence in all chromosomes of an organism.

Interaction network:

A graph where the nodes represent biomolecules (e.g., genes) and the edges represent defined interactions between the nodes, whether they be direct physical interactions (e.g., protein-protein binding, protein-DNA binding) or functional relationships (e.g., synthetic lethality).
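
One simple way to hold such a graph in memory is an adjacency mapping with an interaction label on each edge. In the sketch below, the gene names, the edge list, and the helpers `degree` and `neighbors_by_kind` are placeholders for illustration, and all edges are treated as undirected for simplicity, although interactions such as protein-DNA binding are directional in reality.

```python
from collections import defaultdict

# Hypothetical edges: (node, node, interaction type). Names are placeholders.
edges = [
    ("geneA", "geneB", "protein-protein binding"),
    ("geneA", "geneC", "protein-DNA binding"),
    ("geneB", "geneC", "synthetic lethality"),
    ("geneC", "geneD", "protein-protein binding"),
]

# Adjacency mapping: graph[u][v] holds the interaction type between u and v.
graph = defaultdict(dict)
for u, v, kind in edges:
    graph[u][v] = kind
    graph[v][u] = kind  # undirected simplification (an assumption of this sketch)


def degree(node):
    """Number of distinct interaction partners of a node (0 if unknown)."""
    return len(graph.get(node, {}))


def neighbors_by_kind(node, kind):
    """Partners of a node connected by the given interaction type."""
    return sorted(n for n, k in graph.get(node, {}).items() if k == kind)


print(degree("geneC"))
print(neighbors_by_kind("geneA", "protein-protein binding"))
```

Keeping the interaction type on each edge lets physical and functional relationships share one graph while remaining distinguishable in queries.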

Metabolome:

The complete set of small molecules which are the intermediates and products of an organism’s metabolism.

Proteome:

The complete set of expressed proteins produced by the genome.

Statistical inference network:

A network model designed from statistical inference from large-scale biological data sets to be quantitatively predictive for novel perturbations and/or environmental conditions.

Transcriptome:

The complete set of RNA transcripts produced from an organism’s genome under a particular set of conditions.

Author information

Correspondence to James A. Eddy.

Copyright information

© 2013 Springer Science+Business Media New York

About this entry

Cite this entry

Eddy, J.A., Price, N.D. (2013). Biological Data Integration and Model Building. In: Meyers, R. (eds) Encyclopedia of Complexity and Systems Science. Springer, New York, NY. https://doi.org/10.1007/978-3-642-27737-5_34-3

  • DOI: https://doi.org/10.1007/978-3-642-27737-5_34-3

  • Publisher Name: Springer, New York, NY

  • Online ISBN: 978-3-642-27737-5

  • eBook Packages: Springer Reference Physics and Astronomy; Reference Module Physical and Materials Science; Reference Module Chemistry, Materials and Physics
