Keywords

These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Definition of the Subject

Biological research over the past century or so has been dominated by reductionism – identifying and characterizing individual biomolecules – and has enjoyed enormous success. Throughout this history, however, it has become increasingly clear that an individual biomolecule can rarely account for a discrete biological function on its own. A biological process is almost always the result of a complex interplay of relationships among biomolecules (Alon 2003; Bray 2003; Hartwell et al. 1999; Hasty et al. 2002; Kitano 2002; Koonin et al. 2002; Oltvai and Barabasi 2002; Wall et al. 2004), and the treatment of these relationships as a graph is a natural and useful abstraction.

Broadly speaking, a biomolecular network is a graph representation of relationships (of which there are many types) among a group of biomolecules. Vertices or nodes represent biomolecules, including macromolecules such as genes, proteins, and RNAs, or small biomolecules like amino acids, sugars, and nucleic acids. In the next few sections, we focus mostly on the macromolecules. An edge or link between two vertices indicates a relationship between the corresponding biomolecules, which could include physical interaction, genetic interaction, or a regulatory relationship (e.g., the protein product of gene A regulates the expression of gene B). This abstraction, although simplifying, converts a complex web of biological relationships into a mathematical graph, from which we can study its structural features as well as their implications on biological functions.

Aside from biomolecular networks, other biological networks also bear significance in systems biology.

Examples include networks of protein domains and neuronal networks (Fanning and Anderson 1996; Nadvornik and Drozen 1964). Although many of the features described in this entry also apply, these networks are largely out of the scope of this entry.

Introduction

The study of small-scale biomolecular networks has a long history, stemming from “chemical graphs” introduced in the 1700s (Rouvray 1990), with knowledge about networks of metabolic reactions reaching a large scale by the 1970s.

A prominent early example may be found in the context of Monod and Jacob’s Nobel-winning research in the 1950s, which first illuminated a transcriptional regulation system, namely, control of gene expression in the lac operon (Jacob and Monod 1961; Monod et al. 1951; Monod and Jacob 1961). In their landmark work, Monod and Jacob described the operon model, in which the expression of structural genes is suppressed by the binding of the repressor to the operator gene, which in turn can be regulated by certain metabolites (Jacob and Monod 1961). Regulatory relationships among the repressor, the metabolite, the operator gene, and the structural genes in the operon model present one of the first examples of biomolecular networks (Fig. 1). In addition, Monod and Jacob also proposed several theoretical models, in the form of biochemical or enzymatic networks, as potential mechanisms of biochemical differentiation of cells with identical genomes (Monod and Jacob 1961).

Fig. 1
figure 1

Lac operon model

Pioneering work on the lac operon also includes studying functional implications of bionetwork structure. Building upon Jacob and Monod’s work, the elegant experiments of Novick and Weiner (1957) showed interesting features of beta-galactosidase induction – in a single cell, the production of lactose-degrading enzymes is either switched on or off, and the cell’s on or off state is inherited by subsequent generations. A mixture of the two types of cells has the intermediate overall level of enzyme production observed in a normal cell culture. This all-or-none phenomenon can be explained by positive feedback in the regulatory network of beta-galactosidase synthesis, where the intracellular concentration of inducer is boosted by an intermediate mechanism, which is also induced by the inducer (Fig. 1). As we can see from Fig. 1, the very first studies on gene regulation also show how the structure of biomolecular networks, even the small-scale pathways known at that time, can illuminate regulation of protein production.

Molecular biology has entered an era in which the nature, function, and relationships of macromolecules can be studied at a much larger scale. High-throughput technology platforms such as genome sequencing, microarrays, and yeast two-hybrid screening have enabled biologists to examine how nearly all genes or proteins in a genome or proteome interact or relate to one another (Gietz et al. 1997; Lockhart et al. 1996; Schena et al. 1995; Smith et al. 1985, 1986). Collections of these genomic or proteomic data have led to networks of various relationship types – including protein-protein interaction, transcriptional regulation, and sequence homology – involving thousands of proteins or genes (Cho et al. 1998; Gavin et al. 2002; Ho et al. 2002; Ito et al. 2000; Krogan et al. 2006; Lee et al. 2002; Rual et al. 2005; Uetz et al. 2000). This has offered an unprecedented “bird’s-eye view” of how parts of the network, each studied intensively on a local scale, connect to one another. This perspective may be useful in uncovering unknown pieces of the puzzle that together form the intricate machinery of life responsible for the behavior of the cell or the organism. The extremely large scale of these networks also presents a tremendous challenge as it is practically impossible to study each vertex or each interaction individually. A simplified abstract representation of these networks and mathematical methods permits automated study of the structural features, to identify local and global trends that relate biological macromolecules to their functional implications. Early work on large-scale bionetworks focused on single-color networks – networks describing a single type of relationship – while multicolor networks, networks combining multiple relationship types, have gained increasing attention in recent years.

At an abstract level, a network of protein-protein interactions and a network of computers connected through the Internet have much in common. Both have simplified representations as mathematical graphs: vertices represent proteins in the former case and computers in the latter case, and edges between vertices denote physical interactions between proteins or communication wires between computers. Concurrent with the increasing availability of large-scale biological networks, theoretical research in complex networks is advancing rapidly, with significant progress in uncovering the structural features and organizing principles of complex technological and social networks (Albert and Barabasi 2002; Bornholdt and Schuster 2003; Dorogovtsev and Mendes 2003; Newman et al. 2001). In a truly interdisciplinary effort, these principles were soon applied to biological networks, leading to the observation that biological networks share many global architectural features with networks from other complex systems (Fell and Wagner 2000; Goldberg and Roth 2003; Sole et al. 2002; Wagner 2001; Watts and Strogatz 1998), including power-law connectivity, small-world topology, and hierarchy, each discussed further below. Such resemblance suggested that many biological networks might be governed by similar “design” principles, allowing methods proven in other large, non-biological complex systems to be used to characterize the intricate relationships between biomolecules.

While some reported global features of biomolecular networks, others sought local topological patterns recurrent in the network (Alon 2003; Lee et al. 2002; Milo et al. 2002; Shen-Orr et al. 2002). Network motifs are simple patterns of interconnection in networks that occur more frequently than expected in randomized networks (Milo et al. 2002; Shen-Orr et al. 2002) and have been proposed to represent basic building blocks of complex networks (Lee et al. 2002; Milo et al. 2002; Shen-Orr et al. 2002). Different types of networks exhibit different motif profiles, providing a means for network classification (Milo et al. 2004). Dobrin et al. later showed for the E. coli transcriptional network that subgraphs matching two types of transcriptional regulatory circuit motif – feed-forward and bi-fan motifs – overlap with one another and form large clusters which further coalesce into “superclusters” (Dobrin et al. 2004).

Although some circuits corresponding to motifs may be capable of functioning independently (Mangan et al. 2006), observation of Debrin et al., among many others, leads to the idea that instances of a network motif are often interdependent on one another as opposed to being building blocks with separable functions (Zhang et al. 2005).

Cellular function is carried out through multiple types of interaction, resulting in a complex interplay of biological relationships among the myriad of cellular components. To have a truly global picture of the cell as a complex system, one needs to relate multiple types of interactions to one another. Multicolor networks are integrated networks of many relationship types, with frequent overlap between relationships of different types. For example, physically interacting proteins have expression patterns that are slightly more similar than would be expected by chance (Ge et al. 2001; Jansen et al. 2002a); genes with correlated expression are more likely to be controlled by a common transcription factor (Yu et al. 2003); and some genetic interaction types have a weak tendency to occur between homologous genes (Tong et al. 2004) (see section “Single-Color Networks” for definitions of the various interaction types).

Topological study of single-color networks is paralleled in multicolor networks. Multicolor network motifs characterize relationships between different biological interaction types within local network neighborhoods (Taylor et al. 2007; Yeger-Lotem et al. 2004; Zhang et al. 2005). Furthermore, multicolor networks also exhibit network themes – recurring higher-order interconnection patterns that encompass multiple occurrences of network motifs and reflect a common organizational principle (Zhang et al. 2005). Network themes can be tied to specific biological phenomena that represent more fundamental “design principles,” for example, the tendency for proteins within the same physical complex to be regulated by a common transcription factor (Simonis et al. 2006; Zhang et al. 2005). This suggests that instead of representing network “building blocks,” network motifs should in some cases be viewed as signatures of more fundamental higher-order structures (Zhang et al. 2005). Network themes also provide a more natural simplification of the otherwise complex set of relationships in an integrated network (Zhang et al. 2005).

Studying the architectural features of biological networks conveys useful biological intuition and also allows us to make biological predictions. Even simple methods exploiting the predictive power of local topology have proven useful in predicting physical and genetic interactions (Bader et al. 2004; Giot 2003; Goldberg and Roth 2003; Wong et al. 2004). Richer network models are able to incorporate more information and hence promise better predictions.

A multicolor network can be viewed as a graph in which each edge carries labels denoting the relationships between the corresponding pair of vertices. Similarly, each vertex may carry multiple labels (e.g., denoting that a particular protein is expressed in the pancreas or that a gene is conserved in chimpanzee). A rich network is a network in which a vector describing potentially many vertex properties is associated with each vertex, and a vector describing potentially many edge properties is associated with each pair of vertices. By incorporating a more comprehensive set of information, these rich network models provide additional power in predicting features of genes or proteins, as well as the relationships between them. In the following sections, we will survey methodology, findings, and applications for biological network models of increasing complexity, starting with single-color biomolecular networks.

As described previously, study of large-scale biomolecular networks has been enabled by the development of a variety of genomic and proteomic technologies, yielding a variety of relationship types. Saccharomyces cerevisiae, the budding yeast, a relatively simple single-celled eukaryote, has been an important model system, and many network studies originate from this yeast. Many core functions in yeast are conserved in humans, and many network trends observed in yeast have proven true for other organisms (Rual et al. 2005; Stelzl et al. 2005), leading us to believe that relationships in different organisms share similar organizational principles.

Before delving into the details of biomolecular network modeling, we add a few notes of caution. First, even the most complex network models remain a vast oversimplification of the much richer and more complex reality of cellular networks. A single “vertex” can actually correspond to hundreds or even millions of physical protein molecules carrying a variety of posttranslation modifications, each diffusing independently throughout the cell. As important nuances relating to spatial and temporal dynamics are often lost, the network abstraction is only relevant to the extent that it proves useful. As the famous statistician George Box put it, “all models are wrong, some are useful” (Launer and Wilkinson 1979). Second, the method of network abstraction and the data source/experimental may introduce systematic errors. For example, the so-called “matrix” method of inferring protein interactions (which assumes that proteins interact if each binds directly or indirectly to a common “bait” protein) will show “overclustering” by definition. Alternatively, the “spoke” method, in which interactions are only inferred between a bait protein and each prey protein that it “pulls down,” will result in “underclustering” if only a small subset of proteins are used as baits (Bader and Hogue 2002; Gavin et al. 2002; Ho et al. 2002). Therefore, one should take extra caution in interpreting observed network features to separate artifacts from real biological phenomena by using appropriate controls. Last, in searching network structure for trends, one often needs to compare the actual biomolecular network with appropriate random networks to ensure that the features identified are not a trivial consequence of a simpler phenomenon. For example, in searching for network motifs, randomized networks with the same degree distribution (see section “Single-Color Networks”) can be used as a control to avoid discovering motifs that are a trivial consequence of the underlying degree distribution (King 2004; Milo et al. 2002; Zhang et al. 2005).

Single-Color Networks

Definition and Categorization

The most basic biomolecular network is a single-color network, which contains only one type of biological interaction or relationship. The nature of the biological interaction or relationship defines the type of network, as it also specifies the nature of biomolecules involved. For interactions with clearly defined directionality, the corresponding network can be represented mathematically as a directed graph, with vertices denoting biomolecules, and edges connecting pairs of vertices whose corresponding biomolecules have directional interactions. Edge direction can represent material flow, exemplified by a metabolite network in which vertices are metabolites, and edges are enzyme-catalyzed reactions. As many metabolic reactions are practically irreversible under physiological conditions, we can use directed edges to represent the direction of a reaction, pointing from substrate to product. Alternatively, one can represent biochemical reactions using a directed edge to connect two enzymes such that the first enzyme’s metabolic product is the second one’s substrate. We refer to networks of the latter type as metabolic enzyme networks, to distinguish them from metabolite networks. Direction of interaction can also be derived from the inferred direction of information flow. For example, in a signaling pathway, a directed edge points from one protein to another immediately downstream in signal transduction. Transcriptional regulation is another type of directed biomolecular interaction, and we can construct a network of transcriptional regulation from directed edges connecting transcription factors to the genes they regulate. In addition to transcriptional regulation, gene regulation in the cell also happens posttranscriptionally, for example, through microRNAs (or small RNAs) which base pair with mRNAs and influence their stability or translation. Researchers have constructed microRNA networks by connecting microRNAs with their target mRNAs based on experimental identification at a limited scale (Farh et al. 2005; Sood et al. 2006; Stark et al. 2005) or based on computational predictions (Rajewsky 2006; Tsang et al. 2007). Posttranslational regulation is another important part of the cellular regulatory process. One example of a posttranslational regulatory network is a protein modification network such as a protein phosphorylation network, in which each directed edge connects a kinase with its protein substrate or phosphorylation target (Ma’ayan et al. 2005; Ptacek et al. 2005).

Single-color networks that capture symmetric relationships can be represented by undirected graphs, with each vertex denoting a biomolecule, and an edge connecting vertices whose corresponding biomolecules have a symmetric relationship. An important example is a protein network of physical interactions, in which edges connect proteins that physically associate with each other with sufficiently small dissociation constants. Such association may be via direct physical contact or indirect via one or more intermediates.

Genetic interaction is another important type of biomolecular relationship in studying cellular functions. It describes pairs of genes for which the phenotype of the double mutant departs from expected under some null model of independence, e.g., gene pairs for which the fitness of the double mutant deviates from the product of the fitness values for each of the two single mutants. Synthetic sick genetic interactions (synthetic lethality in the extreme case) correspond to gene pairs for which double mutant fitness is lower than expected. These synthetic sick or lethal (SSL) interactions can be represented as an undirected network. SSL interactions of ten correspond to genes which have redundant or compensatory function. They are also slightly more likely to encode proteins that interact physically (Tong et al. 2004). If the double mutant phenotype for a pair of genes is less extreme than expected, the corresponding gene pair is said to have an antagonistic (or alleviating) genetic interaction. Although we can use undirected edges for alleviating interactions, in some cases additional information can be conveyed using directed edges. For example, if phenotypes arising from mutations in genes A and B differ and the double mutant phenotype most closely approximates gene A, we may draw a directed edge from gene A to gene B. With the aid of other prior biological knowledge, such directed edges can be useful in assigning order of action in biological pathways (Avery and Wasserman 1992; Juvan et al. 2005). There exist many other types of genetic interaction (Drees et al. 2005; St Onge et al. 2007).

Another relationship type is sequence homology (describing two genes with similar sequences that each descended from a common ancestor). Other biomolecular relationships may be derived from shared biomolecular properties, for example, the relationship of similar mRNA expression profile (Wen et al. 1998), similar phenotype (Giaever 2002; Gunsalus et al. 2005), or similar gene function (Lee et al. 2004). Because there are many different properties of biomolecules, many “similar-property” networks are possible. Above, we sampled types of biomolecular interactions or relationships used in network studies to date. However, this is not meant to be an exhaustive list, as there are many more relationship types, some unknown to us and some for which large-scale data is not yet available. As more large-scale experimental methods are developed and applied to biological research, more biomolecular relationships can be represented as networks.

A few special types of graphs have particular relevance to the study of large-scale biomolecular networks. These include bipartite and quasi-bipartite graphs. A bipartite graph is a graph with two disjoint vertex sets, such that no edges connect pairs of vertices within the same set. For example, a bipartite graph might consist of a set of gene vertices and a set of protein vertices, together with edges that each corresponds to a relationship described by “gene X encodes protein Y.” If one of the two vertex sets contains within-set edges while the other does not, the graph is called quasi-bipartite. Certain biomolecular networks are intrinsically quasi-bipartite. For example, transcriptional regulatory networks contain edges pointing from genes encoding transcription factors to the regulated genes. Thus, vertices may be divided into transcription factors and non-transcription factors. While the latter vertex set, by definition, does not contain any within-set links, a transcription factor may regulate the expression of another transcription factor, so that the network is quasi-bipartite. Quasi-bipartite graphs can also arise as a simple consequence of experimental design. Protein interaction networks derived from yeast two-hybrid and genetic interactions derived from systematic genetic array (SGA) analysis are two examples, where only a subset of genes have been used as “baits” or “queries.” In network analyses, care must be taken to avoid discoveries that are trivial consequences of bipartite or quasi-bipartite properties arising from the experimental design.

Tools and Methodology

Studying global structural features of a network involves many different measures of graph properties (Diestel 2005a). Here we survey a few frequently used measures below.

Degree (or connectivity) is the most basic characteristic of a vertex. The degree of vertex V, k(V), is defined as the number of edges incident on vertex V and tells us how connected V is with other vertices in the network. In the case of directed networks, there is an outgoing degree (or out-degree) k out (V), denoting the number of edges that start from V, and an incoming degree (or in-degree) k in (V), denoting the number of edges that end at V.

The clustering coefficient, C(V), measures the connectivity among the neighbors of a vertex V and can be defined as the number of edges among V’s neighbors, divided by the total possible number of edges if all the neighbors were connected to one another. The average clustering coefficient of a network characterizes the general tendency of vertices to form clusters (Watts and Strogatz 1998).

In a network, distance between two vertices U and V, d(U, V), is measured as the number of edges in the shortest path from U to V. In an undirected graph, d(U, V) satisfies non-negativity, symmetry, and the triangle inequality of a distance metric. In the case of directed graph, however, all edges in a path must be in the same direction. While non-negativity still holds, the distance d(U, V) from U to V could be different from the distance d(V, U) from V to U so that symmetry is not necessarily satisfied nor is the triangle inequality. Hence, strictly speaking, d(U, V) is not a distance metric in directed graphs. The overall connectedness of a network can be quantified by its characteristic path length, defined as the average distance between pairs of vertices in the network.

A subgraph of a graph G is a graph whose vertex and edge sets are subsets of those of G. A component is a subgraph such that all vertices are connected to one another by some path, and there are no connections to vertices outside of that subgraph. If a component contains a majority of a graph’s vertices, it is referred to as the giant component. Other subgraphs of interest are cliques, a set of completely connected vertices. Such dense subgraphs bear particular relevance to protein complexes, which often present themselves as cliques, or dense but incompletely connected subgraphs, in a protein network of physical interaction. One can “seed” the search for such dense subgraphs in the network (as a proxy search for protein complexes) using k-cores – subgraphs such that each vertex has at least k neighbors within that subgraph (Bader and Hogue 2002).

Global Structural Features

Topological features of biomolecular networks have been subjected to extensive study (reviewed in Barabasi and Oltvai 2004a). Below we describe some features of particular relevance to network structure and function.

Power-Law Networks

In a biomolecular network, some vertices are better connected than others. To characterize how connectivity varies from vertex to vertex, one can calculate the degree distribution – i.e., the distribution of P(k), the fraction of vertices with degree k. The Erdös-Rényi (ER) model describes random networks in which each vertex pair is assigned an edge randomly with a probability that is fixed for all vertex pairs. In other words, each edge is assigned or not assigned according to the outcome of a “weighted coin flip” (also known as a Bernoulli trial). For such ER networks, P(k) follows a binomial distribution but can often be approximated well and conveniently by a Poisson distribution. The degree distributions of biomolecular networks (and many other complex networks such as food webs and social networks) tend to fall off at high degree much more slowly than a Poisson and often follow a power-law distribution \( P(k)\sim k{}^{-\lambda } \) (Fig. 2) (Barabasi and Albert 1999). The degree exponent λ in most biomolecular networks seems to have a value between 2 and 3 (Barabasi and Albert 1999). The most intuitive (and perhaps most useful) feature of this power-law distribution is that most vertices in the network have only a few edges, while a few vertices have many edges. These highly connected vertices are called hubs, which are sometimes described to “hold the network together” since removal of a hub has a particularly high tendency to break the network up into separated components. Networks with a power-law degree distribution (or sometimes networks that only follow a power law in the high-degree tail of the distribution) are called scale-free networks. The term scale-free is meant to indicate that there is not a typical vertex in the network that can characterize the rest of the vertices. For some random networks, many vertices have degrees equal or close to the average degree; however, it should be noted that ER networks with a low edge probability can be easily confused with power-law networks (Han et al. 2005). Arguments have also been made that scale-rich is a more appropriate description of graphs with a power-law degree distribution (Tanaka 2005). For clarity, we prefer the term power-law network.

Fig. 2
figure 2

Degree distribution of transcriptional regulatory networks. (a/b) Incoming and outgoing degree distributions of a manually curated network; (c/d) incoming degree distribution of two large-scale transcriptional regulatory maps, fitted with power law (c) and exponential (d)

Networks of metabolic reactions were the first biomolecular networks to be characterized as power law (Jeong et al. 2000; Wagner and Fell 2001). Jeong et al. analyzed the metabolite networks of 43 organisms across all three domains of life – bacteria, archaea, and eukaryotes – and discovered that despite significant variation in the components and pathways, all metabolite networks have similar topological scaling properties (Jeong et al. 2000). As many biochemical reactions are irreversible under physiological conditions, metabolite networks should be represented as directed graphs. The distribution of both in-degrees and out-degrees follows power-law distributions, i.e., \( P\left({k}_{\mathrm{in}}\right)\sim {k}_{\mathrm{in}}^{-{\lambda}_{\mathrm{in}}} \), \( P\left({k}_{\mathrm{out}}\right)\sim {k}_{\mathrm{out}}^{-{\lambda}_{\mathrm{out}}} \), and the values of both λin and λout lie within a narrow band of 2.0–2.4 (Jeong et al. 2000). This demonstrated that a power-law degree distribution is a common feature of cellular metabolism, with most substrates involved in one or two reactions, while a few molecules, such as coenzyme A, function as metabolic hubs by participating in several dozen reactions.

Other biomolecular networks have also been described as power law, including networks of physical interactions between proteins, transcriptional regulatory networks, and networks of SSL genetic interactions. The protein interaction map of the yeast Saccharomyces cerevisiae as measured by systematic two-hybrid screens (Ito 2001; Uetz et al. 2000) exhibits a power-law degree distribution, as do physical interaction maps of multicellular organisms – the worm C. elegans, the fly D. melanogaster, and the human (Giot 2003; Jeong et al. 2001; Li 2004; Wagner 2001; Yook et al. 2004). The degree distribution of correlated expression networks based on microarray data has been shown to follow power law (Agrawal 2002; Featherstone and Broadie 2002). The outgoing connections from transcription factors to target genes also exhibit power-law behavior (Fig. 2b; Guelzim et al. 2002; Lee et al. 2002; Thieffry et al. 1998). The incoming degree distributions in these networks have been found instead to follow an exponential decay according to manually curated data (Fig. 2a; Guelzim et al. 2002; Thieffry et al. 1998). However, analysis of more recent large-scale data (Harbison et al. 2004; Lee et al. 2002; MacIsaac et al. 2006) suggests that power-law distribution may provide a slightly better fit (Fig. 2c, d). The resolution of this awaits further refined data and analysis, but it is clear that the distribution of in-degrees and out-degrees in a directed graph may behave differently. Similar to directed networks, the degree distribution of quasi-bipartite networks should also be examined separately for the different vertex sets.

Closer examination of several biomolecular networks showed that their degree distributions exhibit better fit to a truncated power law than a power-law model (Amaral et al. 2000; Rual et al. 2005). A truncated power-law distribution follows a power-law distribution with an “exponential envelope,” such that power-law behavior dominates the distribution at low degrees, while the exponential function dominates the distribution at the high-degree tail of the distribution. Truncated power-law behavior may arise from constraints on the number of interaction partners for high-degree vertices (Amaral et al. 2000). However, biological implications of the differences between these degree distributions remain to be demonstrated in most cases. It is nevertheless clear that biomolecular networks share the common feature that a few vertices have many edges, while most vertices have only a few edges.

The ubiquity of power law and presence of hubs in biomolecular networks prompt us to ask what advantage such network architecture might offer. Some argue that the answer lies in robustness. Intuitively, random removal of a considerable number of vertices could disintegrate a network, breaking it down into small, disconnected groups of vertices. This is true for Erdös-Rényi random networks (Barabasi and Oltvai 2004a; Strogatz 2001). Power-law networks, such as the electrical power grid, are extremely resilient to accidental vertex failures. Even if 80 % of the vertices are randomly removed, the rest often still form a connected component. This is confirmed by both numerical and theoretical analyses examining how removing vertices of varying degrees impacts the average path length and size of the giant component (Albert et al. 2000; Broder 2000; Callaway et al. 2000; Cohen et al. 2000). As most vertices in power-law networks have small degrees, any vertex that fails is most likely one of them, and hence the damage will be negligible. However, such robustness is accompanied by the weakness that deliberate attacks on hubs are much more likely to create disconnected components and hence particularly devastating. Some believe this notion may apply beyond power grids to biological networks, and we discuss this possibility further in section “Applications.”

A related question is the evolutionary origin of power-law networks. Building upon Simon’s early work (Bornholdt and Ebel 2001; Simon 1955), Barabasi and Jeong proposed a growth and preferential attachment model (Barabasi and Albert 1999). Under this model, the network must emerge from gradual addition of vertices and edges. Preferential attachment specifies that the newly added vertices connect to the rest of the network by preferentially attaching themselves to highly connected vertices (e.g., with a probability proportional to the degree of the target vertex). This rich-get-richer mechanism offers an explanation for hubs. In protein networks, growth and preferential attachment may result from gene duplication, which produces initially identical proteins with the same interaction partners. The more interactions possessed by a protein, the more likely it is to acquire a new interaction through duplication of an interaction partner, resulting in the rich-get-richer phenomenon associated with preferential attachment.

Neighborhood Clustering

Neighborhood clustering describes a network characteristic that resembles the phenomenon of neighbors of my neighbors tend to be neighbors of one another. Above, we defined a clustering coefficient to quantify this local property of a graph. The average clustering coefficient of all vertices in the graph characterizes the global tendency toward clustering in the network (Watts and Strogatz 1998). Several biomolecular networks, for example, the metabolic network (Ravasz et al. 2002) and physical interaction networks, have a significantly higher average clustering coefficient than random networks of equivalent size and degree distribution, suggesting modular organization in addition to power-law degree distribution. A partially determined human interactome network, however, was found to be less clustered than random networks with the same degree distribution (Rual et al. 2005). This could be because only a subset of edges in the complete network was sampled, or it might suggest “clustering” of protein-protein interaction networks in other organisms has been artificially boosted by previous experimental methods (Rual et al. 2005). It is also possible that the human interactome network indeed is less clustered than other organisms. Further experimentation is required to resolve this apparent discrepancy.

Small World

Another important feature of biomolecular networks is that they are frequently small-world. A small-world network is defined by two properties: (1) short characteristic path length, meaning that the length of the shortest path between any pair of vertices tends to be very small (this is sometimes confusingly called “the small-world property,” despite there being two properties of a small-world network), and (2) high clustering coefficient, i.e., densely connected local neighborhoods as described above (Watts and Strogatz 1998). Random networks following the ER model satisfy the first requirement but not the second, whereas some regular lattices possess the second property but not the first. Both metabolite networks and physical interaction networks have been shown to be small-world (Fell and Wagner 2000; Sole et al. 2002; Wagner and Fell 2001), Short characteristic path length in small-world communication networks allows efficient signaling between vertices, in that local perturbations can propagate rapidly to the entire network. Small-worldness seems to be evolutionarily conserved, as the metabolite network of a parasitic bacterium exhibits roughly the same average path length as observed for a large multicellular organism (Jeong et al. 2000; Wagner and Fell 2001).

Disassortativity

In some networks, highly connected vertices tend to link to vertices with few edges, avoiding connections to other hubs. This is the disassortative property. Disassortativity is shared by both metabolic networks and protein-protein interaction networks, as well as technological networks such as the World Wide Web, but not in social networks which are assortative (Maslov and Sneppen 2002; Newman 2002; Pastor-Satorras et al. 2001). The avoidance of hub-hub connections may come from the compartmentalization of many cellular processes (Hartwell et al. 1999), although other explanations have been proposed. For example, disassortativity may provide some protection against deliberate attacks to the hub vertices. As the robustness of power-law networks comes with an increased vulnerability to deliberate attacks against hubs, suppression of hub-hub connection has been suggested to alleviate or slow down propagation of such deleterious perturbations (Maslov and Sneppen 2002). While it is difficult to know how applicable the communication network analogy is to biological networks, these ideas are certainly thought provoking.

Network Motifs

The average clustering coefficient discussed above can be expressed in terms of the number of triangles in a network relative to three times the number of paths of length two (Newman et al. 2001). It is then interesting to ask whether there are other local patterns of connectivity that are overrepresented within biological networks. Such patterns are called network motifs. Demonstrating enrichment of a given connectivity pattern requires counting the number of subgraphs in the network matching the pattern and comparing the count to that observed in many randomized versions of the same network (Milo et al. 2002).

Milo et al. initially proposed that network motifs represent elementary building blocks of cellular networks. The transcription regulatory networks in E. coli and S. cerevisiae both exhibit enrichment for the feed-forward loop and the bi-fan loop (Fig. 3), which can be attributed to specific tasks in information processing (Milo et al. 2002; Shen-Orr et al. 2002). In particular, the feed-forward motif (Fig. 3) involves two transcription factors, one regulating the other and both controlling the expression of a third target gene. The feed-forward motif can serve as an AND gate that corresponds to a low-pass filter which suppresses out transient signals while preserving persistent signals, as is the case in the araBAD operon in the arabinose feed-forward loop (Schleif 2000).

Fig. 3
figure 3

Single-color network motifs

Different types of networks exhibit distinct sets of motifs. For example, the food web exhibits a “three-chain” motif and a bi-parallel motif, in contrast to the feed-forward motif and bi-fan motif in transcriptional regulatory networks (Fig. 3; Milo et al. 2002). Thus, network motifs provide insight into common mechanisms of interaction within a network. The abundance profile of network motifs for a given network serves as a fingerprint for classifying networks into different superfamilies (Milo et al. 2004).

A given network motif may have many instances within a network. The relationship of network motif instances to one another poses an interesting question. Observations made in the E. coli transcriptional regulatory network show that instances of both the feed-forward motif and the bi-fan motif do not exist in isolation. Instead, they tend to overlap with one another and form large clusters, which then coalesce into superclusters (Dobrin et al. 2004). This suggests hierarchical organization of biomolecular networks and that motifs should not necessarily be viewed as elementary building blocks of complex networks but might instead be thought of as signatures of higher-order structures corresponding to distinct cellular processes. The fact that many motif clusters overlap with known biological functions (Dobrin et al. 2004) interferes between motif instances. We further discuss this notion below.

Modularity and Hierarchy

Clustering of network motifs suggested presence of higher-order structures in biomolecular networks (Dobrin et al. 2004). The search for enrichment of larger patterns (involving more than four vertices) in large networks is computationally challenging due to the exponential computational complexity associated with enumerating increasingly large subgraphs in biomolecular networks. An alternative approach to brute-force computation or sampling methods is to identify densely connected clusters or network modules. Various clustering methods have been used to characterize modules using network topology (Girvan and Newman 2002; Holme et al. 2003; Ravasz et al. 2002; Rives and Galitski 2003; Schuster et al. 2002; Snel et al. 2002; Spirin and Mirny 2003). Note that the ideal module is a collection of gene products that work together as a unit, which may plug in to perform its molecular function in different contexts. Densely connected clusters may be useful in identifying modules, but we should separate the ideal concept from its surrogate measure (Hartwell et al. 1999).

The concept of relatively isolated modules may seem contradictory to the power-law characteristic of biomolecular networks. Both naturally exist (Ravasz and Barabasi 2003; Ravasz et al. 2002), however indicating network modules do not exist in isolation but combine at various levels to form a hierarchical network. The hierarchical structure of biomolecular networks poses a challenge in identifying modules, as there are no clear boundaries between them. However, understanding networks’ hierarchical organization can also aid the module identification, at least in the transcriptional regulation network. For example, the transcriptional regulatory networks of E. coli and S. cerevisiae both exhibit a multilayer hierarchical structure (Ma et al. 2004; Yu and Gerstein 2006), and removing global regulators in the top layers of the hierarchy decomposes the network into functionally relevant modules (Ma et al. 2004). Recent work also identified “origons” or “organizer” structures within the transcriptional regulatory network that originate at a distinct class of sensor transcription factors and serve as functional modules for various environmental perturbations (Balazsi et al. 2005; Farkas et al. 2006). Combining topological information with other genomic or proteomic data can lead to more biologically relevant modules (Bader and Hogue 2002; Bar-Joseph 2003; Ihmels 2002; Jansen et al. 2003; Juvan et al. 2005; Stuart et al. 2003; Tornow and Mewes 2003). We describe this further in the next section.

Applications

Structural features of biomolecular networks can be used to infer organizing principles of biomolecular interactions and relationships. The power-law nature suggests robustness of biomolecular networks against random attacks, and models of network designation evolution – e.g., growth and preferential attachment – serve as additional support to the gene duplication model of protein network growth. The small-world designation suggests the potential for rapid communication within biomolecular networks, with local perturbations reaching the entire network via a small number of links while also reflecting clustering due to modularity, e.g., as a result of protein complexes. Network motifs are local interconnection patterns that are often footprints of higher-order structure (i.e., functional units composed of more vertices than are considered in network motifs). Different types of biomolecular network have distinctive profiles of the relative abundance of network motifs, so that network motifs can also be used to characterize and compare complex networks.

More practically, structural characteristics of biomolecular networks can be used to make biological inferences. Experimentally derived networks are inevitably susceptible to random as well as systematic errors, and characteristics of network structure can be used to assess confidence of edges in biomolecular networks and also make novel predictions. Goldberg et al. examined a list of S. cerevisiae protein-protein interactions and were able to stratify this list into many subsets of varying accuracy by exploiting the neighborhood clustering feature of small-world networks (Goldberg and Roth 2003). Other methods using network topology also proved useful in predicting protein interaction networks of other organisms (Giot 2003), for example, by exploiting conservation of network motifs (Albert and Albert 2004).

High clustering in biomolecular networks often represents physical organization of biomolecules, particularly in the case of physical protein interaction maps. Therefore, identifying densely connected subgraphs (clusters) can help detect protein complexes (Bader and Hogue 2003; King 2004; Rives and Galitski 2003; Spirin and Mirny 2003), In addition, topological modules are believed to represent specific biological processes or functions (Bader and Hogue 2002; Bar-Joseph 2003; Ihmels 2002; Jansen et al. 2003; Stuart et al. 2003; Tornow and Mewes 2003). Hence, identifying modules can also help assign biological processes or functions to genes in a network.

Features of vertices can also be inferred from network topology. As discussed above, power-law networks are robust with regard to random vertex failures, but highly sensitive to targeted attacks against hubs. Whether this notion of robustness applies beyond electrical power grids to biological networks is subject to debate. In protein networks that include co-complexed relationships, for example, many proteins are hubs by virtue of their membership in protein complexes (appearing as cliques). Removal of such hubs is not highly disruptive in terms of generating disconnected components, and yet there are many essential complexes for which the removal of a single protein member disrupts the physical assembly of the whole complex and results in death for the organism.

Nevertheless, genome-wide deletion analysis in S. cerevisiae demonstrated a correlation between degree and essentiality: only ~10 % of the genes encoding proteins with degrees less than 5 are essential, as compared to ~60 % for genes encoding proteins with 15 or more interactions (Giaever 2002; Jeong et al. 2001; Winzeler 1999). Thus, protein connectivity can to some extent predict the phenotype of its corresponding gene. Recent work on metabolic networks also demonstrated that certain topology-based vertex characteristics can accurately predict knockout phenotypes of the corresponding enzyme (Wunderlich and Mirny 2006).

In the case of SSL genetic interaction networks, highly connected genes are often more critical to fitness than genes of less connectivity as they “buffer” a large number of genes to maintain the cell’s survival (Tong et al. 2004). Upon disruption of a hub gene in the SSL network, mutations in an increased number of genes will result in fitness defect. It has been hypothesized that hub genes could be useful targets for anticancer drugs to specifically kill cancer cells as they often carry many additional mutations, while normal cells can be protected due to genetic buffering by the hub gene (Tong et al. 2004).

Examples of structural features and applications discussed thus far have focused on networks composed of a single type of relationship. However, as there are many biological interaction or relationship types, it is worthwhile to combine information from various biomolecular networks. In the next section, we discuss multicolor networks that represent more than one biological interaction/relationship simultaneously.

Multicolor Networks

Definition and Categorization

Previously, we described various types of single-color networks, each containing edges that correspond to a single type of relationship between biomolecules. Different types of interactions, however, do not exist independently. Instead, they are interrelated with one another and offer different perspectives on the underlying biological mechanisms. It is therefore useful to consider multiple biomolecular relationships simultaneously by integrating them into one multicolor network (Fig. 4). Each edge type can be represented by a color that indicates the type of biological interaction or relationship. Here we focus on multicolor networks where each vertex corresponds to a gene and its corresponding protein(s). As a multicolor network can be viewed as an overlay of multiple single-color networks, the structural features of single-color networks discussed previously still hold true for the individual component single-color networks, but additional properties of multicolor networks emerge.

Fig. 4
figure 4

A multicolor network

Features and Implications

Correlation of Edge “Colors”

The first step in studying multicolor networks is to examine how various edge types correlate to one another. Various statistical methods can be used to test significance of correlation. Without considering network topology, one can treat pairs of vertices in the network as independent data points and assess significance of overlap between two edge types analytically under the null hypothesis of the hypergeometric distribution. Network topology, however, can have significant impact on edge correlation. For example, as most single-color biomolecular networks contain highly connected hubs, if many of the hubs in one network are hubs in another network, the overlay of the two single-color networks will, as a result, have considerable overlap of the two interaction type. To test edge correlation more rigorously in the context of network topology, we can generate randomized versions of the multicolor network by rewiring edges in each component single-color network independently while preserving certain topological features, for example, degree distribution (Yeger-Lotem et al. 2004; Zhang et al. 2005). We can then determine the significance of correlation by calculating the fraction of random networks that contain more two-color edges than the real network. This approach can also be generalized to assess significance of overlap of more than two edge colors.

Several studies have noted the correlation between different types of biological interactions. For example, interacting proteins are more likely to have similar expression pattern (Ge et al. 2001; Jansen et al. 2002a); genes with correlated expression are more likely to be controlled by a common transcription factor (Yu et al. 2003); and SSL genetic interactions are more likely to occur between homologous genes (Tong et al. 2004). Although each of these correlations are statistically significant, no single pair of interaction types shows perfect overlap (and indeed the level of correlation is low in each example). Thus, there is dependence among many of the observed types of biomolecular interactions, and yet each type offers a distinct view of the same complex biological system.

“Multicolor” Network Motifs and Themes

The network motif concept, described previously for single-color networks, is extensible to a multicolor network. Multicolor network motifs characterize relationships between different biological interaction types within local network neighborhoods. Yeger-Lotem et al. examined network motifs in combined cellular networks of two interaction types – transcriptional regulation and protein interaction (Yeger-Lotem et al. 2004). They identified three multicolor motifs of three genes – two interacting transcription factors that co-regulate a third gene, two interacting proteins co-regulated by the same transcription factor, and a “mixed feedback loop,” where a given transcription factor and the product of its target gene form a protein interaction with each other. More multicolor motifs were characterized in an integrated network of five interaction types – transcription regulation, physical interaction, SSL genetic interaction, sequence homology, and correlated mRNA expression (Fig. 5; Zhang et al. 2005). These motifs can be explained in terms of high-order network structures or network themes (Fig. 5), which are representative of the underlying biological phenomena. One interesting example is motifs containing two vertices linked by protein interaction or correlated expression, such that both vertices connect to a third vertex either by SSL interaction or by sequence homology. This suggests the theme that a protein and a distinct protein complex/biological process have compensatory functions, resulting in SSL relationships between the protein and any component essential to the compensatory complex/process. Many four-vertex motifs that correspond to the compensatory complex theme (Fig. 5) further support this notion. These four-vertex motifs also suggested that some instances of the “compensatory protein complex/process” theme could simply be signatures of compensatory pairs of complexes or processes, rather than a single protein acting alone to compensate for loss of another process. An excellent example of the compensatory complex theme is the pairing of endoplasmic reticulum (ER) protein-translocation subcomplex (Hanein et al. 1996) and the Gim complex (Geissler et al. 1998), with each complex connected to the other by many SSL interactions (Tong et al. 2004; Zhang et al. 2005).

Fig. 5
figure 5

Multicolor network motifs and themes

Thematic Maps

In the previous section, we described the idea that network motifs may in general be footprints of larger network structures. Organization of some multicolor network motifs into larger structures of network themes further supported this notion and also confirmed that the concept of modularity discovered in single-color networks is extensible to networks of multiple interaction types. As such modules in a multicolor network can exist in various sizes and exhibit a combinatorial number of different connection patterns, systematically identifying them by prevalence of network motifs of a fixed size can be impracticable. Network theme representations, in which a simplified network is drawn in terms of specific network themes (Zhang et al. 2005), is a convenient way of examining the system. In the example of the compensatory complex theme, we can collapse members of the same complex into one single vertex and draw links between complexes with significantly enriched inter-complex SSL interactions (Zhang et al. 2005). The resulting thematic map of compensatory complexes (Fig. 6) serves as a guide to the “redundant systems” within the integrated S. cerevisiae network (Zhang et al. 2005). Similarly, one can create maps of “regulonic complexes” – complexes for which the corresponding genes are co-regulated – by linking each transcription factor to protein complexes that are significantly enriched for genes targeted by that transcription factor, providing a schematic view of the transcriptional circuitry in the cell (Simonis et al. 2004, 2006; Zhang et al. 2005).

Fig. 6
figure 6

Network themes

Applications

Multicolor networks integrate several biomolecular interaction types and contain information that is richer than the component single-color networks taken individually. Hence, they can be used to make more accurate biological inference. Correlation between biological interaction types alone can be useful, especially when a large number of interaction types are combined together. A Bayesian approach achieved considerable success in predicting physical interactions between proteins (Jansen et al. 2003). To capture the additional information from interdependency of interaction types, one can examine the conditional probability of a physical interaction given all possible “multicolor relationships” (each corresponding to a particular configuration of single-color relationships) (Jansen et al. 2003; Milo et al. 2002). However, such an approach scales poorly as the number of interaction types grows (the number of multicolor relationships grows as 2N, where N is the number of interaction types, so that there will be increasingly many multicolor relationships with no training examples). More sophisticated methods, such as logistic regression (Bader et al. 2004; Giot 2003), decision trees, or random forests, can model the interdependency between interaction types in a more scalable manner, allowing the combination of a large number of biomolecular relationships.

Studies using such methods can significantly improve the quality of predictions (Jansen et al. 2002b, 2003; Qi et al. 2005; Wong et al. 2004; Zhang et al. 2004).

Multicolor network motifs put correlations of interaction types in the context of local topology and provide potential to further improve biological inference. Specifically, one can predict a certain type of link between a given pair of vertices if its addition completes a structure matching an enriched network motif. For example, two genes with a common SSL interaction partner may have an increased probability of protein interaction, as addition of a physical interaction edge results in a match of a motif depicted in Fig. 5 (G1). Similarly, a SSL edge between two genes can complete a match to the same motif, if the two genes are indirectly connected by physical interaction followed by SSL. This “two-hop” relationship of physical interaction and SSL proves to be a strong predictor of SSL interactions (Wong et al. 2004).

More broadly, we can also predict an interaction if its addition fits into a recurring network theme. For instance, there are many more SSL interactions between the ER-translocation complex and the Gim complex than would be expected by chance given the out-degrees of both complexes in the SSL network. However, no SSL interaction has been observed between Sec62 or Sec63, two members of the ER protein-translocation subcomplex, and any protein in the Gim complex because Sec62 and Sec63 were not used as queries in the SGA analysis (Tong et al. 2004). We can therefore hypothesize that Sec62 or Sec63 is SSL with many members of the Gim complex (Zhang et al. 2005).

Network themes represent modular organization of the network at the functional level and can be used to predict the function or biological process in which a gene is involved. The rationale is that if a gene is part of a module where a particular function is enriched, then there is increased probability that this gene also holds the same function. For example, in the feed-forward theme depicted in Fig. 5, most of the genes regulated by both Mcm1 and Swi4 are involved in the control or execution of the cell cycle. Therefore, we can hypothesize that Yor315w, a protein about which little is known, is also involved in cell cycle regulation. Indeed, this protein is a target of cyclin-dependent kinase Cdk1, a primary regulator of the cell cycle. Predictions based on network themes may be robust to errors in the input data, since they depend on connectivity patterns in extended network neighborhoods instead of one or very few links.

It is worth emphasizing that predictions based on network motifs and themes are more suggestive than definitive. Above, we focused on organizational principles that can be utilized for biological inference. Such inferences should be combined with prior information from as many sources as possible and should be tested experimentally. In the next sections, we will touch upon rich network models that can be used for graphical data integration. We also refer readers to entry “Biological Data Integration and Model Building” in this encyclopedia for more general approaches to data integration.

Rich Network Models

Networks of Labeled Vertices and Edges

Just as the relationships between biomolecules come in many colors, the biomolecules themselves (the vertices) also exhibit a range of properties. These include intracellular localization, mRNA expression level, cellular function, and protein complex membership, to name just a few. A network labeled vertices and edges captures multiple properties of biomolecules as well as relationships among them. Extending the concept of multicolor networks, one can build a rich network model using a network with vector-labeled vertices and edges. Each vertex or edge can be labeled with a vector, such that each variable in the vector represents some vertex property or edge type. The rich information captured by such a model can be used to infer types of interactions or characteristics of vertices (Getoor et al. 2004).

Hypergraphs

Previously, we mentioned that sharing of the same or similar vertex property can be viewed as a type of biomolecular relationship and hence represented using an edge. A hypergraph can be used to more explicitly represent a shared property of multiple vertices. A hypergraph is a generalization of a graph where edges can connect any number of vertices, instead of only two as in conventional graphs, and a hypergraph edge (or “hyperedge”) can be used to connect all vertices sharing the same property. When further generalized to contain multiple types of edges, hypergraphs provide an alternative to networks of labeled vertices and edges described earlier. Hypergraphs have been used, for example, to model single-color networks of protein domains (Freudenberg et al. 2002). However, it remains to be seen whether this provides a more useful abstraction than a rich network representation in which vertices and edges are each labeled with vectors.

Weighted Graphs and Probabilistic Networks

Network models discussed thus far have mostly been discrete. For example, an edge in each single-color network either exists or does not exist. In the rich network models we discussed, variables used to describe a vertex or edge characteristic can be binary or categorical variables. However, some properties of a vertex may have continuous values, for example, a gene product may be described by its corresponding mRNA expression level. An edge property may also be continuous, e.g., affinity of a physical interaction or coefficient of correlation between mRNA abundance profiles. Such quantitative information can also be incorporated in a rich network model, simply by including continuously valued variables within the vertex or edge vectors.

One special case of a weighted graph is a probabilistic network, in which the weight on each edge represents a posterior probability that the edge is real, given the evidence supporting that edge. Probabilistic networks are particularly useful when knowledge of relationships is uncertain. This is often the case for biological data, sometimes due to inaccuracy of measurement or varying levels of confidence in a computationally inferred relationship (Asthana et al. 2004; Zhang et al. 2004). Probabilistic networks have been used to predict novel members of partially known protein complexes, for example, by sampling from networks of physical interaction from multiple data sources based on their corresponding confidence levels (Asthana et al. 2004; Bader 2003). Algorithms have also been developed to identify protein complexes by finding dense subgraphs within a weighted graph (Li et al. 2007).

Future Directions

Research on the structure and function of biomolecular networks has covered a lot of ground in the past decade. So far, we have studied global topological features such as power-law degree distributions, small-worldness, disassortativity, and associated biological implications such as robustness and dense neighborhood clustering. We have also discussed the hierarchical organization of networks, with network “structures” ranging from network motifs to dense clusters and to pairs of clusters, densely connected within each cluster by one “color” of interaction and connected to one another by interactions of a different color. We have also explored the roles of these structures in cellular functions, for example, the involvement of certain network motifs in information processing and the correspondence of dense clusters to protein complexes and biological processes. We also recognized the interplay of various interaction types, for example, the correspondence of pairs of clusters connected by SSL genetic interaction to protein complexes with mutually compensatory functions. Deeper understandings in the above areas certainly warrant future study. However, many currently under-explored territories remain.

The cell is a constantly changing environment, so are the molecules and processes within that environment. Network studies have been mostly limited to the union of interactions in all locations, times, and environments. Cellular state, such as position in the cell cycle, is a key determinant of concentrations and status of biomolecules and hence the interactions between them. Studies have begun to reveal the dynamic nature of biomolecular networks. For example, by overlaying mRNA expression patterns with the physical interaction network, Han et al. described two types of highly connected hubs – “party” hubs, which interact with most partners simultaneously, and “date” hubs, which connect to different partners at different times (Han et al. 2004). Expression data have also been combined with protein interaction data to reveal cell cycle-dependent protein complexes or arrangements of protein complexes (de Lichtenberg et al. 2005). Genome-scale molecular data across a wide variety of cellular states are so far only available for mRNA expression levels (Cho et al. 1998; Spellman et al. 1998). Availability of more biomolecular interaction data under various cellular conditions will allow us to study the additional complexity of cellular processes. Such research will require adding another dimension when modeling biomolecular networks and will likely also require new mathematical tools.

In addition to overall cellular state, intracellular environment also bears significance on cellular interactions. For example, proteins may be restricted to particular compartments, either defined by a membrane as are the nuclei or mitochondria, or in less-clearly delimited regions such as nucleoli. Such features can be viewed as vertex properties in the network and can therefore be modeled either using a hypergraph or by labeling vertices. Cellular localization data are already available at a genomic scale (Huh et al. 2003; Kumar et al. 2002) and should yield additional insights.

One major limitation of network analysis is data collection. Even with the rapid development of high-throughput genomic and proteomic experimentation, we are still far from being able to systematically characterize all biomolecular features and interactions. Quantifying them at high resolution in both space (across subcellular locations and across cell and tissue types) and time (e.g., through the course of organismal development) appears a more distant goal. More comprehensive network data will undoubtedly advance the study of the structure and function of biomolecular networks, and the knowledge can in turn aid experimental discovery. One intuitive example is to use inferences made from network study to apply limited resources to experiments that are more likely to yield discoveries. We are optimistic that experimental discovery and network analysis can form a positive feedback loop, with each reinforcing the other and together advancing our understanding of biomolecular networks.