Keywords
- Degree Distribution
- Preferential Attachment
- Network Motif
- Transcriptional Regulatory Network
- Biomolecular Interaction
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.
Definition of the Subject
Biological research over the past century or so has been dominated by reductionism – identifying and characterizing individual biomolecules – and has enjoyed enormous success. Throughout this history, however, it has become increasingly clear that an individual biomolecule can rarely account for a discrete biological function on its own. A biological process is almost always the result of a complex interplay of relationships among biomolecules (Alon 2003; Bray 2003; Hartwell et al. 1999; Hasty et al. 2002; Kitano 2002; Koonin et al. 2002; Oltvai and Barabasi 2002; Wall et al. 2004), and the treatment of these relationships as a graph is a natural and useful abstraction.
Broadly speaking, a biomolecular network is a graph representation of relationships (of which there are many types) among a group of biomolecules. Vertices or nodes represent biomolecules, including macromolecules such as genes, proteins, and RNAs, or small biomolecules like amino acids, sugars, and nucleic acids. In the next few sections, we focus mostly on the macromolecules. An edge or link between two vertices indicates a relationship between the corresponding biomolecules, which could include physical interaction, genetic interaction, or a regulatory relationship (e.g., the protein product of gene A regulates the expression of gene B). This abstraction, although simplifying, converts a complex web of biological relationships into a mathematical graph, from which we can study its structural features as well as their implications on biological functions.
Aside from biomolecular networks, other biological networks also bear significance in systems biology.
Examples include networks of protein domains and neuronal networks (Fanning and Anderson 1996; Nadvornik and Drozen 1964). Although many of the features described in this entry also apply, these networks are largely out of the scope of this entry.
Introduction
The study of small-scale biomolecular networks has a long history, stemming from “chemical graphs” introduced in the 1700s (Rouvray 1990), with knowledge about networks of metabolic reactions reaching a large scale by the 1970s.
A prominent early example may be found in the context of Monod and Jacob’s Nobel-winning research in the 1950s, which first illuminated a transcriptional regulation system, namely, control of gene expression in the lac operon (Jacob and Monod 1961; Monod et al. 1951; Monod and Jacob 1961). In their landmark work, Monod and Jacob described the operon model, in which the expression of structural genes is suppressed by the binding of the repressor to the operator gene, which in turn can be regulated by certain metabolites (Jacob and Monod 1961). Regulatory relationships among the repressor, the metabolite, the operator gene, and the structural genes in the operon model present one of the first examples of biomolecular networks (Fig. 1). In addition, Monod and Jacob also proposed several theoretical models, in the form of biochemical or enzymatic networks, as potential mechanisms of biochemical differentiation of cells with identical genomes (Monod and Jacob 1961).
Pioneering work on the lac operon also includes studying functional implications of bionetwork structure. Building upon Jacob and Monod’s work, the elegant experiments of Novick and Weiner (1957) showed interesting features of beta-galactosidase induction – in a single cell, the production of lactose-degrading enzymes is either switched on or off, and the cell’s on or off state is inherited by subsequent generations. A mixture of the two types of cells has the intermediate overall level of enzyme production observed in a normal cell culture. This all-or-none phenomenon can be explained by positive feedback in the regulatory network of beta-galactosidase synthesis, where the intracellular concentration of inducer is boosted by an intermediate mechanism, which is also induced by the inducer (Fig. 1). As we can see from Fig. 1, the very first studies on gene regulation also show how the structure of biomolecular networks, even the small-scale pathways known at that time, can illuminate regulation of protein production.
Molecular biology has entered an era in which the nature, function, and relationships of macromolecules can be studied at a much larger scale. High-throughput technology platforms such as genome sequencing, microarrays, and yeast two-hybrid screening have enabled biologists to examine how nearly all genes or proteins in a genome or proteome interact or relate to one another (Gietz et al. 1997; Lockhart et al. 1996; Schena et al. 1995; Smith et al. 1985, 1986). Collections of these genomic or proteomic data have led to networks of various relationship types – including protein-protein interaction, transcriptional regulation, and sequence homology – involving thousands of proteins or genes (Cho et al. 1998; Gavin et al. 2002; Ho et al. 2002; Ito et al. 2000; Krogan et al. 2006; Lee et al. 2002; Rual et al. 2005; Uetz et al. 2000). This has offered an unprecedented “bird’s-eye view” of how parts of the network, each studied intensively on a local scale, connect to one another. This perspective may be useful in uncovering unknown pieces of the puzzle that together form the intricate machinery of life responsible for the behavior of the cell or the organism. The extremely large scale of these networks also presents a tremendous challenge as it is practically impossible to study each vertex or each interaction individually. A simplified abstract representation of these networks and mathematical methods permits automated study of the structural features, to identify local and global trends that relate biological macromolecules to their functional implications. Early work on large-scale bionetworks focused on single-color networks – networks describing a single type of relationship – while multicolor networks, networks combining multiple relationship types, have gained increasing attention in recent years.
At an abstract level, a network of protein-protein interactions and a network of computers connected through the Internet have much in common. Both have simplified representations as mathematical graphs: vertices represent proteins in the former case and computers in the latter case, and edges between vertices denote physical interactions between proteins or communication wires between computers. Concurrent with the increasing availability of large-scale biological networks, theoretical research in complex networks is advancing rapidly, with significant progress in uncovering the structural features and organizing principles of complex technological and social networks (Albert and Barabasi 2002; Bornholdt and Schuster 2003; Dorogovtsev and Mendes 2003; Newman et al. 2001). In a truly interdisciplinary effort, these principles were soon applied to biological networks, leading to the observation that biological networks share many global architectural features with networks from other complex systems (Fell and Wagner 2000; Goldberg and Roth 2003; Sole et al. 2002; Wagner 2001; Watts and Strogatz 1998), including power-law connectivity, small-world topology, and hierarchy, each discussed further below. Such resemblance suggested that many biological networks might be governed by similar “design” principles, allowing methods proven in other large, non-biological complex systems to be used to characterize the intricate relationships between biomolecules.
While some reported global features of biomolecular networks, others sought local topological patterns recurrent in the network (Alon 2003; Lee et al. 2002; Milo et al. 2002; Shen-Orr et al. 2002). Network motifs are simple patterns of interconnection in networks that occur more frequently than expected in randomized networks (Milo et al. 2002; Shen-Orr et al. 2002) and have been proposed to represent basic building blocks of complex networks (Lee et al. 2002; Milo et al. 2002; Shen-Orr et al. 2002). Different types of networks exhibit different motif profiles, providing a means for network classification (Milo et al. 2004). Dobrin et al. later showed for the E. coli transcriptional network that subgraphs matching two types of transcriptional regulatory circuit motif – feed-forward and bi-fan motifs – overlap with one another and form large clusters which further coalesce into “superclusters” (Dobrin et al. 2004).
Although some circuits corresponding to motifs may be capable of functioning independently (Mangan et al. 2006), observation of Debrin et al., among many others, leads to the idea that instances of a network motif are often interdependent on one another as opposed to being building blocks with separable functions (Zhang et al. 2005).
Cellular function is carried out through multiple types of interaction, resulting in a complex interplay of biological relationships among the myriad of cellular components. To have a truly global picture of the cell as a complex system, one needs to relate multiple types of interactions to one another. Multicolor networks are integrated networks of many relationship types, with frequent overlap between relationships of different types. For example, physically interacting proteins have expression patterns that are slightly more similar than would be expected by chance (Ge et al. 2001; Jansen et al. 2002a); genes with correlated expression are more likely to be controlled by a common transcription factor (Yu et al. 2003); and some genetic interaction types have a weak tendency to occur between homologous genes (Tong et al. 2004) (see section “Single-Color Networks” for definitions of the various interaction types).
Topological study of single-color networks is paralleled in multicolor networks. Multicolor network motifs characterize relationships between different biological interaction types within local network neighborhoods (Taylor et al. 2007; Yeger-Lotem et al. 2004; Zhang et al. 2005). Furthermore, multicolor networks also exhibit network themes – recurring higher-order interconnection patterns that encompass multiple occurrences of network motifs and reflect a common organizational principle (Zhang et al. 2005). Network themes can be tied to specific biological phenomena that represent more fundamental “design principles,” for example, the tendency for proteins within the same physical complex to be regulated by a common transcription factor (Simonis et al. 2006; Zhang et al. 2005). This suggests that instead of representing network “building blocks,” network motifs should in some cases be viewed as signatures of more fundamental higher-order structures (Zhang et al. 2005). Network themes also provide a more natural simplification of the otherwise complex set of relationships in an integrated network (Zhang et al. 2005).
Studying the architectural features of biological networks conveys useful biological intuition and also allows us to make biological predictions. Even simple methods exploiting the predictive power of local topology have proven useful in predicting physical and genetic interactions (Bader et al. 2004; Giot 2003; Goldberg and Roth 2003; Wong et al. 2004). Richer network models are able to incorporate more information and hence promise better predictions.
A multicolor network can be viewed as a graph in which each edge carries labels denoting the relationships between the corresponding pair of vertices. Similarly, each vertex may carry multiple labels (e.g., denoting that a particular protein is expressed in the pancreas or that a gene is conserved in chimpanzee). A rich network is a network in which a vector describing potentially many vertex properties is associated with each vertex, and a vector describing potentially many edge properties is associated with each pair of vertices. By incorporating a more comprehensive set of information, these rich network models provide additional power in predicting features of genes or proteins, as well as the relationships between them. In the following sections, we will survey methodology, findings, and applications for biological network models of increasing complexity, starting with single-color biomolecular networks.
As described previously, study of large-scale biomolecular networks has been enabled by the development of a variety of genomic and proteomic technologies, yielding a variety of relationship types. Saccharomyces cerevisiae, the budding yeast, a relatively simple single-celled eukaryote, has been an important model system, and many network studies originate from this yeast. Many core functions in yeast are conserved in humans, and many network trends observed in yeast have proven true for other organisms (Rual et al. 2005; Stelzl et al. 2005), leading us to believe that relationships in different organisms share similar organizational principles.
Before delving into the details of biomolecular network modeling, we add a few notes of caution. First, even the most complex network models remain a vast oversimplification of the much richer and more complex reality of cellular networks. A single “vertex” can actually correspond to hundreds or even millions of physical protein molecules carrying a variety of posttranslation modifications, each diffusing independently throughout the cell. As important nuances relating to spatial and temporal dynamics are often lost, the network abstraction is only relevant to the extent that it proves useful. As the famous statistician George Box put it, “all models are wrong, some are useful” (Launer and Wilkinson 1979). Second, the method of network abstraction and the data source/experimental may introduce systematic errors. For example, the so-called “matrix” method of inferring protein interactions (which assumes that proteins interact if each binds directly or indirectly to a common “bait” protein) will show “overclustering” by definition. Alternatively, the “spoke” method, in which interactions are only inferred between a bait protein and each prey protein that it “pulls down,” will result in “underclustering” if only a small subset of proteins are used as baits (Bader and Hogue 2002; Gavin et al. 2002; Ho et al. 2002). Therefore, one should take extra caution in interpreting observed network features to separate artifacts from real biological phenomena by using appropriate controls. Last, in searching network structure for trends, one often needs to compare the actual biomolecular network with appropriate random networks to ensure that the features identified are not a trivial consequence of a simpler phenomenon. For example, in searching for network motifs, randomized networks with the same degree distribution (see section “Single-Color Networks”) can be used as a control to avoid discovering motifs that are a trivial consequence of the underlying degree distribution (King 2004; Milo et al. 2002; Zhang et al. 2005).
Single-Color Networks
Definition and Categorization
The most basic biomolecular network is a single-color network, which contains only one type of biological interaction or relationship. The nature of the biological interaction or relationship defines the type of network, as it also specifies the nature of biomolecules involved. For interactions with clearly defined directionality, the corresponding network can be represented mathematically as a directed graph, with vertices denoting biomolecules, and edges connecting pairs of vertices whose corresponding biomolecules have directional interactions. Edge direction can represent material flow, exemplified by a metabolite network in which vertices are metabolites, and edges are enzyme-catalyzed reactions. As many metabolic reactions are practically irreversible under physiological conditions, we can use directed edges to represent the direction of a reaction, pointing from substrate to product. Alternatively, one can represent biochemical reactions using a directed edge to connect two enzymes such that the first enzyme’s metabolic product is the second one’s substrate. We refer to networks of the latter type as metabolic enzyme networks, to distinguish them from metabolite networks. Direction of interaction can also be derived from the inferred direction of information flow. For example, in a signaling pathway, a directed edge points from one protein to another immediately downstream in signal transduction. Transcriptional regulation is another type of directed biomolecular interaction, and we can construct a network of transcriptional regulation from directed edges connecting transcription factors to the genes they regulate. In addition to transcriptional regulation, gene regulation in the cell also happens posttranscriptionally, for example, through microRNAs (or small RNAs) which base pair with mRNAs and influence their stability or translation. Researchers have constructed microRNA networks by connecting microRNAs with their target mRNAs based on experimental identification at a limited scale (Farh et al. 2005; Sood et al. 2006; Stark et al. 2005) or based on computational predictions (Rajewsky 2006; Tsang et al. 2007). Posttranslational regulation is another important part of the cellular regulatory process. One example of a posttranslational regulatory network is a protein modification network such as a protein phosphorylation network, in which each directed edge connects a kinase with its protein substrate or phosphorylation target (Ma’ayan et al. 2005; Ptacek et al. 2005).
Single-color networks that capture symmetric relationships can be represented by undirected graphs, with each vertex denoting a biomolecule, and an edge connecting vertices whose corresponding biomolecules have a symmetric relationship. An important example is a protein network of physical interactions, in which edges connect proteins that physically associate with each other with sufficiently small dissociation constants. Such association may be via direct physical contact or indirect via one or more intermediates.
Genetic interaction is another important type of biomolecular relationship in studying cellular functions. It describes pairs of genes for which the phenotype of the double mutant departs from expected under some null model of independence, e.g., gene pairs for which the fitness of the double mutant deviates from the product of the fitness values for each of the two single mutants. Synthetic sick genetic interactions (synthetic lethality in the extreme case) correspond to gene pairs for which double mutant fitness is lower than expected. These synthetic sick or lethal (SSL) interactions can be represented as an undirected network. SSL interactions of ten correspond to genes which have redundant or compensatory function. They are also slightly more likely to encode proteins that interact physically (Tong et al. 2004). If the double mutant phenotype for a pair of genes is less extreme than expected, the corresponding gene pair is said to have an antagonistic (or alleviating) genetic interaction. Although we can use undirected edges for alleviating interactions, in some cases additional information can be conveyed using directed edges. For example, if phenotypes arising from mutations in genes A and B differ and the double mutant phenotype most closely approximates gene A, we may draw a directed edge from gene A to gene B. With the aid of other prior biological knowledge, such directed edges can be useful in assigning order of action in biological pathways (Avery and Wasserman 1992; Juvan et al. 2005). There exist many other types of genetic interaction (Drees et al. 2005; St Onge et al. 2007).
Another relationship type is sequence homology (describing two genes with similar sequences that each descended from a common ancestor). Other biomolecular relationships may be derived from shared biomolecular properties, for example, the relationship of similar mRNA expression profile (Wen et al. 1998), similar phenotype (Giaever 2002; Gunsalus et al. 2005), or similar gene function (Lee et al. 2004). Because there are many different properties of biomolecules, many “similar-property” networks are possible. Above, we sampled types of biomolecular interactions or relationships used in network studies to date. However, this is not meant to be an exhaustive list, as there are many more relationship types, some unknown to us and some for which large-scale data is not yet available. As more large-scale experimental methods are developed and applied to biological research, more biomolecular relationships can be represented as networks.
A few special types of graphs have particular relevance to the study of large-scale biomolecular networks. These include bipartite and quasi-bipartite graphs. A bipartite graph is a graph with two disjoint vertex sets, such that no edges connect pairs of vertices within the same set. For example, a bipartite graph might consist of a set of gene vertices and a set of protein vertices, together with edges that each corresponds to a relationship described by “gene X encodes protein Y.” If one of the two vertex sets contains within-set edges while the other does not, the graph is called quasi-bipartite. Certain biomolecular networks are intrinsically quasi-bipartite. For example, transcriptional regulatory networks contain edges pointing from genes encoding transcription factors to the regulated genes. Thus, vertices may be divided into transcription factors and non-transcription factors. While the latter vertex set, by definition, does not contain any within-set links, a transcription factor may regulate the expression of another transcription factor, so that the network is quasi-bipartite. Quasi-bipartite graphs can also arise as a simple consequence of experimental design. Protein interaction networks derived from yeast two-hybrid and genetic interactions derived from systematic genetic array (SGA) analysis are two examples, where only a subset of genes have been used as “baits” or “queries.” In network analyses, care must be taken to avoid discoveries that are trivial consequences of bipartite or quasi-bipartite properties arising from the experimental design.
Tools and Methodology
Studying global structural features of a network involves many different measures of graph properties (Diestel 2005a). Here we survey a few frequently used measures below.
Degree (or connectivity) is the most basic characteristic of a vertex. The degree of vertex V, k(V), is defined as the number of edges incident on vertex V and tells us how connected V is with other vertices in the network. In the case of directed networks, there is an outgoing degree (or out-degree) k out (V), denoting the number of edges that start from V, and an incoming degree (or in-degree) k in (V), denoting the number of edges that end at V.
The clustering coefficient, C(V), measures the connectivity among the neighbors of a vertex V and can be defined as the number of edges among V’s neighbors, divided by the total possible number of edges if all the neighbors were connected to one another. The average clustering coefficient of a network characterizes the general tendency of vertices to form clusters (Watts and Strogatz 1998).
In a network, distance between two vertices U and V, d(U, V), is measured as the number of edges in the shortest path from U to V. In an undirected graph, d(U, V) satisfies non-negativity, symmetry, and the triangle inequality of a distance metric. In the case of directed graph, however, all edges in a path must be in the same direction. While non-negativity still holds, the distance d(U, V) from U to V could be different from the distance d(V, U) from V to U so that symmetry is not necessarily satisfied nor is the triangle inequality. Hence, strictly speaking, d(U, V) is not a distance metric in directed graphs. The overall connectedness of a network can be quantified by its characteristic path length, defined as the average distance between pairs of vertices in the network.
A subgraph of a graph G is a graph whose vertex and edge sets are subsets of those of G. A component is a subgraph such that all vertices are connected to one another by some path, and there are no connections to vertices outside of that subgraph. If a component contains a majority of a graph’s vertices, it is referred to as the giant component. Other subgraphs of interest are cliques, a set of completely connected vertices. Such dense subgraphs bear particular relevance to protein complexes, which often present themselves as cliques, or dense but incompletely connected subgraphs, in a protein network of physical interaction. One can “seed” the search for such dense subgraphs in the network (as a proxy search for protein complexes) using k-cores – subgraphs such that each vertex has at least k neighbors within that subgraph (Bader and Hogue 2002).
Global Structural Features
Topological features of biomolecular networks have been subjected to extensive study (reviewed in Barabasi and Oltvai 2004a). Below we describe some features of particular relevance to network structure and function.
Power-Law Networks
In a biomolecular network, some vertices are better connected than others. To characterize how connectivity varies from vertex to vertex, one can calculate the degree distribution – i.e., the distribution of P(k), the fraction of vertices with degree k. The Erdös-Rényi (ER) model describes random networks in which each vertex pair is assigned an edge randomly with a probability that is fixed for all vertex pairs. In other words, each edge is assigned or not assigned according to the outcome of a “weighted coin flip” (also known as a Bernoulli trial). For such ER networks, P(k) follows a binomial distribution but can often be approximated well and conveniently by a Poisson distribution. The degree distributions of biomolecular networks (and many other complex networks such as food webs and social networks) tend to fall off at high degree much more slowly than a Poisson and often follow a power-law distribution \( P(k)\sim k{}^{-\lambda } \) (Fig. 2) (Barabasi and Albert 1999). The degree exponent λ in most biomolecular networks seems to have a value between 2 and 3 (Barabasi and Albert 1999). The most intuitive (and perhaps most useful) feature of this power-law distribution is that most vertices in the network have only a few edges, while a few vertices have many edges. These highly connected vertices are called hubs, which are sometimes described to “hold the network together” since removal of a hub has a particularly high tendency to break the network up into separated components. Networks with a power-law degree distribution (or sometimes networks that only follow a power law in the high-degree tail of the distribution) are called scale-free networks. The term scale-free is meant to indicate that there is not a typical vertex in the network that can characterize the rest of the vertices. For some random networks, many vertices have degrees equal or close to the average degree; however, it should be noted that ER networks with a low edge probability can be easily confused with power-law networks (Han et al. 2005). Arguments have also been made that scale-rich is a more appropriate description of graphs with a power-law degree distribution (Tanaka 2005). For clarity, we prefer the term power-law network.
Networks of metabolic reactions were the first biomolecular networks to be characterized as power law (Jeong et al. 2000; Wagner and Fell 2001). Jeong et al. analyzed the metabolite networks of 43 organisms across all three domains of life – bacteria, archaea, and eukaryotes – and discovered that despite significant variation in the components and pathways, all metabolite networks have similar topological scaling properties (Jeong et al. 2000). As many biochemical reactions are irreversible under physiological conditions, metabolite networks should be represented as directed graphs. The distribution of both in-degrees and out-degrees follows power-law distributions, i.e., \( P\left({k}_{\mathrm{in}}\right)\sim {k}_{\mathrm{in}}^{-{\lambda}_{\mathrm{in}}} \), \( P\left({k}_{\mathrm{out}}\right)\sim {k}_{\mathrm{out}}^{-{\lambda}_{\mathrm{out}}} \), and the values of both λin and λout lie within a narrow band of 2.0–2.4 (Jeong et al. 2000). This demonstrated that a power-law degree distribution is a common feature of cellular metabolism, with most substrates involved in one or two reactions, while a few molecules, such as coenzyme A, function as metabolic hubs by participating in several dozen reactions.
Other biomolecular networks have also been described as power law, including networks of physical interactions between proteins, transcriptional regulatory networks, and networks of SSL genetic interactions. The protein interaction map of the yeast Saccharomyces cerevisiae as measured by systematic two-hybrid screens (Ito 2001; Uetz et al. 2000) exhibits a power-law degree distribution, as do physical interaction maps of multicellular organisms – the worm C. elegans, the fly D. melanogaster, and the human (Giot 2003; Jeong et al. 2001; Li 2004; Wagner 2001; Yook et al. 2004). The degree distribution of correlated expression networks based on microarray data has been shown to follow power law (Agrawal 2002; Featherstone and Broadie 2002). The outgoing connections from transcription factors to target genes also exhibit power-law behavior (Fig. 2b; Guelzim et al. 2002; Lee et al. 2002; Thieffry et al. 1998). The incoming degree distributions in these networks have been found instead to follow an exponential decay according to manually curated data (Fig. 2a; Guelzim et al. 2002; Thieffry et al. 1998). However, analysis of more recent large-scale data (Harbison et al. 2004; Lee et al. 2002; MacIsaac et al. 2006) suggests that power-law distribution may provide a slightly better fit (Fig. 2c, d). The resolution of this awaits further refined data and analysis, but it is clear that the distribution of in-degrees and out-degrees in a directed graph may behave differently. Similar to directed networks, the degree distribution of quasi-bipartite networks should also be examined separately for the different vertex sets.
Closer examination of several biomolecular networks showed that their degree distributions exhibit better fit to a truncated power law than a power-law model (Amaral et al. 2000; Rual et al. 2005). A truncated power-law distribution follows a power-law distribution with an “exponential envelope,” such that power-law behavior dominates the distribution at low degrees, while the exponential function dominates the distribution at the high-degree tail of the distribution. Truncated power-law behavior may arise from constraints on the number of interaction partners for high-degree vertices (Amaral et al. 2000). However, biological implications of the differences between these degree distributions remain to be demonstrated in most cases. It is nevertheless clear that biomolecular networks share the common feature that a few vertices have many edges, while most vertices have only a few edges.
The ubiquity of power law and presence of hubs in biomolecular networks prompt us to ask what advantage such network architecture might offer. Some argue that the answer lies in robustness. Intuitively, random removal of a considerable number of vertices could disintegrate a network, breaking it down into small, disconnected groups of vertices. This is true for Erdös-Rényi random networks (Barabasi and Oltvai 2004a; Strogatz 2001). Power-law networks, such as the electrical power grid, are extremely resilient to accidental vertex failures. Even if 80 % of the vertices are randomly removed, the rest often still form a connected component. This is confirmed by both numerical and theoretical analyses examining how removing vertices of varying degrees impacts the average path length and size of the giant component (Albert et al. 2000; Broder 2000; Callaway et al. 2000; Cohen et al. 2000). As most vertices in power-law networks have small degrees, any vertex that fails is most likely one of them, and hence the damage will be negligible. However, such robustness is accompanied by the weakness that deliberate attacks on hubs are much more likely to create disconnected components and hence particularly devastating. Some believe this notion may apply beyond power grids to biological networks, and we discuss this possibility further in section “Applications.”
A related question is the evolutionary origin of power-law networks. Building upon Simon’s early work (Bornholdt and Ebel 2001; Simon 1955), Barabasi and Jeong proposed a growth and preferential attachment model (Barabasi and Albert 1999). Under this model, the network must emerge from gradual addition of vertices and edges. Preferential attachment specifies that the newly added vertices connect to the rest of the network by preferentially attaching themselves to highly connected vertices (e.g., with a probability proportional to the degree of the target vertex). This rich-get-richer mechanism offers an explanation for hubs. In protein networks, growth and preferential attachment may result from gene duplication, which produces initially identical proteins with the same interaction partners. The more interactions possessed by a protein, the more likely it is to acquire a new interaction through duplication of an interaction partner, resulting in the rich-get-richer phenomenon associated with preferential attachment.
Neighborhood Clustering
Neighborhood clustering describes a network characteristic that resembles the phenomenon of neighbors of my neighbors tend to be neighbors of one another. Above, we defined a clustering coefficient to quantify this local property of a graph. The average clustering coefficient of all vertices in the graph characterizes the global tendency toward clustering in the network (Watts and Strogatz 1998). Several biomolecular networks, for example, the metabolic network (Ravasz et al. 2002) and physical interaction networks, have a significantly higher average clustering coefficient than random networks of equivalent size and degree distribution, suggesting modular organization in addition to power-law degree distribution. A partially determined human interactome network, however, was found to be less clustered than random networks with the same degree distribution (Rual et al. 2005). This could be because only a subset of edges in the complete network was sampled, or it might suggest “clustering” of protein-protein interaction networks in other organisms has been artificially boosted by previous experimental methods (Rual et al. 2005). It is also possible that the human interactome network indeed is less clustered than other organisms. Further experimentation is required to resolve this apparent discrepancy.
Small World
Another important feature of biomolecular networks is that they are frequently small-world. A small-world network is defined by two properties: (1) short characteristic path length, meaning that the length of the shortest path between any pair of vertices tends to be very small (this is sometimes confusingly called “the small-world property,” despite there being two properties of a small-world network), and (2) high clustering coefficient, i.e., densely connected local neighborhoods as described above (Watts and Strogatz 1998). Random networks following the ER model satisfy the first requirement but not the second, whereas some regular lattices possess the second property but not the first. Both metabolite networks and physical interaction networks have been shown to be small-world (Fell and Wagner 2000; Sole et al. 2002; Wagner and Fell 2001), Short characteristic path length in small-world communication networks allows efficient signaling between vertices, in that local perturbations can propagate rapidly to the entire network. Small-worldness seems to be evolutionarily conserved, as the metabolite network of a parasitic bacterium exhibits roughly the same average path length as observed for a large multicellular organism (Jeong et al. 2000; Wagner and Fell 2001).
Disassortativity
In some networks, highly connected vertices tend to link to vertices with few edges, avoiding connections to other hubs. This is the disassortative property. Disassortativity is shared by both metabolic networks and protein-protein interaction networks, as well as technological networks such as the World Wide Web, but not in social networks which are assortative (Maslov and Sneppen 2002; Newman 2002; Pastor-Satorras et al. 2001). The avoidance of hub-hub connections may come from the compartmentalization of many cellular processes (Hartwell et al. 1999), although other explanations have been proposed. For example, disassortativity may provide some protection against deliberate attacks to the hub vertices. As the robustness of power-law networks comes with an increased vulnerability to deliberate attacks against hubs, suppression of hub-hub connection has been suggested to alleviate or slow down propagation of such deleterious perturbations (Maslov and Sneppen 2002). While it is difficult to know how applicable the communication network analogy is to biological networks, these ideas are certainly thought provoking.
Network Motifs
The average clustering coefficient discussed above can be expressed in terms of the number of triangles in a network relative to three times the number of paths of length two (Newman et al. 2001). It is then interesting to ask whether there are other local patterns of connectivity that are overrepresented within biological networks. Such patterns are called network motifs. Demonstrating enrichment of a given connectivity pattern requires counting the number of subgraphs in the network matching the pattern and comparing the count to that observed in many randomized versions of the same network (Milo et al. 2002).
Milo et al. initially proposed that network motifs represent elementary building blocks of cellular networks. The transcription regulatory networks in E. coli and S. cerevisiae both exhibit enrichment for the feed-forward loop and the bi-fan loop (Fig. 3), which can be attributed to specific tasks in information processing (Milo et al. 2002; Shen-Orr et al. 2002). In particular, the feed-forward motif (Fig. 3) involves two transcription factors, one regulating the other and both controlling the expression of a third target gene. The feed-forward motif can serve as an AND gate that corresponds to a low-pass filter which suppresses out transient signals while preserving persistent signals, as is the case in the araBAD operon in the arabinose feed-forward loop (Schleif 2000).
Different types of networks exhibit distinct sets of motifs. For example, the food web exhibits a “three-chain” motif and a bi-parallel motif, in contrast to the feed-forward motif and bi-fan motif in transcriptional regulatory networks (Fig. 3; Milo et al. 2002). Thus, network motifs provide insight into common mechanisms of interaction within a network. The abundance profile of network motifs for a given network serves as a fingerprint for classifying networks into different superfamilies (Milo et al. 2004).
A given network motif may have many instances within a network. The relationship of network motif instances to one another poses an interesting question. Observations made in the E. coli transcriptional regulatory network show that instances of both the feed-forward motif and the bi-fan motif do not exist in isolation. Instead, they tend to overlap with one another and form large clusters, which then coalesce into superclusters (Dobrin et al. 2004). This suggests hierarchical organization of biomolecular networks and that motifs should not necessarily be viewed as elementary building blocks of complex networks but might instead be thought of as signatures of higher-order structures corresponding to distinct cellular processes. The fact that many motif clusters overlap with known biological functions (Dobrin et al. 2004) interferes between motif instances. We further discuss this notion below.
Modularity and Hierarchy
Clustering of network motifs suggested presence of higher-order structures in biomolecular networks (Dobrin et al. 2004). The search for enrichment of larger patterns (involving more than four vertices) in large networks is computationally challenging due to the exponential computational complexity associated with enumerating increasingly large subgraphs in biomolecular networks. An alternative approach to brute-force computation or sampling methods is to identify densely connected clusters or network modules. Various clustering methods have been used to characterize modules using network topology (Girvan and Newman 2002; Holme et al. 2003; Ravasz et al. 2002; Rives and Galitski 2003; Schuster et al. 2002; Snel et al. 2002; Spirin and Mirny 2003). Note that the ideal module is a collection of gene products that work together as a unit, which may plug in to perform its molecular function in different contexts. Densely connected clusters may be useful in identifying modules, but we should separate the ideal concept from its surrogate measure (Hartwell et al. 1999).
The concept of relatively isolated modules may seem contradictory to the power-law characteristic of biomolecular networks. Both naturally exist (Ravasz and Barabasi 2003; Ravasz et al. 2002), however indicating network modules do not exist in isolation but combine at various levels to form a hierarchical network. The hierarchical structure of biomolecular networks poses a challenge in identifying modules, as there are no clear boundaries between them. However, understanding networks’ hierarchical organization can also aid the module identification, at least in the transcriptional regulation network. For example, the transcriptional regulatory networks of E. coli and S. cerevisiae both exhibit a multilayer hierarchical structure (Ma et al. 2004; Yu and Gerstein 2006), and removing global regulators in the top layers of the hierarchy decomposes the network into functionally relevant modules (Ma et al. 2004). Recent work also identified “origons” or “organizer” structures within the transcriptional regulatory network that originate at a distinct class of sensor transcription factors and serve as functional modules for various environmental perturbations (Balazsi et al. 2005; Farkas et al. 2006). Combining topological information with other genomic or proteomic data can lead to more biologically relevant modules (Bader and Hogue 2002; Bar-Joseph 2003; Ihmels 2002; Jansen et al. 2003; Juvan et al. 2005; Stuart et al. 2003; Tornow and Mewes 2003). We describe this further in the next section.
Applications
Structural features of biomolecular networks can be used to infer organizing principles of biomolecular interactions and relationships. The power-law nature suggests robustness of biomolecular networks against random attacks, and models of network designation evolution – e.g., growth and preferential attachment – serve as additional support to the gene duplication model of protein network growth. The small-world designation suggests the potential for rapid communication within biomolecular networks, with local perturbations reaching the entire network via a small number of links while also reflecting clustering due to modularity, e.g., as a result of protein complexes. Network motifs are local interconnection patterns that are often footprints of higher-order structure (i.e., functional units composed of more vertices than are considered in network motifs). Different types of biomolecular network have distinctive profiles of the relative abundance of network motifs, so that network motifs can also be used to characterize and compare complex networks.
More practically, structural characteristics of biomolecular networks can be used to make biological inferences. Experimentally derived networks are inevitably susceptible to random as well as systematic errors, and characteristics of network structure can be used to assess confidence of edges in biomolecular networks and also make novel predictions. Goldberg et al. examined a list of S. cerevisiae protein-protein interactions and were able to stratify this list into many subsets of varying accuracy by exploiting the neighborhood clustering feature of small-world networks (Goldberg and Roth 2003). Other methods using network topology also proved useful in predicting protein interaction networks of other organisms (Giot 2003), for example, by exploiting conservation of network motifs (Albert and Albert 2004).
High clustering in biomolecular networks often represents physical organization of biomolecules, particularly in the case of physical protein interaction maps. Therefore, identifying densely connected subgraphs (clusters) can help detect protein complexes (Bader and Hogue 2003; King 2004; Rives and Galitski 2003; Spirin and Mirny 2003), In addition, topological modules are believed to represent specific biological processes or functions (Bader and Hogue 2002; Bar-Joseph 2003; Ihmels 2002; Jansen et al. 2003; Stuart et al. 2003; Tornow and Mewes 2003). Hence, identifying modules can also help assign biological processes or functions to genes in a network.
Features of vertices can also be inferred from network topology. As discussed above, power-law networks are robust with regard to random vertex failures, but highly sensitive to targeted attacks against hubs. Whether this notion of robustness applies beyond electrical power grids to biological networks is subject to debate. In protein networks that include co-complexed relationships, for example, many proteins are hubs by virtue of their membership in protein complexes (appearing as cliques). Removal of such hubs is not highly disruptive in terms of generating disconnected components, and yet there are many essential complexes for which the removal of a single protein member disrupts the physical assembly of the whole complex and results in death for the organism.
Nevertheless, genome-wide deletion analysis in S. cerevisiae demonstrated a correlation between degree and essentiality: only ~10 % of the genes encoding proteins with degrees less than 5 are essential, as compared to ~60 % for genes encoding proteins with 15 or more interactions (Giaever 2002; Jeong et al. 2001; Winzeler 1999). Thus, protein connectivity can to some extent predict the phenotype of its corresponding gene. Recent work on metabolic networks also demonstrated that certain topology-based vertex characteristics can accurately predict knockout phenotypes of the corresponding enzyme (Wunderlich and Mirny 2006).
In the case of SSL genetic interaction networks, highly connected genes are often more critical to fitness than genes of less connectivity as they “buffer” a large number of genes to maintain the cell’s survival (Tong et al. 2004). Upon disruption of a hub gene in the SSL network, mutations in an increased number of genes will result in fitness defect. It has been hypothesized that hub genes could be useful targets for anticancer drugs to specifically kill cancer cells as they often carry many additional mutations, while normal cells can be protected due to genetic buffering by the hub gene (Tong et al. 2004).
Examples of structural features and applications discussed thus far have focused on networks composed of a single type of relationship. However, as there are many biological interaction or relationship types, it is worthwhile to combine information from various biomolecular networks. In the next section, we discuss multicolor networks that represent more than one biological interaction/relationship simultaneously.
Multicolor Networks
Definition and Categorization
Previously, we described various types of single-color networks, each containing edges that correspond to a single type of relationship between biomolecules. Different types of interactions, however, do not exist independently. Instead, they are interrelated with one another and offer different perspectives on the underlying biological mechanisms. It is therefore useful to consider multiple biomolecular relationships simultaneously by integrating them into one multicolor network (Fig. 4). Each edge type can be represented by a color that indicates the type of biological interaction or relationship. Here we focus on multicolor networks where each vertex corresponds to a gene and its corresponding protein(s). As a multicolor network can be viewed as an overlay of multiple single-color networks, the structural features of single-color networks discussed previously still hold true for the individual component single-color networks, but additional properties of multicolor networks emerge.
Features and Implications
Correlation of Edge “Colors”
The first step in studying multicolor networks is to examine how various edge types correlate to one another. Various statistical methods can be used to test significance of correlation. Without considering network topology, one can treat pairs of vertices in the network as independent data points and assess significance of overlap between two edge types analytically under the null hypothesis of the hypergeometric distribution. Network topology, however, can have significant impact on edge correlation. For example, as most single-color biomolecular networks contain highly connected hubs, if many of the hubs in one network are hubs in another network, the overlay of the two single-color networks will, as a result, have considerable overlap of the two interaction type. To test edge correlation more rigorously in the context of network topology, we can generate randomized versions of the multicolor network by rewiring edges in each component single-color network independently while preserving certain topological features, for example, degree distribution (Yeger-Lotem et al. 2004; Zhang et al. 2005). We can then determine the significance of correlation by calculating the fraction of random networks that contain more two-color edges than the real network. This approach can also be generalized to assess significance of overlap of more than two edge colors.
Several studies have noted the correlation between different types of biological interactions. For example, interacting proteins are more likely to have similar expression pattern (Ge et al. 2001; Jansen et al. 2002a); genes with correlated expression are more likely to be controlled by a common transcription factor (Yu et al. 2003); and SSL genetic interactions are more likely to occur between homologous genes (Tong et al. 2004). Although each of these correlations are statistically significant, no single pair of interaction types shows perfect overlap (and indeed the level of correlation is low in each example). Thus, there is dependence among many of the observed types of biomolecular interactions, and yet each type offers a distinct view of the same complex biological system.
“Multicolor” Network Motifs and Themes
The network motif concept, described previously for single-color networks, is extensible to a multicolor network. Multicolor network motifs characterize relationships between different biological interaction types within local network neighborhoods. Yeger-Lotem et al. examined network motifs in combined cellular networks of two interaction types – transcriptional regulation and protein interaction (Yeger-Lotem et al. 2004). They identified three multicolor motifs of three genes – two interacting transcription factors that co-regulate a third gene, two interacting proteins co-regulated by the same transcription factor, and a “mixed feedback loop,” where a given transcription factor and the product of its target gene form a protein interaction with each other. More multicolor motifs were characterized in an integrated network of five interaction types – transcription regulation, physical interaction, SSL genetic interaction, sequence homology, and correlated mRNA expression (Fig. 5; Zhang et al. 2005). These motifs can be explained in terms of high-order network structures or network themes (Fig. 5), which are representative of the underlying biological phenomena. One interesting example is motifs containing two vertices linked by protein interaction or correlated expression, such that both vertices connect to a third vertex either by SSL interaction or by sequence homology. This suggests the theme that a protein and a distinct protein complex/biological process have compensatory functions, resulting in SSL relationships between the protein and any component essential to the compensatory complex/process. Many four-vertex motifs that correspond to the compensatory complex theme (Fig. 5) further support this notion. These four-vertex motifs also suggested that some instances of the “compensatory protein complex/process” theme could simply be signatures of compensatory pairs of complexes or processes, rather than a single protein acting alone to compensate for loss of another process. An excellent example of the compensatory complex theme is the pairing of endoplasmic reticulum (ER) protein-translocation subcomplex (Hanein et al. 1996) and the Gim complex (Geissler et al. 1998), with each complex connected to the other by many SSL interactions (Tong et al. 2004; Zhang et al. 2005).
Thematic Maps
In the previous section, we described the idea that network motifs may in general be footprints of larger network structures. Organization of some multicolor network motifs into larger structures of network themes further supported this notion and also confirmed that the concept of modularity discovered in single-color networks is extensible to networks of multiple interaction types. As such modules in a multicolor network can exist in various sizes and exhibit a combinatorial number of different connection patterns, systematically identifying them by prevalence of network motifs of a fixed size can be impracticable. Network theme representations, in which a simplified network is drawn in terms of specific network themes (Zhang et al. 2005), is a convenient way of examining the system. In the example of the compensatory complex theme, we can collapse members of the same complex into one single vertex and draw links between complexes with significantly enriched inter-complex SSL interactions (Zhang et al. 2005). The resulting thematic map of compensatory complexes (Fig. 6) serves as a guide to the “redundant systems” within the integrated S. cerevisiae network (Zhang et al. 2005). Similarly, one can create maps of “regulonic complexes” – complexes for which the corresponding genes are co-regulated – by linking each transcription factor to protein complexes that are significantly enriched for genes targeted by that transcription factor, providing a schematic view of the transcriptional circuitry in the cell (Simonis et al. 2004, 2006; Zhang et al. 2005).
Applications
Multicolor networks integrate several biomolecular interaction types and contain information that is richer than the component single-color networks taken individually. Hence, they can be used to make more accurate biological inference. Correlation between biological interaction types alone can be useful, especially when a large number of interaction types are combined together. A Bayesian approach achieved considerable success in predicting physical interactions between proteins (Jansen et al. 2003). To capture the additional information from interdependency of interaction types, one can examine the conditional probability of a physical interaction given all possible “multicolor relationships” (each corresponding to a particular configuration of single-color relationships) (Jansen et al. 2003; Milo et al. 2002). However, such an approach scales poorly as the number of interaction types grows (the number of multicolor relationships grows as 2N, where N is the number of interaction types, so that there will be increasingly many multicolor relationships with no training examples). More sophisticated methods, such as logistic regression (Bader et al. 2004; Giot 2003), decision trees, or random forests, can model the interdependency between interaction types in a more scalable manner, allowing the combination of a large number of biomolecular relationships.
Studies using such methods can significantly improve the quality of predictions (Jansen et al. 2002b, 2003; Qi et al. 2005; Wong et al. 2004; Zhang et al. 2004).
Multicolor network motifs put correlations of interaction types in the context of local topology and provide potential to further improve biological inference. Specifically, one can predict a certain type of link between a given pair of vertices if its addition completes a structure matching an enriched network motif. For example, two genes with a common SSL interaction partner may have an increased probability of protein interaction, as addition of a physical interaction edge results in a match of a motif depicted in Fig. 5 (G1). Similarly, a SSL edge between two genes can complete a match to the same motif, if the two genes are indirectly connected by physical interaction followed by SSL. This “two-hop” relationship of physical interaction and SSL proves to be a strong predictor of SSL interactions (Wong et al. 2004).
More broadly, we can also predict an interaction if its addition fits into a recurring network theme. For instance, there are many more SSL interactions between the ER-translocation complex and the Gim complex than would be expected by chance given the out-degrees of both complexes in the SSL network. However, no SSL interaction has been observed between Sec62 or Sec63, two members of the ER protein-translocation subcomplex, and any protein in the Gim complex because Sec62 and Sec63 were not used as queries in the SGA analysis (Tong et al. 2004). We can therefore hypothesize that Sec62 or Sec63 is SSL with many members of the Gim complex (Zhang et al. 2005).
Network themes represent modular organization of the network at the functional level and can be used to predict the function or biological process in which a gene is involved. The rationale is that if a gene is part of a module where a particular function is enriched, then there is increased probability that this gene also holds the same function. For example, in the feed-forward theme depicted in Fig. 5, most of the genes regulated by both Mcm1 and Swi4 are involved in the control or execution of the cell cycle. Therefore, we can hypothesize that Yor315w, a protein about which little is known, is also involved in cell cycle regulation. Indeed, this protein is a target of cyclin-dependent kinase Cdk1, a primary regulator of the cell cycle. Predictions based on network themes may be robust to errors in the input data, since they depend on connectivity patterns in extended network neighborhoods instead of one or very few links.
It is worth emphasizing that predictions based on network motifs and themes are more suggestive than definitive. Above, we focused on organizational principles that can be utilized for biological inference. Such inferences should be combined with prior information from as many sources as possible and should be tested experimentally. In the next sections, we will touch upon rich network models that can be used for graphical data integration. We also refer readers to entry “Biological Data Integration and Model Building” in this encyclopedia for more general approaches to data integration.
Rich Network Models
Networks of Labeled Vertices and Edges
Just as the relationships between biomolecules come in many colors, the biomolecules themselves (the vertices) also exhibit a range of properties. These include intracellular localization, mRNA expression level, cellular function, and protein complex membership, to name just a few. A network labeled vertices and edges captures multiple properties of biomolecules as well as relationships among them. Extending the concept of multicolor networks, one can build a rich network model using a network with vector-labeled vertices and edges. Each vertex or edge can be labeled with a vector, such that each variable in the vector represents some vertex property or edge type. The rich information captured by such a model can be used to infer types of interactions or characteristics of vertices (Getoor et al. 2004).
Hypergraphs
Previously, we mentioned that sharing of the same or similar vertex property can be viewed as a type of biomolecular relationship and hence represented using an edge. A hypergraph can be used to more explicitly represent a shared property of multiple vertices. A hypergraph is a generalization of a graph where edges can connect any number of vertices, instead of only two as in conventional graphs, and a hypergraph edge (or “hyperedge”) can be used to connect all vertices sharing the same property. When further generalized to contain multiple types of edges, hypergraphs provide an alternative to networks of labeled vertices and edges described earlier. Hypergraphs have been used, for example, to model single-color networks of protein domains (Freudenberg et al. 2002). However, it remains to be seen whether this provides a more useful abstraction than a rich network representation in which vertices and edges are each labeled with vectors.
Weighted Graphs and Probabilistic Networks
Network models discussed thus far have mostly been discrete. For example, an edge in each single-color network either exists or does not exist. In the rich network models we discussed, variables used to describe a vertex or edge characteristic can be binary or categorical variables. However, some properties of a vertex may have continuous values, for example, a gene product may be described by its corresponding mRNA expression level. An edge property may also be continuous, e.g., affinity of a physical interaction or coefficient of correlation between mRNA abundance profiles. Such quantitative information can also be incorporated in a rich network model, simply by including continuously valued variables within the vertex or edge vectors.
One special case of a weighted graph is a probabilistic network, in which the weight on each edge represents a posterior probability that the edge is real, given the evidence supporting that edge. Probabilistic networks are particularly useful when knowledge of relationships is uncertain. This is often the case for biological data, sometimes due to inaccuracy of measurement or varying levels of confidence in a computationally inferred relationship (Asthana et al. 2004; Zhang et al. 2004). Probabilistic networks have been used to predict novel members of partially known protein complexes, for example, by sampling from networks of physical interaction from multiple data sources based on their corresponding confidence levels (Asthana et al. 2004; Bader 2003). Algorithms have also been developed to identify protein complexes by finding dense subgraphs within a weighted graph (Li et al. 2007).
Future Directions
Research on the structure and function of biomolecular networks has covered a lot of ground in the past decade. So far, we have studied global topological features such as power-law degree distributions, small-worldness, disassortativity, and associated biological implications such as robustness and dense neighborhood clustering. We have also discussed the hierarchical organization of networks, with network “structures” ranging from network motifs to dense clusters and to pairs of clusters, densely connected within each cluster by one “color” of interaction and connected to one another by interactions of a different color. We have also explored the roles of these structures in cellular functions, for example, the involvement of certain network motifs in information processing and the correspondence of dense clusters to protein complexes and biological processes. We also recognized the interplay of various interaction types, for example, the correspondence of pairs of clusters connected by SSL genetic interaction to protein complexes with mutually compensatory functions. Deeper understandings in the above areas certainly warrant future study. However, many currently under-explored territories remain.
The cell is a constantly changing environment, so are the molecules and processes within that environment. Network studies have been mostly limited to the union of interactions in all locations, times, and environments. Cellular state, such as position in the cell cycle, is a key determinant of concentrations and status of biomolecules and hence the interactions between them. Studies have begun to reveal the dynamic nature of biomolecular networks. For example, by overlaying mRNA expression patterns with the physical interaction network, Han et al. described two types of highly connected hubs – “party” hubs, which interact with most partners simultaneously, and “date” hubs, which connect to different partners at different times (Han et al. 2004). Expression data have also been combined with protein interaction data to reveal cell cycle-dependent protein complexes or arrangements of protein complexes (de Lichtenberg et al. 2005). Genome-scale molecular data across a wide variety of cellular states are so far only available for mRNA expression levels (Cho et al. 1998; Spellman et al. 1998). Availability of more biomolecular interaction data under various cellular conditions will allow us to study the additional complexity of cellular processes. Such research will require adding another dimension when modeling biomolecular networks and will likely also require new mathematical tools.
In addition to overall cellular state, intracellular environment also bears significance on cellular interactions. For example, proteins may be restricted to particular compartments, either defined by a membrane as are the nuclei or mitochondria, or in less-clearly delimited regions such as nucleoli. Such features can be viewed as vertex properties in the network and can therefore be modeled either using a hypergraph or by labeling vertices. Cellular localization data are already available at a genomic scale (Huh et al. 2003; Kumar et al. 2002) and should yield additional insights.
One major limitation of network analysis is data collection. Even with the rapid development of high-throughput genomic and proteomic experimentation, we are still far from being able to systematically characterize all biomolecular features and interactions. Quantifying them at high resolution in both space (across subcellular locations and across cell and tissue types) and time (e.g., through the course of organismal development) appears a more distant goal. More comprehensive network data will undoubtedly advance the study of the structure and function of biomolecular networks, and the knowledge can in turn aid experimental discovery. One intuitive example is to use inferences made from network study to apply limited resources to experiments that are more likely to yield discoveries. We are optimistic that experimental discovery and network analysis can form a positive feedback loop, with each reinforcing the other and together advancing our understanding of biomolecular networks.
Abbreviations
- Biomolecular network :
-
A graph representation of relationships among a group of biomolecules. Nodes or vertices represent biomolecules. An edge or link between two vertices indicates a relationship between the corresponding biomolecules, for example, physical interaction, genetic interaction, or regulatory relationship.
- Biomolecule :
-
Any organic molecule that is produced by or essential to a living organism, sometimes specifically referring to macromolecules such as a protein or nucleic acid.
- Genetic interaction (epistasis) :
-
Functional interaction between genes, in which the action of one gene is modified by the other gene, sometimes called the modifier gene. The gene whose phenotype is expressed is said to be epistatic, while the one whose phenotype is altered or suppressed is said to be hypostatic. Epistasis can either refer to this phenomenon or more broadly to any case in which two mutations together cause a phenotype that is surprising given the phenotypes of each single mutation alone.
- “Multicolor” network :
-
A network with edges defined by more than one type of interaction or relationship, with each type corresponding to a different “color.”
- Network motif :
-
A specific pattern of connected vertices and edges that occurs frequently within a given network.
- Power-law network :
-
A network defined by a degree distribution which follows \( P(k)\sim {k}^{-\gamma } \), where the probability P(k) that a vertex in the network connects with k other vertices is roughly proportional to k −γ. Sometimes networks that exhibit this behavior only at high degree are also called power law. The coefficient γ seems to vary approximately between 2 and 3 for most real networks. In a power-law network, majority of the vertices have low degree (connectivity), while a small fraction of the vertices have very high degree. Highly connected vertices are referred to as hubs.
- Protein-protein interaction :
-
The physical association of two protein molecules with each other. A pair of proteins can interact directly with physical contact or indirectly through other biomolecules, often other proteins.
- Scale-free network :
-
See power-law network.
- “Single-color” network :
-
A network with edges defined by only one type of interaction or relationship.
- Small-world network :
-
A special type of network with (1) short characteristic path length, such that most vertex pairs are connected to one another via only a small number of edges, and (2) high clustering coefficient, such that neighbors of a given vertex tend to be connected to one another.
- Yeast two-hybrid :
-
An experimental method to examine protein-protein interaction, in which one protein is fused to a transcriptional activation domain (the GAL4 activation domain) and the other to a DNA-binding domain (the GAL4 DNA-binding domain), and both fusion proteins are introduced into yeast. Expression of a GAL4-regulated reporter gene with the appropriate DNA-binding sites upstream of its promoter indicates that the two proteins physically interact.
Bibliography
Primary Literature
Agrawal H (2002) Extreme self-organization in networks constructed from gene expression data. Phys Rev Lett 89:268702
Albert I, Albert R (2004) Conserved network motifs allow protein-protein interaction prediction. Bioinformatics 20(18):3346–3352
Albert R, Barabasi AL (2002) Statistical mechanics of complex networks. Rev Mod Phys 74:47
Albert R, Jeong H et al (2000) Error and attack tolerance of complex networks. Nature 406:378–382
Alon U (2003) Biological networks: the tinkerer as an engineer. Science 301:1866–1867
Amaral LA, Scala A et al (2000) Classes of small-world networks. Proc Natl Acad Sci U S A 97(21):11149–11152
Asthana S, King OD et al (2004) Predicting protein complex membership using probabilistic network reliability. Genome Res 14(6):1170–1175
Avery L, Wasserman S (1992) Ordering gene function: the interpretation of epistasis in regulatory hierarchies. Trends Genet 8(9):312–316
Bader JS (2003) Greedily building protein networks with confidence. Bioinformatics 19(15):1869–1874
Bader GD, Hogue CW (2002) Analyzing yeast protein-protein interaction data obtained from different sources. Nat Biotechnol 20(10):991–997
Bader GD, Hogue CW (2003) An automated method for finding molecular complexes in large protein interaction networks. BMC Bioinformatics 4(1):2
Bader JS, Chaudhuri A et al (2004) Gaining confidence in high-throughput protein interaction networks. Nat Biotechnol 22(1):78–85
Balazsi G, Barabasi AL et al (2005) Topological units of environmental signal processing in the transcriptional regulatory network of Escherichia coli. Proc Natl Acad Sci U S A 102(22):7841–7846
Barabasi AL, Albert R (1999) Emergence of scaling in random networks. Science 286(5439):509–512
Bar-Joseph Z (2003) Computational discovery of gene modules and regulatory networks. Nat Biotechnol 21:1337–1342
Bornholdt S, Ebel H (2001) World Wide Web scaling exponent from Simon’s 1955 model. Phys Rev E 64(3):35104
Bornholdt S, Schuster HG (2003) Handbook of graphs and networks: from the genome to the internet
Bray D (2003) Molecular networks: the top-down view. Science 301:1864–1865
Broder A (2000) Graph structure in the web. Comput Netw 33:309–320
Callaway DS, Newman MEJ et al (2000) Network robustness and fragility: percolation on random graphs. Phys Rev Lett 85:5468–5471
Cho RJ, Campbell MJ et al (1998) A genome-wide transcriptional analysis of the mitotic cell cycle. Mol Cell 2(1):65–73
Cohen R, Erez K et al (2000) Resilience of the Internet to random breakdowns. Phys Rev Lett 85:4626–4628
de Lichtenberg U, Jensen LJ et al (2005) Dynamic complex formation during the yeast cell cycle. Science 307(5710):724–727
Dobrin R, Beg QK et al (2004) Aggregation of topological motifs in the Escherichia coli transcriptional regulatory network. BMC Bioinformatics 5(1):10
Dorogovtsev SN, Mendes JF (2003) Evolution of networks: from biological nets to the internet and WWW. Oxford University Press, Oxford
Drees BL, Thorsson V et al (2005) Derivation of genetic interaction networks from quantitative phenotype data. Genome Biol 6(4):R38
Fanning AS, Anderson JM (1996) Protein-protein interactions: PDZ domain networks. Curr Biol 6(11):1385–1388
Farh KK, Grimson A et al (2005) The widespread impact of mammalian MicroRNAs on mRNA repression and evolution. Science 310(5755):1817–1821
Farkas IJ, Wu C et al (2006) Topological basis of signal integration in the transcriptional-regulatory network of the yeast, Saccharomyces cerevisiae. BMC Bioinformatics 7:478
Featherstone DE, Broadie K (2002) Wrestling with pleiotropy: genomic and topological analysis of the yeast gene expression network. Bioessays 24:267–274
Fell DA, Wagner A (2000) The small world of metabolism. Nat Biotechnol 18(11):1121–1122
Freudenberg J, Zimmer R et al (2002) A hypergraph-based method for unification of existing protein structure- and sequence-families. In Silico Biol 2(3):339–349
Gavin AC, Bosche M et al (2002) Functional organization of the yeast proteome by systematic analysis of protein complexes. Nature 415(6868):141–147
Ge H, Liu Z et al (2001) Correlation between transcriptome and interactome mapping data from Saccharomyces cerevisiae. Nat Genet 29(4):482–486
Geissler S, Siegers K et al (1998) A novel protein complex promoting formation of functional alpha-and gamma-tubulin. EMBO J 17(4):952–966
Getoor L, Rhee JT et al (2004) Understanding tuberculosis epidemiology using structured statistical models. Artif Intell Med 30(3):233–256
Giaever G (2002) Functional profiling of the Saccharomyces cerevisiae genome. Nature 418:387–391
Gietz RD, Triggs-Raine B et al (1997) Identification of proteins that interact with a protein of interest: applications of the yeast two-hybrid system. Mol Cell Biochem 172(1–2):67–79
Giot L (2003) A protein interaction map of Drosophila melanogaster. Science 302:1727–1736
Girvan M, Newman ME (2002) Community structure in social and biological networks. Proc Natl Acad Sci U S A 99:7821–7826
Goldberg DS, Roth FP (2003) Assessing experimentally derived interactions in a small world. Proc Natl Acad Sci U S A 3:3
Guelzim N, Bottani S et al (2002) Topological and causal structure of the yeast transcriptional regulatory network. Nat Genet 31(1):60–63
Gunsalus KC, Ge H et al (2005) Predictive models of molecular machines involved in Caenorhabditis elegans early embryogenesis. Nature 436(7052):861–865
Han JD, Bertin N et al (2004) Evidence for dynamically organized modularity in the yeast protein-protein interaction network. Nature 430(6995):88–93
Han JD, Dupuy D et al (2005) Effect of sampling on topology predictions of protein-protein interaction networks. Nat Biotechnol 23(7):839–844
Hanein D, Matlack KE et al (1996) Oligomeric rings of the Sec61p complex induced by ligands required for protein translocation. Cell 87(4):721–732
Harbison CT, Gordon DB et al (2004) Transcriptional regulatory code of a eukaryotic genome. Nature 431(7004):99–104
Hartwell LH, Hopfield JJ et al (1999) From molecular to modular cell biology. Nature 402:C47–C52
Hasty J, McMillen D et al (2002) Engineered gene circuits. Nature 420:224–230
Ho Y, Gruhler A et al (2002) Systematic identification of protein complexes in Saccharomyces cerevisiae by mass spectrometry. Nature 415(6868):180–183
Holme P, Huss M et al (2003) Subnetwork hierarchies of biochemical pathways. Bioinformatics 19:532–538
Huh WK, Falvo JV et al (2003) Global analysis of protein localization in budding yeast. Nature 425(6959):686–691
Ihmels J (2002) Revealing modular organization in the yeast transcriptional network. Nat Genet 31:370–377
Ito T (2001) A comprehensive two-hybrid analysis to explore the yeast protein interactome. Proc Natl Acad Sci U S A 98:4569–4574
Ito T, Tashiro K et al (2000) Toward a protein-protein interaction map of the budding yeast: a comprehensive system to examine two-hybrid interactions in all possible combinations between the yeast proteins. Proc Natl Acad Sci U S A 97(3):1143–1147
Jacob F, Monod J (1961) Genetic regulatory mechanisms in the synthesis of proteins. J Mol Biol 3:318–356
Jansen R, Greenbaum D et al (2002a) Relating whole-genome expression data with protein-protein interactions. Genome Res 12(1):37–46
Jansen R, Lan N et al (2002b) Integration of genomic datasets to predict protein complexes in yeast. J Struct Funct Genomics 2:71–81
Jansen R et al (2003) A Bayesian networks approach for predicting protein-protein interactions from genomic data. Science 302(5644):449–453
Jeong H, Tombor B et al (2000) The large-scale organization of metabolic networks. Nature 407:651–654
Jeong H, Mason SP et al (2001) Lethality and centrality in protein networks. Nature 411(6833):41–42
Juvan P, Demsar J et al (2005) GenePath: from mutations to genetic networks and back. Nucleic Acids Res 33(Web Server issue):W749–W752
King OD (2004) Comment on subgraphs in random networks. Phys Rev E Stat Nonlin Soft Matter Phys 70(5 Pt 2):058101, author reply 058102
Kitano H (2002) Computational systems biology. Nature 420:206–210
Koonin EV, Wolf YI et al (2002) The structure of the protein universe and genome evolution. Nature 420:218–223
Krogan NJ, Cagney G et al (2006) Global landscape of protein complexes in the yeast Saccharomyces cerevisiae. Nature 440(7084):637–643
Kumar A, Agarwal S et al (2002) Subcellular localization of the yeast proteome. Genes Dev 16(6):707–719
Launer RL, Wilkinson GN (1979) Robustness in statistics. Academic, New York
Lee TI, Rinaldi NJ et al (2002) Transcriptional regulatory networks in Saccharomyces cerevisiae. Science 298(5594):799–804
Lee I, Date SV et al (2004) A probabilistic functional network of yeast genes. Science 306(5701):1555–1558
Li S (2004) A map of the interactome network of the metazoan, C. elegans. Science 303:590–593
Li W, Liu Y et al (2007) Dynamical systems for discovering protein complexes and functional modules from biological networks. IEEE/ACM Trans Comput Biol Bioinform 4(2):233–250
Lockhart DJ, Dong H et al (1996) Expression monitoring by hybridization to high-density oligonucleotide arrays. Nat Biotechnol 14(13):1675–1680
Ma HW, Buer J et al (2004) Hierarchical structure and modules in the Escherichia coli transcriptional regulatory network revealed by a new top-down approach. BMC Bioinformatics 5:199
Ma’ayan A, Jenkins SL et al (2005) Formation of regulatory patterns during signal propagation in a mammalian cellular network. Science 309(5737):1078–1083
MacIsaac KD, Wang T et al (2006) An improved map of conserved regulatory sites for Saccharomyces cerevisiae. BMC Bioinformatics 7:113
Mangan S, Itzkovitz S et al (2006) The incoherent feed-forward loop accelerates the response-time of the gal system of Escherichia coli. J Mol Biol 356(5):1073–1081
Maslov S, Sneppen K (2002) Specificity and stability in topology of protein networks. Science 296:910–913
Milo R, Shen-Orr S et al (2002) Network motifs: simple building blocks of complex networks. Science 298(5594):824–827
Milo R, Itzkovitz S et al (2004) Superfamilies of evolved and designed networks. Science 303(5663):1538–1542
Monod J, Jacob F (1961) Teleonomic mechanisms in cellular metabolism, growth, and differentiation. Cold Spring Harb Symp Quant Biol 26:389–401
Monod J, Cohen-Bazire G et al (1951) The biosynthesis of beta-galactosidase (lactase) in Escherichia coli; the specificity of induction. Biochim Biophys Acta 7(4):585–599
Nadvornik P, Drozen V (1964) Models of neurons and neuron networks. Act Nerv Super (Praha) 6:293–302
Newman MEJ (2002) Assortative mixing in networks. Phys Rev Lett 89:208701
Newman ME, Strogatz SH et al (2001) Random graphs with arbitrary degree distributions and their applications. Phys Rev E Stat Nonlin Soft Matter Phys 64(2 Pt 2):026118
Novick A, Weiner M (1957) Enzyme induction as an all-or-none phenomenon. Proc Natl Acad Sci U S A 43(7):553–566
Oltvai ZN, Barabasi AL (2002) Life’s complexity pyramid. Science 298:763–764
Pastor-Satorras R, Vazquez A et al (2001) Dynamical and correlation properties of the internet. Phys Rev Lett 87:258701
Ptacek J, Devgan G et al (2005) Global analysis of protein phosphorylation in yeast. Nature 438(7068):679–684
Qi Y, Klein-Seetharaman J et al (2005) Random forest similarity for protein-protein interaction prediction from multiple sources. Pac Symp Biocomput 531–542
Rajewsky N (2006) microRNA target predictions in animals. Nat Genet 38(Suppl):S8–S13
Ravasz E, Barabasi AL (2003) Hierarchical organization in complex networks. Phys Rev E Stat Nonlin Soft Matter Phys 67:026112
Ravasz E, Somera AL et al (2002) Hierarchical organization of modularity in metabolic networks. Science 297(5586):1551–1555
Rives AW, Galitski T (2003) Modular organization of cellular networks. Proc Natl Acad Sci U S A 100(3):1128–1133
Rouvray H (1990) The origins of chemical graph theory. In: Bonchev D, Rouvray DH (eds) Chemical graph theory: introduction and fundamentals, vol 41. Gordon and Breach Science, New York
Rual JF, Venkatesan K et al (2005) Towards a proteome-scale map of the human protein-protein interaction network. Nature 437(7062):1173–1178
Schena M, Shalon D et al (1995) Quantitative monitoring of gene expression patterns with a complementary DNA microarray. Science 270(5235):467–470
Schleif R (2000) Regulation of the L-arabinose operon of Escherichia coli. Trends Genet 16(12):559–565
Schuster S, Pfeiffer T et al (2002) Exploring the pathway structure of metabolism: decomposition into subnetworks and application to Mycoplasma pneumoniae. Bioinformatics 18:351–361
Shen-Orr SS, Milo R et al (2002) Network motifs in the transcriptional regulation network of Escherichia coli. Nat Genet 31(1):64–68
Simon HA (1955) On a class of skew distribution functions. Biometrika 42:425–440
Simonis N, van Helden J et al (2004) Transcriptional regulation of protein complexes in yeast. Genome Biol 5(5):R33
Simonis N, Gonze D et al (2006) Modularity of the transcriptional response of protein complexes in yeast. J Mol Biol 363(2):589–610
Smith LM, Fung S et al (1985) The synthesis of oligonucleotides containing an aliphatic amino group at the 5′ terminus: synthesis of fluorescent DNA primers for use in DNA sequence analysis. Nucleic Acids Res 13(7):2399–2412
Smith LM, Sanders JZ et al (1986) Fluorescence detection in automated DNA sequence analysis. Nature 321(6071):674–679
Snel B, Bork P et al (2002) The identification of functional modules from the genomic association of genes. Proc Natl Acad Sci U S A 99:5890–5895
Sole RV, Pastor-Satorras R et al (2002) A model of large-scale proteome evolution. Adv Complex Syst 5:43–54
Sood P, Krek A et al (2006) Cell-type-specific signatures of microRNAs on target mRNA expression. Proc Natl Acad Sci U S A 103(8):2746–2751
Spellman PT, Sherlock G et al (1998) Comprehensive identification of cell cycle-regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization. Mol Cell Biol 9(12):3273–3297
Spirin V, Mirny LA (2003) Protein complexes and functional modules in molecular networks. Proc Natl Acad Sci U S A 100(21):12123–12128
St Onge RP, Mani R et al (2007) Systematic pathway analysis using high-resolution fitness profiling of combinatorial gene deletions. Nat Genet 39(2):199–206
Stark A, Brennecke J et al (2005) Animal MicroRNAs confer robustness to gene expression and have a significant impact on 3′UTR evolution. Cell 123(6):1133–1146
Stelzl U, Worm U et al (2005) A human protein-protein interaction network: a resource for annotating the proteome. Cell 122(6):957–968
Strogatz SH (2001) Exploring complex networks. Nature 410(6825):268–276
Stuart JM, Segal E et al (2003) A gene-coexpression network for global discovery of conserved genetic modules. Science 302:249–255
Tanaka R (2005) Scale-rich metabolic networks. Phys Rev Lett 94(16):168101
Taylor RJ, Siegel AF et al (2007) Network motif analysis of a multi-mode genetic-interaction network. Genome Biol 8(8):R160
Thieffry D, Huerta AM et al (1998) From specific gene regulation to genomic networks: a global analysis of transcriptional regulation in Escherichia coli. Bioessays 20(5):433–440
Tong AH, Lesage G et al (2004) Global mapping of the yeast genetic interaction network. Science 303(5659):808–813
Tornow S, Mewes HW (2003) Functional modules by relating protein interaction networks and gene expression. Nucleic Acids Res 31:6283–6289
Tsang J, Zhu J et al (2007) MicroRNA-mediated feedback and feedforward loops are recurrent network motifs in mammals. Mol Cell 26(5):753–767
Uetz P, Giot L et al (2000) A comprehensive analysis of protein-protein interactions in Saccharomyces cerevisiae. Nature 403(6770):623–627
Wagner A (2001) The yeast protein interaction network evolves rapidly and contains few redundant duplicate genes. Mol Biol Evol 18(7):1283–1292
Wagner A, Fell DA (2001) The small world inside large metabolic networks. Proc Biol Sci 268(1478):1803–1810
Wall ME, Hlavacek WS et al (2004) Design of gene circuits: lessons from bacteria. Nat Rev Genet 5:34–42
Watts DJ, Strogatz SH (1998) Collective dynamics of ‘small-world’ networks. Nature 393(6684):440–442
Wen X, Fuhrman S et al (1998) Large-scale temporal gene expression mapping of central nervous system development. Proc Natl Acad Sci U S A 95(1):334–339
Winzeler EA (1999) Functional characterization of the S. cerevisiae genome by gene deletion and parallel analysis. Science 285:901–906
Wong SL, Zhang LV et al (2004) Combining biological networks to predict genetic interactions. Proc Natl Acad Sci U S A 101(44):15682–15687
Wunderlich Z, Mirny LA (2006) Using the topology of metabolic networks to predict viability of mutant strains. Biophys J 91(6):2304–2311
Yeger-Lotem E, Sattath S et al (2004) Network motifs in integrated cellular networks of transcription-regulation and protein-protein interaction. Proc Natl Acad Sci U S A 101(16):5934–5939
Yook SH, Oltvai ZN et al (2004) Functional and topological characterization of protein interaction networks. Proteomics 4(4):928–942
Yu H, Gerstein M (2006) Genomic analysis of the hierarchical structure of regulatory networks. Proc Natl Acad Sci U S A 103(40):14724–14731
Yu H, Luscombe NM et al (2003) Genomic analysis of gene expression relationships in transcriptional regulatory networks. Trends Genet 19(8):422–427
Zhang LV, Wong SL et al (2004) Predicting co-complexed protein pairs using genomic and proteomic data integration. BMC Bioinformatics 5(1):38
Zhang L, King O et al (2005) Motifs, themes and thematic maps of an integrated Saccharomyces cerevisiae interaction network. J Biol 4(2):6
Books and Reviews
Barabasi AL, Oltvai ZN (2004) Network biology: understanding the cell’s functional organization. Nat Rev Genet 5(2):101–113
Diestel R (2005) Graph theory, 3rd edn. Springer, Heidelberg
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2015 Springer Science+Business Media New York
About this entry
Cite this entry
Zhang, L.V., Roth, F.P. (2015). Biomolecular Network Structure and Function. In: Meyers, R. (eds) Encyclopedia of Complexity and Systems Science. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-27737-5_38-3
Download citation
DOI: https://doi.org/10.1007/978-3-642-27737-5_38-3
Received:
Accepted:
Published:
Publisher Name: Springer, Berlin, Heidelberg
Online ISBN: 978-3-642-27737-5
eBook Packages: Springer Reference Physics and AstronomyReference Module Physical and Materials ScienceReference Module Chemistry, Materials and Physics