Supplemental Materials

Below are instructions for performing some common graph analysis functions in NAViGaTOR as well as detailed steps for visualizing the results from these analyses. NAViGaTOR v2.2.1 was used for the following steps.

Contents

Sample Files

Tips

1. How to add ancillary data to the existing network

2. How to find articulation points and highlight them in the network

3. How to calculate and visualize Betweenness Centrality measure for a network

4. How to find the Cluster Coefficient and visualize the results

5. Edge Density and Diameter

6. Single source shortest path

7. Find shortest paths between groups of nodes.

8. All Pairs Shortest Path (APSP) traversal counter

9. References

Sample Files

Files used (note that the .xml files below are in NAViGaTOR 2.2 XML format). You can download them all together from here or click on the file names below to view and save them individually.

  1. SampleNetwork00.xml - a network generated by fetching immediate neighbours of 11 genes of interest (CCNA2, PTEN, ATAD2, DSCC1, SMAD2, POLQ, RAD51, H3F3A, TOP2A, MYC, BIRC5) from I2D.
  2. uniprot2entrezgene_mapping.txt - a mapping of all node IDs in SampleNetwork00.xml (which are in UniProt format) to EntrezGene IDs, produced with the http://www.uniprot.org/ ID Mapping service on 21 March 2011.
  3. lesmisArticulationPoints_unorganized.xml - a network of character co-appearances in Victor Hugo’s novel “Les Miserables”.
  4. SampleNetwork01.xml - same network as SampleNetwork00.xml, but with node annotations added for gene names, gene ontology information, etc.
  5. hsa03013_20110316.xml - a network generated using the PathMod plugin in NAViGaTOR. The plugin can be accessed from the Plugins ➨ Pathways ➨ Load Pathway menu item. The KEGG service was selected with the ‘Browse by organism’ option, and the organism was set to ‘Homo sapiens (human)’. The selected pathway has ID hsa03013 and describes RNA transport in humans. The network was fetched from KEGG on 16 March 2011.
  6. SampleNetwork02.xml - a network generated by fetching immediate neighbours of 8 proteins of interest (O15105, Q00610, O75528, P12956, P12757, Q8N2W9, P04637, P42858) from a snapshot of I2D obtained on 20 October 2010.
  7. macaque_noFilters.xml - a network of functional connectivity between anatomical areas of macaque visual cortex.

Tips

  1. Once a network of interest is open, you may want to ‘Pause’ the network layout algorithm from the Network panel in the side-bar to make it easier to work with the graph objects.

  2. It is useful to have ‘Use Blending’ checked in the Default Properties side-bar panel to enable transparency.

1. How to add ancillary data to the existing network

  1. Open the network SampleNetwork00.xml using the NAViGaTOR 2.2 XML file format.
  2. The simplest way to load arbitrary data into the network is to have it in a tab-delimited file. The first column in the file must match the node IDs in the network. The second column can be any feature that you want to add to the network. There can be as many additional columns as you like. In our example, we want to annotate the nodes in the network with an EntrezGene ID. We will use the file uniprot2entrezgene_mapping.txt for this purpose. From the menus select File ➨ Import ➨ Node Features and open this file.

Note: you can also import edge features this way (File ➨ Import ➨ Edge Features). For edge import, the file must have 2 columns of node IDs, where each pair of IDs represents an edge in the network. A third column will contain the feature you want to annotate the edge with. You can have as many extra columns as you like to import multiple features in one go.

  1. Make sure ‘Use Default ID’ is selected and click Next.

Note: if the IDs used in your mapping file are related to some other feature existing in the network, then you can select that from the ‘Alternate ID’ drop-down box by unselecting ‘Use Default ID’.

  1. Set the header size to 1 because we want to ignore the first line in the file. Click Next.

  1. Using the drop-down box in the first column, select ‘Node ID’. Click Next.

  1. Use the first drop-down box in the second column and set it to ‘Node’. From the second drop-down select ‘Text’. Finally click ‘Copy Title Line’. Now click Done.

  1. The feature is now imported into the network and attached to the corresponding nodes. Select a single node and the imported value for that node will be visible in the side-bar.

  1. Plugins are able to import node/edge features too. For instance, you can use the I2D plugin on this network to fetch biological annotations for the genes in this network. From the menus select Plugins ➨ I2D ➨ ‘Get Node Features’.

  1. Make sure all the check boxes are marked. In our network the ID type is ‘SwissProt ID’ and the organism is ‘HUMAN’. Click Search and wait for the results to come back.

Note: this will take several minutes. To just try it out, you can make a small selection (about 10 nodes) and select ‘Search Selection Only’ to get results back faster.

  1. Once done, the nodes will have the ancillary data attached to them. To view this data, select Analysis ➨ View Node Statistics.
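
The tab-delimited layout described in step 2 (node ID in the first column, one feature per additional column, one header line to skip) can be prepared and checked with a short script before importing. The following is a minimal Python sketch, not part of NAViGaTOR; the column titles, the two example rows and the function names are illustrative assumptions.

```python
import csv
import io

# Illustrative rows only: (UniProt node ID, EntrezGene ID).
rows = [("P01106", "4609"), ("Q06609", "5888")]


def write_node_features(rows, header=("UniProt", "EntrezGene")):
    """Return tab-delimited text: one header line, then one line per node."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(header)      # the line skipped by 'header size = 1'
    writer.writerows(rows)       # column 1 = node ID, column 2+ = features
    return buf.getvalue()


def read_node_features(text, header_size=1):
    """Parse the text back into {node_id: [feature values]}."""
    records = list(csv.reader(io.StringIO(text), delimiter="\t"))
    features = {}
    for record in records[header_size:]:
        if record:               # ignore empty lines
            node_id, *values = record
            features[node_id] = values
    return features


text = write_node_features(rows)
features = read_node_features(text)
print(text)
print(features)
```

An edge-feature file would follow the same pattern with two leading node ID columns instead of one.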

2. How to find articulation points and highlight them in the network

  1. Open the network lesmisArticulationPoints_unorganized.xml (first making sure to select NAViGaTOR 2.2 XML format). This network corresponds to character co-appearances in Victor Hugo’s novel “Les Miserables” (Knuth, 1993; Les Miserables).
  2. Click on Analysis ➨ Select Articulation Points. All the articulation points in the network will get added to the selection.

  1. Right-click on the network in the network panel and select Node ➨ Appearance (Selection).

  1. We want to make the nodes that are articulation points look different from the other nodes in the network. We will make them diamond shaped, give them a darker color and show their labels.
  1. Click ‘Color’ and choose black
  2. Change the Shape to ‘Diamond’
  3. Change the Height and Width to 16
  4. Make sure ‘Show Label’ is selected and choose ‘Label’ in the drop-down underneath it.

  1. Click Done when finished.
  1. Use the Concentric Circles button to lay out these nodes and their immediate neighbours on concentric circles. Alternatively you can go to the Layout menu and click ‘Layout neighbourhood of projected node(s) on concentric circles’ ➨ ‘Depth 1’.
  2. At this point you may want to use the Zoom In/Out buttons to get a better view of the network.
  3. You will note that there are some nodes that are not fixed in place. These are the ones that are more than one connection away from the articulation points. We want to select these nodes and fix them in position. But first we need to save the current selection so we can come back to it later. In the ‘Subsets’ panel in the side-bar, click ‘New’ and name the subset ‘Articulation Points’.

  1. Now go to the Selection menu and select ‘Grow Selection’ ➨ ‘Depth 1’.
  2. Go again to the Selection menu and select ‘Invert Selection’. This will select all the nodes that are not the immediate neighbours of the articulation points.
  3. Right-click in the network panel and select Node ➨ Appearance (Selection) again. Make sure ‘Fixed’ is checked and then click Done.
  4. Now click on the Horizontal Line button, and then use the Move Tool to move the selection anywhere in the network. You can also use the Scale Tool to further space out the nodes. Additionally you can use the Bezier Tool to curve the edges between the nodes.

  1. Let’s go back to our articulation points. In the Subsets panel in the side-bar, select ‘Articulation Points’ and click the button that makes the subset the current selection.

  1. Choose Selection ➨ Select all Edges between selected nodes.
  2. Right-click in the network panel and select Edge ➨ Appearance (Selection)

  1. Set the colour to black and width to 4. Click Done.
  2. Right-click again in the network panel and select Edge ➨ Appearance (Inverse) this time.
  3. Drag the Blend slider to the 25 position and click Done.
  4. Finally, you can use the Lasso Select Tool to select the large circle and all the nodes inside it and then use the Rotate Tool to align the circles as you like.
  5. Choose Selection ➨ Deselect all nodes and edges.

Nodes represent characters, and two characters are linked if they appear in the same chapter. Interestingly, some of the 8 characters identified as articulation points (diamond shape), such as Valjean and Gavroche, correspond to major characters in the plot who die by the end.
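
An articulation point is a node whose removal disconnects its connected component. For readers who want to reproduce the selection outside NAViGaTOR, the following minimal Python sketch uses the standard depth-first search (low-link) method. It is not NAViGaTOR's own implementation; the toy edge list and the function names are assumptions made for illustration, and the graph is assumed to be simple and undirected.

```python
def build_adjacency(edges):
    """Undirected adjacency sets from a list of (u, v) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def articulation_points(adj):
    """Return the set of nodes whose removal disconnects their component."""
    disc, low, points = {}, {}, set()
    counter = 0

    def dfs(node, parent):
        nonlocal counter
        disc[node] = low[node] = counter   # discovery time and low-link
        counter += 1
        children = 0
        for nb in adj[node]:
            if nb == parent:
                continue
            if nb in disc:                 # back edge to an ancestor
                low[node] = min(low[node], disc[nb])
            else:                          # tree edge
                children += 1
                dfs(nb, node)
                low[node] = min(low[node], low[nb])
                # nb's subtree cannot reach above node without node itself
                if parent is not None and low[nb] >= disc[node]:
                    points.add(node)
        # a DFS root is an articulation point only with 2+ DFS children
        if parent is None and children > 1:
            points.add(node)

    for node in adj:
        if node not in disc:
            dfs(node, None)
    return points


# Toy graph: a triangle A-B-C with a tail C-D-E.
toy = build_adjacency([("A", "B"), ("B", "C"), ("C", "A"),
                       ("C", "D"), ("D", "E")])
print(sorted(articulation_points(toy)))
```

The recursive form is adequate for small networks; very large graphs would need an iterative DFS to stay within Python's recursion limit.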

3. How to calculate and visualize Betweenness Centrality measure for a network

  1. Open the network SampleNetwork01.xml (first making sure to select NAViGaTOR 2.2 XML format). Select all nodes in the network (Selection ➨ Select all nodes).
  2. Then go to the Analysis menu and select ‘View all shortest paths (Nodes)’.

  1. Wait for the operation to complete. When it is complete, a spreadsheet with shortest path information will pop up.

The shortest path between protein A1L0T0 (ILVBL) and O00567 (NOP56) has length 2; the only intermediary protein is P01106 (MYC). The Myc oncoprotein is a regulator of gene transcription that drives the neoplastic process in approximately 50% of all human cancers (Dang, Resar et al. 1999; Prochownik 2004; Basso, Margolin et al. 2005; Dang, O'Donnell et al. 2006).

  1. From the spreadsheet’s Analysis menu, select ‘Path Traversal - Betweeness’. Wait for the operation to complete.

  1. Once completed, go back to the Analysis menu in the main NAViGaTOR window and select ‘View node statistics’. All the nodes will have a betweenness centrality measure attached to them under the ‘curBC’ heading. You can click on the heading to sort the rows according to that measure. You can also select rows in the spreadsheet and choose Selection ➨ ‘Replace Network selection with Spreadsheet selection’ to have the corresponding nodes selected in the network. You can then change the visual attributes of these nodes by right-clicking on them and selecting Node ➨ Appearance (Selection).

  1. Alternatively, instead of individually setting visual attributes, it is more useful to visualize all the values of the betweenness centrality measure automatically. To do so go back to the main NAViGaTOR window. Select all nodes (Selection ➨ Select all nodes). In the side-bar choose the ‘Network’ tab and click on the ‘Add new node filter’ button under ‘Appearance Filters’.

  1. In the ‘Select a feature to visualize’ drop-down, select ‘curBC’ and make sure that ‘Colour’, ‘Transparency’ and ‘Size’ are selected. This will map the values of the betweenness centrality to the colour, transparency and size of nodes in the network. Each visualization attribute’s mapping can be tweaked by adjusting its settings in the panel displayed on the right.

The default settings are fine for this example. You can always modify them later by selecting the filter in the ‘Appearance Filters’ panel and clicking the ‘Edit filter settings’ button.

  1. When satisfied with the settings, click ‘Current Selection’.
  2. You will immediately see the results of the visual mapping in the network panel

  1. Additionally, in the ‘Appearance Filters’ panel, you can click the ‘Show legend’ button to see the range of values being mapped

The node sizes and colors indicate which nodes play important roles in keeping the network connected. Since the nodes in this graph are proteins, one might infer that these proteins are more likely to be essential to cellular function.
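
Betweenness centrality counts, for each node, the fraction of shortest paths between other node pairs that pass through it. As a cross-check on a small graph, the following is a minimal Python sketch of Brandes' algorithm for unweighted, undirected graphs. It is an illustration only: the toy graph and function names are assumptions, and NAViGaTOR's ‘curBC’ values may be scaled or normalised differently.

```python
from collections import deque


def build_adjacency(edges):
    """Undirected adjacency sets from a list of (u, v) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def betweenness(adj):
    """Unnormalised betweenness centrality (Brandes, unweighted graphs)."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma).
        order = []
        preds = {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:   # v precedes w on a shortest path
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies from the farthest nodes back to s.
        delta = dict.fromkeys(adj, 0.0)
        while order:
            w = order.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each unordered pair was counted from both endpoints.
    return {v: score / 2 for v, score in bc.items()}


# Toy graph: hub H joins A, B and C; A and B are also directly linked.
toy = build_adjacency([("H", "A"), ("H", "B"), ("H", "C"), ("A", "B")])
bc = betweenness(toy)
print(bc)
```

In the toy graph only H lies between other nodes (on the A to C and B to C paths), so it is the only node with a non-zero score.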

4. How to find the Cluster Coefficient and visualize the results

  1. Open the network SampleNetwork01.xml (first making sure to select NAViGaTOR 2.2 XML format).
  2. Then go to the Analysis menu and select ‘View node statistics’.
  3. In the Node Spreadsheet’s Analysis menu, select ‘Add cluster coefficient’. Wait for the operation to finish.
  4. When finished, you will see a new column added to the spreadsheet titled ‘Cluster Coef.’ You can click on the heading to sort the rows according to that measure. You can also select rows in the spreadsheet and choose Selection ➨ ’Replace Network selection with Spreadsheet selection’ to have the corresponding nodes selected in the network.

You can then manually set visual attributes for these nodes by right-clicking on them and selecting Node ➨ Appearance (Selection).

  1. Alternatively, you can have NAViGaTOR automatically visualize the cluster coefficient values in the network. Create a new node filter from the ‘Appearance Filters’ panel in the side-bar. Select ‘Cluster Coef.’ as the feature to visualize and choose the visual attributes to map to. In this example, let’s choose the ‘Red to Blue’ color gradient and map the size from 8 to 40.

  1. When done, click ‘All’ to visualize the cluster coefficient values across the entire network. You will immediately see the results of the visualization.

In this protein-protein interaction network, the clustering coefficient of a protein reflects the extent to which its interacting proteins are interconnected with each other. For example, node XRCC5, in blue, has a clustering coefficient of 1, which means that all of its neighbors are connected to each other. XRCC5 is important in the non-homologous end-joining DNA repair pathway. Node HDAC2, in purple, has a clustering coefficient of 0.66, which means that two thirds of the possible pairs of its neighbors are connected to each other. The majority of the nodes in red are at the periphery of the network, with no connections between their neighbors.
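
The local clustering coefficient of a node with k neighbors is the number of links among those neighbors divided by the k(k-1)/2 possible links. The minimal Python sketch below computes it on a toy graph. The graph and function names are assumptions for illustration, and returning 0 for nodes with fewer than two neighbors is a convention chosen here, not necessarily what NAViGaTOR does.

```python
from itertools import combinations


def build_adjacency(edges):
    """Undirected adjacency sets from a list of (u, v) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def clustering_coefficient(adj, node):
    """Fraction of neighbour pairs of `node` that are themselves linked."""
    neighbours = adj[node]
    k = len(neighbours)
    if k < 2:
        return 0.0   # assumed convention for nodes with 0 or 1 neighbour
    links = sum(1 for u, v in combinations(neighbours, 2) if v in adj[u])
    return 2 * links / (k * (k - 1))


# Toy graph: X has neighbours A, B, C with links A-B and B-C among them;
# Y has neighbours A and B, which are linked to each other.
toy = build_adjacency([("X", "A"), ("X", "B"), ("X", "C"),
                       ("A", "B"), ("B", "C"),
                       ("Y", "A"), ("Y", "B")])
for name in ("X", "Y", "C"):
    print(name, round(clustering_coefficient(toy, name), 2))
```

Here X has two of three possible neighbor links (coefficient 2/3) and Y has its single possible link (coefficient 1), mirroring the two cases discussed above.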

5. Edge Density and Diameter

  1. Open the network hsa03013_20110316.xml
  2. From the Analysis menu, select ‘View Network Component Statistics’

  1. A spreadsheet will pop up showing the density for each connected component in the network. You can click on the Density column heading to sort according to that value.

The closer the density is to 1, the closer the component is to being a clique. In PPI networks, cliques can be used to predict new protein complexes.

  1. To calculate the diameter for the components, select Analysis ➨ ‘Compute diameter for all components’ and wait for the operation to finish.
  2. Once completed, a new column will be added to the spreadsheet showing the diameter for each connected component.

If the diameter is small, proteins can influence each other more readily.
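
For an undirected component with n nodes and m edges, edge density is commonly defined as 2m / (n(n-1)), and the diameter is the longest of all shortest-path lengths within the component. The minimal Python sketch below computes both per connected component. It assumes these standard definitions (NAViGaTOR's exact formula is not stated here), and the toy graph, the function names and the density of 0 for single-node components are illustrative choices.

```python
from collections import deque


def build_adjacency(edges):
    """Undirected adjacency sets from a list of (u, v) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def bfs_distances(adj, source):
    """Hop distance from source to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def component_stats(adj):
    """Density and diameter of each connected component."""
    seen = set()
    stats = []
    for start in adj:
        if start in seen:
            continue
        comp = set(bfs_distances(adj, start))      # nodes reachable from start
        seen |= comp
        n = len(comp)
        m = sum(len(adj[v]) for v in comp) // 2    # each edge is seen twice
        density = 2 * m / (n * (n - 1)) if n > 1 else 0.0
        diameter = max(max(bfs_distances(adj, v).values()) for v in comp)
        stats.append({"nodes": sorted(comp),
                      "density": density,
                      "diameter": diameter})
    return stats


# Toy graph: a triangle (a clique) and a separate three-node chain.
toy = build_adjacency([("A", "B"), ("B", "C"), ("A", "C"),
                       ("D", "E"), ("E", "F")])
stats = component_stats(toy)
for row in stats:
    print(row)
```

The triangle has density 1 and diameter 1, while the chain has density 2/3 and diameter 2.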

6. Single source shortest path

  1. Open the network SampleNetwork01.xml
  2. Select the two nodes between which you want to find a shortest path. For example, pick ‘Q06609’ corresponding to RAD51 and ‘P01106’ corresponding to MYC.
  3. From the Analysis menu, select ‘Select Shortest Path(s)’. Since we are not going to use weights in this example, you can skip the ‘Choose Weights’ pop-up dialog box that appears by clicking ‘OK’. This will pick one shortest path between the 2 nodes. If there are multiple shortest paths, it picks one arbitrarily. The shortest path picked will get selected in the network.

  1. Alternatively, to see all the shortest paths between 2 or more nodes, select the nodes and click ‘View all shortest paths (nodes)’ in the Analysis menu. This will show a spreadsheet of all the shortest paths between each pair of nodes in the current selection.

When you move your mouse over the rows in the ‘Path’ column, the corresponding path gets highlighted in the network. (Make sure that View ➨ ’Enable Preview Highlighting’ is checked in the spreadsheet menus).

  1. To make all of these edges stand out in the network, select all the rows in the spreadsheet and click on Selection ➨ ’Replace Network Selection with Spreadsheet selection’. This will select all the shortest paths in the network.

  1. With the paths selected in the network, right-click in the Network Panel and select Edge ➨ ’Appearance (Selection)’.

  1. Change the Width parameter to 5, and click ‘Done’.

  1. To make the edges stand out even more, right-click in the Network Panel again and this time select Edge ➨ ‘Appearance (Inverse)’ and move the ‘Blend’ slider to any value under 25.

Both RAD51 and MYC are genes involved in cancer. Identifying the shortest paths between them can help understand how they affect each other through other genes they interact with.
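
When no edge weights are used, a shortest path is one with the fewest edges, and there may be several of equal length. The minimal Python sketch below enumerates all of them with a breadth-first search that records every predecessor on a shortest route. It is an illustration of the idea rather than NAViGaTOR's implementation; the toy graph and function names are assumptions.

```python
from collections import deque


def build_adjacency(edges):
    """Undirected adjacency sets from a list of (u, v) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def all_shortest_paths(adj, source, target):
    """Every minimum-hop path from source to target (empty if unreachable)."""
    dist = {source: 0}
    preds = {source: []}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:                  # first time w is reached
                dist[w] = dist[v] + 1
                preds[w] = [v]
                queue.append(w)
            elif dist[w] == dist[v] + 1:       # another equally short route
                preds[w].append(v)
    if target not in dist:
        return []

    def build(node):
        # Walk the predecessor lists back to the source.
        if node == source:
            return [[source]]
        return [path + [node] for p in preds[node] for path in build(p)]

    return build(target)


# Toy graph: two equally short routes from A to D, plus a longer detour.
toy = build_adjacency([("A", "B"), ("B", "D"), ("A", "C"), ("C", "D"),
                       ("A", "E"), ("E", "F"), ("F", "D")])
paths = sorted(all_shortest_paths(toy, "A", "D"))
print(paths)
```

Picking any single entry from the returned list corresponds to selecting one shortest path arbitrarily, as described in step 3.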

7. Find shortest paths between groups of nodes.

  1. Load file SampleNetwork02.xml into NAViGaTOR.

  1. Using the Lasso Tool, select the common neighbours of nodes ‘P12757’ corresponding to SKIL and ‘O15105’ corresponding to SMAD7.
  2. Within the Subset sidebar, click on New to create a new subset.

  1. Name the subset ‘Subset 1’ and leave the rest of the settings the same in the dialog box that pops up. Click Submit.

  1. Using the Lasso Tool, select the common neighbours of nodes ‘Q00610’ corresponding to CLTC and ‘P12956’ corresponding to XRCC6. Create another subset for them titled ‘Subset 2’.
  2. We are interested in finding the shortest paths between these 2 sets of proteins. Select the Analysis ➨ ‘View all shortest paths (Groups)’ item from the menu.

  1. You will be prompted to select which groups you would like to find paths between. Once you have selected the two subsets you created earlier, click OK.

  1. When the analysis is complete, you will be presented with a spreadsheet that displays the shortest paths available between the two groups. Moving the mouse over the ‘Path’ column will highlight the corresponding path in the network. Make sure that View ➨ Enable Preview Highlighting is enabled for this to work.

  1. To highlight all the paths, select all of the rows in the spreadsheet (Ctrl+A on Windows, Cmd+A on Macs) and select Selection ➨ ‘Replace Network selection with Spreadsheet selection’. You can then use the right-click menu’s Edge/Node ➨ ‘Appearance (Selection/Inverse)’ to apply various visual styles to the selection.
  2. Additionally, from the spreadsheet view you can choose Analysis ➨ ‘Path Traversal - Counter’ to get a measure of how frequently each node was hit during the shortest path traversal. These values can be visualized using a Node Appearance Filter as described in previous sections of this document.

From these results we can infer that, of the 4 starting proteins whose common neighbours we examined, only 3 seem to interact more directly with the 2 protein sets.
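
The group analysis can be thought of as running a shortest-path search for every pair made of one node from the first subset and one node from the second, then tallying how often each node is traversed. The minimal Python sketch below follows that reading. It is an assumption about the procedure, not a description of NAViGaTOR's internals: it keeps one shortest path per pair, counts only intermediate nodes as hits, and uses a made-up toy graph and function names.

```python
from collections import Counter, deque
from itertools import product


def build_adjacency(edges):
    """Undirected adjacency sets from a list of (u, v) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def shortest_path(adj, source, target):
    """One minimum-hop path from source to target, or None if unreachable."""
    parent = {source: None}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        if v == target:
            break
        for w in adj[v]:
            if w not in parent:
                parent[w] = v
                queue.append(w)
    if target not in parent:
        return None
    path = []
    node = target
    while node is not None:          # walk back from target to source
        path.append(node)
        node = parent[node]
    return path[::-1]


def group_paths(adj, group1, group2):
    """Shortest path for every (node in group1, node in group2) pair."""
    paths = {}
    for u, v in product(group1, group2):
        if u != v:
            path = shortest_path(adj, u, v)
            if path is not None:
                paths[(u, v)] = path
    return paths


def traversal_counter(paths):
    """Number of paths passing through each intermediate node."""
    hits = Counter()
    for path in paths.values():
        hits.update(path[1:-1])      # endpoints are not counted (assumption)
    return hits


# Toy graph: both subsets connect through M; B2 is one step further via N.
toy = build_adjacency([("A1", "M"), ("A2", "M"), ("M", "B1"),
                       ("M", "N"), ("N", "B2")])
paths = group_paths(toy, ["A1", "A2"], ["B1", "B2"])
hits = traversal_counter(paths)
print(paths)
print(hits)
```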

8. All Pairs Shortest Path (APSP) traversal counter

  1. Load macaque_noFilters.xml network into NAViGaTOR
  2. Select Analysis ➨  APSP Traversal Counter from the application menu.

  1. The APSP window will open. By default, the NO WEIGHTS value is selected in the ‘Source of weight values’ combo box. If your graph has numeric features associated with edges, you can customize the value of each edge by selecting your feature in this box and then selecting from the additional features below it. For our example, we will continue with the defaults by pressing OK.

  1. After the algorithm finishes processing, you should see a message confirming that it has completed.

  1. Click OK. To view the resulting APSP ‘hits’, go to Analysis ➨  ‘View Node Statistics’ and also Analysis ➨ ‘View Interaction Statistics’. The values are visible under the column titled ‘APSP Hits’.

  1. You can now visualize the APSP data with appearance filters, which give a better idea of the importance of different nodes and edges in the graph. Create new filters using the New Node Filter and New Edge Filter buttons in the Appearance Filters sidebar.

  1. Node Filters: From the Node Appearance Filters dialog, you can select any node feature for mapping to a visual attribute. In our case, we select APSP Hits from the ‘Select a Feature to visualize’ combo box. We then select Color and Size as our visual attributes. There will be many options to customize your mapping. In our example we will choose ‘Full Spectrum’ for colour and map the size from 8 to 40. For the remaining settings we will use the default values.

  1. Edge Filters: From the Edge Appearance Filter dialog, you can select any edge feature for mapping to a visual attribute. In our case, we select APSP Hits from the ‘Select a Feature to visualize’ combo box. We then select Transparency and Width as our visual attributes. There will be many options to customize your mapping, but we will use the default settings for both.

The resultant graph should now illustrate the values for both edge and node APSP hits.
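
Conceptually, an all-pairs traversal counter finds a shortest path for every pair of nodes and records how many of those paths touch each node and each edge. The minimal Python sketch below illustrates this with unweighted breadth-first search and then maps the counts onto the 8 to 40 size range used above. How NAViGaTOR chooses among equally short paths, whether it counts endpoints, and how its filters interpolate are not specified here, so the choices below (one path per unordered pair, endpoints counted, linear scaling) are assumptions, as are the toy graph and function names.

```python
from collections import Counter, deque


def build_adjacency(edges):
    """Undirected adjacency sets from a list of (u, v) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def bfs_parents(adj, source):
    """Breadth-first search tree from source as a {node: parent} map."""
    parent = {source: None}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in parent:
                parent[w] = v
                queue.append(w)
    return parent


def apsp_hits(adj):
    """Count node and edge traversals over one shortest path per node pair."""
    node_hits, edge_hits = Counter(), Counter()
    nodes = sorted(adj)
    for i, source in enumerate(nodes):
        parent = bfs_parents(adj, source)
        for target in nodes[i + 1:]:          # each unordered pair once
            if target not in parent:
                continue                      # different component
            node = target
            node_hits[node] += 1
            while parent[node] is not None:   # walk back to the source
                prev = parent[node]
                edge_hits[tuple(sorted((prev, node)))] += 1
                node_hits[prev] += 1
                node = prev
    return node_hits, edge_hits


def scale(value, low, high, out_min=8, out_max=40):
    """Linearly map value from [low, high] to [out_min, out_max]."""
    if high == low:
        return out_min
    return out_min + (value - low) * (out_max - out_min) / (high - low)


# Toy graph: a four-node chain A - B - C - D.
toy = build_adjacency([("A", "B"), ("B", "C"), ("C", "D")])
node_hits, edge_hits = apsp_hits(toy)
low, high = min(node_hits.values()), max(node_hits.values())
sizes = {n: scale(h, low, high) for n, h in node_hits.items()}
print(node_hits, edge_hits, sizes)
```

In the chain, the inner nodes B and C and the middle edge B-C collect the most hits, so they would be drawn largest and widest.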

9. References

  1. D. E. Knuth. The Stanford GraphBase: A Platform for Combinatorial Computing. Addison-Wesley, Reading, MA, 1993.
  2. Les Miserables. Network of coappearances of characters in Victor Hugo’s novel “Les Miserables”. http://www-personal.umich.edu/~mejn/netdata/. [Online; accessed 22-December-2010]
  3. Dang, C. V., L. M. Resar, et al. (1999). "Function of the c-Myc oncogenic transcription factor." Exp Cell Res 253(1): 63-77.
  4. Prochownik, E. V. (2004). "c-Myc as a therapeutic target in cancer." Expert Rev Anticancer Ther 4(2): 289-302.
  5. Basso, K., A. A. Margolin, et al. (2005). "Reverse engineering of regulatory networks in human B cells." Nat Genet 37(4): 382-90.
  6. Dang, C. V., K. A. O'Donnell, et al. (2006). "The c-Myc target gene network." Semin Cancer Biol 16(4): 253-64.