Using the Homology Models database: A Tutorial
A pipeline for large-scale homology modeling that starts from the sequence of an experimentally solved protein structure has proved effective for constructing predictive 3D structural models of sequence homologues. These homologues, obtained by optimizing PSI-BLAST parameters, span a spectrum from those sharing high sequence identity with the template sequence to those matching it only with a relatively high e-value (1).
The models are evaluated using ProsaII residue-by-residue profiles (Sippl, 1993) and an overall pG value derived from the Z score of the length-normalized ProsaII profile. The smoothed energy profiles output by ProsaII provide an estimate of potentially misfolded regions in a model and can therefore guide manual improvement of models for sequences of special interest.
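As a rough illustration of how a smoothed energy profile can point to suspect regions, the sketch below averages per-residue energies over a sliding window and reports stretches whose smoothed value stays above a cutoff. It is not ProsaII's own procedure: the function names, the window size, the zero cutoff, and the convention that positive energies mark problematic regions are all assumptions made for this example.

```python
from typing import List, Tuple


def smooth_profile(energies: List[float], window: int = 5) -> List[float]:
    """Average each residue's energy over a centered window (shorter at the ends)."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    smoothed = []
    for i in range(len(energies)):
        lo = max(0, i - half)
        hi = min(len(energies), i + half + 1)
        smoothed.append(sum(energies[lo:hi]) / (hi - lo))
    return smoothed


def flag_regions(smoothed: List[float], threshold: float = 0.0) -> List[Tuple[int, int]]:
    """Return 1-based (start, end) residue ranges whose smoothed energy exceeds threshold."""
    regions = []
    start = None  # 0-based index where the current flagged stretch began
    for i, value in enumerate(smoothed):
        if value > threshold and start is None:
            start = i
        elif value <= threshold and start is not None:
            regions.append((start + 1, i))  # stretch ended at 0-based index i - 1
            start = None
    if start is not None:
        regions.append((start + 1, len(smoothed)))
    return regions


# Made-up per-residue energies, used only to exercise the functions.
profile = smooth_profile([-1.0, -1.0, 2.0, 2.0, 2.0, -1.0, -1.0], window=3)
print(flag_regions(profile))
```

Larger windows give smoother profiles that highlight longer suspect segments, while smaller windows respond to local problems.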
We have established the corresponding databases: one with all NESG structures to date (~670 structures) and one with a subset of the PDB with a 60% redundancy cutoff. Users can search for homology models of their sequences (and for structures, if they exist) using a wide range of criteria, such as model quality, coverage, sequence/template e-value or sequence identity, genome, and structural template. The database provides multiple models if they fit the search criteria, so that the user may choose the model most appropriate for their purposes. The user also has access to coordinate files for the homology models, as well as computational function annotations (from the program MarkUs) for the PDB structure templates (Figure 1). If the user is interested in obtaining similar function annotation results for a given model, a link to MarkUs is provided. If a user wishes to obtain a higher quality model, the sequence can be sent to the program PUDGE for refining and/or analyzing the homology model.
(1) Mirkovic N., Li Z., Parnassa A., Murray D. (2007) Strategies for High-Throughput Comparative Modeling: Applications to Leverage Analysis in Structural Genomics and Protein Family Organization. Proteins: Structure, Function, and Bioinformatics 66:766-777.
Figure 1. Overview of the homology model databases. Users can search for homology models either by copying and pasting their sequence or by selecting options such as model quality, template PDB, species, and sequence coverage.
Searching the database using different options
Users can search for homology models using a combination of options, such as template PDB structure, species, model quality assessed by Prosa, and sequence identity. In addition to the main BLAST search, these additional searches let users look for more specific items contained in the database. Figure 2 shows the main screen of the additional database searches. The search terms may be used independently or together to narrow down the result set.
Template Structure: This search allows users to see the target results for the Skyline run of the selected NESG structure.
Species: This search allows users to select all of the targets that belong to a certain species.
Model Quality: This search allows you to find models produced by SkyLine according to their model quality. 1.00 is the highest model quality.
Target-Template Sequence Identity: This search allows you to find targets produced by Skyline according to their percentage of sequence identity with the template.
Figure 2. Users can search for homology models by selecting different options such as model quality, template PDB, species, and sequence coverage.
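The way these search terms combine can be pictured as a filter in which every supplied criterion must hold. The sketch below is only an illustration of that logic, not the database's implementation: the ModelRecord layout, the field names, and the use of minimum thresholds for quality and identity are assumptions, and the example records are made up.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class ModelRecord:
    """Hypothetical record layout; the field names are illustrative only."""
    target: str
    template: str
    species: str
    model_quality: float      # 1.00 is the highest model quality
    sequence_identity: float  # target-template identity, in percent


def search_models(
    records: Iterable[ModelRecord],
    template: Optional[str] = None,
    species: Optional[str] = None,
    min_quality: Optional[float] = None,
    min_identity: Optional[float] = None,
) -> List[ModelRecord]:
    """Keep the records that satisfy every criterion that was supplied."""
    results = []
    for record in records:
        if template is not None and record.template != template:
            continue
        if species is not None and record.species != species:
            continue
        if min_quality is not None and record.model_quality < min_quality:
            continue
        if min_identity is not None and record.sequence_identity < min_identity:
            continue
        results.append(record)
    return results


# Made-up example records, not taken from the database.
records = [
    ModelRecord("target1", "templateA", "Homo sapiens", 0.95, 62.0),
    ModelRecord("target2", "templateA", "Mus musculus", 0.40, 35.0),
    ModelRecord("target3", "templateB", "Homo sapiens", 0.80, 28.0),
]
human_good = search_models(records, species="Homo sapiens", min_quality=0.7)
print([r.target for r in human_good])
```

Leaving a criterion unset means it is ignored, which mirrors using the search terms independently; setting several narrows the result set.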
Searching the database using an amino acid sequence in raw format
This search allows users to enter an amino acid sequence into the text box at the top of the search page; the sequence is used as BLAST input to search for similar sequences within the models database. The BLAST search returns the top 10 results that have at least 75% sequence coverage of the hit sequence (if the query sequence is shorter than the hit sequence, sequence coverage is calculated for the query sequence) and over 30% sequence identity between the two sequences. If no results are produced, the database returns a message listing several possible reasons why the search results may be empty. For our example, we'll search using the sequence for steroidogenic acute regulatory protein isoform 1 [Homo sapiens]. We enter the raw-format sequence into the text box, as shown in Figure 3.
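The coverage and identity cutoffs described above can be sketched as a small filter over BLAST hits. This is an illustration only, not the server's code: the BlastHit fields and function names are hypothetical, coverage is assumed to be the aligned length divided by the length of the shorter of the two sequences, and the hits are assumed to arrive already ranked by BLAST.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BlastHit:
    """Hypothetical summary of one BLAST hit; field names are illustrative."""
    hit_id: str
    query_length: int
    hit_length: int
    aligned_length: int  # number of residues covered by the alignment
    identity: float      # sequence identity, in percent


def sequence_coverage(hit: BlastHit) -> float:
    """Coverage in percent, relative to the hit sequence, or to the query if it is shorter."""
    reference_length = min(hit.query_length, hit.hit_length)
    return 100.0 * hit.aligned_length / reference_length


def filter_hits(
    hits: List[BlastHit],
    min_coverage: float = 75.0,
    min_identity: float = 30.0,
    max_results: int = 10,
) -> List[BlastHit]:
    """Keep hits with at least min_coverage and more than min_identity, in input order."""
    kept = [
        hit
        for hit in hits
        if sequence_coverage(hit) >= min_coverage and hit.identity > min_identity
    ]
    return kept[:max_results]


# Made-up hits, used only to exercise the filter.
hits = [
    BlastHit("hitA", query_length=100, hit_length=200, aligned_length=80, identity=40.0),
    BlastHit("hitB", query_length=300, hit_length=200, aligned_length=100, identity=90.0),
    BlastHit("hitC", query_length=100, hit_length=100, aligned_length=75, identity=30.0),
]
print([hit.hit_id for hit in filter_hits(hits)])
```

In this example hitB is dropped for low coverage of the hit sequence, and hitC is dropped because its identity is not above 30%.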
Figure 3. Example of searching for homology models with a user's sequence.
Once the submit button is clicked, the sequence is submitted for the BLAST run, and the results are returned in table form, as shown in Figure 4.
Figure 4. BLAST results for user's sequence.
Figure 5. Homology models and template structures for user's input sequence.
From the page shown in Figure 5, users can submit more than one homology model built from different templates. In this example, the sequence AAB35726.1 has five homology models built from five different START domain structures. Each structure and protein sequence is linked to the corresponding database: the Protein Data Bank (www.pdb.org) and the NCBI protein database (www.ncbi.nlm.nih.gov), respectively.
The results page shows the information pertaining to the target: Model Quality, Target/Template Sequence Analysis, E-Value, Target Coverage, Target Start, Target End, Target Length, 3D Coverage, 3D Start, 3D End, Species, and Target Description. It also displays the aligned template and target in a color-coded table (Figure 6). The template always appears in pale yellow and the target in light gray. In addition to the sequence alignment, the model 3D coordinate file is available for download via the model 3D-coordinate link located at the bottom of the results entry. The coordinate file is also displayed as a 3D cartoon rendering using the Jmol applet. For more information on Jmol, please see the project's web page at http://jmol.sourceforge.net. You can also view the additional help topic on Jmol.
Figure 6. The hit results table includes the sequence alignment and annotations from the NR database (a). The sequence alignment and homology model can be downloaded. The homology model is displayed with Jmol. Each homology model can be submitted to the model refinement and functional analysis database/webserver (b). Verify3D results for the homology model and the corresponding template structure are displayed so that the user can judge the quality of the homology model (c).
Each homology model also provides users with additional tools for further exploration. "Model Refinement" optimizes the side chains and adds hydrogens using the SCAP program through the PUDGE webserver (http://wiki.c2b2.columbia.edu/honiglab_public/index.php/Software:PUDGE). The optimized model is assessed by the Verify3D and Prosa programs (Figure 7). Users can compare the quality of the model before and after optimization by checking the Verify3D and Prosa difference plots.
Figure 7. PUDGE returns the Prosa difference plot (a) as well as the Verify3D graph of the optimized model.
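A difference plot of this kind can be read as a per-residue subtraction of two profiles. The sketch below shows the idea under stated assumptions: the function name and the sign convention (optimized minus original) are chosen for this example and are not taken from PUDGE, and the scores are made up. It relies on the fact that side-chain optimization leaves the number of residues unchanged, so the two profiles have equal length.

```python
from typing import List


def difference_profile(before: List[float], after: List[float]) -> List[float]:
    """Per-residue difference (after minus before) between two score profiles."""
    if len(before) != len(after):
        raise ValueError("profiles must cover the same residues")
    return [a - b for b, a in zip(before, after)]


# Made-up per-residue scores for a four-residue stretch.
original_scores = [0.25, 0.5, 0.5, 0.75]
optimized_scores = [0.5, 0.5, 0.25, 1.0]
print(difference_profile(original_scores, optimized_scores))
```

How to read the sign depends on the score: for Verify3D, where higher scores are better, positive differences suggest improvement, whereas for Prosa energies, where lower values are better, negative differences do.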
The Advanced Refinement option allows users to choose multiple programs for model refinement (the MR_w_xplor_d10 option is not ready yet), model analysis, and model evaluation. In the example of Figure 8, the MR_w_scap_basic option is selected as the model refinement method, which adds hydrogens and optimizes the side chains, and Ra_residue_potential, Ra_w_prosa, and Ra_v3d are used to analyze the optimized model. PUDGE returns the result page shown in Figure 8(b), where users can download the optimized model and check the model evaluation after refinement.
Figure 8. The Advanced Refinement option gives users multiple options for model refinement, analysis, and evaluation through the PUDGE webserver. The residue potentials of the optimized model are shown in (c), and Prosa results with different window sizes are displayed in (d).
The Functional Analysis option submits the homology model to the MarkUs database/webserver for in-depth functional analysis (Figure 9). MarkUs calculates the electrostatic map, cavities, residue conservation, multiple alignment, etc.
Figure 9. Clicking "Functional Analysis" submits the job to MarkUs, the function annotation server (a). The MarkUs server provides the electrostatic potential map, cavities, residue conservation, structure neighborhood, etc.
All NESG Structures run to date with leverage information
In addition to the main BLAST search, users can view information about each template structure that has been used for the homology modeling process in Skyline. The table presents the template structure, a link to the MarkUs functional annotation web-server/database, and the total number of models produced from each template structure. An abbreviated sample of the table is shown in Figure 10. You can look up any structure on the Honig Lab's MarkUs functional annotation server by clicking on the structure identifier in the middle column. For more information on MarkUs, please see the project's submission page.
Figure 10. The summary table of the database shows links to the homology model results for each template structure, links to the MarkUs functional analysis web-server/database, and the number of reliable models, for which the pG score is higher than 0.7.
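The "number of reliable models" column can be thought of as a per-template count of models whose pG score exceeds 0.7. The following is a minimal sketch of that tally, assuming the models are available as (template, pG) pairs; the function name, the input layout, and the example values are made up for illustration.

```python
from typing import Dict, Iterable, Tuple


def count_reliable_models(
    models: Iterable[Tuple[str, float]], pg_cutoff: float = 0.7
) -> Dict[str, int]:
    """Count, for each template, the models whose pG score is higher than pg_cutoff."""
    counts: Dict[str, int] = {}
    for template, pg in models:
        # Templates with no reliable model are still listed, with a count of 0.
        counts[template] = counts.get(template, 0) + (1 if pg > pg_cutoff else 0)
    return counts


# Made-up (template, pG) pairs, not taken from the database.
models = [
    ("templateA", 0.9),
    ("templateA", 0.5),
    ("templateB", 0.7),
    ("templateA", 0.71),
]
print(count_reliable_models(models))
```

A model with a pG of exactly 0.7 is not counted here, because the caption specifies scores higher than 0.7.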
By clicking the template structure identifier in the leftmost column, you will see all of the models produced from that structure's SkyLine run. In this example we look at the template structure 12asa, as shown in Figure 11. The result set in Figure 11 is ranked by Model Quality, so that the highest-quality homology models, which are likely to be of most interest to the user, appear first.
Figure 11. Results for all models for 12as ranked by Model Quality.
We see in our result set that two tables are returned. The first contains information about the NESG structure and the options with which it was run in Skyline. A link to MarkUs is also included in this table, allowing users to look up this structure on the functional annotation server.