Alignment Programs (dotplot, global & local alignment)
Table of Contents
- iNquiry alignment background
- Dotplot programs
- Global alignment programs
- Local alignment programs
- Research using alignment programs
- Sources
Figure one: Sample dotplot using Dotmatcher
Table one: Other dotplot programs
Figure two: Sample global alignment
Table two: Other global alignment programs
Figure three: Sample output from a local alignment
Table three: Other local alignment programs
iNquiry Alignment
iNquiry tools for alignment find commonalities between nucleic acid or protein sequences. Some of these tools, such as EMMA and Needle, simply create alignments. Other programs, such as Prettyplot and Cons, go one step further: Prettyplot displays an alignment graphically, and Cons builds a consensus sequence from an alignment. Together, these tools align and analyze sequences through a variety of methods.
Sequence alignment begins with the selection of an algorithm that generates a score for each alignment. This algorithm must account for gaps, which hypothetically result from insertions and deletions in the sequences. The score represents the degree of similarity between the sequences. Global alignment programs align sequences over their entire length, while local alignment programs align only the regions of the sequences that are similar enough to maximize the alignment score. Other programs, such as dotplot programs, display the similarities graphically.
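To make the idea of an alignment score concrete, here is a minimal Python sketch that scores an alignment that has already been made. The function name and the match, mismatch, and gap values are assumptions chosen for illustration; they are not the values used by any iNquiry program, which read scores from a substitution matrix.

```python
def score_alignment(row_a, row_b, match=1, mismatch=-1, gap=-2):
    """Score two aligned rows of equal length; '-' marks a gap.

    Returns (score, percent_identity). The scoring values are
    illustrative assumptions, not EMBOSS defaults.
    """
    if len(row_a) != len(row_b) or not row_a:
        raise ValueError("aligned rows must be non-empty and equal in length")
    score = 0
    identical = 0
    for x, y in zip(row_a, row_b):
        if x == "-" or y == "-":
            score += gap        # penalty for an insertion or deletion
        elif x == y:
            score += match      # identical residues
            identical += 1
        else:
            score += mismatch   # aligned but different residues
    # Identity is reported over the full alignment length, gaps included.
    percent_identity = 100.0 * identical / len(row_a)
    return score, percent_identity


print(score_alignment("ACGTA", "AC-TA"))
```

Dividing by the full alignment length mirrors the way the sample reports later on this page express identity as a fraction of the alignment length.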
Dotplot Programs
Dot plots graphically display regions of similarity between two sequences. One sequence is placed on the y-axis of a rectangular graph, and the other sequence is placed on the x-axis. A dot is plotted at every coordinate where there is similarity between the sequences. Where two sequences are very similar, many dots align to form diagonal lines. These diagonals make it possible to visualize local regions of similarity. Diagonals also make it easy to see other features, such as repeats (which form parallel diagonal lines) and insertions or deletions (which form breaks or discontinuities in the diagonal lines) (2). Dot plot programs include Dotmatcher, Dotpath, Dottup, and Polydot. Each program compares sequences with a dot plot in its own way.
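The basic idea of a dot plot can be sketched in a few lines of Python. This is only an illustration that marks exact single-character identities; the function name is hypothetical, and the programs named above use windows, word matches, or scoring matrices rather than this simple test.

```python
def dot_matrix(seq_a, seq_b):
    """Return the rows of a text dot plot.

    seq_a runs along the x-axis and seq_b along the y-axis.
    '*' marks a position where the two characters are identical.
    """
    return [
        "".join("*" if a == b else "." for a in seq_a)
        for b in seq_b
    ]


# Comparing a sequence with itself gives an unbroken main diagonal.
for row in dot_matrix("GATTACA", "GATTACA"):
    print(row)
```

Dots away from the main diagonal in the printed plot come from letters that occur more than once in the sequence, which is why real programs filter matches with a window and a threshold.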
Two nucleic acid or protein sequences must be entered as sequence a and sequence b. Because dotplot programs look for matches within a sliding window, both a window size and a threshold must be chosen. The window size is the number of positions over which the threshold is applied. The smaller the window, the more likely a match is to be reported. The threshold determines how strong the similarity within a window must be for it to be plotted. The user must also choose how the output is displayed (e.g., text or windows graph).
Dotmatcher uses a threshold, applied to scores calculated from a substitution matrix such as BLOSUM, to decide whether a match is plotted. A window of specified length moves along all possible diagonals, and a score is calculated within the window at each position. The score is the sum of the comparisons of the two sequences along the window, using the given similarity matrix. If the score is above the threshold, a line is plotted on the image over the position of the window (2).
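The windowed scoring described above can be sketched as follows. This is not Dotmatcher's code: the function name and parameters are hypothetical, and simple match and mismatch values stand in for a substitution matrix such as BLOSUM.

```python
def window_dots(seq_a, seq_b, window=3, threshold=2, match=1, mismatch=0):
    """Return (i, j) window start positions whose score exceeds the threshold.

    For every pair of start positions, the window is scored by summing
    a per-position value along the diagonal. Identity scoring is an
    assumption; a real program would look each pair up in a matrix.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    hits = []
    for i in range(len(seq_a) - window + 1):
        for j in range(len(seq_b) - window + 1):
            score = sum(
                match if seq_a[i + k] == seq_b[j + k] else mismatch
                for k in range(window)
            )
            if score > threshold:   # "above the threshold"
                hits.append((i, j))
    return hits


print(window_dots("GATTACA", "GATTACA"))
```

With a window of 3 and a threshold of 2, only windows in which all three positions match are reported, so the isolated off-diagonal dots of a single-character comparison disappear.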

Table one: Other dotplot programs
- Dotpath
- Dottup
- Polydot
Global Alignment Programs
Global alignment programs align entire sequences. Each program compares a whole sequence to either another complete sequence or a sequence fragmented by cloning or changed by natural divergence. As with all alignments, a global pairwise alignment assumes that the two sequences have diverged from a common ancestor. Each program uses a different means to compensate for regions that the two sequences do not share. The best alignment over the whole length of the two sequences then illustrates their similarities. Global alignment programs include Est2genome, Stretcher, and Needle.
How to use global alignment programs
Two sequences must be entered. The user must then specify a gap penalty (default is 10), the score taken away for every gap created. This penalty discourages the alignment from containing more gaps than necessary. A gap extension penalty (default is 0.5) must also be specified; it is added for every base or residue included in a gap. The extension penalty should be lower than the gap penalty because a single long gap is more likely than several short gaps. Typically, the cost of extending a gap is set to be 5-10 times lower than the cost of opening a gap (2). An output file must also be designated.
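A short calculation shows why a low extension penalty favors one long gap over several short ones. The sketch below follows the description above and charges the extension penalty for every position in a gap; this is an assumption, since some implementations charge only the opening penalty for the first position. The function name is hypothetical.

```python
def gap_cost(length, gap_open=10.0, gap_extend=0.5):
    """Cost of a single gap covering `length` positions.

    Assumes the opening penalty is charged once and the extension
    penalty once per gapped position, as described in the text.
    """
    if length < 1:
        return 0.0
    return gap_open + gap_extend * length


one_long_gap = gap_cost(6)          # six gapped positions in one gap
three_short_gaps = 3 * gap_cost(2)  # the same six positions in three gaps

print(one_long_gap, three_short_gaps)
```

With the default penalties, the single six-position gap is much cheaper than three two-position gaps, so the alignment with fewer, longer gaps scores higher.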
Needle finds the best alignment for two whole sequences using the Needleman-Wunsch global alignment algorithm. This algorithm finds the optimal global alignment by considering all possible alignments and choosing the one with the best score. It reads from a scoring matrix that contains values for every possible residue or nucleotide match. Needle finds the alignment with the maximum possible score, where the score of an alignment is the sum of the match values taken from the scoring matrix, minus the penalties for opening and extending gaps (2).
########################################
# Program: needle
# Rundate: Tue Apr 06 14:30:14 2004
# Align_format: srspair
# Report_file: hba_human.needle
########################################
#=======================================
#
# Aligned_sequences: 2
# 1: HBA_HUMAN
# 2: HBB_HUMAN
# Matrix: EBLOSUM62
# Gap_penalty: 10.0
# Extend_penalty: 0.5
#
# Length: 148
# Identity: 63/148 (42.6%)
# Similarity: 88/148 (59.5%)
# Gaps: 9/148 ( 6.1%)
# Score: 290.5
#
#
#=======================================
HBA_HUMAN 1 VLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHF-DL 48
.|:|.:|:.|.|.|||| :..|.|.|||.|:.:.:|.|:.:|..| ||
HBB_HUMAN 1 VHLTPEEKSAVTALWGKV--NVDEVGGEALGRLLVVYPWTQRFFESFGDL 48
HBA_HUMAN 49 S-----HGSAQVKGHGKKVADALTNAVAHVDDMPNALSALSDLHAHKLRV 93
| .|:.:||.|||||..|.::.:||:|::....:.||:||..||.|
HBB_HUMAN 49 STPDAVMGNPKVKAHGKKVLGAFSDGLAHLDNLKGTFATLSELHCDKLHV 98
HBA_HUMAN 94 DPVNFKLLSHCLLVTLAAHLPAEFTPAVHASLDKFLASVSTVLTSKYR 141
||.||:||.:.|:..||.|...||||.|.|:..|.:|.|:..|..||.
HBB_HUMAN 99 DPENFRLLGNVLVCVLAHHFGKEFTPPVQAAYQKVVAGVANALAHKYH 146
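The Needleman-Wunsch approach that Needle is based on can be illustrated with a minimal Python sketch. It is a simplification, not Needle's implementation: the function name is hypothetical, and fixed match and mismatch values with a single linear gap penalty are assumed in place of the EBLOSUM62 matrix and separate gap and extension penalties, so its scores cannot be compared with the report above.

```python
def needleman_wunsch(seq_a, seq_b, match=1, mismatch=-1, gap=-1):
    """Return (score, aligned_a, aligned_b) for one optimal global alignment."""
    rows, cols = len(seq_a) + 1, len(seq_b) + 1
    score = [[0] * cols for _ in range(rows)]

    # Aligning a prefix against nothing costs one gap per residue.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    # Fill the table: each cell holds the best score for the two prefixes.
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if seq_a[i - 1] == seq_b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # align the two residues
                score[i - 1][j] + gap,      # gap in seq_b
                score[i][j - 1] + gap,      # gap in seq_a
            )

    # Trace back from the bottom-right corner to recover one alignment.
    out_a, out_b = [], []
    i, j = len(seq_a), len(seq_b)
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if seq_a[i - 1] == seq_b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                out_a.append(seq_a[i - 1])
                out_b.append(seq_b[j - 1])
                i, j = i - 1, j - 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(seq_a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(seq_b[j - 1])
            j -= 1
    return score[-1][-1], "".join(reversed(out_a)), "".join(reversed(out_b))


print(needleman_wunsch("ACGT", "AGT"))
```

Because every cell is computed from its three neighbors, the table accounts for all possible alignments without listing them one by one. When several alignments tie for the best score, this sketch returns only one of them.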
Table two: Other global alignment programs
- Stretcher
- Est2genome
Local Alignment Programs
Local alignment programs find small, local regions of two sequences where there is some similarity, and make no assumption that the sequences are similar over their whole length. Local alignment programs include Matcher, Water, Wordmatch, Seqmatchall, and Supermatcher.
How to use local alignment programs
Two sequences must be entered by the user, as well as a gap penalty and a gap extension penalty. With some local alignment programs, such as Matcher, the user can request that more than just the best alignment be shown. Other programs require that the user choose the scoring matrix. Most of the programs, however, use EDNAMAT and EBLOSUM62 as default scoring matrices for nucleic acid and protein sequences, respectively.
Water uses the Smith-Waterman local alignment algorithm to compare sequences. Regions of similarity between two sequences are found, and local regions from one sequence can even be used to scan a sequence database. This program is best used to look for similarities between small parts of two sequences. In addition to the actual alignment, Water shows an alignment score, percent identity, and percent similarity, as seen in figure three.
########################################
# Program: water
# Rundate: Tue Apr 06 14:34:38 2004
# Align_format: srspair
# Report_file: hba_human.water
########################################
#=======================================
#
# Aligned_sequences: 2
# 1: HBA_HUMAN
# 2: HBB_HUMAN
# Matrix: EBLOSUM62
# Gap_penalty: 10.0
# Extend_penalty: 0.5
#
# Length: 145
# Identity: 63/145 (43.4%)
# Similarity: 88/145 (60.7%)
# Gaps: 8/145 ( 5.5%)
# Score: 293.5
#
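The Smith-Waterman method used by Water can be illustrated with a minimal Python sketch. As an assumption made for brevity, it uses fixed match and mismatch values and a single linear gap penalty in place of the EBLOSUM62 matrix and separate gap and extension penalties, so its scores are not comparable to Water's. The function name is hypothetical.

```python
def smith_waterman(seq_a, seq_b, match=2, mismatch=-1, gap=-2):
    """Return (best_score, aligned_a, aligned_b) for one best local alignment."""
    rows, cols = len(seq_a) + 1, len(seq_b) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    # Fill the table; a cell never drops below zero, which lets an
    # alignment start fresh anywhere in either sequence.
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if seq_a[i - 1] == seq_b[j - 1] else mismatch
            score[i][j] = max(
                0,
                score[i - 1][j - 1] + sub,  # align the two residues
                score[i - 1][j] + gap,      # gap in seq_b
                score[i][j - 1] + gap,      # gap in seq_a
            )
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Trace back from the highest-scoring cell until a zero is reached.
    out_a, out_b = [], []
    i, j = best_pos
    while i > 0 and j > 0 and score[i][j] > 0:
        sub = match if seq_a[i - 1] == seq_b[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(seq_a[i - 1])
            out_b.append(seq_b[j - 1])
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            out_a.append(seq_a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(seq_b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


print(smith_waterman("TTTACGTTT", "GGGACGGGG"))
```

The two differences from a global alignment are the zero floor in each cell and the traceback that starts at the best cell instead of the corner. Together they confine the result to the most similar region of the two sequences.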
Table three: Other local alignment programs
- Wordmatch
- Supermatcher
- Matcher
- Seqmatchall
Research using alignment programs
- Coenye, T. Vandamme, P. Extracting phylogenetic information from whole genome sequencing projects: the lactic acid bacteria as a test case. Microbiol. Dec. 2003; 149:3507-3517.
- Huang, Y. Zhang, L. Rapid and sensitive dot matrix methods for genome analysis. Bioinformatics. Mar. 2004; 20(4):460-6.
- Jancovich, J. Mao, J. Chinchar, V. Wyatt, C. et al. Genomic sequence of a ranavirus associated with salamander mortalities in North America. Virology. Nov. 2003; 316:90-103.
- Warren, M. Smith, A. Partridge, N. Masabanda, J. Structural analysis of the chicken BRCA2 gene facilitates identification of functional domains and disease causing mutations. Human Molecular Genetics. Apr. 2002; 11:841-851.
- Wicker, T. Guyot, R. Yahiaoui, N. Keller, B. CACTA transposons in Triticeae. A diverse family of high-copy repetitive elements. Plant Physiol. May 2003; 132(1):52-63.
Use of global alignment programs
- Cheung, A. Bayer, A. Zhang, G. Gresham, H. Xiong, Y. Regulation of virulence determinants in vitro and in vivo in Staphylococcus aureus. FEMS Immunol Med Microbiol. Jan. 2004; 40(1):1-9.
Use of local alignment programs
- Goddard, J. Sumner, J. Nicholson, W. et al. Survey of ticks collected in Mississippi for Rickettsia, Ehrlichia, and Borrelia species. J Vector Ecol. Dec. 2003; 28:184-9.
- Yamada, Y. Makimura, K. et al. Phylogenetic relationships among medically important yeasts based on sequences of mitochondrial large subunit ribosomal RNA gene. Mycoses. Feb. 2004; 47:24-8.
- Zhang, W. Jayarao, B. Knabel, S. Multi-virulence locus sequence typing of Listeria monocytogenes. Appl Environ Microbiol. Feb. 2004; 70:913-20.
Sources
1. Thomas, M. "Sequence alignments." PowerPoint presentation, 2/10/2004.
2. EMBOSS instructional package. http://egg.isu.edu/inquiry
3. www.hgmp.mrc.ac.uk/Software/EMBOSS/Apps
4. www.sanger.ac.uk/Software/EMBOSS
5. http://egg.isu.edu/emboss/dotmatcher.html