Introduction to Sequence Similarity and Identity
- Sequence similarity refers to the degree of resemblance between two or more biological sequences.
- Sequence identity measures the proportion of aligned positions at which the sequences carry exactly the same residue.
Sequence Alignment
- Sequence alignment is a common approach to measure similarity and identity between sequences.
- Alignment algorithms compare sequences and identify matching or similar regions.
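To make the idea of an alignment algorithm concrete, here is a minimal sketch of global alignment scoring by dynamic programming (the Needleman-Wunsch approach). The function name `global_alignment_score` and its default scoring values are assumptions chosen for illustration; the defaults (match = 1, no mismatch or gap penalty) mirror the simple scoring scheme used later in this section, and the function returns only the best score, not the alignment itself.

```python
def global_alignment_score(seq_a, seq_b, match=1, mismatch=0, gap=0):
    """Return the best global alignment score for two sequences.

    Illustrative sketch only: it fills a dynamic programming table
    and does not trace back the alignment.
    """
    rows, cols = len(seq_a) + 1, len(seq_b) + 1
    score = [[0] * cols for _ in range(rows)]

    # First row and column: a prefix aligned entirely against gaps
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    for i in range(1, rows):
        for j in range(1, cols):
            same = seq_a[i - 1] == seq_b[j - 1]
            diagonal = score[i - 1][j - 1] + (match if same else mismatch)
            up = score[i - 1][j] + gap      # gap in seq_b
            left = score[i][j - 1] + gap    # gap in seq_a
            score[i][j] = max(diagonal, up, left)

    return score[-1][-1]


print(global_alignment_score("ATCGTACG", "ATAGCACG"))  # 6
```

With these default values the score equals the number of matched positions in the best alignment. Passing negative `mismatch` and `gap` values penalizes substitutions and gaps.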
Pairwise Sequence Alignment
from Bio import pairwise2
sequence1 = "ATCGTACG"
sequence2 = "ATAGCACG"
# globalxx: global alignment, matches score 1, no mismatch or gap penalties
alignments = pairwise2.align.globalxx(sequence1, sequence2)
# Several alignments can share the best score; print each one
for alignment in alignments:
    print("Alignment Score:", alignment.score)
    print("Alignment Sequence 1:", alignment.seqA)
    print("Alignment Sequence 2:", alignment.seqB)
    print()
- Perform pairwise sequence alignment using the pairwise2.align.globalxx() function.
- Pass the sequences to align as arguments.
- Iterate over the alignments and print the alignment score, sequence 1, and sequence 2.
Calculating Sequence Similarity and Identity
- Sequence similarity can be measured by various metrics, such as percent identity, percent similarity, or alignment score.
- Percent identity is the ratio of identical positions to the total aligned positions.
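- Percent similarity counts identical positions plus positions whose residues differ but are considered similar under a chosen scoring scheme, such as a substitution matrix.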
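The difference between the two percentages can be sketched in plain Python on a pair of already aligned sequences. The function name `percent_identity_and_similarity` and the `SIMILAR_PAIRS` set are hypothetical; treating transitions (A/G and C/T) as "similar" is an assumption made only for this example, and in practice the definition of similarity comes from whatever scoring scheme is being used.

```python
# Assumption for illustration: transitions (purine<->purine,
# pyrimidine<->pyrimidine) count as "similar" nucleotide pairs.
SIMILAR_PAIRS = {frozenset("AG"), frozenset("CT")}


def percent_identity_and_similarity(aligned_a, aligned_b):
    """Return (percent identity, percent similarity) over all aligned columns."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("aligned sequences must not be empty")

    identical = 0
    similar = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue  # gap columns are neither identical nor similar
        if a == b:
            identical += 1
        elif frozenset((a, b)) in SIMILAR_PAIRS:
            similar += 1

    total = len(aligned_a)  # total aligned positions, gaps included
    return 100 * identical / total, 100 * (identical + similar) / total


identity, similarity = percent_identity_and_similarity("ATCGTACG", "ATAGCACG")
print("Percent Identity:", identity)      # 75.0
print("Percent Similarity:", similarity)  # 87.5
```

Here six of eight columns are identical, and one further column (T against C) is a transition, so similarity is higher than identity.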
Percent Identity Calculation
from Bio import pairwise2
sequence1 = "ATCGTACG"
sequence2 = "ATAGCACG"
alignments = pairwise2.align.globalxx(sequence1, sequence2)
# Use the first alignment returned; others may score equally well
alignment = alignments[0]
# Aligned length includes gap columns
aligned_length = len(alignment.seqA)
# Count columns where both aligned sequences carry the same character
identical_positions = sum(a == b for a, b in zip(alignment.seqA, alignment.seqB))
identity = (identical_positions / aligned_length) * 100
print("Percent Identity:", identity)
- Perform pairwise sequence alignment using the pairwise2.align.globalxx() function.
- Take the first alignment from the list of alignments.
- Calculate the aligned length (including gap columns) and the number of identical positions.
- Calculate the percent identity by dividing the number of identical positions by the aligned length and multiplying by 100. Because only exact matches are counted, this value is percent identity, not percent similarity.
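The percentage obtained this way depends on which length is used as the denominator. The sketch below, which works on aligned strings and needs no alignment library, compares three common choices. The function name `percent_identity` and the `denominator` option names are hypothetical labels introduced for this example.

```python
def percent_identity(aligned_a, aligned_b, denominator="aligned"):
    """Percent identity of two aligned sequences.

    denominator (hypothetical option names):
      "aligned"  - all alignment columns, gaps included
      "ungapped" - only columns where neither sequence has a gap
      "shorter"  - residue count of the shorter original sequence
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")

    columns = list(zip(aligned_a, aligned_b))
    identical = sum(a == b and a != "-" for a, b in columns)

    if denominator == "aligned":
        total = len(columns)
    elif denominator == "ungapped":
        total = sum(a != "-" and b != "-" for a, b in columns)
    elif denominator == "shorter":
        total = min(len(aligned_a.replace("-", "")),
                    len(aligned_b.replace("-", "")))
    else:
        raise ValueError("unknown denominator: " + denominator)

    if total == 0:
        raise ValueError("no positions to compare")
    return 100 * identical / total


a, b = "ATCG-ACG", "AT-GCACG"
print(percent_identity(a, b, "aligned"))   # 75.0
print(percent_identity(a, b, "ungapped"))  # 100.0
```

Because the same alignment can yield quite different values, it helps to state which denominator was used whenever a percent identity is reported.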
Summary
- Sequence similarity and identity are important measures in bioinformatics.
- Sequence alignment is a common approach to calculate similarity and identity.
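- Biopython provides modules, such as pairwise2, for performing sequence alignment; identity and similarity can then be calculated from the aligned sequences. Recent Biopython releases deprecate pairwise2 in favor of Bio.Align.PairwiseAligner.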