Biopython Fundamentals

Introduction to Sequence Similarity and Identity

  • Sequence similarity refers to the degree of resemblance between two or more biological sequences.
  • Sequence identity measures the proportion of aligned positions at which the sequences carry exactly the same residue.

Sequence Alignment

  • Sequence alignment is the standard way to measure similarity and identity between sequences.
  • Alignment algorithms arrange the sequences so that corresponding positions line up, inserting gaps where needed, and score the result to find the best-matching arrangement.
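To make the idea of an alignment algorithm concrete, here is a minimal sketch of the Needleman-Wunsch dynamic-programming recurrence for a global alignment score. The function name `global_alignment_score` and its default scoring values are illustrative assumptions, not part of Biopython; it computes only the score, not the aligned strings.

```python
def global_alignment_score(seq_a, seq_b, match=1, mismatch=0, gap=0):
    """Return the best global alignment score (hypothetical helper).

    Uses a linear gap penalty: every gap column adds `gap` to the score.
    """
    rows, cols = len(seq_a) + 1, len(seq_b) + 1
    # score[i][j] = best score for aligning seq_a[:i] with seq_b[:j]
    score = [[0] * cols for _ in range(rows)]

    # First column / first row: one sequence aligned entirely against gaps
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if seq_a[i - 1] == seq_b[j - 1] else mismatch
            diagonal = score[i - 1][j - 1] + pair  # align the two residues
            up = score[i - 1][j] + gap             # gap in seq_b
            left = score[i][j - 1] + gap           # gap in seq_a
            score[i][j] = max(diagonal, up, left)

    return score[-1][-1]


lenient = global_alignment_score("ATCGTACG", "ATAGCACG")
strict = global_alignment_score("ATCGTACG", "ATAGCACG", mismatch=-1, gap=-1)
print("Score with free mismatches and gaps:", lenient)
print("Score with penalties of -1:", strict)
```

With the default values (match 1, mismatch 0, gap 0) the score simply counts the largest number of positions that can be matched in order. Penalising mismatches and gaps lowers the score and usually favours alignments with fewer gaps.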

Pairwise Sequence Alignment

from Bio import pairwise2

sequence1 = "ATCGTACG"
sequence2 = "ATAGCACG"

alignments = pairwise2.align.globalxx(sequence1, sequence2)

for alignment in alignments:
    print("Alignment Score:", alignment.score)
    print("Alignment Sequence 1:", alignment.seqA)
    print("Alignment Sequence 2:", alignment.seqB)
    print()
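  • pairwise2.align.globalxx() performs a global pairwise alignment in which each match scores 1 and mismatches and gaps carry no penalty.
  • Pass the two sequences to align as arguments; the function returns a list of the top-scoring alignments.
  • Iterate over the alignments and print each alignment's score and the two gapped sequences (seqA and seqB).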
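Recent Biopython releases deprecate Bio.pairwise2 in favour of Bio.Align.PairwiseAligner. The sketch below repeats the same alignment with that class, assuming an installed Biopython version that provides it. The scoring parameters are set explicitly to mimic globalxx (match 1, no mismatch or gap penalties) rather than relying on library defaults.

```python
from Bio import Align

sequence1 = "ATCGTACG"
sequence2 = "ATAGCACG"

aligner = Align.PairwiseAligner()
aligner.mode = "global"
# Scoring chosen to mirror globalxx: reward matches, ignore everything else
aligner.match_score = 1.0
aligner.mismatch_score = 0.0
aligner.open_gap_score = 0.0
aligner.extend_gap_score = 0.0

# score() computes only the optimal score, without building alignments
score = aligner.score(sequence1, sequence2)
print("Alignment Score:", score)

# align() returns the optimal alignments; print the first one
alignments = aligner.align(sequence1, sequence2)
print(alignments[0])
```

Printing an alignment object shows the two sequences with gaps inserted, which is convenient for a quick visual check.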

Calculating Sequence Similarity and Identity

  • Sequence similarity can be measured by various metrics, such as percent identity, percent similarity, or alignment score.
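  • Percent identity is the number of identical positions divided by the total number of aligned positions, expressed as a percentage.
  • Percent similarity counts both identical positions and positions holding similar residues, for example amino acids with comparable chemical properties or a positive substitution-matrix score.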
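The difference between the two percentages can be sketched on a pair of already aligned protein strings. The residue groups in `SIMILAR_GROUPS` and the function name are assumptions made for illustration only; real tools usually decide similarity from a substitution matrix such as BLOSUM62.

```python
# Hypothetical grouping of amino acids by broad chemical class (illustration only)
SIMILAR_GROUPS = [set("ILVM"), set("FYW"), set("KRH"), set("DE"), set("ST"), set("NQ")]


def percent_identity_and_similarity(aligned_a, aligned_b, groups=SIMILAR_GROUPS):
    """Return (percent identity, percent similarity) over all alignment columns."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")

    identical = 0
    similar = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue  # gap columns are neither identical nor similar
        if a == b:
            identical += 1
        elif any(a in group and b in group for group in groups):
            similar += 1

    total = len(aligned_a)  # gap columns still count in the denominator
    return 100 * identical / total, 100 * (identical + similar) / total


identity, similarity = percent_identity_and_similarity("MKVLE-T", "MRILDAS")
print(f"Percent identity:   {identity:.1f}")
print(f"Percent similarity: {similarity:.1f}")
```

Here only two of the seven columns are identical, but four more pair residues from the same group, so the similarity figure is much higher than the identity figure.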

Percent Identity Calculation

from Bio import pairwise2

sequence1 = "ATCGTACG"
sequence2 = "ATAGCACG"

alignments = pairwise2.align.globalxx(sequence1, sequence2)

alignment = alignments[0]  # Use the first of the top-scoring alignments

# Number of alignment columns, including columns that contain a gap ("-")
aligned_length = len(alignment.seqA)
# Columns where both sequences carry the same residue
identical_positions = sum(a == b for a, b in zip(alignment.seqA, alignment.seqB))
percent_identity = (identical_positions / aligned_length) * 100

print("Percent Identity:", percent_identity)
  • Perform pairwise sequence alignment using the pairwise2.align.globalxx() function.
  • Take the first alignment from the returned list; several alignments may share the top score.
  • Calculate the aligned length (gap columns included) and the number of identical positions.
  • Calculate the percent identity by dividing the number of identical positions by the aligned length and multiplying by 100. Because only exact matches are counted, this value is percent identity rather than percent similarity.
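The percentage depends on what is used as the denominator, and tools differ on this point. The sketch below compares three common choices on one hand-written alignment of the example sequences; the function name and the `denominator` option names are assumptions made for illustration.

```python
def percent_identity(aligned_a, aligned_b, denominator="alignment"):
    """Percent identity of two gapped strings under a chosen denominator."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")

    columns = list(zip(aligned_a, aligned_b))
    matches = sum(a == b and a != "-" for a, b in columns)

    if denominator == "alignment":
        total = len(columns)  # every column, gaps included
    elif denominator == "ungapped":
        total = sum(a != "-" and b != "-" for a, b in columns)  # gap-free columns
    elif denominator == "shorter":
        # length of the shorter original sequence, gaps stripped
        total = min(len(aligned_a.replace("-", "")), len(aligned_b.replace("-", "")))
    else:
        raise ValueError(f"unknown denominator: {denominator}")

    return 100 * matches / total


aligned_a = "AT-CGTACG"  # ATCGTACG with one gap
aligned_b = "ATAGC-ACG"  # ATAGCACG with one gap

for choice in ("alignment", "ungapped", "shorter"):
    print(choice, round(percent_identity(aligned_a, aligned_b, choice), 1))
```

When reporting or comparing identity values, state which denominator was used, since the same alignment gives noticeably different numbers under each convention.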

Summary

  • Sequence similarity and identity are important measures in bioinformatics.
  • Sequence alignment is a common approach to calculate similarity and identity.
  • Biopython provides functions and modules, such as pairwise2, for performing sequence alignment and calculating similarity.