Extracting and modifying sequence features

Sequence Features

  • Sequence features represent specific regions or elements within a biological sequence.
  • Features can include coding regions, promoters, binding sites, and more.

Accessing Sequence Features

  • Biopython exposes the features of a parsed record as a list, record.features, whose items are SeqFeature objects.
  • Each feature has a type (a string such as "CDS" or "gene"), a location (start, end, and strand), and a dictionary of qualifiers.

Feature Properties

from Bio import SeqIO

genbank_file = "sequence.gb"

for record in SeqIO.parse(genbank_file, "genbank"):
    for feature in record.features:
        print("Feature Type:", feature.type)
        print("Feature Location:", feature.location)
        print("Feature Qualifiers:", feature.qualifiers)
        print("-------------")
  • Read a GenBank file using the SeqIO.parse() function.
  • Iterate over each record in the file.
  • Iterate over each feature in the record.
  • Print the type, location, and qualifiers of each feature.
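
Once a feature is in hand, the sequence it covers can be pulled out with the feature's extract() method. The following is a minimal sketch, assuming a small in-memory record so that it runs without a GenBank file; the sequence, the id "demo", and the gene name are made up for illustration.

```python
from Bio.Seq import Seq
from Bio.SeqFeature import FeatureLocation, SeqFeature
from Bio.SeqRecord import SeqRecord

# Hypothetical toy record: 3 flanking bases, a 9-base ORF, 4 flanking bases.
record = SeqRecord(Seq("AAAATGGCCTAATTTT"), id="demo")
record.features.append(
    SeqFeature(
        location=FeatureLocation(3, 12, strand=1),
        type="CDS",
        qualifiers={"gene": ["demo"]},
    )
)

extracted = {}
for feature in record.features:
    # extract() slices the parent sequence using the feature's location
    # (and reverse-complements it for features on the minus strand).
    subseq = feature.extract(record.seq)
    extracted[feature.qualifiers["gene"][0]] = str(subseq)
    print(feature.type, feature.location, subseq)
```

With a record read from a file, the same loop applies unchanged: pass record.seq from the parsed record to extract().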

Feature Manipulation

  • Because record.features is an ordinary Python list, features can be added, removed, or edited in place.
  • Changes affect only the in-memory record until it is written back out with SeqIO.write().

Adding a Feature

from Bio import SeqFeature, SeqIO

genbank_file = "sequence.gb"
records = []

for record in SeqIO.parse(genbank_file, "genbank"):
    # Create a new feature; coordinates are 0-based with an exclusive end,
    # so (10, 50) appears as 11..50 in the GenBank file
    new_feature = SeqFeature.SeqFeature(
        location=SeqFeature.FeatureLocation(10, 50),
        type="misc_feature",
        qualifiers={"note": ["New Feature"]},
    )

    # Add the new feature to the record
    record.features.append(new_feature)
    records.append(record)

# Write all modified records in one call; writing inside the loop
# would overwrite the output file for every record
SeqIO.write(records, "modified_sequence.gb", "genbank")
  • Read a GenBank file using the SeqIO.parse() function.
  • Create a new feature using SeqFeature.SeqFeature.
  • Set the location, type, and qualifiers of the new feature.
  • Add the new feature to the record’s features list.
  • Write the modified record to a new GenBank file using SeqIO.write().
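
Feature coordinates follow Python slicing rules, which differ from the 1-based positions printed in a GenBank file. The sketch below illustrates this on a hypothetical 80-base in-memory record (the sequence and names are assumptions) by rendering the record as GenBank text and picking out the feature line.

```python
from Bio.Seq import Seq
from Bio.SeqFeature import FeatureLocation, SeqFeature
from Bio.SeqRecord import SeqRecord

# Hypothetical toy record; GenBank output needs a molecule_type annotation.
record = SeqRecord(
    Seq("ACGT" * 20),
    id="demo",
    name="demo",
    description="toy record",
    annotations={"molecule_type": "DNA"},
)

feature = SeqFeature(
    location=FeatureLocation(10, 50),
    type="misc_feature",
    qualifiers={"note": ["New Feature"]},
)
record.features.append(feature)

# Start is 0-based and end is exclusive, so the feature spans 50 - 10 bases.
feature_length = len(feature.location)

# Render the record as GenBank text without writing a file.
genbank_text = record.format("genbank")
feature_line = next(
    line.strip() for line in genbank_text.splitlines() if "misc_feature" in line
)
print(feature_length)
print(feature_line)
```

The printed feature line should show the location as 11..50, the 1-based inclusive form of the (10, 50) pair passed to FeatureLocation.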

Modifying a Feature

from Bio import SeqIO

genbank_file = "sequence.gb"
records = []

for record in SeqIO.parse(genbank_file, "genbank"):
    for feature in record.features:
        if feature.type == "CDS":
            # Modify the qualifiers of a CDS feature (values are lists of strings)
            feature.qualifiers["gene"] = ["ABC"]
            feature.qualifiers["product"] = ["ABC Protein"]
    records.append(record)

# Write all modified records in one call; writing inside the loop
# would overwrite the output file for every record
SeqIO.write(records, "modified_sequence.gb", "genbank")
  • Read a GenBank file using the SeqIO.parse() function.
  • Iterate over each feature in the record.
  • Check if the feature type is “CDS”.
  • Modify the qualifiers of the CDS feature by updating the values of specific qualifiers.
  • Write the modified record to a new GenBank file using SeqIO.write().
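
The example above relabels every CDS feature. When only some features should change, the update can be made conditional on an existing qualifier value. The following sketch assumes a hypothetical helper, rename_gene, and a made-up in-memory record; neither is part of Biopython.

```python
from Bio.Seq import Seq
from Bio.SeqFeature import FeatureLocation, SeqFeature
from Bio.SeqRecord import SeqRecord


def rename_gene(record, old_name, new_name):
    """Hypothetical helper: rename a gene on matching CDS features.

    Returns the number of features that were changed.
    """
    changed = 0
    for feature in record.features:
        if feature.type != "CDS":
            continue
        # Qualifier values are lists of strings; .get() avoids a KeyError
        # on CDS features that have no "gene" qualifier.
        if old_name in feature.qualifiers.get("gene", []):
            feature.qualifiers["gene"] = [new_name]
            changed += 1
    return changed


# Toy record with two CDS features and one unrelated feature.
record = SeqRecord(
    Seq("ACGT" * 10),
    id="demo",
    features=[
        SeqFeature(FeatureLocation(0, 9), type="CDS", qualifiers={"gene": ["XYZ"]}),
        SeqFeature(FeatureLocation(12, 21), type="CDS", qualifiers={"gene": ["DEF"]}),
        SeqFeature(FeatureLocation(24, 30), type="misc_feature", qualifiers={}),
    ],
)

n_changed = rename_gene(record, "XYZ", "ABC")
genes = [f.qualifiers.get("gene") for f in record.features]
print(n_changed, genes)
```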

Removing a Feature

from Bio import SeqIO

genbank_file = "sequence.gb"
records = []

for record in SeqIO.parse(genbank_file, "genbank"):
    # Iterate over a copy of the list: removing items from a list
    # while iterating over it directly skips elements
    for feature in list(record.features):
        if feature.type == "CDS" and feature.qualifiers.get("gene") == ["ABC"]:
            # Remove the feature from the record
            record.features.remove(feature)
    records.append(record)

# Write all modified records in one call; writing inside the loop
# would overwrite the output file for every record
SeqIO.write(records, "modified_sequence.gb", "genbank")
  • Read a GenBank file using the SeqIO.parse() function.
  • Iterate over each feature in the record.
  • Check if the feature type is “CDS” and if the “gene” qualifier has a value of “ABC”.
  • Remove the feature from the record’s features list using the remove() method.
  • Write the modified record to a new GenBank file using SeqIO.write().

Summary

  • Sequence features represent specific regions or elements within a biological sequence.
  • Biopython exposes features through record.features, where they can be read, added, modified, and removed.
  • Each feature has a type, a location, and a dictionary of qualifiers.

Please note that the code snippets provided assume that you have a valid GenBank file (“sequence.gb”) available for testing. You can replace the file name with your own sequence file to run the examples.
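
If no GenBank file is at hand, a small test file can be generated first. The following is a sketch only: write_demo_genbank is a hypothetical helper, and the sequence, the identifier "DEMO0001", and the gene name are invented for testing. It writes into a temporary directory so that no existing file is overwritten.

```python
import os
import tempfile

from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqFeature import FeatureLocation, SeqFeature
from Bio.SeqRecord import SeqRecord


def write_demo_genbank(path):
    """Hypothetical helper: write a one-record GenBank file with one CDS feature."""
    seq = Seq("ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG")
    record = SeqRecord(
        seq,
        id="DEMO0001",
        name="DEMO0001",
        description="Toy record for testing",
        # The GenBank writer requires a molecule_type annotation.
        annotations={"molecule_type": "DNA"},
    )
    record.features.append(
        SeqFeature(
            location=FeatureLocation(0, len(seq), strand=1),
            type="CDS",
            qualifiers={"gene": ["ABC"]},
        )
    )
    # SeqIO.write returns the number of records written.
    return SeqIO.write(record, path, "genbank")


demo_path = os.path.join(tempfile.mkdtemp(), "sequence.gb")
n_written = write_demo_genbank(demo_path)

# Read the file back to confirm that the feature survived the round trip.
roundtrip = SeqIO.read(demo_path, "genbank")
feature_types = [f.type for f in roundtrip.features]
print(n_written, feature_types)
```

Calling write_demo_genbank("sequence.gb") instead would create the file in the current working directory, where the examples above look for it.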
