Reading and Writing Sequences

Introduction to Biopython’s SeqIO Module

  • Biopython’s SeqIO module provides a powerful and flexible interface for reading and writing biological sequences.
  • It supports various file formats commonly used in bioinformatics, including FASTA, GenBank, and more.
  • SeqIO simplifies the handling of sequences, allowing easy access to sequence data and associated metadata.

Reading Sequences with SeqIO

  • SeqIO provides methods to read sequences from files in different formats.
  • SeqIO.read() reads exactly one sequence record from a file.
  • SeqIO.parse() reads multiple sequence records from a file.
from Bio import SeqIO
file_path = "sequence.fasta"
record = SeqIO.read(file_path, "fasta")
print("Header:", record.id)
print("Sequence:", record.seq)
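  • SeqIO.read() expects the file to contain exactly one record and raises a ValueError if it finds none or more than one.
  • The file_path specifies the path to the sequence file, and “fasta” indicates the file format.
  • The returned record object is a SeqRecord: record.seq holds the sequence, record.id holds the identifier (the first word of the FASTA header line), and record.description holds the full header line.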
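The single-record behaviour described above can be tried without any files on disk. The following is a minimal sketch, assuming made-up FASTA text held in io.StringIO handles (the record names and sequences are hypothetical); it relies on SeqIO.read() accepting an open handle as well as a file path.

```python
from io import StringIO
from Bio import SeqIO

# Hypothetical in-memory FASTA data standing in for files on disk.
single = StringIO(">seq1 first example\nATGCATGC\n")
double = StringIO(">seq1 first example\nATGCATGC\n>seq2 second example\nGGCCTTAA\n")

# Exactly one record: SeqIO.read() returns a SeqRecord.
record = SeqIO.read(single, "fasta")
record_id = record.id                    # first word of the header line
record_description = record.description  # full header line without ">"
sequence_length = len(record.seq)

# Two records: SeqIO.read() refuses and raises ValueError.
try:
    SeqIO.read(double, "fasta")
    read_failed = False
except ValueError:
    read_failed = True

print(record_id, record_description, sequence_length, read_failed)
```

When a file may hold more than one record, SeqIO.parse() is the safer choice.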

Reading Multiple Sequences

from Bio import SeqIO
file_path = "sequences.fasta"
records = SeqIO.parse(file_path, "fasta")
for record in records:
    print("Header:", record.id)
    print("Sequence:", record.seq)
    print()
  • SeqIO.parse() reads multiple sequence records from a file.
  • The file_path specifies the path to the sequence file, and “fasta” indicates the file format.
  • The returned records object is an iterator that can be looped over to access each sequence record.
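Because SeqIO.parse() returns an iterator, the records can be consumed only once per call. A minimal sketch of two common ways to keep them around, assuming hypothetical in-memory FASTA text in place of a real file: list() collects the records in file order, and SeqIO.to_dict() indexes them by record.id.

```python
from io import StringIO
from Bio import SeqIO

# Hypothetical FASTA text; in practice this would be a file path.
fasta_text = ">seq1\nATGC\n>seq2\nGGCCTT\n"

# Collect every record into a list so it can be traversed repeatedly.
records = list(SeqIO.parse(StringIO(fasta_text), "fasta"))
lengths = [len(r.seq) for r in records]

# Build a dictionary keyed by record.id for lookup by identifier.
by_id = SeqIO.to_dict(SeqIO.parse(StringIO(fasta_text), "fasta"))
seq2 = str(by_id["seq2"].seq)

print(lengths, seq2)
```

Both approaches load all records into memory, so for very large files looping over the iterator directly is usually preferable.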

Writing Sequences with SeqIO

  • SeqIO.write() is used to write sequences to a file in a specified format.
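  • The function takes three arguments: one SeqRecord (or an iterable of SeqRecord objects), an output file name or open file handle, and the format name. It returns the number of records written.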
from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord
sequence = "ATCGATCGATCG"
record = SeqRecord(Seq(sequence), id="Seq1", description="Sample sequence")
output_file = "output.fasta"
SeqIO.write(record, output_file, "fasta")
  1. Import the necessary modules from Biopython: SeqIO, Seq, and SeqRecord.
  2. Define the DNA sequence as a string: sequence = "ATCGATCGATCG".
  3. Create a SeqRecord object using the sequence string, and provide an ID and description for the sequence.
  4. Specify the name of the output file in which you want to save the sequence: output_file = "output.fasta".
  5. Use the SeqIO.write() function to write the sequence record to the output file in FASTA format, using the “fasta” format specifier.

After the script runs, the output.fasta file will be created in the current working directory, containing the specified sequence in FASTA format.
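SeqIO.write() also accepts a list of records, not just a single one. The sketch below is an illustration under stated assumptions: the second record is invented for the example, and the output goes to an in-memory io.StringIO handle instead of a file so that the result can be inspected directly.

```python
from io import StringIO
from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

# Two sample records; the second one is hypothetical.
records = [
    SeqRecord(Seq("ATCGATCGATCG"), id="Seq1", description="Sample sequence"),
    SeqRecord(Seq("GGCCTTAA"), id="Seq2", description="Another sample"),
]

# Write to an in-memory handle; a file name such as "output.fasta" works too.
buffer = StringIO()
count = SeqIO.write(records, buffer, "fasta")  # number of records written
fasta_text = buffer.getvalue()

print(count)
print(fasta_text)
```

The returned count is a quick sanity check that every record reached the output.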
