Biological data formats

Biopython Fundamentals

Introduction to common biological data formats supported by Biopython, including FASTA, GenBank, FASTQ, and PDB.
Structure and features of each data format.
Reading and writing sequences and other biological data using Biopython’s SeqIO module.

Introduction to Biological Data Formats

Biological data formats define how sequences, annotations, and structures are stored in files.
Bioinformatics and computational biology rely on several such formats, each suited to a particular kind of data.
Biopython can read and write many of them, mainly through its SeqIO module for sequence data and its Bio.PDB module for structures.

Common Biological Data Formats:

FASTA Format:
- Simple text-based format for representing nucleotide or protein sequences.
- Each record consists of a header line starting with ‘>’, followed by one or more lines of sequence data; a file may hold many records.
GenBank Format:
- Standard format for representing DNA or RNA sequences along with annotations.
- Contains sequence data, features, and metadata in a structured manner.
FASTQ Format:
- Used to store high-throughput sequencing data, including DNA reads and their quality scores.
- Each record has four parts: an identifier line starting with ‘@’, the sequence, a separator line starting with ‘+’, and a quality string with one character per base.
PDB Format:
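- Protein Data Bank format for representing three-dimensional structures of proteins and other macromolecules.
- Contains atomic coordinates, atom types, and other structural information.
- In Biopython, structures are parsed with the Bio.PDB module rather than SeqIO.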
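To make the FASTA and FASTQ record layouts described above concrete, the following minimal sketch parses one record of each format from in-memory text. The record names, sequences, and quality string are invented for illustration, and the FASTQ text is assumed to use the standard Sanger encoding that Biopython's "fastq" format name expects.

```python
from io import StringIO

from Bio import SeqIO

# Hypothetical FASTA record: a '>' header line followed by sequence lines.
fasta_text = ">seq1 example nucleotide sequence\nATGCGTAC\nGGTTAACC\n"

# Hypothetical FASTQ record: '@' identifier, sequence, '+' separator, qualities.
fastq_text = "@read1\nACGTACGT\n+\nIIII!!!!\n"

# StringIO lets SeqIO treat a string as if it were an open file.
fasta_record = next(SeqIO.parse(StringIO(fasta_text), "fasta"))
fastq_record = next(SeqIO.parse(StringIO(fastq_text), "fastq"))

# The sequence lines of a FASTA record are joined into one sequence.
fasta_sequence = str(fasta_record.seq)

# The "fastq" format decodes each quality character as its ASCII code minus 33.
qualities = fastq_record.letter_annotations["phred_quality"]

print(fasta_record.id, fasta_sequence)
print(fastq_record.id, qualities)
```

In this sketch the quality character ‘I’ decodes to a score of 40 and ‘!’ to a score of 0, so the second half of the read is marked as low quality.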

Reading and Writing Biological Data with `SeqIO`

Biopython’s SeqIO module provides a convenient way to read and write biological data in various formats.
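SeqIO.read() reads exactly one record from a file and raises a ValueError if the file contains no records or more than one.
SeqIO.parse() returns an iterator over the records in a file, yielding one SeqRecord at a time.
SeqIO.write() writes one or more records to a file in a specified format and returns the number of records written.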
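The three functions can be tried together in a short round trip. This is a minimal sketch, assuming a temporary directory for the output files; the record contents and the file names multi.fasta and single.fasta are hypothetical.

```python
import os
import tempfile

from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

# Hypothetical records used only for this illustration.
records = [
    SeqRecord(Seq("ATGCGTAC"), id="seq1", description="first example"),
    SeqRecord(Seq("GGTTAACC"), id="seq2", description="second example"),
]

with tempfile.TemporaryDirectory() as tmp_dir:
    multi_path = os.path.join(tmp_dir, "multi.fasta")
    single_path = os.path.join(tmp_dir, "single.fasta")

    # SeqIO.write() returns the number of records written.
    written = SeqIO.write(records, multi_path, "fasta")
    SeqIO.write(records[0], single_path, "fasta")

    # SeqIO.parse() iterates over every record in the file.
    parsed_ids = [record.id for record in SeqIO.parse(multi_path, "fasta")]

    # SeqIO.read() expects exactly one record in the file.
    single = SeqIO.read(single_path, "fasta")

    # On a file holding two records, SeqIO.read() raises ValueError.
    try:
        SeqIO.read(multi_path, "fasta")
        read_failed = False
    except ValueError:
        read_failed = True

print(written, parsed_ids, single.id, read_failed)
```

As a rule of thumb, SeqIO.parse() is the safer default, and SeqIO.read() is useful when a file is expected to hold a single record and anything else should be treated as an error.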

Reading Sequences from a FASTA File

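from Bio import SeqIO

fasta_file = "sequences.fasta"  # path to the input FASTA file

# SeqIO.parse() yields one SeqRecord per entry in the file
for record in SeqIO.parse(fasta_file, "fasta"):
    print(f"ID: {record.id}")
    print(f"Header: {record.description}")
    print(f"Sequence: {record.seq}")
    print()

The SeqIO.parse() function iterates over all the sequences in a FASTA file.
Each record is a SeqRecord object representing a single sequence, with attributes such as id (the first word of the header line), description (the full header line without the leading ‘>’), and seq (the sequence data).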
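The record attributes can be explored without a file on disk. The sketch below parses hypothetical FASTA text from memory, so the headers and sequences are made up for illustration, and then renders one record back to FASTA text with the format() method of SeqRecord.

```python
from io import StringIO

from Bio import SeqIO

# Hypothetical FASTA text with two records.
fasta_text = (
    ">seq1 Homo sapiens example\n"
    "ATGCGTAC\n"
    ">seq2 Mus musculus example\n"
    "GGTTAACCGG\n"
)

records = list(SeqIO.parse(StringIO(fasta_text), "fasta"))
first = records[0]

# id holds the first word of the header; description holds the whole header.
print(first.id)
print(first.description)

# len() works directly on the seq attribute.
lengths = {record.id: len(record.seq) for record in records}
print(lengths)

# format() renders a record back to text in the requested format.
fasta_again = first.format("fasta")
print(fasta_again)
```

Collecting the records into a list is convenient for small inputs; for large files, iterating over SeqIO.parse() directly avoids holding every record in memory.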
