Objectives
- Understand the importance of NCBI (National Center for Biotechnology Information) databases in bioinformatics.
- Learn about the available NCBI databases and their applications.
- Explore Biopython’s capabilities for accessing and retrieving data from NCBI databases.
Introduction to NCBI Databases
- The NCBI provides a collection of databases containing a wealth of biological data, including sequences, genes, proteins, and literature references.
- These databases are widely used by researchers for data retrieval and analysis.
Commonly Used NCBI Databases
- GenBank: A comprehensive database of DNA sequences.
- PubMed: A collection of biomedical literature references and abstracts.
- RefSeq: A curated database of reference sequences for genes, transcripts, and proteins.
- UniGene: A database of transcript clusters, providing a unified view of gene expression (retired by NCBI in 2019).
- Protein: A database of protein sequences and related information.
Accessing NCBI Databases with Biopython
- Biopython provides modules and functions to access and retrieve data from NCBI databases.
- The main module for NCBI database access in Biopython is Bio.Entrez.
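Bio.Entrez is a wrapper around NCBI's Entrez E-utilities web service, so each call ends up as an HTTP request. To make that concrete, here is a minimal sketch, using only the standard library, of the kind of URL a PubMed search resolves to. The helper name build_esearch_url is hypothetical, and the exact parameter set and ordering that Biopython sends may differ from what is shown.

```python
from urllib.parse import urlencode

# Base address of NCBI's E-utilities web service.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def build_esearch_url(db, term, retmax, email):
    """Return an ESearch URL for the given database and search term.

    Hypothetical helper for illustration only; Bio.Entrez builds and
    sends its own requests.
    """
    params = {
        "db": db,            # Entrez database name, e.g. "pubmed"
        "term": term,        # search expression
        "retmax": retmax,    # maximum number of IDs to return
        "tool": "biopython", # identifies the client software to NCBI
        "email": email,      # contact address for the person making the request
    }
    return EUTILS_BASE + "esearch.fcgi?" + urlencode(params)


url = build_esearch_url("pubmed", "biopython", 10, "your_email@example.com")
print(url)
```

Seeing the request spelled out also shows why the email address matters: it travels with every query so NCBI can contact you if there is a problem with your usage.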
Retrieving Data from NCBI Databases
    from Bio import Entrez

    # Provide your email address for Entrez
    Entrez.email = "your_email@example.com"

    # Search PubMed and parse the result
    handle = Entrez.esearch(db="pubmed", term="biopython", retmax=10)
    record = Entrez.read(handle)
    handle.close()

    # Print the retrieved PubMed IDs
    pubmed_ids = record["IdList"]
    print("PubMed IDs:")
    print(pubmed_ids)
- Set your email address for Entrez using Entrez.email.
- Use Entrez.esearch() to search a specific NCBI database (e.g., PubMed). The search returns the IDs of matching records, not the records themselves.
- Specify the database (“pubmed”), search terms (e.g., “biopython”), and the maximum number of IDs to return (retmax).
- Parse the search results using Entrez.read(), then read the matching IDs from the "IdList" entry.
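The parsed search result behaves like a dictionary, and "IdList" is only one of its entries. The "Count" entry reports the total number of matches, which can be larger than the number of IDs returned when retmax cuts the list short. The sketch below illustrates this with a hand-written stand-in for the parsed result, so it runs without a network connection. The helper summarize_search and the sample IDs are made up for illustration.

```python
def summarize_search(record):
    """Condense a parsed ESearch result into a few plain Python values."""
    ids = list(record["IdList"])
    return {
        "total_hits": int(record["Count"]),  # Count arrives as a string
        "returned": len(ids),                # limited by retmax
        "ids": ids,
    }


# Hand-written stand-in for what Entrez.read() returns; the IDs are made up.
sample_record = {
    "Count": "57",
    "RetMax": "3",
    "IdList": ["11111111", "22222222", "33333333"],
}

summary = summarize_search(sample_record)
print(f"{summary['returned']} of {summary['total_hits']} matching IDs returned")
```

With a real search you would pass the object returned by Entrez.read() in place of sample_record, and raise retmax if the total is larger than the number of IDs you received.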
Retrieving Full Records from NCBI Databases
    from Bio import Entrez, SeqIO

    # Provide your email address for Entrez
    Entrez.email = "your_email@example.com"

    # Retrieve a full record from GenBank
    handle = Entrez.efetch(db="nucleotide", id="NC_000913", rettype="gb", retmode="text")
    record = SeqIO.read(handle, "gb")
    handle.close()

    # Print the retrieved GenBank record
    print("GenBank Record:")
    print(record)
- Set your email address for Entrez using Entrez.email.
- Use Entrez.efetch() to retrieve full records from a specific NCBI database (e.g., GenBank).
- Specify the database (“nucleotide”), unique identifiers (e.g., “NC_000913”), and the desired output format (rettype and retmode).
- Parse the retrieved record using SeqIO.read(), which requires importing SeqIO from Bio alongside Entrez.
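SeqIO.read() expects exactly one record and raises an error otherwise, so fetching several records at once calls for SeqIO.parse() instead. The following is a rough sketch of fetching a list of IDs in batches. The helper names chunked and fetch_genbank_records are hypothetical, and the batch size of 100 is an arbitrary assumption rather than an NCBI requirement. Biopython is imported inside the fetching function so that the batching helper can be tried on its own.

```python
def chunked(ids, size):
    """Yield successive slices of ids, each holding at most size items."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def fetch_genbank_records(ids, email, batch_size=100):
    """Fetch GenBank records for the given IDs (requires network access).

    Hypothetical helper; batch_size is an assumed value.
    """
    from Bio import Entrez, SeqIO  # imported here so chunked() works without Biopython

    Entrez.email = email
    records = []
    for batch in chunked(ids, batch_size):
        # efetch accepts several IDs as one comma-separated string
        handle = Entrez.efetch(
            db="nucleotide", id=",".join(batch), rettype="gb", retmode="text"
        )
        try:
            # parse() iterates over every record in the response
            records.extend(SeqIO.parse(handle, "gb"))
        finally:
            handle.close()
    return records


# The batching step can be checked without contacting NCBI:
print(list(chunked(["a", "b", "c"], 2)))
```

A typical use would be to pass the "IdList" from a nucleotide search to fetch_genbank_records, together with your email address.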
Summary
- NCBI databases are valuable resources for biological data retrieval and analysis.
- Biopython’s Bio.Entrez module provides functions for accessing and retrieving data from NCBI databases.
- Use Biopython to search, retrieve, and analyze data from various NCBI databases.