How To Create GenBank Files with Biopython

Just as you should use a CSV library to generate CSV files, you should use a library to generate GenBank files.

In Python, you can use Biopython!

Here’s a small recipe to get you started. It creates a GenBank file from a sequence, and even includes an annotation.

from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord
from Bio.SeqFeature import SeqFeature, FeatureLocation

# Create a sequence
sequence_string = "ggggaaaattttaaaaccccaaaa"
sequence_object = Seq(sequence_string)

# Create a record
record = SeqRecord(sequence_object,
                   id='123456789', # random accession number
                   name='Example',
                   description='An example GenBank file generated by BioPython')

# Bio.Alphabet was removed in Biopython 1.78; the GenBank writer now
# needs the molecule type as a record annotation instead
record.annotations['molecule_type'] = 'DNA'

# Add annotation (0-based, end-exclusive: written as 4..12 in the file)
feature = SeqFeature(FeatureLocation(start=3, end=12), type='misc_feature')
record.features.append(feature)

# Save as GenBank file
with open('example.gb', 'w') as output_file:
    SeqIO.write(record, output_file, 'genbank')

The output:

LOCUS       Example                   24 bp    DNA              UNK 01-JAN-1980
DEFINITION  An example GenBank file generated by BioPython
ACCESSION   123456789
VERSION     123456789
KEYWORDS    .
SOURCE      .
  ORGANISM  .
            .
FEATURES             Location/Qualifiers
     misc_feature    4..12
ORIGIN
        1 ggggaaaatt ttaaaacccc aaaa
//
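
Notice that the feature was created with start=3 and end=12 but shows up as 4..12. Biopython locations are 0-based and end-exclusive, like Python slices, while GenBank files use 1-based, inclusive coordinates. Here is a minimal sketch that checks this by reading the record back, assuming Biopython 1.78 or newer. Writing to an in-memory StringIO instead of example.gb, and the variable names, are my own choices for illustration.

```python
from io import StringIO

from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqFeature import FeatureLocation, SeqFeature
from Bio.SeqRecord import SeqRecord

# Build the same record as in the recipe
record = SeqRecord(Seq("ggggaaaattttaaaaccccaaaa"),
                   id="123456789",
                   name="Example",
                   description="An example GenBank file generated by BioPython")
record.annotations["molecule_type"] = "DNA"
record.features.append(
    SeqFeature(FeatureLocation(start=3, end=12), type="misc_feature"))

# Write to an in-memory handle instead of a file on disk (illustrative choice)
handle = StringIO()
SeqIO.write(record, handle, "genbank")
genbank_text = handle.getvalue()

# Read the record back and inspect the feature
handle.seek(0)
parsed = SeqIO.read(handle, "genbank")
parsed_feature = parsed.features[0]

feature_start = int(parsed_feature.location.start)
feature_end = int(parsed_feature.location.end)
feature_sequence = str(parsed_feature.extract(parsed.seq))

print(feature_start, feature_end)  # 3 12, the same 0-based values we put in
print(feature_sequence)            # the 9 bases covered by the feature
```

So the conversion to and from GenBank coordinates happens inside Biopython, and you can keep thinking in Python slice terms on both sides.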

Check out the really good Biopython documentation for more details!
