Iron Chef SynBio

Much like how you should use a csv library to generate csv files, you should use a library to generate GenBank files.

In Python, you can use Biopython!

Here’s a small recipe to get you started. It creates a GenBank file from a sequence, and even includes an annotation.

from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord
from Bio.SeqFeature import SeqFeature, FeatureLocation

# Create a sequence
# (Bio.Alphabet was removed in Biopython 1.78; the molecule type now goes in the record's annotations)
sequence_string = "ggggaaaattttaaaaccccaaaa"
sequence_object = Seq(sequence_string)

# Create a record
record = SeqRecord(sequence_object,
                   id='123456789', # random accession number
                   name='Example', # shows up on the LOCUS line
                   description='An example GenBank file generated by BioPython',
                   annotations={'molecule_type': 'DNA'})

# Add annotation (0-based start, exclusive end, so this is written as 4..12)
feature = SeqFeature(FeatureLocation(start=3, end=12), type='misc_feature')
record.features.append(feature)

# Save as GenBank file ('example.gb' is just a placeholder name, use whatever path you like)
with open('example.gb', 'w') as output_file:
    SeqIO.write(record, output_file, 'genbank')

The output:

LOCUS       Example                   24 bp    DNA              UNK 01-JAN-1980
DEFINITION  An example GenBank file generated by BioPython
ACCESSION   123456789
VERSION     123456789
SOURCE      .
FEATURES             Location/Qualifiers
     misc_feature    4..12
        1 ggggaaaatt ttaaaacccc aaaa

Check out the really good Biopython documentation for more details!

I registered last minute for the Mammalian Synthetic Biology Workshop and it was awesome. It was really neat to see all the research everyone’s doing.

I got to chat with Ron and Mariola, from iGEM 2011. Definitely a blast from the past.

Ron and Mariola

Ron said he still shows my animation occasionally. Haha, sweet.

Irving Weissman gave a really cool stem cell keynote.


Alex said Irv is like the godfather of stem cell research. Very impressive!

I unexpectedly ran into Pete! I haven’t seen him in like maybe 6 years!


It was great catching up!

Alex also gave a talk! I really enjoyed it, not only because I’d never heard Alex give a talk before, but also because he had a clear and fun delivery style.


He thanked his wife, Amy, on the credits slide, which I thought was super cute.

I think it’s super cool when couples are involved in each other’s fields. Amy probably provides some extraordinary research feedback, just like how Harry gives me a lot of feedback for my conference talks.

Anyhoo, mSBW3 was a great event. I saw some old friends, learned some neat stuff, and in general had a good time.

One wish: I wish the slides from the talks were posted online somewhere. A great feature of software conferences is that a lot of talks are recorded and the talks’ slides are posted online. This way, the conference can benefit a greater community.

You’re probably familiar with the DNA bases A, C, G, T and how A is the complement of T and C is the complement of G, and vice versa.

A neat mental exercise is to find the complement of a degenerate base like R, which represents both A and G.

Suppose the complement of R is a base that represents the complement of A and the complement of G. The complement of A is T and the complement of G is C, so the complement of R is Y (a degenerate base that represents T and C).

Here’s a whole list of them. Try it out.

It’s fun to think of complement as a function.

You see neat patterns. Every degenerate base has exactly one complement, and taking the complement twice gets you back where you started, so complement is a bijective function. The fixed points of complement are S (represents C and G), W (represents A and T), and N (represents all the bases A, C, G, and T). In fact, a fixed point always represents an even number of bases, since it has to be built out of whole complementary pairs. This is true even if we expand the original A, C, G, T with fictional bases like Q and Z (I made Q and Z up), as long as the new bases pair up with each other.
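
If you’d rather let the computer do the mental exercise, here’s a minimal sketch in plain Python. It treats each IUPAC code as a set of bases and complements the set base by base. The names BASE_COMPLEMENT, IUPAC, CODE_FOR, and complement are my own for illustration, not from any library.

```python
# Complement of each plain base.
BASE_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

# Standard IUPAC nucleotide codes and the bases each one represents.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Reverse lookup: set of bases -> code.
CODE_FOR = {frozenset(bases): code for code, bases in IUPAC.items()}


def complement(code):
    """Complement a code by complementing every base it represents."""
    complemented = frozenset(BASE_COMPLEMENT[base] for base in IUPAC[code])
    return CODE_FOR[complemented]


# Codes that are their own complement.
fixed_points = sorted(code for code in IUPAC if complement(code) == code)

# Complementing twice should give back the original code.
is_involution = all(complement(complement(code)) == code for code in IUPAC)

print(complement("R"))   # Y
print(fixed_points)      # ['N', 'S', 'W']
print(is_involution)     # True
```

To play with made-up bases like Q and Z, add them to BASE_COMPLEMENT as a pair and add whatever degenerate codes you invent to IUPAC.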



I wrote a similar post for downloading sequences of yeast genes.

Basically I needed to do some analysis on the upstream promoter region of about 1200 human genes.

I wasn’t gonna download them one-by-one, and I didn’t want to get a database dump of the whole genome.

Luckily, there’s UCSC’s hgTables! (I love you guys)

So hit it up.

Paste in your list of genes.

Note: No commas at the end of lines

Submit to get your sequences!

You can even toggle between many options, like genomic DNA or protein sequence, or a certain number of bases upstream and downstream of the gene.

The output will be one giant FASTA file, so if you’re getting the sequences of a lot of genes, you’d better download the gzip, ’cause it’ll take forever to load in your browser (it’s on the order of 100 MB).

Another note: you might get more than one sequence per gene, depending on what tables, groups, and tracks you select.
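
Once you have the gzip, you don’t even need to unzip it to poke around. Here’s a rough sketch using only the standard library that streams records out of a gzipped FASTA file. The function names and the filename promoters.fa.gz are made up for illustration, and I’m not assuming anything about the header format since that depends on the table and track you picked.

```python
import gzip


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def read_fasta_gz(path):
    """Stream records from a gzipped FASTA file without unzipping it on disk."""
    with gzip.open(path, "rt") as handle:
        yield from read_fasta(handle)


# Hypothetical usage ("promoters.fa.gz" is a placeholder filename):
# for header, sequence in read_fasta_gz("promoters.fa.gz"):
#     print(header, len(sequence))
```

Because it yields one record at a time, the whole file never has to sit in memory, which is nice when you have more than one sequence per gene.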

Have fun!

The Saccharomyces Genome Database is a really nice tool to get yeast gene info. I used it for the first time this past week.

I had to retrieve the sequences of 50+ yeast genes, and individually downloading 50+ FASTA files seemed really tedious.

Eventually I figured out you could retrieve multiple genomic sequences using YeastMine.

The YeastMine template query feature accepts a comma-separated list of genes.

The resulting genomic DNA sequences can then be exported to CSVs.
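
If you want FASTA instead of CSV, a few lines of Python will convert it. This is only a sketch: I’m assuming the export has a header row, and the column names ("gene" and "sequence" in the example) are placeholders, so check what your export actually calls them. The csv_to_fasta function and the toy sequences are made up for illustration.

```python
import csv
import io


def csv_to_fasta(csv_text, name_column, sequence_column, width=60):
    """Convert CSV text with a header row into FASTA text.

    name_column and sequence_column are the header names to use;
    they depend on the export, so pass in whatever your file has.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    lines = []
    for row in reader:
        lines.append(">" + row[name_column])
        sequence = row[sequence_column]
        # Wrap the sequence at a fixed width, like most FASTA files do.
        lines.extend(sequence[i:i + width] for i in range(0, len(sequence), width))
    return "".join(line + "\n" for line in lines)


# Toy data, not real gene sequences.
example = "gene,sequence\nACT1,ATGGATTCT\nCDC28,ATGAGC\n"
print(csv_to_fasta(example, "gene", "sequence", width=4))
```

The csv module handles quoting for you, so fields containing commas won’t break the conversion.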

Seasoned SGD users probably already knew about this feature, but it took me a while to figure out (I did not have any luck searching for this topic).

Hopefully this post will save someone some time!

I’m super excited for the Wyss Retreat tomorrow!

It’ll be awesome to check out all the non-synbio projects happenin’ at the Wyss in a giant show-and-tell.

The event is invitation-only, and I believe the invitation said we can’t talk about what we see at the event.

But the organizers have provided a Twitter hashtag to encourage discussion.
