
Coding Bits

I was doing some next generation sequencing (NGS) analysis over the weekend, for the first time. As such, I had to get some of the common software tools like PEAR and bowtie. Their official sites were hosted by SourceForge, but I didn’t want to download the binaries from SourceForge ’cause I’m paranoid about malware. So, I compiled them myself.

The process turned out to be super easy!

They all have git repos:

https://github.com/xflouris/PEAR

https://github.com/BenLangmead/bowtie

https://github.com/BenLangmead/bowtie2

For example, for bowtie, you can do:

git clone https://github.com/BenLangmead/bowtie.git
cd bowtie
make

For bowtie, you need libtbb (the Threading Building Blocks library) installed. For bowtie2, you need to build without TBB by running make NO_TBB=1 instead of plain make.
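If you build these repositories more than once, the clone-and-make steps can be scripted. Below is a rough Python sketch that covers bowtie and bowtie2 only, because those are the two builds described above. It assumes git, make, a C++ toolchain and, for bowtie, libtbb are already installed. The names BUILDS, build_steps and build_all are made up for illustration. With dry_run=True it only returns the commands and does not run them.

```python
from __future__ import annotations

import subprocess
from pathlib import Path

# Hypothetical table: repository URL -> extra arguments for `make`.
# NO_TBB=1 is the bowtie2 flag from the post; bowtie uses plain `make` and needs libtbb.
BUILDS = {
    "https://github.com/BenLangmead/bowtie.git": [],
    "https://github.com/BenLangmead/bowtie2.git": ["NO_TBB=1"],
}


def build_steps(url: str, make_args: list[str], workdir: Path) -> list[tuple[list[str], Path]]:
    """Return (command, directory) pairs: clone into workdir, then run make in the clone."""
    name = url.rstrip("/").rsplit("/", 1)[-1]
    if name.endswith(".git"):
        name = name[: -len(".git")]  # git clone names the folder without ".git"
    return [
        (["git", "clone", url], workdir),
        (["make", *make_args], workdir / name),
    ]


def build_all(workdir: Path, dry_run: bool = False) -> list[tuple[list[str], Path]]:
    """Clone and build every repository in BUILDS (or just list the steps if dry_run)."""
    steps = [step for url, args in BUILDS.items() for step in build_steps(url, args, workdir)]
    if not dry_run:
        workdir.mkdir(parents=True, exist_ok=True)
        for cmd, cwd in steps:
            subprocess.run(cmd, cwd=cwd, check=True)  # stop at the first failing step
    return steps
```

Calling build_all(Path("src"), dry_run=True) first prints nothing but lets you check the plan before anything is cloned.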

I’m pleasantly surprised because I remember the struggle of building open source projects when I was a young’un.

Just wanted to share!

A while ago, I learned about this neat trick from Harry.

You can write your resume in Markdown and convert it to PDF using Pandoc!

Since Markdown is plain text, you can version control it with git and create a branch for each company you’re applying to. You can even share the repository with someone you trust, and they can review the commits on GitHub.

Pandoc uses LaTeX to render the PDF, so the result looks very “textbook formal”.

All you have to do is:

pandoc resume.md -s -o resume.pdf


Try it out!

You can do this for all sorts of documents, not just resumes.
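To convert a whole folder of Markdown documents at once, you could wrap the same pandoc call in a small Python script. This is only a sketch. It assumes pandoc and a LaTeX engine are on your PATH, and the function names pandoc_command and convert_all are hypothetical. It passes exactly the flags from the command above (-s -o) and adds nothing else.

```python
from __future__ import annotations

import subprocess
from pathlib import Path


def pandoc_command(source: Path, output_dir: Path) -> list[str]:
    """Build the same call as `pandoc file.md -s -o file.pdf` for one Markdown file."""
    target = output_dir / source.with_suffix(".pdf").name
    return ["pandoc", str(source), "-s", "-o", str(target)]


def convert_all(source_dir: Path, output_dir: Path, dry_run: bool = False) -> list[list[str]]:
    """Convert every *.md file in source_dir to a PDF in output_dir.

    With dry_run=True the commands are only returned, not executed.
    """
    output_dir.mkdir(parents=True, exist_ok=True)
    commands = [pandoc_command(md, output_dir) for md in sorted(source_dir.glob("*.md"))]
    if not dry_run:
        for cmd in commands:
            # check=True raises CalledProcessError if pandoc (or LaTeX) fails
            subprocess.run(cmd, check=True)
    return commands
```

For example, convert_all(Path("docs"), Path("pdf")) would render docs/resume.md to pdf/resume.pdf.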

Just as you should use a CSV library to generate CSV files, you should use a library to generate GenBank files.

In Python, you can use Biopython!

Here’s a small recipe to get you started. It creates a GenBank file from a sequence and even includes an annotation.

from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord
from Bio.SeqFeature import SeqFeature, FeatureLocation

# Create a sequence
# (Bio.Alphabet was removed in Biopython 1.78, so no alphabet is passed)
sequence_string = "ggggaaaattttaaaaccccaaaa"
sequence_object = Seq(sequence_string)

# Create a record
record = SeqRecord(sequence_object,
                   id='123456789',  # random accession number
                   name='Example',
                   description='An example GenBank file generated by BioPython')

# The GenBank writer needs the molecule type (it used to come from the alphabet)
record.annotations['molecule_type'] = 'DNA'

# Add annotation (0-based, end-exclusive coordinates; written out as 4..12)
feature = SeqFeature(FeatureLocation(start=3, end=12), type='misc_feature')
record.features.append(feature)

# Save as GenBank file; the with-block makes sure the file is flushed and closed
with open('example.gb', 'w') as output_file:
    SeqIO.write(record, output_file, 'genbank')

The output:

LOCUS       Example                   24 bp    DNA              UNK 01-JAN-1980
DEFINITION  An example GenBank file generated by BioPython
ACCESSION   123456789
VERSION     123456789
KEYWORDS    .
SOURCE      .
  ORGANISM  .
            .
FEATURES             Location/Qualifiers
     misc_feature    4..12
ORIGIN
        1 ggggaaaatt ttaaaacccc aaaa
//
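A quick way to check the result is to parse it back with SeqIO.read. The sketch below assumes a recent Biopython (1.78 or later, hence the molecule_type annotation). It writes to an in-memory StringIO handle instead of example.gb so that it doesn't touch the disk. It also shows how the zero-based FeatureLocation(3, 12) maps to the 1-based, inclusive 4..12 in the file.

```python
from io import StringIO

from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqFeature import SeqFeature, FeatureLocation
from Bio.SeqRecord import SeqRecord

record = SeqRecord(Seq("ggggaaaattttaaaaccccaaaa"), id="123456789", name="Example",
                   description="An example GenBank file generated by BioPython")
record.annotations["molecule_type"] = "DNA"  # required by the GenBank writer in Biopython >= 1.78
# 0-based, end-exclusive location: bases 4 through 12 in GenBank's 1-based numbering
record.features.append(SeqFeature(FeatureLocation(3, 12), type="misc_feature"))

# Write to an in-memory handle, rewind it, then parse it back
handle = StringIO()
SeqIO.write(record, handle, "genbank")
handle.seek(0)
parsed = SeqIO.read(handle, "genbank")

# The parser converts 4..12 back to the same 0-based start/end
feature = parsed.features[0]
print(feature.type, int(feature.location.start), int(feature.location.end))
# Slice out the annotated bases from the parsed sequence
print(feature.extract(parsed.seq))
```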

Check out the really good Biopython documentation for more details!