Archive

Iron Chef SynBio

Bert Hubert recently published Reverse Engineering the source code of the BioNTech/Pfizer SARS-CoV-2 Vaccine and I (along with the rest of the internet) loved it! It provided a clear review of Pfizer’s mRNA vaccine written for a software engineering audience. Given my previous synthetic biology experience, I was excited to read about how their vaccine actually worked.

Bert linked to a document by the WHO containing the genetic sequence of the mRNA. In that sequence, the symbol “Ψ” represents 1-methyl-3′-pseudouridylyl. Bert explained that this was a clever substitution to help the mRNA escape our immune system’s detection. He also linked to a tweet about Karikó and Weissman which implied that they had discovered 1-methyl-3′-pseudouridylyl.

I was fascinated! I’d never heard of Ψ before. Digging a little deeper, I realized that the symbolic representation had confused me, and maybe other folks were confused too.

Karikó and Weissman published a lot of papers about pseudouridine, which they denoted with “Ψ.” Replacing uridine with pseudouridine in mRNA does indeed help the mRNA escape our immune system’s detection [1]. However, pseudouridine is not pseudouridylyl.

Looking at the WHO document that Bert linked, the two molecules are structurally similar but definitely not the same. I’m not any good at organic chemistry, but my guess is the extra phosphorus and oxygens are from the sugar phosphate backbone. But what’s with the extra methyl group (carbons and hydrogens) on the nitrogen? My intuition says that’s important.

[Figure: structure diagrams of pseudouridine (source: Wikipedia) and 1-methyl-3′-pseudouridylyl (source: WHO 11889)]

So what’s pseudouridylyl?

Why did Pfizer choose to use pseudouridylyl instead of pseudouridine? I had guessed that it must offer either a biological benefit (increasing effectiveness) or a manufacturing benefit (reducing cost). “Pseudouridylyl” (or even “uridylyl”) is not a common word either on the internet or in the literature. I found lots of references to similar chemicals, but couldn’t find literature linking any of them to immune system responses or mRNA synthesis.

And then came the a-ha moment!

In the WHO document, the author represented 1-methyl-3′-pseudouridylyl as “m1Ψ” (with “m1” indicating a modification). However, in the actual genetic sequence, the author chose to abbreviate “m1Ψ” to simply “Ψ.”

Now that I knew what I was looking for, I found an abundance of papers referencing m1Ψ. The canonical name seems to be “N(1)-methylpseudouridine.”

N(1)-methylpseudouridine looks to be an even better substitution than pseudouridine [2]. It also shares the convenient property of escaping immune system detection [3]. The N(1)-methylpseudouridine papers definitely build upon the work of Karikó and Weissman.

The diagrams show that they’re pretty close! See the methyl group?

[Figure: structure diagrams of N1-Methylpseudouridine-UTP (source: Jena Bioscience), 1-methylpseudouridine (source: Modomics), and 1-methyl-3′-pseudouridylyl (source: WHO 11889)]

I’m now fairly confident that Ψ in the Pfizer sequence is m1Ψ. They’re calling it “1-methyl-3′-pseudouridylyl,” but it’s also known as “N(1)-methylpseudouridine” or “1-methylpseudouridine.”
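
If you want to feed a sequence written in this notation into ordinary sequence tools, one option is to map Ψ back to the U it stands in for. Here is a minimal Python sketch of that idea. The fragment is made up for illustration (it is not copied from the vaccine sequence), and the function names are my own.

```python
# Hypothetical fragment in the WHO-style notation; NOT the real vaccine sequence.
who_style_fragment = "GAΨΨACAΨ"


def to_standard_rna(sequence):
    """Replace every Ψ (here meaning m1Ψ) with U, the nucleotide it substitutes for."""
    return sequence.replace("Ψ", "U")


def count_modified(sequence):
    """Count how many positions carry the Ψ symbol."""
    return sequence.count("Ψ")


print(to_standard_rna(who_style_fragment))  # GAUUACAU
print(count_modified(who_style_fragment))   # 3
```

The mapping only changes the notation. It throws away the information about which positions were modified, so count or record those positions first if you need them.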

References

[1] Karikó K, Muramatsu H, Welsh FA, Ludwig J, Kato H, Akira S, Weissman D. Incorporation of pseudouridine into mRNA yields superior nonimmunogenic vector with increased translational capacity and biological stability. Mol Ther. 2008 Nov;16(11):1833-40. doi: 10.1038/mt.2008.200. Epub 2008 Sep 16. PMID: 18797453; PMCID: PMC2775451.

[2] Parr CJC, Wada S, Kotake K, Kameda S, Matsuura S, Sakashita S, Park S, Sugiyama H, Kuang Y, Saito H. N1-Methylpseudouridine substitution enhances the performance of synthetic mRNA switches in cells. Nucleic Acids Res. 2020 Apr 6;48(6):e35. doi: 10.1093/nar/gkaa070.

[3] Andries O, Mc Cafferty S, De Smedt SC, Weiss R, Sanders NN, Kitada T. N(1)-methylpseudouridine-incorporated mRNA outperforms pseudouridine-incorporated mRNA by providing enhanced protein expression and reduced immunogenicity in mammalian cell lines and mice. J Control Release. 2015 Nov 10;217:337-44. doi: 10.1016/j.jconrel.2015.08.051. Epub 2015 Sep 3. PMID: 26342664.

I was doing some next-generation sequencing (NGS) analysis over the weekend, for the first time. As such, I had to get some of the common software tools like PEAR and bowtie. Their official sites were hosted by SourceForge, but I didn’t want to download the binaries from SourceForge ’cause I’m paranoid about malware. So, I compiled them myself.

The process turned out to be super easy!

They all have git repos:

https://github.com/xflouris/PEAR

https://github.com/BenLangmead/bowtie

https://github.com/BenLangmead/bowtie2

For example, for bowtie, you can do:

git clone https://github.com/BenLangmead/bowtie.git
cd bowtie
make

For bowtie, you need libtbb installed. For bowtie2, you need to compile with NO_TBB=1 (that is, run make NO_TBB=1 instead of plain make).
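
If you’d rather script the clone-and-make steps, a rough Python sketch could look like the following. The helper names (build_commands, clone_and_build) are hypothetical, and it assumes git and make are already on your PATH. NO_TBB=1 is passed to make as a command-line variable.

```python
import subprocess


def build_commands(repo_url, make_vars=()):
    """Return the checkout directory name plus the git and make argv lists."""
    name = repo_url.rstrip("/").rsplit("/", 1)[-1]
    if name.endswith(".git"):
        name = name[:-4]
    clone = ["git", "clone", repo_url, name]
    make = ["make", *make_vars]
    return name, clone, make


def clone_and_build(repo_url, make_vars=()):
    """Clone the repository, then run make inside the new directory."""
    name, clone, make = build_commands(repo_url, make_vars)
    subprocess.run(clone, check=True)
    subprocess.run(make, cwd=name, check=True)


# Show what would be run for bowtie2 without actually running it.
name, clone, make = build_commands(
    "https://github.com/BenLangmead/bowtie2.git", ["NO_TBB=1"]
)
print(" ".join(clone))
print(" ".join(make))
```

Calling clone_and_build with the same arguments runs the two commands for real and raises an error if either one fails.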

I’m pleasantly surprised because I remember the struggle of building open source projects when I was a young’un.

Just wanted to share!

Much like how you should use a csv library to generate csv files, you should use a library to generate GenBank files.

In Python, you can use Biopython!

Here’s a small recipe to get you started. It creates a GenBank file from a sequence, and even includes an annotation.

from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord
from Bio.SeqFeature import SeqFeature, FeatureLocation

# Create a sequence
sequence_string = "ggggaaaattttaaaaccccaaaa"
sequence_object = Seq(sequence_string)

# Create a record.
# Bio.Alphabet was removed in Biopython 1.78, so the molecule type is given
# as an annotation instead (the GenBank writer needs it).
record = SeqRecord(sequence_object,
                   id='123456789', # random accession number
                   name='Example',
                   description='An example GenBank file generated by BioPython',
                   annotations={'molecule_type': 'DNA'})

# Add annotation
feature = SeqFeature(FeatureLocation(start=3, end=12), type='misc_feature')
record.features.append(feature)

# Save as GenBank file (the with block closes the file when done)
with open('example.gb', 'w') as output_file:
    SeqIO.write(record, output_file, 'genbank')

The output:

LOCUS       Example                   24 bp    DNA              UNK 01-JAN-1980
DEFINITION  An example GenBank file generated by BioPython
ACCESSION   123456789
VERSION     123456789
KEYWORDS    .
SOURCE      .
  ORGANISM  .
            .
FEATURES             Location/Qualifiers
     misc_feature    4..12
ORIGIN
        1 ggggaaaatt ttaaaacccc aaaa
//
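
Notice that the feature was created with start=3 and end=12 but shows up as 4..12 in the file. As far as I understand it, Biopython locations follow Python slicing (0-based, end-exclusive), while GenBank locations are 1-based and inclusive. Here is a small plain-Python sketch of the conversion, with a helper name of my own choosing:

```python
sequence_string = "ggggaaaattttaaaaccccaaaa"


def to_genbank_location(start, end):
    """Convert a 0-based, end-exclusive span to GenBank's 1-based, inclusive form."""
    return f"{start + 1}..{end}"


start, end = 3, 12
print(to_genbank_location(start, end))  # 4..12
print(sequence_string[start:end])       # gaaaatttt
print(end - start)                      # 9
```

So the annotation covers nine bases, the same ones you get by slicing the sequence string with [3:12].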

Check out the really good Biopython documentation for more details!