
Iron Chef SynBio

After reading Bert Hubert’s Reverse Engineering the source code of the BioNTech/Pfizer SARS-CoV-2 Vaccine and figuring out what 1-methyl-3′-pseudouridylyl was, I ended up reading a lot of papers.

Since I was only a very junior biologist and am not familiar with this area of biology, I didn’t really know where to start. I mostly followed headlines and citations. Bert provided a great jumping-off point by referencing Karikó and Weissman. Their story is fascinating.

In 1998, Karikó wrote a paper on a method to deliver DNA or mRNA to cells more efficiently (Phosphate-enhanced transfection of cationic lipid-complexed mRNA and plasmid DNA). This method doesn’t involve any uracil substitutions; instead, it modifies the chemical mixture containing the DNA or mRNA before delivery. This doesn’t do much on its own, but they used these protocols to speed up their later research.

In 2005, Karikó and Weissman published a paper on replacing uracil with pseudouridine in mRNA. The modified mRNA doesn’t trigger Toll-like receptors (Suppression of RNA Recognition by Toll-like Receptors: The Impact of Nucleoside Modification and the Evolutionary Origin of RNA). Toll-like receptors are sensor proteins expressed primarily by cells of the immune system, so when the modified mRNA doesn’t trigger them, the immune system doesn’t detect it and therefore doesn’t destroy it.

To put this in software engineering terms, imagine a firewall (the immune system) with a rule (the Toll-like receptor) examining incoming requests (the mRNA) and blocking any with a certain parameter (mRNA containing uracil). Modifying the mRNA by replacing the uracil with pseudouridine is the equivalent of dropping that parameter so that the request passes that rule. The firewall keeps working, and other rules may still block the request, but at least it’s passed that one.
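To make the metaphor concrete, here’s a toy Python sketch of that single firewall rule. The function names and the idea of a rule that string-matches on “U” are my own illustrative assumptions; real Toll-like receptors recognize RNA biochemically, not by scanning text.

```python
# Toy model of the firewall metaphor. Purely illustrative (hypothetical names):
# real Toll-like receptor recognition is biochemical, not a string match.

def tlr_rule_blocks(mrna: str) -> bool:
    """Hypothetical 'firewall rule': block any request containing uracil (U)."""
    return "U" in mrna

def substitute_pseudouridine(mrna: str) -> str:
    """Replace every uracil (U) with pseudouridine, written as the symbol Ψ."""
    return mrna.replace("U", "Ψ")

request = "AUGGCUUAA"  # hypothetical mRNA fragment
modified = substitute_pseudouridine(request)

print(tlr_rule_blocks(request))   # True: the rule sees uracil
print(modified)                   # AΨGGCΨΨAA
print(tlr_rule_blocks(modified))  # False: this one rule no longer fires
```

Note that nothing else about the “firewall” changes: only this one check stops matching, which is the point of the metaphor.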

In 2008, Karikó and Weissman discovered that by substituting pseudouridine for uracil in mRNA they could reduce the immune system response and simultaneously make the mRNA more effective at expressing the protein it encodes (Incorporation of Pseudouridine Into mRNA Yields Superior Nonimmunogenic Vector With Increased Translational Capacity and Biological Stability).

In our software metaphor, that’s analogous to passing all of the firewall rules!*

This is a really exciting discovery for mRNA vaccines!

I was surprised to learn that pseudouridine sometimes replaces uracil in “natural” mRNA. I’d thought of RNA as being built from adenine, cytosine, guanine, and uracil, but researchers have known since 1951 that pseudouridine also occurs in RNA (the original paper is Nucleoside-5′-Phosphates from Ribonucleic Acid, but I couldn’t find a copy, so I’m relying on a citation in Pseudouridine Formation, the Most Common Transglycosylation in RNA).

The early discovery of pseudouridine and the much later discovery of its usefulness in mRNA vaccines remind me a lot of the CRISPR story. The repeated sequences that make up CRISPR were first described in bacteria in 1987 and only later recognized as part of a bacterial immune system, but Doudna and Charpentier turned CRISPR into a powerful DNA editing tool in 2012 and won the 2020 Nobel Prize in Chemistry for it. Researchers stand on the shoulders of giants!

If you’re interested in a more technical history of pseudouridine, Pseudouridine in RNA: What, Where, How, and Why is a good summary from 2000.

* In biology, very few things happen 100% of the time. Since biological systems are often stochastic, reactions are measured by percent effectiveness compared to something else. I’m mentioning this here so you don’t worry that we have no immune defenses against mRNA bioweapons and the end is nigh.

Bert Hubert recently published Reverse Engineering the source code of the BioNTech/Pfizer SARS-CoV-2 Vaccine and I (along with the rest of the internet) loved it! It provided a clear review of Pfizer’s mRNA vaccine written for a software engineering audience. Given my previous synthetic biology experience, I was excited to read about how their vaccine actually worked.

Bert linked to a document by the WHO containing the genetic sequence of the mRNA. In that sequence, the symbol “Ψ” represents 1-methyl-3′-pseudouridylyl. Bert explained that this was a clever substitution to help the mRNA escape our immune system’s detection. He also linked to a tweet about Karikó and Weissman which implied that they had discovered 1-methyl-3′-pseudouridylyl.

I was fascinated! I’d never heard of Ψ before. Digging in a little deeper, I realized that the symbolic representation had confused me, and maybe it had confused other folks too.

Karikó and Weissman published a lot of papers about pseudouridine, which they denoted with “Ψ.” Replacing uracil with pseudouridine in mRNA does indeed help the mRNA escape our immune system’s detection [1]. However, pseudouridine is not pseudouridylyl.

When I compared pseudouridine with the structure in the WHO document that Bert linked, I saw that the two molecules are structurally similar but definitely not the same. I’m not any good at organic chemistry, but my guess is that the extra phosphorus and oxygens come from the sugar-phosphate backbone. But what’s with the extra methyl group (a carbon and three hydrogens) on the nitrogen? My intuition says that’s important.

[Figure: structures of pseudouridine (source: Wikipedia) and 1-methyl-3′-pseudouridylyl (source: WHO 11889)]

So what’s pseudouridylyl?

Why did Pfizer choose to use pseudouridylyl instead of pseudouridine? I had guessed that it must offer either a biological benefit (increasing effectiveness) or a manufacturing benefit (reducing cost). “Pseudouridylyl” (or even “uridylyl”) is not a common word either on the internet or in the literature. I found lots of references to similar chemicals, but couldn’t find literature linking any of them to immune system responses or mRNA synthesis.

And then came the a-ha moment!

In the WHO document, the author represented 1-methyl-3′-pseudouridylyl as “m1Ψ” (with “m1” indicating a methyl group at position 1). However, in the actual genetic sequence, the author chose to abbreviate “m1Ψ” to simply “Ψ.”
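Here’s a small sketch of what that abbreviation means when reading a sequence. The fragment below is made up for illustration and is not taken from the WHO document; the only assumption is the one above, that every “Ψ” in the sequence stands for m1Ψ.

```python
# Hypothetical fragment in the WHO-style notation, where "Ψ" is shorthand
# for m1Ψ (1-methyl-3′-pseudouridylyl). NOT the real vaccine sequence.
abbreviated = "GAΨCCΨAG"

def expand_notation(seq: str) -> list[str]:
    """Split a sequence into per-nucleotide tokens, spelling Ψ out as m1Ψ."""
    return ["m1Ψ" if base == "Ψ" else base for base in seq]

def to_unmodified_rna(seq: str) -> str:
    """Map each modified position back to plain uracil (U)."""
    return seq.replace("Ψ", "U")

print(expand_notation(abbreviated))
# ['G', 'A', 'm1Ψ', 'C', 'C', 'm1Ψ', 'A', 'G']
print(to_unmodified_rna(abbreviated))  # GAUCCUAG
print(abbreviated.count("Ψ"))          # 2 modified positions
```

Reading the sequence with the abbreviation in mind, each Ψ marks a spot where an unmodified mRNA would have a U.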

Now that I knew what I was looking for, I found an abundance of papers referencing m1Ψ. The canonical name seems to be “N(1)-methylpseudouridine.”

N(1)-methylpseudouridine looks to be an even better substitution than pseudouridine [2]. It also shares the convenient property of escaping immune system detection [3]. The N(1)-methylpseudouridine papers definitely build upon the work of Karikó and Weissman.

The diagrams show that they’re pretty close! See the methyl group?

[Figure: structures of N1-methylpseudouridine-UTP (source: Jena Bioscience), 1-methylpseudouridine (source: Modomics), and 1-methyl-3′-pseudouridylyl (source: WHO 11889)]

I’m now fairly confident that Ψ in the Pfizer sequence is m1Ψ. They’re calling it “1-methyl-3′-pseudouridylyl,” but it’s also known as “N(1)-methylpseudouridine” or “1-methylpseudouridine.”

References

[1] Karikó K, Muramatsu H, Welsh FA, Ludwig J, Kato H, Akira S, Weissman D. Incorporation of pseudouridine into mRNA yields superior nonimmunogenic vector with increased translational capacity and biological stability. Mol Ther. 2008 Nov;16(11):1833-40. doi: 10.1038/mt.2008.200. Epub 2008 Sep 16. PMID: 18797453; PMCID: PMC2775451.

[2] Parr CJC, Wada S, Kotake K, Kameda S, Matsuura S, Sakashita S, Park S, Sugiyama H, Kuang Y, Saito H. N1-Methylpseudouridine substitution enhances the performance of synthetic mRNA switches in cells. Nucleic Acids Res. 2020 Apr 6;48(6):e35. doi: 10.1093/nar/gkaa070.

[3] Andries O, Mc Cafferty S, De Smedt SC, Weiss R, Sanders NN, Kitada T. N(1)-methylpseudouridine-incorporated mRNA outperforms pseudouridine-incorporated mRNA by providing enhanced protein expression and reduced immunogenicity in mammalian cell lines and mice. J Control Release. 2015 Nov 10;217:337-44. doi: 10.1016/j.jconrel.2015.08.051. Epub 2015 Sep 3. PMID: 26342664.

I was doing some next-generation sequencing (NGS) analysis for the first time over the weekend, so I had to get some of the common software tools like PEAR and bowtie. Their official sites are hosted on SourceForge, but I didn’t want to download the binaries from SourceForge ’cause I’m paranoid about malware. So, I compiled them myself.

The process turned out to be super easy!

They all have git repos:

https://github.com/xflouris/PEAR

https://github.com/BenLangmead/bowtie

https://github.com/BenLangmead/bowtie2

For example, for bowtie, you can do:

git clone https://github.com/BenLangmead/bowtie.git
cd bowtie
make

For bowtie, you need libtbb, and for bowtie2, you need to compile with NO_TBB=1.
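If you rebuild these often, a small Python wrapper can script the same steps. This is a sketch of my own, not part of either project: it only covers bowtie and bowtie2, uses `make -C <dir>` in place of `cd` followed by `make`, assumes git and make are on your PATH (and libtbb is installed for bowtie), and defaults to a dry run that just prints the commands.

```python
import subprocess

# Hypothetical helper. bowtie2 gets NO_TBB=1 per the note above;
# bowtie is built with a plain make and expects libtbb to be installed.
BUILDS = {
    "bowtie": ("https://github.com/BenLangmead/bowtie.git", []),
    "bowtie2": ("https://github.com/BenLangmead/bowtie2.git", ["NO_TBB=1"]),
}

def build_commands(tool: str) -> list[list[str]]:
    """Return the clone and make commands for one tool, without running them."""
    url, make_args = BUILDS[tool]
    return [
        ["git", "clone", url, tool],
        ["make", "-C", tool, *make_args],  # same as: cd <tool> && make ...
    ]

def build(tool: str, dry_run: bool = True) -> list[list[str]]:
    """Print each command; unless dry_run, also run it and stop on the first failure."""
    commands = build_commands(tool)
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return commands

if __name__ == "__main__":
    build("bowtie2")  # dry run: prints the commands only
```

Call `build("bowtie2", dry_run=False)` to actually clone and compile.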

I’m pleasantly surprised because I remember the struggle of building open source projects when I was a young’un.

Just wanted to share!

Much like how you should use a csv library to generate csv files, you should use a library to generate GenBank files.

In Python, you can use Biopython!

Here’s a small recipe to get you started. It creates a GenBank file from a sequence, and even includes an annotation.

from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord
from Bio.SeqFeature import SeqFeature, FeatureLocation

# Create a sequence (Bio.Alphabet was removed in Biopython 1.78,
# so the sequence no longer carries an alphabet)
sequence_string = "ggggaaaattttaaaaccccaaaa"
sequence_object = Seq(sequence_string)

# Create a record
record = SeqRecord(sequence_object,
                   id='123456789',  # placeholder accession number
                   name='Example',
                   description='An example GenBank file generated by BioPython')

# The GenBank writer requires a molecule type (this replaces IUPAC.unambiguous_dna)
record.annotations['molecule_type'] = 'DNA'

# Add annotation (0-based, half-open coordinates: bases 4 to 12 in GenBank terms)
feature = SeqFeature(FeatureLocation(start=3, end=12), type='misc_feature')
record.features.append(feature)

# Save as GenBank file; the with block closes the file when done
with open('example.gb', 'w') as output_file:
    SeqIO.write(record, output_file, 'genbank')

The output:

LOCUS       Example                   24 bp    DNA              UNK 01-JAN-1980
DEFINITION  An example GenBank file generated by BioPython
ACCESSION   123456789
VERSION     123456789
KEYWORDS    .
SOURCE      .
  ORGANISM  .
            .
FEATURES             Location/Qualifiers
     misc_feature    4..12
ORIGIN
        1 ggggaaaatt ttaaaacccc aaaa
//
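Notice that the feature was created with start=3, end=12 but shows up in the file as 4..12. Biopython locations follow Python’s 0-based, half-open slicing convention, while GenBank uses 1-based, inclusive coordinates. Here’s a minimal pure-Python sketch of that conversion; the function name is my own and not part of Biopython’s API.

```python
def to_genbank_location(start: int, end: int) -> str:
    """Convert a 0-based, half-open [start, end) interval (the Python/Biopython
    convention) into GenBank's 1-based, inclusive location string."""
    if not 0 <= start < end:
        raise ValueError("expected 0 <= start < end")
    if end - start == 1:
        return str(start + 1)  # a single base is written as just its position
    return f"{start + 1}..{end}"

sequence = "ggggaaaattttaaaaccccaaaa"
start, end = 3, 12  # the values passed to FeatureLocation in the recipe

print(to_genbank_location(start, end))  # 4..12
print(sequence[start:end])              # gaaaatttt (the 9 bases the feature covers)
```

Keeping this off-by-one in mind saves a lot of confusion when comparing Python slices with coordinates read from a GenBank file.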

Check out the really good Biopython documentation for more details!