Revision as of 23:51, 26 September 2012

The GATE Assembly Kit

TALEs make sequence-specific genome modification much easier than before and have therefore attracted great interest in the synbio research community and beyond. Interestingly, many of the researchers who hold patents on TALEs have also released open-source toolkits for TALE assembly for academic research. However, most TALE gene assembly strategies published so far rely on a hierarchical procedure that is time-consuming, laborious and not automatable.

We therefore describe here the Golden Gate cloning-based TAL Effector (GATE) Assembly platform, which enables anyone with basic lab equipment to produce low-cost, tailored TALEs with only a few minutes of lab work. Moreover, we have automated this strategy and produced different TAL effector transcription factors with a 96 % success rate, faster than with any previously published method.

Review of existing TALE construction methods

Although TALE assembly is considerably easier than, e.g., screening for novel zinc fingers, the highly repetitive structure of the TALE gene poses some challenges, because conventional PCR- or homologous recombination-based gene assembly strategies cannot be applied. To our knowledge, the numerous approaches to TAL effector gene assembly published so far fall into the following three categories:

1. A few groups have applied methods called unit assembly¹ or Restriction Enzyme And Ligation (REAL)². In the first step, both strategies use conventional restriction enzyme digestion and ligation to join gene fragments of single repeats into pairs. The pairs of repeat gene fragments are subsequently assembled to form tetramers, and this highly hierarchical assembly strategy is continued until the desired number of repeats is reached. These platforms involve multiple laborious and time-consuming rounds of digestion, ligation and isolation of the correct ligation products. The recently published fast ligation-based automatable solid-phase high-throughput (FLASH) system circumvents major challenges of REAL by attaching the first repeat to streptavidin-coated magnetic beads and then successively adding further repeats or oligorepeats from a 376-plasmid library. Although Reyon et al. state that FLASH can also be performed manually, it is probably not the most convenient or low-cost protocol for iGEM students.

2. We call the second category of TALE production methods the synthesis optimization approach. The major challenge of TAL synthesis is the highly repetitive amino acid sequence of the DNA-binding part. Since synthetic genes are typically produced from overlapping synthesized oligos, the overlap of each pair of oligos needs to be distinct from all others. The synthesis optimization approach employs a computer program that varies codon usage in order to reduce the repetitiveness of the TAL gene and calculates optimal oligos for synthesis³,⁴. Although this approach might be the method of the future, it is currently too expensive for iGEM teams.

3. The third category of TALE assembly protocols applies Golden Gate Cloning (GGC)⁵,⁶,⁷,⁸,⁹ (for details on GGC, see the Golden Gate standard page). In all GGC-based TALE repeat assembly strategies, level 1 modules (i.e. single repeat gene fragments) are flanked by type IIs restriction sites adjacent to their first and last 4 nucleotides, respectively, so that digestion with the type IIs restriction enzyme produces sticky ends. Since each level 1 module codes for the same amino acid sequence (except for the RVDs), the codon usage must be changed at these 4 external nucleotides to produce unique sticky ends that assemble in the predefined order after digestion. Consequently, the 4 bp overlaps of a level 1 module specify its future position within the TALE gene. In order to target any DNA sequence, a GGC-based method therefore requires N × M modules. N is the number of level 1 module positions (i.e. the number of modules that the TALE should contain after GGC) and M is the number of different repeats that the user should be able to put into each of the N positions (in most kits M equals 4, one repeat for each DNA base). Unfortunately, only up to 10 modules⁵ can be assembled with high accuracy using GGC. In the GGC-based protocols, level 1 modules are therefore first assembled into level 2 modules (oligorepeats). These level 2 modules need to be amplified and isolated before a second GGC reaction assembles them into the complete repeat array. The bottleneck of the GGC-based methods is this amplification and isolation of level 2 modules, which costs a lot of time and requires extra knowledge, additional enzymes and lab equipment (we actually tried one of the GGC-based open-source kits but, even after 2.5 weeks, were not able to assemble the whole TALE).
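The role of the 4 bp overlaps in fixing the module order can be illustrated with a small Python sketch. It is only a toy model: the overhang sequences and module names are invented for illustration and are not those of any published kit.

```python
def assembly_order(modules, start, end):
    """Chain modules by matching each right overhang to the next left overhang.

    modules maps a module name to a (left_overhang, right_overhang) pair.
    start and end are the overhangs exposed by the (hypothetical) backbone.
    """
    by_left = {}
    for name, (left, right) in modules.items():
        if left in by_left:
            # Two modules with the same left overhang would compete for one position.
            raise ValueError(f"overhang {left} is not unique")
        by_left[left] = (name, right)

    order, current = [], start
    while current != end:
        if current not in by_left:
            raise ValueError(f"no module starts with overhang {current}")
        name, current = by_left.pop(current)
        order.append(name)
    return order


# Hypothetical overhangs; modules are listed out of order on purpose.
toy_modules = {
    "repeat_pos3": ("GGCT", "TTGA"),
    "repeat_pos1": ("ACTG", "CCAA"),
    "repeat_pos2": ("CCAA", "GGCT"),
}
print(assembly_order(toy_modules, start="ACTG", end="TTGA"))
```

Because every overhang occurs exactly once as a left end, the order of the product is fully determined by the overhangs and not by the order in which the modules are mixed.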

GATE Assembly

Right from the beginning, we were intrigued by the efficiency of Golden Gate Cloning and hypothesized that instant TAL assembly would be possible if we overcame the need for a second (or even third) round of GGC. Since we were sure that we could not improve GGC reaction conditions enough to assemble all single repeats at once, we came up with another solution: why not use direpeats instead of single repeats as level 1 modules? This would cut the number of level 1 modules in half and allow us to perform TAL assembly in one single reaction. Unfortunately, our idea would not only cut N in half but would also quadruple M (from 4 to 16), and would thus double the toolkit size. We therefore needed to further reduce N to 6 to obtain a reasonable toolkit size of 96 level 1 modules. We actually liked the idea that our kit would fit perfectly on a 96-well plate.
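The toolkit-size arithmetic above can be checked with a short Python sketch. The function name and its parameters are our own illustrative choices; the only inputs taken from the text are the four bases, the N × M rule and the final array of six direpeats.

```python
def toolkit_size(n_repeats, unit_len, n_bases=4):
    """Number of level 1 modules needed to target any sequence.

    n_repeats: total number of single repeats in the finished array
    unit_len:  repeats per level 1 module (1 = single repeat, 2 = direpeat)
    """
    if n_repeats % unit_len:
        raise ValueError("array length must be a multiple of the unit length")
    positions = n_repeats // unit_len   # N
    variants = n_bases ** unit_len      # M
    return positions * variants


# Switching from single repeats to direpeats halves N, quadruples M,
# and therefore doubles the library for the same array length.
print(toolkit_size(20, 1), toolkit_size(20, 2))  # 80 160
# Six direpeat positions give the 96-part kit described above.
print(toolkit_size(12, 2))  # 96
```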

Next, we looked into the literature to check whether TALEs that recognize 14 bp (instead of around 18 bp) are actually functional. We were fortunate to see that the efficiency of TAL transcription factors (TAL-TFs)¹⁰ and TAL effector nucleases (TALENs)¹¹ remains constant for target sequences between 13 and 20 bp. Moreover, Zhang et al. published splendid results with 14 bp-binding TAL-TFs in a human cell line⁷.

Since we wanted our TALEs to function in both bacterial and eukaryotic systems, while published TAL repeats were always designed for one particular organism, we decided to design the direpeat nucleotide sequences from scratch. We used the amino acid sequence of the hex3 gene of Xanthomonas oryzae to derive the amino acid sequences of the 16 direpeats. Next, we reverse-translated the sequences into DNA, codon-optimized them for E. coli and human cells and reduced homologies between and within gene fragments (only the extension PCR binding sites were the same for every direpeat gene).

After receiving the sequences, which were synthesized as gBlocks by IDT, we performed 6 extension PCRs on every sequence to add 4 bp overlaps, BsmBI restriction sites and the iGEM prefix and suffix to the parts. The 4 bp overlaps would later determine the position of the respective direpeat in the repeat array of the TALE.

One of the advantages of GGC is that whole plasmids containing the parts to be assembled can be added directly to the reaction. We therefore decided to clone all 96 parts into the standard iGEM vector pSB1C3. We hypothesized that the BsmBI restriction site in the chloramphenicol resistance gene would decrease GGC efficiency, so we performed a mutagenesis PCR to introduce a silent mutation (G434C) before cloning the 96 PCR products into the vector.
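Checking a vector for internal BsmBI sites before using it in a Golden Gate reaction can be sketched in a few lines of Python. The recognition sequence CGTCTC is that of BsmBI; the toy sequences below are invented and are not the pSB1C3 sequence.

```python
BSMBI_SITE = "CGTCTC"
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_bsmbi_sites(seq):
    """Return sorted (position, strand) pairs for every BsmBI site in seq."""
    seq = seq.upper()
    hits = []
    # The site is not palindromic, so both orientations must be searched.
    for motif, strand in ((BSMBI_SITE, "+"), (revcomp(BSMBI_SITE), "-")):
        start = seq.find(motif)
        while start != -1:
            hits.append((start, strand))
            start = seq.find(motif, start + 1)
    return sorted(hits)


# Invented toy sequences: one with two sites, one where a single base
# change (CGTCTC -> CGTCAC) has destroyed the only site.
with_sites = "AAACGTCTCAAAGAGACGTT"
mutated = "AAACGTCACAAA"
print(find_bsmbi_sites(with_sites))
print(find_bsmbi_sites(mutated))
```

A single base change inside the recognition sequence is enough to remove a site, which is why one silent point mutation suffices as long as it leaves the encoded amino acid unchanged.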

We took the sequences of the four known TAL repeats (for A, C, G and T) and combined them into 16 new, so-called direpeat sequences. These 16 direpeats were ordered as gene synthesis products.

Now the real work began. To start building TAL proteins we needed to expand our 16 direpeats a second time. Each of the 16 direpeats had to be produced in six versions marked with different terminal sequences, one version for every position in our final six-direpeat TAL protein. Because we did not want to buy six times 16 different synthesized direpeats, we came up with a plan to produce them ourselves. We designed six primer pairs; each primer has a common part that matches all of the direpeats and a unique overhang that encodes the position of the direpeat in the array.
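The expansion from 16 direpeats to 96 position-specific parts, and the selection of six parts for a given target, can be sketched as follows. The plate layout (position 1 first, row by row) is purely an assumption for illustration, and the sketch only covers the 12 bases addressed by the six direpeats.

```python
from itertools import product

BASES = "ACGT"
N_POSITIONS = 6
# All 16 two-base combinations: AA, AC, AG, AT, CA, ...
DIREPEATS = ["".join(pair) for pair in product(BASES, repeat=2)]


def build_library():
    """Map (position, direpeat) to a well on a 96-well plate (hypothetical layout)."""
    wells = [f"{row}{col}" for row in "ABCDEFGH" for col in range(1, 13)]
    parts = [(pos, d) for pos in range(1, N_POSITIONS + 1) for d in DIREPEATS]
    return dict(zip(parts, wells))


def parts_for_target(target, library):
    """Return the six wells whose direpeats spell out a 12-base target."""
    target = target.upper()
    if len(target) != 2 * N_POSITIONS or set(target) - set(BASES):
        raise ValueError("target must be 12 bases of A, C, G or T")
    return [library[(i // 2 + 1, target[i:i + 2])] for i in range(0, len(target), 2)]


library = build_library()
print(len(library))
print(parts_for_target("AAAAAAAAAAAA", library))
```

With such a lookup table, picking the parts for a new TALE reduces to reading six wells off the plate, one per position.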

Because we did not feel comfortable requiring six different restriction enzymes in the final reaction to produce a twelve-repeat TAL (six direpeats), we used a technique called Golden Gate cloning. The technique uses the type IIs restriction enzyme BsmBI and its ability to cut DNA slightly downstream of its recognition site. This allowed us to create different sticky ends with just one enzyme, so we are able to ligate six different parts in the right order in a single reaction step.
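How one enzyme can produce different sticky ends is easiest to see by simulating the cut. BsmBI recognizes CGTCTC and cuts 1 nt downstream on the top strand and 5 nt downstream on the bottom strand, leaving a 4 nt 5' overhang whose sequence is not part of the recognition site. The following Python sketch assumes a part flanked by one site in each orientation; the sequences are invented for illustration.

```python
SITE_FWD = "CGTCTC"   # BsmBI site, cuts to its right
SITE_REV = "GAGACG"   # reverse complement, cuts to its left


def digest_part(seq):
    """Return (left_overhang, body, right_overhang) released by BsmBI.

    Assumes the layout  CGTCTC-N-[4 nt]-body-[4 nt]-N-GAGACG  on the top strand.
    """
    seq = seq.upper()
    i = seq.find(SITE_FWD)
    j = seq.rfind(SITE_REV)
    if i == -1 or j == -1 or j < i + len(SITE_FWD) + 10:
        raise ValueError("sequence does not contain a correctly flanked part")
    left_start = i + len(SITE_FWD) + 1        # skip the single spacer base
    left = seq[left_start:left_start + 4]     # 4 nt 5' overhang, top strand
    right = seq[j - 5:j - 1]                  # 4 nt overhang, in top-strand coordinates
    body = seq[left_start + 4:j - 5]
    return left, body, right


# Two invented parts: the right overhang of the first matches the left of the second.
part1 = "CGTCTC" + "A" + "ACTG" + "GGGGGG" + "CCAA" + "T" + "GAGACG"
part2 = "CGTCTC" + "A" + "CCAA" + "TTTTTT" + "GGCT" + "T" + "GAGACG"
l1, body1, r1 = digest_part(part1)
l2, body2, r2 = digest_part(part2)
print(l1, body1, r1)
print("compatible:", r1 == l2)
```

Because the overhang lies outside the recognition sequence, it can be chosen freely for each part, and the recognition sites themselves are removed from the released fragment.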

After finishing these 96 different extension PCRs, we ligated the products into the original iGEM BioBrick vector and finally obtained our full library of 96 unique direpeats.


@@ Line 29: / Line 29: @@
 == GATE Assembly ==
 <br>
-<div align="justify">In this first part we  explain the theoretical background of our project and the path we took creating our '''GATE Assembly kit'''. You will understand why our toolkit looks the way it does and what the theory behind its mechanisms is. If you look for a tutorial on how to use the toolkit we refer you to the [[Team:Freiburg/Project/Tal|'Using the Toolkit']] section in the project part.<br>
+<div align="justify">Right from the beginning, we were very much intrigued by the efficiency of Golden Gate Cloning and hypothesized, that instant TAL assembly would be possible if we overcame the need for a second (or even third) round of GGC. Since we were sure we were not able to improve GGC reaction conditions so much that we could actually assemble all repeats at once, we came up with another solution: Why not use direpeats instead of single repeats as level 1 modules? This would cut the number of level 1 modules half and allow us to perform TAL assembly in one single reaction.  Unfortunately, our idea would not only cut half N but would also quadruple M, and thus would double the toolkit size. So we needed to further reduce N down to 6 to obtain a reasonable toolkit size of 96 level 1 modules. We actually liked the idea that our kit would perfectly fit on a 96 well plate.
+Next, we looked into the literature to check, if TALEs that recognize 14 bp (instead of around 18 bp) are actually functional. We were very fortunate to see that efficiency of TAL transcription factors (TAL-TFs) <sup>10 </sup> and TAL effector nucleases (TALENs)11 remains constant between for target sequences between 13 and 20 bp. Moreover, Zhang et al. published splendid results with 14 bp-binding TAL-TFs in a human cell line7.
+Since we wanted our TALEs to function in both bacteria and eukaryotic systems, while published TAL repeats were always designed for one particular organism, we decided to design the direpeat nucleotide sequences from scratch: We used the amino acid sequence of the hex3 gene of Xanthomonas oryzae to find out the amino acid sequences for the 16 direpeats. Next, we reverse-tanslated the sequences into DNA, codon optimized them for E.coli and human cells and reduced homologies between and within gene fragments (only the extention PCR binding sites were the same for every direpeat gene.
+After receiving the sequences that were synthesized as G-blocks by IDT, we performed 6 extention PCRs on every sequence to add 4 bp overlaps, BsmBI restriction sites and iGEM prefix and suffix to the parts. The 4 bp overlaps would later determine the position of the respective direpeat in the repeat array of the TALE.
+One of the advantages of GGC is that one can insert the whole plasmids containing the parts one wants to assemble. So we decided to clone all 96 parts into the standard iGEM vector pSB1C3.  We hypothesized that the BsmBI restriction site in the chloramphenicol gene would decrease GGC efficiency, so we performed a mutagenesis PCR to introduce the silent mutation (G434C) prior to cloning the 96 PCR products into it.
+Next
+<br>

Team:Freiburg/Project/Overview

From 2012.igem.org
