Team:Edinburgh/modelling/software

From 2014.igem.org

It was during the planning stages of designing our population regulation system that a minor problem occurred to us. We were looking at genes we could express heterologously which would slow growth when turned on, and while we did find a few, we also kept coming across genes which would have this effect when turned off in the genome. We could have knocked them out, but that didn't seem very… iGEM. Were these genes to be out of reach for us?

No! Why not express an antisense gene against these genes? We came across an interesting method for creating antisense RNAs in prokaryotes involving paired termini. Essentially, in this method, your antisense gene consists of a segment of the target gene which contains the start codon and ribosome-binding site (RBS), in antisense orientation, with flanking inverted repeats to give the resultant RNA extra stability (the repeats pair with each other, so the RNA is no longer a linear molecule).
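To make the layout concrete, here is a minimal sketch of how such a construct could be assembled in Python. It is an illustration only, not the code we ran: the function names and the `stem` argument (one arm of the inverted repeat) are hypothetical, and the sequences in the usage note are made up.

```python
# Sketch only: assemble a paired-termini antisense insert.
# All names here are illustrative assumptions, not the original script.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def paired_termini_antisense(target_segment, stem):
    """Return (dna, rna) for an antisense insert flanked by inverted repeats.

    target_segment: sense-strand DNA covering the RBS and start codon.
    stem: one arm of the inverted repeat; its reverse complement is
          appended at the other end so the two termini can base-pair.
    """
    antisense = reverse_complement(target_segment)
    dna = stem.upper() + antisense + reverse_complement(stem)
    rna = dna.replace("T", "U")  # transcript has the same sequence, with U for T
    return dna, rna
```

For example, `paired_termini_antisense("AGGAGGTTTATG", "GGGAC")` returns an insert whose two ends are reverse complements of each other, with the antisense of the RBS and start codon in between.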

We were looking at a number of potential genes to target, and we soon found that manually creating the sequence of the hypothetical antisense RNA was becoming tiresome. It also seemed like a perfect task for a computer – simple but tedious.

And so, version one of ‘AntisenseBot 5000’ was born. It took the DNA sequence of the target gene, worked out the correct frame by checking against the protein sequence, and returned the RNA sequence of the antisense that would knock it out (as well as the DNA gene that would produce it). This greatly sped up the process of producing hypothetical antisense genes.
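The frame check and antisense step can be sketched roughly as follows. This is a reconstruction of the idea rather than the original script: the function names and the `upstream`/`downstream` parameters are assumptions, and for simplicity it uses the standard genetic code and expects the gene to begin with ATG (alternative start codons such as GTG would need extra handling).

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna):
    """Translate DNA codon by codon; unknown codons become 'X'."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def find_coding_start(dna, protein):
    """Return the index where translating dna reproduces protein, or -1."""
    dna = dna.upper()
    n = 3 * len(protein)
    for start in range(len(dna) - n + 1):
        if translate(dna[start:start + n]) == protein:
            return start
    return -1


def antisense_rna(dna, start, upstream, downstream):
    """Antisense RNA against the region around the start codon.

    upstream/downstream are hypothetical parameters: the number of bases
    taken before and from the start codon onwards.
    """
    region = dna.upper()[max(0, start - upstream):start + downstream]
    return region.translate(COMPLEMENT)[::-1].replace("T", "U")
```

With a made-up sequence such as `"TTAGGAGGTAACATATGAAACGCTAA"` and protein `"MKR"`, `find_coding_start` locates the ATG, and `antisense_rna` then returns the reverse complement of the surrounding window as RNA.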

With our faithful script at our side, we could now start checking the viability of multiple antisense genes. To do this we used an online tool called ‘RNAserver’, which modelled the secondary structure of a given RNA sequence. We quickly saw that the antisense RNAs we were generating varied widely in how ‘free’ the critical start codon and RBS region was – some of them were quite badly tangled according to these models.

Interestingly, we also spotted that altering the RNA very slightly could sometimes cause quite radical changes in the predicted secondary structure. Changing the length, or the specific base at which the target starts, produced very different structures.

So we gave AntisenseBot 5000 a considerable makeover:

  1. Small modifications to the Python script meant that instead of merely producing one antisense gene, it now produced 2,400 different RNAs, each differing from its neighbours in length or start site by one base.
  2. We coupled this to a modified version of the RNAserver program. We downloaded the offline version of the tool and modified it so that it could take multiple RNAs and rank them in order of decreasing ‘openness’ (in practice, free energy) at the site which we told it was the RBS and start codon. By feeding it the output of the new Python script, we now had a program which could calculate the optimum antisense gene for any given native gene.
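The two steps above can be sketched as a single enumerate-and-rank loop. This is an outline under stated assumptions, not our actual pipeline: the function names, the window parameters, and above all the `openness` callable are hypothetical. In the real tool that score came from the modified folding program; here it is just whatever function you pass in, with higher meaning more open.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def antisense_rna(dna):
    """Reverse complement of a DNA window, written as RNA."""
    return dna.upper().translate(COMPLEMENT)[::-1].replace("T", "U")


def candidate_windows(seq_len, start_codon, min_len, max_len, max_shift):
    """Yield (begin, end) windows that differ by one base in length or
    start site and that all still cover the start codon."""
    for length in range(min_len, max_len + 1):
        for shift in range(max_shift + 1):
            begin = start_codon - shift
            end = begin + length
            if begin >= 0 and end <= seq_len and end >= start_codon + 3:
                yield begin, end


def rank_candidates(target, start_codon, openness, min_len, max_len, max_shift):
    """Score every candidate antisense RNA and sort, most open first.

    openness: hypothetical callable taking an RNA string and returning a
    number; it stands in for the external secondary-structure model.
    """
    scored = []
    for begin, end in candidate_windows(len(target), start_codon,
                                        min_len, max_len, max_shift):
        rna = antisense_rna(target[begin:end])
        scored.append((openness(rna), begin, end, rna))
    return sorted(scored, key=lambda item: item[0], reverse=True)
```

Keeping the scoring function as a parameter means the enumeration can be tested on its own, and the folding model can be swapped without touching the rest.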

We were quite pleased with AntisenseBot 5001 (as we christened it), and got as far as designing sequences for five antisense genes, against acpP, fabA, ftsZ, rpoD and rpsL.

Sadly, the unfortunate realities of trying to do multiple ambitious things in a single summer caught up with us. We planned to do the antisense experiments once the initial non-antisense genes had been PCR’d, assembled and transformed, but this took so long that in the end we never got to test our antisense system. We resigned ourselves to the fact that our program, while cool, would have to remain an untested curiosity that we could hopefully mention in our final project write-ups.

UCL to the rescue

In mid-September, as we were frantically trying to close down our project (and uncomfortably aware that we hadn't secured the coveted ‘collaboration’ required for the gold medal), we got an interesting tweet from the UCL team.

They'd seen our tweet back in July about the antisense program, and as luck would have it were also working on antisense molecules! We could all see that here was a golden opportunity. We could put our hitherto unused program to work, help UCL with their project, and secure collaboration points for both our teams.

We Skyped one cloudy Thursday afternoon and worked out the plan of action. UCL were trying to knock down the gene ispB and had designed an antisense gene for the job. Was it going to work? And was there a better antisense gene available?

So, we got to work. We ran their antisense gene through our program and tried to predict how favourable their RNA was based on its secondary structure. We also worked out some suggested antisense genes they might make if theirs didn't work (which required some modification to the program to get rid of the paired termini and, a feature that hadn't even occurred to us, to omit genes with illegal restriction sites). We would help them by providing optimal gene sequences for antisense, and they would help us by providing real-world wet-lab data to test our predictions.
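The restriction-site filter is simple to sketch. The version below is an assumption about what "illegal" meant in practice: it checks for the enzyme sites reserved by the BioBrick RFC 10 assembly standard, and the function names are hypothetical. All five sites are palindromic, so scanning one strand is enough.

```python
# Sketch only: drop candidate genes containing restriction sites that
# BioBrick RFC 10 reserves. Which sites the original program checked is
# not recorded here; this list is an assumption.
ILLEGAL_SITES = {
    "EcoRI": "GAATTC",
    "XbaI": "TCTAGA",
    "SpeI": "ACTAGT",
    "PstI": "CTGCAG",
    "NotI": "GCGGCCGC",
}


def illegal_sites_in(dna):
    """Return the sorted names of reserved sites found in a DNA string."""
    dna = dna.upper()
    return sorted(name for name, site in ILLEGAL_SITES.items() if site in dna)


def drop_illegal(candidates):
    """Keep only the candidate DNA sequences free of reserved sites."""
    return [dna for dna in candidates if not illegal_sites_in(dna)]
```

Running the filter before ranking keeps unusable sequences out of the final list, so the top-ranked gene is always one that can actually be submitted as a part.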

Our predictions turned out to be rather grim: the antisense molecule UCL had selected looked quite tangled, and didn't cover the critical region. In a turn of events that was unfortunate for UCL, but somewhat reassuring from a software point of view, the antisense gene did in fact fail to knock down the target gene. This might not necessarily be because the antisense RNA was as folded as predicted, but nonetheless we took it as a promising sign of the usefulness of our tool. Clearly many, many more tests would need to be done to confirm this, as one data point is all but useless, but still, data is data.


Predicted structure of UCL's antisense RNA

At the time of writing, UCL are still working on synthesising our suggested antisense genes, and if they get time, we will be very interested to learn how well they work.