Enter the maze

Hacking DNA

by Jo Brodie and Paul Curzon, Queen Mary University of London

DNA Jigsaw. From PIXABAY.com

DNA is the molecule of life. Our DNA stores the information of how to create us. Now it can be hacked.

DNA consists of two strands coiling round each other in a double helix. It's made of four building blocks, or 'nucleotides', labelled A, C, G, T. Different orders of letters gives the information of how to build each unique creature, you and me included. Sequences of DNA are analysed in labs by a machine called a gene sequencer. It works out the order of the letters and so tells us what's in the DNA. When biologists talk of sequencing the human (or another animal or plant's) genome they mean using a gene sequencer to work out the specific sequences in the DNA for that species. They are also used by forensic scientists to work out who might have been at the scene of a crime, and to predict whether a person has genetic disorders that might lead to disease.

DNA can be used to store information other than that of life: any information in fact. This may be the future of data storage. Computers use a code made of 0s and 1s. There is no reason why you can't encode all the same information using A, C, G, T instead. For example, a string of 1s and 0s might be encoded by having each pair of bits represented by one of the four nucleotides: 00 = A, 01 = C, 10 = G and 11 = T. The idea has been demonstrated by Harvard scientists who stored a video clip in DNA.

It also leads to whole new cyber-security threats. A program is just data too, so can be stored in DNA sequences, for example. Researchers from the University of Washington have managed to hide a malicious program inside DNA that can attack the gene sequencer itself!

The gene sequencer not only works out the sequence of DNA symbols. As it is a computer, it converts it into a binary form that can then be processed as normal. As DNA sequences are long, the sequencer compresses them. The attack made use of a common bug found in programs that malware often uses: 'buffer overflow' errors. These arise when the person writing a program includes instructions to set aside a fixed amount of space to store data, but then doesn't include code to make sure only that amount of data is stored. If more data is stored then it overflows into the memory area beyond that allocated to it. If executable code is stored there, then the effect can be to overwrite the program with new malicious instructions.

When the gene sequencer reaches that malware DNA, the converted program emerges and is converted back into 1s and 0s. If those bits are treated as instructions and executed, it launches its attack and takes control of the computer that runs the sequencer. In principle, an attack like this could be used to fake results for subsequent DNA tests, subverting court cases, disrupt hospital testing, steal sensitive genetic data, or corrupt DNA-based memory.

Fortunately, the risks of exactly this attack causing any problems in the real world are very low but the team wanted to highlight the potential for DNA based attacks, generally. They pointed out how lax the development processes and controls were for much of the software used in these labs. The bigger risk right now is probably from scientists falling for spear phishing scams or forgetting to change the default password on the sequencer.