From the what-could-possibly-go-wrong department: Scientists have now managed to write executable code into DNA that is theoretically capable of infecting the computer that reads it. It was only a matter of time. Someone is bound to use this to troll law enforcement, à la Rick Sanchez trolling the galactic government with his three lines of code.
It’s not quite accurate to call it a virus, even though this might be the closest to a real virus that software has ever come. It consists of replication instructions, encoded in a snippet of DNA that can deliver a payload capable of assuming control of the computer that reads the strand. It has to integrate itself into the host system to propagate itself. All it needs is a capsid, although the file metadata and header might qualify.
So how did we write executable code into a DNA strand in the first place?
First, the researchers decided on the exploit they meant to use. It was no accident that they targeted a program written in C: several C standard library functions do no bounds checking, which leaves programs that rely on them open to a classic buffer-overflow attack.
Then, they encoded their exploit in a simple cipher that maps each pair of bits to a nucleobase: A = 00, C = 01, G = 10, T = 11.
Computers run on a binary stream of electrical impulses that alternates between OFF and ON: 0 and 1. As a consequence, any executable code ultimately has to be represented in binary. When the strand was sequenced, the software converted the bases back into bits, which put the malicious bytes into the memory of the computer doing the read. From there, the payload took advantage of a buffer overflow and got loose in the system to grab for privileges.
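To make the cipher concrete, here is a minimal sketch in Python that turns arbitrary bytes into a string of bases using the mapping above. The function name and the bit order within each byte (most significant pair first) are assumptions made for illustration; the article does not say which order the researchers used.

```python
# Mapping from the article: A = 00, C = 01, G = 10, T = 11.
BASE_FOR_BITS = {0b00: "A", 0b01: "C", 0b10: "G", 0b11: "T"}


def bytes_to_bases(data: bytes) -> str:
    """Encode each byte as four bases.

    Assumption: the most significant bit pair of each byte comes first.
    """
    bases = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            # Pull out one two-bit pair and look up its base.
            bases.append(BASE_FOR_BITS[(byte >> shift) & 0b11])
    return "".join(bases)


if __name__ == "__main__":
    # The ASCII letter "A" is 0x41 = 01 00 00 01 in binary.
    print(bytes_to_bases(b"A"))
```

Every byte costs four bases under this scheme, so the length of the strand is always four times the length of the data.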
“The conversion from ASCII As, Ts, Gs, and Cs into a stream of bits is done in a fixed-size buffer that assumes a reasonable maximum read length,” explained co-author Karl Koscher in an email exchange with TechCrunch.
“The exploit was 176 bases long,” Koscher wrote. “The compression program translates each base into two bits, which are packed together, resulting in a 44 byte exploit when translated.”
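The packing step Koscher describes can be sketched in a few lines of Python. This is an illustration, not the researchers' program: the function name is made up, and the order in which bases are packed into each byte is an assumption.

```python
# Mapping from the article: A = 00, C = 01, G = 10, T = 11.
BITS_FOR_BASE = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}


def pack_bases(read: str) -> bytes:
    """Pack a read into bytes, two bits per base, four bases per byte.

    Assumption: the first base of each group lands in the high bits.
    """
    if len(read) % 4 != 0:
        raise ValueError("read length must be a multiple of four bases")
    packed = bytearray()
    for start in range(0, len(read), 4):
        byte = 0
        for base in read[start:start + 4]:
            byte = (byte << 2) | BITS_FOR_BASE[base]
        packed.append(byte)
    return bytes(packed)


if __name__ == "__main__":
    # Any 176-base read packs down to 44 bytes, matching the quote.
    print(len(pack_bases("ACGT" * 44)))
```

The arithmetic matches the quote: 176 bases at two bits each is 352 bits, or 44 bytes.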
“Most of these bytes are used to encode an ASCII shell command,” he continued. “Four bytes are used to make the conversion function return to the system() function in the C standard library, which executes shell commands, and four more bytes were used to tell system() where the command is in memory.”
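Koscher's numbers imply a tight byte budget, which the following Python sketch works out. The variable names are illustrative, and the result is an upper bound: the quote does not say whether any of the remaining bytes are padding.

```python
# Figures taken from Koscher's description of the exploit.
EXPLOIT_BASES = 176
BITS_PER_BASE = 2
RETURN_ADDRESS_BYTES = 4    # makes the function return into system()
ARGUMENT_POINTER_BYTES = 4  # tells system() where the command is

total_bytes = EXPLOIT_BASES * BITS_PER_BASE // 8

# Whatever is left is available for the ASCII shell command
# (and any padding, which the quote does not mention).
command_budget = total_bytes - RETURN_ADDRESS_BYTES - ARGUMENT_POINTER_BYTES

if __name__ == "__main__":
    print(total_bytes, command_budget)
```

That leaves at most 36 bytes for the shell command itself, so whatever the attacker wants to run has to be short.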
In other words: feed this strand of DNA into a sequencer whose software runs the vulnerable program and it’s Hello World in 176 nucleobases. Three lines of math, indeed.
Even though the possibilities for destructive interference with law enforcement and scientific or corporate espionage clearly abound, buffer overflows are so notorious, and so common, that programmers have been looking out for this kind of attack for a long time. Heartbleed was a close cousin: a buffer over-read, in which a program reads past the end of a buffer rather than writing past it. Standard defenses exist, such as compiler-inserted stack canaries that detect an overwritten stack and abort the program.
Furthermore, since it’s a DNA-based exploit, there are some problems in the mechanism. The strand can fragment, for one thing, and because DNA can be sequenced in either direction, the code can be read out backwards. But no worries: the study authors remark that a clever future assailant could write the code as a palindrome.
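The simplest defense against this particular bug is a length check before anything is written to the fixed-size buffer. Here is a minimal sketch of that idea in Python; the maximum read length of 128 bases and the function name are hypothetical, chosen only to show the check.

```python
# Mapping from the article: A = 00, C = 01, G = 10, T = 11.
BITS_FOR_BASE = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}

# Hypothetical limit; the real program's buffer size is not given.
MAX_READ_BASES = 128


def pack_read_checked(read: str, max_bases: int = MAX_READ_BASES) -> bytes:
    """Pack a read into a fixed-size buffer, refusing oversized reads."""
    if len(read) > max_bases:
        # This is the check whose absence makes an overflow possible.
        raise ValueError(f"read of {len(read)} bases exceeds {max_bases}")
    buffer = bytearray((max_bases + 3) // 4)
    for index, base in enumerate(read):
        # Four bases share one byte; the first base takes the high bits.
        buffer[index // 4] |= BITS_FOR_BASE[base] << (6 - 2 * (index % 4))
    used_bytes = (len(read) + 3) // 4
    return bytes(buffer[:used_bytes])


if __name__ == "__main__":
    try:
        pack_read_checked("A" * 176)
    except ValueError as error:
        print("rejected:", error)
```

With a limit like this in place, a 176-base read is rejected outright instead of spilling past the end of the buffer.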
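In molecular biology, a palindrome is usually a sequence that equals its own reverse complement, so it reads the same whichever strand the sequencer picks up. Assuming that is the sense the authors mean, a short Python sketch of the check looks like this (the function names are illustrative):

```python
# Watson-Crick pairing: A pairs with T, C pairs with G.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(strand: str) -> str:
    """Return the sequence as read from the opposite strand."""
    return strand.translate(COMPLEMENT)[::-1]


def reads_same_both_ways(strand: str) -> bool:
    """True if the strand is a palindrome in the DNA sense."""
    return strand == reverse_complement(strand)


if __name__ == "__main__":
    print(reverse_complement("AACG"))
    print(reads_same_both_ways("GAATTC"))
```

A payload that passes this check would decode to the same bits no matter which direction it was read in, at the cost of constraining what those bits can be.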
But it’s still important to look for this kind of emergent threat. “We know that if an adversary has control over the data a computer is processing, it can potentially take over that computer,” said professor Tadayoshi Kohno, who led the project. Kohno’s background is in looking for attacks that come from left field — attempts to hack embedded systems like pacemakers, for example. “That means when you’re looking at the security of computational biology systems, you’re not only thinking about the network connectivity and the USB drive and the user at the keyboard but also the information stored in the DNA they’re sequencing. It’s about considering a different class of threat.”