BY ANTONIO REGALADO
MIT TECHNOLOGY REVIEW
A private DNA ancestry database that’s been used by police to catch criminals is a security risk from which a nation-state could steal DNA data on a million Americans, according to security researchers.
Security flaws in the service, called GEDmatch, not only risk exposing people’s genetic health information but could let an adversary such as China or Russia create a powerful biometric database useful for identifying nearly any American from a DNA sample.
GEDmatch, which crowdsources DNA profiles, was created by genealogy enthusiasts to let people search for relatives and is run entirely by volunteers. It shows how a trend toward sharing DNA data online can create privacy risks affecting everyone, even people who don’t choose to share their own information.
“You can replace your credit card number, but you can’t replace your genome,” says Peter Ney, a postdoctoral researcher in computer science at the University of Washington.
Ney, along with professors and DNA security researchers Luis Ceze and Tadayoshi Kohno, described in a report posted online how they developed and tested a novel attack employing DNA data they uploaded to GEDmatch.
Using specially designed DNA profiles, they say, they were able to run searches that let them guess more than 90% of the DNA data of other users.
The founder of GEDmatch, Curtis Rogers, confirmed that the researchers alerted him to the threat during the summer.
“We certainly are concerned about privacy also, and it’s good that studies like this are done,” says Rogers. “But no matter what you do, there will always be some potential for privacy invasion when you are doing genealogy. Genealogy is a procedure in which you want to compare your information to other people’s.”
Razib Khan, a genomics researcher who is head of scientific content at Insitome, a service that interprets DNA for consumers, called the new security research a large-scale demonstration of weaknesses already known to enthusiasts.
Khan says he has been aware of efforts to “scrape” GEDmatch, or collect more data than usual, and believes a larger attack to whisk away much of the data could already have occurred. “My guess is that almost certainly it’s already been done,” he says. “Governments are collecting data on people. You never know what they can use it for.”
Asked if there was evidence the database had already faced concerted attacks, scraping, or scanning, Rogers said, “I don’t want to get into it.”
“Not that I am aware of,” he added. “I don’t know.”
Rogers declined to comment on whether he’d been approached by national security officials about the site.
Crowdsourcing DNA
Rogers started the genealogy service as a way for people to upload DNA test results from services like 23andMe and locate relatives among other users, by comparing their DNA. The crowdsourced database now holds 1.3 million profiles, he says, although some of these are duplicates.
As the site grew, it drew the attention of police investigators. In 2018, police in California announced they had used the database, without Rogers’ knowledge, to help identify a murderer known as the Golden State Killer. Police did it by uploading DNA data extracted from crime-scene evidence and comparing it with users’ data to identify some of his relatives.
Since then, dozens of murderers and rapists have been identified using GEDmatch. But a privacy debate erupted as well, partly because police had searched users’ DNA without their knowledge. In response, Rogers allowed users to opt in or out of police searches, or just delete their profiles.
But there was an even broader concern: if a DNA database is large enough, practically everyone can now be tracked through their relatives, even if they never took a DNA test.
With the million or so profiles in the database, most Americans have second or third cousins in it, says Doc Edge, a researcher at the University of California, Davis, who last week posted the first paper showing how ancestry databases could be vulnerable to a clever searcher.
Now the team at the University of Washington has demonstrated a new attack specifically on GEDmatch that is “much stronger,” according to Yaniv Erlich, chief scientist of MyHeritage, another DNA genealogy company.
The researchers exploited the way GEDmatch’s genetic comparison engine works in order to infer the DNA data of other people. “These researchers went in through the main gates—they did not break in,” says Erlich. “Here we have a method that is not even illegal as far as we know.”
When a user searches for relatives, the program compares thousands of DNA markers (called SNPs) from the user’s genome to those of others in the database. The better the match, the more closely that person is related to you. A parent and child will share half their DNA, for example.
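The idea of comparing markers can be illustrated with a toy sketch. This is not GEDmatch's actual algorithm, which the article does not describe; the profiles, the function names, and the "longest shared run" measure below are all assumptions chosen for illustration. It relies only on the fact that a parent and child share at least one allele at every marker, while unrelated people tend to mismatch more often.

```python
from typing import Sequence, Tuple

# Hypothetical representation: the two alleles a person carries at one SNP.
Genotype = Tuple[str, str]

def shares_allele(a: Genotype, b: Genotype) -> bool:
    """True if the two genotypes have at least one allele in common."""
    return bool(set(a) & set(b))

def longest_shared_run(p: Sequence[Genotype], q: Sequence[Genotype]) -> int:
    """Length of the longest run of consecutive markers sharing an allele."""
    best = run = 0
    for a, b in zip(p, q):
        # Extend the current run on a match, reset it on a mismatch.
        run = run + 1 if shares_allele(a, b) else 0
        best = max(best, run)
    return best

# Made-up five-marker profiles (real comparisons use thousands of SNPs).
parent = [("A", "G"), ("C", "C"), ("T", "T"), ("G", "A"), ("C", "T")]
child = [("A", "A"), ("C", "T"), ("T", "G"), ("A", "A"), ("T", "T")]
stranger = [("G", "G"), ("T", "T"), ("T", "T"), ("G", "G"), ("C", "C")]

print(longest_shared_run(parent, child))     # every marker shares an allele
print(longest_shared_run(parent, stranger))  # the run is broken by a mismatch
```

In this sketch a longer unbroken run stands in for a "better match." With only five markers the difference is small, but over thousands of markers long shared runs become very unlikely between unrelated people.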
To test his hack, Ney uploaded specially designed “attack” DNA files, which he then compared with target profiles he also created. He found that with a dozen attack files he could infer nearly all the actual DNA markers of the target profile, even though these are meant to be private.
National security risk
The same attack wouldn’t work on other genealogy sites, like 23andMe, because they don’t permit data uploads. Others, like MyHeritage, do allow uploads but don’t give users as much information about their matches. “The problem with GEDmatch is the browser is too good, and searches too deeply,” says Erlich. “If I were them, I would remove it, fix it, then put it back.”
According to Erlich, the vulnerability has national security implications. If a foreign counterintelligence agency grabbed a million American DNA profiles, that country could use genetic genealogy to uncover the true identity of American spies or diplomats, locate their relatives, or discover genetic kompromat like unacknowledged children. Since other countries don’t have such databases for the US to steal, the risk would not be symmetric.
“You could have a capability which is better than what the FBI currently has, and you can use it in any way that you want,” says Erlich. “With the raw data you could come up with even better algorithms. You can identify spies or do genetic surveillance.”
Fraudsters could also create fake accounts and pretend to be someone’s long-lost relative, says Ney.
Ney says he told GEDmatch of the vulnerabilities in July but is not convinced the tiny company is capable of repairing the problems. Initially, his team gave it a deadline of September to fix the problems, but Ney says he held off posting his report for more than a month when he noticed the site hadn’t been fixed.
“Then, a month ago, they did make a small change to their algorithm that prevents the most significant attack we developed,” says Ney. “Our question is whether those fixes are robust to a determined adversary. It might be a temporary patch.”
GEDmatch, run out of a house in Lake Worth, Florida, is a small business whose aim is genealogy and education, not profits, says Rogers. He acknowledged that its team of five part-time volunteers would not have the resources to hire security consultants.
Rogers, who is not a computer programmer, did not offer details about what fixes GEDmatch had implemented. “I let the technical people work on it, and I believe they have,” he said in an interview. He later emailed to say the site was “actively working to add more security measures based on the reported problems.”
Ney says he does not believe the genealogy site is secure. “How much effort does it take to secure a large website with a million-plus in genetic data? I think it’s hard for anyone to do,” he says. “The question I have is whether a volunteer-run effort is capable of having the manpower to handle it.”
Ney also doesn’t believe administrators at GEDmatch have any way of knowing whether or not the trove of DNA data has already been carried off by an attacker, since an attack could look like an ordinary search for relatives.
“They are in a situation of being ignorant, which is its own problem,” says Ney. “The worst kind of attack is where you don’t even know it happened.”