Written by Dave Nyczepir

The task force developing recommendations for a national artificial intelligence research resource must balance providing valuable data against the increased risk that the data could be used to triangulate personally identifiable information, given the large number of parties expected to have access to it, according to experts.

Working group members want to include startups and small companies developing privacy technologies among NAIRR users, but exactly how resources, capabilities and policies would be integrated continues to be discussed, according to co-chair Manish Parashar.

Members have previously said that researchers and students based in the United States – primarily in academia, but also at companies that have received federal grants such as Small Business Innovation Research or Small Business Technology Transfer funding – are the target users of the NAIRR. The privacy technologies they develop could help the resource protect personally identifiable information (PII).

“Yes, the working group is certainly discussing how privacy-enabling technologies could help improve the privacy aspects of using the NAIRR,” Parashar told FedScoop. “However, the task force also discussed how privacy requires more than just technical solutions, and we expect a full range of considerations when examining privacy, civil rights and civil liberties.”

The data used to train machine learning (ML) algorithms can be anonymized to some degree, but anonymization is never absolute: with enough effort, PII can often be recovered by correlating the data with outside sources.
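The classic demonstration is a linkage attack: joining a nominally anonymized dataset with a public auxiliary source on shared quasi-identifiers such as ZIP code, birth year and sex. A minimal sketch of the idea in Python, using entirely fabricated records rather than any real NAIRR data:

```python
import pandas as pd

# "Anonymized" research records: names removed, quasi-identifiers kept.
anonymized = pd.DataFrame({
    "zip": ["02139", "02139", "60614"],
    "birth_year": [1985, 1990, 1985],
    "sex": ["F", "M", "F"],
    "diagnosis": ["asthma", "diabetes", "hypertension"],
})

# A public auxiliary dataset (think voter rolls) that includes names.
auxiliary = pd.DataFrame({
    "name": ["Alice Smith", "Carol Jones"],
    "zip": ["02139", "60614"],
    "birth_year": [1985, 1985],
    "sex": ["F", "F"],
})

# Joining on the shared quasi-identifiers re-attaches identities to
# records that were nominally anonymous.
reidentified = anonymized.merge(auxiliary, on=["zip", "birth_year", "sex"])
print(reidentified[["name", "diagnosis"]])
```

With broader access to a resource like the NAIRR, the pool of potential auxiliary datasets and motivated joiners grows, which is precisely the risk the task force is weighing.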

Startups such as integrate.ai, which advocates privacy by design, see an opportunity for the NAIRR to not only include them but also use their privacy-enhancing technologies: federated learning, differential privacy, homomorphic encryption and secure multiparty computation.
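Differential privacy, to take one item on that list, formalizes the protection: a query's answer is perturbed with noise calibrated so that no single record measurably changes the output. A minimal sketch of the Laplace mechanism for a counting query (the records and the epsilon value here are illustrative assumptions, not anything the task force has specified):

```python
import numpy as np

def private_count(records, predicate, epsilon=0.5):
    """Answer a counting query with the Laplace mechanism.

    Adding or removing one record changes a count by at most 1
    (sensitivity = 1), so Laplace noise with scale 1/epsilon yields
    epsilon-differential privacy for this single query.
    """
    true_count = sum(1 for r in records if predicate(r))
    return true_count + np.random.laplace(loc=0.0, scale=1.0 / epsilon)

# Illustrative records; an analyst sees only the noisy answer.
ages = [34, 45, 29, 61, 52, 38]
print(private_count(ages, lambda a: a > 40))
```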

“I would like to see a privacy track, a privacy initiative that both leverages the research value of the resource but also supports the whole initiative to really protect the privacy of that information,” said Karl Martin, senior vice president of technology at integrate.ai.

Martin envisions a group of researchers and companies with a mandate to support NAIRR with privacy-enhancing technologies that others can or should use to access the resource’s data, in addition to advancing their own work.

Database-style access controls, which limit what users can see based on data type, are the “most basic” protection organizations apply, and they would likely become “frustrating” for NAIRR users, Martin said.
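That pattern amounts to per-role filtering of fields, sketched below with hypothetical roles and column names; every new data type or use case needs another rule, which is where the friction Martin describes tends to come from.

```python
# Hypothetical column-level access policy: each role sees only an
# approved subset of fields, and sensitive fields are simply withheld.
POLICY = {
    "public_researcher": {"age_bracket", "region", "outcome"},
    "approved_researcher": {"age_bracket", "region", "outcome",
                            "zip", "birth_year"},
}

def filter_record(record: dict, role: str) -> dict:
    """Return only the fields the given role is allowed to see."""
    allowed = POLICY.get(role, set())
    return {k: v for k, v in record.items() if k in allowed}

record = {"zip": "02139", "birth_year": 1985, "region": "Northeast",
          "age_bracket": "35-44", "outcome": "recovered"}
print(filter_record(record, "public_researcher"))
# -> {'region': 'Northeast', 'age_bracket': '35-44', 'outcome': 'recovered'}
```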

On the other hand, federated learning allows ML models to be built without directly accessing the underlying data, and it can be combined with additional layers of privacy, like differential privacy, that make it difficult to reverse engineer the original data, he added.
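As a rough illustration of that combination, each site in a federated setup computes an update on its own data and shares only the update, which can be clipped and noised before a central server averages it. The sketch below is a toy federated gradient-averaging loop over synthetic data, not the actual NAIRR design or any vendor's product:

```python
import numpy as np

rng = np.random.default_rng(0)

def local_gradient(w, X, y):
    """Mean-squared-error gradient computed on one site's private data."""
    return 2 * X.T @ (X @ w - y) / len(y)

# Two sites hold private data; only their (noised) updates ever leave.
sites = [
    (rng.normal(size=(50, 3)), rng.normal(size=50)),
    (rng.normal(size=(80, 3)), rng.normal(size=80)),
]

w = np.zeros(3)
for _ in range(100):
    updates = []
    for X, y in sites:
        g = local_gradient(w, X, y)
        g = g / max(1.0, np.linalg.norm(g))          # clip update to norm 1
        g = g + rng.normal(scale=0.1, size=g.shape)  # add Gaussian noise
        updates.append(g)
    w -= 0.1 * np.mean(updates, axis=0)  # server only sees averaged updates

print("learned weights:", w)
```

The clipping and noise steps are what make the shared updates themselves harder to invert back to individual records, which is the layering Martin describes.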

Whatever privacy technologies the task force ultimately recommends should follow a smart-data philosophy, favoring protections tied to the data itself rather than to the systems that hold it.

“What is the value of this data?” Martin said. “So what are the protection mechanisms that can surround the data?”