Top Brazilian bank Banco Bradesco S.A. wants to avoid data breaches and has been been working with IBM Research to apply encryption techniques of the future.
The idea is to secure data in a totally inpenetrable way, so much so that even a quantum computer wouldn’t be able to get to it— using an type of cryptography which was first suggested by mathematicians in the 1970s called homomorphic encryption.
Using homomorphic encryption, the researchers apply machine learning on fully encrypted banking data — without having to decrypt it first.
Having encrypted real banking data, the scientists demonstrated secure machine learning-based predictions. They presented the results this week at the IACR Real World Crypto Symposium 2020 in New York City.
Cryptography “underpins the trust that we have in e-commerce, blockchains and crypto-currencies,” says Mike Osborne, a crypto researcher at IBM. Today, data at rest such as backups as well as data in transit sent between organizations or simply between one’s computer and, say, an e-commerce website, are typically encrypted.
The problem is, to be processed, the data need to be decrypted first, meaning exposing the information for a certain length of time. This window, however brief, is enough for data exfiltration and leaks. That’s why before any computations, sensitive data are usually pre-processed to obfuscate, anonymize or even remove any private and sensitive fields. But for certain applications, such as in banking, removing all private and sensitive information doesn’t make sense.
But what if the people handling the data wouldn’t have to decrypt anything to perform computations? Well, then, leak or not, the information would remain safe and sound. Provided the encryption is strong enough, that is.
IBM Research is focused on how to apply advanced cryptographic schemes such as Homomorphic Encryption (HE) to protect the privacy and confidentiality of both the data during the training of ML models as well as the models themselves, and as a consequence, the prediction task can also be protected.
That’s where homomorphic encryption comes in. If encrypted homomorphically, data stays encrypted — all the time. The information remains cryptographically jumbled to preserve privacy and confidentiality while it is being processed.
For instance, it would be possible to use machine learning to match a patient for a clinical trial based on a their genome sequence which is homomorphically encrypted in the cloud. The system would output an encrypted a match, without exposing any sensitive data. “The query can be performed ‘obliviously,’ without revealing the data or the result to the agent processing the query,” says Dave Braines, a researcher at IBM.
And, crucially, homomorphic encryption is resistant to any potential attacks from quantum computers. Once fully-functional, quantum computers should greatly surpass traditional computers for certain types of computations. And they could also easily crack current encryption algorithms. But not homomorphic ones. That’s because this type of encryption is based on the mathematics of lattices — repeating, multidimensional grid-like collection of points. Lattice-based schemes hide data inside this lattice, some distance away from a point, and determining how far away an encrypted message is from a lattice point is extremely difficult for both a quantum or a traditional computer. But if you know a secret key, then it’s easy to get the message.
“Developments in quantum computing have given much of the cryptography that we use today an end-of-shelf life,” says Osborne. “Unfortunately, it is a common misunderstanding to think that we can wait until a large quantum computer is built before doing anything. We can safely say that homomorphic encryption is secure against quantum computers.”
In the latest research, the scientists worked with Banco Bradesco real financial data. The team took transaction data and an existing machine learning-based prediction model and performed two experiments. First, they homomorphically encrypted the data and the model, and showed that it was possible to run predictions with the same accuracy as without encryption. It means that banks can safely outsource the task of running predictions to an untrusted environment, adds Bergamaschi.
They then trained the model using encrypted data, showing that it was possible to use homomorphic encryption to preserve the privacy of data. Typically, financial institutions gather information on their clients — analyzing how much a person spends on groceries, petrol and so on, and predicting whether that customer would soon need a loan.
To do this, a bank’s analysts usually manually identify the most important features about an individual’s financial history that would allow them to make that prediction, selecting a handful of features out of about a thousand. While doing so, they have access to the data, potentially compromising it.
“We’ve shown that we could do this important task homomorphically with encrypted data coming in,” says Bergamaschi. This way, the key features are chosen without exposing any information about the customer — meaning that there is “potential to reduce the damaging consequences of data leaks that we have seen in the past, and the breach of privacy of individuals.”
One issue with homomorphic encryption today is that the computational requirements are way greater with encrypted data, making the process much, much longer. But the technology has been improving over the years, and now we are finally “at that point where we can achieve adequate performance when protecting the privacy and confidentiality of the data is paramount,” says Bergamaschi.
IBM cryptographers have released the Homomorphic Encryption library — HElib v1.0.0.