Democratizing Machine Learning with Blockchain Technology

Jul 17, 2020

The goal of the collective learning system is to spread the benefits of this transformative technology to individuals and to small- and medium-sized businesses. This will reverse the trend of a handful of giant technology companies capturing most of the profit from the machine learning boom. By letting organizations work together across borders, these techniques will enable industries to achieve large gains in efficiency without needing to rely on “Big Tech”. In this article, we describe the key ideas behind the collective learning network.

Blockchains and Collective Learning

As in other Proof-of-Stake blockchains, staking tokens entitles users to operate the validator nodes that collectively run the blockchain. Each validator node is responsible for adding blocks to the chain and for verifying the blocks that other validators add. In a decentralized network, this verification is achieved by having validators use digital signatures to vote on the validity of each block. Blocks that receive positive votes from more than a threshold fraction of all validators, typically one-half or two-thirds, are accepted as valid, while blocks with fewer votes are rejected.
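The acceptance rule can be sketched in a few lines of Python. This is a minimal illustration rather than the network's actual implementation: it assumes every vote has already had its digital signature verified, and the function name `block_accepted` and the two-thirds default are hypothetical choices.

```python
from fractions import Fraction


def block_accepted(votes: dict[str, bool], n_validators: int,
                   threshold: Fraction = Fraction(2, 3)) -> bool:
    """Accept a block only if strictly more than `threshold` of ALL validators voted yes.

    `votes` maps validator IDs to their (assumed already signature-verified) votes.
    Validators that did not vote simply count as "not yes".
    """
    positive = sum(1 for approved in votes.values() if approved)
    # Exact rational arithmetic avoids float rounding right at the boundary.
    return Fraction(positive, n_validators) > threshold


# Seven validators, four approvals: 4/7 is below 2/3 but above 1/2.
votes = {"v1": True, "v2": True, "v3": True, "v4": True,
         "v5": False, "v6": False, "v7": False}
print(block_accepted(votes, n_validators=7))                            # False
print(block_accepted(votes, n_validators=7, threshold=Fraction(1, 2)))  # True
```

Dividing by the total number of validators, rather than by the number of votes received, means that offline validators effectively count against a block, which matches the idea of a fraction of all validators.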

The key feature that distinguishes this collective learning system from standard blockchains is that the validators also have access to local training data and the ability to train a machine learning model. Validators can use any of the popular machine learning frameworks, provided it implements the agreed learning algorithm with the specified model topology.
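Whatever framework a validator uses, the weights it produces must be interchangeable with everyone else's, so the learning algorithm and topology have to be fixed up front. The sketch below shows one hypothetical way to pin this down: a `ModelSpec` for a small fully connected network, plus a `conforms` check that a proposed set of weights matches it. The class, its fields, and the extra-row-for-bias convention are illustrative assumptions, not part of the described system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelSpec:
    """Hypothetical description of the shared model that every validator must implement."""
    layer_sizes: tuple[int, ...]  # e.g. (4, 8, 1): 4 inputs, 8 hidden units, 1 output
    learning_rate: float          # step size for the agreed learning algorithm
    local_steps: int              # training steps a proposer runs per block

    def parameter_shapes(self) -> list[tuple[int, int]]:
        """Shape of each dense layer's weight matrix, with an extra row for the bias."""
        return [(fan_in + 1, fan_out)
                for fan_in, fan_out in zip(self.layer_sizes, self.layer_sizes[1:])]


def conforms(spec: ModelSpec, weights: list[list[list[float]]]) -> bool:
    """True if proposed weights (one nested list per layer) match the agreed topology."""
    shapes = spec.parameter_shapes()
    if len(weights) != len(shapes):
        return False
    return all(len(matrix) == rows and all(len(row) == cols for row in matrix)
               for matrix, (rows, cols) in zip(weights, shapes))


spec = ModelSpec(layer_sizes=(4, 8, 1), learning_rate=0.01, local_steps=5)
print(spec.parameter_shapes())  # [(5, 8), (9, 1)]
good = [[[0.0] * cols for _ in range(rows)] for rows, cols in spec.parameter_shapes()]
print(conforms(spec, good))      # True
print(conforms(spec, good[:1]))  # False: a layer is missing
```

A check like this lets validators discard structurally incompatible blocks before spending any effort evaluating them.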

The collective learning model, shared by all validators, is initialized with a genesis block that contains random weights. This model initially performs poorly on each of the validators’ (distinct) training sets. After initialization, the consensus protocol selects one of the validators to produce the first update to the model. This validator carries out a few steps of the learning algorithm to improve the model’s performance on its own local data set, then broadcasts a block containing the new and improved model to the other validators in the network.
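The initialization and proposal steps can be sketched with a toy linear model trained by gradient descent in plain Python. Everything here is an assumption made for illustration: the linear model and mean-squared-error loss, the dictionary block format, the hyperparameters, and especially `select_proposer`, which uses simple round-robin in place of the real consensus protocol's selection rule (Proof-of-Stake systems often weight selection by stake).

```python
import random


def genesis_block(n_features: int, seed: int = 0) -> dict:
    """Block 0: the shared model starts from random weights W0."""
    rng = random.Random(seed)
    weights = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
    return {"height": 0, "weights": weights, "proposer": None}


def select_proposer(validators: list[str], height: int) -> str:
    """Hypothetical round-robin stand-in for the consensus protocol's proposer selection."""
    return validators[height % len(validators)]


def mse(weights, data):
    """Mean squared error of a linear model y ≈ w·x on (features, target) pairs."""
    return sum((sum(w * x for w, x in zip(weights, xs)) - y) ** 2 for xs, y in data) / len(data)


def local_training_steps(weights, data, lr=0.1, steps=5):
    """A few full-batch gradient-descent steps on the proposer's local data."""
    w = list(weights)  # copy so the previous block's weights are never mutated
    n = len(data)
    for _ in range(steps):
        grad = [0.0] * len(w)
        for xs, y in data:
            err = sum(wi * xi for wi, xi in zip(w, xs)) - y
            for j, xj in enumerate(xs):
                grad[j] += 2.0 * err * xj / n
        w = [wi - lr * gj for wi, gj in zip(w, grad)]
    return w


def propose_block(prev_block: dict, proposer: str, local_data) -> dict:
    """Continue training from the previous block's weights and package the result."""
    return {
        "height": prev_block["height"] + 1,
        "weights": local_training_steps(prev_block["weights"], local_data),
        "proposer": proposer,
    }


# Usage: three validators; the first proposer trains on its own synthetic data.
validators = ["alice", "bob", "carol"]
rng = random.Random(1)
points = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(20)]
alice_data = [((x0, x1), 2.0 * x0 - x1) for x0, x1 in points]  # hypothetical local data

b0 = genesis_block(n_features=2)
proposer = select_proposer(validators, b0["height"])  # "alice" under round-robin
b1 = propose_block(b0, proposer, alice_data)
print(f"{proposer}: local MSE {mse(b0['weights'], alice_data):.3f} -> "
      f"{mse(b1['weights'], alice_data):.3f}")
```

Because each proposer starts from the previous block's weights rather than from scratch, what the model learned from earlier proposers' data is carried forward from block to block.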

Upon receiving an updated model, each validator evaluates its performance on its own local data set and compares it with that of the predecessor model. The validator broadcasts a positive vote for a model whose performance has improved and rejects an update whose performance has degraded on its training set. This process is repeated many times, with a different validator training the model in each epoch, until a fixed number of cycles has been completed or a target performance has been met. Because an update is accepted only if enough validators vote for it, attackers that attempt to “poison” the model, or validators whose data is incompatible with that of the majority, will not contribute to the learning process: their blocks are rejected.
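The voting step can be illustrated with the same kind of toy linear model. This sketch is a simplification built on assumptions rather than the protocol itself: each validator votes yes only if the candidate weights lower the mean squared error on its own local data, a proposer always endorses its own block, and a block needs more than two-thirds of the votes. The validator names, data points, and the `TRUE_W` relationship are hypothetical.

```python
def mse(weights, data):
    """Mean squared error of a linear model y ≈ w·x on (features, target) pairs."""
    return sum((sum(w * x for w, x in zip(weights, xs)) - y) ** 2 for xs, y in data) / len(data)


def vote(candidate, predecessor, local_data) -> bool:
    """Approve only if the candidate improves on its predecessor on this validator's data."""
    return mse(candidate, local_data) < mse(predecessor, local_data)


def tally(candidate, predecessor, validators, proposer, threshold=2 / 3) -> bool:
    """Collect one vote per validator and accept if more than `threshold` approve."""
    votes = {name: vote(candidate, predecessor, data) for name, data in validators.items()}
    votes[proposer] = True  # assumption: a proposer always endorses its own block
    return sum(votes.values()) / len(votes) > threshold


TRUE_W = [2.0, -1.0]  # hypothetical relationship underlying every validator's data


def make_local_data(points):
    """Label each feature pair with the noiseless target TRUE_W · x."""
    return [(xs, sum(w * x for w, x in zip(TRUE_W, xs))) for xs in points]


validators = {
    "alice": make_local_data([(1.0, 0.0), (0.5, 1.0), (-1.0, 2.0)]),
    "bob": make_local_data([(0.0, 1.0), (2.0, 1.0), (1.0, -1.0)]),
    "carol": make_local_data([(1.0, 1.0), (-0.5, 0.5), (3.0, 0.0)]),
    "eve": make_local_data([(1.0, 2.0), (0.0, -1.0), (2.0, 2.0)]),
}
previous = [0.0, 0.0]   # weights in the last accepted block
honest = [1.0, -0.5]    # an update moving toward TRUE_W
poisoned = [-2.0, 1.0]  # Eve's update, pushing predictions the opposite way

print(tally(honest, previous, validators, proposer="alice"))  # True: all four approve
print(tally(poisoned, previous, validators, proposer="eve"))  # False: only Eve approves
```

Because the honest update is half of `TRUE_W`, its squared error on every point is a quarter of the predecessor's, so every validator approves it; the poisoned update quadruples the squared error, so even with Eve's own vote it gathers only one of four votes.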

The collective learning system. 1. A decentralized network of validator nodes is formed by the staking of tokens. 2. The machine learning model is initialized with random weights, W0, which are entered into a block. Validators then take turns to propose updates in the form of blocks, W1, W2, … 3. A model “poisoning” attack by “Eve” is repelled by the other validators rejecting her block submission.

Preserving Privacy and Improving Performance

This blockchain-mediated collective learning system enables multiple stakeholders to build a shared machine learning model without needing to rely on a central authority. There are, however, many potential avenues for future improvement. We’re currently working on several important questions, such as “How are participants incentivized to behave well?”, “Who pays for the on-chain data storage?”, and “What happens to validators whose data is inconsistent with the others’?”. Alongside these issues, we’ve also been improving the stability and efficiency of collective learning, which we’ll describe in future articles and source code releases.

In the next article in this series, my colleague Emma Smith describes how the collective learning protocol can be used in the healthcare industry. She’ll also explain how privacy-preserving techniques from the DeepMind-sponsored OpenMined project can be used to protect the privacy of patients whose data is used in the collective learning system.