Unleashing Data with Collective Machine Learning

July 16, 2020

Fetch.ai’s blockchain platform enables people to work together to train machine learning models without requiring them to trust one another or a central authority. This allows the advantages of “big data” to be shared between organizations that are unable or unwilling to share data directly with each other. This situation arises frequently in many industries, such as financial services, where cybersecurity, competition or privacy laws prevent data sharing. Collective learning enables groups of individuals or businesses to work together to gain the benefits of scale and compete with the dominant players in their sector.

Control over Data and Machine Learning Models is Centralized

The field of machine learning has progressed rapidly over the past decade. Fast GPUs, huge datasets and deep neural networks have driven unprecedented performance in domains such as speech recognition, machine vision and predictive text. At the same time, open source software frameworks such as TensorFlow and PyTorch have placed powerful learning algorithms at the fingertips of anyone with access to a PC and an internet connection.

Progress in AI and its wider application have the potential to deliver economic growth, a greener future and improvements in our quality of life, but there are significant concerns over its current direction. These include the potential for discrimination according to race or gender, and the use of AI for harmful activities such as political manipulation, crime and warfare. In this article we describe an approach to another problem: the success of any AI business is increasingly determined by the data that it controls.

The centralization of data places enormous power in the hands of intermediaries, such as Google or Facebook, which preside over the personal data of millions of consumers. It also means that there is an intensifying race to extract this “new oil” across many industries. Consumers become the resource or product, at significant cost to their privacy and security, and there are growing concerns that their behavior could be manipulated.

For business, the race for data reduces competitiveness, as incumbents with access to large amounts of data are less easily displaced by newcomers. Established industries such as automotive and consumer products also fear that their profits will be corralled by large tech firms while they are reduced to mere commodities. A key question, then, is how to provide the benefits of aggregating large data sets to train better machine learning algorithms without resorting to the services of “Big Tech”.

The Path to Decentralized Machine Learning

One part of the solution is a technique known as federated learning. Rather than centralizing the data in one place, federated learning brings the model to the data or, more accurately, to the device where the data is stored. The model is then trained locally using the device’s processing power, and the updated model is returned to the central server. Privacy-preserving techniques can also be applied during training to ensure that private information is not “leaked” by the model. Federated learning improves security and privacy but leaves all aspects of the training process, and its financial benefit, under the control of a central intermediary.

Centralized training involves sending all data to a database. In federated training, the raw data remain on the device while the model is centralized.
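
To make the federated training loop concrete, here is a minimal sketch in plain Python. It is not Fetch.ai’s implementation: the one-variable linear model, the function names (`local_update`, `federated_average`, `make_device_data`), the synthetic data and the size-weighted averaging rule are all assumptions chosen for illustration. The point it shows is that each device returns only model weights, never its raw data.

```python
import random

def local_update(weights, data, lr=0.1, epochs=5):
    """Train a copy of the model on one device's private data.

    Plain gradient descent on mean squared error for y = w * x + b.
    Only the updated weights are returned; `data` never leaves this function.
    """
    w, b = weights
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in data:
            err = (w * x + b) - y
            grad_w += 2 * err * x
            grad_b += 2 * err
        w -= lr * grad_w / len(data)
        b -= lr * grad_b / len(data)
    return (w, b)

def federated_average(updates, sizes):
    """Server step: average the returned weights, weighted by local dataset size."""
    total = sum(sizes)
    w = sum(u[0] * n for u, n in zip(updates, sizes)) / total
    b = sum(u[1] * n for u, n in zip(updates, sizes)) / total
    return (w, b)

def make_device_data(rng, n):
    """Hypothetical private dataset: noisy samples of y = 3x + 1."""
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    return [(x, 3.0 * x + 1.0 + rng.gauss(0, 0.05)) for x in xs]

rng = random.Random(0)
devices = [make_device_data(rng, n) for n in (20, 40, 30)]

global_model = (0.0, 0.0)
for round_idx in range(30):
    # Each device trains locally, starting from the current global model.
    updates = [local_update(global_model, d) for d in devices]
    # The central server sees only the weights and combines them.
    global_model = federated_average(updates, [len(d) for d in devices])

print("learned (w, b):", global_model)
```

Note that the server in this sketch still decides when rounds happen and how updates are combined, which is exactly the central control that the rest of this article sets out to remove.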

Blockchains are the other half of the solution since they enable individuals to transact anonymously without needing to trust one another or a central authority. This enables incentives to be introduced that encourage good outcomes such as successful training of a shared model. Penalties can also be applied to someone consistently failing to fulfill their obligations or, worse, sending malicious updates in an attempt to “poison” the model. Blockchains also provide an incorruptible record of each participant’s actions. This feature enables reputations to be established in open networks and also provides audit trails that are essential in highly regulated industries such as healthcare or financial services.
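
The record-keeping and penalty ideas above can be illustrated with a toy, single-process sketch. This is not a blockchain and not Fetch.ai’s ledger API: there is no consensus, networking or digital signature here, and the `Ledger` class, its method names, the participant names and the stake amounts are all hypothetical. It only shows how chaining each entry to the hash of the previous one makes later tampering detectable, and how rewards and penalties could be recorded alongside each participant’s actions.

```python
import hashlib
import json

class Ledger:
    """Append-only log in which every entry commits to the previous entry's hash."""

    GENESIS = "0" * 64  # placeholder hash used before the first entry

    def __init__(self):
        self.entries = []

    @staticmethod
    def _digest(record):
        # Hash every field except the stored hash itself, in a stable key order.
        payload = {k: v for k, v in record.items() if k != "hash"}
        encoded = json.dumps(payload, sort_keys=True).encode()
        return hashlib.sha256(encoded).hexdigest()

    def append(self, participant, action, stake_change):
        prev_hash = self.entries[-1]["hash"] if self.entries else self.GENESIS
        record = {
            "participant": participant,
            "action": action,
            "stake_change": stake_change,
            "prev_hash": prev_hash,
        }
        record["hash"] = self._digest(record)
        self.entries.append(record)

    def verify(self):
        """Return True only if no entry has been altered, removed or reordered."""
        prev_hash = self.GENESIS
        for record in self.entries:
            if record["prev_hash"] != prev_hash:
                return False
            if record["hash"] != self._digest(record):
                return False
            prev_hash = record["hash"]
        return True

    def balance(self, participant):
        """Net stake of a participant, derived from the recorded history."""
        return sum(r["stake_change"] for r in self.entries
                   if r["participant"] == participant)

ledger = Ledger()
ledger.append("alice", "deposit_stake", 10)
ledger.append("bob", "deposit_stake", 10)
ledger.append("alice", "update_accepted", 2)   # reward for a useful model update
ledger.append("bob", "update_rejected", -5)    # penalty for a harmful update

print("chain valid:", ledger.verify())
print("alice:", ledger.balance("alice"), "bob:", ledger.balance("bob"))
```

If any stored entry is edited after the fact, for example by resetting the penalty to zero, `verify()` returns `False` because the recomputed digest no longer matches the stored hash. A real blockchain adds signatures and consensus so that no single party can simply rewrite the whole chain.
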
There are many technical hurdles that need to be overcome to make this type of multi-stakeholder federated learning successful. We have shown that the first stage can be solved using our collective learning prototype, which has been in development at Fetch.ai since the beginning of 2020. In the next article, I will describe the simplest version of our collective learning protocol and future directions for improving its efficiency and performance.