Collective Learning: COVID-19

Dec 9, 2020


2020 has been unlike any year in recent memory. The COVID-19 pandemic has dominated headlines around the world and reshaped almost every aspect of our collective lives. Alongside a dramatic loss of human life worldwide, it has caused unprecedented disruption to public health systems, economies, and societies.

The pandemic has also re-emphasized that access to quality healthcare is the cornerstone of a successful health response. We decided to support the fight against COVID-19 in the way we know best: using AI and blockchain.

Ten months into the outbreak, clinicians have developed a range of methods to identify patients with COVID-19, and one of the most effective so far is the chest X-ray. X-rays have long been used to diagnose ailments, and during the pandemic they are again proving valuable for diagnosing patients who are awaiting RT-PCR results, or who show COVID-19 symptoms despite a negative RT-PCR result. According to Science Daily, chest X-rays have a high positive predictive value of 83.3% for SARS-CoV-2 infection, which makes them an effective tool for diagnosing COVID-19.

Millions of healthcare professionals around the world are using chest X-rays to determine the next steps of treatment for COVID-19 patients. Despite this high efficacy, any effective healthcare treatment comes with a caveat: the amount of data it generates. The current outbreak is no different, with millions of gigabytes of data (the equivalent of a modest library) being generated each day in medical records to help analyse the virus.

But because the human brain can only process so much information at a time, the most effective way to analyse all this data is machine learning. According to the National Library of Medicine, machine learning based methods have shown unprecedented success in the reliable analysis of medical images. This is where the Collective Learning initiative steps in.

In the next section, we elaborate on Collective Learning and on how machine learning combined with blockchain can be used to “fill in the blanks” left by science in the campaign against COVID-19.

What is Collective Learning?

The Fetch.AI collective learning module is a tool that enables distributed parties to work together to train machine learning models without sharing the underlying data or trusting any of the individual participants.

This is very useful in scenarios involving personally identifiable information (PII), which cannot be shared without breaching compliance rules.

Collective Learning via decentralized protocols gives hospitals and doctors a network in which they can use their own private data. Here that data is chest X-rays of patients with pneumonia, labeled according to whether the patient tested positive for COVID-19, so that the trained model can serve as a rapid diagnostic tool. It can also be used to assess the severity of a patient’s condition, including recognizing the need for intubation or supplemental oxygen.

Multiple participants from anywhere in the world can securely train a shared machine learning model on their private data. By combining blockchain technology with machine learning, the network learns from that private data without ever having access to it.

How does it work?

The goal of the project was to identify COVID-19 cases from chest X-ray images and to distinguish them clearly from normal and pneumonia cases. The public open dataset used for the project consists of 478 images of suspected COVID-19 cases and 203 normal (neither COVID-19 nor pneumonia) images. To create an unbiased and numerically balanced dataset, we also used a secondary dataset consisting of 255 normal images and 478 pneumonia images. From these we built a chest X-ray dataset of 1434 images, 478 for each category. The images were cropped and resized to 512×512 pixels.
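The cropping and resizing step can be sketched as follows. This is only an illustration: the post does not say how the images were cropped or which interpolation was used, so the centered square crop, the nearest-neighbour sampling, and the function names (`center_crop_square`, `resize_nearest`, `preprocess`) are all assumptions made for this example.

```python
import numpy as np

def center_crop_square(img: np.ndarray) -> np.ndarray:
    """Crop the largest centered square from a 2-D grayscale image (assumed crop rule)."""
    h, w = img.shape
    side = min(h, w)
    top = (h - side) // 2
    left = (w - side) // 2
    return img[top:top + side, left:left + side]

def resize_nearest(img: np.ndarray, size: int = 512) -> np.ndarray:
    """Resize a 2-D image to size x size with nearest-neighbour sampling (assumed method)."""
    h, w = img.shape
    rows = np.arange(size) * h // size   # source row for each output row
    cols = np.arange(size) * w // size   # source column for each output column
    return img[rows[:, None], cols[None, :]]

def preprocess(img: np.ndarray, size: int = 512) -> np.ndarray:
    """Crop to a square, then resize to the 512x512 resolution named in the text."""
    return resize_nearest(center_crop_square(img), size)

# Class balance as described in the text: three categories, 478 images each.
IMAGES_PER_CLASS = 478
CLASSES = ("normal", "pneumonia", "covid")
TOTAL_IMAGES = IMAGES_PER_CLASS * len(CLASSES)

# Synthetic stand-in for one X-ray; real images would be loaded from disk.
example = preprocess(np.zeros((600, 800)))
```

In practice an image library with proper interpolation would likely be used instead of nearest-neighbour sampling; the point here is only the fixed 512×512 output and the balanced class counts.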

Here are some images from the dataset. The first shows a chest X-ray of a healthy person.

Image for post

The second image shows a chest X-ray of a COVID-19 positive person.

Image for post

The third image shows a chest X-ray of a pneumonia patient who does not have COVID-19.

Image for post

Because the dataset is small, we decided to implement a classical machine learning pipeline for the analysis instead of building a deep convolutional neural network. Here is how we did it:

  1. First we preprocessed the dataset and extracted a variety of features described in the research literature. We used spatial domain (Texture, GLDM, GLCM) and frequency domain (FFT and Wavelet) features to create a 256-dimensional vector representation of each image.
  2. To reduce the dimensionality of the data, we used principal component analysis. We took the eigenvectors corresponding to the 64 largest eigenvalues of the covariance matrix and projected the data onto this subspace, ending up with a 64-element vector for each image.
  3. A fully connected neural network was trained on these vectors to recognize COVID-19.
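The dimensionality reduction in step 2 can be sketched in a few lines of NumPy. This is a minimal illustration, not the project's actual code: the feature matrix below is random data standing in for the 256-dimensional feature vectors, and the function name `pca_project` is made up for this example.

```python
import numpy as np

def pca_project(features: np.ndarray, n_components: int = 64):
    """Project rows of `features` onto the top eigenvectors of their covariance matrix."""
    mean = features.mean(axis=0)
    centered = features - mean
    cov = np.cov(centered, rowvar=False)        # (d, d) covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)      # eigh returns eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]  # indices of the largest eigenvalues
    components = eigvecs[:, order]              # (d, n_components) orthonormal basis
    return centered @ components, components, mean

# Synthetic stand-in: 1434 images, each with a 256-dimensional feature vector.
rng = np.random.default_rng(0)
features = rng.normal(size=(1434, 256))

reduced, components, mean = pca_project(features, n_components=64)
```

A new image would be reduced with the same `mean` and `components`, so that training and inference use the same 64-dimensional subspace. The resulting vectors are what a small fully connected network, as in step 3, would take as input.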

As a next step, we set out to simulate a real-life scenario in which multiple participants each hold data that is private to them. Under those constraints it is not possible to pool everything into a global dataset and train a highly accurate network without infringing on privacy. Instead, we split the data and created nodes within our framework, so that a global model is trained by multiple parties, each using its own private data.

To measure the impact of our framework, we compared it to offline learning, which we define as follows. The setup is the same as for Collective Learning, but the nodes do not share weights with each other. Instead, each hospital or care provider that signed up as a participant trains the model using only its own private data. This simulation closely matches the current setup at most companies and research institutes. By comparison, our platform lets the nodes share knowledge and learn from each other without exposing their private data.
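The difference between the two setups can be illustrated with a toy simulation. The sketch below is an assumption-heavy stand-in: it uses logistic regression on synthetic data instead of the neural network, and plain weight averaging after each round as the sharing step. The actual Collective Learning protocol may combine updates differently. All names (`simulate`, `local_step`, `make_split`) are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def local_step(w, X, y, lr=0.1, epochs=5):
    """A few epochs of gradient descent on the logistic loss, using one node's private data."""
    for _ in range(epochs):
        grad = X.T @ (sigmoid(X @ w) - y) / len(y)
        w = w - lr * grad
    return w

def accuracy(w, X, y):
    return float(np.mean((sigmoid(X @ w) > 0.5) == y))

def simulate(node_data, rounds=20, share=True):
    """Train one model per node; if `share` is True, average the weights after every round."""
    dim = node_data[0][0].shape[1]
    weights = [np.zeros(dim) for _ in node_data]
    for _ in range(rounds):
        weights = [local_step(w, X, y) for w, (X, y) in zip(weights, node_data)]
        if share:
            # Only weights leave a node, never X or y (assumed sharing rule: simple average).
            avg = np.mean(weights, axis=0)
            weights = [avg.copy() for _ in weights]
    return weights

# Synthetic stand-in for 64-dimensional feature vectors split across five nodes.
rng = np.random.default_rng(0)
true_w = rng.normal(size=64)

def make_split(n):
    X = rng.normal(size=(n, 64))
    y = (X @ true_w + rng.normal(scale=0.5, size=n) > 0).astype(float)
    return X, y

node_data = [make_split(40) for _ in range(5)]
X_test, y_test = make_split(1000)

collective = simulate(node_data, share=True)    # nodes exchange weights
offline = simulate(node_data, share=False)      # each node trains in isolation

collective_acc = accuracy(collective[0], X_test, y_test)
offline_acc = float(np.mean([accuracy(w, X_test, y_test) for w in offline]))
print(f"collective: {collective_acc:.3f}  offline (mean over nodes): {offline_acc:.3f}")
```

With `share=True` every node ends each round holding the same model, which has indirectly seen all the data. With `share=False` each model only ever reflects its own 40 samples.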

The following plot compares the accuracy of the Collective Learning framework with that of offline learning. Collective Learning achieves higher accuracy than offline learning.

Image for post

To further validate these results, we computed what is known in machine learning as a confusion matrix.

A confusion matrix, also known as an error matrix, is a table layout that visualizes the performance of an algorithm, typically a supervised learning one. It shows the relationship between the true labels, given by the rows, and the predicted classes, given by the columns (see below). Elements on the diagonal represent correct predictions, while off-diagonal elements represent the different types of errors. For example, 18 normal X-rays were classified as COVID-19.

Of the 96 COVID-19 images, the algorithm classified 81 correctly. Similarly, of the 93 pneumonia images, it classified 82 correctly.
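To make the row and column convention concrete, here is a small sketch that builds a confusion matrix from true and predicted labels and derives the per-class fraction of correct predictions. The label arrays are a made-up toy example, not the project's data. Only the two ratios at the end use figures from the text.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    """Rows are true labels, columns are predicted labels."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(cm, (y_true, y_pred), 1)   # count each (true, predicted) pair
    return cm

def per_class_recall(cm):
    """Fraction of each true class that was predicted correctly (diagonal / row sum)."""
    return np.diag(cm) / cm.sum(axis=1)

# Toy labels (hypothetical): 0 = normal, 1 = pneumonia, 2 = COVID-19.
y_true = np.array([0, 0, 1, 1, 2, 2, 2])
y_pred = np.array([0, 2, 1, 1, 2, 2, 1])

cm = confusion_matrix(y_true, y_pred, n_classes=3)
recall = per_class_recall(cm)

# Per-class rates implied by the counts quoted in the text.
covid_recall = 81 / 96       # 0.84375
pneumonia_recall = 82 / 93   # about 0.882
```

In the toy matrix, the entry in row 0, column 2 counts a normal image predicted as COVID-19, the same kind of off-diagonal error as the 18 normal X-rays mentioned above.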

Image for post

Impact of Collective Learning

The decentralized collective learning protocol could successfully distinguish COVID-19 patients from those with pneumonia from other causes with an accuracy of 97%. In the context of the pandemic, wider implementation of our Collective Learning network would let us rapidly and efficiently combine information from hospitals across the globe to vastly improve prognostic predictions and ensure patients are given the right care. As of November 2020, Collective Learning has correctly identified COVID-19 cases from a training set of chest X-ray images submitted by hospitals and private practices, using a model trained across a distributed network with nodes in Los Angeles and London. As a result, it can give doctors a digital second opinion that confirms or questions their assessment of a patient’s condition.

Some graphs also highlight the impact of Collective Learning: a fast startup for new nodes, and accuracy that improves as more data enters the network. First, when a new node joins the network, it takes only one round for it to reach the network’s average accuracy. This fast startup is valuable to nodes that join later, because they get good accuracy without much effort. New nodes also improve the whole: with more data overall, the average accuracy rises.

Image for post

Image for post

It also helps that new nodes can join the network later, bringing new data and new knowledge into the system, which in the end increases the overall accuracy.

What next?

In many ways, the pandemic has highlighted the inadequacies of our healthcare systems and processes. On the other hand, it has also given projects like Collective Learning an opportunity to demonstrate how AI and blockchain, through advanced analytics, can augment decision making for healthcare providers. If you’re interested in collaborating or contributing to this project, do reach out to us at [email protected]