Machine Learning of When to ‘Love your Neighbour’ in Communication Networks

May 9, 2019


Peer-to-peer networks have many advantages over routed communication networks. By gossiping messages between its neighbours, it is possible for a node to pass a message to any other node in only a handful of steps. These communications occur without needing any centralised coordination, which makes them resilient to attack since the multiple paths between the different participants allow communications to be maintained even if some nodes malfunction or are corrupted. The flexibility and robustness of peer-to-peer networks are one of the key technologies that led to the development of the internet and are also integral to blockchain protocols.

A potential weakness of decentralised peer-to-peer networks is that their flexibility can lead to fragile or inefficient network topologies, particularly if they are subjected to attack by an adversary who can manipulate connections. This is particularly relevant to blockchain protocols where blocking or delaying messages could be highly profitable for the attacker. To resolve these issues, at Fetch.AI we have developed a networking protocol that uses machine learning algorithms to optimise the connections between blockchain nodes.

Figure 1. A partition of the network can enable a single attacker (node 41) to initiate a double-spend attack by submitting a transaction with a different destination address to the two halves of the network (coloured green and blue). This can be achieved without the attacker needing to control any of the stake or hash-power in the network.

This is a difficult problem as the entire network is dynamic and delays between nodes may vary greatly. The algorithm must function in a decentralised way and respond to a changing environment quickly. It must also serve the dual purposes of optimising network traffic during normal operation and preventing attacks such as partitioning (cf. Fig. 1) or an eclipse attack.

In this post, we study ways of increasing the network resistance by equipping nodes with a module that allows them to keep track of the quality of their connections. By evaluating the performance of their neighbours and by building a Bayesian trust model of the network, each node is able to:

  • prioritise connections to nodes that are providing them with high-quality data (correct blocks, unseen transactions) and the best connection (high availability, low latency), thereby improving the overall speed of message propagation in the network.
  • rapidly drop the connections to a suspected node if malicious behaviour is detected. This leads to malicious or spamming nodes being quickly isolated, eliminating their ability to attack or influence the network.

A lesson from chess

In the simplest form, the problem can be framed as a ranking task: how to choose an optimal set of neighbours without the need to test all possible configurations. The problem can be illustrated as follows. Consider, that you know three people, Alice, Bob and Cecil. You know the outcome of past interaction between “Alice and Bob” and between “Bob and Cecil”. How, based on that history, can we predict the outcome of an interaction between Bob and Cecil?

Fig. 2. How to determine the outcome of Bob and Cecil interactions’ based on their past history of interactions with other players?

This is similar to a problem that arises frequently in competitive sports such as chess where we can observe the results of pairwise matches between players, and we wish to use these to estimate an ordering or ranking of players that reflect their chess-playing ability at the current time. To answer this question, in 1959, Arpad Elo developed a ranking system that modelled probability of a given outcome (win, lose or draw) in a given game as a function of a player’s rating r_i and performance p_i. The rating is a continuously valued variable that represents the player’s current ability, while the performance is modelled as a random number drawn from a normal distribution centred around the player’s rating value, namely

with some fixed variance. It then assumes that a player i wins against player j if his performance is higher, namely, p_i > p_j, where the probability of such an outcome is given by

where y is the outcome of the game (+1 if player i won, 0 for a draw and -1 if player j won). The parameter regulates how large is the update (or how important the recent experience is compared to the previously accumulated knowledge). Note also, that the positive and negative updates are equal, thus the sum of the players’ ranting is constant.

Fig 3. Before the match, the most likely outcome is that Bob will beat Alice. If that expectation is confirmed, the distributions describing Alice’s and Bob’s expected performances are updated. Bob’s rating is increased and Alice’s rating decreases accordingly.
Fig. 4. If, on the contrary, Alice beats Bob, the distributions are shifted in the opposite direction. If the update rate parameter is large enough, it may mean that Alice’s new rating will be larger than Bob’s.

By choosing a reference value of, for example, 1000 for a new player, this method that can be used to estimate the ability of players and the likely outcome of a match between two players — even if they have never played against each other before.

Full Bayesian update

The ELO system (and its extensions) have been widely adopted in various games. However, it has several features that make it unsuitable for the network trust problem. The ELO system converges relatively slowly, as it assumes the same variance for every player. Additionally, in order to calculate a rating for different players, a global view of the system is needed, which is information that is not accessible to participants of the peer-to-peer network. This led us to introduce the following modifications:

  • We perform a full Gaussian update, that modifies both the mean and variance of the distribution. This enables the algorithm to converges quicker. Intuitively, the algorithm works in such a way, that an unexpected outcome results in a larger update to the mean value. This is consistent with our common-sense: if a friend was nice to you, you don’t need to change your opinion of him or her, so a positive interaction would lead to a relatively small update. However, if the same person cheated you, you have a reason to reevaluate your prior opinion about him or her. In this case, the update to your believes how “nice your friend is” will be larger.
  • In order to adapt our algorithm to the decentralised setting, each participant creates a rating of their peers and compares their interactions with a set of archetypes (“a peer that always has new data”, “a peer that never has new data”, “a peer that always responds”, “a peer that never responds to a query”, etc.). These archetypes help to establish the nodes’ expectations of their peers’ behaviour. A peer that often has new, previously unseen messages becomes closer to the “good” archetype, while a peer that rarely has information that is new or interesting to share has a decreased rating.
  • Each new peer is initially assumed to have a relatively broad prior rating distribution centred close to the upper end of the scale. This can be interpreted as a node having a positive opinion of newcomers but with a high degree of uncertainty. After each interaction with a peer, nodes update their belief in the peer’s honesty (or more generally, its ‘performance’).
  • If the performance of a neighbour decrease, with a certain probability, a node can replace that connection with a new node.

Finally, since the performance of different nodes may change over time (for example a previously honest node might have been bribed and started to act maliciously), a time-dependent drift is applied to each variance. This serves two purposes; first, the uncertainty in the rating of nodes that have not recently been peers is gradually increased. Second, the drift prevents the uncertainty being reduced below a certain minimal level.


We tested our approach in a simplified environment. Figure 5 presents beliefs: the solid line denotes the mean value and the shaded region represents the three-sigma uncertainty of the performance of two nodes. During the first 100 interactions, the first node (blue) shared interesting (unseen) data, while the second node (red) returned only invalid (or to put it another way, useless) data. During the second 100 interactions, the roles of the red and the blue node were swapped. Note, that the evolution of the beliefs is not symmetric. A node that was believed to be honest (blue) loses his reputation after just a few invalid messages. While the node that was believed to be bad (red) slowly starts to improve its reputation in the second half of the simulation, but cannot reach a level from which it started (the starting point was arbitrarily chosen to be 100).

Fig. 5. During the first 100 rounds, node B shares interesting (unseen) data, while node A returned only invalid data. Then during the second 100 interactions, the roles of the nodes are swapped. The solid line indicates the mean performance and the shaded area shows a standard deviation.

Comparison with different archetypes allows us to use the same Bayesian model and obtain qualitatively different responses to different situations. While broadcasting invalid messages reduces both the rank of the node and its future ability to improve its rating, a lack-of-response should perhaps be treated more forgivingly. In Figure 6, a belief about two other nodes is presented. The first node (green) provided new (unseen) data for 50 rounds, then was offline for 50 rounds, and then returned to provide new data for the final 100 rounds. The second node (yellow) followed a similar pattern, but instead of new data, it shared something less valuable, for example, valid transactions that had already been seen.

Fig. 6. Node C shares interesting (unseen) data, while node D shares less interesting (already seen) data. Both nodes are offline in rounds 51–100. The solid line indicates the mean performance and the shaded area shows a standard deviation.

In the next test, we simulated 11 nodes that have a different probability of sharing some useful data, from 0%, 10%, 20%, … to 90% and 100% as indicated by the labels on the vertices. Each node can query only two nodes (two incoming connections) but can share data with many nodes (unlimited outcoming connections). After each round, nodes can update their belief about incoming peers. They can also switch one connection if they believe that another node can achieve better performance.

Fig. 7. Nodes switch connections from low performing nodes to higher performing ones.

The simulation shows that nodes quickly learn to avoid peers that provide low-quality data. This prevents, for example, a spamming node from impacting the network. As soon as nodes discover that the data is of low quality, the spamming nodes will be isolated and their messages will not be propagated.

A careful reader might ask if such a mechanism can lead to centralisation — where only a handful of nodes serve as hubs. This would be unlikely to occur in a real system, as is shown in the following simulation, as transactions and blocks originate from different (random) nodes within the network. These nodes will constantly change which of their peers is providing them with interesting (“unseen”) data. Additionally, the beliefs about the node’s rank are subjective, so depending on the connections, the same node could be considered to have a high level of performance by one of its peers — and low performance by another.

To illustrate this point we created a highly partitioned network, cf. Fig. 8.

Fig. 8. An example of a partitioned network.

The different partitions are denoted by contrasting colours. We run a simulation, in which a random node in the system produces a block, that is subsequently propagated through the network. Each node can maintain only two incoming connections, but an unlimited number of outgoing links. By observing which node is supplying interesting (previously unseen) blocks, nodes can build private beliefs about a peer’s performance. Periodically, nodes can choose to replace one connection if they believe it will improve their access to new data. We measured the average block propagation time and the overall cost of maintaining the internet connections (where the cost of connections between different regions was 10 times the cost of connection within a region of one colour). The results of the simulations are presented below, in Fig. 9.

Fig. 9. On the left are the connections between different nodes. The numbers denote the number of outgoing connections from a particular node. A high value indicates a popular node that can influence a large number of its peers. On the upper-right side, the average block propagation time is shown in ms and on the bottom-right side, the overall cost of maintaining the network’s connections is shown. Note that we were able to significantly reduce the propagation time, while the cost increased only by a few per cent.

This resulted in the network having a far better topology with less partitioning, and this was achieved by switching only a few connections (as you can see comparing the figures 9 and 10, the majority of links remained the same).

Fig. 10. The final topology of the network. Switching just a few connections (cf. with Fig. 9) decreased the network partition and reduced the average propagation time.


We believe that such a self-organising network will improve the block and messaging propagation time and will make the network more resistant to partitioning. We continue to research applications of our ranking model and we are excited to see it incorporated into our smart ledger design.