Fetch.ai Ledger Benchmarking II — Single Lane Performance
Mar 26, 2019
This is the second in a series of articles about benchmarking. If you have not read the system overview benchmark, please see Benchmarking I — Overview and Architecture.
In this post, we discuss single shard performance to quantify the maximum expected throughput of the system. We test the system under various conditions and make projections for the multi-shard performance. Working under the assumption that the average transaction sizes are around 2048 bytes, we can show that the testnet achieves peak rates of 30k transactions/sec or more in non-trivial topologies.
The initial set of tests that we completed were conducted using a sequence of nodes connected into a chain. The primary reason for selecting this test is to understand and interpret the results of how the gossip-based architecture of the network performs. The results were generated using a selection of cloud computing resources across the US and Europe. Due to the availability of resources the results were generated with a selection of intel based processor architectures. We chose a network topology as follows:
Figure 1: Here the orange node represents the entrypoint for data and the green node the exit point. We expect that this heterogeneous selection of processors, together with the transatlantic data transfers, more closely resemble main network conditions.
One of the key reasons to look at a chain setup is that for any graph it is to be expected that propagation will be dominated by the shortest chain (under the assumption of roughly uniform propagation times). One of the pitfalls of studying chains is that they do not truthfully reflect the need for handling echoing in the system. Performance analysis figures outlined here should be considered with this in mind.
In this post we will focus on two main processes inside the ledger, which are ultimately network-bound. These are the block propagation and the transaction synchronisation respectively.
In this section, we examine the block propagation of our system. Inside our ledger, blocks are gossiped from node to node. The block propagation is an important metric for the system since it will give a lower bound on a block time (the average time between blocks in the blockchain), depending on the consensus scheme.
In a proof-of-work scheme block latency will extend the time miners spend wastefully hashing their blocks, while in elected leader schemes it will delay mining of the next block. This is especially notable in the case where the elected leader fails to submit a block: nodes must have an upper bound for expected block time to avoid waiting forever.
For this test, nodes were connected in the chain topology as described above. The size of the chain was altered from two nodes to seven nodes. Given the setup of the chain, each hop between nodes had a latency of ~100ms (because each node was either side of the Atlantic). This was done to minimise the variance of network latency to the test.
The number of nodes was chosen to reflect the expected topology of miners — for a small world model the expected average path length tends to ln(N)/ln(K) where N is the node number and K is the mean degree. Hence, for a network of 100,000 nodes with an average of five connections, the average path length is ~7.15. In practice, however, we expect nodes to self-organise using the Fetch.AI trust framework, so as to minimise this mean path length.
In this figure below we present the results of the block propagation. The size (in transactions) of the block was varied as well as the length of the chain of nodes.
The block propagation time is linear with the number of nodes as is evidenced by the graph. This is the expected result, especially given the chain topology and the gossip protocol. However, the gradient of the line as the block size increases could be better. Further analysis points ultimately to the size of the blocks. Planned improvements to our internal serialisation will increase performance.
The other major network test that was performed was transaction synchronisation. This tests the propagation of transactions that have entered a single node on the network to all other nodes in the network.
In this test setup it is assumed all valid transactions that arrive at any node should eventually be seen by all nodes. In practice, however, depending on transient conditions, transactions with a very low fee may not be prioritised for synchronisation.
The synchronisation of transactions is important to system performance in two ways:
- Transactions cannot enter a block until they are seen by a mining node. Therefore, in reality the latency for transactions to enter the blockchain is a function of their synchronisation speed as well as their attractiveness to nodes (fee).
- Transaction propagation ideally is on the same order of magnitude or better than block propagation; once a block has been mined, it is optimal for nodes to have already received and verified the corresponding transactions before they receive the block. If they are missing transactions, they must query and receive these from the network. As the block cannot be verified until this is complete it reduces the effective useful block time of the network, reducing throughput.
Transaction synchronisation in our ledger differs in two major ways from block propagation: conceptually transactions are ‘pulled’ in blocks from one node to another and that transactions are verified before retransmission. The rationale for a pull-based synchronisation mechanism is that a push-based protocol is likely to suffer from high overheads from message spamming, and that the system is likely to react poorly to high load.
For an initial test, nodes were connected in a chain configuration (varying in length from 1 to 8) and 250k transactions were submitted to the first node in the chain. Nodes were run as separate processes locally on a single server. This initial test was to evaluate the performance of the system in this best case scenario where the network latency was effectively zero. This approach was to verify our assumption that the system performance in this setup would be CPU bound due to the process of verifying transactions.
This can be seen from the figure above — doubling the number of CPUs available on the machine resulted in a significant performance improvement. This validates our model that process will be CPU bound. Further execution level analysis also verified that the hot path for each of these processes were calls to OpenSSL cryptographic library. Specifically these were the calls required to verify the signature(s) inside the transactions.
Next, we performed a more realistic evaluation of network delay. A series of chain lengths were selected (2, 3, 4 and 5). For each chain of nodes, individual machines were deployed as a network and 1,000,000 transactions were submitted to the beginning of the chain. In each experiment the time was measured between transaction submission and transaction execution. For this test we were primarily interested in the synchronisation performance, so for this test we disabled actual transaction execution. In this way we more accurate benchmark for this process.
A summary of the observed results is shown in the figure below:
For this set of tests, the system load quickly becomes CPU bound. This is because the transaction verification tasks dominate. This is primarily due to the fact that the transaction sync stage must verify all transactions before they are transferred onto another node. We will be reviewing this design decision in the future. This explains the observed slowdown based on the number of hops that is required to achieve full network synchronisation: as this chain length parameter grows, the effective transaction throughput of the system is reduced, which is the expected behaviour.
When we consider the rates for each 1 second internal, the following distribution is observed for the 5 node chain case:
In summary, our tests have highlighted the CPU intensity of the transaction synchronisation protocol. Given this, the underlying mechanism is able to batch the transaction quite well leading to reasonable average transaction throughput rates.
These tests provide valuable information on the performance of our system. It is clear that careful tuning of the system will be required in order to ensure the network is working efficiently. As discussed, this will be a multi-variable problem depending on the topology of the network, the effective transaction input rate and the computation power available.
It should be noted, however, that we expect this system to scale linearly with the number of lanes (shards) that are present in the system. Due to current limitations of the ledger software, this scaling cannot be tested at this time beyond the modelling already performed. When this feature is implemented, we will evaluate the performance again to validate this claim.
A significant proportion of the computational effort in this subsystem is the verification of the incoming transactions. Any possible future improvements that are made to this process will have a large positive knock-on effect on performance in this subsystem.
The current performance of the ledger with today’s implementation is satisfactory: it provides enough throughput for the Fetch.AI system to function with a suitable margin remaining. We are, though, aware of many areas where further improvements, technology and optimisations can increase the performance significantly. Over the coming months, we will be implementing, benchmarking and releasing these improvements.