August 7, 2019

Testnet Table Corruption Post-mortem

On July 26, 2019, the Telos Core Devs (TCD) reported on their postmortem investigation of the corrupted ‘producers’ table on the Telos testnet, a problem that required a roll-back and restoration of the testnet. The ‘producers’ table keeps track of the block producers, their produced and missed blocks, payments, votes received, and other important information. The chain cannot function if this table is corrupted. Until the root cause of this problem was found, upgrading to the next version of eosio on our mainnet was potentially unsafe because it risked creating the same problem there. The TCD has therefore investigated this issue with an abundance of caution. Fortunately, the root cause has now been discovered and the update of the Telos mainnet can proceed.

The Cause

The corruption of the ‘producers’ table was caused by a difference in structure between the testnet version of this table and the mainnet version. The testnet has been operating in its current form since long before the mainnet was launched in December 2018. When the feature to unregister ("unreg") non-functional block producers from the schedule was developed, an additional table column was added to record the unreg reason. This column exists in the mainnet version of the ‘producers’ table but not in the testnet version, which had been created previously. The structure of an EOSIO table cannot be changed once the table has been created and filled with data. When the mainnet code was pushed to the testnet in order to test REX creation features, the functions written for the Telos mainnet, which expect one extra table column for the unreg reason, conflicted with the structure of the testnet table, which does not have this column. As a result, the table was corrupted: it received data of a different type that was meant for a different column.
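The general failure mode can be illustrated with a minimal Python sketch. It is not the actual Telos row format: the field names, types, and layouts below are hypothetical simplifications, and only the idea of an extra unreg reason column comes from the description above. The sketch packs a row using the layout with the extra column and then reads the same bytes back using the layout without it.

```python
import struct

# Hypothetical, simplified row layouts (little-endian, no padding).
# The real 'producers' table has different and more numerous fields.
OLD_FIELDS = ["owner", "total_votes", "is_active", "unpaid_blocks", "missed_blocks"]
OLD_FORMAT = "<QdBII"

# Same layout with one extra 32-bit column inserted for the unreg reason.
NEW_FIELDS = ["owner", "total_votes", "is_active", "unreg_reason",
              "unpaid_blocks", "missed_blocks"]
NEW_FORMAT = "<QdBIII"


def encode(fmt, fields, row):
    """Serialize a row dict into bytes in the given field order."""
    return struct.pack(fmt, *(row[name] for name in fields))


def decode(fmt, fields, data):
    """Deserialize bytes into a row dict; trailing bytes are ignored."""
    return dict(zip(fields, struct.unpack_from(fmt, data)))


new_row = {
    "owner": 1,
    "total_votes": 100.0,
    "is_active": 1,
    "unreg_reason": 3,
    "unpaid_blocks": 12,
    "missed_blocks": 0,
}

# Code built for the layout with the extra column writes the row...
stored = encode(NEW_FORMAT, NEW_FIELDS, new_row)

# ...and a reader that assumes the layout without it misreads the bytes.
misread = decode(OLD_FORMAT, OLD_FIELDS, stored)
print(misread)
```

In this sketch the unreg reason value is read as the unpaid block count, and the unpaid block count is read as the missed block count. No error is raised, which is part of what makes this kind of mismatch hard to spot.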

The Path Forward

One reason this problem was difficult to find is that it does not exist on the Telos mainnet, where the table structures match the source code. While this proved vexing in debugging, the very good news is that no action is necessary to address this issue on the Telos mainnet. The process of upgrading the Telos system contracts to support the changes of the Telos Economic Development Plan (TEDP), including REX staking, can therefore proceed immediately, and the upgrade to eosio 1.8.x can follow thereafter.

Restarting Telos Testnet

The function of a testnet is to replicate a mainnet as closely as possible so that code can be tested there and expected to perform identically on the mainnet. Due to this error in the ‘producers’ table, the current testnet does not fit this purpose and is unsuitable for testing — particularly for testing this function. Therefore, the TCD and Telos block producers are creating a new testnet to replace the current one. This will allow new functions to be tested before implementing them on our mainnet.
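One way to guard against this kind of drift is to compare the field list of a deployed table with the field list the source code expects before pushing new code. The following is a minimal sketch of such a check, not a tool the TCD has described using. The helper name `diff_fields` and the field lists are hypothetical; in practice the lists would have to be taken from the deployed contract's ABI and from the source being deployed.

```python
def diff_fields(deployed, source):
    """Compare two ordered lists of (name, type) pairs position by position.

    Returns a list of (position, deployed_field, source_field) tuples for
    every position where the two layouts disagree. A missing field is
    reported as None.
    """
    problems = []
    for position in range(max(len(deployed), len(source))):
        have = deployed[position] if position < len(deployed) else None
        want = source[position] if position < len(source) else None
        if have != want:
            problems.append((position, have, want))
    return problems


# Hypothetical, simplified field lists for illustration only.
testnet_fields = [
    ("owner", "name"),
    ("total_votes", "float64"),
    ("is_active", "bool"),
]
source_fields = [
    ("owner", "name"),
    ("total_votes", "float64"),
    ("is_active", "bool"),
    ("unreg_reason", "uint32"),
]

problems = diff_fields(testnet_fields, source_fields)
for position, have, want in problems:
    print(f"field {position}: deployed table has {have}, source code expects {want}")
```

An empty result means the two layouts agree in this simplified model. Any non-empty result would be a reason to stop before deploying.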

The current Telos testnet will be deprecated in the near future, following a period of time for developers working there to migrate their test applications to the new version. This is perfectly within the expected life-cycle of a testnet. Periodic replacement of the Telos testnet is expected and needed to prune unnecessarily large storage files that result from prolonged operation of the testnet. The Telos block producers will determine the official date for retiring the previous testnet. Until that date, there will be two testnets, with the newer one being used by the TCD and the older one still reflected on public block explorers.

Updating Telos Mainnet

The TCD has already produced the new source code necessary to update the Telos mainnet for TEDP and REX. Before new code is pushed to the mainnet, the TCD must merge and compile the proposed code and conduct unit tests to ensure it is performing as intended (and perform fixes where it does not). This unit testing phase is expected to conclude by Friday, August 9th. Once complete, this code will be pushed to the new testnet for operations testing. Upon successful deployment there, the updated code will be proposed as a multisig transaction for the Telos block producers. When approved by the BPs, these new functions will run on the Telos mainnet and the features of the TEDP and REX will be active.

About the author: Douglas Horn is the Telos architect and whitepaper author, and the founder of GoodBlock, a block producer and app developer for the Telos Blockchain Network.

More about GoodBlock can be found at:

Join us on Twitter @GoodBlockio

Vote for GoodBlock on the Telos Blockchain Network @goodblocktls