Ethereum can host decentralized applications (DAPPs). These applications run as small programs that live on the Blockchain: Smart Contracts.

But when I write a Smart Contract, where is my application data stored?

We want to understand how data storage works before building on this platform. Code execution, servers and programming languages are rarely critical to the design of an application. But the Data, its structure, and its security will constrain most of our design.

Let’s imagine we are porting apps to Ethereum:

  • For a Facebook-like app, where are the posts and comments stored?
  • For a Dropbox-like app, where are my private files?
  • Or for a Slack-like chat app, where do we store discussion channels? What about private messages?


The Account Machine

Let’s forget about the Blockchain for a minute: we already know it’s a machine that generates Consensus. Setting it aside, we can assume that “Ethereum is a big, slow, reliable computer.”

We’re looking at the Ethereum System from a higher level of abstraction: the Software part.

Ethereum holds a set of accounts. Every account has an owner and a balance (some Ether).

If I prove my identity (by signing with my private key), I can transfer Ether from my account to another. The money flows from one account to the other in a single atomic operation called a Transaction.

The Ethereum Software is a Transaction Processing System:
1. We have a state: the set of all accounts and their balances,
2. We apply one or more transactions,
3. We get a new state: an updated set of accounts and their balances.
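The three steps above can be sketched as a pure state-transition function. This is a minimal Python model, not Ethereum’s implementation: accounts are plain strings, balances are integers, and signatures, gas and nonces are left out on purpose.

```python
from dataclasses import dataclass
from typing import Dict, List

State = Dict[str, int]  # account name -> balance (think wei)


@dataclass(frozen=True)
class Transaction:
    sender: str
    recipient: str
    value: int


def apply_transaction(state: State, tx: Transaction) -> State:
    """Return a new state with the transfer applied; an invalid transfer changes nothing."""
    if tx.value < 0 or state.get(tx.sender, 0) < tx.value:
        return state  # atomic: all or nothing
    new_state = dict(state)
    new_state[tx.sender] = new_state.get(tx.sender, 0) - tx.value
    new_state[tx.recipient] = new_state.get(tx.recipient, 0) + tx.value
    return new_state


def apply_transactions(state: State, txs: List[Transaction]) -> State:
    # 1. start from a state, 2. apply transactions in order, 3. get a new state
    for tx in txs:
        state = apply_transaction(state, tx)
    return state


genesis = {"alice": 100, "bob": 0}
after = apply_transactions(genesis, [
    Transaction("alice", "bob", 30),
    Transaction("bob", "carol", 50),  # bob only has 30: rejected
])
print(after)    # {'alice': 70, 'bob': 30}
print(genesis)  # {'alice': 100, 'bob': 0}: the old state is untouched
```

Returning a new state instead of mutating the old one keeps the "state in, transactions applied, state out" shape of the list visible.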

And that’s it!

Today, we look at the ability to execute code within a transaction. That’s where Smart Contracts come into play.

Robot Accounts

Every account has an owner and a balance. But some of these accounts are special: they own themselves. At creation time, we give them a piece of code and some storage. That’s a Smart Contract.

A Smart Contract is a smart bank account. I find the term “Contract” unclear; think of them as Robot Accounts.

It’s a Robot that executes some code when it receives a transaction. That transaction happens within the Blockchain: it’s public, replicated and validated by the network.
A Smart Contract won’t fail because of a power outage in a datacenter.

A Smart Contract has a balance, some code, and some storage. This storage is persistent, and that’s where we’ll find a DAPP’s data.
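To make the Robot Account idea concrete, here is a toy Python model: an account with a balance, some code (a plain Python function standing in for EVM bytecode) and a storage dictionary that survives between transactions. `RobotAccount` and `counter_code` are hypothetical names for illustration, not Ethereum APIs.

```python
from typing import Callable, Dict

Storage = Dict[int, int]


class RobotAccount:
    """Toy Smart Contract: a balance, some code and persistent storage."""

    def __init__(self, code: Callable[["RobotAccount", str, int], None]):
        self.balance = 0
        self.code = code             # fixed at creation time
        self.storage: Storage = {}   # persists across transactions

    def receive(self, sender: str, value: int) -> None:
        # A transaction "awakens" the robot: it takes the Ether, then runs its code.
        self.balance += value
        self.code(self, sender, value)


def counter_code(account: RobotAccount, sender: str, value: int) -> None:
    # Hypothetical contract logic: count the transactions received so far.
    account.storage[0] = account.storage.get(0, 0) + 1


robot = RobotAccount(counter_code)
robot.receive("alice", 5)
robot.receive("bob", 0)
print(robot.balance, robot.storage)  # 5 {0: 2}
```

The counter in slot 0 is the "DAPP data": it is only reachable through the robot’s own code.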

Storage of Robot Accounts

When a Smart Contract is created, or when a transaction awakens it, the Contract’s code can read from and write to its storage space.

Storage Specifications:

It’s a big dictionary (key-value store) that maps keys to values.
Keys are strings of 32 bytes, i.e. $32 \times 8 = 256$ bits, so we can have $2^{256}$ different keys. Same for values.

It’s similar to a Redis, RocksDB or LevelDB store.

A DAPP and its Smart Contracts may use this storage the way a regular program uses hard-drive storage.
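A sketch of that dictionary, treating keys and values as 256-bit integers (one 32-byte word each). One detail mirrors the EVM: a key that was never written reads as zero, so writing zero amounts to deleting the key. The `ContractStorage` class is a hypothetical model; real contracts reach their storage through the `SLOAD` and `SSTORE` instructions.

```python
from typing import Dict

WORD_BITS = 256
MAX_WORD = 2 ** WORD_BITS - 1  # a 32-byte key or value holds 0 .. 2**256 - 1


class ContractStorage:
    """Sketch of a contract's key-value store: 32-byte keys to 32-byte values."""

    def __init__(self) -> None:
        self._slots: Dict[int, int] = {}

    def load(self, key: int) -> int:
        self._check(key)
        return self._slots.get(key, 0)  # keys never written read as zero

    def store(self, key: int, value: int) -> None:
        self._check(key)
        self._check(value)
        if value == 0:
            self._slots.pop(key, None)  # storing zero is the same as clearing
        else:
            self._slots[key] = value

    @staticmethod
    def _check(word: int) -> None:
        if not 0 <= word <= MAX_WORD:
            raise ValueError("does not fit in 32 bytes")


s = ContractStorage()
s.store(2 ** 255, 42)    # any 256-bit number is a valid key
print(s.load(2 ** 255))  # 42
print(s.load(7))         # 0
```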

Here’s an example of a data structure used by a Smart Contract. It’s written in the Solidity programming language:

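// Solidity code (from solidity.readthedocs.io)
struct Voter {
    uint weight;      // voting power
    bool voted;       // true once this voter has voted
    uint8 vote;       // index of the chosen proposal
    address delegate; // person this voter delegated to
}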
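Each `Voter` ends up as 32-byte values in that key-value store. Following Solidity’s documented storage layout rules for value types, fields are placed in declaration order and small ones share a 32-byte slot when they fit, so a `Voter` needs only two slots. The sketch below handles only the four types used here; it ignores nested structs, arrays and mappings, which follow extra rules.

```python
from typing import Dict, List, Tuple

# Byte sizes of the Solidity value types used in Voter.
TYPE_SIZES = {"uint": 32, "bool": 1, "uint8": 1, "address": 20}
SLOT_SIZE = 32


def layout(fields: List[Tuple[str, str]]) -> Dict[str, Tuple[int, int]]:
    """Map each field to (slot, byte offset), packing small fields together."""
    result: Dict[str, Tuple[int, int]] = {}
    slot, offset = 0, 0
    for name, type_name in fields:
        size = TYPE_SIZES[type_name]
        if offset + size > SLOT_SIZE:  # doesn't fit: move to the next slot
            slot, offset = slot + 1, 0
        result[name] = (slot, offset)
        offset += size
    return result


voter = [("weight", "uint"), ("voted", "bool"), ("vote", "uint8"), ("delegate", "address")]
for field, position in layout(voter).items():
    print(field, position)
# weight (0, 0)
# voted (1, 0)
# vote (1, 1)
# delegate (1, 2)
```

`voted`, `vote` and `delegate` take 1 + 1 + 20 = 22 bytes, which is why they can share the second slot.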

$2^{256}$ keys × 32 bytes (values) is around $10^{63}$ petabytes. Reading through this amount of data with an SSD would take many billions of times the age of the universe.
We can assume there’s no storage limit for a DAPP.
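These orders of magnitude can be checked with a few lines of Python. The SSD throughput (1 GB/s) and the age of the universe (about 13.8 billion years) are assumptions added for this check.

```python
KEYS = 2 ** (32 * 8)   # 32-byte keys: 2**256 of them
VALUE_BYTES = 32       # one 32-byte value per key
PETABYTE = 10 ** 15    # decimal petabyte, in bytes

total_bytes = KEYS * VALUE_BYTES
total_petabytes = total_bytes / PETABYTE
print(f"{total_petabytes:.1e} PB")  # 3.7e+63 PB

# Assumptions: an SSD reading 1 GB/s; a universe about 13.8 billion years old.
SSD_BYTES_PER_SECOND = 10 ** 9
AGE_OF_UNIVERSE_S = 13.8e9 * 365.25 * 24 * 3600

seconds_to_read = total_bytes / SSD_BYTES_PER_SECOND
print(f"{seconds_to_read / AGE_OF_UNIVERSE_S:.0e} ages of the universe")
```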

But there’s a cost:

DAPP Fuel

For every transaction, we add some Ether to pay for gas (fuel). The emitter of the transaction pays this fee (gas used × gas price) to motivate the miners to process the transaction. Miners keep the network reliable, and we reward them with some Ether. Gas is the fuel of the Ethereum Machine.

We send transactions and some fuel to this big machine. When a transaction targets a Smart Contract, the Ethereum machine starts the account’s Robot. Each action of this Robot burns some more gas.

These Robot actions are instructions of the Ethereum Virtual Machine (EVM). There are instructions to read from storage (SLOAD), instructions to write to it (SSTORE), etc. They all have a cost in gas, and that cost will constrain how much storage we may use.
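The metering idea can be sketched as a tiny stack machine in Python: every instruction burns a fixed amount of gas, and execution stops when the fuel runs out. The instruction names echo real EVM opcodes, but the gas prices in `GAS_COST` are illustrative assumptions; the real schedule is set by the Ethereum specs and has changed over time. Unlike the real EVM, this toy does not roll back earlier writes when it runs out of gas.

```python
from typing import Dict, List, Tuple

# Illustrative per-instruction gas prices (assumed, not the current schedule).
GAS_COST = {"PUSH": 3, "ADD": 3, "SLOAD": 200, "SSTORE": 20_000}


class OutOfGas(Exception):
    pass


def run(program: List[Tuple[str, int]], storage: Dict[int, int], gas: int) -> int:
    """Execute a tiny stack program, burning gas per instruction; return gas left."""
    stack: List[int] = []
    for op, arg in program:
        if gas < GAS_COST[op]:
            raise OutOfGas(op)  # the Robot stops when the fuel runs out
        gas -= GAS_COST[op]
        if op == "PUSH":
            stack.append(arg)
        elif op == "ADD":
            stack.append(stack.pop() + stack.pop())
        elif op == "SLOAD":
            stack.append(storage.get(stack.pop(), 0))  # key on top of the stack
        elif op == "SSTORE":
            key, value = stack.pop(), stack.pop()      # key on top, then value
            storage[key] = value
    return gas


# storage[0] = storage[0] + 1
increment = [("PUSH", 1), ("PUSH", 0), ("SLOAD", 0), ("ADD", 0), ("PUSH", 0), ("SSTORE", 0)]
storage: Dict[int, int] = {}
left = run(increment, storage, gas=30_000)
print(left, storage)  # 9788 {0: 1}
```

Note how the single storage write dominates the bill: the arithmetic is cheap, persisting the result is not.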

Storage Cost

The cost of each instruction in a Smart Contract limits the amount of storage it uses. Ethereum offers a practically unlimited storage space, BUT you have to provide gas for every read and write operation.

This cost, in Euros, changes all the time: the gas used by each instruction is set by the Ethereum specs, which evolve, while the price of gas and of Ether depends on the network load and the market. To get a general idea of the pricing, I simulated a few Smart Contracts:
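Those moving parts combine into one formula: gas used × gas price (in Ether) × Ether price (in Euros). A sketch, where the 20 gwei gas price and the 1,000 EUR Ether price are made-up values; 21,000 gas is the base cost of a plain Ether transfer.

```python
GWEI_PER_ETH = 10 ** 9


def tx_cost_eur(gas_used: int, gas_price_gwei: float, eth_price_eur: float) -> float:
    """Fee in euros: gas used (protocol) x gas price (market) x Ether price (market)."""
    fee_eth = gas_used * gas_price_gwei / GWEI_PER_ETH
    return fee_eth * eth_price_eur


# Hypothetical market values, for illustration only.
print(round(tx_cost_eur(21_000, gas_price_gwei=20, eth_price_eur=1_000), 2))  # 0.42
```

Doubling either market price doubles the bill, even though the contract’s code, and therefore its gas usage, stays the same.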

I tried three operations:

  1. Writing a uint8 (one byte) in storage,
  2. Incrementing a uint8 in the storage (read then write),
  3. A simple voting function. It checks that the emitter of the transaction hasn’t voted yet and that the proposal exists, then updates the vote result. You can vote only once; the second attempt is short-circuited.
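Why the second vote attempt is cheaper shows up in a Python model of operation 3: the first call writes three storage fields, the second returns before writing anything. This model mirrors the logic of the Solidity `vote` function only; it does not compute gas, and the names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Voter:
    weight: int = 0
    voted: bool = False
    vote: int = 0


def vote(voters: Dict[str, Voter], vote_counts: List[int], sender: str, proposal: int) -> int:
    """Mirror the Solidity vote logic; return how many storage fields were written."""
    voter = voters.get(sender, Voter())  # unset mapping entries read as zeros
    if voter.voted or proposal >= len(vote_counts):
        return 0  # short-circuit: no storage write at all
    voters[sender] = voter
    voter.voted = True
    voter.vote = proposal
    vote_counts[proposal] += voter.weight
    return 3  # voted, vote, and the proposal's counter


voters = {"alice": Voter(weight=1)}
counts = [0, 0]
first = vote(voters, counts, "alice", 1)
second = vote(voters, counts, "alice", 0)
print(first, second, counts)  # 3 0 [0, 1]
```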

Code and tools are in the Appendix below. Here are the numbers:

According to this table, storing this article (without pictures) in a Smart Contract would cost around 50 Euros.
Posting a tweet would cost a few euros, and placing an order on Amazon a few cents.
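Long texts get expensive because storage is billed per 32-byte slot. A rough sketch that counts only the storage writes (it ignores the base transaction cost, the data needed to send the text, and the extra slot Solidity uses for a string’s length), using 20,000 gas for writing a non-zero value into an empty slot, the `SSTORE` cost given in the Ethereum Yellow Paper. The 280-character tweet is a hypothetical input.

```python
import math

SLOT_BYTES = 32
SSTORE_NEW_SLOT_GAS = 20_000  # writing a non-zero value into an empty slot


def storage_gas_for_text(text: str) -> int:
    """Gas to write text into fresh 32-byte slots (storage writes only)."""
    n_bytes = len(text.encode("utf-8"))
    n_slots = math.ceil(n_bytes / SLOT_BYTES)
    return n_slots * SSTORE_NEW_SLOT_GAS


tweet = "x" * 280  # hypothetical 280-character tweet
print(storage_gas_for_text(tweet))  # 180000
```

Converting this gas into euros then depends on the gas price and the Ether price.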

These are estimates: orders of magnitude. The exact cost depends on the exact instructions you use, as well as on the network load, the gas market, etc. New algorithms in Ethereum, such as Proof of Stake, might push the price down.

Two implications of the Blockchain Architecture

We can ignore the Blockchain to understand how a DAPP stores data in Ethereum. But the Blockchain implies a few properties.

1. You can read from the Blockchain for free
If you install an Ethereum client and join the network, the client synchronizes all the blockchain data, even without mining. All the data of every DAPP on Ethereum is then available on your machine, without any gas cost.
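With a synchronized client, reading a contract’s storage is a local query rather than a transaction. A sketch using the standard Ethereum JSON-RPC method `eth_getStorageAt`, which takes an address, a slot position and a block tag. The node URL (a common local default) and the contract address are placeholders, and `read_slot` only works against a running node.

```python
import json
import urllib.request

# Placeholders: replace with a real contract address and your node's URL.
NODE_URL = "http://localhost:8545"
CONTRACT = "0x" + "00" * 19 + "01"


def storage_request(address: str, slot: int, block: str = "latest") -> dict:
    """JSON-RPC body asking a node for one 32-byte storage slot of a contract."""
    return {
        "jsonrpc": "2.0",
        "id": 1,
        "method": "eth_getStorageAt",
        "params": [address, hex(slot), block],
    }


def read_slot(address: str, slot: int) -> int:
    # A read served by your own node: no transaction is sent, so no gas is paid.
    body = json.dumps(storage_request(address, slot)).encode()
    req = urllib.request.Request(NODE_URL, data=body,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return int(json.load(resp)["result"], 16)


print(json.dumps(storage_request(CONTRACT, 0)))
```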

2. Storage costs don’t depend on duration
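Each member of the network may replay the entire history of transactions. By design, data written to Ethereum is never deleted from that history: you pay once, when you write, whether the value stays for a minute or forever. Clearing a key removes it from the current state and earns only a partial gas refund; the old value remains in the history.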
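Why deleting a key doesn’t erase it: the current state is just the result of replaying the history, and the history only grows. A toy Python model in which each write is a (key, value) pair and writing zero clears a key; the names are hypothetical.

```python
from typing import Dict, List, Tuple

Write = Tuple[str, int]  # (key, value); value 0 means "clear the key"


def replay(history: List[Write]) -> Dict[str, int]:
    """Rebuild the current storage by replaying every write from the start."""
    state: Dict[str, int] = {}
    for key, value in history:
        if value == 0:
            state.pop(key, None)
        else:
            state[key] = value
    return state


history: List[Write] = [("greeting", 1), ("score", 7), ("greeting", 0)]
print(replay(history))             # {'score': 7}
print(("greeting", 1) in history)  # True: the old write is still in the history
```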

Finally, where should I store my data?

Well, maybe not on the Ethereum Blockchain. The data stored there, with Smart Contracts, is safe and easy to access. But the cost and the structure of the store make it best suited to metadata.

Taking the examples from the introduction, user posts, files and message boxes will probably live on another platform, such as IPFS. On the Ethereum Blockchain, we would store critical data: encryption keys, roots of storage trees and authorizations.
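A minimal sketch of that split: the bulky content goes to an off-chain store, and the contract keeps only a 32-byte SHA-256 digest, which is enough to verify the content later. Both dictionaries are hypothetical stand-ins, and a real IPFS content identifier (CID) wraps the hash in extra encoding rather than being a raw SHA-256 digest.

```python
import hashlib
from typing import Dict

off_chain: Dict[bytes, bytes] = {}  # stand-in for IPFS or another file store
on_chain: Dict[int, bytes] = {}     # stand-in for contract storage: id -> 32 bytes


def publish(post_id: int, content: bytes) -> bytes:
    """Store the content off-chain and only its 32-byte digest on-chain."""
    digest = hashlib.sha256(content).digest()
    off_chain[digest] = content
    on_chain[post_id] = digest  # one 32-byte slot, whatever the content size
    return digest


def fetch(post_id: int) -> bytes:
    content = off_chain[on_chain[post_id]]
    # Integrity check: the off-chain copy must match the on-chain root.
    if hashlib.sha256(content).digest() != on_chain[post_id]:
        raise ValueError("content was tampered with")
    return content


publish(1, b"Hello, decentralized world!" * 100)
print(len(on_chain[1]), fetch(1)[:5])  # 32 b'Hello'
```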

Appendix

Code used for the table:

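pragma solidity ^0.4.0;

// Operations 1 and 2: write, then increment, a single storage value.
contract Test {
    mapping(uint => uint) tests;

    function Test() public {}

    // 1. Write a value to storage
    function one_set() public {
        tests[0] = 0;
    }

    // 2. Read a value from storage, then write it back incremented
    function two_increment() public {
        tests[0] = tests[0] + 1;
    }
}

// Operation 3: the vote function from the Ballot example in the Solidity docs.
contract Ballot {
    struct Voter {
        uint weight;
        bool voted;
        uint8 vote;
        address delegate;
    }
    struct Proposal {
        uint voteCount;
    }

    mapping(address => Voter) voters;
    Proposal[] proposals;

    /// Create a new ballot with $(_numProposals) different proposals.
    function Ballot(uint8 _numProposals) public {
        voters[msg.sender].weight = 1;
        proposals.length = _numProposals;
    }

    /// Give a single vote to proposal $(proposal).
    function vote(uint8 proposal) public {
        Voter storage sender = voters[msg.sender];
        // Short-circuit: already voted, or no such proposal
        if (sender.voted || proposal >= proposals.length) return;
        sender.voted = true;
        sender.vote = proposal;
        proposals[proposal].voteCount += sender.weight;
    }
}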

Tools used to run the code and evaluate the costs: