Blockchains are decentralized systems, and they are relatively slow compared with centralized distributed systems. This is because every node in a decentralized system processes all the data, so that the system cannot be hacked or influenced through a single point of control or failure. Having every node process all the data also makes things slow.
To still make use of blockchains and decentralized systems in real-world use cases, where millions of users could use them in spite of their low throughput, we have to follow certain best practices. One of these best practices concerns what kind of data should be stored on and processed by the blockchain.
The answer depends on the use case, but it is an important design decision for making blockchains scale. For example, in use cases like supply chain and data reconciliation among several parties, we should put on the blockchain only the data that is critical for verification and consensus. All other data should be kept off the chain.
By off-chain data, I mean data that can be stored or processed by conventional services and databases, outside the blockchain network. Data that must go on the blockchain for consensus is generally referred to as on-chain data.
When a blockchain-based application stores and sends less data to the blockchain, it performs more efficiently. The less data we put on the chain, the less work the nodes of the blockchain network have to do to reach consensus, and the faster the system performs as a whole.
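To get a rough feel for this, here is a back-of-the-envelope sketch in Python. It assumes a deliberately simplistic model in which every node stores and processes every byte placed on the chain, and it ignores transaction overhead, networking and consensus costs. The function name `replicated_bytes`, the 2 MB document size and the 100-node network are all hypothetical values chosen for illustration.

```python
def replicated_bytes(payload_bytes: int, node_count: int) -> int:
    """Total bytes handled across the network if every node processes the payload.

    Simplistic assumption: cost grows linearly with payload size and node count.
    """
    return payload_bytes * node_count


# Hypothetical numbers, for illustration only.
full_document = 2_000_000  # a 2 MB document put on the chain in full
digest_only = 32           # a SHA-256 digest of that document is 32 bytes
nodes = 100                # assumed size of the network

print(replicated_bytes(full_document, nodes))  # 200000000
print(replicated_bytes(digest_only, nodes))    # 3200
```

Under these assumptions, putting only a fixed-size digest on the chain reduces the data handled by the network from about 200 MB to 3.2 KB for a single document.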
In most cases, to perform efficiently, blockchains are used as verification machines and not as databases. The data is stored in conventional distributed databases, but it is verified on the blockchain.
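One common way to realize this pattern is to keep the full record off-chain and put only its cryptographic hash on the chain. The Python sketch below illustrates the idea under stated assumptions: `off_chain_db` and `on_chain_ledger` are plain dictionaries standing in for a conventional database and a blockchain, and the helper names `digest`, `store` and `verify` are hypothetical. A real system would replace the ledger dictionary with a transaction submitted to an actual blockchain network.

```python
import hashlib
import json

# Hypothetical stand-ins: a dict for the conventional (off-chain) database
# and a dict for the blockchain ledger. Neither is a real blockchain API.
off_chain_db = {}
on_chain_ledger = {}


def digest(record: dict) -> str:
    """Return a SHA-256 hex digest of a record in a canonical JSON form."""
    # Sorting keys keeps the digest stable regardless of key order.
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def store(record_id: str, record: dict) -> None:
    """Keep the full record off-chain and anchor only its digest on-chain."""
    off_chain_db[record_id] = record
    on_chain_ledger[record_id] = digest(record)


def verify(record_id: str) -> bool:
    """Check that the off-chain record still matches its on-chain digest."""
    record = off_chain_db.get(record_id)
    anchored = on_chain_ledger.get(record_id)
    if record is None or anchored is None:
        return False
    return digest(record) == anchored


# Usage: a made-up supply chain shipment record.
store("shipment-1", {"sku": "A-100", "quantity": 40, "origin": "Rotterdam"})
print(verify("shipment-1"))  # True

# Tampering with the off-chain copy is detected by the on-chain digest.
off_chain_db["shipment-1"]["quantity"] = 4000
print(verify("shipment-1"))  # False
```

The chain holds only a fixed-size digest per record, so the nodes do little work, while any party can still detect whether the off-chain data has been altered.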