What is database sharding?

Sharding

First, let’s explore database sharding. It is the process of splitting a single (usually large) dataset into various smaller chunks (known as shards) that are stored across multiple databases. Sharding is considered to be a horizontal scaling solution since it increases the number of database instances in a system.

If, for example, we wanted to shard a large dataset of clothing inventory for an e-commerce application, it might look like this:

Sharding explained

Note how the inventory table was broken up and spread across multiple machines hosting a database. In this case, the table was broken up by the size value, with all items of a particular size in its own database instance (aka a shard).

Now that we have a general overview of the concept of sharding, let’s explore the distinct advantages and disadvantages:

Advantages

Increase in storage capacity: By increasing the number of shards, the overall total storage capacity of a system is increased.
Increased Availability: Even if one shard goes offline, the majority of shards will still be available to retrieve and store data. This means only a portion of the overall dataset will be unavailable.

Disadvantages

Query overhead: A database that has been sharded must have an independent machine or service that can properly route database queries to the appropriate shard. This increases latency and expense on every operation because if the query requires data from multiple shards, the router must query each shard and then merge the data.
Administration complexity: A database that has been sharded requires more upkeep and maintenance since there are now multiple machines with their own databases.
Increased cost: There is an inherent increase in cost because sharding requires more machines as well as computing power.

Ref: https://www.codecademy.com/article/database-scaling-strategies

What is database sharding?

Related Posts