A critical step in starting any database project: relational vs. non-relational, CAP Theorem and more.
When you start a new enterprise database project, one of the most critical steps is choosing the right database. With the advent of big data, there are simply many more options for your data management needs. Choosing the right database will mean the following:
- First and foremost, you must understand how your database will be used under the scope of the requirements of your project.
- With one kind of database, you will only fulfill some of your database needs.
- Performance comes only after you’ve successfully matched all your database needs individually with the right kind of database.
- There’s always a tradeoff between consistency, availability and partition tolerance.
Understanding the Tradeoff
The reason that we have many database options available today is due to the CAP Theorem. CAP stands for consistency, availability and partition tolerance.
Consistency means that any read request will return the most recent write.
Availability means that the non-responding node must respond in a reasonable amount of time.
Partition Tolerance means that the system will continue to operate despite network or node failures.
"At any given time, only two of these 3 requirements can be satisfied at once".
Relational Databases traditionally feature strong consistency and high availability at the expense of partition tolerance.
Examples: SQL Server, MySQL, Oracle database, PostgresSQL, IBM DB2
Non-relational Databases have been developed to serve Availability and Partition Tolerance or (Consistency and Partition Tolerance) needs.
Examples: Memcached, Redis, Coherence, Hbase, BigTable, Accumulo, MongoDB, CouchDB
"For complex systems that are both read and write intensive, it may be important to have a combination of relational and non-relational databases to split up the tasks of reads versus writes to optimize CAP".
Important Questions to Ask
The next step to choosing a database is to have a list of questions to ask in relation to your business requirement. Here are some
How many relationships are in your data?
What is the level of complexity in your data?
How often do the data change?
How often does your application query the data?
How often does your application query the relationship underlying the data?
How often do your users update the data?
How often do your users update the logic in the data?
How critical is your Application in a disaster scenario?
Understanding the Advantages and the Disadvantages
Relational databases are optimized for writes. They are optimized for consistency and availability.
Advantages of relational databases include simplicity, ease of data retrieval, data integrity, and flexibility.
Disadvantages of relational databases include:
Costly — expensive to set up and maintain the database.
Structured Limits — relational databases have limits to field lengths. This can be cumbersome for storing a large amount of information in one field.
Isolation — multiple relational databases can easily become “islands of information”. It can be difficult to connect the databases where they can talk to each other.
Non-relational databases are optimized for reads. They serve Availability and Partition Tolerance, or Consistency and Partition Tolerance needs.
Advantages of non-relational databases include:
Flexibility — storage of large volumes of structured, semi-structured, and unstructured data.
Agile Programming — can accommodate quick iterations of sprints and code pushes.
Inexpensive Scalability — can scale out architecture efficiently without expensive overhead.
Disadvantages of non-relational databases include:
Data Consistency — non-relational databases do not perform ACID transactions. Instead, they rely on “eventual consistency”. The performance benefits of these databases mean there’s a cost of consistency.
Standardization — There isn’t a specific programming interface to the different databases. Each one varies in query language with another.
Scalability — Not all non-relational databases are good at automating the process of sharding, or spreading the database across multiple nodes. This presents limitations to be able to scale up or down for fluctuating demand.
Understanding the Different Types of Non-relational Databases
These days, there are different kinds of Non-relational databases. They fall into specific categories. Each category of non-relational database serves a given purpose.
Key-value — These databases work best with a simple database schema. It is best for many read, writes and few updates. It performs best when there are no complex queries or business logic.
Examples: Redis, Dynamo DB, and Cosmos DB.
Document — These databases work best if you need a flexible schema. The data is stored in XML or JSON format. You can live with high read performance and can balance read performance with write performance. You can use indexes to maximize your performance with these databases.
Examples: MongoDB, DynamoDB, and Couchbase.
Graph — These databases are great for when you have a complex database schema. You need to display the business logic frequently between nodes. Graph Databases will allow you to navigate between nodes.
Examples: Neo4j, Cosmos Db, and Amazon Neptune.
Now that you have a solid understanding of what’s needed to choose a database for your next enterprise database project, you can select one or a few for different Application needs that you may have. Your selection will be an informed one that takes account of all the different business needs that you may have.
“Choosing The Right Database”– Jun Wu Tweet