Choosing The Right Database

Jun Wu
Apr 08, 2020

A critical step in starting any database project: relational vs. non-relational, CAP Theorem and more.


When you start a new enterprise database project, one of the most critical steps is choosing the right database. With the advent of big data, there are simply many more options for your data management needs. When choosing the right database, keep the following in mind:
  • First and foremost, you must understand how your database will be used within the scope of your project's requirements.
  • A single kind of database will likely fulfill only some of your database needs.
  • Performance comes only after you’ve successfully matched all your database needs individually with the right kind of database.
  • There’s always a tradeoff between consistency, availability and partition tolerance.
Understanding the Tradeoff

One reason we have so many database options available today is the CAP Theorem. CAP stands for consistency, availability and partition tolerance.

Consistency
means that every read request returns the most recent write.

Availability
means that every request to a non-failing node receives a response in a reasonable amount of time.

Partition Tolerance
means that the system will continue to operate despite network or node failures.

"At any given time, only two of these 3 requirements can be satisfied at once".
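To make the tradeoff concrete, here is a toy Python sketch of a two-node store during a network partition. It is an illustration only, not how any real database works, and the class name `TwoNodeStore` and its `"CP"`/`"AP"` modes are hypothetical. A CP store refuses writes it cannot replicate, so both nodes keep agreeing; an AP store keeps accepting writes, so the nodes can return different answers.

```python
class TwoNodeStore:
    """Hypothetical toy store: one value per node, for illustration only.
    mode="CP" favours consistency, mode="AP" favours availability."""

    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.values = [None, None]  # the copy held by node 0 and node 1
        self.partitioned = False    # True while the nodes cannot talk

    def write(self, node, value):
        """Write via `node`; returns True if the write was accepted."""
        if not self.partitioned:
            self.values = [value, value]  # replicate to both nodes
            return True
        if self.mode == "CP":
            return False                  # refuse: cannot replicate, so stay consistent
        self.values[node] = value         # accept locally: stay available
        return True

    def read(self, node):
        """Every node answers reads from its own copy."""
        return self.values[node]


cp = TwoNodeStore("CP")
cp.write(0, "v1")
cp.partitioned = True
print(cp.write(0, "v2"), cp.read(1))  # False v1: unavailable, but consistent

ap = TwoNodeStore("AP")
ap.write(0, "v1")
ap.partitioned = True
print(ap.write(0, "v2"), ap.read(0), ap.read(1))  # True v2 v1: available, but stale
```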

Relational Databases
traditionally feature strong consistency and high availability at the expense of partition tolerance.

Examples: SQL Server, MySQL, Oracle Database, PostgreSQL, IBM DB2

Non-relational Databases
have been developed to serve Availability and Partition Tolerance (AP) or Consistency and Partition Tolerance (CP) needs.

Examples: Memcached, Redis, Coherence, HBase, Bigtable, Accumulo, MongoDB, CouchDB

"For complex systems that are both read and write intensive, it may be important to have a combination of relational and non-relational databases to split up the tasks of reads versus writes to optimize CAP".
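One common way to split reads from writes is the cache-aside pattern, sketched below under stated assumptions: an in-memory SQLite database (via Python's built-in `sqlite3`) plays the relational system of record, and a plain dict stands in for a non-relational key-value cache such as Redis. The table and function names are hypothetical, and this is one possible arrangement, not a prescription from the quote above.

```python
import sqlite3

# System of record: a relational database (in-memory SQLite for this sketch).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, price REAL)")

# Stand-in for a key-value cache such as Redis (assumption: just a dict here).
cache = {}

def write_product(product_id, name, price):
    """Writes go to the relational database; the cached copy is invalidated."""
    with db:  # commits on success, rolls back on error
        db.execute(
            "INSERT OR REPLACE INTO products (id, name, price) VALUES (?, ?, ?)",
            (product_id, name, price),
        )
    cache.pop(product_id, None)

def read_product(product_id):
    """Reads try the cache first and fall back to the database on a miss."""
    if product_id in cache:
        return cache[product_id]
    row = db.execute(
        "SELECT name, price FROM products WHERE id = ?", (product_id,)
    ).fetchone()
    if row is not None:
        cache[product_id] = row  # populate the cache for later reads
    return row


write_product(1, "widget", 9.99)
print(read_product(1))  # ('widget', 9.99), loaded from SQLite, now cached
print(read_product(1))  # ('widget', 9.99), served from the cache
```

Invalidating on write (rather than updating the cache) keeps the relational database as the single source of truth; the cost is one extra database read after each change.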
Important Questions to Ask

The next step in choosing a database is to draw up a list of questions about your business requirements. Here are some:

How many relationships are in your data? 
What is the level of complexity in your data? 
How often do the data change? 
How often does your application query the data? 
How often does your application query the relationship underlying the data?
How often do your users update the data? 
How often do your users update the logic in the data?
How critical is your application in a disaster scenario?

Understanding the Advantages and the Disadvantages

Relational databases
are optimized for writes, as well as for consistency and availability.

Advantages of relational databases include simplicity, ease of data retrieval, data integrity, and flexibility.
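Data integrity is the easiest of these advantages to show in code. The sketch below uses Python's built-in `sqlite3` module as a stand-in for any relational database (an assumption; SQL Server, PostgreSQL and the others differ in syntax details). Constraints reject invalid rows, and a transaction makes a transfer succeed completely or not at all. The tables and the `transfer`/`balances` helpers are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE accounts (
    id      INTEGER PRIMARY KEY,
    owner   TEXT NOT NULL,
    balance INTEGER NOT NULL CHECK (balance >= 0)
);
CREATE TABLE transfers (
    id      INTEGER PRIMARY KEY,
    from_id INTEGER NOT NULL REFERENCES accounts(id),
    to_id   INTEGER NOT NULL REFERENCES accounts(id),
    amount  INTEGER NOT NULL CHECK (amount > 0)
);
INSERT INTO accounts (id, owner, balance) VALUES (1, 'alice', 100), (2, 'bob', 50);
""")

def transfer(from_id, to_id, amount):
    """Move money atomically: either every statement succeeds or none do."""
    try:
        with conn:  # commit on success, roll back if an exception escapes
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, from_id))
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, to_id))
            conn.execute(
                "INSERT INTO transfers (from_id, to_id, amount) VALUES (?, ?, ?)",
                (from_id, to_id, amount),
            )
        return True
    except sqlite3.IntegrityError:  # a CHECK or FOREIGN KEY constraint failed
        return False

def balances():
    return dict(conn.execute("SELECT owner, balance FROM accounts ORDER BY id"))


print(transfer(1, 2, 30), balances())   # True {'alice': 70, 'bob': 80}
print(transfer(1, 2, 500), balances())  # False {'alice': 70, 'bob': 80}  (CHECK fails)
print(transfer(1, 99, 10), balances())  # False {'alice': 70, 'bob': 80}  (FK fails, first UPDATE rolled back)
```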

Disadvantages of relational databases include:

Costly — can be expensive to set up and maintain.
Structured Limits — relational databases impose limits on field lengths, which can be cumbersome when you need to store a large amount of information in one field.
Isolation — multiple relational databases can easily become “islands of information”, and it can be difficult to connect them so that they can talk to each other.

Non-relational databases
are optimized for reads. They serve Availability and Partition Tolerance, or Consistency and Partition Tolerance needs.

Advantages of non-relational databases include:

Flexibility — storage of large volumes of structured, semi-structured, and unstructured data. 
Agile Programming — can accommodate quick iterations of sprints and code pushes.
Inexpensive Scalability — can scale out efficiently without expensive overhead.
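Scaling out usually means partitioning (sharding) the data across nodes. Below is a minimal sketch of hash-based sharding, assuming a fixed number of shards with plain dicts standing in for nodes; `shard_for`, `ShardedStore` and `moved_fraction` are hypothetical names, and real databases use their own placement schemes.

```python
import hashlib

def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard with a stable hash (Python's built-in hash() is salted per process)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

class ShardedStore:
    """Hypothetical store that spreads keys across in-memory 'nodes' (dicts)."""

    def __init__(self, num_shards: int):
        self.shards = [dict() for _ in range(num_shards)]

    def put(self, key: str, value) -> None:
        self.shards[shard_for(key, len(self.shards))][key] = value

    def get(self, key: str):
        return self.shards[shard_for(key, len(self.shards))].get(key)

def moved_fraction(keys, old_n: int, new_n: int) -> float:
    """Share of keys whose shard changes when the shard count changes."""
    moved = sum(shard_for(k, old_n) != shard_for(k, new_n) for k in keys)
    return moved / len(keys)


store = ShardedStore(4)
for i in range(100):
    store.put(f"user:{i}", i)
print([len(s) for s in store.shards])  # keys held by each shard
print(moved_fraction([f"user:{i}" for i in range(1000)], 4, 5))  # share of keys that would move
```

With simple modulo placement, going from 4 to 5 shards relocates roughly 80% of keys, which is one reason resharding is hard to automate; consistent hashing is a common technique for reducing that movement.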

Disadvantages of non-relational databases include:

Data Consistency — many non-relational databases do not support full ACID transactions and instead rely on “eventual consistency”. Their performance benefits come at the cost of consistency.

Standardization — there is no common programming interface across the different databases, and query languages vary from one database to another.

Scalability — not all non-relational databases are good at automating sharding, that is, spreading the database across multiple nodes. This limits their ability to scale up or down as demand fluctuates.
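Eventual consistency means replicas may temporarily disagree but converge once they exchange updates. The sketch below uses last-write-wins (LWW), one common convergence rule chosen here for illustration; real databases use a variety of schemes such as vector clocks, quorums or CRDTs. `LWWReplica` and `sync` are hypothetical, and timestamps are supplied by the caller and assumed unique.

```python
from dataclasses import dataclass, field

@dataclass
class LWWReplica:
    """Hypothetical replica storing (timestamp, value) per key, keeping the newest."""
    data: dict = field(default_factory=dict)

    def put(self, key, value, ts):
        # Accept the write locally, without coordinating with other replicas.
        # Assumes unique timestamps; real systems add a tie-breaker such as a node id.
        current = self.data.get(key)
        if current is None or ts > current[0]:
            self.data[key] = (ts, value)

    def get(self, key):
        entry = self.data.get(key)
        return None if entry is None else entry[1]

    def merge(self, other):
        # Anti-entropy: pull every entry from `other`, keeping the newer write.
        for key, (ts, value) in other.data.items():
            self.put(key, value, ts)

def sync(replicas):
    """Exchange state pairwise so all replicas end up holding the same data."""
    for a in replicas:
        for b in replicas:
            if a is not b:
                a.merge(b)


r1, r2, r3 = LWWReplica(), LWWReplica(), LWWReplica()
r1.put("cart:42", ["book"], ts=1)         # a write lands on r1
r2.put("cart:42", ["book", "pen"], ts=2)  # a later write lands on r2
print(r1.get("cart:42"), r3.get("cart:42"))  # ['book'] None: replicas disagree
sync([r1, r2, r3])
print(r1.get("cart:42"), r3.get("cart:42"))  # ['book', 'pen'] ['book', 'pen']
```

Note the cost: LWW silently discards the older of two concurrent writes, which is exactly the kind of consistency tradeoff described above.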

Understanding the Different Types of Non-relational Databases

Non-relational databases fall into several categories, each of which serves a particular purpose.

Key-Value Stores — These databases work best with a simple database schema. They are best for many reads and writes with few updates, and perform best when there are no complex queries or business logic.

Examples: Redis, DynamoDB, and Cosmos DB.
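The key-value model is simple enough to sketch in a few lines: every operation is a lookup by key, and there is no query language over the values. The class below is a hypothetical in-memory illustration, loosely modeled on the get/set/expire style of stores like Redis; it is not Redis's API.

```python
import time

class KeyValueStore:
    """Hypothetical in-memory key-value store: values are opaque, looked up only by key."""

    def __init__(self, clock=time.monotonic):
        self._data = {}      # key -> (value, expires_at or None)
        self._clock = clock  # injectable so expiry can be demonstrated

    def set(self, key, value, ttl=None):
        expires_at = None if ttl is None else self._clock() + ttl
        self._data[key] = (value, expires_at)

    def get(self, key, default=None):
        entry = self._data.get(key)
        if entry is None:
            return default
        value, expires_at = entry
        if expires_at is not None and self._clock() >= expires_at:
            del self._data[key]  # lazily evict an expired key
            return default
        return value

    def delete(self, key):
        """Return True if the key existed."""
        return self._data.pop(key, None) is not None


now = [0.0]  # fake clock for the demo
kv = KeyValueStore(clock=lambda: now[0])
kv.set("session:abc", {"user_id": 7}, ttl=30)
print(kv.get("session:abc"))  # {'user_id': 7}
now[0] = 31.0
print(kv.get("session:abc"))  # None (expired)
```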

Document Databases — These databases work best if you need a flexible schema. The data is stored in XML or JSON format. They suit workloads that need high read performance while balancing it against write performance, and you can use indexes to maximize performance.

Examples: MongoDB, DynamoDB, and Couchbase.
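A document database stores each record as a self-describing JSON-like document, so documents in the same collection can have different fields, and indexes speed up lookups on chosen fields. The class below is a hypothetical in-memory sketch of that idea in plain Python; it does not mirror the actual API of MongoDB, DynamoDB or Couchbase.

```python
import json
from collections import defaultdict

class DocumentCollection:
    """Hypothetical in-memory document collection: schema-free dicts plus optional
    single-field indexes. Indexed fields are assumed to hold hashable values."""

    def __init__(self):
        self._docs = {}     # _id -> stored document
        self._indexes = {}  # field name -> {field value -> set of _ids}
        self._next_id = 1

    def create_index(self, field):
        index = defaultdict(set)
        for doc_id, doc in self._docs.items():
            if field in doc:
                index[doc[field]].add(doc_id)
        self._indexes[field] = index

    def insert(self, doc):
        doc_id = self._next_id
        self._next_id += 1
        # Store a detached copy, round-tripped through JSON; no schema is checked.
        stored = json.loads(json.dumps(doc))
        self._docs[doc_id] = stored
        for field, index in self._indexes.items():
            if field in stored:
                index[stored[field]].add(doc_id)
        return doc_id

    def find(self, **criteria):
        """Return documents whose fields equal every given value."""
        candidates = None
        for field, value in criteria.items():
            if field in self._indexes:
                # Narrow the search with the first indexed field in the query.
                candidates = self._indexes[field].get(value, set())
                break
        ids = self._docs.keys() if candidates is None else candidates
        return [
            self._docs[i]
            for i in sorted(ids)
            if all(self._docs[i].get(f) == v for f, v in criteria.items())
        ]


users = DocumentCollection()
users.create_index("country")
users.insert({"name": "Ana", "country": "PT", "tags": ["admin"]})
users.insert({"name": "Li", "country": "CN"})
users.insert({"name": "Sam", "country": "PT", "address": {"city": "Porto"}})
print([d["name"] for d in users.find(country="PT")])  # ['Ana', 'Sam']
```

The three documents share a collection yet carry different fields (a `tags` list, a nested `address`), which is the flexible schema described above.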

Graph Databases — These databases are great when you have a complex database schema and frequently need to follow the business logic between nodes. Graph databases let you navigate directly from one node to another.

Examples: Neo4j, Cosmos DB, and Amazon Neptune.
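Graph databases treat relationships as first-class edges, so a question like "how is A connected to B?" becomes a traversal rather than a chain of joins. The sketch below approximates that with an in-memory adjacency list and breadth-first search in plain Python; it is not how Neo4j or Neptune store data, and `build_graph` and `shortest_path` are hypothetical helpers. Neo4j, for instance, expresses such queries in its Cypher query language.

```python
from collections import deque

def build_graph(edges):
    """Adjacency list for an undirected graph: node -> set of neighbours."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph

def shortest_path(graph, start, goal):
    """Breadth-first search: the fewest-hops path from start to goal, or None."""
    if start not in graph or goal not in graph:
        return None
    previous = {start: None}  # node -> the node we reached it from
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:  # walk back to the start
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbour in sorted(graph[node]):  # sorted for deterministic output
            if neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)
    return None


g = build_graph([("alice", "bob"), ("bob", "carol"), ("carol", "dave"), ("alice", "erin")])
print(shortest_path(g, "alice", "dave"))  # ['alice', 'bob', 'carol', 'dave']
```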

Now that you have a solid understanding of what’s needed to choose a database for your next enterprise database project, you can select one or a few to meet your different application needs. Your selection will be an informed one that takes into account all of your business needs.
