Choosing The Right Database

Jun Wu
Apr 08, 2020


A critical step in starting any database project: relational vs. non-relational, CAP Theorem and more.

@fellowferdi unsplash.com


When you start a new enterprise database project, one of the most critical steps is choosing the right database. With the advent of big data, there are simply many more options for your data management needs. Choosing the right database will mean the following:
  • First and foremost, you must understand how your database will be used under the scope of the requirements of your project.
  • With one kind of database, you will only fulfill some of your database needs.
  • Performance comes only after you’ve successfully matched all your database needs individually with the right kind of database.
  • There’s always a tradeoff between consistency, availability and partition tolerance.
Understanding the Tradeoff

The reason that we have many database options available today is due to the CAP Theorem. CAP stands for consistency, availability and partition tolerance.

Consistency
means that any read request will return the most recent write.

Availability
means that the non-responding node must respond in a reasonable amount of time.

Partition Tolerance
means that the system will continue to operate despite network or node failures.

"At any given time, only two of these 3 requirements can be satisfied at once".

Relational Databases
traditionally feature strong consistency and high availability at the expense of partition tolerance.

Examples: SQL Server, MySQL, Oracle database, PostgresSQL, IBM DB2

Non-relational Databases
have been developed to serve Availability and Partition Tolerance or (Consistency and Partition Tolerance) needs.

Examples: Memcached, Redis, Coherence, Hbase, BigTable, Accumulo, MongoDB, CouchDB

"For complex systems that are both read and write intensive, it may be important to have a combination of relational and non-relational databases to split up the tasks of reads versus writes to optimize CAP".
Important Questions to Ask

The next step to choosing a database is to have a list of questions to ask in relation to your business requirement. Here are some

How many relationships are in your data? 
What is the level of complexity in your data? 
How often do the data change? 
How often does your application query the data? 
How often does your application query the relationship underlying the data?
How often do your users update the data? 
How often do your users update the logic in the data?
How critical is your Application in a disaster scenario?

Understanding the Advantages and the Disadvantages

Relational databases
are optimized for writes. They are optimized for consistency and availability.

Advantages
of relational databases include simplicity, ease of data retrieval, data integrity, and flexibility.

Disadvantages
of relational databases include:

Costly — expensive to set up and maintain the database.
Structured Limits — relational databases have limits to field lengths. This can be cumbersome for storing a large amount of information in one field. 
Isolation — multiple relational databases can easily become “islands of information”. It can be difficult to connect the databases where they can talk to each other.

Non-relational databases
are optimized for reads. They serve Availability and Partition Tolerance, or Consistency and Partition Tolerance needs.

Advantages
of non-relational databases include:

Flexibility — storage of large volumes of structured, semi-structured, and unstructured data. 
Agile Programming — can accommodate quick iterations of sprints and code pushes.
Inexpensive Scalability — can scale out architecture efficiently without expensive overhead.

Disadvantages
of non-relational databases include:

Data Consistency — non-relational databases do not perform ACID transactions. Instead, they rely on “eventual consistency”. The performance benefits of these databases mean there’s a cost of consistency.

Standardization — There isn’t a specific programming interface to the different databases. Each one varies in query language with another.

Scalability — Not all non-relational databases are good at automating the process of sharding, or spreading the database across multiple nodes. This presents limitations to be able to scale up or down for fluctuating demand.

Understanding the Different Types of Non-relational Databases

These days, there are different kinds of Non-relational databases. They fall into specific categories. Each category of non-relational database serves a given purpose.

Key-value
— These databases work best with a simple database schema. It is best for many read, writes and few updates. It performs best when there are no complex queries or business logic.

Examples: Redis, Dynamo DB, and Cosmos DB.

Document
— These databases work best if you need a flexible schema. The data is stored in XML or JSON format. You can live with high read performance and can balance read performance with write performance. You can use indexes to maximize your performance with these databases.

Examples: MongoDB, DynamoDB, and Couchbase.

Graph
— These databases are great for when you have a complex database schema. You need to display the business logic frequently between nodes. Graph Databases will allow you to navigate between nodes.

Examples: Neo4j, Cosmos Db, and Amazon Neptune.

Now that you have a solid understanding of what’s needed to choose a database for your next enterprise database project, you can select one or a few for different Application needs that you may have. Your selection will be an informed one that takes account of all the different business needs that you may have.

“Choosing The Right Database”
– Jun Wu twitter social icon Tweet


Share this article:

0 Comments

Post a comment
Log In to Comment

Related Stories

Nov 25, 2021

5 Tips To Ace Your Job Interview For A Data Scientist Opening

5 Tips To Ace Your Job Interview For A Data Scientist Opening.PNG 795.94 KBImage SourceAspiring data scientists have a bright future ahead of them....

Daniel Morales
By Daniel Morales
Nov 12, 2021

When to Avoid Deep Learning

IntroductionThis article is intended for data scientists who may consider using deep learning algorithms, and want to know more about the cons of i...

Matt Przybyla
By Matt Przybyla
Oct 16, 2021

6 Advanced Statistical Concepts in Data Science

The article contains some of the most commonly used advanced statistical concepts along with their Python implementation.In my previous articles Be...

Nagesh Singh Chauhan
By Nagesh Singh Chauhan
Icon

Join our private community in Slack

Keep up to date by participating in our global community of data scientists and AI enthusiasts. We discuss the latest developments in data science competitions, new techniques for solving complex challenges, AI and machine learning models, and much more!

 
We'll send you an invitational link to your email immediatly.
arrow-up icon