Things on this page are fragmentary and immature notes/thoughts of the author. Please read with your own judgement!
The YouTube video How to Choose The Right Database has great advice on how to choose the right database.
Types of Databases
- relational
- non-relational
- key value database
- document database
- wide column database
- graph database
- search engine database
- time series database
Advantages of Relational Databases
- consistency
- security
- ease of backup and recovery
Advantages of Non-relational Databases
- flexibility
- scalability
- cost-effectiveness
Storage Format
- row storage
- columnar storage
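Columnar storage is good for analytical operations: queries that aggregate a few columns over many rows only need to read those columns, and values from the same column (same type, often repetitive) compress well.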
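A minimal Python sketch of why columnar storage suits analytical queries, assuming a toy in-memory table (the `ROWS` data and the function names are hypothetical; real column stores add compression, vectorized execution and on-disk formats on top of this idea). Summing one column forces a row store to touch every field of every row, while a column store scans only the column it needs:

```python
# Hypothetical toy table: (order_id, customer, amount)
ROWS = [
    (1, "alice", 30.0),
    (2, "bob", 12.5),
    (3, "carol", 7.5),
    (4, "alice", 50.0),
]


def to_columns(rows):
    """Transpose row-oriented tuples into one list per column (columnar layout)."""
    order_ids, customers, amounts = (list(col) for col in zip(*rows))
    return {"order_id": order_ids, "customer": customers, "amount": amounts}


def sum_amount_row_store(rows):
    """Row storage: each row is read in full even though only 'amount' is needed."""
    values_read = 0
    total = 0.0
    for row in rows:
        values_read += len(row)  # the whole row is touched
        total += row[2]
    return total, values_read


def sum_amount_column_store(columns):
    """Column storage: only the 'amount' column is scanned."""
    amounts = columns["amount"]
    return sum(amounts), len(amounts)


if __name__ == "__main__":
    print(sum_amount_row_store(ROWS))
    print(sum_amount_column_store(to_columns(ROWS)))
```

On this 4-row, 3-column table the row-store version touches 12 values and the column-store version touches 4; the gap grows with the number of columns in the table.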
Comparison of Databases
Name | Language | Open Source/Free | PACELC | Advantages | Disadvantages | Comment |
---|---|---|---|---|---|---|
MySQL [1] | SQL | Open source | PC/EC | the most popular open-source RDBMS | | |
Cassandra [1] | CQL (Cassandra Query Language) | Open source | PA/EL | real-time | no join | |
HBase [1] | | Open source | PC/EC | real-time | no join | |
ClickHouse [2] | SQL | Open source | | OLAP for big data | | very good performance |
TiDB [3] | SQL | Open source | | OLAP for big data | | good performance; supports integration with Spark |
Redis [4] | DSL (hashmap-like API) | Open source | | distributed in-memory cache for real-time applications | no queries or joins | |
Neo4j [5] | Cypher (graph query language) | Open source | | graph applications | | the most popular graph database |
Elasticsearch [6] | DSL, SQL | Open source | | out-of-the-box search engine for large documents | | designed as a search engine but also popularly used as a database |
TDengine [7] | SQL | Open source | | IoT | | IoT, good performance |
[2] ClickHouse is an open-source column-oriented database management system that allows generating analytical data reports in real time.
yugabyte-db
scylladb
Scylla is the real-time big data database that is API-compatible with Apache Cassandra and Amazon DynamoDB. Scylla embraces a shared-nothing approach that increases throughput and storage capacity to realize order-of-magnitude performance improvements and reduce hardware costs.
MongoDB
MongoDB is a document-oriented, disk-based database optimized for operational simplicity, schema-free design and very large data volumes.
Distributed In-memory Cache
A distributed in-memory cache is essentially a distributed key-value store/database. You can think of it as a hashmap over the network.
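The "hashmap over the network" picture raises one practical question: which server owns a given key? Below is a minimal, in-process Python sketch assuming consistent hashing with virtual nodes, a technique used by many memcached client libraries but not something these notes prescribe. `CacheNode`, `DistributedCache` and the node names are hypothetical, and plain dicts stand in for remote servers:

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Stable 64-bit hash (Python's built-in hash() is randomized per process)."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


class CacheNode:
    """Hypothetical stand-in for one cache server; a dict plays the role of its memory."""

    def __init__(self, name: str):
        self.name = name
        self.store = {}


class DistributedCache:
    """Client-side routing: each key maps to exactly one node via a hash ring."""

    def __init__(self, nodes, replicas: int = 100):
        # Assumes at least one node; each node gets `replicas` virtual points
        # on the ring to smooth out the key distribution.
        self._ring = []
        for node in nodes:
            for i in range(replicas):
                self._ring.append((_hash(f"{node.name}#{i}"), node))
        self._ring.sort(key=lambda entry: entry[0])
        self._points = [point for point, _ in self._ring]

    def node_for(self, key: str) -> CacheNode:
        # First ring point clockwise from the key's hash, wrapping around at the end.
        idx = bisect.bisect(self._points, _hash(key)) % len(self._ring)
        return self._ring[idx][1]

    def set(self, key: str, value: str) -> None:
        self.node_for(key).store[key] = value  # a real client would send a network request

    def get(self, key: str):
        return self.node_for(key).store.get(key)  # None on a cache miss
```

A useful property of the ring is that adding or removing a node only remaps the keys next to that node's points, whereas `hash(key) % number_of_nodes` would remap most keys. Redis Cluster uses a different scheme: a fixed set of 16384 hash slots assigned to nodes.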
Redis is the most popular in-memory cache, implemented in C. memcached is another (less popular) in-memory cache, also implemented in C. pelikan is Twitter's unified cache backend, implemented in C and Rust.