Ben Chuanlong Du's Blog

It is never too late to learn.

Binary Serialization Format

Things on this page are fragmentary and immature notes/thoughts of the author. Please read with your own judgement!

Summary

  1. Protobuf is best for message serialization. Some companies (e.g., Google) also use it extensively for disk serialization.

  2. FlatBuffers has better CPU performance than Protobuf, since data can be read in place without a separate parsing step.

  3. Apache Parquet is the most popular binary …

Useful Java Libraries

General Purpose Java Libraries

Guava

Guava is a high-quality, general-purpose, open-source Java library developed mainly by Google. It has good immutable collection implementations which are preferred to Java's built-in immutable …

Unified SQL Syntax

Trino

Trino is a distributed SQL query engine for big data. It was formerly known as PrestoSQL.

ZetaSQL

ZetaSQL is a customized SQL dialect, along with a parser and analyzer, that Google …

SQL Database Client-server Protocols

  1. Apache Arrow Flight is the future protocol for querying databases! It uses columnar data and leverages Apache Arrow to avoid unnecessary copying of data, which makes it able to query large …

Cloud Object Storage

Amazon S3

Google Cloud Storage