Ben Chuanlong Du's Blog

It is never too late to learn.

SQL Database Client-server Protocols

Things on this page are fragmentary and immature notes/thoughts of the author. Please read with your own judgement!

  1. Apache Arrow Flight is the future protocol for querying databases! It uses columnar data and leverages Apache Arrow to avoid unnecessary copying of data, which makes it able to query large …

Use TableSample in SQL

The limit clause (or the method DataFrame.limit if you are using Spark) is a better alternative if randomness is not critical.

PostgreSQL

SELECT id FROM table TABLESAMPLE BERNOULLI(10) WHERE …
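
To make the trade-off between TABLESAMPLE BERNOULLI and LIMIT concrete, here is a minimal Python sketch of the idea behind Bernoulli sampling. It is not PostgreSQL's implementation. Each row is kept independently with the given probability, so the sample size only approximates the requested percentage, whereas a limit simply takes the first n rows. The names bernoulli_sample and limit, and the 1000-row stand-in table, are hypothetical and only for illustration.

```python
import random


def bernoulli_sample(rows, percent, seed=None):
    """Keep each row independently with probability percent / 100."""
    rng = random.Random(seed)
    p = percent / 100.0
    return [row for row in rows if rng.random() < p]


def limit(rows, n):
    """Take the first n rows: cheap, but not random."""
    return rows[:n]


rows = list(range(1000))  # hypothetical stand-in for the ids in a table
sample = bernoulli_sample(rows, 10, seed=42)
head = limit(rows, 100)

# The sample size is only approximately 10% of the rows,
# while the limit always returns exactly 100 rows.
print(len(sample), len(head))
```

Bernoulli sampling has to visit every row to decide whether to keep it, which is why a plain limit is the cheaper choice when any subset of rows will do.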

SQL Equivalent

SQL translation is a great tool that translates any SQL statement(s) to a different dialect using the jOOQ Parser.

SQL Variant Code
List
databases [1 …

Split String into Rows in SQL

split

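-- SQL Server: split a comma-separated column into rows via an XML cast
SELECT
    A.state,
    split.a.value('.', 'VARCHAR(100)') AS String
FROM (
    SELECT
        state,
        -- wrap each comma-separated item in <M>...</M> so it can be parsed as XML
        CAST('<M>' + REPLACE(city, ',', '</M><M>') + '</M>' AS XML) AS string
    FROM
        TableA
    ) AS A
CROSS APPLY A.string.nodes('/M') AS split(a)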
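
To show what the query above produces, here is a small Python sketch of the same transformation on in-memory data. The sample rows and the helper name split_into_rows are made up for illustration; only the names state, city and TableA come from the query.

```python
def split_into_rows(rows, sep=","):
    """Expand (state, "a,b,c") into (state, "a"), (state, "b"), (state, "c")."""
    return [
        (state, city)
        for state, cities in rows
        for city in cities.split(sep)
    ]


# Hypothetical contents of TableA: one row per state, cities comma-separated.
table_a = [
    ("CA", "Los Angeles,San Diego"),
    ("WA", "Seattle"),
]

for state, city in split_into_rows(table_a):
    print(state, city)
```

Each input row yields one output row per comma-separated item, with the state repeated, which mirrors what CROSS APPLY does with the nodes of the XML value.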

Improve the Performance of Spark

Plan Your Work

  1. Having a clear idea of what you want to do is very important, especially when you are working on an exploratory project. It often saves you time to …