Ben Chuanlong Du's Blog

It is never too late to learn.

Yarn for Spark

Things on this page are fragmentary and immature notes/thoughts of the author. Please read with your own judgement!

  1. List all Spark applications.

    yarn application -list
    
  2. Show status of a Spark application.

    yarn application -status application_1459542433815_0002
    
  3. View logs of a Spark application.

    yarn logs -applicationId application_1459542433815_0002
    
  4. Kill a Spark application …
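The status and log commands above can also be assembled from a script. The following is only a minimal sketch: the helper names `check_app_id`, `status_cmd`, and `logs_cmd` are hypothetical, and the code builds argument lists without running anything. It assumes the `yarn` CLI is on the PATH wherever the commands are eventually executed.

```python
import re

# YARN application IDs look like application_<cluster timestamp>_<sequence number>.
APP_ID_PATTERN = re.compile(r"application_\d+_\d+")


def check_app_id(app_id):
    """Return app_id unchanged if it looks like a YARN application ID."""
    if not APP_ID_PATTERN.fullmatch(app_id):
        raise ValueError(f"not a YARN application ID: {app_id!r}")
    return app_id


def status_cmd(app_id):
    """Argument list for showing the status of an application."""
    return ["yarn", "application", "-status", check_app_id(app_id)]


def logs_cmd(app_id):
    """Argument list for viewing the logs of an application."""
    return ["yarn", "logs", "-applicationId", check_app_id(app_id)]


if __name__ == "__main__":
    # Print the commands; pass one of the lists to subprocess.run to execute it.
    print(" ".join(status_cmd("application_1459542433815_0002")))
    print(" ".join(logs_cmd("application_1459542433815_0002")))
```

Validating the ID before building the command keeps a typo from being sent to the cluster as a malformed request.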

Spark Issue Libc Not Found

Symptom

/lib64/libc.so.6: version `GLIBC_2.18' not found (required by ...)

Cause

The GLIBC version required by the binary is not available on the Spark nodes.
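One way to confirm the cause is to compare the GLIBC version on a node with the version named in the error message. The sketch below is an illustration, not part of the original notes: `parse_version` and `glibc_satisfies` are hypothetical helpers, and `platform.libc_ver()` reports the libc that the Python interpreter itself is linked against, which is assumed here to match the system GLIBC.

```python
import platform


def parse_version(text):
    """Turn a dotted version string such as '2.18' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def glibc_satisfies(required, available):
    """True if the available glibc version is at least the required one."""
    return parse_version(available) >= parse_version(required)


if __name__ == "__main__":
    # On systems that do not use glibc, libc_ver() may return empty strings.
    lib, version = platform.libc_ver()
    if lib == "glibc" and version:
        print(version, glibc_satisfies("2.18", version))
    else:
        print("glibc version could not be determined")
```

Comparing tuples of integers rather than strings matters here, because as strings "2.5" would sort after "2.18".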

Solution

Recompile your …

Data Quality

  • Upper and lower bound tests, interquartile range (IQR) checks, and standard deviation checks

  • Aggregate level checks (after manipulating data, there should still be the ability to explain how the data …
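As an illustration of the IQR check mentioned in the first bullet, here is a minimal sketch using only the Python standard library. The function names, the sample data, and the multiplier `k=1.5` (a common convention) are assumptions, not something the notes prescribe.

```python
import statistics


def iqr_bounds(values, k=1.5):
    """Return (lower, upper) fences: Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def out_of_bounds(values, lower, upper):
    """Values falling outside the closed interval [lower, upper]."""
    return [v for v in values if v < lower or v > upper]


if __name__ == "__main__":
    # Made-up sample with one obviously suspicious value.
    data = [10, 12, 11, 13, 12, 11, 10, 95]
    lower, upper = iqr_bounds(data)
    print(out_of_bounds(data, lower, upper))
```

The same `out_of_bounds` helper also covers plain upper and lower bound tests when the limits come from domain knowledge instead of the data.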

Tips on Delta Lake

Delta Lake

Delta Table

convert to delta [db_name.]table_name [partitioned by ...] [vacuum [retain number hours]]

vacuum

describe history db_name.table_name

You can select from a historical snapshot. You can also roll back to a historical snapshot: rollback …

Hive SQL

  1. Hive is case-insensitive for both keywords and function names.

  2. You can use both double and single quotes for strings.

  3. Use = rather than == for equality comparison, although == also seems to work.

  4. Use % rather …