Ben Chuanlong Du's Blog

It is never too late to learn.

String in Java

Comments

  1. String is an immutable class in Java. Extensive operations on strings (e.g., + in a big loop) are usually very slow before Java 7 (the + operator is optimized by the compiler automatically starting from Java 7). To avoid this problem (in older versions of Java), you can use the StringBuilder

Aggregate DataFrames in Spark

Aggregation Without Grouping

  1. You can aggregate all values in the columns of a DataFrame. Just use aggregation functions in select without groupBy, which is very similar to SQL syntax.

  2. The aggregation functions all and any are available since Spark 3.0. However, the same results can be achieved using other aggregation functions such as sum

Hands on the Python module Multiprocessing

Comments

  1. multiprocess is a fork of the Python standard library module multiprocessing. multiprocess extends multiprocessing to provide enhanced serialization, using dill. multiprocess leverages multiprocessing to support the spawning of processes using the API of the Python standard library's threading module.

  2. multiprocessing.Pool.map does not work with lambda functions because lambda functions cannot be pickled. There are multiple ways to avoid the issue. You can define a named function at module level or use functools.partial
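
The workaround in the second comment can be sketched roughly as follows. This is only a minimal illustration, not code from the original post: the names power, square and parallel_squares are hypothetical, and the pool size of 2 is an arbitrary assumption.

```python
from functools import partial
from multiprocessing import Pool


def power(base, exponent):
    """Module-level function (hypothetical example); picklable by reference."""
    return base ** exponent


# A partial object can be pickled as long as the wrapped function can be,
# so it can stand in for `lambda x: x ** 2`.
square = partial(power, exponent=2)


def parallel_squares(values, processes=2):
    """Map `square` over `values` using a pool of worker processes."""
    with Pool(processes=processes) as pool:
        return pool.map(square, values)


if __name__ == "__main__":
    # Passing `lambda x: x ** 2` to pool.map here would fail with a
    # pickling error instead of returning results.
    print(parallel_squares([1, 2, 3, 4]))  # prints [1, 4, 9, 16]
```

The `if __name__ == "__main__":` guard matters on platforms where worker processes are started by re-importing the main module; without it, each worker would try to create its own pool.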

Gradle Kotlin DSL

Things on this page are fragmentary and immature notes/thoughts of the author. Please read with your own judgement!

AVOID using the Kotlin DSL for Gradle! The Kotlin DSL for Gradle is not mature and lacks documentation at this time. Stick with the Groovy DSL for Gradle.

shadowJar

https://github …