Distance and Similarity for Machine Learning
Things on this page are fragmentary and immature notes/thoughts of the author. Please read with your own judgement!
Cosine Similarity
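Cosine similarity measures the angle between two vectors a and b: cos(a, b) = (a · b) / (‖a‖ ‖b‖). It ranges from -1 (opposite directions) to 1 (same direction) and ignores vector length. Below is a minimal pure-Python sketch. The function name cosine_similarity is only for illustration, and raising an error for a zero vector is one possible convention rather than a standard.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between vectors a and b."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # The angle is undefined for a zero vector; raising is one convention.
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


print(cosine_similarity([1, 0], [0, 1]))        # orthogonal vectors
print(cosine_similarity([1, 2, 3], [2, 4, 6]))  # same direction, up to rounding
```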
Jaccard Index (Jaccard Similarity Coefficient)
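For two sets A and B, the Jaccard index is J(A, B) = |A ∩ B| / |A ∪ B|, and the Jaccard distance is 1 - J(A, B). Below is a small sketch that uses Python sets, so any iterable of hashable items (e.g., a list of tokens) can be passed in. Treating two empty sets as identical (J = 1) is an assumption, since the ratio is otherwise undefined in that case.

```python
from typing import Hashable, Iterable


def jaccard_index(a: Iterable[Hashable], b: Iterable[Hashable]) -> float:
    """|A ∩ B| / |A ∪ B| after converting both inputs to sets."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        # Assumption: two empty sets are treated as identical.
        return 1.0
    return len(set_a & set_b) / len(union)


def jaccard_distance(a: Iterable[Hashable], b: Iterable[Hashable]) -> float:
    """1 minus the Jaccard index."""
    return 1.0 - jaccard_index(a, b)


print(jaccard_index({1, 2, 3}, {2, 3, 4}))                      # 2 shared / 4 total
print(jaccard_index("the cat sat".split(), "the cat ran".split()))  # token sets
```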
Euclidean Distance
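The Euclidean distance between x and y is sqrt(sum_i (x_i - y_i)^2), i.e., the L2 norm of x - y. A sketch follows; the function name is illustrative. On Python 3.8+, the standard library's math.dist computes the same quantity.

```python
import math
from typing import Sequence


def euclidean_distance(x: Sequence[float], y: Sequence[float]) -> float:
    """L2 norm of the difference x - y."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


print(euclidean_distance([0, 0], [3, 4]))  # 5.0 (a 3-4-5 right triangle)
print(math.dist([0, 0], [3, 4]))           # 5.0 (standard-library equivalent)
```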
L1 Distance
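The L1 distance, also called the Manhattan distance, sums the absolute coordinate differences: sum_i |x_i - y_i|. A minimal sketch with an illustrative function name:

```python
from typing import Sequence


def l1_distance(x: Sequence[float], y: Sequence[float]) -> float:
    """Sum of absolute coordinate differences (Manhattan distance)."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    return sum(abs(xi - yi) for xi, yi in zip(x, y))


print(l1_distance([1, 2, 3], [4, 0, 3]))  # |1-4| + |2-0| + |3-3| = 5
```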
Chebyshev Distance
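The Chebyshev distance is the largest absolute coordinate difference, max_i |x_i - y_i|. It is the limit of the Minkowski (Lp) distance as p goes to infinity. A minimal sketch with an illustrative function name:

```python
from typing import Sequence


def chebyshev_distance(x: Sequence[float], y: Sequence[float]) -> float:
    """Largest absolute coordinate difference (L-infinity distance)."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    return max(abs(xi - yi) for xi, yi in zip(x, y))


print(chebyshev_distance([1, 2, 3], [4, 0, 3]))  # max(3, 2, 0) = 3
```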
References
https://en.wikipedia.org/wiki/Cosine_similarity
https://en.wikipedia.org/wiki/Jaccard_index
Use XGBoost With Spark
The split-by-leaf mode (grow_policy="lossguide") is not supported in distributed training, which makes XGBoost4J on Spark much slower than LightGBM on Spark.
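On a single machine, the split-by-leaf mode is selected through XGBoost's grow_policy parameter. Below is a minimal sketch of a parameter dictionary for the Python package, of the kind that could be passed to xgboost.train. The values are illustrative only, and this is not a Spark configuration.

```python
# Illustrative XGBoost parameters for leaf-wise ("lossguide") tree growth.
# The values are arbitrary; tune them for real data.
params = {
    "tree_method": "hist",       # grow_policy is only honored by the hist/approx methods
    "grow_policy": "lossguide",  # split the node with the highest loss change first
    "max_leaves": 63,            # upper bound on the number of leaves per tree
    "max_depth": 0,              # 0 = no depth limit, so max_leaves controls tree size
}
```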
XGBoost with Spark
https://towardsdatascience.com/build-xgboost-lightgbm-models-on-large-datasets-what-are-the-possible-solutions-bf882da2c27d
https://xgboost …
Convert a Tensor to a Numpy Array or List in PyTorch
Tips
There are multiple ways to convert a Tensor to a numpy array in PyTorch.
First, you can call the method Tensor.numpy.
my_tensor.numpy()
Second, you can use the function numpy.array.
import numpy as np
np.array(my_tensor)
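Here is a runnable sketch of the two approaches, assuming a small CPU tensor named my_tensor with made-up values. It also shows two behaviors worth knowing. First, Tensor.numpy() shares memory with a CPU tensor. Second, a tensor that requires grad must be detached before either call works.

```python
import numpy as np
import torch

my_tensor = torch.tensor([[1.0, 2.0], [3.0, 4.0]])

a1 = my_tensor.numpy()    # first way: the Tensor method
a2 = np.array(my_tensor)  # second way: the generic NumPy constructor

# For a CPU tensor, Tensor.numpy() returns an array that shares memory
# with the tensor, so an in-place change to the tensor shows up in a1.
my_tensor[0, 0] = 10.0
print(a1[0, 0])

# Both calls fail on a tensor that requires grad (and on a GPU tensor);
# detaching and moving to the CPU first is a common workaround.
w = torch.ones(3, requires_grad=True)
w_np = w.detach().cpu().numpy()
print(w_np)
```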
It is suggested that you use the function numpy.array to convert a Tensor to a numpy array.
The reason is that numpy.array is more generic:
you can also use it to convert other objects (e.g., PIL.Image) to numpy arrays,
while those objects might not have a method named numpy.
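Below is a small sketch of the genericity argument. It also covers Tensor.tolist() for the list case in the title. The inputs are made up. PIL is left out to keep the example dependency-free, though numpy.array accepts PIL images as well.

```python
import numpy as np
import torch

# numpy.array accepts many array-like inputs, not only tensors.
from_tensor = np.array(torch.tensor([1, 2, 3]))
from_list = np.array([1, 2, 3])    # a plain list has no .numpy() method
from_tuple = np.array((1, 2, 3))

# To get a (nested) Python list instead of an array, use Tensor.tolist().
as_list = torch.tensor([[1, 2], [3, 4]]).tolist()
print(as_list)  # [[1, 2], [3, 4]]
```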
Tips on the Transformers Python Library for NLP
Tokenization in NLP
Libraries
SentencePiece
SentencePiece is an unsupervised text tokenizer for Neural Network-based text generation.