
Scaling Laws for LLMs

Things on this page are fragmentary and immature notes/thoughts of the author. Please read with your own judgement!


Scaling laws refer to the observed tendency of some machine learning architectures (notably transformers) to improve their performance according to a predictable power law when given more compute, data, or parameters (model size), assuming they are not bottlenecked on one of the other resources. This trend has been observed to hold consistently over more than six orders of magnitude.
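For example, Kaplan et al. (2020) write the parameter-count law as L(N) = (N_c / N)^α, with loss L and non-embedding parameter count N. Because a power law is linear in log-log space, the exponent can be recovered with a simple least-squares fit. Below is a minimal Python sketch, using made-up measurements purely for illustration:

```python
import numpy as np

# Hypothetical (model size, validation loss) measurements -- made-up
# numbers for illustration, not real experimental results.
n_params = np.array([1e6, 1e7, 1e8, 1e9, 1e10])
losses = np.array([5.2, 4.1, 3.3, 2.6, 2.1])

# A power law L(N) = a * N**(-alpha) is linear in log-log space:
#   log L = log a - alpha * log N,
# so regressing log L on log N recovers the exponent alpha.
slope, intercept = np.polyfit(np.log(n_params), np.log(losses), deg=1)
alpha, a = -slope, np.exp(intercept)
print(f"fitted exponent alpha = {alpha:.3f}, coefficient a = {a:.2f}")

# Extrapolating the fit to a larger model -- this predictability across
# orders of magnitude is what makes scaling laws useful for planning runs.
print(f"predicted loss at 1e11 params: {a * 1e11 ** (-alpha):.2f}")
```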

Scaling Laws Literature Review https://epochai.org/blog/scaling-laws-literature-review

Database of Scaling Laws https://docs.google.com/spreadsheets/d/1XHU0uyCojH6daSWEq9d1SHnlrQVW7li8iqBMasawMns/edit#gid=0

Training Large NNs

  • pruning (of weights, or of training data as in the "Beyond neural scaling laws" paper; see the sketch after this list)

  • scaling (growing parameters, data, and compute together)
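On data pruning: "Beyond neural scaling laws" (Sorscher et al., 2022) argues that pruning the training set with a good per-example difficulty metric can beat power-law scaling in dataset size. Here is a toy Python sketch of score-based data pruning; the difficulty score used below (distance to the mean embedding) is a stand-in assumption, whereas the paper scores examples by distance to k-means centroids of self-supervised embeddings:

```python
import numpy as np

def prune_dataset(embeddings: np.ndarray, keep_fraction: float) -> np.ndarray:
    """Return indices of the examples to keep, preferring "hard" ones.

    The difficulty score here (distance to the mean embedding) is a
    stand-in; any per-example difficulty metric slots in the same way.
    """
    center = embeddings.mean(axis=0)
    difficulty = np.linalg.norm(embeddings - center, axis=1)
    k = int(len(embeddings) * keep_fraction)
    # Keep the k most difficult (presumably most informative) examples.
    return np.argsort(difficulty)[-k:]

rng = np.random.default_rng(42)
X = rng.normal(size=(10_000, 128))          # stand-in embeddings
kept = prune_dataset(X, keep_fraction=0.5)  # drop half the data
print(f"kept {len(kept)} of {len(X)} examples")
```

A key finding of the paper is that the right pruning direction depends on data abundance: keep hard examples when data is plentiful, and easy ones when it is scarce.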

Beyond neural scaling laws – Paper Explained

WHY AND HOW OF SCALING LARGE LANGUAGE MODELS | NICHOLAS JOSEPH

