LLM
Large Language Model studies and applications
Paradigm Shifts in Language Models
From its beginnings in Markov-based approaches that predict the next word from the previous n words, the Language Model (LM) has evolved drastically with the Transformer architecture and the GPT-2 / BERT generation of models. "LLM" refers to the current (2025) generation of LMs, mostly served by big tech companies and startups; most of them are closed source, but several LLM architectures and training recipes are officially open (Llama, DeepSeek, and so on).
(ref. https://arxiv.org/pdf/2303.18223.pdf)
| Type | Backbone Architecture | Description | Example |
|---|---|---|---|
| LLM (Large Language Model, early 2020s ~ present) | Transformers (primarily) | PLMs scaled up in parameter count, dataset size, and compute. Typically aligned with human feedback (RLHF - Reinforcement Learning from Human Feedback) or other reinforcement learning techniques. | GPT-3, GPT-4, PaLM-2, Gemini-1.5, Llama-3.1, DeepSeek-V3 |
| PLM (Pre-trained Language Model, late 2010s) | Transformers, RNNs (earlier models) | Deep learning models pre-trained on massive data, then fine-tuned for specific (downstream) tasks. Self-supervised objectives include masked language modeling (MLM - BERT, RoBERTa), bidirectional language modeling (ELMo), and causal language modeling (GPT). | ELMo (RNNs), BERT, RoBERTa, GPT-2 (Transformers) |
| NLM (Neural Language Model, early-mid 2010s) | RNNs, Feedforward Neural Networks | Introduced neural networks to model language, enabling distributed representations (word vectors) and overcoming some limitations of SLMs. | word2vec (Feedforward), GloVe (Matrix Factorization), Recurrent Neural Networks |
| SLM (Statistical Language Model, pre-2010s) | N/A (Statistical methods) | Traditional probabilistic models built on the Markov assumption: predict the next word from the previous n-1 words (see the sketch below the table). | n-gram language model |
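To make the SLM row concrete, below is a minimal sketch of a bigram (n = 2) language model in plain Python. The `BigramLM` class name and the toy corpus are illustrative assumptions, not taken from the referenced survey.

```python
from collections import defaultdict, Counter

class BigramLM:
    """Minimal bigram (n = 2) statistical language model (illustrative sketch).

    Under the Markov assumption, P(w_t | w_1..w_{t-1}) ~= P(w_t | w_{t-1}),
    estimated here by counting adjacent word pairs in a toy corpus.
    """

    def __init__(self):
        self.counts = defaultdict(Counter)

    def train(self, sentences):
        for sentence in sentences:
            tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
            for prev, curr in zip(tokens, tokens[1:]):
                self.counts[prev][curr] += 1

    def prob(self, prev, curr):
        # Maximum-likelihood estimate: count(prev, curr) / count(prev, *).
        total = sum(self.counts[prev].values())
        return self.counts[prev][curr] / total if total else 0.0

    def predict_next(self, prev):
        # Greedy next-word prediction given only the previous word.
        if not self.counts[prev]:
            return None
        return self.counts[prev].most_common(1)[0][0]

lm = BigramLM()
lm.train(["the cat sat on the mat", "the dog sat on the rug"])
print(lm.prob("the", "cat"))   # 0.25: "cat" follows "the" once out of four times
print(lm.predict_next("sat"))  # "on"
```

With n-grams, any word pair never seen in training gets zero probability; this sparsity limitation is one of the problems the distributed representations of NLMs later addressed.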
In brief, there is no fundamental difference between LLMs and PLMs in their base model architecture. However, because an LLM has a much larger parameter count and requires much more data, AI researchers have developed various techniques to train and fine-tune these models, as sketched below.
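To illustrate the causal language modeling objective shared by GPT-style PLMs and modern LLMs, here is a minimal sketch of one fine-tuning step using the Hugging Face `transformers` library. The `gpt2` checkpoint, the toy sentence, and the hyperparameters are illustrative assumptions; production LLM training adds distributed infrastructure, curated corpora, and alignment stages such as RLHF on top of this same objective.

```python
# Minimal sketch of one causal-LM fine-tuning step
# (assumes `pip install torch transformers`; "gpt2" is an illustrative choice).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

batch = tokenizer(
    ["Large language models are trained to predict the next token."],
    return_tensors="pt",
)

# For causal LMs, passing labels = input_ids makes the library compute the
# shifted next-token cross-entropy loss internally.
outputs = model(**batch, labels=batch["input_ids"])
outputs.loss.backward()
optimizer.step()
optimizer.zero_grad()

print(f"next-token loss: {outputs.loss.item():.3f}")
```

The same next-token prediction loss is what gets scaled up, across vastly more data and hardware, for the LLMs in the table above.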