Excerpted from: Awesome-LLM
A roundup of the core papers in the history of large language model development.
| Date | Keywords | Institute | Paper |
|------|----------|-----------|-------|
| 2017-06 | Transformers | Google | Attention Is All You Need |
| 2018-06 | GPT 1.0 | OpenAI | Improving Language Understanding by Generative Pre-Training |
| 2018-10 | BERT | Google | BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding |
| 2019-02 | GPT 2.0 | OpenAI | Language Models are Unsupervised Multitask Learners |
| 2019-09 | Megatron-LM | NVIDIA | Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism |
| 2019-10 | T5 | Google | Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer |
| 2019-10 | ZeRO | Microsoft | ZeRO: Memory Optimizations Toward Training Trillion Parameter Models |
| 2020-01 | Scaling Law | OpenAI | Scaling Laws for Neural Language Models |
| 2020-05 | GPT 3.0 | OpenAI | Language models are few-shot learners |
| 2021-01 | Switch Transformers | Google | Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity |
| 2021-08 | Codex | OpenAI | Evaluating Large Language Models Trained on Code |
| 2021-08 | Foundation Models | Stanford | On the Opportunities and Risks of Foundation Models |
| 2021-09 | FLAN | Google | Finetuned Language Models are Zero-Shot Learners |
| 2021-10 | T0 | HuggingFace et al. | Multitask Prompted Training Enables Zero-Shot Task Generalization |
| 2021-12 | GLaM | Google | GLaM: Efficient Scaling of Language Models with Mixture-of-Experts |
| 2021-12 | WebGPT | OpenAI | WebGPT: Browser-assisted question-answering with human feedback |
| 2021-12 | Retro | DeepMind | Improving language models by retrieving from trillions of tokens |
| 2021-12 | Gopher | DeepMind | Scaling Language Models: Methods, Analysis & Insights from Training Gopher |
| 2022-01 | COT | Google | Chain-of-Thought Prompting Elicits Reasoning in Large Language Models |
| 2022-01 | LaMDA | Google | LaMDA: Language Models for Dialog Applications |
| 2022-01 | Minerva | Google | Solving Quantitative Reasoning Problems with Language Models |
| 2022-01 | Megatron-Turing NLG | Microsoft & NVIDIA | Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model |
| 2022-03 | InstructGPT | OpenAI | Training language models to follow instructions with human feedback |
| 2022-04 | PaLM | Google | PaLM: Scaling Language Modeling with Pathways |
| 2022-04 | Chinchilla | DeepMind | Training Compute-Optimal Large Language Models |
| 2022-05 | OPT | Meta | OPT: Open Pre-trained Transformer Language Models |
| 2022-05 | UL2 | Google | Unifying Language Learning Paradigms |
| 2022-06 | Emergent Abilities | Google | Emergent Abilities of Large Language Models |
| 2022-06 | BIG-bench | Google | Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models |
| 2022-06 | METALM | Microsoft | Language Models are General-Purpose Interfaces |
| 2022-09 | Sparrow | DeepMind | Improving alignment of dialogue agents via targeted human judgements |
| 2022-10 | Flan-T5/PaLM | Google | Scaling Instruction-Finetuned Language Models |
| 2022-10 | GLM-130B | Tsinghua | GLM-130B: An Open Bilingual Pre-trained Model |
| 2022-11 | HELM | Stanford | Holistic Evaluation of Language Models |
| 2022-11 | BLOOM | BigScience | BLOOM: A 176B-Parameter Open-Access Multilingual Language Model |
| 2022-11 | Galactica | Meta | Galactica: A Large Language Model for Science |
| 2022-12 | OPT-IML | Meta | OPT-IML: Scaling Language Model Instruction Meta Learning through the Lens of Generalization |
| 2023-01 | Flan 2022 Collection | Google | The Flan Collection: Designing Data and Methods for Effective Instruction Tuning |
| 2023-02 | LLaMA | Meta | LLaMA: Open and Efficient Foundation Language Models |
| 2023-02 | Kosmos-1 | Microsoft | Language Is Not All You Need: Aligning Perception with Language Models |
| 2023-03 | LRU | DeepMind | Resurrecting Recurrent Neural Networks for Long Sequences |
| 2023-03 | PaLM-E | Google | PaLM-E: An Embodied Multimodal Language Model |
| 2023-03 | GPT 4 | OpenAI | GPT-4 Technical Report |
| 2023-04 | LLaVA | UW–Madison & Microsoft | Visual Instruction Tuning |
| 2023-04 | Pythia | EleutherAI et al. | Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling |
| 2023-05 | Dromedary | CMU et al. | Principle-Driven Self-Alignment of Language Models from Scratch with Minimal Human Supervision |
| 2023-05 | PaLM 2 | Google | PaLM 2 Technical Report |
| 2023-05 | RWKV | Bo Peng | RWKV: Reinventing RNNs for the Transformer Era |
| 2023-05 | DPO | Stanford | Direct Preference Optimization: Your Language Model is Secretly a Reward Model |
| 2023-05 | ToT | Google & Princeton | Tree of Thoughts: Deliberate Problem Solving with Large Language Models |
| 2023-07 | LLaMA 2 | Meta | Llama 2: Open Foundation and Fine-Tuned Chat Models |
| 2023-10 | Mistral 7B | Mistral | Mistral 7B |
| 2023-12 | Mamba | CMU & Princeton | Mamba: Linear-Time Sequence Modeling with Selective State Spaces |
| 2024-01 | DeepSeek-V2 | DeepSeek | DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model |
| 2024-02 | OLMo | Ai2 | OLMo: Accelerating the Science of Language Models |
| 2024-05 | Mamba 2 | CMU & Princeton | Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality |
| 2024-05 | Llama 3 | Meta | The Llama 3 Herd of Models |
| 2024-06 | FineWeb | HuggingFace | The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale |
| 2024-09 | OLMoE | Ai2 | OLMoE: Open Mixture-of-Experts Language Models |
| 2024-12 | Qwen2.5 | Alibaba | Qwen2.5 Technical Report |
| 2024-12 | DeepSeek-V3 | DeepSeek | DeepSeek-V3 Technical Report |
| 2025-01 | DeepSeek-R1 | DeepSeek | DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning |