
Large Language Models (LLMs) Concepts

1、Introduction to Large Language Models (LLMs)

1.1、Definition of LLMs

  • Large: Training data and resources.
  • Language: Human-like text.
  • Models: Learn complex patterns using text data.

LLMs are considered a defining moment in the history of AI.

Some applications:

  • Sentiment analysis
  • Identifying themes
  • Translating text or speech
  • Generating code
  • Next-word prediction

1.2、Real-world applications

  • Transforming the finance industry:
    [Investment outlooks] [Annual reports] [News articles] [Social media posts] --> LLM --> [Market analysis] [Portfolio management] [Investment opportunities]

  • Revolutionizing the healthcare sector:
    - Analyzes patient data to offer personalized recommendations.
    - Must adhere to privacy laws.

  • Education:
    - Personalized coaching and feedback.
    - Interactive learning experience.
    - AI-powered tutor: ask questions, receive guidance, discuss ideas.

  • Visual question answering:
    Defining multimodal:
    - Multimodal: many types of processing or generation (e.g., text and images).
    - Non-multimodal: one type of processing or generation.
    Visual question answering:
    - Answers questions about visual content.
    - Object identification & relationships.
    - Scene description.

1.3、Challenges of language modeling

  • Sequence matters
  • Context modeling
  • Long-range dependency
  • Single-task learning

2、Building Blocks of LLMs

2.1、Novelty of LLMs

  • Overcome data's unstructured nature
  • Outperform traditional models
  • Understand linguistic subtleties

The building blocks are described in the sections below.

2.2、Generalized overview of NLP

2.2.1、Text Pre-processing

These steps are independent, so they can be performed in any order. A minimal sketch follows the list below.

  • Tokenization: Splits text into individual words, or tokens.

  • Stop word removal: Stop words do not add meaning.

  • Lemmatization: Groups slightly different words with similar meanings and reduces them to their base form; for example, 'running' and 'ran' both map to the root word 'run'.
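
As a minimal sketch of these three steps, here is one possible implementation using NLTK (assuming the punkt, stopwords, and wordnet resources have been downloaded; spaCy or another toolkit would work equally well):

```python
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
# First run may require: nltk.download("punkt"), nltk.download("stopwords"), nltk.download("wordnet")

text = "The cats are sitting on the mats"

# Tokenization: split the text into individual tokens
tokens = word_tokenize(text.lower())

# Stop word removal: drop words that add little meaning
stop_words = set(stopwords.words("english"))
content_tokens = [t for t in tokens if t not in stop_words]

# Lemmatization: reduce each word to its base form
lemmatizer = WordNetLemmatizer()
lemmas = [lemmatizer.lemmatize(t) for t in content_tokens]

print(lemmas)  # e.g. ['cat', 'sitting', 'mat']
```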

2.2.2、Text Representation

  • Converts text data into numerical form.
  • Bag-of-words: counts word occurrences, ignoring order.

    Limitations:
    - Does not capture word order or context.
    - Does not capture the semantics between words.
    (The sketch after this list illustrates this.)

  • Word embeddings: dense vectors that place words with similar meanings close together.
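
A minimal bag-of-words sketch with scikit-learn's CountVectorizer, showing how word order is lost:

```python
from sklearn.feature_extraction.text import CountVectorizer

corpus = [
    "the dog chased the cat",
    "the cat chased the dog",  # same words, opposite meaning
]

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(corpus)

print(vectorizer.get_feature_names_out())  # learned vocabulary
print(X.toarray())  # identical rows: bag-of-words cannot tell these sentences apart
```

Word embeddings address the semantic gap by placing similar words close together in vector space; word order is handled separately (e.g., by positional encoding in transformers, covered later).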

2.3、Fine-tuning

Fine-tuning:
- Addresses some of these challenges.
- Adapts a pre-trained model (a rough sketch follows below).

Pre-trained model:
- Learned from general-purpose datasets.
- Not optimized for specific tasks.
- Can be fine-tuned for a specific problem.
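
A rough sketch of what fine-tuning can look like with the Hugging Face transformers library; the model name is illustrative, and train_dataset is a hypothetical, already tokenized and labeled dataset:

```python
from transformers import AutoModelForSequenceClassification, Trainer, TrainingArguments

# Load a general-purpose pre-trained model with a fresh task-specific head
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)  # e.g. positive/negative sentiment

args = TrainingArguments(output_dir="out", num_train_epochs=3)

# train_dataset is assumed to exist: tokenized texts with labels
trainer = Trainer(model=model, args=args, train_dataset=train_dataset)
trainer.train()  # adapts the general model to the specific problem
```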

2.4、Learning techniques

N-shot learning: zero-shot, few-shot, and multi-shot (a prompt sketch follows these subsections).

2.4.1、Zero-shot learning

  • No explicit training.
  • Uses language understanding and context.
  • Generalizes without any prior examples.

2.4.2、Few-shot learning

  • Learns a new task from only a few examples.

2.4.3、Multi-shot learning

  • Requires more examples than few-shot.
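
The difference between these settings is clearest in the prompts themselves; a schematic sketch (the prompts are illustrative and not tied to any particular model):

```python
# Zero-shot: no examples; relies on general language understanding
zero_shot = "Classify the sentiment of this review: 'The food was amazing.'"

# Few-shot: a handful of solved examples before the actual query
few_shot = """Review: 'Terrible service.' -> negative
Review: 'Loved every minute.' -> positive
Review: 'The food was amazing.' ->"""

# Multi-shot: the same pattern, but with many more examples
# (e.g. dozens of labeled reviews) before the query.
```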

3、Training Methodology and Techniques

3.1、Building blocks to train LLMs

3.1.1、Generative pre-training

Trained using generative pre-training:
- Input data of text tokens.
- Trained to predict the tokens within the dataset.

Types:
- Next-word prediction.
- Masked language modeling.

3.1.2、Next word prediction

  • Supervised learning technique.
  • Predicts next word and generates coherent text.
  • Captures the dependencies between words.
  • Training data consists of pairs of input and output examples (see the sketch below).
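
A small sketch of how such input/output pairs can be built from raw text (real models operate on sub-word token IDs rather than whole words):

```python
sentence = "the quick brown fox jumps".split()

# Each prefix becomes an input; the word that follows is the target
pairs = [(sentence[:i], sentence[i]) for i in range(1, len(sentence))]

for context, target in pairs:
    print(context, "->", target)
# ['the'] -> quick
# ['the', 'quick'] -> brown
# ['the', 'quick', 'brown'] -> fox
# ['the', 'quick', 'brown', 'fox'] -> jumps
```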

3.1.3、Masked language modeling

  • Hides a selected word in the input.
  • The trained model predicts the masked word (see the sketch below).
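
A minimal sketch with the transformers fill-mask pipeline (assuming the bert-base-uncased weights can be downloaded):

```python
from transformers import pipeline

# BERT-style models are trained with masked language modeling
unmasker = pipeline("fill-mask", model="bert-base-uncased")

# The model predicts the hidden word from the surrounding context
for pred in unmasker("The capital of France is [MASK]."):
    print(pred["token_str"], round(pred["score"], 3))  # e.g. 'paris' with a high score
```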

3.2、Introducing the transformer

3.2.1、Transformer architecture

  • Relationship between words.
  • Components: Pre-processing, Positional Encoding, Encoders, and Decoders.

3.2.2、Inside the transformer

(1) Text pre-processing and representation:

  • Text preprocessing: tokenization, stop word removal, lemmatization.
  • Text representation: word embedding.

(2) Positional encoding:

  • Adds information about the position of each word.
  • Helps the model relate distant words (see the sketch below).
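
One common scheme (from the original Transformer paper, "Attention Is All You Need") is sinusoidal positional encoding; a minimal NumPy sketch:

```python
import numpy as np

def positional_encoding(seq_len, d_model):
    # Sinusoidal encoding: each position gets a unique pattern of waves
    positions = np.arange(seq_len)[:, None]      # (seq_len, 1)
    dims = np.arange(d_model)[None, :]           # (1, d_model)
    angles = positions / np.power(10000.0, (2 * (dims // 2)) / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles[:, 0::2])  # even dimensions use sine
    pe[:, 1::2] = np.cos(angles[:, 1::2])  # odd dimensions use cosine
    return pe

# Each row is added to the matching word embedding so the model
# knows where in the sequence each token sits.
print(positional_encoding(seq_len=4, d_model=8).shape)  # (4, 8)
```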

(3) Encoders:

  • Attention mechanism: directs attention to specific words and relationships.
  • Neural network: processes specific features.

(4) Decoders:

  • Includes attention and neural networks.
  • Generates the output.

3.2.3、Transformers and long-range dependencies

  • Initial challenge: long-range dependency.
  • Attention: focus on different parts of the input.

3.2.4、Processes multiple parts simultaneously

  • Limitation of traditional language models: Sequential - one word at a time.
  • Transformers: Process multiple parts simultaneously (Faster processing).

3.3、Attention mechanisms

3.3.1、Attention mechanisms

  • Understand complex structures.
  • Focus on important words.

3.3.2、Two primary types: self-attention and multi-head attention

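For example, a minimal NumPy sketch of scaled dot-product self-attention; multi-head attention simply runs several of these in parallel over different learned projections:

```python
import numpy as np

def self_attention(X):
    # Scaled dot-product self-attention (single head).
    # A real transformer first projects X into separate query, key,
    # and value matrices; here Q = K = V = X for brevity.
    d_k = X.shape[-1]
    scores = X @ X.T / np.sqrt(d_k)  # how strongly each word attends to each other word
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ X               # weighted mix of word representations

X = np.random.rand(3, 4)  # three "word" vectors standing in for embeddings
print(self_attention(X).shape)  # (3, 4): one updated vector per word
```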

3.4、Advanced fine-tuning

3.4.1、LLM training involves three steps:

  • Pre-training: learn general language patterns from large, general-purpose datasets.
  • Fine-tuning: adapt the pre-trained model to a specific task.
  • RLHF (Reinforcement Learning from Human Feedback):
    (1) Why RLHF? Fine-tuning alone cannot capture nuanced human preferences.
    (2) Starts from a fine-tuned model and refines it with human feedback.

3.4.2、Simplifying RLHF

  • Model output reviewed by human.
  • Updates model based on the feedback.

Step 1:

  • Receives a prompt.
  • Generates multiple responses.

Step 2:

  • A human expert reviews these responses.
  • Ranks the responses by quality: accuracy, relevance, coherence.

Step 3:

  • Learns from the expert's ranking.
  • Aligns its future responses with these preferences.

And the loop continues (sketched in pseudocode below):

  • Continues to generate responses.
  • Receives expert's rankings.
  • Adjusts the learning.
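
The whole loop can be summarized in pseudocode; generate_responses, collect_human_ranking, and update_policy are hypothetical placeholders, not a real library API (production RLHF typically trains a reward model and optimizes the policy with an algorithm such as PPO):

```python
def rlhf_loop(model, prompts, num_rounds):
    # Hypothetical sketch of the RLHF cycle described above
    for _ in range(num_rounds):
        for prompt in prompts:
            # Step 1: the model generates several candidate responses
            responses = generate_responses(model, prompt, n=4)
            # Step 2: a human expert ranks them by accuracy,
            # relevance, and coherence
            ranking = collect_human_ranking(responses)
            # Step 3: the model updates to prefer higher-ranked
            # responses in the future
            model = update_policy(model, prompt, responses, ranking)
    return model
```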

3.4.3、Recap

RLHF in brief: the model generates responses, a human expert ranks them by quality, and the model updates so its future outputs align with human preferences.

4、Concerns and Considerations

4.1、Data concerns and considerations

  • Data volume and compute power.
  • Data quality.
  • Labeling.
  • Bias.
  • Privacy.

4.1.1、Data volume and compute power

  • LLMs need a lot of data.
  • Extensive computing power.
  • Can cost millions of dollars.

4.1.2、Data quality

  • Quality data is essential.

4.1.3、Labeled data

  • Models require correctly labeled data.
  • Labeling is labor-intensive.
  • Incorrect labels impact model performance.
  • Address errors: identify → analyze → iterate.

4.1.4、Data bias

  • Influenced by societal stereotypes.
  • Lack of diversity in training data.
  • Discrimination and unfair outcomes.

How to spot and deal with biased data:

  • Evaluate data imbalances.
  • Promote diversity.
  • Bias mitigation techniques: more diverse examples.

4.1.5、Data privacy

  • Compliance with data protection and privacy regulations.
  • Sensitive or personally identifiable information (PII).
  • Privacy is a concern.
  • Obtain consent before using personal data.

4.2、Ethical and environmental concerns

4.2.1、Ethical concerns

  • Transparency risk - Challenging to understand the output.
  • Accountability risk - Responsibility for LLMs' actions.
  • Information hazards - Disseminating harmful information.

4.2.2、Environmental concerns

  • Ecological footprint of LLMs.
  • Substantial energy resources to train.
  • Impact through carbon emissions.

4.3、Where are LLMs heading?

  • Model explainability.
  • Efficiency.
  • Unsupervised bias handling.
  • Enhanced creativity.
