
Large Language Models (LLMs) Concepts

1、Introduction to Large Language Models (LLMs)

1.1、Definition of LLMs

  • Large: Training data and resources.
  • Language: Human-like text.
  • Models: Learn complex patterns using text data.

LLMs are considered a defining moment in the history of AI.

Some applications:

  • Sentiment analysis
  • Identifying themes
  • Translating text or speech
  • Generating code
  • Next-word prediction

1.2、Real-world applications

  • Transforming the finance industry:
    [Investment outlooks] [Annual reports] [News articles] [Social media posts] --> LLM --> [Market analysis] [Portfolio management] [Investment opportunities]

  • Revolutionizing the healthcare sector:
    - Analyzes patient data to offer personalized recommendations.
    - Must adhere to privacy laws.

  • Education:
    - Personalized coaching and feedback.
    - Interactive learning experience.
    - AI-powered tutor: ask questions, receive guidance, discuss ideas.

  • Visual question answering:
    Defining multimodal:
    - Multimodal: many types of processing or generation (e.g., text and images).
    - Non-multimodal: one type of processing or generation.
    Visual question answering:
    - Answers questions about visual content.
    - Object identification & relationships.
    - Scene description.

1.3、Challenges of language modeling

  • Sequence matters
  • Context modeling
  • Long-range dependency
  • Single-task learning

2、Building Blocks of LLMs

2.1、Novelty of LLMs

  • Overcome data's unstructured nature
  • Outperform traditional models
  • Understand linguistic subtleties

The building blocks are described in the sections below.

2.2、Generalized overview of NLP

2.2.1、Text Pre-processing

These steps are independent, so they can be performed in any order. A minimal sketch follows the list below.

  • Tokenization: Splits text into individual words, or tokens.

  • Stop word removal: Stop words do not add meaning.

  • Lemmatization: Groups slightly different words with similar meanings and reduces them to their base form; for example, 'running' and 'ran' both map to the root word 'run'.
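
As a minimal sketch of these three steps, here is one possible implementation using NLTK (assuming the punkt, stopwords, and wordnet resources have been downloaded; spaCy or another toolkit would work equally well):

```python
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
# First run may require: nltk.download("punkt"), nltk.download("stopwords"), nltk.download("wordnet")

text = "The cats are sitting on the mats"

# Tokenization: split the text into individual tokens
tokens = word_tokenize(text.lower())

# Stop word removal: drop words that add little meaning
stop_words = set(stopwords.words("english"))
content_tokens = [t for t in tokens if t not in stop_words]

# Lemmatization: reduce each word to its base form
lemmatizer = WordNetLemmatizer()
lemmas = [lemmatizer.lemmatize(t) for t in content_tokens]

print(lemmas)  # e.g. ['cat', 'sitting', 'mat']
```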

2.2.2、Text Representation

  • Converts text data into numerical form.
  • Bag-of-words: counts word occurrences, ignoring order.

    Limitations:
    - Does not capture word order or context.
    - Does not capture the semantics between words.
    (The sketch after this list illustrates this.)

  • Word embeddings: dense vectors that place words with similar meanings close together.
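
A minimal bag-of-words sketch with scikit-learn's CountVectorizer, showing how word order is lost:

```python
from sklearn.feature_extraction.text import CountVectorizer

corpus = [
    "the dog chased the cat",
    "the cat chased the dog",  # same words, opposite meaning
]

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(corpus)

print(vectorizer.get_feature_names_out())  # learned vocabulary
print(X.toarray())  # identical rows: bag-of-words cannot tell these sentences apart
```

Word embeddings address the semantic gap by placing similar words close together in vector space; word order is handled separately (e.g., by positional encoding in transformers, covered later).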

2.3、Fine-tuning

Fine-tuning:
- Addresses some of these challenges.
- Adapts a pre-trained model (a rough sketch follows below).

Pre-trained model:
- Learned from general-purpose datasets.
- Not optimized for specific tasks.
- Can be fine-tuned for a specific problem.
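
A rough sketch of what fine-tuning can look like with the Hugging Face transformers library; the model name is illustrative, and train_dataset is a hypothetical, already tokenized and labeled dataset:

```python
from transformers import AutoModelForSequenceClassification, Trainer, TrainingArguments

# Load a general-purpose pre-trained model with a fresh task-specific head
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)  # e.g. positive/negative sentiment

args = TrainingArguments(output_dir="out", num_train_epochs=3)

# train_dataset is assumed to exist: tokenized texts with labels
trainer = Trainer(model=model, args=args, train_dataset=train_dataset)
trainer.train()  # adapts the general model to the specific problem
```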

2.4、Learning techniques

N-shot learning: zero-shot, few-shot, and multi-shot (a prompt sketch follows these subsections).

2.4.1、Zero-shot learning

  • No explicit training.
  • Uses language understanding and context.
  • Generalizes without any prior examples.

2.4.2、Few-shot learning

  • Learns a new task from only a few examples.

2.4.3、Multi-shot learning

  • Requires more examples than few-shot.
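
The difference between these settings is clearest in the prompts themselves; a schematic sketch (the prompts are illustrative and not tied to any particular model):

```python
# Zero-shot: no examples; relies on general language understanding
zero_shot = "Classify the sentiment of this review: 'The food was amazing.'"

# Few-shot: a handful of solved examples before the actual query
few_shot = """Review: 'Terrible service.' -> negative
Review: 'Loved every minute.' -> positive
Review: 'The food was amazing.' ->"""

# Multi-shot: the same pattern, but with many more examples
# (e.g. dozens of labeled reviews) before the query.
```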

3、Training Methodology and Techniques

3.1、Building blocks to train LLMs

3.1.1、Generative pre-training

Trained using generative pre-training:
- Input data of text tokens.
- Trained to predict the tokens within the dataset.

Types:
- Next-word prediction.
- Masked language modeling.

3.1.2、Next word prediction

  • Supervised learning technique.
  • Predicts next word and generates coherent text.
  • Captures the dependencies between words.
  • Training data consists of pairs of input and output examples (see the sketch below).
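
A small sketch of how such input/output pairs can be built from raw text (real models operate on sub-word token IDs rather than whole words):

```python
sentence = "the quick brown fox jumps".split()

# Each prefix becomes an input; the word that follows is the target
pairs = [(sentence[:i], sentence[i]) for i in range(1, len(sentence))]

for context, target in pairs:
    print(context, "->", target)
# ['the'] -> quick
# ['the', 'quick'] -> brown
# ['the', 'quick', 'brown'] -> fox
# ['the', 'quick', 'brown', 'fox'] -> jumps
```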

3.1.3、Masked language modeling

  • Hides a selected word in the input.
  • The trained model predicts the masked word (see the sketch below).
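
A minimal sketch with the transformers fill-mask pipeline (assuming the bert-base-uncased weights can be downloaded):

```python
from transformers import pipeline

# BERT-style models are trained with masked language modeling
unmasker = pipeline("fill-mask", model="bert-base-uncased")

# The model predicts the hidden word from the surrounding context
for pred in unmasker("The capital of France is [MASK]."):
    print(pred["token_str"], round(pred["score"], 3))  # e.g. 'paris' with a high score
```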

3.2、Introducing the transformer

3.2.1、Transformer architecture

  • Relationship between words.
  • Components: Pre-processing, Positional Encoding, Encoders, and Decoders.

3.2.2、Inside the transformer

(1) Text pre-processing and representation:

  • Text preprocessing: tokenization, stop word removal, lemmatization.
  • Text representation: word embedding.

(2) Positional encoding:

  • Adds information about the position of each word.
  • Helps the model relate distant words (see the sketch below).
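
One common scheme (from the original Transformer paper, "Attention Is All You Need") is sinusoidal positional encoding; a minimal NumPy sketch:

```python
import numpy as np

def positional_encoding(seq_len, d_model):
    # Sinusoidal encoding: each position gets a unique pattern of waves
    positions = np.arange(seq_len)[:, None]      # (seq_len, 1)
    dims = np.arange(d_model)[None, :]           # (1, d_model)
    angles = positions / np.power(10000.0, (2 * (dims // 2)) / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles[:, 0::2])  # even dimensions use sine
    pe[:, 1::2] = np.cos(angles[:, 1::2])  # odd dimensions use cosine
    return pe

# Each row is added to the matching word embedding so the model
# knows where in the sequence each token sits.
print(positional_encoding(seq_len=4, d_model=8).shape)  # (4, 8)
```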

(3) Encoders:

  • Attention mechanism: directs attention to specific words and relationships.
  • Neural network: processes specific features.

(4) Decoders:

  • Includes attention and neural networks.
  • Generates the output.

3.2.3、Transformers and long-range dependencies

  • Initial challenge: long-range dependency.
  • Attention: focus on different parts of the input.

3.2.4、Processes multiple parts simultaneously

  • Limitation of traditional language models: Sequential - one word at a time.
  • Transformers: Process multiple parts simultaneously (Faster processing).

3.3、Attention mechanisms

3.3.1、Attention mechanisms

  • Understand complex structures.
  • Focus on important words.

3.3.2、Two primary types: self-attention and multi-head attention

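For example, a minimal NumPy sketch of scaled dot-product self-attention; multi-head attention simply runs several of these in parallel over different learned projections:

```python
import numpy as np

def self_attention(X):
    # Scaled dot-product self-attention (single head).
    # A real transformer first projects X into separate query, key,
    # and value matrices; here Q = K = V = X for brevity.
    d_k = X.shape[-1]
    scores = X @ X.T / np.sqrt(d_k)  # how strongly each word attends to each other word
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ X               # weighted mix of word representations

X = np.random.rand(3, 4)  # three "word" vectors standing in for embeddings
print(self_attention(X).shape)  # (3, 4): one updated vector per word
```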

3.4、Advanced fine-tuning

3.4.1、LLM training involves three steps:

  • Pre-training: learn general language patterns from large, general-purpose datasets.
  • Fine-tuning: adapt the pre-trained model to a specific task.
  • RLHF (Reinforcement Learning from Human Feedback):
    (1) Why RLHF? Fine-tuning alone cannot capture nuanced human preferences.
    (2) Starts from a fine-tuned model and refines it with human feedback.

3.4.2、Simplifying RLHF

  • Model output reviewed by human.
  • Updates model based on the feedback.

Step 1:

  • Receives a prompt.
  • Generates multiple responses.

Step 2:

  • A human expert reviews these responses.
  • Ranks the responses by quality: accuracy, relevance, coherence.

Step 3:

  • Learns from the expert's ranking.
  • Aligns its future responses with these preferences.

And the loop continues (sketched in pseudocode below):

  • Continues to generate responses.
  • Receives expert's rankings.
  • Adjusts the learning.
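
The whole loop can be summarized in pseudocode; generate_responses, collect_human_ranking, and update_policy are hypothetical placeholders, not a real library API (production RLHF typically trains a reward model and optimizes the policy with an algorithm such as PPO):

```python
def rlhf_loop(model, prompts, num_rounds):
    # Hypothetical sketch of the RLHF cycle described above
    for _ in range(num_rounds):
        for prompt in prompts:
            # Step 1: the model generates several candidate responses
            responses = generate_responses(model, prompt, n=4)
            # Step 2: a human expert ranks them by accuracy,
            # relevance, and coherence
            ranking = collect_human_ranking(responses)
            # Step 3: the model updates to prefer higher-ranked
            # responses in the future
            model = update_policy(model, prompt, responses, ranking)
    return model
```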

3.4.3、Recap

RLHF in brief: the model generates responses, a human expert ranks them by quality, and the model updates so its future outputs align with human preferences.

4、Concerns and Considerations

4.1、Data concerns and considerations

  • Data volume and compute power.
  • Data quality.
  • Labeling.
  • Bias.
  • Privacy.

4.1.1、Data volume and compute power

  • LLMs need a lot of data.
  • Extensive computing power.
  • Can cost millions of dollars.

4.1.2、Data quality

  • Quality data is essential.

4.1.3、Labeled data

  • Models require correctly labeled data.
  • Labeling is labor-intensive.
  • Incorrect labels impact model performance.
  • Address errors: identify → analyze → iterate.

4.1.4、Data bias

  • Influenced by societal stereotypes.
  • Lack of diversity in training data.
  • Discrimination and unfair outcomes.

How to spot and deal with biased data:

  • Evaluate data imbalances.
  • Promote diversity.
  • Bias mitigation techniques: more diverse examples.

4.1.5、Data privacy

  • Compliance with data protection and privacy regulations.
  • Sensitive or personally identifiable information (PII).
  • Privacy is a concern.
  • Obtain consent before using personal data.

4.2、Ethical and environmental concerns

4.2.1、Ethical concerns

  • Transparency risk - Challenging to understand the output.
  • Accountability risk - Responsibility for LLMs' actions.
  • Information hazards - Disseminating harmful information.

4.2.2、Environmental concerns

  • Ecological footprint of LLMs.
  • Substantial energy resources to train.
  • Impact through carbon emissions.

4.3、Where are LLMs heading?

  • Model explainability.
  • Efficiency.
  • Unsupervised bias handling.
  • Enhanced creativity.
