
[LLM] Fine-tuning Qwen

Download the qwen1_5 source code from GitHub

Modify finetune.sh

Then, under qwen1_5/examples/sft, edit finetune.sh so that it reads as follows:

#!/bin/bash
export CUDA_DEVICE_MAX_CONNECTIONS=1
DIR=`pwd`

# Guide:
# This script supports distributed training on multi-gpu workers (as well as single-worker training).
# Please set the options below according to the comments.
# For multi-gpu workers training, these options should be manually set for each worker.
# After setting the options, please run the script on each worker.

# Number of GPUs per GPU worker
GPUS_PER_NODE=$(python -c 'import torch; print(torch.cuda.device_count())')

# Number of GPU workers, for single-worker training, please set to 1
NNODES=${NNODES:-1}

# The rank of this worker, should be in {0, ..., WORKER_CNT-1}, for single-worker training, please set to 0
NODE_RANK=${NODE_RANK:-0}

# The ip address of the rank-0 worker, for single-worker training, please set to localhost
MASTER_ADDR=${MASTER_ADDR:-localhost}

# The port for communication
MASTER_PORT=${MASTER_PORT:-6010}

MODEL="Qwen/Qwen1.5-7B" # Set the path if you do not want to load from huggingface directly
# ATTENTION: specify the path to your training data, which should be a json file consisting of a list of conversations.
# See the section for finetuning in README for more information.
DATA="path_to_data"
DS_CONFIG_PATH="finetune/ds_config_zero3.json"
USE_LORA=False
Q_LORA=False

function usage() {
    echo '
Usage: bash finetune/finetune_lora_ds.sh [-m MODEL_PATH] [-d DATA_PATH] [--deepspeed DS_CONFIG_PATH] [--use_lora USE_LORA] [--q_lora Q_LORA]
'
}

while [[ "$1" != "" ]]; do
    case $1 in
        -m | --model )
            shift
            MODEL=$1
            ;;
        -d | --data )
            shift
            DATA=$1
            ;;
        --deepspeed )
            shift
            DS_CONFIG_PATH=$1
            ;;
        --use_lora  )
            shift
            USE_LORA=$1
            ;;
        --q_lora    )
            shift
            Q_LORA=$1
            ;;
        -h | --help )
            usage
            exit 0
            ;;
        * )
            echo "Unknown argument ${1}"
            exit 1
            ;;
    esac
    shift
done

DISTRIBUTED_ARGS="
    --nproc_per_node $GPUS_PER_NODE \
    --nnodes $NNODES \
    --node_rank $NODE_RANK \
    --master_addr $MASTER_ADDR \
    --master_port $MASTER_PORT
"

torchrun $DISTRIBUTED_ARGS finetune.py \
    --model_name_or_path $MODEL \
    --data_path $DATA \
    --bf16 True \
    --output_dir output_qwen \
    --num_train_epochs 5 \
    --per_device_train_batch_size 2 \
    --per_device_eval_batch_size 1 \
    --gradient_accumulation_steps 8 \
    --evaluation_strategy "no" \
    --save_strategy "steps" \
    --save_steps 10 \
    --save_total_limit 10 \
    --learning_rate 3e-4 \
    --weight_decay 0.01 \
    --adam_beta2 0.95 \
    --warmup_ratio 0.01 \
    --lr_scheduler_type "cosine" \
    --logging_steps 1 \
    --report_to "none" \
    --model_max_length 512 \
    --lazy_preprocess True \
    --use_lora ${USE_LORA} \
    --q_lora ${Q_LORA} \
    --gradient_checkpointing \
    --deepspeed ${DS_CONFIG_PATH}

Training

(Run finetune.sh from a bash shell opened in qwen1_5/examples/sft; do not run it inside Jupyter.)

# Run these from the command line
pip install transformers==4.37.0
# To train on a single GPU instead of all of them, first run: export CUDA_VISIBLE_DEVICES=0
bash finetune.sh -m "/opt/app-root/src/Qwen1.5-14B-Chat" -d "./data/traindata.jsonl" --deepspeed "ds_config_zero3.json" --use_lora True
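
As a rough aid for reading the training flags, the sketch below works out how many samples each optimizer step consumes and roughly how many checkpoints end up under output_qwen. The constants are copied from the torchrun arguments in finetune.sh; the function names, the dataset size of 1000, and the GPU count of 4 are hypothetical, and the step estimate ignores the Trainer's exact rounding.

```python
import math

# Values copied from the torchrun arguments in finetune.sh.
PER_DEVICE_TRAIN_BATCH_SIZE = 2
GRADIENT_ACCUMULATION_STEPS = 8
NUM_TRAIN_EPOCHS = 5
SAVE_STEPS = 10
SAVE_TOTAL_LIMIT = 10


def effective_batch_size(num_gpus: int) -> int:
    """Samples consumed per optimizer step across all GPUs."""
    return PER_DEVICE_TRAIN_BATCH_SIZE * GRADIENT_ACCUMULATION_STEPS * num_gpus


def estimate_optimizer_steps(num_examples: int, num_gpus: int) -> int:
    """Rough estimate of total optimizer steps (the Trainer may round differently)."""
    total_samples = num_examples * NUM_TRAIN_EPOCHS
    return math.ceil(total_samples / effective_batch_size(num_gpus))


def checkpoints_kept(total_steps: int) -> int:
    """Checkpoints left on disk: one every SAVE_STEPS, capped by SAVE_TOTAL_LIMIT."""
    return min(total_steps // SAVE_STEPS, SAVE_TOTAL_LIMIT)


if __name__ == "__main__":
    # Hypothetical setup: 1000 training examples on 4 GPUs.
    steps = estimate_optimizer_steps(num_examples=1000, num_gpus=4)
    print(effective_batch_size(num_gpus=4))  # 64
    print(steps)                             # 79
    print(checkpoints_kept(steps))           # 7
```

With a single GPU the effective batch size drops to 16, so the same dataset takes about four times as many optimizer steps.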

Inference

(Create inference.py under qwen1_5/examples/sft.)

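# Run in the shell first. Qwen1.5 and apply_chat_template need a newer transformers than 4.33.0, so use the same version as for training.
pip install transformers==4.37.0
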
import os

# Restrict the process to the first GPU (set before CUDA is initialized)
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda"  # the device to load the model onto
path = "output_qwen/checkpoint-70"

model = AutoModelForCausalLM.from_pretrained(
    path,
    torch_dtype="auto",
    device_map="cuda:0",
)
tokenizer = AutoTokenizer.from_pretrained(path)


def predict_answer(messages):
    # Render the conversation with the model's chat template
    text = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True,
    )
    model_inputs = tokenizer([text], return_tensors="pt").to(device)
    generated_ids = model.generate(
        model_inputs.input_ids,
        max_new_tokens=512,
    )
    # Drop the prompt tokens so that only the newly generated tokens remain
    generated_ids = [
        output_ids[len(input_ids):]
        for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
    ]
    response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
    return response


text = "xxxx"  # placeholder for the chapter content
# The prompt asks the model to split a bid-document chapter into sections,
# one per information point, and to produce an outline of the chapter.
messages = [{"role": "user", "content": "我需要起草投标文件中的一个章节,章节内容为:\n\n\n{}\n\n\n\n请将章节内容拆分成多个小节,每个小节覆盖一个信息点,形成一份本章节的提纲。注意,要覆盖所有信息点,不要使用‘同上、略’等省略表述,尽可能保持原文的措词。".format(text)}]
response = predict_answer(messages)
print(response)
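
For a decoder-only model, generate returns the prompt tokens followed by the new tokens, which is why predict_answer slices each output by the length of its input. The sketch below isolates that slicing step on plain Python lists; the helper name strip_prompt_tokens and the token ids are made up for illustration.

```python
from typing import List


def strip_prompt_tokens(
    input_ids: List[List[int]], generated_ids: List[List[int]]
) -> List[List[int]]:
    """Remove the leading prompt tokens from each generated sequence."""
    return [
        output[len(prompt):]
        for prompt, output in zip(input_ids, generated_ids)
    ]


if __name__ == "__main__":
    # Made-up token ids: the prompt is 4 tokens, the model appended 3 more.
    prompt_batch = [[101, 7, 8, 9]]
    output_batch = [[101, 7, 8, 9, 42, 43, 2]]
    print(strip_prompt_tokens(prompt_batch, output_batch))  # [[42, 43, 2]]
```

Without this step, decoding the full output would repeat the whole prompt in front of the answer.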

Training data format

The data is in JSONL format, one JSON object per line, stored under qwen1_5/examples/sft/data; you can name the file traindata.jsonl, for example.

{"type": "chatml", "messages": [{"role": "user", "content": "PROMPT"}, {"role": "assistant", "content": "ANSWER"}], "source": "self-made"}
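
A minimal sketch of producing and checking such a file is shown below. The helper names make_record, write_jsonl, and validate_jsonl are hypothetical, and the validation only checks the fields visible in the sample line above (a messages list whose entries have a role and a string content); the actual finetune.py may accept or require more.

```python
import json

ALLOWED_ROLES = {"system", "user", "assistant"}  # assumption: standard chat roles


def make_record(prompt: str, answer: str, source: str = "self-made") -> dict:
    """Build one single-turn training record in the format shown above."""
    return {
        "type": "chatml",
        "messages": [
            {"role": "user", "content": prompt},
            {"role": "assistant", "content": answer},
        ],
        "source": source,
    }


def write_jsonl(records, path: str) -> None:
    """Write one JSON object per line, keeping non-ASCII text readable."""
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")


def validate_jsonl(path: str) -> int:
    """Return the number of valid records; raise ValueError on a malformed line."""
    count = 0
    with open(path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            if not line.strip():
                continue  # tolerate blank lines
            record = json.loads(line)
            messages = record.get("messages")
            if not isinstance(messages, list) or not messages:
                raise ValueError(f"line {line_no}: missing or empty 'messages'")
            for message in messages:
                if message.get("role") not in ALLOWED_ROLES:
                    raise ValueError(f"line {line_no}: unexpected role")
                if not isinstance(message.get("content"), str):
                    raise ValueError(f"line {line_no}: 'content' must be a string")
            count += 1
    return count
```

For example, write_jsonl([make_record("PROMPT", "ANSWER")], "data/traindata.jsonl") followed by validate_jsonl("data/traindata.jsonl") would confirm that the file parses before a training run is started.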
