Rockchip (EASY EAI) RV1126B AI Model Conversion
1. AI Model Conversion

This chapter explains how to convert a Hugging Face-format large language model (LLM) into an RKLLM model. Currently supported models include DeepSeek, LLaMA, Qwen, Qwen2, Phi-2, Phi-3, ChatGLM3, Gemma, InternLM2, MiniCPM, and others. This chapter uses DeepSeek-R1 as the example and walks through converting it into an RKLLM model.

1.1 Model Download

This section provides two kinds of model files: the original Hugging Face model and the converted NPU model.

Download link: https://pan.baidu.com/s/1u05E5qZcilbxCWMW0Dl6ag?pwd=1234 (extraction code: 1234)

1.2 Model Conversion

After the download completes, place the model and the conversion script in the same directory. Then, in the RKLLM-Toolkit environment, run the conversion script (for example, `python3 test.py`) to perform the model conversion.

Once the script finishes, the conversion is complete and the NPU model file deepseek_r1_rv1126b_w4a16.rkllm has been generated.

The test.py conversion script, used to convert the DeepSeek-R1-Distill-Qwen-1.5B model, is shown below:

```python
from rkllm.api import RKLLM
from datasets import load_dataset
from transformers import AutoTokenizer
from tqdm import tqdm
import torch
from torch import nn
import os
# os.environ['CUDA_VISIBLE_DEVICES'] = '1'

modelpath = '/home/developer/RKLLM-Toolkit/DeepSeek-R1-Distill-Qwen-1.5B'
llm = RKLLM()

# Load model
# Use 'export CUDA_VISIBLE_DEVICES=2' to specify the GPU device
# options ['cpu', 'cuda']
ret = llm.load_huggingface(model=modelpath, model_lora=None, device='cpu')
# ret = llm.load_gguf(model=modelpath)
if ret != 0:
    print('Load model failed!')
    exit(ret)

# Build model
dataset = './data_quant.json'
# JSON file format; note that the prompt must be added to the "input", like this:
# [{"input": "Human: 你好\nAssistant: ", "target": "你好,我是人工智能助手KK"}, ...]
qparams = None
# qparams = 'gdq.qparams'  # Use extra_qparams
ret = llm.build(do_quantization=True, optimization_level=1, quantized_dtype='w4a16',
                quantized_algorithm='normal', target_platform='rv1126b',
                num_npu_core=1, extra_qparams=qparams, dataset=None)
if ret != 0:
    print('Build model failed!')
    exit(ret)

# Chat with model
messages = "<|im_start|>system You are a helpful assistant.<|im_end|><|im_start|>user你好\n<|im_end|><|im_start|>assistant"
kwargs = {"max_length": 128, "top_k": 1, "top_p": 0.8, "temperature": 0.8,
          "do_sample": True, "repetition_penalty": 1.1}
# print(llm.chat_model(messages, kwargs))

# Export rkllm model
ret = llm.export_rkllm('./deepseek_r1_rv1126b_w4a16.rkllm')
if ret != 0:
    print('Export model failed!')
    exit(ret)
```
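A couple of optional variations on test.py may be worth noting. First, the script loads the model on the CPU (`device='cpu'`), while its own comments list `'cuda'` as an alternative device. A minimal sketch of loading on a CUDA GPU instead (the GPU index used here is just an illustrative assumption):

```python
# Sketch: load the Hugging Face model on a CUDA GPU instead of the CPU.
# 'cuda' is listed as a device option in the comments of test.py; the GPU
# index selected below is only an example for illustration.
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'  # select which GPU is visible

ret = llm.load_huggingface(model=modelpath, model_lora=None, device='cuda')
if ret != 0:
    print('Load model failed!')
    exit(ret)
```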
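Second, test.py defines `dataset = './data_quant.json'` but then passes `dataset=None` to `llm.build()`, so the w4a16 quantization runs without a calibration set. If you want to calibrate the quantization on your own prompts, prepare data_quant.json in the format shown in the script's comment and pass the file to the build call. A minimal sketch, assuming data_quant.json sits in the same directory as test.py:

```python
# Sketch: build with the quantization calibration dataset that test.py already
# references. data_quant.json is a JSON list of {"input", "target"} pairs, e.g.
# [{"input": "Human: 你好\nAssistant: ", "target": "你好,我是人工智能助手KK"}, ...]
dataset = './data_quant.json'
ret = llm.build(do_quantization=True, optimization_level=1, quantized_dtype='w4a16',
                quantized_algorithm='normal', target_platform='rv1126b',
                num_npu_core=1, extra_qparams=None, dataset=dataset)
if ret != 0:
    print('Build model failed!')
    exit(ret)
```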