Fine-Tuning the DeepSeek-R1 Model

2025-04-15T13:17:07+08:00 | 9 min read | Updated 2025-04-15T13:17:07+08:00

Macro Zhao



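DeepSeek is shaking up the AI field, challenging OpenAI's dominance with a series of advanced reasoning models. The best part? The models are free to use without restrictions, which makes them accessible to everyone. In this tutorial, we will fine-tune the DeepSeek-R1-Distill-Llama-8B model on the Medical Chain-of-Thought dataset from Hugging Face. This distilled DeepSeek-R1 model was created by fine-tuning Llama 3.1 8B on data generated by DeepSeek-R1, and it shows reasoning capabilities similar to those of the original model.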

Introduction to DeepSeek R1

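The Chinese AI company DeepSeek AI has open-sourced its first-generation reasoning models, DeepSeek-R1 and DeepSeek-R1-Zero, whose performance on reasoning tasks such as math, coding, and logic is comparable to OpenAI's o1.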

DeepSeek-R1-Zero

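DeepSeek-R1-Zero is the first open-source model trained entirely with large-scale reinforcement learning (RL), rather than using supervised fine-tuning (SFT) as an initial step. This approach lets the model explore chain-of-thought (CoT) reasoning on its own, solve complex problems, and iteratively refine its outputs. However, it also brings challenges, such as repetitive reasoning steps, poor readability, and language mixing, all of which can hurt clarity and usability.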

DeepSeek-R1

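DeepSeek-R1 was introduced to overcome the limitations of DeepSeek-R1-Zero. It incorporates cold-start data before reinforcement learning, which gives the model a strong foundation for both reasoning and non-reasoning tasks. This multi-stage training lets the model reach state-of-the-art performance, on par with OpenAI-o1 on math, code, and reasoning benchmarks, while improving the readability and coherence of its outputs.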

DeepSeek Distillation

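Alongside the large models, which need substantial compute and memory to run, DeepSeek has also released distilled models. These smaller, more efficient models still achieve impressive reasoning performance. They range from 1.5B to 70B parameters and retain strong reasoning capabilities; DeepSeek-R1-Distill-Qwen-32B outperforms OpenAI-o1-mini on several benchmarks. The smaller models inherit the reasoning patterns of the larger model, which demonstrates the effectiveness of the distillation process.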

Fine-Tuning DeepSeek R1: A Step-by-Step Guide

To fine-tune the DeepSeek R1 model, follow the steps below:

1. Setup

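For this project we use Kaggle as our cloud IDE, because it offers free access to GPUs that are often more powerful than those available in Google Colab. To get started, launch a new Kaggle notebook and add your Hugging Face token and your Weights & Biases token as secrets. You can add secrets by opening the Add-ons tab in the Kaggle notebook interface and selecting the Secrets option. Once the secrets are set, install the unsloth Python package. Unsloth is an open-source framework that aims to make fine-tuning large language models (LLMs) 2x faster and 2x more memory-efficient.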

!pip install unsloth
!pip install --force-reinstall --no-cache-dir --no-deps git+https://github.com/unslothai/unsloth.git

Log in to the Hugging Face CLI with the Hugging Face token that we retrieve securely from Kaggle Secrets.

from huggingface_hub import login
from kaggle_secrets import UserSecretsClient
user_secrets = UserSecretsClient()
hf_token = user_secrets.get_secret("HUGGINGFACE_TOKEN")
login(hf_token)
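
The snippet above only works inside a Kaggle notebook, because the `kaggle_secrets` module is not available elsewhere. As a minimal sketch, assuming you export the token as an environment variable when running outside Kaggle, a small helper can try Kaggle Secrets first and fall back to the environment. The helper name `get_secret_or_env` is hypothetical and not part of any library.

```python
import os


def get_secret_or_env(name: str) -> str:
    """Return a secret from Kaggle Secrets, or from the environment as a fallback.

    Hypothetical helper for illustration; only UserSecretsClient.get_secret
    comes from the tutorial itself.
    """
    try:
        # Importable only inside Kaggle notebooks.
        from kaggle_secrets import UserSecretsClient

        return UserSecretsClient().get_secret(name)
    except Exception:
        # Outside Kaggle (or if the secret is not attached to the notebook),
        # read an environment variable with the same name instead.
        value = os.environ.get(name)
        if not value:
            raise RuntimeError(
                f"Secret {name!r} not found in Kaggle Secrets or the environment"
            )
        return value
```

With this in place, `hf_token = get_secret_or_env("HUGGINGFACE_TOKEN")` could stand in for the two Kaggle-specific lines, and the same notebook would also run on a local machine.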

Log in to Weights & Biases (wandb) with your API key and create a new project to track the experiment and the fine-tuning progress.

import wandb
wb_token = user_secrets.get_secret("wandb")
wandb.login(key=wb_token)
run = wandb.init(
    project='Fine-tune-DeepSeek-R1-Distill-Llama-8B on Medical COT Dataset', 
    job_type="training", 
    anonymous="allow"
)

2. Loading the Model and Tokenizer

For this project we load the Unsloth version of DeepSeek-R1-Distill-Llama-8B. We also load the model with 4-bit quantization to reduce memory usage and improve performance.

from unsloth import FastLanguageModel
max_seq_length = 2048 
dtype = None 
load_in_4bit = True
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/DeepSeek-R1-Distill-Llama-8B",
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
    token = hf_token, 
)
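
To see why 4-bit loading matters on a free GPU, here is a back-of-the-envelope estimate of the memory needed just to hold the weights. It is a rough sketch that assumes about 8 billion parameters and ignores quantization metadata, activations, the KV cache, and optimizer state, so real usage will be higher.

```python
def weight_memory_gib(n_params: float, bits_per_param: float) -> float:
    """Approximate memory needed to hold the weights alone, in GiB."""
    return n_params * bits_per_param / 8 / 1024**3


# Assumption: roughly 8 billion parameters for an "8B" model.
N_PARAMS = 8e9

fp16_gib = weight_memory_gib(N_PARAMS, 16)
int4_gib = weight_memory_gib(N_PARAMS, 4)

print(f"16-bit weights: ~{fp16_gib:.1f} GiB")
print(f" 4-bit weights: ~{int4_gib:.1f} GiB")
```

Under these assumptions the weights shrink from roughly 14.9 GiB to roughly 3.7 GiB, which leaves headroom on a 16 GB card for activations and the LoRA adapters.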

3. Model Inference Before Fine-Tuning

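To create a prompt style for the model, we define a system prompt with placeholders for the question and for the generated response. The prompt guides the model to think step by step and give a logical, accurate response.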

prompt_style = """Below is an instruction that describes a task, paired with an input that provides further context. 
Write a response that appropriately completes the request. 
Before answering, think carefully about the question and create a step-by-step chain of thoughts to ensure a logical and accurate response.
### Instruction:
You are a medical expert with advanced knowledge in clinical reasoning, diagnostics, and treatment planning. 
Please answer the following medical question. 
### Question:
{}
### Response:
<think>{}"""

In this example, we insert a medical question into prompt_style, convert the prompt to tokens, and pass the tokens to the model to generate a response.

question = "A 61-year-old woman with a long history of involuntary urine loss during activities like coughing or sneezing but no leakage at night undergoes a gynecological exam and Q-tip test. Based on these findings, what would cystometry most likely reveal about her residual volume and detrusor contractions?"
FastLanguageModel.for_inference(model) 
inputs = tokenizer([prompt_style.format(question, "")], return_tensors="pt").to("cuda")
outputs = model.generate(
    input_ids=inputs.input_ids,
    attention_mask=inputs.attention_mask,
    max_new_tokens=1200,
    use_cache=True,
)
response = tokenizer.batch_decode(outputs)
print(response[0].split("### Response:")[1])
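
The decoded output contains the prompt followed by the model's continuation, which is why the code splits on "### Response:" before printing. The sketch below shows that mechanism in isolation. It uses a shortened stand-in template and a made-up completion, so no model is needed; `demo_prompt_style` and `fake_completion` are illustrative names only.

```python
# Shortened stand-in for the tutorial's prompt_style; same two placeholders.
demo_prompt_style = """### Question:
{}
### Response:
<think>{}"""

question = "What does a Q-tip test assess?"

# At inference time the second placeholder is left empty, so the prompt
# ends with an opening <think> tag that the model continues from.
prompt = demo_prompt_style.format(question, "")

# Pretend the model continued the prompt with this text (made-up output).
fake_completion = "\nIt assesses urethral mobility.\n</think>\nUrethral hypermobility."
decoded = prompt + fake_completion

# Everything after the marker is the model's part of the text.
response = decoded.split("### Response:")[1]
print(response)
```

Because the split keeps only index 1, the instruction and question are dropped and the printed text starts at the `<think>` tag.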

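Even without fine-tuning, the model generates a chain of thought and an explanation before giving its final answer. The reasoning is enclosed in <think></think> tags. So why do we still need fine-tuning? The reasoning is detailed but long-winded, not concise. In addition, the final answer is presented as bullet points, which does not match the structure and style of the dataset we want to fine-tune on.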

<think>
Okay, so I have this medical question to answer. Let me try to break it down. The patient is a 61-year-old woman with a history of involuntary urine loss during activities like coughing or sneezing, but she doesn't leak at night. She's had a gynecological exam and a Q-tip test. I need to figure out what cystometry would show regarding her residual volume and detrusor contractions.
First, I should recall what I know about urinary incontinence. Involuntary urine loss during activities like coughing or sneezing makes me think of stress urinary incontinence. Stress incontinence typically happens when the urethral sphincter isn't strong enough to resist increased abdominal pressure from activities like coughing, laughing, or sneezing. This usually affects women, especially after childbirth when the pelvic muscles and ligaments are weakened.
The Q-tip test is a common diagnostic tool for stress urinary incontinence. The test involves inserting a Q-tip catheter, which is a small balloon catheter, into the urethra. The catheter is connected to a pressure gauge. The patient is asked to cough, and the pressure reading is taken. If the pressure is above normal (like above 100 mmHg), it suggests that the urethral sphincter isn't closing properly, which is a sign of stress incontinence.
So, based on the history and the Q-tip test, the diagnosis is likely stress urinary incontinence. Now, moving on to what cystometry would show. Cystometry, also known as a filling cystometry, is a diagnostic procedure where a catheter is inserted into the bladder, and the bladder is filled with a liquid to measure how much it can hold (residual volume) and how it responds to being filled (like during a cough or sneeze). This helps in assessing the capacity and compliance of the bladder.
In a patient with stress incontinence, the bladder's capacity might be normal, but the sphincter's function is impaired. So, during the cystometry, the residual volume might be within normal limits because the bladder isn't overfilled. However, when the patient is asked to cough or perform a Valsalva maneuver, the detrusor muscle (the smooth muscle layer of the bladder) might not contract effectively, leading to an increase in intra-abdominal pressure, which might cause leakage.
Wait, but detrusor contractions are usually associated with voiding. In stress incontinence, the issue isn't with the detrusor contractions but with the sphincter's inability to prevent leakage. So, during cystometry, the detrusor contractions would be normal because they are part of the normal voiding process. However, the problem is that the sph
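
The output above stops mid-sentence without a closing </think> tag, most likely because generation hit the max_new_tokens limit. When post-processing responses it helps to handle that case explicitly. The following is a minimal sketch; `split_reasoning` is a hypothetical helper, and the end-of-sequence string is the one that appears in this tutorial's outputs.

```python
from typing import Optional, Tuple


def split_reasoning(response: str) -> Tuple[str, Optional[str]]:
    """Split a response into (reasoning, answer).

    Returns answer=None when the closing </think> tag is missing, which
    happens if generation stops before the model finishes thinking.
    """
    text = response.strip()
    if text.startswith("<think>"):
        text = text[len("<think>"):]
    reasoning, sep, answer = text.partition("</think>")
    if not sep:
        return reasoning.strip(), None
    # Drop the end-of-sequence marker if the decoder left it in.
    answer = answer.replace("<｜end▁of▁sentence｜>", "").strip()
    return reasoning.strip(), answer


complete = "<think>\nshort reasoning\n</think>\nfinal answer<｜end▁of▁sentence｜>"
truncated = "<think>\nreasoning that was cut off mid"

print(split_reasoning(complete))
print(split_reasoning(truncated))
```

A `None` answer is a signal to raise max_new_tokens or to treat the sample as incomplete rather than to show the partial reasoning as a result.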

4. Loading and Processing the Dataset

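We slightly change the prompt style used to process the dataset by adding a third placeholder for the Complex_CoT (chain-of-thought) column.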

train_prompt_style = """Below is an instruction that describes a task, paired with an input that provides further context. 
Write a response that appropriately completes the request. 
Before answering, think carefully about the question and create a step-by-step chain of thoughts to ensure a logical and accurate response.
### Instruction:
You are a medical expert with advanced knowledge in clinical reasoning, diagnostics, and treatment planning. 
Please answer the following medical question. 
### Question:
{}
### Response:
<think>
{}
</think>
{}"""

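Write a Python function that creates a "text" column in the dataset containing the filled-in training prompt. The placeholders are filled with the question, the chain of thought, and the answer.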

EOS_TOKEN = tokenizer.eos_token  # must be appended so the model learns where a response ends
def formatting_prompts_func(examples):
    inputs = examples["Question"]
    cots = examples["Complex_CoT"]
    outputs = examples["Response"]
    texts = []
    for input, cot, output in zip(inputs, cots, outputs):
        text = train_prompt_style.format(input, cot, output) + EOS_TOKEN
        texts.append(text)
    return {
        "text": texts,
    }
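
To make the batched behaviour concrete, here is a small self-contained sketch of the same function applied to a hand-written batch. It assumes a shortened template and a stand-in EOS string so that it runs without a tokenizer or the real dataset; `demo_train_prompt_style` and the sample values are made up for illustration.

```python
# Stand-ins so the sketch runs without a tokenizer or the real dataset.
EOS_TOKEN = "<eos>"  # assumption: the real value is tokenizer.eos_token
demo_train_prompt_style = """### Question:
{}
### Response:
<think>
{}
</think>
{}"""


def formatting_prompts_func(examples):
    """Build one training string per row from three parallel columns."""
    texts = []
    for question, cot, answer in zip(
        examples["Question"], examples["Complex_CoT"], examples["Response"]
    ):
        texts.append(demo_train_prompt_style.format(question, cot, answer) + EOS_TOKEN)
    return {"text": texts}


# With batched=True, Dataset.map passes a dict of column name -> list of values.
batch = {
    "Question": ["Q1?", "Q2?"],
    "Complex_CoT": ["reasoning 1", "reasoning 2"],
    "Response": ["answer 1", "answer 2"],
}
result = formatting_prompts_func(batch)
print(result["text"][0])
```

The function returns one string per input row, which is what `Dataset.map` needs in order to add the new column.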

We load the first 500 samples of the FreedomIntelligence/medical-o1-reasoning-SFT dataset from the Hugging Face Hub. We then use formatting_prompts_func to map the dataset and create the text column.

from datasets import load_dataset
dataset = load_dataset("FreedomIntelligence/medical-o1-reasoning-SFT","en", split = "train[0:500]",trust_remote_code=True)
dataset = dataset.map(formatting_prompts_func, batched = True,)
dataset["text"][0]

As we can see, the text column contains the system prompt, the instruction, the chain of thought, and the answer.

"Below is an instruction that describes a task, paired with an input that provides further context. \nWrite a response that appropriately completes the request. \nBefore answering, think carefully about the question and create a step-by-step chain of thoughts to ensure a logical and accurate response.\n\n### Instruction:\nYou are a medical expert with advanced knowledge in clinical reasoning, diagnostics, and treatment planning. \nPlease answer the following medical question. \n\n### Question:\nA 61-year-old woman with a long history of involuntary urine loss during activities like coughing or sneezing but no leakage at night undergoes a gynecological exam and Q-tip test. Based on these findings, what would cystometry most likely reveal about her residual volume and detrusor contractions?\n\n### Response:\n<think>\nOkay, let's think about this step by step. There's a 61-year-old woman here who's been dealing with involuntary urine leakages whenever she's doing something that ups her abdominal pressure like coughing or sneezing. This sounds a lot like stress urinary incontinence to me. Now, it's interesting that she doesn't have any issues at night; she isn't experiencing leakage while sleeping. This likely means her bladder's ability to hold urine is fine when she isn't under physical stress. Hmm, that's a clue that we're dealing with something related to pressure rather than a bladder muscle problem. \n\nThe fact that she underwent a Q-tip test is intriguing too. This test is usually done to assess urethral mobility. In stress incontinence, a Q-tip might move significantly, showing urethral hypermobility. This kind of movement often means there's a weakness in the support structures that should help keep the urethra closed during increases in abdominal pressure. So, that's aligning well with stress incontinence.\n\nNow, let's think about what would happen during cystometry. 
Since stress incontinence isn't usually about sudden bladder contractions, I wouldn't expect to see involuntary detrusor contractions during this test. Her bladder isn't spasming or anything; it's more about the support structure failing under stress. Plus, she likely empties her bladder completely because stress incontinence doesn't typically involve incomplete emptying. So, her residual volume should be pretty normal. \n\nAll in all, it seems like if they do a cystometry on her, it will likely show a normal post-void residual volume, as stress incontinence typically does not involve issues with bladder emptying. Additionally, since stress urinary incontinence is primarily related to physical exertion and not an overactive bladder, you would not expect to see any involuntary detrusor contractions during the test.\n</think>\nCystometry in this case of stress urinary incontinence would most likely reveal a normal post-void residual volume, as stress incontinence typically does not involve issues with bladder emptying. Additionally, since stress urinary incontinence is primarily related to physical exertion and not an overactive bladder, you would not expect to see any involuntary detrusor contractions during the test.<｜end▁of▁sentence｜>"

5. Setting Up the Model

We set up the model for training by adding low-rank adapters (LoRA) to the target modules listed below.

model = FastLanguageModel.get_peft_model(
    model,
    r=16,  
    target_modules=[
        "q_proj",
        "k_proj",
        "v_proj",
        "o_proj",
        "gate_proj",
        "up_proj",
        "down_proj",
    ],
    lora_alpha=16,
    lora_dropout=0,  
    bias="none",  
    use_gradient_checkpointing="unsloth",  # True or "unsloth" for very long context
    random_state=3407,
    use_rslora=False,  
    loftq_config=None,
)
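
It can be useful to know how many parameters this configuration actually trains. The sketch below counts them by hand. It assumes the layer shapes of Llama 3.1 8B (hidden size 4096, MLP size 14336, 32 decoder layers, key/value projection width 1024), which are not stated in this tutorial, so treat the result as an estimate to compare against what the trainer reports.

```python
# Assumed Llama 3.1 8B shapes: hidden size 4096, MLP size 14336,
# 32 decoder layers, and 8 key/value heads of dimension 128 (k/v width 1024).
HIDDEN, MLP, KV, LAYERS = 4096, 14336, 1024, 32
R, ALPHA = 16, 16  # r and lora_alpha from the configuration above

# (in_features, out_features) of each targeted linear layer.
shapes = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV),
    "v_proj": (HIDDEN, KV),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, MLP),
    "up_proj": (HIDDEN, MLP),
    "down_proj": (MLP, HIDDEN),
}

# LoRA adds two matrices per layer, A (r x in) and B (out x r),
# which is r * (in + out) extra parameters.
per_layer = sum(R * (fin + fout) for fin, fout in shapes.values())
trainable = per_layer * LAYERS

# With use_rslora=False the low-rank update is scaled by lora_alpha / r.
scaling = ALPHA / R

print(f"trainable LoRA parameters: {trainable:,}")
print(f"scaling factor: {scaling}")
```

Under these assumptions the adapters add about 42 million trainable parameters, a small fraction of the 8 billion frozen base weights, and lora_alpha = r gives a scaling factor of 1.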

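Next, we set up the training arguments and the trainer, passing in the model, the tokenizer, the dataset, and the other training parameters that control the fine-tuning process.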

from trl import SFTTrainer
from transformers import TrainingArguments
from unsloth import is_bfloat16_supported
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=max_seq_length,
    dataset_num_proc=2,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        # Use num_train_epochs = 1 and warmup_ratio for a full training run.
        warmup_steps=5,
        max_steps=60,
        learning_rate=2e-4,
        fp16=not is_bfloat16_supported(),
        bf16=is_bfloat16_supported(),
        logging_steps=10,
        optim="adamw_8bit",
        weight_decay=0.01,
        lr_scheduler_type="linear",
        seed=3407,
        output_dir="outputs",
    ),
)
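
The batch size, accumulation, step count, and scheduler settings interact, so it is worth working out what they imply. The sketch below computes the effective batch size and the share of the 500 samples seen in 60 steps, and models the learning rate with a plain linear warmup followed by linear decay to zero. It assumes a single GPU, and the schedule function is a simplified illustration of a linear scheduler with warmup, not the trainer's own code.

```python
def linear_schedule_lr(step: int, base_lr: float, warmup_steps: int, total_steps: int) -> float:
    """Learning rate after `step` updates: linear warmup, then linear decay to 0."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = (total_steps - step) / max(1, total_steps - warmup_steps)
    return base_lr * max(0.0, remaining)


PER_DEVICE_BATCH, GRAD_ACCUM, MAX_STEPS = 2, 4, 60
N_DEVICES = 1  # assumption: a single GPU is used
DATASET_SIZE = 500

# Gradients from 4 micro-batches of 2 are accumulated before each update.
effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM * N_DEVICES
samples_seen = effective_batch * MAX_STEPS

print(f"effective batch size: {effective_batch}")
print(f"samples seen: {samples_seen} ({samples_seen / DATASET_SIZE:.2f} epochs)")

for step in (0, 5, 30, 60):
    lr = linear_schedule_lr(step, 2e-4, warmup_steps=5, total_steps=MAX_STEPS)
    print(f"step {step:2d}: lr = {lr:.2e}")
```

With these settings each optimizer step uses 8 samples, so 60 steps cover 480 samples, slightly less than one pass over the 500-sample subset.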

6. Model Training

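Run the following command to start training.

trainer_stats = trainer.train()

The training process took 44 minutes to complete. The training loss decreased gradually, which is a good sign that the model is learning.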

You can view the full model evaluation report on the Weights & Biases dashboard by logging in to the website and opening the project.

7. Model Inference After Fine-Tuning

For comparison, we ask the fine-tuned model the same question as before and see what has changed.

question = "A 61-year-old woman with a long history of involuntary urine loss during activities like coughing or sneezing but no leakage at night undergoes a gynecological exam and Q-tip test. Based on these findings, what would cystometry most likely reveal about her residual volume and detrusor contractions?"
FastLanguageModel.for_inference(model)  # Unsloth has 2x faster inference!
inputs = tokenizer([prompt_style.format(question, "")], return_tensors="pt").to("cuda")
outputs = model.generate(
    input_ids=inputs.input_ids,
    attention_mask=inputs.attention_mask,
    max_new_tokens=1200,
    use_cache=True,
)
response = tokenizer.batch_decode(outputs)
print(response[0].split("### Response:")[1])

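This is much closer to the target style: the chain of thought is direct, and the final answer is a single, straightforward paragraph. In that sense the fine-tuning worked. Note, though, that the content of this particular answer (increased residual volume and increased detrusor contractions) differs from the dataset's reference response to the same question shown in step 4, so this short run changed the output format more reliably than the medical accuracy.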

<think>
Okay, so let's think about this. We have a 61-year-old woman who's been dealing with involuntary urine loss during things like coughing or sneezing, but she's not leaking at night. That suggests she might have some kind of problem with her pelvic floor muscles or maybe her bladder.
Now, she's got a gynecological exam and a Q-tip test. Let's break that down. The Q-tip test is usually used to check for urethral obstruction. If it's positive, that means there's something blocking the urethra, like a urethral stricture or something else.
Given that she's had a positive Q-tip test, it's likely there's a urethral obstruction. That would mean her urethra is narrow, maybe due to a stricture or some kind of narrowing. So, her bladder can't empty properly during activities like coughing because the urethral obstruction is making it hard.
Now, let's think about what happens when her bladder can't empty. If there's a urethral obstruction, the bladder is forced to hold more urine, increasing the residual volume. That's because her bladder doesn't empty completely.
So, her residual volume is probably increased.
Also, if her bladder can't empty properly, she might have increased detrusor contractions. These contractions are usually stronger to push the urine out. So, we expect her detrusor contractions to be increased.
Putting it all together, if she has a urethral obstruction and a positive Q-tip test, we'd expect her cystometry results to show increased residual volume and increased detrusor contractions. That makes sense because of the obstruction and how her bladder is trying to compensate by contracting more.
</think>
Based on the findings of the gynecological exam and the positive Q-tip test, it is most likely that the cystometry would reveal increased residual volume and increased detrusor contractions. The positive Q-tip test indicates urethral obstruction, which would force the bladder to retain more urine, thereby increasing the residual volume. Additionally, the obstruction can lead to increased detrusor contractions as the bladder tries to compensate by contracting more to expel the urine.<｜end▁of▁sentence｜>

8. Saving the Model Locally

Now, let's save the adapter, the full model, and the tokenizer locally so that we can use them in other projects.

new_model_local = "DeepSeek-R1-Medical-COT"
model.save_pretrained(new_model_local) 
tokenizer.save_pretrained(new_model_local)
model.save_pretrained_merged(new_model_local, tokenizer, save_method = "merged_16bit",)

Model and tokenizer saved locally

9. Pushing the Model to the Hugging Face Hub

We also push the adapter, the tokenizer, and the model to the Hugging Face Hub so that the AI community can use the model and integrate it into their own systems.

new_model_online = "kingabzpro/DeepSeek-R1-Medical-COT"
model.push_to_hub(new_model_online)
tokenizer.push_to_hub(new_model_online)
model.push_to_hub_merged(new_model_online, tokenizer, save_method = "merged_16bit")

Model and tokenizer saved on Hugging Face hub.

Converting the Merged Model to GGUF Format

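To use the fine-tuned model locally, we first need to convert it to GGUF format. Why? Because GGUF is the llama.cpp format, and it is accepted by most desktop chatbot applications. Converting the merged model is straightforward: go to the GGUF My Repo Space on Hugging Face, sign in with your Hugging Face account, enter the repository ID of your fine-tuned model (for example, "kingabzpro/llama-3.2-3b-it-Ecommerce-ChatBot"), and press the "Submit" button.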

Hugging Face Spaces: https://huggingface.co/spaces/ggml-org/gguf-my-repo

Shortly afterwards, a quantized version of the model is created in a new Hugging Face repository.

Quantized GGUF model repository on Hugging Face.

Click the "Files" tab and download only the GGUF file.

Downloading the quantized GGUF model from the repository.
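
The download can also be scripted instead of done through the web page. The following is a sketch rather than a tested recipe for this exact repository: it uses `list_repo_files` and `hf_hub_download` from the huggingface_hub package, the helper names are hypothetical, and the file listing in the demonstration is made up so that the filtering step runs offline.

```python
from typing import Iterable, List


def gguf_files(filenames: Iterable[str]) -> List[str]:
    """Keep only the GGUF weight files from a repository file listing."""
    return [name for name in filenames if name.lower().endswith(".gguf")]


def download_gguf(repo_id: str) -> List[str]:
    """Download every GGUF file in `repo_id` and return the local paths.

    Needs network access and the huggingface_hub package; the import is
    deferred so the filtering helper above works without either.
    """
    from huggingface_hub import hf_hub_download, list_repo_files

    return [
        hf_hub_download(repo_id=repo_id, filename=name)
        for name in gguf_files(list_repo_files(repo_id))
    ]


# Offline demonstration with a made-up file listing.
listing = [".gitattributes", "README.md", "model-q4_k_m.gguf"]
print(gguf_files(listing))
```

Calling `download_gguf` with the ID of the repository that GGUF My Repo created would fetch only the quantized weights and skip the README and other metadata files.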

Conclusion

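The AI field is changing rapidly. The open-source community is now taking the lead and challenging the dominance of the proprietary models that have ruled AI for the past three years. Open-source large language models (LLMs) keep getting better, faster, and more efficient, which makes fine-tuning them on modest compute and memory easier than ever. In this tutorial, we explored the DeepSeek R1 reasoning models and learned how to fine-tune a distilled version for medical question answering. Fine-tuning a reasoning model can improve its performance and also opens it up to critical domains such as medicine, emergency services, and healthcare. In response to the release of DeepSeek R1, OpenAI introduced two powerful tools: o3, a more advanced reasoning model, and the Operator AI agent, which is powered by the new Computer-Using Agent (CUA) model and can autonomously navigate websites and perform tasks.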
