Offline mode · Study notes on PHP, Python, front end, Linux, and more

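[TOC]

> [Official offline mode tutorial](https://huggingface.co/docs/transformers/v4.40.2/zh/installation)

## Offline mode

Loading a model by checkpoint name requires a network connection, so in most cases we load models from a local path instead. The Hub page of some models contains many files; we usually only need to download the model's *config.json* and *pytorch\_model.bin*, plus the tokenizer's *tokenizer.json*, *tokenizer\_config.json*, and *vocab.txt*.

## Disabling network checks and update lookups

If you do not want to download on every run, set the environment variables below. They are especially useful on a LAN, where they reduce online checks.

```
# Enable offline mode
export HF_DATASETS_OFFLINE=1
export TRANSFORMERS_OFFLINE=1
```

### Manually obtaining the model and tokenizer for offline use

![](https://img.kancloud.cn/4c/4c/4c4c779ddd2fdd383dea8a41e6f59830_2618x1574.png)

Manually download the model files to a location of your choice.

### Downloading to a chosen location with code

Method 1: **Specify the cache directory when downloading [recommended]**

```
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("wangrongsheng/MiniGPT-4-LLaMA", cache_dir="MiniGPT-4-LLaMA")
model = AutoModel.from_pretrained("wangrongsheng/MiniGPT-4-LLaMA", cache_dir="MiniGPT-4-LLaMA")
```

This sets the cache location to `MiniGPT-4-LLaMA` under the current directory.

Method 2: **Download once, then save to a local path**

```
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Download while online
tokenizer = AutoTokenizer.from_pretrained("bigscience/T0_3B")
model = AutoModelForSeq2SeqLM.from_pretrained("bigscience/T0_3B")

# Save both to a local directory
tokenizer.save_pretrained("./your/path/bigscience_t0")
model.save_pretrained("./your/path/bigscience_t0")
```

After that, the saved files can be used in offline mode:

```
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("./your/path/bigscience_t0")
model = AutoModel.from_pretrained("./your/path/bigscience_t0")
```

Method 3: **Download with huggingface_hub**

Install the package:

```
python -m pip install huggingface_hub
```

Download:

```
from huggingface_hub import hf_hub_download

hf_hub_download(repo_id="bigscience/T0_3B", filename="config.json", cache_dir="./your/path/bigscience_t0")
```

Use:

```
from transformers import AutoConfig

config = AutoConfig.from_pretrained("./your/path/bigscience_t0/config.json")
```

Note that `hf_hub_download` returns the local path of the downloaded file. When `cache_dir` is set, the file is stored in a nested snapshot directory under that folder, so if the path above does not exist, pass the returned path to `AutoConfig.from_pretrained` instead.

Method 4: **Set the cache download path**

```
set HF_HOME=./cache
```

`set` is Windows Command Prompt syntax; on Linux or macOS the equivalent is `export HF_HOME=./cache`.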
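
Before switching to offline mode, it can help to confirm that a local model directory contains the files listed in the first section. The following is a minimal sketch using only the Python standard library. The names `EXPECTED_FILES`, `missing_model_files`, and `enable_offline_mode` are hypothetical helpers introduced here for illustration and are not part of `transformers` or `huggingface_hub`. The file list is the one given in these notes; some checkpoints use other file names, so treat it as an assumption to adjust.

```python
import os
from pathlib import Path

# Files these notes list as usually sufficient for a local load.
# Assumption: some checkpoints use different names (for example
# model.safetensors), so adjust this tuple for the model at hand.
EXPECTED_FILES = (
    "config.json",
    "pytorch_model.bin",
    "tokenizer.json",
    "tokenizer_config.json",
    "vocab.txt",
)


def missing_model_files(model_dir, expected=EXPECTED_FILES):
    """Return the expected file names that are absent from model_dir."""
    root = Path(model_dir)
    return [name for name in expected if not (root / name).is_file()]


def enable_offline_mode():
    """Set the two offline switches for the current process.

    Call this before importing transformers or datasets, since those
    libraries may read the variables when they are first imported.
    """
    os.environ["HF_DATASETS_OFFLINE"] = "1"
    os.environ["TRANSFORMERS_OFFLINE"] = "1"
```

A typical use would be to call `missing_model_files("./your/path/bigscience_t0")`, download whatever it reports as missing, and only then call `enable_offline_mode()` and load the model from the local path.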