%%capture
import os, re
if "COLAB_" not in "".join(os.environ.keys()):
    !pip install unsloth  # Do this in local & cloud setups
else:
    import torch
    v = re.match(r'[\d]{1,}\.[\d]{1,}', str(torch.__version__)).group(0)
    xformers = 'xformers==' + {'2.10':'0.0.34','2.9':'0.0.33.post1','2.8':'0.0.32.post2'}.get(v, "0.0.34")
    !pip install sentencepiece protobuf "datasets==4.3.0" "huggingface_hub>=0.34.0" hf_transfer
    !pip install --no-deps unsloth_zoo bitsandbytes accelerate {xformers} peft trl triton unsloth
    !pip install transformers==5.3.0
    !pip install --no-deps trl==0.22.2
Unsloth
from unsloth import FastVisionModel # FastLanguageModel for LLMs
import torch
# 4bit pre-quantized models we support for 4x faster downloading + no OOMs.
fourbit_models = [
    "unsloth/Llama-3.2-11B-Vision-Instruct-bnb-4bit", # Llama 3.2 vision support
    "unsloth/Llama-3.2-11B-Vision-bnb-4bit",
    "unsloth/Llama-3.2-90B-Vision-Instruct-bnb-4bit", # Can fit in a 80GB card!
    "unsloth/Llama-3.2-90B-Vision-bnb-4bit",

    "unsloth/Pixtral-12B-2409-bnb-4bit",        # Pixtral fits in 16GB!
    "unsloth/Pixtral-12B-Base-2409-bnb-4bit",   # Pixtral base model

    "unsloth/Qwen2-VL-2B-Instruct-bnb-4bit",    # Qwen2 VL support
    "unsloth/Qwen2-VL-7B-Instruct-bnb-4bit",
    "unsloth/Qwen2-VL-72B-Instruct-bnb-4bit",

    "unsloth/llava-v1.6-mistral-7b-hf-bnb-4bit", # Any Llava variant works!
    "unsloth/llava-1.5-7b-hf-bnb-4bit",
] # More models at https://huggingface.co/unsloth
model, tokenizer = FastVisionModel.from_pretrained(
    "unsloth/Qwen3.5-4B",
    load_in_4bit = False, # Use 4bit to reduce memory use. False for 16bit LoRA.
    use_gradient_checkpointing = "unsloth", # True or "unsloth" for long context
)
🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
🦥 Unsloth Zoo will now patch everything to make training faster!
==((====))== Unsloth 2026.3.5: Fast Qwen3_5 patching. Transformers: 5.3.0.
\\ /| Tesla T4. Num GPUs = 1. Max memory: 14.563 GB. Platform: Linux.
O^O/ \_/ \ Torch: 2.10.0+cu128. CUDA: 7.5. CUDA Toolkit: 12.8. Triton: 3.6.0
\ / Bfloat16 = FALSE. FA [Xformers = 0.0.34. FA2 = False]
"-____-" Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!
Unsloth: Using float16 precision for qwen3_5 won't work! Using float32.
Unsloth: QLoRA and full finetuning all not selected. Switching to 16bit LoRA.
The fast path is not available because one of the required library is not installed. Falling back to torch implementation. To install follow https://github.com/fla-org/flash-linear-attention#installation and https://github.com/Dao-AILab/causal-conv1d
Configure and load the LoRA (Low-Rank Adaptation) finetuning parameters for the vision-language model (VLM).
model = FastVisionModel.get_peft_model(
    model,
    finetune_vision_layers     = True, # False if not finetuning vision layers
    finetune_language_layers   = True, # False if not finetuning language layers
    finetune_attention_modules = True, # False if not finetuning attention layers
    finetune_mlp_modules       = True, # False if not finetuning MLP layers

    r = 16,          # The larger, the higher the accuracy, but might overfit
    lora_alpha = 16, # Recommended alpha == r at least
    lora_dropout = 0,
    bias = "none",
    random_state = 3407,
    use_rslora = False,  # We support rank stabilized LoRA
    loftq_config = None, # And LoftQ
    # target_modules = "all-linear", # Optional now! Can specify a list if needed
)
Unsloth: Making `model.base_model.model.model.visual` require gradients
Now let’s train our model. We do 30 steps to speed things up, but you can set num_train_epochs = 1 for a full run and remove max_steps. We also support DPOTrainer and GRPOTrainer for reinforcement learning!
We use our new UnslothVisionDataCollator which will help in our vision finetuning setup.
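The cell that builds `converted_dataset` (used by the trainer below) is not shown in this notebook. As a hedged sketch, each sample is typically wrapped into the chat format the collator expects; `convert_to_conversation` is a hypothetical helper name, and the `"image"`/`"text"` field names assume a LaTeX-OCR style dataset schema:

```python
# Hypothetical helper: wraps one dataset sample into the user/assistant chat
# format that UnslothVisionDataCollator consumes. Field names "image" and
# "text" are assumptions about the dataset schema.
def convert_to_conversation(sample,
                            instruction = "Write the LaTeX representation for this image."):
    return {"messages": [
        {"role": "user", "content": [
            {"type": "text",  "text": instruction},
            {"type": "image", "image": sample["image"]},
        ]},
        {"role": "assistant", "content": [
            {"type": "text", "text": sample["text"]},
        ]},
    ]}

# converted_dataset = [convert_to_conversation(s) for s in dataset]
conv = convert_to_conversation({"image": "<PIL.Image here>", "text": r"\frac{a}{b}"})
```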
from unsloth.trainer import UnslothVisionDataCollator
from trl import SFTTrainer, SFTConfig
FastVisionModel.for_training(model) # Enable for training!
trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    data_collator = UnslothVisionDataCollator(model, tokenizer), # Must use!
    train_dataset = converted_dataset,
    args = SFTConfig(
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4,
        warmup_steps = 5,
        max_steps = 30,
        # num_train_epochs = 1, # Set this instead of max_steps for full training runs
        learning_rate = 2e-4,
        logging_steps = 1,
        optim = "adamw_8bit",
        weight_decay = 0.001,
        lr_scheduler_type = "linear",
        seed = 3407,
        output_dir = "outputs",
        report_to = "none", # For Weights and Biases

        # You MUST put the below items for vision finetuning:
        remove_unused_columns = False,
        dataset_text_field = "",
        dataset_kwargs = {"skip_prepare_dataset": True},
        max_length = 2048,
    ),
)
Unsloth: Model does not have a default image size - using 512
Unsloth: Switching to float32 training since model cannot work with float16
GPU = Tesla T4. Max memory = 14.563 GB.
9.68 GB of memory reserved.
trainer_stats = trainer.train()
The tokenizer has new PAD/BOS/EOS tokens that differ from the model config and generation config. The model config and generation config were aligned accordingly, being updated with the tokenizer's values. Updated tokens: {'eos_token_id': 248046}.
==((====))== Unsloth - 2x faster free finetuning | Num GPUs used = 1
\\ /| Num examples = 68,686 | Num Epochs = 1 | Total steps = 30
O^O/ \_/ \ Batch size per device = 2 | Gradient accumulation steps = 4
\ / Data Parallel GPUs = 1 | Total batch size (2 x 4 x 1) = 8
"-____-" Trainable parameters = 38,756,352 of 4,578,021,888 (0.85% trained)
Unsloth: Will smartly offload gradients to save VRAM!
[30/30 05:33, Epoch 0/1]

| Step | Training Loss |
|------|---------------|
| 1  | 0.699990 |
| 2  | 0.861260 |
| 3  | 0.662023 |
| 4  | 0.451824 |
| 5  | 0.364678 |
| 6  | 0.347653 |
| 7  | 0.242224 |
| 8  | 0.133970 |
| 9  | 0.087778 |
| 10 | 0.092887 |
| 11 | 0.052814 |
| 12 | 0.073216 |
| 13 | 0.065354 |
| 14 | 0.026245 |
| 15 | 0.037974 |
| 16 | 0.035780 |
| 17 | 0.047666 |
| 18 | 0.035650 |
| 19 | 0.038626 |
| 20 | 0.059324 |
| 21 | 0.035856 |
| 22 | 0.037750 |
| 23 | 0.022070 |
| 24 | 0.032624 |
| 25 | 0.077887 |
| 26 | 0.059872 |
| 27 | 0.043020 |
| 28 | 0.183679 |
| 29 | 0.044639 |
| 30 | 0.078759 |
# @title Show final memory and time stats
# start_gpu_memory and max_memory come from the GPU stats snapshot taken before training.
used_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)
used_memory_for_lora = round(used_memory - start_gpu_memory, 3)
used_percentage = round(used_memory / max_memory * 100, 3)
lora_percentage = round(used_memory_for_lora / max_memory * 100, 3)
print(f"{trainer_stats.metrics['train_runtime']} seconds used for training.")
print(f"{round(trainer_stats.metrics['train_runtime']/60, 2)} minutes used for training.")
print(f"Peak reserved memory = {used_memory} GB.")
print(f"Peak reserved memory for training = {used_memory_for_lora} GB.")
print(f"Peak reserved memory % of max memory = {used_percentage} %.")
print(f"Peak reserved memory for training % of max memory = {lora_percentage} %.")
412.4279 seconds used for training.
6.87 minutes used for training.
Peak reserved memory = 9.922 GB.
Peak reserved memory for training = 0.242 GB.
Peak reserved memory % of max memory = 68.132 %.
Peak reserved memory for training % of max memory = 1.662 %.
Inference
Let’s run the model! You can change the instruction and input - leave the output blank!
We use min_p = 0.1 and temperature = 1.5. Read this Tweet for more information on why.
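Why a temperature as high as 1.5 is safe here: min-p filtering drops any token whose probability falls below `min_p` times the top token's probability, so temperature can add diversity without unlocking garbage tokens. A toy sketch of the idea (an illustration of the sampling rule, not Unsloth's or transformers' actual implementation):

```python
# Toy min-p filter: keep tokens with prob >= min_p * max_prob, then renormalize.
def min_p_filter(probs, min_p=0.1):
    threshold = min_p * max(probs.values())
    kept = {tok: p for tok, p in probs.items() if p >= threshold}
    total = sum(kept.values())
    return {tok: p / total for tok, p in kept.items()}

probs = {"a": 0.70, "b": 0.20, "c": 0.06, "d": 0.04}
filtered = min_p_filter(probs, min_p=0.1)
# "c" and "d" fall below 0.1 * 0.70 = 0.07 and are removed
```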
FastVisionModel.for_inference(model) # Enable for inference!
image = dataset[2]["image"]
instruction = "Write the LaTeX representation for this image."
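The generation cell itself is not shown above. As a hedged sketch, the prompt is built as a multimodal chat message; `build_vision_messages` is a hypothetical helper, and the commented lines show the usual Unsloth pattern with the sampling values from the text (temperature = 1.5, min_p = 0.1):

```python
# Hypothetical helper: builds the single-image chat message list that
# tokenizer.apply_chat_template expects.
def build_vision_messages(instruction):
    return [
        {"role": "user", "content": [
            {"type": "image"},
            {"type": "text", "text": instruction},
        ]}
    ]

messages = build_vision_messages("Write the LaTeX representation for this image.")

# With the loaded model/tokenizer and a CUDA device:
# input_text = tokenizer.apply_chat_template(messages, add_generation_prompt=True)
# inputs = tokenizer(image, input_text, add_special_tokens=False, return_tensors="pt").to("cuda")
# _ = model.generate(**inputs, max_new_tokens=128, use_cache=True,
#                    temperature=1.5, min_p=0.1)
```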
import gc

# 1. Explicitly delete the model and the trainer
try:
    del model
    del tokenizer
    del trainer  # If you defined a trainer, it must be deleted too
except NameError:
    pass

# 2. Force garbage collection
gc.collect()

# 3. Empty the CUDA cache
torch.cuda.empty_cache()

# 4. Unsloth-specific: reset peak-memory tracking (can sometimes trigger system-level reclamation)
torch.cuda.reset_peak_memory_stats()
# torch.cuda.reset_accumulated_stats()  # Removed this line as it caused an AttributeError
🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
🦥 Unsloth Zoo will now patch everything to make training faster!
==((====))== Unsloth 2026.3.5: Fast Qwen3_5 patching. Transformers: 5.3.0.
\\ /| Tesla T4. Num GPUs = 1. Max memory: 14.563 GB. Platform: Linux.
O^O/ \_/ \ Torch: 2.10.0+cu128. CUDA: 7.5. CUDA Toolkit: 12.8. Triton: 3.6.0
\ / Bfloat16 = FALSE. FA [Xformers = 0.0.34. FA2 = False]
"-____-" Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!
Unsloth: Using float16 precision for qwen3_5 won't work! Using float32.
The fast path is not available because one of the required library is not installed. Falling back to torch implementation. To install follow https://github.com/fla-org/flash-linear-attention#installation and https://github.com/Dao-AILab/causal-conv1d
The finetuned model has been loaded successfully and set to inference mode.
Saving to float16 for VLLM
We also support saving to float16 directly. Select merged_16bit for float16. Use push_to_hub_merged to upload to your Hugging Face account! You can go to https://huggingface.co/settings/tokens for your personal tokens. See our docs for more deployment options.
# Select ONLY 1 to save! (Both not needed!)
# Save locally to 16bit
if True: model.save_pretrained_merged("Qwen3.5-4B-LaTeX", tokenizer,)
# To export and save to your Hugging Face account
if True: model.push_to_hub_merged("Weidows/Qwen3.5-4B-LaTeX", tokenizer, token = YOUR_HF_TOKEN)
Found HuggingFace hub cache directory: /root/.cache/huggingface/hub
Checking cache directory for required files...
Unsloth: Copying 2 files from cache to `Qwen3.5-4B-LaTeX`: 100%|██████████| 2/2 [02:21<00:00, 70.88s/it]
Successfully copied all 2 files from cache to `Qwen3.5-4B-LaTeX`
Checking cache directory for required files...
Cache check failed: tokenizer.model not found in local cache.
Not all required files found in cache. Will proceed with downloading.
Unsloth: Preparing safetensor model files: 100%|██████████| 2/2 [00:00<00:00, 14438.22it/s]
Unsloth: Merging weights into 16bit: 100%|██████████| 2/2 [02:45<00:00, 82.63s/it]
Unsloth: Merge process complete. Saved to `/content/Qwen3.5-4B-LaTeX`
Found HuggingFace hub cache directory: /root/.cache/huggingface/hub
Checking cache directory for required files...
Unsloth: Copying 2 files from cache to `Weidows/Qwen3.5-4B-LaTeX`: 100%|██████████| 2/2 [01:46<00:00, 53.28s/it]
Successfully copied all 2 files from cache to `Weidows/Qwen3.5-4B-LaTeX`
Checking cache directory for required files...
Cache check failed: tokenizer.model not found in local cache.
Not all required files found in cache. Will proceed with downloading.
Unsloth: Preparing safetensor model files: 100%|██████████| 2/2 [00:00<00:00, 17924.38it/s]
Unsloth: Merging weights into 16bit: 100%|██████████| 2/2 [04:55<00:00, 147.86s/it]
Unsloth: Merge process complete. Saved to `/content/Weidows/Qwen3.5-4B-LaTeX`
GGUF / llama.cpp Conversion
To save to GGUF / llama.cpp, we support it natively now! We clone llama.cpp and default to saving as q8_0; all methods like q4_k_m are allowed. Use save_pretrained_gguf for local saving and push_to_hub_gguf for uploading to HF.
Some supported quant methods (full list on our docs page):
q8_0 - Fast conversion. High resource use, but generally acceptable.
q4_k_m - Recommended. Uses Q6_K for half of the attention.wv and feed_forward.w2 tensors, else Q4_K.
q5_k_m - Recommended. Uses Q6_K for half of the attention.wv and feed_forward.w2 tensors, else Q5_K.
[NEW] To finetune and auto export to Ollama, try our Ollama notebook
# Save to 8bit Q8_0
if True: model.save_pretrained_gguf("Qwen3.5-4B-LaTeX", tokenizer,)
# Remember to go to https://huggingface.co/settings/tokens for a token!
# And change hf to your username!
if True: model.push_to_hub_gguf("Weidows/Qwen3.5-4B-LaTeX", tokenizer, token = YOUR_HF_TOKEN)
# Save to 16bit GGUF
if True: model.save_pretrained_gguf("Qwen3.5-4B-LaTeX", tokenizer, quantization_method = "f16")
if True: model.push_to_hub_gguf("Weidows/Qwen3.5-4B-LaTeX", tokenizer, quantization_method = "f16", token = YOUR_HF_TOKEN)
# Save to q4_k_m GGUF
if True: model.save_pretrained_gguf("Qwen3.5-4B-LaTeX", tokenizer, quantization_method = "q4_k_m")
if True: model.push_to_hub_gguf("Weidows/Qwen3.5-4B-LaTeX", tokenizer, quantization_method = "q4_k_m", token = YOUR_HF_TOKEN)
# Save to multiple GGUF options - much faster if you want multiple!
if True:
    model.push_to_hub_gguf(
        "Weidows/Qwen3.5-4B-LaTeX", # Change hf to your username!
        tokenizer,
        quantization_method = ["q4_k_m", "q8_0", "q5_k_m",],
        token = YOUR_HF_TOKEN,
    )
Unsloth: Merging model weights to 16-bit format...
Found HuggingFace hub cache directory: /root/.cache/huggingface/hub
Checking cache directory for required files...
Unsloth: Copying 2 files from cache to `Qwen3.5-4B-LaTeX`: 100%|██████████| 2/2 [01:45<00:00, 52.66s/it]
Successfully copied all 2 files from cache to `Qwen3.5-4B-LaTeX`
Checking cache directory for required files...
Cache check failed: tokenizer.model not found in local cache.
Not all required files found in cache. Will proceed with downloading.
Unsloth: Preparing safetensor model files: 100%|██████████| 2/2 [00:00<00:00, 14873.42it/s]
Unsloth: Merging weights into 16bit: 100%|██████████| 2/2 [01:58<00:00, 59.14s/it]
Unsloth: Merge process complete. Saved to `/content/Qwen3.5-4B-LaTeX`
Unsloth: Converting to GGUF format...
==((====))== Unsloth: Conversion from HF to GGUF information
\\ /| [0] Installing llama.cpp might take 3 minutes.
O^O/ \_/ \ [1] Converting HF to GGUF f16 might take 3 minutes.
\ / [2] Converting GGUF f16 to ['q8_0'] might take 10 minutes each.
"-____-" In total, you will have to wait at least 16 minutes.
Unsloth: Installing llama.cpp. This might take 3 minutes...
Unsloth: Updating system package directories
Unsloth: Cloning llama.cpp repository...
Unsloth: Building llama.cpp - please wait 1 to 3 minutes
Unsloth: Successfully installed llama.cpp!
Unsloth: Preparing converter script...
WARNING:unsloth_zoo.llama_cpp:Unsloth: Qwen2MoE num_experts patch target not found.
Unsloth: [1] Converting model into f16 GGUF format.
This might take 3 minutes...
Unsloth: Initial conversion completed! Files: ['Qwen3.5-4B-LaTeX_gguf/Qwen3.5-4B.F16.gguf', 'Qwen3.5-4B-LaTeX_gguf/Qwen3.5-4B.F16-mmproj.gguf']
Unsloth: [2] Converting GGUF f16 into q8_0. This might take 10 minutes...
Unsloth: Model files cleanup...
Unsloth: All GGUF conversions completed successfully!
Generated files: ['Qwen3.5-4B-LaTeX_gguf/Qwen3.5-4B.Q8_0.gguf', 'Qwen3.5-4B-LaTeX_gguf/Qwen3.5-4B.F16-mmproj.gguf']
Unsloth: No Ollama template mapping found for model 'unsloth/Qwen3.5-4B'. Skipping Ollama Modelfile
Unsloth: example usage for Multimodal LLMs: /root/.unsloth/llama.cpp/llama-mtmd-cli -m Qwen3.5-4B-LaTeX_gguf/Qwen3.5-4B.Q8_0.gguf --mmproj Qwen3.5-4B-LaTeX_gguf/Qwen3.5-4B.F16-mmproj.gguf
Unsloth: load image inside llama.cpp runner: /image test_image.jpg
Unsloth: Prompt model to describe the image
Unsloth: Converting model to GGUF format...
Unsloth: Merging model weights to 16-bit format...
Found HuggingFace hub cache directory: /root/.cache/huggingface/hub
Checking cache directory for required files...
Unsloth: Copying 2 files from cache to `/tmp/unsloth_gguf_8wilpvla`: 100%|██████████| 2/2 [01:40<00:00, 50.27s/it]
Successfully copied all 2 files from cache to `/tmp/unsloth_gguf_8wilpvla`
Checking cache directory for required files...
Cache check failed: tokenizer.model not found in local cache.
Not all required files found in cache. Will proceed with downloading.
Unsloth: Preparing safetensor model files: 100%|██████████| 2/2 [00:00<00:00, 15563.28it/s]
Unsloth: Merging weights into 16bit: 100%|██████████| 2/2 [01:41<00:00, 50.95s/it]
Unsloth: Merge process complete. Saved to `/tmp/unsloth_gguf_8wilpvla`
Unsloth: Converting to GGUF format...
==((====))== Unsloth: Conversion from HF to GGUF information
\\ /| [0] Installing llama.cpp might take 3 minutes.
O^O/ \_/ \ [1] Converting HF to GGUF f16 might take 3 minutes.
\ / [2] Converting GGUF f16 to ['q8_0'] might take 10 minutes each.
"-____-" In total, you will have to wait at least 16 minutes.
Unsloth: llama.cpp found in the system. Skipping installation.
Unsloth: Preparing converter script...
Unsloth: [1] Converting model into f16 GGUF format.
This might take 3 minutes...
Unsloth: Initial conversion completed! Files: ['/tmp/unsloth_gguf_8wilpvla_gguf/Qwen3.5-4B.F16.gguf', '/tmp/unsloth_gguf_8wilpvla_gguf/Qwen3.5-4B.F16-mmproj.gguf']
Unsloth: [2] Converting GGUF f16 into q8_0. This might take 10 minutes...
Unsloth: Model files cleanup...
Unsloth: All GGUF conversions completed successfully!
Generated files: ['/tmp/unsloth_gguf_8wilpvla_gguf/Qwen3.5-4B.Q8_0.gguf', '/tmp/unsloth_gguf_8wilpvla_gguf/Qwen3.5-4B.F16-mmproj.gguf']
Unsloth: No Ollama template mapping found for model 'unsloth/Qwen3.5-4B'. Skipping Ollama Modelfile
Unsloth: example usage for Multimodal LLMs: /root/.unsloth/llama.cpp/llama-mtmd-cli -m /tmp/unsloth_gguf_8wilpvla_gguf/Qwen3.5-4B.Q8_0.gguf --mmproj /tmp/unsloth_gguf_8wilpvla_gguf/Qwen3.5-4B.F16-mmproj.gguf
Unsloth: load image inside llama.cpp runner: /image test_image.jpg
Unsloth: Prompt model to describe the image
Unsloth: Uploading GGUF to Huggingface Hub...
Uploading Qwen3.5-4B.Q8_0.gguf...
Uploading Qwen3.5-4B.F16-mmproj.gguf...
Uploading config.json...
No files have been modified since last commit. Skipping to prevent empty commit.
Unsloth: Successfully uploaded GGUF to https://huggingface.co/Weidows/Qwen3.5-4B-LaTeX
Unsloth: Cleaning up temporary files...
Unsloth: Merging model weights to 16-bit format...
Found HuggingFace hub cache directory: /root/.cache/huggingface/hub
Checking cache directory for required files...
Unsloth: Copying 2 files from cache to `Qwen3.5-4B-LaTeX`: 100%|██████████| 2/2 [02:07<00:00, 63.56s/it]
Successfully copied all 2 files from cache to `Qwen3.5-4B-LaTeX`
Checking cache directory for required files...
Cache check failed: tokenizer.model not found in local cache.
Not all required files found in cache. Will proceed with downloading.
Unsloth: Preparing safetensor model files: 100%|██████████| 2/2 [00:00<00:00, 17015.43it/s]
Unsloth: Merging weights into 16bit: 100%|██████████| 2/2 [02:59<00:00, 89.98s/it]
Unsloth: Merge process complete. Saved to `/content/Qwen3.5-4B-LaTeX`
Unsloth: Converting to GGUF format...
==((====))== Unsloth: Conversion from HF to GGUF information
\\ /| [0] Installing llama.cpp might take 3 minutes.
O^O/ \_/ \ [1] Converting HF to GGUF f16 might take 3 minutes.
\ / [2] Converting GGUF f16 to ['f16'] might take 10 minutes each.
"-____-" In total, you will have to wait at least 16 minutes.
Unsloth: llama.cpp found in the system. Skipping installation.
Unsloth: Preparing converter script...
Unsloth: [1] Converting model into f16 GGUF format.
This might take 3 minutes...
Unsloth: Initial conversion completed! Files: ['Qwen3.5-4B-LaTeX_gguf/Qwen3.5-4B.F16.gguf', 'Qwen3.5-4B-LaTeX_gguf/Qwen3.5-4B.F16-mmproj.gguf']
Unsloth: Model files cleanup...
Unsloth: All GGUF conversions completed successfully!
Generated files: ['Qwen3.5-4B-LaTeX_gguf/Qwen3.5-4B.F16.gguf', 'Qwen3.5-4B-LaTeX_gguf/Qwen3.5-4B.F16-mmproj.gguf']
Unsloth: No Ollama template mapping found for model 'unsloth/Qwen3.5-4B'. Skipping Ollama Modelfile
Unsloth: example usage for Multimodal LLMs: /root/.unsloth/llama.cpp/llama-mtmd-cli -m Qwen3.5-4B-LaTeX_gguf/Qwen3.5-4B.F16.gguf --mmproj Qwen3.5-4B-LaTeX_gguf/Qwen3.5-4B.F16-mmproj.gguf
Unsloth: load image inside llama.cpp runner: /image test_image.jpg
Unsloth: Prompt model to describe the image
Unsloth: Converting model to GGUF format...
Unsloth: Merging model weights to 16-bit format...
Found HuggingFace hub cache directory: /root/.cache/huggingface/hub
Checking cache directory for required files...
Unsloth: Copying 2 files from cache to `/tmp/unsloth_gguf_cwymog7t`: 100%|██████████| 2/2 [02:10<00:00, 65.00s/it]
Successfully copied all 2 files from cache to `/tmp/unsloth_gguf_cwymog7t`
Checking cache directory for required files...
Cache check failed: tokenizer.model not found in local cache.
Not all required files found in cache. Will proceed with downloading.
Unsloth: Preparing safetensor model files: 100%|██████████| 2/2 [00:00<00:00, 16384.00it/s]
Unsloth: Merging weights into 16bit: 100%|██████████| 2/2 [03:04<00:00, 92.33s/it]
Unsloth: Merge process complete. Saved to `/tmp/unsloth_gguf_cwymog7t`
Unsloth: Converting to GGUF format...
==((====))== Unsloth: Conversion from HF to GGUF information
\\ /| [0] Installing llama.cpp might take 3 minutes.
O^O/ \_/ \ [1] Converting HF to GGUF f16 might take 3 minutes.
\ / [2] Converting GGUF f16 to ['f16'] might take 10 minutes each.
"-____-" In total, you will have to wait at least 16 minutes.
Unsloth: llama.cpp found in the system. Skipping installation.
Unsloth: Preparing converter script...
Unsloth: [1] Converting model into f16 GGUF format.
This might take 3 minutes...
Unsloth: Initial conversion completed! Files: ['/tmp/unsloth_gguf_cwymog7t_gguf/Qwen3.5-4B.F16.gguf', '/tmp/unsloth_gguf_cwymog7t_gguf/Qwen3.5-4B.F16-mmproj.gguf']
Unsloth: Model files cleanup...
Unsloth: All GGUF conversions completed successfully!
Generated files: ['/tmp/unsloth_gguf_cwymog7t_gguf/Qwen3.5-4B.F16.gguf', '/tmp/unsloth_gguf_cwymog7t_gguf/Qwen3.5-4B.F16-mmproj.gguf']
Unsloth: No Ollama template mapping found for model 'unsloth/Qwen3.5-4B'. Skipping Ollama Modelfile
Unsloth: example usage for Multimodal LLMs: /root/.unsloth/llama.cpp/llama-mtmd-cli -m /tmp/unsloth_gguf_cwymog7t_gguf/Qwen3.5-4B.F16.gguf --mmproj /tmp/unsloth_gguf_cwymog7t_gguf/Qwen3.5-4B.F16-mmproj.gguf
Unsloth: load image inside llama.cpp runner: /image test_image.jpg
Unsloth: Prompt model to describe the image
Unsloth: Uploading GGUF to Huggingface Hub...
Uploading Qwen3.5-4B.F16.gguf...
Uploading Qwen3.5-4B.F16-mmproj.gguf...
Uploading config.json...
No files have been modified since last commit. Skipping to prevent empty commit.
Unsloth: Successfully uploaded GGUF to https://huggingface.co/Weidows/Qwen3.5-4B-LaTeX
Unsloth: Cleaning up temporary files...
Unsloth: Merging model weights to 16-bit format...
Found HuggingFace hub cache directory: /root/.cache/huggingface/hub
Checking cache directory for required files...
Unsloth: Copying 2 files from cache to `Qwen3.5-4B-LaTeX`: 100%|██████████| 2/2 [01:37<00:00, 48.61s/it]
Successfully copied all 2 files from cache to `Qwen3.5-4B-LaTeX`
Checking cache directory for required files...
Cache check failed: tokenizer.model not found in local cache.
Not all required files found in cache. Will proceed with downloading.
Unsloth: Preparing safetensor model files: 100%|██████████| 2/2 [00:00<00:00, 17734.90it/s]
Unsloth: Merging weights into 16bit: 100%|██████████| 2/2 [01:44<00:00, 52.25s/it]
Unsloth: Merge process complete. Saved to `/content/Qwen3.5-4B-LaTeX`
Unsloth: Converting to GGUF format...
==((====))== Unsloth: Conversion from HF to GGUF information
\\ /| [0] Installing llama.cpp might take 3 minutes.
O^O/ \_/ \ [1] Converting HF to GGUF f16 might take 3 minutes.
\ / [2] Converting GGUF f16 to ['q4_k_m'] might take 10 minutes each.
"-____-" In total, you will have to wait at least 16 minutes.
Unsloth: llama.cpp found in the system. Skipping installation.
Unsloth: Preparing converter script...
Unsloth: [1] Converting model into f16 GGUF format.
This might take 3 minutes...
Unsloth: Initial conversion completed! Files: ['Qwen3.5-4B-LaTeX_gguf/Qwen3.5-4B.F16.gguf', 'Qwen3.5-4B-LaTeX_gguf/Qwen3.5-4B.F16-mmproj.gguf']
Unsloth: [2] Converting GGUF f16 into q4_k_m. This might take 10 minutes...
Unsloth: Model files cleanup...
Unsloth: All GGUF conversions completed successfully!
Generated files: ['Qwen3.5-4B-LaTeX_gguf/Qwen3.5-4B.Q4_K_M.gguf', 'Qwen3.5-4B-LaTeX_gguf/Qwen3.5-4B.F16-mmproj.gguf']
Unsloth: No Ollama template mapping found for model 'unsloth/Qwen3.5-4B'. Skipping Ollama Modelfile
Unsloth: example usage for Multimodal LLMs: /root/.unsloth/llama.cpp/llama-mtmd-cli -m Qwen3.5-4B-LaTeX_gguf/Qwen3.5-4B.Q4_K_M.gguf --mmproj Qwen3.5-4B-LaTeX_gguf/Qwen3.5-4B.F16-mmproj.gguf
Unsloth: load image inside llama.cpp runner: /image test_image.jpg
Unsloth: Prompt model to describe the image
Unsloth: Converting model to GGUF format...
Unsloth: Merging model weights to 16-bit format...
Found HuggingFace hub cache directory: /root/.cache/huggingface/hub
Checking cache directory for required files...
Unsloth: Copying 2 files from cache to `/tmp/unsloth_gguf_0t1qmj62`: 100%|██████████| 2/2 [01:43<00:00, 51.83s/it]
Successfully copied all 2 files from cache to `/tmp/unsloth_gguf_0t1qmj62`
Checking cache directory for required files...
Cache check failed: tokenizer.model not found in local cache.
Not all required files found in cache. Will proceed with downloading.
Unsloth: Preparing safetensor model files: 100%|██████████| 2/2 [00:00<00:00, 17439.93it/s]
Unsloth: Merging weights into 16bit: 100%|██████████| 2/2 [01:54<00:00, 57.28s/it]
Unsloth: Merge process complete. Saved to `/tmp/unsloth_gguf_0t1qmj62`
Unsloth: Converting to GGUF format...
==((====))== Unsloth: Conversion from HF to GGUF information
\\ /| [0] Installing llama.cpp might take 3 minutes.
O^O/ \_/ \ [1] Converting HF to GGUF f16 might take 3 minutes.
\ / [2] Converting GGUF f16 to ['q4_k_m'] might take 10 minutes each.
"-____-" In total, you will have to wait at least 16 minutes.
Unsloth: llama.cpp found in the system. Skipping installation.
Unsloth: Preparing converter script...
Unsloth: [1] Converting model into f16 GGUF format.
This might take 3 minutes...
Unsloth: Initial conversion completed! Files: ['/tmp/unsloth_gguf_0t1qmj62_gguf/Qwen3.5-4B.F16.gguf', '/tmp/unsloth_gguf_0t1qmj62_gguf/Qwen3.5-4B.F16-mmproj.gguf']
Unsloth: [2] Converting GGUF f16 into q4_k_m. This might take 10 minutes...
Unsloth: Model files cleanup...
Unsloth: All GGUF conversions completed successfully!
Generated files: ['/tmp/unsloth_gguf_0t1qmj62_gguf/Qwen3.5-4B.Q4_K_M.gguf', '/tmp/unsloth_gguf_0t1qmj62_gguf/Qwen3.5-4B.F16-mmproj.gguf']
Unsloth: No Ollama template mapping found for model 'unsloth/Qwen3.5-4B'. Skipping Ollama Modelfile
Unsloth: example usage for Multimodal LLMs: /root/.unsloth/llama.cpp/llama-mtmd-cli -m /tmp/unsloth_gguf_0t1qmj62_gguf/Qwen3.5-4B.Q4_K_M.gguf --mmproj /tmp/unsloth_gguf_0t1qmj62_gguf/Qwen3.5-4B.F16-mmproj.gguf
Unsloth: load image inside llama.cpp runner: /image test_image.jpg
Unsloth: Prompt model to describe the image
Unsloth: Uploading GGUF to Huggingface Hub...
Uploading Qwen3.5-4B.Q4_K_M.gguf...
Uploading Qwen3.5-4B.F16-mmproj.gguf...
Uploading config.json...
No files have been modified since last commit. Skipping to prevent empty commit.
Unsloth: Successfully uploaded GGUF to https://huggingface.co/Weidows/Qwen3.5-4B-LaTeX
Unsloth: Cleaning up temporary files...
Unsloth: Converting model to GGUF format...
Unsloth: Merging model weights to 16-bit format...
Found HuggingFace hub cache directory: /root/.cache/huggingface/hub
Checking cache directory for required files...
Unsloth: Copying 2 files from cache to `/tmp/unsloth_gguf_99z_hskz`: 100%|██████████| 2/2 [01:37<00:00, 48.82s/it]
Successfully copied all 2 files from cache to `/tmp/unsloth_gguf_99z_hskz`
Checking cache directory for required files...
Cache check failed: tokenizer.model not found in local cache.
Not all required files found in cache. Will proceed with downloading.
Unsloth: Preparing safetensor model files: 100%|██████████| 2/2 [00:00<00:00, 15505.74it/s]
Unsloth: Merging weights into 16bit: 100%|██████████| 2/2 [01:55<00:00, 57.88s/it]
Unsloth: Merge process complete. Saved to `/tmp/unsloth_gguf_99z_hskz`
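The merge step above folds the LoRA adapters back into the base weights before GGUF conversion. A toy sketch of what happens per layer, assuming the standard LoRA update rule (shapes and values here are made up; Unsloth merges to 16-bit, shown with float16 since NumPy has no bfloat16):

```python
import numpy as np

# One LoRA layer: fold the low-rank update into the base weight, then
# cast the merged result down to 16-bit for export.
rng = np.random.default_rng(0)
d, r, alpha = 8, 2, 4

W = rng.standard_normal((d, d)).astype(np.float32)   # frozen base weight
A = rng.standard_normal((r, d)).astype(np.float32)   # LoRA down-projection
B = rng.standard_normal((d, r)).astype(np.float32)   # LoRA up-projection

# W' = W + (alpha / r) * B @ A, cast to 16-bit
W_merged = (W + (alpha / r) * (B @ A)).astype(np.float16)
print(W_merged.dtype, W_merged.shape)
```

The real merge iterates over every adapted tensor in the checkpoint, which is why it takes a couple of minutes per shard in the log above.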
Unsloth: Converting to GGUF format...
==((====))== Unsloth: Conversion from HF to GGUF information
\\ /| [0] Installing llama.cpp might take 3 minutes.
O^O/ \_/ \ [1] Converting HF to GGUF f16 might take 3 minutes.
\ / [2] Converting GGUF f16 to ['q4_k_m', 'q8_0', 'q5_k_m'] might take 10 minutes each.
"-____-" In total, you will have to wait at least 16 minutes.
Unsloth: llama.cpp found in the system. Skipping installation.
Unsloth: Preparing converter script...
Unsloth: [1] Converting model into f16 GGUF format.
This might take 3 minutes...
Unsloth: Initial conversion completed! Files: ['/tmp/unsloth_gguf_99z_hskz_gguf/Qwen3.5-4B.F16.gguf', '/tmp/unsloth_gguf_99z_hskz_gguf/Qwen3.5-4B.F16-mmproj.gguf']
Unsloth: [2] Converting GGUF f16 into q4_k_m. This might take 10 minutes...
Unsloth: [2] Converting GGUF f16 into q8_0. This might take 10 minutes...
Unsloth: [2] Converting GGUF f16 into q5_k_m. This might take 10 minutes...
Unsloth: Model files cleanup...
Unsloth: All GGUF conversions completed successfully!
Generated files: ['/tmp/unsloth_gguf_99z_hskz_gguf/Qwen3.5-4B.Q5_K_M.gguf', '/tmp/unsloth_gguf_99z_hskz_gguf/Qwen3.5-4B.Q8_0.gguf', '/tmp/unsloth_gguf_99z_hskz_gguf/Qwen3.5-4B.Q4_K_M.gguf', '/tmp/unsloth_gguf_99z_hskz_gguf/Qwen3.5-4B.F16-mmproj.gguf']
Unsloth: No Ollama template mapping found for model 'unsloth/Qwen3.5-4B'. Skipping Ollama Modelfile
Unsloth: example usage for Multimodal LLMs: /root/.unsloth/llama.cpp/llama-mtmd-cli -m /tmp/unsloth_gguf_99z_hskz_gguf/Qwen3.5-4B.Q5_K_M.gguf --mmproj /tmp/unsloth_gguf_99z_hskz_gguf/Qwen3.5-4B.F16-mmproj.gguf
Unsloth: load image inside llama.cpp runner: /image test_image.jpg
Unsloth: Prompt model to describe the image
Unsloth: Uploading GGUF to Huggingface Hub...
Uploading Qwen3.5-4B.Q5_K_M.gguf...
...uf/Qwen3.5-4B.Q5_K_M.gguf: 1%| | 23.6MB / 3.07GB
Uploading Qwen3.5-4B.Q8_0.gguf...
...gguf/Qwen3.5-4B.Q8_0.gguf: 1%| | 24.0MB / 4.48GB
Uploading Qwen3.5-4B.Q4_K_M.gguf...
...uf/Qwen3.5-4B.Q4_K_M.gguf: 1%| | 23.9MB / 2.71GB
Uploading Qwen3.5-4B.F16-mmproj.gguf...
...wen3.5-4B.F16-mmproj.gguf: 7%|7 | 48.0MB / 672MB
Uploading config.json...
No files have been modified since last commit. Skipping to prevent empty commit.
Unsloth: Successfully uploaded GGUF to https://huggingface.co/Weidows/Qwen3.5-4B-LaTeX
Unsloth: Cleaning up temporary files...
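As a rough sanity check on the uploads above, the file sizes the log reports (2.71 GB for Q4_K_M, 3.07 GB for Q5_K_M, 4.48 GB for Q8_0) imply approximate bits-per-weight figures, assuming the "4B" model has on the order of 4e9 parameters (an approximation; the exact parameter count and per-tensor quant mix vary, and some tensors are kept at higher precision, which pushes these numbers above the quant formats' nominal bitrates):

```python
# Approximate bits/weight implied by the GGUF sizes in the log,
# assuming ~4e9 parameters (a rough figure for a "4B" model).
sizes_gb = {"q4_k_m": 2.71, "q5_k_m": 3.07, "q8_0": 4.48}
n_params = 4e9

bpw = {name: gb * 1e9 * 8 / n_params for name, gb in sizes_gb.items()}
for name, bits in sorted(bpw.items(), key=lambda kv: kv[1]):
    print(f"{name}: ~{bits:.1f} bits/weight")
```

This is why Q4_K_M is the usual default for constrained hardware, while Q8_0 trades roughly 1.7x the disk and memory footprint for quality closer to the F16 original.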