🚀 Ultimate Guide: Deploying DeepSeek 14B on Google Cloud (T4 GPU)
This guide walks step by step through deploying DeepSeek 14B on a Google Cloud VM with an NVIDIA T4 GPU, with troubleshooting tips and performance optimizations for a smooth deployment.

📌 Step 1: Google Cloud Setup
1.1 Sign Up for Google Cloud & Enable Billing
- Go to the Google Cloud Console and create an account.
- Set up billing:
  - Navigate to Billing → Create Billing Account.
  - Attach it to your new project.
  - Google provides free credits for new users.
1.2 Create a New Project
- In Google Cloud Console, click Select a Project → New Project.
- Name it (e.g., deepseek-ai).
- Click Create.
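If you prefer the command line, the gcloud CLI can create and select the project as well. Note that project IDs must be globally unique, so adjust deepseek-ai if it is already taken:
gcloud projects create deepseek-ai
gcloud config set project deepseek-ai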
1.3 Enable Compute Engine API
- Navigate to APIs & Services → Enable APIs.
- Search for Compute Engine API and enable it.
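The same API can also be enabled from the command line:
gcloud services enable compute.googleapis.com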
📌 Step 2: Create Google Cloud VM with T4 GPU
2.1 Create a Virtual Machine Instance
- Go to Compute Engine → VM Instances → Create Instance.
- Configure instance settings:
  - Name: deepseek-t4
  - Region: us-central1
  - Machine Type: n1-standard-8 (8 vCPUs, 30 GB RAM)
  - GPU: 1 x NVIDIA T4
  - Boot Disk: Ubuntu 22.04 LTS, 100 GB SSD
- Click Create and wait for the VM to start.
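If you prefer scripting this step, roughly the same VM can be created with a single gcloud command. This is a sketch assuming the zone us-central1-a within the region; note that GPU instances require a TERMINATE maintenance policy:
gcloud compute instances create deepseek-t4 \
  --zone=us-central1-a \
  --machine-type=n1-standard-8 \
  --accelerator=type=nvidia-tesla-t4,count=1 \
  --image-family=ubuntu-2204-lts \
  --image-project=ubuntu-os-cloud \
  --boot-disk-size=100GB \
  --boot-disk-type=pd-ssd \
  --maintenance-policy=TERMINATE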
2.2 Connect to the VM via SSH
Use the zone where you created the VM (e.g., us-central1-a):
gcloud compute ssh deepseek-t4 --zone=us-central1-a
📌 Step 3: Install Required Dependencies
3.1 Update the System & Install the NVIDIA Driver and CUDA
sudo apt update && sudo apt upgrade -y
# nvidia-cuda-toolkit does not include the GPU driver; nvidia-driver-535 is one driver branch available on Ubuntu 22.04
sudo apt install -y nvidia-driver-535 nvidia-cuda-toolkit
sudo reboot
After the reboot, reconnect via SSH and confirm the GPU is visible:
nvidia-smi
3.2 Install Python & Virtual Environment
sudo apt install -y python3 python3-pip python3-venv
python3 -m venv deepseek-env
source deepseek-env/bin/activate
3.3 Install Required Python Packages
pip install torch torchvision transformers accelerate flask bitsandbytes
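Before loading the model, it is worth confirming that PyTorch can actually see the T4; this should print True:
python -c "import torch; print(torch.cuda.is_available())"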
📌 Step 4: Load DeepSeek 14B
4.1 Full Python Code for Running the Model
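The script reads your Hugging Face access token from the HUGGINGFACE_TOKEN environment variable, so export it in your SSH session first (the value below is a placeholder for your own token):
export HUGGINGFACE_TOKEN=your_token_here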
import os

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "deepseek-ai/DeepSeek-R1-Distill-Qwen-14B"
huggingface_token = os.getenv("HUGGINGFACE_TOKEN")

# Load the tokenizer and model; device_map="auto" places weights on the GPU
# and spills any overflow to CPU RAM
tokenizer = AutoTokenizer.from_pretrained(model_name, token=huggingface_token)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    token=huggingface_token,
    device_map="auto",
    torch_dtype=torch.float16,
)

# Tokenize the prompt and move the input tensors to the model's device
prompt = "Explain artificial intelligence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# max_length caps prompt plus generated tokens at 200; pad with the EOS token
# since the model defines no pad token
outputs = model.generate(
    **inputs,
    max_length=200,
    pad_token_id=tokenizer.eos_token_id,
)

response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print("\nResponse:", response)