ComfyUI on Google Colab: Generate Images and Videos with AI

How to run ComfyUI on Google Colab to generate images and videos with AI — no local GPU needed, with Wan 2.2, Flux, and SDXL.

by Cleverson

Running ComfyUI on Google Colab is one of the cheapest and most practical ways to generate professional-quality AI images and videos without an expensive graphics card at home. Anyone trying to run Stable Diffusion, Flux, or Wan 2.2 on a laptop with 6 GB of VRAM will hit a wall on the first try. I lost nights trying myself, until I gave up on local hardware for a client of Yeshua who needed 80 thumbnails per week. Moving ComfyUI to Google Colab solved the impasse, and in this guide I show exactly how to replicate the setup, from scratch to the first rendered video.

TL;DR

  • ComfyUI is the standard node-based frontend for diffusion in 2026, supporting SDXL, Flux, Wan 2.2, LTX-Video, Mochi-1, HiDream, and Lumina.
  • Google Colab provides a free NVIDIA T4 GPU (16 GB VRAM) per session, ideal for 1024 px images and short videos with quantized models.
  • The setup takes 8 to 12 minutes on the first run and provides a trycloudflare.com URL to use ComfyUI in the browser.
  • For video with Wan 2.2 14B on a T4 GPU, use the GGUF Q4 or Q5 version — otherwise VRAM runs out.
  • The free tier gives roughly 15 to 30 hours of T4 per week, though Google does not publish a fixed quota; Colab Pro ($9.99/month) unlocks occasional access to L4 and A100.

Why ComfyUI Became the Standard for AI Media Generation in 2026

Until 2024, tools like Automatic1111 dominated Stable Diffusion. ComfyUI changed the game for a simple reason: each node represents a step in the diffusion pipeline, and that means absolute control. You see the KSampler, the VAE, the CLIP Text Encoder, the Checkpoint Loader, and the Save Image as visual blocks. When a new model comes out — Flux, Wan 2.2, LTX-Video, Mochi-1 — the community publishes custom nodes in days, not months.

The ecosystem passed a thousand custom node packages in 2026. ControlNet, IPAdapter, AnimateDiff, Hunyuan, NVIDIA's Cosmos: everything is pluggable. For those working seriously with generative AI (agencies, e-commerce, e-learning, social media), this is the environment where models arrive first.

The problem is hardware. Running Flux Dev 12B comfortably requires about 16 GB of VRAM. Wan 2.2 14B in fp16 needs 24 GB or more and takes 1h20min on an RTX 4090. A card like that costs as much as a budget car. The solution is the cloud, and the most accessible cloud is Google Colab.

Google Colab: The Free GPU That Unlocks ComfyUI Without a Graphics Card

Google Colab is a Jupyter notebook hosted on Google's infrastructure. Behind each notebook you have a virtual machine with a GPU, usually an NVIDIA Tesla T4 with 16 GB of VRAM on the free tier. For ComfyUI on Google Colab, this hardware is sufficient for SDXL at 1024 px, quantized Flux, and short videos with Wan 2.2 5B or GGUF versions of the 14B models.

The T4 is not the fastest GPU in the world. It's Turing, from 2018, without the native FP8 support that Hopper and Ada Lovelace have. Even so, it renders an SDXL image in about 25 to 40 seconds and an 81-frame clip with Wan 2.2 GGUF in about 12 to 20 minutes, acceptable numbers for iterating on creatives.

Free Tier Limits You Need to Understand

Free Colab has rules that aren't written in big letters:

  • 15 to 30 hours of T4 per week, dynamically adjusted based on global demand.
  • Maximum session of 12 hours, usually cut earlier (4 to 6 hours is realistic).
  • Idle timeout of 90 minutes — if you don't interact, the runtime drops.
  • No GPU guarantee: sometimes only CPU is available during peak hours.
  • VM storage is ephemeral: everything on the machine's disk disappears when the session ends.
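Because a GPU is not guaranteed, it can help to check what the runtime actually received before installing anything. The sketch below is one way to do that, assuming nvidia-smi is on the PATH as it normally is on Colab GPU runtimes; the helper names parse_gpu_line and query_gpus are made up for this example.

```python
import subprocess


def parse_gpu_line(line: str) -> tuple[str, int]:
    """Parse one CSV line such as 'Tesla T4, 15360 MiB' into (name, MiB)."""
    name, memory = [part.strip() for part in line.split(",")]
    return name, int(memory.split()[0])


def query_gpus() -> list[tuple[str, int]]:
    """Return the GPUs visible to the runtime, or [] on a CPU-only session."""
    try:
        out = subprocess.run(
            ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader"],
            capture_output=True, text=True, check=True,
        ).stdout
    except (FileNotFoundError, subprocess.CalledProcessError):
        # nvidia-smi is missing or fails when no GPU is attached.
        return []
    return [parse_gpu_line(line) for line in out.splitlines() if line.strip()]


gpus = query_gpus()
print(gpus if gpus else "CPU-only runtime: reconnect later or change the runtime type")
```

Running this in the first cell avoids spending ten minutes on setup only to discover the session has no GPU.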

When to Upgrade to Colab Pro and Pro+

If you use ComfyUI on Google Colab once a week for a client, the free tier suffices. For continuous professional use, it makes sense to pay:

  • Colab Pro — $9.99/month: priority queue, better GPUs (frequent L4, occasional A100), longer sessions, 100 compute units per month.
  • Colab Pro+ — $49.99/month: 500 compute units, runtimes that continue running in the background for up to 24 hours, priority access to A100.

An A100 with 40 GB renders Wan 2.2 in fp16 in less than 15 minutes. For those who earn from generated video, Pro+ pays for itself in the first week.

Step by Step: Running ComfyUI on Google Colab

The sequence below is the script I use to spin up a new environment. Paste it into Colab in a single cell, or grab a ready-made notebook from the community. The comfyui_colab notebook in the official comfyanonymous/ComfyUI repository is the most reliable starting point.

Initial Setup and Custom Node Installation

  1. At colab.research.google.com, create a new notebook and change the runtime to T4 GPU in Runtime → Change runtime type.
  2. Paste the clone of ComfyUI: !git clone https://github.com/comfyanonymous/ComfyUI.
  3. Install dependencies: %cd ComfyUI, then on the next line !pip install -r requirements.txt (a %cd magic cannot be chained with &&).
  4. Add the Manager — the node that installs other nodes: !git clone https://github.com/ltdrdata/ComfyUI-Manager custom_nodes/ComfyUI-Manager.
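  5. Start with Cloudflare tunneling: !python main.py --dont-print-server, alongside a helper that waits for port 8188 and then launches cloudflared (wait_then_tunnel here is a placeholder name, not a real command).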
  6. Open the trycloudflare.com URL that appears in the output — that's your ComfyUI running.

The first time, after loading a workflow, click Manager → Install Missing Custom Nodes → check all → Install. Restart the server. Done.
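The tunneling step can be sketched in Python as follows. This is a minimal sketch, not the official notebook: it assumes the cloudflared binary is already installed in the VM, that ComfyUI listens on its default port 8188, and that cloudflared prints the public URL to stderr. The function names wait_for_port, find_tunnel_url, and run_tunnel are hypothetical.

```python
import re
import socket
import subprocess
import time
from typing import Optional

TUNNEL_URL = re.compile(r"https://[a-z0-9-]+\.trycloudflare\.com")


def wait_for_port(port: int, host: str = "127.0.0.1", timeout: float = 300.0) -> bool:
    """Poll until something accepts TCP connections on host:port."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            time.sleep(0.5)
    return False


def find_tunnel_url(log_line: str) -> Optional[str]:
    """Extract a trycloudflare.com URL from one line of cloudflared output."""
    match = TUNNEL_URL.search(log_line)
    return match.group(0) if match else None


def run_tunnel(port: int = 8188) -> None:
    """Launch cloudflared and print the public URL; blocks while the tunnel runs."""
    proc = subprocess.Popen(
        ["cloudflared", "tunnel", "--url", f"http://127.0.0.1:{port}"],
        stderr=subprocess.PIPE, text=True,
    )
    for line in proc.stderr:  # keep reading so the pipe never fills up
        url = find_tunnel_url(line)
        if url:
            print("ComfyUI is reachable at", url)
```

One way to use it is to start a background thread before launching the server, for example threading.Thread(target=lambda: wait_for_port(8188) and run_tunnel(8188), daemon=True).start(), and then run !python main.py --dont-print-server in the same cell.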

Connecting Google Drive to Persist Models

The biggest stumbling block for those starting with ComfyUI on Google Colab is re-downloading 30 GB of models every session. The solution is to mount Drive:

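from google.colab import drive
drive.mount('/content/drive')
# The clone already has models/: copy its subfolders to Drive and remove it, or ln would nest the link inside it.
!mkdir -p /content/drive/MyDrive/ComfyUI/models && cp -rn /content/ComfyUI/models/. /content/drive/MyDrive/ComfyUI/models/
!rm -rf /content/ComfyUI/models
!ln -s /content/drive/MyDrive/ComfyUI/models /content/ComfyUI/models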

Store checkpoints, LoRAs, VAEs, and text encoders on Drive. Loading from Drive is 3 to 5 times slower than from local disk, but still viable, and you save hours of downloads per week.
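The same linking step can be written in pure Python, which makes it safe to re-run within a session. This is only a sketch: link_models_dir is a hypothetical helper, and it assumes it runs right after cloning, before any model has been downloaded into the local models folder.

```python
import os
import shutil


def link_models_dir(repo_models: str, drive_models: str) -> None:
    """Point repo_models at drive_models without nesting a link inside it."""
    os.makedirs(drive_models, exist_ok=True)
    if os.path.islink(repo_models):
        # Already linked in this session: drop the old link and recreate it.
        os.unlink(repo_models)
    elif os.path.isdir(repo_models):
        # Recreate the subfolder names (checkpoints, vae, loras, ...) on Drive.
        # Local files are discarded, so call this before downloading models.
        for name in os.listdir(repo_models):
            if os.path.isdir(os.path.join(repo_models, name)):
                os.makedirs(os.path.join(drive_models, name), exist_ok=True)
        shutil.rmtree(repo_models)
    os.symlink(drive_models, repo_models)
```

In Colab the call would be link_models_dir("/content/ComfyUI/models", "/content/drive/MyDrive/ComfyUI/models"), after drive.mount has finished.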

Basic Image Workflow: SDXL, Flux, and Checkpoints

A minimal SDXL workflow has seven nodes: Checkpoint Loader, two CLIP Text Encode (positive and negative), Empty Latent Image, KSampler, VAE Decode, and Save Image. You drag the workflow JSON onto the ComfyUI canvas and it assembles itself.

To start quickly, download a base checkpoint (SDXL Base 1.0 or JuggernautXL) and drop it into models/checkpoints. Load the workflow, write the prompt in the positive node, and click Queue Prompt. The T4 renders in 25 to 35 seconds per image at 1024×1024 with 25 steps on DPM++ 2M Karras (sampler dpmpp_2m with the karras scheduler in the KSampler node).
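For readers who prefer to see the seven nodes as data, here is a sketch of the same workflow in ComfyUI's API (prompt) format, plus a helper that posts it to a running server. The checkpoint filename, the CFG value of 7.0, the filename prefix, and the function names are assumptions chosen for illustration; the step count, resolution, and sampler follow the text above.

```python
import json
import urllib.request


def build_sdxl_workflow(prompt, negative="", seed=0,
                        ckpt="sd_xl_base_1.0.safetensors"):
    """Return the seven-node SDXL graph; links are [source_node_id, output_index]."""
    return {
        "1": {"class_type": "CheckpointLoaderSimple", "inputs": {"ckpt_name": ckpt}},
        "2": {"class_type": "CLIPTextEncode", "inputs": {"text": prompt, "clip": ["1", 1]}},
        "3": {"class_type": "CLIPTextEncode", "inputs": {"text": negative, "clip": ["1", 1]}},
        "4": {"class_type": "EmptyLatentImage",
              "inputs": {"width": 1024, "height": 1024, "batch_size": 1}},
        "5": {"class_type": "KSampler", "inputs": {
            "model": ["1", 0], "positive": ["2", 0], "negative": ["3", 0],
            "latent_image": ["4", 0], "seed": seed, "steps": 25, "cfg": 7.0,
            "sampler_name": "dpmpp_2m", "scheduler": "karras", "denoise": 1.0}},
        "6": {"class_type": "VAEDecode", "inputs": {"samples": ["5", 0], "vae": ["1", 2]}},
        "7": {"class_type": "SaveImage",
              "inputs": {"images": ["6", 0], "filename_prefix": "sdxl"}},
    }


def queue_prompt(workflow, server="http://127.0.0.1:8188"):
    """POST the workflow to a running ComfyUI server's /prompt endpoint."""
    data = json.dumps({"prompt": workflow}).encode("utf-8")
    request = urllib.request.Request(
        f"{server}/prompt", data=data, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Fixing the seed and changing only the prompt text, for example build_sdxl_workflow("product photo, studio light", seed=42), is a simple way to iterate without losing composition.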

Flux Dev is the next step. It requires three pieces: flux1-dev.safetensors, ae.safetensors (VAE), and two text encoders (t5xxl and clip_l). On the T4, use the GGUF Q4_K_S version of Flux, which fits in 16 GB and keeps quality close to fp16. Average time: 90 to 120 seconds per 1024 px image with 20 steps.

Tips I learned the hard way:

  • Always enable --lowvram in the start command on T4 — forces dynamic offload of weights to CPU.
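  • Do not run the VAE in FP16 with Flux GGUF, because it creates artifacts. Load it in BF16, or in FP32 (--fp32-vae) on the T4, whose Turing architecture has no native BF16 support.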
  • Save seeds that worked — ComfyUI has a Primitive node to fix the seed and iterate the prompt without losing composition.

Generating Videos with AI: Wan 2.2, LTX-Video, and Mochi-1

Video is the area that has advanced the most in 2026. Wan 2.2, released by Alibaba, has become the open-source reference. There are three relevant variants:

  • Wan 2.2 5B — fits in 8 GB of VRAM natively, ideal for the free tier.
  • Wan 2.2 14B fp16 — maximum quality, requires 24 GB+.
  • Wan 2.2 14B GGUF Q4/Q5 — quantized packaging that fits in the T4's 16 GB.

The model supports text-to-video (t2v), image-to-video (i2v), text+image-to-video, and even audio-to-video in some builds.

Image-to-Video (i2v) with Wan 2.2 on Free T4

The most useful case for product and marketing is giving motion to a static image. The official ComfyUI workflow for Wan 2.2 i2v requires:

  1. Wan 2.2 14B i2v model in GGUF (Q4_K_S works on T4).
  2. Wan VAE: the 14B models pair with wan_2.1_vae.safetensors, while wan2.2_vae.safetensors belongs to the 5B model.
  3. Text encoder umt5_xxl in fp8.
  4. Clip vision clip_vision_h.safetensors.

After loading, connect the source image to the WanImageToVideo node, set 49 or 81 frames, choose 24 fps, and render. On a T4, expect 12 to 25 minutes per clip; at 24 fps, 81 frames come to about 3.4 seconds and 49 frames to about 2 seconds. The results rival Runway Gen-3 without the $35 monthly fee of the standard plan.
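The relationship between frame count, frame rate, and clip length is plain division, and it is worth checking before committing to a 20-minute render. A tiny sketch follows; the 16 fps line is only a hypothetical comparison, not a setting taken from this guide.

```python
def clip_seconds(frames: int, fps: float) -> float:
    """Length of a clip in seconds for a given frame count and frame rate."""
    return frames / fps


for frames, fps in [(49, 24), (81, 24), (81, 16)]:
    print(f"{frames} frames at {fps} fps = {clip_seconds(frames, fps):.2f} s")
```

Lowering the frame rate stretches the same number of frames over a longer clip, at the cost of less smooth motion.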

LTX-Video is the faster alternative. It runs in 6 to 8 minutes per clip on the same T4, with slightly lower quality, but excellent for iterating variants before finalizing the version in Wan.

Comparative Table: Running ComfyUI on Colab vs Local PC vs Dedicated Cloud

| Criteria | ComfyUI on Google Colab (Free T4) | Local PC with RTX 3060 12 GB | RunPod A100 On-Demand |
|---|---|---|---|
| Initial cost | $0 | $700 (card) | $0 |
| Recurring cost | $0 to $10/month (Pro) | $15/month (energy) | $1.89/h |
| Available VRAM | 16 GB | 12 GB | 40 or 80 GB |
| SDXL image time | 30 s | 25 s | 8 s |
| Wan 2.2 14B video time | 18 min (GGUF Q4) | impossible in fp16 | 12 min (fp16) |
| Model persistence | Google Drive (slow) | Local SSD (fast) | Instance volume |
| Idle timeout | 90 min | none | manual |
| Censorship/policy | NSFW content allowed | unrestricted | unrestricted |

The choice depends on volume. Up to 50 renders per week, free ComfyUI on Google Colab suffices. Between 50 and 300, Pro is worth it. Above that, either buy a local GPU or rent an A100 by the hour on RunPod, Vast.ai, or Lambda Labs.
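To put rough numbers on that choice, the hourly rate and render time from the table can be turned into a cost per clip. This is back-of-the-envelope arithmetic only: it ignores instance startup time, storage fees, and failed renders, and the function names are hypothetical.

```python
def cost_per_render(hourly_rate_usd: float, minutes_per_render: float) -> float:
    """Cost of one render on an instance billed by the hour."""
    return hourly_rate_usd * minutes_per_render / 60


def renders_to_match(monthly_fee_usd: float, hourly_rate_usd: float,
                     minutes_per_render: float) -> float:
    """Number of hourly-billed renders that cost as much as a flat monthly fee."""
    return monthly_fee_usd / cost_per_render(hourly_rate_usd, minutes_per_render)


per_clip = cost_per_render(1.89, 12)       # A100 at $1.89/h, 12 min per Wan 2.2 clip
print(f"about ${per_clip:.2f} per clip")   # about $0.38 per clip
print(f"{renders_to_match(9.99, 1.89, 12):.0f} clips cost roughly one month of Colab Pro")
```

Under these assumptions, a few dozen fp16 clips on a rented A100 cost about the same as a month of Colab Pro, which is why the hourly option only makes sense at higher volume or when fp16 quality is required.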

Optimizing Cost and Time: GGUF, Quantization, and Strategies for T4

GGUF is the quantization format that came from the LLM world and was adapted for diffusion by city96/ComfyUI-GGUF. Instead of loading weights in fp16 (16 bits per parameter), you use Q8 (8 bits), Q5_K_S (~5.5 bits), or Q4_K_S (~4.5 bits). The quality loss between fp16 and Q5 is practically imperceptible for web and social media generation.

In practice, this means a 14-billion-parameter model, which would weigh 28 GB in fp16, fits in 8 to 10 GB in Q4. It's the difference between running and not running on free Colab.
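The arithmetic behind those figures is parameters times bits per weight, divided by 8 to get bytes. A minimal sketch, using the approximate bits-per-weight values quoted above and counting only the weights (activations, the text encoder, and the VAE need memory on top of this):

```python
def weight_size_gb(params: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights in gigabytes (10^9 bytes)."""
    return params * bits_per_weight / 8 / 1e9


for label, bits in [("fp16", 16), ("Q8", 8), ("Q5_K_S", 5.5), ("Q4_K_S", 4.5)]:
    print(f"14B in {label}: {weight_size_gb(14e9, bits):.1f} GB")
```

For a 14B model this gives 28 GB in fp16 and just under 8 GB in Q4_K_S, which is what leaves headroom on a 16 GB card.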

Other tactics worth the time:

  • Enable --use-split-cross-attention at startup — cuts VRAM usage in the attention layer.
  • Use the VAE Decode (Tiled) node for images above 1024 px, so decoding happens in tiles.
  • Keep only 1 model loaded at a time — unload previous checkpoints with the Unload Model node.
  • Keep batches small: on the T4, batch 1 is faster than batch 2 because it avoids swapping.
  • Save outputs directly to Drive by starting ComfyUI with --output-directory /content/drive/MyDrive/outputs/, since the Save Image node writes relative to the output directory.
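Several of these tactics are startup flags, so it can be convenient to assemble the launch command in one place. The sketch below uses ComfyUI's --lowvram, --use-split-cross-attention, --cpu-vae, and --output-directory command-line options; the helper name comfyui_command and its defaults are assumptions for this example.

```python
def comfyui_command(lowvram=True, split_attention=True, cpu_vae=False, output_dir=None):
    """Build the argv list for launching ComfyUI with memory-saving flags."""
    cmd = ["python", "main.py", "--dont-print-server"]
    if lowvram:
        cmd.append("--lowvram")  # offload weights to CPU RAM when VRAM is tight
    if split_attention:
        cmd.append("--use-split-cross-attention")
    if cpu_vae:
        cmd.append("--cpu-vae")  # slower decode, but frees VRAM for the sampler
    if output_dir:
        cmd += ["--output-directory", output_dir]
    return cmd


print(" ".join(comfyui_command(output_dir="/content/drive/MyDrive/outputs")))
```

The resulting list can be passed to subprocess.Popen with cwd="/content/ComfyUI", or joined into a string and pasted after ! in a Colab cell.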

Common Pitfalls and How to Avoid Them (Session Dropped, Out of VRAM, Model Stuck)

After months helping clients set up ComfyUI on Google Colab, I've seen five problems repeat:

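  1. "CUDA out of memory" in the middle of generation: almost always the VAE running in fp16 with a Flux or Wan model. Switch to BF16 or enable --cpu-vae; of the two, only --cpu-vae actually frees VRAM, since BF16 and FP16 both take 16 bits per value.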
  2. ComfyUI hangs on "Loading" — corrupted cache. Restart the entire Colab runtime (Runtime → Disconnect and delete runtime) and redo the setup.
  3. Session drops before finishing the video — idle timeout. Keep an active tab running a JavaScript script in the console: setInterval(() => document.querySelector('colab-toolbar-button#connect').click(), 60000);.
  4. URL trycloudflare.com stops responding — Cloudflare drops idle tunnels. Restart the cell that starts the tunnel without restarting everything.
  5. No GPU available: Google prioritizes paying users. Try between 2 AM and 8 AM (Brasília time), when demand drops.
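For the dropped-tunnel case, a small health check can tell whether the problem is the tunnel or ComfyUI itself. This sketch queries ComfyUI's /system_stats endpoint with the standard library; the function name is_responding and the 10-second timeout are assumptions.

```python
import urllib.error
import urllib.request


def is_responding(base_url: str, path: str = "/system_stats", timeout: float = 10.0) -> bool:
    """Return True if an HTTP GET on base_url + path answers with status 200."""
    try:
        with urllib.request.urlopen(base_url.rstrip("/") + path, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, HTTP error, or malformed URL.
        return False
```

If is_responding("http://127.0.0.1:8188") is True inside the notebook while the trycloudflare.com URL fails, only the tunnel cell needs restarting; if both fail, the server itself is down.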

Real Use Cases: Marketing, E-commerce, Social Media, and E-learning

What justifies investing time learning ComfyUI on Google Colab? Cases where I saved (or helped clients save) real money:

  • E-commerce: generate 200 background variations for a single product photographed once. Cost via photo agency: $800. Via ComfyUI on Colab: $0.
  • Paid traffic: create 30 to 50 different creatives for A/B testing on Meta Ads without asking a designer every week. See how this integrates with our strategy in unlimited agents for business WhatsApp — because scalable creatives only work if the support can handle the volume.
  • E-learning and Moodle: produce course covers and animated characters for microlearning. Combined with a custom Moodle app, visual content boosts student engagement.
  • Social media: animate static carousels into 5-second Reels with Wan 2.2 i2v. Replaces motion designer budgets in small projects.
  • Editorial and blog: article cover illustrations, illustrated infographics, mockups of imaginary products.

The game-changing skill is not generating a pretty image — any SaaS tool does that. It's controlling the entire pipeline: prompt, seed, CFG, sampler, LoRA, ControlNet, upscaler, refinement. ComfyUI gives you that control. Colab gives you the hardware. Together, they give you autonomy.

Next Steps: From Colab to Production (and How Agathas Helps)

ComfyUI on Google Colab is great for prototyping and medium volume. When the project grows — 24/7 automation, WhatsApp integration, admin panel, billing — the setup needs to become a service. That's where transforming a workflow into an API comes in.

At Agathas Web we make exactly that bridge: we deploy ComfyUI on a dedicated GPU instance (RunPod, Vast.ai, or GCP), expose the workflows as REST endpoints, and plug them into existing systems — website, CRM, support bot, payment gateway. The client sends a brief and receives ready images or videos without having to open Colab every time.

If you've made it this far, you already have enough to set up your own environment. Start with the free tier, get a feel for how long each model takes, and discover where the GGUF shortcut saves your session. Then, if you need to turn this into a serious operation, talk to us: infrastructure, security, and integration have been our specialty since 2008.

Conclusion

Generating images and videos with AI is no longer a privilege for those with an RTX 4090 at home. ComfyUI on Google Colab democratized access: you open the browser, click Run, and in 10 minutes you're rendering Wan 2.2 with studio quality. The secret is not the model of the week — it's mastering the node-based pipeline, understanding the T4's limits, and using GGUF quantization when VRAM runs out. Start simple: an SDXL workflow today, Flux tomorrow, Wan 2.2 on the weekend. Every hour invested in ComfyUI returns as weeks of automated creative work.