Overview
For the hackathon, VideoDB has unlocked selected models so you can build with video indexing and GenAI workflows. Use the VideoDB SDK to ingest media, index it with AI, and expose video/audio perception to your agent or application.
Pipe in live streams, uploaded files, RTSP feeds, YouTube links, or any continuous media source. Build spoken-word, scene, and visual indexes; search across ingested media; compose clips; trigger events; and wire responses into Slack, web apps, or phone workflows.
A VideoDB sandbox is a dedicated compute pool for model
workloads. Create one sandbox, wait for it to become active, pass its
sandbox_id into supported generation/indexing APIs, then stop
it when you're finished to conserve credits.
Try the notebook first
Open the sandbox compute notebook in the VideoDB cookbook hackathon branch, or launch it directly in Google Colab.
Prerequisites
Install the hackathon branch of the VideoDB SDK:
!pip install "git+https://github.com/Video-DB/videodb-python.git@hackathon"
from videodb import connect, SandboxModel, SandboxTier, IndexType, SearchType, SceneExtractionType, play_stream
conn = connect()
coll = conn.get_collection()
videodb.connect() is called here without arguments, so your environment must
already provide the VideoDB API key it expects (typically the VIDEO_DB_API_KEY
environment variable).
Sandbox lifecycle
1. Create a sandbox
sandbox = conn.create_sandbox(
    tier=SandboxTier.medium,
    idle_timeout=600,  # stop after 10 minutes of inactivity
)
print(f"Sandbox: {sandbox.id}, Status: {sandbox.status}, Tier: {sandbox.tier}")
Sandbox creation returns immediately, usually with status
provisioning. Use the idle_timeout parameter to
automatically stop the sandbox after a period of inactivity and conserve
credits.
2. Wait until ready
sandbox.wait_for_ready(timeout=300, interval=5)
print(f"Sandbox ready: {sandbox.id}, Status: {sandbox.status}")
Submit jobs only once the sandbox is active, that is, when
sandbox.status == "active" or sandbox.is_active is True.
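If you need custom behaviour while waiting (progress logging, a different back-off), the wait can be approximated by hand. The sketch below is not how wait_for_ready is implemented; it only assumes that refresh() updates status and is_active, as shown in the "Inspect sandboxes" step. The helper name wait_until_active and its sleep/clock parameters are hypothetical and exist only to make the loop testable.

```python
import time


def wait_until_active(sandbox, timeout=300, interval=5,
                      sleep=time.sleep, clock=time.monotonic):
    """Poll a sandbox-like object until it reports active or the timeout expires.

    Assumes the object exposes refresh(), is_active, id and status.
    """
    deadline = clock() + timeout
    while True:
        sandbox.refresh()  # re-fetch the current status
        if sandbox.is_active:
            return sandbox
        if clock() >= deadline:
            raise TimeoutError(
                f"Sandbox {sandbox.id} not active after {timeout}s "
                f"(status: {sandbox.status})"
            )
        sleep(interval)
```

In a notebook you would call wait_until_active(sandbox) in place of sandbox.wait_for_ready(...); prefer the SDK method unless you need the extra control.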
3. Reuse the sandbox ID
Pass the same sandbox ID to supported indexing and generation APIs:
sandbox_id = sandbox.id
If sandbox_id is omitted, the server may try to auto-resolve a
compatible active sandbox, but explicitly passing sandbox_id
is recommended for predictable routing.
4. Inspect sandboxes
# Refresh one sandbox
sandbox.refresh()
print(sandbox.status, sandbox.is_active)
# List all sandboxes
for sb in conn.list_sandboxes():
    print(f"{sb.id} | {sb.name} | {sb.tier} | {sb.status}")
# Get one sandbox by ID
sb = conn.get_sandbox(sandbox.id)
print(sb.id, sb.status)
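Listing sandboxes also makes it possible to reuse one that is already running instead of creating a new one. The helper below is a hypothetical convenience, not part of the SDK; it assumes each listed sandbox exposes a status attribute whose active value is the string "active", as described in the "Wait until ready" step.

```python
def find_active_sandbox(sandboxes):
    """Return the first sandbox whose status is "active", or None if there is none."""
    for sb in sandboxes:
        if getattr(sb, "status", None) == "active":
            return sb
    return None


# Possible usage in a notebook (requires a live connection):
# sandbox = find_active_sandbox(conn.list_sandboxes())
# if sandbox is None:
#     sandbox = conn.create_sandbox(tier=SandboxTier.medium, idle_timeout=600)
#     sandbox.wait_for_ready(timeout=300, interval=5)
```

This does not check the tier, so confirm that a reused sandbox is large enough for the model you plan to run.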
5. Stop the sandbox
Stop the sandbox when finished. Billing is based on sandbox runtime.
sandbox.stop()
sandbox.wait_for_stop(timeout=120)
print(f"Sandbox {sandbox.id} final status: {sandbox.status}")
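Because billing follows runtime, it helps to make sure the sandbox is stopped even when a job raises. One way to do that is a small context manager around the lifecycle calls shown above. This is a sketch: managed_sandbox is a hypothetical helper, and it uses only create_sandbox, wait_for_ready, stop and wait_for_stop with the arguments already used in this guide.

```python
from contextlib import contextmanager


@contextmanager
def managed_sandbox(conn, ready_timeout=300, stop_timeout=120, **create_kwargs):
    """Create a sandbox, wait for it, and always stop it on exit."""
    sandbox = conn.create_sandbox(**create_kwargs)
    try:
        sandbox.wait_for_ready(timeout=ready_timeout, interval=5)
        yield sandbox
    finally:
        # Runs on success, on exceptions, and if the sandbox never became ready.
        sandbox.stop()
        sandbox.wait_for_stop(timeout=stop_timeout)


# Possible usage (requires a live connection):
# with managed_sandbox(conn, tier=SandboxTier.medium, idle_timeout=600) as sandbox:
#     job = coll.generate_image(prompt="...", model_name=SandboxModel.FLUX,
#                               sandbox_id=sandbox.id)
#     image = job.wait(timeout=900, interval=5)
```

Keeping idle_timeout set as well gives a second safety net if the notebook kernel dies before the finally block runs.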
Sandbox tiers and supported models
Use the smallest tier that supports your selected model. The notebook uses
SandboxModel enum constants so examples stay aligned with the SDK.
| Model enum | Use case | Minimum tier |
|---|---|---|
| SandboxModel.GEMMA_4_E2B | Scene indexing / faster visual understanding | SandboxTier.small |
| SandboxModel.QWEN_9B | Scene indexing / smaller VLM option | SandboxTier.small |
| SandboxModel.GEMMA_4_26B | Scene indexing / higher quality visual understanding | SandboxTier.medium |
| SandboxModel.QWEN_27B | Scene indexing / larger VLM option | SandboxTier.medium |
| SandboxModel.GEMMA_4_31B | Scene indexing / best fit for the notebook demo | SandboxTier.medium |
| SandboxModel.OMNIVOICE | Text-to-speech, voice design, and voice clone | SandboxTier.small |
| SandboxModel.FLUX | FLUX image generation | SandboxTier.medium |
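When a workflow mixes several models, the sandbox needs the tier of the heaviest one. The table above can be encoded as a lookup to pick that tier. The sketch uses plain strings rather than the SDK enums so it runs without a connection; MIN_TIER, TIER_ORDER and required_tier are hypothetical names, and the mapping simply copies the table.

```python
# Minimum tier per model, copied from the table above (keys are enum member names).
MIN_TIER = {
    "GEMMA_4_E2B": "small",
    "QWEN_9B": "small",
    "GEMMA_4_26B": "medium",
    "QWEN_27B": "medium",
    "GEMMA_4_31B": "medium",
    "OMNIVOICE": "small",
    "FLUX": "medium",
}

# Tiers ordered from smallest to largest.
TIER_ORDER = ["small", "medium"]


def required_tier(model_names):
    """Return the smallest tier that supports every model in model_names."""
    return max((MIN_TIER[name] for name in model_names), key=TIER_ORDER.index)


print(required_tier(["OMNIVOICE", "QWEN_9B"]))  # small
print(required_tier(["OMNIVOICE", "FLUX"]))     # medium
```

An unknown model name raises KeyError, and an empty list raises ValueError.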
Supported workloads
| Workload | API | Model | Notes |
|---|---|---|---|
| Scene indexing / VLM extraction | video.index_scenes(...) | SandboxModel.GEMMA_4_31B or another supported VLM enum | Use with SceneExtractionType and an extraction prompt. Pick a tier that fits the model. |
| RTStream visual indexing | rtstream.index_visuals(...) | SandboxModel.GEMMA_4_31B or another supported VLM enum | For live RTSP / RTMP / capture streams. Pass sandbox_id=sandbox.id just like video scene indexing. |
| Text-to-speech | coll.generate_voice(...) | SandboxModel.OMNIVOICE | Supports basic TTS, voice design, voice clone, and extra config. Small tier is usually suitable. |
| Image generation | coll.generate_image(...) | SandboxModel.FLUX | Supports config such as size, inference steps, guidance scale, negative prompt. Medium tier recommended. |
Scene indexing example
# Upload a video from a URL
video = coll.upload("https://www.youtube.com/watch?v=jeA-KBv0b68")

# Index scenes on the sandbox using time-based frame extraction
index_id = video.index_scenes(
    extraction_type=SceneExtractionType.time_based,
    extraction_config={
        "time": 10,
        "select_frames": ["first"],
        "frame_count": 1,
    },
    model_name=SandboxModel.GEMMA_4_31B,
    prompt="Describe the scene in a clear, concise way.",
    sandbox_id=sandbox.id,
)

# Inspect the generated scene descriptions
idx = video.get_scene_index(index_id)
print(idx)

# Search the scene index and play the matching clips
res = video.search("Claude", index_type=IndexType.scene, search_type=SearchType.semantic)
stream_url = res.compile()
play_stream(stream_url)
RTStream indexing
Sandbox-backed models are also available for RTStream indexing. Same
lifecycle: create a sandbox, wait until active, pass sandbox_id=sandbox.id,
and stop it when finished.
Visual indexing
rtstream = coll.connect_rtstream(
    url="rtsp://your-camera-or-stream-url",
    name="Hackathon Live Stream",
    media_types=["video"],
    store=True,
)
rtstream.start()

visual_index = rtstream.index_visuals(
    prompt="Describe what is happening in the live video. Return concise observations.",
    batch_config={"type": "time", "value": 5, "frame_count": 3},
    model_name=SandboxModel.GEMMA_4_31B,
    sandbox_id=sandbox.id,
    name="live_visual_index",
)
Audio indexing
audio_index = rtstream.index_audio(
    prompt="Summarize the important spoken content and events.",
    batch_config={"type": "time", "value": 30},
    model_name=SandboxModel.QWEN_9B,
    sandbox_id=sandbox.id,
    name="live_audio_index",
)
Stop the RTStream and sandbox when you're done:
rtstream.stop()
sandbox.stop()
OmniVoice examples
Basic TTS
job = coll.generate_voice(
    text="Hello, welcome to VideoDB.",
    model_name=SandboxModel.OMNIVOICE,
    sandbox_id=sandbox.id,
)
audio = job.wait(timeout=900, interval=5)
print(audio.id)
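Since generate_voice returns a job that you wait on separately, several jobs can be submitted first and collected afterwards. The helper below is a hypothetical sketch of that pattern. It assumes only that each job has a wait(timeout=..., interval=...) method, as in the example above. Whether jobs on one sandbox actually run concurrently, and which exception type a failed or timed-out job raises, are not stated in this guide, so the helper catches Exception broadly.

```python
def wait_all(jobs, timeout=900, interval=5):
    """Wait on every job in a {name: job} dict.

    Returns (results, errors): finished assets keyed by name, and the
    exception raised by each job that failed or timed out.
    """
    results, errors = {}, {}
    for name, job in jobs.items():
        try:
            results[name] = job.wait(timeout=timeout, interval=interval)
        except Exception as exc:  # exception type is an assumption; narrow if known
            errors[name] = exc
    return results, errors


# Possible usage (requires a live connection and an active sandbox):
# jobs = {
#     "intro": coll.generate_voice(text="Welcome.", model_name=SandboxModel.OMNIVOICE,
#                                  sandbox_id=sandbox.id),
#     "outro": coll.generate_voice(text="Goodbye.", model_name=SandboxModel.OMNIVOICE,
#                                  sandbox_id=sandbox.id),
# }
# results, errors = wait_all(jobs)
```

Collecting errors instead of stopping at the first failure makes it easier to retry only the jobs that did not finish.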
Voice design
job = coll.generate_voice(
    text="Breaking news! Scientists discover a new planet.",
    model_name=SandboxModel.OMNIVOICE,
    sandbox_id=sandbox.id,
    config={
        "instructions": "A deep, authoritative male news anchor voice",
    },
)
Voice clone
# Upload the reference recording whose voice should be cloned
ref_audio = coll.upload(
    url="https://www.youtube.com/shorts/7xOPzBhHKWY",
    media_type="audio",
)

# ref_text should be the transcript of the reference clip
job = coll.generate_voice(
    text="This is a cloned voice powered by OmniVoice.",
    model_name=SandboxModel.OMNIVOICE,
    sandbox_id=sandbox.id,
    config={
        "ref_audio": ref_audio.generate_url(),
        "ref_text": "Sample reference text for the audio clip",
    },
)
Extra TTS config
job = coll.generate_voice(
    text="Hola, bienvenidos a VideoDB.",
    model_name=SandboxModel.OMNIVOICE,
    sandbox_id=sandbox.id,
    config={
        "response_format": "wav",
        "speed": 1.2,
        "language": "es",
    },
)
FLUX examples
Basic image generation
job = coll.generate_image(
    prompt="A futuristic cityscape at sunset, cyberpunk style",
    model_name=SandboxModel.FLUX,
    sandbox_id=sandbox.id,
)
image = job.wait(timeout=900, interval=5)
print(image.id)
Image generation with config
job = coll.generate_image(
    prompt="A photorealistic portrait of a robot reading a book in a cozy library",
    model_name=SandboxModel.FLUX,
    sandbox_id=sandbox.id,
    config={
        "size": "1024x1536",
        "num_inference_steps": 50,
        "guidance_scale": 4.0,
        "negative_prompt": "blurry, low quality, watermark",
    },
)
Combining generated assets
You can generate a FLUX image and OmniVoice narration on the same sandbox,
then compose them with videodb.editor:
from videodb.editor import Timeline, Track, Clip, ImageAsset, AudioAsset, Fit

# Generate the still image
image_job = coll.generate_image(
    prompt="A dramatic mountain landscape at dawn",
    model_name=SandboxModel.FLUX,
    sandbox_id=sandbox.id,
    config={"size": "1280x720", "num_inference_steps": 28},
)
image = image_job.wait(timeout=900, interval=5)

# Generate the narration
audio_job = coll.generate_voice(
    text="Witness the breathtaking beauty of dawn over the mountains.",
    model_name=SandboxModel.OMNIVOICE,
    sandbox_id=sandbox.id,
    config={"instructions": "female, young adult, calm and cinematic"},
)
audio = audio_job.wait(timeout=900, interval=5)

# Build a timeline that shows the image for the length of the narration
timeline = Timeline(conn)
timeline.resolution = "1280x720"
timeline.background = "#000000"
duration = float(audio.length)

image_track = Track()
image_track.add_clip(0, Clip(asset=ImageAsset(id=image.id), duration=duration, fit=Fit.crop))

audio_track = Track()
audio_track.add_clip(0, Clip(asset=AudioAsset(id=audio.id), duration=duration))

timeline.add_track(image_track)
timeline.add_track(audio_track)

# Render the timeline and print a playable link
stream_url = timeline.generate_stream()
player_url = f"https://player.videodb.io/watch?v={stream_url}"
print(player_url)
Pricing and limits
Hackathon sandbox compute is charged against your credits based on runtime.
Pricing
| Sandbox tier | Price |
|---|---|
| small | $1 / hour |
| medium | $3.50 / hour |
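To get a feel for what a session costs, the hourly rates above can be turned into a quick estimate. This sketch assumes billing is proportional to runtime with no minimum charge or rounding increment, which this guide does not specify; HOURLY_RATE_USD and estimate_cost are hypothetical names.

```python
# Hourly rates from the pricing table above, in USD.
HOURLY_RATE_USD = {"small": 1.00, "medium": 3.50}


def estimate_cost(tier, runtime_seconds):
    """Estimate sandbox cost in USD, assuming simple pro-rata billing."""
    return round(HOURLY_RATE_USD[tier] * runtime_seconds / 3600, 2)


print(estimate_cost("medium", 30 * 60))  # 30 minutes on medium -> 1.75
print(estimate_cost("medium", 600))      # a full 10-minute idle_timeout -> 0.58
```

The second line shows why a short idle_timeout matters: a forgotten medium sandbox with idle_timeout=600 costs roughly $0.58 of idle time before it stops itself.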
Concurrent sandbox limits
| Sandbox tier | Parallel sandbox limit |
|---|---|
| small | 4 |
| medium | 2 |
Best practices
- Create one sandbox per session/workflow and reuse it for compatible jobs.
- Always wait for the sandbox to be active before submitting jobs.
- Pass sandbox_id=sandbox.id explicitly for sandbox-backed jobs.
- Select a tier based on the heaviest model in your workflow.
- Use job.wait(timeout=900, interval=5) for long-running generation jobs.
- Stop the sandbox after use to avoid unnecessary runtime billing and conserve your hackathon credits.
- Keep the sandbox ID in logs so jobs can be debugged or retried.
Need help?
If you run into issues with sandbox setup, model access, indexing, generation, or credits, email the VideoDB team at team@videodb.io or post in the hackathon Discord, which is the fastest way to get unblocked. If you're just getting started, try the sandbox notebook in Colab first as the reference implementation.