
sync. – an api for realtime lipsync

we're building audio-visual models to generate, modify, and synthesize humans in video.

TL;DR: we’ve built a state-of-the-art lip-sync model – and we’re building towards real-time face-to-face conversations w/ AI indistinguishable from humans 🦾

try our playground here: https://app.synclabs.so/playground

how does it work?

theoretically, our models can support any language: they learn phoneme-to-viseme mappings, i.e. how the most basic units ("tokens") of the sounds we make map to the mouth shapes that produce them. it's simple, but a start towards learning a foundational understanding of humans from video.
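for intuition, here's a toy sketch of that idea (not our actual model, and the phoneme / viseme symbols are made up for illustration): because mouth shapes depend on sounds rather than on any particular language, a model that learns this mapping doesn't need to be retrained per language.

```python
# toy, illustrative phoneme -> viseme lookup (hypothetical labels; the real
# model learns this mapping end-to-end from audio + video, it isn't a table)
PHONEME_TO_VISEME = {
    "p": "bilabial_closed",   # p / b / m share a closed-lips shape
    "b": "bilabial_closed",
    "m": "bilabial_closed",
    "f": "labiodental",       # f / v: lower lip against upper teeth
    "v": "labiodental",
    "aa": "open_jaw",         # open vowels -> wide-open mouth
    "iy": "spread_lips",      # "ee" sound -> spread lips
    "uw": "rounded_lips",     # "oo" sound -> rounded lips
}

def phonemes_to_visemes(phonemes: list[str]) -> list[str]:
    """Map a phoneme sequence to the mouth shapes that would produce it."""
    return [PHONEME_TO_VISEME.get(p, "neutral") for p in phonemes]

# "boom" ~ [b, uw, m] -> closed lips, rounded lips, closed lips
print(phonemes_to_visemes(["b", "uw", "m"]))
```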

why is this useful?

[1] we can dissolve language as a barrier

check out how we used it to dub the entire 2-hour Tucker Carlson interview with Putin speaking fluent English.

imagine millions gaining access to knowledge, entertainment, and connection — regardless of their native tongue.

realtime at the edge takes us further: live multilingual broadcasts + video calls, even walking around Tokyo w/ a Vision Pro 2 speaking English while everyone around you speaks Japanese.

[2] we can move the human-computer interface beyond text-based-chat

keyboards + mice are lossy, low-bandwidth interfaces. human communication is rich and goes beyond just the words we say. what if we could compute through face-to-face interaction?

maybe embedding context around expressions + body language in inputs / outputs would help us interact w/ computers in a more human way. this thread of research is exciting.

[3] and more

powerful models small enough to run at the edge could unlock a lot:

e.g.

extreme compression for face-to-face video streaming

enhanced, spatially aware transcription w/ lip-reading

detecting deepfakes in the wild

on-device real-time video translation

etc.

who are we?

Prady Modukuru [CEO] | led product for a research team at Microsoft that made Defender a $350M+ product, taking MSR research into production and moving it from the bottom of the market to #1 in industry evals.

Rudrabha Mukhopadhyay [CTO] | PhD, CVIT @ IIIT-Hyderabad, co-authored Wav2Lip, 20+ major publications + 1,200+ citations in the last 5 years.

Prajwal K R [CRO] | PhD, VGG @ University of Oxford w/ Andrew Zisserman, prev. Research Scientist @ Meta, authored multiple breakthrough research papers (incl. Wav2Lip) on understanding and generating humans in video.

Pavan Reddy [COO/CFO] | 2x venture-backed founder/operator, built the first smart air purifier in India, prev. monetized state-of-the-art research @ IIIT-Hyderabad, engineering @ IIT Madras.

how did we meet?

Prajwal + Rudrabha worked together at IIIT-Hyderabad, and became famous by shipping the world's first model that could sync the lips in a video to any audio in the wild, in any language, no training required.

they formed a company w/ Pavan and then worked w/ the university to monetize state-of-the-art research coming out of the labs and bring it to market.

Prady met everyone online: first hacking together a viral app around their open-source models, then collaborating on product + research for fun, and eventually cofounding sync. + going mega-viral.

since then we've hacked irl across 4 countries and both US coasts, and moved into a hacker house in SF together.

what’s our ask?

try out our playground and API and let us know how we can make it easier to understand and simpler to use 😄

play around here: https://app.synclabs.so/playground
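if you'd rather script against the API than use the UI, a call might look roughly like the sketch below. the endpoint path, field names, and auth header here are assumptions for illustration only; check the docs linked from the playground for the real interface.

```python
import requests

# hypothetical request sketch -- endpoint, fields, and auth header are
# illustrative assumptions; see the API docs for the actual interface
API_KEY = "your-api-key"

resp = requests.post(
    "https://api.synclabs.so/lipsync",                # assumed endpoint
    headers={"x-api-key": API_KEY},                   # assumed auth scheme
    json={
        "videoUrl": "https://example.com/talk.mp4",   # source video
        "audioUrl": "https://example.com/dub_en.wav", # new audio to sync to
    },
    timeout=60,
)
resp.raise_for_status()
job = resp.json()
print(job)  # typically a job id you poll until the synced video is ready
```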