PerfectBit: Training data for frontier AI labs

Spring 2026

Active

Artificial Intelligence

Infrastructure

San Francisco

https://perfectbit.ai

We create a new kind of data for training AI models. Most LLMs are pre-trained on noisy web-scraped text; they still hallucinate and fail on tasks that humans find trivial. World models try to solve this through multimodality. Another approach is to give LLMs information-dense supplements about the natural world that scraping the internet does not provide. We're projecting the laws of physics, biological facts, self-consistent logic, and more into natural language.

Active Founders

Peter Vajda

Founder

Before 2026, I worked at Meta for 11 years as Director of Media Generation. I managed the Media GenAI foundation model research and development, including efficient media generation, text-to-image generation (Emu), image editing, Movie Gen, text-to-video, video editing, and character-consistent image and video generation. Previously, I led the efficient deep learning for computer vision teams supporting on-device models for AR/VR. I was also an Assistant Professor at Stanford University.

Seiji Yamamoto

Founder

Led teams in the Core Llama group at Meta Superintelligence Labs. Senior Staff Research Scientist; 9 years at Meta spanning LLM pre-training and post-training, inference optimization, full-duplex speech models, and computer vision models. Before tech: PhD in Physics, with publications in Proceedings of the National Academy of Sciences and Physical Review Letters, co-authored with a Fields Medalist. Educated at Stanford, Rice, and Columbia; postdoc at a national lab.
