Training data for frontier AI labs
We create a new kind of data for training AI models. Most LLMs are pre-trained on noisy web-scraped text; as a result, they hallucinate and fail at tasks humans find trivial. World models try to solve this with multimodality. Another approach is to give LLMs information-dense supplements about the natural world that can't be scraped from the internet. We project the laws of physics, biological facts, self-consistent logic, and more into natural language.