Home
Companies
FiddleCube

FiddleCube

Generate and manage high-quality training datasets for LLMs

FiddleCube enables developers to create high-quality datasets using AI. Our data platform enables users to: 1. Create datasets using just a prompt, few seed examples or a knowledge base of documents. 2. Manage and annotate the datasets, apply various quality metrics and evals. 3. Export the data in a structured format that connects with any GPT or open source fine-tuning API.

FiddleCube
Founded:2023
Team Size:2
Location:San Francisco
Group Partner:Harj Taggar

Active Founders

Neha Nupoor

Creating high quality datasets at FiddleCube. Fascinated about AI alignment. Curious about health-tech, design and fitness. Full Stack engineer, part-time illustrator.

Neha Nupoor
Neha Nupoor
FiddleCube

Kaushik Srinivasan

Obsessed with improving LLMs with high-quality synthetic data. In my previous life, I built products at companies like Google, Uber and LinkedIn for nearly 10 years.

Kaushik Srinivasan
Kaushik Srinivasan
FiddleCube

Company Launches

TL;DR: Upload your files & generate a high-quality dataset in minutes. Give it a go!

Llama3.1 405B has just dropped, and it's already outperforming GPT-4o. As we assist our customers in fine-tuning domain-specific LLMs, we see firsthand that it's no small feat. It requires an extensive, diverse, and superior-quality dataset, and multiple iterations of training to get it right.

❌ Creating high-quality datasets from raw data is messy!

Identifying the right data in the knowledge base is a manual, challenging process.

Data cleaning and filtering takes significant effort and man-hours, and is error-prone.

Costs of training & evals skyrocket with bad datasets requiring multiple iterations of training.

✅ We're making it easy and efficient for businesses.

FiddleCube’s data platform converts your data corpus into a high-quality fine-tuning dataset. Generate 1000s of rows of multi-turn chat, function-calling, and QnAs. Additionally, augment your datasets synthetically from unstructured data to improve your model's performance.

Our users have used us to:

  • Save >2 months in their data cleaning, preparation, generation, and quality check cycle.
  • Generate a high-quality training dataset that accurately resembles their production data without PII.
  • Generate a golden dataset for testing & benchmarking their AI-powered apps.
  • Generate gender diversity, safety & guard railing dataset.
  • Customize the tone of their responses instead of a standard GPT-like tone.

🚀 FiddleCube’s - Data platform empowers you with:

  • Data generation - A simple & clean UI to generate datasets from PDF, TXT, and data sources to train your model.
  • Dataset Management - Editing, versioning, RBAC, and synthetic data augmentation to create self-correcting datasets.
  • Diagnosing and improving underperforming queries with regression testing & detailed data diagnostic tools.
  • Use production logs and feedback to auto-generate datasets.

🙋‍♂️ Let's Take Your Data to Production and Get You Started

Sign up here to generate your first dataset. Or book a call with us for help in getting started.

Other Company Launches

🎲 FiddleCube - Automated dataset generation for fine-tuning LLMs

Create high-quality datasets for fine-tuning and reinforcement learning.
Read Launch ›

YC Sign Photo

YC Sign Photo