Generate and manage high-quality training datasets for LLMs

FiddleCube enables developers to create high-quality datasets using AI. Our data platform enables users to:

1. Create datasets using just a prompt, a few seed examples, or a knowledge base of documents.
2. Manage and annotate the datasets, and apply quality metrics and evals.
3. Export the data in a structured format that connects with any GPT or open-source fine-tuning API.
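As a concrete illustration of step 3, here is a minimal sketch of what a structured export might look like. The record schema below is an assumption for illustration, not FiddleCube's documented export format; it uses the OpenAI-style chat JSONL layout that many fine-tuning APIs accept.

```python
import json

# Hypothetical seed examples -- placeholder data, not FiddleCube output.
examples = [
    {"prompt": "What is FiddleCube?",
     "completion": "FiddleCube generates fine-tuning datasets from your data."},
]

# Write one JSON object per line (JSONL), the shape most
# chat fine-tuning endpoints expect.
with open("dataset.jsonl", "w") as f:
    for ex in examples:
        record = {
            "messages": [
                {"role": "user", "content": ex["prompt"]},
                {"role": "assistant", "content": ex["completion"]},
            ]
        }
        f.write(json.dumps(record) + "\n")
```

Each line is a self-contained conversation, so the file can be streamed or split without parsing the whole dataset.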

Team Size: 2
Location: San Francisco
Group Partner: Harj Taggar

Active Founders

Neha Nupoor

Creating high-quality datasets at FiddleCube. Fascinated by AI alignment. Curious about health tech, design, and fitness. Full-stack engineer, part-time illustrator.

Kaushik Srinivasan

Founder of FiddleCube, obsessed with creating high-quality datasets and taking baby steps toward AI alignment. Previously worked as a software engineer for nearly a decade at companies like LinkedIn, Uber, and Google, building software systems that are highly reliable, low-latency, and fault-tolerant at planet scale.


Company Launches

TL;DR: Fine-tuning LLMs requires high-quality datasets. FiddleCube automagically generates fine-tuning datasets from your data.

User Data Source → Fine-tuning Datasets (FiddleCube) → Fine-tuning

Head over to fiddlecube.ai to get started!

Hi everyone, we are Neha and Kaushik. We're building FiddleCube to make high-quality datasets accessible to everyone.

🦸 Kaushik spent most of the last decade building tech at companies like Google, Uber, and LinkedIn.

๐Ÿง™๐Ÿป Neha has spent a similar amount of time as a dev at multiple startups, most recently at Uber

๐Ÿ‘ซ๐Ÿป๐Ÿซถ๐Ÿป We met at Uber, eventually got married, and decided to build a startup together, following our passion for AI.

😤 The Problem

In the real world, LLMs need to be aligned to follow human instructions. They need to respond in a manner that is:

  • Positive, truthful, and honest
  • In accordance with human beliefs and sensibilities

Remarkable outcomes have been achieved towards this end by fine-tuning and reinforcement learning with high-quality datasets. However, creating these datasets takes significant time, manual effort, and money.

💡 The Solution

FiddleCube leverages a suite of AI models to create high-quality datasets for fine-tuning and reinforcement learning.

  • Generate annotated datasets from raw data.
  • Augment the datasets, expanding them into large datasets that significantly improve model performance.
  • Evaluate and improve the quality of your training dataset.

We create rich, diverse, high-quality datasets that produce better models from a smaller corpus of data.

โš™๏ธ Use Cases


Give the model a personality, voice, and tone. For example, you can create a safe Dora the explorer / Peppa Pig model that speaks to children.

๐Ÿ‘ฉ๐Ÿปโ€๐Ÿ’ป API calling and coding

For specific use cases like making API calls or generating code, fine-tuning has demonstrably produced better results. You can fine-tune an LLM on a corpus of code or API data to significantly improve its ability at these tasks.
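To make the API-calling use case concrete, here is a minimal sketch of turning logged API calls into instruction/response training pairs. The log schema, endpoint, and helper function are all hypothetical placeholders for illustration, not FiddleCube's actual pipeline.

```python
import json

# Hypothetical raw API-call logs: a natural-language intent
# paired with the structured call that satisfied it.
api_logs = [
    {"intent": "Get the weather for Paris",
     "call": {"method": "GET", "path": "/v1/weather", "params": {"city": "Paris"}}},
]

def to_training_pair(log):
    """Pair the user's intent with the JSON API call as the target output."""
    return {
        "messages": [
            {"role": "user", "content": log["intent"]},
            {"role": "assistant", "content": json.dumps(log["call"])},
        ]
    }

pairs = [to_training_pair(log) for log in api_logs]
```

Keeping the assistant target as serialized JSON teaches the model to emit a machine-parseable call rather than free-form prose.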

🚄 Increase Throughput, Reduce Latency and Cost

Fine-tuned LLMs can be much smaller than foundation models, so you can use them to increase throughput while reducing latency and cost.

๐Ÿ—บ๏ธ Low Resource Domains

LLMs perform poorly in certain domains, such as vernacular languages, that lack a sufficient corpus of high-quality data. Fine-tuning on generated datasets has shown remarkable improvements over the state of the art in these cases.

๐Ÿ™๐Ÿป Ask

Are you fine-tuning any LLM, or looking to fine-tune LLaMa V2, MPT, or Falcon? We would love to know your use case. Drop a comment on what you are doing, or reach out to us privately!

๐Ÿ‘‹๐Ÿป Need help with fine-tuning?

Book a slot on our calendar 🗓️ or drop us a line using:

- Email 📧: kaushik@fiddlecube.ai

- Typeform 📝

and we will get back to you!

YC Sign Photo