Fine-tuning LLMs with high-quality synthetic data

FiddleCube helps developers fine-tune & deploy LLMs with synthetic data. We use AI to generate private, high-quality datasets for customers who want custom LLMs but don’t have the resources to create annotated training datasets for fine-tuning & reinforcement learning.

Neha Nupoor

Creating high-quality datasets at FiddleCube. Fascinated by AI alignment. Curious about health-tech, design and fitness. Full-stack engineer, part-time illustrator.

Kaushik Srinivasan

Founder of FiddleCube, obsessed with creating high-quality datasets. Taking baby steps towards AI alignment. Previously a software engineer for nearly a decade at companies like LinkedIn, Uber and Google. Experienced in building software systems that are highly reliable, low-latency and fault-tolerant at planet scale.

TL;DR: Fine-tuning LLMs requires high-quality datasets. FiddleCube automagically generates fine-tuning datasets from your data.

User Data Source > Fine-tuning Datasets (FiddleCube) > Fine-tuning

Head over to fiddlecube.ai to get started!

Hi everyone, we are Neha and Kaushik. We’re building FiddleCube to make high-quality datasets accessible to everyone.

🦸 Kaushik spent most of the last decade building tech at companies like Google, Uber, and LinkedIn.

🧙🏻 Neha has spent a similar amount of time as a dev at multiple startups, most recently at Uber.

👫🏻🫶🏻 We met at Uber, eventually got married, and decided to build a startup together, following our passion for AI.

😤 The Problem

In the real world, LLMs need to be aligned to follow human instructions. They need to respond in a manner that is:

  • Positive, truthful, and honest
  • In accordance with human beliefs and sensibilities

Remarkable outcomes have been achieved towards this end by fine-tuning and reinforcement learning with high-quality datasets. However, creating these datasets takes significant time, manual effort, and money.

💡 The Solution

FiddleCube leverages a suite of AI models to create high-quality datasets for fine-tuning and reinforcement learning.

  • Generate annotated datasets from raw data.
  • Augment the datasets - create large datasets to significantly improve model performance.
  • Evaluate and improve the data quality of your training dataset.

We create a rich, diverse, high-quality dataset to produce better models with a smaller corpus of data.
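To make the idea of a fine-tuning dataset concrete, here is a minimal sketch of turning raw question/answer pairs into prompt/completion records with a basic quality pass. It is an illustration only, not FiddleCube's pipeline or API: the function names, the `prompt`/`completion` field names, the sample data, and the 10-character threshold are all assumptions chosen for the example.

```python
import json


def to_record(question: str, answer: str) -> dict:
    """Wrap a raw Q&A pair in a prompt/completion record (hypothetical field names)."""
    return {"prompt": question.strip(), "completion": answer.strip()}


def clean(records: list[dict], min_chars: int = 10) -> list[dict]:
    """Drop exact duplicates (case-insensitive) and very short completions.

    The min_chars threshold is an arbitrary value picked for illustration.
    """
    seen = set()
    kept = []
    for rec in records:
        key = (rec["prompt"].lower(), rec["completion"].lower())
        if key in seen or len(rec["completion"]) < min_chars:
            continue
        seen.add(key)
        kept.append(rec)
    return kept


def to_jsonl(records: list[dict]) -> str:
    """Serialize records as JSON Lines, one record per line."""
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in records)


# Made-up sample data: one good pair, one near-duplicate, one low-quality answer.
raw_pairs = [
    ("How do I reset my password?",
     "Open Settings, choose Security, then select Reset password."),
    ("How do I reset my password? ",
     "Open Settings, choose Security, then select Reset password."),
    ("What is your refund policy?", "Yes."),
]

dataset = clean([to_record(q, a) for q, a in raw_pairs])
jsonl = to_jsonl(dataset)
print(jsonl)
```

In this sample, the duplicate and the too-short answer are filtered out, leaving a single record. Real quality checks would go well beyond deduplication and length, but the shape of the output (one JSON record per line) is a common input format for fine-tuning tools.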

⚙️ Use Cases


Give the model a personality, voice, and tone. For example, you can create a safe Dora the Explorer or Peppa Pig model that speaks to children.

👩🏻‍💻 API calling and coding

For specific use cases like making API calls or generating code, fine-tuning has demonstrated better results. You can fine-tune an LLM on a corpus of code or API data to significantly improve its ability at these tasks.

🚄 Increase Throughput, Reduce Latency and Cost

Fine-tuned LLMs can be much smaller than foundation models. A small model fine-tuned for a specific task can replace a larger general-purpose one, increasing throughput and reducing latency and cost.

🗺️ Low Resource Domains

LLMs perform poorly in certain domains like vernacular languages. These domains lack a sufficient corpus of high-quality data. Fine-tuning using generated datasets has shown remarkable improvements over the state of the art in these cases.

🙏🏻 Ask

Are you fine-tuning an LLM, or looking to fine-tune Llama 2, MPT, or Falcon? We would love to know your use case. Drop a comment about what you are doing, or reach out to us privately!

👋🏻 Need help with fine-tuning?

Book a slot on our calendar 🗓️ or drop us a line using:

- Email 📧 : kaushik@fiddlecube.ai

- Typeform 📝

and we will get back to you!


