Fine-tuning LLMs with high-quality synthetic data

FiddleCube helps developers fine-tune & deploy LLMs with synthetic data. We use AI to generate private, high-quality datasets for customers who want custom LLMs but don’t have the resources to create annotated training datasets for fine-tuning & reinforcement learning.

Team Size: 2
Location: San Francisco
Group Partner: Harj Taggar

Active Founders

Neha Nupoor

Creating high-quality datasets at FiddleCube. Fascinated by AI alignment. Curious about health-tech, design and fitness. Full-stack engineer, part-time illustrator.

Kaushik Srinivasan

Founder of FiddleCube, obsessed with creating high-quality datasets. Taking baby steps towards AI alignment. Previously a software engineer for nearly a decade at companies like LinkedIn, Uber and Google. Experienced in building software systems that are highly reliable, low-latency and fault-tolerant at planet scale.

Company Launches

TL;DR: Fine-tuning LLMs requires high-quality datasets. FiddleCube automagically generates fine-tuning datasets from your data.

User Data Source > Fine-tuning Datasets (FiddleCube) > Fine-tuning

Head over to fiddlecube.ai to get started!

Hi everyone, we are Neha and Kaushik. We’re building FiddleCube to make high-quality datasets accessible to everyone.

🦸 Kaushik spent most of the last decade building tech at companies like Google, Uber, and LinkedIn.

🧙🏻 Neha has spent a similar amount of time as a dev at multiple startups, most recently at Uber.

👫🏻🫶🏻 We met at Uber, eventually got married, and decided to build a startup together, following our passion for AI.

😤 The Problem

In the real world, LLMs need to be aligned to follow human instructions. They need to respond in a manner that is:

  • Positive, Truthful & Honest
  • And in accordance with human beliefs and sensibilities

Remarkable outcomes have been achieved towards this end by fine-tuning and reinforcement learning with high-quality datasets. However, creating these datasets takes significant time, manual effort, and money.

💡 The Solution

FiddleCube leverages a suite of AI models to create high-quality datasets for fine-tuning and reinforcement learning.

  • Generate annotated datasets from raw data.
  • Augment the datasets - create large datasets to significantly improve model performance.
  • Evaluate and improve the data quality of your training dataset.

We create a rich, diverse, high-quality dataset to produce better models from a smaller corpus of data.
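To make the "evaluate and improve the data quality" step a little more concrete, here is a minimal sketch of a quality pass over prompt/completion pairs before fine-tuning. It is an illustration only, not FiddleCube's pipeline or API: the record fields (`prompt`, `completion`), the function names and the `min_completion_chars` threshold are all assumptions made for this example.

```python
import json


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so near-identical prompts compare equal."""
    return " ".join(text.lower().split())


def filter_records(records, min_completion_chars=20):
    """Drop records with empty fields, very short completions, or duplicate prompts.

    `records` is assumed to be a list of dicts with "prompt" and "completion" keys
    (a hypothetical schema chosen for this sketch).
    """
    seen = set()
    kept = []
    for rec in records:
        prompt = rec.get("prompt", "").strip()
        completion = rec.get("completion", "").strip()
        if not prompt or len(completion) < min_completion_chars:
            continue  # incomplete or too thin to be a useful training example
        key = normalize(prompt)
        if key in seen:
            continue  # keep only the first example for each distinct prompt
        seen.add(key)
        kept.append({"prompt": prompt, "completion": completion})
    return kept


def to_jsonl(records) -> str:
    """Serialize records as JSON Lines, one training example per line."""
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in records)


if __name__ == "__main__":
    raw = [
        {"prompt": "What is the refund window?",
         "completion": "Refunds are accepted within 30 days of purchase."},
        {"prompt": "what is the  refund window?",
         "completion": "You can request a refund up to 30 days after buying."},
        {"prompt": "How do I reset my password?", "completion": "Click it."},
    ]
    print(to_jsonl(filter_records(raw)))
```

Simple rules like these only catch the obvious problems (empty fields, thin answers, repeated prompts). Judging whether a completion is truthful or on-tone needs a stronger evaluator, which is where model-based scoring would come in.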

⚙️ Use Cases


Give the model a personality, voice, and tone. For example, you can create a safe Dora the Explorer / Peppa Pig model that speaks to children.

👩🏻‍💻 API calling and coding

For specific use cases like making API calls or generating code, fine-tuning has demonstrated better results. You can fine-tune an LLM on a corpus of code or API data to significantly improve its ability at these tasks.

🚄 Increase Throughput, Reduce Latency and Cost

Fine-tuned LLMs can be much smaller than general-purpose foundation models while still handling a specific task well. You can use them to increase throughput and reduce latency and cost.

🗺️ Low Resource Domains

LLMs perform poorly in certain domains like vernacular languages. These domains lack a sufficient corpus of high-quality data. Fine-tuning using generated datasets has shown remarkable improvements over the state of the art in these cases.

🙏🏻 Ask

Are you fine-tuning an LLM, or looking to fine-tune Llama 2, MPT, or Falcon? We would love to know your use case. Drop a comment on what you are doing, or reach out to us privately!

👋🏻 Need help with fine-tuning?

Book a slot on our calendar 🗓️ or drop us a line using:

- Email 📧 : kaushik@fiddlecube.ai

- Typeform 📝

and we will get back to you!
