Developer platform for fine-tuned LLMs

Empower is a developer platform for fine-tuned LLMs. It aims to provide best-in-class infrastructure and prebuilt, task-specific base models as building blocks, enabling developers to cost-effectively build and deploy fine-tuned LLMs for their specific use cases. The goal is an alternative to expensive, slow general-purpose LLMs that does not compromise on response quality.

Team Size: 2
Location: San Mateo, CA
Group Partner: Diana Hu

Active Founders

Yulong Liu

Co-founder at Empower. Previously a machine learning engineering manager at Snap and a senior software engineer at Google Research.


Daiyi Yang

Co-founder at Empower. Previously an uber TL at Meta across multiple areas, including Lead Ads, News Feed Experience, and Metaverse Avatar; before Meta, director of engineering at Revinate, leading product development and infrastructure.


Company Launches

Empower-functions is a model that offers GPT-4-level function-calling capabilities, focusing on real-world use cases such as multi-turn and parallel calling, with 3x faster response times and 10x lower cost. Check out our doctor appointment booking bot live demo!

The Problem

The full potential of Large Language Models (LLMs) is realized not only through conversations but also through their integration with external APIs, enabling them to perform actions such as interacting with internal systems for identity verification, booking appointments, and processing checkouts. The capability to call functions is critical to empower a wide range of real-world use cases, including workflow automation and support agent tasks.

Currently, the predominant solution is to use OpenAI's models, which forces a trade-off. GPT-4 offers high response quality but is hindered by significant latency and high costs that rule it out for many use cases; GPT-3.5 is faster and more affordable but far more likely to generate inaccurate responses. There are few alternatives offering a balance of the two: higher response quality than GPT-3.5 with much better performance than GPT-4. The emergence of open-source (OSS) models broadens the possibilities, but none of the current major providers, such as Fireworks, Anyscale, or Together AI, adequately address real-world use cases. For instance, they generally underperform in multi-turn interactions, and few support parallel calling.

The Solution: empower-functions, a model tailored for real-world function calling use cases

Empower-functions is an LLM developed by empower.dev, focused on real-world function calling use cases.

Below, we use a screenshot to showcase how the empower-functions model performs on a complex, multi-turn conversation that requires multiple function calls. For a more hands-on experience, please try our live demo.

Under the hood, the empower-functions model is fine-tuned from the Mixtral-8x7B-Instruct model. We specifically collected data and tailored the model to support multi-turn conversations and to determine automatically whether to trigger functions. These efforts ensure the best performance in real-world use cases, which typically involve multi-turn conversations interleaved with function calls. Leveraging our proprietary inference engine, we have reduced TTFT (time to first token) latency to under 400ms, a substantial improvement over GPT-4's one-second latency. We are offering this model at a price point of $1.50 per million tokens.

To comprehensively assess the response quality of the model, we benchmarked it across three datasets (all of the datasets can be found here):

  • Single Turn Dataset: The model is evaluated for its ability to execute a precise function call, assessing both the accuracy of the selected function and the arguments.
  • Parallel Call Dataset: In this scenario, the model demonstrates its capacity to handle multiple (2-6) function calls within a single message, a feature not supported by Fireworks and Anyscale.
  • Multi-Turn Dataset: Designed to simulate a complex real-world environment, such as a healthcare appointment booking system, the model navigates between natural conversation, initiating function calls, asking clarifying questions, and, when necessary, transferring to customer service. The assessment focuses on the accuracy of intent classification and the correctness of function calls.
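To make the parallel-calling scenario concrete, here is a short sketch of what a parallel function call looks like in OpenAI-style chat-completion output: a single assistant message carrying several tool calls, each dispatched independently. The message below is a fabricated illustration (the `get_weather` function and its arguments are invented for this example), not real model output.

```python
import json

# A hypothetical assistant message containing two parallel tool calls,
# in the OpenAI chat-completion format.
assistant_message = {
    "role": "assistant",
    "content": None,
    "tool_calls": [
        {"id": "call_1", "type": "function",
         "function": {"name": "get_weather",
                      "arguments": '{"city": "San Mateo"}'}},
        {"id": "call_2", "type": "function",
         "function": {"name": "get_weather",
                      "arguments": '{"city": "San Francisco"}'}},
    ],
}

# Each call in the list is independent; arguments arrive as a JSON string
# that the client decodes before dispatching to the actual function.
for call in assistant_message["tool_calls"]:
    name = call["function"]["name"]
    args = json.loads(call["function"]["arguments"])
    print(name, args["city"])
```

A model without parallel-calling support would instead need one round trip per function call, which is what the Parallel Call Dataset is designed to measure.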

In the benchmark, we compared the model against other function-calling offerings, including GPT-4, GPT-3.5, Fireworks' firefunction, Together, and Anyscale. For Together and Anyscale, we used mistralai/Mixtral-8x7B-Instruct-v0.1, as it represents their best offering. Empower-functions consistently delivers superior performance in all scenarios, especially on the multi-turn and parallel-calling datasets, which are closer to real-world use cases.

How to Use

We have made the model generally available on our platform today. You can experiment with our live demo for a hands-on experience with the model in a real-world use case. To use the model in your project, simply sign up for an account and obtain an API key. We also provide free credits to get you started; see our quick start guide.

The completion API we provide is fully compatible with the OpenAI API, allowing you to use the empower-functions model as a drop-in replacement. More details can be found in our function calling documentation.
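Because the API is OpenAI-compatible, switching over amounts to pointing the OpenAI SDK at a different base URL. The sketch below shows the pattern; the base URL and model identifier are assumptions for illustration, so check the Empower quick start guide and function calling documentation for the real values.

```python
import json

def build_tool(name: str, description: str, parameters: dict) -> dict:
    """Wrap a function definition in the OpenAI-style tool schema."""
    return {
        "type": "function",
        "function": {"name": name,
                     "description": description,
                     "parameters": parameters},
    }

# An example tool for the doctor appointment booking use case
# (function name and fields invented for illustration).
book_appointment = build_tool(
    "book_appointment",
    "Book a doctor appointment for the patient.",
    {
        "type": "object",
        "properties": {
            "doctor": {"type": "string"},
            "time": {"type": "string", "description": "ISO 8601 datetime"},
        },
        "required": ["doctor", "time"],
    },
)

# With the OpenAI Python SDK, only the API key and base_url change
# (both values below are assumed, not documented endpoints):
#
# from openai import OpenAI
# client = OpenAI(api_key="YOUR_EMPOWER_KEY", base_url="https://app.empower.dev/v1")
# resp = client.chat.completions.create(
#     model="empower-functions",
#     messages=[{"role": "user",
#                "content": "Book me with Dr. Lee tomorrow at 10am."}],
#     tools=[book_appointment],
# )

print(json.dumps(book_appointment, indent=2))
```

The same `tools` list and message format used with GPT-4 or GPT-3.5 should carry over unchanged, which is what makes the model a drop-in replacement.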

Our Asks

  • Try our online demo at https://app.empower.dev/chat-demo, and sign up to grab an API key to use in your projects.
  • Share this post! Please help spread the word to anyone who might need it.
  • Contact us at founders@empower.dev, or book a time with us here, for any feedback about the model or any use case you have in mind for fine-tuned LLMs. Our platform provides full support for both infrastructure and modeling.
