Zep AI

Zep: Fast, accurate structured data extraction for AI assistant apps

10x faster than GPT-4o, with field format and validity guarantees.

Many business and consumer apps must extract structured data from conversations between an LLM-powered Assistant and a human user. Often, the extracted data is the objective of the conversation. Consider completing a sales order, making a reservation, or requesting leave. All of these tasks require progressively collecting data from the conversation.

Latency and correctness are important. You will often want to identify the correct data values you have collected and those you still need. You’ll then prompt the LLM to request the latter.


If you’re making multiple calls to an LLM to extract and validate data on every chat turn, you’re likely adding significant latency to your response. This can be a slow and inaccurate exercise, frustrating your users.

Zep’s new Structured Data Extraction feature is a low-latency, high-accuracy tool for extracting the data you need from Chat History stored in Zep's Long-term Memory service.

Up to 10x faster than gpt-4o. For many multi-field extraction tasks, you can expect latency of under 400ms, with the addition of fields increasing latency sub-linearly.

Comparing Zep with LLM JSON Mode

Many model providers offer a JSON inference mode that guarantees the output will be well-formed JSON.


  • There are no guarantees that the field values will conform to the JSON Schema you define or that they are correct (vs. being hallucinated).
  • All fields are extracted in a single inference call, with additional fields adding linearly or greater to extraction latency.

Preprocessing, Guided LLM Output, and Validation

To ensure that the extracted data is in the format you expect and is valid given the current dialog, Zep uses a combination of:

  • dialog preprocessing, which, amongst other things, improves accuracy for machine-transcribed dialogs;
  • using guided output inference techniques on LLMs running on our own infrastructure;
  • and post-inference validation.

You will not receive back data in an incorrect format when using a Zep field type such as email, zip code, or date time.

While there are limits to the extraction accuracy when the conversation is very nuanced or ambiguous, carefully crafting field descriptions can achieve high accuracy in most cases.

Up to 10x Faster than OpenAI gpt-4o

When comparing like-to-like JSON Schema model extraction against gpt-4o, Zep is up to 10x faster.

Zep's extraction latency scales sub-linearly with the number of fields in your model. That is, you may add additional fields with a low marginal increase in latency.

Support for Partial and Relative Dates

Zep understands various date and time formats, including relative times such as “yesterday” or “last week.” It can also parse partial dates and times, such as “at 3pm” or “on the 15th.”

Extracting from Speech Transcripts

Zep can understand and extract data from machine-transcribed transcripts. Spelled-out numbers and dates will be parsed as if they were written language. Utterances such as “uh” or “um” are ignored.

Using Progressive Data Extraction To Guide LLMs

Your application may need to collect several fields to accomplish a task. Since Zep's SDE is so fast, you can guide the LLM through this process by calling the extractor on every chat turn. This enables you to identify which fields are still needed and then direct the LLM to collect the remaining data.

You have already collected the following data:
- Company name: Acme Inc.
- Lead name: John Doe
- Lead email: john.doe@acme.com

You still need to collect the following data:
- Lead phone number
- Lead budget
- Product name
- Zip code

Do not ask for all fields at once. Rather, work the fields 
into your conversation with the user and gradually collect the data.

Next Steps

  1. Learn more in the Zep Structured Data Guide.
  2. Sign up for Zep's Long-term Memory Service for AI Assistants.