
AI Engineering Platform

Latitude
Helping product teams build AI flows with confidence

Problem

Building AI features is unpredictable: LLMs are not deterministic and are highly sensitive to prompt changes, collaboration between developers and domain experts is difficult, and there is little observability into how prompts perform in production.

Solution

A collaborative platform where product teams can create, version, evaluate, optimize and deploy AI prompts and agents with confidence.

Result

Signals of product-market fit: +3k stars on GitHub and first paying users within the first 2 months.

My Role

Product Designer

Team

Gerard Clos

CTO

Andres Gutierrez

Senior Software Engineer

Carlos Sansón

Software Engineer

César Migueláñez

CEO


Problem

At Latitude, we first ran into this problem with our previous pivot, a data chatbot. We iterated on its prompt many times with no control over, or confidence in, the results, even though the data required 100% accuracy. We then started interviewing product teams and found common pain points when building AI features.

  • LLMs are not deterministic — They can't guarantee 100% accuracy on every request and may hallucinate.
  • Susceptible to changes — Any change in the prompt can impact the output's performance, improving specific cases but degrading others.
  • Collaboration is hard — Some companies put the prompt in the code, and domain experts or other non-technical members can't edit it.
  • Testing all cases is difficult — It's hard to test an AI workflow against every use case and large amounts of real data before deploying it.
  • Lack of observability — It's hard to know how a prompt performs in production.
  • No control over iterations — Each company has its own system to track prompt versions: Excel, Notion pages, Google Docs, etc.

Opportunity

Through this research, we found many companies facing common problems with LLMs, a rapidly growing technology. The global LLM market is projected to reach $259 billion by 2030.

The market was young and lacked a solution to this problem. We could build a product to fill the gap and ride the wave, so we pivoted to position Latitude as the go-to platform for prompt engineering.

Solution

After analyzing the problem, we divided the solution into 3 parts: Build, Evaluate, and Refine.

First, we covered the basics: build and test prompts easily. Then, evaluate them with observability into performance. Finally, help users refine prompts and close the loop, increasing accuracy and success.

Projects and Version Control

The first step was solving collaboration and tracking prompt iterations. We based Latitude on projects and a version control system similar to Git.

Each project can contain multiple prompts, which are essentially files with instructions for the LLM. These prompts have versions that store the history of modifications.

Both developers and other team members can work together in a draft version, editing a prompt before deploying it. Once deployed, the version is live, and users can access the prompts in their apps through our API or SDK.
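
To illustrate how an app might consume a live prompt, here is a minimal TypeScript sketch. The package name, client class, and run() signature are assumptions for illustration, not Latitude's actual SDK.

```typescript
// Hypothetical client: the package name, class, and run() signature are
// illustrative assumptions, not Latitude's actual SDK surface.
import { LatitudeClient } from "@latitude/sdk";

const client = new LatitudeClient({ apiKey: process.env.LATITUDE_API_KEY });

async function summarizeTicket(ticketBody: string): Promise<string> {
  // Runs whatever version is currently live for this prompt path, so the app
  // never hardcodes prompt text or tracks versions itself.
  const result = await client.run("support/summarize-ticket", {
    parameters: { ticket: ticketBody },
  });
  return result.text;
}
```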

Editor

The second step was creating an editor to build and edit prompts. Users have access to a simple tag syntax to define system, user, and assistant messages.

We added logic to cover complex prompt patterns. Users can reference other prompts, use parameters, loops, conditions, and chains of prompts to reuse results from one prompt in another.

A live preview lets users see the result and test it in real time while editing.
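
As a rough mental model, a prompt written with these tags resolves into a list of role-tagged messages with its parameters filled in before being sent to the model. The TypeScript sketch below is an assumption made for illustration, not the platform's internal representation.

```typescript
// Illustrative only: a prompt with a question parameter, rendered into the
// role-tagged message array sent to the LLM provider.
type Role = "system" | "user" | "assistant";
interface Message { role: Role; content: string; }

function render(question: string): Message[] {
  return [
    { role: "system", content: "You are a concise support assistant." },
    { role: "user", content: `Customer question: ${question}` },
  ];
}
```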


Evaluations

Building and collaborating on prompts is covered, but users still face uncertainty about prompt performance.

We solve this with evaluations. Evaluations are prompts with instructions to analyze the output of another prompt and return a score. This lets users check prompt quality before and after deployment, using test data and production logs.
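
As a simplified sketch of the idea, an evaluation is itself a prompt that receives a log (input and output) and returns a structured score. The callLLM helper and the JSON format below are assumptions for illustration, not the platform's real evaluation runner.

```typescript
// Minimal LLM-as-judge sketch. callLLM is a hypothetical helper that sends a
// prompt to a model and returns its text response.
interface EvalResult { score: number; reason: string; }

async function evaluateAccuracy(
  input: string,
  output: string,
  callLLM: (prompt: string) => Promise<string>,
): Promise<EvalResult> {
  const judgePrompt =
    `You are grading an AI assistant's answer.\n` +
    `Question: ${input}\nAnswer: ${output}\n` +
    `Return JSON like {"score": 4, "reason": "..."} with score from 1 (wrong) to 5 (fully accurate).`;

  // The judge's structured output becomes the metric stored with the log.
  const raw = await callLLM(judgePrompt);
  return JSON.parse(raw) as EvalResult;
}
```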

Refiner and Copilot

The final step is turning evaluation results into action and closing the loop. Now that users know a prompt isn't working, how can it be improved?

We added a feature to refine prompts using evaluation results. Users can pass bad evaluation results to our refiner system, which uses a series of prompts to improve the original. Instead of starting from scratch, users get a revised prompt they can test again with minimal effort.
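
A rough sketch of that loop, again with hypothetical types and a generic callLLM helper rather than the actual refiner pipeline:

```typescript
// Hypothetical refine step: failing evaluation results are fed back to a
// model that proposes an improved version of the original prompt.
interface FailedLog { input: string; output: string; reason: string; }

async function refinePrompt(
  originalPrompt: string,
  failures: FailedLog[],
  callLLM: (prompt: string) => Promise<string>,
): Promise<string> {
  const examples = failures
    .map(f => `Input: ${f.input}\nOutput: ${f.output}\nWhy it failed: ${f.reason}`)
    .join("\n---\n");

  // Ask the model for a revised prompt that addresses the observed failures.
  return callLLM(
    `Here is a prompt:\n${originalPrompt}\n\n` +
    `It produced these failing cases:\n${examples}\n\n` +
    `Rewrite the prompt so these cases would pass. Return only the new prompt.`,
  );
}
```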

We also added Copilot, an LLM-based assistant that knows the platform's syntax and capabilities, helping users with any request about their prompts.


Success

To validate the solution, we talked to potential users through a three-step process: first, an interview to understand the problem and context; second, an interview with a prototype to get feedback and iterate; third, a beta where users could test the solution, give real feedback, and generate usage metrics we could track.

During the first month, we had more than 1,000 registered users and a retention rate between 10% and 20%. The next step is to iterate on the product and increase this metric.