It’s been almost 9 months since ChatGPT’s release precipitated an AI hype cycle that has permeated pop culture and led to a wave of VC funding for early-stage startups building their own Large Language Models (LLMs) or applications powered by LLMs. Although ChatGPT’s user base shrank for the first time in June 2023, the race is still on to build enterprise value with LLM-powered tools, with one third of YC’s current cohort being “AI” startups. Many successful companies are already being built on LLM-driven products and use cases:
- Copywriting (Jasper.ai, Copy.ai)
- Data-Querying (OtterTune)
- Code-Generation (GitHub Copilot, CodeGen)
- Content-Generation & Design (Tome, Galileo)
- Coaching & Tutoring (Korbit, Glean)
Over the last few months at OAK’S LAB, we’ve been fortunate enough to work with some of the startups in this space. Although the technology is still relatively new, I wanted to share some best practices that we’ve learnt along the way:
1. Start with an out-of-the-box LLM for your MVP
If you’re a founder wanting to leverage an LLM in your product, you essentially have three options:
- Build and train your own model from scratch
- Fine-tune an existing open-source model
- Use the API of an existing model (such as GPT-4)
As a startup building an MVP, we’d recommend starting off by using the APIs of the well-known models. This lets you see what’s achievable using only prompt engineering (or “in-context learning”). You should be looking to learn quickly whether your idea is feasible and whether you’re solving a real customer problem that people will pay to solve. If you’re going to fail, this approach gives you the chance to do so quickly and cheaply.
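To make this concrete, here’s a minimal sketch of the API-first approach using the OpenAI Python SDK (its pre-1.0, mid-2023 interface). The product idea, summarizing support tickets, and the prompt wording are purely illustrative:

```python
# Validate the core use case with nothing but a prompt and an API call.
# Uses the pre-1.0 OpenAI Python SDK (openai.ChatCompletion), current as of mid-2023.
import os
import openai

openai.api_key = os.environ["OPENAI_API_KEY"]

def summarize_ticket(ticket_text: str) -> str:
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system",
             "content": "You summarize customer support tickets in two sentences."},
            {"role": "user", "content": ticket_text},
        ],
        temperature=0.3,  # lower temperature for more consistent summaries
    )
    return response["choices"][0]["message"]["content"]

print(summarize_ticket("My May invoice was charged twice and nobody has replied to my emails."))
```

If an experiment this small can’t produce output your customers find valuable, no amount of fine-tuning is likely to save the idea.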
While fine-tuning might be the best option for some companies in the future, it requires proprietary data, compute power, and data-science know-how. Because we’re still so early, new tools and new GPUs are released every month that rapidly reduce the cost and difficulty of all three. The open-source models that exist right now will likely all be replaced by superior models within six months. There’s no need to rush into pre-training a model that might be obsolete by the end of the year!
2. Be open to different models
You can get started with OpenAI’s well-known models (GPT-3.5-turbo & GPT-4), but we also recommend testing lesser-known alternatives such as Anthropic’s Claude. Depending on the task you’re trying to perform, different models may generate better results than others. Vercel’s AI playground is a great place to compare how different models perform on the same task.
A single product can also use multiple models. If you’re building a chatbot, you can chain together different “agents” so that different models are called depending on the question that needs to be answered (see the sketch after this list). For example, in one of our products we use:
- GPT-3.5-Turbo for simple tasks to save on cost and increase speed
- GPT-4 for more complex summarization and generative tasks
- Claude when we want our chatbot to sound as “human-like” as possible
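Here’s a hedged sketch of what that routing can look like. The task categories, model names, and the rule-based heuristic are invented for illustration; a real product might let an agent or a classifier make the decision:

```python
# Route each question to the model best suited (and cheapest) for it.
# The heuristic below is a deliberately naive placeholder.
MODEL_BY_TASK = {
    "simple": "gpt-3.5-turbo",  # cheap and fast
    "complex": "gpt-4",         # heavier summarization / generation
    "human_like": "claude-2",   # most natural conversational tone
}

def pick_model(question: str) -> str:
    if len(question) > 500:
        return MODEL_BY_TASK["complex"]
    if question.lower().startswith(("hi", "hello", "thanks")):
        return MODEL_BY_TASK["human_like"]
    return MODEL_BY_TASK["simple"]

print(pick_model("Hello there!"))  # -> claude-2
```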
We’re still very early, so whatever model you use now will almost certainly be superseded within a year. For this reason, it’s important to build your product in a way that is “model-agnostic”, so that you can plug in new and better models over time.
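One simple way to stay model-agnostic is to hide every provider behind a small interface of your own, so that swapping in a better model is a one-line change. The class and method names below are our own illustration, not a standard API:

```python
# Application code depends only on LLMClient, never on a specific provider,
# so next year's model can be dropped in without touching product logic.
from abc import ABC, abstractmethod

import openai  # pre-1.0 SDK, as of mid-2023

class LLMClient(ABC):
    @abstractmethod
    def complete(self, prompt: str) -> str: ...

class OpenAIClient(LLMClient):
    def __init__(self, model: str = "gpt-4"):
        self.model = model

    def complete(self, prompt: str) -> str:
        response = openai.ChatCompletion.create(
            model=self.model,
            messages=[{"role": "user", "content": prompt}],
        )
        return response["choices"][0]["message"]["content"]

# A future AnthropicClient(LLMClient) slots in with no other changes.
```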
3. Understand what skill sets and resources you need
A common misconception that founders still seem to have is that, to build an AI-powered application in 2023, you need “Machine-Learning Engineers” or “Data Scientists.” This simply isn’t true. Building deep-learning algorithms and neural networks may require a specific skill set, but it’s now totally achievable for a more classic “full-stack” software engineer to build AI-powered applications.
Right now, one of the most powerful tools empowering product builders to work with LLMs is LangChain. It’s a framework, developed in 2022, that allows you to “chain” together different components to create advanced use cases around LLMs. Imagine you want to build a chatbot that can query your database to respond to a customer in a specific way. You might want to pull in some personal details: what they’ve purchased and whether they’ve had any previous conversations with you. LangChain provides the tools for engineers without “AI” experience to connect these data sources with prompts and agents and deliver a great experience to the end user.
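To give a flavor, here’s a minimal LangChain sketch (using its mid-2023 Python API) that chains a prompt template to a model. The customer-context variables are illustrative; in a real product they would be loaded from your database:

```python
# Chain a reusable prompt template to a chat model with LangChain.
from langchain.chat_models import ChatOpenAI
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain

prompt = PromptTemplate(
    input_variables=["name", "purchases", "question"],
    template=(
        "You are a support assistant. The customer is {name}, "
        "who has purchased: {purchases}.\n"
        "Answer their question: {question}"
    ),
)

chain = LLMChain(llm=ChatOpenAI(model_name="gpt-3.5-turbo"), prompt=prompt)
print(chain.run(name="Alex", purchases="annual plan", question="How do I upgrade?"))
```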
4. Work on your prompts
Prompt engineering completely alters the paradigm for how software can be built. Rather than code, a lot of your product’s user experience will be defined by prompts written in natural language. Software engineers are necessary to build the underlying logic and infrastructure upon which your product can run, but they might not be skilled in writing prose!
There are numerous resources available to help with prompt engineering, some good, some bad (we recommend this learn prompting guide). It can also be valuable to iterate on your prompts with non-technical team members such as marketers or product managers, who have more experience writing quality, coherent copy. I’ve personally found that my legal background has been a great advantage when creating concise and precise prompts.
In particular, it’s important to provide the model with examples of what a good or bad response might be. This is known as “one-shot” or “few-shot” prompting and has been extremely effective in the applications we’ve built to date.
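For example, here’s what a few-shot prompt might look like. The classification task and the example pairs are invented for illustration:

```python
# Two worked examples show the model the exact format and labels we expect
# before it sees the new input ("few-shot" prompting).
FEW_SHOT_PROMPT = """\
Classify the sentiment of each customer review as positive or negative.

Review: "Setup took five minutes and it just works."
Sentiment: positive

Review: "I was charged twice and nobody answered my emails."
Sentiment: negative

Review: "{review}"
Sentiment:"""

def build_prompt(review: str) -> str:
    return FEW_SHOT_PROMPT.format(review=review)
```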
5. Iterate, iterate, iterate
Even in more classic product development, building software is an inherently uncertain process. This is exponentially magnified when working with LLMs. Because the models are often black boxes, it is virtually impossible to predict the exact output from any given prompt. For this reason, it’s vital to be constantly iterating and testing to find the output you’re looking for.
Typically, for our SaaS products, we release a new version at the end of each bi-weekly sprint. With our LLM-powered products, we’re releasing a new version of the application every 1-2 days. You can also save a lot of time by using OpenAI’s playground to iterate on your prompts rapidly without having to release a new version of your application.
6. Create a way to measure success
If you’re going to release new versions of the product at short intervals, it’s also vital to evaluate each version’s output in a structured way. Your feedback on each version of the product can inform the next iteration, and it can also serve as training data if you later decide to fine-tune a model of your own.
When building our products, we set up success criteria to evaluate the model’s output for each iteration. We’ve then built simple web applications where our team, the startup’s team, and users can evaluate the output using both a simple scoring model and longer-form written feedback.
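As an illustration, here’s a sketch of the kind of feedback record such an evaluation app might capture: a numeric score against the success criteria plus long-form comments, tied to the app and prompt version. The schema is our own invention, not a prescribed format:

```python
# Append one evaluation record per review to a JSONL file for later analysis.
from dataclasses import dataclass, asdict
import datetime
import json

@dataclass
class EvalRecord:
    app_version: str  # which release produced the output
    prompt_id: str    # which prompt variant was used
    score: int        # e.g. 1-5 against the success criteria
    comments: str     # long-form written feedback

record = EvalRecord("v0.4.2", "summary-prompt-3", 4, "Accurate, but too formal.")
with open("evals.jsonl", "a") as f:
    row = {**asdict(record), "ts": datetime.datetime.utcnow().isoformat()}
    f.write(json.dumps(row) + "\n")
```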
7. Manage your budget effectively
The APIs of existing LLMs typically use a token-based pricing model. This means the price you pay each month is a function of the length of your prompts and completions and the number of times the API is called. While competition and GPU power will continue to drive the price down over time, it’s still important that the unit economics of your product work. We recommend modeling out your anticipated token usage and call volume to estimate your total cost, as in the sketch below. You’ll need to continuously revise your budget as you iterate on your product and learn how it’s used.
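Here’s a back-of-the-envelope version of that cost model. The per-1K-token prices are illustrative mid-2023 figures; always check your provider’s current pricing page:

```python
# Estimate monthly spend from call volume and average token counts.
PRICE_PER_1K = {
    # (input, output) in USD per 1,000 tokens -- assumed example values
    "gpt-3.5-turbo": (0.0015, 0.002),
    "gpt-4": (0.03, 0.06),
}

def monthly_cost(model: str, calls: int, prompt_tokens: int, completion_tokens: int) -> float:
    p_in, p_out = PRICE_PER_1K[model]
    return calls * (prompt_tokens / 1000 * p_in + completion_tokens / 1000 * p_out)

# e.g. 10,000 calls/month, 800-token prompts, 300-token completions:
print(f"GPT-3.5: ${monthly_cost('gpt-3.5-turbo', 10_000, 800, 300):,.2f}")  # $18.00
print(f"GPT-4:   ${monthly_cost('gpt-4', 10_000, 800, 300):,.2f}")          # $420.00
```

Running the same workload through both models makes the trade-off tangible: in this example, GPT-4 costs more than twenty times as much per month.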
You can optimize your cost in a number of different ways. The first version of your prompt may be long, but you can continue to reduce its size over time by being extremely concise and effective with your language. It’s also important to carefully consider which model you use for each task. GPT-4 will almost always give you a “better” response than GPT-3.5, but for simpler tasks, the cheaper models may suffice.
We expect AI interest in both the startup and enterprise ecosystems to continue to grow in the upcoming months. If you have any questions on how you can leverage LLMs in your products and build AI applications, please reach out!