Prompts are emerging as a new way to write application logic, and this is particularly clear when it comes to manipulating text. In this post, we want to share a text transformation problem we recently tackled using a carefully engineered prompt. A few months ago, we would have resorted to the usual techniques, such as regular expressions and parsers. These techniques are comforting in that they are well known and work exactly as expected (they are deterministic). However, the prompt approach turned out to be easy to implement, flexible, and cheap. Most importantly, it enables end users (not just us, developers of the platform) to customize the behavior in infinitely many new ways, essentially giving everyone a way to "program behavior".
Caveat: this world is new to us. Our approach feels right and feels powerful, but time will tell whether it is reliable at scale and will turn into an industry best practice.
The problem
One of Markprompt's enterprise customers has been syncing a large docs repository containing Markdown files in order to create a Q&A bot. It works remarkably well. However, after conducting some QA, we noticed that some links in the generated responses would not be valid. Investigating the source Markdown files, we saw the following:
- Some links were anchor links, like `[Step 1](#step1)`.
- Some links were file paths relative to the file tree, like `[Welcome](/docs/welcome.md)`. In the live website, the `/docs` root and the extension are removed, so the link should read `/welcome`.
- Some links were relative to the current file, like `[Start](projects/start.md)` inside `/docs/guides/index.md`. In the live website, the two paths should be merged into `/guides/projects/start`.
In addition, a prompt response can be built from several source files. For instance, a response like this:
> For projects, refer to the Projects tutorial, and for spaces, refer to the Spaces tutorial.
could be made up from two distinct source files, each contributing one of the links.
Furthermore, just like a search box, the prompt interface can be opened from any page, so an anchor link will only work if it includes the full path of the page it belongs to.
So the problem is: how can we make sure that all links contained in prompt responses are fully specified, and independent of the page from which the response is being displayed?
The "software engineer solution"
One obvious approach to solving this problem is to parse all Markdown content during training using Remark, detect Markdown links, and prepend a base path depending on the file that is being processed. That solution would guarantee a correct outcome.
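To make this concrete, here is a minimal sketch of what such a transformation could look like. The `/docs` stripping and extension removal mirror the examples above; everything else is an illustrative assumption:

```ts
// Sketch of the "software engineer solution": rewrite link URLs at training
// time with Remark. The rules below mirror one customer's conventions and
// would need to be made configurable for others.
import { remark } from 'remark';
import { visit } from 'unist-util-visit';
import type { Root, Link } from 'mdast';
import path from 'node:path';

// Map a repository file path to a live-site path, e.g.
// "/docs/guides/projects/start.md" -> "/guides/projects/start".
const toPagePath = (p: string): string =>
  p.replace(/^\/docs/, '').replace(/\.mdx?$/, '');

// Remark plugin that makes every link fully specified.
const rewriteLinks = (filePath: string) => () => (tree: Root) => {
  visit(tree, 'link', (node: Link) => {
    if (node.url.startsWith('#')) {
      // Anchor link: prepend the full page path.
      node.url = toPagePath(filePath) + node.url;
    } else if (node.url.startsWith('/')) {
      // Absolute link: strip the /docs root and the extension.
      node.url = toPagePath(node.url);
    } else if (!/^[a-z]+:/i.test(node.url)) {
      // Relative link: resolve against the containing file's directory.
      node.url = toPagePath(path.join(path.dirname(filePath), node.url));
    }
  });
};

const transform = async (markdown: string, filePath: string): Promise<string> =>
  String(await remark().use(rewriteLinks(filePath)).process(markdown));
```

For instance, `transform('[Start](projects/start.md)', '/docs/guides/index.md')` yields `[Start](/guides/projects/start)`, matching the expected live-site link.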
However, we were not satisfied with this solution for the following reasons:
- As a platform, our solution needs to work for any customer, not just for a single one. So should we expose an API that allows customers to specify how links should be transformed? One customer may want to remove the `/docs` base path, another may have a `/blog` base path, another may want to keep links verbatim. It introduces new complexity in our codebase.
- All content would need to be retrained, and this would be costly (Markprompt has 500,000 indexed sections currently).
- We were not comfortable with altering the original content. Perhaps the same customer would want the transformed content in some situations, and the original content in others. Better keep the source content intact.
So if we didn't want to modify the original content, could we move the transformation to the other end of the pipeline, when the response is being generated for the user? We were not happy about this solution either:
- We would need to put in place some middleware (either by shipping a Markdown parser to the client, adding to the bundle size, or transforming the response stream in a server component, costing CPU cycles).
- Like before, we'd need to create an API to allow customers to specify rules for transforming the content.
The "prompt engineer solution"
Having played around extensively with GPT-4 (for instance, to simulate CRDTs for text collaboration), we knew the power of the approach. With sufficiently clear instructions, logical and predictable behavior could be achieved. So we started experimenting with altering the input prompt to perform the link transformation task.
Let's first go through how Markprompt works. When a customer sends their data (e.g. a set of Markdown files) for training, each file is chunked into small sections and stored as embeddings, which capture the "meaning" of the section. Then, when a user asks a question about the docs, Markprompt finds the sections that are closest in meaning to the question, and thus most likely to contain the key information needed to produce a satisfactory response. These relevant sections are then injected into the following "parent" prompt:
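A minimal sketch of such a parent prompt (the wording and the `{{PROMPT}}` placeholder are illustrative, not the exact production template):

```
You are a helpful assistant answering questions about our product.
Answer the question using only the context provided below. If the answer
is not contained in the context, say "Sorry, I don't know."

Context:
{{CONTEXT}}

Question: {{PROMPT}}

Answer:
```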
Here, `{{CONTEXT}}` corresponds to these relevant sections, and it is in these sections that we can have links that need to be transformed.
In addition to the section content, Markprompt also injects an identifier that gives information about where the section is taken from. This identifier is typically the path of the file in a repository. So the final prompt sent to the completions endpoint looks something like this:
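Building on the earlier examples (the exact section delimiters and labels here are assumptions), the injected context might read:

```
Section id: /docs/guides/index.md

To get started, head over to [Start](projects/start.md).

---

Section id: /docs/welcome.md

Welcome! Begin with [Step 1](#step1).
```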
It turns out that we can get far with this basic information. After a few tries in the OpenAI GPT-4 playground, we came up with a way to instruct GPT-4 to transform the links to fulfill the requirements:
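Paraphrased, the added instructions read along these lines:

```
When writing links in your answer, transform them as follows:
- Anchor links (starting with "#"): prepend the full page path of the
  section they appear in, e.g. "#step1" in "/docs/welcome.md" becomes
  "/welcome#step1".
- Absolute links: remove the leading "/docs" and the ".md" extension,
  e.g. "/docs/welcome.md" becomes "/welcome".
- Relative links: resolve them against the path of the section they
  appear in, then apply the same rules, e.g. "projects/start.md" in
  "/docs/guides/index.md" becomes "/guides/projects/start".
- Keep all other links untouched.
```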
It works like a charm. As you can see from the prompt, we provide clear and explicit instructions as to how links should be transformed, as well as examples. Anchor links are correctly prepended, `/docs` is removed from absolute paths, the `.md` file extension is gone, and relative paths are rebuilt taking into account the relative location of the parent file. All other links are kept intact.
What is more, this prompt is exposed as a template via the `templatePrompt` parameter in our API. This means that any customer can tweak it to suit their needs, essentially giving them a way to program behavior, similar to a runtime with evals.
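As a hypothetical sketch of what this looks like from a customer's perspective (only `templatePrompt` is part of our API; the endpoint URL and the other request fields below are illustrative assumptions):

```ts
// Hypothetical sketch: overriding the parent prompt with a custom template.
// Only `templatePrompt` is named above; the endpoint and other fields are
// assumptions for illustration.
const templatePrompt = `Answer the question using only the context below.
When writing links in your answer, keep them verbatim.

Context:
{{CONTEXT}}

Question: {{PROMPT}}`;

const res = await fetch('https://api.markprompt.com/v1/completions', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    Authorization: `Bearer ${process.env.MARKPROMPT_TOKEN}`,
  },
  body: JSON.stringify({
    prompt: 'How do I create a project?',
    templatePrompt,
  }),
});
console.log(await res.text());
```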
Conclusion
We loved this approach because we didn't have to change anything in the current setup (no new APIs to expose and maintain, no migrations, no middleware). We also loved the approach because it is versatile. It can be used not just for links, but for any transformation task, such as styling images, harmonizing tone and style, and even translating content. It's also easy to share: instead of publishing libraries providing new functionality (e.g. on NPM), users can share prompt snippets. We plan to create a shared repository of prompts, so people can share their "prompt hacks".
Of course, this is not a perfect solution. In fact, it's not guaranteed to work. LLMs are by nature non-deterministic (in Markprompt, we mitigate this by setting a low temperature and instructing the model, in the prompt, to stick strictly to the data source). New edge cases may come up. But so far, the results are promising.
We genuinely enjoyed coming up with the above solution. Instead of writing machine instructions in code, it felt like teaching a task to a novice. There is something special about explaining things in a simple and succinct manner, and it feels like programming computers will increasingly be akin to teaching and learning from each other, an incredibly gratifying feeling!