Introducing website sources

Markprompt aims to connect all your sources of content into a cohesive embeddings base from which you can build LLM-native applications. We started with GitHub repositories and file uploads, and are now adding websites.

In the Markprompt dashboard, head over to the Data tab and select "Connect website". Hit train, and Markprompt will crawl all the pages on the website and add them to the embeddings database. You will then be able to ask questions about your website's content!

Subpaths

If you provide a subpath, such as markprompt.com/blog, only pages under /blog will be indexed.
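To make the subpath rule concrete, here is a minimal Python sketch of how such a check could work. It is only an illustration of the behavior described above, not Markprompt's actual crawler code; the function name `is_under_subpath` is hypothetical, and the assumption that the match is made on whole path segments (so that /blogger does not count as being under /blog) is ours.

```python
from urllib.parse import urlparse


def is_under_subpath(page_url: str, root_url: str) -> bool:
    """Return True if page_url is on the same host as root_url
    and sits at or below root_url's path.

    Hypothetical helper: illustrates the subpath rule only.
    """
    page, root = urlparse(page_url), urlparse(root_url)
    if page.netloc != root.netloc:
        return False
    root_path = root.path.rstrip("/")
    # Match the subpath itself or anything nested below it.
    # Appending "/" ensures "/blogger" is not treated as under "/blog".
    return page.path == root_path or page.path.startswith(root_path + "/")


root = "https://markprompt.com/blog"
print(is_under_subpath("https://markprompt.com/blog/some-post", root))
print(is_under_subpath("https://markprompt.com/docs", root))
```

With markprompt.com/blog as the root, the first call reports a page under /blog and the second reports a page outside it.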

Excluding paths

In the project settings, you can set up path inclusion and exclusion rules to omit specific paths, such as your privacy policy or terms pages:

json
{
  "include": [
    "**/*"
  ],
  "exclude": [
    "/terms",
    "/privacy"
  ]
}

You can read more about this in our Path configuration docs.
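As a rough mental model of how these rules combine, the sketch below evaluates a path against the configuration above. It rests on two assumptions that the Path configuration docs should confirm: that a path must match at least one include pattern, and that any exclude match wins. It uses Python's `fnmatchcase`, whose glob semantics may differ from the matcher Markprompt actually uses; the function name `is_path_indexed` is hypothetical.

```python
from fnmatch import fnmatchcase

# Same rules as the JSON configuration above.
config = {
    "include": ["**/*"],
    "exclude": ["/terms", "/privacy"],
}


def is_path_indexed(path: str, include: list[str], exclude: list[str]) -> bool:
    """Hypothetical helper: a path is indexed if it matches an include
    pattern and no exclude pattern (assumption: exclusion takes precedence).
    """
    included = any(fnmatchcase(path, pattern) for pattern in include)
    excluded = any(fnmatchcase(path, pattern) for pattern in exclude)
    return included and not excluded


for path in ["/blog/website-sources", "/terms", "/privacy"]:
    print(path, is_path_indexed(path, config["include"], config["exclude"]))
```

Under these assumptions, the blog path is kept while /terms and /privacy are omitted.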

robots.txt and sitemap.xml

Markprompt respects your robots.txt and sitemap.xml configurations. If a path is listed under Disallow in your robots.txt file, it will not be indexed. If you have a sitemap.xml file, only the paths listed in it will be indexed.
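The two rules can be pictured as successive filters over the candidate paths. The following Python sketch uses the standard library's `urllib.robotparser` to apply the robots.txt rule, then restricts the result to the sitemap entries when a sitemap exists. It is an illustration only, not Markprompt's crawler: the function name `crawlable_paths`, the sample robots.txt content, and the `"*"` user agent are all assumptions, and the sitemap is represented as an already-parsed list of paths rather than real XML.

```python
from urllib.robotparser import RobotFileParser


def crawlable_paths(robots_txt, sitemap_paths, candidate_paths, user_agent="*"):
    """Hypothetical helper: keep only paths that robots.txt allows and,
    if a sitemap is present, that the sitemap lists.

    sitemap_paths is None when the site has no sitemap.xml.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    listed = set(sitemap_paths) if sitemap_paths is not None else None

    kept = []
    for path in candidate_paths:
        if not parser.can_fetch(user_agent, path):
            continue  # disallowed by robots.txt
        if listed is not None and path not in listed:
            continue  # a sitemap exists but does not list this path
        kept.append(path)
    return kept


# Made-up sample data for illustration.
robots_txt = "User-agent: *\nDisallow: /admin\n"
candidates = ["/", "/blog", "/admin", "/admin/users", "/pricing"]

print(crawlable_paths(robots_txt, ["/", "/blog", "/admin"], candidates))
print(crawlable_paths(robots_txt, None, candidates))
```

In the first call, /admin is dropped even though the sitemap lists it, and /pricing is dropped because the sitemap omits it. In the second call there is no sitemap, so only the robots.txt rule applies.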