Markprompt aims to connect all your sources of content into a cohesive embeddings base from which you can build LLM-native applications. We started with GitHub repositories and file uploads, and are now adding websites.
In the Markprompt dashboard, head over to the Data tab and select "Connect website". Hit train, and Markprompt will crawl all the pages on the website and add them to the embeddings database. You can then ask questions about your website's content!
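To make the crawl step more concrete, here is a minimal breadth-first crawl sketch. It is not Markprompt's crawler: the `SITE` link map, the `crawl` function and the `example.com` URLs are hypothetical, and a real crawler would download each page over HTTP and extract its links rather than read them from a dictionary.

```python
from collections import deque
from typing import Callable
from urllib.parse import urljoin, urlparse

# Hypothetical site: maps each page URL to the raw hrefs found on that page.
# A real crawler would download each page and extract its <a href> links instead.
SITE = {
    "https://example.com/": ["/docs", "/blog", "https://other.com/"],
    "https://example.com/docs": ["/", "/docs/setup"],
    "https://example.com/docs/setup": ["/docs"],
    "https://example.com/blog": ["/blog/launch"],
    "https://example.com/blog/launch": [],
}


def crawl(start_url: str, get_links: Callable[[str], list[str]]) -> list[str]:
    """Breadth-first crawl that stays on start_url's host and visits each page once."""
    host = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([start_url])
    order = []
    while queue:
        url = queue.popleft()
        order.append(url)  # in a real pipeline, this page would be sent for indexing
        for href in get_links(url):
            link = urljoin(url, href)  # resolve relative hrefs against the current page
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return order


pages = crawl("https://example.com/", lambda url: SITE.get(url, []))
print(pages)
```

The `seen` set is what keeps the crawl finite: pages that link back to each other (like `/docs` and `/docs/setup` here) are only visited once, and links to other hosts are skipped.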
## Subpaths
If you provide a subpath, such as `markprompt.com/blog`, only pages under `/blog` will be indexed.
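As a rough illustration, a subpath rule can be expressed as a prefix check on the URL path. This is a hypothetical sketch, not Markprompt's code: the `is_under_subpath` helper is made up, full `https://` URLs are assumed, and treating `/blog` itself as included while excluding look-alike siblings such as `/blogroll` is an assumption about the intended behavior.

```python
from urllib.parse import urlparse


def is_under_subpath(url: str, subpath_url: str) -> bool:
    """Return True if url is on the same host as subpath_url and at or below its path."""
    page, base = urlparse(url), urlparse(subpath_url)
    if page.netloc != base.netloc:
        return False
    prefix = base.path.rstrip("/")
    # Accept the subpath itself and anything nested below it, but not sibling
    # paths that merely share the same characters (e.g. /blogroll for /blog).
    return page.path == prefix or page.path.startswith(prefix + "/")


base = "https://markprompt.com/blog"
for url in [
    "https://markprompt.com/blog",
    "https://markprompt.com/blog/some-post",
    "https://markprompt.com/blogroll",
    "https://markprompt.com/docs",
]:
    print(url, is_under_subpath(url, base))
```

Comparing against `prefix + "/"` rather than the bare prefix is the design choice that keeps `/blogroll` out when the subpath is `/blog`.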
## Excluding paths
In the project settings, you can set up path inclusion and exclusion rules to omit specific paths, such as your privacy policy or terms pages:
```json
{
  "include": ["**/*"],
  "exclude": ["/terms", "/privacy"]
}
```
You can read more about this in our Path configuration docs.
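One way to read the example rules: a path is indexed if it matches at least one `include` pattern and no `exclude` pattern. The sketch below applies that reading with Python's `fnmatch.fnmatchcase` as a stand-in glob matcher. The `should_index` helper is hypothetical, and `fnmatch` semantics are an assumption: its `*` also matches `/`, so Markprompt's own matcher may treat `**` and `*` differently.

```python
import json
from fnmatch import fnmatchcase

# The rules from the example configuration above, parsed from JSON.
RULES = json.loads('{"include": ["**/*"], "exclude": ["/terms", "/privacy"]}')


def should_index(path: str, rules: dict) -> bool:
    """Index a path if it matches at least one include pattern and no exclude pattern.

    fnmatchcase is a stand-in glob matcher: its "*" also matches "/", so "**/*"
    here simply means "any path containing a slash". Markprompt's own matcher
    may interpret "**" and "*" differently.
    """
    included = any(fnmatchcase(path, pattern) for pattern in rules.get("include", []))
    excluded = any(fnmatchcase(path, pattern) for pattern in rules.get("exclude", []))
    return included and not excluded


for path in ["/docs/setup", "/blog", "/terms", "/privacy"]:
    print(path, should_index(path, RULES))
```

In this sketch, `/terms` excludes only that exact path; excluding pages nested beneath it would take an extra pattern such as `/terms/*`.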
## `robots.txt` and `sitemap.xml`
Markprompt respects your `robots.txt` and `sitemap.xml` configurations. If a path is listed under `disallow` in your `robots.txt` file, it will not be included. If you have a `sitemap.xml` file, only paths listed there will be included.
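A rough sketch of how these two files could filter candidate URLs, using Python's standard `urllib.robotparser` and `xml.etree.ElementTree`. The file contents, the `example.com` URLs and the `filter_urls` helper are hypothetical, and this is not Markprompt's actual implementation; it only follows the two rules stated above: a disallowed path is dropped, and when a sitemap exists, only URLs listed in it are kept.

```python
import xml.etree.ElementTree as ET
from typing import Optional
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt and sitemap.xml contents for example.com.
ROBOTS_TXT = """\
User-agent: *
Disallow: /internal
"""

SITEMAP_XML = """\
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/docs</loc></url>
  <url><loc>https://example.com/internal/notes</loc></url>
</urlset>
"""

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
CANDIDATES = [
    "https://example.com/docs",
    "https://example.com/internal/notes",
    "https://example.com/pricing",
]


def filter_urls(candidates: list[str], robots_txt: str,
                sitemap_xml: Optional[str] = None, user_agent: str = "*") -> list[str]:
    """Drop URLs disallowed by robots.txt; if a sitemap is given, keep only URLs it lists."""
    robots = RobotFileParser()
    robots.parse(robots_txt.splitlines())
    listed = None  # None means "no sitemap", so the sitemap rule does not apply
    if sitemap_xml is not None:
        root = ET.fromstring(sitemap_xml)
        listed = {loc.text.strip() for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)}
    return [
        url for url in candidates
        if robots.can_fetch(user_agent, url) and (listed is None or url in listed)
    ]


print(filter_urls(CANDIDATES, ROBOTS_TXT, SITEMAP_XML))
print(filter_urls(CANDIDATES, ROBOTS_TXT))
```

Note that `/internal/notes` is dropped even though the sitemap lists it, because the `disallow` rule still applies; without a sitemap, `/pricing` is kept since only `robots.txt` is consulted.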