Using documents

An agent can use one or more documents, and the same documents can be shared across multiple agents.

To use a document:

  1. In the Navigation menu pane, expand AI Agents, and then click Agents.

  2. Create a new AI Agent, or select an existing one, and click Edit.

  3. In the Documents tab, do the following:

    1. From the 'Use documents' drop-down list, select one of the supported modes. For details, see Supported modes.

    2. Add the relevant documents.

Supported modes

The following modes are supported for working with documents.

RAG in every query

Implements the classic Retrieval-Augmented Generation (RAG) pipeline, where document extracts are added as context for every user query.

This mode is ideally suited for Q&A agents. However, it may be problematic if the agent needs to call tools, as document extracts may “distract” the LLM from following the conversation context.
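As a rough illustration, the context assembled for each query might look like the following. The exact internal prompt format is not documented here; the structure below is a hypothetical sketch:

{
    "messages": [
        {"role": "system", "content": "...agent instructions..."},
        {"role": "system", "content": "Extracts:\n1. Deluxe rooms offer...\n2. Suites include..."},  # injected for every query
        {"role": "user", "content": "Which rooms have a sea view?"}
    ]
}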

RAG via doc_search tool

Implements an agentic RAG pipeline, where the LLM decides on its own when it needs document extracts and gets them via the doc_search tool, which is implicitly added to the agent.

This mode co-exists much better with other tools. It may also slightly reduce the agent's costs, because the LLM may sometimes skip retrieving extracts altogether, for example for the user's initial “Hello” message.

An advanced version of the doc_search mode can be activated by setting the 'doc_search mode' parameter to “Advanced”. In this mode, the LLM must specify the name of the document in which to search for relevant extracts, rather than searching across all attached documents.
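For illustration, a doc_search call issued by the LLM might look like the following sketch. The argument names and the document name are hypothetical, since the exact tool schema is not documented here:

{
    "tool": "doc_search",
    "arguments": {
        "query": "rooms with a sea view",
        "document_name": "Hotel Rooms Guide"  # hypothetical; a document name is required only in Advanced mode
    }
}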

Full content in prompt

Includes the complete content of the attached documents in the prompt. This mode is suitable for relatively short documents where the complete document content is required for proper agent behavior.

For example, consider an agent that recommends a hotel room suited to the customer's needs. That agent has access to a document that describes all room types in the hotel. If you rely on RAG, your agent may get partial document extracts in which some room types are missing, and thus fail to do its task. Including the complete document content in the prompt mitigates this problem.

Content via doc_get tool

Implements an agentic pipeline, where the LLM can request the content of a specific document via the doc_get tool, which is implicitly added to the agent.

This mode allows you to attach multiple documents to the agent and let the agent access the content of a specific document when it is needed.

An advanced version of the doc_get mode can be activated by setting the 'doc_get mode' parameter to “Advanced”. In this mode, the LLM can learn the table of contents of a specific document (deduced from its headers) via the doc_toc tool, which is also implicitly added to the agent, and then request a specific section of the document instead of the full document content.
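As a sketch, an Advanced-mode interaction might consist of two tool calls like the following. The argument names and the document name are hypothetical, since the exact tool schema is not documented here:

{
    "tool": "doc_toc",
    "arguments": {"document_name": "Hotel Rooms Guide"}
}

# The LLM inspects the returned table of contents, then requests a single section:

{
    "tool": "doc_get",
    "arguments": {"document_name": "Hotel Rooms Guide", "section": "Suites"}
}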

Document tool responses in message history

Responses of the doc_search tool are redacted from the message history (conversation context) by default, to reduce token consumption.

Responses of the doc_get and doc_toc tools are kept in the message history by default.

You can change this behavior via the following advanced configuration parameters of the agent:

{
    "doc_tools": {
        "tool_name": {  # "doc_search" / "doc_get" / "doc_toc"
            "redact_response": true  # or false
        }
    }
}
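For example, to also redact doc_get responses from the message history and further reduce token consumption:

{
    "doc_tools": {
        "doc_get": {
            "redact_response": true
        }
    }
}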

Controlling the amount of consumed document data

The “RAG in every query” and “RAG via doc_search tool” modes by default use the 5 document extracts semantically closest to the user query. You can customize this number via the rag_chunks advanced configuration parameter to tune retrieval for your specific use case. Increasing the number of chunks provides more comprehensive context for complex queries but may lead to higher token usage and processing costs; reducing it produces more focused responses with lower overhead but potentially less complete information coverage.

{
    "rag_chunks": 10
}

In the “Full content in prompt” and “Content via doc_get tool” modes, you can use the doc_content_len advanced configuration parameter to limit the total length of the consumed document content. This allows you to manage token usage and processing costs while also improving LLM response times.

{
    "doc_content_len": 50000
}