Gemini CLI Core: Tools API

The Gemini CLI core (packages/core) provides a system for defining, registering, and executing tools. These tools extend the capabilities of the Gemini model, allowing it to interact with the local environment, fetch web content, and perform actions beyond simple text generation.

Core Concepts

  • Tool (tools.ts): An interface and base class (BaseTool) that defines the contract for all tools. Each tool must have:

    • name: A unique internal name (used in API calls to Gemini).

    • displayName: A user-friendly name.

    • description: A clear explanation of what the tool does, which is provided to the Gemini model.

    • parameterSchema: A JSON schema defining the parameters that the tool accepts. The Gemini model relies on this schema to call the tool correctly.

    • validateToolParams(): A method that validates incoming parameters.

    • getDescription(): A method that returns a human-readable description of what the tool will do with specific parameters before execution.

    • shouldConfirmExecute(): A method that determines whether user confirmation is required before execution (e.g., for potentially destructive operations).

    • execute(): The core method that performs the tool's action and returns a ToolResult.

  • ToolResult (tools.ts): An interface defining the structure of a tool's execution outcome:

    • llmContent: The factual content to be included in the history sent back to the LLM for context. This can be a simple string or a PartListUnion (an array of Part objects and strings) for rich content.

    • returnDisplay: A user-friendly string (often Markdown) or a special object (like FileDiff) for display in the CLI.

  • Returning Rich Content: Tools are not limited to returning simple text. The llmContent can be a PartListUnion, which is an array that can contain a mix of Part objects (for images, audio, etc.) and strings. This allows a single tool execution to return multiple pieces of rich content.

  • Tool Registry (tool-registry.ts): A class (ToolRegistry) responsible for:

    • Registering Tools: Holding a collection of all available built-in tools (e.g., ReadFileTool, ShellTool).

    • Discovering Tools: It can also discover tools dynamically:

      • Command-based Discovery: If toolDiscoveryCommand is configured in settings, this command is executed. It is expected to output JSON describing custom tools, which are then registered as DiscoveredTool instances.

      • MCP-based Discovery: If mcpServerCommand is configured, the registry can connect to a Model Context Protocol (MCP) server to list and register tools (DiscoveredMCPTool).

    • Providing Schemas: Exposing the FunctionDeclaration schemas of all registered tools to the Gemini model, so it knows what tools are available and how to use them.

    • Retrieving Tools: Allowing the core to get a specific tool by name for execution.
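
The contract and registry described above can be illustrated with a minimal sketch. The interfaces below are simplified, hypothetical shapes rather than the actual definitions in tools.ts and tool-registry.ts: the echo tool, the MiniToolRegistry class, the boolean return of shouldConfirmExecute(), and the "null means valid" convention are all assumptions made for illustration.

```typescript
// Hypothetical, simplified shapes. The real definitions live in tools.ts and
// tool-registry.ts and may differ in detail.
type Params = Record<string, unknown>;

interface ToolResult {
  llmContent: string;    // factual content returned to the model
  returnDisplay: string; // user-facing text, often Markdown
}

interface Tool {
  name: string;            // unique internal name used in API calls
  displayName: string;     // user-friendly name
  description: string;     // explanation given to the model
  parameterSchema: Params; // JSON schema for the accepted parameters
  validateToolParams(params: Params): string | null; // assumed: null means valid
  getDescription(params: Params): string;
  shouldConfirmExecute(params: Params): boolean;     // simplified to a boolean here
  execute(params: Params, signal: AbortSignal): Promise<ToolResult>;
}

// A toy tool that echoes its input; it is not one of the built-in tools.
const echoTool: Tool = {
  name: "echo",
  displayName: "Echo",
  description: "Returns the provided text unchanged.",
  parameterSchema: {
    type: "object",
    properties: { text: { type: "string" } },
    required: ["text"],
  },
  validateToolParams(params) {
    return typeof params.text === "string" ? null : "'text' must be a string";
  },
  getDescription(params) {
    return `Echo "${String(params.text)}"`;
  },
  shouldConfirmExecute() {
    return false; // nothing destructive happens, so no confirmation is requested
  },
  async execute(params) {
    const text = String(params.text);
    return { llmContent: text, returnDisplay: `**echo:** ${text}` };
  },
};

// A minimal registry: stores tools by name and exposes their schemas.
class MiniToolRegistry {
  private readonly tools = new Map<string, Tool>();

  register(tool: Tool): void {
    if (this.tools.has(tool.name)) {
      throw new Error(`Tool "${tool.name}" is already registered`);
    }
    this.tools.set(tool.name, tool);
  }

  getTool(name: string): Tool | undefined {
    return this.tools.get(name);
  }

  // Mirrors the idea of handing FunctionDeclaration-like schemas to the model.
  getDeclarations(): { name: string; description: string; parameters: Params }[] {
    return Array.from(this.tools.values()).map((t) => ({
      name: t.name,
      description: t.description,
      parameters: t.parameterSchema,
    }));
  }
}

const registry = new MiniToolRegistry();
registry.register(echoTool);
```

Keying the registry by name reflects the requirement that each tool's internal name be unique, since that name is what the model uses to request a tool.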

Built-in Tools

The core comes with a suite of pre-defined tools, typically found in packages/core/src/tools/. These include:

  • File System Tools:

    • LSTool (ls.ts): Lists directory contents.

    • ReadFileTool (read-file.ts): Reads the content of a single file. It takes an absolute_path parameter, which must be an absolute path.

    • WriteFileTool (write-file.ts): Writes content to a file.

    • GrepTool (grep.ts): Searches for patterns in files.

    • GlobTool (glob.ts): Finds files matching glob patterns.

    • EditTool (edit.ts): Performs in-place modifications to files (often requiring confirmation).

    • ReadManyFilesTool (read-many-files.ts): Reads and concatenates content from multiple files or glob patterns (used by the @ command in the CLI).

  • Execution Tools:

    • ShellTool (shell.ts): Executes arbitrary shell commands (requires careful sandboxing and user confirmation).

  • Web Tools:

    • WebFetchTool (web-fetch.ts): Fetches content from a URL.

    • WebSearchTool (web-search.ts): Performs a web search.

  • Memory Tools:

    • MemoryTool (memoryTool.ts): Interacts with the AI's memory.

Each of these tools extends BaseTool and implements the required methods for its specific functionality.
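
As an illustration of how a parameter schema and validateToolParams() might fit together, here is a small sketch built around the absolute_path parameter mentioned for ReadFileTool. The schema contents, the function names, and the simplified absolute-path check are assumptions; the actual ReadFileTool implementation may validate differently and accept more parameters.

```typescript
// Hypothetical schema for a file-reading tool's parameters; the real
// ReadFileTool schema may contain additional fields.
const readFileParameterSchema = {
  type: "object",
  properties: {
    absolute_path: {
      type: "string",
      description: "Absolute path of the file to read.",
    },
  },
  required: ["absolute_path"],
};

// Simplified absolute-path check (POSIX "/..." or Windows "C:\..."), written
// without Node's path module so the snippet has no dependencies.
function isAbsolutePath(p: string): boolean {
  return /^(\/|[A-Za-z]:[\\/])/.test(p);
}

// Returns an error message, or null when the parameters are acceptable.
function validateReadFileParams(params: Record<string, unknown>): string | null {
  const value = params.absolute_path;
  if (typeof value !== "string" || value.length === 0) {
    return "absolute_path is required and must be a non-empty string";
  }
  if (!isAbsolutePath(value)) {
    return `absolute_path must be absolute, got: ${value}`;
  }
  return null;
}
```

Returning a message instead of throwing lets the caller pass the reason back to the model, which can then retry with corrected arguments.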

Tool Execution Flow

  1. Model Request: The Gemini model, based on the user's prompt and the provided tool schemas, decides to use a tool and returns a FunctionCall part in its response, specifying the tool name and arguments.

  2. Core Receives Request: The core parses this FunctionCall.

  3. Tool Retrieval: It looks up the requested tool in the ToolRegistry.

  4. Parameter Validation: The tool's validateToolParams() method is called.

  5. Confirmation (if needed):

    • The tool's shouldConfirmExecute() method is called.

    • If it returns details for confirmation, the core communicates this back to the CLI, which prompts the user.

    • The user's decision (e.g., proceed, cancel) is sent back to the core.

  6. Execution: If validated and confirmed (or if no confirmation is needed), the core calls the tool's execute() method with the provided arguments and an AbortSignal (for potential cancellation).

  7. Result Processing: The ToolResult from execute() is received by the core.

  8. Response to Model: The llmContent from the ToolResult is packaged as a FunctionResponse and sent back to the Gemini model so it can continue generating a user-facing response.

  9. Display to User: The returnDisplay from the ToolResult is sent to the CLI to show the user what the tool did.
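
The steps above can be sketched as a single function. This is a simplified illustration, not the core's actual scheduler: the type shapes, the handleFunctionCall and askUser names, the remove_item tool, and the way failures are reported back to the model are all assumptions.

```typescript
// Hypothetical, simplified types; names mirror the text but are not the real API.
type Params = Record<string, unknown>;
interface ToolResult { llmContent: string; returnDisplay: string; }
interface FunctionCall { name: string; args: Params; }
interface Tool {
  name: string;
  validateToolParams(params: Params): string | null;    // assumed: null means valid
  shouldConfirmExecute(params: Params): string | false; // prompt text, or false
  execute(params: Params, signal: AbortSignal): Promise<ToolResult>;
}
interface Outcome {
  functionResponse: { name: string; response: { output: string } }; // for the model
  display: string;                                                  // for the CLI
}

async function handleFunctionCall(
  call: FunctionCall,
  tools: Map<string, Tool>,
  askUser: (prompt: string) => Promise<boolean>, // stands in for the CLI prompt
  signal: AbortSignal,
): Promise<Outcome> {
  // Assumed behavior: failures are reported to both the model and the user.
  const fail = (message: string): Outcome => ({
    functionResponse: { name: call.name, response: { output: message } },
    display: message,
  });

  // Step 3: look up the requested tool.
  const tool = tools.get(call.name);
  if (!tool) return fail(`Unknown tool: ${call.name}`);

  // Step 4: validate the arguments supplied by the model.
  const error = tool.validateToolParams(call.args);
  if (error !== null) return fail(`Invalid parameters: ${error}`);

  // Step 5: ask the user only when the tool requests confirmation.
  const prompt = tool.shouldConfirmExecute(call.args);
  if (prompt !== false && !(await askUser(prompt))) {
    return fail("Cancelled by user");
  }

  // Steps 6 and 7: run the tool and receive its ToolResult.
  const result = await tool.execute(call.args, signal);

  // Steps 8 and 9: llmContent goes to the model, returnDisplay to the CLI.
  return {
    functionResponse: { name: call.name, response: { output: result.llmContent } },
    display: result.returnDisplay,
  };
}

// A toy tool that always asks for confirmation. It only builds strings and
// does not touch the file system.
const removeTool: Tool = {
  name: "remove_item",
  validateToolParams: (p) =>
    typeof p.target === "string" ? null : "'target' must be a string",
  shouldConfirmExecute: (p) => `Remove ${String(p.target)}?`,
  execute: async (p) => ({
    llmContent: `Removed ${String(p.target)}`,
    returnDisplay: `Removed **${String(p.target)}**`,
  }),
};

handleFunctionCall(
  { name: "remove_item", args: { target: "notes.txt" } },
  new Map<string, Tool>([[removeTool.name, removeTool]]),
  async () => true, // auto-approve for the example
  new AbortController().signal,
).then((outcome) => console.log(outcome.display));
```

Keeping the confirmation callback as a parameter reflects the split described in the flow: the core decides whether confirmation is needed, while the CLI owns the interaction with the user.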

Extending with Custom Tools

Direct programmatic registration of new tools is not documented as a primary workflow for typical end users, but the architecture supports extension through:

  • Command-based Discovery: Advanced users or project administrators can define a toolDiscoveryCommand in settings.json. This command, when run by the Gemini CLI core, should output a JSON array of FunctionDeclaration objects. The core then makes these available as DiscoveredTool instances. The corresponding toolCallCommand is responsible for actually executing these custom tools.

  • MCP Server(s): For more complex scenarios, one or more MCP servers can be set up and configured via the mcpServers setting in settings.json. The Gemini CLI core can then discover and use tools exposed by these servers. If you have multiple MCP servers, the tool names are prefixed with the server name from your configuration (e.g., serverAlias__actualToolName).
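
To make the two extension paths more concrete, the sketch below shows what a discovery command's output could look like and how a prefixed MCP tool name is formed. The FunctionDeclarationLike shape is a simplified stand-in for the real FunctionDeclaration type, and the count_lines tool, the discoverTools function, and the prefixedToolName helper are hypothetical names used only for illustration.

```typescript
// Hypothetical FunctionDeclaration-like shape; the real type has more
// optional fields than are shown here.
interface FunctionDeclarationLike {
  name: string;
  description: string;
  parameters: {
    type: "object";
    properties: Record<string, unknown>;
    required?: string[];
  };
}

// The declarations a custom discovery command might advertise.
function discoverTools(): FunctionDeclarationLike[] {
  return [
    {
      name: "count_lines",
      description: "Counts the lines in a text file.",
      parameters: {
        type: "object",
        properties: { path: { type: "string" } },
        required: ["path"],
      },
    },
  ];
}

// Builds a prefixed name in the serverAlias__actualToolName form.
function prefixedToolName(serverAlias: string, toolName: string): string {
  return `${serverAlias}__${toolName}`;
}

// A discovery command would write the declarations to standard output as a
// JSON array, which is what the core is described as expecting.
console.log(JSON.stringify(discoverTools(), null, 2));
```

The prefix keeps tool names unique when two servers expose tools with the same name, for example github__search and docs__search.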

This tool system provides a flexible way to augment the Gemini model's capabilities, making the Gemini CLI a versatile assistant for a wide range of tasks.