From ecebcb6d6426082f8e2816a787733d80b0f17682 Mon Sep 17 00:00:00 2001
From: shenlan
Date: Tue, 12 Aug 2025 22:32:45 +0800
Subject: [PATCH] docs: detail Milestone 2 subtasks

---
 README.md                 |  48 +++++++++-
 docs/Milestone-2-todo.md  |  36 ++++++++
 docs/Milestone-2.md       |  37 ++++++++
 docs/Roadmap.md           |  27 ++++++
 docs/api-endpoints.md     |  51 +++++++++--
 docs/changelog.md         |  17 ++++
 go.mod                    |  10 ++-
 go.sum                    |  23 +++--
 server/api/askai.go       | 182 +++++++++-----------------------------
 server/config/server.yaml |  10 +--
 10 files changed, 276 insertions(+), 165 deletions(-)
 create mode 100644 docs/Milestone-2-todo.md
 create mode 100644 docs/Milestone-2.md
 create mode 100644 docs/Roadmap.md
 create mode 100644 docs/changelog.md

diff --git a/README.md b/README.md
index 541d823..a2c14fc 100644
--- a/README.md
+++ b/README.md
@@ -22,7 +22,19 @@ All UI components provide both Chinese and English interfaces.
 | Gateway | OpenResty | 1.27.1.2 |
 | Database | PostgreSQL + pgvector | N/A |
 | Cache | Redis | N/A |
-| Model | Large Language Models via CodePRobot | N/A |
+| Model (Local) | HuggingFace Hub + Ollama | N/A |
+| Model (Online) | Chutes AskAI + CodePRobot | N/A |
+
+## LangChainGo Core Features at a Glance
+
+XControl uses LangChainGo as a single integration layer for multiple large language models and provides chained-call capabilities to AskAI, the CLI, and the Server:
+
+- **LLM interface layer (Model I/O)**: Call model APIs such as OpenAI, Hugging Face, Ollama, Google AI, and Cohere through a unified interface.
+- **Chains**: Combine prompts, retrieval results, and tool calls into complete pipelines for scenarios such as RAG, chat, and code generation.
+- **Tools and Agents**: Define tools such as web search, scrapers, and SQL queries, and integrate them into LLM agents for ReAct-style tool calling.
+- **Vector retrieval and data integration**: Adapters for vector stores such as PGVector, Weaviate, Qdrant, MongoDB Atlas Vector Search, Chroma, Pinecone, and Redis Vector.
+- **Document loading and chunking**: Document Loaders and Text Splitters for processing long texts and building chunks for vector retrieval.
+- **Memory and history tracking**: Conversation memory mechanisms such as Conversation Buffer to improve the interactive experience.
 
 ## Supported Platforms
 
@@ -88,6 +100,40 @@ log:
 
 The flag value takes precedence over the configuration file.
 
+## Changelog
+
+See [docs/changelog.md](./docs/changelog.md) for a list of completed changes, including all work from Milestone 1.
+
+## Roadmap
+
+The roadmap below is also available in [docs/Roadmap.md](./docs/Roadmap.md).
+
+### Milestone 1: MVP (Completed)
+- Use default Redis port (#98) and establish PostgreSQL & Redis baseline.
+- Stream RAG sync progress for GitHub repository synchronization (#100).
+- Add client-side Markdown parsing to the CLI (#104).
+- Refactor RAG ingestion into the CLI with a server upsert endpoint (#103).
+- RAG API functional tests and per-file ingestion workflow (#115).
+- Allow RAG upsert to migrate embedding dimensions (#119) and document pgvector initialization (#120).
+- Ingest files automatically (#123).
+
+### Milestone 2: Hybrid Search
+- CLI and server dynamically support 1024-dimensional embeddings.
+- Update docs and configs to vector(1024) (#130).
+- Add embedding configuration fields (#131).
+- Add RAG API integration tests for vectors (#132).
+- Add Ollama support (#136).
+- Deploy homepage via rsync from CI and fix SSH directory creation (#18, #19).
+- Deploy XControl panel via GitHub Actions (#20).
+- Fix yarn lock context concatenation (#21).
+
+### Milestone 3: Production Monitoring & Optimization
+- Switch server and CLI to Cobra (#133).
+- Add repo sync proxy configuration (#135).
+- Allow custom AskAI timeout (#141).
+- Add log level support to CLI and server and log AskAI errors (#125, #140).
+- Continue performance optimization, error handling, multi-model support, permission control, hot reload, and improve RAG upsert docs (#129).
+
 ## License
 
 This project is licensed under the terms of the [MIT License](./LICENSE).
diff --git a/docs/Milestone-2-todo.md b/docs/Milestone-2-todo.md
new file mode 100644
index 0000000..6244b75
--- /dev/null
+++ b/docs/Milestone-2-todo.md
@@ -0,0 +1,36 @@
+# Milestone 2 TODO
+
+Subtask plan for improving the CLI, Server, and AskAI endpoint with the LangChainGo framework:
+
+1. **LLM interface layer (Model I/O)**
+   - [ ] Build a provider registry for models such as OpenAI, Hugging Face, Ollama, Google AI, and Cohere.
+   - [ ] Expose model provider switching in the CLI and Server configuration.
+   - [ ] Write unit tests that verify switching between providers.
+   - [ ] Document configuration and environment variable usage.
+2. **Chains**
+   - [ ] Combine prompts, retrieval results, and tool calls into RAG and chat chains.
+   - [ ] Provide reusable chain definitions for AskAI that support complex task orchestration.
+   - [ ] Provide chain invocation examples in the CLI.
+   - [ ] Write integration tests for the chain pipelines.
+3. **Tools and Agents**
+   - [ ] Implement common tools such as web search, a scraper, and SQL queries.
+   - [ ] Register the tools with the Agent framework to support dynamic invocation.
+   - [ ] Demonstrate ReAct-style tool calling in the CLI.
+   - [ ] Add test cases for tool and Agent interaction.
+4. **Vector retrieval and data integration**
+   - [ ] Integrate stores such as PGVector, Weaviate, Qdrant, Chroma, Pinecone, and Redis Vector.
+   - [ ] Support custom vector dimensions and retrieval parameters.
+   - [ ] Write benchmarks comparing the different vector stores.
+   - [ ] Provide documented examples of retrieval parameter tuning.
+5. **Document loading and chunking**
+   - [ ] Provide Document Loaders for multiple formats such as Markdown, code, and HTML.
+   - [ ] Support Text Splitters with token-based or recursive strategies.
+   - [ ] Store chunking results in a unified way and support an incremental update API.
+   - [ ] Write tests for the loaders and splitters.
+6. **Memory and history tracking**
+   - [ ] Add conversation memory such as a conversation buffer to AskAI.
+   - [ ] Persist session history in the Server and provide configuration options.
+   - [ ] Support adjusting memory length and cleanup policies.
+   - [ ] Write end-to-end tests that verify memory retention.
+
+These tasks will be carried out incrementally to deliver the hybrid search and multi-model support goals.
diff --git a/docs/Milestone-2.md b/docs/Milestone-2.md
new file mode 100644
index 0000000..d6f6413
--- /dev/null
+++ b/docs/Milestone-2.md
@@ -0,0 +1,37 @@
+# Milestone 2: Hybrid Search
+
+RAG Phase 2 Optimization Plan
+
+Following the GitHub issue "RAG 第二优化节点阶段" (RAG second optimization stage), this phase continues iterating on the existing RAG system. The goals are to improve retrieval quality and service stability and to extend support for multiple models and data sources.
+
+## Goals
+
+- Improve the precision and performance of vector retrieval.
+- Support incremental sync and data ingestion from multiple repositories.
+- Offer a choice of embedding models and LLMs for flexible deployment.
+- Strengthen error handling, monitoring, and automated testing for the API and CLI.
+
+## Main Tasks
+
+1. **Vector retrieval optimization**
+   - Compare and evaluate different embedding models and similarity metrics.
+   - Introduce vector indexing/compression strategies to reduce query latency.
+2. **Data sync pipeline**
+   - Implement an incremental update mechanism that rebuilds vectors on demand.
+   - Support sync progress tracking and retry on failure.
+3. **Multiple models and configuration**
+   - Integrate local and cloud models uniformly through LangChainGo.
+   - Allow per-model custom parameters and timeout settings.
+4. **API and CLI stability**
+   - Improve exception handling and logging, and expose more diagnostic information.
+   - Complete the integration tests to cover the RAG upsert and query flows.
+5. **Monitoring and observability**
+   - Report metrics and logs to support performance analysis.
+   - Build health checks and alerting.
+
+## Milestones
+
+- **M2.1**: Complete prototype validation of incremental sync and retrieval optimization.
+- **M2.2**: Integrate multi-model support and launch the monitoring system.
+- **M2.3**: Round out automated tests and documentation and prepare for the next iteration.
+
diff --git a/docs/Roadmap.md b/docs/Roadmap.md
new file mode 100644
index 0000000..7f58d27
--- /dev/null
+++ b/docs/Roadmap.md
@@ -0,0 +1,27 @@
+# Roadmap
+
+## Milestone 1: MVP (Completed)
+- Use default Redis port (#98) and establish PostgreSQL & Redis baseline.
+- Stream RAG sync progress for GitHub repository synchronization (#100).
+- Add client-side Markdown parsing to the CLI (#104).
+- Refactor RAG ingestion into the CLI with a server upsert endpoint (#103).
+- RAG API functional tests and per-file ingestion workflow (#115).
+- Allow RAG upsert to migrate embedding dimensions (#119) and document pgvector initialization (#120).
+- Ingest files automatically (#123).
+
+## Milestone 2: Hybrid Search
+- CLI and server dynamically support 1024-dimensional embeddings.
+- Update docs and configs to vector(1024) (#130).
+- Add embedding configuration fields (#131).
+- Add RAG API integration tests for vectors (#132).
+- Add Ollama support (#136).
+- Deploy homepage via rsync from CI and fix SSH directory creation (#18, #19).
+- Deploy XControl panel via GitHub Actions (#20).
+- Fix yarn lock context concatenation (#21).
+
+## Milestone 3: Production Monitoring & Optimization
+- Switch server and CLI to Cobra (#133).
+- Add repo sync proxy configuration (#135).
+- Allow custom AskAI timeout (#141).
+- Add log level support to CLI and server and log AskAI errors (#125, #140).
+- Continue performance optimization, error handling, multi-model support, permission control, hot reload, and improve RAG upsert docs (#129).
diff --git a/docs/api-endpoints.md b/docs/api-endpoints.md
index 11fc91c..49e63c2 100644
--- a/docs/api-endpoints.md
+++ b/docs/api-endpoints.md
@@ -74,16 +74,51 @@ Expected response on success: `{"rows":1}`. If the vector database is unavailabl
 ```
 
 ## POST /api/askai
-- **Description:** Ask the AI service for an answer. Requires a valid Chutes token in the server configuration.
+- **Description:** Ask the AI service for an answer. The endpoint uses [LangChainGo](https://github.com/tmc/langchaingo) to communicate with the configured model provider (e.g., OpenAI-compatible services or a local Ollama instance). Ensure the server configuration includes the proper token or local server URL.
 - **Body Parameters (JSON):**
   - `question` – Question text.
-- **Configuration:** In `server/config/server.yaml` the `api.askai` section controls request behaviour:
-  ```yaml
-  api:
-    askai:
-      timeout: 60 # seconds
-      retries: 3 # retry attempts
-  ```
+**Configuration:** In `server/config/server.yaml` the `models` section selects the LLM and embedding providers.
+For local debugging with HuggingFace and Ollama:
+
+```yaml
+models:
+  embedder:
+    provider: "huggingface_hub"
+    models: "bge-m3"
+    endpoint: "http://127.0.0.1:9000/v1/embeddings"
+  generator:
+    provider: "ollama"
+    models:
+      - 'llama2:13b'
+    endpoint: "http://127.0.0.1:11434/v1/chat/completions"
+```
+
+For online services using Chutes:
+
+```yaml
+#models:
+#  embedder:
+#    provider: "chutes"
+#    models: "bge-m3"
+#    endpoint: "https://chutes-baai-bge-m3.chutes.ai/embed"
+#    token: "cpk_xxxx"
+#  generator:
+#    provider: "chutes"
+#    models:
+#      - 'moonshotai/Kimi-K2-Instruct'
+#    endpoint: "https://llm.chutes.ai/v1/chat/completions"
+#    token: "cpk_xxxx"
+```
+
+The `api.askai` section controls request behaviour:
+
+```yaml
+api:
+  askai:
+    timeout: 60 # seconds
+    retries: 3 # retry attempts
+```
+
 - **Test:**
   ```bash
   curl -X POST http://localhost:8080/api/askai \
diff --git a/docs/changelog.md b/docs/changelog.md
new file mode 100644
index 0000000..246f7da
--- /dev/null
+++ b/docs/changelog.md
@@ -0,0 +1,17 @@
+# Changelog
+
+## Milestone 1: MVP
+- Use default Redis port (#98) and establish PostgreSQL & Redis baseline.
+- Stream RAG sync progress for GitHub repository synchronization (#100).
+- Add client-side Markdown parsing to the CLI (#104).
+- Refactor RAG ingestion into the CLI with a server upsert endpoint (#103).
+- Perform RAG API functional tests.
+- Support per-file ingestion workflow in the CLI (#115).
+- Allow RAG upsert to migrate embedding dimensions (#119).
+- Add pgvector database initialization guide (#120).
+- Ingest files automatically (#123).
+
+## Milestone 2: Hybrid Search (In Progress)
+- Rename the RAG Phase 2 optimization plan to `docs/Milestone-2.md` and add a subtask list.
+- Plan for the AskAI endpoint and CLI to use the LangChainGo framework to support multiple models and chained calls.
+- Document local and Chutes model configurations for AskAI.
diff --git a/go.mod b/go.mod
index 6c4d804..694a539 100644
--- a/go.mod
+++ b/go.mod
@@ -11,6 +11,7 @@ require (
 	github.com/pgvector/pgvector-go v0.3.0
 	github.com/redis/go-redis/v9 v9.12.0
 	github.com/spf13/cobra v1.9.1
+	github.com/tmc/langchaingo v0.1.13
 	github.com/yuin/goldmark v1.7.13
 	golang.org/x/net v0.39.0
 	gopkg.in/yaml.v3 v3.0.1
@@ -27,6 +28,7 @@ require (
 	github.com/cloudflare/circl v1.6.1 // indirect
 	github.com/cyphar/filepath-securejoin v0.4.1 // indirect
 	github.com/dgryski/go-rendezvous v0.0.0-20200823014737-9f7001d12a5f // indirect
+	github.com/dlclark/regexp2 v1.10.0 // indirect
 	github.com/emirpasic/gods v1.18.1 // indirect
 	github.com/gabriel-vasile/mimetype v1.4.2 // indirect
 	github.com/gin-contrib/sse v0.1.0 // indirect
@@ -37,6 +39,7 @@ require (
 	github.com/go-playground/validator/v10 v10.14.0 // indirect
 	github.com/goccy/go-json v0.10.2 // indirect
 	github.com/golang/groupcache v0.0.0-20241129210726-2c02b8208cf8 // indirect
+	github.com/google/uuid v1.6.0 // indirect
 	github.com/inconshreveable/mousetrap v1.1.0 // indirect
 	github.com/jackc/pgpassfile v1.0.0 // indirect
 	github.com/jackc/pgservicefile v0.0.0-20240606120523-5a60cdf6a761 // indirect
@@ -47,11 +50,12 @@ require (
 	github.com/kevinburke/ssh_config v1.2.0 // indirect
 	github.com/klauspost/cpuid/v2 v2.2.4 // indirect
 	github.com/leodido/go-urn v1.2.4 // indirect
-	github.com/mattn/go-isatty v0.0.19 // indirect
+	github.com/mattn/go-isatty v0.0.20 // indirect
 	github.com/modern-go/concurrent v0.0.0-20180306012644-bacd9c7ef1dd // indirect
 	github.com/modern-go/reflect2 v1.0.2 // indirect
-	github.com/pelletier/go-toml/v2 v2.0.8 // indirect
+	github.com/pelletier/go-toml/v2 v2.0.9 // indirect
 	github.com/pjbgf/sha1cd v0.3.2 // indirect
+	github.com/pkoukk/tiktoken-go v0.1.6 // indirect
 	github.com/sergi/go-diff v1.3.2-0.20230802210424-5b0b94c5c0d3 // indirect
 	github.com/skeema/knownhosts v1.3.1 // indirect
 	github.com/spf13/pflag v1.0.6 // indirect
@@ -62,6 +66,6 @@ require (
 	golang.org/x/crypto v0.37.0 // indirect
 	golang.org/x/sys v0.32.0 // indirect
 	golang.org/x/text v0.24.0 // indirect
-	google.golang.org/protobuf v1.33.0 // indirect
+	google.golang.org/protobuf v1.34.1 // indirect
 	gopkg.in/warnings.v0 v0.1.2 // indirect
 )
diff --git a/go.sum b/go.sum
index 93676d8..c7021a1 100644
--- a/go.sum
+++ b/go.sum
@@ -33,6 +33,8 @@ github.com/davecgh/go-spew v1.1.1 h1:vj9j/u1bqnvCEfJOwUhtlOARqs3+rkHYY13jYWTU97c
 github.com/davecgh/go-spew v1.1.1/go.mod h1:J7Y8YcW2NihsgmVo/mv3lAwl/skON4iLHjSsI+c5H38=
 github.com/dgryski/go-rendezvous v0.0.0-20200823014737-9f7001d12a5f h1:lO4WD4F/rVNCu3HqELle0jiPLLBs70cWOduZpkS1E78=
 github.com/dgryski/go-rendezvous v0.0.0-20200823014737-9f7001d12a5f/go.mod h1:cuUVRXasLTGF7a8hSLbxyZXjz+1KgoB3wDUb6vlszIc=
+github.com/dlclark/regexp2 v1.10.0 h1:+/GIL799phkJqYW+3YbOd8LCcbHzT0Pbo8zl70MHsq0=
+github.com/dlclark/regexp2 v1.10.0/go.mod h1:DHkYz0B9wPfa6wondMfaivmHpzrQ3v9q8cnmRbL6yW8=
 github.com/elazarl/goproxy v1.7.2 h1:Y2o6urb7Eule09PjlhQRGNsqRfPmYI3KKQLFpCAV3+o=
 github.com/elazarl/goproxy v1.7.2/go.mod h1:82vkLNir0ALaW14Rc399OTTjyNREgmdL2cVoIbS6XaE=
 github.com/emirpasic/gods v1.18.1 h1:FXtiHYKDGKCW2KzwZKx0iC0PQmdlorYgdFG9jPXJ1Bc=
@@ -110,8 +112,8 @@ github.com/leodido/go-urn v1.2.4 h1:XlAE/cm/ms7TE/VMVoduSpNBoyc2dOxHs5MZSwAN63Q=
 github.com/leodido/go-urn v1.2.4/go.mod h1:7ZrI8mTSeBSHl/UaRyKQW1qZeMgak41ANeCNaVckg+4=
 github.com/lib/pq v1.10.9 h1:YXG7RB+JIjhP29X+OtkiDnYaXQwpS4JEWq7dtCCRUEw=
 github.com/lib/pq v1.10.9/go.mod h1:AlVN5x4E4T544tWzH6hKfbfQvm3HdbOxrmggDNAPY9o=
-github.com/mattn/go-isatty v0.0.19 h1:JITubQf0MOLdlGRuRq+jtsDlekdYPia9ZFsB8h/APPA=
-github.com/mattn/go-isatty v0.0.19/go.mod h1:W+V8PltTTMOvKvAeJH7IuucS94S2C6jfK/D7dTCTo3Y=
+github.com/mattn/go-isatty v0.0.20 h1:xfD0iDuEKnDkl03q4limB+vH+GxLEtL/jb4xVJSWWEY=
+github.com/mattn/go-isatty v0.0.20/go.mod h1:W+V8PltTTMOvKvAeJH7IuucS94S2C6jfK/D7dTCTo3Y=
 github.com/modern-go/concurrent v0.0.0-20180228061459-e0a39a4cb421/go.mod h1:6dJC0mAP4ikYIbvyc7fijjWJddQyLn8Ig3JB5CqoB9Q=
 github.com/modern-go/concurrent v0.0.0-20180306012644-bacd9c7ef1dd h1:TRLaZ9cD/w8PVh93nsPXa1VrQ6jlwL5oN8l14QlcNfg=
 github.com/modern-go/concurrent v0.0.0-20180306012644-bacd9c7ef1dd/go.mod h1:6dJC0mAP4ikYIbvyc7fijjWJddQyLn8Ig3JB5CqoB9Q=
@@ -119,14 +121,16 @@ github.com/modern-go/reflect2 v1.0.2 h1:xBagoLtFs94CBntxluKeaWgTMpvLxC4ur3nMaC9G
 github.com/modern-go/reflect2 v1.0.2/go.mod h1:yWuevngMOJpCy52FWWMvUC8ws7m/LJsjYzDa0/r8luk=
 github.com/onsi/gomega v1.34.1 h1:EUMJIKUjM8sKjYbtxQI9A4z2o+rruxnzNvpknOXie6k=
 github.com/onsi/gomega v1.34.1/go.mod h1:kU1QgUvBDLXBJq618Xvm2LUX6rSAfRaFRTcdOeDLwwY=
-github.com/pelletier/go-toml/v2 v2.0.8 h1:0ctb6s9mE31h0/lhu+J6OPmVeDxJn+kYnJc2jZR9tGQ=
-github.com/pelletier/go-toml/v2 v2.0.8/go.mod h1:vuYfssBdrU2XDZ9bYydBu6t+6a6PYNcZljzZR9VXg+4=
+github.com/pelletier/go-toml/v2 v2.0.9 h1:uH2qQXheeefCCkuBBSLi7jCiSmj3VRh2+Goq2N7Xxu0=
+github.com/pelletier/go-toml/v2 v2.0.9/go.mod h1:tJU2Z3ZkXwnxa4DPO899bsyIoywizdUvyaeZurnPPDc=
 github.com/pgvector/pgvector-go v0.3.0 h1:Ij+Yt78R//uYqs3Zk35evZFvr+G0blW0OUN+Q2D1RWc=
 github.com/pgvector/pgvector-go v0.3.0/go.mod h1:duFy+PXWfW7QQd5ibqutBO4GxLsUZ9RVXhFZGIBsWSA=
 github.com/pjbgf/sha1cd v0.3.2 h1:a9wb0bp1oC2TGwStyn0Umc/IGKQnEgF0vVaZ8QF8eo4=
 github.com/pjbgf/sha1cd v0.3.2/go.mod h1:zQWigSxVmsHEZow5qaLtPYxpcKMMQpa09ixqBxuCS6A=
 github.com/pkg/errors v0.9.1 h1:FEBLx1zS214owpjy7qsBeixbURkuhQAwrK5UwLGTwt4=
 github.com/pkg/errors v0.9.1/go.mod h1:bwawxfHBFNV+L2hUp1rHADufV3IMtnDRdf1r5NINEl0=
+github.com/pkoukk/tiktoken-go v0.1.6 h1:JF0TlJzhTbrI30wCvFuiw6FzP2+/bR+FIxUdgEAcUsw=
+github.com/pkoukk/tiktoken-go v0.1.6/go.mod h1:9NiV+i9mJKGj1rYOT+njbv+ZwA/zJxYdewGl6qVatpg=
 github.com/pmezard/go-difflib v1.0.0 h1:4DBwDE0NGyQoBHbLQYPwSUPoCMWR5BEzIk/f1lZbAQM=
 github.com/pmezard/go-difflib v1.0.0/go.mod h1:iKH77koFhYxTK1pcRnkKkqfTogsbg7gZNVY4sRDYZ/4=
 github.com/redis/go-redis/v9 v9.12.0 h1:XlVPGlflh4nxfhsNXPA8Qp6EmEfTo0rp8oaBzPipXnU=
@@ -154,9 +158,11 @@ github.com/stretchr/testify v1.7.1/go.mod h1:6Fq8oRcR53rry900zMqJjRRixrwX3KX962/
 github.com/stretchr/testify v1.8.0/go.mod h1:yNjHg4UonilssWZ8iaSj1OCr/vHnekPRkoO+kdMU+MU=
 github.com/stretchr/testify v1.8.1/go.mod h1:w2LPCIKwWwSfY2zedu0+kehJoqGctiVI29o6fzry7u4=
 github.com/stretchr/testify v1.8.2/go.mod h1:w2LPCIKwWwSfY2zedu0+kehJoqGctiVI29o6fzry7u4=
-github.com/stretchr/testify v1.8.3/go.mod h1:sz/lmYIOXD/1dqDmKjjqLyZ2RngseejIcXlSw2iwfAo=
+github.com/stretchr/testify v1.8.4/go.mod h1:sz/lmYIOXD/1dqDmKjjqLyZ2RngseejIcXlSw2iwfAo=
 github.com/stretchr/testify v1.10.0 h1:Xv5erBjTwe/5IxqUQTdXv5kgmIvbHo3QQyRwhJsOfJA=
 github.com/stretchr/testify v1.10.0/go.mod h1:r2ic/lqez/lEtzL7wO/rwa5dbSLXVDPFyf8C91i36aY=
+github.com/tmc/langchaingo v0.1.13 h1:rcpMWBIi2y3B90XxfE4Ao8dhCQPVDMaNPnN5cGB1CaA=
+github.com/tmc/langchaingo v0.1.13/go.mod h1:vpQ5NOIhpzxDfTZK9B6tf2GM/MoaHewPWM5KXXGh7hg=
 github.com/tmthrgd/go-hex v0.0.0-20190904060850-447a3041c3bc h1:9lRDQMhESg+zvGYmW5DyG0UqvY96Bu5QYsTLvCHdrgo=
 github.com/tmthrgd/go-hex v0.0.0-20190904060850-447a3041c3bc/go.mod h1:bciPuU6GHm1iF1pBvUfxfsH0Wmnc2VbpgvbI9ZWuIRs=
 github.com/twitchyliquid64/golang-asm v0.15.1 h1:SU5vSMR7hnwNxj24w34ZyCi/FmDZTkS4MhqMhdFk5YI=
@@ -213,8 +219,8 @@ golang.org/x/text v0.3.6/go.mod h1:5Zoc/QRtKVWzQhOtBMvqHzDpF6irO9z98xDceosuGiQ=
 golang.org/x/text v0.24.0 h1:dd5Bzh4yt5KYA8f9CJHCP4FB4D51c2c6JvN37xJJkJ0=
 golang.org/x/text v0.24.0/go.mod h1:L8rBsPeo2pSS+xqN0d5u2ikmjtmoJbDBT1b7nHvFCdU=
 golang.org/x/tools v0.0.0-20180917221912-90fa682c2a6e/go.mod h1:n7NCudcB/nEzxVGmLbDWY5pfWTLqBcC2KZ6jyYvM4mQ=
-google.golang.org/protobuf v1.33.0 h1:uNO2rsAINq/JlFpSdYEKIZ0uKD/R9cpdv0T+yoGwGmI=
-google.golang.org/protobuf v1.33.0/go.mod h1:c6P6GXX6sHbq/GpV6MGZEdwhWPcYBgnhAHhKbcUYpos=
+google.golang.org/protobuf v1.34.1 h1:9ddQBjfCyZPOHPUiPxpYESBLc+T8P3E+Vo4IbKZgFWg=
+google.golang.org/protobuf v1.34.1/go.mod h1:c6P6GXX6sHbq/GpV6MGZEdwhWPcYBgnhAHhKbcUYpos=
 gopkg.in/check.v1 v0.0.0-20161208181325-20d25e280405/go.mod h1:Co6ibVJAznAaIkqp8huTwlJQCZ016jof/cbN4VW5Yz0=
 gopkg.in/check.v1 v1.0.0-20190902080502-41f04d3bba15/go.mod h1:Co6ibVJAznAaIkqp8huTwlJQCZ016jof/cbN4VW5Yz0=
 gopkg.in/check.v1 v1.0.0-20201130134442-10cb98267c6c h1:Hei/4ADfdWqJk1ZMxUNpqntNwaWcugrBjAiHlqqRiVk=
@@ -222,6 +228,7 @@ gopkg.in/check.v1 v1.0.0-20201130134442-10cb98267c6c/go.mod h1:JHkPIbrfpd72SG/EV
 gopkg.in/warnings.v0 v0.1.2 h1:wFXVbFY8DY5/xOe1ECiWdKCzZlxgshcYVNkBHstARME=
 gopkg.in/warnings.v0 v0.1.2/go.mod h1:jksf8JmL6Qr/oQM2OXTHunEvvTAsrWBLb6OOjuVWRNI=
 gopkg.in/yaml.v2 v2.2.2/go.mod h1:hI93XBmqTisBFMUTm0b8Fm+jr3Dg1NNxqwp+5A1VGuI=
+gopkg.in/yaml.v2 v2.4.0 h1:D8xgwECY7CYvx+Y2n4sBz93Jn9JRvxdiyyo8CTfuKaY=
 gopkg.in/yaml.v2 v2.4.0/go.mod h1:RDklbk79AGWmwhnvt/jBztapEOGDOx6ZbXqjP6csGnQ=
 gopkg.in/yaml.v3 v3.0.0-20200313102051-9f266ea9e77c/go.mod h1:K4uyk7z7BCEPqu6E+C64Yfv1cQ7kz7rIZviUmN+EgEM=
 gopkg.in/yaml.v3 v3.0.1 h1:fxVm/GzAzEWqLHuvctI91KS9hhNmmWOoWu0XTYJS7CA=
@@ -233,3 +240,5 @@ gorm.io/gorm v1.25.5/go.mod h1:hbnx/Oo0ChWMn1BIhpy1oYozzpM15i4YPuHDmfYtwg8=
 mellium.im/sasl v0.3.1 h1:wE0LW6g7U83vhvxjC1IY8DnXM+EU095yeo8XClvCdfo=
 mellium.im/sasl v0.3.1/go.mod h1:xm59PUYpZHhgQ9ZqoJ5QaCqzWMi8IeS49dhp6plPCzw=
 rsc.io/pdf v0.1.1/go.mod h1:n8OzWcQ6Sp37PL01nO98y4iUCRdTGarVfzxY20ICaU4=
+sigs.k8s.io/yaml v1.3.0 h1:a2VclLzOGrwOHDiV8EfBGhvjHvP46CtW5j6POvhYGGo=
+sigs.k8s.io/yaml v1.3.0/go.mod h1:GeOyir5tyXNByN85N/dRIT9es5UQNerPYEKK56eTBm8=
diff --git a/server/api/askai.go b/server/api/askai.go
index 0e500ce..ffe1cbb 100644
--- a/server/api/askai.go
+++ b/server/api/askai.go
@@ -1,11 +1,8 @@
 package api
 
 import (
-	"bytes"
-	"encoding/json"
-	"errors"
+	"context"
 	"fmt"
-	"io"
 	"log/slog"
 	"net/http"
 	"os"
@@ -14,6 +11,9 @@ import (
 	"time"
 
 	"github.com/gin-gonic/gin"
+	"github.com/tmc/langchaingo/llms"
+	"github.com/tmc/langchaingo/llms/ollama"
+	"github.com/tmc/langchaingo/llms/openai"
 	"gopkg.in/yaml.v3"
 )
 
@@ -128,7 +128,7 @@ func loadConfig() (string, string, string, string, time.Duration, int) {
 			endpoint = "http://localhost:11434/v1/chat/completions"
 		}
 		if model == "" {
-			model = "gpt-oss:20b"
+			model = "llama2:13b"
 		}
 		return provider, token, model, endpoint, timeout, retries
 	case "chutes":
@@ -150,150 +150,50 @@ func loadConfig() (string, string, string, string, time.Duration, int) {
 	}
 }
 
-// callChutes sends the question to the hosted LLM service and returns the reply.
-func callChutes(token, model, url string, timeout time.Duration, retries int, question string) (string, error) {
-	if token == "" || token == "cpk_xxxxxxx" {
-		return "", errors.New("chutes token not set")
-	}
-
-	reqBody := map[string]interface{}{
-		"model":       model,
-		"messages":    []interface{}{map[string]interface{}{"role": "user", "content": question}},
-		"stream":      false,
-		"max_tokens":  1024,
-		"temperature": 0.7,
-	}
-	data, err := json.Marshal(reqBody)
-	if err != nil {
-		return "", err
-	}
-
-	client := &http.Client{Timeout: timeout}
-	var lastErr error
-	for i := 0; i <= retries; i++ {
-		req, err := http.NewRequest("POST", url, bytes.NewBuffer(data))
-		if err != nil {
-			return "", err
-		}
-		req.Header.Set("Authorization", "Bearer "+token)
-		req.Header.Set("Content-Type", "application/json")
-
-		resp, err := client.Do(req)
-		if err != nil {
-			lastErr = err
-			continue
-		}
-
-		b, err := io.ReadAll(resp.Body)
-		resp.Body.Close()
-		if err != nil {
-			lastErr = err
-			continue
-		}
-		if resp.StatusCode != http.StatusOK {
-			lastErr = fmt.Errorf("chutes API error: %s", string(b))
-			continue
-		}
-
-		var res struct {
-			Choices []struct {
-				Message struct {
-					Content string `json:"content"`
-				} `json:"message"`
-			} `json:"choices"`
-		}
-		if err := json.Unmarshal(b, &res); err != nil {
-			lastErr = err
-			continue
-		}
-		if len(res.Choices) == 0 {
-			lastErr = errors.New("no choices returned")
-			continue
-		}
-		return res.Choices[0].Message.Content, nil
-	}
-	if lastErr == nil {
-		lastErr = errors.New("request failed")
-	}
-	return "", lastErr
-}
-
-// callOllama sends the question to a local Ollama server.
-func callOllama(model, url string, timeout time.Duration, retries int, question string) (string, error) {
-	reqBody := map[string]any{
-		"model":       model,
-		"messages":    []any{map[string]any{"role": "user", "content": question}},
-		"stream":      false,
-		"max_tokens":  1024,
-		"temperature": 0.7,
-	}
-	data, err := json.Marshal(reqBody)
-	if err != nil {
-		return "", err
-	}
-	client := &http.Client{Timeout: timeout}
-	var lastErr error
-	for i := 0; i <= retries; i++ {
-		req, err := http.NewRequest("POST", url, bytes.NewReader(data))
-		if err != nil {
-			return "", err
-		}
-		req.Header.Set("Content-Type", "application/json")
-		resp, err := client.Do(req)
-		if err != nil {
-			lastErr = err
-			continue
-		}
-		b, err := io.ReadAll(resp.Body)
-		resp.Body.Close()
-		if err != nil {
-			lastErr = err
-			continue
-		}
-		if resp.StatusCode != http.StatusOK {
-			lastErr = fmt.Errorf("ollama API error: %s", string(b))
-			continue
-		}
-		var res struct {
-			Choices []struct {
-				Message struct {
-					Content string `json:"content"`
-				} `json:"message"`
-			} `json:"choices"`
-		}
-		if err := json.Unmarshal(b, &res); err != nil {
-			lastErr = err
-			continue
-		}
-		if len(res.Choices) == 0 {
-			lastErr = errors.New("no choices returned")
-			continue
-		}
-		return res.Choices[0].Message.Content, nil
-	}
-	if lastErr == nil {
-		lastErr = errors.New("request failed")
-	}
-	return "", lastErr
-}
-
-// callLLM dispatches the question to the configured provider.
+// callLLM dispatches the question to the configured provider using LangChainGo.
 func callLLM(question string) (string, error) {
 	provider, token, model, url, timeout, retries := loadConfig()
+
+	httpClient := &http.Client{Timeout: timeout}
+
 	var (
-		answer string
-		err    error
+		llm llms.Model
+		err error
 	)
+
 	switch provider {
 	case "ollama":
-		answer, err = callOllama(model, url, timeout, retries, question)
-	case "chutes":
-		answer, err = callChutes(token, model, url, timeout, retries, question)
+		llm, err = ollama.New(
+			ollama.WithModel(model),
+			ollama.WithServerURL(url),
+			ollama.WithHTTPClient(httpClient),
+		)
 	default:
-		answer, err = callChutes(token, model, url, timeout, retries, question)
+		llm, err = openai.New(
+			openai.WithToken(token),
+			openai.WithModel(model),
+			openai.WithBaseURL(url),
+			openai.WithHTTPClient(httpClient),
+		)
 	}
 	if err != nil {
-		return "", fmt.Errorf("%w (timeout=%s retries=%d)", err, timeout, retries)
+		return "", fmt.Errorf("init llm: %w", err)
 	}
-	return answer, nil
+
+	ctx, cancel := context.WithTimeout(context.Background(), timeout)
+	defer cancel()
+
+	var answer string
+	var lastErr error
+	for i := 0; i <= retries; i++ {
+		answer, lastErr = llms.GenerateFromSinglePrompt(ctx, llm, question)
+		if lastErr == nil {
+			return answer, nil
+		}
+	}
+
+	if lastErr == nil {
+		lastErr = fmt.Errorf("request failed")
+	}
+	return "", fmt.Errorf("%w (timeout=%s retries=%d)", lastErr, timeout, retries)
 }
diff --git a/server/config/server.yaml b/server/config/server.yaml
index 2f49b0b..e5d31c5 100644
--- a/server/config/server.yaml
+++ b/server/config/server.yaml
@@ -28,7 +28,7 @@ models:
   generator:
     provider: "ollama"
     models:
-      - 'gpt-oss:20b'
+      - 'llama2:13b'
     endpoint: "http://127.0.0.1:11434/v1/chat/completions"
     token: ""
 # For PROD
@@ -36,14 +36,14 @@ models:
 # embedder:
     #provider: "chutes"
     #models: "bge-m3"
-    #endpoint: "https://chutes-baai-bge-m3.chutes.ai/embed/v1/embeddings"
-    #token: "cpk_xxxxxxxxxxxxxxxxxxxx"
+    #endpoint: "https://chutes-baai-bge-m3.chutes.ai/embed"
+    #token: "cpk_xxxx"
 # generator:
     #provider: "chutes"
-    #endpoint: "https://llm.chutes.ai/v1/chat/completions"
-    #token: "cpk_xxxxxxxxxxxxxxxxxxxx"
     #models:
     #  - 'moonshotai/Kimi-K2-Instruct'
+    #endpoint: "https://llm.chutes.ai/v1/chat/completions"
+    #token: "cpk_xxxx"
 
 embedding:
   max_batch: 64