docs: detail Milestone 2 subtasks

shenlan 2025-08-12 22:32:45 +08:00
parent 66a3057388
commit ecebcb6d64
10 changed files with 276 additions and 165 deletions


@ -22,7 +22,19 @@ All UI components provide both Chinese and English interfaces.
| Gateway | OpenResty | 1.27.1.2 |
| Database | PostgreSQL + pgvector | N/A |
| Cache | Redis | N/A |
| Model | Large Language Models via CodePRobot | N/A |
| Model (Local) | HuggingFace Hub + Ollama | N/A |
| Model (Online) | Chutes AskAI + CodePRobot | N/A |
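## LangChainGo Core Features at a Glance
XControl uses LangChainGo as a single integration point for multiple large language models and provides chained invocation for AskAI, the CLI, and the server:
- **LLM interface layer (Model I/O)**: a uniform way to call model APIs such as OpenAI, Hugging Face, Ollama, Google AI, and Cohere.
- **Chains**: combine prompts, retrieval results, and tool calls into complete workflows, supporting scenarios such as RAG, chat, and code generation.
- **Tools and agents**: define tools such as web search, scrapers, and SQL queries, and integrate them into LLM agents to enable ReAct-style tool calling.
- **Vector retrieval and data access**: adapters for vector stores including PGVector, Weaviate, Qdrant, MongoDB Atlas Vector Search, Chroma, Pinecone, and Redis Vector.
- **Document loading and chunking**: document loaders and text splitters for processing long text and building chunks for vector retrieval.
- **Memory and history tracking**: conversation memory mechanisms such as Conversation Buffer to improve the interactive experience.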
## Supported Platforms
@ -88,6 +100,40 @@ log:
The flag value takes precedence over the configuration file.
## Changelog
See [docs/changelog.md](./docs/changelog.md) for a list of completed changes, including all work from Milestone 1.
## Roadmap
The roadmap below is also available in [docs/Roadmap.md](./docs/Roadmap.md).
### Milestone 1: MVP (Completed)
- Use default Redis port (#98) and establish PostgreSQL & Redis baseline.
- Stream RAG sync progress for GitHub repository synchronization (#100).
- Add client-side Markdown parsing to the CLI (#104).
- Refactor RAG ingestion into the CLI with a server upsert endpoint (#103).
- RAG API functional tests and per-file ingestion workflow (#115).
- Allow RAG upsert to migrate embedding dimensions (#119) and document pgvector initialization (#120).
- Ingest files automatically (#123).
### Milestone 2: Hybrid Search
- CLI and server dynamically support 1024-dimensional embeddings.
- Update docs and configs to vector(1024) (#130).
- Add embedding configuration fields (#131).
- Add RAG API integration tests for vectors (#132).
- Add Ollama support (#136).
- Deploy homepage via rsync from CI and fix SSH directory creation (#18, #19).
- Deploy XControl panel via GitHub Actions (#20).
- Fix yarn lock context concatenation (#21).
### Milestone 3: Production Monitoring & Optimization
- Switch server and CLI to Cobra (#133).
- Add repo sync proxy configuration (#135).
- Allow custom AskAI timeout (#141).
- Add log level support to CLI and server and log AskAI errors (#125, #140).
- Continue performance optimization, error handling, multi-model support, permission control, hot reload, and improve RAG upsert docs (#129).
## License
This project is licensed under the terms of the [MIT License](./LICENSE).

36
docs/Milestone-2-todo.md Normal file

@ -0,0 +1,36 @@
# Milestone 2 TODO
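Subtask plan for optimizing the CLI, server, and AskAI endpoint with the LangChainGo framework:
1. **LLM interface layer (Model I/O)**
- [ ] Build a provider registry for models such as OpenAI, Hugging Face, Ollama, Google AI, and Cohere.
- [ ] Expose model provider switching in the CLI and server configuration.
- [ ] Write unit tests that verify switching between providers.
- [ ] Document configuration and environment variable usage.
2. **Chains**
- [ ] Combine prompts, retrieval results, and tool calls into RAG and chat chains.
- [ ] Provide reusable chain definitions for AskAI that support complex task orchestration.
- [ ] Provide chain invocation examples in the CLI.
- [ ] Write integration tests for the chains.
3. **Tools and agents**
- [ ] Implement common tools such as web search, a scraper, and SQL queries.
- [ ] Register the tools with the agent framework to support dynamic invocation.
- [ ] Demonstrate ReAct-style tool calling in the CLI.
- [ ] Add test cases for tool and agent interaction.
4. **Vector retrieval and data access**
- [ ] Integrate stores such as PGVector, Weaviate, Qdrant, Chroma, Pinecone, and Redis Vector.
- [ ] Support custom vector dimensions and retrieval parameters.
- [ ] Write benchmarks and comparisons for the different vector stores.
- [ ] Provide documented examples of retrieval parameter tuning.
5. **Document loading and chunking**
- [ ] Provide document loaders for multiple formats such as Markdown, code, and HTML.
- [ ] Support text splitters that split by token count or recursively.
- [ ] Store chunking results in a unified way and support an incremental update API.
- [ ] Write tests for the loaders and splitters.
6. **Memory and history tracking**
- [ ] Add conversation memory such as a conversation buffer to AskAI.
- [ ] Persist session history in the server and provide configuration options.
- [ ] Support adjusting memory length and cleanup policy.
- [ ] Write end-to-end tests that verify memory retention.
These tasks will be implemented step by step to deliver the hybrid search and multi-model support goals.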
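The provider registry in item 1 could take roughly the following shape. This is a minimal sketch, not code from this commit: `Generator`, `Factory`, `Registry`, and the `echo` stand-in are hypothetical names, and a real factory would presumably construct a LangChainGo model instead of the stub used here.

```go
package main

import (
	"fmt"
	"sort"
)

// Generator is a hypothetical minimal interface that every provider satisfies.
type Generator interface {
	Generate(prompt string) (string, error)
}

// Factory builds a Generator from an endpoint and a token.
type Factory func(endpoint, token string) (Generator, error)

// Registry maps provider names (for example "ollama" or "chutes") to factories.
type Registry struct {
	factories map[string]Factory
}

// NewRegistry returns an empty registry.
func NewRegistry() *Registry {
	return &Registry{factories: map[string]Factory{}}
}

// Register adds or replaces the factory for a provider name.
func (r *Registry) Register(name string, f Factory) {
	r.factories[name] = f
}

// Names lists the registered provider names in sorted order.
func (r *Registry) Names() []string {
	names := make([]string, 0, len(r.factories))
	for name := range r.factories {
		names = append(names, name)
	}
	sort.Strings(names)
	return names
}

// New looks up a provider by name and builds it, failing on unknown names.
func (r *Registry) New(name, endpoint, token string) (Generator, error) {
	f, ok := r.factories[name]
	if !ok {
		return nil, fmt.Errorf("unknown provider %q (known: %v)", name, r.Names())
	}
	return f(endpoint, token)
}

// echo is a stand-in provider used only to keep this sketch self-contained.
type echo struct{ name string }

func (e echo) Generate(prompt string) (string, error) {
	return e.name + ": " + prompt, nil
}

func main() {
	reg := NewRegistry()
	reg.Register("ollama", func(endpoint, token string) (Generator, error) {
		return echo{name: "ollama"}, nil
	})
	reg.Register("chutes", func(endpoint, token string) (Generator, error) {
		return echo{name: "chutes"}, nil
	})

	g, err := reg.New("ollama", "http://127.0.0.1:11434", "")
	if err != nil {
		panic(err)
	}
	out, _ := g.Generate("hello")
	fmt.Println(out)
}
```

Keeping the lookup behind one `New` call means the CLI and server only need to pass the configured provider name through, which should also make the planned provider-switching unit tests straightforward.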

37
docs/Milestone-2.md Normal file

@ -0,0 +1,37 @@
# Milestone 2: Hybrid Search
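RAG phase-two optimization plan
With reference to the GitHub issue "RAG 第二优化节点阶段" (RAG second optimization phase), this phase continues to iterate on the existing RAG system. The goals are to improve retrieval quality and service stability and to extend multi-model and multi-data-source support.
## Goals
- Improve the accuracy and performance of vector retrieval.
- Support incremental sync and ingestion from multiple repositories.
- Offer a choice of embedding models and LLMs for flexible deployment.
- Strengthen error handling, monitoring, and automated testing for the API and CLI.
## Main Tasks
1. **Vector retrieval optimization**
- Compare and evaluate different embedding models and similarity metrics.
- Introduce vector indexing and compression strategies to reduce query latency.
2. **Data sync pipeline**
- Implement an incremental update mechanism that rebuilds vectors only when needed.
- Support sync progress tracking and retry on failure.
3. **Multi-model support and configuration**
- Integrate local and cloud models uniformly through LangChainGo.
- Allow per-model custom parameters and timeout settings.
4. **API and CLI stability**
- Improve exception handling and logging, and expose more diagnostic information.
- Round out the integration tests to cover the RAG upsert and query flows.
5. **Monitoring and observability**
- Report metrics and logs to support performance analysis.
- Build health checks and alerting.
## Milestones
- **M2.1**: Complete prototype validation of incremental sync and retrieval optimization.
- **M2.2**: Integrate multi-model support and bring the monitoring stack online.
- **M2.3**: Round out automated tests and documentation, and prepare for the next phase.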
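The incremental update mechanism in task 2 is not specified further here. One common approach, shown below as a sketch under that assumption, is to keep a content hash per file and re-embed only files whose hash changed. `changedFiles` and the in-memory `seen` map are hypothetical; a real pipeline would presumably persist the hashes next to the vectors.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
)

// changedFiles returns the paths whose content hash differs from the hash
// recorded in seen, and updates seen so the next call treats them as current.
func changedFiles(seen map[string]string, files map[string][]byte) []string {
	var changed []string
	for path, data := range files {
		sum := sha256.Sum256(data)
		h := hex.EncodeToString(sum[:])
		if seen[path] != h {
			seen[path] = h
			changed = append(changed, path)
		}
	}
	sort.Strings(changed) // map iteration order is random; sort for stable output
	return changed
}

func main() {
	seen := map[string]string{}

	// First sync: every file is new, so every file needs embedding.
	first := changedFiles(seen, map[string][]byte{
		"README.md":   []byte("hello"),
		"docs/api.md": []byte("v1"),
	})
	fmt.Println("first sync:", first)

	// Second sync: only docs/api.md changed, so only it is re-embedded.
	second := changedFiles(seen, map[string][]byte{
		"README.md":   []byte("hello"),
		"docs/api.md": []byte("v2"),
	})
	fmt.Println("second sync:", second)
}
```

This sketch does not handle deleted files; removing stale vectors would need a separate pass over paths that are in `seen` but no longer in the repository.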

27
docs/Roadmap.md Normal file

@ -0,0 +1,27 @@
# Roadmap
## Milestone 1: MVP (Completed)
- Use default Redis port (#98) and establish PostgreSQL & Redis baseline.
- Stream RAG sync progress for GitHub repository synchronization (#100).
- Add client-side Markdown parsing to the CLI (#104).
- Refactor RAG ingestion into the CLI with a server upsert endpoint (#103).
- RAG API functional tests and per-file ingestion workflow (#115).
- Allow RAG upsert to migrate embedding dimensions (#119) and document pgvector initialization (#120).
- Ingest files automatically (#123).
## Milestone 2: Hybrid Search
- CLI and server dynamically support 1024-dimensional embeddings.
- Update docs and configs to vector(1024) (#130).
- Add embedding configuration fields (#131).
- Add RAG API integration tests for vectors (#132).
- Add Ollama support (#136).
- Deploy homepage via rsync from CI and fix SSH directory creation (#18, #19).
- Deploy XControl panel via GitHub Actions (#20).
- Fix yarn lock context concatenation (#21).
## Milestone 3: Production Monitoring & Optimization
- Switch server and CLI to Cobra (#133).
- Add repo sync proxy configuration (#135).
- Allow custom AskAI timeout (#141).
- Add log level support to CLI and server and log AskAI errors (#125, #140).
- Continue performance optimization, error handling, multi-model support, permission control, hot reload, and improve RAG upsert docs (#129).


@ -74,16 +74,51 @@ Expected response on success: `{"rows":1}`. If the vector database is unavailabl
```
## POST /api/askai
- **Description:** Ask the AI service for an answer. Requires a valid Chutes token in the server configuration.
- **Description:** Ask the AI service for an answer. The endpoint uses [LangChainGo](https://github.com/tmc/langchaingo) to communicate with the configured model provider (e.g., OpenAI-compatible services or a local Ollama instance). Ensure the server configuration includes the proper token or local server URL.
- **Body Parameters (JSON):**
- `question` Question text.
- **Configuration:** In `server/config/server.yaml` the `api.askai` section controls request behaviour:
```yaml
api:
askai:
timeout: 60 # seconds
retries: 3 # retry attempts
```
**Configuration:** In `server/config/server.yaml` the `models` section selects the LLM and embedding providers.
For local debugging with HuggingFace and Ollama:
```yaml
models:
embedder:
provider: "huggingface_hub"
models: "bge-m3"
endpoint: "http://127.0.0.1:9000/v1/embeddings"
generator:
provider: "ollama"
models:
- 'llama2:13b'
endpoint: "http://127.0.0.1:11434/v1/chat/completions"
```
For online services using Chutes:
```yaml
#models:
# embedder:
# provider: "chutes"
# models: "bge-m3"
# endpoint: "https://chutes-baai-bge-m3.chutes.ai/embed"
# token: "cpk_xxxx"
# generator:
# provider: "chutes"
# models:
# - 'moonshotai/Kimi-K2-Instruct'
# endpoint: "https://llm.chutes.ai/v1/chat/completions"
# token: "cpk_xxxx"
```
The `api.askai` section controls request behaviour:
```yaml
api:
askai:
timeout: 60 # seconds
retries: 3 # retry attempts
```
- **Test:**
```bash
curl -X POST http://localhost:8080/api/askai \

17
docs/changelog.md Normal file

@ -0,0 +1,17 @@
# Changelog
## Milestone 1: MVP
- Use default Redis port (#98) and establish PostgreSQL & Redis baseline.
- Stream RAG sync progress for GitHub repository synchronization (#100).
- Add client-side Markdown parsing to the CLI (#104).
- Refactor RAG ingestion into the CLI with a server upsert endpoint (#103).
- Perform RAG API functional tests.
- Support per-file ingestion workflow in the CLI (#115).
- Allow RAG upsert to migrate embedding dimensions (#119).
- Add pgvector database initialization guide (#120).
- Ingest files automatically (#123).
## Milestone 2: Hybrid Search (In Progress)
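- Rename the RAG phase-two optimization plan to `docs/Milestone-2.md` and add a subtask list.
- Plan for the AskAI endpoint and CLI to use the LangChainGo framework to support multiple models and chained calls.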
- Document local and Chutes model configurations for AskAI.

10
go.mod

@ -11,6 +11,7 @@ require (
github.com/pgvector/pgvector-go v0.3.0
github.com/redis/go-redis/v9 v9.12.0
github.com/spf13/cobra v1.9.1
github.com/tmc/langchaingo v0.1.13
github.com/yuin/goldmark v1.7.13
golang.org/x/net v0.39.0
gopkg.in/yaml.v3 v3.0.1
@ -27,6 +28,7 @@ require (
github.com/cloudflare/circl v1.6.1 // indirect
github.com/cyphar/filepath-securejoin v0.4.1 // indirect
github.com/dgryski/go-rendezvous v0.0.0-20200823014737-9f7001d12a5f // indirect
github.com/dlclark/regexp2 v1.10.0 // indirect
github.com/emirpasic/gods v1.18.1 // indirect
github.com/gabriel-vasile/mimetype v1.4.2 // indirect
github.com/gin-contrib/sse v0.1.0 // indirect
@ -37,6 +39,7 @@ require (
github.com/go-playground/validator/v10 v10.14.0 // indirect
github.com/goccy/go-json v0.10.2 // indirect
github.com/golang/groupcache v0.0.0-20241129210726-2c02b8208cf8 // indirect
github.com/google/uuid v1.6.0 // indirect
github.com/inconshreveable/mousetrap v1.1.0 // indirect
github.com/jackc/pgpassfile v1.0.0 // indirect
github.com/jackc/pgservicefile v0.0.0-20240606120523-5a60cdf6a761 // indirect
@ -47,11 +50,12 @@ require (
github.com/kevinburke/ssh_config v1.2.0 // indirect
github.com/klauspost/cpuid/v2 v2.2.4 // indirect
github.com/leodido/go-urn v1.2.4 // indirect
github.com/mattn/go-isatty v0.0.19 // indirect
github.com/mattn/go-isatty v0.0.20 // indirect
github.com/modern-go/concurrent v0.0.0-20180306012644-bacd9c7ef1dd // indirect
github.com/modern-go/reflect2 v1.0.2 // indirect
github.com/pelletier/go-toml/v2 v2.0.8 // indirect
github.com/pelletier/go-toml/v2 v2.0.9 // indirect
github.com/pjbgf/sha1cd v0.3.2 // indirect
github.com/pkoukk/tiktoken-go v0.1.6 // indirect
github.com/sergi/go-diff v1.3.2-0.20230802210424-5b0b94c5c0d3 // indirect
github.com/skeema/knownhosts v1.3.1 // indirect
github.com/spf13/pflag v1.0.6 // indirect
@ -62,6 +66,6 @@ require (
golang.org/x/crypto v0.37.0 // indirect
golang.org/x/sys v0.32.0 // indirect
golang.org/x/text v0.24.0 // indirect
google.golang.org/protobuf v1.33.0 // indirect
google.golang.org/protobuf v1.34.1 // indirect
gopkg.in/warnings.v0 v0.1.2 // indirect
)

23
go.sum

@ -33,6 +33,8 @@ github.com/davecgh/go-spew v1.1.1 h1:vj9j/u1bqnvCEfJOwUhtlOARqs3+rkHYY13jYWTU97c
github.com/davecgh/go-spew v1.1.1/go.mod h1:J7Y8YcW2NihsgmVo/mv3lAwl/skON4iLHjSsI+c5H38=
github.com/dgryski/go-rendezvous v0.0.0-20200823014737-9f7001d12a5f h1:lO4WD4F/rVNCu3HqELle0jiPLLBs70cWOduZpkS1E78=
github.com/dgryski/go-rendezvous v0.0.0-20200823014737-9f7001d12a5f/go.mod h1:cuUVRXasLTGF7a8hSLbxyZXjz+1KgoB3wDUb6vlszIc=
github.com/dlclark/regexp2 v1.10.0 h1:+/GIL799phkJqYW+3YbOd8LCcbHzT0Pbo8zl70MHsq0=
github.com/dlclark/regexp2 v1.10.0/go.mod h1:DHkYz0B9wPfa6wondMfaivmHpzrQ3v9q8cnmRbL6yW8=
github.com/elazarl/goproxy v1.7.2 h1:Y2o6urb7Eule09PjlhQRGNsqRfPmYI3KKQLFpCAV3+o=
github.com/elazarl/goproxy v1.7.2/go.mod h1:82vkLNir0ALaW14Rc399OTTjyNREgmdL2cVoIbS6XaE=
github.com/emirpasic/gods v1.18.1 h1:FXtiHYKDGKCW2KzwZKx0iC0PQmdlorYgdFG9jPXJ1Bc=
@ -110,8 +112,8 @@ github.com/leodido/go-urn v1.2.4 h1:XlAE/cm/ms7TE/VMVoduSpNBoyc2dOxHs5MZSwAN63Q=
github.com/leodido/go-urn v1.2.4/go.mod h1:7ZrI8mTSeBSHl/UaRyKQW1qZeMgak41ANeCNaVckg+4=
github.com/lib/pq v1.10.9 h1:YXG7RB+JIjhP29X+OtkiDnYaXQwpS4JEWq7dtCCRUEw=
github.com/lib/pq v1.10.9/go.mod h1:AlVN5x4E4T544tWzH6hKfbfQvm3HdbOxrmggDNAPY9o=
github.com/mattn/go-isatty v0.0.19 h1:JITubQf0MOLdlGRuRq+jtsDlekdYPia9ZFsB8h/APPA=
github.com/mattn/go-isatty v0.0.19/go.mod h1:W+V8PltTTMOvKvAeJH7IuucS94S2C6jfK/D7dTCTo3Y=
github.com/mattn/go-isatty v0.0.20 h1:xfD0iDuEKnDkl03q4limB+vH+GxLEtL/jb4xVJSWWEY=
github.com/mattn/go-isatty v0.0.20/go.mod h1:W+V8PltTTMOvKvAeJH7IuucS94S2C6jfK/D7dTCTo3Y=
github.com/modern-go/concurrent v0.0.0-20180228061459-e0a39a4cb421/go.mod h1:6dJC0mAP4ikYIbvyc7fijjWJddQyLn8Ig3JB5CqoB9Q=
github.com/modern-go/concurrent v0.0.0-20180306012644-bacd9c7ef1dd h1:TRLaZ9cD/w8PVh93nsPXa1VrQ6jlwL5oN8l14QlcNfg=
github.com/modern-go/concurrent v0.0.0-20180306012644-bacd9c7ef1dd/go.mod h1:6dJC0mAP4ikYIbvyc7fijjWJddQyLn8Ig3JB5CqoB9Q=
@ -119,14 +121,16 @@ github.com/modern-go/reflect2 v1.0.2 h1:xBagoLtFs94CBntxluKeaWgTMpvLxC4ur3nMaC9G
github.com/modern-go/reflect2 v1.0.2/go.mod h1:yWuevngMOJpCy52FWWMvUC8ws7m/LJsjYzDa0/r8luk=
github.com/onsi/gomega v1.34.1 h1:EUMJIKUjM8sKjYbtxQI9A4z2o+rruxnzNvpknOXie6k=
github.com/onsi/gomega v1.34.1/go.mod h1:kU1QgUvBDLXBJq618Xvm2LUX6rSAfRaFRTcdOeDLwwY=
github.com/pelletier/go-toml/v2 v2.0.8 h1:0ctb6s9mE31h0/lhu+J6OPmVeDxJn+kYnJc2jZR9tGQ=
github.com/pelletier/go-toml/v2 v2.0.8/go.mod h1:vuYfssBdrU2XDZ9bYydBu6t+6a6PYNcZljzZR9VXg+4=
github.com/pelletier/go-toml/v2 v2.0.9 h1:uH2qQXheeefCCkuBBSLi7jCiSmj3VRh2+Goq2N7Xxu0=
github.com/pelletier/go-toml/v2 v2.0.9/go.mod h1:tJU2Z3ZkXwnxa4DPO899bsyIoywizdUvyaeZurnPPDc=
github.com/pgvector/pgvector-go v0.3.0 h1:Ij+Yt78R//uYqs3Zk35evZFvr+G0blW0OUN+Q2D1RWc=
github.com/pgvector/pgvector-go v0.3.0/go.mod h1:duFy+PXWfW7QQd5ibqutBO4GxLsUZ9RVXhFZGIBsWSA=
github.com/pjbgf/sha1cd v0.3.2 h1:a9wb0bp1oC2TGwStyn0Umc/IGKQnEgF0vVaZ8QF8eo4=
github.com/pjbgf/sha1cd v0.3.2/go.mod h1:zQWigSxVmsHEZow5qaLtPYxpcKMMQpa09ixqBxuCS6A=
github.com/pkg/errors v0.9.1 h1:FEBLx1zS214owpjy7qsBeixbURkuhQAwrK5UwLGTwt4=
github.com/pkg/errors v0.9.1/go.mod h1:bwawxfHBFNV+L2hUp1rHADufV3IMtnDRdf1r5NINEl0=
github.com/pkoukk/tiktoken-go v0.1.6 h1:JF0TlJzhTbrI30wCvFuiw6FzP2+/bR+FIxUdgEAcUsw=
github.com/pkoukk/tiktoken-go v0.1.6/go.mod h1:9NiV+i9mJKGj1rYOT+njbv+ZwA/zJxYdewGl6qVatpg=
github.com/pmezard/go-difflib v1.0.0 h1:4DBwDE0NGyQoBHbLQYPwSUPoCMWR5BEzIk/f1lZbAQM=
github.com/pmezard/go-difflib v1.0.0/go.mod h1:iKH77koFhYxTK1pcRnkKkqfTogsbg7gZNVY4sRDYZ/4=
github.com/redis/go-redis/v9 v9.12.0 h1:XlVPGlflh4nxfhsNXPA8Qp6EmEfTo0rp8oaBzPipXnU=
@ -154,9 +158,11 @@ github.com/stretchr/testify v1.7.1/go.mod h1:6Fq8oRcR53rry900zMqJjRRixrwX3KX962/
github.com/stretchr/testify v1.8.0/go.mod h1:yNjHg4UonilssWZ8iaSj1OCr/vHnekPRkoO+kdMU+MU=
github.com/stretchr/testify v1.8.1/go.mod h1:w2LPCIKwWwSfY2zedu0+kehJoqGctiVI29o6fzry7u4=
github.com/stretchr/testify v1.8.2/go.mod h1:w2LPCIKwWwSfY2zedu0+kehJoqGctiVI29o6fzry7u4=
github.com/stretchr/testify v1.8.3/go.mod h1:sz/lmYIOXD/1dqDmKjjqLyZ2RngseejIcXlSw2iwfAo=
github.com/stretchr/testify v1.8.4/go.mod h1:sz/lmYIOXD/1dqDmKjjqLyZ2RngseejIcXlSw2iwfAo=
github.com/stretchr/testify v1.10.0 h1:Xv5erBjTwe/5IxqUQTdXv5kgmIvbHo3QQyRwhJsOfJA=
github.com/stretchr/testify v1.10.0/go.mod h1:r2ic/lqez/lEtzL7wO/rwa5dbSLXVDPFyf8C91i36aY=
github.com/tmc/langchaingo v0.1.13 h1:rcpMWBIi2y3B90XxfE4Ao8dhCQPVDMaNPnN5cGB1CaA=
github.com/tmc/langchaingo v0.1.13/go.mod h1:vpQ5NOIhpzxDfTZK9B6tf2GM/MoaHewPWM5KXXGh7hg=
github.com/tmthrgd/go-hex v0.0.0-20190904060850-447a3041c3bc h1:9lRDQMhESg+zvGYmW5DyG0UqvY96Bu5QYsTLvCHdrgo=
github.com/tmthrgd/go-hex v0.0.0-20190904060850-447a3041c3bc/go.mod h1:bciPuU6GHm1iF1pBvUfxfsH0Wmnc2VbpgvbI9ZWuIRs=
github.com/twitchyliquid64/golang-asm v0.15.1 h1:SU5vSMR7hnwNxj24w34ZyCi/FmDZTkS4MhqMhdFk5YI=
@ -213,8 +219,8 @@ golang.org/x/text v0.3.6/go.mod h1:5Zoc/QRtKVWzQhOtBMvqHzDpF6irO9z98xDceosuGiQ=
golang.org/x/text v0.24.0 h1:dd5Bzh4yt5KYA8f9CJHCP4FB4D51c2c6JvN37xJJkJ0=
golang.org/x/text v0.24.0/go.mod h1:L8rBsPeo2pSS+xqN0d5u2ikmjtmoJbDBT1b7nHvFCdU=
golang.org/x/tools v0.0.0-20180917221912-90fa682c2a6e/go.mod h1:n7NCudcB/nEzxVGmLbDWY5pfWTLqBcC2KZ6jyYvM4mQ=
google.golang.org/protobuf v1.33.0 h1:uNO2rsAINq/JlFpSdYEKIZ0uKD/R9cpdv0T+yoGwGmI=
google.golang.org/protobuf v1.33.0/go.mod h1:c6P6GXX6sHbq/GpV6MGZEdwhWPcYBgnhAHhKbcUYpos=
google.golang.org/protobuf v1.34.1 h1:9ddQBjfCyZPOHPUiPxpYESBLc+T8P3E+Vo4IbKZgFWg=
google.golang.org/protobuf v1.34.1/go.mod h1:c6P6GXX6sHbq/GpV6MGZEdwhWPcYBgnhAHhKbcUYpos=
gopkg.in/check.v1 v0.0.0-20161208181325-20d25e280405/go.mod h1:Co6ibVJAznAaIkqp8huTwlJQCZ016jof/cbN4VW5Yz0=
gopkg.in/check.v1 v1.0.0-20190902080502-41f04d3bba15/go.mod h1:Co6ibVJAznAaIkqp8huTwlJQCZ016jof/cbN4VW5Yz0=
gopkg.in/check.v1 v1.0.0-20201130134442-10cb98267c6c h1:Hei/4ADfdWqJk1ZMxUNpqntNwaWcugrBjAiHlqqRiVk=
@ -222,6 +228,7 @@ gopkg.in/check.v1 v1.0.0-20201130134442-10cb98267c6c/go.mod h1:JHkPIbrfpd72SG/EV
gopkg.in/warnings.v0 v0.1.2 h1:wFXVbFY8DY5/xOe1ECiWdKCzZlxgshcYVNkBHstARME=
gopkg.in/warnings.v0 v0.1.2/go.mod h1:jksf8JmL6Qr/oQM2OXTHunEvvTAsrWBLb6OOjuVWRNI=
gopkg.in/yaml.v2 v2.2.2/go.mod h1:hI93XBmqTisBFMUTm0b8Fm+jr3Dg1NNxqwp+5A1VGuI=
gopkg.in/yaml.v2 v2.4.0 h1:D8xgwECY7CYvx+Y2n4sBz93Jn9JRvxdiyyo8CTfuKaY=
gopkg.in/yaml.v2 v2.4.0/go.mod h1:RDklbk79AGWmwhnvt/jBztapEOGDOx6ZbXqjP6csGnQ=
gopkg.in/yaml.v3 v3.0.0-20200313102051-9f266ea9e77c/go.mod h1:K4uyk7z7BCEPqu6E+C64Yfv1cQ7kz7rIZviUmN+EgEM=
gopkg.in/yaml.v3 v3.0.1 h1:fxVm/GzAzEWqLHuvctI91KS9hhNmmWOoWu0XTYJS7CA=
@ -233,3 +240,5 @@ gorm.io/gorm v1.25.5/go.mod h1:hbnx/Oo0ChWMn1BIhpy1oYozzpM15i4YPuHDmfYtwg8=
mellium.im/sasl v0.3.1 h1:wE0LW6g7U83vhvxjC1IY8DnXM+EU095yeo8XClvCdfo=
mellium.im/sasl v0.3.1/go.mod h1:xm59PUYpZHhgQ9ZqoJ5QaCqzWMi8IeS49dhp6plPCzw=
rsc.io/pdf v0.1.1/go.mod h1:n8OzWcQ6Sp37PL01nO98y4iUCRdTGarVfzxY20ICaU4=
sigs.k8s.io/yaml v1.3.0 h1:a2VclLzOGrwOHDiV8EfBGhvjHvP46CtW5j6POvhYGGo=
sigs.k8s.io/yaml v1.3.0/go.mod h1:GeOyir5tyXNByN85N/dRIT9es5UQNerPYEKK56eTBm8=


@ -1,11 +1,8 @@
package api
import (
"bytes"
"encoding/json"
"errors"
"context"
"fmt"
"io"
"log/slog"
"net/http"
"os"
@ -14,6 +11,9 @@ import (
"time"
"github.com/gin-gonic/gin"
"github.com/tmc/langchaingo/llms"
"github.com/tmc/langchaingo/llms/ollama"
"github.com/tmc/langchaingo/llms/openai"
"gopkg.in/yaml.v3"
)
@ -128,7 +128,7 @@ func loadConfig() (string, string, string, string, time.Duration, int) {
endpoint = "http://localhost:11434/v1/chat/completions"
}
if model == "" {
model = "gpt-oss:20b"
model = "llama2:13b"
}
return provider, token, model, endpoint, timeout, retries
case "chutes":
@ -150,150 +150,50 @@ func loadConfig() (string, string, string, string, time.Duration, int) {
}
}
// callChutes sends the question to the hosted LLM service and returns the reply.
func callChutes(token, model, url string, timeout time.Duration, retries int, question string) (string, error) {
if token == "" || token == "cpk_xxxxxxx" {
return "", errors.New("chutes token not set")
}
reqBody := map[string]interface{}{
"model": model,
"messages": []interface{}{map[string]interface{}{"role": "user", "content": question}},
"stream": false,
"max_tokens": 1024,
"temperature": 0.7,
}
data, err := json.Marshal(reqBody)
if err != nil {
return "", err
}
client := &http.Client{Timeout: timeout}
var lastErr error
for i := 0; i <= retries; i++ {
req, err := http.NewRequest("POST", url, bytes.NewBuffer(data))
if err != nil {
return "", err
}
req.Header.Set("Authorization", "Bearer "+token)
req.Header.Set("Content-Type", "application/json")
resp, err := client.Do(req)
if err != nil {
lastErr = err
continue
}
b, err := io.ReadAll(resp.Body)
resp.Body.Close()
if err != nil {
lastErr = err
continue
}
if resp.StatusCode != http.StatusOK {
lastErr = fmt.Errorf("chutes API error: %s", string(b))
continue
}
var res struct {
Choices []struct {
Message struct {
Content string `json:"content"`
} `json:"message"`
} `json:"choices"`
}
if err := json.Unmarshal(b, &res); err != nil {
lastErr = err
continue
}
if len(res.Choices) == 0 {
lastErr = errors.New("no choices returned")
continue
}
return res.Choices[0].Message.Content, nil
}
if lastErr == nil {
lastErr = errors.New("request failed")
}
return "", lastErr
}
// callOllama sends the question to a local Ollama server.
func callOllama(model, url string, timeout time.Duration, retries int, question string) (string, error) {
reqBody := map[string]any{
"model": model,
"messages": []any{map[string]any{"role": "user", "content": question}},
"stream": false,
"max_tokens": 1024,
"temperature": 0.7,
}
data, err := json.Marshal(reqBody)
if err != nil {
return "", err
}
client := &http.Client{Timeout: timeout}
var lastErr error
for i := 0; i <= retries; i++ {
req, err := http.NewRequest("POST", url, bytes.NewReader(data))
if err != nil {
return "", err
}
req.Header.Set("Content-Type", "application/json")
resp, err := client.Do(req)
if err != nil {
lastErr = err
continue
}
b, err := io.ReadAll(resp.Body)
resp.Body.Close()
if err != nil {
lastErr = err
continue
}
if resp.StatusCode != http.StatusOK {
lastErr = fmt.Errorf("ollama API error: %s", string(b))
continue
}
var res struct {
Choices []struct {
Message struct {
Content string `json:"content"`
} `json:"message"`
} `json:"choices"`
}
if err := json.Unmarshal(b, &res); err != nil {
lastErr = err
continue
}
if len(res.Choices) == 0 {
lastErr = errors.New("no choices returned")
continue
}
return res.Choices[0].Message.Content, nil
}
if lastErr == nil {
lastErr = errors.New("request failed")
}
return "", lastErr
}
// callLLM dispatches the question to the configured provider.
// callLLM dispatches the question to the configured provider using LangChainGo.
func callLLM(question string) (string, error) {
provider, token, model, url, timeout, retries := loadConfig()
httpClient := &http.Client{Timeout: timeout}
var (
answer string
err error
llm llms.Model
err error
)
switch provider {
case "ollama":
answer, err = callOllama(model, url, timeout, retries, question)
case "chutes":
answer, err = callChutes(token, model, url, timeout, retries, question)
llm, err = ollama.New(
ollama.WithModel(model),
ollama.WithServerURL(url),
ollama.WithHTTPClient(httpClient),
)
default:
answer, err = callChutes(token, model, url, timeout, retries, question)
llm, err = openai.New(
openai.WithToken(token),
openai.WithModel(model),
openai.WithBaseURL(url),
openai.WithHTTPClient(httpClient),
)
}
if err != nil {
return "", fmt.Errorf("%w (timeout=%s retries=%d)", err, timeout, retries)
return "", fmt.Errorf("init llm: %w", err)
}
return answer, nil
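var answer string
var lastErr error
for i := 0; i <= retries; i++ {
// Give each attempt its own deadline. A single context shared by all
// attempts would already be expired after the first timeout, so every
// retry would fail immediately.
ctx, cancel := context.WithTimeout(context.Background(), timeout)
answer, lastErr = llms.GenerateFromSinglePrompt(ctx, llm, question)
cancel()
if lastErr == nil {
return answer, nil
}
}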
if lastErr == nil {
lastErr = fmt.Errorf("request failed")
}
return "", fmt.Errorf("%w (timeout=%s retries=%d)", lastErr, timeout, retries)
}
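One thing worth double-checking in this hunk: the configured `endpoint` values are full chat-completion URLs (`.../v1/chat/completions`), while `ollama.WithServerURL` and `openai.WithBaseURL` take, to my understanding, a base URL to which the client appends its own request path. If that holds for the pinned LangChainGo version, the endpoint would need trimming before being passed in. The helper below is a hypothetical sketch of such trimming; `baseURL` is not part of this commit, and it assumes the Ollama client wants the server root while the OpenAI-compatible client wants the `/v1` prefix.

```go
package main

import (
	"fmt"
	"strings"
)

// baseURL is a hypothetical helper that reduces a full chat-completions
// endpoint to the base URL a client library would extend with its own path.
// Assumption: the Ollama client wants the server root and the
// OpenAI-compatible client wants the URL up to and including "/v1".
func baseURL(provider, endpoint string) string {
	u := strings.TrimRight(endpoint, "/")
	u = strings.TrimSuffix(u, "/chat/completions")
	if provider == "ollama" {
		u = strings.TrimSuffix(u, "/v1")
	}
	return u
}

func main() {
	fmt.Println(baseURL("ollama", "http://127.0.0.1:11434/v1/chat/completions"))
	fmt.Println(baseURL("chutes", "https://llm.chutes.ai/v1/chat/completions"))
}
```

An alternative would be to change the documented `endpoint` values to base URLs, but that would alter the configuration format for existing deployments, so normalizing in code may be the less disruptive option.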


@ -28,7 +28,7 @@ models:
generator:
provider: "ollama"
models:
- 'gpt-oss:20b'
- 'llama2:13b'
endpoint: "http://127.0.0.1:11434/v1/chat/completions"
token: ""
# For PROD
@ -36,14 +36,14 @@ models:
# embedder:
#provider: "chutes"
#models: "bge-m3"
#endpoint: "https://chutes-baai-bge-m3.chutes.ai/embed/v1/embeddings"
#token: "cpk_xxxxxxxxxxxxxxxxxxxx"
#endpoint: "https://chutes-baai-bge-m3.chutes.ai/embed"
#token: "cpk_xxxx"
# generator:
#provider: "chutes"
#endpoint: "https://llm.chutes.ai/v1/chat/completions"
#token: "cpk_xxxxxxxxxxxxxxxxxxxx"
#models:
# - 'moonshotai/Kimi-K2-Instruct'
#endpoint: "https://llm.chutes.ai/v1/chat/completions"
#token: "cpk_xxxx"
embedding:
max_batch: 64