docs: detail Milestone 2 subtasks
parent 66a3057388
commit ecebcb6d64

README.md (48 lines changed)
@@ -22,7 +22,19 @@ All UI components provide both Chinese and English interfaces.
 | Gateway | OpenResty | 1.27.1.2 |
 | Database | PostgreSQL + pgvector | N/A |
 | Cache | Redis | N/A |
-| Model | Large Language Models via CodePRobot | N/A |
+| Model (Local) | HuggingFace Hub + Ollama | N/A |
+| Model (Online) | Chutes AskAI + CodePRobot | N/A |

+## LangChainGo Core Features at a Glance
+
+XControl uses LangChainGo to provide unified access to multiple large language models and to give AskAI, the CLI, and the Server chained-call capabilities:
+
+- **LLM interface layer (Model I/O)**: a unified way to call model APIs such as OpenAI, Hugging Face, Ollama, Google AI, and Cohere.
+- **Chains**: compose prompts, retrieval results, and tool calls into end-to-end flows for scenarios such as RAG, chat, and code generation.
+- **Tools and agents**: define tools such as web search, a scraper, and SQL queries, and integrate them into an LLM agent for ReAct-style tool calling.
+- **Vector retrieval and data ingestion**: adapters for vector stores including PGVector, Weaviate, Qdrant, MongoDB Atlas Vector Search, Chroma, Pinecone, and Redis Vector.
+- **Document loading and chunking**: Document Loaders and Text Splitters for processing long texts and building retrieval chunks.
+- **Memory and history tracking**: conversational memory such as Conversation Buffer to improve the interactive experience.
+
 ## Supported Platforms

@@ -88,6 +100,40 @@ log:

 The flag value takes precedence over the configuration file.

+## Changelog
+
+See [docs/changelog.md](./docs/changelog.md) for a list of completed changes, including all work from Milestone 1.
+
+## Roadmap
+
+The roadmap below is also available in [docs/Roadmap.md](./docs/Roadmap.md).
+
+### Milestone 1: MVP (Completed)
+- Use default Redis port (#98) and establish PostgreSQL & Redis baseline.
+- Stream RAG sync progress for GitHub repository synchronization (#100).
+- Add client-side Markdown parsing to the CLI (#104).
+- Refactor RAG ingestion into the CLI with a server upsert endpoint (#103).
+- RAG API functional tests and per-file ingestion workflow (#115).
+- Allow RAG upsert to migrate embedding dimensions (#119) and document pgvector initialization (#120).
+- Ingest files automatically (#123).
+
+### Milestone 2: Hybrid Search
+- CLI and server dynamically support 1024-dimensional embeddings.
+- Update docs and configs to vector(1024) (#130).
+- Add embedding configuration fields (#131).
+- Add RAG API integration tests for vectors (#132).
+- Add Ollama support (#136).
+- Deploy homepage via rsync from CI and fix SSH directory creation (#18, #19).
+- Deploy XControl panel via GitHub Actions (#20).
+- Fix yarn lock context concatenation (#21).
+
+### Milestone 3: Production Monitoring & Optimization
+- Switch server and CLI to Cobra (#133).
+- Add repo sync proxy configuration (#135).
+- Allow custom AskAI timeout (#141).
+- Add log level support to CLI and server and log AskAI errors (#125, #140).
+- Continue performance optimization, error handling, multi-model support, permission control, hot reload, and improve RAG upsert docs (#129).
+
 ## License

 This project is licensed under the terms of the [MIT License](./LICENSE).
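docs/Milestone-2-todo.md (new file, 36 lines added)
@@ -0,0 +1,36 @@
+# Milestone 2 TODO
+
+Subtask plan for using the LangChainGo framework to improve the CLI, the Server, and the AskAI interface:
+
+1. **LLM interface layer (Model I/O)**
+   - [ ] Build a provider registry for models such as OpenAI, Hugging Face, Ollama, Google AI, and Cohere.
+   - [ ] Expose model-provider switching in the CLI and Server configuration.
+   - [ ] Write unit tests that verify switching between providers.
+   - [ ] Document configuration and environment-variable usage.
+2. **Chains**
+   - [ ] Combine prompts, retrieval results, and tool calls into RAG and chat chains.
+   - [ ] Provide reusable chain definitions for AskAI to support complex task orchestration.
+   - [ ] Provide chain-invocation examples in the CLI.
+   - [ ] Write integration tests for chains.
+3. **Tools and agents**
+   - [ ] Implement common tools such as web search, a scraper, and SQL queries.
+   - [ ] Register tools with the agent framework to support dynamic invocation.
+   - [ ] Demonstrate ReAct-style tool calling in the CLI.
+   - [ ] Add test cases for tool and agent interaction.
+4. **Vector retrieval and data ingestion**
+   - [ ] Integrate stores such as PGVector, Weaviate, Qdrant, Chroma, Pinecone, and Redis Vector.
+   - [ ] Support custom vector dimensions and retrieval parameters.
+   - [ ] Write benchmarks comparing the different vector stores.
+   - [ ] Provide documentation examples for tuning retrieval parameters.
+5. **Document loading and chunking**
+   - [ ] Provide Document Loaders for multiple formats, including Markdown, code, and HTML.
+   - [ ] Support token-based and recursive Text Splitter strategies.
+   - [ ] Store chunking results in a unified way and support an incremental-update API.
+   - [ ] Write tests for loaders and splitters.
+6. **Memory and history tracking**
+   - [ ] Add conversational memory such as a conversation buffer to AskAI.
+   - [ ] Persist session history in the Server and expose configuration options.
+   - [ ] Support adjusting memory length and cleanup policy.
+   - [ ] Write end-to-end tests that verify memory retention.
+
+These tasks will be carried out incrementally to reach the goals of hybrid search and multi-model support.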
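docs/Milestone-2.md (new file, 37 lines added)
@@ -0,0 +1,37 @@
+# Milestone 2: Hybrid Search
+
+RAG Phase 2 Optimization Plan
+
+Following the GitHub issue "RAG 第二优化节点阶段" ("RAG second optimization phase"), this phase continues iterating on the existing RAG system. The goals are to improve retrieval quality and service stability and to extend support for multiple models and data sources.
+
+## Goals
+
+- Improve the accuracy and performance of vector retrieval.
+- Support incremental sync and ingestion from multiple repositories.
+- Offer a choice of embedding models and LLMs to allow flexible deployment.
+- Strengthen error handling, monitoring, and automated testing for the API and CLI.
+
+## Main Tasks
+
+1. **Vector retrieval optimization**
+   - Compare and evaluate different embedding models and similarity metrics.
+   - Introduce vector indexing/compression strategies to reduce query latency.
+2. **Data sync pipeline**
+   - Implement an incremental update mechanism that rebuilds vectors on demand.
+   - Support sync progress tracking and retry on failure.
+3. **Multi-model support and configuration**
+   - Use LangChainGo to provide unified access to local and cloud models.
+   - Allow custom parameters and timeout settings per model.
+4. **API and CLI stability**
+   - Improve error handling and logging, and expose more diagnostic information.
+   - Extend integration tests to cover the RAG upsert and query flows.
+5. **Monitoring and observability**
+   - Add metrics and log reporting to support performance analysis.
+   - Build health checks and alerting.
+
+## Milestones
+
+- **M2.1**: Complete a prototype validating incremental sync and retrieval optimization.
+- **M2.2**: Integrate multi-model support and bring the monitoring system online.
+- **M2.3**: Complete automated tests and documentation in preparation for the next iteration.
+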
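Task 2 of this plan calls for an incremental update mechanism that rebuilds vectors on demand. One common approach is to store a content hash for each file at every sync and re-embed only the files whose hash has changed. The sketch below follows that approach under stated assumptions; it is not taken from the project. The in-memory `previous` map stands in for wherever those hashes would actually be persisted, for example alongside the pgvector rows. `SyncPlan` and `planSync` are hypothetical names.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
)

// SyncPlan lists which documents need (re-)embedding and which vectors to drop.
type SyncPlan struct {
	Upsert []string // new or modified paths
	Delete []string // paths that disappeared since the last sync
}

// contentHash returns the hex SHA-256 digest of a document's content.
func contentHash(content []byte) string {
	sum := sha256.Sum256(content)
	return hex.EncodeToString(sum[:])
}

// planSync compares the current files with the hashes recorded at the
// previous sync. Only new or changed files are scheduled for embedding.
func planSync(current map[string][]byte, previous map[string]string) SyncPlan {
	var plan SyncPlan
	for path, content := range current {
		// A missing key yields "", which never equals a real digest.
		if previous[path] != contentHash(content) {
			plan.Upsert = append(plan.Upsert, path)
		}
	}
	for path := range previous {
		if _, ok := current[path]; !ok {
			plan.Delete = append(plan.Delete, path)
		}
	}
	// Map iteration order is random; sort for deterministic output.
	sort.Strings(plan.Upsert)
	sort.Strings(plan.Delete)
	return plan
}

func main() {
	previous := map[string]string{
		"README.md":    contentHash([]byte("old readme")),
		"docs/a.md":    contentHash([]byte("unchanged")),
		"docs/gone.md": contentHash([]byte("deleted later")),
	}
	current := map[string][]byte{
		"README.md": []byte("new readme"),
		"docs/a.md": []byte("unchanged"),
		"docs/b.md": []byte("brand new"),
	}
	plan := planSync(current, previous)
	fmt.Println("upsert:", plan.Upsert) // upsert: [README.md docs/b.md]
	fmt.Println("delete:", plan.Delete) // delete: [docs/gone.md]
}
```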
docs/Roadmap.md (new file, 27 lines added)
@@ -0,0 +1,27 @@
+# Roadmap
+
+## Milestone 1: MVP (Completed)
+- Use default Redis port (#98) and establish PostgreSQL & Redis baseline.
+- Stream RAG sync progress for GitHub repository synchronization (#100).
+- Add client-side Markdown parsing to the CLI (#104).
+- Refactor RAG ingestion into the CLI with a server upsert endpoint (#103).
+- RAG API functional tests and per-file ingestion workflow (#115).
+- Allow RAG upsert to migrate embedding dimensions (#119) and document pgvector initialization (#120).
+- Ingest files automatically (#123).
+
+## Milestone 2: Hybrid Search
+- CLI and server dynamically support 1024-dimensional embeddings.
+- Update docs and configs to vector(1024) (#130).
+- Add embedding configuration fields (#131).
+- Add RAG API integration tests for vectors (#132).
+- Add Ollama support (#136).
+- Deploy homepage via rsync from CI and fix SSH directory creation (#18, #19).
+- Deploy XControl panel via GitHub Actions (#20).
+- Fix yarn lock context concatenation (#21).
+
+## Milestone 3: Production Monitoring & Optimization
+- Switch server and CLI to Cobra (#133).
+- Add repo sync proxy configuration (#135).
+- Allow custom AskAI timeout (#141).
+- Add log level support to CLI and server and log AskAI errors (#125, #140).
+- Continue performance optimization, error handling, multi-model support, permission control, hot reload, and improve RAG upsert docs (#129).
@@ -74,16 +74,51 @@ Expected response on success: `{"rows":1}`. If the vector database is unavailabl
 ```

 ## POST /api/askai
-- **Description:** Ask the AI service for an answer. Requires a valid Chutes token in the server configuration.
+- **Description:** Ask the AI service for an answer. The endpoint uses [LangChainGo](https://github.com/tmc/langchaingo) to communicate with the configured model provider (e.g., OpenAI-compatible services or a local Ollama instance). Ensure the server configuration includes the proper token or local server URL.
 - **Body Parameters (JSON):**
   - `question` – Question text.
-- **Configuration:** In `server/config/server.yaml` the `api.askai` section controls request behaviour:
-```yaml
-api:
-  askai:
-    timeout: 60 # seconds
-    retries: 3 # retry attempts
-```
+**Configuration:** In `server/config/server.yaml` the `models` section selects the LLM and embedding providers.
+For local debugging with HuggingFace and Ollama:
+
+```yaml
+models:
+  embedder:
+    provider: "huggingface_hub"
+    models: "bge-m3"
+    endpoint: "http://127.0.0.1:9000/v1/embeddings"
+  generator:
+    provider: "ollama"
+    models:
+      - 'llama2:13b'
+    endpoint: "http://127.0.0.1:11434/v1/chat/completions"
+```
+
+For online services using Chutes:
+
+```yaml
+#models:
+#  embedder:
+#    provider: "chutes"
+#    models: "bge-m3"
+#    endpoint: "https://chutes-baai-bge-m3.chutes.ai/embed"
+#    token: "cpk_xxxx"
+#  generator:
+#    provider: "chutes"
+#    models:
+#      - 'moonshotai/Kimi-K2-Instruct'
+#    endpoint: "https://llm.chutes.ai/v1/chat/completions"
+#    token: "cpk_xxxx"
+```
+
+The `api.askai` section controls request behaviour:
+
+```yaml
+api:
+  askai:
+    timeout: 60 # seconds
+    retries: 3 # retry attempts
+```
+
 - **Test:**
 ```bash
 curl -X POST http://localhost:8080/api/askai \
docs/changelog.md (new file, 17 lines added)
@@ -0,0 +1,17 @@
+# Changelog
+
+## Milestone 1: MVP
+- Use default Redis port (#98) and establish PostgreSQL & Redis baseline.
+- Stream RAG sync progress for GitHub repository synchronization (#100).
+- Add client-side Markdown parsing to the CLI (#104).
+- Refactor RAG ingestion into the CLI with a server upsert endpoint (#103).
+- Perform RAG API functional tests.
+- Support per-file ingestion workflow in the CLI (#115).
+- Allow RAG upsert to migrate embedding dimensions (#119).
+- Add pgvector database initialization guide (#120).
+- Ingest files automatically (#123).
+
+## Milestone 2: Hybrid Search (In Progress)
+- Rename the RAG Phase 2 optimization plan to `docs/Milestone-2.md` and add a subtask list.
+- Plan for the AskAI interface and CLI to adopt the LangChainGo framework to support multiple models and chained calls.
+- Document local and Chutes model configurations for AskAI.
go.mod (10 lines changed)
@@ -11,6 +11,7 @@ require (
 	github.com/pgvector/pgvector-go v0.3.0
 	github.com/redis/go-redis/v9 v9.12.0
 	github.com/spf13/cobra v1.9.1
+	github.com/tmc/langchaingo v0.1.13
 	github.com/yuin/goldmark v1.7.13
 	golang.org/x/net v0.39.0
 	gopkg.in/yaml.v3 v3.0.1
@@ -27,6 +28,7 @@ require (
 	github.com/cloudflare/circl v1.6.1 // indirect
 	github.com/cyphar/filepath-securejoin v0.4.1 // indirect
 	github.com/dgryski/go-rendezvous v0.0.0-20200823014737-9f7001d12a5f // indirect
+	github.com/dlclark/regexp2 v1.10.0 // indirect
 	github.com/emirpasic/gods v1.18.1 // indirect
 	github.com/gabriel-vasile/mimetype v1.4.2 // indirect
 	github.com/gin-contrib/sse v0.1.0 // indirect
@@ -37,6 +39,7 @@ require (
 	github.com/go-playground/validator/v10 v10.14.0 // indirect
 	github.com/goccy/go-json v0.10.2 // indirect
 	github.com/golang/groupcache v0.0.0-20241129210726-2c02b8208cf8 // indirect
+	github.com/google/uuid v1.6.0 // indirect
 	github.com/inconshreveable/mousetrap v1.1.0 // indirect
 	github.com/jackc/pgpassfile v1.0.0 // indirect
 	github.com/jackc/pgservicefile v0.0.0-20240606120523-5a60cdf6a761 // indirect
@@ -47,11 +50,12 @@ require (
 	github.com/kevinburke/ssh_config v1.2.0 // indirect
 	github.com/klauspost/cpuid/v2 v2.2.4 // indirect
 	github.com/leodido/go-urn v1.2.4 // indirect
-	github.com/mattn/go-isatty v0.0.19 // indirect
+	github.com/mattn/go-isatty v0.0.20 // indirect
 	github.com/modern-go/concurrent v0.0.0-20180306012644-bacd9c7ef1dd // indirect
 	github.com/modern-go/reflect2 v1.0.2 // indirect
-	github.com/pelletier/go-toml/v2 v2.0.8 // indirect
+	github.com/pelletier/go-toml/v2 v2.0.9 // indirect
 	github.com/pjbgf/sha1cd v0.3.2 // indirect
+	github.com/pkoukk/tiktoken-go v0.1.6 // indirect
 	github.com/sergi/go-diff v1.3.2-0.20230802210424-5b0b94c5c0d3 // indirect
 	github.com/skeema/knownhosts v1.3.1 // indirect
 	github.com/spf13/pflag v1.0.6 // indirect
@@ -62,6 +66,6 @@ require (
 	golang.org/x/crypto v0.37.0 // indirect
 	golang.org/x/sys v0.32.0 // indirect
 	golang.org/x/text v0.24.0 // indirect
-	google.golang.org/protobuf v1.33.0 // indirect
+	google.golang.org/protobuf v1.34.1 // indirect
 	gopkg.in/warnings.v0 v0.1.2 // indirect
 )
go.sum (23 lines changed)
@@ -33,6 +33,8 @@ github.com/davecgh/go-spew v1.1.1 h1:vj9j/u1bqnvCEfJOwUhtlOARqs3+rkHYY13jYWTU97c
 github.com/davecgh/go-spew v1.1.1/go.mod h1:J7Y8YcW2NihsgmVo/mv3lAwl/skON4iLHjSsI+c5H38=
 github.com/dgryski/go-rendezvous v0.0.0-20200823014737-9f7001d12a5f h1:lO4WD4F/rVNCu3HqELle0jiPLLBs70cWOduZpkS1E78=
 github.com/dgryski/go-rendezvous v0.0.0-20200823014737-9f7001d12a5f/go.mod h1:cuUVRXasLTGF7a8hSLbxyZXjz+1KgoB3wDUb6vlszIc=
+github.com/dlclark/regexp2 v1.10.0 h1:+/GIL799phkJqYW+3YbOd8LCcbHzT0Pbo8zl70MHsq0=
+github.com/dlclark/regexp2 v1.10.0/go.mod h1:DHkYz0B9wPfa6wondMfaivmHpzrQ3v9q8cnmRbL6yW8=
 github.com/elazarl/goproxy v1.7.2 h1:Y2o6urb7Eule09PjlhQRGNsqRfPmYI3KKQLFpCAV3+o=
 github.com/elazarl/goproxy v1.7.2/go.mod h1:82vkLNir0ALaW14Rc399OTTjyNREgmdL2cVoIbS6XaE=
 github.com/emirpasic/gods v1.18.1 h1:FXtiHYKDGKCW2KzwZKx0iC0PQmdlorYgdFG9jPXJ1Bc=
@@ -110,8 +112,8 @@ github.com/leodido/go-urn v1.2.4 h1:XlAE/cm/ms7TE/VMVoduSpNBoyc2dOxHs5MZSwAN63Q=
 github.com/leodido/go-urn v1.2.4/go.mod h1:7ZrI8mTSeBSHl/UaRyKQW1qZeMgak41ANeCNaVckg+4=
 github.com/lib/pq v1.10.9 h1:YXG7RB+JIjhP29X+OtkiDnYaXQwpS4JEWq7dtCCRUEw=
 github.com/lib/pq v1.10.9/go.mod h1:AlVN5x4E4T544tWzH6hKfbfQvm3HdbOxrmggDNAPY9o=
-github.com/mattn/go-isatty v0.0.19 h1:JITubQf0MOLdlGRuRq+jtsDlekdYPia9ZFsB8h/APPA=
-github.com/mattn/go-isatty v0.0.19/go.mod h1:W+V8PltTTMOvKvAeJH7IuucS94S2C6jfK/D7dTCTo3Y=
+github.com/mattn/go-isatty v0.0.20 h1:xfD0iDuEKnDkl03q4limB+vH+GxLEtL/jb4xVJSWWEY=
+github.com/mattn/go-isatty v0.0.20/go.mod h1:W+V8PltTTMOvKvAeJH7IuucS94S2C6jfK/D7dTCTo3Y=
 github.com/modern-go/concurrent v0.0.0-20180228061459-e0a39a4cb421/go.mod h1:6dJC0mAP4ikYIbvyc7fijjWJddQyLn8Ig3JB5CqoB9Q=
 github.com/modern-go/concurrent v0.0.0-20180306012644-bacd9c7ef1dd h1:TRLaZ9cD/w8PVh93nsPXa1VrQ6jlwL5oN8l14QlcNfg=
 github.com/modern-go/concurrent v0.0.0-20180306012644-bacd9c7ef1dd/go.mod h1:6dJC0mAP4ikYIbvyc7fijjWJddQyLn8Ig3JB5CqoB9Q=
@@ -119,14 +121,16 @@ github.com/modern-go/reflect2 v1.0.2 h1:xBagoLtFs94CBntxluKeaWgTMpvLxC4ur3nMaC9G
 github.com/modern-go/reflect2 v1.0.2/go.mod h1:yWuevngMOJpCy52FWWMvUC8ws7m/LJsjYzDa0/r8luk=
 github.com/onsi/gomega v1.34.1 h1:EUMJIKUjM8sKjYbtxQI9A4z2o+rruxnzNvpknOXie6k=
 github.com/onsi/gomega v1.34.1/go.mod h1:kU1QgUvBDLXBJq618Xvm2LUX6rSAfRaFRTcdOeDLwwY=
-github.com/pelletier/go-toml/v2 v2.0.8 h1:0ctb6s9mE31h0/lhu+J6OPmVeDxJn+kYnJc2jZR9tGQ=
-github.com/pelletier/go-toml/v2 v2.0.8/go.mod h1:vuYfssBdrU2XDZ9bYydBu6t+6a6PYNcZljzZR9VXg+4=
+github.com/pelletier/go-toml/v2 v2.0.9 h1:uH2qQXheeefCCkuBBSLi7jCiSmj3VRh2+Goq2N7Xxu0=
+github.com/pelletier/go-toml/v2 v2.0.9/go.mod h1:tJU2Z3ZkXwnxa4DPO899bsyIoywizdUvyaeZurnPPDc=
 github.com/pgvector/pgvector-go v0.3.0 h1:Ij+Yt78R//uYqs3Zk35evZFvr+G0blW0OUN+Q2D1RWc=
 github.com/pgvector/pgvector-go v0.3.0/go.mod h1:duFy+PXWfW7QQd5ibqutBO4GxLsUZ9RVXhFZGIBsWSA=
 github.com/pjbgf/sha1cd v0.3.2 h1:a9wb0bp1oC2TGwStyn0Umc/IGKQnEgF0vVaZ8QF8eo4=
 github.com/pjbgf/sha1cd v0.3.2/go.mod h1:zQWigSxVmsHEZow5qaLtPYxpcKMMQpa09ixqBxuCS6A=
 github.com/pkg/errors v0.9.1 h1:FEBLx1zS214owpjy7qsBeixbURkuhQAwrK5UwLGTwt4=
 github.com/pkg/errors v0.9.1/go.mod h1:bwawxfHBFNV+L2hUp1rHADufV3IMtnDRdf1r5NINEl0=
+github.com/pkoukk/tiktoken-go v0.1.6 h1:JF0TlJzhTbrI30wCvFuiw6FzP2+/bR+FIxUdgEAcUsw=
+github.com/pkoukk/tiktoken-go v0.1.6/go.mod h1:9NiV+i9mJKGj1rYOT+njbv+ZwA/zJxYdewGl6qVatpg=
 github.com/pmezard/go-difflib v1.0.0 h1:4DBwDE0NGyQoBHbLQYPwSUPoCMWR5BEzIk/f1lZbAQM=
 github.com/pmezard/go-difflib v1.0.0/go.mod h1:iKH77koFhYxTK1pcRnkKkqfTogsbg7gZNVY4sRDYZ/4=
 github.com/redis/go-redis/v9 v9.12.0 h1:XlVPGlflh4nxfhsNXPA8Qp6EmEfTo0rp8oaBzPipXnU=
@@ -154,9 +158,11 @@ github.com/stretchr/testify v1.7.1/go.mod h1:6Fq8oRcR53rry900zMqJjRRixrwX3KX962/
 github.com/stretchr/testify v1.8.0/go.mod h1:yNjHg4UonilssWZ8iaSj1OCr/vHnekPRkoO+kdMU+MU=
 github.com/stretchr/testify v1.8.1/go.mod h1:w2LPCIKwWwSfY2zedu0+kehJoqGctiVI29o6fzry7u4=
 github.com/stretchr/testify v1.8.2/go.mod h1:w2LPCIKwWwSfY2zedu0+kehJoqGctiVI29o6fzry7u4=
-github.com/stretchr/testify v1.8.3/go.mod h1:sz/lmYIOXD/1dqDmKjjqLyZ2RngseejIcXlSw2iwfAo=
+github.com/stretchr/testify v1.8.4/go.mod h1:sz/lmYIOXD/1dqDmKjjqLyZ2RngseejIcXlSw2iwfAo=
 github.com/stretchr/testify v1.10.0 h1:Xv5erBjTwe/5IxqUQTdXv5kgmIvbHo3QQyRwhJsOfJA=
 github.com/stretchr/testify v1.10.0/go.mod h1:r2ic/lqez/lEtzL7wO/rwa5dbSLXVDPFyf8C91i36aY=
+github.com/tmc/langchaingo v0.1.13 h1:rcpMWBIi2y3B90XxfE4Ao8dhCQPVDMaNPnN5cGB1CaA=
+github.com/tmc/langchaingo v0.1.13/go.mod h1:vpQ5NOIhpzxDfTZK9B6tf2GM/MoaHewPWM5KXXGh7hg=
 github.com/tmthrgd/go-hex v0.0.0-20190904060850-447a3041c3bc h1:9lRDQMhESg+zvGYmW5DyG0UqvY96Bu5QYsTLvCHdrgo=
 github.com/tmthrgd/go-hex v0.0.0-20190904060850-447a3041c3bc/go.mod h1:bciPuU6GHm1iF1pBvUfxfsH0Wmnc2VbpgvbI9ZWuIRs=
 github.com/twitchyliquid64/golang-asm v0.15.1 h1:SU5vSMR7hnwNxj24w34ZyCi/FmDZTkS4MhqMhdFk5YI=
@@ -213,8 +219,8 @@ golang.org/x/text v0.3.6/go.mod h1:5Zoc/QRtKVWzQhOtBMvqHzDpF6irO9z98xDceosuGiQ=
 golang.org/x/text v0.24.0 h1:dd5Bzh4yt5KYA8f9CJHCP4FB4D51c2c6JvN37xJJkJ0=
 golang.org/x/text v0.24.0/go.mod h1:L8rBsPeo2pSS+xqN0d5u2ikmjtmoJbDBT1b7nHvFCdU=
 golang.org/x/tools v0.0.0-20180917221912-90fa682c2a6e/go.mod h1:n7NCudcB/nEzxVGmLbDWY5pfWTLqBcC2KZ6jyYvM4mQ=
-google.golang.org/protobuf v1.33.0 h1:uNO2rsAINq/JlFpSdYEKIZ0uKD/R9cpdv0T+yoGwGmI=
-google.golang.org/protobuf v1.33.0/go.mod h1:c6P6GXX6sHbq/GpV6MGZEdwhWPcYBgnhAHhKbcUYpos=
+google.golang.org/protobuf v1.34.1 h1:9ddQBjfCyZPOHPUiPxpYESBLc+T8P3E+Vo4IbKZgFWg=
+google.golang.org/protobuf v1.34.1/go.mod h1:c6P6GXX6sHbq/GpV6MGZEdwhWPcYBgnhAHhKbcUYpos=
 gopkg.in/check.v1 v0.0.0-20161208181325-20d25e280405/go.mod h1:Co6ibVJAznAaIkqp8huTwlJQCZ016jof/cbN4VW5Yz0=
 gopkg.in/check.v1 v1.0.0-20190902080502-41f04d3bba15/go.mod h1:Co6ibVJAznAaIkqp8huTwlJQCZ016jof/cbN4VW5Yz0=
 gopkg.in/check.v1 v1.0.0-20201130134442-10cb98267c6c h1:Hei/4ADfdWqJk1ZMxUNpqntNwaWcugrBjAiHlqqRiVk=
@@ -222,6 +228,7 @@ gopkg.in/check.v1 v1.0.0-20201130134442-10cb98267c6c/go.mod h1:JHkPIbrfpd72SG/EV
 gopkg.in/warnings.v0 v0.1.2 h1:wFXVbFY8DY5/xOe1ECiWdKCzZlxgshcYVNkBHstARME=
 gopkg.in/warnings.v0 v0.1.2/go.mod h1:jksf8JmL6Qr/oQM2OXTHunEvvTAsrWBLb6OOjuVWRNI=
 gopkg.in/yaml.v2 v2.2.2/go.mod h1:hI93XBmqTisBFMUTm0b8Fm+jr3Dg1NNxqwp+5A1VGuI=
+gopkg.in/yaml.v2 v2.4.0 h1:D8xgwECY7CYvx+Y2n4sBz93Jn9JRvxdiyyo8CTfuKaY=
 gopkg.in/yaml.v2 v2.4.0/go.mod h1:RDklbk79AGWmwhnvt/jBztapEOGDOx6ZbXqjP6csGnQ=
 gopkg.in/yaml.v3 v3.0.0-20200313102051-9f266ea9e77c/go.mod h1:K4uyk7z7BCEPqu6E+C64Yfv1cQ7kz7rIZviUmN+EgEM=
 gopkg.in/yaml.v3 v3.0.1 h1:fxVm/GzAzEWqLHuvctI91KS9hhNmmWOoWu0XTYJS7CA=
@@ -233,3 +240,5 @@ gorm.io/gorm v1.25.5/go.mod h1:hbnx/Oo0ChWMn1BIhpy1oYozzpM15i4YPuHDmfYtwg8=
 mellium.im/sasl v0.3.1 h1:wE0LW6g7U83vhvxjC1IY8DnXM+EU095yeo8XClvCdfo=
 mellium.im/sasl v0.3.1/go.mod h1:xm59PUYpZHhgQ9ZqoJ5QaCqzWMi8IeS49dhp6plPCzw=
 rsc.io/pdf v0.1.1/go.mod h1:n8OzWcQ6Sp37PL01nO98y4iUCRdTGarVfzxY20ICaU4=
+sigs.k8s.io/yaml v1.3.0 h1:a2VclLzOGrwOHDiV8EfBGhvjHvP46CtW5j6POvhYGGo=
+sigs.k8s.io/yaml v1.3.0/go.mod h1:GeOyir5tyXNByN85N/dRIT9es5UQNerPYEKK56eTBm8=
@@ -1,11 +1,8 @@
 package api

 import (
-	"bytes"
-	"encoding/json"
-	"errors"
+	"context"
 	"fmt"
-	"io"
 	"log/slog"
 	"net/http"
 	"os"
@@ -14,6 +11,9 @@ import (
 	"time"

 	"github.com/gin-gonic/gin"
+	"github.com/tmc/langchaingo/llms"
+	"github.com/tmc/langchaingo/llms/ollama"
+	"github.com/tmc/langchaingo/llms/openai"
 	"gopkg.in/yaml.v3"
 )

@@ -128,7 +128,7 @@ func loadConfig() (string, string, string, string, time.Duration, int) {
 			endpoint = "http://localhost:11434/v1/chat/completions"
 		}
 		if model == "" {
-			model = "gpt-oss:20b"
+			model = "llama2:13b"
 		}
 		return provider, token, model, endpoint, timeout, retries
 	case "chutes":
@@ -150,150 +150,50 @@ func loadConfig() (string, string, string, string, time.Duration, int) {
 	}
 }

-// callChutes sends the question to the hosted LLM service and returns the reply.
-func callChutes(token, model, url string, timeout time.Duration, retries int, question string) (string, error) {
-	if token == "" || token == "cpk_xxxxxxx" {
-		return "", errors.New("chutes token not set")
-	}
-
-	reqBody := map[string]interface{}{
-		"model":       model,
-		"messages":    []interface{}{map[string]interface{}{"role": "user", "content": question}},
-		"stream":      false,
-		"max_tokens":  1024,
-		"temperature": 0.7,
-	}
-	data, err := json.Marshal(reqBody)
-	if err != nil {
-		return "", err
-	}
-
-	client := &http.Client{Timeout: timeout}
-	var lastErr error
-	for i := 0; i <= retries; i++ {
-		req, err := http.NewRequest("POST", url, bytes.NewBuffer(data))
-		if err != nil {
-			return "", err
-		}
-		req.Header.Set("Authorization", "Bearer "+token)
-		req.Header.Set("Content-Type", "application/json")
-
-		resp, err := client.Do(req)
-		if err != nil {
-			lastErr = err
-			continue
-		}
-
-		b, err := io.ReadAll(resp.Body)
-		resp.Body.Close()
-		if err != nil {
-			lastErr = err
-			continue
-		}
-		if resp.StatusCode != http.StatusOK {
-			lastErr = fmt.Errorf("chutes API error: %s", string(b))
-			continue
-		}
-
-		var res struct {
-			Choices []struct {
-				Message struct {
-					Content string `json:"content"`
-				} `json:"message"`
-			} `json:"choices"`
-		}
-		if err := json.Unmarshal(b, &res); err != nil {
-			lastErr = err
-			continue
-		}
-		if len(res.Choices) == 0 {
-			lastErr = errors.New("no choices returned")
-			continue
-		}
-		return res.Choices[0].Message.Content, nil
-	}
-	if lastErr == nil {
-		lastErr = errors.New("request failed")
-	}
-	return "", lastErr
-}
-
-// callOllama sends the question to a local Ollama server.
-func callOllama(model, url string, timeout time.Duration, retries int, question string) (string, error) {
-	reqBody := map[string]any{
-		"model":       model,
-		"messages":    []any{map[string]any{"role": "user", "content": question}},
-		"stream":      false,
-		"max_tokens":  1024,
-		"temperature": 0.7,
-	}
-	data, err := json.Marshal(reqBody)
-	if err != nil {
-		return "", err
-	}
-	client := &http.Client{Timeout: timeout}
-	var lastErr error
-	for i := 0; i <= retries; i++ {
-		req, err := http.NewRequest("POST", url, bytes.NewReader(data))
-		if err != nil {
-			return "", err
-		}
-		req.Header.Set("Content-Type", "application/json")
-		resp, err := client.Do(req)
-		if err != nil {
-			lastErr = err
-			continue
-		}
-		b, err := io.ReadAll(resp.Body)
-		resp.Body.Close()
-		if err != nil {
-			lastErr = err
-			continue
-		}
-		if resp.StatusCode != http.StatusOK {
-			lastErr = fmt.Errorf("ollama API error: %s", string(b))
-			continue
-		}
-		var res struct {
-			Choices []struct {
-				Message struct {
-					Content string `json:"content"`
-				} `json:"message"`
-			} `json:"choices"`
-		}
-		if err := json.Unmarshal(b, &res); err != nil {
-			lastErr = err
-			continue
-		}
-		if len(res.Choices) == 0 {
-			lastErr = errors.New("no choices returned")
-			continue
-		}
-		return res.Choices[0].Message.Content, nil
-	}
-	if lastErr == nil {
-		lastErr = errors.New("request failed")
-	}
-	return "", lastErr
-}
-
-// callLLM dispatches the question to the configured provider.
+// callLLM dispatches the question to the configured provider using LangChainGo.
 func callLLM(question string) (string, error) {
 	provider, token, model, url, timeout, retries := loadConfig()
+
+	httpClient := &http.Client{Timeout: timeout}
+
 	var (
-		answer string
-		err    error
+		llm llms.Model
+		err error
 	)
+
 	switch provider {
 	case "ollama":
-		answer, err = callOllama(model, url, timeout, retries, question)
-	case "chutes":
-		answer, err = callChutes(token, model, url, timeout, retries, question)
+		llm, err = ollama.New(
+			ollama.WithModel(model),
+			ollama.WithServerURL(url),
+			ollama.WithHTTPClient(httpClient),
+		)
 	default:
-		answer, err = callChutes(token, model, url, timeout, retries, question)
+		llm, err = openai.New(
+			openai.WithToken(token),
+			openai.WithModel(model),
+			openai.WithBaseURL(url),
+			openai.WithHTTPClient(httpClient),
+		)
 	}
 	if err != nil {
-		return "", fmt.Errorf("%w (timeout=%s retries=%d)", err, timeout, retries)
+		return "", fmt.Errorf("init llm: %w", err)
 	}
-	return answer, nil
+
+	ctx, cancel := context.WithTimeout(context.Background(), timeout)
+	defer cancel()
+
+	var answer string
+	var lastErr error
+	for i := 0; i <= retries; i++ {
+		answer, lastErr = llms.GenerateFromSinglePrompt(ctx, llm, question)
+		if lastErr == nil {
+			return answer, nil
+		}
+	}
+
+	if lastErr == nil {
+		lastErr = fmt.Errorf("request failed")
+	}
+	return "", fmt.Errorf("%w (timeout=%s retries=%d)", lastErr, timeout, retries)
 }
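The configuration stores full chat endpoints such as `http://127.0.0.1:11434/v1/chat/completions` and `https://llm.chutes.ai/v1/chat/completions`. The new `callLLM` passes those strings unchanged to `ollama.WithServerURL` and `openai.WithBaseURL`. LangChainGo's OpenAI client appears to append `/chat/completions` to its base URL, and its Ollama client appears to call Ollama's native `/api/...` routes on the server root. If that holds for the v0.1.13 release pinned here, the full paths would need trimming; this should be verified against that release. The helper below, `baseURLFor`, is hypothetical and shows one way to derive those base URLs under that assumption.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// baseURLFor is a hypothetical helper. It trims a configured chat endpoint
// down to the base URL a LangChainGo client is ASSUMED to expect:
//   - "ollama": the server root, e.g. http://127.0.0.1:11434
//   - any other (OpenAI-compatible) provider: the path up to /v1
func baseURLFor(provider, endpoint string) (string, error) {
	u, err := url.Parse(endpoint)
	if err != nil {
		return "", err
	}
	if u.Scheme == "" || u.Host == "" {
		return "", fmt.Errorf("endpoint %q is not an absolute URL", endpoint)
	}
	p := strings.TrimSuffix(u.Path, "/")
	p = strings.TrimSuffix(p, "/chat/completions")
	if provider == "ollama" {
		// Also drop the OpenAI-compatible /v1 prefix; keep only the root.
		p = strings.TrimSuffix(p, "/v1")
	}
	u.Path, u.RawPath = p, ""
	u.RawQuery, u.Fragment = "", ""
	return u.String(), nil
}

func main() {
	for _, c := range []struct{ provider, endpoint string }{
		{"ollama", "http://127.0.0.1:11434/v1/chat/completions"},
		{"chutes", "https://llm.chutes.ai/v1/chat/completions"},
	} {
		base, err := baseURLFor(c.provider, c.endpoint)
		if err != nil {
			panic(err)
		}
		fmt.Println(c.provider, "->", base)
	}
	// ollama -> http://127.0.0.1:11434
	// chutes -> https://llm.chutes.ai/v1
}
```

An alternative is to store base URLs in `server.yaml` directly. Normalizing in code keeps existing configs working either way.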
@@ -28,7 +28,7 @@ models:
   generator:
     provider: "ollama"
     models:
-      - 'gpt-oss:20b'
+      - 'llama2:13b'
     endpoint: "http://127.0.0.1:11434/v1/chat/completions"
     token: ""
 # For PROD
@@ -36,14 +36,14 @@ models:
   # embedder:
     #provider: "chutes"
     #models: "bge-m3"
-    #endpoint: "https://chutes-baai-bge-m3.chutes.ai/embed/v1/embeddings"
-    #token: "cpk_xxxxxxxxxxxxxxxxxxxx"
+    #endpoint: "https://chutes-baai-bge-m3.chutes.ai/embed"
+    #token: "cpk_xxxx"
   # generator:
     #provider: "chutes"
-    #endpoint: "https://llm.chutes.ai/v1/chat/completions"
-    #token: "cpk_xxxxxxxxxxxxxxxxxxxx"
     #models:
     #  - 'moonshotai/Kimi-K2-Instruct'
+    #endpoint: "https://llm.chutes.ai/v1/chat/completions"
+    #token: "cpk_xxxx"

 embedding:
   max_batch: 64
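The config ends with `embedding.max_batch: 64`. The diff does not show how the server consumes this value. Assuming it caps how many texts go into a single embedding request, here is a minimal sketch of splitting inputs to match; `batches` is a hypothetical helper name.

```go
package main

import "fmt"

// batches splits texts into consecutive groups of at most maxBatch items,
// e.g. one group per embedding request. A non-positive maxBatch yields a
// single batch containing everything.
func batches(texts []string, maxBatch int) [][]string {
	if len(texts) == 0 {
		return nil
	}
	if maxBatch <= 0 {
		return [][]string{texts}
	}
	out := make([][]string, 0, (len(texts)+maxBatch-1)/maxBatch)
	for start := 0; start < len(texts); start += maxBatch {
		end := start + maxBatch
		if end > len(texts) {
			end = len(texts)
		}
		// The full slice expression caps capacity, so appending to one batch
		// cannot overwrite elements that belong to the next batch.
		out = append(out, texts[start:end:end])
	}
	return out
}

func main() {
	texts := make([]string, 150)
	for i := range texts {
		texts[i] = fmt.Sprintf("chunk-%d", i)
	}
	for i, b := range batches(texts, 64) {
		fmt.Printf("batch %d: %d texts\n", i, len(b))
	}
	// batch 0: 64 texts
	// batch 1: 64 texts
	// batch 2: 22 texts
}
```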