docs: detail Milestone 2 subtasks

2025-08-12 22:32:45 +08:00
commit ecebcb6d64
parent 66a3057388
10 changed files with 276 additions and 165 deletions
--- a/README.md
+++ b/README.md
@@ -22,7 +22,19 @@ All UI components provide both Chinese and English interfaces.
 | Gateway   | OpenResty  | 1.27.1.2 |
 | Database  | PostgreSQL + pgvector | N/A |
 | Cache     | Redis      | N/A     |
-| Model     | Large Language Models via CodePRobot | N/A |
+| Model (Local)  | HuggingFace Hub + Ollama | N/A |
+| Model (Online) | Chutes AskAI + CodePRobot | N/A |
+
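+## LangChainGo Core Features at a Glance
+
+XControl uses LangChainGo as a single integration layer for multiple large language models and provides chained invocation for AskAI, the CLI, and the server:
+
+- **LLM interface layer (Model I/O)**: a unified way to call model APIs such as OpenAI, Hugging Face, Ollama, Google AI, and Cohere.
+- **Chains**: combine prompts, retrieval results, and tool calls into complete workflows, supporting scenarios such as RAG, chat, and code generation.
+- **Tools and agents**: define tools such as web search, a scraper, and SQL queries, and integrate them into an LLM agent for ReAct-style tool calling.
+- **Vector retrieval and data integration**: adapters for vector stores such as PGVector, Weaviate, Qdrant, MongoDB Atlas Vector Search, Chroma, Pinecone, and Redis Vector.
+- **Document loading and chunking**: Document Loaders and Text Splitters for processing long texts and building chunks for vector retrieval.
+- **Memory and history tracking**: conversation memory mechanisms such as Conversation Buffer, which improve the interactive experience.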

 ## Supported Platforms

@@ -88,6 +100,40 @@ log:

 The flag value takes precedence over the configuration file.

+## Changelog
+
+See [docs/changelog.md](./docs/changelog.md) for a list of completed changes, including all work from Milestone&nbsp;1.
+
+## Roadmap
+
+The roadmap below is also available in [docs/Roadmap.md](./docs/Roadmap.md).
+
+### Milestone 1: MVP (Completed)
+- Use default Redis port (#98) and establish PostgreSQL & Redis baseline.
+- Stream RAG sync progress for GitHub repository synchronization (#100).
+- Add client-side Markdown parsing to the CLI (#104).
+- Refactor RAG ingestion into the CLI with a server upsert endpoint (#103).
+- Add RAG API functional tests and a per-file ingestion workflow (#115).
+- Allow RAG upsert to migrate embedding dimensions (#119) and document pgvector initialization (#120).
+- Ingest files automatically (#123).
+
+### Milestone 2: Hybrid Search
+- CLI and server dynamically support 1024-dimensional embeddings.
+- Update docs and configs to vector(1024) (#130).
+- Add embedding configuration fields (#131).
+- Add RAG API integration tests for vectors (#132).
+- Add Ollama support (#136).
+- Deploy homepage via rsync from CI and fix SSH directory creation (#18, #19).
+- Deploy XControl panel via GitHub Actions (#20).
+- Fix yarn lock context concatenation (#21).
+
+### Milestone 3: Production Monitoring & Optimization
+- Switch server and CLI to Cobra (#133).
+- Add repo sync proxy configuration (#135).
+- Allow custom AskAI timeout (#141).
+- Add log level support to CLI and server and log AskAI errors (#125, #140).
+- Continue performance optimization, error handling, multi-model support, permission control, hot reload, and improve RAG upsert docs (#129).
+
 ## License

 This project is licensed under the terms of the [MIT License](./LICENSE).
--- a/docs/Milestone-2-todo.md
+++ b/docs/Milestone-2-todo.md
@@ -0,0 +1,36 @@
+# Milestone 2 TODO
+
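+Subtask plan for optimizing the CLI, server, and AskAI interface with the LangChainGo framework:
+
+1. **LLM interface layer (Model I/O)**
+   - [ ] Build a provider registry for models such as OpenAI, Hugging Face, Ollama, Google AI, and Cohere.
+   - [ ] Expose model provider switching in the CLI and server configuration.
+   - [ ] Write unit tests that verify switching between providers.
+   - [ ] Document configuration and environment variable usage.
+2. **Chains**
+   - [ ] Combine prompts, retrieval results, and tool calls into RAG and chat chains.
+   - [ ] Provide reusable chain definitions for AskAI that support complex task orchestration.
+   - [ ] Provide chain invocation examples in the CLI.
+   - [ ] Write integration tests for the chains.
+3. **Tools and agents**
+   - [ ] Implement common tools such as web search, a scraper, and SQL queries.
+   - [ ] Register the tools with the agent framework and support dynamic invocation.
+   - [ ] Demonstrate ReAct-style tool calling in the CLI.
+   - [ ] Add test cases for tool and agent interaction.
+4. **Vector retrieval and data integration**
+   - [ ] Integrate stores such as PGVector, Weaviate, Qdrant, Chroma, Pinecone, and Redis Vector.
+   - [ ] Support custom vector dimensions and retrieval parameters.
+   - [ ] Write benchmarks that compare the different vector stores.
+   - [ ] Provide documented examples of tuning retrieval parameters.
+5. **Document loading and chunking**
+   - [ ] Provide Document Loaders for multiple formats such as Markdown, code, and HTML.
+   - [ ] Support Text Splitters with token-based or recursive strategies.
+   - [ ] Store chunking results in a unified way and support an incremental update API.
+   - [ ] Write tests for the loaders and splitters.
+6. **Memory and history tracking**
+   - [ ] Add conversation memory, such as a conversation buffer, to AskAI.
+   - [ ] Persist session history in the server and provide configuration options.
+   - [ ] Support adjusting memory length and cleanup policy.
+   - [ ] Write end-to-end tests that verify memory retention.
+
+These tasks will be completed incrementally to reach the hybrid search and multi-model support goals.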
--- a/docs/Milestone-2.md
+++ b/docs/Milestone-2.md
@@ -0,0 +1,37 @@
+# Milestone 2: Hybrid Search
+
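+RAG Phase 2 optimization plan
+
+Following the GitHub issue "RAG 第二优化节点阶段" (RAG second optimization stage), this phase continues to iterate on the existing RAG system. The goals are to improve retrieval quality and service stability and to extend support for multiple models and data sources.
+
+## Goals
+
+- Improve the precision and performance of vector retrieval.
+- Support incremental sync and ingestion from multiple repositories.
+- Offer a choice of embedding models and LLMs for flexible deployment.
+- Strengthen error handling, monitoring, and automated testing for the API and CLI.
+
+## Main Tasks
+
+1. **Vector retrieval optimization**
+   - Compare and evaluate different embedding models and similarity metrics.
+   - Introduce vector indexing and compression strategies to reduce query latency.
+2. **Data sync pipeline**
+   - Implement an incremental update mechanism that rebuilds vectors on demand.
+   - Support sync progress tracking and retry on failure.
+3. **Multiple models and configuration**
+   - Integrate local and cloud models uniformly through LangChainGo.
+   - Allow per-model custom parameters and timeout settings.
+4. **API and CLI stability**
+   - Improve exception handling and logging, and expose more diagnostic information.
+   - Complete the integration tests to cover the RAG upsert and query flows.
+5. **Monitoring and observability**
+   - Integrate metrics and log reporting to support performance analysis.
+   - Build health checks and alerting.
+
+## Milestones
+
+- **M2.1**: Complete prototype validation of incremental sync and retrieval optimization.
+- **M2.2**: Integrate multi-model support and bring the monitoring system online.
+- **M2.3**: Complete automated testing and documentation, and prepare for the next iteration.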
+
--- a/docs/Roadmap.md
+++ b/docs/Roadmap.md
@@ -0,0 +1,27 @@
+# Roadmap
+
+## Milestone 1: MVP (Completed)
+- Use default Redis port (#98) and establish PostgreSQL & Redis baseline.
+- Stream RAG sync progress for GitHub repository synchronization (#100).
+- Add client-side Markdown parsing to the CLI (#104).
+- Refactor RAG ingestion into the CLI with a server upsert endpoint (#103).
+- Add RAG API functional tests and a per-file ingestion workflow (#115).
+- Allow RAG upsert to migrate embedding dimensions (#119) and document pgvector initialization (#120).
+- Ingest files automatically (#123).
+
+## Milestone 2: Hybrid Search
+- CLI and server dynamically support 1024-dimensional embeddings.
+- Update docs and configs to vector(1024) (#130).
+- Add embedding configuration fields (#131).
+- Add RAG API integration tests for vectors (#132).
+- Add Ollama support (#136).
+- Deploy homepage via rsync from CI and fix SSH directory creation (#18, #19).
+- Deploy XControl panel via GitHub Actions (#20).
+- Fix yarn lock context concatenation (#21).
+
+## Milestone 3: Production Monitoring & Optimization
+- Switch server and CLI to Cobra (#133).
+- Add repo sync proxy configuration (#135).
+- Allow custom AskAI timeout (#141).
+- Add log level support to CLI and server and log AskAI errors (#125, #140).
+- Continue performance optimization, error handling, multi-model support, permission control, hot reload, and improve RAG upsert docs (#129).
--- a/docs/api-endpoints.md
+++ b/docs/api-endpoints.md
@@ -74,16 +74,51 @@ Expected response on success: `{"rows":1}`. If the vector database is unavailabl
  ```

 ## POST /api/askai
-- **Description:** Ask the AI service for an answer. Requires a valid Chutes token in the server configuration.
+- **Description:** Ask the AI service for an answer. The endpoint uses [LangChainGo](https://github.com/tmc/langchaingo) to communicate with the configured model provider (e.g., OpenAI-compatible services or a local Ollama instance). Ensure the server configuration includes the proper token or local server URL.
 - **Body Parameters (JSON):**
  - `question` – Question text.
-- **Configuration:** In `server/config/server.yaml` the `api.askai` section controls request behaviour:
-  ```yaml
-  api:
-    askai:
-      timeout: 60   # seconds
-      retries: 3    # retry attempts
-  ```
+**Configuration:** In `server/config/server.yaml` the `models` section selects the LLM and embedding providers.
+For local debugging with HuggingFace and Ollama:
+
+```yaml
+models:
+  embedder:
+    provider: "huggingface_hub"
+    models: "bge-m3"
+    endpoint: "http://127.0.0.1:9000/v1/embeddings"
+  generator:
+    provider: "ollama"
+    models:
+      - 'llama2:13b'
+    endpoint: "http://127.0.0.1:11434/v1/chat/completions"
+```
+
+For online services using Chutes:
+
+```yaml
+#models:
+#  embedder:
+#    provider: "chutes"
+#    models: "bge-m3"
+#    endpoint: "https://chutes-baai-bge-m3.chutes.ai/embed"
+#    token: "cpk_xxxx"
+#  generator:
+#    provider: "chutes"
+#    models:
+#      - 'moonshotai/Kimi-K2-Instruct'
+#    endpoint: "https://llm.chutes.ai/v1/chat/completions"
+#    token: "cpk_xxxx"
+```
+
+The `api.askai` section controls request behaviour:
+
+```yaml
+api:
+  askai:
+    timeout: 60   # seconds
+    retries: 3    # retry attempts
+```
+
 - **Test:**
  ```bash
  curl -X POST http://localhost:8080/api/askai \
--- a/docs/changelog.md
+++ b/docs/changelog.md
@@ -0,0 +1,17 @@
+# Changelog
+
+## Milestone 1: MVP
+- Use default Redis port (#98) and establish PostgreSQL & Redis baseline.
+- Stream RAG sync progress for GitHub repository synchronization (#100).
+- Add client-side Markdown parsing to the CLI (#104).
+- Refactor RAG ingestion into the CLI with a server upsert endpoint (#103).
+- Perform RAG API functional tests.
+- Support per-file ingestion workflow in the CLI (#115).
+- Allow RAG upsert to migrate embedding dimensions (#119).
+- Add pgvector database initialization guide (#120).
+- Ingest files automatically (#123).
+
+## Milestone 2: Hybrid Search (In Progress)
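+- Rename the RAG Phase 2 optimization plan to `docs/Milestone-2.md` and add a subtask list.
+- Plan for the AskAI interface and CLI to use the LangChainGo framework to support multiple models and chained calls.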
+- Document local and Chutes model configurations for AskAI.
--- a/go.mod
+++ b/go.mod
@@ -11,6 +11,7 @@ require (
 	github.com/pgvector/pgvector-go v0.3.0
 	github.com/redis/go-redis/v9 v9.12.0
 	github.com/spf13/cobra v1.9.1
+	github.com/tmc/langchaingo v0.1.13
 	github.com/yuin/goldmark v1.7.13
 	golang.org/x/net v0.39.0
 	gopkg.in/yaml.v3 v3.0.1
@@ -27,6 +28,7 @@ require (
 	github.com/cloudflare/circl v1.6.1 // indirect
 	github.com/cyphar/filepath-securejoin v0.4.1 // indirect
 	github.com/dgryski/go-rendezvous v0.0.0-20200823014737-9f7001d12a5f // indirect
+	github.com/dlclark/regexp2 v1.10.0 // indirect
 	github.com/emirpasic/gods v1.18.1 // indirect
 	github.com/gabriel-vasile/mimetype v1.4.2 // indirect
 	github.com/gin-contrib/sse v0.1.0 // indirect
@@ -37,6 +39,7 @@ require (
 	github.com/go-playground/validator/v10 v10.14.0 // indirect
 	github.com/goccy/go-json v0.10.2 // indirect
 	github.com/golang/groupcache v0.0.0-20241129210726-2c02b8208cf8 // indirect
+	github.com/google/uuid v1.6.0 // indirect
 	github.com/inconshreveable/mousetrap v1.1.0 // indirect
 	github.com/jackc/pgpassfile v1.0.0 // indirect
 	github.com/jackc/pgservicefile v0.0.0-20240606120523-5a60cdf6a761 // indirect
@@ -47,11 +50,12 @@ require (
 	github.com/kevinburke/ssh_config v1.2.0 // indirect
 	github.com/klauspost/cpuid/v2 v2.2.4 // indirect
 	github.com/leodido/go-urn v1.2.4 // indirect
-	github.com/mattn/go-isatty v0.0.19 // indirect
+	github.com/mattn/go-isatty v0.0.20 // indirect
 	github.com/modern-go/concurrent v0.0.0-20180306012644-bacd9c7ef1dd // indirect
 	github.com/modern-go/reflect2 v1.0.2 // indirect
-	github.com/pelletier/go-toml/v2 v2.0.8 // indirect
+	github.com/pelletier/go-toml/v2 v2.0.9 // indirect
 	github.com/pjbgf/sha1cd v0.3.2 // indirect
+	github.com/pkoukk/tiktoken-go v0.1.6 // indirect
 	github.com/sergi/go-diff v1.3.2-0.20230802210424-5b0b94c5c0d3 // indirect
 	github.com/skeema/knownhosts v1.3.1 // indirect
 	github.com/spf13/pflag v1.0.6 // indirect
@@ -62,6 +66,6 @@ require (
 	golang.org/x/crypto v0.37.0 // indirect
 	golang.org/x/sys v0.32.0 // indirect
 	golang.org/x/text v0.24.0 // indirect
-	google.golang.org/protobuf v1.33.0 // indirect
+	google.golang.org/protobuf v1.34.1 // indirect
 	gopkg.in/warnings.v0 v0.1.2 // indirect
 )
--- a/go.sum
+++ b/go.sum
@@ -33,6 +33,8 @@ github.com/davecgh/go-spew v1.1.1 h1:vj9j/u1bqnvCEfJOwUhtlOARqs3+rkHYY13jYWTU97c
 github.com/davecgh/go-spew v1.1.1/go.mod h1:J7Y8YcW2NihsgmVo/mv3lAwl/skON4iLHjSsI+c5H38=
 github.com/dgryski/go-rendezvous v0.0.0-20200823014737-9f7001d12a5f h1:lO4WD4F/rVNCu3HqELle0jiPLLBs70cWOduZpkS1E78=
 github.com/dgryski/go-rendezvous v0.0.0-20200823014737-9f7001d12a5f/go.mod h1:cuUVRXasLTGF7a8hSLbxyZXjz+1KgoB3wDUb6vlszIc=
+github.com/dlclark/regexp2 v1.10.0 h1:+/GIL799phkJqYW+3YbOd8LCcbHzT0Pbo8zl70MHsq0=
+github.com/dlclark/regexp2 v1.10.0/go.mod h1:DHkYz0B9wPfa6wondMfaivmHpzrQ3v9q8cnmRbL6yW8=
 github.com/elazarl/goproxy v1.7.2 h1:Y2o6urb7Eule09PjlhQRGNsqRfPmYI3KKQLFpCAV3+o=
 github.com/elazarl/goproxy v1.7.2/go.mod h1:82vkLNir0ALaW14Rc399OTTjyNREgmdL2cVoIbS6XaE=
 github.com/emirpasic/gods v1.18.1 h1:FXtiHYKDGKCW2KzwZKx0iC0PQmdlorYgdFG9jPXJ1Bc=
@@ -110,8 +112,8 @@ github.com/leodido/go-urn v1.2.4 h1:XlAE/cm/ms7TE/VMVoduSpNBoyc2dOxHs5MZSwAN63Q=
 github.com/leodido/go-urn v1.2.4/go.mod h1:7ZrI8mTSeBSHl/UaRyKQW1qZeMgak41ANeCNaVckg+4=
 github.com/lib/pq v1.10.9 h1:YXG7RB+JIjhP29X+OtkiDnYaXQwpS4JEWq7dtCCRUEw=
 github.com/lib/pq v1.10.9/go.mod h1:AlVN5x4E4T544tWzH6hKfbfQvm3HdbOxrmggDNAPY9o=
-github.com/mattn/go-isatty v0.0.19 h1:JITubQf0MOLdlGRuRq+jtsDlekdYPia9ZFsB8h/APPA=
-github.com/mattn/go-isatty v0.0.19/go.mod h1:W+V8PltTTMOvKvAeJH7IuucS94S2C6jfK/D7dTCTo3Y=
+github.com/mattn/go-isatty v0.0.20 h1:xfD0iDuEKnDkl03q4limB+vH+GxLEtL/jb4xVJSWWEY=
+github.com/mattn/go-isatty v0.0.20/go.mod h1:W+V8PltTTMOvKvAeJH7IuucS94S2C6jfK/D7dTCTo3Y=
 github.com/modern-go/concurrent v0.0.0-20180228061459-e0a39a4cb421/go.mod h1:6dJC0mAP4ikYIbvyc7fijjWJddQyLn8Ig3JB5CqoB9Q=
 github.com/modern-go/concurrent v0.0.0-20180306012644-bacd9c7ef1dd h1:TRLaZ9cD/w8PVh93nsPXa1VrQ6jlwL5oN8l14QlcNfg=
 github.com/modern-go/concurrent v0.0.0-20180306012644-bacd9c7ef1dd/go.mod h1:6dJC0mAP4ikYIbvyc7fijjWJddQyLn8Ig3JB5CqoB9Q=
@@ -119,14 +121,16 @@ github.com/modern-go/reflect2 v1.0.2 h1:xBagoLtFs94CBntxluKeaWgTMpvLxC4ur3nMaC9G
 github.com/modern-go/reflect2 v1.0.2/go.mod h1:yWuevngMOJpCy52FWWMvUC8ws7m/LJsjYzDa0/r8luk=
 github.com/onsi/gomega v1.34.1 h1:EUMJIKUjM8sKjYbtxQI9A4z2o+rruxnzNvpknOXie6k=
 github.com/onsi/gomega v1.34.1/go.mod h1:kU1QgUvBDLXBJq618Xvm2LUX6rSAfRaFRTcdOeDLwwY=
-github.com/pelletier/go-toml/v2 v2.0.8 h1:0ctb6s9mE31h0/lhu+J6OPmVeDxJn+kYnJc2jZR9tGQ=
-github.com/pelletier/go-toml/v2 v2.0.8/go.mod h1:vuYfssBdrU2XDZ9bYydBu6t+6a6PYNcZljzZR9VXg+4=
+github.com/pelletier/go-toml/v2 v2.0.9 h1:uH2qQXheeefCCkuBBSLi7jCiSmj3VRh2+Goq2N7Xxu0=
+github.com/pelletier/go-toml/v2 v2.0.9/go.mod h1:tJU2Z3ZkXwnxa4DPO899bsyIoywizdUvyaeZurnPPDc=
 github.com/pgvector/pgvector-go v0.3.0 h1:Ij+Yt78R//uYqs3Zk35evZFvr+G0blW0OUN+Q2D1RWc=
 github.com/pgvector/pgvector-go v0.3.0/go.mod h1:duFy+PXWfW7QQd5ibqutBO4GxLsUZ9RVXhFZGIBsWSA=
 github.com/pjbgf/sha1cd v0.3.2 h1:a9wb0bp1oC2TGwStyn0Umc/IGKQnEgF0vVaZ8QF8eo4=
 github.com/pjbgf/sha1cd v0.3.2/go.mod h1:zQWigSxVmsHEZow5qaLtPYxpcKMMQpa09ixqBxuCS6A=
 github.com/pkg/errors v0.9.1 h1:FEBLx1zS214owpjy7qsBeixbURkuhQAwrK5UwLGTwt4=
 github.com/pkg/errors v0.9.1/go.mod h1:bwawxfHBFNV+L2hUp1rHADufV3IMtnDRdf1r5NINEl0=
+github.com/pkoukk/tiktoken-go v0.1.6 h1:JF0TlJzhTbrI30wCvFuiw6FzP2+/bR+FIxUdgEAcUsw=
+github.com/pkoukk/tiktoken-go v0.1.6/go.mod h1:9NiV+i9mJKGj1rYOT+njbv+ZwA/zJxYdewGl6qVatpg=
 github.com/pmezard/go-difflib v1.0.0 h1:4DBwDE0NGyQoBHbLQYPwSUPoCMWR5BEzIk/f1lZbAQM=
 github.com/pmezard/go-difflib v1.0.0/go.mod h1:iKH77koFhYxTK1pcRnkKkqfTogsbg7gZNVY4sRDYZ/4=
 github.com/redis/go-redis/v9 v9.12.0 h1:XlVPGlflh4nxfhsNXPA8Qp6EmEfTo0rp8oaBzPipXnU=
@@ -154,9 +158,11 @@ github.com/stretchr/testify v1.7.1/go.mod h1:6Fq8oRcR53rry900zMqJjRRixrwX3KX962/
 github.com/stretchr/testify v1.8.0/go.mod h1:yNjHg4UonilssWZ8iaSj1OCr/vHnekPRkoO+kdMU+MU=
 github.com/stretchr/testify v1.8.1/go.mod h1:w2LPCIKwWwSfY2zedu0+kehJoqGctiVI29o6fzry7u4=
 github.com/stretchr/testify v1.8.2/go.mod h1:w2LPCIKwWwSfY2zedu0+kehJoqGctiVI29o6fzry7u4=
-github.com/stretchr/testify v1.8.3/go.mod h1:sz/lmYIOXD/1dqDmKjjqLyZ2RngseejIcXlSw2iwfAo=
+github.com/stretchr/testify v1.8.4/go.mod h1:sz/lmYIOXD/1dqDmKjjqLyZ2RngseejIcXlSw2iwfAo=
 github.com/stretchr/testify v1.10.0 h1:Xv5erBjTwe/5IxqUQTdXv5kgmIvbHo3QQyRwhJsOfJA=
 github.com/stretchr/testify v1.10.0/go.mod h1:r2ic/lqez/lEtzL7wO/rwa5dbSLXVDPFyf8C91i36aY=
+github.com/tmc/langchaingo v0.1.13 h1:rcpMWBIi2y3B90XxfE4Ao8dhCQPVDMaNPnN5cGB1CaA=
+github.com/tmc/langchaingo v0.1.13/go.mod h1:vpQ5NOIhpzxDfTZK9B6tf2GM/MoaHewPWM5KXXGh7hg=
 github.com/tmthrgd/go-hex v0.0.0-20190904060850-447a3041c3bc h1:9lRDQMhESg+zvGYmW5DyG0UqvY96Bu5QYsTLvCHdrgo=
 github.com/tmthrgd/go-hex v0.0.0-20190904060850-447a3041c3bc/go.mod h1:bciPuU6GHm1iF1pBvUfxfsH0Wmnc2VbpgvbI9ZWuIRs=
 github.com/twitchyliquid64/golang-asm v0.15.1 h1:SU5vSMR7hnwNxj24w34ZyCi/FmDZTkS4MhqMhdFk5YI=
@@ -213,8 +219,8 @@ golang.org/x/text v0.3.6/go.mod h1:5Zoc/QRtKVWzQhOtBMvqHzDpF6irO9z98xDceosuGiQ=
 golang.org/x/text v0.24.0 h1:dd5Bzh4yt5KYA8f9CJHCP4FB4D51c2c6JvN37xJJkJ0=
 golang.org/x/text v0.24.0/go.mod h1:L8rBsPeo2pSS+xqN0d5u2ikmjtmoJbDBT1b7nHvFCdU=
 golang.org/x/tools v0.0.0-20180917221912-90fa682c2a6e/go.mod h1:n7NCudcB/nEzxVGmLbDWY5pfWTLqBcC2KZ6jyYvM4mQ=
-google.golang.org/protobuf v1.33.0 h1:uNO2rsAINq/JlFpSdYEKIZ0uKD/R9cpdv0T+yoGwGmI=
-google.golang.org/protobuf v1.33.0/go.mod h1:c6P6GXX6sHbq/GpV6MGZEdwhWPcYBgnhAHhKbcUYpos=
+google.golang.org/protobuf v1.34.1 h1:9ddQBjfCyZPOHPUiPxpYESBLc+T8P3E+Vo4IbKZgFWg=
+google.golang.org/protobuf v1.34.1/go.mod h1:c6P6GXX6sHbq/GpV6MGZEdwhWPcYBgnhAHhKbcUYpos=
 gopkg.in/check.v1 v0.0.0-20161208181325-20d25e280405/go.mod h1:Co6ibVJAznAaIkqp8huTwlJQCZ016jof/cbN4VW5Yz0=
 gopkg.in/check.v1 v1.0.0-20190902080502-41f04d3bba15/go.mod h1:Co6ibVJAznAaIkqp8huTwlJQCZ016jof/cbN4VW5Yz0=
 gopkg.in/check.v1 v1.0.0-20201130134442-10cb98267c6c h1:Hei/4ADfdWqJk1ZMxUNpqntNwaWcugrBjAiHlqqRiVk=
@@ -222,6 +228,7 @@ gopkg.in/check.v1 v1.0.0-20201130134442-10cb98267c6c/go.mod h1:JHkPIbrfpd72SG/EV
 gopkg.in/warnings.v0 v0.1.2 h1:wFXVbFY8DY5/xOe1ECiWdKCzZlxgshcYVNkBHstARME=
 gopkg.in/warnings.v0 v0.1.2/go.mod h1:jksf8JmL6Qr/oQM2OXTHunEvvTAsrWBLb6OOjuVWRNI=
 gopkg.in/yaml.v2 v2.2.2/go.mod h1:hI93XBmqTisBFMUTm0b8Fm+jr3Dg1NNxqwp+5A1VGuI=
+gopkg.in/yaml.v2 v2.4.0 h1:D8xgwECY7CYvx+Y2n4sBz93Jn9JRvxdiyyo8CTfuKaY=
 gopkg.in/yaml.v2 v2.4.0/go.mod h1:RDklbk79AGWmwhnvt/jBztapEOGDOx6ZbXqjP6csGnQ=
 gopkg.in/yaml.v3 v3.0.0-20200313102051-9f266ea9e77c/go.mod h1:K4uyk7z7BCEPqu6E+C64Yfv1cQ7kz7rIZviUmN+EgEM=
 gopkg.in/yaml.v3 v3.0.1 h1:fxVm/GzAzEWqLHuvctI91KS9hhNmmWOoWu0XTYJS7CA=
@@ -233,3 +240,5 @@ gorm.io/gorm v1.25.5/go.mod h1:hbnx/Oo0ChWMn1BIhpy1oYozzpM15i4YPuHDmfYtwg8=
 mellium.im/sasl v0.3.1 h1:wE0LW6g7U83vhvxjC1IY8DnXM+EU095yeo8XClvCdfo=
 mellium.im/sasl v0.3.1/go.mod h1:xm59PUYpZHhgQ9ZqoJ5QaCqzWMi8IeS49dhp6plPCzw=
 rsc.io/pdf v0.1.1/go.mod h1:n8OzWcQ6Sp37PL01nO98y4iUCRdTGarVfzxY20ICaU4=
+sigs.k8s.io/yaml v1.3.0 h1:a2VclLzOGrwOHDiV8EfBGhvjHvP46CtW5j6POvhYGGo=
+sigs.k8s.io/yaml v1.3.0/go.mod h1:GeOyir5tyXNByN85N/dRIT9es5UQNerPYEKK56eTBm8=
--- a/server/api/askai.go
+++ b/server/api/askai.go
@@ -1,11 +1,8 @@
 package api

 import (
-	"bytes"
-	"encoding/json"
-	"errors"
+	"context"
 	"fmt"
-	"io"
 	"log/slog"
 	"net/http"
 	"os"
@@ -14,6 +11,9 @@ import (
 	"time"

 	"github.com/gin-gonic/gin"
+	"github.com/tmc/langchaingo/llms"
+	"github.com/tmc/langchaingo/llms/ollama"
+	"github.com/tmc/langchaingo/llms/openai"
 	"gopkg.in/yaml.v3"
 )

@@ -128,7 +128,7 @@ func loadConfig() (string, string, string, string, time.Duration, int) {
 			endpoint = "http://localhost:11434/v1/chat/completions"
 		}
 		if model == "" {
-			model = "gpt-oss:20b"
+			model = "llama2:13b"
 		}
 		return provider, token, model, endpoint, timeout, retries
 	case "chutes":
@@ -150,150 +150,50 @@ func loadConfig() (string, string, string, string, time.Duration, int) {
 	}
 }

-// callChutes sends the question to the hosted LLM service and returns the reply.
-func callChutes(token, model, url string, timeout time.Duration, retries int, question string) (string, error) {
-	if token == "" || token == "cpk_xxxxxxx" {
-		return "", errors.New("chutes token not set")
-	}
-
-	reqBody := map[string]interface{}{
-		"model":       model,
-		"messages":    []interface{}{map[string]interface{}{"role": "user", "content": question}},
-		"stream":      false,
-		"max_tokens":  1024,
-		"temperature": 0.7,
-	}
-	data, err := json.Marshal(reqBody)
-	if err != nil {
-		return "", err
-	}
-
-	client := &http.Client{Timeout: timeout}
-	var lastErr error
-	for i := 0; i <= retries; i++ {
-		req, err := http.NewRequest("POST", url, bytes.NewBuffer(data))
-		if err != nil {
-			return "", err
-		}
-		req.Header.Set("Authorization", "Bearer "+token)
-		req.Header.Set("Content-Type", "application/json")
-
-		resp, err := client.Do(req)
-		if err != nil {
-			lastErr = err
-			continue
-		}
-
-		b, err := io.ReadAll(resp.Body)
-		resp.Body.Close()
-		if err != nil {
-			lastErr = err
-			continue
-		}
-		if resp.StatusCode != http.StatusOK {
-			lastErr = fmt.Errorf("chutes API error: %s", string(b))
-			continue
-		}
-
-		var res struct {
-			Choices []struct {
-				Message struct {
-					Content string `json:"content"`
-				} `json:"message"`
-			} `json:"choices"`
-		}
-		if err := json.Unmarshal(b, &res); err != nil {
-			lastErr = err
-			continue
-		}
-		if len(res.Choices) == 0 {
-			lastErr = errors.New("no choices returned")
-			continue
-		}
-		return res.Choices[0].Message.Content, nil
-	}
-	if lastErr == nil {
-		lastErr = errors.New("request failed")
-	}
-	return "", lastErr
-}
-
-// callOllama sends the question to a local Ollama server.
-func callOllama(model, url string, timeout time.Duration, retries int, question string) (string, error) {
-	reqBody := map[string]any{
-		"model":       model,
-		"messages":    []any{map[string]any{"role": "user", "content": question}},
-		"stream":      false,
-		"max_tokens":  1024,
-		"temperature": 0.7,
-	}
-	data, err := json.Marshal(reqBody)
-	if err != nil {
-		return "", err
-	}
-	client := &http.Client{Timeout: timeout}
-	var lastErr error
-	for i := 0; i <= retries; i++ {
-		req, err := http.NewRequest("POST", url, bytes.NewReader(data))
-		if err != nil {
-			return "", err
-		}
-		req.Header.Set("Content-Type", "application/json")
-		resp, err := client.Do(req)
-		if err != nil {
-			lastErr = err
-			continue
-		}
-		b, err := io.ReadAll(resp.Body)
-		resp.Body.Close()
-		if err != nil {
-			lastErr = err
-			continue
-		}
-		if resp.StatusCode != http.StatusOK {
-			lastErr = fmt.Errorf("ollama API error: %s", string(b))
-			continue
-		}
-		var res struct {
-			Choices []struct {
-				Message struct {
-					Content string `json:"content"`
-				} `json:"message"`
-			} `json:"choices"`
-		}
-		if err := json.Unmarshal(b, &res); err != nil {
-			lastErr = err
-			continue
-		}
-		if len(res.Choices) == 0 {
-			lastErr = errors.New("no choices returned")
-			continue
-		}
-		return res.Choices[0].Message.Content, nil
-	}
-	if lastErr == nil {
-		lastErr = errors.New("request failed")
-	}
-	return "", lastErr
-}
-
-// callLLM dispatches the question to the configured provider.
+// callLLM dispatches the question to the configured provider using LangChainGo.
 func callLLM(question string) (string, error) {
 	provider, token, model, url, timeout, retries := loadConfig()
+
+	httpClient := &http.Client{Timeout: timeout}
+
 	var (
-		answer string
-		err    error
+		llm llms.Model
+		err error
 	)
+
 	switch provider {
 	case "ollama":
-		answer, err = callOllama(model, url, timeout, retries, question)
-	case "chutes":
-		answer, err = callChutes(token, model, url, timeout, retries, question)
+		llm, err = ollama.New(
+			ollama.WithModel(model),
+			ollama.WithServerURL(url),
+			ollama.WithHTTPClient(httpClient),
+		)
 	default:
-		answer, err = callChutes(token, model, url, timeout, retries, question)
+		llm, err = openai.New(
+			openai.WithToken(token),
+			openai.WithModel(model),
+			openai.WithBaseURL(url),
+			openai.WithHTTPClient(httpClient),
+		)
 	}
 	if err != nil {
-		return "", fmt.Errorf("%w (timeout=%s retries=%d)", err, timeout, retries)
+		return "", fmt.Errorf("init llm: %w", err)
 	}
-	return answer, nil
+
+	var lastErr error
+	for i := 0; i <= retries; i++ {
+		// Per-attempt deadline: a shared context stays expired after one timeout.
+		ctx, cancel := context.WithTimeout(context.Background(), timeout)
+		answer, genErr := llms.GenerateFromSinglePrompt(ctx, llm, question)
+		cancel()
+		if genErr == nil {
+			return answer, nil
+		}
+		lastErr = genErr
+	}
+
+	if lastErr == nil {
+		lastErr = fmt.Errorf("request failed")
+	}
+	return "", fmt.Errorf("%w (timeout=%s retries=%d)", lastErr, timeout, retries)
 }
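
Note on the endpoint values passed to `ollama.WithServerURL` and `openai.WithBaseURL` in `callLLM`: `loadConfig` returns full request URLs such as `http://localhost:11434/v1/chat/completions`, while LangChainGo's clients appear to take a base URL and append their own request path. If that holds for v0.1.13 (an assumption worth checking against the library), the configured endpoint would need trimming before it is handed to the client. A minimal sketch of such trimming follows; `trimEndpoint`, `ollamaBase`, and `openAIBase` are hypothetical helpers, not part of this commit.

```go
package main

import (
	"fmt"
	"strings"
)

// trimEndpoint removes the first matching API path suffix from a configured
// endpoint and returns the remainder without a trailing slash.
// Hypothetical helper for illustration only.
func trimEndpoint(endpoint string, suffixes ...string) string {
	e := strings.TrimRight(endpoint, "/")
	for _, s := range suffixes {
		if strings.HasSuffix(e, s) {
			return strings.TrimRight(strings.TrimSuffix(e, s), "/")
		}
	}
	return e
}

// ollamaBase assumes the Ollama client expects the server root and adds its
// own API paths.
func ollamaBase(endpoint string) string {
	return trimEndpoint(endpoint, "/v1/chat/completions")
}

// openAIBase assumes the OpenAI-compatible client appends "/chat/completions"
// to the base URL it is given.
func openAIBase(endpoint string) string {
	return trimEndpoint(endpoint, "/chat/completions")
}

func main() {
	fmt.Println(ollamaBase("http://127.0.0.1:11434/v1/chat/completions"))
	fmt.Println(openAIBase("https://llm.chutes.ai/v1/chat/completions"))
}
```

With the endpoints from `server/config/server.yaml`, this yields `http://127.0.0.1:11434` for Ollama and `https://llm.chutes.ai/v1` for the Chutes generator. Endpoints that do not end in a known suffix pass through unchanged apart from the trailing slash.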
--- a/server/config/server.yaml
+++ b/server/config/server.yaml
@@ -28,7 +28,7 @@ models:
  generator:
    provider: "ollama"
    models:
-      - 'gpt-oss:20b'
+      - 'llama2:13b'
    endpoint: "http://127.0.0.1:11434/v1/chat/completions"
    token: ""
 # For PROD
@@ -36,14 +36,14 @@ models:
 #  embedder:
    #provider: "chutes"
    #models: "bge-m3"
-    #endpoint: "https://chutes-baai-bge-m3.chutes.ai/embed/v1/embeddings"
-    #token: "cpk_xxxxxxxxxxxxxxxxxxxx"
+    #endpoint: "https://chutes-baai-bge-m3.chutes.ai/embed"
+    #token: "cpk_xxxx"
 #  generator:
    #provider: "chutes"
-    #endpoint: "https://llm.chutes.ai/v1/chat/completions"
-    #token: "cpk_xxxxxxxxxxxxxxxxxxxx"
    #models:
    #  - 'moonshotai/Kimi-K2-Instruct'
+    #endpoint: "https://llm.chutes.ai/v1/chat/completions"
+    #token: "cpk_xxxx"

 embedding:
  max_batch: 64