Merge branch 'main' into codex/optimize-cli-server-with-langchaingo

2025-08-12 23:48:45 +08:00
commit 61d4dff7d3
parent 915d7d8c3e 82ec512a88
13 changed files with 256 additions and 54 deletions
--- a/README.md
+++ b/README.md
@@ -20,9 +20,22 @@ All UI components provide both Chinese and English interfaces.
 | Framework | Go         | 1.24    |
 | Framework | Next.js    | 14.1.0  |
 | Gateway   | OpenResty  | 1.27.1.2 |
-| Database  | PostgreSQL + pgvector | 14.18 |
 | Cache     | Redis      | 8.2.0     |
-| Model     | ollama/chutes.ai| baai/bge-m3, llama2:13b, moonshotai/Kimi-K2-Instruct |
+| Database  | PostgreSQL + pgvector | 14.18 |
+| Model (Local)  | HuggingFace Hub + Ollama | baai/bge-m3, llama2:13b |
+| Model (Online) | Chutes.AI  | baai/bge-m3, moonshotai/Kimi-K2-Instruct |
+
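+## LangChainGo Core Features at a Glance
+
+XControl uses LangChainGo to access multiple large language models through one interface and to give AskAI, the CLI, and the Server chained-call capabilities:
+
+- **LLM interface layer (Model I/O)**: a single way to call OpenAI, Hugging Face, Ollama, Google AI, Cohere, and other model APIs.
+- **Chains (chained workflows)**: combine prompts, retrieval results, tool calls, and similar steps into complete pipelines for RAG, chat, code generation, and other scenarios.
+- **Tools and Agents**: define tools such as web search, a scraper, and SQL queries, and integrate them into an LLM agent for ReAct-style tool calling.
+- **Vector retrieval and data access**: adapters for PGVector, Weaviate, Qdrant, MongoDB Atlas Vector Search, Chroma, Pinecone, Redis Vector, and other vector stores.
+- **Document loading and chunking**: Document Loaders and Text Splitters for processing long texts and building chunks for vector retrieval.
+- **Memory and history tracking**: conversation-memory mechanisms such as Conversation Buffer for a better interactive experience.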
+
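Conversation-buffer memory, named in the last bullet, amounts to replaying recent turns into the prompt. A minimal windowed version in plain Go, assuming hypothetical `Message` and `WindowBuffer` types (this is not LangChainGo's memory API), keeps only the last `Size` messages and drops the oldest first:

```go
package main

import (
	"fmt"
	"strings"
)

// Message is a hypothetical chat turn; LangChainGo has its own message types.
type Message struct {
	Role    string // e.g. "human" or "ai"
	Content string
}

// WindowBuffer keeps at most Size messages, evicting the oldest first.
// A Size of 0 or less means "keep everything".
type WindowBuffer struct {
	Size     int
	messages []Message
}

// Add appends a message and trims the buffer back to Size.
func (b *WindowBuffer) Add(role, content string) {
	b.messages = append(b.messages, Message{Role: role, Content: content})
	if b.Size > 0 && len(b.messages) > b.Size {
		b.messages = b.messages[len(b.messages)-b.Size:]
	}
}

// Len reports how many messages are currently retained.
func (b *WindowBuffer) Len() int { return len(b.messages) }

// History renders the retained turns as a prompt prefix, one "role: text" per line.
func (b *WindowBuffer) History() string {
	var sb strings.Builder
	for _, m := range b.messages {
		fmt.Fprintf(&sb, "%s: %s\n", m.Role, m.Content)
	}
	return sb.String()
}

func main() {
	buf := &WindowBuffer{Size: 2}
	buf.Add("human", "What is pgvector?")
	buf.Add("ai", "A Postgres extension for vector search.")
	buf.Add("human", "Does it support HNSW?")
	// Only the last two turns survive.
	fmt.Print(buf.History())
}
```

A window bounds prompt size at the cost of forgetting older context; a summarizing buffer is the usual trade-off when long histories matter.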

 ## LangChainGo Core Features at a Glance

--- a/docs/Milestone-2-todo.md
+++ b/docs/Milestone-2-todo.md
@@ -3,22 +3,34 @@
 Subtask plan for using the LangChainGo framework to optimize the CLI, the Server, and the AskAI API:

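 1. **LLM interface layer (Model I/O)**
-   - Provide unified access to OpenAI, Hugging Face, Ollama, Google AI, Cohere, and other models.
-   - Support switching between model providers through configuration in the CLI and Server.
+   - [ ] Build a provider registry for OpenAI, Hugging Face, Ollama, Google AI, Cohere, and other models.
+   - [ ] Expose model-provider switching in the CLI and Server configuration.
+   - [ ] Write unit tests that verify switching between providers.
+   - [ ] Add documentation for configuration and environment-variable usage.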
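 2. **Chains (chained workflows)**
-   - Combine prompts, retrieval results, and tool calls into complete pipelines, rounding out RAG and chat capabilities.
-   - Provide a composable chain API for AskAI to simplify orchestration of complex tasks.
+   - [ ] Combine prompts, retrieval results, and tool calls into RAG and chat chains.
+   - [ ] Provide reusable chain definitions for AskAI that support orchestrating complex tasks.
+   - [ ] Provide chain-invocation examples in the CLI.
+   - [ ] Write integration tests for the chains.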
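 3. **Tools and Agents**
-   - Define common tools (web search, scraper, SQL query, etc.) and integrate them into the Agent.
-   - Implement a ReAct-style tool-calling example in the CLI.
+   - [ ] Implement common tools such as web search, a scraper, and SQL queries.
+   - [ ] Register the tools with the Agent framework to support dynamic invocation.
+   - [ ] Demonstrate ReAct-style tool calling in the CLI.
+   - [ ] Add test cases for tool/Agent interaction.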
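 4. **Vector retrieval and data access**
-   - Integrate PGVector, Weaviate, Qdrant, Chroma, Pinecone, Redis Vector, and other stores.
-   - Allow custom vector dimensions and retrieval parameters.
+   - [ ] Integrate PGVector, Weaviate, Qdrant, Chroma, Pinecone, Redis Vector, and other stores.
+   - [ ] Support custom vector dimensions and retrieval parameters.
+   - [ ] Write benchmarks comparing the different vector stores.
+   - [ ] Provide documentation examples for tuning retrieval parameters.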
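 5. **Document loading and chunking**
-   - Provide Document Loaders and Text Splitters that handle text of different formats and lengths.
-   - Store chunking results in one place and support incremental updates.
+   - [ ] Provide Document Loaders for Markdown, source code, HTML, and other formats.
+   - [ ] Support Text Splitters that split by token count or recursively.
+   - [ ] Store chunking results in one place and provide an incremental-update API.
+   - [ ] Write tests for the loaders and splitters.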
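 6. **Memory and history tracking**
-   - Add conversation memory, such as a conversation buffer, to AskAI.
-   - Persist conversation context in the Server to improve the interactive experience.
+   - [ ] Add conversation memory such as a conversation buffer to AskAI.
+   - [ ] Persist session history in the Server and expose configuration options for it.
+   - [ ] Support tuning memory length and the cleanup policy.
+   - [ ] Write end-to-end tests that verify memory retention.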

 These tasks will be carried out step by step to achieve the goals of hybrid retrieval and multi-model support.
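The first milestone item calls for a provider registry. One way to shape it is a map from lower-cased provider names to constructor functions. The sketch below uses hypothetical `Registry`, `Factory`, and `Provider` types; a real version would presumably return a LangChainGo `llms.Model` instead of this stub interface:

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Provider is a hypothetical stand-in for a model client.
type Provider interface {
	Name() string
}

// Factory builds a provider from an endpoint and an API token.
type Factory func(endpoint, token string) (Provider, error)

// Registry maps lower-cased provider names to factories.
type Registry struct {
	factories map[string]Factory
}

func NewRegistry() *Registry {
	return &Registry{factories: map[string]Factory{}}
}

// Register adds or replaces a factory; names are case-insensitive.
func (r *Registry) Register(name string, f Factory) {
	r.factories[strings.ToLower(name)] = f
}

// Names lists the registered providers in sorted order.
func (r *Registry) Names() []string {
	names := make([]string, 0, len(r.factories))
	for n := range r.factories {
		names = append(names, n)
	}
	sort.Strings(names)
	return names
}

// New looks up the factory for name and builds the provider.
func (r *Registry) New(name, endpoint, token string) (Provider, error) {
	f, ok := r.factories[strings.ToLower(name)]
	if !ok {
		return nil, fmt.Errorf("unknown provider %q (registered: %s)", name, strings.Join(r.Names(), ", "))
	}
	return f(endpoint, token)
}

// stubProvider replaces a real client in this sketch.
type stubProvider struct{ name, endpoint string }

func (s stubProvider) Name() string { return s.name + "@" + s.endpoint }

func main() {
	reg := NewRegistry()
	for _, n := range []string{"ollama", "chutes"} {
		n := n
		reg.Register(n, func(endpoint, token string) (Provider, error) {
			return stubProvider{name: n, endpoint: endpoint}, nil
		})
	}
	p, err := reg.New("Ollama", "http://localhost:11434", "")
	if err != nil {
		panic(err)
	}
	fmt.Println(p.Name()) // ollama@http://localhost:11434
	_, err = reg.New("cohere", "", "")
	fmt.Println(err)
}
```

Keeping construction behind a name lookup lets the CLI and Server share one switch point, and the "unknown provider" error lists valid names for configuration mistakes.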
--- a/docs/api-endpoints.md
+++ b/docs/api-endpoints.md
@@ -90,7 +90,7 @@ models:
    provider: "ollama"
    models:
      - 'llama2:13b'
-    endpoint: "http://127.0.0.1:11434/v1/chat/completions"
+    endpoint: "http://127.0.0.1:11434"
 ```

 For online services using Chutes:
@@ -106,7 +106,7 @@ For online services using Chutes:
 #    provider: "chutes"
 #    models:
 #      - 'moonshotai/Kimi-K2-Instruct'
-#    endpoint: "https://llm.chutes.ai/v1/chat/completions"
+#    endpoint: "https://llm.chutes.ai/v1"
 #    token: "cpk_xxxx"
 ```

--- a/docs/changelog.md
+++ b/docs/changelog.md
@@ -1,22 +1,17 @@
 # Changelog

 ## Milestone 1: MVP (Completed)
-Use default Redis port (#98) and establish PostgreSQL & Redis baseline.

-Stream RAG sync progress for GitHub repository synchronization (#100).
-
-Add client-side Markdown parsing to the CLI (#104).
-
-Refactor RAG ingestion into the CLI with a server upsert endpoint (#103).
-
-Perform RAG API functional tests and support per-file ingestion workflow in the CLI (#115).
-
-Allow RAG upsert to migrate embedding dimensions (#119) and document pgvector database initialization (#120).
-
-Ingest files automatically (#123).
-
-## Milestone 2: Hybrid Search
+- Use default Redis port (#98) and establish PostgreSQL & Redis baseline.
+- Stream RAG sync progress for GitHub repository synchronization (#100).
+- Add client-side Markdown parsing to the CLI (#104).
+- Refactor RAG ingestion into the CLI with a server upsert endpoint (#103).
+- Perform RAG API functional tests and support per-file ingestion workflow in the CLI (#115).
+- Allow RAG upsert to migrate embedding dimensions (#119) and document pgvector database initialization (#120).
+- Ingest files automatically (#123).

+## Milestone 2: Hybrid Search (In Progress)
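+- Rename the RAG phase-2 optimization plan to `docs/Milestone-2.md` and add a subtask list.
 - Plan to adopt the LangChainGo framework in the AskAI API and the CLI for multi-model support and chained calls.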
 - Document local and Chutes model configurations for AskAI.
 - CLI and server dynamically support 1024-dimensional embeddings.
--- a/example/server/config/server.yaml
+++ b/example/server/config/server.yaml
@@ -20,11 +20,11 @@ sync:

 provider:
  - name: ollama
-    endpoint: http://localhost:11434/v1/chat/completions
+    endpoint: http://localhost:11434
    models:
      - 'gpt-oss:20b'
  - name: chutes
-    endpoint: https://llm.chutes.ai/v1/chat/completions
+    endpoint: https://llm.chutes.ai/v1
    token: "cpk_xxxxxxxxxxxxxxxxxx"
    models:
      - 'moonshotai/Kimi-K2-Instruct'
--- a/internal/rag/config/config.go
+++ b/internal/rag/config/config.go
@@ -118,10 +118,15 @@ type Config struct {
 	Models struct {
 		Embedder  ModelCfg `yaml:"embedder"`
 		Generator ModelCfg `yaml:"generator"`
+		Reranker  ModelCfg `yaml:"reranker"`
 	} `yaml:"models"`
 	Embedding EmbeddingCfg `yaml:"embedding"`
 	Chunking  ChunkingCfg  `yaml:"chunking"`
-	API       struct {
+	Retrieval struct {
+		Alpha      float64 `yaml:"alpha"`
+		Candidates int     `yaml:"candidates"`
+	} `yaml:"retrieval"`
+	API struct {
 		AskAI struct {
 			Timeout int `yaml:"timeout"`
 			Retries int `yaml:"retries"`
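The new `retrieval` block adds `alpha` and `candidates`, and `Service.Query` supplies the fallbacks: an alpha outside [0, 1] becomes 0.5 and a non-positive candidate count becomes 50. Because `Alpha` is a plain `float64`, a config that omits `alpha` decodes to 0, which passes the range check and weights the blend entirely toward the full-text score. A sketch of resolving the defaults in one place, using a hypothetical `RetrievalCfg` whose pointer field lets an unset value be told apart from an explicit 0 (yaml.v3 leaves a pointer field nil when the key is absent):

```go
package main

import "fmt"

// RetrievalCfg is a hypothetical variant of the diff's anonymous Retrieval
// struct; Alpha is a pointer so "not set" differs from an explicit 0.
type RetrievalCfg struct {
	Alpha      *float64 `yaml:"alpha"`
	Candidates int      `yaml:"candidates"`
}

// Same fallback values Service.Query uses in the diff.
const (
	defaultAlpha      = 0.5
	defaultCandidates = 50
)

// Resolve returns the effective alpha and candidate count.
func (r RetrievalCfg) Resolve() (alpha float64, candidates int) {
	alpha = defaultAlpha
	if r.Alpha != nil && *r.Alpha >= 0 && *r.Alpha <= 1 {
		alpha = *r.Alpha
	}
	candidates = r.Candidates
	if candidates <= 0 {
		candidates = defaultCandidates
	}
	return alpha, candidates
}

func main() {
	zero := 0.0
	fmt.Println(RetrievalCfg{}.Resolve())                             // 0.5 50
	fmt.Println(RetrievalCfg{Alpha: &zero, Candidates: 20}.Resolve()) // 0 20
}
```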
--- a/internal/rag/config/runtime.go
+++ b/internal/rag/config/runtime.go
@@ -64,6 +64,11 @@ type Runtime struct {
 	Datasources []DataSource `yaml:"datasources"`
 	Proxy       string       `yaml:"proxy"`
 	Embedding   RuntimeEmbedding
+	Reranker    ModelCfg
+	Retrieval   struct {
+		Alpha      float64 `yaml:"alpha"`
+		Candidates int     `yaml:"candidates"`
+	} `yaml:"retrieval"`
 }

 // ServerConfigPath points to the server configuration file.
@@ -82,6 +87,8 @@ func LoadServer() (*Runtime, error) {
 	}
 	rt.Redis = cfg.Global.Redis
 	rt.Embedding = cfg.ResolveEmbedding()
+	rt.Reranker = cfg.Models.Reranker
+	rt.Retrieval = cfg.Retrieval
 	return rt, nil
 }

@@ -101,6 +108,8 @@ func (rt *Runtime) ToConfig() *Config {
 	if rt.Embedding.Model != "" {
 		c.Models.Embedder.Models = []string{rt.Embedding.Model}
 	}
+	c.Models.Reranker = rt.Reranker
+	c.Retrieval = rt.Retrieval
 	c.Embedding.Dimension = rt.Embedding.Dimension
 	c.Embedding.MaxBatch = rt.Embedding.MaxBatch
 	c.Embedding.MaxChars = rt.Embedding.MaxChars
--- a/internal/rag/rerank/bge.go
+++ b/internal/rag/rerank/bge.go
@@ -0,0 +1,58 @@
+package rerank
+
+import (
+	"bytes"
+	"context"
+	"encoding/json"
+	"fmt"
+	"net/http"
+	"time"
+)
+
+// BGE implements a reranker backed by a bge-reranker service.
+type BGE struct {
+	endpoint string
+	token    string
+	client   *http.Client
+}
+
+// NewBGE returns a new BGE reranker.
+func NewBGE(endpoint, token string) *BGE {
+	return &BGE{
+		endpoint: endpoint,
+		token:    token,
+		client:   &http.Client{Timeout: 30 * time.Second},
+	}
+}
+
+// Rerank posts query and docs to the service and returns scores.
+func (b *BGE) Rerank(ctx context.Context, query string, docs []string) ([]float32, error) {
+	payload := map[string]any{"query": query, "documents": docs}
+	body, _ := json.Marshal(payload)
+	req, err := http.NewRequestWithContext(ctx, http.MethodPost, b.endpoint, bytes.NewReader(body))
+	if err != nil {
+		return nil, err
+	}
+	req.Header.Set("Content-Type", "application/json")
+	if b.token != "" {
+		req.Header.Set("Authorization", "Bearer "+b.token)
+	}
+	resp, err := b.client.Do(req)
+	if err != nil {
+		return nil, err
+	}
+	defer resp.Body.Close()
+	if resp.StatusCode >= 300 {
+		return nil, fmt.Errorf("rerank failed: %s", resp.Status)
+	}
+	var out struct {
+		Scores []float32 `json:"scores"`
+	}
+	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
+		return nil, err
+	}
+	if len(out.Scores) != len(docs) {
+		return nil, fmt.Errorf("unexpected scores length")
+	}
+	return out.Scores, nil
+}
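`BGE.Rerank` assumes a service that accepts `{"query": ..., "documents": [...]}` and replies with `{"scores": [...]}`, one score per document. Rerank servers differ in their JSON schema, so it is worth confirming that the target endpoint speaks this shape. To exercise that contract without a model, here is a sketch of a local stand-in service built on `net/http/httptest`. `fakeRerank`, `startFakeReranker`, and `callRerank` are hypothetical names, and the term-overlap scoring is a placeholder, not how bge-reranker scores:

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// rerankRequest and rerankResponse mirror the JSON that BGE.Rerank sends and expects.
type rerankRequest struct {
	Query     string   `json:"query"`
	Documents []string `json:"documents"`
}

type rerankResponse struct {
	Scores []float32 `json:"scores"`
}

// fakeRerank scores each document by the fraction of query terms it contains.
func fakeRerank(w http.ResponseWriter, r *http.Request) {
	var req rerankRequest
	if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
		http.Error(w, err.Error(), http.StatusBadRequest)
		return
	}
	terms := strings.Fields(strings.ToLower(req.Query))
	scores := make([]float32, len(req.Documents))
	for i, doc := range req.Documents {
		lower := strings.ToLower(doc)
		hits := 0
		for _, t := range terms {
			if strings.Contains(lower, t) {
				hits++
			}
		}
		if len(terms) > 0 {
			scores[i] = float32(hits) / float32(len(terms))
		}
	}
	w.Header().Set("Content-Type", "application/json")
	_ = json.NewEncoder(w).Encode(rerankResponse{Scores: scores})
}

// startFakeReranker serves fakeRerank on a local test server.
func startFakeReranker() (url string, stop func()) {
	srv := httptest.NewServer(http.HandlerFunc(fakeRerank))
	return srv.URL, srv.Close
}

// callRerank posts the same payload shape as BGE.Rerank and decodes the scores.
func callRerank(url, query string, docs []string) ([]float32, error) {
	body, err := json.Marshal(rerankRequest{Query: query, Documents: docs})
	if err != nil {
		return nil, err
	}
	resp, err := http.Post(url, "application/json", bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode >= 300 {
		return nil, fmt.Errorf("rerank failed: %s", resp.Status)
	}
	var out rerankResponse
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		return nil, err
	}
	if len(out.Scores) != len(docs) {
		return nil, fmt.Errorf("got %d scores for %d documents", len(out.Scores), len(docs))
	}
	return out.Scores, nil
}

func main() {
	url, stop := startFakeReranker()
	defer stop()
	scores, err := callRerank(url, "hnsw index", []string{"HNSW index build", "redis cache"})
	if err != nil {
		panic(err)
	}
	fmt.Println(scores) // [1 0]
}
```

Because `BGE` posts straight to its configured endpoint, a test could also pass the URL from `startFakeReranker` to `NewBGE` and drive the real client against the same stand-in.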
--- a/internal/rag/rerank/rerank.go
+++ b/internal/rag/rerank/rerank.go
@@ -0,0 +1,8 @@
+package rerank
+
+import "context"
+
+// Reranker scores a list of documents for a given query.
+type Reranker interface {
+	Rerank(ctx context.Context, query string, docs []string) ([]float32, error)
+}
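`Service.Query` reorders candidates only when the reranker call succeeds and returns one score per candidate; otherwise it silently keeps the hybrid order. That fallback can be tested with a stub that satisfies the same `Reranker` interface. A sketch, in which `fixedReranker` and `rerankOrFallback` are hypothetical helpers:

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"sort"
)

// Reranker mirrors the interface added in internal/rag/rerank/rerank.go.
type Reranker interface {
	Rerank(ctx context.Context, query string, docs []string) ([]float32, error)
}

// fixedReranker is a test double that returns preset scores or a preset error.
type fixedReranker struct {
	scores []float32
	err    error
}

func (f fixedReranker) Rerank(_ context.Context, _ string, docs []string) ([]float32, error) {
	if f.err != nil {
		return nil, f.err
	}
	if len(f.scores) != len(docs) {
		return nil, fmt.Errorf("got %d scores for %d docs", len(f.scores), len(docs))
	}
	return f.scores, nil
}

// rerankOrFallback orders docs by descending score; on any error, a nil
// reranker, or a length mismatch it returns docs unchanged, like Service.Query.
func rerankOrFallback(ctx context.Context, rr Reranker, query string, docs []string) []string {
	if rr == nil {
		return docs
	}
	scores, err := rr.Rerank(ctx, query, docs)
	if err != nil || len(scores) != len(docs) {
		return docs
	}
	// Sort a permutation of indices so docs and scores stay aligned.
	idx := make([]int, len(docs))
	for i := range idx {
		idx[i] = i
	}
	sort.SliceStable(idx, func(a, b int) bool { return scores[idx[a]] > scores[idx[b]] })
	out := make([]string, len(docs))
	for i, j := range idx {
		out[i] = docs[j]
	}
	return out
}

func main() {
	ctx := context.Background()
	docs := []string{"a", "b", "c"}
	fmt.Println(rerankOrFallback(ctx, fixedReranker{scores: []float32{0.1, 0.9, 0.5}}, "q", docs)) // [b c a]
	fmt.Println(rerankOrFallback(ctx, fixedReranker{err: errors.New("down")}, "q", docs))         // [a b c]
}
```

Silently falling back keeps answers flowing when the reranker is down; logging the swallowed error would make that degradation visible.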
--- a/internal/rag/service.go
+++ b/internal/rag/service.go
@@ -3,12 +3,15 @@ package rag
 import (
 	"context"
 	"encoding/json"
+	"fmt"
+	"sort"

 	"github.com/jackc/pgx/v5"
 	pgvector "github.com/pgvector/pgvector-go"

 	"xcontrol/internal/rag/config"
 	"xcontrol/internal/rag/embed"
+	"xcontrol/internal/rag/rerank"
 	"xcontrol/internal/rag/store"
 )

@@ -89,24 +92,107 @@ func (s *Service) Query(ctx context.Context, question string, limit int) ([]Docu
 	}
 	defer conn.Close(ctx)

-	rows, err := conn.Query(ctx, `SELECT repo, path, chunk_id, content, metadata FROM documents ORDER BY embedding <-> $1 LIMIT $2`,
-		pgvector.NewVector(vecs[0]), limit)
+	alpha := s.cfg.Retrieval.Alpha
+	if alpha < 0 || alpha > 1 {
+		alpha = 0.5
+	}
+	cand := s.cfg.Retrieval.Candidates
+	if cand <= 0 {
+		cand = 50
+	}
+
+	type scored struct {
+		Document
+		vscore float64
+		tscore float64
+		score  float64
+	}
+	docsMap := map[string]*scored{}
+
+	vrows, err := conn.Query(ctx, `SELECT repo,path,chunk_id,content,metadata, embedding <-> $1 AS dist FROM documents ORDER BY embedding <-> $1 LIMIT $2`,
+		pgvector.NewVector(vecs[0]), cand)
 	if err != nil {
 		return nil, err
 	}
-	defer rows.Close()
-
-	var docs []Document
-	for rows.Next() {
-		var d Document
+	for vrows.Next() {
+		var d scored
 		var metaBytes []byte
-		if err := rows.Scan(&d.Repo, &d.Path, &d.ChunkID, &d.Content, &metaBytes); err != nil {
+		var dist float64
+		if err := vrows.Scan(&d.Repo, &d.Path, &d.ChunkID, &d.Content, &metaBytes, &dist); err != nil {
+			vrows.Close()
 			return nil, err
 		}
 		if len(metaBytes) > 0 {
 			_ = json.Unmarshal(metaBytes, &d.Metadata)
 		}
-		docs = append(docs, d)
+		d.vscore = -dist
+		key := fmt.Sprintf("%s|%s|%d", d.Repo, d.Path, d.ChunkID)
+		docsMap[key] = &d
 	}
-	return docs, rows.Err()
+	vrows.Close()
+
+	trows, err := conn.Query(ctx, `SELECT repo,path,chunk_id,content,metadata, ts_rank_cd(content_tsv, websearch_to_tsquery($1)) AS rank FROM documents WHERE content_tsv @@ websearch_to_tsquery($1) ORDER BY rank DESC LIMIT $2`,
+		question, cand)
+	if err != nil {
+		return nil, err
+	}
+	for trows.Next() {
+		var metaBytes []byte
+		var rank float64
+		key := ""
+		d := scored{}
+		if err := trows.Scan(&d.Repo, &d.Path, &d.ChunkID, &d.Content, &metaBytes, &rank); err != nil {
+			trows.Close()
+			return nil, err
+		}
+		if len(metaBytes) > 0 {
+			_ = json.Unmarshal(metaBytes, &d.Metadata)
+		}
+		d.tscore = rank
+		key = fmt.Sprintf("%s|%s|%d", d.Repo, d.Path, d.ChunkID)
+		if exist, ok := docsMap[key]; ok {
+			exist.tscore = d.tscore
+		} else {
+			docsMap[key] = &d
+		}
+	}
+	trows.Close()
+
+	candidates := make([]*scored, 0, len(docsMap))
+	for _, d := range docsMap {
+		d.score = alpha*d.vscore + (1-alpha)*d.tscore
+		candidates = append(candidates, d)
+	}
+	sort.Slice(candidates, func(i, j int) bool { return candidates[i].score > candidates[j].score })
+	if len(candidates) > cand {
+		candidates = candidates[:cand]
+	}
+
+	// optional reranking
+	var rr rerank.Reranker
+	rCfg := s.cfg.Models.Reranker
+	if rCfg.Endpoint != "" {
+		rr = rerank.NewBGE(rCfg.Endpoint, rCfg.Token)
+	}
+	if rr != nil {
+		docs := make([]string, len(candidates))
+		for i, c := range candidates {
+			docs[i] = c.Content
+		}
+		if scores, err := rr.Rerank(ctx, question, docs); err == nil && len(scores) == len(candidates) {
+			for i := range candidates {
+				candidates[i].score = float64(scores[i])
+			}
+			sort.Slice(candidates, func(i, j int) bool { return candidates[i].score > candidates[j].score })
+		}
+	}
+
+	if limit > len(candidates) {
+		limit = len(candidates)
+	}
+	out := make([]Document, 0, limit)
+	for i := 0; i < limit; i++ {
+		out = append(out, candidates[i].Document)
+	}
+	return out, nil
 }
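The hybrid step blends `vscore = -dist` with `tscore = ts_rank_cd(...)`. Since the vector query orders by `<->`, `dist` is a non-negative L2 distance, so `-dist` is at most 0, while the text rank is non-negative: the two terms are on different scales. A chunk found only by the full-text query also keeps `vscore` at its zero value, which is the largest value `-dist` can take, so with `alpha = 0.5` it can outrank a chunk that matched both queries with the same text rank. One common remedy is to min-max normalize each signal to [0, 1] over the candidate set before blending and to let a missing signal contribute 0. A sketch with hypothetical `hit`, `scale`, and `fuse` names:

```go
package main

import (
	"fmt"
	"sort"
)

// hit is a candidate: Dist from the vector query (lower is better), Rank from
// ts_rank_cd (higher is better); the flags record which query found it.
type hit struct {
	Key     string
	Dist    float64
	Rank    float64
	HasVec  bool
	HasText bool
	Score   float64
}

// minMax returns the smallest and largest value in vals (zeros if empty).
func minMax(vals []float64) (lo, hi float64) {
	for i, v := range vals {
		if i == 0 || v < lo {
			lo = v
		}
		if i == 0 || v > hi {
			hi = v
		}
	}
	return lo, hi
}

// scale maps v into [0, 1] with 1 as the best value; if all values are equal,
// every present value gets 1.
func scale(v, lo, hi float64, higherIsBetter bool) float64 {
	if hi == lo {
		return 1
	}
	if higherIsBetter {
		return (v - lo) / (hi - lo)
	}
	return (hi - v) / (hi - lo)
}

// fuse scores each hit as alpha*vec + (1-alpha)*text on normalized signals and
// returns a sorted copy; a signal the hit lacks contributes 0.
func fuse(hits []hit, alpha float64) []hit {
	var dists, ranks []float64
	for _, h := range hits {
		if h.HasVec {
			dists = append(dists, h.Dist)
		}
		if h.HasText {
			ranks = append(ranks, h.Rank)
		}
	}
	dLo, dHi := minMax(dists)
	rLo, rHi := minMax(ranks)

	out := make([]hit, len(hits))
	copy(out, hits)
	for i := range out {
		var v, t float64
		if out[i].HasVec {
			v = scale(out[i].Dist, dLo, dHi, false)
		}
		if out[i].HasText {
			t = scale(out[i].Rank, rLo, rHi, true)
		}
		out[i].Score = alpha*v + (1-alpha)*t
	}
	sort.SliceStable(out, func(a, b int) bool { return out[a].Score > out[b].Score })
	return out
}

func main() {
	hits := []hit{
		{Key: "A", Dist: 0.10, Rank: 0.30, HasVec: true, HasText: true},
		{Key: "B", Dist: 0.30, HasVec: true},
		{Key: "C", Rank: 0.10, HasText: true},
		{Key: "D", Dist: 0.20, Rank: 0.20, HasVec: true, HasText: true},
	}
	for _, h := range fuse(hits, 0.5) {
		fmt.Printf("%s %.3f\n", h.Key, h.Score)
	}
}
```

Min-max scaling is sensitive to outliers in the candidate set; reciprocal rank fusion, which combines rank positions instead of raw scores, is a common alternative for the same problem.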
--- a/internal/rag/store/store.go
+++ b/internal/rag/store/store.go
@@ -64,6 +64,20 @@ func EnsureSchema(ctx context.Context, conn *pgx.Conn, dim int, migrate bool) er
 			return err
 		}
 	}
+	// ensure full-text search column
+	var hasTSV bool
+	err = conn.QueryRow(ctx, `SELECT EXISTS (
+               SELECT 1 FROM information_schema.columns
+               WHERE table_name='documents' AND column_name='content_tsv'
+       )`).Scan(&hasTSV)
+	if err != nil {
+		return err
+	}
+	if !hasTSV {
+		if _, err := conn.Exec(ctx, `ALTER TABLE documents ADD COLUMN content_tsv tsvector GENERATED ALWAYS AS (to_tsvector('english', content)) STORED`); err != nil {
+			return err
+		}
+	}
 	// check dimension
 	var curDim int
 	err = conn.QueryRow(ctx, `SELECT atttypmod-4 FROM pg_attribute a JOIN pg_type t ON a.atttypid=t.oid WHERE a.attrelid='documents'::regclass AND a.attname='embedding'`).Scan(&curDim)
@@ -82,6 +96,9 @@ func EnsureSchema(ctx context.Context, conn *pgx.Conn, dim int, migrate bool) er
 	if _, err := conn.Exec(ctx, `CREATE INDEX IF NOT EXISTS documents_embedding_idx ON documents USING hnsw (embedding vector_cosine_ops)`); err != nil {
 		return err
 	}
+	if _, err := conn.Exec(ctx, `CREATE INDEX IF NOT EXISTS documents_content_tsv_idx ON documents USING GIN (content_tsv)`); err != nil {
+		return err
+	}
 	return nil
 }
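`EnsureSchema` builds the HNSW index with `vector_cosine_ops`, while `Service.Query` orders by `embedding <-> $1`, which is pgvector's L2 distance. pgvector only uses an index whose operator class matches the query operator (`<=>` for cosine distance), so this query cannot use that index as written. If the embedder returns unit-length vectors (an assumption; the diff does not say), the two orderings agree, because for unit vectors ‖a − b‖² = 2 − 2·cos(a, b) = 2 × (cosine distance). Switching either the operator or the opclass would then change index usage but not the ranking. A small Go check of that identity, with hypothetical helper names:

```go
package main

import (
	"fmt"
	"math"
)

// l2 returns the Euclidean distance, which pgvector's <-> operator computes.
func l2(a, b []float64) float64 {
	var s float64
	for i := range a {
		d := a[i] - b[i]
		s += d * d
	}
	return math.Sqrt(s)
}

// cosineDistance returns 1 - cos(a, b), which pgvector's <=> operator computes.
func cosineDistance(a, b []float64) float64 {
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	return 1 - dot/(math.Sqrt(na)*math.Sqrt(nb))
}

// normalize scales v to unit length.
func normalize(v []float64) []float64 {
	var n float64
	for _, x := range v {
		n += x * x
	}
	n = math.Sqrt(n)
	out := make([]float64, len(v))
	for i, x := range v {
		out[i] = x / n
	}
	return out
}

func main() {
	a := normalize([]float64{1, 2, 3})
	b := normalize([]float64{3, 1, 0})
	d := l2(a, b)
	// For unit vectors these two numbers agree up to rounding.
	fmt.Printf("L2^2 = %.6f, 2*cosine distance = %.6f\n", d*d, 2*cosineDistance(a, b))
}
```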

--- a/server/api/askai.go
+++ b/server/api/askai.go
@@ -12,7 +12,6 @@ import (

 	"github.com/gin-gonic/gin"
 	"github.com/tmc/langchaingo/llms"
-	"github.com/tmc/langchaingo/llms/ollama"
 	"github.com/tmc/langchaingo/llms/openai"
 	"gopkg.in/yaml.v3"
 )
@@ -122,10 +121,14 @@ func loadConfig() (string, string, string, string, time.Duration, int) {
 	}
 	provider = strings.ToLower(provider)
 	endpoint = strings.TrimRight(endpoint, "/")
+	endpoint = strings.TrimSuffix(endpoint, "/chat/completions")
+	endpoint = strings.TrimRight(endpoint, "/")
 	switch provider {
 	case "ollama":
+		endpoint = strings.TrimSuffix(endpoint, "/v1")
+		endpoint = strings.TrimRight(endpoint, "/")
 		if endpoint == "" {
-			endpoint = "http://localhost:11434/v1/chat/completions"
+			endpoint = "http://localhost:11434"
 		}
 		if model == "" {
 			model = "llama2:13b"
@@ -133,7 +136,7 @@ func loadConfig() (string, string, string, string, time.Duration, int) {
 		return provider, token, model, endpoint, timeout, retries
 	case "chutes":
 		if endpoint == "" {
-			endpoint = "https://llm.chutes.ai/v1/chat/completions"
+			endpoint = "https://llm.chutes.ai/v1"
 		}
 		if model == "" {
 			model = "deepseek-ai/DeepSeek-R1"
@@ -141,7 +144,7 @@ func loadConfig() (string, string, string, string, time.Duration, int) {
 		return provider, token, model, endpoint, timeout, retries
 	default:
 		if endpoint == "" {
-			endpoint = "https://llm.chutes.ai/v1/chat/completions"
+			endpoint = "https://llm.chutes.ai/v1"
 		}
 		if model == "" {
 			model = "deepseek-ai/DeepSeek-R1"
@@ -163,11 +166,7 @@ func callLLM(question string) (string, error) {

 	switch provider {
 	case "ollama":
-		llm, err = ollama.New(
-			ollama.WithModel(model),
-			ollama.WithServerURL(url),
-			ollama.WithHTTPClient(httpClient),
-		)
+		fallthrough
 	default:
 		llm, err = openai.New(
 			openai.WithToken(token),
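After this change `loadConfig` accepts either a base URL or a full `/chat/completions` URL and reduces it to a base URL, also stripping `/v1` for Ollama. The same normalization as a standalone function (`normalizeEndpoint` is a hypothetical name; the trimming steps and defaults are the ones in the diff):

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeEndpoint mirrors loadConfig: drop trailing slashes and a
// "/chat/completions" suffix, drop "/v1" for Ollama, then apply defaults.
func normalizeEndpoint(provider, endpoint string) string {
	provider = strings.ToLower(provider)
	endpoint = strings.TrimRight(endpoint, "/")
	endpoint = strings.TrimSuffix(endpoint, "/chat/completions")
	endpoint = strings.TrimRight(endpoint, "/")
	if provider == "ollama" {
		endpoint = strings.TrimSuffix(endpoint, "/v1")
		endpoint = strings.TrimRight(endpoint, "/")
		if endpoint == "" {
			endpoint = "http://localhost:11434"
		}
		return endpoint
	}
	// "chutes" and every other provider share the same default in the diff.
	if endpoint == "" {
		endpoint = "https://llm.chutes.ai/v1"
	}
	return endpoint
}

func main() {
	for _, c := range [][2]string{
		{"ollama", "http://127.0.0.1:11434/v1/chat/completions"},
		{"chutes", "https://llm.chutes.ai/v1/chat/completions/"},
		{"chutes", ""},
	} {
		fmt.Println(normalizeEndpoint(c[0], c[1]))
	}
}
```

Since the `ollama` case now falls through to `openai.New`, and Ollama serves its OpenAI-compatible API under `/v1`, whatever builds `url` for that client presumably has to add `/v1` back. The hunk does not show how `url` is derived, so that path is worth checking.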
--- a/server/config/server.yaml
+++ b/server/config/server.yaml
@@ -29,7 +29,7 @@ models:
    provider: "ollama"
    models:
      - 'llama2:13b'
-    endpoint: "http://127.0.0.1:11434/v1/chat/completions"
+    endpoint: "http://127.0.0.1:11434"
    token: ""
 # For PROD
 #models:
@@ -42,7 +42,7 @@ models:
    #provider: "chutes"
    #models:
    #  - 'moonshotai/Kimi-K2-Instruct'
-    #endpoint: "https://llm.chutes.ai/v1/chat/completions"
+    #endpoint: "https://llm.chutes.ai/v1"
    #token: "cpk_xxxx"

 embedding: