Compare commits

No commits in common. "main" and "codex/fix-agent-install-first-run-exit" have entirely different histories.

156 changed files with 13004 additions and 3839 deletions

2
.gitignore vendored

@ -4,7 +4,6 @@
#########################################################
# pigsty-ca and other certs
#########################################################
files/*.key
files/*.crt
files/pki/*
@ -27,7 +26,6 @@ docker/data/
#########################################################
# tmp files
#########################################################
.env
# IDE files
.idea/
.code/

132
README.md

@ -5,7 +5,7 @@
**Observability.svc.plus** is an observability solution strictly following the Apache 2.0 license.
> **Focus**: Monitoring & Observability (监控/可观测). Integrating OpenTelemetry (OTel), VictoriaMetrics, and DeepFlow-based network observability without long-term raw-flow lock-in.
> **Focus**: Monitoring & Observability (监控/可观测). Integrating OpenTelemetry (OTel), with future plans to incorporate DeepFlow Agent and other open-source NPM (Network Performance Monitoring) probes.
[Website](https://svc.plus/) | [Public Demo](https://svc.plus/services) | [Blog](https://svc.plus/blogs) | [Support](https://www.svc.plus/support)
@ -31,70 +31,6 @@ flowchart LR
## 3) Start
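The currently recommended approach is a hybrid deployment onto existing hosts.
1. First update DNS so that `observability.svc.plus` points to `us-xhttp.svc.plus`.
2. Run the Server side example below on `us-xhttp.svc.plus` to deploy the central server.
3. Then run the Client side example below on the other existing hosts to ship collected data back to `observability.svc.plus`.
Hosts currently onboarded:
- `us-xhttp.svc.plus`: keeps hosting its existing services and also hosts `observability.svc.plus`
- `openclaw.svc.plus`: runs the agent, which collects data and reports it to the central server
- `jp-xhttp.svc.plus`: runs the agent, which collects data and reports it to the central server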
### Ansible (Recommended)
#### Server side
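Export the Cloudflare token first, then run the server-side deployment on `us-xhttp.svc.plus`. `deploy_observability_service.yml` first updates the Cloudflare record for `observability.svc.plus` to a non-proxied record pointing at `us-xhttp.svc.plus`, then waits for public DNS to take effect before continuing the deployment, which makes it more likely that Caddy's first automatic certificate issuance succeeds.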
```bash
export CLOUDFLARE_API_TOKEN=...
ansible-playbook -i <your-inventory> deploy_observability_service.yml -l us-xhttp.svc.plus
```
To add a layer of basic authentication to `/ingest/*`, enable it as part of the server-side deployment:
```bash
export CLOUDFLARE_API_TOKEN=...
ansible-playbook -i <your-inventory> deploy_observability_service.yml -l us-xhttp.svc.plus \
-e observability_ingest_basic_auth_enabled=true \
-e observability_ingest_basic_auth_user=ingest \
-e observability_ingest_basic_auth_password='<strong-password>'
```
#### Client side (agent)
Then run `node.yml` in push mode on the collection hosts:
```bash
ansible-playbook -i <your-inventory> node.yml \
-l openclaw.svc.plus,jp-xhttp.svc.plus \
-e node_monitor_mode=push \
-e observability_endpoint=https://observability.svc.plus/
```
If ingest basic authentication is enabled on the server, the collection hosts must pass the same credentials:
```bash
ansible-playbook -i <your-inventory> node.yml \
-l openclaw.svc.plus,jp-xhttp.svc.plus \
-e node_monitor_mode=push \
-e observability_endpoint=https://observability.svc.plus/ \
-e observability_ingest_basic_auth_enabled=true \
-e observability_ingest_basic_auth_user=ingest \
-e observability_ingest_basic_auth_password='<strong-password>'
```
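As a rough illustration of what sharing one set of credentials means on the wire, the sketch below builds the standard HTTP Basic `Authorization` header that a push client would attach to `/ingest/*` requests. The function name `basic_auth_header` and the sample password are hypothetical; in this repository the collectors are configured by the playbook, not by code like this.

```python
import base64


def basic_auth_header(user: str, password: str) -> dict:
    """Return the HTTP Basic auth header for the given credentials (RFC 7617)."""
    raw = f"{user}:{password}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    return {"Authorization": f"Basic {token}"}


if __name__ == "__main__":
    # 'ingest' matches the README example user; the password is a placeholder.
    print(basic_auth_header("ingest", "<strong-password>"))
```

Server and clients produce the same header only when both the user and the password match, which is why the README passes identical `observability_ingest_basic_auth_*` values on both sides.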
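> `node_monitor_mode=push` deploys `node_exporter + process_exporter + vector` on the remote host and actively pushes metrics / logs to `observability.svc.plus`. `vector` is always assigned to the collection-side tasks; the server-side `infra.yml` no longer deploys it by default.
>
> If the collection host is the same machine as the Victoria server, the playbook automatically routes metrics / logs through the local `127.0.0.1` ingest; across hosts it defaults to `https://observability.svc.plus/` and automatically appends `/ingest/metrics/api/v1/write` and `/ingest/logs/insert`.
>
> `observability_ingest_basic_auth_*` only protects the `/ingest/*` write entry points and does not affect other site pages exposed by Caddy; the server and the collection hosts must use the same credentials.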
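The endpoint completion described in the note above can be sketched as a small helper. This is a minimal sketch, not the playbook's actual logic: the name `resolve_ingest_urls`, the `local_base` default, and the assumption that the local route reuses the same `/ingest/...` paths are all hypothetical; only the two path suffixes and the `127.0.0.1` / `https://observability.svc.plus/` targets come from the README.

```python
# Path suffixes taken from the README note.
METRICS_SUFFIX = "/ingest/metrics/api/v1/write"
LOGS_SUFFIX = "/ingest/logs/insert"


def resolve_ingest_urls(endpoint: str, same_host: bool = False,
                        local_base: str = "http://127.0.0.1") -> dict:
    """Derive metrics/logs write URLs from a base endpoint.

    same_host=True models the "collector and server on one machine" case;
    local_base is an assumed placeholder for the local ingest address.
    """
    base = (local_base if same_host else endpoint).rstrip("/")
    return {"metrics": base + METRICS_SUFFIX, "logs": base + LOGS_SUFFIX}


if __name__ == "__main__":
    print(resolve_ingest_urls("https://observability.svc.plus/"))
    print(resolve_ingest_urls("https://observability.svc.plus/", same_host=True))
```

Stripping the trailing slash before appending keeps the result stable whether or not `observability_endpoint` is written with a final `/`.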
### Script Installers
### Server side
```bash
@ -118,66 +54,10 @@ curl -fsSL https://raw.githubusercontent.com/cloud-neutral-toolkit/observability
> - logs endpoint: `/ingest/logs/insert`
> - The script automatically verifies installation after setup.
### Optional: DeepFlow Agent on Client
If you have deployed DeepFlow with `deepflow.yml`, you can install `deepflow-agent` on client nodes via the same script:
### Remote client example (clawdbot.svc.plus)
```bash
# example: endpoint exposed by caddy grpc ingress (deepflow_grpc_domain:443)
curl -fsSL https://raw.githubusercontent.com/cloud-neutral-toolkit/observability.svc.plus/main/scripts/agent-install.sh \
| bash -s -- \
--endpoint https://observability.svc.plus/ingest/otlp \
--deepflow-agent \
--deepflow-grpc-endpoint deepflow-agent.svc.plus:443 \
--deepflow-agent-download-url https://example.com/path/to/deepflow-agent
```
> If `deepflow-agent` binary already exists on host, replace `--deepflow-agent-download-url` with `--deepflow-agent-bin /path/to/deepflow-agent`.
## 🚀 DeepFlow Deployment (Server Side)
This repo now provides dedicated DeepFlow roles:
- `deepflow_mysql`
- `deepflow_clickhouse_s3`
- `deepflow_server`
- `deepflow_connector`
- `deepflow_agent`
Quick start:
```bash
./configure -c deepflow/deepflow
vi pigsty.yml # adjust domain/password/ports
./deploy.yml
./docker.yml
./deepflow.yml
./infra.yml -t caddy # apply deepflow_grpc_domain ingress
```
Default inventory template: `conf/deepflow/deepflow.yml`
### Lightweight Topology
- `deepflow-server` stays containerized with Docker Compose
- ClickHouse is kept as short-retention local storage
- MinIO/S3 is optional in lightweight mode
- `deepflow_connector` exports selected DeepFlow L4/L7 metrics to VictoriaMetrics
- `deepflow_agent` supports `binary/systemd`, `docker`, and rendered `k8s` manifests
- default `deepflow_agent_profile=lite` keeps `pcap` enabled and disables built-in `vector`
### Remote client example (openclaw.svc.plus)
```bash
ssh root@openclaw.svc.plus \
'curl -fsSL https://raw.githubusercontent.com/cloud-neutral-toolkit/observability.svc.plus/main/scripts/agent-install.sh \
| bash -s -- --endpoint https://observability.svc.plus/ingest/otlp'
```
### Remote client example (jp-xhttp.svc.plus)
```bash
ssh root@jp-xhttp.svc.plus \
ssh root@clawdbot.svc.plus \
'curl -fsSL https://raw.githubusercontent.com/cloud-neutral-toolkit/observability.svc.plus/main/scripts/agent-install.sh \
| bash -s -- --endpoint https://observability.svc.plus/ingest/otlp'
```
@ -185,18 +65,18 @@ ssh root@jp-xhttp.svc.plus \
### Optional SSH manager env example
```bash
SSH_SERVER_CLAWBOT_HOST=openclaw.svc.plus
SSH_SERVER_CLAWBOT_HOST=clawdbot.svc.plus
SSH_SERVER_CLAWBOT_USER=root
SSH_SERVER_CLAWBOT_KEYPATH=~/.ssh/id_rsa
SSH_SERVER_CLAWBOT_PORT=22
SSH_SERVER_CLAWBOT_DESCRIPTION=openclaw_server
SSH_SERVER_CLAWBOT_DESCRIPTION=clawdbot_server
```
## 4) Features
- **Observability First**: SOTA monitoring for PG / Infra / Node based on VictoriaMetrics, Grafana, and OpenTelemetry.
- **OTel Integration**: Native support for OpenTelemetry, facilitating unified trace, metric, and log ingestion.
- **DeepFlow Ready**: Lightweight DeepFlow server/agent deployment with short-lived flow storage and VictoriaMetrics archiving for high-value protocol metrics.
- **Future Ready**: Planned integration for DeepFlow Agent and other open-source NPM probes for deep network and application observability.
- **Reliable Base**: Robust self-healing HA clusters, PITR, and secure infrastructure.
- **Maintainable**: One-Cmd Deploy, IaC support, and easy customization.
- **Controllable**: Self-sufficient Cloud Neutral FOSS. Run on bare Linux.


@ -3,11 +3,11 @@ forks = 10
nocows = 1
timeout = 15
pipelining = True
inventory = observability.yml
inventory = pigsty.yml
host_key_checking = False
command_warnings = False
deprecation_warnings = False
force_valid_group_names = ignore
use_persistent_connections = True
allow_world_readable_tmpfiles = False
ansible_managed = 'ansible managed: %Y-%m-%d %H:%M:%S'
ansible_managed = 'ansible managed: %Y-%m-%d %H:%M:%S'


@ -1,115 +0,0 @@
---
#==============================================================#
# File : deepflow.yml
# Desc : observability config for running DeepFlow stack
# Ctime : 2026-02-04
# Mtime : 2026-02-04
# License : Apache-2.0 @ https://pigsty.io/docs/about/license/
#==============================================================#
# how to use this template:
#
# curl -fsSL https://repo.pigsty.io/get | bash; cd ~/pigsty
# ./bootstrap # prepare local repo & ansible
# ./configure -c deepflow/deepflow # use this deepflow config template
# vi pigsty.yml # IMPORTANT: CHANGE CREDENTIALS / DOMAIN
# ./deploy.yml # install infra stack
# ./docker.yml # install docker & docker-compose
# ./deepflow.yml # install deepflow with compose + optional connector/agent
all:
  children:
    deepflow:
      hosts: { 10.10.10.10: {} }
      vars:
        deepflow_enabled: true
        deepflow_mysql_enabled: true
        deepflow_clickhouse_s3_enabled: true
        deepflow_connector_enabled: true
        deepflow_agent_enabled: false
        deepflow_deploy_profile: lite
        deepflow_storage_mode: short_ttl
        deepflow_data: /data/deepflow
        # role: deepflow_mysql
        deepflow_mysql_port: 13306
        deepflow_mysql_root_password: DeepFlow.Root.ChangeMe
        deepflow_mysql_user: deepflow
        deepflow_mysql_password: DeepFlow.MySQL.ChangeMe
        deepflow_mysql_database: deepflow
        # role: deepflow_clickhouse_s3
        deepflow_clickhouse_http_port: 18123
        deepflow_clickhouse_tcp_port: 19000
        deepflow_clickhouse_retention_hours: 24
        deepflow_s3_enabled: false
        deepflow_minio_api_port: 19090
        deepflow_minio_console_port: 19091
        deepflow_s3_bucket: deepflow
        deepflow_s3_access_key: deepflow
        deepflow_s3_secret_key: DeepFlow.S3.ChangeMe
        deepflow_s3_region: us-east-1
        # role: deepflow_server
        deepflow_server_grpc_port: 20035
        deepflow_server_http_port: 20417
        deepflow_app_port: 20880
        deepflow_clickhouse_addr: host.docker.internal:19000
        deepflow_s3_endpoint: http://host.docker.internal:19090
        deepflow_mysql_addr: host.docker.internal:13306
        deepflow_l4_log_ttl_hour: 24
        deepflow_l7_log_ttl_hour: 24
        deepflow_flow_metrics_ttl_hour: 24
        deepflow_metrics_ttl_hour: 24
        deepflow_prometheus_ttl_hour: 24
        # role: deepflow_connector
        deepflow_connector_source_endpoint: http://127.0.0.1:20417/metrics
        deepflow_connector_remote_write_url: http://127.0.0.1:8428/api/v1/write
        # role: deepflow_agent
        deepflow_agent_mode: binary
        deepflow_agent_profile: lite
        deepflow_agent_disable_pcap: false
        deepflow_agent_disable_vector: true
        deepflow_agent_grpc_endpoint: "{{ deepflow_grpc_domain }}:443"
    infra: { hosts: { 10.10.10.10: { infra_seq: 1 } } }
    etcd: { hosts: { 10.10.10.10: { etcd_seq: 1 } }, vars: { etcd_cluster: etcd } }
  vars:
    version: v4.0.0
    admin_ip: 10.10.10.10
    region: default
    node_tune: oltp
    pg_conf: oltp.yml
    docker_enabled: true
    # Caddy gRPC ingress for deepflow-agent:
    caddy_enabled: true
    deepflow_grpc_enabled: true
    deepflow_grpc_domain: deepflow-agent.pigsty
    deepflow_grpc_upstream: 127.0.0.1:20035
    infra_portal:
      home : { domain: svc.plus }
      deepflow : { domain: deepflow.pigsty ,endpoint: "10.10.10.10:20880" }
    proxy_env:
      no_proxy: "localhost,127.0.0.1,10.0.0.0/8,192.168.0.0/16,*.pigsty,*.aliyun.com,mirrors.*,*.tsinghua.edu.cn"
    repo_enabled: false
    node_repo_modules: node,infra,pgsql
    grafana_admin_password: pigsty
    grafana_view_password: DBUser.Viewer
    pg_admin_password: DBUser.DBA
    pg_monitor_password: DBUser.Monitor
    pg_replication_password: DBUser.Replicator
    patroni_password: Patroni.API
    haproxy_admin_password: pigsty
    minio_secret_key: S3User.MinIO
    etcd_root_password: Etcd.Root
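The template header asks you to change credentials before deploying, and several DeepFlow secrets carry a `ChangeMe` marker. A minimal sketch of a pre-flight check that lists keys still holding such a placeholder is shown below. The helper name `find_placeholder_secrets` is hypothetical, and it scans plain text with a regular expression instead of parsing YAML; it only catches the `ChangeMe` marker, not other default passwords such as `pigsty`.

```python
import re

# Matches "key: value" lines whose value still contains the ChangeMe marker.
PLACEHOLDER = re.compile(r"^[ \t]*(\w+)[ \t]*:[ \t]*(\S*ChangeMe\S*)[ \t]*$",
                         re.MULTILINE)


def find_placeholder_secrets(inventory_text: str) -> list:
    """Return the names of keys whose values still contain 'ChangeMe'."""
    return [match.group(1) for match in PLACEHOLDER.finditer(inventory_text)]


if __name__ == "__main__":
    sample = (
        "deepflow_mysql_user: deepflow\n"
        "deepflow_mysql_password: DeepFlow.MySQL.ChangeMe\n"
        "deepflow_s3_secret_key: DeepFlow.S3.ChangeMe\n"
    )
    for key in find_placeholder_secrets(sample):
        print(f"placeholder secret still set: {key}")
```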


@ -1,28 +0,0 @@
#!/usr/bin/env ansible-playbook
---
#==============================================================#
# File : deepflow.yml
# Desc : deploy deepflow stack with three dedicated roles
# Ctime : 2026-02-04
# Mtime : 2026-02-04
# Path : deepflow.yml
# License : Apache-2.0 @ https://pigsty.io/docs/about/license/
#==============================================================#
- name: DEEPFLOW STACK
  become: true
  hosts: all
  gather_facts: no
  roles:
    - { role: node_id               , tags: node-id,                when: deepflow_enabled | default(true) | bool }
    - { role: deepflow_mysql        , tags: deepflow_mysql,         when: deepflow_mysql_enabled | default(true) | bool }
    - { role: deepflow_clickhouse_s3, tags: deepflow_clickhouse_s3, when: deepflow_clickhouse_s3_enabled | default(true) | bool }
    - { role: deepflow_server       , tags: deepflow_server,        when: deepflow_enabled | default(true) | bool }
    - { role: deepflow_connector    , tags: deepflow_connector,     when: deepflow_connector_enabled | default(false) | bool }
    - { role: deepflow_agent        , tags: deepflow_agent,         when: deepflow_agent_enabled | default(false) | bool }
# Usage:
# 1. Define deepflow group in pigsty.yml
# 2. Ensure docker is installed: ./docker.yml
# 3. Run ./deepflow.yml -l <deepflow_group>
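The role gating in this playbook mixes two defaults: the core roles run unless disabled, while the connector and agent run only when explicitly enabled. The sketch below models that behavior under stated assumptions: `enabled_roles` and `as_bool` are hypothetical names, and `as_bool` only approximates Ansible's `bool` filter for common inputs.

```python
# (role, controlling variable, default when the variable is unset),
# copied from the `when:` conditions in the playbook.
ROLE_FLAGS = [
    ("node_id", "deepflow_enabled", True),
    ("deepflow_mysql", "deepflow_mysql_enabled", True),
    ("deepflow_clickhouse_s3", "deepflow_clickhouse_s3_enabled", True),
    ("deepflow_server", "deepflow_enabled", True),
    ("deepflow_connector", "deepflow_connector_enabled", False),
    ("deepflow_agent", "deepflow_agent_enabled", False),
]


def as_bool(value) -> bool:
    """Approximate Ansible's `bool` filter for booleans, strings, and 0/1."""
    if isinstance(value, bool):
        return value
    if isinstance(value, str):
        return value.strip().lower() in ("yes", "on", "1", "true")
    return value == 1


def enabled_roles(host_vars: dict) -> list:
    """Return the roles that would run for the given variables."""
    return [role for role, flag, default in ROLE_FLAGS
            if as_bool(host_vars.get(flag, default))]


if __name__ == "__main__":
    # Values from the conf/deepflow/deepflow.yml template shown in this diff.
    print(enabled_roles({"deepflow_connector_enabled": True,
                         "deepflow_agent_enabled": False}))
```

With no variables set, only the four default-on roles are selected, so the connector and agent stay opt-in.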


@ -1,147 +0,0 @@
---
- name: Update Cloudflare DNS for observability.svc.plus
  hosts: localhost
  connection: local
  gather_facts: false
  vars:
    cloudflare_zone_name: svc.plus
    cloudflare_api_base: https://api.cloudflare.com/client/v4
    observability_domain: observability.svc.plus
    observability_dns_target: us-xhttp.svc.plus
    observability_dns_type: CNAME
    observability_dns_ttl: 1
    observability_dns_proxied: false
    dns_wait_retries: 30
    dns_wait_delay: 10
  tasks:
    - name: Validate Cloudflare token is present in environment
      ansible.builtin.assert:
        that:
          - lookup('ansible.builtin.env', 'CLOUDFLARE_API_TOKEN') | length > 0
        fail_msg: "CLOUDFLARE_API_TOKEN must be exported before running this playbook."
    - name: Resolve Cloudflare zone id
      ansible.builtin.uri:
        url: "{{ cloudflare_api_base }}/zones?name={{ cloudflare_zone_name }}"
        method: GET
        headers:
          Authorization: "Bearer {{ lookup('ansible.builtin.env', 'CLOUDFLARE_API_TOKEN') }}"
          Content-Type: application/json
        return_content: true
      register: cloudflare_zone_lookup
    - name: Validate zone lookup result
      ansible.builtin.assert:
        that:
          - cloudflare_zone_lookup.json.success
          - cloudflare_zone_lookup.json.result | length > 0
        fail_msg: "Unable to resolve Cloudflare zone id for {{ cloudflare_zone_name }}."
    - name: Set Cloudflare zone id
      ansible.builtin.set_fact:
        cloudflare_zone_id: "{{ cloudflare_zone_lookup.json.result[0].id }}"
    - name: Query existing observability DNS records
      ansible.builtin.uri:
        url: "{{ cloudflare_api_base }}/zones/{{ cloudflare_zone_id }}/dns_records?name={{ observability_domain }}"
        method: GET
        headers:
          Authorization: "Bearer {{ lookup('ansible.builtin.env', 'CLOUDFLARE_API_TOKEN') }}"
          Content-Type: application/json
        return_content: true
      register: observability_dns_records
    - name: Remove conflicting observability DNS records with different type
      ansible.builtin.uri:
        url: "{{ cloudflare_api_base }}/zones/{{ cloudflare_zone_id }}/dns_records/{{ item.id }}"
        method: DELETE
        headers:
          Authorization: "Bearer {{ lookup('ansible.builtin.env', 'CLOUDFLARE_API_TOKEN') }}"
          Content-Type: application/json
      loop: "{{ observability_dns_records.json.result | default([]) }}"
      loop_control:
        label: "{{ item.type }} {{ item.name }}"
      when: item.type != observability_dns_type
    - name: Create observability DNS record when missing
      ansible.builtin.uri:
        url: "{{ cloudflare_api_base }}/zones/{{ cloudflare_zone_id }}/dns_records"
        method: POST
        headers:
          Authorization: "Bearer {{ lookup('ansible.builtin.env', 'CLOUDFLARE_API_TOKEN') }}"
          Content-Type: application/json
        body_format: raw
        body: >-
          {{
            {
              'type': observability_dns_type,
              'name': observability_domain,
              'content': observability_dns_target,
              'ttl': (observability_dns_ttl | int),
              'proxied': (observability_dns_proxied | bool)
            } | to_json
          }}
      when: (observability_dns_records.json.result | selectattr('type', 'equalto', observability_dns_type) | list | length) == 0
    - name: Update observability DNS record when target changes
      ansible.builtin.uri:
        url: "{{ cloudflare_api_base }}/zones/{{ cloudflare_zone_id }}/dns_records/{{ (observability_dns_records.json.result | selectattr('type', 'equalto', observability_dns_type) | list | first).id }}"
        method: PUT
        headers:
          Authorization: "Bearer {{ lookup('ansible.builtin.env', 'CLOUDFLARE_API_TOKEN') }}"
          Content-Type: application/json
        body_format: raw
        body: >-
          {{
            {
              'type': observability_dns_type,
              'name': observability_domain,
              'content': observability_dns_target,
              'ttl': (observability_dns_ttl | int),
              'proxied': (observability_dns_proxied | bool)
            } | to_json
          }}
      when:
        - (observability_dns_records.json.result | selectattr('type', 'equalto', observability_dns_type) | list | length) > 0
        - >
          ((observability_dns_records.json.result | selectattr('type', 'equalto', observability_dns_type) | list | first).content != observability_dns_target)
          or
          (((observability_dns_records.json.result | selectattr('type', 'equalto', observability_dns_type) | list | first).proxied | default(false)) != observability_dns_proxied)
    - name: Wait for public DNS to expose observability CNAME
      ansible.builtin.uri:
        url: "https://cloudflare-dns.com/dns-query?name={{ observability_domain }}&type=CNAME"
        method: GET
        headers:
          Accept: application/dns-json
        return_content: true
      register: observability_dns_public
      until:
        - observability_dns_public.status == 200
        - >
          (
            observability_dns_public.json.Status
            if (observability_dns_public.json is defined)
            else ((observability_dns_public.content | from_json).Status | default(1))
          ) == 0
        - >
          (
            observability_dns_public.json.Answer
            if (observability_dns_public.json is defined)
            else ((observability_dns_public.content | from_json).Answer | default([]))
          ) | selectattr('data', 'equalto', observability_dns_target ~ '.')
          | list | length > 0
      retries: "{{ dns_wait_retries }}"
      delay: "{{ dns_wait_delay }}"
    - name: Show effective observability DNS target
      ansible.builtin.debug:
        msg: "{{ observability_domain }} -> {{ observability_dns_target }} proxied={{ observability_dns_proxied }}"
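The `until` conditions of the DNS wait task above amount to a small predicate over the DNS-over-HTTPS JSON response: the resolver status must be `0`, and at least one answer must carry the target name with a trailing dot. A minimal sketch follows; the function name `cname_published` and the sample payloads are illustrative assumptions, and the HTTP 200 check from the playbook is left to the caller.

```python
def cname_published(doh_response: dict, target: str) -> bool:
    """Return True when the DoH JSON answer contains `target` as a CNAME value.

    Mirrors the playbook: a missing Status counts as failure (default 1),
    a missing Answer counts as empty, and the target is compared in its
    fully qualified form (with a trailing dot).
    """
    if doh_response.get("Status", 1) != 0:
        return False
    expected = target + "."
    return any(answer.get("data") == expected
               for answer in doh_response.get("Answer", []))


if __name__ == "__main__":
    # Hand-written sample shaped like a dns-json response, not captured output.
    sample = {
        "Status": 0,
        "Answer": [{"name": "observability.svc.plus", "type": 5,
                    "data": "us-xhttp.svc.plus."}],
    }
    print(cname_published(sample, "us-xhttp.svc.plus"))
```

The trailing-dot comparison matters: a check against the bare hostname would never match and the task would exhaust its retries.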
- import_playbook: infra.yml
  vars:
    infra_domain: observability.svc.plus
    infra_portal:
      home: { domain: observability.svc.plus }
    caddy_enabled: true
    nginx_enabled: false
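Before importing `infra.yml`, the first play in this file reconciles the DNS record in three steps: delete records of a different type, create the record when none of the wanted type exists, and update the first matching record when its content or proxy flag has drifted. The sketch below restates that decision logic as a pure function. The name `plan_dns_changes` and the action tuples are hypothetical; the record fields (`id`, `type`, `content`, `proxied`) are the ones the playbook reads from the Cloudflare response.

```python
def plan_dns_changes(records: list, rtype: str, target: str, proxied: bool) -> list:
    """Return the ordered list of (action, record_id) steps the play would take."""
    # Step 1: remove every record whose type conflicts with the wanted type.
    actions = [("delete", rec["id"]) for rec in records if rec["type"] != rtype]

    same_type = [rec for rec in records if rec["type"] == rtype]
    if not same_type:
        # Step 2: nothing of the wanted type exists yet, so create it.
        actions.append(("create", None))
    else:
        # Step 3: only the first matching record is compared and updated.
        first = same_type[0]
        if first["content"] != target or first.get("proxied", False) != proxied:
            actions.append(("update", first["id"]))
    return actions


if __name__ == "__main__":
    existing = [{"id": "a1", "type": "A", "content": "203.0.113.10"}]
    print(plan_dns_changes(existing, "CNAME", "us-xhttp.svc.plus", False))
```

When the existing record already matches, the plan is empty, which is what makes repeated runs of the playbook idempotent for the DNS step.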


@ -1,14 +0,0 @@
# Documentation Coverage Matrix
This matrix tracks the bilingual canonical documentation set for `observability.svc.plus` and maps it back to the current codebase and older docs.
该矩阵用于跟踪 `observability.svc.plus` 的双语规范文档,并将其与当前代码状态和历史文档对应起来。
| Category | EN | ZH | Current status | Existing references | Next check |
| --- | --- | --- | --- | --- | --- |
| Architecture | Yes | Yes | Seeded from current codebase; deeper legacy consolidation is still needed. | None yet; use the new canonical page as the starting point. | Keep diagrams and ownership notes synchronized with actual directories, services, and integration dependencies. |
| Design | Yes | Yes | Seeded from current codebase; deeper legacy consolidation is still needed. | None yet; use the new canonical page as the starting point. | Promote one-off implementation notes into reusable design records when behavior, APIs, or deployment contracts change. |
| Deployment | Yes | Yes | Seeded from current codebase; deeper legacy consolidation is still needed. | None yet; use the new canonical page as the starting point. | Verify deployment steps against current scripts, manifests, CI/CD flow, and environment contracts before each release. |
| User Guide | Yes | Yes | Seeded from current codebase; deeper legacy consolidation is still needed. | None yet; use the new canonical page as the starting point. | Prefer workflow-oriented examples and keep screenshots or terminal snippets aligned with the latest UI or CLI behavior. |
| Developer Guide | Yes | Yes | Seeded from current codebase; deeper legacy consolidation is still needed. | None yet; use the new canonical page as the starting point. | Keep setup and test commands tied to actual package scripts, Make targets, or language toolchains in this repository. |
| Vibe Coding Reference | Yes | Yes | Seeded from current codebase; deeper legacy consolidation is still needed. | None yet; use the new canonical page as the starting point. | Review prompt templates and repo rules whenever the project adds new subsystems, protected areas, or mandatory verification steps. |


@ -1,31 +0,0 @@
# Observability Service Plus / 可观测性服务
This `docs/` directory now has a bilingual canonical layer for the current repository state.
`docs/` 目录现已补齐双语规范层,用于承接当前仓库状态下的核心文档。
## Quick Entry / 快速入口
- Coverage checklist / 覆盖检查矩阵: `docs/DOC_COVERAGE.md`
- English index / 英文入口: `docs/en/README.md`
- 中文入口 / Chinese index: `docs/zh/README.md`
## Canonical Bilingual Pages / 双语规范页
- `docs/en/architecture.md` / `docs/zh/architecture.md`
- `docs/en/design.md` / `docs/zh/design.md`
- `docs/en/deployment.md` / `docs/zh/deployment.md`
- `docs/en/user-guide.md` / `docs/zh/user-guide.md`
- `docs/en/developer-guide.md` / `docs/zh/developer-guide.md`
- `docs/en/vibe-coding-reference.md` / `docs/zh/vibe-coding-reference.md`
## Current Repo Context / 当前仓库背景
- Root README: `Observability.svc.plus`
- Previous docs index: `Documentation`
- Manifest evidence / 构建清单: repository structure and scripts only
- Active code and ops directories / 当前主要目录: `app/`, `api/`, `scripts/`
## Existing Docs To Reconcile / 需要继续归并的现有文档
- No pre-existing markdown docs were detected in this repository.


@ -1,23 +0,0 @@
# Observability Service Plus Documentation
This repository documents infrastructure orchestration and observability composition rather than a single application binary.
## Current state snapshot
- Root README title: `Observability.svc.plus`
- Build/runtime evidence: repository structure and scripts only
- Primary directories detected: `app/`, `api/`, `scripts/`
- Existing docs count: 0
## Canonical pages
- [Architecture](architecture.md)
- [Design](design.md)
- [Deployment](deployment.md)
- [User Guide](user-guide.md)
- [Developer Guide](developer-guide.md)
- [Vibe Coding Reference](vibe-coding-reference.md)
## Legacy docs to fold in
- No pre-existing markdown docs were detected in this repository.


@ -1,24 +0,0 @@
# Architecture
This repository documents infrastructure orchestration and observability composition rather than a single application binary.
Use this page as the canonical bilingual overview of system boundaries, major components, and repo ownership.
## Current code-aligned notes
- Documentation target: `observability.svc.plus`
- Repo kind: `infra-observability`
- Manifest and build evidence: repository structure and scripts only
- Primary implementation and ops directories: `app/`, `api/`, `scripts/`
- Package scripts snapshot: No package.json scripts were detected.
## Existing docs to reconcile
- No directly matching legacy docs were detected; this page is currently the canonical seed.
## What this page should cover next
- Describe the current implementation rather than an aspirational future-only design.
- Keep terminology aligned with the repository root README, manifests, and actual directories.
- Link deeper runbooks, specs, or subsystem notes from the legacy docs listed above.
- Keep diagrams and ownership notes synchronized with actual directories, services, and integration dependencies.


@ -1,24 +0,0 @@
# Deployment
This repository documents infrastructure orchestration and observability composition rather than a single application binary.
Use this page to standardize deployment prerequisites, supported topologies, operational checks, and rollback notes.
## Current code-aligned notes
- Documentation target: `observability.svc.plus`
- Repo kind: `infra-observability`
- Manifest and build evidence: repository structure and scripts only
- Primary implementation and ops directories: `app/`, `api/`, `scripts/`
- Package scripts snapshot: No package.json scripts were detected.
## Existing docs to reconcile
- No directly matching legacy docs were detected; this page is currently the canonical seed.
## What this page should cover next
- Describe the current implementation rather than an aspirational future-only design.
- Keep terminology aligned with the repository root README, manifests, and actual directories.
- Link deeper runbooks, specs, or subsystem notes from the legacy docs listed above.
- Verify deployment steps against current scripts, manifests, CI/CD flow, and environment contracts before each release.


@ -1,24 +0,0 @@
# Design
This repository documents infrastructure orchestration and observability composition rather than a single application binary.
Use this page to consolidate design decisions, ADR-style tradeoffs, and roadmap-sensitive implementation notes.
## Current code-aligned notes
- Documentation target: `observability.svc.plus`
- Repo kind: `infra-observability`
- Manifest and build evidence: repository structure and scripts only
- Primary implementation and ops directories: `app/`, `api/`, `scripts/`
- Package scripts snapshot: No package.json scripts were detected.
## Existing docs to reconcile
- No directly matching legacy docs were detected; this page is currently the canonical seed.
## What this page should cover next
- Describe the current implementation rather than an aspirational future-only design.
- Keep terminology aligned with the repository root README, manifests, and actual directories.
- Link deeper runbooks, specs, or subsystem notes from the legacy docs listed above.
- Promote one-off implementation notes into reusable design records when behavior, APIs, or deployment contracts change.


@ -1,24 +0,0 @@
# Developer Guide
This repository documents infrastructure orchestration and observability composition rather than a single application binary.
Use this page to document local setup, project structure, test surfaces, and contribution conventions tied to the current codebase.
## Current code-aligned notes
- Documentation target: `observability.svc.plus`
- Repo kind: `infra-observability`
- Manifest and build evidence: repository structure and scripts only
- Primary implementation and ops directories: `app/`, `api/`, `scripts/`
- Package scripts snapshot: No package.json scripts were detected.
## Existing docs to reconcile
- No directly matching legacy docs were detected; this page is currently the canonical seed.
## What this page should cover next
- Describe the current implementation rather than an aspirational future-only design.
- Keep terminology aligned with the repository root README, manifests, and actual directories.
- Link deeper runbooks, specs, or subsystem notes from the legacy docs listed above.
- Keep setup and test commands tied to actual package scripts, Make targets, or language toolchains in this repository.


@ -1,24 +0,0 @@
# User Guide
This repository documents infrastructure orchestration and observability composition rather than a single application binary.
Use this page to document primary user/operator tasks, everyday workflows, and navigation to existing how-to material.
## Current code-aligned notes
- Documentation target: `observability.svc.plus`
- Repo kind: `infra-observability`
- Manifest and build evidence: repository structure and scripts only
- Primary implementation and ops directories: `app/`, `api/`, `scripts/`
- Package scripts snapshot: No package.json scripts were detected.
## Existing docs to reconcile
- No directly matching legacy docs were detected; this page is currently the canonical seed.
## What this page should cover next
- Describe the current implementation rather than an aspirational future-only design.
- Keep terminology aligned with the repository root README, manifests, and actual directories.
- Link deeper runbooks, specs, or subsystem notes from the legacy docs listed above.
- Prefer workflow-oriented examples and keep screenshots or terminal snippets aligned with the latest UI or CLI behavior.


@ -1,24 +0,0 @@
# Vibe Coding Reference
This repository documents infrastructure orchestration and observability composition rather than a single application binary.
Use this page to align AI-assisted coding prompts, repo boundaries, safe edit rules, and documentation update expectations.
## Current code-aligned notes
- Documentation target: `observability.svc.plus`
- Repo kind: `infra-observability`
- Manifest and build evidence: repository structure and scripts only
- Primary implementation and ops directories: `app/`, `api/`, `scripts/`
- Package scripts snapshot: No package.json scripts were detected.
## Existing docs to reconcile
- No directly matching legacy docs were detected; this page is currently the canonical seed.
## What this page should cover next
- Describe the current implementation rather than an aspirational future-only design.
- Keep terminology aligned with the repository root README, manifests, and actual directories.
- Link deeper runbooks, specs, or subsystem notes from the legacy docs listed above.
- Review prompt templates and repo rules whenever the project adds new subsystems, protected areas, or mandatory verification steps.


@ -1,23 +0,0 @@
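# Observability Service Documentation
This repository is oriented toward infrastructure orchestration and observability composition rather than a single application binary.
## Current state snapshot
- Root README title: `Observability.svc.plus`
- Build and runtime evidence: repository structure and scripts only
- Automatically detected primary directories: `app/`, `api/`, `scripts/`
- Existing docs count: 0
## Core bilingual documents
- [Architecture](architecture.md)
- [Design](design.md)
- [Deployment](deployment.md)
- [User Guide](user-guide.md)
- [Developer Guide](developer-guide.md)
- [Vibe Coding Reference](vibe-coding-reference.md)
## Legacy docs to fold in
- No pre-existing markdown docs were detected in this repository.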


@ -1,24 +0,0 @@
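# Architecture
This repository is oriented toward infrastructure orchestration and observability composition rather than a single application binary.
This page serves as the bilingual overview of system boundaries, core components, and repository responsibilities.
## Notes aligned with the current code
- Documentation target repository: `observability.svc.plus`
- Repository type: `infra-observability`
- Build and runtime evidence: repository structure and scripts only
- Primary implementation and ops directories: `app/`, `api/`, `scripts/`
- `package.json` scripts snapshot: No package.json scripts were detected.
## Existing docs still to be consolidated
- No directly corresponding legacy docs have been found yet; this page is currently the canonical starting point for this category.
## What this page should add next
- Describe what is already implemented first, then add future plans; avoid writing only the vision without the current state.
- Terminology must stay consistent with the repository root README, build manifests, and actual directories.
- Gradually link and merge the legacy runbooks, specs, and subsystem notes listed above into this page.
- Keep diagrams and responsibility notes in sync as the directory structure, service relationships, and integration dependencies change.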


@ -1,24 +0,0 @@
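# Deployment
This repository is oriented toward infrastructure orchestration and observability composition rather than a single application binary.
This page standardizes deployment prerequisites, supported topologies, operational check items, and rollback notes.
## Notes aligned with the current code
- Documentation target repository: `observability.svc.plus`
- Repository type: `infra-observability`
- Build and runtime evidence: repository structure and scripts only
- Primary implementation and ops directories: `app/`, `api/`, `scripts/`
- `package.json` scripts snapshot: No package.json scripts were detected.
## Existing docs still to be consolidated
- No directly corresponding legacy docs have been found yet; this page is currently the canonical starting point for this category.
## What this page should add next
- Describe what is already implemented first, then add future plans; avoid writing only the vision without the current state.
- Terminology must stay consistent with the repository root README, build manifests, and actual directories.
- Gradually link and merge the legacy runbooks, specs, and subsystem notes listed above into this page.
- Before each release, re-verify the deployment steps against the current scripts, manifests, CI/CD flow, and environment contracts.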


@ -1,24 +0,0 @@
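# Design
This repository is oriented toward infrastructure orchestration and observability composition rather than a single application binary.
This page consolidates design decisions, ADR-style tradeoff records, and roadmap-related implementation notes.
## Notes aligned with the current code
- Documentation target repository: `observability.svc.plus`
- Repository type: `infra-observability`
- Build and runtime evidence: repository structure and scripts only
- Primary implementation and ops directories: `app/`, `api/`, `scripts/`
- `package.json` scripts snapshot: No package.json scripts were detected.
## Existing docs still to be consolidated
- No directly corresponding legacy docs have been found yet; this page is currently the canonical starting point for this category.
## What this page should add next
- Describe what is already implemented first, then add future plans; avoid writing only the vision without the current state.
- Terminology must stay consistent with the repository root README, build manifests, and actual directories.
- Gradually link and merge the legacy runbooks, specs, and subsystem notes listed above into this page.
- When behavior, APIs, or deployment contracts change, promote one-off implementation notes into reusable design records.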


@ -1,24 +0,0 @@
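# Developer Guide
This repository is oriented toward infrastructure orchestration and observability composition rather than a single application binary.
This page documents the local development environment, project structure, test surfaces, and contribution conventions that fit the current codebase.
## Notes aligned with the current code
- Documentation target repository: `observability.svc.plus`
- Repository type: `infra-observability`
- Build and runtime evidence: repository structure and scripts only
- Primary implementation and ops directories: `app/`, `api/`, `scripts/`
- `package.json` scripts snapshot: No package.json scripts were detected.
## Existing docs still to be consolidated
- No directly corresponding legacy docs have been found yet; this page is currently the canonical starting point for this category.
## What this page should add next
- Describe what is already implemented first, then add future plans; avoid writing only the vision without the current state.
- Terminology must stay consistent with the repository root README, build manifests, and actual directories.
- Gradually link and merge the legacy runbooks, specs, and subsystem notes listed above into this page.
- Keep environment setup and test commands tied to scripts, Make targets, or language toolchains that actually exist.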


@ -1,24 +0,0 @@
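# User Guide
This repository is oriented toward infrastructure orchestration and observability composition rather than a single application binary.
This page documents the day-to-day tasks and common workflows of the main user or operator roles, along with entry points to existing how-to docs.
## Notes aligned with the current code
- Documentation target repository: `observability.svc.plus`
- Repository type: `infra-observability`
- Build and runtime evidence: repository structure and scripts only
- Primary implementation and ops directories: `app/`, `api/`, `scripts/`
- `package.json` scripts snapshot: No package.json scripts were detected.
## Existing docs still to be consolidated
- No directly corresponding legacy docs have been found yet; this page is currently the canonical starting point for this category.
## What this page should add next
- Describe what is already implemented first, then add future plans; avoid writing only the vision without the current state.
- Terminology must stay consistent with the repository root README, build manifests, and actual directories.
- Gradually link and merge the legacy runbooks, specs, and subsystem notes listed above into this page.
- Prefer workflow-oriented examples, and make sure screenshots or terminal snippets match the latest UI/CLI behavior.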


@ -1,24 +0,0 @@
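# Vibe Coding Reference
This repository is oriented toward infrastructure orchestration and observability composition rather than a single application binary.
This page aligns AI-assisted development prompts, repository boundaries, safe editing rules, and documentation sync requirements.
## Notes aligned with the current code
- Documentation target repository: `observability.svc.plus`
- Repository type: `infra-observability`
- Build and runtime evidence: repository structure and scripts only
- Primary implementation and ops directories: `app/`, `api/`, `scripts/`
- `package.json` scripts snapshot: No package.json scripts were detected.
## Existing docs still to be consolidated
- No directly corresponding legacy docs have been found yet; this page is currently the canonical starting point for this category.
## What this page should add next
- Describe what is already implemented first, then add future plans; avoid writing only the vision without the current state.
- Terminology must stay consistent with the repository root README, build manifests, and actual directories.
- Gradually link and merge the legacy runbooks, specs, and subsystem notes listed above into this page.
- When the project adds new subsystems, protected directories, or mandatory verification steps, update the prompt templates and repository rules accordingly.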


@ -1,31 +1,28 @@
# Grafana Dashboards
This directory contains Grafana dashboard definitions for the observability stack.
This directory contains Grafana dashboard definitions for Pigsty monitoring system.
## Overview
The repository currently provides **61 domain dashboards + 1 homepage dashboard**.
Dashboards are organized by platform-engineering resource domains:
Pigsty provides **57 built-in dashboards** organized by module:
| Folder | Count | Description |
|--------|-------|-------------|
| [01-iaas-compute](01-iaas-compute/) | 5 | IAAS compute: node overview, cluster, instance, alert, compatibility summary |
| [02-iaas-storage](02-iaas-storage/) | 4 | IAAS storage: disk, JuiceFS, MinIO overview and instance |
| [03-iaas-network](03-iaas-network/) | 1 | IAAS network: VIP and node-network entry |
| [11-paas-control-plane](11-paas-control-plane/) | 10 | PaaS control plane: Pigsty, Grafana, Victoria stack, Alertmanager, etcd, CMDB |
| [12-paas-cluster](12-paas-cluster/) | 1 | PaaS cluster: Kubernetes overview |
| [13-paas-db](13-paas-db/) | 29 | PaaS DB: PostgreSQL, PGRDS, PGCAT, Mongo/FerretDB |
| [14-paas-cache](14-paas-cache/) | 3 | PaaS cache: Redis overview, cluster, instance |
| [22-bu-proxy](22-bu-proxy/) | 2 | Business unit proxy: Nginx and HAProxy |
| [24-bu-request](24-bu-request/) | 5 | Business unit request: logs, sessions, vector, request-side tooling |
| - | 1 | [homepage.json](homepage.json) - Platform engineering entry dashboard |
| Directory | Count | Description |
|-----------------|-------|-------------------------------------------------------------------------|
| [pgsql](pgsql/) | 29 | PostgreSQL cluster, instance, database, and query monitoring |
| [infra](infra/) | 11 | Infrastructure components (VictoriaMetrics, Grafana, Nginx, etcd, etc.) |
| [node](node/) | 8 | Host-level metrics (CPU, memory, disk, network, HAProxy, VIP) |
| [redis](redis/) | 3 | Redis cluster and instance monitoring |
| [app](app/) | 2 | Application dashboards (PostgreSQL logs analysis) |
| [minio](minio/) | 2 | MinIO S3-compatible storage monitoring |
| [mongo](mongo/) | 1 | MongoDB/FerretDB monitoring |
| - | 1 | [pigsty.json](pigsty.json) - Main home dashboard |
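
The per-folder counts in the tables above can be checked against the directory contents. The following is a minimal sketch, not part of the repository: `count_dashboards` is a hypothetical helper, and it assumes a dashboard is any `.json` file whose name does not start with a dot, located either at the top level or one folder deep.

```python
from pathlib import Path


def _is_dashboard(path):
    # Assumption: a dashboard is a regular .json file that is not hidden.
    return path.is_file() and path.suffix == ".json" and not path.name.startswith(".")


def count_dashboards(root):
    """Return ({folder_name: dashboard_count}, top_level_dashboard_count)."""
    root = Path(root)
    per_folder = {}
    for sub in sorted(p for p in root.iterdir() if p.is_dir()):
        per_folder[sub.name] = sum(1 for f in sub.iterdir() if _is_dashboard(f))
    top_level = sum(1 for f in root.iterdir() if _is_dashboard(f))
    return per_folder, top_level
```

Calling `count_dashboards("files/grafana")` from the repository root would return the numbers to compare with the `Count` column; the path is taken from the scripts in this diff and may differ in other checkouts.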
## Dashboard Catalog
### Home
- **[homepage.json](homepage.json)** - Platform engineering entry dashboard with domain summaries and navigation
- **[pigsty.json](pigsty.json)** - Pigsty home dashboard with global overview
### PGSQL Dashboards


@ -10,51 +10,11 @@
#==============================================================#
import os, sys, json, requests
def env_flag(name, default):
    value = os.environ.get(name)
    if value is None:
        return default
    return value.lower() in ('1', 'true', 'yes', 'on')
# grafana access info
ENDPOINT = os.environ.get("GRAFANA_ENDPOINT", 'http://i.pigsty/ui')
USERNAME = os.environ.get("GRAFANA_USERNAME", 'admin')
PASSWORD = os.environ.get("GRAFANA_PASSWORD", 'pigsty')
CREATE_FOLDERS = env_flag('GRAFANA_CREATE_FOLDERS', True)
SKIP_SUBFOLDERS = env_flag('GRAFANA_SKIP_SUBFOLDERS', False)
FOLDER_TITLES = {
    '01-iaas-compute': 'IAAS / 计算',
    '02-iaas-storage': 'IAAS / 存储',
    '03-iaas-network': 'IAAS / 网络',
    '11-paas-control-plane': 'PaaS / 平台控制面',
    '12-paas-cluster': 'PaaS / 集群',
    '13-paas-db': 'PaaS / DB',
    '14-paas-cache': 'PaaS / 缓存',
    '15-paas-queue': 'PaaS / 队列',
    '21-bu-dns': '业务单元 / DNS',
    '22-bu-proxy': '业务单元 / 代理',
    '23-bu-gateway': '业务单元 / 网关',
    '24-bu-request': '业务单元 / 请求',
    '25-bu-throughput': '业务单元 / 吞吐',
}
FOLDER_TAGS = {
    '01-iaas-compute': ['IAAS', 'IAAS-COMPUTE'],
    '02-iaas-storage': ['IAAS', 'IAAS-STORAGE'],
    '03-iaas-network': ['IAAS', 'IAAS-NETWORK'],
    '11-paas-control-plane': ['PAAS', 'PAAS-CONTROL-PLANE'],
    '12-paas-cluster': ['PAAS', 'PAAS-CLUSTER'],
    '13-paas-db': ['PAAS', 'PAAS-DB'],
    '14-paas-cache': ['PAAS', 'PAAS-CACHE'],
    '15-paas-queue': ['PAAS', 'PAAS-QUEUE'],
    '21-bu-dns': ['BU', 'BU-DNS'],
    '22-bu-proxy': ['BU', 'BU-PROXY'],
    '23-bu-gateway': ['BU', 'BU-GATEWAY'],
    '24-bu-request': ['BU', 'BU-REQUEST'],
    '25-bu-throughput': ['BU', 'BU-THROUGHPUT'],
}
CREATE_FOLDERS = True
METADB_PASSWORD = 'DBUser.Viewer'
DEFAULT_DATASOURCES = {
@ -158,7 +118,7 @@ def add_folder(uid, title=""):
    if not CREATE_FOLDERS:
        return
    if title == "":
        title = resolve_folder_title(uid)
        title = uid.upper()
    post('folders', {"uid": uid, "title": title})
    return put('folders/%s' % uid, {"title": title, "overwrite": True})
@ -252,30 +212,6 @@ def load_dashboard(path, substitute=False):
    else:
        return json.load(open(path))

def resolve_folder_title(uid):
    return FOLDER_TITLES.get(uid, uid.upper())

def enrich_dashboard(dashboard, folder=None):
    if not folder:
        return dashboard
    extra_tags = FOLDER_TAGS.get(folder, [])
    if not extra_tags:
        return dashboard
    existing_tags = dashboard.get("tags", [])
    if not isinstance(existing_tags, list):
        existing_tags = []
    merged_tags = []
    seen = set()
    for tag in existing_tags + extra_tags:
        if not tag or tag in seen:
            continue
        seen.add(tag)
        merged_tags.append(tag)
    dashboard["tags"] = merged_tags
    return dashboard

# json serializer: use compact_json if available, fallback to standard json
try:
    from compact_json import Formatter
@ -335,7 +271,7 @@ def init_all(dashboard_dir):
        if os.path.isfile(abs_path) and f.endswith('.json') and not f.startswith('.'):
            print("init dashboard : %s" % f)
            add_dashboard(load_dashboard(abs_path, True))
        if os.path.isdir(abs_path) and not SKIP_SUBFOLDERS:
        if os.path.isdir(abs_path):
            folders.append((f, abs_path)) # folder name, abs path
    home_uid = "home"
@ -347,13 +283,13 @@ def init_all(dashboard_dir):
    # load other second-layer dashboards
    for folder_name, folder_path in folders:
        print("init folder %s" % folder_name)
        add_folder(folder_name, resolve_folder_title(folder_name))
        add_folder(folder_name, folder_name.upper())
        for f in os.listdir(folder_path):
            abs_path = os.path.join(dashboard_dir, folder_name, f)
            if os.path.isfile(abs_path) and f.endswith('.json') and not f.startswith('.'):
                print("init dashboard: %s / %s" % (folder_name, f))
                add_dashboard(enrich_dashboard(load_dashboard(abs_path, True), folder_name), folder_name)
                add_dashboard(load_dashboard(abs_path, True), folder_name)
def load_all(dashboard_dir):
@ -364,18 +300,18 @@ def load_all(dashboard_dir):
        if os.path.isfile(abs_path) and f.endswith('.json') and not f.startswith('.'):
            print("load dashboard : %s" % f)
            add_dashboard(load_dashboard(abs_path))
        if os.path.isdir(abs_path) and not SKIP_SUBFOLDERS:
        if os.path.isdir(abs_path):
            folders.append((f, abs_path)) # folder name, abs path
    for folder_name, folder_path in folders:
        print("add folder %s" % folder_name)
        add_folder(folder_name, resolve_folder_title(folder_name))
        add_folder(folder_name, folder_name.upper())
        for f in os.listdir(folder_path):
            abs_path = os.path.join(dashboard_dir, folder_name, f)
            if os.path.isfile(abs_path) and f.endswith('.json') and not f.startswith('.'):
                print("load dashboard: %s / %s" % (folder_name, f))
                add_dashboard(enrich_dashboard(load_dashboard(abs_path), folder_name), folder_name)
                add_dashboard(load_dashboard(abs_path), folder_name)
def dump_all(dashboard_dir):
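
The tag handling in `enrich_dashboard` above is an order-preserving union: existing tags keep their position, folder tags are appended, and empty or repeated tags are dropped. A minimal standalone sketch of that step follows; `merge_tags` and the sample dashboard are illustrative names that do not appear in the script.

```python
def merge_tags(existing, extra):
    """Order-preserving union of two tag lists, skipping empty and duplicate tags."""
    merged, seen = [], set()
    for tag in list(existing) + list(extra):
        if not tag or tag in seen:
            continue
        seen.add(tag)
        merged.append(tag)
    return merged


# Hypothetical dashboard that already carries one of the folder tags.
dashboard = {"title": "Redis Overview", "tags": ["REDIS", "PAAS"]}
dashboard["tags"] = merge_tags(dashboard["tags"], ["PAAS", "PAAS-CACHE"])
print(dashboard["tags"])  # ['REDIS', 'PAAS', 'PAAS-CACHE']
```

Keeping a list plus a `seen` set, rather than converting to a plain `set`, leaves the tag order stable between uploads, so repeated runs do not reorder tags in the dashboard JSON.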

File diff suppressed because one or more lines are too long


@ -103,13 +103,4 @@
# - add_logs : register infra as vector logging source
# - add_ds : register infra victoria stack as grafana datasource
#--------------------------------------------------------------#
# Mixed Existing-Host Deployment
#--------------------------------------------------------------#
# Center service example:
# ./infra.yml -l us-xhttp.svc.plus \
# -e infra_domain=observability.svc.plus \
# -e 'infra_portal={\"home\":{\"domain\":\"observability.svc.plus\"}}' \
# -e caddy_enabled=true \
# -e nginx_enabled=false
#--------------------------------------------------------------#
...
...

433
merge_dashboards.py Normal file → Executable file

@ -1,357 +1,122 @@
import copy
import json
CONTROL_PLANE_PATH = "files/grafana/11-paas-control-plane/pigsty.json"
OUTPUT_PATH = "files/grafana/homepage.json"
VISIBLE_VARS = [
{
"name": "version",
"type": "constant",
"query": "v4.0.0",
"hide": 2,
},
{
"name": "origin_prometheus",
"label": "数据源",
"type": "query",
"datasource": {"uid": "ds-prometheus"},
"query": "label_values(kube_node_info,origin_prometheus)",
"refresh": 1,
},
{
"name": "interval",
"label": "采样间隔",
"type": "interval",
"query": "3m,5m,10m,30m,1h,6h,12h,1d",
},
]
DOMAIN_SECTIONS = [
{
"title": "IAAS资源",
"items": [
{
"title": "计算",
"description": "主机容量、节点健康、实例告警",
"folder_uid": "01-iaas-compute",
"folder_title": "IAAS / 计算",
"tag": "IAAS-COMPUTE",
"highlights": ["Node Overview", "Node Instance", "Node Alert"],
"dash_height": 9,
},
{
"title": "存储",
"description": "磁盘、卷、对象存储、JuiceFS",
"folder_uid": "02-iaas-storage",
"folder_title": "IAAS / 存储",
"tag": "IAAS-STORAGE",
"highlights": ["Node Disk", "MinIO Overview", "Node JuiceFS"],
"dash_height": 9,
},
{
"title": "网络",
"description": "VIP、节点网络、底层连通性",
"folder_uid": "03-iaas-network",
"folder_title": "IAAS / 网络",
"tag": "IAAS-NETWORK",
"highlights": ["Node VIP"],
"dash_height": 8,
},
],
},
{
"title": "PaaS服务",
"items": [
{
"title": "平台控制面",
"description": "Grafana、Victoria、Alertmanager、Etcd、CMDB",
"folder_uid": "11-paas-control-plane",
"folder_title": "PaaS / 平台控制面",
"tag": "PAAS-CONTROL-PLANE",
"highlights": ["Infra Overview", "Victoria Metrics", "Alert Manager"],
"dash_height": 10,
},
{
"title": "集群",
"description": "K8S 集群资源、命名空间与工作负载入口",
"folder_uid": "12-paas-cluster",
"folder_title": "PaaS / 集群",
"tag": "PAAS-CLUSTER",
"highlights": ["K8S Dashboard"],
"dash_height": 8,
},
{
"title": "DB",
"description": "PGSQL、PGRDS、PGCAT、Ferret",
"folder_uid": "13-paas-db",
"folder_title": "PaaS / DB",
"tag": "PAAS-DB",
"highlights": ["PGSQL Overview", "PGSQL Cluster", "PGCAT Instance"],
"dash_height": 14,
},
{
"title": "缓存",
"description": "Redis 集群、实例与缓存服务运行面",
"folder_uid": "14-paas-cache",
"folder_title": "PaaS / 缓存",
"tag": "PAAS-CACHE",
"highlights": ["Redis Overview", "Redis Cluster"],
"dash_height": 9,
},
],
},
{
"title": "业务监控",
"items": [
{
"title": "代理",
"description": "Nginx、HAProxy 与流量接入层",
"folder_uid": "22-bu-proxy",
"folder_title": "业务单元 / 代理",
"tag": "BU-PROXY",
"highlights": ["Nginx Instance", "Node HAProxy"],
"dash_height": 8,
},
{
"title": "请求",
"description": "请求日志、会话、链路与请求级观测",
"folder_uid": "24-bu-request",
"folder_title": "业务单元 / 请求",
"tag": "BU-REQUEST",
"highlights": ["PGLOG Overview", "Logs Instance", "Node Vector"],
"dash_height": 9,
},
],
},
]
def shift_panel(panel, delta_y):
panel["gridPos"]["y"] += delta_y
for nested in panel.get("panels", []):
shift_panel(nested, delta_y)
def clone_panel(panel, x, y, w=None, h=None):
cloned = copy.deepcopy(panel)
cloned["gridPos"] = {
"x": x,
"y": y,
"w": w if w is not None else panel["gridPos"]["w"],
"h": h if h is not None else panel["gridPos"]["h"],
}
return cloned
def make_text_panel(panel_id, title, html, x, y, w, h, transparent=True):
return {
"id": panel_id,
"type": "text",
"title": title,
"gridPos": {"h": h, "w": w, "x": x, "y": y},
"transparent": transparent,
"options": {"content": html, "mode": "html"},
}
def make_row_panel(panel_id, title, y):
return {
"id": panel_id,
"type": "row",
"title": title,
"collapsed": False,
"panels": [],
"gridPos": {"h": 1, "w": 24, "x": 0, "y": y},
}
def make_dashlist_panel(panel_id, title, tags, x, y, w, h, max_items=12):
return {
"id": panel_id,
"type": "dashlist",
"title": title,
"pluginVersion": "12.3.0",
"gridPos": {"h": h, "w": w, "x": x, "y": y},
"options": {
"includeVars": True,
"keepTime": True,
"maxItems": max_items,
"query": "",
"showFolderNames": False,
"showHeadings": False,
"showRecentlyViewed": False,
"showSearch": False,
"showStarred": False,
"tags": tags,
},
}
def summary_card_html(item):
highlights = "".join(
f"<li style='margin:0 0 4px 18px;'>{highlight}</li>"
for highlight in item["highlights"]
)
return f"""
<div style="border:1px solid #d1d5db;border-radius:16px;padding:14px 16px;background:#fbfdff;height:100%;">
<div style="font-size:12px;color:#6b7280;margin-bottom:6px;">{item['folder_title']}</div>
<div style="font-size:20px;font-weight:800;color:#111827;margin-bottom:8px;">{item['title']}</div>
<div style="font-size:13px;line-height:1.5;color:#4b5563;">{item['description']}</div>
<ul style="margin:10px 0 12px 0;padding:0;color:#111827;font-size:13px;line-height:1.45;">{highlights}</ul>
<div style="display:inline-block;padding:8px 12px;border-radius:999px;background:#e5e7eb;color:#374151;font-size:12px;font-weight:700;">
右侧保留可跳转目录
</div>
</div>
"""
def homepage_nav_html():
return """
<div style="padding:6px 2px 0 2px;">
<div style="display:flex;justify-content:space-between;align-items:flex-end;gap:14px;flex-wrap:wrap;margin-bottom:10px;">
<div>
<div style="font-size:11px;color:#6b7280;margin-bottom:4px;">Platform Engineering Home</div>
<div style="font-size:24px;font-weight:800;color:#111827;line-height:1.15;">平台工程总览入口</div>
<div style="font-size:12px;color:#4b5563;margin-top:4px;line-height:1.45;"> IaaSPaaSSaaS 逐层下钻首页只保留入口与全局脉搏</div>
</div>
<div style="font-size:11px;color:#94a3b8;font-weight:700;letter-spacing:0.04em;">IaaS PaaS SaaS</div>
</div>
<div style="display:grid;grid-template-columns:repeat(3,minmax(0,1fr));gap:10px;">
<div style="border:1px solid #c7d2fe;border-radius:999px;padding:12px 18px;background:#eef4ff;min-height:0;display:flex;align-items:center;justify-content:center;">
<div style="text-align:center;">
<div style="font-size:26px;color:#1d4ed8;font-weight:800;line-height:1.1;">IaaS资源</div>
<div style="font-size:12px;color:#5b6b91;margin-top:4px;">计算 / 存储 / 网络</div>
</div>
</div>
<div style="border:1px solid #bbf7d0;border-radius:999px;padding:12px 18px;background:#effdf4;min-height:0;display:flex;align-items:center;justify-content:center;">
<div style="text-align:center;">
<div style="font-size:26px;color:#047857;font-weight:800;line-height:1.1;">PaaS服务</div>
<div style="font-size:12px;color:#537566;margin-top:4px;">控制面 / 集群 / DB / 缓存</div>
</div>
</div>
<div style="border:1px solid #fed7aa;border-radius:999px;padding:12px 18px;background:#fff7ed;min-height:0;display:flex;align-items:center;justify-content:center;">
<div style="text-align:center;">
<div style="font-size:26px;color:#c2410c;font-weight:800;line-height:1.1;">业务监控</div>
<div style="font-size:12px;color:#8a6b53;margin-top:4px;">代理 / 请求</div>
</div>
</div>
</div>
</div>
"""
def select_platform_summary_panels(control_plane):
wanted = ["Pigsty ${version}", "Modules", "Instances", "Firing Alerts"]
by_title = {panel.get("title"): panel for panel in control_plane.get("panels", [])}
return [by_title[title] for title in wanted if title in by_title]
def add_domain_section(homepage, start_id, current_y, section):
panel_id = start_id
homepage["panels"].append(make_row_panel(panel_id, section["title"], current_y))
panel_id += 1
current_y += 1
width = 24 // len(section["items"])
summary_height = 5
max_dash_height = max(item["dash_height"] for item in section["items"])
for index, item in enumerate(section["items"]):
x = width * index
homepage["panels"].append(
make_text_panel(
panel_id,
f"{item['title']}摘要",
summary_card_html(item),
x,
current_y,
width,
summary_height,
)
)
panel_id += 1
current_y += summary_height
for index, item in enumerate(section["items"]):
x = width * index
homepage["panels"].append(
make_dashlist_panel(
panel_id,
f"{item['title']}目录",
[item["tag"]],
x,
current_y,
width,
item["dash_height"],
max_items=20,
)
)
panel_id += 1
current_y += max_dash_height
return panel_id, current_y
import re
import os
def merge_dashboards():
with open(CONTROL_PLANE_PATH, "r") as handle:
control_plane = json.load(handle)
# Paths to source dashboards
pig_path = 'files/grafana/pigsty.json'
node_path = 'files/grafana/node.json'
k8s_path = 'files/grafana/k8s.json'
output_path = 'files/grafana/homepage.json'
# Read raw contents
with open(pig_path, 'r') as f:
pig_raw = f.read()
with open(node_path, 'r') as f:
node_raw = f.read()
with open(k8s_path, 'r') as f:
k8s_raw = f.read()
# Perform fixed variable mapping for node.json
# $name -> $hostname, $instance -> $node, $show_name -> $show_hostname
node_raw = re.sub(r'\$name\b', '$hostname', node_raw)
node_raw = re.sub(r'\$\{name\}', '${hostname}', node_raw)
node_raw = re.sub(r'\$instance\b', '$node', node_raw)
node_raw = re.sub(r'\$\{instance\}', '${node}', node_raw)
node_raw = re.sub(r'\$show_name\b', '$show_hostname', node_raw)
node_raw = re.sub(r'\$\{show_name\}', '${show_hostname}', node_raw)
pig = json.loads(pig_raw)
node = json.loads(node_raw)
k8s = json.loads(k8s_raw)
# Base dashboard
homepage = {
"annotations": control_plane.get("annotations", {"list": []}),
"description": "Platform engineering entry dashboard",
"annotations": pig.get("annotations", {"list": []}),
"description": "Pigsty Consolidated Homepage",
"editable": True,
"graphTooltip": 0,
"id": None,
"links": control_plane.get("links", []),
"links": pig.get("links", []),
"panels": [],
"schemaVersion": 39,
"tags": ["HOME", "Platform"],
"templating": {"list": VISIBLE_VARS},
"time": control_plane.get("time", {"from": "now-1h", "to": "now"}),
"timepicker": control_plane.get("timepicker", {}),
"tags": ["HOME", "Pigsty"],
"templating": {"list": []},
"time": pig.get("time", {"from": "now-1h", "to": "now"}),
"timepicker": pig.get("timepicker", {}),
"timezone": "browser",
"title": "Homepage",
"uid": "home",
"version": 1,
"version": 1
}
panel_id = 1
homepage["panels"].append(
make_text_panel(panel_id, "总览导航", homepage_nav_html(), 0, 0, 24, 5)
)
panel_id += 1
current_y = 5
homepage["panels"].append(make_row_panel(panel_id, "平台脉搏", current_y))
panel_id += 1
current_y += 1
summary_layout = [
("Pigsty ${version}", 0, 6, 4, 6),
("Modules", 4, 6, 4, 6),
("Instances", 8, 6, 8, 6),
("Firing Alerts", 16, 6, 8, 6),
# Unified Variables
unified_vars = [
{"name": "version", "type": "constant", "query": "v4.0.0", "hide": 2},
{"name": "origin_prometheus", "label": "数据源", "type": "query", "datasource": {"uid": "ds-prometheus"}, "query": "label_values(kube_node_info,origin_prometheus)", "refresh": 1},
{"name": "Node", "label": "节点", "type": "query", "datasource": {"uid": "ds-prometheus"}, "query": "label_values(kube_node_info{origin_prometheus=~\"$origin_prometheus\"},node)"},
{"name": "NameSpace", "label": "命名空间", "type": "query", "datasource": {"uid": "ds-prometheus"}, "query": "label_values(kube_namespace_created{origin_prometheus=~\"$origin_prometheus\"},namespace)"},
{"name": "Container", "label": "微服务(容器名)", "type": "query", "datasource": {"uid": "ds-prometheus"}, "query": "label_values(kube_pod_container_info{origin_prometheus=~\"$origin_prometheus\",namespace=~\"$NameSpace\"},container)"},
{"name": "Pod", "label": "Pod", "type": "query", "datasource": {"uid": "ds-prometheus"}, "query": "label_values(kube_pod_container_info{origin_prometheus=~\"$origin_prometheus\",namespace=~\"$NameSpace\",container=~\"$Container\"},pod)"},
{"name": "job", "label": "JOB", "type": "query", "datasource": {"uid": "ds-prometheus"}, "query": "label_values(node_uname_info{origin_prometheus=~\"$origin_prometheus\"},job)"},
{"name": "hostname", "label": "名称", "type": "query", "datasource": {"uid": "ds-prometheus"}, "query": "label_values(node_uname_info{origin_prometheus=~\"$origin_prometheus\", job=~\"$job\"},nodename)"},
{"name": "node", "label": "IP", "type": "query", "datasource": {"uid": "ds-prometheus"}, "query": "label_values(node_uname_info{origin_prometheus=~\"$origin_prometheus\", job=~\"$job\", nodename=~\"$hostname\"},instance)"},
{"name": "device", "label": "网卡", "type": "query", "datasource": {"uid": "ds-prometheus"}, "query": "label_values(node_network_info{origin_prometheus=~\"$origin_prometheus\", job=~\"$job\", instance=~\"$node\", device!~\"'tap.*|veth.*|br.*|docker.*|virbr.*|lo.*|cni.*'\"},device)"},
{"name": "interval", "label": "间隔", "type": "interval", "query": "3m,5m,10m,30m,1h,6h,12h,1d"},
{"name": "maxmount", "hide": 2, "type": "query", "datasource": {"uid": "ds-prometheus"}, "query": "query_result(topk(1,sort_desc(max(node_filesystem_size_bytes{origin_prometheus=~\"$origin_prometheus\",instance=~\"$node\",fstype=~\"ext.?|xfs\",mountpoint!~\".*pods.*\"}) by (mountpoint))))"},
{"name": "show_hostname", "hide": 2, "type": "query", "datasource": {"uid": "ds-prometheus"}, "query": "label_values(node_uname_info{origin_prometheus=~\"$origin_prometheus\", job=~\"$job\", nodename=~\"$hostname\", instance=~\"$node\"},nodename)"},
{"name": "total", "hide": 2, "type": "query", "datasource": {"uid": "ds-prometheus"}, "query": "query_result(count(node_uname_info{origin_prometheus=~\"$origin_prometheus\",job=~\"$job\"}))"}
]
summary_panels = {panel.get("title"): panel for panel in select_platform_summary_panels(control_plane)}
for title, x, y, w, h in summary_layout:
if title not in summary_panels:
continue
homepage["panels"].append(clone_panel(summary_panels[title], x, y, w, h))
panel_id += 1
current_y += 6
homepage["templating"]["list"] = unified_vars
for section in DOMAIN_SECTIONS:
panel_id, current_y = add_domain_section(homepage, panel_id, current_y, section)
current_y = 0
# 1. Infra
homepage["panels"].append({"collapsed": False, "gridPos": {"h": 1, "w": 24, "x": 0, "y": current_y}, "title": "Infra Overview", "type": "row", "panels": []})
current_y += 1
infra_max_y = current_y
for p in pig.get("panels", []):
if p.get("type") == "row": continue
# Replace "Apps" panel with "insight Overview" link
if p.get("title") == "Apps":
p["title"] = "insight Overview"
p["type"] = "text"
p["options"] = {
"content": "<div style='text-align: center; padding-top: 10px;'><a href='https://observability.svc.plus/insight/' style='font-size: 18px; color: #58a6ff; font-weight: bold;'>insight Overview</a></div>",
"mode": "html"
}
p["gridPos"]["y"] += current_y
homepage["panels"].append(p)
infra_max_y = max(infra_max_y, p["gridPos"]["y"] + p["gridPos"]["h"])
current_y = infra_max_y
for index, panel in enumerate(homepage["panels"], 1):
panel["id"] = index
# 2. Node
homepage["panels"].append({"collapsed": False, "gridPos": {"h": 1, "w": 24, "x": 0, "y": current_y}, "title": "Node", "type": "row", "panels": []})
current_y += 1
node_max_y = current_y
for p in node.get("panels", []):
p["gridPos"]["y"] += current_y
homepage["panels"].append(p)
node_max_y = max(node_max_y, p["gridPos"]["y"] + p["gridPos"]["h"])
current_y = node_max_y
with open(OUTPUT_PATH, "w") as handle:
json.dump(homepage, handle, indent=2)
# 3. K8S
homepage["panels"].append({"collapsed": False, "gridPos": {"h": 1, "w": 24, "x": 0, "y": current_y}, "title": "K8S Cluster", "type": "row", "panels": []})
current_y += 1
k8s_max_y = current_y
for p in k8s.get("panels", []):
p["gridPos"]["y"] += current_y
homepage["panels"].append(p)
k8s_max_y = max(k8s_max_y, p["gridPos"]["y"] + p["gridPos"]["h"])
current_y = k8s_max_y
for i, p in enumerate(homepage["panels"]):
p["id"] = i + 1
with open(output_path, 'w') as f:
json.dump(homepage, f, indent=2)
if __name__ == "__main__":
merge_dashboards()
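
Both versions of `merge_dashboards.py` lay out the homepage the same way: add a one-unit-high row, shift each source panel's `gridPos.y` by the running offset, then continue below the lowest panel of that section. A minimal sketch of that stacking logic is below. `stack_sections` is a hypothetical helper, and unlike the script it deep-copies panels instead of mutating the loaded dashboards in place.

```python
import copy


def stack_sections(sections):
    """Stack (row_title, panels) sections vertically and return a flat panel list."""
    out, current_y = [], 0
    for title, panels in sections:
        # Each section starts with a full-width row panel of height 1.
        out.append({
            "type": "row", "title": title, "collapsed": False, "panels": [],
            "gridPos": {"h": 1, "w": 24, "x": 0, "y": current_y},
        })
        current_y += 1
        section_bottom = current_y
        for panel in panels:
            shifted = copy.deepcopy(panel)
            shifted["gridPos"]["y"] += current_y
            out.append(shifted)
            bottom = shifted["gridPos"]["y"] + shifted["gridPos"]["h"]
            section_bottom = max(section_bottom, bottom)
        # The next section begins below the lowest panel of this one.
        current_y = section_bottom
    # Renumber panel ids so they are unique across the merged dashboard.
    for index, panel in enumerate(out, 1):
        panel["id"] = index
    return out
```

Tracking the maximum of `y + h` per section, rather than summing heights, keeps side-by-side panels from inflating the offset of the next section.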


@ -32,11 +32,6 @@
# node.yml -l <cls> # add groups
# node.yml -l <ip> # add single node
#
# Observability push-agent mode:
# ./node.yml -l openclaw.svc.plus,jp-xhttp.svc.plus \
# -e node_monitor_mode=push \
# -e observability_endpoint=https://observability.svc.plus/ \
#
# Bootstrap with another admin user: (Create admin with another admin)
# node.yml -t node_admin # create admin user for nodes
# node.yml -t node_admin -k -K -e ansible_user=<another admin>
@ -117,4 +112,4 @@
# - vector_config
# - vector_launch
#---------------------------------------------------------------
...
...


@ -1,27 +0,0 @@
# Role: deepflow_agent
Deploy DeepFlow agent in one of three modes:
- `binary + systemd`
- `docker`
- `k8s` manifest rendering
## Key Variables
- `deepflow_agent_mode` (`binary`, `docker`, `k8s`)
- `deepflow_agent_profile` (`lite`, `full`)
- `deepflow_agent_grpc_endpoint`
- `deepflow_agent_download_url`
- `deepflow_agent_binary_path`
## Default Lightweight Profile
The default `lite` profile keeps `pcap` enabled and disables:
- built-in `vector`
- other optional non-core plugins
## Notes
- `k8s` mode renders a DaemonSet manifest and only applies it when `deepflow_agent_k8s_apply: true`
- `docker` mode requires `docker_enabled: true`


@ -1,41 +0,0 @@
---
#-----------------------------------------------------------------
# DEEPFLOW AGENT
#-----------------------------------------------------------------
deepflow_agent_enabled: false
deepflow_agent_mode: binary # binary|docker|k8s
deepflow_agent_profile: lite # lite|full
deepflow_agent_stack_dir: /opt/deepflow-agent
deepflow_agent_env_file: /etc/default/deepflow-agent
deepflow_agent_compose_file: "{{ deepflow_agent_stack_dir }}/docker-compose.yml"
deepflow_agent_k8s_file: "{{ deepflow_agent_stack_dir }}/deepflow-agent.yaml"
deepflow_agent_run_script: /usr/local/bin/run-deepflow-agent.sh
deepflow_agent_binary_path: /usr/local/bin/deepflow-agent
deepflow_agent_download_url: ''
deepflow_agent_image: deepflowio/deepflow-agent-ce:latest
deepflow_agent_grpc_endpoint: "{{ deepflow_grpc_domain | default('deepflow-agent.svc.plus') }}:443"
deepflow_agent_endpoint_arg: --controller-ips
deepflow_agent_extra_args: []
deepflow_agent_disable_pcap: false
deepflow_agent_disable_vector: true
deepflow_agent_disable_plugins: true
deepflow_agent_extra_env: {}
deepflow_agent_host_network: true
deepflow_agent_container_name: deepflow-agent
deepflow_agent_k8s_namespace: deepflow
deepflow_agent_k8s_apply: false
deepflow_agent_binary_install: true
deepflow_agent_docker_enabled: true
deepflow_agent_cap_add:
- NET_ADMIN
- NET_RAW
- SYS_ADMIN
deepflow_agent_volume_mounts:
- /:/host:ro
- /sys:/sys:ro
- /var/run/docker.sock:/var/run/docker.sock


@ -1,7 +0,0 @@
galaxy_info:
author: observability.svc.plus
description: Deploy DeepFlow agent via binary/systemd, Docker, or Kubernetes manifests
license: Apache-2.0
min_ansible_version: '2.10'
dependencies: []

Some files were not shown because too many files have changed in this diff.