observability.svc.plus/README.md
2026-03-14 20:40:39 +08:00

205 lines
7.9 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

# Observability.svc.plus
[![License: Apache-2.0](https://img.shields.io/badge/License-Apache--2.0-green.svg)](LICENSE)
[![Status: Stable](https://img.shields.io/badge/Status-Stable-blue)](https://svc.plus)
**Observability.svc.plus** is an observability solution strictly following the Apache 2.0 license.
> **Focus**: Monitoring & Observability (监控/可观测). Integrating OpenTelemetry (OTel), with future plans to incorporate DeepFlow Agent and other open-source NPM (Network Performance Monitoring) probes.
[Website](https://svc.plus/) | [Public Demo](https://svc.plus/services) | [Blog](https://svc.plus/blogs) | [Support](https://www.svc.plus/support)
[![banner](files/img/observability-banner.jpg)](https://observability.svc.plus)
## 1) 概述
Observability.svc.plus provides a monitoring-focused stack for infrastructure and applications, centered on metrics, logs, and traces. It is designed for self-hosted, cloud-neutral operations with minimal vendor lock-in.
## 2) 架构图
```mermaid
flowchart LR
A["Client Nodes<br/>node_exporter / process_exporter / vector"] --> B["OTel / Ingest Gateway"]
B --> C["Metrics Store<br/>VictoriaMetrics"]
B --> D["Logs Store<br/>Loki-compatible pipeline"]
B --> E["Traces / OTel pipeline"]
C --> F["Grafana"]
D --> F
E --> F
F --> G["Dashboard & Alerting"]
```
## 3) Start
当前推荐按“混合部署到已有主机”的方式执行。
1. 先更新 DNS`observability.svc.plus` 指到 `us-xhttp.svc.plus`
2.`us-xhttp.svc.plus` 上执行下面的 Server side 示例,部署中心端
3. 再到其他已有主机执行下面的 Client side 示例,把采集数据回传到 `observability.svc.plus`
当前接入主机:
- `us-xhttp.svc.plus`:继续承载现有服务,同时承载 `observability.svc.plus`
- `openclaw.svc.plus`:部署 agent采集后上报到中心端
- `jp-xhttp.svc.plus`:部署 agent采集后上报到中心端
### Ansible (Recommended)
#### Server side
先导出 Cloudflare Token然后在 `us-xhttp.svc.plus` 上执行服务端部署。`deploy_observability_service.yml` 会先把 Cloudflare 上的 `observability.svc.plus` 更新成指向 `us-xhttp.svc.plus` 的非代理记录,再等待公共 DNS 生效后继续部署,这样更容易保证 Caddy 首次自动签名成功。
```bash
export CLOUDFLARE_API_TOKEN=...
ansible-playbook -i <your-inventory> deploy_observability_service.yml -l us-xhttp.svc.plus
```
如果希望给 `/ingest/*` 增加一层基础认证,可以在服务端部署时一起打开:
```bash
export CLOUDFLARE_API_TOKEN=...
ansible-playbook -i <your-inventory> deploy_observability_service.yml -l us-xhttp.svc.plus \
-e observability_ingest_basic_auth_enabled=true \
-e observability_ingest_basic_auth_user=ingest \
-e observability_ingest_basic_auth_password='<strong-password>'
```
#### Client side (agent)
再到采集端主机执行 `node.yml` 的 push mode
```bash
ansible-playbook -i <your-inventory> node.yml \
-l openclaw.svc.plus,jp-xhttp.svc.plus \
-e node_monitor_mode=push \
-e observability_endpoint=https://observability.svc.plus/
```
如果服务端已开启 ingest 基本认证,采集端也要带上同一组凭据:
```bash
ansible-playbook -i <your-inventory> node.yml \
-l openclaw.svc.plus,jp-xhttp.svc.plus \
-e node_monitor_mode=push \
-e observability_endpoint=https://observability.svc.plus/ \
-e observability_ingest_basic_auth_enabled=true \
-e observability_ingest_basic_auth_user=ingest \
-e observability_ingest_basic_auth_password='<strong-password>'
```
> `node_monitor_mode=push` 会在远端主机上部署 `node_exporter + process_exporter + vector`,并把 metrics / logs 主动汇总到 `observability.svc.plus`。`vector` 固定归到采集端任务,服务端 `infra.yml` 不再默认部署它。
>
> 如果采集端与 Victoria 服务端同机playbook 会自动把 metrics / logs 改走本机 `127.0.0.1` ingest跨主机时默认走 `https://observability.svc.plus/` 并自动补全 `/ingest/metrics/api/v1/write` 和 `/ingest/logs/insert`。
>
> `observability_ingest_basic_auth_*` 只保护 `/ingest/*` 写入入口,不影响 Caddy 暴露的其他站点页面;服务端和采集端必须使用同一组认证信息。
### Script Installers
### Server side
```bash
curl -fsSL "https://raw.githubusercontent.com/cloud-neutral-toolkit/observability.svc.plus/main/scripts/server-install.sh?$(date +%s)" | bash -s -- observability.svc.plus
```
### Client side (agent)
```bash
# bash -s -- --endpoint <YOUR_ENDPOINT>
curl -fsSL https://raw.githubusercontent.com/cloud-neutral-toolkit/observability.svc.plus/main/scripts/agent-install.sh \
| bash -s -- --endpoint https://observability.svc.plus/ingest/otlp
```
> **Note**
> - `--endpoint` supports both:
> - `https://observability.svc.plus`
> - `https://observability.svc.plus/ingest/otlp`
> - The installer auto-derives:
> - metrics endpoint: `/ingest/metrics/api/v1/write`
> - logs endpoint: `/ingest/logs/insert`
> - The script automatically verifies installation after setup.
### Optional: DeepFlow Agent on Client
If you have deployed DeepFlow with `deepflow.yml`, you can install `deepflow-agent` on client nodes via the same script:
```bash
# example: endpoint exposed by caddy grpc ingress (deepflow_grpc_domain:443)
curl -fsSL https://raw.githubusercontent.com/cloud-neutral-toolkit/observability.svc.plus/main/scripts/agent-install.sh \
| bash -s -- \
--endpoint https://observability.svc.plus/ingest/otlp \
--deepflow-agent \
--deepflow-grpc-endpoint deepflow-agent.svc.plus:443 \
--deepflow-agent-download-url https://example.com/path/to/deepflow-agent
```
> If `deepflow-agent` binary already exists on host, replace `--deepflow-agent-download-url` with `--deepflow-agent-bin /path/to/deepflow-agent`.
## 🚀 DeepFlow Deployment (Server Side)
This repo now provides dedicated DeepFlow roles:
- `deepflow_mysql`
- `deepflow_clickhouse_s3`
- `deepflow_server`
Quick start:
```bash
./configure -c app/deepflow
vi pigsty.yml # adjust domain/password/ports
./deploy.yml
./docker.yml
./deepflow.yml
./infra.yml -t caddy # apply deepflow_grpc_domain ingress
```
Default inventory template: `conf/app/deepflow.yml`
### Remote client example (openclaw.svc.plus)
```bash
ssh root@openclaw.svc.plus \
'curl -fsSL https://raw.githubusercontent.com/cloud-neutral-toolkit/observability.svc.plus/main/scripts/agent-install.sh \
| bash -s -- --endpoint https://observability.svc.plus/ingest/otlp'
```
### Remote client example (jp-xhttp.svc.plus)
```bash
ssh root@jp-xhttp.svc.plus \
'curl -fsSL https://raw.githubusercontent.com/cloud-neutral-toolkit/observability.svc.plus/main/scripts/agent-install.sh \
| bash -s -- --endpoint https://observability.svc.plus/ingest/otlp'
```
### Optional SSH manager env example
```bash
SSH_SERVER_CLAWBOT_HOST=openclaw.svc.plus
SSH_SERVER_CLAWBOT_USER=root
SSH_SERVER_CLAWBOT_KEYPATH=~/.ssh/id_rsa
SSH_SERVER_CLAWBOT_PORT=22
SSH_SERVER_CLAWBOT_DESCRIPTION=openclaw_server
```
## 4) Features
- **Observability First**: SOTA monitoring for PG / Infra / Node based on VictoriaMetrics, Grafana, and OpenTelemetry.
- **OTel Integration**: Native support for OpenTelemetry, facilitating unified trace, metric, and log ingestion.
- **Future Ready**: Planned integration for DeepFlow Agent and other open-source NPM probes for deep network and application observability.
- **Reliable Base**: Robust self-healing HA clusters, PITR, and secure infrastructure.
- **Maintainable**: One-Cmd Deploy, IaC support, and easy customization.
- **Controllable**: Self-sufficient Cloud Neutral FOSS. Run on bare Linux.
## 5) License & Upstream
- **License**: [Apache-2.0](LICENSE)
- **Upstream references**:
- [Pigsty](https://github.com/pgsty/pigsty)
- [OpenTelemetry](https://opentelemetry.io/)
- [VictoriaMetrics](https://victoriametrics.com/)
- [Grafana](https://grafana.com/)
## 6) 致谢
感谢开源社区与所有贡献者,特别是 observability、database、DevOps 相关项目维护者与实践者。