Introduction to the Prometheus Monitoring System and Deploying Its Server and Clients - 技术学堂

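Prometheus is an open-source monitoring and alerting system with a built-in time series database (TSDB). It was originally developed at SoundCloud and is written in Go. It is more complete and full-featured than Heapster, and its performance is sufficient for clusters of tens of thousands of machines.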

How It Works

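Prometheus works by periodically scraping the state of monitored components over HTTP. Any component that exposes a suitable HTTP endpoint can be monitored, with no SDK or other integration work required. This makes Prometheus well suited to monitoring virtualized environments such as VMs, Docker, and Kubernetes.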
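To illustrate that any HTTP endpoint serving the Prometheus text format can be scraped, here is a minimal sketch of a hand-rolled exporter using only the Python standard library. The metric name demo_requests_total, the STATE dict, and the helper names are hypothetical; real exporters normally use an official client library instead.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical application state that we want Prometheus to scrape.
STATE = {"requests": 0}
LOCK = threading.Lock()


def record_request():
    """Called by the (hypothetical) application whenever it handles a request."""
    with LOCK:
        STATE["requests"] += 1


def render_metrics():
    """Render the current state in the Prometheus text exposition format."""
    with LOCK:
        count = STATE["requests"]
    return (
        "# HELP demo_requests_total Requests handled by the demo app.\n"
        "# TYPE demo_requests_total counter\n"
        f"demo_requests_total {count}\n"
    )


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics().encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        pass  # keep the demo quiet


def start_exporter(port=0):
    """Serve /metrics from a background thread; port 0 picks a free port."""
    server = HTTPServer(("127.0.0.1", port), MetricsHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Calling start_exporter(9200) (an arbitrary example port) and adding 'localhost:9200' to a job's targets would be enough for Prometheus to scrape it.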

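The HTTP endpoint that exposes a monitored component's metrics is called an exporter. Ready-made exporters exist for most common applications, such as Varnish, HAProxy, Nginx, MySQL, and Linux system metrics.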

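Components

Server

The Prometheus server runs as a single process. Start it by running the ./prometheus binary; by default it listens on port 9090.

Each collected data point is called a metric. Newly collected data is kept in memory first and periodically flushed to disk. When the service restarts, Prometheus replays the data on disk back into memory, so it consumes a noticeable amount of memory.

Prometheus focuses on recent data rather than long-term history, so by default it keeps only 15 days of data.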

Clients

Prometheus clients provide data in one of two ways: pull or push.

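Mode	Description
Pull	The server actively pulls data from the client, which requires an exporter running as a daemon on the client. The official website offers many exporters for download; the most widely used is node_exporter, which can collect almost all system metrics and listens on port 9100 by default.
Push	A Pushgateway is deployed, and the operator writes scripts that format monitoring data as key-value pairs and submit them to the Pushgateway; the Prometheus server then scrapes the Pushgateway. This suits custom monitoring when existing exporters do not meet your needs.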
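For the push mode, the Pushgateway accepts metrics in the text format via an HTTP POST (or PUT) to a URL of the form /metrics/job/<job_name>. The sketch below only builds such a request with the Python standard library; the gateway address pushgateway.example:9091 (9091 is the Pushgateway's default port), the job name, and the metric name are placeholders.

```python
import urllib.parse
import urllib.request


def build_push_request(gateway, job, metrics):
    """Build (without sending) a POST that pushes `metrics` to a Pushgateway.

    `metrics` maps metric names to numeric values; each entry becomes one
    line of the text exposition format: "<name> <value>".
    """
    body = "".join(f"{name} {value}\n" for name, value in metrics.items())
    url = f"http://{gateway}/metrics/job/{urllib.parse.quote(job, safe='')}"
    return urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        method="POST",
        headers={"Content-Type": "text/plain; version=0.0.4"},
    )


# Hypothetical batch job reporting when it last succeeded.
req = build_push_request("pushgateway.example:9091", "backup_job",
                         {"backup_last_success_unixtime": 1622505600})
# urllib.request.urlopen(req) would actually send it.
```

A cron script would typically call this at the end of each run, so Prometheus sees the latest pushed value on its next scrape of the gateway.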

Main metric types

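Type	Description
Gauges	The simplest and most commonly used type. A gauge holds a single value that can go up or down with no fixed pattern; whatever value is collected is recorded as-is. Disk capacity and CPU or memory utilization are good fits for gauges.
Counters	A cumulative counter that starts at 0 and, under normal conditions, only increases or stays the same (it resets to 0 when the process restarts). Suitable for machine uptime, HTTP request counts, and similar values.
Histograms	Like Summary, an advanced type used to describe how values are distributed. A histogram counts observations into configurable buckets, for example the number of users whose response time over a day of logs was 0-1 seconds, 1-2 seconds, and so on; quantiles such as the median can then be estimated from the bucket counts.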
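The bucket idea can be made concrete with a small Python sketch. It assumes Prometheus-style cumulative buckets (each bucket counts observations less than or equal to its upper bound le, followed by a final +Inf bucket) and a simplified version of the linear interpolation that PromQL's histogram_quantile() performs. The response times and bucket bounds are made up.

```python
import bisect


def bucketize(values, bounds):
    """Return cumulative counts for buckets le=bounds[0], ..., le=+Inf,
    together with the sum and count of all observations."""
    per_bucket = [0] * (len(bounds) + 1)  # last slot is the +Inf bucket
    for v in values:
        # first bound >= v, i.e. the smallest bucket with v <= le
        per_bucket[bisect.bisect_left(bounds, v)] += 1
    cumulative, running = [], 0
    for c in per_bucket:
        running += c
        cumulative.append(running)
    return cumulative, sum(values), len(values)


def estimate_quantile(q, bounds, cumulative):
    """Estimate the q-quantile by linear interpolation inside the bucket that
    contains the target rank (the first bucket's lower bound is taken as 0)."""
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    total = cumulative[-1]
    if total == 0:
        return float("nan")
    rank = q * total
    i = next(idx for idx, c in enumerate(cumulative) if c >= rank)
    if i == len(bounds):  # rank falls into the +Inf bucket
        return bounds[-1]
    lower = bounds[i - 1] if i > 0 else 0.0
    prev = cumulative[i - 1] if i > 0 else 0
    return lower + (bounds[i] - lower) * (rank - prev) / (cumulative[i] - prev)


bounds = [1, 2, 5]  # bucket upper bounds in seconds
cum, total_sum, count = bucketize([0.3, 0.7, 1.5, 1.8, 3.0], bounds)
print(cum)                                   # [2, 4, 5, 5]
print(estimate_quantile(0.5, bounds, cum))   # 1.25
```

The estimate depends on how well the bucket bounds fit the data: here the true median is 1.5 s, but interpolation inside the 1-2 s bucket yields 1.25 s.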

Deploying the Server

Download

Download the release package from the official download page (https://prometheus.io/download/):

wget https://github.com/prometheus/prometheus/releases/download/v2.27.1/prometheus-2.27.1.linux-amd64.tar.gz
tar zxvf prometheus-2.27.1.linux-amd64.tar.gz
mv prometheus-2.27.1.linux-amd64 /usr/local/prometheus
cd /usr/local/prometheus

Configuration

The default configuration file is prometheus.yml. Here is an example configuration file:

# my global config
global:
  scrape_interval:     15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# Scrape configurations. The first job scrapes Prometheus itself (localhost:9090)
# and a node_exporter; the second job discovers its targets from a file.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'node-exporter'

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
    - targets: ['localhost:9090','10.10.200.202:9100']
    # Example relabeling: copy the host part of __address__ into a new "host" label.
    relabel_configs:
    - source_labels: [__address__]
      separator: ';'
      regex: '(.*):.*'
      target_label: host
      replacement: '$1'
      action: replace

  - job_name: 'WEBSERVER'
    file_sd_configs:
      - files: ['./hosts-webserver.json']

Contents of the file hosts-webserver.json referenced in the example configuration:

[
  {
    "targets": [
      "10.10.200.201:9100",
      "10.10.200.202:9100",
      "10.10.200.203:9100"
    ],
    "labels": {
      "service": "webserver_node"
    }
  },
  {
    "targets": [
      "10.10.200.204:9100",
      "10.10.200.205:9100"
    ],
    "labels": {
      "service": "webapi_node"
    }
  }
]
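Before pointing file_sd_configs at a file, it can help to check its shape: a JSON list of objects, each with a targets list of "host:port" strings and an optional labels map of strings. The following Python sketch (load_file_sd is a hypothetical helper, not part of Prometheus) validates such content and expands it into one (target, labels) pair per target. Python's json module also rejects syntax errors such as a trailing comma after the last list element.

```python
import json


def load_file_sd(text):
    """Parse file_sd JSON text and return a list of (target, labels) pairs."""
    groups = json.loads(text)  # raises json.JSONDecodeError on bad syntax
    if not isinstance(groups, list):
        raise ValueError("top level must be a JSON list of target groups")
    pairs = []
    for group in groups:
        if not isinstance(group, dict):
            raise ValueError("each target group must be a JSON object")
        targets = group.get("targets")
        labels = group.get("labels", {})
        if not isinstance(targets, list) or not all(isinstance(t, str) for t in targets):
            raise ValueError("'targets' must be a list of \"host:port\" strings")
        if not isinstance(labels, dict) or not all(
                isinstance(k, str) and isinstance(v, str) for k, v in labels.items()):
            raise ValueError("'labels' must map strings to strings")
        for target in targets:
            pairs.append((target, dict(labels)))
    return pairs
```

Applied to a valid hosts-webserver.json, it returns one pair per listed target, each carrying its group's service label.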

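Section	Description
global	Global configuration of the Prometheus server
	scrape_interval	How often Prometheus scrapes targets; can be overridden per scrape job
	evaluation_interval	How often Prometheus evaluates rules, which create new time series and generate alerts
rule_files	Locations of additional rule files for Prometheus to load
scrape_configs	The resources Prometheus monitors, i.e. the targets to pull data from
	job_name	Name of the job
	honor_labels	Resolves conflicts between labels in scraped data and labels attached by the server. When true, the scraped labels win; otherwise the server-side labels win and conflicting scraped labels are renamed to exported_<label>
	params	Request parameters sent with each scrape request
	scrape_interval	Scrape interval
	scrape_timeout	Scrape timeout
	metrics_path	HTTP path of the target's metrics
	scheme	Protocol used for scraping (http or https)
	sample_limit	Per-scrape limit on the number of samples; if a scrape exceeds it, the entire scrape is treated as failed and none of its samples are stored. The default 0 means no limit
	relabel_configs	Rules for rewriting target labels before scraping
	metric_relabel_configs	Rules for rewriting the labels of scraped metrics before storage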
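The default replace action of relabel_configs works roughly like this: join the values of source_labels with separator, match the result against regex (anchored at both ends), and, if it matches, write the expanded replacement into target_label. The Python sketch below mimics only that common case; it is not Prometheus's implementation, relabel_replace is a hypothetical name, and Python's re stands in for Go's RE2, which behaves the same for simple patterns like these.

```python
import re


def _to_python_template(replacement):
    r"""Translate Go-style "$1", "${1}" or "$name" references into Python's \g<...> form."""
    return re.sub(r"\$(?:\{(\w+)\}|(\w+))",
                  lambda m: "\\g<" + (m.group(1) or m.group(2)) + ">",
                  replacement)


def relabel_replace(labels, source_labels, target_label,
                    regex="(.*)", replacement="$1", separator=";"):
    """Apply one relabel rule with action "replace" and return a new label dict."""
    # Missing source labels contribute empty strings.
    value = separator.join(labels.get(name, "") for name in source_labels)
    m = re.fullmatch(regex, value)  # the regex must match the whole value
    if m is None:
        return dict(labels)  # no match: labels are left unchanged
    new_labels = dict(labels)
    new_labels[target_label] = m.expand(_to_python_template(replacement))
    return new_labels


labels = {"__address__": "10.10.200.202:9100", "job": "node-exporter"}
print(relabel_replace(labels, ["__address__"], "host", regex="(.*):.*")["host"])
# 10.10.200.202
```

The defaults for regex, replacement, and separator mirror Prometheus's documented defaults, which is why a rule often only needs source_labels and target_label.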

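Service Discovery

The most important concept in a Prometheus configuration is the target (data source). Targets are configured either statically or through dynamic service discovery, roughly in the following ways:

Discovery type	Description
static_configs	Static configuration
dns_sd_configs	DNS service discovery
file_sd_configs	File-based service discovery
consul_sd_configs	Consul service discovery
serverset_sd_configs	Serverset service discovery
nerve_sd_configs	Nerve service discovery
marathon_sd_configs	Marathon service discovery
kubernetes_sd_configs	Kubernetes service discovery
gce_sd_configs	GCE service discovery
ec2_sd_configs	EC2 service discovery
openstack_sd_configs	OpenStack service discovery
azure_sd_configs	Azure service discovery
triton_sd_configs	Triton service discovery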

Startup

# Recommended: check the configuration file before starting
./promtool check config prometheus.yml

# Start with defaults
./prometheus
# Start in the background
./prometheus &
# Start with explicit flags
./prometheus --web.enable-lifecycle --config.file="prometheus.yml" --storage.tsdb.path="/data/prometheus/"

# Hot reload, method 1: send a HUP signal with kill
kill -HUP `pidof prometheus`
# Hot reload, method 2: with --web.enable-lifecycle enabled, POST to the reload endpoint
curl -X POST http://localhost:9090/-/reload
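The second hot-reload method is simply an HTTP POST to /-/reload, which Prometheus accepts only when started with --web.enable-lifecycle. As a sketch, assuming the server listens on localhost:9090, the same request can be sent from a Python script; reload_prometheus is a hypothetical helper name.

```python
import urllib.error
import urllib.request


def reload_prometheus(base_url="http://localhost:9090", timeout=5.0):
    """Ask Prometheus to reload its configuration; return True on HTTP 200."""
    req = urllib.request.Request(base_url.rstrip("/") + "/-/reload",
                                 data=b"", method="POST")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status == 200
    except urllib.error.URLError as err:
        # HTTPError (a URLError subclass) covers error statuses, e.g. when the
        # lifecycle API is disabled; plain URLError covers connection failures.
        print(f"reload failed: {err}")
        return False
```

Running promtool check config first, then calling reload_prometheus(), avoids pushing a broken configuration to a running server.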

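Startup flags:

Example flag	Description
--config.file="/etc/prometheus.yml"	Configuration file; defaults to prometheus.yml
--web.listen-address="0.0.0.0:9090"	Listen address and port; defaults to 0.0.0.0:9090
--storage.tsdb.path="/data/prom"	Data storage path; defaults to data/
--storage.tsdb.retention.time=15d	How long data is kept before it is deleted; defaults to 15 days
--collector.systemd	(node_exporter flag) Enable the systemd collector so service states appear as metrics
--collector.systemd.unit-whitelist="(sshd|nginx).service"	(node_exporter flag) Regex of the systemd units to monitor
--storage.tsdb.wal-compression	Enable write-ahead log (WAL) compression
--web.enable-lifecycle	Enable configuration hot reload (and shutdown) over HTTP

Querying

Open the web UI at http://$host_ip:9090 to query monitoring data. Queries use a uniform syntax (PromQL) that filters and aggregates data by metric name and labels. The basic form is: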

<metric name>{<label name>=<label value>, ...}[<time range>]
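The four label matchers (=, !=, =~, !~) can be mimicked in a few lines of Python to show their semantics. Note that PromQL regex matchers are fully anchored, so job=~".*server" must match the whole label value, and a missing label behaves like an empty string. The series data and the matches/select helpers are hypothetical, and Python's re stands in for Go's RE2.

```python
import re


def matches(labels, name, op, value):
    """Evaluate one PromQL-style label matcher against a label set."""
    actual = labels.get(name, "")  # a missing label acts like an empty value
    if op == "=":
        return actual == value
    if op == "!=":
        return actual != value
    if op == "=~":
        return re.fullmatch(value, actual) is not None
    if op == "!~":
        return re.fullmatch(value, actual) is None
    raise ValueError(f"unknown operator {op!r}")


def select(series, metric, matchers):
    """Return the series with the given metric name that satisfy all matchers."""
    return [s for s in series
            if s.get("__name__") == metric
            and all(matches(s, *m) for m in matchers)]


series = [
    {"__name__": "http_requests_total", "job": "apiserver", "status": "200"},
    {"__name__": "http_requests_total", "job": "apiserver", "status": "404"},
    {"__name__": "http_requests_total", "job": "batch", "status": "500"},
]
# Equivalent of http_requests_total{status!~"4.."}
print([s["status"] for s in select(series, "http_requests_total",
                                   [("status", "!~", "4..")])])  # ['200', '500']
```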

A few examples (see the official Prometheus documentation for more):

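# Counter metrics #
# Per-second CPU growth rate over 2 minutes (equivalent to rate())
increase(node_cpu_seconds_total[2m]) / 120
# Average per-second rate over 2 minutes; suffers from the "long tail" problem: a momentary 100% CPU spike is averaged away
rate(node_cpu_seconds_total[2m])
# Instantaneous per-second rate, computed from the last two samples in the 2-minute window
irate(node_cpu_seconds_total[2m])
# Current cumulative HTTP request count for a given job and handler
http_requests_total{job="apiserver", handler="/api/comments"}
# All samples for the same job and handler from the last 5 minutes (range vector)
http_requests_total{job="apiserver", handler="/api/comments"}[5m]
# Series whose job label matches a regex
http_requests_total{job=~".*server"}
# Exclude series whose status label matches a regex
http_requests_total{status!~"4.."}
# 5-minute request rate over the last 30 minutes, evaluated at a 1-minute resolution (subquery)
rate(http_requests_total[5m])[30m:1m]
# Nested subquery: the outer [10m:] subquery over deriv() uses the default resolution
max_over_time(deriv(rate(distance_covered_total[5s])[30s:5s])[10m:])

# Aggregation #
# Memory RSS per pod
sum(container_memory_rss{instance="10.51.1.126:10250"}) by (pod_name)
# Total HTTP request count
sum(http_requests_total)
# The 5 series with the highest request counts
topk(5, http_requests_total)
# The 5 handlers with the highest HTTP request counts
topk(5, sum by (handler) (http_requests_total))
# Per-job sum of per-second HTTP request rates over the last 5 minutes
sum by (job) (
  rate(http_requests_total[5m])
)
# Arithmetic between two vectors with identical labels: memory limit minus memory used, in MiB
(instance_memory_limit_bytes - instance_memory_usage_bytes) / 1024 / 1024
# The same result summed by app and proc
sum by (app, proc) (
  instance_memory_limit_bytes - instance_memory_usage_bytes
) / 1024 / 1024

# Dynamic label rewriting #
# label_replace: create a new label from a regex match on an existing label
# Signature: label_replace(v instant-vector, dst_label string, replacement string, src_label string, regex string)
label_replace(up, "host", "$1", "instance", "(.*):.*")
# Result (a host label is added): up{host="localhost",instance="localhost:8080",job="cadvisor"}
# label_join: create a new label by joining the values of other labels
# Signature: label_join(v instant-vector, dst_label string, separator string, src_label_1 string, src_label_2 string, ...)
label_join(up, "info", "&", "instance", "job")
# Result: up{instance="localhost:8080",job="cadvisor",info="localhost:8080&cadvisor"}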
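To see how rate(), irate(), and increase() differ, here is a simplified Python sketch that works on a list of (timestamp_seconds, counter_value) samples. It treats any drop in value as a counter reset (a restart from 0), as these functions do conceptually, but it deliberately omits the extrapolation to the window edges that real PromQL performs, so its numbers can differ slightly from Prometheus's. The sample data is made up.

```python
def _delta(prev, cur):
    """Increase between two counter samples; a drop means the counter restarted from 0."""
    return cur - prev if cur >= prev else cur


def simple_increase(samples):
    """Total increase across the window, corrected for counter resets."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    return float(sum(_delta(a[1], b[1]) for a, b in zip(samples, samples[1:])))


def simple_rate(samples):
    """Average per-second increase between the first and last sample (no extrapolation)."""
    return simple_increase(samples) / (samples[-1][0] - samples[0][0])


def simple_irate(samples):
    """Per-second increase between the last two samples only."""
    (t1, v1), (t2, v2) = samples[-2], samples[-1]
    return _delta(v1, v2) / (t2 - t1)


# Made-up counter scraped every 15 s for 2 minutes; the process restarted just before t=90.
samples = [(0, 100), (15, 130), (30, 160), (45, 190), (60, 220),
           (75, 250), (90, 10), (105, 40), (120, 400)]
print(simple_increase(samples))  # 550.0
print(simple_rate(samples))      # about 4.58 per second
print(simple_irate(samples))     # 24.0, dominated by the jump in the last 15 s
```

The gap between the averaged rate and the last-interval irate is the "long tail" effect described in the comments above: a short burst barely moves rate() but dominates irate().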

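Selector syntax:

Selector	Description
labelname=value	Select time series whose label equals the value
labelname!=value	Exclude time series whose label equals the value
labelname=~regex	Select time series whose label matches the regular expression
labelname!~regex	Exclude time series whose label matches the regular expression
http_requests_total[5m]	Select all samples from the last 5 minutes; units are s, m, h, d, w, y
http_requests_total[1d] offset 1d	Use offset to shift the query window relative to the current time (here: the 1-day window ending 1 day ago)

Deploying Clients

A client is usually an exporter. There are two kinds: components with built-in support that expose Prometheus metrics directly, such as Kubernetes and etcd; and indirect collection through a separate exporter such as node_exporter, haproxy_exporter, or mysqld_exporter, all of which are available on the Prometheus download page.

Installation

This article uses node_exporter as an example. It reads files under /proc to obtain the system's runtime state. Download the binary package from the official website, extract it, and start it directly; pass the corresponding flags to enable additional collectors.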

# Using node_exporter as an example
wget https://github.com/prometheus/node_exporter/releases/download/v1.1.2/node_exporter-1.1.2.linux-amd64.tar.gz
tar zxvf node_exporter-1.1.2.linux-amd64.tar.gz
mv node_exporter-1.1.2.linux-amd64 /usr/local/node_exporter
cd /usr/local/node_exporter

# Start with defaults (listens on port 9100)
./node_exporter
# Start in the background
./node_exporter &
# Explicitly enable the disk statistics collector (diskstats is already enabled by default)
./node_exporter --collector.diskstats

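Querying

Accessing the service with curl returns many metric names and values. Commonly used ones include:

node_cpu_seconds_total: CPU time spent in each mode
node_disk*: disk performance
node_load1: system load (1-minute average)
node_memory*: system memory
node_network*: network traffic
node_filesystem*: file systems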
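The output of curl http://$host_ip:9100/metrics is plain text in the exposition format: # HELP and # TYPE comment lines followed by sample lines of the form name{label="value",...} value. A small Python sketch that parses such lines and filters them by metric-name prefix (for example node_memory) might look like this. It ignores timestamps, escaped quotes inside label values, and other corner cases of the real format, and the sample text is abbreviated and made up.

```python
import re

LINE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{([^}]*)\})?\s+(\S+)')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def parse_metrics(text, prefix=""):
    """Return [(name, labels, value)] for sample lines whose name starts with prefix."""
    out = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        m = LINE_RE.match(line)
        if m is None or not m.group(1).startswith(prefix):
            continue
        labels = dict(LABEL_RE.findall(m.group(2) or ""))
        out.append((m.group(1), labels, float(m.group(3))))
    return out


sample = """\
# HELP node_load1 1m load average.
# TYPE node_load1 gauge
node_load1 0.52
node_memory_MemFree_bytes 1.234e+09
node_cpu_seconds_total{cpu="0",mode="idle"} 12345.67
"""
print(parse_metrics(sample, "node_memory"))
# [('node_memory_MemFree_bytes', {}, 1234000000.0)]
```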

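Besides fetching data from http://$host_ip:9100/metrics, you can query it with PromQL in the web UI, which supports label matching and conditional filtering. If a syntactically correct query returns no data, check that the machines' clocks are synchronized.

Deploying Grafana

Prometheus integrates well with Grafana, which provides richer visualization.

Installation

Check the official download page for newer versions.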

# Install from the binary tarball
wget https://dl.grafana.com/oss/release/grafana-7.5.7.linux-amd64.tar.gz
tar -zxvf grafana-7.5.7.linux-amd64.tar.gz
mv grafana-7.5.7 /usr/local/grafana
cd /usr/local/grafana/bin

# Start
./grafana-server
# Start only the web server, in the background
./grafana-server web &
# Alternatively, install the RPM package
wget https://dl.grafana.com/oss/release/grafana-7.5.7-1.x86_64.rpm
yum install grafana-7.5.7-1.x86_64.rpm

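Once it is running, the Grafana web UI is available at http://$host_ip:3000. The default account is admin/admin, and you will be asked to change the password at first login.

Configuring the Data Source

After logging in to Grafana, add Prometheus as a data source as follows: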

Click the "Configuration" menu
Click "Data Sources"
Click "Add data source"
Select "Prometheus" as the data source type
Set a name, the Prometheus server URL, and other settings
Click "Save & Test"
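The same data source can also be created through Grafana's HTTP API (POST /api/datasources) instead of clicking through the UI. The Python sketch below only builds the request; the Grafana URL, the admin/admin credentials, the data source name, and the Prometheus URL are placeholders to adjust, and the payload contains only the minimal fields assumed to be needed for a Prometheus data source.

```python
import base64
import json
import urllib.request


def build_datasource_request(grafana_url, user, password, name, prometheus_url):
    """Build (without sending) a Grafana API request that adds a Prometheus data source."""
    payload = {
        "name": name,
        "type": "prometheus",
        "url": prometheus_url,
        "access": "proxy",  # the Grafana server, not the browser, queries Prometheus
    }
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        grafana_url.rstrip("/") + "/api/datasources",
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json",
                 "Authorization": f"Basic {token}"},
    )


req = build_datasource_request("http://localhost:3000", "admin", "admin",
                               "Prometheus", "http://localhost:9090")
# urllib.request.urlopen(req) would send it; Grafana replies with JSON.
```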

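This completes the basic setup of the Prometheus monitoring system.

Additional Notes

Time series databases

A time series database (TSDB) is designed to handle time-stamped data, that is, data whose values change in time order; such data is called time series data. TSDBs are mainly used to store monitoring data that is collected periodically in real time.

Directory structure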

./data
├── 01BKGV7JBM69T2G1BGBGM6KB12
│ └── meta.json
├── 01BKGTZQ1SYQJTR4PB43C8PD98
│ ├── chunks
│ │ └── 000001
│ ├── tombstones
│ ├── index
│ └── meta.json
├── 01BKGTZQ1HHWHV8FBJXW1Y3W0K
│ └── meta.json
├── 01BKGV7JC0RY8A6MACW02A2PJD
│ ├── chunks
│ │ └── 000001
│ ├── tombstones
│ ├── index
│ └── meta.json
└── wal
    ├── 00000002
    └── checkpoint.000001

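Prometheus stores data as a series of relatively independent blocks. Each block is a directory containing one or more chunk files (which hold the time series samples), a metadata file (meta.json), an index file (which maps metric names and labels to the location of the series data in the chunk files), and a tombstones file.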
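Each block's meta.json records, among other things, the block's ULID and the time range it covers as minTime/maxTime in milliseconds since the Unix epoch. A Python sketch that lists the blocks under a data directory could look like this; the directory is an assumption (use whatever --storage.tsdb.path points to), and list_blocks is a hypothetical helper.

```python
import json
import os
from datetime import datetime, timezone


def list_blocks(data_dir):
    """Return (ulid, start, end) for each block directory with a meta.json, oldest first."""
    blocks = []
    for entry in sorted(os.listdir(data_dir)):
        meta_path = os.path.join(data_dir, entry, "meta.json")
        if not os.path.isfile(meta_path):
            continue  # skip wal/ and other non-block entries
        with open(meta_path, encoding="utf-8") as f:
            meta = json.load(f)
        start = datetime.fromtimestamp(meta["minTime"] / 1000, tz=timezone.utc)
        end = datetime.fromtimestamp(meta["maxTime"] / 1000, tz=timezone.utc)
        blocks.append((meta["ulid"], start, end))
    return sorted(blocks, key=lambda b: b[1])


# Example (path is an assumption):
# for ulid, start, end in list_blocks("/data/prometheus"):
#     print(ulid, start, end)
```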

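Storage mechanism

Prometheus stores data in 2-hour blocks. The most recent data is kept in an in-memory block and written to disk once it covers 2 hours. To avoid losing data if the process crashes, Prometheus implements a write-ahead log (WAL): incoming data is also recorded in the WAL on disk, and on startup the WAL is replayed to restore the in-memory data.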
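The write-ahead-log idea can be illustrated with a toy Python class: every sample is appended and flushed to a log file before it is applied to the in-memory state, so a fresh instance can rebuild that state by replaying the log. This is only a sketch of the concept; Prometheus's real WAL uses a binary, segmented record format with checkpoints, and the ToyWAL class and its JSON-lines format are hypothetical.

```python
import json
import os


class ToyWAL:
    """Append-only JSON-lines log plus an in-memory dict of samples per series."""

    def __init__(self, path):
        self.path = path
        self.memory = {}
        self._replay()
        self._log = open(path, "a", encoding="utf-8")

    def _replay(self):
        # On startup, re-apply every complete record that reached the log.
        if not os.path.exists(self.path):
            return
        with open(self.path, encoding="utf-8") as f:
            for line in f:
                if not line.endswith("\n"):
                    break  # ignore a torn final record left by a crash
                rec = json.loads(line)
                self.memory.setdefault(rec["series"], []).append((rec["t"], rec["v"]))

    def append(self, series, t, v):
        # 1) durably log the sample, 2) only then update memory.
        self._log.write(json.dumps({"series": series, "t": t, "v": v}) + "\n")
        self._log.flush()
        os.fsync(self._log.fileno())
        self.memory.setdefault(series, []).append((t, v))

    def close(self):
        self._log.close()
```

Closing one instance and opening another on the same file simulates a restart: the new instance's memory is rebuilt entirely from the log.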

Advantages of horizontal partitioning: