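Scaling with Custom Metrics in HPA

Kubernetes provides CPU and memory as the default metrics for HPA autoscaling. For more complex scenarios, such as scaling automatically on the QPS handled by each replica, you can install prometheus-adapter yourself to scale Pods on custom metrics.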

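How it works

Kubernetes provides the Custom Metrics API and the External Metrics API to extend the metrics available to HPA, so that users can define metrics that fit their actual needs.

prometheus-adapter supports both APIs. The Custom Metrics API is usually sufficient, and this article focuses on it to implement autoscaling on custom metrics.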

Prerequisites

  • Prometheus is deployed and already scrapes the relevant custom metrics.
  • helm is installed.

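Exposing metrics from the application

This example uses a Spring Boot service that exposes its metrics in Prometheus format.

The sample program exposes the httpserver_requests_total metric, which counts HTTP requests. The application's QPS can be calculated from this metric.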
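
To make the step from a request counter to a QPS value concrete, here is a minimal sketch in Python. It is a simplification and not how Prometheus implements rate(), which also extrapolates to the window boundaries. The function name and the sample values are hypothetical.

```python
from typing import List, Tuple


def per_second_rate(samples: List[Tuple[float, float]]) -> float:
    """Approximate per-second increase of a counter over a window.

    samples: (unix_timestamp_seconds, counter_value) pairs, oldest first.
    A drop in value is treated as a counter reset (for example a Pod
    restart), so the value after the reset counts as new increase.
    """
    if len(samples) < 2:
        return 0.0
    increase = 0.0
    for (_, prev), (_, curr) in zip(samples, samples[1:]):
        increase += curr - prev if curr >= prev else curr
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed if elapsed > 0 else 0.0


# Hypothetical scrapes of httpserver_requests_total, 5 seconds apart.
scrapes = [(0.0, 100.0), (5.0, 130.0), (10.0, 160.0)]
print(per_second_rate(scrapes))  # 6.0
```

Sixty requests in ten seconds gives a QPS of 6 for that Pod.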

Deploying the application

Package the program above into a container image and deploy it to the cluster, for example with a Deployment:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: metricdemoapp
  namespace: metricdemoapp
spec:
  replicas: 3
  selector:
    matchLabels:
      app: metricdemoapp
  template:
    metadata:
      labels:
        app: metricdemoapp
      annotations:
        # Scrape hints; they only take effect if the Prometheus scrape config honors them
        prometheus.io/scrape: "true"
        prometheus.io/path: "/actuator/prometheus"
        prometheus.io/port: "http"
    spec:
      containers:
      - name: metricdemoapp
        image: registry.cn-hangzhou.aliyuncs.com/hardy_clouddo/metrics-demo-app:v1
        imagePullPolicy: Always

---

apiVersion: v1
kind: Service
metadata:
  name: metricdemoapp
  namespace: metricdemoapp
  labels:
    app: metricdemoapp
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/path: "/actuator/prometheus"
    prometheus.io/port: "http"
spec:
  type: ClusterIP
  ports:
  - port: 80
    protocol: TCP
    name: http   # port name referenced by the scrape configuration
  selector:
    app: metricdemoapp

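Scraping application metrics with Prometheus

With the application deployed, Prometheus needs to scrape the metrics it exposes.

Option 1: Configure Prometheus scrape rules

Add a scrape rule to the Prometheus scrape configuration file (in my Kubernetes-based deployment I could not locate prometheus.yaml):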

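- job_name: httpserver
  scrape_interval: 5s
  kubernetes_sd_configs:
  - role: endpoints
    namespaces:
      names:
      - httpserver   # namespace to discover endpoints in; use the one your workload runs in
  relabel_configs:
  # Keep only endpoints whose Service carries the label app=httpserver
  - action: keep
    source_labels:
    - __meta_kubernetes_service_label_app
    regex: httpserver
  # Keep only the endpoint port named "http"
  - action: keep
    source_labels:
    - __meta_kubernetes_endpoint_port_name
    regex: http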
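
The two keep rules above act as successive filters on the discovered targets. The following Python sketch illustrates that behavior under two assumptions taken from the Prometheus relabeling documentation: source label values are joined with ";" and the regex must match the whole joined value. The helper name and the sample targets are hypothetical.

```python
import re
from typing import Dict, List

Labels = Dict[str, str]


def keep(targets: List[Labels], source_labels: List[str],
         regex: str, separator: str = ";") -> List[Labels]:
    """Return only the targets whose joined source label values fully match regex."""
    pattern = re.compile(regex)
    kept = []
    for labels in targets:
        value = separator.join(labels.get(name, "") for name in source_labels)
        if pattern.fullmatch(value):  # the regex is anchored at both ends
            kept.append(labels)
    return kept


APP = "__meta_kubernetes_service_label_app"
PORT = "__meta_kubernetes_endpoint_port_name"

# Hypothetical discovered targets.
targets = [
    {APP: "httpserver", PORT: "http"},
    {APP: "httpserver", PORT: "debug"},
    {APP: "httpserver-canary", PORT: "http"},
]

step1 = keep(targets, [APP], "httpserver")
step2 = keep(step1, [PORT], "http")
print(len(step1), len(step2))  # 2 1
```

Because the match is anchored, a Service labeled app=httpserver-canary is dropped even though its label starts with httpserver.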

Option 2: Configure a ServiceMonitor

If prometheus-operator is installed, you can configure Prometheus by creating a ServiceMonitor custom resource. For example:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: metricdemoapp
spec:
  endpoints:
  - port: http
    interval: 5s
    path: /actuator/prometheus
  namespaceSelector:
    matchNames:
    - metricdemoapp
  selector:
    matchLabels:
      app: metricdemoapp

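Installing prometheus-adapter

We install prometheus-adapter with helm. The most important step before installing is deciding on and configuring the custom metric. Following the earlier example, the application records HTTP requests in the httpserver_requests_total metric, so the per-Pod QPS can be computed with a PromQL query like the following: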

sum(rate(httpserver_requests_total[2m])) by (pod)

We need to translate this into prometheus-adapter configuration. Prepare a values.yaml:

rules:
  default: false
  custom:
  - seriesQuery: 'micro_req_total'   # series to discover; must match the metric name the application actually exposes
    resources:
      template: <<.Resource>>
    name:
      matches: "micro_req_total"
      as: "httpserver_requests_qps" # name of the QPS metric computed by the PromQL below
    metricsQuery: sum(rate(<<.Series>>{<<.LabelMatchers>>}[1m])) by (<<.GroupBy>>)
prometheus:
  url: http://10.1.13.113 # replace with the address of your Prometheus API (without the port)
  port: 31186
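
To show what the metricsQuery template turns into when HPA asks for a Pod metric, here is a rough Python sketch of the placeholder substitution. It assumes that with `template: <<.Resource>>` the label names equal the resource names (`pod`, `namespace`). The function is hypothetical, and the exact matcher order and form that the adapter emits may differ.

```python
from typing import List


def render_metrics_query(template: str, series: str,
                         namespace: str, pods: List[str]) -> str:
    """Fill in the adapter placeholders for a per-Pod query (illustration only)."""
    if len(pods) == 1:
        pod_matcher = 'pod="{}"'.format(pods[0])
    else:
        # Several Pods are matched with one regex alternation.
        pod_matcher = 'pod=~"{}"'.format("|".join(pods))
    matchers = 'namespace="{}",{}'.format(namespace, pod_matcher)
    return (template
            .replace("<<.Series>>", series)
            .replace("<<.LabelMatchers>>", matchers)
            .replace("<<.GroupBy>>", "pod"))


TEMPLATE = "sum(rate(<<.Series>>{<<.LabelMatchers>>}[1m])) by (<<.GroupBy>>)"

print(render_metrics_query(
    TEMPLATE,
    "micro_req_total",  # series name taken from seriesQuery in values.yaml
    "httpserver",
    ["httpserver-6f94475d45-47d5w", "httpserver-6f94475d45-7rln9"],
))
```

The result is an ordinary PromQL query grouped by pod, which is why the metric must carry `pod` and `namespace` labels for the adapter to associate series with Kubernetes objects.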

Run the helm commands to install:

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
# Helm 3
helm install prometheus-adapter prometheus-community/prometheus-adapter -f values.yaml
# Helm 2
# helm install --name prometheus-adapter prometheus-community/prometheus-adapter -f values.yaml

Verifying the installation

If the installation is correct, the Custom Metrics API returns the QPS metric we configured:

$ kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1
{
  "kind": "APIResourceList",
  "apiVersion": "v1",
  "groupVersion": "custom.metrics.k8s.io/v1beta1",
  "resources": [
    {
      "name": "jobs.batch/httpserver_requests_qps",
      "singularName": "",
      "namespaced": true,
      "kind": "MetricValueList",
      "verbs": [
        "get"
      ]
    },
    {
      "name": "pods/httpserver_requests_qps",
      "singularName": "",
      "namespaced": true,
      "kind": "MetricValueList",
      "verbs": [
        "get"
      ]
    },
    {
      "name": "namespaces/httpserver_requests_qps",
      "singularName": "",
      "namespaced": false,
      "kind": "MetricValueList",
      "verbs": [
        "get"
      ]
    }
  ]
}

You can also query the QPS value of the application Pods:

$ kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1/namespaces/httpserver/pods/*/httpserver_requests_qps
{
  "kind": "MetricValueList",
  "apiVersion": "custom.metrics.k8s.io/v1beta1",
  "metadata": {
    "selfLink": "/apis/custom.metrics.k8s.io/v1beta1/namespaces/httpserver/pods/%2A/httpserver_requests_qps"
  },
  "items": [
    {
      "describedObject": {
        "kind": "Pod",
        "namespace": "httpserver",
        "name": "httpserver-6f94475d45-7rln9",
        "apiVersion": "/v1"
      },
      "metricName": "httpserver_requests_qps",
      "timestamp": "2020-11-17T09:14:36Z",
      "value": "500m",
      "selector": null
    }
  ]
}

In the example above the QPS is reported as 500m. The m suffix means milli-units, so the value is 0.5.
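
Values such as 500m use the Kubernetes quantity notation. As a rough aid for reading them, the sketch below converts a handful of decimal suffixes to a plain number. It is deliberately partial: binary suffixes such as Ki and Mi and the larger decimal suffixes are not handled, and the function name is hypothetical.

```python
from decimal import Decimal

# Subset of the decimal SI suffixes used by Kubernetes quantities.
_SUFFIXES = {
    "n": Decimal("1e-9"),
    "u": Decimal("1e-6"),
    "m": Decimal("1e-3"),
    "k": Decimal("1e3"),
    "M": Decimal("1e6"),
    "G": Decimal("1e9"),
}


def parse_quantity(text: str) -> float:
    """Convert a quantity string such as "500m" to a float."""
    text = text.strip()
    if text and text[-1] in _SUFFIXES:
        # Decimal arithmetic avoids binary rounding noise before the final conversion.
        return float(Decimal(text[:-1]) * _SUFFIXES[text[-1]])
    return float(Decimal(text))


print(parse_quantity("500m"))    # 0.5
print(parse_quantity("83933m"))  # 83.933
```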

Testing HPA

Suppose we want to scale out when the average QPS per Pod reaches 50, with a minimum of 1 replica and a maximum of 1000. The HPA can be configured like this:

# autoscaling/v2beta2 was removed in Kubernetes 1.26; use autoscaling/v2 on 1.23 and later
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: httpserver
  namespace: httpserver
spec:
  minReplicas: 1
  maxReplicas: 1000
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: httpserver
  metrics:
  - type: Pods
    pods:
      metric:
        name: httpserver_requests_qps   # metric name served by prometheus-adapter
      target:
        averageValue: 50                # target average QPS per Pod
        type: AverageValue

Then load-test the application and check whether it scales out:

$ kubectl get hpa
NAME         REFERENCE               TARGETS     MINPODS   MAXPODS   REPLICAS   AGE
httpserver   Deployment/httpserver   83933m/50   1         1000      2          18h

$ kubectl get pods
NAME                          READY   STATUS              RESTARTS   AGE
httpserver-6f94475d45-47d5w   1/1     Running             0          3m41s
httpserver-6f94475d45-7rln9   1/1     Running             0          37h
httpserver-6f94475d45-6c5xm   0/1     ContainerCreating   0          1s
httpserver-6f94475d45-wl78d   0/1     ContainerCreating   0          1s
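
The jump from 2 to 4 Pods in the output above follows from the scaling formula in the Kubernetes HPA documentation: desired replicas = ceil(current replicas × current value / target value). The Python sketch below is a simplified version of that calculation. It assumes the default tolerance of 0.1 and ignores Pod readiness, missing metrics, and stabilization or behavior policies. The function name is hypothetical.

```python
import math


def desired_replicas(current_replicas: int, current_avg: float, target_avg: float,
                     min_replicas: int, max_replicas: int,
                     tolerance: float = 0.1) -> int:
    """Simplified HPA calculation for an AverageValue target."""
    ratio = current_avg / target_avg
    if abs(ratio - 1.0) <= tolerance:
        # Close enough to the target: keep the current replica count.
        desired = current_replicas
    else:
        desired = math.ceil(current_replicas * ratio)
    # Clamp to the bounds configured on the HPA.
    return max(min_replicas, min(max_replicas, desired))


# Values from the kubectl output above: 2 replicas, 83933m (83.933) against a target of 50.
print(desired_replicas(2, 83.933, 50, 1, 1000))  # 4
```

With 2 replicas averaging 83.933 QPS, the ratio is about 1.68, so 2 × 1.68 rounds up to 4 replicas, which matches the two running Pods plus the two being created.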

If the scale-out works as expected, HPA is now autoscaling on the application's custom metric.