Kubernetes only provides CPU and memory as HPA scaling metrics out of the box. For more complex scenarios, such as scaling on the per-replica QPS of a service, you can install prometheus-adapter to autoscale Pods on custom metrics.
Kubernetes exposes the Custom Metrics API and the External Metrics API so that the metrics available to the HPA can be extended and tailored to real workloads.
prometheus-adapter supports both APIs. The Custom Metrics API is usually sufficient, and this article focuses on it to implement autoscaling on a custom metric.
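You can quickly check whether these API groups are already served in your cluster; they only show up after an adapter such as prometheus-adapter (or another metrics provider) has registered them:

```bash
kubectl api-versions | grep -E 'custom.metrics.k8s.io|external.metrics.k8s.io'
```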
Prerequisites:

- Prometheus is deployed and already scraping the relevant custom metrics.
- helm is installed.
The example workload here is a Spring Boot service that exposes metrics in Prometheus format. It exposes an httpserver_requests_total counter that records HTTP requests, from which the application's QPS can be derived.
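For reference, the counter as scraped from the /actuator/prometheus endpoint looks roughly like this; the method/uri/status labels are illustrative and depend on how the application registers the counter:

```text
# HELP httpserver_requests_total Total number of HTTP requests handled
# TYPE httpserver_requests_total counter
httpserver_requests_total{method="GET",uri="/hello",status="200",} 1024.0
```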
Package the program into a container image and deploy it to the cluster, for example with a Deployment:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: metricdemoapp
  namespace: metricdemoapp
spec:
  replicas: 3
  selector:
    matchLabels:
      app: metricdemoapp
  template:
    metadata:
      labels:
        app: metricdemoapp
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/path: "/actuator/prometheus"
        prometheus.io/port: "http"
    spec:
      containers:
      - name: metricdemoapp
        image: registry.cn-hangzhou.aliyuncs.com/hardy_clouddo/metrics-demo-app:v1
        imagePullPolicy: Always
---
apiVersion: v1
kind: Service
metadata:
  name: metricdemoapp
  namespace: metricdemoapp
  labels:
    app: metricdemoapp
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/path: "/actuator/prometheus"
    prometheus.io/port: "http"
spec:
  type: ClusterIP
  ports:
  - port: 80
    protocol: TCP
    name: http
  selector:
    app: metricdemoapp
```
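A minimal way to roll this out, assuming the two manifests above are saved as metricdemoapp.yaml:

```bash
kubectl create namespace metricdemoapp
kubectl apply -f metricdemoapp.yaml
kubectl -n metricdemoapp get pods,svc
```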
With the application deployed, Prometheus needs to scrape the metrics it exposes. Add a scrape job to the Prometheus scrape configuration (note that for a Prometheus deployed in Kubernetes there is usually no prometheus.yaml file on disk; the configuration is typically managed through a ConfigMap or the Operator):
```yaml
- job_name: httpserver
  scrape_interval: 5s
  kubernetes_sd_configs:
  - role: endpoints
    namespaces:
      names:
      - httpserver
  relabel_configs:
  - action: keep
    source_labels:
    - __meta_kubernetes_service_label_app
    regex: httpserver
  - action: keep
    source_labels:
    - __meta_kubernetes_endpoint_port_name
    regex: http
```
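After Prometheus reloads its configuration, the target should appear on the Targets page and the raw counter should be queryable, for example:

```promql
rate(httpserver_requests_total[1m])
```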
If prometheus-operator is installed, Prometheus can instead be configured by creating a ServiceMonitor custom resource. For example:
```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: metricdemoapp
spec:
  endpoints:
  - port: http
    interval: 5s
    path: /actuator/prometheus
  namespaceSelector:
    matchNames:
    - metricdemoapp
  selector:
    matchLabels:
      app: metricdemoapp
```
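Keep in mind that the Prometheus custom resource only picks up ServiceMonitors whose labels match its serviceMonitorSelector; with the kube-prometheus-stack chart this usually means adding the chart's release label to the ServiceMonitor metadata, for example (the label value depends on your installation):

```yaml
metadata:
  name: metricdemoapp
  labels:
    release: prometheus   # must match the Prometheus CR's serviceMonitorSelector
```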
We install prometheus-adapter with helm. The most important step before installing is to decide on and configure the custom metric. Following the example above, the application records HTTP requests with the httpserver_requests_total counter, so the QPS of each business Pod can be computed with a PromQL query like:
```promql
sum(rate(httpserver_requests_total[2m])) by (pod)
```
This query now needs to be translated into a prometheus-adapter rule. Prepare a values.yaml:
```yaml
rules:
  default: false
  custom:
  - seriesQuery: 'httpserver_requests_total'
    resources:
      template: "<<.Resource>>"
    name:
      matches: "httpserver_requests_total"
      as: "httpserver_requests_qps"
    metricsQuery: sum(rate(<<.Series>>{<<.LabelMatchers>>}[1m])) by (<<.GroupBy>>)
prometheus:
  url: http://10.1.13.113
  port: 31186
```
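When the adapter answers a per-Pod query, the templated metricsQuery expands to something close to the PromQL we wrote by hand; roughly like the sketch below, where the exact namespace and pod matchers are filled in by the adapter at query time:

```promql
sum(rate(httpserver_requests_total{namespace="httpserver",pod=~"httpserver-.*"}[1m])) by (pod)
```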
Install it with helm:
```bash
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm install prometheus-adapter prometheus-community/prometheus-adapter -f values.yaml
```
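Before querying the metrics themselves, it is worth confirming that the adapter registered the Custom Metrics APIService and that it reports as available:

```bash
# prometheus-adapter registers itself as the custom metrics APIService; AVAILABLE should be True
kubectl get apiservice v1beta1.custom.metrics.k8s.io
```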
If the installation succeeded, the Custom Metrics API returns the QPS metric we configured:
```bash
$ kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1
{
  "kind": "APIResourceList",
  "apiVersion": "v1",
  "groupVersion": "custom.metrics.k8s.io/v1beta1",
  "resources": [
    {
      "name": "jobs.batch/httpserver_requests_qps",
      "singularName": "",
      "namespaced": true,
      "kind": "MetricValueList",
      "verbs": [ "get" ]
    },
    {
      "name": "pods/httpserver_requests_qps",
      "singularName": "",
      "namespaced": true,
      "kind": "MetricValueList",
      "verbs": [ "get" ]
    },
    {
      "name": "namespaces/httpserver_requests_qps",
      "singularName": "",
      "namespaced": false,
      "kind": "MetricValueList",
      "verbs": [ "get" ]
    }
  ]
}
```
The QPS of the business Pods can also be queried:
```bash
$ kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1/namespaces/httpserver/pods/*/httpserver_requests_qps
{
  "kind": "MetricValueList",
  "apiVersion": "custom.metrics.k8s.io/v1beta1",
  "metadata": {
    "selfLink": "/apis/custom.metrics.k8s.io/v1beta1/namespaces/httpserver/pods/%2A/httpserver_requests_qps"
  },
  "items": [
    {
      "describedObject": {
        "kind": "Pod",
        "namespace": "httpserver",
        "name": "httpserver-6f94475d45-7rln9",
        "apiVersion": "/v1"
      },
      "metricName": "httpserver_requests_qps",
      "timestamp": "2020-11-17T09:14:36Z",
      "value": "500m",
      "selector": null
    }
  ]
}
```
In the example above the QPS is reported as 500m, which means a QPS of 0.5 (metric values use the Kubernetes quantity notation, where m denotes milli-units).
Suppose we want to scale out once the average QPS per business Pod reaches 50, with a minimum of 1 replica and a maximum of 1000. The HPA can be configured like this:
```yaml
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: httpserver
  namespace: httpserver
spec:
  minReplicas: 1
  maxReplicas: 1000
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: httpserver
  metrics:
  - type: Pods
    pods:
      metric:
        name: httpserver_requests_qps
      target:
        averageValue: 50
        type: AverageValue
```
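To drive the QPS above the threshold you can generate some traffic against the Service; a rough sketch using a temporary busybox Pod, where the Service address and request path are placeholders for your actual business endpoint:

```bash
kubectl run load-gen --rm -it --restart=Never --image=busybox:1.36 -- \
  /bin/sh -c 'while true; do wget -q -O /dev/null http://httpserver.httpserver/; done'
```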
Then run the load test against the service and watch whether it scales out:
```bash
$ kubectl get hpa
NAME         REFERENCE               TARGETS     MINPODS   MAXPODS   REPLICAS   AGE
httpserver   Deployment/httpserver   83933m/50   1         1000      2          18h

$ kubectl get pods
NAME                          READY   STATUS              RESTARTS   AGE
httpserver-6f94475d45-47d5w   1/1     Running             0          3m41s
httpserver-6f94475d45-7rln9   1/1     Running             0          37h
httpserver-6f94475d45-6c5xm   0/1     ContainerCreating   0          1s
httpserver-6f94475d45-wl78d   0/1     ContainerCreating   0          1s
```
If the Deployment scales out as expected, the HPA is now driving elasticity from the business's custom metric.