0%

如何解决 Debian 系 Elastic apm-server 7.x 启动失败

本来是几个月前在 Ubuntu 部署 Elastic apm-server 遇到的问题,当时处理起来没遇到特别的卡点,就只是把解决过程丢到 Evernote 了。最近发现还有人在重复踩这个坑,因此我把笔记整理之后搬到这里作一个极简的分享。

apm-server 安装

实际步骤就不需要我复述了,官方提供现成的 deb 安装包。除了查看官方文档,更推荐使用 Kibana APM 看板自带的指南。

指南的 url 路径大概是 http://localhost:5601/app/kibana#/home/tutorial/apm?_g=()

不仅有安装引导,还提供按钮协助检查 apm-server 的服务状态。

Kibana apm-server tutorial

启动异常

在 debian 系发行版安装 apm-server 后,执行 service apm-server start 报告失败,且切换到 systemctl 也无效。

service apm-server status报错如下:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
$ service apm-server status                                                                                          
● apm-server.service - Elastic APM Server
Loaded: loaded (/lib/systemd/system/apm-server.service; enabled; vendor preset: enabled)
Active: failed (Result: exit-code) since Tue 2019-04-16 14:44:42 CST; 3s ago
Docs: https://www.elastic.co/solutions/apm
Process: 4783 ExecStart=/usr/share/apm-server/bin/apm-server $BEAT_LOG_OPTS $BEAT_CONFIG_OPTS $BEAT_PATH_OPTS (code=ex
Main PID: 4783 (code=exited, status=1/FAILURE)

4 月 16 14:44:42 ray systemd[1]: apm-server.service: Service hold-off time over, scheduling restart.
4 月 16 14:44:42 ray systemd[1]: apm-server.service: Scheduled restart job, restart counter is at 5.
4 月 16 14:44:42 ray systemd[1]: Stopped Elastic APM Server.
4 月 16 14:44:42 ray systemd[1]: apm-server.service: Start request repeated too quickly.
4 月 16 14:44:42 ray systemd[1]: apm-server.service: Failed with result 'exit-code'.
4 月 16 14:44:42 ray systemd[1]: Failed to start Elastic APM Server.

检查日志

首先使用 journalctl 查看 systemd 的日志,如下

1
$ journalctl -u apm-server.service

打印日志

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
-- Logs begin at Wed 2019-04-10 09:30:25 CST, end at Tue 2019-04-16 14:44:42 CST. --                                    
4 月 16 13:43:23 ray systemd[1]: Started Elastic APM Server.
4 月 16 13:43:23 ray apm-server[2487]: Exiting: error loading config file: config file ("/etc/apm-server/apm-server.yml")
4 月 16 13:43:23 ray systemd[1]: apm-server.service: Main process exited, code=exited, status=1/FAILURE
4 月 16 13:43:23 ray systemd[1]: apm-server.service: Failed with result 'exit-code'.
4 月 16 13:43:23 ray systemd[1]: apm-server.service: Service hold-off time over, scheduling restart.
4 月 16 13:43:23 ray systemd[1]: apm-server.service: Scheduled restart job, restart counter is at 1.
4 月 16 13:43:23 ray systemd[1]: Stopped Elastic APM Server.
4 月 16 13:43:23 ray systemd[1]: Started Elastic APM Server.
# ... ,笔者注释,省略中间的多次重启信息
4 月 16 14:44:42 ray apm-server[4783]: Exiting: error loading config file: config file ("/etc/apm-server/apm-server.yml")
4 月 16 14:44:42 ray systemd[1]: apm-server.service: Main process exited, code=exited, status=1/FAILURE
4 月 16 14:44:42 ray systemd[1]: apm-server.service: Failed with result 'exit-code'.
4 月 16 14:44:42 ray systemd[1]: apm-server.service: Service hold-off time over, scheduling restart.
4 月 16 14:44:42 ray systemd[1]: apm-server.service: Scheduled restart job, restart counter is at 5.
4 月 16 14:44:42 ray systemd[1]: Stopped Elastic APM Server.
4 月 16 14:44:42 ray systemd[1]: apm-server.service: Start request repeated too quickly.
4 月 16 14:44:42 ray systemd[1]: apm-server.service: Failed with result 'exit-code'.
4 月 16 14:44:42 ray systemd[1]: Failed to start Elastic APM Server.

这样找出真正的启动错误是 Exiting: error loading config file: config file ("/etc/apm-server/apm-server.yml")

解决方法

配置文件异常,采用 apm-server export config 进一步观察。提示如下:

1
error initializing beat: error loading config file: config file ("/etc/apm-server/apm-server.yml") must be owned by the beat user (uid=1000) or root

github issues 上找到了类似的问题,但没有给出推荐的处理方案,所以决定自己动手解决。

ls -l 观察 /etc/apm-server/ 的信息,发现除了 apm-server.yml 之外,owner 都是 root

1
2
3
4
5
6
$ ls -l /etc/apm-server
total 148K
drwxr-xr-x 2 root root 4.0K 4 月 16 14:11 .
drwxr-xr-x 142 root root 12K 4 月 16 14:11 ..
-rw------- 1 apm-server apm-server 33K 4 月 6 05:48 apm-server.yml
-rw-r--r-- 1 root root 94K 4 月 6 05:48 fields.yml

那么统一将权限变更到 root 吧!

1
$ sudo chown root:root /etc/apm-server/apm-server.yml

改之后测试

1
2
$ sudo apm-server test config
Config OK

再尝试启动则提示成功。