One microservice failing in the test environment

One Spring Cloud microservice in an abnormal state

Problem description:

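Of the several microservices in the test environment, the medicalb service reported an abnormal health status (specifically, its heap ran out of memory).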

Jenkins start command:

app.sh medicalb-3.0.0.jar start

Contents of app.sh:

#!/bin/bash
source /etc/profile
source ~/.bash_profile

# App info
# Directory where the application jars are stored
APP_HOME=/opt/health
BACKUP_PATH=/opt/health/backup
# Application (jar file) name, e.g. medicalb-3.0.0.jar
APP_NAME=$1
# Jar file name without its extension
JAR_FILE_NAME=${APP_NAME%.*}

# Shell info
# Print usage to remind the caller of the expected arguments
usage() {
    echo "Usage: sh boot [APP_NAME] [start|stop|restart|status|backup]"
    exit 1
}

# Check whether the application is running
is_exist() {
    # Look up the PID, excluding this script and the grep itself
    PID=$(ps -ef | grep "${APP_NAME}" | grep -v "$0" | grep -v grep | awk '{print $2}')
    # -z is true when PID is empty: return 1 if no process exists, 0 if it does
    if [ -z "${PID}" ]; then
        return 1
    else
        return 0
    fi
}

# Start the application
start() {
    if is_exist; then
        echo "${APP_NAME} is already running, PID=${PID}"
    else
        if [ "${APP_NAME}" = "zipkin.jar" ]; then
            nohup java -jar -Xms128m -Xmx512m "${APP_HOME}/zipkin.jar" >zipkin.log &
        else
            nohup java -jar -Xms128m -Xmx512m "${APP_HOME}/${APP_NAME}" >/dev/null 2>&1 &
        fi
        # $! is the PID of the background job just launched
        PID=$!
        echo "${APP_NAME} start success, PID=${PID}"
    fi
}

# Stop the application
stop() {
    if is_exist; then
        # PID is left unquoted on purpose: it may hold several PIDs
        kill -9 ${PID}
        echo "${APP_NAME} process stop, PID=${PID}"
    else
        echo "There is no process of ${APP_NAME}"
    fi
}

# Restart the application
restart() {
    stop
    sleep 2
    start
}

# Show the process status
status() {
    if is_exist; then
        echo "${APP_NAME} is running, PID=${PID}"
    else
        echo "There is no process of ${APP_NAME}"
    fi
}

# Back up the jar
backup() {
    if [ -f "${APP_NAME}" ]; then
        # Current timestamp
        now_time=$(date "+%Y%m%d%H%M%S")
        cp -p "${APP_NAME}" "${BACKUP_PATH}/${JAR_FILE_NAME}_${now_time}.jar"
        echo "Backup ${APP_NAME} to ${BACKUP_PATH}/${JAR_FILE_NAME}_${now_time}.jar success"
    else
        echo "There is no file named ${APP_NAME}"
    fi
}

# An empty APP_NAME would make the grep in is_exist match every process
[ -n "${APP_NAME}" ] || usage

case $2 in
    "start")
        start
        ;;
    "stop")
        stop
        ;;
    "restart")
        restart
        ;;
    "status")
        status
        ;;
    "backup")
        backup
        ;;
    *)
        usage
        ;;
esac
exit 0
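
A side note on the stop function: kill -9 sends SIGKILL, which the JVM cannot catch, so shutdown hooks never run. A gentler variant, sketched here as an assumption rather than as part of the original script, sends SIGTERM first and only falls back to SIGKILL after a timeout. The function name stop_gracefully and its timeout argument are hypothetical.

```shell
#!/bin/sh
# Sketch only: stop_gracefully and its timeout argument are hypothetical
# names and not part of the original app.sh.
stop_gracefully() {
    pid=$1
    timeout=${2:-10}   # seconds to wait before forcing the kill
    waited=0

    # kill -0 delivers no signal; it only checks that the process exists
    kill -0 "$pid" 2>/dev/null || return 0

    # Ask the process to shut down cleanly
    kill -TERM "$pid" 2>/dev/null || return 0

    # Poll once per second until the process is gone or the timeout expires
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$waited" -ge "$timeout" ]; then
            # Last resort: the process ignored SIGTERM
            kill -KILL "$pid" 2>/dev/null
            break
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}
```

Inside app.sh, the kill -9 line could then be replaced by a loop such as `for p in ${PID}; do stop_gracefully "$p" 10; done`.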

Troubleshooting process

-> ps -ef|grep medicalb-3.0.0.jar
appuser 2733 1 0 Jul14 ? 00:24:48 java -jar -Xms128m -Xmx512m /opt/health/medicalb-3.0.0.jar
root 18822 14283 0 23:34 pts/0 00:00:00 grep --color=auto medicalb-3.0.0.jar
# PID obtained: 2733
-> jmap -heap 2733
Attaching to process ID 2733, please wait...
Debugger attached successfully.
Server compiler detected.
JVM version is 25.271-b09
using thread-local object allocation.
Parallel GC with 4 thread(s)
Heap Configuration:
MinHeapFreeRatio = 0
MaxHeapFreeRatio = 100
MaxHeapSize = 536870912 (512.0MB)
NewSize = 44564480 (42.5MB)
MaxNewSize = 178782208 (170.5MB)
OldSize = 89653248 (85.5MB)
NewRatio = 2
SurvivorRatio = 8
MetaspaceSize = 21807104 (20.796875MB)
CompressedClassSpaceSize = 1073741824 (1024.0MB)
MaxMetaspaceSize = 17592186044415 MB
G1HeapRegionSize = 0 (0.0MB)
Heap Usage:
PS Young Generation
Eden Space:
capacity = 43515904 (41.5MB)
used = 3237280 (3.087310791015625MB)
free = 40278624 (38.412689208984375MB)
7.439303110881024% used
From Space:
capacity = 524288 (0.5MB)
used = 376848 (0.3593902587890625MB)
free = 147440 (0.1406097412109375MB)
71.8780517578125% used
To Space:
capacity = 524288 (0.5MB)
used = 0 (0.0MB)
free = 524288 (0.5MB)
0.0% used
PS Old Generation
capacity = 164102144 (156.5MB)
used = 108753224 (103.71515655517578MB)
free = 55348920 (52.78484344482422MB)
99.99166553046376% used
53637 interned Strings occupying 5407736 bytes.

Old Generation usage has reached 99.99%.
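
jmap -heap gives a single snapshot. To watch old-generation occupancy over time, jstat -gcutil can be sampled instead. The helper below is a minimal sketch: the function name old_gen_util is hypothetical, and it assumes the JDK 8 column layout, where the old generation column is headed "O".

```shell
#!/bin/sh
# Sketch only: old_gen_util is a hypothetical helper name.
# Reads `jstat -gcutil` output on stdin and prints the old-generation
# utilization (percent) from the last sample line.
old_gen_util() {
    awk '
        # Header line: find which column is labelled "O" (old generation)
        NR == 1 { for (i = 1; i <= NF; i++) if ($i == "O") col = i; next }
        # Data lines: remember the value from the most recent sample
        col     { last = $col }
        END     { if (last != "") print last }
    '
}

# Usage against the live JVM (2733 is the PID found with ps):
#   jstat -gcutil 2733 | old_gen_util
```

Running it periodically while requests come in shows whether full GCs actually free the old generation or whether it stays pinned near 100%.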

-> top -p 2733
top - 23:39:24 up 219 days, 9:08, 3 users, load average: 0.16, 0.15, 0.14
Tasks: 1 total, 0 running, 1 sleeping, 0 stopped, 0 zombie
%Cpu(s): 1.2 us, 1.0 sy, 0.0 ni, 97.7 id, 0.0 wa, 0.0 hi, 0.0 si, 0.2 st
KiB Mem : 16264332 total, 1558012 free, 9641436 used, 5064884 buff/cache
KiB Swap: 2097148 total, 2076668 free, 20480 used. 6131380 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
2733 appuser 20 0 4397080 593796 14968 S 0.0 3.7 24:49.67 java

As the number of requests grew, the RES value kept climbing until the service finally crashed.
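
Watching RES in top by eye works, but recording it makes the trend easier to see. This is a small sketch that samples the resident set size with ps; the function names rss_kb and sample_rss and their arguments are hypothetical.

```shell
#!/bin/sh
# Sketch only: rss_kb and sample_rss are hypothetical helper names.

# Print the resident set size of a process in KiB.
# The trailing "=" after rss suppresses the ps header line.
rss_kb() {
    ps -o rss= -p "$1" | tr -d ' '
}

# sample_rss PID [INTERVAL_SECONDS] [COUNT]
# Prints one "HH:MM:SS rss_kib" line per sample.
sample_rss() {
    pid=$1
    interval=${2:-5}
    count=${3:-12}
    i=0
    while [ "$i" -lt "$count" ]; do
        printf '%s %s\n' "$(date '+%H:%M:%S')" "$(rss_kb "$pid")"
        i=$((i + 1))
        # No need to sleep after the final sample
        [ "$i" -lt "$count" ] && sleep "$interval"
    done
    return 0
}

# Usage: one sample every 5 seconds for a minute
#   sample_rss 2733 5 12
```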

Generate a heap dump snapshot file.

jmap -dump:format=b,file=heapdump.hprof <pid>
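
A manual jmap dump only helps if someone catches the process before it dies. As a complement, HotSpot can write the dump itself when an OutOfMemoryError is thrown. The sketch below builds the two options involved; the helper name heap_dump_opts and the DUMP_DIR location are assumptions, not part of the original setup.

```shell
#!/bin/sh
# Sketch only: heap_dump_opts and DUMP_DIR are hypothetical.
# DUMP_DIR must already exist and be writable by the user running the JVM.
DUMP_DIR=${DUMP_DIR:-/opt/health/dump}

# Print the HotSpot options that make the JVM write an .hprof file
# automatically when an OutOfMemoryError occurs.
heap_dump_opts() {
    printf '%s %s' "-XX:+HeapDumpOnOutOfMemoryError" "-XX:HeapDumpPath=${DUMP_DIR}"
}

# In app.sh the start line could then look like this:
#   nohup java $(heap_dump_opts) -Xms128m -Xmx512m -jar "${APP_HOME}/${APP_NAME}" >/dev/null 2>&1 &
```

When HeapDumpPath points at a directory, the JVM names the file java_pid<pid>.hprof inside it.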

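Inspecting the dump suggested the problem might be related to the database connection pool, so I asked the developers on the project whether anyone had changed the database-related configuration.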

configuration:
  map-underscore-to-camel-case: false
  # Set the first-level cache scope so the cache is cleared after every statement
  local-cache-scope: STATEMENT
  # Disable the second-level cache
  cache-enabled: false
  call-setters-on-nulls: true
  jdbc-type-for-null: 'null'
  default-fetch-size: 1000

The cause was that default-fetch-size had been changed to 100000.
In the end it was set to 1000.
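
To get a feel for why the fetch size matters, here is a rough back-of-the-envelope estimate. It assumes a JDBC driver that buffers up to fetch-size rows per statement (behavior varies by driver) and an average row size of 2048 bytes, which is an invented figure for illustration only. The function name estimate_fetch_kib is hypothetical.

```shell
#!/bin/sh
# Sketch only: estimate_fetch_kib is a hypothetical helper, and the
# 2048-byte average row size used below is an assumed figure.

# estimate_fetch_kib ROWS BYTES_PER_ROW
# Prints the approximate KiB needed to buffer one fetched batch.
estimate_fetch_kib() {
    echo $(( $1 * $2 / 1024 ))
}

# estimate_fetch_kib 100000 2048   # prints 200000 (about 195 MiB)
# estimate_fetch_kib 1000 2048     # prints 2000   (about 2 MiB)
```

Under these assumptions, a handful of concurrent statements at 100000 rows each would be enough to fill the 512 MB heap, while at 1000 rows the buffers are negligible.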