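Fixing an OceanBase CE startup failure: OBD-2002: Failed to start 0.0.0.0 observer

Yesterday. Yes, yesterday. We had a pre-release to production scheduled for just after midnight (12-25 00:00). Following the module deployment docs handed over to me, I spent the whole day preparing the environment (my first time doing it). Only near the end of the workday, while dealing with another issue, did I notice that the OceanBase CE instance on K8s had been down the entire time (there were so many services that I had gone cross-eyed).

Annoyingly, it used exactly the same configuration and specs as the test environment, yet it kept failing to start. I combed through the OceanBase Q&A community, spent a long time troubleshooting and tried many fixes before finally solving it, so I am writing the solution down here.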
Error message

The error visible directly in the container log:
find obd deploy information, skip configuring...
start ob cluster ...
Get local repositories ok
Load cluster param plugin ok
Open ssh connection ok
[WARN] OBD-1007: (0.0.0.0) The recommended number of stack size is unlimited (Current value: 8192)
[WARN] OBD-1017: (0.0.0.0) The value of the "vm.overcommit_memory" must be 0 (Current value: 1, Recommended value: 0)
[WARN] OBD-1012: (0.0.0.0) clog and data use the same disk (/root/ob)
cluster scenario: express_oltp
Start observer ok
observer program health check x
[WARN] OBD-2002: Failed to start 0.0.0.0 observer
See https://www.oceanbase.com/product/ob-deployer/error-codes .
Trace ID: 84aff43c-e0db-11f0-9da7-f628f70b8ed5
If you want to view detailed obd logs, please run: obd display-trace 84aff43c-e0db-11f0-9da7-f628f70b8ed5
Wed Dec 24 15:16:33 UTC 2025
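boot success!

Following the hint, I went into the container and ran obd display-trace 84aff43c-e0db-11f0-9da7-f628f70b8ed5 to view the detailed log. Forgive my untrained eye, but it was dizzying and I could not spot anything useful in it.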
# earlier output omitted...
[2025-12-24 15:16:30.144] [INFO] Start observer
[2025-12-24 15:16:30.144] [DEBUG] -- starting 0.0.0.0 observer
[2025-12-24 15:16:30.144] [DEBUG] -- root@0.0.0.0 export LD_LIBRARY_PATH='/root/ob/observer/lib:'
[2025-12-24 15:16:30.145] [DEBUG] -- root@0.0.0.0 execute: cd /root/ob/observer; /root/ob/observer/bin/observer -r '0.0.0.0:2882:2881' -p 2881 -P 2882 -z 'zone1' -n 'difyai' -c 1766589375 -d '/root/ob/observer/store' -l 'INFO' -I '0.0.0.0' -o __min_full_resource_pool_memory=2147483648,memory_limit='4G',system_memory='1G',datafile_size='5G',log_disk_size='5G',cpu_count=16,enable_syslog_wf=False,enable_syslog_recycle=True,max_syslog_file_count=4,enable_rich_error_msg=True,enable_record_trace_log=False
[2025-12-24 15:16:30.389] [DEBUG] -- exited code 0
[2025-12-24 15:16:30.390] [DEBUG] -- root@0.0.0.0 delete env LD_LIBRARY_PATH
[2025-12-24 15:16:30.390] [DEBUG] -- need_bootstrap: True
[2025-12-24 15:16:30.391] [DEBUG] - sub start ref count to 0
[2025-12-24 15:16:30.391] [DEBUG] - export start
[2025-12-24 15:16:30.391] [DEBUG] - plugin oceanbase-ce-py_script_start-3.1.0 result: True
[2025-12-24 15:16:30.391] [DEBUG] - Searching health_check plugin for components ...
[2025-12-24 15:16:30.391] [DEBUG] - Searching health_check plugin for oceanbase-ce-4.3.5.3-103000092025080818.el8-7ce84b4d7cc89779af7d4de3f80b72a2ac679320
[2025-12-24 15:16:30.392] [DEBUG] - Found for oceanbase-ce-py_script_health_check-3.1.0 for oceanbase-ce-4.3.5.3
[2025-12-24 15:16:30.392] [DEBUG] - Call plugin oceanbase-ce-py_script_health_check-3.1.0 for oceanbase-ce-4.3.5.3-103000092025080818.el8-7ce84b4d7cc89779af7d4de3f80b72a2ac679320
[2025-12-24 15:16:30.392] [DEBUG] - import health_check
[2025-12-24 15:16:30.393] [DEBUG] - add health_check ref count to 1
[2025-12-24 15:16:30.393] [INFO] observer program health check
[2025-12-24 15:16:33.395] [DEBUG] -- 0.0.0.0 program health check
[2025-12-24 15:16:33.395] [DEBUG] -- root@0.0.0.0 execute: cat /root/ob/observer/run/observer.pid
[2025-12-24 15:16:33.411] [DEBUG] -- exited code 0
[2025-12-24 15:16:33.412] [DEBUG] -- root@0.0.0.0 execute: ls /proc/265
[2025-12-24 15:16:33.470] [DEBUG] -- exited code 2, error output:
[2025-12-24 15:16:33.470] [DEBUG] ls: cannot access '/proc/265': No such file or directory
[2025-12-24 15:16:33.470] [DEBUG]
[2025-12-24 15:16:33.471] [WARNING] [WARN] OBD-2002: Failed to start 0.0.0.0 observer
[2025-12-24 15:16:33.471] [DEBUG] - sub health_check ref count to 0
[2025-12-24 15:16:33.471] [DEBUG] - export health_check
[2025-12-24 15:16:33.471] [DEBUG] - plugin oceanbase-ce-py_script_health_check-3.1.0 result: False
[2025-12-24 15:16:33.476] [DEBUG] - share lock /root/.obd/lock/mirror_and_repo release, count 1
[2025-12-24 15:16:33.476] [DEBUG] - share lock /root/.obd/lock/mirror_and_repo release, count 0
[2025-12-24 15:16:33.476] [DEBUG] - unlock /root/.obd/lock/mirror_and_repo
[2025-12-24 15:16:33.476] [DEBUG] - exclusive lock /root/.obd/lock/deploy_obcluster release, count 0
[2025-12-24 15:16:33.476] [DEBUG] - unlock /root/.obd/lock/deploy_obcluster
[2025-12-24 15:16:33.476] [DEBUG] - share lock /root/.obd/lock/global release, count 0
[2025-12-24 15:16:33.476] [DEBUG] - unlock /root/.obd/lock/global
[2025-12-24 15:16:33.476] [INFO] See https://www.oceanbase.com/product/ob-deployer/error-codes .
[2025-12-24 15:16:33.476] [INFO] Trace ID: 84aff43c-e0db-11f0-9da7-f628f70b8ed5
[2025-12-24 15:16:33.477] [INFO] If you want to view detailed obd logs, please run: obd display-trace 84aff43c-e0db-11f0-9da7-f628f70b8ed5

K8s StatefulSet configuration

Identical to the configuration used in the test environment:
kind: StatefulSet
apiVersion: apps/v1
metadata:
  name: oceanbase
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: oceanbase
  template:
    metadata:
      labels:
        app: oceanbase
    spec:
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: vector-pvc
      containers:
        - name: oceanbase
          image: 'oceanbase-ce:4.3.5-lts'
          ports:
            - containerPort: 2881
              protocol: TCP
          env:
            - name: MODE
              value: mini
            - name: OB_MEMORY_LIMIT
              value: 4G
            - name: OB_CLUSTER_NAME
              value: difyai
            - name: OB_SERVER_IP
              value: 0.0.0.0
            - name: OB_SYS_PASSWORD
              valueFrom:
                configMapKeyRef:
                  name: dify-config
                  key: OCEANBASE_VECTOR_PASSWORD
            - name: OB_TENANT_PASSWORD
              valueFrom:
                configMapKeyRef:
                  name: dify-config
                  key: OCEANBASE_VECTOR_PASSWORD
          resources:
            limits:
              cpu: '2'
              memory: 4Gi
            requests:
              cpu: '2'
              memory: 4Gi
          volumeMounts:
            - name: data
              mountPath: /root/ob
            - name: data
              mountPath: /root/.obd/cluster
              subPath: obd-cluster
            - name: data
              mountPath: /root/boot/init.d
              subPath: obd-cluster
          imagePullPolicy: Always
      restartPolicy: Always

Troubleshooting
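Attempt: adjusting memory_limit

Starting from the key piece of information in the error, the code OBD-2002, I went through many posts in the OceanBase Q&A community and OceanBase issues on GitHub. Most of the official replies suggested adjusting memory_limit, system_memory, the CPU count, or similar sizes.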
observer program health check x
[WARN] OBD-2002: Failed to start 0.0.0.0 observer

The official error-code reference describes it as follows:
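OBD-2002: failed to start x.x.x.x observer
Cause: this error has many possible causes; the two most common are:
- memory_limit is less than 8G.
- system_memory is too large or too small. Normally memory_limit/3 ≤ system_memory ≤ memory_limit/2.
Solution:
- If the error turns out to be caused by one of the two reasons above, adjust the corresponding setting.
- If it is caused by something else, you can ask in the Q&A section of the official website, where professionals will answer.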
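The two common causes above are simple enough to check mechanically before redeploying. Below is a minimal POSIX shell sketch, assuming sizes are written as whole numbers with a G or M suffix (as in OB_MEMORY_LIMIT=4G; Kubernetes-style 4Gi is not handled). The function names `to_mib` and `check_ob_memory` are hypothetical helpers, not part of obd or the image.

```shell
#!/bin/sh
# Sketch: check the two common OBD-2002 causes from the error-code page.
# Assumption: sizes are whole numbers with a G or M suffix, e.g. "8G" or "512M".

# to_mib SIZE: print SIZE in MiB (8G -> 8192)
to_mib() {
  case "$1" in
    *[Gg]) echo $(( ${1%[Gg]} * 1024 )) ;;
    *[Mm]) echo "${1%[Mm]}" ;;
    *) echo "unsupported size: $1" >&2; return 1 ;;
  esac
}

# check_ob_memory MEMORY_LIMIT SYSTEM_MEMORY
# Prints "ok", or the violated rules separated by ";".
check_ob_memory() {
  local mem sys result
  mem=$(to_mib "$1") || return 1
  sys=$(to_mib "$2") || return 1
  result=""
  # cause 1: memory_limit below 8G
  if [ "$mem" -lt 8192 ]; then
    result="memory_limit < 8G"
  fi
  # cause 2: memory_limit/3 <= system_memory <= memory_limit/2,
  # multiplied out so everything stays in integer arithmetic
  if [ $((3 * sys)) -lt "$mem" ]; then
    result="${result:+$result;}system_memory < memory_limit/3"
  fi
  if [ $((2 * sys)) -gt "$mem" ]; then
    result="${result:+$result;}system_memory > memory_limit/2"
  fi
  echo "${result:-ok}"
}
```

For example, `check_ob_memory 4G 1G` reports both `memory_limit < 8G` and `system_memory < memory_limit/3`, while `check_ob_memory 8G 4G` prints `ok`. The 8G/4G combination tried in the attempts that follow satisfies both rules, yet the observer still failed, which already hints that the cause lay elsewhere.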
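So I tried the following, individually and in combination, more times than I can count:
- First, set OB_MEMORY_LIMIT in the StatefulSet to 8G (the docs say the default is 6G) and raised the resource limits accordingly, then deleted the Pod so it was recreated; same result.
- Increased OB_SYSTEM_MEMORY to 4G (the docs say the default is 1G); still the same.
- Deleted the PV and reinstalled; still the same.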
containers:
  - env:
      - name: OB_MEMORY_LIMIT
        value: 8G
      - name: OB_SYSTEM_MEMORY
        value: 4G
    resources:
      limits:
        cpu: '4'
        memory: 16Gi
      requests:
        cpu: '4'
        memory: 16Gi

Attempt: not mounting the disk
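Based on the logs, a colleague suggested it might be a disk permission problem. I removed the volume mounts; same result.

Attempt: setting MODE to slim

I tried the same container configuration in the development environment, where it failed with a different error; switching to slim mode made it start normally there. In production, however, the result was still the same.

MODE has three values: mini, slim and normal.
- mini: the container uses as few resources as possible.
- normal: the container uses as much of its available resources as possible.
- slim: the container starts only the observer, in fast-startup mode; the tenant is named test, and the cluster and tenant resource variables do not take effect.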
containers:
  - env:
      - name: MODE
        value: slim

Solution: remove the OB_SERVER_IP setting
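Time slipped away. Somewhere between giving up, grumbling that OceanBase's logs were awful (nothing in the output above says what actually made the health check fail), and trying things more or less at random (I had also filed an issue and was waiting for a reply), a reply from an official staff member in a Q&A community post gave me an idea:

辞霜 (official):
The memory and CPU allocation looks fine. Please provide the detailed obd log, found under ~/.obd/log (cd ~/.obd/log).

I checked the cluster configuration and compared it with the test environment; they matched.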
cd ~/ob
cd obd-cluster
cd obcluster
cat config.yaml

oceanbase-ce:
  servers:
    - 172.53.104.54
  global:
    home_path: /root/ob/observer
    mysql_port: 2881
    rpc_port: 2882
    zone: zone1
    appname: difyai
    memory_limit: G
    system_memory: 1G
    datafile_size: 5G
    log_disk_size: 5G
    root_password: xxxx
    scenario: express_oltp
    obconfig_url:
    cpu_count: 16
    production_mode: false
    syslog_level: INFO
    enable_syslog_wf: false
    enable_syslog_recycle: true
    max_syslog_file_count: 4
    enable_rich_error_msg: true
    cluster_id: 1766644739

I also finally found the detailed log.
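observer.log is long, and in the excerpt in this section most lines are INFO or WDIAG noise. A small filter that keeps only ERROR and EDIAG lines and drops the trailing BACKTRACE addresses makes the real cause easier to spot. This is a sketch that assumes the line layout seen in this post (a bracketed timestamp spanning the first two whitespace-separated fields, then the level); `ob_errors` is a hypothetical helper name, and its default path is the one the commands below navigate to (~/ob/observer/log/observer.log).

```shell
#!/bin/sh
# Sketch: print only ERROR/EDIAG lines from an observer.log, without backtraces.
# Assumes lines look like:
#   [2025-12-25 06:13:37.327365] ERROR [SERVER] init_local_ip_and_devname (...) ...
# i.e. the log level is the third whitespace-separated field.
ob_errors() {
  local log=${1:-"$HOME/ob/observer/log/observer.log"}
  # keep error-level lines, then strip the " BACKTRACE:0x..." tail
  awk '$3 == "ERROR" || $3 == "EDIAG"' "$log" | sed 's/ BACKTRACE:.*$//'
}
```

Run against the log in this post, it would keep the `local_ip set failed ... local_ip is 0.0.0.0` ERROR and the EDIAG/ERROR chain that follows it, while dropping the INFO and WDIAG lines.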
cd ~/ob
cd observer
cd log
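tail -100 observer.log

# other parts omitted; also ignore any values that differ from the configuration above, since I did not capture the whole log at the time...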
[2025-12-25 06:13:37.321365] WDIAG [LIB] get_ifname_by_addr (ob_net_util.cpp:331) [265][observer][T0][Y0-0000000000000001-0-0] [lt=8][errcode=0] can not find ifname by local ip(local_ip=0.0.0.0)
[2025-12-25 06:13:37.327365] ERROR [SERVER] init_local_ip_and_devname (ob_server.cpp:2235) [265][observer][T0][Y0-0000000000000001-0-0] [lt=5975][errcode=-4393] local_ip set failed, please check your local_ip. local_ip is 0.0.0.0. [suggestion] Verify if your local IP is right.
[2025-12-25 06:13:37.333990] INFO [SHARE.CONFIG] reload_config (ob_server_config.cpp:356) [265][observer][T0][Y0-0000000000000001-0-0] [lt=6562] update observer memory config(memory_limit=10737418240, system_memory=4294967296, hidden_sys_memory=1073741824)
[2025-12-25 06:13:37.334016] INFO [SERVER] set_running_mode (ob_server.cpp:2370) [265][observer][T0][Y0-0000000000000001-0-0] [lt=23] observer start with mini_mode(memory_limit=10737418240)
[2025-12-25 06:13:37.334027] INFO set_running_mode (ob_server.cpp:2375) [265][observer][T0][Y0-0000000000000001-0-0] [lt=10] mini mode: true
[2025-12-25 06:13:37.334045] INFO [SERVER] init_self_addr (ob_server.cpp:2306) [265][observer][T0][Y0-0000000000000001-0-0] [lt=5] Build basic information for each syslog file(info="address: , observer version: OceanBase_CE 4.3.5.3, revision: 103000092025080818-e8da5f0afb288ed0add0613740c6ccf2a3c6830b, sysname: Linux, os release: 4.19.112-1.el7.x86_64, machine: x86_64, tz GMT offset: 00:00")
[2025-12-25 06:13:37.334053] INFO [SERVER] init_self_addr (ob_server.cpp:2310) [265][observer][T0][Y0-0000000000000001-0-0] [lt=7] my addr(self_addr="0.0.0.0:2882")
[2025-12-25 06:13:37.334062] EDIAG [SERVER] init_config_module (ob_server.cpp:2329) [265][observer][T0][Y0-0000000000000001-0-0] [lt=7][errcode=-4002] local address isn't valid(self_addr_="0.0.0.0:2882", ret=-4002, ret="OB_INVALID_ARGUMENT") BACKTRACE:0xa911ef8 0xa6180b5 0xa7b860f 0xa7b7fd6 0xa7b7f08 0xa7b7d26 0x14149269 0x14116c5a 0x141097ee 0xffbbc88 0xffc1172 0x273eb110 0xffbd9fd 0x7efe1ee3acf3 0xabfb79e
[2025-12-25 06:13:37.334144] EDIAG [SERVER] init_config (ob_server.cpp:2097) [265][observer][T0][Y0-0000000000000001-0-0] [lt=81][errcode=-4002] init config module failed(ret=-4002, ret="OB_INVALID_ARGUMENT") BACKTRACE:0xa911ef8 0xa6180b5 0xa75cb4f 0xa75c4c6 0xa75c400 0xa75c227 0x14147bd0 0x14116f25 0x141097ee 0xffbbc88 0xffc1172 0x273eb110 0xffbd9fd 0x7efe1ee3acf3 0xabfb79e
[2025-12-25 06:13:37.334179] EDIAG [SERVER] init (ob_server.cpp:248) [265][observer][T0][Y0-0000000000000001-0-0] [lt=34][errcode=-4002] init config failed(ret=-4002, ret="OB_INVALID_ARGUMENT") BACKTRACE:0xa911ef8 0xa6180b5 0xa75cb4f 0xa75c4c6 0xa75c400 0xa75c227 0x14118896 0x1410c16a 0xffbbc88 0xffc1172 0x273eb110 0xffbd9fd 0x7efe1ee3acf3 0xabfb79e
[2025-12-25 06:13:37.334198] WDIAG [STORAGE.TRANS] getClock (ob_clock_generator.h:70) [265][observer][T0][Y0-0000000000000001-0-0] [lt=15][errcode=-4006] clock generator not inited
[2025-12-25 06:13:37.334276] INFO [SERVER] init (ob_server.cpp:254) [265][observer][T0][Y0-0000000000000001-0-0] [lt=26] [server_start 1/18] observer init begin.
[2025-12-25 06:13:37.334285] INFO [LIB] set_param (achunk_mgr.cpp:31) [265][observer][T0][Y0-0000000000000001-0-0] [lt=7] set large page param(large_page_type_=0)
[2025-12-25 06:13:37.334296] EDIAG [SERVER] init (ob_server.cpp:562) [265][observer][T0][Y0-0000000000000001-0-0] [lt=6][errcode=-4002] [OBSERVER_NOTICE] fail to init observer(ret=-4002, ret="OB_INVALID_ARGUMENT") BACKTRACE:0xa911ef8 0xa6180b5 0xa75cb4f 0xa75c4c6 0xa75c400 0xa75c227 0x141206b6 0x1410f130 0xffbbc88 0xffc1172 0x273eb110 0xffbd9fd 0x7efe1ee3acf3 0xabfb79e
[2025-12-25 06:13:37.334323] ERROR [SERVER] init (ob_server.cpp:566) [265][observer][T0][Y0-0000000000000001-0-0] [lt=7][errcode=-4002] [server_start 2/18] observer init fail. you may find solutions in previous error logs or seek help from official technicians.

I tried removing the OB_SERVER_IP setting, and after a restart the instance finally started normally.
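Why 0.0.0.0 breaks startup: in the log above, `get_ifname_by_addr` cannot find a network interface for local_ip=0.0.0.0, `init_local_ip_and_devname` then fails, and `init_config_module` rejects self_addr 0.0.0.0:2882 with OB_INVALID_ARGUMENT, so the process exits during init. The `[265]` in those lines matches the PID the health check later failed to find under /proc. 0.0.0.0 is the unspecified "any" address and is not assigned to any interface, so such a lookup cannot succeed. The sketch below imitates that kind of lookup against `ip -o -4 addr show` output; it is an illustration under that assumption, not OceanBase's actual code, and `ifname_for_ip` is a hypothetical name.

```shell
#!/bin/sh
# Sketch: print the interface that carries a given IPv4 address, or fail.
# Reads `ip -o -4 addr show` output on stdin; lines look like (example values):
#   2: eth0    inet 172.53.104.54/24 brd 172.53.104.255 scope global eth0
ifname_for_ip() {
  awk -v want="$1" '
    $3 == "inet" {
      split($4, addr, "/")        # "172.53.104.54/24" -> addr[1] = "172.53.104.54"
      if (addr[1] == want) { print $2; found = 1; exit }
    }
    END { exit !found }           # non-zero status when no interface matched
  '
}
```

Inside the pod, `ip -o -4 addr show | ifname_for_ip 0.0.0.0` should find nothing, while the pod IP resolves to an interface (the `ip` tool may not be installed in every image). With OB_SERVER_IP removed, obd used the pod IP 172.53.104.54 instead, as the servers entry in config.yaml and the status table in the output show.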
find obd deploy information, skip configuring...
start ob cluster ...
Get local repositories ok
Load cluster param plugin ok
Cluster status check ok
[WARN] OBD-1007: (172.53.104.54) The recommended number of stack size is unlimited (Current value: 8192)
[WARN] OBD-1017: (172.53.104.54) The value of the "vm.overcommit_memory" must be 0 (Current value: 1, Recommended value: 0)
cluster scenario: express_oltp
Start observer ok
observer program health check ok
Connect to observer ok
obshell start ok
obshell program health check ok
Connect to observer ok
Wait for observer init ok
+-------------------------------------------------+
| oceanbase-ce |
+---------------+---------+------+-------+--------+
| ip | version | port | zone | status |
+---------------+---------+------+-------+--------+
| 172.53.104.54 | 4.3.5.3 | 2881 | zone1 | ACTIVE |
+---------------+---------+------+-------+--------+
obclient -h172.53.104.54 -P2881 -uroot@sys -p'xxxx' -Doceanbase -A
cluster unique id: e8ea0b05-c7ad-5912-9145-f640ed897854-19b543bc03f-03050304
obcluster running
Trace ID: 875b8cec-e15c-11f0-aec4-960a1d6a947b
If you want to view detailed obd logs, please run: obd display-trace 875b8cec-e15c-11f0-aec4-960a1d6a947b
Thu Dec 25 06:40:24 UTC 2025
boot success!

Final working K8s StatefulSet configuration
kind: StatefulSet
apiVersion: apps/v1
metadata:
  name: oceanbase
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: oceanbase
  template:
    metadata:
      labels:
        app: oceanbase
    spec:
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: vector-pvc
      containers:
        - name: oceanbase
          image: 'oceanbase-ce:4.3.5-lts'
          ports:
            - containerPort: 2881
              protocol: TCP
          env:
            - name: MODE
              value: mini
            - name: OB_MEMORY_LIMIT
              value: 4G
            - name: OB_CLUSTER_NAME
              value: difyai
            - name: OB_SYS_PASSWORD
              valueFrom:
                configMapKeyRef:
                  name: dify-config
                  key: OCEANBASE_VECTOR_PASSWORD
            - name: OB_TENANT_PASSWORD
              valueFrom:
                configMapKeyRef:
                  name: dify-config
                  key: OCEANBASE_VECTOR_PASSWORD
          resources:
            limits:
              cpu: '2'
              memory: 4Gi
            requests:
              cpu: '2'
              memory: 4Gi
          volumeMounts:
            - name: data
              mountPath: /root/ob
            - name: data
              mountPath: /root/.obd/cluster
              subPath: obd-cluster
            - name: data
              mountPath: /root/boot/init.d
              subPath: obd-cluster
          imagePullPolicy: Always
      restartPolicy: Always

References
- OBD-2002: failed to start x.x.x.x observer: https://www.oceanbase.com/product/ob-deployer/error-codes
- Deploying OceanBase with Docker: https://github.com/oceanbase/docker-images/blob/main/oceanbase-ce/README_CN.md
- OceanBase Community Edition installation error "observer program health check x": https://ask.oceanbase.com/t/topic/35630314
