2013年5月5日星期日

MySQL数据库错误server_errno=2013的解决


一组MySQL复制环境中的Master意外掉电,重启后Master运行正常,但该复制环境中的其它slave端,Error Log中却抛出的如下错误信息:

110110 15:21:25 [ERROR] Error reading packet from server: Lost connection to MySQL server during query ( server_errno=2013)
110110 15:21:25 [Note] Slave I/O thread: Failed reading log event, reconnecting to retry, log 'forummysql01-bin.002937' position 243387731
110110 15:21:25 [Note] Slave: connected to master 'repl@192.168.1.31:3306',replication resumed in log 'forummysql01-bin.002937' at position 243387731
110110 15:21:25 [ERROR] Error reading packet from server:Client requested master to start replication from impossible position( server_errno=1236)
110110 15:21:25 [ERROR] Got fatal error 1236: 'Client requested master to start replication from impossible position' from master when reading data from binary log
110110 15:21:25 [Note] Slave I/O thread exiting, read up to log 'forummysql01-bin.002937', position 243387731
通过mysql命令行连接到slave端,执行show slave status查看复制状态:

mysql> show slave status\G
*************************** 1. row ***************************
Slave_IO_State:
Master_Host: 192.168.1.31
Master_User: repl
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: forummysql01-bin.002937
Read_Master_Log_Pos: 243387731
Relay_Log_File: phpmysql02-relay-bin.33417576
Relay_Log_Pos: 243387875
Relay_Master_Log_File: forummysql01-bin.002937
Slave_IO_Running: No
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB: mysql
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 243387731
Relay_Log_Space: 243387875
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: NULL
Salve的io线程没有运行,看起来是接收日志出现了问题,尝试启动该线程:

mysql> start slave io_thread;
Query OK, 0 rows affected (0.00 sec)

mysql> show slave status\G
*************************** 1. row ***************************
Slave_IO_State:
Master_Host: 192.168.1.31
Master_User: repl
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: forummysql01-bin.002937
Read_Master_Log_Pos: 243387731
Relay_Log_File: phpmysql02-relay-bin.33417576
Relay_Log_Pos: 243387875
Relay_Master_Log_File: forummysql01-bin.002937
Slave_IO_Running: No
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB: mysql
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 243387731
Relay_Log_Space: 243387875
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: NULL
1 row in set (0.00 sec)
看起来 没有反应,其中是有反映,执行启动io线程的命令后,Error Log文件中又抛出了日志文件位置异常的信息。看来还是得到master端,查看一下报错的日志文件指定位置到底执行的什么操作,以及该位置是否存在?

通过mysqlbinlog命令可以查看二进制日志文件中的内容,在master端执行命令如下:

[root@forummysql01 data]# mysqlbinlog --start-position=243387732 forummysql01-bin.002937
/*!40019 SET @@session.max_insert_delayed_threads=0*/;
/*!50003 SET @OLD_COMPLETION_TYPE=@@COMPLETION_TYPE,COMPLETION_TYPE=0*/;
DELIMITER /*!*/;
DELIMITER ;
# End of log file
ROLLBACK /* added by mysqlbinlog */;
/*!50003 SET COMPLETION_TYPE=@OLD_COMPLETION_TYPE*/;
还别说,这个位置看起来啥都没有做,稳妥起见,三思将整个forummysql01-bin.002937文件中的内容均提取出来查看一下,再次执行mysqlbinlog命令,这次不再指定position:

[root@forummysql01 data]# mysqlbinlog ./forummysql01-bin.002937 > /home/jss/bin-002937.log
我们只需要查看一下该文件最后几行的信息即可,例如:

[root@forummysql01 data]# tail -50 /home/jss/bin-002937.log
.............................
# at 243297123
#110110 15:02:19 server id 1 end_log_pos 243297459 Query thread_id=1773644066 exec_time=0 error_code=0
SET TIMESTAMP=1294642939/*!*/;
INSERT INTO cdb_sessions (sid, ip1, ip2, ip3, ip4, uid, username, groupid, styleid, invisible, action, lastactivity, lastolupdate, seccode, fid, tid)
VALUES ('HQFzjy', '202', '160', '180', '187', '0', '', '7', '1', '0', '3', '1294642939', '0', '232485', '27', '4583')
/*!*/;
................
................
................

# at 243308840
#110110 15:02:20 server id 1 end_log_pos 243315309 Query thread_id=1773638971 exec_time=0 error_code=0
SET TIMESTAMP=1294642940/*!*/;
update group_topic set TOPIC_TIT.............................
/*!*/;
DELIMITER ;
# End of log file
ROLLBACK /* added by mysqlbinlog */;
/*!50003 SET COMPLETION_TYPE=@OLD_COMPLETION_TYPE*/;
可以看到该bin文件中最后的位置点是243315309,与错误日志中“'forummysql01-bin.002937', position 243387731”相差较大,提示的错误点在二进制日志文件中确实不存在,我将其理解为逻辑错误,应该是由于master意外掉电,重新启动时自动flush了binlog,而slave并未获取到这个信息导致,因此解决该问题也比较简单,直接重置同步的master位置应该就可以。这里三思选择将日志文件序号递增(也可以选择将position位置号提前),执行命令如下:

mysql> stop slave;
Query OK, 0 rows affected (0.00 sec)
mysql> CHANGE MASTER TO MASTER_HOST='192.168.1.101',
-> MASTER_PORT=3306,
-> MASTER_USER='repl',
-> MASTER_PASSWORD='******',
-> MASTER_LOG_FILE='forummysql01-bin.002938',
-> MASTER_LOG_POS=0;
Query OK, 0 rows affected (0.01 sec)
然后再重新启动slave,查看状态:

mysql> start slave;
Query OK, 0 rows affected (0.00 sec)
mysql> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 192.168.1.31
Master_User: repl
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: forummysql01-bin.002938
Read_Master_Log_Pos: 35910271
Relay_Log_File: phpmysql02-relay-bin.000003
Relay_Log_Pos: 21407790
Relay_Master_Log_File: forummysql01-bin.002938
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB: mysql
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 21407646
Relay_Log_Space: 35910415
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: 2215
1 row in set (0.00 sec)
Slave相关进程已启动,Error Log文件中也没有再抛出错误信息。等待一段时间,让slave赶上master的进度,其它slave也参照此步骤操作,整个复制环境就恢复了。

没有评论:

发表评论