Posts: 4 | Threads: 1 | Joined: Aug 2020 | Reputation: 0
I'm surprised no one has asked about this before, and I hope Haxus will answer this question:
Why do you use gdb to freeze a crashed server process instead of collecting a core dump and restarting it?
If you have the binary log (binlog) enabled in your SQL database, then I suppose a restart will not lead to any long-term unsolvable problems.
Posts: 131 | Threads: 13 | Joined: Jun 2018 | Reputation: 7
08-23-2020, 06:32 PM
(This post was last modified: 08-23-2020, 06:33 PM by QuakeIV.)
There is a certain philosophy that auto-restarting a server after a fault is a bad idea, because of the probability that the fault will immediately re-arise and crash it again.
I personally think it's a relatively reasonable one: many faults are repeatable, so trying to reboot won't do much good, and rebooting the servers repeatedly for hours will do them some small degree of harm.
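If a server were auto-restarted anyway, the usual way to address this concern is crash-loop detection with a growing delay. Below is a minimal sketch of such a policy; the class name and every number in it (30 s base delay, 10 minute cap, at most 3 crashes per hour) are hypothetical choices, not anything described in this thread.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class RestartPolicy:
    base_delay: float = 30.0   # hypothetical: wait 30 s before the first restart
    max_delay: float = 600.0   # never wait longer than 10 minutes
    max_crashes: int = 3       # crashes tolerated inside the window
    window: float = 3600.0     # sliding window length in seconds
    crashes: List[float] = field(default_factory=list)

    def on_crash(self, now: float) -> Optional[float]:
        """Record a crash at time `now` (seconds).

        Returns how long to wait before restarting, or None when the server is
        crash-looping and should stay down for a human to investigate.
        """
        # Drop crashes that are older than the sliding window.
        self.crashes = [t for t in self.crashes if now - t < self.window]
        self.crashes.append(now)
        if len(self.crashes) > self.max_crashes:
            return None
        # Exponential backoff: 30 s, 60 s, 120 s, ... capped at max_delay.
        return min(self.base_delay * 2 ** (len(self.crashes) - 1), self.max_delay)
```

With the defaults, crashes at t = 0, 10 and 20 s give delays of 30, 60 and 120 s, and a fourth crash within the same hour returns None, so a repeatable fault stops the restart loop instead of hammering the machine for hours.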
Posts: 4 | Threads: 1 | Joined: Aug 2020 | Reputation: 0
Almost every modern database has some kind of binary or transaction log that makes it possible to restore a working state from a snapshot or from delayed replication. So this should be relatively safe: he would only need to roll back the transactions related to the server where the crash occurred to prevent the fault from immediately re-arising.
Also, information about the crash will be fully preserved, since a core dump contains a full dump of the allocated memory, the process state and the registers.
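For reference, on Linux a crashing process only writes a core file if its RLIMIT_CORE soft limit allows it, and where the file ends up is controlled by /proc/sys/kernel/core_pattern. Resource limits are inherited by child processes, so a supervisor can raise the limit for itself before launching the server. A minimal sketch; the function name is hypothetical. (gdb can also write a core file of a live or stopped process with its `gcore` command.)

```python
import resource

def enable_core_dumps():
    """Raise this process's core-file size soft limit to its hard limit.

    Processes started afterwards (e.g. via subprocess) inherit the new limit.
    Returns the resulting (soft, hard) pair.
    """
    _soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    # Raising the soft limit up to the hard limit needs no extra privileges.
    resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))
    return resource.getrlimit(resource.RLIMIT_CORE)
```

If the hard limit itself is 0, this cannot enable core dumps; raising the hard limit requires privileges (or a change in the service manager's configuration).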
Posts: 131 | Threads: 13 | Joined: Jun 2018 | Reputation: 7
08-23-2020, 11:43 PM
(This post was last modified: 08-23-2020, 11:44 PM by QuakeIV.)
It's true that you could probably deal with most of the downsides. A limited rollback (maybe 10 minutes or so) and perhaps a delay before restarting would both reduce the odds of a repeat failure and reduce the wear on the servers from repeated restarts.
Notably, a selective rollback of just the database entries the failed server was working on (rather than bringing everything down for a global rollback and then coming back up) might be challenging to implement, depending on how the backend works.
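To make the selective rollback idea concrete, here is a minimal sketch over a hypothetical transaction log in which every entry records which game server issued it and a numeric change to some value. It undoes only the crashed server's transactions from the last 10 minutes. The `Tx` record and its `server_id` field are assumptions: a database's binary log generally records data changes, not which game server caused them, unless the application stores that itself, which is part of why this can be hard in practice.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class Tx:
    tx_id: int
    server_id: str    # hypothetical: which game server issued the transaction
    timestamp: float  # commit time, in seconds
    key: str          # which value the transaction changed
    delta: int        # how much it changed that value by

def selective_rollback(log: List[Tx], state: Dict[str, int],
                       crashed_server: str, crash_time: float,
                       window: float = 600.0) -> List[int]:
    """Undo only the crashed server's transactions from the last `window` seconds.

    Mutates `state` in place and returns the IDs of the undone transactions.
    """
    victims = [tx for tx in log
               if tx.server_id == crashed_server
               and crash_time - window <= tx.timestamp <= crash_time]
    # Undo newest first, as a log replay would; for plain additive deltas the
    # order does not change the result, but it matters for general undo actions.
    for tx in sorted(victims, key=lambda t: t.timestamp, reverse=True):
        state[tx.key] -= tx.delta
    return sorted(tx.tx_id for tx in victims)
```

This sketch ignores cross-server dependencies: if another server already built on one of the undone changes, simply reversing the delta can leave the data inconsistent.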
Posts: 4 | Threads: 1 | Joined: Aug 2020 | Reputation: 0
08-24-2020, 10:36 AM
(This post was last modified: 08-24-2020, 10:38 AM by MEXAHOTABOP.)
Here the question to Haxus is about the implementation he uses and why he uses it.
There is a request to do something about it in general.
Posts: 4 | Threads: 1 | Joined: Aug 2020 | Reputation: 0
08-24-2020, 11:27 AM
(This post was last modified: 08-24-2020, 11:27 AM by MEXAHOTABOP.)
As I wrote before, a core dump contains all allocated memory and registers.
You will get exactly the same process in the same state; the only thing that will differ between the frozen process and the restored one is the system PID.