This page explains data loss in ROMA.
What is data loss?
ROMA has hundreds or thousands of buckets and stores data in them. ROMA divides the data among the buckets and makes it redundant bucket by bucket.
If several servers go down, some of the buckets stored on those servers are lost from the ROMA cluster. It is very unlikely that several servers go down at once, but if redundancy was not adequately recovered after an earlier failure, data loss can result.
So recovering redundancy is essential in operation.
ROMA's behavior when data is lost
Select ROMA's behavior on data loss by configuring the boot option.
Reference : Configuration - DEFAULT_LOST_ACTION
Coping with data loss
Recovering lost data
If you selected tc file as the data storage, you can recover the data in many cases.
When several machines go down and data is lost, that data disappears from ROMA and can no longer be accessed.
But you can recover the data from the tc files. This section explains how to recover data after data loss has occurred.
Confirm lost status
> stat routing.lost_vnodes
How to recover data after data loss
Note: If you use auto_assign mode and have not installed "roma-ruby-client", install it before executing the data loss recovery command.
(The client's root directory must be named "roma-ruby-client" and placed under the ROMA directory.)
The recovery method differs depending on ROMA's configuration (DEFAULT_LOST_ACTION).
- auto_assign mode (default)
Differences between [no_action], [shutdown] and [auto_assign]
The difference between no_action and auto_assign is whether the buckets that lost data can still be accessed. With no_action, access is rejected.
With auto_assign, on the other hand, lost buckets can be accessed, because a blank bucket is assigned in place of each lost bucket when data loss occurs.
The lost buckets may therefore hold new data that differs from the past data. For this reason, data recovery in auto_assign mode does not overwrite data that was written after the error occurred.
- Collect the data of the servers that went down.
- Merge the data (create new update data).
- Upload the update data to ROMA.
- (Rejoin the downed instances, if needed.)
1. Collect the data of the servers that went down.
2. Merge the tc files' data (create new update data).
After copying the data to one machine, create the update data with the command below.
The arguments are [digest bit count], [bucket bit count], [tc file count of each instance], [TC directory path 1], [TC directory path 2], [output path].
$ bin/mkrecent [digest bit count] [bucket bit count] [tc file count] [directory path containing data files (1)] [directory path containing data files (2)] [directory path to create for the new data]
Note: Each TC directory path must contain the roma directory (./roma/*.tc) directly below it.
3. Upload the update data to ROMA.
Upload the merged data file to a working ROMA server, then execute the following command.
$ bin/recoverlost [host name] [port No.] [merged data path] [time at which the error occurred]
In auto_assign mode, the routing table is updated automatically even when data is lost, so it looks as if no buckets were lost.
If you specify the time at which the loss occurred, ROMA can search the change history and identify the lost buckets.
So do not set this time carelessly. If data loss also occurred in the past, unrelated buckets may be uploaded.
4. (Rejoin the downed instances, if needed.)
Reference : Roma reconstruct & reboot
You can look up the numeric arguments with the stat command.
> stat routing.dgst_bits # digest bit count
> stat routing.div_bits # bucket bit count
> stat storages[roma].storage.divnum # tc file count
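As a rough sketch of how these three values feed the merge step, the helper below reads saved stat output and prints (without running) the matching bin/mkrecent command. The function names stat_value and print_mkrecent are hypothetical, and the sketch assumes the stat reply consists of "name value" lines, possibly with CRLF endings; that reply format is an assumption, not something this page confirms.

```shell
# Hypothetical helper: pull one value out of saved stat output.
# Assumes "name value" lines; that format is an assumption.
stat_value() {  # usage: stat_value KEY < stat_output
  tr -d '\r' | awk -v key="$1" '$1 == key { print $2; exit }'
}

# Print (do not run) the mkrecent command for two TC directories.
# The subshell body keeps the working variables out of the caller's shell.
print_mkrecent() (  # usage: print_mkrecent STAT_FILE DIR1 DIR2 OUT_DIR
  dgst=$(stat_value 'routing.dgst_bits' < "$1")
  div=$(stat_value 'routing.div_bits' < "$1")
  num=$(stat_value 'storages[roma].storage.divnum' < "$1")
  echo "bin/mkrecent $dgst $div $num $2 $3 $4"
)
```

Printing the command instead of executing it leaves room to review the values before any data is merged.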
Note: mkrecent supports only 2 directories.
If you have 3 or more directories to merge, repeat this step until only one remains.
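The repeated two-directory merge can be planned as a simple chain: merge the first two directories, then merge each remaining directory into the previous result. The sketch below only prints the bin/mkrecent commands in order. The function name plan_merges and the intermediate paths WORK_DIR/merge_1, merge_2, ... are hypothetical, and it assumes that the output of one run can be passed as an input directory to the next run, which you should verify against the layout note above.

```shell
# Print (do not run) the chain of mkrecent calls that folds N TC directories
# into one. WORK_DIR/merge_N are hypothetical intermediate output paths.
plan_merges() (  # usage: plan_merges DGST_BITS DIV_BITS DIVNUM WORK_DIR DIR1 DIR2 [DIR3 ...]
  dgst=$1; div=$2; num=$3; work=$4
  shift 4
  [ "$#" -ge 2 ] || { echo "need at least two directories" >&2; exit 1; }
  acc=$1; shift            # running result, starts as the first directory
  i=1
  for dir in "$@"; do
    out="$work/merge_$i"
    echo "bin/mkrecent $dgst $div $num $acc $dir $out"
    acc=$out               # the next merge takes this output as its first input
    i=$((i + 1))
  done
)
```

With three directories this prints two commands; the output path of the last one is the data to upload.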
- no_action or shutdown mode
Differences between [no_action] and [shutdown]
- With no_action, ROMA keeps working.
- With shutdown, ROMA shuts down all processes.
Note: With this recovery, some of ROMA's data may roll back to a previous state (before the error occurred).
If you do not want that, use auto_assign mode.
- Kill the surviving ROMA processes with "balse". (Not needed in shutdown mode.)
- Roll back the routing files.
- Reboot all instances.
1. Kill the surviving ROMA processes with "balse". (Not needed in shutdown mode.)
2. Roll back the routing files.
Why is the rollback needed?
- If error information remains in the routing tables right after the error occurs, ROMA reboots with quotafree buckets.
- To boot in the initial state, the routing files must be rolled back to the point at which the instances went down.
ROMA's routing data is stored in *.route.
If the routing table changes while ROMA is running, the differences between routings are stored in new *.routing files.
*.routing
*.routing.1 # new file created after the error occurred
The routing rollback is completed by deleting the files that were created after the first error occurred.
Note: This operation must be executed on all instances.
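Before deleting anything, it can help to list the candidate files first. The sketch below prints routing files whose modification time is later than a cutoff, so they can be reviewed by hand; it never deletes. The function name routing_files_after is hypothetical, the file name pattern is passed in because it depends on your installation, and modification time is used as a stand-in for creation time, which is an assumption.

```shell
# List routing files modified after a cutoff, for review before deletion.
# CUTOFF uses the `touch -t` format (YYYYMMDDhhmm). PATTERN is an assumption
# about your file naming. Nothing is deleted here.
routing_files_after() (  # usage: routing_files_after DIR CUTOFF PATTERN
  ref=$(mktemp) || exit 1
  trap 'rm -f "$ref"' EXIT           # remove the reference file on exit
  touch -t "$2" "$ref" || exit 1     # reference file stamped with the cutoff
  find "$1" -type f -name "$3" -newer "$ref" | sort
)
```

For example, `routing_files_after ./routing 202401011200 '*.routing*'` (directory, time and pattern are placeholders) lists the files changed after noon on that day.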
3. Reboot all instances.
Reboot all instances.
If "routing.lost_vnodes" becomes 0, the recovery has succeeded!
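When several instances need checking, the success condition can be scripted. This is a minimal sketch, assuming the reply to `stat routing.lost_vnodes` has been saved to a file and consists of "name value" lines (possibly with CRLF endings); the function name no_lost_vnodes is hypothetical and the reply format is an assumption.

```shell
# Hypothetical helper: succeed only if a saved `stat routing.lost_vnodes`
# reply reports 0. Assumes "name value" lines; that format is an assumption.
no_lost_vnodes() {  # usage: no_lost_vnodes < stat_reply.txt
  tr -d '\r' | awk '
    $1 == "routing.lost_vnodes" { seen = 1; zero = ($2 == 0) }
    END { if (seen && zero) exit 0; exit 1 }'
}
```

The function returns a non-zero status both when buckets are still lost and when the expected line is missing, so an unreadable reply is never mistaken for success.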