Reference for ROMA
Architectures
Outline of ROMA
ROMA (Rakuten On-Memory Architecture) is a distributed key-value store implemented in Ruby. It consists of multiple processes running on several servers, which together form one large data store. ROMA provides fast access to large amounts of data.
Fault Tolerance
ROMA adopts a pure P2P architecture, which gives it strong fault tolerance.
- Data stored in ROMA is unlikely to be lost.
- ROMA consists of multiple machines and automatically replicates data across them over the network.
- If one machine breaks down, the data it stored still exists on another machine. ROMA prevents data loss by automatically referring to the replicated data on that other machine.
Dynamic scaling of the data store
Machines can join or leave ROMA automatically, without stopping ROMA.
- A ROMA operator can add a new machine to a running ROMA cluster when the data store needs to grow.
- Adding a new machine to ROMA is a simple operation.
Memcached compatible protocol
ROMA operators and application developers can communicate with ROMA using a memcached-compatible protocol.
- ROMA operators can connect to a ROMA cluster via telnet and check its status.
- Application engineers can get data from and put data into ROMA with the memcached client they already use.
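As a rough illustration of what a memcached-compatible exchange looks like on the wire, the sketch below builds `set` and `get` requests and parses a `get` reply in the standard memcached text format. The helper names (`build_set`, `build_get`, `parse_get_reply`) are hypothetical and not part of ROMA, and no network connection is made.

```ruby
# Hypothetical helpers illustrating the memcached text protocol.
# They only build and parse strings; nothing is sent to a ROMA node.

# Request format: set <key> <flags> <exptime> <bytes>\r\n<data>\r\n
def build_set(key, value, flags: 0, exptime: 0)
  "set #{key} #{flags} #{exptime} #{value.bytesize}\r\n#{value}\r\n"
end

# Request format: get <key>\r\n
def build_get(key)
  "get #{key}\r\n"
end

# Parses a reply of the form "VALUE <key> <flags> <bytes>\r\n<data>\r\nEND\r\n".
# Returns the data string, or nil when the key was not found ("END\r\n").
def parse_get_reply(reply)
  header, rest = reply.split("\r\n", 2)
  return nil if header == "END"

  _tag, _key, _flags, bytes = header.split(" ")
  rest.byteslice(0, bytes.to_i)
end

puts build_set("foo", "bar").inspect
puts parse_get_reply("VALUE foo 0 3\r\nbar\r\nEND\r\n").inspect
```

Typing the same `get foo` line into a telnet session connected to a node is the manual equivalent of `build_get`.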
Memcached client and ROMA client
The chart below shows how users access ROMA through each type of client.
- Memcached client
When the memcached client wants to get the value of a key:
- The client accesses an arbitrary process (called a node) among those that make up ROMA.
- That node, called the coordinator, queries the node that actually holds the value.
- The node that holds the value retrieves the data and returns it to the client.
- ROMA client
- The ROMA client obtains the list of nodes that make up ROMA in advance.
- The ROMA client directly accesses the node that actually holds the value.
- The node that holds the value retrieves the data and returns it to the client.
- When using the memcached client
- The request takes one more hop, so it takes longer, because the client asks an arbitrary node in order to get the desired data.
- The memcached client does not implement any fault-tolerance handling. If it is needed, the application developer has to implement it.
With the ROMA client, when a failure occurs the client can detect it, so the application developer does not need to handle ROMA failures.
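The difference in hop count between the two access paths can be sketched with a toy model. Everything here is an assumption made for illustration: the `ToyNode` and `ToyCluster` classes are hypothetical, and ownership is decided by a CRC32 hash modulo the node count, which is a simplification and not ROMA's real routing.

```ruby
require "zlib"

# A toy node that stores values in a local Hash.
class ToyNode
  attr_reader :name, :store

  def initialize(name)
    @name = name
    @store = {}
  end
end

# Toy cluster; the owner of a key is chosen by hash modulo node count
# (an assumption for this sketch, not ROMA's actual algorithm).
class ToyCluster
  attr_reader :nodes

  def initialize(names)
    @nodes = names.map { |n| ToyNode.new(n) }
  end

  def owner_of(key)
    @nodes[Zlib.crc32(key) % @nodes.size]
  end

  def put(key, value)
    owner_of(key).store[key] = value
  end

  # memcached-style access: the client contacts an arbitrary node (the
  # coordinator), which asks the owner when it does not hold the key itself.
  # Returns [value, number_of_hops].
  def get_via_coordinator(key, coordinator)
    owner = owner_of(key)
    hops = coordinator.equal?(owner) ? 1 : 2
    [owner.store[key], hops]
  end

  # ROMA-client-style access: the client already has the node list,
  # so it contacts the owner directly.
  def get_direct(key)
    [owner_of(key).store[key], 1]
  end
end

cluster = ToyCluster.new(%w[node_a node_b node_c])
cluster.put("user:1", "alice")
owner = cluster.owner_of("user:1")
other = cluster.nodes.find { |n| !n.equal?(owner) }

value, hops = cluster.get_via_coordinator("user:1", other)
direct_value, direct_hops = cluster.get_direct("user:1")
puts "via coordinator: #{value} (hops: #{hops})"
puts "direct:          #{direct_value} (hops: #{direct_hops})"
```

The sketch counts hops only; it does not model the failure detection that the ROMA client provides.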

Plug-in Architecture
- ROMA has a plug-in architecture for extending its functionality.
It is possible to extend ROMA without changing its source code by providing plug-ins.
- ROMA's storage can be changed.
- ROMA can switch its storage system from a Ruby Hash to another backend.
- Required performance and reliability differ for each application, so we recommend selecting a storage system that suits the target application.
- For example, if access speed matters more to you than reliability, choose a Ruby Hash as the ROMA storage. On the other hand, if reliability matters more and you store a lot of data, an ordinary DBMS is the better choice for the ROMA storage.
- These storage options can be set in config.rb.
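The idea of a swappable storage backend can be sketched as follows. This is not ROMA's plug-in API or its config.rb syntax: the classes `HashStorage` and `FileStorage`, the `build_storage` helper, and the `:hash` / `:file` symbols are all hypothetical names chosen for this example. The file-backed class merely stands in for a durable backend such as a DBMS.

```ruby
require "tmpdir"

# Hypothetical storage interface: every backend responds to set and get.

# Fast but volatile storage backed by a Ruby Hash.
class HashStorage
  def initialize
    @data = {}
  end

  def set(key, value)
    @data[key] = value
  end

  def get(key)
    @data[key]
  end
end

# Slower storage that writes each value to a file under dir.
# It stands in for a durable backend such as a DBMS.
class FileStorage
  def initialize(dir)
    @dir = dir
  end

  def set(key, value)
    File.binwrite(path_for(key), value)
    value
  end

  def get(key)
    path = path_for(key)
    File.exist?(path) ? File.binread(path) : nil
  end

  private

  # Hex-encode the key so that any key maps to a safe file name.
  def path_for(key)
    File.join(@dir, key.unpack1("H*"))
  end
end

# Hypothetical selection step, loosely analogous to choosing a storage
# option in a configuration file.
def build_storage(kind, dir: nil)
  case kind
  when :hash then HashStorage.new
  when :file then FileStorage.new(dir)
  else raise ArgumentError, "unknown storage: #{kind}"
  end
end

Dir.mktmpdir do |dir|
  [build_storage(:hash), build_storage(:file, dir: dir)].each do |storage|
    storage.set("greeting", "hello")
    puts "#{storage.class}: #{storage.get('greeting')}"
  end
end
```

Because both classes expose the same two methods, the calling code does not change when the backend is swapped, which is the property the plug-in design relies on.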
ROMA Architectures
Pure P2P
ROMA is made up of ROMA processes running on multiple servers. The processes cooperate over the network using a pure P2P model. The benefit of P2P is that the participating processes can autonomously construct an overlay network. "Autonomously" means that the processes in the network can themselves recognize and manage other processes joining or dropping out. ROMA builds its hash table on this network.
Because it uses the P2P model, ROMA does not need a single master server or a centralized index server. Therefore, if a process or machine crashes, ROMA separates the broken server from the network and takes over its data. ROMA also prevents client accesses from concentrating on one server.
Consistent Hashing and Virtual Nodes
Conceptually, ROMA arranges the ROMA processes running on multiple servers on a one-dimensional, ring-shaped hash space. This makes it easy for ROMA to manage processes joining or dropping out.
Each ROMA process has multiple data regions called "virtual nodes". Because each process holds its data in these smaller segments, ROMA can manage the processes more easily. In addition, it reduces bias in how the data is distributed.
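A minimal sketch of a consistent-hash ring with virtual nodes is shown below. It illustrates the general technique only. The `HashRing` class, the use of SHA-1 for ring positions, and the figure of 64 virtual nodes per process are assumptions made for this example, not details of ROMA's implementation.

```ruby
require "digest"

# Toy consistent-hash ring. Each process is placed on the ring many times
# (once per virtual node) to spread its share of the key space.
class HashRing
  def initialize(vnodes_per_process: 64)
    @vnodes_per_process = vnodes_per_process
    @ring = {}    # ring position => process name
    @sorted = []  # sorted ring positions
  end

  def add(process)
    @vnodes_per_process.times do |i|
      @ring[position("#{process}:#{i}")] = process
    end
    @sorted = @ring.keys.sort
  end

  def remove(process)
    @ring.delete_if { |_pos, owner| owner == process }
    @sorted = @ring.keys.sort
  end

  # The owner of a key is the first virtual node at or after the key's
  # position, wrapping around to the start of the ring.
  def owner_of(key)
    return nil if @sorted.empty?

    pos = position(key)
    idx = @sorted.bsearch_index { |p| p >= pos } || 0
    @ring[@sorted[idx]]
  end

  private

  # Map a string to a 32-bit position on the ring.
  def position(str)
    Digest::SHA1.hexdigest(str)[0, 8].to_i(16)
  end
end

ring = HashRing.new
%w[node_a node_b node_c].each { |p| ring.add(p) }

keys = (1..1000).map { |i| "key-#{i}" }
before = keys.to_h { |k| [k, ring.owner_of(k)] }

ring.add("node_d")
moved = keys.reject { |k| ring.owner_of(k) == before[k] }
puts "#{moved.size} of #{keys.size} keys changed owner after adding node_d"
```

When a process joins, only the keys that fall into its new virtual nodes change owner; every other key stays where it was. That is why participation and dropout are cheap in this scheme.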
Data Replication
To increase fault tolerance, ROMA places replicas of the data on multiple servers. This helps prevent the service from going down because of data loss if one server stops.
ROMA can detect the failure of a server when it occurs and cut that server off. It then automatically takes over client access to the data that was stored on that server.
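The replication and takeover behaviour described above can be sketched with a toy store. The `ReplicatedStore` class, the choice of two copies, and the rule of placing copies on consecutive nodes are assumptions made for illustration. Failure is simulated by marking a node as down rather than detected over a network.

```ruby
require "zlib"

# Toy replicated store: each key is written to `copies` consecutive nodes.
class ReplicatedStore
  def initialize(node_names, copies: 2)
    @names = node_names
    @copies = copies
    @data = node_names.to_h { |n| [n, {}] } # node name => local Hash
    @down = []                              # names of failed nodes
  end

  # Nodes responsible for a key, primary first.
  def replicas_for(key)
    start = Zlib.crc32(key) % @names.size
    (0...@copies).map { |i| @names[(start + i) % @names.size] }
  end

  # Write the value to every live replica.
  def set(key, value)
    replicas_for(key).each do |n|
      @data[n][key] = value unless @down.include?(n)
    end
    value
  end

  # Read from the first live replica, so a failed primary is skipped.
  def get(key)
    live = replicas_for(key).find { |n| !@down.include?(n) }
    live && @data[live][key]
  end

  # Simulate a node failure.
  def fail!(name)
    @down << name
  end
end

store = ReplicatedStore.new(%w[node_a node_b node_c])
store.set("cart:42", "3 items")

primary = store.replicas_for("cart:42").first
store.fail!(primary)
puts "after #{primary} failed: #{store.get('cart:42').inspect}"
```

The read still succeeds after the primary is marked down because the second copy answers in its place.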
Eventual Consistency (Replica Synchronization)
It is very difficult for ROMA to ensure consistency among data replicas. In a distributed system, strictly guaranteeing data consistency at every moment is impractical. Instead, ROMA regularly synchronizes the replicas to keep their data consistent.
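One common way to make replicas converge is a periodic pass that keeps the newest version of each key. The sketch below shows that idea. The source does not say how ROMA orders writes or reconciles replicas, so the version numbers (simple logical clocks), the last-write-wins rule, and the `sync_replicas!` helper are all assumptions made for this example.

```ruby
# Each replica maps key => [version, value]. Versions are hypothetical
# logical clocks; a higher version means a more recent write.
def sync_replicas!(a, b)
  (a.keys | b.keys).each do |key|
    # Pick the entry with the highest version among the copies that exist.
    newest = [a[key], b[key]].compact.max_by { |version, _value| version }
    a[key] = newest
    b[key] = newest
  end
end

primary = { "stock" => [2, "8 left"], "price" => [1, "100"] }
replica = { "stock" => [1, "9 left"] }

sync_replicas!(primary, replica)
puts replica.inspect
```

Between two synchronization passes the replicas may disagree, which is exactly the window that eventual consistency tolerates.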