`
mryufeng
  • 浏览: 968540 次
  • 性别: Icon_minigender_1
  • 来自: 广州
社区版块
存档分类
最新评论

mnesia 分布协调的几个细节

阅读更多
这3点描述了大概mnesia是如何运作的,多读几遍!

6.6 Loading of Tables at Start-up

At start-up Mnesia loads tables in order to make them accessible for its applications. Sometimes Mnesia decides to load all tables that reside locally, and sometimes the tables may not be accessible until Mnesia brings a copy of the table from another node.

To understand the behavior of Mnesia at start-up it is essential to understand how Mnesia reacts when it loses contact with Mnesia on another node. At this stage, Mnesia cannot distinguish between a communication failure and a "normal" node down.
When this happens, Mnesia will assume that the other node is no longer running. Whereas, in reality, the communication between the nodes has merely failed.

To overcome this situation, simply try to restart the ongoing transactions that are accessing tables on the failing node, and write a mnesia_down entry to a log file.

At start-up, it must be noted that all tables residing on nodes without a mnesia_down entry, may have fresher replicas. Their replicas may have been updated after the termination of Mnesia on the current node. In order to catch up with the latest updates, transfer a copy of the table from one of these other "fresh" nodes. If you are unlucky, other nodes may be down and you must wait for the table to be loaded on one of these nodes before receiving a fresh copy of the table.

Before an application makes its first access to a table, mnesia:wait_for_tables(TabList, Timeout) ought to be executed to ensure that the table is accessible from the local node. If the function times out the application may choose to force a load of the local replica with mnesia:force_load_table(Tab) and deliberately lose all updates that may have been performed on the other nodes while the local node was down. If Mnesia already has loaded the table on another node or intends to do so, we will copy the table from that node in order to avoid unnecessary inconsistency.

Warning
Keep in mind that it is only one table that is loaded by mnesia:force_load_table(Tab) and since committed transactions may have caused updates in several tables, the tables may now become inconsistent due to the forced load.

The allowed AccessMode of a table may be defined to either be read_only or read_write. And it may be toggled with the function mnesia:change_table_access_mode(Tab, AccessMode) in runtime. read_only tables and local_content tables will always be loaded locally, since there are no need for copying the table from other nodes. Other tables will primary be loaded remotely from active replicas on other nodes if the table already has been loaded there, or if the running Mnesia already has decided to load the table there.

At start up, Mnesia will assume that its local replica is the most recent version and load the table from disc if either situation is detected:

mnesia_down is returned from all other nodes that holds a disc resident replica of the table; or,
if all replicas are ram_copies
This is normally a wise decision, but it may turn out to be disastrous if the nodes have been disconnected due to a communication failure, since Mnesia's normal table load mechanism does not cope with communication failures.

When Mnesia is loading many tables the default load order. However, it is possible to affect the load order by explicitly changing the load_order property for the tables, with the function mnesia:change_table_load_order(Tab, LoadOrder). The LoadOrder is by default 0 for all tables, but it can be set to any integer. The table with the highest load_order will be loaded first. Changing load order is especially useful for applications that need to ensure early availability of fundamental tables. Large peripheral tables should have a low load order value, perhaps set below 0.

6.7 Recovery from Communication Failure

There are several occasions when Mnesia may detect that the network has been partitioned due to a communication failure.

One is when Mnesia already is up and running and the Erlang nodes gain contact again. Then Mnesia will try to contact Mnesia on the other node to see if it also thinks that the network has been partitioned for a while. If Mnesia on both nodes has logged mnesia_down entries from each other, Mnesia generates a system event, called {inconsistent_database, running_partitioned_network, Node} which is sent to Mnesia's event handler and other possible subscribers. The default event handler reports an error to the error logger.

Another occasion when Mnesia may detect that the network has been partitioned due to a communication failure, is at start-up. If Mnesia detects that both the local node and another node received mnesia_down from each other it generates a {inconsistent_database, starting_partitioned_network, Node} system event and acts as described above.

If the application detects that there has been a communication failure which may have caused an inconsistent database, it may use the function mnesia:set_master_nodes(Tab, Nodes) to pinpoint from which nodes each table may be loaded.

At start-up Mnesia's normal table load algorithm will be bypassed and the table will be loaded from one of the master nodes defined for the table, regardless of potential mnesia_down entries in the log. The Nodes may only contain nodes where the table has a replica and if it is empty, the master node recovery mechanism for the particular table will be reset and the normal load mechanism will be used when next restarting.

The function mnesia:set_master_nodes(Nodes) sets master nodes for all tables. For each table it will determine its replica nodes and invoke mnesia:set_master_nodes(Tab, TabNodes) with those replica nodes that are included in the Nodes list (i.e. TabNodes is the intersection of Nodes and the replica nodes of the table). If the intersection is empty the master node recovery mechanism for the particular table will be reset and the normal load mechanism will be used at next restart.

The functions mnesia:system_info(master_node_tables) and mnesia:table_info(Tab, master_nodes) may be used to obtain information about the potential master nodes.

The function mnesia:force_load_table(Tab) may be used to force load the table regardless of which table load mechanism is activated.

6.8 Recovery of Transactions

A Mnesia table may reside on one or more nodes. When a table is updated, Mnesia will ensure that the updates will be replicated to all nodes where the table resides. If a replica happens to be inaccessible for some reason (e.g. due to a temporary node down), Mnesia will then perform the replication later.

On the node where the application is started, there will be a transaction coordinator process. If the transaction is distributed, there will also be a transaction participant process on all the other nodes where commit work needs to be performed.

Internally Mnesia uses several commit protocols. The selected protocol depends on which table that has been updated in the transaction. If all the involved tables are symmetrically replicated, (i.e. they all have the same ram_nodes, disc_nodes and disc_only_nodes currently accessible from the coordinator node), a lightweight transaction commit protocol is used.

The number of messages that the transaction coordinator and its participants needs to exchange is few, since Mnesia's table load mechanism takes care of the transaction recovery if the commit protocol gets interrupted. Since all involved tables are replicated symmetrically the transaction will automatically be recovered by loading the involved tables from the same node at start-up of a failing node. We do not really care if the transaction was aborted or committed as long as we can ensure the ACID properties. The lightweight commit protocol is non-blocking, i.e. the surviving participants and their coordinator will finish the transaction, regardless of some node crashes in the middle of the commit protocol or not.

If a node goes down in the middle of a dirty operation the table load mechanism will ensure that the update will be performed on all replicas or none. Both asynchronous dirty updates and synchronous dirty updates use the same recovery principle as lightweight transactions.

If a transaction involves updates of asymmetrically replicated tables or updates of the schema table, a heavyweight commit protocol will be used. The heavyweight commit protocol is able to finish the transaction regardless of how the tables are replicated. The typical usage of a heavyweight transaction is when we want to move a replica from one node to another. Then we must ensure that the replica either is entirely moved or left as it was. We must never end up in a situation with replicas on both nodes or no node at all. Even if a node crashes in the middle of the commit protocol, the transaction must be guaranteed to be atomic. The heavyweight commit protocol involves more messages between the transaction coordinator and its participants than a lightweight protocol and it will perform recovery work at start-up in order to finish the abort or commit work.

The heavyweight commit protocol is also non-blocking, which allows the surviving participants and their coordinator to finish the transaction regardless (even if a node crashes in the middle of the commit protocol). When a node fails at start-up, Mnesia will determine the outcome of the transaction and recover it. Lightweight protocols, heavyweight protocols and dirty updates, are dependent on other nodes to be up and running in order to make the correct heavyweight transaction recovery decision.

If Mnesia has not started on some of the nodes that are involved in the transaction AND neither the local node or any of the already running nodes know the outcome of the transaction, Mnesia will by default wait for one. In the worst case scenario all other involved nodes must start before Mnesia can make the correct decision about the transaction and finish its start-up.

This means that Mnesia (on one node)may hang if a double fault occurs, i.e. when two nodes crash simultaneously and one attempts to start when the other refuses to start e.g. due to a hardware error.

It is possible to specify the maximum time that Mnesia will wait for other nodes to respond with a transaction recovery decision. The configuration parameter max_wait_for_decision defaults to infinity (which may cause the indefinite hanging as mentioned above) but if it is set to a definite time period (eg.three minutes), Mnesia will then enforce a transaction recovery decision if needed, in order to allow Mnesia to continue with its start-up procedure.

The downside of an enforced transaction recovery decision, is that the decision may be incorrect, due to insufficient information regarding the other nodes' recovery decisions. This may result in an inconsistent database where Mnesia has committed the transaction on some nodes but aborted it on others.

In fortunate cases the inconsistency will only appear in tables belonging to a specific application, but if a schema transaction has been inconsistently recovered due to the enforced transaction recovery decision, the effects of the inconsistency can be fatal. However, if the higher priority is availability rather than consistency, then it may be worth the risk.

If Mnesia encounters a inconsistent transaction decision a {inconsistent_database, bad_decision, Node} system event will be generated in order to give the application a chance to install a fallback or other appropriate measures to resolve the inconsistency. The default behavior of the Mnesia event handler is the same as if the database became inconsistent as a result of partitioned network (see above).
分享到:
评论
2 楼 mozhenghua 2013-06-13  
http://www.erlang.org/doc/apps/mnesia/Mnesia_chap7.html#id77819
1 楼 mryufeng 2009-03-23  
mnesia的节点间是平等的 不像其他数据master/slave模式 这样注定有3个好处:

1. 数据可以从任意节点访问, 就近访问, 提高系统的反应速度,同时达到load balance。
2. 数据可以冗余,对抗节点失败。
3. 各个节点拥有的数据可以不同.

相关推荐

    Mnesia User's Guide

    session, specify a Mnesia database directory, initialize a database schema, start Mnesia, and create tables. Initial prototyping of record definitions is also discussed. • Build a Mnesia Database ...

    amnesia:失忆备忘录

    当前版本的话,每个任务可以选择当天或者之后完成,也可以为每个任务打标签 目前我已经迭代了多次,当然了因为没有用户,所以我是根据我自己的用户需求做变更的。二、为什么开源最最最最主要的原因还是因为小程序不...

    Api-Social-Amnesia.zip

    Api-Social-Amnesia.zip,忘记过去。社交健忘症确保你的社交媒体帐户只显示你最近的历史,而不是5年前“那个阶段”的帖子。,一个api可以被认为是多个软件设备之间通信的指导手册。例如,api可用于web应用程序之间的...

    Chrome Amnesia-crx插件

    语言:English (United States) 遗忘的延伸 Chrome失忆症是一个Chrome扩展程序,可让您有选择地不记得自己的任何浏览历史记录。...有关更多信息,请访问https://github.com/DanielBok/chrome-amnesia。

    Mnesia用户手册

    Mnesia用户手册Mnesia用户手册

    Amnesia-开源

    失忆症是一种提醒,允许您定义警报,贴纸(贴子)以提醒您一些重要的内容以及有关所需内容的注释。 可以将警报编程为在给定时间显示,可以在桌面上放置贴纸以随时查看。

    erlang——Mnesia用户手册.pdf

    8.附录.A:Mnesia.错误信息 8.1.Mnesia.中的错误 9.附录.B:备份回调函数接口 9.1.Mnesia.备份回调行为 10.附录.C:作业存取回调接口 10.1.Mnnesia.存取回调行为 11.附录.D:分片表哈希回调接口 11.1....

    Mnesia用户手册.zip

    Mnesia是一个分布式数据库管理系统(DBMS),适合于电信和其它需要持续运行和具备软实时 特性的Erlang应用。

    Mnesia用户手册.pdf

    Mnesia用户手册.pdf

    Amnesia

    Amnesia

    mnesia数据库文档

    erlang系统自带的数据库mnesia的官方文档。

    Mnesia用户手册(docx版)

    Mnesia用户手册(docx版) 详细讲解Mnesia数据库操作

    Mnesia用户手册(PDF版本)

    Mnesia用户手册(PDF版本) 详细讲述Mnesia数据库操作。

    Mnesia table fragmentation 过程及算法分析

    Mnesia table fragmentation 过程及算法分析。erlang就算在64位下dets的空间限制仍旧是2g,同样影响了mnesia,如果有更大需求,就必须使用Mnesia的 table fragmentation 技术

    amnesia-开源

    AMNESIA是一个Erlang库,提供了用于连接关系DBMS的抽象层。 它允许设计人员使用本机Erlang类型和语言构造将关系数据库集成到Erlang程序中。

    Erlang Mnesia

    Mnesia is a distributed Database Management System, appropriate for telecommunications applications and other Erlang applications which require continuous operation and soft real-time properties. It ...

    Mnesia 用户手册中文版 pdf

    Mnesia 用户手册中文版 pdf,把市面上的doc转成pdf,并添加重要章节的书签,细节并不完美,请见谅。

    Mnesia用户手册 4.4.10版.rar

    Mnesia是一个分布式数据库管理系统(DBMS),适合于电信和其它需要持续运行和具备软实时特性的Erlang应用。 目 录 1 、介绍 . . .. . .. . . .. . 4 1.1 关于 Mnesia . . .. . .. . . .. . 4 1.2 Mnesia ...

Global site tag (gtag.js) - Google Analytics