[DB] introduction to MYSQL Cluster
MySQL Cluster is a new technology to enable clustering of in-memory databases in a share-nothing system. The share-nothing architecture allows the system to work with very inexpensive hardware, without any specific requirement on hardware or software. It also does not have any single point of failure since each component has its own memory and disk.
MySQL Cluster is an integration of the standard MySQL server with an in-memory clustered storage engine, called NDB. In the rest of this document, the term NDB refers to the storage engine specific part of the setup, whereas MySQL Cluster refers to the combination of MySQL and the new storage engine.
This article is intended for anyone who wants to get MySQL Cluster up and running and who is already familiar with MySQL server. For this reason, the configuration and details of NDB is covered in much more detail than the MySQL server configuration.
The cluster storage engine
NDB is an in-memory storage engine with many built-in high-availability and data-persistence features. NDB can, although this requires extensive knowledge, be used as a database system in its own right, supporting the traditional relational data model, as well as full ACID transactions.
The storage engine can be configured with many fail-over and load-balancing options, but the easiest level to start at is looking at the storage engine at the cluster level. The NDB storage engine of MySQL Cluster contains a complete set of data, dependent only on other data within the cluster itself.
A MySQL Cluster may replicate clustered data to other MySQL Clusters. But this is a complex configuration, and before getting to that point, let’s look at how a single MySQL Cluster is used, and that is what this article describes — how to set up a single MySQL Cluster consisting of an NDB storage engine and some MySQL servers.
The cluster part of MySQL Cluster is currently configured independently from the MySQL servers. In an MySQL Cluster each part of the cluster is considered to be a node. (A node in many contexts is often a computer, but for MySQL Cluster it is a process. There can be any number of nodes on a single computer.) Each node has a type, and there can be multiple nodes in the MySQL Cluster of each type. In a minimal MySQL Cluster configuration, there will be at least three nodes:
The management (MGM) node. The role of this type of node is to manage the other nodes within the MySQL Cluster, such as providing configuration data, starting and stopping nodes, running backup etc. As this node type manages the configuration of the other nodes, a node of this type must always be started first, before any other node. With a running cluster, the MGM node does necessarily have to be running all the time.
The storage or database (DB) node. This is the type of node that manages and stores the database itself. There are as many DB nodes as you have replicas times the number of fragments. That is, with two fragments, each with two replicas, you need four DB nodes. Note that it is not necessary to have more than one replica, so a minimal MySQL Cluster may contain just one DB node.
The client (API) node. This is the client node that will access the cluster, and in the case of MySQL Cluster, the MySQL daemon is the client. If you have applications that use the NDB API directly, then these are considered API nodes too.
The example configuration of these nodes will follow shortly, but it is much less difficult than one might think. The reason for this is that the only node that needs extensive configuration is the MGM node or nodes, all the other nodes need only be configured to access the MGM node, all other configuration data will then be fetched from the MGM node.
The MySQL server within the MySQL Cluster
The MySQL server that is part of MySQL Cluster is different in only one main area from what we are used to, it has an additional storage engine (NDB or NDBCLUSTER), which is initially disabled.
Except for this, the MySQL server is not much different than what we are used to from previous MySQL releases, except any other new 4.1 features, of course. As default, the MySQL is configured with the NDB storage engine disabled; to enable it you need to modify my.cnf.
Also, as the MySQL daemon is an API client to the NDB storage engine, the minimal configuration data needed to access the MGM node from the MySQL server must be set. When this is done, then all MGM nodes (it is enough with one) and DB nodes must be up and running before starting the MySQL server.
This may sound complicated, but once you have set it up once, you’ll find that it is not really that complex. In fact, the most complex part of the configuration is the MGM node — the rest is very minimal.
Getting and building the software
Before continuing with configuring MySQL Cluster, let us have a quick look at how to get and build the software. If you have done this before, or have a binary distribution of MySQL Cluster, then you can skip this section. A binary distribution will be available soon, but before that is in place, one need to download the source code using BitKeeper and build it.
All examples here are from building MySQL Cluster on a PC with a virgin SuSE Linux 9.0 installation. Building MySQL Cluster on other Linux systems should be similar. If you do not already have BitKeeper installed, now is the time to get it from http://www.bitkeeper.com.
Create a directory to hold the source code and then get the source tree from BitKeeper (in BitKeeper lingo this is called to clone):
/home/karlsson> mkdir MySQL
/home/karlsson> cd MySQL
/home/karlsson/MySQL> mkdir Cluster
/home/karlsson/MySQL> cd Cluster
/home/karlsson/MySQL/Cluster> bk clone bk://mysql.bkbits.net/mysql-4.1
This will take some time, and when done, you will have a directory under the Cluster directory that you just created, called mysql-4.1. Before building the software, make sure that you have the appropriate versions of the tools and libraries used by the build. Note that some tools need different versions than are typically used when building MySQL. See the table below for details.
Table 1: Build tools and libraries used in example build Tool/library Version Comments
libncurses 5.2.2 Used by some of the command-line tools.
Make 3.79.1
Gawk 3.1.0 Some Linux distributions come with mawk instead. This is the case with Debian, for example. This will NOT work.
Autoconf 2.56 Very important to have the right version here, at least 2.5x.
Automake 1.7.6 Also very important, some buildfiles rely on recent functionality and absence of bugs. Having the wrong version will cause strange build errors, not immediately related to automake.
Libtool 1.5.6 Again, the version is important. Strange error occurs if very old versions are used.
Gcc 2.95.4
Bison 1.75 The build will fail if an old version of Bison is used, the table-size in the SQL parser will overflow.
Zlib 1.1.4 This isn’t normally required, but due to a small bug, the build will not complete if zlib is missing, and regretably at a very late stage.
Now it is time to build the software, and this not different from how this is usually done and documented, except that you must either choose a preconfigured BUILD-file that includes NDB, or you have to create one yourself. In this case, we take the easy route and use a BUILD-file that includes NDB.
With the source tree that I got, this was limited to the compile-pentium-max build-file. This build also includes OpenSSL, so you either have to modify the build or get OpenSSL, in this case I choose the latter, and realized that the ssl libraries ended up where MySQL didn’t expect them to be, but this was easily fixed.
Alternatively, you can modify a copy of the compile-pentium-max build-file and use that. Assuming this has been set up correctly (if not, the build will fail, and you can usually determine from the errors generated what went wrong), it is now time to build the software:
/home/karlsson> cd MySQL/Cluster/mysql-4.1
/home/karlsson/MySQL/Cluster/mysql-4.1> BUILD/compile-pentium-max
/home/karlsson/MySQL/Cluster/mysql-4.1> make test
/home/karlsson/MySQL/Cluster/mysql-4.1> su
Password: ******
/home/karlsson/MySQL/Cluster/mysql-4.1 # make install
/home/karlsson/MySQL/Cluster/mysql-4.1 # exit
This will build MySQL Cluster and install MySQL in the standard place (/usr/local/mysql). You can also choose to install it elsewhere, but note that make install will not install NDB. The last step before we go on to configure MySQL Cluster is that you might need to run the mysql_install_db script, but this is no different than MySQL without cluster support.
Configuring and starting a simple MySQL Cluster setup
As stated above, you need to have all the DB and MGM nodes up and running first, and this will probably be the most time-consuming part of the configuration, if for no other reason so because this document assumes that you are already familiar with MySQL to a certain extent. As for the MySQL configuration and the my.cnf file, this is very straightforward, and the following covers only the differences from configuring MySQL without clustering.
Setting up the environment
Before starting to configure anything else though, we need to have a working environment. We will show how our example environment is set up here, but again, most Linux and Unix systems are similar. The configuration for Windows is a bit different, but the following section should all the same be helpful. As can be seen from the environment setup file below, it is assumed that MySQL is installed in /usr/local/mysql.
As for NDB, it is assumed that this remains where it was originally built, which in a production environment isn’t really optimal, but is a good choice for testing the system, as it allows us to easily switch between versions of NDB and to rebuild it and fix things, without having to reinstall. The directory where MySQL was built in the example above was /home/karlsson/MySQL/Cluster/mysql-4.1 and as can be seen, NDB is built in the sub-directory ndb, which is the default location of the NDB source.
#!/bin/bash
MYSQL_HOME=/usr/local/mysql
export NDB_HOME=/home/karlsson/MySQL/Cluster/mysql-4.1/ndb
export PATH=$NDB_HOME/bin:$MYSQL_HOME/bin:$PATH
export LD_LIBRARY_PATH=$NDB_HOME/lib:$MYSQL_HOME/lib
export MANPATH=$NDBHOME/man:$MANPATH
Example file 1: Environment setup script
Before commencing, be sure to set up an environment setup file like the one above and then source it:
/home/karlsson> . ndbenv.sh
As is usual with MySQL, even though we are to use NDB as the storage engine, MySQL will store its data dictionary as MyISAM files, so we need to make space for this, unless you haven’t already done this or are upgrading from an earlier version of MySQL. In addition to this directory, we also need some space for NDB.
In this case, we will run each node in its own directory, except of course the DB node, which use the MySQL datadir for its configuration. Here, we will use a directory called data in the current user home directory as the MySQL datadir, and then we create two directories for the remaining nodes in the same location:
/home/karlsson> mkdir ndb_mgmt
/home/karlsson> mkdir ndb_data1
This completes the setup of the MySQL Cluster environment.
MySQL Cluster setup
The MySQL Cluster is configured by one common configuration file, typically called config.ini, which is managed by the management nodes. The other nodes only need to know how to access a management node to get the configuration information.
The name of this file is passed to the management node program when it is started, and this file configures all the physical computers in the cluster, all the nodes and all the communication paths between the nodes. Preset defaults are available for most parameters though, and defaults can be specified in the config.ini file itself. For example, if all DB nodes use the same size of data memory, which is not the same as the preset default size, then a DB DEFAULT entry can be created that specifies the default data memory size for all DB node entries.
The config.ini file that has a rather standard INI-file format, with sections preceded by section headings (surrounded by square brackets), followed by settings and values. One deviation from the standard format being that the setting name and value are separated by a colon (:) instead of an equal sign (=), and another is that the sections aren’t unique. Instead, unique entries (such as two nodes of the same type) are identified by a unique ID. The following example shows how this works:
[COMPUTER]
Id:1
ByteOrder: Little
HostName: localhost
[MGM]
Id:1
ExecuteOnComputer: 1
[DB]
Id: 2
NoOfReplicas: 1
ExecuteOnComputer: 1
FileSystemPath: /home/karlsson/ndb_data1
[API]
Id: 3
ExecuteOnComputer: 1
[TCP DEFAULT]
PortNumber: 10002
[TCP]
NodeId1: 1
NodeId2: 2
[TCP]
NodeId1: 2
NodeId2: 3
Example file 2: config.ini file
This config.ini file represents a very minimal, but working, MySQL Cluster configuration. All available settings are explained in detail in the MySQL Cluster administration manual — we will only cover the sections and settings used in this example.
There are two distinct types of sections, one that identifies a specific entity, such as a computer or a DB node, and one that defines defaults for a type of entry. One or two unique identifiers identify a non-default section; default sections have the section type post-fixed by the word DEFAULT, for example [DB DEFAULT]. Note that a unique identifying section must exist for all involved nodes and paths between nodes, even though these nodes and paths use only default settings:
[COMPUTER] – Identifies that the following settings are for a specific hardware computer that will host one or more of the nodes of the MySQL Cluster.
Id – This setting determines the ID of this computer, and is referenced by other entries in the configuration to determine which hardware an MySQL Cluster node resides on.
ByteOrder – Determines the byte ordering of the computer (LittleEndian or BigEndian).
HostName – Determines the hostname of this computer.
[MGM] – This section defines the settings for a management-node on the MySQL Cluster.
Id – This is the unique node identifier of this management node.
ExecuteOnComputer – References the Id of the COMPUTER that this node executes on.
[DB] – This section heading starts a section that contains settings for a MySQL Cluster database node.
Id – This is the unique node identifier of this database node.
NoOfReplicas – Defines how many replicas (copies) exist of the data in the database.
ExecuteOnComputer – References the Id of the COMPUTER that this node executes on.
FileSystemPath – Path to where this DB node stores its data on disk. Note that each DB node must have a unique file system path.
[API] – This section identifies a MySQL Cluster client. Note that this must be specified, even though you intend to use SQL through MySQL, as the MySQL server is, from the point of view of MySQL Cluster, considered a client.
Id – This is the unique node identifier of this API node, or rather, the computer where the MySQL server executes.
ExecuteOnComputer – References the Id of the COMPUTER that the MySQL server executes on.
[TCP] – Identifies a section that defines how communication between the different nodes take place. A TCP DEFAULT section can be used to set default attributes of each such link.
Node1 – References the Id of the first node in this communication path.
Node2 – References the Id of the second node in this communication path.
PortNumber – Specified the TCP/IP port used for communication between these nodes. The default value is incremented by 1 for each TCP section. So if the default is 10000, then the first TCP section gets 10001 and the second 10002 and so on (assuming of course that no Port is specified for that TCP entry).
Note that the config.ini file may have many more options; the options documented above and used in the example in Example 2 are just the very basic ones. Paths must be defined between every MGM and every DB, from every DB to every DB, and from each API to each DB node. There is no need to define communication between MGM and API nodes.
The MGM node is used to manage all the DB nodes in the cluster, whereas the clients, such as the MySQL server, access the MGM node for configuration information, but is not managed by the MGM node, i.e. the MGM node need not send data to the API node, except in the case where the API node requests it, of course.
Each executing node has to know at least how to identify itself, i.e. which Id entry in the config.ini file that this node refers to, and how to access the management node to get the relevant configuration. This information is set up in a file called Ndb.cfg or alternatively by using an environment variable. In this case we will use the Ndb.cfg file approach.
The Ndb.cfg file looks similar on all nodes in the cluster, including the MGM nodes themselves, except that the nodeid is different and that the management server connection data is not necessary in the case of the management node itself (and even this entry has a sensible default).
nodeid=1
Example file 3: Ndb.cfg file for the management node
This Ndb.cfg file is placed in the current directory where the node executes, in the case of the file in example 3 this is the MGM node (as the nodeid is 1) which is placed in /home/karlkarlsson/ndb_mgmt. In the case of the DB node, the host and port of the management server needs to be specified, but except for this, it is very similar to the one used for the management node. The default port for the management node is 2200, and that is what is being used here.
nodeid=2
host=127.0.0.1:2200
Example file 4: Ndb.cfg file for the first database node
MySQL Cluster startup
At this point, before moving on to configuring the MySQL server, I recommend that you follow this document and start the MySQL Cluster without the MySQL server to make sure that it is configured correctly. In the minimal working MySQL Cluster configuration there are three nodes, and hence three processes, but as one of those nodes is the API Client, or the MySQL server, we need only start two nodes to get the MySQL Cluster up and running. First, we need to start the management node.
Make sure that you have the Ndb.cfg and the config.ini files, as documented above, in the ndb_mgmt and ndb_data1 directories and that the environment is set up properly, i.e. the PATH and LD_LIBRARY_PATH (the latter might have a different name for some operating systems) variables referencing the appropriate directories. Then you can start the management server:
/home/karlsson> cd ndb_mgmt
/home/karlsson/ndb_mgmt> mgmtsrvr -c config.ini
NDB Cluster Management Server. Version 3.4.5 (alpha)
Command port: 2200, Statistics port: 2199
NDB>
Note that you must specify the filename of the MySQL Cluster configuration file when starting the management server, it has no default. This will start the management server. If there are errors in your configuration file, the management server will not start. On some operating systems, the management server can be started as a daemon, but we are not going to use that feature in this example, so as you can see, assuming that the management server started OK, you now get a prompt to enter commands. Type HELP to see the available commands. For example, SHOW PARAMETERS will display all the MySQL Cluster settings, including default values and values specified in the config.ini file.
The next step is then to start the MySQL Cluster DB node for this example. Open another terminal window, and make sure the environment is set up correctly by sourcing the environment setup script created previously. Also, make sure that the Ndb.cfg file is copied to the MySQL Cluster DB node directory, ndb_data1, which was created previously. Then the DB node can be started by:
/home/karlsson> cd ndb_data1
/home/karlsson/ndb_data1> ndb
— NDB Cluster — DB node 2 — Version 3.4.5 (alpha) —
2004-05-05 22:52:11 [NDB] INFO — Start initiated (version 3.4.5)
2004-05-05 22:52:14 [NDB] INFO — Started (version 3.4.5)
If you see no errors here, the storage engine of MySQL Cluster is now up and running! If you switch to the terminal window where you started the management server, you can now check that the database node is running by issuing the command STATUS preceded by the Id of the database node in question, using out example:
NDB> 2 STATUS
Node 2: Started (Version 3.4.5)
NDB>
If all is well now, we can now go on with getting the MySQL server running, which should be much easier. Note that the MySQL Cluster storage engine will continue to run while this is being done.
MySQL server daemon configuration
As you already know, the MySQL server daemon is configured using the my.cnf file. As for setting up this with MySQL Cluster, the only required option that you need to change is to enable the NDB storage engine, as it is not enabled by default.
Also, as the MySQL server daemon is a client as far as the MySQL Cluster is concerned, a Ndb.cfg file is needed. This looks just like the ones used for the management and database nodes defined above, and is placed in the MySQL datadir, as defined in my.cnf.
nodeid=3
host=127.0.0.1:2200
Example file 5: Ndb.cfg file for the MySQL server daemon storage engine
In this case, a very simple MySQL server configuration will be used, and the datadir is placed in the home directory of the current user. Initially we will start the MySQL server without having NDB configured, just to make sure that the configuration is valid.
[mysqld]
datadir=/home/karlsson/data
# Uncomment the following to enable the NDB storage engine.
#ndbcluster
Example file 6: Example my.cnf file with the NDB storage engine disabled
In this case I use a user-based my.cnf file, so it is placed in /home/karlsson and is called .my.cnf but you can use the global one in /etc/my.cnf also. Don’t forget that that the user-based configuration file will override settings defined and that any settings not set in the user-based file will use the settings in the global file.
The location of the global configuration file may vary between different operating systems, but this is no different from when using MySQL without the NDB storage engine. With the MySQL server daemon configured, let us now test that the daemon starts (and before running the following command, don’t forget to set up the environment with the environment setup script created above):
/home/karlsson> mysqld_safe &
[1] 9667
/home/karlsson:~> Starting mysqld daemon with databases from /home/karlsson/data
/home/karlsson>
With the daemon running, try to connect to the server:
/home/karlsson> mysql test
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
Welcome to the MySQL monitor. Commands end with ; or \\g.
Your MySQL connection id is 1 to server version: 4.1.2-alpha
Type ‘help;’ or ‘\\h’ for help. Type ‘\\c’ to clear the buffer.
mysql> use test
Database changed
mysql> exit
Bye
/home/karlsson>
If it works so far, it is time to restart the MySQL server daemon with the NDB storage engine enabled. Exciting! First shut down the MySQL server daemon.
/home/karlsson> mysqladmin -u root shutdown
040505 23:08:32 mysqld ended
[1]+ Done mysqld_safe
/home/karlsson>
Then open your .my.cnf file and uncomment the ndbcluster setting.
[mysqld]
datadir=/home/karlsson/data
# Uncomment the following to enable the NDB storage engine.
ndbcluster
Example file 7: Example my.cnf file with the NDB storage engine enabled
And then restart the MySQL daemon server.
/home/karlsson> mysqld_safe &
[1] 9710
/home/karlsson:~> Starting mysqld daemon with databases from /home/karlsson/data
/home/karlsson>
You should now be able to create tables using the NDB table type.
/home/karlsson> mysql
Welcome to the MySQL monitor. Commands end with ; or \\g.
Your MySQL connection id is 1 to server version: 4.1.2-alpha
Type ‘help;’ or ‘\\h’ for help. Type ‘\\c’ to clear the buffer.
mysql> use test
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
Database changed
mysql> create table mytab(col1 integer not null primary key,
-> col2 char(20)) TYPE=NDB;
Query OK, 0 rows affected, 1 warning (0.61 sec)
mysql> insert into mytab values(1, ‘xxx’);
Query OK, 1 row affected (0.00 sec)
mysql> insert into mytab values(1, ‘xxx’);
ERROR 1022 (23000): Can’t write, duplicate key in table ‘mytab’
mysql> insert into mytab values(2, ‘xxx’);
Query OK, 1 row affected (0.00 sec)
mysql> select * from mytab;
+——+——+
| col1 | col2 |
+——+——+
| 2 | xxx |
| 1 | xxx |
+——+——+
2 rows in set (0.00 sec)
mysql>
So now you have a MySQL server talking to a MySQL Cluster running on a single machine, and you’re ready to dive into the MySQL Cluster Administration Guide to learn more about running in a multi-computer environment, backing up and restoring data from the cluster, and troubleshooting your cluster.
3 Responses
… [Trackback]
[…] Find More Information here to that Topic: nblog.syszone.co.kr/archives/508 […]
… [Trackback]
[…] Find More on on that Topic: nblog.syszone.co.kr/archives/508 […]
… [Trackback]
[…] Read More Information here on that Topic: nblog.syszone.co.kr/archives/508 […]