Wednesday, June 03, 2009

kill, exit(), _exit() and issues getting gcov results

Hi,

We are currently running code coverage for MySQL Cluster testing and have hit a few bumps along the road when it cam to collecting the "block" coverage for the NDBD.

I wanted to share them here for others that might run into similar frustrations when doing testing code coverage.

Gcov accumulates during each run information on which functions and lines of code are called and a total for the times called.

The problem comes from when gcov dumps this information out. The actual data is dumped on the "exit" of the program.

After much time tracking this issue down, it turns out that the NDBD code had been changed to use _exit() instead of just exit()

What is the difference?

exit()
Terminate the process after cleanup.

_exit()
Terminate process immediately.

So by calling _exit(), gcov never had a chance to dump the data that it had collected.

So the quick workaround to this is to wrap all the _exit() calls with an #ifndef

i.e.

#ifndef HAVE_gcov
_exit(0)
#else
exit(0)
#endif

Question: Why use the _exit? If you have trouble with a program hanging after the exit() has been called a solution is to use _exit().

Then we also found that some of the testing was using the "kill -9" to end the Cluster instance. This, of course, is the same as calling _exit().

So if you are using gcov, and not getting results, the above may have to do with your frustration.

Hope this saves someone some time and frustration!!!

Cheers,
/Jeb

Thursday, April 23, 2009

Scalability Issue, or just tuning required for Auto Increment

Hi,

First let me start by making sure you know Sun's MySQL Cluster 7.0 has been released as GA. The was announced at the MySQL Users Conference this week.

You can read more about the release at:

http://www.mysql.com/products/database/cluster/
http://www.mysql.com/news-and-events/generate-article.php?id=2009_06

Now, on to the topic.

Disclaimer:
All performance numbers released here are used for development purpose only and are not intended for marketing. The systems are not set for optimal throughput and the results of tests are "only" used to ensure that we have not lost performance from commit to commit or to, as in this case, track down issues.

Moreover, the TPC-"B" test is a retired test from www.tpc.org, but is still a very useful test when looking at simple transaction and scalability of simple transactions w/ a low probability of deadlocks and rollbacks.

Results for other users or customers may vary and each should try different combination of options to see which works best for them.

Article:
During testing of 7.0, our automated testing system that runs and compares benchmark tests found a scalability issue.

The issue first centered around the ndb-connection-pool option.

If the option was set to 1 (or default) we seemed to scale okay:

TPC-B results for ndb-connection-pool=1

Thread(s) Value(s)
----------------------------
4 695.2407989502
8 1133.3599700928
12 1287.9699935913
16 1299.6866531372
20 1290.1900100708
24 1294.7066650391

but when we set the connection pool to say 4, we then saw where the performance seemed to flat-line:

TPC-B results for ndb-connection-pool=4

Thread(s) Value(s)
----------------------------
4 202.85333633423
8 200.13333511353
12 265.94666481018
16 276.55333137512
20 276.59096050262
24 282.05245113373

After some pretty extensive testing, it was isolated to the insert into the history table which is an alone transaction.

Transaction Profile:

BEGIN
DECLARE ran INT;
START TRANSACTION;
INSERT into $database.history (aid,tid,bid,TransTime,scriptID,filler)
VALUES(ran,ran,ran,now(),ran,'MoreMoney');
COMMIT;
END

History Schema: (Take note of the AUTO_INCREMENT)

CREATE TABLE history (id BIGINT NOT NULL AUTO_INCREMENT,
aid INT,
tid INT,
bid INT,
TransTime DATETIME,
scriptId CHAR(10),
filler CHAR(80),
PRIMARY KEY (id))ENGINE=$engine_type;

There some known weirdness with cluster and auto inc. For performance reasons a MySQL Demon will pre-fetch a series of auto inc numbers from the cluster.

After we realized that AUTO_INCREMENT was involved, it was time to start playing with it to see what made it better (if anything) or worse. We realized that having multiple thread from the mysqld to the cluster can cause contentions if there are more clients then there are MySQLD threads into the cluster.

So that open the question, what happens if ndb-connection-pool is greater then the number of clients that you have connecting and inserting into this table?

Side Note:

---------
Make sure that you have enough API slots in the cluster to cover all the MySQLD threads that will be connecting or, you will run into bug#44330.
{http://bugs.mysql.com/bug.php?id=44330 }
---------


Some testing outside the framework using a higher ndb-connection-pool setting showed promise.

We updated the automated testing to include an ndb-connection-pool = 24 and added additional API slots to cover these threads.

TPC-B results for ndb-connection-pool=24

Thread(s) Value(s)
----------------------------
4 940.3466796875
8 1425.5633239746
12 1643.7499847412
16 1671.7252731323
20 1674.6633224487
24 1670.5512924194

The results showed a scalability improvement with an average 28% increase of throughput.

Deviation percentage(s) for run.

Threads %
------- ------
4 35.2548183460184
8 25.7820429159744
12 27.623313657942
16 28.6252551026102
20 29.7997434003385
24 29.0293266829633

Yet, there where some other unanswered questions.

We had also seen issues with auto inc when it came to Ndb_cache_check_time.

In addition, we were wondering what effect the Ndb_autoinc_fetch_sz (default 256) was having on scalability.

We ran well over 30 small case tests changing these different setting to see what had the greatest throughput.

What showed to give the most performance (in this case, maybe not yours) was to have the following setting.

NDB API Slots = ndb-connection-pool
ndb-connection-pool= or > number of clients
Ndb_cache_check_time=1
Ndb_autoinc_fetch_sz=200

To verify this, we updated the automated test suite configurations leaving the pool = 24 and adding the other 2 new settings.

ndb-connection-pool=24
Ndb_cache_check_time=1
Ndb_autoinc_fetch_sz=200

Thread(s) Value(s)
----------------------------
4 1216.9033508301
8 2115.7299804688
12 2684.7500152588
16 3216.4866790771
20 3341.2700042725
24 3353.1133270264

This gave us a huge boot.

Deviation percentage(s) for run.
Threads %
------- ------
4 29.4100757855078
8 48.4136091948517
12 63.3308009235662
16 92.4052193725821
20 99.5189098299998
24 100.718968776481


Summary:

If (you are using Sun's MySQL Cluster && MySQLD to insert rows into a table with an Auto Increment column && your just not getting the throughput you are wanting) then you should take a close look at the 3 options listed to see what combination works best for your setup.

ndb-connection-pool=?
Ndb_cache_check_time= (suggested 1)
Ndb_autoinc_fetch_sz= (suggested 200)

In addition, there are other tests such as TPC-C (DBT2) that showed similar gains for the connection-pool, but no additional gains for the other setting as this test does not use auto inc columns.

There is still a bug open over the original problem http://bugs.mysql.com/bug.php?id=43469 and this is to be fixed at a later date.

Side Note:

TPC-"B" utilizing Disk Data also benefited greatly from this new combination.

[Disk Data Results]
ndb-connection-pool=1

Thread(s) Value(s)
----------------------------
4 891.16665649414
8 1337.1700134277
12 1554.7266845703
16 1616.2533416748
20 1589.3699951172
24 1579.4092941284

ndb-connection-pool=24
Ndb_cache_check_time=1
Ndb_autoinc_fetch_sz=200

Thread(s) Value(s)
----------------------------
4 1162.4266662598
8 1972.8933258057
12 2556.6375579834
16 2910.8500213623
20 3147.7466583252
24 3058.708442688

Friday, December 19, 2008

MySQL 5.1 GA is of good quality

Hi,

Been a while since I lasted posted.

It has been a busy year with testing on 5.1, 5.1 Replication, the Telco branches of cluster and doing test Automation.

I know many have read the post that one of the founders did about MySQL 5.1 GA and the known issues with it. I as QA wanted to extend my take.

I started working for MySQL back in April of 2005. Official QA had only been around since Feb of 2005 and I think I was the 3rd or 4th brought in. My position at another DB software provider had gone to India which allowed me to find this opportunity.

When I first started, there were not many things in place for QAing the product and very limited automation. Even so, finding bugs was like shooting fish in a barrel with and over sized shotgun.

In fact, I spent more time writing bug reports then I did testing. There was a lot of low hanging fruit. Some the size of cherries and some the size of watermelons.

MySQL 5.0 was already behind and had many known and unknown issues.

Before the official start of QA inside MySQL, testing was left up to developers and community to find issues. It was much easier to ship with no known issues as that is just what they were, not known. This is not to say they did not exist in those versions, just that no one had found them.

Fast forward to today, QA is in place and detailed testing is being done

We have system, stress, load, performance, regression, unit, etc... testing that happening each and every day. In addition, there is better automated testing in place. Due to this most of all, if not all, the low hanging fruit has been picked.

I now spend more time testing and developing tests/automation then I do writing bug reports.

I know that 5.1 is a much better release then 5.0 was by far. Are there still bugs? Sure, but many are known and listed for anyone to review. I know of no software company (closed or open source) that ships and guarantees no bugs.

5.1 has had thousands of bugs found and fixed before it ever received the GA title. It has many wonderful features that many have been waiting on GA so that could officially use it in production. (i.e. Row Based Replication).

So if you are holding off looking at or trying MySQL 5.1 because of a naysayer, I would encourage you to try it, see for yourself and makeup your own mind.

As always, I am open to better ways to test our product. Feel free to email me and let me know your thoughts.

/Jeb

Wednesday, February 20, 2008

MySQL Conference User and contacting me


MySQL Conference & Expo 2008



Hi,

I don't get out here too often. If you have a question, or comment, please feel free to post it here, but also send it directly to me @ jmiller@mysql.com. You will get a faster response.

Best wishes,
/Jeb

Monday, May 28, 2007

Just when you think you know something

Hi,

Been a long time since last I posted.

I just got back from the Cluster Summit in Stockholm.

During this meeting we covered topics such as Design and Internals of MySQL Cluster, Dolphin Express for MySQL Cluster, Road Map discussion, Cluster & Replication Troubleshooting, Benchmarks with new performance patches and of course QA.

Items brought up during the QA Discussion included:

* HA Testing (High Availability)
* Log Events Testing
* Large Data sets
* Backup/Restore
* Configuration Testing
* Functional testing (Regression testing of Standard Functionality)
* Profiling
* Performance

So as you can see from above, I will be busy for some time to come :-)

Many of the above are being done now. We discussed how to take them to the next level.

During this meeting I was asked to take a Cluster Certification Test. No problem I thought, this should be a breeze. Just when you think you know something you are reminded that there is always areas to learn about.

I have been working with Cluster for over 2 years now and loving the product. The Certification test showed me that there was still plenty for me to learn. I am happy to know that MySQL DBA's that take and pass this test will have to know what they are talking about. I am saying that this is not just some easy test; which would lower the value of the Certification IMHO.

Want to see me and others Cluster MySQLers sweating... I mean taking the test?
Checkout this:
http://rpbouman.blogspot.com/2007/05/ssst-quietcluster-certification-exam-in.html

I know, you are asking, well did he pass the test? The answer is yes, but I have to admit that my score was not what I would have liked for it to be. Just means that I have much more to learn.

So if you are a Cluster Shop and you are looking to hire someone that is a MySQL Certified DBA with Cluster Certification, I for one can tell you that they had to have good Cluster knowledge to get that Cert.

Don't believe me, well then, just take the test and see for yourself ;-)

As always, I love to have feedback and ideas about QAing Cluster and Replication.

Jonathan Miller
Austin, Texas USA
Senior Lead Quality Assurance Developer
MySQL AB www.mysql.com

Friday, September 29, 2006

MySQL 5.1 Cluster Testing

Hi,

It has been a while since I last posted. We have been busy testing MySQL Cluster, MySQL Replication and MySQL Cluster Replication. Good news is the serious bugs are getting harder to find. This is allowing us to spend some time automating more testing like configuration and up/down grade testing for example.

The documentation guy for cluster, Jonathan "Jon" Stephens, has been doing great work to expand the Cluster on-line manual. If you have not been out there in a while to review you should have a look.

http://dev.mysql.com/doc/refman/5.1/en/mysql-cluster.html

I feel very positive about this version of the cluster as it has had more hands on testing then ever before. The testing includes: Functional, Negative, Load/Stress, Performance, Recovery, Network Failure, etc...

The new Disk Data will allow customer with huge database to start taking advantage of the reliability and scalability that the cluster offers.

In addition the Cluster Replication opens a whole new use for cluster. This feature adds a few more 9's to the 99.999 up time for services and allows for many different strategies in database design to take place.

One example would be to replicate to another cluster so that data mining and report generation can be done with out putting a load on the main system.

In addition, you could replicate to another site so that if something goes wrong at the main site the secondary site could take over minimizing the companies down time.

I would love to here from the users out there in the world that are using the cluster and have ideas of test they would like to see done, and or want to share how they are using it so we can incorporate that into our testing.

I will attempt to post more often and keep you apprised of our efforts and share any tips or trick I run across that might run across.

Cheers!

/ Jeb

Jonathan Miller
Austin, Texas USA
Senior Lead Quality Assurance Developer
MySQL AB www.mysql.com

Jumpstart your cluster!
http://www.mysql.com/consulting/packaged/cluster.html

Get training on clusters
http://www.mysql.com/training/courses/mysql_cluster.html

All-in-one Enterprise-grade Database, Support and Services
http://www.mysql.com/network/

Thursday, August 03, 2006

MySQL Cluster can be too much typing

Testing the MySQL Cluster can mean a lot of typing. Not being the best typist and getting tired of typing the same thing over and over again I created a short cut to make life better.

I have a file called shortcut.sh. Inside my .bash_profile for each of my test systems, I have added this file to be run when I login.

.bash_profile example

. /home/ndb/jmiller/shortcut.sh

Then all the command and connection strings I need are just few key strokes away.

shortcut.sh

alias ndbd='/home/ndb/jmiller/builds/libexec/ndbd'

alias ndb_mgmd='/home/ndb/jmiller/builds/libexec/ndb_mgmd'

alias ndb_mgm='/home/ndb/jmiller/builds/bin/ndb_mgm'

alias mysql='/home/ndb/jmiller/builds/bin/mysql'

alias mysql_server='/etc/init.d/mysql.server'

alias ndb_restore='/home/ndb/jmiller/builds/bin/ndb_restore'

alias mysqldump='/home/ndb/jmiller/builds/bin/mysqldump'

alias installdb='/home/ndb/jmiller/builds/bin/mysql_install_db'

export NDB_CONNECTSTRING=n10:14000

alias mysqladmin='/home/ndb/jmiller/builds/bin/mysqladmin'

dbvar="/space/var"

ndbrun="/space/run"

jmiller="/home/ndb/jmiller"

Before doing this, to start a cluster I would have to do the following:

(short example)

Either cd to

$> /home/ndb/jmiller/builds/libexec/

or

type

$> /home/ndb/jmiller/builds/libexec/ndb_mgmd -f /path/to/config/file

and then

$>/home/ndb/jmiller/builds/libexec/ndbd -c n10:14000 --initial

as you can see this would get to be tiring.

By adding the shortcut, I can just do

$>ndb_mgmd -f /path/to/config/file

$> ndbd --initial

Note that since I exported my NDB_CONNECTSTRING I do not need to include the -c option when starting the ndbd. In addition, I can now use ndb_mgm (mgt client) any where on the host and I do not need to include the connect string.

example:

$>ndb_mgm -e show

$>ndb_mgm -e "all status"

$>ndb_mgm -e shutdown

In addition I created in the file above some shortcuts to directories that I often need to go to.

Example:

cd /space/var

Or

cd $db

It only saved 7 key strokes, but in a day of typing that can really add up.

In short, we are supposed to work smarter, not harder. Hope this helps for those using MySQL Cluster