Wednesday, June 03, 2009

kill, exit(), _exit() and issues getting gcov results

Hi,

We are currently running code coverage for MySQL Cluster testing and have hit a few bumps along the road when it came to collecting the "block" coverage for the NDBD.

I wanted to share them here for others that might run into similar frustrations when collecting code coverage during testing.

During each run, gcov accumulates information on which functions and lines of code were executed, along with a count of how many times each was hit.

The problem is when gcov dumps this information out: the actual data is only written on the "exit" of the program.

After much time tracking this issue down, it turns out that the NDBD code had been changed to use _exit() instead of just exit().

What is the difference?

exit()
Terminate the process after cleanup.

_exit()
Terminate process immediately.

So by calling _exit(), gcov never had a chance to dump the data that it had collected.
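
As a quick illustration (a toy sketch, not NDBD code), handlers registered with atexit() run when you call exit() but not _exit(); gcov's data dump is registered in much the same way, so it suffers the same fate:

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

static void cleanup(void)
{
    /* Runs on exit(), skipped on _exit(); the gcov dump behaves the same. */
    printf("cleanup ran\n");
}

int main(int argc, char **argv)
{
    atexit(cleanup);
    if (argc > 1)
        _exit(0);   /* terminate immediately: no cleanup, no .gcda files */
    exit(0);        /* normal termination: cleanup runs, coverage data written */
}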

So the quick workaround to this is to wrap all the _exit() calls in an #ifndef.

i.e.

#ifndef HAVE_gcov
  _exit(0);
#else
  exit(0);
#endif

Question: Why use _exit() at all? If you have trouble with a program hanging after exit() has been called, one solution is to use _exit().
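
If you really do need _exit() to dodge that hang but still want coverage data, one possible alternative (a sketch only; it assumes your GCC's libgcov exports __gcov_flush(), which you should verify for your toolchain, and terminate_now() is just a made-up wrapper name) is to flush the coverage data explicitly before terminating:

#ifdef HAVE_gcov
extern "C" void __gcov_flush(void);  /* libgcov symbol in some GCC versions; verify for yours */
#endif

static void terminate_now(int status)
{
#ifdef HAVE_gcov
  __gcov_flush();  /* write out the .gcda data that exit() would have flushed */
#endif
  _exit(status);   /* still skips atexit() handlers and static destructors */
}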

Then we also found that some of the testing was using "kill -9" to end the Cluster instance. This, of course, has the same effect as calling _exit(): the coverage data never gets dumped.
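
For test scripts you control there is a middle ground: SIGKILL ("kill -9") cannot be caught, but a plain "kill" sends SIGTERM, which the process can turn into an orderly exit() so gcov still gets its dump. A minimal sketch of the idea (this is not NDBD's actual shutdown path):

#include <csignal>
#include <cstdlib>
#include <unistd.h>

static void handle_sigterm(int)
{
    /* Not strictly async-signal-safe, but fine for a test harness:
       exit() runs the exit-time cleanup, including the gcov dump. */
    exit(0);
}

int main()
{
    signal(SIGTERM, handle_sigterm);
    for (;;)
        pause();   /* stand-in for the real work loop */
}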

So if you are using gcov and not getting results, the above may well be the cause of your frustration.

Hope this saves someone some time and frustration!!!

Cheers,
/Jeb

Thursday, April 23, 2009

Scalability Issue, or just tuning required for Auto Increment

Hi,

First let me start by making sure you know Sun's MySQL Cluster 7.0 has been released as GA. This was announced at the MySQL Users Conference this week.

You can read more about the release at:

http://www.mysql.com/products/database/cluster/
http://www.mysql.com/news-and-events/generate-article.php?id=2009_06

Now, on to the topic.

Disclaimer:
All performance numbers released here are for development purposes only and are not intended for marketing. The systems are not tuned for optimal throughput, and the results are "only" used to ensure that we have not lost performance from commit to commit or, as in this case, to track down issues.

Moreover, the TPC-"B" test is a retired test from www.tpc.org, but it is still very useful when looking at the scalability of simple transactions with a low probability of deadlocks and rollbacks.

Results for other users or customers may vary, and each should try different combinations of options to see which works best for them.

Article:
During testing of 7.0, our automated testing system that runs and compares benchmark tests found a scalability issue.

The issue first centered around the ndb-connection-pool option.

If the option was set to 1 (the default), we seemed to scale okay:

TPC-B results for ndb-connection-pool=1

Thread(s) Value(s)
----------------------------
4 695.2407989502
8 1133.3599700928
12 1287.9699935913
16 1299.6866531372
20 1290.1900100708
24 1294.7066650391

but when we set the connection pool to, say, 4, performance seemed to flat-line:

TPC-B results for ndb-connection-pool=4

Thread(s) Value(s)
----------------------------
4 202.85333633423
8 200.13333511353
12 265.94666481018
16 276.55333137512
20 276.59096050262
24 282.05245113373

After some pretty extensive testing, the issue was isolated to the insert into the history table, which runs as a standalone transaction.

Transaction Profile:

BEGIN
DECLARE ran INT;
START TRANSACTION;
INSERT into $database.history (aid,tid,bid,TransTime,scriptID,filler)
VALUES(ran,ran,ran,now(),ran,'MoreMoney');
COMMIT;
END

History Schema: (Take note of the AUTO_INCREMENT)

CREATE TABLE history (id BIGINT NOT NULL AUTO_INCREMENT,
aid INT,
tid INT,
bid INT,
TransTime DATETIME,
scriptId CHAR(10),
filler CHAR(80),
PRIMARY KEY (id))ENGINE=$engine_type;

There is some known weirdness with cluster and auto increment. For performance reasons, a MySQL daemon (mysqld) will pre-fetch a series of auto increment numbers from the cluster.

After we realized that AUTO_INCREMENT was involved, it was time to start playing with it to see what made it better (if anything) or worse. We realized that having multiple threads from the mysqld to the cluster can cause contention if there are more clients than there are MySQLD threads into the cluster.

So that opened the question: what happens if ndb-connection-pool is greater than the number of clients that you have connecting and inserting into this table?

Side Note:

---------
Make sure that you have enough API slots in the cluster to cover all the MySQLD threads that will be connecting, or you will run into bug #44330
(http://bugs.mysql.com/bug.php?id=44330); see the config.ini sketch below.
---------
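
As a rough illustration only (the hostname below is a placeholder, not from the original post), each connection in a mysqld's connection pool needs its own API slot, i.e. its own [mysqld]/[api] section in the cluster's config.ini:

# config.ini fragment: one slot per pooled connection for this SQL node
[mysqld]
HostName=sql1.example.com
[mysqld]
HostName=sql1.example.com
[mysqld]
HostName=sql1.example.com
[mysqld]
HostName=sql1.example.com
# ...repeat until the slot count covers ndb-connection-pool, plus slots
#    for any other NDB API clients (e.g. ndb_restore)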


Some testing outside the framework using a higher ndb-connection-pool setting showed promise.

We updated the automated testing to use ndb-connection-pool=24 and added additional API slots to cover these threads.

TPC-B results for ndb-connection-pool=24

Thread(s) Value(s)
----------------------------
4 940.3466796875
8 1425.5633239746
12 1643.7499847412
16 1671.7252731323
20 1674.6633224487
24 1670.5512924194

The results showed a scalability improvement, with an average 28% increase in throughput.

Deviation percentage(s) for run.

Threads %
------- ------
4 35.2548183460184
8 25.7820429159744
12 27.623313657942
16 28.6252551026102
20 29.7997434003385
24 29.0293266829633

Yet, there were some other unanswered questions.

We had also seen issues with auto inc when it came to Ndb_cache_check_time.

In addition, we were wondering what effect the Ndb_autoinc_fetch_sz (default 256) was having on scalability.

We ran well over 30 small test cases, changing these different settings to see what gave the greatest throughput.

What gave the most performance (in this case, maybe not yours) was the following settings (a rough my.cnf sketch follows this list):

NDB API Slots = ndb-connection-pool
ndb-connection-pool >= number of clients
Ndb_cache_check_time=1
Ndb_autoinc_fetch_sz=200
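
A rough my.cnf sketch of the mysqld-side settings (the API slots live in config.ini, as in the side note above); the option names are written here as they appear in this post, so double-check the exact spellings against the MySQL Cluster documentation for your version:

[mysqld]
ndb-connection-pool=24       # at least as large as the number of client connections
ndb_cache_check_time=1
ndb_autoinc_fetch_sz=200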

To verify this, we updated the automated test suite configurations, leaving the pool at 24 and adding the other two new settings.

ndb-connection-pool=24
Ndb_cache_check_time=1
Ndb_autoinc_fetch_sz=200

Thread(s) Value(s)
----------------------------
4 1216.9033508301
8 2115.7299804688
12 2684.7500152588
16 3216.4866790771
20 3341.2700042725
24 3353.1133270264

This gave us a huge boost.

Deviation percentage(s) for run.
Threads %
------- ------
4 29.4100757855078
8 48.4136091948517
12 63.3308009235662
16 92.4052193725821
20 99.5189098299998
24 100.718968776481


Summary:

If (you are using Sun's MySQL Cluster && MySQLD to insert rows into a table with an Auto Increment column && you're just not getting the throughput you want), then you should take a close look at the 3 options listed to see what combination works best for your setup (a quick way to check the current values is shown after the list).

ndb-connection-pool=?
Ndb_cache_check_time= (suggested 1)
Ndb_autoinc_fetch_sz= (suggested 200)
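
A quick way to check what your server is currently running with; the LIKE patterns are deliberately loose so they match these variables regardless of the exact spelling in your MySQL version:

SHOW GLOBAL VARIABLES LIKE 'ndb%connection%';
SHOW GLOBAL VARIABLES LIKE 'ndb%cache_check%';
SHOW GLOBAL VARIABLES LIKE 'ndb%autoinc%';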

In addition, there are other tests such as TPC-C (DBT2) that showed similar gains for the connection pool, but no additional gains for the other settings, as this test does not use auto increment columns.

There is still a bug open for the original problem (http://bugs.mysql.com/bug.php?id=43469), which is to be fixed at a later date.

Side Note:

TPC-"B" utilizing Disk Data also benefited greatly from this new combination.

[Disk Data Results]
ndb-connection-pool=1

Thread(s) Value(s)
----------------------------
4 891.16665649414
8 1337.1700134277
12 1554.7266845703
16 1616.2533416748
20 1589.3699951172
24 1579.4092941284

ndb-connection-pool=24
Ndb_cache_check_time=1
Ndb_autoinc_fetch_sz=200

Thread(s) Value(s)
----------------------------
4 1162.4266662598
8 1972.8933258057
12 2556.6375579834
16 2910.8500213623
20 3147.7466583252
24 3058.708442688