It’s all about performance

April 22nd, 2010

Database drivers are all about performance. They have a simple job to do and that is, to send and receive data from the application to the database server in the fastest possible way. Period. This is a fundamental requirement of database drivers, especially the ISAM to SQL kind where the product has to work with a new database without source level changes.

While the driver is the most important component in improving performance, there are some other elements to improving performance, rated (in my opinion)  in order of importance:

  1. The database driver
  2. The server hardware
  3. The database server setup
  4. The network

Database Drivers
All drivers are not created equal, (just like a Fiat and Ferrari are not equal.) A lot depends on how the product is engineered, how it handles data and how it interacts with the relational back end.

Over the years, when working on a new driver version, we have always put performace as our number one priority. Our development team’s goal is to squeeze out every last bit of performance possible. If we feel that re-engineering our code will give us that extra 10-15 percent, we will do it! For our v10 drivers for DataFlex, we pretty much re-engineered the entire source code to give us twice the performance of our v9 drivers.

We are already seeing the results of these re-engineering efforts.  A recent customer prospect doing a performance comparison between our Flex2SQL for SQL Server and the DAW CK for DataFlex reports Mertech’s driver outperforms DAW CK  by 3 to 5 times. In some cases, with larger tables, the performance difference was even higher. Clearly the driver plays a big part in performance.

Another area where the driver matters is in scalability. We recently deployed our Oracle driver at a customer site with 2500+ concurrent users! This customer had tried unsuccessfully to go live  with DB2 for over a year and finally gave up. It was not the DB2 server that couldn’t handle the load. It was the DB2 driver.

Server Hardware
When choosing a server hardware,  you need to make sure that you have adequate RAM and you have the fastest possible disk set up. The more RAM you have, the more data will be cached and served from memory. The biggest hardware bottleneck in client server computing  is disk access. Knowing what type of disk configuration will give you better result becomes very important. For example, knowing that RAID 1+0 will be 30-40%  faster than RAID 5 when writing to disk.  There is a huge repository of information on how to optimize a given database server. Include that in your planning process, especially if you are going to have  100+ users doing a lot of disk access.

Database Server Configuration
Most database servers perform quite well out of the  box with minimal tuning requirements. However, if you will have a large number of users, say 100+ users with large amount of data, you will need to make sure that the server is set up to reduce disk contention and disk writes. One way of doing this to by using a smart SAN or by putting your transaction log, index tables and data tables on separate disk spindles. Allocating a large RAM for server process is always good and if you are in Windows, make sure your server has a higher process priority.

Network
The network used to be an issue when we had 10MB networks which was fairly common just a few years ago. This is largely not an issue any longer with faster 100MB and Gigabit switches. However, when running the application over WAN, VPN and even wi-fi network can become a big factor. There you deal with some other issues like network connection glitches and data traffic over a smaller pipeline. The new intelligent data fetching mechanism we implemented in our v10 driver for DataFlex significantly reduces network traffic and also handles network connection issues by automatically retrying when connection to database fails. This means the application doesn’t crash when connection to server is lost for any reason.

Mertech Flex2SQL Vs. DAW CK: What are the differences?

March 24th, 2010

I still recall the time in 1997 when I presented the first SQL drivers; most DataFlex developers were skeptical. Their first question was “Does this even work?” Followed by “Why should we switch from Pervasive or DataFlex? It works for us and is the “right” fit for DataFlex.”

Now, 12 years later, it’s clear that we have come a long way. No one questions any connectivity from DataFlex to SQL, but there is a caveat.  The driver does a lot of work and it must not be assumed that just because Mertech’s solution works, others will work the same way.  So what are the differences?  Are all drivers the same?  While we have answered these questions many times, we have never formally published it.

So here they are:

  1. OLEDB vs ODBC: Mertech uses OLEDB while DAW CK uses CLI
    Microsoft has three different APIs to connect to SQL Server. DBLIB, CLI and OLEDB. DBLIB was a layover from the old Sybase engine that MS acquired in the early ’90s that turned into MS SQL Server. CLI was created to provide ODBC access. And then OLEDB was created to replace CLI and provide a universal API to access both SQL as well transaction data.  Our first driver, v2.x was developed using DBLIB but then in 1999, we wrote the driver to use OLEDB API.   Our reason for choosing OLEDB over CLI was 1) performance 2) better record access for ISAM style databases 3) Connection pooling and better connection handling, and 4) because MS told us 10 years ago that they were internally shifting everything to OLEDB and ADO (built on top of OLEDB) and away from CLI. The DAW CK uses CLI, the ODBC interface which provides access to SQL server through an ODBC driver.  Our testing had showed that for a mix of set and record based fetches, we achieved a 15% built-in performance advantage for just choosing the right API.
  2. Performance
    Mertech’s drivers overwhelmingly outperform DAW’s drivers in a real world environment. Our v9 drivers were two times faster than DAW’s. The new v10 drivers are 3-5 times faster depending on Indexes being used, etc. This was again highlighted yesterday by a customer in the U.S. who was looking to adopt SQL. They ran the same process on two different machines connecting to the SQL with everything exactly the same except the driver.  I asked him what performance difference he saw and he said, “When the process running with Mertech’s driver had finished, the process running DAW CK was at 14%.” This is consistent with the findings of other developers as well as with our own internal testing. The comparison was done with our latest v10 driver and the latest DAW v5 CK.  Needless to say, we have a new ISV partner! This same story was told to me by a Swedish ISV at EDUC in Seefeld. They had deployed five sites with DAW CK, after moving from Pervasive, faced serious performance problems and are now looking to deploy with Mertech.
  3. Intelligent Data Fetching
    With our v10 drivers we introduced an intelligent data fetching mechanism that determines the most optimal number of rows and columns to fetch. This highly complex process significantly boosts performance while reducing network traffic. For example, if you have a selection list based on an index with two segments in a table with 40 columns and say one million rows, we will fetch only the two columns and only the rows you see on the screen. Not only does it result in reduced network traffic but also makes it work with good speed over slow WAN connections.
  4. Recovering from Connection Problems
    Over the years, we have never received any complaints about dropped connections or drivers misbehaving due to a connection hiccup (very short interruption).  That is largely due to how OLEDB handles connections and connection pooling. It is more robust at correcting itself from connection glitches than CLI.  However, if there was a network outage (somebody pulls the network cable), DataFlex programs would just crash if the driver is unable to handle the effects of losing connection to the server. Our goal for  v10 drivers was “high availability, fastest performance.” In keeping with that goal, we added robust handling of recovering from connection issues. If a connection drops, the driver will retry and if it fails to reconnect to the server (for instance, if the server is down), it will generate an informative  message and ask the user to retry instead of blowing up. Once the connection with the server is established, all transactions and all operations will continue the way they were intended without any loss of data or application crash. The DAW CK doesn’t provide the ability to recover from connection hiccups.
  5. Support for SQL Server Failover and Mirroring
    Database Mirroring is an easy way in MS SQL Server to create a high-availability database environment where if the main server goes down, another server picks up the load without the user seeing a difference. A very good demo was given by Gandalf at DISD and then again at EDUC where in the middle of a find operation, he shuts down the server and then when he presses find again the next record is found. In today’s always-on 24X7 business operations this is a MUST HAVE feature. With version 10 drivers, this feature is now available. No such feature is offered in the DAW CK.
  6. Ability to Generate SQL Scripts to Create/Drop Tables and Indexes
    One thing we have learned is that some DBAs are pretty strict about who makes changes to their database. They want to see how tables are created and dropped, and for good reason. These operations can lead to serious performance issues.  In Flex2SQL, we have always provided a way to create empty tables and indexes (or drop them) so that users could give those scripts to the DBA to work with. This also creates a big advantage for software vendors who can use the scripts to create empty tables during the installation process. No such feature is available with DAW CK.
  7. No More Structure Caching!
    We started doing structure caching (.TD files) back in 1999 after we saw that opening the .INT file is way too slow in DataFlex. With the .TD files, we could store the table structure in a local binary file rather than read it every time from the server. While this led to faster opens, it also resulted in “stale cache” where a customer would change the table structure on the server but it didn’t delete the local .TD files, which would lead to errors when the application was run.  This was a big support headache for us and to some of our customers.  We worked on optimizing how we query tables and now have completely eliminated .TD files.  The DAW CK still uses structure caching via a .CCH file.
  8. Faster Switching Between Application Views
    While working on improving opens without caching, we also decided to keep all opened files in memory so that when you switch between programs or views and open and close multiple files, the  structures  would be available and open will be instant. The end-users can see perceptible improvement in performance while navigating between program components. This function will outperform DAW CK and even the native DataFlex database.
  9. Support for LOB Columns
    With our v10 drivers, we now provide the ability to store large amount of data in LOB columns and fetch them through new commands. Now you can store a large chunk of data and retrieve it in pieces to bypass the 16K field limit in DataFlex.
  10. Support for SQL Server Partitioning
    Flex2SQL allows you to  partition your index and data tables into separate filegroups which in turn can reside on separate physical drives. This is an important feature that speeds up SQL Server performance when there is a large number of concurrent users.


    CL2SQL: Command Line Tool for Unattended Migration

    January 20th, 2010

    Mertech provides a command line tool to allow migration of data using Flex2SQL features without the GUI interface. The tool, called cl2SQL, is located in the Mertech\bin directory and supports all the options of Flex2SQL through command line parameters.

    A good use of this tool would be to convert an existing DataFlex databases to SQL without any attendants. Cl2SQL can be used along with the CONVER_DAT (see http://blog.flex2sql.com/index.php/2009/02/write-your-own-migration-routines-using-the-convert_dat_file-macro/) command to develop unattended migration process.

    To see all the options supported by CL2SQL, click on cl2SQL to run it and an information window will be displayed with all the options. These options are listed below:

     -?,-h,-help           : This information screen.
    
     -d <drv>,             : Use driver <drv> where drv is one of our supported drivers.
     -driver <drv>           Such as ORA_DRV, MDS_DB2, SQL_DRV, MDSMYSQL and MDSPGSQL.
    
     -f <fs>,              : The fileset passed as a parameter in <fs> on which you
     -fileset <fs>           want to operate.
    
     -file <n>             : The filenumber passed as a parameter in <n> on which you
                             want to operate. You can only use this option with a fileset
                             This allows you to override the set of files on which you
                             want to operate with another file.
    
     -prefix <ABC>,        : The prefix <ABC> and postfix <ABC> options are  to be used
     -postfix <ABC>,         for the tablename prefix or postfix as an override on what is
                             defined in the fileset.
                             You can only use this option along with a fileset.
    
     -s <server>,          : Database <server> to connect to, where <server> can be either
     -server <server>        the server name or IP used.
    
     -database <db>        : If you want to use another database as is specified in the
                             configuration file, then you can override the setting with
                             the value in <db>.
    
     -u <user>,            : User id to login with.
     -user <user>            You should not set this when you want to login using
                             a trusted connection.
    
     -p <pass>,            : Password to use for logging in.
     -password <pass>        
    
     -port <n>             : Use an alternative port to connect to on the database server.
    
     -c <cfg>,-config <cfg>: Use the provided configuration in <cfg> file
    
     -l,-login             : Automatically log in using the last used/saved parameters.
                             This is the default behaviour if no login details are passed.
    
     -r <rpt>,-report <rpt>: Write a migration report to file <rpt>.
    
     -rc,-rptclear         : Erase migration report file <rpt> (from the -report option).
                             before the actual report is being written hereby opening the
                             ability to not append to an existing report and start fresh.
    

    DF_TEXT data type mapping in Flex2SQL – performance considerations

    September 18th, 2009

    DF_TEXT fields are used in DataFlex to store text longer then 255 bytes. The number of DF_TEXT fields is limited by the maximum record length. This limit is higher in MS SQL Server then in a DAT file. There are though performance implications that should be considered before using SQL TEXT datatypes (VARCHAR(MAX), NVARCHAR(MAX) or TEXT) .

    SQLFlex uses the SQL Server Native Client OLE DB (SQLNCLI-OLEDB) provider as the low-level COM API for data access, for maximum performance. SQLNCLI-OLEDB has some restrictions when using SQL TEXT columns (as well as to it’s binary cousins, VARBINARY(max) and IMAGE).

    When you do a FIND, SAVE, or DELETE or call any other DataFlex command that requires data access, SQLFlex turns that into a SQL statement. SQLFlex executes the statements through cursors that are kept available in case the same command is re-executed, for added performance. But if the table accessed has at least one TEXT column, then any existing cursors cannot be reused, due to SQLNCLI-OLEDB limitations. This causes a slight slowdown, as the driver is forced to work around this restriction, by closing and reopening cursors.

    There are no problems to have a table with many TEXT columns, but for maximum performance, the usage of TEXT columns should be avoided when possible. This can be done if your DF_TEXT field is smaller than 8,000 bytes, and then map DF_TEXT to SQL data types like CHAR, VARCHAR, or NVARCHAR instead of SQL TEXT or VARCHAR(MAX).

    The DataFlex field data type can be defined by creating a FIELD_TYPE token in the INT file or based on the default mappings, shown in the following table:


    SQL Server Data Type                             DataFlex Data Type
    CHAR, VARCHAR or NVARCHAR <= 255 bytes           DF_ASCII
    CHAR, VARCHAR or NVARCHAR > 255 bytes            DF_TEXT
    VARCHAR(MAX), NVARCHAR(MAX), TEXT                DF_TEXT
    BINARY, VARBINARY, VARBINARY(MAX) or IMAGE       DF_BINARY

    Flex2SQL has an option that you can use to specify the SQL native data type that should be used when converting a table. In the Flex2SQL menu, click on Tools -> Preferences and open the TAB Field Mappings. Assign new data types for the DF_TEXT and DF_BINARY fields and migrate you DAT file.

    Let us say that you assigned a VARCHAR and a VARBINARY SQL data type to the DF_TEXT and DF_BINARY respectively as showed by the following image:

    Default type

    Default type

    When you migrate a DAT file that has a DF_TEXT field, Flex2SQL will create it as a VARCHAR column when the field length is smaller than or equal to 8000 bytes. But if your DF_TEXT field is greater than 8000 bytes, Flex2SQL will create it as a VARCHAR(MAX) column instead.

    Any way you choose to map DF_TEXT, the driver will handle it, but if speed is a primary concern, then we suggest trying to use the regular CHAR, VARCHAR, or NVARCHAR column types, whenever possible.

    Current Driver Releases

    August 27th, 2009

    The current driver releases for all of Mertech’s Flex2SQL products is as follows:

    Flex2SQL for MySQL
    (ver 10.0.3.29 updated 3/23/10)
     
     Flex2SQL for PostgreSQL 8
    (ver 10.0.2.33 updated 3/23/10) 
     
    Flex2SQL for Oracle (ver 10.0.3.40 updated 3/23/10)

    Flex2SQL for SQL Server 2008 and 2005 (ver 10.0.4.28 updated 3/23/10)

    BTR2SQL for Oracle (ver 4.2.1.6 updated 3/12/10)
     
    BTR2SQL for MS SQL Server 2008 and 2005 (ver 4.2.1.6 updated 3/12/10) 
    Flex2SQL for DB2 (ver 8.1.0.17 updated 2/21/06)