MySQL guide

This is an article I wrote for Stack Overflow for people using MySQL, a very popular, free alternative to SQL Server.

Introduction

These are the fruits of many hours and many projects with a great free database. Among other things, I develop application data servers on Windows platforms, mostly with mainstream business SQL servers, but I have done a good deal of work with MySQL and have written several libraries against it.

When I first began using it in the early 1990s, it was not great, but today it is used by half of the Internet as part of the LINUX LAMP stack. It has earned its place.

The worst MySQL mess I had to straighten out was the ultimate MySQL legacy database nightmare. This required a series of applications to process the tables into something usefull using many of the tricks mentioned here.

In general, the SQLs I tune require using sub-queries. In the Oracle database world, things I took for granted weren’t working the same with MySQL. My reading on MySQL tuning makes me conclude that MySQL is behind Oracle in terms of optimizing queries. 

While the simple queries required for most B2C applications may work well for MySQL, most of the aggregate reporting-type of queries needed for Intelligence Reporting seem to require a fair bit of planning and re-organizing of the SQL queries to guide MySQL to execute them faster.

Administration

max_connections is the number of concurrent connections. The default value is 100 connections (151 since 5.0) -- very small.

Note: Connections consume memory, and your OS might not be able to handle a lot of connections. 

MySQL binaries for Linux/x86 allow you to have up to 4096 concurrent connections, but self-compiled binaries often have a lower limit.

Set table_cache to match the number of your open tables and concurrent connections. Watch the open_tables value, and if it is growing quickly, you need to increase its size.

Note: The two parameters discussed above may require many open files, so 20+max_connections+table_cache*2 is a good estimate of what you need. MySQL on Linux has an open_file_limit option, so set this limit.

If you have complex queries, sort_buffer_size and tmp_table_size are likely to be very important. Values will depend on the query complexity and available resources, but 4Mb and 32Mb, respectively, are recommended starting points.

Note: These are "per connection" values, among read_buffer_size, read_rnd_buffer_size, and some others, meaning that this value might be needed for each connection. So consider your load and available resource when setting these parameters. For example, sort_buffer_size is allocated only if MySQL needs to do a sort. Also note: Be careful not to run out of memory.

If you have many connects established (i.e., a web site without persistent connections), you might improve performance by setting thread_cache_size to a non-zero value, with 16 a good starting value. Increase the value until threads_created does not grow very quickly.

PRIMARY KEY

There can be only one AUTO_INCREMENT column per table, it must be indexed, and it cannot have a DEFAULT value.

KEY is normally a synonym for INDEX. The attribute PRIMARY KEY can also be specified as just KEY when given in a column definition. This was implemented for compatibility with other database systems.

A PRIMARY KEY is a unique index where all key columns must be defined as NOT NULL

If a PRIMARY KEY or UNIQUE index consists of only one column that has an integer type, you can also refer to the column as "_rowid" in SELECT statements.

In MySQL, the name of a PRIMARY KEY is PRIMARY.

Currently, only InnoDB (v5.1?) tables support foreign keys.

Usually, you create all the indexes you need when you are creating tables. Any column declared as PRIMARY KEY, KEY, UNIQUE, or INDEX will be indexed.

NULL means "not having a value". To test for NULL, you cannot use the arithmetic comparison operators such as =, <, or <>. Use the IS NULL and IS NOT NULL operators instead.

NO_AUTO_VALUE_ON_ZERO suppresses auto increment for 0 so that only NULL generates the next sequence number. This mode can be useful if 0 has been stored in a table's AUTO_INCREMENT column. (Storing 0 is not a recommended practice, by the way.)

To change the value of the AUTO_INCREMENT counter to be used for new rows, use:

ALTER TABLE mytable AUTO_INCREMENT = value; 

or: SET INSERT_ID = value.

Unless otherwise specified, the value will begin with 1000000, or specify it thus:

...) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=1

TIMESTAMPs

Values for TIMESTAMP columns are converted from the current time zone to UTC for storage and from UTC to the current time zone for retrieval.

http://dev.mysql.com/doc/refman/5.1/en/timestamp.html For one TIMESTAMP column in a table, you can assign the current timestamp as the default value and the auto-update value.

One thing to watch out for when using one of these types in a WHERE clause. It is best to do WHERE datecolumn = FROM_UNIXTIME(1057941242) and not WHERE UNIX_TIMESTAMP(datecolumn) = 1057941242. Doing the latter won't take advantage of an index on that column.

http://dev.mysql.com/doc/refman/5.1/en/date-and-time-functions.html

 UNIX_TIMESTAMP() 
 FROM_UNIXTIME
()
 UTC_DATE
()
 UTC_TIME
()
 UTC_TIMESTAMP
()

If you convert a datetime to a Unix timestamp in MySQL--
and then add 24 hours to it--
and then convert it back to a datetime--it magically loses an hour!

Here's what's happening. When converting the Unix timestamp back to a datetime, the timezone is taken into consideration, and it just so happens that between the 28th and the 29th of October 2006, we went off daylight savings time and lost an hour. 

Beginning with MySQL 4.1.3, the CURRENT_TIMESTAMP(), CURRENT_TIME(), CURRENT_DATE(), and FROM_UNIXTIME() functions return values in the connection's current time zone, which is available as the value of the time_zone system variable. In addition, UNIX_TIMESTAMP() assumes that its argument is a datetime value in the current time zone.

The current time zone setting does not affect values displayed by functions such as UTC_TIMESTAMP() or values in DATE, TIME, or DATETIME columns.

NOTE: ON UPDATE updates the DateTime ONLY if a field is changed. If an UPDATE results in no fields being changed, then the DateTime is NOT updated!

Additionally, the first TIMESTAMP is always AUTOUPDATE by default even if not specified as such.

When working with dates, I almost always convert to Julian dates because data math is then a simple matter of adding or subtracting integers, and the same is true of seconds since midnight for the same reason. It is rare that I need time resolution of finer granularity than seconds.

Both these can be stored as a 4 byte integer and, if space is really tight, can be combined into Unix time (seconds since the epoch 1/1/1970) as an unsigned integer which will be good until about 2106 as:

' secs in 24Hrs = 86400

' Signed integer max val = 2,147,483,647 -- can hold 68 years of seconds 

' Unsigned integer max val = 4,294,967,295 -- can hold 136 years of seconds

Binary Protocol

MySQL 4.1 introduced a binary protocol that allows non-string data values to be sent and returned in native format without conversion to and from string format. (Very useful.)  

By the way, mysql_real_query() is faster than mysql_query() because it does not call strlen() to operate on the statement string. 

http://dev.mysql.com/tech-resources/articles/4.1/prepared-statements.html The binary protocol supports server-side prepared statements and allows transmission of data values in native format. The binary protocol underwent quite a bit of revision during the earlier releases of MySQL 4.1.

You can use the IS_NUM() macro to test whether a field has a numeric type. Pass the type value to IS_NUM(), and it evaluates to TRUE if the field is numeric.

One thing to note is that binary data CAN be sent inside a regular query if you escape it and remember that MySQL requires that only backslash and the quote character be escaped. So that is a really easy way to INSERT shorter binary strings,  such as  encrypted/salted passwords, for example.

Master Server

http://www.experts-exchange.com/Database/MySQL/Q_22967482.html

http://www.databasejournal.com/features/mysql/article.php/10897_3355201_2

GRANT REPLICATION SLAVE ON to slave_user IDENTIFIED BY 'slave_password'

#Master Binary Logging Config  STATEMENT causes replication 
             
to be statement-based -  default

log-bin
=Mike
binlog-format
=STATEMENT
server-id
=1            
max_binlog_size
= 10M
expire_logs_days
= 120    


#Slave Config
master-host
=master-hostname
master-user
=slave-user
master-password
=slave-password
server-id
=2

Binary log file must read items:

http://dev.mysql.com/doc/refman/5.0/en/binary-log.html

http://www.mydigitallife.info/2007/10/06/how-to-read-mysql-binary-log-files-binlog-with-mysqlbinlog/

http://dev.mysql.com/doc/refman/5.1/en/mysqlbinlog.html

http://dev.mysql.com/doc/refman/5.0/en/binary-log.html

http://dev.mysql.com/doc/refman/5.1/en/binary-log-setting.html

You can delete all binary log files with the RESET MASTER statement or a subset of them with PURGE MASTER.

    --result-file=binlog.txt TrustedFriend-bin.000030

Normalization

http://dev.mysql.com/tech-resources/articles/intro-to-normalization.html

UDF functions

http://www.koders.com/cpp/fid10666379322B54AD41AEB0E4100D87C8CDDF1D8C.aspx

http://souptonuts.sourceforge.net/readme_mysql.htm

DataTypes

http://dev.mysql.com/doc/refman/5.1/en/storage-requirements.html

http://www.informit.com/articles/article.aspx?p=1238838&seqNum=2

http://bitfilm.net/2008/03/24/saving-bytes-efficient-data-storage-mysql-part-1/

One thing to note is that on a mixed table with both CHAR and VARCHAR, mySQL will change the CHARs to VARCHARs.

RecNum integer_type UNSIGNED NOT NULL AUTO_INCREMENT, PRIMARY KEY (RecNum)

MySQL always represents dates with the year first, in accordance with the standard SQL and ISO 8601 specifications.

Miscellaneous

Turing off some MySQl functionality will result in smaller data files and faster access. For example:

    --datadir will specify the data directory and

    --skip-innodb will turn off the inno option and save 10-20M.

See more here: http://dev.mysql.com/tech-resources/articles/mysql-c-api.html.

Download Chapter 7 -- free.

InnoDB is transactional, but there performance overhead comes with it. I have found MyISAM tables to be sufficient for 90% of my projects. Non-transaction-safe tables (MyISAM) have several advantages of their own, all of which occur because 

there is no transaction overhead:

    Much faster

    Lower disk space requirements

    Less memory required to perform updates

Each MyISAM table is stored on disk in three files. The files have names that begin with the table name and have an extension to indicate the file type. An .FRM file stores the table format. The data file has a .MYD (MYData) extension. The index file has an .MYI (MYIndex) extension. 

These Files can be copied to a storage location intact without using the MySQL Administrators Backup feature which is time consuming. (So is the restore.)

The trick is make a copy of these files, then DROP the table. When you put the files back, MySQl will recognize them and update the table tracking.

If you must Backup/Restore,

Restoring a backup or importing from an existing dump file can takes a long time depending on the number of indexes and primary keys on each table. You can speed up this process dramatically if you modify your original dump file by surrounding it with the following:

SET AUTOCOMMIT = 0;
SET FOREIGN_KEY_CHECKS=0;

... your dump file ...

SET FOREIGN_KEY_CHECKS = 1;
COMMIT;
SET AUTOCOMMIT = 1;

To vastly increase the speed of the reload, add the SQL command SET AUTOCOMMIT = 0; at the beginning of the dump file, and add the COMMIT; command to the end.

By default, autocommit is on, meaning that each and every insert command in the dump file will be treated as a separate transaction and written to disk before the next one is started. If you don't add these commands, reloading a large database into InnoDB can take many hours.

The maximum size of a row in a MySQL table is 65,535 bytes.

The effective maximum length of a VARCHAR in MySQL 5.0.3 and later = maximum row size (65,535 bytes).

VARCHAR values are not padded when they are stored. Trailing spaces are retained when values are stored and retrieved, in conformance with standard SQL.

CHAR and VARCHAR values in MySQL are compared without regard to trailing spaces.

Using CHAR will speed up your access only if the whole record is fixed size. That is, if you use any variable size object, you might as well make all of them variable size. You gain no speed by using a CHAR in a table that also contains a VARCHAR.

The VARCHAR limit of 255 characters was raised to 65535 characters as of MySQL 5.0.3.

Full-text searches are supported for MyISAM tables only.

http://dev.mysql.com/doc/refman/5.0/en/fulltext-search.html

BLOB columns have no character set, and sorting and comparison are based on the numeric values of the bytes in column values.  

If strict SQL mode is not enabled and you assign a value to a BLOB or TEXT column that exceeds the column's maximum length, the value is truncated to fit, and a warning is generated.

Useful Commands

check strict mode:

    SELECT @@global.sql_mode;

turn off strict mode:

    SET @@global.sql_mode= '';

    SET @@global.sql_mode='MYSQL40'

or remove:

    sql-mode="STRICT_TRANS_TABLES,...

SHOW COLUMNS FROM mytable

SELECT max(namecount) AS virtualcolumn FROM mytable ORDER BY virtualcolumn

http://dev.mysql.com/doc/refman/5.0/en/group-by-hidden-fields.html

http://dev.mysql.com/doc/refman/5.1/en/information-functions.html#function_last-insert-id last_insert_id()

gets you the PK of the last row inserted in the current thread; max(pkcolname) gets you last PK overall. 

Note: if the table is empty, max(pkcolname) returns 1. mysql_insert_id() converts the return type of the native MySQL C API function mysql_insert_id() to a type of long (named int in PHP). 

If your AUTO_INCREMENT column has a column type of BIGINT, the value returned by mysql_insert_id() will be incorrect. Instead, use the internal MySQL SQL function LAST_INSERT_ID() in an SQL query.

http://dev.mysql.com/doc/refman/5.0/en/information-functions.html#function_last-insert-id

Just a note that when you’re trying to insert data into a table and you get the error:

Unknown column the first bit of data what you want to put into the table in field list

when using something like:

INSERT INTO table (this, that) VALUES ($this, $that)

the error is caused by not placing apostrophes around the values you’re trying to stick into the table. So you should change your code to:

INSERT INTO table (this, that) VALUES ('$this', '$that') 

Reminder: Quotation marks (") are used to define MySQL fields, databases, or tables, not values.  ;)

Lost connection to server during query:

http://dev.mysql.com/doc/refman/5.1/en/gone-away.html

http://dev.mysql.com/doc/refman/5.1/en/packet-too-large.html

http://dev.mysql.com/doc/refman/5.0/en/server-parameters.html

http://dev.mysql.com/doc/refman/5.1/en/show-variables.html

http://dev.mysql.com/doc/refman/5.1/en/option-files.html

http://dev.mysql.com/doc/refman/5.1/en/error-log.html

Tuning Queries

http://www.artfulsoftware.com/infotree/queries.php?&bw=1313


Comments