SSIS, MySQL, and invalid Dates–Take #2

A while back I posted an article on handling dates in select statements from MySQL. The idea was that I wanted a quick way to generate select statements for MySQL that would automatically set any invalid date to NULL. (an invalid date being one such as “0000-00-00” or “1973-02-00”)  After some more trial and error, I realized that this script was incomplete. I’ve created an updated script that should better handle those edge cases.

The script is largely the same as the former, but I’ve added handling for invalid years or months or days. We don’t want to redirect the rows for further error handling. We want to treat the dates that won’t successfully import as NULL. This could be done several ways within SSIS, but handling the transform within the select statement reduces our need to write many transforms to handle each date time for each table.  (I suspect that BIML could be my friend for this task, but I’m not quite ready to tackle this with BIML.)

  WHEN ordinal_position = 1 THEN ‘SELECT ‘ 
  ELSE ‘, ‘ 
, CASE data_type WHEN ‘date’
THEN CONCAT(‘CASE WHEN ‘, column_name, ‘= ”0000-00-00” THEN NULL ‘
, ‘WHEN ‘, column_name, ‘ LIKE ”%0000%” THEN NULL ‘
, ‘WHEN ‘, column_name, ‘ LIKE ”%-00%” THEN NULL ‘
, ‘else ‘, column_name, ‘ END as ‘, column_name)
WHEN ‘timestamp’
THEN CONCAT(‘CASE WHEN ‘, column_name, ‘= ”0000-00-00” THEN NULL ‘
, ‘WHEN ‘, column_name, ‘ LIKE ”%0000%” THEN NULL ‘
, ‘WHEN ‘, column_name, ‘ LIKE ”%-00%” THEN NULL ‘
, ‘else ‘, column_name, ‘ END as ‘, column_name)
WHEN ‘time’
THEN CONCAT(‘CAST(‘, column_name, ‘ AS CHAR(8) ) as ‘, column_name ) 
ELSE column_name
, CASE WHEN ordinal_position  maxord THEN  
  ELSE CONCAT(‘     FROM MySchema.’, c.table_name, ‘;’) 
  END) AS Select_Column_Name
FROM   information_schema.columns AS c
       JOIN (SELECT table_name,
                    Max(ordinal_position) AS MaxOrd
             FROM   information_schema.columns
             WHERE  table_schema = ‘MySchema’
             GROUP  BY table_name) AS t1
         ON c.table_name = t1.table_name
WHERE  table_schema = ‘MySchema’
ORDER  BY c.table_name, ordinal_position
LIMIT  0, 50000;


Tales of a Lazy DBA–MySQL, SSIS, and “0” Dates

We’ve recently been tasked with converting a lot of MySQL Data into our system from a multi-tenant MySQL Database. We don’t have a fixed schedule to actually execute the imports because they’ll be on a “per customer” basis.  Well, that sounded like a great task for SSIS.  I set up the ODBC driver, connected, set the initial “sql_mode” options for our connections to (‘mssql, allow_invalid_dates’) and started to work.
First problem we ran into with an ADO.NET connection to MySQL and writing a SELECT * from schema.table was when we hit a MySQL “Date” column containing a value of ‘0000-00-00’.  SSIS threw an error, not sure what to do.  Thanks to some others who have solved this problem, I realized that within the MySQL Select statement, we could do something like:

CASE date_created WHEN ‘0000-00-00’ THEN NULL else date_created END as date_created

to pass those dates over as NULL. That solves the implicit conversion to datetime (SQL 2005) and avoids the invalid dates.  I ran something similar for a “Time” column to:

CAST(Time_Created as char(8) ) as Time_Created

So that solved one particular table export with about 20 or so CAST and CASE statements.  Needless to say, I wasn’t looking forward to doing this for another 500 tables with a total of almost 6000 columns.
I finally set up a really basic query to generate most of the SELECT statements we would need to pull our MySQL data across without too much pain. Admittedly, a small step and I’d still need to copy/paste when I set up each new ADO.NET source, but it worked reasonably well. I’m adding the code snippet for MySQL here in case anyone else has a similar problem.

CONCAT(CASE WHEN ordinal_position = 1 THEN 'SELECT ' ELSE ', ' END,
 CASE Data_Type WHEN 'date'
 THEN CONCAT('CASE ',column_name,' WHEN ''0000-00-00'' THEN NULL else '
    ,column_name,' END as ',column_name)
 WHEN 'time' THEN CONCAT('CAST(',column_name,' AS CHAR(8) ) as ',column_name)
 ELSE column_name END, CASE WHEN Ordinal_Position  MaxOrd THEN ''
    FROM MySchema.', c.table_name) END
    ) as Select_Column_Name
from information_schema.columns as c
JOIN (select table_name, MAX(ordinal_position) as MaxOrd
        from information_schema.columns
        WHERE Table_Schema = 'MySchema'
        GROUP BY Table_Name) as t1
    ON c.Table_Name = t1.Table_Name
where table_schema = 'MySchema'
order by c.table_name, ordinal_position LIMIT 0, 50000;

I’d love to hear other ideas if anyone has encountered this before and come up with a more elegant solution for translating “0” Date or Time data from MySQL into SQL Server.