Flat file connections in SSIS 2008 fail completely when faced with double-quote characters embedded in csv file data fields. They don't provide any way to "escape" them, and they error-out on any row containing them.
If you Google around a bit you'll see lots of references to the problem. The solutions I've seen are to pre-process the file, roll your own imput procedure, or use the Microsoft downloadable utility Undouble.
There is another option which is low-pain and will look like an old friend if you've been munging data for a while. The solution is to create a Microsoft Jet OLEDB 4.0 data source with its associated schema.ini file. This allows escaping a double-quote by doubling it (a double double quote). Conveniently if you happen to be migrating, this was the standard csv-file behavior in SQL Server 2000.
To set up data source first build your schema.ini file. This has to be in the same directory as your csv (I think). Schema files look a little primitive and they're completely non-integrated but they're dependable and effecient. Good schema.ini documentation at:
http://msdn.microsoft.com/en-us/library/ms709353%28VS.85%29.aspx
http://support.microsoft.com/kb/149090/en-us
schema.ini will have your file name in square brackets an then the file and column parameters. Here's an example:
[myfile.csv]
ColNameHeader=False
Format=CSVDelimited
MaxScanRows=25
CharacterSet=OEM
Col1=myfile_ID Integer
Col2=cust_nm Char Width 254
Col3=cust_type Char Width 4
Col4=cust_rgn Char Width 2
Col5=lang_id Integer
If you have multiple csv files in the same directory you'll be importing, you'll have one of these sections in schema.ini for each file format. Conceptually the schema file defines a database and each of the source files is a "table".
Next, in BIDS, create a new OLEDB data source. This is the part that is apparently completely undocumented. Since almost all the information is in your schema.ini, your data source definition is very simple.
- Click New OLEDB Connection
- Click the New button
- For the Provider select "Native OLE DB\Microsoft Jet 4.0 OLE DB Provider"
- Click "All" over on the left.
- For "Extended Properties" enter "Text". DO NOT put Text in quotes. Some of the documentation says to. That's the way you'd do it in a normal OLE DB connection string. But it's not how SSIS likes it. If you put the the extended properties in quotes you'll get the exquisitely unhelpful message "Could not find installable ISAM".
- Now scroll almost to the bottom and enter the name of the datafile directory under "Data Source". This directory also contains the schema.ini file.
- At this point if you press "Test Connection" it should be good.
- Press Ok and Ok, then rename your data connection appropriately. It might be a good idea to use a generic name since it will be one connection for all the files in a directory.
- Next create an OLE DB source in your Data Flow.
- For "OLE DB connection manager" select the source you just created.
- For "Data access mode" you can either pick "Table or view" or "SQL command".
- Next pick the name of the table or define a SQL statement.
- Pressing Preview will show you your columns
Tada! You've done it!
Caveats:
Long is documented as a valid type for a schema.ini definition. It doesn't work. Use Float as a substitute. I know this doesn't seem very good, but it works. You can add a Derived Column transform to cast them to DT_I8 or DT_U8. Short works fine, btw.
DateTime can be read successfully, but I frequently do a Derived Column transform if the incoming date needs formating or editing. This is for an input format that looked like: "2007-09-01-14.16.59.962000"
((LEN(TRIM(MyDate)) > 0) && ((SUBSTRING(MyDate,1,1) == "2") || (SUBSTRING(MyDate,1,2) == "19")) ? (DT_DATE)REPLACE(SUBSTRING(MyDate,1,10) + " " + SUBSTRING(MyDate,12,2) + ":" + SUBSTRING(MyDate,15,2) + ":" + SUBSTRING(MyDate,18,2) + SUBSTRING(MyDate,20,4)," ::","") : NULL(DT_DATE))