Accessing data records from Excel and CSV files in VB.NET

One of the first projects I undertook when I started at my current workplace was to produce a data migrator that ported user data records from a client system to one of our systems. This was written in VB.NET (for compatibility reasons) and the data sources were both Microsoft SQL Server 2005 systems. Recently I’ve had to make a few changes to it to allow the client data source to be either an Excel spreadsheet or a CSV text file. Having never worked with these methods before, I had to look it up (via Google of course) and thought I’d share with you the methods of doing this.

ConnectionStrings

The first component required here is the .NET ConnectionString. Thankfully file formats can use the same OLE Provider for Microsoft Jet and looks as follows:

Excel

Provider=Microsoft.Jet.OLEDB.4.0;Data Source=C:\UserDetails.xls;Extended Propert
ies="Excel 8.0;HDR=Yes;IMEX=1"

CSV

Provider=Microsoft.Jet.OLEDB.4.0;Data Source=C:\;Extended Properties="text;HDR=Yes;FMT=Delimited"

Note: in each case, HDR= indicates that the source file contains a header row. If it doesn’t, it should be set to “No” and the SQL queries will need to be written differently. More on this later.

Making the connection

Opening the connection to the file couldn’t be easier, and the following code illustrates this:

Dim oleConn As OleDb.OleDbConnection = OleDb.OleDbConnection(connectionString)
oleConn.Open()

And this works for either Excel or CSV files.

Forming the queries

Naturally you can use standard SQL commands to query the contents of the file, since it is now acting as a database. The main difference lies in the table name, or what substitutes for the table name.

Excel

With an Excel file, the name of the worksheet within the Excel file needs to be targetted, contained within square brackets, with a dollar sign as the last but one character, e.g. [UserDetails$]

So a simple SQL command to extract names from a user details Excel file with a header row, with the worksheet name of UserDetails would be:

SELECT first_name, last_name FROM [UserDetails$]

where first_name and last_name are the names of the columns within the worksheet.

CSV

For a CSV file, the actual file name is used as the table name. So the same query above used against a similiar CSV file, would look like the following:

SELECT first_name, last_name FROM user_details.csv

No header data

For both Excel and CSV files, if the connection string contains the setting HDR="No", it indicates that there is no header row contained within the file. If this is the case, the columns are selected in order:

SELECT F1, F2 FROM [UserDetails$]

You can also name these fields as first_name and last_name (or anything else for that matter!) as follows:

SELECT F1 AS first_name, F2 AS last_name FROM [UserDetails$]

Performing the query

To actually perform the query itself, and run the commands shown above, you require an OleDbCommand to contain the actual command, and a OleDbDataReader to retrieve the results from the open OLE connection.

To prepare the command:

Dim command AS String = "SELECT first_name, last_name FROM [UserDetails$]"
Dim oleCmd AS OleDb.OleDbCommand = New OleDb.OleDbCommand(command, oleConn)

And then to actually run it and receive the data:

Dim oleDR AS oleDB.oleDataReader = oleComm.ExecuteReader()
While oleDR.Read() Then
' read in a string
oleDR.ReadString(0)
End While

Tidy up

Of course always remember to tidy up after yourself:

oleDR.Close()
oleConn.Close()

And that’s it really. I know it’s a very short and to the point post, but it’s so simple I didn’t think it required anything more other than a statement of the facts and some code!

Of course if you have any questions or require any further clarification on anything, please do post a comment and let me know and I will do what I can.