Importing (Multiple) CSV Files To MySQL






Methods to import CSV files to database

The most common and efficient method for importing a (single) CSV file to a database such as MySQL is using the built-in MySQL "LOAD DATA INFILE" command. There are other methods, such as using a programming language to read the CSV data into memory, and then constructing INSERT queries for each of the records, but this method is generally only useful when the objective is to only insert certain records based on certain conditions. If the objective is to import the entire CSV file into a database table, then MySQL's LOAD DATA INFILE command is simpler, faster and more efficient.

But - what if we want to insert multiple, or in some cases, hundreds or even thousands (or more) CSV files into a database table. How can we achieve that? Well first, let's look at why it's important and why it might make sense to do so.

Why should you import multiple CSV files to MySQL?

As an anecdote, in my 10+ years of experience performing enterprise data migrations for Parts Catalog software, any time the legacy system data was provided to us in CSV file format, it was easy to work with, in general. For example, converting a Bill of Material (BOM) file (which is basically a list of part numbers, their names, a quantity, and item number) from the CSV file into, in our case, an XML file, was quite straight forward. However, I quickly learned that even enterprise customer data was not always clean and consistent, and perhaps more times than not, was quite inconsistent. Using a standard Extract, Transform and Load (ETL) process to handle the data turned out to be catastrophic. Feel free to check out our Misunderstanding ETL page for more information. To summarize though, inconsistent data must first be analyzed.

If we want to analyze data from hundreds or thousands of CSV files though, it can be quite a challenge. For example, if we want to see what part numbers, across all of the CSV files, do not conform to a certain pattern, we would have to write that condition into our conversion process to look for it. And then if we want to look for some other condition, we'd have to write analysis and conditional code for that as well. Having all of the CSV data ingested into one or more MySQL database tables offers us the power of SQL to analyze our data!

Following our example from above, if we now want to look for certain conditions on the data, it is as simple as querying for it. For example, looking for all part numbers, across 10,000 CSV files, that are less than 10 characters in length, or that do not have a hyphen, would require custom code. However, having all of the parts data, from the 10,000 CSV files ingested into a MySQL database table, makes it as simple as:

							select * from [tableName] where length(partNumber) < 10;
						
							select * from [tableName] where partNumber not like '%-%';
						

Thus, if the goal is to "know" what kind of data you're dealing with, and if you want to be able to analyze your data that is spread across hundreds or thousands of CSV files, ingesting them into a database table is the way to go; and furthermore, if you can do this with an automated tool, even better!

Great! Now - How do I import all my CSV file data into MySQL?

For our solution, we approached it by writing a tool that does the following (you'll need intermediate coding skills, or you can use our free tool below as a sample, or if you still need help, feel free to contact us). Note that our tool assumes that all CSV files are similar. Columns across multiple CSV files can be in different locations, and CSV files can have more or less columns, but they will all be loaded into a single Database Table.

  1. First, create a new empty MySQL database.
  2. Then, create a new output directory folder. This is where you will be copying all of your CSV files.
  3. Next, depending on where your CSV files are located, you may or may not need to write a recursive method to aggregate them. If you have a nested folder structure with an unknown N depth of sub-directories, do the following:
    • Write a recursive method that takes an absolute path to your dataset's root directory.
    • Iterate through the folder looking for CSV files.
      • Foreach CSV file found, get the CSV file's checksum. Copy the CSV file to your output dir from the 2nd step, but copy to the output dir using the file's checkSum. For example, a file called part.csv with a checksum of af84adb3a64dc4187b4e279156f476b8 would be copied as /my/output/dir/af84adb3a64dc4187b4e279156f476b8.csv
      • For each sub-directory, recurse
  4. If you did not have nested folder, simply copying all of your CSV files into the output directory, even manually, is okay.
  5. NOTE - it's not necessary to rename to the checkSum, as is required when having to recurse. It's only required when recursing sub-directories so that we do not have a collision in fileNames when copying a CSV file. We have found that /some/dir/part.csv may be completely different than another file /another/dir/part.csv, where the byte size and checkSum of the files are different!
  6. Create a new table in your database, with a single column called lineNum, make it auto-increment and a primary key
  7. Now that we have all of our CSV file data aggregated, we can begin writing an ingestion tool.
  8. Loop through your aggregated list of CSV files
    • Foreach CSV File
    • Open the CSV file and create a multi-dimensional array or map containing each row and column.
      • If your data has a header row, meaning the field names are in the first row, that's great. Store these values into memory for each column position, and follow the steps below:
        • For each header row field
        • Write a method to check if your table already contains this field. If it does, do nothing.
        • If your table does not already have this field (which for the first CSV file being ingested, it shouldn't), use ALTER TABLE to create this column in your table. Set the max length to any arbitrary value (note that it will be updated later when we check each cell's size)
      • If your data does not have a header row, we will need to use generic field names in the database, eg Field1, Field2, Field3 etc. For the case where your CSV data does 'not' have a header row, use the steps below instead:
        • For each column in the first row of the CSV file:
        • Write a method to check if your table already contains this field. If it does, do nothing.
        • If your table does not already have this field (which for the first CSV file being ingested, it shouldn't), use ALTER TABLE to create this column in your table, and give it a name of "FIELD" + position. For example, the first column of data will be Field1, then Field2, etc. Set the max length to any arbitrary value (note that it will be updated later when we check each cell's size)
      • Loop through each subsequent row in the CSV file, and for each field:
        • Check the cell's size. If the size is greater than the database table's current column size, then use ALTER TABLE to increase it. For example, if we have a column Name VARCHAR(10), but we found that row 3 in the CSV data for the field Name is "Jonathan A Johnson", which is 18 character, we need to update our column to be a VARCHAR(18).
    • Now that we have a table, containing fields with a max size that can support all of the data from the CSV file, we can use LOAD DATA INFILE to load the data!

Free tool to batch import CSV files to MySQL

Feel free to use the following tool to batch import your CSV files into a database table and run queries against the data! You can even export the table data when it's complete! NOTE: This online and publicly available version of the tool has a 500 CSV file limit. If you need to ingest more than 500 files, send us an email and let us know.


Do you need enhanced tools or additional help?

Do you need any additional help, or would you like a customized version of this tool with more features for your business, website or server? Help us help you - contact us and tell us a little bit about your project and let us know what you need. We are confident we can help!