Online File and Folder Analysis






Types of File and Folder Analysis

When it comes to analyzing a specific set of files and folders, there are different types of analysis one might be interested in. For example:

  • The structure of folders (or directories) and their files
  • The types of files within folders, along with their extensions and file sizes
  • The data contained within the files
  • Some combination, or all, of the above
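As a rough illustration of the second kind of analysis, the sketch below walks a directory tree and totals file counts and sizes per extension. It uses only the Python standard library; the function name summarize_by_extension is hypothetical and is not part of our tool.

```python
import os
from collections import defaultdict


def summarize_by_extension(root):
    """Walk `root` recursively; return {extension: (file_count, total_bytes)}."""
    totals = defaultdict(lambda: [0, 0])
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            # Lowercase the extension so ".PDF" and ".pdf" are counted together.
            extension = os.path.splitext(name)[1].lower()
            try:
                size = os.path.getsize(full_path)
            except OSError:
                continue  # unreadable or vanished file; skip it
            totals[extension][0] += 1
            totals[extension][1] += size
    return {ext: (count, size) for ext, (count, size) in totals.items()}
```

Files without an extension are grouped under the empty string, which is often worth a closer look before a migration.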

Analyzing files and content for migration purposes

For the purposes of an enterprise or system / data migration, all of the above are likely of interest. However, it can be quite difficult to develop a generic tool that ingests the data inside every type of file, because the files that matter differ from one industry and system to the next. In the legal industry, for example, analyzing emails, Word documents, PDF files and other document types might be of interest. In the manufacturing industry, analyzing engineering bills of material and/or engineering diagrams would likely be more important than analyzing the contents of emails.

While different data and content migrations may require different types of analysis for the data contained within the files, we believe that all migrations, without exception, can benefit from an analysis that describes the structure of the folders and their files, the types of files within each folder, and each file's extension, size and checksum.

Having this information is essential before beginning any kind of enterprise data migration. Over many years of experience, we have found that any enterprise data migration must begin with an analysis of the data source(s). To that end, we have created a tool that helps you perform this pre-migration analysis of a data source. Note that the standard ETL process generally does not include this analysis step; feel free to check out our article on Misunderstanding ETL to learn more.

How to analyze files and folders recursively

Our solution is a tool that does the following. To build your own you'll need intermediate coding skills; alternatively, you can use our free tool below as a sample, or contact us if you still need help.

  1. First, create a new database to store the ingested data set for the file and folder analysis.
  2. Next, create two new tables, one called file and one called rootpath, using the following table schemas:
      -- One row per analyzed root directory
      CREATE TABLE `rootpath` (
        `id` int(11) NOT NULL AUTO_INCREMENT,
        `pathToRoot` varchar(500) DEFAULT NULL,
        PRIMARY KEY (`id`)
      ) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8;

      -- One row per file found beneath a root directory
      CREATE TABLE `file` (
        `id` int(11) NOT NULL AUTO_INCREMENT,
        `fileExists` tinyint(1) DEFAULT NULL,
        `rootpath_id` int(11) DEFAULT NULL,
        `filePath` varchar(300) DEFAULT NULL,
        `fileName` varchar(100) DEFAULT NULL,
        `baseName` varchar(100) DEFAULT NULL,
        `extension` varchar(10) DEFAULT NULL,
        `fileSize` int(11) DEFAULT NULL,
        `checkSum` varchar(32) DEFAULT NULL,
        `processed` tinyint(1) DEFAULT NULL,
        PRIMARY KEY (`id`),
        KEY `FK_rp_rootpathid` (`rootpath_id`),
        KEY `rootPathId_filePath_fileName` (`rootpath_id`,`filePath`,`fileName`),
        KEY `fileName_checkSum` (`fileName`,`checkSum`),
        KEY `filename` (`fileName`),
        CONSTRAINT `FK_rp_rootpathid` FOREIGN KEY (`rootpath_id`) REFERENCES `rootpath` (`id`)
      ) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8;

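  3. For each file found within the input directory (including all of its sub-directories), insert a row into the file table that references the corresponding rootpath row.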
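The per-file step can be sketched roughly as follows. This is a minimal illustration rather than the code behind our tool: the helper names md5_of_file and scan_files are hypothetical, the checksum is assumed to be an MD5 hex digest (which happens to fit the 32-character checkSum column), and filePath is assumed to hold the directory relative to the root. Each yielded dictionary uses the column names of the file table, so it could be handed to whichever MySQL client library you prefer for the actual insert.

```python
import hashlib
import os


def md5_of_file(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def scan_files(root, rootpath_id):
    """Yield one dict per file under `root`, keyed by the file table's columns."""
    for dirpath, _dirnames, filenames in os.walk(root):
        # Assumption: filePath stores the directory relative to the root.
        relative_dir = os.path.relpath(dirpath, root)
        if relative_dir == ".":
            relative_dir = ""
        for name in sorted(filenames):
            full_path = os.path.join(dirpath, name)
            base_name, extension = os.path.splitext(name)
            yield {
                "fileExists": 1,
                "rootpath_id": rootpath_id,
                "filePath": relative_dir,
                "fileName": name,
                "baseName": base_name,
                "extension": extension.lstrip("."),
                "fileSize": os.path.getsize(full_path),
                "checkSum": md5_of_file(full_path),
                "processed": 0,
            }
```

Reading files in chunks keeps memory use flat even for large files. Note that names, extensions or sizes that exceed the column definitions above would need to be truncated or the columns widened.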

Free Online Tool To Analyze Your Files and Folders

Feel free to use the following tool to analyze your dataset. To use it, simply zip up your data and upload it. Your Zip archive may contain files and folders/directories nested to any depth. We will recursively ingest the data and present a database table explorer that lets you query it. Note that there may be a limit on the upload size, so you won't be able to upload hundreds of megabytes or gigabytes of data. If you need a larger dataset analyzed, contact us and let us know - we'd be willing to help!
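If you would like to preview what an archive contains before uploading it, a sketch along these lines lists every file entry and its uncompressed size without extracting anything. The function name list_archive_files and the 50 MB default limit are hypothetical placeholders; they do not reflect the actual limit or implementation of our tool.

```python
import zipfile


def list_archive_files(archive, max_total_bytes=50 * 1024 * 1024):
    """Return [(path_in_archive, uncompressed_size)] for each file entry.

    `archive` may be a path or a binary file-like object. Raises ValueError
    once the running uncompressed total exceeds `max_total_bytes`.
    """
    entries = []
    total = 0
    with zipfile.ZipFile(archive) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue  # directory entries carry no data of their own
            total += info.file_size
            if total > max_total_bytes:
                raise ValueError("archive exceeds the size limit")
            entries.append((info.filename, info.file_size))
    return entries
```

Nested folders show up as path prefixes in the entry names (for example sub/b.csv), so the folder structure can be reconstructed from the list alone.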


Do you need enhanced tools or additional help?

Do you need any additional help, or would you like a customized version of this tool with more features for your business, website or server? Help us help you - contact us and tell us a little bit about your project and let us know what you need. We are confident we can help!