Database analysis tools help DBAs in identifying the discrepancies and illogical connections between data sets, so that they can be removed from the database. However, the capability of database analysis tools depend upon the platform on which database is built. For instance, small-scale relational database created on MS Access needs a whole lot of different DB analysis utility than one created using MySQL. If you’re looking for an all-in-one database analysis tool that can help you identify and clean redundant entries from the database, odds of you coming across a paid utility are high. Today, we discovered an open source, java-based utility called DataCleaner that implements a variety of database analysis and cleaning concepts including Data Quality, Master Data Management, Data Profiling, and Data Quality Monitoring, to remove all superfluous data sets off the specified data store. The application supports a wide range of database platforms and data files including CSV, MS Excel (XLS and XLSX), MS Access, SAS library, DBase database, Fixed Width File, XML, OpenOffice Database, MonoDB, MySQL, Oracle and MS SQL Server.
Additionally, it provides sample connection strings and other options to help you quickly create connections with your database. Once database is connected or data sets are retrieved from specified DB, it lets you view complete Metadata information of DB tables, enabling you to easily modify the required fields and rows. Moreover, it includes an extensive Filters feature that may help you view only the required part of database. You can apply numerous conditional filters on the database including Equals, Max rows, Null Check, Number range, Single word, String pattern match, and String value range. You have also the liberty to tweak with selected filters to meet the data mining requirements.
DataCleaner comes with an transformer feature that allows you to pre-process the data to extract, combine and generate separate values. The built-in Analyzer provides comprehensive Data Quality and Data Profiling operations to assist you in analyzing the content of selected Database. It includes a number of analysis features such as Boolean analyzer, Character set distribution, Data gap analyzer, Matching analyzer, Pattern finder, String analyzer, Weekday and Value distribution and more.
The main interface shows the supported datastore types. All you need is to pick the database type and configure the connection details. For example, if you want to clean and analyze MySQL database, enter Database name, Drive class name, connection string, and username & password. You can open Excel or Access database file by just specifying the complete path. Once connection is established, it adds the database to the list. Now, check the database you want to analyze, and click Analyze.
The left sidebar holds databases from selected datastore, allowing you to explore the datasets. You need to select the fields to perform analysis. Just double-click the fields you want to include, and it will list them in Source tab. You can check the datasets and their type, and remove any unnecessary fields.
Under Metadata tab, you will find all the meta data information, including Table, Column, Type family (based on stored data), Type, Native Type, Size, Nullable condition, Index status.
The Filters tab lets you include any aforementioned filters to extract specific information from columns. It lets you add multiple types of filters to the list, so that you can easily filter out the required data sets. Once you’ve added a filter, collapse the filter window, and then click Add filter to choose the type of filter you want to use.
Analyzer is probably the most useful feature of the application. As mentioned before, it allows you to select numerous kinds of analysis. Not only does it give you detailed analysis report, it also lets you create analysis graphs. Just select the required type of analysis, and fill in the required information. Once done, click Run analysis.
Upon click, it will start creating analysis report. The Progress information tab shows all the processes it run while generating the analysis.
You can double-click the graph button present in first field to plot the data on the graph. DataCleaner allows you to save the jobs. It creates an XML file that holds the analysis report. Analysis job files can be viewed by opening the XML file from File –> Open analysis job.
DataCleaner offers write data option during different stages of data analysis. For example. the Filters feature lets you write cleansed data to a separate file. You can export the filtered data to CSV file, create Excel spreadsheet, make staging table or insert the data directly into a table.
DataCleaner is a massive data analysis and cleaning utility that supports Windows, Linux, and Mac OS X. We have tried to cover the most salient features of the application, and might have missed out few useful tools and options. You can find detailed help manuals and usage guides from product page. It must be noted that it requires latest JRE (Java Runtime Environment) to work.