Developing better products faster
What's New in the Latest FormRules Release (v4)

The latest release of FormRules (September 2011) adds substantial new functionality to previous releases. The main change is the inclusion of Decision Trees for modeling outputs that take 'class values' rather than numerical ones. Highlights of the new release include:
Decision Trees for modeling 'classified' properties

If a property is classified (has text values), its model is now developed using a modified version of the ID3 Decision Tree algorithm that includes many of the C4.5 extensions, and a set of rules is generated from the resulting decision tree. The rules can be used exactly as derived from the tree, or pruned to reduce their number and complexity. ASMOD neurofuzzy models are still created for properties that take numeric values, as in earlier versions of FormRules. This change has resulted in changes to the Training and Consult screens, and a new 'classification plot' is provided for decision tree models. ANOVA statistics are provided for neurofuzzy models but are not appropriate for decision tree models; for those, the number of misclassifications the model makes is reported so that you can judge model quality.

Ability to handle more inputs and outputs, and more data records

The spreadsheet displayed on the Enter/Edit screen now allows more inputs/outputs and up to 50,000 experiments. This is a considerable advance over the previous limit of 150 inputs/outputs, and means that it is no longer necessary to add extra rows to the spreadsheet manually for data sets with many experimental records. Memory is now allocated dynamically, depending on how many data types (inputs/outputs) and experimental data records are used. Within the memory limitations of your PC, data sets of up to 500 data types and 50,000 data records can be used to train the models.
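The ID3 algorithm behind the Decision Trees feature chooses each split by information gain: the reduction in class-value entropy achieved by partitioning the records on an attribute. The following minimal Python sketch shows that criterion together with a simple misclassification count of the kind reported for decision tree models. It is not FormRules' implementation: the function names are hypothetical, attributes are assumed to be categorical, and C4.5 extensions such as gain ratio, continuous-attribute thresholds and rule pruning are omitted.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a list of class values."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(rows, labels, attribute):
    """ID3 gain from splitting `rows` on the categorical column `attribute`."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attribute], []).append(label)
    # Weighted entropy that remains after the split.
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def best_attribute(rows, labels, attributes):
    """Pick the attribute with the highest information gain (the ID3 choice)."""
    return max(attributes, key=lambda a: information_gain(rows, labels, a))


def misclassifications(predicted, actual):
    """Count records whose predicted class value differs from the actual one."""
    return sum(p != a for p, a in zip(predicted, actual))


# Hypothetical toy data: two categorical inputs per record, one class output.
rows = [("high", "yes"), ("high", "no"), ("low", "yes"), ("low", "no")]
labels = ["active", "active", "inactive", "inactive"]

print(best_attribute(rows, labels, [0, 1]))  # → 0
print(misclassifications(["active", "inactive", "inactive", "inactive"], labels))  # → 1
```

On the toy data the first attribute separates the class values perfectly, so it is chosen for the root split. A full tree builder would apply `best_attribute` recursively to each subset until the class values are pure or no attributes remain.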
Enter/Edit data screen enhancements

To make the data sheet easier to use, cells are now formatted to remove leading and trailing white space. The character "?" is now recognized as 'missing data' and is replaced by the missing data flag, -99999. Messages referring to errors in columns now report the column's alphabetic code (rather than its number), making the column easier to find.

When you press the Next button on the Enter/Edit Data screen, if the data set contains any text values, the list of those values is displayed in their encoded sequence. This makes it easier to spot potential typographical errors in class values. By default, classified text values are encoded in the order in which they are encountered in the data. For ordinal class values (such as low, medium, high) the default assignment may not make sense, so the new screen allows you to rearrange these values into a sensible sequence. There is also an option to convert classified text data into numeric values.

Class imbalance, where the data are overwhelmingly represented by one particular class value, is known to cause problems for some machine-learning algorithms such as ID3/C4.5. For example, a data set describing a certain biological activity may contain far fewer actives than inactives. In such cases standard algorithms may be dominated by the majority examples while the minority examples contribute very little. One way to address this imbalance is to rebalance the data using sampling techniques; here, for example, under-sampling could remove a number of the majority class examples. In FormRules Data Analysis, if you have far more examples of one class value than another, you can 'rebalance' the data, either by excluding data records of the more populous class or by adding duplicate records of the less populous one.
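The data-preparation steps described for the Enter/Edit screen (trimming white space, mapping "?" to -99999, encoding text class values in first-encountered or user-chosen order, and rebalancing by excluding or duplicating records) can be sketched in Python as follows. This is an illustrative sketch, not FormRules' code: the function names, the 1-based integer codes and the use of random sampling to choose which records to drop or duplicate are all assumptions.

```python
import random

MISSING = -99999  # the missing data flag described above


def clean_cell(value):
    """Trim leading/trailing white space and map '?' to the missing data flag."""
    text = str(value).strip()
    return MISSING if text == "?" else text


def encode_classes(values, order=None):
    """Encode text class values as integers (hypothetical 1-based codes).

    With no `order`, codes follow the order in which values are first
    encountered; pass `order` to impose a sensible sequence for ordinal values.
    """
    if order is None:
        order = list(dict.fromkeys(values))  # first-encountered order
    codes = {value: i for i, value in enumerate(order, start=1)}
    return [codes[v] for v in values], order


def rebalance(records, label_of, mode="under", seed=0):
    """Equalize class counts by under-sampling the majority or duplicating the minority."""
    if mode not in ("under", "over"):
        raise ValueError("mode must be 'under' or 'over'")
    rng = random.Random(seed)
    by_class = {}
    for record in records:
        by_class.setdefault(label_of(record), []).append(record)
    sizes = [len(group) for group in by_class.values()]
    target = min(sizes) if mode == "under" else max(sizes)
    balanced = []
    for group in by_class.values():
        if len(group) >= target:
            # Exclude records: keep a random subset of size `target`.
            balanced.extend(rng.sample(group, target))
        else:
            # Add duplicates: resample existing records until `target` is reached.
            extra = [rng.choice(group) for _ in range(target - len(group))]
            balanced.extend(group + extra)
    return balanced


print([clean_cell(c) for c in [" low ", "?", "high"]])
# → ['low', -99999, 'high']
print(encode_classes(["high", "low", "medium"], order=["low", "medium", "high"]))
# → ([3, 1, 2], ['low', 'medium', 'high'])
```

Under-sampling discards information from the majority class, while duplicating minority records can encourage over-fitting to those duplicates; which option suits a data set depends largely on how many records are available.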
A new Smart Compress option provides a much faster way to compress your data than clustering. You specify the percentage of records to be removed. The records are not removed entirely at random: Smart Compress ensures that at least one data record is retained for each input's maximum and minimum values, so the data ranges are not affected. For a full list of the changes, contact us.
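The Smart Compress retention rule can be sketched as follows: first protect, for every input column, one record holding its minimum and one holding its maximum, then remove the requested percentage of records at random from the rest. This is a hypothetical Python illustration, not FormRules' algorithm; it assumes each record is a tuple of numeric input values with no missing-data flags, and the random choice among unprotected records is an assumption.

```python
import random


def smart_compress(records, percent, seed=0):
    """Remove about `percent` % of records while keeping every input's extremes.

    `records` is a list of equal-length tuples of numeric input values.
    """
    if not records:
        return []
    protected = set()
    for j in range(len(records[0])):
        column = [record[j] for record in records]
        # Keep one record holding the minimum and one holding the maximum.
        protected.add(column.index(min(column)))
        protected.add(column.index(max(column)))
    candidates = [i for i in range(len(records)) if i not in protected]
    # Never try to remove more records than are unprotected.
    n_remove = min(len(candidates), round(len(records) * percent / 100))
    removed = set(random.Random(seed).sample(candidates, n_remove))
    return [r for i, r in enumerate(records) if i not in removed]


data = [(1, 10), (2, 20), (3, 30), (4, 40), (5, 50), (0, 25)]
print(smart_compress(data, 50))  # → [(1, 10), (5, 50), (0, 25)]
```

Because protected records are never removed, the actual reduction can fall short of the requested percentage when many records hold extreme values.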
This document maintained by
webmaster@intelligensys.co.uk.