We published a tutorial paper on the Tree Edit Distance on arXiv, namely:
Paaßen, B. (2018). Revisiting the tree edit distance and its backtracing: A tutorial. arXiv: 1805.06869 [cs.DS]
With that publication, we also updated the toolbox again and fixed some bugs in the trees module. Special Thanks go to Martin Donath for spotting further bugs.
- sets via the new
setsmodule. A set alignment is performed via the Hungarian algorithm and requires O(n³) operations where n is the number of elements in the larger set
- trees via the new
treesmodule. For the tree edit distance we support the Algorithm by Zhang and Shasha (1989). We also implement backtracing, both crisp and soft.
- forests via the new
treesmodule. Forests can be either ordered lists of trees, in which case we perform a standard string edit distance based on the tree edit distances between all pairwise tree assignments; or forests can be defined as unordered lists of trees, in which case we perform a set edit distance via the Hungarian algorithm.
Version 3.0.1 contains more convenience functions in the primitives module and a few bugfixes for the primitives and visualization module.
Please have a look at the 'examples' directory directory in the GIT repository to inspect the new version of the toolbox in more detail.
- Instead of the Sequence datastructure the toolbox has been revised to support arbitrary input in List form. That means: You can now input whatever input you like, as long as it implements the java.util.List interface.
- This also means that using the TCS Alignment Toolbox has become significantly easier: There is no need to construct a NodeSpecification or an AlignmentSpecification anymore. You just need your data in List form, a Comparator and an AlignmentAlgorithm.
- The Sequence datastructure is still supported (and implements the java.util.List interface now) but is not mandatory anymore and has been refactored to isolated modules
- Many implementations in the algorithms-lib module had to be removed to make maintaining the toolbox easier.
- The remaining implementations in the algorithms-lib module (global alignment and DTW) have been revised to be more efficient. Combined with the reduced overhead caused by the Sequence datastructure this leads to speedups up to factor 10.
- The gradient computation mechanism has been revised to be easier to understand. See the DerivableAlignmentDistance, the DerivableComparator and the Gradient interface for more details.
- The ReplacementComparator has been revised to be easier to understand and more efficient.
- We have added four new comparators providing a normalized Euclidean and Manhatten distance with and without relevance weighting.
- The java examples are now isolated maven projects, making them easier to use.
Please have a look at the 'examples' directory in the GIT repository to inspect the new version of the toolbox in more detail.
We fixed some bugs in three modules of the TCSAlignmentToolbox, namely:
- sequences module: The validate() method of VectorialKeywordSpecification and SymbolicKeywordSpecification accepts null values now.
- algorithms module: The Operation class validates DELETIONREPLACEMENT and INSERTIONREPLACEMENT OperationType consistency in the constructor now.
- algorithms module: The calculateComparatorDistances() method of the Operation class called the skipDelete() method before if the OperationType SKIPINSERTION was given. This is corrected now.
- adp module: The FlexibleGrammar could not handle DELETIONREPLACEMENT and INSERTIONREPLACEMENT before. This is fixed now.
- We provide additional OperationTypes now, namely DELETIONREPLACEMENT and INSERTIONREPLACEMENT to model the behaviour of Dynamic Time Warping, which reads from one of the input sequences, but does not consume the respective node.
- We support Dynamic Time Warping now as a default grammar in the adp module.
- We have added a Metric Learning Module (learning), that provides implementation distance based classifiers (k-Nearest Neighbor and Large Margin Nearest Neighbor) as well as a gradient-based optimization scheme of metric parameters with respect to the LMNN cost function. More details regarding this approach can be found in the master's thesis Adaptive Affine Sequence Alignment Using Algebraic Dynamic Programming.
- We have added a export and import module (csv) that enables you to store NodeSpecification objects as JSON data (and reimport them from JSON data) as well as store Sequence objects as CSV data (and reimport them from CSV data). The generated data is aimed to be human-readable and, thus, hopefully compatible with other applications.
At June 27th, we presented a Demonstrator for the TCS Alignment Toolbox and its applicability to Intelligent Tutoring Systems at the 8th International Conference on Educational Datamining in Madrid. This demonstrator also provides some general insight with respect to the capabilities of this toolbox, so we thought it best to provide it as free software here as well. You can find it at this link.
- The architecture of the toolbox has been revised. The toolbox is now structured into 9 different modules. Also the package names have been revised, such that backwards compatibility is not garantueed.
- All of those modules are available on maven central (For matlab users a binary distribution as a single, monolithic .jar file is still available)
- The toolbox now supports Algebraic Dynamic Programming, which enables users to design their own alignment schemes in an abstract fashion, without having to worry about the (efficient) implementation. An example of this technique can be seen in a new example.
- The Parallel processing capabilities of the toolbox have been improved, such that they now use the java standard library FixedSizeThreadPool.
- We further improved the documentation by reworking the available uml diagrams, updating the wiki page, updating all examples and extending the javadoc.
Also we will present the Toolbox at the 8th International Conference on Educational Datamining 2015 in Madrid in the Demo track.
We released a new version of the TCSAlignmentToolbox with the main changes:
- Version 1.4.0 now suports local (and affine) alignment (strict as well as soft)
- The bellmans gap sources have been improved and support parameter setting and alignment derivatives now
- The examples have been extended to incorporate new possibilities
Also check out the Wiki page https://opensource.cit-ec.de/projects/tcs/wiki where we now present runtime considerations.
- Implementations of Dynamic Time Warping and Global Sequence Alignment (Needleman-Wunsch)
- Support for multi-dimensional sequential data
- Support for non-vectorial and multi-modal sequence data, e.g. strings, object sequences, structured data
- Support for metric learning by gradient descent: For our algorithms we provide several derivative schemes
- Examples for usage in Java and Matlab
- Parallel Processing support for alignment calculation and derivative calculation
More information can be found at: https://opensource.cit-ec.de/projects/tcs/wiki
Also available in: Atom