A Java High Performance Tool For Topological Data Analysis


Technical description

We note that jHoles implements a filter that transforms the network into an object that our library, jPHEngine, can read. Through jPHEngine this object becomes a simplicial complex, whose persistent homology is then computed. The library returns a single output object, named PersistentHomologyMetaData. As a first step for jPHEngine, we want to remove all the features that are unnecessary with respect to jHoles. The relevant structural choices of jPHEngine are:

  • a single user interface to handle the computation of persistent homology;
  • a facility to store and retrieve all the relevant data;
  • a better and easier representation of data;
  • full control over the API behaviour via configuration file;
  • a control layer to properly configure the library;
  • persistent storage for partial data.

By partial data we mean all the data that the algorithm generates during its execution, such as the chain module or the reduced matrix.

jHoles is designed to be easy to use even for non-computer scientists. Its main point of access is jHoles, a class offering all the methods needed to process a graph. This architectural choice keeps the tool simple to use by grouping its core functions in a single class. The interface comes with pre-made, multi-threaded file parsers that support GEXF files, "edge list" files (sometimes referred to as a "sparse matrix representation", e.g. in Matlab) and plain text files representing a matrix. It offers different methods to filter the network: one, marked as deprecated, uses the original Holes algorithm; the others are various implementations of the improved algorithm. They differ mainly in their optimizations, e.g. how many threads the library should use, and where to apply caching or thread pooling to reduce overhead.

A paged data structure to store the simplicial complex is currently under active development, since the size of the complex may grow quickly: its aim is to avoid computational limits (e.g. the computer it is running on does not have enough RAM) at the price of some speed. The result of the filtration is stored in a hash map provided by the Java Runtime Environment.
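To make the idea of a multi-threaded "edge list" parser backed by a thread pool and a hash map more concrete, the following is a minimal sketch and not the actual jHoles code. The class EdgeListSketch, its methods, the "source target weight" line format and the "u|v" key convention are all assumptions made for illustration. The map here holds the parsed edges, not the filtration result.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical helper, not part of jHoles: parses "source target weight" lines.
final class EdgeListSketch {

    // Canonical key for an undirected edge, so "a b" and "b a" map to the same entry.
    static String edgeKey(String u, String v) {
        return u.compareTo(v) <= 0 ? u + "|" + v : v + "|" + u;
    }

    // Parses one chunk of lines into a thread-local map (no shared state).
    static Map<String, Double> parseChunk(List<String> lines) {
        Map<String, Double> edges = new HashMap<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                continue; // skip blank lines and comments (assumed convention)
            }
            String[] fields = trimmed.split("\\s+");
            if (fields.length < 3) {
                throw new IllegalArgumentException("Malformed edge line: " + line);
            }
            edges.put(edgeKey(fields[0], fields[1]), Double.parseDouble(fields[2]));
        }
        return edges;
    }

    // Splits the input into chunks, parses them on a fixed thread pool,
    // and merges the partial maps in input order (later duplicates win).
    static Map<String, Double> parse(List<String> lines, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            int chunk = Math.max(1, (lines.size() + threads - 1) / threads);
            List<Future<Map<String, Double>>> parts = new ArrayList<>();
            for (int start = 0; start < lines.size(); start += chunk) {
                List<String> slice =
                        lines.subList(start, Math.min(lines.size(), start + chunk));
                parts.add(pool.submit(() -> parseChunk(slice)));
            }
            Map<String, Double> merged = new HashMap<>();
            for (Future<Map<String, Double>> part : parts) {
                merged.putAll(part.get());
            }
            return merged;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> lines = Arrays.asList("a b 1.0", "b c 0.5", "c a 2.0");
        Map<String, Double> edges = parse(lines, 2);
        // prints: 3 edges; weight of a|c = 2.0
        System.out.println(edges.size() + " edges; weight of a|c = " + edges.get("a|c"));
    }
}
```

Reusing a fixed pool avoids the cost of creating one thread per task, which is the kind of overhead the text attributes to thread pooling. Each worker fills its own map, so no synchronization is needed until the final merge.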

Pipeline of jPHEngine in jHoles