What is running on this page?
A genetic algorithm designed to find a near-optimal solution to large instances of the Travelling Salesman Problem.
The genetic algorithm runs inside a web worker, on a thread completely separate from the main page. The application is written in ClojureScript, a functional programming language that compiles to JavaScript using the Google Closure Compiler.
The aim of this experiment was to determine the feasibility of training machine learning models in the browser, utilizing the web worker spec now implemented in all major browsers.
Additionally, I wanted to assess ClojureScript and its development tools, together with the application state and UI frameworks Re-Frame and Reagent. I wanted to gain experience with these technologies and test their ability to produce robust applications in the data science domain.
Why use a genetic algorithm?
Genetic algorithms implement a global search of the problem space. This is particularly important for the Travelling Salesman Problem, where many good but suboptimal solutions exist that would otherwise trap the search in local optima.
Additionally, many different genetic operators for use on permutation representations have been proposed in the literature. These range from slight mutations to destructive position and order changes. Combining these effectively allows more dynamic searches of the problem space, converging more quickly and potentially reaching better solutions.
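To make the range of permutation operators concrete, here is a minimal sketch of three common ones, ordered from least to most disruptive. It is written in Python for brevity, not in the ClojureScript the application uses. The function names are illustrative and are not taken from the application's source, and the choice of these three operators is an assumption about what "slight" and "destructive" might look like in practice.

```python
import random

def swap_mutation(tour, rng):
    """Slight change: exchange two randomly chosen cities."""
    child = list(tour)
    i, j = rng.sample(range(len(child)), 2)
    child[i], child[j] = child[j], child[i]
    return child

def inversion_mutation(tour, rng):
    """Order change: reverse the cities inside a random segment."""
    child = list(tour)
    i, j = sorted(rng.sample(range(len(child)), 2))
    child[i:j + 1] = reversed(child[i:j + 1])
    return child

def scramble_mutation(tour, rng):
    """Destructive change: shuffle the cities inside a random segment."""
    child = list(tour)
    i, j = sorted(rng.sample(range(len(child)), 2))
    segment = child[i:j + 1]
    rng.shuffle(segment)
    child[i:j + 1] = segment
    return child
```

Each operator returns a new list and always yields a valid permutation, so every city is still visited exactly once. Passing the random generator in explicitly (for example `random.Random(0)`) keeps runs reproducible.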
Finally, genetic algorithms are incredibly flexible and can be applied to many different problems. If problem solutions can be adequately represented in an encoded form, operators can effectively alter that representation, and a fitness function can be found that provides consistent evaluation, then a GA can be used to intelligently navigate the problem space.
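The three ingredients named above (an encoding, operators and a fitness function) can be sketched for the Travelling Salesman Problem as follows. This is a hedged illustration in Python, not the application's ClojureScript implementation. The tournament selection, elitism, simplified order crossover and all parameter defaults are assumptions chosen for brevity.

```python
import math
import random

def tour_length(tour, cities):
    """Fitness: total length of the closed tour (lower is better)."""
    total = 0.0
    for k in range(len(tour)):
        x1, y1 = cities[tour[k]]
        x2, y2 = cities[tour[(k + 1) % len(tour)]]
        total += math.hypot(x1 - x2, y1 - y2)
    return total

def order_crossover(a, b, rng):
    """Simplified order crossover: keep a random slice of parent a in
    place and fill the remaining positions in parent b's order."""
    i, j = sorted(rng.sample(range(len(a)), 2))
    kept = set(a[i:j + 1])
    filler = [city for city in b if city not in kept]
    return filler[:i] + a[i:j + 1] + filler[i:]

def evolve(cities, rng, pop_size=50, generations=200, mutation_rate=0.2):
    """Minimal GA loop. Encoding: a tour is a permutation of city indices."""
    n = len(cities)
    population = [rng.sample(range(n), n) for _ in range(pop_size)]

    def fitness(tour):
        return tour_length(tour, cities)

    def pick():
        # Tournament selection: the best of three random individuals.
        return min(rng.sample(population, 3), key=fitness)

    for _ in range(generations):
        # Elitism: carry the current best tour forward unchanged.
        children = [min(population, key=fitness)]
        while len(children) < pop_size:
            child = order_crossover(pick(), pick(), rng)
            if rng.random() < mutation_rate:
                # Swap mutation: exchange two random cities.
                i, j = rng.sample(range(n), 2)
                child[i], child[j] = child[j], child[i]
            children.append(child)
        population = children
    return min(population, key=fitness)
```

A call such as `evolve(cities, random.Random(0))`, where `cities` is a list of `(x, y)` pairs, returns the best tour found as a list of city indices. Swapping in a different encoding, operator set and fitness function is what would adapt the same loop to another problem.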
Why train ML models in the browser?
There are benefits to this that are potentially unrealized commercially. If you are able to (re)architect a product from the ground up, pushing complex and long-running processes to the clientside saves money on cloud instance hours.
Successfully doing this unlocks opportunities for scaling products and services driven by machine learning models that are customised to individual users. This kind of scale just isn't feasible on the backend for startups without established prior offerings to help fund it.
Clearly, many ML training tasks are not suitable for the browser. These tasks require massive parallelism, typically run on GPUs, and can still take days or weeks to converge. However, there are many smaller, much less complex problems that can be solved effectively using machine learning methods and that are suitable for the browser.
Additionally, even large and complex models can be fine-tuned, evaluated and individualized in the browser. These can be augmented with smaller, clientside-trained models which may feed individualized input into the larger models.
As privacy and the ethics of data collection become a big talking point amongst the internet community, this is one way to build strong trust with users over the use of their data to provide innovative services.
If ML models can be trained in the browser, user data does not have to leave the browser. It doesn't have to be transmitted over the internet, and it doesn't have to be stored by companies. Once trained, models can encode an individual's preferences without allowing the underlying information to be extracted from them.