Tag Archives: Big data

idea product recommendation

Product Recommendation with machine learning using ElasticSearch

Elasticsearch provides us with powerful machine learning based tool for product recommendation for e-commerce.

What do we know about our customers?

It is relatively easy to know which products our customers have either bought, or clicked upon in the past.   Using elastic search we are able to leverage this data to recommend to these users other products which have been of interest to different users who have showed an interest in the same products.

For example if our user has clicked upon a book about medieval French history,  then it would seem obvious that we can show the user the most popular books in the category of medieval French history.   However this approach of simply repeating products in the same category may become tedious, and we may miss many interesting possibilities to offer users products across categories.  For example if  the user buys a camera we would possibly want to offer the user a book on photography.

Machine Learning Product Recommendation

Elasticsearch provides us the possibility to recommend products to users based on what other users who bought the same products as them have purchased.

Term aggregation

For example we have an index which contains all of our users, with all of the products they have purchased as shown in the mapping below.

If a user has bought a polaroid camera, then we can look up the most popular products purchased by other users who brought the same polaroid camera.  This list is likely to include products which are directly related to a polaroid camera (such as polaroid film), and indirectly related (books on photography).

Significant Term Aggregation

A term aggregation may indicate products which are popular with polaroid users, but some of these may be completely unrelated to polaroid cameras, (eg. Mobile telephones) simply because they are popular with everyone (including people who buy polaroid cameras).  If we want to avoid this, then we can use the significant terms aggregation, which will return products which are significantly more popular with polaroid camara buyers compared with our customer set as a whole.

The example below shows a significant term aggregation



This approach is interesting because as our volume of data increases, the quality of our recommendations improves:  as time goes by, we will learn more about our specific users and  at the same time grow our database of user preferences.






Make your own search engine with elasticsearch

In this article you can see how to use elasticsearch to create a fast search engine capable of deep text search, working with terrabytes of data.

We are going to build a search engine based on the living people category of wikipedia, store the data in elasticsearch, test the speed and relevance of our queries and also create an autocomplete suggestion query.


You already have elasticsearch and kibana installed.

Install Pywikibot

Pywikibot enables you to easily download the contents of wikipedia articles.  If you have access to a different source of data, then you can use that instead.

Instructions to install pywikibot are here https://www.mediawiki.org/wiki/Manual:Pywikibot/Installation

Configure pywikibot to use wikipedia.

This is done by running the setup script

python pwb.py generate_user_files

The script is interactive and enables you to define the type of wiki you want to access.  In our case, choose wikipedia.

Install Python Libraries
pip install elasticsearch


Create a Mapping in Elasticsearch

The mapping tells elasticsearch what sort of data is being stored in each field, and how it should be indexed.

The following command can be pasted directly into  the Kibana Dev Tools console

This creates a mapping for document type “wiki_page” in the index “wikipeople” with four text categories (full url, title, categories,text) and one special field called suggest which will be used for autocomplete function (more on that later).  Note also that we have specified that the text field uses an english language analyser.   (as opposed to French,Spanish or any other language).


Create Pywikibot script

In the directory where you installed Pywikibot, you will find a subdirectory “/core/scripts”

In the scripts directory create a new script called wikipeopleloader.py

You can then get pywikibot to run your script using the following command (from ../pywikibot/core directory)

python pwb.py wikipeopleloader.py

The output from the screen will reveal any errors, if all is going well, you should see how the script downloads pages from wikipedia and loads them into elasticsearch.   The speed of download will depend on your machine, in my case one or two pages per second.  For testing you can abort the script (ctrl Z) after a minute or so.

Elasticsearch Search engine Query

Below is an example elasticsearch query and the beginning of the response.

The “source” part of the command specifies that we exclude the text of the page to keep the size of the response down.

The query searches for the terms american football and bearcats in the title, category and body of the text.  However it gives greater weight to the score if these terms are found in the category and title (as determined by the values “boost” in the search query).

The highlight part of the command also returns the detail of the where the search term has been found.   This can be seen in the part of the response labeled “highlight”.  This makes it very easy to display the context of the search term to the user to enable them to see whether they are interested in the results.


Autocomplete suggestions Using ElasticSearch and Jquery

In our mapping we created a special field called “suggest” based on the page title.  This enables us to display an “autocomplete” suggester as the user types into the search box.  Autocomplete queries are optimized to provide very quick responses.   A sample query and response would be as follows:

The query returns suggestions where the title starts with the letters we have introduced in our query.    This would enable us to create autocomplete funcionality with jquery or similar.







Introducing the Open Source IoT stack for Industrie 4.0

The open source IoT stack is a set of open source software which can be used to develop and scale IoT in a business environment.  It is particularly focused towards manufacturing organizations.

Why Open Source?

Continue reading “Introducing the Open Source IoT stack for Industrie 4.0” »


Storing IoT data using open source. MQTT and ElasticSearch – Tutorial

Why ElasticSearch?

  • Its open source
  • Its hugely scaleable
  • Ideal for time series data

It is part of the elasticsearch stack which can provide functionality for the following:

  • Graphs (Kibana)
  • Analytics (Kibana)
  • Alarms

What is Covered in This article

We are going to set up a single elasticsearch node  on a Linux Ubuntu 16.04 server and use it to collect data published on a Mosquitto MQTT server.  (It assumes you already have your MQTT server up and running.)

For full information and documentation, the IoT open source stack project is now called Zibawa and has a project page of its own -where you will find source code, documentation and case studies.

Installing ElasticSearch

Create a new directory myElasticSearch

mkdir myElasticSearch
cd myElasticSearch

Download the Elasticsearch tar :

curl -L -O https://download.elastic.co/elasticsearch/release/org/elasticsearch/distribution/tar/elasticsearch/2.4.1/elasticsearch-2.4.1.tar.gz

Then extract it as follows :

tar -xvf elasticsearch-2.4.1.tar.gz

It will then create a bunch of files and folders in your current directory. We then go into the bin directory as follows:

cd elasticsearch-2.4.1/bin

And now we are ready to start our node and single cluster:


To store data we can use the command

curl -XPOST 'localhost:9200/customer/external?pretty' -d '
"name": "Jane Doe"

To read the same data we can use

curl -XGET 'localhost:9200/customer/external/1?pretty'

If you can see the data you created, then elasticSearch is up and running!

Install the Python Client for elasticsearch

pip install elasticsearch

Install the PAHO mqtt client on the server

pip install paho-mqtt

Create a Python MQTT client script to store the MQTT data in elastic search

Use the script mqttToElasticSearch.py which uses both the MQTT Paho and ElasticSearch python libraries.  You will need to modify the lines at the top depending upon the port and IP address of your MQTT installation.

You can download the file from


Or if you have GIT installed use:

git clone https://github.com/mattfield11/mqtt-elasticSearch.git

The script should be installed into a directory on the same server as you have ElasticSearch running.

Run the Python MQTT client we just downloaded

python mqttToElasticSearch.py

To view the data we just created on elasticsearch

curl 'localhost:9200/my-index/_search?q=*&pretty'

We are now storing our MQTT data in elasticsearch!
In the next few days I will publish how to view MQTT data in Kibana where we will make graphs, and analyse the MQTT data.

Further Information


Zibawa – Open source from device to Dashboard.  Project, applications, documentation and source code.




Running as a service on Linux I didnt use this, but probably should have!



ElasticSearch Python Client


Business Intelligence without changing your ERP

Business Intelligence (or BI) is evolving at great speed.  With the media hype around big data companies are growing increasingly interested in this area.   Tools which are particularly interesting are those that are capable of absorbing data from a variety of sources, and allowing users with little or no IT knowledge to analyse and draw conclusions from that data for business decisions.    A certain degree of technical knowledge may be necessary to set these tools up, and key is to understand how your data is (or is not) structured.  Once you get the feeling for how they work, and the type of analysis you can perform, then these tools are inexpensive and invaluable.