Category Archives: Software


Make your own search engine with Elasticsearch

In this article you can see how to use Elasticsearch to create a fast search engine capable of deep text search, working with terabytes of data.

We are going to build a search engine based on Wikipedia's "Living people" category, store the data in Elasticsearch, test the speed and relevance of our queries, and also create an autocomplete suggestion query.

Pre-requisites

You already have Elasticsearch and Kibana installed.

Install Pywikibot

Pywikibot enables you to easily download the contents of Wikipedia articles. If you have access to a different source of data, then you can use that instead.

Instructions for installing Pywikibot are here: https://www.mediawiki.org/wiki/Manual:Pywikibot/Installation

Configure Pywikibot to use Wikipedia

This is done by running the setup script:

python pwb.py generate_user_files

The script is interactive and enables you to define the type of wiki you want to access. In our case, choose Wikipedia.
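For reference, after choosing Wikipedia and English, the generated user-config.py contains settings along these lines (the account name below is just a placeholder):

family = 'wikipedia'          # wiki family to access
mylang = 'en'                 # language edition
usernames['wikipedia']['en'] = 'YourUsername'   # placeholder account name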

Install Python Libraries
pip install elasticsearch

Create a Mapping in Elasticsearch

The mapping tells Elasticsearch what sort of data is stored in each field, and how it should be indexed.

The mapping command below can be pasted directly into the Kibana Dev Tools console.

This creates a mapping for document type “wiki_page” in the index “wikipeople” with four text fields (full url, title, categories, text) and one special field called suggest, which will be used for the autocomplete function (more on that later). Note also that we have specified that the text field uses an English language analyser (as opposed to French, Spanish or any other language).
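A minimal version of that mapping, written for the Kibana console (the exact field names are assumptions based on the description above), looks something like this:

PUT wikipeople
{
  "mappings": {
    "wiki_page": {
      "properties": {
        "full_url":   { "type": "text" },
        "title":      { "type": "text" },
        "categories": { "type": "text" },
        "text":       { "type": "text", "analyzer": "english" },
        "suggest":    { "type": "completion" }
      }
    }
  }
}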

Create a Pywikibot Script

In the directory where you installed Pywikibot, you will find a subdirectory “/core/scripts”.

In the scripts directory, create a new script called wikipeopleloader.py
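A minimal sketch of what wikipeopleloader.py can look like, using the elasticsearch Python client, assuming the index and field names from the mapping above and an Elasticsearch node running locally on port 9200 (the doc_type argument matches the pre-7.x clients this article uses):

import pywikibot
from pywikibot import pagegenerators
from elasticsearch import Elasticsearch

# Connect to the local Elasticsearch node
es = Elasticsearch(['http://localhost:9200'])

site = pywikibot.Site('en', 'wikipedia')
category = pywikibot.Category(site, 'Category:Living people')

# Walk through every page in the "Living people" category and index it
for page in pagegenerators.CategorizedPageGenerator(category):
    doc = {
        'full_url': page.full_url(),
        'title': page.title(),
        'categories': [cat.title() for cat in page.categories()],
        'text': page.text,
        'suggest': page.title(),   # feeds the completion suggester
    }
    es.index(index='wikipeople', doc_type='wiki_page', body=doc)
    print('Indexed', page.title())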

You can then get Pywikibot to run your script using the following command (from the ../pywikibot/core directory):

python pwb.py wikipeopleloader.py

The output on the screen will reveal any errors. If all is going well, you should see the script downloading pages from Wikipedia and loading them into Elasticsearch. The speed of download will depend on your machine; in my case it was one or two pages per second. For testing, you can abort the script (Ctrl+C) after a minute or so.

Elasticsearch Search Engine Query

Below is an example Elasticsearch search query.

The “_source” part of the query specifies that we exclude the text of the page, to keep the size of the response down.

The query searches for the terms “american football” and “bearcats” in the title, the categories and the body of the text. However, it gives a greater weight to the score if these terms are found in the categories or the title (as determined by the “boost” values in the search query).

The highlight part of the query also returns the details of where the search terms were found. These can be seen in the part of the response labelled “highlight”, and make it very easy to display the context of the search terms to the user, so they can see whether they are interested in the results.
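A sketch of a query along these lines for the Kibana console (the boost values of 3 and 2 are illustrative assumptions, not the exact values used):

GET wikipeople/_search
{
  "_source": {
    "excludes": ["text"]
  },
  "query": {
    "bool": {
      "should": [
        { "match": { "title":      { "query": "american football bearcats", "boost": 3 } } },
        { "match": { "categories": { "query": "american football bearcats", "boost": 2 } } },
        { "match": { "text":       { "query": "american football bearcats" } } }
      ]
    }
  },
  "highlight": {
    "fields": {
      "title": {},
      "categories": {},
      "text": {}
    }
  }
}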

Autocomplete Suggestions Using Elasticsearch and jQuery

In our mapping we created a special field called “suggest” based on the page title. This enables us to display “autocomplete” suggestions as the user types into the search box. Autocomplete queries are optimized to provide very quick responses. A sample query would be as follows:
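A sketch of such a suggest query for the Kibana console (the prefix “joh” is just an example of what a user might have typed so far):

GET wikipeople/_search
{
  "_source": false,
  "suggest": {
    "title_suggest": {
      "prefix": "joh",
      "completion": {
        "field": "suggest"
      }
    }
  }
}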

The query returns suggestions whose title starts with the letters entered in the query. This would enable us to create autocomplete functionality with jQuery or similar.


Introducing the Open Source IoT stack for Industrie 4.0

The open source IoT stack is a set of open source software that can be used to develop and scale IoT in a business environment. It is particularly focused on manufacturing organizations.

Why Open Source?

Continue reading “Introducing the Open Source IoT stack for Industrie 4.0”


Storing IoT data using open source: MQTT and Elasticsearch – Tutorial

Why Elasticsearch?

  • It’s open source
  • It’s hugely scalable
  • Ideal for time series data

It is part of the Elastic Stack, which can provide functionality for the following:

  • Graphs (Kibana)
  • Analytics (Kibana)
  • Alarms

What is Covered in This Article

We are going to set up a single Elasticsearch node on a Linux Ubuntu 16.04 server and use it to collect data published on a Mosquitto MQTT server. (This assumes you already have your MQTT server up and running.)

For full information and documentation: the IoT open source stack project is now called Zibawa and has a project page of its own, where you will find source code, documentation and case studies.

Installing ElasticSearch

Create a new directory myElasticSearch

mkdir myElasticSearch
cd myElasticSearch

Download the Elasticsearch tar:

curl -L -O https://download.elastic.co/elasticsearch/release/org/elasticsearch/distribution/tar/elasticsearch/2.4.1/elasticsearch-2.4.1.tar.gz

Then extract it as follows:

tar -xvf elasticsearch-2.4.1.tar.gz

This will extract a number of files and folders into your current directory. We then go into the bin directory as follows:

cd elasticsearch-2.4.1/bin

And now we are ready to start our node and single cluster:

./elasticsearch

To store data we can use the command:

curl -XPOST 'localhost:9200/customer/external/1?pretty' -d '
{
"name": "Jane Doe"
}'

To read the same data we can use:

curl -XGET 'localhost:9200/customer/external/1?pretty'

If you can see the data you created, then Elasticsearch is up and running!

Install the Python Client for Elasticsearch

pip install elasticsearch

Install the Paho MQTT client on the server

pip install paho-mqtt

Create a Python MQTT client script to store the MQTT data in Elasticsearch

Use the script mqttToElasticSearch.py, which uses both the Paho MQTT and Elasticsearch Python libraries. You will need to modify the lines at the top depending on the port and IP address of your MQTT installation.

You can download the file from

https://github.com/mattfield11/mqtt-elasticSearch

Or, if you have Git installed, use:

git clone https://github.com/mattfield11/mqtt-elasticSearch.git

The script should be installed in a directory on the same server where Elasticsearch is running.
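In outline, a script of this kind does something like the following. This is a simplified sketch rather than the actual file from the repository; the broker address, topic and index name are placeholders to be adjusted:

import paho.mqtt.client as mqtt
from elasticsearch import Elasticsearch

MQTT_HOST = 'localhost'   # IP address of your Mosquitto broker
MQTT_PORT = 1883
MQTT_TOPIC = '#'          # subscribe to every topic
ES_INDEX = 'my-index'

es = Elasticsearch(['http://localhost:9200'])

def on_connect(client, userdata, flags, rc):
    print('Connected to MQTT broker with result code', rc)
    client.subscribe(MQTT_TOPIC)

def on_message(client, userdata, msg):
    # Store each received MQTT message as a document in Elasticsearch
    doc = {'topic': msg.topic, 'payload': msg.payload.decode('utf-8', 'ignore')}
    es.index(index=ES_INDEX, doc_type='mqtt', body=doc)

client = mqtt.Client()
client.on_connect = on_connect
client.on_message = on_message
client.connect(MQTT_HOST, MQTT_PORT, 60)
client.loop_forever()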

Run the Python MQTT client we just downloaded

python mqttToElasticSearch.py

To view the data we just created in Elasticsearch:

curl 'localhost:9200/my-index/_search?q=*&pretty'

We are now storing our MQTT data in Elasticsearch!
In the next few days I will publish how to view MQTT data in Kibana, where we will make graphs and analyse the data.

Further Information

Zibawa – Open source from device to Dashboard.  Project, applications, documentation and source code.

https://zibawa.com

ElasticSearch

https://www.elastic.co/

Running as a service on Linux. I didn't use this, but probably should have!

https://www.elastic.co/guide/en/elasticsearch/reference/current/setup-service.html#using-systemd

ElasticSearch Python Client

https://www.elastic.co/guide/en/elasticsearch/client/python-api/current/index.html


Agile Digital Transformation – Low Risk, High ROI

An inspirational video from Peter Schroer, CEO of ARAS, a manufacturer of Product Lifecycle Management (PLM) software. Schroer advocates the use of an “agile” approach to factory digital transformation, the same approach as that used in software development. Strategically, it is a mistake to draw up a large, complex project, since during its execution many things will change, invalidating many of our original plans:

  • Business objectives change
  • Changes in the business environment (economic downturn or upturn)
  • Changes in company ownership (mergers / acquisitions)
  • Changes in customer requirements
  • Changes in legal requirements to be met
  • New technology appears

If we limit ourselves to the implementation of simple, achievable tasks which have a direct and immediate impact on the business, and which can be delivered over a number of “sprints” (where a sprint is measured in weeks, not months), then we have a far greater probability of success.

That doesn’t mean that we have no longer-term vision or road map of where we are going; rather, it means that we do not devote a large amount of resource to detailed long-term planning and execution, or fall into paralysis by analysis.

This approach has important consequences:

We must accept that a single software system or platform is not going to be able to continually adapt to our changing business requirements. This means that communication between systems and the use of open source software are an essential part of the strategy to avoid lock-in to a single system.

Large upfront license payments are rarely compatible with this philosophy, for the same reason. Software as a service, which enables us to pay as a function of the number of users or the volume of transactions, is much more compatible with this approach.

We must be prepared to test, put a foot in the water, test again, then build on what works and abandon what fails. As Schroer says, “Do something we know, make it work, then go on to the next thing”.

You will find the video on agile PLM implementation here.

Cybersecurity – 10 steps to reduce risk

Over recent years there have been two key trends in cybersecurity. The first is a growing tendency for attacks to be made by professional, organized groups looking for profit, rather than by amateur hackers wanting to prove their ability. The second is the growing recognition that good defence is more about process and people than about the technology itself. Continue reading “Cybersecurity – 10 steps to reduce risk”
Over recent years there have been two key trends in cibersecurity.  The first is a growing tendency for attacks to be made by professional organized groups looking for profit, rather than amateur hackers wanting to prove their ability.  The second is the growing recognition that good defence is more about process and people than the technology used itself.  Continue reading “Cibersecurity – 10 steps to reduce risk” »