I have been researching PostgreSQL search capabilities for quite a while. Besides the very powerful Full Text Search capabilities the engine offers today, there are a few lesser-known index types that support string searches of the form: where foo like ‘%bar%’. For these cases, a regular B-tree index on ‘foo’ will never be used, because the pattern starts with a wildcard. The following example illustrates a case in which a relatively simple query is transformed to.. Read More

## Note on Kafka consumer group offsets

I was recently testing Kafka 0.10 with the excellent Confluent Kafka Python client API and ran into an initially confusing situation: even after dropping a topic and re-creating it, I was still seeing an old offset for a consumer group and didn’t understand how to clear it. I had been testing with both the old and the new Consumer API, and using different ways to obtain consumer group.. Read More

## Maths and Machine Learning – Linear Algebra

I recently read the post by Wale Akinfaderin on the Mathematics of Machine Learning and was compelled to write on the subject as a way to keep track of some useful resources and pointers. Even with a decent mathematical background, I had to go through the process of refreshing certain mathematical concepts in order to understand machine learning algorithms and also to make sense of the strengths and weaknesses.. Read More

## What are the differences between parametric models and non-parametric models ?

In the context of statistical modeling, a model is a set of distributions. It is said to be parametric when it is completely determined by a finite set of parameters. For example, in the case of a linear model, the regression line that fits the data is completely determined by its parameters and a noise term. We say that the vector space of the model parameters is finite-dimensional. On the.. Read More
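
As a minimal sketch of the parametric case, the snippet below fits a straight line by ordinary least squares. The function name `fit_line` and the sample data are made up for illustration; the point is only that, however many observations are supplied, the fitted model is summarized by two numbers.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x.

    Hypothetical helper for illustration; returns (intercept, slope).
    """
    x_bar, y_bar = mean(xs), mean(ys)
    # Sum of squared deviations of x, and of cross deviations of x and y.
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Made-up noisy observations scattered around a straight line.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 6.8, 9.0]

intercept, slope = fit_line(xs, ys)
print(f"intercept={intercept:.2f}, slope={slope:.2f}")
```

Adding more data points changes the estimates but not the number of parameters, which is what makes the model parametric.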

## What is the difference between correlation and regression ?

We are assuming here that the “regression” in the question is of a linear form. Correlation is a measure of how strong a linear relationship between two variables is. There are several standard correlation coefficients (Pearson, Spearman, etc.). The correlation coefficient ranges between -1.0 and 1.0. Zero correlation means there is no linear relationship between the variables. The greater the correlation coefficient, the stronger the relationship, meaning that when one variable goes up,.. Read More
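
To make the contrast concrete, here is a small sketch that computes the Pearson coefficient and a least-squares slope on the same made-up data. The helper names `pearson_r` and `ols_slope` are illustrative, not taken from any library. It shows one practical difference under these assumptions: rescaling a variable changes the regression slope but leaves the correlation untouched.

```python
from math import sqrt
from statistics import mean


def _deviation_sums(xs, ys):
    """Return (sxx, syy, sxy): sums of squared and cross deviations."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return sxx, syy, sxy


def pearson_r(xs, ys):
    """Pearson correlation coefficient, always within [-1, 1]."""
    sxx, syy, sxy = _deviation_sums(xs, ys)
    return sxy / sqrt(sxx * syy)


def ols_slope(xs, ys):
    """Slope of the least-squares line predicting y from x."""
    sxx, _, sxy = _deviation_sums(xs, ys)
    return sxy / sxx


# Made-up data: y falls by exactly 2 for every unit increase in x.
xs = [1, 2, 3, 4, 5]
ys = [10, 8, 6, 4, 2]
ys_scaled = [10 * y for y in ys]  # same data expressed in different units

r = pearson_r(xs, ys)
r_scaled = pearson_r(xs, ys_scaled)
slope = ols_slope(xs, ys)
slope_scaled = ols_slope(xs, ys_scaled)
print(r, r_scaled, slope, slope_scaled)
```

Correlation is also symmetric in its two arguments, whereas the slope for predicting y from x generally differs from the slope for predicting x from y.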

## What are the different stages of building a model ?

Some months ago I was ingesting my usual load of news related to data science when I saw one of those “frequently asked questions” lists that proliferate nowadays with the boom of interest in analytics topics. While I could honestly answer most of the questions, I found that some topics required a memory refresh, and in doing so, I thought it would help others refresh theirs, while leaving the door open to visitors’ contributions, in a.. Read More