K-means Clustering and Use Cases in the Security Domain
Clustering (Unsupervised Learning)
What is clustering?
Clustering means grouping datapoints together based on similar qualities.
Suppose we plot a dataset on a graph. We may see the datapoints concentrated in one range of values, and another concentration in some other range of the same dataset.
Datapoints with similar values fall close together on the graph, so they can be grouped by similarity. Each such group of points is a cluster, and how many clusters appear depends on the data.

In the figure above, the datapoints are concentrated in 3 ranges of the graph, so we have 3 clusters. If a new point arrives with values in a similar range, we can easily predict which cluster it belongs to. This is exactly what the K-means algorithm helps us find out. Because the data carries no labels, this falls under unsupervised learning.
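To make the idea of predicting a new point's cluster concrete, here is a minimal sketch. The three centroids, their names, and the new point are made-up values for illustration, not taken from the figure.

```python
import math

# Hypothetical centroids for three clusters in a 2-D feature space.
centroids = {
    "cluster_a": (1.0, 1.0),
    "cluster_b": (5.0, 5.0),
    "cluster_c": (9.0, 1.0),
}

def nearest_cluster(point, centroids):
    """Return the name of the cluster whose centroid is closest to point."""
    return min(centroids, key=lambda name: math.dist(point, centroids[name]))

new_point = (4.6, 5.3)  # made-up new observation
print(nearest_cluster(new_point, centroids))  # prints "cluster_b"
```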
K-means Clustering Algorithm
K-means clustering is a popular unsupervised learning algorithm that groups unlabelled data into a fixed number of clusters, k. For example, if we define k=3, it groups the datapoints into 3 clusters.

It is a centroid-based algorithm: it creates k centroids, and every cluster is associated with one centroid. The main goal is to minimize the sum of squared distances between the datapoints and the centroids of their corresponding clusters. It works iteratively, recalculating the centroids until the clusters are stable.
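In its usual formulation, this goal is written with squared Euclidean distances. The symbols below are notation introduced here for illustration: C_j is the set of datapoints in cluster j, and \mu_j is its centroid.

```latex
J = \sum_{j=1}^{k} \sum_{x_i \in C_j} \lVert x_i - \mu_j \rVert^2,
\qquad
\mu_j = \frac{1}{|C_j|} \sum_{x_i \in C_j} x_i
```

A smaller J means the datapoints sit closer to the centroids of their own clusters.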
First, we define the number of clusters, k. Then we select k random centroids, and each datapoint is assigned to its nearest centroid, which creates the initial clusters. After that, each centroid is recalculated as the mean of the datapoints in its cluster, which is where the ‘means’ in k-means comes from. Doing so reduces the variance within each cluster (variance is a measure of the dispersion of datapoints from the mean; low variance indicates that the datapoints are generally similar and do not vary widely from the mean). These assignment and update steps repeat until the clusters are stable, that is, until the assignments no longer change.
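The steps described here can be sketched in plain Python as follows. This is a minimal illustration, not a production implementation: the function name kmeans, its parameters, and the sample points are assumptions made for this example, and the initial centroids are simply k randomly chosen datapoints.

```python
import math
import random

def kmeans(points, k, max_iter=100, seed=0):
    """Plain k-means on a list of equal-length tuples (illustrative sketch)."""
    rng = random.Random(seed)
    # Pick k distinct datapoints as the initial random centroids.
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Assign every point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Stop once the assignments no longer change (stable clusters).
        if new_labels == labels:
            break
        labels = new_labels
        # Move each centroid to the mean of the points assigned to it.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, labels

# Made-up 2-D points forming three loose groups.
points = [
    (1.0, 1.0), (1.2, 0.8), (0.8, 1.1),
    (5.0, 5.0), (5.2, 4.9), (4.8, 5.1),
    (9.0, 1.0), (9.1, 1.2), (8.9, 0.9),
]
centroids, labels = kmeans(points, k=3)
print(centroids)
print(labels)
```

Because the starting centroids are random, different seeds can give different results, and an unlucky start can leave the algorithm in a poor grouping. Running it several times and keeping the best result is a common remedy.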

Use Cases of K-means Clustering in the Security Domain
K-means builds clusters by grouping datapoints, so the datapoints in a cluster have similar behaviour and qualities. If a new datapoint later resembles the datapoints of some cluster, we can say it belongs to that cluster because it behaves similarly. Now suppose we label the clusters according to security behaviour, for example marking a cluster as a security concern because most of its datapoints led to security issues. If someone then produces data that falls into that cluster, we can predict that they are likely to raise security concerns, such as breaching or violating security.
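As a rough sketch of this idea, assume K-means has already grouped past sessions into two clusters and an analyst has labelled each one. The feature choice (requests per minute, failed logins per hour), the numbers, and the names clusters and classify are all hypothetical.

```python
import math

# Made-up feature vectors: (requests per minute, failed logins per hour).
# Assumption: these clusters were found earlier and then labelled by hand.
clusters = {
    "normal": [(12, 0), (15, 1), (10, 0), (18, 2)],
    "security_concern": [(220, 40), (180, 55), (250, 35)],
}

def centroid(points):
    """Mean of a list of equal-length tuples."""
    return tuple(sum(c) / len(points) for c in zip(*points))

centroids = {label: centroid(pts) for label, pts in clusters.items()}

def classify(session):
    """Return the label of the cluster whose centroid is nearest to session."""
    return min(centroids, key=lambda label: math.dist(session, centroids[label]))

print(classify((200, 45)))  # prints "security_concern"
print(classify((14, 1)))    # prints "normal"
```

A session that lands in the security concern cluster is only flagged as resembling past problem behaviour. It still needs a closer look before any conclusion is drawn.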
Conclusion
So, potential security breaches can be predicted from the behaviour of the data. For example, from web server logs we can predict which users may be more dangerous to the server. In this way, the K-means clustering algorithm helps with security-related issues.
Thank you…
Keep learning…
Keep sharing…