본문 바로가기

Enginius/Machine Learning

What is the future of Big data

 Lazy engineer and  Necessity is the mother of invention. 


 In this era of 'Big data', reachability and accessibility to immense amount of data make conventional statistics and machine learning approaches, such as an support vector machine (SVM) or artificial neural network (ANN) , infeasible and sometimes useless. 


 There are three main shortcomings of these methods. 

 1. Computational overhead. Most of the existing 

 2. Too many parameters to tune. <= Hyper parameter

 3. Hard to get the labeled data 

 

 Many of the existing machine learning algorithms assumes "i.i.d." condition.  Below is the insightful posting by the Subutai Ahmad who's chief engineering in Grok (past Numenta).

 A question we get all the time from machine learning fans is: "How does Numenta's Cortical Learning Algorithm (the CLA) compare to traditional machine learning algorithms?" There are many ways to answer this question. In this blog entry, I will focus on one specific difference, perhaps the most fundamental one.


 First a bit of background: There is a well known truism in machine learning, the "No Free Lunch Theorem," which states that no algorithm is inherently better than any other algorithm. What distinguishes one algorithm from the next are the inherent assumptions and how well those assumptions fit the problem domain. For example, if you are predicting data that lies on a straight line, nothing is going to beat linear fitting. If the data lies on a circle, it's hard to imagine a worse technique.


 By far the most common assumption made in machine learning is the "i.i.d" assumption. In statistics, i.i.d. stands for independently and identically distributed, which states that every input record comes from the same probability distribution and is statistically independent of previous and future records. This is a very useful assumption - it makes the math easier, leads to the Central Limit Theorem, allows you to derive accuracy bounds, etc. Just about every popular technique, such as regression, support vector machines, neural networks, Bayesian networks, random forests, and decision trees rely on this assumption.


 Unfortunately, when you think about the real world of streaming data, this assumption is just plain wrong. Your weekly revenue numbers are not i.i.d. Last week's numbers are a better predictor than the numbers from 13 weeks ago. Yesterday's weather is correlated with today's. The web log for a customer navigating an e-commerce website is likely to follow specific sequences. Your GPS coordinates from 5 minutes ago are an excellent predictor of your current location. The list is endless. Streaming temporal patterns are the very antithesis of i.i.d.


The CLA is an inherently temporal learning algorithm, and doesn't care about i.i.d. It greedily constructs sequences and does not assume independence. If you saw a particular revenue pattern the last two weeks, it assumes you are more likely to see it this week. If you haven't seen a pattern for several years, it will likely forget it. Also, CLA assumes that the underlying distribution can change. This is what makes it online or adaptive. If your revenue jumps because you just added an important customer, it will adapt. It inherently assumes your data stream contains sequences and is constantly changing. We didn't invent this - the core ideas are inherent in the neocortex of the brain and lend themselves well to streaming data analytics.


CLA is not the only technique to break the i.i.d. assumption. Other algorithms, such as Hidden Markov Models and many time series algorithms, also relax that assumption. So, how is CLA different from HMMs? Good question. I guess we'll just have to tackle that in another blog entry...stay tuned! 


 In the below screencast, Jeff Hawkins narrates the presentation he gave at a workshop called "From Data to Knowledge: Machine-Learning with Real-time and Streaming Applications." The workshop was held May 7-11, 2012 at the University of California, Berkeley.

 


Hierarchical Temporal Memory (HTM) 

1. Sparse Distributed Representation

  

2. Sequence memory

3. Online Learning