Likelihood Log

Econometrics of scale

Linear Regression – How To Do It Properly Pt.2 – The Model

Model Specification and Evaluation

In the last post we talked about the maths behind linear regression. We looked at how the model is fitted, how the individual coefficient estimates are computed and what their properties, such as mean and variance, are. We also went over the conditions that must be satisfied for linear regression to be an effective and powerful tool for data analysis, and made the point that unless all of these conditions are met, the OLS linear regression model loses most of its authority and other models often become better alternatives.

In today’s post I would like to continue looking at the mathematical due diligence that an analyst needs to do in order to make proper use of linear regression. Specifically, I would like to talk about specifying the model: selecting the explanatory variables that should be in the model, omitting those that should be left out and avoiding confusion between causality and correlation. I will also look at ways of evaluating and comparing linear regression models with each other and with other kinds of models.

(more…)

Linear Regression – How To Do It Properly Pt.1 – The Maths

Today I would like to talk about the mathematical concepts behind ordinary least squares (OLS) linear regression. We will look at the linear algebra used for fitting linear regression models and for estimating regression coefficients. We will also talk about the theorems that make linear regression so powerful, and we will investigate how, depending on which preconditions of which theorems are met, regression models can be meaningful, completely meaningless or anything in between.

(more…)

Linear Regression – How To Do It Properly – Pt 0

Linear regression is dangerous. Very dangerous. Here is why…

Most introductory statistics courses taken by social scientists, after covering some descriptive basics like skewness, kurtosis and Student’s t-distribution, finish off with one piece of “sort of advanced” statistical material: linear regression. Econometrics courses for economists focus almost exclusively on linear regression, with only a chapter or two dedicated to things like logistic regression or trees. More recently, the numerous data science and machine learning courses for technologists again treat linear regression as the most important citizen of data science. All of this leads many in the social sciences, economics, finance and technology to believe that data analysis is pretty much linear regression.

(more…)

Data Science Tools: Python


What is Python

Python is a programming language that is often used in data science applications. It is not a data-science-specific tool but a versatile general-purpose programming language that is widely used for building all sorts of applications: games, interactive websites, enterprise software and so on. However, because Python is relatively easy to learn, is free and open source, and has amassed an impressive range of libraries geared towards number crunching, scientific computing and machine learning, it has become the programming language of choice in the data science community.

(more…)

Data Visualization

Most of us are used to consuming information in the form of tables, graphs and charts. In recent years, there has been growing demand from customers, stakeholders and management for information in visual form. Being “data visualization literate” is fast becoming a skill as indispensable and basic as being able to use a word processor or Excel.

So what are some of the rules of thumb for creating a good dashboard or a set of charts?

Before going into that let’s cover a few key ideas.

(more…)

NoSQL vs. SQL, Who Is Who?

NoSQL has become a very fashionable buzzword. Everyone has heard about it and everyone knows it to be the new big thing, yet very few know what it really is. To most, NoSQL is a magical new technology that is just like SQL except friendly to parallel processing and therefore scalable to very large datasets, something to do with the cloud, Hadoop and MapReduce. That is partially true, but a clearer description is called for.

In fact, NoSQL is not one specific technology or paradigm. Instead, it is a loosely grouped collection of data storage and retrieval technologies that all extend or altogether replace the traditional relational database paradigm in one form or another, but always in a way that makes them horizontally scalable, versatile and suitable for large and fast data flows. Sometimes the paradigm is relational, sometimes not; sometimes there is SQL, sometimes there is no room for it. NoSQL is a heterogeneous set of technologies, and the right reading of the name is “Not Only SQL”.

(more…)

A Less Obvious Way in Which Technology Is Changing Economics

Lately there has been a lot of talk in the economics and tech communities about the technological revolution that is currently happening. Under labels such as the “new machine age” and the “industrial revolution 2.0”, automation and artificial intelligence are replacing traditional jobs such as taxi driver, travel agent and insurance broker. To some, the recent protests by taxi drivers against Uber in London looked just like the smashing of textile machines by the Luddites in early 19th-century England.

(more…)

Top 5 Web Application Vulnerabilities

A security researcher from Israel has discovered a very basic, almost “schoolboy” level bug in Gmail that could have compromised millions of email addresses. He notified Google, who rectified the problem and rewarded the honest fellow with a whopping $500. Here is the news article that details these events:

http://rt.com/news/165552-gmail-bug-users-address/

Those who are technically minded can watch the embedded YouTube video, which details how Oren Hafif did it.

(more…)

Big Day Today – A Computer Has Passed The Turing Test (For The First Time)

Big Day Today!

For the first time a computer has passed the Turing Test.

Here is a detailed news article with names and dates on Gizmodo.

This is big news indeed. Those who have studied theoretical computer science will have heard of the Turing Test, a procedure designed to determine whether an entity is a human being or a mere machine with formidable artificial intelligence. The concept was first proposed by Alan Turing, one of the founders of computer science.

(more…)

KPCB Internet Trend Report 2014 Is Out Now

Kleiner Perkins (KPCB) is a venture capital firm that, since its establishment in 1972, has successfully invested in the early stages of AOL, Amazon.com, Citrix, Compaq, Electronic Arts, Google, Intuit, Juniper Networks, Netscape, Sun Microsystems and Symantec, among others. It is considered one of Silicon Valley’s top venture capital providers. Its long-awaited annual Internet Trends report has just been released and it makes a fascinating read. The full report can be found here.

It’s a 164-slide PowerPoint presentation, but well worth a read. For those who are too busy, here is a summary distilled to a few key points:

(more…)
