Likelihood Log

Econometrics of scale

Category Archive: Web Applications

Sourcing Data 101: Last Resort – Web Scraping

What is web scraping?

Continuing our discussion about sourcing data sets, today we will talk about web scraping.

Web scraping is the process of automated extraction of data from a web page by exploiting the structure of the HTML code underlying the page. Other definitions of web scraping can be found on Wikipedia (but of course!) and Webopedia.

In terms of getting a data set for our data analysis, web scraping is usually the “last resort” option to fall back on if all else has failed. Concretely, if you have had no luck finding a nice well-organized data set in a conventional format like CSV or JSON and if you have had no luck plugging into your desired data stream via some sort of a controlled API, you would then try web scraping.
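
To make "exploiting the structure of the HTML code" a bit more concrete, here is a minimal sketch in Python that uses only the standard library. The page fragment is made up for illustration (the countries and figures are not real data), and TableScraper and scrape_table are hypothetical names chosen for this example. A real page would first have to be downloaded and is usually far messier than this.

```python
from html.parser import HTMLParser


class TableScraper(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by table row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # completed rows, each a list of cell strings
        self._row = None    # row currently being read, or None
        self._cell = None   # text fragments of the current cell, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that sits inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def scrape_table(html):
    """Return the rows of all tables found in an HTML string."""
    parser = TableScraper()
    parser.feed(html)
    parser.close()
    return parser.rows


# Made-up fragment standing in for HTML downloaded from a site.
SAMPLE_HTML = """
<table>
  <tr><th>Country</th><th>GDP growth (%)</th></tr>
  <tr><td>Atlantis</td><td>2.1</td></tr>
  <tr><td>Ruritania</td><td>-0.4</td></tr>
</table>
"""

rows = scrape_table(SAMPLE_HTML)
for row in rows:
    print(row)
```

The scraper knows nothing about the data itself. It relies purely on the fact that the numbers live inside tr and td tags, which is also why scrapers tend to break whenever a site changes its layout.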

(more…)


Sourcing Data 101: APIs and the Programmable Web

We Need APIs

Today we will continue our discussion of sourcing datasets for research and analysis.

Last time we talked about datasets that are available to the general public and can be freely downloaded and used for research, and we discussed the portals, repositories and tools that help in searching for them. Today we will talk about extracting data from the web programmatically. Instead of manually downloading a dataset from some site A onto the hard drive and then using some program B to analyze it, we would like program B to connect to site A, read the data automatically and analyze it on the go.
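
As a rough sketch of what "program B reads from site A" might look like, here is a short Python example using only the standard library. Everything specific in it is an assumption made for illustration: the URL is a placeholder, the JSON layout (a list of records with a "value" field) is invented, and fetch_records and average are hypothetical helper names. A stand-in opener replaces the real network call so the sketch runs offline.

```python
import io
import json
import urllib.request
from statistics import mean


def fetch_records(url, opener=urllib.request.urlopen):
    """Download a JSON document from `url` and decode it.

    `opener` defaults to a real HTTP request; it can be swapped for a
    stand-in so the rest of the program can be tried without a network.
    """
    with opener(url) as response:
        return json.load(response)


def average(records, field):
    """Analyze on the go: mean of one numeric field across all records."""
    return mean(float(record[field]) for record in records)


def fake_opener(url):
    # Stand-in for site A: returns an invented payload, not real data.
    payload = [{"year": 2012, "value": 1.5}, {"year": 2013, "value": 2.5}]
    return io.BytesIO(json.dumps(payload).encode("utf-8"))


# Placeholder URL; a real API would document its own endpoints.
records = fetch_records("https://api.example.com/series", opener=fake_opener)
print(average(records, "value"))  # 2.0
```

Dropping the opener argument makes fetch_records issue a real HTTP request, at which point the download step and the analysis step live in the same program.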

(more…)


Sourcing Data 101: Publicly Available Datasets

The first step in getting started with any data analysis is to actually get hold of some meaningful data.

Fortunately, we live in the age of Big Data, where information is abundant in all its volume, velocity, volatility and veracity.

The last 50 years have seen an unprecedented rise in video cameras, sound recorders, counters, logs, sensors and trackers, all accompanied by an ever-decreasing cost of storage to keep all of this data. Suddenly both capturing and storing data are easy, and the challenge has shifted to actually making sense of the data.

Thus, as long as you know how to make sense of data and turn it into meaningful information, there is no shortage of data repositories out there to which you can apply your skill. A large number of these repositories are free and open to the public, and they should be your first port of call before investing in customized sampling and research or purchasing premium data sets from data service providers.

Today I would like to talk about large data sets that are available on the web, usually for free, for public use.

(more…)


A Less Obvious Way in Which Technology Is Changing Economics

Lately there has been a lot of talk in the economics and tech communities about the technological revolution that is currently happening. Call it the “new machine age” or the “industrial revolution 2.0”: automation and artificial intelligence are replacing traditional jobs such as taxi driver, travel agent and insurance broker. To some, the recent protests by taxi drivers against Uber in London looked just like the smashing of textile machines by the Luddites in early 19th-century England.

(more…)


Top 5 Web Application Vulnerabilities

A security researcher from Israel has discovered a very basic, almost “schoolboy” level bug in Gmail that could have compromised millions of email addresses. He notified Google, which fixed the problem and rewarded the honest fellow with a whopping $500. Here is the news article that details these events:

http://rt.com/news/165552-gmail-bug-users-address/

For the technically minded, watch the embedded YouTube video that details how Oren Hafif did it.

(more…)


KPCB Internet Trend Report 2014 Is Out Now


Kleiner Perkins (KPCB) is a venture capital firm that, since its establishment in 1972, has successfully invested in the incubation of AOL, Amazon.com, Citrix, Compaq, Electronic Arts, Google, Intuit, Juniper Networks, Netscape, Sun Microsystems and Symantec, among others. It is considered one of Silicon Valley’s top venture capital providers. Its long-awaited annual Internet Trends report has just been released and it makes for fascinating reading. The full report can be found here.

It’s a 164-slide PowerPoint presentation, but well worth a read. For those who are too busy, here is a summary distilled to a few key points:

(more…)


JavaScript Drives Robots

This is great!  You can now use simple JavaScript to program robots.  And I mean actual robots that blink eyes, move around the room, pick up Coke cans and do other robot stuff.  Here is a podcast that explains how this happens:

http://hanselminutes.com/391/controlling-robots-with-nodejs-and-johnny-five-with-raquel-vlez

All you need is a bit of web programming know-how (and I’m talking rather basic stuff), the Johnny-Five library that runs on Node.js and a simple Arduino open-source microcontroller!

Why is this big news?  Because all of these technology components are simple, easy to get hold of and easy to learn.  And this allows almost anyone to get into robotics, play around and contribute.

(more…)


WebRTC – A Lot More Than Just Another Skype

I have recently come across WebRTC (RTC stands for Real-Time Communication) and found it to be a very neat piece of technology.

WebRTC is a suite of protocols, standards and APIs that allow real-time browser-to-browser communication on a peer-to-peer basis.  Well, not quite that if there are firewalls involved, but you get the point.

This doesn’t just mean instant chat, video messaging and file exchange, i.e. things that the likes of Skype already do well.  It means a lot of other things, and it is this extension of the usual Skype-like functionality that is the really exciting part.  Basically we now have the ability to bring to life any kind of instant interaction between two web browsing experiences across the world: what I do in my browser while I surf the net determines what you see in your browser!

(more…)
