rvest is a new package that makes it easy to scrape (or harvest) data from HTML web pages, inspired by libraries like Beautiful Soup. It is designed to work with magrittr so that you can express complex operations as elegant pipelines composed of simple, easily understood pieces. Install it with:
rvest in action
To see rvest in action, imagine we’d like to scrape some information about The Lego Movie from IMDB. We start by downloading and parsing the file with:
library(rvest)
lego_movie <- html("http://www.imdb.com/title/tt1490017/")
To extract the rating, we start with selectorgadget to figure out which css selector matches the data we want: strong span. (If you haven’t heard of selectorgadget, make sure to read vignette("selectorgadget") – it’s the easiest way to determine which selector extracts the data that you’re interested in.) We use html_node() to find the first node that matches that selector, extract its…
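The excerpt breaks off at this point. As a rough sketch of how such a pipeline might look (an assumption on our part, not the original post's code), the example below uses a tiny inline HTML string as a hypothetical stand-in for the IMDB page, and read_html() from more recent rvest releases in place of html(). The html_text() and as.numeric() steps are our guess at a natural way to finish the extraction.

```r
library(rvest)  # also makes the %>% pipe available

# Hypothetical stand-in for the downloaded page: the rating value is made up
# and sits inside <strong><span>, the structure the selector implies.
page <- read_html(
  "<html><body><strong><span>7.8</span></strong></body></html>"
)

rating <- page %>%
  html_node("strong span") %>%  # first node matching the CSS selector
  html_text() %>%               # text content of that node, as a string
  as.numeric()                  # convert the string to a number
```

Using an inline document keeps the sketch runnable without a network connection; against the real page, the selector found with selectorgadget would take the place of the hard-coded structure here.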
For ecommerce companies, last year’s holiday shopping season was a collision of “expectation and reality”: consumers’ expectation that they could safely order things two days before Christmas, and the reality that shipping companies simply couldn’t handle the massive volume. a16z General Partner Jeff Jordan (who formerly oversaw ebay.com, is currently on the board of Pinterest, and also oversees a16z investments in Instacart, Julep, Walker & Company, and Zulily) offers his thoughts on the changes being wrought by and around ecommerce.
How can ecommerce companies deal with the conflicting expectations and realities of an on-demand economy? How can they compete with ecommerce giant Amazon, which just gets bigger every year? And given all this, how is physical retail faring?
Dating site eHarmony continues to grow, processing more matches more quickly for more users, and now the company’s technological foundation is finally growing along with it. OpenStack, Hadoop, Spark, Docker: eHarmony CTO Thod Nguyen says the company is looking at all of them as it tries to evolve into a company that’s able to innovate on the IT front as well as the dimensions-of-compatibility front.
The overhaul began in 2013 and should be complete by the end of 2015, Nguyen told me in a recent interview. A big part of it is turning eHarmony’s existing virtualization-centric data center into a private cloud environment, most likely running the open source OpenStack cloud software. That will give the company more flexibility in terms of scaling and provisioning the infrastructure, including virtual servers and storage, that powers its website and mobile app.
Installed on top of Cisco UCS blade servers (servers have quietly…
In the last blog post, I listed the tools and practices introduced by Netflix in its presentation at AWS re:Invent 2013. In this second part of the blog series, I will attempt to uncover the real lessons that can be drawn from those techniques and their value to any Cloud application developer.
1. Using a Cloud Provider is not the same as using a hosting provider. It requires careful planning, process and engineering effort over a period of time. Without all this, an organisation cannot leverage all the benefits of a Cloud service.
2. Having an Agile infrastructure alone cannot solve problems if your developers have to perform too many rudimentary operations to use it. That also leads to another problem: giving your developers direct access without being able to manage it effectively. AWS Admin Console is good from an Operations point of view…
SpringXD is a new project which simplifies the development of Big Data applications. One of its features is the ability to stream data between two different modules. In this example we will use an HTTP module as a data source and a GemFire module as a data sink.
One component of Big Data applications is Pivotal’s GemFire. SpringXD has some built-in connections to GemFire which allow you to get up and running quickly.
Before reading this guide, I would recommend that you have a working SpringXD install. The best place to get going with SpringXD is the getting started guide over at spring.io.
The data we will be posting is a Customer. The JSON that represents this customer looks like this.
Start the GemFire Server
SpringXD comes with a scaled-down GemFire server. If you are on a Mac and installed SpringXD via Homebrew, the command to start the GemFire server will…
It seems as though everyone in tech today is infatuated with the full-stack developer. Full stack may have been possible in the Web 2.0 era, but a new generation of startups is emerging, pushing the limits of virtually all areas of software. From machine intelligence to predictive push computing to data analytics to mobile/wearable and more, it’s becoming virtually impossible for a single developer to program across the modern full stack.
When I first started programming computers as a kid in the pre-mobile, pre-web late 1970s/early 1980s, a single person typically wrote a complete software program from start to finish, and there weren’t many other layers of software between the programmer and the hardware. Using assembly language was the norm for programmers trying to squeeze more performance and space out…