Exploring Uber Rides Data

A simple exploration of Uber pickup data in New York from April to September 2014.

Comparing Online Portfolio Selection Algorithms

A comparison of online portfolio selection algorithms on growth stocks against a simple uniform-weight buy-and-hold strategy.
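
As a rough illustration of the benchmark, here is a minimal sketch of a uniform-weight buy-and-hold portfolio. It assumes the input is a list of per-period price relatives (price at t divided by price at t-1 for each asset), a convention common in the online portfolio selection literature; the function name `buy_and_hold_wealth` is hypothetical and not taken from the post.

```python
# Minimal sketch of a uniform-weight buy-and-hold benchmark.
# Assumption: price_relatives[t][i] = price_i(t) / price_i(t - 1).

def buy_and_hold_wealth(price_relatives, initial_wealth=1.0):
    """Return the wealth path of splitting capital equally at t=0 and never rebalancing."""
    n_assets = len(price_relatives[0])
    # Each asset starts with an equal share of the initial capital.
    holdings = [initial_wealth / n_assets] * n_assets
    wealth_path = [initial_wealth]
    for relatives in price_relatives:
        # Holdings drift with prices; nothing is rebalanced back to equal weights.
        holdings = [h * r for h, r in zip(holdings, relatives)]
        wealth_path.append(sum(holdings))
    return wealth_path
```

For example, with two assets that move +10% and -10% in each of two periods, `buy_and_hold_wealth([[1.1, 0.9], [1.1, 0.9]])` gives a path of approximately `[1.0, 1.0, 1.01]`: the winning asset's weight grows because the portfolio is never rebalanced, which is exactly what online selection algorithms try to improve on.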

Predicting US Equities Trends Using Random Forests

A partial replication of Manojlović and Štajduhar's paper 'Predicting Stock Market Trends Using Random Forests: A Sample of the Zagreb Stock Exchange' using U.S. equities.
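
One common way to turn trend prediction into a classification problem is to label each day by whether the price is higher some number of days later. The sketch below assumes that rule with a hypothetical `horizon` parameter; it is an illustration of the framing, not necessarily the exact labeling used in the paper or the post.

```python
# Hypothetical sketch: framing "trend prediction" as binary classification.
# Assumption: day t is labeled 1 ("up") if the close `horizon` days later
# is strictly higher than the close on day t, otherwise 0.

def trend_labels(closes, horizon=5):
    """Label every day whose future close is known; the last `horizon` days get no label."""
    labels = []
    for t in range(len(closes) - horizon):
        labels.append(1 if closes[t + horizon] > closes[t] else 0)
    return labels
```

For instance, `trend_labels([10, 11, 9, 12, 12], horizon=1)` returns `[1, 0, 1, 0]`. Labels like these could then serve as the target for a classifier such as scikit-learn's `RandomForestClassifier`, with features computed only from data available up to day t.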

Hunting Down Growth Stocks

Each year, a number of news sites and organizations publish lists of the most innovative companies. Can we use those lists to hunt down growth companies? To find out, I aggregated the lists and compared the holding-period returns of the listed companies from 2012 to 2015.
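
The holding-period return used for that comparison is the standard (ending value - starting value + income) / starting value. A minimal sketch, where the function name and the example prices are hypothetical:

```python
def holding_period_return(start_price, end_price, dividends=0.0):
    """Holding-period return: (end price - start price + income received) / start price."""
    if start_price <= 0:
        raise ValueError("start_price must be positive")
    return (end_price - start_price + dividends) / start_price
```

A stock bought at 50 and sold at 65 with no dividends has a holding-period return of 0.3 (30%); the same trade with 2 in dividends returns 0.34.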

Kaggle Animal Shelter Competition

Right from the competition description: 'Every year, approximately 7.6 million companion animals end up in US shelters. Many animals are given up as unwanted by their owners, while others are picked up after getting lost or taken out of cruelty situations. Many of these animals find forever families to take them home, but just as many are not so lucky. 2.7 million dogs and cats are euthanized in the US every year. Using a dataset of intake information including breed, color, sex, and age from the Austin Animal Center, we're asking Kagglers to predict the outcome for each animal.'

Classifying Students' Success Rates Using Three Machine Learning Models

In this post, we analyze a dataset on student performance, develop classification models that predict the likelihood that a given student will pass, and choose the most effective model with the lowest computational cost. The three models explored here are logistic regression, naive Bayes, and random forests.

Efficient Frontier with Python

In a previous post, we naively selected growth companies and constructed a uniform-weight portfolio from them. In this post, we use the same list of companies to construct a minimum-variance portfolio based on Harry Markowitz's portfolio theory.
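
As a sketch of the underlying math: if short sales are allowed and the only constraint is that weights sum to 1, the global minimum-variance portfolio has the closed form w = S⁻¹1 / (1ᵀS⁻¹1), where S is the covariance matrix of asset returns. The code below assumes that unconstrained setting with NumPy; the post itself may use a numerical optimizer with extra constraints (for example, no short positions), and the function names are hypothetical.

```python
import numpy as np

def min_variance_weights(cov):
    """Global minimum-variance weights w = inv(S) 1 / (1' inv(S) 1); short sales allowed."""
    cov = np.asarray(cov, dtype=float)
    ones = np.ones(cov.shape[0])
    # Solve S x = 1 instead of forming the inverse explicitly (more stable).
    x = np.linalg.solve(cov, ones)
    # Normalize so the weights sum to 1 (fully invested).
    return x / x.sum()

def portfolio_variance(weights, cov):
    """Variance of portfolio returns: w' S w."""
    w = np.asarray(weights, dtype=float)
    return float(w @ np.asarray(cov, dtype=float) @ w)
```

With two uncorrelated assets whose return variances are 0.04 and 0.01, the weights come out to `[0.2, 0.8]` and the portfolio variance is 0.008, lower than the 0.0125 of an equal-weight split. With real data, the covariance matrix could be estimated from a (days × assets) array of returns with `np.cov(returns, rowvar=False)`.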

Model Evaluation and Validation Using Boston Housing Prices

Here, we leverage a few basic machine learning concepts to help you predict the best selling price for a client's home using the Boston Housing dataset from the scikit-learn Python library. The dataset contains aggregated data on various features of houses in Greater Boston communities, including the median home value for each of those areas. The goal is to build an optimal model based on statistical analysis with the tools available. This model will then be used to estimate the best selling price for your client's home.
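
Evaluating a price model needs a performance metric; one common choice for regression is the coefficient of determination, R² = 1 - SS_res / SS_tot. The minimal sketch below assumes that metric (the summary does not say which one the post uses); scikit-learn provides the same calculation as `sklearn.metrics.r2_score`.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - (residual sum of squares / total sum of squares)."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and the same length")
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        # R² is undefined when every true value is identical.
        raise ValueError("R² is undefined for constant y_true")
    return 1.0 - ss_res / ss_tot
```

A score of 1 means perfect predictions, 0 means the model does no better than always predicting the mean, and negative values mean it does worse. For `y_true = [3, -0.5, 2, 7]` and `y_pred = [2.5, 0.0, 2, 8]`, the score is about 0.949.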

Kaggle Digits Recognition

In my test post, we'll solve Kaggle's Digit Recognizer competition using Python's machine learning library `sklearn`. It's a really simple problem and is commonly used as a starting point (along with the Titanic competition).