Author Archives: faye1010

How to Ace a Data Science Interview

supervised learning Source: How to Ace a Data Science Interview

Posted in Uncategorized

Learning curves (example)


Posted in Python for data analysis

One Hot Encoding using sklearn

The dataset is the famous Titanic dataset. "onehotlabels" is a <891x1726 sparse matrix of type '<type 'numpy.float64'>' with 4455 stored elements in Compressed Sparse Row format>. Part of it: (0, 1725) 1.0 (0, 1574) 1.0 (0, 1416) 1.0 (0, 892) … Continue reading
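To make the shape of that sparse output concrete, here is a minimal sketch using sklearn's OneHotEncoder. The toy rows and column meanings below are hypothetical and are not taken from the Titanic data; the point is only that each row stores one 1.0 per encoded column, which is consistent with 4455 = 891 × 5 stored elements in the matrix above.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy rows with two categorical columns (sex, port of embarkation).
# These values are made up for illustration only.
rows = np.array([
    ["male", "S"],
    ["female", "C"],
    ["female", "S"],
    ["male", "Q"],
])

# By default, fit_transform returns a SciPy sparse matrix.
encoder = OneHotEncoder()
onehot = encoder.fit_transform(rows)

print(onehot.shape)      # (number of rows, total number of distinct categories)
print(onehot.nnz)        # stored elements: one per row per encoded column
print(onehot.toarray())  # dense view, only sensible for small examples
```

Each original column expands into one indicator column per distinct category, which is why a column with many distinct values can make the encoded matrix very wide.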

Posted in Python for data analysis

Exploratory Analysis of the MovieLens Dataset using Python

The MovieLens 20M dataset: GroupLens Research has collected and made available rating data sets from the MovieLens web site (http://movielens.org). The data sets were collected over various periods of time, depending on the size of the set.  20 million ratings … Continue reading

Posted in Uncategorized

SQL commands cheat sheet 1 (W3Schools)

1. SELECT:
Format:
select column1, column2 from dataset
select * from dataset
select distinct column from dataset
select column1, column2 from table_name where (conditions)
Conditions: AND, OR, LIKE, BETWEEN, IN
For example, WHERE city … Continue reading
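The SELECT forms listed above can be tried out without a database server by using Python's built-in sqlite3 module. This is only a sketch: the customers table and its rows are hypothetical, invented here to exercise DISTINCT, IN, BETWEEN, AND, and LIKE.

```python
import sqlite3

# In-memory database with a hypothetical "customers" table (made-up rows).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT, city TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [
        ("Ann", "Houston", 31),
        ("Bob", "Austin", 45),
        ("Cara", "Houston", 27),
        ("Dan", "Dallas", 52),
    ],
)

# select * from dataset
all_rows = conn.execute("SELECT * FROM customers").fetchall()

# select distinct column from dataset
cities = conn.execute(
    "SELECT DISTINCT city FROM customers ORDER BY city"
).fetchall()

# select column1, column2 from table_name where (conditions)
filtered = conn.execute(
    "SELECT name, age FROM customers "
    "WHERE city IN ('Houston', 'Austin') AND age BETWEEN 30 AND 50 "
    "ORDER BY name"
).fetchall()

# LIKE with a wildcard: cities starting with "Hou"
like_rows = conn.execute(
    "SELECT name FROM customers WHERE city LIKE 'Hou%' ORDER BY name"
).fetchall()

print(cities)
print(filtered)
print(like_rows)
```

BETWEEN is inclusive at both ends, and % in a LIKE pattern matches any run of characters.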

Posted in Uncategorized

Questions on the K-means method

What is the K-means algorithm? What is the optimization function? The partial derivative of the distortion J with respect to each center location must be zero. Will the algorithm stop? When J is minimized, (1) each point is encoded by its nearest center … Continue reading
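The two conditions on the distortion J can be turned into the usual alternating procedure: assign every point to its nearest center, then move every center to the mean of its points (which is where the partial derivative of J with respect to that center vanishes). Below is a minimal NumPy sketch of that idea. The function name kmeans, its parameters, and the toy points are my own assumptions for illustration, not code from the post.

```python
import numpy as np

def kmeans(points, k, n_iter=100, seed=0):
    """Plain K-means sketch: returns (centers, labels, distortion J)."""
    rng = np.random.default_rng(seed)
    # Start from k distinct data points chosen at random.
    centers = points[rng.choice(len(points), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Assignment step: squared distance from every point to every center.
        dists = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Update step: each center moves to the mean of its assigned points.
        new_centers = centers.copy()
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:  # keep the old center if a cluster is empty
                new_centers[j] = members.mean(axis=0)
        if np.allclose(new_centers, centers):
            break  # centers stopped moving
        centers = new_centers
    # Distortion J: sum of squared distances to the assigned center.
    dists = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    labels = dists.argmin(axis=1)
    distortion = dists[np.arange(len(points)), labels].sum()
    return centers, labels, distortion

# Hypothetical toy data: two well-separated groups of three points each.
points = np.array([[0, 0], [0, 1], [1, 0],
                   [10, 10], [10, 11], [11, 10]], dtype=float)
centers, labels, distortion = kmeans(points, k=2)
print(centers)
print(labels)
print(distortion)
```

Neither step can increase J, and there are only finitely many ways to assign points to centers, so the loop stops after finitely many iterations. It may stop at a local minimum rather than the global one, which is why the result can depend on the starting centers.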

Posted in Machine Learning

SVM notes

Support vector machines can be used for classification and regression. They have been applied successfully in many fields, such as bioinformatics, text, and image recognition. Their main ideas are the large margin and the kernel trick. Margins: 1. Intuition: To make things easier, we suppose … Continue reading

Posted in Machine Learning

A brief introduction to Neural Networks with an example in R

Motivation: Suppose there are two predictors and the decision boundary is non-linear. Then one can introduce quadratic terms in the logistic regression, i.e. $h_{\theta}(x)=g(\theta_0+\theta_1 x_1+\theta_2 x_2+\theta_3 x_1^2+\theta_4 x_1x_2+\theta_5 x_2^2)$, where $g$ is the logistic function. However, most questions have more … Continue reading
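As a small illustration of the quadratic-terms idea, the sketch below evaluates such a hypothesis in Python (the post's own example is in R). The coefficient values are hypothetical, picked by hand so that the decision boundary is the unit circle x1^2 + x2^2 = 1; nothing here is fitted to data.

```python
import math

def logistic(z):
    """Logistic (sigmoid) function g(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def hypothesis(theta, x1, x2):
    """Logistic regression with quadratic terms in two predictors.

    theta holds six coefficients, in the order of the features
    1, x1, x2, x1^2, x1*x2, x2^2.
    """
    features = [1.0, x1, x2, x1 ** 2, x1 * x2, x2 ** 2]
    z = sum(t * f for t, f in zip(theta, features))
    return logistic(z)

# Hypothetical coefficients: z = -1 + x1^2 + x2^2, so the boundary
# h = 0.5 is the unit circle, which no linear boundary can reproduce.
theta = [-1.0, 0.0, 0.0, 1.0, 0.0, 1.0]

print(hypothesis(theta, 0.0, 0.0))  # inside the circle: below 0.5
print(hypothesis(theta, 1.0, 0.0))  # on the circle: exactly 0.5
print(hypothesis(theta, 2.0, 0.0))  # outside the circle: above 0.5
```

With two predictors this needs only six coefficients, but the number of quadratic terms grows roughly with the square of the number of predictors, which is the difficulty the motivation is pointing at.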

Posted in Machine Learning, R

Spectral methods in dynamics

In the dynamics seminar here at Houston, we’re beginning a series of expository talks on statistical properties of dynamical systems. This week’s talk was given by Andrew Törö… Source: Spectral methods in dynamics

Posted in dynamical system and probability

Reviews from Amazon’s customers

The dataset is from http://snap.stanford.edu/data/amazon/productGraph/. I analyzed the data from two departments: Beauty, and Toys and Games. The features include:
asin – ID of the product
reviewerID – ID of the reviewer
reviewTime – time of the review
overall – rating of … Continue reading

Posted in Uncategorized