Author Archives: faye1010
How to Ace a Data Science Interview
supervised learning Source: How to Ace a Data Science Interview
Posted in Uncategorized
One Hot Encoding using sklearn
The dataset is the famous Titanic dataset. “onehotlabels” is a <891x1726 sparse matrix of type '<type 'numpy.float64'>' with 4455 stored elements in Compressed Sparse Row format>. Part of it: (0, 1725) 1.0 (0, 1574) 1.0 (0, 1416) 1.0 (0, 892) … Continue reading
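The sparse output quoted above can be read as a list of (row, column) positions that hold 1.0. Note that 4455 = 891 × 5, which would be consistent with five encoded columns each contributing one nonzero entry per row. The sketch below is a plain-Python illustration of that structure, not sklearn's implementation; the function name `one_hot_sparse` and the toy rows are hypothetical.

```python
def one_hot_sparse(rows):
    """Return (entries, n_columns); entries maps (row, col) -> 1.0."""
    n_features = len(rows[0])
    # Sorted distinct values per input column, so the output order is deterministic.
    categories = [sorted({row[j] for row in rows}) for j in range(n_features)]
    # Starting output column of each input column's block of indicators.
    offsets = [0]
    for cats in categories:
        offsets.append(offsets[-1] + len(cats))
    entries = {}
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            col = offsets[j] + categories[j].index(value)
            entries[(i, col)] = 1.0  # exactly one nonzero per input column per row
    return entries, offsets[-1]


# Hypothetical toy rows (sex, port of embarkation), not the real Titanic data.
toy = [("male", "S"), ("female", "C"), ("female", "S")]
entries, n_columns = one_hot_sparse(toy)
for (i, col), value in sorted(entries.items()):
    print((i, col), value)
```

With 3 rows and 2 categorical columns, the result has 3 × 2 stored entries, mirroring the rows-times-columns count seen in the excerpt.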
Posted in Python for data analysis
Exploratory Analysis of the MovieLens Dataset using Python
The MovieLens 20M dataset: GroupLens Research has collected and made available rating data sets from the MovieLens web site (http://movielens.org). The data sets were collected over various periods of time, depending on the size of the set. 20 million ratings … Continue reading
Posted in Uncategorized
SQL commands cheat sheet 1 (W3Schools)
1. SELECT: Format: select column1, column2 from dataset; select * from dataset; select distinct column from dataset; select column1, column2 from table_name where (conditions). Conditions: AND, OR, LIKE, BETWEEN, IN. For example, WHERE city … Continue reading
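The SELECT forms and WHERE conditions listed above can be tried against an in-memory SQLite database from Python. This is only a sketch: the `customers` table, its columns, and its rows are made up for illustration and are not from the original post.

```python
import sqlite3

# In-memory database with a hypothetical table, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT, city TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [("Ann", "Houston", 31), ("Bob", "Boston", 45),
     ("Cid", "Houston", 27), ("Dee", "Austin", 52)],
)

# select * from table
all_rows = conn.execute("SELECT * FROM customers").fetchall()

# select distinct column from table
cities = conn.execute(
    "SELECT DISTINCT city FROM customers ORDER BY city"
).fetchall()

# select column1, column2 from table where (conditions), using IN, AND, BETWEEN
filtered = conn.execute(
    "SELECT name, age FROM customers "
    "WHERE city IN ('Houston', 'Austin') AND age BETWEEN 30 AND 60 "
    "ORDER BY name"
).fetchall()

# LIKE with a trailing wildcard
like_rows = conn.execute(
    "SELECT name FROM customers WHERE city LIKE 'Bos%'"
).fetchall()

conn.close()
print(cities, filtered, like_rows)
```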
Posted in Uncategorized
Questions on the K-means method
What is the K-means algorithm? What is the optimization function? The partial derivative of the distortion J with respect to each center location must be zero. Will the algorithm stop? When J is minimized, (1) each point is encoded by its nearest center … Continue reading
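The two conditions mentioned above (each point assigned to its nearest center, and a zero partial derivative of J, which places each center at the mean of its assigned points) suggest the usual alternating procedure. The following is a minimal plain-Python sketch under that reading; the helper names and the toy points are assumptions, and J is taken to be the sum of squared distances to the assigned centers.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, centers, max_iter=100):
    """Alternate assignment and mean updates; return centers, labels, J history."""
    history = []
    labels = []
    for _ in range(max_iter):
        # Assignment step: encode each point by its nearest center.
        labels = [
            min(range(len(centers)), key=lambda k: sq_dist(p, centers[k]))
            for p in points
        ]
        # Update step: setting dJ/d(center) = 0 gives the mean of the members.
        new_centers = []
        for k in range(len(centers)):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:
                new_centers.append(
                    tuple(sum(coord) / len(members) for coord in zip(*members))
                )
            else:
                new_centers.append(centers[k])  # keep a center that owns no points
        # Distortion J for the current assignment and updated centers.
        history.append(
            sum(sq_dist(p, new_centers[label]) for p, label in zip(points, labels))
        )
        if new_centers == centers:  # nothing moved, so the procedure has stopped
            break
        centers = new_centers
    return centers, labels, history


# Hypothetical toy data: two well-separated pairs of points.
points = [(0, 0), (0, 1), (10, 10), (10, 11)]
centers, labels, history = kmeans(points, [(0, 0), (10, 10)])
print(centers, labels, history)
```

Neither step can increase J, and there are only finitely many ways to assign points to centers, which is the standard argument that the loop terminates.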
Posted in Machine Learning
SVM notes
Support vector machines can be used for classification and regression. They have been applied successfully in many fields, such as bioinformatics, text, and image recognition. Their main ideas are the large margin and the kernel trick. Margins: 1. Intuition. To make things easier, we suppose … Continue reading
Posted in Machine Learning
A brief introduction to Neural Networks with an example in R
Motivation: Suppose there are two predictors and the decision boundary is non-linear. Then one can introduce quadratic terms in the logistic regression, i.e. $latex h_{\theta}(x)=g(\theta_0+\theta_1 x_1+\theta_2 x_2+\theta_3 x_1^2+\theta_4 x_1x_2+\theta_5 x_2^2)$, where $latex g$ is the logistic function. However, most questions have more … Continue reading
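To make the quadratic hypothesis concrete, here is a small Python sketch (the post's own example is in R). It takes six coefficients in the same order as the terms in the formula: intercept, x1, x2, x1^2, x1*x2, x2^2. The coefficient values are hypothetical, chosen so that the decision boundary is the unit circle.

```python
import math


def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def quadratic_features(x1, x2):
    """Intercept, linear, and quadratic terms for two predictors."""
    return [1.0, x1, x2, x1 * x1, x1 * x2, x2 * x2]


def hypothesis(theta, x1, x2):
    """h_theta(x): logistic function applied to the quadratic combination."""
    z = sum(t * f for t, f in zip(theta, quadratic_features(x1, x2)))
    return sigmoid(z)


# Hypothetical coefficients: z = x1^2 + x2^2 - 1, so h = 0.5 on the unit circle.
theta = [-1.0, 0.0, 0.0, 1.0, 0.0, 1.0]
inside = hypothesis(theta, 0.0, 0.0)       # z = -1, below 0.5
on_boundary = hypothesis(theta, 1.0, 0.0)  # z = 0, exactly 0.5
outside = hypothesis(theta, 2.0, 0.0)      # z = 3, above 0.5
print(inside, on_boundary, outside)
```

A linear logistic model in x1 and x2 alone could not produce this circular boundary, which is the point of adding the quadratic terms.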
Posted in Machine Learning, R
Spectral methods in dynamics
In the dynamics seminar here at Houston, we’re beginning a series of expository talks on statistical properties of dynamical systems. This week’s talk was given by Andrew Törö… Source: Spectral methods in dynamics
Reviews from Amazon’s customers
The dataset is from http://snap.stanford.edu/data/amazon/productGraph/. I analyzed the data from two departments: Beauty, and Toys and Games. The features include: asin – ID of the product; reviewerID – ID of the reviewer; reviewTime – time of the review; overall – rating of … Continue reading
Posted in Uncategorized