Awesomely simple step-by-step guides for setting up a Python environment in Emacs.
Basic Environment:
http://www.yilmazhuseyin.com/blog/dev/basic-emacs-setup/
Further Setup:
http://www.yilmazhuseyin.com/blog/dev/emacs-setup-python-development/
The posts were written by Huseyin Yilmaz.
Just an extension of my memory: a way to remember all these little tricks I end up forgetting after a while. I've seen a few people use a blog for this, and it seems useful.
Tuesday, November 20, 2012
Monday, August 6, 2012
Return to things that are meaningful
Labels:
Inspiration,
Melodysheep,
Videos
Location:
Paris, France
Monday, July 9, 2012
Cold-start and warm-start approaches
In the field of machine learning, you may have come across the terms cold-start and warm-start approach. Roughly, here's what they are:
Cold-start approach: each problem is solved independently of the next; for example, a bunch of logistic regression problems are computed in order to find the best regularization parameter λ. This can be efficient when multiple processors are used, as the problems can be solved simultaneously.
Warm-start approach: each problem is solved using previously computed results as the starting point for the next computation. This can be done efficiently with a single processor, as the problems are solved sequentially.
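A toy sketch of the difference, using plain NumPy gradient descent on a ridge-regression path (the data sizes, step size, and λ values here are made up for the demo):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
w_true = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
y = X @ w_true + 0.1 * rng.normal(size=50)

def ridge_gd(X, y, lam, w0, lr=0.1, n_iter=500):
    """Gradient descent for ridge regression, starting from w0."""
    w = w0.copy()
    for _ in range(n_iter):
        grad = X.T @ (X @ w - y) / len(y) + lam * w
        w -= lr * grad
    return w

lambdas = [10.0, 1.0, 0.1, 0.01]

# Cold start: every lambda starts from scratch. The problems are
# independent, so they could run on separate processors.
cold = [ridge_gd(X, y, lam, np.zeros(5)) for lam in lambdas]

# Warm start: each solution seeds the next problem in the sequence.
w = np.zeros(5)
warm = []
for lam in lambdas:
    w = ridge_gd(X, y, lam, w)
    warm.append(w)
```

Both paths end up at the same solutions; the warm-started runs just reach them from a better initial point, which is why the sequential, single-processor setting favours them.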
Labels:
machine Learning,
regularization
Location:
Paris, France
Monday, June 4, 2012
Using git's bisect to find out where it all broke
When working on a project with several contributors, it can be quite a pain if the
latest commit breaks. Using git blame is one way to go, but if that doesn't reveal the
place where things went wrong, git bisect can be used to systematically track down the commit
where things broke, which makes it a nice debugging tool.
Here's a nice tutorial on the matter:
Monday, May 7, 2012
DeprecationWarnings in Python 2.7
If you're using Python version 2.7, you may notice that the DeprecationWarning no
longer displays. It has been de-activated by default; however, it can be switched back on
by, for example, doing the following in IPython:
>>> import warnings
>>> warnings.simplefilter("always")
If you now call a deprecated class on purpose, IPython will display the full DeprecationWarning.
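A minimal sketch of the effect (the Old class here is made up for the demo; the catch_warnings recorder just captures what would otherwise be printed):

```python
import warnings

# re-enable DeprecationWarnings globally
warnings.simplefilter("always")

class Old(object):
    """A made-up deprecated class that warns on instantiation."""
    def __init__(self):
        warnings.warn("Old is deprecated", DeprecationWarning)

# record=True captures the warnings instead of printing them,
# so we can inspect what would have been displayed
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    Old()
    Old()

for w in caught:
    print(w.category.__name__, w.message)
```

With the default 2.7 filter, only code in `__main__` would see these; "always" makes every call warn.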
Labels:
deprecated,
ipython,
python,
warnings
Location:
Paris, France
Tuesday, April 24, 2012
Hosting your html on Github
If you want to host a webpage on GitHub, here's how I do it.
For example, the documentation for some project you've done in, say, Python, has been generated using something like Sphinx, or you've just got some HTML files. Either way, you want to host an online version of this documentation as a pre-built website on GitHub.
This is especially useful if you've made some changes to the documentation/website of a GitHub repository and you want to show them easily to your peers for review, without having them first fork it and build it in order to see what you've done.
Note that these instructions are from how I do it on Ubuntu 11.10.
- Firstly, on your GitHub account (your profile page, where your forks and repositories are shown), create a new repository.
- Give it a name, for example project_documentation_online_build.
- A set of instructions should appear, the first of them being Global setup, which I'm assuming you've hopefully done already. Follow the Next steps instructions in your terminal to create your directory with the same name (in my case, project_documentation_online_build). They should go a little something like this:
mkdir project_documentation_online_build
cd project_documentation_online_build
git init
touch README
git add README
git commit -m'first commit'
git remote add origin git@github.com:YourUserName/project_documentation_online_build.git
git push -u origin master
- You can then create a new branch for this repository called gh-pages like so:
git symbolic-ref HEAD refs/heads/gh-pages
- Switch to this branch:
git checkout gh-pages
- Now you can place all your html files here, like for example your html-build from your Sphinx generated documentation. Once you've placed it all here, you can:
git add .
git commit -a -m "First pages commit"
git push origin gh-pages
A wee bit later you should be able to check it at http://YourUserName.github.com/project_documentation_online_build/
Note1: If your page built but it's missing its theme and pretty colours, add a blank file called .nojekyll to the main directory in your gh-pages branch and try pushing it again.
Note2: This is just what works for me. If you have any problems, I recommend you check out the GitHub Pages documentation. These instructions, together with those that are displayed upon creating a new repository, are pretty much what I'm summarising here.
Friday, April 20, 2012
Informative features
Let's say you want to generate a synthetic data-set to play around with for classification,
and you set
n_samples = 100
n_features = 1000
and you generate the following data
import numpy as np
import matplotlib.pyplot as plt
# two Gaussian clouds, the second shifted by 5
X1 = np.random.randn(n_samples // 2, n_features)
X2 = np.random.randn(n_samples // 2, n_features) + 5
X = np.append(X1, X2, axis=0)
np.random.shuffle(X)
plt.scatter(X[:, 0], X[:, 1])
plt.show()
For a binary classification, the function which determines our labels is \[y = \operatorname{sign}(X \cdot \omega)\]
where \(\omega\) is our coefficient vector.
For now, let's set our coefficients equal to a bunch of zeros:
coef = np.zeros(n_features)
If we wish to have, say, 10 informative features, we can for example set 10 of our coefficients to a non-zero value. When we dot the coefficients with our data X, the 10 non-zero coefficients mark our informative features, while the rest, which get multiplied by zeros, are not informative.
So,
coef[:10] = 1
y = np.sign(np.dot(X, coef))
will give us our corresponding labels such that we have 10 informative features.
A way to visualise this is to use the scikit-learn package's f_classif function, which returns F-scores and p-values for each feature.
If you have scikit-learn installed, do the following:
from sklearn.feature_selection import f_classif
F, pval = f_classif(X, y)
plt.plot(F)
plt.show()
Here you can see that the first 10 features are rated as the most informative.
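Putting the pieces above together as a single runnable sketch (with a fixed seed, a single Gaussian cloud so the only signal comes through coef, and a numeric check standing in for the plot):

```python
import numpy as np
from sklearn.feature_selection import f_classif

n_samples, n_features = 100, 1000
rng = np.random.RandomState(0)

X = rng.randn(n_samples, n_features)

# only the first 10 coefficients are non-zero, so only the
# first 10 features carry information about the labels
coef = np.zeros(n_features)
coef[:10] = 1
y = np.sign(np.dot(X, coef))

F, pval = f_classif(X, y)

# the informative features should, on average, get much larger
# F-scores than the 990 noise features
print(F[:10].mean(), F[10:].mean())
```

With this many noise features and only 100 samples, an individual noise feature can occasionally score high by chance, but the informative block should clearly dominate on average.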
Labels:
feature-selection,
matplotlib,
python,
Scikit-Learn
Location:
D306, 91190 Saint-Aubin, France