Getting started with Python 3 & virtual environments on OS X Yosemite

Notes on installing Python 3, virtualenv and virtualenvwrapper on OS X Yosemite (also shared on the QUTPy GitHub repository).

Start by installing Xcode from the Mac App Store. The latest version (6.4) seems to include the command line tools, which used to have to be installed separately.

Create a ~/.bash_profile file to set the architecture type:

# Set architecture flags
export ARCHFLAGS="-arch x86_64"
test -f ~/.bashrc && source ~/.bashrc

On OS X, one way to install Python 3 is to install Homebrew and run:

brew install python3

If you don’t want to use Homebrew, you can go to https://www.python.org/downloads/ and download the latest python3 from there.

I use virtual environments with Python to manage dependencies, so the next step is to install virtualenv (using pip3 makes sure you are installing it for python3):

pip3 install virtualenv

Virtualenvwrapper makes using virtualenv much easier:

pip3 install virtualenvwrapper

Make directories for your projects and virtualenvs if they don’t already exist (by default virtualenvwrapper uses ~/.virtualenvs, but the settings further down point it at ~/Virtualenvs):

mkdir -p ~/Projects ~/Virtualenvs

Run which python3 to find out where your python3 is installed, so you can set that path in .bashrc.
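The same lookup can also be done from Python, which is handy inside a setup script. This is a minimal sketch using the standard library's shutil.which; the helper name find_python3 is made up for illustration.

```python
import shutil


def find_python3():
    """Return the path of the first python3 found on PATH, or None."""
    # shutil.which searches the directories on PATH, much like the shell's `which`.
    return shutil.which("python3")


path = find_python3()
print(path if path else "python3 not found on PATH")
```

Whatever path it prints is the value to use for VIRTUALENVWRAPPER_PYTHON.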

Create a ~/.bashrc file to set parameters for virtualenvwrapper:

# pip should only run if there is a virtualenv currently activated
export PIP_REQUIRE_VIRTUALENV=true
# set paths to python & directories
export VIRTUALENVWRAPPER_PYTHON=/usr/local/bin/python3
export WORKON_HOME=$HOME/Virtualenvs
export PROJECT_HOME=$HOME/Projects
source /usr/local/bin/virtualenvwrapper.sh

Setting PIP_REQUIRE_VIRTUALENV=true prevents pip from running outside of a virtualenv. This might not suit you if you already have work that uses the system-wide Python environment, but it is a way to ‘force’ the use of virtualenv.

Make your first virtualenv:

mkproject test1

This will create the ~/Projects/test1 directory and set up the Python libraries for that virtualenv in ~/Virtualenvs/test1. It uses python3 because we set VIRTUALENVWRAPPER_PYTHON to python3. You can pass virtualenv’s -p option (for example -p python2.7) to pick a different Python version.
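To make the directory layout concrete, here is a small sketch that works out where mkproject would put things for a given name. The function name mkproject_paths is hypothetical, and the ~/Projects fallback is only an assumption for this sketch: virtualenvwrapper itself expects PROJECT_HOME to be set.

```python
import os


def mkproject_paths(name, environ=os.environ):
    """Return (project_dir, virtualenv_dir) expected for a project called name."""
    home = os.path.expanduser("~")
    # virtualenvwrapper falls back to ~/.virtualenvs when WORKON_HOME is unset.
    workon_home = environ.get("WORKON_HOME", os.path.join(home, ".virtualenvs"))
    # Assumption for this sketch only: fall back to ~/Projects if PROJECT_HOME is unset.
    project_home = environ.get("PROJECT_HOME", os.path.join(home, "Projects"))
    return os.path.join(project_home, name), os.path.join(workon_home, name)


project_dir, venv_dir = mkproject_paths("test1")
print("project:", project_dir)
print("virtualenv:", venv_dir)
```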

Virtualenvwrapper also activates that virtual environment when you create it. To leave the virtualenv use:

deactivate

To get back into the virtualenv use:

workon test1

workon allows tab completion to see the available environments.
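If you want the same list of environments from a script, something like the following sketch approximates it by looking for directories under WORKON_HOME that contain a bin/activate script. This is not how virtualenvwrapper implements its completion; list_virtualenvs is a name invented for this example.

```python
import os


def list_virtualenvs(workon_home):
    """Return sorted names of directories under workon_home that look like virtualenvs."""
    names = []
    for entry in os.listdir(workon_home):
        # On OS X and Linux a virtualenv contains a bin/activate script.
        if os.path.isfile(os.path.join(workon_home, entry, "bin", "activate")):
            names.append(entry)
    return sorted(names)


workon_home = os.environ.get("WORKON_HOME", os.path.expanduser("~/.virtualenvs"))
if os.path.isdir(workon_home):
    print(list_virtualenvs(workon_home))
```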

When the virtualenv is active, anything you pip install will be installed just for that virtualenv, not system-wide. Inside the virtualenv, pip and pip3 both belong to the Python version the virtualenv was created with, so you can use either.
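If you are ever unsure whether a virtualenv is active, Python can tell you. The sketch below relies on two interpreter attributes: older virtualenv releases set sys.real_prefix, while newer virtualenv releases and the built-in venv module make sys.base_prefix differ from sys.prefix. The function name in_virtualenv is just illustrative.

```python
import sys


def in_virtualenv():
    """Return True when the running interpreter belongs to a virtual environment."""
    # Older virtualenv releases mark the environment with sys.real_prefix.
    if hasattr(sys, "real_prefix"):
        return True
    # Otherwise compare the environment prefix with the base installation prefix.
    return getattr(sys, "base_prefix", sys.prefix) != sys.prefix


print("inside a virtualenv:", in_virtualenv())
print("packages install under:", sys.prefix)
```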

See the virtualenvwrapper documentation for other commands to manage your virtual environments.

Try out Jupyter and Pandas

If you want to get started with Jupyter and Pandas you can just install them in your new virtualenv project (this will also install quite a few extra modules that these require):

pip install jupyter
pip install pandas
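A quick way to confirm that both packages landed in the active virtualenv is to ask Python whether it can find them. This is a rough sketch using importlib from the standard library; is_installed is a made-up helper, and it only checks that a top-level module of that name can be located, not that it works.

```python
import importlib.util


def is_installed(module_name):
    """Return True if the current interpreter can locate module_name."""
    return importlib.util.find_spec(module_name) is not None


for name in ("jupyter", "pandas"):
    print(name, "installed" if is_installed(name) else "missing")
```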

Start jupyter using:

jupyter notebook

Installing the latest gensim in Anaconda

I’m using Continuum Analytics’ Anaconda Python because it was the easiest way to get BLAS working for gensim on OS X Mavericks.

I wanted to install the latest gensim (gensim 0.10.2) into Anaconda from the Python Package Index (PyPI). I followed the tutorial “Basic tutorial for building a Conda package” to build the package, and “Python Packages and Environments with conda” to create a new conda environment for the new gensim.

Create the package

conda install conda-build
conda update conda
cd PyDev/anaconda-builds/
conda skeleton pypi gensim
conda build gensim

Upload it to BinStar

As recommended in the tutorial, I created an account on BinStar and uploaded the package I’d made to that so I can install it like any other Anaconda package. It also means that other people can use the package from my channel instead of having to build it.

conda install binstar
binstar login
binstar upload /Users/brenda/anaconda/conda-bld/osx-64/gensim-0.10.2-py27_0.tar.bz2
conda config --add channels brenda
conda create -n gensim0.10.2 anaconda gensim=0.10.2
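The file name passed to binstar upload follows conda's name-version-build pattern, which is where the gensim=0.10.2 in the last command comes from. The sketch below pulls those three parts out of a .tar.bz2 package path; parse_conda_filename is a hypothetical helper, not part of conda or binstar.

```python
import os


def parse_conda_filename(path):
    """Split a conda .tar.bz2 package file name into (name, version, build)."""
    filename = os.path.basename(path)
    suffix = ".tar.bz2"
    if not filename.endswith(suffix):
        raise ValueError("not a .tar.bz2 conda package: %r" % filename)
    stem = filename[: -len(suffix)]
    # Package names may contain hyphens, so split from the right.
    name, version, build = stem.rsplit("-", 2)
    return name, version, build


print(parse_conda_filename(
    "/Users/brenda/anaconda/conda-bld/osx-64/gensim-0.10.2-py27_0.tar.bz2"
))
```

For the package built here this gives the name gensim, the version 0.10.2 and the build string py27_0.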

Use the new environment

To activate this environment, use:

source activate gensim0.10.2

To deactivate this environment, use:

source deactivate

Pandas timeseries plot – setting x-axis major and minor ticks and labels

I’ve asked this question on Stack Overflow (http://stackoverflow.com/questions/12945971/pandas-timeseries-plot-setting-x-axis-major-and-minor-ticks-and-labels), but couldn’t include images because I haven’t posted on Stack Overflow before. So here it is, with the images.

I want to be able to set the major and minor xticks and their labels for a time series graph plotted from a Pandas time series object.

The Pandas 0.9 “what’s new” page says: “you can either use to_pydatetime or register a converter for the Timestamp type” but I can’t work out how to do that so that I can use the matplotlib ax.xaxis.set_major_locator and ax.xaxis.set_major_formatter (and minor) commands.

If I use them without converting the pandas times, the x-axis ticks and labels end up wrong.
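For reference, one possible reading of the to_pydatetime suggestion is to skip pandas' own plot method and hand matplotlib plain datetime objects, so that its date locators and formatters see values they understand. The sketch below is an untested-against-0.9 assumption written for current pandas and matplotlib; the function name plot_with_matplotlib_dates, the Agg backend and the monthly major locator are my own choices, not something taken from the release notes.

```python
import datetime

import matplotlib
matplotlib.use("Agg")  # assumption: render off-screen so no display is needed
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def plot_with_matplotlib_dates(series):
    """Plot a time series via matplotlib directly, with weekly minor and monthly major ticks."""
    fig, ax = plt.subplots(figsize=(7, 4))
    # Convert the pandas index to plain datetime objects before plotting.
    ax.plot(series.index.to_pydatetime(), series.values, "v-")
    # Minor ticks on every Tuesday, labelled with day of month and weekday.
    ax.xaxis.set_minor_locator(mdates.WeekdayLocator(byweekday=mdates.TU, interval=1))
    ax.xaxis.set_minor_formatter(mdates.DateFormatter("%d\n%a"))
    # Major ticks at the start of each month, labelled below the minor labels.
    ax.xaxis.set_major_locator(mdates.MonthLocator())
    ax.xaxis.set_major_formatter(mdates.DateFormatter("\n\n\n%b %Y"))
    ax.xaxis.grid(True, which="minor")
    return fig, ax


index = pd.date_range(start=datetime.datetime(2011, 5, 1),
                      end=datetime.datetime(2011, 7, 1), freq="D")
series = pd.Series(np.random.randn(len(index)), index=index)
fig, ax = plot_with_matplotlib_dates(series)
fig.savefig("timeseries_ticks.png")
```

The trade-off is that plotting this way gives up the conveniences of pandas' plot method, such as its automatic date axis handling.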

By using the ‘xticks’ parameter I can pass the major ticks to pandas.plot, and then set the major tick labels. I can’t work out how to do the minor ticks using this approach, although I can set the labels on the default minor ticks that pandas.plot creates.
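The weekly tick positions handed to the ‘xticks’ parameter can be cross-checked with the standard library alone. This small sketch lists every Tuesday between 1 May and 1 July 2011; weekly_ticks is a name invented for the example and plays the role of a weekly-on-Tuesday date range.

```python
import datetime


def weekly_ticks(start, end, weekday):
    """Return every date from start to end (inclusive) that falls on weekday (Monday=0)."""
    # Step forward to the first matching weekday on or after start.
    first = start + datetime.timedelta(days=(weekday - start.weekday()) % 7)
    ticks = []
    current = first
    while current <= end:
        ticks.append(current)
        current += datetime.timedelta(days=7)
    return ticks


ticks = weekly_ticks(datetime.date(2011, 5, 1), datetime.date(2011, 7, 1), weekday=1)
print(len(ticks), ticks[0], ticks[-1])  # prints: 9 2011-05-03 2011-06-28
```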

Here is my test code:

from __future__ import print_function  # lets the print() calls below run on Python 2 as well

import datetime

import matplotlib
import matplotlib.dates
import matplotlib.pyplot as plt
import numpy as np
import pandas

print('pandas.__version__ is ', pandas.__version__)
print('matplotlib.__version__ is ', matplotlib.__version__)

dStart = datetime.datetime(2011, 5, 1)  # 1 May
dEnd = datetime.datetime(2011, 7, 1)  # 1 July

# one data point per day over the two months
dateIndex = pandas.date_range(start=dStart, end=dEnd, freq='D')
print("1 May to 1 July 2011", dateIndex)

testSeries = pandas.Series(data=np.random.randn(len(dateIndex)), index=dateIndex)

ax = plt.figure(figsize=(7, 4), dpi=300).add_subplot(111)
testSeries.plot(ax=ax, style='v-', label='first line')

# using Matplotlib date locators and formatters doesn't work with the new pandas datetime index
ax.xaxis.set_minor_locator(matplotlib.dates.WeekdayLocator(byweekday=(1), interval=1))
ax.xaxis.set_minor_formatter(matplotlib.dates.DateFormatter('%d\n%a'))
ax.xaxis.grid(True, which="minor")
ax.xaxis.grid(False, which="major")
ax.xaxis.set_major_formatter(matplotlib.dates.DateFormatter('\n\n\n%b%Y'))
plt.show()

# set the major xticks and labels through pandas
ax2 = plt.figure(figsize=(7, 4), dpi=300).add_subplot(111)
xticks = pandas.date_range(start=dStart, end=dEnd, freq='W-TUE')
print("xticks: ", xticks)
testSeries.plot(ax=ax2, style='-v', label='second line', xticks=xticks.to_pydatetime())
ax2.set_xticklabels([x.strftime('%a\n%d\n%h\n%Y') for x in xticks])
# set the text of the first few minor ticks created by pandas.plot
#    ax2.set_xticklabels(['a','b','c','d','e'], minor=True)
# remove the minor xtick labels set by pandas.plot
ax2.set_xticklabels([], minor=True)
# turn the minor ticks created by pandas.plot off
# plt.minorticks_off()
plt.show()
print(testSeries['6/1/2011':'6/7/2011'])

and its output:

pandas.__version__ is  0.9.1.dev-3de54ae
matplotlib.__version__ is  1.1.1
1 May to 1 July 2011 <class 'pandas.tseries.index.DatetimeIndex'>
[2011-05-01 00:00:00, ..., 2011-07-01 00:00:00]
Length: 62, Freq: D, Timezone: None

xticks: <class 'pandas.tseries.index.DatetimeIndex'>
[2011-05-03 00:00:00, ..., 2011-06-28 00:00:00]
Length: 9, Freq: W-TUE, Timezone: None

2011-06-04   -0.199393
2011-06-05   -0.043118
2011-06-06    0.477771
2011-06-07   -0.033207
Freq: D