I have been planning to start learning analytics for some time now. Every time I think I’ll start, something more urgent pops up. But today I got a bit of time, so I built a fresh environment for learning purposes. Here’s what I did.

I created a new virtual machine using VirtualBox (an open-source virtualization platform from Oracle). Having a virtual machine helps in multiple ways. My work-related setup is not affected in any way by the new applications that I install for learning purposes. If something crashes, I can discard the VM and create a new one. I can also back up the entire VM and move it to another computer.

The new virtual machine runs Ubuntu 17.04. It comes with Python 3.6.3 pre-installed. Python 3.0 and above include some changes that are not backward compatible with Python 2.7.
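
To give a feel for those incompatibilities, here is a small sketch of three well-known differences. The variable names are my own, and the comments describe how Python 2.7 would behave differently; this is only a sample, not a complete list.

```python
# Behaviour shown is for Python 3; comments note how Python 2.7 differs.

# print is a function in Python 3 (it was a statement in Python 2.7).
print("hello")

# / is true division in Python 3; Python 2.7 would give 3 for 7 / 2.
true_div = 7 / 2      # 3.5
floor_div = 7 // 2    # 3 in both versions

# range() returns a lazy sequence object in Python 3, not a list.
r = range(3)
as_list = list(r)     # [0, 1, 2]
```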

I need to run python3 to start the Python 3 interpreter.

[email protected]:~$ python3
Python 3.6.3 (default, Oct 3 2017, 21:45:48) 
[GCC 7.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>>

Now that Python is available, I must get some of the essential packages installed. But to install packages, I need to install pip first. pip is a package management system used to install software packages written in Python. To install pip, I ran the following command from a terminal window:

sudo apt install python-pip

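Once pip is installed, things are easy. Now I can get the most needed packages. One thing to note: the python-pip package provides pip for Python 2, which is why the wheels in the logs below carry the cp27 tag. To install the same packages for the Python 3 interpreter instead, the equivalent is sudo apt install python3-pip, followed by pip3 install in place of pip install.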
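
Because each Python version keeps its own set of packages, it can help to ask the interpreter itself what it can see. Below is a minimal sketch using only the standard library; the helper name is_installed is made up for illustration, and the list of package names simply mirrors the ones in this post. It needs to be run with python3.

```python
import sys
import importlib.util

def is_installed(package_name):
    """Return True if the running interpreter can find the package."""
    return importlib.util.find_spec(package_name) is not None

# Show which interpreter is running, then check each package in turn.
print("Python", sys.version_info.major, sys.version_info.minor)
for name in ["numpy", "scipy", "pandas", "matplotlib", "seaborn"]:
    print(name, "->", "found" if is_installed(name) else "missing")
```

A package reported as missing here was probably installed for a different interpreter.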

(1) NumPy – short for Numerical Python – is the base package for scientific computing in Python. It offers a variety of operations on arrays, including matrix computations. Here’s the terminal output from my NumPy installation:

[email protected]:~$ pip install NumPy
Collecting NumPy
 Downloading numpy-1.13.3-cp27-cp27mu-manylinux1_x86_64.whl (16.6MB)
 100% |████████████████████████████████| 16.7MB 39kB/s 
Installing collected packages: NumPy
Successfully installed NumPy-1.13.3
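
As a quick illustration of the array and matrix operations mentioned above, here is a small sketch. The two matrices are made-up sample values.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

elementwise = a * b          # element-by-element product
matrix_product = a.dot(b)    # true matrix multiplication
column_sums = a.sum(axis=0)  # sum down each column

print(elementwise)
print(matrix_product)
print(column_sums)
```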

(2) SciPy – a library for science and engineering. It has modules for linear algebra, optimization, integration and statistics. The library uses the NumPy package extensively. The terminal output from the SciPy installation follows:

[email protected]:~$ pip install SciPy
Collecting SciPy
 Downloading scipy-1.0.0-cp27-cp27mu-manylinux1_x86_64.whl (46.7MB)
 100% |████████████████████████████████| 46.7MB 23kB/s 
Collecting numpy>=1.8.2 (from SciPy)
 Using cached numpy-1.13.3-cp27-cp27mu-manylinux1_x86_64.whl
Installing collected packages: numpy, SciPy
Successfully installed SciPy-1.0.0 numpy-1.13.3
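
To show two of those modules in action, here is a minimal sketch that integrates a simple function and solves a small system of linear equations. The function and the equations are arbitrary examples chosen for illustration.

```python
from scipy import integrate, linalg

# Integrate x^2 from 0 to 1; quad returns the value and an error estimate.
area, error_estimate = integrate.quad(lambda x: x ** 2, 0, 1)

# Solve the system 3x + y = 9 and x + 2y = 8.
coefficients = [[3, 1], [1, 2]]
constants = [9, 8]
solution = linalg.solve(coefficients, constants)

print(area)      # close to 1/3
print(solution)  # close to x = 2, y = 3
```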

(3) Pandas – a Python package intended for what is often called “data wrangling”. It simplifies the way we hold data for manipulation. The data frames provided by Pandas are good for data manipulation, aggregation, and visualization. Here’s how I installed the Pandas library:

[email protected]:~$ pip install Pandas
Collecting Pandas
 Downloading pandas-0.21.0-cp27-cp27mu-manylinux1_x86_64.whl (24.3MB)
 100% |████████████████████████████████| 24.3MB 38kB/s 
Collecting pytz>=2011k (from Pandas)
 Downloading pytz-2017.3-py2.py3-none-any.whl (511kB)
 100% |████████████████████████████████| 512kB 267kB/s 
Collecting python-dateutil (from Pandas)
 Downloading python_dateutil-2.6.1-py2.py3-none-any.whl (194kB)
 100% |████████████████████████████████| 194kB 201kB/s 
Collecting numpy>=1.9.0 (from Pandas)
 Using cached numpy-1.13.3-cp27-cp27mu-manylinux1_x86_64.whl
Collecting six>=1.5 (from python-dateutil->Pandas)
 Downloading six-1.11.0-py2.py3-none-any.whl
Installing collected packages: pytz, six, python-dateutil, numpy, Pandas
Successfully installed Pandas-0.21.0 numpy-1.13.3 python-dateutil-2.6.1 pytz-2017.3 six-1.11.0
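
Here is a small sketch of what a data frame looks like and how filtering and aggregation work on it. The city names and sales figures are invented sample data.

```python
import pandas as pd

# A data frame is a table with named columns.
df = pd.DataFrame({
    "city": ["Kochi", "Kochi", "Pune", "Pune"],
    "sales": [100, 150, 200, 70],
})

# Manipulation: keep only the rows with sales above 90.
high = df[df["sales"] > 90]

# Aggregation: total sales per city.
totals = df.groupby("city")["sales"].sum()

print(high)
print(totals)
```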

(4) Matplotlib – a visualization library for Python. Drawing charts and graphs with Matplotlib takes a bit more effort than with some other tools, but it can plot almost all kinds of charts. I got it installed as shown below:

[email protected]:~$ pip install Matplotlib
Collecting Matplotlib
 Downloading matplotlib-2.1.0-cp27-cp27mu-manylinux1_x86_64.whl (14.9MB)
 100% |████████████████████████████████| 14.9MB 76kB/s 
Collecting cycler>=0.10 (from Matplotlib)
 Downloading cycler-0.10.0-py2.py3-none-any.whl
Collecting numpy>=1.7.1 (from Matplotlib)
 Using cached numpy-1.13.3-cp27-cp27mu-manylinux1_x86_64.whl
Collecting python-dateutil>=2.0 (from Matplotlib)
 Using cached python_dateutil-2.6.1-py2.py3-none-any.whl
Collecting backports.functools-lru-cache (from Matplotlib)
 Downloading backports.functools_lru_cache-1.4-py2.py3-none-any.whl
Collecting subprocess32 (from Matplotlib)
 Downloading subprocess32-3.2.7.tar.gz (54kB)
 100% |████████████████████████████████| 61kB 1.8MB/s 
Collecting pytz (from Matplotlib)
 Using cached pytz-2017.3-py2.py3-none-any.whl
Collecting six>=1.10 (from Matplotlib)
 Using cached six-1.11.0-py2.py3-none-any.whl
Collecting pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.1 (from Matplotlib)
 Downloading pyparsing-2.2.0-py2.py3-none-any.whl (56kB)
 100% |████████████████████████████████| 61kB 1.6MB/s 
Building wheels for collected packages: subprocess32
 Running setup.py bdist_wheel for subprocess32 ... done
 Stored in directory: /home/justin/.cache/pip/wheels/7d/4c/a4/ce9ceb463dae01f4b95e670abd9afc8d65a45f38012f8030cc
Successfully built subprocess32
Installing collected packages: six, cycler, numpy, python-dateutil, backports.functools-lru-cache, subprocess32, pytz, pyparsing, Matplotlib
Successfully installed Matplotlib-2.1.0 backports.functools-lru-cache-1.4 cycler-0.10.0 numpy-1.13.3 pyparsing-2.2.0 python-dateutil-2.6.1 pytz-2017.3 six-1.11.0 subprocess32-3.2.7
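
To show the kind of effort involved, here is a minimal sketch of a line chart. The data points are made up, and I chose the Agg backend and an in-memory buffer so that the sketch runs without a display; calling fig.savefig with a file name would write the image to disk instead.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt

x = [1, 2, 3, 4]
y = [1, 4, 9, 16]

# Each part of the chart is set up explicitly.
fig, ax = plt.subplots()
ax.plot(x, y, marker="o", label="y = x squared")
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.set_title("A simple line chart")
ax.legend()

# Render the chart as PNG data held in memory.
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
png_bytes = buffer.getvalue()
plt.close(fig)
```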

(5) Seaborn – a visualization library built on top of Matplotlib. It makes visualization a bit easier compared to using plain Matplotlib. I got it installed as shown below:

[email protected]:~$ pip install Seaborn
Collecting Seaborn
 Downloading seaborn-0.8.1.tar.gz (178kB)
 100% |████████████████████████████████| 184kB 304kB/s 
Building wheels for collected packages: Seaborn
 Running setup.py bdist_wheel for Seaborn ... done
 Stored in directory: /home/justin/.cache/pip/wheels/29/af/4b/ac6b04ec3e2da1a450e74c6a0e86ade83807b4aaf40466ecda
Successfully built Seaborn
Installing collected packages: Seaborn
Successfully installed Seaborn-0.8.1
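
As a rough idea of how Seaborn shortens things, here is a sketch that draws a bar chart straight from a data frame. The days and sales figures are invented, and the Agg backend is my own choice so that no display is needed.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import pandas as pd
import seaborn as sns

df = pd.DataFrame({"day": ["Mon", "Tue", "Wed"], "sales": [10, 15, 7]})

# One call picks the columns by name and draws the bars.
ax = sns.barplot(x="day", y="sales", data=df)
ax.set_title("Sales per day")
```

The object returned is an ordinary Matplotlib axes, so the usual Matplotlib calls still work on it.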

This concludes the installation of the essential packages that I needed to start off. More may be required as I go along, but for now, I shall start with these.

Next, I will try out the various container data structures available in plain vanilla Python, such as lists and dictionaries.

 


Written by Justin Jose

He is a postgraduate in computers and has more than 14 years of industry experience with some of the leading Information Technology companies in India. Data-centric computing, ranging from data architecture to analytics, is his area of interest.