I have been planning to start learning analytics for some time now. Every time I think I'll start, something more urgent pops up. But today I had a bit of time, so I built a fresh environment for learning purposes. Here's what I did.
I created a new virtual machine using VirtualBox (an open-source virtualization platform from Oracle). Having a virtual machine helps in several ways. My work-related setup is not affected by the applications I install for learning purposes. If something crashes, I can discard the VM and create a new one. I can also back up the entire VM and move it to another computer.
The new virtual machine runs Ubuntu 17.04. It comes with Python 3.6.3 pre-installed. Compared to Python 2.7, Python 3.0 and later introduce some changes that are not backward compatible.
I need to run python3 to start the Python 3 interpreter.
[email protected]:~$ python3
Python 3.6.3 (default, Oct 3 2017, 21:45:48)
[GCC 7.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>>
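To make the backward-incompatible changes a little more concrete, here is a minimal sketch of three well-known differences between Python 2 and Python 3. It is my own illustration rather than anything from the setup above, and the variable names are arbitrary.

```python
import sys

# print is a function in Python 3; in Python 2 it was a statement.
print("Running Python", sys.version_info.major)

# "/" is true division in Python 3, while "//" is floor division.
true_quotient = 7 / 2    # 3.5 in Python 3 (Python 2 would give 3)
floor_quotient = 7 // 2  # 3 in both versions

# Text strings are Unicode by default, and bytes are a separate type.
text = "café"
encoded = text.encode("utf-8")

print(true_quotient, floor_quotient)
print(type(text).__name__, len(text))
print(type(encoded).__name__, len(encoded))
```

Running the same file with a Python 2 interpreter would give a different division result, which is one reason to always check which interpreter is in use.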
Now that Python is available, I need to install some essential packages. To install packages, I first need pip, the package management system used to install software packages written in Python. To install pip, I ran the following command from a terminal window:
sudo apt install python-pip
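Once pip is installed, things are easy. One caveat: on Ubuntu the python-pip package provides pip for Python 2, which is why the wheels in the output below carry the cp27 tag. To install the same packages for the Python 3 interpreter, install python3-pip and run pip3 (or python3 -m pip) instead. Now I can get the most needed packages.
(1) NumPy – stands for Numerical Python – forms the base package for scientific computing in Python. It offers a variety of operations on arrays, matrix computations, etc. Here's the terminal output of my NumPy installation: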
[email protected]:~$ pip install NumPy
Collecting NumPy
  Downloading numpy-1.13.3-cp27-cp27mu-manylinux1_x86_64.whl (16.6MB)
    100% |████████████████████████████████| 16.7MB 39kB/s
Installing collected packages: NumPy
Successfully installed NumPy-1.13.3
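As a quick illustration of the array and matrix operations mentioned above, here is a small sketch I put together. The arrays are made-up sample values, not data from this setup.

```python
import numpy as np

# Two small 2x2 arrays with made-up values.
a = np.array([[1.0, 2.0], [3.0, 4.0]])
b = np.array([[0.0, 1.0], [1.0, 0.0]])

# The "*" operator multiplies element by element.
elementwise = a * b

# dot() performs a true matrix multiplication.
matrix_product = a.dot(b)

# Aggregations can run along an axis; axis=0 gives one mean per column.
column_means = a.mean(axis=0)

print(elementwise)
print(matrix_product)
print(column_means)
```

The difference between `a * b` and `a.dot(b)` is an easy thing to trip over when coming from tools where `*` means matrix multiplication.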
(2) SciPy – a library for science and engineering. It has modules for linear algebra, optimization, integration, and statistics. The library uses the NumPy package extensively. The terminal output of the SciPy installation follows:
[email protected]:~$ pip install SciPy
Collecting SciPy
  Downloading scipy-1.0.0-cp27-cp27mu-manylinux1_x86_64.whl (46.7MB)
    100% |████████████████████████████████| 46.7MB 23kB/s
Collecting numpy>=1.8.2 (from SciPy)
  Using cached numpy-1.13.3-cp27-cp27mu-manylinux1_x86_64.whl
Installing collected packages: numpy, SciPy
Successfully installed SciPy-1.0.0 numpy-1.13.3
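To get a feel for the four module areas listed above, here is a short sketch with one call from each. The equations and functions are toy examples chosen only because their answers are easy to check by hand.

```python
import numpy as np
from scipy import integrate, linalg, optimize, stats

# Linear algebra: solve 3x + y = 9 and x + 2y = 8.
coefficients = np.array([[3.0, 1.0], [1.0, 2.0]])
right_hand_side = np.array([9.0, 8.0])
solution = linalg.solve(coefficients, right_hand_side)

# Optimization: find the x that minimizes (x - 2)^2.
minimum = optimize.minimize_scalar(lambda x: (x - 2.0) ** 2)

# Integration: integrate x^2 from 0 to 1 (the exact value is 1/3).
area, error_estimate = integrate.quad(lambda x: x ** 2, 0.0, 1.0)

# Statistics: probability that a standard normal variable is below 0.
probability = stats.norm.cdf(0.0)

print(solution, minimum.x, area, probability)
```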
(3) Pandas – a Python package intended for what is called "data wrangling". It simplifies the way we hold data for manipulation. The data frames provided by Pandas are good for data manipulation, aggregation, and visualization. Here's how I installed the Pandas library:
[email protected]:~$ pip install Pandas
Collecting Pandas
  Downloading pandas-0.21.0-cp27-cp27mu-manylinux1_x86_64.whl (24.3MB)
    100% |████████████████████████████████| 24.3MB 38kB/s
Collecting pytz>=2011k (from Pandas)
  Downloading pytz-2017.3-py2.py3-none-any.whl (511kB)
    100% |████████████████████████████████| 512kB 267kB/s
Collecting python-dateutil (from Pandas)
  Downloading python_dateutil-2.6.1-py2.py3-none-any.whl (194kB)
    100% |████████████████████████████████| 194kB 201kB/s
Collecting numpy>=1.9.0 (from Pandas)
  Using cached numpy-1.13.3-cp27-cp27mu-manylinux1_x86_64.whl
Collecting six>=1.5 (from python-dateutil->Pandas)
  Downloading six-1.11.0-py2.py3-none-any.whl
Installing collected packages: pytz, six, python-dateutil, numpy, Pandas
Successfully installed Pandas-0.21.0 numpy-1.13.3 python-dateutil-2.6.1 pytz-2017.3 six-1.11.0
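Here is a small sketch of the kind of manipulation and aggregation a data frame makes easy. The table is made up for illustration, and the column names (region, units, price) are my own.

```python
import pandas as pd

# A tiny made-up sales table.
sales = pd.DataFrame({
    "region": ["north", "south", "north", "south"],
    "units": [10, 4, 6, 8],
    "price": [2.5, 3.0, 2.5, 3.0],
})

# Manipulation: derive a new column from existing ones.
sales["revenue"] = sales["units"] * sales["price"]

# Aggregation: total revenue per region.
revenue_by_region = sales.groupby("region")["revenue"].sum()

print(sales)
print(revenue_by_region)
```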
(4) Matplotlib – a visualization library for Python. Drawing charts and graphs with Matplotlib takes a bit more effort than with some other tools, but it can plot almost every kind of chart. I installed it as shown below:
[email protected]:~$ pip install Matplotlib
Collecting Matplotlib
  Downloading matplotlib-2.1.0-cp27-cp27mu-manylinux1_x86_64.whl (14.9MB)
    100% |████████████████████████████████| 14.9MB 76kB/s
Collecting cycler>=0.10 (from Matplotlib)
  Downloading cycler-0.10.0-py2.py3-none-any.whl
Collecting numpy>=1.7.1 (from Matplotlib)
  Using cached numpy-1.13.3-cp27-cp27mu-manylinux1_x86_64.whl
Collecting python-dateutil>=2.0 (from Matplotlib)
  Using cached python_dateutil-2.6.1-py2.py3-none-any.whl
Collecting backports.functools-lru-cache (from Matplotlib)
  Downloading backports.functools_lru_cache-1.4-py2.py3-none-any.whl
Collecting subprocess32 (from Matplotlib)
  Downloading subprocess32-3.2.7.tar.gz (54kB)
    100% |████████████████████████████████| 61kB 1.8MB/s
Collecting pytz (from Matplotlib)
  Using cached pytz-2017.3-py2.py3-none-any.whl
Collecting six>=1.10 (from Matplotlib)
  Using cached six-1.11.0-py2.py3-none-any.whl
Collecting pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.1 (from Matplotlib)
  Downloading pyparsing-2.2.0-py2.py3-none-any.whl (56kB)
    100% |████████████████████████████████| 61kB 1.6MB/s
Building wheels for collected packages: subprocess32
  Running setup.py bdist_wheel for subprocess32 ... done
  Stored in directory: /home/justin/.cache/pip/wheels/7d/4c/a4/ce9ceb463dae01f4b95e670abd9afc8d65a45f38012f8030cc
Successfully built subprocess32
Installing collected packages: six, cycler, numpy, python-dateutil, backports.functools-lru-cache, subprocess32, pytz, pyparsing, Matplotlib
Successfully installed Matplotlib-2.1.0 backports.functools-lru-cache-1.4 cycler-0.10.0 numpy-1.13.3 pyparsing-2.2.0 python-dateutil-2.6.1 pytz-2017.3 six-1.11.0 subprocess32-3.2.7
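To show what the "bit of effort" looks like in practice, here is a minimal line chart sketch. The data is made up, the output file name squares.png is a placeholder of my own, and I chose the Agg backend on the assumption that the script may run without a display.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt

# Made-up data: the squares of 0..4.
x = [0, 1, 2, 3, 4]
y = [value ** 2 for value in x]

# Each element of the chart is set up explicitly.
fig, ax = plt.subplots()
ax.plot(x, y, marker="o", label="y = x^2")
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.set_title("Squares")
ax.legend()

# Hypothetical output location, written to the system temp directory.
output_path = os.path.join(tempfile.gettempdir(), "squares.png")
fig.savefig(output_path)
print("Chart written to", output_path)
```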
(5) Seaborn – a visualization library built on top of Matplotlib. It makes visualization a bit easier than using plain Matplotlib. I installed it as shown below:
[email protected]:~$ pip install Seaborn
Collecting Seaborn
  Downloading seaborn-0.8.1.tar.gz (178kB)
    100% |████████████████████████████████| 184kB 304kB/s
Building wheels for collected packages: Seaborn
  Running setup.py bdist_wheel for Seaborn ... done
  Stored in directory: /home/justin/.cache/pip/wheels/29/af/4b/ac6b04ec3e2da1a450e74c6a0e86ade83807b4aaf40466ecda
Successfully built Seaborn
Installing collected packages: Seaborn
Successfully installed Seaborn-0.8.1
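For comparison with plain Matplotlib, here is a sketch of a bar chart of group averages drawn with a single Seaborn call. The data frame, its column names, and the output file name scores.png are placeholders I made up, and I have not checked this against every Seaborn release.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import pandas as pd
import seaborn as sns

# Made-up scores for two groups.
scores = pd.DataFrame({
    "group": ["a", "a", "b", "b"],
    "score": [3.0, 5.0, 6.0, 8.0],
})

# One call averages the scores per group and draws the bars;
# the axis labels are taken from the column names.
ax = sns.barplot(x="group", y="score", data=scores)
ax.set_title("Average score per group")

# Hypothetical output location, written to the system temp directory.
output_path = os.path.join(tempfile.gettempdir(), "scores.png")
ax.figure.savefig(output_path)
print("Chart written to", output_path)
```

Because Seaborn returns an ordinary Matplotlib axes object, the chart can still be adjusted with the usual Matplotlib methods.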
This concludes the installation of the essential packages I need to get started. More may be required as I go along, but for now I will start with these.
Next, I will try out the container data structures available in plain vanilla Python, such as lists and dictionaries.
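A quick way to confirm that the packages are visible to a given interpreter is to try importing each one and print its version. The sketch below is my own helper, not part of any of these libraries. It reports on whichever interpreter runs it, so python and python3 may give different answers.

```python
import importlib

# Import names of the five packages installed above.
PACKAGES = ["numpy", "scipy", "pandas", "matplotlib", "seaborn"]


def installed_versions(names):
    """Map each module name to its version string, or None if it cannot be imported."""
    versions = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            versions[name] = None
        else:
            # Most packages expose __version__, but not every module does.
            versions[name] = getattr(module, "__version__", "unknown")
    return versions


for name, version in installed_versions(PACKAGES).items():
    print(name, version if version is not None else "NOT INSTALLED")
```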