Python has become one of the most widely used programming languages. Its main advantages are a wide range of libraries, a simple and dynamic language design, clear syntax, ease of use, a free and open-source implementation, and the ability to interoperate with many other languages and platforms. Python is used in many domains, including web development, data analysis, game development, machine learning and artificial intelligence, and web scraping. Its strong data-handling libraries make it one of the most popular languages for machine learning.
A Python library is a reusable collection of code that you can include in your programs or projects. Let us look at some of the most widely used Python libraries for machine learning.
Pandas is a Python library used for data analysis and data manipulation. It allows merging and filtering of data, as well as loading it from external sources such as Excel. It provides fast, expressive, and flexible data structures for working with structured and time-series data. Pandas provides two main data structures: Series (one-dimensional) and DataFrame (two-dimensional). Older versions also offered Panel (three-dimensional), but it was deprecated in pandas 0.20 and removed in 0.25. The key data structure is the DataFrame, whose data are aligned in rows and columns.
To import the pandas library: import pandas
To create data structures in pandas:
- pandas.Series(data, index, dtype, copy)
- pandas.DataFrame(data, index, columns, dtype, copy)
- pandas.Panel(data, items, major_axis, minor_axis, dtype, copy) (older versions only; removed in pandas 0.25)
The main parameters are:
- data – It can be an ndarray, list, dict, etc.
- index – The index should have the same length as the data. Index labels are usually unique, although pandas does allow duplicates.
- copy (optional) – Whether to copy the input data; historically the default is False.
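As a minimal sketch of the Series constructor and its parameters, the example below builds a one-dimensional Series with an explicit index and dtype. The values and the labels "a", "b", and "c" are made up for illustration.

```python
import pandas as pd

# Illustrative data: three values with hypothetical labels "a", "b", "c".
s = pd.Series([10, 20, 30], index=["a", "b", "c"], dtype="float64")

print(s["b"])    # label-based access to a single element
print(s.dtype)   # the dtype requested in the constructor
```

Note that the index list has the same length as the data list, as the parameter description requires.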
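Creation of DataFrame:
import pandas as pd
# The column names "fruit" and "quantity" are illustrative; the original names were not preserved.
df = pd.DataFrame({"fruit": ["apple", "orange", "mango"], "quantity": [24, 34, 28]})
print(df)
Output:
    fruit  quantity
0   apple        24
1  orange        34
2   mango        28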
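The merging and filtering mentioned earlier might look like the following sketch. The column names and the price table are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Quantities from the example above; the column names are illustrative.
fruit = pd.DataFrame({"fruit": ["apple", "orange", "mango"],
                      "quantity": [24, 34, 28]})

# Hypothetical second table to merge with.
prices = pd.DataFrame({"fruit": ["apple", "mango", "orange"],
                       "price": [0.5, 1.2, 0.8]})

# Merge the two tables on the shared "fruit" column (inner join by default).
merged = fruit.merge(prices, on="fruit")

# Filter: keep only the rows where quantity is greater than 25.
large = merged[merged["quantity"] > 25]
print(large)
```

The boolean expression inside the square brackets produces a mask, and indexing with that mask keeps only the matching rows.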
NumPy stands for Numerical Python. It is a Python library for working with large multidimensional arrays and matrices. Numerical operations on NumPy arrays are typically much faster than the same operations on Python lists, because the data are stored in contiguous, typed memory and processed by compiled code. NumPy's array class is called ndarray (n-dimensional array). Its features include mathematical and logical operations on arrays, Fourier transforms, routines for shape manipulation, and built-in functions for linear algebra and random number generation.
To import the NumPy library: import numpy
NumPy array creation: numpy.array(data, ndmin)
ndmin = minimum number of dimensions the resulting array should have
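To make the effect of ndmin concrete, here is a small sketch that compares the shape of an array created with and without it. The variable names are arbitrary.

```python
import numpy as np

a = np.array([1, 2, 3])            # one-dimensional, shape (3,)
b = np.array([1, 2, 3], ndmin=2)   # at least two dimensions, shape (1, 3)

print(a.shape, b.shape)
```

When the data has fewer dimensions than ndmin, NumPy prepends axes of length 1 to the shape.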
import numpy as np
arr = np.array([1, 2, 3, 4])
print(arr)
Output:
[1 2 3 4]
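The NumPy features listed earlier can be sketched on a small array as follows. The seed value and variable names are assumptions made for this illustration.

```python
import numpy as np

arr = np.array([1, 2, 3, 4])

doubled = arr * 2               # element-wise arithmetic
mask = arr > 2                  # element-wise logical comparison
grid = arr.reshape(2, 2)        # shape manipulation: 1-D to 2x2
det = np.linalg.det(grid)       # linear algebra: determinant of the 2x2 matrix
spectrum = np.fft.fft(arr)      # discrete Fourier transform

rng = np.random.default_rng(seed=0)   # random number generator with an arbitrary seed
sample = rng.random(3)                # three floats drawn from [0, 1)

print(doubled, mask, grid, det, spectrum, sample, sep="\n")
```

None of these operations needs an explicit Python loop, which is where most of the speed advantage over lists comes from.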
SciPy is an open-source Python library used for mathematics, scientific computing, and technical computing. It contains a variety of sub-packages that help solve the most common problems in scientific computation.
To import the SciPy library: import scipy
Some of the sub-packages of SciPy:
- File input/output – scipy.io
- Special functions – scipy.special
- Statistics and random numbers – scipy.stats
- Optimization and fit – scipy.optimize
- Image manipulation – scipy.ndimage
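A brief sketch of three of the sub-packages listed above is shown below. The function being minimized and the sample data are made up for illustration.

```python
from scipy import optimize, special, stats

# scipy.optimize: find the minimum of f(x) = (x - 3)^2, which lies at x = 3.
result = optimize.minimize_scalar(lambda x: (x - 3) ** 2)

# scipy.special: the gamma function, where gamma(5) equals 4! = 24.
g = special.gamma(5)

# scipy.stats: descriptive statistics for a small illustrative sample.
desc = stats.describe([1, 2, 3, 4, 5])

print(result.x, g, desc.mean, desc.variance)
```

Each sub-package is imported by name from scipy, so a program only loads the parts it needs.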
Scikit-learn, imported as sklearn, is a Python library with many efficient tools for machine learning and statistical modeling, including classification, regression, clustering, and dimensionality reduction. It features algorithms such as support vector machines, random forests, and k-nearest neighbors.
from sklearn import linear_model
reg = linear_model.LinearRegression()
reg.fit([[0, 0], [1, 1], [2, 2]], [0, 1, 2])
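Once a model is fitted, it can be inspected and used for prediction. The sketch below repeats the fit above and then predicts for a new input; the point [3, 3] is chosen only for illustration.

```python
from sklearn import linear_model

reg = linear_model.LinearRegression()
reg.fit([[0, 0], [1, 1], [2, 2]], [0, 1, 2])

print(reg.coef_)        # learned weight for each of the two features
print(reg.intercept_)   # learned bias term

# Predict the target for a new, unseen sample.
pred = reg.predict([[3, 3]])
print(pred)
```

Because the training targets follow the inputs exactly, the prediction for [3, 3] should be very close to 3.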
Statsmodels is a Python library for estimating and testing statistical models. Fitting a model in statsmodels typically involves three easy steps:
- Use the model class to describe the model
- Fit the model using a class method
- Inspect the results using a summary method.
Matplotlib is a comprehensive plotting library for creating static, animated, and interactive visualizations in Python. It comes with a wide variety of plots, which help to reveal trends and patterns and to find correlations. Matplotlib, together with packages such as SciPy (Scientific Python) and NumPy, is widely used as an alternative to MATLAB, a popular platform for technical computing.
import matplotlib.pyplot as plt
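A minimal line plot might look like the sketch below. The data, the output file name squares.png, and the choice of the non-interactive Agg backend are assumptions made so the script can run without a display; in an interactive session you would typically call plt.show() instead.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: render to a file rather than a window
import matplotlib.pyplot as plt

# Illustrative data: the squares of 0..4.
x = [0, 1, 2, 3, 4]
y = [0, 1, 4, 9, 16]

fig, ax = plt.subplots()
ax.plot(x, y, marker="o", label="y = x^2")
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.set_title("A simple line plot")
ax.legend()

fig.savefig("squares.png")  # hypothetical output file name
```

Working with the explicit fig and ax objects makes it easier to add more plots to the same figure later.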
Seaborn is a statistical visualization library built on top of Matplotlib. It offers a dataset-oriented API for examining relationships between multiple variables, specialized support for using categorical variables to show observations or aggregate statistics, and a high-level interface for drawing attractive and informative statistical graphics. distplot stands for distribution plot: it takes an array as input and plots a histogram with a curve corresponding to the distribution of the points in the array. Note that distplot has been deprecated since seaborn 0.11 in favor of histplot and displot.
import seaborn as sns
import matplotlib.pyplot as plt
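# distplot is deprecated since seaborn 0.11; histplot with kde=True draws a comparable histogram and density curve.
sns.histplot([0, 1, 2, 3, 4, 5], kde=True, stat="density")
plt.show()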
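Seaborn's support for categorical variables and aggregate statistics can be sketched with a bar plot. The DataFrame contents, the column names, and the output file name are made up for illustration, and the Agg backend is an assumption so the script can run without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: render to a file rather than a window
import pandas as pd
import seaborn as sns

# Hypothetical dataset: two quantity observations per fruit.
df = pd.DataFrame({
    "fruit": ["apple", "apple", "orange", "orange", "mango", "mango"],
    "quantity": [24, 26, 34, 30, 28, 32],
})

# One bar per category; by default the bar height is the mean of "quantity".
ax = sns.barplot(data=df, x="fruit", y="quantity")

ax.get_figure().savefig("fruit_means.png")  # hypothetical output file name
```

Passing the whole DataFrame and naming the columns is what makes the API dataset-oriented: seaborn handles the grouping and aggregation itself.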