A DataFrame is a two-dimensional labeled data structure with columns of potentially different types. You can think of it like a spreadsheet or SQL table, or a dictionary of Series objects. It is generally the most commonly used pandas object.
DataFrames can be created in various ways, but for this example, we’ll create a DataFrame from a dictionary of pandas Series.
import pandas as pd
data = {
'apples': pd.Series([3, 2, 0, 1]),
'oranges': pd.Series([0, 3, 7, 2])
}
df = pd.DataFrame(data)
df
Indexing in pandas means simply selecting particular rows and columns of data from a DataFrame. Indexes can be used to select specific rows and columns that you want to manipulate. They can also be used to modify the structure of the DataFrame itself, for example, by adding rows or columns.
Let’s explore some examples of how to work with DataFrame indexes.
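The original example cells for this section are not shown here, so the following is only a minimal sketch of common index operations. The row labels ('June', 'Robert', 'Lily', 'David', 'Anna') and the values in the added row are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'apples': [3, 2, 0, 1],
    'oranges': [0, 3, 7, 2],
})

# Select a single column (returns a Series)
apples = df['apples']

# Select a single row by its index label (returns a Series)
first_row = df.loc[0]

# Replace the default integer index with custom labels (hypothetical names)
df_named = df.copy()
df_named.index = ['June', 'Robert', 'Lily', 'David']

# Select a row by its new label
lily = df_named.loc['Lily']

# Add a row by assigning to a label that does not exist yet
df_named.loc['Anna'] = [4, 1]
```

Assigning to a new label with loc changes the structure of the DataFrame in place, which is why the sketch works on a copy.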
Pandas DataFrames can be created in various ways. Some of the most common methods are: from a list, from a dictionary, from a list of dictionaries, and from a NumPy array. Let’s explore examples of each.
# Creating a DataFrame from a list
list_data = [['Alex',10],['Bob',12],['Clarke',13]]
df_list = pd.DataFrame(list_data, columns=['Name','Age'])
df_list
# Creating a DataFrame from a dictionary
dict_data = {'Name':['Tom', 'Nick', 'John'], 'Age':[20, 21, 19]}
df_dict = pd.DataFrame(dict_data)
df_dict
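The remaining two creation methods mentioned above, a list of dictionaries and a NumPy array, can be sketched in the same way. The names and values below are illustrative only.

```python
import numpy as np
import pandas as pd

# Creating a DataFrame from a list of dictionaries:
# each dictionary becomes one row, and its keys become the column names
records = [
    {'Name': 'Ann', 'Age': 31},
    {'Name': 'Ben', 'Age': 27},
]
df_records = pd.DataFrame(records)

# Creating a DataFrame from a NumPy array:
# each inner array becomes one row; column names are passed explicitly
array_data = np.array([[1, 2, 3], [4, 5, 6]])
df_array = pd.DataFrame(array_data, columns=['A', 'B', 'C'])
```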
Pandas DataFrames offer a wide range of operations and methods that can be used to manipulate and analyze data. In this section, we’ll explore how to create new columns, how to create columns from other columns through operations, and how to combine DataFrames using the concatenate method.
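The cell that creates a new column is not shown here. A minimal sketch, assuming the fruit DataFrame from earlier and a 'bananas' column whose values are made up for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    'apples': [3, 2, 0, 1],
    'oranges': [0, 3, 7, 2],
})

# Creating a new column by assigning a list with one value per row
# (the banana counts are illustrative, not taken from the original notebook)
df['bananas'] = [1, 4, 5, 2]
```

The assigned list must have the same length as the DataFrame, otherwise pandas raises a ValueError.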
# Creating a column from other columns
# (assumes df already has a 'bananas' column, added in an earlier step)
df['total_fruits'] = df['apples'] + df['oranges'] + df['bananas']
df
# Creating another DataFrame to concatenate
df2 = pd.DataFrame({'apples': [5, 3], 'oranges': [2, 4], 'bananas': [7, 6]}, index=[4, 5])
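# Concatenating DataFrames row-wise; columns that exist in only one
# DataFrame (such as 'total_fruits') are filled with NaN in the other's rows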
df_concat = pd.concat([df, df2])
df_concat
ignore_index Parameter
When concatenating DataFrames, pandas provides an ignore_index parameter. If ignore_index is set to True, the resulting DataFrame will have a new integer index, ignoring the original indices of the concatenated DataFrames. Let’s see an example.
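The example cell is not shown here, so this is a small sketch with two illustrative DataFrames (df_a and df_b are hypothetical names) that both use the default integer index.

```python
import pandas as pd

df_a = pd.DataFrame({'apples': [3, 2], 'oranges': [0, 3]})
df_b = pd.DataFrame({'apples': [5, 3], 'oranges': [2, 4]})

# Default behaviour: the original indices are kept, so labels repeat
kept = pd.concat([df_a, df_b])

# With ignore_index=True the result gets a fresh 0..n-1 integer index
reset = pd.concat([df_a, df_b], ignore_index=True)

print(list(kept.index))   # [0, 1, 0, 1]
print(list(reset.index))  # [0, 1, 2, 3]
```

Duplicate labels like those in kept make label-based selection return several rows, which is one reason to reset the index when the original labels carry no meaning.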
Pandas provides several methods that are useful for quickly summarizing and gaining insights from your data. In this section, we’ll explore the value_counts, unique, nunique, and describe methods. Let’s first create a new DataFrame for these examples.
import numpy as np

# Creating a new DataFrame with random data
data = {
'A': np.random.randint(1, 10, 20),
'B': np.random.choice(['red', 'green', 'blue'], 20),
'C': np.random.normal(0, 1, 20),
'D': np.random.choice(['cat', 'dog', 'rabbit'], 20),
'E': np.random.randint(1, 100, 20)
}
df_explore = pd.DataFrame(data)
df_explore
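Because df_explore contains random values, its summaries differ from run to run. The sketch below uses a small fixed DataFrame (df_small, a hypothetical stand-in with two of the same column names) so that the results of each method are predictable.

```python
import pandas as pd

df_small = pd.DataFrame({
    'B': ['red', 'green', 'red', 'blue', 'red'],
    'E': [10, 20, 20, 40, 10],
})

# value_counts: how often each distinct value occurs, most frequent first
color_counts = df_small['B'].value_counts()

# unique: the distinct values, in order of first appearance
unique_colors = df_small['B'].unique()

# nunique: the number of distinct values
n_unique = df_small['E'].nunique()

# describe: count, mean, std, min, quartiles, and max for a numeric column
summary = df_small['E'].describe()

print(color_counts['red'])  # 3
print(n_unique)             # 3
print(summary['mean'])      # 20.0
```

The same calls work on df_explore, for example df_explore['B'].value_counts() or df_explore.describe() for all numeric columns at once.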
Pandas provides several tools for sorting and selecting data in a DataFrame. In this section, we’ll explore the sort_values method and the iloc and loc indexers. The sort_values method sorts a DataFrame by one or more columns, while iloc selects data by integer position and loc selects data by label.
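As a minimal sketch of these three tools, the example below uses a small DataFrame with string row labels (df_sort and its values are illustrative) so that position-based and label-based selection are easy to tell apart.

```python
import pandas as pd

df_sort = pd.DataFrame(
    {'A': [3, 1, 2], 'E': [30, 10, 20]},
    index=['x', 'y', 'z'],
)

# Sort by column 'A' in ascending order (the default)
by_a = df_sort.sort_values('A')

# Sort by column 'E' in descending order
by_e_desc = df_sort.sort_values('E', ascending=False)

# iloc selects by integer position: the first row of the sorted frame
first_by_position = by_a.iloc[0]

# loc selects by label: the row labelled 'x', wherever it ended up
row_by_label = by_a.loc['x']

# loc also accepts a row label and a column label together
value = df_sort.loc['z', 'E']

print(list(by_a.index))  # ['y', 'z', 'x']
print(value)             # 20
```

Sorting returns a new DataFrame and keeps the original labels attached to their rows, so loc finds the same row before and after sorting while iloc does not.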
Boolean indexing is a powerful tool that allows you to select data that meets certain conditions. This can be done using comparison operators (>, <, ==) and logical operators (& for ‘and’, | for ‘or’). Let’s see some examples.
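The example cells are not shown here, so the following sketch uses a small illustrative DataFrame (df_bool is a hypothetical name). Note that each condition is wrapped in parentheses when combined, because & and | bind more tightly than the comparison operators in Python.

```python
import pandas as pd

df_bool = pd.DataFrame({
    'A': [1, 5, 8, 3],
    'B': ['red', 'blue', 'red', 'green'],
})

# Single condition: rows where A is greater than 4
greater = df_bool[df_bool['A'] > 4]

# Two conditions combined with & (both must hold)
both = df_bool[(df_bool['A'] > 4) & (df_bool['B'] == 'red')]

# Two conditions combined with | (at least one must hold)
either = df_bool[(df_bool['A'] < 2) | (df_bool['B'] == 'green')]

print(list(greater.index))  # [1, 2]
print(list(both.index))     # [2]
print(list(either.index))   # [0, 3]
```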
Pandas provides several methods to export a DataFrame to different file formats. This can be very useful when you want to save your data for later use or to share it with others. In this section, we’ll explore how to export a DataFrame to CSV, Excel, and JSON formats.
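A minimal sketch of the three exports, assuming a small illustrative DataFrame and a temporary directory (the file names are made up). Writing Excel files requires an optional engine such as openpyxl, so that step is wrapped in a try block here.

```python
import os
import tempfile

import pandas as pd

df_out = pd.DataFrame({'Name': ['Ann', 'Ben'], 'Age': [31, 27]})

# Write into a temporary directory so the sketch leaves no files behind
out_dir = tempfile.mkdtemp()
csv_path = os.path.join(out_dir, 'people.csv')
json_path = os.path.join(out_dir, 'people.json')
xlsx_path = os.path.join(out_dir, 'people.xlsx')

# CSV: index=False leaves the row index out of the file
df_out.to_csv(csv_path, index=False)

# JSON: orient='records' writes a list with one object per row
df_out.to_json(json_path, orient='records')

# Excel: needs an optional writer engine (for example openpyxl)
try:
    df_out.to_excel(xlsx_path, index=False)
except ImportError:
    print('Skipping Excel export: no Excel writer engine is installed')

# Read the CSV back to confirm the data survived the round trip
round_trip = pd.read_csv(csv_path)
```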
Now that we have learned about pandas DataFrames, let’s put our knowledge into practice with some exercises. These exercises cover all the topics we have discussed in this notebook and vary in difficulty from easy to hard.
Create a DataFrame from a dictionary with keys ‘Name’, ‘Age’, and ‘City’. The ‘Name’ column should contain five different names, the ‘Age’ column should contain ages between 20 and 40, and the ‘City’ column should contain the names of five different cities.
Given the DataFrame created in Exercise 1, perform the following operations:
Create a DataFrame with 10 rows and 3 columns named ‘A’, ‘B’, and ‘C’. The ‘A’ column should contain random integers between 1 and 10, the ‘B’ column should contain random floats between 0 and 1, and the ‘C’ column should contain the string ‘random’ for all rows. Then, export this DataFrame to a CSV file named ‘random.csv’.
Given the DataFrame created in Exercise 3, perform the following operations:
Create a DataFrame with 100 rows and 5 columns named ‘A’, ‘B’, ‘C’, ‘D’, and ‘E’. All columns should contain random integers between 1 and 100. Then perform the following operations: