How to Count Distinct Values of a Pandas Dataframe Column?

We should perceive How to Count Distinct Values of a Pandas Dataframe Column? Consider a plain construction as given beneath which must be made as Dataframe. The sections are stature, weight and age. The records of 8 understudies structure the lines. Consideration nerd! Reinforce your establishments with the Python Programming Foundation Course and become familiar with the essentials. In any case, your meeting arrangements Enhance your Data Structures ideas with the Python DS Course. Also in any case your Machine Learning Journey, join the Machine Learning – Basic Level Course.

By and large, the information in every section addresses an alternate element of the dataframe. It very well might be ceaseless, unmitigated, or something entirely unexpected like particular texts. Assuming that you don’t know about the idea of the qualities you’re managing, it very well may be a decent exploratory advance to be aware of the count of unmistakable qualities. In this instructional exercise, we’ll see how to get the include of one of a kind qualities in every section of a pandas dataframe.

How to Count Distinct Values of a Pandas

Utilizing the pandas dataframe nunique() work with default boundaries gives a count of the relative multitude of unmistakable qualities in every section. ,Pandas – Count of Unique Values in Each Column,In the above model, the nunique() work returns a pandas Series with includes of unmistakable qualities in every section. Note that, for section D we just have two unmistakable qualities as the nunique() work, as a matter of course, overlooks all NaN values.,To count the one of a kind upsides of every segment of a dataframe, you can utilize the pandas dataframe nunique() work. Coming up next is the grammar:

To count the extraordinary upsides of every section of a dataframe, you can utilize the pandas dataframe nunique() work. Coming up next is the language structure:

counts = df.nunique()

height	weight	age
Steve	165	63.5	20
Ria	165	64	22
Nivi	164	63.5	22
Jane	158	54	21
Kate	167	63.5	23
Lucy	160	62	22
Ram	158	64	20
Niki	165	64	21

The first step is to create the Dataframe for the above tabulation. Look at the code snippet below.

Python3

# import library
import pandas as pd
 
# create a Dataframe
df = pd.DataFrame({ 
  'height' : [165, 165, 164, 
              158, 167, 160,
              158, 165],
   
  'weight' : [63.5, 64, 63.5,
              54, 63.5, 62,
              64, 64],
   
  'age' : [20, 22, 22, 
           21, 23, 22,
           20, 21]},
   
   index = ['Steve', 'Ria', 'Nivi', 
            'Jane', 'Kate', 'Lucy',
            'Ram', 'Niki'])
 
# show the Dataframe
df

Output:

Pandas Dataframe Column

Strategy 1: Using for circle.

The Dataframe has been made and one can hard coded utilizing for circle and count the quantity of interesting qualities in a particular segment. For instance In the above table, on the off chance that one wishes to count the quantity of extraordinary qualities in the section tallness. The thought is to utilize a variable cnt for putting away the count and a rundown visited that has the recently visited values. Then, at that point, for circle that emphasizes through the ‘tallness’ section and for each worth, it checks whether a similar worth has effectively been visited in the visited list. In case the worth was not visited beforehand, then, at that point, the count is increased by 1.

Below is the implementation:

Python3

# import library
import pandas as pd
 
# create a Dataframe
df = pd.DataFrame({ 
  'height' : [165, 165, 164, 
              158, 167, 160,
              158, 165],
   
  'weight' : [63.5, 64, 63.5,
              54, 63.5, 62,
              64, 64],
   
  'age' : [20, 22, 22, 
           21, 23, 22,
           20, 21]},
   
   index = ['Steve', 'Ria', 'Nivi', 
            'Jane', 'Kate', 'Lucy',
            'Ram', 'Niki'])
 
# variable to hold the count
cnt = 0
 
# list to hold visited values
visited = []
 
# loop for counting the unique
# values in height
for i in range(0, len(df['height'])):
   
    if df['height'][i] not in visited: 
       
        visited.append(df['height'][i])
         
        cnt += 1
 
print("No.of.unique values :",
      cnt)
 
print("unique values :",
      visited)

Output :

No.of.unique values : 5
unique values : [165, 164, 158, 167, 160]

But this method is not so efficient when the Dataframe grows in size and contains thousands of rows and columns. To give an efficiency there are three methods available which are listed below:

pandas.unique()
Dataframe.nunique()
Series.value_counts()

Method 2: Using unique().

The unique method takes a 1-D array or Series as an input and returns a list of unique items in it. The return value is a NumPy array and the contents in it are based on the input passed. If indices are supplied as input, then the return value will also be the indices of the unique value.

Syntax: pandas.unique(Series)

Example:

Python3

# import library
import pandas as pd
 
# create a Dataframe
df = pd.DataFrame({ 
  'height' : [165, 165, 164, 
              158, 167, 160,
              158, 165],
   
  'weight' : [63.5, 64, 63.5,
              54, 63.5, 62,
              64, 64],
   
  'age' : [20, 22, 22, 
           21, 23, 22,
           20, 21]},
   
   index = ['Steve', 'Ria', 'Nivi', 
            'Jane', 'Kate', 'Lucy',
            'Ram', 'Niki'])
 
# counting unique values
n = len(pd.unique(df['height']))
 
print("No.of.unique values :", 
      n)

Output:

No.of.unique values : 5

Method 3: Using Dataframe.nunique().

This method returns the count of unique values in the specified axis. The syntax is :

Syntax: Dataframe.nunique (axis=0/1, dropna=True/False)

Example:

Python3

# import library
import pandas as pd
 
# create a Dataframe
df = pd.DataFrame({ 
  'height' : [165, 165, 164, 
              158, 167, 160,
              158, 165],
   
  'weight' : [63.5, 64, 63.5,
              54, 63.5, 62,
              64, 64],
   
  'age' : [20, 22, 22, 
           21, 23, 22,
           20, 21]},
   
   index = ['Steve', 'Ria', 'Nivi', 
            'Jane', 'Kate', 'Lucy',
            'Ram', 'Niki'])
 
# check the values of 
# each row for each column
n = df.nunique(axis=0)
 
print("No.of.unique values in each column :\n",
      n)

Output:

No.of.unique values in each column :
height    5
weight    4
age       4
dtype: int64

To get the number of unique values in a specified column:

Syntax: Dataframe.col_name.nunique()

Example:

Python3

# import library
import pandas as pd
 
# create a Dataframe
df = pd.DataFrame({ 
  'height' : [165, 165, 164, 
              158, 167, 160,
              158, 165],
   
  'weight' : [63.5, 64, 63.5,
              54, 63.5, 62,
              64, 64],
   
  'age' : [20, 22, 22, 
           21, 23, 22,
           20, 21]},
   
   index = ['Steve', 'Ria', 'Nivi', 
            'Jane', 'Kate', 'Lucy',
            'Ram', 'Niki'])
 
# count no. of unique 
# values in height column
n = df.height.nunique()
 
print("No.of.unique values in height column :",
      n)

Output:

No.of.unique values in height column : 5

Method 3: Using Series.value_counts().

This method returns the count of all unique values in the specified column.

Syntax: Series.value_counts(normalize=False, sort=True, ascending=False, bins=None, dropna=True)

Example:

Python3

# import library
import pandas as pd
 
# create a Dataframe
df = pd.DataFrame({ 
  'height' : [165, 165, 164, 
              158, 167, 160,
              158, 165],
   
  'weight' : [63.5, 64, 63.5,
              54, 63.5, 62,
              64, 64],
   
  'age' : [20, 22, 22, 
           21, 23, 22,
           20, 21]},
   
   index = ['Steve', 'Ria', 'Nivi', 
            'Jane', 'Kate', 'Lucy',
            'Ram', 'Niki'])
 
 
# getting the list of unique values
li = list(df.height.value_counts())
 
# print the unique value counts
print("No.of.unique values :",
      len(li))

Output:

No.of.unique values : 5

Also Read: How to split a string in C/C++, Python and Java?

How to Count Distinct Values of a Pandas

Strategy 1: Using for circle.

Leave a Reply Cancel reply