16 Underrated Pandas Series Methods And When To Use Them

Elena Kosourova
Aug 31, 2021


In this article, we’re going to explore some lesser-known but very useful pandas methods for manipulating Series objects. Some of these methods apply only to Series; others work on both Series and DataFrames, though they behave somewhat differently for each structure.

1. is_unique


As its name suggests, this method checks whether all the values of a Series are unique:

import pandas as pd
print(pd.Series([1, 2, 3, 4]).is_unique)
print(pd.Series([1, 2, 3, 1]).is_unique)


Output:
True
False

2 & 3. is_monotonic and is_monotonic_decreasing


With these two methods, we can check whether the values of a Series are in ascending/descending order (is_monotonic is an alias for is_monotonic_increasing):

print(pd.Series([1, 2, 3, 8]).is_monotonic)
print(pd.Series([1, 2, 3, 1]).is_monotonic)
print(pd.Series([9, 8, 4, 0]).is_monotonic_decreasing)


Output:
True
False
True

Both methods also work for a Series with string values. In this case, Python uses lexicographical ordering under the hood, comparing two subsequent strings character by character. This is not quite the same as simple alphabetical ordering; in fact, the numeric example above is a particular case of such ordering. As the Python documentation says,

Lexicographical ordering for strings uses the Unicode code point number to order individual characters.

In practice, it mainly means that letter case and special symbols are also taken into account:

print(pd.Series(['fox', 'koala', 'panda']).is_monotonic)
print(pd.Series(['FOX', 'Fox', 'fox']).is_monotonic)
print(pd.Series(['*', '&', '_']).is_monotonic)


Output:
True
True
False
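The Unicode code points behind these comparisons can be inspected with Python's built-in ord(), which makes the results above less surprising:

```python
# Uppercase letters have smaller code points than lowercase ones,
# which is why ['FOX', 'Fox', 'fox'] is monotonic:
print(ord('F'), ord('f'))            # 70 102

# Among the special symbols, '&' (38) < '*' (42) < '_' (95),
# so ['*', '&', '_'] is not in ascending order:
print(ord('*'), ord('&'), ord('_'))  # 42 38 95
```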

A curious edge case occurs when all the values of a Series are identical: both methods return True:

print(pd.Series([1, 1, 1, 1, 1]).is_monotonic)
print(pd.Series(['fish', 'fish']).is_monotonic_decreasing)


Output:
True
True
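If this behavior is undesirable, a strict check can be assembled from two of the methods we have already seen. A small sketch (the helper name is_strictly_increasing is ours, not a pandas method):

```python
import pandas as pd

def is_strictly_increasing(s: pd.Series) -> bool:
    # Monotonic non-decreasing AND no duplicates => strictly increasing
    return s.is_monotonic_increasing and s.is_unique

print(is_strictly_increasing(pd.Series([1, 1, 2])))  # False
print(is_strictly_increasing(pd.Series([1, 2, 3])))  # True
```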

4. hasnans


This method checks whether a Series contains any NaN values:

import numpy as np
print(pd.Series([1, 2, 3, np.nan]).hasnans)
print(pd.Series([1, 2, 3, 10, 20]).hasnans)


Output:
True
False
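Under the hood, hasnans reports the same thing as the more verbose s.isna().any(); the attribute form simply reads more cleanly:

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, np.nan])
# Both report whether at least one value is missing
print(s.hasnans)              # True
print(bool(s.isna().any()))   # True
```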

5. empty


Sometimes, we might want to know whether a Series is completely empty, not even containing NaN values:

print(pd.Series(dtype='float64').empty)
print(pd.Series(np.nan).empty)


Output:
True
False

A Series can become empty after some manipulations, such as filtering:

s = pd.Series([1, 2, 3])
s[s > 3].empty


Output:
True
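Since aggregating an empty Series can yield NaN, the empty attribute is handy as a guard before further processing. A minimal sketch:

```python
import pandas as pd

s = pd.Series([1, 2, 3])
filtered = s[s > 3]

# Guard against an empty result before aggregating
result = None if filtered.empty else filtered.mean()
print(result)  # None
```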

6 & 7. first_valid_index() and last_valid_index()


These two methods return the index of the first/last non-NaN value and are particularly useful for Series objects with many NaNs:

print(pd.Series([np.nan, np.nan, 1, 2, 3, np.nan]).first_valid_index())
print(pd.Series([np.nan, np.nan, 1, 2, 3, np.nan]).last_valid_index())


Output:
2
4

If all the values of a Series are NaN, both methods return None:

print(pd.Series([np.nan, np.nan, np.nan]).first_valid_index())
print(pd.Series([np.nan, np.nan, np.nan]).last_valid_index())


Output:
None
None
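Because label-based .loc slicing is inclusive at both ends, the two indices pair nicely to trim leading and trailing NaNs in one step:

```python
import numpy as np
import pandas as pd

s = pd.Series([np.nan, np.nan, 1, 2, 3, np.nan])
# Slice from the first to the last non-NaN value (labels inclusive)
trimmed = s.loc[s.first_valid_index():s.last_valid_index()]
print(trimmed)
```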

8. truncate()


This method truncates a Series before and after some index value. Let’s truncate the Series from the previous section, leaving only the non-NaN values:

s = pd.Series([np.nan, np.nan, 1, 2, 3, np.nan])
s.truncate(before=2, after=4)


Output:
2    1.0
3    2.0
4    3.0
dtype: float64

Note that the original index of the Series is preserved. We may want to reset it and assign the truncated Series to a variable:

s_truncated = s.truncate(before=2, after=4).reset_index(drop=True)
print(s_truncated)


Output:
0    1.0
1    2.0
2    3.0
dtype: float64
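Keep in mind that truncate() only cuts at the edges; if the goal were simply to remove every NaN, including interior ones, dropna() would be the tool:

```python
import numpy as np
import pandas as pd

s = pd.Series([np.nan, 1, np.nan, 2, np.nan])
# dropna() removes interior NaNs as well, which truncate() cannot do
print(s.dropna().reset_index(drop=True))
```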

9. convert_dtypes()


As the pandas documentation says, this method is used to

Convert columns to best possible dtypes using dtypes supporting pd.NA.

Considering only Series objects (and not DataFrames), the main application of this method is converting nullable integers (i.e., floats with a zero decimal part, such as 1.0 or 2.0) back to “normal” integers. Such floats appear when the original Series contains both integers and NaN values: since NaN is a float in numpy and pandas, any Series with missing values becomes float as a whole.

Let’s take a look at the example from the previous section to see how it works:

print(pd.Series([np.nan, np.nan, 1, 2, 3, np.nan]))
print('\n')
print(pd.Series([np.nan, np.nan, 1, 2, 3, np.nan]).convert_dtypes())


Output:
0    NaN
1    NaN
2    1.0
3    2.0
4    3.0
5    NaN
dtype: float64
0    <NA>
1    <NA>
2       1
3       2
4       3
5    <NA>
dtype: Int64
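A nice property of the nullable Int64 dtype is that pd.NA propagates through arithmetic while the dtype stays integer. A quick check:

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, np.nan]).convert_dtypes()
# Arithmetic keeps the nullable integer dtype; NA propagates
print((s + 1).dtype)   # Int64
print(s + 1)           # 2, 3, <NA>
```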

10. clip()


We can clip all the values of a Series at input thresholds (the lower and upper parameters):

s = pd.Series(range(1, 11))
print(s)
s_clipped = s.clip(lower=2, upper=7)
print(s_clipped)


Output:
0     1
1     2
2     3
3     4
4     5
5     6
6     7
7     8
8     9
9    10
dtype: int64
0    2
1    2
2    3
3    4
4    5
5    6
6    7
7    7
8    7
9    7
dtype: int64
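Either bound can be omitted, so clip() also works one-sided, for example to replace all negative values with zero:

```python
import pandas as pd

s = pd.Series([-5, -1, 0, 3, 8])
# Only a lower bound: negatives become 0, the rest pass through
print(s.clip(lower=0).tolist())  # [0, 0, 0, 3, 8]
```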

11. rename_axis()


In the case of a Series object, this method sets the name of the index:

s = pd.Series({'flour': '300 g', 'butter': '150 g', 'sugar': '100 g'})
print(s)
s = s.rename_axis('ingredients')
print(s)


Output:
flour     300 g
butter    150 g
sugar     100 g
dtype: object
ingredients
flour     300 g
butter    150 g
sugar     100 g
dtype: object
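The index name pays off when the Series is later converted to a DataFrame: reset_index() turns the named index into a properly labeled column (here we also name the values column 'amount' via rename() for illustration):

```python
import pandas as pd

s = pd.Series({'flour': '300 g', 'butter': '150 g', 'sugar': '100 g'})
# The index name becomes the first column's header
df = s.rename_axis('ingredients').rename('amount').reset_index()
print(df.columns.tolist())  # ['ingredients', 'amount']
```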

12 & 13. nsmallest() and nlargest()


These two methods return the smallest/largest elements of a Series. By default, they return 5 values: in ascending order for nsmallest() and in descending order for nlargest().

s = pd.Series([3, 2, 1, 100, 200, 300, 4, 5, 6])
s.nsmallest()


Output:
2    1
1    2
0    3
6    4
7    5
dtype: int64

It’s possible to specify a different number of smallest/largest values to return. Also, we may want to reset the index and assign the result to a variable:

largest_3 = s.nlargest(3).reset_index(drop=True)
print(largest_3)


Output:
0    300
1    200
2    100
dtype: int64
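Both methods also accept a keep parameter that controls how ties are handled; keep='all' returns every tied element, even if that exceeds the requested count:

```python
import pandas as pd

s = pd.Series([1, 1, 2, 3])
# n=1 was requested, but both tied 1s are returned
print(s.nsmallest(1, keep='all'))
```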

14. pct_change()


For a Series object, we can calculate the percentage change (or, more precisely, the fractional change) between each element and the prior one. This can be helpful, for example, when working with time series or when creating a waterfall chart in percent or fractions.

s = pd.Series([20, 33, 14, 97, 19])
s.pct_change()


Output:
0         NaN
1    0.650000
2   -0.575758
3    5.928571
4   -0.804124
dtype: float64

To make the resulting Series more readable, let’s round it:

s.pct_change().round(2)


Output:
0     NaN
1    0.65
2   -0.58
3    5.93
4   -0.80
dtype: float64
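A periods parameter shifts the comparison point; for example, periods=2 compares each element with the one two positions earlier:

```python
import pandas as pd

s = pd.Series([20, 33, 14, 97, 19])
# Each value is compared with the value 2 rows back,
# so the first two results are NaN
print(s.pct_change(periods=2).round(2))
```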

15. explode()


This method transforms each list-like element of a Series (lists, tuples, sets, Series, ndarrays) into a row. Empty list-likes are transformed into a row with NaN. To avoid repeated indices in the resulting Series, it’s better to reset the index:

s = pd.Series([[np.nan], {1, 2}, 3, (4, 5)])
print(s)
s_exploded = s.explode().reset_index(drop=True)
print(s_exploded)


Output:
0     [nan]
1    {1, 2}
2         3
3    (4, 5)
dtype: object
0    NaN
1      1
2      2
3      3
4      4
5      5
dtype: object

16. repeat()


This method consecutively repeats each element of a Series a specified number of times. Here, too, it makes sense to reset the index:

s = pd.Series([1, 2, 3])
print(s)
s_repeated = s.repeat(2).reset_index(drop=True)
print(s_repeated)


Output:
0    1
1    2
2    3
dtype: int64
0    1
1    1
2    2
3    2
4    3
5    3
dtype: int64
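repeat() also accepts an array of counts, one per element, so each value can be repeated a different number of times:

```python
import pandas as pd

s = pd.Series([1, 2, 3])
# One repetition count per element: 1x, 2x, 3x
print(s.repeat([1, 2, 3]).reset_index(drop=True).tolist())  # [1, 2, 2, 3, 3, 3]
```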

If the number of repetitions is set to 0, an empty Series is returned:

s.repeat(0)


Output:
Series([], dtype: int64)

Conclusion


To sum up, we investigated 16 rarely used pandas methods for working with Series and some of their applications. If you know other interesting ways to manipulate pandas Series, you’re very welcome to share them in the comments.
