In this article, we’re going to explore some lesser-known but very useful pandas methods for manipulating Series objects. Some of them apply only to Series; others apply to both Series and DataFrames but behave in specific ways depending on the structure type.

# 1. is_unique

As its name suggests, this property checks whether all the values of a Series are unique:

    import pandas as pd

    print(pd.Series([1, 2, 3, 4]).is_unique)
    print(pd.Series([1, 2, 3, 1]).is_unique)

Output:

    True
    False
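One place where this check tends to be handy is validating a would-be identifier column before using it as an index. Below is a minimal sketch, assuming a hypothetical `order_ids` Series with made-up values:

```python
import pandas as pd

# Hypothetical identifier values; the last one repeats the first.
order_ids = pd.Series(["A-1", "A-2", "A-3", "A-1"])

ids_are_unique = order_ids.is_unique

# duplicated(keep=False) marks every occurrence of a repeated value,
# which helps to locate the offending rows.
repeated = order_ids[order_ids.duplicated(keep=False)]

print(ids_are_unique)
print(repeated.index.tolist())
```

If the check fails, the second line shows which positions hold the repeated values.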

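# 2 & 3. is_monotonic_increasing and is_monotonic_decreasing

With these 2 properties, we can check whether the values of a Series are in ascending/descending order. Older pandas versions also offered is_monotonic as an alias for is_monotonic_increasing; it was deprecated in pandas 1.5 and removed in pandas 2.0, so the explicit name is used here:

    print(pd.Series([1, 2, 3, 8]).is_monotonic_increasing)
    print(pd.Series([1, 2, 3, 1]).is_monotonic_increasing)
    print(pd.Series([9, 8, 4, 0]).is_monotonic_decreasing)

Output:

    True
    False
    True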

Both checks also work for a Series with string values. In this case, Python compares the strings lexicographically under the hood, going through each pair of neighboring strings character by character. This is not the same as plain alphabetical ordering. As the Python documentation says,

Lexicographical ordering for strings uses the Unicode code point number to order individual characters.

In practice, this mainly means that letter case and special symbols are also taken into account:

    print(pd.Series(['fox', 'koala', 'panda']).is_monotonic_increasing)
    print(pd.Series(['FOX', 'Fox', 'fox']).is_monotonic_increasing)
    print(pd.Series(['*', '&', '_']).is_monotonic_increasing)

Output:

    True
    True
    False

A curious edge case arises when all the values of a Series are the same. Both checks are non-strict (equal neighboring values are allowed), so in this case both return True:

    print(pd.Series([1, 1, 1, 1, 1]).is_monotonic_increasing)
    print(pd.Series(['fish', 'fish']).is_monotonic_decreasing)

Output:

    True
    True
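Two practical consequences of the points above can be sketched as follows. Since the ascending check (`is_monotonic_increasing`) tolerates ties, one possible way to require strictly increasing values is to combine it with `is_unique`. And since string comparison is case-sensitive, lowercasing first gives a case-insensitive check. The variable names and sample data are made up for illustration:

```python
import pandas as pd

# Non-strict check: repeated neighbours still count as "increasing".
with_ties = pd.Series([1, 1, 2])
non_strict = with_ties.is_monotonic_increasing
# One way to require strictly increasing values: also demand uniqueness.
strict = with_ties.is_monotonic_increasing and with_ties.is_unique

# Case-sensitive vs. case-insensitive ordering of strings:
# uppercase letters have smaller code points than lowercase ones.
animals = pd.Series(["ant", "Bee", "cat"])
case_sensitive = animals.is_monotonic_increasing
case_insensitive = animals.str.lower().is_monotonic_increasing

print(non_strict, strict)
print(case_sensitive, case_insensitive)
```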

# 4. hasnans

This property checks whether a Series contains NaN values:

    import numpy as np

    print(pd.Series([1, 2, 3, np.nan]).hasnans)
    print(pd.Series([1, 2, 3, 10, 20]).hasnans)

Output:

    True
    False
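For comparison, here is a short sketch of how `hasnans` relates to the more explicit `isna()` spelling, which can also count the missing values. The `readings` Series is a made-up example:

```python
import numpy as np
import pandas as pd

readings = pd.Series([1.0, np.nan, 3.0, np.nan])

has_missing = readings.hasnans          # quick yes/no answer
same_answer = readings.isna().any()     # more explicit spelling of the same question
missing_count = readings.isna().sum()   # how many values are missing

print(has_missing, same_answer, missing_count)
```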

# 5. empty

Sometimes, we might want to know if a Series is completely empty, not containing even NaN values:

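    # An explicit dtype avoids the warning that some pandas 1.x versions
    # emit for an empty Series created without one.
    print(pd.Series(dtype='float64').empty)
    print(pd.Series(np.nan).empty)

Output:

    True
    False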

A Series can become empty after some manipulations with it, for example, filtering:

    s = pd.Series([1, 2, 3])
    s[s > 3].empty

Output:

    True
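A check like this can serve as a guard before further computations on a filtered Series. The helper below, `max_above`, is a hypothetical function written for illustration only, not part of pandas:

```python
import pandas as pd

def max_above(values, threshold):
    """Return the largest value above `threshold`, or None if there is none."""
    selected = values[values > threshold]
    if selected.empty:
        # Nothing passed the filter, so report that explicitly.
        return None
    return selected.max()

s = pd.Series([1, 2, 3])
print(max_above(s, 1))
print(max_above(s, 3))
```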

# 6 & 7. first_valid_index() and last_valid_index()

These 2 methods return the index label of the first/last non-NaN value and are particularly useful for Series objects with many NaNs:

    print(pd.Series([np.nan, np.nan, 1, 2, 3, np.nan]).first_valid_index())
    print(pd.Series([np.nan, np.nan, 1, 2, 3, np.nan]).last_valid_index())

Output:

    2
    4

If all the values of a Series are NaN, both methods return None:

    print(pd.Series([np.nan, np.nan, np.nan]).first_valid_index())
    print(pd.Series([np.nan, np.nan, np.nan]).last_valid_index())

Output:

    None
    None
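With the default integer index, the returned values look like positions, but they are in fact index labels. A small sketch with a made-up `prices` Series and a custom string index illustrates this:

```python
import numpy as np
import pandas as pd

# Hypothetical data with a non-default (string) index.
prices = pd.Series([np.nan, 1.5, 2.0, np.nan],
                   index=["mon", "tue", "wed", "thu"])

first_label = prices.first_valid_index()
last_label = prices.last_valid_index()

print(first_label, last_label)
```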

# 8. truncate()

This method allows truncating a Series before and after some index value. Let’s truncate the Series from the previous section leaving only non-NaN values:

    s = pd.Series([np.nan, np.nan, 1, 2, 3, np.nan])
    s.truncate(before=2, after=4)

Output:

    2    1.0
    3    2.0
    4    3.0
    dtype: float64

The original index of the Series was preserved. We may want to reset it and also to assign the truncated Series to a variable:

    s_truncated = s.truncate(before=2, after=4).reset_index(drop=True)
    print(s_truncated)

Output:

    0    1.0
    1    2.0
    2    3.0
    dtype: float64
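Instead of hard-coding the bounds, we could take them from first_valid_index() and last_valid_index(). This is only a sketch of one possible combination; note that for an all-NaN Series both bounds would be None, and nothing would be cut off:

```python
import numpy as np
import pandas as pd

s = pd.Series([np.nan, np.nan, 1, 2, 3, np.nan])

# Use the first/last non-NaN labels as truncation bounds.
trimmed = s.truncate(before=s.first_valid_index(),
                     after=s.last_valid_index())

print(trimmed)
```

Any NaN values located between the two bounds would be kept, since only the leading and trailing parts are removed.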

# 9. convert_dtypes()

As the pandas documentation says, this method is used to

Convert columns to best possible dtypes using dtypes supporting pd.NA.

If we consider only Series objects and not DataFrames, a typical application of this method is to convert whole-number floats (i.e. float numbers with a decimal part equal to 0, such as 1.0, 2.0, etc.) to the nullable integer dtype Int64. Such float numbers appear when the original Series contains both integers and NaN values: since NaN is a float in numpy and pandas, any Series of integers with missing values becomes a float Series as well. The method also converts other dtypes where possible, for example an object Series of strings or booleans.

Let’s take a look at the example from the previous section to see how it works:

    print(pd.Series([np.nan, np.nan, 1, 2, 3, np.nan]))
    print('\n')
    print(pd.Series([np.nan, np.nan, 1, 2, 3, np.nan]).convert_dtypes())

Output:

    0    NaN
    1    NaN
    2    1.0
    3    2.0
    4    3.0
    5    NaN
    dtype: float64


    0    <NA>
    1    <NA>
    2       1
    3       2
    4       3
    5    <NA>
    dtype: Int64
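The conversion is not limited to integers. The sketch below compares the dtypes before and after for a numeric Series and for a Series of booleans with a missing value; the variable names and data are made up for illustration:

```python
import numpy as np
import pandas as pd

counts = pd.Series([1, 2, np.nan])        # float64 because of the NaN
flags = pd.Series([True, False, None])    # object because of the None

counts_converted = counts.convert_dtypes()
flags_converted = flags.convert_dtypes()

print(counts.dtype, "->", counts_converted.dtype)
print(flags.dtype, "->", flags_converted.dtype)
```

In both cases the missing entries are kept and represented as pd.NA.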

# 10. clip()

We can clip all the values of a Series at input thresholds (lower and upper parameters):

    s = pd.Series(range(1, 11))
    print(s)
    s_clipped = s.clip(lower=2, upper=7)
    print(s_clipped)

Output:

    0     1
    1     2
    2     3
    3     4
    4     5
    5     6
    6     7
    7     8
    8     9
    9    10
    dtype: int64
    0    2
    1    2
    2    3
    3    4
    4    5
    5    6
    6    7
    7    7
    8    7
    9    7
    dtype: int64
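Both thresholds are optional, so we can also clip from one side only. A short sketch with made-up `measurements`:

```python
import pandas as pd

measurements = pd.Series([-5, 3, 12])

# Only a lower bound: negative values are raised to 0.
non_negative = measurements.clip(lower=0)
# Only an upper bound: values above 10 are lowered to 10.
capped = measurements.clip(upper=10)

print(non_negative.tolist())
print(capped.tolist())
```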

# 11. rename_axis()

In the case of a Series object, this method sets the name of the index:

    s = pd.Series({'flour': '300 g', 'butter': '150 g', 'sugar': '100 g'})
    print(s)
    s = s.rename_axis('ingredients')
    print(s)

Output:

    flour     300 g
    butter    150 g
    sugar     100 g
    dtype: object
    ingredients
    flour     300 g
    butter    150 g
    sugar     100 g
    dtype: object
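One situation where the index name matters is converting the Series to a DataFrame: after reset_index(), the name becomes a column header. A minimal sketch, where the column name `amount` is chosen here for illustration:

```python
import pandas as pd

s = pd.Series({"flour": "300 g", "butter": "150 g", "sugar": "100 g"})

# The index name becomes the first column; `name` labels the values column.
recipe = s.rename_axis("ingredients").reset_index(name="amount")

print(recipe)
```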

# 12 & 13. nsmallest() and nlargest()

These 2 methods return the smallest/largest elements of a Series. By default, they return 5 values, in ascending order for nsmallest() and in descending order for nlargest().

    s = pd.Series([3, 2, 1, 100, 200, 300, 4, 5, 6])
    s.nsmallest()

Output:

    2    1
    1    2
    0    3
    6    4
    7    5
    dtype: int64

It’s possible to specify another number of the smallest/largest values to be returned. Also, we may want to reset the index and assign the result to a variable:

    largest_3 = s.nlargest(3).reset_index(drop=True)
    print(largest_3)

Output:

    0    300
    1    200
    2    100
    dtype: int64
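When the Series contains ties, the keep parameter controls which of the equal values are returned. The sketch below uses a made-up `scores` Series to compare the default behavior with keep="all":

```python
import pandas as pd

scores = pd.Series([10, 20, 20, 30])

# Default (keep="first"): exactly 2 values, the first of the tied 20s wins.
top_2 = scores.nlargest(2)
# keep="all": tied values are all kept, so more than 2 rows may come back.
top_2_with_ties = scores.nlargest(2, keep="all")

print(top_2)
print(top_2_with_ties)
```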

# 14. pct_change()

For a Series object, we can calculate percentage change (or, more precisely, fraction change) between the current and a prior element. This approach can be helpful, for example, when working with time series, or for creating a waterfall chart in % or fractions.

    s = pd.Series([20, 33, 14, 97, 19])
    s.pct_change()

Output:

    0         NaN
    1    0.650000
    2   -0.575758
    3    5.928571
    4   -0.804124
    dtype: float64

To make the resulting Series more readable, let’s round it:

    s.pct_change().round(2)

Output:

    0     NaN
    1    0.65
    2   -0.58
    3    5.93
    4   -0.80
    dtype: float64
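To see what is being computed, we can reproduce the result by hand with shift(): each value is compared with the previous one and the difference is divided by that previous value. This is a sketch of the equivalent arithmetic, not necessarily how pandas implements it internally:

```python
import pandas as pd

s = pd.Series([20, 33, 14, 97, 19])

previous = s.shift(1)               # each element's predecessor (NaN for the first)
manual = (s - previous) / previous  # fraction change computed by hand
builtin = s.pct_change()

# Multiply by 100 to get actual percentages.
as_percent = (builtin * 100).round(1)

print(manual.round(2).tolist())
print(as_percent.tolist())
```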

# 15. explode()

This method transforms each list-like element of a Series (lists, tuples, sets, Series, ndarrays) into a separate row. An empty list-like is transformed into a single row with NaN. To avoid repeated indices in the resulting Series, it’s better to reset the index:

    s = pd.Series([[np.nan], {1, 2}, 3, (4, 5)])
    print(s)
    s_exploded = s.explode().reset_index(drop=True)
    print(s_exploded)

Output:

    0     [nan]
    1    {1, 2}
    2         3
    3    (4, 5)
    dtype: object
    0    NaN
    1      1
    2      2
    3      3
    4      4
    5      5
    dtype: object
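The empty list-like case can be sketched as follows. The example also uses the ignore_index parameter as an alternative to reset_index(drop=True), assuming pandas 1.1 or newer; the `baskets` data is made up:

```python
import pandas as pd

baskets = pd.Series([["apple", "pear"], [], ["plum"]])

# ignore_index=True relabels the result 0..n-1 in one step.
items = baskets.explode(ignore_index=True)

print(items)
```

The empty basket does not disappear: it produces one row holding NaN.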

# 16. repeat()

This method repeats each element of a Series consecutively a given number of times. In this case too, it makes sense to reset the index:

    s = pd.Series([1, 2, 3])
    print(s)
    s_repeated = s.repeat(2).reset_index(drop=True)
    print(s_repeated)

Output:

    0    1
    1    2
    2    3
    dtype: int64
    0    1
    1    1
    2    2
    3    2
    4    3
    5    3
    dtype: int64

If the number of repetitions is set to 0, an empty Series will be returned:

    s.repeat(0)

Output:

    Series([], dtype: int64)
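The number of repetitions does not have to be the same for every element: repeat() also accepts a list of counts, one per element. A brief sketch with made-up letters:

```python
import pandas as pd

letters = pd.Series(["a", "b", "c"])

# One count per element: "a" once, "b" twice, "c" three times.
repeated = letters.repeat([1, 2, 3]).reset_index(drop=True)

print(repeated.tolist())
```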

# Conclusion

To sum up, we investigated 16 rarely used pandas methods for working with Series and some of their application cases. If you know some other interesting ways to manipulate pandas Series, you’re very welcome to share them in the comments.

“16 Underrated Pandas Series Methods And When To Use Them” – Elena Kosourova