Pandas provides tools for managing data in general. On this occasion we will talk about how to manage time data with Pandas, guided by the following items.
- The basic time concepts that Pandas works with. These can be grouped into 4 general concepts: Date times, Time deltas, Time spans and Date offsets.
- We will go through basic examples of the 4 concepts mentioned above.
- We will apply these concepts to Pandas Series and DataFrames.
It is important to have an idea of what each concept is about. We can summarize each of them as follows:
Date times: These combine a date and a time. A single value is of class Timestamp, while an array of Datetimes is of class DatetimeIndex.
Time deltas: These store an absolute duration of time, which can be used in arithmetic with Datetimes. A single value is of class Timedelta, while an array is of class TimedeltaIndex.
Time spans: These are associated with the periods and frequencies that we can define in Pandas. A single value is of class Period, while an array of values is of class PeriodIndex.
Date offsets: These are similar to Time deltas in that they store an amount of time that can be used in arithmetic operations, but they follow calendar rules (for example, adding one month rather than a fixed number of days). The class is DateOffset.
As we can see, several of these types use an Index class when their values are grouped in an array. This is very useful when indexing a Pandas Series or DataFrame.
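The scalar and array classes described above can be checked directly. The snippet below is a minimal sketch; the dates and durations are illustrative values chosen for this example, not taken from the original article.

```python
import pandas as pd

# Illustrative values only; any valid date or duration would do.
timestamp = pd.Timestamp("2020-07-10 08:30")               # single Datetime
datetimes = pd.to_datetime(["2020-07-10", "2020-07-11"])   # array of Datetimes
delta = pd.Timedelta(days=1)                               # single Time delta
deltas = pd.to_timedelta(["1 days", "2 days"])             # array of Time deltas
period = pd.Period("2020-07", freq="M")                    # single Time span
periods = pd.period_range("2020-07", periods=2, freq="M")  # array of Time spans
offset = pd.DateOffset(months=1)                           # Date offset

# Print the class name of each object.
for value in (timestamp, datetimes, delta, deltas, period, periods, offset):
    print(type(value).__name__)
```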
The concepts previously mentioned can be summarized in the following table.
As a first step we'll create variables with the to_datetime and date_range functions. Then, we'll look at what class they are and how we can apply them to Series and DataFrames.
Let's import Pandas and Numpy:
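A minimal sketch of the imports, assuming the conventional aliases pd and np:

```python
import numpy as np
import pandas as pd

# Printing the versions helps when comparing results across installations.
print(pd.__version__)
print(np.__version__)
```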
We create our date variable with the to_datetime function. It's important to know that this function accepts a single date, an array of dates, and other types of arguments.
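As a sketch of this step, the date string below is an assumed example (intended as 10 July 2020, written day/month/year), not necessarily the one used originally:

```python
import pandas as pd

# Illustrative string, meant as day/month/year: 10 July 2020.
date = pd.to_datetime("10/07/2020")

print(type(date).__name__)  # Timestamp
print(date)                 # parsed month-first by default: 2020-10-07 00:00:00
```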
As we can see, the date is of type Timestamp. However, the date is not shown in the order we passed it: the month, 07, is shown as the day. To specify the order of the components, we can use the format argument.
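A minimal sketch of the format argument, reusing the same assumed day/month/year string:

```python
import pandas as pd

# %d = day, %m = month, %Y = four-digit year.
date = pd.to_datetime("10/07/2020", format="%d/%m/%Y")

print(date)  # 2020-07-10 00:00:00
```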
We can also pass special string arguments like today.
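For instance, a sketch assuming the string "today" is passed to to_datetime (the printed value depends on when the code runs):

```python
import pandas as pd

# "today" is resolved to the current local date and time.
today = pd.to_datetime("today")

print(type(today).__name__)  # Timestamp
print(today)
```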
We can also pass an array of Datetimes as the argument.
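A short sketch with an assumed list of date strings:

```python
import pandas as pd

# A list of illustrative ISO-formatted dates.
dates = pd.to_datetime(["2020-07-10", "2020-07-11", "2020-07-12"])

print(type(dates).__name__)  # DatetimeIndex
print(dates)
```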
In the previous output, we can see that the class is now DatetimeIndex because we passed an array as the argument. The same class is also produced by the date_range function.
This function generates an array of Datetimes. It takes the number of periods as an argument and uses a daily frequency by default. The frequency can be changed to intervals of months, days, hours, seconds, microseconds, and so on.
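The behaviour described above might look like the sketch below. The start date and number of periods are assumptions. The frequency is passed as an offset object because the accepted string aliases differ between Pandas versions.

```python
import pandas as pd

# Default frequency is one day.
daily = pd.date_range(start="2020-07-01", periods=5)
print(daily.freqstr)  # D

# Same number of periods, but one value every 6 hours.
every_6h = pd.date_range(start="2020-07-01", periods=5, freq=pd.offsets.Hour(6))
print(every_6h)
```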
Additionally, we can change the timezone.
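One possible way to do this is with the tz argument of date_range, followed by tz_convert. The timezones here are chosen only for illustration:

```python
import pandas as pd

# Create timezone-aware dates in UTC.
utc_dates = pd.date_range(start="2020-07-01", periods=3, tz="UTC")

# Convert the same instants to another timezone (UTC+9).
tokyo_dates = utc_dates.tz_convert("Asia/Tokyo")

print(utc_dates)
print(tokyo_dates)
```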
Now let's place this array in a Pandas Series.
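A minimal sketch, assuming the dates are used as the index of the Series and the values are simple placeholders:

```python
import numpy as np
import pandas as pd

dates = pd.date_range(start="2020-07-01", periods=6)

# Placeholder values 0..5, indexed by date.
series = pd.Series(np.arange(6), index=dates)

# A DatetimeIndex allows slicing with date strings (both ends included).
subset = series.loc["2020-07-02":"2020-07-04"]
print(subset)
```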
If we want to change the frequency and compute the average at the same time, we can do it as follows.
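One way to do this is with resample followed by mean. This is a sketch with placeholder data and an assumed two-day target frequency:

```python
import numpy as np
import pandas as pd

dates = pd.date_range(start="2020-07-01", periods=6)
series = pd.Series(np.arange(6, dtype=float), index=dates)

# Group the daily values into 2-day bins and average each bin.
two_day_mean = series.resample("2D").mean()
print(two_day_mean)
```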
We can apply the same approach to a DataFrame as follows.
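A sketch of the DataFrame case; the column names a and b are hypothetical and the values are placeholders:

```python
import numpy as np
import pandas as pd

dates = pd.date_range(start="2020-07-01", periods=6)

# Hypothetical columns "a" and "b" with placeholder values.
frame = pd.DataFrame({"a": np.arange(6.0), "b": np.arange(6.0) * 10}, index=dates)

# Resampling works column by column.
two_day_mean = frame.resample("2D").mean()
print(two_day_mean)
```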
Suppose we want to add a certain amount of time to the Timestamp stored in date.
For this we will use Timedelta as follows.
We can use days, and even minutes and seconds, as arguments.
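The Timedelta arithmetic described above can be sketched as follows, with an assumed starting date:

```python
import pandas as pd

date = pd.Timestamp("2020-07-10")

# Add exactly one day.
one_day_later = date + pd.Timedelta(days=1)
print(one_day_later)  # 2020-07-11 00:00:00

# Several units can be combined in one Timedelta.
shifted = date + pd.Timedelta(days=1, minutes=30, seconds=15)
print(shifted)  # 2020-07-11 00:30:15
```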
Another way to add an amount of time is with DateOffset, as mentioned above. This time we will create a Series of DateOffsets. Then we'll add it to a Datetime array and see the result of this operation.
The first DateOffset object has an increment of 1 and the second an increment of 2. By default, this value is interpreted as a number of days.
As a result, one day is added to the first element of the Datetime array and two days to the second. Similarly, we can do this for months, days, hours, minutes and seconds.
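The DateOffset steps above can be sketched as below. The dates are assumed, and the addition is written element by element to keep each step explicit; this may differ from how the original example combined the two objects.

```python
import pandas as pd

# Offsets of 1 and 2; with no unit given, each counts days.
offsets = pd.Series([pd.DateOffset(1), pd.DateOffset(2)])
dates = pd.to_datetime(["2020-07-10", "2020-07-10"])

# Pair each date with its offset and add them.
shifted = pd.DatetimeIndex([d + o for d, o in zip(dates, offsets)])
print(shifted)

# Offsets follow the calendar: one month after 31 January 2020
# is the last day of February (2020 is a leap year).
end_of_january = pd.Timestamp("2020-01-31")
print(end_of_january + pd.DateOffset(months=1))  # 2020-02-29 00:00:00
```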
Next we'll talk about the period_range function, which can be used to create a PeriodIndex by passing in the number of periods and the frequency as arguments.
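A minimal sketch of period_range, assuming a monthly frequency and an illustrative start:

```python
import pandas as pd

# Four consecutive monthly periods starting in January 2020.
periods = pd.period_range(start="2020-01", periods=4, freq="M")

print(type(periods).__name__)  # PeriodIndex
print(periods)
```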
We can also create a Series from this PeriodIndex.
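For example, using the PeriodIndex as the index of a Series with placeholder values:

```python
import numpy as np
import pandas as pd

periods = pd.period_range(start="2020-01", periods=4, freq="M")

# Placeholder values 0..3, one per month.
series = pd.Series(np.arange(4), index=periods)
print(series)

# Values can be looked up by Period.
march_value = series.loc[pd.Period("2020-03", freq="M")]
print(march_value)  # 2
```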
We can also build our own PeriodIndex directly.
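One possible way is to pass a list of values to the PeriodIndex constructor. The months below are illustrative and do not need to be consecutive:

```python
import pandas as pd

# Hand-picked months with a monthly frequency.
periods = pd.PeriodIndex(["2020-01", "2020-04", "2020-07"], freq="M")

print(periods)
```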
Finally, if we only want a single value represented as a Period, it can be obtained in the following way.
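A sketch of a single Period, again with an assumed month:

```python
import pandas as pd

# A Period represents the whole span of July 2020.
period = pd.Period("2020-07", freq="M")

print(period.start_time)  # 2020-07-01 00:00:00
print(period + 1)         # 2020-08
```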
In conclusion, there are several classes for managing time, such as Datetimes, Timedeltas, Time spans and DateOffsets. We can use each of them, and their combinations, to operate on Series and DataFrames.
As a reference, see the Pandas documentation on Time series / date functionality.
"How To Manage Time With Pandas", Danilo Galindo