The Python `collections` module provides specialized container data types that can replace the general-purpose Python containers (`dict`, `tuple`, `list` and `set`). We will study the following parts of this module:

- `ChainMap`
- `Counter`
- `defaultdict`
- `deque`

There is also a submodule of `collections` called `abc` (Abstract Base Classes). It will not be covered in this post. Let's start with the ChainMap container!

ChainMap

A ChainMap is a class that links multiple mappings together so that they behave as a single unit. If you look at the documentation, you will notice that it accepts `*maps`, which means that a ChainMap will accept any number of mappings or dictionaries and present them as a single view that you can update. Let's see an example so you can see how it works:

from collections import ChainMap
car_parts = {'hood': 500, 'engine': 5000, 'front_door': 750}
car_options = {'A/C': 1000, 'Turbo': 2500, 'rollbar': 300}
car_accessories = {'cover': 100, 'hood_ornament': 150, 'seat_cover': 99}
car_pricing = ChainMap(car_accessories, car_options, car_parts)

car_pricing
>> ChainMap({'cover': 100, 'hood_ornament': 150, 'seat_cover': 99}, {'A/C': 1000, 'Turbo': 2500, 'rollbar': 300}, {'hood': 500, 'engine': 5000, 'front_door': 750})
car_pricing['hood']
>> 500

Here we import ChainMap from our collections module. Next we create three dictionaries. Then we create an instance of our ChainMap by passing it the three dictionaries we just created.

Finally, we try to access one of the keys of our ChainMap. When we do this, the ChainMap will go through each map to see if that key exists and has a value. If it does, then the ChainMap will return the first value it finds that matches that key.
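
The car example has no overlapping keys, so the lookup order is not visible there. Here is a minimal sketch with two hypothetical dictionaries (`overrides` and `defaults` are names made up for this illustration) that share a key. It also shows that writes through the ChainMap land in the first mapping only:

```python
from collections import ChainMap

# Hypothetical dictionaries that share the key 'color'
overrides = {'color': 'red'}
defaults = {'color': 'blue', 'size': 'M'}

settings = ChainMap(overrides, defaults)

first_match = settings['color']   # found in overrides, the first mapping
fallback = settings['size']       # only present in defaults

# Writes always go to the first mapping; later mappings are left untouched
settings['size'] = 'L'

print(first_match, fallback)
print(overrides)
print(defaults)
```

After the assignment, `overrides` holds its own `'size'` entry, which now shadows the one in `defaults`.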

This is especially useful if you want to set default values. Suppose we want to create an application that has some default values. The application will also know the operating system environment variables. If there is an environment variable that matches one of the keys we have as a default in our application, the environment will override our default value. In addition, let's assume that we can pass arguments to our application.

These arguments take precedence over the environment and the defaults. This is one place where a ChainMap can really come in handy. Let's look at a simple example that is based on one from the Python documentation:

Note: do not run this code from a Jupyter Notebook. Save it as `chain_map.py` and run it from a terminal with this command: `python chain_map.py -u daniel`

import argparse
import os

from collections import ChainMap

def main():
    app_defaults = {'username':'admin', 'password':'admin'}

    parser = argparse.ArgumentParser()
    parser.add_argument('-u', '--username')
    parser.add_argument('-p', '--password')
    args = parser.parse_args()

    command_line_arguments = {key:value for key, value in vars(args).items() if value}

    chain = ChainMap(command_line_arguments, os.environ, app_defaults)
    print(chain['username'])

if __name__ == '__main__':
    main()
    os.environ['username'] = 'test'
    main()

$ python chain_map.py -u daniel
daniel
daniel

Let's break this down a bit. Here we import the Python `argparse` module along with the `os` module. We also import `ChainMap`.

Next we have a simple function that has some defaults. I have seen these defaults used for some popular routers. We then set up our argument parser and tell it how to handle certain command line options. You'll notice that argparse doesn't provide a way to get a dictionary object from its arguments, so we use a dict comprehension to extract what we need.

The other interesting piece here is the use of Python's built-in `vars`. If you call `vars` without an argument, it behaves like Python's built-in `locals()`. But if you pass it an object, then `vars` returns the `__dict__` attribute of that object. In other words, `vars(args)` is equivalent to `args.__dict__`.
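
To make this concrete, here is a small sketch that parses an explicit argument list (instead of reading the real command line, which is an assumption made so the snippet runs anywhere) and inspects what `vars` returns:

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('-u', '--username')
parser.add_argument('-p', '--password')

# Pass an explicit argument list instead of reading sys.argv
args = parser.parse_args(['-u', 'daniel'])

as_dict = vars(args)                      # the Namespace's attribute dictionary
same_object = as_dict is args.__dict__    # vars() hands back __dict__ itself

# Keep only the options that were actually supplied
given = {key: value for key, value in as_dict.items() if value}

print(as_dict)
print(given)
```

Options that were not supplied show up as `None`, which is why the dict comprehension filters on `if value` before the result is handed to the ChainMap.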

Finally we create our ChainMap by passing our command line arguments (if any), then the environment variables and finally the default values.

At the end of the code, we call our function, then set an environment variable and call it again. Try it and you will see that it prints admin and then test, as expected. Now let's try calling the script with a command line argument:

python chain_map.py -u daniel

When I run this on my machine, it prints daniel twice. This is because our command line argument overrides everything else. It doesn't matter what we set the environment to, because our ChainMap looks at the command line arguments before anything else. If you run it without `-u daniel`, the lookup falls through to the environment and the defaults, which in my case prints `"admin"` and then `"test"`.

Now that you know how to use ChainMaps, we can move on to Counter!

Counter

The collections module also provides us with a small tool that allows us to perform convenient and fast counting. This tool is called `Counter`. You can use it with any iterable of hashable items. Let's try it with a string:

from collections import Counter

Counter('superfluous')
>> Counter({'s': 2, 'u': 3, 'p': 1, 'e': 1, 'r': 1, 'f': 1, 'l': 1, 'o': 1})

counter = Counter('superfluous')
counter['u']
>> 3

In this example, we import `Counter` from `collections` and pass it a string. This returns a Counter object which is a subclass of the Python dictionary. We then run the same command but assign it to the counter variable so that we can access the dictionary more easily. In this case, we have seen that the letter `"u"` appears three times in the example string.

The counter provides some methods that you may be interested in. For example, you can call `elements`, which returns an iterator that repeats each element as many times as its count. Since Python 3.7 the elements come out in the order they were first encountered, and any element whose count is less than one is skipped. You can think of the output as an expanded version of the string in which identical letters are grouped together.

list(counter.elements())
>> ['s', 's', 'u', 'u', 'u', 'p', 'e', 'r', 'f', 'l', 'o']

Another useful method is `most_common`. Pass it a number `n` and the Counter returns the `n` most frequent elements together with their counts:

counter.most_common(2)
>> [('u', 3), ('s', 2)]

Here we just asked our Counter which were the two most recurring items. As you can see, it produced a list of tuples that tells us that `"u"` occurred 3 times and `"s"` occurred twice.

The other method I want to cover is the subtract method. The `subtract` method accepts an iterable or a mapping and uses that argument to subtract. It's a little easier to explain if you see some code:

counter_one = Counter('superfluous')
counter_one
>> Counter({'s': 2, 'u': 3, 'p': 1, 'e': 1, 'r': 1, 'f': 1, 'l': 1, 'o': 1})

counter_two = Counter('super')
counter_one.subtract(counter_two)
counter_one
>> Counter({'s': 1, 'u': 2, 'p': 0, 'e': 0, 'r': 0, 'f': 1, 'l': 1, 'o': 1})

So here we recreate our first counter and print it out so we know what's in it. Then we create our second Counter object. Finally we subtract the second counter from the first. If you look closely at the output at the end, you will notice that the counts for five of the letters have been decreased by one.
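
Note that `subtract` works in place and keeps entries whose count drops to zero, as the output above shows. Counter also supports the `-` operator, which behaves differently: it returns a new Counter and keeps only positive counts. A short sketch comparing the two (the variable names `kept` and `dropped` are just for illustration):

```python
from collections import Counter

# subtract() modifies the counter in place; zero counts stay in it
kept = Counter('superfluous')
kept.subtract(Counter('super'))

# The - operator builds a new Counter and drops counts that are not positive
dropped = Counter('superfluous') - Counter('super')

print(kept['p'], 'p' in kept)
print('p' in dropped)
print(dict(dropped))
```

Which one you want depends on whether zero and negative counts carry meaning in your data.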

As I mentioned at the beginning of this section, you can use Counter against any iterable or mapping, so you don't have to use only strings. You can also pass tuples, dictionaries and lists to it.
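
As a quick sketch of those other input types, the snippet below builds counters from a list, a tuple and a dictionary. The sample data is made up for illustration:

```python
from collections import Counter

from_list = Counter(['red', 'blue', 'red', 'green', 'blue', 'red'])
from_tuple = Counter((1, 2, 2, 3, 3, 3))

# When given a mapping, the values are taken as the counts
from_mapping = Counter({'apples': 4, 'pears': 2})

print(from_list.most_common(1))
print(from_tuple[3])
print(from_mapping['apples'])

# Asking for something that was never counted returns 0 instead of raising KeyError
print(from_list['purple'])
```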

Try it on your own to see how it works with those other data types. Now we are ready to move on to `defaultdict`!

`defaultdict`

The collections module has a handy tool called `defaultdict`. The `defaultdict` is a subclass of the Python dict that accepts a `default_factory` as its main argument. The `default_factory` is usually a Python data type, such as `int` or `list`, but you can also use a function or a lambda. Let's start by creating a regular Python dictionary that counts the number of times each word is used in a sentence:

sentence = "The red for jumped over the fence and ran to the zoo for food"
words = sentence.split(' ')
words
>> ['The',
 'red',
 'for',
 'jumped',
 'over',
 'the',
 'fence',
 'and',
 'ran',
 'to',
 'the',
 'zoo',
 'for',
 'food']

reg_dict = {}
for word in words:
    if word in reg_dict:
        reg_dict[word] += 1
    else:
        reg_dict[word] = 1

print(reg_dict)
>> {'The': 1, 'red': 1, 'for': 2, 'jumped': 1, 'over': 1, 'the': 2, 'fence': 1, 'and': 1, 'ran': 1, 'to': 1, 'zoo': 1, 'food': 1}

Now let's try to do the same with defaultdict!

from collections import defaultdict
sentence = "The red for jumped over the fence and ran to the zoo for food"
words = sentence.split(' ')

d = defaultdict(int)
for word in words:
    d[word] += 1

print(d)
>> defaultdict(<class 'int'>, {'The': 1, 'red': 1, 'for': 2, 'jumped': 1, 'over': 1, 'the': 2, 'fence': 1, 'and': 1, 'ran': 1, 'to': 1, 'zoo': 1, 'food': 1})

You will notice right away that the code is much simpler. The defaultdict automatically assigns zero as the value of any key it doesn't already contain. We then add one, so a word gets a count of 1 the first time it appears and the count is incremented each time it appears again in the sentence.
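
The zero comes from the factory itself: `defaultdict` calls `default_factory` with no arguments whenever a missing key is looked up, and `int()` returns 0. A minimal sketch (the `counts` dictionary and its key are made up for illustration):

```python
from collections import defaultdict

# Calling a type with no arguments gives its "empty" value
print(int())    # 0
print(list())   # []

counts = defaultdict(int)
counts['apple'] += 1    # missing key: starts from int(), which is 0, then adds 1
counts['apple'] += 1    # existing key: plain increment

print(counts['apple'])
```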

Now let's try using a Python list type as our `default_factory`. We will start with a regular dictionary, as before.

my_list = [(1234, 100.23), (345, 10.45), (1234, 75.00), (345, 222.66), (678, 300.25), (1234, 35.67)]

reg_dict = {}
for acct_num, value in my_list:
    if acct_num in reg_dict:
        reg_dict[acct_num].append(value)
    else:
        reg_dict[acct_num] = [value]

If you run this code, you should get output similar to the following:

print(reg_dict)
>> {1234: [100.23, 75.0, 35.67], 345: [10.45, 222.66], 678: [300.25]}

Now let's reimplement this code using defaultdict:

from collections import defaultdict

my_list = [(1234, 100.23), (345, 10.45), (1234, 75.00), (345, 222.66), (678, 300.25), (1234, 35.67)]

d = defaultdict(list)
for acct_num, value in my_list:
    d[acct_num].append(value)

Again, this eliminates the if/else conditional logic and makes the code easier to follow. Here is the output of the above code:

print(d)
>> defaultdict(<class 'list'>, {1234: [100.23, 75.0, 35.67], 345: [10.45, 222.66], 678: [300.25]})

This is a nice improvement! Now let's try using a `lambda` as our `default_factory`:

from collections import defaultdict
animal = defaultdict(lambda: "Monkey")
animal
>> defaultdict(<function __main__.<lambda>()>, {})
animal['Sam'] = 'Tiger'
print(animal['Nick'])
>> Monkey
animal
>> defaultdict(<function __main__.<lambda>()>, {'Sam': 'Tiger', 'Nick': 'Monkey'})

Here we create a `defaultdict` that will assign `Monkey` as the default value to any missing key. The first key is set to `Tiger`, and the next key is not set. If you look up the second key, you will see that it has 'Monkey' assigned to it, and that the key has been added to the dictionary.
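
One detail worth knowing: only square-bracket access triggers the factory. Membership tests and `.get()` leave the dictionary untouched. A short sketch that reuses the `animal` example (the intermediate variable names are for illustration):

```python
from collections import defaultdict

animal = defaultdict(lambda: "Monkey")
animal['Sam'] = 'Tiger'

present_before = 'Nick' in animal   # membership test does not insert anything
looked_up = animal.get('Nick')      # .get() bypasses default_factory and returns None
size_before = len(animal)

value = animal['Nick']              # square-bracket access calls the factory and stores the result
size_after = len(animal)

print(present_before, looked_up, size_before)
print(value, size_after)
```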

In case you haven't noticed yet, it is basically impossible to cause a KeyError as long as you set the `default_factory` to something that makes sense. The documentation mentions that if you set the `default_factory` to `None`, then you will receive a KeyError.

Let's see how that works:

from collections import defaultdict
x = defaultdict(None)
x['Mike']

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-30-d21c3702d01d> in <module>
      1 from collections import defaultdict
      2 x = defaultdict(None)
----> 3 x['Mike']

KeyError: 'Mike'

In this case, we created a `defaultdict` without a usable default factory. It cannot assign a default value to our key, so it raises a `KeyError` instead. Of course, since it is a subclass of `dict`, we can simply set the key to some value and it will work. But that defeats the purpose of `defaultdict`.

`deque`

According to the Python documentation, deques "are a generalization of stacks and queues". The name is pronounced "deck" and is short for "double-ended queue". A deque can stand in for a Python list when you need to work at both ends of a sequence. Deques are thread-safe and let you efficiently append and pop items from either end.

A list is optimized for fast fixed-length operations, and inserting or removing items at its front is slow because every other item has to be shifted. Full details can be found in the Python documentation. A deque accepts a `maxlen` argument that bounds its length. Without it, the deque can grow to an arbitrary size. When a bounded deque is full, adding new elements discards the same number of elements from the opposite end.
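
Here is a minimal sketch of a bounded deque in action. The name `recent` and the numbers are made up for illustration:

```python
from collections import deque

recent = deque(maxlen=3)
for number in range(1, 6):
    recent.append(number)   # once full, each append drops the leftmost item

after_appends = list(recent)
print(after_appends)        # only the last three numbers remain

recent.appendleft(0)        # appending on the left drops from the right instead
after_appendleft = list(recent)
print(after_appendleft)
```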

As a general rule, if you need to add or remove elements quickly at either end, use a deque. If you need quick random access, use a list. Let's take a moment to see how a deque can be created and used.

from collections import deque
import string

d = deque(string.ascii_lowercase)

for letter in d:
    print(letter)

>> a
b
c
d
e
f
g
h
i
j
k
l
m
n
o
p
q
r
s
t
u
v
w
x
y
z

Here we import deque from the collections module, and we also import the string module. To create a deque with initial contents, we pass it an iterable. In this case, we pass `string.ascii_lowercase`, which is a string containing all the lowercase letters of the alphabet. Finally, we loop over our deque and print each element. Now let's look at some of the methods that deque has.

d.append('bye')
d
>> deque(['a',
       'b',
       'c',
       'd',
       'e',
       'f',
       'g',
       'h',
       'i',
       'j',
       'k',
       'l',
       'm',
       'n',
       'o',
       'p',
       'q',
       'r',
       's',
       't',
       'u',
       'v',
       'w',
       'x',
       'y',
       'z',
       'bye'])

d.appendleft('hello')
d
>> deque(['hello',
       'a',
       'b',
       'c',
       'd',
       'e',
       'f',
       'g',
       'h',
       'i',
       'j',
       'k',
       'l',
       'm',
       'n',
       'o',
       'p',
       'q',
       'r',
       's',
       't',
       'u',
       'v',
       'w',
       'x',
       'y',
       'z',
       'bye'])

d.rotate(1)
d
>> deque(['bye',
       'hello',
       'a',
       'b',
       'c',
       'd',
       'e',
       'f',
       'g',
       'h',
       'i',
       'j',
       'k',
       'l',
       'm',
       'n',
       'o',
       'p',
       'q',
       'r',
       's',
       't',
       'u',
       'v',
       'w',
       'x',
       'y',
       'z'])

Let's break this down a bit. First we add a string to the right end of the deque. Then we add another string to the left end of the deque. Finally, we call `rotate` on our deque and pass it a one, which causes it to rotate one step to the right.

In other words, it takes the element at the far right and moves it to the front. You can pass it a negative number to make the deque rotate to the left instead.
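
A short sketch with a small deque of numbers (made up for illustration) shows both directions:

```python
from collections import deque

d = deque([1, 2, 3, 4, 5])

d.rotate(2)          # two steps to the right: the last two items move to the front
right = list(d)

d.rotate(-2)         # two steps to the left undoes it
restored = list(d)

d.rotate(-1)         # one step to the left: the first item moves to the end
left = list(d)

print(right)
print(restored)
print(left)
```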

Let's end this section with an example based on one from the Python documentation:

from collections import deque
def get_last(filename, n=5):
    """
    Returns the last n lines from the file
    """
    try:
        with open(filename) as f:
            return deque(f, n)
    except OSError:
        print("Error opening file: {}".format(filename))
        raise

This code works much like the Linux tail program does. Here we pass a `filename` to our function along with the number of lines `n` we want it to return.

The deque is limited to whatever number we pass as `n`. This means that once the deque is full, when new lines are read and added to the deque, the oldest lines are pulled from the other end and discarded.

I have also wrapped the file opening in a simple exception handler because it is very easy to pass a malformed path. The handler catches `OSError`, which covers files that do not exist, prints a message and then re-raises the exception.
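
To try it out, the sketch below writes a hypothetical ten-line file to a temporary location and asks for its last three lines. The core of `get_last` is repeated (without the error handling) so the snippet runs on its own:

```python
import os
import tempfile
from collections import deque

def get_last(filename, n=5):
    """Same core logic as the function above, repeated so this snippet is self-contained."""
    with open(filename) as f:
        return deque(f, n)

# Write a hypothetical ten-line file to a temporary location
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    for i in range(1, 11):
        tmp.write(f"line {i}\n")
    path = tmp.name

last_three = get_last(path, n=3)
os.remove(path)   # clean up the temporary file

# Each entry still carries its trailing newline, so strip it for display
print([line.strip() for line in last_three])
```

Because the file object is consumed lazily, the deque never holds more than `n` lines at a time, even for a large file.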

Conclusion

We've covered a lot of ground in this post. You learned how to use a ChainMap, a Counter and a defaultdict. We also learned about the deque, a list-like container that is fast at both ends. Finally, we saw how to use them to perform various tasks. I hope you found each of these collections interesting. They may be of great use to you in your day-to-day work.

Most Related Articles

Programming

6 Advanced Statistical Concepts in Data Science

The article contains some of the most commonly used advanced statistical concepts along with their Python implementation.In my previous articles Beginners Guide to Statistics in Data Science and The Inferential Statistics Data Scientists Should Know we have talked about almost all the basics(Descriptive and Inferential) of statistics which are commonly used in understanding and working with any data science case study. In this article, lets go a little beyond and talk about some advance concepts which are not part of the buzz.Concept #1 - Q-Q(quantile-quantile) PlotsBefore understanding QQ plots first understand what is a Quantile?A quantile defines a particular part of a data set, i.e. a quantile determines how many values in a distribution are above or below a certain limit. Special quantiles are the quartile (quarter), the quintile (fifth), and percentiles (hundredth).An example:If we divide a distribution into four equal portions, we will speak of four quartiles. The first quartile includes all values that are smaller than a quarter of all values. In a graphical representation, it corresponds to 25% of the total area of distribution. The two lower quartiles comprise 50% of all distribution values. The interquartile range between the first and third quartile equals the range in which 50% of all values lie that are distributed around the mean. In Statistics, A Q-Q(quantile-quantile) plot is a scatterplot created by plotting two sets of quantiles against one another. If both sets of quantiles came from the same distribution, we should see the points forming a line that’s roughly straight(y=x).Q-Q plotFor example, the median is a quantile where 50% of the data fall below that point and 50% lie above it. The purpose of Q Q plots is to find out if two sets of data come from the same distribution. 
A 45-degree angle is plotted on the Q Q plot; if the two data sets come from a common distribution, the points will fall on that reference line.It’s very important for you to know whether the distribution is normal or not so as to apply various statistical measures on the data and interpret it in much more human-understandable visualization and their Q-Q plot comes into the picture. The most fundamental question answered by the Q-Q plot is if the curve is Normally Distributed or not.Normally distributed, but why?The Q-Q plots are used to find the type of distribution for a random variable whether it is a Gaussian Distribution, Uniform Distribution, Exponential Distribution, or even Pareto Distribution, etc. You can tell the type of distribution using the power of the Q-Q plot just by looking at the plot. In general, we are talking about Normal distributions only because we have a very beautiful concept of the 68–95–99.7 rule which perfectly fits into the normal distribution So we know how much of the data lies in the range of the first standard deviation, second standard deviation and third standard deviation from the mean. So knowing if a distribution is Normal opens up new doors for us to experiment with Types of Q-Q plots. Source Skewed Q-Q plotsQ-Q plots can find skewness(measure of asymmetry) of the distribution. 
If the bottom end of the Q-Q plot deviates from the straight line but the upper end is not, then the distribution is Left skewed(Negatively skewed).Now if upper end of the Q-Q plot deviates from the staright line and the lower is not, then the distribution is Right skewed(Positively skewed).Tailed Q-Q plotsQ-Q plots can find Kurtosis(measure of tailedness) of the distribution.The distribution with the fat tail will have both the ends of the Q-Q plot to deviate from the straight line and its centre follows the line, where as a thin tailed distribution will term Q-Q plot with very less or negligible deviation at the ends thus making it a perfect fit for normal distribution.Q-Q plots in Python(Source)Suppose we have the following dataset of 100 values:import numpy as np #create dataset with 100 values that follow a normal distribution np.random.seed(0) data = np.random.normal(0,1, 1000) #view first 10 values data[:10] array([ 1.76405235, 0.40015721, 0.97873798, 2.2408932 , 1.86755799, -0.97727788, 0.95008842, -0.15135721, -0.10321885, 0.4105985 ])To create a Q-Q plot for this dataset, we can use the qqplot() function from the statsmodels library:import statsmodels.api as sm import matplotlib.pyplot as plt #create Q-Q plot with 45-degree line added to plot fig = sm.qqplot(data, line='45') plt.show()In a Q-Q plot, the x-axis displays the theoretical quantiles. This means it doesn’t show your actual data, but instead, it represents where your data would be if it were normally distributed.The y-axis displays your actual data. This means that if the data values fall along a roughly straight line at a 45-degree angle, then the data is normally distributed.We can see in our Q-Q plot above that the data values tend to closely follow the 45-degree, which means the data is likely normally distributed. 
This shouldn’t be surprising since we generated the 100 data values by using the numpy.random.normal() function.Consider instead if we generated a dataset of 100 uniformly distributed values and created a Q-Q plot for that dataset:#create dataset of 100 uniformally distributed values data = np.random.uniform(0,1, 1000) #generate Q-Q plot for the dataset fig = sm.qqplot(data, line='45') plt.show()The data values clearly do not follow the red 45-degree line, which is an indication that they do not follow a normal distribution.Concept #2- Chebyshev's InequalityIn probability, Chebyshev’s Inequality, also known as “Bienayme-Chebyshev” Inequality guarantees that, for a wide class of probability distributions, only a definite fraction of values will be found within a specific distance from the mean of a distribution.Source: https://www.thoughtco.com/chebyshevs-inequality-3126547 Chebyshev’s inequality is similar to The Empirical rule(68-95-99.7); however, the latter rule only applies to normal distributions. Chebyshev’s inequality is broader; it can be applied to any distribution so long as the distribution includes a defined variance and mean.So Chebyshev’s inequality says that at least (1-1/k^2) of data from a sample must fall within K standard deviations from the mean (or equivalently, no more than 1/k^2 of the distribution’s values can be more than k standard deviations away from the mean).Where K --> Positive real numberIf the data is not normally distributed then different amounts of data could be in one standard deviation. 
Chebyshev’s inequality provides a way to know what fraction of data falls within K standard deviations from the mean for any data distribution.Also read: 22 Statistics Questions to Prepare for Data Science InterviewsCredits: https://calcworkshop.com/joint-probability-distribution/chebyshev-inequality/ Chebyshev’s inequality is of great value because it can be applied to any probability distribution in which the mean and variance are provided.Let us consider an example, Assume 1,000 contestants show up for a job interview, but there are only 70 positions available. In order to select the finest 70 contestants amongst the total contestants, the proprietor gives tests to judge their potential. The mean score on the test is 60, with a standard deviation of 6. If an applicant scores an 84, can they presume that they are getting the job?The results show that about 63 people scored above a 60, so with 70 positions available, a contestant who scores an 84 can be assured they got the job.Chebyshev's Inequality in Python(Source) Create a population of 1,000,000 values, I use a gamma distribution(also works with other distributions) with shape = 2 and scale = 2.import numpy as np import random import matplotlib.pyplot as plt #create a population with a gamma distribution shape, scale = 2., 2. #mean=4, std=2*sqrt(2) mu = shape*scale #mean and standard deviation sigma = scale*np.sqrt(shape) s = np.random.gamma(shape, scale, 1000000)Now sample 10,000 values from the population.#sample 10000 values rs = random.choices(s, k=10000)Count the sample that has a distance from the expected value larger than k standard deviation and use the count to calculate the probabilities. 
I want to depict a trend of probabilities when k is increasing, so I use a range of k from 0.1 to 3.#set k ks = [0.1,0.5,1.0,1.5,2.0,2.5,3.0] #probability list probs = [] #for each k for k in ks: #start count c = 0 for i in rs: # count if far from mean in k standard deviation if abs(i - mu) > k * sigma : c += 1 probs.append(c/10000)Plot the results:plot = plt.figure(figsize=(20,10)) #plot each probability plt.xlabel('K') plt.ylabel('probability') plt.plot(ks,probs, marker='o') plot.show() #print each probability print("Probability of a sample far from mean more than k standard deviation:") for i, prob in enumerate(probs): print("k:" + str(ks[i]) + ", probability: " \ + str(prob)[0:5] + \ " | in theory, probability should less than: " \ + str(1/ks[i]**2)[0:5])From the above plot and result, we can see that as the k increases, the probability is decreasing, and the probability of each k follows the inequality. Moreover, only the case that k is larger than 1 is useful. If k is less than 1, the right side of the inequality is larger than 1 which is not useful because the probability cannot be larger than 1.Concept #3- Log-Normal DistributionIn probability theory, a Log-normal distribution also known as Galton's distribution is a continuous probability distribution of a random variable whose logarithm is normally distributed.Thus, if the random variable X is log-normally distributed, then Y = ln(X) has a normal distribution. Equivalently, if Y has a normal distribution, then the exponential function of Y i.e, X = exp(Y), has a log-normal distribution. Skewed distributions with low mean and high variance and all positive values fit under this type of distribution. A random variable that is log-normally distributed takes only positive real values. 
The general formula for the probability density function of the lognormal distribution is:The location and scale parameters are equivalent to the mean and standard deviation of the logarithm of the random variable.The shape of Lognormal distribution is defined by 3 parameters:σis the shape parameter, (and is the standard deviation of the log of the distribution)θ or μ is the location parameter (and is the mean of the distribution)m is the scale parameter (and is also the median of the distribution)The location and scale parameters are equivalent to the mean and standard deviation of the logarithm of the random variable as explained above.If x = θ, then f(x) = 0. The case where θ = 0 and m = 1 is called the standard lognormal distribution. The case where θ equals zero is called the 2-parameter lognormal distribution.The following graph illustrates the effect of the location(μ) and scale(σ) parameter on the probability density function of the lognormal distribution: Source: https://www.sciencedirect.com/topics/mathematics/lognormal-distribution Log-Normal Distribution in Python(Source)Let us consider an example to generate random numbers from a log-normal distribution with μ=1 and σ=0.5 using scipy.stats.lognorm function.import numpy as np import matplotlib.pyplot as plt from scipy.stats import lognorm np.random.seed(42) data = lognorm.rvs(s=0.5, loc=1, scale=1000, size=1000) plt.figure(figsize=(10,6)) ax = plt.subplot(111) plt.title('Generate wrandom numbers from a Log-normal distribution') ax.hist(data, bins=np.logspace(0,5,200), density=True) ax.set_xscale("log") shape,loc,scale = lognorm.fit(data) x = np.logspace(0, 5, 200) pdf = lognorm.pdf(x, shape, loc, scale) ax.plot(x, pdf, 'y') plt.show()Concept #4- Power Law distributionIn statistics, a Power Law is a functional relationship between two quantities, where a relative change in one quantity results in a proportional relative change in the other quantity, independent of the initial size of those quantities: 
one quantity varies as a power of another. For instance, considering the area of a square in terms of the length of its side, if the length is doubled, the area is multiplied by a factor of four.A power law distribution has the form Y = k Xα, where:X and Y are variables of interest,α is the law’s exponent,k is a constant.Source: https://en.wikipedia.org/wiki/Power_law Power-law distribution is just one of many probability distributions, but it is considered a valuable tool to assess uncertainty issues that normal distribution cannot handle when they occur at a certain probability.Many processes have been found to follow power laws over substantial ranges of values. From the distribution in incomes, size of meteoroids, earthquake magnitudes, the spectral density of weight matrices in deep neural networks, word usage, number of neighbors in various networks, etc. (Note: The power law here is a continuous distribution. The last two examples are discrete, but on a large scale can be modeled as if continuous).Also read: Statistical Measures of Central TendencyPower-law distribution in Python(Source) Let us plot the Pareto distribution which is one form of a power-law probability distribution. Pareto distribution is sometimes known as the Pareto Principle or ‘80–20’ rule, as the rule states that 80% of society’s wealth is held by 20% of its population. Pareto distribution is not a law of nature, but an observation. It is useful in many real-world problems. 
It is a skewed heavily tailed distribution.import numpy as np import matplotlib.pyplot as plt from scipy.stats import pareto x_m = 1 #scale alpha = [1, 2, 3] #list of values of shape parameters plt.figure(figsize=(10,6)) samples = np.linspace(start=0, stop=5, num=1000) for a in alpha: output = np.array([pareto.pdf(x=samples, b=a, loc=0, scale=x_m)]) plt.plot(samples, output.T, label='alpha {0}' .format(a)) plt.xlabel('samples', fontsize=15) plt.ylabel('PDF', fontsize=15) plt.title('Probability Density function', fontsize=15) plt.legend(loc='best') plt.show()Concept #5- Box cox transformationThe Box-Cox transformation transforms our data so that it closely resembles a normal distribution.The one-parameter Box-Cox transformations are defined as In many statistical techniques, we assume that the errors are normally distributed. This assumption allows us to construct confidence intervals and conduct hypothesis tests. By transforming your target variable, we can (hopefully) normalize our errors (if they are not already normal).Additionally, transforming our variables can improve the predictive power of our models because transformations can cut away white noise.Original distribution(Left) and near-normal distribution after applying Box cox transformation. Source At the core of the Box-Cox transformation is an exponent, lambda (λ), which varies from -5 to 5. All values of λ are considered and the optimal value for your data is selected; The “optimal value” is the one that results in the best approximation of a normal distribution curve. The one-parameter Box-Cox transformations are defined as:and the two-parameter Box-Cox transformations as:Moreover, the one-parameter Box-Cox transformation holds for y > 0, i.e. only for positive values and two-parameter Box-Cox transformation for y > -λ, i.e. negative values. 
The parameter λ is estimated using the profile likelihood function and using goodness-of-fit tests.

The Box-Cox transformation has some drawbacks. If interpretation is what you want, Box-Cox is not recommended: if λ is some non-zero number, the transformed target variable may be more difficult to interpret than if we had simply applied a log transform.

A second stumbling block is that the Box-Cox transformation usually gives the median of the forecast distribution when we revert the transformed data to its original scale. Occasionally, we want the mean and not the median.

Box-Cox transformation in Python

SciPy's stats package provides a function called boxcox for performing the Box-Cox power transformation. It takes the original non-normal data as input and returns the transformed data along with the lambda value that was used to bring the non-normal distribution close to a normal distribution.

# load necessary packages
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from scipy.stats import boxcox

# make this example reproducible
np.random.seed(0)

# generate dataset
data = np.random.exponential(size=1000)

fig, ax = plt.subplots(1, 2)

# plot the distribution of data values
# (sns.distplot is deprecated in recent seaborn releases; kdeplot draws the same density curve)
sns.kdeplot(data, fill=True, linewidth=2, label="Non-Normal", color="red", ax=ax[0])

# perform Box-Cox transformation on original data
transformed_data, best_lambda = boxcox(data)

sns.kdeplot(transformed_data, fill=True, linewidth=2, label="Normal", color="red", ax=ax[1])

# adding legends to the subplots
ax[0].legend(loc="upper right")
ax[1].legend(loc="upper right")

# rescaling the subplots
fig.set_figheight(5)
fig.set_figwidth(10)

# display optimal lambda value
print(f"Lambda value used for Transformation: {best_lambda}")

Concept #6 - Poisson distribution

In probability theory and statistics, the Poisson distribution is a discrete probability distribution that expresses the probability of a given number of events occurring in a fixed interval of time or space if these events occur with a known constant mean rate and independently of the time since the last event.

In very simple terms, a Poisson distribution can be used to estimate how likely it is that something will happen "X" number of times. Some examples of Poisson processes are customers calling a help center, radioactive decay in atoms, visitors to a website, photons arriving at a space telescope, and movements in a stock price. Poisson processes are usually associated with time, but they do not have to be.

The formula for the Poisson distribution is:

$$P(X = k) = \frac{\lambda^{k} e^{-\lambda}}{k!}$$

Where:

- e is Euler's number (e = 2.71828...)
- k is the number of occurrences
- k! is the factorial of k
- λ is the expected value of k, which is also equal to its variance

Lambda (λ) can be thought of as the expected number of events in the interval. As we change the rate parameter, λ, we change the probability of seeing different numbers of events in one interval.

Figure: Probability mass function of the Poisson distribution, showing the probability of a number of events occurring in an interval for varying rate parameters.

The Poisson distribution is also commonly used to model financial count data where the tally is small and is often zero.
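To make the "small and often zero" case concrete, here is a minimal sketch that evaluates the Poisson probability mass function, P(X = k) = λ^k e^(-λ) / k!, directly with the standard library. The helper name poisson_pmf and the rate of 0.5 events per interval are assumptions chosen only for illustration; they do not come from the article.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """Return P(X = k) for a Poisson distribution with rate lam."""
    return lam ** k * math.exp(-lam) / math.factorial(k)


# Hypothetical rate, chosen only for illustration: 0.5 events per interval.
lam = 0.5
probs = [poisson_pmf(k, lam) for k in range(4)]

for k, p in enumerate(probs):
    print(f"P(X = {k}) = {p:.4f}")
```

With a rate this low, zero is the single most likely count and the probabilities fall off quickly as k grows.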
For example, in finance it can be used to model the number of trades that a typical investor will make in a given day, which can be 0 (often), or 1, or 2, etc.

As another example, this model can be used to predict the number of "shocks" to the market that will occur in a given time period, say over a decade.

Poisson distribution in Python

import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

lam_list = [1, 4, 9]  # list of lambda values

plt.figure(figsize=(10, 6))
for lam in lam_list:
    # 1000 draws per rate give a reasonably smooth density curve
    # (sns.distplot is deprecated in recent seaborn releases, so kdeplot is used)
    sns.kdeplot(np.random.poisson(lam=lam, size=1000), label='lambda {0}'.format(lam))
plt.xlabel('Poisson Distribution', fontsize=15)
plt.ylabel('Frequency', fontsize=15)
plt.legend(loc='best')
plt.show()

As λ becomes bigger, the graph looks more like a normal distribution.

I hope you have enjoyed reading this article. If you have any questions or suggestions, please leave a comment.

Feel free to connect with me on LinkedIn if you have any queries.

Thanks for reading!

References

- https://calcworkshop.com/joint-probability-distribution/chebyshev-inequality/
- https://corporatefinanceinstitute.com/resources/knowledge/data-analysis/chebyshevs-inequality/
- https://www.itl.nist.gov/div898/handbook/eda/section3/eda3669.htm
- https://www.statology.org/q-q-plot-python/
- https://gist.github.com/chaipi-chaya/9eb72978dbbfd7fa4057b493cf6a32e7
- https://stackoverflow.com/a/41968334/7175247

Daniel Morales

Feb 16, 2021

Libraries

Pandas

4 Must-Know Python Pandas Functions for Time Series Analysis

Time series data consists of data points attached to sequential time stamps. Daily sales, hourly temperature values, and second-level measurements in a chemical process are some examples of time series data.

Time series data has different characteristics than ordinary tabular data. Thus, time series analysis has its own dynamics and can be considered a separate field. There are books of over 500 pages that cover time series analysis concepts and techniques in depth.

Pandas was created by Wes McKinney to provide an efficient and flexible tool for working with financial data, which is a kind of time series. In this article, we will go over 4 Pandas functions that can be used for time series analysis.

We need data for the examples. Let's start by creating our own time series data.

import numpy as np
import pandas as pd

df = pd.DataFrame({
    "date": pd.date_range(start="2020-05-01", periods=100, freq="D"),
    "temperature": np.random.randint(18, 30, size=100) + np.random.random(100).round(1)
})
df.head()

We have created a data frame that contains temperature measurements over a period of 100 days. The date_range function of Pandas can be used for generating a date range with a customized frequency. The temperature values are generated randomly using NumPy functions.

We can now start on the functions.

1. Shift

It is a common operation to shift time series data. We may need to make a comparison between lagged or lead features. In our data frame, we can create a new feature that contains the temperature of the previous day.

df["temperature_lag_1"] = df["temperature"].shift(1)
df.head()

The scalar value passed to the shift function indicates the number of periods to shift. The first row of the new column is filled with NaN because there is no previous value for the first row.

The fill_value parameter can be used for filling the missing values with a scalar.
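Because the data frame above is random, the effect of shift is easier to verify on a tiny series with known values. The sketch below is only an illustration: the four temperatures are invented, and the names temps, lag_1, lead_1 and daily_change are not from the article.

```python
import pandas as pd

# Tiny hypothetical series of daily temperatures (values invented for illustration).
temps = pd.Series(
    [20.0, 22.5, 21.0, 19.5],
    index=pd.date_range("2020-05-01", periods=4, freq="D"),
)

lag_1 = temps.shift(1)        # previous day's value; the first row becomes NaN
lead_1 = temps.shift(-1)      # next day's value; the last row becomes NaN
daily_change = temps - lag_1  # day-over-day difference built from the lag

print(pd.DataFrame({
    "temp": temps,
    "lag_1": lag_1,
    "lead_1": lead_1,
    "change": daily_change,
}))
```

A lagged column like this is typically what makes comparisons such as "today versus yesterday" a simple column subtraction.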
Let's replace the NaN with the average value of the temperature column.

df["temperature_lag_1"] = df["temperature"].shift(1, fill_value=df.temperature.mean())
df.head()

If you are interested in future values, you can shift backwards by passing negative values to the shift function. For instance, "-1" brings in the temperature of the next day.

2. Resample

Another common operation performed on time series data is resampling. It involves changing the frequency of the periods. For instance, we may be interested in weekly temperature data rather than daily measurements.

The resample function creates groups (or bins) of a specified interval. Then, we can apply aggregation functions to the groups to calculate the value based on the resampled frequency.

Let's calculate the average weekly temperatures. The first step is to resample the data to week level. Then, we will apply the mean function to calculate the average.

df_weekly = df.resample("W", on="date").mean()
df_weekly.head()

The first parameter specifies the frequency for resampling. "W" stands for week, as you might expect. If the data frame does not have a datetime index, the column that contains the date or time related information needs to be passed to the on parameter.

3. Asfreq

The asfreq function provides a different technique for resampling. It returns the value at the end of the specified interval. For instance, asfreq("W") returns the value on the last day of each week.

In order to use the asfreq function, we should set the date column as the index of the data frame.

df.set_index("date").asfreq("W").head()

Since we are getting the value on a specific day, it is not necessary to apply an aggregation function.

4. Rolling

The rolling function can be used for calculating a moving average, which is a highly common operation for time series data. It creates a window of a particular size.
Then, we can use this window to make calculations as it rolls through the data points.

[Figure: the concept of a rolling window]

Let's create a rolling window of 3 and use it to calculate the moving average.

df.set_index("date").rolling(3).mean().head()

For any day, the values show the average of that day and the previous 2 days. The values of the first 3 days are 18.9, 23.8, and 19.9. Thus, the moving average on the third day is the average of these values: (18.9 + 23.8 + 19.9) / 3, which is approximately 20.87.

The first 2 values are NaN because they do not have 2 previous values. We can also use this rolling window to cover the previous and next day for any given day. It can be done by setting the center parameter to True.

df.set_index("date").rolling(3, center=True).mean().head()

The values of the first 3 days are 18.9, 23.8, and 19.9. Thus, the moving average on the second day is the average of these 3 values. In this setting, only the first of the displayed values is NaN because we only need 1 previous value. (The very last row of the full result is also NaN, since it has no next value.)

Conclusion

We have covered 4 Pandas functions that are commonly used in time series analysis. Predictive analytics is an essential part of data science. Time series analysis is at the core of many problems that predictive analytics aims to solve. Hence, if you plan to work on predictive analytics, you should definitely learn how to handle time series data.

Thank you for reading. Please let me know if you have any feedback.

Soner Yıldırım

Daniel Morales

Feb 16, 2021

Libraries

Pandas

16 Underrated Pandas Series Methods And When To Use Them

In this article, we're going to explore some lesser-known but very useful pandas methods for manipulating Series objects. Some of these methods apply only to Series; others apply to both Series and DataFrames but have specific features depending on the structure type.

1. is_unique

As its name suggests, this method checks if all the values of a Series are unique:

import pandas as pd
print(pd.Series([1, 2, 3, 4]).is_unique)
print(pd.Series([1, 2, 3, 1]).is_unique)

Output:
True
False

2 & 3. is_monotonic and is_monotonic_decreasing

With these 2 methods, we can check if the values of a Series are in ascending/descending order. Note that is_monotonic is an alias of is_monotonic_increasing that was deprecated in pandas 1.5 and removed in pandas 2.0, so the examples below use is_monotonic_increasing:

print(pd.Series([1, 2, 3, 8]).is_monotonic_increasing)
print(pd.Series([1, 2, 3, 1]).is_monotonic_increasing)
print(pd.Series([9, 8, 4, 0]).is_monotonic_decreasing)

Output:
True
False
True

Both methods also work for a Series with string values. In this case, Python uses a lexicographical ordering under the hood, comparing two subsequent strings character by character. It's not the same as just an alphabetical ordering, and actually, the example with the numeric data above is a particular case of such an ordering. As the Python documentation says:

Lexicographical ordering for strings uses the Unicode code point number to order individual characters.

In practice, it mainly means that the letter case and special symbols are also taken into account:

print(pd.Series(['fox', 'koala', 'panda']).is_monotonic_increasing)
print(pd.Series(['FOX', 'Fox', 'fox']).is_monotonic_increasing)
print(pd.Series(['*', '&', '_']).is_monotonic_increasing)

Output:
True
True
False

A curious exception happens when all the values of a Series are the same. In this case, both methods return True:

print(pd.Series([1, 1, 1, 1, 1]).is_monotonic_increasing)
print(pd.Series(['fish', 'fish']).is_monotonic_decreasing)

Output:
True
True

4. hasnans

This method checks if a Series contains NaN values:

import numpy as np
print(pd.Series([1, 2, 3, np.nan]).hasnans)
print(pd.Series([1, 2, 3, 10, 20]).hasnans)

Output:
True
False

5. empty

Sometimes, we might want to know if a Series is completely empty, not containing even NaN values:

print(pd.Series().empty)
print(pd.Series(np.nan).empty)

Output:
True
False

A Series can become empty after some manipulations with it, for example, filtering:

s = pd.Series([1, 2, 3])
s[s > 3].empty

Output:
True

6 & 7. first_valid_index() and last_valid_index()

These 2 methods return the index of the first/last non-NaN value and are particularly useful for Series objects with many NaNs:

print(pd.Series([np.nan, np.nan, 1, 2, 3, np.nan]).first_valid_index())
print(pd.Series([np.nan, np.nan, 1, 2, 3, np.nan]).last_valid_index())

Output:
2
4

If all the values of a Series are NaN, both methods return None:

print(pd.Series([np.nan, np.nan, np.nan]).first_valid_index())
print(pd.Series([np.nan, np.nan, np.nan]).last_valid_index())

Output:
None
None

8. truncate()

This method allows truncating a Series before and after some index value. Let's truncate the Series from the previous section, leaving only non-NaN values:

s = pd.Series([np.nan, np.nan, 1, 2, 3, np.nan])
s.truncate(before=2, after=4)

Output:
2    1.0
3    2.0
4    3.0
dtype: float64

The original index of the Series was preserved. We may want to reset it and also to assign the truncated Series to a variable:

s_truncated = s.truncate(before=2, after=4).reset_index(drop=True)
print(s_truncated)

Output:
0    1.0
1    2.0
2    3.0
dtype: float64

9. convert_dtypes()

As the pandas documentation says, this method is used to:

Convert columns to best possible dtypes using dtypes supporting pd.NA.

If we consider only Series objects and not DataFrames, a typical application of this method is to convert float numbers with a decimal part equal to 0 (such as 1.0, 2.0, etc.) back to integers, using the nullable integer dtype Int64. Such float numbers appear when the original Series contains both integers and NaN values. Since NaN is a float in numpy and pandas, any Series of integers with missing values becomes of float type as well.

Let's take a look at the example from the previous section to see how it works:

print(pd.Series([np.nan, np.nan, 1, 2, 3, np.nan]))
print('\n')
print(pd.Series([np.nan, np.nan, 1, 2, 3, np.nan]).convert_dtypes())

Output:
0    NaN
1    NaN
2    1.0
3    2.0
4    3.0
5    NaN
dtype: float64


0    <NA>
1    <NA>
2       1
3       2
4       3
5    <NA>
dtype: Int64

10. clip()

We can clip all the values of a Series at input thresholds (lower and upper parameters):

s = pd.Series(range(1, 11))
print(s)
s_clipped = s.clip(lower=2, upper=7)
print(s_clipped)

Output:
0     1
1     2
2     3
3     4
4     5
5     6
6     7
7     8
8     9
9    10
dtype: int64
0    2
1    2
2    3
3    4
4    5
5    6
6    7
7    7
8    7
9    7
dtype: int64

11. rename_axis()

In the case of a Series object, this method sets the name of the index:

s = pd.Series({'flour': '300 g', 'butter': '150 g', 'sugar': '100 g'})
print(s)
s = s.rename_axis('ingredients')
print(s)

Output:
flour     300 g
butter    150 g
sugar     100 g
dtype: object
ingredients
flour     300 g
butter    150 g
sugar     100 g
dtype: object

12 & 13. nsmallest() and nlargest()

These 2 methods return the smallest/largest elements of a Series. By default, they return 5 values, in ascending order for nsmallest() and in descending order for nlargest().

s = pd.Series([3, 2, 1, 100, 200, 300, 4, 5, 6])
s.nsmallest()

Output:
2    1
1    2
0    3
6    4
7    5
dtype: int64

It's possible to specify another number of the smallest/largest values to be returned. Also, we may want to reset the index and assign the result to a variable:

largest_3 = s.nlargest(3).reset_index(drop=True)
print(largest_3)

Output:
0    300
1    200
2    100
dtype: int64

14. pct_change()

For a Series object, we can calculate the percentage change (or, more precisely, the fraction change) between the current and a prior element.
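To see exactly what "fraction change" means, the sketch below recomputes it by hand as (current - previous) / previous and compares the result with pct_change(). The three values and the variable names prices, manual and builtin are made up for illustration.

```python
import pandas as pd

# Hypothetical values, chosen so the fractions are easy to check by hand.
prices = pd.Series([100, 110, 99])

previous = prices.shift(1)               # prior element for each position
manual = (prices - previous) / previous  # fraction change computed by hand
builtin = prices.pct_change()            # pandas' built-in equivalent

print(manual.round(4).tolist())
print(builtin.round(4).tolist())
```

Both lists start with NaN, because the first element has no prior element to compare against, and multiplying the remaining values by 100 turns the fractions into percentages.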
This approach can be helpful, for example, when working with time series, or for creating a waterfall chart in % or fractions.

s = pd.Series([20, 33, 14, 97, 19])
s.pct_change()

Output:
0         NaN
1    0.650000
2   -0.575758
3    5.928571
4   -0.804124
dtype: float64

To make the resulting Series more readable, let's round it:

s.pct_change().round(2)

Output:
0     NaN
1    0.65
2   -0.58
3    5.93
4   -0.80
dtype: float64

15. explode()

This method transforms each list-like element of a Series (lists, tuples, sets, Series, ndarrays) into a row. Empty list-likes will be transformed into a row with NaN. To avoid repeated indices in the resulting Series, it's better to reset the index:

s = pd.Series([[np.nan], {1, 2}, 3, (4, 5)])
print(s)
s_exploded = s.explode().reset_index(drop=True)
print(s_exploded)

Output:
0     [nan]
1    {1, 2}
2         3
3    (4, 5)
dtype: object
0    NaN
1      1
2      2
3      3
4      4
5      5
dtype: object

16. repeat()

This method is used for repeating each element of a Series consecutively a defined number of times. In this case too, it makes sense to reset the index:

s = pd.Series([1, 2, 3])
print(s)
s_repeated = s.repeat(2).reset_index(drop=True)
print(s_repeated)

Output:
0    1
1    2
2    3
dtype: int64
0    1
1    1
2    2
3    2
4    3
5    3
dtype: int64

If the number of repetitions is set to 0, an empty Series will be returned:

s.repeat(0)

Output:
Series([], dtype: int64)

Conclusion

To sum up, we investigated 16 rarely used pandas methods for working with Series and some of their application cases. If you know some other interesting ways to manipulate pandas Series, you're very welcome to share them in the comments.

Thanks for reading!

Daniel Morales

Feb 16, 2021

Python

Top 10 Python Extensions for Visual Studio Code

In this post we want to talk about the most useful Python extensions for Visual Studio Code. Visual Studio Code is an integrated development environment created by Microsoft for Windows, Linux and macOS. Among its features are debugging, syntax highlighting, smart code completion, snippets, code refactoring and integrated Git. Users can change the theme, keyboard shortcuts and preferences, and install extensions that add additional functionality.

Specifically, we are going to talk about the extensions you can install for VS Code. Here is a list of our favorites.

1- Python

Link: https://github.com/Microsoft/vscode-python

Python extension for Visual Studio Code

A Visual Studio Code extension with rich support for the Python language (for all actively supported versions of the language: >=3.6), including features such as IntelliSense (Pylance), linting, debugging, code navigation, code formatting, refactoring, variable explorer, test explorer, and more!

NOTE: Web support (e.g., github.dev) is limited.

Installed extensions

The Python extension will automatically install the Pylance and Jupyter extensions to give you the best experience when working with Python files and Jupyter notebooks. However, Pylance is an optional dependency, which means that the Python extension will remain fully functional if it is not installed. You can also uninstall it at the expense of some features if you are using a different language server.

2- Python Indent

Link: https://github.com/kbrose/vsc-python-indent

It is used to correct Python indentation in Visual Studio Code.

How it works

Every time you press the Enter key in a Python context, this extension will parse your Python file down to the location of your cursor, and determine exactly how much to indent the next line (or two in the case of hanging indents) and how much to indent nearby lines.

There are three main cases when determining the correct indent.
Review the documentation here: https://github.com/kbrose/vsc-python-indent

3- Python Docstring Generator

Link: https://github.com/NilsJPWerner/autoDocstring

Visual Studio Code extension to quickly generate docstrings for Python functions.

Features

- Quickly generate a docstring snippet that can be tabbed through.
- Choose from several types of docstring formats.
- Infer parameter types via PEP 484 type hints, default values and variable names.
- Support for args, kwargs, decorators, errors and parameter types.

Docstring Formats

- Google (default)
- docBlockr
- Numpy
- Sphinx
- PEP0257 (coming soon)

Usage

- The cursor must be on the line directly below the definition to generate a complete auto-populated docstring.
- Press enter after opening the docstring with triple quotes (""" or ''').
- Keyboard shortcut: ctrl+shift+2, or cmd+shift+2 for Mac. It can be changed in Preferences -> Keyboard shortcuts -> extension.generateDocstring
- Command: Generate Docstring
- Right-click menu: Generate Docstring

4- Python Extended

Link: https://github.com/tushortz/vscode-Python-Extended

Python Extended is a VS Code snippet extension that makes it easy to write Python code by providing completion options along with all arguments.

Usage

Run VS Code and, in a Python file, type the name of the method to complete and press tab or enter on the selection.

How to install

Open VS Code. Press F1 and search for "ext install" followed by the extension name, in this case "ext install Python Extended" (without the ">"). Or, if you prefer, type ">ext install", press enter, and search for "Python Extended".

5- Python Preview

Link: https://github.com/dongli0x00/python-preview

A Visual Studio Code extension with debug preview support for the Python language.

Requirements

- Install a version of Python 3.6 or Python 2.7. Make sure that the location of your Python interpreter is included in your PATH environment variable.
- It is best to install the Python extension for Python IntelliSense.

6- AREPL for Python

Link: https://github.com/almenon/arepl-vscode

AREPL automatically evaluates Python code in real time as you type.

Usage

First, make sure you have Python 3.7 or higher installed.

Open a Python file and click on the cat in the top right bar to open AREPL. You can click the cat again to close it.

Or run AREPL via the command search (control-shift-p), or use the shortcuts: control-shift-a (current document) / control-shift-q (new document).

Features

- Real-time evaluation: no need to run. AREPL evaluates your code automatically. You can control this (or even disable it) in the settings.
- Variable display: The final state of your local variables is displayed in a collapsible JSON format.
- Error display: The moment you make a mistake, an error is displayed with the stack trace.
- Settings: AREPL offers many settings to suit your user experience.
Customize the look and feel, debounce time, Python options and much more.

7- Python Path

Link: https://github.com/mgesbert/vscode-python-path

This extension adds a set of tools to help generate internal import statements in a Python project.

Features

"Copy Python Path" is accessible from:

- Command line
- Explorer context menu
- Editor context menu
- Editor title context menu

8- Python Test Explorer

Link: https://github.com/kondratyev-nv/vscode-python-test-adapter

This extension allows you to run your Python Unittest, Pytest or Testplan tests with the Test Explorer user interface.

How to get started

- Install the extension.
- Configure Visual Studio Code to discover your tests (see the Configuration section and the documentation for the test framework of your choice: Unittest documentation, Pytest documentation, Testplan documentation).
- Open the sidebar of the test view.
- Execute your tests via the Run icon in the Test Explorer.

Features

- Displays a Test Explorer in the test view in the VS Code sidebar with all detected tests and suites and their status
- Convenient error reporting during test detection
- Unittest, Pytest and Testplan debugging
- Displays the log of a failed test when the test is selected in the explorer
- Test rerun when saving tests
- Supports multi-root workspaces
- Supports Unittest, Pytest and Testplan test frameworks and their plugins

9- Python Snippets

Link: https://github.com/ylcnfrht/vscode-python-snippet-pack

A snippet package to make working with Python more productive. This snippet package contains all of the following Python methods:

- all built-in Python snippets, with at least one example for each method
- all Python string snippets, with at least one example for each method
- all Python list snippets, with at least one example for each method
- all Python set snippets, with at least one example for each method
- all Python tuple snippets, with at least one example for each method
- all Python dictionary snippets, with at least one example for each method

It also contains many other code snippets (such as if/else, for, while, while/else, try/catch, file process), as well as class snippets and class examples for OOP (polymorphism, encapsulation, inheritance, etc.).

If you have never used a method, don't worry: this extension contains a lot of code examples for each Python method. It is not just a set of code snippets; it will also be useful for learning the Python programming language, since you can learn the Python methods from the many code examples.

For example, if you want to use the string replacement method, you just need to use .replace. But if you don't know how to use the replace method, then use string.replace =>

10- Jupyter

Link: https://github.com/Microsoft/vscode-jupyter

A Visual Studio Code extension that provides basic notebook support for language kernels that are compatible with Jupyter Notebooks today. Many language kernels will work without any modifications. To enable advanced features, modifications to the VS Code language extensions may be necessary.

Notebook support

The Jupyter extension uses VS Code's built-in notebook support. This interface offers a number of advantages to notebook users:

- Out-of-the-box support for VS Code's wide range of basic code editing functions, such as hot exit, search and replace, and code folding.
- Editor extensions such as VIM, bracket coloring, linters and many more are available while editing a cell.
- Deep integration with the general workbench and file-based features of VS Code, such as outline view (table of contents), breadcrumbs, and other operations.
- Fast load times for Jupyter notebook (.ipynb) files. Any notebook file is loaded and rendered as quickly as possible, while execution-related operations are initialized behind the scenes.
- Includes a notebook diff tool, which makes it easy to compare and visualize differences between code cells, results and metadata.
- Extensibility beyond what the Jupyter extension provides. Extensions can now add their own specific language or runtime to notebooks, such as the .NET Interactive Notebooks and Gather extensions.
- Although the Jupyter extension comes with a comprehensive set of the most commonly used renderers for output, the marketplace supports installable custom renderers to make working with your notebooks even more productive. To get started writing your own, check out the VS Code renderer API documentation.

Conclusion

There are many extensions that you can use with Visual Studio Code. Deciding which ones to use will involve testing them and reviewing their utilities and use cases, in order to make your work easier while coding!

Daniel Morales

Feb 16, 2021

Understanding Python's Collections Module

