Study Plan for Learning Data Science Over the Next 12 Months

Daniel Morales
Jan 12, 2021

Contents Outline

Study Plan for Learning Data Science Over the Next 12 Months

Jan 12, 2021 17 minutes read

Here you will find a study plan divided by semesters so that you can start on January 1st

We are ending 2020 and it is time to make plans for next year, and one of the most important plans and questions we must ask is what do we want to study?, what do we want to enhance?, what changes do we want to make?, and what is the direction we are going to take (or continue) in our professional careers?.

Many of you will be starting on the road to becoming a data scientist, in fact you may be evaluating it, since you have heard a lot about it, but you have some doubts, for example about the amount of job offers that may exist in this area, doubts about the technology itself, and about the path you should follow, considering the wide range of options to learn.

I’m a believer that we should learn from various sources, from various mentors, and from various formats. By sources I mean the various virtual platforms and face-to-face options that exist to study. By mentors I mean that it is always a good idea to learn from different points of view and learning from different teachers/mentors, and by formats I mean the choices between books, videos, classes, and other formats where the information is contained.

When we extract information from all these sources we reinforce the knowledge learned, but we always need a guide, and this post aims to give you some practical insights and strategies in this regard.

To decide on sources, mentors and formats it is up to you to choose. It depends on your preferences and ease of learning: for example, some people are better at learning from books, while others prefer to learn from videos. Some prefer to study on platforms that are practical (following online code), and others prefer traditional platforms: like those at universities (Master’s Degree, PHDs or MOOCs). Others prefer to pay for quality content, while others prefer to look only for free material. That’s why I won’t give a specific recommendation in this post, but I’ll give you the whole picture: a study plan.

To start you should consider the time you’ll spend studying and the depth of learning you want to achieve, because if you find yourself without a job you could be available full time to study, which is a huge advantage. On the other hand, if you are working, you’ll have less time and you’ll have to discipline yourself to be able to have the time available in the evenings, mornings or weekends. Ultimately, the important thing is to meet the goal of learning and perhaps dedicating your career to this exciting area!

We will divide the year into quarters as follows

  • First Quarter: Learning the Basics
  • Second Quarter: Upgrading the Level: Intermediate Knowledge
  • Third Quarter: A Real World Project — A Full-stack Project
  • Fourth Quarter: Seeking Opportunities While Maintaining Practice
First Quarter: Learning the Basics

Unsplash: @tateisimikito

If you want to be more rigorous you can have start and end dates for this period of study of the bases. It could be something like: From January 1 to March 30, 2021 as deadline. During this period you will study the following:

A programming language that you can apply to data science: Python or R.

We recommend Python due to the simple fact that approximately 80% of data science job offers ask for knowledge in Python. That same percentage is maintained with respect to the real projects you will find implemented in production. And we add the fact that Python is multipurpose, so you won’t “waste” your time if at some point you decide to focus on web development, for example, or desktop development. This would be the first topic to study in the first months of the year.

Familiarize yourself with statistics and mathematics.

There is a big debate in the data science community about whether we need this foundation or not. I will write a post later on about this, but the reality is that you DO need it, but ONLY the basics (at least in the beginning). And I want to clarify this point before continuing.

We could say that data science is divided in two big fields: Research on one side and putting Machine Learning algorithms into production on the other side. If you later decide to focus on Research then you are going to need mathematics and statistics in depth (very in depth). If you are going to go for the practical part, the libraries will help you deal with most of it, under the hood. It should be noted that most job offers are in the practical part.

For both cases, and in this first stage you will only need the basics of:

  • Statistics (with Python and NumPy)
  1. Descriptive statistics
  2. Inferential Statistics
  3. Hypothesis testing
  4. Probability
  • Mathematics (with Python and NumPy)
  1. Linear Algebra
  2. Multivariate Calculation
Note: We recommend that you study Python first before seeing statistics and mathematics, because the challenge is to implement these statistical and mathematical bases with Python. Don’t look for theoretical tutorials that show only slides or statistical and/or mathematical examples in Excel/Matlab/Octave/SAS and other different to Python or R, it gets very boring and impractical! You should choose a course, program or book that teaches these concepts in a practical way and using Python. Remember that Python is what we finally use, so you need to choose well. This advice is key so you don’t give up on this part, as it will be the most dense and difficult.

If you have these basics in the first three months, you will be ready to make a leap in your learning for the next three months.

Second Quarter: Upgrading the Level: Intermediate Knowledge

If you want to be more rigorous you can have start and end dates for this period of study at the intermediate level. It could be something like: From April 1 to June 30, 2021 as deadline.

Now that you have a good foundation in programming, statistics and mathematics, it is time to move forward and learn about the great advantages that Python has for applying data analysis. For this stage you will be focused on:

Data science Python stack

Python has the following libraries that you should study, know and practice at this stage

  • Pandas: for working with tabular data and make in-depth analysis
  • Matplotlib and Seaborn: for data visualization
Pandas is the in-facto library for data analysis, it is one of the most important (if not the most important) and powerful tools you should know and master during your career as a data scientist. Pandas will make it much easier for you to manipulate, cleanse and organize your data.

Feature Engineering

Many times people don’t go deep into Feature Engineering, but if you want to have Machine Learning models that make good predictions and improve your scores, spending some time on this subject is invaluable!

Feature engineering is the process of using domain knowledge to extract features from raw data using data mining techniques. These features can be used to improve the performance of machine learning algorithms. Feature engineering can be considered as applied machine learning itself. To achieve the goal of good feature engineering you must know the different techniques that exist, so it is a good idea to at least study the main ones.

Basic Models of Machine Learning

At the end of this stage you will start with the study of Machine Learning. This is perhaps the most awaited moment! This is where you start to learn about the different algorithms you can use, which particular problems you can solve and how you can apply them in real life.

The Python library we recommend you to start experimenting with ML is: scikit-learn. However it is a good idea that you can find tutorials where they explain the implementation of the algorithms (at least the simplest ones) from scratch with Python, since the library could be a “Black Box” and you might not understand what is happening under the hood. If you learn how to implement them with Python, you can have a more solid foundation.

If you implement the algorithms with Python (without a library), you will put into practice everything seen in the statistics, mathematics and Pandas part.

These are some recommendations of the algorithms that you should at least know in this initial stage

  • Supervised learning
    Simple Linear Regression
    Multiple Linear Regression
    K-nearest neighbors (KNN)
    Logistic Regression
    Decision Trees
    Random Forest
  • Unsupervised Learning
    K-Means
Bonus: if you have the time and you are within the time ranges, you can study these others

  • Gradient Boosting Algorithms
    GBM
    XGBoost
    LightGBM
    CatBoost
Note: do not spend more than the 3 months stipulated for this stage. Because you will be falling behind and not complying with the study plan. We all have shortcomings at this stage, it is normal, go ahead and then you can resume some concepts that did not understand in detail. The important thing is to have the basic knowledge and move forward!

If at least you succeed to study the mentioned algorithms of supervised and unsupervised learning, you will have a very clear idea of what you will be able to do in the future. So don’t worry about covering everything, remember that it is a process, and ideally you should have some clearly established times so that you don’t get frustrated and feel you are advancing.

So far, here comes your “theoretical” study of the basics of data science. Now we’ll continue with the practical part!

Third Quarter: A Real World Project — A Full-stack Project

If you want to be more rigorous you can have start and end dates for this period of study at the intermediate level. It could be something like: From July 1 to September 30, 2021 as deadline.

Now that you have a good foundation in programming, statistics, mathematics, data analysis and machine learning algorithms, it is time to move forward and put into practice all this knowledge.

Many of these suggestions may sound out of the box, but believe me they will make a big difference in your career as a data scientist.

The first thing is to create your web presence:

  • Create a Github (or GitLab) account, and learn Git. Being able to manage different versions of your code is important, you should have version control over them, not to mention that having an active Github account is very valuable in demonstrating your true skills. On Github, you can also set up your Jupyter Notebooks and make them public, so you can show off your skills as well. This is mine for example: https://github.com/danielmoralesp
  • Learn the basics of web programming. The advantage is that you already have Python as a skill, so you can learn Flask to create a simple web page. Or you can use a template engine like Github Pages, Ghost or Wordpress itself and create your online portfolio.
  • Buy a domain with your name. Something like myname.com, myname.co, myname.dev, etc. This is invaluable so you can have your CV online and update it with your projects. There you can make a big difference, showing your projects, your Jupyter Notebooks and showing that you have the practical skills to execute projects in this area. There are many front-end templates for you to purchase for free or for payment, and give it a more personalized and pleasant look. Don’t use free sub-domains of Wordpress, Github or Wix, it looks very unprofessional, make your own. Here is mine for example: https://www.danielmorales.co/
  • Add all the exercises and projects you have done so far, in the previous 6 months, to your online portfolio. You already have material to make yourself known, no matter how professional your Jupyter Notebooks look. My Jupyter Notebooks for now I’m uploading them here: https://www.narrativetext.co/
Choose a project you are passionate about and create a Machine Learning model around it.

The final goal of this third quarter is to create ONE project, that you are passionate about, and that is UNIQUE among others. It turns out that there are many typical projects in the community, such as predicting the Titanic Survivors, or predicting the price of Houses in Boston. Those kinds of projects are good for learning, but not for showing off as your UNIQUE projects.

If you are passionate about sports, try predicting the soccer results of your local league. If you are passionate about finance, try predicting your country’s stock market prices. If you are passionate about marketing, try to find someone who has an e-commerce and implement a product recommendation algorithm and upload it to production. If you are passionate about business: make a predictor of the best business ideas for 2021 :)

As you can see, you are limited by your passions and your imagination. In fact, those are the two keys for you to do this project: Passion and Imagination.

However don’t expect to make money from it, you are in a learning stage, you need that algorithm to be deployed in production, make an API in Flask with it, and explain in your website how you did it and how people can access it. This is the moment to shine, and at the same time it’s the moment of the greatest learning.

You will most likely face obstacles, if your algorithm gives 60% of Accuracy after a huge optimization effort, it doesn’t matter, finish the whole process, deploy it to production, try to get a friend or family member to use it, and that will be the goal achieved for this stage: Make a Full-stack Machine Learning project.

By full-stack I mean that you did all the following steps:

  • You got the data from somewhere (scrapping, open data or API)
  • You did a data analysis
  • You cleaned and transformed the data
  • You created Machine Learning Models
  • You deployed the best model to production for other people to use.
This does not mean that this whole process is what you will always do in your daily job, but it does mean that you will know every part of the pipeline that is needed for a data science project for a company. You will have a unique perspective!

Fourth Quarter: Seeking Opportunities While Maintaining Practice

If you want to be more rigorous you can have start and end dates for this period of study at the final level. It could be something like: From October 1 to December 31, 2021 as deadline.

Now you have theoretical and practical knowledge. You have implemented a model in production. The next step depends on you and your personality. Let’s say you are an entrepreneur, and you have the vision to create something new from something you discovered or saw an opportunity to do business with this discipline, so it’s time to start planning how to do it. If that’s the case, obviously this post won’t cover that process, but you should know what the steps might be (or start figuring them out).

But if you are one of those who want to get a job as a data scientist, here is my advice.

Getting a job as a data scientist

“You’re not going to get a job as fast as you think, if you keep thinking the same way”.
Author
It turns out that all people who start out as data scientists imagine themselves working for the big companies in their country or region. Or even remote. It turns out that if you aspire to work for a large company like data scientist you will be frustrated by the years of experience they ask for (3 or more years) and the skills they request.

Large companies don’t hire Juniors (or very few do), precisely because they are already large companies. They have the financial muscle to demand experience and skills and can pay a commensurate salary (although this is not always the case). The point is that if you focus there you’re going to get frustrated!

Here we must return to the following advise: “You need creativity to get a job in data science”.

Like everything else in life we have to start at different steps, in this case, from the beginning. Here are the scenarios

  • If you are working in a company and in a non-engineering role you must demonstrate your new skills to the company you are working for. If you are working in the customer service area, you should apply it to your work, and do for example, detailed analysis of your calls, conversion rates, store data and make predictions about it! If you can have data from your colleagues, you could try to predict their sales! This may sound funny, but it’s about how creatively you can apply data science to your current work and how to show your bosses how valuable it is and EVANGELIZE them about the benefits of implementation. You’ll be noticed and they could certainly create a new data related department or job. And you already have the knowledge and experience. The key word here is Evangelize. Many companies and entrepreneurs are just beginning to see the power of this discipline, and it is your task to nurture that reality.
  • If you are working in an area related to engineering, but that is not data science. Here the same applies as the previous example, but you have some advantages, and that is that you could access the company’s data, and you could use it for the benefit of the company, making analyses and/or predictions about it, and again EVANGELIZING your bosses your new skills and the benefits of data science.
  • If you are unemployed (or do not want, or do not feel comfortable following the two examples above), you can start looking outside, and what I recommend is that you look for technology companies and / or startups where they are just forming the first teams and are paying some salary, or even have options shares of the company. Obviously here the salaries will not be exorbitant, and the working hours could be longer, but remember that you are in the learning and practice stage (just in the first step), so you can not demand too much, you must land your expectations and fit that reality, and stop pretending to be paid $ 10,000 a month at this stage. But, depending of your country $1.000 USD could be something very interesting to start this new career. Remember, you are a Junior at this stage.
The conclusion is: don’t waste your time looking at and/or applying to offers from big companies, because you will get frustrated. Be creative, and look for opportunities in smaller or newly created companies.

Learning never stops

While you are in that process of looking for a job or an opportunity, which could take half of your time (50% looking for opportunities, 50% staying in practice), you have to keep learning, you should advance to concepts such as Deep Learning, Data Engineer or other topics that you feel were left loose from the past stages or focus on the topics that you are passionate about within this group of disciplines in data science.

At the same time you can choose a second project, and spend some time running it from end-to-end, and thus increase your portfolio and your experience. If this is the case, try to find a completely different project: if the first one was done with Machine Learning, let this second one be done with Deep learning. If the first one was deployed to a web page, that this second one is deployed to a mobile platform. Remember, creativity is the key!

Conclusion

We are at an ideal time to plan for 2021, and if this is the path you want to take, start looking for the platforms and media you want to study on. Get to work and don’t miss this opportunity to become a data scientist in 2021!

I hope you enjoyed this reading! you can follow me on twitter or linkedin

Thank you for reading!
Join our private community in Discord

Keep up to date by participating in our global community of data scientists and AI enthusiasts. We discuss the latest developments in data science competitions, new techniques for solving complex challenges, AI and machine learning models, and much more!