How to build your Ultimate Data Science Portfolios | A Great Data Scientist Builds Products that Matter

Vincent Tatan
Apr 30, 2020



Build your portfolio (Unsplash)

“I am going to build a fitness tracker to analyse my fitness/diet metrics”
“I have these Tableau dashboards that I worked on. No plan, just for fun”

A few days ago, I talked with a Data Scientist/ML Engineer from a reputable startup. As a professional who had just finished his studies, he was ambitious and enthusiastic. He asked to meet me at Google and wrote down everything we talked about. He was excited to share his data science projects and willing to pore over any interesting technical book that would help him get ahead in his side hustles and Kaggle competitions.

If he reminds you of yourself or your aspiring data scientist friends, you are not alone. In fact, I have met many fellow juniors and professionals in the data and tech industry who consistently use their free time and weekends to build their portfolios. Data science is a fast-moving industry with trends that evolve every quarter. You either build your portfolio or fall behind your peers.

But have you asked yourself: “Is this the right approach?”

Don’t get me wrong, I enjoyed talking to him and I admire his boundless enthusiasm for building side projects. He shared his interesting learning experiences and I have no doubt he will learn a lot more. But the way he promoted the complex technical skills behind his side projects did not make sense to me.

They lacked impact.

Build things that matter in your limited time


If we have the time to work on the best portfolios, why do we spend most of it showcasing our technical skills rather than the impact we made? Why do we prioritize working on Kaggle competitions rather than solving the problems our friends have that data analytics can solve? Wouldn’t it be better to say that you helped your peers earn money with your stock-picking model than that you ranked 30th in a Kaggle competition?

This is exactly what, in my opinion, separates the good data scientists from the great data scientists.

Good vs Great Data Scientists: Build your Products with Impacts

A good data scientist has a large learning repository. He knows how to make beautiful dashboards. He builds better NN models to classify the MNIST dataset. He runs highly complex trading algorithms that would take a person years to learn.

This is good, but it is not enough to create impact.

What you need to become a great data scientist is a great product with impact. Products deliver value that your users benefit from. They are evidence that your skills made a difference to society. Ultimately, when you go to a data science interview, you will need to prove that you can solve problems and bring value to the table.

Therefore, a great data scientist uses his dashboards to build prediction formulas that help stop the coronavirus from spreading to millions of people. A great data scientist uses his NN model to classify phishing attacks and protect millions of users from account hijacking. In short, a great data scientist has audiences, products, and impact in his portfolio.

Your Products will be your ultimate Portfolios

When you communicate your products as a portfolio, you will become a Subject Matter Expert (SME) regardless of your educational background. Consequently, analytics recruiters will headhunt you rather than you looking for them. When you go for interviews, you will have exciting stories to tell rather than a boring list of technical skills and certificates.

Build your ultimate portfolio by building products, knowing your audience, and delivering impact.

Three Keys to build your ultimate portfolios


1. Aim for simple solutions with impacts for your audiences


Who’s your audience? That is always the first question to ask (Unsplash)

The danger arises when you are more interested in building your skills than in building your audience. Take the person from our opening story. He planned to build a fitness tracker to analyse his own fitness/diet metrics. A great project, but one with no impact on anyone but himself.

Similarly, many aspiring data scientists and fellow juniors I know focus only on building complex models that do not bring value to their audiences. In the Netflix Prize competition, the winning team did not account for the engineering effort needed to implement their recommendation model. Although they won with stunning accuracy, their solution was too complex. As a result, Netflix paid $1 million for a machine learning model that it never put into production.

Instead, you should build your products with your audience’s needs in mind. This forces you to build a more realistic data analysis pipeline. You analyze real business problems, extract dirty data, clean it, engineer features, and then design, deploy, and maintain the model.

In university courses and MOOC certifications, the problem formulation and a clean dataset are usually already handled for us. Building solutions for real audiences is therefore more challenging. But in the long term, you will get satisfaction from building products that matter. In any data interview, you will have amassed stories to promote your real projects rather than boasting about your technical skills and Kaggle competitions.
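
To make the clean-up and feature-engineering steps of such a pipeline concrete, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration: the shop sales records, the column names, and the helper functions `to_number`, `clean`, and `add_features` are hypothetical and do not come from any real project.

```python
from statistics import median

# Hypothetical raw export from a friend's small shop (all values made up).
RAW_ROWS = [
    {"date": "2020-04-01", "revenue": "1,200.50", "orders": "40"},
    {"date": "2020-04-02", "revenue": "", "orders": "35"},
    {"date": "2020-04-03", "revenue": "900", "orders": "n/a"},
    {"date": "2020-04-03", "revenue": "900", "orders": "n/a"},  # duplicate row
]


def to_number(text):
    """Parse a messy numeric string; return None when it cannot be parsed."""
    try:
        return float(text.replace(",", "").strip())
    except ValueError:
        return None


def clean(rows):
    """Drop repeated dates, parse numbers, and fill gaps with the column median."""
    seen, parsed = set(), []
    for row in rows:
        if row["date"] in seen:
            continue
        seen.add(row["date"])
        parsed.append({
            "date": row["date"],
            "revenue": to_number(row["revenue"]),
            "orders": to_number(row["orders"]),
        })
    for column in ("revenue", "orders"):
        known = [r[column] for r in parsed if r[column] is not None]
        fill = median(known)
        for r in parsed:
            if r[column] is None:
                r[column] = fill
    return parsed


def add_features(rows):
    """Feature engineering: average revenue per order for each day."""
    return [dict(r, revenue_per_order=r["revenue"] / r["orders"]) for r in rows]


cleaned = add_features(clean(RAW_ROWS))
for row in cleaned:
    print(row)
```

Filling gaps with the median is only one possible choice. On a real project, the right way to handle missing values depends on what your audience needs from the numbers.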

2. Build solutions from your domain knowledge


Source (Unsplash)

I have talked to many juniors and professionals who are moving from other industries into the data/tech industry. I once talked to a woman enrolled in the PhD program in Chemistry at the National University of Singapore (NUS) who came to me for advice on working as a data scientist in the tech industry. She had never worked in tech, but she wanted to learn Python and develop her coding skills from scratch.

While it is admirable for people to bravely jump into a different field, I strongly advised her against doing this without a solid plan. After all, what advantage would she have over the thousands of Computer Science or Business Analytics graduates who have spent four years on their studies? How would she catch up on the skills and compete when there is already so much hype around data analytics?

This is a huge leap of faith.

Therefore, I suggested that she stick with her studies in chemistry: become the best chemist first and venture into analytics later. Why? Because she does not need to start from scratch. She already has a large asset, her domain knowledge in chemistry, and she could learn just the analytics skills needed to build chemistry products and demonstrate her capabilities. She should not throw away her domain knowledge. She should use it as a platform to find her place in analytics.

Similarly, if you are a finance student, build a stock research tool. If you are an operations management or industrial engineering student, build Six Sigma optimization tools. That allows you to leverage your existing knowledge to create more meaningful data analytics work than if you started learning analytics from scratch.

Ideally, you should aim for the Pareto principle. Most data analytics problems are solvable using simple models such as linear regression and decision trees. If you use your existing domain knowledge well, 20% of your effort should already create 80% of the impact.
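
As a rough illustration of how far a simple model can go, the sketch below fits a straight line with ordinary least squares in plain Python. The concentration and signal values are made up, and `fit_line` and `predict` are hypothetical helper names. A chemist might recognise this as a calibration curve, but the same few lines apply to any two related quantities.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Unnormalised covariance of x and y, and variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(x, slope, intercept):
    """Evaluate the fitted line at a new x value."""
    return slope * x + intercept


# Made-up measurements for illustration only.
concentration = [1.0, 2.0, 3.0, 4.0, 5.0]
signal = [3.1, 4.9, 7.2, 8.8, 11.0]

slope, intercept = fit_line(concentration, signal)
print(f"signal = {slope:.2f} * concentration + {intercept:.2f}")
print(f"predicted signal at 6.0: {predict(6.0, slope, intercept):.2f}")
```

The domain knowledge is what tells you which two quantities are worth relating in the first place. The model itself stays small.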

3. Deploy and Communicate your Solutions


My presentation about CNN models for Data Science Singapore at Google

If you want to become a great data scientist, you will need to publish your work. Open up your solutions so other people can use them. Let people contribute to your GitHub. Write and speak about it. The more people find value in your work, the more likely they are to spread your products. Over time, you will build your portfolio and personal brand while educating others, alongside your professional work.

In my case, I use GitHub to share my code and learning materials, YouTube and Medium to communicate my thoughts in videos and writing, and Heroku to launch my Python applications. I have received income and traffic from my solutions, which reinforces the value of sharing my knowledge to benefit others. Building products also allows me to take ownership of my work while building my personal brand and stories.

Conclusion


If you play it right, you will reap plenty of benefits. First, you will build your portfolio and have fun seeing the impact you make. Second, you will promote your personal brand and online visibility for job and conference opportunities. Finally, you will train yourself to write and tell stories that inspire your audience to act. All of these benefits will open up opportunities for you to become a great data scientist.

In summary, to achieve these results, you will need to:
  1. Aim for simple solutions with impacts: Focus on building your audience rather than purely building your skills.
  2. Build solutions from your domain knowledge: Don’t start from scratch to pursue data science. Leverage your existing talents and domain knowledge to build data science products for your peers.
  3. Deploy and Communicate your solutions: Deploy and market your products. Get them used by many people and keep track of the impact you make. This gives you further inspiration and stories to augment your learning as a great data scientist.

Soli Deo Gloria

Finally…


I really hope this has been a great read and a source of inspiration for you to develop and innovate.

Please comment below with suggestions and feedback. Just like you, I am still learning how to become a better Data Scientist and Engineer. Please help me improve so that I can help you better in my subsequent articles.

Thank you and Happy coding :)

About the Author

Vincent Tatan is a Data and Technology enthusiast with relevant working experience at Google LLC, Visa Inc., and Lazada implementing microservice architectures, business intelligence, and analytics pipeline projects.

Vincent is a native Indonesian with a record of accomplishments in problem-solving and strengths in Full Stack Development, Data Analytics, and Strategic Planning.

He has been actively consulting for the SMU BI & Analytics Club, guiding aspiring data scientists and engineers from various backgrounds, and offering his expertise to businesses developing their products.

Lastly, please reach out to Vincent via LinkedIn, Medium, or his YouTube channel.


3 Comments
  1. Víctor Manuel
    about 1 year ago
    Correct, the same thing happens to me: I still need to put my knowledge into practice with real projects.
  2. team-latinmail-com
    about 1 year ago
    Hi Etienne, thanks for your comment! It's true, we often focus on improving technically but not on doing real-life projects. DevOps is a really interesting topic, and a lot is happening there in data science. Regards!
  3. Etienne Mercier
    about 1 year ago
    Very interesting post. I have two university diplomas in this field and I am fairly junior at it, although I have talked with other data scientists and they have similar opinions: nobody is solving business problems. I have doubts about DevOps; I think I also lack the time there to do interesting things. I have zero experience in it. Thank you very much for your post.
