What's new in DataSource.ai?

Daniel Morales
Apr 16, 2021

Hello everyone, I'm Daniel Morales, co-founder of DataSource.ai, and today I want to share some very good news and the progress we are making toward building an excellent global community of data scientists.

New CEO


This is perhaps the most important news of all: we have a new CEO, who has spent the last 3 months working at DataSource.ai to understand our internal processes, the competitions, the community, and the technology we work with. This is a huge achievement for us, since he has extensive experience in tech companies and startups, with more than 20 years working for Fortune 500 companies such as IBM, Cisco, and AT&T. He is based in San Francisco, CA, in the heart of Silicon Valley. His name is Dimitry Kushelevsky, and you can contact him on LinkedIn or by email at dimitry@datasource.ai.

Our main goal with Dimitry is to fulfill our mission of democratizing Artificial Intelligence for small and medium businesses. That means creating a great culture, having a non-technical leader to grow the team, and bringing in sponsors for data science competitions. Sponsors get value from the Machine Learning models submitted by our community, and the community in turn gets prize money, constantly and consistently. Welcome, Dimitry!


Competitions

So far we have held 6 competitions, and we are in the middle of the seventh. Over more than a year we have learned a great deal about competitions: how they work in detail, how to host them, how to evaluate them, how to automate tasks, and much more. We have also learned from you, from those who have won competitions and those who have filled out our feedback forms. We would like to take this opportunity to thank you for this!

Based on this knowledge we have made a number of changes that are worth sharing with you. 


Discussions within competitions




This is a feature you asked for many times in the surveys, so it is now enabled for all competitions. If you have questions or comments, you can post them here.


Maximum of 50 submissions per day



Once a competitor reaches 50 submissions in a day, the submit button is disabled, because the daily maximum has been reached.

If a competitor is sending this many models per day, they are probably doing it automatically, trying to overfit Test.csv, which is good neither for that competitor nor for the others.
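
To see why a very large number of submissions tends to overfit Test.csv, here is a small simulation sketch. It assumes a binary-classification competition scored by accuracy, and every "model" is just a coin flip, so none of them has any real skill. The function names, the dataset size, and the seed are illustrative assumptions and have nothing to do with how the DataSource.ai platform is implemented.

```python
import random


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def best_of_n_random_submissions(n_submissions, n_rows=200, seed=0):
    """Submit n random 'models' and keep the one with the best public score.

    Returns (public_score, unseen_score) for the selected submission.
    """
    rng = random.Random(seed)
    # Hypothetical labels: one set plays the role of Test.csv,
    # the other plays the role of data nobody has been scored against.
    public_labels = [rng.randint(0, 1) for _ in range(n_rows)]
    unseen_labels = [rng.randint(0, 1) for _ in range(n_rows)]

    best_public, best_unseen = -1.0, -1.0
    for _ in range(n_submissions):
        # A coin-flip "model": it produces predictions for both datasets.
        public_pred = [rng.randint(0, 1) for _ in range(n_rows)]
        unseen_pred = [rng.randint(0, 1) for _ in range(n_rows)]
        public_score = accuracy(public_labels, public_pred)
        if public_score > best_public:
            best_public = public_score
            best_unseen = accuracy(unseen_labels, unseen_pred)
    return best_public, best_unseen


if __name__ == "__main__":
    for n in (1, 50, 1000):
        public, unseen = best_of_n_random_submissions(n)
        print(f"{n:>5} submissions: public={public:.3f} unseen={unseen:.3f}")
```

Even with no real skill, the best public score can only stay the same or go up as submissions accumulate, while the score on unseen data has no reason to follow it. That gap is what overfitting to the public test set looks like.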

Remember to always choose your best models to send, so you don't have to wait until the next day. As an additional tip, we recommend making several different splits of Train.csv and using them as hold-out test sets, so that you can simulate new, unseen data and run the competition's evaluation metric on them. This way you will be more confident about the likely results when you send the csv file to our platform.
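
The tip above can be sketched as follows. This is a minimal, standard-library-only illustration: it assumes the target column of Train.csv has already been loaded into a list, uses mean absolute error as a stand-in for the competition metric, and uses a trivial "predict the training mean" baseline. The names `holdout_splits` and `mean_absolute_error` and the toy target values are assumptions for illustration; in practice you would plug in your own model and the metric stated in the competition rules.

```python
import random
from statistics import mean


def holdout_splits(rows, n_splits=5, holdout_fraction=0.2, seed=42):
    """Yield (train_rows, holdout_rows) pairs from different random shuffles."""
    rng = random.Random(seed)
    n_holdout = max(1, int(len(rows) * holdout_fraction))
    for _ in range(n_splits):
        shuffled = rows[:]          # copy so the caller's list is untouched
        rng.shuffle(shuffled)
        yield shuffled[n_holdout:], shuffled[:n_holdout]


def mean_absolute_error(y_true, y_pred):
    """Stand-in metric; replace with the competition's own metric."""
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))


# Toy target values standing in for the target column of Train.csv.
targets = [3.0, 5.0, 4.0, 6.0, 8.0, 7.0, 5.0, 4.0, 6.0, 9.0]

scores = []
for train, holdout in holdout_splits(targets, n_splits=5):
    baseline = mean(train)                      # "model": predict the train mean
    predictions = [baseline] * len(holdout)
    scores.append(mean_absolute_error(holdout, predictions))

print("hold-out MAE per split:", [round(s, 3) for s in scores])
print("average hold-out MAE:", round(mean(scores), 3))
```

If the scores vary a lot from split to split, that is a hint that a single score on Test.csv should not be trusted too much either.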


Competition completion process


This is perhaps the most important change we have made within the platform, so pay close attention.

The normal process of participation within the competition is as follows:
  1. You download the Train.csv dataset.
  2. You do exploratory data analysis (EDA) and build a baseline model.
  3. You run .predict on the Test.csv dataset.
  4. You create a csv following the guidelines of the SampleSubmission.csv file.
  5. You upload the csv to our platform to obtain the score.
  6. You appear on the public leaderboard.
  7. You keep working on the model with more advanced techniques and try different models.
  8. You repeat the submission process.
  9. You get different scores (and you improve them).
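
Steps 1 to 5 above could look roughly like the sketch below. It uses only the Python standard library and in-memory CSV text so that it runs anywhere. The column names (`id`, `feature`, `target`), the toy rows, and the "predict the training mean" baseline are all assumptions for illustration; a real competition defines its own columns in Train.csv, Test.csv, and the sample submission file.

```python
import csv
import io
from statistics import mean

# Toy stand-ins for the competition files (hypothetical columns and values).
TRAIN_CSV = "id,feature,target\n1,0.5,10\n2,0.7,14\n3,0.1,6\n"
TEST_CSV = "id,feature\n4,0.3\n5,0.9\n"
SAMPLE_SUBMISSION_CSV = "id,target\n4,0\n5,0\n"


def read_rows(csv_text):
    """Parse CSV text into a list of dicts, one per row."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def fit_baseline(train_rows):
    """Baseline 'model': remember the mean of the target column."""
    return mean(float(row["target"]) for row in train_rows)


def predict(model, test_rows):
    """Predict the same value (the training mean) for every test row."""
    return [model for _ in test_rows]


def build_submission(sample_csv, test_rows, predictions):
    """Write predictions using the header of the sample submission."""
    header = next(csv.reader(io.StringIO(sample_csv)))
    id_col = header[0]  # assumes the first column is the row identifier
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header)
    for row, pred in zip(test_rows, predictions):
        writer.writerow([row[id_col], pred])
    return out.getvalue()


train_rows = read_rows(TRAIN_CSV)                   # step 1
model = fit_baseline(train_rows)                    # step 2
test_rows = read_rows(TEST_CSV)
predictions = predict(model, test_rows)             # step 3
submission = build_submission(                      # step 4
    SAMPLE_SUBMISSION_CSV, test_rows, predictions
)
print(submission)  # step 5: this is the text you would upload as a .csv
```

Copying the header from the sample submission, rather than typing it by hand, is a simple way to keep the file in the expected format.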

This is the normal process, but it is prone to overfitting: after many rounds of submissions, the model with the best score may well have been overfitted to the data in Test.csv. That is why we have decided to introduce a new dataset, released at the end of the competition, which acts as a "real life" dataset that no model has been tuned against. We call this dataset FinalTest.csv. The process for sending the final model is as follows:




  1. Once the date is reached (see the competition timeline), the dataset called FinalTest.csv is enabled.
  2. You download it to your environment.
  3. You choose your best model (the one that has given you the best score so far with Test.csv).
  4. You run .predict on FinalTest.csv.
  5. Be careful, because you will have only ONE chance to send this last model, so choose well.
  6. You create the csv following the guidelines of the SampleSubmission.csv file. In the final form (from the Submit Final Model button) you must include:
    1. The csv to obtain the score
    2. The .ipynb (Notebook)
    You will no longer need to send the Notebook to our email address.
  7. You will see your final score on the screen, but it will not be immediately reflected in the private leaderboard.
  8. You will have a period of one week to make this submission.
  9. At the end of this week, all scores and the private leaderboard will be revealed.
  10. The private leaderboard is the one we will use for points, gifts and/or cash prizes.
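
Because the final submission can be sent only once, it may be worth checking the file locally before uploading it. The sketch below compares a submission with the sample submission file: same header, same number of rows, same ids in the same order, and no empty predictions. These checks, the helper name `check_submission`, and the toy file contents are assumptions about what a sensible pre-upload check could look like; they are not the platform's actual validation rules.

```python
import csv
import io


def check_submission(submission_csv, sample_csv):
    """Return a list of problems found; an empty list means none detected."""
    sub = list(csv.reader(io.StringIO(submission_csv)))
    ref = list(csv.reader(io.StringIO(sample_csv)))
    problems = []
    if not sub or not ref:
        return ["submission or sample file is empty"]
    if sub[0] != ref[0]:
        problems.append(f"header {sub[0]} differs from expected {ref[0]}")
    if len(sub) != len(ref):
        problems.append(f"{len(sub) - 1} rows found, {len(ref) - 1} expected")
    # Assumes the first column holds the row identifier.
    if [r[0] for r in sub[1:] if r] != [r[0] for r in ref[1:] if r]:
        problems.append("ids differ from the sample file (values or order)")
    if any(cell.strip() == "" for row in sub[1:] for cell in row[1:]):
        problems.append("at least one prediction is empty")
    return problems


# Hypothetical file contents for illustration.
sample = "id,target\n101,0\n102,0\n103,0\n"
good = "id,target\n101,0.8\n102,0.1\n103,0.5\n"
bad = "id,prediction\n101,0.8\n103,\n"

print(check_submission(good, sample))
for problem in check_submission(bad, sample):
    print("-", problem)
```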


Timeline of completion


Following the previous example, as you can see in this image, the (fictitious) competition started on March 22, ended on April 14, and will be completed on April 21. That means:

  1. Until April 14 you can use the Test.csv dataset to make your predictions and appear on the public leaderboard.
  2. The final submission window opens the following day and lasts one week; it closes definitively on April 21. During this period you must send your predictions on the FinalTest.csv dataset.
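
The dates above follow a simple rule: the final submission window opens the day after the competition ends and closes one week after the end date. Here is a small sketch of that rule using the fictitious dates from the example. The year 2021 and the function names are assumptions for illustration.

```python
from datetime import date, timedelta


def final_window(end_date):
    """Return (first_day, last_day) of the final submission window."""
    return end_date + timedelta(days=1), end_date + timedelta(days=7)


def phase(today, start_date, end_date):
    """Say which dataset, if any, is accepting submissions on a given day."""
    first_day, last_day = final_window(end_date)
    if today < start_date:
        return "not started"
    if today <= end_date:
        return "public leaderboard (Test.csv)"
    if first_day <= today <= last_day:
        return "final submission (FinalTest.csv)"
    return "closed"


start, end = date(2021, 3, 22), date(2021, 4, 14)
print(final_window(end))
print(phase(date(2021, 4, 10), start, end))
print(phase(date(2021, 4, 16), start, end))
print(phase(date(2021, 4, 22), start, end))
```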

Late submissions




Some competitions will allow late participation for academic and learning purposes. If the competition is already over, it means that the respective prizes (points, prizes in kind or money) have already been assigned, so it is not possible to participate late to win them. 

But you can download the datasets, play with them, have fun, learn, send your results, get your scores, and finally appear on the public leaderboard. This is a good way to keep practicing and demonstrating your data science skills!


Certificate of participation in the competitions




Under Dashboard > My Profile > Certificates you can find the certificates for your participations. You are awarded a certificate if you finish in the first 10 places of the private leaderboard; certificates are assigned automatically once the final date is reached. They can also be added automatically to your LinkedIn profile as "certificates":



Click “Add to profile” and the following will appear:


And then you can share your achievements with recruiters!

Here is an example of the certificate in PDF



Your public profile


We have also made some changes to the public profiles so that they show your participation in competitions, which you can then show to recruiters and the community.





We are currently working on other great opportunities within the competitions, so expect news soon!


Join our private community on Discord

Keep up to date by participating in our global community of data scientists and AI enthusiasts. We discuss the latest developments in data science competitions, new techniques for solving complex challenges, AI and machine learning models, and much more!