Blog Post Visitors Prediction



Tournament brackets


How do the quarterfinals work?

This is where the excitement begins! The system will randomly assign pairs of competitors: 4 groups of 2 people. At this point you compete directly with your opponent, not with everyone else. The adrenaline rises every time you submit a new solution, because you want to prove that you are the best in your bracket! A new dataset called TestQuarterfinals.csv will be enabled immediately, with the samples you must now predict, alongside TrainQuarterfinals.csv, which contains the original training data plus the true labels from the Regular Season.

This is where we "reset" the competition: you must re-train your model with more data and make predictions on new data, all within the same data science problem. Our system will highlight in yellow the better score of the two. Keep an eye on the timing, as the quarterfinals last only 1 week! At this point no new participants or contributions will be admitted; this is when the final prize pool to be distributed among the winners is fixed.
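The random pairing described above can be sketched in a few lines. This is only an illustration of the idea, not the platform's actual code, and the competitor names are hypothetical stand-ins:

```python
import random

# Hypothetical names for the 8 quarterfinalists.
competitors = ["Ana", "Ben", "Carla", "Dan", "Eva", "Finn", "Gia", "Hugo"]

random.shuffle(competitors)  # random assignment of opponents
# Slice the shuffled list into 4 groups of 2.
brackets = [competitors[i:i + 2] for i in range(0, len(competitors), 2)]
print(brackets)
```

Each inner list is one bracket: you compete only against the other person in it.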

Machine Learning Problem

Let's say you work as a senior Data Scientist at a marketing agency that offers Search Engine Optimization (SEO) services for corporate clients. As part of its SEO services, the agency recommends that all its clients publish a greater number of articles on their blogs, and as a result have a bigger impact, because Google rewards the creation of unique and frequent content.

So far, the agency has historical data on its clients' posts, including a column that records the number of unique and total visitors for each blog post. The agency believes that with this information it can predict whether a client's new post will be successful, as measured by the number of visits.

This will allow the client to focus on creating relevant posts that are expected to have a high number of visitors (and impact), and avoid creating posts that don't have that same impact. This prediction will save valuable resources (like time and money) for the client and the agency.


The evaluation of the model will be done using the RMSLE (Root Mean Squared Logarithmic Error): the square root of the MSLE metric implemented in scikit-learn. If you want to know more about the MSLE metric that scikit-learn calculates, you can find it here:


RMSLE = sqrt( (1/N) * Σ_{i=1}^{N} ( log(y_i + 1) − log(ŷ_i + 1) )^2 )

where:

N = number of rows in your submission file

y_i = true values

ŷ_i = predicted values
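The metric can be written out directly with NumPy. This is a sketch of the definition above (equivalent to taking the square root of scikit-learn's `mean_squared_log_error`), not the platform's scoring code:

```python
import numpy as np

def rmsle(y_true, y_pred):
    """Root Mean Squared Logarithmic Error over paired true/predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    # log1p(x) computes log(x + 1), matching the formula above.
    return float(np.sqrt(np.mean((np.log1p(y_pred) - np.log1p(y_true)) ** 2)))

print(rmsle([100, 250, 30], [110, 200, 30]))
```

Because the metric works on log(1 + x), it penalizes relative errors rather than absolute ones, which suits visitor counts that span several orders of magnitude.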


  • The code should not be shared privately. Any code that is shared must be available to all participants of the competition through the platform.
  • The solution should use only publicly available open source libraries
  • If two solutions get identical scores in the ranking table, the tie-breaker will be the date and time of the submission (the first solution submitted will win).
  • We reserve the right to request any user's code at any time during a tournament. You will have 48 hours to submit your code following the code review rules.
  • We reserve the right to update these rules at any time.
  • Your solution must not infringe the rights of any third party, and you must be legally authorized to assign ownership of all copyrights in and to the winning solution code.
  • Competitors may register and submit solutions as individuals (not as teams, at least for now).
  • Apart from the rules in the Terms of Use, no other particular rules apply.
  • Maximum 50 solutions submitted per day.

If you reach the Final Stage, at the end of that stage you must submit the complete model in .ipynb (Jupyter Notebook) format through the form (as an attachment) that we'll display for you inside the platform; no other file formats or submission channels will be accepted. Normally, you'll have 3 days (the Final Shot Stage) to send it through our "Submit Modal" button. This final machine learning model will help us run the final evaluations, so the winners will be determined on the basis of the final score and this Notebook.

Within our tournaments everybody wins!

Our tournaments are community-funded: the money that will be distributed is the total amount collected by the community within the established deadlines. In order to participate in the tournament, each competitor must contribute a sum ranging from $USD 10 to $USD 300.

What do you receive with your contribution?
  1. The Machine Learning models of the winners in a Jupyter Notebook format. 
    1. At the end of the tournament we will share with you the winning Machine Learning models in a Notebook format, for educational purposes, as you will be able to study them and learn from the best!
    2. This provides the transparency of the competition, as well as the proof-of-work of the winners!
  2. Learn competitive and applied Machine Learning in a real-world environment. You will learn about the process of participating in a tournament, do your best to advance to the different stages, and for sure you will get an adrenaline rush once you are competing in the playoffs!
  3. Show off your skills to recruiters: we will award certificates of achievement to those who reach the quarterfinals or higher. You can also share your public profile with recruiters, where they will see your achievements in the tournament.
  4. Measure your level of learning, and your mastery of the skills, through the scores you achieve.
  5. And of course, the chance to win money!

The total amount raised will be distributed as follows:
  • First Place: 50% of the total amount, and 20,000 points
  • Second Place: 30% of the total amount, and 15,000 points
  • Third Place: 20% of the total amount, and 10,000 points
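As a worked example of the split, using the current pool of $USD 105 stated below:

```python
# Prize split: 50% / 30% / 20% of the total pool.
total = 105.0  # current pool in USD
shares = {"First": 0.50, "Second": 0.30, "Third": 0.20}
payouts = {place: round(total * pct, 2) for place, pct in shares.items()}
print(payouts)  # {'First': 52.5, 'Second': 31.5, 'Third': 21.0}
```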

For more info, please go to our tournament FAQs:

Total Pool Prize: $USD 105.0

The dataset contains historical information about the number of visitors to different blog posts, which are hosted on different websites. Each blog post has certain features, which will be used to make the predictions. 

The dataset is public, but to keep the tournament transparent and to avoid possible cheating, we will not provide the original column names or further information about the data. The columns have been anonymized as C2, C3, ..., C60, and the only column kept in its original form is the "target" column.

PS: remember to check the tournament rules, because if there is suspicion of cheating, you could be disqualified from the tournament and the platform.

Submission File

For each "id" in the test set, you must predict a number for the "target" variable. The file should contain a header and have the following format:


The total number of predicted rows you need to send in your submission file at this stage is 3,985.
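A minimal sketch of writing the submission file with only the standard library; the ids and target values here are hypothetical stand-ins for your model's predictions on the test set:

```python
import csv

# Hypothetical (id, predicted target) pairs from your model.
predictions = [(1, 1250.0), (2, 87.5), (3, 432.0)]

with open("submission.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["id", "target"])  # header row, as the format requires
    writer.writerows(predictions)
```

A real submission would of course contain one row per id in the test set (3,985 at this stage), not three.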

Datasets: one of the peculiarities of our tournaments is that each stage has new datasets (new observations). This means that the data science problem stays the same, but we release new observations, so competitors must re-train their models on them. This faithfully simulates the arrival of new real data and the improvement of the model based on it. It also keeps the excitement high, as no one's position is completely safe!

Release datasets: just as we will have a TrainQuarterfinals.csv or a TrainSemifinal.csv depending on the stage of the tournament, we will also publish a TestRelease.csv with the true labels of the immediately preceding stage. This keeps each stage transparent, since every competitor can test their model against the true labels in that file.

Quarterfinals datasets: if you advance to the quarterfinals, at this stage you will notice (when you're logged in) a dataset named RegularSeasonRelease.csv (blue button). This file contains the true values from the previous stage (Regular Season), so you can check that the scores you got on the platform are correct (following the evaluation metric). The second dataset to take into account is the new TrainQuarterfinals.csv: it contains the Train.csv of the Regular Season combined with the observations of the Regular Season Test.csv, now carrying their true labels. This means you have more data to re-train your model and improve its fit. The next dataset, TestQuarterfinals.csv, now serves as the test set; it tells you the number of observations to send in your submission file. You will also see SampleSubmission.csv, a sample of the format for sending your results to the platform.
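The score check described above can be sketched as follows. This assumes the release file and your saved submission both have "id" and "target" columns, as in the submission format; the file names in the usage comment are the ones mentioned above:

```python
import csv
import math

def rmsle_from_files(release_path, submission_path):
    """Recompute RMSLE between released true labels and a past submission."""
    with open(release_path) as f:
        truth = {row["id"]: float(row["target"]) for row in csv.DictReader(f)}
    with open(submission_path) as f:
        preds = {row["id"]: float(row["target"]) for row in csv.DictReader(f)}
    # Match predictions to true labels by id, then apply the metric.
    errors = [
        (math.log1p(preds[i]) - math.log1p(t)) ** 2 for i, t in truth.items()
    ]
    return math.sqrt(sum(errors) / len(errors))

# e.g. rmsle_from_files("RegularSeasonRelease.csv", "my_submission.csv")
```

If the value you compute matches the score shown on the platform, your Regular Season evaluation checks out.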




Join our private community in Slack

Keep up to date by participating in our global community of data scientists and AI enthusiasts. We discuss the latest developments in data science competitions, new techniques for solving complex challenges, AI and machine learning models, and much more!

We'll send an invitation link to your email immediately.