Keyword Recency Prediction
Competitors
  • Choonghwan Lee
  • MatiSepul-en
  • EugeniaP-en
  • Pranjali
  • StalinJavier04-en
  • BrandonV-en
116 competitors · Published: 07/01/2021
Total Prize
$2,000

Public Leaderboard

Columns: Ranking, Data Scientist, Country, # Submissions, Last submission, Best Score

Timeline

Begin
2021/07/03
Finish
2021/09/03
Complete
2021/09/10

Competition start: 2021/07/03 00:01:00
Competition closes on: 2021/09/03 23:59:00
Final Submission Limit: 2021/09/10 23:59:00

This competition runs for a total of two months, during which you can make submissions and receive results automatically. Once this first phase is over, you will have one week to choose your best model and submit it to be scored and considered for the cash prizes.


Description

Welcome to our next exciting competition!

For this challenge, we have teamed up with Battelle Memorial Institute - one of the most respected names in the global scientific & research community - to launch a Data Science competition that can help to dramatically accelerate the pace of global innovation. The goal of this project is to break down several barriers that currently stand in the way of advanced research publications getting noticed, and receiving prompt recognition from the world's brightest minds. This competition will also offer cash prizes to the authors of the top two ML models, as determined by our platform's evaluation algorithm. Please read on for more details, and good luck!

About Battelle (battelle.org)


Battelle is solving the world’s most pressing challenges. We deliver when others can’t. We conduct research and development, manage laboratories, design and manufacture products, and deliver critical services for our clients – whether you are a multi-national corporation, a small start-up organization or a government agency. We are valued for our independence and ability to innovate.

We are part of a community working to encourage the discovery of new and interesting research in Artificial Intelligence and Machine Learning, especially in languages other than English. Much of the research being done in these fields is easily available on the web through sites like Arxiv.org, but many interesting discoveries are happening every day in different corners of the internet that may take time to identify and bring to the attention of the rest of the community. 

This is especially true of research in a language other than English, which may easily be missed by much of the community. We are passionate about finding the best current research and identifying trends so that the cutting edge can continue to be pushed. To push in that direction, we devised a problem that attempts to measure when new ideas are being discussed, in any language. Based on a metric for the recency of keywords, how can we identify when a research paper is bringing forth new ideas so that we can better isolate them?
 
The Problem
The data is a collection of 42,912 abstracts from recent publications, along with the language and year of publication. The abstracts have author-given keywords associated with them, and they have been given scores based on the average number of years that those keywords show up in our database. The goal of this competition is to build a model that takes in the abstract, the language, and the publication year, and predicts the recency score. These models will be scored based on the accuracy of their predictions.


Evaluation

The model will be evaluated using RMSLE (Root Mean Squared Logarithmic Error). This is simply the square root of the MSLE metric as implemented in scikit-learn.

If you want to know more about the MSLE metric that scikit-learn calculates, you can find it here: https://scikit-learn.org/stable/modules/model_evaluation.html#mean-squared-log-error

RMSLE = √( (1/N) · Σᵢ₌₁ᴺ ( log(pᵢ + 1) − log(aᵢ + 1) )² )

Where:

N = number of rows of the dataset Test.csv

aᵢ = true value

pᵢ = predicted value
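The formula above can be computed in a couple of lines. A minimal sketch, assuming scikit-learn is installed and using made-up example values (`y_true`, `y_pred` are placeholders for the real targets and predictions):

```python
import numpy as np
from sklearn.metrics import mean_squared_log_error

# Hypothetical ground-truth recency scores and model predictions.
# Both must be non-negative, since MSLE takes log(x + 1).
y_true = np.array([0.54, 0.63, 0.12, 0.90])
y_pred = np.array([0.50, 0.70, 0.10, 0.85])

# RMSLE is the square root of scikit-learn's mean_squared_log_error.
rmsle = np.sqrt(mean_squared_log_error(y_true, y_pred))
print(rmsle)
```

Note that a submission containing negative values will make MSLE undefined, which matches the issue discussed in the comments below.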


Rules

Competition Rules

  1. The code should not be shared privately. Any code that is shared must be available to all participants of the competition through the platform.
  2. The solution should use only publicly available open source libraries
  3. If two solutions get identical scores in the ranking table, the tie-breaker will be the date and time of the submission (the first solution submitted will win).
  4. We reserve the right to request any user's code at any time during a challenge. You will have 72 hours to submit your code following the code review rules.
  5. We reserve the right to update these rules at any time.
  6. Your solution must not infringe the rights of any third party and you must be legally authorized to assign ownership of all copyrights in and to the winning solution code to the competition host/sponsor.
  7. Competitors may register and submit solutions as individuals (not as teams, at least for now).
  8. Apart from the rules in the DataSource.ai Terms of Use, general competition rules, and code requirement rules apply.
  9. Maximum 50 solutions submitted per day.
  10. The winners' models and write-ups are intended to be published on the Neuralberry.org website.

At the end of the competition you must submit the complete model in .ipynb (Jupyter Notebook) format; no other formats will be accepted. You will normally have one week after the end of the competition to send it through our "Submit Final Model" button. This model is used to compute the real final evaluations, so the Private Leaderboard may change once the final private evaluation is shown.

Additionally, if you are among the top 5 finalists, you must meet these code requirements to be eligible to win the cash prize.


Prizes

There are TWO winners for this competition, awarded on the basis of private leaderboard rank.

  • 1st place: USD $1,500
  • 2nd place: USD $500

For this competition we also want to give a very special gift to the 3rd and 4th places!

We will ship this prize to any country or city in the world! (made by https://www.devwear.co/)


*The hoodie is unisex (for men or women)


Total Score Scale

These will be the awards in platform points once the competition is over:

  • 1st Place: 30,000 pts + USD $1,500
  • 2nd Place: 29,000 pts + USD $500
  • 3rd Place: 28,000 pts + Python Hoodie (Delivery to any city around the world)
  • 4th Place: 27,000 pts + Python Hoodie (Delivery to any city around the world)
  • 5th Place: 26,000 pts
  • 6th Place: 25,000 pts
  • 7th Place: 24,000 pts
  • 8th Place: 23,000 pts
  • 9th Place: 22,000 pts
  • 10th Place: 21,000 pts

Total Prize: $2,000


The data is a collection of 32,184 abstracts from recent publications, along with the language and year of publication. The abstracts have author-given keywords associated with them, and they have been given scores based on the average number of years that those keywords show up in our database. The goal of this competition is to build a model that takes in the abstract, the language, and the publication year, and predicts the recency score.

Data fields
  • Language: language in which the papers are written
  • Year: year of paper publication
  • Abstract: paper abstract
  • Title: paper title
Target var
  • total_rel_score: metric calculating recency

The total_rel_score was calculated using the year of publication of the paper and the year in which the paper's keyword first appeared in another document. Essentially a value close to 1 means that it is a recent paper (given its keywords), and a value close to 0 means that it is an older paper. The task is to predict this value for a given set of features (Language, Year, Abstract and Title).
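To make the task concrete, here is a minimal baseline sketch under the assumption that the training file has columns named as in "Data fields" above: TF-IDF features on the abstract text feeding a ridge regressor for total_rel_score. The tiny stand-in DataFrame is illustrative only; in the competition you would load the provided training CSV instead.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline

# Tiny stand-in data; replace with the real training file.
train = pd.DataFrame({
    "Abstract": ["deep learning for vision", "classical statistics survey",
                 "transformer language models", "linear regression basics"],
    "total_rel_score": [0.9, 0.2, 0.95, 0.1],
})

# TF-IDF on the abstract text + ridge regression on the recency score.
model = make_pipeline(TfidfVectorizer(), Ridge(alpha=1.0))
model.fit(train["Abstract"], train["total_rel_score"])
preds = model.predict(train["Abstract"])
print(len(preds))  # one prediction per row
```

A real solution would also incorporate the Language, Year, and Title fields and validate against RMSLE, but this shows the basic shape of the pipeline.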

Submission file
For each "id" in the test set, you must predict a label for the "total_rel_score" variable. The file should contain a header and have the following format:

id,total_rel_score
1,0.545714
2,0.635714
3,0.532713
4,0.335710
5,0.135714
6,0.535710
....
10725,0.187
10726,0.525
10727,0.014
10728,0.690

For this competition stage, your submission file must meet these requirements:

# of columns: 2
Column names: id,total_rel_score
# of rows: 10729
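A file in that shape can be assembled with pandas. A short sketch, where `preds` is a placeholder for your model's predictions on the 10,729 test rows:

```python
import pandas as pd

n_test = 10729
preds = [0.5] * n_test  # placeholder predictions

# Two columns, header included, ids numbered as in the sample above.
submission = pd.DataFrame({
    "id": range(1, n_test + 1),
    "total_rel_score": preds,
})
submission.to_csv("submission.csv", index=False)
print(submission.shape)  # (10729, 2)
```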


4 Comments
  1. ottob
    ottob
    25 days ago
    Personally, I'm trying to use Keras for Bag-of-words. I'll see if I can use other NLP advanced methods like BERT
  2. 5hr3ya5h
    5hr3ya5h
    25 days ago
    Which algos are you guys using?
  3. Daniel Morales
    Daniel Morales
    29 days ago
    Hi Santiago. Thanks for letting us know. The problem has been fixed; it should now automatically validate negative and null values. The file you had submitted contained a single negative value; we changed it to positive and ran the metric manually, giving a result of 0.13573531711075593 for that file. Keep going, we hope to see you in the top spots at the end of the competition!
  4. Santiago Serna
    Santiago Serna
    29 days ago
    Hi, there is a problem with the metric evaluation: if there is any negative value, the result is 0.
