Join here on 28 June at 15:30 to watch the datathon kick-off and ask your questions.
On Wednesday 30 June at 13:30, watch the contestants' presentations and the winners' award ceremony here.
Datathon Award Ceremony
Challenges |
|
Challenge #1 - Fake news diffusion in Twitter
|
|
Category
|
|
Fake news, Social networks
|
|
Short Description
|
|
The target of this challenge is to show how fake news propagates through social networks. Fake news data from the GreekHoaxes website will be matched against Twitter data to extract tweet propagation graphs that demonstrate the fake news diffusion.
|
|
(Sub-)tasks
|
|
An important challenge in current research is the identification of fake news, as it appears in digital media. Several methods have been proposed that are based on machine learning techniques and use various characteristics of the text to identify fake content. Such fake content usually propagates in social media like Twitter, as users retweet the content of the original tweet to their followers. Some of the proposed methods use the Twitter propagation graphs of fake news to extract patterns that are subsequently used to identify fake tweets that follow similar diffusion patterns.
In this challenge the objective is to produce a Twitter diffusion graph for a fake news story. The data that will be provided is:
The desired outcome is a graph whose nodes correspond to Twitter users, and edges indicate retweets of tweets relevant to the given fake news story.
Participants are expected to provide a detailed step-by-step roadmap that describes how to produce the propagation graph from the given input. This roadmap may include a mixture of pseudocode, code, a selection of tools with instructions on how to use them, diagrams, and whatever else is needed to specify the solution completely and accurately. Using this roadmap, a coder should be in a position to implement the solution without having to make any technical design decisions.
Note. The roadmap should address the general case, where the fake news story is not known in advance. In this general case, fake news stories are scraped from a website such as GreekHoaxes and for each story a diffusion graph is produced.
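As an illustration of the desired outcome described above, here is a minimal sketch of how a propagation graph could be assembled once the retweets relevant to a story have been collected. It assumes the retweets are already available as (retweeting user, original author) pairs; the function names and the sample user names are hypothetical and are not part of the challenge data.

```python
from collections import defaultdict


def build_diffusion_graph(retweets):
    """Build a directed, weighted retweet graph.

    `retweets` is an iterable of (retweeter, original_author) pairs.
    Returns {author: {retweeter: count}}; an edge author -> retweeter
    means the retweeter propagated the author's tweets `count` times.
    """
    graph = defaultdict(lambda: defaultdict(int))
    for retweeter, author in retweets:
        if retweeter == author:
            continue  # ignore self-retweets
        graph[author][retweeter] += 1
    # Convert nested defaultdicts to plain dicts for easier inspection.
    return {author: dict(targets) for author, targets in graph.items()}


def reachable_users(graph, source):
    """Return the set of users reached from `source` along retweet edges."""
    seen, stack = set(), [source]
    while stack:
        user = stack.pop()
        for nxt in graph.get(user, {}):
            if nxt != source and nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


# Hypothetical sample: bob retweets alice twice, carol once, dave retweets bob.
sample = [("bob", "alice"), ("carol", "alice"), ("dave", "bob"), ("bob", "alice")]
graph = build_diffusion_graph(sample)
reached = reachable_users(graph, "alice")
```

Nodes are users and edge weights count retweets, so the same structure can be exported to any graph tool for visualization. Matching tweets to a story, which is the harder part of the challenge, is deliberately left out of this sketch.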
|
|
Potential Considerations
|
|
[1] Agichtein, E., Castillo, C., Donato, D., Gionis, A., & Mishne, G. (2008). Finding high-quality content in social media. WSDM.
[2] Abel, F., Hauff, C., Houben, G., Stronkman, R., & Tao, K. (2012). Semantics + filtering + search = Twitcident. Exploring information in social web streams. HT.
[3] MacEachren, A.M., Jaiswal, A.R., Robinson, A.C., Pezanowski, S., Savelyev, A., Mitra, P., Zhang, X., & Blanford, J.I. (2011). SensePlace2: GeoTwitter analytics support for situational awareness. 2011 IEEE Conference on Visual Analytics Science and Technology (VAST), 181-190.
|
|
Recommended technologies and tools
|
|
The participants could consider the following tools:
|
|
This challenge is supported by the project "Moving from Big Data Management to Data Science" (MIS 5002437/3) and by VisualFacts (#1614), funded by the Hellenic Foundation for Research and Innovation - 1st Call of Research Projects for the support of post-doctoral researchers. |
|
Challenge #2 - Scientific Impact Measures | |
Category | |
Citation Networks, Scientometrics | |
Short Description | |
The target of this challenge is to produce datasets that contain scientific impact data for scientific venues or papers, focusing on specific challenges. Publication data from Crossref will be combined with BiP! DB data to calculate the proposed scientific impact measures. | |
(Sub-)tasks | |
Due to the constant increase in published scientific papers, an important challenge in current research is to identify those with the highest impact. Several methods for ranking papers, authors, and journals by scientific impact have been proposed in the literature, most of them based on citation counts. However, relying on citation counts has inherent drawbacks: for example, citation patterns differ between scientific fields, which makes it difficult to compare papers across fields. To address this, a number of field-normalized indicators of impact have been proposed, such as the Field-Weighted Citation Impact (FWCI). Further drawbacks remain, however: for example, recent publications with few or no citations cannot be reliably assessed based on their impact. Additionally, a number of impact indicators based on the venue of publication have been proposed, such as the Impact Factor and the Eigenfactor. These indicators have in turn been criticized for being heavily influenced by a few highly cited papers in each venue.
Distinct aims of this challenge that may be tackled are the following:
The data that will be provided is:
The desired outcome is an open dataset of either paper or venue impact measures, along with the software to produce it. Alternatively, participants may provide a detailed step-by-step roadmap that describes how to produce the data from the given input. This roadmap may include a mixture of pseudocode, code, a selection of tools with instructions on how to use them, diagrams, and whatever else is needed to specify the solution completely and accurately. Using this roadmap, a coder should be in a position to implement the solution without having to make any technical design decisions.
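To make the idea of field normalization mentioned above concrete, here is a minimal sketch of a simplified field-normalized citation score: each paper's citation count divided by the mean citation count of papers in the same field and publication year. This is an illustrative simplification, not the official FWCI definition, and the record layout (`id`, `field`, `year`, `citations`) is an assumption rather than the schema of Crossref or BiP! DB.

```python
from collections import defaultdict


def field_normalized_scores(papers):
    """Return {paper id: citations / mean citations of its (field, year) group}.

    `papers` is a list of dicts with the hypothetical keys
    'id', 'field', 'year', and 'citations'.
    """
    totals = defaultdict(lambda: [0, 0])  # (field, year) -> [citation sum, paper count]
    for p in papers:
        key = (p["field"], p["year"])
        totals[key][0] += p["citations"]
        totals[key][1] += 1

    scores = {}
    for p in papers:
        total, count = totals[(p["field"], p["year"])]
        mean = total / count
        # A group with no citations at all gets a score of 0.0 by convention here.
        scores[p["id"]] = p["citations"] / mean if mean > 0 else 0.0
    return scores


# Hypothetical sample: raw counts differ by 10x between fields,
# but the normalized scores are directly comparable.
papers = [
    {"id": "p1", "field": "math", "year": 2019, "citations": 2},
    {"id": "p2", "field": "math", "year": 2019, "citations": 6},
    {"id": "p3", "field": "bio", "year": 2019, "citations": 20},
    {"id": "p4", "field": "bio", "year": 2019, "citations": 60},
]
scores = field_normalized_scores(papers)
```

A score above 1 means the paper is cited more than the average paper of its field and year. The sketch does not address the other drawback noted above: very recent papers still have too few citations for the ratio to be meaningful.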
|
|
Potential Considerations | |
[1] Kanellos I, Vergoulis T, Sacharidis D, Dalamagas T, Vassiliou Y. Impact-based ranking of scientific publications: a survey and experimental evaluation. IEEE Transactions on Knowledge and Data Engineering. 2019 Sep 13.
[2] Vergoulis T, Kanellos I, Atzori C, Mannocci A, Chatzopoulos S, Bruzzo SL, Manola N, Manghi P. BiP! DB: A dataset of impact measures for scientific publications. In Companion Proceedings of the Web Conference 2021, 2021 Apr 19 (pp. 456-460).
[3] Lariviere V, Sugimoto CR. The journal impact factor: A brief history, critique, and discussion of adverse effects. In Springer Handbook of Science and Technology Indicators, 2019 (pp. 3-24). Springer, Cham.
[4] https://www.snowballmetrics.com/wp-content/uploads/snowball-metrics-recipe-book-upd.pdf
|
|
Recommended technologies and tools | |
The participants could consider the following tools:
|
|
This challenge is supported by the project "Moving from Big Data Management to Data Science" (MIS 5002437/3) and by VisualFacts (#1614), funded by the Hellenic Foundation for Research and Innovation - 1st Call of Research Projects for the support of post-doctoral researchers.
|
|
Challenge #3 - Visual Analytics for Mobility Data | |
Category | |
Data visualization, Visual Analytics | |
Short Description | |
The target of this challenge is to analyze the NYC Yellow Taxi Trip Dataset (Taxi) [1] and provide novel interactive visualization methods and visual analytics for the following analytical tasks. | |
(Sub-)tasks | |
Participants are expected to address the following tasks; however, other processing steps and pipelines may be considered as well.
Task 1. Visual Analytics for Traffic Prediction
Note. The data prediction and visualization are intended for use in interactive applications, so real-time response is required. The focus of this task is on novel visualization methods that allow users to run well-known prediction algorithms (e.g., regression) and interact with the predicted results.
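As a reference point for the kind of well-known prediction algorithm mentioned in the note, here is a minimal sketch of ordinary least-squares linear regression in plain Python. The hourly pickup counts are made-up numbers used only for illustration; a real submission would derive them from the Taxi dataset and would likely use a richer model.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Evaluate a fitted (slope, intercept) model at x."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical pickups per hour for one zone.
hours = [0, 1, 2, 3]
pickups = [10, 12, 14, 16]
model = fit_line(hours, pickups)
next_hour = predict(model, 4)
```

Fitting a model this small takes negligible time, which fits the real-time requirement; in an interactive tool the fit could be recomputed whenever the user changes the selected zone or time window.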
Task 2. Visual Analytics for Dirty Data
Taxi and other mobility companies aggregate data from various sources, ending up with dirty/duplicate entries related to trips. A common task in such cases involves an analyst who wishes to analyze the dataset with respect to data quality. Visual techniques may reveal information (e.g., correlations, patterns) that is not easily captured by traditional (non-visual) methods. In our case, visually analyzing information related to duplicate entries will help the analyst recognize the data patterns, values, or specific attributes where duplicate records appear. Beyond improving the understanding of data quality, these insights will enable the analyst to improve the effectiveness of their deduplication techniques.
Note. The duplicate records will be given to the participants beforehand.
Note. The data prediction and visualization are intended for use in interactive applications, so real-time response is required. |
|
Potential Considerations | |
[1] The NYC Yellow Taxi Trip dataset (https://www1.nyc.gov/site/tlc/about/tlc-trip-record-data.page) consists of CSV files containing information about yellow taxi rides in NYC. Each record refers to a specific taxi ride described by several attributes, such as pick-up location, trip distance, payment type, passenger count, tip amount, etc.
An example of Taxi data visualization on a map can be found in the RawVis system (http://rawviz.imis.athena-innovation.gr), which has been implemented to visualize raw data, e.g., CSV files.
[2] https://eng.uber.com/h3/
[3] https://eng.uber.com/visualizing-city-cores-with-h3/
[4] The Taxi dataset has been generated by aggregating data from several data sources. As a result, a large number of records (rides) are (almost) the same. For example, two records of the same ride may differ in the precision of their latitude/longitude values.
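The precision issue described in [4] can be illustrated with a minimal sketch that groups rides whose pickup time matches and whose coordinates agree after rounding. The field names (`id`, `pickup_time`, `lat`, `lon`) and the three-decimal threshold are assumptions made for this example; they are not the column names of the Taxi dataset, and the duplicate records for the challenge are provided to participants separately.

```python
from collections import defaultdict


def duplicate_groups(rides, decimals=3):
    """Group ride ids that share a pickup time and rounded coordinates.

    `rides` is a list of dicts with the hypothetical keys
    'id', 'pickup_time', 'lat', and 'lon'.
    Returns a list of id lists, one per group with more than one ride.
    """
    buckets = defaultdict(list)
    for ride in rides:
        key = (
            ride["pickup_time"],
            round(ride["lat"], decimals),
            round(ride["lon"], decimals),
        )
        buckets[key].append(ride["id"])
    return [ids for ids in buckets.values() if len(ids) > 1]


# Hypothetical sample: r1 and r2 are the same ride at different precision.
rides = [
    {"id": "r1", "pickup_time": "2015-01-01 10:00:00", "lat": 40.7581, "lon": -73.9857},
    {"id": "r2", "pickup_time": "2015-01-01 10:00:00", "lat": 40.75813, "lon": -73.98568},
    {"id": "r3", "pickup_time": "2015-01-01 11:30:00", "lat": 40.7128, "lon": -74.0060},
]
groups = duplicate_groups(rides)
```

Rounding is a coarse heuristic: two nearly identical coordinates that fall on opposite sides of a rounding boundary end up in different buckets, so a distance threshold would be more robust. The resulting group sizes per attribute value are the kind of quantity a visual analysis of duplicates could plot.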
|
|
Recommended technologies and tools | |
The participants are encouraged to use either
|
|
This challenge is supported by the project "Moving from Big Data Management to Data Science" (MIS 5002437/3) and by VisualFacts (#1614), funded by the Hellenic Foundation for Research and Innovation - 1st Call of Research Projects for the support of post-doctoral researchers.
|
Committees |
|
SCIENTIFIC COMMITTEE Datathon Action Days | |
Responsibilities: Defining the challenges - problems to be solved - motto - participant profile |
|
SELECTION COMMITTEE Datathon Action Days (TBC) | |
|
Prizes |
|
1st Prize: A three-day trip to Thrace for the whole winning team! Guided tour of the facilities of the “Athena” Research Center in Xanthi and Athens, mentoring for one (1) month |
|
2nd Prize: 1 laptop, guided tour of the facilities of the “Athena” Research Center in Athens, mentoring for one (1) month |
|
3rd Prize: 1 tablet, guided tour of the facilities of the “Athena” Research Center in Athens, mentoring for one (1) month |