Datasets and Benchmarks Track: Call for Papers

Background

As the premier international forum for data mining researchers and practitioners from academia, industry, and government, the ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD 2025) is excited to announce the launch of the Datasets and Benchmarks Track. This new track aims to serve as a premier venue for the presentation of high-quality datasets, benchmarks, and tools that are essential for advancing research and applications in data science, data mining, and data-centric machine learning. It also provides a forum for discussing best practices and standards for dataset creation and benchmark development, ensuring ethical and responsible use.

Important Dates

All deadlines are end-of-day in the Anywhere on Earth (AoE) time zone.

Submission Site

We will use OpenReview to manage the submissions and reviewing. Submissions will not be made public on OpenReview during the reviewing period.

All listed authors must have an up-to-date OpenReview profile. Information on how to create an OpenReview profile is available on the OpenReview site. Please note OpenReview's moderation policy for newly created profiles.

The OpenReview profile will be used to handle conflict-of-interest checks and paper matching. An incomplete OpenReview profile is sufficient grounds for desk rejection.

To be considered complete, each author profile must include the following mandatory fields: current and past institutional affiliations (going back at least five years), homepage, DBLP (if there are prior publications), ORCID, advisors, and recent publications (if any). In addition, other fields such as Google Scholar, LinkedIn, Semantic Scholar, advisees, and other relations should be filled in wherever applicable. Abstracts and papers can be submitted through OpenReview.

Objective

The Datasets and Benchmarks Track is dedicated to fostering the development, sharing, and evaluation of datasets and benchmarks that are valuable for the KDD community. We seek contributions that introduce novel datasets, propose new benchmarks, or offer tools and methodologies for dataset creation, curation, and evaluation. The track supports open science by encouraging the submission of open-source libraries and tools that accelerate research in data science and machine learning.

Evaluation Criteria

Submissions will be reviewed with the same rigor as the main KDD conference but tailored to the specific needs and challenges of datasets and benchmarks. The key evaluation criteria include:

Relationship to KDD

Submissions to the track will be part of the main KDD conference, presented alongside the main conference papers. Accepted papers will be officially published in the KDD proceedings.

Scope of Submissions

We welcome submissions in the following categories:

Submission Guidelines

Papers must be prepared in the ACM two-column proceedings format using the acmart document class:

\documentclass[sigconf,review]{acmart}
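As an illustrative sketch only (not the official KDD template; consult the ACM acmart documentation for authoritative instructions, and note that all titles, names, and addresses below are placeholders), a minimal submission skeleton built on this document class might look like:

```latex
% Minimal sketch of an ACM sigconf submission; placeholder content throughout.
\documentclass[sigconf,review]{acmart}

\begin{document}

\title{Your Dataset or Benchmark Paper Title}

\author{First Author}
\affiliation{%
  \institution{Institution Name}
  \city{City}
  \country{Country}}
\email{first.author@example.org}

\begin{abstract}
A concise summary of the dataset or benchmark contribution.
\end{abstract}

\maketitle

\section{Introduction}
Describe the dataset, its collection methodology, and intended uses.

\end{document}
```

The `review` option enables line numbers for reviewers; it is dropped for the camera-ready version.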

Ethical Standards and Reproducibility

KDD 2025 emphasizes ethical research practices and reproducibility. Submissions must adhere to ethical guidelines, including considerations around data privacy, consent, and bias mitigation. The reviews will not be publicly visible during the review phase but will be published post-decision. Accepted datasets should be accessible to reviewers and can be released publicly at a later date.

Program Committee Co-Chairs

Ambuj Singh (UCSB)

Haixun Wang (EvenUp)

Email: KDD25-benchmark-chairs@acm.org