Conference paper (in proceedings)

PoolinGH : fast, efficient, and robust GitHub repository mining

  • 2026
Published in:
  • ACM International Conference on Mining Software Repositories (MSR 2026). - 2026, p. in press
Researchers in Mining (open-source) Software Repositories (MSR) often create datasets that should outlive a single paper and support long-term investigation of specific phenomena. Although popular, these studies recurrently face similar technical limitations. For instance, public collaborative development platforms, such as GitHub, impose hourly rate limits on their API requests. Furthermore, depending on network and API conditions, queries can fail and disrupt the process. These unexpected events can slow down or even invalidate the mining. There are reusable ways to minimize these undesirable effects while still complying with such limitations. However, best practices are often (re-)implemented on an ad hoc basis, using whatever happens to work.
We propose PoolinGH, a lightweight, open-source, easy-to-use library aimed at supporting researchers. It is designed to make mining on the GitHub REST API faster, more efficient, and more robust while taking full advantage of the API's capabilities. PoolinGH automatically pools multiple access tokens and parallelizes queries. It optimizes queues and regulates network and API usage to respect GitHub's limits and best practices. It manages errors and, in case of deadlocks, recovers or prunes. Search coverage maximization and progress monitoring are among the most useful features for avoiding reinventing the wheel. We also provide solution templates that meet common needs for specific extensions of PoolinGH. A preliminary evaluation of these examples, involving tens of thousands of requests, demonstrates tangible gains.
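The token pooling idea mentioned in the abstract can be illustrated with a minimal sketch. This is not PoolinGH's API: the names Token, TokenPool, acquire, and update are hypothetical, and the refill value of 5000 assumes GitHub's standard hourly quota for an authenticated token. The sketch hands out whichever token has the most quota left, and waits for the earliest reset when every token is exhausted. The only GitHub-specific details it relies on are the x-ratelimit-remaining and x-ratelimit-reset response headers.

```python
import time
from dataclasses import dataclass

HOURLY_QUOTA = 5000  # assumption: standard quota for an authenticated token


@dataclass
class Token:
    """One access token plus the quota state last reported for it."""
    value: str
    remaining: int = HOURLY_QUOTA
    reset_at: float = 0.0  # epoch seconds at which the quota refills


class TokenPool:
    """Hypothetical pool that hands out the token with the most quota left."""

    def __init__(self, values, clock=time.time, sleep=time.sleep):
        self.tokens = [Token(v) for v in values]
        self.clock = clock  # injectable so the pool can be tested offline
        self.sleep = sleep

    def acquire(self):
        """Return a usable token, sleeping until the earliest reset if none is."""
        while True:
            now = self.clock()
            for t in self.tokens:
                # An exhausted token whose reset time has passed is refilled.
                if t.remaining <= 0 and now >= t.reset_at:
                    t.remaining = HOURLY_QUOTA
            best = max(self.tokens, key=lambda t: t.remaining)
            if best.remaining > 0:
                best.remaining -= 1
                return best
            # Every token is exhausted: wait for the earliest reset.
            self.sleep(max(0.0, min(t.reset_at for t in self.tokens) - now))

    def update(self, token, headers):
        """Sync quota state from response headers (lowercase keys assumed)."""
        token.remaining = int(headers["x-ratelimit-remaining"])
        token.reset_at = float(headers["x-ratelimit-reset"])
```

In use, a worker would call acquire() before each request, send the request with the returned token, and pass the response headers to update() so that the pool follows the quota reported by the server rather than its own estimate.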
Language
  • English
Classification
Computer science and technology
Notes
  • MSR 2026
  • Rio de Janeiro, Brazil
  • 13-14 Apr 2026
License
CC BY
Open access status
gold
Identifiers
Persistent URL
https://n2t.net/ark:/12658/srd1334960