In the information filtering (or publish/subscribe) paradigm, clients subscribe to a server with
continuous queries that express their information needs, while information sources publish
documents to servers. Whenever a document is published, the continuous queries that match this
document are identified and notifications are sent to the corresponding subscribed clients. Although
information filtering has been on the research agenda for about half a century, the field suffers
from a notable paradox: there is a striking lack of a benchmarking mechanism, in the form of a
large-scale standardised test collection of continuous queries and the associated document
publications, created specifically for evaluating filtering tasks. This work aims to fill this gap
by proposing a methodology for automatically
creating massive continuous query datasets from available document collections. We intend to
publicly release all related material (including the software accompanying the proposed
methodology) to the research community after publication.