Author of public dataset requesting co-authorship: usual?

2021-07-26

An answer to this question on the Academia Stack Exchange.

Question

I have come across a published dataset which comes with a note saying that (1) the authors of the dataset would like to be informed of any papers that use it, and (2) they might request co-authorship depending on how much the paper depends on the data.

I am feeling ambivalent about using this dataset. On the one hand, I appreciate that the authors chose to make it public, despite the many man-hours of work that went into creating it. On the other hand, it seems unusual that the creator of an already published dataset would request co-authorship on a paper they did not otherwise contribute to.

I am interested in using such a dataset for an idea, but I am reluctant to put significant work into a project when there is a risk (even if a small risk) that the dataset's creator may interfere with publication. Accepting co-authorship essentially means agreeing that the co-author may delay publication or may attempt to shape the paper, possibly in ways I am unhappy with.

Are such requests common? Are they reasonable? Is it reasonable to ignore such a request, given that the dataset is public?

I understand that this question may read like a nitpick. Realistically, I don't expect trouble. Yet it bothers me that starting to work with this dataset appears to essentially require agreeing that my work may be interfered with. It seems like a rather unreasonable "have your cake and eat it" mentality on the part of the dataset author when releasing the data.

Answer

All of the thoughts you raise about co-authorship are negative!

The data folks might delay publication
The data folks might shape the paper

As others have said, the dataset is in the public domain, so you are not required to make the data folks co-authors. But I think you should consider the many benefits:

You build new collaborations. Later, this can lead to letters of recommendation, conference invites, early notification about new datasets, and someone on a faculty selection committee who knows your name.
The data folks understand their data. You may think that you do, but data can have subtle issues. A co-authorship incentivizes the data folks to help you interpret their data as accurately as possible.
The data folks may have more, unpublished data. Bringing them aboard as co-authors could give you the opportunity to explore the topic in a broader or more nuanced way if it provides the data folks an avenue for getting more data out there.
The data folks turn out to be good writers and your final paper is better for bringing them aboard. Personally, I know I often grow tired of a paper in the final stages of working with it. Having coauthors continue to raise nit-picks is annoying, but when we submit I'm much more confident that the final product is of high quality.
You learn more about collaboration/coordination/leadership. Increasingly, science is a team sport, so playing it that way works in your favour.

When I go to seminars, I'm always awed by the final slides of the presentation where the speaker shows the veritable army of collaborators they've led in producing their Science Thing. Correlation isn't causation, but if you want to go around and give seminars, collaborating widely seems like it helps get you there.

Build a big tent :-)