Status

PROPOSED 

Outcome
Minutes Link

What is the policy on accepting DCOs from contributors operating under pseudonyms?

Some possibilities:

  • Always.
  • Never; DCO fields must contain "Real Names".
  • If the real identity is known to the TSC, it is acceptable.
  • If the real identity is known to a maintainer of the project, it is acceptable.
  • If the real identity is known to the maintainer merging the commit, it is acceptable.
  • If the commit contains a signed-off-by from someone using their real identity who knows who the person represented by the pseudonym is, it is acceptable.
  • If the commit contains at least one signed-off-by by someone using their real identity, it is acceptable.
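
As a rough illustration of how the last option could be checked mechanically, the sketch below parses the Signed-off-by trailers of a commit message and tests whether at least one comes from an identity the project already knows. The function names and the `known_emails` set are hypothetical; how a project would actually establish that an identity is "real" is exactly the open question here, so the set is only an assumed input.

```python
import re

# Matches a DCO trailer such as "Signed-off-by: Jane Doe <jane@example.com>".
SIGNOFF_RE = re.compile(
    r"^Signed-off-by:[ \t]*(?P<name>[^<>\n]+?)[ \t]*<(?P<email>[^<>\s]+)>[ \t]*$",
    re.IGNORECASE | re.MULTILINE,
)


def signoffs(commit_message):
    """Return (name, lowercased email) for every Signed-off-by trailer found."""
    return [
        (m.group("name"), m.group("email").lower())
        for m in SIGNOFF_RE.finditer(commit_message)
    ]


def has_known_signoff(commit_message, known_emails):
    """True if at least one sign-off uses an email in known_emails.

    known_emails is a hypothetical, project-maintained set of addresses
    whose owners' identities are considered established.
    """
    known = {email.lower() for email in known_emails}
    return any(email in known for _, email in signoffs(commit_message))
```

For example, a commit signed off by both a pseudonymous author and a known maintainer would pass `has_known_signoff`, while one signed off only by the pseudonym would not. A check like this only confirms that a trailer is present; it does not verify that the named person actually wrote it.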

Background

Blockchains, cryptocurrencies, and cryptography in general have a long history of people contributing and operating under pseudonyms: from Captain Crunch to Satoshi Nakamoto.  Some contributors prefer to operate under a pseudonym to avoid online harassment, either because they have been harassed or because they have reason to believe they would be.

Regardless of the reasons for contributors "masking" themselves, we would benefit from clarity regarding how to handle contributions from anonymous and pseudonymous contributors.  Are they required to "unmask", and if so, to what degree and to whom (completely or to trusted parties)?

To complicate things, there are also individuals who operate under pseudonyms that appear and act like real identities.  If the policy is anything but "always", what duties do maintainers have when they discover that a contributor with a pseudonym that passes as a real identity is contributing or has contributed?

A related issue would be whether maintainers would be able to operate under a pseudonym and what degree of unmasking would be required if allowed.

Reviewed By

24 Comments

  1. I think the whole purpose of the DCO is to be able to trace a commit back to its author, who is asserting that they have the right to make the contribution etc etc. IMO the signed-off-by needs to be a real email address that we can use to contact the submitter should we need to.

  2. You can operate under a pseudonym AND have a real email address. Case in point is Satoshi herself who was reachable using an email address. A person can assert that they have the right to contribute the code, but falsely. There is not much recourse to this state of affairs.

  3. I followed up with another email addressing some of this.  Is there any way we could get someone from legal to address DCO stuff?

    1. Yes, I brought it up and Brian's going to ask the LF lawyers for their take on this.


      1. Thanks a lot!  Do we have a timeline on when we might hear back?

        1. No but let's see what Brian reports back.

  4. LF legal looked at the item and were wondering what underlying need was motivating the ask.  "In the Linux kernel for example the maintainers are expected to know the identity of anyone whose patches they're contributing. The real issue is if there was ever a legal matter, would the person be identifiable and available because we have their identity."  I was going to bring that question back to here but fell behind. 

    The risk of taking a DCO from someone that can't be identified and reached is that a challenge to the provenance of that code can't be answered - basically anyone could claim "that was mine, you accepted stolen property" and there'd be no one to refute that or take the blame for it.  In which case there'd be a very difficult decision - fight in court without any testimony that the code wasn't stolen, or purge the code and require a clean-room rewrite.  Those seem like awful paths to have to take, for the price of more vigilance up front.

    Given this is a matter of legal liability, it's not a decision the TSC can make; at best it could recommend a change to the Governing Board and LF, but it's the GB and LF that need to weigh that risk as they're the ones who would bear the costs of any legal action.

    I wasn't on Hyperledger on day zero, but one thing I recall hearing is that one reason it was formed was to provide a space safe from anonymous contributors who may come along later seeking rent.  I remember specifically hearing that if it turned out Craig Wright was Satoshi, then the Australian patents he (much later) filed on Bitcoin architecture could be leveraged against anyone in the Bitcoin community, in part because the license on the code was MIT and thus came with no patent grants.  I think we want to avoid that risk. 

    However I know the term "real identity" is highly problematic.  We aren't storing Social Security numbers or DNA or anything like that.  The DCO is attached to the commit or PR, from which we can get the Github account name, but that doesn't necessarily come with a real name or even a contactable email address, which is also a problem when we pull together the voter lists for the TSC election.  Are each of you sure you'd be able to get in contact with all submitters of PRs you've accepted?  Even good, real people have their email addresses go bad or name changes and then can't be reached.  So this isn't about providing a hermetic seal around the problem, more showing good faith and intent in ensuring we don't receive stolen or patent-covered code.


    I'll try and get more clarity.  Til then, please document any instances where people refuse to offer PRs because they don't want to be contactable after the fact.

  5. If LF/Hyperledger has no interest in patenting any of its code, then I suggest a simple way to preemptively disclose pertinent information is to require the committer to briefly describe what the code is supposed to do.  Such disclosure makes the code prior art and negates any attempt to claim novelty as part of a patent application.  A parallel is when an author discloses his/her invention in an academic publication prior to any patent submission, and it automatically becomes "state of the art" information that anyone can use.

    Kent Lau.

    LLB,LLM (Intellectual Property).

    Hong Kong.

    1. This wouldn't address the case where a patent was filed before the code was contributed though, right?

      1. Making it easy for the Patent Examiner to find and understand the breadth of hyperledger code in the public domain is a useful strategy to make it harder for the patentee to claim novelty.  A search by the Patent Examiner will reveal areas of contention and the onus is on the patentee to satisfactorily explain to the Patent Examiner exactly how his/her claims are novel with respect to Prior Art.  Conversely, there is no legal requirement for LF to sustain or explain any commit message or other disclosure.  My suggestion is to maximise the barrier for any malicious actor to be granted a relevant patent.


        Regards


        Kent


        Hong Kong

    2. IANAL, but AIUI, the US recently moved from first-to-invent to first-to-file, which means even an openly disclosed invention developed by person A can be patented by person B if person A doesn't do it first (with the inevitable caveats).  Furthermore, how does a maintainer accepting the contribution know that all possible patentable claims have been declared in the commit message of "what the code is supposed to do"?  The threat here is from malicious contributors who intentionally don't disclose patents or make themselves uncontactable, so adding a new reporting requirement on contributors doesn't seem effective.

      1. There are 3 features of patent law to bear in mind:


        1)  Grace Period is a set time limit within which you can still file a patent after public disclosure of your invention.  There is no Grace Period in a first-to-file system.  Therefore, you must submit your patent application BEFORE public disclosure.


        2)  Novelty is the concept that you cannot patent anything already in the public domain (also known as Prior Art).  Public disclosure destroys novelty in a first-to-file system.


        3)  Scope is the breadth of claims for which you are applying for patent protection.  It is impossible to know all patentable claims, although patentees do try to maximise the scope.


        In response to your particular points:


        i)  Openly disclosed invention destroys novelty, so person B cannot get a patent for Prior Art by person A, unless person B makes a new and different claim or an improvement on the Prior Art.


        ii)  Maintainer does not need to know all patentable claims because we are creating a "spoiler system" and the Law of Obviousness increases the zone around the public disclosure.  The maintainer need only speculate (rightly or wrongly) about the function of the code, since LF is not applying for a patent, and no actual evidence is needed.  


        iii)  I agree that it is difficult to preempt a malicious actor submitting a patent before committing that code.  However, a "spoiler system" does not need the malicious contributor to be contactable, or even identifiable.


        I submit the above points for your consideration and commend a "spoiler system" of public disclosure as an effective "shield" with minimum effort and overhead.


        Regards,


        Kent.


        Hong Kong.

  6. In short how can you make the DCO stick?

    1. DCO signer identity is unknown
    2. DCO signer signs without owning rights to the code (this can happen when a mass of code with multiple authors is signed off by one person after a squash). This happens especially when a new project with existing code joins the system.

    For 1. Suggestions: Hart/Chris: make it correspond to an LFID, which has a lot of hidden and verified attributes that are not public but known only to LF (see below for SSI)

    For 2. There is no known antidote (yet). Going after the DCO signer is possible with 1, but then what is the recourse?

    For 1. and 2. Kent G Lau suggests creating prior art (Brian Behlendorf has already raised the first-to-file objection to this, but a prior art defense does trump it, as noted by Kent G Lau), IF the implementation is not patented yet. Pieces of code cannot be patented, only copyrighted; nor can ideas, only implementations or inventions.

    Finally, MOSS solutions can look for similarity in software (usually used for anti-plagiarism in universities); I don't know how this would work or how accurate it would be.

    In the end this remains a conundrum. Basically the aim is to start off with clean unencumbered code (Apache 2) and accept only unencumbered code signed off properly, with properly publicly disclosed details.

    On another note: I wonder if these verifiable claims can somehow be issued by the signer and held by LF (Hyperledger) for later verification using Self Sovereign Identity and DIDs, a question for Nathan George. Let us look at this as a use case for SSI, though I doubt we will solve the DCO issue in the short term with this solution.

  7. IMNSHO, if there is no governance process for the code (which in my mind means you need to know who contributed it), then there is no point in actually contributing code to Hyperledger in the first place.

    Why are we even concerned about people who are not willing to provide their real identities?  That fundamentally goes against the whole point of publishing governed code.

    1. Anonymity doesn't have to be all or nothing here.  We can have a system where the LF knows who you are, but the public does not.  This is what I've been suggesting with tying github accounts to an LFID.  The github accounts can be anonymous and public-facing, while the LFID can have all of the real identity and legal information on it.

      I agree with you that, fundamentally, the LF is going to need to know who contributors are, and people that want to remain totally anonymous don't really have a place in the system.  However, there are a number of reasons why you might want to be publicly anonymous.  Maybe you don't want to get bombarded with emails from random people in public who can correlate your contributions with your real-world persona.  Maybe you don't want your employer to see that you're tinkering with open source code in your free time.  I think there are a number of good reasons for this.

      Thoughts?

    2. I think Hart has a point. Another reason I know of for people not to want to reveal their identity is gender discrimination. I read that, sadly, PRs from women more often get rejected than those from men. This blows my mind but it is what it is. For that reason some women understandably choose to hide their real identity.

      I agree that a system such as the one outlined by Hart would seem to fit the bill. This is definitely something we should get investigated.


    3. There is a spectrum of anonymity, and I think the best course is not an all-or-nothing approach; we won't be well served by total anonymity, nor will we be well served by a "real names" policy.  The question in my mind is whether the real identity needs to be known to the maintainer pressing "merge" or to HLP and the LF as a whole.  I prefer that only the maintainer needs to know.

      1. To clarify, I'm not suggesting that the whole HLP or LF has to know a person's real name.  Just that there is a database (that shouldn't be accessible except under special circumstances, and even then only by certain people) with people's names and DCO statements (and whatever else legal requires) tied to associated github accounts.

        In an ideal world, yes, it would be better if the real identity needed to only be known by the maintainer pressing merge.  However, such a system would seem to be difficult to implement practically:  rather than have a database, we would have to have some kind of DAG that kept track of who was responsible for what and who knew what identity.  This would be further complicated if people left or became inactive (and didn't respond to attempts for contact).  So I'm not sure how we could have an easily manageable identity system (that allows us to trace contributions back to people) that gives this level of anonymity.  However, I'm happy to be proven wrong on this one:  if you think there is an easy way to do this that would pass legal muster, I'd be happy to listen. 

        Additionally, in some cases, it's probably better to let the LF be the identity host rather than rely on maintainers.  Arnaud points out the fact that PRs from women are more likely to get rejected than those from men (which provides an incentive for women to contribute pseudonymously).  A system where the maintainers need to know the real ID of the people writing the code before they press merge isn't going to help with this issue.

      2. I don't think merely relying on the maintainer to know is quite enough because it doesn't resist the test of time. A few years later, the maintainer might be gone and it may become difficult if not impossible to contact the contributor.

        I think this means we should have the actual contact info in some LF or HLP DB somewhere that we can go back to if needed down the line. And rather than relying on a DCO made of some email address that often is bogus we should rely on an LF ID that can be used to retrieve the contact info.

        1. Iroha had exactly this issue with their older code and had to do a squash... which we really don't like doing.

  8. Hi. This topic has persisted for lack of a concrete framing of the problem, which has hindered a concrete proposal for a solution.

    Right now, DCOs are stronger than some would like (who would prefer to contribute pseudonymously or anonymously, without any legal agreement) and weaker than some would like (the association is entirely with a Github ID or email address, the latter of which may be entirely bogus).

    There seems to be a desire for and interest in stronger chaining of Github IDs and PR processes to LFID identities, which would allow the LF at least to have a firmer connection from the author of a PR or push to an email address, and perhaps then to the legal identity of the contributor. This is a long process however, since among other challenges it requires Github adopting new technologies like DIDs and VCs for authn/authz and changing PR processes to more firmly track original authors rather than using free text in the commit message. We hope a solution emerges - there is growing concern over attacks on the open source software supply chain, and some efforts starting up to harden that. But right now there seems no practical approach that does not place a substantial degree of the process on human diligence, and no reasonable alternative to using Github for SCM at the moment, after there was tremendous pressure to move to Github.

    The LF (with informal advice from counsel) does not recommend changing the legal documents or processes at this time around this issue. They have proven strong enough to defend the Linux kernel against legal attacks to date, and creating new barriers to participation is a real concern even while very hard to measure. I recommend this issue be closed, and we keep an eye on approaches the LF and/or Github take towards better hardening.

    Brian



    1. Brian, for the Linux kernel, the email cannot be fake, because patches to the kernel are submitted via (wait for it.... drum roll...) email (ta-dah!). So, the address cannot be fake, it is real and verifiable. With a simple GitHub pull request, this is not the case. http://nickdesaulniers.github.io/blog/2017/05/16/submitting-your-first-patch-to-the-linux-kernel-and-responding-to-feedback/

    2. I think it's true that we don't really have a problem (yet) to solve in terms of dealing with bad players and from that point of view we may not need to do anything but I'm surprised by the claim that the problem lacks framing.

      I think Danno's question was pretty clearly stated and maybe we tried to make too much of it but I think we've got to answer his question. Not answering it means we are leaving maintainers without any guidance as to what policy to use, which is bound to lead to inconsistencies across Hyperledger projects.