- Created by Si Chen, last modified by Andrea Frosinini on Apr 23, 2023
This is the data architecture for the data channels such as the Utility Emissions Channel of the Operating System for Climate Action.
The Idea
Writing software requires getting data from external sources, which usually means accessing an API by sending a GET or POST request to a URL with authentication keys and a request message and parsing the XML or JSON that comes back. While this approach is fully automated, it comes with a lot of potential problems:
- You have to build a different API integration for every source.
- The API is periodically unavailable, so you're down as well.
- The API could change and break your app.
- The API could simply be taken away from you, to "upgrade to a better API", to "improve security", etc., leaving you scrambling for an alternative.
- You have to store a local copy of the data retrieved from the API, in case it goes away.
- You have to work out a procedure for auditing and verifying the data obtained from the API.
- You worry about security because now you have a big app with lots of data and authentication keys.
As a result, these API integrations are expensive and time-consuming to build and maintain, so they don't scale well in scope. When we were discussing building a Hyperledger-based blockchain app for the Carbon Accounting and Certification WG, which could require integrating data from a large number of sources, we came upon a better way to do it with the blockchain itself.
Instead of each source providing us with an API, they could put their data on the blockchain. This could be a Hyperledger permissioned channel that is only open to their customers and other trusted parties. The data could be the carbon emissions of a product or a particular invoice, for example.
Instead of a big app consuming data from all these sources, an app could simply traverse through multiple channels. Instead of having a single access key from each data source, it could request that the user of the app pass it their authentication key or token. Then it would pass that token through to the data source's channel and obtain the data it needs. It would then perform the calculation with this data. Finally, it could open a channel of its own and put the results on this channel. This channel could then be open to its customers, so they could in turn use the data for their own carbon emissions calculations.
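The flow above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `Channel` is a hypothetical in-memory stand-in for a permissioned channel (not a Hyperledger Fabric API), and the token strings, customer IDs, and emission values are made up.

```python
from dataclasses import dataclass, field


@dataclass
class Channel:
    """Hypothetical in-memory stand-in for a permissioned data channel."""
    name: str
    records: dict = field(default_factory=dict)  # customer_id -> list of values
    tokens: dict = field(default_factory=dict)   # token -> customer_id

    def grant(self, token: str, customer_id: str) -> None:
        self.tokens[token] = customer_id

    def write(self, customer_id: str, value: float) -> None:
        self.records.setdefault(customer_id, []).append(value)

    def read(self, token: str) -> list:
        # A token only unlocks the records of the customer it was issued for.
        if token not in self.tokens:
            raise PermissionError(f"token not accepted by channel {self.name!r}")
        return list(self.records.get(self.tokens[token], []))


def calculate_footprint(sources: dict, user_tokens: dict,
                        results: Channel, customer_id: str) -> float:
    """Pass the user's own tokens through to each source channel, sum the
    emissions found there, and publish the total on the app's own channel."""
    total = 0.0
    for name, channel in sources.items():
        total += sum(channel.read(user_tokens[name]))
    results.write(customer_id, total)
    return total


# Made-up data: two source channels holding emissions for customer "acme".
utility = Channel("utility-emissions")
shipping = Channel("shipping-emissions")
utility.write("acme", 120.5)
utility.write("acme", 98.0)
shipping.write("acme", 40.0)
utility.grant("tok-u", "acme")
shipping.grant("tok-s", "acme")

results = Channel("calculator-results")
total = calculate_footprint(
    {"utility": utility, "shipping": shipping},
    {"utility": "tok-u", "shipping": "tok-s"},
    results,
    "acme",
)
print(total)  # 258.5
```

Note that the app never stores a key of its own for the source channels; it only holds the user's tokens for the duration of the calculation.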
An additional advantage is that the sources contributing the data simply have to store the data in the channel. They do not have to be responsible for its long-term storage and availability. This allows specialty companies, such as CO2 emissions auditing companies, to audit the data from a larger entity, such as a utility, shipping company, airline, or manufacturer, and then store it in the channel for everyone else to use. This is similar to the offline scenario where a building engineer files a report for a building developer with the city's building department: once the report has been filed, the building engineer no longer needs to be responsible for making it available. The city's building department is responsible for making it available to the building developer and his sub-contractors and customers, as well as other government agencies.
There could be many channels of data from electric utilities, shipping companies, airlines, and suppliers of industrial commodities such as plastics or steel, each placed on the channel by a trusted CO2 emissions auditing company. Our app is a carbon calculator. The user of our app would pass in his keys to access the channels, for example from his utility emissions channel, based on his status as their customer. We would pass those keys to the channel and get the emissions from the utility, which we could then use for our calculations. The entire process is run as a serverless micro service.
The advantages of this approach are:
- The auditing company doesn't have to worry about keeping an API up and running. Once it has done its analysis, it's written to the blockchain channel and no longer requires a server to provide it, just like web content deployed on a CDN.
- The format is standardized by agreement between the provider and users of the data.
- The content of the data cannot be altered once it is written to the blockchain, removing concerns about audit.
- Even if the auditing company changes the format or stops providing data in the future, data from the past will always be available.
- The app consuming the data can be much smaller. For example, instead of storing an authentication key for many sources and data from many users, it can use keys provided by a customer's wallet. Once it's finished with the data obtained from those keys, it can put the results on its channel and then delete the data obtained from the blockchain.
- Both the data source and data consumer could then be run as serverless micro services.
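The claim that data cannot be altered once written rests on each entry being cryptographically linked to the one before it. The sketch below shows only that hash-chain idea, assuming a hypothetical `AppendOnlyLog` class; a real Hyperledger Fabric ledger adds endorsement, ordering, and replication across peers, none of which is modeled here.

```python
import hashlib
import json


def entry_hash(prev_hash: str, payload: dict) -> str:
    """Hash the payload together with the previous entry's hash."""
    body = json.dumps(payload, sort_keys=True)
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


class AppendOnlyLog:
    """Hypothetical hash-chained log: each entry commits to its predecessor."""
    GENESIS = "0" * 64

    def __init__(self):
        self.entries = []  # list of (payload, hash) tuples

    def append(self, payload: dict) -> str:
        prev = self.entries[-1][1] if self.entries else self.GENESIS
        digest = entry_hash(prev, payload)
        self.entries.append((payload, digest))
        return digest

    def verify(self) -> bool:
        """Recompute every hash; any edited payload breaks the chain."""
        prev = self.GENESIS
        for payload, digest in self.entries:
            if entry_hash(prev, payload) != digest:
                return False
            prev = digest
        return True


log = AppendOnlyLog()
log.append({"customer_id": "acme", "co2_kg": 600.0})
log.append({"customer_id": "acme", "co2_kg": 580.0})
print(log.verify())  # True

# Tamper with the first payload while keeping its stored hash.
log.entries[0] = ({"customer_id": "acme", "co2_kg": 6.0}, log.entries[0][1])
print(log.verify())  # False
```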
The channels could be maintained by a third party entity, which charges a small fee for each access through a token. The data source provider could give out these tokens when it allows a data consumer permission to access its channel. In return, the third party entity serves as a neutral party tasked with keeping up the operations of the data channel.
This general architecture could be used for a variety of business data integrations, specifically the ones from Operating System for Climate Action.
A Physical Analogy
A physical world analogy is engineering reports filed with city building departments. There are many engineering companies who would draw up building plans according to building codes. These are reviewed, approved, and filed by the city building department, and then they are available to owners of the building as well as government agencies. Here we are proposing to replace the equivalent of the centralized city building department with a blockchain hosted and run by the engineering companies themselves.
Get Involved
This is an open source project. Anyone is welcome to get involved, and we will be happy to see you contribute.
1) Start by subscribing to the Climate SIG mailing list for updates and meeting notifications.
2) Join our bi-monthly Peer Programming Zoom call for developers on Mondays at 9 AM US Pacific time (UTC-07:00, America/Los Angeles). Please check the calendar for the next call.
3) Check out the good first issues from our blockchain-carbon-accounting in Hyperledger-labs and feel free to contribute a fix for one that looks interesting to you.
4) See our How to Contribute page for other ways you could get involved.
25 Comments
Jim Mason
Have you evaluated using Kafka directly? It supports many different architectures for data sharing, including the pattern here. It's fully open-source. It would not have some of the performance challenges that blockchain potentially has.
Robin Klemens
If there are performance issues (probably not in the PoC), we should take a look at Hyperledger Avalon.
"Hyperledger Avalon is a ledger independent implementation of the Trusted Compute Specifications published by the Enterprise Ethereum Alliance. It aims to enable the secure movement of blockchain processing off the main chain to dedicated computing resources. Avalon is designed to help developers gain the benefits of computational trust and mitigate its drawbacks."
David Boswell Maybe we could invite someone from the Avalon community to one of our meetings if our need for outsourcing processing increases.
Si Chen
Hi Robin Klemens I agree with you. Actually vaneet also had a question about scalability. Has anybody had problems scaling Hyperledger Fabric? Is there something in particular we should be aware of?
I would like to learn more about Avalon, and also Aries for security key storage, as Martin Wainstein suggested during our call. David Boswell Do you think we could get somebody from those projects to talk to our SIG, as well as somebody from Fabric to talk about scalability issues/questions?
David Boswell
Let me look into Aries, Avalon and Fabric speakers, and I will get back to you, Martin and Tom about scheduling for those.
Jim Mason
Fabric has very good scalability in standard configurations (e.g. the ones used for benchmarks) when compared to any other major blockchain platform. That said, the Fabric team has some additional improvements that have not yet been fully implemented. Separately, when I look at specific use cases, there is a great opportunity in some of those to scale Fabric well beyond the standard configuration benchmarks.
Si Chen
That makes sense, and I've heard the same elsewhere as well. You're right in that Kafka on its own is going to be a lot faster, but I think that Fabric will be more than adequate for our needs of storing what are essentially mirrors of financial accounting transactions.
David Boswell
Jim Mason – Do you have recommendations for who from the Fabric project would be best to ask to speak to the CA2 SIG group about Fabric scalability? Thanks for any suggestions.
Si Chen
Thanks for your feedback. Yes, Kafka is similar, but the difference is that using a blockchain gives us an immutable ledger that preserves the data, which is of a financial and audit nature. In contrast, Kafka is designed more for data that's not meant for long-term storage. If you look at their use cases, for example, there's a lot of logging. Maybe you could build Kafka to do this (distributed, encrypted, secure, immutable, verifiable, etc.), but I suspect you'd end up with a homegrown blockchain by then.
vaneet
Si Chen,
Thank you for giving me the opportunity for open discussion. Before I can suggest anything related to architecture, I need to understand the different actors of the system and their boundaries.
My first thoughts on reading this page are below
The above could also be done with a server-based system, for instance by creating an in-house ESB (Enterprise Service Bus) using the Pipes and Filters pattern.
Consuming data from external APIs → the inbound gateway takes in the request and transforms the payload, runs the business logic (calling core services or calculations), and sends the response to the outbound gateway. This can be done using channels and can be asynchronous. The middle layer can also use gRPC with Protocol Buffers for data exchange, while the inbound and outbound gateways can use HTTP. Here, the data set can be cached using Redis or similar, so that if the source data is not available, it can be fetched from the cache.
The app then accesses data from the outbound gateway with a simple call. All processing is done by the middleware; results are aggregated and the response is sent to the client over HTTP from the outbound gateway.
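A minimal sketch of the Pipes and Filters idea with a cache fallback might look like the following. All names here (`inbound_gateway`, `make_fetch_filter`, `outbound_gateway`, `run_pipeline`) are hypothetical, the cache is a plain dict standing in for Redis or similar, and the transport layers (HTTP, gRPC) are left out entirely.

```python
from typing import Callable, Dict, List

Message = dict
Filter = Callable[[Message], Message]

cache: Dict[str, float] = {}  # in-memory stand-in for Redis or similar


def inbound_gateway(message: Message) -> Message:
    # Normalize the incoming payload.
    return {"customer_id": message["customer"].strip().lower()}


def make_fetch_filter(source: Callable[[str], float]) -> Filter:
    """Build a filter that reads from `source` and falls back to the cache."""
    def fetch(message: Message) -> Message:
        key = message["customer_id"]
        try:
            cache[key] = source(key)  # refresh the cache on success
        except ConnectionError:
            if key not in cache:
                raise  # nothing cached to fall back on
        return {**message, "kwh": cache[key]}
    return fetch


def outbound_gateway(message: Message) -> Message:
    return {"status": "ok", **message}


def run_pipeline(filters: List[Filter], message: Message) -> Message:
    # Each filter receives the previous filter's output.
    for f in filters:
        message = f(message)
    return message


def live_source(customer_id: str) -> float:
    return 1200.0  # made-up usage figure


def broken_source(customer_id: str) -> float:
    raise ConnectionError("source unavailable")


first = run_pipeline(
    [inbound_gateway, make_fetch_filter(live_source), outbound_gateway],
    {"customer": " ACME "},
)
second = run_pipeline(
    [inbound_gateway, make_fetch_filter(broken_source), outbound_gateway],
    {"customer": "acme"},
)
print(first == second)  # True: the second call was served from the cache
```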
I think that putting data on blockchain node could lead to bottlenecks in transaction management, performance and scalability.
Also, the data access layer, data authentication layer and business service layer should be designed so that they do not cross too many boundaries. Otherwise, any change in the business entities or layers will ripple into data access and data authentication, and it will be hard to maintain and scale the ever-changing platform. In fact, I would design these layers so that each follows the single responsibility principle.
I would use a hybrid architecture: a scalable backend architecture with simple micro services using channels to get data, aggregate it and send it to the client, and, around that core, blockchain (Hyperledger) for what it serves best.
As the number of nodes/networks increases, adding or accessing data via blockchain can cause other problems with the data: data consistency, transactions, etc.
As I said, I am very new to understanding the requirements of the system, but I have given my point of view based on the little I have understood.
Si Chen
Thanks vaneet Yes, it could certainly be done that way. So what do you see Hyperledger or blockchain being used for?
vaneet
Si Chen
Well, blockchain can be used for the part of the system where it fits best and does not compromise the non-functional requirements of the solution.
I ask myself the following questions:
1) Do we need to store the state?
2) Do we have multiple writers?
3) Can we use an online third party who is trusted?
4) Are all writers known? If yes, are they trusted?
With the above exercise, we can work out how to create a better platform, using blockchain technologies to solve the problem without impacting the design constraints of the solution.
So I would use blockchain for audit and verification and let APIs handle complex processing; putting only the part of the data required for write/verification on the chain should be sufficient.
What do you think ?
Thank you
Si Chen
vaneet How about let's think about a use case: Figuring out the CO2 emissions from a company's electricity usage.
Let's assume that the company buys all its electricity from a utility. So the approach I had in mind is:
Part 1
The utility (or a service provider for the utility) takes the company's utility bill, which has the kWh of electricity used during a billing period, and compares it to the Emissions & Generation Resource Integrated Database (eGRID) to calculate the CO2 emissions of the electricity used. It then writes these CO2 emissions on a permissioned ledger.
Part 2
The company goes to the permissioned ledger and presents an authorization key issued by the utility. It gets its CO2 emissions data from the ledger.
So would these be the right answers to your questions?
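As a rough illustration of Part 1, the calculation is usage multiplied by an emission factor for the region. The factors, region names, and record layout below are hypothetical placeholders, not real eGRID values or an agreed ledger schema.

```python
from dataclasses import dataclass

# Hypothetical placeholder factors in kg CO2 per kWh; NOT real eGRID values.
EMISSION_FACTORS_KG_PER_KWH = {"REGION_A": 0.5, "REGION_B": 0.25}


@dataclass(frozen=True)
class EmissionsRecord:
    """The record that would be written to the permissioned ledger."""
    customer_id: str
    period: str
    kwh: float
    co2_kg: float


def emissions_for_bill(customer_id: str, period: str,
                       kwh: float, region: str) -> EmissionsRecord:
    if kwh < 0:
        raise ValueError("kWh must be non-negative")
    factor = EMISSION_FACTORS_KG_PER_KWH[region]
    return EmissionsRecord(customer_id, period, kwh, kwh * factor)


record = emissions_for_bill("acme", "2020-06", 1200.0, "REGION_A")
print(record.co2_kg)  # 600.0
```

The record is frozen so that, once computed, it cannot be modified in place before being written.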
vaneet
Si Chen Sorry, I was busy with family. Holiday times here.
The use case is good.
1) Writing CO2 emissions on a permissioned ledger. There could also be multiple utility providers serving the company, e.g. solar installations and the traditional grid.
How often will writes happen?
Can we have a second use case where the ledger works with x number of providers and x times y number of clients/companies, and what other pieces of data are required to be stored?
What do you think of this?
1) Utility companies can write the data to a traditional database, and some important attributes which are required for verification and authentication should be put on the ledger.
Another service writes data to the ledger. Here we can use any protocol instead of depending on client-specific ones (i.e. the utility suppliers'), and we can put some filters etc. in place for data validation before it is written to the ledger. Writing to a ledger is a costly affair.
2) The company can read from the ledger or the database. Why do we need the ledger?
Please don't get me wrong. I am just trying to understand the system in more detail so that we can analyze parts of the system for better definition.
To ease the decision making process, we provide a flow chart below. We consider one or multiple parties that write the system state, i.e. a writer corresponds to an entity with write access in a typical database system or to consensus participant in a blockchain system.
If no data needs to be stored, no database is required at all, i.e. a blockchain, as a form of database, is of no use. Similarly, if only one writer exists, a blockchain does not provide additional guarantees and a regular database is better suited, because it provides better performance in terms of throughput and latency. If a trusted third party (TTP) is available, there are two options. First, if the TTP is always online, write operations can be delegated to it and it can function as verifier for state transitions. Second, if the TTP is usually offline, it can function as a certificate authority in the setting of a permissioned blockchain, i.e. where all writers of the system are known.
If the writers all mutually trust each other, i.e. they assume that no participant is malicious, a database with shared write access is likely the best solution. If they do not trust each other, using a permissioned blockchain makes sense. Depending on whether public verifiability is required, anyone can be allowed to read the state (public permissioned blockchain) or the set of readers may also be restricted (private permissioned blockchain). If the set of writers is not fixed and known to the participants, as is the case for many cryptocurrencies such as Bitcoin, a permissionless blockchain is a suitable solution.
I like the below diagram that helps in defining blockchain solution
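The decision flow quoted above can be written as a small function. This is a sketch of one reading of the quoted text: the function name, parameter names, and returned strings are illustrative, and the order of the checks follows the order in which the passage introduces them.

```python
def recommend_storage(needs_state: bool,
                      multiple_writers: bool,
                      ttp_always_online: bool,
                      writers_known: bool,
                      writers_trusted: bool,
                      public_verifiability: bool) -> str:
    """Walk the decision flow described in the quoted passage."""
    if not needs_state:
        return "no database needed"
    if not multiple_writers:
        return "regular database"
    if ttp_always_online:
        # Writes can be delegated to the always-online trusted third party.
        return "delegate writes to the trusted third party"
    if not writers_known:
        return "permissionless blockchain"
    if writers_trusted:
        return "database with shared write access"
    if public_verifiability:
        return "public permissioned blockchain"
    return "private permissioned blockchain"


# Assumed answers for an emissions channel with several known auditors who
# do not fully trust each other, and a restricted set of readers.
print(recommend_storage(
    needs_state=True,
    multiple_writers=True,
    ttp_always_online=False,
    writers_known=True,
    writers_trusted=False,
    public_verifiability=False,
))  # private permissioned blockchain
```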
Si Chen
Thanks so much for doing this!
In answer to your questions:
The frequency would coincide with billing cycles. In reality it would not be too often, once a month or so.
Not sure I understand what you mean by "a second use case where we can have ledger working with x number of providers and x times y number of clients/ companies and what other pieces of data required to be stored"?
I've also had some more time and discussions about this, and I think I was missing an important element: a neutral third party auditor. When you think about it, of course the CO2 emissions should come from a neutral third party, not the company that produced them.
So, the point of the channel: It should actually be a place to store audited CO2 emissions published by a neutral third party based on utility bills to the customer. So it is not a database. It should really be an immutable audit trail of objective records.
The channel can be accessed by the utility's customer, as well as other trusted parties (government agencies, NGO's, investors).
The flows I see are:
1. Customer authorizes the third party auditor to access his data from the utility (I've implemented this already using UtilityAPI and GreenButton XML.)
2. Auditor gets the utility bill and applies government carbon emissions data to calculate emissions and puts them on the channel.
3. Auditor grants access to the channel, initially to the customer and the utility, but also to other trusted parties.
4. Customer gets data from the channel and uses it for his own total CO2 emissions calculations.
Eventually there could be many channels, created by possibly different auditors that work with data in each area -- shipping, travel, etc.
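The four flows above can be sketched as follows. `AuditedChannel`, the `authorizations` set, and the emission factor are hypothetical; the sketch models only who may write, who may grant access, and who may read, not how a real permissioned ledger or the UtilityAPI/GreenButton integration enforces it.

```python
class AuditedChannel:
    """Hypothetical channel where only the auditor writes and grants access."""

    def __init__(self, auditor: str):
        self.auditor = auditor
        self.readers = set()
        self._records = []

    def publish(self, writer: str, record: dict) -> None:
        if writer != self.auditor:
            raise PermissionError("only the auditor may publish")
        self._records.append(dict(record))  # store a copy

    def grant(self, granter: str, reader: str) -> None:
        if granter != self.auditor:
            raise PermissionError("only the auditor may grant access")
        self.readers.add(reader)

    def read(self, reader: str, customer_id: str) -> list:
        if reader not in self.readers:
            raise PermissionError(f"{reader} has no access to this channel")
        return [dict(r) for r in self._records if r["customer_id"] == customer_id]


FACTOR_KG_PER_KWH = 0.5   # hypothetical placeholder, not a government figure
authorizations = set()    # (customer, auditor) pairs

# 1. The customer authorizes the auditor to access its utility data.
authorizations.add(("acme", "auditor-1"))

# 2. The auditor gets the bill, calculates emissions, and publishes them.
channel = AuditedChannel("auditor-1")
bill = {"customer_id": "acme", "period": "2020-06", "kwh": 1200.0}
if (bill["customer_id"], "auditor-1") in authorizations:
    channel.publish("auditor-1", {**bill, "co2_kg": bill["kwh"] * FACTOR_KG_PER_KWH})

# 3. The auditor grants access to the customer and the utility.
channel.grant("auditor-1", "acme")
channel.grant("auditor-1", "utility")

# 4. The customer reads its audited emissions from the channel.
print(channel.read("acme", "acme")[0]["co2_kg"])  # 600.0
```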
So coming back to your criteria:
The offline analog, by the way, is a set of engineering reports for a building. It's drawn up by a professional engineer (not the builder himself), filed with the municipal building department, which approves and certifies it. It's available to the builder, who can also make it available to tenants, other contractors. It's also available to regulatory agencies and tax authorities.
A permissioned ledger just provides a digital alternative to this set up. In this case, for CO2 emissions.
vaneet
Si Chen
Thank you for giving me the opportunity to share
Yes, a neutral third party auditor. That was what I was looking for: which actor/service can be a good fit for blockchain.
- That is where the blockchain part fits, and the rest of the actors could use a traditional approach, to have the best of both worlds, keeping latency low and preserving scalability.
A neutral third party auditor can be a blockchain-compliant framework instead of being managed by humans, companies or clients. Of course, for that we assume that utility companies will provide us reliable data at source.
So, what you are saying is: CO2 emissions data is updated or created monthly on the ledger per customer.
So auditors write data to the channels, i.e. multiple auditors storing/writing to the same utility channel. What do you think of throughput?
I meant let's take another, more complex use case where more actors are involved, simulating a real time scenario for creating value from point A to point B: "a second use case where we can have ledger working with X number of auditors and X times Y number of clients/ companies and what other type of data need to be stored". The objective is to verify at each part that we need blockchain and eliminate the parts where it's not required.
Best Regards
Si Chen
vaneet Appreciate the questions – they're very helpful so keep them coming!
Throughput: Not the highest out there in terms of frequency, since bills are issued monthly. This is not like IoT data or tracking shipments in a supply chain. The only issue is that there are many utility customers out there. Utilities solve this problem by staggering the billing cycles, though, so not everybody gets billed at the same time. I think we can try to solve this when the solution reaches that level of adoption.
Robin Klemens
Hi Si Chen and vaneet , thanks for this great discussion and I'd like to hop in.
The questions vaneet raised are great for reflecting critically on, first, why we should use blockchain technology at all and, second, which kind of blockchain (private, public, permissioned, permissionless) suits best. The problem decides the technology; it is not the technology looking for an application.
I want to extend the answers given by Si Chen and widen our view. Concerning the problem of Carbon Accounting and Certification, blockchain can do a lot more than being an "immutable audit trail of objective records".
Do we have multiple writers?
- Yes, we do. There has to be a set of writers, e.g. auditors, trusted institutional organizations, governments, independent NGOs, etc. The most important part is that there is a critical mass of parties authorized to write data to the ledger and thus reach consensus by the majority.
Are all the writers trusted?
- Is there such a thing as trust in a system that is ruled by monetary-driven factors? This is where blockchain brings one of its superpowers into play (as vaneet has mentioned already). Blockchain has the ability to "minimize the amount of trust required from any single actor in the system" by distributing trust among different (independent) actors. Hence, a blockchain architecture should always try to avoid non-redundancy, i.e. a single point of failure/trust or just a bottleneck. (I'll try to reflect this in the architecture proposed for the Utility Emission Channel.) Through redundantly executed smart contracts that carry the validation logic an auditor would apply, blockchain technology can minimize the probability of fraud, save a lot of money through automation, speed up processes and make all decisions transparent as well as immutably stored on the ledger. Further, as long as humans are involved in the process of auditing or accounting, there are going to be random mistakes made by humans (errare humanum est).
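The idea of redundant execution can be illustrated with a simple majority rule: several independent validators recompute the same figure, and a value is accepted only if more than half of them agree. This is a toy model with hypothetical validator functions and a made-up emission factor, not how any particular smart contract platform reaches consensus.

```python
from collections import Counter
from typing import Callable, List, Optional

Validator = Callable[[float], float]  # kWh -> kg CO2


def honest(kwh: float) -> float:
    return round(kwh * 0.5, 3)   # hypothetical factor of 0.5 kg CO2 per kWh


def faulty(kwh: float) -> float:
    return round(kwh * 0.05, 3)  # a misplaced decimal point


def agreed_value(kwh: float, validators: List[Validator]) -> Optional[float]:
    """Return the value a strict majority of validators computed, else None."""
    results = [validate(kwh) for validate in validators]
    value, votes = Counter(results).most_common(1)[0]
    return value if 2 * votes > len(results) else None


print(agreed_value(1200.0, [honest, honest, faulty]))  # 600.0
print(agreed_value(1200.0, [honest, faulty]))          # None (no majority)
```

With three validators, a single mistaken or dishonest one is outvoted; with two, disagreement simply blocks the write.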
If I go along the diagram proposed by vaneet, I come to the point where I have to decide if public verifiability is required. I'd love to see all the data transparently stored on a publicly readable ledger, as the emission of greenhouse gases concerns the same planet we all live on. However, as long as this is not enforced by law and no "forcing" demand arises in society, most companies would prefer the permissioned way.
I'd love to get feedback and I recommend continuing the use of blockchain technology with a look at the Utility Emission Channel example as this is more feasible than talking about blockchain and climate accounting in general.
/Robin
Yilin Sai
Hi Si Chen Robin Klemens vaneet ,
Thanks for the great discussion! Here are my thoughts, I'm very interested to know whether you guys agree with them:
What do you guys think?
Si Chen
Hello Yilin Sai vaneet Robin Klemens and anybody else reading
You ask some good questions! Here are some attempts at answers:
Yilin Sai
Hi Si Chen,
Thank you very much for the reply! Your comments inspire me to think a lot deeper. I want to add some comments:
For point 1, thank you for pointing out the need for high availability. With that in mind, I came up with the following data flow:
third party auditors/cloud computers do the emission calculation and sign the result → the result is stored in a cluster of nodes with redundancy → customers/government agencies/etc connect to one of the nodes, pull the result of interest, and verify the signature to ensure the source is correct and the result hasn't been tampered with.
Some notes about the system above:
a. the cluster provides high availability. In other words, they use a crash fault tolerance consensus algorithm.
b. third party auditors/cloud computers are the source of trust
c. the cluster doesn't need to verify the signature, because the cluster is not the source of trust
d. the serverless micro service run by customers/government agencies does the signature verification
Now let's compare the system above with the following one:
third party auditors/cloud computers do the emission calculation and sign the result → the result is stored in a cluster of nodes which verifies the signatures and reaches a consensus on the validity of the result → customers/government agencies/etc connect to one of the nodes, pull the result of interest, and assume the result is from the correct source and hasn't been tampered with.
Compared with the first system, in the second system:
a. the cluster provides high availability and trust. In other words, they use a Byzantine fault tolerance consensus algorithm.
b. the third party auditors/cloud computers plus the cluster are the source of trust
c. the cluster needs to verify the signatures, because the cluster is part of the source of trust
d. the serverless micro service run by customers/government agencies does not verify signatures, because they trust the cluster has done the job.
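The difference between the two systems is where the signature check happens. The sketch below makes that concrete under a loud assumption: Python's standard library has no public-key signatures, so an HMAC with a shared key stands in for the auditor's signature. A real deployment would use asymmetric signatures so that verifiers cannot forge results. The class and function names are hypothetical, and consensus among cluster nodes is not modeled.

```python
import hashlib
import hmac
import json

AUDITOR_KEY = b"demo-auditor-key"  # hypothetical key, for illustration only


def sign(record: dict, key: bytes) -> str:
    """HMAC stand-in for the auditor's digital signature."""
    body = json.dumps(record, sort_keys=True).encode("utf-8")
    return hmac.new(key, body, hashlib.sha256).hexdigest()


def is_valid(record: dict, signature: str, key: bytes) -> bool:
    return hmac.compare_digest(sign(record, key), signature)


class StoreOnlyCluster:
    """System 1: stores whatever it is given; readers must verify."""

    def __init__(self):
        self.items = {}

    def put(self, record_id: str, record: dict, signature: str) -> None:
        self.items[record_id] = (record, signature)

    def get(self, record_id: str):
        return self.items[record_id]


class VerifyingCluster(StoreOnlyCluster):
    """System 2: rejects invalid signatures on write; readers trust it."""

    def __init__(self, key: bytes):
        super().__init__()
        self.key = key

    def put(self, record_id: str, record: dict, signature: str) -> None:
        if not is_valid(record, signature, self.key):
            raise ValueError("invalid signature")
        super().put(record_id, record, signature)


def read_system_1(cluster: StoreOnlyCluster, record_id: str, key: bytes) -> dict:
    record, signature = cluster.get(record_id)
    if not is_valid(record, signature, key):
        raise ValueError("record failed signature check")
    return record


def read_system_2(cluster: VerifyingCluster, record_id: str) -> dict:
    record, _ = cluster.get(record_id)  # no check: the cluster is trusted
    return record


good = {"customer_id": "acme", "co2_kg": 600.0}
forged = {"customer_id": "acme", "co2_kg": 6.0}
good_sig = sign(good, AUDITOR_KEY)

s1 = StoreOnlyCluster()
s1.put("r1", good, good_sig)
s1.put("r2", forged, good_sig)  # accepted on write, caught by the reader

s2 = VerifyingCluster(AUDITOR_KEY)
s2.put("r1", good, good_sig)    # a forged record would be rejected here
```

In system 1 a bad record can sit in the cluster until a reader checks it; in system 2 it never gets stored, but every reader now depends on the cluster having done that check.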
I think we can call both systems blockchain-based, because they both involve a distributed ledger of signed records. However, the two systems have different assumptions/requirements and operate differently. I think what you have in mind is more in line with the second system?
In my opinion, the first system loosens the requirements on the cluster so that we don't need to worry about distributing trust among the cluster and BFT consensus, which means it's easier to implement. Also, it's closer to the real-world analogy you mentioned, so the industry might be more likely to adopt it.
Finally, the privacy of the customer is better preserved, because the customer does not need to share its usage data with as many auditors/cloud computers as in system 2. What do you think?
For point 2, a fee for use of data makes sense, but I think there is something missing if we want to truly distribute the trust. Please allow me to compare with Bitcoin again: in Bitcoin, a person submitting a transaction doesn't get to choose which node processes the transaction. The fee is given to a random node (in a PoW network, it's the node that solved the hashcash problem). In our case, if the customer is the one who gets to choose, it could choose a node (or organisation) that favours it. We don't want a third party to choose either. So I think in order to truly distribute the trust, we need a mechanism to both
a. determine which node(s) are going to calculate the CO2 emissions for the current auditing request
b. distribute the incentive to the node(s) that do the job
Edit: just want to make this clearer. Without this mechanism, the endorser nodes that do the emission calculation have an incentive to do something in favour of the request submitter, because the submitter holds the money and can decide who to give it to!
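One possible shape for such a mechanism, offered purely as an illustration, is to derive the choice of nodes from a hash of the request ID, so that neither the submitter nor a third party picks them by hand, and then split the fee among the selected nodes. This swaps in rendezvous-style hashing, which is not something proposed in the discussion. It also assumes the request ID cannot be freely chosen by the submitter (for example, because it includes a value fixed by the ledger); otherwise the submitter could try IDs until a preferred node comes up. Function names and node names are hypothetical.

```python
import hashlib
from typing import Dict, List


def select_nodes(request_id: str, nodes: List[str], k: int) -> List[str]:
    """Pick k nodes by ranking them on a hash of (request_id, node)."""
    if not 0 < k <= len(nodes):
        raise ValueError("k must be between 1 and the number of nodes")

    def score(node: str) -> str:
        return hashlib.sha256(f"{request_id}:{node}".encode("utf-8")).hexdigest()

    return sorted(nodes, key=score)[:k]


def split_fee(fee_cents: int, selected: List[str]) -> Dict[str, int]:
    """Split the fee evenly; leftover cents go to the first-ranked nodes."""
    share, remainder = divmod(fee_cents, len(selected))
    return {node: share + (1 if i < remainder else 0)
            for i, node in enumerate(selected)}


nodes = ["auditor-a", "auditor-b", "auditor-c", "auditor-d", "auditor-e"]
chosen = select_nodes("req-42", nodes, 3)
payout = split_fee(100, chosen)
print(chosen)
print(payout)
```

Because the ranking depends only on the request ID and the node names, every participant can recompute the selection and confirm that the fee went to the right nodes.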
I guess this could be out of scope for the MVP, but I think some work is needed later.
Sorry for the long message, I'm really interested in this topic because I believe it will inform the design of many solutions. Looking forward to your and everyone else's thoughts!
Si Chen
Thank you Yilin Sai for your deep comments. They're also inspiring me to think a lot deeper.
In the first part, I prefer Option 2 (third party auditors/cloud computers plus the cluster are the source of trust):
In the second part, we'd have to assume that the customer would choose which auditor to work with. In a real market with different engineers/architects/consultants/service providers, that's the normal business practice. They offer different services, specialization within a category, personal relationships, different pricing, etc. As long as they're all "accredited" and thus trusted, we usually trust they will do the right thing.
The only interesting exception is mortgage financing: after the mortgage market problems of 2008, it was determined that appraisers must not be hired by the mortgage brokers. So a system was put in place to route appraisals through a pool. This, however, created its own problems: a randomly chosen appraiser from the pool of "qualified" appraisers may not be familiar with the specific needs of the property. For example, the appraisers in a pool covering a big city may not be equally familiar with each neighborhood and its quirks.
Yilin Sai
Thanks Si Chen for addressing my concerns! Really happy to see how the scenario's demands naturally lead to the choice of a blockchain-based solution!
Sylvester Akinbohun
Hello there, I was looking for help with Hyperledger and stumbled on this forum, but I don't know whether it is meant for learners like me, so I want to be sure before I start asking questions. Thanks all.
David Boswell
Sylvester Akinbohun – I'm glad to hear that you're interested in learning more about Hyperledger. A good place to start is to go through the free online training courses. You can find links to those at:
https://www.hyperledger.org/learn/training
If you have questions after checking those out, feel free to let us know.
Martin Wainstein
Hi Jim Mason, I see you commented here and that you are intimately familiar with the TOIP layer. We advanced this idea at this meeting, and now have an active prompt for the upcoming Open Climate Collabathon, where your skills and those of the TOIP community would be very catalytic. Please review this HL WG page and prompt, and if you know of folks who would be interested in participating, maybe flag the opportunity to them as well. Thanks!