Date


Attendees

Main Event

Software Testing in DLT: Challenges and Use Cases in Financial Services
Elena Treshcheva, Business Development Manager and Researcher at Exactpro.

In this talk, Elena will share her thoughts on ensuring the resilience of DLT-based platforms in the financial services industry. She will give a couple of examples of testing tasks Exactpro faced recently and speak about the common challenges of DLT networks in this knowledge domain. She will also mention the types of testing that proved to be the best fit for the task.

Discussion items

Time | Item | Who | Notes
5 min | Anti-Trust & code of conduct | VB |
30 min | Presentation plus Q&A | ET |
5 min | Next SIG report | VB |
10 min | Discuss XCSI lab proposal | All |

Recording


Audio

Minutes

Slides (you can also see them in the comments below)

Thanks to Elena for sharing the slides.

Exactpro Hyperledger CMSIG.pdf

Functional and non-functional testing of DLT networks


Jim asked:

q> Did you use JMeter for load testing? No
q> Do you test CI/CD and change management? No
q> How do you define bad data test cases?
q> Do you use pre- and post-assertions in your test framework? No
q> Do you use OpenID Connect authentication interfaces in testing? No


We talked about the next SIG report: a draft version will be circulated before the 10th of July, and the final report needs to be submitted by July 15.

A new lab is proposed for XCSI (cross-chain settlement instructions); the write-up was shared, and all are welcome to join the lab.





3 Comments

  1. Hi all, thanks so much for having me yesterday and thanks for all of your questions.

    Attached is the deck I was going through. If there are any questions, I would love to discuss them either here or via email / LinkedIn.

     

    1. Great presentation, Elena! Thanks again.

  2. Hi again,

    As promised at the last meeting, I am sharing here some insights received in discussions with the Exactpro team based on the questions following my presentation.

    Regarding the usage of JMeter for non-functional testing

    Rather than using this tool, in the absolute majority of our projects we use in-house tools and plugins. For testing clearing systems and CCPs, we use our open-sourced tool ClearTH, enhanced with the Woodpecker plugin (a Java and Kotlin connector), and for non-functional testing of exchanges we use LoadInjector, another Exactpro tool. The reason behind this is the extremely high performance requirements of exchanges, for which LoadInjector turns out to be a better fit: it is an open-cycle tool which can produce lots of messages and keep sending them without waiting for a response from the system under test, whereas JMeter is a closed-cycle generator (the thread being executed waits for a response from the system before it starts sending the next portion of requests).
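
    To illustrate the open-cycle vs. closed-cycle distinction, here is a minimal Kotlin sketch. The Gateway interface and all names are hypothetical placeholders, not the actual LoadInjector or JMeter APIs.

    ```kotlin
    import java.util.concurrent.Executors
    import java.util.concurrent.TimeUnit

    // Hypothetical stand-in for a connection to the system under test.
    interface Gateway {
        fun send(message: String)                  // fire and forget
        fun sendAndWait(message: String): String   // blocks until a response arrives
    }

    // Closed-cycle generation (JMeter-style): each iteration blocks on the response,
    // so the offered load is throttled by the system under test.
    fun closedCycleLoad(gateway: Gateway, messages: List<String>) {
        for (msg in messages) {
            gateway.sendAndWait(msg)   // the next send waits for this response
        }
    }

    // Open-cycle generation (LoadInjector-style, as described above): messages are
    // injected on a fixed schedule regardless of responses, so the target rate is
    // sustained even if the system under test slows down.
    fun openCycleLoad(gateway: Gateway, messages: List<String>, ratePerSecond: Long) {
        val scheduler = Executors.newSingleThreadScheduledExecutor()
        val iterator = messages.iterator()
        val periodMicros = 1_000_000 / ratePerSecond
        scheduler.scheduleAtFixedRate({
            if (iterator.hasNext()) gateway.send(iterator.next())
            else scheduler.shutdown()
        }, 0L, periodMicros, TimeUnit.MICROSECONDS)
    }
    ```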

    On CI/CD and change management:

    As a software testing specialist firm, we are needed only where there's a change involved, stemming either from regulatory changes, requirements updates, or new technology being introduced into the platform - thus, generally speaking, all our work is around change management, with the size of projects depending on the scale of the change. To support integration with the client’s change management process, our tools are equipped with command-line interfaces and REST APIs and can be integrated into any CI/CD framework. One example is Jenkins (the most commonly used): the integrations with our testing tools allow us to deploy 16-24 test environments and make changes to them (including emulation of time travel to cover multi-day lifecycle processes). In the most recent DLT-related project we used another framework, TeamCity by JetBrains.
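
    As a sketch of what such an integration can look like, here is a hypothetical Kotlin snippet that a CI/CD step (e.g. a Jenkins or TeamCity job) could run to trigger a test run over a REST API. The endpoint, payload, and field names are illustrative assumptions, not the actual API of any Exactpro tool.

    ```kotlin
    import java.net.URI
    import java.net.http.HttpClient
    import java.net.http.HttpRequest
    import java.net.http.HttpResponse

    fun main() {
        val client = HttpClient.newHttpClient()
        val request = HttpRequest.newBuilder()
            .uri(URI.create("http://test-tool.internal/api/runs"))   // hypothetical endpoint
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(
                """{"environment": "test-env-01", "suite": "regression", "timeShiftDays": 1}"""
            ))
            .build()

        val response = client.send(request, HttpResponse.BodyHandlers.ofString())
        // A non-2xx status fails this step, so the pipeline stops on test errors.
        check(response.statusCode() in 200..299) { "Test run failed to start: ${response.body()}" }
        println("Test run started: ${response.body()}")
    }
    ```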

    An addition on the question about bad data test cases:

    Negative testing is a major part of our scope in any project, though clients are not always happy to invest in checks against negative scenarios. Our experience shows, however, that seemingly unrealistic scenarios often emerge in production. One example is around testing of SWIFT tags absent from a specification: though additional tags were called unrealistic and highly improbable, one day a new tag was sent from a central depository into the system under test, which proves that negative tests are extremely important for resilience testing.

    When testing for resiliency, we work with several different levels of protection, so we need to diversify the test cases: some types of negative tests are good at the perimeter, while others are good for getting inside the system and testing further. For example, malformed messages sent randomly in batches will test the system's resilience in terms of incoming message validation as well as the gateways’ throughput and ability to deal with the message queues. We can also test the system with correctly formed messages containing several non-existing tags; further checks can involve correct messages with correct, existing tags, but in incorrect combinations; and finally, we can perform checks with legitimate messages while experimenting with system settings. Thus we test not just an external interface, but every level of the system which deals with the received data. Tool-wise, such an approach is implemented through codecs connecting the testing tools to the system under test: while ‘normal mode’ codecs are aimed at checks of the consumed data inside the system, including data reconciliation, the ones in ‘dirty mode’ are targeted at the external checks.
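
    A minimal Kotlin sketch of how such layered negative test cases can be derived from a single valid message is below. The tag names and the message representation are purely illustrative and do not reflect actual SWIFT dialects or Exactpro codec implementations.

    ```kotlin
    fun main() {
        // A simplified, hypothetical tag-value message (not a real SWIFT layout).
        val validMessage = mapOf("20" to "TRN12345", "32A" to "200701USD1000,00", "59" to "BENEFICIARY")

        // Perimeter level: a malformed payload targets incoming message validation
        // and the gateways' ability to cope with garbage under load.
        val malformed = validMessage.entries.joinToString(":") { "${it.key}=${it.value}" }.reversed()

        // Next level: a structurally correct message carrying a tag absent from the specification.
        val unknownTag = validMessage + ("999" to "UNEXPECTED")

        // Deeper level: correct, existing tags in a combination the specification does not allow
        // (here, hypothetically, a mandatory tag is dropped while the rest stay valid).
        val invalidCombination = validMessage - "32A"

        listOf(malformed, unknownTag, invalidCombination).forEach { println(it) }
    }
    ```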

    When testing, we work with real as well as synthetic data, and it is very important to use both. Real data is good from the point of view of reflecting the real picture in production, but the constraints include data protection issues and the limitations which are inevitable with production data replay, such as limited coverage and the absence of edge scenarios. Synthetic data significantly helps with coverage, but there’s a risk of missing a critical real-life case.
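
    As a small illustration, a test data source can simply combine both kinds of input. The Order fields and values below are hypothetical; in practice, the replayed part would come from anonymised production captures.

    ```kotlin
    data class Order(val id: String, val quantity: Long, val price: Double)

    // Stand-in for replayed (anonymised) production data: realistic, but limited coverage.
    fun replayedOrders(): List<Order> = listOf(
        Order("PROD-1", 100, 101.25),
        Order("PROD-2", 250, 99.80)
    )

    // Synthetic edge cases: broaden coverage with boundaries the replay will never contain.
    fun syntheticEdgeCases(): List<Order> = listOf(
        Order("SYN-ZERO", 0, 101.25),            // zero quantity
        Order("SYN-MAX", Long.MAX_VALUE, 0.01)   // extreme size, minimal price
    )

    fun main() {
        // Both sources feed the same test run: realism from replay, coverage from synthesis.
        (replayedOrders() + syntheticEdgeCases()).forEach { println(it) }
    }
    ```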

    A small addition to my comments on pre- and post-assertions in our test framework:

    As this is more common for unit testing (which is not our focus), we have quite a specific understanding of pre- and post-assertions in our work. When working with multicomponent systems, it's important to launch and tune the system as a whole, so the pre-assertion in our case would be to reset the system prior to test runs, infusing it with reference data, both valid and invalid, according to the testing purposes. Post-assertions in our world are about post-transactional verification and data reconciliation testing. Here we use real-time monitoring and passive testing approaches, which, again, are complementary to each other: while real-time monitoring is beneficial from the point of view of alerting and faster action on a problem, passive testing allows for post-transactional processing of a lot more data, working with bigger batches.
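
    A rough Kotlin sketch of that shape is below: reset the whole system and load reference data before the run, then reconcile expected against actual transactions afterwards. All interfaces and names here are hypothetical placeholders rather than our actual framework API.

    ```kotlin
    interface SystemUnderTest {
        fun reset()
        fun loadReferenceData(records: List<String>)
        fun executedTransactions(): List<String>
    }

    fun runScenario(
        sut: SystemUnderTest,
        referenceData: List<String>,
        expected: List<String>,
        scenario: () -> Unit
    ) {
        // "Pre-assertion": bring the whole multicomponent system to a known state,
        // infusing it with reference data (valid and invalid, per the test purpose).
        sut.reset()
        sut.loadReferenceData(referenceData)

        scenario()

        // "Post-assertion": post-transactional verification / data reconciliation.
        val actual = sut.executedTransactions()
        val missing = expected - actual.toSet()
        val unexpected = actual - expected.toSet()
        check(missing.isEmpty() && unexpected.isEmpty()) {
            "Reconciliation failed: missing=$missing, unexpected=$unexpected"
        }
    }
    ```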

    Regarding OpenID Connect authentication interfaces in testing, the answer is no - that stems from the implementations we’ve encountered so far.

    Elaborating more on Mani’s question about testing of CDM, it turns out we did look at the model itself during the DerivHack 2019 participation that I mentioned. Our experience with CDM back then in October last year showed that the model is quite good as a framework for requirement definition, but the Rosetta portal had room for improvement (as of then). I am attaching a slide deck presented on the final day of the hackathon.

    And one more additional comment on Junji’s question about most common mistakes / issues we encounter in testing DLT.

    As systems depend a lot on reference data and overall configuration, incorrect settings are a common cause of major failures. Yet another source of errors originates from specifications: if something is not explicitly described in the specs, that doesn't mean it won't happen. This is very common in software testing, though the phenomenon is much broader - as per Taleb's book on the concept of randomness, people tend to underestimate the importance of rare events.

    The latter reference leads me to yet another comment, on Vipin’s question about black swans and chaos engineering - first, I forgot to refer to our latest publication on this; and second, adding to everything I said about preparing for disaster, we do not do chaos testing in production, for obvious reasons. But we do push systems to their limits in pre-production environments, killing components under load to make sure monitoring is in place and there's no single point of failure. Though people tend to think this is unnecessary, the approach has proved itself in discovering critical resilience issues.
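
    A hypothetical Kotlin sketch of such a pre-production check is below: keep load running, kill one component, and verify that monitoring reacts and processing continues. Every interface and name here is an illustrative assumption, not an actual Exactpro or chaos-engineering API.

    ```kotlin
    interface TestEnvironment {
        fun startLoad(messagesPerSecond: Int)
        fun killComponent(name: String)
        fun alertsRaised(): List<String>
        fun currentThroughput(): Int   // messages processed per second right now
    }

    fun resilienceCheck(env: TestEnvironment) {
        env.startLoad(messagesPerSecond = 10_000)
        env.killComponent("matching-engine-replica-1")   // hypothetical component name
        Thread.sleep(5_000)                              // give failover and monitoring time to react

        // Monitoring must notice the failure...
        check(env.alertsRaised().isNotEmpty()) { "No alert raised after the component was killed" }
        // ...and the remaining components must keep the system running (no single point of failure).
        check(env.currentThroughput() > 0) { "Throughput dropped to zero after a single component failure" }
    }
    ```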

    Many thanks to Jim Mason, Mani Pillai, Junji Katto, and Vipin Bharathan for the interesting questions (it was a pleasure to go through them once again with the team), and thanks again to the group for having me.