Managing and sharing data is increasingly important to advancing research, but with that comes challenges.
The volume of data generated by research has grown rapidly in recent years, raising questions about the most efficient ways to manage and share them. In response, researchers and funders have developed initiatives to ultimately facilitate the exploitation of results. One approach promotes open access to data, which guarantees immediate online availability of research results without access barriers. Another initiative encourages researchers to make data findable, accessible, interoperable and reusable, or FAIR¹.
While these initiatives promote effective data management and improve the reuse of research results, data often remain scattered across a large number of publications or repositories, in a wide variety of formats and with varying levels of metadata. Additionally, the types of dataset that are relevant vary from one discipline to another. These characteristics limit the extent to which data can be accessed and reused, even when they largely conform to open access or FAIR principles.
Data management and sharing challenges are seen as particularly limiting in growing research areas, such as perovskite photovoltaics. This field offers an interesting case study of the different approaches that the research community has promoted over the years in terms of measurement, reporting and data accessibility, often drawing inspiration from other fields. For example, the community has encouraged the creation of a checklist for reporting technical and procedural information related to solar cell characterization² and, building on established protocols for organic photovoltaics, of standards for assessing and reporting device stability³.
The growing amount of data being generated for perovskite solar cells has also prompted researchers to call for the development of a database that collects research results³,⁴. Such a database should not only be accessible to people in the field, but also easily readable by machines. The need for a unified database will likely become even more pressing with the advent of high-throughput automated approaches and machine learning methods capable of developing and characterizing materials and devices at a faster rate⁵.
Some attempts have been made over the years to collect data from peer-reviewed articles and make them available to others⁶. In an even more ambitious endeavor, researchers working on emerging solar cell technologies, including perovskite devices, have launched the Emerging Photovoltaics Reports initiative and set up a database collecting key performance data (https://emerging-pv.org/). Now, Jesper Jacobsson and Eva Unger have brought together a large number of researchers in the field of perovskite photovoltaics to create a dedicated and even more comprehensive platform known as the Perovskite Database project (https://www.perovskitedatabase.com/).
The researchers manually collected perovskite solar cell data from more than 15,000 peer-reviewed publications — nearly all of the research data on perovskite photovoltaics published to date. The data was systematically formatted and collated into a publicly accessible, web-hosted database. The researchers also developed graphical tools to analyze, filter and visualize the data. The project, including examples of uses, is featured in a resource article in this issue.
The database is intended to be an evolving project with researchers in the field invited to help expand the dataset and upload future data. To maintain format consistency, Jacobsson et al. established a protocol for reporting new data. This should overcome the challenges of disseminating data using different repositories.
The Perovskite Database project certainly has the potential to become a key resource in the field of perovskite photovoltaics. It provides a comprehensive overview of the state of the field, helping researchers identify knowledge gaps, perform meta-analyses, design new experiments, and more. Other areas of energy research could benefit from similar initiatives. To facilitate this, Jacobsson and colleagues have made the code behind the data analysis tool open source.
The researchers developed the database with a view to implementing automated machine learning tools. There are, however, a few aspects that need to be considered in data sharing when it comes to this type of application.
For example, as mentioned in previous articles³,⁴ and by Marina Leite in her News & Views, access to the results of failed experiments is as important as access to those of successful ones for training machine learning algorithms. However, these data are generally not available.
The data must also be machine-accessible so that computers can use them autonomously. At the time of publication, the Perovskite Database and its interactive tools are hosted on Materials Zone (a web-based platform for materials science data management), which provides access to the resources on request. This could partially limit the future interoperability of the database, because users must request access before the data can be read by machines.
This feature of the project sparked discussion within the community. As a result, Jacobsson and his colleagues are partnering with other researchers to work on expanding access to the database and its tools.
Making data accessible to machines involves a steep learning curve. It is very encouraging to see researchers collaborating to overcome such obstacles with efforts like the Perovskite Database. We are sure that a similarly constructive approach will help identify and resolve other limitations in the future. The lessons learned in the process will provide others with useful guidelines for setting up their own database projects.
1. Wilkinson, M. et al. Sci. Data 3, 160018 (2016).
2. Nat. Mater. 14, 1073 (2015).
3. Khenkin, M. V. et al. Nat. Energy 5, 35–49 (2020).
4. Howard, J. M., Tennyson, E. M., Neves, B. R. A. & Leite, M. S. Joule 3, 325–337 (2018).
5. Chen, S. et al. Adv. Energy Mater. 8, 1701543 (2018).
6. Odabaşi, Ç. & Yildirim, R. Nano Energy 56, 770–791 (2019).
Cite this article
Nat. Energy 7, 1 (2022). https://doi.org/10.1038/s41560-022-00980-4