The Earlham Institute is developing new ways to analyse and manage large biological datasets. By combining multiple measurements of crops, drawn from imagery, biochemical data and genetic data, it is able to predict which genes need to be selected to increase yields or protect against crop disease.

Professor Neil Hall, Director of the Earlham Institute, explains that the overarching aim is to take complex data and deliver something that is directly useful to breeders.
“A good example is our analysis showing how local conditions affect the regulation of specific genes, which is important for understanding how different varieties can be modified to improve their resilience to disease or adverse environmental conditions.”
Reducing disease risk
Researchers at the lab are also looking at how crop diseases (pathogens) interact with those found on wild species.
“Our theory, which is supported by preliminary data, is that wild sea beet provides a natural reservoir for pathogens of sugar beet. As wild plants have more genetic diversity than crop species they can support a more diverse population of pathogens. Therefore, diversification of pathogens in the wild can generate new strains that can invade crop species.
“Having an awareness of this will help improve crop protection. For example, the wild species may have genetic traits that make them more resilient to these new disease risks.
“So, understanding population genetics of pathogens is very important as it helps us to predict outbreaks and also to monitor the spread and emergence of important traits such as drug resistance, pesticide resistance and virulence.
“A great example of this outside agriculture is coronavirus. Understanding parameters such as population size, mutation rate and reproduction rate is vital in predicting the emergence of new variants, and we have seen how these variants have influenced the dynamics of the pandemic and the concerns around immune evasion. Genomics of coronavirus has played a vital role in understanding how the pathogen is adapting.
“Likewise, in agriculture, understanding population sizes, ecological niches and host adaptation will be important for understanding how pathogens may adapt in the future.”
New tools for disease management
Professor Hall says that the knowledge emerging from the Earlham Institute will give us tools to improve disease management.
“If we were able to sequence all of the fungal effector genes in a population and see how they interact with resistance genes, we could perhaps make predictions about which strains would dominate in future seasons. This would provide indications of how they are distributed, and therefore where to focus crop protection interventions and which crops to breed for future seasons, based on the dynamics of pathogens in the wild.”
Making AI possible
This type of large-scale population work depends on the sharing of data, which in turn is based on the FAIR principles.
“FAIR means Findable, Accessible, Interoperable and Reusable. It’s more specific than saying ‘open data’, which simply asks whether the data is freely available. FAIR is about good data management. To make data findable, it has to have the correct metadata attached to it. Making it accessible may involve authentication. For it to be reusable, it must be in a format that allows common tools to read it, and the metadata should be in a form that enables the reuse of the data in a meaningful way.
“FAIR data is what will make the use of AI possible, as it enables computational tools to understand data, how it was generated and what it represents.
“This is now an established principle in science, certainly among data specialists, but we are trying to socialise the idea so it becomes the norm. Without FAIR principles, data is rendered worthless.”
Latest advances in data management
Professor Neil Hall will join Antony Yousefian, Agri-Tech Director at Bardsley England; Matthew Guinness, Head of Sustainability at Hummingbird Technologies; Jon Kemp, CEO of Livetrace; and Derek Thompson, CEO of Consus Fresh, at the Agri-TechE event ‘Data Management – More Than A Numbers Game’ on Tuesday 13th April, from 2:00 pm to 5:00 pm.
Notes
The Earlham Institute hosts the UK node of a European infrastructure called ELIXIR (www.elixir.org).
Recent breakthroughs
https://www.earlham.ac.uk/newsroom/epic-genetic-hidden-story-wheat