Genomic time machine: From sponge microbiome, insights into evolutionary past


The red barrel sponge, Xestospongia muta — pictured here during a collection trip in Belize — harbors a dense and diverse microbial community that evolved repeatedly during sponge evolution and is linked to increased predation defense. Credit: Sabrina Pankey.

Sponges in coral reefs, less flashy than their coral neighbors but important to the overall health of reefs, are among the earliest animals on the planet. New research from UNH peers into coral reef ecosystems with a novel approach to understanding the complex evolution of sponges and the microbes that live in symbiosis with them. With this “genomic time machine,” researchers can reconstruct aspects of reef and ocean ecosystems across hundreds of millions of years of dramatic evolutionary change.

“This study shows how microbiomes have evolved in a group of organisms over 700 million years old,” says Sabrina Pankey, a postdoctoral researcher at UNH and lead author of the study, published recently in the journal Nature Ecology & Evolution. “Sponges are increasing in abundance on reefs, and they play an enormous role in nutrient fixation.”

The significance of the work goes beyond sponges, though, providing a new approach to understanding the past based on genomics. “If we can reconstruct the evolutionary history of complex microbial communities like this, we can say a lot about the Earth’s past,” says study co-author David Plachetzki, associate professor of molecular, cellular and biomedical sciences at UNH. “Research like this could reveal aspects of the chemical composition of the Earth’s oceans going back to before modern coral reefs even existed, or it could provide insights on the tumult that marine ecosystems experienced in the aftermath of the greatest extinction in history, which took place about 252 million years ago.”

The researchers characterized almost 100 sponge species from across the Caribbean, using a machine-learning method to model the identity and abundance of every member of each sponge’s unique microbiome, the community of bacteria and other microbes that live within it in symbiosis. They found two distinct microbiome compositions that led to different strategies sponges used for feeding (sponges capture nutrients by pumping water through their bodies) and protecting themselves against predators—even among species that grew side by side on the same reef.
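As a rough sense of how such a community-level analysis might be set up, the sketch below clusters species by their microbial abundance profiles into two community types. The random data, the centered log-ratio normalization, and the Gaussian mixture model are illustrative assumptions, not the authors' actual pipeline.

    # Minimal sketch: cluster sponge species by microbiome composition into two
    # community types (cf. the two compositions reported in the study).
    # The random abundance profiles below are stand-ins for real survey data.
    import numpy as np
    from sklearn.mixture import GaussianMixture

    rng = np.random.default_rng(0)
    n_species, n_taxa = 100, 500
    profiles = rng.gamma(shape=0.5, scale=1.0, size=(n_species, n_taxa))

    # Centered log-ratio transform, a common normalization for compositional data.
    clr = np.log(profiles + 1e-6)
    clr -= clr.mean(axis=1, keepdims=True)

    gmm = GaussianMixture(n_components=2, covariance_type="diag", random_state=0)
    labels = gmm.fit_predict(clr)
    print("species per community type:", np.bincount(labels))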

“The types of symbiotic communities we describe in this paper are very complex, yet we can show they evolved independently multiple times,” says Plachetzki.

And, adds Pankey, “there’s something very specific about what these microbial communities are doing … sponges dozens of times have decided that this diverse arrangement of microbes works for them.”

Leveraging this new genomic approach, the researchers found that the origin of one of these distinct microbiomes, which had a high microbial abundance (HMA) of more than a billion microbes per gram of tissue, occurred at a time when the Earth’s oceans underwent a significant change in biogeochemistry coincident with the origins of modern coral reefs.

While machine learning and genomic sequencing generated the findings Plachetzki calls “a tour de force of microbial barcode sequencing,” this research began far from the lab, in the warm waters of the Caribbean.

“We dove for all 1,400 of these samples,” says Pankey, who went on five expeditions in 2017 and 2018 to collect sponges. “It was a monstrous collection,” she adds, acknowledging that SCUBA diving in the Caribbean has its rewards. The duo credits co-author Michael Lesser, UNH research professor emeritus, for establishing field work techniques, and their co-authors from the University of Mississippi and the Universidad Nacional del Comahue in Argentina for assisting with sponge collection and molecular identification. Former graduate student Keir Macartney also contributed to the study.



More information:
M. Sabrina Pankey et al, Cophylogeny and convergence shape holobiont evolution in sponge–microbe symbioses, Nature Ecology & Evolution (2022). DOI: 10.1038/s41559-022-01712-3

Citation:
Genomic time machine: From sponge microbiome, insights into evolutionary past (2022, April 14)
retrieved 14 April 2022
from https://phys.org/news/2022-04-genomic-machine-sponge-microbiome-insights.html



Scientists use machine learning to predict smells based on brain activity in worms


The neurons of the worm function differently when tasting salt. Each circle represents a neuron, and the connections between the circles are synapses. The scientists used graph theory to group some neurons into modules, which are identified by their colors. The number of modules was reduced to 5 (from 7) when the salt stimulus was presented to the worm. This signifies that these neurons are particularly important when the animal tastes salt. Credit: Salk Institute

It sounds like a party trick: scientists can now look at the brain activity of a tiny worm and tell you which chemical the animal smelled a few seconds before. But the findings of a new study, led by Salk Associate Professor Sreekanth Chalasani, are more than just a novelty; they help the scientists better understand how the brain functions and integrates information.

“We found some unexpected things when we started looking at the effect of these sensory stimuli on activity and connections within the worms’ brains,” says Chalasani, a member of the Molecular Neurobiology Laboratory and senior author of the new work, published in the journal PLOS Computational Biology on November 9, 2021.

Chalasani is interested in how, at a cellular level, the brain processes information from the outside world. Researchers can’t simultaneously track the activity of each of the 86 billion neurons in a living human—but they can do this in the microscopic worm Caenorhabditis elegans, which has only 302 neurons. Chalasani explains that in a simple animal like C. elegans, researchers can monitor individual neurons as the animal is carrying out actions. That level of resolution is not currently possible in humans or even mice.

Chalasani’s team set out to study how C. elegans neurons react to smelling each of five different chemicals: benzaldehyde, diacetyl, isoamyl alcohol, 2-nonanone, and sodium chloride. Previous studies have shown that C. elegans can differentiate these chemicals, which, to humans, smell roughly like almond, buttered popcorn, banana, cheese, and salt. And while researchers know the identities of the small handful of sensory neurons that directly sense these stimuli, Chalasani’s group was more interested in how the rest of the brain reacts.

The researchers engineered C. elegans so that each of their 302 neurons contained a fluorescent sensor that would light up when the neuron was active. Then, they watched under a microscope as they exposed 48 different worms to repeated bursts of the five chemicals. On average, 50 or 60 neurons activated in response to each chemical.

By looking at basic properties of the datasets—such as how many cells were active at each time point—Chalasani and his colleagues couldn’t immediately differentiate between the different chemicals. So, they turned to a mathematical approach called graph theory, which analyzes the collective interactions between pairs of cells: When one cell is activated, how does the activity of other cells change in response?
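As an illustration of that kind of pairwise analysis, the sketch below builds a correlation graph between neurons and groups them into modules, loosely echoing the modules in the figure caption. The synthetic activity matrix, the edge threshold, and the greedy modularity step are assumptions, not the study's exact method.

    # Illustrative sketch: build a functional graph from pairwise activity
    # correlations and group connected neurons into modules. The data, the
    # threshold, and the community-detection choice are stand-ins.
    import numpy as np
    import networkx as nx

    rng = np.random.default_rng(0)
    activity = rng.normal(size=(302, 500))        # 302 neurons x 500 time points
    activity[:50] += rng.normal(size=(1, 500))    # give one group a shared signal

    corr = np.corrcoef(activity)                  # neuron-by-neuron correlations

    G = nx.Graph()
    G.add_nodes_from(range(corr.shape[0]))
    threshold = 0.3                               # arbitrary cutoff for drawing an edge
    for i in range(corr.shape[0]):
        for j in range(i + 1, corr.shape[0]):
            if abs(corr[i, j]) > threshold:
                G.add_edge(i, j, weight=abs(corr[i, j]))

    # Drop neurons with no strong partners, then find modules by modularity.
    G.remove_nodes_from(list(nx.isolates(G)))
    modules = nx.algorithms.community.greedy_modularity_communities(G, weight="weight")
    print(f"{len(modules)} modules detected among {G.number_of_nodes()} connected neurons")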

This approach revealed that whenever C. elegans was exposed to sodium chloride (salt), there was first a burst of activity in one set of neurons—likely the sensory neurons—but then, about 30 seconds later, triplets of other neurons began to strongly coordinate their activities. These same distinct triplets weren’t seen after the other stimuli, letting the researchers accurately identify—based only on the brain patterns—when a worm had been exposed to salt.

“C. elegans seems to have attached a high value to sensing salt, using a completely different circuit configuration in the brain to respond,” says Chalasani. “This might be because salt often represents bacteria, which is food for the worm.”

The researchers next used a machine-learning algorithm to pinpoint other, more subtle, differences in how the brain responded to each of the five chemicals. The algorithm was able to learn to differentiate the neural response to salt and benzaldehyde but often confused the other three chemicals.
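A schematic of that decoding step might look like the following: summarize each trial as a feature vector of network activity and ask a classifier to recover which chemical was presented. The synthetic features and the choice of logistic regression are placeholders rather than the study's algorithm.

    # Schematic decoding example: predict the presented chemical from a per-trial
    # summary of network activity, then inspect which stimuli get confused.
    # The random features and the classifier choice are illustrative stand-ins.
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_predict
    from sklearn.metrics import confusion_matrix

    rng = np.random.default_rng(1)
    chemicals = ["benzaldehyde", "diacetyl", "isoamyl alcohol", "2-nonanone", "NaCl"]

    # Fake dataset: 48 worms x 5 stimuli, each trial summarized by 302 features
    # (e.g., mean response per neuron).
    X = rng.normal(size=(48 * 5, 302))
    y = np.repeat(np.arange(5), 48)

    clf = LogisticRegression(max_iter=2000)
    pred = cross_val_predict(clf, X, y, cv=5)

    # Rows/columns follow the order of `chemicals`; off-diagonal counts show
    # which stimuli the decoder mixes up.
    print(confusion_matrix(y, pred))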

“Whatever analysis we’ve done, it’s a start but we’re still only getting a partial answer as to how the brain discriminates these things,” says Chalasani.

Still, he points out that the way the team approached the study—looking at the brain’s network-wide response to a stimulus, and applying graph theory, rather than just focusing on a small set of sensory neurons and whether they’re activated—paves the way toward more complex and holistic studies of how brains react to stimuli.

The researchers’ ultimate goal, of course, isn’t to read the minds of microscopic worms, but to gain a deeper understanding of how humans encode information in the brain and what happens when this goes awry in sensory processing disorders and related conditions like anxiety, attention deficit hyperactivity disorders (ADHD), autism spectrum disorders and others.

The other authors of the new study were Saket Navlakha of Cold Spring Harbor Laboratory and Javier How of UC San Diego.



More information:
Javier J. How et al, Neural network features distinguish chemosensory stimuli in Caenorhabditis elegans, PLOS Computational Biology (2021). DOI: 10.1371/journal.pcbi.1009591

Citation:
Scientists use machine learning to predict smells based on brain activity in worms (2021, November 19)
retrieved 21 November 2021
from https://phys.org/news/2021-11-scientists-machine-based-brain-worms.html



Machine learning IDs mammal species with the potential to spread SARS-CoV-2


Credit: Pixabay/CC0 Public Domain

Back and forth transmission of SARS-CoV-2 between people and other mammals increases the risk of new variants and threatens efforts to control COVID-19. A new study, published today in Proceedings of the Royal Society B, used a novel modelling approach to predict the zoonotic capacity of 5,400 mammal species, extending predictive capacity by an order of magnitude. Of the high risk species flagged, many live near people and in COVID-19 hotspots.

A major bottleneck to predicting high-risk species is limited data on ACE2, the receptor that SARS-CoV-2 binds to in animals. ACE2 allows SARS-CoV-2 to enter host cells, and is found in all major vertebrate groups. It is likely that all vertebrates have ACE2 receptors, but sequences were only available for 326 species.

To overcome this obstacle, the team developed a modeling approach that combined data on the biological traits of 5,400 mammal species with available data on ACE2. The goal: to identify mammal species with high ‘zoonotic capacity’ – the ability to become infected with SARS-CoV-2 and transmit it to other animals and people. The method they developed could help extend predictive capacity for disease systems beyond COVID-19.

Co-lead author Ilya Fischhoff, a postdoctoral associate at Cary Institute of Ecosystem Studies, comments, “SARS-CoV-2, the virus that causes COVID-19, originated in an animal before making the jump to people. Now, people have caused spillback infections in a variety of mammals, including those kept in farms, zoos, and even our homes. Knowing which mammals are capable of re-infecting us is vital to preventing spillback infections and dangerous new variants.”

When a virus passes from people to animals and back to people, it is called secondary spillover. This phenomenon can accelerate the establishment of new variants in humans that are more virulent and less responsive to vaccines. Secondary spillover of SARS-CoV-2 has already been reported among farmed mink in Denmark and the Netherlands, where it has led to at least one new SARS-CoV-2 variant.

Senior author and Cary Institute disease ecologist, Barbara Han, says, “Secondary spillover allows SARS-CoV-2 established in new hosts to transmit potentially more infectious strains to people. Identifying mammal species that are efficient at transmitting SARS-CoV-2 is an important step in guiding surveillance and preventing the virus from continually circulating between people and other animals, making disease control even more costly and difficult.”

Binding to ACE2 receptors is not always enough to facilitate SARS-CoV-2 viral replication, shedding, and onward transmission. The team trained their models on a conservative binding strength threshold informed by published ACE2 amino acid sequences of vertebrates, analyzed using a software tool called HADDOCK (High Ambiguity Driven protein-protein DOCKing). This software scored each species on predicted binding strength; stronger binding likely promotes successful infection and viral shedding.
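Put together, the workflow described in the last two paragraphs might be sketched as follows: species with ACE2 sequences are labeled by whether their predicted binding strength clears a conservative threshold, a trait-based model is trained on that labeled subset, and zoonotic capacity is then predicted for the thousands of mammals without sequence data. The synthetic data, the cutoff value, and the gradient-boosted model below are assumptions for illustration, not the authors' code.

    # Hedged sketch of the general workflow: threshold HADDOCK-style binding
    # scores to label the ~326 sequenced species, train on traits, predict the rest.
    # All numbers, the cutoff, and the model choice are illustrative stand-ins.
    import numpy as np
    from sklearn.ensemble import GradientBoostingClassifier

    rng = np.random.default_rng(0)
    n_all, n_labeled, n_traits = 5400, 326, 20

    traits = rng.normal(size=(n_all, n_traits))            # ecological/life-history traits
    binding_score = rng.normal(-100, 15, size=n_labeled)   # lower score = stronger binding

    strong_binder = binding_score <= -110.0                # assumed conservative cutoff

    model = GradientBoostingClassifier(random_state=0)
    model.fit(traits[:n_labeled], strong_binder)

    # Rank the species that lack ACE2 sequence data by predicted zoonotic capacity.
    risk = model.predict_proba(traits[n_labeled:])[:, 1]
    top = np.argsort(risk)[::-1][:10] + n_labeled
    print("highest-risk species indices (synthetic):", top)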

Co-lead author and Cary Institute postdoctoral analyst, Adrian Castellanos, says, “The ACE2 receptor performs important functions and is common among vertebrates. It’s likely that it evolved in animals alongside other ecological and life history traits. By comparing biological traits of species known to have the ACE2 receptor with traits of other mammal species, we can make predictions about their capacity to transmit SARS-CoV-2.”

This combined modeling approach predicted the zoonotic capacity of mammal species known to transmit SARS-CoV-2 with 72% accuracy, and it identified numerous additional mammal species with the potential to transmit the virus. Predictions matched observed results for white-tailed deer, mink, raccoon dogs, snow leopards, and others. The model found that the riskiest mammal species were often those that live in disturbed landscapes and in close proximity to people—including domestic animals, livestock, and animals that are traded and hunted.

The top 10% of high-risk species spanned 13 orders. Primates were predicted to have the highest zoonotic capacity and strongest viral binding among mammal groups. Water buffalo, bred for dairy and farming, had the highest risk among livestock. The model also predicted high zoonotic potential among live-traded mammals, including macaques, Asiatic black bears, jaguars, and pangolins—highlighting the risks posed by live markets and wildlife trade.

SARS-CoV-2 also presents challenges for wildlife conservation. Infection has already been confirmed in Western lowland gorillas. For high-risk charismatic species like mountain gorillas, spillback infection could occur through ecotourism. Grizzly bears, polar bears, and wolves, all in the 90th percentile for predicted zoonotic capacity, are frequently handled by biologists for research and management.

Han explains, “Our model is the only one that has been able to make risk predictions across nearly all mammal species. Every time we hear about a new species being found SARS-CoV-2 positive, we revisit our list and find they are ranked high. Snow leopards had a risk score around the 80th percentile. We now know they are one of the wildlife species that could die from COVID-19.”

People working in close proximity with high-risk mammals should take extra precautions to prevent SARS-CoV-2 spread. This includes prioritizing vaccinations for veterinarians, zookeepers, livestock handlers, and other people in regular contact with animals. Findings can also guide targeted vaccination strategies for at-risk mammals.

Han concludes, “We found that the riskiest species are often the ones that live alongside us. Targeting these species for additional lab validation and field surveillance is critical. We should also explore underutilized data sources like natural history collections, to fill data gaps about animal and pathogen traits. More efficient iteration between computational predictions, lab analysis, and animal surveillance will help us better understand what enables spillover, spillback, and secondary transmission—insight that is needed to guide zoonotic pandemic response now and in the future.”



More information:
Predicting the zoonotic capacity of mammals to transmit SARS-CoV-2, Proceedings of the Royal Society B: Biological Sciences (2021). DOI: 10.1098/rspb.2021.1651



Is your machine learning training set biased? How to develop new drugs based on merged datasets


The authors combined proprietary (GSK) and published (CCDC) datasets to better train machine learning (ML) models for drug discovery. Credit: Alex Moldovan.

Polymorphs are crystal forms of the same molecule that have different molecular packing arrangements despite identical chemical compositions. In a recent paper, researchers at GlaxoSmithKline (GSK) and the Cambridge Crystallographic Data Centre (CCDC) combined their proprietary (GSK) and published (CCDC) datasets to better train machine learning (ML) models to predict stable polymorphs for use in new drug candidates.

What are the key differences between the CCDC and GSK datasets?

CCDC curates and maintains the Cambridge Structural Database (CSD). For more than half a century, scientists all over the world have contributed published, experimental crystal structures to the CSD, which now has over 1.1 million structures. The paper’s authors used a drug subset from the CSD combined with structures from GSK. The GSK structures were collected at different stages of the pharmaceutical pipeline and are not limited to marketed products. Co-author Dr. Jason Cole, senior research fellow on CCDC’s research and development team, explained why structures gathered at different stages of the drug discovery pipeline are so important.

“In early-stage drug discovery, a crystal can help to rationalize conformational effects, for example, or characterize the chemistry of a new chemical entity where other techniques have led to ambiguity,” Cole said. “Later in the process, when a new chemical entity is studied as a candidate molecule, crystal structures are critical as they inform form selection and can later aid in overcoming formulation and tabletting issues.”

This information can help researchers prioritize their efforts—saving time and potentially lives down the road.

“By understanding a range of crystal structures, scientists can also assess the risk of a given form being long-term unstable,” Cole said. “A full characterization of the structural landscape leads to confidence in taking a form forward.”

How do ML models in pharmaceutical science benefit from multiple datasets?

Industrial data sets reflect more than just science; they reflect cultural choices within a given organization.

“You will only find co-crystals if you look for co-crystals,” Cole said, as an example. “Most companies prefer to formulate a free, or unbound, drug. One can assume that the types of structures in an industrial set reflect conscious decisions to search for forms of given types, whereas fewer bounds are placed on the researchers who contribute to the CSD.”

ML models benefit from two key things: data volume and data specificity. That’s why coupling the volume and variety of data in the CSD with proprietary data sets is so helpful.

“Large amounts of data lead to more confident predictions,” Cole said. “Data that are most directly relevant to the problem lead to more accurate predictions. In the predictions that use CCDC software, we select a subset of the most relevant entries that is large enough to give confidence. The GSK set is bound to have highly relevant compounds to other compounds in their commercial portfolio. So the model-building software can use these.”

Industrial researchers working with highly relevant data can run into issues when they don’t have enough to generate confident models.

“Consider that CSD software typically picks around two thousand structures from the 1.1 million in the CSD,” Cole said. “The industrial set is tiny by comparison, but you could pick, say, 40 or 50 highly relevant structures. You’d have insufficient data to build a good model with that alone, but the added compounds from the CSD supplement the data set. In essence, by including the GSK and CSD sets we get the best of both worlds: all the highly relevant industrial structures and a set of quite relevant CSD structures together to build a high-quality model.”
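In code, the pooling Cole describes might look roughly like the sketch below: a small, highly relevant in-house set is combined with a much larger public subset before fitting a model. The descriptors, labels, sizes, and random-forest choice are invented placeholders, not CCDC or GSK software.

    # Rough illustration of the merged-training-set idea: pool a small proprietary
    # set with a larger public subset before model fitting. Everything here is a
    # synthetic stand-in; with real descriptors, pooling is what makes a reliable
    # model possible when the in-house set alone is too small.
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(0)
    n_public, n_private, n_desc = 2000, 50, 30

    X_public = rng.normal(size=(n_public, n_desc))
    y_public = rng.integers(0, 2, n_public)
    X_private = rng.normal(size=(n_private, n_desc))   # highly relevant in-house structures
    y_private = rng.integers(0, 2, n_private)

    X = np.vstack([X_public, X_private])
    y = np.concatenate([y_public, y_private])

    model = RandomForestClassifier(n_estimators=300, random_state=0)
    print("private-only CV accuracy:", cross_val_score(model, X_private, y_private, cv=5).mean())
    print("combined-set CV accuracy:", cross_val_score(model, X, y, cv=5).mean())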

Why do polymorphs present a risk to the pharmaceutical industry?

The different packing arrangements mean that one polymorph might be more suited for therapeutic delivery, while another form of the same compound might not. Researchers use databases to make knowledge-based predictions about whether a potential new drug takes a good, stable form that manufacturers can make, store, and deliver in a therapeutic manner. The authors at GSK and CCDC completed a robust analysis of the small-molecule crystal structures, determined by X-ray diffraction, collected by GSK and its heritage companies over the past 40 years. They then combined those results with a drug subset of structures from CCDC’s CSD, which contains over 1.1 million small-molecule organic and metal-organic crystal structures from researchers all over the world.



More information:
Leen N. Kalash et al, First global analysis of the GSK database of small molecule crystal structures, CrystEngComm (2021). DOI: 10.1039/D1CE00665G

Provided by
CCDC – Cambridge Crystallographic Data Centre




Machine learning model doubles accuracy of global landslide ‘nowcasts’


Image shows a map of potential landslide risk output by NASA’s Landslide Hazard Assessment Model (LHASA) in June 2021. Red indicates the highest risk and dark blue indicates the lowest risk. Credit: NASA

Every year, landslides—the movement of rock, soil, and debris down a slope—cause thousands of deaths, billions of dollars in damages, and disruptions to roads and power lines. Because terrain, characteristics of the rocks and soil, weather, and climate all contribute to landslide activity, accurately pinpointing areas most at risk of these hazards at any given time can be a challenge. Early warning systems are generally regional—based on region-specific data provided by ground sensors, field observations, and rainfall totals. But what if we could identify at-risk areas anywhere in the world at any time?

Enter NASA’s Global Landslide Hazard Assessment (LHASA) model and mapping tool.

LHASA Version 2, released last month along with corresponding research, is a machine-learning-based model that analyzes a collection of individual variables and satellite-derived datasets to produce customizable “nowcasts.” These timely and targeted nowcasts are estimates of potential landslide activity in near-real time for each 1-square-kilometer area between the poles. The model factors in the slope of the land (higher slopes are more prone to landslides), distance to geologic faults, the makeup of rock, past and present rainfall, and satellite-derived soil moisture and snow mass data.

“The model processes all of this data and outputs a probabilistic estimate of landslide hazard in the form of an interactive map,” said Thomas Stanley, Universities Space Research Association scientist at NASA’s Goddard Space Flight Center in Greenbelt, Maryland, who led the research. “This is valuable because it provides a relative scale of landslide hazard, rather than just saying there is or is not landslide risk. Users can define their area of interest and adjust the categories and probability threshold to suit their needs.”

In order to “teach” the model, researchers input a table with all of the relevant landslide variables and many locations that have recorded landslides in the past. The machine learning algorithm takes the table and tests out different possible scenarios and outcomes, and when it finds the one that fits the data most accurately, it outputs a decision tree. It then identifies the errors in the decision tree and calculates another tree that fixes those errors. This process continues until the model has “learned” and improved 300 times.
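The loop described above, fitting a tree and then fitting the next tree to the errors of the current ensemble, repeated 300 times, is the standard gradient-boosted decision tree recipe. A generic sketch of that recipe could look like the following; the random feature table stands in for the real landslide variables, and this is not NASA's actual LHASA code.

    # Generic gradient-boosted tree sketch mirroring the training loop described
    # above: 300 rounds of trees, each fit to the errors of the ensemble so far.
    # The random feature table is a stand-in for the real landslide variables.
    import numpy as np
    from xgboost import XGBClassifier
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)
    n_cells = 5000
    X = np.column_stack([
        rng.uniform(0, 45, n_cells),     # slope (degrees)
        rng.uniform(0, 50, n_cells),     # distance to nearest fault (km)
        rng.integers(0, 8, n_cells),     # lithology class (numeric code)
        rng.gamma(2.0, 20.0, n_cells),   # recent rainfall (mm)
        rng.uniform(0, 1, n_cells),      # soil moisture fraction
        rng.uniform(0, 500, n_cells),    # snow mass (kg per square meter)
    ])
    y = rng.integers(0, 2, n_cells)      # 1 = landslide recorded in this cell

    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
    model = XGBClassifier(n_estimators=300, max_depth=4, learning_rate=0.1)
    model.fit(X_train, y_train)

    # Probabilistic output, analogous to a relative hazard score per 1-km grid cell.
    hazard = model.predict_proba(X_test)[:, 1]
    print(hazard[:5])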

“The result is that this version of the model is roughly twice as accurate as the first version of the model, making it the most accurate global nowcasting tool available,” said Stanley. “While the accuracy is highest—often 100%—for major landslide events triggered by tropical cyclones, it improved significantly across all inventories.”

Version 1, released in 2018, was not a machine learning model. It combined satellite precipitation data with a global landslide susceptibility map to produce its nowcasts. It made its predictions using one decision tree largely based on rainfall data from the preceding week and categorized each grid cell as low, moderate, or high risk.

This image shows a landslide “nowcast” for Nov. 18, 2020 during the passage of Hurricane Iota through Nicaragua and Honduras. Credit: NASA

“In this new version, we have 300 trees of better and better information compared with the first version, which was based on just one decision tree,” Stanley said. “Version 2 also incorporates more variables than its predecessor, including soil moisture and snow mass data.”

Generally speaking, soil can only absorb so much water before becoming saturated, and saturated soil, combined with other conditions, can pose a landslide risk. By incorporating soil moisture data, the model can discern how much water is already present in the soil and how much additional rainfall would push it past that threshold. Likewise, if the model knows the amount of snow present in a given area, it can factor in the additional water entering the soil as the snow melts. This data comes from the Soil Moisture Active Passive (SMAP) satellite, which is managed by NASA’s Jet Propulsion Laboratory in Southern California. It launched in 2015 and provides continuous coverage.
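As a toy numerical illustration of that reasoning (all values below are invented), the question the model is effectively asking is whether current soil moisture plus incoming water would cross a saturation threshold:

    # Toy illustration with invented numbers: does current soil moisture plus
    # expected additional water exceed the soil's saturation capacity?
    saturation_capacity_mm = 150.0   # assumed water the soil column can hold
    current_moisture_mm = 120.0      # e.g., informed by SMAP soil moisture
    forecast_rain_mm = 25.0
    snowmelt_mm = 10.0

    incoming_mm = forecast_rain_mm + snowmelt_mm
    if current_moisture_mm + incoming_mm > saturation_capacity_mm:
        print("soil likely saturated: elevated landslide hazard")
    else:
        print("below the saturation threshold")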

LHASA Version 2 also adds a new exposure feature that analyzes the distribution of roads and population in each grid cell to calculate the number of people or infrastructure exposed to landslide hazards. The exposure data is downloadable and has been integrated into the interactive map. Adding this type of information about exposed roads and populations vulnerable to landslides helps improve situational awareness and actions by stakeholders from international organizations to local officials.
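A toy version of that exposure calculation might sum population and road length over the grid cells whose nowcast probability exceeds a chosen cutoff; the arrays and the threshold below are made up.

    # Toy exposure calculation: count people and road kilometers in grid cells
    # whose nowcast probability exceeds an arbitrary "high hazard" cutoff.
    import numpy as np

    rng = np.random.default_rng(2)
    hazard_prob = rng.random((100, 100))             # nowcast probability per 1-km cell
    population = rng.integers(0, 5000, (100, 100))   # people per cell
    road_km = rng.random((100, 100)) * 3.0           # road length per cell (km)

    high_hazard = hazard_prob > 0.9                  # arbitrary cutoff

    print("people exposed:", population[high_hazard].sum())
    print("road km exposed:", round(road_km[high_hazard].sum(), 1))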

Building on years of research and applications, LHASA Version 2 was tested by the NASA Disasters program and stakeholders in real-world situations leading up to its formal release. In November 2020, when hurricanes Eta and Iota struck Central America within a span of two weeks, researchers working with NASA’s Earth Applied Sciences Disasters program used LHASA Version 2 to generate maps of predicted landslide hazard for Guatemala and Honduras. The researchers overlaid the model with district-level population data so they could better assess the proximity between potential hazards and densely populated communities. Disasters program coordinators shared the information with national and international emergency response agencies to provide better insight of the hazards to personnel on the ground.

While it is a useful tool for planning and risk mitigation purposes, Stanley says the model is meant to be used with a global perspective in mind rather than as a local emergency warning system for any specific area. However, future research may expand that goal.

“We are working on incorporating a precipitation forecast into LHASA Version 2, and we hope it will provide further information for advanced planning and actions prior to major rainfall events,” said Stanley. One challenge, Stanley notes, is obtaining a long-enough archive of forecasted precipitation data from which the model can learn.

In the meantime, governments, relief agencies, emergency responders, and other stakeholders (as well as the general public) have access to a powerful risk assessment tool in LHASA Version 2.



Citation:
Machine learning model doubles accuracy of global landslide ‘nowcasts’ (2021, June 10)
retrieved 10 June 2021
from https://phys.org/news/2021-06-machine



New take on machine learning helps us ‘scale up’ phase transitions


A correlation configuration (top left) is reduced using a newly developed block-cluster transformation (top right). Both the original and reduced configurations have an improved estimator technique applied to give configuration pairs of different size (bottom row). Using these training pairs, a CNN can learn to convert small patterns to large ones, achieving a successful inverse RG transformation. Credit: Tokyo Metropolitan University

Researchers from Tokyo Metropolitan University have enhanced “super-resolution” machine learning techniques to study phase transitions. They identified key features of how large arrays of interacting particles behave at different temperatures by simulating tiny arrays before using a convolutional neural network to generate a good estimate of what a larger array would look like using correlation configurations. The massive saving in computational cost may realize unique ways of understanding how materials behave.

We are surrounded by different states or phases of matter, i.e. gases, liquids, and solids. The study of phase transitions, how one phase transforms into another, lies at the heart of our understanding of matter in the universe, and remains a hot topic for physicists. In particular, the idea of universality, in which wildly different materials behave in similar ways thanks to a few shared features, is a powerful one. That’s why physicists study model systems, often simple grids of particles on an array that interact via simple rules. These models distill the essence of the common physics shared by materials and, amazingly, still exhibit many of the properties of real materials, like phase transitions. Due to their elegant simplicity, these rules can be encoded into simulations that tell us what materials look like under different conditions.

However, like all simulations, the trouble starts when we want to look at lots of particles at the same time. The computation time required becomes particularly prohibitive near phase transitions, where dynamics slows down, and the correlation length, a measure of how the state of one atom relates to the state of another some distance away, grows larger and larger. This is a real dilemma if we want to apply these findings to the real world: real materials generally always contain many more orders of magnitude of atoms and molecules than simulated matter.

That’s why a team led by Professors Yutaka Okabe and Hiroyuki Mori of Tokyo Metropolitan University, in collaboration with researchers at the Shibaura Institute of Technology and the Bioinformatics Institute of Singapore, has been studying how to reliably extrapolate smaller simulations to larger ones using a concept known as an inverse renormalization group (RG). The renormalization group is a fundamental concept in the understanding of phase transitions and led Kenneth Wilson to be awarded the 1982 Nobel Prize in Physics. Recently, the field met a powerful ally in convolutional neural networks (CNN), the same machine learning tool helping computer vision identify objects and decipher handwriting. The idea is to give an algorithm the state of a small array of particles and get it to estimate what a larger array would look like. There is a strong analogy to the idea of super-resolution images, where blocky, pixelated images are used to generate smoother images at a higher resolution.

Trends found from simulations of larger systems are faithfully reproduced by the trained CNNs for both Ising (left) and three-state Potts (right) models. (inset) Correct temperature rescaling is achieved using data at some arbitrary system size. Credit: Tokyo Metropolitan University

The team has been looking at how this is applied to spin models of matter, where particles interact with other nearby particles via the direction of their spins. Previous attempts have particularly struggled to apply this to systems at temperatures above a phase transition, where configurations tend to look more random. Now, instead of using spin configurations, i.e. simple snapshots of which direction the particle spins are pointing, they considered correlation configurations, where each particle is characterized by how similar its own spin is to that of other particles, specifically those which are very far away. It turns out correlation configurations contain more subtle cues about how particles are arranged, particularly at higher temperatures.

Like all machine learning techniques, the key is to be able to generate a reliable training set. The team developed a new algorithm called the block-cluster transformation for correlation configurations to reduce these down to smaller patterns. Applying an improved estimator technique to both the original and reduced patterns, they had pairs of configurations of different size based on the same information. All that’s left is to train the CNN to convert the small patterns to larger ones.
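A minimal sketch of that training setup, assuming pairs of reduced (coarse) and original (larger) correlation configurations, is shown below; the network architecture, lattice sizes, and random tensors are placeholders rather than the authors' model.

    # Minimal PyTorch sketch: a small CNN learns to map reduced correlation
    # configurations to larger ones, echoing the super-resolution analogy above.
    # Architecture, sizes, and the random training pairs are stand-ins.
    import torch
    import torch.nn as nn

    class UpscaleCNN(nn.Module):
        def __init__(self, scale=2):
            super().__init__()
            self.net = nn.Sequential(
                nn.Upsample(scale_factor=scale, mode="nearest"),
                nn.Conv2d(1, 32, kernel_size=3, padding=1),
                nn.ReLU(),
                nn.Conv2d(32, 32, kernel_size=3, padding=1),
                nn.ReLU(),
                nn.Conv2d(32, 1, kernel_size=3, padding=1),
            )

        def forward(self, x):
            return self.net(x)

    # Stand-in training pairs: reduced 16x16 and original 32x32 correlation maps.
    small = torch.rand(64, 1, 16, 16)
    large = torch.rand(64, 1, 32, 32)

    model = UpscaleCNN(scale=2)
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.MSELoss()

    for epoch in range(5):
        optimizer.zero_grad()
        loss = loss_fn(model(small), large)
        loss.backward()
        optimizer.step()
        print(f"epoch {epoch}: loss {loss.item():.4f}")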

The group considered two systems, the 2D Ising model and the three-state Potts model, both key benchmarks for studies of condensed matter. For both, they found that their CNN could use a simulation of a very small array of points to reproduce how a measure of the correlation g(T) changed across a phase transition point in much larger systems. Comparing with direct simulations of larger systems, the same trends were reproduced for both systems, combined with a simple temperature rescaling based on data at an arbitrary system size.

A successful implementation of inverse RG transformations promises to give scientists a glimpse of previously inaccessible system sizes, and help physicists understand the larger scale features of materials. The team now hopes to apply their method to other models which can map more complex features such as a continuous range of spins, as well as the study of quantum systems.



More information:
Kenta Shiina et al, Inverse renormalization group based on image super-resolution using deep convolutional networks, Scientific Reports (2021). DOI: 10.1038/s41598-021-88605-w

Provided by
Tokyo Metropolitan University

Citation:
New take on machine learning helps us ‘scale up’ phase transitions (2021, May 31)
retrieved 31 May 2021
from https://phys.org/news/2021-05-machine-scale-phase-transitions.html

