A Machine-Learning Model Can Help Reunite Long-Separated Families

Around the world, millions of families have suffered forcible separation through war, trafficking, natural disasters, or socioeconomic crises. In China, family separation is a particularly large-scale and far-reaching problem. Following the enactment of the country’s One Child Policy in 1979, many children were abandoned or trafficked and then adopted either domestically or internationally.

Reuniting children taken from their parents is a logistical challenge. China has established a DNA biobank dedicated to facilitating family reunification, but many victims are reluctant to submit DNA samples for both accessibility and privacy reasons. This reluctance is especially pronounced among child victims who grew up in adoptive families and often hold mixed feelings toward their birth parents. (The biobank is also focused on domestic victims of family separation, limiting its usefulness for cases involving international adoption.) To fill the gap, parents and children—who are usually well into adulthood when they begin to search for their birth families—have turned to online platforms where they can post memories and relevant details. One of the largest, Baby Come Home, has received more than 110,000 requests from families and resulted in some 6,000 reunions.

But even with the help of this platform, matching children and parents based on the information they post is a needle-in-a-haystack proposition. It takes laborious effort by users and volunteers to narrow down a plausible list of parent-child matches from the many thousands of posts on Baby Come Home.

A machine-learning system developed by Huifeng Su, a Yale SOM PhD candidate in operations, is helping to improve that process by identifying smaller sets of likelier parent-child pairs for review. The system is described in a recent working paper co-authored with Professors Lesley Meng and Edieal J. Pinker.

“I think there are two main contributions here,” Pinker says of the research. “One is a technical contribution in terms of developing a methodology tailored to this specific humanitarian challenge, and the other is the potential impact a tool based on this work could have on helping family reunification.”

Meng hopes the research will serve as a useful model for scholars thinking creatively about machine learning and what it can do. “It highlights a nice application area for a lot of these machine-learning methods that are now becoming very popular,” she says. “Hopefully, this will inspire others to use these tools and develop these tools to solve similar issues in different areas.”

As a first-year doctoral student, Su volunteered for a family reunification platform. That experience, as well as a fellow student’s research on volunteering, “gave me the idea that this could be a research topic,” he says. Around the same time, he noticed other researchers developing machine-learning solutions that leverage unstructured data, much like the posts on Baby Come Home, to address a variety of societal challenges.

“The information from the parent side is pretty much ground truth, because they remember things clearly,” Su explains. Children, by contrast, have limited memories and incomplete information about their early lives and birth families: “They collect information from their adopted parents, who collect information from traffickers or intermediaries. And [those people] have an incentive to manipulate the information about the child to make them more likely to be adopted.”

Perhaps the most important manipulation involves age. Families generally prefer to adopt younger children, so traffickers frequently present adoptable children as being slightly younger than they really are. As a result, a birth parent searching for a child taken at age five might actually be looking for someone who reported being adopted at age four. Even a one-year difference matters. On a platform with more than 100,000 posts, shifting the reported age by just a single year can dramatically expand the search space, making accurate matching far more challenging.
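To make the search-space point concrete, here is a minimal Python sketch. It assumes each child post carries a single reported age and that ages are only ever under-reported, by at most a fixed number of years. The `ChildPost` record, the `plausible_child_posts` function, and the uniform toy data are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ChildPost:
    """Hypothetical, heavily simplified child post: an ID and the age the child reports."""
    post_id: int
    reported_age: int


def plausible_child_posts(parent_age: int, posts: list[ChildPost], max_shift: int) -> list[ChildPost]:
    """Return child posts whose reported age is compatible with the parent's account.

    Assumes ages are only ever under-reported, by at most `max_shift` years,
    so a child taken at `parent_age` may report any age in
    [parent_age - max_shift, parent_age].
    """
    lowest = parent_age - max_shift
    return [p for p in posts if lowest <= p.reported_age <= parent_age]


# Toy data: 100 posts for each reported age from 1 to 10 (1,000 posts in total).
posts = [ChildPost(i, 1 + i // 100) for i in range(1000)]

for shift in (0, 1, 2):
    count = len(plausible_child_posts(5, posts, shift))
    print(f"max_shift={shift}: {count} candidates")
```

With this uniform toy data, the loop reports 100, 200, and 300 candidates: each additional year of tolerance admits another full age group. Real posts are not spread evenly across ages, so the actual growth depends on the platform's data.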

In essence, a parent and their child are describing the same story, but from two different perspectives. Su and his collaborators built a recommendation system designed for this humanitarian context by teaching it to distinguish between pairs of posts that led to confirmed reunions and pairs that were unrelated.

The system uses a language model to read thousands of posts on Baby Come Home and transform each story into a numerical representation that captures names, places, timelines, and even descriptive details. It can then compare any two stories in this encoded space and estimate how likely they are to belong to the same family.
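The article does not say which language model the team used or how posts are processed. As a rough illustration only, this Python sketch swaps in a crude stand-in encoder (a hashed bag-of-words vector, not a language model) to show the shape of the pipeline: each post becomes a fixed-length vector, and two posts are compared by cosine similarity. The names `encode`, `cosine`, and `DIM`, and the example posts, are hypothetical.

```python
import hashlib
import math
import re

DIM = 256  # hypothetical vector length for the toy encoder


def encode(text: str) -> list[float]:
    """Toy stand-in for a language-model encoder: hashed bag-of-words, unit length."""
    vec = [0.0] * DIM
    for token in re.findall(r"\w+", text.lower()):
        # Use a stable hash; Python's built-in hash() for str varies between runs.
        bucket = int(hashlib.sha256(token.encode("utf-8")).hexdigest(), 16) % DIM
        vec[bucket] += 1.0
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity of two unit-length vectors (or 0.0 if either is all zeros)."""
    return sum(a * b for a, b in zip(u, v))


# Hypothetical example posts.
parent = encode("Our son went missing near the river market in 1990, age 5")
child = encode("I remember a river and a market; I was adopted around age 4")
other = encode("Daughter lost at the train station, wearing a red coat")
print(round(cosine(parent, child), 3), round(cosine(parent, other), 3))
```

A trained encoder would place paraphrases such as “lost” and “went missing” close together, which word hashing cannot do; the stand-in only shows where such a model would plug in.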

To help the system understand what real matches look like, the team trained it on confirmed successful reunions. By observing both true matches and non-matches, the model learns subtle patterns of similarity that are difficult for humans to notice or systematically apply.
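One simple way to picture learning from confirmed reunions and unrelated pairs is a binary classifier over pair-level features. The sketch below is an assumption, not the authors’ method (the article describes training the language-model system itself): it fits a logistic regression by gradient descent on two hypothetical features, a text-similarity score and the age gap in years, using made-up toy labels.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_pair_classifier(pairs, labels, lr=0.5, epochs=5000):
    """Fit P(match | features) with logistic regression and batch gradient descent.

    pairs:  feature tuples, here (text_similarity, age_gap_years) (hypothetical features)
    labels: 1 for a confirmed reunion pair, 0 for an unrelated pair
    """
    n, d = len(pairs), len(pairs[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, y in zip(pairs, labels):
            # Gradient of the log loss with respect to the score is (prediction - label).
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for j in range(d):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def match_probability(w, b, x) -> float:
    """Predicted probability that a pair with features x is a true match."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


# Toy training data (made up): confirmed matches have similar text AND small age gaps;
# unrelated pairs differ in text, in age, or both.
pairs = [(0.9, 0), (0.85, 1), (0.8, 0), (0.9, 1),   # matches
         (0.9, 6), (0.2, 0), (0.1, 1), (0.85, 5)]   # non-matches
labels = [1, 1, 1, 1, 0, 0, 0, 0]

w, b = train_pair_classifier(pairs, labels)
print("weights:", [round(v, 2) for v in w], "bias:", round(b, 2))
print("similar text, large age gap:", round(match_probability(w, b, (0.9, 7)), 3))
```

Because several toy non-matches pair highly similar text with large age gaps, the fitted weight on the age gap comes out negative, so a pair with similar text but a seven-year gap is scored as unlikely. That is the kind of domain rule a general-purpose similarity score does not encode.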

With this knowledge, the system can compare more than 100,000 post pairs in near real time and surface promising parent-child candidates for families to review, dramatically speeding up searches that once depended on painstaking manual effort.
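Once every pair has a score, surfacing candidates for review amounts to ranking. This sketch, again with hypothetical names and toy vectors, scores one parent post against a list of child posts, skips children whose reported age falls outside an assumed under-reporting window, and keeps the top `k` with `heapq.nlargest`. The source does not describe how the real system scales to 100,000+ pairs; vectorized matrix products or a nearest-neighbor index would be typical choices, but that is an assumption.

```python
import heapq
import math


def normalize(v: list[float]) -> list[float]:
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def top_k_candidates(parent_vec, parent_age, children, k=3, max_shift=1):
    """Return up to k (score, post_id) pairs for the most similar child posts.

    children: (post_id, unit_vector, reported_age) tuples (a hypothetical layout).
    Children whose reported age lies outside [parent_age - max_shift, parent_age]
    are treated as impossible matches and are never scored.
    """
    scored = (
        (sum(a * b for a, b in zip(parent_vec, vec)), post_id)
        for post_id, vec, age in children
        if parent_age - max_shift <= age <= parent_age
    )
    return heapq.nlargest(k, scored)


parent = normalize([1.0, 0.0, 0.0])
children = [
    ("c1", normalize([0.9, 0.1, 0.0]), 5),
    ("c2", normalize([1.0, 0.0, 0.0]), 9),  # closest text, impossible age
    ("c3", normalize([0.5, 0.5, 0.0]), 4),
    ("c4", normalize([0.0, 1.0, 0.0]), 5),
]
for score, post_id in top_k_candidates(parent, 5, children, k=2):
    print(post_id, round(score, 3))
```

This prints `c1 0.994` and then `c3 0.707`. Post `c2` matches the parent’s vector exactly but is dropped, because under the assumed window a reported age of 9 cannot belong to a child taken at 5: age rules out a pair that text similarity alone would rank first.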

Their system—which is small, free, and locally deployable—outperformed expert human volunteers and even surpassed commercial large language models from Google and OpenAI in identifying correct matches in this humanitarian context. For example, if a child posts a memory about swimming in a river at age 3, and a parent posts about their child swimming in a river at age 10, “for the general-purpose language model, [the two posts] will look very similar, because the only difference is age,” Su explains. “But in our context, age is arguably the most important feature in terms of excluding impossible matches. So our model is able to pick up this kind of expert knowledge during the training process, and much more.”

In fact, the extent of the difference caught him off guard. “It’s quite surprising how poor the performance was for the general-purpose large language model,” Su says. This suggests that smaller transformer-based language models may still have an important role to play in “very specific tasks that require specific domain knowledge.”

Another important and unexpected logistical takeaway is that seeing credible potential matches on platforms like Baby Come Home can motivate individuals who are initially hesitant about DNA testing to engage with DNA-based verification. Su and his coauthors found that when users received potential match recommendations from the platform, more than 60% went on to have their DNA collected within the subsequent month. This suggests that providing better match recommendations—which Su’s system can help achieve—will motivate larger numbers of family separation victims to submit their DNA to the centralized biobank dedicated to family reunification in China, ultimately facilitating more reunions.

For parents and children alike, the search process can be a painful one, unearthing traumatic memories with no guarantee of success. “It takes time. There’s an emotional cost,” Pinker says. For that reason, many victims don’t even want to try. Pinker hopes the new system can begin to change that: “Once you have a method that can give you some sense of the likelihood of finding a match, then it may encourage you to participate more.”

The Yale School of Management is the graduate business school of Yale University, a private research university in New Haven, Connecticut.
