How a lung gene is linked to post-COVID symptoms, according to a genetics study

Nikesh Vaishnav

More than four years since the COVID-19 pandemic began, the disease remains a global health concern — not because of new surges but because of what persists. Long COVID, or technically post-acute sequelae of SARS-CoV-2 infection (PASC), refers to symptoms that continue for weeks or months after the initial illness clears. These include fatigue, breathing problems, and cognitive issues. The World Health Organization defines long COVID as symptoms that begin within three months of infection and last at least two months without another explanation.

Why some people develop long COVID while others recover quickly remains unclear. A recent genome-wide association study published in Nature Genetics analysed genetic data from six major global ancestries to investigate whether inherited differences play a role.

A diverse study

The study, conducted under the COVID-19 Host Genetics Initiative at the Germans Trias i Pujol research institute in Spain, used a genome-wide association study (GWAS) to identify genetic risk factors for long COVID. A GWAS scans the genome for small ‘spelling mistakes’ in the DNA sequence, known as single-nucleotide polymorphisms, that appear more often in people with a condition than in those without. This method has helped uncover links to many complex and chronic disorders.
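The core comparison a GWAS makes at each variant can be sketched in a few lines of Python. This is a simplified illustration, not the study's pipeline: the allele counts and variant names are made up, the test is a plain chi-square on a 2x2 table of allele counts, and the cut-off of 5e-8 is the conventional genome-wide significance threshold, which the article does not state.

```python
import math

def allelic_test(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Chi-square test (1 degree of freedom) on a 2x2 table of allele counts.

    Returns (odds_ratio, p_value). Assumes every count is greater than zero.
    """
    n = case_alt + case_ref + ctrl_alt + ctrl_ref
    row_case = case_alt + case_ref
    row_ctrl = ctrl_alt + ctrl_ref
    col_alt = case_alt + ctrl_alt
    col_ref = case_ref + ctrl_ref
    chi2 = (n * (case_alt * ctrl_ref - case_ref * ctrl_alt) ** 2
            / (row_case * row_ctrl * col_alt * col_ref))
    # For 1 degree of freedom, the chi-square tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    odds_ratio = (case_alt * ctrl_ref) / (case_ref * ctrl_alt)
    return odds_ratio, p_value

# Hypothetical allele counts: (case_alt, case_ref, ctrl_alt, ctrl_ref).
variants = {
    "snp_A": (300, 1700, 2000, 18000),  # alternate allele enriched in cases
    "snp_B": (200, 1800, 2000, 18000),  # same allele frequency in both groups
}

GENOME_WIDE = 5e-8  # conventional threshold, assumed here

hits = {}
for name, counts in variants.items():
    odds_ratio, p_value = allelic_test(*counts)
    if p_value < GENOME_WIDE:
        hits[name] = (odds_ratio, p_value)

for name, (odds_ratio, p_value) in hits.items():
    print(f"{name}: odds ratio {odds_ratio:.2f}, p = {p_value:.1e}")
```

Real analyses typically use regression models that adjust for ancestry, age, and sex, and test millions of variants, but the underlying question at each position is the same: is one version of the variant over-represented among cases?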

The analysis used data from 33 groups across 19 countries, making it one of the largest efforts to date in this area. The researchers first analysed data from 6,450 long COVID cases and over one million population controls. In this discovery phase, they identified a genetic signal near the FOXP4 gene. This signal was then tested in a separate replication cohort of more than 9,500 cases and nearly 800,000 controls, and the association was confirmed.

The researchers applied two definitions of long COVID: a strict one requiring test-confirmed infection and ongoing symptoms, and a broader one that included self-reported or clinical diagnoses. Controls were also defined strictly (infected but recovered) or broadly (general population without long COVID). This helped the team test whether its results held up across different clinical definitions.

Gene linked to long COVID risk

The analysis found a strong association between long COVID and a region on chromosome 6, near the FOXP4 gene. A specific variant in the region, called rs9367106, increased the risk of developing long COVID. People with the “C” version of this variant were about 63% more likely to have long COVID symptoms than those without it.
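What a 63% increase means for an individual depends on how common long COVID is to begin with. The sketch below assumes the 63% figure corresponds to an odds ratio of 1.63 (the article does not say whether it is an odds ratio or a relative risk) and uses hypothetical baseline risks to show how the same odds ratio translates into different absolute risks.

```python
def risk_with_variant(baseline_risk, odds_ratio):
    """Convert a baseline risk into the risk implied by a given odds ratio."""
    baseline_odds = baseline_risk / (1 - baseline_risk)
    new_odds = baseline_odds * odds_ratio
    return new_odds / (1 + new_odds)

ODDS_RATIO = 1.63  # assumed reading of the "63% more likely" figure

# Hypothetical baseline risks of long COVID among people without the variant.
for baseline in (0.05, 0.10, 0.20):
    risk = risk_with_variant(baseline, ODDS_RATIO)
    print(f"baseline {baseline:.0%} -> with variant {risk:.1%}")
```

Under these assumptions, a baseline risk of 10% rises to roughly 15%, which is a meaningful shift but far from a certainty of developing the condition.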

Notably, the FOXP4 variant was associated with higher long COVID risk even in people who weren’t hospitalised, suggesting its effect is not tied solely to the severity of the initial infection. The variant’s frequency also varied across populations. It appeared in about 1.6% of non-Finnish Europeans but up to 36% of East Asians. Because the variant is more common in some groups, its effect was easier to detect there, even in smaller samples.

This highlights why genetic studies that include diverse populations are more reliable and globally relevant.
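The link between how common a variant is and how easily its effect is detected can be illustrated with a rough calculation. The sketch below holds the sample size and effect fixed and changes only the allele frequency. The sample sizes are hypothetical, the odds ratio of 1.63 is an assumed reading of the 63% figure, and the 1.6% and 36% values are treated as allele frequencies in controls, which the article does not specify.

```python
import math

def expected_z(n_cases, n_controls, control_freq, odds_ratio):
    """Approximate z-score for an allelic test, using expected allele counts."""
    control_odds = control_freq / (1 - control_freq)
    case_odds = control_odds * odds_ratio
    case_freq = case_odds / (1 + case_odds)
    # Each person carries two alleles, hence the factor of 2.
    a = 2 * n_cases * case_freq
    b = 2 * n_cases * (1 - case_freq)
    c = 2 * n_controls * control_freq
    d = 2 * n_controls * (1 - control_freq)
    # Standard error of the log odds ratio from a 2x2 table.
    standard_error = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return math.log(odds_ratio) / standard_error

# Hypothetical study size, identical for both scenarios.
N_CASES, N_CONTROLS, ODDS_RATIO = 1000, 10000, 1.63

z_rare = expected_z(N_CASES, N_CONTROLS, 0.016, ODDS_RATIO)
z_common = expected_z(N_CASES, N_CONTROLS, 0.36, ODDS_RATIO)
print(f"frequency 1.6%: z = {z_rare:.1f}")
print(f"frequency 36%:  z = {z_common:.1f}")
```

With the same number of participants and the same underlying effect, the expected signal is roughly three times stronger where the variant is common, because far more carriers are available to compare.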

From lungs to immunity

To understand the connection between FOXP4 and long COVID, the researchers examined how active this gene was in different tissues and cell types and how its activity related to the condition.

The authors noted that the variant lies in a stretch of DNA that is especially “active” in lung tissue, suggesting it may affect how lungs function. Using GTEx, a large gene activity database, they found that a nearby variant (rs12660421), often inherited with rs9367106, was linked to higher levels of FOXP4 expression in the lung. This made it more likely that the gene influences how the lungs respond to infection and injury.

Going further, the researchers checked which lung cells produced FOXP4 most strongly. They found high activity in type 2 alveolar cells, key players in keeping air sacs open, clearing fluids, and repairing tissue damage. These cells also help coordinate the immune response to respiratory viruses like SARS-CoV-2. The same genetic region has also been associated with lung cancer in earlier research, suggesting that FOXP4 may influence multiple lung-related conditions via shared biological pathways.

To test whether FOXP4 activity, and not just the genetic variant, might be linked to long COVID, researchers analysed blood samples from people who had recovered from the initial phase of infection. They found that individuals with moderately higher FOXP4 expression levels had more than twice the odds of developing long COVID. This association persisted even outside the acute illness phase, suggesting a longer-term role for the gene.

Finally, a technique called co-localisation analysis showed a 91% probability that the same genetic signal affects both FOXP4 activity and long COVID risk, reinforcing the gene’s biological importance.

India’s genomic gaps

The study has important implications for India, given its large population, genetic diversity, and significant COVID-19 burden. Multiple waves of infection and unequal access to care mean many Indians may continue to face lasting symptoms, often undiagnosed or untreated due to limited awareness and clinical follow-up.

Indian studies suggest a wide range in long COVID prevalence: from 45% to nearly 80% depending on design, follow-up, and illness severity. One multicentre study across Hyderabad, Vellore, Mumbai, and Thiruvalla found that 16.5% of hospitalised patients self-reported symptoms like fatigue and breathlessness even a year after discharge.

Although the GWAS included participants from six ancestry groups, the authors said most datasets were of European origin. South Asian representation was limited or unclear. This is a broader issue across GWAS in general, many of which have focused on European populations. Thus, it remains uncertain how frequently the FOXP4 variant occurs in the Indian population or whether its effects are similar in local contexts, particularly given region-specific factors such as air pollution, metabolic risk, and healthcare variability.

India’s growing genomic infrastructure is beginning to close these data gaps. The GenomeIndia Project has released genomic data on 10,000 individuals from diverse Indian populations. While the project is not focused on disease mapping, it provides a reference catalogue of genetic variation across populations. This reference can support future studies, such as an India-specific GWAS on long COVID, and so build confidence in translating findings into clinical or diagnostic settings in local contexts.

Some limitations

This large-scale international study identifies FOXP4 as a genetic factor linked to long COVID, offering a new clue as to why some individuals experience prolonged symptoms after a SARS-CoV-2 infection.

However, the authors also note several limitations.

Most data were collected before widespread vaccination and the emergence of newer variants like Omicron, making it unclear if the findings apply to all populations today. They also caution that evolving definitions of long COVID may have led to misclassification in some cohorts.

Additionally, the overall genetic contribution to long COVID appears modest, suggesting that other factors, including immunity and pre-existing conditions, also play key roles.

As India continues to address the long-term effects of the pandemic, studies like this highlight the importance of including diverse populations in genetic research.

Such efforts can improve public health responses and help tailor care for those living with long COVID.

Anirban Mukhopadhyay is a geneticist by training and science communicator from Delhi.
