[Paper] Translation of paper on novel coronaviruses and genetics #2

Genetic mechanisms of critical illness in Covid-19.

Received September 27, 2020
Accepted November 30, 2020
Accelerated article preview published online December 11, 2020
Please cite this article as: Pairo-Castineira, E. et al. Genetic mechanisms of critical illness in Covid-19. Nature. s41586-020-03065-y (2020).


Recruitment

The 2,636 patients recruited to the GenOMICC study had Covid-19 confirmed by local clinical testing and were judged by the treating clinician to require continuous cardiorespiratory monitoring. In UK practice, this kind of monitoring is undertaken in high-dependency or intensive care units. A further 135 patients were recruited through ISARIC4C; these patients had Covid-19 confirmed by local clinical testing and were deemed to require hospital admission. Both studies were approved by the appropriate research ethics committees (Scotland: 15/SS/0110; England, Wales and Northern Ireland: 19/WM/0247). Current and previous versions of the study protocol are publicly available. All participants gave informed consent.


DNA was extracted from whole blood using the Nucleon Kit (Cytiva) with the BACC3 protocol. DNA samples were resuspended in 1 ml of TE buffer pH 7.5 (10 mM Tris-Cl pH 7.5, 1 mM EDTA pH 8.0). DNA yield was measured using Qubit and normalized to 50 ng/µl before genotyping.
Genotyping was performed using the Illumina Global Screening Array v3.0+ multi-disease bead chip (GSAMD-24v3-0-EA) and Infinium chemistry. In summary, this consists of three steps: (1) whole-genome amplification, (2) fragmentation followed by hybridization, and (3) single-nucleotide extension and staining. For each sample, 4 µl of DNA normalized to 50 ng/µl was used. Each sample was interrogated on the array against 730,059 SNPs. Arrays were imaged on an Illumina iScan platform, and genotypes were called automatically using GenomeStudio Analysis software v2.0.3 with the GSAMD-24v3-0-EA_20034606_A1.bpm manifest and the cluster file provided by the manufacturer. In 1667 cases, genotypes and imputed variants were confirmed by Illumina NovaSeq 6000 whole-genome sequencing. Samples were aligned to the human reference genome hg38, and variants were called to the GVCF stage on the DRAGEN pipeline (software v01., hardware v01.011.269) at Genomics England. Variants were genotyped with the GATK GenotypeGVCFs tool v4.1.8.1,50 filtered to a minimum depth of 8X (95% sensitivity for heterozygous variant detection51), merged, and annotated with allele frequency (v1.10.2).

Quality Control

Genotype calls were carefully examined within GenomeStudio following manufacturer and published52 recommendations, after samples with a low initial call rate (<90%) had been excluded and the data reclustered. Briefly, all X and Y marker calls were visually inspected and curated where necessary, as were autosomal markers with minor allele frequency >1% that showed a low GenTrain score, poor cluster separation, or an excess or deficit of heterozygous calls. Genotype-based sex determination was performed in GenomeStudio, and samples were excluded if the result did not match the recorded sex. Five individuals with an XXY genotype were also detected and excluded from downstream GWAS analyses. Genotypes were exported in Genome Reference Consortium Human Build 37 (GRCh37) coordinates and Illumina "source" strand orientation using the GenomeStudio plink-input-report-plugin-v2-1-4. A series of filtering steps was then applied using PLINK 1.9, leaving 2790 individuals and 479095 variants for further analysis (exclusion of samples with call rate <95%, selection of variants with call rate >99% and minor allele frequency (MAF) >1%, and final selection of samples with call rate >97%).


Kinship and ancestry inference followed the approaches of UK Biobank49 and the Million Veteran Program.53 King 2.1 was first used to identify duplicated individuals. The analysis flagged 56 duplicate pairs, from each of which one sample was removed according to genotyping quality (GenomeStudio p50GC score and/or individual call rate). This left a set of 2734 unique individuals. Regions of high linkage disequilibrium (LD) as defined by UK Biobank49 were excluded from the analysis, as were SNPs with MAF <1% or missingness >1%. Using King 2.1, a relationship matrix was constructed up to the third degree with the King command --kinship --degree 3, and the function largest_independent_vertex_set() from the igraph tool was then used to create a first set of unrelated individuals. Principal component analysis (PCA) was performed with gcta 1.955 on the set of unrelated individuals, using SNPs pruned with a window of 1000 markers, a step size of 80 markers, and an r² threshold of 0.1. SNPs with large weights in PC1, PC2 or PC3 were removed, retaining at least 2/3 of the pruned SNPs as input for the next round of King 2.1. The second round of King 2.1 was run on the SNPs with low weights in PC1, PC2 and PC3 to avoid overestimating kinship in non-European individuals. After this round, 2718 individuals were considered to be unrelated up to the third degree.
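
As a rough illustration of the unrelated-set step, the sketch below builds a relatedness graph from pairwise kinship coefficients and picks a set of mutually unrelated samples. It is a greedy heuristic rather than the igraph routine used in the study, so it is not guaranteed to return the largest possible set. The function name, the input layout, and the 0.0442 cutoff (KING's documented lower bound for third-degree kinship) are assumptions made for illustration.

```python
from collections import defaultdict

# Assumption: KING's lower kinship bound for third-degree relatives.
THIRD_DEGREE_KINSHIP = 0.0442


def unrelated_subset(samples, kinship_pairs, threshold=THIRD_DEGREE_KINSHIP):
    """Greedily select samples so that no two kept samples are related.

    kinship_pairs is an iterable of (sample_a, sample_b, kinship) tuples.
    This is a heuristic for a large independent set, not an exact solver.
    """
    neighbours = defaultdict(set)
    for a, b, phi in kinship_pairs:
        if phi >= threshold:
            neighbours[a].add(b)
            neighbours[b].add(a)

    remaining = set(samples)
    kept = []
    while remaining:
        # Prefer the sample with the fewest relatives still in play;
        # ties are broken by sample ID so the result is deterministic.
        pick = min(remaining, key=lambda s: (len(neighbours[s] & remaining), s))
        kept.append(pick)
        remaining -= neighbours[pick] | {pick}
    return sorted(kept)


pairs = [("A", "B", 0.25), ("B", "C", 0.06), ("C", "D", 0.01)]
print(unrelated_subset(["A", "B", "C", "D"], pairs))
```

In this toy input, B is related to both A and C, so dropping B alone keeps three of the four samples.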

Genetic Ancestry

Unrelated individuals from the 1000 Genomes Project dataset were identified using the same procedure as above, and both datasets were merged on their common SNPs. The merged genotype data were pruned with plink using a window of 1000 markers, a step size of 50 and an r² of 0.05, leaving 92K markers that were used to calculate the first 20 principal components with gcta 1.9. The ancestry of GenOMICC individuals was inferred using ADMIXTURE56 with the populations defined in 1000 Genomes. If an individual had a probability greater than 80% of belonging to one ancestry, the individual was assigned to that ancestry; otherwise, the individual was assigned to admixed ancestry, as in the Million Veteran Program cohort. Following this criterion, there were 1818 individuals of European ancestry (EUR), 190 of African ancestry (AFR), 158 of East Asian ancestry (EAS), 254 of South Asian ancestry (SAS), and 301 of admixed ancestry (2 or more).
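
The 80% assignment rule can be sketched as follows. This is a minimal illustration, assuming the ADMIXTURE output has already been reduced to one dictionary of ancestry proportions per individual; the function name and the "admixed" label are hypothetical.

```python
def assign_ancestry(proportions, threshold=0.80):
    """Assign the single ancestry whose proportion exceeds the threshold.

    proportions maps ancestry labels (e.g. "EUR") to estimated fractions.
    Individuals with no ancestry above the threshold are labelled admixed.
    """
    label, best = max(proportions.items(), key=lambda item: item[1])
    return label if best > threshold else "admixed"


print(assign_ancestry({"EUR": 0.92, "AFR": 0.05, "SAS": 0.03}))
print(assign_ancestry({"EUR": 0.60, "SAS": 0.40}))
```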


Genotype files were converted to plus strand, and SNPs with a Hardy-Weinberg equilibrium (HWE) p-value < 10^-6 were removed. Imputation was performed using the TOPMed reference panel,57 and results are given on the GRCh38 human reference genome and plus strand. The imputed dataset was filtered for monomorphic variants and low imputation quality score (r² < 0.4) using BCFtools 1.9. To perform the GWAS, files in VCF format were further filtered for r² > 0.9 using QCtools 1.3.58 UK Biobank imputed variants with imputation score > 0.9 that overlapped our variant set (n = 5,981,137) were extracted and merged with the GenOMICC data into a single BGEN file containing cases and controls using QCtools 1.3.


Individuals related up to the third degree were removed. Thirteen cases of American ancestry were excluded because there was insufficient power to perform a reliable GWAS in this population. The final dataset consisted of 2244 individuals: 1676 of European ancestry, 149 of East Asian ancestry, 237 of South Asian ancestry, and 182 of African ancestry (Extended Data Table 1). Where age or deprivation status was missing for an individual, the value was set to the mean for that individual's ancestry. GWAS were performed separately for each ancestry group. Tests of association between case-control status and allele dosage at individual SNPs were performed for each ancestry group by fitting logistic regression models using PLINK.59 All models included sex, age, mean-centered age squared, deprivation score decile of the residential postcode, and the first 10 genomic principal components as covariates.
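
The covariate preparation described above (mean imputation within ancestry, then a mean-centered squared age term) can be sketched in a few lines. This is only an illustration under stated assumptions: the record layout, the field names and both function names are hypothetical, and centering is done over whatever set of records is passed in.

```python
from statistics import mean


def fill_missing_with_group_mean(records, field, group_field="ancestry"):
    """Return copies of the records with missing `field` set to the group mean.

    Assumes every group has at least one observed value for `field`.
    """
    observed = {}
    for rec in records:
        if rec[field] is not None:
            observed.setdefault(rec[group_field], []).append(rec[field])
    group_means = {group: mean(values) for group, values in observed.items()}
    return [
        {**rec, field: rec[field] if rec[field] is not None
         else group_means[rec[group_field]]}
        for rec in records
    ]


def add_centered_age_squared(records):
    """Add (age - mean age)^2 as the covariate `age_c2`."""
    overall_mean = mean(rec["age"] for rec in records)
    return [{**rec, "age_c2": (rec["age"] - overall_mean) ** 2} for rec in records]


cohort = [
    {"id": "s1", "ancestry": "EUR", "age": 60},
    {"id": "s2", "ancestry": "EUR", "age": None},
    {"id": "s3", "ancestry": "EUR", "age": 40},
]
prepared = add_centered_age_squared(fill_missing_with_group_mean(cohort, "age"))
print(prepared)
```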

Genomic principal components were calculated on the combined sample of all UK Biobank and GenOMICC participants. Specifically, 456,750 genetic variants were identified that were shared between the called genotypes in the GenOMICC dataset and the imputed UK Biobank genotypes, and that had an imputation information score above 0.95 and a minor allele frequency above 1%. After the genotypes at these variants were merged, variants were removed if they had a minor allele frequency below 2.5%, a missingness rate above 1.5%, a deviation from Hardy-Weinberg equilibrium with a p-value below 10^-50, or a position within regions of high linkage disequilibrium previously identified in UK Biobank. The remaining variants were LD-pruned with the PLINK indep-pairwise command to a maximum r² of 0.01, based on a 1000-variant window moving in steps of 50 variants, which yielded 13,782 SNPs. The top 20 genomic principal components were then calculated using FlashPCA2.60

GWAS results for European ancestry were filtered for MAF > 0.01, HWE p-value > 10^-50 and variant genotyping rate > 0.99. An additional filter was applied to avoid bias arising from the use of different genotyping methods and imputation panels in controls and cases. This bias could not be controlled for using regression, because all cases and all controls were genotyped using different methods. The MAF for each ancestry was compared between UK Biobank controls and gnomAD hg38 (downloaded in August 2020), and SNPs were removed according to two rules: (a) for SNPs with MAF > 10% in gnomAD, an absolute difference of more than 5% between the gnomAD and UK Biobank control MAFs; (b) for SNPs with MAF < 10% in gnomAD, a difference of more than 25% of the gnomAD MAF between UK Biobank controls and gnomAD. For UK Biobank European individuals, the comparison used the gnomAD allele frequencies of non-Finnish Europeans, because the European UK Biobank controls are mainly non-Finnish. GWAS of non-European ancestries were restricted to variants with MAF > 5% in UK Biobank controls of the same ancestry and to the SNPs that passed QC in the European GWAS.

The filtered GWAS for each ancestry, containing about 4.7 million SNPs in total, were combined in a trans-ethnic meta-analysis using METAL62 in standard error mode, with control for population stratification (genomic control on). Closest genes were defined using the FUMA v1.3.6 SNP2GENE function63 with LD r² > 0.6 and the UK Biobank release 2 reference panel.

Sex-specific GWAS in European individuals were performed using 1180 unrelated male and 496 unrelated female cases, with 5 random UK Biobank controls matched for sex and ancestry for each case. Tests of association between case-control status and allele dosage at individual SNPs were performed by fitting logistic regression models using PLINK. Age, mean-centered age squared, deprivation decile of the residential postcode, and the first 10 principal components were included as covariates.
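
The gnomAD comparison filter described above amounts to a two-branch rule on allele frequencies. The sketch below is one reading of that rule, not the authors' code: the function name is hypothetical, and the handling of a gnomAD MAF of exactly 10% (treated here under the relative rule) is an assumption, since the text does not specify it.

```python
def passes_gnomad_check(maf_gnomad, maf_controls):
    """Return True if a SNP's control MAF is acceptably close to gnomAD.

    Common SNPs (gnomAD MAF > 10%): absolute difference of at most 5%.
    Rarer SNPs: difference of at most 25% of the gnomAD MAF.
    """
    difference = abs(maf_gnomad - maf_controls)
    if maf_gnomad > 0.10:
        return difference <= 0.05
    return difference <= 0.25 * maf_gnomad


print(passes_gnomad_check(0.30, 0.33))   # common SNP, small absolute difference
print(passes_gnomad_check(0.04, 0.06))   # rarer SNP, large relative difference
```

The relative rule for rarer SNPs matters because a fixed 5% tolerance would accept almost any frequency for a variant whose true MAF is only a few percent.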

Deprivation Score

The UK Data Service provides measures of deprivation generated for each postcode from census data. The most recent version of the deprivation scores was published in 2017 and is based on the 2011 census. Because only partial postcodes were available for most of the sample, these measures could not be used directly. Instead, an approximation of the scores was generated by calculating an average, weighted by population count, across each top-level postcode area. The input was a portion of the aggregated census data, from which the postcode-level data were downloaded. For each top-level postcode, the population count and deprivation score of each published postcode were extracted and a weighted average score was calculated. In addition, for more coarse-grained analyses, each top-level postcode score was classified into decile and quintile bins.
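
The population-weighted approximation and the decile binning can be illustrated with a small sketch. The function names and the input layout are hypothetical, and the rank-based binning shown here is one simple way to form deciles, not necessarily the one used in the study.

```python
def weighted_deprivation(postcodes):
    """Population-weighted mean score for one top-level postcode area.

    postcodes is a list of (population, deprivation_score) pairs, one per
    full postcode that falls inside the top-level area.
    """
    total_population = sum(population for population, _ in postcodes)
    weighted_sum = sum(population * score for population, score in postcodes)
    return weighted_sum / total_population


def decile_bins(scores):
    """Assign each score a rank-based decile from 1 (lowest) to 10 (highest)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    bins = [0] * len(scores)
    for rank, index in enumerate(order):
        bins[index] = rank * 10 // len(scores) + 1
    return bins


print(weighted_deprivation([(100, 10.0), (300, 20.0)]))
print(decile_bins(list(range(20))))
```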

Whole Genome Sequencing

Whole-genome sequencing (WGS) gVCF files were obtained for the 1667 individuals with whole-genome sequence data. Variants overlapping the positions of the imputed variants were called using GATK, and variants with depth < 8X (the minimum depth at which 95% coverage can be expected) were filtered out. Individual VCF files were combined into a multi-sample VCF file for comparison with the imputed variants. Of these 1667 samples, 1613 were used in the final GWAS. Samples were filtered and variants were annotated using bcftools 1.9. VCF files obtained from the imputation were processed in the same manner. Allele frequencies for both WGS and imputed data were calculated using PLINK 2.0.64


UK Biobank

UK Biobank participants were considered potential controls if they had not been identified by UK Biobank as outliers based on genotype missingness rate or heterogeneity, and if their genotype-inferred sex matched their self-reported sex. For these individuals, information on sex (UKBID 31), age, ancestry, and deprivation score decile of the residential postcode was computed. Specifically, age was calculated as age on April 1, 2020, based on the participant's birth month (UKBID 34) and year (UKBID 52). The first part of the participant's residential postcode was derived from the participant's home location (UKBID 22702 and 22704) and mapped to the deprivation score decile, as described above for GenOMICC participants. Ancestry was inferred as described above for GenOMICC participants. After participants who had received a PCR test for Covid-19 were excluded, based on information downloaded from UK Biobank in August 2020, five controls with matching inferred ancestry were sampled for each GenOMICC participant. After each control was sampled, individuals related to it up to the third degree were removed from the pool of potential further controls. Additional analyses with more exact matching on individual characteristics were also performed (Supplementary Information: Matched controls).
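
The age calculation for controls can be sketched as below. Only birth month and year are mentioned, so the sketch assumes the birthday falls on the first day of the month; that assumption, and the function name, are illustrative rather than taken from the study.

```python
from datetime import date

REFERENCE_DATE = date(2020, 4, 1)


def age_on_reference(birth_year, birth_month, reference=REFERENCE_DATE):
    """Age in whole years on the reference date.

    The day of birth is unknown, so the birthday is assumed to be the 1st
    of the birth month.
    """
    birthday_passed = (reference.month, reference.day) >= (birth_month, 1)
    return reference.year - birth_year - (0 if birthday_passed else 1)


print(age_on_reference(1960, 3))
print(age_on_reference(1960, 5))
```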

100,000 Genomes Project

Following ethical approval (14/EE/1112 and 13/EE/032), consenting participants were enrolled in the 100,000 Genomes Project by 13 regional NHS Genomic Medicine Centres across England and in Northern Ireland, Scotland and Wales, and whole blood was collected for DNA extraction. After quality assurance, whole-genome sequencing with 125 or 150 base-pair reads was performed by Illumina Laboratory Services on HiSeq 2500 or HiSeq X sequencers at the Genomics England Sequencing Centre, and small variants (single nucleotide variants and small indels) were detected using Starling. Tests of association with case-control status were performed using mixed-model association tests in SAIGE (v0.39). In total, 1675 participants from the GenOMICC study and 45,875 unrelated participants of European ancestry from the 100,000 Genomes Project were included. Genomic principal components were calculated on the combined dataset of GenOMICC participants and whole-genome sequence data from the 100,000 Genomes Project. Principal component analysis (PCA) was performed using GCTA software with approximately 30,000 SNPs selected after filtering for minor allele frequency > 0.005 and LD pruning (r² < 0.1, window size 500 kb). The null logistic mixed model was fitted using the SNPs used for PCA and included age, sex, age squared, age × sex, and the first 20 genomic principal components as covariates. Tests of association using SAIGE were performed after filtering variants in the WGS dataset for genotype quality and minor allele frequency ≥ 0.05. GWAS-specific quality filtering was applied on minor allele count (> 20 for each phenotype), differential missingness between cases and controls (p-value < 1 × 10^-5), and departure from Hardy-Weinberg equilibrium (p-value < 1 × 10^-5).

Generation Scotland

Generation Scotland: Scottish Family Health Study (hereafter "Generation Scotland") is a population-based cohort of 24,084 participants sampled from five regional centers across Scotland.65 A large subset of participants was genotyped using either the Illumina HumanOmniExpressExome-8v1_A or v1-2, and 20,032 passed the previously described QC criteria.66,67 Genotype imputation using the TOPMed reference panel (freeze 5b) was recently performed using Minimac4 v1.0 on the University of Michigan server https://imputationserver. Imputation data from 7689 unrelated participants (genomic sharing identical by descent, estimated using PLINK 1.9, < 5%) were used as control genotypes in a GWAS with GenOMICC cases of European ancestry, as a quality check of the associated variants. The GWAS was run in a logistic regression framework implemented with the PLINK2 glm function, adjusting for age, sex, and the first 10 principal components of European ancestry. These coordinates were obtained by projection into the principal component space of the 1000 Genomes European population samples using KING v2.2.5,54 with an LD-pruned subset of the target genotype markers that passed quality checks and intersected with the reference population.


Hits from the discovery GWAS were validated using the GWAS with Generation Scotland controls and with 100,000 Genomes Project controls. For a hit to be considered validated, the direction of effect had to be the same in all three GWAS and the p-value had to satisfy p < 0.05/nvalidations in both Generation Scotland and the 100,000 Genomes Project (where nvalidations is the number of hits meeting the discovery threshold of p < 5 × 10^-8).
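
The validation rule can be written as a short check. This is a sketch of the stated criteria only; the function name and the input layout (an effect estimate and p-value per validation GWAS) are assumptions.

```python
def is_validated(beta_discovery, validation_results, n_validations, alpha=0.05):
    """Check the two validation criteria for one discovery hit.

    validation_results is a list of (beta, p_value) pairs, one for each
    validation GWAS. A hit is validated only if every validation GWAS shows
    the same direction of effect and p < alpha / n_validations.
    """
    threshold = alpha / n_validations
    return all(
        (beta > 0) == (beta_discovery > 0) and p_value < threshold
        for beta, p_value in validation_results
    )


# Hypothetical hit tested in two validation GWAS, with 8 hits carried forward.
print(is_validated(0.50, [(0.30, 0.001), (0.20, 0.004)], n_validations=8))
```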


GenOMICC EUR loci were defined using the clumping function of PLINK 1.964 with clumping parameters r2 = 0.1, pval = 5 × 10^-8 and pval2 = 0.01, and distances to the nearest gene were calculated using the ENSEMBL GRCh37 gene annotation. No GWAS of critical illness or mortality in Covid-19 has been reported. As a surrogate, to provide a partial replication of our findings, we performed a replication analysis using the Host Genetics Initiative build 37, version 2 (July 2020) B2 (hospitalized Covid-19 vs. population) GWAS. To avoid sample overlap, we used summary statistics from the full analysis that included all cohorts and GWAS except UK Biobank. The replication p-value threshold was set to 6.25 × 10^-3 (0.05/8, where 8 is the number of loci significant in the discovery).

Genome-wide Meta-analysis

Meta-analysis across GenOMICC, HGI and 23andMe was performed using fixed-effects inverse-variance meta-analysis in METAL,62 with correction for genomic control. The 23andMe study consisted of cases and controls from the EUR genetic ancestry group. The HGI B2 analysis was a trans-ancestry meta-analysis in which the majority of cases were of European ancestry (EUR and FIN) and 238 cases were of non-European ancestry (176 AMR cases from the BRACOVID study and 62 SAS (South Asian) cases from the GNH study).
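
For readers unfamiliar with the method, a fixed-effects inverse-variance meta-analysis weights each study's effect estimate by the reciprocal of its squared standard error. The sketch below shows the arithmetic for a single variant; it is not METAL itself, it omits the genomic-control correction, and the function name and example numbers are made up for illustration.

```python
import math


def inverse_variance_meta(estimates):
    """Combine per-study (beta, standard_error) pairs for one variant.

    Returns the pooled beta, its standard error, and the z-score.
    """
    weights = [1.0 / se ** 2 for _, se in estimates]
    total_weight = sum(weights)
    pooled_beta = sum(w * beta for w, (beta, _) in zip(weights, estimates)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled_beta, pooled_se, pooled_beta / pooled_se


# Two hypothetical studies; the more precise one dominates the pooled estimate.
beta, se, z = inverse_variance_meta([(0.20, 0.10), (0.40, 0.20)])
print(round(beta, 4), round(se, 4), round(z, 3))
```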

Post-GWAS analysis

TWAS and Meta-TWAS

Transcriptome-wide association was performed using the MetaXcan framework23 and the GTEx v8 eQTL MASHR-M models available for download. To increase SNP coverage for the TWAS, the GWAS summary statistics for European ancestry were first imputed using the fizi69 impute function, with the 1000 Genomes European population as LD reference and 30% as the minimum proportion of SNPs for a region (--min-prop 0.3). The imputed GWAS results were then harmonized, lifted over to hg38, and linked to the 1000 Genomes reference panel using GWAS tools. The imputed and harmonized GWAS summary statistics were used to perform TWAS for whole blood (Supplementary Figure 16) and lung (Figure 2) GTEx v8 tissues with the S-PrediXcan function. The resulting p-values were corrected using Bonferroni correction to find significant gene associations. To overcome sample size limitations in GTEx v8 lung and whole blood tissues, we prioritized genes with small p-values in these tissues and ran S-MultiXcan70 on gene expression in all GTEx v8 tissues.

Mendelian randomization

Two-sample summary-data-based Mendelian randomization19 was performed using results from GenOMICC and the Genotype-Tissue Expression project71 GTEx v7 (using data pre-prepared for SMR/HEIDI: /smr/#DataResource), with Generation Scotland65,72 forming the linkage disequilibrium reference. GenOMICC results from individuals of European ancestry were used as the outcome, and GTEx (v7) whole blood expression served as the exposure. Data related to GTEx v7 were downloaded from GTEx (accessed February 20, 2020, April 5, 2020, and July 4, 2020) and SMR/HEIDI was downloaded from smr/ (accessed July 3, 2020). Analyses were performed using Python 3.7.3 and SMR/HEIDI v1.03 (plots were created using SMR/HEIDI v0.711). The LD reference was generated using data from the population-based Generation Scotland cohort (used with permission, see above67): from a random set of 5,000 individuals, a set of individuals with genomic relatedness below a cutoff of 0.01 was extracted using Plink v1.9, and 2,778 individuals remained in the final set. All data used for the SMR/HEIDI analysis were restricted to autosomal biallelic SNPs; 4,264,462 variants remained in the final merged dataset. Significant (as defined by GTEx v7; nominal p-value below the nominal p-value threshold) local (distance to transcription start site < 1 Mb) eQTL from GTEx v7 whole blood for protein-coding genes (as defined by GENCODE v19) with MAF > 0.01 (GTEx v7 and GenOMICC) were considered as potential instrumental variables. For each variant, the Ensembl gene ID with which it was most strongly associated was selected first, followed by the variant with which each Ensembl gene ID was most strongly associated. Instruments were available for 4,614 unique Ensembl gene IDs. Results were evaluated based on a list of genes selected a priori as of interest (Supplementary Table 3) and together as a whole. Replication of Bonferroni-corrected significant results was attempted in the results of the Covid-19 Host Genetics Initiative (https://www.covid19hg.org/), with UK Biobank excluded (data release July 2, 2020). Hospitalized Covid-19 versus population (ANA_B2_V2) was selected as the phenotype most similar to our own and was therefore used as the replication cohort.

To further validate the above analysis, generalized summary-data-based Mendelian randomization (GSMR)73 was performed using exposure data from the eQTLgen expression dataset (accessed October 26, 2020)20 and GenOMICC EUR data, for TYK2 and IFNAR2 (Supplementary Figure 15). GSMR was performed using GCTA version 1.92.1 beta6 Linux. Pleiotropic SNPs were filtered using the HEIDI-outlier test (threshold = 0.01), and instrumental SNPs were selected at the genome-wide significance level (PeQTL < 5e-8) using LD clumping (LD r² threshold = 0.05 and window size = 1 Mb). Imputed genotypes of 50,000 unrelated individuals from UK Biobank (SNP-derived genomic relatedness < 0.05, using HapMap 3 SNPs) were used as the LD reference for clumping. GSMR accounts for the remaining LD not removed by LD clumping.

Genome Region Plot

Genomic region plots were generated (Supplementary Figures 5 and 6).

Gene-level and pathway analysis

The gene-level burden of significance in the EUR ancestry results was calculated using MAGMA v1.08 (Supplementary Fig. 17).74 SNPs were assigned to a gene if their location was within 5 kb upstream or downstream of the gene region (defined as the span from transcription start site to transcription stop site). The MAGMA SNP-wise mean method was applied, which uses the sum of squared SNP Z-statistics as the test statistic. LD between SNPs was estimated using the 1000 Genomes Project European reference panel. MAGMA auxiliary files were downloaded on September 1, 2020. Gene location files for protein-coding genes were obtained from NCBI: gene/DATA/GENE_INFO/Mammalia/Homo_sapiens.gene_info.gz on 29/04/2015, and genomes/Homo_sapiens/ARCHIVE/ANNOTATION_RELEASE.105/mapview/ on 25/05/2016. The reference data files used to estimate LD are derived from Phase 3 of the 1000 Genomes Project. Competitive gene set enrichment analysis was performed in MAGMA using a regression model that accounts for gene-gene correlations, to reduce bias arising from clustering of functionally similar genes in the genome.74 Gene sets were queried from the KEGG 2019, Reactome 2016, GO Biological Process 2018, Biocarta 2016 and WikiPathways 2019 databases. The Benjamini-Hochberg procedure was used to control the false discovery rate (<0.05).
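
The Benjamini-Hochberg procedure mentioned above can be illustrated with a short, self-contained sketch. It computes adjusted p-values that can be compared directly with the 0.05 false discovery rate threshold; the function name and the example p-values are hypothetical, and this is not the code used in the study.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values stay monotone.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, pvalues[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


gene_set_p = [0.01, 0.04, 0.03, 0.20]
adjusted = benjamini_hochberg(gene_set_p)
significant = [p_adj < 0.05 for p_adj in adjusted]
print(adjusted, significant)
```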

Meta-analysis by information content

To put these results in the context of existing biological knowledge about host genes in SARS-CoV-2 replication and response, a gene-level analysis of the GenOMICC meta-TWAS was performed using meta-analysis by information content (MAIC),24 together with a systematic review of existing evidence on host factors involved in SARS-CoV-2 viral replication and the host response in Covid-19.45 MAIC was developed to evaluate and integrate gene-level data from diverse sources.24 Multiple in vitro and in vivo studies have identified key host genes that either interact directly with SARS-CoV-2 or define the host response to it, and a systematic review of these studies was previously performed.45

In brief, MAIC aggregates both ranked and unranked lists and has been shown to perform better than other methods, especially when presented with heterogeneous source data. The input to MAIC is a set of lists of named genes. MAIC assigns a score to each gene according to the number of source datasets that report that gene, and creates a data-driven weighting for each data source (usually an individual experiment) based on the scores of the genes that are highly ranked on that list. This procedure is repeated until the scores and weights converge to stable values. To ensure that no single type of experiment unduly biases the results, the input gene lists are assigned to categories, and a rule is applied so that only one weighting from each category can contribute to any given gene score.

Tissue/functional genomic enrichment

We downloaded summarized average gene expression data from RNA sequencing by the GTEx project. The GTEx v7 data include expression of 19,791 genes in 48 human tissues. Gene expression values were normalized to transcripts per million (TPM). To measure expression specificity, the specificity of each gene in each tissue was defined as the proportion of its expression in that tissue relative to its total expression across all tissues, a value ranging from 0 to 1. For functional genomic enrichment analysis, we used the built-in primary functional annotations v2.2 provided in the ldsc software to annotate SNPs. Using the annotated SNPs, we tested whether human tissues or specific functional genomic features were associated with severe Covid-19 using stratified LD score regression (S-LDSC).75 Our GWAS summary statistics were harmonized by the ldsc procedure. LD scores of HapMap3 SNPs (excluding the MHC region) for the gene annotations in each tissue were calculated using a 1-cM window. Enrichment scores were defined as the proportion of heritability captured by the annotated SNPs divided by the proportion of SNPs annotated.
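
The two ratios defined above are simple to compute. The sketch below illustrates them with made-up numbers; the function names and inputs are hypothetical, and in the actual analysis the heritability terms would come from S-LDSC output rather than being supplied by hand.

```python
def expression_specificity(tpm_by_tissue):
    """Proportion of a gene's total expression found in each tissue (0 to 1)."""
    total = sum(tpm_by_tissue.values())
    return {tissue: tpm / total for tissue, tpm in tpm_by_tissue.items()}


def enrichment_score(h2_annotated, h2_total, snps_annotated, snps_total):
    """Proportion of heritability in the annotation over proportion of SNPs."""
    return (h2_annotated / h2_total) / (snps_annotated / snps_total)


# Hypothetical gene expressed mostly in liver.
print(expression_specificity({"lung": 30.0, "blood": 10.0, "liver": 60.0}))
# Hypothetical annotation holding 5% of SNPs but 20% of heritability.
print(enrichment_score(0.02, 0.10, 1000, 20000))
```

An enrichment score above 1 means the annotated SNPs explain more heritability than their share of the genome would suggest.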

Genetic Correlations

We applied both LD score regression (LDSC)76 and high-definition likelihood (HDL)25 methods to assess the genetic correlations between severe Covid-19 and 818 GWAS phenotypes stored in LD-Hub.77 In the HDL analysis, SNP-based narrow-sense heritability was estimated for each phenotype, and among the 818 complex trait GWAS, those whose SNPs overlapped the HDL reference panel by less than 90% were removed.

Genome Build

Results are presented using Genome Reference Consortium Human Build 37. Imputed genotypes and whole-genome sequence data were lifted over from Genome Reference Consortium Human Build 38 using the Picard LiftoverVcf mode in GATK 4.0, based on the chain file assembly_mapping/homo_sapiens/GRCh38_to_GRCh37.chain.gz.78

Reporting Summary

For more information on the study design, see the Nature Research Reporting Summary linked to this paper.

Data Availability

Complete summary-level data supporting the results of this study are available online. Individual-level data can be analyzed by qualified researchers on the ISARIC4C/GenOMICC data analysis platform upon application. Complete GWAS summary statistics for the 23andMe discovery dataset are available to qualified researchers through 23andMe, under an agreement that protects the privacy of 23andMe participants. Further information and the data access application are available from 23andMe.

  1. McKenna, A., Hanna, M., Banks, E., Sivachenko, A., Cibulskis, K., Kernytsky, A., Garimella, K., Altshuler, D., Gabriel, S., Daly, M. & DePristo, M.A. The Genome Analysis Toolkit: A MapReduce framework for analyzing next-generation DNA sequencing data. Genome research 20, 1297-303 (2010).
  2. Meynert, A.M., Ansari, M., FitzPatrick, D.R. & Taylor, M.S. Variant detection sensitivity and biases in whole genome and exome sequencing. BMC bioinformatics 15, 247 (2014).
  3. Guo, Y., He, J., Zhao, S., Wu, H., Zhong, X., Sheng, Q., Samuels, D.C., Shyr, Y. & Long, J. Clustering and quality control of the Illumina human exome genotyping array. Nature protocols 9, 2643-62 (2014).
  4. Gaziano, J.M., Concato, J., Brophy, M., Fiore, L., Pyarajan, S., Breeling, J., Whitbourne, S., Deen, J., Shannon, C., Humphries, D., Guarino, P., Aslan, M., Anderson, D., LaFleur, R., Hammond, T., Schaa, K., Moser, J., Huang, G., Muralidhar, S., Przygodzki, R. & O'Leary, T.J. Million Veteran Program: A mega-biobank to study genetic influences on health and disease. Journal of clinical epidemiology 70, 214-23 (2016).
  5. Manichaikul, A., Mychaleckyj, J.C., Rich, S.S., Daly, K., Sale, M. & Chen, W.-M. Robust relationship inference in genome-wide association studies. Bioinformatics (Oxford, England) 26, 2867-73 (2010).
  6. Yang, J., Lee, S.H., Goddard, M.E. & Visscher, P.M. GCTA: A tool for genome-wide complex trait analysis. American journal of human genetics 88, 76-82 (2011).
  7. Alexander, D.H. & Lange, K. Enhancements to the ADMIXTURE algorithm for individual ancestry estimation. BMC bioinformatics 12, 246 (2011).
  8. Taliun, D. et al. Sequencing of 53,831 diverse genomes from the NHLBI TOPMed Program. (2019).
  9. Wigginton, J.E., Cutler, D.J. & Abecasis, G.R. A note on exact tests of Hardy-Weinberg equilibrium. American journal of human genetics 76, 887-93 (2005).
  10. Chang, C.C., Chow, C.C., Tellier, L.C., Vattikuti, S., Purcell, S.M. & Lee, J.J. Second-generation PLINK: Rising to the challenge of larger and richer datasets. GigaScience 4, 7 (2015).
  11. Abraham, G., Qiu, Y. & Inouye, M. FlashPCA2: Principal component analysis of Biobank-scale genotype datasets. Bioinformatics (Oxford, England) 33, 2776-2778 (2017).
  12. Karczewski, K.J., Francioli, L.C., Tiao, G., Cummings, B.B., Alföldi, J., Wang, Q., Collins, R.L., Laricchia, K.M., Ganna, A., Birnbaum, D.P., Gauthier, L.D., Brand, H., Solomonson, M., Watts, N.A., Rhodes, D., Singer-Berk, M., England, E.M., Seaby, E.G., Kosmicki, J.A., Walters, R.K., Tashman, K., Farjoun, Y., Banks, E., Poterba, T., Wang, A., Seed, C., Whiffin, N., Chong, J.X., Samocha, K.E., Pierce-Hoffman, E., Zappala, Z., O'Donnell-Luria, A.H., Minikel, E.V., Weisburd, B., Lek, M., Ware, J.S., Vittal, C., Armean, I.M., Bergelson, L., Cibulskis, K., Connolly, K.M., Covarrubias, M., Donnelly, S., Ferriera, S., Gabriel, S., Gentry, J., Gupta, N., Jeandet, T., Kaplan, D., Llanwarne, C., Munshi, R., Novod, S., Petrillo, N., Roazen, D., Ruano-Rubio, V., Saltzman, A., Schleicher, M., Soto, J., Tibbetts, K., Tolonen, C., Wade, G., Talkowski, M.E., Neale, B.M., Daly, M.J. & MacArthur, D.G. The mutational constraint spectrum quantified from variation in 141,456 humans. Nature 581, 434-443 (2020).
  13. Willer, C.J., Li, Y. & Abecasis, G.R. METAL: Fast and efficient meta-analysis of genome-wide association scans. Bioinformatics (Oxford, England) 26, 2190-1 (2010).
  14. Watanabe, K., Taskesen, E., Bochoven, A. van & Posthuma, D. Functional mapping and annotation of genetic associations with FUMA. Nature communications 8, 1826 (2017).
  15. Purcell, S., Neale, B., Todd-Brown, K., Thomas, L., Ferreira, M.A.R., Bender, D., Maller, J., Sklar, P., Bakker, P.I.W. de, Daly, M.J. & Sham, P.C. PLINK: A toolset for whole genome association and population-based linkage analysis. American journal of human genetics 81, 559-75 (2007).
  16. Smith, B.H., Campbell, A., Linksted, P., Fitzpatrick, B., Jackson, C., Kerr, S.M., Deary, I.J., Macintyre, D.J., Campbell, H., McGilchrist, M., Hocking, L.J., Wisely, L., Ford, I., Lindsay, R.S., Morton, R., Palmer, C.N.A., Dominiczak, A.F., Porteous, D.J. & Morris, A.D. Cohort profile: Generation Scotland: Scottish Family Health Study (GS:SFHS). The study, its participants and their potential for genetic research on health and illness. International journal of epidemiology 42, 689-700 (2013).
  17. Amador, C., Huffman, J., Trochet, H., Campbell, A., Porteous, D., Wilson, J.F., Hastie, N., Vitart, V., Hayward, C., Navarro, P. & Haley, C.S. Recent genomic heritage in Scotland. BMC genomics 16, 437 (2015).
  18. Nagy, R., Boutin, T.S., Marten, J., Huffman, J.E., Kerr, S.M., Campbell, A., Evenden, L., Gibson, J., Amador, C., Howard, D.M., Navarro, P., Morris, A., Deary, I.J., Hocking, L.J., Padmanabhan, S., Smith, B.H., Joshi, P., Wilson, J.F., Hastie, N.D., Wright, A.F., McIntosh, A.M., Porteous, D.J., Haley, C.S., Vitart, V. & Hayward, C. Exploration of haplotype research consortium imputation for genome-wide association studies in 20,032 Generation Scotland participants. Genome medicine 9, 23 (2017).
  19. Das, S., Forer, L., Schönherr, S., Sidore, C., Locke, A.E., Kwong, A., Vrieze, S.I., Chew, E.Y., Levy, S., McGue, M., Schlessinger, D., Stambolian, D., Loh, P.-R., Iacono, W.G., Swaroop, A., Scott, L.J., Cucca, F., Kronenberg, F., Boehnke, M., Abecasis, G.R. & Fuchsberger, C. Next-generation genotype imputation service and methods. Nature genetics 48, 1284-1287 (2016).
  20. Pasaniuc, B., Zaitlen, N., Shi, H., Bhatia, G., Gusev, A., Pickrell, J., Hirschhorn, J., Strachan, D.P., Patterson, N. & Price, A.L. Fast and accurate imputation of summary statistics enhances evidence of functional enrichment. Bioinformatics (Oxford, England) 30, 2906-14 (2014).
  21. Barbeira, A.N., Pividori, M., Zheng, J., Wheeler, H.E., Nicolae, D.L. & Im, H.K. Integrating predicted transcriptomes from multiple tissues improves association detection. PLoS genetics 15, e1007889 (2019).
  22. Battle, A., Brown, C.D., Engelhardt, B.E. & Montgomery, S.B. Genetic effects on gene expression across human tissues. Nature 550, 204-213 (2017).
  23. Smith, B.H., Campbell, H., Blackwood, D., Connell, J., Connor, M., Deary, I.J., Dominiczak, A.F., Fitzpatrick, B., Ford, I., Jackson, C., Haddow, G., Kerr, S., Lindsay, R., McGilchrist, M., Morton, R., Murray, G., Palmer, C.N.A., Pell, J.P., Ralston, S.H., St Clair, D., Sullivan, F., Watt, G., Wolf, R., Wright, A., Porteous, D. & Morris, A.D. Generation Scotland: the Scottish Family Health Study; a new resource for researching genes and heritability. BMC Medical Genetics 7, 74 (2006).
  24. Zhu, Z., Zheng, Z., Zhang, F., Wu, Y., Trzaskowski, M., Maier, R., Robinson, M.R., McGrath, J.J., Visscher, P.M., Wray, N.R. & Yang, J. Causal associations between risk factors and common diseases inferred from GWAS summary data. Nature communications 9, 224 (2018).
  25. Leeuw, C.A. de, Mooij, J.M., Heskes, T. & Posthuma, D. MAGMA: Generalized gene set analysis of GWAS data. PLoS computational biology 11, e1004219 (2015).
  26. S., Stahl, E., Lindstrom, S., Perry, J.R.B., Okada, Y., Raychaudhuri, S., Daly, M.J., Patterson, N., Neale, B.M. & Price, A.L. Partitioning heritability by functional annotation using genome-wide association summary statistics. Nature genetics 47, 1228-35 (2015).
  27. Bulik-Sullivan, B., Finucane, H.K., Anttila, V., Gusev, A., Day, F.R., Loh, P.-R., Duncan, L., Perry, J.R.B., Patterson, N., Robinson, E.B., Daly, M.J., Price, A.L. & Neale, B.M. An atlas of genetic correlations across human diseases and traits. Nature genetics 47, 1236-41 (2015).
  28. Zheng, J., Erzurumluoglu, A.M., Elsworth, B.L., Kemp, J.P., Howe, L., Haycock, P.C., Hemani, G., Tansey, K., Laurin, C., Pourcain, B.S., Warrington, N.M., Finucane, H.K., Price, A.L., Bulik-Sullivan, B.K., Anttila, V., Paternoster, L., Gaunt, T.R., Evans, D.M. & Neale, B.M. LD Hub: A centralized database and web interface to perform LD score regression that maximizes the potential of summary level GWAS data for SNP heritability and genetic correlation analysis. Bioinformatics (Oxford, England) 33, 272-279 (2017).
  29. Hinrichs, A.S., Karolchik, D., Baertsch, R., Barber, G.P., Bejerano, G., Clawson, H., Diekhans, M., Furey, T.S., Harte, R.A., Hsu, F., Hillman-Jackson, J., Kuhn, R.M., Pedersen, J.S., Pohl, A., Raney, B.J., Rosenbloom, K.R., Siepel, A., Smith, K.E., Sugnet, C.W., Sultan-Qurraie, A., Thomas, D.J., Trumbower, H., Weber, R.J., Weirauch, M., Zweig, A.S., Haussler, D. & Kent, W.J. The UCSC genome browser database: update 2006. Nucleic acids research 34, D590-8 (2006).
  30. Sun, J., Ye, F., Wu, A., Yang, R., Pan, M., Sheng, J., Zhu, W., Mao, L., Wang, M., Huang, B., Tan, W. & Jiang, T. Comparative transcriptome analysis reveals the intensive early-stage responses of host cells to SARS-CoV-2 infection. (2020). 2020.04.30.071274.
  31. Rosa, B.A., Ahmed, M., Singh, D.K., Choreno-Parra, J.A., Cole, J., Jimenez-Alvarez, L.A., Rodriguez-Reyna, T.S., Singh, B., Golzalez, O., Carrion, R., Schlesinger, L.S., Martin, J., Zuniga, J., Mitreva, M., Khader, S.A. & Kaushal, D. IFN signaling and neutrophil degranulation transcriptional signatures are induced during SARS-CoV-2 infection. bioRxiv : the preprint server for biology (2020). Available at:
  32. Zhang, J.-Y., Wang, X.-M., Xing, X., Xu, Z., Zhang, C., Song, J.-W., Fan, X., Xia, P., Fu, J.-L., Wang, S.-Y., Xu, R.-N., Dai, X.-P., Shi, L., Huang, L., Jiang, T.-J., Shi, M., Zhang, Y., Zumla, A., Maeurer, M., Bai, F. & Wang, F.-S. Single-cell landscape of immunological responses in patients with COVID-19. Nature immunology 21, 1107-1118 (2020).
  33. Mick, E., Kamm, J., Pisco, A.O., Ratnasiri, K., Babik, J.M., Calfee, C.S., Castaneda, G., DeRisi, J.L., Detweiler, A.M., Hao, S., Kangelaris, K.N., Kumar, G.R., Li, L.M., Mann, S.A., Neff, N., Prasad, P.A., Serpa, P.H., Shah, S.J., Spottiswoode, N., Tan, M., Christenson, S.A., Kistler, A. & Langelier, C. Gene expression in the upper respiratory tract distinguishes Covid-19 from other acute respiratory diseases and reveals suppression of the innate immune response by SARS-CoV-2. medRxiv : the preprint server for health sciences (2020). 05.18.20105171.
  34. Wei, J., Alfajaro, M.M., Hanna, R.E., DeWeirdt, P.C., Strine, M.S., Lu-Culligan, W.J., Zhang, S.-M., Graziano, V.R., Schmitz, C.O., Chen, J.S., Mankowski, M.C., Filler, R.B., Gasque, V., de Miguel, F., Chen, H., Oguntuyo, K., Abriola, L., Surovtseva, Y.V., Orchard, R.C., Lee, B., Lindenbach, B., Politi, K., van Dijk, D., Simon, M.D., Yan, Q., Doench, J.G. & Wilen, C.B. Genome-wide CRISPR screens reveal host genes that regulate SARS-CoV-2 infection. (2020). Available at:
  35. Heaton, B.E., Trimarco, J.D., Hamele, C.E., Harding, A.T., Tata, A., Zhu, X., Tata, P.R., Smith, C.M. & Heaton, N.S. SRSF protein kinases 1 and 2 are essential host factors for human coronaviruses, including SARS-CoV-2. bioRxiv : the preprint server for biology (2020).


We thank the patients and their loved ones who volunteered to take part in this study at one of the most difficult times of their lives, and the research staff in every intensive care unit who recruited them, at personal risk, under some of the most extreme conditions ever experienced in UK hospitals. GenOMICC was funded by Sepsis Research (Fiona Elizabeth Agnew Trust), the Intensive Care Society, a Wellcome-Beit Prize award to J. K. Baillie (Wellcome Trust 103258/Z/13/A), and a BBSRC Institute Program Support Grant (BBS/E/D/20002172, BBS/E/D/10002070, and BBS/E/D/30002275). Whole genome sequencing was performed in partnership with Genomics England and funded by the UK Department of Health and Social Care, UKRI, and LifeArc. ISARIC4C is funded by the Medical Research Council [grant MC_PC_19059]; the National Institute for Health Research (NIHR) [grant CO-CIN-01]; the NIHR Health Protection Research Unit (HPRU) in Emerging and Zoonotic Infections at the University of Liverpool, in partnership with Public Health England (PHE) and in collaboration with the Liverpool School of Tropical Medicine and the University of Oxford [grant 200907]; the NIHR HPRU in Respiratory Infections at Imperial College London, with PHE [award 200927]; the Wellcome Trust and Department for International Development [215091/Z/18/Z]; the Bill & Melinda Gates Foundation [OPP1209135]; and the Liverpool Experimental Cancer Medicine Centre. Infrastructure support for this work was provided by the NIHR Biomedical Research Centre at Imperial College London [IS-BRC-1215-20013], the EU Platform for European Preparedness Against (Re-)emerging Epidemics (PREPARE) [FP7 project 60252525], and the NIHR Clinical Research Network. PJMO is supported by an NIHR Senior Investigator Award [award 201385].
The views expressed in this study are those of the authors and not necessarily those of the DHSC, DID, NIHR, MRC, Wellcome Trust, or PHE. HM was supported by the NIHR BRC at University College London Hospitals. The Irish Health Research Board (Clinical Trial Network Award 2014-12) funded the sample collection in Ireland. The study was conducted using the UK Biobank resource under Project 788. Generation Scotland received core support from the Chief Scientist Office of the Scottish Government Health Directorates [CZD/16/6] and the Scottish Funding Council [HR03006], and is currently supported by the Wellcome Trust [216767/Z/19/Z]. Genotyping of the GS:SFHS samples was performed by the Genetics Core Laboratory at the Edinburgh Clinical Research Facility, University of Edinburgh, Scotland, and was funded by the UK Medical Research Council and the Wellcome Trust (Wellcome Trust Strategic Award 'STratifying Resilience and Depression Longitudinally' (STRADL), reference 104036/Z/14/Z). Genomics England and the 100,000 Genomes Project are funded by the National Institute for Health Research, the Wellcome Trust, the Medical Research Council, Cancer Research UK, the Department of Health and Social Care, and NHS England. M. Caulfield is a Senior Research Fellow at the NIHR. This work is part of the translational research portfolio of the NIHR Biomedical Research Centres at Barts and Cambridge. Research carried out in the Human Genetics Unit is funded by the MRC (MC_UU_00007/10, MC_UU_00007/15). LK was supported by an RCUK Innovation Fellowship from the National Productivity Investment Fund (MR/R026408/1). A. Bretherick acknowledges funding from the Wellcome Trust Doctoral Training Fellowship for Clinicians (204979/Z/16/Z) and the Edinburgh Clinical Academic Track (ECAT) programme. We acknowledge support from the MRC Human Genetics Unit Programme Grant 'Quantitative Traits in Health and Disease' (U. MC_UU_00007/10). A. Tenesa acknowledges funding from MRC research grant MR/P015514/1 and HDR-UK awards HDR-9004 and HDR-9003.
We are grateful to the National Institute for Health Research Clinical Research Network (NIHR CRN) and the Chief Scientist Office (Scotland) for facilitating recruitment to the study in NHS hospitals, and to the global ISARIC and InFACT consortia for their significant contributions to this work. The authors thank Dr. Rebecca Coll (Wellcome-Wolfson Institute, Queen's University Belfast) for advice on the interpretation of these results, and Jie Zheng (University of Bristol) for sharing the harmonized GWAS summary statistics used in LD-Hub. The authors also thank the anonymous reviewers, who suggested many substantial improvements to the manuscript and analysis. The views expressed here are purely those of the authors and do not under any circumstances represent the official views of the European Commission.

Author Contributions

JK, PK, CHi, PH, AN, DM, LL, DMc, HM, TW, CPP, JKB contributed to study design; SC, JF, FG, WO, SK, AF, KRo, LMu, PJO, MGS, AL, JKB contributed to study coordination; NW, AF, LMu contributed to laboratory work; EP-C, SC, LK, ADB, KR, DP, SW, NP, MHF, JF, AR, EG, DH, BW, YW, AM, AK, LM, ZY, RZ, CZ, GG, BS, MZ, CH, JY, XS, CPP, AT, KRo, AL, VV, JFW, JKB contributed to data analysis; SC, CDR, DJP, CHa, LT, AH, SCM, AP, ARe, MC, RS, JKB contributed to case and control recruitment; SC, CDR, RB, JM, JKB contributed to interpretation of findings; EP-C, SC, LK, ADB, KR, CDR, RB, JM, CPP, KRo, VV, JFW, JKB contributed to manuscript preparation; JKB contributed to manuscript editing. JKB conceived the study and wrote the first draft of the manuscript. The final version of the manuscript was approved by all authors. The GenOMICC recruitment sites are listed below, in order of the number of patients recruited at each site.

Competing Interests

The authors declare no competing interests.

Additional Information

For additional information, inquiries, and requests for materials, please contact J.K.B.

Peer Review Information Nature thanks Paul McLaren and the other anonymous reviewers for their contributions to the peer review of this article. The reviewers' reports are available.

Reprints and permissions information is available at:


Extended Data Figure 1 Q-Q plots

Q-Q plots are shown for each ancestry group in GenOMICC (gcc.eur: European; gcc.afr: African; gcc.eas: East Asian; gcc.sas: South Asian), for the trans-ethnic meta-analysis (gcc.te.meta), and for the meta-analysis comprising GenOMICC, HGI, and 23andMe data (gcc.hgi.23m). Raw (uncorrected) p-values are shown. λ: genomic inflation value. Note that the primary analysis in GenOMICC EUR shows some residual inflation. Repeating the analysis with more principal components (20 PCs) as covariates did not reduce the inflation (λ0.5 = 1.10).
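
The genomic inflation value quoted above can be illustrated with a short sketch. It assumes the usual definition of λ at the median (the 0.5 quantile): the median observed 1-degree-of-freedom chi-square statistic divided by its expected median under the null. The function name and the evenly spaced p-values are hypothetical, and this is not necessarily how the authors' software computes λ.

```python
from statistics import NormalDist, median

_STD_NORMAL = NormalDist()

# Expected median of a 1-df chi-square statistic under the null (about 0.455),
# obtained as the squared 0.75 quantile of the standard normal.
EXPECTED_MEDIAN_CHI2 = _STD_NORMAL.inv_cdf(0.75) ** 2


def genomic_inflation(p_values):
    """Return lambda at the median for a list of two-sided p-values in (0, 1)."""
    # A two-sided p-value p maps to the chi-square statistic z**2,
    # where z is the (1 - p/2) quantile of the standard normal.
    chi2 = [_STD_NORMAL.inv_cdf(1.0 - p / 2.0) ** 2 for p in p_values]
    return median(chi2) / EXPECTED_MEDIAN_CHI2


# Hypothetical input: evenly spaced p-values, as expected with no inflation.
null_p_values = [(i + 0.5) / 1000 for i in range(1000)]
lam = genomic_inflation(null_p_values)
print(round(lam, 3))
```

A value close to 1 indicates no inflation; values above 1, such as the 1.10 reported for GenOMICC EUR, indicate that test statistics are larger than expected under the null.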

Extended Data Figure 2 Shared information content in MAIC

Representation of the information shared among data sources in the MAIC analysis. Each experiment or data source is represented by a block on the outer circle, and the size of each block is proportional to the total information content of its input list. Lines are color-coded according to the dominant data source. Data sources within the same category share the same color (see legend). The largest categories and data sources are labeled. An interactive version of this figure can be found at To estimate the probability of the specific enrichment observed for the GenOMICC meta-TWAS, we randomly sampled 1000 times from the baseline distribution of meta-TWAS genes and re-ran MAIC with the same set of Covid-19 systematic review inputs, substituting the randomly sampled input list for the GenOMICC meta-TWAS results. We modeled a normal distribution based on these empirical results and estimated the probability of the MAIC enrichment being this strong by random chance to be p = 4.2 × 10-12.
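
The resampling procedure described in this caption can be sketched as follows. This is a minimal illustration under stated assumptions, not the MAIC implementation: `score_fn` stands in for whatever enrichment score a MAIC run returns, and the gene names, list size, and observed score are all hypothetical.

```python
import math
import random
from statistics import mean, stdev


def sample_null_scores(baseline_genes, list_size, score_fn, n_draws=1000, seed=0):
    """Score n_draws gene lists sampled at random from the baseline gene set."""
    rng = random.Random(seed)
    return [score_fn(rng.sample(baseline_genes, list_size)) for _ in range(n_draws)]


def normal_tail_probability(null_scores, observed):
    """Upper-tail probability of `observed` under a normal fit to the null scores."""
    z = (observed - mean(null_scores)) / stdev(null_scores)
    # P(Z > z) for a standard normal; erfc keeps precision in the far tail.
    return 0.5 * math.erfc(z / math.sqrt(2.0))


# Hypothetical toy data: the score is the overlap with a fixed "known" gene set.
baseline = [f"gene{i}" for i in range(200)]
known = set(baseline[:20])
null_scores = sample_null_scores(baseline, 20, lambda genes: len(known & set(genes)))
p_value = normal_tail_probability(null_scores, observed=15)
```

Fitting a normal distribution to the empirical null allows probabilities far smaller than 1/1000 to be estimated from only 1000 draws, at the cost of assuming the null scores are approximately normal.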

Extended Data Figure 3 Genomic overlap between cases and controls

PCA plots showing the distribution of all cases and controls for the first 10 principal components. Cases are shown as colored filled circles: European (EUR, blue), African (AFR, red), East Asian (EAS, green), and South Asian (SAS, purple). Controls for each ancestry group are shown as filled circles in a lighter shade of that group's color. The UK Biobank population background is shown as light gray filled circles.

rs2236757 A/G

Cohort               OR     95% CI            Weight
gcc-AFR              1.08   [0.76; 1.53]      2.9%
gcc-EAS              1.30   [0.89; 1.91]      2.4%
gcc-SAS              1.27   [0.99; 1.62]      5.9%
gcc-EUR              1.29   [1.17; 1.41]      43.0%
23m                  1.22   [0.96; 1.55]      6.3%
hgi                  1.20   [1.09; 1.32]      39.7%
Fixed effect model   1.24   [1.17; 1.32]      100.0%
Heterogeneity: I2 = 0%, τ2 = 0, p = 0.87; p-val = 1.16e-12

rs73064425 T/C

Cohort               OR     95% CI            Weight
gcc-AFR              7.74   [0.43; 138.83]    0.1%
gcc-EAS              4.02   [1.26; 12.76]     0.5%
gcc-SAS              1.26   [0.96; 1.66]      9.0%
gcc-EUR              2.14   [1.88; 2.45]      39.3%
23m                  1.87   [1.56; 2.24]      20.2%
hgi                  1.70   [1.47; 1.98]      31.0%
Fixed effect model   1.86   [1.71; 2.02]      100.0%
Heterogeneity: I2 = 69%, τ2 = 0.0281, p < 0.01; p-val = 1.97e-49

rs2109069 A/G

Cohort               OR     95% CI            Weight
gcc-AFR              0.84   [0.59; 1.21]      2.3%
gcc-EAS              0.93   [0.55; 1.58]      1.1%
gcc-SAS              1.11   [0.83; 1.47]      3.7%
gcc-EUR              1.36   [1.25; 1.48]      41.1%
23m                  1.13   [0.99; 1.29]      17.2%
hgi                  1.19   [1.08; 1.31]      34.5%
Fixed effect model   1.23   [1.16; 1.30]      100.0%
Heterogeneity: I2 = 61%, τ2 = 0.0091, p = 0.03; p-val = 3.10e-13

rs10735079 G/A

Cohort               OR     95% CI            Weight
gcc-AFR              0.90   [0.62; 1.32]      2.2%
gcc-EAS              0.69   [0.41; 1.18]      1.1%
gcc-SAS              1.24   [0.96; 1.60]      4.8%
gcc-EUR              0.77   [0.71; 0.85]      38.7%
23m                  0.96   [0.83; 1.12]      13.8%
hgi                  0.87   [0.80; 0.95]      39.4%
Fixed effect model   0.86   [0.81; 0.91]      100.0%
Heterogeneity: I2 = 69%, τ2 = 0.0135, p < 0.01; p-val = 5.04e-08

rs12004298 (ABO locus)

Cohort               OR     95% CI            Weight
gcc-EUR              1.11   [1.01; 1.21]      33.7%
hgi                  1.20   [1.10; 1.31]      31.1%
23m                  1.15   [1.06; 1.25]      35.2%
Fixed effect model   1.15   [1.09; 1.21]      100.0%
Heterogeneity: I2 = 0%, τ2 = 0, p = 0.44

Extended Data Figure 4 Effect sizes for ancestry groups within the GenOMICC study.

Data are shown for the four replicated variants with genome-wide significant associations in GenOMICC (a-d) and for the ABO locus (e). Each forest plot displays a measure of effect size heterogeneity with its p-value (p) and, for the fixed effect model, the meta-analysis estimate with 95% confidence interval and p-value (p-val). Alleles in bold are the reference alleles for the reported effects (odds ratios). Sample sizes (cases + controls) analyzed in the four ancestry groups within GenOMICC are: 1092 African (AFR), 894 East Asian (EAS), 10055 European (EUR), and 1422 South Asian (SAS). HGI: Covid-19 Host Genetics Initiative; 23m: 23andMe. The observed heterogeneity in effect sizes may reflect true differences between ancestry groups, limited statistical power in the smaller groups (evident from their wide confidence intervals), or residual confounding.
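
As an illustration of the fixed effect model referred to in this caption, the sketch below applies standard inverse-variance weighting to the per-cohort odds ratios printed for rs2236757. It is an approximation under stated assumptions: standard errors are recovered from the rounded 95% confidence intervals, the function name is hypothetical, and the authors' meta-analysis software may differ in detail.

```python
import math

# (cohort, odds ratio, lower 95% CI, upper 95% CI) for rs2236757, as printed above.
RS2236757 = [
    ("AFR", 1.08, 0.76, 1.53),
    ("EAS", 1.30, 0.89, 1.91),
    ("SAS", 1.27, 0.99, 1.62),
    ("EUR", 1.29, 1.17, 1.41),
    ("23m", 1.22, 0.96, 1.55),
    ("hgi", 1.20, 1.09, 1.32),
]


def fixed_effect_meta(rows, z_crit=1.959964):
    """Inverse-variance fixed effect meta-analysis on the log odds ratio scale."""
    betas, weights = [], []
    for _, odds_ratio, lower, upper in rows:
        # Recover the standard error of log(OR) from the width of the 95% CI.
        se = (math.log(upper) - math.log(lower)) / (2.0 * z_crit)
        betas.append(math.log(odds_ratio))
        weights.append(1.0 / se ** 2)
    total = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, betas)) / total
    se_pooled = math.sqrt(1.0 / total)
    # Cochran's Q and the I2 heterogeneity statistic.
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, betas))
    df = len(rows) - 1
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return {
        "or": math.exp(pooled),
        "ci_low": math.exp(pooled - z_crit * se_pooled),
        "ci_high": math.exp(pooled + z_crit * se_pooled),
        "i2": i2,
    }


result = fixed_effect_meta(RS2236757)
print({name: round(value, 2) for name, value in result.items()})
```

Because each cohort is weighted by the inverse of its variance, the two largest cohorts (EUR and hgi) dominate the pooled estimate, which is consistent with the weights listed in the forest plot.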

Extended Data Table 1 Baseline characteristics of the 2244 patients included in the study

Characteristic             GenOMICC (n=2109)   Missing data   ISARIC 4C (n=135)   Missing data
Female sex                 624 (30%)                          46 (34%)
Age (years, mean ± SD)     57.3 ± 12.1                        57.3 ± 2.9
European                   1573 (75%)                         103 (76%)
South Asian                219 (10%)                          18 (13%)
African                    174 (8%)                           8 (6%)
East Asian                 143 (7%)                           6 (4%)
Significant comorbidity    396 (19%)           49 (2%)        40 (30%)            26 (19%)
Invasive ventilation       1557 (74%)          35 (2%)        25 (19%)            31 (25%)
Death (60 days)            459 (22%)           338 (16%)      22 (16%)            30 (22%)

Ancestry groups were determined by principal component analysis (Extended Data Figure 3). In GenOMICC, significant comorbidity was defined as the presence of functionally limiting comorbid illness, as assessed by the treating clinicians. In ISARIC4C, significant comorbidity was defined as the presence of chronic heart disease, pulmonary disease, renal disease, liver disease, cancer, or dementia.
Age is presented as mean ± standard deviation.

Extended Data Table 2 Replication in external data

SNP           chr:pos        Risk   Alt   OR_gcc   P_gcc        OR_hgi.23m   P_hgi.23m     Locus
rs73064425    3:45901089     T      C     2.1      4.8×10^-30   1.7          1.5×10^-28*   LZTFL1
rs9380142     6:29798794     A      G     1.3      3.2×10^-8    1            0.76          HLA-G
rs143334143   6:31121426     A      G     1.9      8.8×10^-18   1.1          0.019         CCHCR1
rs3131294     6:32180146     G      A     1.5      2.8×10^-8    0.99         0.91          NOTCH4
rs10735079    12:113380008   A      G     1.3      1.6×10^-8    1.1          0.00082*      OAS1/3
rs2109069     19:4719443     A      G     1.4      4×10^-12     1.1          5×10^-5*      DPP9
rs74956615    19:10427721    A      T     1.6      2.3×10^-8    1.4          2×10^-6*      TYK2
rs2236757     21:34624917    A      G     1.3      5×10^-8      1.2          4.1×10^-5*    IFNAR2

Risk: risk allele; Alt: alternative allele; OR: effect size (odds ratio) of the risk allele; CI: 95% confidence interval for the odds ratio; P: p-value; Locus: the gene nearest to the top SNP. Subscript identifiers indicate the data source: gcc, GenOMICC study, European ancestry, compared with UK Biobank; hgi.23m, meta-analysis of the Covid-19 Host Genetics Initiative and 23andMe, used for replication. *Bonferroni-significant values, indicating external replication.
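
The Bonferroni criterion mentioned in this legend can be sketched as follows. The threshold is an assumption, not something the source states: a family-wise α of 0.05 divided by the eight variants carried forward to replication, giving 0.00625. The p-values are the replication (hgi.23m) values from the table, and the function name is hypothetical.

```python
# Replication p-values (hgi.23m) for the eight lead variants in the table above.
REPLICATION_P = {
    "rs73064425": 1.5e-28,
    "rs9380142": 0.76,
    "rs143334143": 0.019,
    "rs3131294": 0.91,
    "rs10735079": 0.00082,
    "rs2109069": 5e-5,
    "rs74956615": 2e-6,
    "rs2236757": 4.1e-5,
}


def bonferroni_significant(p_values, alpha=0.05):
    """Flag p-values below alpha divided by the number of tests (assumed alpha = 0.05)."""
    threshold = alpha / len(p_values)
    return {snp: p < threshold for snp, p in p_values.items()}


flags = bonferroni_significant(REPLICATION_P)
replicated = sorted(snp for snp, is_significant in flags.items() if is_significant)
print(replicated)
```

Under this assumed threshold, the five flagged variants are the same five that carry an asterisk in the table, while rs143334143 (p = 0.019) is nominally significant but not flagged.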