J Han / N Uberoi (@1.57) vs Y Zhang / Y Zhao C (@2.25)
10-09-2019

Our Prediction:

J Han / N Uberoi will win

J Han / N Uberoi – Y Zhang / Y Zhao C Match Prediction | 10-09-2019 01:00

The ongoing improvements of high-throughput sequencing technology and analytic capabilities promote the availability of DNA sequencing data. Currently, hundreds of individual assembled human genomes are available at NCBI (ftp://ftp.ncbi.nlm.nih.gov/genomes/genbank/vertebrate_mammalian/Homo_sapiens/latest_assembly_versions/) and other databases [21, 23]. A number of re-sequencing projects have been completed and have produced high volumes of whole-genome sequencing data [4, 6, 9]. These datasets, especially the deep sequencing data from large cohorts, make it possible to carry out population-scale pan-genome analysis, for example of individuals within a certain geographical range or with a certain disease. These data provide a great opportunity to understand the more complex genetic diversity of human genomes and to gain insight into population-specific variations, which are important for clinical or public health [19]. We have shown a strategy based on high-depth sequencing, de novo assembly, gene prediction of novel sequences, and mapping raw reads to the pan-genome to determine gene presence-absence variation (PAV) in a large number of human individuals. We demonstrated the power of our pipeline on 185 newly sequenced and 90 assembled Han Chinese genomes.
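The last step of the strategy, deciding gene presence or absence from reads mapped to the pan-genome, can be illustrated with a minimal Python sketch. The function name `call_pav`, the 0.95 coverage cutoff, and the gene names are illustrative assumptions and are not taken from the pipeline itself; the sketch only assumes that each gene has a precomputed fraction of bases covered by mapped reads.

```python
def call_pav(gene_coverage, min_cov=0.95):
    """Call presence (True) or absence (False) for each gene.

    gene_coverage: dict mapping gene name -> fraction of the gene's bases
                   covered by reads mapped to the pan-genome (0.0 to 1.0).
    min_cov:       hypothetical cutoff; a gene is called present when its
                   covered fraction is at least this value.
    """
    return {gene: frac >= min_cov for gene, frac in gene_coverage.items()}


def pav_matrix(samples, min_cov=0.95):
    """Build a sample -> gene -> presence table from per-sample coverage."""
    return {name: call_pav(cov, min_cov) for name, cov in samples.items()}


if __name__ == "__main__":
    # Hypothetical coverage values for two individuals and two genes.
    samples = {
        "sample1": {"geneA": 0.99, "geneB": 0.40},
        "sample2": {"geneA": 0.97, "geneB": 0.98},
    }
    for name, calls in pav_matrix(samples).items():
        print(name, calls)
```

In this toy input, `geneB` would be called absent in `sample1` and present in `sample2`, which is the kind of variation a PAV analysis is meant to surface.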

The cell membrane is involved in many biological functions, including small-molecule transport, cell-cell and cell-substrate recognition and interaction, and cell signal transduction and communication [10,11]. Membrane proteins account for approximately 30% of the whole cell proteome and are known to be involved in cell proliferation, cell adhesion, and tumor cell invasion. They are also pivotal to the development, growth, angiogenesis, and metastasis of tumors [12-14]. Analyzing membrane proteomes may help us understand carcinogenic mechanisms and promote the discovery of new potential tumor biomarkers and therapeutic targets.

Background

Lung cancer is one of the most frequently diagnosed cancers and the leading cause of cancer death worldwide [1]. Lung cancer is divided into two classes: non-small cell lung cancer (NSCLC) and small cell lung cancer. NSCLC includes adenocarcinoma, squamous cell carcinoma, large cell carcinoma, and other cell types. Lung adenocarcinoma is the most common type of lung cancer, and its incidence has been increasing in recent years. Smoking is the most common cause of lung cancer overall, but lung adenocarcinoma is the most frequently occurring cell type in nonsmokers, and its pathogenesis remains unclear. Although many treatments are available, the prognosis of lung cancer is still poor. The 5-year survival of all lung cancer patients is only approximately 16% [2].

Single nucleotide variations (SNVs), small insertions and deletions (INDELs), and structural variations (SVs) of the human genome are routinely explored to study genomic variation in biomedical studies. However, most of these studies are based on the human reference genome, which was built from several individuals, and only a consensus of these genomes was included [1]. Therefore, reference-based methods may miss some sequence variations within or between populations [2, 3]. Indeed, previous studies have discovered various types of novel sequences that are not present in the human reference genome [4,5,6,7,8]. For example, more than 3700 non-repetitive non-reference (NRNR) sequences were called from whole-genome sequence data of 15,219 Icelanders by de novo assembly of the unmapped reads into contigs [4]. In another study, by analyzing the unmapped reads from ~10,000 deeply sequenced human genomes, Telenti et al. found that each genome carried an average of 0.7Mb of sequence that was not found in the human reference genome [6]. The Simons Genome Diversity Project reported high-quality genomes of 300 individuals from 142 diverse populations and suggested that at least 5.8Mb of sequence from these genomes was not present in the human reference genome [9]. These novel sequences may harbor functional genomic elements that are ethnic specific, and may affect gene regulation or transcriptional diversity [2]. For example, a 766-bp non-repetitive non-reference sequence was found to be associated with myocardial infarction in Icelanders [4]. Adding these novel sequences to the human reference genome could improve the efficiency of mapping and variant calling [9].

Western blotting analysis was performed for 10 pairs of fresh lung adenocarcinoma and normal lung tissue. A measured amount (60 µg) of protein was separated by SDS-PAGE and transferred to membranes. The membranes were blocked with 5% nonfat dry milk in TBST buffer for 2 h at room temperature, incubated with anti-S100A14 antibody (1:400) overnight at 4 °C, washed in TBST, and incubated again with horseradish peroxidase-conjugated secondary antibody (1:4000) for 2 h at room temperature. β-actin was used as a loading control. The immunoreactive protein bands were visualized by enhanced chemiluminescence and evaluated by densitometry using ImageJ software.
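Densitometry readings are usually compared only after dividing each target band by the loading-control band from the same lane. The Python sketch below shows that arithmetic under stated assumptions: the function names and the intensity values are hypothetical, and the text does not specify how the authors processed their densitometry output.

```python
def normalized_intensity(target, loading_control):
    """Divide a target band intensity by its lane's loading-control intensity."""
    if loading_control <= 0:
        raise ValueError("loading-control intensity must be positive")
    return target / loading_control


def fold_change(tumor, normal):
    """Tumor-over-normal ratio of normalized intensities.

    tumor, normal: (target_intensity, loading_control_intensity) tuples
                   taken from the paired lanes of one patient.
    """
    return normalized_intensity(*tumor) / normalized_intensity(*normal)


if __name__ == "__main__":
    # Hypothetical band intensities in arbitrary densitometry units.
    tumor_lane = (1200.0, 800.0)   # target band, loading-control band
    normal_lane = (500.0, 1000.0)
    print(fold_change(tumor_lane, normal_lane))
```

Normalizing within each lane before taking the ratio keeps unequal protein loading from being mistaken for a change in expression.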

Conclusion

The extracted membrane proteins were reduced with 10 mM DTT and alkylated with 55 mM IAM. They were then precipitated by cold acetone, stored at -20 °C for 3 h, and concentrated by centrifuging at 20,000 × g for 30 min. The precipitates were resuspended in solution buffer (50% TEAB, 0.1% SDS). Protein digestion and iTRAQ labeling were performed according to the iTRAQ kit protocol (Applied Biosystems). Then 100 µg protein solutions were digested with 1 µg/µl trypsin solution at 37 °C overnight and labeled with iTRAQ tags. The lung cancer and normal lung tissue samples were labeled with iTRAQ117 and iTRAQ118, respectively.

De novo assembly is one of the important tasks in pan-genome analysis, because it provides the capacity to detect sequences missing from the current reference genome. In EUPAN, SOAPDenovo2 was used to assemble individual genomes. However, due to the large size of the human genome, assembling an individual genome from 30-fold depth sequencing data requires more than 500Gb of memory (Additional file 1: Table S4), which prohibits assembling hundreds of individual genomes in practice. After comparing several de novo assembly tools for next-generation sequencing data for large-sized genomes (Additional file 1: Supplementary methods), we selected SGA (String Graph Assembler) [24] due to its high assembly quality and low memory consumption. We obtained optimized parameters for SGA (Additional file 1: Table S2) on simulated data and ran SGA with this parameter setting on 185 deeply sequenced genomes in parallel.

S100A14 is a novel member of the S100 protein family [35]. S100 is a subfamily of Ca2+-binding proteins within the EF-hand superfamily that appear to be involved in the regulation of many cellular processes (e.g., cell cycle progression, differentiation, cell-cell communication, intracellular signaling, energy metabolism) [35,36]. The study by Lukanidin [37] showed that the S100 family is pivotal in cell migration, invasion, and cancer metastasis.

Author information

All raw spectra files were searched against UniProtKB/Swiss-Prot Homo sapiens using Proteome Discoverer (Thermo Fisher Scientific, Version 1.3) and Mascot Version 2.3. Trypsin was used as the required enzyme, and one missed cleavage was allowed. The maximum mass deviation allowed for precursor mass was set to 15 ppm, and fragment ion tolerance was 0.02 Da. Carbamidomethylation of cysteine was set as a fixed modification, while oxidation of methionine, Gln->pyro-Glu (N-term Q), and iTRAQ 8-plex labeling at the N-terminus, K, and Y were used as variable modifications. An FDR of less than 1% was deemed acceptable at both the peptide and protein level. Protein quantification required at least one unique peptide.
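A tolerance given in ppm is relative, so the absolute window it allows grows with the precursor mass. The following Python sketch works through that conversion for the 15 ppm setting. The function names and the example masses are illustrative assumptions and do not correspond to any Proteome Discoverer or Mascot interface.

```python
def ppm_window(mass_da, tol_ppm=15.0):
    """Return the (low, high) mass window in Da allowed by a ppm tolerance."""
    delta = mass_da * tol_ppm / 1e6
    return mass_da - delta, mass_da + delta


def within_ppm(observed_da, theoretical_da, tol_ppm=15.0):
    """True if the observed mass deviates by at most tol_ppm from theory."""
    error_ppm = abs(observed_da - theoretical_da) / theoretical_da * 1e6
    return error_ppm <= tol_ppm


if __name__ == "__main__":
    # Hypothetical precursor of 2000 Da: 15 ppm corresponds to +/- 0.03 Da.
    low, high = ppm_window(2000.0)
    print(f"{low:.2f} to {high:.2f} Da")
    print(within_ppm(2000.02, 2000.0))  # 10 ppm deviation
    print(within_ppm(2000.04, 2000.0))  # 20 ppm deviation
```

By contrast, the fragment ion tolerance of 0.02 Da is absolute and does not scale with mass.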

The first human pan-genome study was carried out in 2010, and only two representative genomes from Africa and Asia were analyzed [3]. In this study, about 5Mb of novel sequences absent from the reference genome (hg19 assembly) were detected for each individual, and the total sequence absent from the reference genome was estimated to be 19~40Mb, which might have been an underestimate considering the study of 10 Danish trios [19]. In a subsequent study [2], re-analysis of the 5Mb of novel sequences from a Chinese individual showed that 3.7Mb could be aligned to the GRCh38 human reference genome. In another Chinese genome, HX1, 12.8Mb of sequence was not present in GRCh38, but 68% of these novel sequences could be found in Asian populations [2]. In a recent paper, Sherman et al. reported an African pan-genome [20]. It contained about 300Mb of unique sequences missing from the human reference genome. Notably, most of these novel sequences were individual-specific, and only 81Mb were shared by two or more individuals. These studies indicate the significance of population-specific genome diversity. The possibility that these non-reference genomic regions carry driver mutations for some diseases, especially diseases that predominantly affect a specific ethnic group, is worth investigating.