生信大本营 · Solving Bioinformatics Problems with Python

<a id = "header"></a>

This topic is a showcase of bioinformatics and biostatistics problems. The problems vary in background and call for flexible methods, but all remain grounded in life-science contexts and applications. They are divided into 16 subtopics that cover most mainstream areas of life-science and bioinformatics research.

>[info] Below is an index of the subtopics and their problems. Note that some problems belong to **more than one subtopic**. Click a subtopic link below to jump to the corresponding part of the index:

- [Alignment](#alignment); [Combinatorics](#combinatorics); [Computational Mass Spectrometry](#comp-mass-spec); [Divide and Conquer](#divide-conquer);
- [Dynamic Programming](#dyn-prog); [Genome Assembly](#genome-asse); [Genome Rearrangements](#genome-rear); [Graph Algorithms](#graph-algo);
- [Graphs](#graphs); [Heredity](#heredity); [Phylogeny](#phylogeny); [Population Dynamics](#popu-dyna);
- [Probability](#probability); [Set Theory](#set-theory); [Sorting](#sorting); [String Algorithms](#string-algo)

Click any subtopic heading to return to the top of this page; to visit other topics, go back to the [welcome page](../README.md).

&emsp;

<a id = "alignment"></a>

## [Alignment](#header)

Lining up one sequence against another (with gaps allowed) to represent the insertions, deletions, and substitutions between them.

- [Counting Point Mutations](HAMM.md)
- Pairwise Global Alignment
- Suboptimal Local Alignment
- Transitions and Transversions
- Global Multiple Alignment
- Creating a Distance Matrix
- Edit Distance
- Edit Distance Alignment
- Counting Optimal Alignments
- Global Alignment with Scoring Matrix
- Global Alignment with Constant Gap Penalty
- Local Alignment with Scoring Matrix
- Maximizing the Gap Symbols of an Optimal Alignment
- Multiple Alignment
- Global Alignment with Scoring Matrix and Affine Gap Penalty
- Overlap Alignment
- Semiglobal Alignment
- Local Alignment with Affine Gap Penalty
- Isolating Symbols in Alignments

&emsp;

<a id = "combinatorics"></a>

## [Combinatorics](#header)

The mathematics of counting objects.

- [Rabbits and Recurrence Relations](FIB.md)
- [Mortal Fibonacci Rabbits](FIBD.md)
- Inferring mRNA from Protein
- Open Reading Frames
- Enumerating Gene Orders
- Perfect Matchings and RNA Secondary Structures
- Partial Permutations
- Enumerating Oriented Gene Orderings
- Catalan Numbers and RNA Secondary Structures
- Counting Phylogenetic Ancestors
- Maximum Matchings and RNA Secondary Structures
- Reversal Distance
- Counting Subsets
- Introduction to Alternative Splicing
- Motzkin Numbers and RNA Secondary Structures
- Sorting by Reversals
- Wobble Bonding and RNA Secondary Structures
- Counting Optimal Alignments
- Counting Unrooted Binary Trees
- Counting Quartets
- Enumerating Unrooted Binary Trees
- Counting Rooted Binary Trees

&emsp;

<a id = "comp-mass-spec"></a>

## [Computational Mass Spectrometry](#header)

Mass spectrometry: a technique for identifying molecules by breaking them into fragments and analyzing the chemical properties of those fragments.

- Calculating Protein Mass
- Inferring Protein from Spectrum
- Comparing Spectra with the Spectral Convolution
- Inferring Peptide from Full Spectrum
- Matching a Spectrum to a Protein
- Using the Spectrum Graph to Infer Peptides

&emsp;

<a id = "divide-conquer"></a>

## [Divide and Conquer](#header)

- Binary Search
- Merge Sort

&emsp;

<a id = "dyn-prog"></a>

## [Dynamic Programming](#header)

Dynamic programming algorithms: building up a solution by solving progressively larger cases of a problem.

- [Rabbits and Recurrence Relations](FIB.md)
- [Mortal Fibonacci Rabbits](FIBD.md)
- Longest Increasing Subsequence
- Perfect Matchings and RNA Secondary Structures
- Catalan Numbers and RNA Secondary Structures
- Finding a Shared Spliced Motif
- Maximum Matchings and RNA Secondary Structures
- Edit Distance
- Motzkin Numbers and RNA Secondary Structures
- Interleaving Two Motifs
- Edit Distance Alignment
- Finding Disjoint Motifs in a Gene
- Wobble Bonding and RNA Secondary Structures
- Global Alignment with Scoring Matrix
- Global Alignment with Constant Gap Penalty
- Local Alignment with Scoring Matrix
- Maximizing the Gap Symbols of an Optimal Alignment
- Multiple Alignment
- Global Alignment with Scoring Matrix and Affine Gap Penalty
- Overlap Alignment
- Semiglobal Alignment
- Local Alignment with Affine Gap Penalty
- Isolating Symbols in Alignments

&emsp;

<a id = "genome-asse"></a>

## [Genome Assembly](#header)

Algorithms for reconstructing large contiguous chromosome segments from short DNA fragments.

- Genome Assembly as Shortest Superstring
- Error Correction in Reads
- Constructing a De Bruijn Graph
- Genome Assembly with Perfect Coverage
- Genome Assembly Using Reads
- Assessing Assembly Quality with N50 and N75
- Genome Assembly with Perfect Coverage and Repeats

&emsp;

<a id = "genome-rear"></a>

## [Genome Rearrangements](#header)

Large-scale mutations that affect entire intervals of nucleic acid.

- Enumerating Gene Orders
- Longest Increasing Subsequence
- Partial Permutations
- Enumerating Oriented Gene Orderings
- Reversal Distance
- Sorting by Reversals

&emsp;

<a id = "graph-algo"></a>

## [Graph Algorithms](#header)

Algorithms for interpreting and processing networks or graphs.

- Overlap Graphs
- Completing a Tree
- Introduction to Pattern Matching
- Finding the Longest Multiple Repeat
- Wobble Bonding and RNA Secondary Structures
- Genome Assembly with Perfect Coverage
- Using the Spectrum Graph to Infer Peptides
- Encoding Suffix Trees
- Genome Assembly Using Reads
- Identifying Maximal Repeats
- Genome Assembly with Perfect Coverage and Repeats

&emsp;

<a id = "graphs"></a>

## [Graphs](#header)

Graphs: networks consisting of a set of nodes, with pairs of nodes connected by edges.

- Degree Array
- Double-Degree Array
- Breadth-First Search
- Connected Components
- Testing Bipartiteness
- Testing Acyclicity
- Dijkstra's Algorithm
- Square in a Graph
- Bellman-Ford Algorithm
- Shortest Cycle Through a Given Edge
- Topological Sorting
- Hamiltonian Path in DAG
- Negative Weight Cycle
- Strongly Connected Components
- 2-Satisfiability
- General Sink
- Semi-Connected Graph
- Shortest Paths in DAG

&emsp;

<a id = "heredity"></a>

## [Heredity](#header)

The scientific study of the inheritance of traits.

- [Mendel's First Law](IPRB.md)
- Calculating Expected Offspring
- Independent Alleles
- Independent Segregation of Chromosomes
- Inferring Genotype from a Pedigree
- Sex-Linked Inheritance

&emsp;

<a id = "phylogeny"></a>

## [Phylogeny](#header)

Phylogenetic trees: models of an evolutionary scenario in which a collection of species arose from their presumed ancestors.

- Completing a Tree
- Counting Phylogenetic Ancestors
- Creating a Distance Matrix
- Distances in Trees
- Creating a Character Table
- Newick Format with Edge Weights
- Creating a Character Table from Genetic Strings
- Counting Unrooted Binary Trees
- Quartets
- Character-Based Phylogeny
- Counting Quartets
- Enumerating Unrooted Binary Trees
- Inferring Genotype from a Pedigree
- Counting Rooted Binary Trees
- Phylogeny Comparison with Split Distance
- Alignment-Based Phylogeny
- Fixing an Inconsistent Character Set
- Quartet Distance
- Identifying Reversing Substitutions

&emsp;

<a id = "popu-dyna"></a>

## [Population Dynamics](#header)

- Counting Disease Carriers
- The Wright-Fisher Model of Genetic Drift
- The Founder Effect and Genetic Drift

&emsp;

<a id = "probability"></a>

## [Probability](#header)

Probability theory: the mathematical study of the likelihood of random events, or of how likely a particular event is to occur.

- [Mendel's First Law](IPRB.md)
- Calculating Expected Offspring
- Independent Alleles
- Introduction to Random Strings
- Matching Random Motifs
- Expected Number of Restriction Sites
- Independent Segregation of Chromosomes
- Counting Disease Carriers
- Inferring Genotype from a Pedigree
- Sex-Linked Inheritance
- The Wright-Fisher Model of Genetic Drift
- Wright-Fisher's Expected Behavior
- The Founder Effect and Genetic Drift

&emsp;

<a id = "set-theory"></a>

## [Set Theory](#header)

Set theory: the mathematical study of sets and their properties.

- Counting Subsets
- Introduction to Set Operations
- Creating a Restriction Map

&emsp;

<a id = "sorting"></a>

## [Sorting](#header)

Sorting problems: finding the fewest operations needed to transform an unordered structure into an ordered one.

- Insertion Sort
- Majority Element
- Merge Two Sorted Arrays
- 2SUM
- Building a Heap
- Merge Sort
- 2-Way Partition
- 3SUM
- Heap Sort
- Counting Inversions
- 3-Way Partition
- Median
- Partial Sort
- Quick Sort

&emsp;

<a id = "string-algo"></a>

## [String Algorithms](#header)

Algorithms concerned with the manipulation and properties of strings.

- [Counting DNA Nucleotides](DNA.md)
- [Transcribing DNA into RNA](RNA.md)
- [Complementing a Strand of DNA](REVC.md)
- [Computing GC Content](GC.md)
- [Translating RNA into Protein](PROT.md)
- [Finding a Motif in DNA](SUBS.md)
- [Consensus and Profile](CONS.md)
- Finding a Shared Motif
- Locating Restriction Sites
- RNA Splicing
- Enumerating k-mers Lexicographically
- Perfect Matchings and RNA Secondary Structures
- Finding a Spliced Motif
- Catalan Numbers and RNA Secondary Structures
- k-Mer Composition
- Speeding Up Motif Finding
- Finding a Shared Spliced Motif
- Ordering Strings of Varying Length Lexicographically
- Maximum Matchings and RNA Secondary Structures
- Motzkin Numbers and RNA Secondary Structures
- Interleaving Two Motifs
- Introduction to Pattern Matching
- Finding Disjoint Motifs in a Gene
- Encoding Suffix Trees
- Linguistic Complexity of a Genome
- Identifying Maximal Repeats
- Finding All Similar Motifs