Text Processing in Python describes techniques for manipulating text using the Python programming language. At the broadest level, text processing is simply taking textual information and doing something with it: restructuring or reformatting it, extracting smaller pieces of information from it, or performing calculations that depend on it. Text processing is arguably what most programmers spend most of their time doing, and as the amount of data everywhere continues to grow, it becomes more and more of a challenge. Because Python is clear, expressive, and object-oriented, it is a perfect language for text processing, even better than Perl. This book is not a tutorial on Python. It has two other goals: helping the programmer get the job done pragmatically and efficiently, and giving the reader a theoretical and conceptual understanding of why what works works and why what doesn't work doesn't. Mertz provides practical pointers and tips that emphasize efficient, flexible, and maintainable approaches to the text processing tasks that working programmers face daily.
From the Back Cover:
Text Processing in Python is an example-driven, hands-on tutorial that carefully teaches programmers how to accomplish numerous text processing tasks using the Python language. Filled with concrete examples, this book provides efficient and effective solutions to specific text processing problems and practical strategies for dealing with all types of text processing challenges.
Text Processing in Python begins with an introduction to text processing and contains a quick Python tutorial to get you up to speed. It then delves into essential text processing subject areas, including string operations, regular expressions, parsers and state machines, and Internet tools and techniques. Appendixes cover such important topics as data compression and Unicode. A comprehensive index and plentiful cross-referencing offer easy access to available information. In addition, exercises throughout the book provide readers with further opportunity to hone their skills either on their own or in the classroom. A companion Web site (http://gnosis.cx/TPiP) contains source code and examples from the book.
Here is some of what you will find in this book:
* When do I use formal parsers to process structured and semi-structured data? Page 257
* How do I work with full text indexing? Page 199
* What patterns in text can be expressed using regular expressions? Page 204
* How do I find a URL or an email address in text? Page 228
* How do I process a report with a concrete state machine? Page 274
* How do I parse, create, and manipulate internet formats? Page 345
* How do I handle lossless and lossy compression? Page 454
* How do I find codepoints in Unicode? Page 465
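The question about finding a URL or an email address is one of the most self-contained in the list. The following is a minimal sketch using Python's standard `re` module. It is not the book's own code: the names `URL_RE`, `EMAIL_RE`, and `find_urls_and_emails` are hypothetical, and the patterns are deliberately simplified assumptions that accept only `http`/`https` URLs and ordinary `user@host.domain` addresses.

```python
import re

# Simplified, illustrative patterns (an assumption, not the book's expressions).
# Real URL and email grammars are far more permissive than these.
URL_RE = re.compile(r"https?://[^\s<>]+")
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def find_urls_and_emails(text):
    """Return (urls, emails) found in text, in order of appearance."""
    # Drop trailing punctuation that usually belongs to the sentence, not the URL.
    urls = [u.rstrip(".,;:!?)\"'") for u in URL_RE.findall(text)]
    emails = EMAIL_RE.findall(text)
    return urls, emails


sample = ("Source code is at http://gnosis.cx/TPiP. "
          "Questions? Write to someone@example.com.")
urls, emails = find_urls_and_emails(sample)
print(urls)    # ['http://gnosis.cx/TPiP']
print(emails)  # ['someone@example.com']
```

Stripping trailing punctuation is a pragmatic trade-off: it handles a URL that ends a sentence, but it would also clip a URL that legitimately ends in one of those characters.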