Text Processing in Python describes techniques for manipulating text using the Python programming language. At the broadest level, text processing is simply taking textual information and doing something with it. This might be restructuring or reformatting it, extracting smaller bits of information from it, or performing calculations that depend on the text. Text processing is arguably what most programmers spend most of their time doing. Because Python is clear, expressive, and object-oriented, it is a perfect language for doing text processing, even better than Perl. As the amount of data everywhere continues to increase, text processing becomes more and more of a challenge for programmers. This book is not a tutorial on Python. It has two other goals: helping the programmer get the job done pragmatically and efficiently, and giving the reader an understanding, both theoretical and conceptual, of why what works works and why what doesn't work doesn't. Mertz provides practical pointers and tips that emphasize efficient, flexible, and maintainable approaches to the text processing tasks that working programmers face daily.
From the Back Cover:
Text Processing in Python is an example-driven, hands-on tutorial that carefully teaches programmers how to accomplish numerous text processing tasks using the Python language. Filled with concrete examples, this book provides efficient and effective solutions to specific text processing problems and practical strategies for dealing with all types of text processing challenges.
Text Processing in Python begins with an introduction to text processing and contains a quick Python tutorial to get you up to speed. It then delves into essential text processing subject areas, including string operations, regular expressions, parsers and state machines, and Internet tools and techniques. Appendixes cover such important topics as data compression and Unicode. A comprehensive index and plentiful cross-referencing offer easy access to available information. In addition, exercises throughout the book provide readers with further opportunity to hone their skills either on their own or in the classroom. A companion Web site (http://gnosis.cx/TPiP) contains source code and examples from the book.
Here is some of what you will find in this book:
* When do I use formal parsers to process structured and semi-structured data? Page 257
* How do I work with full text indexing? Page 199
* What patterns in text can be expressed using regular expressions? Page 204
* How do I find a URL or an email address in text? Page 228
* How do I process a report with a concrete state machine? Page 274
* How do I parse, create, and manipulate internet formats? Page 345
* How do I handle lossless and lossy compression? Page 454
* How do I find codepoints in Unicode? Page 465