Text Processing in Python describes techniques for manipulating text with the Python programming language. At the broadest level, text processing means taking textual information and doing something with it: restructuring or reformatting it, extracting smaller pieces of information from it, or performing calculations that depend on it. Text processing is arguably what most programmers spend most of their time doing, and as the amount of data everywhere continues to grow, it becomes more and more of a challenge. Because Python is clear, expressive, and object-oriented, it is a perfect language for text processing, even better than Perl. This book is not a tutorial on Python. It has two other goals: helping the programmer get the job done pragmatically and efficiently, and giving the reader an understanding, both theoretical and conceptual, of why the approaches that work do work and why those that fail do not. Mertz provides practical pointers and tips that emphasize efficient, flexible, and maintainable approaches to the text processing tasks that working programmers face daily.
From the Back Cover:
Text Processing in Python is an example-driven, hands-on tutorial that carefully teaches programmers how to accomplish numerous text processing tasks using the Python language. Filled with concrete examples, this book provides efficient and effective solutions to specific text processing problems and practical strategies for dealing with all types of text processing challenges.
Text Processing in Python begins with an introduction to text processing and contains a quick Python tutorial to get you up to speed. It then delves into essential text processing subject areas, including string operations, regular expressions, parsers and state machines, and Internet tools and techniques. Appendixes cover such important topics as data compression and Unicode. A comprehensive index and plentiful cross-referencing offer easy access to available information. In addition, exercises throughout the book provide readers with further opportunity to hone their skills either on their own or in the classroom. A companion Web site (http://gnosis.cx/TPiP) contains source code and examples from the book.
Here is some of what you will find in this book:
* When do I use formal parsers to process structured and semi-structured data? Page 257
* How do I work with full text indexing? Page 199
* What patterns in text can be expressed using regular expressions? Page 204
* How do I find a URL or an email address in text? Page 228
* How do I process a report with a concrete state machine? Page 274
* How do I parse, create, and manipulate internet formats? Page 345
* How do I handle lossless and lossy compression? Page 454
* How do I find codepoints in Unicode? Page 465
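
To give a flavor of one question from the list above (finding a URL or an email address in text), here is a minimal sketch using Python's standard `re` module. It is not taken from the book: the patterns, the helper names `find_urls` and `find_emails`, and the sample email address are assumptions chosen for illustration, and the patterns are deliberately simple rather than complete.

```python
import re

# Illustrative patterns only (assumptions, not the book's own expressions).
# A URL here is "http://" or "https://" followed by non-space characters.
URL_RE = re.compile(r"https?://[^\s<>]+")
# An email here is a simple local part, "@", and a dotted domain name.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")


def find_urls(text):
    """Return URL-like substrings, minus trailing sentence punctuation."""
    return [match.rstrip(".,;:!?") for match in URL_RE.findall(text)]


def find_emails(text):
    """Return email-like substrings found in the text."""
    return EMAIL_RE.findall(text)


# The email address below is a made-up placeholder.
sample = "Source code is at http://gnosis.cx/TPiP. Write to someone@example.com, please."
print(find_urls(sample))
print(find_emails(sample))
```

Patterns this loose will miss some valid addresses and accept some invalid ones; they are meant for pulling candidates out of running prose, not for validation.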