Text Processing in Python describes techniques for manipulating text with the Python programming language. At the broadest level, text processing means taking textual information and doing something with it: restructuring or reformatting it, extracting smaller pieces of information from it, or performing calculations that depend on it. Text processing is arguably what most programmers spend most of their time doing, and as the amount of data everywhere continues to increase, it becomes more and more of a challenge. Because Python is clear, expressive, and object-oriented, it is a perfect language for text processing, even better than Perl. This book is not a tutorial on Python. It has two other goals: helping the programmer get the job done pragmatically and efficiently, and giving the reader an understanding, both theoretical and conceptual, of why what works works and what doesn't work doesn't. Mertz provides practical pointers and tips that emphasize efficient, flexible, and maintainable approaches to the text processing tasks that working programmers face daily.
From the Back Cover:
Text Processing in Python is an example-driven, hands-on tutorial that carefully teaches programmers how to accomplish numerous text processing tasks using the Python language. Filled with concrete examples, this book provides efficient and effective solutions to specific text processing problems and practical strategies for dealing with all types of text processing challenges.
Text Processing in Python begins with an introduction to text processing and contains a quick Python tutorial to get you up to speed. It then delves into essential text processing subject areas, including string operations, regular expressions, parsers and state machines, and Internet tools and techniques. Appendixes cover such important topics as data compression and Unicode. A comprehensive index and plentiful cross-referencing offer easy access to available information. In addition, exercises throughout the book provide readers with further opportunity to hone their skills either on their own or in the classroom. A companion Web site (http://gnosis.cx/TPiP) contains source code and examples from the book.
Here is some of what you will find in this book:
* When do I use formal parsers to process structured and semi-structured data? Page 257
* How do I work with full text indexing? Page 199
* What patterns in text can be expressed using regular expressions? Page 204
* How do I find a URL or an email address in text? Page 228
* How do I process a report with a concrete state machine? Page 274
* How do I parse, create, and manipulate internet formats? Page 345
* How do I handle lossless and lossy compression? Page 454
* How do I find codepoints in Unicode? Page 465
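To give a flavor of the kind of question listed above, here is a minimal sketch of finding URLs and email addresses in text with Python's standard `re` module. It is not taken from the book: the patterns, the function names `find_urls` and `find_emails`, and the sample address are assumptions made for illustration, and the patterns are deliberately simple rather than complete.

```python
import re

# Deliberately simple, illustrative patterns (assumptions, not the book's own).
# They will miss some valid URLs and addresses and accept some invalid ones.
URL_RE = re.compile(r"https?://[^\s<>\"')]+")
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")


def find_urls(text):
    """Return URL-like substrings, trimming trailing sentence punctuation."""
    return [match.rstrip(".,;:!?") for match in URL_RE.findall(text)]


def find_emails(text):
    """Return substrings that look like email addresses."""
    return EMAIL_RE.findall(text)


sample = (
    "Source code is at http://gnosis.cx/TPiP. "
    "Write to reader@example.com for details."
)
print(find_urls(sample))
print(find_emails(sample))
```

Stripping trailing punctuation after the match, instead of encoding it in the pattern, keeps the regular expression short; real-world text usually needs stricter rules than this.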