Text Processing in Python describes techniques for manipulating text using the Python programming language. At the broadest level, text processing is simply taking textual information and doing something with it. This might be restructuring or reformatting it, extracting smaller bits of information from it, or performing calculations that depend on the text. Text processing is arguably what most programmers spend most of their time doing. Because Python is clear, expressive, and object-oriented, it is a perfect language for text processing, even better than Perl. As the amount of data everywhere continues to increase, text processing becomes more and more of a challenge for programmers. This book is not a tutorial on Python. It has two other goals: helping the programmer get the job done pragmatically and efficiently, and giving the reader an understanding, both theoretical and conceptual, of why what works works and why what doesn't work doesn't. Mertz provides practical pointers and tips that emphasize efficient, flexible, and maintainable approaches to the text processing tasks that working programmers face daily.
From the Back Cover:
Text Processing in Python is an example-driven, hands-on tutorial that carefully teaches programmers how to accomplish numerous text processing tasks using the Python language. Filled with concrete examples, this book provides efficient and effective solutions to specific text processing problems and practical strategies for dealing with all types of text processing challenges.
Text Processing in Python begins with an introduction to text processing and contains a quick Python tutorial to get you up to speed. It then delves into essential text processing subject areas, including string operations, regular expressions, parsers and state machines, and Internet tools and techniques. Appendixes cover such important topics as data compression and Unicode. A comprehensive index and plentiful cross-referencing offer easy access to available information. In addition, exercises throughout the book provide readers with further opportunity to hone their skills either on their own or in the classroom. A companion Web site (http://gnosis.cx/TPiP) contains source code and examples from the book.
Here is some of what you will find in this book:
* When do I use formal parsers to process structured and semi-structured data? Page 257
* How do I work with full text indexing? Page 199
* What patterns in text can be expressed using regular expressions? Page 204
* How do I find a URL or an email address in text? Page 228
* How do I process a report with a concrete state machine? Page 274
* How do I parse, create, and manipulate internet formats? Page 345
* How do I handle lossless and lossy compression? Page 454
* How do I find codepoints in Unicode? Page 465
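To give a flavor of one question from the list above, finding a URL or an email address in text, here is a minimal sketch using Python's standard `re` module. The patterns and the helper names `find_urls` and `find_emails` are illustrative assumptions, not the book's own code; they are deliberately simple and will miss or over-match some real-world addresses.

```python
import re

# Deliberately simple patterns (assumptions for illustration only):
# a URL is "http://" or "https://" followed by non-space characters;
# an email is a local part, "@", and a dotted domain name.
URL_PATTERN = re.compile(r"https?://[^\s<>\"')]+")
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def find_urls(text):
    """Return URL-like substrings, trimming trailing sentence punctuation."""
    return [match.rstrip(".,;:!?") for match in URL_PATTERN.findall(text)]


def find_emails(text):
    """Return email-like substrings found in the text."""
    return EMAIL_PATTERN.findall(text)


sample = (
    "Source code is at http://gnosis.cx/TPiP. "
    "Write to reader@example.com for details."
)

print(find_urls(sample))    # ['http://gnosis.cx/TPiP']
print(find_emails(sample))  # ['reader@example.com']
```

The trailing-punctuation trim reflects a common practical wrinkle: a URL at the end of a sentence is usually followed by a period that is not part of the address.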