Text Processing in Python describes techniques for manipulating text using the Python programming language. At the broadest level, text processing is simply taking textual information and doing something with it: restructuring or reformatting it, extracting smaller bits of information from it, or performing calculations that depend on it. Text processing is arguably what most programmers spend most of their time doing, and as the amount of data everywhere continues to increase, it becomes more and more of a challenge. Because Python is clear, expressive, and object-oriented, it is a perfect language for text processing, even better than Perl. This book is not a tutorial on Python. It has two other goals: helping the programmer get the job done pragmatically and efficiently, and giving the reader an understanding, both theoretical and conceptual, of why what works works and what doesn't work doesn't. Mertz provides practical pointers and tips that emphasize efficient, flexible, and maintainable approaches to the text processing tasks that working programmers face daily.
From the Back Cover:
Text Processing in Python is an example-driven, hands-on tutorial that carefully teaches programmers how to accomplish numerous text processing tasks using the Python language. Filled with concrete examples, this book provides efficient and effective solutions to specific text processing problems and practical strategies for dealing with all types of text processing challenges.
Text Processing in Python begins with an introduction to text processing and contains a quick Python tutorial to get you up to speed. It then delves into essential text processing subject areas, including string operations, regular expressions, parsers and state machines, and Internet tools and techniques. Appendixes cover such important topics as data compression and Unicode. A comprehensive index and plentiful cross-referencing offer easy access to available information. In addition, exercises throughout the book provide readers with further opportunity to hone their skills either on their own or in the classroom. A companion Web site (http://gnosis.cx/TPiP) contains source code and examples from the book.
Here is some of what you will find in this book:
* When do I use formal parsers to process structured and semi-structured data? Page 257
* How do I work with full text indexing? Page 199
* What patterns in text can be expressed using regular expressions? Page 204
* How do I find a URL or an email address in text? Page 228
* How do I process a report with a concrete state machine? Page 274
* How do I parse, create, and manipulate internet formats? Page 345
* How do I handle lossless and lossy compression? Page 454
* How do I find codepoints in Unicode? Page 465
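To give a flavor of one of the questions listed above (finding a URL or an email address in text), here is a minimal sketch using Python's standard `re` module. It is not taken from the book: the patterns, the helper names `find_urls` and `find_emails`, and the sample address are illustrative assumptions, and the patterns are deliberately simplified rather than standards-complete.

```python
import re

# Simplified, illustrative patterns (assumptions, not the book's own code).
# URL: "http://" or "https://" followed by characters that are not
# whitespace, angle brackets, or parentheses.
URL_RE = re.compile(r"https?://[^\s<>()]+")

# Email: a local part, "@", then a domain with at least one dot.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")


def find_urls(text):
    """Return all substrings of text that look like http(s) URLs."""
    return URL_RE.findall(text)


def find_emails(text):
    """Return all substrings of text that look like email addresses."""
    return EMAIL_RE.findall(text)


# The email address below is a made-up example.
sample = "Source code is at http://gnosis.cx/TPiP (questions to reader@example.com)."
print(find_urls(sample))
print(find_emails(sample))
```

Patterns this loose will miss unusual addresses and may pick up trailing punctuation in some inputs; real-world matching usually needs rules tuned to the data at hand.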