Want to tap the power behind search rankings, product recommendations, social bookmarking, and online matchmaking? This fascinating book demonstrates how you can build Web 2.0 applications to mine the enormous amount of data created by people on the Internet. With the sophisticated algorithms in this book, you can write smart programs to access interesting datasets from other web sites, collect data from users of your own applications, and analyze and understand the data once you've found it.

Programming Collective Intelligence takes you into the world of machine learning and statistics, and explains how to draw conclusions about user experience, marketing, personal tastes, and human behavior in general -- all from information that you and others collect every day. Each algorithm is described clearly and concisely with code that can immediately be used on your web site, blog, wiki, or specialized application. This book explains:

* Collaborative filtering techniques that enable online retailers to recommend products or media
* Methods of clustering to detect groups of similar items in a large dataset
* Search engine features -- crawlers, indexers, query engines, and the PageRank algorithm
* Optimization algorithms that search millions of possible solutions to a problem and choose the best one
* Bayesian filtering, used in spam filters for classifying documents based on word types and other features
* Using decision trees not only to make predictions, but to model the way decisions are made
* Predicting numerical values rather than classifications to build price models
* Support vector machines to match people in online dating sites
* Non-negative matrix factorization to find the independent features in a dataset
* Evolving intelligence for problem solving -- how a computer develops its skill by improving its own code the more it plays a game

Each chapter includes exercises for extending the algorithms to make them more powerful.
Go beyond simple database-backed applications and put the wealth of Internet data to work for you.

"Bravo! I cannot think of a better way for a developer to first learn these algorithms and methods, nor can I think of a better way for me (an old AI dog) to reinvigorate my knowledge of the details." -- Dan Russell, Google

"Toby's book does a great job of breaking down the complex subject matter of machine-learning algorithms into practical, easy-to-understand examples that can be directly applied to analysis of social interaction across the Web today. If I had this book two years ago, it would have saved precious time going down some fruitless paths." -- Tim Wolters, CTO, Collective Intellect
Toby Segaran works as a Data Magnate at Metaweb Technologies. Prior to working at Metaweb, he started a biotech software company called Incellico, which was later acquired by Genstruct. His book, "Programming Collective Intelligence," has been the best-selling AI book on Amazon for several months. He is the recipient of a National Interest Waiver for "People of Exceptional Abilit...
Next, get a list of random people to make up the dataset. Fortunately, Hot or Not provides an API call that returns a list of people with specified criteria. In this example, the only criterion will be that the people have "meet me" profiles, since only from these profiles can you get other information like location and interests. Add this function to hotornot.py:
(Quoted from page 162)
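Because the quoted passage stops before the function itself, here is a minimal offline sketch of only the selection criterion it describes. It does not call Hot or Not, whose API is not reproduced here; it assumes the profiles were already fetched into dictionaries with a hypothetical boolean `meet_me` key, and the function name `random_meetme_people` and the sample records are likewise hypothetical.

```python
import random


def random_meetme_people(profiles, count, seed=None):
    """Return up to `count` randomly chosen profiles that have a "meet me" profile.

    `profiles` is assumed to be a list of dicts with a hypothetical boolean
    key "meet_me"; this is NOT the Hot or Not API response format.
    """
    # Keep only people whose "meet me" profile exposes location and interests
    eligible = [p for p in profiles if p.get("meet_me")]
    rng = random.Random(seed)
    # Never ask for more people than are eligible
    return rng.sample(eligible, min(count, len(eligible)))


# Hypothetical, hand-made records for illustration only
profiles = [
    {"id": 1, "meet_me": True, "location": "Boston", "interests": ["hiking"]},
    {"id": 2, "meet_me": False},
    {"id": 3, "meet_me": True, "location": "Austin", "interests": ["music"]},
]
sample = random_meetme_people(profiles, 5, seed=1)
print(sorted(p["id"] for p in sample))  # [1, 3]
```

Filtering before sampling keeps the dataset limited to people whose extra fields can actually be used later.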
What Does This Have to Do with the Articles Matrix?

So far, what you have is a matrix of articles with word counts. The goal is to factorize this matrix, which means finding two smaller matrices that can be multiplied together to reconstruct this one. The two smaller matrices are:

* The features matrix: This matrix has a row for each feature and a column for each word. The values indicate how important a word is to a feature. Each feature should represent a theme that emerged from a set of articles, so you might expect an article about a new TV show to have a high weight for the word "television."
* The weights matrix: This matrix maps the features to the articles matrix. Each row is an article and each column is a feature. The values state how much each feature applies to each articl...
(Quoted from page 234)
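The factorization described in this excerpt can be sketched in plain Python: the articles matrix (articles x words) is approximated by the weights matrix (articles x features) times the features matrix (features x words). This is a minimal sketch, not the book's code. It assumes the multiplicative update rules of Lee and Seung, one common way to compute non-negative matrix factorization, and uses nested lists instead of a matrix library. The names `factorize`, `matmul`, `transpose`, `cost`, and the toy word counts are hypothetical.

```python
import random


def transpose(m):
    """Swap rows and columns of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    """Multiply matrix a (m x k) by matrix b (k x n)."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def cost(v, wh):
    """Sum of squared differences between v and its reconstruction wh."""
    return sum((x - y) ** 2 for rv, rw in zip(v, wh) for x, y in zip(rv, rw))


def factorize(v, n_features, iterations=200, seed=0, eps=1e-9):
    """Approximate v (articles x words) as w (articles x features) times h (features x words).

    Uses multiplicative updates, which keep every entry non-negative
    when v and the random starting values are non-negative.
    """
    rng = random.Random(seed)
    n_articles, n_words = len(v), len(v[0])
    w = [[rng.random() for _ in range(n_features)] for _ in range(n_articles)]
    h = [[rng.random() for _ in range(n_words)] for _ in range(n_features)]
    for _ in range(iterations):
        # Features matrix: h <- h * (w^T v) / (w^T w h), element by element
        wt = transpose(w)
        hn = matmul(wt, v)
        hd = matmul(matmul(wt, w), h)
        h = [[h[i][j] * hn[i][j] / (hd[i][j] + eps) for j in range(n_words)]
             for i in range(n_features)]
        # Weights matrix: w <- w * (v h^T) / (w h h^T), element by element
        ht = transpose(h)
        wn = matmul(v, ht)
        wd = matmul(w, matmul(h, ht))
        w = [[w[i][j] * wn[i][j] / (wd[i][j] + eps) for j in range(n_features)]
             for i in range(n_articles)]
    return w, h


# Hypothetical word counts: rows are articles, columns are the words below
words = ["television", "episode", "match", "goal"]
v = [
    [5, 3, 0, 0],
    [4, 4, 0, 1],
    [0, 0, 6, 3],
    [1, 0, 4, 5],
]
w, h = factorize(v, n_features=2)
for f, row in enumerate(h):
    # Show the highest-weighted word for each feature
    print(f, words[max(range(len(words)), key=lambda j: row[j])])
```

Each update multiplies the current entry by a ratio of non-negative quantities, which is why no entry can turn negative; the small `eps` only guards against division by zero.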