2.3.1 Big

Large datasets are a means to an end; they are not an end in themselves.

The most widely discussed feature of big data sources is that they are BIG. Many papers, for example, start by discussing (and sometimes bragging about) how much data they analyzed. For example, a paper published in Science studying word-use trends in the Google Books corpus included the following (Michel et al. 2011):

"Our corpus contains over 500 billion words, in English (361 billion), French (45 billion), Spanish (45 billion), German (37 billion), Chinese (13 billion), Russian (35 billion), and Hebrew (2 billion). The oldest works were published in the 1500s. The early decades are represented by only a few books per year, comprising several hundred thousand words. By 1800, the corpus grows to 98 million words per year; by 1900, 1.8 billion; and by 2000, 11 billion. The corpus cannot be read by a human. If you tried to read only English-language entries from the year 2000 alone, at the reasonable pace of 200 words/min, without interruptions for food or sleep, it would take 80 years. The sequence of letters is 1000 times longer than the human genome: If you wrote it out in a straight line, it would reach to the Moon and back 10 times over."
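The reading-time claim in the quotation is easy to sanity-check. The sketch below only multiplies out the two figures given in the passage (200 words per minute, 80 years); the average year length of 365.25 days is my assumption, and the function name is made up for illustration.

```python
# Back-of-the-envelope check of the reading-time claim in the quotation.
WORDS_PER_MINUTE = 200                 # pace stated in the quotation
YEARS_OF_READING = 80                  # duration stated in the quotation
MINUTES_PER_YEAR = 60 * 24 * 365.25    # assumption: average year, reading nonstop


def words_read(words_per_minute: float, years: float) -> float:
    """Total words read at a constant pace with no breaks."""
    return words_per_minute * MINUTES_PER_YEAR * years


implied_words = words_read(WORDS_PER_MINUTE, YEARS_OF_READING)
print(f"Implied size of the year-2000 English entries: "
      f"{implied_words / 1e9:.1f} billion words")
```

Under these assumptions the claim implies roughly 8.4 billion English words for the year 2000, which is at least consistent with the 11 billion words per year that the passage reports for the whole corpus by 2000.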

The scale of these data is undoubtedly impressive, and we are all fortunate that the Google Books team has released them to the public (in fact, some of the activities at the end of this chapter make use of these data). But whenever you see something like this, you should ask: is all that data really doing anything? Could they have done the same research if the data could reach to the Moon and back only once? What if the data could only reach to the top of Mount Everest or the top of the Eiffel Tower?

In this case, their research does in fact have some findings that require a huge corpus of words over a long time period. For example, one thing they explore is the evolution of grammar, particularly changes in the rate of irregular verb conjugation. Since some irregular verbs are quite rare, a large amount of data is needed to detect changes over time. Too often, however, researchers seem to treat the size of a big data source as an end ("look how much data I can crunch") rather than a means to some more important scientific objective.
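To see why rare forms demand a large corpus, here is a minimal sketch. The yearly corpus sizes come from the quoted passage; the frequency of one occurrence per 100 million words is a hypothetical value I picked for illustration, and treating counts as Poisson is a simplifying assumption, not the method used by Michel et al.

```python
import math


def expected_count(corpus_words: float, frequency: float) -> float:
    """Expected occurrences of a form with the given per-word frequency."""
    return corpus_words * frequency


def relative_standard_error(count: float) -> float:
    """Approximate relative standard error of a Poisson count: 1 / sqrt(count)."""
    return 1.0 / math.sqrt(count)


# Hypothetical: a rare verb form used once per 100 million words.
RARE_FREQUENCY = 1e-8

# Yearly corpus sizes taken from the quoted passage.
yearly_words = {1800: 98e6, 1900: 1.8e9, 2000: 11e9}

for year, words in yearly_words.items():
    n = expected_count(words, RARE_FREQUENCY)
    print(f"{year}: expected occurrences = {n:.1f}, "
          f"relative SE = {relative_standard_error(n):.0%}")
```

With this made-up frequency, a year's worth of text from 1800 would be expected to contain about one occurrence, far too few to measure a rate, whereas the year 2000 would contain about 110.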

In my experience, the study of rare events is one of the three specific scientific ends that large datasets tend to enable. The second is the study of heterogeneity, as can be illustrated by a study by Raj Chetty and colleagues (2014) on social mobility in the United States. In the past, many researchers have studied social mobility by comparing the life outcomes of parents and children. A consistent finding from this literature is that advantaged parents tend to have advantaged children, but the strength of this relationship varies over time and across countries (Hout and DiPrete 2006). More recently, however, Chetty and colleagues were able to use the tax records of 40 million people to estimate the heterogeneity in intergenerational mobility across regions in the United States (figure 2.1). They found, for example, that the probability that a child reaches the top quintile of the national income distribution starting from a family in the bottom quintile is about 13% in San Jose, California, but only about 4% in Charlotte, North Carolina. If you look at figure 2.1 for a moment, you might begin to wonder why intergenerational mobility is higher in some places than in others. Chetty and colleagues had exactly the same question, and they found that high-mobility areas have less residential segregation, less income inequality, better primary schools, greater social capital, and greater family stability. Of course, these correlations alone do not show that these factors cause higher mobility, but they do suggest possible mechanisms that can be explored in further work, which is exactly what Chetty and colleagues have done in subsequent work. Notice how the size of the data was really important in this project. If Chetty and colleagues had used the tax records of 40 thousand people rather than 40 million, they would not have been able to estimate regional heterogeneity, and they never would have been able to do subsequent research to try to identify the mechanisms that create this variation.
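A rough calculation shows why 40 thousand records would not have been enough. The sketch below is not the authors' estimation procedure: the number of regions, the assumption that regions are equally sized, and the "true" probability of 8% are all hypothetical values chosen for illustration.

```python
import math


def binomial_standard_error(p: float, n: float) -> float:
    """Standard error of an estimated proportion p from n independent observations."""
    return math.sqrt(p * (1 - p) / n)


# Assumptions for illustration only (not taken from the study):
N_REGIONS = 700          # hypothetical number of equally sized regions
BOTTOM_SHARE = 0.20      # children whose parents are in the bottom quintile
TRUE_PROBABILITY = 0.08  # hypothetical chance of reaching the top quintile

for total_people in (40_000, 40_000_000):
    per_region = total_people / N_REGIONS * BOTTOM_SHARE
    se = binomial_standard_error(TRUE_PROBABILITY, per_region)
    print(f"{total_people:>10,} people: ~{per_region:,.0f} relevant children "
          f"per region, standard error ~{se:.3f}")
```

Under these assumptions, 40 thousand records leave only about a dozen relevant children per region, and the standard error (about 0.08) is as large as the quantity being estimated, so a 13% region could not be told apart from a 4% region. With 40 million records the standard error falls to under half a percentage point.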

Figure 2.1: Estimates of a child's chances of reaching the top 20% of the income distribution given parents in the bottom 20% (Chetty et al. 2014). The regional-level estimates, which show heterogeneity, naturally lead to interesting and important questions that do not arise from a single national-level estimate. These regional-level estimates were made possible in part because the researchers were using a very large data source: the tax records of 40 million people. Created from data available at http://www.equality-of-opportunity.org/.

Finally, in addition to studying rare events and studying heterogeneity, large datasets also enable researchers to detect small differences. In fact, much of the focus on big data in industry is about these small differences: reliably detecting the difference between 1% and 1.1% click-through rates on an ad can translate into millions of dollars in extra revenue. In some scientific settings, however, such small differences might not be particularly important, even if they are statistically significant (Prentice and Miller 1992). But in some policy settings, they can become important when viewed in aggregate. For example, if there are two public health interventions and one is slightly more effective than the other, then picking the more effective intervention could end up saving thousands of additional lives.
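To get a feel for how much data "reliably detecting" a difference between 1% and 1.1% takes, here is a minimal sketch using the standard normal-approximation formula for comparing two proportions. The 5% significance level and 80% power are conventional defaults that I am assuming, not values from the text, and the function name is hypothetical.

```python
import math
from statistics import NormalDist


def sample_size_per_group(p1: float, p2: float,
                          alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate observations needed in each group for a two-sided
    two-proportion z-test to detect the difference between p1 and p2."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the test
    z_power = NormalDist().inv_cdf(power)          # quantile for the desired power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_power) ** 2 * variance / (p1 - p2) ** 2)


n = sample_size_per_group(0.010, 0.011)
print(f"Impressions needed per ad: about {n:,}")
```

Under these assumptions the answer is on the order of 160,000 impressions per ad, whereas telling 1% apart from 2% would take only a few thousand. Small differences are exactly where size starts to matter.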

Although bigness is generally a good property when used correctly, I've noticed that it can sometimes lead to a conceptual error. For some reason, bigness seems to lead researchers to ignore how their data were generated. While bigness does reduce the need to worry about random error, it actually increases the need to worry about systematic errors, the kinds of errors that I'll describe below and that arise from biases in how data are created. For example, in a project I'll describe later in this chapter, researchers used messages generated on September 11, 2001 to produce a high-resolution emotional timeline of the reaction to the terrorist attack (Back, Küfner, and Egloff 2010). Because the researchers had a large number of messages, they didn't really need to worry about whether the pattern they observed (increasing anger over the course of the day) could be explained by random variation. There was so much data and the pattern was so clear that all the statistical tests suggested that this was a real pattern. But these statistical tests were ignorant of how the data were created. In fact, it turned out that many of the patterns were attributable to a single bot that generated more and more meaningless messages throughout the day. Removing this one bot completely destroyed some of the key findings in the paper (Pury 2011; Back, Küfner, and Egloff 2011). Quite simply, researchers who don't think about systematic error face the risk of using their large datasets to get a precise estimate of an unimportant quantity, such as the emotional content of meaningless messages produced by an automated bot.
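The difference between random and systematic error can be made concrete with a toy calculation. Every number below is made up: human messages have a constant anger rate (so there is no real trend), and a hypothetical bot posts a growing number of messages, each of which I assume gets flagged as angry. This is not a reconstruction of the actual September 11 data.

```python
import math

# All numbers below are hypothetical and chosen only for illustration.
HUMAN_MESSAGES_PER_HOUR = 10_000
HUMAN_ANGER_RATE = 0.05  # constant: humans show no trend by construction
BOT_MESSAGES_BY_HOUR = [0, 200, 400, 800, 1_600, 3_200]  # bot posts more and more


def anger_share(human_n: int, human_rate: float, bot_n: int) -> float:
    """Share of messages flagged as angry, assuming every bot message is flagged."""
    angry = human_n * human_rate + bot_n
    return angry / (human_n + bot_n)


with_bot = [anger_share(HUMAN_MESSAGES_PER_HOUR, HUMAN_ANGER_RATE, b)
            for b in BOT_MESSAGES_BY_HOUR]
without_bot = [anger_share(HUMAN_MESSAGES_PER_HOUR, HUMAN_ANGER_RATE, 0)
               for _ in BOT_MESSAGES_BY_HOUR]

# Random error in an hourly share is tiny at this sample size.
random_se = math.sqrt(HUMAN_ANGER_RATE * (1 - HUMAN_ANGER_RATE)
                      / HUMAN_MESSAGES_PER_HOUR)

for hour, (w, wo) in enumerate(zip(with_bot, without_bot)):
    print(f"hour {hour}: with bot {w:.3f}, bot removed {wo:.3f}")
print(f"standard error of an hourly share: ~{random_se:.4f}")
```

In this toy setup the measured anger share climbs steadily while the bot is included and stays flat once it is removed. The standard error is tiny in both cases, so more data only makes the wrong answer look more certain.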

In conclusion, big datasets are not an end in themselves, but they can enable certain kinds of research, including the study of rare events, the estimation of heterogeneity, and the detection of small differences. Big datasets also seem to lead some researchers to ignore how their data were created, which can lead them to get a precise estimate of an unimportant quantity.