2.2 Big data

Big data are created and collected by companies and governments for purposes other than research. Using these data for research, therefore, requires repurposing.

The first way that many people encounter social research in the digital age is through what is often called *big data*. Despite the widespread use of this term, there is no consensus about what big data even is. However, one of the most common definitions of big data focuses on the "3 Vs": Volume, Variety, and Velocity. Roughly, there is a lot of data, in a variety of formats, and it is being created constantly. Some fans of big data add other "Vs" such as Veracity and Value, whereas some critics add Vs such as Vague and Vacuous. Rather than the 3 "Vs" (or the 5 "Vs" or the 7 "Vs"), for the purposes of social research, I think a better place to start is the 5 "Ws": Who, What, Where, When, and Why. In fact, I think that many of the challenges and opportunities created by big data sources follow from just one "W": Why.

In the analog age, most of the data used for social research were created for the purpose of doing research. In the digital age, however, a huge amount of data is being created by companies and governments for purposes other than research, such as providing services, generating profit, and administering laws. Creative people have realized, however, that you can repurpose this corporate and government data for research. Thinking back to the art analogy in chapter 1, just as Duchamp repurposed a found object to create art, scientists can now repurpose found data to create research.

While there are huge opportunities for repurposing, using data that were not created for the purposes of research also presents new challenges. Compare, for example, a social media service, such as Twitter, with a traditional public opinion survey, such as the General Social Survey. Twitter's main goals are to provide a service to its users and to make a profit. The General Social Survey, on the other hand, is focused on creating general-purpose data for social research, particularly for public opinion research. This difference in goals means that the data created by Twitter and the data created by the General Social Survey have different properties, even though both can be used for studying public opinion. Twitter operates at a scale and speed that the General Social Survey cannot match, but, unlike the General Social Survey, Twitter does not carefully sample its users and does not work hard to maintain comparability over time. Because these two data sources are so different, it does not make sense to say that the General Social Survey is better than Twitter, or vice versa. If you want hourly measures of global mood (e.g., Golder and Macy (2011)), Twitter is the better choice. On the other hand, if you want to understand long-term changes in the polarization of attitudes in the United States (e.g., DiMaggio, Evans, and Bryson (1996)), then the General Social Survey is the better choice. More generally, rather than trying to argue that big data sources are better or worse than other types of data, this chapter will try to clarify for which kinds of research questions big data sources have attractive properties and for which kinds of questions they might not be ideal.

When thinking about big data sources, many researchers immediately focus on online data created and collected by companies, such as search engine logs and social media posts. However, this narrow focus leaves out two other important sources of big data. First, corporate big data sources increasingly come from digital devices in the physical world. For example, in this chapter, I'll tell you about a study that repurposed supermarket checkout data to study how a worker's productivity is affected by the productivity of her peers (Mas and Moretti 2009). Then, in later chapters, I'll tell you about researchers who used call records from mobile phones (Blumenstock, Cadamuro, and On 2015) and billing data created by electric utilities (Allcott 2015). As these examples illustrate, corporate big data sources are about more than just online behavior.

The second important source of big data missed by a narrow focus on online behavior is data created by governments. These government data, which researchers call *government administrative records*, include things such as tax records, school records, and vital statistics records (e.g., registries of births and deaths). Governments have been creating these kinds of data for, in some cases, hundreds of years, and social scientists have been exploiting them for nearly as long as there have been social scientists. What has changed, however, is digitization, which has made it dramatically easier for governments to collect, transmit, store, and analyze data. For example, in this chapter, I'll tell you about a study that repurposed data from the New York City government's digital taxi meters in order to address a fundamental debate in labor economics (Farber 2015). Then, in later chapters, I'll tell you about how government-collected voting records have been used in a survey (Ansolabehere and Hersh 2012) and an experiment (Bond et al. 2012).

I think the idea of repurposing is fundamental to learning from big data sources, and so, before talking more specifically about the properties of big data sources (section 2.3) and how these can be used in research (section 2.4), I'd like to offer two pieces of general advice about repurposing. First, it can be tempting to think of the contrast that I've set up as being between "found" data and "designed" data. That's close, but it's not quite right. Even though, from the perspective of researchers, big data sources are "found," they don't just fall from the sky. Instead, data sources that are "found" by researchers are designed by someone for some purpose. Because "found" data are designed by someone, I always recommend that you try to understand as much as possible about the people and processes that created your data. Second, when you are repurposing data, it is often extremely helpful to imagine the ideal dataset for your problem and then compare that ideal dataset with the one that you are using. If you didn't collect your data yourself, there are likely to be important differences between what you want and what you have. Noticing these differences will help clarify what you can and cannot learn from the data you have, and it might suggest new data that you should collect.

In my experience, social scientists and data scientists tend to approach repurposing very differently. Social scientists, who are accustomed to working with data designed for research, are typically quick to point out the problems with repurposed data while ignoring its strengths. Data scientists, on the other hand, are typically quick to point out the benefits of repurposed data while ignoring its weaknesses. Naturally, the best approach is a hybrid. That is, researchers need to understand the characteristics of big data sources (both good and bad) and then figure out how to learn from them. And that is the plan for the remainder of this chapter. In the next section, I will describe ten common characteristics of big data sources. Then, in the following section, I will describe three research approaches that can work well with such data.