2.3.2.6 Dirty

Big data sources can be loaded with junk and spam.

Some researchers believe that big data sources, especially those from online sources, are pristine because they are collected automatically. In fact, people who have worked with big data sources know that they are frequently dirty. That is, they frequently include data that do not reflect real actions of interest to researchers. Many social scientists are already familiar with the process of cleaning large-scale social survey data, but cleaning big data sources is more difficult for two reasons: 1) they were not created by researchers for researchers and 2) researchers generally have less understanding of how they were created.

The dangers of dirty digital trace data are illustrated by Back and colleagues' (2010) study of the emotional response to the attacks of September 11, 2001. Researchers typically study the response to tragic events using retrospective data collected over months or even years. But Back and colleagues found an always-on source of digital traces (the timestamped, automatically recorded messages from 85,000 American pagers), and this enabled them to study emotional response on a much finer timescale. Back and colleagues created a minute-by-minute emotional timeline of September 11 by coding the emotional content of the pager messages by the percentage of words related to (1) sadness (e.g., crying, grief), (2) anxiety (e.g., worried, fearful), and (3) anger (e.g., hate, critical). They found that sadness and anxiety fluctuated throughout the day without a strong pattern, but that there was a striking increase in anger throughout the day. This research seems to be a wonderful illustration of the power of always-on data sources: using standard methods, it would be very hard to obtain such a high-resolution timeline of the immediate response to an unexpected event.
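The word-percentage coding described above can be sketched in a few lines. This is only an illustration under stated assumptions: the word lists contain nothing but the example words quoted in the text, the function name `category_percentages` is hypothetical, and the study's actual dictionary and preprocessing are not reproduced here.

```python
import re

# Illustrative word lists built only from the examples quoted in the text.
# The study's real dictionary is NOT reproduced here (assumption).
CATEGORIES = {
    "sadness": {"crying", "grief"},
    "anxiety": {"worried", "fearful"},
    "anger": {"hate", "critical"},
}


def category_percentages(message):
    """Return, per category, the percentage of words in the message that match."""
    # Lowercase the text and split it into alphabetic tokens.
    words = re.findall(r"[a-z']+", message.lower())
    if not words:
        return {name: 0.0 for name in CATEGORIES}
    return {
        name: 100.0 * sum(word in vocab for word in words) / len(words)
        for name, vocab in CATEGORIES.items()
    }


# Made-up message, used only to exercise the function.
scores = category_percentages("I hate this, everyone is worried and crying")
```

Averaging such per-message scores within each minute would give a timeline of the kind described, and it also shows why the approach is fragile: any message containing a listed word counts, whatever the word means in context.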

Just one year later, however, Cynthia Pury (2011) looked at the data more carefully. She discovered that a large number of the supposedly angry messages were generated by a single pager and that they were all identical. Here is what those supposedly angry messages said:

"Reboot NT machine [name] in cabinet [name] at [location]:CRITICAL:[date and time]"

These messages were labeled angry because they included the word "CRITICAL", which may generally indicate anger but does not in this case. Removing the messages generated by this single automated pager completely eliminates the apparent increase in anger over the course of the day (Figure 2.2). In other words, the main result in Back, Küfner, and Egloff (2010) was an artifact of one pager. As this example illustrates, relatively simple analysis of relatively complex and messy data has the potential to go seriously wrong.
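A check of the kind that exposes this problem is easy to sketch: count how often the same sender repeats the same flagged message, then see how much of the signal survives once those repeats are dropped. The records, the pager identifiers, the two-word anger list, and the repeat threshold of 3 below are all made up for illustration; they are not the study's data or Pury's procedure.

```python
import re
from collections import Counter

REBOOT = ("Reboot NT machine [name] in cabinet [name] at [location]"
          ":CRITICAL:[date and time]")

# Hypothetical (pager_id, text) records, invented for this sketch.
messages = [
    ("p1", REBOOT),
    ("p1", REBOOT),
    ("p1", REBOOT),
    ("p2", "I hate waiting for news"),
    ("p3", "call home when you can"),
]

ANGER_WORDS = {"hate", "critical"}  # example words from the text only


def is_flagged_angry(text):
    """True if any token of the message is in the anger word list."""
    return any(w in ANGER_WORDS for w in re.findall(r"[a-z]+", text.lower()))


flagged = [m for m in messages if is_flagged_angry(m[1])]

# Count identical (pager, text) pairs among the flagged messages.
repeats = Counter(flagged)

# Assumed rule: a pair seen 3 or more times is treated as automated.
suspicious = {pair for pair, n in repeats.items() if n >= 3}
cleaned = [m for m in flagged if m not in suspicious]

print(len(flagged), "flagged;", len(cleaned), "left after dropping repeats")
```

In this toy data, three of the four flagged messages come from one pager sending one string, so the "anger" signal mostly disappears once they are removed. A real analysis would need a more careful rule, since legitimate senders can also repeat themselves.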

Figure 2.2: Estimated trends in anger over the course of September 11, 2001, based on 85,000 American pagers (Back, Küfner, and Egloff 2010; Pury 2011; Back, Küfner, and Egloff 2011). Originally, Back, Küfner, and Egloff (2010) reported a pattern of increasing anger throughout the day. However, most of these apparently angry messages were generated by a single pager that repeatedly sent out the following message: "Reboot NT machine [name] in cabinet [name] at [location]:CRITICAL:[date and time]". With this message removed, the apparent increase in anger disappears (Pury 2011; Back, Küfner, and Egloff 2011). This figure is a reproduction of Fig 1B in Pury (2011).

While dirty data that is created unintentionally, such as the data from one noisy pager, can be detected by a reasonably careful researcher, there are also some online systems that attract intentional spammers. These spammers actively generate fake data and, often motivated by profit, work very hard to keep their spamming concealed. For example, political activity on Twitter seems to include at least some reasonably sophisticated spam, whereby some political causes are intentionally made to look more popular than they actually are (Ratkiewicz et al. 2011). Researchers working with data that may contain intentional spam face the challenge of convincing their audience that they have detected and removed the relevant spam.

Finally, what is considered dirty data can depend in subtle ways on your research questions. For example, many edits to Wikipedia are created by automated bots (Geiger 2014). If you are interested in the ecology of Wikipedia, then these bots are important. But if you are interested in how humans contribute to Wikipedia, then the edits made by these bots should be excluded.
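One way to keep this dependence explicit is to make the inclusion rule a parameter of the analysis rather than a one-time cleaning step. The sketch below assumes each edit record already carries a reliable `is_bot` flag; the records, user names, and the function `select_edits` are hypothetical, and identifying bots in real Wikipedia data takes more work than this.

```python
# Hypothetical edit records; the `is_bot` flag is assumed to be known.
edits = [
    {"user": "ExampleBot", "is_bot": True},
    {"user": "alice", "is_bot": False},
    {"user": "ExampleBot", "is_bot": True},
    {"user": "bob", "is_bot": False},
]


def select_edits(records, include_bots):
    """Keep every edit, or only the human ones, depending on the question."""
    return [r for r in records if include_bots or not r["is_bot"]]


# Question about the whole ecology of the site: bots are part of the data.
ecology_sample = select_edits(edits, include_bots=True)

# Question about human contributions: bot edits count as dirty data.
human_sample = select_edits(edits, include_bots=False)
```

The same records are "clean" for one question and "dirty" for the other, so recording which rule was applied is part of documenting the analysis.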

The best way to avoid being fooled by dirty data is to understand how your data were created and to perform simple exploratory analysis, such as making simple scatter plots.