2.3.9 Dirty

Big data sources can be loaded with junk and spam.

Some researchers believe that big data sources, especially online sources, are pristine because they are collected automatically. In fact, people who have worked with big data sources know that they are frequently dirty. That is, they often include data that do not reflect real actions of interest to researchers. Most social scientists are already familiar with the process of cleaning large-scale social survey data, but cleaning big data sources seems to be harder. I think the ultimate source of this difficulty is that many of these big data sources were never intended to be used for research, so they are not collected, stored, and documented in a way that facilitates data cleaning.

The dangers of dirty digital trace data are illustrated by Back and colleagues' (2010) study of the emotional response to the attacks of September 11, 2001, which I briefly mentioned earlier. Researchers typically study the response to tragic events using retrospective data collected over months or even years. Back and colleagues, however, found an always-on source of digital traces (the timestamped, automatically recorded messages from 85,000 American pagers), and this enabled them to study emotional response on a much finer timescale. They created a minute-by-minute emotional timeline of September 11 by coding the emotional content of the pager messages as the percentage of words related to (1) sadness (e.g., "crying" and "grief"), (2) anxiety (e.g., "worried" and "fearful"), and (3) anger (e.g., "hate" and "critical"). They found that sadness and anxiety fluctuated throughout the day without a strong pattern, but that there was a striking increase in anger throughout the day. This research seems to be a wonderful illustration of the power of always-on data sources: if traditional data sources had been used, it would have been impossible to obtain such a high-resolution timeline of the immediate response to an unexpected event.
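To make the coding step concrete, here is a minimal sketch of dictionary-based emotion coding. It is not the authors' actual procedure: the two-word lists are just the examples quoted above, and the `(minute, text)` input format, the function name `emotion_share_by_minute`, and the sample messages are all assumptions made for illustration.

```python
import re
from collections import defaultdict

# Illustrative word lists only: the examples quoted in the text,
# not the full dictionary used in the study.
EMOTION_WORDS = {
    "sadness": {"crying", "grief"},
    "anxiety": {"worried", "fearful"},
    "anger": {"hate", "critical"},
}

def emotion_share_by_minute(messages, category):
    """Return {minute: percent of words in `category`} for (minute, text) pairs."""
    vocab = EMOTION_WORDS[category]
    hits = defaultdict(int)
    totals = defaultdict(int)
    for minute, text in messages:
        words = re.findall(r"[a-z']+", text.lower())
        totals[minute] += len(words)
        hits[minute] += sum(w in vocab for w in words)
    return {m: 100.0 * hits[m] / totals[m] for m in totals if totals[m] > 0}

# Made-up messages, keyed by a hypothetical minute label.
messages = [
    ("08:46", "I hate this"),
    ("08:46", "call me back"),
    ("08:47", "so worried about you"),
]
anger = emotion_share_by_minute(messages, "anger")
```

Note that a coder like this only matches words. It has no way of knowing whether "critical" expresses anger or describes a server alert.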

Just one year later, however, Cynthia Pury (2011) looked at the data more carefully. She discovered that a large number of the supposedly angry messages were generated by a single pager and that they were all identical. Here is what those supposedly angry messages said:

"Reboot NT machine [name] in cabinet [name] at [location]:CRITICAL:[date and time]"

These messages were labeled angry because they included the word "CRITICAL," which may generally indicate anger but in this case does not. Removing the messages generated by this single automated pager completely eliminates the apparent increase in anger over the course of the day (figure 2.4). In other words, the main result in Back, Küfner, and Egloff (2010) was an artifact of one pager. As this example illustrates, relatively simple analysis of relatively complex and messy data has the potential to go seriously wrong.
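One check that might have surfaced this problem is to ask whether a single source accounts for a large share of the flagged messages. The sketch below is an assumption-laden illustration, not Pury's method: the `(pager_id, text)` record format, the 50% threshold, the digit-collapsing `template` helper, and the sample records are all hypothetical.

```python
import re
from collections import Counter

ANGER_WORDS = {"hate", "critical"}  # illustrative subset, not the study's dictionary

def is_flagged_angry(text):
    """True if any word in the message is in the anger list."""
    return any(w in ANGER_WORDS for w in re.findall(r"[a-z']+", text.lower()))

def template(text):
    """Collapse digits so messages differing only in timestamps compare equal."""
    return re.sub(r"\d+", "#", text.lower())

def repeated_sources(records, min_share=0.5):
    """Map (pager_id, template) -> share of angry-flagged messages, if >= min_share."""
    flagged = [(pid, template(text)) for pid, text in records if is_flagged_angry(text)]
    if not flagged:
        return {}
    counts = Counter(flagged)
    return {key: n / len(flagged) for key, n in counts.items()
            if n / len(flagged) >= min_share}

# Made-up records: (pager_id, message text)
records = [
    ("p1", "Reboot NT machine A in cabinet B at C:CRITICAL:09/11 10:01"),
    ("p1", "Reboot NT machine A in cabinet B at C:CRITICAL:09/11 10:02"),
    ("p1", "Reboot NT machine A in cabinet B at C:CRITICAL:09/11 10:03"),
    ("p2", "I hate waiting, call me"),
    ("p3", "meeting moved to 3pm"),
]
suspects = repeated_sources(records)
cleaned = [(pid, text) for pid, text in records
           if (pid, template(text)) not in suspects]
```

A flagged source is a candidate for manual inspection, not automatic deletion. Whether to drop it still depends on reading the messages and understanding how they were generated.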

Figure 2.4: Estimated trends in anger over the course of September 11, 2001, based on 85,000 American pagers (Back, Küfner, and Egloff 2010, 2011; Pury 2011). Originally, Back, Küfner, and Egloff (2010) reported a pattern of increasing anger throughout the day. However, most of these apparently angry messages were generated by a single pager that repeatedly sent out the following message: "Reboot NT machine [name] in cabinet [name] at [location]:CRITICAL:[date and time]". With this message removed, the apparent increase in anger disappears (Pury 2011; Back, Küfner, and Egloff 2011). Adapted from Pury (2011), figure 1b.

While dirty data that are created unintentionally, such as those from one noisy pager, can be detected by a reasonably careful researcher, there are also some online systems that attract intentional spammers. These spammers actively generate fake data and, often motivated by profit, work very hard to keep their spamming concealed. For example, political activity on Twitter seems to include at least some reasonably sophisticated spam, whereby some political causes are intentionally made to look more popular than they actually are (Ratkiewicz et al. 2011). Unfortunately, removing this intentional spam can be quite difficult.

Of course, what is considered dirty data can depend, in part, on the research question. For example, many edits to Wikipedia are created by automated bots (Geiger 2014). If you are interested in the ecology of Wikipedia, then these bot-created edits are important. But if you are interested in how humans contribute to Wikipedia, then the bot-created edits should be excluded.
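The point that "dirty" depends on the question can be sketched as a filter whose behavior is chosen by the analysis goal. Everything here is hypothetical: the `is_bot` field, the question labels, and the sample edits are invented for illustration, and identifying bot edits in real Wikipedia data is itself a nontrivial step that this sketch takes as given.

```python
def select_edits(edits, question):
    """Keep or drop bot edits depending on the research question."""
    if question == "ecology":
        # Bots are part of the system being studied, so keep everything.
        return list(edits)
    if question == "human_contribution":
        # Bots are noise for this question, so exclude them.
        return [e for e in edits if not e["is_bot"]]
    raise ValueError(f"unknown research question: {question!r}")

# Made-up edit records with a hypothetical `is_bot` flag.
edits = [
    {"user": "ExampleBot", "is_bot": True},
    {"user": "alice", "is_bot": False},
    {"user": "bob", "is_bot": False},
]
ecology_sample = select_edits(edits, "ecology")
human_sample = select_edits(edits, "human_contribution")
```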

There is no single statistical technique or approach that can ensure that you have sufficiently cleaned your dirty data. In the end, I think the best way to avoid being fooled by dirty data is to understand as much as possible about how your data were created.