6.6.2 Understanding and managing informational risk

Informational risk is the most common risk in social research; it has increased dramatically; and it is the hardest risk to understand.

The second ethical challenge for digital-age research is informational risk, the potential for harm from the disclosure of information (National Research Council 2014). Informational harms from the disclosure of personal information could be economic (e.g., losing a job), social (e.g., embarrassment), psychological (e.g., depression), or even criminal (e.g., arrest for illegal behavior). Unfortunately, the digital age increases informational risk dramatically: there is simply much more information about our behavior. And informational risk has proven very difficult to understand and manage compared with the risks that were the main concerns in analog-age social research, such as physical risk.

One way that social researchers decrease informational risk is "anonymization" of data. "Anonymization" is the process of removing obvious personal identifiers such as name, address, and telephone number from the data. However, this approach is much less effective than many people realize, and it is, in fact, deeply and fundamentally limited. For that reason, whenever I describe "anonymization," I'll use quotation marks to remind you that this process creates the appearance of anonymity but not true anonymity.
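To make the definition concrete, here is a minimal Python sketch of "anonymization" as just described: it deletes a fixed list of direct identifiers from each record and keeps everything else. The field names (`name`, `address`, `phone`, `zip`, `birth_date`, `sex`, `diagnosis`) and the record are hypothetical. Notice what survives: every field that was not on the list of obvious identifiers.

```python
# A minimal sketch of "anonymization": drop direct identifiers, keep the rest.
# Field names and the record below are hypothetical, for illustration only.
DIRECT_IDENTIFIERS = {"name", "address", "phone"}


def anonymize(records, direct_identifiers=DIRECT_IDENTIFIERS):
    """Return copies of the records with the direct-identifier fields removed."""
    return [
        {field: value for field, value in record.items() if field not in direct_identifiers}
        for record in records
    ]


record = {
    "name": "Jane Doe",
    "address": "1 Main St",
    "phone": "555-0100",
    "zip": "02138",
    "birth_date": "1970-01-01",
    "sex": "F",
    "diagnosis": "asthma",
}

released = anonymize([record])
print(released[0])
# {'zip': '02138', 'birth_date': '1970-01-01', 'sex': 'F', 'diagnosis': 'asthma'}
```

The released record no longer contains a name, but the remaining fields are exactly the kind that can be linked to other data.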

A vivid example of the failure of "anonymization" comes from the late 1990s in Massachusetts (Sweeney 2002). The Group Insurance Commission (GIC) was a government agency responsible for purchasing health insurance for all state employees. Through this work, the GIC collected detailed health records about thousands of state employees. In an effort to spur research, the GIC decided to release these records to researchers. However, they did not share all of their data; rather, they "anonymized" these data by removing information such as names and addresses. They left in other information that they thought could be useful for researchers, such as demographic information (zip code, birth date, ethnicity, and sex) and medical information (visit data, diagnosis, procedure) (figure 6.4) (Ohm 2010). Unfortunately, this "anonymization" was not sufficient to protect the data.

Figure 6.4: "Anonymization" is the process of removing obviously identifying information. For example, when releasing the medical insurance records of state employees, the Massachusetts Group Insurance Commission (GIC) removed names and addresses from the files. I use quotation marks around the word "anonymization" because the process provides the appearance of anonymity but not actual anonymity.

To highlight the shortcomings of the GIC's "anonymization," Latanya Sweeney, then a graduate student at MIT, paid $20 to acquire the voting records from the city of Cambridge, the hometown of Massachusetts governor William Weld. These voting records included information such as name, address, zip code, birth date, and gender. The fact that the medical data file and the voter file shared fields (zip code, birth date, and sex) meant that Sweeney could link them. Sweeney knew that Weld's birthday was July 31, 1945, and the voting records included only six people in Cambridge with that birthday. Further, of those six people, only three were male. And, of those three men, only one shared Weld's zip code. Thus, the voting data showed that anyone in the medical data with Weld's combination of birth date, gender, and zip code was William Weld. In essence, these three pieces of information provided a unique fingerprint for him in the data. Using this fact, Sweeney was able to locate Weld's medical records, and, to inform him of her feat, she mailed him a copy of his records (Ohm 2010).
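The narrowing in Sweeney's attack is just a sequence of filters on the voter file. The sketch below reproduces only that logic; the voter roll is invented (hypothetical names and zip codes), constructed so that the counts echo the six, three, and one described above. The helper name `narrow` is also hypothetical.

```python
# Hypothetical voter roll: names and zip codes are invented. Only the filtering
# logic mirrors Sweeney's narrowing by birth date, then sex, then zip code.
voter_roll = [
    {"name": "Voter A", "birth_date": "1945-07-31", "sex": "M", "zip": "02138"},
    {"name": "Voter B", "birth_date": "1945-07-31", "sex": "M", "zip": "02139"},
    {"name": "Voter C", "birth_date": "1945-07-31", "sex": "M", "zip": "02140"},
    {"name": "Voter D", "birth_date": "1945-07-31", "sex": "F", "zip": "02138"},
    {"name": "Voter E", "birth_date": "1945-07-31", "sex": "F", "zip": "02139"},
    {"name": "Voter F", "birth_date": "1945-07-31", "sex": "F", "zip": "02141"},
    {"name": "Voter G", "birth_date": "1950-02-14", "sex": "M", "zip": "02138"},
]


def narrow(rows, **criteria):
    """Keep only the rows whose fields match every given criterion."""
    return [row for row in rows if all(row[k] == v for k, v in criteria.items())]


same_birthday = narrow(voter_roll, birth_date="1945-07-31")
same_birthday_male = narrow(same_birthday, sex="M")
unique_match = narrow(same_birthday_male, zip="02138")

print(len(same_birthday), len(same_birthday_male), len(unique_match))  # 6 3 1
```

Each field on its own is shared by many people; it is the combination that ends up pointing to a single person.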

Figure 6.5: Re-identification of "anonymized" data. Latanya Sweeney combined the "anonymized" health records with voting records in order to find the medical records of Governor William Weld. Adapted from Sweeney (2002), figure 1.

Sweeney's work illustrates the basic structure of re-identification attacks, to adopt a term from the computer security community. In these attacks, two datasets, neither of which by itself reveals sensitive information, are linked, and through this linkage, sensitive information is exposed.

In response to Sweeney's work and other related work, researchers now generally remove much more information, all so-called "personally identifying information" (PII) (Narayanan and Shmatikov 2010), during the process of "anonymization." Further, many researchers now realize that certain data, such as medical records, financial records, and answers to survey questions about illegal behavior, are probably too sensitive to release even after "anonymization." However, the examples that I'm about to give suggest that social researchers need to change their thinking. As a first step, it is wise to assume that all data are potentially identifiable and all data are potentially sensitive. In other words, rather than thinking that informational risk applies to a small subset of projects, we should assume that it applies, to some degree, to all projects.
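One way to see why this assumption is prudent is to measure how often a combination of seemingly innocuous fields picks out a single record. The sketch below uses hypothetical helper names and data. It computes the share of records that are unique on a chosen set of quasi-identifiers, along with the size of the smallest group sharing the same values (the quantity usually called k in discussions of k-anonymity). A low score on the fields you checked does not show that a release is safe, because an attacker may link on fields you did not think to check.

```python
from collections import Counter


def _key(row, quasi_identifiers):
    """The tuple of quasi-identifier values for one row."""
    return tuple(row[q] for q in quasi_identifiers)


def uniqueness_rate(rows, quasi_identifiers):
    """Share of rows whose combination of quasi-identifier values occurs exactly once."""
    if not rows:
        return 0.0
    counts = Counter(_key(r, quasi_identifiers) for r in rows)
    return sum(1 for r in rows if counts[_key(r, quasi_identifiers)] == 1) / len(rows)


def smallest_group(rows, quasi_identifiers):
    """Size of the smallest group of rows sharing identical quasi-identifier values."""
    counts = Counter(_key(r, quasi_identifiers) for r in rows)
    return min(counts.values(), default=0)


rows = [
    {"zip": "02138", "birth_year": 1945, "sex": "M"},
    {"zip": "02138", "birth_year": 1945, "sex": "M"},
    {"zip": "02139", "birth_year": 1960, "sex": "F"},
    {"zip": "02140", "birth_year": 1972, "sex": "F"},
]
print(uniqueness_rate(rows, ("zip", "birth_year", "sex")))  # 0.5
print(smallest_group(rows, ("sex",)))                        # 2
```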

Both aspects of this reorientation are illustrated by the Netflix Prize. As described in chapter 5, Netflix released 100 million movie ratings provided by almost 500,000 members, and had an open call where people from all over the world submitted algorithms that could improve Netflix's ability to recommend movies. Before releasing the data, Netflix removed any obviously personally identifying information, such as names. They also went an extra step and introduced slight perturbations in some of the records (e.g., changing some ratings from 4 stars to 3 stars). They soon discovered, however, that despite their efforts, the data were still by no means anonymous.
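Netflix's extra step can be pictured as randomly nudging some ratings. The rule in this sketch is a hypothetical illustration, not Netflix's actual procedure, which the text does not describe: with probability `fraction`, each rating moves up or down by one star and is clamped to the 1 to 5 scale. The function name and data are also hypothetical.

```python
import random


def perturb_ratings(ratings, fraction, rng):
    """Return a copy of `ratings` in which roughly `fraction` of entries move by one star.

    ratings: dict mapping (user_id, movie_id) -> stars, with stars in 1..5.
    Hypothetical rule for illustration only.
    """
    noisy = {}
    for key, stars in ratings.items():
        if rng.random() < fraction:
            # Shift up or down by one star, keeping the result on the 1-5 scale.
            stars = min(5, max(1, stars + rng.choice([-1, 1])))
        noisy[key] = stars
    return noisy


ratings = {("u1", "m1"): 4, ("u1", "m2"): 2, ("u2", "m1"): 5}
noisy = perturb_ratings(ratings, fraction=0.5, rng=random.Random(42))
```

Passing an explicit `random.Random` instance keeps the perturbation reproducible. As the next paragraph shows, small noise of this kind does little against an attacker who expects it.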

Just two weeks after the data were released, Arvind Narayanan and Vitaly Shmatikov (2008) showed that it was possible to learn about specific people's movie preferences. The trick to their re-identification attack was similar to Sweeney's: merge together two information sources, one with potentially sensitive information and no obviously identifying information and one that contains people's identities. Each of these data sources may be individually safe, but when they are combined, the merged dataset can create informational risk. In the case of the Netflix data, here's how it could happen. Imagine that I choose to share my thoughts about action and comedy movies with my co-workers, but that I prefer not to share my opinion about religious and political movies. My co-workers could use the information that I've shared with them to find my records in the Netflix data; the information that I share could be a unique fingerprint, just like William Weld's birth date, zip code, and sex. Then, if they found my unique fingerprint in the data, they could learn my ratings of all movies, including movies that I chose not to share. In addition to this kind of targeted attack focused on a single person, Narayanan and Shmatikov also showed that it was possible to do a broad attack, one involving many people, by merging the Netflix data with personal and movie rating data that some people have chosen to post on the Internet Movie Database (IMDb). Quite simply, any information that is a unique fingerprint of a specific person, even their set of movie ratings, can be used to identify them.
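Why does perturbation not help much? A matching rule that tolerates small differences still finds the right record. The sketch below is a deliberately simplified illustration of the idea, not Narayanan and Shmatikov's published algorithm, which is more elaborate. It scores each released record by how many of the auxiliary ratings (here, what I told my co-workers) it matches within one star, and it accepts the best candidate only if that candidate clearly beats the runner-up. The names `match_score`, `best_match`, `tolerance`, and `margin`, and all of the data, are hypothetical.

```python
def match_score(known, candidate, tolerance=1):
    """Number of known ratings that the candidate record matches within `tolerance` stars."""
    return sum(
        1
        for movie, stars in known.items()
        if movie in candidate and abs(candidate[movie] - stars) <= tolerance
    )


def best_match(known, released, tolerance=1, margin=2):
    """User id of the released record that best fits `known`, or None if there is no clear winner."""
    scores = sorted(
        ((match_score(known, ratings, tolerance), user) for user, ratings in released.items()),
        reverse=True,
    )
    if not scores or scores[0][0] == 0:
        return None
    if len(scores) == 1 or scores[0][0] - scores[1][0] >= margin:
        return scores[0][1]
    return None


# Hypothetical released (already perturbed) ratings and what I told my co-workers.
released = {
    "user_17": {"action_1": 4, "comedy_2": 2, "comedy_3": 5, "religious_9": 1},
    "user_42": {"action_1": 1, "comedy_2": 5},
    "user_88": {"comedy_3": 3},
}
told_coworkers = {"action_1": 5, "comedy_2": 2, "comedy_3": 5}

match = best_match(told_coworkers, released)
print(match, released[match]["religious_9"])  # user_17 1
```

Even though one of my shared ratings was perturbed (5 became 4), the tolerant match still singles out my record and reveals a rating I never disclosed.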

Even though the Netflix data can be re-identified in either a targeted or a broad attack, it still might appear to be low risk. After all, movie ratings don't seem very sensitive. While that might be true in general, for some of the 500,000 people in the dataset, movie ratings might be quite sensitive. In fact, in response to the re-identification, a closeted lesbian woman joined a class-action suit against Netflix. Here's how the problem was expressed in the lawsuit (Singel 2009):

"[M]ovie and rating data contains information of a ... highly personal and sensitive nature. The member's movie data exposes a Netflix member's personal interest and/or struggles with various highly personal issues, including sexuality, mental illness, recovery from alcoholism, and victimization from incest, physical abuse, domestic violence, adultery, and rape."

The re-identification of the Netflix Prize data illustrates both that all data are potentially identifiable and that all data are potentially sensitive. At this point, you might think that this only applies to data that purport to be about people. Surprisingly, that is not the case. In response to a Freedom of Information Law request, the New York City Government released records of every taxi ride in New York in 2013, including the pickup and drop-off times, locations, and fare amounts (recall from chapter 2 that Farber (2015) used similar data to test important theories in labor economics). These data about taxi trips might seem benign because they do not seem to provide information about people, but Anthony Tockar realized that this taxi dataset actually contained lots of potentially sensitive information about people. To illustrate, he looked at all trips starting at the Hustler Club (a large strip club in New York) between midnight and 6 a.m. and then found their drop-off locations. This search revealed, in essence, a list of addresses of some people who frequented the Hustler Club (Tockar 2014). It is hard to imagine that the city government had this in mind when it released the data. In fact, this same technique could be used to find the home addresses of people who visit any place in the city: a medical clinic, a government building, or a religious institution.
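Tockar's query needs nothing more than a location filter and a time filter. Here is a minimal sketch, assuming each trip is a dict with pickup coordinates, a pickup `datetime`, and drop-off coordinates. The field names, the function names, the bounding box, and the trips are all hypothetical and do not describe the actual dataset's schema or any real venue's coordinates.

```python
from datetime import datetime


def in_box(lat, lon, box):
    """True if (lat, lon) lies inside box = (min_lat, max_lat, min_lon, max_lon)."""
    min_lat, max_lat, min_lon, max_lon = box
    return min_lat <= lat <= max_lat and min_lon <= lon <= max_lon


def late_night_dropoffs(trips, pickup_box, start_hour=0, end_hour=6):
    """Drop-off points of trips picked up inside `pickup_box` between start_hour and end_hour."""
    return [
        (trip["dropoff_lat"], trip["dropoff_lon"])
        for trip in trips
        if in_box(trip["pickup_lat"], trip["pickup_lon"], pickup_box)
        and start_hour <= trip["pickup_time"].hour < end_hour
    ]


# Hypothetical bounding box around a venue and three invented trips.
venue_box = (40.7600, 40.7610, -73.9990, -73.9980)
trips = [
    {"pickup_lat": 40.7605, "pickup_lon": -73.9985, "pickup_time": datetime(2013, 5, 4, 2, 30),
     "dropoff_lat": 40.7000, "dropoff_lon": -73.9000},  # late night, at the venue: kept
    {"pickup_lat": 40.7605, "pickup_lon": -73.9985, "pickup_time": datetime(2013, 5, 4, 14, 0),
     "dropoff_lat": 40.7100, "dropoff_lon": -73.9100},  # afternoon: excluded
    {"pickup_lat": 40.8000, "pickup_lon": -73.9500, "pickup_time": datetime(2013, 5, 4, 3, 0),
     "dropoff_lat": 40.7200, "dropoff_lon": -73.9200},  # elsewhere: excluded
]
print(late_night_dropoffs(trips, venue_box))  # [(40.7, -73.9)]
```

Nothing in the query mentions people, yet its output is a list of places where particular visitors were taken late at night.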

These two cases, the Netflix Prize and the New York City taxi data, show that relatively skilled people can fail to correctly estimate the informational risk in the data that they release, and these cases are by no means unique (Barbaro and Zeller 2006; Zimmer 2010; Narayanan, Huey, and Felten 2016). Further, in many such cases, the problematic data are still freely available online, indicating how difficult it is to ever undo a data release. Collectively, these examples, as well as research in computer science about privacy, lead to an important conclusion: researchers should assume that all data are potentially identifiable and all data are potentially sensitive.

Unfortunately, there is no simple solution to the facts that all data are potentially identifiable and that all data are potentially sensitive. However, one way to reduce informational risk while you are working with data is to create and follow a data protection plan. This plan will decrease the chance that your data will leak and will decrease the harm if a leak does somehow occur. The specifics of data protection plans, such as which form of encryption to use, will change over time, but the UK Data Service helpfully organizes the elements of a data protection plan into five categories that they call the five safes: safe projects, safe people, safe settings, safe data, and safe outputs (table 6.2) (Desai, Ritchie, and Welpton 2016). None of the five safes individually provides perfect protection. But together they form a powerful set of factors that can decrease informational risk.

Table 6.2: The "five safes" are principles for designing and executing a data protection plan (Desai, Ritchie, and Welpton 2016).

| Safe | Action |
|---|---|
| Safe projects | Limits projects with data to those that are ethical |
| Safe people | Access is restricted to people who can be trusted with data (e.g., people who have undergone ethical training) |
| Safe data | Data are de-identified and aggregated to the extent possible |
| Safe settings | Data are stored in computers with appropriate physical (e.g., locked room) and software (e.g., password protection, encryption) protections |
| Safe outputs | Research output is reviewed to prevent accidental privacy breaches |
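The "safe outputs" principle can be illustrated with one common output-checking rule: suppress any table cell whose count falls below a minimum before results leave the secure setting. This is a sketch under stated assumptions. The threshold of 10 and the name `safe_count_table` are hypothetical choices, not rules from the UK Data Service, and suppressing small cells is not sufficient on its own if published totals would let a reader recover the hidden cells.

```python
from collections import Counter


def safe_count_table(rows, field, threshold=10):
    """Count rows by `field`, replacing counts below `threshold` with None (suppressed)."""
    counts = Counter(row[field] for row in rows)
    return {value: (n if n >= threshold else None) for value, n in counts.items()}


# Hypothetical rows: 12 people with diagnosis A, 3 with diagnosis B.
rows = [{"diagnosis": "A"}] * 12 + [{"diagnosis": "B"}] * 3
print(safe_count_table(rows, "diagnosis"))  # {'A': 12, 'B': None}
```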

Beyond protecting your data while you are using it, one step in the research process where informational risk is particularly salient is data sharing with other researchers. Data sharing among scientists is a core value of the scientific endeavor, and it greatly facilitates the advancement of knowledge. Here's how the UK House of Commons described the importance of data sharing (Molloy 2011):

"Access to data is fundamental if researchers are to reproduce, verify and build on results that are reported in the literature. The presumption must be that, unless there is a strong reason otherwise, data should be fully disclosed and made publicly available."

However, by sharing your data with another researcher, you may be increasing informational risk to your participants. Thus, it may seem that data sharing creates a fundamental tension between the obligation to share data with other scientists and the obligation to minimize informational risk to participants. Fortunately, this dilemma is not as severe as it appears. Rather, it is better to think about data sharing as falling along a continuum, with each point on that continuum providing a different mix of benefits to society and risk to participants (figure 6.6).

At one extreme, you can share your data with no one, which minimizes risk to participants but also minimizes gains to society. At the other extreme, you can release and forget, where data are "anonymized" and posted for everyone. Relative to not releasing data, release and forget offers both higher benefits to society and higher risk to participants. In between these two extremes are a range of hybrids, including what I'll call a walled garden approach. Under this approach, data are shared with people who meet certain criteria and who agree to be bound by certain rules (e.g., oversight from an IRB and a data protection plan). The walled garden approach provides many of the benefits of release and forget with less risk. Of course, such an approach raises many questions (who should have access, under what conditions, and for how long; who should pay to maintain and police the walled garden; etc.), but these are not insurmountable. In fact, there are already working walled gardens in place that researchers can use right now, such as the data archive of the Inter-university Consortium for Political and Social Research at the University of Michigan.
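The three points on the continuum can be read as three different access rules. The sketch below encodes them in Python. The walled-garden criteria (IRB review, a data protection plan, and agreement to the rules) follow the examples in the paragraph above, but the names `Sharing`, `Requester`, and `may_access` are hypothetical, and a real walled garden also involves human review, monitoring, and enforcement rather than a single boolean check.

```python
from dataclasses import dataclass
from enum import Enum


class Sharing(Enum):
    NO_SHARING = "no sharing"
    WALLED_GARDEN = "walled garden"
    RELEASE_AND_FORGET = "release and forget"


@dataclass
class Requester:
    irb_reviewed: bool = False
    has_data_protection_plan: bool = False
    accepted_rules: bool = False


def may_access(policy: Sharing, requester: Requester) -> bool:
    """Decide whether a requester may access data under the given sharing policy."""
    if policy is Sharing.NO_SHARING:
        return False
    if policy is Sharing.RELEASE_AND_FORGET:
        return True  # anyone can download the "anonymized" data
    # Walled garden: every (hypothetical) criterion must hold.
    return (
        requester.irb_reviewed
        and requester.has_data_protection_plan
        and requester.accepted_rules
    )


print(may_access(Sharing.WALLED_GARDEN, Requester(True, True, False)))  # False
print(may_access(Sharing.WALLED_GARDEN, Requester(True, True, True)))   # True
```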

Figure 6.6: Data release strategies can fall along a continuum. Where you should be on this continuum depends on the specific details of your data, and third-party review may help you decide the appropriate balance of risk and benefit in your case. The exact shape of this curve depends on the specifics of the data and research goals (Goroff 2015).

Figure 6.6: Sakamakon bayanan bayanai zai iya fada tare da ci gaba. Inda za ku kasance a kan wannan ci gaba ya dogara da bayanan bayanan ku na bayananku, kuma nazari na uku na iya taimaka muku yanke shawarar daidaitaccen hadarin da kuma amfani a cikin shari'arku. Daidai ainihin wannan tsari ya dogara ne akan ƙayyadaddun bayanai da bincike (Goroff 2015) .

So, where should the data from your research be on the continuum of no sharing, walled garden, and release and forget? This depends on the details of your data: researchers must balance Respect for Persons, Beneficence, Justice, and Respect for Law and Public Interest. From this perspective, data sharing is not a distinctive ethical conundrum; it is just one of the many aspects of research in which researchers have to find an appropriate ethical balance.

Some critics are generally opposed to data sharing because, in my opinion, they are focused on its risks, which are undoubtedly real, and are ignoring its benefits. So, in order to encourage focus on both risks and benefits, I'd like to offer an analogy. Every year, cars are responsible for thousands of deaths, but we do not attempt to ban driving. In fact, a call to ban driving would be absurd because driving enables many wonderful things. Rather, society places restrictions on who can drive (e.g., the need to be a certain age and to have passed certain tests) and how they can drive (e.g., under the speed limit). Society also has people tasked with enforcing these rules (e.g., police), and we punish people who are caught violating them. The same kind of balanced thinking that society applies to regulating driving can also be applied to data sharing. That is, rather than making absolutist arguments for or against data sharing, I think we will make the most progress by focusing on how we can decrease the risks and increase the benefits of data sharing.

To conclude, informational risk has increased dramatically, and it is very hard to predict and quantify. Therefore, it is best to assume that all data are potentially identifiable and potentially sensitive. To decrease informational risk while doing research, researchers can create and follow a data protection plan. Further, informational risk does not prevent researchers from sharing data with other scientists.