6.6.2 Understanding and managing informational risk

Informational risk is the most common risk in social research; it has increased dramatically; and it is the hardest risk to understand.

The second ethical challenge for digital-age social research is informational risk, the potential for harm from the disclosure of information (Council 2014). Informational harms from the disclosure of personal information could be economic (e.g., losing a job), social (e.g., embarrassment), psychological (e.g., depression), or even criminal (e.g., arrest for illegal behavior). Unfortunately, the digital age increases informational risk dramatically: there is simply much more information about our behavior. Informational risk has also proven very difficult to understand and manage compared with the risks that were concerns in analog-age social research, such as physical risk. To see how the digital age increases informational risk, consider the transition from paper to electronic medical records. Both types of records create risk, but electronic records create much greater risk because, at a massive scale, they can be transmitted to an unauthorized party or merged with other records. Social researchers in the digital age have already run into trouble with informational risk, in part because they did not fully understand how to quantify and manage it. So, I am going to offer a helpful way to think about informational risk, and then I am going to give you some advice on how to manage informational risk in your research and when releasing data to other researchers.

One way that social researchers decrease informational risk is "anonymization" of data. "Anonymization" is the process of removing obvious personal identifiers such as name, address, and telephone number from the data. However, this approach is much less effective than many people realize, and it is, in fact, deeply and fundamentally limited. For that reason, whenever I describe "anonymization," I will use quotation marks to remind you that this process creates the appearance of anonymity but not true anonymity.

A vivid example of the failure of "anonymization" comes from the late 1990s in Massachusetts (Sweeney 2002). The Group Insurance Commission (GIC) was a government agency responsible for purchasing health insurance for all state employees. Through this work, the GIC collected detailed health records about thousands of state employees. In an effort to spur research about ways to improve health, the GIC decided to release these records to researchers. However, they did not share all of their data; rather, they "anonymized" it by removing information such as name and address. They left in other information that they thought could be useful to researchers, such as demographic information (zip code, birth date, ethnicity, and sex) and medical information (visit data, diagnosis, procedure) (Figure 6.4) (Ohm 2010). Unfortunately, this "anonymization" was not sufficient to protect the data.

Figure 6.4: "Anonymization" is the process of removing obviously identifying information. For example, when releasing the medical insurance records of state employees, the Massachusetts Group Insurance Commission (GIC) removed name and address from the files. I use quotes around the word "anonymization" because the process provides the appearance of anonymity, but not actual anonymity.


To illustrate the shortcomings of the GIC "anonymization," Latanya Sweeney, then a graduate student at MIT, paid $20 to acquire the voting records from the city of Cambridge, the hometown of Massachusetts governor William Weld. These voting records included information such as name, address, zip code, birth date, and gender. The fact that the medical data file and the voter file shared fields (zip code, birth date, and sex) meant that Sweeney could link them. Sweeney knew that Weld's birthday was July 31, 1945, and the voting records included only six people in Cambridge with that birthday. Further, of those six people, only three were male. And, of those three men, only one shared Weld's zip code. Thus, the voting data showed that anyone in the medical data with Weld's combination of birth date, gender, and zip code was William Weld. In essence, these three pieces of information provided a unique fingerprint for him in the data. Using this fact, Sweeney was able to locate Weld's medical records and, to inform him of her feat, she mailed him a copy of his records (Ohm 2010).

Figure 6.5: Re-identification of "anonymized" data. Latanya Sweeney combined the "anonymized" health records with voting records in order to find the medical records of Governor William Weld (Sweeney 2002).


Sweeney's work illustrates the basic structure of de-anonymization attacks, to adopt a term from the computer security community. In these attacks, two data sets, neither of which by itself reveals sensitive information, are linked, and through this linkage sensitive information is exposed. In some ways this process is similar to the way that baking soda and vinegar, two substances that are safe by themselves, can be combined to produce a nasty outcome.
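The linkage step in such an attack can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Sweeney's actual procedure: every record, name, zip code, and date below is made up, and the helper names (`quasi_key`, `link`) are hypothetical. It joins a name-free medical file to a voter file on the three shared fields and keeps only the matches that point to exactly one voter.

```python
from collections import defaultdict

# Hypothetical "anonymized" medical file: names removed, shared fields kept.
medical = [
    {"zip": "00001", "birth_date": "1950-02-11", "sex": "M", "diagnosis": "condition A"},
    {"zip": "00002", "birth_date": "1950-02-11", "sex": "F", "diagnosis": "condition B"},
    {"zip": "00001", "birth_date": "1962-03-14", "sex": "M", "diagnosis": "condition C"},
]

# Hypothetical voter file: contains names plus the same three fields.
voters = [
    {"name": "Voter One", "zip": "00001", "birth_date": "1950-02-11", "sex": "M"},
    {"name": "Voter Two", "zip": "00002", "birth_date": "1950-02-11", "sex": "F"},
    {"name": "Voter Three", "zip": "00002", "birth_date": "1950-02-11", "sex": "F"},
    {"name": "Voter Four", "zip": "00003", "birth_date": "1970-01-01", "sex": "M"},
]

KEY_FIELDS = ("zip", "birth_date", "sex")


def quasi_key(record):
    """Return the combination of fields shared by both files."""
    return tuple(record[field] for field in KEY_FIELDS)


def link(medical_rows, voter_rows):
    """Pair each medical record with a name when exactly one voter matches."""
    names_by_key = defaultdict(list)
    for voter in voter_rows:
        names_by_key[quasi_key(voter)].append(voter["name"])

    linked = []
    for row in medical_rows:
        names = names_by_key.get(quasi_key(row), [])
        if len(names) == 1:  # a unique fingerprint: only one possible person
            linked.append((names[0], row["diagnosis"]))
    return linked


reidentified = link(medical, voters)
print(reidentified)
```

In this toy data only the first medical record is re-identified: the second matches two voters, so it stays ambiguous, and the third matches none. Neither file is sensitive and identifying on its own; the exposure comes from the join.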

In response to Sweeney's work, and other related work, researchers now generally remove much more information, all so-called "Personally Identifying Information" (PII) (Narayanan and Shmatikov 2010), during the process of "anonymization." Further, many researchers now realize that certain data, such as medical records, financial records, and answers to survey questions about illegal behavior, are probably too sensitive to release even after "anonymization." However, more recent examples that I describe below indicate that social researchers need to change their thinking. As a first step, it is wise to assume that all data are potentially identifiable and all data are potentially sensitive. In other words, rather than thinking that informational risk applies to a small subset of projects, we should assume that it applies, to some degree, to all projects.

Both aspects of this re-orientation are illustrated by the Netflix Prize. As described in Chapter 5, Netflix released 100 million movie ratings provided by almost 500,000 members, and held an open call in which people from all over the world submitted algorithms that could improve Netflix's ability to recommend movies. Before releasing the data, Netflix removed any obviously personally identifying information, such as names. Netflix also went an extra step and introduced slight perturbations in some of the records (e.g., changing some ratings from 4 stars to 3 stars). Netflix soon discovered, however, that despite these efforts the data were by no means anonymous.

Just two weeks after the data were released, Narayanan and Shmatikov (2008) showed that it was possible to learn about specific people's movie preferences. The trick to their re-identification attack was similar to Sweeney's: merge two information sources, one with potentially sensitive information and no obviously identifying information, and one that contains the identities of people. Each of these data sources may be individually safe, but when they are combined the merged dataset can create informational risk. In the case of the Netflix data, here is how it could happen. Imagine that I choose to share my thoughts about action and comedy movies with my co-workers, but that I prefer not to share my opinions about religious and political movies. My co-workers could use the information that I have shared with them to find my records in the Netflix data; the information that I share could be a unique fingerprint, just like William Weld's birth date, zip code, and sex. Then, if they find my unique fingerprint in the data, they could learn my ratings for all movies, including those I chose not to share. In addition to this kind of targeted attack focused on a single person, Narayanan and Shmatikov (2008) also showed that it was possible to do a broad attack, one involving many people, by merging the Netflix data with personal and movie rating data that some people have chosen to post on the Internet Movie Database (IMDb). Any information that is a unique fingerprint for a specific person, even their set of movie ratings, can be used to identify them.
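The targeted attack described above can be illustrated with a toy example. This is only a sketch under simplifying assumptions: it is not the statistical method of Narayanan and Shmatikov (2008), and the user ids, movie titles, ratings, and function names are all invented. It looks for users whose stored ratings agree with the handful of ratings a person has shared, allowing a one-star difference to account for small perturbations.

```python
# Hypothetical released data: user ids are arbitrary, names are removed.
ratings = {
    "user_17": {"Action Movie A": 5, "Comedy B": 3, "Political Film C": 1, "Religious Film D": 5},
    "user_42": {"Action Movie A": 5, "Comedy B": 2, "Political Film C": 4},
    "user_99": {"Action Movie A": 3, "Comedy B": 4, "Religious Film D": 2},
}

# What a co-worker happens to know from conversation (also hypothetical).
known = {"Action Movie A": 5, "Comedy B": 4}


def candidates(dataset, known_ratings, tolerance=1):
    """Return user ids whose ratings agree with the known ones within `tolerance` stars."""
    matches = []
    for user_id, user_ratings in dataset.items():
        consistent = all(
            title in user_ratings and abs(user_ratings[title] - stars) <= tolerance
            for title, stars in known_ratings.items()
        )
        if consistent:
            matches.append(user_id)
    return matches


def revealed(dataset, user_id, known_ratings):
    """Ratings the attacker learns beyond what was already shared."""
    return {t: s for t, s in dataset[user_id].items() if t not in known_ratings}


matches = candidates(ratings, known)
if len(matches) == 1:  # the shared ratings act as a unique fingerprint
    print(matches[0], revealed(ratings, matches[0], known))
```

Here two shared ratings are enough to single out one record, even though one of them was stored one star off, and the attacker then sees the ratings for the political and religious films that were never shared. With `tolerance=0` the perturbed rating would hide the match, which suggests why slight perturbation alone offers limited protection against a matcher that tolerates noise.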

Even though the Netflix data can be re-identified in either a targeted or a broad attack, it still might appear to be low risk. After all, movie ratings do not seem very sensitive. While that might be true in general, for some of the 500,000 people in the dataset, movie ratings might be quite sensitive. In fact, in response to the de-anonymization, a closeted lesbian woman joined a class-action suit against Netflix. Here is how the problem was expressed in the lawsuit (Singel 2009):

"[M]ovie and rating data contains information of a more highly personal and sensitive nature [sic]. The member's movie data exposes a Netflix member's personal interest and/or struggles with various highly personal issues, including sexuality, mental illness, recovery from alcoholism, and victimization from incest, physical abuse, domestic violence, adultery, and rape."

The de-anonymization of the Netflix Prize data illustrates both that all data are potentially identifiable and that all data are potentially sensitive. At this point, you might think that this applies only to data that purport to be about people. Surprisingly, that is not the case. In response to a Freedom of Information Law request, the New York City government released records of every taxi ride in New York in 2013, including the pickup and drop-off times, locations, and fare amounts (recall from Chapter 2 that Farber (2015) used these data to test important theories in labor economics). Although these data about taxi trips might seem benign because they do not seem to be information about people, Anthony Tockar realized that this taxi dataset actually contained lots of potentially sensitive information about people. To illustrate, he looked at all trips starting at The Hustler Club, a large strip club in New York, between midnight and 6 a.m., and then found their drop-off locations. This search revealed, in essence, a list of addresses of some people who frequent The Hustler Club (Tockar 2014). It is hard to imagine that the city government had this in mind when it released the data. In fact, this same technique could be used to find the home addresses of people who visit any place in the city: a medical clinic, a government building, or a religious institution.
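The kind of query involved is strikingly simple. The sketch below is an assumption-laden illustration rather than Tockar's actual analysis: the coordinates, timestamps, record layout, and the crude bounding-box test are all hypothetical. It keeps trips that start near a chosen venue between midnight and 6 a.m. and returns their drop-off points.

```python
from datetime import datetime

# Hypothetical venue location and search box half-width, in degrees.
VENUE = (40.0000, -74.0000)
RADIUS_DEG = 0.0005

# Hypothetical trip records: (latitude, longitude) pairs and a pickup timestamp.
trips = [
    {"pickup_time": "2013-03-02 01:15:00", "pickup": (40.0001, -74.0002), "dropoff": (40.0500, -73.9500)},
    {"pickup_time": "2013-03-02 14:30:00", "pickup": (40.0001, -74.0001), "dropoff": (40.0700, -73.9900)},
    {"pickup_time": "2013-03-03 03:40:00", "pickup": (40.0300, -74.0200), "dropoff": (40.0100, -73.9800)},
    {"pickup_time": "2013-03-04 05:59:00", "pickup": (39.9998, -74.0003), "dropoff": (40.0900, -73.9700)},
]


def near(point, center, radius):
    """Crude bounding-box test; a real analysis would use proper distances."""
    return abs(point[0] - center[0]) <= radius and abs(point[1] - center[1]) <= radius


def late_night(timestamp):
    """True if the pickup happened between midnight and 6 a.m."""
    hour = datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S").hour
    return 0 <= hour < 6


def dropoffs_from(trip_rows, venue, radius):
    """Drop-off points of late-night trips that started near the venue."""
    return [
        trip["dropoff"]
        for trip in trip_rows
        if near(trip["pickup"], venue, radius) and late_night(trip["pickup_time"])
    ]


print(dropoffs_from(trips, VENUE, RADIUS_DEG))
```

Nothing in the records names a person, yet the returned drop-off points are plausibly home addresses. Swapping in the coordinates of a clinic or a place of worship would require changing a single constant.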

These two cases, the Netflix Prize and the New York City taxi data, show that relatively skilled people failed to correctly estimate the informational risk in the data that they released, and these cases are by no means unique (Barbaro and Zeller Jr 2006; Zimmer 2010; Narayanan, Huey, and Felten 2016). Further, in many of these cases, the problematic data are still freely available online, indicating the difficulty of ever undoing a data release. Collectively, these examples, as well as research in computer science about privacy, lead to an important conclusion. Researchers should assume that all data are potentially identifiable and all data are potentially sensitive.

Unfortunately, there is no simple solution to the fact that all data are potentially identifiable and all data are potentially sensitive. However, one way to reduce informational risk while you are working with data is to create and follow a data protection plan. This plan will decrease the chance that your data will leak and will decrease the harm if a leak somehow occurs. The specifics of data protection plans, such as which form of encryption to use, will change over time, but the UK Data Services helpfully organizes the elements of a data protection plan into five categories that they call the 5 safes: safe projects, safe people, safe settings, safe data, and safe outputs (Table 6.2) (Desai, Ritchie, and Welpton 2016). None of the five safes individually provides perfect protection. But together they form a powerful set of factors that can decrease informational risk.

Table 6.2: The 5 safes are principles for designing and executing a data protection plan (Desai, Ritchie, and Welpton 2016).
Safe | Action
Safe projects | limits projects with data to those that are ethical
Safe people | access is restricted to people who can be trusted with data (e.g., people who have undergone ethical training)
Safe data | data are de-identified and aggregated to the extent possible
Safe settings | data are stored in computers with appropriate physical (e.g., locked room) and software (e.g., password protection, encryption) protections
Safe output | research output is reviewed to prevent accidental privacy breaches
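As a rough illustration of the "safe data" row, the sketch below counts how many records are unique on a few fields before and after those fields are coarsened. The uniqueness check, the field names, the coarsening rules (three-digit zip prefix, birth year only), and the records themselves are all my assumptions for illustration; they are not part of the 5 safes framework.

```python
from collections import Counter

# Hypothetical records containing only demographic fields.
records = [
    {"zip": "00001", "birth_date": "1950-02-11", "sex": "M"},
    {"zip": "00002", "birth_date": "1950-09-30", "sex": "M"},
    {"zip": "00003", "birth_date": "1950-05-05", "sex": "M"},
    {"zip": "00101", "birth_date": "1981-01-20", "sex": "F"},
]


def exact_key(record):
    """Fields exactly as collected."""
    return (record["zip"], record["birth_date"], record["sex"])


def coarse_key(record):
    """Aggregated fields: zip prefix and birth year only (an assumed rule)."""
    return (record["zip"][:3], record["birth_date"][:4], record["sex"])


def unique_count(rows, key):
    """Number of records whose key value appears exactly once."""
    counts = Counter(key(row) for row in rows)
    return sum(1 for row in rows if counts[key(row)] == 1)


print(unique_count(records, exact_key), unique_count(records, coarse_key))
```

In this toy data, coarsening reduces the number of unique records from four to one but does not eliminate them, which is consistent with the point that no single safe provides perfect protection.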

In addition to protecting your data while you are using them, one step in the research process where informational risk is particularly salient is data sharing with other researchers. Data sharing among scientists is a core value of the scientific endeavor, and it greatly facilitates the advancement of knowledge. Here is how the UK House of Commons described the importance of data sharing:

"Access to data is fundamental if researchers are to reproduce, verify and build on results that are reported in the literature. The presumption must be that, unless there is a strong reason otherwise, data should be fully disclosed and made publicly available. In line with this principle, where possible, data associated with all publicly funded research should be made widely and freely available." (Molloy 2011)

Yet, by sharing your data with another researcher, you may be increasing informational risk to your participants. Thus, it may seem that researchers who wish to share their data, or are required to share their data, face a fundamental tension. On the one hand, they have an ethical obligation to share their data with other scientists, especially if the original research is publicly funded. Yet, at the same time, researchers have an ethical obligation to minimize, as much as possible, the informational risk to their participants.

Fortunately, this dilemma is not as severe as it appears. It is important to think of data sharing along a continuum from no data sharing to release and forget, where data are "anonymized" and posted for anyone to access (Figure 6.6). Both of these extreme positions have risks and benefits. That is, it is not automatically the most ethical thing to not share your data; such an approach eliminates many potential benefits to society. Returning to Taste, Ties, and Time, an example discussed earlier in the chapter, arguments against data release that focus only on possible harms and ignore possible benefits are overly one-sided. I will describe the problems with this one-sided, overly protective approach in more detail below, when I offer advice about making decisions in the face of uncertainty (Section 6.6.4).

Figure 6.6: Data release strategies can fall along a continuum. Where you should be along this continuum depends on the specific details of your data. In this case, third-party review may help you decide the appropriate balance of risk and benefit in your case.


Further, in between these two extreme cases is what I will call a walled garden approach, where data are shared with people who meet certain criteria and who agree to be bound by certain rules (e.g., oversight from an IRB and a data protection plan). This walled garden approach provides many of the benefits of release and forget with less risk. Of course, a walled garden approach creates many questions (who should have access, under what conditions, for how long, who should pay to maintain and police the walled garden, and so on), but these are not insurmountable. In fact, there are already working walled gardens in place that researchers can use right now, such as the data archive of the Inter-university Consortium for Political and Social Research at the University of Michigan.

So, where should the data from your study be on the continuum of no sharing, walled garden, and release and forget? It depends on the details of your data; researchers must balance Respect for Persons, Beneficence, Justice, and Respect for Law and Public Interest. When assessing the appropriate balance for other decisions, researchers seek the advice and approval of IRBs, and data release can be just another part of that process. In other words, although some people think of data release as a hopeless ethical morass, we already have systems in place to help researchers balance these kinds of ethical dilemmas.

One final way to think about data sharing is by analogy. Every year cars are responsible for thousands of deaths, but we do not attempt to ban driving. In fact, such a call to ban driving would be absurd because driving enables many wonderful things. Rather, society places restrictions on who can drive (e.g., drivers must be a certain age and must have passed certain tests) and how they can drive (e.g., under the speed limit). Society also has people tasked with enforcing these rules (e.g., police), and we punish people who are caught violating them. This same kind of balanced thinking that society applies to regulating driving can also be applied to data sharing. That is, rather than making absolutist arguments for or against data sharing, I think the biggest benefits will come from figuring out how we can share more data more safely.

To conclude, informational risk has increased dramatically, and it is very hard to predict and quantify. Therefore, it is best to assume that all data are potentially identifiable and potentially sensitive. To decrease informational risk while doing research, researchers can create and follow a data protection plan. Further, informational risk does not prevent researchers from sharing data with other scientists.