6.6.2 Understanding and managing informational risk

Informational risk is the most common risk in social research; it has increased dramatically; and it is the hardest risk to understand.

The second ethical challenge for digital-age research is informational risk, the potential for harm from the disclosure of information (National Research Council 2014). Harms from the disclosure of personal information can be economic (e.g., losing a job), social (e.g., embarrassment), psychological (e.g., depression), or even criminal (e.g., arrest for illegal behavior). Unfortunately, the digital age increases informational risk dramatically: there is simply far more information about our behavior. And informational risk has proven much harder to understand and manage than the risks that concerned analog-age social research, such as physical risk.

One way that social researchers decrease informational risk is "anonymization" of data. "Anonymization" is the process of removing obvious personal identifiers such as name, address, and telephone number from the data. However, this approach is much less effective than many people realize, and it is, in fact, deeply and fundamentally limited. For that reason, whenever I describe "anonymization," I use quotation marks to remind you that this process creates the appearance of anonymity but not true anonymity.

A vivid example of the failure of "anonymization" comes from the late 1990s in Massachusetts (Sweeney 2002). The Group Insurance Commission (GIC) was a government agency responsible for purchasing health insurance for all state employees. Through this work, the GIC collected detailed health records about thousands of state employees. In an effort to spur research, the GIC decided to release these records to researchers. However, they did not share all of their data; rather, they "anonymized" the data by removing information such as names and addresses. They left in other information that they thought could be useful to researchers, such as demographic information (zip code, birth date, ethnicity, and sex) and medical information (visit data, diagnosis, procedure) (figure 6.4) (Ohm 2010). Unfortunately, this "anonymization" was not sufficient to protect the data.

Figure 6.4: "Anonymization" is the process of removing obviously identifying information. For example, when releasing the medical insurance records of state employees, the Massachusetts Group Insurance Commission (GIC) removed names and addresses from the files. I use quotation marks around the word "anonymization" because the process provides the appearance of anonymity but not actual anonymity.

To illustrate the shortcomings of the GIC "anonymization," Latanya Sweeney, then a graduate student at MIT, paid $20 to acquire the voting records from the city of Cambridge, the hometown of Massachusetts governor William Weld. These voting records included information such as name, address, zip code, birth date, and gender. Because the medical data file and the voter file shared three fields (zip code, birth date, and sex), Sweeney could link them. Sweeney knew that Weld's birthday was July 31, 1945, and the voting records included only six people in Cambridge with that birthday. Of those six people, only three were men. And of those three men, only one shared Weld's zip code. Thus, the voting data showed that anyone in the medical data with Weld's combination of birth date, gender, and zip code was William Weld. In essence, these three pieces of information provided a unique fingerprint for him in the data. Using this fact, Sweeney was able to locate Weld's medical records, and, to inform him of her feat, she mailed him a copy of his records (Ohm 2010).

Figure 6.5: Re-identification of "anonymized" data. Latanya Sweeney combined the "anonymized" health records with voting records in order to find the medical records of Governor William Weld. Adapted from Sweeney (2002), figure 1.

Sweeney's work illustrates the basic structure of re-identification attacks, to adopt a term from the computer security community. In these attacks, two data sets, neither of which by itself reveals sensitive information, are linked, and through this linkage, sensitive information is exposed.
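The linkage described above can be sketched in a few lines of Python. This is a minimal illustration rather than Sweeney's actual procedure: every record, name, and field value below is invented, and the function names `quasi_key` and `link` are hypothetical. It assumes that both files share the three fields from the Weld example (zip code, birth date, and sex) and treats a medical record as re-identified only when exactly one voter matches it.

```python
# Toy illustration of a linkage (re-identification) attack.
# All records, names, and field values are invented for this sketch.

QUASI_IDENTIFIERS = ("zip", "birth_date", "sex")

# "Anonymized" medical file: names removed, quasi-identifiers left in.
medical_records = [
    {"zip": "00001", "birth_date": "1950-01-15", "sex": "M", "diagnosis": "diagnosis X"},
    {"zip": "00002", "birth_date": "1962-03-02", "sex": "F", "diagnosis": "diagnosis Y"},
    {"zip": "00003", "birth_date": "1971-09-09", "sex": "M", "diagnosis": "diagnosis Z"},
]

# Public voter file: names plus the same quasi-identifiers.
voter_records = [
    {"name": "Person A", "zip": "00001", "birth_date": "1950-01-15", "sex": "M"},
    {"name": "Person B", "zip": "00001", "birth_date": "1950-01-15", "sex": "F"},
    {"name": "Person C", "zip": "00002", "birth_date": "1950-01-15", "sex": "M"},
    {"name": "Person D", "zip": "00002", "birth_date": "1962-03-02", "sex": "F"},
    {"name": "Person E", "zip": "00002", "birth_date": "1962-03-02", "sex": "F"},
]


def quasi_key(record):
    """Return the combination of fields that both files have in common."""
    return tuple(record[field] for field in QUASI_IDENTIFIERS)


def link(medical, voters):
    """Pair each medical record with a name when exactly one voter matches."""
    voters_by_key = {}
    for voter in voters:
        voters_by_key.setdefault(quasi_key(voter), []).append(voter)

    matches = []
    for row in medical:
        candidates = voters_by_key.get(quasi_key(row), [])
        if len(candidates) == 1:  # a unique "fingerprint"
            matches.append((candidates[0]["name"], row["diagnosis"]))
    return matches


reidentified = link(medical_records, voter_records)
print(reidentified)
```

In this toy data only the first medical record is linked to a name. The second matches two voters and the third matches none, so neither is re-identified by this simple rule, although a match against two candidates already narrows the possibilities considerably.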

In response to Sweeney's work, and other related work, researchers now generally remove much more information during the process of "anonymization": all so-called "personally identifying information" (PII) (Narayanan and Shmatikov 2010). Further, many researchers now realize that certain data, such as medical records, financial records, and answers to survey questions about illegal behavior, are probably too sensitive to release even after "anonymization." However, the examples that I'm about to give suggest that social researchers need to change their thinking. As a first step, it is wise to assume that all data are potentially identifiable and all data are potentially sensitive. In other words, rather than thinking that informational risk applies to a small subset of projects, we should assume that it applies, to some degree, to all projects.

Both aspects of this reorientation are illustrated by the Netflix Prize. As described in chapter 5, Netflix released 100 million movie ratings provided by almost 500,000 members, and held an open call in which people from all over the world submitted algorithms that could improve Netflix's ability to recommend movies. Before releasing the data, Netflix removed any obvious personally identifying information, such as names. They also went a step further and introduced slight perturbations in some of the records (e.g., changing some ratings from 4 stars to 3 stars). They soon discovered, however, that despite their efforts, the data were still by no means anonymous.

Just two weeks after the data were released, Arvind Narayanan and Vitaly Shmatikov (2008) showed that it was possible to learn about specific people's movie preferences. The trick to their re-identification attack was similar to Sweeney's: merge two information sources, one with potentially sensitive information and no obviously identifying information, and one that contains people's identities. Each of these data sources may be individually safe, but when they are combined, the merged dataset can create informational risk. In the case of the Netflix data, here's how it could happen. Imagine that I choose to share my thoughts about action and comedy movies with my co-workers, but that I prefer not to share my opinions about religious and political movies. My co-workers could use the information that I've shared with them to find my records in the Netflix data; the information that I share could be a unique fingerprint, just like William Weld's birth date, zip code, and sex. Then, if they found my unique fingerprint in the data, they could learn my ratings of all movies, including the movies that I chose not to discuss. In addition to this kind of targeted attack focused on a single person, Narayanan and Shmatikov also showed that it was possible to do a broad attack, one involving many people, by merging the Netflix data with personal and movie rating data that some people have chosen to post on the Internet Movie Database (IMDb). Quite simply, any information that is a unique fingerprint for a specific person, even their set of movie ratings, can be used to identify them.
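The co-worker scenario above can be sketched as follows. This is a deliberately simplified illustration and not the scoring method that Narayanan and Shmatikov actually used: the dataset, film titles, user IDs, and the function names `is_consistent` and `find_candidates` are all invented. The one-star tolerance is an assumption meant to mimic the slight perturbations that Netflix introduced.

```python
# Toy targeted attack on "anonymized" rating data.
# Every user ID, film title, and rating here is invented.

released_ratings = {
    "user_1": {"Film A": 5, "Film B": 2, "Film C": 4, "Film P": 1},
    "user_2": {"Film A": 5, "Film B": 5, "Film D": 3},
    "user_3": {"Film A": 3, "Film C": 4},
}

# What a co-worker already knows about the target's tastes.
known_ratings = {"Film A": 4, "Film B": 2, "Film C": 5}


def is_consistent(record, known, tolerance=1):
    """True if the record contains every known film with a rating
    within `tolerance` stars of the known rating."""
    return all(
        title in record and abs(record[title] - stars) <= tolerance
        for title, stars in known.items()
    )


def find_candidates(dataset, known, tolerance=1):
    """Return the IDs of all records consistent with the known ratings."""
    return [
        user_id
        for user_id, record in dataset.items()
        if is_consistent(record, known, tolerance)
    ]


candidates = find_candidates(released_ratings, known_ratings)

# If exactly one record fits, everything else in it is exposed.
exposed = []
if len(candidates) == 1:
    record = released_ratings[candidates[0]]
    exposed = sorted(title for title in record if title not in known_ratings)

print(candidates, exposed)
```

Here only `user_1` is consistent with the three known ratings, so the rating of "Film P", which the target never discussed, is revealed. The tolerance parameter shows why small perturbations offer little protection: a handful of approximately known ratings can still single out one record.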

Even though the Netflix data can be re-identified in either a targeted or a broad attack, it might still appear to be low risk. After all, movie ratings don't seem very sensitive. While that might be true in general, for some of the 500,000 people in the dataset, movie ratings were quite sensitive. In fact, in response to the re-identification, a closeted lesbian woman joined a class-action suit against Netflix. Here's how the problem was expressed in the lawsuit (Singel 2009):

"[M]ovie and rating data contains information of a ... highly personal and sensitive nature. The member's movie data exposes a Netflix member's personal interest and/or struggles with various highly personal issues, including sexuality, mental illness, recovery from alcoholism, and victimization from incest, physical abuse, domestic violence, adultery, and rape."

The re-identification of the Netflix Prize data illustrates both that all data are potentially identifiable and that all data are potentially sensitive. At this point, you might think that this applies only to data that purport to be about people. Surprisingly, that is not the case. In response to a Freedom of Information Law request, the New York City government released records of every taxi ride in New York in 2013, including the pickup and drop-off times, locations, and fare amounts (recall from chapter 2 that Farber (2015) used similar data to test important theories in labor economics). These data about taxi trips might seem benign because they do not appear to provide information about people, but Anthony Tockar realized that this taxi dataset actually contained a great deal of potentially sensitive information about people. To illustrate, he looked at all trips starting at the Hustler Club, a large strip club in New York, between midnight and 6 a.m. and then found their drop-off locations. This search revealed, in essence, a list of addresses of some people who frequented the Hustler Club (Tockar 2014). It is hard to imagine that the city government had this in mind when it released the data. In fact, this same technique could be used to find the home addresses of people who visit any place in the city: a medical clinic, a government building, or a religious institution.
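The kind of query described above requires very little code. The sketch below is an assumption-laden illustration and not Tockar's actual analysis: the coordinates, timestamps, field names, and the bounding-box radius are all invented. It filters trips that start near one venue between midnight and 6 a.m. and tallies where they end.

```python
from collections import Counter
from datetime import datetime

# Hypothetical venue location and search radius (in degrees).
VENUE = (40.0000, -74.0000)
RADIUS_DEG = 0.0005

# Invented trip records: pickup time, pickup point, drop-off point.
trips = [
    {"pickup_time": "2013-01-05T01:30:00", "pickup": (40.0001, -74.0002), "dropoff": (40.1234, -73.9876)},
    {"pickup_time": "2013-01-12T02:15:00", "pickup": (39.9999, -74.0001), "dropoff": (40.1234, -73.9876)},
    {"pickup_time": "2013-01-12T14:00:00", "pickup": (40.0001, -74.0001), "dropoff": (40.2000, -73.9000)},
    {"pickup_time": "2013-01-13T03:00:00", "pickup": (40.0500, -74.0500), "dropoff": (40.3000, -73.8000)},
]


def is_near(point, center, radius=RADIUS_DEG):
    """Crude bounding-box test: within `radius` degrees on both axes."""
    return abs(point[0] - center[0]) <= radius and abs(point[1] - center[1]) <= radius


def is_late_night(timestamp):
    """True if the ISO-format timestamp falls between midnight and 6 a.m."""
    return 0 <= datetime.fromisoformat(timestamp).hour < 6


def dropoffs_from_venue(trip_records, venue=VENUE):
    """Count drop-off points for late-night trips that start near the venue."""
    return Counter(
        trip["dropoff"]
        for trip in trip_records
        if is_near(trip["pickup"], venue) and is_late_night(trip["pickup_time"])
    )


print(dropoffs_from_venue(trips))
```

In this toy data, two late-night trips from the venue end at the same drop-off point, which is exactly the pattern that could point to a home address. Nothing in the records names a person; the sensitive inference comes entirely from combining place, time, and destination.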

These two cases, the Netflix Prize and the New York City taxi data, show that relatively skilled people can fail to correctly estimate the informational risk in the data that they release, and these cases are by no means unique (Barbaro and Zeller 2006; Zimmer 2010; Narayanan, Huey, and Felten 2016). Further, in many such cases, the problematic data are still freely available online, which shows how difficult it is to ever undo a data release. Collectively, these examples, together with research in computer science about privacy, lead to an important conclusion. Researchers should assume that all data are potentially identifiable and all data are potentially sensitive.

Unfortunately, there is no simple solution to the fact that all data are potentially identifiable and all data are potentially sensitive. However, one way to reduce informational risk while you are working with data is to create and follow a data protection plan. This plan will decrease the chance that your data will leak and will decrease the harm if a leak does somehow occur. The specifics of data protection plans, such as which form of encryption to use, will change over time, but the UK Data Services helpfully organizes the elements of a data protection plan into five categories that they call the five safes: safe projects, safe people, safe settings, safe data, and safe outputs (table 6.2) (Desai, Ritchie, and Welpton 2016). None of the five safes individually provides perfect protection. But together they form a powerful set of factors that can decrease informational risk.

Table 6.2: The "Five Safes" are Principles for Designing and Executing a Data Protection Plan (Desai, Ritchie, and Welpton 2016)
Safe | Action
Safe projects | Limits projects with data to those that are ethical
Safe people | Access is restricted to people who can be trusted with data (e.g., people who have undergone ethical training)
Safe data | Data are de-identified and aggregated to the extent possible
Safe settings | Data are stored in computers with appropriate physical (e.g., locked room) and software (e.g., password protection, encryption) protections
Safe output | Research output is reviewed to prevent accidental privacy breaches

In addition to protecting your data while you are using them, one step in the research process where informational risk is particularly salient is data sharing with other researchers. Data sharing among scientists is a core value of the scientific endeavor, and it greatly facilitates the advancement of knowledge. Here's how the UK House of Commons described the importance of data sharing (Molloy 2011):

"Access to data is fundamental if researchers are to reproduce, verify and build on results that are reported in the literature. The presumption must be that, unless there is a strong reason otherwise, data should be fully disclosed and made publicly available."

Yet, by sharing your data with another researcher, you may be increasing informational risk to your participants. Thus, it may seem that data sharing creates a fundamental tension between the obligation to share data with other scientists and the obligation to minimize informational risk to participants. Fortunately, this dilemma is not as severe as it appears. Rather, it is better to think of data sharing as falling along a continuum, with each point on that continuum providing a different mix of benefits to society and risk to participants (figure 6.6).

At one extreme, you can share your data with no one, which minimizes risk to participants but also minimizes gains to society. At the other extreme, you can release and forget, where data are "anonymized" and posted for everyone. Relative to not releasing data, release and forget offers both higher benefits to society and higher risk to participants. In between these two extremes is a range of hybrids, including what I'll call a walled garden approach. Under this approach, data are shared with people who meet certain criteria and who agree to be bound by certain rules (e.g., oversight from an IRB and a data protection plan). The walled garden approach provides many of the benefits of release and forget with less risk. Of course, such an approach raises many questions (who should have access, under what conditions, and for how long; who should pay to maintain and police the walled garden; and so on), but these are not insurmountable. In fact, there are already working walled gardens that researchers can use right now, such as the data archive of the Inter-university Consortium for Political and Social Research at the University of Michigan.

Figure 6.6: Data release strategies can fall along a continuum. Where you should be on this continuum depends on the specific details of your data, and third-party review may help you decide the appropriate balance of risk and benefit in your case. The exact shape of this curve depends on the specifics of the data and research goals (Goroff 2015).

So, where should the data from your study fall on the continuum of no sharing, walled garden, and release and forget? This depends on the details of your data: researchers must balance Respect for Persons, Beneficence, Justice, and Respect for Law and Public Interest. Viewed from this perspective, data sharing is not a distinctive ethical conundrum; it is just one of the many aspects of research in which researchers have to find an appropriate ethical balance.

Some critics are generally opposed to data sharing because, in my opinion, they are focused on its risks, which are undoubtedly real, and are ignoring its benefits. So, in order to encourage focus on both risks and benefits, I'd like to offer an analogy. Every year, cars are responsible for thousands of deaths, but we do not attempt to ban driving. In fact, a call to ban driving would be absurd because driving enables many wonderful things. Rather, society places restrictions on who can drive (e.g., the need to be a certain age and to have passed certain tests) and how they can drive (e.g., under the speed limit). Society also has people tasked with enforcing these rules (e.g., police), and we punish people who are caught violating them. The same kind of balanced thinking that society applies to regulating driving can also be applied to data sharing. That is, rather than making absolutist arguments for or against data sharing, I think we will make the most progress by focusing on how we can decrease the risks and increase the benefits of data sharing.

To conclude, informational risk has increased dramatically, and it is very hard to predict and quantify. Therefore, it is best to assume that all data are potentially identifiable and potentially sensitive. To decrease informational risk while doing research, researchers can create and follow a data protection plan. Further, informational risk does not prevent researchers from sharing data with other scientists.