Further commentary

This section is designed to be used as a reference, rather than to be read as a narrative.

  • Introduction (Section 2.1)

One kind of observing that is not included in this chapter is ethnography. For more on ethnography in digital spaces, see Boellstorff et al. (2012), and for more on ethnography in mixed digital and physical spaces, see Lane (2016).

  • Big data (Section 2.2)

When you are repurposing data, two mental tricks can help you understand the problems you might encounter. First, you can try to imagine the ideal dataset for your problem and compare it to the dataset that you are using. How are they similar and how are they different? If you didn't collect your data yourself, there are likely to be differences between what you want and what you have. But you have to decide whether these differences are minor or major.

Second, remember that someone created and collected your data for some reason. You should try to understand their reasoning. This kind of reverse-engineering can help you identify possible problems and biases in your repurposed data.

There is no single consensus definition of "big data", but many definitions seem to focus on the 3 Vs: volume, variety, and velocity (e.g., Japec et al. (2015)). Rather than focusing on the characteristics of the data, my definition focuses more on why the data was created.

My inclusion of government administrative data inside the category of big data is a bit unusual. Others who have made this case include Legewie (2015), Connelly et al. (2016), and Einav and Levin (2014). For more about the value of government administrative data for research, see Card et al. (2010), Taskforce (2012), and Grusky, Smeeding, and Snipp (2015).

For a view of administrative research from inside the government statistical system, particularly the US Census Bureau, see Jarmin and O'Hara (2016). For a book-length treatment of administrative records research at Statistics Sweden, see Wallgren and Wallgren (2007).

In the chapter, I briefly compared a traditional survey such as the General Social Survey (GSS) to a social media data source such as Twitter. For a thorough and careful comparison between traditional surveys and social media data, see Schober et al. (2016).

  • Common characteristics of big data (Section 2.3)

These 10 characteristics of big data have been described in a variety of ways by a variety of authors. Writing that influenced my thinking on these issues includes: Lazer et al. (2009), Groves (2011), Howison, Wiggins, and Crowston (2011), boyd and Crawford (2012), Taylor (2013), Mayer-Schönberger and Cukier (2013), Golder and Macy (2014), Ruths and Pfeffer (2014), Tufekci (2014), Sampson and Small (2015), Lewis (2015), Lazer (2015), Horton and Tambe (2015), Japec et al. (2015), and Goldstone and Lupyan (2016).

Throughout this chapter, I've used the term digital traces, which I think is relatively neutral. Another popular term for digital traces is digital footprints (Golder and Macy 2014), but as Hal Abelson, Ken Ledeen, and Harry Lewis (2008) point out, a more appropriate term is probably digital fingerprints. When you create footprints, you are aware of what is happening, and your footprints cannot generally be traced to you personally. The same is not true for your digital traces. In fact, you are leaving traces all the time about which you have very little knowledge. And although these traces don't have your name on them, they can often be linked back to you. In other words, they are more like fingerprints: invisible and personally identifying.

Big

For more on why large datasets render statistical tests problematic, see Lin, Lucas, and Shmueli (2013) and McFarland and McFarland (2015). These issues should lead researchers to focus on practical significance rather than statistical significance.
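
To make the point about practical versus statistical significance concrete, here is a minimal sketch. It assumes a two-sample z-test with a known common standard deviation, and the function names and the numbers (a difference of 0.01 standard deviations) are hypothetical values chosen only for illustration; nothing here comes from the cited papers.

```python
import math

def two_sample_z_p_value(mean_diff, sd, n_per_group):
    """Two-sided p-value for a difference in means between two
    equal-sized groups, assuming a known common standard deviation."""
    standard_error = sd * math.sqrt(2.0 / n_per_group)
    z = mean_diff / standard_error
    # For a standard normal Z, P(|Z| > |z|) equals erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2.0))

def cohens_d(mean_diff, sd):
    """Standardized effect size; it does not depend on sample size."""
    return mean_diff / sd

# Hypothetical illustration: the same tiny effect at three sample sizes.
mean_diff, sd = 0.01, 1.0
for n in (100, 10_000, 1_000_000):
    p = two_sample_z_p_value(mean_diff, sd, n)
    print(f"n per group = {n:>9,}  d = {cohens_d(mean_diff, sd):.2f}  p = {p:.3g}")
```

The effect size stays at 0.01 standard deviations in every row, while the p-value moves from far above 0.05 at 100 observations per group to far below 0.001 at one million. Only the sample size changed, which is why a small p-value in a very large dataset says little about whether an effect matters in practice.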

Always-on

When considering always-on data, it is important to consider whether you are comparing the exact same people over time or whether you are comparing some changing group of people; see, for example, Diaz et al. (2016).

Non-reactive

A classic book on non-reactive measures is Webb et al. (1966). The examples in the book pre-date the digital age, but they are still illuminating. For examples of people changing their behavior because of the presence of mass surveillance, see Penney (2016) and Brayne (2014).

Incomplete

For more on record linkage, see Dunn (1946) and Fellegi and Sunter (1969) (historical) and Larsen and Winkler (2014) (modern). Similar approaches have also been developed in computer science under names such as data deduplication, instance identification, name matching, duplicate detection, and duplicate record detection (Elmagarmid, Ipeirotis, and Verykios 2007). There are also privacy-preserving approaches to record linkage that do not require the transmission of personally identifying information (Schnell 2013). Facebook has also developed a procedure to link its records to voting behavior; this was done to evaluate an experiment that I'll tell you about in Chapter 4 (Bond et al. 2012; Jones et al. 2013).
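
As a rough illustration of what record linkage involves, the sketch below links two small lists of records. It is a deliberately simple stand-in: it blocks on birth year and then applies a string-similarity threshold to names, which is not the probabilistic model of Fellegi and Sunter (1969). The field names, the records, and the 0.85 threshold are all hypothetical.

```python
from difflib import SequenceMatcher

def normalize(name):
    """Lowercase, drop periods and commas, and collapse whitespace."""
    cleaned = name.lower().replace(".", " ").replace(",", " ")
    return " ".join(cleaned.split())

def similarity(name_a, name_b):
    """Similarity in [0, 1] between two normalized names."""
    return SequenceMatcher(None, normalize(name_a), normalize(name_b)).ratio()

def link_records(left, right, threshold=0.85):
    """Map each left record id to its best-matching right record id,
    or to None when no candidate reaches the threshold."""
    links = {}
    for record in left:
        # Blocking: only compare records that share a birth year.
        candidates = [r for r in right if r["birth_year"] == record["birth_year"]]
        best_id, best_score = None, 0.0
        for candidate in candidates:
            score = similarity(record["name"], candidate["name"])
            if score > best_score:
                best_id, best_score = candidate["id"], score
        links[record["id"]] = best_id if best_score >= threshold else None
    return links

# Hypothetical records from two sources describing some of the same people.
survey = [
    {"id": "s1", "name": "Jane A. Doe", "birth_year": 1980},
    {"id": "s2", "name": "Jon Smith", "birth_year": 1975},
    {"id": "s3", "name": "Maria Garcia", "birth_year": 1990},
]
admin = [
    {"id": "a1", "name": "JANE A DOE", "birth_year": 1980},
    {"id": "a2", "name": "John Smith", "birth_year": 1975},
    {"id": "a3", "name": "John Smith", "birth_year": 1962},
]
links = link_records(survey, admin)
print(links)
```

Blocking keeps the number of comparisons manageable and stops "Jon Smith" from being linked to the "John Smith" born in a different year. The similarity threshold tolerates small spelling differences. Real linkage projects typically weigh several fields and estimate match probabilities rather than relying on one fixed cutoff.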

For more on construct validity, see Shadish, Cook, and Campbell (2001), Chapter 3.

Inaccessible

For more on the AOL search log debacle, see Ohm (2010). I offer advice about partnering with companies and governments in Chapter 4, when I describe experiments. A number of authors have expressed concerns about research that relies on inaccessible data; see Huberman (2012) and boyd and Crawford (2012).

One good way for university researchers to acquire data access is to work at a company as an intern or visiting researcher. In addition to enabling data access, this process will also help the researcher learn more about how the data was created, which is important for analysis.

Non-representative

Non-representativeness is a major problem for researchers and governments who wish to make statements about an entire population. It is less of a concern for companies, which are typically focused on their own users. For more on how Statistics Netherlands considers the issue of non-representativeness in business big data, see Buelens et al. (2014).

In Chapter 3, I'll describe sampling and estimation in much greater detail. Even if data are non-representative, under certain conditions they can be weighted to produce good estimates.
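
One common way to weight non-representative data is post-stratification. The sketch below is a minimal illustration, assuming that the population share of each group is known and that, within each group, the people in the sample resemble the people in the population. The function name, group labels, and numbers are hypothetical.

```python
def poststratified_mean(sample, population_shares):
    """Weight each group's sample mean by its known population share.

    sample: list of (group, value) pairs.
    population_shares: dict mapping group -> share of the population
        (assumed to sum to 1).
    """
    totals, counts = {}, {}
    for group, value in sample:
        totals[group] = totals.get(group, 0.0) + value
        counts[group] = counts.get(group, 0) + 1
    missing = [g for g in population_shares if g not in counts]
    if missing:
        # A group with no sample members cannot be reweighted.
        raise ValueError(f"no sample members in groups: {missing}")
    return sum(
        share * totals[group] / counts[group]
        for group, share in population_shares.items()
    )

# Hypothetical sample that over-represents young people:
# 8 of 10 respondents are young, but only 30% of the population is.
sample = [("young", 1)] * 6 + [("young", 0)] * 2 + [("old", 1), ("old", 0)]
population_shares = {"young": 0.3, "old": 0.7}

raw_mean = sum(value for _, value in sample) / len(sample)
weighted_mean = poststratified_mean(sample, population_shares)
print(raw_mean, weighted_mean)
```

In this toy sample the unweighted mean is 0.70, while the post-stratified estimate is 0.575, because the over-represented young group has a higher rate. The estimate is only as good as the within-group assumption: if the old people in the sample differ from old people in the population, weighting will not remove that bias.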

Drifting

System drift is very hard to see from the outside. However, the MovieLens project (discussed more in Chapter 4) has been run for more than 15 years by an academic research group. They have therefore documented and shared information about the way the system has evolved over time and how this might affect analysis (Harper and Konstan 2015).

A number of scholars have focused on drift in Twitter: Liu, Kliman-Silver, and Mislove (2014) and Tufekci (2014).

Algorithmically confounded

I first heard the term "algorithmically confounded" used by Jon Kleinberg in a talk. The main idea behind performativity is that some social science theories are "engines not cameras" (Mackenzie 2008). That is, they actually shape the world rather than just capture it.

Dirty

Governmental statistical agencies call data cleaning "statistical data editing". De Waal, Puts, and Daas (2014) describe statistical data editing techniques developed for survey data and examine to what extent they are applicable to big data sources, and Puts, Daas, and Waal (2015) present some of the same ideas for a more general audience.

For some examples of studies focused on spam in Twitter, see Clark et al. (2016) and Chu et al. (2012). Finally, Subrahmanian et al. (2016) describe the results of the DARPA Twitter Bot Challenge.

Sensitive

Ohm (2015) reviews earlier research on the idea of sensitive information and offers a multi-factor test. The four factors he proposes are: the possibility of harm; the probability of harm; the presence of a confidential relationship; and whether the risk reflects majoritarian concerns.

  • Counting things (Section 2.4.1)

Farber's study of taxis in New York was based on an earlier study by Camerer et al. (1997) that used three different convenience samples of paper trip sheets (paper forms used by drivers to record each trip's start time, end time, and fare). This earlier study found that drivers seemed to be target earners: they worked less on days when their wages were higher.

Kossinets and Watts (2009) focused on the origins of homophily in social networks. See Wimmer and Lewis (2010) for a different approach to the same problem that uses data from Facebook.

In subsequent work, King and colleagues have further explored online censorship in China (King, Pan, and Roberts 2014; King, Pan, and Roberts 2016). For a related approach to measuring online censorship in China, see Bamman, O'Connor, and Smith (2012). For more on statistical methods like the one used in King, Pan, and Roberts (2013) to estimate the sentiment of the 11 million posts, see Hopkins and King (2010). For more on supervised learning, see James et al. (2013) (less technical) and Hastie, Tibshirani, and Friedman (2009) (more technical).

  • Forecasting (Section 2.4.2)

Forecasting is a big part of industrial data science (Mayer-Schönberger and Cukier 2013; Provost and Fawcett 2013). One type of forecasting that is commonly done by social researchers is demographic forecasting; see, for example, Raftery et al. (2012).

Google Flu Trends was not the first project to use search data to nowcast influenza prevalence. In fact, researchers in the United States (Polgreen et al. 2008; Ginsberg et al. 2009) and Sweden (Hulth, Rydevik, and Linde 2009) had found that certain search terms (e.g., "flu") predicted national public health surveillance data before it was released. Subsequently, many other projects have tried to use digital trace data for disease surveillance detection; see Althouse et al. (2015) for a review.

In addition to using digital trace data to predict health outcomes, there has also been a huge amount of work using Twitter data to predict election outcomes; for reviews, see Gayo-Avello (2011), Gayo-Avello (2013), Jungherr (2015) (Ch. 7), and Huberty (2015).

Using search data to predict influenza prevalence and using Twitter data to predict elections are both examples of using some kind of digital trace to predict some kind of event in the world. There are an enormous number of studies that have this general structure. Table 2.5 includes a few other examples.

Table 2.5: Partial list of studies that use some digital trace to predict some event.
Digital trace | Outcome | Citation
Twitter | Box office revenue of movies in the United States | Asur and Huberman (2010)
Search logs | Sales of movies, music, books, and video games in the United States | Goel et al. (2010)
Twitter | Dow Jones Industrial Average (US stock market) | Bollen, Mao, and Zeng (2011)

  • Approximating experiments (Section 2.4.3)

The journal PS Political Science had a symposium on big data, causal inference, and formal theory, and Clark and Golder (2015) summarize each contribution. The journal Proceedings of the National Academy of Sciences of the United States of America had a symposium on causal inference and big data, and Shiffrin (2016) summarizes each contribution.

In terms of natural experiments, Dunning (2012) provides an excellent book-length treatment. For more on using the Vietnam draft lottery as a natural experiment, see Berinsky and Chatfield (2015). For machine learning approaches that attempt to automatically discover natural experiments inside big data sources, see Jensen et al. (2008) and Sharma, Hofman, and Watts (2015).

In terms of matching, for an optimistic review, see Stuart (2010), and for a pessimistic review, see Sekhon (2009). For more on matching as a kind of pruning, see Ho et al. (2007). For books that provide excellent treatments of matching, see Rosenbaum (2002), Rosenbaum (2009), Morgan and Winship (2014), and Imbens and Rubin (2015).
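
The idea of matching as pruning can be sketched with exact matching on a single covariate. This is a simplified illustration, not the procedure from any of the works cited above: it keeps only covariate strata that contain both treated and control units, discards the rest, and averages the within-stratum differences, weighted by the number of treated units. The function name, field names, and data are hypothetical.

```python
from collections import defaultdict
from statistics import mean

def exact_match_att(units, covariates):
    """Estimate the average effect on the treated by exact matching.

    Returns (estimate, number_of_pruned_units).
    """
    strata = defaultdict(lambda: {True: [], False: []})
    for unit in units:
        key = tuple(unit[c] for c in covariates)
        strata[key][bool(unit["treated"])].append(unit["outcome"])

    # Pruning: drop strata that lack either treated or control units.
    matched = [s for s in strata.values() if s[True] and s[False]]
    n_kept = sum(len(s[True]) + len(s[False]) for s in matched)
    n_treated = sum(len(s[True]) for s in matched)
    if n_treated == 0:
        raise ValueError("no stratum contains both treated and control units")

    estimate = sum(
        len(s[True]) * (mean(s[True]) - mean(s[False])) for s in matched
    ) / n_treated
    return estimate, len(units) - n_kept

# Hypothetical data: the "middle" group has no treated units.
units = [
    {"age_group": "young", "treated": True, "outcome": 10},
    {"age_group": "young", "treated": True, "outcome": 12},
    {"age_group": "young", "treated": False, "outcome": 8},
    {"age_group": "old", "treated": True, "outcome": 20},
    {"age_group": "old", "treated": False, "outcome": 18},
    {"age_group": "old", "treated": False, "outcome": 16},
    {"age_group": "middle", "treated": False, "outcome": 50},
]
att, pruned = exact_match_att(units, ["age_group"])
naive = mean(u["outcome"] for u in units if u["treated"]) - mean(
    u["outcome"] for u in units if not u["treated"]
)
print(att, pruned, naive)
```

In this toy data the naive difference in means is negative, because the control group includes an unusual unit with no treated counterpart. After that unit is pruned, the matched estimate is positive. The estimate is credible only if the matched covariates capture everything that drives both treatment and outcome, which is the point of contention between the optimistic and pessimistic reviews.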