5.2.1 Galaxy Zoo

Galaxy Zoo combined the efforts of many non-expert volunteers to classify a million galaxies.

Galaxy Zoo grew out of a problem faced by Kevin Schawinski, a graduate student in astronomy at the University of Oxford, in 2007. Simplifying quite a bit, Schawinski was interested in galaxies, and galaxies can be classified by their morphology (elliptical or spiral) and by their color (blue or red). At the time, the conventional wisdom among astronomers was that spiral galaxies, like our Milky Way, were blue in color (indicating youth) and that elliptical galaxies were red in color (indicating old age). Schawinski doubted this conventional wisdom. He suspected that while this pattern might be true in general, there were probably a sizable number of exceptions, and that by studying lots of these unusual galaxies, the ones that did not fit the expected pattern, he could learn something about the process through which galaxies form.

Thus, what Schawinski needed in order to overturn the conventional wisdom was a large set of morphologically classified galaxies; that is, galaxies that had been classified as either spiral or elliptical. The problem, however, was that existing algorithmic methods for classification were not yet good enough to be used for scientific research; in other words, classifying galaxies was, at that time, a problem that was hard for computers. Therefore, what was needed was a large number of human-classified galaxies. Schawinski undertook this classification problem with the enthusiasm of a graduate student. In a marathon session of seven 12-hour days, he was able to classify 50,000 galaxies. While 50,000 galaxies may sound like a lot, it is actually only about 5% of the almost one million galaxies that had been photographed in the Sloan Digital Sky Survey. Schawinski realized that he needed a more scalable approach.

Fortunately, it turns out that the task of classifying galaxies does not require advanced training in astronomy; you can teach someone to do it pretty quickly. In other words, even though classifying galaxies was a task that was hard for computers, it was pretty easy for humans. So, while sitting in a pub in Oxford, Schawinski and fellow astronomer Chris Lintott dreamed up a website where volunteers would classify images of galaxies. A few months later, Galaxy Zoo was born.

At the Galaxy Zoo website, volunteers would undergo a few minutes of training; for example, learning the difference between a spiral and an elliptical galaxy (Figure 5.2). After this training, each volunteer had to pass a relatively easy quiz, correctly classifying 11 of 15 galaxies with known classifications, and would then begin real classification of unknown galaxies through a simple web-based interface (Figure 5.3). The transition from volunteer to astronomer would take place in less than 10 minutes and only required passing the lowest of hurdles, a simple quiz.

Figure 5.2: Examples of the two main types of galaxies: spiral and elliptical. The Galaxy Zoo project used more than 100,000 volunteers to categorize more than 900,000 images. Source: www.galaxyzoo.org.

Figure 5.3: Input screen where volunteers were asked to classify a single image. Source: www.galaxyzoo.org.

Galaxy Zoo attracted its initial volunteers after the project was featured in a news article, and in about six months the project grew to involve more than 100,000 citizen scientists, people who participated because they enjoyed the task and wanted to help advance astronomy. Together, these 100,000 volunteers contributed a total of more than 40 million classifications, with the majority of the classifications coming from a relatively small, core group of participants (Lintott et al. 2008).

Researchers who have experience hiring undergraduate research assistants might immediately be skeptical about data quality. While this skepticism is reasonable, Galaxy Zoo shows that when volunteer contributions are correctly cleaned, debiased, and aggregated, they can produce high-quality results (Lintott et al. 2008). An important trick for getting the crowd to create professional-quality data is redundancy; that is, having the same task performed by many different people. In Galaxy Zoo, there were about 40 classifications per galaxy. Researchers using undergraduate research assistants could never afford this level of redundancy and therefore would need to be much more concerned with the quality of each individual classification. What the volunteers lacked in training, they made up for with redundancy.

Even with multiple classifications per galaxy, however, combining a set of volunteer classifications to produce a consensus classification is tricky. Because very similar challenges arise in most human computation projects, it is helpful to briefly review the three steps that the Galaxy Zoo researchers used to produce their consensus classifications. First, the researchers "cleaned" the data by removing bogus classifications. For example, people who repeatedly classified the same galaxy, something that would happen if they were trying to manipulate the results, had all their classifications discarded. This and other similar cleaning removed about 4% of all classifications.
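The cleaning rule just described can be sketched in a few lines of Python. This is only an illustration, not the Galaxy Zoo team's actual code: the record layout, the function name `clean_classifications`, and the choice to flag a volunteer after a single repeated galaxy are all assumptions made for the example.

```python
from collections import Counter

def clean_classifications(records):
    """Drop all records from volunteers who classified any galaxy more than once.

    The "more than once" threshold is an assumption for this sketch.
    """
    # Count how often each (volunteer, galaxy) pair appears.
    pair_counts = Counter((r["volunteer"], r["galaxy"]) for r in records)
    # Flag every volunteer with at least one repeated pair.
    flagged = {volunteer for (volunteer, _), n in pair_counts.items() if n > 1}
    # Discard everything the flagged volunteers contributed.
    return [r for r in records if r["volunteer"] not in flagged]

# Hypothetical toy data: volunteer v3 classified galaxy g1 twice.
records = [
    {"volunteer": "v1", "galaxy": "g1", "label": "spiral"},
    {"volunteer": "v2", "galaxy": "g1", "label": "spiral"},
    {"volunteer": "v3", "galaxy": "g1", "label": "elliptical"},
    {"volunteer": "v3", "galaxy": "g1", "label": "elliptical"},
    {"volunteer": "v3", "galaxy": "g2", "label": "elliptical"},
    {"volunteer": "v1", "galaxy": "g2", "label": "spiral"},
]
cleaned = clean_classifications(records)
```

In this toy data, all three of v3's records are removed, including the one for a galaxy that v3 classified only once, which mirrors the idea of discarding everything from a suspect contributor.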

Second, after cleaning, the researchers needed to remove systematic biases in the classifications. Through a series of bias-detection studies embedded within the original project (for example, showing some volunteers the galaxy in monochrome instead of color), the researchers discovered several systematic biases, such as a systematic bias to classify faraway spiral galaxies as elliptical galaxies (Bamford et al. 2009). Adjusting for these systematic biases is extremely important because averaging many contributions does not remove systematic bias; it only removes random error.
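A small simulation can make the difference between random error and systematic bias concrete. It is a sketch under invented assumptions: every simulated galaxy is truly a spiral, a volunteer calls it a spiral with probability 0.7 when only random error is present and with probability 0.4 when a systematic bias pushes votes toward elliptical, and each galaxy receives 41 votes (close to the roughly 40 per galaxy in Galaxy Zoo, and odd so that ties cannot occur). None of these numbers come from the Galaxy Zoo data.

```python
import random

def majority_says_spiral(p_spiral, n_votes, rng):
    """Simulate n_votes independent votes and report whether spiral wins."""
    spiral_votes = sum(rng.random() < p_spiral for _ in range(n_votes))
    return spiral_votes > n_votes / 2

def share_correct(p_spiral, n_votes, n_galaxies, rng):
    """Fraction of simulated spiral galaxies that the majority vote gets right."""
    hits = sum(majority_says_spiral(p_spiral, n_votes, rng) for _ in range(n_galaxies))
    return hits / n_galaxies

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Random error only: a single vote is often wrong, but 41 votes rarely are.
single_vote = share_correct(0.7, 1, 2000, rng)
random_error_only = share_correct(0.7, 41, 2000, rng)

# Systematic bias: more votes make the wrong answer more certain, not less.
with_bias = share_correct(0.4, 41, 2000, rng)
```

With random error alone, redundancy turns mediocre individual votes into a majority that is right for nearly every galaxy. With a systematic bias, the same redundancy produces a majority that is wrong for most galaxies, which is why the debiasing step has to happen separately from aggregation.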

Finally, after debiasing, the researchers needed a method to combine the individual classifications to produce a consensus classification. The simplest way to combine classifications for each galaxy would be to choose the most common classification. However, this approach would give each volunteer equal weight, and the researchers suspected that some volunteers were better at classification than others. Therefore, the researchers developed a more complex iterative weighting procedure that attempts to automatically detect the best classifiers and give them more weight.
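To give a feel for what an iterative weighting procedure might look like, here is a deliberately simple sketch. It is not the procedure used by the Galaxy Zoo researchers, whose details are not given here. It assumes one particular rule: start with equal weights, compute a weighted consensus for each galaxy, reset each volunteer's weight to the share of their votes that agree with that consensus, and repeat. The volunteer names, galaxies, and votes are hypothetical.

```python
from collections import defaultdict

def weighted_consensus(votes, weights):
    """votes: list of (volunteer, galaxy, label). Returns {galaxy: label}."""
    tallies = defaultdict(lambda: defaultdict(float))
    for volunteer, galaxy, label in votes:
        tallies[galaxy][label] += weights[volunteer]
    # Pick the label with the largest total weight (ties broken alphabetically).
    return {g: max(sorted(t), key=t.get) for g, t in tallies.items()}

def iterative_weighting(votes, n_rounds=5):
    """Alternate between computing a consensus and re-weighting volunteers."""
    volunteers = {v for v, _, _ in votes}
    weights = {v: 1.0 for v in volunteers}  # everyone starts equal
    for _ in range(n_rounds):
        consensus = weighted_consensus(votes, weights)
        agree, total = defaultdict(int), defaultdict(int)
        for v, g, label in votes:
            total[v] += 1
            agree[v] += label == consensus[g]
        # New weight: share of a volunteer's votes that match the consensus.
        weights = {v: agree[v] / total[v] for v in volunteers}
    return weighted_consensus(votes, weights), weights

# Hypothetical toy data: two careful volunteers and three error-prone ones.
truth = {"g1": "spiral", "g2": "elliptical", "g3": "spiral", "g4": "elliptical",
         "g5": "spiral", "g6": "elliptical", "g7": "spiral"}
wrong_on = {
    "ann": set(),
    "bo": set(),
    "cy": {"g1", "g2", "g3", "g4", "g7"},
    "di": {"g1", "g2", "g5", "g6", "g7"},
    "ed": {"g3", "g4", "g5", "g6", "g7"},
}
flip = {"spiral": "elliptical", "elliptical": "spiral"}
votes = [(v, g, flip[t] if g in wrong_on[v] else t)
         for v in wrong_on for g, t in truth.items()]

majority = weighted_consensus(votes, {v: 1.0 for v in wrong_on})
consensus, weights = iterative_weighting(votes)
```

In this toy data, simple majority voting mislabels galaxy g7 because the three error-prone volunteers outvote the two careful ones. After a couple of rounds of re-weighting, the careful volunteers carry enough weight to recover the correct label. Real data are much messier, and a rule this simple could just as easily reinforce a shared mistake, which is one reason debiasing comes first.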

Thus, after a three-step process of cleaning, debiasing, and weighting, the Galaxy Zoo research team had converted 40 million volunteer classifications into a set of consensus morphological classifications. When these Galaxy Zoo classifications were compared with three previous smaller-scale attempts by professional astronomers, including the classification by Schawinski that helped to inspire Galaxy Zoo, there was strong agreement. Thus, the volunteers, in aggregate, were able to provide high-quality classifications at a scale that the researchers could not match (Lintott et al. 2008). In fact, by having human classifications for such a large number of galaxies, Schawinski, Lintott, and others were able to show that only about 80% of galaxies follow the expected pattern (blue spirals and red ellipticals), and numerous papers have been written about this discovery (Fortson et al. 2011).

Given this background, we can now see how Galaxy Zoo follows the split-apply-combine recipe, the same recipe that is used for most human computation projects. First, a big problem is split into chunks. In this case, the problem of classifying a million galaxies is split into a million problems of classifying one galaxy. Next, an operation is applied to each chunk independently. In this case, a volunteer classifies each galaxy as either spiral or elliptical. Finally, the results are combined to produce a consensus result. In this case, the combine step included the cleaning, debiasing, and weighting to produce a consensus classification for each galaxy. Even though most projects use this general recipe, each of the steps needs to be customized to the specific problem being addressed. For example, in the human computation project described below, the same recipe will be followed, but the apply and combine steps will be quite different.
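The shape of the recipe is easy to see in code. The sketch below is schematic: the canned answers stand in for real volunteers, the function names are made up for illustration, and the combine step is reduced to a plain majority vote rather than the cleaning, debiasing, and weighting that Galaxy Zoo actually used.

```python
from collections import Counter

# Hypothetical canned answers standing in for real human input.
CANNED_ANSWERS = {
    "g1": ["spiral", "spiral", "elliptical"],
    "g2": ["elliptical", "elliptical", "elliptical"],
    "g3": ["spiral", "elliptical", "spiral"],
}

def split(galaxy_ids):
    # Split: the big problem becomes one small task per galaxy.
    return list(galaxy_ids)

def apply_task(galaxy_id):
    # Apply: each task is handled independently of the others.
    # Here the "volunteers" are just a lookup in the canned answers.
    return galaxy_id, CANNED_ANSWERS[galaxy_id]

def combine(results):
    # Combine: reduce the votes for each galaxy to one consensus label.
    return {g: Counter(votes).most_common(1)[0][0] for g, votes in results}

consensus = combine(apply_task(task) for task in split(CANNED_ANSWERS))
```

Customizing the recipe for a different project mostly means swapping out `apply_task` (what people are asked to do) and `combine` (how their answers are merged), while the overall structure stays the same.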

For the Galaxy Zoo team, this first project was just the beginning. Very quickly they realized that even though they were able to classify close to a million galaxies, this scale is not enough to work with newer digital sky surveys, which could produce images of about 10 billion galaxies (Kuminski et al. 2014). To handle an increase from 1 million to 10 billion, a factor of 10,000, Galaxy Zoo would need to recruit roughly 10,000 times more participants. Even though the number of volunteers on the Internet is large, it is not infinite. Therefore, the researchers realized that if they were going to handle ever-growing amounts of data, a new, even more scalable, approach was needed.

Therefore, Manda Banerji, working with Kevin Schawinski, Chris Lintott, and other members of the Galaxy Zoo team, started teaching computers to classify galaxies. More specifically, using the human classifications created by Galaxy Zoo, Banerji et al. (2010) built a machine learning model that could predict the human classification of a galaxy based on the characteristics of the image. If this machine learning model could reproduce the human classifications with high accuracy, then it could be used by Galaxy Zoo researchers to classify an essentially infinite number of galaxies.

The core of Banerji and colleagues' approach is actually pretty similar to techniques commonly used in social research, although that similarity might not be clear at first glance. First, Banerji and colleagues converted each image into a set of numerical features that summarize its properties. For example, for images of galaxies there could be three features: the amount of blue in the image, the variance in the brightness of the pixels, and the proportion of non-white pixels. The selection of the correct features is an important part of the problem, and it generally requires subject-area expertise. This first step, commonly called feature engineering, results in a data matrix with one row per image and then three columns describing that image. Given the data matrix and the desired output (e.g., whether the image was classified by a human as an elliptical galaxy), the researcher estimates the parameters of a statistical model (for example, something like a logistic regression) that predicts the human classification based on the features of the image. Finally, the researcher uses the parameters in this statistical model to produce estimated classifications of new galaxies (Figure 5.4). To think of a social analog, imagine that you had demographic information about a million students, and you know whether they graduated from college or not. You could fit a logistic regression to this data, and then you could use the resulting model parameters to predict whether new students are going to graduate from college. In machine learning, this approach of using labeled examples to create a statistical model that can then label new data is called supervised learning (Hastie, Tibshirani, and Friedman 2009).
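The toy version of this workflow fits in a short script. What follows is a sketch of the three-feature example with a logistic regression, not a reconstruction of what Banerji et al. (2010) did. The galaxies are synthetic: the function `fake_galaxy` invents feature values using a made-up rule (ellipticals are less blue and have lower brightness variance), and the model is fit with a plain gradient descent loop written from scratch so that no external library is needed.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.5, epochs=1000):
    """Fit logistic regression weights by batch gradient descent.

    The last entry of the returned list is the intercept.
    """
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for row, label in zip(X, y):
            xs = row + [1.0]  # append a constant 1 for the intercept
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, xs))) - label
            for j, xj in enumerate(xs):
                grad[j] += err * xj
        w = [wi - lr * gj / len(X) for wi, gj in zip(w, grad)]
    return w

def predict(w, row):
    """Return 1 (elliptical) or 0 (spiral) for one row of features."""
    return int(sigmoid(sum(wi * xi for wi, xi in zip(w, row + [1.0]))) >= 0.5)

def fake_galaxy(rng, is_elliptical):
    # Invented rule, for illustration only: not real astronomy.
    blue = rng.uniform(0.0, 0.4) if is_elliptical else rng.uniform(0.6, 1.0)
    variance = rng.uniform(0.0, 0.5) if is_elliptical else rng.uniform(0.3, 1.0)
    non_white = rng.uniform(0.2, 0.8)
    return [blue, variance, non_white]

rng = random.Random(0)
labels = [i % 2 for i in range(300)]  # 1 = human said "elliptical"
features = [fake_galaxy(rng, label) for label in labels]  # the data matrix

# Train on 200 human-labeled galaxies, then label 100 "new" ones.
w = fit_logistic(features[:200], labels[:200])
predicted = [predict(w, row) for row in features[200:]]
accuracy = sum(p == t for p, t in zip(predicted, labels[200:])) / 100
```

Each row of `features` is one galaxy and each column is one feature, which is exactly the data matrix described above. Because the synthetic classes were built to be easy to separate, the model does well here; real images would need better features and much more care.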

Figure 5.4: Simplified description of how Banerji et al. (2010) used the Galaxy Zoo classifications to train a machine learning model to do galaxy classification. Images of galaxies were converted into a matrix of features. In this simplified example there are three features (the amount of blue in the image, the variance in the brightness of the pixels, and the proportion of non-white pixels). Then, for a subset of the images, the Galaxy Zoo labels are used to train a machine learning model. Finally, the machine learning model is used to estimate classifications for the remaining galaxies. I call this kind of project a second-generation human computation project because, rather than having humans solve a problem, it has humans build a dataset that can be used to train a computer to solve the problem. The advantage of this computer-assisted approach is that it enables you to handle essentially infinite amounts of data using only a finite amount of human effort.

The features in the machine learning model of Banerji et al. (2010) were more complex than those in my toy example (for example, she used features like "de Vaucouleurs fit axial ratio"), and her model was not a logistic regression; it was an artificial neural network. Using her features, her model, and the consensus Galaxy Zoo classifications, she was able to create weights on each feature and then use these weights to make predictions about the classification of galaxies. For example, her analysis found that images with low "de Vaucouleurs fit axial ratio" were more likely to be spiral galaxies. Given these weights, she was able to predict the human classification of a galaxy with reasonable accuracy.

The work of Banerji et al. (2010) turned Galaxy Zoo into what I would call a second-generation human computation system. The best way to think about these second-generation systems is that rather than having humans solve a problem, they have humans build a dataset that can be used to train a computer to solve the problem. The amount of data needed to train the computer can be so large that it requires a human mass collaboration to create. In the case of Galaxy Zoo, the neural networks used by Banerji et al. (2010) required a very large number of human-labeled examples in order to build a model that was able to reliably reproduce the human classification.

The advantage of this computer-assisted approach is that it enables you to handle essentially infinite amounts of data using only a finite amount of human effort. For example, a researcher with a million human-classified galaxies can build a predictive model that can then be used to classify a billion or even a trillion galaxies. If there are enormous numbers of galaxies, then this kind of human-computer hybrid is really the only possible solution. This infinite scalability is not free, however. Building a machine learning model that can correctly reproduce the human classifications is itself a hard problem, but fortunately there are already excellent books dedicated to this topic (Hastie, Tibshirani, and Friedman 2009; Murphy 2012; James et al. 2013).

Galaxy Zoo shows the evolution of many human computation projects. First, a researcher attempts the project alone or with a small team of research assistants (e.g., Schawinski's initial classification effort). If this approach does not scale well, the researcher can move to a human computation project where many people contribute classifications. But, for a certain volume of data, pure human effort will not be enough. At that point, researchers need to build second-generation systems where human classifications are used to train a machine learning model that can then be applied to virtually unlimited amounts of data.