{"id":4496,"date":"2020-11-02T18:02:41","date_gmt":"2020-11-02T23:02:41","guid":{"rendered":"https:\/\/commons.princeton.edu\/ant347-f20\/?p=4496"},"modified":"2020-11-02T18:20:03","modified_gmt":"2020-11-02T23:20:03","slug":"variation-in-data-availability","status":"publish","type":"post","link":"https:\/\/commons.princeton.edu\/ant347-f20\/variation-in-data-availability\/","title":{"rendered":"Variation in data availability"},"content":{"rendered":"<p class=\"p1\">When it comes to user profiles\u2014ad preferences, interests, taste profiles\u2014the relevant data are usually available in their relatively \u201craw\u201d state. These could be individual, atomic Google searches, YouTube videos watched, songs played, or locations visited.<\/p>\n<p class=\"p1\">What\u2019s usually <i>not<\/i> directly available:<\/p>\n<ul class=\"ul1\">\n<li class=\"li1\">Mid-level representations: histograms, frequencies, time-series plots\n<ul class=\"ul1\">\n<li class=\"li1\">Exceptions:\n<ul class=\"ul1\">\n<li class=\"li1\">Facebook allowed me to see my \u201cinterests,\u201d the content tags that determine my news feed. These are probably a slight aggregation of \u201craw\u201d clicks and interactions, matched to predetermined tags (see Facebook screenshot).<\/li>\n<li class=\"li1\">Spotify generates mid-level personalized content, such as the \u201cOn Repeat\u201d playlist, which lists your 30 most played songs from the last 30 days. This is likely just a very small portion of all the mid-level representational information computed by Spotify (see Spotify screenshot).<\/li>\n<li class=\"li1\">Usage statistics (phone, browser, etc.) 
are usually directly available in histogram form.<\/li>\n<\/ul>\n<\/li>\n<li class=\"li1\">Generating mid-level representations from available \u201craw\u201d data would take some work: in most cases it would be technically possible, but impractical for most users.\n<ul class=\"ul1\">\n<li class=\"li1\">Sometimes, third-party applications exist to do this for some types of data: for example, I used a Chrome extension to produce a representation of my browsing history (see browsing history screenshot).<\/li>\n<li>Without third-party assistance, generating mid-level representations (or even downloading the data in a form that allows manipulation) is likely too demanding for most users who may be interested in them.<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<li class=\"li1\">High-level, abstract, latent representations: membership in the groups by which advertisers and other data-buyers understand us. These are only available to an entity (like a social media company) with access to many user profiles, which can be jointly analyzed for latent variation.\n<ul class=\"ul1\">\n<li class=\"li1\">Obtaining this information may be within the realm of theoretical possibility for a very motivated individual using a site such as Twitter with large amounts of public, scrapeable data, but decidedly outside of the realm of possibility for the vast majority.<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<div id=\"attachment_4497\" style=\"width: 255px\" class=\"wp-caption alignnone\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-4497\" class=\"size-medium wp-image-4497\" src=\"https:\/\/commons.princeton.edu\/ant347-f20\/wp-content\/uploads\/sites\/221\/2020\/11\/Screen-Shot-2020-11-02-at-5.47.43-PM-245x300.png\" alt=\"\" width=\"245\" height=\"300\" srcset=\"https:\/\/commons.princeton.edu\/ant347-f20\/wp-content\/uploads\/sites\/221\/2020\/11\/Screen-Shot-2020-11-02-at-5.47.43-PM-245x300.png 245w, 
https:\/\/commons.princeton.edu\/ant347-f20\/wp-content\/uploads\/sites\/221\/2020\/11\/Screen-Shot-2020-11-02-at-5.47.43-PM-838x1024.png 838w, https:\/\/commons.princeton.edu\/ant347-f20\/wp-content\/uploads\/sites\/221\/2020\/11\/Screen-Shot-2020-11-02-at-5.47.43-PM-768x939.png 768w, https:\/\/commons.princeton.edu\/ant347-f20\/wp-content\/uploads\/sites\/221\/2020\/11\/Screen-Shot-2020-11-02-at-5.47.43-PM.png 1240w\" sizes=\"auto, (max-width: 245px) 100vw, 245px\" \/><p id=\"caption-attachment-4497\" class=\"wp-caption-text\">Some of my interests, (likely) based on Facebook interactions, used for ad targeting<\/p><\/div>\n<div id=\"attachment_4498\" style=\"width: 310px\" class=\"wp-caption alignnone\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-4498\" class=\"size-medium wp-image-4498\" src=\"https:\/\/commons.princeton.edu\/ant347-f20\/wp-content\/uploads\/sites\/221\/2020\/11\/on_repeat-300x300.jpg\" alt=\"\" width=\"300\" height=\"300\" srcset=\"https:\/\/commons.princeton.edu\/ant347-f20\/wp-content\/uploads\/sites\/221\/2020\/11\/on_repeat-300x300.jpg 300w, https:\/\/commons.princeton.edu\/ant347-f20\/wp-content\/uploads\/sites\/221\/2020\/11\/on_repeat-150x150.jpg 150w, https:\/\/commons.princeton.edu\/ant347-f20\/wp-content\/uploads\/sites\/221\/2020\/11\/on_repeat-768x768.jpg 768w, https:\/\/commons.princeton.edu\/ant347-f20\/wp-content\/uploads\/sites\/221\/2020\/11\/on_repeat.jpg 1000w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><p id=\"caption-attachment-4498\" class=\"wp-caption-text\">Most of my &#8220;On Repeat&#8221; playlist based on the last 30 days of Spotify use, part of the listening data used to recommend new music<\/p><\/div>\n<div id=\"attachment_4499\" style=\"width: 310px\" class=\"wp-caption alignnone\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-4499\" class=\"size-medium wp-image-4499\" 
src=\"https:\/\/commons.princeton.edu\/ant347-f20\/wp-content\/uploads\/sites\/221\/2020\/11\/Screen-Shot-2020-11-02-at-3.34.50-PM-300x244.png\" alt=\"\" width=\"300\" height=\"244\" srcset=\"https:\/\/commons.princeton.edu\/ant347-f20\/wp-content\/uploads\/sites\/221\/2020\/11\/Screen-Shot-2020-11-02-at-3.34.50-PM-300x244.png 300w, https:\/\/commons.princeton.edu\/ant347-f20\/wp-content\/uploads\/sites\/221\/2020\/11\/Screen-Shot-2020-11-02-at-3.34.50-PM-1024x832.png 1024w, https:\/\/commons.princeton.edu\/ant347-f20\/wp-content\/uploads\/sites\/221\/2020\/11\/Screen-Shot-2020-11-02-at-3.34.50-PM-768x624.png 768w, https:\/\/commons.princeton.edu\/ant347-f20\/wp-content\/uploads\/sites\/221\/2020\/11\/Screen-Shot-2020-11-02-at-3.34.50-PM-1536x1248.png 1536w, https:\/\/commons.princeton.edu\/ant347-f20\/wp-content\/uploads\/sites\/221\/2020\/11\/Screen-Shot-2020-11-02-at-3.34.50-PM.png 1942w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><p id=\"caption-attachment-4499\" class=\"wp-caption-text\">My last 3 months of browsing history, visualized by day and by time of day; also frequency tallied by website on the left. 
Unclear exactly what it is used for<\/p><\/div>\n<div id=\"attachment_4501\" style=\"width: 310px\" class=\"wp-caption alignnone\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-4501\" class=\"size-medium wp-image-4501\" src=\"https:\/\/commons.princeton.edu\/ant347-f20\/wp-content\/uploads\/sites\/221\/2020\/11\/Screen-Shot-2020-11-02-at-4.04.54-PM-300x142.png\" alt=\"\" width=\"300\" height=\"142\" srcset=\"https:\/\/commons.princeton.edu\/ant347-f20\/wp-content\/uploads\/sites\/221\/2020\/11\/Screen-Shot-2020-11-02-at-4.04.54-PM-300x142.png 300w, https:\/\/commons.princeton.edu\/ant347-f20\/wp-content\/uploads\/sites\/221\/2020\/11\/Screen-Shot-2020-11-02-at-4.04.54-PM-1024x484.png 1024w, https:\/\/commons.princeton.edu\/ant347-f20\/wp-content\/uploads\/sites\/221\/2020\/11\/Screen-Shot-2020-11-02-at-4.04.54-PM-768x363.png 768w, https:\/\/commons.princeton.edu\/ant347-f20\/wp-content\/uploads\/sites\/221\/2020\/11\/Screen-Shot-2020-11-02-at-4.04.54-PM-1536x726.png 1536w, https:\/\/commons.princeton.edu\/ant347-f20\/wp-content\/uploads\/sites\/221\/2020\/11\/Screen-Shot-2020-11-02-at-4.04.54-PM-2048x968.png 2048w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><p id=\"caption-attachment-4501\" class=\"wp-caption-text\">My location on Friday, as tracked by Facebook, which purportedly uses this data as part of its ad targeting<\/p><\/div>\n<p class=\"p1\">Opacity of data usage<\/p>\n<ul class=\"ul1\">\n<li class=\"li1\">Most opaque: location tracking, performed by Facebook and Google (by default), neither of which exhaustively discloses how that data is used.<\/li>\n<li class=\"li1\">Least opaque: probably taste profiles like those constructed by YouTube and Spotify, both of which openly use algorithms to recommend new content to users.<\/li>\n<\/ul>\n<p class=\"p1\">Feedback<\/p>\n<ul class=\"ul1\">\n<li class=\"li1\">The most obvious case of data collection with feedback is in Facebook, Spotify, and YouTube, where the generation 
of preferences and the consumption based on those preferences occur in a very tight feedback loop, on the scale of individual units of content consumed.<\/li>\n<li class=\"li1\">The feedback loop involving browsing history is much less clear, and likely occurs on a much larger scale, both in terms of time and in terms of user base. The output of Google\u2019s webpage-ranking algorithm, for example, evolves based on links between webpages and trends in searching across its billions of users.<\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>When it comes to user profiles\u2014ad preferences, interests, taste profiles\u2014the relevant data are usually available in their relatively \u201craw\u201d state. These could be individual, atomic Google searches, YouTube videos watched, songs played, or locations visited. What\u2019s usually not directly available: Mid-level representations: histograms, frequencies, time-series plots Exceptions: Facebook allowed me to see my \u201cinterests,\u201d the content 
[&hellip;]<\/p>\n","protected":false},"author":2977,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[14],"tags":[],"class_list":["post-4496","post","type-post","status-publish","format-standard","hentry","category-data"],"_links":{"self":[{"href":"https:\/\/commons.princeton.edu\/ant347-f20\/wp-json\/wp\/v2\/posts\/4496","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/commons.princeton.edu\/ant347-f20\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/commons.princeton.edu\/ant347-f20\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/commons.princeton.edu\/ant347-f20\/wp-json\/wp\/v2\/users\/2977"}],"replies":[{"embeddable":true,"href":"https:\/\/commons.princeton.edu\/ant347-f20\/wp-json\/wp\/v2\/comments?post=4496"}],"version-history":[{"count":5,"href":"https:\/\/commons.princeton.edu\/ant347-f20\/wp-json\/wp\/v2\/posts\/4496\/revisions"}],"predecessor-version":[{"id":4509,"href":"https:\/\/commons.princeton.edu\/ant347-f20\/wp-json\/wp\/v2\/posts\/4496\/revisions\/4509"}],"wp:attachment":[{"href":"https:\/\/commons.princeton.edu\/ant347-f20\/wp-json\/wp\/v2\/media?parent=4496"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/commons.princeton.edu\/ant347-f20\/wp-json\/wp\/v2\/categories?post=4496"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/commons.princeton.edu\/ant347-f20\/wp-json\/wp\/v2\/tags?post=4496"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}