{"id":684,"date":"2020-09-02T17:55:28","date_gmt":"2020-09-02T17:55:28","guid":{"rendered":"https:\/\/commons.princeton.edu\/remote-ethnography\/?page_id=684"},"modified":"2020-12-17T18:27:44","modified_gmt":"2020-12-17T18:27:44","slug":"collecting-data","status":"publish","type":"page","link":"https:\/\/commons.princeton.edu\/remote-ethnography\/collecting-data\/","title":{"rendered":"Collecting Data"},"content":{"rendered":"<div id=\"panels-ipe-paneid-66106\" class=\"panels-ipe-portlet-wrapper panels-ipe-portlet-marker\">\n<div class=\"panels-ipe-portlet-content\">\n<div class=\"panel-pane pane-fieldable-panels-pane pane-vuuid-7c0ef233-9d13-4146-b26e-a9f3b85a1c5a pane-bundle-text\">\n<h2 class=\"pane-title\">Creating Structured Data Sets for Visualization<\/h2>\n<div class=\"fieldable-panels-pane\"><\/div>\n<\/div>\n<\/div>\n<\/div>\n<div id=\"panels-ipe-paneid-66111\" class=\"panels-ipe-portlet-wrapper panels-ipe-portlet-marker\">\n<div class=\"panels-ipe-portlet-content\">\n<div class=\"panel-pane pane-entity-view pane-node\">\n<div class=\"panelizer-view-mode node node-full node-basic-page node-2366 page-default\">\n<div class=\"panel-display boxton clearfix radix-boxton\">\n<div class=\"container-fluid\">\n<div class=\"row\">\n<div class=\"col-md-12 radix-layouts-content panel-panel\">\n<div class=\"panel-panel-inner\">\n<div class=\"panel-pane pane-entity-view pane-node\">\n<article class=\"node-2366 node node-basic-page view-mode-full node-by-viewer clearfix\">\n<div class=\"field field-name-body field-type-text-with-summary field-label-hidden\">\n<div class=\"field-items\">\n<div class=\"field-item even\">\n<p>These are essential tips to guide you in collecting and generating data sets that you will be able to analyze and visualize using a flexible variety of existing and future software tools and media formats. 
They also apply to <a href=\"https:\/\/commons.princeton.edu\/remote-ethnography\/collecting-mapping-data\/\">collecting data sets for mapping geolocation data<\/a>.<\/p>\n<ul>\n<li><strong>Write your notes using a plain text editor such as TextEdit, or maintain a backup set of your field notes in a plain text format (e.g. .TXT files).<\/strong> Keeping an archive of your notes in a plain text format will enable you to cleanly import your field note data, or copy and paste selected notes, into specialized software and retain them for use in future versions of writing software.<\/li>\n<li><strong>Create or save structured data from spreadsheets in plain formats such as .TSV (tab-separated values) or .CSV (comma-separated values).<\/strong> The formatting issues described above for word processing software apply to structured and tabulated data, too. Thus, exporting and saving your data as .TSV or .CSV files will ensure access to your data from a wider variety of existing and future visualization tools.<\/li>\n<li><strong>Use a consistent and cogent set of categories and units to describe your data<\/strong>\u00a0whether you are collecting measurable or descriptive data.<\/li>\n<li><strong>Ensure that column names are the same for the same types of\u00a0data<\/strong> if you are using multiple worksheets, workbooks or data sets. For example, if the columns \u201cyear of birth,\u201d &#8220;birthday,&#8221; and &#8220;year&#8221; hold the same type of data, they should be given a single name in all your data sets. This consistency\u00a0is especially important for joining and analyzing data sets that are derived from multiple sources or time periods.<\/li>\n<li><strong>Avoid leaving empty spaces in column headers<\/strong> that describe data, because some software requires headers without spaces. 
For example, \u201cage (months)\u201d might be \u201cage_months\u201d; \u201cbirthday\u201d could be \u201cbirthdate\u201d.<\/li>\n<li><strong>Review and clean up tabular data.<\/strong> Whether you collect them or generate them yourself,\u00a0spreadsheet tables may not be ready for accurate visual analysis. Despite the neat gridded layout, data gets messy! Check for errors, obvious outliers, typos and erroneous empty (null) rows. Ensure that columns are formatted as data types that correspond to\u00a0how you will use their data. For example, check that dates and currency are defined as such. Change numbers to text for columns whose numbers are not measurable quantities to be computed, such as zip codes or medical codes. Remove any pre-aggregated data that is not part of the raw data itself, such as totals or sub-totals\u00a0that contain sums, averages, counts, etc. Remove introductory text such as titles or legends which might\u00a0appear apart from your\u00a0column headers, and flatten any sub-headers by creating new columns for the major headers in the hierarchy. 
Finally, remove blank rows;\u00a0check where whitespace\u00a0may appear in your headers and data; trim leading and trailing whitespace and collapse consecutive whitespace characters.\u00a0<a href=\"http:\/\/openrefine.org\/\" data-extlink=\"\">OpenRefine<\/a> is an excellent free tool for managing these issues and for cleaning and organizing datasets before you import\u00a0them into\u00a0visualization or mapping software.<\/li>\n<li><strong>Keep a record of your data sources<\/strong> and record the last time the data set was collected, edited or published, and when you accessed or generated the data.<\/li>\n<li>You can <strong>extract tabulated data from PDFs<\/strong> and save it as CSV tables using <a href=\"http:\/\/tabula.technology\/\" data-extlink=\"\">Tabula<\/a>.<\/li>\n<\/ul>\n<\/div>\n<\/div>\n<\/div>\n<\/article>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>Creating Structured Data Sets for Visualization These are essential tips to guide you in collecting and generating data sets that you will be able to analyze and visualize using a flexible variety of existing and future software tools and media formats. They also apply to collecting data sets for mapping geolocation data. 
Write your notes [&hellip;]<\/p>\n","protected":false},"author":142,"featured_media":0,"parent":0,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-684","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/commons.princeton.edu\/remote-ethnography\/wp-json\/wp\/v2\/pages\/684","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/commons.princeton.edu\/remote-ethnography\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/commons.princeton.edu\/remote-ethnography\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/commons.princeton.edu\/remote-ethnography\/wp-json\/wp\/v2\/users\/142"}],"replies":[{"embeddable":true,"href":"https:\/\/commons.princeton.edu\/remote-ethnography\/wp-json\/wp\/v2\/comments?post=684"}],"version-history":[{"count":3,"href":"https:\/\/commons.princeton.edu\/remote-ethnography\/wp-json\/wp\/v2\/pages\/684\/revisions"}],"predecessor-version":[{"id":805,"href":"https:\/\/commons.princeton.edu\/remote-ethnography\/wp-json\/wp\/v2\/pages\/684\/revisions\/805"}],"wp:attachment":[{"href":"https:\/\/commons.princeton.edu\/remote-ethnography\/wp-json\/wp\/v2\/media?parent=684"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}