<?xml version="1.0" encoding="utf-8" standalone="yes" ?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>climate-change | Automated Data Observatories</title>
    <link>/tag/climate-change/</link>
      <atom:link href="/tag/climate-change/index.xml" rel="self" type="application/rss+xml" />
    <description>climate-change</description>
    <generator>Wowchemy (https://wowchemy.com)</generator><language>en-us</language><copyright>© 2020-2021 Daniel Antal</copyright><lastBuildDate>Wed, 07 Jul 2021 00:00:00 +0000</lastBuildDate>
    <image>
      <url>/media/icon_hub7eb2fbae5fdd7bfeda5a9178a9e4f33_23448_512x512_fill_lanczos_center_2.png</url>
      <title>climate-change</title>
      <link>/tag/climate-change/</link>
    </image>
    
    <item>
      <title>Green Deal Data Observatory</title>
      <link>/observatories/greendeal/</link>
      <pubDate>Wed, 07 Jul 2021 00:00:00 +0000</pubDate>
      <guid>/observatories/greendeal/</guid>
      <description>&lt;p&gt;&lt;strong&gt;Reliable historical and new data about climate change, as well as about the impact of the various European Green Deal policies that try to mitigate it, is surprisingly hard to find if you are a scientific researcher. It is even harder if you work as a (data) journalist, as a policy researcher in an NGO, or in the sustainability unit of a company that does not provide you with an army of (geo)statisticians, data engineers, and data scientists who can render raw data into a usable format, i.e. something that you can trust, quote, visualize, import, or copy &amp;amp; paste.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;
  &lt;i class=&#34;fas fa-globe  pr-1 fa-fw&#34;&gt;&lt;/i&gt; &lt;a href=&#34;https://greendeal.dataobservatory.eu/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Visit the Green Deal Data Observatory&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;
  &lt;i class=&#34;fas fa-database  pr-1 fa-fw&#34;&gt;&lt;/i&gt; &lt;a href=&#34;https://api.greendeal.dataobservatory.eu/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Try the Green Deal Data Observatory API&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;
  &lt;i class=&#34;fab fa-linkedin  pr-1 fa-fw&#34;&gt;&lt;/i&gt; &lt;a href=&#34;https://www.linkedin.com/company/78562153/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Connect on LinkedIn&lt;/a&gt;&lt;/p&gt;
&lt;h2 id=&#34;better-bigger-faster-more&#34;&gt;Better, Bigger, Faster, More&lt;/h2&gt;
&lt;table&gt;
&lt;colgroup&gt;
&lt;col style=&#34;width: 25%&#34; /&gt;
&lt;col style=&#34;width: 75%&#34; /&gt;
&lt;/colgroup&gt;
&lt;tbody&gt;
&lt;tr class=&#34;odd&#34;&gt;
&lt;td style=&#34;text-align: center;&#34;&gt;













&lt;figure  id=&#34;figure-novel-data-products&#34;&gt;
  &lt;div class=&#34;d-flex justify-content-center&#34;&gt;
    &lt;div class=&#34;w-100&#34; &gt;&lt;img src=&#34;/media/img/blogposts_2021/global_problem_1_climate_change_5_plots.png&#34; alt=&#34;**Novel data products**
&#34; loading=&#34;lazy&#34; data-zoomable /&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;figcaption&gt;
      &lt;strong&gt;Novel data products&lt;/strong&gt;&lt;br&gt;
    &lt;/figcaption&gt;&lt;/figure&gt;&lt;/td&gt;
&lt;td style=&#34;text-align: left;&#34;&gt;Official statistics at the national and European levels follow legal regulations, and in the EU, compromises between member states. New policy indicators often appear only 5-10 years after the demand for them arises. We employ the same methodology, software, and often even the same data that Eurostat might use to develop policy indicators, but we do not have to wait for a political and legal consensus to create new datasets. See our &lt;a href=&#34;https://greendeal.dataobservatory.eu/post/2021-11-19_global_problem/&#34; target = &#34;_blank&#34;&gt;100,000 Opinions on the Most Pressing Global Problem&lt;/a&gt; blogpost.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class=&#34;odd&#34;&gt;
&lt;td style=&#34;text-align: center;&#34;&gt;













&lt;figure  id=&#34;figure-better-data&#34;&gt;
  &lt;div class=&#34;d-flex justify-content-center&#34;&gt;
    &lt;div class=&#34;w-100&#34; &gt;&lt;img src=&#34;/media/img/blogposts_2021/noaa-WWVD4wXRX38-unsplash-edited.png&#34; alt=&#34;**Better data**
&#34; loading=&#34;lazy&#34; data-zoomable /&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;figcaption&gt;
      &lt;strong&gt;Better data&lt;/strong&gt;&lt;br&gt;
    &lt;/figcaption&gt;&lt;/figure&gt;&lt;/td&gt;
&lt;td style=&#34;text-align: left;&#34;&gt;Statistical agencies, old-fashioned observatories, and data providers often do not have the mandate, know-how, or resources to improve data quality. Using peer-reviewed statistical software and hundreds of computational tests, we are able to correct mistakes, impute missing data, generate forecasts, and increase the information content of public data by 20-200%. This makes the data usable for NGOs, journalists, and visual artists—among other potential users—who do not have the statistical know-how to make incomplete, mislabelled, or low-quality data usable for their needs and applications. See our example with the &lt;a href=&#34;https://greendeal.dataobservatory.eu/post/2021-11-08-indicator_value_added/&#34; target = &#34;_blank&#34;&gt;Government Budget Allocations for R&amp;D in Environment&lt;/a&gt; indicator.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class=&#34;odd&#34;&gt;
&lt;td style=&#34;text-align: center;&#34;&gt;













&lt;figure  id=&#34;figure-never-seen-data&#34;&gt;
  &lt;div class=&#34;d-flex justify-content-center&#34;&gt;
    &lt;div class=&#34;w-100&#34; &gt;&lt;img src=&#34;/media/img/blogposts_2021/Gold_panning_at_Bonanza_Creek_4x6.png&#34; alt=&#34;**Never seen data**
&#34; loading=&#34;lazy&#34; data-zoomable /&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;figcaption&gt;
      &lt;strong&gt;Never seen data&lt;/strong&gt;&lt;br&gt;
    &lt;/figcaption&gt;&lt;/figure&gt;&lt;/td&gt;
&lt;td style=&#34;text-align: left;&#34;&gt;The &lt;a href=&#34;https://eur-lex.europa.eu/eli/dir/2019/1024/oj&#34; target = &#34;_blank&#34;&gt;2019/1024 directive&lt;/a&gt; on &lt;i&gt;open data and the re-use of public sector information&lt;/i&gt; of the European Union (an extension and modernization of the earlier directives on the &lt;i&gt;re-use of public sector information&lt;/i&gt; in force since 2003) makes data gathered by EU institutions, national institutions, and municipalities, as well as by state-owned companies, legally available. According to the &lt;a href=&#34;https://data.europa.eu/sites/default/files/edp_creating_value_through_open_data_0.pdf&#34; target = &#34;_blank&#34;&gt;European Data Portal&lt;/a&gt; the estimated historical cost of the data released annually is in the billions of euros. But if this data is a gold mine, its full potential can only be unlocked by an experienced data mining partner like Reprex. Here is why: the data is not readily downloadable; it sits in various obsolete file formats in disorganized databases; it is documented in various languages, or not documented at all; and it is plagued with various processing errors. We make the powerful promise of the EU’s &lt;a href=&#34;http://dataobservatory.eu/post/2021-06-18-gold-without-rush/&#34; target = &#34;_blank&#34;&gt;open data&lt;/a&gt; legislation a reality in the Green Deal policy context.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;h2 id=&#34;increase-your-impact-avoid-old-mistakes&#34;&gt;Increase Your Impact, Avoid Old Mistakes&lt;/h2&gt;
&lt;p&gt;Reprex helps its policy, business, and scientific partners by providing efficient solutions for necessary data engineering, data processing and statistical tasks that are as complex as they are tedious to perform. We deploy validated, open-source, peer-reviewed scientific software to create up-to-date, reliable, high-quality, and immediately usable data and visualizations. Our partners can leave the burden of this task, share the cost of data processing, and concentrate on what they do best: disseminating and advocating, researching, or setting sustainable business or underwriting indicators and creating early warning systems.&lt;/p&gt;
&lt;table&gt;
&lt;colgroup&gt;
&lt;col style=&#34;width: 25%&#34; /&gt;
&lt;col style=&#34;width: 75%&#34; /&gt;
&lt;/colgroup&gt;
&lt;tbody&gt;
&lt;tr class=&#34;odd&#34;&gt;
&lt;td style=&#34;text-align: center;&#34;&gt;













&lt;figure  id=&#34;figure-impact&#34;&gt;
  &lt;div class=&#34;d-flex justify-content-center&#34;&gt;
    &lt;div class=&#34;w-100&#34; &gt;&lt;img src=&#34;/media/img/blogposts_2021/zenodo_global_problem_1_climate_change.png&#34; alt=&#34;**Impact**
&#34; loading=&#34;lazy&#34; data-zoomable /&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;figcaption&gt;
      &lt;strong&gt;Impact&lt;/strong&gt;&lt;br&gt;
    &lt;/figcaption&gt;&lt;/figure&gt;&lt;/td&gt;
&lt;td style=&#34;text-align: left;&#34;&gt;We publish the data in a way that makes it easy to find: as a separate data publication with a DOI and full library metadata, placed in open science repositories. Our data is more findable than 99% of open science data, and therefore has a far bigger impact. See our data on the European open science repository &lt;a href=&#34;https://zenodo.org/record/5658849#.YbM_K73MLIU/&#34; target = &#34;_blank&#34;&gt;Zenodo&lt;/a&gt; managed by CERN (the European Organization for Nuclear Research).&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class=&#34;odd&#34;&gt;
&lt;td style=&#34;text-align: center;&#34;&gt;













&lt;figure  id=&#34;figure-easy-to-use-data&#34;&gt;
  &lt;div class=&#34;d-flex justify-content-center&#34;&gt;
    &lt;div class=&#34;w-100&#34; &gt;&lt;img src=&#34;/media/img/blogposts_2021/Sisyphus_Bodleian_Library.png&#34; alt=&#34;**Easy-to-use data**
&#34; loading=&#34;lazy&#34; data-zoomable /&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;figcaption&gt;
      &lt;strong&gt;Easy-to-use data&lt;/strong&gt;&lt;br&gt;
    &lt;/figcaption&gt;&lt;/figure&gt;&lt;/td&gt;
&lt;td style=&#34;text-align: left;&#34;&gt;Our data follows the &lt;i&gt;tidy data principle&lt;/i&gt; and comes with all the &lt;a href=&#34;https://greendeal.dataobservatory.eu/post/2021-07-08-data-sisyphus/&#34; target = &#34;_blank&#34;&gt;recommended Dublin Core and DataCite metadata&lt;/a&gt;. This increases compatibility, allowing users to open the data in any spreadsheet application or import it into their databases. We publish the data in tabular form, and in JSON form through our API, enabling automatic retrieval for heavy users, especially those who plan to use our data in daily or weekly updates. Following best practice in data formatting and documentation with metadata ensures reproducibility and data integrity, and spares users from repeating data processing and preparation steps (e.g. changing data formats, removing unwanted characters, and creating documentation) that take up thousands of working hours. See our blogpost on the &lt;a href=&#34;https://greendeal.dataobservatory.eu/post/2021-07-08-data-sisyphus/&#34; target = &#34;_blank&#34;&gt;data Sisyphus&lt;/a&gt;.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;h2 id=&#34;ethical-big-data-for-all&#34;&gt;Ethical Big Data for All&lt;/h2&gt;
&lt;p&gt;Big data creates inequalities, because only the largest corporations, government bureaucracies and best endowed universities can afford large data collection programs, the use of satellites, and the employment of many data scientists. Our open collaboration method of data pooling and cost sharing makes big data available for all.&lt;/p&gt;
&lt;table&gt;
&lt;colgroup&gt;
&lt;col style=&#34;width: 25%&#34; /&gt;
&lt;col style=&#34;width: 75%&#34; /&gt;
&lt;/colgroup&gt;
&lt;tbody&gt;
&lt;tr class=&#34;odd&#34;&gt;
&lt;td style=&#34;text-align: center;&#34;&gt;













&lt;figure  id=&#34;figure-big-picture&#34;&gt;
  &lt;div class=&#34;d-flex justify-content-center&#34;&gt;
    &lt;div class=&#34;w-100&#34; &gt;&lt;img src=&#34;/media/img/blogposts_2021/belgium_problem_maps.png&#34; alt=&#34;**Big picture**
&#34; loading=&#34;lazy&#34; data-zoomable /&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;figcaption&gt;
      &lt;strong&gt;Big picture&lt;/strong&gt;&lt;br&gt;
    &lt;/figcaption&gt;&lt;/figure&gt;&lt;/td&gt;
&lt;td style=&#34;text-align: left;&#34;&gt;Integrating and joining data is hard—it requires engineering, mathematical, and geo-statistical know-how that many environmental data users and stakeholders do not possess. Some examples of the challenges implicit in making data usable include addressing the changing boundaries of French departments (and European administrative-geographic borders in general), various projections of coordinates on satellite images of land cover, different measurement areas for public opinion and hydrological data, and public finance expressed in different orders of magnitude (e.g. millions versus thousands of euros). We create data that is easy to combine, map, and visualize for end users. See our case study on the severity and awareness of &lt;a href=&#34;https://greendeal.dataobservatory.eu/post/2021-04-23-belgium-flood-insurance/&#34; target = &#34;_blank&#34;&gt;flood risk in Belgium&lt;/a&gt;, as well as the financial capacity to manage it.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class=&#34;odd&#34;&gt;
&lt;td style=&#34;text-align: center;&#34;&gt;













&lt;figure  id=&#34;figure-ethical-trustworthy-ai&#34;&gt;
  &lt;div class=&#34;d-flex justify-content-center&#34;&gt;
    &lt;div class=&#34;w-100&#34; &gt;&lt;img src=&#34;/media/img/blogposts_2021/firing_squad.png&#34; alt=&#34;**Ethical, Trustworthy AI**
&#34; loading=&#34;lazy&#34; data-zoomable /&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;figcaption&gt;
      &lt;strong&gt;Ethical, Trustworthy AI&lt;/strong&gt;&lt;br&gt;
    &lt;/figcaption&gt;&lt;/figure&gt;&lt;/td&gt;
&lt;td style=&#34;text-align: left;&#34;&gt;AI in 2021 increases data inequalities, because large government and corporate entities with an army of data engineers can create proprietary, black-box business algorithms that fundamentally alter our lives. We are involved in the R&amp;D and advocacy of the EU’s trustworthy AI agenda, which aims at protections similar to those the GDPR provides for privacy. We want to demystify AI by making it available for organizations that cannot finance a data engineering team, because 95% of a successful AI is cheap, complete, reliable data tested for negative outcomes – precisely what &lt;a href=&#34;https://dataandlyrics.com/post/2021-05-16-recommendation-outcomes/&#34; target = &#34;_blank&#34;&gt;we offer&lt;/a&gt; to our users.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;h2 id=&#34;open-collaboration&#34;&gt;Open Collaboration&lt;/h2&gt;
&lt;p&gt;&lt;a href=&#34;https://reprex.nl/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Reprex&lt;/a&gt; grew out of an international data collaboration and works in the open-source world. We use an agile, open collaboration method that allows us to work with large corporations, NGOs, developers, university research institutes, and individuals on an equal footing.&lt;/p&gt;
&lt;p&gt;Find us on &lt;a href=&#34;https://www.linkedin.com/company/78562153/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;LinkedIn&lt;/a&gt; or send us an &lt;a href=&#34;https://reprex.nl/#contact&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;email&lt;/a&gt;.&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Regional Geocoding Harmonization Case Study - Regional Climate Change Awareness Datasets</title>
      <link>/post/2021-03-06-regions-climate/</link>
      <pubDate>Sat, 06 Mar 2021 00:00:00 +0000</pubDate>
      <guid>/post/2021-03-06-regions-climate/</guid>
      <description>&lt;pre&gt;&lt;code&gt;library(regions)
library(lubridate)
library(dplyr)

if ( dir.exists(&#39;data-raw&#39;) ) {
  data_raw_dir &amp;lt;- &amp;quot;data-raw&amp;quot;
} else {
  data_raw_dir &amp;lt;- file.path(&amp;quot;..&amp;quot;, &amp;quot;..&amp;quot;, &amp;quot;data-raw&amp;quot;)
  }
&lt;/code&gt;&lt;/pre&gt;
&lt;h2 id=&#34;going-beyond-the-national-level&#34;&gt;Going beyond the national level&lt;/h2&gt;
&lt;p&gt;Let’s start with a dirty averaging by sub-national unit. The &lt;code&gt;w1&lt;/code&gt;
weighting variable contains the post-stratification weight for the
national samples. The Eurobarometer samples represent nations (with the
exception of East and West Germany, Northern Ireland, and Great Britain).
The average of the &lt;code&gt;w1&lt;/code&gt; variable is 1.00 for each national sample,
but it is not necessarily 1 for smaller territorial units. If
&lt;code&gt;mean(w1)&amp;gt;1&lt;/code&gt; for, say, &lt;code&gt;AT23&lt;/code&gt;, it only means
that the &lt;code&gt;AT23&lt;/code&gt; region was undersampled relative
to the rest of Austria, and its responses must be over-weighted in
post-stratification.&lt;/p&gt;
&lt;p&gt;There is no way to make the samples regionally representative,
and a correct post-stratification would require further data about the
sample design. But we can simply adjust for over- or undersampling by
making sure that oversampled territorial averages are proportionally
increased and undersampled ones are decreased. [Another ‘dirty’ averaging
would be the use of an unweighted average, but our method is better:
it more or less adjusts for gender and education-level biases, though it
leaves intra-country regional biases in the sample.]&lt;/p&gt;
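&lt;p&gt;The adjustment can be illustrated with a toy example. The region codes and numbers below are made up for illustration; only the scaling logic (dividing the weighted regional mean by the regional mean of &lt;code&gt;w1&lt;/code&gt;) mirrors the chunk that follows.&lt;/p&gt;

```r
# Toy illustration of the over/undersampling adjustment (made-up numbers).
# climate: 1 = respondent named climate change the most serious problem.
region  = c("AT23", "AT23", "AT13", "AT13")
w1      = c(1.25, 1.15, 0.85, 0.75)  # national post-stratification weights
climate = c(1, 0, 1, 1)

weighted_mean = tapply(w1 * climate, region, mean)
mean_w1       = tapply(w1, region, mean)

# dividing by the regional mean of w1 corrects for over/undersampling
adjusted = weighted_mean / mean_w1
round(adjusted, 3)
```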
&lt;pre&gt;&lt;code&gt;panel &amp;lt;- readRDS((file.path(data_raw_dir, &amp;quot;climate-panel.rds&amp;quot;)))

climate_data &amp;lt;-  panel %&amp;gt;%
  mutate ( year = lubridate::year(date_of_interview)) %&amp;gt;%
  select ( all_of(c(&amp;quot;isocntry&amp;quot;, &amp;quot;geo&amp;quot;, &amp;quot;w1&amp;quot;)), 
           contains(&amp;quot;problem&amp;quot;)
  )  %&amp;gt;%
  mutate ( 
    # use the post-stratification weights for national samples
    serious_world_problems_first = w1*serious_world_problems_first , 
    serious_world_problems_climate_change = w1*serious_world_problems_climate_change) %&amp;gt;%
  group_by (  .data$geo ) %&amp;gt;%
  summarise( serious_world_problems_first = mean(serious_world_problems_first, na.rm=TRUE),
             serious_world_problems_climate_change = mean (serious_world_problems_climate_change, na.rm=TRUE),
             mean_w1 = mean(w1)
             ) %&amp;gt;%
  mutate ( 
    # adjust for post-stratification weight bias due to regional over/undersampling
    climate_first = serious_world_problems_first / mean_w1, 
    climate_mentioned = serious_world_problems_climate_change / mean_w1
    ) 
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;So far we have averaged, weighted, and adjusted the share of respondents
mentioning climate change as the world’s most serious problem, or as one
of its most serious problems, by NUTS region.&lt;/p&gt;
&lt;h2 id=&#34;aggregation-level&#34;&gt;Aggregation level&lt;/h2&gt;
&lt;p&gt;The problem is that most statistical data is available for the NUTS
regional boundaries according to the &lt;code&gt;NUTS2016&lt;/code&gt; definition. However,
GESIS uses &lt;code&gt;NUTS2013&lt;/code&gt; regions, so 252 regional codes in the four survey
waves are invalid. Some data is available only at the national level, but it
can be projected to the regional level, because small countries like
Luxembourg have no regional divisions. Larger countries like Germany are
divided only at the state (&lt;code&gt;NUTS1&lt;/code&gt;) level, while small countries are
divided at the &lt;code&gt;NUTS3&lt;/code&gt; level.&lt;/p&gt;
&lt;p&gt;This leads to various problems. Much data is available only at the &lt;code&gt;NUTS2&lt;/code&gt;
level, in which case &lt;code&gt;NUTS1&lt;/code&gt; data should be projected down to its constituent,
smaller &lt;code&gt;NUTS2&lt;/code&gt; regions, and &lt;code&gt;NUTS3&lt;/code&gt; level data must be aggregated up to
the larger, containing &lt;code&gt;NUTS2&lt;/code&gt; regions.&lt;/p&gt;
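&lt;p&gt;A minimal sketch of the two operations, using toy German region codes and made-up values (not part of the original analysis): projecting a &lt;code&gt;NUTS1&lt;/code&gt; value down to its &lt;code&gt;NUTS2&lt;/code&gt; constituents, and summing additive &lt;code&gt;NUTS3&lt;/code&gt; values up to their parent &lt;code&gt;NUTS2&lt;/code&gt; region.&lt;/p&gt;

```r
# A NUTS1 value that must be projected down to the NUTS2 level.
nuts1 = data.frame(geo = "DE1", value = 10)  # made-up value

# NUTS2 codes extend the parent NUTS1 code by one character, so the
# NUTS1 value is simply copied down to each constituent region.
nuts2_codes = c("DE11", "DE12", "DE13", "DE14")
projected = data.frame(geo   = nuts2_codes,
                       value = rep(nuts1$value, length(nuts2_codes)))

# NUTS3 codes extend the parent NUTS2 code by one character; additive
# indicators can be summed up to the containing NUTS2 region.
nuts3 = data.frame(geo   = c("DE111", "DE112", "DE121"),
                   value = c(3, 4, 5))
nuts3$parent = substr(nuts3$geo, 1, 4)
aggregated = aggregate(value ~ parent, data = nuts3, FUN = sum)
aggregated
```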
&lt;p&gt;Of course, we must also choose whether to use &lt;code&gt;NUTS2013&lt;/code&gt; or &lt;code&gt;NUTS2016&lt;/code&gt;
boundaries. Sub-national boundaries have changed many thousands of times in
the EU27 countries alone since 1999.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;## # A tibble: 5 x 2
##   validate         n
##   &amp;lt;chr&amp;gt;        &amp;lt;int&amp;gt;
## 1 country         15
## 2 invalid        252
## 3 nuts_level_1   132
## 4 nuts_level_2   452
## 5 nuts_level_3   141
&lt;/code&gt;&lt;/pre&gt;
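&lt;p&gt;The tally above classifies each geographical code by its NUTS typology or flags it as invalid. A self-contained sketch of the idea (the codes and the stand-in vocabulary below are made up; the real check, as performed by the &lt;code&gt;regions&lt;/code&gt; package, uses the official NUTS code lists):&lt;/p&gt;

```r
# Classify geo codes by length (2 chars = country, 3 = NUTS1, 4 = NUTS2,
# 5 = NUTS3), then flag codes missing from the vocabulary as invalid.
geo = c("AT", "AT1", "AT13", "AT130", "XX99")  # made-up examples
typology = c("country", "nuts_level_1", "nuts_level_2",
             "nuts_level_3")[nchar(geo) - 1]

# made-up stand-in for the official NUTS code list
valid_codes = c("AT", "AT1", "AT13", "AT130")
typology[!(geo %in% valid_codes)] = "invalid"

table(typology)
```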
&lt;h2 id=&#34;recoding-the-regions&#34;&gt;Recoding the Regions&lt;/h2&gt;
&lt;p&gt;Our &lt;code&gt;regions&lt;/code&gt; package was designed to keep track of sub-national regional
boundary changes. It can validate regional data codes and, to some
extent, carry out recoding, imputation, or simple aggregation.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Recoding means that the boundaries are unchanged, but the country
changed the names/codes of regions, because there were other
boundary changes which did not affect our observation unit.&lt;/li&gt;
&lt;li&gt;Imputation must not be done with the usual, general imputation tools,
because our data is regionally structured. However, some imputations
are very simple, because we can use equality relations like &lt;code&gt;MT&lt;/code&gt; =
&lt;code&gt;MT0&lt;/code&gt; = &lt;code&gt;MT00&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Often the boundary change is additive, and merged territorial units
can simply be aggregated for comparison with earlier data.&lt;/li&gt;
&lt;/ul&gt;
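&lt;p&gt;The Malta example from the list above, as a toy sketch (the indicator value is made up): since the country has no internal divisions, the national value can be copied to every NUTS level.&lt;/p&gt;

```r
# MT = MT0 = MT00: a national value holds at every NUTS level for Malta.
national = data.frame(geo = "MT", value = 42)  # made-up indicator value

imputed = data.frame(geo   = c("MT", "MT0", "MT00"),
                     level = 0:2,
                     value = rep(national$value, 3))
imputed
```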
&lt;!-- --&gt;
&lt;pre&gt;&lt;code&gt;regional_coding_2016 &amp;lt;- panel %&amp;gt;%
  mutate ( year = lubridate::year(date_of_interview)) %&amp;gt;%
  select (  all_of(c(&amp;quot;isocntry&amp;quot;, &amp;quot;geo&amp;quot;, &amp;quot;region&amp;quot;, &amp;quot;year&amp;quot;) ) ) %&amp;gt;%
  distinct_all() %&amp;gt;%
  recode_nuts()

regional_coding_2013 &amp;lt;- panel %&amp;gt;%
  mutate ( year = lubridate::year(date_of_interview)) %&amp;gt;%
  select (  all_of(c(&amp;quot;isocntry&amp;quot;, &amp;quot;geo&amp;quot;, &amp;quot;region&amp;quot;, &amp;quot;year&amp;quot;) ) ) %&amp;gt;%
  distinct_all() %&amp;gt;%
  recode_nuts( nuts_year = 2013)

climate_data_recoded &amp;lt;- climate_data %&amp;gt;% 
  left_join ( regional_coding_2016, by = &#39;geo&#39; ) %&amp;gt;%
  left_join ( regional_coding_2013 %&amp;gt;% 
                select ( all_of(c(&amp;quot;geo&amp;quot;, &amp;quot;code_2013&amp;quot;))), 
              by = &amp;quot;geo&amp;quot;) %&amp;gt;%
  distinct_all()

saveRDS ( climate_data_recoded , file.path(tempdir(), &amp;quot;climate_panel_recoded_agr.rds&amp;quot;), version = 2)

# not evaluated
saveRDS( climate_data_recoded , file = file.path(&amp;quot;data-raw&amp;quot;, &amp;quot;climate_panel_recoded_agr.rds&amp;quot;))
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;https://netzero.dataobservatory.eu/media/gif/eu_climate_change.gif&#34; alt=&#34;&#34;&gt;&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Where Are People More Likely To Treat Climate Change as the Most Serious Global Problem?</title>
      <link>/post/2021-03-06-individual-join/</link>
      <pubDate>Sat, 06 Mar 2021 00:00:00 +0000</pubDate>
      <guid>/post/2021-03-06-individual-join/</guid>
      <description>&lt;pre&gt;&lt;code&gt;library(regions)
library(lubridate)
library(dplyr)

if ( dir.exists(&#39;data-raw&#39;) ) {
  data_raw_dir &amp;lt;- &amp;quot;data-raw&amp;quot;
} else {
  data_raw_dir &amp;lt;- file.path(&amp;quot;..&amp;quot;, &amp;quot;..&amp;quot;, &amp;quot;data-raw&amp;quot;)
  }
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The first results of our longitudinal table &lt;a href=&#34;post/2021-03-05-retroharmonize-climate/&#34;&gt;were difficult to
map&lt;/a&gt;, because the surveys used
an obsolete regional coding. We will correct the wrong coding, where
possible, and join the data with the European Environment Agency’s (EEA)
Air Quality e-Reporting (AQ e-Reporting) data on environmental
pollution. We recoded the annual-level values for every available reporting
station [&lt;em&gt;not shown here&lt;/em&gt;]; all values are in μg/m3. The period
under observation is 2014-2016. Data file:
&lt;a href=&#34;https://www.eea.europa.eu/data-and-maps/data/aqereporting-8&#34;&gt;https://www.eea.europa.eu/data-and-maps/data/aqereporting-8&lt;/a&gt; (European
Environment Agency 2021).&lt;/p&gt;
&lt;h2 id=&#34;recoding-the-regions&#34;&gt;Recoding the Regions&lt;/h2&gt;
&lt;p&gt;Recoding means that the boundaries are unchanged, but the country
changed the names and codes of regions because there were other boundary
changes which did not affect our observation unit. We explain the
problem and the solution in greater detail in &lt;a href=&#34;http://netzero.dataobservatory.eu/post/2021-03-06-regions-climate/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;our
tutorial&lt;/a&gt;
that aggregates the data on regional levels.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;panel &amp;lt;- readRDS((file.path(data_raw_dir, &amp;quot;climate-panel.rds&amp;quot;)))

climate_data_geocode &amp;lt;-  panel %&amp;gt;%
  mutate ( year = lubridate::year(date_of_interview)) %&amp;gt;%
  recode_nuts()
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Let’s join the air pollution data by the corrected geocodes:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;load(file.path(&amp;quot;data&amp;quot;, &amp;quot;air_pollutants.rda&amp;quot;)) ## good practice to use system-independent file.path

climate_awareness_air &amp;lt;- climate_data_geocode %&amp;gt;%
  rename ( region_nuts_codes  = .data$code_2016) %&amp;gt;%
  left_join ( air_pollutants, by = &amp;quot;region_nuts_codes&amp;quot; ) %&amp;gt;%
  select ( -all_of(c(&amp;quot;w1&amp;quot;, &amp;quot;wex&amp;quot;, &amp;quot;date_of_interview&amp;quot;, 
                     &amp;quot;typology&amp;quot;, &amp;quot;typology_change&amp;quot;, &amp;quot;geo&amp;quot;, &amp;quot;region&amp;quot;))) %&amp;gt;%
  mutate (
    # remove special labels and create NA_numeric_ 
    age_education = retroharmonize::as_numeric(age_education)) %&amp;gt;%
  mutate_if ( is.character, as.factor) %&amp;gt;%
  mutate ( 
    # we only have responses from 4 years, and this should be treated as a categorical variable
    year = as.factor(year) 
    ) %&amp;gt;%
  filter ( complete.cases(.) ) 
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The &lt;code&gt;climate_awareness_air&lt;/code&gt; data frame contains the answers of 75086
individual respondents. 17.07% thought that climate change was the most
serious world problem and 33.6% mentioned climate change as one of the
three most important global problems.&lt;/p&gt;
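&lt;p&gt;The headline shares are simply the means of the two 0/1 indicator columns, e.g. &lt;code&gt;mean(climate_awareness_air$serious_world_problems_first)&lt;/code&gt;. A toy illustration with made-up responses:&lt;/p&gt;

```r
# 1 = the respondent named climate change the most serious world problem.
responses = c(1, 0, 0, 1, 0, 0, 0, 0, 0, 0)  # made-up toy sample

# the mean of a 0/1 dummy is the share of respondents answering 1
share = mean(responses)
round(100 * share, 2)  # share in percent
```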
&lt;pre&gt;&lt;code&gt;summary ( climate_awareness_air  )

##                  rowid       serious_world_problems_first
##  ZA5877_v2-0-0_1    :    1   Min.   :0.0000              
##  ZA5877_v2-0-0_10   :    1   1st Qu.:0.0000              
##  ZA5877_v2-0-0_100  :    1   Median :0.0000              
##  ZA5877_v2-0-0_1000 :    1   Mean   :0.1707              
##  ZA5877_v2-0-0_10000:    1   3rd Qu.:0.0000              
##  ZA5877_v2-0-0_10001:    1   Max.   :1.0000              
##  (Other)            :75080                               
##  serious_world_problems_climate_change    isocntry    
##  Min.   :0.000                         BE     : 3028  
##  1st Qu.:0.000                         CZ     : 3023  
##  Median :0.000                         NL     : 3019  
##  Mean   :0.336                         SK     : 3000  
##  3rd Qu.:1.000                         SE     : 2980  
##  Max.   :1.000                         DE-W   : 2978  
##                                        (Other):57058  
##                                    marital_status         age_education  
##  (Re-)Married: without children           :13242   18            :15485  
##  (Re-)Married: children this marriage     :12696   19            : 7728  
##  Single: without children                 : 7650   16            : 5840  
##  (Re-)Married: w children of this marriage: 6520   still studying: 5098  
##  (Re-)Married: living without children    : 6225   17            : 5092  
##  Single: living without children          : 4102   15            : 4528  
##  (Other)                                  :24651   (Other)       :31315  
##    age_exact                      occupation_of_respondent
##  Min.   :15.0   Retired, unable to work       :22911      
##  1st Qu.:36.0   Skilled manual worker         : 6774      
##  Median :51.0   Employed position, at desk    : 6716      
##  Mean   :50.1   Employed position, service job: 5624      
##  3rd Qu.:65.0   Middle management, etc.       : 5252      
##  Max.   :99.0   Student                       : 5098      
##                 (Other)                       :22711      
##             occupation_of_respondent_recoded
##  Employed (10-18 in d15a)   :32763          
##  Not working (1-4 in d15a)  :37125          
##  Self-employed (5-9 in d15a): 5198          
##                                             
##                                             
##                                             
##                                             
##                        respondent_occupation_scale_c_14
##  Retired (4 in d15a)                   :22911          
##  Manual workers (15 to 18 in d15a)     :15269          
##  Other white collars (13 or 14 in d15a): 9203          
##  Managers (10 to 12 in d15a)           : 8291          
##  Self-employed (5 to 9 in d15a)        : 5198          
##  Students (2 in d15a)                  : 5098          
##  (Other)                               : 9116          
##                   type_of_community   is_student      no_education     
##  DK                        :   34   Min.   :0.0000   Min.   :0.000000  
##  Large town                :20939   1st Qu.:0.0000   1st Qu.:0.000000  
##  Rural area or village     :24686   Median :0.0000   Median :0.000000  
##  Small or middle sized town: 9850   Mean   :0.0679   Mean   :0.008151  
##  Small/middle town         :19577   3rd Qu.:0.0000   3rd Qu.:0.000000  
##                                     Max.   :1.0000   Max.   :1.000000  
##                                                                        
##    education       year       region_nuts_codes  country_code  
##  Min.   :14.00   2013:25103   LU     : 1432     DE     : 4531  
##  1st Qu.:17.00   2015:    0   MT     : 1398     GB     : 3538  
##  Median :18.00   2017:25053   CY     : 1192     BE     : 3028  
##  Mean   :19.61   2019:24930   SK02   : 1053     CZ     : 3023  
##  3rd Qu.:22.00                EL30   :  974     NL     : 3019  
##  Max.   :30.00                EE     :  973     SK     : 3000  
##                               (Other):68064     (Other):54947  
##      pm2_5             pm10               o3              BaP        
##  Min.   : 2.109   Min.   :  5.883   Min.   : 66.37   Min.   :0.0102  
##  1st Qu.: 9.374   1st Qu.: 28.326   1st Qu.: 90.89   1st Qu.:0.1779  
##  Median :11.866   Median : 33.673   Median :102.81   Median :0.4105  
##  Mean   :12.954   Mean   : 38.637   Mean   :101.49   Mean   :0.8759  
##  3rd Qu.:15.890   3rd Qu.: 49.488   3rd Qu.:110.73   3rd Qu.:1.0692  
##  Max.   :41.293   Max.   :123.239   Max.   :141.04   Max.   :7.8050  
##                                                                      
##       so2              ap_pc1            ap_pc2             ap_pc3       
##  Min.   : 0.0000   Min.   :-4.6669   Min.   :-2.21851   Min.   :-2.1007  
##  1st Qu.: 0.0000   1st Qu.:-0.4624   1st Qu.:-0.49130   1st Qu.:-0.5695  
##  Median : 0.0000   Median : 0.4263   Median : 0.02902   Median :-0.1113  
##  Mean   : 0.1032   Mean   : 0.1031   Mean   : 0.04166   Mean   :-0.1746  
##  3rd Qu.: 0.0000   3rd Qu.: 0.9748   3rd Qu.: 0.57416   3rd Qu.: 0.3309  
##  Max.   :42.5325   Max.   : 2.0344   Max.   : 3.25841   Max.   : 4.1615  
##                                                                          
##      ap_pc4            ap_pc5        
##  Min.   :-1.7387   Min.   :-2.75079  
##  1st Qu.:-0.1669   1st Qu.:-0.18748  
##  Median : 0.0371   Median : 0.01811  
##  Mean   : 0.1154   Mean   : 0.06797  
##  3rd Qu.: 0.3050   3rd Qu.: 0.34937  
##  Max.   : 3.2476   Max.   : 1.42816  
## 
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Let’s see a simple CART tree! We remove the regional codes, because
there are very large differences in climate awareness among regions.
These differences, together with education level and the survey year,
are the most important predictors of naming climate change as the most
important global problem in Europe.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Classification Tree with rpart
library(rpart)

# grow tree
fit &amp;lt;- rpart(as.factor(serious_world_problems_first) ~ .,
   method=&amp;quot;class&amp;quot;, data=climate_awareness_air %&amp;gt;%
     select ( - all_of(c(&amp;quot;rowid&amp;quot;, &amp;quot;region_nuts_codes&amp;quot;))), 
   control = rpart.control(cp = 0.005))

printcp(fit) # display the results

## 
## Classification tree:
## rpart(formula = as.factor(serious_world_problems_first) ~ ., 
##     data = climate_awareness_air %&amp;gt;% select(-all_of(c(&amp;quot;rowid&amp;quot;, 
##         &amp;quot;region_nuts_codes&amp;quot;))), method = &amp;quot;class&amp;quot;, control = rpart.control(cp = 0.005))
## 
## Variables actually used in tree construction:
## [1] age_education                         isocntry                             
## [3] serious_world_problems_climate_change year                                 
## 
## Root node error: 12817/75086 = 0.1707
## 
## n= 75086 
## 
##          CP nsplit rel error  xerror      xstd
## 1 0.0240566      0   1.00000 1.00000 0.0080438
## 2 0.0082703      3   0.92783 0.92783 0.0078055
## 3 0.0050000      5   0.91129 0.91425 0.0077588

plotcp(fit) # visualize cross-validation results
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;rpart-1.png&#34; alt=&#34;&amp;ldquo;Visualize cross-validation results&amp;rdquo;&#34;&gt;&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;summary(fit) # detailed summary of splits

## Call:
## rpart(formula = as.factor(serious_world_problems_first) ~ ., 
##     data = climate_awareness_air %&amp;gt;% select(-all_of(c(&amp;quot;rowid&amp;quot;, 
##         &amp;quot;region_nuts_codes&amp;quot;))), method = &amp;quot;class&amp;quot;, control = rpart.control(cp = 0.005))
##   n= 75086 
## 
##            CP nsplit rel error    xerror        xstd
## 1 0.024056592      0 1.0000000 1.0000000 0.008043837
## 2 0.008270266      3 0.9278302 0.9278302 0.007805478
## 3 0.005000000      5 0.9112897 0.9142545 0.007758824
## 
## Variable importance
## serious_world_problems_climate_change                              isocntry 
##                                    31                                    26 
##                          country_code                                   BaP 
##                                    20                                     8 
##                                 pm2_5                                ap_pc1 
##                                     4                                     3 
##                         age_education                                  pm10 
##                                     2                                     2 
##                             education                                ap_pc2 
##                                     2                                     1 
##                                  year 
##                                     1 
## 
## Node number 1: 75086 observations,    complexity param=0.02405659
##   predicted class=0  expected loss=0.1706976  P(node) =1
##     class counts: 62269 12817
##    probabilities: 0.829 0.171 
##   left son=2 (25229 obs) right son=3 (49857 obs)
##   Primary splits:
##       serious_world_problems_climate_change &amp;lt; 0.5          to the right, improve=2214.2040, (0 missing)
##       isocntry                              splits as  RRLLLRRRLLRLRLLLLLLLLLLRRLLLRLL, improve= 728.0160, (0 missing)
##       country_code                          splits as  RRLLLRRLLRLLLLLLLLLLRRLLLRLL, improve= 673.3656, (0 missing)
##       BaP                                   &amp;lt; 0.4300347    to the right, improve= 310.6229, (0 missing)
##       pm2_5                                 &amp;lt; 13.38264     to the right, improve= 296.4013, (0 missing)
##   Surrogate splits:
##       age_education splits as  ----RRRRRR-RRRRRRRRRR-RRRRRRRRRR-RRRRRRRRRR-RRRRRRRRRR-RRRRRL-RRR-RRRRRRRRR--RRRLLR--R-R, agree=0.664, adj=0, (0 split)
##       pm10          &amp;lt; 7.491315     to the left,  agree=0.664, adj=0, (0 split)
## 
## Node number 2: 25229 observations
##   predicted class=0  expected loss=0  P(node) =0.3360014
##     class counts: 25229     0
##    probabilities: 1.000 0.000 
## 
## Node number 3: 49857 observations,    complexity param=0.02405659
##   predicted class=0  expected loss=0.2570752  P(node) =0.6639986
##     class counts: 37040 12817
##    probabilities: 0.743 0.257 
##   left son=6 (34631 obs) right son=7 (15226 obs)
##   Primary splits:
##       isocntry     splits as  RRLLLRRRLLRLRLLLLLLLLLLRRLLLRLL, improve=1454.9460, (0 missing)
##       country_code splits as  RRLLLRRLLRLLLLLLLLLLRRLLLRLL, improve=1359.7210, (0 missing)
##       BaP          &amp;lt; 0.4300347    to the right, improve= 629.8844, (0 missing)
##       pm2_5        &amp;lt; 13.38264     to the right, improve= 555.7484, (0 missing)
##       ap_pc1       &amp;lt; -0.005459537 to the left,  improve= 533.3579, (0 missing)
##   Surrogate splits:
##       country_code splits as  RRLLLRRLLRLLLLLLLLLLRRLLLRLL, agree=0.987, adj=0.957, (0 split)
##       BaP          &amp;lt; 0.1749425    to the right, agree=0.775, adj=0.264, (0 split)
##       pm2_5        &amp;lt; 5.206993     to the right, agree=0.737, adj=0.140, (0 split)
##       ap_pc1       &amp;lt; 1.405527     to the left,  agree=0.733, adj=0.126, (0 split)
##       pm10         &amp;lt; 25.31211     to the right, agree=0.718, adj=0.076, (0 split)
## 
## Node number 6: 34631 observations
##   predicted class=0  expected loss=0.1769802  P(node) =0.4612178
##     class counts: 28502  6129
##    probabilities: 0.823 0.177 
## 
## Node number 7: 15226 observations,    complexity param=0.02405659
##   predicted class=0  expected loss=0.4392487  P(node) =0.2027808
##     class counts:  8538  6688
##    probabilities: 0.561 0.439 
##   left son=14 (11607 obs) right son=15 (3619 obs)
##   Primary splits:
##       isocntry      splits as  LL---LLR--L-L----------LL---R--, improve=337.5462, (0 missing)
##       country_code  splits as  LL---LR--L-L--------LL---R--, improve=337.5462, (0 missing)
##       age_education splits as  ----LLLLLL-LLLRRRRRRR-RRRRRRRRRL-RRRRRRLLRR-RRRRLLRLRL-RRLRRR-RRR-LLLLRRR-----LR-----L-R, improve=294.0807, (0 missing)
##       education     &amp;lt; 22.5         to the left,  improve=262.3747, (0 missing)
##       BaP           &amp;lt; 0.053328     to the right, improve=232.7043, (0 missing)
##   Surrogate splits:
##       BaP           &amp;lt; 0.053328     to the right, agree=0.878, adj=0.485, (0 split)
##       pm2_5         &amp;lt; 4.810361     to the right, agree=0.827, adj=0.271, (0 split)
##       ap_pc2        &amp;lt; 0.8746175    to the left,  agree=0.792, adj=0.124, (0 split)
##       so2           &amp;lt; 0.3302972    to the left,  agree=0.781, adj=0.078, (0 split)
##       age_education splits as  ----LLLLLL-LLLLLLLRLR-LRRLRRRRRR-RRRRLLLLLR-LRLRLLRRLL-LLRLLR-LLR-RRLLLLL-----RR-----R-L, agree=0.779, adj=0.071, (0 split)
## 
## Node number 14: 11607 observations,    complexity param=0.008270266
##   predicted class=0  expected loss=0.3804601  P(node) =0.1545827
##     class counts:  7191  4416
##    probabilities: 0.620 0.380 
##   left son=28 (7462 obs) right son=29 (4145 obs)
##   Primary splits:
##       age_education                    splits as  ----LLLLLL-LRRRRRRRRR-RRLRRLRRLL-RRRRLRLLRR-RLRLLLRLRL-RR-RR--RRL-L-LLRRR------------L-R, improve=123.71070, (0 missing)
##       year                             splits as  R-LR, improve=107.79460, (0 missing)
##       education                        &amp;lt; 20.5         to the left,  improve= 90.28724, (0 missing)
##       occupation_of_respondent         splits as  LRRLRRRRRLRLLLRLLL, improve= 84.62865, (0 missing)
##       respondent_occupation_scale_c_14 splits as  LRLLLRRL, improve= 68.88653, (0 missing)
##   Surrogate splits:
##       education                        &amp;lt; 20.5         to the left,  agree=0.950, adj=0.861, (0 split)
##       occupation_of_respondent         splits as  LLLLRLLRRLRLLLRLLL, agree=0.738, adj=0.267, (0 split)
##       respondent_occupation_scale_c_14 splits as  LRLLLLRL, agree=0.733, adj=0.251, (0 split)
##       is_student                       &amp;lt; 0.5          to the left,  agree=0.709, adj=0.186, (0 split)
##       age_exact                        &amp;lt; 23.5         to the right, agree=0.676, adj=0.094, (0 split)
## 
## Node number 15: 3619 observations
##   predicted class=1  expected loss=0.3722023  P(node) =0.04819807
##     class counts:  1347  2272
##    probabilities: 0.372 0.628 
## 
## Node number 28: 7462 observations
##   predicted class=0  expected loss=0.326052  P(node) =0.09937938
##     class counts:  5029  2433
##    probabilities: 0.674 0.326 
## 
## Node number 29: 4145 observations,    complexity param=0.008270266
##   predicted class=0  expected loss=0.4784077  P(node) =0.05520337
##     class counts:  2162  1983
##    probabilities: 0.522 0.478 
##   left son=58 (2573 obs) right son=59 (1572 obs)
##   Primary splits:
##       year                     splits as  L-LR, improve=40.13885, (0 missing)
##       occupation_of_respondent splits as  LRLLRRRRRLRLLLRLLL, improve=18.33254, (0 missing)
##       marital_status           splits as  LRRRLRRRLRRLRLLRRRRRRLRLRLLRR, improve=17.86888, (0 missing)
##       type_of_community        splits as  LRLRL, improve=17.55254, (0 missing)
##       age_education            splits as  ------------LLRRRRRRR-RR-RL-RR---LRRR-R--LR-R-R---R-R--RR-RR--RR------RRR--------------R, improve=14.66121, (0 missing)
##   Surrogate splits:
##       type_of_community splits as  LLLRL, agree=0.777, adj=0.412, (0 split)
##       marital_status    splits as  RRLLLLLRLLLLLLLRRRLLLLLLRLRLL, agree=0.680, adj=0.155, (0 split)
##       isocntry          splits as  LL---LL---L-R----------LL------, agree=0.669, adj=0.127, (0 split)
##       country_code      splits as  LL---L---L-R--------LL------, agree=0.669, adj=0.127, (0 split)
##       o3                &amp;lt; 83.06345     to the right, agree=0.650, adj=0.076, (0 split)
## 
## Node number 58: 2573 observations
##   predicted class=0  expected loss=0.4240187  P(node) =0.03426737
##     class counts:  1482  1091
##    probabilities: 0.576 0.424 
## 
## Node number 59: 1572 observations
##   predicted class=1  expected loss=0.43257  P(node) =0.02093599
##     class counts:   680   892
##    probabilities: 0.433 0.567

# plot tree
plot(fit, uniform=TRUE,
   main=&amp;quot;Classification Tree: Climate Change Is The Most Serious Threat&amp;quot;)
text(fit, use.n=TRUE, all=TRUE, cex=.8)

## Warning in labels.rpart(x, minlength = minlength): more than 52 levels in a
## predicting factor, truncated for printout
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;rpart-2.png&#34; alt=&#34;&amp;ldquo;Classification tree plot&amp;rdquo;&#34;&gt;&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;saveRDS ( climate_awareness_air , file.path(tempdir(), &amp;quot;climate_panel_recoded.rds&amp;quot;), version = 2)

# not evaluated
saveRDS( climate_awareness_air, file = file.path(&amp;quot;data-raw&amp;quot;, &amp;quot;climate-panel_recoded.rds&amp;quot;))
&lt;/code&gt;&lt;/pre&gt;
</description>
    </item>
    
    <item>
      <title>Retrospective Survey Harmonization Case Study - Climate Awareness Change in Europe 2013-2019.</title>
      <link>/post/2021-03-05-retroharmonize-climate/</link>
      <pubDate>Fri, 05 Mar 2021 00:00:00 +0000</pubDate>
      <guid>/post/2021-03-05-retroharmonize-climate/</guid>
      <description>&lt;p&gt;Retrospective survey harmonization comes with many challenges, as we
have shown in the
&lt;a href=&#34;http://netzero.dataobservatory.eu/post/2021-03-04_retroharmonize_intro/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;introduction&lt;/a&gt;
to this tutorial case study. In this example, we will work with
Eurobarometer’s data.&lt;/p&gt;
&lt;p&gt;Please use the development version of
&lt;a href=&#34;https://retroharmonize.dataobservatory.eu/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;retroharmonize&lt;/a&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;devtools::install_github(&amp;quot;antaldaniel/retroharmonize&amp;quot;)

library(retroharmonize)
library(dplyr)       # this is necessary for the example 
library(lubridate)   # easier date conversion

## Warning: package &#39;lubridate&#39; was built under R version 4.0.4

library(stringr)     # You can also use base R string processing functions 
&lt;/code&gt;&lt;/pre&gt;
&lt;h2 id=&#34;get-the-data&#34;&gt;Get the Data&lt;/h2&gt;
&lt;p&gt;&lt;code&gt;retroharmonize&lt;/code&gt; is not associated with Eurobarometer, or its creators,
Kantar, or its archivists, GESIS. We assume that you have acquired the
necessary files from GESIS after carefully reading their terms, and that
you placed them on a path that you call &lt;code&gt;gesis_dir&lt;/code&gt;. The precise documentation
of the data we use can be found in this supporting
&lt;a href=&#34;http://netzero.dataobservatory.eu/post/2021-03-04-eurobarometer_data/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;blogpost&lt;/a&gt;.
To reproduce this blogpost, you will need &lt;code&gt;ZA5877_v2-0-0.sav&lt;/code&gt;,
&lt;code&gt;ZA6595_v3-0-0.sav&lt;/code&gt;, &lt;code&gt;ZA6861_v1-2-0.sav&lt;/code&gt;, &lt;code&gt;ZA7488_v1-0-0.sav&lt;/code&gt;,
&lt;code&gt;ZA7572_v1-0-0.sav&lt;/code&gt; in a directory that you will name &lt;code&gt;gesis_dir&lt;/code&gt;.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;#Not run in the blogpost. In the repo we have a saved version.
climate_change_files &amp;lt;- c(&amp;quot;ZA5877_v2-0-0.sav&amp;quot;, &amp;quot;ZA6595_v3-0-0.sav&amp;quot;,  &amp;quot;ZA6861_v1-2-0.sav&amp;quot;, 
                          &amp;quot;ZA7488_v1-0-0.sav&amp;quot;, &amp;quot;ZA7572_v1-0-0.sav&amp;quot;)

eb_waves &amp;lt;- read_surveys(file.path(gesis_dir, climate_change_files), .f=&#39;read_spss&#39;)

if (dir.exists(&amp;quot;data-raw&amp;quot;)) {
  save ( eb_waves,  file = file.path(&amp;quot;data-raw&amp;quot;, &amp;quot;eb_climate_change_waves.rda&amp;quot;) )
}

if ( file.exists( file.path(&amp;quot;data-raw&amp;quot;, &amp;quot;eb_climate_change_waves.rda&amp;quot;) )) {
  load (file.path( &amp;quot;data-raw&amp;quot;, &amp;quot;eb_climate_change_waves.rda&amp;quot; ) )
} else {
  load (file.path(&amp;quot;..&amp;quot;, &amp;quot;..&amp;quot;,  &amp;quot;data-raw&amp;quot;, &amp;quot;eb_climate_change_waves.rda&amp;quot;) )
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The &lt;code&gt;eb_waves&lt;/code&gt; nested list contains five surveys imported from SPSS to
the survey class of
&lt;a href=&#34;https://retroharmonize.dataobservatory.eu/articles/labelled_spss_survey.html&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;retroharmonize&lt;/a&gt;.
The survey class is a data.frame that retains important metadata for
further harmonization.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;document_waves (eb_waves)

## # A tibble: 5 x 5
##   id            filename           ncol  nrow object_size
##   &amp;lt;chr&amp;gt;         &amp;lt;chr&amp;gt;             &amp;lt;int&amp;gt; &amp;lt;int&amp;gt;       &amp;lt;dbl&amp;gt;
## 1 ZA5877_v2-0-0 ZA5877_v2-0-0.sav   604 27919   139352456
## 2 ZA6595_v3-0-0 ZA6595_v3-0-0.sav   519 27718   119370440
## 3 ZA6861_v1-2-0 ZA6861_v1-2-0.sav   657 27901   151397528
## 4 ZA7488_v1-0-0 ZA7488_v1-0-0.sav   752 27339   169465928
## 5 ZA7572_v1-0-0 ZA7572_v1-0-0.sav   348 27655    80562432
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Beware the object sizes. If you work with many surveys, memory-efficient
programming becomes imperative. We will be subsetting whenever possible.&lt;/p&gt;
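&lt;p&gt;As a minimal sketch (using a toy nested list in place of the real
&lt;code&gt;eb_waves&lt;/code&gt;), dropping the columns you will not harmonize early keeps the
per-wave objects small:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# toy stand-in for a nested list of survey waves (hypothetical data)
toy_waves &amp;lt;- list(
  wave1 = data.frame(rowid = 1:3, keep = letters[1:3], unused = runif(3)),
  wave2 = data.frame(rowid = 4:6, keep = letters[4:6], unused = runif(3))
)

# subset each wave to the variables you will actually harmonize
needed_vars &amp;lt;- c(&amp;quot;rowid&amp;quot;, &amp;quot;keep&amp;quot;)
toy_waves &amp;lt;- lapply(toy_waves, function(x) x[, needed_vars])

format(object.size(toy_waves), units = &amp;quot;Kb&amp;quot;)
&lt;/code&gt;&lt;/pre&gt;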
&lt;h2 id=&#34;metadata-analysis&#34;&gt;Metadata analysis&lt;/h2&gt;
&lt;p&gt;As noted before, prepare to work with nested lists. Each imported survey
is nested as a data frame in the &lt;code&gt;eb_waves&lt;/code&gt; list.&lt;/p&gt;
&lt;h2 id=&#34;metadata-protocol-variables&#34;&gt;Metadata: Protocol Variables&lt;/h2&gt;
&lt;p&gt;Eurobarometer calls certain metadata elements, such as the
interviewee’s cooperation level or the date of the interview, protocol
variables. Let’s start here. This will be our template for harmonizing
more and more aspects of the five surveys (which are, in fact, already
harmonizations of about 30 surveys conducted in a single ‘wave’ across
multiple countries).&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# select variables of interest from the metadata
eb_protocol_metadata &amp;lt;- eb_climate_metadata %&amp;gt;%
  filter ( .data$label_orig %in% c(&amp;quot;date of interview&amp;quot;) |
             .data$var_name_orig == &amp;quot;rowid&amp;quot;)  %&amp;gt;%
  suggest_var_names( survey_program = &amp;quot;eurobarometer&amp;quot; )

# subset and harmonize these variables in all nested list items of &#39;waves&#39; of surveys
interview_dates &amp;lt;- harmonize_var_names(eb_waves, 
                                       eb_protocol_metadata )

# apply similar data processing rules to same variables
interview_dates &amp;lt;- lapply (interview_dates, 
                      function (x) x %&amp;gt;% mutate ( date_of_interview = as_character(.data$date_of_interview) )
                      )

# join the individual survey tables into a single table 
interview_dates &amp;lt;- as_tibble ( Reduce (rbind, interview_dates) )

# Check the variable classes.

vapply(interview_dates, function(x) class(x)[1], character(1))

##             rowid date_of_interview 
##       &amp;quot;character&amp;quot;       &amp;quot;character&amp;quot;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This is our sample workflow for each block of variables.&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Get a unique identifier.&lt;/li&gt;
&lt;li&gt;Add other variables&lt;/li&gt;
&lt;li&gt;Harmonize the variable names&lt;/li&gt;
&lt;li&gt;Subset the data leaving out anything that you do not harmonize in
this block.&lt;/li&gt;
&lt;li&gt;Apply some normalization in a nested list.&lt;/li&gt;
&lt;li&gt;When the variables are harmonized to same name, class, merge them
into a data.frame-like &lt;code&gt;tibble&lt;/code&gt; object.&lt;/li&gt;
&lt;/ol&gt;
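&lt;p&gt;The steps above can be sketched on two toy survey tables; the variable
names &lt;code&gt;doi&lt;/code&gt; and &lt;code&gt;p1&lt;/code&gt; are invented for the illustration, not the real
Eurobarometer names:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# two toy waves that store the same question under different names
waves &amp;lt;- list(
  ZA001 = data.frame(doi = c(&amp;quot;01-02-2013&amp;quot;, &amp;quot;02-02-2013&amp;quot;)),
  ZA002 = data.frame(p1  = c(&amp;quot;03-04-2015&amp;quot;))
)

# steps 1-4: unique identifier, harmonized name, subset
waves &amp;lt;- lapply(names(waves), function(id) {
  x &amp;lt;- waves[[id]]
  names(x)[1] &amp;lt;- &amp;quot;date_of_interview&amp;quot;           # harmonize the variable name
  x$rowid &amp;lt;- paste0(id, &amp;quot;_&amp;quot;, seq_len(nrow(x)))  # survey-level unique id
  x[, c(&amp;quot;rowid&amp;quot;, &amp;quot;date_of_interview&amp;quot;)]         # subset to this block
})

# steps 5-6: normalize inside the list, then merge into one table
waves &amp;lt;- lapply(waves, function(x) {
  x$date_of_interview &amp;lt;- as.character(x$date_of_interview)
  x
})
harmonized &amp;lt;- Reduce(rbind, waves)
&lt;/code&gt;&lt;/pre&gt;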
&lt;p&gt;Now finish the harmonization. &lt;code&gt;Wednesday, 31st October 2018&lt;/code&gt; should
become a Date type &lt;code&gt;2018-10-31&lt;/code&gt;.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;require(lubridate)
harmonize_date &amp;lt;- function(x) {
  x &amp;lt;- tolower(as.character(x))
  x &amp;lt;- gsub(&amp;quot;monday|tuesday|wednesday|thursday|friday|saturday|sunday|\\,&amp;quot;, &amp;quot;&amp;quot;, x)
  x &amp;lt;- gsub(&amp;quot;([0-9])(th|nd|rd|st)&amp;quot;, &amp;quot;\\1&amp;quot;, x) # strip ordinal suffixes only after digits, so &amp;quot;august&amp;quot; keeps its &amp;quot;st&amp;quot;
  x &amp;lt;- gsub(&amp;quot;decemberber&amp;quot;, &amp;quot;december&amp;quot;, x) # all those annoying real-life data problems!
  x &amp;lt;- stringr::str_trim (x, &amp;quot;both&amp;quot;)
  x &amp;lt;- gsub(&amp;quot;^0&amp;quot;, &amp;quot;&amp;quot;, x )
  x &amp;lt;- gsub(&amp;quot;\\s+&amp;quot;, &amp;quot; &amp;quot;, x) # collapse repeated whitespace to a single space
  lubridate::dmy(x) 
}

interview_dates &amp;lt;- interview_dates %&amp;gt;%
  mutate ( date_of_interview = harmonize_date(.data$date_of_interview) )

vapply(interview_dates, function(x) class(x)[1], character(1))

##             rowid date_of_interview 
##       &amp;quot;character&amp;quot;            &amp;quot;Date&amp;quot;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Row IDs may not be unique across &lt;em&gt;different&lt;/em&gt; surveys, so to avoid
duplication we created a simple, sequential ID for each survey that
includes the ID of the original file.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;set.seed(2021)
sample_n(interview_dates, 6)

## # A tibble: 6 x 2
##   rowid               date_of_interview
##   &amp;lt;chr&amp;gt;               &amp;lt;date&amp;gt;           
## 1 ZA7488_v1-0-0_7016  2018-10-28       
## 2 ZA7488_v1-0-0_19187 2018-11-02       
## 3 ZA6861_v1-2-0_1218  2017-03-18       
## 4 ZA6861_v1-2-0_4142  2017-03-21       
## 5 ZA7572_v1-0-0_12363 2019-04-17       
## 6 ZA7572_v1-0-0_8071  2019-04-18
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;After this type-conversion problem, let’s look at an issue that arises
when an original SPSS variable has two meaningful R representations.&lt;/p&gt;
&lt;h2 id=&#34;metadata-geographical-information&#34;&gt;Metadata: Geographical information&lt;/h2&gt;
&lt;p&gt;Let’s continue with harmonizing geographical information in the files.
In this example, &lt;code&gt;var_name_suggested&lt;/code&gt; will contain the harmonized
variable name. You will likely have to make this call yourself, after
carefully reading the original questionnaires and codebooks.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;eb_regional_metadata &amp;lt;- eb_climate_metadata %&amp;gt;%
  filter ( grepl( &amp;quot;rowid|isocntry|^nuts$&amp;quot;, .data$var_name_orig)) %&amp;gt;%
  suggest_var_names( survey_program = &amp;quot;eurobarometer&amp;quot; ) %&amp;gt;%
  mutate ( var_name_suggested = case_when ( 
    var_name_suggested == &amp;quot;region_nuts_codes&amp;quot;     ~ &amp;quot;geo&amp;quot;,
    TRUE ~ var_name_suggested ))
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The &lt;code&gt;harmonize_var_names()&lt;/code&gt; function takes all variables in the subsetted,
geographical metadata table, and brings them to the harmonized
&lt;code&gt;var_name_suggested&lt;/code&gt; name. The function subsets the surveys to avoid the
presence of non-harmonized variables. All regional NUTS codes become
&lt;code&gt;geo&lt;/code&gt; in our case:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;geography &amp;lt;- harmonize_var_names(eb_waves, 
                                 eb_regional_metadata)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;If you are used to working with single survey files, you probably work
in a tabular format, which easily converts into a data.frame-like
object, in our example the tidyverse’s &lt;code&gt;tibble&lt;/code&gt;. However, when working
with longitudinal data, it is far simpler to work with nested lists,
because the tables usually have different dimensions (neither the rows,
corresponding to observations, nor the columns are the same across all
survey files).&lt;/p&gt;
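&lt;p&gt;A small sketch (with made-up columns) shows why: &lt;code&gt;rbind()&lt;/code&gt; would fail
on the raw tables, while a nested list tolerates the differing
dimensions until the names are harmonized:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# toy waves with different column sets
wave_x &amp;lt;- data.frame(rowid = 1, geo = &amp;quot;DE11&amp;quot;, extra_q = 1)
wave_y &amp;lt;- data.frame(rowid = 2, geo = &amp;quot;FR10&amp;quot;)

setdiff(names(wave_x), names(wave_y))   # &amp;quot;extra_q&amp;quot; exists only in wave_x

# keep them in a list, subset to the shared variables, then merge
waves  &amp;lt;- list(wave_x, wave_y)
common &amp;lt;- Reduce(intersect, lapply(waves, names))
merged &amp;lt;- Reduce(rbind, lapply(waves, function(x) x[, common]))
&lt;/code&gt;&lt;/pre&gt;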
&lt;p&gt;In the nested list, each list element is a single, tabular-format
survey. (In fact, the surveys are stored in retroharmonize’s
&lt;a href=&#34;https://retroharmonize.dataobservatory.eu/reference/survey.html&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;survey&lt;/a&gt;
class, which is a rich tibble that contains the metadata and the
processing history of the survey.)&lt;/p&gt;
&lt;p&gt;The regional information in the Eurobarometer files is contained in the
&lt;code&gt;nuts&lt;/code&gt; variable. We want to keep both the original labels and values.
The original values are the region’s codes, and the labels are the
names. The easiest and fastest solution is the base R &lt;code&gt;lapply&lt;/code&gt; loop.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;geography &amp;lt;- lapply ( geography, 
                      function (x) x %&amp;gt;% mutate ( region = as_character(geo), 
                                                  geo    = as.character(geo) )  
)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Because each table has exactly the same columns, we can simply use
&lt;code&gt;rbind()&lt;/code&gt; and reduce the list to a modern &lt;code&gt;data.frame&lt;/code&gt;, i.e. a &lt;code&gt;tibble&lt;/code&gt;.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;geography &amp;lt;- as_tibble ( Reduce (rbind, geography) )
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Let’s see a dozen cases:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;set.seed(2021)
sample_n(geography, 12)

## # A tibble: 12 x 4
##    rowid               isocntry geo   region              
##    &amp;lt;chr&amp;gt;               &amp;lt;chr&amp;gt;    &amp;lt;chr&amp;gt; &amp;lt;chr&amp;gt;               
##  1 ZA7488_v1-0-0_7016  SI       SI012 Podravska           
##  2 ZA7488_v1-0-0_19187 PL       PL63  Pomorskie           
##  3 ZA6861_v1-2-0_1218  DK       DK02  Sjaelland           
##  4 ZA6861_v1-2-0_4142  FI       FI1B  Helsinki-Uusimaa    
##  5 ZA7572_v1-0-0_12363 SE       SE12  Oestra Mellansverige
##  6 ZA7572_v1-0-0_8071  IT       ITH   Nord-Est [IT]       
##  7 ZA6861_v1-2-0_6145  IE       IE021 Dublin              
##  8 ZA6861_v1-2-0_24638 RO       RO31  South [RO]          
##  9 ZA7488_v1-0-0_11315 CY       CY    REPUBLIC OF CYPRUS  
## 10 ZA6595_v3-0-0_27568 HR       HR041 Grad Zagreb         
## 11 ZA7572_v1-0-0_17397 CZ       CZ06  Jihovychod          
## 12 ZA6861_v1-2-0_10993 PT       PT17  Lisboa
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The idea is that we do similar variable harmonization block by block,
and eventually we will join them together. Next step: socio-demography
and weights.&lt;/p&gt;
&lt;h2 id=&#34;socio-demography-and-weights&#34;&gt;Socio-demography and Weights&lt;/h2&gt;
&lt;p&gt;There are a few peculiar issues to look out for. This example shows that
survey harmonization requires plenty of expert judgment, and you cannot
fully automate the process.&lt;/p&gt;
&lt;p&gt;The Eurobarometer archives do not use all weight and demographic
variable names consistently. For example, the &lt;code&gt;wex&lt;/code&gt; variable, a weight
projected to the country’s population aged 15 or older, is sometimes
called &lt;code&gt;wex&lt;/code&gt; and sometimes &lt;code&gt;wextra&lt;/code&gt;. The individual survey’s
post-stratification weight is the &lt;code&gt;w1&lt;/code&gt; variable, but this is not
necessarily what you need to use.&lt;/p&gt;
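&lt;p&gt;A toy example (with invented respondents and weight values) shows how
the two kinds of weight are typically used:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# hypothetical respondents; w1 is a post-stratification weight,
# wex projects each respondent to the 15+ population
resp &amp;lt;- data.frame(
  agrees = c(1, 0, 1, 1),
  w1     = c(0.9, 1.1, 1.0, 1.0),
  wex    = c(150000, 180000, 160000, 140000)
)

weighted.mean(resp$agrees, resp$w1)  # weighted share within the survey
sum(resp$agrees * resp$wex)          # projected head count in the population
&lt;/code&gt;&lt;/pre&gt;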
&lt;p&gt;The &lt;code&gt;suggest_var_names()&lt;/code&gt; function has a
&lt;code&gt;survey_program = &amp;quot;eurobarometer&amp;quot;&lt;/code&gt; parameter, which normalizes the most
frequently used variables a bit. For example, all variations of &lt;code&gt;wex&lt;/code&gt;
and &lt;code&gt;wextra&lt;/code&gt; will be normalized to &lt;code&gt;wex&lt;/code&gt;. You can also ignore this
parameter and use your own names.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;eb_demography_metadata  &amp;lt;- eb_climate_metadata %&amp;gt;%
  filter ( grepl( &amp;quot;rowid|isocntry|^d8$|^d7$|^wex|^w1$|d25|^d15a|^d11$&amp;quot;, .data$var_name_orig) ) %&amp;gt;%
  suggest_var_names( survey_program = &amp;quot;eurobarometer&amp;quot;)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;As you can see, using the original labels would not help either,
because they also vary from wave to wave.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;eb_demography_metadata %&amp;gt;%
  select ( filename, var_name_orig, label_orig, var_name_suggested ) %&amp;gt;%
  filter (var_name_orig %in% c(&amp;quot;wex&amp;quot;, &amp;quot;wextra&amp;quot;) )

##            filename var_name_orig                                  label_orig
## 1 ZA5877_v2-0-0.sav        wextra      weight extrapolated population 15 plus
## 2 ZA6595_v3-0-0.sav        wextra      weight extrapolated population 15 plus
## 3 ZA6861_v1-2-0.sav           wex weight extrapolated population aged 15 plus
## 4 ZA7488_v1-0-0.sav           wex weight extrapolated population aged 15 plus
## 5 ZA7572_v1-0-0.sav           wex weight extrapolated population aged 15 plus
##   var_name_suggested
## 1                wex
## 2                wex
## 3                wex
## 4                wex
## 5                wex

demography &amp;lt;- harmonize_var_names ( waves = eb_waves, 
                                    metadata = eb_demography_metadata ) 
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Socio-demographic variables like level of highest education or
occupation are rather country-specific. Eurobarometer uses standardized
occupation and marital status scales, and, as a proxy for education
level, the age at which the respondent left full-time education.&lt;/p&gt;
&lt;p&gt;This is a particularly tricky variable, because its coding in fact
mixes three different pieces of information: the school-leaving age, a
category for respondents who are still studying, and a category for
people who did not finish compulsory primary school. And while
school-leaving age was a good proxy from the 1970s onward, it becomes
less and less useful in an age when the EU promotes lifelong learning,
as people stop and restart their education throughout their lives.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;example &amp;lt;- demography[[1]] %&amp;gt;%
  mutate ( across ( -any_of(c(&amp;quot;rowid&amp;quot;, &amp;quot;w1&amp;quot;, &amp;quot;wex&amp;quot;)), as_character) ) %&amp;gt;%
  mutate ( across (any_of(c(&amp;quot;w1&amp;quot;, &amp;quot;wex&amp;quot;)), as_numeric) )
unique ( example$age_education )

##  [1] &amp;quot;22&amp;quot;                     &amp;quot;25&amp;quot;                     &amp;quot;17&amp;quot;                    
##  [4] &amp;quot;19&amp;quot;                     &amp;quot;12&amp;quot;                     &amp;quot;23&amp;quot;                    
##  [7] &amp;quot;18&amp;quot;                     &amp;quot;20&amp;quot;                     &amp;quot;21&amp;quot;                    
## [10] &amp;quot;14&amp;quot;                     &amp;quot;24&amp;quot;                     &amp;quot;16&amp;quot;                    
## [13] &amp;quot;26&amp;quot;                     &amp;quot;15&amp;quot;                     &amp;quot;Still studying&amp;quot;        
## [16] &amp;quot;DK&amp;quot;                     &amp;quot;31&amp;quot;                     &amp;quot;29&amp;quot;                    
## [19] &amp;quot;27&amp;quot;                     &amp;quot;13&amp;quot;                     &amp;quot;32&amp;quot;                    
## [22] &amp;quot;28&amp;quot;                     &amp;quot;30&amp;quot;                     &amp;quot;53&amp;quot;                    
## [25] &amp;quot;42&amp;quot;                     &amp;quot;62&amp;quot;                     &amp;quot;40&amp;quot;                    
## [28] &amp;quot;No full-time education&amp;quot; &amp;quot;Refusal&amp;quot;                &amp;quot;37&amp;quot;                    
## [31] &amp;quot;39&amp;quot;                     &amp;quot;34&amp;quot;                     &amp;quot;35&amp;quot;                    
## [34] &amp;quot;47&amp;quot;                     &amp;quot;36&amp;quot;                     &amp;quot;45&amp;quot;                    
## [37] &amp;quot;51&amp;quot;                     &amp;quot;33&amp;quot;                     &amp;quot;43&amp;quot;                    
## [40] &amp;quot;38&amp;quot;                     &amp;quot;49&amp;quot;                     &amp;quot;46&amp;quot;                    
## [43] &amp;quot;41&amp;quot;                     &amp;quot;57&amp;quot;                     &amp;quot;7&amp;quot;                     
## [46] &amp;quot;48&amp;quot;                     &amp;quot;44&amp;quot;                     &amp;quot;50&amp;quot;                    
## [49] &amp;quot;56&amp;quot;                     &amp;quot;8&amp;quot;                      &amp;quot;11&amp;quot;                    
## [52] &amp;quot;10&amp;quot;                     &amp;quot;9&amp;quot;                      &amp;quot;75 years&amp;quot;              
## [55] &amp;quot;6&amp;quot;                      &amp;quot;3&amp;quot;                      &amp;quot;54&amp;quot;                    
## [58] &amp;quot;55&amp;quot;                     &amp;quot;60&amp;quot;                     &amp;quot;64&amp;quot;                    
## [61] &amp;quot;2 years&amp;quot;                &amp;quot;58&amp;quot;                     &amp;quot;52&amp;quot;                    
## [64] &amp;quot;72&amp;quot;                     &amp;quot;61&amp;quot;                     &amp;quot;4&amp;quot;                     
## [67] &amp;quot;63&amp;quot;
&lt;/code&gt;&lt;/pre&gt;
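&lt;p&gt;One possible recoding separates the numeric leaving age from the
special categories. The &lt;code&gt;recode_age_education()&lt;/code&gt; helper below is our own
hypothetical sketch, not a &lt;code&gt;retroharmonize&lt;/code&gt; function:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;recode_age_education &amp;lt;- function(x) {
  x &amp;lt;- gsub(&amp;quot; years&amp;quot;, &amp;quot;&amp;quot;, x)  # e.g. &amp;quot;75 years&amp;quot; becomes &amp;quot;75&amp;quot;
  data.frame(
    school_leaving_age = suppressWarnings(as.numeric(x)), # NA for special codes
    is_student         = x == &amp;quot;Still studying&amp;quot;,
    no_education       = x == &amp;quot;No full-time education&amp;quot;
  )
}

recode_age_education(c(&amp;quot;22&amp;quot;, &amp;quot;Still studying&amp;quot;, &amp;quot;75 years&amp;quot;, &amp;quot;DK&amp;quot;))
&lt;/code&gt;&lt;/pre&gt;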
&lt;p&gt;The seemingly trivial &lt;code&gt;age_exact&lt;/code&gt; variable has its own issues, too:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;unique ( example$age_exact)

##  [1] &amp;quot;54&amp;quot;       &amp;quot;66&amp;quot;       &amp;quot;56&amp;quot;       &amp;quot;53&amp;quot;       &amp;quot;33&amp;quot;       &amp;quot;72&amp;quot;      
##  [7] &amp;quot;83&amp;quot;       &amp;quot;62&amp;quot;       &amp;quot;86&amp;quot;       &amp;quot;77&amp;quot;       &amp;quot;64&amp;quot;       &amp;quot;46&amp;quot;      
## [13] &amp;quot;44&amp;quot;       &amp;quot;59&amp;quot;       &amp;quot;60&amp;quot;       &amp;quot;67&amp;quot;       &amp;quot;63&amp;quot;       &amp;quot;20&amp;quot;      
## [19] &amp;quot;43&amp;quot;       &amp;quot;37&amp;quot;       &amp;quot;78&amp;quot;       &amp;quot;49&amp;quot;       &amp;quot;90&amp;quot;       &amp;quot;45&amp;quot;      
## [25] &amp;quot;28&amp;quot;       &amp;quot;29&amp;quot;       &amp;quot;30&amp;quot;       &amp;quot;39&amp;quot;       &amp;quot;51&amp;quot;       &amp;quot;38&amp;quot;      
## [31] &amp;quot;41&amp;quot;       &amp;quot;71&amp;quot;       &amp;quot;25&amp;quot;       &amp;quot;48&amp;quot;       &amp;quot;79&amp;quot;       &amp;quot;88&amp;quot;      
## [37] &amp;quot;61&amp;quot;       &amp;quot;85&amp;quot;       &amp;quot;70&amp;quot;       &amp;quot;35&amp;quot;       &amp;quot;81&amp;quot;       &amp;quot;52&amp;quot;      
## [43] &amp;quot;57&amp;quot;       &amp;quot;27&amp;quot;       &amp;quot;47&amp;quot;       &amp;quot;15 years&amp;quot; &amp;quot;21&amp;quot;       &amp;quot;42&amp;quot;      
## [49] &amp;quot;32&amp;quot;       &amp;quot;68&amp;quot;       &amp;quot;36&amp;quot;       &amp;quot;34&amp;quot;       &amp;quot;19&amp;quot;       &amp;quot;31&amp;quot;      
## [55] &amp;quot;26&amp;quot;       &amp;quot;23&amp;quot;       &amp;quot;24&amp;quot;       &amp;quot;22&amp;quot;       &amp;quot;16&amp;quot;       &amp;quot;84&amp;quot;      
## [61] &amp;quot;65&amp;quot;       &amp;quot;18&amp;quot;       &amp;quot;55&amp;quot;       &amp;quot;40&amp;quot;       &amp;quot;50&amp;quot;       &amp;quot;73&amp;quot;      
## [67] &amp;quot;69&amp;quot;       &amp;quot;87&amp;quot;       &amp;quot;89&amp;quot;       &amp;quot;74&amp;quot;       &amp;quot;75&amp;quot;       &amp;quot;98 years&amp;quot;
## [73] &amp;quot;76&amp;quot;       &amp;quot;80&amp;quot;       &amp;quot;58&amp;quot;       &amp;quot;82&amp;quot;       &amp;quot;17&amp;quot;       &amp;quot;93&amp;quot;      
## [79] &amp;quot;91&amp;quot;       &amp;quot;92&amp;quot;       &amp;quot;95&amp;quot;       &amp;quot;94&amp;quot;       &amp;quot;97&amp;quot;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Let’s see all the strange labels attached to &lt;code&gt;age&lt;/code&gt;-type variables:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;collect_val_labels(metadata = eb_demography_metadata %&amp;gt;%
                     filter ( var_name_suggested %in% c(&amp;quot;age_exact&amp;quot;, &amp;quot;age_education&amp;quot;)) )

##  [1] &amp;quot;2 years&amp;quot;                  &amp;quot;75 years&amp;quot;                
##  [3] &amp;quot;No full-time education&amp;quot;   &amp;quot;Still studying&amp;quot;          
##  [5] &amp;quot;15 years&amp;quot;                 &amp;quot;98 years&amp;quot;                
##  [7] &amp;quot;96 years&amp;quot;                 &amp;quot;[NOT CLEARLY DOCUMENTED]&amp;quot;
##  [9] &amp;quot;74 years&amp;quot;                 &amp;quot;99 and older&amp;quot;            
## [11] &amp;quot;Refusal&amp;quot;                  &amp;quot;87 years&amp;quot;                
## [13] &amp;quot;DK&amp;quot;                       &amp;quot;88 years&amp;quot;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;We must handle many exceptions, so we created a function for this
purpose:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;remove_years  &amp;lt;- function(x) { 
  x &amp;lt;- gsub(&amp;quot;years|and\\solder&amp;quot;, &amp;quot;&amp;quot;, tolower(x))
  stringr::str_trim (x, &amp;quot;both&amp;quot;)}

process_demography &amp;lt;- function (x) { 
  
  x %&amp;gt;% mutate ( across ( -any_of(c(&amp;quot;rowid&amp;quot;, &amp;quot;w1&amp;quot;, &amp;quot;wex&amp;quot;)), as_character) ) %&amp;gt;%
    mutate ( across (any_of(c(&amp;quot;w1&amp;quot;, &amp;quot;wex&amp;quot;)), as_numeric) ) %&amp;gt;%
    mutate ( across (contains(&amp;quot;age&amp;quot;), remove_years)) %&amp;gt;%
    mutate ( age_exact = as.numeric (age_exact)) %&amp;gt;%
    mutate ( is_student = ifelse ( tolower(age_education) == &amp;quot;still studying&amp;quot;, 
                                   1, 0), 
             no_education = ifelse ( tolower(age_education) == &amp;quot;no full-time education&amp;quot;, 1, 0)) %&amp;gt;%
    mutate ( education = case_when (
      grepl(&amp;quot;studying&amp;quot;, age_education) ~ age_exact, 
      grepl (&amp;quot;education&amp;quot;, age_education)  ~ 14, 
      grepl (&amp;quot;refus|document|dk&amp;quot;, tolower(age_education)) ~ NA_real_,
      TRUE ~ as.numeric(age_education)
    ))  %&amp;gt;%
    mutate ( education = case_when ( 
      education &amp;lt; 14 ~ NA_real_, 
      education &amp;gt; 30 ~ 30, 
      TRUE ~ education )) 
}

demography &amp;lt;- lapply ( demography, process_demography )

## Warning in eval_tidy(pair$rhs, env = default_env): NAs introduced by coercion

## Warning in mask$eval_all_mutate(quo): NAs introduced by coercion

## Warning in eval_tidy(pair$rhs, env = default_env): NAs introduced by coercion

## Warning in eval_tidy(pair$rhs, env = default_env): NAs introduced by coercion

## Warning in eval_tidy(pair$rhs, env = default_env): NAs introduced by coercion

## Warning in eval_tidy(pair$rhs, env = default_env): NAs introduced by coercion

## We&#39;ll use full_join rather than rbind, because different waves contain different variables.
demography &amp;lt;- Reduce ( full_join, demography )

## Joining, by = c(&amp;quot;rowid&amp;quot;, &amp;quot;isocntry&amp;quot;, &amp;quot;w1&amp;quot;, &amp;quot;wex&amp;quot;, &amp;quot;marital_status&amp;quot;, &amp;quot;age_education&amp;quot;, &amp;quot;age_exact&amp;quot;, &amp;quot;occupation_of_respondent&amp;quot;, &amp;quot;occupation_of_respondent_recoded&amp;quot;, &amp;quot;respondent_occupation_scale_c_14&amp;quot;, &amp;quot;type_of_community&amp;quot;, &amp;quot;is_student&amp;quot;, &amp;quot;no_education&amp;quot;, &amp;quot;education&amp;quot;)
## Joining, by = c(&amp;quot;rowid&amp;quot;, &amp;quot;isocntry&amp;quot;, &amp;quot;w1&amp;quot;, &amp;quot;wex&amp;quot;, &amp;quot;marital_status&amp;quot;, &amp;quot;age_education&amp;quot;, &amp;quot;age_exact&amp;quot;, &amp;quot;occupation_of_respondent&amp;quot;, &amp;quot;occupation_of_respondent_recoded&amp;quot;, &amp;quot;respondent_occupation_scale_c_14&amp;quot;, &amp;quot;type_of_community&amp;quot;, &amp;quot;is_student&amp;quot;, &amp;quot;no_education&amp;quot;, &amp;quot;education&amp;quot;)
## Joining, by = c(&amp;quot;rowid&amp;quot;, &amp;quot;isocntry&amp;quot;, &amp;quot;w1&amp;quot;, &amp;quot;wex&amp;quot;, &amp;quot;marital_status&amp;quot;, &amp;quot;age_education&amp;quot;, &amp;quot;age_exact&amp;quot;, &amp;quot;occupation_of_respondent&amp;quot;, &amp;quot;occupation_of_respondent_recoded&amp;quot;, &amp;quot;respondent_occupation_scale_c_14&amp;quot;, &amp;quot;type_of_community&amp;quot;, &amp;quot;is_student&amp;quot;, &amp;quot;no_education&amp;quot;, &amp;quot;education&amp;quot;)
## Joining, by = c(&amp;quot;rowid&amp;quot;, &amp;quot;isocntry&amp;quot;, &amp;quot;w1&amp;quot;, &amp;quot;wex&amp;quot;, &amp;quot;marital_status&amp;quot;, &amp;quot;age_education&amp;quot;, &amp;quot;age_exact&amp;quot;, &amp;quot;occupation_of_respondent&amp;quot;, &amp;quot;occupation_of_respondent_recoded&amp;quot;, &amp;quot;respondent_occupation_scale_c_14&amp;quot;, &amp;quot;type_of_community&amp;quot;, &amp;quot;is_student&amp;quot;, &amp;quot;no_education&amp;quot;, &amp;quot;education&amp;quot;)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Now let’s see what we have here:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;set.seed(2021)
sample_n(demography, 12)

## # A tibble: 12 x 14
##    rowid    isocntry    w1    wex marital_status        age_education  age_exact
##    &amp;lt;chr&amp;gt;    &amp;lt;chr&amp;gt;    &amp;lt;dbl&amp;gt;  &amp;lt;dbl&amp;gt; &amp;lt;chr&amp;gt;                 &amp;lt;chr&amp;gt;              &amp;lt;dbl&amp;gt;
##  1 ZA7488_~ SI       0.828  1428. (Re-)Married: withou~ 19                    43
##  2 ZA7488_~ PL       1.01  32830. (Re-)Married: withou~ 19                    64
##  3 ZA6861_~ DK       0.641  3100. (Re-)Married: withou~ 22                    78
##  4 ZA6861_~ FI       1.83   8601. (Re-)Married: childr~ 30                    38
##  5 ZA7572_~ SE       0.342  2645. (Re-)Married: withou~ 17                    68
##  6 ZA7572_~ IT       0.630 32287. (Re-)Married: childr~ 20                    40
##  7 ZA6861_~ IE       0.868  3054. (Re-)Married: childr~ 32                    42
##  8 ZA6861_~ RO       0.724 11805. (Re-)Married: withou~ 14                    59
##  9 ZA7488_~ CY       0.691  1013. (Re-)Married: childr~ 18                    67
## 10 ZA6595_~ HR       0.580  2098. Single living w part~ 27                    30
## 11 ZA7572_~ CZ       1.86  16908. Single: without chil~ still studying        20
## 12 ZA6861_~ PT       0.932  7448. Widow: with children  no full-time ~        84
## # ... with 7 more variables: occupation_of_respondent &amp;lt;chr&amp;gt;,
## #   occupation_of_respondent_recoded &amp;lt;chr&amp;gt;,
## #   respondent_occupation_scale_c_14 &amp;lt;chr&amp;gt;, type_of_community &amp;lt;chr&amp;gt;,
## #   is_student &amp;lt;dbl&amp;gt;, no_education &amp;lt;dbl&amp;gt;, education &amp;lt;dbl&amp;gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;h2 id=&#34;harmonizing-variable-labels&#34;&gt;Harmonizing Variable Labels&lt;/h2&gt;
&lt;p&gt;So far we have been working with metadata, weights, and socio-demography.
In other words, we have not yet started the harmonization of the
climate change awareness variables themselves. The methodology is the same, but here we
really must pay attention to the answer options in the questionnaire. (Refer
to our data summary again
&lt;a href=&#34;http://netzero.dataobservatory.eu/post/2021-03-04-eurobarometer_data/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;here&lt;/a&gt;.)&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;climate_awareness_metadata &amp;lt;- eb_climate_metadata %&amp;gt;%
  suggest_var_names( survey_program = &amp;quot;eurobarometer&amp;quot; ) %&amp;gt;%
  filter ( .data$var_name_suggested  %in% c(&amp;quot;rowid&amp;quot;,
                                            &amp;quot;serious_world_problems_first&amp;quot;, 
                                             &amp;quot;serious_world_problems_climate_change&amp;quot;)
  ) 

hw &amp;lt;- harmonize_var_names ( waves = eb_waves, 
                            metadata = climate_awareness_metadata )
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The &lt;code&gt;retroharmonize&lt;/code&gt; package comes with a generic
&lt;a href=&#34;https://retroharmonize.dataobservatory.eu/reference/harmonize_values.html&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;harmonize_values()&lt;/a&gt;
function that changes the value labels of categorical variables
(including binary ones) to a uniform format. It also takes care of
various types of missing values.&lt;/p&gt;
&lt;p&gt;First, let’s go back to our metadata and collect all value labels that
will show up with
&lt;a href=&#34;https://retroharmonize.dataobservatory.eu/reference/collect_val_labels.html&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;collect_val_labels()&lt;/a&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;collect_val_labels(climate_awareness_metadata)

##  [1] &amp;quot;Climate change&amp;quot;                            
##  [2] &amp;quot;International terrorism&amp;quot;                   
##  [3] &amp;quot;Poverty, hunger and lack of drinking water&amp;quot;
##  [4] &amp;quot;Spread of infectious diseases&amp;quot;             
##  [5] &amp;quot;The economic situation&amp;quot;                    
##  [6] &amp;quot;Proliferation of nuclear weapons&amp;quot;          
##  [7] &amp;quot;Armed conflicts&amp;quot;                           
##  [8] &amp;quot;The increasing global population&amp;quot;          
##  [9] &amp;quot;Other (SPONTANEOUS)&amp;quot;                       
## [10] &amp;quot;None (SPONTANEOUS)&amp;quot;                        
## [11] &amp;quot;Not mentioned&amp;quot;                             
## [12] &amp;quot;Mentioned&amp;quot;                                 
## [13] &amp;quot;DK&amp;quot;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;In this case, we want to select &lt;code&gt;Climate change&lt;/code&gt; when it is mentioned as the &lt;em&gt;most
serious problem&lt;/em&gt;, and &lt;code&gt;Climate change&lt;/code&gt; when it is picked from a list of three
serious problems. The first question type is single-choice: either
&lt;code&gt;Climate change&lt;/code&gt; is mentioned, or the alternative answer is
labeled as &lt;code&gt;Not mentioned&lt;/code&gt;. In the multiple-choice case, the alternative
may be something else, for example, &lt;code&gt;Spread of infectious diseases&lt;/code&gt;, as
we all know only too well by 2021.&lt;/p&gt;
&lt;p&gt;We want to see who thought &lt;code&gt;Climate change&lt;/code&gt; was the most serious
problem, or one of the most serious problems, so we label each mention
of &lt;code&gt;Climate change&lt;/code&gt; as &lt;code&gt;mentioned&lt;/code&gt; and pair it with a numeric value
of &lt;code&gt;1&lt;/code&gt;. All other cases are labeled as &lt;code&gt;not_mentioned&lt;/code&gt;, with the
exception of various missing observations, which in these cases are
&lt;code&gt;Do not know&lt;/code&gt; answers, &lt;code&gt;Declined to answer&lt;/code&gt; cases, and &lt;code&gt;Inappropriate&lt;/code&gt;
cases. [The latter is Eurobarometer’s label for questions that were
for one reason or another not asked of a particular interviewee, for
example, because the Turkish Cypriot community received a different
questionnaire.]&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# positive cases
label_1 = c(&amp;quot;^Climate\\schange&amp;quot;, &amp;quot;^Mentioned&amp;quot;)
# missing cases 
na_labels &amp;lt;- collect_na_labels( climate_awareness_metadata)
na_labels

## [1] &amp;quot;DK&amp;quot;                             &amp;quot;Inap. (10 or 11 in qa1a)&amp;quot;      
## [3] &amp;quot;Inap. (coded 10 or 11 in qc1a)&amp;quot; &amp;quot;Inap. (coded 10 or 11 in qb1a)&amp;quot;

# negative cases
label_0 &amp;lt;- collect_val_labels( climate_awareness_metadata)
label_0 &amp;lt;- label_0[! label_0 %in% label_1 ]
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The &lt;code&gt;harmonize_serious_problems()&lt;/code&gt; function harmonizes the labels within
the special labeled class of &lt;code&gt;retroharmonize&lt;/code&gt;. This class retains all
the information needed to give categorical variables a character or numeric
representation, plus various processing metadata for documentation
purposes. While this class is very rich (it contains whatever was
imported from SPSS’s proprietary data format, and the processing history), it is not
suitable for statistical analysis. We could, of course, directly call
&lt;a href=&#34;https://retroharmonize.dataobservatory.eu/reference/harmonize_values.html&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;harmonize_values()&lt;/a&gt;
from the retroharmonize package, but the parameterization would be very
complicated even in a simple function call, not to mention a looped
call. Because this function is the heart of the
&lt;code&gt;retroharmonize&lt;/code&gt; package, it has &lt;a href=&#34;https://retroharmonize.dataobservatory.eu/articles/harmonize_labels.html&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;a tutorial
article&lt;/a&gt;
of its own.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;harmonize_serious_problems &amp;lt;- function(x) {
  label_list &amp;lt;- list(
    from = c(label_0, label_1, na_labels), 
    to = c( rep ( &amp;quot;not_mentioned&amp;quot;, length(label_0) ),   # use the same order as in from!
            rep ( &amp;quot;mentioned&amp;quot;, length(label_1) ),
            &amp;quot;do_not_know&amp;quot;, &amp;quot;inap&amp;quot;, &amp;quot;inap&amp;quot;, &amp;quot;inap&amp;quot;), 
    numeric_values = c(rep ( 0, length(label_0) ), # use the same order as in from!
                       rep ( 1, length(label_1) ),
                       99997,99999,99999,99999)
  )
  
  harmonize_values(x, 
                   harmonize_labels = label_list, 
                   na_values = c(&amp;quot;do_not_know&amp;quot;=99997,
                                 &amp;quot;declined&amp;quot;=99998,
                                 &amp;quot;inap&amp;quot;=99999), 
                   remove = &amp;quot;\\(|\\)|\\[|\\]|\\%&amp;quot;
  )
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Our objects are rather big in memory, so first, let’s remove the surveys
that do not contain these world problem variables. In these cases, the
subsetted and harmonized surveys in the nested list have only one
column, i.e. the &lt;code&gt;rowid&lt;/code&gt;.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;hw &amp;lt;- hw[unlist ( lapply ( hw, ncol)) &amp;gt; 1 ]
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Now we have a smaller problem to deal with. With many surveys, it is
easy to fill up your computer’s memory, so let’s start building up our
joined panel data from a smaller set of nested, subsetted surveys.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;hw &amp;lt;- lapply ( hw, function (x) x %&amp;gt;% mutate ( across ( contains(&amp;quot;problem&amp;quot;), harmonize_serious_problems) ) )
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Our &lt;code&gt;lapply&lt;/code&gt; loop calls an anonymous function, which in turn calls
&lt;code&gt;harmonize_serious_problems&lt;/code&gt;, our parameterized version of
&lt;a href=&#34;https://retroharmonize.dataobservatory.eu/reference/harmonize_values.html&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;harmonize_values()&lt;/a&gt;,
on all variables that have &lt;code&gt;problem&lt;/code&gt; in their names.&lt;/p&gt;
&lt;p&gt;Once we are done, our variables have harmonized names, harmonized
values, and harmonized labels, but they are stored in the complex
&lt;a href=&#34;https://retroharmonize.dataobservatory.eu/articles/harmonize_labels.html&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;retroharmonize_labelled_spss_survey&lt;/a&gt;
class, inherited from &lt;code&gt;haven_labelled_spss&lt;/code&gt; in
&lt;a href=&#34;https://haven.tidyverse.org/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;haven&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;We reduced our single and multiple choice questions to binary choice
variables. We can now give them a numeric representation. Be mindful
that &lt;code&gt;retroharmonize&lt;/code&gt; has special methods for its special labeled class
that retains metadata from SPSS. This means that &lt;code&gt;as_character&lt;/code&gt; and
&lt;code&gt;as_numeric&lt;/code&gt; know how to handle various types of missing values,
whereas the base R &lt;code&gt;as.character&lt;/code&gt; and &lt;code&gt;as.numeric&lt;/code&gt; may coerce special
values to unwanted results. This is particularly dangerous with numeric
variables, which is the reason why we introduced a new set of S3
classes and methods in the package.&lt;/p&gt;
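&lt;p&gt;The danger can be illustrated with a minimal base R sketch (the values and missing codes below are hypothetical, not taken from the survey files): naive coercion keeps SPSS-style missing codes as ordinary numbers, while mapping the declared missing codes to &lt;code&gt;NA&lt;/code&gt; first, as the retroharmonize methods do for their labeled class, keeps the analysis honest.&lt;/p&gt;

```r
# Minimal base-R sketch with hypothetical values: why declared missing codes
# must become NA before any numeric analysis. SPSS-style surveys often store
# missingness as large numeric codes such as 99997 or 99999.
x = c(1, 0, 99999, 1, 99997)              # two missing codes hide among 0/1 answers
na_values = c(99997, 99998, 99999)

mean(as.numeric(x))                        # naive coercion keeps the codes and inflates the mean
x_safe = ifelse(x %in% na_values, NA, x)   # map declared missing codes to NA first
mean(x_safe, na.rm = TRUE)                 # only the real 0/1 answers count
```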
&lt;p&gt;We will ignore the differences between the various forms of missingness,
i.e. whether the person said that she did not know, did not want to answer,
or for some reason was not asked in the survey. In a more descriptive,
non-harmonized analysis you would probably want to explore them as
separate categories and use a character representation.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;hw &amp;lt;- lapply ( hw, function(x) x %&amp;gt;% mutate ( across ( contains(&amp;quot;problem&amp;quot;), as_numeric) ))

hw &amp;lt;- Reduce ( full_join, hw) # we must use joins instead of binds because the number of columns varies.
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Let’s see what we have:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;set.seed(2021)
sample_n (hw, 12)

## # A tibble: 12 x 3
##    rowid             serious_world_problems_fi~ serious_world_problems_climate_~
##    &amp;lt;chr&amp;gt;                                  &amp;lt;dbl&amp;gt;                            &amp;lt;dbl&amp;gt;
##  1 ZA6595_v3-0-0_23~                          0                               NA
##  2 ZA7572_v1-0-0_70~                          0                                0
##  3 ZA6595_v3-0-0_18~                          0                               NA
##  4 ZA6861_v1-2-0_27~                          0                                0
##  5 ZA6595_v3-0-0_26~                          0                               NA
##  6 ZA7572_v1-0-0_19~                          0                                1
##  7 ZA5877_v2-0-0_16~                          0                                0
##  8 ZA6861_v1-2-0_12~                          0                                0
##  9 ZA7572_v1-0-0_17~                          0                                0
## 10 ZA5877_v2-0-0_17~                          0                                1
## 11 ZA6861_v1-2-0_41~                          0                                0
## 12 ZA6861_v1-2-0_61~                          0                                1
&lt;/code&gt;&lt;/pre&gt;
&lt;h2 id=&#34;creating-the-longitudional-table&#34;&gt;Creating the Longitudinal Table&lt;/h2&gt;
&lt;p&gt;Now we just need to join the partial tables together by &lt;code&gt;rowid&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# start from the smallest table (we removed the surveys that had no relevant questionnaire item)
panel &amp;lt;- hw %&amp;gt;%
  left_join ( geography, by = &#39;rowid&#39; ) 

panel &amp;lt;- panel %&amp;gt;%
  left_join ( demography, by = c(&amp;quot;rowid&amp;quot;, &amp;quot;isocntry&amp;quot;) ) 

panel &amp;lt;- panel %&amp;gt;%
  left_join ( interview_dates, by = &#39;rowid&#39; )
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;And let’s see a small sample:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;sample_n(panel, 12)

## # A tibble: 12 x 19
##    rowid  serious_world_pr~ serious_world_pr~ isocntry geo   region    w1    wex
##    &amp;lt;chr&amp;gt;              &amp;lt;dbl&amp;gt;             &amp;lt;dbl&amp;gt; &amp;lt;chr&amp;gt;    &amp;lt;chr&amp;gt; &amp;lt;chr&amp;gt;  &amp;lt;dbl&amp;gt;  &amp;lt;dbl&amp;gt;
##  1 ZA686~                 0                 0 ES       ES41  Casti~ 1.21  46787.
##  2 ZA686~                 0                 0 RO       RO31  South~ 0.724 11805.
##  3 ZA686~                 0                 0 SK       SK02  Zapad~ 0.774  3499.
##  4 ZA757~                 0                 1 PT       PT16  Centr~ 1.11   9336.
##  5 ZA659~                 1                NA HR       HR041 Grad ~ 0.580  2098.
##  6 ZA659~                 1                NA RO       RO21  North~ 1.21  20160.
##  7 ZA686~                 0                 0 PT       PT17  Lisboa 0.932  7448.
##  8 ZA659~                 0                NA GB-GBN   UKI   London 0.994 50133.
##  9 ZA757~                 0                 0 CY       CY    REPUB~ 0.594   874.
## 10 ZA686~                 0                 0 LT       LT003 Klaip~ 0.623  1564.
## 11 ZA757~                 0                 0 IE       IE013 West ~ 0.490  1651.
## 12 ZA659~                 0                NA LT       LT003 Klaip~ 1.16   2917.
## # ... with 11 more variables: marital_status &amp;lt;chr&amp;gt;, age_education &amp;lt;chr&amp;gt;,
## #   age_exact &amp;lt;dbl&amp;gt;, occupation_of_respondent &amp;lt;chr&amp;gt;,
## #   occupation_of_respondent_recoded &amp;lt;chr&amp;gt;,
## #   respondent_occupation_scale_c_14 &amp;lt;chr&amp;gt;, type_of_community &amp;lt;chr&amp;gt;,
## #   is_student &amp;lt;dbl&amp;gt;, no_education &amp;lt;dbl&amp;gt;, education &amp;lt;dbl&amp;gt;,
## #   date_of_interview &amp;lt;date&amp;gt;

saveRDS ( panel, file.path(tempdir(), &amp;quot;climate_panel.rds&amp;quot;), version = 2)

# not evaluated
saveRDS( panel, file = file.path(&amp;quot;data-raw&amp;quot;, &amp;quot;climate-panel.rds&amp;quot;), version=2)
&lt;/code&gt;&lt;/pre&gt;
&lt;h2 id=&#34;putting-it-on-a-map&#34;&gt;Putting It on a Map&lt;/h2&gt;
&lt;p&gt;This is not the end of the story. If you put all this on a map, the
results are a bit disappointing.&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;featured.png&#34; width=&#34;660&#34; /&gt;&lt;/p&gt;
&lt;p&gt;Why? Because sub-national (provincial, state, county, district, parish)
borders change all the time, within the EU and everywhere else. The
next step is to harmonize the geographical information. We have another
CRAN-released package to help you with this. See the next post: &lt;a href=&#34;https://rpubs.com/antaldaniel/regions-OOD21&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Regional
Climate Change Awareness
Dataset&lt;/a&gt;.&lt;/p&gt;
</description>
    </item>
    
  </channel>
</rss>
