Bültmann & Gerriets
Data Wrangling with R
by Brad Boehmke, Ph.D.
Publisher: Springer Nature Switzerland
Series: Use R!
E-Book / PDF
Copy protection: PDF with watermark

Note: After checkout, a download link is provided immediately. The link can then be opened on a PC, smartphone, or e-book reader.
E-books can be paid for via PayPal. If you would like to pay for e-books by invoice, please contact us.

ISBN: 978-3-319-45599-0
Edition: 1st ed. 2016
Published: 17 November 2016
Language: English
Length: 238 pages

Price: 85.59 €

Biographical note
Table of contents
Blurb

Brad Boehmke, Ph.D., is an Operations Research Analyst at Headquarters Air Force Materiel Command, Studies and Analyses Division.  He is also Assistant Professor in the Operational Sciences Department at the Air Force Institute of Technology.  Dr. Boehmke's research interests are in the areas of cost analysis, economic modeling, decision analysis, and developing applied modeling applications through the R statistical language.



1. Preface

2. Introduction

a. The Role of Data Wrangling

i. Introduction to R

1. Open Source

2. Flexibility

3. Community

ii. R Basics

1. Assignment & Evaluation

2. Vectorization

3. Getting help

4. Workspace

5. Working with packages

6. Style guide

3. Working with Different Types of Data in R

a. Dealing with Numbers

i. Integer vs. Double

ii. Generating sequence of non-random numbers

iii. Generating sequence of random numbers

iv. Setting the seed for reproducible random numbers

v. Comparing numeric values

vi. Rounding numbers

b. Dealing with Character Strings

i. Character string basics

ii. String manipulation with base R

iii. String manipulation with stringr

iv. Set operations for character strings

c. Dealing with Regular Expressions

i. Regex Syntax

ii. Regex Functions

iii. Additional resources

d. Dealing with Factors

i. Creating, converting & inspecting factors

ii. Ordering levels

iii. Revalue levels

iv. Dropping levels

e. Dealing with Dates

i. Getting current date & time

ii. Converting strings to dates

iii. Extract & manipulate parts of dates

iv. Creating date sequences

v. Calculations with dates

vi. Dealing with time zones & daylight savings

vii. Additional resources

4. Managing Data Structures in R

a. Data Structure Basics

i. Identifying the Structure

ii. Attributes

b. Managing Vectors

i. Creating

ii. Adding on to

iii. Adding attributes

iv. Subsetting

c. Managing Lists

i. Creating

ii. Adding on to

iii. Adding attributes

iv. Subsetting

d. Managing Matrices

i. Creating

ii. Adding on to

iii. Adding attributes

iv. Subsetting

e. Managing Data Frames

i. Creating

ii. Adding on to

iii. Adding attributes

iv. Subsetting

f. Dealing with Missing Values

i. Testing for missing values

ii. Recoding missing values

iii. Excluding missing values

5. Importing, Scraping, and Exporting Data with R

a. Importing Data

i. Reading data from text files

ii. Reading data from Excel files

iii. Load data from saved R object file

iv. Additional resources

b. Scraping Data

i. Importing tabular and Excel files stored online

ii. Scraping HTML text

iii. Scraping HTML table data

iv. Working with APIs

v. Additional Resources

c. Exporting Data

i. Writing data to text files

ii. Writing data to Excel files

iii. Saving data as an R object file

iv. Additional resources

6. Creating Efficient & Readable Code in R

a. Functions

i. Function Components

ii. Arguments

iii. Scoping Rules

iv. Lazy Evaluation

v. Returning Multiple Outputs from a Function

vi. Dealing with Invalid Parameters

vii. Saving and Sourcing Functions

viii. Additional Resources

b. Loop Control Statements

i. Basic control statements (i.e. if, for, while, etc.)

ii. Apply family

iii. Other useful "loop-like" functions

iv. Additional Resources

c. Simplify Your Code with %>%

i. Pipe (%>%) Operator

ii. Additional Functions

iii. Additional Pipe Operators

iv. Additional Resources

7. Shaping & Transforming Your Data with R

a. Reshaping Your Data with tidyr

i. Making wide data long

ii. Making long data wide

iii. Splitting a single column into multiple columns

iv. Combining multiple columns into a single column

v. Additional tidyr functions

vi. Sequencing your tidyr operations

vii. Additional resources

b. Transforming Your Data with dplyr

i. Selecting variables of interest

ii. Filtering rows

iii. Grouping data by categorical variables

iv. Performing summary statistics on variables

v. Arranging variables by value

vi. Joining datasets

vii. Creating new variables

viii. Additional resources



This guide for practicing statisticians, data scientists, and R users and programmers teaches the essentials of preprocessing data, leveraging the R programming language to quickly and easily turn noisy data into usable information. Data wrangling, also commonly called data munging, transformation, manipulation, or janitor work, can be a painstakingly laborious process. Roughly 80% of the time in a data analysis is spent on cleaning and preparing data; because this work is a prerequisite to the rest of the workflow (visualization, analysis, reporting), it is essential to become fluent and efficient in data wrangling techniques.

This book will guide the user through the data wrangling process via a step-by-step tutorial approach and provide a solid foundation for working with data in R. The author's goal is to teach the user how to easily wrangle data in order to spend more time on understanding the content of the data. By the end of the book, the user will have learned:

  • How to work with different types of data such as numerics, characters, regular expressions, factors, and dates
  • The differences between data structures and how to create, add components to, and subset each one
  • How to acquire and parse data from locations previously inaccessible
  • How to develop functions and use loop control structures to reduce code redundancy
  • How to use pipe operators to simplify code and make it more readable
  • How to reshape the layout of data and manipulate, summarize, and join data sets

