Loading......

Password

Section 1. R Environment Quick Start & First Program

Part 1: R Language Environment Quick Start

1.1 R Language Interface Walkthrough

Basic Command Execution
- Direct command input demonstration:
```
print("Hello World")
2 + 3
sqrt(25)
```
Essential Shortcuts
- ↑/↓ Arrow keys: Cycle through command history
- Tab: Auto-complete for:
  - Function names
  - Variable names
  - File paths
- Ctrl + C: Interrupt current command
- Ctrl + L: Clear console screen

1.2 Environment Configuration Check

Verify R version:
```
R.version.string
```
Check working directory:
```
getwd()
```
List installed packages:
```
installed.packages()
```

Part 2: Your First R Program

2.1 Script File Management

File Operations Workflow
1. Create new file: nano first_program.R
2. Edit content:
```
# My First R Script
message <- "Welcome to Data Analysis!"
print(message)
```
3. Save: Ctrl+O → Enter → Ctrl+X
4. Execute: Rscript first_program.R
File Organization Best Practices
- Use consistent naming: lowercase_with_underscores
- Create dedicated project folders
- Maintain version control basics:
  - v01_initial_script.R
  - v02_updated_script.R

2.2 Code Documentation Standards

Effective Commenting

Section headers:

# Data Cleaning Section ------------------------

Function explanations:

# Calculates Euclidean distance between two points
# Input: Numeric vectors x and y
# Output: Scalar distance value

TODO notes:
```
# TODO: Implement error handling here
```

Debugging Annotation

# TEST: Verify input dimensions
print(dim(data_matrix))

2.3 Error Diagnosis Fundamentals

Error Type	Examples	Quick Checks
Syntax Errors	Missing `)`, `}`	Color highlighting in editor
Runtime Errors	`object 'var' not found`	`ls()` to list variables
Logical Errors	Incorrect algorithm implementation	Add `print()` statements
Package Errors	`there is no package called...`	Verify installation with `install.packages()`

Supplemental Materials

AI-Assisted Learning Tips

When encountering errors:
- Copy-paste full error message to AI tools
- Example prompt: “I get ‘Error: unexpected symbol in…’ when running my R script. Here’s my code: [code snippet]”
For concept clarification:
- Ask: “Explain R environments to a complete beginner”
- Request: “Show comparison between vectors and lists in R”

Recommended Practice Workflow

Write code in script file
Execute through terminal
Encounter error → analyze message
Consult AI helper with:
- Exact error message
- Relevant code section
- Session info (sessionInfo())

Critical Linux-R Integration

Server file management:
- Transfer files using Termius SFTP
- Navigate directories with cd
- Verify file permissions: ls -l script.R

Section 2. Variables & Data Types in R

1. Variable Fundamentals

Naming Convention

# Valid names
my_var <- 5
Var2 <- "text"
.height <- 7.2  # Hidden variable

# Invalid names
2var <- 10      # Error: Starts with number
if <- 20        # Error: Reserved word

Key Rules:

Case sensitive: myVar ≠ myvar
Avoid:
- Reserved words (if, function, NULL)
- Special symbols (except . and _)
Best practice: Use snake_case (my_variable)

Assignment Operators

x <- 5   # Recommended style
y = 10   # Works but less common
25 -> z  # Right assignment (rare)

# Scope differences in functions:
median(x = 1:10)    # Temporary assignment
median(x <- 1:10)   # Creates permanent object

Community Preference:

<- for object creation
= for function arguments

Variable Management

# List objects
ls()                # All variables
ls(pattern = "^x")  # Filter by pattern

# Remove variables
rm(x)               # Remove single
rm(list = ls())     # Clear environment

# Environment comparison
.GlobalEnv         # Default workspace
search()           # Attached packages

2. Core Data Types

Numeric Types

# Default is double
num1 <- 15      # Type: double
num2 <- 15L     # Explicit integer

# Type checking
typeof(num1)    # "double"
is.integer(num2) # TRUE

# Scientific notation
1.2e3 == 1200   # TRUE

Character Type

# Quote variations
str1 <- "Hello's World"   # Double quotes
str2 <- 'He said "Hi"'    # Single quotes

# Escape characters
path <- "C:\\Documents\\file.txt" 
cat(path)  # C:\Documents\file.txt

# Concatenation
paste("RNA", "seq", sep = "-")  # "RNA-seq"

Logical Type

# Basic logic
is_valid <- TRUE
is_empty <- FALSE

# Quick input
T <- FALSE  # Dangerous! (T is redefined)
TRUE <- 5   # Error: Reserved word

# Best practice: Always spell out
valid <- TRUE  

Type Conversion

# Explicit conversion
as.numeric("123")     # 123
as.character(45.7)    # "45.7"
as.integer(TRUE)      # 1

# Automatic conversion
sum(c(TRUE, FALSE))   # 1 (logical → numeric)
"Value:" + 5          # Error (implicit failure)

Watch for:

as.numeric("ABC")     # NA with warning

3. Special Values

Missing Values (NA)

# Detection
x <- c(1, NA, 3)
is.na(x)  # FALSE TRUE FALSE

# Operations
sum(x)               # NA
sum(x, na.rm=TRUE)   # 4

# Special cases
NA == NA   # NA (not TRUE)

NULL vs NaN vs Inf

# NULL (empty object)
length(NULL)  # 0
x <- NULL

# NaN (Not a Number)
0/0            # NaN
sqrt(-1)       # NaN with warning

# Inf/-Inf
1/0            # Inf
log(0)         # -Inf

Checking Methods:

is.null(x)     # For NULL
is.nan(y)      # For NaN
is.infinite(z) # For Inf/-Inf

Section 3. R Operator System

1. Arithmetic Operators

Modulus Operator `%%`

# Check divisibility
9 %% 3  # Returns 0
7 %% 2  # Returns 1

# Practical application: Batch processing
i=1
while(i<=10000){
    if(i %% 100 == 0){
        print(paste("Processing batch", i/10000))
    }
    Sys.sleep(0.001)
    i=i+1
}

Exponentiation

2^3    # 8
2**3   # 8 (equivalent)

Key Note: Always use ^ for better code readability in R

2. Comparison Operators

Floating Point Precision Warning

0.1 + 0.2 == 0.3         # FALSE
all.equal(0.1+0.2, 0.3)  # TRUE

Rule: Never use == with floating numbers

Vectorized Comparisons (Preview)

c(1,3,5) > 2  # Returns FALSE TRUE TRUE

Special Note: Will be crucial for matrix operations (Lesson 04)

3. Logical Operators

Element-wise vs Single-element

# & (vectorized)
c(TRUE, FALSE) & c(FALSE, FALSE)  # FALSE FALSE

# && (single-element)
c(TRUE, FALSE) && c(FALSE, FALSE) # FALSE

Common Pitfall

x <- 5
x > 3 & x < 10
x > 3 && x < 10

x <- c(5,7)
x > 3 & x < 10
x > 3 && x < 10

Short-circuit Evaluation

# || stops at first TRUE
ppp = function(){
    print('ok')
    return(FALSE)}

TRUE || ppp()
FALSE || ppp()

4. Operator Precedence

Essential Priority Levels

() - Parentheses
^ - Exponentiation
%% - Modulus
* / - Multiplication/Division
+ - - Addition/Subtraction
> < - Comparisons
! - NOT
& && - AND
| || - OR
<- - Assignment

Critical Example

3 + 5 * 2 ^ 2  # = 3 + (5*(2^2)) = 23

Best Practices

Always use parentheses for complex expressions
Use spaces around operators: x <- 5 + (a & b)
Prefer &/| over &&/|| unless specifically needed
Use all.equal() instead of == for numeric comparisons

Section 4. String Manipulation in R

1. Basic String Operations

Concatenation Functions

# paste() with separator
paste("Hello", "World", sep = " ")  # "Hello World"

# paste0() for direct concatenation
paste0("File_", 1:3, ".txt")  # "File_1.txt" "File_2.txt" "File_3.txt"

**Key Difference:**
- `paste()`: Default space separator, customizable
- `paste0()`: No separator (equivalent to paste(..., sep=""))

String Splitting

# Basic splitting
result <- strsplit("apple,orange,banana", ",")
# Returns list: [[1]] [1] "apple"  "orange" "banana"

# Accessing elements
result[[1]][2]  # "orange"

**Important Characteristics:**
- Always returns a *list* structure
- Use `unlist()` to convert to vector: 
  unlist(strsplit("a-b-c", "-"))  # [1] "a" "b" "c"

2. Regular Expressions Basics

Pattern Matching with grep()

# Find matching elements
colors <- c("red", "blue", "green2", "yellow3")
grep("\\d", colors)  # Returns positions: 3 4

# Get actual values
grep("\\d", colors, value=TRUE)  # "green2" "yellow3"

**Common Flags:**
- `ignore.case=TRUE`: Case-insensitive matching
- `invert=TRUE`: Return non-matching elements

Regular Expression Special Characters

Character	Meaning	Example
`^`	Start of string	`^A` matches “Apple” not “aPple”
`$`	End of string	`d$` matches “end” not “ending”
`.`	Any single character	`a.c` matches “abc”, “a2c”
`*`	Zero or more repeats	`lo*` matches “l”, “lo”, “loo”

Example:

grep("^[A-Z]", c("Apple", "banana"))  # 1 (matches first element)

3. Practical Tips & Tricks

Multi-line Strings

# Good practice
long_text <- "This is a multi-line
string in R. Use explicit newline
characters for clarity."

# Problematic approach
bad_text <- "This might cause 
             unexpected indentation"

Special Character Escaping

# Common escape sequences
cat("Line1\nLine2")   # New line
cat("Column1\tColumn2")  # Tab
cat("C:\\Program Files\\R")  # Backslash

**Essential Escapes:**
- `\n`: New line
- `\t`: Tab
- `\\`: Literal backslash
- `\"`: Literal double quote

String Length Pitfalls

# Handling NA values
nchar(NA)         # Returns NA
nchar("NA")       # Returns 2

# Safe approach
nchar(na.omit(c("text", NA)))  # Length: 4

**Best Practice:**
- Always pre-process NA values before using nchar()
- Consider `stringi::stri_length()` for alternative NA handling

Section 5. Control Flow in R Programming

1. Conditional Statements

1.1 Nested if-else Structure

# Proper nested formatting example
a=5
if (a<1) {
  print('less than 1')
} else if (a<4) {
  print('a>=1 and a<4')
} else {
  print('a>=4')
}

Key Points:

Always use curly braces {} even for single-line statements
Maintain consistent indentation (2/4 spaces)
Limit nesting depth (<3 levels recommended)

1.2 Vectorized ifelse()

x=c(5,7)
# Traditional approach
result <- vector()
for (i in 1:length(x)) {
  result[i] <- if (x[i] > 0) "Positive" else "Negative"
}

# Vectorized approach
result <- ifelse(x > 0, "Positive", "Negative")

Advantages:

300% faster for large datasets (demonstrate with system.time())
More readable & concise syntax
Automatic type conversion handling

1.3 switch() for Multiple Branches

# Basic syntax
operation <- function(op, x, y) {
  switch(op,
         "add" = x + y,
         "sub" = x - y,
         "mul" = x * y,
         stop("Invalid operation"))
}

Use Cases:

Menu-driven programs
State machine implementations
Alternative to long if-else chains

2. Loop Structures

2.1 for Loop Variations

# Index styles comparison
for (i in 1:5) {...}          # Basic numeric range
for (i in seq_along(vec)) {...} # Safer for empty vectors
for (item in vector) {...}    # Direct element access

Protection Tips:

Always test length(vector) > 0 before looping
Use seq_along() instead of 1:length() to handle empty vectors
Consider purrr::map() for functional programming

2.2 While Loop Safety

# Safe while template
counter <- 0
while (condition && counter < max_iter) {
  # loop body
  counter <- counter + 1
}

Essential Checks:

Define maximum iterations (e.g., max_iter = 1e6)
Include fail-safe condition checks
Update control variables at loop END

2.3 Loop Control Flow

for (i in 1:10) {
  if (i %% 2 == 0) next  # Skip even numbers
  if (i > 7) break       # Early exit
  print(i)
}

When to Use:

next: Skip invalid/irrelevant cases
break: Error conditions met, or early completion
Use sparingly to maintain code readability

3. Best Practices

3.1 Defensive Programming

# Before loop
if (length(data) == 0) {
  stop("Empty input dataset")
}

# In while loops
timeout <- Sys.time() + 300  # 5-minute timeout
while (condition) {
  if (Sys.time() > timeout) {
    stop("Execution timeout reached")
  }
  # ...
}

3.2 Vectorization Techniques

# Loop approach
total <- 0
for (num in numbers) {
  total <- total + num
}

# Vectorized approach
total <- sum(numbers)

# Benchmark comparison (1M elements):
- Loop: 0.15s
- Vectorized: 0.0001s

Key Functions:

apply() family
rowSums()/colMeans()
pmap()/map_dbl() from purrr

3.3 Memory Pre-allocation

# Poor practice
result <- c()
for (i in 1:1e5) {
  result <- c(result, i^2)
}

# Proper pre-allocation
result <- numeric(1e5)
for (i in 1:1e5) {
  result[i] <- i^2
}

Performance Gain:

10,000 elements: 100x faster
1M elements: Avoids 1M memory reallocations

Recommended Practice

# AI-Assisted Exercise
"Ask ChatGPT/Kimi to help write a function that:
1. Takes numeric vector input
2. Uses ifelse() to replace negative values with NA
3. Calculates mean with vectorized operations
4. Includes defensive programming checks"

AI Tips

When Stuck:

Try explaining your goal to AI in simple words
Ask for R code examples with:
- Input data format
- Desired output
Request debugging help with error messages

Example Prompt:
“Help me convert this for-loop to while-loop:
for (i in 1:length(x)) { print(i) }”