Recoding and labeling variables

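judges$sex.politics <- NA   # initialize the column so every row has a value (NA where no rule applies)
judges$sex.politics[judges$sex=="Male" & judges$republican=="0"]<-1
judges$sex.politics[judges$sex=="Male" & judges$republican=="1"]<-2
judges$sex.politics[judges$sex=="Female" & judges$republican=="0"]<-3
judges$sex.politics[judges$sex=="Female" & judges$republican=="1"]<-4

# levels=1:4 fixes the code-to-label mapping even if a category is absent from the data
judges$sex.politics=factor(judges$sex.politics, levels=1:4, labels=c("M-dem", "M-rep", "F-dem", "F-rep"))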
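The same recoding can be done in Stata with `gen`/`replace` plus a value label, which plays the role of R's `factor(..., labels=)`. This is a minimal sketch on made-up data. Stata variable names cannot contain dots, so the hypothetical name `sex_politics` stands in for `sex.politics`, and `republican` is assumed to be numeric 0/1.

```stata
clear
* Made-up example data: sex as a string, republican as 0/1
input str6 sex byte republican
"Male"   0
"Male"   1
"Female" 0
"Female" 1
"Female" 1
end

* Same coding as the R example above
gen sex_politics = .
replace sex_politics = 1 if sex == "Male"   & republican == 0
replace sex_politics = 2 if sex == "Male"   & republican == 1
replace sex_politics = 3 if sex == "Female" & republican == 0
replace sex_politics = 4 if sex == "Female" & republican == 1

* Attach labels to the codes (hyphenated labels must be quoted)
label define sexpol 1 "M-dem" 2 "M-rep" 3 "F-dem" 4 "F-rep"
label values sex_politics sexpol
tabulate sex_politics
```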

Creating a counting variable in Stata

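If we are working with clinical data in long format (one row per visit), we might want to create a "counting" variable that serves as the visit number.
The first step is to make sure that every patient has a unique numeric code.
In my case I start with the encode command, which turns the string variable patientname into a numeric variable with one labeled code per distinct name. Then we sort by id and generate the counting variable visit:
encode patientname, gen(id)       // one numeric code per patient name
bysort id: gen visit = _n         // visit = 1, 2, 3, ... within each patient
Note that bysort id: on its own leaves the order of rows within a patient arbitrary. If the data contain a visit date, use bysort id (visit_date): instead (visit_date is a placeholder for your own date variable).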
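A small sketch of the full workflow, assuming the data also carry a visit date stored as a string. The variables `visitdate_s`, `visitdate` and `nvisits` and the patient names are hypothetical. Putting the date in parentheses after `bysort id` sorts within patient without making it part of the grouping, so `visit` follows calendar order:

```stata
clear
* Hypothetical long-format data: one row per visit, not in date order
input str10 patientname str9 visitdate_s
"Smith" "03feb2020"
"Jones" "15jan2020"
"Smith" "10jan2020"
"Jones" "20mar2020"
"Smith" "01mar2020"
end

encode patientname, gen(id)                  // Jones = 1, Smith = 2 (alphabetical)
gen visitdate = date(visitdate_s, "DMY")     // string -> daily date
format visitdate %td

bysort id (visitdate): gen visit = _n        // visit number in date order
bysort id: gen nvisits = _N                  // total visits per patient
list id visitdate visit nvisits, sepby(id)
```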

Working with dates

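# How to set and format dates

gen new_date_variable = date(date_variable, "YMD")   // e.g. "2015-03-21" -> daily date
format new_date_variable %td

or, if the string stores day, month and year in that order:

gen new_date_variable = date(date_variable, "DMY")   // e.g. "21/03/2015" -> daily date
format new_date_variable %td

date() always returns a daily date (days since 01jan1960), so it takes %td. The other formats in the table below expect values already in their own units (e.g. weeks since 1960w1), so a daily date must first be converted, for example with wofd(), before it is formatted %tw.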

Format  Unit       How the value 0 is displayed
%tc     Datetime   01jan1960 00:00:00
%td     Daily      01jan1960
%tw     Weekly     1960w1
%tm     Monthly    1960m1
%tq     Quarterly  1960q1
%ty     Yearly     0
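A minimal sketch of converting one daily date into the other units in the table and applying the matching format. The input string and the variable names `week`, `month`, `quarter` and `year` are made up for illustration. `wofd()`, `mofd()`, `qofd()` and `yofd()` are Stata's daily-to-weekly/monthly/quarterly/yearly conversion functions:

```stata
clear
set obs 1
gen str8 date_variable = "20150321"
gen new_date_variable = date(date_variable, "YMD")
format new_date_variable %td          // displays 21mar2015

* Convert the daily date to coarser units, then format each one
gen week    = wofd(new_date_variable)
gen month   = mofd(new_date_variable)
gen quarter = qofd(new_date_variable)
gen year    = yofd(new_date_variable)
format week %tw
format month %tm
format quarter %tq
format year %ty
list
```

Formatting `new_date_variable` directly with %tw would not do this conversion. Stata would read the day count as a week count and display a date far in the future.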
## How to use if conditions with dates

gen season=.
* Missing dates count as larger than any date in Stata, so the open-ended last winter clause needs !missing()
* Boundary days (e.g. 21 March) satisfy two conditions; the replace that runs last wins
replace season=1 if (Injury_Date >= date("01012015","DMY") & Injury_Date <= date("21032015","DMY")) | (Injury_Date >= date("21122015","DMY") & Injury_Date <= date("21032016","DMY")) | (Injury_Date >= date("21122016","DMY") & Injury_Date <= date("21032017","DMY")) | (Injury_Date >= date("21122017","DMY") & !missing(Injury_Date))
replace season=2 if (Injury_Date >= date("21032015","DMY") & Injury_Date <= date("21062015","DMY")) | (Injury_Date >= date("21032016","DMY") & Injury_Date <= date("21062016","DMY")) | (Injury_Date >= date("21032017","DMY") & Injury_Date <= date("21062017","DMY"))
replace season=3 if (Injury_Date >= date("21062015","DMY") & Injury_Date <= date("21092015","DMY")) | (Injury_Date >= date("21062016","DMY") & Injury_Date <= date("21092016","DMY")) | (Injury_Date >= date("21062017","DMY") & Injury_Date <= date("21092017","DMY"))
replace season=4 if (Injury_Date >= date("21092015","DMY") & Injury_Date <= date("21122015","DMY")) | (Injury_Date >= date("21092016","DMY") & Injury_Date <= date("21122016","DMY")) | (Injury_Date >= date("21092017","DMY") & Injury_Date <= date("21122017","DMY"))

label define season 1 winter 2 spring 3 summer 4 fall

label values season season
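A more compact sketch, assuming the season boundaries are the same every year. Each date is coded by month and day (the hypothetical variable `mmdd`, e.g. 321 for 21 March), and the sketch applies the effective rule the lines above produce: boundary days go to the later season, except 21 December, which ends up in fall because the fall replace runs last. Unlike the explicit year-by-year ranges, it also covers years outside 2015-2017 and leaves missing dates missing. The variable `inj` and the example dates are made up.

```stata
clear
input str9 inj
"15jan2015"
"21mar2016"
"04jul2017"
"21dec2017"
"25dec2017"
""
end
gen Injury_Date = date(inj, "DMY")
format Injury_Date %td

* Month-day code, e.g. 21 March -> 321; a missing date gives a missing code
gen mmdd = 100*month(Injury_Date) + day(Injury_Date)

gen season = .
* Missing counts as larger than any number, so exclude it explicitly here
replace season = 1 if (mmdd >= 1222 | mmdd < 321) & !missing(mmdd)
replace season = 2 if mmdd >= 321 & mmdd < 621
replace season = 3 if mmdd >= 621 & mmdd < 921
replace season = 4 if mmdd >= 921 & mmdd <= 1221

label define season 1 winter 2 spring 3 summer 4 fall
label values season season
list Injury_Date season
```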

How to model using dates

Suppose we want to fit a fourth-order autoregressive, AR(4), model:

$$y_t = \mu + \beta_1 y_{t-1} + \beta_2 y_{t-2} + \beta_3 y_{t-3} + \beta_4 y_{t-4} + \varepsilon_t$$

There are two ways to create the lagged variables. The first is explicit subscripting, which assumes the data are sorted by time with no gaps:
gen y1 = y[_n-1]
gen y2 = y[_n-2]
gen y3 = y[_n-3]
gen y4 = y[_n-4]

regress y y1 y2 y3 y4

The second is to declare the data as a time series with tsset (e.g. tsset t, where t is your time variable) and then use the lag operator L:

regress y L.y L2.y L3.y L4.y

or
regress L(0/4).y
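A sketch that simulates a series and checks that both ways of building the lags give the same estimates. The seed, the AR coefficients 0.5 and -0.2, and the names `t`, `b1_manual` and `b1_ts` are arbitrary choices for the illustration. The operator form is written here as `regress y L(1/4).y`, which names the dependent variable explicitly.

```stata
clear
set seed 12345
set obs 200
gen t = _n
tsset t

* Simulated AR(2) series, for illustration only; explicit subscripts let
* replace build each value from the ones it has already computed
gen y = rnormal() in 1/2
replace y = 0.5*y[_n-1] - 0.2*y[_n-2] + rnormal() in 3/l

* Way 1: explicit subscripting (requires data sorted by t with no gaps)
sort t
gen y1 = y[_n-1]
gen y2 = y[_n-2]
gen y3 = y[_n-3]
gen y4 = y[_n-4]
regress y y1 y2 y3 y4
scalar b1_manual = _b[y1]

* Way 2: time-series operators, available after tsset
regress y L(1/4).y
scalar b1_ts = _b[L.y]

display "difference in the lag-1 coefficient: " b1_manual - b1_ts
```

Both regressions lose the first four observations to missing lags, so they use the same 196 observations. The operator version has one advantage: it respects gaps in `t`, whereas `y[_n-1]` simply takes the previous row.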

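The lead operator F is the counterpart of L: F.y is $y_{t+1}$.
The difference operator D gives the difference between adjacent observations: D.y is $\Delta y_t = y_t - y_{t-1}$

and

D2.y is the second difference, $\Delta^2 y_t = \Delta y_t - \Delta y_{t-1} = y_t - 2y_{t-1} + y_{t-2}$. The plain difference $y_t - y_{t-2}$ is the seasonal difference S2.y.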
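A quick way to see what each operator computes is to apply it to a series with a known pattern. This sketch uses the made-up series $y_t = t^2$. Its first difference grows by 2 each period, so its second difference is a constant 2, while the seasonal difference S2.y is the gap between values two periods apart. The variable names `lead1`, `diff1`, `diff2` and `sdiff2` are hypothetical.

```stata
clear
input t y
1 1
2 4
3 9
4 16
5 25
end
tsset t

gen lead1  = F.y     // y(t+1)
gen diff1  = D.y     // y(t) - y(t-1)
gen diff2  = D2.y    // (y(t) - y(t-1)) - (y(t-1) - y(t-2))
gen sdiff2 = S2.y    // y(t) - y(t-2)
list
```

At t = 3, for example, F.y is 16, D.y is 5, D2.y is 2 and S2.y is 8. F.y is missing in the last period, and the lagged operators are missing at the start.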



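Nonlinear least-squares estimation and imposing constraints

When a linear regression is not possible (for example, when the model is nonlinear in its parameters), we can fit the equation by nonlinear least squares with nl. nl also accepts a linear specification, such as: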


nl (y={b0}+{b1}*x1+{b2}*x2)

Since nl doesn't handle missing data, we first drop every observation with a missing value in any of the model's variables:

drop if missing(y, x1, x2)    // unlike ==., missing() also catches extended missing values such as .a

In many cases the estimates from nl match those from regress: for a model that is linear in its parameters, both minimize the same sum of squared residuals. The real point of nl, however, is its ability to impose constraints, including nonlinear ones. To demonstrate, we can impose an arbitrary constraint here:

nl (y={b0}+{b1}*x1+{b1}*x2)
*note that using the same parameter {b1} twice constrains the coefficients of x1 and x2 to be equal
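A sketch on simulated data comparing `regress` with `nl` for the same linear model, followed by the constrained `nl` fit. The data-generating coefficients, the seed, the ten artificially missing outcomes, and the names `b1_ols`, `b_nl` and `b1_nl` are assumptions made for the illustration. `nl` stores its parameters in `e(b)` in the order they first appear in the expression, so column 2 is `b1`.

```stata
clear
set seed 2024
set obs 500
gen x1 = rnormal()
gen x2 = rnormal()
gen y = 1 + 2*x1 + 0.5*x2 + rnormal()
replace y = . in 1/10                 // a few missing outcomes, for illustration

drop if missing(y, x1, x2)

* Linear model by OLS ...
regress y x1 x2
scalar b1_ols = _b[x1]

* ... and the same model by nonlinear least squares
nl (y = {b0} + {b1}*x1 + {b2}*x2)
matrix b_nl = e(b)                    // columns: b0, b1, b2
scalar b1_nl = b_nl[1,2]
display "OLS b1 = " b1_ols "   nl b1 = " b1_nl

* Constrained fit: reusing {b1} forces equal coefficients on x1 and x2
nl (y = {b0} + {b1}*x1 + {b1}*x2)
```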

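More complex linear constraints, such as x1 + x2 = 1, can be stored with the constraint command and then used by estimation commands that accept a constraints() option (for example, cnsreg y x1 x2, constraints(1)):
constraint define 1 x1=x2
constraint list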
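As a cross-check, the stored constraint can be passed to `cnsreg` (constrained linear regression). Because `cnsreg` and the constrained `nl` model both fit least squares with equal coefficients on `x1` and `x2`, they should give the same point estimate. The simulated data and the scalar names `b_cns` and `b_nl` are assumptions for this sketch. Note that `constraint define 1` replaces any constraint 1 already defined in the session.

```stata
clear
set seed 2024
set obs 500
gen x1 = rnormal()
gen x2 = rnormal()
gen y = 1 + 2*x1 + 0.5*x2 + rnormal()

constraint define 1 x1 = x2
constraint list

* Constrained OLS using the stored constraint
cnsreg y x1 x2, constraints(1)
scalar b_cns = _b[x1]

* Same restriction expressed directly in nl (parameters: b0, b1)
nl (y = {b0} + {b1}*x1 + {b1}*x2)
matrix bmat = e(b)
scalar b_nl = bmat[1,2]
display "cnsreg: " b_cns "   nl: " b_nl
```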

Using the quantiles command instead of cut

quantiles f1, gen (assets_quartiles) nquant(4)

Instead of

egen assets_quartiles=cut(f1), group(4)

*quantiles seems to produce a more precise cut than egen's cut() function
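For comparison, official Stata also has `xtile`, which creates quantile groups directly. This sketch swaps in `xtile` in place of the `quantiles` command discussed above. It shows the `egen, cut()` and `xtile` versions side by side on made-up uniform data. One practical difference to watch: `egen, cut() group(4)` codes the groups 0-3, while `xtile` codes them 1-4. The names `q_cut` and `q_xtile` are hypothetical.

```stata
clear
set seed 7
set obs 100
gen f1 = runiform()

* egen's cut() with group(4): equal-frequency groups coded 0, 1, 2, 3
egen q_cut = cut(f1), group(4)

* Official xtile: quartile groups coded 1, 2, 3, 4
xtile q_xtile = f1, nquantiles(4)

tabulate q_cut q_xtile
```

With 100 distinct values, each quartile group should contain 25 observations. With many tied values, group sizes can differ between the methods, and that is where they are worth comparing.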

Household composition (part 1 & 2)

gen fhead=0
replace fhead=1 if q102==2 & q104==1
label var fhead "female head"

gen mhead=0
replace mhead=1 if q102==1 & q104==1
label var mhead "male head"

Repeat the same procedure to create the variables spouse, relative, nonrel (non-relative) and ownchild, plus totmem, which counts household members and equals 1 for every individual. totmem will not mean much at this stage, but it becomes very useful once you collapse your data to the household level.
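A compact sketch of the same indicators. In Stata, a logical expression in parentheses evaluates to 1 or 0, so `gen fhead = (q102 == 2 & q104 == 1)` replaces the `gen`/`replace` pair. The sex coding (1 = male, 2 = female) follows the lines above. The relationship codes for spouse (2) and own child (3) are hypothetical, so check your questionnaire. `relative` and `nonrel` are left out because their codes are not given here.

```stata
clear
* Made-up roster: one row per household member
input cluster hhnum q102 q104
1 1 1 1
1 1 2 2
1 1 2 3
1 2 2 1
1 2 1 3
end

* q102: 1 = male, 2 = female; q104: 1 = head, 2 = spouse, 3 = own child (hypothetical)
gen fhead    = (q102 == 2 & q104 == 1)
gen mhead    = (q102 == 1 & q104 == 1)
gen spouse   = (q104 == 2)
gen ownchild = (q104 == 3)
gen totmem   = 1                  // each row is one member; summed after collapse

label var fhead    "female head"
label var mhead    "male head"
label var spouse   "spouse"
label var ownchild "own child"
label var totmem   "household member"
list
```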


Part 2:
egen hhcompid = concat(cluster hhnum), punct(_)    // household ID such as "12_3" (cluster_hhnum)
collapse (sum) fhead mhead spouse relative nonrel ownchild totmem, by(hhcompid)
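A sketch of Part 2 on a tiny made-up roster where the indicators already exist. It builds the household ID with `egen concat()` and then sums the indicators to one row per household. `relative` and `nonrel` are omitted only to keep the example short.

```stata
clear
* Made-up member-level data with indicators already created
input cluster hhnum fhead mhead spouse ownchild totmem
1 1 0 1 0 0 1
1 1 0 0 1 0 1
1 1 0 0 0 1 1
1 2 1 0 0 0 1
1 2 0 0 0 1 1
end

* Household ID such as "1_1"; concat() converts numeric variables to strings
egen hhcompid = concat(cluster hhnum), punct(_)

* One row per household: number of members of each type
collapse (sum) fhead mhead spouse ownchild totmem, by(hhcompid)
list
```

After the collapse, `totmem` is the household size (3 and 2 here), and `fhead`/`mhead` flag the sex of the household head.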