Channel: Statalist

Catplot - manual sort

Hello,

I am struggling to sort my catplot for the variable CCR_adjusted.
I know that with catplot I can sort in ascending or descending order, but I want a specific order (finance practice): CC, CCC-, CCC, CCC+, B-, B, B+, BB-, BB, BB+, BBB-, BBB, BBB+ (and if a rating is not in the data set, its label should still be shown).
I read that I may have to do it with egen's group() function and define my variables. But the problem is that the data values of my variable are the labels themselves. Do I have to create a new variable for each of the possible values and then plot those? What would that look like?

My code:
Code:
catplot CCR_adjusted, percent var1opts(sort(1) descending) recast(bar)
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str9 CCR_adjusted
"B"  
"BB-"
"B+"  
"B"  
"B"  
"B"  
"B+"  
"B"  
"B"  
"B-"  
"B-"  
"B-"  
"B"  
"B"  
"B"  
"BB-"
"B"  
"B"  
"B"  
"B"  
"B"  
"CCC+"
"B"  
"B+"  
"B+"  
"B"  
"B"  
"B"  
"B"  
"BB-"
"BB-"
"B+"  
"BB+"
"B+"  
"BB+"
"B+"  
"B"  
"B+"  
"BB-"
"B"  
"B"  
"B"  
"B"  
"B+"  
"B+"  
"BB-"
"B"  
"B-"  
"BB"  
"BB-"
"B-"  
"B"  
"BB-"
"B"  
"BB-"
"B"  
"BB-"
"B"  
"B"  
"BB+"
"B"  
"B+"  
"BB-"
"BB+"
"B+"  
"B"  
"B-"  
"B+"  
"BB-"
"BB"  
"B-"  
"B-"  
"B"  
"B"  
"B+"  
"B"  
"BB"  
"BB"  
"B"  
"B+"  
"BB"  
"B"  
"B+"  
"B+"  
"B+"  
"BB"  
"BB+"
"BBB-"
"BB-"
"B"  
"B"  
"B"  
"BB+"
"B"  
"BB-"
"BB+"
"BBB-"
"B"  
"BB"  
"B"  
end
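A sketch of one way to impose a custom order (untested; the variable name rating is mine): map the strings to a numeric variable whose codes follow the desired finance-practice order, then plot that variable instead. Note that, by default, ratings absent from the data will still not appear as empty bars.

```stata
* Sketch: encode the string ratings into a numeric variable whose
* coded order is the desired order, then plot on that order.
label define rating 1 "CC" 2 "CCC-" 3 "CCC" 4 "CCC+" 5 "B-" 6 "B" 7 "B+" ///
    8 "BB-" 9 "BB" 10 "BB+" 11 "BBB-" 12 "BBB" 13 "BBB+"
generate rating = .
forvalues i = 1/13 {
    replace rating = `i' if strtrim(CCR_adjusted) == "`: label rating `i''"
}
label values rating rating
catplot rating, percent recast(bar)
```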

Creating time dependent scores

Hello Statalist forum,

I need the technical commands for two datasets.

The first problem: I have a dataset giving the number of, e.g., hours of sun per day in governorate X on day X. The time period spans several years, and some days are missing.
What I would like to do is create a variable that sums the hours of sun per month per governorate. Concretely, so that I can say that in February 2006 in California there were 50 hours of sun.
What would the code be to sum the hours of sun per month, by governorate? My problem really is the month. I could of course write the code for each month individually, but since my dataset covers more than 4 years, that would be at least 48 time definitions. Is there any way for Stata to divide the data into months automatically?


Date Hours of sun Governorate
01.01.2005 4 A
03.01.2005 5 B
21.05.2006 8 A
31.07.2006 0 A
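For the first problem, one sketch under assumed variable names (datestr, sunhours, governorate are mine): convert the date string, derive a monthly date with mofd(), and collapse to governorate-month totals.

```stata
* Sketch (variable names assumed): build a governorate-by-month panel.
generate ddate = daily(datestr, "DMY")   // "01.01.2005" -> daily date
format ddate %td
generate mdate = mofd(ddate)             // monthly date
format mdate %tm
collapse (sum) sunhours, by(governorate mdate)
```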

Secondly, I would like to merge this monthly score with a dataset that contains, let's say, health information on individuals. I have each individual's health status at time t (the same for all individuals: they were all measured at the end of 2006), the governorate they live in, and their date of birth.
I now want to regress health status on hours of sun. However, I only want to use the hours of sun AFTER birth (obviously).
So let's say an individual is born in April 2005; then the hours of sun in January 2005 should not be included in the "sun score". That means I somehow need to sum over all months after the date of birth, up to the end of the survey.
Very similarly, I would also like to create a score that counts the hours of sun in the 9 months before the individual was born. I think the same code as above should be applicable there.
Individual Date of Birth Governorate Hours of Sun after Birth Hours of Sun before Birth
1 04.05.2004 A 100 40
2 02.02.2005 C 110 20
3 17.10.2005 B 50 25
4 19.09.2006 A 200 9
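For the second problem, one sketch (file and variable names are mine: the monthly panel is assumed saved as monthly_sun.dta, and the individual file has id, dob, and governorate): pair each individual with all of their governorate's months via joinby, then total the hours conditionally.

```stata
* Sketch (names assumed): attach all governorate-months to each individual,
* then sum sun hours after birth and in the 9 months before birth.
use individuals, clear
generate bmonth = mofd(dob)              // dob as a daily date
joinby governorate using monthly_sun
generate byte after  = mdate >= bmonth
generate byte before = inrange(mdate, bmonth - 9, bmonth - 1)
egen sun_after  = total(sunhours * after),  by(id)
egen sun_before = total(sunhours * before), by(id)
bysort id: keep if _n == 1               // back to one row per individual
```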


So first, I need to aggregate my data to the monthly level per governorate. Secondly, I want to create a score unique to each individual, corresponding to his or her own hours of sun.

I hope my explanation is clear and somebody is able to help me!




Overlapping dummies, multicollinearity issue?

Dear Statalist members, a quick heads-up: the following question is conceptual rather than Stata-specific, and probably an easy one, but I somehow cannot see the forest for the trees at the moment (I have been overthinking this for too long) and could therefore use your help. I did not find this question asked previously on this platform, so perhaps others might benefit from the answer too.
In the analysis for my thesis, I use several dummies:
1) Founder dummy = 1 if the founder of a firm is present at the firm or holds >5% of shares
2) Family member dummy = 1 if family members (of the founder) are present at the firm or hold >5% of shares
3) Family firm status = 1 if 1) + 2) >= 1 (in plain English: the founder, family members, or both satisfy the conditions)
Now I want to include both the family firm status and the founder dummy as independent variables in my model. However, technically 1) is a subset of 3) (every founder firm is also a family firm, but not every family firm is a founder firm), so including both dummies should lead to issues, correct? I do not have VIF issues, but thinking about it still makes me wonder whether this is actually a problem, as the low VIFs may just be due to a small overlap (that is, the vast majority of firms in my sample might be family firms without the founder still on board). Thank you very much in advance for your help, Jon.

command xtscc is unrecognized

Dear all,


I have an unbalanced panel data with 150 countries and 51 years. I work with time and country fixed effects.
I used the following command:
forvalues y = 1/6 {
    xtscc f`y' l(0/2)shock l(1/2)f1 l(1/2)a1 l(1/2)u1 i.year, fe
    replace b_f = _b[shock] if _n == `y'+1
    replace se_f = _se[shock] if _n == `y'+1
}

However, I get this:
command xtscc is unrecognized

Does anyone know why this is the case? Any feedback is hugely appreciated.
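One likely explanation: xtscc (Hoechle's implementation of Driscoll-Kraay standard errors) is a user-written command rather than part of official Stata, so it must be installed once per machine before use.

```stata
* Install the user-written xtscc command from SSC, then re-run the loop.
ssc install xtscc
```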

Kind Regards,
Katerina
Stata/SE 16.0

Normalizing variable which has negative values

Hello

I am trying to "normalize" a variable that has negative values, which here means obtaining values smaller than 3 for both the skewness and the kurtosis of the resulting distribution.

If I simply use the log function I lose a lot of cases, which I would like to avoid. I have also tried many other options (the gladder command among them), including the most "popular" one: adding a constant "a" (with "a" being the absolute value of the largest negative value of the variable) and then taking the log, as follows: generate ln_variable = ln(variable + a). However, when I do this, the skewness of ln_variable is a large negative number and its kurtosis is a very large positive value. Does anybody know of a transformation I can use to normalise my variable, i.e. one that does not entail dropping the negative values and that keeps both skewness and kurtosis smaller than 3? Any help you can provide would be much appreciated.
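One transformation often suggested for variables with negative values is the inverse hyperbolic sine, which is defined for all real numbers and behaves like a log for large absolute values. Whether it brings skewness and kurtosis below 3 depends on the data, so check afterwards:

```stata
* Inverse hyperbolic sine: defined for negative, zero, and positive values.
generate ihs_var = asinh(variable)   // = ln(variable + sqrt(variable^2 + 1))
summarize ihs_var, detail            // inspect skewness and kurtosis
```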
Kind regards,

Joao

Creating lagged variables where the lagged variable is disrupted every time another group changes

Hey,

I have a problem with lagged variables. We used gen defl_Earnings_sd_L1 = defl_Earnings_sd[_n-1] to create a lagged variable, but the lagged variable carries over every time the company ID changes. I want it to be 0 or . whenever the company changes, so that the lagged variable does not pick up lagged earnings from whichever company happens to come before it in the dataset.

An example of the dataset, where it goes wrong:
Company ID; Year; Earnings; Lagged earnings
9000; 2010; 750; .
1000; 2010; 1500; 750
1000; 2011; 1700; 1500
1000; 2012; 1750; 1700
1000; 2013; 1100;1750
1000; 2014; 1200; 1100
1100; 2010; 100; 1200
1100; 2011; 120; 100
1100; 2012; 130; 120

Dataset what I want to see:
Company ID; Year; Earnings; Lagged earnings
1000; 2010; 1500; .
1000; 2011; 1700; 1500
1000; 2012; 1750; 1700
1000; 2013; 1100;1750
1000; 2014; 1200; 1100
1100; 2010; 100; .
1100; 2011; 120; 100
1100; 2012; 130; 120

We tried this, but it produced an error: gen defl_Earnings_sd_L1 = defl_Earnings_sd[_n-1], by Company_Key — option by not allowed
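Two standard sketches of the fix (assuming a Year variable exists): either declare the panel and use the lag operator, which automatically returns missing at each company's first observation, or subscript within by-groups.

```stata
* Option 1: declare the panel; L. respects company boundaries.
xtset Company_Key Year
generate defl_Earnings_sd_L1 = L.defl_Earnings_sd

* Option 2: by-group subscripting; the first row of each company gets missing.
bysort Company_Key (Year): generate lag1 = defl_Earnings_sd[_n-1]
```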

Can somebody help me with this?


Suppressing cumulative frequencies

Hi everyone,

I have been exporting tables to Word using putdocx and tab2docx.

For example,

Code:
putdocx begin
putdocx paragraph , font(,14) halign(center) style(Title)
putdocx text ("Strategy"), bold underline
tab2docx strat1
putdocx save "$tables/Principal Docx.docx", replace
However, I've been asked to suppress cumulative frequencies. Basically, I need to export something like this -

Code:
by strat1, sort: gen freq = _N
gen perc = (freq/_N)*100
tabdisp strat1, cell(freq perc)
Is there a way to export tabdisp output to Word? Or alternatively, is there a way to show just the frequencies and percentages when I tab variables (without the cumulative columns)? I haven't been able to find anything helpful in the help files.
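One possible route (a sketch; the table and file names are placeholders): build the frequency/percent dataset with contract, then export it with putdocx table's data() source, available since Stata 15.

```stata
* Sketch: export frequencies and percentages (no cumulative columns) to Word.
preserve
contract strat1, freq(freq) percent(perc)
putdocx begin
putdocx table freqs = data(strat1 freq perc), varnames
putdocx save "$tables/Frequencies.docx", replace
restore
```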

Thanks and apologies,
Kabira






Help setting up an index variable

Hello,

I have a dataset that I need to convert from a long format to a wide format.

EMPLID TEST_SUBJECT TEST_DT ST_SCORE
"0101537" "QUANT" 20842 140
"0101537" "VERBAL" 20842 151
"0101537" "WRITING" 20842 3.5
"0101706" "" . .
"0104472" "" . .
"0130658" "QUANT" 20492 148
"0130658" "VERBAL" 20492 157
"0130658" "WRITING" 20492 3
"0153858" "" . .
"0155986" "QUANT" 18770 145
"0155986" "QUANT" 20977 143
"0155986" "VERBAL" 18770 166
"0155986" "VERBAL" 20977 164
"0155986" "WRITING" 18770 4
"0155986" "WRITING" 20977 4

TEST_DT is what I am using to determine which score was the newest.

I'm trying to set up an index variable so that for each EMPLID their newest quantitative score is set as index 1, the newest verbal score is set as index 2, and the newest writing score is set as index 3. Then if they have a second set of scores, the second newest quant score is set as index 4, the second newest verbal score is set as index 5, and the second newest writing score is set as index 6.

I'm trying to figure out if I can use a loop to accomplish this, but I'm not sure if a loop will work with the data in this format.

Any advice would be really appreciated, as I'm totally new to Stata.
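One sketch that avoids an explicit loop: number the attempts newest-first within person and subject, map the subject to 1/2/3, and combine the two into the index (this assumes every non-empty subject is one of QUANT, VERBAL, WRITING).

```stata
* Sketch: index = 1,2,3 for the newest QUANT/VERBAL/WRITING scores,
* 4,5,6 for the second newest set, and so on.
bysort EMPLID TEST_SUBJECT (TEST_DT): generate attempt = _N - _n + 1  // 1 = newest
generate subj = cond(TEST_SUBJECT == "QUANT",   1, ///
                cond(TEST_SUBJECT == "VERBAL",  2, ///
                cond(TEST_SUBJECT == "WRITING", 3, .)))
generate index = (attempt - 1)*3 + subj if subj < .

* Then, after dropping rows with no test, reshape to wide:
drop if missing(index)
keep EMPLID ST_SCORE index
reshape wide ST_SCORE, i(EMPLID) j(index)
```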

Problem with margins of ordered probit

Hello,

I have run an ordered probit, and some coefficients are positive and some negative. But when I ask for the marginal effects on outcome 3, the coefficients have the opposite sign.
This is very strange and I don't know how it happens. Do you have an idea about this sign change?

Thank you






2020 London Stata Conference

This echoes https://events.timberlake.co.uk/even...ata-conference

26th UK Stata Conference (London): First Announcement and Call for Presentations

Dates: Thursday 10 September and Friday 11 September 2020

Venue: Cass Business School Executive Education Campus, 2nd Floor, 200 Aldersgate, London EC1A 4HD

You are warmly invited to attend the 26th Stata Conference in London.

Offers of presentations are also being sought.

Please email the scientific organisers if you are interested in presenting, sending an abstract and indicating whether you wish to give:

(i) a 20 min talk (followed by 10 min discussion)

(ii) a 10 min talk (followed by 5 min discussion)

(iii) a longer review or tutorial (about an hour)

or

(iv) a poster presentation.

The deadline for submission of abstracts is 31 May 2020.

The final programme will be announced before the end of July 2020.

Please see below for further information about how to submit, registration fees, and reduced rates for paper presenters and students, etc.

Scientific organisers:

Nick Cox, University of Durham njcoxstata@gmail.com
Rachael Hughes, University of Bristol Rachael.Hughes@bristol.ac.uk
Tim Morris, MRC Clinical Trials Unit, UCL tim.morris@ucl.ac.uk
Patrick Royston, MRC Clinical Trials Unit, UCL j.royston@ucl.ac.uk

Logistics organised by Timberlake Consultants, distributors of Stata in the UK, Brazil, Ireland, Middle East, Poland, Portugal, and Spain.

(Visit the Timberlake website at http://www.timberlake.co.uk/)

Further information

The London conference is the longest-running series of Stata conferences. It is open to all interested and highly international. In past years participants
have been from Britain, other European countries, and other continents too. StataCorp will be represented.

Presentation topics might include:

- discussion of user-written Stata programs

- case studies of research or teaching using Stata

- discussions of data management problems

- reviews of analytical issues

- surveys or critiques of Stata facilities in specific fields, etc.

The meeting will include the usual "wishes and grumbles" session at which you may air your thoughts to Stata developers, and (at additional cost) the option
of an informal meal at a London restaurant on the Thursday evening.

Timberlake Consultants generously sponsors registration fee waivers for presentations (one fee waiver per presentation, regardless of number of authors
involved). Timberlake will also pay a small fee to a presenter of a longer review or tutorial paper. Presenters need to register.

Registration fees:
Non-students - attendance to both days - £96.00 including VAT.
Non-students - attendance to one day only - £66.00 including VAT.
Students - attendance to both days - £66.00 including VAT.
Students - attendance to one day only - £48.00 including VAT.
Dinner (optional) - £36.00 including VAT.

The scientific organisers look forward to hearing from you with presentation offers or to discuss the suitability of a potential contribution. The
submission deadline is 31 May 2020.

Confirmation of the programme, and details of how to register, will be circulated before the end of July 2020.

Please send abstracts (up to 300 words) in plain text format or some flavour of TeX. References may be included as appropriate.

Please give name and affiliation of the presenter.

Potential visitors to London might like to know that, by British standards, September is usually relatively dry and warm.

Please contact us before 31 May 2020, and preferably sooner!

For proceedings of previous Stata conferences, in London and elsewhere, visit

http://stata.com/meeting/proceedings.html

Bootstrap with probability of selection proportional to survey weights

Hello,

I want to run IRT and Mantel-Haenszel procedures as a parametric bootstrap, resampling (with replacement) from my dataset with the probability of selecting each observation proportional to the survey weights. I was told that this may make the procedures robust to the survey weighting and to the clustering that arises from the complex sample design.

My questions for the Statalist community:
- How do I specify probability of selection proportional to the survey weights?
- What would the code look like for running this in Stata? Do you recommend using the bsample or bootstrap estimation procedures? Ideally, we would like to do 1000 reps, so, I believe the bootstrap estimation would be appropriate for this?
- Since Stata does not allow svy weights with Mantel-Haenszel, is this a feasible endeavor?

Thank you!
Helena

Extracting datas from a big database

Hi everyone !

As a first-year Master's student, I have to write a dissertation on international trade and Africa. The dissertation of course includes an econometric study. Since I have a gravity equation to estimate, I've downloaded the Gravity database, which is a Stata file, from the CEPII website.
At first I thought about copying and pasting the data from Stata to Excel, building a new database with the data I need, and then putting the data back into Stata (which would be easier for me, since I use Excel quite a lot). The problem is that the database is huge (more than 3 million lines) and I can't copy and paste the whole thing. My teacher told me it might be possible to do this directly in Stata, but she doesn't know how, and neither do I (this is my first year learning econometrics and all the related material). She told me to ask on this forum.

So that's why I'm asking whether there is any Stata command that can help me achieve what I want. For more detail: the Gravity database consists of several variables associated with one country (countries are identified by ISO code) that affect its trade with another country (this is done for every country and every trade relationship, which is why there are more than 3 million lines), and I need to select the data for only a few countries.
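A sketch of loading only what is needed, straight from the .dta file: -use- accepts both a variable list and an if condition. The variable names below (iso3_o, iso3_d, year, dist) and the country codes are guesses at the CEPII file's contents and should be checked first with describe.

```stata
* Sketch: read only selected variables and rows from the gravity file.
use iso3_o iso3_d year dist using gravity.dta ///
    if inlist(iso3_o, "NGA", "KEN", "GHA"), clear
```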

Thanks for the help !

meologit and estat icc error

$
0
0
Hi Everyone,




I am using the meologit command, as my data are nested and my response variable is ordinal (with 3 categories). The Stata manual online clearly says that estat icc works after meologit to estimate the intraclass correlation coefficient: https://www.stata.com/manuals/meestaticc.pdf#meestaticc

I have Stata 14.2, and when I give the command estat icc, I get the following error:

estat icc

requested action not valid after most recent estimation command
r(321);

[P] error . . . . . . . . . . . . . . . . . . . . . . . . Return code 321
requested action not valid after most recent estimation command;
This message can be produced by predict or test and indicates
that the requested action cannot be performed.

(end of search)



Does anyone know why I am getting this error? Is it because I have Stata 14.2?

I would greatly appreciate any help with this matter.

Creating price dataset from scratch

Hi all.

I want to create a price dataset from scratch in Stata like the one below in Excel, just with more years, cities, and commodities — how could I go about this?

Thanks in advance.
Year Zurich_rice Zurich_corn
1922 5.6 3.4
1923 4.3 2.1
1924 2.1 4.1
1925 2.3 2.3
1926 4.2 4.1
1927 3.2 6.3
1928 2.2 5.4
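One way to enter such a dataset directly in Stata, using the example figures above, is the -input- command:

```stata
* Enter the example panel directly; extend with more rows/columns as needed.
clear
input year Zurich_rice Zurich_corn
1922 5.6 3.4
1923 4.3 2.1
1924 2.1 4.1
1925 2.3 2.3
1926 4.2 4.1
1927 3.2 6.3
1928 2.2 5.4
end
```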

-nsplit-command-

hi all,
I recently installed -nsplit- in order to split a numeric variable into two segments. The first part of the variable is the date I am interested in, and the second part is a time (which I am not interested in).

Example: the variable labdate is of the form "26mar2009 10:46:00" and is numeric, so substr() won't work. I want to isolate the 26mar2009 portion in %td format using -nsplit labdate, digits( x x) gen(y y)- but I can't seem to get this to work.

I don't really want to convert labdate into a string variable and then proceed, since my dataset is HUGE.

Any thoughts?
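If labdate is a numeric %tc datetime (which a display like "26mar2009 10:46:00" suggests), no string splitting is needed: dofc() converts a clock value directly to a daily date.

```stata
* Convert a %tc datetime to a %td daily date without going through strings.
generate labday = dofc(labdate)
format labday %td
```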

thanks
Vishal

Showing mean values of a variable within a different group of observations with conditions

Hi everyone!

Started Stata recently

I'm using Stata/SE 15.1

I'm using British Household Survey panel data. I have the work-hour preferences of employed people, hrpref (1 = over-employed, 2 = under-employed, 3 = matched), in different periods of time. What I want to do is compare the average working hours of people in each group within the overall dataset.

Dummies: manual=1(if they do manual work) jbft=1(if they work full-time)

Compare:
Over-employed: Males: who are working fulltime with part-time in a non-manual sector
Over-employed: Males: who are working fulltime with part-time in a manual sector,
Over-employed: Females: who are working fulltime with part-time in a non-manual sector,
Over-employed: Females: who are working fulltime with part-time in a manual sector,

Under-employed: Males: ....

Matched: Males: ...

Basically, I want to construct a table that shows the average working hours of people with the same preferences, by gender and work characteristics, to see whether the over-employed are the ones putting in the most hours.

How do I tell Stata to show the mean values for each group under these conditions?
Should I use the "by" and "egen" commands?

Any guidance would be appreciated

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input long pid int year byte(sex hrpref wrkhrs manual jbft)
10017992 2004 2 3 35 0 1
10017992 2005 2 1 35 0 1
10017992 2006 2 3 36 0 1
10017992 2007 2 3 36 0 1
10023526 2002 2 1 39 0 1
10023526 2003 2 3 32 0 1
10023526 2004 2 3 39 0 1
10023526 2005 2 1 39 0 1
10023526 2006 2 1 39 0 1
10023526 2007 2 1 39 0 1
10023526 2008 2 3 39 0 1
10028005 2002 1 3 37 0 1
10028005 2003 1 3 37 0 1
10028005 2004 1 3 38 0 1
10028005 2007 1 3 37 0 1
10028005 2008 1 3 37 0 1
10049363 2000 2 3 12 0 2
10049363 2003 2 3 36 0 1
10049363 2004 2 3 36 0 1
10049363 2005 2 3 37 0 1
10055266 2000 2 2 35 0 1
10055266 2001 2 1 40 0 1
10055266 2002 2 1 35 0 1
10055266 2003 2 1 35 0 1
10055266 2005 2 1 35 0 1
10055266 2006 2 1 35 0 1
10055266 2007 2 1 35 0 1
10055266 2008 2 1 35 0 1
10055304 2005 1 2  2 0 2
10055304 2006 1 2 14 0 2
10060111 2003 1 3 35 0 1
10060111 2004 1 3 35 0 1
10060111 2005 1 3 35 0 1
10060111 2006 1 2 12 1 2
10060111 2007 1 3 37 0 1
10060111 2008 1 3 37 0 1
10079653 2000 2 1 35 0 1
10079653 2002 2 1 38 0 1
10079653 2006 2 2 16 0 2
10079653 2007 2 3 20 0 2
10079688 2000 1 3 35 0 1
10079688 2001 1 3 36 1 1
10080643 2002 2 1 37 0 1
10080643 2003 2 3 35 0 1
10080643 2004 2 3 37 0 1
10080643 2005 2 3 37 0 1
10080643 2006 2 3 35 0 1
10080643 2007 2 1 37 0 1
10080643 2008 2 1 35 0 1
10087486 2002 1 1 38 0 1
10087486 2003 1 1 38 0 1
10087486 2004 1 3 38 0 1
10087486 2005 1 3 38 0 1
10087486 2006 1 1 38 0 1
10087486 2007 1 3 38 0 1
10087486 2008 1 1 38 0 1
10094083 2002 1 1 60 0 1
10094083 2003 1 1 65 0 1
10094083 2004 1 1 60 0 1
10094083 2005 1 1 60 0 1
10094083 2006 1 1 38 0 1
10094083 2007 1 1 60 0 1
10094113 2002 2 1 38 0 1
10094113 2003 2 1 38 0 1
10094113 2004 2 3 38 0 1
10094113 2005 2 1 38 0 1
10094172 2004 2 3  0 0 2
10094172 2005 2 1 38 0 1
10094172 2006 2 3 35 0 1
10094172 2007 2 3 42 0 1
10094172 2008 2 3 35 0 1
10099689 2000 1 2 27 1 2
10099689 2001 1 3 34 1 1
10099689 2002 1 3 40 1 1
10099689 2003 1 3 40 1 1
10099689 2005 1 3 40 1 1
10099689 2006 1 3 37 0 1
10099689 2007 1 3 35 0 1
10099689 2008 1 3 40 1 1
10099719 2002 2 3 35 0 1
10099719 2003 2 3 35 0 1
10099719 2004 2 3 35 0 1
10099719 2005 2 3 35 0 1
10099719 2006 2 3 36 0 1
10099719 2007 2 3 35 0 1
10099719 2008 2 3 35 0 1
10099778 2001 2 3 35 0 1
10099778 2002 2 3 35 0 1
10099778 2003 2 3 40 0 1
10101977 2001 2 1 40 0 1
10101977 2002 2 1 35 0 1
10101977 2003 2 1 35 0 1
10101977 2004 2 1 35 0 1
10101977 2005 2 1 39 0 1
10101977 2006 2 3 27 0 1
10101977 2007 2 3 27 0 1
10101977 2008 2 3 27 0 1
10103031 2002 1 3 39 0 1
10103031 2003 1 3 39 1 1
10103066 2002 2 1 45 0 1
end
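In Stata 15.1, a three-way table of means is one way to get this comparison in a single command (a sketch; adjust the if condition to the groups being compared):

```stata
* Mean working hours by preference group, sex, and manual/non-manual,
* restricted here to full-time workers (jbft == 1), Stata 15 -table- syntax.
table hrpref sex manual if jbft == 1, contents(mean wrkhrs)
```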

Thank you in advance,
Nico

Combining AND + OR Statements

Hi all,

Quick question I am tripping up on: if I use this syntax
Code:
replace XXX = XXX if var1 < 100 & var2 == 2000 | var3 == 3000
Does this do the replacement based on (var1 AND var2) OR var3? What I want is: var1 must be true for the replacement to happen, and in addition either var2 OR var3 must be true.

I believe that if I reorder it as follows, it does not work as intended either, i.e. it doesn't group var2/var3 together. I think it instead requires either var2==2000 on its own, or var3==3000 AND var1<100, to be true.
Code:
replace XXX = XXX if var2 == 2000 | var3 == 3000 & var1 < 100
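In Stata, & binds more tightly than |, so making the intended grouping explicit with parentheses gives the reading described above:

```stata
* var1 must hold, together with either var2 or var3:
replace XXX = XXX if var1 < 100 & (var2 == 2000 | var3 == 3000)
```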

Creating a new dummy variable based on other dummies

Hi Statalist

I am trying to create a dummy that is based on the values of other dummies.
I think it's easier to explain with an example:
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str14 hhid str16 iid byte(hhmember activity) float(dummy_head dummy_spouse dummy_both)
"01010140020171" "010101400201711" 1  1 1 0 1
"01010140020171" "010101400201712" 2  1 0 1 1
"01010140020284" "010101400202842" 2  1 0 1 1
"01010140020284" "010101400202841" 1  1 1 0 1
"01010140020297" "010101400202971" 1  5 0 0 0
"01010140020297" "010101400202972" 2  1 0 1 1
"01010140020409" "010101400204092" 2  1 0 1 1
"01010140020409" "010101400204091" 1  1 1 0 1
"01010140020471" "010101400204711" 1  1 1 0 1
"01010140020551" "010101400205512" 2  1 0 1 1
"01010140020551" "010101400205511" 1  1 1 0 1
"01010140020761" "010101400207611" 1  1 1 0 1
"01010140020762" "010101400207622" 2  1 0 1 1
"01010140020762" "010101400207621" 1  1 1 0 1
"01020030030004" "010200300300041" 1 10 0 0 0
"01020030030004" "010200300300042" 2  1 0 1 1
"01020030030022" "010200300300221" 1  1 1 0 1
"01020030030022" "010200300300222" 2  1 0 1 1
"01020030030140" "010200300301401" 1  1 1 0 1
"01020030030161" "010200300301611" 1  1 1 0 1
"01020030030161" "010200300301612" 2  1 0 1 1
"01020030030174" "010200300301742" 2  1 0 1 1
"01020030030174" "010200300301741" 1  1 1 0 1
"01020030030200" "010200300302002" 2  1 0 1 1
"01020030030200" "010200300302001" 1  1 1 0 1
"01020030030430" "010200300304302" 2  1 0 1 1
"01020030030430" "010200300304301" 1  1 1 0 1
"01020030030479" "010200300304792" 2  1 0 1 1
"01020030030479" "010200300304791" 1  1 1 0 1
"01020170030001" "010201700300012" 2  1 0 1 1
"01020170030001" "010201700300011" 1  1 1 0 1
"01020170030017" "010201700300171" 1  1 1 0 1
"01020170030022" "010201700300221" 1  1 1 0 1
"01020170030048" "010201700300482" 2  1 0 1 1
"01020170030048" "010201700300481" 1  1 1 0 1
"01020170030100" "010201700301001" 1  1 1 0 1
"01020170030100" "010201700301002" 2  1 0 1 1
"01020170030209" "010201700302092" 2  1 0 1 1
"01020170030209" "010201700302091" 1  1 1 0 1
"01020170030241" "010201700302412" 2  1 0 1 1
"01020170030241" "010201700302415" 2  1 0 1 1
"01020170030241" "010201700302411" 1  1 1 0 1
"01020170030246" "010201700302461" 1  1 1 0 1
"01030130040161" "010301300401611" 1  1 1 0 1
"01030130040161" "010301300401612" 2  1 0 1 1
"01030130040219" "010301300402192" 2  . 0 0 0
"01030130040219" "010301300402191" 1  1 1 0 1
"01030130040259" "010301300402591" 1  1 1 0 1
"01030130040346" "010301300403461" 1 10 0 0 0
"01030130040468" "010301300404681" 1  1 1 0 1
"01030130040685" "010301300406851" 1  1 1 0 1
"01030130040685" "010301300406852" 2  1 0 1 1
"01030130040739" "010301300407392" 2  1 0 1 1
"01030130040739" "010301300407391" 1  1 1 0 1
"01030130040745" "010301300407452" 2  1 0 1 1
"01030130040745" "010301300407451" 1  1 1 0 1
"01030133010068" "010301330100682" 2  1 0 1 1
"01030133010068" "010301330100681" 1  1 1 0 1
"01030133010092" "010301330100921" 1 15 0 0 0
"01030133010175" "010301330101751" 1  1 1 0 1
"01030133010175" "010301330101752" 2  1 0 1 1
"01030133010188" "010301330101881" 1  1 1 0 1
"01030133010188" "010301330101882" 2  1 0 1 1
"01030133010300" "010301330103001" 1  1 1 0 1
"01030133010322" "010301330103221" 1  1 1 0 1
"01030133010322" "010301330103222" 2  1 0 1 1
"01030133010411" "010301330104111" 1  1 1 0 1
"01030133010411" "010301330104112" 2  1 0 1 1
"01030133010652" "010301330106521" 1  1 1 0 1
"01030133010652" "010301330106522" 2  1 0 1 1
"01040173040004" "010401730400042" 2  1 0 1 1
"01040173040004" "010401730400041" 1  1 1 0 1
"01040173040017" "010401730400171" 1  1 1 0 1
"01040173040017" "010401730400172" 2  1 0 1 1
"01040173040022" "010401730400221" 1  1 1 0 1
"01040173040034" "010401730400342" 2  1 0 1 1
"01040173040034" "010401730400341" 1  1 1 0 1
"01040173040041" "010401730400411" 1  1 1 0 1
"01040173040086" "010401730400862" 2  1 0 1 1
"01040173040086" "010401730400861" 1  1 1 0 1
"01040173040092" "010401730400922" 2  1 0 1 1
"01040173040092" "010401730400921" 1  1 1 0 1
"01040173040094" "010401730400941" 1  8 0 0 0
"01040310010030" "010403100100301" 1  1 1 0 1
"01040310010102" "010403100101021" 1  1 1 0 1
"01040310010102" "010403100101022" 2  1 0 1 1
"01040310010174" "010403100101741" 1  1 1 0 1
"01040310010174" "010403100101742" 2  1 0 1 1
"01040310010180" "010403100101802" 2  1 0 1 1
"01040310010180" "010403100101801" 1  1 1 0 1
"01040310010462" "010403100104621" 1  1 1 0 1
"01040310010482" "010403100104821" 1  1 1 0 1
"01040310010482" "010403100104822" 2  1 0 1 1
"01040310010745" "010403100107451" 1  1 1 0 1
"01040310011128" "010403100111282" 2  1 0 1 1
"01040310011128" "010403100111281" 1  1 1 0 1
"01040380030347" "010403800303471" 1  1 1 0 1
"01040380030396" "010403800303962" 2  1 0 1 1
"01040380030396" "010403800303961" 1  1 1 0 1
"01040380030460" "010403800304601" 1  1 1 0 1
end
label values hhmember HHmember
label def HHmember 1 "HEAD", modify
label def HHmember 2 "SPOUSE", modify
label values activity Activity
label def Activity 1 "AGRICULTURE/LIVESTOCK", modify
label def Activity 5 "GOVERNMENT", modify
label def Activity 8 "NGO / RELIGIOUS", modify
label def Activity 10 "SELF EMPLOYED ALONE", modify
label def Activity 15 "DISABLED", modify
If we take the green household as an example, the first row is for the household head and the second row is for his/her spouse.

I would like to create a variable that is equal to 1 if at least one of them is involved in agriculture/livestock (activity = 1)

The first thing I did was create a dummy to indicate whether the household head is involved in agriculture/livestock:
Code:
 gen dummy_head = 0
replace dummy_head = 1 if hhmember == 1 & activity == 1
Then I created a dummy to indicate whether the spouse is involved in agriculture/livestock:
Code:
 gen dummy_spouse = 0
replace dummy_spouse = 1 if hhmember == 2 & activity == 1
To create dummy_both, I used the following code:
Code:
bysort hhid: gen dummy_both = (dummy_head ==1 ) | (dummy_spouse == 1)
At first glance this seemed correct, but it is not; the red households prove this.

I would like dummy_both to be 1 for all households that have either the head or the spouse or both involved in agriculture/livestock.

So for household 01010140020297, I would like dummy_both to be equal to 1 for both the spouse and the household head (even if one of them is not involved in agriculture/livestock)

How do I go about this? I have searched through the forum but I can't seem to find any relevant answers.

I also tried using egen, according to the FAQ link below, but to no avail.
https://www.stata.com/support/faqs/d...ble-recording/
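A sketch that avoids the separate head/spouse dummies entirely: flag rows where the head or spouse is in agriculture/livestock, then take the maximum within each household so every member of the household gets the same value (the new variable name dummy_hh is mine, to avoid clashing with the existing dummy_both):

```stata
* 1 for every member of a household whose head or spouse has activity == 1.
generate byte in_ag = inlist(hhmember, 1, 2) & activity == 1
egen dummy_hh = max(in_ag), by(hhid)
```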

I would appreciate any help I can get.

Thank you!
Kevin

Bayesian models using the Metropolis-Hastings algorithm


Greetings to all

I need to set up the software to estimate the parameters Bk (Bk0; Bka; Bkd; Bkh), for k -t; s, using the Metropolis-Hastings algorithm. The procedure is described in Appendix A of the attached document, which also presents the bivariate Poisson model that will then be used to calculate probabilities.

Thank you so much in advance

String comparison between two variables?

Hi everyone

I have a question about comparing/matching two string variables

My data panel is as follow:

hhID Member2017 Member2019
1 john a stefan new
1 bachacha smith john a
1 mary xyz moga samuel
2 filbert lopijet mem fu
2 nelly oder filbert lopijet
2 akaimo commanda filbert lopijet
2 agayi gaitano lagara wice
2 lagara wice filbert lopijet

I want to check whether each household member in 2017 (Member2017) is present in 2019 (Member2019). That is, I want a dummy variable (stay): stay == 1 if the name in Member2017 appears in Member2019, and stay == 0 if it does not. The check is conducted within each household (hhID == 1, 2, ...).

For example, within the household with hhID ==1, "john a" in 'Member2017' has a match in 'Member2019', so stay ==1, while "bachacha smith" has no match, so stay == 0

The result could look something like this:

hhID Member2017 Member2019 stay
1 john a stefan new 1
1 bachacha smith john a 0
1 mary xyz moga samuel 0
2 filbert lopijet mem_fu 1
2 nelly oder filbert lopijet 0
2 akaimo commanda filbert lopijet 0
2 agayi gaitano lagara wice 0
2 lagara wice filbert lopijet 1

The data sample has more than 20,000 observations, so I need code to do this automatically. Any suggestion you can provide will be greatly appreciated.
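One sketch: build a household-level roster of 2019 names and merge the 2017 names against it (variable names are taken from the post; the tempfile name is arbitrary). Note this requires exact string matches, so spelling variations between waves will not match.

```stata
* Flag 2017 members whose name appears in the same household's 2019 roster.
preserve
keep hhID Member2019
rename Member2019 name
duplicates drop                       // one row per household-name pair
tempfile roster2019
save `roster2019'
restore
generate name = Member2017
merge m:1 hhID name using `roster2019', keep(master match)
generate byte stay = _merge == 3
drop _merge name
```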

Thank you

Kind regards,

Anh