

Re:cruiter is a prototyped end-to-end NLP recipe to automatically find job offers in your email, analyze and label them, and even reply back.

Install

Set up a virtual environment and install the dependencies with pip: pip install -r requirements.txt. The notebooks run in Jupyter Notebook/Lab.

If you would like to try Re:cruiter out without scraping and classifying your own data, you can use the dummy dataset files/dummy_data/job_email_examples.csv. In that case, skip the first notebook 01_recruiter_collect_emails.ipynb; the second notebook 02_recruiter_classifier.ipynb can optionally also be skipped.

How to find job offers in your email: 01_recruiter_collect_emails.ipynb

To find emails from recruiters, you need to scrape your emails. MailHandler is a fully functional email scraper. If you are using Gmail, you will be required to temporarily turn on less secure app access.

Run this codeblock and provide your email credentials. If you know roughly when recruiters started writing to you, you can add a cutoff date and the scraper will only go back from today until this date. Note the abbreviated format of the month.
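
The notebook cell itself is not reproduced in this README, so the following is a minimal sketch of what it might contain; the credential keys and the exact date format are assumptions rather than the project's documented interface.

    # Sketch of the setup cell. The credential keys and the date format are
    # assumptions; adjust them to whatever MailHandler actually expects.
    credentials = {
        'username': 'you@example.com',      # your email address
        'password': 'your-app-password',    # or an app-specific password
    }

    # Abbreviated month name ("Jan", "Feb", ...), as noted above.
    MAIL_CUTOFF_DATE = '01-Jan-2021'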

1.4 Run MailHandler

Supply the Mailer with the correct inbound and outbound servers, i.e.

    Mailer('scrape', inbound='', outbound='', **credentials)

MailHandler can be run on any folder in your email account:

    collect_emails(folder='INBOX', cutoff_date=MAIL_CUTOFF_DATE)

Scraping many emails can take a while. A maximum limit of emails can be set by passing it as a parameter to MailHandler:

    collect_emails(folder='INBOX', cutoff_date=MAIL_CUTOFF_DATE, limit=1000)

MailHandler also runs langdetect on the subjects of the scraped emails to label them by language. When MailHandler has successfully scraped all emails, they can be saved into a JSON file. For privacy reasons, files that contain personal data are placed into files/hide/ by default, because the hide folder is ignored by Git.
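
MailHandler performs the language labelling and the JSON export internally; the sketch below shows roughly equivalent standalone logic, assuming a 'subject' column and the files/hide/scraped_mails.json path used by the next notebook.

    # Roughly what MailHandler does after scraping; the column names and the
    # output path are assumptions based on the rest of this README.
    import pandas as pd
    from langdetect import detect

    emails = [
        {'subject': 'Exciting Data Scientist opportunity', 'sender': 'recruiter@agency.example'},
        {'subject': 'Ihre neue Stelle als Data Engineer', 'sender': 'jobs@firma.example'},
    ]

    def subject_language(subject):
        """Return an ISO language code for a subject line, or 'unknown'."""
        try:
            return detect(subject)
        except Exception:  # langdetect raises on empty or undecidable strings
            return 'unknown'

    scraped_df = pd.DataFrame(emails)
    scraped_df['language'] = scraped_df['subject'].map(subject_language)
    scraped_df.to_json('files/hide/scraped_mails.json')  # kept out of Git via files/hide/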

Classifying the emails: 02_recruiter_classifier.ipynb

2.1 Loading and formatting the dataset

After loading the JSON file generated by MailHandler into a Pandas dataframe, i.e.

    test_set_df = pd.read_json('files/hide/scraped_mails.json')

the date formatting is transformed into YYYY, MM, DD. If you think you have been sent the same email several times based on the subject, you can drop the duplicates.

The classification of the emails is performed by a pre-trained model in scikit-learn. The codeblocks in 2.0 result in a column ("predictions") appended to our recruiter dataframe.
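
A minimal sketch of these formatting and classification steps; the column names ('subject', 'date'), the model path, and joblib as the serializer are assumptions, not the repository's actual layout.

    # Sketch of steps 2.1-2.2 under assumed column names and model path.
    import pandas as pd
    import joblib

    test_set_df = pd.read_json('files/hide/scraped_mails.json')

    # Normalise the date into separate YYYY, MM, DD columns.
    dates = pd.to_datetime(test_set_df['date'], errors='coerce')
    test_set_df['year'], test_set_df['month'], test_set_df['day'] = (
        dates.dt.year, dates.dt.month, dates.dt.day,
    )

    # Optionally drop emails that share the same subject line.
    test_set_df = test_set_df.drop_duplicates(subset='subject')

    # Load the pre-trained scikit-learn pipeline and append its predictions.
    model = joblib.load('files/models/recruiter_classifier.joblib')
    test_set_df['predictions'] = model.predict(test_set_df['subject'])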

2.3 Filter and ground truth results

Because the classifier only looks at the subject of an email to decide whether it is a recruiter email, it can be tricked by newsletters and other emails with titles such as "How to successfully prepare for a data science interview in today's job market". Therefore, a list of domains is provided to filter out these false positives, i.e.

    is_commercial = test_set_df.isin(...)

A sketch of this filter is given below.
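
The actual domain list lives in the notebook and is not reproduced here; the sketch below illustrates the idea with an invented list and an assumed 'sender' column (the real filter may call isin on the dataframe directly, as the snippet above suggests).

    # Illustrative only: the domain list and the 'sender' column are assumptions.
    commercial_domains = ['newsletter.medium.example', 'mailer.jobboard.example']

    sender_domain = test_set_df['sender'].str.split('@').str[-1]
    is_commercial = sender_domain.isin(commercial_domains)

    # Keep only emails whose sender is not a known commercial/newsletter domain.
    test_set_df = test_set_df[~is_commercial]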

After filtering the false positives, the ground truthing can begin on both classes. To ground truth the results you will need ipysheet, which displays the results as a spreadsheet in your notebook so that you can check the boxes of incorrectly classified emails. (TODO: make a small gif/video of clicking to ground truth)

Performing ground truthing on both the recruiter and non-recruiter emails ensures that our dataset is correct. The dataset is then exported for the next notebook.
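
A minimal sketch of this review-and-export step with ipysheet; the 'misclassified' helper column and the export path are assumptions, not the notebook's exact code.

    # Sketch of the ground-truthing step; column and file names are assumptions.
    import ipysheet

    # Add a boolean column, rendered as checkboxes, to tick the emails the
    # classifier got wrong.
    test_set_df['misclassified'] = False

    sheet = ipysheet.from_dataframe(test_set_df[['subject', 'predictions', 'misclassified']])
    sheet  # displays the spreadsheet in the notebook; tick the wrong rows

    # Read the edited values back and export the corrected dataset as a CSV.
    reviewed_df = ipysheet.to_dataframe(sheet)
    reviewed_df.to_csv('files/hide/ground_truthed_recruiter_emails.csv', index=False)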

Analyze and label the recruiter emails: 03_recruiter_NER.ipynb

The recruiter emails are analyzed and labeled for:

- Job titles (data scientist, CTO) and position types (AI, tech, management), where a job title can have several position types.
- Job requirement keywords, job requirements, and job requirement types.
- Metadata about the jobs from the descriptions (title, location, duration, salary, experience).

These results are used to filter the job emails into interesting or uninteresting jobs.

Load the ground truthed dataset exported as a CSV from the previous notebook as recruiter_df. We want to find the job titles in the subject and classify them (this could also be done on the message body itself), so we first search through the email subjects for the keywords in job-types.json. The keywords are grouped into job types (management, ai, dev, web); a sketch of this lookup is shown below.
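
The real job-types.json is not reproduced in this README, so the keyword lists below are invented for illustration; only the four job-type keys come from the text above, and the column names are assumptions.

    # Hypothetical keyword lists; the real job-types.json may differ.
    import pandas as pd

    job_types = {
        'management': ['cto', 'head of', 'lead', 'manager'],
        'ai':         ['data scientist', 'machine learning', 'nlp'],
        'dev':        ['backend', 'python developer', 'software engineer'],
        'web':        ['frontend', 'react', 'web developer'],
    }
    # In the notebook the mapping is loaded from the JSON file instead, e.g.:
    # import json; job_types = json.load(open('job-types.json'))

    def position_types(subject):
        """Return every job type whose keywords appear in the subject line."""
        subject = subject.lower()
        return [job_type for job_type, keywords in job_types.items()
                if any(keyword in subject for keyword in keywords)]

    recruiter_df = pd.read_csv('files/hide/ground_truthed_recruiter_emails.csv')
    recruiter_df['position_types'] = recruiter_df['subject'].map(position_types)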
