
Utils

count_tokens(text)

Counts the number of tokens in a given text.

Args:
  text (str): The text to tokenize.

Returns:
  int: The number of tokens in text.

Examples:
  >>> count_tokens("This is a sentence.")
  6

Notes:
  The encoding used is determined by the tiktoken.encoding_for_model function.

Source code in autoresearcher/utils/count_tokens.py

import tiktoken


def count_tokens(text):
    """
    Counts the number of tokens in a given text.
    Args:
      text (str): The text to tokenize.
    Returns:
      int: The number of tokens in `text`.
    Examples:
      >>> count_tokens("This is a sentence.")
      6
    Notes:
      The encoding used is determined by the `tiktoken.encoding_for_model` function.
    """
    # encoding = tiktoken.get_encoding("cl100k_base")
    encoding = tiktoken.encoding_for_model("gpt-4")

    tokens = encoding.encode(text)
    return len(tokens)
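
A minimal usage sketch, assuming the function is importable from its source module (autoresearcher/utils/count_tokens.py) and that tiktoken is installed; the token budget below is an illustrative constant, not something the library defines:

from autoresearcher.utils.count_tokens import count_tokens

prompt = "Summarize the main findings on AI in healthcare."
MAX_PROMPT_TOKENS = 4096  # illustrative budget, not defined by autoresearcher

n_tokens = count_tokens(prompt)
print(f"Prompt uses {n_tokens} tokens")

if n_tokens > MAX_PROMPT_TOKENS:
    raise ValueError("Prompt exceeds the assumed context-window budget")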

generate_keyword_combinations(research_question)

Generates keyword combinations for a given research question.

Args:
  research_question (str): The research question to generate keyword combinations for.

Returns:
  list: A list of keyword combinations for the given research question.

Examples:
  >>> generate_keyword_combinations("What is the impact of AI on healthcare?")
  ["AI healthcare", "impact AI healthcare", "AI healthcare impact"]

Source code in autoresearcher/utils/generate_keyword_combinations.py

# Note: openai_call (the OpenAI wrapper) and keyword_combination_prompt (the
# prompt template) are imported from elsewhere in the autoresearcher package;
# they are not defined in this excerpt.
def generate_keyword_combinations(research_question):
    """
    Generates keyword combinations for a given research question.
    Args:
      research_question (str): The research question to generate keyword combinations for.
    Returns:
      list: A list of keyword combinations for the given research question.
    Examples:
      >>> generate_keyword_combinations("What is the impact of AI on healthcare?")
      ["AI healthcare", "impact AI healthcare", "AI healthcare impact"]
    """
    prompt = keyword_combination_prompt.format(research_question=research_question)
    response = openai_call(prompt, use_gpt4=False, temperature=0, max_tokens=200)
    combinations = response.split("\n")
    return [
        combination.split(": ")[1]
        for combination in combinations
        if ": " in combination
    ]
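
The function expects the model to answer with one combination per line in the form "1: AI healthcare"; only lines containing ": " are kept, with the numbering prefix stripped. A hypothetical usage sketch, assuming the function is importable from its source module (autoresearcher/utils/generate_keyword_combinations.py) and that an OpenAI API key is configured for openai_call:

from autoresearcher.utils.generate_keyword_combinations import (
    generate_keyword_combinations,
)

question = "What is the impact of AI on healthcare?"
for combination in generate_keyword_combinations(question):
    print(combination)  # e.g. "AI healthcare"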

get_citation_by_doi(doi)

Retrieves a citation for a given DOI.

Args:
  doi (str): The DOI of the citation to retrieve.

Returns:
  str: The citation for the given DOI. If the response is not valid JSON, the raw response text is returned instead.

Notes:
  Requires an email address to be set in the EMAIL environment variable.

Examples:
  >>> get_citation_by_doi("10.1038/s41586-020-2003-7")
  "Liu, Y., Chen, X., Han, M., Li, Y., Li, L., Zhang, J., ... & Zhang, Y. (2020). A SARS-CoV-2 protein interaction map reveals targets for drug repurposing. Nature, 581(7809), 561-570."

Source code in autoresearcher/utils/get_citations.py

import requests

# EMAIL is read from the EMAIL environment variable (see Notes above) and is
# defined outside this excerpt.
def get_citation_by_doi(doi):
    """
    Retrieves a citation for a given DOI.
    Args:
      doi (str): The DOI of the citation to retrieve.
    Returns:
      str: The citation for the given DOI.
    Notes:
      If the response is not valid JSON, the raw response text is returned
      instead of a parsed citation.
      Requires an email address to be set in the EMAIL environment variable.
    Examples:
      >>> get_citation_by_doi("10.1038/s41586-020-2003-7")
      "Liu, Y., Chen, X., Han, M., Li, Y., Li, L., Zhang, J., ... & Zhang, Y. (2020). A SARS-CoV-2 protein interaction map reveals targets for drug repurposing. Nature, 581(7809), 561-570."
    """
    url = f"https://api.citeas.org/product/{doi}?email={EMAIL}"
    response = requests.get(url)
    try:
        data = response.json()
        return data["citations"][0]["citation"]
    except ValueError:
        return response.text
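
A hypothetical usage sketch, assuming the function is importable from its source module (autoresearcher/utils/get_citations.py) and that EMAIL is read from the environment when that module loads; the email address below is only a placeholder:

import os

# Placeholder contact address; CiteAs asks for a contact email in the request URL.
os.environ.setdefault("EMAIL", "you@example.com")

from autoresearcher.utils.get_citations import get_citation_by_doi

print(get_citation_by_doi("10.1038/s41586-020-2003-7"))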