
Getting Started with Google Gemini API: Installation and First API Call in Python

Learn how to install the google-generativeai SDK, configure your API key, make your first generate_content call, and parse responses. A complete hands-on beginner tutorial for Google Gemini.

Why Google Gemini for Agent Development

Google Gemini represents Google DeepMind's most capable family of large language models. Unlike earlier Google AI offerings that required complex GCP setup, the Gemini API is accessible through a simple Python SDK with a free tier generous enough for prototyping entire agent systems. Gemini models natively support text, images, video, audio, and code — making them uniquely suited for building multi-modal agents.

The google-generativeai SDK is the official Python client. It handles authentication, request formatting, streaming, and response parsing so you can focus on building agent logic rather than managing HTTP calls.

Prerequisites

Before you begin, ensure you have:

  • Python 3.9 or later installed
  • A Google AI Studio API key (free at aistudio.google.com)
  • Basic familiarity with Python

Step 1: Install the SDK

Install the official Google Generative AI package:

pip install google-generativeai

Verify the installation:

python -c "import google.generativeai as genai; print('SDK installed successfully')"

Step 2: Configure Your API Key

There are two ways to provide your API key. The recommended approach uses an environment variable:

export GOOGLE_API_KEY="your-api-key-here"

Then in your Python code, configure the SDK:

import google.generativeai as genai
import os

genai.configure(api_key=os.environ["GOOGLE_API_KEY"])

For quick experiments you can pass the key directly, but never commit API keys to version control:

genai.configure(api_key="your-api-key-here")  # Only for local testing

Step 3: Make Your First API Call

The core interaction pattern in Gemini is generate_content. Here is the simplest possible call:

import google.generativeai as genai
import os

genai.configure(api_key=os.environ["GOOGLE_API_KEY"])

model = genai.GenerativeModel("gemini-2.0-flash")

response = model.generate_content("Explain what an AI agent is in three sentences.")

print(response.text)

The GenerativeModel class is your primary interface. You specify which model to use — gemini-2.0-flash is fast and cost-effective, while Pro-class models offer stronger reasoning for complex tasks.

Step 4: Parse the Response Object

The response object contains more than just text. Understanding its structure is important for building robust agents:

response = model.generate_content("What is retrieval augmented generation?")

# The generated text
print(response.text)

# Safety ratings for content filtering
for candidate in response.candidates:
    print(f"Finish reason: {candidate.finish_reason}")
    for rating in candidate.safety_ratings:
        print(f"  {rating.category}: {rating.probability}")

# Token usage statistics
print(f"Prompt tokens: {response.usage_metadata.prompt_token_count}")
print(f"Response tokens: {response.usage_metadata.candidates_token_count}")
print(f"Total tokens: {response.usage_metadata.total_token_count}")

The usage_metadata field is critical for cost tracking in production agents. Each model has different pricing per million tokens, and monitoring usage prevents unexpected bills.
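Building on the fields above, the per-call statistics can be rolled into a running total. The TokenTracker class below is a minimal sketch (the class name and interface are ours, not part of the SDK); it reads the same usage_metadata attributes shown above, and the SimpleNamespace stand-in only exists so the example runs without an API call:

```python
from types import SimpleNamespace


class TokenTracker:
    """Accumulate token counts across generate_content calls."""

    def __init__(self):
        self.prompt_tokens = 0
        self.response_tokens = 0

    def record(self, usage_metadata):
        # usage_metadata is the object attached to every Gemini response
        self.prompt_tokens += usage_metadata.prompt_token_count
        self.response_tokens += usage_metadata.candidates_token_count

    @property
    def total_tokens(self):
        return self.prompt_tokens + self.response_tokens


tracker = TokenTracker()
# In real code: tracker.record(response.usage_metadata)
# Here a stand-in object demonstrates the accounting:
tracker.record(SimpleNamespace(prompt_token_count=12, candidates_token_count=48))
print(tracker.total_tokens)  # -> 60
```

In an agent loop, call tracker.record(response.usage_metadata) after every model call and log the totals periodically.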

Step 5: Configure Generation Parameters

Control the model's behavior with generation configuration:

model = genai.GenerativeModel(
    "gemini-2.0-flash",
    generation_config=genai.GenerationConfig(
        temperature=0.2,       # Lower = more deterministic
        top_p=0.8,             # Nucleus sampling threshold
        top_k=40,              # Token selection pool size
        max_output_tokens=1024, # Maximum response length
    ),
)

response = model.generate_content("Write a function to sort a list in Python.")
print(response.text)

For agent applications, a lower temperature (0.1-0.3) produces more reliable tool-calling behavior, while higher values (0.7-1.0) work better for creative content generation.
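One convenient pattern is to keep such presets in plain dicts keyed by task type — GenerativeModel also accepts a dict for generation_config, so a GenerationConfig object is not strictly required. A sketch (the helper name and the specific values are our choices, not SDK defaults):

```python
def generation_settings(task: str) -> dict:
    """Return sampling settings for a task type (values are illustrative)."""
    presets = {
        # Low temperature keeps tool-calling output predictable
        "tool_calling": {"temperature": 0.2, "top_p": 0.8, "max_output_tokens": 1024},
        # Higher temperature encourages varied creative text
        "creative": {"temperature": 0.9, "top_p": 0.95, "max_output_tokens": 2048},
    }
    if task not in presets:
        raise ValueError(f"unknown task type: {task!r}")
    return presets[task]


print(generation_settings("tool_calling")["temperature"])  # -> 0.2
```

You would then construct the model with, for example, genai.GenerativeModel("gemini-2.0-flash", generation_config=generation_settings("tool_calling")).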

Step 6: System Instructions

System instructions set the agent's persona and behavioral guidelines. They persist across the entire conversation:

model = genai.GenerativeModel(
    "gemini-2.0-flash",
    system_instruction="You are a senior Python developer. Always provide complete, runnable code examples. Explain tradeoffs between different approaches."
)

response = model.generate_content("How should I handle database connections in a FastAPI app?")
print(response.text)

System instructions are the foundation of every agent you build with Gemini. They define what the agent does, how it responds, and what constraints it operates under.

Common Pitfalls

API key not found: Ensure the environment variable is set in the same shell session where you run Python. Use os.environ.get("GOOGLE_API_KEY") with a fallback for debugging.
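A small helper makes that failure mode explicit instead of a bare KeyError at configure time (the function name is ours):

```python
import os


def load_api_key(var: str = "GOOGLE_API_KEY") -> str:
    """Read the API key from the environment, failing with a clear message."""
    key = os.environ.get(var)
    if not key:
        raise RuntimeError(
            f"{var} is not set. Export it in the same shell session "
            "that runs this script."
        )
    return key
```

Then call genai.configure(api_key=load_api_key()) at startup, so a missing key fails fast with an actionable message.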

Rate limiting: The free tier enforces low per-minute request quotas (on the order of 15 requests per minute for Pro-class models; exact limits vary by model and change over time). Implement exponential backoff for production agents.
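A minimal backoff wrapper might look like this (the helper name is ours; in our experience google-generativeai surfaces quota errors as google.api_core.exceptions.ResourceExhausted, so pass whatever exception type your SDK version actually raises as retry_on):

```python
import random
import time


def with_backoff(fn, max_retries=5, base_delay=1.0, retry_on=Exception):
    """Call fn(), retrying on retry_on with exponential backoff plus jitter."""
    for attempt in range(max_retries):
        try:
            return fn()
        except retry_on:
            if attempt == max_retries - 1:
                raise  # out of retries; surface the error to the caller
            # 1x, 2x, 4x, ... the base delay, plus jitter to avoid thundering herds
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.25)
            time.sleep(delay)
```

Usage would be response = with_backoff(lambda: model.generate_content(prompt)), narrowing retry_on to the rate-limit exception rather than all errors.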

Response blocked by safety filters: If response.text raises an error, check response.prompt_feedback to see which safety category triggered the block.
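That check can be wrapped in a small helper (safe_text is our name): accessing response.text raises a ValueError when no candidate contains text, such as after a safety block, so catch it and inspect prompt_feedback.

```python
def safe_text(response):
    """Return the generated text, or None if the response was blocked."""
    try:
        return response.text
    except ValueError:
        # .text raises when no candidate contains text, e.g. a safety block;
        # prompt_feedback names the category that triggered it
        print("Blocked:", response.prompt_feedback)
        return None
```

An agent loop can then branch on None instead of crashing mid-conversation.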

FAQ

What is the difference between Gemini Flash and Gemini Pro?

Gemini Flash is optimized for speed and cost — it responds faster and costs significantly less per token. Gemini Pro offers stronger reasoning, better instruction following, and higher accuracy on complex tasks. For most agent development, start with Flash and upgrade to Pro only for tasks where Flash falls short.

Is the Gemini API free to use?

Google AI Studio offers a free tier with rate limits (typically 15 requests per minute for Pro, 30 for Flash). This is sufficient for development and prototyping. For production workloads, you pay per million tokens through either AI Studio or Vertex AI.

Can I use Gemini with async Python code?

Yes. The SDK provides generate_content_async for use with asyncio. This is essential for building non-blocking agent systems that handle multiple requests concurrently.
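The pattern looks like the sketch below. To keep it runnable without an API key, a stand-in coroutine simulates the model call; in a real script you would replace it with model.generate_content_async, as the comments show:

```python
import asyncio

# Real version (requires a configured API key):
#   import google.generativeai as genai
#   model = genai.GenerativeModel("gemini-2.0-flash")
#   async def ask(prompt):
#       response = await model.generate_content_async(prompt)
#       return response.text

# Stand-in so this sketch runs anywhere:
async def ask(prompt):
    await asyncio.sleep(0.01)  # simulated network latency
    return f"answer to: {prompt}"


async def main():
    prompts = ["What is an AI agent?", "What is RAG?", "What is a tool call?"]
    # gather() issues all requests concurrently rather than one at a time
    return await asyncio.gather(*(ask(p) for p in prompts))


answers = asyncio.run(main())
print(answers)
```

With the real SDK calls swapped in, the three prompts are in flight simultaneously, which is what keeps a multi-request agent responsive.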


#GoogleGemini #Python #GettingStarted #Tutorial #GenerativeAI #AgenticAI #LearnAI #AIEngineering
